- \# --------------------------------------------------------------------------
- \#
- \# Copyright 1996-2018 The NASM Authors - All Rights Reserved
- \# See the file AUTHORS included with the NASM distribution for
- \# the specific copyright holders.
- \#
- \# Redistribution and use in source and binary forms, with or without
- \# modification, are permitted provided that the following
- \# conditions are met:
- \#
- \# * Redistributions of source code must retain the above copyright
- \# notice, this list of conditions and the following disclaimer.
- \# * Redistributions in binary form must reproduce the above
- \# copyright notice, this list of conditions and the following
- \# disclaimer in the documentation and/or other materials provided
- \# with the distribution.
- \#
- \# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
- \# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
- \# INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
- \# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- \# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
- \# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- \# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
- \# NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- \# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- \# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- \# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
- \# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
- \# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- \#
- \# --------------------------------------------------------------------------
- \#
- \# Source code to NASM documentation
- \#
- \M{category}{Programming}
- \M{title}{NASM - The Netwide Assembler}
- \M{year}{1996-2017}
- \M{author}{The NASM Development Team}
- \M{copyright_tail}{-- All Rights Reserved}
- \M{license}{This document is redistributable under the license given in the file "LICENSE" distributed in the NASM archive.}
- \M{summary}{This file documents NASM, the Netwide Assembler: an assembler targeting the Intel x86 series of processors, with portable source.}
- \M{infoname}{NASM}
- \M{infofile}{nasm}
- \M{infotitle}{The Netwide Assembler for x86}
- \M{epslogo}{nasmlogo.eps}
- \M{logoyadj}{-72}
- \& version.src
- \IR{-D} \c{-D} option
- \IR{-E} \c{-E} option
- \IR{-F} \c{-F} option
- \IR{-I} \c{-I} option
- \IR{-M} \c{-M} option
- \IR{-MD} \c{-MD} option
- \IR{-MF} \c{-MF} option
- \IR{-MG} \c{-MG} option
- \IR{-MP} \c{-MP} option
- \IR{-MQ} \c{-MQ} option
- \IR{-MT} \c{-MT} option
- \IR{-MW} \c{-MW} option
- \IR{-O} \c{-O} option
- \IR{-P} \c{-P} option
- \IR{-U} \c{-U} option
- \IR{-X} \c{-X} option
- \IR{-a} \c{-a} option
- \IR{-d} \c{-d} option
- \IR{-e} \c{-e} option
- \IR{-f} \c{-f} option
- \IR{-g} \c{-g} option
- \IR{-i} \c{-i} option
- \IR{-l} \c{-l} option
- \IR{-o} \c{-o} option
- \IR{-p} \c{-p} option
- \IR{-s} \c{-s} option
- \IR{-u} \c{-u} option
- \IR{-v} \c{-v} option
- \IR{-W} \c{-W} option
- \IR{-Werror} \c{-Werror} option
- \IR{-Wno-error} \c{-Wno-error} option
- \IR{-w} \c{-w} option
- \IR{-y} \c{-y} option
- \IR{-Z} \c{-Z} option
- \IR{!=} \c{!=} operator
- \IR{$, here} \c{$}, Here token
- \IR{$, prefix} \c{$}, prefix
- \IR{$$} \c{$$} token
- \IR{%} \c{%} operator
- \IR{%%} \c{%%} operator
- \IR{%+1} \c{%+1} and \c{%-1} syntax
- \IA{%-1}{%+1}
- \IR{%0} \c{%0} parameter count
- \IR{&} \c{&} operator
- \IR{&&} \c{&&} operator
- \IR{*} \c{*} operator
- \IR{..@} \c{..@} symbol prefix
- \IR{/} \c{/} operator
- \IR{//} \c{//} operator
- \IR{<} \c{<} operator
- \IR{<<} \c{<<} operator
- \IR{<=} \c{<=} operator
- \IR{<>} \c{<>} operator
- \IR{=} \c{=} operator
- \IR{==} \c{==} operator
- \IR{>} \c{>} operator
- \IR{>=} \c{>=} operator
- \IR{>>} \c{>>} operator
- \IR{?} \c{?} MASM syntax
- \IR{^} \c{^} operator
- \IR{^^} \c{^^} operator
- \IR{|} \c{|} operator
- \IR{||} \c{||} operator
- \IR{~} \c{~} operator
- \IR{%$} \c{%$} and \c{%$$} prefixes
- \IA{%$$}{%$}
- \IR{+ opaddition} \c{+} operator, binary
- \IR{+ opunary} \c{+} operator, unary
- \IR{+ modifier} \c{+} modifier
- \IR{- opsubtraction} \c{-} operator, binary
- \IR{- opunary} \c{-} operator, unary
- \IR{! opunary} \c{!} operator, unary
- \IR{alignment, in bin sections} alignment, in \c{bin} sections
- \IR{alignment, in elf sections} alignment, in \c{elf} sections
- \IR{alignment, in win32 sections} alignment, in \c{win32} sections
- \IR{alignment, of elf common variables} alignment, of \c{elf} common
- variables
- \IR{alignment, in obj sections} alignment, in \c{obj} sections
- \IR{a.out, bsd version} \c{a.out}, BSD version
- \IR{a.out, linux version} \c{a.out}, Linux version
- \IR{autoconf} Autoconf
- \IR{bin} bin
- \IR{bitwise and} bitwise AND
- \IR{bitwise or} bitwise OR
- \IR{bitwise xor} bitwise XOR
- \IR{block ifs} block IFs
- \IR{borland pascal} Borland, Pascal
- \IR{borland's win32 compilers} Borland, Win32 compilers
- \IR{braces, after % sign} braces, after \c{%} sign
- \IR{bsd} BSD
- \IR{c calling convention} C calling convention
- \IR{c symbol names} C symbol names
- \IA{critical expressions}{critical expression}
- \IA{command line}{command-line}
- \IA{case sensitivity}{case sensitive}
- \IA{case-sensitive}{case sensitive}
- \IA{case-insensitive}{case sensitive}
- \IA{character constants}{character constant}
- \IR{codeview} CodeView debugging format
- \IR{common object file format} Common Object File Format
- \IR{common variables, alignment in elf} common variables, alignment
- in \c{elf}
- \IR{common, elf extensions to} \c{COMMON}, \c{elf} extensions to
- \IR{common, obj extensions to} \c{COMMON}, \c{obj} extensions to
- \IR{declaring structure} declaring structures
- \IR{default-wrt mechanism} default-\c{WRT} mechanism
- \IR{devpac} DevPac
- \IR{djgpp} DJGPP
- \IR{dll symbols, exporting} DLL symbols, exporting
- \IR{dll symbols, importing} DLL symbols, importing
- \IR{dos} DOS
- \IR{dos archive} DOS archive
- \IR{dos source archive} DOS source archive
- \IA{effective address}{effective addresses}
- \IA{effective-address}{effective addresses}
- \IR{elf} ELF
- \IR{elf, 16-bit code and} ELF, 16-bit code and
- \IR{elf shared libraries} ELF, shared libraries
- \IR{elf32} \c{elf32}
- \IR{elf64} \c{elf64}
- \IR{elfx32} \c{elfx32}
- \IR{executable and linkable format} Executable and Linkable Format
- \IR{extern, obj extensions to} \c{EXTERN}, \c{obj} extensions to
- \IR{extern, rdf extensions to} \c{EXTERN}, \c{rdf} extensions to
- \IR{floating-point, constants} floating-point, constants
- \IR{floating-point, packed bcd constants} floating-point, packed BCD constants
- \IR{freebsd} FreeBSD
- \IR{freelink} FreeLink
- \IR{functions, c calling convention} functions, C calling convention
- \IR{functions, pascal calling convention} functions, Pascal calling
- convention
- \IR{global, aoutb extensions to} \c{GLOBAL}, \c{aoutb} extensions to
- \IR{global, elf extensions to} \c{GLOBAL}, \c{elf} extensions to
- \IR{global, rdf extensions to} \c{GLOBAL}, \c{rdf} extensions to
- \IR{got} GOT
- \IR{got relocations} \c{GOT} relocations
- \IR{gotoff relocation} \c{GOTOFF} relocations
- \IR{gotpc relocation} \c{GOTPC} relocations
- \IR{intel number formats} Intel number formats
- \IR{linux, elf} Linux, ELF
- \IR{linux, a.out} Linux, \c{a.out}
- \IR{linux, as86} Linux, \c{as86}
- \IR{logical and} logical AND
- \IR{logical or} logical OR
- \IR{logical xor} logical XOR
- \IR{mach object file format} Mach, object file format
- \IA{mach-o}{macho}
- \IR{mach-o} Mach-O, object file format
- \IR{macho32} \c{macho32}
- \IR{macho64} \c{macho64}
- \IR{macos x} MacOS X
- \IR{masm} MASM
- \IA{memory reference}{memory references}
- \IR{minix} Minix
- \IA{misc directory}{misc subdirectory}
- \IR{misc subdirectory} \c{misc} subdirectory
- \IR{microsoft omf} Microsoft OMF
- \IR{mmx registers} MMX registers
- \IA{modr/m}{modr/m byte}
- \IR{modr/m byte} ModR/M byte
- \IR{ms-dos} MS-DOS
- \IR{ms-dos device drivers} MS-DOS device drivers
- \IR{multipush} \c{multipush} macro
- \IR{nan} NaN
- \IR{nasm version} NASM version
- \IR{netbsd} NetBSD
- \IR{nsis} NSIS
- \IR{nullsoft scriptable installer} Nullsoft Scriptable Installer
- \IR{omf} OMF
- \IR{openbsd} OpenBSD
- \IR{operating system} operating system
- \IR{os/2} OS/2
- \IR{pascal calling convention} Pascal calling convention
- \IR{passes} passes, assembly
- \IR{perl} Perl
- \IR{pic} PIC
- \IR{pharlap} PharLap
- \IR{plt} PLT
- \IR{plt} \c{PLT} relocations
- \IA{pre-defining macros}{pre-define}
- \IA{preprocessor expressions}{preprocessor, expressions}
- \IA{preprocessor loops}{preprocessor, loops}
- \IA{preprocessor variables}{preprocessor, variables}
- \IA{rdoff subdirectory}{rdoff}
- \IR{rdoff} \c{rdoff} subdirectory
- \IR{relocatable dynamic object file format} Relocatable Dynamic
- Object File Format
- \IR{relocations, pic-specific} relocations, PIC-specific
- \IA{repeating}{repeating code}
- \IR{section alignment, in elf} section alignment, in \c{elf}
- \IR{section alignment, in bin} section alignment, in \c{bin}
- \IR{section alignment, in obj} section alignment, in \c{obj}
- \IR{section alignment, in win32} section alignment, in \c{win32}
- \IR{section, elf extensions to} \c{SECTION}, \c{elf} extensions to
- \IR{section, macho extensions to} \c{SECTION}, \c{macho} extensions to
- \IR{section, win32 extensions to} \c{SECTION}, \c{win32} extensions to
- \IR{segment alignment, in bin} segment alignment, in \c{bin}
- \IR{segment alignment, in obj} segment alignment, in \c{obj}
- \IR{segment, obj extensions to} \c{SEGMENT}, \c{obj} extensions to
- \IR{segment names, borland pascal} segment names, Borland Pascal
- \IR{shift command} \c{shift} command
- \IA{sib}{sib byte}
- \IR{sib byte} SIB byte
- \IR{align, smart} \c{ALIGN}, smart
- \IA{sectalign}{sectalign}
- \IR{solaris x86} Solaris x86
- \IA{standard section names}{standardized section names}
- \IR{symbols, exporting from dlls} symbols, exporting from DLLs
- \IR{symbols, importing from dlls} symbols, importing from DLLs
- \IR{test subdirectory} \c{test} subdirectory
- \IR{tlink} \c{TLINK}
- \IR{underscore, in c symbols} underscore, in C symbols
- \IR{unicode} Unicode
- \IR{unix} Unix
- \IR{utf-8} UTF-8
- \IR{utf-16} UTF-16
- \IR{utf-32} UTF-32
- \IA{sco unix}{unix, sco}
- \IR{unix, sco} Unix, SCO
- \IA{unix source archive}{unix, source archive}
- \IR{unix, source archive} Unix, source archive
- \IA{unix system v}{unix, system v}
- \IR{unix, system v} Unix, System V
- \IR{unixware} UnixWare
- \IR{val} VAL
- \IR{version number of nasm} version number of NASM
- \IR{visual c++} Visual C++
- \IR{www page} WWW page
- \IR{win32} Win32
- \IR{win64} Win64
- \IR{windows} Windows
- \IR{windows 95} Windows 95
- \IR{windows nt} Windows NT
- \# \IC{program entry point}{entry point, program}
- \# \IC{program entry point}{start point, program}
- \# \IC{MS-DOS device drivers}{device drivers, MS-DOS}
- \# \IC{16-bit mode, versus 32-bit mode}{32-bit mode, versus 16-bit mode}
- \# \IC{c symbol names}{symbol names, in C}
- \C{intro} Introduction
- \H{whatsnasm} What Is NASM?
- The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed
- for portability and modularity. It supports a range of object file
- formats, including Linux and \c{*BSD} \c{a.out}, \c{ELF}, \c{COFF},
- \c{Mach-O}, 16-bit and 32-bit \c{OBJ} (OMF) format, \c{Win32} and
- \c{Win64}. It will also output plain binary files, Intel hex and
- Motorola S-Record formats. Its syntax is designed to be simple and
- easy to understand, similar to the syntax in the Intel Software
- Developer Manual with minimal complexity. It supports all currently
- known x86 architectural extensions, and has strong support for macros.
- NASM also comes with a set of utilities for handling the \c{RDOFF}
- custom object-file format.
- \S{legal} \i{License} Conditions
- Please see the file \c{LICENSE}, supplied as part of any NASM
- distribution archive, for the license conditions under which you may
- use NASM. NASM is now under the so-called 2-clause BSD license, also
- known as the simplified BSD license.
- Copyright 1996-2017 the NASM Authors - All rights reserved.
- Redistribution and use in source and binary forms, with or without
- modification, are permitted provided that the following conditions are
- met:
- \b Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
- \b Redistributions in binary form must reproduce the above copyright
- notice, this list of conditions and the following disclaimer in the
- documentation and/or other materials provided with the distribution.
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
- CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
- INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
- MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
- CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
- NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
- OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
- EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- \C{running} Running NASM
- \H{syntax} NASM \i{Command-Line} Syntax
- To assemble a file, you issue a command of the form
- \c nasm -f <format> <filename> [-o <output>]
- For example,
- \c nasm -f elf myfile.asm
- will assemble \c{myfile.asm} into an \c{ELF} object file \c{myfile.o}. And
- \c nasm -f bin myfile.asm -o myfile.com
- will assemble \c{myfile.asm} into a raw binary file \c{myfile.com}.
- To produce a listing file, with the hex codes output from NASM
- displayed on the left of the original sources, use the \c{-l} option
- to give a listing file name, for example:
- \c nasm -f coff myfile.asm -l myfile.lst
- To get further usage instructions from NASM, try typing
- \c nasm -h
- The option \c{--help} is an alias for the \c{-h} option.
- The option \c{-hf} will also list the available output file formats,
- and what they are.
- If you use Linux but aren't sure whether your system is \c{a.out}
- or \c{ELF}, type
- \c file nasm
- (in the directory in which you put the NASM binary when you
- installed it). If it says something like
- \c nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1
- then your system is \c{ELF}, and you should use the option \c{-f elf}
- when you want NASM to produce Linux object files. If it says
- \c nasm: Linux/i386 demand-paged executable (QMAGIC)
- or something similar, your system is \c{a.out}, and you should use
- \c{-f aout} instead. (Linux \c{a.out} systems have long been obsolete
- and are rare these days.)
- Like Unix compilers and assemblers, NASM is silent unless it
- goes wrong: you won't see any output at all, unless it gives error
- messages.
- \S{opt-o} The \i\c{-o} Option: Specifying the Output File Name
- NASM will normally choose the name of your output file for you;
- precisely how it does this is dependent on the object file format.
- For Microsoft object file formats (\c{obj}, \c{win32} and \c{win64}),
- it will remove the \c{.asm} \i{extension} (or whatever extension you
- like to use - NASM doesn't care) from your source file name and
- substitute \c{.obj}. For Unix object file formats (\c{aout}, \c{as86},
- \c{coff}, \c{elf32}, \c{elf64}, \c{elfx32}, \c{ieee}, \c{macho32} and
- \c{macho64}) it will substitute \c{.o}. For \c{dbg}, \c{rdf}, \c{ith}
- and \c{srec}, it will use \c{.dbg}, \c{.rdf}, \c{.ith} and \c{.srec},
- respectively, and for the \c{bin} format it will simply remove the
- extension, so that \c{myfile.asm} produces the output file \c{myfile}.
- If the output file already exists, NASM will overwrite it, unless it
- has the same name as the input file, in which case it will give a
- warning and use \i\c{nasm.out} as the output file name instead.
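- As a rough illustration of the naming rules above, here is a minimal Python
- sketch. It is not NASM's implementation: the function name
- \c{default_output_name} and the three format sets are assumptions made for
- this example, and format aliases such as \c{elf} are deliberately not handled.

```python
import os

# Format groups as listed in the text above (assumed names, for illustration).
MICROSOFT_FORMATS = {"obj", "win32", "win64"}
UNIX_FORMATS = {"aout", "as86", "coff", "elf32", "elf64", "elfx32",
                "ieee", "macho32", "macho64"}
OWN_EXTENSION_FORMATS = {"dbg", "rdf", "ith", "srec"}


def default_output_name(source: str, fmt: str) -> str:
    """Return the default output name described for `source` and `fmt`."""
    stem, _old_ext = os.path.splitext(source)
    if fmt in MICROSOFT_FORMATS:
        candidate = stem + ".obj"
    elif fmt in UNIX_FORMATS:
        candidate = stem + ".o"
    elif fmt in OWN_EXTENSION_FORMATS:
        candidate = stem + "." + fmt
    elif fmt == "bin":
        candidate = stem  # the extension is simply removed
    else:
        raise ValueError(f"format not covered by this sketch: {fmt}")
    # The input file is never overwritten: fall back to nasm.out instead.
    if candidate == source:
        return "nasm.out"
    return candidate


print(default_output_name("myfile.asm", "elf32"))  # myfile.o
print(default_output_name("myfile.asm", "win32"))  # myfile.obj
print(default_output_name("myfile.asm", "bin"))    # myfile
print(default_output_name("myfile", "bin"))        # nasm.out
```

- The last call shows the one case where the default name would collide with
- the input file, which is where \c{nasm.out} comes in.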
- For situations in which this behaviour is unacceptable, NASM
- provides the \c{-o} command-line option, which allows you to specify
- your desired output file name. You invoke \c{-o} by following it
- with the name you wish for the output file, either with or without
- an intervening space. For example:
- \c nasm -f bin program.asm -o program.com
- \c nasm -f bin driver.asm -odriver.sys
- Note that this is a small \c{o}, and is different from a capital \c{O}, which
- is used to specify the optimization level. See \k{opt-O}.
- \S{opt-f} The \i\c{-f} Option: Specifying the \i{Output File Format}
- If you do not supply the \c{-f} option to NASM, it will choose an
- output file format for you itself. In the distribution versions of
- NASM, the default is always \i\c{bin}; if you've compiled your own
- copy of NASM, you can redefine \i\c{OF_DEFAULT} at compile time and
- choose what you want the default to be.
- Like \c{-o}, the intervening space between \c{-f} and the output
- file format is optional; so \c{-f elf} and \c{-felf} are both valid.
- A complete list of the available output file formats can be given by
- issuing the command \i\c{nasm -hf}.
- \S{opt-l} The \i\c{-l} Option: Generating a \i{Listing File}
- If you supply the \c{-l} option to NASM, followed (with the usual
- optional space) by a file name, NASM will generate a
- \i{source-listing file} for you, in which addresses and generated
- code are listed on the left, and the actual source code, with
- expansions of multi-line macros (except those which specifically
- request no expansion in source listings: see \k{nolist}) on the
- right. For example:
- \c nasm -f elf myfile.asm -l myfile.lst
- If a list file is selected, you may turn off listing for a
- section of your source with \c{[list -]}, and turn it back on
- with \c{[list +]} (the default). There is no "user
- form" (without the brackets). This can be used to list only
- sections of interest, avoiding excessively long listings.
- \S{opt-M} The \i\c{-M} Option: Generate \i{Makefile Dependencies}
- This option can be used to generate makefile dependencies on stdout.
- This can be redirected to a file for further processing. For example:
- \c nasm -M myfile.asm > myfile.dep
- \S{opt-MG} The \i\c{-MG} Option: Generate \i{Makefile Dependencies}
- This option can be used to generate makefile dependencies on stdout.
- This differs from the \c{-M} option in that if a nonexistent file is
- encountered, it is assumed to be a generated file and is added to the
- dependency list without a prefix.
- \S{opt-MF} The \i\c{-MF} Option: Set Makefile Dependency File
- This option can be used with the \c{-M} or \c{-MG} options to send the
- output to a file, rather than to stdout. For example:
- \c nasm -M -MF myfile.dep myfile.asm
- \S{opt-MD} The \i\c{-MD} Option: Assemble and Generate Dependencies
- The \c{-MD} option acts as the combination of the \c{-M} and \c{-MF}
- options (i.e. a filename has to be specified.) However, unlike the
- \c{-M} or \c{-MG} options, \c{-MD} does \e{not} inhibit the normal
- operation of the assembler. Use this to automatically generate
- updated dependencies with every assembly session. For example:
- \c nasm -f elf -o myfile.o -MD myfile.dep myfile.asm
- If the argument after \c{-MD} is an option rather than a filename,
- then the output filename is the first applicable one of:
- \b the filename set in the \c{-MF} option;
- \b the output filename from the \c{-o} option with \c{.d} appended;
- \b the input filename with the extension set to \c{.d}.
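- The precedence in the list above can be sketched in Python as follows. This
- is an illustration only: the function \c{dependency_file_name} and its
- parameter names are hypothetical, and it covers just the case where \c{-MD}
- is not followed by a filename.

```python
import os
from typing import Optional


def dependency_file_name(input_name: str,
                         mf_name: Optional[str] = None,
                         output_name: Optional[str] = None) -> str:
    """Pick the dependency file when -MD is not followed by a filename."""
    if mf_name is not None:        # 1. the filename set with -MF
        return mf_name
    if output_name is not None:    # 2. the -o filename with ".d" appended
        return output_name + ".d"
    stem, _ext = os.path.splitext(input_name)
    return stem + ".d"             # 3. the input filename, extension set to .d


print(dependency_file_name("myfile.asm"))                          # myfile.d
print(dependency_file_name("myfile.asm", output_name="myfile.o"))  # myfile.o.d
print(dependency_file_name("myfile.asm", mf_name="deps.mk"))       # deps.mk
```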
- \S{opt-MT} The \i\c{-MT} Option: Dependency Target Name
- The \c{-MT} option can be used to override the default name of the
- dependency target. This is normally the same as the output filename,
- specified by the \c{-o} option.
- \S{opt-MQ} The \i\c{-MQ} Option: Dependency Target Name (Quoted)
- The \c{-MQ} option acts as the \c{-MT} option, except it tries to
- quote characters that have special meaning in Makefile syntax. This
- is not foolproof, as not all characters with special meaning are
- quotable in Make. The default output (if no \c{-MT} or \c{-MQ} option
- is specified) is automatically quoted.
- \S{opt-MP} The \i\c{-MP} Option: Emit phony targets
- When used with any of the dependency generation options, the \c{-MP}
- option causes NASM to emit a phony target without dependencies for
- each header file. This prevents Make from complaining if a header
- file has been removed.
- \S{opt-MW} The \i\c{-MW} Option: Watcom Make quoting style
- This option causes NASM to attempt to quote dependencies according to
- Watcom Make conventions rather than POSIX Make conventions (also used
- by most other Make variants.) This quotes \c{#} as \c{$#} rather than
- \c{\\#}, uses \c{&} rather than \c{\\} for continuation lines, and
- encloses filenames containing whitespace in double quotes.
- \S{opt-F} The \i\c{-F} Option: Selecting a \i{Debug Information Format}
- This option is used to select the format of the debug information
- emitted into the output file, to be used by a debugger (or \e{will}
- be). Prior to version 2.03.01, the use of this switch did \e{not} enable
- output of the selected debug info format. Use \c{-g}, see \k{opt-g},
- to enable output. Versions 2.03.01 and later automatically enable \c{-g}
- if \c{-F} is specified.
- A complete list of the available debug file formats for an output
- format can be seen by issuing the command \c{nasm -f <format> -y}. Not
- all output formats currently support debugging output. See \k{opt-y}.
- This should not be confused with the \c{-f dbg} output format option,
- see \k{dbgfmt}.
- \S{opt-g} The \i\c{-g} Option: Enabling \i{Debug Information}
- This option can be used to generate debugging information in the specified
- format. See \k{opt-F}. Using \c{-g} without \c{-F} results in emitting
- debug info in the default format, if any, for the selected output format.
- If no debug information is currently implemented in the selected output
- format, \c{-g} is \e{silently ignored}.
- \S{opt-X} The \i\c{-X} Option: Selecting an \i{Error Reporting Format}
- This option can be used to select an error reporting format for any
- error messages that might be produced by NASM.
- Currently, two error reporting formats may be selected. They are
- the \c{-Xvc} option and the \c{-Xgnu} option. The GNU format is
- the default and looks like this:
- \c filename.asm:65: error: specific error message
- where \c{filename.asm} is the name of the source file in which the
- error was detected, \c{65} is the source file line number on which
- the error was detected, \c{error} is the severity of the error (this
- could be \c{warning}), and \c{specific error message} is a more
- detailed text message which should help pinpoint the exact problem.
- The other format, specified by \c{-Xvc} is the style used by Microsoft
- Visual C++ and some other programs. It looks like this:
- \c filename.asm(65) : error: specific error message
- where the only difference is that the line number is in parentheses
- instead of being delimited by colons.
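- A tool that consumes NASM's messages (an editor plug-in, say) could recognise
- both layouts along these lines. This is a sketch based only on the two
- sample lines shown above; the \c{Diagnostic} type, the regular expressions
- and \c{parse_diagnostic} are assumptions of this example, not part of NASM.

```python
import re
from typing import NamedTuple, Optional


class Diagnostic(NamedTuple):
    filename: str
    line: int
    severity: str
    message: str


# -Xgnu:  filename.asm:65: error: specific error message
GNU_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): (?P<sev>\w+): (?P<msg>.*)$")
# -Xvc:   filename.asm(65) : error: specific error message
VC_RE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+)\) : (?P<sev>\w+): (?P<msg>.*)$")


def parse_diagnostic(text: str) -> Optional[Diagnostic]:
    """Parse one message line in either format; return None if neither fits."""
    for pattern in (GNU_RE, VC_RE):
        m = pattern.match(text)
        if m:
            return Diagnostic(m["file"], int(m["line"]), m["sev"], m["msg"])
    return None


print(parse_diagnostic("filename.asm:65: error: specific error message"))
print(parse_diagnostic("filename.asm(65) : error: specific error message"))
```

- Both calls yield the same fields, which reflects the point made above: only
- the way the line number is delimited differs between the two formats.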
- See also the \c{Visual C++} output format, \k{win32fmt}.
- \S{opt-Z} The \i\c{-Z} Option: Send Errors to a File
- Under \I{DOS}\c{MS-DOS} it can be difficult (though there are ways) to
- redirect the standard-error output of a program to a file. Since
- NASM usually produces its warning and \i{error messages} on
- \i\c{stderr}, this can make it hard to capture the errors if (for
- example) you want to load them into an editor.
- NASM therefore provides the \c{-Z} option, which takes a filename
- argument and causes errors to be sent to the specified file rather than
- to standard error. You can therefore \I{redirecting errors}redirect
- the errors into a file by typing
- \c nasm -Z myfile.err -f obj myfile.asm
- In earlier versions of NASM, this option was called \c{-E}, but it was
- changed because \c{-E} is conventionally the option for preprocessing
- only, and confusing the two had disastrous results. See \k{opt-E}.
- \S{opt-s} The \i\c{-s} Option: Send Errors to \i\c{stdout}
- The \c{-s} option redirects \i{error messages} to \c{stdout} rather
- than \c{stderr}, so it can be redirected under \I{DOS}\c{MS-DOS}. To
- assemble the file \c{myfile.asm} and pipe its output to the \c{more}
- program, you can type:
- \c nasm -s -f obj myfile.asm | more
- See also the \c{-Z} option, \k{opt-Z}.
- \S{opt-i} The \i\c{-i}\I\c{-I} Option: Include File Search Directories
- When NASM sees the \i\c{%include} or \i\c{%pathsearch} directive in a
- source file (see \k{include}, \k{pathsearch} or \k{incbin}), it will
- search for the given file not only in the current directory, but also
- in any directories specified on the command line by the use of the
- \c{-i} option. Therefore you can include files from a \i{macro
- library}, for example, by typing
- \c nasm -ic:\macrolib\ -f obj myfile.asm
- (As usual, a space between \c{-i} and the path name is allowed, and
- optional).
- Prior to NASM 2.14, the path given to this option was used verbatim, and
- supplying a trailing path separator was up to the caller. This made it
- possible to concatenate a search path directly with a filename, but that
- was more of a trick than a useful feature. NASM now always ensures that a
- trailing path separator is present, so \c{-ifoo} is treated as
- \c{-ifoo/}.
- If you want to define a \e{standard} \i{include search path},
- similar to \c{/usr/include} on Unix systems, you should place one or
- more \c{-i} directives in the \c{NASMENV} environment variable (see
- \k{nasmenv}).
- For Makefile compatibility with many C compilers, this option can also
- be specified as \c{-I}.
- \S{opt-p} The \i\c{-p}\I\c{-P} Option: \I{pre-including files}Pre-Include a File
- \I\c{%include}NASM allows you to specify files to be
- \e{pre-included} into your source file, by the use of the \c{-p}
- option. So running
- \c nasm myfile.asm -p myinc.inc
- is equivalent to running \c{nasm myfile.asm} and placing the
- directive \c{%include "myinc.inc"} at the start of the file.
- The \c{--include} option is also accepted.
- For consistency with the \c{-I}, \c{-D} and \c{-U} options, this
- option can also be specified as \c{-P}.
- \S{opt-d} The \i\c{-d}\I\c{-D} Option: \I{pre-defining macros}Pre-Define a Macro
- \I\c{%define}Just as the \c{-p} option gives an alternative to placing
- \c{%include} directives at the start of a source file, the \c{-d}
- option gives an alternative to placing a \c{%define} directive. You
- could code
- \c nasm myfile.asm -dFOO=100
- as an alternative to placing the directive
- \c %define FOO 100
- at the start of the file. You can omit the macro value, as well:
- the option \c{-dFOO} is equivalent to coding \c{%define FOO}. This
- form of the directive may be useful for selecting \i{assembly-time
- options} which are then tested using \c{%ifdef}, for example
- \c{-dDEBUG}.
- For Makefile compatibility with many C compilers, this option can also
- be specified as \c{-D}.
- \S{opt-u} The \i\c{-u}\I\c{-U} Option: \I{Undefining macros}Undefine a Macro
- \I\c{%undef}The \c{-u} option undefines a macro that would otherwise
- have been pre-defined, either automatically or by a \c{-p} or \c{-d}
- option specified earlier on the command line.
- For example, the following command line:
- \c nasm myfile.asm -dFOO=100 -uFOO
- would result in \c{FOO} \e{not} being a predefined macro in the
- program. This is useful to override options specified at a different
- point in a Makefile.
- For Makefile compatibility with many C compilers, this option can also
- be specified as \c{-U}.
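- The interaction between \c{-d} and \c{-u} depends on the order of the
- options. The Python sketch below models that ordering only. The function
- \c{predefined_macros} is hypothetical; it ignores automatically pre-defined
- macros and \c{-p} files, and accepts only the form with no space after the
- option letter.

```python
from typing import Dict, List


def predefined_macros(options: List[str]) -> Dict[str, str]:
    """Apply -d/-D and -u/-U options in order; later options win."""
    macros: Dict[str, str] = {}
    for opt in options:
        if opt[:2] in ("-d", "-D"):
            name, _, value = opt[2:].partition("=")
            macros[name] = value       # "-dFOO" defines FOO with an empty body
        elif opt[:2] in ("-u", "-U"):
            macros.pop(opt[2:], None)  # in this sketch, an unknown name is ignored
    return macros


print(predefined_macros(["-dFOO=100", "-uFOO"]))  # {}
print(predefined_macros(["-uFOO", "-dFOO=100"]))  # {'FOO': '100'}
print(predefined_macros(["-dDEBUG"]))             # {'DEBUG': ''}
```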
- \S{opt-E} The \i\c{-E}\I{-e} Option: Preprocess Only
- NASM allows the \i{preprocessor} to be run on its own, up to a
- point. Using the \c{-E} option (which requires no arguments) will
- cause NASM to preprocess its input file, expand all the macro
- references, remove all the comments and preprocessor directives, and
- print the resulting file on standard output (or save it to a file,
- if the \c{-o} option is also used).
- This option cannot be applied to programs which require the
- preprocessor to evaluate \I{preprocessor expressions}\i{expressions}
- which depend on the values of symbols: so code such as
- \c %assign tablesize ($-tablestart)
- will cause an error in \i{preprocess-only mode}.
- For compatibility with older versions of NASM, this option can also be
- written \c{-e}. \c{-E} in older versions of NASM was the equivalent
- of the current \c{-Z} option, \k{opt-Z}.
- \S{opt-a} The \i\c{-a} Option: Don't Preprocess At All
- If NASM is being used as the back end to a compiler, it might be
- desirable to \I{suppressing preprocessing}suppress preprocessing
- completely and assume the compiler has already done it, to save time
- and increase compilation speeds. The \c{-a} option, requiring no
- argument, instructs NASM to replace its powerful \i{preprocessor}
- with a \i{stub preprocessor} which does nothing.
- \S{opt-O} The \i\c{-O} Option: Specifying \i{Multipass Optimization}
- Using the \c{-O} option, you can tell NASM to carry out different
- levels of optimization. Multiple flags can be specified after the
- \c{-O} options, some of which can be combined in a single option,
- e.g. \c{-Oxv}.
- \b \c{-O0}: No optimization. All operands take their long forms,
- if a short form is not specified, except conditional jumps.
- This is intended to match NASM 0.98 behavior.
- \b \c{-O1}: Minimal optimization. As above, but immediate operands
- which will fit in a signed byte are optimized,
- unless the long form is specified. Conditional jumps default
- to the long form unless otherwise specified.
- \b \c{-Ox} (where \c{x} is the actual letter \c{x}): Multipass optimization.
- Minimize branch offsets and signed immediate bytes,
- overriding size specification unless the \c{strict} keyword
- has been used (see \k{strict}). For compatibility with earlier
- releases, the letter \c{x} may also be any number greater than
- one. This number has no effect on the actual number of passes.
- \b \c{-Ov}: At the end of assembly, print the number of passes
- actually executed.
- The \c{-Ox} mode is recommended for most uses, and is the default
- since NASM 2.09.
- Note that this is a capital \c{O}, and is different from a small \c{o}, which
- is used to specify the output file name. See \k{opt-o}.
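- The flag syntax described in this section can be modelled roughly as shown
- below. This Python sketch is an illustration, not NASM's option parser: the
- function \c{parse_optimization} is hypothetical, it handles only the flags
- listed above, and it assumes the \c{-Ox} default when no level is given.

```python
from typing import Tuple

DIGITS = "0123456789"


def parse_optimization(flags: str) -> Tuple[str, bool]:
    """Parse the text following -O into (level, report_passes).

    level is "0", "1" or "x"; report_passes reflects the v flag.
    """
    level = "x"              # assumed starting point: the default since NASM 2.09
    report_passes = False
    i = 0
    while i < len(flags):
        ch = flags[i]
        if ch in DIGITS:
            j = i
            while j < len(flags) and flags[j] in DIGITS:
                j += 1
            number = int(flags[i:j])
            # Numbers greater than one are accepted as a synonym for x.
            level = str(number) if number <= 1 else "x"
            i = j
        elif ch == "x":
            level = "x"
            i += 1
        elif ch == "v":
            report_passes = True
            i += 1
        else:
            raise ValueError(f"flag not covered by this sketch: {ch!r}")
    return level, report_passes


print(parse_optimization("xv"))  # ('x', True)
print(parse_optimization("0"))   # ('0', False)
print(parse_optimization("3"))   # ('x', False)
```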
- \S{opt-t} The \i\c{-t} Option: Enable TASM Compatibility Mode
- NASM includes a limited form of compatibility with Borland's \i\c{TASM}.
- When NASM's \c{-t} option is used, the following changes are made:
- \b local labels may be prefixed with \c{@@} instead of \c{.}
- \b size override is supported within brackets. In TASM compatible mode,
- a size override inside square brackets changes the size of the operand,
- and not the address type of the operand as it does in NASM syntax. E.g.
- \c{mov eax,[DWORD val]} is valid syntax in TASM compatibility mode.
- Note that you lose the ability to override the default address type for
- the instruction.
- \b unprefixed forms of some directives are supported (\c{arg}, \c{elif},
- \c{else}, \c{endif}, \c{if}, \c{ifdef}, \c{ifdifi}, \c{ifndef},
- \c{include}, \c{local})
- \S{opt-w} The \i\c{-w} and \i\c{-W} Options: Enable or Disable Assembly \i{Warnings}
- NASM can observe many conditions during the course of assembly which
- are worth mentioning to the user, but not a sufficiently severe
- error to justify NASM refusing to generate an output file. These
- conditions are reported like errors, but come up with the word
- `warning' before the message. Warnings do not prevent NASM from
- generating an output file and returning a success status to the
- operating system.
- Some conditions are even less severe than that: they are only
- sometimes worth mentioning to the user. Therefore NASM supports the
- \c{-w} command-line option, which enables or disables certain
- classes of assembly warning. Such warning classes are described by a
- name, for example \c{orphan-labels}; you can enable warnings of
- this class by the command-line option \c{-w+orphan-labels} and
- disable it by \c{-w-orphan-labels}.
- The current \i{warning classes} are:
- \b \i\c{other} specifies any warning not otherwise specified in any
- class. Enabled by default.
- \b \i\c{macro-params} covers warnings about \i{multi-line macros}
- being invoked with the wrong number of parameters. Enabled by default;
- see \k{mlmacover} for an example of why you might want to disable it.
- \b \i\c{macro-selfref} warns if a macro references itself. Disabled by
- default.
- \b \i\c{macro-defaults} warns when a macro has more default parameters
- than optional parameters. Enabled by default; see \k{mlmacdef} for why
- you might want to disable it.
- \b \i\c{orphan-labels} covers warnings about source lines which
- contain no instruction but define a label without a trailing colon.
- NASM warns about this somewhat obscure condition by default;
- see \k{syntax} for more information.
- \b \i\c{number-overflow} covers warnings about numeric constants which
- don't fit in 64 bits. Enabled by default.
- \b \i\c{gnu-elf-extensions} warns if 8-bit or 16-bit relocations
- are used in \c{-f elf} format. The GNU extensions allow this.
- Disabled by default.
- \b \i\c{float-overflow} warns about floating point overflow.
- Enabled by default.
- \b \i\c{float-denorm} warns about floating point denormals.
- Disabled by default.
- \b \i\c{float-underflow} warns about floating point underflow.
- Disabled by default.
- \b \i\c{float-toolong} warns about too many digits in floating-point numbers.
- Enabled by default.
- \b \i\c{user} controls \c{%warning} directives (see \k{pperror}).
- Enabled by default.
- \b \i\c{lock} warns about \c{LOCK} prefixes on unlockable instructions.
- Enabled by default.
- \b \i\c{hle} warns about invalid use of the HLE \c{XACQUIRE} or \c{XRELEASE}
- prefixes.
- Enabled by default.
- \b \i\c{bnd} warns about ineffective use of the \c{BND} prefix when a relaxed
- form of jmp instruction becomes jmp short form.
- Enabled by default.
- \b \i\c{zext-reloc} warns that a relocation has been zero-extended due
- to limitations in the output format. Enabled by default.
- \b \i\c{ptr} warns about keywords used in other assemblers that might
- indicate a mistake in the source code. Currently only the MASM
- \c{PTR} keyword is recognized. Enabled by default.
- \b \i\c{bad-pragma} warns about a malformed or otherwise unparsable
- \c{%pragma} directive. Disabled by default.
- \b \i\c{unknown-pragma} warns about an unknown \c{%pragma} directive.
- This is not yet implemented. Disabled by default.
- \b \i\c{not-my-pragma} warns about a \c{%pragma} directive which is
- not applicable to this particular assembly session. This is not yet
- implemented. Disabled by default.
- \b \i\c{unknown-warning} warns about a \c{-w} or \c{-W} option or a
- \c{[WARNING]} directive that contains an unknown warning name or is
- otherwise not possible to process. Disabled by default.
- \b \i\c{all} is an alias for \e{all} suppressible warning classes.
- Thus, \c{-w+all} enables all available warnings, and \c{-w-all}
- disables warnings entirely (since NASM 2.13).
- Since version 2.00, NASM has also supported the \c{gcc}-like syntax
- \c{-Wwarning-class} and \c{-Wno-warning-class} instead of
- \c{-w+warning-class} and \c{-w-warning-class}, respectively; both
- syntaxes work identically.
- The option \c{-w+error} or \i\c{-Werror} can be used to treat warnings
- as errors. This can be controlled on a per warning class basis
- (\c{-w+error=}\e{warning-class} or \c{-Werror=}\e{warning-class});
- if no \e{warning-class} is specified NASM treats it as
- \c{-w+error=all}; the same applies to \c{-w-error} or
- \i\c{-Wno-error},
- of course.
- In addition, you can control warnings in the source code itself, using
- the \i\c{[WARNING]} directive. See \k{asmdir-warning}.
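- A minimal sketch of in-source warning control follows. It assumes the
- \c{+}/\c{-} form of the \c{[WARNING]} directive described in
- \k{asmdir-warning}; the label names \c{quiet} and \c{noisy} are
- purely illustrative.
- \c [warning -orphan-labels] ; like -w-orphan-labels from here on
- \c quiet ; label without a colon: not reported
- \c [warning +orphan-labels] ; like -w+orphan-labels from here on
- \c noisy ; label without a colon: reported again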
- \S{opt-v} The \i\c{-v} Option: Display \i{Version} Info
- Typing \c{nasm -v} will display the version of NASM which you are using,
- and the date on which it was compiled.
- You will need the version number if you report a bug.
- For command-line compatibility with Yasm, the form \i\c{--v} is also
- accepted for this option starting in NASM version 2.11.05.
- \S{opt-y} The \i\c{-y} Option: Display Available Debug Info Formats
- Typing \c{nasm -f <option> -y} will display a list of the available
- debug info formats for the given output format. The default format
- is indicated by an asterisk. For example:
- \c nasm -f elf -y
- \c valid debug formats for 'elf32' output format are
- \c ('*' denotes default):
- \c * stabs ELF32 (i386) stabs debug format for Linux
- \c dwarf elf32 (i386) dwarf debug format for Linux
- \S{opt-pfix} The \i\c{--(g|l)prefix}, \i\c{--(g|l)postfix} Options
- The \c{--(g)prefix} option prepends the given argument
- to all \c{extern}, \c{common}, \c{static}, and \c{global} symbols, and the
- \c{--lprefix} option prepends it to all other symbols. Similarly, the
- \c{--(g)postfix} and \c{--lpostfix} options append
- the argument in exactly the same way as the \c{--xxprefix} options prepend it.
- Running this:
- \c nasm -f macho --gprefix _
- is equivalent to placing the directive \c{%pragma macho gprefix _}
- at the start of the file (see \k{mangling}). It prepends an underscore
- to all global and external variables, as C requires in some, but not all,
- system calling conventions.
- \S{opt-pragma} The \i\c{--pragma} Option
- The argument to this option is treated as a \c{%pragma} directive, as if
- a \c{%pragma} preprocessor statement had been placed at the beginning of the source.
- Running this:
- \c nasm -f macho --pragma "macho gprefix _"
- is equivalent to the example in \k{opt-pfix}.
- \S{opt-before} The \i\c{--before} Option
- This option accepts a preprocessor statement as its argument. The example
- shown in \k{opt-pragma} is the same as running this:
- \c nasm -f macho --before "%pragma macho gprefix _"
- \S{opt-limit} The \i\c{--limit-X} Option
- This option allows the user to set various maximum values:
- \b\c{--limit-passes}: Maximum number of passes allowed. Default is
- effectively unlimited.
- \b\c{--limit-stalled-passes}: Maximum number of unfinished
- passes allowed. Default is 1000.
- \b\c{--limit-macro-levels}: Maximum depth of macro expansion
- (in the preprocessor). Default is 1000000.
- \b\c{--limit-rep}: Maximum number of repetitions allowed in a
- preprocessor loop defined with \c{%rep}. Default is 1000000.
- \b\c{--limit-eval}: Maximum allowed
- expression length. Default is 1000000.
- \b\c{--limit-lines}: Maximum total number of source lines that may be
- processed. Default is 2000000000.
- For example, running this limits the maximum line count to 1000:
- \c nasm --limit-lines 1000
- \S{opt-keep-all} The \i\c{--keep-all} Option
- This option prevents NASM from deleting any output files even if an
- error happens.
- \S{opt-no-line} The \i\c{--no-line} Option
- If this option is given, all \i\c{%line} directives in the source code
- are ignored. This can be useful for debugging already preprocessed
- code. See \k{line}.
- \S{nasmenv} The \i\c{NASMENV} \i{Environment} Variable
- If you define an environment variable called \c{NASMENV}, the program
- will interpret it as a list of extra command-line options, which are
- processed before the real command line. You can use this to define
- standard search directories for include files, by putting \c{-i}
- options in the \c{NASMENV} variable.
- The value of the variable is split up at white space, so that the
- value \c{-s -ic:\\nasmlib\\} will be treated as two separate options.
- However, that means that the value \c{-dNAME="my name"} won't do
- what you might want, because it will be split at the space and the
- NASM command-line processing will get confused by the two
- nonsensical words \c{-dNAME="my} and \c{name"}.
- To get round this, NASM provides a feature whereby, if you begin the
- \c{NASMENV} environment variable with some character that isn't a minus
- sign, then NASM will treat this character as the \i{separator
- character} for options. So setting the \c{NASMENV} variable to the
- value \c{!-s!-ic:\\nasmlib\\} is equivalent to setting it to \c{-s
- -ic:\\nasmlib\\}, but \c{!-dNAME="my name"} will work.
- This environment variable was previously called \c{NASM}. This was
- changed with version 0.98.31.
- \H{qstart} \i{Quick Start} for \i{MASM} Users
- If you're used to writing programs with MASM, or with \i{TASM} in
- MASM-compatible (non-Ideal) mode, or with \i\c{a86}, this section
- attempts to outline the major differences between MASM's syntax and
- NASM's. If you're not already used to MASM, it's probably worth
- skipping this section.
- \S{qscs} NASM Is \I{case sensitivity}Case-Sensitive
- One simple difference is that NASM is case-sensitive. It makes a
- difference whether you call your label \c{foo}, \c{Foo} or \c{FOO}.
- If you're assembling to \c{DOS} or \c{OS/2} \c{.OBJ} files, you can
- invoke the \i\c{UPPERCASE} directive (documented in \k{objfmt}) to
- ensure that all symbols exported to other code modules are forced
- to be upper case; but even then, \e{within} a single module, NASM
- will distinguish between labels differing only in case.
- \S{qsbrackets} NASM Requires \i{Square Brackets} For \i{Memory References}
- NASM was designed with simplicity of syntax in mind. One of the
- \i{design goals} of NASM is that it should be possible, as far as is
- practical, for the user to look at a single line of NASM code
- and tell what opcode is generated by it. You can't do this in MASM:
- if you declare, for example,
- \c foo equ 1
- \c bar dw 2
- then the two lines of code
- \c mov ax,foo
- \c mov ax,bar
- generate completely different opcodes, despite having
- identical-looking syntaxes.
- NASM avoids this undesirable situation by having a much simpler
- syntax for memory references. The rule is simply that any access to
- the \e{contents} of a memory location requires square brackets
- around the address, and any access to the \e{address} of a variable
- doesn't. So an instruction of the form \c{mov ax,foo} will
- \e{always} refer to a compile-time constant, whether it's an \c{EQU}
- or the address of a variable; and to access the \e{contents} of the
- variable \c{bar}, you must code \c{mov ax,[bar]}.
- This also means that NASM has no need for MASM's \i\c{OFFSET}
- keyword, since the MASM code \c{mov ax,offset bar} means exactly the
- same thing as NASM's \c{mov ax,bar}. If you're trying to get
- large amounts of MASM code to assemble sensibly under NASM, you
- can always code \c{%idefine offset} to make the preprocessor treat
- the \c{OFFSET} keyword as a no-op.
- This issue is even more confusing in \i\c{a86}, where declaring a
- label with a trailing colon defines it to be a `label' as opposed to
- a `variable' and causes \c{a86} to adopt NASM-style semantics; so in
- \c{a86}, \c{mov ax,var} has different behaviour depending on whether
- \c{var} was declared as \c{var: dw 0} (a label) or \c{var dw 0} (a
- word-size variable). NASM is very simple by comparison:
- \e{everything} is a label.
- NASM, in the interests of simplicity, also does not support the
- \i{hybrid syntaxes} supported by MASM and its clones, such as
- \c{mov ax,table[bx]}, where a memory reference is denoted by one
- portion outside square brackets and another portion inside. The
- correct syntax for the above is \c{mov ax,[table+bx]}. Likewise,
- \c{mov ax,es:[di]} is wrong and \c{mov ax,[es:di]} is right.
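- The rules in this section can be summarised in a short 16-bit
- fragment. The label \c{table} is a hypothetical name added for the
- illustration; the MASM forms in the comments are the equivalents
- mentioned above.
- \c foo equ 1
- \c bar dw 2
- \c table dw 10,20,30
- \c mov ax,foo ; the constant 1
- \c mov ax,bar ; the address of bar (MASM: mov ax,offset bar)
- \c mov ax,[bar] ; the contents of bar, i.e. 2
- \c mov ax,[table+bx] ; MASM: mov ax,table[bx]
- \c mov ax,[es:di] ; MASM: mov ax,es:[di]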
- \S{qstypes} NASM Doesn't Store \i{Variable Types}
- NASM, by design, chooses not to remember the types of variables you
- declare. Whereas MASM will remember, on seeing \c{var dw 0}, that
- you declared \c{var} as a word-size variable, and will then be able
- to fill in the \i{ambiguity} in the size of the instruction \c{mov
- var,2}, NASM will deliberately remember nothing about the symbol
- \c{var} except where it begins, and so you must explicitly code
- \c{mov word [var],2}.
- For this reason, NASM doesn't support the \c{LODS}, \c{MOVS},
- \c{STOS}, \c{SCAS}, \c{CMPS}, \c{INS}, or \c{OUTS} instructions,
- but only supports the forms such as \c{LODSB}, \c{MOVSW}, and
- \c{SCASD}, which explicitly specify the size of the components of
- the strings being manipulated.
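- As a brief sketch of what this means in practice (the label \c{var}
- is the one used in the text above):
- \c var dw 0
- \c mov word [var],2 ; size given explicitly
- \c mov [var],ax ; size implied by the register operand
- \c ; mov [var],2 ; rejected: operation size not specified
- \c lodsb ; byte form of LODS
- \c movsw ; word form of MOVS
- \c scasd ; doubleword form of SCAS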
- \S{qsassume} NASM Doesn't \i\c{ASSUME}
- As part of NASM's drive for simplicity, it also does not support the
- \c{ASSUME} directive. NASM will not keep track of what values you
- choose to put in your segment registers, and will never
- \e{automatically} generate a \i{segment override} prefix.
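- In other words, a segment override appears in the output only when it
- is written in the source, as in this small sketch:
- \c mov ax,[bx] ; no override prefix: default segment is used
- \c mov ax,[es:bx] ; ES override, because it was requested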
- \S{qsmodel} NASM Doesn't Support \i{Memory Models}
- NASM also does not have any directives to support different 16-bit
- memory models. The programmer has to keep track of which functions
- are supposed to be called with a \i{far call} and which with a
- \i{near call}, and is responsible for putting the correct form of
- \c{RET} instruction (\c{RETN} or \c{RETF}; NASM accepts \c{RET}
- itself as an alternate form for \c{RETN}); in addition, the
- programmer is responsible for coding \c{CALL FAR} instructions where
- necessary when calling \e{external} functions, and must also keep
- track of which external variable definitions are far and which are
- near.
- \S{qsfpu} \i{Floating-Point} Differences
- NASM uses different names to refer to floating-point registers from
- MASM: where MASM would call them \c{ST(0)}, \c{ST(1)} and so on, and
- \i\c{a86} would call them simply \c{0}, \c{1} and so on, NASM
- chooses to call them \c{st0}, \c{st1} etc.
- As of version 0.96, NASM now treats the instructions with
- \i{`nowait'} forms in the same way as MASM-compatible assemblers.
- The idiosyncratic treatment employed by 0.95 and earlier was based
- on a misunderstanding by the authors.
- \S{qsother} Other Differences
- For historical reasons, NASM uses the keyword \i\c{TWORD} where MASM
- and compatible assemblers use \i\c{TBYTE}.
- NASM does not declare \i{uninitialized storage} in the same way as
- MASM: where a MASM programmer might use \c{stack db 64 dup (?)},
- NASM requires \c{stack resb 64}, intended to be read as `reserve 64
- bytes'. For a limited amount of compatibility, since NASM treats
- \c{?} as a valid character in symbol names, you can code \c{? equ 0}
- and then writing \c{dw ?} will at least do something vaguely useful.
- \I\c{RESB}\i\c{DUP} is still not a supported syntax, however.
- In addition to all of this, macros and directives work completely
- differently to MASM. See \k{preproc} and \k{directive} for further
- details.
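- The differences in this section might be sketched as follows. The
- labels \c{zeros} and \c{extval} are hypothetical names chosen for the
- example, and the MASM forms in the comments are approximate
- equivalents.
- \c stack: resb 64 ; MASM: stack db 64 dup (?)
- \c zeros: times 64 db 0 ; MASM: zeros db 64 dup (0)
- \c extval: dt 1.0 ; ten-byte extended-precision value
- \c fld tword [extval] ; MASM: fld tbyte ptr extval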
- \C{lang} The NASM Language
- \H{syntax} Layout of a NASM Source Line
- As in most assemblers, each NASM source line contains (unless it
- is a macro, a preprocessor directive or an assembler directive: see
- \k{preproc} and \k{directive}) some combination of the four fields
- \c label: instruction operands ; comment
- As usual, most of these fields are optional; the presence or absence
- of any combination of a label, an instruction and a comment is allowed.
- Of course, the operand field is either required or forbidden by the
- presence and nature of the instruction field.
- NASM uses backslash (\\) as the line continuation character; if a line
- ends with backslash, the next line is considered to be a part of the
- backslash-ended line.
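- For instance, a long data declaration can be split over two lines.
- This is a minimal sketch; the label \c{msg} is just an example name.
- \c msg: db 'hello, ', \
- \c 'world', 13, 10
- This is assembled as if it had been written on one line as
- \c{db 'hello, ', 'world', 13, 10}.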
- NASM places no restrictions on white space within a line: labels may
- have white space before them, or instructions may have no space
- before them, or anything. The \i{colon} after a label is also
- optional. (Note that this means that if you intend to code \c{lodsb}
- alone on a line, and type \c{lodab} by accident, then that's still a
- valid source line which does nothing but define a label. Running
- NASM with the command-line option
- \I{orphan-labels}\c{-w+orphan-labels} will cause it to warn you if
- you define a label alone on a line without a \i{trailing colon}.)
- \i{Valid characters} in labels are letters, numbers, \c{_}, \c{$},
- \c{#}, \c{@}, \c{~}, \c{.}, and \c{?}. The only characters which may
- be used as the \e{first} character of an identifier are letters,
- \c{.} (with special meaning: see \k{locallab}), \c{_} and \c{?}.
- An identifier may also be prefixed with a \I{$, prefix}\c{$} to
- indicate that it is intended to be read as an identifier and not a
- reserved word; thus, if some other module you are linking with
- defines a symbol called \c{eax}, you can refer to \c{$eax} in NASM
- code to distinguish the symbol from the register. Maximum length of
- an identifier is 4095 characters.
- The instruction field may contain any machine instruction: Pentium
- and P6 instructions, FPU instructions, MMX instructions and even
- undocumented instructions are all supported. The instruction may be
- prefixed by \c{LOCK}, \c{REP}, \c{REPE}/\c{REPZ}, \c{REPNE}/\c{REPNZ},
- \c{XACQUIRE}/\c{XRELEASE} or \c{BND}/\c{NOBND}, in the usual way. Explicit
- \I{address-size prefixes}address-size and \i{operand-size prefixes} \i\c{A16},
- \i\c{A32}, \i\c{A64}, \i\c{O16}, \i\c{O32} and \i\c{O64} are provided; one example of their use
- is given in \k{mixsize}. You can also use the name of a \I{segment
- override}segment register as an instruction prefix: coding
- \c{es mov [bx],ax} is equivalent to coding \c{mov [es:bx],ax}. We
- recommend the latter syntax, since it is consistent with other
- syntactic features of the language, but for instructions such as
- \c{LODSB}, which has no operands and yet can require a segment
- override, there is no clean syntactic way to proceed apart from
- \c{es lodsb}.
- A prefix does not have to be followed by an instruction: prefixes such as
- \c{CS}, \c{A32}, \c{LOCK} or \c{REPE} can appear on a line by
- themselves, and NASM will just generate the prefix bytes.
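- A short sketch of the prefix forms described above, in 16-bit code
- (the label \c{counter} is a hypothetical variable):
- \c rep movsb ; prefix and instruction on one line
- \c es lodsb ; segment register used as a prefix
- \c lock ; prefix alone: only the prefix byte is generated
- \c inc word [counter] ; the next instruction follows it in the output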
- In addition to actual machine instructions, NASM also supports a
- number of pseudo-instructions, described in \k{pseudop}.
- Instruction \i{operands} may take a number of forms: they can be
- registers, described simply by the register name (e.g. \c{ax},
- \c{bp}, \c{ebx}, \c{cr0}: NASM does not use the \c{gas}-style
- syntax in which register names must be prefixed by a \c{%} sign), or
- they can be \i{effective addresses} (see \k{effaddr}), constants
- (\k{const}) or expressions (\k{expr}).
- For x87 \i{floating-point} instructions, NASM accepts a wide range of
- syntaxes: you can use two-operand forms like MASM supports, or you
- can use NASM's native single-operand forms in most cases.
- \# Details of
- \# all forms of each supported instruction are given in
- \# \k{iref}.
- For example, you can code:
- \c fadd st1 ; this sets st0 := st0 + st1
- \c fadd st0,st1 ; so does this
- \c
- \c fadd st1,st0 ; this sets st1 := st1 + st0
- \c fadd to st1 ; so does this
- Almost any x87 floating-point instruction that references memory must
- use one of the prefixes \i\c{DWORD}, \i\c{QWORD} or \i\c{TWORD} to
- indicate what size of \i{memory operand} it refers to.
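- For example, assuming three hypothetical variables \c{f32var},
- \c{f64var} and \c{f80var} of the corresponding sizes:
- \c fld dword [f32var] ; load a single-precision value
- \c fadd qword [f64var] ; add a double-precision value
- \c fstp tword [f80var] ; store as extended precision and pop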
- \H{pseudop} \i{Pseudo-Instructions}
- Pseudo-instructions are things which, though not real x86 machine
- instructions, are used in the instruction field anyway because that's
- the most convenient place to put them. The current pseudo-instructions
- are \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ}, \i\c{DT}, \i\c{DO},
- \i\c{DY} and \i\c{DZ}; their \i{uninitialized} counterparts
- \i\c{RESB}, \i\c{RESW}, \i\c{RESD}, \i\c{RESQ}, \i\c{REST},
- \i\c{RESO}, \i\c{RESY} and \i\c{RESZ}; the \i\c{INCBIN} command, the
- \i\c{EQU} command, and the \i\c{TIMES} prefix.
- \S{db} \c{DB} and Friends: Declaring Initialized Data
- \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ}, \i\c{DT}, \i\c{DO}, \i\c{DY}
- and \i\c{DZ} are used, much as in MASM, to declare initialized data in
- the output file. They can be invoked in a wide range of ways:
- \I{floating-point}\I{character constant}\I{string constant}
- \c db 0x55 ; just the byte 0x55
- \c db 0x55,0x56,0x57 ; three bytes in succession
- \c db 'a',0x55 ; character constants are OK
- \c db 'hello',13,10,'$' ; so are string constants
- \c dw 0x1234 ; 0x34 0x12
- \c dw 'a' ; 0x61 0x00 (it's just a number)
- \c dw 'ab' ; 0x61 0x62 (character constant)
- \c dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
- \c dd 0x12345678 ; 0x78 0x56 0x34 0x12
- \c dd 1.234567e20 ; floating-point constant
- \c dq 0x123456789abcdef0 ; eight byte constant
- \c dq 1.234567e20 ; double-precision float
- \c dt 1.234567e20 ; extended-precision float
- \c{DT}, \c{DO}, \c{DY} and \c{DZ} do not accept \i{numeric constants}
- as operands.
- \S{resb} \c{RESB} and Friends: Declaring \i{Uninitialized} Data
- \i\c{RESB}, \i\c{RESW}, \i\c{RESD}, \i\c{RESQ}, \i\c{REST},
- \i\c{RESO}, \i\c{RESY} and \i\c{RESZ} are designed to be used in the
- BSS section of a module: they declare \e{uninitialized} storage
- space. Each takes a single operand, which is the number of bytes,
- words, doublewords or whatever to reserve. As stated in \k{qsother},
- NASM does not support the MASM/TASM syntax of reserving uninitialized
- space by writing \I\c{?}\c{DW ?} or similar things: this is what it
- does instead. The operand to a \c{RESB}-type pseudo-instruction is a
- \i\e{critical expression}: see \k{crit}.
- For example:
- \c buffer: resb 64 ; reserve 64 bytes
- \c wordvar: resw 1 ; reserve a word
- \c realarray resq 10 ; array of ten reals
- \c ymmval: resy 1 ; one YMM register
- \c zmmvals: resz 32 ; 32 ZMM registers
- \S{incbin} \i\c{INCBIN}: Including External \i{Binary Files}
- \c{INCBIN} is borrowed from the old Amiga assembler \i{DevPac}: it
- includes a binary file verbatim into the output file. This can be
- handy for (for example) including \i{graphics} and \i{sound} data
- directly into a game executable file. It can be called in one of
- these three ways:
- \c incbin "file.dat" ; include the whole file
- \c incbin "file.dat",1024 ; skip the first 1024 bytes
- \c incbin "file.dat",1024,512 ; skip the first 1024, and
- \c ; actually include at most 512
- \c{INCBIN} is both a directive and a standard macro; the standard
- macro version searches for the file in the include file search path
- and adds the file to the dependency lists. This macro can be
- overridden if desired.
- \S{equ} \i\c{EQU}: Defining Constants
- \c{EQU} defines a symbol to a given constant value: when \c{EQU} is
- used, the source line must contain a label. The action of \c{EQU} is
- to define the given label name to the value of its (only) operand.
- This definition is absolute, and cannot change later. So, for
- example,
- \c message db 'hello, world'
- \c msglen equ $-message
- defines \c{msglen} to be the constant 12. \c{msglen} may not then be
- redefined later. This is not a \i{preprocessor} definition either:
- the value of \c{msglen} is evaluated \e{once}, using the value of
- \c{$} (see \k{expr} for an explanation of \c{$}) at the point of
- definition, rather than being evaluated wherever it is referenced
- and using the value of \c{$} at the point of reference.
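- The difference from a preprocessor definition can be sketched by
- placing the two side by side. The macro name \c{curlen} is
- hypothetical, and the fragment assumes that \c{message} starts the
- data being assembled.
- \c message db 'hello, world'
- \c msglen equ $-message ; evaluated once, here: 12
- \c %define curlen ($-message) ; text substituted at each point of use
- \c dw msglen ; stores 12
- \c dw curlen ; stores 14, since $ has moved past the previous word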
- \S{times} \i\c{TIMES}: \i{Repeating} Instructions or Data
- The \c{TIMES} prefix causes the instruction to be assembled multiple
- times. This is partly present as NASM's equivalent of the \i\c{DUP}
- syntax supported by \i{MASM}-compatible assemblers, in that you can
- code
- \c zerobuf: times 64 db 0
- or similar things; but \c{TIMES} is more versatile than that. The
- argument to \c{TIMES} is not just a numeric constant, but a numeric
- \e{expression}, so you can do things like
- \c buffer: db 'hello, world'
- \c times 64-$+buffer db ' '
- which will store exactly enough spaces to make the total length of
- \c{buffer} up to 64. Finally, \c{TIMES} can be applied to ordinary
- instructions, so you can code trivial \i{unrolled loops} in it:
- \c times 100 movsb
- Note that there is no effective difference between \c{times 100 resb
- 1} and \c{resb 100}, except that the latter will be assembled about
- 100 times faster due to the internal structure of the assembler.
- The operand to \c{TIMES} is a critical expression (\k{crit}).
- Note also that \c{TIMES} can't be applied to \i{macros}: the reason
- for this is that \c{TIMES} is processed after the macro phase, which
- allows the argument to \c{TIMES} to contain expressions such as
- \c{64-$+buffer} as above. To repeat more than one line of code, or a
- complex macro, use the preprocessor \i\c{%rep} directive.
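- As a minimal sketch of the distinction, a single instruction can be
- repeated with \c{TIMES}, whereas a group of lines needs \c{%rep}:
- \c times 4 movsb ; one instruction, repeated four times
- \c %rep 4 ; several lines, repeated four times
- \c lodsb
- \c stosw
- \c %endrep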
- \H{effaddr} Effective Addresses
- An \i{effective address} is any operand to an instruction which
- \I{memory reference}references memory. Effective addresses, in NASM,
- have a very simple syntax: they consist of an expression evaluating
- to the desired address, enclosed in \i{square brackets}. For
- example:
- \c wordvar dw 123
- \c mov ax,[wordvar]
- \c mov ax,[wordvar+1]
- \c mov ax,[es:wordvar+bx]
- Anything not conforming to this simple system is not a valid memory
- reference in NASM, for example \c{es:wordvar[bx]}.
- More complicated effective addresses, such as those involving more
- than one register, work in exactly the same way:
- \c mov eax,[ebx*2+ecx+offset]
- \c mov ax,[bp+di+8]
- NASM is capable of doing \i{algebra} on these effective addresses,
- so that things which don't necessarily \e{look} legal are perfectly
- all right:
- \c mov eax,[ebx*5] ; assembles as [ebx*4+ebx]
- \c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
- Some forms of effective address have more than one assembled form;
- in most such cases NASM will generate the smallest form it can. For
- example, there are distinct assembled forms for the 32-bit effective
- addresses \c{[eax*2+0]} and \c{[eax+eax]}, and NASM will generally
- generate the latter on the grounds that the former requires four
- bytes to store a zero offset.
- NASM has a hinting mechanism which will cause \c{[eax+ebx]} and
- \c{[ebx+eax]} to generate different opcodes; this is occasionally
- useful because \c{[esi+ebp]} and \c{[ebp+esi]} have different
- default segment registers.
- However, you can force NASM to generate an effective address in a
- particular form by the use of the keywords \c{BYTE}, \c{WORD},
- \c{DWORD} and \c{NOSPLIT}. If you need \c{[eax+3]} to be assembled
- using a double-word offset field instead of the one byte NASM will
- normally generate, you can code \c{[dword eax+3]}. Similarly, you
- can force NASM to use a byte offset for a small value which it
- hasn't seen on the first pass (see \k{crit} for an example of such a
- code fragment) by using \c{[byte eax+offset]}. As special cases,
- \c{[byte eax]} will code \c{[eax+0]} with a byte offset of zero, and
- \c{[dword eax]} will code it with a double-word offset of zero. The
- normal form, \c{[eax]}, will be coded with no offset field.
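- The forms discussed in this paragraph, gathered into one 32-bit
- sketch (the destination register \c{ecx} is an arbitrary choice):
- \c bits 32
- \c mov ecx,[eax+3] ; byte offset, chosen by default
- \c mov ecx,[dword eax+3] ; double-word offset forced
- \c mov ecx,[eax] ; no offset field
- \c mov ecx,[byte eax] ; [eax+0] with a byte offset of zero
- \c mov ecx,[dword eax] ; [eax+0] with a double-word offset of zero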
- The form described in the previous paragraph is also useful if you
- are trying to access data in a 32-bit segment from within 16 bit code.
- For more information on this see the section on mixed-size addressing
- (\k{mixaddr}). In particular, if you need to access data at a known
- offset that does not fit in a 16-bit value, you must
- specify that it is a dword offset; otherwise NASM will drop the high word of
- the offset.
- Similarly, NASM will split \c{[eax*2]} into \c{[eax+eax]} because
- that allows the offset field to be absent and space to be saved; in
- fact, it will also split \c{[eax*2+offset]} into
- \c{[eax+eax+offset]}. You can combat this behaviour by the use of
- the \c{NOSPLIT} keyword: \c{[nosplit eax*2]} will force
- \c{[eax*2+0]} to be generated literally. \c{[nosplit eax*1]} also has the
- same effect. Alternatively, the split EA form \c{[0, eax*2]} can be used.
- However, \c{NOSPLIT} in \c{[nosplit eax+eax]} will be ignored, because the user's
- intention there is taken to be \c{[eax+eax]}.
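- The splitting behaviour described above can be summarised in a short
- 32-bit sketch (the destination register \c{ecx} is an arbitrary
- choice):
- \c bits 32
- \c mov ecx,[eax*2] ; split by default: assembled as [eax+eax]
- \c mov ecx,[nosplit eax*2] ; kept literally as [eax*2+0]
- \c mov ecx,[0,eax*2] ; split EA form, also keeps eax as a scaled index
- \c mov ecx,[nosplit eax+eax] ; NOSPLIT ignored: stays [eax+eax]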
- In 64-bit mode, NASM will by default generate absolute addresses. The
- \i\c{REL} keyword makes it produce \c{RIP}-relative addresses. Since
- this is frequently the desired behaviour, it can be made the default
- with the \c{DEFAULT} directive (\k{default}). The keyword \i\c{ABS} overrides \i\c{REL}.
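- A minimal sketch of these keywords in 64-bit code follows; the label
- \c{var} is a hypothetical variable.
- \c bits 64
- \c mov rax,[var] ; absolute address (the initial default)
- \c mov rax,[rel var] ; RIP-relative, requested explicitly
- \c default rel
- \c mov rax,[var] ; now RIP-relative by default
- \c mov rax,[abs var] ; absolute again, overriding the default
- \c var: dq 0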
- A new form of split effective address syntax is also supported. This is
- mainly intended for mib operands as used by MPX instructions, but can
- be used for any memory reference. The basic concept of this form is
- splitting the base from the index.
- \c mov eax,[ebx+8,ecx*4] ; ebx=base, ecx=index, 4=scale, 8=disp
- For mib operands, there are several ways of writing the effective address, depending
- on the tool. NASM supports all currently possible forms of mib syntax:
- \c ; bndstx
- \c ; next 5 lines are parsed same
- \c ; base=rax, index=rbx, scale=1, displacement=3
- \c bndstx [rax+0x3,rbx], bnd0 ; NASM - split EA
- \c bndstx [rbx*1+rax+0x3], bnd0 ; GAS - '*1' indicates an index reg
- \c bndstx [rax+rbx+3], bnd0 ; GAS - without hints
- \c bndstx [rax+0x3], bnd0, rbx ; ICC-1
- \c bndstx [rax+0x3], rbx, bnd0 ; ICC-2
- When a broadcasting decorator is used, the opsize keyword should match
- the size of each element.
- \c VDIVPS zmm4, zmm5, dword [rbx]{1to16} ; single-precision float
- \c VDIVPS zmm4, zmm5, zword [rbx] ; packed 512 bit memory
- \H{const} \i{Constants}
- NASM understands four different types of constant: numeric,
- character, string and floating-point.
- \S{numconst} \i{Numeric Constants}
- A numeric constant is simply a number. NASM allows you to specify
- numbers in a variety of number bases, in a variety of ways: you can
- suffix \c{H} or \c{X}, \c{D} or \c{T}, \c{Q} or \c{O}, and \c{B} or
- \c{Y} for \i{hexadecimal}, \i{decimal}, \i{octal} and \i{binary}
- respectively, or you can prefix \c{0x}, for hexadecimal in the style
- of C, or you can prefix \c{$} for hexadecimal in the style of Borland
- Pascal or Motorola Assemblers. Note, though, that the \I{$,
- prefix}\c{$} prefix does double duty as a prefix on identifiers (see
- \k{syntax}), so a hex number prefixed with a \c{$} sign must have a
- digit after the \c{$} rather than a letter. In addition, current
- versions of NASM accept the prefix \c{0h} for hexadecimal, \c{0d} or
- \c{0t} for decimal, \c{0o} or \c{0q} for octal, and \c{0b} or \c{0y}
- for binary. Please note that unlike C, a \c{0} prefix by itself does
- \e{not} imply an octal constant!
- Numeric constants can have underscores (\c{_}) interspersed to break
- up long strings.
- Some examples (all producing exactly the same code):
- \c mov ax,200 ; decimal
- \c mov ax,0200 ; still decimal
- \c mov ax,0200d ; explicitly decimal
- \c mov ax,0d200 ; also decimal
- \c mov ax,0c8h ; hex
- \c mov ax,$0c8 ; hex again: the 0 is required
- \c mov ax,0xc8 ; hex yet again
- \c mov ax,0hc8 ; still hex
- \c mov ax,310q ; octal
- \c mov ax,310o ; octal again
- \c mov ax,0o310 ; octal yet again
- \c mov ax,0q310 ; octal yet again
- \c mov ax,11001000b ; binary
- \c mov ax,1100_1000b ; same binary constant
- \c mov ax,1100_1000y ; same binary constant once more
- \c mov ax,0b1100_1000 ; same binary constant yet again
- \c mov ax,0y1100_1000 ; same binary constant yet again
- \S{strings} \I{Strings}\i{Character Strings}
- A character string consists of up to eight characters enclosed in
- either single quotes (\c{'...'}), double quotes (\c{"..."}) or
- backquotes (\c{`...`}). Single or double quotes are equivalent to
- NASM (except of course that surrounding the constant with single
- quotes allows double quotes to appear within it and vice versa); the
- contents of those are represented verbatim. Strings enclosed in
- backquotes support C-style \c{\\}-escapes for special characters.
- The following \i{escape sequences} are recognized by backquoted strings:
- \c \' single quote (')
- \c \" double quote (")
- \c \` backquote (`)
- \c \\\ backslash (\)
- \c \? question mark (?)
- \c \a BEL (ASCII 7)
- \c \b BS (ASCII 8)
- \c \t TAB (ASCII 9)
- \c \n LF (ASCII 10)
- \c \v VT (ASCII 11)
- \c \f FF (ASCII 12)
- \c \r CR (ASCII 13)
- \c \e ESC (ASCII 27)
- \c \377 Up to 3 octal digits - literal byte
- \c \xFF Up to 2 hexadecimal digits - literal byte
- \c \u1234 4 hexadecimal digits - Unicode character
- \c \U12345678 8 hexadecimal digits - Unicode character
- All other escape sequences are reserved. Note that \c{\\0}, meaning a
- \c{NUL} character (ASCII 0), is a special case of the octal escape
- sequence.
- \i{Unicode} characters specified with \c{\\u} or \c{\\U} are converted to
- \i{UTF-8}. For example, the following lines are all equivalent:
- \c db `\u263a` ; UTF-8 smiley face
- \c db `\xe2\x98\xba` ; UTF-8 smiley face
- \c db 0E2h, 098h, 0BAh ; UTF-8 smiley face
- \S{chrconst} \i{Character Constants}
- A character constant consists of a string up to eight bytes long, used
- in an expression context. It is treated as if it were an integer.
- A character constant with more than one byte will be arranged
- with \i{little-endian} order in mind: if you code
- \c mov eax,'abcd'
- then the constant generated is not \c{0x61626364}, but
- \c{0x64636261}, so that if you were then to store the value into
- memory, it would read \c{abcd} rather than \c{dcba}. This is also
- the sense of character constants understood by the Pentium's
- \i\c{CPUID} instruction.
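- As a further illustration (a minimal sketch that simply applies the
- rule above to a two-byte constant), the following two lines should
- assemble to the same instruction:
- \c mov ax,'ab' ; 'a' = 0x61 ends up in the low byte
- \c mov ax,0x6261 ; the same value written as a hex constant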
- \S{strconst} \i{String Constants}
- String constants are character strings used in the context of some
- pseudo-instructions, namely the
- \I\c{DW}\I\c{DD}\I\c{DQ}\I\c{DT}\I\c{DO}\I\c{DY}\i\c{DB} family and
- \i\c{INCBIN} (where it represents a filename). They are also used in
- certain preprocessor directives.
- A string constant looks like a character constant, only longer. It
- is treated as a concatenation of maximum-size character constants
- for the conditions. So the following are equivalent:
- \c db 'hello' ; string constant
- \c db 'h','e','l','l','o' ; equivalent character constants
- And the following are also equivalent:
- \c dd 'ninechars' ; doubleword string constant
- \c dd 'nine','char','s' ; becomes three doublewords
- \c db 'ninechars',0,0,0 ; and really looks like this
- Note that when used in a string-supporting context, quoted strings are
- treated as string constants even if they are short enough to be a
- character constant, because otherwise \c{db 'ab'} would have the same
- effect as \c{db 'a'}, which would be silly. Similarly, three-character
- or four-character constants are treated as strings when they are
- operands to \c{DW}, and so forth.
- \S{unicode} \I{UTF-16}\I{UTF-32}\i{Unicode} Strings
- The special operators \i\c{__utf16__}, \i\c{__utf16le__},
- \i\c{__utf16be__}, \i\c{__utf32__}, \i\c{__utf32le__} and
- \i\c{__utf32be__} allow definition of Unicode strings. They take a
- string in UTF-8 format and convert it to UTF-16 or UTF-32,
- respectively. Unless the \c{be} forms are specified, the output is
- little-endian.
- For example:
- \c %define u(x) __utf16__(x)
- \c %define w(x) __utf32__(x)
- \c
- \c dw u('C:\WINDOWS'), 0 ; Pathname in UTF-16
- \c dd w(`A + B = \u206a`), 0 ; String in UTF-32
- The UTF operators can be applied either to strings passed to the
- \c{DB} family instructions, or to character constants in an expression
- context.
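- As a rough sketch of the resulting byte order (the label names are
- arbitrary, and the byte values in the comments follow from the UTF-16
- encoding of the two ASCII characters):
- \c le_str: dw __utf16__('Hi'), 0 ; bytes 48 00 69 00 00 00
- \c be_str: dw __utf16be__('Hi'), 0 ; bytes 00 48 00 69 00 00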
- \S{fltconst} \I{floating-point, constants}Floating-Point Constants
- \i{Floating-point} constants are acceptable only as arguments to
- \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ}, \i\c{DT}, and \i\c{DO}, or as
- arguments to the special operators \i\c{__float8__},
- \i\c{__float16__}, \i\c{__float32__}, \i\c{__float64__},
- \i\c{__float80m__}, \i\c{__float80e__}, \i\c{__float128l__}, and
- \i\c{__float128h__}.
- Floating-point constants are expressed in the traditional form:
- digits, then a period, then optionally more digits, then optionally an
- \c{E} followed by an exponent. The period is mandatory, so that NASM
- can distinguish between \c{dd 1}, which declares an integer constant,
- and \c{dd 1.0} which declares a floating-point constant.
- NASM also supports C99-style hexadecimal floating-point: \c{0x},
- hexadecimal digits, period, optionally more hexadecimal digits, then
- optionally a \c{P} followed by a \e{binary} (not hexadecimal) exponent
- in decimal notation. As an extension, NASM additionally supports the
- \c{0h} and \c{$} prefixes for hexadecimal, as well as binary and octal
- floating-point, using the \c{0b} or \c{0y} and \c{0o} or \c{0q}
- prefixes, respectively.
- Underscores to break up groups of digits are permitted in
- floating-point constants as well.
- Some examples:
- \c db -0.2 ; "Quarter precision"
- \c dw -0.5 ; IEEE 754r/SSE5 half precision
- \c dd 1.2 ; an easy one
- \c dd 1.222_222_222 ; underscores are permitted
- \c dd 0x1p+2 ; 1.0x2^2 = 4.0
- \c dq 0x1p+32 ; 1.0x2^32 = 4 294 967 296.0
- \c dq 1.e10 ; 10 000 000 000.0
- \c dq 1.e+10 ; synonymous with 1.e10
- \c dq 1.e-10 ; 0.000 000 000 1
- \c dt 3.141592653589793238462 ; pi
- \c do 1.e+4000 ; IEEE 754r quad precision
- The 8-bit "quarter-precision" floating-point format is
- sign:exponent:mantissa = 1:4:3 with an exponent bias of 7. This
- appears to be the most frequently used 8-bit floating-point format,
- although it is not covered by any formal standard. This is sometimes
- called a "\i{minifloat}."
- The special operators are used to produce floating-point numbers in
- other contexts. They produce the binary representation of a specific
- floating-point number as an integer, and can be used anywhere integer
- constants are used in an expression. \c{__float80m__} and
- \c{__float80e__} produce the 64-bit mantissa and 16-bit exponent of an
- 80-bit floating-point number, and \c{__float128l__} and
- \c{__float128h__} produce the lower and upper 64-bit halves of a 128-bit
- floating-point number, respectively.
- For example:
- \c mov rax,__float64__(3.141592653589793238462)
- ... would assign the binary representation of pi as a 64-bit floating
- point number into \c{RAX}. This is exactly equivalent to:
- \c mov rax,0x400921fb54442d18
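- The other operators work the same way. As a small sketch, using the
- well-known IEEE 754 single-precision encoding of 1.0:
- \c mov eax,__float32__(1.0) ; same as mov eax,0x3f800000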
- NASM cannot do compile-time arithmetic on floating-point constants.
- This is because NASM is designed to be portable - although it always
- generates code to run on x86 processors, the assembler itself can
- run on any system with an ANSI C compiler. Therefore, the assembler
- cannot guarantee the presence of a floating-point unit capable of
- handling the \i{Intel number formats}, and so for NASM to be able to
- do floating arithmetic it would have to include its own complete set
- of floating-point routines, which would significantly increase the
- size of the assembler for very little benefit.
- The special tokens \i\c{__Infinity__}, \i\c{__QNaN__} (or
- \i\c{__NaN__}) and \i\c{__SNaN__} can be used to generate
- \I{infinity}infinities, quiet \i{NaN}s, and signalling NaNs,
- respectively. These are normally used as macros:
- \c %define Inf __Infinity__
- \c %define NaN __QNaN__
- \c
- \c dq +1.5, -Inf, NaN ; Double-precision constants
- The \c{%use fp} standard macro package contains a set of convenience
- macros. See \k{pkg_fp}.
- \S{bcdconst} \I{floating-point, packed BCD constants}Packed BCD Constants
- x87-style packed BCD constants can be used in the same contexts as
- 80-bit floating-point numbers. They are suffixed with \c{p} or
- prefixed with \c{0p}, and can include up to 18 decimal digits.
- As with other numeric constants, underscores can be used to separate
- digits.
- For example:
- \c dt 12_345_678_901_245_678p
- \c dt -12_345_678_901_245_678p
- \c dt +0p33
- \c dt 33p
- \H{expr} \i{Expressions}
- Expressions in NASM are similar in syntax to those in C. Expressions
- are evaluated as 64-bit integers which are then adjusted to the
- appropriate size.
- NASM supports two special tokens in expressions, allowing
- calculations to involve the current assembly position: the
- \I{$, here}\c{$} and \i\c{$$} tokens. \c{$} evaluates to the assembly
- position at the beginning of the line containing the expression; so
- you can code an \i{infinite loop} using \c{JMP $}. \c{$$} evaluates
- to the beginning of the current section; so you can tell how far
- into the section you are by using \c{($-$$)}.
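- Two common idioms built on these tokens are sketched below. The label
- names \c{msg} and \c{msglen} are arbitrary, and the 512-byte size is
- only an illustrative assumption:
- \c msg: db 'hello, world'
- \c msglen equ $-msg ; length of the string, here 12
- \c
- \c times 512-($-$$) db 0 ; pad the section out to 512 bytes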
- The arithmetic \i{operators} provided by NASM are listed here, in
- increasing order of \i{precedence}.
- \S{expor} \i\c{|}: \i{Bitwise OR} Operator
- The \c{|} operator gives a bitwise OR, exactly as performed by the
- \c{OR} machine instruction. Bitwise OR is the lowest-priority
- arithmetic operator supported by NASM.
- \S{expxor} \i\c{^}: \i{Bitwise XOR} Operator
- \c{^} provides the bitwise XOR operation.
- \S{expand} \i\c{&}: \i{Bitwise AND} Operator
- \c{&} provides the bitwise AND operation.
- \S{expshift} \i\c{<<} and \i\c{>>}: \i{Bit Shift} Operators
- \c{<<} gives a bit-shift to the left, just as it does in C. So \c{5<<3}
- evaluates to 5 times 8, or 40. \c{>>} gives a bit-shift to the
- right; in NASM, such a shift is \e{always} unsigned, so that
- the bits shifted in from the left-hand end are filled with zero
- rather than a sign-extension of the previous highest bit.
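- A short sketch of the consequence, relying on the 64-bit evaluation
- described in \k{expr}:
- \c dq 5 << 3 ; 40
- \c dq -1 >> 60 ; 0xF: zeros are shifted in, so the result is not -1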
- \S{expplmi} \I{+ opaddition}\c{+} and \I{- opsubtraction}\c{-}:
- \i{Addition} and \i{Subtraction} Operators
- The \c{+} and \c{-} operators do perfectly ordinary addition and
- subtraction.
- \S{expmul} \i\c{*}, \i\c{/}, \i\c{//}, \i\c{%} and \i\c{%%}:
- \i{Multiplication} and \i{Division}
- \c{*} is the multiplication operator. \c{/} and \c{//} are both
- division operators: \c{/} is \i{unsigned division} and \c{//} is
- \i{signed division}. Similarly, \c{%} and \c{%%} provide \I{unsigned
- modulo}\I{modulo operators}unsigned and
- \i{signed modulo} operators respectively.
- NASM, like ANSI C, provides no guarantees about the sensible
- operation of the signed modulo operator.
- Since the \c{%} character is used extensively by the macro
- \i{preprocessor}, you should ensure that both the signed and unsigned
- modulo operators are followed by white space wherever they appear.
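- The difference between the signed and unsigned forms only shows up
- with negative operands. A minimal sketch, again assuming the 64-bit
- evaluation described in \k{expr}:
- \c dq -8 / 2 ; unsigned: 0x7FFFFFFFFFFFFFFC
- \c dq -8 // 2 ; signed: -4
- \c dq 10 % 3 ; 1 (note the white space after the operator)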
- \S{expmul} \i{Unary Operators}
- The highest-priority operators in NASM's expression grammar are those
- which only apply to one argument. These are \I{+ opunary}\c{+}, \I{-
- opunary}\c{-}, \i\c{~}, \I{! opunary}\c{!}, \i\c{SEG}, and the
- \i{integer functions} operators.
- \c{-} negates its operand, \c{+} does nothing (it's provided for
- symmetry with \c{-}), \c{~} computes the \i{one's complement} of its
- operand, \c{!} is the \i{logical negation} operator.
- \c{SEG} provides the \i{segment address}
- of its operand (explained in more detail in \k{segwrt}).
- A set of additional operators with leading and trailing double
- underscores are used to implement the integer functions of the
- \c{ifunc} macro package, see \k{pkg_ifunc}.
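- For instance (a sketch; the operand values are arbitrary):
- \c db ~0x0F & 0xFF ; 0xF0: complement first, then mask to one byte
- \c db !0 ; 1
- \c db !5 ; 0
- \c db -(-3) ; 3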
- \H{segwrt} \i\c{SEG} and \i\c{WRT}
- When writing large 16-bit programs, which must be split into
- multiple \i{segments}, it is often necessary to be able to refer to
- the \I{segment address}segment part of the address of a symbol. NASM
- supports the \c{SEG} operator to perform this function.
- The \c{SEG} operator returns the \i\e{preferred} segment base of a
- symbol, defined as the segment base relative to which the offset of
- the symbol makes sense. So the code
- \c mov ax,seg symbol
- \c mov es,ax
- \c mov bx,symbol
- will load \c{ES:BX} with a valid pointer to the symbol \c{symbol}.
- Things can be more complex than this: since 16-bit segments and
- \i{groups} may \I{overlapping segments}overlap, you might occasionally
- want to refer to some symbol using a different segment base from the
- preferred one. NASM lets you do this, by the use of the \c{WRT}
- (With Reference To) keyword. So you can do things like
- \c mov ax,weird_seg ; weird_seg is a segment base
- \c mov es,ax
- \c mov bx,symbol wrt weird_seg
- to load \c{ES:BX} with a different, but functionally equivalent,
- pointer to the symbol \c{symbol}.
- NASM supports far (inter-segment) calls and jumps by means of the
- syntax \c{call segment:offset}, where \c{segment} and \c{offset}
- both represent immediate values. So to call a far procedure, you
- could code either of
- \c call (seg procedure):procedure
- \c call weird_seg:(procedure wrt weird_seg)
- (The parentheses are included for clarity, to show the intended
- parsing of the above instructions. They are not necessary in
- practice.)
- NASM supports the syntax \I\c{CALL FAR}\c{call far procedure} as a
- synonym for the first of the above usages. \c{JMP} works identically
- to \c{CALL} in these examples.
- To declare a \i{far pointer} to a data item in a data segment, you
- must code
- \c dw symbol, seg symbol
- NASM supports no convenient synonym for this, though you can always
- invent one using the macro processor.
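- One possible shape for such a macro is sketched below, using a
- multi-line macro (see \k{mlmacro}). The name \c{farptr} is purely
- hypothetical and is not something NASM defines:
- \c %macro farptr 1
- \c dw %1, seg %1 ; offset first, then segment
- \c %endmacro
- \c
- \c farptr symbol ; expands to: dw symbol, seg symbol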
- \H{strict} \i\c{STRICT}: Inhibiting Optimization
- When assembling with the optimizer set to level 2 or higher (see
- \k{opt-O}), NASM will use size specifiers (\c{BYTE}, \c{WORD},
- \c{DWORD}, \c{QWORD}, \c{TWORD}, \c{OWORD}, \c{YWORD} or \c{ZWORD}),
- but will give them the smallest possible size. The keyword \c{STRICT}
- can be used to inhibit optimization and force a particular operand to
- be emitted in the specified size. For example, with the optimizer on,
- and in \c{BITS 16} mode,
- \c push dword 33
- is encoded in three bytes \c{66 6A 21}, whereas
- \c push strict dword 33
- is encoded in six bytes, with a full dword immediate operand \c{66 68
- 21 00 00 00}.
- With the optimizer off, the same code (six bytes) is generated whether
- the \c{STRICT} keyword was used or not.
- \H{crit} \i{Critical Expressions}
- Although NASM has an optional multi-pass optimizer, there are some
- expressions which must be resolvable on the first pass. These are
- called \e{Critical Expressions}.
- The first pass is used to determine the size of all the assembled
- code and data, so that the second pass, when generating all the
- code, knows all the symbol addresses the code refers to. So one
- thing NASM can't handle is code whose size depends on the value of a
- symbol declared after the code in question. For example,
- \c times (label-$) db 0
- \c label: db 'Where am I?'
- The argument to \i\c{TIMES} in this case could equally legally
- evaluate to anything at all; NASM will reject this example because
- it cannot tell the size of the \c{TIMES} line when it first sees it.
- It will just as firmly reject the slightly \I{paradox}paradoxical
- code
- \c times (label-$+1) db 0
- \c label: db 'NOW where am I?'
- in which \e{any} value for the \c{TIMES} argument is by definition
- wrong!
- NASM rejects these examples by means of a concept called a
- \e{critical expression}, which is defined to be an expression whose
- value is required to be computable in the first pass, and which must
- therefore depend only on symbols defined before it. The argument to
- the \c{TIMES} prefix is a critical expression.
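- By contrast, an expression that refers only to symbols already defined
- is acceptable. A minimal sketch (the label \c{buffer} and the 16-byte
- size are arbitrary):
- \c buffer: db 'data'
- \c times 16-($-buffer) db 0 ; 4 bytes so far, so 12 zero bytes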
- \H{locallab} \i{Local Labels}
- NASM gives special treatment to symbols beginning with a \i{period}.
- A label beginning with a single period is treated as a \e{local}
- label, which means that it is associated with the previous non-local
- label. So, for example:
- \c label1 ; some code
- \c
- \c .loop
- \c ; some more code
- \c
- \c jne .loop
- \c ret
- \c
- \c label2 ; some code
- \c
- \c .loop
- \c ; some more code
- \c
- \c jne .loop
- \c ret
- In the above code fragment, each \c{JNE} instruction jumps to the
- line immediately before it, because the two definitions of \c{.loop}
- are kept separate by virtue of each being associated with the
- previous non-local label.
- This form of local label handling is borrowed from the old Amiga
- assembler \i{DevPac}; however, NASM goes one step further, in
- allowing access to local labels from other parts of the code. This
- is achieved by means of \e{defining} a local label in terms of the
- previous non-local label: the first definition of \c{.loop} above is
- really defining a symbol called \c{label1.loop}, and the second
- defines a symbol called \c{label2.loop}. So, if you really needed
- to, you could write
- \c label3 ; some more code
- \c ; and some more
- \c
- \c jmp label1.loop
- Sometimes it is useful - in a macro, for instance - to be able to
- define a label which can be referenced from anywhere but which
- doesn't interfere with the normal local-label mechanism. Such a
- label can't be non-local because it would interfere with subsequent
- definitions of, and references to, local labels; and it can't be
- local because the macro that defined it wouldn't know the label's
- full name. NASM therefore introduces a third type of label, which is
- probably only useful in macro definitions: if a label begins with
- the \I{label prefix}special prefix \i\c{..@}, then it does nothing
- to the local label mechanism. So you could code
- \c label1: ; a non-local label
- \c .local: ; this is really label1.local
- \c ..@foo: ; this is a special symbol
- \c label2: ; another non-local label
- \c .local: ; this is really label2.local
- \c
- \c jmp ..@foo ; this will jump three lines up
- NASM has the capacity to define other special symbols beginning with
- a double period: for example, \c{..start} is used to specify the
- entry point in the \c{obj} output format (see \k{dotdotstart}), and
- \c{..imagebase} is used to find out the offset from a base address
- of the current image in the \c{win64} output format (see \k{win64pic}).
- So just keep in mind that symbols beginning with a double period are
- special.
- \C{preproc} The NASM \i{Preprocessor}
- NASM contains a powerful \i{macro processor}, which supports
- conditional assembly, multi-level file inclusion, two forms of macro
- (single-line and multi-line), and a `context stack' mechanism for
- extra macro power. Preprocessor directives all begin with a \c{%}
- sign.
- The preprocessor collapses all lines which end with a backslash (\\)
- character into a single line. Thus:
- \c %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \\
- \c THIS_VALUE
- will work like a single-line macro without the backslash-newline
- sequence.
- \H{slmacro} \i{Single-Line Macros}
- \S{define} The Normal Way: \I\c{%idefine}\i\c{%define}
- Single-line macros are defined using the \c{%define} preprocessor
- directive. The definitions work in a similar way to C; so you can do
- things like
- \c %define ctrl 0x1F &
- \c %define param(a,b) ((a)+(a)*(b))
- \c
- \c mov byte [param(2,ebx)], ctrl 'D'
- which will expand to
- \c mov byte [(2)+(2)*(ebx)], 0x1F & 'D'
- When the expansion of a single-line macro contains tokens which
- invoke another macro, the expansion is performed at invocation time,
- not at definition time. Thus the code
- \c %define a(x) 1+b(x)
- \c %define b(x) 2*x
- \c
- \c mov ax,a(8)
- will evaluate in the expected way to \c{mov ax,1+2*8}, even though
- the macro \c{b} wasn't defined at the time of definition of \c{a}.
- Macros defined with \c{%define} are \i{case sensitive}: after
- \c{%define foo bar}, only \c{foo} will expand to \c{bar}: \c{Foo} or
- \c{FOO} will not. By using \c{%idefine} instead of \c{%define} (the
- `i' stands for `insensitive') you can define all the case variants
- of a macro at once, so that \c{%idefine foo bar} would cause
- \c{foo}, \c{Foo}, \c{FOO}, \c{fOO} and so on all to expand to
- \c{bar}.
- There is a mechanism which detects when a macro call has occurred as
- a result of a previous expansion of the same macro, to guard against
- \i{circular references} and infinite loops. If this happens, the
- preprocessor will only expand the first occurrence of the macro.
- Hence, if you code
- \c %define a(x) 1+a(x)
- \c
- \c mov ax,a(3)
- the macro \c{a(3)} will expand once, becoming \c{1+a(3)}, and will
- then expand no further. This behaviour can be useful: see \k{32c}
- for an example of its use.
- You can \I{overloading, single-line macros}overload single-line
- macros: if you write
- \c %define foo(x) 1+x
- \c %define foo(x,y) 1+x*y
- the preprocessor will be able to handle both types of macro call,
- by counting the parameters you pass; so \c{foo(3)} will become
- \c{1+3} whereas \c{foo(ebx,2)} will become \c{1+ebx*2}. However, if
- you define
- \c %define foo bar
- then no other definition of \c{foo} will be accepted: a macro with
- no parameters prohibits the definition of the same name as a macro
- \e{with} parameters, and vice versa.
- This doesn't prevent single-line macros being \e{redefined}: you can
- perfectly well define a macro with
- \c %define foo bar
- and then re-define it later in the same source file with
- \c %define foo baz
- Then everywhere the macro \c{foo} is invoked, it will be expanded
- according to the most recent definition. This is particularly useful
- when defining single-line macros with \c{%assign} (see \k{assign}).
- You can \i{pre-define} single-line macros using the `-d' option on
- the NASM command line: see \k{opt-d}.
- \S{xdefine} Resolving \c{%define}: \I\c{%ixdefine}\i\c{%xdefine}
- To have a reference to an embedded single-line macro resolved at the
- time that the embedding macro is \e{defined}, as opposed to when the
- embedding macro is \e{expanded}, you need a different mechanism to the
- one offered by \c{%define}. The solution is to use \c{%xdefine}, or
- its \I{case sensitive}case-insensitive counterpart \c{%ixdefine}.
- Suppose you have the following code:
- \c %define isTrue 1
- \c %define isFalse isTrue
- \c %define isTrue 0
- \c
- \c val1: db isFalse
- \c
- \c %define isTrue 1
- \c
- \c val2: db isFalse
- In this case, \c{val1} is equal to 0, and \c{val2} is equal to 1.
- This is because, when a single-line macro is defined using
- \c{%define}, it is expanded only when it is called. As \c{isFalse}
- expands to \c{isTrue}, the expansion will be the current value of
- \c{isTrue}. The first time it is called that is 0, and the second
- time it is 1.
- If you wanted \c{isFalse} to expand to the value assigned to the
- embedded macro \c{isTrue} at the time that \c{isFalse} was defined,
- you need to change the above code to use \c{%xdefine}.
- \c %xdefine isTrue 1
- \c %xdefine isFalse isTrue
- \c %xdefine isTrue 0
- \c
- \c val1: db isFalse
- \c
- \c %xdefine isTrue 1
- \c
- \c val2: db isFalse
- Now, each time that \c{isFalse} is called, it expands to 1,
- as that is what the embedded macro \c{isTrue} expanded to at
- the time that \c{isFalse} was defined.
- \S{indmacro} \i{Macro Indirection}: \I\c{%[}\c{%[...]}
- The \c{%[...]} construct can be used to expand macros in contexts
- where macro expansion would otherwise not occur, including in the
- names of other macros. For example, if you have a set of macros named
- \c{Foo16}, \c{Foo32} and \c{Foo64}, you could write:
- \c mov ax,Foo%[__BITS__] ; The Foo value
- to use the builtin macro \c{__BITS__} (see \k{bitsm}) to automatically
- select between them. Similarly, the two statements:
- \c %xdefine Bar Quux ; Expands due to %xdefine
- \c %define Bar %[Quux] ; Expands due to %[...]
- have, in fact, exactly the same effect.
- \c{%[...]} concatenates to adjacent tokens in the same way that
- multi-line macro parameters do, see \k{concat} for details.
- \S{concat%+} Concatenating Single Line Macro Tokens: \i\c{%+}
- Individual tokens in single line macros can be concatenated, to produce
- longer tokens for later processing. This can be useful if there are
- several similar macros that perform similar functions.
- Please note that a space is required after \c{%+}, in order to
- disambiguate it from the syntax \c{%+1} used in multiline macros.
- As an example, consider the following:
- \c %define BDASTART 400h ; Start of BIOS data area
- \c struc tBIOSDA ; its structure
- \c .COM1addr RESW 1
- \c .COM2addr RESW 1
- \c ; ..and so on
- \c endstruc
- Now, if we need to access the elements of tBIOSDA in different places,
- we can end up with:
- \c mov ax,BDASTART + tBIOSDA.COM1addr
- \c mov bx,BDASTART + tBIOSDA.COM2addr
- This will become pretty ugly (and tedious) if used in many places, and
- can be reduced in size significantly by using the following macro:
- \c ; Macro to access BIOS variables by their names (from tBIOSDA):
- \c %define BDA(x) BDASTART + tBIOSDA. %+ x
- Now the above code can be written as:
- \c mov ax,BDA(COM1addr)
- \c mov bx,BDA(COM2addr)
- Using this feature, we can simplify references to a lot of macros (and,
- in turn, reduce typing errors).
- \S{selfref%?} The Macro Name Itself: \i\c{%?} and \i\c{%??}
- The special symbols \c{%?} and \c{%??} can be used to reference the
- macro name itself inside a macro expansion; this is supported for both
- single- and multi-line macros. \c{%?} refers to the macro name as
- \e{invoked}, whereas \c{%??} refers to the macro name as
- \e{declared}. The two are always the same for case-sensitive
- macros, but for case-insensitive macros, they can differ.
- For example:
- \c %idefine Foo mov %?,%??
- \c
- \c foo
- \c FOO
- will expand to:
- \c mov foo,Foo
- \c mov FOO,Foo
- The sequence:
- \c %idefine keyword $%?
- can be used to make a keyword "disappear", for example in case a new
- instruction has been used as a label in older code. For example:
- \c %idefine pause $%? ; Hide the PAUSE instruction
- \S{undef} Undefining Single-Line Macros: \i\c{%undef}
- Single-line macros can be removed with the \c{%undef} directive. For
- example, the following sequence:
- \c %define foo bar
- \c %undef foo
- \c
- \c mov eax, foo
- will expand to the instruction \c{mov eax, foo}, since after
- \c{%undef} the macro \c{foo} is no longer defined.
- Macros that would otherwise be pre-defined can be undefined using the
- `-u' option on the NASM command line: see \k{opt-u}.
- \S{assign} \i{Preprocessor Variables}: \i\c{%assign}
- An alternative way to define single-line macros is by means of the
- \c{%assign} command (and its \I{case sensitive}case-insensitive
- counterpart \i\c{%iassign}, which differs from \c{%assign} in
- exactly the same way that \c{%idefine} differs from \c{%define}).
- \c{%assign} is used to define single-line macros which take no
- parameters and have a numeric value. This value can be specified in
- the form of an expression, and it will be evaluated once, when the
- \c{%assign} directive is processed.
- Like \c{%define}, macros defined using \c{%assign} can be re-defined
- later, so you can do things like
- \c %assign i i+1
- to increment the numeric value of a macro.
- \c{%assign} is useful for controlling the termination of \c{%rep}
- preprocessor loops: see \k{rep} for an example of this. Another
- use for \c{%assign} is given in \k{16c} and \k{32c}.
- The expression passed to \c{%assign} is a \i{critical expression}
- (see \k{crit}), and must also evaluate to a pure number (rather than
- a relocatable reference such as a code or data address, or anything
- involving a register).
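- As a brief sketch of the \c{%rep} usage mentioned above (the counter
- name \c{i} is arbitrary):
- \c %assign i 0
- \c %rep 4
- \c db i ; emits 0, 1, 2, 3 in turn
- \c %assign i i+1
- \c %endrep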
- \S{defstr} Defining Strings: \I\c{%idefstr}\i\c{%defstr}
- \c{%defstr}, and its case-insensitive counterpart \c{%idefstr}, define
- or redefine a single-line macro without parameters, but convert the
- entire right-hand side, after macro expansion, to a quoted string
- before definition.
- For example:
- \c %defstr test TEST
- is equivalent to
- \c %define test 'TEST'
- This can be used, for example, with the \c{%!} construct (see
- \k{getenv}):
- \c %defstr PATH %!PATH ; The operating system PATH variable
- \S{deftok} Defining Tokens: \I\c{%ideftok}\i\c{%deftok}
- \c{%deftok}, and its case-insensitive counterpart \c{%ideftok}, define
- or redefine a single-line macro without parameters, but convert the
- second parameter, after string conversion, to a sequence of tokens.
- For example:
- \c %deftok test 'TEST'
- is equivalent to
- \c %define test TEST
- \H{strlen} \i{String Manipulation in Macros}
- It's often useful to be able to handle strings in macros. NASM
- supports a few simple string handling macro operators from which
- more complex operations can be constructed.
- All the string operators define or redefine a single-line macro,
- assigning it either a string or a numeric value. When producing a
- string value, they may change the style of quoting of the input string
- or strings, and possibly use \c{\\}-escapes inside \c{`}-quoted strings.
- \S{strcat} \i{Concatenating Strings}: \i\c{%strcat}
- The \c{%strcat} operator concatenates quoted strings and assigns the
- result to a single-line macro.
- For example:
- \c %strcat alpha "Alpha: ", '12" screen'
- ... would assign the value \c{'Alpha: 12" screen'} to \c{alpha}.
- Similarly:
- \c %strcat beta '"foo"\', "'bar'"
- ... would assign the value \c{`"foo"\\\\'bar'`} to \c{beta}.
- The use of commas to separate strings is permitted but optional.
- \S{strlen} \i{String Length}: \i\c{%strlen}
- The \c{%strlen} operator assigns the length of a string to a macro.
- For example:
- \c %strlen charcnt 'my string'
- In this example, \c{charcnt} would receive the value 9, just as
- if an \c{%assign} had been used. In this example, \c{'my string'}
- was a literal string but it could also have been a single-line
- macro that expands to a string, as in the following example:
- \c %define sometext 'my string'
- \c %strlen charcnt sometext
- As in the first case, this would result in \c{charcnt} being
- assigned the value of 9.
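- Because the result is an ordinary numeric macro, it can be used
- wherever a number is expected. A sketch of a length-prefixed string
- (the names \c{greeting} and \c{glen} are arbitrary):
- \c %define greeting 'hello'
- \c %strlen glen greeting
- \c db glen, greeting ; 5, 'h', 'e', 'l', 'l', 'o'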
- \S{substr} \i{Extracting Substrings}: \i\c{%substr}
- Individual letters or substrings in strings can be extracted using the
- \c{%substr} operator. An example of its use is probably more useful
- than the description:
- \c %substr mychar 'xyzw' 1 ; equivalent to %define mychar 'x'
- \c %substr mychar 'xyzw' 2 ; equivalent to %define mychar 'y'
- \c %substr mychar 'xyzw' 3 ; equivalent to %define mychar 'z'
- \c %substr mychar 'xyzw' 2,2 ; equivalent to %define mychar 'yz'
- \c %substr mychar 'xyzw' 2,-1 ; equivalent to %define mychar 'yzw'
- \c %substr mychar 'xyzw' 2,-2 ; equivalent to %define mychar 'yz'
- As with \c{%strlen} (see \k{strlen}), the first parameter is the
- single-line macro to be created and the second is the string. The
- third parameter specifies the first character to be selected, and the
- optional fourth parameter (preceded by a comma) is the length. Note
- that the first index is 1, not 0, and the last index is equal to the
- value that \c{%strlen} would assign given the same string. Index
- values out of range result in an empty string. A negative length
- means "until N-1 characters before the end of string", i.e. \c{-1}
- means until end of string, \c{-2} until one character before, etc.
- \H{mlmacro} \i{Multi-Line Macros}: \I\c{%imacro}\i\c{%macro}
- Multi-line macros are much more like the type of macro seen in MASM
- and TASM: a multi-line macro definition in NASM looks something like
- this.
- \c %macro prologue 1
- \c
- \c push ebp
- \c mov ebp,esp
- \c sub esp,%1
- \c
- \c %endmacro
- This defines a C-like function prologue as a macro: so you would
- invoke the macro with a call such as
- \c myfunc: prologue 12
- which would expand to the three lines of code
- \c myfunc: push ebp
- \c mov ebp,esp
- \c sub esp,12
- The number \c{1} after the macro name in the \c{%macro} line defines
- the number of parameters the macro \c{prologue} expects to receive.
- The use of \c{%1} inside the macro definition refers to the first
- parameter to the macro call. With a macro taking more than one
- parameter, subsequent parameters would be referred to as \c{%2},
- \c{%3} and so on.
- Multi-line macros, like single-line macros, are \i{case-sensitive},
- unless you define them using the alternative directive \c{%imacro}.
- If you need to pass a comma as \e{part} of a parameter to a
- multi-line macro, you can do that by enclosing the entire parameter
- in \I{braces, around macro parameters}braces. So you could code
- things like
- \c %macro silly 2
- \c
- \c %2: db %1
- \c
- \c %endmacro
- \c
- \c silly 'a', letter_a ; letter_a: db 'a'
- \c silly 'ab', string_ab ; string_ab: db 'ab'
- \c silly {13,10}, crlf ; crlf: db 13,10
- \S{mlmacover} Overloading Multi-Line Macros\I{overloading, multi-line macros}
- As with single-line macros, multi-line macros can be overloaded by
- defining the same macro name several times with different numbers of
- parameters. This time, no exception is made for macros with no
- parameters at all. So you could define
- \c %macro prologue 0
- \c
- \c push ebp
- \c mov ebp,esp
- \c
- \c %endmacro
- to define an alternative form of the function prologue which
- allocates no local stack space.
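- With both definitions in place, the number of parameters at the call
- site selects the form to expand. A brief sketch, with hypothetical
- labels \c{leaf} and \c{framed}:
- \c leaf: prologue ; push ebp / mov ebp,esp
- \c framed: prologue 12 ; push ebp / mov ebp,esp / sub esp,12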
- Sometimes, however, you might want to `overload' a machine
- instruction; for example, you might want to define
- \c %macro push 2
- \c
- \c push %1
- \c push %2
- \c
- \c %endmacro
- so that you could code
- \c push ebx ; this line is not a macro call
- \c push eax,ecx ; but this one is
- Ordinarily, NASM will give a warning for the first of the above two
- lines, since \c{push} is now defined to be a macro, and is being
- invoked with a number of parameters for which no definition has been
- given. The correct code will still be generated. The warning can be
- disabled with the \c{-w-macro-params} command-line option (see
- \k{opt-w}).
- \S{maclocal} \i{Macro-Local Labels}
- NASM allows you to define labels within a multi-line macro
- definition in such a way as to make them local to the macro call: so
- calling the same macro multiple times will use a different label
- each time. You do this by prefixing \i\c{%%} to the label name. So
- you can invent an instruction which executes a \c{RET} if the \c{Z}
- flag is set by doing this:
- \c %macro retz 0
- \c
- \c jnz %%skip
- \c ret
- \c %%skip:
- \c
- \c %endmacro
- You can call this macro as many times as you want, and every time
- you call it NASM will make up a different `real' name to substitute
- for the label \c{%%skip}. The names NASM invents are of the form
- \c{..@2345.skip}, where the number 2345 changes with every macro
- call. The \i\c{..@} prefix prevents macro-local labels from
- interfering with the local label mechanism, as described in
- \k{locallab}. You should avoid defining your own labels in this form
- (the \c{..@} prefix, then a number, then another period) in case
- they interfere with macro-local labels.
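- For instance, two calls to \c{retz} might expand roughly as follows.
- The numbers shown are placeholders only; the actual values are chosen
- by NASM:
- \c retz ; first call
- \c ; jnz ..@2345.skip
- \c ; ret
- \c ; ..@2345.skip:
- \c retz ; second call gets a fresh label
- \c ; jnz ..@2346.skip
- \c ; ret
- \c ; ..@2346.skip: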
- \S{mlmacgre} \i{Greedy Macro Parameters}
- Occasionally it is useful to define a macro which lumps its entire
- command line into one parameter definition, possibly after
- extracting one or two smaller parameters from the front. An example
- might be a macro to write a text string to a file in MS-DOS, where
- you might want to be able to write
- \c writefile [filehandle],"hello, world",13,10
- NASM allows you to define the last parameter of a macro to be
- \e{greedy}, meaning that if you invoke the macro with more
- parameters than it expects, all the spare parameters get lumped into
- the last defined one along with the separating commas. So if you
- code:
- \c %macro writefile 2+
- \c
- \c jmp %%endstr
- \c %%str: db %2
- \c %%endstr:
- \c mov dx,%%str
- \c mov cx,%%endstr-%%str
- \c mov bx,%1
- \c mov ah,0x40
- \c int 0x21
- \c
- \c %endmacro
- then the example call to \c{writefile} above will work as expected:
- the text before the first comma, \c{[filehandle]}, is used as the
- first macro parameter and expanded when \c{%1} is referred to, and
- all the subsequent text is lumped into \c{%2} and placed after the
- \c{db}.
- The greedy nature of the macro is indicated to NASM by the use of
- the \I{+ modifier}\c{+} sign after the parameter count on the
- \c{%macro} line.
- If you define a greedy macro, you are effectively telling NASM how
- it should expand the macro given \e{any} number of parameters from
- the actual number specified up to infinity; in this case, for
- example, NASM now knows what to do when it sees a call to
- \c{writefile} with 2, 3, 4 or more parameters. NASM will take this
- into account when overloading macros, and will not allow you to
- define another form of \c{writefile} taking 4 parameters (for
- example).
- Of course, the above macro could have been implemented as a
- non-greedy macro, in which case the call to it would have had to
- look like
- \c writefile [filehandle], {"hello, world",13,10}
- NASM provides both mechanisms for putting \i{commas in macro
- parameters}, and you choose which one you prefer for each macro
- definition.
- See \k{sectmac} for a better way to write the above macro.
- \S{mlmacrange} \i{Macro Parameters Range}
- NASM allows you to expand parameters via the special construct \c{%\{x:y\}}
- where \c{x} is the first parameter index and \c{y} is the last. Any index can
- be either negative or positive but must never be zero.
- For example
- \c %macro mpar 1-*
- \c db %{3:5}
- \c %endmacro
- \c
- \c mpar 1,2,3,4,5,6
- expands to the range \c{3,4,5}.
- The parameters can also be reversed, so that
- \c %macro mpar 1-*
- \c db %{5:3}
- \c %endmacro
- \c
- \c mpar 1,2,3,4,5,6
- expands to the range \c{5,4,3}.
- Parameters can also be addressed with negative indices, in which case
- NASM counts them from the end of the parameter list. Those who know
- Python may recognise the analogy.
- \c %macro mpar 1-*
- \c db %{-1:-3}
- \c %endmacro
- \c
- \c mpar 1,2,3,4,5,6
- expands to the range \c{6,5,4}.
- Note that NASM uses a \i{comma} to separate the parameters being expanded.
- As a useful trick, the range \c{%\{-1:-1\}} gives you the \i{last}
- argument passed to a macro.
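- To pick out just the last argument with a negative range, a macro
- could be sketched like this (the name \c{lastbyte} is hypothetical):
- \c %macro lastbyte 1-*
- \c db %{-1:-1}
- \c %endmacro
- \c
- \c lastbyte 1,2,3 ; db 3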
- \S{mlmacdef} \i{Default Macro Parameters}
- NASM also allows you to define a multi-line macro with a \e{range}
- of allowable parameter counts. If you do this, you can specify
- defaults for \i{omitted parameters}. So, for example:
- \c %macro die 0-1 "Painful program death has occurred."
- \c
- \c writefile 2,%1
- \c mov ax,0x4c01
- \c int 0x21
- \c
- \c %endmacro
- This macro (which makes use of the \c{writefile} macro defined in
- \k{mlmacgre}) can be called with an explicit error message, which it
- will display on the error output stream before exiting, or it can be
- called with no parameters, in which case it will use the default
- error message supplied in the macro definition.
- In general, you supply a minimum and maximum number of parameters
- for a macro of this type; the minimum number of parameters are then
- required in the macro call, and then you provide defaults for the
- optional ones. So if a macro definition began with the line
- \c %macro foobar 1-3 eax,[ebx+2]
- then it could be called with between one and three parameters, and
- \c{%1} would always be taken from the macro call. \c{%2}, if not
- specified by the macro call, would default to \c{eax}, and \c{%3} if
- not specified would default to \c{[ebx+2]}.
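- To illustrate, suppose the body of \c{foobar} were the following (the
- body is invented here for the sake of the example):
- \c %macro foobar 1-3 eax,[ebx+2]
- \c mov %1,%2
- \c add %1,%3
- \c %endmacro
- \c
- \c foobar ecx ; mov ecx,eax / add ecx,[ebx+2]
- \c foobar ecx,edx ; mov ecx,edx / add ecx,[ebx+2]
- \c foobar ecx,edx,4 ; mov ecx,edx / add ecx,4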
- You can provide extra information to a macro by providing
- too many default parameters:
- \c %macro quux 1 something
- This will trigger a warning by default; see \k{opt-w} for
- more information.
- When \c{quux} is invoked, it receives not one but two parameters.
- \c{something} can be referred to as \c{%2}. The difference
- between passing \c{something} this way and writing \c{something}
- in the macro body is that, this way, \c{something} is evaluated
- when the macro is defined, not when it is expanded.
- You may omit parameter defaults from the macro definition, in which
- case the parameter default is taken to be blank. This can be useful
- for macros which can take a variable number of parameters, since the
- \i\c{%0} token (see \k{percent0}) allows you to determine how many
- parameters were really passed to the macro call.
- This defaulting mechanism can be combined with the greedy-parameter
- mechanism; so the \c{die} macro above could be made more powerful,
- and more useful, by changing the first line of the definition to
- \c %macro die 0-1+ "Painful program death has occurred.",13,10
- The maximum parameter count can be infinite, denoted by \c{*}. In
- this case, of course, it is impossible to provide a \e{full} set of
- default parameters. Examples of this usage are shown in \k{rotate}.
- \S{percent0} \i\c{%0}: \I{counting macro parameters}Macro Parameter Counter
- The parameter reference \c{%0} will return a numeric constant giving the
- number of parameters received, that is, if \c{%0} is n then \c{%}n is the
- last parameter. \c{%0} is mostly useful for macros that can take a variable
- number of parameters. It can be used as an argument to \c{%rep}
- (see \k{rep}) in order to iterate through all the parameters of a macro.
- Examples are given in \k{rotate}.
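- As a minimal sketch, a hypothetical macro \c{argcount} could emit the
- number of parameters it was given:
- \c %macro argcount 0-*
- \c db %0
- \c %endmacro
- \c
- \c argcount ; db 0
- \c argcount a,b,c ; db 3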
- \S{percent00} \i\c{%00}: \I{label preceding macro}Label Preceding Macro
- \c{%00} will return the label preceding the macro invocation, if any. The
- label must be on the same line as the macro invocation, may be a local label
- (see \k{locallab}), and need not end in a colon.
- \S{rotate} \i\c{%rotate}: \i{Rotating Macro Parameters}
- Unix shell programmers will be familiar with the \I{shift
- command}\c{shift} shell command, which allows the arguments passed
- to a shell script (referenced as \c{$1}, \c{$2} and so on) to be
- moved left by one place, so that the argument previously referenced
- as \c{$2} becomes available as \c{$1}, and the argument previously
- referenced as \c{$1} is no longer available at all.
- NASM provides a similar mechanism, in the form of \c{%rotate}. As
- its name suggests, it differs from the Unix \c{shift} in that no
- parameters are lost: parameters rotated off the left end of the
- argument list reappear on the right, and vice versa.
- \c{%rotate} is invoked with a single numeric argument (which may be
- an expression). The macro parameters are rotated to the left by that
- many places. If the argument to \c{%rotate} is negative, the macro
- parameters are rotated to the right.
- \I{iterating over macro parameters}So a pair of macros to save and
- restore a set of registers might work as follows:
- \c %macro multipush 1-*
- \c
- \c %rep %0
- \c push %1
- \c %rotate 1
- \c %endrep
- \c
- \c %endmacro
- This macro invokes the \c{PUSH} instruction on each of its arguments
- in turn, from left to right. It begins by pushing its first
- argument, \c{%1}, then invokes \c{%rotate} to move all the arguments
- one place to the left, so that the original second argument is now
- available as \c{%1}. Repeating this procedure as many times as there
- were arguments (achieved by supplying \c{%0} as the argument to
- \c{%rep}) causes each argument in turn to be pushed.
- Note also the use of \c{*} as the maximum parameter count,
- indicating that there is no upper limit on the number of parameters
- you may supply to the \i\c{multipush} macro.
- It would be convenient, when using this macro, to have a \c{POP}
- equivalent, which \e{didn't} require the arguments to be given in
- reverse order. Ideally, you would write the \c{multipush} macro
- call, then cut-and-paste the line to where the pop needed to be
- done, and change the name of the called macro to \c{multipop}, and
- the macro would take care of popping the registers in the opposite
- order from the one in which they were pushed.
- This can be done by the following definition:
- \c %macro multipop 1-*
- \c
- \c %rep %0
- \c %rotate -1
- \c pop %1
- \c %endrep
- \c
- \c %endmacro
- This macro begins by rotating its arguments one place to the
- \e{right}, so that the original \e{last} argument appears as \c{%1}.
- This is then popped, and the arguments are rotated right again, so
- the second-to-last argument becomes \c{%1}. Thus the arguments are
- iterated through in reverse order.
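- Used together, the two macros might look like this in a routine (a
- sketch; the register list is arbitrary):
- \c multipush eax,ebx,ecx ; push eax / push ebx / push ecx
- \c ; ... body of the routine ...
- \c multipop eax,ebx,ecx ; pop ecx / pop ebx / pop eax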
- \S{concat} \i{Concatenating Macro Parameters}
- NASM can concatenate macro parameters and macro indirection constructs
- on to other text surrounding them. This allows you to declare a family
- of symbols, for example, in a macro definition. If, for example, you
- wanted to generate a table of key codes along with offsets into the
- table, you could code something like
- \c %macro keytab_entry 2
- \c
- \c keypos%1 equ $-keytab
- \c db %2
- \c
- \c %endmacro
- \c
- \c keytab:
- \c keytab_entry F1,128+1
- \c keytab_entry F2,128+2
- \c keytab_entry Return,13
- which would expand to
- \c keytab:
- \c keyposF1 equ $-keytab
- \c db 128+1
- \c keyposF2 equ $-keytab
- \c db 128+2
- \c keyposReturn equ $-keytab
- \c db 13
- You can just as easily concatenate text on to the other end of a
- macro parameter, by writing \c{%1foo}.
- If you need to append a \e{digit} to a macro parameter, for example
- defining labels \c{foo1} and \c{foo2} when passed the parameter
- \c{foo}, you can't code \c{%11} because that would be taken as the
- eleventh macro parameter. Instead, you must code
- \I{braces, after % sign}\c{%\{1\}1}, which will separate the first
- \c{1} (giving the number of the macro parameter) from the second
- (literal text to be concatenated to the parameter).
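- A short sketch of the braced form, with a hypothetical macro \c{pair}:
- \c %macro pair 1
- \c
- \c %{1}1: db 1
- \c %{1}2: db 2
- \c
- \c %endmacro
- \c
- \c pair foo ; foo1: db 1 / foo2: db 2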
- This concatenation can also be applied to other preprocessor in-line
- objects, such as macro-local labels (\k{maclocal}) and context-local
- labels (\k{ctxlocal}). In all cases, ambiguities in syntax can be
- resolved by enclosing everything after the \c{%} sign and before the
- literal text in braces: so \c{%\{%foo\}bar} concatenates the text
- \c{bar} to the end of the real name of the macro-local label
- \c{%%foo}. (This is unnecessary, since the form NASM uses for the
- real names of macro-local labels means that the two usages
- \c{%\{%foo\}bar} and \c{%%foobar} would both expand to the same
- thing anyway; nevertheless, the capability is there.)
- The single-line macro indirection construct, \c{%[...]}
- (\k{indmacro}), behaves the same way as macro parameters for the
- purpose of concatenation.
- See also the \c{%+} operator, \k{concat%+}.
- \S{mlmaccc} \i{Condition Codes as Macro Parameters}
- NASM can give special treatment to a macro parameter which contains
- a condition code. For a start, you can refer to the macro parameter
- \c{%1} by means of the alternative syntax \i\c{%+1}, which informs
- NASM that this macro parameter is supposed to contain a condition
- code, and will cause the preprocessor to report an error message if
- the macro is called with a parameter which is \e{not} a valid
- condition code.
- Far more usefully, though, you can refer to the macro parameter by
- means of \i\c{%-1}, which NASM will expand as the \e{inverse}
- condition code. So the \c{retz} macro defined in \k{maclocal} can be
- replaced by a general \i{conditional-return macro} like this:
- \c %macro retc 1
- \c
- \c j%-1 %%skip
- \c ret
- \c %%skip:
- \c
- \c %endmacro
- This macro can now be invoked using calls like \c{retc ne}, which
- will cause the conditional-jump instruction in the macro expansion
- to come out as \c{JE}, or \c{retc po} which will make the jump a
- \c{JPE}.
- The \c{%+1} macro-parameter reference is quite happy to interpret
- the arguments \c{CXZ} and \c{ECXZ} as valid condition codes;
- however, \c{%-1} will report an error if passed either of these,
- because no inverse condition code exists.
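- For example, a hypothetical \c{jmpif} macro could use \c{%+1} to
- insist that its first parameter is a condition code:
- \c %macro jmpif 2
- \c
- \c j%+1 %2
- \c
- \c %endmacro
- \c
- \c jmpif ae,done ; jae done
- \c ; "jmpif eax,done" would be rejected: eax is not a condition code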
- \S{nolist} \i{Disabling Listing Expansion}\I\c{.nolist}
- When NASM is generating a listing file from your program, it will
- generally expand multi-line macros by means of writing the macro
- call and then listing each line of the expansion. This allows you to
- see which instructions in the macro expansion are generating what
- code; however, for some macros this clutters the listing up
- unnecessarily.
- NASM therefore provides the \c{.nolist} qualifier, which you can
- include in a macro definition to inhibit the expansion of the macro
- in the listing file. The \c{.nolist} qualifier comes directly after
- the number of parameters, like this:
- \c %macro foo 1.nolist
- Or like this:
- \c %macro bar 1-5+.nolist a,b,c,d,e,f,g,h
- \S{unmacro} Undefining Multi-Line Macros: \i\c{%unmacro}
- Multi-line macros can be removed with the \c{%unmacro} directive.
- Unlike the \c{%undef} directive, however, \c{%unmacro} takes an
- argument specification, and will only remove \i{exact matches} with
- that argument specification.
- For example:
- \c %macro foo 1-3
- \c ; Do something
- \c %endmacro
- \c %unmacro foo 1-3
- removes the previously defined macro \c{foo}, but
- \c %macro bar 1-3
- \c ; Do something
- \c %endmacro
- \c %unmacro bar 1
- does \e{not} remove the macro \c{bar}, since the argument
- specification does not match exactly.
- \H{condasm} \i{Conditional Assembly}\I\c{%if}
- Similarly to the C preprocessor, NASM allows sections of a source
- file to be assembled only if certain conditions are met. The general
- syntax of this feature looks like this:
- \c %if<condition>
- \c ; some code which only appears if <condition> is met
- \c %elif<condition2>
- \c ; only appears if <condition> is not met but <condition2> is
- \c %else
- \c ; this appears if neither <condition> nor <condition2> was met
- \c %endif
- The inverse forms \i\c{%ifn} and \i\c{%elifn} are also supported.
- The \i\c{%else} clause is optional, as is the \i\c{%elif} clause.
- You can have more than one \c{%elif} clause as well.
- There are a number of variants of the \c{%if} directive. Each has its
- corresponding \c{%elif}, \c{%ifn}, and \c{%elifn} directives; for
- example, the equivalents to the \c{%ifdef} directive are \c{%elifdef},
- \c{%ifndef}, and \c{%elifndef}.
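- As a sketch of a complete chain, assume two hypothetical single-line
- macros \c{TARGET_DOS} and \c{TARGET_LINUX}, one of which might be
- defined on the command line with \c{-d}:
- \c %ifdef TARGET_DOS
- \c mov ax,0x4c00
- \c int 0x21
- \c %elifdef TARGET_LINUX
- \c mov eax,1
- \c xor ebx,ebx
- \c int 0x80
- \c %else
- \c %error "no target selected"
- \c %endif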
- \S{ifdef} \i\c{%ifdef}: Testing Single-Line Macro Existence\I{testing,
- single-line macro existence}
- Beginning a conditional-assembly block with the line \c{%ifdef
- MACRO} will assemble the subsequent code if, and only if, a
- single-line macro called \c{MACRO} is defined. If not, then the
- \c{%elif} and \c{%else} blocks (if any) will be processed instead.
- For example, when debugging a program, you might want to write code
- such as
- \c ; perform some function
- \c %ifdef DEBUG
- \c writefile 2,"Function performed successfully",13,10
- \c %endif
- \c ; go and do something else
- Then you could use the command-line option \c{-dDEBUG} to create a
- version of the program which produced debugging messages, and remove
- the option to generate the final release version of the program.
- You can test for a macro \e{not} being defined by using
- \i\c{%ifndef} instead of \c{%ifdef}. You can also test for macro
- definitions in \c{%elif} blocks by using \i\c{%elifdef} and
- \i\c{%elifndef}.
- \S{ifmacro} \i\c{%ifmacro}: Testing Multi-Line Macro
- Existence\I{testing, multi-line macro existence}
- The \c{%ifmacro} directive operates in the same way as the \c{%ifdef}
- directive, except that it checks for the existence of a multi-line macro.
- For example, you may be working with a large project and not have control
- over the macros in a library. You may want to create a macro with one
- name if it doesn't already exist, and another name if one with that name
- does exist.
- \c{%ifmacro} is considered true if defining a macro with the given name
- and number of arguments would cause a definition conflict. For example:
- \c %ifmacro MyMacro 1-3
- \c
- \c %error "MyMacro 1-3" causes a conflict with an existing macro.
- \c
- \c %else
- \c
- \c %macro MyMacro 1-3
- \c
- \c ; insert code to define the macro
- \c
- \c %endmacro
- \c
- \c %endif
- This will create the macro "MyMacro 1-3" if no macro already exists which
- would conflict with it, and emits an error if there would be a definition
- conflict.
- You can test for the macro not existing by using \i\c{%ifnmacro} instead
- of \c{%ifmacro}. Additional tests can be performed in \c{%elif} blocks by using
- \i\c{%elifmacro} and \i\c{%elifnmacro}.
- \S{ifctx} \i\c{%ifctx}: Testing the Context Stack\I{testing, context
- stack}
- The conditional-assembly construct \c{%ifctx} will cause the
- subsequent code to be assembled if and only if the top context on
- the preprocessor's context stack has the same name as one of the arguments.
- As with \c{%ifdef}, the inverse and \c{%elif} forms \i\c{%ifnctx},
- \i\c{%elifctx} and \i\c{%elifnctx} are also supported.
- For more details of the context stack, see \k{ctxstack}. For a
- sample use of \c{%ifctx}, see \k{blockif}.
- \S{if} \i\c{%if}: Testing Arbitrary Numeric Expressions\I{testing,
- arbitrary numeric expressions}
- The conditional-assembly construct \c{%if expr} will cause the
- subsequent code to be assembled if and only if the value of the
- numeric expression \c{expr} is non-zero. An example of the use of
- this feature is in deciding when to break out of a \c{%rep}
- preprocessor loop: see \k{rep} for a detailed example.
- The expression given to \c{%if}, and its counterpart \i\c{%elif}, is
- a critical expression (see \k{crit}).
- \c{%if} extends the normal NASM expression syntax, by providing a
- set of \i{relational operators} which are not normally available in
- expressions. The operators \i\c{=}, \i\c{<}, \i\c{>}, \i\c{<=},
- \i\c{>=} and \i\c{<>} test equality, less-than, greater-than,
- less-or-equal, greater-or-equal and not-equal respectively. The
- C-like forms \i\c{==} and \i\c{!=} are supported as alternative
- forms of \c{=} and \c{<>}. In addition, low-priority logical
- operators \i\c{&&}, \i\c{^^} and \i\c{||} are provided, supplying
- \i{logical AND}, \i{logical XOR} and \i{logical OR}. These work like
- the C logical operators (although C has no logical XOR), in that
- they always return either 0 or 1, and treat any non-zero input as 1
- (so that \c{^^}, for example, returns 1 if exactly one of its inputs
- is zero, and 0 otherwise). The relational operators also return 1
- for true and 0 for false.
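- A brief sketch of these operators, using hypothetical \c{%assign}
- values named \c{width} and \c{debug}:
- \c %assign width 16
- \c %assign debug 0
- \c
- \c %if (width >= 8) && (width <= 64)
- \c db width ; assembled: both tests are true
- \c %endif
- \c
- \c %if debug ^^ width
- \c db 1 ; assembled: exactly one input is non-zero
- \c %endif
- \c
- \c %if width <> 16
- \c db 2 ; skipped: width equals 16
- \c %endif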
- Like other \c{%if} constructs, \c{%if} has a counterpart
- \i\c{%elif}, and negative forms \i\c{%ifn} and \i\c{%elifn}.
- \S{ifidn} \i\c{%ifidn} and \i\c{%ifidni}: Testing Exact Text
- Identity\I{testing, exact text identity}
- The construct \c{%ifidn text1,text2} will cause the subsequent code
- to be assembled if and only if \c{text1} and \c{text2}, after
- expanding single-line macros, are identical pieces of text.
- Differences in white space are not counted.
- \c{%ifidni} is similar to \c{%ifidn}, but is \i{case-insensitive}.
- For example, the following macro pushes a register or number on the
- stack, and allows you to treat \c{IP} as a real register:
- \c %macro pushparam 1
- \c
- \c %ifidni %1,ip
- \c call %%label
- \c %%label:
- \c %else
- \c push %1
- \c %endif
- \c
- \c %endmacro
- Like other \c{%if} constructs, \c{%ifidn} has a counterpart
- \i\c{%elifidn}, and negative forms \i\c{%ifnidn} and \i\c{%elifnidn}.
- Similarly, \c{%ifidni} has counterparts \i\c{%elifidni},
- \i\c{%ifnidni} and \i\c{%elifnidni}.
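- Calls to \c{pushparam} would then expand along these lines (a sketch;
- the real name of the macro-local label is invented by NASM):
- \c pushparam eax ; push eax
- \c pushparam 1234 ; push 1234
- \c pushparam IP ; call %%label / %%label: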
- \S{iftyp} \i\c{%ifid}, \i\c{%ifnum}, \i\c{%ifstr}: Testing Token
- Types\I{testing, token types}
- Some macros will want to perform different tasks depending on
- whether they are passed a number, a string, or an identifier. For
- example, a string output macro might want to be able to cope with
- being passed either a string constant or a pointer to an existing
- string.
- The conditional assembly construct \c{%ifid}, taking one parameter
- (which may be blank), assembles the subsequent code if and only if
- the first token in the parameter exists and is an identifier.
- \c{%ifnum} works similarly, but tests for the token being a numeric
- constant; \c{%ifstr} tests for it being a string.
- For example, the \c{writefile} macro defined in \k{mlmacgre} can be
- extended to take advantage of \c{%ifstr} in the following fashion:
- \c %macro writefile 2-3+
- \c
- \c %ifstr %2
- \c jmp %%endstr
- \c %if %0 = 3
- \c %%str: db %2,%3
- \c %else
- \c %%str: db %2
- \c %endif
- \c %%endstr: mov dx,%%str
- \c mov cx,%%endstr-%%str
- \c %else
- \c mov dx,%2
- \c mov cx,%3
- \c %endif
- \c mov bx,%1
- \c mov ah,0x40
- \c int 0x21
- \c
- \c %endmacro
- Then the \c{writefile} macro can cope with being called in either of
- the following two ways:
- \c writefile [file], strpointer, length
- \c writefile [file], "hello", 13, 10
- In the first, \c{strpointer} is used as the address of an
- already-declared string, and \c{length} is used as its length; in
- the second, a string is given to the macro, which therefore declares
- it itself and works out the address and length for itself.
- Note the use of \c{%if} inside the \c{%ifstr}: this is to detect
- whether the macro was passed two arguments (so the string would be a
- single string constant, and \c{db %2} would be adequate) or more (in
- which case, all but the first two would be lumped together into
- \c{%3}, and \c{db %2,%3} would be required).
- The usual \I\c{%elifid}\I\c{%elifnum}\I\c{%elifstr}\c{%elif}...,
- \I\c{%ifnid}\I\c{%ifnnum}\I\c{%ifnstr}\c{%ifn}..., and
- \I\c{%elifnid}\I\c{%elifnnum}\I\c{%elifnstr}\c{%elifn}... versions
- exist for each of \c{%ifid}, \c{%ifnum} and \c{%ifstr}.
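- A compact sketch of the three tests and their \c{%elif} forms, using
- a hypothetical macro \c{kind}:
- \c %macro kind 1
- \c
- \c %ifid %1
- \c db 'identifier'
- \c %elifnum %1
- \c db 'number'
- \c %elifstr %1
- \c db 'string'
- \c %else
- \c db 'other'
- \c %endif
- \c
- \c %endmacro
- \c
- \c kind foo ; db 'identifier'
- \c kind 42 ; db 'number'
- \c kind "hi" ; db 'string'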
- \S{iftoken} \i\c{%iftoken}: Test for a Single Token
- Some macros will want to do different things depending on whether they
- are passed a single token (e.g. to paste it to something else using
- \c{%+}) or a multi-token sequence.
- The conditional assembly construct \c{%iftoken} assembles the
- subsequent code if and only if the expanded parameters consist of
- exactly one token, possibly surrounded by whitespace.
- For example:
- \c %iftoken 1
- will assemble the subsequent code, but
- \c %iftoken -1
- will not, since \c{-1} contains two tokens: the unary minus operator
- \c{-}, and the number \c{1}.
- The usual \i\c{%eliftoken}, \i\c{%ifntoken}, and \i\c{%elifntoken}
- variants are also provided.
- \S{ifempty} \i\c{%ifempty}: Test for Empty Expansion
- The conditional assembly construct \c{%ifempty} assembles the
- subsequent code if and only if the expanded parameters do not contain
- any tokens at all, whitespace excepted.
- The usual \i\c{%elifempty}, \i\c{%ifnempty}, and \i\c{%elifnempty}
- variants are also provided.
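- One possible use is to supply a fallback when an optional parameter
- is omitted. A sketch, with a hypothetical macro \c{emit}:
- \c %macro emit 0-1
- \c
- \c %ifempty %1
- \c db 0
- \c %else
- \c db %1
- \c %endif
- \c
- \c %endmacro
- \c
- \c emit ; db 0
- \c emit 7 ; db 7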
- \S{ifenv} \i\c{%ifenv}: Test If Environment Variable Exists
- The conditional assembly construct \c{%ifenv} assembles the
- subsequent code if and only if the environment variable referenced by
- the \c{%!}\e{variable} directive exists.
- The usual \i\c{%elifenv}, \i\c{%ifnenv}, and \i\c{%elifnenv}
- variants are also provided.
- Just as for \c{%!}\e{variable} the argument should be written as a
- string if it contains characters that would not be legal in an
- identifier. See \k{getenv}.
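- A sketch, assuming a hypothetical environment variable \c{BUILD_TAG}
- and a hypothetical single-line macro \c{build_tag}:
- \c %ifenv BUILD_TAG
- \c %defstr build_tag %!BUILD_TAG
- \c %else
- \c %define build_tag "unknown"
- \c %endif
- \c
- \c db build_tag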
- \H{rep} \i{Preprocessor Loops}\I{repeating code}: \i\c{%rep}
- NASM's \c{TIMES} prefix, though useful, cannot be used to invoke a
- multi-line macro multiple times, because it is processed by NASM
- after macros have already been expanded. Therefore NASM provides
- another form of loop, this time at the preprocessor level: \c{%rep}.
- The directives \c{%rep} and \i\c{%endrep} (\c{%rep} takes a numeric
- argument, which can be an expression; \c{%endrep} takes no
- arguments) can be used to enclose a chunk of code, which is then
- replicated as many times as specified by the preprocessor:
- \c %assign i 0
- \c %rep 64
- \c inc word [table+2*i]
- \c %assign i i+1
- \c %endrep
- This will generate a sequence of 64 \c{INC} instructions,
- incrementing every word of memory from \c{[table]} to
- \c{[table+126]}.
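- The loop counter can also feed data directives. As a sketch, this
- builds a table of the first eight squares (the label \c{squares} is
- hypothetical):
- \c squares:
- \c %assign i 0
- \c %rep 8
- \c dw i*i
- \c %assign i i+1
- \c %endrep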
- For more complex termination conditions, or to break out of a repeat
- loop part way along, you can use the \i\c{%exitrep} directive to
- terminate the loop, like this:
- \c fibonacci:
- \c %assign i 0
- \c %assign j 1
- \c %rep 100
- \c %if j > 65535
- \c %exitrep
- \c %endif
- \c dw j
- \c %assign k j+i
- \c %assign i j
- \c %assign j k
- \c %endrep
- \c
- \c fib_number equ ($-fibonacci)/2
- This produces a list of all the Fibonacci numbers that will fit in
- 16 bits. Note that a maximum repeat count must still be given to
- \c{%rep}. This is to prevent the possibility of NASM getting into an
- infinite loop in the preprocessor, which (on multitasking or
- multi-user systems) would typically cause all the system memory to
- be gradually used up and other applications to start crashing.
- Note that the maximum repeat count is limited to a 62-bit number, though
- it is unlikely that you will ever need anything bigger.
- \H{files} Source Files and Dependencies
- These commands allow you to split your sources into multiple files.
- \S{include} \i\c{%include}: \i{Including Other Files}
- Using, once again, a very similar syntax to the C preprocessor,
- NASM's preprocessor lets you include other source files into your
- code. This is done by the use of the \i\c{%include} directive:
- \c %include "macros.mac"
- will include the contents of the file \c{macros.mac} into the source
- file containing the \c{%include} directive.
- Include files are \I{searching for include files}searched for in the
- current directory (the directory you're in when you run NASM, as
- opposed to the location of the NASM executable or the location of
- the source file), plus any directories specified on the NASM command
- line using the \c{-i} option.
- The standard C idiom for preventing a file being included more than
- once is just as applicable in NASM: if the file \c{macros.mac} has
- the form
- \c %ifndef MACROS_MAC
- \c %define MACROS_MAC
- \c ; now define some macros
- \c %endif
- then including the file more than once will not cause errors,
- because the second time the file is included nothing will happen
- because the macro \c{MACROS_MAC} will already be defined.
- You can force a file to be included even if there is no \c{%include}
- directive that explicitly includes it, by using the \i\c{-p} option
- on the NASM command line (see \k{opt-p}).
- \S{pathsearch} \i\c{%pathsearch}: Search the Include Path
- The \c{%pathsearch} directive takes a single-line macro name and a
- filename, and declares or redefines the specified single-line macro to
- be the include-path-resolved version of the filename, if the file
- exists (otherwise, the filename is passed through unchanged).
- For example,
- \c %pathsearch MyFoo "foo.bin"
- ... with \c{-Ibins/} in the include path may end up defining the macro
- \c{MyFoo} to be \c{"bins/foo.bin"}.
- \S{depend} \i\c{%depend}: Add Dependent Files
- The \c{%depend} directive takes a filename and adds it to the list of
- files emitted during dependency generation when the \c{-M} option
- and its relatives (see \k{opt-M}) are used. It produces no output.
- This is generally used in conjunction with \c{%pathsearch}. For
- example, a simplified version of the standard macro wrapper for the
- \c{INCBIN} directive looks like:
- \c %imacro incbin 1-2+ 0
- \c %pathsearch dep %1
- \c %depend dep
- \c incbin dep,%2
- \c %endmacro
- This first resolves the location of the file into the macro \c{dep},
- then adds it to the dependency lists, and finally issues the
- assembler-level \c{INCBIN} directive.
- \S{use} \i\c{%use}: Include Standard Macro Package
- The \c{%use} directive is similar to \c{%include}, but rather than
- including the contents of a file, it includes a named standard macro
- package. The standard macro packages are part of NASM, and are
- described in \k{macropkg}.
- Unlike the \c{%include} directive, package names for the \c{%use}
- directive do not require quotes, but quotes are permitted. In NASM
- 2.04 and 2.05 the unquoted form would be macro-expanded; this is no
- longer true. Thus, the following lines are equivalent:
- \c %use altreg
- \c %use 'altreg'
- Standard macro packages are protected from multiple inclusion. When a
- standard macro package is used, a testable single-line macro of the
- form \c{__USE_}\e{package}\c{__} is also defined, see \k{use_def}.
- \H{ctxstack} The \i{Context Stack}
- Having labels that are local to a macro definition is sometimes not
- quite powerful enough: sometimes you want to be able to share labels
- between several macro calls. An example might be a \c{REPEAT} ...
- \c{UNTIL} loop, in which the expansion of the \c{REPEAT} macro
- would need to be able to refer to a label which the \c{UNTIL} macro
- had defined. However, for such a macro you would also want to be
- able to nest these loops.
- NASM provides this level of power by means of a \e{context stack}.
- The preprocessor maintains a stack of \e{contexts}, each of which is
- characterized by a name. You add a new context to the stack using
- the \i\c{%push} directive, and remove one using \i\c{%pop}. You can
- define labels that are local to a particular context on the stack.
- \S{pushpop} \i\c{%push} and \i\c{%pop}: \I{creating
- contexts}\I{removing contexts}Creating and Removing Contexts
- The \c{%push} directive is used to create a new context and place it
- on the top of the context stack. \c{%push} takes an optional argument,
- which is the name of the context. For example:
- \c %push foobar
- This pushes a new context called \c{foobar} on the stack. You can have
- several contexts on the stack with the same name: they can still be
- distinguished. If no name is given, the context is unnamed (this is
- normally used when both the \c{%push} and the \c{%pop} are inside a
- single macro definition).
- The directive \c{%pop}, taking one optional argument, removes the top
- context from the context stack and destroys it, along with any
- labels associated with it. If an argument is given, it must match the
- name of the current context; otherwise NASM will issue an error.
- \S{ctxlocal} \i{Context-Local Labels}
- Just as the usage \c{%%foo} defines a label which is local to the
- particular macro call in which it is used, the usage \I{%$}\c{%$foo}
- is used to define a label which is local to the context on the top
- of the context stack. So the \c{REPEAT} and \c{UNTIL} example given
- above could be implemented by means of:
- \c %macro repeat 0
- \c
- \c %push repeat
- \c %$begin:
- \c
- \c %endmacro
- \c
- \c %macro until 1
- \c
- \c j%-1 %$begin
- \c %pop
- \c
- \c %endmacro
- and invoked by means of, for example,
- \c mov di,string
- \c repeat
- \c add di,3
- \c scasb
- \c until e
- which would scan every fourth byte of a string in search of the byte
- in \c{AL}.
- If you need to define, or access, labels local to the context
- \e{below} the top one on the stack, you can use \I{%$$}\c{%$$foo}, or
- \c{%$$$foo} for the context below that, and so on.
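- For instance, a label defined in an outer context remains reachable
- after a further \c{%push}, provided the reference carries one extra
- \c{$}. A minimal sketch follows; the context names \c{outer} and
- \c{inner} and the label \c{top} are purely illustrative:
- \c %push outer
- \c %$top:
- \c %push inner
- \c         dec cx
- \c         jnz %$$top      ; `top' belongs to the context below
- \c %pop
- \c %pop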
- \S{ctxdefine} \i{Context-Local Single-Line Macros}
- NASM also allows you to define single-line macros which are local to
- a particular context, in just the same way:
- \c %define %$localmac 3
- will define the single-line macro \c{%$localmac} to be local to the
- top context on the stack. Of course, after a subsequent \c{%push},
- it can then still be accessed by the name \c{%$$localmac}.
- \S{ctxfallthrough} \i{Context Fall-Through Lookup} \e{(deprecated)}
- Context fall-through lookup (automatic searching of outer contexts)
- is a feature that was added in NASM version 0.98.03. Unfortunately,
- this feature is unintuitive and can result in buggy code that would
- have otherwise been prevented by NASM's error reporting. As a result,
- this feature has been \e{deprecated}. NASM version 2.09 will issue a
- warning when usage of this \e{deprecated} feature is detected. Starting
- with NASM version 2.10, usage of this \e{deprecated} feature will simply
- result in an \e{expression syntax error}.
- An example usage of this \e{deprecated} feature follows:
- \c %macro ctxthru 0
- \c %push ctx1
- \c %assign %$external 1
- \c %push ctx2
- \c %assign %$internal 1
- \c mov eax, %$external
- \c mov eax, %$internal
- \c %pop
- \c %pop
- \c %endmacro
- As demonstrated, \c{%$external} is being defined in the \c{ctx1}
- context and referenced within the \c{ctx2} context. With context
- fall-through lookup, referencing an undefined context-local macro
- like this implicitly searches through all outer contexts until a match
- is found or every context has been searched. As a result, \c{%$external}
- referenced within the \c{ctx2} context would implicitly use \c{%$external}
- as defined in \c{ctx1}. Most people would expect NASM to issue an error in
- this situation because \c{%$external} was never defined within \c{ctx2} and also
- isn't qualified with the proper context depth, \c{%$$external}.
- Here is a revision of the above example with proper context depth:
- \c %macro ctxthru 0
- \c %push ctx1
- \c %assign %$external 1
- \c %push ctx2
- \c %assign %$internal 1
- \c mov eax, %$$external
- \c mov eax, %$internal
- \c %pop
- \c %pop
- \c %endmacro
- As demonstrated, \c{%$external} is still being defined in the \c{ctx1}
- context and referenced within the \c{ctx2} context. However, the
- reference to \c{%$external} within \c{ctx2} has been fully qualified with
- the proper context depth, \c{%$$external}, and thus is no longer ambiguous,
- unintuitive or erroneous.
- \S{ctxrepl} \i\c{%repl}: \I{renaming contexts}Renaming a Context
- If you need to change the name of the top context on the stack (in
- order, for example, to have it respond differently to \c{%ifctx}),
- you can execute a \c{%pop} followed by a \c{%push}; but this will
- have the side effect of destroying all context-local labels and
- macros associated with the context that was just popped.
- NASM provides the directive \c{%repl}, which \e{replaces} a context
- with a different name, without touching the associated macros and
- labels. So you could replace the destructive code
- \c %pop
- \c %push newname
- with the non-destructive version \c{%repl newname}.
- \S{blockif} Example Use of the \i{Context Stack}: \i{Block IFs}
- This example makes use of almost all the context-stack features,
- including the conditional-assembly construct \i\c{%ifctx}, to
- implement a block IF statement as a set of macros.
- \c %macro if 1
- \c
- \c %push if
- \c j%-1 %$ifnot
- \c
- \c %endmacro
- \c
- \c %macro else 0
- \c
- \c %ifctx if
- \c %repl else
- \c jmp %$ifend
- \c %$ifnot:
- \c %else
- \c %error "expected `if' before `else'"
- \c %endif
- \c
- \c %endmacro
- \c
- \c %macro endif 0
- \c
- \c %ifctx if
- \c %$ifnot:
- \c %pop
- \c %elifctx else
- \c %$ifend:
- \c %pop
- \c %else
- \c %error "expected `if' or `else' before `endif'"
- \c %endif
- \c
- \c %endmacro
- This code is more robust than the \c{REPEAT} and \c{UNTIL} macros
- given in \k{ctxlocal}, because it uses conditional assembly to check
- that the macros are issued in the right order (for example, not
- calling \c{endif} before \c{if}) and issues a \c{%error} if they're
- not.
- In addition, the \c{endif} macro has to be able to cope with the two
- distinct cases of either directly following an \c{if}, or following
- an \c{else}. It achieves this, again, by using conditional assembly
- to do different things depending on whether the context on top of
- the stack is \c{if} or \c{else}.
- The \c{else} macro has to preserve the context on the stack, in
- order to have the \c{%$ifnot} referred to by the \c{if} macro be the
- same as the one defined by the \c{endif} macro, but has to change
- the context's name so that \c{endif} will know there was an
- intervening \c{else}. It does this by the use of \c{%repl}.
- A sample usage of these macros might look like:
- \c cmp ax,bx
- \c
- \c if ae
- \c cmp bx,cx
- \c
- \c if ae
- \c mov ax,cx
- \c else
- \c mov ax,bx
- \c endif
- \c
- \c else
- \c cmp ax,cx
- \c
- \c if ae
- \c mov ax,cx
- \c endif
- \c
- \c endif
- The block-\c{IF} macros handle nesting quite happily, by means of
- pushing another context, describing the inner \c{if}, on top of the
- one describing the outer \c{if}; thus \c{else} and \c{endif} always
- refer to the last unmatched \c{if} or \c{else}.
- \H{stackrel} \i{Stack Relative Preprocessor Directives}
- The following preprocessor directives provide a way to use
- labels to refer to local variables allocated on the stack.
- \b\c{%arg} (see \k{arg})
- \b\c{%stacksize} (see \k{stacksize})
- \b\c{%local} (see \k{local})
- \S{arg} \i\c{%arg} Directive
- The \c{%arg} directive is used to simplify the handling of
- parameters passed on the stack. Stack based parameter passing
- is used by many high level languages, including C, C++ and Pascal.
- While NASM has macros which attempt to duplicate this
- functionality (see \k{16cmacro}), the syntax is not particularly
- convenient to use and is not TASM compatible. Here is an example
- which shows the use of \c{%arg} without any external macros:
- \c some_function:
- \c
- \c %push mycontext ; save the current context
- \c %stacksize large ; tell NASM to use bp
- \c %arg i:word, j_ptr:word
- \c
- \c mov ax,[i]
- \c mov bx,[j_ptr]
- \c add ax,[bx]
- \c ret
- \c
- \c %pop ; restore original context
- This is similar to the procedure defined in \k{16cmacro} and adds
- the value in \c{i} to the value pointed to by \c{j_ptr} and returns the
- sum in the \c{ax} register. See \k{pushpop} for an explanation of
- \c{%push} and \c{%pop} and the use of context stacks.
- \S{stacksize} \i\c{%stacksize} Directive
- The \c{%stacksize} directive is used in conjunction with the
- \c{%arg} (see \k{arg}) and the \c{%local} (see \k{local}) directives.
- It tells NASM the default size to use for subsequent \c{%arg} and
- \c{%local} directives. The \c{%stacksize} directive takes one
- required argument which is one of \c{flat}, \c{flat64}, \c{large} or \c{small}.
- \c %stacksize flat
- This form causes NASM to use stack-based parameter addressing
- relative to \c{ebp} and it assumes that a near form of call was used
- to get to this label (i.e. that \c{eip} is on the stack).
- \c %stacksize flat64
- This form causes NASM to use stack-based parameter addressing
- relative to \c{rbp} and it assumes that a near form of call was used
- to get to this label (i.e. that \c{rip} is on the stack).
- \c %stacksize large
- This form uses \c{bp} to do stack-based parameter addressing and
- assumes that a far form of call was used to get to this address
- (i.e. that \c{ip} and \c{cs} are on the stack).
- \c %stacksize small
- This form also uses \c{bp} to address stack parameters, but it is
- different from \c{large} because it also assumes that the old value
- of \c{bp} is pushed onto the stack (i.e. it expects an \c{ENTER}
- instruction). In other words, it expects that \c{bp}, \c{ip} and
- \c{cs} are on the top of the stack, underneath any local space which
- may have been allocated by \c{ENTER}. This form is probably most
- useful when used in combination with the \c{%local} directive
- (see \k{local}).
- \S{local} \i\c{%local} Directive
- The \c{%local} directive is used to simplify the use of local
- temporary stack variables allocated in a stack frame. Automatic
- local variables in C are an example of this kind of variable. The
- \c{%local} directive is most useful when used with the \c{%stacksize}
- directive (see \k{stacksize}) and is also compatible with the \c{%arg}
- directive (see \k{arg}). It allows simplified reference to variables on the
- stack which have been allocated typically by using the \c{ENTER}
- instruction.
- \# (see \k{insENTER} for a description of that instruction).
- An example of its use is the following:
- \c silly_swap:
- \c
- \c %push mycontext ; save the current context
- \c %stacksize small ; tell NASM to use bp
- \c %assign %$localsize 0 ; see text for explanation
- \c %local old_ax:word, old_dx:word
- \c
- \c enter %$localsize,0 ; see text for explanation
- \c mov [old_ax],ax ; swap ax & bx
- \c mov [old_dx],dx ; and swap dx & cx
- \c mov ax,bx
- \c mov dx,cx
- \c mov bx,[old_ax]
- \c mov cx,[old_dx]
- \c leave ; restore old bp
- \c ret ;
- \c
- \c %pop ; restore original context
- The \c{%$localsize} variable is used internally by the
- \c{%local} directive and \e{must} be defined within the
- current context before the \c{%local} directive may be used.
- Failure to do so will result in one expression syntax error for
- each \c{%local} variable declared. It may then be used in
- the construction of an appropriately sized \c{ENTER} instruction
- as shown in the example.
- \H{pperror} Reporting \i{User-Defined Errors}: \i\c{%error}, \i\c{%warning}, \i\c{%fatal}
- The preprocessor directive \c{%error} will cause NASM to report an
- error if it occurs in assembled code. So if other users are going to
- try to assemble your source files, you can ensure that they define the
- right macros by means of code like this:
- \c %ifdef F1
- \c ; do some setup
- \c %elifdef F2
- \c ; do some different setup
- \c %else
- \c %error "Neither F1 nor F2 was defined."
- \c %endif
- Then any user who fails to understand the way your code is supposed
- to be assembled will be quickly warned of their mistake, rather than
- having to wait until the program crashes on being run and then not
- knowing what went wrong.
- Similarly, \c{%warning} issues a warning, but allows assembly to continue:
- \c %ifdef F1
- \c ; do some setup
- \c %elifdef F2
- \c ; do some different setup
- \c %else
- \c %warning "Neither F1 nor F2 was defined, assuming F1."
- \c %define F1
- \c %endif
- \c{%error} and \c{%warning} are issued only on the final assembly
- pass. This makes them safe to use in conjunction with tests that
- depend on symbol values.
- \c{%fatal} terminates assembly immediately, regardless of pass. This
- is useful when there is no point in continuing the assembly further,
- and doing so is likely just going to cause a spew of confusing error
- messages.
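- As an illustrative sketch, a source file that cannot sensibly be
- assembled without a particular definition might stop at once. The
- macro name \c{TARGET} is hypothetical:
- \c %ifndef TARGET
- \c %fatal "TARGET must be defined, e.g. with -DTARGET=1"
- \c %endif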
- It is optional for the message string after \c{%error}, \c{%warning}
- or \c{%fatal} to be quoted. If it is \e{not}, then single-line macros
- are expanded in it, which can be used to display more information to
- the user. For example:
- \c %if foo > 64
- \c %assign foo_over foo-64
- \c %error foo is foo_over bytes too large
- \c %endif
- \H{otherpreproc} \i{Other Preprocessor Directives}
- \S{line} \i\c{%line} Directive
- The \c{%line} directive is used to notify NASM that the input line
- corresponds to a specific line number in another file. Typically
- this other file would be an original source file, with the current
- NASM input being the output of a pre-processor. The \c{%line}
- directive allows NASM to output messages which indicate the line
- number of the original source file, instead of the file that is being
- read by NASM.
- This preprocessor directive is not generally used directly by
- programmers, but may be of interest to preprocessor authors. The
- usage of the \c{%line} preprocessor directive is as follows:
- \c %line nnn[+mmm] [filename]
- In this directive, \c{nnn} identifies the line of the original source
- file which this line corresponds to. \c{mmm} is an optional parameter
- which specifies a line increment value; each line of the input file
- read in is considered to correspond to \c{mmm} lines of the original
- source file. Finally, \c{filename} is an optional parameter which
- specifies the file name of the original source file.
- After reading a \c{%line} preprocessor directive, NASM will report
- all file name and line numbers relative to the values specified
- therein.
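- A tool that translates some other source language into NASM input
- might therefore emit something like the following sketch, in which the
- file name \c{parser.src}, the line number and the label \c{counter}
- are invented for illustration:
- \c %line 120+1 parser.src
- \c         mov ax,[counter]   ; diagnostics here refer to parser.src
- \c         inc ax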
- If the command line option \i\c{--no-line} is given, all \c{%line}
- directives are ignored. This may be useful for debugging preprocessed
- code. See \k{opt-no-line}.
- \S{getenv} \i\c{%!}\e{variable}: Read an Environment Variable
- The \c{%!}\e{variable} directive makes it possible to read the value of an
- environment variable at assembly time. This could, for example, be used
- to store the contents of an environment variable into a string, which
- could be used at some other point in your code.
- For example, suppose that you have an environment variable \c{FOO},
- and you want the contents of \c{FOO} to be embedded in your program as
- a quoted string. You could do that as follows:
- \c %defstr FOO %!FOO
- See \k{defstr} for notes on the \c{%defstr} directive.
- If the name of the environment variable contains non-identifier
- characters, you can use string quotes to surround the name of the
- variable, for example:
- \c %defstr C_colon %!'C:'
- \H{stdmac} \i{Standard Macros}
- NASM defines a set of standard macros, which are already defined
- when it starts to process any source file. If you really need a
- program to be assembled with no pre-defined macros, you can use the
- \i\c{%clear} directive to empty the preprocessor of everything but
- context-local preprocessor variables and single-line macros.
- Most \i{user-level assembler directives} (see \k{directive}) are
- implemented as macros which invoke primitive directives; these are
- described in \k{directive}. The rest of the standard macro set is
- described here.
- \S{stdmacver} \i{NASM Version} Macros
- The single-line macros \i\c{__NASM_MAJOR__}, \i\c{__NASM_MINOR__},
- \i\c{__NASM_SUBMINOR__} and \i\c{__NASM_PATCHLEVEL__} expand to the
- major, minor, subminor and patch level parts of the \i{version
- number of NASM} being used. So, under NASM 0.98.32p1 for
- example, \c{__NASM_MAJOR__} would be defined to be 0, \c{__NASM_MINOR__}
- would be defined as 98, \c{__NASM_SUBMINOR__} would be defined as 32,
- and \c{__NASM_PATCHLEVEL__} would be defined as 1.
- Additionally, the macro \i\c{__NASM_SNAPSHOT__} is defined for
- automatically generated snapshot releases \e{only}.
- \S{stdmacverid} \i\c{__NASM_VERSION_ID__}: \i{NASM Version ID}
- The single-line macro \c{__NASM_VERSION_ID__} expands to a dword integer
- representing the full version number of the version of NASM being used.
- The value is equivalent to \c{__NASM_MAJOR__}, \c{__NASM_MINOR__},
- \c{__NASM_SUBMINOR__} and \c{__NASM_PATCHLEVEL__} concatenated to
- produce a single doubleword. Hence, for 0.98.32p1, the returned number
- would be equivalent to:
- \c dd 0x00622001
- or
- \c db 1,32,98,0
- Note that the above lines generate exactly the same code; the second
- line is used just to give an indication of the order in which the separate
- values will be present in memory.
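- Because the value is a plain integer, it can be compared numerically.
- As a sketch, a source file that relies on features of NASM 2.09
- (major 2, minor 9, hence \c{0x02090000}) might guard itself like this:
- \c %if __NASM_VERSION_ID__ < 0x02090000
- \c %error "this file requires NASM 2.09 or later"
- \c %endif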
- \S{stdmacverstr} \i\c{__NASM_VER__}: \i{NASM Version string}
- The single-line macro \c{__NASM_VER__} expands to a string which defines
- the version number of NASM being used. So, under NASM 0.98.32 for example,
- \c db __NASM_VER__
- would expand to
- \c db "0.98.32"
- \S{fileline} \i\c{__FILE__} and \i\c{__LINE__}: File Name and Line Number
- Like the C preprocessor, NASM allows the user to find out the file
- name and line number containing the current instruction. The macro
- \c{__FILE__} expands to a string constant giving the name of the
- current input file (which may change through the course of assembly
- if \c{%include} directives are used), and \c{__LINE__} expands to a
- numeric constant giving the current line number in the input file.
- These macros could be used, for example, to communicate debugging
- information to a macro, since invoking \c{__LINE__} inside a macro
- definition (either single-line or multi-line) will return the line
- number of the macro \e{call}, rather than \e{definition}. So to
- determine where in a piece of code a crash is occurring, for
- example, one could write a routine \c{stillhere}, which is passed a
- line number in \c{EAX} and outputs something like `line 155: still
- here'. You could then write a macro
- \c %macro notdeadyet 0
- \c
- \c push eax
- \c mov eax,__LINE__
- \c call stillhere
- \c pop eax
- \c
- \c %endmacro
- and then pepper your code with calls to \c{notdeadyet} until you
- find the crash point.
- \S{bitsm} \i\c{__BITS__}: Current BITS Mode
- The \c{__BITS__} standard macro is updated every time that the BITS mode is
- set using the \c{BITS XX} or \c{[BITS XX]} directive, where XX is a valid mode
- number of 16, 32 or 64. \c{__BITS__} receives the specified mode number and
- makes it globally available. This can be very useful for those who utilize
- mode-dependent macros.
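- For example, a mode-dependent definition might be sketched as follows,
- where \c{PTR_SIZE} is a hypothetical macro name chosen for this
- illustration:
- \c %if __BITS__ == 64
- \c %define PTR_SIZE 8
- \c %elif __BITS__ == 32
- \c %define PTR_SIZE 4
- \c %else
- \c %define PTR_SIZE 2
- \c %endif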
- \S{ofmtm} \i\c{__OUTPUT_FORMAT__}: Current Output Format
- The \c{__OUTPUT_FORMAT__} standard macro holds the current output
- format name, as given by the \c{-f} option or NASM's default. Type
- \c{nasm -hf} for a list.
- \c %ifidn __OUTPUT_FORMAT__, win32
- \c %define NEWLINE 13, 10
- \c %elifidn __OUTPUT_FORMAT__, elf32
- \c %define NEWLINE 10
- \c %endif
- \S{dfmtm} \i\c{__DEBUG_FORMAT__}: Current Debug Format
- If debugging information generation is enabled, the
- \c{__DEBUG_FORMAT__} standard macro holds the current debug format
- name as specified by the \c{-F} or \c{-g} option or the output format
- default. Type \c{nasm -f} \e{output} \c{y} for a list.
- \c{__DEBUG_FORMAT__} is not defined if debugging is not enabled, or if
- the debug format specified is \c{null}.
- \S{datetime} Assembly Date and Time Macros
- NASM provides a variety of macros that represent the timestamp of the
- assembly session.
- \b The \i\c{__DATE__} and \i\c{__TIME__} macros give the assembly date and
- time as strings, in ISO 8601 format (\c{"YYYY-MM-DD"} and \c{"HH:MM:SS"},
- respectively.)
- \b The \i\c{__DATE_NUM__} and \i\c{__TIME_NUM__} macros give the assembly
- date and time in numeric form; in the format \c{YYYYMMDD} and
- \c{HHMMSS} respectively.
- \b The \i\c{__UTC_DATE__} and \i\c{__UTC_TIME__} macros give the assembly
- date and time in universal time (UTC) as strings, in ISO 8601 format
- (\c{"YYYY-MM-DD"} and \c{"HH:MM:SS"}, respectively.) If the host
- platform doesn't provide UTC time, these macros are undefined.
- \b The \i\c{__UTC_DATE_NUM__} and \i\c{__UTC_TIME_NUM__} macros give the
- assembly date and time in universal time (UTC) in numeric form; in the
- format \c{YYYYMMDD} and \c{HHMMSS} respectively. If the
- host platform doesn't provide UTC time, these macros are
- undefined.
- \b The \c{__POSIX_TIME__} macro is defined as a number containing the
- number of seconds since the POSIX epoch, 1 January 1970 00:00:00 UTC;
- excluding any leap seconds. This is computed using UTC time if
- available on the host platform, otherwise it is computed using the
- local time as if it were UTC.
- All instances of time and date macros in the same assembly session
- produce consistent output. For example, in an assembly session
- started at 42 seconds after midnight on January 1, 2010 in Moscow
- (timezone UTC+3) these macros would have the following values,
- assuming, of course, a properly configured environment with a correct
- clock:
- \c __DATE__ "2010-01-01"
- \c __TIME__ "00:00:42"
- \c __DATE_NUM__ 20100101
- \c __TIME_NUM__ 000042
- \c __UTC_DATE__ "2009-12-31"
- \c __UTC_TIME__ "21:00:42"
- \c __UTC_DATE_NUM__ 20091231
- \c __UTC_TIME_NUM__ 210042
- \c __POSIX_TIME__ 1262293242
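- One possible use, sketched here with illustrative label names, is to
- embed a build stamp in the output:
- \c build_stamp: db __DATE__, " ", __TIME__, 0
- \c build_epoch: dq __POSIX_TIME__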
- \S{use_def} \I\c{__USE_*__}\c{__USE_}\e{package}\c{__}: Package
- Include Test
- When a standard macro package (see \k{macropkg}) is included with the
- \c{%use} directive (see \k{use}), a single-line macro of the form
- \c{__USE_}\e{package}\c{__} is automatically defined. This allows
- testing if a particular package is invoked or not.
- For example, if the \c{altreg} package is included (see
- \k{pkg_altreg}), then the macro \c{__USE_ALTREG__} is defined.
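- Code that may be assembled with or without the package can test for
- it, as in this sketch:
- \c %ifdef __USE_ALTREG__
- \c         mov r0l,r3h     ; alternate names are available
- \c %else
- \c         mov al,bh       ; fall back to the standard names
- \c %endif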
- \S{pass_macro} \i\c{__PASS__}: Assembly Pass
- The macro \c{__PASS__} is defined to be \c{1} on preparatory passes,
- and \c{2} on the final pass. In preprocess-only mode, it is set to
- \c{3}, and when running only to generate dependencies (due to the
- \c{-M} or \c{-MG} option, see \k{opt-M}) it is set to \c{0}.
- \e{Avoid using this macro if at all possible. It is tremendously easy
- to generate very strange errors by misusing it, and the semantics may
- change in future versions of NASM.}
- \S{struc} \i\c{STRUC} and \i\c{ENDSTRUC}: \i{Declaring Structure} Data Types
- The core of NASM contains no intrinsic means of defining data
- structures; instead, the preprocessor is sufficiently powerful that
- data structures can be implemented as a set of macros. The macros
- \c{STRUC} and \c{ENDSTRUC} are used to define a structure data type.
- \c{STRUC} takes one or two parameters. The first parameter is the name
- of the data type. The second, optional parameter is the base offset of
- the structure. The name of the data type is defined as a symbol with
- the value of the base offset, and the name of the data type with the
- suffix \c{_size} appended to it is defined as an \c{EQU} giving the
- size of the structure. Once \c{STRUC} has been issued, you are
- defining the structure, and should define fields using the \c{RESB}
- family of pseudo-instructions, and then invoke \c{ENDSTRUC} to finish
- the definition.
- For example, to define a structure called \c{mytype} containing a
- longword, a word, a byte and a string of bytes, you might code
- \c struc mytype
- \c
- \c mt_long: resd 1
- \c mt_word: resw 1
- \c mt_byte: resb 1
- \c mt_str: resb 32
- \c
- \c endstruc
- The above code defines six symbols: \c{mt_long} as 0 (the offset
- from the beginning of a \c{mytype} structure to the longword field),
- \c{mt_word} as 4, \c{mt_byte} as 6, \c{mt_str} as 7, \c{mytype_size}
- as 39, and \c{mytype} itself as zero.
- The reason why the structure type name is defined at zero by default
- is a side effect of allowing structures to work with the local label
- mechanism: if your structure members tend to have the same names in
- more than one structure, you can define the above structure like this:
- \c struc mytype
- \c
- \c .long: resd 1
- \c .word: resw 1
- \c .byte: resb 1
- \c .str: resb 32
- \c
- \c endstruc
- This defines the offsets to the structure fields as \c{mytype.long},
- \c{mytype.word}, \c{mytype.byte} and \c{mytype.str}.
- NASM, since it has no \e{intrinsic} structure support, does not
- support any form of period notation to refer to the elements of a
- structure once you have one (except the above local-label notation),
- so code such as \c{mov ax,[mystruc.mt_word]} is not valid.
- \c{mt_word} is a constant just like any other constant, so the
- correct syntax is \c{mov ax,[mystruc+mt_word]} or \c{mov
- ax,[mystruc+mytype.word]}.
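- As a sketch of how these constants combine, space for one instance
- can be reserved using \c{mytype_size}, and a field then addressed by
- adding its offset. The label \c{mystruc} is assumed here to name
- uninitialised storage, and the field name is that of the first
- definition above:
- \c section .bss
- \c mystruc: resb mytype_size
- \c
- \c section .text
- \c         mov ax,[mystruc+mt_word]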
- Sometimes you only have the address of the structure displaced by an
- offset. For example, consider this standard stack frame setup:
- \c push ebp
- \c mov ebp, esp
- \c sub esp, 40
- In this case, you could access an element by subtracting the offset:
- \c mov [ebp - 40 + mytype.word], ax
- However, if you do not want to repeat this offset, you can use -40 as
- a base offset:
- \c struc mytype, -40
- And access an element this way:
- \c mov [ebp + mytype.word], ax
- \S{istruc} \i\c{ISTRUC}, \i\c{AT} and \i\c{IEND}: Declaring
- \i{Instances of Structures}
- Having defined a structure type, the next thing you typically want
- to do is to declare instances of that structure in your data
- segment. NASM provides an easy way to do this in the \c{ISTRUC}
- mechanism. To declare a structure of type \c{mytype} in a program,
- you code something like this:
- \c mystruc:
- \c istruc mytype
- \c
- \c at mt_long, dd 123456
- \c at mt_word, dw 1024
- \c at mt_byte, db 'x'
- \c at mt_str, db 'hello, world', 13, 10, 0
- \c
- \c iend
- The function of the \c{AT} macro is to make use of the \c{TIMES}
- prefix to advance the assembly position to the correct point for the
- specified structure field, and then to declare the specified data.
- Therefore the structure fields must be declared in the same order as
- they were specified in the structure definition.
- If the data to go in a structure field requires more than one source
- line to specify, the remaining source lines can easily come after
- the \c{AT} line. For example:
- \c at mt_str, db 123,134,145,156,167,178,189
- \c db 190,100,0
- Depending on personal taste, you can also omit the code part of the
- \c{AT} line completely, and start the structure field on the next
- line:
- \c at mt_str
- \c db 'hello, world'
- \c db 13,10,0
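- The local-label field names from \k{struc} can be used with \c{AT}
- in the same way. The following sketch assumes the variant of
- \c{mytype} that was defined with the fields \c{.long}, \c{.word},
- \c{.byte} and \c{.str}:
- \c mystruc:
- \c     istruc mytype
- \c
- \c         at mytype.long, dd 123456
- \c         at mytype.word, dw 1024
- \c         at mytype.byte, db 'x'
- \c         at mytype.str,  db 'hello, world', 13, 10, 0
- \c
- \c     iend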
- \S{align} \i\c{ALIGN} and \i\c{ALIGNB}: Data Alignment
- The \c{ALIGN} and \c{ALIGNB} macros provide a convenient way to
- align code or data on a word, longword, paragraph or other boundary.
- (Some assemblers call this directive \i\c{EVEN}.) The syntax of the
- \c{ALIGN} and \c{ALIGNB} macros is
- \c align 4 ; align on 4-byte boundary
- \c align 16 ; align on 16-byte boundary
- \c align 8,db 0 ; pad with 0s rather than NOPs
- \c align 4,resb 1 ; align to 4 in the BSS
- \c alignb 4 ; equivalent to previous line
- Both macros require their first argument to be a power of two; they
- both compute the number of additional bytes required to bring the
- length of the current section up to a multiple of that power of two,
- and then apply the \c{TIMES} prefix to their second argument to
- perform the alignment.
- If the second argument is not specified, the default for \c{ALIGN}
- is \c{NOP}, and the default for \c{ALIGNB} is \c{RESB 1}. So if the
- second argument is specified, the two macros are equivalent.
- Normally, you can just use \c{ALIGN} in code and data sections and
- \c{ALIGNB} in BSS sections, and never need the second argument
- except for special purposes.
- \c{ALIGN} and \c{ALIGNB}, being simple macros, perform no error
- checking: they cannot warn you if their first argument fails to be a
- power of two, or if their second argument generates more than one
- byte of code. In each of these cases they will silently do the wrong
- thing.
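- If such checking is wanted, a wrapper macro can supply it. The
- following is only a sketch: \c{checked_align} is a hypothetical
- macro, not part of NASM, and it validates only the first argument.
- \c %macro checked_align 1
- \c %if ((%1) <= 0) || ((%1) & ((%1) - 1))
- \c %error "alignment must be a positive power of two"
- \c %endif
- \c         align %1
- \c %endmacro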
- \c{ALIGNB} (or \c{ALIGN} with a second argument of \c{RESB 1}) can
- be used within structure definitions:
- \c struc mytype2
- \c
- \c mt_byte:
- \c resb 1
- \c alignb 2
- \c mt_word:
- \c resw 1
- \c alignb 4
- \c mt_long:
- \c resd 1
- \c mt_str:
- \c resb 32
- \c
- \c endstruc
- This will ensure that the structure members are sensibly aligned
- relative to the base of the structure.
- A final caveat: \c{ALIGN} and \c{ALIGNB} work relative to the
- beginning of the \e{section}, not the beginning of the address space
- in the final executable. Aligning to a 16-byte boundary when the
- section you're in is only guaranteed to be aligned to a 4-byte
- boundary, for example, is a waste of effort. Again, NASM does not
- check that the section's alignment characteristics are sensible for
- the use of \c{ALIGN} or \c{ALIGNB}.
- Both \c{ALIGN} and \c{ALIGNB} call the \c{SECTALIGN} macro implicitly.
- See \k{sectalign} for details.
- See also the \c{smartalign} standard macro package, \k{pkg_smartalign}.
- \S{sectalign} \i\c{SECTALIGN}: Section Alignment
- The \c{SECTALIGN} macro provides a way to modify the alignment attribute
- of an output file section. Unlike the \c{align=} attribute (which is allowed
- at section definition only), the \c{SECTALIGN} macro may be used at any time.
- For example the directive
- \c SECTALIGN 16
- sets the section alignment requirement to 16 bytes. Once increased, it
- cannot be decreased; the alignment may only grow.
- Note that \c{ALIGN} (see \k{align}) calls the \c{SECTALIGN} macro implicitly,
- so the active section alignment requirements may be updated. This is the default
- behaviour; if for some reason you do not want \c{ALIGN} to call \c{SECTALIGN}
- at all, use the directive
- \c SECTALIGN OFF
- It is still possible to turn it on again with
- \c SECTALIGN ON
- \C{macropkg} \i{Standard Macro Packages}
- The \i\c{%use} directive (see \k{use}) includes one of the standard
- macro packages included with the NASM distribution and compiled into
- the NASM binary. It operates like the \c{%include} directive (see
- \k{include}), but the included contents is provided by NASM itself.
- The names of standard macro packages are case insensitive, and can be
- quoted or not.
- \H{pkg_altreg} \i\c{altreg}: \i{Alternate Register Names}
- The \c{altreg} standard macro package provides alternate register
- names. It provides numeric register names for all registers (not just
- \c{R8}-\c{R15}), the Intel-defined aliases \c{R8L}-\c{R15L} for the
- low bytes of registers (as opposed to the NASM/AMD standard names
- \c{R8B}-\c{R15B}), and the names \c{R0H}-\c{R3H} (by analogy with
- \c{R0L}-\c{R3L}) for \c{AH}, \c{CH}, \c{DH}, and \c{BH}.
- Example use:
- \c %use altreg
- \c
- \c proc:
- \c mov r0l,r3h ; mov al,bh
- \c ret
- See also \k{reg64}.
- \H{pkg_smartalign} \i\c{smartalign}\I{align, smart}: Smart \c{ALIGN} Macro
- The \c{smartalign} standard macro package provides for an \i\c{ALIGN}
- macro which is more powerful than the default (and
- backwards-compatible) one (see \k{align}). When the \c{smartalign}
- package is enabled and \c{ALIGN} is used without a second argument,
- NASM will generate a sequence of instructions more efficient than a
- series of \c{NOP}. Furthermore, if the padding exceeds a specific
- threshold, then NASM will generate a jump over the entire padding
- sequence.
- The specific instructions generated can be controlled with the
- new \i\c{ALIGNMODE} macro. This macro takes two parameters: a mode
- and an optional jump threshold override. If for any reason you need
- to turn off the jump completely, set the jump threshold to -1
- (or to \c{nojmp}). The following modes are possible:
- \b \c{generic}: Works on all x86 CPUs and should have reasonable
- performance. The default jump threshold is 8. This is the
- default.
- \b \c{nop}: Pad out with \c{NOP} instructions. The only difference
- compared to the standard \c{ALIGN} macro is that NASM can still jump
- over a large padding area. The default jump threshold is 16.
- \b \c{k7}: Optimize for the AMD K7 (Athlon/Athlon XP). These
- instructions should still work on all x86 CPUs. The default jump
- threshold is 16.
- \b \c{k8}: Optimize for the AMD K8 (Opteron/Athlon 64). These
- instructions should still work on all x86 CPUs. The default jump
- threshold is 16.
- \b \c{p6}: Optimize for Intel CPUs. This uses the long \c{NOP}
- instructions first introduced in Pentium Pro. This is incompatible
- with all CPUs of family 5 or lower, as well as some VIA CPUs and
- several virtualization solutions. The default jump threshold is 16.
- The macro \i\c{__ALIGNMODE__} is defined to contain the current
- alignment mode. A number of other macros beginning with \c{__ALIGN_}
- are used internally by this macro package.
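- The following sketch shows how \c{ALIGNMODE} might be combined with
- \c{ALIGN}. The labels and the threshold value of 32 are illustrative
- only and are not taken from the package itself:
- \c %use smartalign
- \c
- \c         alignmode p6, 32          ; long NOPs, jump threshold set to 32
- \c         align 16
- \c hot_loop:
- \c         dec ecx
- \c         jnz hot_loop
- \c
- \c         alignmode generic, nojmp  ; portable padding, never jump over it
- \c         align 16
- \c done:
- \c         ret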
- \H{pkg_fp} \i\c{fp}: Floating-point macros
- This package contains the following floating-point convenience macros:
- \c %define Inf __Infinity__
- \c %define NaN __QNaN__
- \c %define QNaN __QNaN__
- \c %define SNaN __SNaN__
- \c
- \c %define float8(x) __float8__(x)
- \c %define float16(x) __float16__(x)
- \c %define float32(x) __float32__(x)
- \c %define float64(x) __float64__(x)
- \c %define float80m(x) __float80m__(x)
- \c %define float80e(x) __float80e__(x)
- \c %define float128l(x) __float128l__(x)
- \c %define float128h(x) __float128h__(x)
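- A short usage sketch, assuming only the definitions listed above; the
- constants are arbitrary:
- \c %use fp
- \c
- \c         mov eax, float32(1.5)   ; same as mov eax, __float32__(1.5)
- \c         dq -Inf                 ; same as dq -__Infinity__
- \c         dd QNaN                 ; same as dd __QNaN__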
- \H{pkg_ifunc} \i\c{ifunc}: \i{Integer functions}
- This package contains a set of macros which implement integer
- functions. These are actually implemented as special operators, but
- are most conveniently accessed via this macro package.
- The macros provided are:
- \S{ilog2} \i{Integer logarithms}
- These functions calculate the integer logarithm base 2 of their
- argument, considered as an unsigned integer. The only difference
- between the functions is their respective behavior if the argument
- provided is not a power of two.
- The function \i\c{ilog2e()} (alias \i\c{ilog2()}) generates an error if
- the argument is not a power of two.
- The function \i\c{ilog2f()} rounds the argument down to the nearest
- power of two; if the argument is zero it returns zero.
- The function \i\c{ilog2c()} rounds the argument up to the nearest
- power of two.
- The functions \i\c{ilog2fw()} (alias \i\c{ilog2w()}) and
- \i\c{ilog2cw()} generate a warning if the argument is not a power of
- two, but otherwise behave like \c{ilog2f()} and \c{ilog2c()},
- respectively.
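- The differences between the variants can be illustrated with a small
- sketch; \c{PAGE_SIZE} and \c{PAGE_SHIFT} are hypothetical names used
- only for this example:
- \c %use ifunc
- \c
- \c PAGE_SIZE  equ 4096
- \c PAGE_SHIFT equ ilog2e(PAGE_SIZE)  ; 12; an error if not a power of two
- \c
- \c         mov al, ilog2f(1000)      ; 9, since 512 <= 1000
- \c         mov ah, ilog2c(1000)      ; 10, since 1000 <= 1024
- \c         mov bl, ilog2fw(1000)     ; 9, and a warning is issued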
- \C{directive} \i{Assembler Directives}
- NASM, though it attempts to avoid the bureaucracy of assemblers like
- MASM and TASM, is nevertheless forced to support a \e{few}
- directives. These are described in this chapter.
- NASM's directives come in two types: \I{user-level
- directives}\e{user-level} directives and \I{primitive
- directives}\e{primitive} directives. Typically, each directive has a
- user-level form and a primitive form. In almost all cases, we
- recommend that users use the user-level forms of the directives,
- which are implemented as macros which call the primitive forms.
- Primitive directives are enclosed in square brackets; user-level
- directives are not.
- In addition to the universal directives described in this chapter,
- each object file format can optionally supply extra directives in
- order to control particular features of that file format. These
- \I{format-specific directives}\e{format-specific} directives are
- documented along with the formats that implement them, in \k{outfmt}.
- \H{bits} \i\c{BITS}: Specifying Target \i{Processor Mode}
- The \c{BITS} directive specifies whether NASM should generate code
- \I{16-bit mode, versus 32-bit mode}designed to run on a processor
- operating in 16-bit mode, 32-bit mode or 64-bit mode. The syntax is
- \c{BITS XX}, where XX is 16, 32 or 64.
- In most cases, you should not need to use \c{BITS} explicitly. The
- \c{aout}, \c{coff}, \c{elf}, \c{macho}, \c{win32} and \c{win64}
- object formats, which are designed for use in 32-bit or 64-bit
- operating systems, all cause NASM to select 32-bit or 64-bit mode,
- respectively, by default. The \c{obj} object format allows you
- to specify each segment you define as either \c{USE16} or \c{USE32},
- and NASM will set its operating mode accordingly, so the use of the
- \c{BITS} directive is once again unnecessary.
- The most likely reason for using the \c{BITS} directive is to write
- 32-bit or 64-bit code in a flat binary file; this is because the \c{bin}
- output format defaults to 16-bit mode in anticipation of it being
- used most frequently to write DOS \c{.COM} programs, DOS \c{.SYS}
- device drivers and boot loader software.
- The \c{BITS} directive can also be used to generate code for a
- different mode than the standard one for the output format.
- You do \e{not} need to specify \c{BITS 32} merely in order to use
- 32-bit instructions in a 16-bit DOS program; if you do, the
- assembler will generate incorrect code because it will be writing
- code targeted at a 32-bit platform, to be run on a 16-bit one.
- When NASM is in \c{BITS 16} mode, instructions which use 32-bit
- data are prefixed with an 0x66 byte, and those referring to 32-bit
- addresses have an 0x67 prefix. In \c{BITS 32} mode, the reverse is
- true: 32-bit instructions require no prefixes, whereas instructions
- using 16-bit data need an 0x66 and those working on 16-bit addresses
- need an 0x67.
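- As an illustration of the prefix rules just described, the sketch
- below assembles the same register and memory forms in both modes. The
- byte values in the comments are the encodings one would expect from
- these rules:
- \c         bits 16
- \c         mov ax, bx         ; 89 D8
- \c         mov eax, ebx       ; 66 89 D8 (32-bit data)
- \c         mov ax, [ebx]      ; 67 8B 03 (32-bit address)
- \c
- \c         bits 32
- \c         mov eax, ebx       ; 89 D8
- \c         mov ax, bx         ; 66 89 D8 (16-bit data)
- \c         mov eax, [bx]      ; 67 8B 07 (16-bit address)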
- When NASM is in \c{BITS 64} mode, most instructions operate the same
- as they do for \c{BITS 32} mode. However, there are 8 more general and
- SSE registers, and 16-bit addressing is no longer supported.
- The default address size is 64 bits; 32-bit addressing can be selected
- with the 0x67 prefix. The default operand size is still 32 bits,
- however, and the 0x66 prefix selects 16-bit operand size. The \c{REX}
- prefix is used both to select 64-bit operand size, and to access the
- new registers. NASM automatically inserts REX prefixes when
- necessary.
- When the \c{REX} prefix is used, the processor does not know how to
- address the AH, BH, CH or DH (high 8-bit legacy) registers. Instead,
- it is possible to access the low 8 bits of the SP, BP, SI and DI
- registers as SPL, BPL, SIL and DIL, respectively; but only when the
- REX prefix is used.
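- A brief sketch of the consequence for 64-bit code; the last line is
- commented out because it is expected to be rejected:
- \c         bits 64
- \c         mov al, ah         ; no REX prefix needed, so AH is usable
- \c         mov sil, al        ; SIL requires a REX prefix
- \c         mov r8b, al        ; so do all of the new registers
- \c ;       mov sil, ah        ; cannot be encoded: REX makes AH unreachable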
- The \c{BITS} directive has an exactly equivalent primitive form,
- \c{[BITS 16]}, \c{[BITS 32]} and \c{[BITS 64]}. The user-level form is
- a macro which has no function other than to call the primitive form.
- Note that the space is necessary, e.g. \c{BITS32} will \e{not} work!
- \S{USE16 & USE32} \i\c{USE16} & \i\c{USE32}: Aliases for BITS
- The `\c{USE16}' and `\c{USE32}' directives can be used in place of
- `\c{BITS 16}' and `\c{BITS 32}', for compatibility with other assemblers.
- \H{default} \i\c{DEFAULT}: Change the assembler defaults
- The \c{DEFAULT} directive changes the assembler defaults. Normally,
- NASM defaults to a mode where the programmer is expected to explicitly
- specify most features directly. However, this is occasionally
- obnoxious, as the explicit form is pretty much the only one that one wishes
- to use.
- Currently, \c{DEFAULT} can set \c{REL} & \c{ABS} and \c{BND} & \c{NOBND}.
- \S{REL & ABS} \i\c{REL} & \i\c{ABS}: RIP-relative addressing
- This sets whether registerless instructions in 64-bit mode are \c{RIP}-relative
- or not. By default, they are absolute unless overridden with the \i\c{REL}
- specifier (see \k{effaddr}). However, if \c{DEFAULT REL} is
- specified, \c{REL} is default, unless overridden with the \c{ABS}
- specifier, \e{except when used with an FS or GS segment override}.
- The special handling of \c{FS} and \c{GS} overrides is due to the
- fact that these registers are generally used as thread pointers or
- other special functions in 64-bit mode, and generating
- \c{RIP}-relative addresses would be extremely confusing.
- \c{DEFAULT REL} is disabled with \c{DEFAULT ABS}.
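- The interaction of the directive with the \c{REL} and \c{ABS}
- specifiers might be sketched as follows; \c{foo} is a hypothetical
- label:
- \c         bits 64
- \c         default rel
- \c         mov eax, [foo]      ; RIP-relative
- \c         mov eax, [abs foo]  ; absolute, overridden explicitly
- \c         mov eax, [fs:foo]   ; absolute, because of the FS override
- \c
- \c         default abs
- \c         mov eax, [foo]      ; absolute again
- \c         mov eax, [rel foo]  ; RIP-relative, overridden explicitly
- \c
- \c foo:    dd 0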
- \S{BND & NOBND} \i\c{BND} & \i\c{NOBND}: \c{BND} prefix
- If \c{DEFAULT BND} is set, all instructions following this directive that
- can take a \c{BND} prefix are given one. To override this for a single
- instruction, use the \c{NOBND} prefix.
- \c DEFAULT BND
- \c call foo ; BND will be prefixed
- \c nobnd call foo ; BND will NOT be prefixed
- \c{DEFAULT NOBND} disables \c{DEFAULT BND}; the \c{BND} prefix is then
- added only when explicitly specified in the code.
- \c{DEFAULT BND} is expected to be the normal configuration for writing
- MPX-enabled code.
- \H{section} \i\c{SECTION} or \i\c{SEGMENT}: Changing and \i{Defining
- Sections}
- \I{changing sections}\I{switching between sections}The \c{SECTION}
- directive (\c{SEGMENT} is an exactly equivalent synonym) changes
- which section of the output file the code you write will be
- assembled into. In some object file formats, the number and names of
- sections are fixed; in others, the user may make up as many as they
- wish. Hence \c{SECTION} may sometimes give an error message, or may
- define a new section, if you try to switch to a section that does
- not (yet) exist.
- The Unix object formats, and the \c{bin} object format (but see
- \k{multisec}), all support
- the \i{standardized section names} \c{.text}, \c{.data} and \c{.bss}
- for the code, data and uninitialized-data sections. The \c{obj}
- format, by contrast, does not recognize these section names as being
- special, and indeed will strip off the leading period of any section
- name that has one.
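- A minimal sketch of switching between the standardized sections,
- assuming a 32-bit Unix object format; all labels are hypothetical:
- \c section .data
- \c msg:    db 'hello', 13, 10
- \c msglen  equ $ - msg
- \c
- \c section .bss
- \c buffer: resb 64
- \c
- \c section .text
- \c start:  mov esi, msg
- \c         mov edi, buffer
- \c         mov ecx, msglen
- \c         rep movsb
- \c
- \c section .data           ; switch back and append more data
- \c more:   db 0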
- \S{sectmac} The \i\c{__SECT__} Macro
- The \c{SECTION} directive is unusual in that its user-level form
- functions differently from its primitive form. The primitive form,
- \c{[SECTION xyz]}, simply switches the current target section to the
- one given. The user-level form, \c{SECTION xyz}, however, first
- defines the single-line macro \c{__SECT__} to be the primitive
- \c{[SECTION]} directive which it is about to issue, and then issues
- it. So the user-level directive
- \c SECTION .text
- expands to the two lines
- \c %define __SECT__ [SECTION .text]
- \c [SECTION .text]
- Users may find it useful to make use of this in their own macros.
- For example, the \c{writefile} macro defined in \k{mlmacgre} can be
- usefully rewritten in the following more sophisticated form:
- \c %macro writefile 2+
- \c
- \c [section .data]
- \c
- \c %%str: db %2
- \c %%endstr:
- \c
- \c __SECT__
- \c
- \c mov dx,%%str
- \c mov cx,%%endstr-%%str
- \c mov bx,%1
- \c mov ah,0x40
- \c int 0x21
- \c
- \c %endmacro
- This form of the macro, once passed a string to output, first
- switches temporarily to the data section of the file, using the
- primitive form of the \c{SECTION} directive so as not to modify
- \c{__SECT__}. It then declares its string in the data section, and
- then invokes \c{__SECT__} to switch back to \e{whichever} section
- the user was previously working in. It thus avoids the need, in the
- previous version of the macro, to include a \c{JMP} instruction to
- jump over the data, and also does not fail if, in a complicated
- \c{OBJ} format module, the user could potentially be assembling the
- code in any of several separate code sections.
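- Assuming the macro above has been defined, and that \c{filehandle} is
- a hypothetical word variable holding an open DOS file handle, an
- invocation might look like this:
- \c section .text
- \c
- \c         writefile [filehandle], "hello, world", 13, 10
- \c         ; assembly continues in .text here, not in .data
- \c         ret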
- \H{absolute} \i\c{ABSOLUTE}: Defining Absolute Labels
- The \c{ABSOLUTE} directive can be thought of as an alternative form
- of \c{SECTION}: it causes the subsequent code to be directed at no
- physical section, but at the hypothetical section starting at the
- given absolute address. The only instructions you can use in this
- mode are the \c{RESB} family.
- \c{ABSOLUTE} is used as follows:
- \c absolute 0x1A
- \c
- \c kbuf_chr resw 1
- \c kbuf_free resw 1
- \c kbuf resw 16
- This example describes a section of the PC BIOS data area, at
- segment address 0x40: the above code defines \c{kbuf_chr} to be
- 0x1A, \c{kbuf_free} to be 0x1C, and \c{kbuf} to be 0x1E.
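- Labels defined this way can then be used as plain offsets. A sketch,
- assuming 16-bit code and that assembly has been switched back to a real
- section first:
- \c section .text
- \c
- \c         mov ax, 0x40
- \c         mov es, ax
- \c         mov bx, [es:kbuf_chr]   ; reads the word at 0040:001A
- \c         cmp bx, [es:kbuf_free]  ; compares it with the word at 0040:001C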
- The user-level form of \c{ABSOLUTE}, like that of \c{SECTION},
- redefines the \i\c{__SECT__} macro when it is invoked.
- \i\c{STRUC} and \i\c{ENDSTRUC} are defined as macros which use
- \c{ABSOLUTE} (and also \c{__SECT__}).
- \c{ABSOLUTE} doesn't have to take an absolute constant as an
- argument: it can take an expression (actually, a \i{critical
- expression}: see \k{crit}) and it can be a value in a segment. For
- example, a TSR can re-use its setup code as run-time BSS like this:
- \c org 100h ; it's a .COM program
- \c
- \c jmp setup ; setup code comes last
- \c
- \c ; the resident part of the TSR goes here
- \c setup:
- \c ; now write the code that installs the TSR here
- \c
- \c absolute setup
- \c
- \c runtimevar1 resw 1
- \c runtimevar2 resd 20
- \c
- \c tsr_end:
- This defines some variables `on top of' the setup code, so that
- after the setup has finished running, the space it took up can be
- re-used as data storage for the running TSR. The symbol `tsr_end'
- can be used to calculate the total size of the part of the TSR that
- needs to be made resident.
- \H{extern} \i\c{EXTERN}: \i{Importing Symbols} from Other Modules
- \c{EXTERN} is similar to the MASM directive \c{EXTRN} and the C
- keyword \c{extern}: it is used to declare a symbol which is not
- defined anywhere in the module being assembled, but is assumed to be
- defined in some other module and needs to be referred to by this
- one. Not every object-file format can support external variables:
- the \c{bin} format cannot.
- The \c{EXTERN} directive takes as many arguments as you like. Each
- argument is the name of a symbol:
- \c extern _printf
- \c extern _sscanf,_fscanf
- Some object-file formats provide extra features to the \c{EXTERN}
- directive. In all cases, the extra features are used by suffixing a
- colon to the symbol name followed by object-format specific text.
- For example, the \c{obj} format allows you to declare that the
- default segment base of an external should be the group \c{dgroup}
- by means of the directive
- \c extern _variable:wrt dgroup
- The primitive form of \c{EXTERN} differs from the user-level form
- only in that it can take only one argument at a time: the support
- for multiple arguments is implemented at the preprocessor level.
- You can declare the same variable as \c{EXTERN} more than once: NASM
- will quietly ignore the second and later redeclarations.
- If a variable is declared both \c{GLOBAL} and \c{EXTERN}, or if it is
- declared as \c{EXTERN} and then defined, it will be treated as
- \c{GLOBAL}. If a variable is declared both as \c{COMMON} and
- \c{EXTERN}, it will be treated as \c{COMMON}.
- \H{global} \i\c{GLOBAL}: \i{Exporting Symbols} to Other Modules
- \c{GLOBAL} is the other end of \c{EXTERN}: if one module declares a
- symbol as \c{EXTERN} and refers to it, then in order to prevent
- linker errors, some other module must actually \e{define} the
- symbol and declare it as \c{GLOBAL}. Some assemblers use the name
- \i\c{PUBLIC} for this purpose.
- \c{GLOBAL} uses the same syntax as \c{EXTERN}, except that it must
- refer to symbols which \e{are} defined in the same module as the
- \c{GLOBAL} directive. For example:
- \c global _main
- \c _main:
- \c ; some code
- \c{GLOBAL}, like \c{EXTERN}, allows object formats to define private
- extensions by means of a colon. The \c{elf} object format, for
- example, lets you specify whether global data items are functions or
- data:
- \c global hashlookup:function, hashtable:data
- Like \c{EXTERN}, the primitive form of \c{GLOBAL} differs from the
- user-level form only in that it can take only one argument at a
- time.
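- The relationship between the two directives can be sketched with two
- hypothetical 32-bit modules, \c{a.asm} and \c{b.asm}, which are
- assembled separately and then linked together:
- \c ; file a.asm: defines and exports the symbol
- \c         global counter_next
- \c
- \c         section .text
- \c counter_next:
- \c         inc dword [counter]
- \c         mov eax, [counter]
- \c         ret
- \c
- \c         section .data
- \c counter: dd 0
- \c
- \c ; file b.asm: imports and uses the symbol
- \c         extern counter_next
- \c
- \c         section .text
- \c         call counter_next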
- \H{common} \i\c{COMMON}: Defining Common Data Areas
- The \c{COMMON} directive is used to declare \i\e{common variables}.
- A common variable is much like a global variable declared in the
- uninitialized data section, so that
- \c common intvar 4
- is similar in function to
- \c global intvar
- \c section .bss
- \c
- \c intvar resd 1
- The difference is that if more than one module defines the same
- common variable, then at link time those variables will be
- \e{merged}, and references to \c{intvar} in all modules will point
- at the same piece of memory.
- Like \c{GLOBAL} and \c{EXTERN}, \c{COMMON} supports object-format
- specific extensions. For example, the \c{obj} format allows common
- variables to be NEAR or FAR, and the \c{elf} format allows you to
- specify the alignment requirements of a common variable:
- \c common commvar 4:near ; works in OBJ
- \c common intarray 100:4 ; works in ELF: 4 byte aligned
- Once again, like \c{EXTERN} and \c{GLOBAL}, the primitive form of
- \c{COMMON} differs from the user-level form only in that it can take
- only one argument at a time.
- \H{static} \i\c{STATIC}: Local Symbols within Modules
- In contrast to \c{EXTERN} and \c{GLOBAL}, \c{STATIC} declares a local symbol,
- but one that should be named according to the global mangling rules (it is
- named by analogy with the C keyword \c{static} as applied to functions or
- global variables).
- \c static foo
- \c foo:
- \c ; some code
- Unlike \c{GLOBAL}, \c{STATIC} does not allow object formats to accept
- private extensions mentioned in \k{global}.
- \H{mangling} \i\c{(G|L)PREFIX}, \i\c{(G|L)POSTFIX}: Mangling Symbols
- The \c{PREFIX}, \c{GPREFIX}, \c{LPREFIX}, \c{POSTFIX}, \c{GPOSTFIX}, and
- \c{LPOSTFIX} directives prepend or append the given argument to
- symbols of a certain type. The directive should be given as a preprocessor
- statement. The usage of each is:
- \b\c{PREFIX}|\c{GPREFIX}: Prepend the argument to all \c{EXTERN},
- \c{COMMON}, \c{STATIC}, and \c{GLOBAL} symbols
- \b\c{LPREFIX}: Prepend the argument to all other symbols,
- such as local labels and backend-defined symbols
- \b\c{POSTFIX}|\c{GPOSTFIX}: Append the argument to all \c{EXTERN},
- \c{COMMON}, \c{STATIC}, and \c{GLOBAL} symbols
- \b\c{LPOSTFIX}: Append the argument to all other symbols,
- such as local labels and backend-defined symbols
- Each directive is a macro implemented as a \c{%pragma}, for example:
- \c %pragma macho lprefix L_
- A command-line option is also available; see \k{opt-pfix}.
- Some toolchains are aware of a particular prefix for their own optimization
- options, such as code elimination. For instance, the Mach-O backend has a
- linker that uses a simplistic naming scheme to chunk up sections into a
- meta section. When the \c{subsections_via_symbols} directive
- (\k{macho-ssvs}) is declared, each symbol is the start of a
- separate block. The meta section is then defined to include sections
- before the one that starts with an 'L'. \c{LPREFIX} is useful here to mark
- all local symbols with the 'L' prefix so that they are excluded from the
- meta section. It makes local symbols compatible with the particular toolchain.
- Note that local symbols declared with \c{STATIC} (\k{static})
- are excluded from the symbol mangling and are also not marked as global.
- \H{gen-namespace} \i\c{OUTPUT}, \i\c{DEBUG}: Generic Namespaces
- \c{OUTPUT} and \c{DEBUG} are generic \c{%pragma} namespaces that
- redirect to the current output and debug formats, respectively.
- For example, when mangling symbols via the generic namespace:
- \c %pragma output gprefix _
- This is useful when the directive needs to be output-format
- agnostic.
- The example is also equivalent to this, when the output format is \c{elf}:
- \c %pragma elf gprefix _
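- As a sketch of the intended effect, assuming a platform whose C
- symbols carry a leading underscore, the source can keep the plain
- names while the object file should receive the prefixed ones:
- \c %pragma output gprefix _
- \c
- \c         extern printf       ; should refer to the symbol _printf
- \c         global main         ; should export the symbol _main
- \c
- \c main:
- \c         call printf
- \c         ret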
- \H{CPU} \i\c{CPU}: Defining CPU Dependencies
- The \i\c{CPU} directive restricts assembly to those instructions which
- are available on the specified CPU.
- Options are:
- \b\c{CPU 8086} Assemble only 8086 instruction set
- \b\c{CPU 186} Assemble instructions up to the 80186 instruction set
- \b\c{CPU 286} Assemble instructions up to the 286 instruction set
- \b\c{CPU 386} Assemble instructions up to the 386 instruction set
- \b\c{CPU 486} 486 instruction set
- \b\c{CPU 586} Pentium instruction set
- \b\c{CPU PENTIUM} Same as 586
- \b\c{CPU 686} P6 instruction set
- \b\c{CPU PPRO} Same as 686
- \b\c{CPU P2} Same as 686
- \b\c{CPU P3} Pentium III (Katmai) instruction set
- \b\c{CPU KATMAI} Same as P3
- \b\c{CPU P4} Pentium 4 (Willamette) instruction set
- \b\c{CPU WILLAMETTE} Same as P4
- \b\c{CPU PRESCOTT} Prescott instruction set
- \b\c{CPU X64} x86-64 (x64/AMD64/Intel 64) instruction set
- \b\c{CPU IA64} IA64 CPU (in x86 mode) instruction set
- All options are case insensitive. An instruction is accepted
- only if it is available on the selected CPU or a lower one. By default, all
- instructions are available.
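- A small sketch of the effect; the line that is commented out is
- expected to be rejected at the 386 level, because \c{BSWAP} was
- introduced with the 486:
- \c         cpu 386
- \c         movzx eax, bl      ; accepted: available on the 386
- \c ;       bswap eax          ; would be rejected: a 486 instruction
- \c
- \c         cpu 486
- \c         bswap eax          ; accepted now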
- \H{FLOAT} \i\c{FLOAT}: Handling of \I{floating-point, constants}floating-point constants
- By default, floating-point constants are rounded to nearest, and IEEE
- denormals are supported. The following options can be set to alter
- this behaviour:
- \b\c{FLOAT DAZ} Flush denormals to zero
- \b\c{FLOAT NODAZ} Do not flush denormals to zero (default)
- \b\c{FLOAT NEAR} Round to nearest (default)
- \b\c{FLOAT UP} Round up (toward +Infinity)
- \b\c{FLOAT DOWN} Round down (toward -Infinity)
- \b\c{FLOAT ZERO} Round toward zero
- \b\c{FLOAT DEFAULT} Restore default settings
- The standard macros \i\c{__FLOAT_DAZ__}, \i\c{__FLOAT_ROUND__}, and
- \i\c{__FLOAT__} contain the current state, as long as the programmer
- has avoided the use of the bracketed primitive form (\c{[FLOAT]}).
- \c{__FLOAT__} contains the full set of floating-point settings; this
- value can be saved away and invoked later to restore the setting.
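- The effect of the rounding options can be seen on a constant that is
- not exactly representable, such as 0.1 in single precision. A sketch:
- \c         float down
- \c         dd 0.1             ; 0x3DCCCCCC, rounded toward -Infinity
- \c
- \c         float default
- \c         dd 0.1             ; 0x3DCCCCCD, rounded to nearest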
- \H{asmdir-warning} \i\c{[WARNING]}: Enable or disable warnings
- The \c{[WARNING]} directive can be used to enable or disable classes
- of warnings in the same way as the \c{-w} option, see \k{opt-w} for
- more details about warning classes.
- \b \c{[warning +}\e{warning-class}\c{]} enables warnings for
- \e{warning-class}.
- \b \c{[warning -}\e{warning-class}\c{]} disables warnings for
- \e{warning-class}.
- \b \c{[warning *}\e{warning-class}\c{]} restores \e{warning-class} to
- the original value, either the default value or as specified on the
- command line.
- The \c{[WARNING]} directive also accepts the \c{all}, \c{error} and
- \c{error=}\e{warning-class} specifiers.
- No "user form" (without the brackets) currently exists.
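- For example, assuming that \c{orphan-labels} is one of the warning
- classes listed in \k{opt-w}, a warning could be silenced around a
- single line and then restored:
- \c [warning -orphan-labels]   ; disable the class
- \c lonely_label               ; a label alone on a line, without a colon
- \c [warning *orphan-labels]   ; back to the default or command-line value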
- \C{outfmt} \i{Output Formats}
- NASM is a portable assembler, designed to be able to compile on any
- ANSI C-supporting platform and produce output to run on a variety of
- Intel x86 operating systems. For this reason, it has a large number
- of available output formats, selected using the \i\c{-f} option on
- the NASM \i{command line}. Each of these formats, along with its
- extensions to the base NASM syntax, is detailed in this chapter.
- As stated in \k{opt-o}, NASM chooses a \i{default name} for your
- output file based on the input file name and the chosen output
- format. This will be generated by removing the \i{extension}
- (\c{.asm}, \c{.s}, or whatever you like to use) from the input file
- name, and substituting an extension defined by the output format.
- The extensions are given with each format below.
- \H{binfmt} \i\c{bin}: \i{Flat-Form Binary}\I{pure binary} Output
- The \c{bin} format does not produce object files: it generates
- nothing in the output file except the code you wrote. Such `pure
- binary' files are used by \i{MS-DOS}: \i\c{.COM} executables and
- \i\c{.SYS} device drivers are pure binary files. Pure binary output
- is also useful for \i{operating system} and \i{boot loader}
- development.
- The \c{bin} format supports \i{multiple section names}. For details of
- how NASM handles sections in the \c{bin} format, see \k{multisec}.
- Using the \c{bin} format puts NASM by default into 16-bit mode (see
- \k{bits}). In order to use \c{bin} to write 32-bit or 64-bit code,
- such as an OS kernel, you need to explicitly issue the \I\c{BITS}\c{BITS 32}
- or \I\c{BITS}\c{BITS 64} directive.
- \c{bin} has no default output file name extension: instead, it
- leaves your file name as it is once the original extension has been
- removed. Thus, the default is for NASM to assemble \c{binprog.asm}
- into a binary file called \c{binprog}.
- \S{org} \i\c{ORG}: Binary File \i{Program Origin}
- The \c{bin} format provides an additional directive to the list
- given in \k{directive}: \c{ORG}. The function of the \c{ORG}
- directive is to specify the origin address which NASM will assume
- the program begins at when it is loaded into memory.
- For example, the following code will generate the longword
- \c{0x00000104}:
- \c org 0x100
- \c dd label
- \c label:
- Unlike the \c{ORG} directive provided by MASM-compatible assemblers,
- which allows you to jump around in the object file and overwrite
- code you have already generated, NASM's \c{ORG} does exactly what
- the directive says: \e{origin}. Its sole function is to specify one
- offset which is added to all internal address references within the
- section; it does not permit any of the trickery that MASM's version
- does. See \k{proborg} for further comments.
- \S{binseg} \c{bin} Extensions to the \c{SECTION}
- Directive\I{SECTION, bin extensions to}
- The \c{bin} output format extends the \c{SECTION} (or \c{SEGMENT})
- directive to allow you to specify the alignment requirements of
- segments. This is done by appending the \i\c{ALIGN} qualifier to the
- end of the section-definition line. For example,
- \c section .data align=16
- switches to the section \c{.data} and also specifies that it must be
- aligned on a 16-byte boundary.
- The parameter to \c{ALIGN} is the alignment in bytes, which determines how
- many low bits of the section start address must be forced to zero
- (four bits for \c{align=16}). The alignment value
- given may be any power of two.\I{section alignment, in
- bin}\I{segment alignment, in bin}\I{alignment, in bin sections}
- \S{multisec} \i{Multisection}\I{bin, multisection} Support for the \c{bin} Format
- The \c{bin} format allows the use of multiple sections, of arbitrary names,
- besides the "known" \c{.text}, \c{.data}, and \c{.bss} names.
- \b Sections may be designated \i\c{progbits} or \i\c{nobits}. Default
- is \c{progbits} (except \c{.bss}, which defaults to \c{nobits},
- of course).
- \b Sections can be aligned at a specified boundary following the previous
- section with \c{align=}, or at an arbitrary byte-granular position with
- \i\c{start=}.
- \b Sections can be given a virtual start address, which will be used
- for the calculation of all memory references within that section
- with \i\c{vstart=}.
- \b Sections can be ordered using \i\c{follows=}\c{<section>} or
- \i\c{vfollows=}\c{<section>} as an alternative to specifying an explicit
- start address.
- \b Arguments to \c{org}, \c{start}, \c{vstart}, and \c{align=} are
- critical expressions. See \k{crit}. For example, in
- \c{align=(1 << ALIGN_SHIFT)}, \c{ALIGN_SHIFT} must be defined before it is
- used.
- \b Any code which comes before an explicit \c{SECTION} directive
- is directed by default into the \c{.text} section.
- \b If an \c{ORG} statement is not given, \c{ORG 0} is used
- by default.
- \b The \c{.bss} section will be placed after the last \c{progbits}
- section, unless \c{start=}, \c{vstart=}, \c{follows=}, or \c{vfollows=}
- has been specified.
- \b All sections are aligned on dword boundaries, unless a different
- alignment has been specified.
- \b Sections may not overlap.
- \b NASM creates the \c{section.<secname>.start} for each section,
- which may be used in your code.
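- Several of these features can be combined in one file. The following
- is only a sketch under stated assumptions: the section name
- \c{payload}, the labels and the addresses are hypothetical, the code is
- 16-bit, and the segment registers are assumed to be set up already. It
- copies a section from where it was loaded to the address it was
- assembled to run at:
- \c         org 0x7C00
- \c
- \c section .text
- \c         mov si, section.payload.start  ; where the payload was loaded
- \c         mov di, payload_code           ; where it is meant to run
- \c         mov cx, payload_end - payload_code
- \c         rep movsb
- \c         jmp payload_code
- \c
- \c section payload vstart=0x8000 align=16
- \c payload_code:
- \c         ret
- \c payload_end: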
- \S{map}\i{Map Files}
- Map files can be generated in \c{-f bin} format by means of the \c{[map]}
- option. Map types of \c{all} (default), \c{brief}, \c{sections}, \c{segments},
- or \c{symbols} may be specified. Output may be directed to \c{stdout}
- (default), \c{stderr}, or a specified file, e.g.
- \c{[map symbols myfile.map]}. No "user form" exists; the square
- brackets must be used.
- \H{ithfmt} \i\c{ith}: \i{Intel Hex} Output
- The \c{ith} file format produces Intel hex-format files. Just like the
- \c{bin} format, this is a flat memory image format with no support for
- relocation or linking. It is usually used with ROM programmers and
- similar utilities.
- All extensions supported by the \c{bin} file format are also supported by
- the \c{ith} file format.
- \c{ith} provides a default output file-name extension of \c{.ith}.
- \H{srecfmt} \i\c{srec}: \i{Motorola S-Records} Output
- The \c{srec} file format produces Motorola S-record files. Just like the
- \c{bin} format, this is a flat memory image format with no support for
- relocation or linking. It is usually used with ROM programmers and
- similar utilities.
- All extensions supported by the \c{bin} file format are also supported by
- the \c{srec} file format.
- \c{srec} provides a default output file-name extension of \c{.srec}.
- \H{objfmt} \i\c{obj}: \i{Microsoft OMF}\I{OMF} Object Files
- The \c{obj} file format (NASM calls it \c{obj} rather than \c{omf}
- for historical reasons) is the one produced by \i{MASM} and
- \i{TASM}, which is typically fed to 16-bit DOS linkers to produce
- \i\c{.EXE} files. It is also the format used by \i{OS/2}.
- \c{obj} provides a default output file-name extension of \c{.obj}.
- \c{obj} is not exclusively a 16-bit format, though: NASM has full
- support for the 32-bit extensions to the format. In particular,
- 32-bit \c{obj} format files are used by \i{Borland's Win32
- compilers}, instead of using Microsoft's newer \i\c{win32} object
- file format.
- The \c{obj} format does not define any special segment names: you
- can call your segments anything you like. Typical names for segments
- in \c{obj} format files are \c{CODE}, \c{DATA} and \c{BSS}.
- If your source file contains code before specifying an explicit
- \c{SEGMENT} directive, then NASM will invent its own segment called
- \i\c{__NASMDEFSEG} for you.
- When you define a segment in an \c{obj} file, NASM defines the
- segment name as a symbol as well, so that you can access the segment
- address of the segment. So, for example:
- \c segment data
- \c
- \c dvar: dw 1234
- \c
- \c segment code
- \c
- \c function:
- \c mov ax,data ; get segment address of data
- \c mov ds,ax ; and move it into DS
- \c inc word [dvar] ; now this reference will work
- \c ret
- The \c{obj} format also enables the use of the \i\c{SEG} and
- \i\c{WRT} operators, so that you can write code which does things
- like
- \c extern foo
- \c
- \c mov ax,seg foo ; get preferred segment of foo
- \c mov ds,ax
- \c mov ax,data ; a different segment
- \c mov es,ax
- \c mov ax,[ds:foo] ; this accesses `foo'
- \c mov [es:foo wrt data],bx ; so does this
- \S{objseg} \c{obj} Extensions to the \c{SEGMENT}
- Directive\I{SEGMENT, obj extensions to}
- The \c{obj} output format extends the \c{SEGMENT} (or \c{SECTION})
- directive to allow you to specify various properties of the segment
- you are defining. This is done by appending extra qualifiers to the
- end of the segment-definition line. For example,
- \c segment code private align=16
- defines the segment \c{code}, but also declares it to be a private
- segment, and requires that the portion of it described in this code
- module must be aligned on a 16-byte boundary.
- The available qualifiers are:
- \b \i\c{PRIVATE}, \i\c{PUBLIC}, \i\c{COMMON} and \i\c{STACK} specify
- the combination characteristics of the segment. \c{PRIVATE} segments
- do not get combined with any others by the linker; \c{PUBLIC} and
- \c{STACK} segments get concatenated together at link time; and
- \c{COMMON} segments all get overlaid on top of each other rather
- than stuck end-to-end.
- \b \i\c{ALIGN} is used, as shown above, to specify how many low bits
- of the segment start address must be forced to zero. The alignment
- value given may be any power of two from 1 to 4096; in reality, the
- only values supported are 1, 2, 4, 16, 256 and 4096, so if 8 is
- specified it will be rounded up to 16, and 32, 64 and 128 will all
- be rounded up to 256, and so on. Note that alignment to 4096-byte
- boundaries is a \i{PharLap} extension to the format and may not be
- supported by all linkers.\I{section alignment, in OBJ}\I{segment
- alignment, in OBJ}\I{alignment, in OBJ sections}
- \b \i\c{CLASS} can be used to specify the segment class; this feature
- indicates to the linker that segments of the same class should be
- placed near each other in the output file. The class name can be any
- word, e.g. \c{CLASS=CODE}.
- \b \i\c{OVERLAY}, like \c{CLASS}, is specified with an arbitrary word
- as an argument, and provides overlay information to an
- overlay-capable linker.
- \b Segments can be declared as \i\c{USE16} or \i\c{USE32}, which has
- the effect of recording the choice in the object file and also
- ensuring that NASM's default assembly mode when assembling in that
- segment is 16-bit or 32-bit respectively.
- \b When writing \i{OS/2} object files, you should declare 32-bit
- segments as \i\c{FLAT}, which causes the default segment base for
- anything in the segment to be the special group \c{FLAT}, and also
- defines the group if it is not already defined.
- \b The \c{obj} file format also allows segments to be declared as
- having a pre-defined absolute segment address, although no linkers
- are currently known to make sensible use of this feature;
- nevertheless, NASM allows you to declare a segment such as
- \c{SEGMENT SCREEN ABSOLUTE=0xB800} if you need to. The \i\c{ABSOLUTE}
- and \c{ALIGN} keywords are mutually exclusive.
- NASM's default segment attributes are \c{PUBLIC}, \c{ALIGN=1}, no
- class, no overlay, and \c{USE16}.
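- As a minimal sketch of how these qualifiers combine (the segment
- names \c{code}, \c{stack} and \c{screen} are chosen only for
- illustration, and the class names are arbitrary words), a source
- file might declare:
- \c segment code public align=16 class=CODE use16
- \c
- \c         ; 16-bit code, concatenated with other `code' segments
- \c
- \c segment stack stack align=16 class=STACK
- \c
- \c         resb 64                 ; reserve some stack space
- \c
- \c segment screen absolute=0xB800
- \c
- \c         ; no ALIGN here: ABSOLUTE and ALIGN are mutually exclusive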
- \S{group} \i\c{GROUP}: Defining Groups of Segments\I{segments, groups of}
- The \c{obj} format also allows segments to be grouped, so that a
- single segment register can be used to refer to all the segments in
- a group. NASM therefore supplies the \c{GROUP} directive, whereby
- you can code
- \c segment data
- \c
- \c ; some data
- \c
- \c segment bss
- \c
- \c ; some uninitialized data
- \c
- \c group dgroup data bss
- which will define a group called \c{dgroup} to contain the segments
- \c{data} and \c{bss}. Like \c{SEGMENT}, \c{GROUP} causes the group
- name to be defined as a symbol, so that you can refer to a variable
- \c{var} in the \c{data} segment as \c{var wrt data} or as \c{var wrt
- dgroup}, depending on which segment value is currently in your
- segment register.
- If you just refer to \c{var}, however, and \c{var} is declared in a
- segment which is part of a group, then NASM will default to giving
- you the offset of \c{var} from the beginning of the \e{group}, not
- the \e{segment}. Therefore \c{SEG var}, also, will return the group
- base rather than the segment base.
- NASM will allow a segment to be part of more than one group, but
- will generate a warning if you do this. Variables declared in a
- segment which is part of more than one group will default to being
- relative to the first group that was defined to contain the segment.
- A group does not have to contain any segments; you can still make
- \c{WRT} references to a group which does not contain the variable
- you are referring to. OS/2, for example, defines the special group
- \c{FLAT} with no segments in it.
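- The difference between group-relative and segment-relative offsets
- can be sketched as follows, assuming the \c{dgroup} layout shown
- earlier in this section; the names \c{var} and \c{buf} are
- illustrative only:
- \c segment data
- \c
- \c var:    dw 1234
- \c
- \c segment bss
- \c
- \c buf:    resw 1
- \c
- \c group dgroup data bss
- \c
- \c segment code
- \c
- \c         mov ax,dgroup           ; group base, defined by GROUP
- \c         mov ds,ax
- \c         mov bx,buf              ; offset of buf from dgroup (the default)
- \c         mov cx,buf wrt bss      ; offset of buf from its own segment
- \c         mov dx,seg buf          ; group base, same value as dgroup
- \c         inc word [var]          ; DS holds dgroup, so this works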
- \S{uppercase} \i\c{UPPERCASE}: Disabling Case Sensitivity in Output
- Although NASM itself is \i{case sensitive}, some OMF linkers are
- not; therefore it can be useful for NASM to output single-case
- object files. The \c{UPPERCASE} format-specific directive causes all
- segment, group and symbol names that are written to the object file
- to be forced to upper case just before being written. Within a
- source file, NASM is still case-sensitive; but the object file can
- be written entirely in upper case if desired.
- \c{UPPERCASE} is used alone on a line; it requires no parameters.
- \S{import} \i\c{IMPORT}: Importing DLL Symbols\I{DLL symbols,
- importing}\I{symbols, importing from DLLs}
- The \c{IMPORT} format-specific directive defines a symbol to be
- imported from a DLL, for use if you are writing a DLL's \i{import
- library} in NASM. You still need to declare the symbol as \c{EXTERN}
- as well as using the \c{IMPORT} directive.
- The \c{IMPORT} directive takes two required parameters, separated by
- white space, which are (respectively) the name of the symbol you
- wish to import and the name of the library you wish to import it
- from. For example:
- \c import WSAStartup wsock32.dll
- A third optional parameter gives the name by which the symbol is
- known in the library you are importing it from, in case this is not
- the same as the name you wish the symbol to be known by to your code
- once you have imported it. For example:
- \c import asyncsel wsock32.dll WSAAsyncSelect
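- Putting the two requirements together, a hedged sketch of the
- declarations for the examples above would pair each \c{IMPORT} with
- a matching \c{EXTERN} for the name used in your code:
- \c extern WSAStartup
- \c import WSAStartup wsock32.dll
- \c
- \c extern asyncsel
- \c import asyncsel wsock32.dll WSAAsyncSelect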
- \S{export} \i\c{EXPORT}: Exporting DLL Symbols\I{DLL symbols,
- exporting}\I{symbols, exporting from DLLs}
- The \c{EXPORT} format-specific directive defines a global symbol to
- be exported as a DLL symbol, for use if you are writing a DLL in
- NASM. You still need to declare the symbol as \c{GLOBAL} as well as
- using the \c{EXPORT} directive.
- \c{EXPORT} takes one required parameter, which is the name of the
- symbol you wish to export, as it was defined in your source file. An
- optional second parameter (separated by white space from the first)
- gives the \e{external} name of the symbol: the name by which you
- wish the symbol to be known to programs using the DLL. If this name
- is the same as the internal name, you may leave the second parameter
- off.
- Further parameters can be given to define attributes of the exported
- symbol. These parameters, like the second, are separated by white
- space. If further parameters are given, the external name must also
- be specified, even if it is the same as the internal name. The
- available attributes are:
- \b \c{resident} indicates that the exported name is to be kept
- resident by the system loader. This is an optimisation for
- frequently used symbols imported by name.
- \b \c{nodata} indicates that the exported symbol is a function which
- does not make use of any initialized data.
- \b \c{parm=NNN}, where \c{NNN} is an integer, sets the number of
- parameter words for the case in which the symbol is a call gate
- between 32-bit and 16-bit segments.
- \b An attribute which is just a number indicates that the symbol
- should be exported with an identifying number (ordinal), and gives
- the desired number.
- For example:
- \c export myfunc
- \c export myfunc TheRealMoreFormalLookingFunctionName
- \c export myfunc myfunc 1234 ; export by ordinal
- \c export myfunc myfunc resident parm=23 nodata
- \S{dotdotstart} \i\c{..start}: Defining the \i{Program Entry
- Point}
- \c{OMF} linkers require exactly one of the object files being linked to
- define the program entry point, where execution will begin when the
- program is run. If the object file that defines the entry point is
- assembled using NASM, you specify the entry point by declaring the
- special symbol \c{..start} at the point where you wish execution to
- begin.
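- A minimal sketch of an entry-point module follows; the segment
- names and the body are illustrative assumptions, and only the
- \c{..start} label itself is prescribed here:
- \c segment data
- \c
- \c msg:    db 'hello',0
- \c
- \c segment code
- \c
- \c ..start:
- \c         mov ax,data             ; execution begins here
- \c         mov ds,ax
- \c         ; rest of the program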
- \S{objextern} \c{obj} Extensions to the \c{EXTERN}
- Directive\I{EXTERN, obj extensions to}
- If you declare an external symbol with the directive
- \c extern foo
- then references such as \c{mov ax,foo} will give you the offset of
- \c{foo} from its preferred segment base (as specified in whichever
- module \c{foo} is actually defined in). So to access the contents of
- \c{foo} you will usually need to do something like
- \c mov ax,seg foo ; get preferred segment base
- \c mov es,ax ; move it into ES
- \c mov ax,[es:foo] ; and use offset `foo' from it
- This is a little unwieldy, particularly if you know that an external
- is going to be accessible from a given segment or group, say
- \c{dgroup}. So if \c{DS} already contained \c{dgroup}, you could
- simply code
- \c mov ax,[foo wrt dgroup]
- However, having to type this every time you want to access \c{foo}
- can be a pain; so NASM allows you to declare \c{foo} in the
- alternative form
- \c extern foo:wrt dgroup
- This form causes NASM to pretend that the preferred segment base of
- \c{foo} is in fact \c{dgroup}; so the expression \c{seg foo} will
- now return \c{dgroup}, and the expression \c{foo} is equivalent to
- \c{foo wrt dgroup}.
- This \I{default-WRT mechanism}default-\c{WRT} mechanism can be used
- to make externals appear to be relative to any group or segment in
- your program. It can also be applied to common variables: see
- \k{objcommon}.
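- As a sketch of the mechanism in context (the segments \c{data} and
- \c{bss} and the group \c{dgroup} are assumed to be defined as in
- \k{group}), the declaration and its use might look like this:
- \c segment data
- \c segment bss
- \c group dgroup data bss
- \c
- \c extern foo:wrt dgroup
- \c
- \c segment code
- \c
- \c         mov ax,seg foo          ; now yields dgroup
- \c         mov ds,ax
- \c         mov ax,[foo]            ; equivalent to [foo wrt dgroup]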
- \S{objcommon} \c{obj} Extensions to the \c{COMMON}
- Directive\I{COMMON, obj extensions to}
- The \c{obj} format allows common variables to be either near\I{near
- common variables} or far\I{far common variables}; NASM allows you to
- specify which your variables should be by the use of the syntax
- \c common nearvar 2:near ; `nearvar' is a near common
- \c common farvar 10:far ; and `farvar' is far
- Far common variables may be greater in size than 64Kb, and so the
- OMF specification says that they are declared as a number of
- \e{elements} of a given size. So a 10-byte far common variable could
- be declared as ten one-byte elements, five two-byte elements, two
- five-byte elements or one ten-byte element.
- Some \c{OMF} linkers require the \I{element size, in common
- variables}\I{common variables, element size}element size, as well as
- the variable size, to match when resolving common variables declared
- in more than one module. Therefore NASM must allow you to specify
- the element size on your far common variables. This is done by the
- following syntax:
- \c common c_5by2 10:far 5 ; two five-byte elements
- \c common c_2by5 10:far 2 ; five two-byte elements
- If no element size is specified, the default is 1. Also, the \c{FAR}
- keyword is not required when an element size is specified, since
- only far commons may have element sizes at all. So the above
- declarations could equivalently be
- \c common c_5by2 10:5 ; two five-byte elements
- \c common c_2by5 10:2 ; five two-byte elements
- In addition to these extensions, the \c{COMMON} directive in \c{obj}
- also supports default-\c{WRT} specification like \c{EXTERN} does
- (explained in \k{objextern}). So you can also declare things like
- \c common foo 10:wrt dgroup
- \c common bar 16:far 2:wrt data
- \c common baz 24:wrt data:6
- \S{objdepend} Embedded File Dependency Information
- Since NASM 2.13.02, \c{obj} files contain embedded dependency file
- information. To suppress the generation of dependencies, use
- \c %pragma obj nodepend
- \H{win32fmt} \i\c{win32}: Microsoft Win32 Object Files
- The \c{win32} output format generates Microsoft Win32 object files,
- suitable for passing to Microsoft linkers such as \i{Visual C++}.
- Note that Borland Win32 compilers do not use this format, but use
- \c{obj} instead (see \k{objfmt}).
- \c{win32} provides a default output file-name extension of \c{.obj}.
- Note that although Microsoft say that Win32 object files follow the
- \c{COFF} (Common Object File Format) standard, the object files produced
- by Microsoft Win32 compilers are not compatible with COFF linkers
- such as DJGPP's, and vice versa. This is due to a difference of
- opinion over the precise semantics of PC-relative relocations. To
- produce COFF files suitable for DJGPP, use NASM's \c{coff} output
- format; conversely, the \c{coff} format does not produce object
- files that Win32 linkers can generate correct output from.
- \S{win32sect} \c{win32} Extensions to the \c{SECTION}
- Directive\I{SECTION, win32 extensions to}
- Like the \c{obj} format, \c{win32} allows you to specify additional
- information on the \c{SECTION} directive line, to control the type
- and properties of sections you declare. Section types and properties
- are generated automatically by NASM for the \i{standard section names}
- \c{.text}, \c{.data} and \c{.bss}, but may still be overridden by
- these qualifiers.
- The available qualifiers are:
- \b \c{code}, or equivalently \c{text}, defines the section to be a
- code section. This marks the section as readable and executable, but
- not writable, and also indicates to the linker that the type of the
- section is code.
- \b \c{data} and \c{bss} define the section to be a data section,
- analogously to \c{code}. Data sections are marked as readable and
- writable, but not executable. \c{data} declares an initialized data
- section, whereas \c{bss} declares an uninitialized data section.
- \b \c{rdata} declares an initialized data section that is readable
- but not writable. Microsoft compilers place constants in this
- section.
- \b \c{info} defines the section to be an \i{informational section},
- which is not included in the executable file by the linker, but may
- (for example) pass information \e{to} the linker. For example,
- declaring an \c{info}-type section called \i\c{.drectve} causes the
- linker to interpret the contents of the section as command-line
- options.
- \b \c{align=}, used with a trailing number as in \c{obj}, gives the
- \I{section alignment, in win32}\I{alignment, in win32
- sections}alignment requirements of the section. The maximum you may
- specify is 64: the Win32 object file format contains no means to
- request a greater section alignment than this. If alignment is not
- explicitly specified, the defaults are 16-byte alignment for code
- sections, 8-byte alignment for rdata sections and 4-byte alignment
- for data (and BSS) sections.
- Informational sections get a default alignment of 1 byte (no
- alignment), though the value does not matter.
- The defaults assumed by NASM if you do not specify the above
- qualifiers are:
- \c section .text code align=16
- \c section .data data align=4
- \c section .rdata rdata align=8
- \c section .bss bss align=4
- Any other section name is treated by default like \c{.text}.
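- As a sketch of overriding these defaults (the section names
- \c{consts} and \c{hotcode} are hypothetical), qualifiers can be
- given on any \c{SECTION} line:
- \c section consts rdata align=16   ; read-only data, 16-byte aligned
- \c
- \c table:  dd 1,2,3,4
- \c
- \c section hotcode code align=64   ; code, at the maximum alignment
- \c
- \c entry:  ret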
- \S{win32safeseh} \c{win32}: Safe Structured Exception Handling
- Among other improvements in Windows XP SP2 and Windows Server 2003,
- Microsoft introduced the concept of "safe structured exception
- handling." The general idea is to collect handlers' entry points in a
- designated read-only table and to verify an alleged entry point
- against this table before exception control is passed to the handler.
- For an executable module to be equipped with such a "safe exception
- handler table," all object modules on the linker command line have to
- comply with certain criteria. If even a single module does not, the
- table is omitted and the run-time checks mentioned above are not
- performed for the application in question. Table omission is silent
- by default and can therefore easily be overlooked. One can instruct
- the linker to refuse to produce a binary without such a table by
- passing the \c{/safeseh} command line option.
- Whatever the merits of this run-time check, it is natural to expect
- NASM to be capable of generating modules suitable for \c{/safeseh}
- linking. From the developer's viewpoint the problem is two-fold:
- \b how to adapt modules that do not deploy exception handlers of their own;
- \b how to adapt or develop modules that use custom exception handling.
- The former can be achieved with any NASM version by adding the
- following line to the source code:
- \c $@feat.00 equ 1
- As of version 2.03, NASM adds this absolute symbol automatically,
- unless it is already present. That is, if for whatever reason the
- developer chooses to assign another value in the source file, this
- is still perfectly possible.
- Registering a custom exception handler, on the other hand, requires
- certain "magic." As of version 2.03 an additional directive,
- \c{safeseh}, is implemented; it instructs the assembler to produce
- appropriately formatted input data for the "safe exception handler
- table" mentioned above. Its typical use would be:
- \c section .text
- \c extern _MessageBoxA@16
- \c %if __NASM_VERSION_ID__ >= 0x02030000
- \c safeseh handler ; register handler as "safe handler"
- \c %endif
- \c handler:
- \c push DWORD 1 ; MB_OKCANCEL
- \c push DWORD caption
- \c push DWORD text
- \c push DWORD 0
- \c call _MessageBoxA@16
- \c sub eax,1 ; incidentally suits as return value
- \c ; for exception handler
- \c ret
- \c global _main
- \c _main:
- \c push DWORD handler
- \c push DWORD [fs:0]
- \c mov DWORD [fs:0],esp ; engage exception handler
- \c xor eax,eax
- \c mov eax,DWORD[eax] ; cause exception
- \c pop DWORD [fs:0] ; disengage exception handler
- \c add esp,4
- \c ret
- \c text: db 'OK to rethrow, CANCEL to generate core dump',0
- \c caption:db 'SEGV',0
- \c
- \c section .drectve info
- \c db '/defaultlib:user32.lib /defaultlib:msvcrt.lib '
- As you might imagine, it is perfectly possible to produce an .exe
- binary with a "safe exception handler table" and yet engage an
- unregistered exception handler. Indeed, a handler is engaged simply by
- manipulating the \c{[fs:0]} location at run-time, something the linker
- has no power over. It should be explicitly mentioned that failing to
- register a handler's entry point with the \c{safeseh} directive has an
- undesired side effect at run-time: if an exception is raised and an
- unregistered handler is to be executed, the application is abruptly
- terminated without any notification whatsoever. One can argue that
- the system could at least have logged some kind of "non-safe exception
- handler in x.exe at address n" message in the event log, but no,
- literally no notification is provided, and the user is left with no
- clue as to what caused the application failure.
- Finally, all mentions of the linker in this section refer to the
- Microsoft linker version 7.x and later. The presence of the
- \c{@feat.00} symbol and of input data for the "safe exception handler
- table" causes no backward incompatibilities, and "safeseh" modules
- generated by NASM 2.03 and later can still be linked by earlier
- versions or by non-Microsoft linkers.
- \S{codeview} Debugging formats for Windows
- \I{Windows debugging formats}
- The \c{win32} and \c{win64} formats support the Microsoft CodeView
- debugging format. Currently CodeView version 8 format is supported
- (\i\c{cv8}), but newer versions of the CodeView debugger should be
- able to handle this format as well.
- \H{win64fmt} \i\c{win64}: Microsoft Win64 Object Files
- The \c{win64} output format generates Microsoft Win64 object files.
- The format is nearly identical to the \c{win32} object format
- (\k{win32fmt}), except that it targets 64-bit code and the x86-64
- platform. Apart from this difference, it is used in NASM exactly as
- the \c{win32} object format (\k{win32fmt}) is.
- \S{win64pic} \c{win64}: Writing Position-Independent Code
- While \c{REL} takes good care of RIP-relative addressing, there is one
- aspect that is easy to overlook for a Win64 programmer: indirect
- references. Consider a switch dispatch table:
- \c jmp qword [dsptch+rax*8]
- \c ...
- \c dsptch: dq case0
- \c dq case1
- \c ...
- Even a novice Win64 assembler programmer will soon realize that this
- code is not 64-bit savvy. Most notably, the linker will refuse to link
- it, with the message
- \c 'ADDR32' relocation to '.text' invalid without /LARGEADDRESSAWARE:NO
- So [s]he will have to split the \c{jmp} instruction as follows:
- \c lea rbx,[rel dsptch]
- \c jmp qword [rbx+rax*8]
- What happens behind the scenes is that the effective address in
- \c{lea} is encoded relative to the instruction pointer, that is, in a
- perfectly position-independent manner. But this is only part of the
- problem! The trouble is that in a .dll context the \c{caseN}
- relocations will make their way to the final module and might have to
- be adjusted at .dll load time, specifically when the .dll cannot be
- loaded at its preferred address. When this occurs, pages with such
- relocations are rendered private to the current process, which
- undermines the idea of sharing the .dll. But no worry, it is trivial
- to fix:
- \c lea rbx,[rel dsptch]
- \c add rbx,[rbx+rax*8]
- \c jmp rbx
- \c ...
- \c dsptch: dq case0-dsptch
- \c dq case1-dsptch
- \c ...
- NASM version 2.03 and later provides another alternative, the \c{wrt
- ..imagebase} operator, which returns the offset from the base address
- of the current image, be it an .exe or a .dll module; hence the name.
- For those acquainted with the PE-COFF format, the base address denotes
- the start of the \c{IMAGE_DOS_HEADER} structure. Here is how to
- implement a switch with these image-relative references:
- \c lea rbx,[rel dsptch]
- \c mov eax,[rbx+rax*4]
- \c sub rbx,dsptch wrt ..imagebase
- \c add rbx,rax
- \c jmp rbx
- \c ...
- \c dsptch: dd case0 wrt ..imagebase
- \c dd case1 wrt ..imagebase
- One can argue that the operator is redundant. Indeed, the snippet
- before the last one works just fine with any NASM version and is not
- even Windows specific. The real reason for implementing \c{wrt
- ..imagebase} will become apparent in the next section.
- It should be noted that \c{wrt ..imagebase} is defined as a 32-bit
- operand only:
- \c dd label wrt ..imagebase ; ok
- \c dq label wrt ..imagebase ; bad
- \c mov eax,label wrt ..imagebase ; ok
- \c mov rax,label wrt ..imagebase ; bad
- \S{win64seh} \c{win64}: Structured Exception Handling
- Structured exception handling in Win64 is a completely different
- matter from Win32. Upon an exception, the program counter value is
- noted, and a linker-generated table comprising the start and end
- addresses of all the functions [in the given executable module] is
- traversed and compared to the saved program counter. Thus the
- so-called \c{UNWIND_INFO} structure is identified. If it is not found,
- the offending subroutine is assumed to be a "leaf," and the lookup
- procedure just described is attempted for its caller. In Win64 a leaf
- function is a function that neither calls any other function \e{nor}
- modifies any Win64 non-volatile register, including the stack
- pointer. The latter ensures that it is possible to identify a leaf
- function's caller by simply pulling the value from the top of the
- stack.
- While the majority of subroutines written in assembler do not call
- any other function, the requirement that non-volatile registers remain
- unmodified leaves the developer with no more than 7 registers and no
- stack frame, which is not necessarily what [s]he counted on.
- Customarily one would meet the requirement by saving non-volatile
- registers on the stack and restoring them upon return, so what can go
- wrong? If [and only if] an exception is raised at run-time and no
- \c{UNWIND_INFO} structure is associated with such a "leaf" function,
- the stack unwind procedure will expect to find the caller's return
- address on the top of the stack, immediately followed by its frame.
- Given that the developer pushed the caller's non-volatile registers on
- the stack, would the value on top point at some code segment, or even
- at addressable space? Well, the developer can attempt copying the
- caller's return address to the top of the stack, and this would
- actually work in some very specific circumstances. But unless the
- developer can guarantee that these circumstances are always met, it
- is more appropriate to assume the worst case scenario, i.e. the stack
- unwind procedure going berserk. The relevant question is: what happens
- then? The application is abruptly terminated without any notification
- whatsoever. Just as in the Win32 case, one can argue that the system
- could at least have logged "unwind procedure went berserk in x.exe at
- address n" in the event log, but no, no trace of the failure is left.
- Now that we understand the significance of the \c{UNWIND_INFO}
- structure, let's discuss what is in it and how it is processed. First
- of all, it is checked for the presence of a reference to a custom
- language-specific exception handler. If there is one, it is invoked.
- Depending on the return value, either execution flow is resumed (the
- exception is said to be "handled"), \e{or} the rest of the
- \c{UNWIND_INFO} structure is processed as follows. Besides the
- optional reference to a custom handler, it carries information about
- the current callee's stack frame and where non-volatile registers are
- saved. The information is detailed enough to reconstruct the contents
- of the caller's non-volatile registers upon the call to the current
- callee. And so the caller's context is reconstructed, and then the
- unwind procedure is repeated, i.e. another \c{UNWIND_INFO} structure
- is associated, this time with the caller's instruction pointer, and
- is then checked for the presence of a reference to a
- language-specific handler, etc. The procedure is repeated
- recursively until the exception is handled. As a last resort, the
- system "handles" it by generating a memory core dump and terminating
- the application.
- At the time of this writing, NASM unfortunately does not facilitate
- generation of the detailed information about stack frame layout
- mentioned above. But as of version 2.03 it implements building blocks
- for generating the structures involved in stack unwinding. As the
- simplest example, here is how to deploy a custom exception handler
- for a leaf function:
- \c default rel
- \c section .text
- \c extern MessageBoxA
- \c handler:
- \c sub rsp,40
- \c mov rcx,0
- \c lea rdx,[text]
- \c lea r8,[caption]
- \c mov r9,1 ; MB_OKCANCEL
- \c call MessageBoxA
- \c sub eax,1 ; incidentally suits as return value
- \c ; for exception handler
- \c add rsp,40
- \c ret
- \c global main
- \c main:
- \c xor rax,rax
- \c mov rax,QWORD[rax] ; cause exception
- \c ret
- \c main_end:
- \c text: db 'OK to rethrow, CANCEL to generate core dump',0
- \c caption:db 'SEGV',0
- \c
- \c section .pdata rdata align=4
- \c dd main wrt ..imagebase
- \c dd main_end wrt ..imagebase
- \c dd xmain wrt ..imagebase
- \c section .xdata rdata align=8
- \c xmain: db 9,0,0,0
- \c dd handler wrt ..imagebase
- \c section .drectve info
- \c db '/defaultlib:user32.lib /defaultlib:msvcrt.lib '
- What you see in the \c{.pdata} section is an element of the "table
- comprising start and end addresses of function" along with a reference
- to the associated \c{UNWIND_INFO} structure. And what you see in the
- \c{.xdata} section is an \c{UNWIND_INFO} structure describing a
- function with no frame, but with a designated exception handler.
- References are \e{required} to be image-relative (which is the real
- reason for implementing the \c{wrt ..imagebase} operator). It should
- be noted that \c{rdata align=n}, as well as \c{wrt ..imagebase}, are
- optional in the context of these two sections, i.e. can be omitted.
- The latter means that \e{all} 32-bit references placed into these two
- sections, not only the required ones listed above, turn out
- image-relative. Why is this important to understand? The developer is
- allowed to append handler-specific data to the \c{UNWIND_INFO}
- structure, and if [s]he adds a 32-bit reference, then [s]he will have
- to remember to adjust its value to obtain the real pointer.
- As already mentioned, in Win64 terms a leaf function is one that
- neither calls any other function \e{nor} modifies any non-volatile
- register, including the stack pointer. But it is not uncommon for an
- assembler programmer to plan to use every single register and
- sometimes even to have a variable stack frame. Is there anything one
- can do with the bare building blocks, besides manually composing a
- fully-fledged \c{UNWIND_INFO} structure, which would surely be
- considered error-prone? Yes, there is. Recall that the exception
- handler is called first, before the stack layout is analyzed. As it
- turns out, it is perfectly possible to manipulate the current
- callee's context in a custom handler in a manner that permits further
- stack unwinding. The general idea is that the handler would not
- actually "handle" the exception, but instead restore the callee's
- context as it was at its entry point and thus mimic a leaf function.
- In other words, the handler would simply undertake part of the
- unwinding procedure. Consider the following example:
- \c function:
- \c mov rax,rsp ; copy rsp to volatile register
- \c push r15 ; save non-volatile registers
- \c push rbx
- \c push rbp
- \c mov r11,rsp ; prepare variable stack frame
- \c sub r11,rcx
- \c and r11,-64
- \c mov QWORD[r11],rax ; check for exceptions
- \c mov rsp,r11 ; allocate stack frame
- \c mov QWORD[rsp],rax ; save original rsp value
- \c magic_point:
- \c ...
- \c mov r11,QWORD[rsp] ; pull original rsp value
- \c mov rbp,QWORD[r11-24]
- \c mov rbx,QWORD[r11-16]
- \c mov r15,QWORD[r11-8]
- \c mov rsp,r11 ; destroy frame
- \c ret
- The key point is that up to \c{magic_point} the original \c{rsp}
- value remains in the chosen volatile register and no non-volatile
- register, except for \c{rsp}, is modified, while past
- \c{magic_point} \c{rsp} remains constant until the very end of the
- \c{function}. In this case the custom language-specific exception
- handler would look like this:
- \c EXCEPTION_DISPOSITION handler (EXCEPTION_RECORD *rec,ULONG64 frame,
- \c CONTEXT *context,DISPATCHER_CONTEXT *disp)
- \c { ULONG64 *rsp;
- \c if (context->Rip<(ULONG64)magic_point)
- \c rsp = (ULONG64 *)context->Rax;
- \c else
- \c { rsp = ((ULONG64 **)context->Rsp)[0];
- \c context->Rbp = rsp[-3];
- \c context->Rbx = rsp[-2];
- \c context->R15 = rsp[-1];
- \c }
- \c context->Rsp = (ULONG64)rsp;
- \c
- \c memcpy (disp->ContextRecord,context,sizeof(CONTEXT));
- \c RtlVirtualUnwind(UNW_FLAG_NHANDLER,disp->ImageBase,
- \c disp->ControlPc,disp->FunctionEntry,disp->ContextRecord,
- \c &disp->HandlerData,&disp->EstablisherFrame,NULL);
- \c return ExceptionContinueSearch;
- \c }
- As the custom handler mimics a leaf function, the corresponding
- \c{UNWIND_INFO} structure does not have to contain any information
- about the stack frame and its layout.
- \H{cofffmt} \i\c{coff}: \i{Common Object File Format}
- The \c{coff} output type produces \c{COFF} object files suitable for
- linking with the \i{DJGPP} linker.
- \c{coff} provides a default output file-name extension of \c{.o}.
- The \c{coff} format supports the same extensions to the \c{SECTION}
- directive as \c{win32} does, except that the \c{align} qualifier and
- the \c{info} section type are not supported.
- \H{machofmt} \I{Mach-O}\i\c{macho32} and \i\c{macho64}: \i{Mach Object File Format}
- The \c{macho32} and \c{macho64} output formats produce Mach-O
- object files suitable for linking with the \i{MacOS X} linker.
- \i\c{macho} is a synonym for \c{macho32}.
- \c{macho} provides a default output file-name extension of \c{.o}.
- \S{machosect} \c{macho} extensions to the \c{SECTION} Directive
- \I{SECTION, macho extensions to}
- The \c{macho} output format specifies section names in the format
- "\e{segment}\c{,}\e{section}". No spaces are allowed around the
- comma. The following flags can also be specified:
- \b \c{data} - this section contains initialized data items
- \b \c{code} - this section contains code exclusively
- \b \c{mixed} - this section contains both code and data
- \b \c{bss} - this section is uninitialized and filled with zero
- \b \c{zerofill} - same as \c{bss}
- \b \c{no_dead_strip} - inhibit dead code stripping for this section
- \b \c{live_support} - set the live support flag for this section
- \b \c{strip_static_syms} - strip static symbols for this section
- \b \c{debug} - this section contains debugging information
- \b \c{align=}\e{alignment} - specify section alignment
- The default is \c{data}, unless the section name is \c{__text} or
- \c{__bss} in which case the default is \c{text} or \c{bss},
- respectively.
- For compatibility with other Unix platforms, the following standard
- names are also supported:
- \c .text = __TEXT,__text text
- \c .rodata = __DATA,__const data
- \c .data = __DATA,__data data
- \c .bss = __DATA,__bss bss
- If the \c{.rodata} section contains no relocations, it is instead put
- into the \c{__TEXT,__const} section unless this section has already
- been specified explicitly. However, it is probably better to specify
- \c{__TEXT,__const} and \c{__DATA,__const} explicitly as appropriate.
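- As a minimal sketch of the explicit form, the following declares three
- sections using the "\e{segment}\c{,}\e{section}" syntax and the flags
- listed above. The section names \c{__mydata} and \c{__scratch} and the
- labels are hypothetical, chosen only for illustration:
- \c section __TEXT,__const               ; read-only data, no relocations
- \c magic:  dd      0x12345678
- \c
- \c section __DATA,__mydata data align=16 ; initialized data, 16-byte aligned
- \c table:  dq      1, 2, 3, 4
- \c
- \c section __DATA,__scratch bss          ; uninitialized, zero-filled
- \c buffer: resb    256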
- \S{machotls} \i{Thread Local Storage in Mach-O}\I{TLS}: \c{macho} special
- symbols and \i\c{WRT}
- Mach-O defines the following special symbols that can be used on the
- right-hand side of the \c{WRT} operator:
- \b \c{..tlvp} is used to specify access to thread-local storage.
- \b \c{..gotpcrel} is used to specify references to the Global Offset
- Table. The GOT is supported in the \c{macho64} format only.
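- For instance, a \c{macho64} module might load the value of an external
- variable through the GOT as sketched below. The symbol \c{_extvar} is
- hypothetical, and the code assumes RIP-relative addressing is in
- effect:
- \c         default rel
- \c         extern  _extvar                 ; hypothetical external variable
- \c
- \c         section .text
- \c         mov     rax,[rel _extvar wrt ..gotpcrel] ; address of _extvar, from its GOT slot
- \c         mov     eax,[rax]                        ; the variable's value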
- \S{macho-ssvs} \c{macho} specific directive \i\c{subsections_via_symbols}
- The directive \c{subsections_via_symbols} sets the
- \c{MH_SUBSECTIONS_VIA_SYMBOLS} flag in the Mach-O header. This flag
- tells the linker that each section may be divided into blocks
- (subsections) at symbol boundaries. Linkers often use this to
- eliminate dead code.
- This directive takes no arguments.
- This is a macro implemented as a \c{%pragma}. It can also be
- specified in its \c{%pragma} form, in which case it will not affect
- non-Mach-O builds of the same source code:
- \c %pragma macho subsections_via_symbols
- \S{macho-ssvs} \c{macho} specific directive \i\c{no_dead_strip}
- The directive \c{no_dead_strip} sets the Mach-O \c{SH_NO_DEAD_STRIP}
- section flag on the section containing a specific symbol. This
- directive takes a list of symbols as its arguments.
- This is a macro implemented as a \c{%pragma}. It can also be
- specified in its \c{%pragma} form, in which case it will not affect
- non-Mach-O builds of the same source code:
- \c %pragma macho no_dead_strip symbol...
- \S{macho-pext} \c{macho} specific extensions to the \c{GLOBAL}
- Directive: \i\c{private_extern}
- The \c{private_extern} extension to \c{GLOBAL} marks the symbol as
- having limited global scope. For example, you can declare a global
- symbol with this extension:
- \c global foo:private_extern
- \c foo:
- \c ; code
- Linking with the static linker clears the private extern attribute,
- but a linker option such as \c{-keep_private_externs} prevents this.
- \H{elffmt} \i\c{elf32}, \i\c{elf64}, \i\c{elfx32}: \I{ELF}\I{linux, elf}\i{Executable and Linkable
- Format} Object Files
- The \c{elf32}, \c{elf64} and \c{elfx32} output formats generate
- \c{ELF32} and \c{ELF64} (Executable and Linkable Format) object files, as
- used by Linux as well as \i{Unix System V}, including \i{Solaris x86},
- \i{UnixWare} and \i{SCO Unix}. \c{elf} provides a default output
- file-name extension of \c{.o}. \c{elf} is a synonym for \c{elf32}.
- The \c{elfx32} format is used for the \i{x32} ABI, which is a 32-bit
- ABI with the CPU in 64-bit mode.
- \S{abisect} ELF specific directive \i\c{osabi}
- The ELF header specifies the application binary interface for the
- target operating system (OSABI). This field can be set by using the
- \c{osabi} directive with the numeric value (0-255) of the target
- system. If this directive is not used, the default value will be "UNIX
- System V ABI" (0) which will work on most systems which support ELF.
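- As a small illustration, the line below would mark the object file as
- targeting FreeBSD. The value 9 is the \c{ELFOSABI_FREEBSD} constant
- from the ELF specification; it is used here only as an example value:
- \c osabi 9                 ; 9 = ELFOSABI_FREEBSD (example value)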
- \S{elfsect} \c{elf} extensions to the \c{SECTION} Directive
- \I{SECTION, elf extensions to}
- Like the \c{obj} format, \c{elf} allows you to specify additional
- information on the \c{SECTION} directive line, to control the type
- and properties of sections you declare. Section types and properties
- are generated automatically by NASM for the \i{standard section
- names}, but may still be
- overridden by these qualifiers.
- The available qualifiers are:
- \b \i\c{alloc} defines the section to be one which is loaded into
- memory when the program is run. \i\c{noalloc} defines it to be one
- which is not, such as an informational or comment section.
- \b \i\c{exec} defines the section to be one which should have execute
- permission when the program is run. \i\c{noexec} defines it as one
- which should not.
- \b \i\c{write} defines the section to be one which should be writable
- when the program is run. \i\c{nowrite} defines it as one which should
- not.
- \b \i\c{progbits} defines the section to be one with explicit contents
- stored in the object file: an ordinary code or data section, for
- example. \i\c{nobits} defines the section to be one with no explicit
- contents given, such as a BSS section.
- \b \c{align=}, used with a trailing number as in \c{obj}, gives the
- \I{section alignment, in elf}\I{alignment, in elf sections}alignment
- requirements of the section.
- \b \i\c{tls} defines the section to be one which contains
- thread local variables.
- The defaults assumed by NASM if you do not specify the above
- qualifiers are:
- \I\c{.text} \I\c{.rodata} \I\c{.lrodata} \I\c{.data} \I\c{.ldata}
- \I\c{.bss} \I\c{.lbss} \I\c{.tdata} \I\c{.tbss} \I\c{.comment}
- \c section .text    progbits  alloc    exec    nowrite  align=16
- \c section .rodata  progbits  alloc    noexec  nowrite  align=4
- \c section .lrodata progbits  alloc    noexec  nowrite  align=4
- \c section .data    progbits  alloc    noexec  write    align=4
- \c section .ldata   progbits  alloc    noexec  write    align=4
- \c section .bss     nobits    alloc    noexec  write    align=4
- \c section .lbss    nobits    alloc    noexec  write    align=4
- \c section .tdata   progbits  alloc    noexec  write    align=4   tls
- \c section .tbss    nobits    alloc    noexec  write    align=4   tls
- \c section .comment progbits  noalloc  noexec  nowrite  align=1
- \c section other    progbits  alloc    noexec  nowrite  align=1
- (Any section name other than those in the above table
- is treated by default like \c{other} in the above table.
- Please note that section names are case sensitive.)
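- To illustrate overriding these defaults, the sketch below declares
- two non-standard sections with explicit qualifiers. The section names
- \c{.mydata} and \c{.mycode} and the labels are hypothetical:
- \c section .mydata progbits alloc noexec write align=16
- \c counters: times 4 dd 0          ; writable data, 16-byte aligned
- \c
- \c section .mycode progbits alloc exec nowrite align=32
- \c helper: ret                     ; executable, read-only code
- Without the qualifiers, both sections would be treated like
- \c{other} in the table above, and \c{.mycode} would therefore not be
- executable.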
- \S{elfwrt} \i{Position-Independent Code}\I{PIC}: \c{elf} Special
- Symbols and \i\c{WRT}
- Since \c{ELF} does not support segment-base references, the \c{WRT}
- operator is not used for its normal purpose; therefore NASM's
- \c{elf} output format makes use of \c{WRT} for a different purpose,
- namely the PIC-specific \I{relocations, PIC-specific}relocation
- types.
- \c{elf} defines five special symbols which you can use as the
- right-hand side of the \c{WRT} operator to obtain PIC relocation
- types. They are \i\c{..gotpc}, \i\c{..gotoff}, \i\c{..got},
- \i\c{..plt} and \i\c{..sym}. Their functions are summarized here:
- \b Referring to the symbol marking the global offset table base
- using \c{wrt ..gotpc} will end up giving the distance from the
- beginning of the current section to the global offset table.
- (\i\c{_GLOBAL_OFFSET_TABLE_} is the standard symbol name used to
- refer to the \i{GOT}.) So you would then need to add \i\c{$$} to the
- result to get the real address of the GOT.
- \b Referring to a location in one of your own sections using \c{wrt
- ..gotoff} will give the distance from the beginning of the GOT to
- the specified location, so that adding on the address of the GOT
- would give the real address of the location you wanted.
- \b Referring to an external or global symbol using \c{wrt ..got}
- causes the linker to build an entry \e{in} the GOT containing the
- address of the symbol, and the reference gives the distance from the
- beginning of the GOT to the entry; so you can add on the address of
- the GOT, load from the resulting address, and end up with the
- address of the symbol.
- \b Referring to a procedure name using \c{wrt ..plt} causes the
- linker to build a \i{procedure linkage table} entry for the symbol,
- and the reference gives the address of the \i{PLT} entry. You can
- only use this in contexts which would generate a PC-relative
- relocation normally (i.e. as the destination for \c{CALL} or
- \c{JMP}), since ELF contains no relocation type to refer to PLT
- entries absolutely.
- \b Referring to a symbol name using \c{wrt ..sym} causes NASM to
- write an ordinary relocation, but instead of making the relocation
- relative to the start of the section and then adding on the offset
- to the symbol, it will write a relocation record aimed directly at
- the symbol in question. The distinction is a necessary one due to a
- peculiarity of the dynamic linker.
- A fuller explanation of how to use these relocation types to write
- shared libraries entirely in NASM is given in \k{picdll}.
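- As a brief sketch of how these symbols combine, the \c{elf32} routine
- below obtains the GOT address in \c{EBX} and then uses \c{..gotoff},
- \c{..got} and \c{..plt}. The names \c{demo}, \c{localvar},
- \c{extvar} and \c{extfunc} are hypothetical, and \c{extfunc} is
- assumed to take no arguments:
- \c         extern  _GLOBAL_OFFSET_TABLE_
- \c         extern  extvar                  ; hypothetical external variable
- \c         extern  extfunc                 ; hypothetical external function
- \c
- \c         section .data
- \c localvar: dd    0
- \c
- \c         section .text
- \c         global  demo:function
- \c demo:   push    ebx
- \c         call    .get_GOT
- \c .get_GOT:
- \c         pop     ebx                     ; EBX = address of .get_GOT
- \c         add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
- \c                                         ; EBX = address of the GOT
- \c         lea     eax,[ebx+localvar wrt ..gotoff] ; address of local data
- \c         mov     ecx,[ebx+extvar wrt ..got]      ; address of extvar
- \c         call    extfunc wrt ..plt               ; call via the PLT
- \c         pop     ebx
- \c         ret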
- \S{elftls} \i{Thread Local Storage in ELF}\I{TLS}: \c{elf} Special
- Symbols and \i\c{WRT}
- \b In ELF32 mode, referring to an external or global symbol using
- \c{wrt ..tlsie} \I\c{..tlsie}
- causes the linker to build an entry \e{in} the GOT containing the
- offset of the symbol within the TLS block, so you can access the value
- of the symbol with code such as:
- \c mov eax,[tid wrt ..tlsie]
- \c mov [gs:eax],ebx
- \b In ELF64 or ELFx32 mode, referring to an external or global symbol using
- \c{wrt ..gottpoff} \I\c{..gottpoff}
- causes the linker to build an entry \e{in} the GOT containing the
- offset of the symbol within the TLS block, so you can access the value
- of the symbol with code such as:
- \c mov rax,[rel tid wrt ..gottpoff]
- \c mov rcx,[fs:rax]
- \S{elfglob} \c{elf} Extensions to the \c{GLOBAL} Directive\I{GLOBAL,
- elf extensions to}\I{GLOBAL, aoutb extensions to}
- \c{ELF} object files can contain more information about a global symbol
- than just its address: they can contain the \I{symbol sizes,
- specifying}\I{size, of symbols}size of the symbol and its \I{symbol
- types, specifying}\I{type, of symbols}type as well. These are not
- merely debugger conveniences, but are actually necessary when the
- program being written is a \i{shared library}. NASM therefore
- supports some extensions to the \c{GLOBAL} directive, allowing you
- to specify these features.
- You can specify whether a global variable is a function or a data
- object by suffixing the name with a colon and the word
- \i\c{function} or \i\c{data}. (\i\c{object} is a synonym for
- \c{data}.) For example:
- \c global hashlookup:function, hashtable:data
- exports the global symbol \c{hashlookup} as a function and
- \c{hashtable} as a data object.
- Optionally, you can control the ELF visibility of the symbol by
- adding one of the visibility keywords: \i\c{default}, \i\c{internal},
- \i\c{hidden}, or \i\c{protected}. If no keyword is given, the
- visibility is \c{default}. For example, to make \c{hashlookup} hidden:
- \c global hashlookup:function hidden
- You can also specify the size of the data associated with the
- symbol, as a numeric expression (which may involve labels, and even
- forward references) after the type specifier. For example:
- \c global hashtable:data (hashtable.end - hashtable)
- \c
- \c hashtable:
- \c db this,that,theother ; some data here
- \c .end:
- This makes NASM automatically calculate the length of the table and
- place that information into the \c{ELF} symbol table.
- Declaring the type and size of global symbols is necessary when
- writing shared library code. For more information, see
- \k{picglobal}.
- \S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive
- \I{COMMON, elf extensions to}
- \c{ELF} also allows you to specify alignment requirements \I{common
- variables, alignment in elf}\I{alignment, of elf common variables}on
- common variables. This is done by putting a number (which must be a
- power of two) after the name and size of the common variable,
- separated (as usual) by a colon. For example, an array of
- doublewords would benefit from 4-byte alignment:
- \c common dwordarray 128:4
- This declares the total size of the array to be 128 bytes, and
- requires that it be aligned on a 4-byte boundary.
- \S{elf16} 16-bit code and ELF
- \I{ELF, 16-bit code and}
- The \c{ELF32} specification doesn't provide relocations for 8- and
- 16-bit values, but the GNU \c{ld} linker adds these as an extension.
- NASM can generate GNU-compatible relocations, to allow 16-bit code to
- be linked as ELF using GNU \c{ld}. If NASM is used with the
- \c{-w+gnu-elf-extensions} option, a warning is issued when one of
- these relocations is generated.
- \S{elfdbg} Debug formats and ELF
- \I{ELF, Debug formats and}
- ELF provides debug information in \c{STABS} and \c{DWARF} formats.
- Line number information is generated for all executable sections, but please
- note that only the ".text" section is executable by default.
- \H{aoutfmt} \i\c{aout}: Linux \I{a.out, Linux version}\I{linux, a.out}\c{a.out} Object Files
- The \c{aout} format generates \c{a.out} object files, in the form used
- by early Linux systems (current Linux systems use ELF; see
- \k{elffmt}). These differ from other \c{a.out} object files in that
- the magic number in the first four bytes of the file is
- different; also, some implementations of \c{a.out}, for example
- NetBSD's, support position-independent code, which Linux's
- implementation does not.
- \c{aout} provides a default output file-name extension of \c{.o}.
- \c{a.out} is a very simple object format. It supports no special
- directives, no special symbols, no use of \c{SEG} or \c{WRT}, and no
- extensions to any standard directives. It supports only the three
- \i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}.
- \H{aoutfmt} \i\c{aoutb}: \i{NetBSD}/\i{FreeBSD}/\i{OpenBSD}
- \I{a.out, BSD version}\c{a.out} Object Files
- The \c{aoutb} format generates \c{a.out} object files, in the form
- used by the various free \c{BSD Unix} clones, \c{NetBSD}, \c{FreeBSD}
- and \c{OpenBSD}. For simple object files, this object format is exactly
- the same as \c{aout} except for the magic number in the first four bytes
- of the file. However, the \c{aoutb} format supports
- \I{PIC}\i{position-independent code} in the same way as the \c{elf}
- format, so you can use it to write \c{BSD} \i{shared libraries}.
- \c{aoutb} provides a default output file-name extension of \c{.o}.
- \c{aoutb} supports no special directives, no special symbols, and
- only the three \i{standard section names} \i\c{.text}, \i\c{.data}
- and \i\c{.bss}. However, it also supports the same use of \i\c{WRT} as
- \c{elf} does, to provide position-independent code relocation types.
- See \k{elfwrt} for full documentation of this feature.
- \c{aoutb} also supports the same extensions to the \c{GLOBAL}
- directive as \c{elf} does: see \k{elfglob} for documentation of
- this.
- \H{as86fmt} \c{as86}: \i{Minix}/Linux\I{linux, as86} \i\c{as86} Object Files
- The Minix/Linux 16-bit assembler \c{as86} has its own non-standard
- object file format. Although its companion linker \i\c{ld86} produces
- something close to ordinary \c{a.out} binaries as output, the object
- file format used to communicate between \c{as86} and \c{ld86} is not
- itself \c{a.out}.
- NASM supports this format, just in case it is useful, as \c{as86}.
- \c{as86} provides a default output file-name extension of \c{.o}.
- \c{as86} is a very simple object format (from the NASM user's point
- of view). It supports no special directives, no use of \c{SEG} or \c{WRT},
- and no extensions to any standard directives. It supports only the three
- \i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}. The
- only special symbol supported is \c{..start}.
- \H{rdffmt} \I{RDOFF}\i\c{rdf}: \i{Relocatable Dynamic Object File
- Format}
- The \c{rdf} output format produces \c{RDOFF} object files. \c{RDOFF}
- (Relocatable Dynamic Object File Format) is a home-grown object-file
- format, designed alongside NASM itself and reflecting in its file
- format the internal structure of the assembler.
- \c{RDOFF} is not used by any well-known operating systems. Those
- writing their own systems, however, may well wish to use \c{RDOFF}
- as their object format, on the grounds that it is designed primarily
- for simplicity and contains very little file-header bureaucracy.
- The Unix NASM archive, and the DOS archive which includes sources,
- both contain an \I{rdoff subdirectory}\c{rdoff} subdirectory holding
- a set of RDOFF utilities: an RDF linker, an \c{RDF} static-library
- manager, an RDF file dump utility, and a program which will load and
- execute an RDF executable under Linux.
- \c{rdf} supports only the \i{standard section names} \i\c{.text},
- \i\c{.data} and \i\c{.bss}.
- \S{rdflib} Requiring a Library: The \i\c{LIBRARY} Directive
- \c{RDOFF} contains a mechanism for an object file to demand a given
- library to be linked to the module, either at load time or run time.
- This is done by the \c{LIBRARY} directive, which takes one argument
- which is the name of the module:
- \c library mylib.rdl
- \S{rdfmod} Specifying a Module Name: The \i\c{MODULE} Directive
- A special \c{RDOFF} header record is used to store the name of the
- module. It can be used, for example, by a run-time loader to perform
- dynamic linking. The \c{MODULE} directive takes one argument, which is
- the name of the current module:
- \c module mymodname
- Note that when you statically link modules and tell the linker to strip
- the symbols from the output file, all module names will be stripped too.
- To avoid this, start module names with \I{$, prefix}\c{$}, for example:
- \c module $kernel.core
- \S{rdfglob} \c{rdf} Extensions to the \c{GLOBAL} Directive\I{GLOBAL,
- rdf extensions to}
- \c{RDOFF} global symbols can contain additional information needed by
- the static linker. You can mark a global symbol as exported, thus
- telling the linker not to strip it from the target executable or
- library file. As in \c{ELF}, you can also specify whether an exported
- symbol is a procedure (function) or data object.
- To export a symbol, suffix its name with a colon and the word
- \i\c{export}:
- \c global sys_open:export
- To specify that an exported symbol is a procedure (function), add the
- word \i\c{proc} or \i\c{function} after the declaration:
- \c global sys_open:export proc
- Similarly, to specify an exported data object, add the word \i\c{data}
- or \i\c{object} to the directive:
- \c global kernel_ticks:export data
- \S{rdfimpt} \c{rdf} Extensions to the \c{EXTERN} Directive\I{EXTERN,
- rdf extensions to}
- By default the \c{EXTERN} directive in \c{RDOFF} declares a "pure external"
- symbol (i.e. the static linker will complain if such a symbol is not resolved).
- To declare an "imported" symbol, which must be resolved later during a dynamic
- linking phase, \c{RDOFF} offers an additional \c{import} modifier. As in
- \c{GLOBAL}, you can also specify whether an imported symbol is a procedure
- (function) or data object. For example:
- \c library $libc
- \c extern _open:import
- \c extern _printf:import proc
- \c extern _errno:import data
- Here the directive \c{LIBRARY} is also included, which gives the dynamic linker
- a hint as to where to find requested symbols.
- \H{dbgfmt} \i\c{dbg}: Debugging Format
- The \c{dbg} format does not output an object file as such; instead,
- it outputs a text file which contains a complete list of all the
- transactions between the main body of NASM and the output-format
- back end module. It is primarily intended to aid people who want to
- write their own output drivers, so that they can get a clearer idea
- of the various requests the main program makes of the output driver,
- and in what order they happen.
- For simple files, one can easily use the \c{dbg} format like this:
- \c nasm -f dbg filename.asm
- which will generate a diagnostic file called \c{filename.dbg}.
- However, this will not work well on files which were designed for a
- different object format, because each object format defines its own
- macros (usually user-level forms of directives), and those macros
- will not be defined in the \c{dbg} format. Therefore it can be
- useful to run NASM twice, in order to do the preprocessing with the
- native object format selected:
- \c nasm -e -f rdf -o rdfprog.i rdfprog.asm
- \c nasm -a -f dbg rdfprog.i
- This preprocesses \c{rdfprog.asm} into \c{rdfprog.i}, keeping the
- \c{rdf} object format selected in order to make sure RDF special
- directives are converted into primitive form correctly. Then the
- preprocessed source is fed through the \c{dbg} format to generate
- the final diagnostic output.
- This workaround will still typically not work for programs intended
- for \c{obj} format, because the \c{obj} \c{SEGMENT} and \c{GROUP}
- directives have side effects of defining the segment and group names
- as symbols; \c{dbg} will not do this, so the program will not
- assemble. You will have to work around that by defining the symbols
- yourself (using \c{EXTERN}, for example) if you really need to get a
- \c{dbg} trace of an \c{obj}-specific source file.
- \c{dbg} accepts any section name and any directives at all, and logs
- them all to its output file.
- \c{dbg} accepts and logs any \c{%pragma}, but the specific
- \c{%pragma}:
- \c %pragma dbg maxdump <size>
- where \c{<size>} is either a number or \c{unlimited}, can be used to
- control the maximum size for dumping the full contents of a
- \c{rawdata} output object.
- \C{16bit} Writing 16-bit Code (DOS, Windows 3/3.1)
- This chapter attempts to cover some of the common issues encountered
- when writing 16-bit code to run under \c{MS-DOS} or \c{Windows 3.x}. It
- covers how to link programs to produce \c{.EXE} or \c{.COM} files,
- how to write \c{.SYS} device drivers, and how to interface assembly
- language code with 16-bit C compilers and with Borland Pascal.
- \H{exefiles} Producing \i\c{.EXE} Files
- Any large program written under DOS needs to be built as a \c{.EXE}
- file: only \c{.EXE} files have the necessary internal structure
- required to span more than one 64K segment. \i{Windows} programs,
- also, have to be built as \c{.EXE} files, since Windows does not
- support the \c{.COM} format.
- In general, you generate \c{.EXE} files by using the \c{obj} output
- format to produce one or more \i\c{.OBJ} files, and then linking
- them together using a linker. However, NASM also supports the direct
- generation of simple DOS \c{.EXE} files using the \c{bin} output
- format (by using \c{DB} and \c{DW} to construct the \c{.EXE} file
- header), and a macro package is supplied to do this. Thanks to
- Yann Guidon for contributing the code for this.
- NASM may also support \c{.EXE} natively as another output format in
- future releases.
- \S{objexe} Using the \c{obj} Format To Generate \c{.EXE} Files
- This section describes the usual method of generating \c{.EXE} files
- by linking \c{.OBJ} files together.
- Most 16-bit programming language packages come with a suitable
- linker; if you have none of these, there is a free linker called
- \i{VAL}\I{linker, free}, available in \c{LZH} archive format from
- \W{ftp://x2ftp.oulu.fi/pub/msdos/programming/lang/}\i\c{x2ftp.oulu.fi}.
- An LZH archiver can be found at
- \W{ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers}\i\c{ftp.simtel.net}.
- There is another `free' linker (though this one doesn't come with
- sources) called \i{FREELINK}, available from
- \W{http://www.pcorner.com/tpc/old/3-101.html}\i\c{www.pcorner.com}.
- A third, \i\c{djlink}, written by DJ Delorie, is available at
- \W{http://www.delorie.com/djgpp/16bit/djlink/}\i\c{www.delorie.com}.
- A fourth linker, \i\c{ALINK}, written by Anthony A.J. Williams, is
- available at \W{http://alink.sourceforge.net}\i\c{alink.sourceforge.net}.
- When linking several \c{.OBJ} files into a \c{.EXE} file, you should
- ensure that exactly one of them has a start point defined (using the
- \I{program entry point}\i\c{..start} special symbol defined by the
- \c{obj} format: see \k{dotdotstart}). If no module defines a start
- point, the linker will not know what value to give the entry-point
- field in the output file header; if more than one defines a start
- point, the linker will not know \e{which} value to use.
- An example of a NASM source file which can be assembled to a
- \c{.OBJ} file and linked on its own to a \c{.EXE} is given here. It
- demonstrates the basic principles of defining a stack, initialising
- the segment registers, and declaring a start point. This file is
- also provided in the \I{test subdirectory}\c{test} subdirectory of
- the NASM archives, under the name \c{objexe.asm}.
- \c segment code
- \c
- \c ..start:
- \c mov ax,data
- \c mov ds,ax
- \c mov ax,stack
- \c mov ss,ax
- \c mov sp,stacktop
- This initial piece of code sets up \c{DS} to point to the data
- segment, and initializes \c{SS} and \c{SP} to point to the top of
- the provided stack. Notice that interrupts are implicitly disabled
- for one instruction after a move into \c{SS}, precisely for this
- situation, so that there's no chance of an interrupt occurring
- between the loads of \c{SS} and \c{SP} and not having a stack to
- execute on.
- Note also that the special symbol \c{..start} is defined at the
- beginning of this code, which means that this will be the entry point
- into the resulting executable file.
- \c mov dx,hello
- \c mov ah,9
- \c int 0x21
- The above is the main program: load \c{DS:DX} with a pointer to the
- greeting message (\c{hello} is implicitly relative to the segment
- \c{data}, which was loaded into \c{DS} in the setup code, so the
- full pointer is valid), and call the DOS print-string function.
- \c mov ax,0x4c00
- \c int 0x21
- This terminates the program using another DOS system call.
- \c segment data
- \c
- \c hello: db 'hello, world', 13, 10, '$'
- The data segment contains the string we want to display.
- \c segment stack stack
- \c resb 64
- \c stacktop:
- The above code declares a stack segment containing 64 bytes of
- uninitialized stack space, and points \c{stacktop} at the top of it.
- The directive \c{segment stack stack} defines a segment \e{called}
- \c{stack}, and also of \e{type} \c{STACK}. The latter is not
- necessary to the correct running of the program, but linkers are
- likely to issue warnings or errors if your program has no segment of
- type \c{STACK}.
- The above file, when assembled into a \c{.OBJ} file, will link on
- its own to a valid \c{.EXE} file, which when run will print `hello,
- world' and then exit.
- \S{binexe} Using the \c{bin} Format To Generate \c{.EXE} Files
- The \c{.EXE} file format is simple enough that it's possible to
- build a \c{.EXE} file by writing a pure-binary program and sticking
- a 32-byte header on the front. This header is simple enough that it
- can be generated using \c{DB} and \c{DW} commands by NASM itself, so
- that you can use the \c{bin} output format to directly generate
- \c{.EXE} files.
- Included in the NASM archives, in the \I{misc subdirectory}\c{misc}
- subdirectory, is a file \i\c{exebin.mac} of macros. It defines three
- macros: \i\c{EXE_begin}, \i\c{EXE_stack} and \i\c{EXE_end}.
- To produce a \c{.EXE} file using this method, you should start by
- using \c{%include} to load the \c{exebin.mac} macro package into
- your source file. You should then issue the \c{EXE_begin} macro call
- (which takes no arguments) to generate the file header data. Then
- write code as normal for the \c{bin} format - you can use all three
- standard sections \c{.text}, \c{.data} and \c{.bss}. At the end of
- the file you should call the \c{EXE_end} macro (again, no arguments),
- which defines some symbols to mark section sizes, and these symbols
- are referred to in the header code generated by \c{EXE_begin}.
- In this model, the code you end up writing starts at \c{0x100}, just
- like a \c{.COM} file - in fact, if you strip off the 32-byte header
- from the resulting \c{.EXE} file, you will have a valid \c{.COM}
- program. All the segment bases are the same, so you are limited to a
- 64K program, again just like a \c{.COM} file. Note that an \c{ORG}
- directive is issued by the \c{EXE_begin} macro, so you should not
- explicitly issue one of your own.
- You can't directly refer to your segment base value, unfortunately,
- since this would require a relocation in the header, and things
- would get a lot more complicated. So you should get your segment
- base by copying it out of \c{CS} instead.
- On entry to your \c{.EXE} file, \c{SS:SP} are already set up to
- point to the top of a 2Kb stack. You can adjust the default stack
- size of 2Kb by calling the \c{EXE_stack} macro. For example, to
- change the stack size of your program to 64 bytes, you would call
- \c{EXE_stack 64}.
- A sample program which generates a \c{.EXE} file in this way is
- given in the \c{test} subdirectory of the NASM archive, as
- \c{binexe.asm}.
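- The overall shape of such a source file can be sketched as follows.
- This is an illustrative outline modelled on the description above
- rather than a verbatim copy of the supplied sample, and it assumes
- that \c{exebin.mac} can be found on the include path:
- \c %include "exebin.mac"
- \c
- \c         EXE_begin               ; emits the header and issues ORG
- \c         EXE_stack 64            ; override the default 2Kb stack
- \c
- \c         section .text
- \c
- \c         mov     ax,cs           ; segment base is taken from CS
- \c         mov     ds,ax
- \c
- \c         mov     dx,hello
- \c         mov     ah,9            ; DOS print-string function
- \c         int     0x21
- \c
- \c         mov     ax,0x4c00       ; DOS terminate function
- \c         int     0x21
- \c
- \c         section .data
- \c
- \c hello:  db      'hello, world', 13, 10, '$'
- \c
- \c         EXE_end                 ; defines the size symbols for the header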
- \H{comfiles} Producing \i\c{.COM} Files
- While large DOS programs must be written as \c{.EXE} files, small
- ones are often better written as \c{.COM} files. \c{.COM} files are
- pure binary, and therefore most easily produced using the \c{bin}
- output format.
- \S{combinfmt} Using the \c{bin} Format To Generate \c{.COM} Files
- \c{.COM} files expect to be loaded at offset \c{100h} into their
- segment (though the segment may change). Execution then begins at
- \I\c{ORG}\c{100h}, i.e. right at the start of the program. So to
- write a \c{.COM} program, you would create a source file looking
- like
- \c org 100h
- \c
- \c section .text
- \c
- \c start:
- \c ; put your code here
- \c
- \c section .data
- \c
- \c ; put data items here
- \c
- \c section .bss
- \c
- \c ; put uninitialized data here
- The \c{bin} format puts the \c{.text} section first in the file, so
- you can declare data or BSS items before beginning to write code if
- you want to and the code will still end up at the front of the file
- where it belongs.
- The BSS (uninitialized data) section does not take up space in the
- \c{.COM} file itself: instead, addresses of BSS items are resolved
- to point at space beyond the end of the file, on the grounds that
- this will be free memory when the program is run. Therefore you
- should not rely on your BSS being initialized to all zeros when you
- run.
- To assemble the above program, you should use a command line like
- \c nasm myprog.asm -fbin -o myprog.com
- The \c{bin} format would produce a file called \c{myprog} if no
- explicit output file name were specified, so you have to override it
- and give the desired file name.
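- As a concrete sketch of the skeleton above, a minimal \c{.COM}
- program that prints a message and exits could look like this. The
- label names are arbitrary; since all segment registers hold the same
- value on entry, \c{DS} does not need to be set up:
- \c         org 100h
- \c
- \c section .text
- \c
- \c start:
- \c         mov     dx,hello        ; DS:DX -> '$'-terminated string
- \c         mov     ah,9            ; DOS print-string function
- \c         int     21h
- \c         mov     ax,4C00h        ; DOS terminate, exit code 0
- \c         int     21h
- \c
- \c section .data
- \c
- \c hello:  db      'hello, world', 13, 10, '$'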
- \S{comobjfmt} Using the \c{obj} Format To Generate \c{.COM} Files
- If you are writing a \c{.COM} program as more than one module, you
- may wish to assemble several \c{.OBJ} files and link them together
- into a \c{.COM} program. You can do this, provided you have a linker
- capable of outputting \c{.COM} files directly (\i{TLINK} does this),
- or alternatively a converter program such as \i\c{EXE2BIN} to
- transform the \c{.EXE} file output from the linker into a \c{.COM}
- file.
- If you do this, you need to take care of several things:
- \b The first object file containing code should start its code
- segment with a line like \c{RESB 100h}. This is to ensure that the
- code begins at offset \c{100h} relative to the beginning of the code
- segment, so that the linker or converter program does not have to
- adjust address references within the file when generating the
- \c{.COM} file. Other assemblers use an \i\c{ORG} directive for this
- purpose, but \c{ORG} in NASM is a format-specific directive to the
- \c{bin} output format, and does not mean the same thing as it does
- in MASM-compatible assemblers.
- \b You don't need to define a stack segment.
- \b All your segments should be in the same group, so that every time
- your code or data references a symbol offset, all offsets are
- relative to the same segment base. This is because, when a \c{.COM}
- file is loaded, all the segment registers contain the same value.
- \H{sysfiles} Producing \i\c{.SYS} Files
- \i{MS-DOS device drivers} - \c{.SYS} files - are pure binary files,
- similar to \c{.COM} files, except that they start at origin zero
- rather than \c{100h}. Therefore, if you are writing a device driver
- using the \c{bin} format, you do not need the \c{ORG} directive,
- since the default origin for \c{bin} is zero. Similarly, if you are
- using \c{obj}, you do not need the \c{RESB 100h} at the start of
- your code segment.
- \c{.SYS} files start with a header structure, containing pointers to
- the various routines inside the driver which do the work. This
- structure should be defined at the start of the code segment, even
- though it is not actually code.
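- For orientation only, the sketch below lays out the standard MS-DOS
- device header for a character device using the \c{bin} format. The
- device name \c{MYDEV} and the labels are hypothetical, and the two
- routines are empty placeholders: a working driver must save the
- request pointer in its strategy routine and process the request in
- its interrupt routine, as described in the references mentioned
- below.
- \c         section .text           ; no ORG: the default origin is zero
- \c
- \c header: dd      -1              ; link to next driver, filled in by DOS
- \c         dw      0x8000          ; attribute word: bit 15 = character device
- \c         dw      strat           ; offset of the strategy routine
- \c         dw      intr            ; offset of the interrupt routine
- \c         db      'MYDEV   '      ; device name, padded to 8 bytes
- \c
- \c strat:  retf                    ; placeholder only
- \c intr:   retf                    ; placeholder only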
- For more information on the format of \c{.SYS} files, and the data
- which has to go in the header structure, a list of books is given in
- the Frequently Asked Questions list for the newsgroup
- \W{news:comp.os.msdos.programmer}\i\c{comp.os.msdos.programmer}.
- \H{16c} Interfacing to 16-bit C Programs
- This section covers the basics of writing assembly routines that
- call, or are called from, C programs. To do this, you would
- typically write an assembly module as a \c{.OBJ} file, and link it
- with your C modules to produce a \i{mixed-language program}.
- \S{16cunder} External Symbol Names
- \I{C symbol names}\I{underscore, in C symbols}C compilers have the
- convention that the names of all global symbols (functions or data)
- they define are formed by prefixing an underscore to the name as it
- appears in the C program. So, for example, the function a C
- programmer thinks of as \c{printf} appears to an assembly language
- programmer as \c{_printf}. This means that in your assembly
- programs, you can define symbols without a leading underscore, and
- not have to worry about name clashes with C symbols.
- If you find the underscores inconvenient, you can define macros to
- replace the \c{GLOBAL} and \c{EXTERN} directives as follows:
- \c %macro cglobal 1
- \c
- \c global _%1
- \c %define %1 _%1
- \c
- \c %endmacro
- \c
- \c %macro cextern 1
- \c
- \c extern _%1
- \c %define %1 _%1
- \c
- \c %endmacro
- (These forms of the macros only take one argument at a time; a
- \c{%rep} construct could solve this.)
- If you then declare an external like this:
- \c cextern printf
- then the macro will expand it as
- \c extern _printf
- \c %define printf _printf
- Thereafter, you can reference \c{printf} as if it were a symbol, and
- the preprocessor will put the leading underscore on where necessary.
- The \c{cglobal} macro works similarly. You must use \c{cglobal}
- before defining the symbol in question, but you would have had to do
- that anyway if you used \c{GLOBAL}.
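- The one-argument limitation noted above can be lifted with \c{%rep}
- and \c{%rotate}. The following is only a sketch, assuming a NASM
- version whose preprocessor accepts the \c{1-*} parameter range; the
- \c{cextern} macro could be extended in the same way:
- \c %macro cglobal 1-*
- \c
- \c %rep %0
- \c global _%1
- \c %define %1 _%1
- \c %rotate 1
- \c %endrep
- \c
- \c %endmacro
- With this form, a line such as \c{cglobal foo, bar} (hypothetical
- names) should declare both \c{_foo} and \c{_bar} and alias the plain
- names to them.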
- Also see \k{opt-pfix}.
- \S{16cmodels} \i{Memory Models}
- NASM contains no mechanism to support the various C memory models
- directly; you have to keep track yourself of which one you are
- writing for. This means you have to keep track of the following
- things:
- \b In models using a single code segment (tiny, small and compact),
- functions are near. This means that function pointers, when stored
- in data segments or pushed on the stack as function arguments, are
- 16 bits long and contain only an offset field (the \c{CS} register
- never changes its value, and always gives the segment part of the
- full function address), and that functions are called using ordinary
- near \c{CALL} instructions and return using \c{RETN} (which, in
- NASM, is synonymous with \c{RET} anyway). This means both that you
- should write your own routines to return with \c{RETN}, and that you
- should call external C routines with near \c{CALL} instructions.
- \b In models using more than one code segment (medium, large and
- huge), functions are far. This means that function pointers are 32
- bits long (consisting of a 16-bit offset followed by a 16-bit
- segment), and that functions are called using \c{CALL FAR} (or
- \c{CALL seg:offset}) and return using \c{RETF}. Again, you should
- therefore write your own routines to return with \c{RETF} and use
- \c{CALL FAR} to call external routines.
- \b In models using a single data segment (tiny, small and medium),
- data pointers are 16 bits long, containing only an offset field (the
- \c{DS} register doesn't change its value, and always gives the
- segment part of the full data item address).
- \b In models using more than one data segment (compact, large and
- huge), data pointers are 32 bits long, consisting of a 16-bit offset
- followed by a 16-bit segment. You should still be careful not to
- modify \c{DS} in your routines without restoring it afterwards, but
- \c{ES} is free for you to use to access the contents of 32-bit data
- pointers you are passed.
- \b The huge memory model allows single data items to exceed 64K in
- size. In all other memory models, you can access the whole of a data
- item just by doing arithmetic on the offset field of the pointer you
- are given, whether a segment field is present or not; in huge model,
- you have to be more careful of your pointer arithmetic.
- \b In most memory models, there is a \e{default} data segment, whose
- segment address is kept in \c{DS} throughout the program. This data
- segment is typically the same segment as the stack, kept in \c{SS},
- so that functions' local variables (which are stored on the stack)
- and global data items can both be accessed easily without changing
- \c{DS}. Particularly large data items are typically stored in other
- segments. However, some memory models (though not the standard
- ones, usually) allow the assumption that \c{SS} and \c{DS} hold the
- same value to be removed. Be careful about functions' local
- variables in this latter case.
- In models with a single code segment, the segment is called
- \i\c{_TEXT}, so your code segment must also go by this name in order
- to be linked into the same place as the main code segment. In models
- with a single data segment, or with a default data segment, it is
- called \i\c{_DATA}.
- \S{16cfunc} Function Definitions and Function Calls
- \I{functions, C calling convention}The \i{C calling convention} in
- 16-bit programs is as follows. In the following description, the
- words \e{caller} and \e{callee} are used to denote the function
- doing the calling and the function which gets called.
- \b The caller pushes the function's parameters on the stack, one
- after another, in reverse order (right to left, so that the first
- argument specified to the function is pushed last).
- \b The caller then executes a \c{CALL} instruction to pass control
- to the callee. This \c{CALL} is either near or far depending on the
- memory model.
- \b The callee receives control, and typically (although this is not
- actually necessary, in functions which do not need to access their
- parameters) starts by saving the value of \c{SP} in \c{BP} so as to
- be able to use \c{BP} as a base pointer to find its parameters on
- the stack. However, the caller was probably doing this too, so part
- of the calling convention states that \c{BP} must be preserved by
- any C function. Hence the callee, if it is going to set up \c{BP} as
- a \i\e{frame pointer}, must push the previous value first.
- \b The callee may then access its parameters relative to \c{BP}.
- The word at \c{[BP]} holds the previous value of \c{BP} as it was
- pushed; the next word, at \c{[BP+2]}, holds the offset part of the
- return address, pushed implicitly by \c{CALL}. In a small-model
- (near) function, the parameters start after that, at \c{[BP+4]}; in
- a large-model (far) function, the segment part of the return address
- lives at \c{[BP+4]}, and the parameters begin at \c{[BP+6]}. The
- leftmost parameter of the function, since it was pushed last, is
- accessible at this offset from \c{BP}; the others follow, at
- successively greater offsets. Thus, in a function such as \c{printf}
- which takes a variable number of parameters, the pushing of the
- parameters in reverse order means that the function knows where to
- find its first parameter, which tells it the number and type of the
- remaining ones.
- \b The callee may also wish to decrease \c{SP} further, so as to
- allocate space on the stack for local variables, which will then be
- accessible at negative offsets from \c{BP}.
- \b The callee, if it wishes to return a value to the caller, should
- leave the value in \c{AL}, \c{AX} or \c{DX:AX} depending on the size
- of the value. Floating-point results are sometimes (depending on the
- compiler) returned in \c{ST0}.
- \b Once the callee has finished processing, it restores \c{SP} from
- \c{BP} if it had allocated local stack space, then pops the previous
- value of \c{BP}, and returns via \c{RETN} or \c{RETF} depending on
- memory model.
- \b When the caller regains control from the callee, the function
- parameters are still on the stack, so it typically adds an immediate
- constant to \c{SP} to remove them (instead of executing a number of
- slow \c{POP} instructions). Thus, if a function is accidentally
- called with the wrong number of parameters due to a prototype
- mismatch, the stack will still be returned to a sensible state since
- the caller, which \e{knows} how many parameters it pushed, does the
- removing.
- It is instructive to compare this calling convention with that for
- Pascal programs (described in \k{16bpfunc}). Pascal has a simpler
- convention, since no functions have variable numbers of parameters.
- Therefore the callee knows how many parameters it should have been
- passed, and is able to deallocate them from the stack itself by
- passing an immediate argument to the \c{RET} or \c{RETF}
- instruction, so the caller does not have to do it. Also, the
- parameters are pushed in left-to-right order, not right-to-left,
- which means that a compiler can give better guarantees about
- sequence points without performance suffering.
- Thus, you would define a function in C style in the following way.
- The following example is for small model:
- \c global _myfunc
- \c
- \c _myfunc:
- \c push bp
- \c mov bp,sp
- \c sub sp,0x40 ; 64 bytes of local stack space
- \c mov bx,[bp+4] ; first parameter to function
- \c
- \c ; some more code
- \c
- \c mov sp,bp ; undo "sub sp,0x40" above
- \c pop bp
- \c ret
- For a large-model function, you would replace \c{RET} by \c{RETF},
- and look for the first parameter at \c{[BP+6]} instead of
- \c{[BP+4]}. Of course, if one of the parameters is a pointer, then
- the offsets of \e{subsequent} parameters will change depending on
- the memory model as well: far pointers take up four bytes on the
- stack when passed as a parameter, whereas near pointers take up two.
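- As a sketch of how the offsets shift, consider a hypothetical
- large-model function \c{int addfar(int *p, int n)}; the name and
- prototype are assumptions made for illustration. The far pointer
- \c{p} occupies four bytes starting at \c{[BP+6]}, so \c{n} is found
- at \c{[BP+10]}:
- \c global _addfar
- \c
- \c _addfar:
- \c push bp
- \c mov bp,sp
- \c les bx,[bp+6] ; ES:BX = p (offset word, then segment word)
- \c mov ax,[es:bx] ; AX = *p
- \c add ax,[bp+10] ; n follows the four-byte pointer
- \c pop bp
- \c retf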
- At the other end of the process, to call a C function from your
- assembly code, you would do something like this:
- \c extern _printf
- \c
- \c ; and then, further down...
- \c
- \c push word [myint] ; one of my integer variables
- \c push word mystring ; pointer into my data segment
- \c call _printf
- \c add sp,byte 4 ; `byte' saves space
- \c
- \c ; then those data items...
- \c
- \c segment _DATA
- \c
- \c myint dw 1234
- \c mystring db 'This number -> %d <- should be 1234',10,0
- This piece of code is the small-model assembly equivalent of the C
- code
- \c int myint = 1234;
- \c printf("This number -> %d <- should be 1234\n", myint);
- In large model, the function-call code might look more like this. In
- this example, it is assumed that \c{DS} already holds the segment
- base of the segment \c{_DATA}. If not, you would have to initialize
- it first.
- \c push word [myint]
- \c push word seg mystring ; Now push the segment, and...
- \c push word mystring ; ... offset of "mystring"
- \c call far _printf
- \c add sp,byte 6
- The integer value still takes up one word on the stack, since large
- model does not affect the size of the \c{int} data type. The first
- argument (pushed last) to \c{printf}, however, is a data pointer,
- and therefore has to contain a segment and offset part. The segment
- should be stored second in memory, and therefore must be pushed
- first. (Of course, \c{PUSH DS} would have been a shorter instruction
- than \c{PUSH WORD SEG mystring}, if \c{DS} was set up as the above
- example assumed.) Then the actual call becomes a far call, since
- functions expect far calls in large model; and \c{SP} has to be
- increased by 6 rather than 4 afterwards to make up for the extra
- word of parameters.
- \S{16cdata} Accessing Data Items
- To get at the contents of C variables, or to declare variables which
- C can access, you need only declare the names as \c{GLOBAL} or
- \c{EXTERN}. (Again, the names require leading underscores, as stated
- in \k{16cunder}.) Thus, a C variable declared as \c{int i} can be
- accessed from assembler as
- \c extern _i
- \c
- \c mov ax,[_i]
- And to declare your own integer variable which C programs can access
- as \c{extern int j}, you do this (making sure you are assembling in
- the \c{_DATA} segment, if necessary):
- \c global _j
- \c
- \c _j dw 0
- To access a C array, you need to know the size of the components of
- the array. For example, \c{int} variables are two bytes long, so if
- a C program declares an array as \c{int a[10]}, you can access
- \c{a[3]} by coding \c{mov ax,[_a+6]}. (The byte offset 6 is obtained
- by multiplying the desired array index, 3, by the size of the array
- element, 2.) The sizes of the C base types in 16-bit compilers are:
- 1 for \c{char}, 2 for \c{short} and \c{int}, 4 for \c{long} and
- \c{float}, and 8 for \c{double}.
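- When the index is only known at run time, the same multiplication
- has to be done in code. A minimal sketch, assuming a hypothetical
- word variable \c{index} in the default data segment:
- \c extern _a
- \c
- \c mov bx,[index] ; hypothetical variable holding the index
- \c shl bx,1 ; multiply by the element size, 2
- \c mov ax,[_a+bx] ; AX = a[index]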
- To access a C \i{data structure}, you need to know the offset from
- the base of the structure to the field you are interested in. You
- can either do this by converting the C structure definition into a
- NASM structure definition (using \i\c{STRUC}), or by calculating the
- one offset and using just that.
- To do either of these, you should read your C compiler's manual to
- find out how it organizes data structures. NASM gives no special
- alignment to structure members in its own \c{STRUC} macro, so you
- have to specify alignment yourself if the C compiler generates it.
- Typically, you might find that a structure like
- \c struct {
- \c char c;
- \c int i;
- \c } foo;
- might be four bytes long rather than three, since the \c{int} field
- would be aligned to a two-byte boundary. However, this sort of
- feature tends to be a configurable option in the C compiler, either
- using command-line options or \c{#pragma} lines, so you have to find
- out how your own compiler does it.
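- If the compiler does pad the structure as described, the layout
- could be mirrored in NASM roughly as follows. This is a sketch: the
- name \c{foo_t} is made up for the example, and the \c{ALIGNB} line
- should be dropped if your compiler packs structures.
- \c struc foo_t
- \c
- \c .c: resb 1
- \c alignb 2 ; pad to a two-byte boundary
- \c .i: resw 1
- \c
- \c endstruc
- \c
- \c extern _foo
- \c
- \c mov ax,[_foo+foo_t.i] ; foo_t.i evaluates to 2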
- \S{16cmacro} \i\c{c16.mac}: Helper Macros for the 16-bit C Interface
- Included in the NASM archives, in the \I{misc subdirectory}\c{misc}
- directory, is a file \c{c16.mac} of macros. It defines three macros:
- \i\c{proc}, \i\c{arg} and \i\c{endproc}. These are intended to be
- used for C-style procedure definitions, and they automate a lot of
- the work involved in keeping track of the calling convention.
- (An alternative, TASM-compatible form of \c{arg} is also now built
- into NASM's preprocessor. See \k{stackrel} for details.)
- An example of an assembly function using the macro set is given
- here:
- \c proc _nearproc
- \c
- \c %$i arg
- \c %$j arg
- \c mov ax,[bp + %$i]
- \c mov bx,[bp + %$j]
- \c add ax,[bx]
- \c
- \c endproc
- This defines \c{_nearproc} to be a procedure taking two arguments,
- the first (\c{i}) an integer and the second (\c{j}) a pointer to an
- integer. It returns \c{i + *j}.
- Note that the \c{arg} macro has an \c{EQU} as the first line of its
- expansion, and since the label before the macro call gets prepended
- to the first line of the expanded macro, the \c{EQU} works, defining
- \c{%$i} to be an offset from \c{BP}. A context-local variable is
- used, local to the context pushed by the \c{proc} macro and popped
- by the \c{endproc} macro, so that the same argument name can be used
- in later procedures. Of course, you don't \e{have} to do that.
- The macro set produces code for near functions (tiny, small and
- compact-model code) by default. You can have it generate far
- functions (medium, large and huge-model code) by means of coding
- \I\c{FARCODE}\c{%define FARCODE}. This changes the kind of return
- instruction generated by \c{endproc}, and also changes the starting
- point for the argument offsets. The macro set contains no intrinsic
- dependency on whether data pointers are far or not.
- \c{arg} can take an optional parameter, giving the size of the
- argument. If no size is given, 2 is assumed, since it is likely that
- many function parameters will be of type \c{int}.
- The large-model equivalent of the above function would look like this:
- \c %define FARCODE
- \c
- \c proc _farproc
- \c
- \c %$i arg
- \c %$j arg 4
- \c mov ax,[bp + %$i]
- \c mov bx,[bp + %$j]
- \c mov es,[bp + %$j + 2]
- \c add ax,[es:bx] ; j is a far pointer, so read through ES
- \c
- \c endproc
- This makes use of the argument to the \c{arg} macro to define a
- parameter of size 4, because \c{j} is now a far pointer. When we
- load from \c{j}, we must load a segment and an offset.
- \H{16bp} Interfacing to \i{Borland Pascal} Programs
- Interfacing to Borland Pascal programs is similar in concept to
- interfacing to 16-bit C programs. The differences are:
- \b The leading underscore required for interfacing to C programs is
- not required for Pascal.
- \b The memory model is always large: functions are far, data
- pointers are far, and no data item can be more than 64K long.
- (Actually, some functions are near, but only those functions that
- are local to a Pascal unit and never called from outside it. All
- assembly functions that Pascal calls, and all Pascal functions that
- assembly routines are able to call, are far.) However, all static
- data declared in a Pascal program goes into the default data
- segment, which is the one whose segment address will be in \c{DS}
- when control is passed to your assembly code. The only things that
- do not live in the default data segment are local variables (they
- live in the stack segment) and dynamically allocated variables. All
- data \e{pointers}, however, are far.
- \b The function calling convention is different - described below.
- \b Some data types, such as strings, are stored differently.
- \b There are restrictions on the segment names you are allowed to
- use - Borland Pascal will ignore code or data declared in a segment
- it doesn't like the name of. The restrictions are described below.
- \S{16bpfunc} The Pascal Calling Convention
- \I{functions, Pascal calling convention}\I{Pascal calling
- convention}The 16-bit Pascal calling convention is as follows. In
- the following description, the words \e{caller} and \e{callee} are
- used to denote the function doing the calling and the function which
- gets called.
- \b The caller pushes the function's parameters on the stack, one
- after another, in normal order (left to right, so that the first
- argument specified to the function is pushed first).
- \b The caller then executes a far \c{CALL} instruction to pass
- control to the callee.
- \b The callee receives control, and typically (although this is not
- actually necessary, in functions which do not need to access their
- parameters) starts by saving the value of \c{SP} in \c{BP} so as to
- be able to use \c{BP} as a base pointer to find its parameters on
- the stack. However, the caller was probably doing this too, so part
- of the calling convention states that \c{BP} must be preserved by
- any function. Hence the callee, if it is going to set up \c{BP} as a
- \i{frame pointer}, must push the previous value first.
- \b The callee may then access its parameters relative to \c{BP}.
- The word at \c{[BP]} holds the previous value of \c{BP} as it was
- pushed. The next word, at \c{[BP+2]}, holds the offset part of the
- return address, and the next one at \c{[BP+4]} the segment part. The
- parameters begin at \c{[BP+6]}. The rightmost parameter of the
- function, since it was pushed last, is accessible at this offset
- from \c{BP}; the others follow, at successively greater offsets.
- \b The callee may also wish to decrease \c{SP} further, so as to
- allocate space on the stack for local variables, which will then be
- accessible at negative offsets from \c{BP}.
- \b The callee, if it wishes to return a value to the caller, should
- leave the value in \c{AL}, \c{AX} or \c{DX:AX} depending on the size
- of the value. Floating-point results are returned in \c{ST0}.
- Results of type \c{Real} (Borland's own custom floating-point data
- type, not handled directly by the FPU) are returned in \c{DX:BX:AX}.
- To return a result of type \c{String}, the caller pushes a pointer
- to a temporary string before pushing the parameters, and the callee
- places the returned string value at that location. The pointer is
- not a parameter, and should not be removed from the stack by the
- \c{RETF} instruction.
- \b Once the callee has finished processing, it restores \c{SP} from
- \c{BP} if it had allocated local stack space, then pops the previous
- value of \c{BP}, and returns via \c{RETF}. It uses the form of
- \c{RETF} with an immediate parameter, giving the number of bytes
- taken up by the parameters on the stack. This causes the parameters
- to be removed from the stack as a side effect of the return
- instruction.
- \b When the caller regains control from the callee, the function
- parameters have already been removed from the stack, so it needs to
- do nothing further.
- Thus, you would define a function in Pascal style, taking two
- \c{Integer}-type parameters, in the following way:
- \c global myfunc
- \c
- \c myfunc: push bp
- \c mov bp,sp
- \c sub sp,0x40 ; 64 bytes of local stack space
- \c mov bx,[bp+8] ; first parameter to function
- \c mov cx,[bp+6] ; second parameter to function
- \c
- \c ; some more code
- \c
- \c mov sp,bp ; undo "sub sp,0x40" above
- \c pop bp
- \c retf 4 ; total size of params is 4
- At the other end of the process, to call a Pascal function from your
- assembly code, you would do something like this:
- \c extern SomeFunc
- \c
- \c ; and then, further down...
- \c
- \c push word seg mystring ; Now push the segment, and...
- \c push word mystring ; ... offset of "mystring"
- \c push word [myint] ; one of my variables
- \c call far SomeFunc
- This is equivalent to the Pascal code
- \c procedure SomeFunc(String: PChar; Int: Integer);
- \c SomeFunc(@mystring, myint);
- \S{16bpseg} Borland Pascal \I{segment names, Borland Pascal}Segment
- Name Restrictions
- Since Borland Pascal's internal unit file format is completely
- different from \c{OBJ}, it does only a very sketchy job of actually
- reading and understanding the various information contained in a
- real \c{OBJ} file when it links that in. Therefore an object file
- intended to be linked to a Pascal program must obey a number of
- restrictions:
- \b Procedures and functions must be in a segment whose name is
- either \c{CODE}, \c{CSEG}, or something ending in \c{_TEXT}.
- \b Initialized data must be in a segment whose name is either
- \c{CONST} or something ending in \c{_DATA}.
- \b Uninitialized data must be in a segment whose name is either
- \c{DATA}, \c{DSEG}, or something ending in \c{_BSS}.
- \b Any other segments in the object file are completely ignored.
- \c{GROUP} directives and segment attributes are also ignored.
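- A module that follows these rules might be laid out as in the
- sketch below. The symbol names are hypothetical; only the segment
- names come from the list above.
- \c segment CODE
- \c
- \c global AddOne
- \c
- \c AddOne: push bp
- \c mov bp,sp
- \c mov ax,[bp+6] ; the single Integer parameter
- \c inc ax ; result is returned in AX
- \c pop bp
- \c retf 2 ; remove the two-byte parameter
- \c
- \c segment CONST
- \c
- \c mytable dw 1,2,3 ; initialized data
- \c
- \c segment DATA
- \c
- \c mybuffer resb 64 ; uninitialized data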
- \S{16bpmacro} Using \i\c{c16.mac} With Pascal Programs
- The \c{c16.mac} macro package, described in \k{16cmacro}, can also
- be used to simplify writing functions to be called from Pascal
- programs, if you code \I\c{PASCAL}\c{%define PASCAL}. This
- definition ensures that functions are far (it implies
- \i\c{FARCODE}), and also causes procedure return instructions to be
- generated with an operand.
- Defining \c{PASCAL} does not change the code which calculates the
- argument offsets; you must declare your function's arguments in
- reverse order. For example:
- \c %define PASCAL
- \c
- \c proc _pascalproc
- \c
- \c %$j arg 4
- \c %$i arg
- \c mov ax,[bp + %$i]
- \c mov bx,[bp + %$j]
- \c mov es,[bp + %$j + 2]
- \c add ax,[es:bx] ; j is a far pointer, so read through ES
- \c
- \c endproc
- This defines the same routine, conceptually, as the example in
- \k{16cmacro}: it defines a function taking two arguments, an integer
- and a pointer to an integer, which returns the sum of the integer
- and the contents of the pointer. The only difference between this
- code and the large-model C version is that \c{PASCAL} is defined
- instead of \c{FARCODE}, and that the arguments are declared in
- reverse order.
- \C{32bit} Writing 32-bit Code (Unix, Win32, DJGPP)
- This chapter attempts to cover some of the common issues involved
- when writing 32-bit code, to run under \i{Win32} or Unix, or to be
- linked with C code generated by a Unix-style C compiler such as
- \i{DJGPP}. It covers how to write assembly code to interface with
- 32-bit C routines, and how to write position-independent code for
- shared libraries.
- Almost all 32-bit code, and in particular all code running under
- \c{Win32}, \c{DJGPP} or any of the PC Unix variants, runs in \I{flat
- memory model}\e{flat} memory model. This means that the segment registers
- and paging have already been set up to give you the same 32-bit 4GB
- address space no matter what segment you work relative to, and that
- you should ignore all segment registers completely. When writing
- flat-model application code, you never need to use a segment
- override or modify any segment register, and the code-section
- addresses you pass to \c{CALL} and \c{JMP} live in the same address
- space as the data-section addresses you access your variables by and
- the stack-section addresses you access local variables and procedure
- parameters by. Every address is 32 bits long and contains only an
- offset part.
- \H{32c} Interfacing to 32-bit C Programs
- A lot of the discussion in \k{16c}, about interfacing to 16-bit C
- programs, still applies when working in 32 bits. The absence of
- memory models or segmentation worries simplifies things a lot.
- \S{32cunder} External Symbol Names
- Most 32-bit C compilers share the convention used by 16-bit
- compilers, that the names of all global symbols (functions or data)
- they define are formed by prefixing an underscore to the name as it
- appears in the C program. However, not all of them do: the \c{ELF}
- specification states that C symbols do \e{not} have a leading
- underscore on their assembly-language names.
- The older Linux \c{a.out} C compiler, all \c{Win32} compilers,
- \c{DJGPP}, and \c{NetBSD} and \c{FreeBSD}, all use the leading
- underscore; for these compilers, the macros \c{cextern} and
- \c{cglobal}, as given in \k{16cunder}, will still work. For \c{ELF},
- though, the leading underscore should not be used.
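- One way to share a source file between both kinds of target is to
- make the prefix conditional. The sketch below assumes a symbol
- \c{UNDERSCORE}, a name invented for this example, which you would
- define with \c{-dUNDERSCORE} on the command line when assembling for
- a platform that uses the prefix:
- \c %macro cextern 1
- \c
- \c %ifdef UNDERSCORE
- \c extern _%1
- \c %define %1 _%1
- \c %else
- \c extern %1
- \c %endif
- \c
- \c %endmacro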
- See also \k{opt-pfix}.
- \S{32cfunc} Function Definitions and Function Calls
- \I{functions, C calling convention}The \i{C calling convention}
- in 32-bit programs is as follows. In the following description,
- the words \e{caller} and \e{callee} are used to denote
- the function doing the calling and the function which gets called.
- \b The caller pushes the function's parameters on the stack, one
- after another, in reverse order (right to left, so that the first
- argument specified to the function is pushed last).
- \b The caller then executes a near \c{CALL} instruction to pass
- control to the callee.
- \b The callee receives control, and typically (although this is not
- actually necessary, in functions which do not need to access their
- parameters) starts by saving the value of \c{ESP} in \c{EBP} so as
- to be able to use \c{EBP} as a base pointer to find its parameters
- on the stack. However, the caller was probably doing this too, so
- part of the calling convention states that \c{EBP} must be preserved
- by any C function. Hence the callee, if it is going to set up
- \c{EBP} as a \i{frame pointer}, must push the previous value first.
- \b The callee may then access its parameters relative to \c{EBP}.
- The doubleword at \c{[EBP]} holds the previous value of \c{EBP} as
- it was pushed; the next doubleword, at \c{[EBP+4]}, holds the return
- address, pushed implicitly by \c{CALL}. The parameters start after
- that, at \c{[EBP+8]}. The leftmost parameter of the function, since
- it was pushed last, is accessible at this offset from \c{EBP}; the
- others follow, at successively greater offsets. Thus, in a function
- such as \c{printf} which takes a variable number of parameters, the
- pushing of the parameters in reverse order means that the function
- knows where to find its first parameter, which tells it the number
- and type of the remaining ones.
- \b The callee may also wish to decrease \c{ESP} further, so as to
- allocate space on the stack for local variables, which will then be
- accessible at negative offsets from \c{EBP}.
- \b The callee, if it wishes to return a value to the caller, should
- leave the value in \c{AL}, \c{AX} or \c{EAX} depending on the size
- of the value. Floating-point results are typically returned in
- \c{ST0}.
- \b Once the callee has finished processing, it restores \c{ESP} from
- \c{EBP} if it had allocated local stack space, then pops the previous
- value of \c{EBP}, and returns via \c{RET} (equivalently, \c{RETN}).
- \b When the caller regains control from the callee, the function
- parameters are still on the stack, so it typically adds an immediate
- constant to \c{ESP} to remove them (instead of executing a number of
- slow \c{POP} instructions). Thus, if a function is accidentally
- called with the wrong number of parameters due to a prototype
- mismatch, the stack will still be returned to a sensible state since
- the caller, which \e{knows} how many parameters it pushed, does the
- removing.
- There is an alternative calling convention used by Win32 programs
- for Windows API calls, and also for functions called \e{by} the
- Windows API such as window procedures: they follow what Microsoft
- calls the \c{__stdcall} convention. This is slightly closer to the
- Pascal convention, in that the callee clears the stack by passing a
- parameter to the \c{RET} instruction. However, the parameters are
- still pushed in right-to-left order.
- Thus, you would define a function in C style in the following way:
- \c global _myfunc
- \c
- \c _myfunc:
- \c push ebp
- \c mov ebp,esp
- \c sub esp,0x40 ; 64 bytes of local stack space
- \c mov ebx,[ebp+8] ; first parameter to function
- \c
- \c ; some more code
- \c
- \c leave ; mov esp,ebp / pop ebp
- \c ret
- At the other end of the process, to call a C function from your
- assembly code, you would do something like this:
- \c extern _printf
- \c
- \c ; and then, further down...
- \c
- \c push dword [myint] ; one of my integer variables
- \c push dword mystring ; pointer into my data segment
- \c call _printf
- \c add esp,byte 8 ; `byte' saves space
- \c
- \c ; then those data items...
- \c
- \c segment _DATA
- \c
- \c myint dd 1234
- \c mystring db 'This number -> %d <- should be 1234',10,0
- This piece of code is the assembly equivalent of the C code
- \c int myint = 1234;
- \c printf("This number -> %d <- should be 1234\n", myint);
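- A function following the \c{__stdcall} convention described above
- differs from the C-style definition only in its return instruction.
- The sketch below assumes a function taking two doubleword
- parameters; the name \c{_addpair} is hypothetical, and Win32 tools
- commonly expect a decorated name such as \c{_addpair@8}, so check
- what your compiler and linker require.
- \c global _addpair
- \c
- \c _addpair:
- \c push ebp
- \c mov ebp,esp
- \c mov eax,[ebp+8] ; first parameter
- \c add eax,[ebp+12] ; second parameter
- \c pop ebp
- \c ret 8 ; callee removes 8 bytes of parameters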
- \S{32cdata} Accessing Data Items
- To get at the contents of C variables, or to declare variables which
- C can access, you need only declare the names as \c{GLOBAL} or
- \c{EXTERN}. (Again, the names require leading underscores on the
- platforms that use them, as stated in \k{32cunder}.) Thus, a C
- variable declared as \c{int i} can be
- accessed from assembler as
- \c extern _i
- \c mov eax,[_i]
- And to declare your own integer variable which C programs can access
- as \c{extern int j}, you do this (making sure you are assembling in
- the \c{_DATA} segment, if necessary):
- \c global _j
- \c _j dd 0
- To access a C array, you need to know the size of the components of
- the array. For example, \c{int} variables are four bytes long, so if
- a C program declares an array as \c{int a[10]}, you can access
- \c{a[3]} by coding \c{mov eax,[_a+12]}. (The byte offset 12 is obtained
- by multiplying the desired array index, 3, by the size of the array
- element, 4.) The sizes of the C base types in 32-bit compilers are:
- 1 for \c{char}, 2 for \c{short}, 4 for \c{int}, \c{long} and
- \c{float}, and 8 for \c{double}. Pointers, being 32-bit addresses,
- are also 4 bytes long.
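- When the index is only known at run time, the scaling can be folded
- into the effective address. A minimal sketch, assuming a
- hypothetical doubleword variable \c{index}:
- \c extern _a
- \c
- \c mov ecx,[index] ; hypothetical variable holding the index
- \c mov eax,[_a+ecx*4] ; EAX = a[index], scaled by the element size, 4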
- To access a C \i{data structure}, you need to know the offset from
- the base of the structure to the field you are interested in. You
- can either do this by converting the C structure definition into a
- NASM structure definition (using \c{STRUC}), or by calculating the
- one offset and using just that.
- To do either of these, you should read your C compiler's manual to
- find out how it organizes data structures. NASM gives no special
- alignment to structure members in its own \i\c{STRUC} macro, so you
- have to specify alignment yourself if the C compiler generates it.
- Typically, you might find that a structure like
- \c struct {
- \c char c;
- \c int i;
- \c } foo;
- might be eight bytes long rather than five, since the \c{int} field
- would be aligned to a four-byte boundary. However, this sort of
- feature is sometimes a configurable option in the C compiler, either
- using command-line options or \c{#pragma} lines, so you have to find
- out how your own compiler does it.
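- As a sketch of the \c{STRUC} approach, assuming the compiler pads
- the structure above so that \c{i} sits on a four-byte boundary (the
- name \c{foo_t} is purely illustrative), the layout could be mirrored
- like this:
- \c struc foo_t
- \c .c resb 1 ; char c
- \c alignb 4 ; padding the compiler is assumed to insert
- \c .i resd 1 ; int i
- \c endstruc
- \c
- \c extern _foo
- \c
- \c mov eax,[_foo+foo_t.i] ; load foo.i
- With this definition \c{foo_t_size} evaluates to 8; if your compiler
- packs the structure instead, drop the \c{ALIGNB} line.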
- \S{32cmacro} \i\c{c32.mac}: Helper Macros for the 32-bit C Interface
- Included in the NASM archives, in the \I{misc directory}\c{misc}
- directory, is a file \c{c32.mac} of macros. It defines three macros:
- \i\c{proc}, \i\c{arg} and \i\c{endproc}. These are intended to be
- used for C-style procedure definitions, and they automate a lot of
- the work involved in keeping track of the calling convention.
- An example of an assembly function using the macro set is given
- here:
- \c proc _proc32
- \c
- \c %$i arg
- \c %$j arg
- \c mov eax,[ebp + %$i]
- \c mov ebx,[ebp + %$j]
- \c add eax,[ebx]
- \c
- \c endproc
- This defines \c{_proc32} to be a procedure taking two arguments, the
- first (\c{i}) an integer and the second (\c{j}) a pointer to an
- integer. It returns \c{i + *j}.
- Note that the \c{arg} macro has an \c{EQU} as the first line of its
- expansion, and since the label before the macro call gets prepended
- to the first line of the expanded macro, the \c{EQU} works, defining
- \c{%$i} to be an offset from \c{EBP}. A context-local variable is
- used, local to the context pushed by the \c{proc} macro and popped
- by the \c{endproc} macro, so that the same argument name can be used
- in later procedures. Of course, you don't \e{have} to do that.
- \c{arg} can take an optional parameter, giving the size of the
- argument. If no size is given, 4 is assumed, since it is likely that
- many function parameters will be of type \c{int} or pointers.
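- As an illustration of the size parameter, here is a sketch of a
- hypothetical function \c{_scale} taking a \c{double} followed by an
- \c{int}. It assumes, as the example above does, that \c{endproc}
- emits the function return, and that the C compiler expects a
- \c{double} result in \c{ST0}:
- \c proc _scale
- \c
- \c %$x arg 8 ; double x: occupies 8 bytes
- \c %$n arg ; int n: default size of 4
- \c fld qword [ebp + %$x]
- \c fimul dword [ebp + %$n] ; ST0 = x * n
- \c
- \c endproc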
- \H{picdll} Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF \i{Shared
- Libraries}
- \c{ELF} replaced the older \c{a.out} object file format under Linux
- because it contains support for \i{position-independent code}
- (\i{PIC}), which makes writing shared libraries much easier. NASM
- supports the \c{ELF} position-independent code features, so you can
- write Linux \c{ELF} shared libraries in NASM.
- \i{NetBSD}, and its close cousins \i{FreeBSD} and \i{OpenBSD}, take
- a different approach by hacking PIC support into the \c{a.out}
- format. NASM supports this as the \i\c{aoutb} output format, so you
- can write \i{BSD} shared libraries in NASM too.
- The operating system loads a PIC shared library by memory-mapping
- the library file at an arbitrarily chosen point in the address space
- of the running process. The contents of the library's code section
- must therefore not depend on where it is loaded in memory.
- Consequently, you cannot get at your variables by writing code like
- this:
- \c mov eax,[myvar] ; WRONG
- Instead, the linker provides an area of memory called the
- \i\e{global offset table}, or \i{GOT}; the GOT is situated at a
- constant distance from your library's code, so if you can find out
- where your library is loaded (which is typically done using a
- \c{CALL} and \c{POP} combination), you can obtain the address of the
- GOT, and you can then load the addresses of your variables out of
- linker-generated entries in the GOT.
- The \e{data} section of a PIC shared library does not have these
- restrictions: since the data section is writable, it has to be
- copied into memory anyway rather than just paged in from the library
- file, so as long as it's being copied it can be relocated too. So
- you can put ordinary types of relocation in the data section without
- too much worry (but see \k{picglobal} for a caveat).
- \S{picgot} Obtaining the Address of the GOT
- Each code module in your shared library should define the GOT as an
- external symbol:
- \c extern _GLOBAL_OFFSET_TABLE_ ; in ELF
- \c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out
- At the beginning of any function in your shared library which plans
- to access your data or BSS sections, you must first calculate the
- address of the GOT. This is typically done by writing the function
- in this form:
- \c func: push ebp
- \c mov ebp,esp
- \c push ebx
- \c call .get_GOT
- \c .get_GOT:
- \c pop ebx
- \c add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
- \c
- \c ; the function body comes here
- \c
- \c mov ebx,[ebp-4]
- \c mov esp,ebp
- \c pop ebp
- \c ret
- (For BSD, again, the symbol \c{_GLOBAL_OFFSET_TABLE_} requires a
- second leading underscore.)
- The first two lines of this function are simply the standard C
- prologue to set up a stack frame, and the last three lines are
- standard C function epilogue. The third line, and the fourth to last
- line, save and restore the \c{EBX} register, because PIC shared
- libraries use this register to store the address of the GOT.
- The interesting bit is the \c{CALL} instruction and the following
- two lines. The \c{CALL} and \c{POP} combination obtains the address
- of the label \c{.get_GOT}, without having to know in advance where
- the program was loaded (since the \c{CALL} instruction is encoded
- relative to the current position). The \c{ADD} instruction makes use
- of one of the special PIC relocation types: \i{GOTPC relocation}.
- With the \i\c{WRT ..gotpc} qualifier specified, the symbol
- referenced (here \c{_GLOBAL_OFFSET_TABLE_}, the special symbol
- assigned to the GOT) is given as an offset from the beginning of the
- section. (Actually, \c{ELF} encodes it as the offset from the operand
- field of the \c{ADD} instruction, but NASM simplifies this
- deliberately, so you do things the same way for both \c{ELF} and
- \c{BSD}.) So the instruction then \e{adds} the beginning of the section,
- to get the real address of the GOT, and subtracts the value of
- \c{.get_GOT} which it knows is in \c{EBX}. Therefore, by the time
- that instruction has finished, \c{EBX} contains the address of the GOT.
- If you didn't follow that, don't worry: it's never necessary to
- obtain the address of the GOT by any other means, so you can put
- those three instructions into a macro and safely ignore them:
- \c %macro get_GOT 0
- \c
- \c call %%getgot
- \c %%getgot:
- \c pop ebx
- \c add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
- \c
- \c %endmacro
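- With that macro in place, the function skeleton shown earlier
- shrinks accordingly. This is only a sketch of how the pieces fit
- together; \c{func} is the same placeholder name as before:
- \c func: push ebp
- \c mov ebp,esp
- \c push ebx ; EBX must be preserved for the caller
- \c get_GOT ; EBX now holds the address of the GOT
- \c
- \c ; the function body comes here
- \c
- \c mov ebx,[ebp-4] ; restore the caller's EBX
- \c mov esp,ebp
- \c pop ebp
- \c ret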
- \S{piclocal} Finding Your Local Data Items
- Having got the GOT, you can then use it to obtain the addresses of
- your data items. Most variables will reside in the sections you have
- declared; they can be accessed using the \I{GOTOFF
- relocation}\c{..gotoff} special \I\c{WRT ..gotoff}\c{WRT} type. It
- works like this:
- \c lea eax,[ebx+myvar wrt ..gotoff]
- The expression \c{myvar wrt ..gotoff} is calculated, when the shared
- library is linked, to be the offset to the local variable \c{myvar}
- from the beginning of the GOT. Therefore, adding it to \c{EBX} as
- above will place the real address of \c{myvar} in \c{EAX}.
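- Since the expression is simply a displacement, it can also be used
- directly in a memory operand when you want the contents rather than
- the address. A minimal sketch, assuming \c{myvar} is a doubleword
- and \c{EBX} already holds the address of the GOT:
- \c mov eax,[ebx+myvar wrt ..gotoff] ; load the dword stored at myvar
- \c inc dword [ebx+myvar wrt ..gotoff] ; or modify it in place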
- If you declare variables as \c{GLOBAL} without specifying a size for
- them, they are shared between code modules in the library, but do
- not get exported from the library to the program that loaded it.
- They will still be in your ordinary data and BSS sections, so you
- can access them in the same way as local variables, using the above
- \c{..gotoff} mechanism.
- Note that due to a peculiarity of the way BSD \c{a.out} format
- handles this relocation type, there must be at least one non-local
- symbol in the same section as the address you're trying to access.
- \S{picextern} Finding External and Common Data Items
- If your library needs to get at an external variable (external to
- the \e{library}, not just to one of the modules within it), you must
- use the \I{GOT relocations}\I\c{WRT ..got}\c{..got} type to get at
- it. The \c{..got} type, instead of giving you the offset from the
- GOT base to the variable, gives you the offset from the GOT base to
- a GOT \e{entry} containing the address of the variable. The linker
- will set up this GOT entry when it builds the library, and the
- dynamic linker will place the correct address in it at load time. So
- to obtain the address of an external variable \c{extvar} in \c{EAX},
- you would code
- \c mov eax,[ebx+extvar wrt ..got]
- This loads the address of \c{extvar} out of an entry in the GOT. The
- linker, when it builds the shared library, collects together every
- relocation of type \c{..got}, and builds the GOT so as to ensure it
- has every necessary entry present.
- Common variables must also be accessed in this way.
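- Because the GOT entry holds an address, reading the variable itself
- takes a second memory access. A short sketch, assuming \c{extvar} is
- a doubleword and \c{EBX} already holds the address of the GOT:
- \c extern extvar
- \c
- \c mov eax,[ebx+extvar wrt ..got] ; EAX = address of extvar
- \c mov eax,[eax] ; EAX = value of extvar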
- \S{picglobal} Exporting Symbols to the Library User
- If you want to export symbols to the user of the library, you have
- to declare whether they are functions or data, and if they are data,
- you have to give the size of the data item. This is because the
- dynamic linker has to build \I{PLT}\i{procedure linkage table}
- entries for any exported functions, and also moves exported data
- items away from the library's data section in which they were
- declared.
- So to export a function to users of the library, you must use
- \c global func:function ; declare it as a function
- \c
- \c func: push ebp
- \c
- \c ; etc.
- And to export a data item such as an array, you would have to code
- \c global array:data array.end-array ; give the size too
- \c
- \c array: resd 128
- \c .end:
- Be careful: If you export a variable to the library user, by
- declaring it as \c{GLOBAL} and supplying a size, the variable will
- end up living in the data section of the main program, rather than
- in your library's data section, where you declared it. So you will
- have to access your own global variable with the \c{..got} mechanism
- rather than \c{..gotoff}, as if it were external (which,
- effectively, it has become).
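- For the exported \c{array} declared above, a sketch of such an
- access from inside the library might look like this (it assumes
- \c{EBX} has already been loaded with the address of the GOT):
- \c global array:data array.end-array
- \c
- \c ; ... inside a function, after obtaining the GOT in EBX ...
- \c
- \c mov eax,[ebx+array wrt ..got] ; EAX = run-time address of array
- \c mov dword [eax],0 ; array[0] = 0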
- Equally, if you need to store the address of an exported global in
- one of your data sections, you can't do it by means of the standard
- sort of code:
- \c dataptr: dd global_data_item ; WRONG
- NASM will interpret this code as an ordinary relocation, in which
- \c{global_data_item} is merely an offset from the beginning of the
- \c{.data} section (or whatever); so this reference will end up
- pointing at your data section instead of at the exported global
- which resides elsewhere.
- Instead of the above code, then, you must write
- \c dataptr: dd global_data_item wrt ..sym
- which makes use of the special \c{WRT} type \I\c{WRT ..sym}\c{..sym}
- to instruct NASM to search the symbol table for a particular symbol
- at that address, rather than just relocating by section base.
- Either method will work for functions: referring to one of your
- functions by means of
- \c funcptr: dd my_function
- will give the user the address of the code you wrote, whereas
- \c funcptr: dd my_function wrt ..sym
- will give the address of the procedure linkage table for the
- function, which is where the calling program will \e{believe} the
- function lives. Either address is a valid way to call the function.
- \S{picproc} Calling Procedures Outside the Library
- Calling procedures outside your shared library has to be done by
- means of a \i\e{procedure linkage table}, or \i{PLT}. The PLT is
- placed at a known offset from where the library is loaded, so the
- library code can make calls to the PLT in a position-independent
- way. Within the PLT there is code to jump to offsets contained in
- the GOT, so function calls to other shared libraries or to routines
- in the main program can be transparently passed off to their real
- destinations.
- To call an external routine, you must use another special PIC
- relocation type, \I{PLT relocations}\i\c{WRT ..plt}. This is much
- easier than the GOT-based ones: you simply replace calls such as
- \c{CALL printf} with the PLT-relative version \c{CALL printf WRT
- ..plt}.
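- Putting this together with the \c{..gotoff} mechanism, a call to
- \c{printf} from library code might be sketched as follows. Here
- \c{msg} is a hypothetical zero-terminated string in the library's
- own data section, and \c{EBX} is assumed to hold the address of the
- GOT already:
- \c extern printf
- \c
- \c ; ... inside a function, after obtaining the GOT in EBX ...
- \c
- \c lea eax,[ebx+msg wrt ..gotoff] ; address of the format string
- \c push eax
- \c call printf wrt ..plt
- \c add esp,4 ; caller removes the argument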
- \S{link} Generating the Library File
- Having written some code modules and assembled them to \c{.o} files,
- you then generate your shared library with a command such as
- \c ld -shared -o library.so module1.o module2.o # for ELF
- \c ld -Bshareable -o library.so module1.o module2.o # for BSD
- For ELF, if your shared library is going to reside in system
- directories such as \c{/usr/lib} or \c{/lib}, it is usually worth
- using the \i\c{-soname} flag to the linker, to store the final
- library file name, with a version number, into the library:
- \c ld -shared -soname library.so.1 -o library.so.1.2 *.o
- You would then copy \c{library.so.1.2} into the library directory,
- and create \c{library.so.1} as a symbolic link to it.
- \C{mixsize} Mixing 16 and 32 Bit Code
- This chapter tries to cover some of the issues, largely related to
- unusual forms of addressing and jump instructions, encountered when
- writing operating system code such as protected-mode initialisation
- routines, which require code that operates in mixed segment sizes,
- such as code in a 16-bit segment trying to modify data in a 32-bit
- one, or jumps between different-size segments.
- \H{mixjump} Mixed-Size Jumps\I{jumps, mixed-size}
- \I{operating system, writing}\I{writing operating systems}The most
- common form of \i{mixed-size instruction} is the one used when
- writing a 32-bit OS: having done your setup in 16-bit mode, such as
- loading the kernel, you then have to boot it by switching into
- protected mode and jumping to the 32-bit kernel start address. In a
- fully 32-bit OS, this tends to be the \e{only} mixed-size
- instruction you need, since everything before it can be done in pure
- 16-bit code, and everything after it can be pure 32-bit.
- This jump must specify a 48-bit far address, since the target
- segment is a 32-bit one. However, it must be assembled in a 16-bit
- segment, so just coding, for example,
- \c jmp 0x1234:0x56789ABC ; wrong!
- will not work, since the offset part of the address will be
- truncated to \c{0x9ABC} and the jump will be an ordinary 16-bit far
- one.
- The Linux kernel setup code gets round the inability of \c{as86} to
- generate the required instruction by coding it manually, using
- \c{DB} instructions. NASM can go one better than that, by actually
- generating the right instruction itself. Here's how to do it right:
- \c jmp dword 0x1234:0x56789ABC ; right
- \I\c{JMP DWORD}The \c{DWORD} prefix (strictly speaking, it should
- come \e{after} the colon, since it is declaring the \e{offset} field
- to be a doubleword; but NASM will accept either form, since both are
- unambiguous) forces the offset part to be treated as 32 bits wide, in the
- assumption that you are deliberately writing a jump from a 16-bit
- segment to a 32-bit one.
- You can do the reverse operation, jumping from a 32-bit segment to a
- 16-bit one, by means of the \c{WORD} prefix:
- \c jmp word 0x8765:0x4321 ; 32 to 16 bit
- If the \c{WORD} prefix is specified in 16-bit mode, or the \c{DWORD}
- prefix in 32-bit mode, they will be ignored, since each is
- explicitly forcing NASM into a mode it was in anyway.
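- In context, the 16-to-32-bit jump typically sits right after the
- switch into protected mode. The following is only a sketch: it
- assumes that interrupts are disabled, that a GDT has already been
- loaded, and that selector \c{0x08} (a hypothetical value) describes
- a 32-bit code segment; \c{start32} is an illustrative label.
- \c bits 16
- \c
- \c mov eax,cr0
- \c or al,1 ; set the PE bit
- \c mov cr0,eax
- \c jmp dword 0x08:start32 ; 0x08: assumed 32-bit code selector
- \c
- \c bits 32
- \c
- \c start32: ; pure 32-bit code from here on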
- \H{mixaddr} Addressing Between Different-Size Segments\I{addressing,
- mixed-size}\I{mixed-size addressing}
- If your OS is mixed 16 and 32-bit, or if you are writing a DOS
- extender, you are likely to have to deal with some 16-bit segments
- and some 32-bit ones. At some point, you will probably end up
- writing code in a 16-bit segment which has to access data in a
- 32-bit segment, or vice versa.
- If the data you are trying to access in a 32-bit segment lies within
- the first 64K of the segment, you may be able to get away with using
- an ordinary 16-bit addressing operation for the purpose; but sooner
- or later, you will want to do 32-bit addressing from 16-bit mode.
- The easiest way to do this is to make sure you use a register for
- the address, since any effective address containing a 32-bit
- register is forced to be a 32-bit address. So you can do
- \c mov eax,offset_into_32_bit_segment_specified_by_fs
- \c mov dword [fs:eax],0x11223344
- This is fine, but slightly cumbersome (since it wastes an
- instruction and a register) if you already know the precise offset
- you are aiming at. The x86 architecture does allow 32-bit effective
- addresses to specify nothing but a 4-byte offset, so why shouldn't
- NASM be able to generate the best instruction for the purpose?
- It can. As in \k{mixjump}, you need only prefix the address with the
- \c{DWORD} keyword, and it will be forced to be a 32-bit address:
- \c mov dword [fs:dword my_offset],0x11223344
- Also as in \k{mixjump}, NASM is not fussy about whether the
- \c{DWORD} prefix comes before or after the segment override, so
- arguably a nicer-looking way to code the above instruction is
- \c mov dword [dword fs:my_offset],0x11223344
- Don't confuse the \c{DWORD} prefix \e{outside} the square brackets,
- which controls the size of the data stored at the address, with the
- one \e{inside} the square brackets which controls the length of the
- address itself. The two can quite easily be different:
- \c mov word [dword 0x12345678],0x9ABC
- This moves 16 bits of data to an address specified by a 32-bit
- offset.
- You can also specify \c{WORD} or \c{DWORD} prefixes along with the
- \c{FAR} prefix to indirect far jumps or calls. For example:
- \c call dword far [fs:word 0x4321]
- This instruction contains an address specified by a 16-bit offset;
- it loads a 48-bit far pointer from that (16-bit segment and 32-bit
- offset), and calls that address.
- \H{mixother} Other Mixed-Size Instructions
- The other way you might want to access data might be using the
- string instructions (\c{LODSx}, \c{STOSx} and so on) or the
- \c{XLATB} instruction. These instructions, since they take no
- parameters, might seem to have no easy way to make them perform
- 32-bit addressing when assembled in a 16-bit segment.
- This is the purpose of NASM's \i\c{a16}, \i\c{a32} and \i\c{a64} prefixes. If
- you are coding \c{LODSB} in a 16-bit segment but it is supposed to
- be accessing a string in a 32-bit segment, you should load the
- desired address into \c{ESI} and then code
- \c a32 lodsb
- The prefix forces the addressing size to 32 bits, meaning that
- \c{LODSB} loads from \c{[DS:ESI]} instead of \c{[DS:SI]}. To access
- a string in a 16-bit segment when coding in a 32-bit one, the
- corresponding \c{a16} prefix can be used.
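- The same prefix combines with \c{REP}. As a sketch, a block copy
- issued from 16-bit code using 32-bit offsets could be written as
- below; \c{src_offset}, \c{dst_offset} and \c{count} are hypothetical
- symbols, and \c{DS} and \c{ES} are assumed to be set up already:
- \c bits 16
- \c
- \c mov esi,src_offset ; source offset in DS
- \c mov edi,dst_offset ; destination offset in ES
- \c mov ecx,count ; with a32, REP counts in ECX, not CX
- \c cld
- \c a32 rep movsb ; copies from [DS:ESI] to [ES:EDI]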
- The \c{a16}, \c{a32} and \c{a64} prefixes can be applied to any instruction
- in NASM's instruction table, but most of them can generate all the
- useful forms without them. The prefixes are necessary only for
- instructions with implicit addressing:
- \# \c{CMPSx} (\k{insCMPSB}),
- \# \c{SCASx} (\k{insSCASB}), \c{LODSx} (\k{insLODSB}), \c{STOSx}
- \# (\k{insSTOSB}), \c{MOVSx} (\k{insMOVSB}), \c{INSx} (\k{insINSB}),
- \# \c{OUTSx} (\k{insOUTSB}), and \c{XLATB} (\k{insXLATB}).
- \c{CMPSx}, \c{SCASx}, \c{LODSx}, \c{STOSx}, \c{MOVSx}, \c{INSx},
- \c{OUTSx}, and \c{XLATB}.
- Also, the
- various push and pop instructions (\c{PUSHA} and \c{POPF} as well as
- the more usual \c{PUSH} and \c{POP}) can accept \c{a16}, \c{a32} or \c{a64}
- prefixes to force a particular one of \c{SP}, \c{ESP} or \c{RSP} to be used
- as a stack pointer, in case the stack segment in use is a different
- size from the code segment.
- \c{PUSH} and \c{POP}, when applied to segment registers in 32-bit
- mode, also have the slightly odd behaviour that they push and pop 4
- bytes at a time, of which the top two are ignored and the bottom two
- give the value of the segment register being manipulated. To force
- the 16-bit behaviour of segment-register push and pop instructions,
- you can use the operand-size prefix \i\c{o16}:
- \c o16 push ss
- \c o16 push ds
- This code saves a doubleword of stack space by fitting two segment
- registers into the space which would normally be consumed by pushing
- one.
- (You can also use the \i\c{o32} prefix to force the 32-bit behaviour
- when in 16-bit mode, but this seems less useful.)
- \C{64bit} Writing 64-bit Code (Unix, Win64)
- This chapter attempts to cover some of the common issues involved when
- writing 64-bit code, to run under \i{Win64} or Unix. It covers how to
- write assembly code to interface with 64-bit C routines, and how to
- write position-independent code for shared libraries.
- All 64-bit code uses a flat memory model, since segmentation is not
- available in 64-bit mode. The one exception is the \c{FS} and \c{GS}
- registers, which still add their bases.
- Position independence in 64-bit mode is significantly simpler, since
- the processor supports \c{RIP}-relative addressing directly; see the
- \c{REL} keyword (\k{effaddr}). On most 64-bit platforms, it is
- probably desirable to make that the default, using the directive
- \c{DEFAULT REL} (\k{default}).
- 64-bit programming is relatively similar to 32-bit programming, but
- of course pointers are 64 bits long; additionally, all existing
- platforms pass arguments in registers rather than on the stack.
- Furthermore, 64-bit platforms use SSE2 by default for floating point.
- Please see the ABI documentation for your platform.
- 64-bit platforms differ in the sizes of the C/C++ fundamental
- datatypes, not just from 32-bit platforms but from each other. If a
- specific size data type is desired, it is probably best to use the
- types defined in the standard C header \c{<inttypes.h>}.
- All known 64-bit platforms except some embedded platforms require that
- the stack pointer (\c{RSP}) is 16-byte aligned immediately before a
- \c{CALL} instruction. Since \c{CALL} pushes an 8-byte return address,
- \c{RSP} is at an \e{odd} multiple of 8 bytes on entry to a function.
- In 64-bit mode, the default operand size is still 32 bits. When
- loading a value into a 32-bit register (but not an 8- or 16-bit
- register), the upper 32 bits of the corresponding 64-bit register are
- set to zero.
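- The difference between the register widths can be seen in a short
- sequence such as the following (the values are purely illustrative):
- \c mov rax,-1 ; RAX = 0xFFFFFFFFFFFFFFFF
- \c mov eax,1 ; RAX = 0x0000000000000001 (upper half cleared)
- \c
- \c mov rax,-1
- \c mov ax,1 ; RAX = 0xFFFFFFFFFFFF0001 (upper bits preserved)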
- \H{reg64} Register Names in 64-bit Mode
- NASM uses the following names for general-purpose registers in 64-bit
- mode, for 8-, 16-, 32- and 64-bit references, respectively:
- \c AL/AH, CL/CH, DL/DH, BL/BH, SPL, BPL, SIL, DIL, R8B-R15B
- \c AX, CX, DX, BX, SP, BP, SI, DI, R8W-R15W
- \c EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D-R15D
- \c RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8-R15
- This is consistent with the AMD documentation and most other
- assemblers. The Intel documentation, however, uses the names
- \c{R8L-R15L} for 8-bit references to the higher registers. It is
- possible to use those names by defining them as macros; similarly,
- if one wants to use numeric names for the low 8 registers, define them
- as macros. The standard macro package \c{altreg} (see \k{pkg_altreg})
- can be used for this purpose.
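- As a brief sketch of what that looks like in practice, assuming the
- package's numbering follows the hardware encoding (\c{R0} for
- \c{RAX}, \c{R3} for \c{RBX}):
- \c %use altreg
- \c
- \c mov r8l,1 ; same as mov r8b,1
- \c mov r0,r3 ; same as mov rax,rbx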
- \H{id64} Immediates and Displacements in 64-bit Mode
- In 64-bit mode, immediates and displacements are generally only 32
- bits wide. NASM will therefore truncate most displacements and
- immediates to 32 bits.
- The only instruction which takes a full \i{64-bit immediate} is:
- \c MOV reg64,imm64
- NASM will produce this instruction whenever the programmer uses
- \c{MOV} with an immediate into a 64-bit register. If this is not
- desirable, simply specify the equivalent 32-bit register, which will
- be automatically zero-extended by the processor, or specify the
- immediate as \c{DWORD}:
- \c mov rax,foo ; 64-bit immediate
- \c mov rax,qword foo ; (identical)
- \c mov eax,foo ; 32-bit immediate, zero-extended
- \c mov rax,dword foo ; 32-bit immediate, sign-extended
- These instructions are 10, 10, 5 and 7 bytes long, respectively.
- If optimization is enabled and NASM can determine at assembly time
- that a shorter instruction will suffice, the shorter instruction will
- be emitted unless of course \c{STRICT QWORD} or \c{STRICT DWORD} is
- specified (see \k{strict}):
- \c mov rax,1 ; Assembles as "mov eax,1" (5 bytes)
- \c mov rax,strict qword 1 ; Full 10-byte instruction
- \c mov rax,strict dword 1 ; 7-byte instruction
- \c mov rax,symbol ; 10 bytes, not known at assembly time
- \c lea rax,[rel symbol] ; 7 bytes, usually preferred by the ABI
- Note that \c{lea rax,[rel symbol]} is position-independent, whereas
- \c{mov rax,symbol} is not. Most ABIs prefer or even require
- position-independent code in 64-bit mode. However, the \c{MOV}
- instruction is able to reference a symbol anywhere in the 64-bit
- address space, whereas \c{LEA} is only able to access a symbol
- within 2 GB of the instruction itself (see below).
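- With \c{DEFAULT REL} in effect the \c{REL} keyword can be omitted,
- which keeps position-independent data references compact. A minimal
- sketch, in which \c{counter} is a hypothetical variable:
- \c default rel
- \c
- \c section .data
- \c
- \c counter: dq 0 ; hypothetical variable
- \c
- \c section .text
- \c
- \c lea rax,[counter] ; address of counter, RIP-relative
- \c inc qword [counter] ; RIP-relative memory operand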
- The only instructions which take a full \I{64-bit displacement}64-bit
- \e{displacement} are the forms of \c{MOV} that load or store \c{AL}, \c{AX},
- \c{EAX} or \c{RAX} (but no other registers) from or to an absolute 64-bit address.
- Since this is a relatively rarely used instruction (64-bit code generally uses
- relative addressing), the programmer has to explicitly declare the
- displacement size as \c{ABS QWORD}:
- \c default abs
- \c
- \c mov eax,[foo] ; 32-bit absolute disp, sign-extended
- \c mov eax,[a32 foo] ; 32-bit absolute disp, zero-extended
- \c mov eax,[qword foo] ; 64-bit absolute disp
- \c
- \c default rel
- \c
- \c mov eax,[foo] ; 32-bit relative disp
- \c mov eax,[a32 foo] ; ditto, address truncated to 32 bits(!)
- \c mov eax,[qword foo] ; error
- \c mov eax,[abs qword foo] ; 64-bit absolute disp
- A sign-extended absolute displacement can access from -2 GB to +2 GB;
- a zero-extended absolute displacement can access from 0 to 4 GB.
- \H{unix64} Interfacing to 64-bit C Programs (Unix)
- On Unix, the 64-bit ABI as well as the x32 ABI (32-bit ABI with the
- CPU in 64-bit mode) is defined by the documents at:
- \W{http://www.nasm.us/abi/unix64}\c{http://www.nasm.us/abi/unix64}
- Although written for AT&T-syntax assembly, the concepts apply equally
- well for NASM-style assembly. What follows is a simplified summary.
- The first six integer arguments (from the left) are passed in \c{RDI},
- \c{RSI}, \c{RDX}, \c{RCX}, \c{R8}, and \c{R9}, in that order.
- Additional integer arguments are passed on the stack. These
- registers, plus \c{RAX}, \c{R10} and \c{R11} are destroyed by function
- calls, and thus are available for use by the function without saving.
- Integer return values are passed in \c{RAX} and \c{RDX}, in that order.
- Floating point is done using SSE registers, except for \c{long
- double}, which is 80 bits (\c{TWORD}) on most platforms (Android is
- one exception; there \c{long double} is 64 bits and treated the same
- as \c{double}). Floating-point arguments are passed in \c{XMM0} to
- \c{XMM7}; return is \c{XMM0} and \c{XMM1}. \c{long double} values are
- passed on the stack, and returned in \c{ST0} and \c{ST1}.
- All SSE and x87 registers are destroyed by function calls.
- On 64-bit Unix, \c{long} is 64 bits.
- Integer and SSE register arguments are counted separately, so for the case of
- \c void foo(long a, double b, int c)
- \c{a} is passed in \c{RDI}, \c{b} in \c{XMM0}, and \c{c} in \c{ESI}.
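- Seen from the callee's side, a function with that argument list
- finds its inputs as sketched below. The function \c{foo_sum} is
- hypothetical (declared in C as \c{long foo_sum(long a, double b, int
- c)}), and ELF-style symbol naming without a leading underscore is
- assumed:
- \c global foo_sum
- \c
- \c foo_sum: cvttsd2si rax,xmm0 ; (long)b, from the first SSE argument
- \c add rax,rdi ; + a, the first integer argument
- \c movsxd rcx,esi ; sign-extend c, the second integer argument
- \c add rax,rcx
- \c ret ; result in RAX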
- \H{win64} Interfacing to 64-bit C Programs (Win64)
- The Win64 ABI is described by the document at:
- \W{http://www.nasm.us/abi/win64}\c{http://www.nasm.us/abi/win64}
- What follows is a simplified summary.
- The first four integer arguments are passed in \c{RCX}, \c{RDX},
- \c{R8} and \c{R9}, in that order. Additional integer arguments are
- passed on the stack. These registers, plus \c{RAX}, \c{R10} and
- \c{R11} are destroyed by function calls, and thus are available for
- use by the function without saving.
- Integer return values are passed in \c{RAX} only.
- Floating point is done using SSE registers, except for \c{long
- double}. Floating-point arguments are passed in \c{XMM0} to \c{XMM3};
- return is \c{XMM0} only.
- On Win64, \c{long} is 32 bits; \c{long long} or \c{_int64} is 64 bits.
- Integer and SSE register arguments are counted together, so for the case of
- \c void foo(long long a, double b, int c)
- \c{a} is passed in \c{RCX}, \c{b} in \c{XMM1}, and \c{c} in \c{R8D}.
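- For comparison with the Unix convention, here is a sketch of a
- hypothetical callee \c{foo_sum} (declared in C as \c{long long
- foo_sum(long long a, double b, int c)}) picking up its arguments
- from the Win64 positions:
- \c global foo_sum
- \c
- \c foo_sum: cvttsd2si rax,xmm1 ; (long long)b, argument slot 2
- \c add rax,rcx ; + a, argument slot 1
- \c movsxd rdx,r8d ; sign-extend c, argument slot 3
- \c add rax,rdx
- \c ret ; result in RAX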
- \C{trouble} Troubleshooting
- This chapter describes some of the common problems that users have
- been known to encounter with NASM, and answers them. If you think you
- have found a bug in NASM, please see \k{bugs}.
- \H{problems} Common Problems
- \S{inefficient} NASM Generates \i{Inefficient Code}
- We sometimes get `bug' reports about NASM generating inefficient, or
- even `wrong', code on instructions such as \c{ADD ESP,8}. This is a
- deliberate design feature, connected to predictability of output:
- NASM, on seeing \c{ADD ESP,8}, will generate the form of the
- instruction which leaves room for a 32-bit offset. You need to code
- \I\c{BYTE}\c{ADD ESP,BYTE 8} if you want the space-efficient form of
- the instruction. This isn't a bug, it's user error: if you prefer to
- have NASM produce the more efficient code automatically, enable
- optimization with the \c{-O} option (see \k{opt-O}).
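- For reference, the two encodings in 32-bit code compare as follows,
- under the unoptimized behaviour described above:
- \c add esp,8 ; 6 bytes: opcode, ModRM and a 32-bit immediate
- \c add esp,byte 8 ; 3 bytes: opcode, ModRM and a sign-extended byte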
- \S{jmprange} My Jumps are Out of Range\I{out of range, jumps}
- Similarly, people complain that when they issue \i{conditional
- jumps} (which are \c{SHORT} by default) that try to jump too far,
- NASM reports `short jump out of range' instead of making the jumps
- longer.
- This, again, is partly a predictability issue, but in fact has a
- more practical reason as well. NASM has no means of being told what
- type of processor the code it is generating will be run on; so it
- cannot decide for itself that it should generate \i\c{Jcc NEAR} type
- instructions, because it doesn't know that it's working for a 386 or
- above. Alternatively, it could replace the out-of-range short
- \c{JNE} instruction with a very short \c{JE} instruction that jumps
- over a \c{JMP NEAR}; this is a sensible solution for processors
- below a 386, but hardly efficient on processors which have good
- branch prediction \e{and} could have used \c{JNE NEAR} instead. So,
- once again, it's up to the user, not the assembler, to decide what
- instructions should be generated. See \k{opt-O}.
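- The two alternatives discussed above can be written out explicitly
- as follows; \c{target} stands for whatever out-of-range label you
- are trying to reach, and \c{.skip} is an illustrative local label:
- \c jne near target ; 386 or above: long form of the conditional jump
- \c
- \c je .skip ; any processor: inverted short jump...
- \c jmp near target ; ...over an unconditional near jump
- \c .skip: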
- \S{proborg} \i\c{ORG} Doesn't Work
- People writing \i{boot sector} programs in the \c{bin} format often
- complain that \c{ORG} doesn't work the way they'd like: in order to
- place the \c{0xAA55} signature word at the end of a 512-byte boot
- sector, people who are used to MASM tend to code
- \c ORG 0
- \c
- \c ; some boot sector code
- \c
- \c ORG 510
- \c DW 0xAA55
- This is not the intended use of the \c{ORG} directive in NASM, and
- will not work. The correct way to solve this problem in NASM is to
- use the \i\c{TIMES} directive, like this:
- \c ORG 0
- \c
- \c ; some boot sector code
- \c
- \c TIMES 510-($-$$) DB 0
- \c DW 0xAA55
- The \c{TIMES} directive will insert exactly enough zero bytes into
- the output to move the assembly point up to 510. This method also
- has the advantage that if you accidentally fill your boot sector too
- full, NASM will catch the problem at assembly time and report it, so
- you won't end up with a boot sector that you have to disassemble to
- find out what's wrong with it.
- \S{probtimes} \i\c{TIMES} Doesn't Work
- The other common problem with the above code is people who write the
- \c{TIMES} line as
- \c TIMES 510-$ DB 0
- by reasoning that \c{$} should be a pure number, just like 510, so
- the difference between them is also a pure number and can happily be
- fed to \c{TIMES}.
- NASM is a \e{modular} assembler: the various component parts are
- designed to be easily separable for re-use, so they don't exchange
- information unnecessarily. In consequence, the \c{bin} output
- format, even though it has been told by the \c{ORG} directive that
- the \c{.text} section should start at 0, does not pass that
- information back to the expression evaluator. So from the
- evaluator's point of view, \c{$} isn't a pure number: it's an offset
- from a section base. Therefore the difference between \c{$} and 510
- is also not a pure number, but involves a section base. Values
- involving section bases cannot be passed as arguments to \c{TIMES}.
- The solution, as in the previous section, is to code the \c{TIMES}
- line in the form
- \c TIMES 510-($-$$) DB 0
- in which \c{$} and \c{$$} are offsets from the same section base,
- and so their difference is a pure number. This will solve the
- problem and generate sensible code.
- \A{ndisasm} \i{Ndisasm}
- The Netwide Disassembler, NDISASM
- \H{ndisintro} Introduction
- The Netwide Disassembler is a small companion program to the Netwide
- Assembler, NASM. It seemed a shame to have an x86 assembler,
- complete with a full instruction table, and not make as much use of
- it as possible, so here's a disassembler which shares the
- instruction table (and some other bits of code) with NASM.
- The Netwide Disassembler does nothing except to produce
- disassemblies of \e{binary} source files. Unlike \c{objdump}, NDISASM
- does not have any understanding of object file formats, and unlike
- \c{debug} it will not understand \c{DOS .EXE} files. It just
- disassembles.
- \H{ndisrun} Running NDISASM
- To disassemble a file, you will typically use a command of the form
- \c ndisasm -b {16|32|64} filename
- NDISASM can disassemble 16-, 32- or 64-bit code equally easily,
- provided of course that you remember to specify which it is to work
- with. If no \i\c{-b} switch is present, NDISASM works in 16-bit mode
- by default. The \i\c{-u} switch (for USE32) also invokes 32-bit mode.
- Two more command line options are \i\c{-r} which reports the version
- number of NDISASM you are running, and \i\c{-h} which gives a short
- summary of command line options.
- \S{ndiscom} COM Files: Specifying an Origin
- To disassemble a \c{DOS .COM} file correctly, a disassembler must assume
- that the first instruction in the file is loaded at address \c{0x100},
- rather than at zero. NDISASM, which assumes by default that any file
- you give it is loaded at zero, will therefore need to be informed of
- this.
- The \i\c{-o} option allows you to declare a different origin for the
- file you are disassembling. Its argument may be expressed in any of
- the NASM numeric formats: decimal by default; if it begins with `\c{$}'
- or `\c{0x}' or ends in `\c{H}' it's \c{hex}; if it ends in `\c{Q}' it's
- \c{octal}; and if it ends in `\c{B}' it's \c{binary}.
- Hence, to disassemble a \c{.COM} file:
- \c ndisasm -o100h filename.com
- will do the trick.
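- The numeric formats accepted by \c{-o} can be illustrated with a small
- Python sketch. This is not NDISASM's own parser: the function name
- \c{parse_number} is hypothetical, and the order in which the prefixes
- and suffixes are tested (prefixes first, and without regard to case) is
- an assumption made so that a value such as \c{0x1B} is read as hex.

```python
def parse_number(text: str) -> int:
    """Parse a number written in one of the formats described above."""
    s = text.strip().lower()
    if s.startswith("0x"):   # 0x prefix: hex
        return int(s[2:], 16)
    if s.startswith("$"):    # $ prefix: hex
        return int(s[1:], 16)
    if s.endswith("h"):      # H suffix: hex
        return int(s[:-1], 16)
    if s.endswith("q"):      # Q suffix: octal
        return int(s[:-1], 8)
    if s.endswith("b"):      # B suffix: binary
        return int(s[:-1], 2)
    return int(s, 10)        # no marker: decimal
```

- Under these assumptions, \c{100h}, \c{0x100}, \c{$100}, \c{400q} and
- \c{256} all denote the origin of a \c{.COM} file.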
- \S{ndissync} Code Following Data: Synchronisation
- Suppose you are disassembling a file which contains some data which
- isn't machine code, and \e{then} contains some machine code. NDISASM
- will faithfully plough through the data section, producing machine
- instructions wherever it can (although most of them will look
- bizarre, and some may have unusual prefixes, e.g. `\c{FS OR AX,0x240A}'),
- and generating `\c{DB}' instructions every so often if it's totally stumped.
- Then it will reach the code section.
- Supposing NDISASM has just finished generating a strange machine
- instruction from part of the data section, and its file position is
- now one byte \e{before} the beginning of the code section. It's
- entirely possible that another spurious instruction will get
- generated, starting with the final byte of the data section, and
- then the correct first instruction in the code section will not be
- seen because the starting point skipped over it. This isn't really
- ideal.
- To avoid this, you can specify a `\i\c{synchronisation}' point, or indeed
- as many synchronisation points as you like (although NDISASM can
- only handle 2147483647 sync points internally). The definition of a sync
- point is this: NDISASM guarantees to hit sync points exactly during
- disassembly. If it is thinking about generating an instruction which
- would cause it to jump over a sync point, it will discard that
- instruction and output a `\c{db}' instead. So it \e{will} start
- disassembly exactly from the sync point, and so you \e{will} see all
- the instructions in your code section.
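- The rule can be sketched with a toy model in Python. Nothing here is
- taken from the NDISASM source: the table \c{TOY_LENGTHS} is a
- hypothetical decoder that knows the lengths of only three opcodes, the
- origin is assumed to be zero, and emitting a single-byte `\c{db}' before
- retrying at the next byte is an assumption about how a discarded
- instruction is replaced.

```python
# Hypothetical decoder: opcode byte -> instruction length in bytes.
TOY_LENGTHS = {0x90: 1, 0xEB: 2, 0xB8: 3}


def walk(data: bytes, sync_points=()):
    """Return (offset, kind) pairs, never letting an instruction span a sync point."""
    out = []
    pos = 0
    while pos < len(data):
        length = TOY_LENGTHS.get(data[pos], 0)  # 0 means "totally stumped"
        end = pos + length
        # An instruction jumps over a sync point if the point lies strictly
        # inside it, i.e. after its first byte and before its end.
        crosses = any(pos < s < end for s in sync_points)
        if length == 0 or end > len(data) or crosses:
            out.append((pos, "db"))    # discard the instruction, emit one byte
            pos += 1
        else:
            out.append((pos, "insn"))
            pos = end
    return out
```

- For the bytes \c{B8 EB 00 90}, where the first byte is data and the code
- starts at offset 1, \c{walk(data)} swallows the jump into a spurious
- three-byte instruction, while \c{walk(data, [1])} emits a `\c{db}' for the
- data byte and then decodes the jump at offset 1.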
- Sync points are specified using the \i\c{-s} option: they are measured
- in terms of the program origin, not the file position. So if you
- want to synchronise after 32 bytes of a \c{.COM} file, you would have to
- do
- \c ndisasm -o100h -s120h file.com
- rather than
- \c ndisasm -o100h -s20h file.com
- As stated above, you can specify multiple sync markers if you need
- to, just by repeating the \c{-s} option.
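- The conversion from a file position to the value expected by \c{-s} is
- a single addition. As a minimal Python sketch (the function name
- \c{sync_address} is made up for this illustration):

```python
def sync_address(origin: int, file_offset: int) -> int:
    """Address to give to -s for a byte at file_offset, given the -o origin."""
    return origin + file_offset


# 32 bytes into a .COM file loaded at 100h:
print(hex(sync_address(0x100, 32)))
```

- This prints \c{0x120}, matching the \c{-s120h} used in the example above.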
- \S{ndisisync} Mixed Code and Data: Automatic (Intelligent) Synchronisation
- \I\c{auto-sync}
- Suppose you are disassembling the boot sector of a \c{DOS} floppy (maybe
- it has a virus, and you need to understand the virus so that you
- know what kinds of damage it might have done you). Typically, this
- will contain a \c{JMP} instruction, then some data, then the rest of the
- code. So there is a very good chance of NDISASM being \e{misaligned}
- when the data ends and the code begins. Hence a sync point is
- needed.
- On the other hand, why should you have to specify the sync point
- manually? What you'd do in order to find where the sync point would
- be, surely, would be to read the \c{JMP} instruction, and then to use
- its target address as a sync point. So can NDISASM do that for you?
- The answer, of course, is yes: using either of the synonymous
- switches \i\c{-a} (for automatic sync) or \i\c{-i} (for intelligent
- sync) will enable \c{auto-sync} mode. Auto-sync mode automatically
- generates a sync point for any forward-referring PC-relative jump or
- call instruction that NDISASM encounters. (Since NDISASM is one-pass,
- if it encounters a PC-relative jump whose target has already been
- processed, there isn't much it can do about it...)
- Only PC-relative jumps are processed, since an absolute jump is
- either through a register (in which case NDISASM doesn't know what
- the register contains) or involves a segment address (in which case
- the target code isn't in the same segment that NDISASM is working
- in, and so the sync point can't be placed anywhere useful).
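- As an illustration of the idea, the Python sketch below derives a sync
- point from one particular PC-relative instruction, the two-byte short
- \c{JMP} (opcode \c{EB} followed by a signed 8-bit displacement). It is
- not NDISASM's implementation: the function name \c{short_jump_sync} is
- hypothetical, the origin is assumed to be zero, and other jump and call
- encodings are deliberately left out.

```python
def short_jump_sync(data: bytes, pos: int):
    """Return a sync point for a forward short JMP at pos, else None."""
    if pos + 1 >= len(data) or data[pos] != 0xEB:
        return None  # not a complete short JMP
    rel8 = int.from_bytes(data[pos + 1:pos + 2], "little", signed=True)
    # The displacement is relative to the end of the 2-byte instruction.
    target = pos + 2 + rel8
    # Only forward references are useful to a one-pass disassembler.
    return target if target > pos else None
```

- For a sector beginning with the bytes \c{EB 3C 90}, this gives a sync
- point at \c{0x3E}; for the backward jump \c{EB FE} it gives none.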
- For some kinds of file, this mechanism will automatically put sync
- points in all the right places, and save you from having to place
- any sync points manually. However, it should be stressed that
- auto-sync mode is \e{not} guaranteed to catch all the sync points, and
- you may still have to place some manually.
- Auto-sync mode doesn't prevent you from declaring manual sync
- points: it just adds automatically generated ones to the ones you
- provide. It's perfectly feasible to specify \c{-i} \e{and} some \c{-s}
- options.
- Another caveat with auto-sync mode is that if, by some unpleasant
- fluke, something in your data section should disassemble to a
- PC-relative call or jump instruction, NDISASM may obediently place a
- sync point in a totally random place, for example in the middle of
- one of the instructions in your code section. So you may end up with
- a wrong disassembly even if you use auto-sync. Again, there isn't
- much I can do about this. If you have problems, you'll have to use
- manual sync points, or use the \c{-k} option (documented below) to
- suppress disassembly of the data area.
- \S{ndisother} Other Options
- The \i\c{-e} option skips a header on the file, by ignoring the first N
- bytes. This means that the header is \e{not} counted towards the
- disassembly offset: if you give \c{-e10 -o10}, disassembly will start
- at byte 10 in the file, and this will be given offset 10, not 20.
- The \i\c{-k} option is provided with two comma-separated numeric
- arguments, the first of which is an assembly offset and the second
- is a number of bytes to skip. This \e{will} count the skipped bytes
- towards the assembly offset: its use is to suppress disassembly of a
- data section which wouldn't contain anything you wanted to see
- anyway.
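- The effect of \c{-e} on the reported offsets can be written down as a
- short Python sketch. The function name \c{displayed_offset} is
- hypothetical, and the sketch models only \c{-e} and \c{-o}, not \c{-k}.

```python
def displayed_offset(file_pos: int, header: int = 0, origin: int = 0):
    """Offset shown for the byte at file_pos, or None if it lies in the header."""
    if file_pos < header:
        return None  # skipped by -e, so never disassembled
    # The header is not counted: the first byte after it gets the origin.
    return origin + (file_pos - header)
```

- With \c{header=10} and \c{origin=10}, corresponding to \c{-e10 -o10}, the
- byte at file position 10 is given offset 10, as described above.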
- \A{inslist} \i{Instruction List}
- \H{inslistintro} Introduction
- The following sections show the instructions which NASM currently supports. For each
- instruction, there is a separate entry for each supported addressing mode. The third
- column shows the processor type in which the instruction was introduced and,
- when appropriate, one or more usage flags.
- \& inslist.src
- \A{changelog} \i{NASM Version History}
- \& changes.src
- \A{source} Building NASM from Source
- The source code for NASM is available from our website,
- \W{http://www.nasm.us/}{http://www.nasm.us/}, see \k{website}.
- \H{tarball} Building from a Source Archive
- The source archives available on the web site should be capable of
- building on a number of platforms. This is the recommended method for
- building NASM to support platforms for which executables are not
- available.
- On a system which has a Unix shell (\c{sh}), run:
- \c sh configure
- \c make everything
- A number of options can be passed to \c{configure}; see
- \c{sh configure --help}.
- A set of Makefiles for some other environments is also available;
- please see the file \c{Mkfiles/README}.
- To build the installer for the Windows platform, you will need the
- \i\e{Nullsoft Scriptable Installer}, \i{NSIS}, installed.
- To build the documentation, you will need a set of additional tools.
- The documentation is not likely to be able to build on non-Unix
- systems.
- \H{git} Building from the \i\c{git} Repository
- The NASM development tree is kept in a source code repository using
- the \c{git} distributed source control system. The link is available
- on the website. Building from \c{git} is recommended only if you want to
- participate in the development of NASM or to assist with testing the
- development code.
- To build NASM from the \c{git} repository you will need a Perl interpreter
- and, if building on a Unix system, GNU autoconf.
- To build on a Unix system, run:
- \c sh autogen.sh
- to create the \c{configure} script and then build as listed above.
- \A{contact} Contact Information
- \H{website} Website
- NASM has a \i{website} at
- \W{http://www.nasm.us/}\c{http://www.nasm.us/}.
- \i{New releases}, \i{release candidates}, and \I{snapshots, daily
- development}\i{daily development snapshots} of NASM are available from
- the official web site in source form as well as binaries for a number
- of common platforms.
- \S{forums} User Forums
- Users of NASM may find the Forums on the website useful. These are,
- however, not frequented much by the developers of NASM, so they are
- not suitable for reporting bugs.
- \S{develcom} Development Community
- The development of NASM is coordinated primarily through the
- \i\c{nasm-devel} mailing list. If you wish to participate in
- development of NASM, please join this mailing list. Subscription
- links and archives of past posts are available on the website.
- \H{bugs} \i{Reporting Bugs}\I{bugs}
- To report bugs in NASM, please use the \i{bug tracker} at
- \W{http://www.nasm.us/}\c{http://www.nasm.us/} (click on "Bug
- Tracker"), or if that fails then through one of the contacts in
- \k{website}.
- Please read \k{qstart} first, and don't report the bug if it's
- listed in there as a deliberate feature. (If you think the feature
- is badly thought out, feel free to send us reasons why you think it
- should be changed, but don't just send us mail saying `This is a
- bug' if the documentation says we did it on purpose.) Then read
- \k{problems}, and don't bother reporting the bug if it's listed
- there.
- If you do report a bug, \e{please} make sure your bug report includes
- the following information:
- \b What operating system you're running NASM under. Linux,
- FreeBSD, NetBSD, MacOS X, Win16, Win32, Win64, MS-DOS, OS/2, VMS,
- whatever.
- \b Whether you compiled your own executable from a source archive, compiled
- your own executable from \c{git}, used the standard distribution
- binaries from the website, or got an executable from somewhere else
- (e.g. a Linux distribution). If you were using a locally built
- executable, try to reproduce the problem using one of the standard
- binaries, as this will make it easier for us to reproduce your problem
- prior to fixing it.
- \b Which version of NASM you're using, and exactly how you invoked
- it. Give us the precise command line, and the contents of the
- \c{NASMENV} environment variable if any.
- \b Which versions of any supplementary programs you're using, and
- how you invoked them. If the problem only becomes visible at link
- time, tell us what linker you're using, what version of it you've
- got, and the exact linker command line. If the problem involves
- linking against object files generated by a compiler, tell us what
- compiler, what version, and what command line or options you used.
- (If you're compiling in an IDE, please try to reproduce the problem
- with the command-line version of the compiler.)
- \b If at all possible, send us a NASM source file which exhibits the
- problem. If this causes copyright problems (e.g. you can only
- reproduce the bug in restricted-distribution code) then bear in mind
- the following two points: firstly, we guarantee that any source code
- sent to us for the purposes of debugging NASM will be used \e{only}
- for the purposes of debugging NASM, and that we will delete all our
- copies of it as soon as we have found and fixed the bug or bugs in
- question; and secondly, we would prefer \e{not} to be mailed large
- chunks of code anyway. The smaller the file, the better. A
- three-line sample file that does nothing useful \e{except}
- demonstrate the problem is much easier to work with than a
- fully fledged ten-thousand-line program. (Of course, some errors
- \e{do} only crop up in large files, so this may not be possible.)
- \b A description of what the problem actually \e{is}. `It doesn't
- work' is \e{not} a helpful description! Please describe exactly what
- is happening that shouldn't be, or what isn't happening that should.
- Examples might be: `NASM generates an error message saying Line 3
- for an error that's actually on Line 5'; `NASM generates an error
- message that I believe it shouldn't be generating at all'; `NASM
- fails to generate an error message that I believe it \e{should} be
- generating'; `the object file produced from this source code crashes
- my linker'; `the ninth byte of the output file is 66 and I think it
- should be 77 instead'.
- \b If you believe the output file from NASM to be faulty, send it to
- us. That allows us to determine whether our own copy of NASM
- generates the same file, or whether the problem is related to
- portability issues between our development platforms and yours. We
- can handle binary files mailed to us as MIME attachments, uuencoded,
- and even BinHex. Alternatively, we may be able to provide an FTP
- site you can upload the suspect files to; but mailing them is easier
- for us.
- \b Any other information or data files that might be helpful. If,
- for example, the problem involves NASM failing to generate an object
- file while TASM can generate an equivalent file without trouble,
- then send us \e{both} object files, so we can see what TASM is doing
- differently from us.