Assembly Language Basics
In assembly language, syntax, data types, and addressing modes are essential for writing efficient and correct programs.
Syntax
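Assembly language syntax is mnemonic and symbolic: instructions, registers, and memory locations are written as short names instead of binary opcodes and numeric addresses, which makes programs readable and understandable.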
Instructions are represented by mnemonics such as MOV, ADD, SUB, and JMP.
Instructions can have operands that specify the data to be operated on.
Operands can be registers, memory locations, or immediate values.
Labels are used for defining addresses in the program and creating jump targets.
Comments start with a semicolon (;) and are used for documentation.
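The fields above can be illustrated with a short Python sketch (Python rather than assembly so it runs anywhere). The hypothetical `split_line` function breaks one source line into label, mnemonic, operands, and comment. It assumes a label is the first word and ends in ':', and it does not handle a ';' inside a quoted string, so treat it as an illustration rather than how MASM itself parses source.

```python
def split_line(line):
    """Split one source line into (label, mnemonic, operands, comment).
    Simplified: ';' always starts a comment, a label is a first word ending in ':'."""
    code, sep, comment = line.partition(";")
    comment = comment.strip() if sep else None
    tokens = code.split(None, 1)                  # [first word, rest of line]
    label = None
    if tokens and tokens[0].endswith(":"):
        label = tokens[0][:-1]
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    # Mnemonics are case-insensitive, so normalize them to upper case.
    mnemonic = tokens[0].upper() if tokens else None
    operands = [op.strip() for op in tokens[1].split(",")] if len(tokens) > 1 else []
    return label, mnemonic, operands, comment

print(split_line("START: mov ax, [bx] ; load a word"))
# ('START', 'MOV', ['ax', '[bx]'], 'load a word')
```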

Data Types
In 8086 Assembly, there is no strict type system as in high-level languages, but data is classified based on size and purpose.
Common data types include:
Byte (8 bits): Represented as DB (Define Byte).
Word (16 bits): Represented as DW (Define Word).
Doubleword (32 bits): Represented as DD (Define Doubleword).
Quadword (64 bits): Represented as DQ (Define Quadword).
Data types are used for defining variables, constants, and memory allocation.
Addressing Modes
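Addressing modes determine where an operand is found: as a constant in the instruction, in a register, or in memory at an address calculated from the instruction's fields.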
Common addressing modes include:
Immediate: Operand is a constant value ( MOV AX, 5 ).
Register: Operand is a value stored in a register ( MOV AX, BX ).
Direct: The instruction contains the memory address of the operand ( MOV AL, [1000h] ).
Indirect: Operand is a register that holds the address of the memory location ( MOV AL, [BX] ).
Indexed: Operand is an indexed memory location using a base register and an index register ( MOV AL, [BX+SI] ).
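Base-Indexed: Operand address is the sum of a base register, an index register, and a displacement ( MOV AL, [BX+SI+10h] ).
Scaled Indexed: Operand address uses an index register multiplied by a scaling factor of 1, 2, 4, or 8 ( MOV AL, [EBX+ESI*2] ). This mode requires 32-bit registers on an 80386 or later processor; it is not available on the 8086.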
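The address arithmetic behind the memory modes can be sketched in Python. The helper names `ea16` and `ea32_scaled` and the register values BX = 1000h and SI = 0004h are assumptions for illustration; the sketch only computes offsets and ignores which segment register is applied.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def ea16(base=0, index=0, disp=0):
    """8086 effective address: base + index + displacement, wrapped to 16 bits."""
    return (base + index + disp) & MASK16

def ea32_scaled(base=0, index=0, scale=1, disp=0):
    """80386+ scaled-index address: base + index * scale + disp (32-bit registers)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK32

# Assumed register contents for the examples: BX = 1000h, SI = 0004h.
BX, SI = 0x1000, 0x0004
print(hex(ea16(disp=0x1000)))                        # direct       [1000h]     -> 0x1000
print(hex(ea16(base=BX)))                            # indirect     [BX]        -> 0x1000
print(hex(ea16(base=BX, index=SI)))                  # indexed      [BX+SI]     -> 0x1004
print(hex(ea16(base=BX, index=SI, disp=0x10)))       # base-indexed [BX+SI+10h] -> 0x1014
print(hex(ea32_scaled(base=BX, index=SI, scale=2)))  # scaled       [EBX+ESI*2] -> 0x1008
```

The 16-bit mask matters: on the 8086 an offset such as FFFFh + 2 wraps around to 0001h instead of growing past 64 KB.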
Rules for writing code
The code consists of:
Mnemonics
Mnemonics are Machine Instructions
Mnemonics are abbreviations
Example: MOV, ADD, SUB, IMUL
Operands
Operands are Data or Addresses
Types of Operands: Registers, Memory, Constants
Example: MOV AX, [BX]
Comments
Enhance Code Readability
Used for documenting important parts of the code
Ignored by the assembler: everything from the ';' character to the end of the line is ignored.
Example: ; This is a Comment
Syntax Rules
MASM is case-insensitive by default (uppercase and lowercase letters are treated the same, so mov, MOV, and Mov are identical).
Rule: If an identifier is present in the label field, always start that identifier in column one of the source line.
Rule: All mnemonics should start in the same column. Generally, this should be column 17 (the second tab stop) or some other convenient position.
Rule: All operands should start in the same column. Generally, this should be column 25 (the third tab stop) or some other convenient position.
Exception: If a mnemonic (typically a macro) is longer than seven characters and requires an operand, you have no choice but to start the operand field beyond column 25 (this is an exception assuming you've chosen columns 17 and 25 for your mnemonic and operand fields, respectively).
Guideline: Try to always start the comment fields on adjacent source lines in the same column
Guideline: Use blank lines to separate special blocks of code from the surrounding code. Use an aesthetic-looking row of asterisks or dashes if you need a stronger separation between two blocks of code (do not overdo this, however).
Example Code:
GetFileRecords: mov     dx, offset DTA          ;Set up DTA
                DOS     SetDTA
                mov     dx, FileSpec
                mov     cl, 37h
                DOS     FindFirstFile
                jc      FileNotFound
; *********************************************
                mov     di, offset fileRecords  ;DI -> storage for file names
                mov     bx, offset files        ;BX -> array of files
                sub     bx, 2                   ;Special case for 1st iteration
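As a sketch of the column rules, the hypothetical Python helper `format_line` below places mnemonics at column 17 and operands at column 25; the comment column (49) is an assumed choice. When a field is too long, it falls back to a single space, which mirrors the exception for long macro names.

```python
MNEMONIC_COL = 17   # 1-based column for mnemonics (from the rules above)
OPERAND_COL = 25    # 1-based column for operands
COMMENT_COL = 49    # assumed comment column; any fixed column works

def pad_to(text, col):
    """Pad so the next character lands in 1-based column col.
    If text already reaches that column, add one separating space instead."""
    width = col - 1
    return text.ljust(width) if len(text) < width else text + " "

def format_line(label="", mnemonic="", operands="", comment=""):
    line = label                                  # labels start in column 1
    if mnemonic:
        line = pad_to(line, MNEMONIC_COL) + mnemonic
        if operands:
            line = pad_to(line, OPERAND_COL) + operands
    if comment:
        line = pad_to(line, COMMENT_COL) + ";" + comment
    return line

print(format_line("GetFileRecords:", "mov", "dx, offset DTA", "Set up DTA"))
print(format_line("", "sub", "bx, 2", "Special case for 1st iteration"))
print(format_line("", "LongMacroName", "ax"))     # long mnemonic pushes the operand right
```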
Error Handling
Assembly Errors:
Undefined symbol: Program refers to a label that does not exist
How to fix: check the spelling of both the definition and every reference to it
Undefined opcode or pseudo-op
How to fix: check the spelling/availability of the instruction
Addressing mode not available
How to fix: look up the addressing modes available for the instruction
Expression error
How to fix: check parentheses, start with a simpler expression
Phasing Error: occurs when the value of a symbol changes from pass 1 to pass 2
How to fix: first remove any undefined symbols, then remove forward references
Address error
How to fix: use ORG pseudo-ops to match the available memory.
Data Types - Byte, Word, Doubleword, Quadword
Byte data type: represents 8 bits (1 byte). Example: DB 65 ; (decimal byte value)
Word data type: represents 16 bits (2 bytes). Example: DW 1234 ; (decimal word value)
Doubleword data type: represents 32 bits (4 bytes). Example: DD 12345678h ; (hex doubleword value)
Quadword data type: represents 64 bits (8 bytes). The 8086 has no instructions that operate on 64-bit values directly, but MASM's DQ directive can still declare them. Example: DQ 123456789ABCDEFh ; (hex quadword value)
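A rough Python model of what these directives emit: each value is stored in two's complement with the least significant byte at the lowest address (little-endian), as on the 8086. The `encode` helper is hypothetical, and its range check only approximates the errors a real assembler reports.

```python
SIZES = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8}   # bytes per item

def encode(directive, value):
    """Bytes emitted for one item: two's complement, least significant byte first."""
    size = SIZES[directive]
    bits = 8 * size
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {directive}")
    return (value % (1 << bits)).to_bytes(size, "little")

for directive, value in [("DB", 65), ("DW", 1234), ("DD", 0x12345678), ("DQ", 0x123456789ABCDEF)]:
    print(directive, encode(directive, value).hex(" ").upper())
# DB 41
# DW D2 04
# DD 78 56 34 12
# DQ EF CD AB 89 67 45 23 01
```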
Data Type Conversion
Maintaining Data Integrity
CBW (convert byte to word) extends the sign bit of AL into the AH register. This preserves the number's sign.
byte_val SBYTE -101
mov al, byte_val ; AL = 9Bh
cbw ; AX = FF9Bh
CWD - convert word to doubleword extends the sign bit of AX into the DX register:
word_val SWORD -101
mov ax, word_val ; AX = FF9Bh
cwd ; DX:AX = FFFFh:FF9Bh
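CBW and CWD simply copy the sign bit into every bit of the upper register. The Python sketch below models that on plain integers. `cbw`, `cwd`, and `to_signed` are illustrative helpers, not library functions.

```python
def cbw(al):
    """CBW: copy bit 7 of AL into all of AH; returns the 16-bit AX."""
    al &= 0xFF
    ah = 0xFF if al & 0x80 else 0x00
    return (ah << 8) | al

def cwd(ax):
    """CWD: copy bit 15 of AX into all of DX; returns (DX, AX)."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax

def to_signed(value, bits):
    """Read a bit pattern as a two's-complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

byte_val = -101 & 0xFF              # 9Bh, the bit pattern of 'byte_val SBYTE -101'
ax = cbw(byte_val)
print(hex(ax), to_signed(ax, 16))   # 0xff9b -101
dx, ax = cwd(ax)
print(hex(dx), hex(ax))             # 0xffff 0xff9b
```

A positive value such as 65h gets zeros instead: CBW turns it into 0065h, so the numeric value is unchanged in both cases.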
Addressing Modes
Different Ways to Specify Operand Locations -> Immediate, Register, Direct, Indirect, Indexed, Based Indexed
Immediate Addressing Mode - Operand Is a Constant MOV AX, 42
Register Addressing Mode - Operand Is a Register MOV BX, AX
Direct Addressing Mode - Operand Is at a Fixed Memory Address MOV AL, [1000h]
Indirect Addressing Mode - Register Holds Memory Address MOV DL, [BX]
Indexed Addressing Mode - Operand as Indexed Memory Location MOV AL, [BX+SI]
Based Indexed Addressing Mode - Combination of Base and Index Registers MOV AH, [BX+SI+10h]
Calculate the Sum of Two Numbers
Example 1:
MOV AX, 2340h
MOV BX, 4008h
ADD AX, BX
Example 2:
Number1 DW 2340h ; h -> hex number
Number2 DW 4008h ; h -> hex number
Result DW ?
MOV AX, Number1
MOV BX, Number2
ADD AX, BX ; AX = 2340h + 4008h = 6348h
MOV Result, AX
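ADD on 16-bit registers keeps only the low 16 bits of the sum and records the carry out in CF (and signed overflow in OF). A minimal Python model, using the hypothetical helper name `add16`, confirms that Result receives 6348h with no carry:

```python
def add16(a, b):
    """ADD AX, BX on 16-bit values: returns (result, CF, OF)."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    cf = int(total > 0xFFFF)                               # unsigned carry out of bit 15
    of = int(((a ^ result) & (b ^ result) & 0x8000) != 0)  # signed overflow
    return result, cf, of

result, cf, of = add16(0x2340, 0x4008)
print(f"AX = {result:04X}h, CF = {cf}, OF = {of}")   # AX = 6348h, CF = 0, OF = 0
```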
A few registers – additional info
Register SP - stack pointer (16 bits). It points to the topmost item of the stack. Its offset address is relative to the stack segment (SS). Its initial value depends on how the stack is set up; for example, DOS usually starts a .COM program with SP = FFFEh.
Register BP - base pointer (16 bits). It is primarily used to access parameters passed on the stack. Its offset address is relative to the stack segment.
Register SI - source index (16 bits). It is used for pointer addressing of data and as the source in some string-related operations. Its offset is relative to the data segment (DS).
Register DI - destination index (16 bits). It is used for pointer addressing of data and as the destination in some string-related operations. In string instructions its offset is relative to the extra segment (ES); in ordinary memory operands such as [DI] it is relative to DS.
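The registers above hold 16-bit offsets; on the 8086 the 20-bit physical address is formed as segment * 16 + offset. A small Python sketch, assuming SS = 2000h purely for illustration (`physical_address` is a hypothetical helper):

```python
def physical_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, limited to 20 bits (1 MB)."""
    return ((segment << 4) + offset) & 0xFFFFF

SS, SP = 0x2000, 0xFFFE                # assumed register values
print(hex(physical_address(SS, SP)))   # 0x2fffe
```

Because segments overlap every 16 bytes, different segment:offset pairs can name the same byte (1234h:0005h and 1000h:2345h are both 12345h), and on the 8086 itself addresses past FFFFFh wrap around to 0.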
Example – Accessing array elements
Array DW 10, 20, 30, 40, 50 ; Define an array of 16-bit integers
Index DW 2 ; Index for element access (e.g., access the third element)
Result DW ? ; Storage for the accessed element
; Load the array index into AX (MUL always multiplies AX by its operand)
MOV AX, Index
; Calculate the offset in the array based on the index
MOV CX, 2 ; Each element in the array is 2 bytes (16 bits)
MUL CX ; DX:AX = AX * 2 (DX receives the high word of the product)
MOV SI, AX ; Store the byte offset in SI
; Access the array element and store it in Result
MOV AX, Array[SI] ; Load the element at the calculated offset
MOV Result, AX
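To check the offset arithmetic, here is a Python model of the same array. `define_words` lays the values out as DW would (consecutive little-endian words) and `load_element` multiplies the index by 2, as MUL CX does; both names are hypothetical.

```python
ELEMENT_SIZE = 2   # each DW element occupies 2 bytes

def define_words(values):
    """Lay out values the way 'DW v1, v2, ...' does: consecutive little-endian words."""
    return b"".join(v.to_bytes(ELEMENT_SIZE, "little") for v in values)

def load_element(memory, index):
    offset = index * ELEMENT_SIZE                     # MOV AX, Index / MUL CX / MOV SI, AX
    return int.from_bytes(memory[offset:offset + ELEMENT_SIZE], "little")  # MOV AX, Array[SI]

array = define_words([10, 20, 30, 40, 50])
print(load_element(array, 2))   # 30 (index 2 is the third element)
```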
Microsoft Macro Assembler (MASM)
The assembler contains a macro processor as its front end.
The macro processor scans the source file for macro definitions and macro calls written in Macro Processor Language (MPL).
Macro calls are expanded according to macro definitions, and the resulting source assembly language is assembled by the assembler.
By using MPL, you can create macros specific to your application that can generate sequences of assembly language instructions or directives.
The macro processor is a very powerful string replacement facility that can help to simplify a programming task. Repeatedly used code sequences can be replaced by a simple macro call. Also, frequently used assembler directive statements can be replaced by macro calls.
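The string-replacement idea can be sketched as a toy expander in Python. This is not MASM's macro syntax or algorithm; the `SUM2` macro, its parameter names, and the `expand` function are assumptions chosen only to show how parameter substitution turns one macro call into ordinary instructions.

```python
import re

# Hypothetical macro table: name -> (parameter names, body lines).
MACROS = {
    "SUM2": (["dest", "x", "y"], ["MOV AX, x", "ADD AX, y", "MOV dest, AX"]),
}

def expand(lines, macros):
    """Replace each macro call with its body, substituting arguments for parameters."""
    out = []
    for line in lines:
        parts = line.split(None, 1)
        name = parts[0].upper() if parts else ""
        if name not in macros:
            out.append(line)                      # ordinary instruction: copy as-is
            continue
        params, body = macros[name]
        args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
        if len(args) != len(params):
            raise ValueError(f"{name}: expected {len(params)} arguments, got {len(args)}")
        if not params:
            out.extend(body)
            continue
        mapping = dict(zip(params, args))
        # Whole-word match so a parameter 'x' does not touch 'ax' or 'AX'.
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
        for body_line in body:
            out.append(pattern.sub(lambda m: mapping[m.group(1)], body_line))
    return out

for line in expand(["SUM2 Result, Number1, Number2", "NOP"], MACROS):
    print(line)
```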
MASM Directives
MASM uses directives: instructions to the assembler that indicate how an operand or a section of a program is to be processed.
DB (Define Byte), DW (Define Word), and DD (Define Double Word) are very often used to define and store memory data. With these directives, we give a memory location a symbolic name and at the same time define its size.
ARRAY DW 100 DUP(0) reserves 100 words of storage in memory, gives it the name “ARRAY”, and initializes those 100 words with 0000h.
ARRAY DW 100 DUP(?) reserves 100 words of storage in memory and gives it the name “ARRAY”, but the assembler does not initialize them with any particular value.
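The storage arithmetic for DUP is count times item size, so ARRAY DW 100 DUP(0) occupies 200 bytes. A small Python sketch (the `dup` helper is hypothetical) that models both the initialized form and the `?` form, where the contents are left unspecified:

```python
SIZES = {"DB": 1, "DW": 2, "DD": 4}   # bytes per item

def dup(directive, count, init=None):
    """Model 'directive count DUP(init)': returns (bytes reserved, contents).
    init=None stands for DUP(?), whose contents are unspecified (None)."""
    width = SIZES[directive]
    size = width * count
    if init is None:
        return size, None
    item = (init % (1 << (8 * width))).to_bytes(width, "little")  # two's complement
    return size, item * count

size, contents = dup("DW", 100, 0)
print(size, contents == bytes(200))   # 200 True
print(dup("DW", 100))                 # (200, None)
```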
ORG (Origin) indicates where to put the next piece of code or data, relative to the start of the current segment. It changes the offset at which the following data (or code) is assembled, so the origin of data or code can be assigned to an absolute offset address with the ORG statement.
ORG 0000h
DataX DB 28 ; offset 0000h -> 1Ch
DataY DB 45 ; offset 0001h -> 2Dh
ORG 0020h
DataZ DB 86 ; offset 0020h -> 56h
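ORG only moves the assembler's location counter; it does not emit anything itself. The Python sketch below models this for one-byte DB items and reproduces the offsets listed above. The `assemble_data` function and its input format are assumptions for illustration.

```python
def assemble_data(items):
    """items: ("ORG", offset) or (name, value) for a one-byte DB item.
    Returns (memory as {offset: byte}, symbol table as {name: offset})."""
    location = 0                          # the assembler's location counter
    memory, symbols = {}, {}
    for name, value in items:
        if name == "ORG":
            location = value              # ORG just moves the location counter
        else:
            symbols[name] = location
            memory[location] = value & 0xFF
            location += 1                 # a DB item occupies one byte
    return memory, symbols

memory, symbols = assemble_data([
    ("ORG", 0x0000), ("DataX", 28), ("DataY", 45),
    ("ORG", 0x0020), ("DataZ", 86),
])
for offset, byte in sorted(memory.items()):
    print(f"{offset:04X}h -> {byte:02X}h")
```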
ASSUME tells the assembler which logical segment name to use for each segment register, i.e. which names were chosen for the code, data, extra, and stack segments. It does not load the segment registers; the program must still load DS itself.
ASSUME DS:data, CS:code, SS:stack
EQU equates a numeric value, ASCII text, or label to a name. Each time the assembler finds that name in the program, it replaces the name with the value or symbol equated to it.
CONTROL_WORD EQU 11001001B ; replacement (B suffix = binary, 0C9h)
MOV AX, CONTROL_WORD ; assembled as MOV AX, 0C9h
COUNT EQU 10
CONST EQU 20H
MOV AH, COUNT
MOV AL, CONST
PROC and ENDP indicate the start and end of a procedure (subroutine).
PROCONVERT PROC FAR
; identifies the start of a procedure named PROCONVERT and tells the assembler that the procedure is far (in a segment with a different name from the one that contains the instruction that calls the procedure).
RET ; inside a FAR procedure, RET is assembled as a far return
PROCONVERT ENDP ; marks the end of PROCONVERT
The PROC directive, which indicates the start of a procedure, is normally followed by NEAR or FAR; if the distance is omitted, MASM uses a default (NEAR unless the memory model specifies otherwise).
A NEAR procedure resides in the same code segment as the program and is often considered local. A FAR procedure may reside at any location in the memory system and is considered global.
The term global denotes a procedure that can be used by any program. Local defines a procedure that is only used by the current program.
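The practical difference between NEAR and FAR shows up on the stack. A near CALL pushes only the return IP (2 bytes), while a far CALL pushes CS and then IP (4 bytes), and RET/RETF pop them back. The Python model below is a minimal sketch: the `CPU` class, the register values, and the instruction lengths used for the return addresses (3 bytes for a near CALL, 5 for a far CALL) are assumptions.

```python
class CPU:
    """Just enough of an 8086 to show CALL/RET stack traffic (illustrative model)."""

    def __init__(self, cs, ip, sp):
        self.cs, self.ip, self.sp = cs, ip, sp
        self.stack = {}                       # offset within SS -> 16-bit word

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF      # PUSH: SP -= 2, then store the word
        self.stack[self.sp] = word & 0xFFFF

    def pop(self):
        word = self.stack[self.sp]            # POP: read the word, then SP += 2
        self.sp = (self.sp + 2) & 0xFFFF
        return word

    def call_near(self, target_ip, return_ip):
        self.push(return_ip)                  # only IP is saved
        self.ip = target_ip

    def call_far(self, target_cs, target_ip, return_ip):
        self.push(self.cs)                    # CS first,
        self.push(return_ip)                  # then IP
        self.cs, self.ip = target_cs, target_ip

    def ret_near(self):
        self.ip = self.pop()

    def ret_far(self):
        self.ip = self.pop()
        self.cs = self.pop()

cpu = CPU(cs=0x1000, ip=0x0100, sp=0xFFFE)
cpu.call_near(0x0200, return_ip=0x0103)          # 3-byte near CALL at 0100h
print(hex(cpu.sp))                               # 0xfffc
cpu.ret_near()
cpu.call_far(0x2000, 0x0000, return_ip=0x0108)   # 5-byte far CALL at 0103h
print(hex(cpu.sp))                               # 0xfffa
cpu.ret_far()
print(hex(cpu.cs), hex(cpu.ip), hex(cpu.sp))     # 0x1000 0x108 0xfffe
```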
Model Types:
Tiny – All data and code fit in one segment. Tiny programs are built as .COM files, which means the program must be originated at offset 100h.
Small – Contains two segments – One DS (64KB) and one CS (64KB)
Medium – Contains one DS segment (64KB) and any number of CS for large programs.
Compact – Contains one CS (64KB) for the program and any number of DS contain the data
Large – Allows any number of CS and DS
Huge – same as Large, but the DS segments may contain more than 64KB each
In DOS, the first 256 bytes (offsets 0000h-00FFh) of a .COM program's segment hold the Program Segment Prefix (PSP): management data for the operating system and the command line (whose 128-byte area can be reused as a data transfer buffer). The program code starts right after it, at offset 100h.
So ORG 100h forces the toolchain to build a program that fits the memory layout of COM programs.
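The effect of ORG 100h can be shown with a tiny Python calculation: DOS places byte n of a .COM file at offset 100h + n, so labels are only correct if the assembler also counted from 100h. The helper names are hypothetical, and the sketch assumes a single ORG at the start of the file.

```python
PSP_SIZE = 0x100   # offsets 0000h-00FFh of a .COM program's segment hold the PSP

def runtime_offset(file_offset):
    """Where DOS actually places a byte of the .COM file within the segment."""
    return PSP_SIZE + file_offset

def assembled_offset(file_offset, org):
    """Offset the assembler assigns to that byte, given the ORG value in effect."""
    return org + file_offset

# A hypothetical label 5 bytes into the file:
print(hex(assembled_offset(5, 0x100)), hex(runtime_offset(5)))   # 0x105 0x105
print(hex(assembled_offset(5, 0x000)), hex(runtime_offset(5)))   # 0x5 0x105 (mismatch)
```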
MASM – structure of a main module with full segment directives
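data segment
    ; data definitions here
data ends
code segment
assume cs:code, ds:data
START:
    mov ax, data        ; ASSUME does not load DS, so load it here
    mov ds, ax
    nop                 ; code here
    mov ax, 4C00h       ; DOS function 4Ch: terminate with return code 0
    int 21h
code ends
end START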
Assembler Phases
Phase 1 - Analysis The Analysis Phase validates the syntax of the code, checks for errors, and creates a symbol table.
Phase 2 – Synthesis The Synthesis Phase converts the assembly language instructions into machine code, using the information from the Analysis Phase.
These two phases work together to produce the final machine code that the computer executes, which makes the assembler the essential tool for translating assembly language into machine code.
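A toy two-pass assembler in Python shows why two phases are needed. Pass 1 assigns addresses to labels, including forward references such as LATER below, and pass 2 uses the finished symbol table to fill in operands, reporting undefined symbols. The instruction set, the fixed instruction sizes, and the function names are all assumptions; variable-size instructions with forward references are one common source of the phasing errors described earlier.

```python
# Hypothetical toy instruction set: mnemonic -> size in bytes (assumed fixed sizes).
SIZES = {"NOP": 1, "JMP": 3}

def pass1(lines, origin=0):
    """Analysis: walk the source once, giving every label the current location."""
    symbols, location = {}, origin
    for line in lines:
        if not line.strip():
            continue
        if line.endswith(":"):                      # a label on its own line
            name = line[:-1].strip().upper()
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = location
            continue
        mnemonic = line.split()[0].upper()
        if mnemonic not in SIZES:
            raise ValueError(f"undefined opcode: {mnemonic}")
        location += SIZES[mnemonic]                 # advance the location counter
    return symbols

def pass2(lines, symbols, origin=0):
    """Synthesis: emit (address, mnemonic, target) using the finished symbol table."""
    output, location = [], origin
    for line in lines:
        if not line.strip() or line.endswith(":"):
            continue
        parts = line.split(None, 1)
        mnemonic = parts[0].upper()
        target = None
        if len(parts) > 1:
            label = parts[1].strip().upper()
            if label not in symbols:
                raise ValueError(f"undefined symbol: {label}")
            target = symbols[label]
        output.append((location, mnemonic, target))
        location += SIZES[mnemonic]
    return output

program = ["START:", "JMP LATER", "NOP", "LATER:", "JMP START"]
table = pass1(program)
print(table)                   # {'START': 0, 'LATER': 4}
print(pass2(program, table))   # [(0, 'JMP', 4), (3, 'NOP', None), (4, 'JMP', 0)]
```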