Assembly Language Basics
In assembly language, syntax, data types, and addressing modes are essential for writing efficient and correct programs.
Assembly language syntax is mnemonic and symbolic, making it readable and understandable.
Instructions are represented by mnemonics like MOV, ADD, SUB, JMP, etc.
Instructions can have operands that specify the data to be operated on.
Operands can be registers, memory locations, or immediate values.
Labels are used for defining addresses in the program and creating jump targets.
Comments start with a semicolon ( ; ) and are used for documentation.
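The fields described above (label, mnemonic, operands, comment) can be separated mechanically. The following Python sketch is a toy model rather than how any real assembler works: `split_line` is a hypothetical helper, and it assumes that a label is a first word ending in a colon and that no semicolon appears inside a string literal.

```python
def split_line(line):
    """Split one source line into (label, mnemonic, operands, comment).

    Toy model: a label is a first word ending in ':', and ';' is assumed
    never to appear inside a string literal.
    """
    # Everything from ';' to the end of the line is a comment.
    code, _, comment = line.partition(";")
    label = ""
    parts = code.split(None, 1)
    if parts and parts[0].endswith(":"):
        label = parts[0][:-1]
        code = parts[1] if len(parts) > 1 else ""
    # The first remaining word is the mnemonic; the rest are the operands.
    parts = code.split(None, 1)
    mnemonic = parts[0].upper() if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return label, mnemonic, operands, comment.strip()


print(split_line("start: mov ax, 5   ; load 5 into AX"))
```

The mnemonic is upper-cased only to make comparisons easy, which is harmless because the assembler does not distinguish case.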
In 8086 Assembly, there is no strict type system as in high-level languages, but data is classified based on size and purpose.
Common data types include:
Byte (8 bits): Represented as DB (Define Byte).
Word (16 bits): Represented as DW (Define Word).
Doubleword (32 bits): Represented as DD (Define Doubleword).
Quadword (64 bits): Represented as DQ (Define Quadword).
Data types are used for defining variables, constants, and memory allocation.
Addressing modes determine how an operand's address is calculated.
Common addressing modes include:
Immediate: Operand is a constant value ( MOV AX, 5 ).
Register: Operand is a value stored in a register ( MOV AX, BX ).
Direct: Operand is in memory at an address given in the instruction ( MOV AL, [1000h] ).
Indirect: Operand is in memory at the address held in a register ( MOV AL, [BX] ).
Indexed: Operand is in memory at an address formed by adding a base register and an index register ( MOV AL, [BX+SI] ).
Base-Indexed: Operand is in memory at an address formed from a base register, an index register, and a displacement ( MOV AL, [BX+SI+10h] ).
Scaled Indexed: Operand is in memory at an address whose index register is multiplied by a scaling factor. This mode is not available on the 8086; it arrived with the 32-bit addressing of the 80386 ( MOV AL, [EBX+ESI*2] ).
The code consists of:
Mnemonics
Mnemonics are symbolic names for machine instructions
Mnemonics are abbreviations
Example: MOV, ADD, SUB, IMUL
Operands
Operands are Data or Addresses
Types of Operands: Registers, Memory, Constants
Example: MOV AX, [BX]
Comments
Enhance Code Readability
Used for documenting important parts of the code
Ignored by the assembler: everything from the character ' ; ' to the end of the line is ignored
Example: ; This is a Comment
Assembly is case insensitive (upper-case and lower-case letters are treated the same).
Rule: If an identifier is present in the label field, always start that identifier in column one of the source line.
Rule: All mnemonics should start in the same column. Generally, this should be column 17 (the second tab stop) or some other convenient position.
Rule: All operands should start in the same column. Generally, this should be column 25 (the third tab stop) or some other convenient position.
Exception: If a mnemonic (typically a macro) is longer than seven characters and requires an operand, you have no choice but to start the operand field beyond column 25 (this is an exception assuming you've chosen columns 17 and 25 for your mnemonic and operand fields, respectively).
Guideline: Try to always start the comment fields on adjacent source lines in the same column.
Guideline: Use blank lines to separate special blocks of code from the surrounding code. Use an aesthetic-looking row of asterisks or dashes if you need a stronger separation between two blocks of code (do not overdo this, however).
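The column rules can be illustrated with a short Python sketch. It assumes columns 17 and 25 for the mnemonic and operand fields; the comment column (41) and the helper name `format_line` are assumptions made for this example, since the rules only ask that comments line up.

```python
def format_line(label="", mnemonic="", operands="", comment="",
                mnemonic_col=17, operand_col=25, comment_col=41):
    """Lay out one source line with fields starting at 1-based columns."""
    line = label                      # labels always start in column one
    fields = ((mnemonic, mnemonic_col), (operands, operand_col))
    for text, col in fields:
        if text:
            line = line.ljust(col - 1)
            # If the previous field ran past this column (the long-mnemonic
            # exception), still keep one separating space.
            if line and not line.endswith(" "):
                line += " "
            line += text
    if comment:
        line = line.ljust(comment_col - 1)
        if not line.endswith(" "):
            line += " "
        line += "; " + comment
    return line


print(format_line("start:", "MOV", "AX, 5", "load 5 into AX"))
print(format_line("", "ADD", "AX, BX", "add BX to AX"))
print(format_line("", "LONGMACRO", "AX"))   # operand pushed past column 25
```

The last call shows the exception: a nine-character mnemonic starting in column 17 already occupies column 25, so the operand starts one space after it.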
Assembly Errors:
Undefined symbol: Program refers to a label that does not exist
How to fix: check spelling of both the definition and access
Undefined opcode or pseudo-op
How to fix: check the spelling/availability of the instruction
Addressing mode not available
How to fix: look up the addressing modes available for the instruction
Expression error
How to fix: check parentheses, start with a simpler expression
Phasing error: occurs when the value of a symbol changes from pass 1 to pass 2
How to fix: first remove any undefined symbols, then remove forward references
Address error
How to fix: use ORG pseudo-ops to match available memory.
Byte Data Type: represents 8 bits (1 byte). Example: DB 65 ; (Decimal Byte Value)
Word Data Type: represents 16 bits (2 bytes). Example: DW 1234 ; (Decimal Word Value)
Doubleword Data Type: represents 32 bits (4 bytes). Example: DD 12345678h ; (Hex Doubleword Value)
Quadword Data Type: represents 64 bits (8 bytes). The 8086 has no instructions that operate on a quadword directly, but the DQ directive still reserves the storage. Example: DQ 123456789ABCDEFh ; (Hex Quadword Value)
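To make the sizes concrete, the Python sketch below packs the four example values the way an x86 assembler would store them. The text above does not mention byte order; the little-endian layout (least significant byte first) is a property of x86 processors. The helper name `emit` is hypothetical.

```python
import struct

# struct format codes standing in for the directive sizes (little-endian).
DIRECTIVES = {"DB": "<B", "DW": "<H", "DD": "<I", "DQ": "<Q"}


def emit(directive, value):
    """Return the bytes stored for one unsigned value of the given size."""
    return struct.pack(DIRECTIVES[directive], value)


examples = [("DB", 65), ("DW", 1234), ("DD", 0x12345678), ("DQ", 0x123456789ABCDEF)]
for directive, value in examples:
    data = emit(directive, value)
    print(directive, len(data), "byte(s):", " ".join(f"{b:02X}" for b in data))
```

For instance, DW 1234 is 04D2h, which is stored as the bytes D2 04.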
Maintaining Data Integrity
CBW (convert byte to word) extends the sign bit of AL into the AH register. This preserves the number's sign.
CWD (convert word to doubleword) extends the sign bit of AX into the DX register.
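The effect of both instructions can be modelled in a few lines of Python. This is only a sketch of the arithmetic: register values are passed in and returned as plain integers, and the function names simply mirror the mnemonics.

```python
def cbw(al):
    """Model CBW: sign-extend the 8-bit AL into the 16-bit AX (AH:AL)."""
    ah = 0xFF if al & 0x80 else 0x00   # copy AL's sign bit into all of AH
    return (ah << 8) | (al & 0xFF)


def cwd(ax):
    """Model CWD: sign-extend the 16-bit AX into DX:AX, returned as (dx, ax)."""
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax & 0xFFFF


print(hex(cbw(0xFB)))   # AL holds -5 as a signed byte
print(hex(cbw(0x05)))   # AL holds +5
print([hex(r) for r in cwd(0xFFFB)])
```

With AL = FBh (-5), CBW yields AX = FFFBh, which is still -5 as a 16-bit signed value; a positive value simply gets a zero upper half.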
Different Ways to Specify Operand Locations -> Immediate, Register, Direct, Indirect, Indexed, Based Indexed
Immediate Addressing Mode - Operand is a constant: MOV AX, 42
Register Addressing Mode - Operand is a register: MOV BX, AX
Direct Addressing Mode - Operand is a memory location whose address is given in the instruction: MOV AL, [1000h]
Indirect Addressing Mode - Register holds the memory address: MOV DL, [BX]
Indexed Addressing Mode - Operand is an indexed memory location: MOV AL, [BX+SI]
Based Indexed Addressing Mode - Combination of base and index registers: MOV AH, [BX+SI+10h]
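For the memory modes, the offset the processor uses is the sum of whichever parts are present: a base register, an index register, and a displacement. The Python sketch below models that sum; the register values are arbitrary example numbers and `effective_address` is a hypothetical helper.

```python
def effective_address(regs, base=None, index=None, disp=0):
    """Model the 16-bit offset formed for a memory operand.

    regs maps register names to values; base and index name registers
    (or are None when that part is absent).
    """
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index]
    return ea & 0xFFFF   # offsets wrap around at 64 KB


regs = {"BX": 0x2000, "SI": 0x0010}   # example values only

print(hex(effective_address(regs, disp=0x1000)))                        # [1000h]
print(hex(effective_address(regs, base="BX")))                          # [BX]
print(hex(effective_address(regs, base="BX", index="SI")))              # [BX+SI]
print(hex(effective_address(regs, base="BX", index="SI", disp=0x10)))   # [BX+SI+10h]
```

Immediate and register operands involve no address calculation, so they have no counterpart here.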
A few registers: additional info
Register SP - stack pointer (16 bits). It points to the topmost item of the stack. If the stack is empty, the stack pointer will be FFFEh. Its offset address is relative to the stack segment.
Register BP - base pointer (16 bits). It is primarily used in accessing parameters passed on the stack. Its offset address is relative to the stack segment.
Register SI - source index (16 bits). It is used in the pointer addressing of data and as a source in some string-related operations. Its offset is relative to the data segment.
Register DI - destination index (16 bits). It is used in the pointer addressing of data and as a destination in some string-related operations. Its offset is relative to the extra segment.
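"Relative to a segment" means that the 16-bit offset in these registers is combined with a segment register to form the 20-bit address placed on the bus. On the 8086 the rule is segment * 16 + offset. The Python sketch below shows the arithmetic; the segment values are arbitrary examples, not values taken from the text.

```python
def physical_address(segment, offset):
    """8086 address formation: segment * 16 + offset, limited to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Example values only: SS = 2000h with SP = FFFEh, DS = 1000h with SI = 0010h.
print(hex(physical_address(0x2000, 0xFFFE)))   # stack top via SS:SP
print(hex(physical_address(0x1000, 0x0010)))   # data via DS:SI
```

The first call gives 2FFFEh and the second 10010h. The mask models the 8086's 20 address lines, so an address past FFFFFh wraps to the bottom of memory.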
The assembler contains as its front-end a macro processor.
The macro processor scans the source file for macro definitions and macro calls written in Macro Processor Language (MPL).
Macro calls are expanded according to macro definitions, and the resulting source assembly language is assembled by the assembler.
By using MPL, you can create macros specific to your application that can generate sequences of assembly language instructions or directives.
The macro processor is a very powerful string replacement facility that can help to simplify a programming task. Repeatedly used code sequences can be replaced by a simple macro call. Also, frequently used assembler directive statements can be replaced by macro calls.
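The string-replacement idea can be sketched in Python. This is a toy model, not MPL: the `%A`/`%B` placeholder convention, the `SWAP` macro, and the `expand` helper are all invented for this illustration, and nesting or conditional expansion is not handled.

```python
def expand(source_lines, macros):
    """Replace each macro call with its body, substituting arguments textually.

    macros maps a macro name to (parameter_names, body_lines).
    """
    out = []
    for line in source_lines:
        parts = line.split(None, 1)
        name = parts[0] if parts else ""
        if name in macros:
            params, body = macros[name]
            args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
            for body_line in body:
                for param, arg in zip(params, args):
                    body_line = body_line.replace(param, arg)
                out.append(body_line)
        else:
            out.append(line)   # ordinary lines pass through unchanged
    return out


# Hypothetical macro: swap two 16-bit registers through the stack.
macros = {"SWAP": (["%A", "%B"], ["PUSH %A", "PUSH %B", "POP %A", "POP %B"])}

print("\n".join(expand(["MOV AX, 1", "SWAP AX, BX", "RET"], macros)))
```

The single line SWAP AX, BX becomes four instructions before the assembler proper ever sees the source.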
MASM uses directives, which are instructions to the assembler. They indicate how an operand or section of a program is to be processed by the assembler.
DB (Define Byte), DW (Define Word), and DD (Define Double Word) are very often used to define and store memory data. With these directives, we give a symbolic name to a memory location and at the same time define its size.
ARRAY DW 100 DUP(0) reserves 100 words of storage in memory, gives it the name "ARRAY", and initializes those 100 words with 0000.
ARRAY DW 100 DUP(?) reserves 100 words of storage in memory, gives it the name "ARRAY", and the assembler does not initialize them with anything.
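As a rough model of what DUP reserves, the Python sketch below builds the bytes for a DW ... DUP(...) line. The helper `dw_dup` is hypothetical, and uninitialized storage, DUP(?), can only be approximated here with zero bytes.

```python
import struct


def dw_dup(count, value=None):
    """Model `DW count DUP(value)`: count 16-bit words, little-endian.

    value=None stands for DUP(?). Real uninitialized storage has no defined
    contents; this sketch fills it with zeros purely as a placeholder.
    """
    word = struct.pack("<H", 0 if value is None else value)
    return word * count


array = dw_dup(100, 0)
print(len(array))   # 100 words occupy 200 bytes
```

The point to notice is the size: 100 words take 200 bytes, so the next item declared after ARRAY starts 200 bytes further on.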
ORG: the ORG (Origin) directive indicates where to put the next piece of code or data, relative to the start of the current segment. It changes the starting offset address of the data in the data segment (or of the code in the code segment) to a desired location. The origin must be given as an absolute offset address.
ASSUME: tells the assembler the name of the logical segment it should use for a specified segment, that is, which names were chosen for the code, data, extra, and stack segments.
EQU: equates a numeric value, an ASCII value, or a label to another label. Each time the assembler finds the given name in the program, it replaces the name with the value or symbol equated with that name.
PROC and ENDP indicate the start and end of a procedure (subroutine).
The PROC directive, which indicates the start of a procedure, must also be followed by NEAR or FAR.
A NEAR procedure is one that resides in the same code segment as the program and is often considered local. A FAR procedure may reside at any location in the memory system and is considered global.
The term global denotes a procedure that can be used by any program. Local defines a procedure that is only used by the current program.
Tiny - All data and code fit in one segment. Tiny programs are built as .COM files, which means the program must originate at location 100h.
Small - Contains two segments: one DS (64KB) and one CS (64KB).
Medium - Contains one DS segment (64KB) and any number of CS segments for large programs.
Compact - Contains one CS (64KB) for the program and any number of DS segments to contain the data.
Large - Allows any number of CS and DS segments.
Huge - Same as Large, but the DS segments may contain more than 64KB each.
A .COM program in DOS is laid out as follows: the first 256 bytes (100h) of the segment contain management data for the operating system and the command line (which can be re-used as a 128-byte data transfer buffer), and the program code starts after them.
So ORG 100h forces the toolchain to build a program that fits the memory layout of COM programs.
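The arithmetic behind ORG 100h is small enough to show directly. In the Python sketch below the constant and function names are made up for this example. The command-line area starting at offset 80h is standard DOS layout for the 256-byte prefix (the Program Segment Prefix), which is why that area is 128 bytes long.

```python
PSP_SIZE = 0x100       # 256 bytes that DOS places ahead of the program
COMMAND_TAIL = 0x80    # offset of the command-line area inside those bytes


def runtime_offset(file_offset, origin=PSP_SIZE):
    """Offset inside the segment where a byte of the .COM file is loaded."""
    return origin + file_offset


print(hex(runtime_offset(0)))    # first byte of the file
print(hex(runtime_offset(5)))    # a label five bytes into the file
print(PSP_SIZE - COMMAND_TAIL)   # size of the command-line area in bytes
```

Without ORG 100h the assembler would number that label 0005h while the loaded program finds it at 0105h, so every jump or data reference to it would miss by 256 bytes.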
Phase 1 - Analysis: The analysis phase validates the syntax of the code, checks for errors, and creates a symbol table.
Phase 2 - Synthesis: The synthesis phase converts the assembly language instructions into machine code, using the information from the analysis phase.
These two phases work together to produce the final machine code that the computer executes. The assembler catches syntax errors and unresolved symbols along the way, but it does not check the program's logic.
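The division of work between the two phases can be sketched with a toy assembler in Python. Everything here is simplified for illustration: the three-instruction set and its sizes are made up (they are not real 8086 encodings), labels must stand alone on a line, and the "machine code" is just a list of tuples.

```python
# Instruction sizes in bytes for a made-up instruction set.
SIZES = {"NOP": 1, "JMP": 3, "HLT": 1}


def pass1(lines, origin=0):
    """Analysis: record the address of every label in a symbol table."""
    symbols, address = {}, origin
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += SIZES[line.split()[0]]
    return symbols


def pass2(lines, symbols, origin=0):
    """Synthesis: emit (address, mnemonic, resolved operand) per instruction."""
    output, address = [], origin
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        mnemonic = parts[0]
        operand = None
        if len(parts) > 1:
            if parts[1] not in symbols:
                raise NameError(f"undefined symbol: {parts[1]}")
            operand = symbols[parts[1]]
        output.append((address, mnemonic, operand))
        address += SIZES[mnemonic]
    return output


program = ["start:", "NOP", "JMP done", "NOP", "done:", "HLT"]
symbols = pass1(program)
print(symbols)
print(pass2(program, symbols))
```

JMP done refers to a label that appears later in the source. It resolves only because pass 1 has already recorded where done lives, which is the reason for making two passes.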