Assembly Language
An assembly language is a low-level programming language for microprocessors and other programmable devices. It is not just a single language, but rather a group of languages. An assembly language implements a symbolic representation of the machine code needed to program a given CPU architecture.
Assembly language is also known as assembly code. The term is often used synonymously with second-generation programming language (2GL).
Apart from raw machine code, an assembly language is the most basic programming language available for a processor. With assembly language, a programmer works only with operations that are implemented directly on the physical CPU.
Assembly languages generally lack high-level conveniences such as variables and functions, and they are not portable between various families of processors. They have the same structures and set of commands as machine language, but allow a programmer to use names instead of numbers. This language is still useful for programmers when speed is necessary or when they need to carry out an operation that is not possible in high-level languages.
Why is ASM useful?
Machine language is a series of numbers, which is not easy for humans to read. Using ASM, programmers can write human-readable programs that correspond almost exactly to machine language.
The disadvantage is that everything the computer does must be described explicitly, in precise detail. The advantage is that the programmer has maximum control over what the computer is doing.
Why is ASM a “low-level” language?
Assembly is called a low-level programming language because there’s (nearly) a one-to-one relationship between what it tells the computer to do, and what the computer does. In general, one line of an assembly program contains a maximum of one instruction for the computer.
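As a rough illustration of that one-to-one relationship, the sketch below lists a few 32-bit x86 instructions in NASM syntax, each followed in its comment by the machine-code bytes it is encoded as. The choice of NASM and of 32-bit x86 is an assumption made for the example, and the file names are hypothetical.

```nasm
; Assemble with a listing to see the bytes next to each line:
;   nasm -f bin -l demo.lst demo.asm
bits 32                 ; directive only: select 32-bit mode, emits no bytes

mov eax, 1              ; B8 01 00 00 00   load the constant 1 into EAX
add eax, ebx            ; 01 D8            EAX = EAX + EBX
inc eax                 ; 40               EAX = EAX + 1
ret                     ; C3               return to the caller
```

Each source line becomes exactly one machine instruction, while the `bits 32` line only instructs the assembler and produces no machine code.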
How is ASM different from a “high-level” language?
High-level languages provide abstractions of low-level operations which allow the programmer to focus more on describing what they want to do, and less on how it should be done. Programming this way is more convenient and makes programs easier to read at the sacrifice of low-level control.
Programs written in high-level languages rarely match the raw speed and efficiency of carefully hand-tuned assembly, although modern optimizing compilers narrow the gap considerably. Examples of high-level languages include Python, Java, JavaScript, Clojure, and Lisp.
What is a “mid-level” language?
Mid-level languages, or lower-level languages, provide some high-level abstractions to make the programmer’s life easier, while still providing access to low-level operations. They are often used to write operating systems, so they are sometimes called system programming languages.
Programs written in mid-level languages can perform as well, or nearly as well, as programs written in assembly language. Examples of mid-level programming languages include C, C++, Ada, Nim, and Rust.
Is ASM portable?
No. Because assembly languages are tied to one specific computer architecture, they are not portable. A program written in one assembly language would need to be completely rewritten for it to run on another type of machine.
Portability is one of the main advantages of higher-level languages. The C programming language is often called “portable assembly” because C compilers exist for nearly every modern system architecture. A program written in C may require some changes before it will compile on another computer, but the core language is portable.
Generally speaking, the higher-level a language is, the fewer changes need to be made for it to run on another architecture. The lowest-level languages — machine language and assembly language — are not portable.
Sections of an assembly program
An assembly program can be divided into three sections −
- The data section,
- The bss section, and
- The text section.
The data Section
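The data section is used for declaring initialized data or constants. The initial values are fixed when the program is assembled and are stored in the executable file; the section itself is normally writable, so true read-only constants are often placed in a separate read-only section instead. You can declare various constant values, file names, buffer sizes, etc., in this section.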
The syntax for declaring data section is −
section .data
The bss Section
The bss section is used for declaring uninitialized variables. It only reserves space, so it adds nothing to the size of the executable file. The syntax for declaring bss section is −
section .bss
The text section
The text section holds the actual code. It conventionally begins with the declaration global _start, which exports the _start label so that the linker can mark it as the point where program execution begins.
The syntax for declaring text section is −
section .text
   global _start
_start:
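Putting the three sections together, a minimal sketch of a complete program might look like the following. It assumes NASM syntax on 32-bit Linux, where system calls are made with int 0x80; the names msg, msg_len, and buffer and the file names are hypothetical and chosen only for illustration.

```nasm
; Build and run (assumed toolchain: NASM and GNU ld on Linux):
;   nasm -f elf32 sections.asm -o sections.o
;   ld -m elf_i386 sections.o -o sections
;   ./sections

section .data                       ; initialized data
    msg      db  'Sections demo', 10    ; the text plus a newline
    msg_len  equ $ - msg                ; length, computed by the assembler

section .bss                        ; uninitialized data
    buffer   resb 16                ; reserve 16 bytes, no initial value

section .text                       ; code
    global _start
_start:
    mov  al, [msg]                  ; read the first byte of msg ...
    mov  [buffer], al               ; ... and store it in the bss buffer

    mov  eax, 4                     ; system call number 4: write
    mov  ebx, 1                     ; file descriptor 1: standard output
    mov  ecx, msg                   ; address of the bytes to write
    mov  edx, msg_len               ; number of bytes
    int  0x80                       ; call the kernel

    mov  eax, 1                     ; system call number 1: exit
    xor  ebx, ebx                   ; exit status 0
    int  0x80
```

On a 64-bit Linux system this should still run as long as the kernel has 32-bit emulation enabled, which is the usual default.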
Comments
An assembly language comment begins with a semicolon (;). It may contain any printable character, including blanks. It can appear on a line by itself, like −
; This program displays a message on screen
or, on the same line along with an instruction, like −
add eax, ebx ; adds ebx to eax
Assembly Language Statements
Assembly language programs consist of three types of statements −
- Executable instructions or instructions,
- Assembler directives or pseudo-ops, and
- Macros.
The executable instructions or simply instructions tell the processor what to do. Each instruction consists of an operation code (opcode). Each executable instruction generates one machine language instruction.
The assembler directives or pseudo-ops tell the assembler about the various aspects of the assembly process. These are non-executable and do not generate machine language instructions.
Macros are basically a text substitution mechanism.
Syntax of Assembly Language Statements
Assembly language statements are entered one statement per line. Each statement has the following format −
[label] mnemonic [operands] [;comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is its operands, the parameters of the command.
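To show the fields of the statement format in context, here is a small sketch that adds the numbers 5 down to 1 in a loop. It assumes NASM syntax on 32-bit Linux; the label sum_loop and the file names are hypothetical.

```nasm
; Build and run (assumed toolchain: NASM and GNU ld on Linux):
;   nasm -f elf32 fields.asm -o fields.o
;   ld -m elf_i386 fields.o -o fields
;   ./fields ; echo $?

section .text
            global _start

_start:     xor eax, eax        ; label + mnemonic + operands + comment
            mov ecx, 5          ; no label: mnemonic + operands + comment
sum_loop:   add eax, ecx        ; add the counter to the running total
            dec ecx             ; a mnemonic with a single operand
            jnz sum_loop        ; the operand is a label: repeat until ECX is 0
            mov ebx, eax        ; exit status = 5 + 4 + 3 + 2 + 1 = 15
            mov eax, 1          ; system call number 1: exit
            int 0x80            ; call the kernel
```

The program prints nothing itself; the sum is returned as the exit status, so the echo $? in the build comment should show 15.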
Following are some examples of typical assembly language statements −
INC COUNT        ; Increment the memory variable COUNT
MOV TOTAL, 48    ; Transfer the value 48 into the
                 ; memory variable TOTAL
ADD AH, BH       ; Add the content of the
                 ; BH register to the AH register
AND MASK1, 128   ; Perform an AND operation on the
                 ; variable MASK1 and 128
ADD MARKS, 10    ; Add 10 to the variable MARKS
MOV AL, 10       ; Transfer the value 10 to the AL register
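The statements above are written in a style where a bare variable name stands for the contents of memory. NASM, the assembler used for the Hello World example in this article, instead expects memory operands in square brackets with an explicit size. The sketch below shows one way the same statements could be written for NASM on 32-bit Linux; the byte-sized variables and their initial values are assumptions made for the example.

```nasm
; Build and run (assumed toolchain: NASM and GNU ld on Linux):
;   nasm -f elf32 statements.asm -o statements.o
;   ld -m elf_i386 statements.o -o statements
;   ./statements ; echo $?

section .data                   ; writable, initialized variables (one byte each)
    count  db 0
    total  db 0
    mask1  db 0xFF
    marks  db 50

section .text
    global _start
_start:
    inc  byte [count]           ; count = 1
    mov  byte [total], 48       ; total = 48
    add  ah, bh                 ; AH = AH + BH (registers need no brackets)
    and  byte [mask1], 128      ; mask1 = 0xFF AND 0x80 = 0x80
    add  byte [marks], 10       ; marks = 50 + 10 = 60
    mov  al, 10                 ; AL = 10

    movzx ebx, byte [marks]     ; exit status = marks
    mov  eax, 1                 ; system call number 1: exit
    int  0x80
```

The size keyword is required because an instruction such as inc [count] gives the assembler no register from which to infer how many bytes to operate on.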
The Hello World Program in Assembly
Example: Hello, World! in 32-bit assembly, for Windows
Here is “Hello, World” written for a 32-bit Intel processor. It will also run on a 64-bit processor. We will assemble, link, and run it on Windows 10.
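        global  _main               ; export _main so the C runtime can call it
        extern  _printf             ; printf is provided by the C library

        section .text
_main:
        push    message             ; pass the address of the string as the argument
        call    _printf             ; print it
        add     esp, 4              ; remove the 4-byte argument from the stack
        ret                         ; return to the C runtime
message:
        db      'Hello, World!', 10, 0   ; the text, a newline, and a terminating zero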
To begin, open Notepad. Copy and paste the code above into a new text file, and save the file as hello.asm.
To assemble the code, we use NASM, the Netwide Assembler. It can be downloaded at the NASM site.
nasm -f win32 hello.asm
When you run this command, NASM creates an object file. An object file contains machine code, but is not quite an executable file. Our object file is called hello.obj.
To create the executable, we use the 32-bit version of MinGW (Minimalist GNU for Windows), which provides the gcc compiler. It can be downloaded at the MinGW site.
gcc -o hello.exe hello.obj
Running the resulting program prints the message −
hello
Hello, World!
Use of macros
A macro definition is a block of code enclosed between a pair of directives that mark its start and end: MACRO and MEND in the ARM assembler, or %macro and %endmacro in NASM. It defines a name that you can use as a convenient alternative to repeating the block of code. The main uses for a macro are:
- To make it easier to follow the logic of the source code by replacing a block of code with a single meaningful name.
- To avoid repeating a block of code several times.
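As a sketch of both uses, the NASM example below wraps the five instructions needed to print a string in a macro and then invokes it twice. It assumes NASM syntax on 32-bit Linux; the macro name write_string, the data labels, and the file names are hypothetical.

```nasm
; Build and run (assumed toolchain: NASM and GNU ld on Linux):
;   nasm -f elf32 macros.asm -o macros.o
;   ld -m elf_i386 macros.o -o macros
;   ./macros

%macro write_string 2           ; macro name, then the number of parameters
        mov  eax, 4             ; system call number 4: write
        mov  ebx, 1             ; file descriptor 1: standard output
        mov  ecx, %1            ; first parameter: address of the text
        mov  edx, %2            ; second parameter: length in bytes
        int  0x80               ; call the kernel
%endmacro

section .data
    msg1  db  'first line', 10
    len1  equ $ - msg1
    msg2  db  'second line', 10
    len2  equ $ - msg2

section .text
    global _start
_start:
        write_string msg1, len1     ; expands to the five instructions above
        write_string msg2, len2     ; expands again with different arguments

        mov  eax, 1             ; system call number 1: exit
        xor  ebx, ebx           ; exit status 0
        int  0x80
```

Because a macro is text substitution, each invocation is expanded in place at assembly time; unlike a subroutine call, nothing is saved in code size, only in source lines.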