Computer Languages

Machine Language, Assembly Language, and High-level Languages

A Programming Problem

Consider a program whose user interface includes a system of drop-down and pop-up menus where the user makes choices by pressing a key or combination of keys, similar to the Alt-F-X combination in the DOS Help, Edit, and QBasic programs. Good design dictates that we don't force the user to use only upper case or only lower case characters; however, for efficiency of code and speed of operation, we don't want to make 2 tests every time a key is entered. The solution is to convert the user's key to upper case immediately; then, wherever the program needs to make a branching decision based on the key, only one test is needed.
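As a minimal sketch of the idea in C (assuming a menu with just two choices; the `action` names and the `menu_action` function are hypothetical and are not part of the GETKEY programs), the key is normalized once and each choice is then tested a single time:

```c
#include <ctype.h>

/* Hypothetical menu actions; the names are illustrative only. */
enum action { ACT_NONE, ACT_FILE, ACT_EXIT };

/* Normalize the key once, then test each choice a single time. */
enum action menu_action(int key)
{
    key = toupper((unsigned char)key);  /* 'f' and 'F' both become 'F' */

    switch (key) {
    case 'F': return ACT_FILE;          /* one test, not two */
    case 'X': return ACT_EXIT;
    default:  return ACT_NONE;          /* not a menu key */
    }
}
```

Without the conversion, every `case` would need a second label for the lower case form of the same letter.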

Here is about the shortest and simplest program I can think of that incorporates most of the principal elements of a program: executable statements, control statements (conditional and iterative), data manipulation statements, and input and output statements. It reads a character from the keyboard, converts it to upper case (if it was entered as lower case), and displays it, looping until the 'q' or 'Q' character is entered. The line that displays the character is a testing tool just to show that the input and logic work up to this point; it would be replaced in a "real" program with code that makes decisions based on the user's key.

Machine Language

The machine language version of the program is 30 bytes long; it consists of 15 2-byte pairs of opcode and operand. The Intel instruction set is not always so symmetrical: opcodes can be 2 bytes long, and operands are more typically 2 bytes but can be 4 or more bytes long. The machine instructions are shown below, grouped as opcode and operand, and are represented as hexadecimal numbers. Recall that 4 binary digits can be represented by 1 hexadecimal digit, so each 4-digit hex number represents 2 bytes or 16 bits. For example, the first opcode-operand pair is B400H. In binary it would be 1011 0100 0000 0000B.
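The hex-to-binary correspondence can be checked with a few lines of C. This is only an illustrative sketch; the function name `word_to_binary` is hypothetical:

```c
#include <stddef.h>

/* Write the 16 bits of `word` into `out` as four space-separated
 * groups of four binary digits, one group per hexadecimal digit.
 * `out` must hold at least 20 characters (19 plus the terminator). */
void word_to_binary(unsigned int word, char out[20])
{
    size_t pos = 0;
    int bit;

    for (bit = 15; bit >= 0; bit--) {
        out[pos++] = ((word >> bit) & 1u) ? '1' : '0';
        if (bit % 4 == 0 && bit != 0)
            out[pos++] = ' ';           /* gap between hex digits */
    }
    out[pos] = '\0';
}
```

Calling `word_to_binary(0xB400, buf)` fills `buf` with the string 1011 0100 0000 0000, matching the pair B400H above.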

    B400    CD16    3C61    7206    3C7A    7702    24DF    88C2
    B402    CD21    3C51    75E8    B000    B44C    CD21

To save the binary codes to disk as an executable program we need to enter them in a "hex editor." I will use the DOS utility DEBUG. While DEBUG is about as intuitive and user friendly as a chain saw, it is also available on every DOS and Windows operating system. The process is this:

  1. Enter the binary values as hexadecimal values using the E (Enter) command.
  2. Use the N (Name) command to give the program a name.
  3. Tell DEBUG how many bytes to write to disk via the CX register.
     Note: All numbers in DEBUG are in hexadecimal: 30D = 1EH.
  4. Issue the W (Write) command.
  5. And finally Q (Quit) DEBUG.

It is also possible to write a "script" of the DEBUG commands to a text file and redirect input to DEBUG from the script file. The DOS command to feed the script file GETKEY1.SCR shown below to DEBUG is:   DEBUG < GETKEY1.SCR

    E 100 B4 00 CD 16 3C 61 72 06
    E 108 3C 7A 77 02 24 DF 88 C2
    E 110 B4 02 CD 21 3C 51 75 E8
    E 118 B0 00 B4 4C CD 21
    N GETKEY1.COM
    R CX
    1E
    W
    Q

Binary HEX codes entered in DEBUG and saved to file GETKEY1.COM

Binary HEX codes redirected to DEBUG from script file GETKEY1.SCR
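DEBUG is not the only way to get the bytes onto disk. As a hedged alternative, the same 30 bytes can be written by a short C program. This is a sketch and not one of the original GETKEY files; the names `getkey_code` and `write_com_file` are hypothetical, and the output file name is left to the caller.

```c
#include <stdio.h>

/* The 30 machine-code bytes from the listing above. */
static const unsigned char getkey_code[] = {
    0xB4, 0x00, 0xCD, 0x16, 0x3C, 0x61, 0x72, 0x06,
    0x3C, 0x7A, 0x77, 0x02, 0x24, 0xDF, 0x88, 0xC2,
    0xB4, 0x02, 0xCD, 0x21, 0x3C, 0x51, 0x75, 0xE8,
    0xB0, 0x00, 0xB4, 0x4C, 0xCD, 0x21
};

/* Write the bytes to `path` in binary mode.
 * Returns the number of bytes written, or 0 if the file
 * could not be opened. */
size_t write_com_file(const char *path)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return 0;
    written = fwrite(getkey_code, 1, sizeof getkey_code, fp);
    fclose(fp);
    return written;
}
```

`sizeof getkey_code` is 30, the same count that the script loads into CX as 1E.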

Assembly Language

The next step in the evolution of computer languages after machine language was assembly language, where the machine opcodes are represented as somewhat more "English-like" words called mnemonics. A program called an assembler translates the mnemonic instructions into binary machine code. Shown below is the assembly language version of the above machine language program. While the instructions could be entered directly into DEBUG, shown here is the script file GETKEY2.SCR that will be redirected to DEBUG.

    A 100
    MOV     AH,00   ; BIOS service 00H Read Keyboard Character
    INT     16      ; call BIOS
    CMP     AL,61   ; is char less than 'a'?
    JB      010E    ; yes, display the char
    CMP     AL,7A   ; is char greater than 'z'?
    JA      010E    ; yes, display the char
    AND     AL,DF   ; char is a-z, clear bit 5 to convert to upper case
                    ; display the char; in a "real" program the char
                    ; would be used to, say, check for a menu choice
    MOV     DL,AL   ; DL = char to display
    MOV     AH,02   ; DOS service 02H Display output
    INT     21      ; call DOS
    CMP     AL,51   ; was char 'Q' (Quit)?
    JNE     0100    ; no, loop and get another character
    MOV     AL,00   ; yes, AL = return code
    MOV     AH,4C   ; AH = DOS service 4CH Terminate with return code
    INT     21      ; call DOS
                    ; leave blank line to end DEBUG Assembly mode

    N getkey2.com
    R CX
    1E
    W
    Q


Assembly language instructions redirected to DEBUG from script file GETKEY2.SCR
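The two comparisons and the AND instruction can be mirrored line for line in C. This is a minimal sketch that assumes ASCII, where 'a' through 'z' are 61H through 7AH and each lower case letter differs from its upper case form only in bit 5 (20H); the function name `to_upper_ascii` is hypothetical.

```c
/* Mirror of the assembly logic: range-check, then clear bit 5. */
int to_upper_ascii(int ch)
{
    if (ch < 0x61)          /* CMP AL,61 / JB: below 'a', leave alone */
        return ch;
    if (ch > 0x7A)          /* CMP AL,7A / JA: above 'z', leave alone */
        return ch;
    return ch & 0xDF;       /* AND AL,DF: clear bit 5 */
}
```

For example, 'a' is 61H (0110 0001B); clearing bit 5 gives 41H (0100 0001B), which is 'A'.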

It is instructive to "disassemble" the program in DEBUG because we can see the binary opcodes and operands side by side with the assembly instructions. Note that each assembly language mnemonic instruction represents one machine instruction. The far left column of numbers holds the memory address "offset" of each instruction in the program, the next column holds the machine language codes, and the columns on the far right hold the assembly language instructions.

    0100 B400          MOV     AH,00
    0102 CD16          INT     16
    0104 3C61          CMP     AL,61
    0106 7206          JB      010E
    0108 3C7A          CMP     AL,7A
    010A 7702          JA      010E
    010C 24DF          AND     AL,DF
    010E 88C2          MOV     DL,AL
    0110 B402          MOV     AH,02
    0112 CD21          INT     21
    0114 3C51          CMP     AL,51
    0116 75E8          JNZ     0100
    0118 B000          MOV     AL,00
    011A B44C          MOV     AH,4C
    011C CD21          INT     21

Disassembly of GETKEY2.COM using DEBUG's U (Unassemble) command
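The disassembly also shows how the jump operands work: the operand byte of a short jump is a signed displacement counted from the address of the next instruction, not an absolute address. A small sketch of that arithmetic (the function name `short_jump_target` is hypothetical):

```c
/* Compute the destination of an 8086 short jump.
 * `addr` is the offset of the 2-byte jump instruction and `rel8` is
 * its operand byte, a signed displacement from the next instruction. */
unsigned int short_jump_target(unsigned int addr, unsigned char rel8)
{
    /* sign-extend the 8-bit displacement */
    int disp = (rel8 < 0x80) ? (int)rel8 : (int)rel8 - 0x100;

    /* next instruction + displacement, wrapped to a 16-bit offset */
    return (addr + 2u + (unsigned int)disp) & 0xFFFFu;
}
```

With the values from the listing, `short_jump_target(0x0106, 0x06)` gives 010E, and `short_jump_target(0x0116, 0xE8)` gives 0100 because E8H is -24 as a signed byte.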

High Level Language

The next step after assembly language was the development of various high-level languages, where the instructions are more like English statements. A program called a compiler translates the high-level statements into binary machine code, typically with an intermediate step of first translating the statements into assembly language and then calling an assembler to make the final translation to machine code. An interpreter, by contrast, translates and executes the statements one at a time as the program runs.

Here is a C programming language program that is very nearly functionally identical to the above 2 programs in machine language and assembly language:

    /* GETKEY3.C    Read keyboard character without echo, convert
     *              to upper case and display to Standard Output
     */
    #include <stdio.h>          /* putchar()            */
    #include <conio.h>          /* getch()              */
    #include <ctype.h>          /* toupper(), islower() */

    /* all C/C++ programs have a main() function that returns an integer
     * value to the operating system; 0 is the usual "success" code
     */
    int main()
    {
        int ch;                             /* keyboard character         */

        do {                                /* loop at least once         */
            ch = getch();                   /* read char from keyboard    */

            if ( islower( ch ) )            /* if char is lower case      */
                ch = toupper( ch );         /* convert to upper case      */

            /* display the character; a "real" program would now use the
             * upper case character to make, say, menu choices
             */
            putchar( ch );
        } while ( ch != 'Q' );              /* loop until char = 'Q'      */

        return 0;                           /* return success code to DOS */
    }


Compiling GETKEY3.C

Notice how very little "direct manipulation" of the input character is done in the high-level version. We call library functions to handle all the details of determining if the character was lower case in the first place ( islower( ch ) ), converting it to upper case ( toupper( ch ) ), and finally displaying the converted character ( putchar( ch ) ). Compare this with loading registers, calling operating system interrupt routines, comparing characters with the lower and upper end of a range of lower case characters, clearing bits in a byte that represents an ASCII character, etc.

Notice also that in the C version we declare a data type and reserve a memory location to store the character ( int ch ). In the machine and assembly language versions we were able to store the character in the CPU's registers. In fact, the C version has a lot more memory access going on behind the scenes. This is often cited as one of the advantages of assembly language even though many excellent high-level languages exist: well-written assembly language code is typically smaller and faster than the equivalent compiled code.

Using GETKEY

GETKEY doesn't do much; it just displays the character you entered as upper case if you entered a lower case key. Any other keys, like numbers or punctuation, should get passed straight through without change. Here are some runs with GETKEY1 (the machine language version), GETKEY2 (the assembly language version), and GETKEY3 (the C language version) using the following characters as input: @ABYZ[`abyz{09q.

        GETKEY1 Input:  @ABYZ[`abyz{09q
        GETKEY1 Output: @ABYZ[`ABYZ{09Q

        GETKEY2 Input:  @ABYZ[`abyz{09q
        GETKEY2 Output: @ABYZ[`ABYZ{09Q

        GETKEY3 Input:  @ABYZ[`abyz{09q
        GETKEY3 Output: @ABYZ[`ABYZ{09Q
    
Testing machine, assembly, and C language versions of GETKEY
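The three versions agree on the sample input. As a rough sketch of how that agreement could be checked for every 7-bit ASCII code rather than a handful of characters, the two conversion methods can be compared in C. This assumes an ASCII system and the default "C" locale; all three function names are hypothetical.

```c
#include <ctype.h>

/* Bit-clearing conversion, as in the machine and assembly versions. */
static int upper_by_mask(int ch)
{
    return (ch >= 0x61 && ch <= 0x7A) ? (ch & 0xDF) : ch;
}

/* Library conversion, as in the C version. */
static int upper_by_library(int ch)
{
    return islower(ch) ? toupper(ch) : ch;
}

/* Count the 7-bit ASCII codes on which the two methods disagree. */
int count_mismatches(void)
{
    int ch, mismatches = 0;

    for (ch = 0; ch < 128; ch++)
        if (upper_by_mask(ch) != upper_by_library(ch))
            mismatches++;
    return mismatches;
}
```

Under those assumptions `count_mismatches()` returns 0; a different locale or character set could give a different result.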

Extra Credit: See if you can figure out why I chose certain characters to test. (Hint: look at an ASCII character chart and at the tests in the assembly language version.)

Source Code and Executables

GETKEY1.SCR
DEBUG script file for machine language version of GETKEY.
 
GETKEY2.SCR
DEBUG script file for assembly language version of GETKEY.
 
GETKEY3.C
C source code for C language version of GETKEY; compiles with Microsoft C versions 6.0 through 8.0.
 
GETKEY1.COM,   GETKEY2.COM
Identical 30-byte executables created from machine language and assembly language instructions entered in DEBUG from above script files.
 
GETKEY3.EXE
5 KB executable compiled with Microsoft Visual C++ 1.52 command line compiler from above C source code.
 

Revised: 09 JAN 2002 06:20