Lab #1 Writing an Assembler Due Date: 2 March 2018
How to Write a Two-Pass Assembler
All of you are familiar with writing programs in one or more assembly languages. But you may not have really considered how an assembler works when it translates an assembly language program into machine code. These are the functions that the assembler must perform when translating:
· Replace symbolic addresses with numeric addresses
· Replace symbolic operation codes with machine opcodes
· Reserve storage for instructions and data
· Translate constants into machine representation.
The assignment of numeric address can be performed without foreknowledge of what actual locations will be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the code or data segment. Thus, we assume that we have separate data and code segments and that our assembler normally generates address starting a 0.
To illustrate, consider the following assembly code which is equivalent to the following pseudocode:
IF flag THEN answer := alpha + 2 * gamma / (C3P0 – R2D2) ENDIF
PRINT answer
1 Section .data
2 flag: word
3 answer: word
4 alpha: word
5 gamma: word
6 C3PO: word
7 R2D2: word
8 Section .code
9 RVALUE flag
10 GOFALSE L0
11 LVALUE answer
12 RVALUE alpha
13 PUSH 2
14 RVALUE gamma
15 MPY
16 RVALUE C3P0
17 RVALUE R2D2
18 SUB
19 DIV
20 ADD
21 :=
22 LABEL L0
23 RVALUE answer
24 PRINT
25 HALT
In translating line 9 of our example program, the resulting machine instruction would be assigned to address 0 of our code segment and occupy 4 bytes, if all machine instructions are 4 bytes (32 bits) long. Hence the instruction corresponding to line 10 would be assigned address 4. Similarly, the variable with the label flag would be at address 0 of our data segment, and occupy 4 bytes, if words are 32 bit entities.
Implementation Issues:
The assembler uses a counter to keep track of machine-language addresses. Because these addresses will ultimately specify locations in main stores, the counter is called the location counter. (“Address counter” would be a more accurate term.) Before assembling each section, the location counter is initialized to zero. After each source lien has been examined on the first pass, the location counter is incremented by the length of the machine-language code that will ultimately be generated to correspond to that source line. When a label definition is found, the assembler places both the label and its address (as determined by the value of the location counter) in a symbol table. Creation of the symbol table requires one pass over the source text. During a second pass, the assembler uses the address collected in the symbol table to perform the translation. As each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code.
Two types of logical errors can occur due to improper use of symbols. If a symbol appears in the operand field of some instruction, but nowhere in a label field, it is undefined. If a symbol appears in the label fields of more than one instruction, it is multiply-defined. The former error will be detected on the second pass, whereas the latter will be found on the first pass.
For an assembler, the symbol table structure is usually rather simple. It simply contains the symbolic labels, the appropriate address, and some simply type information (data or code, # data bits, etc.)
Assignment:
Your task is write an assembler for a hypothetical stack machine operating on 32 bit integers with the following instructions:
PUSH v — push v (an integer constant) on the stack
RVALUE l — push the contents of variable l
LVALUE l — push the address of the variable l
POP — throw away the top value on the stack
STO — the rvalue on top of the stack is place in the lvalue below it and both are popped
COPY — push a copy of the top value on the stack
ADD — pop the top two values off the stack, add them, and push the result
SUB — pop the top two values off the stack, subtract them, and push the result
MPY — pop the top two values off the stack, multiply them, and push the result
DIV — pop the top two values off the stack, divide them, and push the result
MOD — pop the top two values off the stack, compute the modulus, and push the result
NEG — pop the top value off the stack, negate it, and push the result
NOT — pop the top value off the stack, invert all the bits, and push the result
OR — pop the top two values off the stack, compute the logical OR, and push the result
AND — pop the top two values off the stack, compute the logical AND, and push the result
EQ — pop the top two values off the stack, compare them, and push a 1 if they are equal, and a 0 if they are not
NE — pop the top two values off the stack, compare them, and push a 1 if they are not equal, and a 0 if they are equal
GT — pop the top two values off the stack, compare them, and push a 1 if the first operand is greater than the second, and a 0 if it is not
GE — pop the top two values off the stack, compare them, and push a 1 if the first operand is greater than or equal to the second, and a 0 if it is not
LT — pop the top two values off the stack, compare them, and push a 1 if the first operand is less than the second, and a 0 if it is not
LE — pop the top two values off the stack, compare them, and push a 1 if the first operand is less than or equal to the second, and a 0 if it is not
LABEL n — serves as the target of jumps to n; has no other effect
GOTO n — the next instruction is taken from statement with label n
GOFALSE n — pop the top value; jump if it is zero
GOTRUE n — pop the top value; jump if it is nonzero
PRINT — pop the top value off the stack and display it as a base 10 integer
READ — read a base 10 integer from the keyboard and push its value on the stack
GOSUB l — push the current value of the program counter on the call stack and transfer control to the statement with label l
RET — pop the top value off the call stack and store it in the program counter
HALT — stop execution
For two operand instructions, the operand on top of the stack is the second operand, and the one immediately below it is the first operand
The numeric opcodes are as follows:
HALT 0
PUSH 1
RVALUE 2
LVALUE 3
POP 4
STO 5
COPY 6
ADD 7
SUB 8
MPY 9
DIV 10
MOD 11
NEG 12
NOT 13
OR 14
AND 15
EQ 16
NE 17
GT 18
GE 19
LT 20
LE 21
LABEL 22
GOTO 23
GOFALSE 24
GOTRUE 25
PRINT 26
READ 27
GOSUB 28
RET 29
Write an assembler with the following specifications:
All instructions for this machine are 32 bits (4 bytes) long, with the following format: Bits 32-21 are ignored (in practice, filled with zeros,) bits 20-16 hold the opcode, and bits 15-0 hold the operand. (If there is no operand, those bits are filled with zeroes, but otherwise ignored.) Your assembler should produce a binary file which is a set of machine instructions in Big-Endian format. We will use the Harvard memory model, with two separate 256KB (65,536 32-bit words) for instructions and data, and you will have two location counters (one for code, and one for data) rather than one. This also means that, any label with a numeric value greater than 65535 would be an error, although a highly unlikely one to encounter in a debugged assembler. To keep things simple, we will assume that the memory is word-addressable rather than byte-addressable. Thus, the symbol table would conceptually look like:
LexemeTypeAddressflagInt0answerInt1alphaInt2gammaInt3C3P0Int4R2D2Int5L0Code13
|
Note that implication of having word addressability means that the location counter will be incremented by 1 rather than by 4 at each step.
Assume labels are sequences of alphanumeric characters only, starting with an alphabetic character. The maximum length of a label can be up to you, either fixed or unlimited, but your implementation must allow labels to be at least 12 characters in length. Since the only data type the assembler recognizes is a 32-bit integer, word is the only data declaration directive your assembler will need to be able to handle.
You instructor will provide you with a virtual machine for this abstract stack machine. (In other words, he will do the hard part of actually executing the binary code.)
You may use Java, C , or Python 2.7 to implement your assembler, but it must generate the appropriate binary code in Big Endian format. If you use Java, you may use any platform you wish. If you use any other language, you must either provide a makefile or complete compilation instructions for generating an executable under Linux. I will be using Java 8, gcc, and Python 2.7 either in a VM or using the Windows Subsystem for Linux.
Hand in your source code and compilation instructions via Blackboard.
Your first assignment is to use ad-hoc methods (partially described in the assignment document) to write a two-pass assembler for an abstract stack machine.
Attached also are four files:
· a source file, lab1.asm, and
· a binary file, lab1.bin, which is the output of the assembler when given the previous source.
· a screen grab of a hexdump of the output file
· a very much simplified version of the assembler written in Java. It also lacks the symbol table and symbol table entry class definitions.
If you were to print the opcode and operand portion of each instruction, one per line, it would look like:
2 0
24 13
3 1
2 2
1 2
2 3
9 0
2 4
2 5
8 0
10 0
7 0
5 0
22 0
2 1
26 0
0 0
题内code /simpleASM.java
import java.util.*; import java.io.*; public class SimpleAsm { public static final int HALT = 0; public static final int PUSH = 1; public static final int RVALUE = 2; public static final int LVALUE = 3; public static final int ASGN = 4; public static String [] opcodes = {"HALT","PUSH","RVALUE","LVALUE","STO"}; public static void main(String [] args)throws IOException { // get filename String filename; SymbolTable symTab = new SymbolTable(); if (args.length != 0) { filename = args[0]; } else { filename = "simple.asm"; } // Open file for input Scanner infile = new Scanner(new File(filename)); // pass 1 -- build symbol table pass1(infile, symTab); infile.close(); // pass 2 -- assemble // reopen source file infile = new Scanner(new File(filename)); // pass 2 -- output binary code pass2(infile,symTab); infile.close(); // print symbol table dumpSymbolTable(symTab); System.out.println("Done"); } public static int lookUpOpcode(String s) { for(int i = 0; i < opcodes.length; i++) { if (s.equalsIgnoreCase(opcodes[i])) { return i; } } System.err.println("\nInvalid opcode:" + s); return -1; } public static void pass1(Scanner infile, SymbolTable tab) { // initialize location counter, etc. int locationCounter = 0; String line; Scanner input; String lexeme; // find start of data section do { line = infile.nextLine(); System.out.println(line); input = new Scanner(line); } while (!input.next().equalsIgnoreCase("Section")); if (!input.next().equalsIgnoreCase(".data")) { System.err.println("Error: Missing 'Section .data' directive"); System.exit(1); } else { System.out.println("Parsing data section, pass 1"); } // build symbol table from variable declarations line = infile.nextLine(); input = new Scanner(line); // data section ends where code section begins while(!(lexeme = input.next()).equalsIgnoreCase("Section")) { // look for labels (they end with a colon) int pos = lexeme.indexOf(':'); if (pos > 0) { lexeme = lexeme.substring(0,pos); } else { System.err.println("error parsing " + line); } // insert the lexeme, the type, and its address into the symbol table tab.insert(lexeme,"Int",locationCounter); locationCounter++; line = infile.nextLine(); input = new Scanner(line); } // Now, parse the code section, looking for the label directive System.out.println("Parsing code section, pass 1"); locationCounter=0; while(infile.hasNext()) { line = infile.nextLine(); input = new Scanner(line); lexeme = input.next(); // when a label is found, place it and its code offset in the symbol table if (lexeme.equalsIgnoreCase("label")) { lexeme = input.next(); tab.insert(lexeme,"Code",locationCounter); } locationCounter++; } } // generate the code public static void pass2(Scanner infile, SymbolTable tab) { // initialize location counter, etc. int locationCounter = 0; String line; Scanner input; String lexeme; int symTabPtr; SymbolTableEntry entry; final int NULL = -1; // find start of next section do { line = infile.nextLine(); input = new Scanner(line); } while (!input.next().equalsIgnoreCase("Section")); if (!input.next().equalsIgnoreCase(".data")) { System.err.println("Error: Missing 'Section .data' directive"); System.exit(1); } else { System.out.println("Parsing data section, pass 2"); } line = infile.nextLine(); input = new Scanner(line); while(!(lexeme = input.next()).equalsIgnoreCase("Section")) { // data section has been processed in previous pass, so skip this line = infile.nextLine(); input = new Scanner(line); } // Now, let's generate some code System.out.println("Parsing code section, pass 2"); locationCounter=0; // while not end of file keep parsing while(infile.hasNext()) { line = infile.nextLine(); input = new Scanner(line); lexeme = input.next(); int ptr; // lookup opcode and generate appropriate instructions int opcode = lookUpOpcode(lexeme); switch(opcode) { case HALT: insertCode(locationCounter, HALT); break; case PUSH: lexeme = input.next(); insertCode(locationCounter, PUSH, Integer.parseInt(lexeme)); break; case RVALUE: lexeme = input.next(); ptr = tab.lookup(lexeme); insertCode(locationCounter, RVALUE, tab.get(ptr).getAddress()); break; case LVALUE: lexeme = input.next(); ptr = tab.lookup(lexeme); insertCode(locationCounter, LVALUE, tab.get(ptr).getAddress()); break; case ASGN: insertCode(locationCounter, ASGN); break; default: System.err.println("Unimplemented opcode: " + opcode); System.exit(opcode); } locationCounter++; } } public static void insertCode(int loc, int opcode, int operand) { System.out.println(loc + ":\t" + opcode + "\t" + operand);; } public static void insertCode(int loc, int opcode) { insertCode(loc,opcode,0); } public static void dumpSymbolTable(SymbolTable tab) { System.out.println("\nlexeme \ttype \taddress"); System.out.println("-----------------------------------------"); for(int i=0; i<tab.size(); i++) { System.out.println(tab.get(i)); } } }
代写CS&Finance|建模|代码|系统|报告|考试
编程类:C++,JAVA ,数据库,WEB,Linux,Nodejs,JSP,Html,Prolog,Python,Haskell,hadoop算法,系统 机器学习
金融类:统计,计量,风险投资,金融工程,R语言,Python语言,Matlab,建立模型,数据分析,数据处理
服务类:Lab/Assignment/Project/Course/Qzui/Midterm/Final/Exam/Test帮助代写代考辅导
E-mail:850190831@qq.com 微信:BadGeniuscs 工作时间:无休息工作日-早上8点到凌晨3点
如果您用的手机请先保存二维码到手机里面,识别图中二维码。如果用电脑,直接掏出手机果断扫描。