家庭作业课后two编译原理Fundamentals of Compiling CS留学生代写JAVA实现

Lab #1 Writing an Assembler Due Date: 2 March 2018

How to Write a Two-Pass Assembler

All of you are familiar with writing programs in one or more assembly languages. But you may not have really considered how an assembler works when it translates an assembly language program into machine code. These are the functions that the assembler must perform when translating:

· Replace symbolic addresses with numeric addresses

· Replace symbolic operation codes with machine opcodes

· Reserve storage for instructions and data

· Translate constants into machine representation.

The assignment of numeric address can be performed without foreknowledge of what actual locations will be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the code or data segment. Thus, we assume that we have separate data and code segments and that our assembler normally generates address starting a 0.

To illustrate, consider the following assembly code which is equivalent to the following pseudocode:
IF flag THEN answer := alpha + 2 * gamma / (C3P0 – R2D2) ENDIF

PRINT answer

1 Section .data

2 flag: word

3 answer: word

4 alpha: word

5 gamma: word

6 C3PO: word

7 R2D2: word

8 Section .code

9 RVALUE flag

10 GOFALSE L0

11 LVALUE answer

12 RVALUE alpha

13 PUSH 2

14 RVALUE gamma

15 MPY

16 RVALUE C3P0

17 RVALUE R2D2

18 SUB

19 DIV

20 ADD

21 :=

22 LABEL L0

23 RVALUE answer
24 PRINT

25 HALT

In translating line 9 of our example program, the resulting machine instruction would be assigned to address 0 of our code segment and occupy 4 bytes, if all machine instructions are 4 bytes (32 bits) long. Hence the instruction corresponding to line 10 would be assigned address 4. Similarly, the variable with the label flag would be at address 0 of our data segment, and occupy 4 bytes, if words are 32 bit entities.

Implementation Issues:

The assembler uses a counter to keep track of machine-language addresses. Because these addresses will ultimately specify locations in main stores, the counter is called the location counter. (“Address counter” would be a more accurate term.) Before assembling each section, the location counter is initialized to zero. After each source lien has been examined on the first pass, the location counter is incremented by the length of the machine-language code that will ultimately be generated to correspond to that source line. When a label definition is found, the assembler places both the label and its address (as determined by the value of the location counter) in a symbol table. Creation of the symbol table requires one pass over the source text. During a second pass, the assembler uses the address collected in the symbol table to perform the translation. As each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code.

Two types of logical errors can occur due to improper use of symbols. If a symbol appears in the operand field of some instruction, but nowhere in a label field, it is undefined. If a symbol appears in the label fields of more than one instruction, it is multiply-defined. The former error will be detected on the second pass, whereas the latter will be found on the first pass.

For an assembler, the symbol table structure is usually rather simple. It simply contains the symbolic labels, the appropriate address, and some simply type information (data or code, # data bits, etc.)

Assignment:

Your task is write an assembler for a hypothetical stack machine operating on 32 bit integers with the following instructions:

PUSH v — push v (an integer constant) on the stack

RVALUE l — push the contents of variable l

LVALUE l — push the address of the variable l

POP — throw away the top value on the stack

STO — the rvalue on top of the stack is place in the lvalue below it and both are popped

COPY — push a copy of the top value on the stack

ADD — pop the top two values off the stack, add them, and push the result

SUB — pop the top two values off the stack, subtract them, and push the result

MPY — pop the top two values off the stack, multiply them, and push the result

DIV — pop the top two values off the stack, divide them, and push the result

MOD — pop the top two values off the stack, compute the modulus, and push the result

NEG — pop the top value off the stack, negate it, and push the result

NOT — pop the top value off the stack, invert all the bits, and push the result

OR — pop the top two values off the stack, compute the logical OR, and push the result

AND — pop the top two values off the stack, compute the logical AND, and push the result

EQ — pop the top two values off the stack, compare them, and push a 1 if they are equal, and a 0 if they are not

NE — pop the top two values off the stack, compare them, and push a 1 if they are not equal, and a 0 if they are equal

GT — pop the top two values off the stack, compare them, and push a 1 if the first operand is greater than the second, and a 0 if it is not

GE — pop the top two values off the stack, compare them, and push a 1 if the first operand is greater than or equal to the second, and a 0 if it is not

LT — pop the top two values off the stack, compare them, and push a 1 if the first operand is less than the second, and a 0 if it is not

LE — pop the top two values off the stack, compare them, and push a 1 if the first operand is less than or equal to the second, and a 0 if it is not

LABEL n — serves as the target of jumps to n; has no other effect

GOTO n — the next instruction is taken from statement with label n

GOFALSE n — pop the top value; jump if it is zero

GOTRUE n — pop the top value; jump if it is nonzero

PRINT — pop the top value off the stack and display it as a base 10 integer

READ — read a base 10 integer from the keyboard and push its value on the stack

GOSUB l — push the current value of the program counter on the call stack and transfer control to the statement with label l

RET — pop the top value off the call stack and store it in the program counter

HALT — stop execution

For two operand instructions, the operand on top of the stack is the second operand, and the one immediately below it is the first operand

The numeric opcodes are as follows:

HALT 0

PUSH 1

RVALUE 2

LVALUE 3

POP 4

STO 5

COPY 6

ADD 7

SUB 8

MPY 9

DIV 10

MOD 11

NEG 12

NOT 13

OR 14

AND 15

EQ 16

NE 17

GT 18

GE 19

LT 20

LE 21

LABEL 22

GOTO 23

GOFALSE 24

GOTRUE 25

PRINT 26

READ 27

GOSUB 28

RET 29

Write an assembler with the following specifications:

All instructions for this machine are 32 bits (4 bytes) long, with the following format: Bits 32-21 are ignored (in practice, filled with zeros,) bits 20-16 hold the opcode, and bits 15-0 hold the operand. (If there is no operand, those bits are filled with zeroes, but otherwise ignored.)   Your assembler should produce a binary file which is a set of machine instructions in Big-Endian format. We will use the Harvard memory model, with two separate 256KB (65,536 32-bit words) for instructions and data, and you will have two location counters (one for code, and one for data) rather than one.   This also means that, any label with a numeric value greater than 65535 would be an error, although a highly unlikely one to encounter in a debugged assembler. To keep things simple, we will assume that the memory is word-addressable rather than byte-addressable. Thus, the symbol table would conceptually look like:

LexemeTypeAddressflagInt0answerInt1alphaInt2gammaInt3C3P0Int4R2D2Int5L0Code13

Note that implication of having word addressability means that the location counter will be incremented by 1 rather than by 4 at each step.

Assume labels are sequences of alphanumeric characters only, starting with an alphabetic character. The maximum length of a label can be up to you, either fixed or unlimited, but your implementation must allow labels to be at least 12 characters in length. Since the only data type the assembler recognizes is a 32-bit integer, word is the only data declaration directive your assembler will need to be able to handle.

You instructor will provide you with a virtual machine for this abstract stack machine. (In other words, he will do the hard part of actually executing the binary code.)

You may use Java, C , or Python 2.7 to implement your assembler, but it must generate the appropriate binary code in Big Endian format. If you use Java, you may use any platform you wish. If you use any other language, you must either provide a makefile or complete compilation instructions for generating an executable under Linux.   I will be using Java 8, gcc, and Python 2.7 either in a VM or using the Windows Subsystem for Linux.

Hand in your source code and compilation instructions via Blackboard.

Your first assignment is to use ad-hoc methods (partially described in the assignment document) to write a two-pass assembler for an abstract stack machine.

Attached also are four files:

· a source file, lab1.asm, and

· a binary file, lab1.bin, which is the output of the assembler when given the previous source.

· a screen grab of a hexdump of the output file

· a very much simplified version of the assembler written in Java. It also lacks the symbol table and symbol table entry class definitions.

If you were to print the opcode and operand portion of each instruction, one per line, it would look like:
2       0
24      13
3       1
2       2
1       2
2       3
9       0
2       4
2       5
8       0
10      0
7       0
5       0
22      0
2       1
26      0
0       0

题内code /simpleASM.java

import java.util.*;
import java.io.*;

public class SimpleAsm
{
	public static final int HALT 	= 0;
	public static final int PUSH 	= 1;
	public static final int RVALUE 	= 2;
	public static final int LVALUE 	= 3;
	public static final int ASGN 	= 4;


	public static String [] opcodes = {"HALT","PUSH","RVALUE","LVALUE","STO"};

	public static void main(String [] args)throws IOException
	{
		// get filename
		String filename;
		SymbolTable symTab = new SymbolTable();

		if (args.length != 0)
		{
			filename = args[0];
		}
		else
		{
			filename = "simple.asm";
		}

		// Open file for input
		Scanner infile = new Scanner(new File(filename));

		// pass 1 -- build symbol table
		pass1(infile, symTab);
		infile.close();

		// pass 2 -- assemble
		// reopen source file
		infile = new Scanner(new File(filename));

		// pass 2 -- output binary code
		pass2(infile,symTab);
		infile.close();

		// print symbol table
		dumpSymbolTable(symTab);
		System.out.println("Done");
	}

	public static int lookUpOpcode(String s)
	{
		for(int i = 0; i < opcodes.length; i++)
		{
			if (s.equalsIgnoreCase(opcodes[i]))
			{
				return i;
			}
		}
		System.err.println("\nInvalid opcode:" + s);
		return -1;
	}

	public static void pass1(Scanner infile, SymbolTable tab)
	{
		// initialize location counter, etc.
		int locationCounter = 0;
		String line;
		Scanner input;
		String lexeme;

		// find start of data section
		do
		{
			line = infile.nextLine();
			System.out.println(line);
			input = new Scanner(line);
		} while (!input.next().equalsIgnoreCase("Section"));
		if (!input.next().equalsIgnoreCase(".data"))
		{
			System.err.println("Error:  Missing 'Section .data' directive");
			System.exit(1);
		}
		else
		{
			System.out.println("Parsing data section, pass 1");
		}

		// build symbol table from variable declarations
		line = infile.nextLine();
		input = new Scanner(line);

		// data section ends where code section begins
		while(!(lexeme = input.next()).equalsIgnoreCase("Section"))
		{
			// look for labels (they end with a colon)
			int pos = lexeme.indexOf(':');
			if (pos > 0)
			{
				lexeme = lexeme.substring(0,pos);
			}
			else
			{
				System.err.println("error parsing " + line);
			}
			// insert the lexeme, the type, and its address into the symbol table
			tab.insert(lexeme,"Int",locationCounter);
			locationCounter++;
			line = infile.nextLine();
			input = new Scanner(line);
		}

		// Now, parse the code section, looking for the label directive
		System.out.println("Parsing code section, pass 1");
		locationCounter=0;
		while(infile.hasNext())
		{
			line = infile.nextLine();
			input = new Scanner(line);
			lexeme = input.next();
			// when a label is found, place it and its code offset in the symbol table
			if (lexeme.equalsIgnoreCase("label"))
			{
				lexeme = input.next();
				tab.insert(lexeme,"Code",locationCounter);
			}
			locationCounter++;
		}
	}

	// generate the code
	public static void pass2(Scanner infile, SymbolTable tab)
	{
		// initialize location counter, etc.
		int locationCounter = 0;
		String line;
		Scanner input;
		String lexeme;
		int symTabPtr;
		SymbolTableEntry entry;
		final int NULL = -1;
		// find start of next section
		do
		{
			line = infile.nextLine();
			input = new Scanner(line);

		} while (!input.next().equalsIgnoreCase("Section"));
		if (!input.next().equalsIgnoreCase(".data"))
		{
			System.err.println("Error:  Missing 'Section .data' directive");
			System.exit(1);
		}
		else
		{
			System.out.println("Parsing data section, pass 2");
		}
		line = infile.nextLine();
		input = new Scanner(line);

		while(!(lexeme = input.next()).equalsIgnoreCase("Section"))
		{
			// data section has been processed in previous pass, so skip this
			line = infile.nextLine();
			input = new Scanner(line);
		}

		// Now, let's generate some code
		System.out.println("Parsing code section, pass 2");
		locationCounter=0;
		// while not end of file keep parsing
		while(infile.hasNext())
		{
			line = infile.nextLine();
			input = new Scanner(line);
			lexeme = input.next();
			int ptr;
			//	lookup opcode and generate appropriate instructions
			int opcode = lookUpOpcode(lexeme);
			switch(opcode)
			{
				case HALT:
					insertCode(locationCounter, HALT);
					break;
				case PUSH:
					lexeme = input.next();
					insertCode(locationCounter, PUSH, Integer.parseInt(lexeme));
					break;
				case RVALUE:
					lexeme = input.next();
					ptr = tab.lookup(lexeme);
					insertCode(locationCounter, RVALUE, tab.get(ptr).getAddress());
					break;
				case LVALUE:
					lexeme = input.next();
					ptr = tab.lookup(lexeme);
					insertCode(locationCounter, LVALUE, tab.get(ptr).getAddress());
					break;
				case ASGN:
					insertCode(locationCounter, ASGN);
					break;
				default:
					System.err.println("Unimplemented opcode:  " + opcode);
					System.exit(opcode);
			}
			locationCounter++;
		}
	}

	public static void insertCode(int loc, int opcode, int operand)
	{
		System.out.println(loc + ":\t" + opcode + "\t" + operand);;
	}

	public static void insertCode(int loc, int opcode)
	{
		insertCode(loc,opcode,0);
	}

	public static void dumpSymbolTable(SymbolTable tab)
	{
		System.out.println("\nlexeme \ttype \taddress");
		System.out.println("-----------------------------------------");
		for(int i=0; i<tab.size(); i++)
		{
			System.out.println(tab.get(i));
		}
	}
}

代写CS&Finance|建模|代码|系统|报告|考试

编程类：C++,JAVA ,数据库,WEB,Linux,Nodejs,JSP,Html,Prolog,Python,Haskell,hadoop算法,系统机器学习

金融类：统计，计量，风险投资，金融工程，R语言，Python语言，Matlab，建立模型，数据分析，数据处理

服务类：Lab/Assignment/Project/Course/Qzui/Midterm/Final/Exam/Test帮助代写代考辅导

E-mail:850190831@qq.com 微信：BadGeniuscs 工作时间：无休息工作日-早上8点到凌晨3点

如果您用的手机请先保存二维码到手机里面，识别图中二维码。如果用电脑，直接掏出手机果断扫描。

关键字：