当前位置:天才代写 > JAVA代写,java代考-JAVA作业代写免费Moss检测 > 家庭作业课后two编译原理Fundamentals of Compiling CS留学生代写JAVA实现

家庭作业课后two编译原理Fundamentals of Compiling CS留学生代写JAVA实现

2018-03-29 08:00 星期四 所属: JAVA代写,java代考-JAVA作业代写免费Moss检测 浏览:823

Lab #1  Writing an Assembler   Due Date:  2 March 2018

 

How to Write a Two-Pass Assembler


All of you are familiar with writing programs in one or more assembly languages.  But you may not have really considered how an assembler works when it translates an assembly language program into machine code.  These are the functions that the assembler must perform when translating:

· Replace symbolic addresses with numeric addresses

· Replace symbolic operation codes with machine opcodes

· Reserve storage for instructions and data

· Translate constants into machine representation.

The assignment of numeric address can be performed without foreknowledge of what actual locations will be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the code or data segment.  Thus, we assume that we have separate data and code segments and that our assembler normally generates address starting a 0.  

To illustrate, consider the following assembly code which is equivalent to the following pseudocode:
                           IF flag THEN answer := alpha + 2 * gamma / (C3P0 – R2D2) ENDIF

      PRINT answer

 

 1 Section  .data

 2   flag: word

 3 answer: word

 4 alpha: word

 5 gamma: word

 6 C3PO: word

 7 R2D2: word

 8 Section .code

 9    RVALUE flag

10    GOFALSE L0

11      LVALUE answer  

12    RVALUE   alpha

13    PUSH   2

14   RVALUE gamma

15    MPY

16    RVALUE   C3P0

17   RVALUE   R2D2

18   SUB

19    DIV

20    ADD

21 :=

22 LABEL L0

23 RVALUE answer
24   PRINT

25 HALT

 

In translating line 9 of our example program, the resulting machine instruction would be assigned to address 0 of our code segment and occupy 4 bytes, if all machine instructions are 4 bytes (32 bits) long.  Hence the instruction corresponding to line 10 would be assigned address 4.  Similarly, the variable with the label flag would be at address 0 of our data segment, and occupy 4 bytes, if words are 32 bit entities.  

Implementation Issues:

The assembler uses a counter to keep track of machine-language addresses.  Because these addresses will ultimately specify locations in main stores, the counter is called the location counter.  (“Address counter” would be a more accurate term.)  Before assembling each section, the location counter is initialized to zero.  After each source lien has been examined on the first pass, the location counter is incremented by the length of the machine-language code that will ultimately be generated to correspond to that source line.  When a label definition is found, the assembler places both the label and its address (as determined by the value of the location counter) in a symbol table.  Creation of the symbol table requires one pass over the source text.  During a second pass, the assembler uses the address collected in the symbol table to perform the translation.  As each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code.  

Two types of logical errors can occur due to improper use of symbols.  If a symbol appears in the operand field of some instruction, but nowhere in a label field, it is undefined.  If a symbol appears in the label fields of more than one instruction, it is multiply-defined.  The former error will be detected on the second pass, whereas the latter will be found on the first pass.

For an assembler, the symbol table structure is usually rather simple.  It simply contains the symbolic labels, the appropriate address, and some simply type information (data or code, # data bits, etc.)   

Assignment:

Your task is write an assembler for a hypothetical stack machine operating on 32 bit integers with the following instructions:

  PUSH v — push v (an integer constant) on the stack

  RVALUE    l — push the contents of variable l

  LVALUE l — push the address of the variable l

  POP  — throw away the top value on the stack

  STO  — the rvalue on top of the stack is place in the lvalue below it and both are popped

  COPY  — push a copy of the top value on the stack

  ADD         — pop the top two values off the stack, add them, and push the result

  SUB       — pop the top two values off the stack, subtract them, and push the result

  MPY      — pop the top two values off the stack, multiply them, and push the result

  DIV         — pop the top two values off the stack, divide them, and push the result

  MOD       — pop the top two values off the stack, compute the modulus, and push the result

  NEG  — pop the top value off the stack, negate it, and push the result

  NOT  — pop the top value off the stack, invert all the bits, and push the result

  OR         — pop the top two values off the stack, compute the logical OR, and push the result

  AND        — pop the top two values off the stack, compute the logical AND, and push the result

  EQ         — pop the top two values off the stack, compare them, and push a 1 if they are equal, and a 0 if they are not

  NE        — pop the top two values off the stack, compare them, and push a 1 if they are not equal, and a 0 if they are equal

  GT         — pop the top two values off the stack, compare them, and push a 1 if the first operand is greater than the second, and a 0 if it is not

  GE         — pop the top two values off the stack, compare them, and push a 1 if the first operand is greater than or equal to the second, and a 0 if it is not

  LT         — pop the top two values off the stack, compare them, and push a 1 if the first operand is less than the second, and a 0 if it is not

  LE         — pop the top two values off the stack, compare them, and push a 1 if the first operand is less than or equal to the second, and a 0 if it is not

  LABEL n — serves as the target of jumps to n; has no other effect

  GOTO n — the next instruction is taken from statement with label n

  GOFALSE n — pop the top value; jump if it is zero

  GOTRUE n — pop the top value; jump if it is nonzero

  PRINT  — pop the top value off the stack and display it as a base 10 integer

  READ  — read a base 10 integer from the keyboard and push its value on the stack

  GOSUB l — push the current value of the program counter on the call stack and transfer control to the      statement with label l

  RET  — pop the top value off the call stack and store it in the program counter

  HALT  — stop execution

For two operand instructions, the operand on top of the stack is the second operand, and the one immediately below it is the first operand

The numeric opcodes are as follows:

 


  HALT 0

  PUSH 1

  RVALUE    2

  LVALUE 3

  POP 4

  STO 5

  COPY 6

  ADD        7

  SUB      8

  MPY     9

  DIV        10

  MOD      11

  NEG 12

  NOT 13

  OR        14

  AND       15

  EQ        16

  NE       17

  GT        18

  GE        19

  LT        20

  LE        21

  LABEL 22

  GOTO 23

  GOFALSE 24

  GOTRUE 25

  PRINT 26

  READ 27

  GOSUB 28

  RET 29



Write an assembler with the following specifications:


All instructions for this machine are 32 bits (4 bytes) long, with the following format:  Bits 32-21 are ignored (in practice, filled with zeros,) bits 20-16 hold the opcode, and bits 15-0 hold the operand.  (If there is no operand, those bits are filled with zeroes, but otherwise ignored.)   Your assembler should produce a binary file which is a set of machine instructions in Big-Endian format.  We will use the Harvard memory model, with two separate 256KB (65,536 32-bit words) for instructions and data, and you will have two location counters (one for code, and one for data) rather than one.   This also means that, any label with a numeric value greater than 65535 would be an error, although a highly unlikely one to encounter in a debugged assembler.  To keep things simple, we will assume that the memory is word-addressable rather than byte-addressable. Thus, the symbol table would conceptually look like:  

LexemeTypeAddressflagInt0answerInt1alphaInt2gammaInt3C3P0Int4R2D2Int5L0Code13

 

 

Note that implication of having word addressability means that the location counter will be incremented by 1 rather than by 4 at each step.

 

Assume labels are sequences of alphanumeric characters only, starting with an alphabetic character.  The maximum length of a label can be up to you, either fixed or unlimited, but your implementation must allow labels to be at least 12 characters in length.  Since the only data type the assembler recognizes is a 32-bit integer, word is the only data declaration directive your assembler will need to be able to handle.  

 

You instructor will provide you with a virtual machine for this abstract stack machine.  (In other words, he will do the hard part of actually executing the binary code.)  

 

You may use Java, C , or Python 2.7 to implement your assembler, but it must generate the appropriate binary code in Big Endian format.  If you use Java, you may use any platform you wish.  If you use any other language, you must either provide a makefile or complete compilation instructions for generating an executable under Linux.   I will be using Java 8,  gcc,  and Python 2.7 either in a VM or using the Windows Subsystem for Linux.

 

Hand in your source code and compilation instructions via Blackboard.  


Your first assignment is to use ad-hoc methods (partially described in the assignment document) to write a two-pass assembler for an abstract stack machine.

Attached also are four files:

· a source file, lab1.asm, and

· a binary file, lab1.bin, which is the output of the assembler when given the previous source. 

· a screen grab of a hexdump of the output file 

· a very much simplified version of the assembler written in Java.  It also lacks the symbol table and symbol table entry class definitions. 

If you were to print the opcode and operand portion of each instruction, one per line, it would look like:
2       0 
24      13 
3       1 
2       2 
1       2 
2       3 
9       0 
2       4 
2       5 
8       0 
10      0 
7       0 
5       0 
22      0 
2       1 
26      0 
0       0

题内code /simpleASM.java

import java.util.*;
import java.io.*;

public class SimpleAsm
{
	public static final int HALT 	= 0;
	public static final int PUSH 	= 1;
	public static final int RVALUE 	= 2;
	public static final int LVALUE 	= 3;
	public static final int ASGN 	= 4;


	public static String [] opcodes = {"HALT","PUSH","RVALUE","LVALUE","STO"};

	public static void main(String [] args)throws IOException
	{
		// get filename
		String filename;
		SymbolTable symTab = new SymbolTable();

		if (args.length != 0)
		{
			filename = args[0];
		}
		else
		{
			filename = "simple.asm";
		}

		// Open file for input
		Scanner infile = new Scanner(new File(filename));

		// pass 1 -- build symbol table
		pass1(infile, symTab);
		infile.close();

		// pass 2 -- assemble
		// reopen source file
		infile = new Scanner(new File(filename));

		// pass 2 -- output binary code
		pass2(infile,symTab);
		infile.close();

		// print symbol table
		dumpSymbolTable(symTab);
		System.out.println("Done");
	}

	public static int lookUpOpcode(String s)
	{
		for(int i = 0; i < opcodes.length; i++)
		{
			if (s.equalsIgnoreCase(opcodes[i]))
			{
				return i;
			}
		}
		System.err.println("\nInvalid opcode:" + s);
		return -1;
	}

	public static void pass1(Scanner infile, SymbolTable tab)
	{
		// initialize location counter, etc.
		int locationCounter = 0;
		String line;
		Scanner input;
		String lexeme;

		// find start of data section
		do
		{
			line = infile.nextLine();
			System.out.println(line);
			input = new Scanner(line);
		} while (!input.next().equalsIgnoreCase("Section"));
		if (!input.next().equalsIgnoreCase(".data"))
		{
			System.err.println("Error:  Missing 'Section .data' directive");
			System.exit(1);
		}
		else
		{
			System.out.println("Parsing data section, pass 1");
		}

		// build symbol table from variable declarations
		line = infile.nextLine();
		input = new Scanner(line);

		// data section ends where code section begins
		while(!(lexeme = input.next()).equalsIgnoreCase("Section"))
		{
			// look for labels (they end with a colon)
			int pos = lexeme.indexOf(':');
			if (pos > 0)
			{
				lexeme = lexeme.substring(0,pos);
			}
			else
			{
				System.err.println("error parsing " + line);
			}
			// insert the lexeme, the type, and its address into the symbol table
			tab.insert(lexeme,"Int",locationCounter);
			locationCounter++;
			line = infile.nextLine();
			input = new Scanner(line);
		}

		// Now, parse the code section, looking for the label directive
		System.out.println("Parsing code section, pass 1");
		locationCounter=0;
		while(infile.hasNext())
		{
			line = infile.nextLine();
			input = new Scanner(line);
			lexeme = input.next();
			// when a label is found, place it and its code offset in the symbol table
			if (lexeme.equalsIgnoreCase("label"))
			{
				lexeme = input.next();
				tab.insert(lexeme,"Code",locationCounter);
			}
			locationCounter++;
		}
	}

	// generate the code
	public static void pass2(Scanner infile, SymbolTable tab)
	{
		// initialize location counter, etc.
		int locationCounter = 0;
		String line;
		Scanner input;
		String lexeme;
		int symTabPtr;
		SymbolTableEntry entry;
		final int NULL = -1;
		// find start of next section
		do
		{
			line = infile.nextLine();
			input = new Scanner(line);

		} while (!input.next().equalsIgnoreCase("Section"));
		if (!input.next().equalsIgnoreCase(".data"))
		{
			System.err.println("Error:  Missing 'Section .data' directive");
			System.exit(1);
		}
		else
		{
			System.out.println("Parsing data section, pass 2");
		}
		line = infile.nextLine();
		input = new Scanner(line);

		while(!(lexeme = input.next()).equalsIgnoreCase("Section"))
		{
			// data section has been processed in previous pass, so skip this
			line = infile.nextLine();
			input = new Scanner(line);
		}

		// Now, let's generate some code
		System.out.println("Parsing code section, pass 2");
		locationCounter=0;
		// while not end of file keep parsing
		while(infile.hasNext())
		{
			line = infile.nextLine();
			input = new Scanner(line);
			lexeme = input.next();
			int ptr;
			//	lookup opcode and generate appropriate instructions
			int opcode = lookUpOpcode(lexeme);
			switch(opcode)
			{
				case HALT:
					insertCode(locationCounter, HALT);
					break;
				case PUSH:
					lexeme = input.next();
					insertCode(locationCounter, PUSH, Integer.parseInt(lexeme));
					break;
				case RVALUE:
					lexeme = input.next();
					ptr = tab.lookup(lexeme);
					insertCode(locationCounter, RVALUE, tab.get(ptr).getAddress());
					break;
				case LVALUE:
					lexeme = input.next();
					ptr = tab.lookup(lexeme);
					insertCode(locationCounter, LVALUE, tab.get(ptr).getAddress());
					break;
				case ASGN:
					insertCode(locationCounter, ASGN);
					break;
				default:
					System.err.println("Unimplemented opcode:  " + opcode);
					System.exit(opcode);
			}
			locationCounter++;
		}
	}

	public static void insertCode(int loc, int opcode, int operand)
	{
		System.out.println(loc + ":\t" + opcode + "\t" + operand);;
	}

	public static void insertCode(int loc, int opcode)
	{
		insertCode(loc,opcode,0);
	}

	public static void dumpSymbolTable(SymbolTable tab)
	{
		System.out.println("\nlexeme \ttype \taddress");
		System.out.println("-----------------------------------------");
		for(int i=0; i<tab.size(); i++)
		{
			System.out.println(tab.get(i));
		}
	}
}

代写CS&Finance|建模|代码|系统|报告|考试

编程类:C++,JAVA ,数据库,WEB,Linux,Nodejs,JSP,Html,Prolog,Python,Haskell,hadoop算法,系统 机器学习

金融类统计,计量,风险投资,金融工程,R语言,Python语言,Matlab,建立模型,数据分析,数据处理

服务类:Lab/Assignment/Project/Course/Qzui/Midterm/Final/Exam/Test帮助代写代考辅导

E-mail:850190831@qq.com   微信:BadGeniuscs  工作时间:无休息工作日-早上8点到凌晨3点


如果您用的手机请先保存二维码到手机里面,识别图中二维码。如果用电脑,直接掏出手机果断扫描。

qr.png

 

    关键字:

天才代写-代写联系方式