Homework 3 : Compiling Simple Expressions to P–Code

Abstract

The task of this assignment is to implement a compiler that can evaluate simple arithmetic expressions. The focus is on the code gen- eration part. [2] [1].

1 Preparation

You need to have some programming environment to edit, compile and run C programs.

The concepts of stack should be understood. The picture above de- scribes a stack, which is found from the web at https://www.cs.cmu.edu/

Besides the textbook [ ], especially chapter 8, there are several docu-

ments that are helpful for this homework:

• In the folder slides at the ftp site of this course:

– The PowerPoint file for Code Generation. This file presents main ideas in chapter 8 of the textbook. Especially, some pictures of parse trees and the flow chart of translating while–statement and if–statement are shown.

– p-code.pdf and p-code.txt. The two files have the same content, which describes (a subset of) P-code instructions, which expla- nations, that are relevant to this project.

• In the folder ExpS–to–P, which is provided with this homework, there are several files .

– The .exp files are the code of ExpS programs.

– The .p files are the P-code programs translated from the corre- sponding .exp files. P-code program for array.exp is not pro- vided.

• gencode.c which is provided for this homework. This file combines the code provided in Chapter 8 of the textbook. This file describes a general skeleton of a code generator.

2 P-codeMachine

The runtime environment of a P–code virtual machine (PVM) can be im- plemented as follows:

There are 4 pieces of data structures:

• code area: It is an array of P–code instructions. There are two ways that instructions can be loaded into the code area:

1. All together: A sequence of P–code instructions can be loaded all together to the code area from an existing P–code program file.

2. On the fly: P–code instruction will be appended one-by-one to the instruction array by some language translator.

When a PVM is running, the current instruction’s address (an index in the array) is remembered. By default, current = current + 1, af- ter the current instruction is executed. A special case is a jumping instruction: when the current is a jump instruction to some label, and the corresponding label instruction is at address j, then current will be changed to j+1.

Once an instruction is added to the code area, it should not be removed or changed.

• data area: It is an array of integers (since we only require integer operations). The values of all variables will be saved in this area. The address of a variable x is the index in the array where x’s value is saved.

• symbol table: It is a data structure that supports the following oper- ations:

– lookup(x): Given a name (a string) return its address in the data area. If x does not exists, some special signal is returned. The names include variable names and label names. The value of a label is the corresponding label instruction address in the code area.

– insert(x): Allocate a new slot in the data area for the value of the name x (x is not initialized at the moment). Nothing is done if x is already existing. The address of x should be registered in the symbol table so that it can be looked up. Therefore, the symbol table remembers the next available address in the data area, and the size of the data area. When a name x is used by an instruction (lod or lda), if x is new (lookup(x) cannot find it), x is inserted into the symbol table. A special kind of name is a label. The value of a label is the address of the corresponding label defining instruction in the code area.

• stack: A stack is needed by the P-code instructions. (Read the de- scription of P-code instructions p-code.pdf). It is an array of integer. A stack remembers:

– the stack top, which is an index in the array, it is the address where the latest data item is saved. Data are added or removed from the stack at the top. Initially, when the stack is empty, the top could be -1.

– the size of the stack: the maximum number of data items of the stack, which can be implemented as the size of the array.

The stack supports 2 operations:

– pop(): When the stack is not empty, return the data (integer) at the top, and top = top − 1. When the stack is empty, nothing is popped, and some error message is printed out.

– push(c): When the stack is not full, top = top + 1, and c is saved at the top in the data area. When the stack is already full, nothing is pushed, and some error message is printed.

There are two ways that a source program can be translated to and executed as P-code instructions.

• Compiled. A source program is translated to a P-code program, and then the P-code program is loaded and executed.

• Interpreted. A statement of the source program is translated to a sequence of P-code instructions, then, this sequence of P–code instruc- tions are executed. Especially, for a structured statement: Compound– statement, If–statement, or While–statement, the PVM will wait until the whole structured statement is provided and then start to run the structured statement.

There are two different ways to print results on the screen when a PVM is running.

• Verbose mode: After each expression or assignment statement, the value of the expression or the RHS of the assignment is printed. The calculator described in homework 1 could be considered in the verbose mode.

• Silent mode: Only the write instruction (wri) will print an integer on the screen. Otherwise, nothing is printed.

A programmer can decide which mode to choose.

2.1P–codeprogram

A P–code program is a sequence of P–code instructions. A semicolon ; will start a comment that stop at the end of line. Detailed descriptions of P–code instructions are in the textbook and the file p-code.pdf. A P–code program will stop when there is no more instruction to run, or the halting instruction (stp) is executed.

3 The enriched ExprSlanguage

The language Expr–Simple (ExpS) is defined the same as in homework 1; in order to make the language more powerful and let the code generation task more interesting, we allow additional features in ExpS, as follows:

• Comparison operators are allowed: >, <, >=, <=, ==, ! =. There- fore, 6 more token types are added for them.

• While–statement is allowed, and the token while is considered a re- served word.

• If–statement is allowed, and the two tokens ifand elseare considered as reserved words.

• Curly braces, { and }, are allowed.

• A simple statement (assignment or expression statement) will end at a semicolon or newline.

• EOF token is not required. If we require that every ExpS program ends with a quit, then we don’t need the EOF token.

A programmer has the freedom to decide the token types.

3.1 Updated lexical rules ofExprS

Correspondingly, the updated lexical rules to define tokens of ExpS are described by regular–expressions as follows in Table 3.1. This table is the same as we saw in homework 1, but with additional token types.

More descriptions of the tokens are in the document for homework 1.

3.2 Updated syntax rules ofExpS

The updated grammar that allows while and if statements are shown in Table

3.2. For the tokens that are not ID or numbers or keywords, their string literals, instead of their token types, are directly shown in the grammar for simplicity of presentation. For two items in the grammar (variable or constant) X and Y , we use space in stead of · (which is used in homework

1) to describe the concatenation of X and Y ; i.e, X Y means the same as

X · Y .

Note that the grammar rules in Table 3.2 is not ideal for writing a recursive–descendent parser, since it does not represent the precedence and

Table 1: the lexical rules of tokens

associativity of operators in the expressions. You can adjust the grammar according to your consideration.

4 Tasks todo

1. Implement a PVM (virtual machine of P–code). When a PVM is running, it will execute the instructions that are in its memory of code area.

2. Write a language translator that can translate ExpS statements to P–code instructions. There are two choices to design the translator.

Table 2: Grammar of the ExpS language

• An interpreter: An ExpS statement is translated to some P–code instructions, then these instructions are executed, then continue to process the next ExpS statement. The calculator of home- work1 can be implemented as an interpreter.

• A compiler: An ExpS program (a sequence of statements) are translated all together to a sequence of P–code instructions, then these instructions are executed.

You are only required to implement one (interpreter of compiler).

Bonus points will be earned if you can do the following parts

• You program can work both as a compiler or interpreter.

•

Support arrays of the ExpS language. You may modify the grammar of ExpS in Table 3.2 for arrays. Hint: the grammar listed in cgencode.c will be helpful.

Here is some recommended strategy of doing this homework.

• Continue to use the data structure of homework 1.

• Translate simple expressions first, i.e. only the assignment and arith- metic statements of ExpS. Then try the if and while statements.

• If your homework 1 work is not perfect, you can focus on the code generator part in this homework, i.e, the code generator can assume some parse tree is already in the memory.

5 Submitting yourhomework

• Due time: 12 January 2018 Saturday, 7:00pm.

• At most 3 students can form a group to do this homework together.

• At the top of the every text file that you changed or created (maybe just the .c and .h files), record your name (if mutiple students forma group, provide all the names) and date as comments.

• Please attach all of your source code and document files to the email, possibly zip them into a single file. Any information about how to run and understand your code (a readme file) will be helpful.

• Attention:Do not attach any binary files (.o, .obj, .exe, .out), since google will consider them virus. Just send the source code and textual documents.

• Send your email to:

[email protected]

• The title (subject) of your email should be like:

[name][hmkNum info]

Name can be in Chinese or English (Pingying). For example, if a student wants to submit homework 5, whose name Shan Li, then the title of the email should be:

[Li, Shan][hmk3 code-generator]

If it is a group work, include all members’ name in the email title.

References

[1] Andrew W. Appel. ModernCompilerImplementationinC:BasicTech- niques. Cambridge University Press, New York, NY, USA, 1997.

[2] Kenneth C. Louden. Compiler Construction Principles and Practice. PWS Publishing Company, 1997. ISBN 0-534-93972-4