ECE463/563 – Microprocessor Architecture Project #2
Microprocessor Architecture代写 The goal of this project is to design and implement a C/C++ cycle-accurate simulator of a dynamically
Due date: April 10, 2019
Objective Microprocessor Architecture代写
The goal of this project is to design and implement a C/C++ cycle-accurate simulator of a dynamically scheduled processor implementing the Tomasulo algorithm with Reorder Buffer.
Students taking the course at the undergraduate level (ECE463) do not need to support store instructions (SW and SWS) in their simulator, while students taking the course at the graduate level (ECE563) will need to support store instructions as well.
Code organization Microprocessor Architecture代写
The project2_code.tar.gz archive contains C and C++ code templates for the simulator, as well as test cases to validate its operation. ECE563 students are required to use C++ code templates.
In order to extract the code in a Linux environment, you can invoke the following command:
tar –xzvf project2_code.tar.gz
This will create a project2_code folder containing two subfolders: c and c++, the former containing the C templates and the latter containing the C++ templates. The content of these two subfolders is very similar.Microprocessor Architecture代写
The c and c++ folders have the following content:
- h/sim_ooo.cc (or sim_ooo.c): code templates for the dynamically scheduled processor’s simulator. These are the only files that you need to modify to implement the simulator.
- Test cases: test cases to validate the operation of the dynamically scheduled processor’s simulator. This folder contains five test cases. For each of them, you will find two files: Test caseN.cc and Test caseN.out. The former contains the test case implementation, and the latter the expected output of the test case. You should not modify any of the test case files. We recommend writing your own test cases to verify the functionality that is not covered by the test cases
- Makefile:Makefile to be used to compile the The use of this Makefile will cause an object file (.o) to be created for each C or C++ file that is part of the project. If the compilation succeeds, the binaries corresponding to the test cases will be generated in the bin folder. You don’t need to modify the Makefile unless you are working on the floating-point pipeline simulator or you are using a library that is not included by default.
- asm: assembly files used by the test
- bin: once you compile the code, the test cases binaries will be saved into this
Important Microprocessor Architecture代写
The sim_ooo.h header file is commented and contains details on the functions/methods that you need to implement. Be sure to read the comments in this header file carefully before you start coding.
Assumptions & Requirements
- Data types and registers: The simulator operates on 32-bit integer numbers and 32-bit floating-point numbers stored in data memory. The simulator has 32 integer registers (R0-R31) and 32 single-precision floating- point registers (F0-F31).
- Memories: The instruction and data memories are separated. The instruction memory returns the instruction fetched within the clock cycle, while the data memory has a configurable
- Execution units and memory: The number of execution units and their latency must be configurable. The function init_exec_unit can be invoked at the beginning to configure the execution units used. Execution units can be of five kinds: INTEGER unit, floating-point ADDER, MULTIPLIER, DIVIDER and MEMORY (see exe_unit_t data type). The integer units are used for integer additions (and branches),subtractions and logic operations; the floating-point adders are for floating-point additions and subtractions; the multipliers are for integer and floating-point multiplications; and the dividers for integer and floating- point divisions. While the simulator can have multiple integer units, adders, multipliers and dividers, it will always have a single data memory unit (used by load and store instructions). While the processor can have multiple execution units of the same type with different latencies, in our test cases all the units of the same type will have the same latency. For simplicity, assume that all execution units are unpipelined. You can assume that the latency of the execution stage corresponds to that of the execution unit it Microprocessor Architecture代写
4.Reservation stations and load buffers:
Besides load buffers, the processor should have three kinds of reservation stations (see res_station_t data type). Integer reservation stations (INTEGER_RS) feed the integer units, add reservation stations (ADD_RS) feed the floating-point adders, and multiplier reservationstations (MULT_RS) are shared by multipliers and dividers. Load buffers (LOAD_B) are used by memory instructions.
5.Initialization: the size of the data memory, the size of the reorder buffer, the number of reservation stations and load buffers, and the issue width can be configured when instantiating the simulator.Microprocessor Architecture代写
/* Instantiates the simulator
Note: registers must be initialized to UNDEFINED value, and data memory to all 0xFF values
*/
sim_ooo(unsigned mem_size, // size of data memory (in byte) unsigned rob_size, // number of ROB entries
unsigned num_int_res_stations, // number of integer reservation stations unsigned num_add_res_stations, // number of ADD reservation stations unsigned num_mul_res_stations, // number of MULT/DIV reservation stations unsigned num_load_buffers, // number of LOAD buffersMicroprocessor Architecture代写
unsigned issue_width=1 // issue width
);
- Reservation stations, ROB, instruction window: the code template already contains data structures to model reservation stations, ROB and the instruction window. You can extend these data structures if you need, but you should not modify existing
- Instructionssupported: the processor simulator needs to support the following instructions (which are listed in the h header file).
- LW– Load word: Loads a 32-bit integer into a register from the specified address.
- SW– Store word: Stores a 32-bit integer into data memory at a specified address.
- ADD/SUB/XOR/AND/MULT/DIV – Add/Sub/Xor/And/Mult/Div: Computes the addition/subtraction/exclusive XOR/AND/multiplication/division of the values of two integer registers and stores the result into an integer
- ADDI/SUBI – Add/Sub Immediate: Computes the addition/subtraction of the value of an integer register and a sign-extended immediate value and stores the result into an integer
- BEQZ, BNEZ, BLTZ, BGTZ, BLEZ, BGEZ – Branch if the value of the register is =, ≠, <, >, ≤,≥Microprocessor Architecture代写
zero. JUMP – Unconditional branch.
- LWS– Load word: Loads a 32-bit floating-point into a register from the specified address.
- SWS– Store word: Stores a 32-bit floating-point value into data memory at a specified address.
- ADDS/SUBS/MULTS/DIVS – Add/Sub/Mult/Div: Computes the addition/subtraction/
/multiplication/division of the values of two floating-point registers and stores the result into a floating- point register.Microprocessor Architecture代写
- EOP – End of program: Special instruction indicating the end of the Asmentioned above, ECE463 do not need to support the SW and SWS instructions in their simulator.
8.Load/store instructions and dependencies:
Load and store instructions use the memory unit in different stages. Load instructions load data from memory in the execution stage and send their result to the reservation stations and ROB in the write result stage. Store instructions write data to memory at the commit stage and use the execution stage only for the computation of the effective memory address. Load and store instructions from/to different memory addresses do not conflict.Microprocessor Architecture代写
However, in a dynamically scheduled processor load and store instructions that operate on the same memory address can cause RAW, WAW and WAR memory hazards. Note, however, that since the ROB postpones writing memory to the commit stage (and this stage processes instructions in program order), in this case WAW and WAR dependencies are not an issue. So, your implementation should handle only RAW load/store dependencies occurring when a load operation follows a pending store operation to the same memory address. Your implementation should assume that, in this situation, the value that will be written to memory by the pending store should be bypassed to the load. To meet these requirements,
your implementation should process memory instructions asfollows: Microprocessor Architecture代写
- When a load or store instruction is issued, the address field of the corresponding reservation station isfilled with the value of the immediate field of the instruction. The address field is updated with the effective memory address when the instruction is moved to the execution stage.
- When multiple instructions are eligible to move from the issue to the execution stage, they are considered in program order. In addition, the effective memory address is computed in program order. Load instructions are not allowed to execute if preceded by potentially conflicting store
- Store instructions are allowed in the execution stage only if the values of both their input registers are Note that,
- while a less stringent timing is possible, this facilitates the handling of data bypassing between store and dependent load instructions.Microprocessor Architecture代写
- Store instructions update the destination field of the ROB with the effective memory address when they enter the execution stage, and they update the value field of the ROB with the result that they will later write to memory in the write-result
- In the presence of a pending store to the same address, data bypassing from the store to the load instructionis done when the load enters the execution In this stage, the load will save the value bypassed in the value2 field of its reservation station. In the write-result stage, the load will write this value in the reorder buffer.
- Bypassing happens through the ROB (for the timing, see timing constraint on data dependenciesbelow).
- Store instructions spend only one clock cycle in execution stage, and a number of clock cycles equal to the memory latency in the commit stage. In the presence of store-to-load data bypassing, load instructions spend only one clock cycle in the execution stage (since in this case they don’t use the memoryunit).Microprocessor Architecture代写
Note that data bypassing does not modify the timing of store instructions, but it only speeds up the processing of dependent load instructions.Microprocessor Architecture代写
9.Other timingconstraints:
- If an instruction I uses a ROB entry, such ROB entry becomes available and can be used in the clock cycle after I
- In the presence of a structural hazard on an execution unit, the execution unit is available to the second instruction in the clock cycle after the first instruction has finished using it. If an instruction I uses an execution unit in the execution stage, such execution unit is freed when I writes the result and is available to a different instruction starting from the next clock
- If an instruction I uses a reservation station or a load buffer, such reservation station or load buffer is freedwhen I writes the result, and is available to a different instruction starting from the next clock
- If an instruction J depends on instruction I, the data written by instruction I will be available to J in the clock cycle when I writes the result (say t), and (in the absence of data hazard), instruction J will be able to execute in the following clock cycle (sayt+1).Microprocessor Architecture代写
- In the write result stage branch instructions write their target address in the result field of the corresponding ROB entry. If the branch is mispredicted, the target instruction of the branch is fetched in the clock cycle after the branch instruction commits. While a less strict timing is possible, this allows precise exception
- CPI computation: The EOPinstruction should be excluded from the computation of the CPI.
Format of the output Microprocessor Architecture代写
The code template includes functions to print out the content of the data memory, the status of the registers (and, if their value is pending, the ROB entry that will update the register), the content of the reservation stations and load buffers, the content of the reorder-buffer, the status of the pending instructions (that is, the ones still in the ROB), and the instruction execution history (which shows the cycle-by-cycle execution of all instructions fetched during operation). These functions do not need to be modified, and they are invoked from the test case files.
//prints the values of the registers void print_registers();Microprocessor Architecture代写
Example output:
GENERAL PURPOSE REGISTERS
Register Value ROB
R1 10/0x0000000a –
R2 20/0x00000014 –
R3 10/0x0000000a -Microprocessor Architecture代写
F0 – 0
F1 10/0x41200000 –
F2 – 1
F3 – 2
F4 40/0x42200000 –
F5 50/0x42480000 –
F6 60/0x42700000 –
F7 70/0x428c0000 –
F8 80/0x42a00000 –
F9 90/0x42b40000 –
F10 100/0x42c80000 –
//prints the content of the reservation stations void print_reservation_stations();Microprocessor Architecture代写
Example output:
REORDER BUFFER
Entry Busy Ready PC State Dest Value
1 no no – – – –
2 yes no 0x00000004 EXE F2 –
3 yes no 0x00000008 ISSUE F3 –
4 yes no 0x0000000c ISSUE F4 –
5 no no – – – –
6 no no – – – –
//prints the content of the reservation stations
void print_reservation_stations();
//print the content of the instruction window void print_pending_instructions();
Example output:
PENDING INSTRUCTIONS STATUSMicroprocessor Architecture代写
PC Issue Exe WR Commit
– – – – – 0x00000004 1 3 4 –
0x00000008 2 – – –
0x0000000c 3 4 – –
0x00000010 4 – – –
– – – – –
//print log
void print_instruction history();
Example output:
EXECUTION LOG
PC Issue Exe WR Commit
0x00000000 0 1 2 3Microprocessor Architecture代写
0x00000004 0 3 4 5
0x00000008 0 5 7 8
0x0000000c 0 3 13 14
0x00000010 1 2 4 15
0x00000014 1 14 – –
0x00000018 4 5 – –
0x0000001c 8 9 11 –
0x00000020 12 – – –
0x00000020 16 17 19 20
0x00000024 16 20 22 23
Testing Microprocessor Architecture代写
As mentioned above, the compilation process generates a separate binary for each test case in the testcases
folder. To execute testcaseX, you can go in the c/c++ folder and invoke:
./bin/testcaseX
To check if your output is correct, you can compare it with file testcaseX.out in the testcases folder. On Linux, you can use the diff utility to do so.
For example, you can invoke Microprocessor Architecture代写
./bin/testcaseX > my_output
diff my_output testcases/testcaseX.out
The first command will run the test case and save its output into my_output. The second command will compare your output with the reference output line-by-line.
Grading guide Microprocessor Architecture代写
ECE463 students
- Report = 10points
- Code = max 40points
- 40 points if the code compiles and runs, and you can run successfully at least one testcase
- 0-40 points if the code does not compile/run, but you have written substantial amount of code. The exact score will depend on the status of your code (amount of code written/functionality implemented).
- Test cases = max 50points
- Test cases 1-5 = 8 pointseach
- § 0.5pointsif the number of clock cycles and instructions executed match (or almost match)
- § 1 point if the final content of registers and memorymatch
- § 3.5pointsif the execution log fully matches (according to diff)
- § 3pointsif the cycle-by-cycle execution fully matches (according to diff)
- Test case 6 = 10points Microprocessor Architecture代写
- § 0.5pointsif the number of clock cycles and instructions executed match (or almost match)
- § 1 point if the final content of registers and memorymatch
- § 4.5pointsif the execution log fully matches (according to diff)
- § 4pointsif the cycle-by-cycle execution fully matches (according to diff)
ECE563 students Microprocessor Architecture代写
- Report = 10points
- Code = max 40points
- 40 points if the code compiles, runs, and you can run successfully at least one testcase
- 0-40 points if the code does not compile/run, but you have written substantial amount of code. The exact score will depend on the status of your code (amount of code written/functionality implemented).
- Test cases = max 50 points (10 test cases, 5 pointseach)
- Test cases1-8
- § 0.5pointsif the number of clock cycles and instructions executed match (or almost match)
- § 0.5 points if the final content of registers and memorymatch
- § 2.5pointsif the execution log fully matches (according to diff)Microprocessor Architecture代写
- § 1.5pointsif the cycle-by-cycle execution fully matches (according to diff)
- Test cases 9 &10
- § 0.5pointsif the number of clock cycles and instructions executed match
- § 1 points if the final content of registers and memorymatch
- § 3.5pointsif the execution log fully matches (according to diff)
Self-grading
Test your code on grendel.ece.ncsu.edu, fill the following table (according to the grading guide below, and include them in your report).
Test case |
Match |
Partial match |
Comment (brief explanation of test case evaluation) |
Test case 1 |
Microprocessor Architecture代写 | ||
Test case 2 |
|||
Test case 3 |
|||
Test case 4 |
|||
Test case 5 |
Microprocessor Architecture代写 | ||
Test case 6 |
|||
Test case 7 |
Test case 8 |
|||
Test case 9 |
|||
Test case 10 |
Comment should be a brief explanation of the test case evaluation. Examples: “the test case and reference output fully match”, “everything but the cycle-by-cycle execution matches”, “the final content of memory and register and the execution log fully match”, etc.
Submission instructions Microprocessor Architecture代写
- Report:Your report should be no longer than 4 pages (11 pt. Times New Roman font), including figures. It should include the following information:
- Brief description of the salient aspects of yourimplementation;
- Brief explanation of what works and what not in your implementation;
- Tableabove
- If you modified the Makefile, indicate it in the report.
Save your report in pdf format, with file name project2_report.pdf. Include the report in the project2_code folder.
- Testcases: You should not modify any of the test All the functionality should be included in the sim_ooo.h and sim_ooo.c/cc, files. You can add header and C/C++ files, but you don’t need to.
- Code:Independently of the development environment and operating system you used to develop your code, your code should compile and run on the ece.ncsu.edu Linux machine, and it should compile using the provided Makefile (you can modify the Makefile if you are using libraries which are not included or if you have implemented the floating-point pipeline simulator). Have a look at Piazza for information on how to access the grendel machine.Microprocessor Architecture代写
- You should invoke “make clean” before submitting your code. That is, your submission should not contain any object or binary
- Remove the folder containing the templates that you did not use. In other words, if you used the C templates, delete the c++ folder and its content; if you used the C++ templates, delete the cfolder and its content. Your project2_code folder should contain the pdf and one of the c or c++ folders.
- Go to the parent folder of the project2_code Compress the whole project2_code directory into a tar.gzfile.
tar -zcvf project2_code.tar.gz project2_code
- Submit your project through Moodle (no need to print the report and take it toclass).
其他代写:考试助攻 计算机代写 java代写 algorithm代写 代写CS assembly代写 function代写 matlab代写 金融经济统计代写 web代写 数学代写 essay代写 编程代写 report代写 algorithm代写 数学代写 finance代写