I218 Computer Architecture Term-end Exam.

Problem 1

The following processor is given for this problem.

Data cache

Total data size (capacity)： 16K（16384）bytes
Block size： 16 bytes
4-way set associative／LRU replacement
Write-back policy for writes by sw
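
As a rough aid for reasoning about this configuration, the following Python sketch derives the cache geometry from the parameters listed above. It assumes the 32-bit address width given later in this problem and power-of-two sizes; the function name `cache_geometry` is only illustrative.

```python
def cache_geometry(capacity_bytes, block_bytes, ways, address_bits=32):
    """Return block/set counts and address field widths (sizes must be powers of two)."""
    blocks = capacity_bytes // block_bytes      # total block entries in the cache
    sets = blocks // ways                       # each set holds `ways` blocks
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "blocks": blocks,
        "sets": sets,
        "tag_bits": tag_bits,
        "index_bits": index_bits,
        "offset_bits": offset_bits,
    }


geometry = cache_geometry(capacity_bytes=16384, block_bytes=16, ways=4)
print(geometry)
```

The same function can be reused for other capacities or associativities by changing the arguments.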

Virtual memory / Page table

Page size： 4K（4096）bytes
Physical memory is large enough not to cause page faults.
There is a page table entry for translation from virtual page number 1 to physical page number 2.
There is a page table entry for translation from virtual page number 2 to physical page number 3.
There is a page table entry for translation from virtual page number 3 to physical page number 5.
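
The translation described by these entries can be sketched in Python as follows. Only the three listed mappings are modelled, since no others are given here; the names `PAGE_TABLE` and `translate` and the sample address are illustrative assumptions.

```python
PAGE_SIZE = 4096  # bytes per page, from the page size above

# Virtual page number -> physical page number, as listed above.
PAGE_TABLE = {1: 2, 2: 3, 3: 5}


def translate(virtual_address):
    """Translate a virtual address using the page table entries listed above."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in PAGE_TABLE:
        # The mapping for this page is not stated, so refuse to guess.
        raise LookupError(f"no page table entry listed for virtual page {vpn}")
    return PAGE_TABLE[vpn] * PAGE_SIZE + offset


# Arbitrary illustration: byte 100 of virtual page 1.
sample_va = 1 * PAGE_SIZE + 100
sample_pa = translate(sample_va)
print(sample_va, "->", sample_pa)  # 4196 -> 8292
```

The page offset passes through unchanged; only the page number is replaced.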

Virtual memory / TLB for data accesses

The number of entries in the data TLB： 2
Fully associative／LRU replacement
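
A fully associative TLB with LRU replacement can be modelled with an ordered dictionary, as in the minimal sketch below. The class name `LruTlb` and the sequence of virtual page numbers are hypothetical and chosen only to show an eviction; they are not the access pattern of this problem.

```python
from collections import OrderedDict


class LruTlb:
    """Fully associative TLB model with LRU replacement (miss counting only)."""

    def __init__(self, entries=2):
        self.entries = entries
        self.resident = OrderedDict()  # oldest (least recently used) entry first
        self.misses = 0

    def access(self, vpn):
        """Return True on a hit, False on a miss; update the LRU order."""
        if vpn in self.resident:
            self.resident.move_to_end(vpn)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.resident) >= self.entries:
            self.resident.popitem(last=False)  # evict the least recently used entry
        self.resident[vpn] = True
        return False


tlb = LruTlb(entries=2)
results = [tlb.access(vpn) for vpn in [1, 2, 1, 3, 2]]  # hypothetical sequence
print(results, tlb.misses)  # [False, False, True, False, False] 4
```

In this sequence the access to page 3 evicts page 2, because page 1 was touched more recently, so the final access to page 2 misses again.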

The pipeline in this processor is the same as the MIPS 5-stage pipeline in the textbook and lecture slides, where a data dependence on a load instruction (lw) stalls the pipeline for one cycle. The processor implements delayed branching with a single delay slot, which avoids the bubbles caused by control hazards. Virtual and physical addresses are both 32 bits wide.

On a cache miss for lw or sw, a cache miss penalty of 20 cycles is incurred to transfer a block from physical memory to a cache block entry, and the pipeline stalls during the penalty. Similarly, on a TLB miss, a TLB miss penalty of 10 cycles is incurred for the hardware to transfer a page table entry from the page table in physical memory to a TLB entry, and the pipeline stalls during that time.
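
The stall rules above suggest a simple additive cycle count, sketched below in Python. This assumes one base cycle per instruction, that every stall source simply adds to the total, and that pipeline fill and drain cycles are ignored. The function names and the sample counts are hypothetical, not taken from this problem.

```python
CACHE_MISS_PENALTY = 20  # cycles per data cache miss, as stated above
TLB_MISS_PENALTY = 10    # cycles per TLB miss, as stated above


def total_cycles(instructions, load_use_stalls, cache_misses, tlb_misses):
    """Additive model: one cycle per instruction plus all stall cycles."""
    return (instructions
            + load_use_stalls
            + cache_misses * CACHE_MISS_PENALTY
            + tlb_misses * TLB_MISS_PENALTY)


def cpi(instructions, load_use_stalls, cache_misses, tlb_misses):
    """Cycles per instruction under the additive model."""
    cycles = total_cycles(instructions, load_use_stalls, cache_misses, tlb_misses)
    return cycles / instructions


# Hypothetical counts, for illustration only.
example_cycles = total_cycles(100, 10, 3, 2)
example_cpi = cpi(100, 10, 3, 2)
print(example_cycles, example_cpi)  # 190 1.9
```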

The following instruction sequence is executed by this processor.

  Loop: lw $t0, 0 ($s1)
        lw $t1, 512 ($s1)
        add $t0, $t0, $t1
        sw $t0, 4096 ($s1)
        bne $s1, $s3, Loop
        addi $s1, $s1, 4        # ← delay slot

Before the loop starts, $s1 is 2048 and $s3 is 2076. All data are initially in physical memory (that is, no page faults occur). All entries in the cache and the TLB are initially invalid (valid = 0).
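
To see which data addresses the loop touches, the following Python sketch replays the loop's control flow with the initial register values above. It models only the address arithmetic: bne compares $s1 with $s3 before the delay-slot addi updates $s1, and the addi always executes. The helper name `trace_loop` is illustrative, and the 16-byte block and 4096-byte page sizes are taken from the configuration of this problem.

```python
def trace_loop(s1=2048, s3=2076):
    """Return the (instruction, virtual address) data accesses of the loop, in order."""
    accesses = []
    while True:
        accesses.append(("lw", s1))         # lw $t0, 0($s1)
        accesses.append(("lw", s1 + 512))   # lw $t1, 512($s1)
        accesses.append(("sw", s1 + 4096))  # sw $t0, 4096($s1)
        taken = s1 != s3                    # bne reads $s1 before the delay slot
        s1 += 4                             # addi in the delay slot always runs
        if not taken:
            break
    return accesses


trace = trace_loop()
iterations = len(trace) // 3
blocks = sorted({address // 16 for _, address in trace})   # 16-byte block numbers
pages = sorted({address // 4096 for _, address in trace})  # virtual page numbers
print(iterations, trace[-1], blocks, pages)
```

Feeding this trace into a cache or TLB model is then a matter of mapping each address to its block or page number.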

Assume that memory accesses for fetching instructions don’t cause any cache misses or TLB misses/page faults. (Focus only on data accesses.)

Answer the following questions. (3 pt. for each from （１） to （１０）)

（１）How many block entries are there in the data cache? How many sets?

The number of block entries ＝ _________

The number of sets ＝_________

（２）How many bits do the tag field, the index field, and the block offset field of a physical memory address consist of?

Tag field ＝ _________ bits

Index field ＝ _________ bits

Block offset field ＝ _________ bits

（３）In a physical memory address, how many bits does the physical page number consist of? How many bits does the page offset consist of?

Physical page number ＝ _________ bits

Page offset ＝ _________ bits

（４）How many cache misses occur during the whole loop execution?

（５）Show the virtual address and physical address that the sw instruction in the last iteration references.

Virtual address ＝ _________

Physical address ＝ _________

（６）Which kind of locality do the data accesses in this loop exhibit: temporal locality or spatial locality?

（７）How many TLB misses occur during the whole loop execution?

（８）What is the CPI for the whole loop execution? You may assume that each instruction takes one cycle to execute unless a data hazard on lw, a cache miss, or a TLB miss occurs.

（９）Apply loop unrolling so that two iterations of the original loop are packed into one new loop body, and apply code scheduling to obtain the smallest CPI. Show the instruction sequence after unrolling and code scheduling. You may change the initial value of $s3 from 2076; if you do, show the new initial value. (Superscalar/multiple-issue execution is NOT introduced in this processor.)

Initial value of $s3 ＝ _________

Code after unrolling (not mandatory; only as preparation for the final answer)

Code after unrolling and code scheduling

（１０）What is the CPI for the whole loop execution of the code sequence obtained in （９）? (Superscalar/multiple-issue execution is not introduced.) In addition, how much faster is this execution than that of the original code?

Problem 2

Answer the following questions. (2 pt. for each from （１） to （５）)

（１）What is a “data hazard”? Answer briefly, including how to resolve it.

（２）The hazard detection unit for a data dependence on the result of a load instruction (lw) implements the following logic.

if ( ID/EX.MemRead == ‘1’ &&
   ( ID/EX.RegisterRt == IF/ID.RegisterRs || ID/EX.RegisterRt == IF/ID.RegisterRt ))
     stall the pipeline; /* negates PCWrite, IF/IDWrite and control signals in ID/EX */

The above hazard detection logic can cause unnecessary stalls. Give a two-instruction code sequence for which this condition causes such an unnecessary stall.
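
The stall condition can be transcribed into Python to experiment with instruction pairs, as in the sketch below. `hazard_unit_stalls` mirrors the logic quoted in this question, while `stall_needed` is a hypothetical refinement that consults an assumed `reads_rt` flag. The instruction pair is one possible illustration: for an I-type addi, the rt field names the destination register, so it is written rather than read.

```python
def hazard_unit_stalls(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Literal transcription of the stall condition quoted in this question."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def stall_needed(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt, reads_rt):
    """Hypothetical refinement: compare rt only if the next instruction reads it."""
    uses_rs = id_ex_rt == if_id_rs
    uses_rt = reads_rt and id_ex_rt == if_id_rt
    return bool(id_ex_mem_read) and (uses_rs or uses_rt)


T0, S2 = 8, 18  # MIPS register numbers for $t0 and $s2

# lw   $t0, 0($s1)   -> in EX: MemRead = 1, rt = $t0
# addi $t0, $s2, 4   -> in ID: rs = $s2, rt = $t0 (destination, never read)
quoted = hazard_unit_stalls(True, T0, S2, T0)
refined = stall_needed(True, T0, S2, T0, reads_rt=False)
print(quoted, refined)  # True False
```

The quoted logic stalls because the rt fields match, even though addi does not depend on the loaded value.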

（３）What are “temporal locality” and “spatial locality” in memory references?

（４）What is “LRU (Least Recently Used)”?

（５）What is a “page fault” exception?
