MATRICULATION NO.

## **REVISED 26-10-2023**

SURNAME

FIRST NAME



Working hypothesis:

- the loop executes speculatively with respect to the branch direction (always taken) but not with respect to the branch condition; the high-performance fetch stops after fetching a branch;
- case A) no speculation on the branch condition; case B) speculation on the branch condition;
- the issue stage (I) calculates the address of the actual read/write and pushes it into the load/store queues; only 1 instruction is issued per cycle
- reads require 1 clock cycle; writes require 1 clock cycle
- when accessing memory (M), writes have precedence over reads and must be executed in order
- there is a single CDB
- the dispatch stage (P) and the complete stage (W) each require 1 clock cycle
- only 1 instruction is committed (C stage) per cycle
- there are separate integer units: one for the calculation of the actual address, one for arithmetic and logical operations, one for integer multiplication and one for the evaluation of the branch condition, as illustrated in this table:

| Type of Functional Unit       | No. of Functional Units | Cycles for stage I+X | No. of reservation stations |
|-------------------------------|-------------------------|----------------------|-----------------------------|
| LS: Integer (effective addr.) | 1                       | 1                    | 2                           |
| A: Integer (op. A-L)          | 1                       | 1                    | 2                           |
| B: Integer (branch calc.)     | 1                       | 1                    | 2                           |

- the functional units do not take advantage of pipelining techniques internally
- reservation stations are busy until the end of CDB-write (except for Stores)
- the load queue has 5 slots; the store queue has 5 slots (writes wait for the operand in the store queue, i.e., in the execution stage)

Complete the following chart until the end of the **third** iteration of the code fragment above, both for simple dynamic scheduling (case A) and for dynamic scheduling with speculation (case B).

| Instr. No. | Instruction | Operands | P: disPatch (start-stop) | I+X: Issue+Exec (start-stop) | M: MEM.ACCESS (clock) | W: CDB-write (clock) | C: Commit (clock) | Comments |
|--------------|-----------------|----------|-----------------------------|--------------------------------|--------------------------|-------------------------|----------------------|----------|
| 101          | LW              | R2,0(R1) | 1-4                         | 2                              | 3                        | 4                       | 5                    |          |
|              |                 |          |                             |                                |                          |                         |                      |          |
|              |                 |          |                             |                                |                          |                         |                      |          |
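The pre-filled row 101 can be reproduced with a minimal sketch. It assumes a load with no data dependencies and no structural hazards, so each stage follows the previous one after exactly 1 cycle. It also assumes that the P column reports the interval during which the reservation station is held (from dispatch to the CDB write), which is an interpretation of the "1-4" entry and not something the sheet states explicitly. The names `LoadTiming` and `independent_load_timing` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadTiming:
    dispatch: tuple   # (start, stop): assumed to be the reservation station occupancy
    issue_exec: int   # I+X: effective address calculation (1 cycle on the LS unit)
    mem_access: int   # M: memory read (1 cycle)
    cdb_write: int    # W: result broadcast on the single CDB (1 cycle)
    commit: int       # C: in-order commit (1 instruction per cycle)


def independent_load_timing(dispatch_cycle: int) -> LoadTiming:
    """Timing of a load that never stalls (assumption: no hazards of any kind)."""
    issue_exec = dispatch_cycle + 1
    mem_access = issue_exec + 1
    cdb_write = mem_access + 1
    commit = cdb_write + 1
    # The reservation station stays busy until the end of the CDB write.
    return LoadTiming((dispatch_cycle, cdb_write), issue_exec, mem_access, cdb_write, commit)


row_101 = independent_load_timing(1)
print(row_101)
```

The later rows cannot be obtained this way: they depend on operand availability, the single CDB, the limited reservation stations and, in case A, on the resolution of the branch condition.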

2) (POINTS 16/40) Given the sequence P1:R, P2:R, P3:R, P1:W, P2:W, P3:W (Px:R = read by processor Px, Px:W = write by processor Px) on a certain variable 'a', show, for each processor, the sequence of states and the bus transactions that occur in a UMA multiprocessor with a write-back cache for each processor and the PSCR coherence protocol.
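As an illustration of the bookkeeping this kind of exercise requires, the sketch below traces the same access sequence under the plain MESI protocol. MESI is used here only as a stand-in: PSCR defines its own states and bus transactions, so this trace is not the expected answer. The function name `mesi_trace` and the transaction labels `BusRd`, `BusRdX` and `BusUpgr` are assumptions taken from common textbook descriptions of MESI, not from this sheet.

```python
def mesi_trace(ops, n_procs=3):
    """Trace per-processor MESI states for one cache line (all caches start in I)."""
    state = {p: "I" for p in range(1, n_procs + 1)}
    trace = []
    for proc, op in ops:
        others = [p for p in state if p != proc]
        if op == "R":
            if state[proc] == "I":
                bus = "BusRd"  # read miss
                shared = any(state[p] != "I" for p in others)
                for p in others:
                    if state[p] in ("M", "E"):
                        state[p] = "S"  # an M copy is also flushed to memory
                state[proc] = "S" if shared else "E"
            else:
                bus = "-"  # read hit, no bus traffic
        else:
            if state[proc] in ("M", "E"):
                bus = "-"        # silent write hit
            elif state[proc] == "S":
                bus = "BusUpgr"  # invalidate the other copies
            else:
                bus = "BusRdX"   # write miss: fetch the line with ownership
            for p in others:
                state[p] = "I"
            state[proc] = "M"
        trace.append((f"P{proc}:{op}", bus, dict(state)))
    return trace


sequence = [(1, "R"), (2, "R"), (3, "R"), (1, "W"), (2, "W"), (3, "W")]
for access, bus, states in mesi_trace(sequence):
    print(access, bus, states)
```

The same table layout (one row per access, one column per processor state, one column for the bus transaction) can be reused for the PSCR answer by substituting that protocol's states and transitions.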