| MATR.NO  |  |
|----------|--|
| CIIDNAMF |  |

(REVISED 23-10-2023)

FIRST NAME\_\_\_

1) (POINTS 35/40) Consider a triple-dispatch (3 instructions per cycle) using Tomasulo's algorithm to perform the dynamic scheduling of instructions on the pipeline shown in the following figure.



This pipeline is executing the following program, which performs a search within a vector (initially, R1 = 0).

```
etic: LW R2, 0(R1) ; read Xi

MULI R2, R2, 3 ; multiplies Xi by 3

SW R2, 0(R1) ; write Xi

ADDI R1, R1, 4 ; update R1

BNE R2, R0, etic ; continue to loop if false
```

## Working hypothesis:

- the loop executes speculatively in terms of direction (always taken) and regarding the branch condition; high-performance fetch breaks after fetching a branch
- the issue stage (I) calculates the address of the actual reads and writes; only 1 instruction is issued per cycle
- reads require 5 clock cycles; writes take 0 cycles (they are written in a write-buffer + split-cache)
- when accessing memory (M), reads have precedence over writes and must be executed in-order
- there's only one CDB
- dispatch stage (P) and complete stage (W) require 1 clock cycle
- ASSUME that the reservation stations could be freed right after the issue phase
- only 1 instruction is committed (C stage) per cycle
- there are separated integer units: one for the calculation of the actual address, one for arithmetic and logical operations, one of the integer multiplication and one for the evaluation of the branch condition, as illustrated in this table:

| Type of Functional Unit       | No. of Functional<br>Units | Cycles for stage X | No. of reservation stations |
|-------------------------------|----------------------------|--------------------|-----------------------------|
| LS: Integer (effective addr.) | 1                          | 1                  | 2                           |
| A: Integer (op. A-L)          | 1                          | 1                  | 2                           |
| B: Integer (branch calc.)     | 1                          | 1                  | 2                           |
| M: Integer Multiplication     | 1                          | 4                  | 2                           |

- the functional units TAKE advantage of pipelining techniques internally
- the load queue has 3 slots; the store queue has 3 slots (writes wait for the operand in the store queue, i.e., in the execution stage)
- the rest of the processor and has the following characteristics

Complete the following chart until the end of the FOURTH iteration of the above code fragment in the case of dynamic scheduling with speculation. Also add the instruction that occupies a certain reservation station (one of the 8) as indicated:

| Instr.<br>No | Instruction name | ALU<br>RS1 | ALU<br>RS2 | EAU<br>RS1 | EAU<br>RS2 | BU<br>RS1 | BU<br>RS2 | MU MU<br>RS1 RS2 | P: Dispatch<br>(start-stop) | I+X: Issue<br>(start-stop) | MEM. ACCESS<br>(start-stop) | W: CDB-write<br>(clock) | C: Commit<br>(clock) | Comments |
|--------------|------------------|------------|------------|------------|------------|-----------|-----------|------------------|-----------------------------|----------------------------|-----------------------------|-------------------------|----------------------|----------|
| 101          | LW R2,0(R1)      |            |            | I01<br>1-1 |            |           |           |                  | 1-1                         | 2                          | 3-7                         | 8                       | 9                    |          |
|              |                  |            |            |            |            |           |           |                  |                             |                            |                             |                         |                      |          |
|              |                  |            |            |            |            |           |           |                  |                             |                            |                             |                         |                      |          |

- 2) (POINTS 5/30) On a Linux system, write the SINGLE command line to perform at the BASH shell prompt the following operation (please note that no intermediate files should be used):
  - The file 'data1.txt' contains a list of alpha-numerical values to be used as input
  - The file 'data2.txt' should contain a list of the lines which contain the string with "ciao"
  - The extracted list should also be directed to the printer

## HIGH PERFORMANCE COMPUTER ARCHITECTURE midterm exam 30-10-2019 - SOLUTION (REVISED 23-10-2023)

## **EXERCIZE 1**

| Instr. Instru |            | ALU<br>RS1   | ALU<br>RS2 | EAU<br>RS1          | EAU<br>RS2   | BU<br>RS1    | BU<br>RS2    | MU<br>RS     | MU<br>RS2     | P: Dispatch  | I+X: Issue   | MEM. ACC.    |               |         | nit Comments                                                |
|---------------|------------|--------------|------------|---------------------|--------------|--------------|--------------|--------------|---------------|--------------|--------------|--------------|---------------|---------|-------------------------------------------------------------|
| No name       |            | (start-      | (start-    | (start-             | (start-      | (start-      | (start-      | (start-      | (start-       | (start-stop) | (start-stop) | (start-stop) | write (clock) | (clock) |                                                             |
| IO1 LW R      | R2,0(R1)   | stop)        | stop)      | stop)<br>101<br>1-1 | stop)        | stop)        | stop)        | stop)1       | stop)         | 1-1          | 2-2          | 3-7          | 8             | 9       |                                                             |
| I02 MULI R    | R2,R2,3    |              |            |                     |              |              |              | 102<br>1-8   |               | 1-8          | 9-12         | +            | 13            | 14      | I waits R2 from 1/LW                                        |
| 103 SW R      | R2,0(R1)   |              |            | <b>V</b>            | 103<br>1-2   |              |              |              |               | 1-2          | 3-3          | 23           | \             | 24      | I waits issue logic; M waits R2 M waits mem                 |
| IO4 ADDI R    | R1,R1,4    | 104<br>2-3   |            |                     |              |              |              |              |               | 2-3          | 4-4          | <b>#</b>     | 5             | 25      | I waits issue logic;                                        |
| I05 BNE R     | R2,R0,etic |              |            |                     | <u> </u>     | I05<br>2-13  |              |              |               | 2-13         | 14-14        |              | /             | 26      | I waits R2 from 1/MULI                                      |
| 106 LW R      | R2,0(R1)   | S            |            | 106<br>3-5          |              |              |              |              | Ш             | 3-5          | 6-6          | 8-12/        | 14            | 27      | I waits R1; M waits mem; W waits for CDB                    |
| 107 MULI R    | R2,R2,3    |              |            |                     |              |              |              |              | 107<br>3-14   | 3-14         | 15-18        | X-7          | 19            | 28      | I waits R2 from 2/LW;                                       |
| 108 SW R      | R2,0(R1)   | <b>→</b>     |            |                     | 108<br>3-6   |              |              |              | W             | 3-6          | 7-7          | <b>1</b> 24  |               | 29      | I waits R1; I waits issue logic; M waits R2; M waits mem    |
| 109 ADDI R    | R1,R1,4    | 109<br>4-7   |            |                     |              |              |              |              | $\lambda$     | 4-7          | 8-8          |              | 9             | 30      | I waits R1 from 1/ADDI; I waits issue logic;                |
| I10 BNE R     | R2,R0,etic | <b>\</b>     |            | ¥                   | <b>♦</b>     |              | I10<br>4-19  |              |               | 4-19         | 20-20        | •            | \             | 31      | I waits R2 from 2/MULI;                                     |
| 111 LW R      | R2,0(R1)   |              |            | I11<br>6-9          |              |              |              | ¥            |               | 6-9          | 10-10        | 13/17        | 18 \          | 32      | P waits EA-RSs I waits issue logic; I waits R1; M waits mem |
| I12 MULI R    | R2,R2,3    |              |            |                     |              |              |              | I12<br>9-18  | $\land$       | 9-18         | 19-22        | X V          | 23            | 33      | P waits M-RSs; I waits R2 from 3/LW                         |
| 113 SW R      | R2,0(R1)   |              |            |                     | I13<br>9-10  |              |              |              | $    \rangle$ | 9-10         | 11-11        | 25           | <u> </u>      | 34      | I waits R1; I waits issue logic; M waits R2; M waits mem    |
| I14 ADDI R    | R1,R1,4    | I14<br>9-11  |            | <b>\</b>            | <b>\</b>     | +            |              |              |               | 9-11         | 12-12        | 7-11/        | 15            | 35      | I waits issue logic;                                        |
| I15 BNE R     | R2,R0,etic |              |            |                     |              | I15<br>14-23 |              |              | 1             | 14-20        | 24-24        | <b>y</b>     | /             | 36      | P waits B-RSs; I waits R2 from 3/MULI                       |
| 116 LW R      | R2,0(R1)   |              |            | I16<br>15-15        |              |              |              |              |               | 15-15        | 16-16        | 18-22        | 24            | 37      | I waits R1; M waits mem; W waits for CDB                    |
| I17 MULI R    | R2,R2,3    |              |            | L                   |              |              |              |              | I17<br>15-24  | 15-23        | 25-38        | 7- 1         | 29 //         | 38      | I waits R2 from 4/LW                                        |
| I18 SW R      | R2,0(R1)   |              |            | T                   | I18<br>15-25 |              |              |              |               | 15-25        | 26-26        | 30           | //            | 39      | I waits R1; I waits issue logic; M waits R2; M waits mem*   |
| I19 ADDI R    | R1,R1,4    | I19<br>16-16 |            | <del>-</del>        |              |              | $\downarrow$ | $\downarrow$ | 1             | 16-16        | 17-17        | /            | 20            | 40      | W waits for CDB                                             |
| I20 BNE R     | R2,R0,etic |              |            |                     | <b>+</b>     | 1            | 120<br>20-29 |              | 1             | 20-28        | 30-30        | <b>K</b>     |               | 41      | P waits B-RSs; I waits R2 from 4/MULI                       |

<sup>\*</sup> I18 (an SW) has to waitto issue until there is space in the SQ (there are 3 slots and they are all occupied by the previous stores until cycle 24), then at 25 and 26 previous instructions are issueing: it issues at 26.

## **EXERCIZE 2**

The requested command line is:

grep "ciao" data1.txt | tee data2.tx | lpr