## EECS 151/251A Homework 8

Due Monday, April 8<sup>th</sup>, 2024

## Introduction

This homework is meant to test your understanding of the RISC-V architecture as well as the design of memory blocks. There are 8 questions. Please check Ed first if you have any questions.

Below is a list of RISC-V instructions that you can use for problems 1 and 2.

|                       |       | Base Instr | uction S |             |         | _     |
|-----------------------|-------|------------|----------|-------------|---------|-------|
| imm[31:12]            |       |            |          | rd          | 0110111 | LUI   |
| imm[31:12]            |       |            |          | rd          | 0010111 | AUIPC |
| imm[20 10:1 11 19:12] |       |            | rd       | 1101111     | JAL     |       |
| imm[11:0]             |       | rs1        | 000      | rd          | 1100111 | JALR  |
| imm[12 10:5]          | rs2   | rs1        | 000      | imm[4:1 11] | 1100011 | BEQ   |
| imm[12 10:5]          | rs2   | rs1        | 001      | imm[4:1 11] | 1100011 | BNE   |
| imm[12 10:5]          | rs2   | rs1        | 100      | imm[4:1 11] | 1100011 | BLT   |
| imm[12 10:5]          | rs2   | rs1        | 101      | imm[4:1 11] | 1100011 | BGE   |
| imm[12 10:5]          | rs2   | rs1        | 110      | imm[4:1 11] | 1100011 | BLTU  |
| imm[12 10:5]          | rs2   | rs1        | 111      | imm[4:1 11] | 1100011 | BGEU  |
| imm[11                | - 1   | rs1        | 000      | rd          | 0000011 | LB    |
| imm[11:0]             |       | rs1        | 001      | rd          | 0000011 | LH    |
| imm[11                |       | rs1        | 010      | rd          | 0000011 | LW    |
| imm[11                |       | rs1        | 100      | rd          | 0000011 | LBU   |
| imm[11                | .:0]  | rs1        | 101      | rd          | 0000011 | LHU   |
| imm[11:5]             | rs2   | rs1        | 000      | imm[4:0]    | 0100011 | SB    |
| imm[11:5]             | rs2   | rs1        | 001      | imm[4:0]    | 0100011 | SH    |
| imm[11:5]             | rs2   | rs1        | 010      | imm[4:0]    | 0100011 | SW    |
| imm[11:0]             |       | rs1        | 000      | rd          | 0010011 | ADDI  |
| imm[11:0]             |       | rs1        | 010      | rd          | 0010011 | SLTI  |
| imm[11:0]             |       | rs1        | 011      | rd          | 0010011 | SLTIU |
| imm[11:0]             |       | rs1        | 100      | rd          | 0010011 | XORI  |
| imm[11:0]             |       | rs1        | 110      | rd          | 0010011 | ORI   |
| imm[11:0]             |       | rs1        | 111      | rd          | 0010011 | ANDI  |
| 0000000               | shamt | rs1        | 001      | rd          | 0010011 | SLLI  |
| 0000000               | shamt | rs1        | 101      | rd          | 0010011 | SRLI  |
| 0100000               | shamt | rs1        | 101      | rd          | 0010011 | SRAI  |
| 0000000               | rs2   | rs1        | 000      | rd          | 0110011 | ADD   |
| 0100000               | rs2   | rs1        | 000      | rd          | 0110011 | SUB   |
| 0000000               | rs2   | rs1        | 001      | rd          | 0110011 | SLL   |
| 0000000               | rs2   | rs1        | 010      | rd          | 0110011 | SLT   |
| 0000000               | rs2   | rs1        | 011      | rd          | 0110011 | SLTU  |
| 0000000               | rs2   | rs1        | 100      | rd          | 0110011 | XOR   |
| 0000000               | rs2   | rs1        | 101      | rd          | 0110011 | SRL   |
| 0100000               | rs2   | rs1        | 101      | rd          | 0110011 | SRA   |
| 0000000               | rs2   | rs1        | 110      | rd          | 0110011 | OR    |
| 0000000               | rs2   | rs1        | 111      | rd          | 0110011 | AND   |
| fm pred succ          |       | rs1        | 000      | rd          | 0001111 | FENCE |
| 00000000000           |       | 00000      | 000      | 00000       | 1110011 | ECALL |
| 000000000001          |       | 00000      | 000      | 00000       | 1110011 | EBREA |

Figure 1: RISV Instruction set

#### **Problem 1: Enabling Instructions**

Ignore the FENCE, ECALL, and EBREAK instructions. For the other instructions, which instructions would not be possible to execute on the datapath presented in class for the single-cycle processor we showed in class (Lecture 15 slide 15). For each instruction that cannot be run, please explain what signals/processing units are missing, and how you can modify the datapath to enable them. To clarify any instructions you are unfamiliar with, you can use the RISC-V Instruction Manual (https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf). You can assume the memory is byte-addressable.

## **Problem 2: Encoding Instructions**

Let's assume you wanted to allow a more compact encoding of each instruction (and didn't care about all instructions being the same size, 32-bits), but needed to retain the same size immediates for each instruction that uses one. List each instruction that can be minimized from 32-bits, and explain how the instruction is encoded.

## Problem 3: Implementing Multiply Add

How could you add an FMA (fused multiply add) instruction to the ISA. And FMA takes 3 registers, r1, r2, and rd and does rd = r1\*r2 + rd. Please explain the instruction encoding for this operation, as well as what signals and units (if any) you would add to the single-cycle datapath. Give 2 potential solutions to this problem and their trade-offs relative to each other.

#### Problem 4: Improving CPI

Assume you have the 3 stage pipeline (I, X, M). You can assume that data is available the cycle after M. Additionally you can assume that a load spends 3 cycles in the M stage, and a store takes 2 cycles. Assume that the memory block has 2 read ports and 1 write port, but is not pipelined. This means that at most 2 loads and 1 store can be in the M stage at once. Compute the total number of cycles that are needed to complete the program in each of the given scenarios for the following code block.

- 1. Scenario 1: You have implemented no forwarding. If data is unavailable, the only option to deal with the hazard is to stall. Additionally, you always predict Not Taken on the branch. If a branch is taken, the pipeline must be flushed as soon as it is resolved and the correct instruction is loaded on the next cycle.
- 2. Scenario 2: You implement data forwarding. This means that data is available the cycle after it is executed.
- 3. Scenario 3: You keep the forwarding from part 2 and additionally add a perfect branch predictor, which will never take the wrong branch.
- 4. Scenario 4: You have the forwarding and a perfect branch predictor, and implement a pipelined memory access. This means that loads do not have to stall if both ports are in use. Additionally, stores also do not have to wait for each other. Note that loads that come after a store to the same address must still wait (data hazard).
- 5. Scenario 5: You decide that the pipelined memory access is not worth it. Instead, you implement a cache that holds the most recent 2 addresses accessed (via loads and stores). If the address you try to load or store is in the cache, then the operation can be done in a single cycle. At the beginning of the code block, the cache is empty.

## Problem 5: Memory Block Design

We want to design a SRAM memory block with 1024 32-bit words. We know that in the 6T cell the word line connects to the gate of both access transistors and each bit line connects to the s/d node of one access transistor. We assume that the capacitance of an access transistor gate is equal to the s/d capacitance. In our design we would like to balance the word-line/bit-line delay. How many address bits do we need for the row decoder and how many for the column decoder?

# Problem 6: Expanding Memory

Using as many instances of a memory block that is 256 x 16-bits, draw a circuit for a memory block that is 1024 x 32-bits.

## Problem 7: Implementing a FIFO

Using the pointer based FIFO design presented in class (Lecture 17). Design the circuitry for the the FULL and EMPTY signals in the Status Flag Logic block as well as the *raddr* and *waddr* from the read and write pointer blocks. Assume that the FIFO that is 256 x 8-bits and using a simple dual ported memory block.

## Problem 8: Pre-Decoder

Design the row decoder with 256 outputs using the pre-decoder technique discussed in lecture. You may only use 1,2, and 3-input logic gates.