

EECS151/251A
Spring 2024
Digital Design and Integrated Circuits

Instructor:
John Wawrzynek

Lecture 18 - Energy



# **Energy and Power**

Energy is the ability to do work (W).

Power is rate of expending energy.

Energy Efficiency: energy per operation

$$P = \frac{dW}{dt}$$

- *Handheld and portable* (battery operated):
  - Energy efficiency limits battery life
  - Power is limited by heat



- *Infrastructure and servers* (connected to power grid):
  - Energy efficiency dictates operation cost
  - Power: heat removal contributes to TCO



Sad fact: Computers turn electrical energy into heat. Computation is a byproduct.

### **Energy and Performance**

Air or water carries heat away, or chip melts.



### Old example: Cooling an iPod nano ...

portable media player

2005-2017



Like resistor on last slide, iPod relies on passive transfer of heat from case to the air.

Why? Users don't want fans in their pocket ...

To stay "cool to the touch" via passive cooling, power budget of 5 W.

# Powering an iPod nano (2005 edition)



1.2 W-hour battery: Can supply 1.2 watts of power for 1 hour.

$1.2\ \text{W-hr} / 5\ \text{W} \approx 15$ minutes.

More W-hours require bigger battery and thus bigger "form factor" -- it wouldn't be "nano" anymore :-).

Real specs for iPod nano:
14 hours for music,
4 hours for slide shows.

85 mW for music. 300 mW for slides.
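
The battery arithmetic on this slide can be checked with a short sketch. The function names are illustrative, and the model assumes a constant power draw over the whole discharge, which a real battery only approximates.

```python
def battery_life_hours(capacity_wh: float, power_w: float) -> float:
    """Hours a battery of the given capacity lasts at a constant power draw."""
    return capacity_wh / power_w


def average_power_mw(capacity_wh: float, hours: float) -> float:
    """Average power in mW implied by draining the battery in the given time."""
    return capacity_wh / hours * 1000.0


CAPACITY_WH = 1.2  # iPod nano (2005) battery rating, from the slide

minutes_at_5w = battery_life_hours(CAPACITY_WH, 5.0) * 60.0  # at the 5 W thermal budget
music_mw = average_power_mw(CAPACITY_WH, 14.0)   # 14 hours of music
slides_mw = average_power_mw(CAPACITY_WH, 4.0)   # 4 hours of slide shows

print(f"{minutes_at_5w:.1f} min at 5 W, {music_mw:.0f} mW music, {slides_mw:.0f} mW slides")
```

The point of the comparison: the thermal limit (5 W) is far above what the battery can sustain, so battery capacity, not heat, sets the real power budget.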







4.7 inch iPhone 6: 1,810 mAh battery @ 3.8 V = 6.88 Wh



*iPhone 5s:* 1,570 mAh @ 3.8 V = 6 Wh
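
Battery capacity in mAh becomes energy in Wh by multiplying by the nominal voltage. A minimal sketch, assuming the 3.8 V nominal voltage quoted above holds over the whole discharge (the helper name is illustrative):

```python
def watt_hours(capacity_mah: float, nominal_voltage_v: float) -> float:
    """Energy in Wh = charge in Ah times nominal cell voltage in V."""
    return capacity_mah / 1000.0 * nominal_voltage_v


iphone6_wh = watt_hours(1810, 3.8)   # 4.7 inch iPhone 6
iphone5s_wh = watt_hours(1570, 3.8)  # iPhone 5s

print(f"iPhone 6: {iphone6_wh:.2f} Wh, iPhone 5s: {iphone5s_wh:.2f} Wh")
```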

- The front side of the logic board:
  - Apple A8 APL1011 SoC + SK Hynix RAM as denoted by the markings H9CKNNN8KTMRWR-NTH (we presume it is 1 GB LPDDR3 RAM, the same as in the iPhone 6 Plus)
  - Qualcomm MDM9625M LTE Modem
  - Skyworks 77802-23 Low Band LTE PAD
  - Avago A8020 High Band PAD
  - Avago A8010 Ultra High Band PA + FBARs
  - SkyWorks 77803-20 Mid Band LTE PAD
  - InvenSense MP67B 6-axis Gyroscope and Accelerometer Combo



#### https://unitedlex.com/insights/apple-iphone-12-pro-max-teardown-report

### iPhone 12:

| iPhone Model      | Battery Capacity |
|-------------------|------------------|
| iPhone 12 Mini    | 2,227 mAh        |
| iPhone 12         | 2,815 mAh        |
| iPhone 12 Pro     | 2,815 mAh        |
| iPhone 12 Pro Max | 3,687 mAh        |
| iPhone 11         | 3,110 mAh        |
| iPhone 11 Pro     | 3,046 mAh        |
| iPhone 11 Pro Max | 3,969 mAh        |



14.13 Wh @ 3.8V (Pro Max)





### Notebooks ... as designed in 2006 ...

2006 Apple MacBook -- 5.2 lbs



Performance: Must be "close enough" to desktop performance ... most people no longer used a desktop (even in 2006).

Size and Weight. Ideal: paper notebook.

Heat: No longer "laptops" -- top may get "warm", bottom "hot". Quiet fans OK.

# Battery: Set by size and weight limits ...



Battery rating: 55 W-hour.

At 2.3 GHz, the Intel Core Duo CPU consumes 31 W running a heavy load - under 2 hours of battery life! And that is just for the CPU!

Almost full 1 inch depth. Width and height set by available space, weight.

At 1 GHz, CPU consumes 13 Watts. "Energy saver" option uses this mode ...

### 50Wh is 180,000 Joules!
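
The notebook numbers above follow from two conversions: W-hours to joules, and capacity divided by load. A small sketch, assuming the CPU is the only load (as the slide does) and using illustrative helper names:

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 W-hour = 1 J/s sustained for 3600 s


def watt_hours_to_joules(wh: float) -> float:
    return wh * JOULES_PER_WATT_HOUR


def hours_on_battery(capacity_wh: float, load_w: float) -> float:
    return capacity_wh / load_w


joules_50wh = watt_hours_to_joules(50.0)
heavy_hours = hours_on_battery(55.0, 31.0)  # 55 W-hour battery, 31 W CPU at full speed
saver_hours = hours_on_battery(55.0, 13.0)  # same battery, 13 W CPU in "energy saver" mode

print(f"{joules_50wh:.0f} J, {heavy_hours:.2f} h heavy load, {saver_hours:.2f} h energy saver")
```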





MacBook Air ... design the laptop like an iPod/iPhone





35 W-h battery: 63% of 2006 MacBook's 55 W-h



# Servers: Total Cost of Ownership (TCO)



Machine rooms are expensive.

Removing heat dictates how many servers to put in a machine room.

Powering the servers + powering the air conditioners is a big part of TCO.

Reliability: running computers hot makes them fail more often.

Computations per W-h double every 1.6 years, going back to the first computer.

(Jonathan Koomey, Stanford).
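
To get a feel for what a 1.6-year doubling period means, the trend can be written as an exponential. This is only a sketch of the stated trend, not a prediction; the function name is illustrative.

```python
DOUBLING_PERIOD_YEARS = 1.6  # doubling period quoted on the slide


def efficiency_gain(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Factor by which computations per W-h grow over the given span."""
    return 2.0 ** (years / doubling_period)


gain_16_years = efficiency_gain(16.0)  # ten doublings
print(f"Over 16 years: about {gain_16_years:.0f}x more computations per W-h")
```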



### CMOS Circuits and Energy

### Switching Energy: Fundamental Physics

#### Every logic transition dissipates energy.





How can we limit switching energy?

- (1) Reduce # of clock transitions. But we have work to do ...
- (2) Reduce Vdd. But lowering Vdd limits the clock speed ...
- (3) Fewer circuits. But more transistors can do more work.
- (4) Reduce C per node. One reason why we scale processes.

# Chip-Level "Dynamic" Power



### Additional Dynamic Power - "short circuit current"





When a gate switches, there is a brief period when both the pullup network and the pulldown network can be on.

Worse when input is changing slowly compared to the output.

### Another Factor: Leakage Currents

Even when a logic gate isn't switching, it burns power.

Isub: Even when this nFET is off, it passes an Ioff leakage current.

We can engineer any Ioff we like, but a lower Ioff also results in a lower Ion, and thus lower maximum clock speed.
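
The Ioff/Ion trade-off can be illustrated with two first-order textbook models. Neither appears on the slides, so both are assumptions here: subthreshold leakage falls by one decade per subthreshold swing of threshold voltage, and on-current follows an alpha-power law in (Vdd - Vt). The swing (100 mV/decade), alpha (1.3), and voltages are hypothetical values chosen only for illustration.

```python
def off_current(vt_v: float, i0: float = 1.0, swing_v_per_decade: float = 0.1) -> float:
    """Relative subthreshold leakage: one decade lower per 'swing' volts of Vt (assumed model)."""
    return i0 * 10.0 ** (-vt_v / swing_v_per_decade)


def on_current(vdd_v: float, vt_v: float, k: float = 1.0, alpha: float = 1.3) -> float:
    """Relative drive current from the alpha-power law (assumed model, requires Vdd > Vt)."""
    return k * (vdd_v - vt_v) ** alpha


VDD = 1.0                     # hypothetical supply voltage
LOW_VT, HIGH_VT = 0.3, 0.4    # hypothetical threshold voltages

leak_reduction = off_current(LOW_VT) / off_current(HIGH_VT)     # how much less the high-Vt device leaks
drive_ratio = on_current(VDD, HIGH_VT) / on_current(VDD, LOW_VT)  # how much drive the high-Vt device keeps

print(f"High Vt leaks {leak_reduction:.0f}x less but keeps only {drive_ratio:.0%} of the drive current")
```

Under these assumptions a 100 mV increase in Vt buys a large leakage reduction for a modest loss in drive current, which is why the choice is made per circuit path rather than chip-wide.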

Intel's 2006 processor designs, leakage vs switching power

A lot of work was done to get a ratio this good ... 50/50 is common. Bill Holt, Intel, Hot Chips 17.

### Plot on a "Log" Scale to See "Off" Current



### Customize processing for product types and different circuit paths ...



From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

- Vt is controlled by channel doping.
- Modern IC processes have 2 or 3 different Vt values available.
- Standard cell libraries offer low Vt and high Vt versions of cells so that the tools can optimize on a per instance basis.
- (If high performance not needed then use high Vt to reduce leakage).



Transistor channel is a raised fin.

Gate controls channel from sides and top.

Channel depth is fin width. 12-15nm for L=22nm.







# Dynamic versus Leakage Power



Figure 1: The reduction of feature sizes from 45 to 7nm may induce drastic gains in power consumption and leakage power [Xie2015]

Xie, Q. (2015). Performance Comparisons between 7-nm FinFET and Conventional Bulk CMOS Standard Cell Libraries. IEEE Transactions on Circuits and Systems II: Express Briefs, 62(8), 761-765.

### Total Power = $P_{\text{switching}} + P_{\text{short-circuit}} + P_{\text{leakage}}$



# Some low-power design techniques



Parallelism and pipelining



**Power-down idle transistors** 



Slow down non-critical paths



Thermal management

### **Trading Hardware for Power**

via Parallelism and Pipelining ...

# Voltage Scaling

$$P_{sw} = \frac{1}{2} \alpha C V_{dd}^2 F$$

Reducing F reduces power, but the computation now takes longer, so total energy does not change.

Reducing both F and Vdd reduces power and also improves energy efficiency (total energy for the computation is less).

Parallelism gives us a way to make up for lower performance from voltage scaling.
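
The two claims above (frequency alone does not change energy; frequency plus voltage does) can be checked numerically from the switching-power formula. This is a sketch that counts only switching power; the activity factor, capacitance, voltages, and cycle count are hypothetical values.

```python
def switching_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """P_sw = 1/2 * alpha * C * Vdd^2 * F."""
    return 0.5 * alpha * c_farads * vdd ** 2 * f_hz


def task_energy(alpha: float, c_farads: float, vdd: float, f_hz: float, cycles: float) -> float:
    """Energy for a fixed number of clock cycles: power times run time (cycles / F)."""
    return switching_power(alpha, c_farads, vdd, f_hz) * (cycles / f_hz)


ALPHA, C, CYCLES = 0.1, 1e-9, 1e9   # hypothetical activity factor, capacitance, workload

e_base = task_energy(ALPHA, C, vdd=1.0, f_hz=1e9, cycles=CYCLES)
e_half_f = task_energy(ALPHA, C, vdd=1.0, f_hz=5e8, cycles=CYCLES)      # F halved, Vdd unchanged
e_half_f_low_v = task_energy(ALPHA, C, vdd=0.7, f_hz=5e8, cycles=CYCLES)  # F halved, Vdd lowered

print(f"F only: {e_half_f / e_base:.2f}x energy, F and Vdd: {e_half_f_low_v / e_base:.2f}x energy")
```

Energy per task depends on Vdd squared and not on F, so only the voltage reduction improves energy efficiency.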



### And so, we can transform this:



Block processes stereo audio. 1/2 of clocks for "left", 1/2 for "right".

### Into this:

Top block processes "left", bottom "right".



THIS MAGIC TRICK BROUGHT TO YOU BY CORY HALL ...

# Chandrakasan & Brodersen (UCB, 1992)

**Minimizing Power Consumption in CMOS Circuits** 

| Architecture       | Power (normalized) |
|--------------------|--------------------|
| Simple             | 1                  |
| Parallel           | 0.36               |
| Pipelined          | 0.39               |
| Pipelined-Parallel | 0.2                |

| Architecture       | Area (normalized) |
|--------------------|-------------------|
| Simple             | 1                 |
| Parallel           | 3.4               |
| Pipelined          | 1.3               |
| Pipelined-Parallel | 3.7               |

| Architecture       | Voltage |
|--------------------|---------|
| Simple             | 5V      |
| Parallel           | 2.9V    |
| Pipelined          | 2.9V    |
| Pipelined-Parallel | 2.0V    |
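
The normalized power column can be approximately reproduced from $P \propto C V_{dd}^2 F$. The voltages come from the table above. The relative capacitance and clock-frequency factors below are not given on the slides; they are assumptions chosen to be consistent with the table (extra capacitance for duplicated hardware and pipeline registers, half-rate clocks for the parallel designs).

```python
V_REF = 5.0  # supply voltage of the "Simple" reference design

# name: (relative switched capacitance [assumed], Vdd in volts [from table], relative clock rate [assumed])
designs = {
    "Simple": (1.0, 5.0, 1.0),
    "Parallel": (2.15, 2.9, 0.5),
    "Pipelined": (1.15, 2.9, 1.0),
    "Pipelined-Parallel": (2.5, 2.0, 0.5),
}


def normalized_power(c_rel: float, vdd: float, f_rel: float) -> float:
    """Power relative to the reference design, from P proportional to C * Vdd^2 * F."""
    return c_rel * (vdd / V_REF) ** 2 * f_rel


power = {name: normalized_power(*params) for name, params in designs.items()}
for name, p in power.items():
    print(f"{name:20s} {p:.2f}")
```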



Anantha P. Chandrakasan

Robert W. Brodersen

#### Example: Intel Graphics Pipeline IP



Fig. 1. Phong Illumination for vertex and pixel shading.





A 2.05 GVertices/s 151 mW Lighting Accelerator for 3D Graphics Vertex and Pixel Shading in 32 nm CMOS

Farhana Sheikh, *Member, IEEE*, Sanu K. Mathew, *Member, IEEE*, Mark A. Anders, *Member, IEEE*, Himanshu Kaul, *Member, IEEE*, Steven K. Hsu, *Member, IEEE*, Amit Agarwal, *Member, IEEE*, Ram K. Krishnamurthy, *Fellow, IEEE*, and Shekhar Borkar, *Fellow, IEEE* 

#### Clock Rate and Power vs Voltage



### Multiple Cores for Low Power

Trade hardware for power, on a large scale ...

### Cell: The PS3 chip














# Cell (PS3 Chip): 1 CPU + 8 "SPUs"



## A "Schmoo" plot for a Cell SPU ...

The lower Vdd, the less dynamic energy consumption.

$$E_{0\to 1} = \frac{1}{2} C V_{dd}^2, \qquad E_{1\to 0} = \frac{1}{2} C V_{dd}^2$$

The lower Vdd, the longer the minimum clock period, and so the slower the maximum clock frequency.



## Clock speed alone doesn't help E/op ...

But, lowering clock frequency while keeping voltage constant spreads the same amount of work over a longer time, so chip stays cooler ...





# Scaling V and f does lower energy/op

1 W to get 2.2 GHz performance. 26 C die temp.

7 W to reliably get 4.4 GHz performance. 47 C die temp.

If a program that needs a 4.4 GHz CPU can be recoded to use two 2.2 GHz CPUs ... big win.
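
The "big win" can be quantified from the two operating points quoted above. This sketch assumes each clock cycle does the same amount of work at either operating point and that the program splits perfectly across two cores, which real programs rarely do.

```python
def energy_per_cycle_nj(power_w: float, freq_ghz: float) -> float:
    """W / GHz = (J/s) / (1e9 cycles/s) = nJ per cycle."""
    return power_w / freq_ghz


slow_nj = energy_per_cycle_nj(1.0, 2.2)  # 1 W at 2.2 GHz
fast_nj = energy_per_cycle_nj(7.0, 4.4)  # 7 W at 4.4 GHz

energy_ratio = fast_nj / slow_nj   # extra energy per cycle at the fast point
two_slow_cores_w = 2 * 1.0         # two 2.2 GHz cores, same total cycles per second
one_fast_core_w = 7.0

print(f"{slow_nj:.2f} nJ/cycle vs {fast_nj:.2f} nJ/cycle ({energy_ratio:.1f}x)")
print(f"Same throughput: {two_slow_cores_w:.0f} W on two cores vs {one_fast_core_w:.0f} W on one")
```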



### Dynamic Voltage/Frequency Scaling (DVFS)



Many modern processors have controls for dynamically changing operating frequency and voltage.

- BIOS/OS software can adjust frequency to reduce heat and/or improve power efficiency when high performance is not needed.
- Adjusting both voltage and frequency helps improve energy efficiency and allows higher frequency for a given power level.
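
One way to picture the software side of DVFS: keep a table of allowed (frequency, voltage) pairs and pick the cheapest one that meets current demand. The sketch below is not how any particular BIOS or OS governor works; the operating points are hypothetical and power is estimated only as $V_{dd}^2 F$.

```python
from typing import List, NamedTuple


class OperatingPoint(NamedTuple):
    freq_ghz: float
    vdd: float


# Hypothetical operating points: higher frequency needs higher voltage.
POINTS = [
    OperatingPoint(1.0, 0.80),
    OperatingPoint(2.0, 0.95),
    OperatingPoint(3.0, 1.10),
]


def relative_power(point: OperatingPoint) -> float:
    """Switching power up to a constant factor: Vdd^2 * F."""
    return point.vdd ** 2 * point.freq_ghz


def pick_point(required_ghz: float, points: List[OperatingPoint] = POINTS) -> OperatingPoint:
    """Lowest-power point that meets the demand, or the fastest point if none does."""
    feasible = [p for p in points if p.freq_ghz >= required_ghz]
    if not feasible:
        return max(points, key=lambda p: p.freq_ghz)
    return min(feasible, key=relative_power)


print(pick_point(0.2), pick_point(1.5), pick_point(5.0))
```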

Design Technique #2 (of 5)

### Powering down idle circuits

# Add "sleep" transistors to logic ...



Example: Floating point unit logic.

When running fixed-point instructions, put logic "to sleep".

+++ When "asleep", leakage power is dramatically reduced.

--- Presence of sleep transistors slows down the clock rate when the logic block is in use.

# Intel example: Sleeping cache blocks



A tiny current supplied in "sleep" maintains SRAM state.

#### Intel Medfield




Switches 45 power "islands."

Fine-grained control of leakage power, to track user activity.

"Race to idle" strategy -- finish tasks quickly, to get to power down.

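A rough energy comparison shows why "race to idle" can pay off when leakage is significant. The sketch assumes both strategies run at the same Vdd, so the dynamic energy for the task is identical and only the time spent leaking differs. All numbers are hypothetical.

```python
def energy_race_to_idle(e_dynamic_j: float, p_leak_w: float, p_sleep_w: float,
                        t_active_s: float, window_s: float) -> float:
    """Finish quickly, then power down the island for the rest of the window."""
    return e_dynamic_j + p_leak_w * t_active_s + p_sleep_w * (window_s - t_active_s)


def energy_stretched(e_dynamic_j: float, p_leak_w: float, window_s: float) -> float:
    """Run slowly at the same Vdd so the task fills the whole window, leaking throughout."""
    return e_dynamic_j + p_leak_w * window_s


E_DYNAMIC_J = 0.10   # hypothetical switching energy for the task
P_LEAK_W = 0.50      # hypothetical leakage power while powered up
P_SLEEP_W = 0.01     # hypothetical residual power while powered down
WINDOW_S = 1.0       # time until the next task arrives
T_ACTIVE_S = 0.25    # time to finish the task at full speed

race_j = energy_race_to_idle(E_DYNAMIC_J, P_LEAK_W, P_SLEEP_W, T_ACTIVE_S, WINDOW_S)
stretched_j = energy_stretched(E_DYNAMIC_J, P_LEAK_W, WINDOW_S)

print(f"race to idle: {race_j:.4f} J, stretched: {stretched_j:.4f} J")
```

If Vdd could also be lowered while running slowly, the comparison would change, since dynamic energy falls with Vdd squared.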


### Playing a game ...





### Watching a video ...





#### Looking at phone screen, not doing anything ...





#### Phone in your pocket, waiting for a call ...





Design Technique #3 (of 5)

### Slow down "slack paths"

## Fact: Most logic on a chip is "too fast"



Most logic paths have hundreds of picoseconds to spare.

From "The circuit and physical design of the POWER4 microprocessor", IBM J Res and Dev, 46:1, Jan 2002, J.D. Warnock et al.

## Use several supply voltages on a chip ...



Why use multi-Vdd? We can reduce dynamic power by using a lower Vdd for logic off the critical path.



In practice, instead of multi-Vdd design ...
In a multi-Vt process, we can reduce leakage power in logic off the critical path by using high-Vt transistors.
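
The per-instance Vt choice can be pictured as a slack check: a cell with enough timing slack to absorb the slower high-Vt version gets swapped. This is a deliberately simplified sketch with hypothetical cell names and delays. It treats each cell's slack independently, whereas a real tool must re-time whole paths because cells on the same path share slack.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    name: str
    slack_ps: float   # timing slack on the worst path through this cell
    vt: str = "low"   # every cell starts as the fast, leaky low-Vt version


HIGH_VT_DELAY_PENALTY_PS = 20.0  # hypothetical extra delay of a high-Vt cell


def assign_vt(cells: List[Cell], penalty_ps: float = HIGH_VT_DELAY_PENALTY_PS) -> List[Cell]:
    """Swap a cell to high Vt when its slack can absorb the delay penalty."""
    for cell in cells:
        if cell.slack_ps >= penalty_ps:
            cell.vt = "high"
            cell.slack_ps -= penalty_ps
    return cells


cells = assign_vt([
    Cell("alu_add", slack_ps=5.0),       # near-critical: stays low Vt
    Cell("decode_mux", slack_ps=150.0),  # plenty of slack: goes high Vt
    Cell("status_reg", slack_ps=300.0),
])
for cell in cells:
    print(cell)
```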

Design Technique #5 (of 5)

### Thermal Management

### Keep chip cool to minimize leakage power



Figure 3: I<sub>CCINTQ</sub> vs. Junction Temperature with Increase Relative to 25°C