
14 October 2021

Gain-Cell embedded DRAM:

An alternative option for embedded memories

Prof. Adam Teman

Co-director EnICS Labs, Bar-Ilan University


© Adam Teman,

Who are we?


Prof. Alex Fish

Prof. Yossie Shor

Prof. Osnat Keren

Prof. Adam Teman

Dr. Itamar Levi

Outline

• Embedded Memories
• GC-eDRAM
• DRT and Refresh
• Other Designs
• Summary

Embedded Memories


The Computer Memory Hierarchy


The Importance of Embedded Memories

• Memories dominate area and power.

[Figure: die photos of Intel Pentium-M (2001), Intel 10th Gen “Comet Lake” (2020), IBM z15 CP and SC chips (2020), and Cerebras Wafer Scale Engine 2 (2021). Labels: 2MB L3 cache; 20MB L3 cache; 256MB L3 cache and 960MB L4 cache (IBM z15); 16GB on-chip memory. Sources: Intel, wccftech.com.]


Static is GOOD!

• A static circuit can replenish its state in light of a disruption.

• High noise margins!

[Figure: cross-coupled inverter latch (nodes Q and QB, 0V/1V levels) restoring its ‘0’ and ‘1’ states after a disturbance, e.g., a ‘0’ pushed up to 0.4V.]
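To illustrate how positive feedback restores a disturbed state, here is a minimal Python sketch of two cross-coupled inverters. The tanh transfer curve and the gain value are illustrative assumptions, not a transistor model; the 0.4V disturbance of a stored ‘0’ with VDD = 1V follows the figure.

```python
import math

VDD = 1.0    # supply voltage in volts (0V/1V levels as in the figure)
GAIN = 10.0  # assumed inverter gain around the switching point (illustrative)


def inverter(v_in: float) -> float:
    """Idealized inverter transfer curve: a smooth tanh, not a device model."""
    return 0.5 * VDD * (1.0 - math.tanh(GAIN * (v_in - 0.5 * VDD)))


def settle_latch(q: float, steps: int = 20) -> tuple[float, float]:
    """Let a cross-coupled inverter pair settle from a (disturbed) Q voltage.

    Each iteration updates Q from QB and then QB from Q, mimicking the
    positive-feedback loop of the static latch.
    """
    qb = inverter(q)
    for _ in range(steps):
        q = inverter(qb)
        qb = inverter(q)
    return q, qb


if __name__ == "__main__":
    # A '0' disturbed up to 0.4V is pulled back toward 0V, while QB returns toward VDD.
    print(settle_latch(0.4))
```

A dynamic storage node has no such loop: once leakage or noise moves its voltage, nothing pulls it back.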


SRAM is GOOD!

• SRAM is the exclusive solution for embedded memories in most ICs.

[Figure: 6T SRAM bitcell (M1-M6; wordline WL, bitlines BL/BLB, storage nodes Q/QB) and SRAM examples: TSMC 7nm SRAM (source: TSMC), TSMC 5nm SRAM test chip (source: ISSCC 2020), Samsung 3nm GAA SRAM (source: ISSCC 2021), and an ST example (source: ST).]


But… Nobody is Perfect

• SRAM is BIG
  • 6 transistors = 1 bit
• SRAM is leaky
  • Several VDD-to-GND leakage paths
• SRAM is ratioed
  • Fails under voltage scaling

[Figure: 6T SRAM bitcell (M1-M6; WL, BL/BLB, Q/QB) with 0V/1V levels. Source: Chang (IBM), IEEE Proc. 2009.]


Dynamic is SMALL (and that’s GOOD!)

• DRAM can be made with a single transistor
  • Up to 3X higher density than SRAM
• But the capacitor is very complicated to fabricate
  • 1T-1C DRAM is fabricated in standalone chips at specialized fabs.

[Figure: 1T-1C DRAM cells with trench and stacked capacitors. Sources: CISCO; Kang, McGraw-Hill ’03.]


Dynamic is complicated…

• Data retention is limited
  • Lack of positive feedback in the bitcell
  • Leakage deteriorates the stored levels
  • Periodic refresh operations are required to ensure data integrity
• Memory availability is limited
  • During refresh operations, the memory cannot be accessed
• Static power is not only leakage
  • It also includes refresh power
  • Retention power = leakage + refresh

$$\text{Availability}\,[\%] = \left(1 - \frac{N_{rows}\,T_{clk}}{T_{ret}}\right) \times 100$$

$$P_{ret} = \frac{N_{rows}}{T_{ret}}\left(E_{read} + E_{write}\right) + P_{leak}$$
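The two expressions above can be evaluated directly. The following minimal Python sketch assumes, as the formulas imply, that each row refresh costs one clock cycle plus one read and one write-back; the numbers in the example are hypothetical.

```python
def availability(n_rows: int, t_clk: float, t_ret: float) -> float:
    """Fraction of time the array is free for user accesses.

    Implements Availability = 1 - N_rows * T_clk / T_ret, assuming each row
    refresh occupies the array for one clock cycle every retention period.
    """
    busy_fraction = n_rows * t_clk / t_ret
    if busy_fraction >= 1.0:
        raise ValueError("refresh cannot complete within one retention period")
    return 1.0 - busy_fraction


def retention_power(n_rows: int, t_ret: float, e_read: float,
                    e_write: float, p_leak: float) -> float:
    """P_ret = N_rows / T_ret * (E_read + E_write) + P_leak.

    Every row is read and written back once per retention period T_ret.
    """
    refresh_power = n_rows / t_ret * (e_read + e_write)
    return refresh_power + p_leak


if __name__ == "__main__":
    # Hypothetical array: 128 rows, 2 ns clock, 10 us retention time.
    print(f"availability = {availability(128, 2e-9, 10e-6):.2%}")
    # Hypothetical 1 pJ per row read and per row write-back, 1 uW leakage.
    print(f"P_ret = {retention_power(128, 10e-6, 1e-12, 1e-12, 1e-6):.3e} W")
```

Halving T_ret doubles both the refresh power term and the unavailable fraction, which is why the retention time dominates GC-eDRAM design.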


But SIZE does matter!

• Higher density trumps complex operation
• 1T-1C embedded DRAM
  • Used (until?) recently in high-end servers by IBM, Intel, …
  • Provided in some process design kits.
• But…
  • Requires expensive fabrication cost adders (deep trench capacitor).
  • Not provided in many process design kits.
  • Doesn’t scale well: the most advanced node to date is GlobalFoundries 14nm.
• Can we provide a logic-compatible embedded DRAM?


Source: EET Asia


Gain-Cell embedded DRAM


Introducing the “Gain-Cell”

• 1T-1C DRAM uses a single port for reading and writing
  • Write: drive charge through the port onto a storage capacitor.
  • Read:
    • Precharge the bitline and enable charge sharing through the port.
    • The charge transferred from the storage node changes the bitline voltage.
    • A large storage capacitor is required to make this change large enough to sense.
    • Reading also destroys the stored data, requiring a write-back.
• What if we provided a decoupled read port?
  • We can amplify the stored charge (= “gain”)
  • We can optimize read and write separately
  • Read becomes non-destructive
  • We get two-ported functionality

[Figure: 1T-1C DRAM cell with a single R/W port (WL, BL) vs. a gain cell with a write port (WWL, WBL) and a separate read port (RWL, RBL).]
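The sensing problem of the shared 1T-1C port comes down to charge sharing between the storage capacitor C_S and the bitline capacitance C_BL. Below is a minimal Python sketch, assuming an ideal bitline precharged to VDD/2; the capacitance values are hypothetical.

```python
def charge_share(v_cell: float, v_precharge: float,
                 c_cell: float, c_bitline: float) -> float:
    """Common voltage of cell and bitline after the access transistor opens.

    Charge conservation: C_S*V_cell + C_BL*V_pre = (C_S + C_BL) * V_final.
    """
    return (c_cell * v_cell + c_bitline * v_precharge) / (c_cell + c_bitline)


def bitline_swing(v_cell: float, v_precharge: float,
                  c_cell: float, c_bitline: float) -> float:
    """Signal seen by the sense amplifier: V_final - V_pre."""
    return charge_share(v_cell, v_precharge, c_cell, c_bitline) - v_precharge


if __name__ == "__main__":
    VDD = 1.0
    C_BL = 100e-15  # hypothetical bitline capacitance (100 fF)
    for c_s in (25e-15, 2e-15):  # large dedicated capacitor vs. a small logic-node capacitance
        dv = bitline_swing(VDD, VDD / 2, c_s, C_BL)
        v_after = charge_share(VDD, VDD / 2, c_s, C_BL)
        print(f"C_S = {c_s * 1e15:.0f} fF: swing = {dv * 1e3:.1f} mV, "
              f"stored '1' left at {v_after:.2f} V (needs write-back)")
```

The swing scales with C_S/(C_S + C_BL), and the stored ‘1’ ends up at the shared voltage rather than VDD. A gain cell avoids both issues by sensing through a separate read transistor instead of dumping its charge onto the bitline.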


Basic Gain Cell Operation

• All-NMOS 2T gain cell
  • Write: strong ‘0’, weak ‘1’
    • Boosted WWL voltage for a strong ‘1’
  • Read: precharge RBL, pulse RWL low
    • SN = ‘0’ → RBL unchanged
    • SN = ‘1’ → RBL discharges

[Figure: 2T gain cell (write transistor MW, read transistor MR, storage node SN; WWL, WBL, RWL, RBL), with WWL driven to Vboost instead of VDD when writing.]

• All-NMOS 3T gain cell
  • Write: same as the 2T cell
  • Read: precharge RBL, pulse RWL high
    • SN = ‘0’ → RBL unchanged
    • SN = ‘1’ → RBL discharges

[Figure: 3T gain cell (MW, MR, SN; WWL, WBL, RWL, RBL). Figure notes: RWL driven through diffusion; RBL discharge depends on other cells on the row; RBL saturation depends on other cells in the column.]
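The “strong ‘0’, weak ‘1’” behavior of an NMOS write transistor, and the need for a boosted WWL, follow from MW cutting off once SN rises to within one threshold voltage of its gate. A first-order Python sketch that ignores body effect, charge injection, and leakage; the VT value is a hypothetical example.

```python
def written_level(v_wbl: float, v_wwl: float, v_t: float) -> float:
    """First-order SN voltage after a write through an NMOS write transistor MW.

    Assumes the write is enabled (V_WWL > V_T). MW passes a low level fully,
    but stops conducting once SN reaches V_WWL - V_T, so the stored level is
    the smaller of the two. Body effect and charge injection are ignored.
    """
    return min(v_wbl, v_wwl - v_t)


def min_boost(vdd: float, v_t: float) -> float:
    """Smallest WWL voltage that lets a full VDD pass: V_boost >= VDD + V_T."""
    return vdd + v_t


if __name__ == "__main__":
    VDD, VT = 1.0, 0.4  # hypothetical supply and threshold voltages
    print(f"strong '0':  {written_level(0.0, VDD, VT):.2f} V")
    print(f"weak '1':    {written_level(VDD, VDD, VT):.2f} V")
    print(f"boosted '1': {written_level(VDD, min_boost(VDD, VT), VT):.2f} V")
```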


GC-eDRAM Advantages

• Compared to SRAM:
  • Smaller cell size (2-4T vs. 6T)
  • Low leakage
  • Non-ratioed
  • Two-ported
• Compared to 1T-1C DRAM:
  • Logic-compatible
  • Non-destructive read
  • SRAM-like performance

[Figure: 6T SRAM bitcell (M1-M6; WL, BL/BLB, Q/QB) vs. gain cell vs. 1T-1C DRAM cell.]


But, charge leaks away…

• Subthreshold conduction
  • Depends exponentially on MW’s VT, VGS, and temperature
  • Depends on the voltage difference between SN and WBL
• GIDL and junction leakage
  • Asymmetrical between ‘1’ and ‘0’; increases with temperature
• Gate leakage
  • Asymmetrical between ‘1’ and ‘0’; independent of temperature

[Figure: leakage currents at the storage node when storing ‘1’ and when storing ‘0’.]
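The subthreshold dependencies listed above can be captured with the textbook subthreshold equation. The minimal Python sketch below also turns the leakage into a rough DRT estimate, DRT ≈ C_SN·ΔV/I_leak, assuming a constant leakage current. It ignores GIDL, junction, and gate leakage as well as the temperature dependence of VT; all parameter values (I0, n, C_SN, ΔV) are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def thermal_voltage(temp_k: float) -> float:
    """kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(v_gs: float, v_t: float, v_ds: float, temp_k: float,
                         i0: float = 1e-7, n: float = 1.5) -> float:
    """Textbook subthreshold model for the write transistor MW.

    I = I0 * exp((V_GS - V_T) / (n * kT/q)) * (1 - exp(-V_DS / (kT/q))).
    V_DS is the SN-WBL voltage difference; i0 and n are hypothetical.
    """
    vth = thermal_voltage(temp_k)
    return i0 * math.exp((v_gs - v_t) / (n * vth)) * (1.0 - math.exp(-v_ds / vth))


def estimate_drt(c_sn: float, delta_v_fail: float, i_leak: float) -> float:
    """Time for a constant leakage current to move SN by delta_v_fail."""
    return c_sn * delta_v_fail / i_leak


if __name__ == "__main__":
    C_SN, DV = 2e-15, 0.3  # hypothetical 2 fF storage node, 300 mV failure margin
    for temp in (300.0, 358.0):  # room temperature vs. 85 C
        i = subthreshold_current(v_gs=0.0, v_t=0.4, v_ds=0.5, temp_k=temp)
        print(f"T = {temp:.0f} K: I_leak = {i:.2e} A, "
              f"DRT ~ {estimate_drt(C_SN, DV, i) * 1e6:.1f} us")
```

Setting v_ds to zero makes the subthreshold term vanish, which is why keeping WBL close to the stored value helps retention. In this toy model, the DRT also drops by roughly 5X between 300 K and 358 K.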


Write Access Statistics

• Subthreshold leakage depends on the relation between the SN and WBL voltages
• Scenario 1: worst-case access
  • After a cell is written, WBL is held permanently at the opposite value of the stored data
• Scenario 2: retention mode
  • After the array is written, it remains idle or is only read, so WBL can be controlled: pre-charged, pre-discharged, or biased

[Figure: SN waveforms for the “Write ‘0’” and “Continuously Write ‘1’” cases.]


Data Retention Time Measurement

• Data Retention Time (DRT) is the time from a write until the data can no longer be read out correctly.
• Various approaches for measuring it:
  • Effective data retention time (EDRT)
  • Voltage-based data retention time (VDRT)
  • Current-based data retention time (IDRT)

Sources: R. Giterman, A. Bonetti, T. Noy, A. Teman, and A. Burg, IEEE Transactions on Circuits and Systems I (TCAS-I), 2020


Dealing with Refresh


The Problems with Data Retention Time

• The main barrier for GC-eDRAM is its limited DRT, which leads to:
  • Increased power consumption: $P_{refresh} \propto 1/\mathrm{DRT}$
  • Lower availability: $\text{Availability} \approx 1 - T_{refresh}/\mathrm{DRT}$, where $T_{refresh} = N_{rows} T_{clk}$ is the time to refresh the whole array
• This gets worse with transistor scaling, as the parasitic storage-node capacitance (mainly the read transistor’s gate capacitance $C_G$) shrinks: $\mathrm{DRT} \propto C_G \propto W \cdot L$
• In addition, DRT is a complex factor, as it depends on:
  • Written voltage levels (Vboost, CI/CF)
  • Read frequency: the read current $I_{RBL}$ depends on $V_{SN}$, which decays over time
  • Write statistics
  • Data stored in neighboring cells (for a 1T read port)
• Accordingly, a wide range of research has focused on extending the DRT


Different Bitcells

• Many combinations of bitcells have been proposed for improving retention time and other circuit characteristics

[Figure: previously proposed gain-cell topologies: 2T (Somasekhar 08, 09), all-PMOS 2T (Somasekhar 08, 09), 2T1D (Luk 04/05, Chang 07), boosted 3T (Chun 09, 11), asymmetric 2T (Chun 12), and 3T1D (Luk 06, Harel 21).]


Dealing with CMOS Scaling

• The retention time of classic GC-eDRAM topologies drops significantly below 65nm
• For 28nm operation, a 4T internal-feedback gain cell (IFGC) was invented
  • Silicon proven in both 28nm bulk and FD-SOI

[Figure: gain-cell comparison at 180nm and 28nm.]

Sources: R. Giterman, A. Fish, N. Geuli, E. Mentovich, A. Burg, and A. Teman, IEEE Journal of Solid-State Circuits (JSSC), 2018
R. Giterman, A. Fish, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2017


Different Technologies

• Bulk CMOS technologies suffer from increasing subthreshold leakage
  • 180nm provided DRTs in the millisecond range, dropping to tens of microseconds at 65nm
• The reduced leakage of FD-SOI and FinFET technologies provides new opportunities

Sources: R. Giterman, A. Fish, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2017

[Figure: 28nm FD-SOI test chip; 16nm FinFET test chip.]


Body Biasing

• In mature processes, body biasing can be applied to lower leakage and extend DRT
  • Silicon: 100mV reverse body bias (RBB) → 2.3X DRT boost
• Can be exploited more aggressively in FD-SOI processes

Sources: P. Meinerzhagen, A. Teman, A. Fish, and A. Burg, IET Journal of Engineering (JoE), 2013
R. Giterman, A. Bonetti, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems II (TCAS-II), 2019
J. Narinx, A. Bonetti, N. Frigerio, C. Aprile, A. Burg, and Y. Leblebici, IEEE Asian Solid-State Circuits Conference (A-SSCC), 2019


Refresh Approaches

• Straightforward approach: ordinary periodic refresh (a.k.a. global refresh)
  • Sequentially refresh the entire array at a frequency of 1/DRT
• Reduced array availability can limit the adoption of GC-eDRAM
  • Requires an access protocol that supports a busy signal
  • Not tolerable for all applications
  • Not feasible when the ratio between access time, number of rows, and DRT is poor (i.e., when refreshing all rows takes a large fraction of the DRT)
• On-the-fly approaches can improve availability, e.g.:
  • Row counters (Xiaoyao, 2007)
  • Opportunistic refresh (Kazimirsky, 2016)

[Figure: timeline alternating between normal operation and refresh periods.]


Hidden Refresh Algorithm

• Can we ensure 100% availability?
• To provide a “drop-in” replacement for SRAM, a GC-eDRAM macro must ensure 100% array availability.
• Hide the refresh using COIs (copies of instances)

[Figure: memory subarrays plus COIs (invisible to the user); subarrays 1 through N are refreshed in turn within one DRT.]


Refreshing FIFOs

• What if the access is strictly ordered, such as in a FIFO? Can we do any better?
• Yes.
  • There is an upper bound on the number of interruptions that can occur.
  • So we just need to trigger the refresh early enough to ensure it finishes on time!
  • This leads to very significant power savings (often no refresh is needed!)

A FIFO of size S is guaranteed to be refreshed on time if:

$$N_{DRT} \geq (S+1) + 2(S-1) = 3S-1$$

where $N_{DRT}$ is the retention time in clock cycles.

Sources: T. Noy and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2020
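The FIFO refresh condition is easy to check at design time. Below is a minimal Python sketch, assuming N_DRT is obtained by rounding the retention time down to whole clock cycles; the helper names are hypothetical and only encode the inequality stated above.

```python
def fifo_refresh_guaranteed(fifo_size: int, n_drt: int) -> bool:
    """Sufficient condition: N_DRT >= (S+1) + 2(S-1) = 3S - 1."""
    if fifo_size < 1:
        raise ValueError("FIFO size must be at least 1")
    return n_drt >= (fifo_size + 1) + 2 * (fifo_size - 1)


def max_fifo_size(n_drt: int) -> int:
    """Largest S satisfying 3S - 1 <= N_DRT, i.e. S = floor((N_DRT + 1) / 3)."""
    return (n_drt + 1) // 3


def drt_in_cycles(drt_ns: int, clk_period_ns: int) -> int:
    """Retention time expressed in whole clock cycles (rounded down)."""
    return drt_ns // clk_period_ns


if __name__ == "__main__":
    n = drt_in_cycles(drt_ns=10_000, clk_period_ns=2)  # hypothetical 10 us DRT, 500 MHz clock
    print(n, max_fifo_size(n), fifo_refresh_guaranteed(1024, n))
```

Integer nanoseconds are used so the cycle count is exact; with floating-point seconds, a product such as 10e-6 × 500e6 can land a hair below an integer and lose a cycle when truncated.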


Replica Cells

• Use replica cells to track the data retention time under process variations and write statistics
• Silicon: 5X longer DRT, 5X lower refresh power

[Figure: calibrated die (VDD tracking) vs. uncalibrated die (write-disturb tracking), showing the 5X improvement.]


Internal Refresh

[Figure panels: multi-ported gain cell; overlapping refresh; double-pumped read.]

Sources: O. Harel, Y. Nachum, and R. Giterman, Microelectronics Journal (MEJ), 2020
E. Levy, A. Sfez, R. Golman, O. Harel, and A. Teman, IEEE Int. Symp. on Circuits & Systems (ISCAS), 2020


Other Designs and Use Cases


Low-leakage Hybrid Memory

• A hybrid SRAM/GC-eDRAM cell can provide ultra-low leakage by:
  • Power gating the supply during standby
  • Relying on the dynamic storage of the GC-eDRAM
  • Using the SRAM latch to refresh the data

Sources: R. Giterman, A. Teman, and P. Meinerzhagen, IEEE Transactions on Circuits and Systems II (TCAS-II), 2017
R. Giterman, A. Teman, and P. Meinerzhagen, IEEE Int. Symp. on Circuits & Systems (ISCAS), 2017


Radiation-Hardened Dynamic Memory

• A conventional 2T gain cell is susceptible to bit flips in only one direction
• Combine complementary 2T cells, and one of them will never fail!
  • When reading, if both outputs are complementary → no error
  • If both outputs are the same (presumably data ‘1’) → an error has occurred
  • Add parity to correct the error!
• Can also be used for retention time extension.

Sources: R. Giterman, L. Atias, and A. Teman, IEEE Transactions on VLSI (TVLSI), 2016
R. Giterman, L. Atias, and A. Teman, US Patent 10,991,421
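The detect-then-correct idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the published circuit: each data bit is stored as a complementary pair, a pair that reads back equal marks a known-bad bit position (an erasure), and a single even-parity bit per word, assumed to be stored reliably, restores that bit. It corrects at most one failed pair per word.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def encode(bits: List[int]) -> Tuple[List[Pair], int]:
    """Store each bit as (d, not d) and compute an even-parity bit for the word."""
    pairs = [(b, 1 - b) for b in bits]
    return pairs, sum(bits) % 2


def decode(pairs: List[Pair], parity: int) -> List[int]:
    """Recover the word, fixing at most one pair whose outputs read back equal."""
    data: List[int] = []
    bad: List[int] = []
    for i, (a, b) in enumerate(pairs):
        if a != b:
            data.append(a)  # complementary outputs: no error in this pair
        else:
            data.append(0)  # placeholder; the true value is unknown
            bad.append(i)
    if len(bad) > 1:
        raise ValueError("more than one failed pair: not correctable with one parity bit")
    if bad:
        # Choose the missing bit so that the word's parity matches the stored parity.
        data[bad[0]] = (parity - sum(data)) % 2
    return data


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    pairs, p = encode(word)
    pairs[1] = (1, 1)  # one-directional flip: the cell holding '0' upsets to '1'
    print(decode(pairs, p))
```

Because the complementary pair already reveals which bit failed, one parity bit suffices to fix it; plain parity without the pair could only detect the error.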


True Approximate Storage

• Approximate computing does not require 100% error-free operation.
• However, it does require “graceful degradation”
  • This is an inherent trait of DRT failures

Sources: A. Teman, G. Karakonstantis, R. Giterman, P. Meinerzhagen, and A. Burg, DATE 2015
S. Ganapathy, A. Teman, R. Giterman, A. Burg, and G. Karakonstantis, IEEE NEWCAS, 2015
R. Giterman, A. Fish, N. Geuli, E. Mentovich, A. Burg, and A. Teman, IEEE Journal of Solid-State Circuits (JSSC), 2018

[Figure: 28nm GC-eDRAM with reduced refresh frequency (panels at 1us, 5us, 10us, and 50us); integrated dynamic and static RAM (iD-SRAM).]


Ternary Bitcells

• Static bitcells are bistable
  • Therefore, they can only store two values (VDD and GND)
• But dynamic circuits can hold intermediate levels
  • This provides the capability to implement a multi-level cell
• A 5T bitcell allows digital readout of ternary values
  • Can be used for higher density
  • Can be used for ternary logic (e.g., ternary weights)

Precharge: RBLN → VDD, RBLP → GND
• ‘0’ (code 100): SN = GND → readout ‘11’
• ‘1’ (code 010): SN = VDD/2 → readout ‘01’
• ‘2’ (code 001): SN = VDD → readout ‘00’
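The ternary readout table can be expressed as a small decoder. In this Python sketch, the two comparison thresholds (VDD/4 and 3VDD/4) are hypothetical stand-ins for the two sensing paths that produce the 2-bit readout; the actual trip points depend on the 5T cell design.

```python
READOUT_TO_VALUE = {"11": 0, "01": 1, "00": 2}  # readout codes from the table above


def readout_bits(v_sn: float, vdd: float) -> str:
    """Map a storage-node voltage to the 2-bit readout.

    Hypothetical thresholds: the first bit is '1' only for SN near GND,
    the second bit is '1' unless SN is near VDD.
    """
    first = "1" if v_sn < 0.25 * vdd else "0"
    second = "1" if v_sn < 0.75 * vdd else "0"
    return first + second


def decode_ternary(bits: str) -> int:
    """Translate a readout code into the stored ternary value 0, 1, or 2."""
    if bits not in READOUT_TO_VALUE:
        raise ValueError(f"invalid readout code {bits!r}")
    return READOUT_TO_VALUE[bits]


if __name__ == "__main__":
    VDD = 1.0
    for v in (0.0, VDD / 2, VDD):
        code = readout_bits(v, VDD)
        print(f"SN = {v:.2f} V -> readout '{code}' -> value {decode_ternary(code)}")
```

With monotonic thresholds the code ‘10’ can never occur, so a decoder can treat it as an error flag.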


Cryogenic GC-eDRAM

• Cryogenic operation is used for certain applications:
  • Quantum computing, infrared imaging, HPC
• Subthreshold leakage is highly suppressed under these conditions
• Dynamic memories could be a great option!
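One way to see why subthreshold leakage collapses at low temperature: in the ideal, Boltzmann-limited subthreshold model, the swing is SS = n·(kT/q)·ln 10, so the off-current at a given VT drops by one decade per SS of threshold voltage. The Python sketch below evaluates this ideal limit with a hypothetical n = 1 and VT = 0.4V. Real devices deviate at cryogenic temperatures (the measured swing saturates well above the ideal value), so treat the numbers as an upper bound on the improvement.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def ideal_swing(temp_k: float, n: float = 1.0) -> float:
    """Ideal subthreshold swing in V/decade: n * (kT/q) * ln(10)."""
    return n * K_B * temp_k / Q_E * math.log(10)


def off_current_decades(v_t: float, temp_k: float, n: float = 1.0) -> float:
    """Decades of current suppression between V_GS = V_T and V_GS = 0."""
    return v_t / ideal_swing(temp_k, n)


if __name__ == "__main__":
    for temp in (300.0, 77.0):  # room temperature vs. liquid nitrogen
        print(f"T = {temp:.0f} K: SS = {ideal_swing(temp) * 1e3:.1f} mV/dec, "
              f"off-current suppression ~ {off_current_decades(0.4, temp):.1f} decades")
```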


Summary and Conclusion


A decade of GC-eDRAM research

• I started researching gain cells in 2012
  • More than 40 published papers.
  • One full-length book.
  • 13 taped-out test chips
• And much more to come…
  • In-memory computing
  • Dynamic CAMs
  • Reliability studies
  • And more
• One thing is clear: GC-eDRAM is different from other memories and requires specialized, targeted research

Test chips: GREENBELT1 (2012, 180nm), GREENBELT2 (2013, 180nm), CAMEL (2014, 65nm), dynOR (2015, 28nm FD-SOI), DAFNA (2016, 28nm), BEER (2017, 28nm FD-SOI), MARTINI (2018, 28nm FD-SOI), KWAK (2019, 28nm FD-SOI), ERGODEC (2020, 28nm FD-SOI), NEGEV (2020, 16nm FinFET), Rosetta (2020, 65nm), Sansa (2021, 16nm FinFET), LEO-I (2021, 65nm)


Architectural Modeling

• Large variety of design tradeoffs:
  • Read and write peripherals: power vs. access time
  • Different bitcells: area vs. retention time
  • Geometry of the basic array: rows/columns
  • Breakdown into subarrays for larger arrays
• GEMTOO – a GC-eDRAM Modeling Tool

[Figure: the GEMTOO modeling tool, relating bitcell topology and memory organization to data retention time, access time, silicon area, refresh rate, memory bandwidth, and memory density.]

GEMTOO is available for download at:
https://www.epfl.ch/labs/tcl/resources-and-sw/gemtoo-a-gain-cell-embedded-dram-modeling-tool/


And the next step: RAAAM

• Prof. Andreas Burg, CTO
• Prof. Adam Teman, Technology Advisor
• Dr. Robert Giterman, CEO
• Prof. Alex Fish, Technology Advisor
• Mr. Danny Biran, Business Advisor

Delivering the Highest Density Volatile Embedded Memories in Standard CMOS

Reduced Cost | Longer Battery-Life | Better Performance

Newest addition to the Silicon Catalyst family

October 2021

https://raaam-tech.com/


Thank you
