14 October 2021
Gain-Cell embedded DRAM:
An alternative option for embedded memories
Prof. Adam Teman
Co-director EnICS Labs, Bar-Ilan University
October 14, 2021
© Adam Teman,
Who are we?
Prof. Alex Fish
Prof. Yossie Shor
Prof. Osnat Keren
Prof. Adam Teman
Dr. Itamar Levi
Outline
• Embedded Memories
• GC-eDRAM
• DRT and Refresh
• Other Designs
• Summary
Embedded Memories
The Computer Memory Hierarchy
The Importance of Embedded Memories
• Memories dominate area and power.
Intel Pentium-M (2001): 2MB L3 Cache (Source: Intel)
Intel 10th Gen “Comet Lake” (2020): 20MB L3 Cache (Source: Intel)
Cerebras Wafer Scale Engine 2 (2021): 16GB On-Chip Memory (Source: wccftech.com)
IBM z15 CP and SC chips (2020): 256MB L3 Cache, 960MB L4 Cache
Static is GOOD!
• A static circuit can restore its state after a disturbance.
• High noise margins!
[Figure: cross-coupled inverter pair (Q, QB) and its voltage transfer curves (VQ vs. VQB), marking the ‘0’ and ‘1’ stable states and example node voltages of 0V, 0.4V, 0.8V, and 1V]
SRAM is GOOD!
• SRAM is the exclusive solution for embedded memories in most ICs.
[Figure: 6T SRAM bitcell schematic (M1-M6; WL, BL, BLB, Q, QB)]
TSMC 7nm SRAM (Source: TSMC)
TSMC 5nm SRAM Test Chip (Source: ISSCC 2020)
Samsung 3nm GAA SRAM (Source: ISSCC 2021)
Additional image sources: ISSCC 2020, ST
But… Nobody is Perfect
• SRAM is BIG
• 6 Transistors=1 bit
• SRAM is Leaky
• Several VDD to GND paths
• SRAM is Ratioed
• Fails under voltage scaling
[Figure: 6T SRAM bitcell schematic (M1-M6; WL, BL, BLB) storing Q=0V, QB=1V]
Chang (IBM), IEEE Proc. 2009
Dynamic is SMALL (and that’s GOOD!)
• DRAM can be made with a single transistor
• Up to 3X higher density than SRAM
• But the capacitor is very complicated to fabricate
• 1T-1C DRAM is fabricated in standalone chips at specialized fabs.
[Figures: trench capacitor and stacked capacitor DRAM cells. Sources: CISCO; Kang, McGraw-Hill’03]
Dynamic is complicated…
• Data retention is limited
• Lack of positive feedback in bitcell
• Leakage deteriorates levels
• Requires periodic refresh operations to ensure data integrity
• Memory availability is limited
• During refresh operations, memory cannot be accessed
• Static power is not only leakage
• Also includes refresh power
• Retention Power = Leakage Power + Refresh Power
$$\text{Availability}\,[\%] = 1 - \frac{N_{rows}\,T_{clk}}{T_{ret}}$$
$$P_{ret} = \frac{N_{rows}\,(E_{read} + E_{write})}{T_{ret}} + P_{leak}$$
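As a worked illustration of the availability and retention-power relations on this slide, here is a minimal Python sketch. It assumes one row is refreshed per clock cycle, and that each row refresh costs one read plus one write. The function names and all example numbers are hypothetical and do not come from the talk.

```python
def availability(n_rows: int, t_clk: float, t_ret: float) -> float:
    """Fraction of time the array is free for user accesses.

    Assumption: a full refresh pass takes n_rows * t_clk seconds and must be
    repeated once every retention period t_ret.
    """
    if t_ret <= 0:
        raise ValueError("t_ret must be positive")
    return 1.0 - (n_rows * t_clk) / t_ret


def retention_power(n_rows: int, e_read: float, e_write: float,
                    t_ret: float, p_leak: float) -> float:
    """Retention power in watts: refresh power plus leakage power."""
    if t_ret <= 0:
        raise ValueError("t_ret must be positive")
    refresh_power = n_rows * (e_read + e_write) / t_ret
    return refresh_power + p_leak


if __name__ == "__main__":
    # Hypothetical array: 256 rows, 1 ns clock, 10 us retention time,
    # 1 pJ per row read, 1 pJ per row write, 5 uW leakage.
    a = availability(256, 1e-9, 10e-6)
    p = retention_power(256, 1e-12, 1e-12, 10e-6, 5e-6)
    print(f"availability = {a:.2%}, retention power = {p * 1e6:.1f} uW")
```

Shortening the retention time in this sketch both lowers availability and raises the refresh term of the retention power, which is the trade-off the slide points out.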
But SIZE does matter!
• Higher density trumps complex operation
• 1T-1C embedded DRAM
• Used (until?) recently in high-end servers by IBM, Intel, …
• Provided in some process design kits.
• But…
• Requires expensive fabrication cost adders (e.g., deep trench capacitors).
• Not provided in many process design kits.
• Doesn’t scale well… most advanced node to date: GlobalFoundries 14nm
• Can we provide a logic compatible embedded DRAM?
Source: EET Asia
Gain-Cell embedded DRAM
Introducing the “Gain-Cell”
• 1T-1C DRAM uses a single port for reading and writing
• Write: Drive charge through the port onto a storage capacitor.
• Read:
• Precharge bitline and enable charge sharing through the port
• The charge transferred from the storage node changes the bitline voltage
• A large storage capacitor is required to enable sensing this change
• The read also destroys the stored data, requiring write-back
• What if we provided a decoupled read port?
• We can amplify the stored charge (=“gain”)
• We can separately optimize read and write
• Read becomes non-destructive
• We get two-ported functionality
[Figure: 1T-1C cell with a single R/W port (WL, BL) vs. gain cell with a separate write port (WWL, WBL) and read port (RWL, RBL)]
Basic Gain Cell Operation
• All NMOS 2T Gain Cell
• Write Strong ‘0’, Weak ‘1’
• Boosted voltage for strong ‘1’
• Read:
• Precharge RBL, Pulse RWL low
• SN=‘0’ → RBL unchanged
• SN=‘1’ → RBL discharges
[Figure: all-NMOS 2T gain cell schematic (MW, MR, SN; WBL, WWL, RBL, RWL) with voltage levels Vboost, VDD, and GND for writing ‘0’ and ‘1’]
• All NMOS 3T Gain Cell
• Write is the same
• Read:
• Precharge RBL, Pulse RWL high
• SN=‘0’ → RBL unchanged
• SN=‘1’ → RBL discharges
[Figure: all-NMOS 3T gain cell schematic (MW, MR, SN; WBL, WWL, RBL, RWL)]
Figure annotations:
• RWL driven through diffusion
• RBL discharge dependent on other cells on row
• RBL saturation depends on other cells in column
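The read sequence described on this slide (precharge RBL, pulse RWL, RBL discharges only when SN=‘1’) can be sketched as a logic-level model. This is an illustrative abstraction only: the class name is hypothetical, and analog effects such as the weak ‘1’, leakage, and bitline saturation are ignored.

```python
from dataclasses import dataclass


@dataclass
class GainCellNMOS:
    """Logic-level model of an all-NMOS 2T/3T gain cell (hypothetical helper)."""
    sn: int = 0  # storage node value: 0 or 1

    def write(self, wbl: int) -> None:
        """Pulse WWL: the write transistor MW passes the WBL level onto SN."""
        if wbl not in (0, 1):
            raise ValueError("wbl must be 0 or 1")
        self.sn = wbl

    def read(self) -> int:
        """Precharge RBL, pulse RWL, then sense RBL. SN is never modified."""
        rbl = 1               # RBL precharged high
        if self.sn == 1:      # read path conducts only when SN='1'
            rbl = 0           # RBL discharges
        return 1 - rbl        # sensed data is the inverse of the final RBL level


if __name__ == "__main__":
    cell = GainCellNMOS()
    cell.write(1)
    print(cell.read(), cell.read())  # repeated reads return the same value
```

Because `read` never touches `sn`, the model reflects the non-destructive read that the decoupled read port provides.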
GC-eDRAM Advantages
• Compared to SRAM:
• Smaller cell size (2-4T vs. 6T)
• Low leakage
• Non-ratioed
• Two-ported
• Compared to 1T-1C DRAM
• Logic-compatible
• Non-destructive read
• SRAM-like performance
[Figure: gain cell compared with the 6T SRAM bitcell and with the 1T-1C DRAM cell]
But, charge leaks away…
• Subthreshold conduction
• Exponentially depends on MW’s VT, VGS, and temp
• Depends on voltage difference between SN and WBL
• GIDL and junction leakage
• Asymmetrical between ‘1’ and ‘0’, Increases with temperature
• Gate leakage
• Asymmetrical between ‘1’ and ‘0’, Independent of temperature
[Figure panels: leakage paths when storing ‘1’ and when storing ‘0’]
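To get a feel for the orders of magnitude involved, a first-order retention estimate divides the charge that can be lost by the leakage current. The sketch below assumes a constant net leakage current, which is a deliberate simplification (the slide notes that leakage depends on the SN and WBL voltages and on temperature). The function name and the numbers are hypothetical.

```python
def estimate_drt(c_sn: float, delta_v: float, i_leak: float) -> float:
    """First-order retention time in seconds: t = C_SN * delta_V / I_leak.

    c_sn    : storage node capacitance in farads
    delta_v : voltage droop on SN that can be tolerated before a read fails
    i_leak  : net leakage current in amperes (assumed constant here)
    """
    if i_leak <= 0:
        raise ValueError("i_leak must be positive")
    return c_sn * delta_v / i_leak


if __name__ == "__main__":
    # Hypothetical values: 1 fF storage node, 0.3 V tolerable droop, 1 pA leakage.
    drt = estimate_drt(1e-15, 0.3, 1e-12)
    print(f"estimated DRT = {drt * 1e6:.0f} us")
```

Under this simplification, halving the storage capacitance or doubling the leakage halves the estimated retention time.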
Write access statistics
• Sub-threshold leakage depends on the relation between SN and WBL
• Scenario 1: Worst-case access
• After writing a cell, WBL is permanently opposite to stored data
• Scenario 2: Retention mode
• After writing the memory array, it remains in idle or read states, so WBL can be controlled: pre-(dis)charge or bias WBL
[Figure panels: Continuously Write ‘1’; Write ‘0’]
Data Retention Time Measurement
• Data Retention Time (DRT) is the time from write until you can no longer read out the data.
• Various approaches for measuring:
• Effective data retention time (EDRT)
• Voltage-based data retention time (VDRT)
• Current-based data retention time (IDRT)
Source: R. Giterman, A. Bonetti, T. Noy, A. Teman, and A. Burg, IEEE Transactions on Circuits and Systems I (TCAS-I), 2020
Dealing with Refresh
The problems with Data Retention Time
• The main barrier for GC-eDRAM is its limited DRT, which leads to:
• Increased power consumption: $P_{ret} \propto 1/DRT$
• Lower availability: $\text{Availability} \propto 1 - T_{refresh}/DRT$
• This gets worse with transistor scaling, as the parasitic capacitance is reduced: $DRT \propto C_G \propto W \cdot L$
• In addition, DRT is a complex factor, as it depends on:
• Written voltage levels (Vboost, CI/CF)
• Read frequency [slide annotation relating $I_{RBL}$, $V_{SN}$, and $1/T_{ret}$]
• Write statistics
• Data stored in neighboring cells (for 1T read port)
• Accordingly, a wide range of research has focused on extending the DRT
Different Bitcells
• Many combinations of bitcells have been proposed for improving retention time and other circuit characteristics
Bitcell schematics shown:
• All-PMOS 2T (Somasekhar 08, 09)
• 2T1D (Luk 04/05, Chang 07)
• Boosted 3T (Chun 09, 11)
• Asymmetric 2T (Chun 12)
• 3T1D (Luk 06, Harel 21)
Dealing with CMOS Scaling
• The retention time of classic GC-eDRAM options drops significantly below 65nm
• For 28nm operation, a 4T internal-feedback gain cell (IFGC) was invented
• Silicon proven in both 28nm bulk and FD-SOI
[Figure panels: 180nm, 28nm]
Sources: R. Giterman, A. Fish, N. Geuli, E. Mentovich, A. Burg, and A. Teman, IEEE Journal of Solid-State Circuits (JSSC), 2018; R. Giterman, A. Fish, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2017.
Different Technologies
• Bulk CMOS technologies suffer from increasing subthreshold leakage
• 180nm provided DRTs on the order of milliseconds, reduced to tens of µs at 65nm
• The reduced leakage of FD-SOI and FinFET technologies provides new opportunities
Source: R. Giterman, A. Fish, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2017.
28nm FD-SOI Test Chip
16nm FinFET Test Chip
Body Biasing
• In mature processes, body biasing can be applied to lower leakage and extend DRT
• Silicon: 100mV RBB → 2.3X DRT Boost
• Can be more aggressively exploited in FD-SOI processes
Sources: P. Meinerzhagen, A. Teman, A. Fish, and A. Burg, IET Journal of Engineering (JoE), 2013; R. Giterman, A. Bonetti, A. Burg, and A. Teman, IEEE Transactions on Circuits and Systems II (TCAS-II), 2019; J. Narinx, A. Bonetti, N. Frigerio, C. Aprile, A. Burg, and Y. Leblebici, IEEE Asian Solid-State Circuits Conference (A-SSCC), 2019
Refresh Approaches
• Straightforward approach: ordinary periodic refresh (a.k.a., global refresh)
• Sequentially refresh entire array at 1/DRT frequency
• Reduced array availability can limit the adoption of GC-eDRAM
• Requires an access protocol that supports a busy signal
• Not tolerable for all applications
• Not feasible when the time to refresh the array (access time × number of rows) approaches the DRT
• On the fly approaches can improve availability, e.g.:
• Row counters (Xiaoyao, 2007)
• Opportunistic refresh (Kazimirsky, 2016)
[Figure: timeline alternating between Normal Operation and Refresh periods]
Hidden Refresh Algorithm
• Can we ensure 100% Availability?
• In order to provide a “drop-in” replacement for SRAM, a GC-eDRAM macro must ensure 100% array availability.
• Hide the refresh using COIs (copies of instances)
[Figure: memory subarrays plus COIs (invisible to user); Subarray 1 through Subarray N are refreshed in turn within one DRT]
Refreshing FIFOs
• What if the access is strictly ordered, such as in a FIFO? Can we do any better?
• Yes.
• There is an upper bound on the number of interruptions that can occur.
• So we just need to trigger the refresh in time to ensure we can finish on time!
• Leads to very significant power savings (often no refresh is needed!)
A FIFO of size S is guaranteed to be refreshed on time if:
$$N_{DRT} \geq (S+1) + 2(S-1) = 3S-1$$
($N_{DRT}$ is the retention time in clock cycles)
Source: T. Noy and A. Teman, IEEE Transactions on Circuits and Systems I (TCAS-I), 2020
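The FIFO bound stated on this slide (retention time in clock cycles of at least 3S-1 for a FIFO of size S) is easy to check numerically. The sketch below only evaluates that inequality; it does not model the refresh controller itself, and the function names are hypothetical.

```python
def fifo_refresh_guaranteed(fifo_size: int, n_drt: int) -> bool:
    """True if N_DRT >= (S + 1) + 2 * (S - 1), i.e. N_DRT >= 3S - 1.

    fifo_size : FIFO depth S (number of entries)
    n_drt     : data retention time expressed in clock cycles
    """
    if fifo_size < 1:
        raise ValueError("fifo_size must be at least 1")
    return n_drt >= (fifo_size + 1) + 2 * (fifo_size - 1)


def max_fifo_size(n_drt: int) -> int:
    """Largest S with 3S - 1 <= N_DRT, or 0 if no size satisfies the bound."""
    return max((n_drt + 1) // 3, 0)


if __name__ == "__main__":
    n_drt = 1000  # hypothetical: 10 us retention at a 100 MHz clock
    print(max_fifo_size(n_drt), fifo_refresh_guaranteed(64, n_drt))
```

Given a measured DRT and a clock frequency, `max_fifo_size` gives the deepest FIFO for which the slide's condition holds.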
Replica Cells
• Utilize replica cells to track changes in data retention time caused by process variations and write statistics
• Silicon: 5X longer DRT, 5X lower refresh power
[Figure panels: Calibrated die: VDD tracking; Un-calibrated: W-disturb tracking; 5X annotation]
Internal Refresh
• Multi-Ported Gain-Cell
• Overlapping Refresh
• Double-pumped Read
Sources: O. Harel, Y. Nachum, R. Giterman, Microelectronics Journal (MEJ), 2020; E. Levy, A. Sfez, R. Golman, O. Harel, and A. Teman, IEEE Int. Symp. on Circuits & Systems (ISCAS), 2020
Other Designs and Use Cases
Low-leakage Hybrid Memory
• A hybrid SRAM/GC-eDRAM cell can provide ultra-low leakage by:
• Power gating the supply during standby
• Relying on the dynamic storage of GC-eDRAM
• Using the SRAM latch to refresh the data
Sources: R. Giterman, A. Teman, P. Meinerzhagen, IEEE Transactions on Circuits and Systems II (TCAS-II), 2017; R. Giterman, A. Teman, P. Meinerzhagen, IEEE Int. Symp. on Circuits & Systems (ISCAS), 2017
Radiation-Hardened Dynamic Memory
• A conventional 2T gain-cell is only susceptible to a one-direction bit-flip
• Combine complementary 2T cells and one will never fail!
• When reading, if both outputs are complementary → No error
• If both outputs are the same (presumably data ‘1’) → an error has occurred
• Add parity to correct the error!
• Can also be used for retention time extension.
Sources: R. Giterman, L. Atias, and A. Teman, IEEE Transactions on VLSI (TVLSI), 2016; R. Giterman, L. Atias, and A. Teman, US Patent 10,991,421
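The detect-then-correct idea on this slide can be sketched at the logic level. The code below is an illustration under stated assumptions, not the published circuit: each bit is stored as a (true, complement) pair, an upset is assumed to make both cells of a pair read the same value, and a single even-parity bit per word is assumed to repair at most one flagged pair. All names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (true cell, complementary cell)


def encode(data: List[int]) -> List[Pair]:
    """Append an even-parity bit, then store every bit as a complementary pair."""
    parity = 0
    for bit in data:
        parity ^= bit
    return [(bit, 1 - bit) for bit in data + [parity]]


def decode(stored: List[Pair]) -> List[int]:
    """Read back the data bits, repairing at most one pair flagged as corrupted."""
    bits: List[int] = []
    flagged: List[int] = []
    for index, (true_cell, comp_cell) in enumerate(stored):
        if true_cell != comp_cell:
            bits.append(true_cell)   # complementary outputs: no error
        else:
            bits.append(0)           # equal outputs: value unknown for now
            flagged.append(index)
    if len(flagged) > 1:
        raise ValueError("more than one corrupted pair; one parity bit cannot correct this")
    if flagged:
        missing = flagged[0]
        recovered = 0
        for index, bit in enumerate(bits):
            if index != missing:
                recovered ^= bit     # even parity: XOR over the whole word is 0
        bits[missing] = recovered
    return bits[:-1]                 # drop the parity bit


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[1] = (1, 1)                 # simulated upset: both cells of pair 1 read '1'
    print(decode(word))
```

Because the pair itself reveals where the error is, one parity bit is enough to correct it, whereas parity alone could only detect it.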
True Approximate Storage
• Approximate computing does not require 100% error-free operation.
• However, this requires “graceful degradation”
• This is an inherent trait of DRT failures
[Figure panels: 28nm GC-eDRAM with reduced refresh frequency (panels labeled 1us, 5us, 10us, 50us); Integrated dynamic and static RAM (iD-SRAM)]
Sources: A. Teman, G. Karakonstantis, R. Giterman, P. Meinerzhagen, and A. Burg, DATE 2015; S. Ganapathy, A. Teman, R. Giterman, A. Burg, and G. Karakonstantis, IEEE NEWCAS, 2015; R. Giterman, A. Fish, N. Geuli, E. Mentovich, A. Burg, and A. Teman, IEEE Journal of Solid-State Circuits (JSSC), 2018
Ternary Bitcells
• Static bitcells are bi-stable
• And therefore, can only store two values (VDD and GND)
• But dynamic circuits can be at intermediate levels
• This provides the capability to implement a multi-level cell
• A 5T bitcell allows digital readout of ternary values
• Can be used for higher density
• Can be used for ternary logic (e.g., ternary weights)
Stored value | SN level | Code | Readout
‘0’ | GND | 100 | ‘11’
‘1’ | VDD/2 | 010 | ‘01’
‘2’ | VDD | 001 | ‘00’
Precharge: RBLN→VDD, RBLP→GND
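The digital readout of the ternary cell maps a two-bit code to one of three stored values (‘11’ for SN=GND, ‘01’ for SN=VDD/2, ‘00’ for SN=VDD). A minimal sketch of that decoding follows, together with a helper that packs ternary digits into an integer. The bit ordering within the readout string and the helper names are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Readout code -> stored ternary value, as listed on the slide.
READOUT_TO_TRIT: Dict[str, int] = {"11": 0, "01": 1, "00": 2}


def decode_trit(readout: str) -> int:
    """Translate a two-bit readout string into the stored ternary value."""
    try:
        return READOUT_TO_TRIT[readout]
    except KeyError:
        raise ValueError(f"invalid readout code: {readout!r}") from None


def trits_to_int(trits: Iterable[int]) -> int:
    """Interpret ternary digits (most significant first) as a base-3 integer."""
    value = 0
    for trit in trits:
        if trit not in (0, 1, 2):
            raise ValueError("ternary digits must be 0, 1, or 2")
        value = value * 3 + trit
    return value


if __name__ == "__main__":
    readouts = ["00", "01", "11"]            # SN = VDD, VDD/2, GND
    trits = [decode_trit(r) for r in readouts]
    print(trits, trits_to_int(trits))
```

Each ternary cell carries log2(3), roughly 1.58, bits of information, which is where the density benefit mentioned on the slide comes from.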
Cryogenic GC-eDRAM
• Cryogenic operation is used for certain applications:
• Quantum computing, Infra-red imaging, HPC
• Subthreshold leakage is highly suppressed under these conditions
• Dynamic memories could be a great option!
Summary and Conclusion
A decade of GC-eDRAM research
• I started researching gain cells in 2012
• More than 40 published papers.
• One full-length book.
• 13 taped out test chips
• And much more to come…
• In memory computing
• Dynamic CAMs
• Reliability studies
• and more
• One thing is clear: GC-eDRAM is different from other memories and requires specialized, targeted research
Test chips:
GREENBELT1 (2012) 180nm
GREENBELT2 (2013) 180nm
CAMEL (2014) 65nm
dynOR (2015) 28 FDSOI
DAFNA (2016) 28nm
BEER (2017) 28 FDSOI
MARTINI (2018) 28 FDSOI
KWAK (2019) 28 FDSOI
ERGODEC (2020) 28 FDSOI
NEGEV (2020) 16 FinFET
Rosetta (2020) 65nm
Sansa (2021) 16 FinFET
LEO-I (2021) 65nm
Architectural Modeling
• Large variety of design tradeoffs:
• Read and write peripherals: power vs. access time
• Different bit-cells: area vs. retention time
• Geometry of basic array: rows/columns
• Breakdown into sub-arrays for larger arrays
• GEMTOO – a GC-eDRAM Modeling Tool
[Figure: GEMTOO Modelling Tool, relating Bitcell Topology and Memory Organization to Data Retention Time, Access Time, Silicon Area, Refresh Rate, Memory Bandwidth, and Memory Density]
GEMTOO available for download at:
https://www.epfl.ch/labs/tcl/resources-and-sw/gemtoo-a-gain-cell-embedded-dram-modeling-tool/
And the next step: RAAAM
Dr. Robert Giterman, CEO
Prof. Andreas Burg, CTO
Prof. Adam Teman, Technology Advisor
Prof. Alex Fish, Technology Advisor
Mr. Danny Biran, Business Advisor
Delivering the Highest Density Volatile Embedded Memories in Standard CMOS
Reduced Cost | Longer Battery-Life | Better Performance
Newest addition to the Silicon Catalyst family
October 2021
https://raaam-tech.com/
Thank you