With the increase in application complexity and data set size, the contribution of memories (on-chip and off-chip) to the total energy consumption of the system is on the rise. This has led to a number of research efforts that have investigated the use of various emerging memory technologies as potential replacements for CMOS memories. Some of these efforts have shown STT-MRAM and PCRAM to be promising candidates for implementing cache and main memory, respectively [11–13, 41–46]. While STT-MRAM and PCRAM have desirable properties such as high density and non-volatility, they also have drawbacks that need to be addressed, such as high write energy and high write latency. In addition, the limited endurance of PCRAM is a major concern. These issues have been studied in detail, and a number of optimizations have been proposed at the device, circuit and architecture levels. Other technologies, such as domain wall memory and memristors, have also attracted significant interest [14, 15, 47–51].
2.1.1 Device and circuit optimizations
At the device level, researchers have optimized the write operation in STT-MRAM by designing different MTJ structures, such as dual-pillar, tilted and dual-barrier MTJs [52–54]. Many of these device proposals decouple the read and write paths, thereby relaxing the read vs. write design conflicts that are commonly present in memory design. Another device-level approach explores newer switching mechanisms, such as thermally-assisted STT switching and resonant switching [55–59], to optimize write operations. At the circuit level, proposals such as 2T-1R cells with dual source lines, adaptive bitline biasing and early write termination aim to reduce the total write energy [60–63]. Circuit-level research has also focused on analyzing the impact of process variations [60, 64], improving read latency through efficient sensing schemes [65] and increasing cell density through multi-level STT-MRAM designs [66]. In [67], the authors studied the design of energy-efficient and robust STT-MRAM arrays and showed the importance of considering array-level tradeoffs when optimizing the stability and energy efficiency of STT-MRAM. In the case of PCRAM, cell structures such as µ-trench [68, 69], wall [70, 71], cross spacer [72] and edge [73] have been proposed to address the high write current. The endurance problem of PCRAM has primarily been addressed through doping of the phase change material [74]. In [75], the authors proposed fine-grained current regulation and voltage upscaling as circuit-level techniques to improve the lifetime of PCRAM.
Domain wall memory (DWM) is a recently proposed spin-based memory that can achieve much higher density than STT-MRAM, PCRAM and other emerging memory technologies [76, 77]. For this reason, it is considered highly promising, and there have been significant efforts towards the realization of DWM [14, 78–80]. Recently, IBM demonstrated a prototype domain wall memory array [20]. The potential use of DWM as shift registers in reconfigurable architectures was proposed in [81].
2.1.2 Architectural design and evaluation
At the architecture level, the impact of inefficient writes in STT-MRAM/PCRAM is mitigated by reducing the number of write operations through suitable architectural design. One approach has been the design of hybrid cache architectures consisting of both CMOS and STT-MRAM/PCRAM [41, 45, 82–88]. The motivation behind such an approach is to selectively direct memory blocks that incur a large number of writes to CMOS memory, while storing the rest in STT-MRAM or PCRAM. Subsequent work proposed cache management policies, such as adaptive line replacement [89], to reduce the writes to STT-MRAM in a hybrid cache architecture. In [85], the authors proposed an adaptive hybrid cache architecture in which part of the cache is reconfigured as a software-controlled scratchpad memory to improve energy efficiency. Another approach to reducing the write intensity of STT-MRAM-based lower-level caches is write-biasing, which increases the residency of dirty blocks to avoid repeated writes [90]. An alternative approach to address write inefficiency is to eliminate redundant writes to memory, either by comparing the data before performing the write operation [63, 91] or by tracking dirty blocks at a finer granularity [92, 93]. In the context of multi-level STT-MRAM caches, set-remapping [66] was proposed as a technique for energy-efficient encoding of bits to multiple resistance levels. To address the performance implications of inefficient writes, write buffers [93, 94] and scheduling mechanisms [95] that prioritize write requests to idle cache banks have been proposed. Recent efforts have proposed volatile STT-MRAM designs that relax non-volatility at the device level, exploiting the short lifetime of data in caches to improve the write efficiency of STT-MRAM [96, 97]. The use of STT-MRAM as a scratchpad memory was explored in [84].
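The placement idea behind write-aware hybrid caches can be illustrated with a small toy model. The sketch below assumes a per-block write counter, a fixed promotion threshold and a small SRAM (CMOS) partition; these parameters and the class name `HybridPlacement` are hypothetical and are not taken from any of the cited designs. The model also ignores the cost of migrating blocks between the two partitions.

```python
from collections import Counter


class HybridPlacement:
    """Toy write-aware placement for a hybrid SRAM + STT-MRAM cache.

    Hypothetical policy: a block is moved into the small SRAM partition
    once its write count reaches `write_threshold`; all other blocks stay
    in STT-MRAM. Migration cost is ignored.
    """

    def __init__(self, sram_capacity: int, write_threshold: int):
        self.sram_capacity = sram_capacity
        self.write_threshold = write_threshold
        self.write_counts = Counter()  # writes observed per block address
        self.sram = set()              # block addresses resident in SRAM
        self.sram_writes = 0
        self.stt_writes = 0

    def _try_promote(self, addr: int) -> None:
        if len(self.sram) < self.sram_capacity:
            self.sram.add(addr)
            return
        # SRAM is full: displace the least-written SRAM block, but only if
        # the candidate has strictly more writes (avoids ping-ponging).
        victim = min(self.sram, key=lambda a: self.write_counts[a])
        if self.write_counts[victim] < self.write_counts[addr]:
            self.sram.remove(victim)
            self.sram.add(addr)

    def write(self, addr: int) -> str:
        """Record a write to `addr` and return the partition that absorbs it."""
        self.write_counts[addr] += 1
        if addr not in self.sram and self.write_counts[addr] >= self.write_threshold:
            self._try_promote(addr)
        if addr in self.sram:
            self.sram_writes += 1
            return "sram"
        self.stt_writes += 1
        return "stt-mram"


if __name__ == "__main__":
    cache = HybridPlacement(sram_capacity=1, write_threshold=3)
    trace = [0x10] * 5 + [0x20]   # one write-hot block, one cold block
    print([cache.write(a) for a in trace])
    # ['stt-mram', 'stt-mram', 'sram', 'sram', 'sram', 'stt-mram']
```

Once the write-hot block is promoted, its remaining writes are absorbed by SRAM while the cold block stays in STT-MRAM, which is the effect hybrid designs aim for. Actual proposals differ in how they predict write intensity and when they migrate blocks.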
For PCRAM, limited endurance can also be addressed by reducing the write intensity [91, 98, 99] through appropriate architectural policies. In [91], the authors proposed “Data comparison write (DCW)”, which avoids writing redundant data into the memory. To improve the efficiency of DCW, the authors of [98] proposed “FlipNwrite”, a technique that increases the number of redundant bits that can be skipped during a write. In [99], the authors investigated data patterns through static and dynamic profiling across different applications and proposed a frequent-value-based PCRAM design, in which data values that are frequently written to PCRAM are stored in compressed form to reduce the write intensity. Another approach to the endurance problem is wear-leveling [44], in which writes are spread evenly across the entire memory array.
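The difference between DCW and FlipNwrite can be made concrete by counting how many cells must be programmed for a single word. The sketch below is a simplified model under stated assumptions: every bit that changes value counts as one programmed cell (SET/RESET asymmetry is ignored), and FlipNwrite is modeled as storing either the new word or its bitwise complement together with one flip bit, choosing whichever requires fewer cell programs. The function names are hypothetical.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def dcw_cost(old: int, new: int) -> int:
    """Data comparison write: compare with the stored word first and
    program only the cells whose value differs from the new data."""
    return popcount(old ^ new)


def flip_n_write(stored: int, flip: int, new: int, width: int):
    """FlipNwrite-style encoding (simplified model).

    `stored` and `flip` are the current cell contents: the data cells and
    the one-bit flip flag. Returns (new_stored, new_flip, cells_programmed).
    """
    mask = (1 << width) - 1
    inverted = ~new & mask
    # Cost of each option = changed data cells + change of the flip cell.
    cost_plain = popcount(stored ^ new) + (flip ^ 0)
    cost_inverted = popcount(stored ^ inverted) + (flip ^ 1)
    if cost_inverted < cost_plain:
        return inverted, 1, cost_inverted
    return new, 0, cost_plain


def flip_n_read(stored: int, flip: int, width: int) -> int:
    """Decode: undo the inversion if the flip bit is set."""
    mask = (1 << width) - 1
    return (~stored & mask) if flip else stored


if __name__ == "__main__":
    old, new = 0x00, 0xFF                 # 8-bit word in which every bit changes
    print(dcw_cost(old, new))             # 8 cells programmed under DCW
    s, f, cost = flip_n_write(old, 0, new, width=8)
    print(hex(s), f, cost)                # 0x0 1 1: only the flip cell changes
    assert flip_n_read(s, f, width=8) == new
```

Because every data cell and the flip cell differ from the stored state under exactly one of the two encodings, the two costs always sum to `width + 1`, so in this model the chosen encoding programs at most `(width + 1) // 2` cells, whereas DCW alone may program all `width` cells.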
Apart from STT-MRAM, DWM and PCRAM, other technologies like memristors and TFET-based SRAMs have also attracted interest in recent years [48–50,100–102].