The rest of this thesis is organized as follows. In Chapter 2, we present a brief survey of prior research efforts that have focused on various design issues related to emerging technologies. In Chapter 3, we provide background information related to the structure and operation of spin-based memories (STT-MRAM and DWM) and spintronic logic at the device and circuit levels.

In Chapter 4, we describe the design of the memory hierarchy of a domain-specific many-core processor architecture for RMS applications using spin-based memories. The proposed design consists of a two-dimensional array of processing elements along with a two-level on-chip memory hierarchy. The first level is formed by an array of FIFO memory units that provide fast streaming access to data. The second level is a random access memory that stores a sizable part of the data set being processed. Based on these design requirements and the memory device characteristics, we suggest the use of DWM and STT-MRAM to realize the first and second levels, respectively. We evaluate the proposed design for three representative recognition and mining algorithms, namely Support Vector Machines (SVM), k-means clustering, and Generalized Learning Vector Quantization (GLVQ). Our analysis shows that a domain-specific architecture tuned to match the device characteristics with application requirements can yield a 1.5X-4X improvement in energy-delay product compared to a CMOS baseline.

In Chapter 5, we propose Tapestri, in which we address the challenge of high write energy and write latency associated with the MTJ-based write mechanism in spin-based memories. Our proposal is based on the observation that domain wall shifts offer an efficient mechanism to perform write operations. We exploit this fact to develop two bit-cell designs, 1bitDWM and multibitDWM, that are optimized for latency and area, respectively. We explore various circuit-level optimizations for the proposed bit-cells. We show that 1bitDWM can achieve all the benefits offered by STT-MRAM, while matching SRAM in its write efficiency. MultibitDWM, on the other hand, achieves much higher density than SRAM, STT-MRAM, and 1bitDWM at the cost of variable access latencies.

In Chapter 6, we present TapeCache, in which we make the first attempt to design the cache hierarchy of a general-purpose processor using DWM. In this work, we address one of the major design challenges with DWM: the performance penalty due to sequential accesses to data stored in DWM. We propose a circuit-architecture co-design technique consisting of (i) a multi-port read-skewed multibitDWM bit-cell design at the circuit level and (ii) a hybrid cache organization and suitable management policies at the architecture level to maximally harness the performance potential of DWM. Our multi-port read-skewed multibitDWM bit-cell design exploits the read-write asymmetry in cache accesses to reduce the access latency of performance-critical read operations. Our cache management policies exploit the spatial locality property to reduce the impact of sequential accesses on the overall performance of the system. TapeCache achieves a 7.5X improvement in energy and a 7.8X reduction in area at virtually identical performance compared to an iso-capacity CMOS SRAM baseline.
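The shift-dependent access cost of a multi-bit DWM bit-cell can be illustrated with a toy model. The sketch below is not the TapeCache design: the class `DomainWallTape`, the single access port, and the use of shift count as a latency proxy are all assumptions made for illustration. It only shows why access patterns with spatial locality need fewer shifts than scattered ones.

```python
class DomainWallTape:
    """Toy model (hypothetical): a tape of bits with one access port.

    A bit can be read or written only when it is aligned with the port,
    so each access first shifts the tape by the distance to that bit.
    """

    def __init__(self, num_bits):
        self.bits = [0] * num_bits
        self.head = 0          # index of the bit currently aligned with the port
        self.total_shifts = 0  # accumulated shifts, used here as a latency proxy

    def _align(self, index):
        if not 0 <= index < len(self.bits):
            raise IndexError("bit index out of range")
        shifts = abs(index - self.head)
        self.total_shifts += shifts
        self.head = index
        return shifts

    def read(self, index):
        shifts = self._align(index)
        return self.bits[index], shifts

    def write(self, index, value):
        shifts = self._align(index)
        self.bits[index] = value & 1
        return shifts


def count_shifts(num_bits, accesses):
    """Return the total number of shifts needed to serve a read sequence."""
    tape = DomainWallTape(num_bits)
    for index in accesses:
        tape.read(index)
    return tape.total_shifts


# Neighboring accesses versus accesses that jump across the tape.
local_pattern = count_shifts(8, [0, 1, 2, 3, 4, 5, 6, 7])
scattered_pattern = count_shifts(8, [0, 7, 1, 6, 2, 5, 3, 4])
```

In this model the neighboring pattern needs 7 shifts in total, while the scattered pattern needs 28, which is the kind of gap that locality-aware management policies aim to close.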

In Chapter 7, we propose STAG, a Spintronic-Tape Architecture for GPGPU cache hierarchy. STAG employs different DWM bit-cells to realize different memory arrays in the GPGPU cache hierarchy based on their design requirements. To address the performance penalty associated with the shift operations required to access data from a multibitDWM bit-cell, STAG utilizes architectural optimizations that predict the cache access patterns based on the unique characteristics of GPGPU architecture and workloads, and prefetch data that are both likely to be accessed and require a large number of shifts. STAG achieves a 3.3X energy reduction and a 12.1% performance improvement over an SRAM-based cache under iso-area conditions.

In Chapter 8, we focus on the design of spin-based logic. We present Spintastic, in which we propose stochastic computing (SC) as a new direction to realize logic using spin-based devices. We establish the synergy between SC and spintronic logic by demonstrating that their characteristics mutually benefit each other. We show that the physical characteristics of spin devices enable efficient realization of different key components in stochastic logic circuits, while the low logic complexity and logic depth of SC can in turn mitigate some of the drawbacks of spintronic logic. Our experiments show that Spintastic achieves a 7.1X energy reduction over CMOS implementations.

In Chapter 9, we describe the modeling framework that is used to evaluate the proposed designs. We present various self-consistent device models, validated with experimental data, that are used to evaluate different spintronic devices. We also describe Spin-CACTI, a CACTI-based cache simulator that computes various performance metrics of spin-based caches.
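The low logic complexity attributed to stochastic computing in the Chapter 8 summary can be seen in a small software sketch. It assumes the common unipolar encoding, in which a value in [0, 1] is the fraction of 1s in a random bitstream, so that a single AND gate multiplies two independent streams. The text above does not state which encoding or circuits Spintastic uses, and the helper names `to_stream`, `from_stream`, and `sc_multiply` are hypothetical.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p as a random bitstream (unipolar encoding, assumed)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(stream):
    """Decode a bitstream back to a value: the fraction of 1s."""
    return sum(stream) / len(stream)


def sc_multiply(a, b):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [x & y for x, y in zip(a, b)]


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000              # longer streams give a more precise estimate
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)
product = from_stream(sc_multiply(a, b))  # estimate of 0.5 * 0.4
```

The estimate approaches 0.2 as the stream length grows; the price of the single-gate multiplier is that precision depends on stream length and on the two inputs being uncorrelated.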

Finally, Chapter 10 concludes this thesis. In this chapter, we revisit the key benefits offered by the use of spintronic devices to design computing platforms and summarize the key findings.

2. RELATED WORK

CMOS technology is reaching its fundamental scaling limits, and several new devices have been proposed as potential replacements. These include emerging memory technologies such as phase change memory, spin-based memories (STT-MRAM, DWM), and memristors, as well as logic switches such as the Tunnel FET (TFET), Bilayer pseudospin FET (BiSFET), Carbon Nanotubes (CNT), and Spin Wave Device (SWD). In the past decade, there has been increasing interest in exploring and addressing some of the key issues related to designing with these devices at various levels of design abstraction. In this chapter, we describe some of the significant efforts in this direction.