CHAPTER VII PLA-LIKE, FLASH-BASED DIGITAL LOGIC CELL IMPLE-
VII.3.3 Programming the PFC
A PFC can be programmed in the same way as the FC (described in Section IV.3.3).
VII.4 Experiments
In this section, we first present the simulation environment used to evaluate our PFC-based digital design approach. We then discuss the implementation details of the PFC design. Finally, we discuss the results of the PFC-based design.
VII.4.1 Simulation Environment
In this section, we compare the PFC design approach to both a CMOS standard cell-based design approach and the FC-based design approach presented in Chapter IV and [69]. All of the designs are implemented in a 45nm process technology. We use Synopsys Design Compiler [56] to synthesize and map the CMOS designs to the industry-grade 45nm Nangate FreePDK45 Open Cell Library [54, 55]. The mapped designs are simulated using the Synopsys HSPICE [56] circuit simulation tool, and the 45nm PTM [57] model card is used to model the CMOS transistors. We implemented the FC-based approach for comparison purposes. The PFC-based designs are generated using our in-house tool chain. The flash transistors in the PFC-based designs are modeled using flash model cards generated with the same model card regression approach described in Chapter IV and [69]. The FC and PFC-based designs are simulated in HSPICE, and the correctness of their logical operation is verified through exhaustive simulation. The operating supply voltage for both the flash-based and the CMOS standard cell-based designs is 1V; the flash-based designs use higher voltages (10V-20V) only during programming. Custom layouts for the PFC-based designs were created in Cadence Virtuoso [65] using design rules obtained from the ITRS reports [59]. The physical area of the flash-based designs is compared to the cell area of the CMOS-based designs. The PFC-based design approach is evaluated on 20 randomly generated circuit designs and compared to a CMOS-based implementation of the same designs, as well as to an FC-based implementation.
Figure 7.7: Example Layout View of a PFC (des00)
VII.4.2 Flash-based Implementation Details
The logic functions implemented in the CMOS-based, FC-based and PFC-based digital circuits have 6 inputs (m = 6) and 3 outputs (n = 3). These values were found to give the best delay, power, energy and physical area for the flash designs. The results we present are a comparative study over 20 randomly generated functions (des00 to des19), each implemented using the CMOS standard cell-based approach, the FC-based approach and our PFC-based approach. We used CPB = 3 for both the FC and the PFC designs. The threshold voltages used in our flash-based designs are VT0 = -0.5 V and VT1 = 0.5 V.
VII.4.3 Results and Analysis
We report the delay (including precharge delay), power, energy and physical area ratios in Table 7.3. These results are obtained by implementing the 20 randomly generated logic functions using the PFC-based design approach and comparing them with their CMOS standard cell-based implementations. For the PFC-based results, the precharge delay is 39% of the total delay, on average. The delay reported in the table (Dmax Ratio) is the ratio of the maximum delay of each PFC-based circuit to that of its CMOS standard cell-based counterpart.
Circuit   Dmax Ratio   Pavg Ratio   Energy Ratio   Cell Area Ratio
des00     0.81×        0.37×        0.30×          0.47×
des01     0.81×        0.35×        0.29×          0.46×
des02     0.87×        0.37×        0.32×          0.49×
des03     0.74×        0.40×        0.30×          0.44×
des04     0.89×        0.42×        0.38×          0.49×
des05     0.76×        0.36×        0.27×          0.44×
des06     0.96×        0.38×        0.36×          0.49×
des07     0.81×        0.39×        0.32×          0.46×
des08     0.87×        0.38×        0.33×          0.42×
des09     0.87×        0.36×        0.31×          0.43×
des10     0.93×        0.43×        0.40×          0.44×
des11     0.87×        0.42×        0.37×          0.42×
des12     0.85×        0.41×        0.35×          0.39×
des13     0.89×        0.43×        0.38×          0.49×
des14     0.80×        0.35×        0.28×          0.43×
des15     1.01×        0.40×        0.40×          0.51×
des16     0.88×        0.38×        0.33×          0.51×
des17     0.77×        0.40×        0.31×          0.49×
des18     0.77×        0.37×        0.29×          0.44×
des19     0.74×        0.41×        0.31×          0.44×
Average   0.85×        0.39×        0.33×          0.46×
Stdev     0.07×        0.03×        0.04×          0.03×
Table 7.3: Delay, Power, Energy and Cell Area Ratios of PFC-based Digital Circuits Relative to Their CMOS Standard Cell-based Counterparts
Since the PFC-based implementation is dynamic, we account for the precharge delay in the delay reported in the table. As shown in the table, the delay of the flash-based digital circuits ranges from 0.74× to 1.01× of the CMOS standard cell-based delay, with an average of 0.85×. The delay of the PFC is very similar to that of the FC presented in Chapter IV and [69]. The standard deviations of the results, shown in the bottom row of Table 7.3, are relatively low: 0.07× for delay, 0.03× for power, 0.04× for energy and 0.03× for physical area. This demonstrates that the characteristics of digital designs implemented using the PFC-based approach are quite predictable over a large number of designs.
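As a quick cross-check of the Average and Stdev rows of Table 7.3, the sketch below recomputes both statistics from the 20 per-design entries. Two points are assumptions on our part. First, the section does not say whether the sample or the population standard deviation was used. The sketch uses the sample form (`statistics.stdev`), though both round to the same two-decimal values here. Second, the helper names `column_stats` and `matches_reported` are hypothetical.

```python
from statistics import mean, stdev

# Per-design ratios transcribed from Table 7.3 (PFC-based relative to CMOS).
# Tuple order: (Dmax, Pavg, Energy, Cell Area).
TABLE_7_3 = {
    "des00": (0.81, 0.37, 0.30, 0.47),
    "des01": (0.81, 0.35, 0.29, 0.46),
    "des02": (0.87, 0.37, 0.32, 0.49),
    "des03": (0.74, 0.40, 0.30, 0.44),
    "des04": (0.89, 0.42, 0.38, 0.49),
    "des05": (0.76, 0.36, 0.27, 0.44),
    "des06": (0.96, 0.38, 0.36, 0.49),
    "des07": (0.81, 0.39, 0.32, 0.46),
    "des08": (0.87, 0.38, 0.33, 0.42),
    "des09": (0.87, 0.36, 0.31, 0.43),
    "des10": (0.93, 0.43, 0.40, 0.44),
    "des11": (0.87, 0.42, 0.37, 0.42),
    "des12": (0.85, 0.41, 0.35, 0.39),
    "des13": (0.89, 0.43, 0.38, 0.49),
    "des14": (0.80, 0.35, 0.28, 0.43),
    "des15": (1.01, 0.40, 0.40, 0.51),
    "des16": (0.88, 0.38, 0.33, 0.51),
    "des17": (0.77, 0.40, 0.31, 0.49),
    "des18": (0.77, 0.37, 0.29, 0.44),
    "des19": (0.74, 0.41, 0.31, 0.44),
}
METRICS = ("Dmax", "Pavg", "Energy", "Area")
REPORTED_AVG = (0.85, 0.39, 0.33, 0.46)    # "Average" row of Table 7.3
REPORTED_STDEV = (0.07, 0.03, 0.04, 0.03)  # "Stdev" row of Table 7.3


def column_stats(table):
    """Return a (mean, sample standard deviation) pair per metric column."""
    columns = zip(*table.values())
    return [(mean(col), stdev(col)) for col in columns]


def matches_reported(stats, tol=0.01):
    """Compare recomputed statistics with the Average and Stdev rows.

    The per-design entries are themselves rounded to two decimals, so the
    comparison allows one unit in the last reported digit.
    """
    return all(
        abs(m - avg) <= tol and abs(s - sd) <= tol
        for (m, s), avg, sd in zip(stats, REPORTED_AVG, REPORTED_STDEV)
    )


if __name__ == "__main__":
    for name, (m, s) in zip(METRICS, column_stats(TABLE_7_3)):
        print(f"{name:6s} mean={m:.3f}x  stdev={s:.3f}x")
    delays = [row[0] for row in TABLE_7_3.values()]
    print(f"Dmax range: {min(delays):.2f}x to {max(delays):.2f}x")
```

The recomputed means and standard deviations agree with the tabulated Average and Stdev rows to within the table's two-decimal resolution. The recomputed delay range also matches the 0.74× to 1.01× quoted in the text.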
Table 7.3 also reports the average power dissipation (0.39× that of CMOS) and energy consumption (0.33× that of CMOS) of the digital circuits implemented using our flash-based logic, relative to their CMOS standard cell-based implementations. The PFC has ∼11% higher power dissipation and energy consumption than the FC. This is because, unlike the FC, the PFC allows cube sharing across outputs. Cube sharing results in the evaluation of multiple pulldown stacks in the PFC. This increases the power dissipated in both the evaluate and the precharge phases of the clock, and hence increases the average power dissipation and energy consumption.
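The energy and power columns of Table 7.3 can be related through the delay column. The sketch below rests on an assumption: that the energy ratio behaves like the product of the average power ratio and the delay ratio, which would hold if energy per operation is taken as average power times delay. The chapter does not state how energy was computed. Under that assumption, the sketch checks how closely each design follows the relation. The names `ROWS` and `energy_residuals` are hypothetical.

```python
# Hypothetical consistency check on Table 7.3.
# ASSUMPTION: energy ratio ~ Pavg ratio * Dmax ratio (energy per operation
# taken as average power x delay); the chapter does not state this.
ROWS = {  # design: (Dmax ratio, Pavg ratio, Energy ratio), from Table 7.3
    "des00": (0.81, 0.37, 0.30), "des01": (0.81, 0.35, 0.29),
    "des02": (0.87, 0.37, 0.32), "des03": (0.74, 0.40, 0.30),
    "des04": (0.89, 0.42, 0.38), "des05": (0.76, 0.36, 0.27),
    "des06": (0.96, 0.38, 0.36), "des07": (0.81, 0.39, 0.32),
    "des08": (0.87, 0.38, 0.33), "des09": (0.87, 0.36, 0.31),
    "des10": (0.93, 0.43, 0.40), "des11": (0.87, 0.42, 0.37),
    "des12": (0.85, 0.41, 0.35), "des13": (0.89, 0.43, 0.38),
    "des14": (0.80, 0.35, 0.28), "des15": (1.01, 0.40, 0.40),
    "des16": (0.88, 0.38, 0.33), "des17": (0.77, 0.40, 0.31),
    "des18": (0.77, 0.37, 0.29), "des19": (0.74, 0.41, 0.31),
}


def energy_residuals(rows):
    """Return |E - Pavg * Dmax| for each design."""
    return {name: abs(e - p * d) for name, (d, p, e) in rows.items()}


if __name__ == "__main__":
    residuals = energy_residuals(ROWS)
    worst = max(residuals, key=residuals.get)
    print(f"largest |E - P*D|: {worst} ({residuals[worst]:.4f})")
    # The averages follow the same pattern: 0.39 * 0.85 is close to 0.33.
    print(f"0.39 * 0.85 = {0.39 * 0.85:.4f}")
```

Every design stays within 0.01 of the product, which is the rounding resolution of the table. The tabulated energy savings therefore appear to be the combined effect of the lower power and the shorter delay.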
We also report the area ratio of the two implementations. The area reported for the CMOS standard cell-based implementation is the sum of the physical cell areas, while the area of our flash-based approach is the layout area obtained from layout generation experiments. In this sense, the CMOS standard cell area is a lower bound on the physical area, while the PFC area is the true physical area; the reported area ratios therefore, if anything, understate the area advantage of the PFC. Design rules for flash were obtained from the ITRS 45nm flash technology node [59]. Digital circuits implemented in a PFC use 0.46× the physical area of a CMOS-based design, on average. This is ∼18% lower than the area ratio of the FC (Chapter IV and [69]). This is expected because, unlike the FC, our PFC design exploits cube sharing between outputs. Figure 7.7 shows a representative layout of a cluster of the PFC for the design des00. Note that the layout of the PFC shown in Figure 7.7 includes neither PFOG0 nor PFOG7. PFOG0 is not implemented in the PFC of the design des00 because PFOG0 is implemented by the precharge state. PFOG7, however, is absent only because the design des00 does not contain any input cube that controls all the outputs (such cubes are implemented in PFOG7).
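The following sketch shows which output groups a design needs. It assumes that with n = 3 outputs there are 2^3 = 8 groups, PFOG0 through PFOG7, and that a cube's group index is the bit pattern of the outputs it controls, read as a binary number. This assumption is consistent with PFOG7 holding the cubes that control all outputs. Under that reading, grouping a multi-output cover by output pattern shows which groups need pulldown stacks. Group 0 is skipped, since the text states it is realized by the precharge state. The cube notation, function names and example cover are all hypothetical and are not taken from the in-house tool chain.

```python
from collections import defaultdict

N_OUTPUTS = 3  # n = 3, as in Section VII.4.2


def group_cubes(cover, n_outputs=N_OUTPUTS):
    """Map output-group index -> list of input cubes.

    `cover` is a list of (cube, outputs) pairs. `cube` is a string over
    {'0', '1', '-'} with one character per input; `outputs` is a string of
    n_outputs bits, '1' where the cube controls that output.
    ASSUMPTION: the group index is `outputs` read as a binary number, so
    the all-ones pattern maps to group 2**n - 1 (PFOG7 when n = 3).
    """
    groups = defaultdict(list)
    for cube, outputs in cover:
        if len(outputs) != n_outputs:
            raise ValueError(f"expected {n_outputs} output bits, got {outputs!r}")
        groups[int(outputs, 2)].append(cube)
    return dict(groups)


def groups_to_implement(groups):
    """Sorted non-empty group indices that need pulldown stacks.

    Group 0 is excluded because it is realized by the precharge state.
    """
    return sorted(g for g, cubes in groups.items() if cubes and g != 0)


if __name__ == "__main__":
    # Hypothetical 6-input, 3-output cover; no cube controls all outputs,
    # so group 7 stays empty, as for des00 in Figure 7.7.
    cover = [
        ("1-0---", "110"),
        ("01--1-", "011"),
        ("---11-", "100"),
        ("0-1--0", "110"),
    ]
    groups = group_cubes(cover)
    print(groups_to_implement(groups))
```

Two cubes share group 6 in the example cover. This mirrors how a single pulldown stack can serve several outputs in the PFC, which the FC does not allow.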
[Figure 7.8 plot: Delay Ratio (20% Precharge + Evaluate), Pavg Ratio and Energy Ratio versus VT Shift (mV), from -200 mV to 200 mV]
Figure 7.8: Delay, Power and Energy of the Flash-based Designs as VT is Shifted.
Flash-based digital circuits can tune their delay, power and energy characteristics by shifting the VT of the flash transistors in the circuit. This ability gives flash-based digital circuits significant advantages over traditional CMOS standard cell-based circuits for speed binning at the factory, aging mitigation and post-manufacturing engineering change orders (ECOs). Figure 7.8 shows the average delay, power and energy of the flash-based digital designs as their VT is shifted around the nominal VT value (indicated by a VT shift of 0 mV). The delay in Figure 7.8 is the sum of the evaluate and the precharge delays. Figure 7.8 shows that the PFC delay improves with a negative VT shift. This allows the manufacturer to perform speed binning in the factory, or aging mitigation in the field. As expected, this speed improvement comes at the cost of increased power.
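A first-order model shows why a negative VT shift speeds up the circuit. The sketch below uses the Sakurai-Newton alpha-power law, in which gate delay scales as VDD / (VDD - VT)^alpha. This is a textbook substitute for the flash model cards used in this chapter, not a fit to Figure 7.8. The nominal VT of 0.5 V (taken from VT1) and alpha = 1.3 are illustrative assumptions, and the model captures only the delay trend, not the power increase.

```python
# First-order alpha-power-law (Sakurai-Newton) sketch of delay vs. VT shift.
# NOT the flash model cards used in this chapter; for intuition only.
VDD = 1.0      # supply voltage from Section VII.4.1 (V)
VT_NOM = 0.5   # ASSUMPTION: nominal VT taken as VT1 = 0.5 V
ALPHA = 1.3    # ASSUMPTION: illustrative velocity-saturation index


def delay_ratio(vt_shift_mv, vdd=VDD, vt_nom=VT_NOM, alpha=ALPHA):
    """Delay at a shifted VT divided by delay at the nominal VT.

    With t_d proportional to C * VDD / (VDD - VT)**alpha, the load and
    supply terms cancel in the ratio.
    """
    vt = vt_nom + vt_shift_mv / 1000.0
    if vt >= vdd:
        raise ValueError("VT must stay below VDD for the transistor to conduct")
    return ((vdd - vt_nom) / (vdd - vt)) ** alpha


if __name__ == "__main__":
    for shift in range(-200, 201, 100):
        print(f"VT shift {shift:+4d} mV -> delay ratio {delay_ratio(shift):.3f}")
```

The model reproduces the qualitative trend of Figure 7.8: delay falls monotonically as VT is shifted negative and rises as it is shifted positive. The magnitudes are not expected to match the simulated flash designs.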
VII.5 Chapter Summary
Flash transistors have some important properties that distinguish them from CMOS transistors, such as the ability to shift their threshold voltage, their small input capacitance and their compact area. In the past, these properties have been exploited to implement non-volatile memory. This chapter presented an approach that uses flash transistors to implement digital circuits with a PLA-like circuit structure, and described the circuit topology of our PFC-based digital circuit approach. Our HSPICE simulations show that, averaged over 20 designs, our approach yields 0.85× the delay, 0.39× the power, 0.33× the energy and 0.46× the physical area of the equivalent circuit implemented using CMOS standard cell-based design. The PFC design requires ∼18% less area than the FC design approach in Chapter IV and [69], at the cost of ∼11% higher power and energy consumption. The improvement in area is due to the fact that unlike the FC, the PFC design can share input cubes across