5.3 The ASIC Pixel
5.3.4 Digital Memory
The digitized data is stored locally in the pixels in a custom static random access memory (SRAM). First memory and readout concepts were studied in [50]; they were elaborated further in this thesis and published in [51]. The initial proposal was to use a three-transistor dynamic random access memory (3T DRAM) cell as the core of the memory. Although the layout of the 3T DRAM cell can be made very compact, it consumes considerably more power. The data is stored on a high-impedance node that is inevitably discharged by leakage currents, so periodic refresh cycles are required to retain the data. These refresh cycles are costly in terms of power because they have to run continuously, including during the readout phase, since all of the time in the XFEL gaps is needed to transport the data off the chip. Several test chip iterations have shown that the required refresh frequency becomes problematic. The DRAM would yield a capacity increase of ∼20% over the architecture finally chosen, which is based on a dense 6T SRAM cell. We decided that this gain does not justify the extra risk associated with the DRAM and therefore set it aside.
The dense SRAM cell uses special design rules that the foundry has proven for this implementation. It is almost as compact as the custom-optimized layout of the 3T DRAM cell, despite using twice the transistor count and both NMOS and PMOS transistors, the latter imposing n-well spacing rules.
The dense SRAM cell is used for the core of the memory and is surrounded by a full-custom periphery. In principle, memory generators are available that arrange the cells in an array and provide all required periphery. They introduce, however, a large overhead by adding address decoders and peripheral circuits that allow for fast access times. This overhead is not required in the DSSC readout ASIC because the access time requirements are very relaxed (> 200 ns). Consequently, a full-custom periphery and addressing scheme is much more area-efficient. While an initial topology was studied in [50], the final architecture and readout concept were developed within the scope of this thesis. A memory capacity of 800 words of 9 bit per pixel was achieved in an area of 76 × 229 µm² (37% of the pixel).
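The figures quoted above can be cross-checked with a few lines of arithmetic. The following is only a sketch: the variable names are illustrative, the area per bit is an average that includes the in-pixel periphery (it is not the size of the bare SRAM cell), and the pixel area is merely the value implied by the rounded 37% figure.

```python
# Figures quoted in the text
WORDS_PER_PIXEL = 800
BITS_PER_WORD = 9
MEM_WIDTH_UM = 76
MEM_HEIGHT_UM = 229
MEM_FRACTION_OF_PIXEL = 0.37  # rounded value from the text

# Derived quantities (illustrative names, not taken from the thesis)
total_bits = WORDS_PER_PIXEL * BITS_PER_WORD        # bits stored per pixel
memory_area_um2 = MEM_WIDTH_UM * MEM_HEIGHT_UM      # memory footprint in um^2
area_per_bit_um2 = memory_area_um2 / total_bits     # average, periphery included
implied_pixel_area_um2 = memory_area_um2 / MEM_FRACTION_OF_PIXEL

print(f"bits per pixel:        {total_bits}")
print(f"memory area:           {memory_area_um2} um^2")
print(f"average area per bit:  {area_per_bit_um2:.2f} um^2")
print(f"implied pixel area:    {implied_pixel_area_um2:.0f} um^2")
```

The result is 7200 bits in 17404 µm², i.e. roughly 2.4 µm² per stored bit on average.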
[Figure 5.14, schematic. Recoverable labels: SRAM Cell, RowEn〈0〉, RowEn〈1〉, BL, BL_B, ColEn〈9:0〉, ColBlockSel, PrechargeBLs, PrechargeBB, Column Mux., Data〈0〉, Data_B〈0〉, Write, SramRead, F1/2, SerIn, SerOut, "(to next RAM bit)", "(from prev RAM bit)". Signal groups: controlled by the serializer; controlled by the SRAM controller.]
Figure 5.14: Schematic of a BitBlock. The depicted circuitry includes the memory cells (green), the column access multiplexer (blue), the readout register (red) and the write driver. The block is replicated 9 times to store 9-bit words.
[Figure 5.15, layout. Recoverable labels: BitBlock; Col. Mux., Readout Registers; Cntrl Signal Buffers; SerIn; SerOut.]
Figure 5.15: Layout of the memory. The memory size is 76 × 229 µm². The capacity is 800 words of 9 bit; the area of the in-pixel periphery is marginal.
[Figure 5.16, block diagram. Recoverable labels: Pixel〈3〉, Pixel〈2〉, Pixel〈1〉, Pixel〈0〉; Memory (four instances); Memory Controller & Address Decoder.]
Figure 5.16: Buffering scheme for the address and control signals of the memory, shown for only four signals as an example. The memory controller and address decoder are located in the periphery; signal distribution is shared among four pixels to minimize the routing in the pixel columns.
Memory Topology & Addressing Scheme
The pixel memory spans the full width of the pixel and uses metal layers 1-3 for local routing and layers 4-5 for global routing. Constraints for the layout are given in section 5.3.9.
The memory is arranged in sub-blocks, so-called BitBlocks; a schematic of a BitBlock is depicted in figure 5.14. A BitBlock comprises a bit cell matrix of 40 rows × 20 columns, which stores one bit of each data word, a column access multiplexer and the associated peripheral circuits for reading and writing. An individual memory cell is addressed by asserting one RowEn signal to select the row and switching the column multiplexer to the desired column. The column access multiplexer is implemented by two stages of NMOS-only pass gates, requiring a ColBlockSel signal and 10 one-hot encoded ColEn signals. Although more pass gate stages would reduce the number of ColEn signals, the layout of this topology, integrated with the remaining periphery, gave the best usage of silicon and metal area.
A detailed description of the SRAM working principle can be found, for instance, in [52]. The cell is read by precharging the BitLines and subsequently connecting the cell of interest through an NMOS pass gate. In standard memories, differential sense amplifiers are used to evaluate the BitLines as fast as possible. For this ASIC, enough time is available and a simple inverter is sufficiently fast. Writing is based on precharging the BitLines, pulling one of them down and subsequently connecting the cell, which is thereby forced into the desired state. So that the full supply rail level is reached, both the BitBus and the BitLines are precharged to VDD before the SRAM cell is connected.
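The addressing of a single cell inside a BitBlock can be illustrated with a small behavioural sketch. It assumes that ColBlockSel is a single bit that selects one of two groups of ten columns and that word indices are mapped row by row; neither the mapping order nor the function name `decode_address` is taken from the thesis, so both should be read as hypothetical.

```python
ROWS = 40            # RowEn lines, one-hot (from the text)
COLS_PER_BLOCK = 10  # ColEn lines, one-hot (from the text)
COL_BLOCKS = 2       # assumption: ColBlockSel picks one of two column groups
WORDS = ROWS * COLS_PER_BLOCK * COL_BLOCKS  # 800 cells per BitBlock


def decode_address(word):
    """Map a word index to (RowEn one-hot, ColBlockSel, ColEn one-hot).

    The row-major ordering used here is an assumption for illustration.
    """
    if not 0 <= word < WORDS:
        raise ValueError(f"word index {word} out of range 0..{WORDS - 1}")
    row, col = divmod(word, COLS_PER_BLOCK * COL_BLOCKS)
    col_block, col_in_block = divmod(col, COLS_PER_BLOCK)
    row_en = [int(i == row) for i in range(ROWS)]
    col_en = [int(i == col_in_block) for i in range(COLS_PER_BLOCK)]
    return row_en, col_block, col_en


row_en, col_block_sel, col_en = decode_address(25)
print(row_en.index(1), col_block_sel, col_en.index(1))
```

Under these assumptions every word index activates exactly one RowEn line and one ColEn line, so all 800 cells of the 40 × 20 matrix are reachable with 40 + 10 one-hot lines plus the block select.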
The BitBlock is replicated 9 times to store full words. All BitBlocks share the same control signals, and their readout registers are connected in series for the readout scheme (SerIn and SerOut in figure 5.14). The pixel therefore has a single serial data input pin and a single serial data output pin, which are connected to the neighboring pixels in the same pixel column. The serial readout scheme is covered in section 5.3.5. The explanation of the reading mechanism is therefore limited here to loading the data into the serial readout register.
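The serial connection of the nine BitBlocks can be pictured as a 9-bit shift register with a parallel load. The sketch below assumes one register bit per BitBlock whose input is selected by SramRead between the addressed cell and SerIn, as the labels in figure 5.14 suggest; the bit order, the shift direction and the class name `ReadoutChain` are assumptions, and the actual scheme is the one described in section 5.3.5.

```python
BITS_PER_WORD = 9  # one readout register bit per BitBlock


class ReadoutChain:
    """Behavioural model of the per-pixel readout register (hypothetical)."""

    def __init__(self):
        self.regs = [0] * BITS_PER_WORD

    @property
    def ser_out(self):
        # The last register drives the pixel's serial output pin.
        return self.regs[-1]

    def clock(self, sram_read, cell_bits=None, ser_in=0):
        if sram_read:
            # Parallel load: every BitBlock captures its addressed cell.
            if cell_bits is None or len(cell_bits) != BITS_PER_WORD:
                raise ValueError("a parallel load needs exactly 9 cell bits")
            self.regs = list(cell_bits)
        else:
            # Shift by one position towards SerOut, taking in SerIn.
            self.regs = [ser_in] + self.regs[:-1]


def shift_out(chain, ser_in_bits):
    """Collect SerOut while clocking in the bits from the previous pixel."""
    out = []
    for bit in ser_in_bits:
        out.append(chain.ser_out)
        chain.clock(sram_read=False, ser_in=bit)
    return out


chain = ReadoutChain()
word = [1, 0, 1, 1, 0, 0, 1, 0, 1]
chain.clock(sram_read=True, cell_bits=word)
shifted = shift_out(chain, [0] * BITS_PER_WORD)
print(shifted)
```

In this model the loaded word leaves the pixel last bit first, while the bits arriving on SerIn from the neighboring pixel take its place in the register.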
To save further area, the pixels do not contain an address decoder; instead, one-hot encoded signals are propagated directly to the pixels. Besides these addressing lines, the memory needs eight further control signals. The total of 58 signals is, however, too many to route in a single pixel column, so the routing is shared among four pixels. This is conveniently possible because the memory spans the entire pixel width, so the horizontal signal wires run solely above the memory and do not interfere with any other circuits. Each pixel column carries one fourth of the control signals, which is few enough to avoid congestion. Each signal is buffered once per group of four pixels to reduce the load on the vertically running signal lines. Each pixel cell therefore contains 15 buffers, which are connected one level above the pixel cell in the hierarchy. The routing scheme is illustrated for four control signals, by way of example, in figure 5.16.
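The signal budget can be reproduced with a short calculation. The breakdown used here is an inference, not a statement from the text: it assumes that the one-hot addressing lines are the 40 RowEn and 10 ColEn signals and that ColBlockSel is counted among the eight control signals, which reproduces the stated total of 58.

```python
import math

ROW_EN_LINES = 40    # one-hot row select (40 rows per BitBlock)
COL_EN_LINES = 10    # one-hot column select
CONTROL_LINES = 8    # "eight further signals"; assumed to include ColBlockSel
PIXELS_SHARING = 4   # signal distribution is shared among four pixels

total_signals = ROW_EN_LINES + COL_EN_LINES + CONTROL_LINES
buffers_per_pixel = math.ceil(total_signals / PIXELS_SHARING)
spare_buffers = PIXELS_SHARING * buffers_per_pixel - total_signals

print(f"total signals:            {total_signals}")
print(f"buffers per pixel:        {buffers_per_pixel}")
print(f"spare buffers per group:  {spare_buffers}")
```

With 58 signals spread over four pixels, 15 buffers per pixel are needed, which leaves two buffer slots unused per group of four pixels under the assumed breakdown.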
The testing mechanism for the memory is described in section 5.3.5 because it makes use of the readout structures. The memory can be turned off by disabling the peripheral controller, in which case only the last word written to the BitBus is held and transferred into the readout register (single frame readout).