6.3.1
Generated Design Checkpoint
Since it was not possible to specify the placement of the added 6LUTs, they were placed by Vivado after the checkpoint was loaded back into the design suite. Vivado’s placement algorithm placed these BELs close to the buffers in the delay line. Figure 6.9 shows the design with 150 added 6LUTs after placement and routing.
Figure 6.9: Device view after the design has been fully placed and routed by Vivado. The number of added 6LUT BELs is 150.
6.3.2
Measurements
The frequency measurements were performed on an FPGA (Zynq-7020 on a ZedBoard). Due to the unexpectedly low degree of linearity, a second set of measurements was performed on a different device of the same model (FPGA #2) in order to gain more insight. The step size between the numbers of 6LUTs added in successive checkpoints was increased a few times over the sweep. The resulting plot is shown in Figure 6.10.
Figure 6.10: Oscillation frequency of the RO as a function of the number of 6LUTs added to each net.
Using Equations 5.1 and 5.2, these frequency measurements have been converted into the corresponding added delay time per DE, displayed in Figure 6.11. The delay time at zero added LUTs is taken as the reference point.
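Equations 5.1 and 5.2 are not reproduced in this section, so the conversion below is only a sketch under stated assumptions: the RO period is taken as T = 1/f = 2·t_loop (a transition travels around the loop twice per period), and every one of the `numDes` DEs in the loop is assumed to receive the same extra delay. The class and method names are hypothetical, and the frequencies in `main` are illustrative values, not measured data.

```java
import java.util.Arrays;

/**
 * Sketch: converts RO frequencies into added delay per DE.
 * Assumption (stand-in for Equations 5.1/5.2): T = 1/f = 2 * t_loop,
 * and all numDes DEs in the loop receive the same extra delay.
 */
final class RoDelay {
    private RoDelay() {}

    /** Loop propagation delay in seconds for an RO oscillating at freqHz. */
    static double loopDelay(double freqHz) {
        if (freqHz <= 0) throw new IllegalArgumentException("frequency must be positive");
        return 1.0 / (2.0 * freqHz);
    }

    /** Added delay per DE in seconds, relative to the zero-added-LUT reference frequency. */
    static double addedDelayPerDe(double refFreqHz, double freqHz, int numDes) {
        if (numDes <= 0) throw new IllegalArgumentException("numDes must be positive");
        return (loopDelay(freqHz) - loopDelay(refFreqHz)) / numDes;
    }

    /** Converts a whole sweep; element 0 is taken as the reference point. */
    static double[] sweep(double[] freqsHz, int numDes) {
        double[] out = new double[freqsHz.length];
        for (int i = 0; i < freqsHz.length; i++) {
            out[i] = addedDelayPerDe(freqsHz[0], freqsHz[i], numDes);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical frequencies for illustration only.
        double[] freqsHz = {100e6, 99e6, 98.5e6};
        System.out.println(Arrays.toString(sweep(freqsHz, 8)));
    }
}
```

By construction the first entry of the sweep is exactly zero, matching the choice of the zero-LUT measurement as the reference point.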
Figure 6.11: Additional delay time per DE for different numbers of added LUT BELs. Measurements have been performed on two devices of the same model (Zynq-7020 on a ZedBoard). A curve showing the difference between the two FPGAs has been added.
This curve is far from linear: after a sharp initial rise, the delay time levels off. The difference in delay time between the two FPGAs appears to be constant, with an average of 1.1 ps.
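The degree of linearity and the offset between the two devices can be expressed as numbers rather than judged by eye. The sketch below, with hypothetical names, fits an ordinary least-squares line and reports its coefficient of determination R² (1.0 for a perfectly linear curve), and computes the mean of the point-wise difference between two devices. The values in `main` are synthetic and only mimic a saturating shape; they are not the thesis measurements.

```java
/** Sketch: least-squares linearity check and mean device-to-device offset. */
final class LinearityCheck {
    private LinearityCheck() {}

    /** Returns {slope, intercept, rSquared} of the least-squares line y = slope * x + intercept. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        if (n != y.length || n < 2) throw new IllegalArgumentException("need at least 2 paired points");
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        if (sxx == 0) throw new IllegalArgumentException("x values must not all be equal");
        double slope = sxy / sxx;
        double intercept = my - slope * mx;
        // For a least-squares line with intercept, R^2 = sxy^2 / (sxx * syy).
        double r2 = (syy == 0) ? 1.0 : (sxy * sxy) / (sxx * syy);
        return new double[] {slope, intercept, r2};
    }

    /** Mean of a[i] - b[i], e.g. delay on one device minus delay on the other. */
    static double meanDifference(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] - b[i];
        return sum / a.length;
    }

    public static void main(String[] args) {
        // Synthetic saturating curve, for illustration only.
        double[] addedLuts = {0, 25, 50, 100, 150};
        double[] delayPs = {0, 8, 11, 12, 12};
        double[] f = fit(addedLuts, delayPs);
        System.out.printf("slope=%.4f ps/LUT, intercept=%.2f ps, R^2=%.3f%n", f[0], f[1], f[2]);
    }
}
```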
Chapter 7
Discussion
In this chapter, the validity of the results presented in this thesis is examined. Problems in the developed Java software are described, as well as factors that could have affected the measurement results.
7.1
Increased Path-Length
Inspection of the design checkpoints generated by the path-elongating program revealed some bugs. The first problem concerns the net routed with the smallest offset and is caused by an oversight in the program: instead of routing the net through the first interconnect tile encountered, the program adds an extra wire in the originating tile itself (see Figure 6.3A). Had this net used a wire in the first neighbouring interconnect tile, the oscillation frequency would be expected to be higher, because the path would be shorter.
A second problem occurs when the program is asked to route the net with an offset of 16 tiles: the program fails to route these nets (see Figure 6.3B). As a result, Vivado routes them instead, causing a large peak in the measured frequencies.
The reason for this failure is a shortcoming in the program implemented with RapidWright. When selecting an offset tile to route the net through, the program counts the number of tiles whose names start with the prefix “INT_”. The intention was that only regular interconnect tiles would be used for the routing. However, at an offset of 16, the router encounters an interconnect tile that does not contain a switchbox, but instead serves as an interface to external devices. This tile does not contain any wires suitable for the path, causing the router to fail. The program was able to route all other nets.
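The difference between the prefix-based tile count and a stricter switchbox check can be illustrated without RapidWright. The sketch below works on plain tile-name strings and assumes, hypothetically, that switchbox tiles start with "INT_L_" or "INT_R_", while an interface tile without a switchbox also starts with "INT_" (written here as "INT_INTERFACE_L_..."). The tile names, class and methods are illustrative assumptions, not the program in Appendix B.

```java
import java.util.List;

/**
 * Sketch of offset-tile selection on tile-name strings (no RapidWright calls).
 * Hypothetical naming: switchbox tiles start with "INT_L_" or "INT_R_"; an
 * interface tile without a switchbox also starts with "INT_".
 */
final class OffsetTileSelector {
    private OffsetTileSelector() {}

    /** Filter as described in the text: every name with the "INT_" prefix is counted. */
    static boolean matchesPrefix(String tileName) {
        return tileName.startsWith("INT_");
    }

    /** Stricter filter: only tiles assumed to contain a switchbox are counted. */
    static boolean hasSwitchbox(String tileName) {
        return tileName.startsWith("INT_L_") || tileName.startsWith("INT_R_");
    }

    /** Index of the offset-th (1-based) counted tile along the path, or -1 if there is none. */
    static int findOffsetTile(List<String> tilesAlongPath, int offset, boolean strict) {
        int counted = 0;
        for (int i = 0; i < tilesAlongPath.size(); i++) {
            String name = tilesAlongPath.get(i);
            boolean counts = strict ? hasSwitchbox(name) : matchesPrefix(name);
            if (counts && ++counted == offset) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical row of tiles walked away from the source tile.
        List<String> row = List.of("CLBLL_L_X2Y10", "INT_L_X2Y10", "INT_R_X3Y10",
                "INT_INTERFACE_L_X4Y10", "INT_L_X4Y10");
        System.out.println("prefix filter picks index " + findOffsetTile(row, 3, false));
        System.out.println("switchbox filter picks index " + findOffsetTile(row, 3, true));
    }
}
```

With the prefix filter, the third counted tile is the interface tile, which reproduces the failure mode described above; the stricter filter skips it. Within RapidWright itself, checking the tile type rather than the name prefix would likely be the more robust variant.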
7.2
Increased Fan-Out
The implementation of the increased fan-out concept did not prove fruitful. It was expected that, even without control over placement, the delay time would correlate with the number of added BELs over the whole range.
After the measurements, it was discovered that PIPs have a property that could explain this unexpected behaviour: whether or not the PIP is buffered. If a PIP is implemented as a transmission gate (as was assumed during the experiment), as displayed in Figure 7.1, the impedance that the starting node experiences at the PIP is that of the attached node. The PIP is then modelled as a switch.
Figure 7.1: A transmission gate circuit, used to electrically connect two nodes (A and B). If EN is high, the nodes are connected, otherwise not.
However, if the PIP is buffered, the attached node is driven by the buffer according to the voltage at the starting node. The two nodes are isolated in the sense that the impedance experienced by the starting node is that of the buffer, which is not influenced by the impedance of the node on the other side. This means that capacitance added to the attached node does not accumulate with that of the starting node. Both scenarios are displayed in Figure 7.2.
Figure 7.2: Circuit diagram A) shows a simple model of the delayed net if a PIP consisted of a transmission gate. Circuit B) shows the same model for a buffered PIP, implemented with a logical AND gate. In this case, the capacitance behind the AND gate does not affect the node before the gate.
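The effect of buffering on the starting node can be made concrete with a first-order (Elmore) estimate. In the sketch below, all resistances and capacitances are hypothetical, the switch and the buffer are idealised, and only the time constant at the starting node A is computed (the 50 % delay of a single RC stage is roughly 0.69 times this value). For the transmission gate, the driver charges node A plus everything behind the switch, so each added load increases the delay; for the buffered PIP, the driver only sees node A and the buffer input.

```java
/**
 * First-order (Elmore) sketch of the two PIP models in Figure 7.2.
 * All R and C values are hypothetical; switch and buffer are idealised.
 */
final class PipLoadModel {
    private PipLoadModel() {}

    /**
     * Transmission-gate PIP: the driver resistance charges node A and all
     * capacitance behind the switch, so added loads accumulate.
     * The switch resistance does not appear in the Elmore delay at node A.
     */
    static double tauTransmissionGate(double rDriver, double cNodeA, double cNodeB,
                                      int addedLoads, double cPerLoad) {
        return rDriver * (cNodeA + cNodeB + addedLoads * cPerLoad);
    }

    /**
     * Buffered PIP: the driver only charges node A and the buffer input;
     * loads on node B are charged by the buffer and do not appear here.
     */
    static double tauBuffered(double rDriver, double cNodeA, double cBufferIn) {
        return rDriver * (cNodeA + cBufferIn);
    }

    public static void main(String[] args) {
        double r = 1e3, cA = 1e-15, cB = 2e-15, cLoad = 0.5e-15, cBuf = 0.3e-15; // hypothetical
        for (int loads = 0; loads <= 20; loads += 10) {
            System.out.printf("loads=%d  TG tau=%.2e s  buffered tau=%.2e s%n",
                    loads, tauTransmissionGate(r, cA, cB, loads, cLoad), tauBuffered(r, cA, cBuf));
        }
    }
}
```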
7.3
Inaccuracies
Temperature and core voltage can have a significant impact on the frequency of ring oscillators in FPGAs [5]. Due to the short time span of this thesis, these variables were not controlled during experiments.
Even though the experiments were done in rapid succession, and attempts were made to prevent the device from heating up or cooling down during experiments, temperature fluctuations could have affected the measurements. Such fluctuations could be caused by changes in ambient temperature, or by designs having different power consumption.
It should also be noted that within-die variations (caused during production or by ageing of the device), and thus the absolute placement of the design, influence ring oscillator frequencies [18]. This was not accounted for during the experiments. The locations of the delay line buffers were kept fixed during the experiments in order to reduce the number of swept parameters.
Inter-die (device-to-device) variations have been briefly investigated by performing the path-length measurements on two devices. Two devices, however, provide too little data to draw meaningful conclusions about accuracy.
Chapter 8
Conclusion
Two DE concepts have been implemented and evaluated. The first one, which uses path length to increase delay, has been shown to work. The problems of the software are known and can easily be solved. However, the model underlying the routing algorithm assumes that all wires add the same delay: the algorithm only counts the number of wires and does not consider their type. This is expected to have had a negative impact on the accuracy of the generated DEs. A solution for this shortcoming is given in Chapter 9.
The second concept, which exploits fan-out to increase delay, has not yet been proven to work. However, as with the first concept, the problems of this method are known and can easily be solved. At this moment, there is not enough information to judge its capabilities. Nevertheless, a working version of this method offers many options for optimisation, due to the large number of parameters available: the number of pins, the length of the wires to those pins and the types of BELs used. Additionally, the measurements showed a relatively low spread compared with the first method, even though the degree of control was low. Further experiments can determine whether the variations can be controlled or whether they are the result of low accuracy. In the end, there is no evidence that this concept cannot succeed. Therefore, in combination with the arguments above, this method is judged to have the better chance of becoming a method for creating high-precision delay elements.
Chapter 9
Recommendations
Increasing the path length is a working method to create delay in an FPGA. However, the developed method can be improved. First, the known bugs in the program should be fixed. Furthermore, the routing algorithm used in the program can be improved: timing models could be extracted from Vivado to increase the accuracy of the custom router. The resolution could then be based on the predicted delay of individual wires instead of on the number of tiles used.
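A delay-aware resolution could, for example, score each candidate detour by the sum of per-wire delays taken from a timing model and select the candidate closest to the target delay, instead of counting tiles. The sketch below assumes such a model is available as a map from wire type to delay; the wire-type labels and delay values are hypothetical placeholders, not values extracted from Vivado.

```java
import java.util.List;
import java.util.Map;

/**
 * Sketch: selects the detour whose predicted delay is closest to a target,
 * using a (hypothetical) per-wire-type delay table instead of a tile count.
 */
final class DelayAwareDetour {
    private DelayAwareDetour() {}

    /** Predicted path delay in ps: the sum of the delays of its wires. */
    static double predictedDelayPs(List<String> wireTypes, Map<String, Double> delayPsByType) {
        double sum = 0;
        for (String type : wireTypes) {
            Double d = delayPsByType.get(type);
            if (d == null) throw new IllegalArgumentException("no timing data for wire type " + type);
            sum += d;
        }
        return sum;
    }

    /** Index of the candidate with the smallest |predicted - target|; first one wins on ties. */
    static int closestCandidate(List<List<String>> candidates, Map<String, Double> delayPsByType,
                                double targetPs) {
        int best = -1;
        double bestError = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            double error = Math.abs(predictedDelayPs(candidates.get(i), delayPsByType) - targetPs);
            if (error < bestError) {
                bestError = error;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical timing table (ps per wire), for illustration only.
        Map<String, Double> delays = Map.of("SINGLE", 100.0, "DOUBLE", 150.0, "QUAD", 250.0);
        List<List<String>> candidates = List.of(
                List.of("SINGLE", "SINGLE"),
                List.of("QUAD"),
                List.of("DOUBLE", "DOUBLE", "SINGLE"));
        System.out.println("best candidate: " + closestCandidate(candidates, delays, 240.0));
    }
}
```

In this example the one-wire candidate is chosen even though a wire count would rank it as the shortest path, which is exactly the information a count-based resolution discards.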
Adding delay by increasing fan-out has a chance of success if buffered interconnect points are avoided. Then, assuming it works, more accuracy in the delay can be achieved by introducing control over the placement of BELs and the routing (and therefore over the length and capacitance of the wires).
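One way to avoid buffered interconnect points is to exclude buffered PIPs from the routing graph before searching for a path to each added load. The sketch below shows this idea as a breadth-first search over a small, hypothetical graph in which every PIP carries a `buffered` flag; how that flag would be obtained from the device database is left open here and is an assumption, as are the record and method names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Sketch: shortest route (in PIP count) that uses only unbuffered PIPs. */
final class UnbufferedRouteSearch {
    private UnbufferedRouteSearch() {}

    /** Hypothetical directed PIP between two named nodes. */
    record Pip(String from, String to, boolean buffered) {}

    /** Breadth-first search from src to dst over unbuffered PIPs; empty list if unreachable. */
    static List<String> route(List<Pip> pips, String src, String dst) {
        Map<String, List<String>> adjacency = new HashMap<>();
        for (Pip p : pips) {
            if (!p.buffered()) {
                adjacency.computeIfAbsent(p.from(), k -> new ArrayList<>()).add(p.to());
            }
        }
        Map<String, String> parent = new HashMap<>(); // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(src, null);
        queue.add(src);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(dst)) {
                LinkedList<String> path = new LinkedList<>();
                for (String cur = dst; cur != null; cur = parent.get(cur)) path.addFirst(cur);
                return path;
            }
            for (String next : adjacency.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        List<Pip> pips = List.of(
                new Pip("A", "B", false), new Pip("B", "D", true),
                new Pip("A", "C", false), new Pip("C", "D", false));
        System.out.println(route(pips, "A", "D"));
    }
}
```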
A better model of the FPGA fabric could help predict DE delay values. The model used in this thesis can be improved by increasing its complexity. For example, if the resistance of the wires and PIPs is taken into account, the model would consist of multiple cascaded RC filters. Additionally, efforts can be made to simulate the fully implemented designs (manipulated using a RapidWright program) in Vivado. As explained in Chapter 5, this was not done in this thesis.
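For a chain of cascaded RC sections, a common first-order estimate is the Elmore delay: each series resistance is multiplied by the total capacitance downstream of it, and the products are summed. The sketch below computes this for the far end of a ladder in which `r[j]` is the series resistance feeding node `j` (for example a wire or PIP) and `c[j]` is that node's capacitance. Mapping fabric elements onto these arrays is an assumption left to the model, and the values in `main` are hypothetical.

```java
/** Sketch: Elmore delay at the far end of an N-stage RC ladder. */
final class RcLadder {
    private RcLadder() {}

    /**
     * r[j] is the series resistance feeding node j, c[j] the capacitance at node j.
     * tau = sum_j r[j] * (capacitance at node j and all nodes after it).
     */
    static double elmoreDelay(double[] r, double[] c) {
        if (r.length != c.length) throw new IllegalArgumentException("r and c must have equal length");
        double tau = 0;
        double downstreamC = 0;
        // Walk from the far end so the downstream capacitance can be accumulated.
        for (int j = r.length - 1; j >= 0; j--) {
            downstreamC += c[j];
            tau += r[j] * downstreamC;
        }
        return tau;
    }

    public static void main(String[] args) {
        // Hypothetical three-stage ladder: ohms and farads.
        double[] r = {200, 100, 100};
        double[] c = {2e-15, 1e-15, 3e-15};
        System.out.println("Elmore delay: " + elmoreDelay(r, c) + " s");
    }
}
```

For N identical sections with resistance R and capacitance C, this reduces to RC·N(N+1)/2, which shows why the delay grows faster than linearly with the number of cascaded stages.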
Because the remaining steps in the development of a DE based on increased fan-out are small, it is recommended to continue the research in this direction first. After implementing this technique while accounting for buffered PIPs, the effectiveness of the method can truly be evaluated.
Lastly, an improved test setup should be developed. For one, this setup should account for environmental effects by controlling and monitoring the temperature of the device and its core voltage, which would increase the measurement accuracy. Additionally, prototyping and testing can be sped up, and the setup simplified, by implementing the frequency measurement hardware in the FPGA itself (using an asynchronous counter and the processing system on a SoC device). An architecture for this purpose is proposed in Appendix A.
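With an on-chip counter, the processing system would read the counter value at the start and end of a known gate time and divide the elapsed count by that time; the ±1 count quantisation then limits the frequency resolution to 1/gate time. The sketch below assumes a free-running counter of `widthBits` bits that wraps at most once per gate interval and is read coherently (for example sampled while stopped or Gray-coded). The names and numbers are hypothetical and not part of the architecture in Appendix A.

```java
/** Sketch: frequency from two reads of a free-running, wrapping counter. */
final class CounterFrequency {
    private CounterFrequency() {}

    /** Counts elapsed between two reads, assuming at most one wrap-around. */
    static long elapsedCounts(long start, long end, int widthBits) {
        if (widthBits < 1 || widthBits > 62) throw new IllegalArgumentException("widthBits must be 1..62");
        long modulus = 1L << widthBits;
        return Math.floorMod(end - start, modulus);
    }

    /** Frequency estimate in Hz from an elapsed count over a gate time in seconds. */
    static double frequencyHz(long counts, double gateTimeS) {
        return counts / gateTimeS;
    }

    /** Quantisation limit (one count) expressed in Hz. */
    static double resolutionHz(double gateTimeS) {
        return 1.0 / gateTimeS;
    }

    public static void main(String[] args) {
        // Hypothetical 32-bit counter reads 10 ms apart.
        long counts = elapsedCounts(4_294_000_000L, 1_032_704L, 32);
        System.out.printf("f = %.1f Hz (+/- %.1f Hz)%n", frequencyHz(counts, 0.01), resolutionHz(0.01));
    }
}
```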
References
[1] Wikipedia, Synchronous circuit, Accessed at: March 12, 2019. [Online]. Available: https://en.wikipedia.org/wiki/Synchronous_circuit.
[2] J. Kalisz, “Review of methods for time interval measurements with picosecond resolution”, Metrologia, vol. 41, no. 1, IOP Publishing, Dec. 2003, pp. 17–32. DOI: 10.1088/0026-1394/41/1/004. [Online]. Available: https://doi.org/10.1088%2F0026-1394%2F41%2F1%2F004.
[3] A quick review of Time-to-Digital Converter architectures, Jul. 2016. [Online]. Available: https://transistorized.net/post/stdal/post56.htm.
[4] Helion Technology Limited, Physically Unclonable Function (PUF) in FPGA and ASIC, Accessed at: March 25, 2019. [Online]. Available: https://www.heliontech.com/puf.htm.
[5] J. J. L. Franco, E. Boemo, E. Castillo, and L. Parrilla, “Ring oscillators as thermal sensors in FPGAs: Experiments in low voltage”, in 2010 VI Southern Programmable Logic Conference (SPL), Mar. 2010, pp. 133–137. DOI: 10.1109/SPL.2010.5483027.
[6] Wikipedia, Xilinx Vivado, Accessed at: March 12, 2019. [Online]. Available: https://en.wikipedia.org/wiki/Xilinx_Vivado.
[7] Xilinx Inc., Vivado Design Suite Properties Reference Guide, version UG912 (v2018.3). [Online]. Available: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_3/ug912-vivado-properties.pdf.
[8] C. Lavin and A. Kaviani, “RapidWright: Enabling Custom Crafted Implementations for FPGAs”, in IEEE International Symposium on Field-Programmable Custom Computing Machines, April 29 – May 1, Boulder, CO, USA, 2018.
[9] Xilinx Inc., RapidWright. [Online]. Available: https://www.rapidwright.io/.
[10] David J. Griffiths, Introduction to Electrodynamics, 4th ed. Pearson, 2013, ISBN: 9781292021423.
[11] Wikipedia, Speed of electricity, Accessed at: March 26, 2019. [Online]. Available: https://en.wikipedia.org/wiki/Speed_of_electricity.
[12] H. W. Min Zhang and Y. Liu, “A 7.4 ps FPGA-based TDC with a 1024-unit measurement matrix”, Sensors (ISSN 1424-8220; CODEN: SENSC9), vol. 17, no. 4, Apr. 2017. DOI: 10.3390/s17040865.
[13] Xilinx Inc., Vivado Design Suite 7 Series FPGA and Zynq-7000 SoC Libraries Guide, version UG953 (v2018.2). [Online]. Available: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_2/ug953-vivado-7series-libraries.pdf.
[14] Xilinx Inc., Xilinx. [Online]. Available: https://www.xilinx.com/.
[15] Avnet Inc., ZedBoard Technical Specifications, Accessed at: March 6, 2019. [Online]. Available: http://zedboard.org/content/zedboard-0.
[16] Avnet Inc., ZedBoard, Accessed at: March 6, 2019. [Online]. Available: http://zedboard.org/product/zedboard.
[17] J. O. Smith III, Physical Audio Signal Processing: for Virtual Musical Instruments and Digital Audio Effects. W3K Publishing, Dec. 2010, ch. Tapped Delay Line (TDL), Accessed at: March 17, 2019, ISBN: 978-0974560724. [Online]. Available: https://www.dsprelated.com/freebooks/pasp/Tapped_Delay_Line_TDL.html.
[18] P. Sedcole and P. Y. K. Cheung, “Within-die delay variability in 90nm FPGAs and beyond”, in 2006 IEEE International Conference on Field Programmable Technology, Dec. 2006, pp. 97–104. DOI:
Appendix A
Proof of Concept Design
A simple architecture for developing the DEs was designed and partially implemented during this thesis, targeting the ZedBoard. The design describes a system that creates glitches and feeds them into a signal monitor based on a tapped delay line (see Figure A.1). It features a controller in the programmable logic that controls the signal generator, the delay line, and the programmed interconnect features (a demultiplexer is used to switch the system between RO mode and signal monitor mode).
Figure A.1: Block diagram of the proof-of-concept system, which was not fully implemented. Bold lines indicate combined signals and wires (buses).
The tapped delay line can also be placed in a loop with an inverter, turning it into a ring oscillator. This way, the system contains all the functions needed to measure the delay line without changing the programming of the logic. To separate the interconnect delay from that of the delay line, the interconnect delay must be measured as well. This is done using a second demultiplexer.
Figure A.2: The two modes in which the ring oscillator can be configured.
The controller in the programmable logic is connected to the processing system through the AXI interface of the device. A conventional program running on the processing system can serve as an interface between an external device (which can be connected over USB) and the system in the programmable logic. This way, an external computer can command the system and log the measured data.
Figure A.3 shows the architecture as a block diagram in Vivado. Not all modules in the diagram are implemented; the missing ones are non-functional black boxes.
Appendix B
Program Source Code
This appendix includes the developed Java code (using the RapidWright framework). Each program is the content of a .java file placed in an Eclipse project, together with the RapidWright source code.
B.1
Detour Routing Program
The following Java program uses the RapidWright framework to route the nets with a detour (as explained in Chapter 4).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Iterator;

import com.xilinx.rapidwright.design.Cell;
import com.xilinx.rapidwright.design.Design;
import com.xilinx.rapidwright.design.Net;
import com.xilinx.rapidwright.device.BEL;
import com.xilinx.rapidwright.device.Device;
import com.xilinx.rapidwright.device.PIP;
import com.xilinx.rapidwright.device.Site;
import com.xilinx.rapidwright.device.Tile;
import com.xilinx.rapidwright.device.Wire;
import com.xilinx.rapidwright.edif.EDIFCell;
import com.xilinx.rapidwright.router.RouteNode;

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DetourRouter_V3 {

    // Constants
    private final static String[] prefixChars = {"A", "B", "C", "D"};

    private final static String dcpInFile = "C:\\Users\\User\\Documents\\UT\\M12\\EXEC 1\\Project partially placed\\checkpoint.dcp";
    private static int detourTiles;

    private static Design design;

    private static Device device;

    // Change with each iteration of DEs
    private static int deIndex;
    private static Cell deSrcCell;
    private static EDIFCell deParentCell;
    private static Net deNet;
    private static Tile destinationTile;