Future work - Conclusions and Future Work

Conclusions and Future Work

6.1 Future work

We have successfully accelerated the non-bonded force computations to an extent that it is no longer the dominating function. There are a number of future directions for this research including accelerating the currently dominant neighborlist build function on the second FPGA. We showed the theoretical estimation of an un-optimized design of FPGA-based neighborlist build function in Section 4.2.1. The design had four pipelines and was estimated to be 6x slower than the half neighborlist build on the host (i.e. about 3x slower than a full neighborlist build function on the host). The design could be optimized in the future to reduce the number of iterations in the pipelines. The SRC-7 H MAP used in this research consists of 16 logical OBM banks, thus allowing 16 concurrent OBM accesses without any stalls in the pipeline.

With an FPGA-based system that would allow more concurrent data accesses (i.e.

more OBM banks), it will be possible to implement more concurrent neighborlist computation (pipelines) and thus achieve better performance. Another potential research direction is to use fixed-point arithmetic instead of floating-point and reduce the FPGA resource utilization, which may allow us to implement additional force computation pipelines in the Stratix II FPGA device.

Further, only one of the two cores available on the host is used in the current implementation. In the future, we plan to implement a threaded version of the FPGA-accelerated LAMMPS that will make use of both the cores available on the dual-core Xeon host and both the FPGAs available on the MAP processor. Such an implementation will be the most resource-efficient implementation on a single SRC-7 H MAP reconfigurable system. We then plan to extend this work to a cluster of MAPstations and study the effect of FPGA acceleration on the parallel execution of LAMMPS.

Bibliography

[1] H. Kai and B. A. Faye, Computer Architecture and parallel processing. McGraw-Hill Book Company, 1984.

[2] C. H. Crawford, P. Henning, M. Kistler, and C. Wright, “Accelerating computing with the cell broadband engine processor,” in CF ’08: Proceedings of the 5th conference on Computing frontiers, (New York, NY, USA), pp. 3–12, ACM, 2008.

[3] D. A. Case, T. E. Cheatham, T. Darden, H. Gohlke, R. Luo, K. M. Merz, A. Onufriev, C. Simmerling, B. Wang, and R. J. Woods, “The AMBER biomolecular simulation programs.,” Journal of computational chemistry, vol. 26, pp. 1668–1688, December 2005.

[4] S. R. Alam, P. K. Agarwal, M. C. Smith, J. S. Vetter, and D. Caliga, “Us-ing FPGA devices to accelerate biomolecular simulations,” Computer, vol. 40, pp. 66–73, March 2007.

[5] J. C. Phillips, G. Zheng, S. Kumar, and L. V. Kale, “NAMD: Biomolecular simulation on thousands of processors,” in Proceedings of SC 2002, pp. 1–18, 2002.

[6] J. E. Stone, J. C. Phillips, P. L. Freddolino, D. J. Hardy, L. G. Trabuco, and K. Schulten, “Accelerating molecular modeling applications with graphics pro-cessors,” Journal of Computational Chemistry, vol. 28, pp. 2618–2640, September 2007.

[7] N. Azizi, I. Kuon, A. Egier, A. Darabiha, and P. Chow, “Reconfigurable molec-ular dynamics simulator,” in Proceedings of 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2004., vol. 0, (Los Alamitos, CA, USA), pp. 197–206, IEEE Computer Society, April 2004.

[8] “The Transmogrifier-3,” [Online]. http://www.eecg.utoronto.ca/˜tm3/. [Ac-cessed:March 2010].

[9] R. Scrofano and V. K. Prasanna, “Computing Lennard-Jones potentials and forces with reconfigurable hardware,” in International Conference on Engineer-ing of Reconfigurable Systems and Algorithm, 2004.

[10] “Xilinx ML300,” [Online]. http://www.xilinx.com/products/boards/ml300/,”

[Accessed:March 2010]

[11] Y. Gu, T. VanCourt, and M. C. Herbordt, “Accelerating molecular dynam-ics simulations with configurable circuits,” in International Conference on Field Programmable Logic and Applications, 2005., vol. 0, (Los Alamitos, CA, USA), pp. 475–480, IEEE Computer Society, August 2005.

[12] “Wildstar II pro,” [Online]. http://www.annapmicro.com/wsiipace.html,”

March 2010.

[13] R. Scrofano, M. B. Gokhale, F. Trouw, and V. K. Prasanna, “Accelerating molec-ular dynamics simulations with reconfigurable computers,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 6, pp. 764–778, 2008.

[14] Y. Gu and M. C. Herbordt, “High performance molecular dynamics simula-tionswith FPGA coprocessors,” in RSSI ’07: Proceedings of the Reconfigurable Systems Summer Institute, 2007.

[15] T. Matthey, T. Cickovski, S. Hampton, A. Ko, Q. Ma, M. Nyerges, T. Raeder, T. Slabach, and J. A. Izaguirre, “Protomol, an object-oriented framework for pro-totyping novel algorithms for molecular dynamics,” ACM Trans. Math. Softw., vol. 30, pp. 237–265, September 2004.

[16] D. S. Poznanovic, “Application development on the src computers, inc. systems,”

in IPDPS ’05: Proceedings of the 19th IEEE International Parallel and Dis-tributed Processing Symposium (IPDPS’05) - Papers, (Washington, DC, USA), IEEE Computer Society, 2005.

[17] SRC-7 Carte v3.2 C Programming Environment Guide, June 2009.

[18] V. Kindratenko and D. Pointer, “A case study in porting a production scien-tific supercomputing application to a reconfigurable computer,” in FCCM ’06:

Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Cus-tom Computing Machines, (Washington, DC, USA), pp. 13–22, IEEE Computer Society, April 2006.

[19] P. M. Martin, “Acceleration methodology for the implementation of scientific application on reconfigurable hardware,” Master’s thesis, Clemson University, May 2009.

[20] “Xtremedata Inc,” [Online]. http://www.xtremedata.com, [Accessed:March 2010].

[21] “Impulse-C,” [Online]. http://www.impulseaccelerated.com, [Accessed:March 2010].

[22] “Introduction to the SRC-7 mapstation,” November 2007.

[23] J. Grotendorst and D. Marx, Quantum Simulations of Complex Many-Body Sys-tems: From Theory to Algorithms, Lecture Notes, vol. 10. John von Neumann Institute for Computing, 2002.

[24] M. Crowley, T. Darden, T. Cheatham, and D. Deerfield, “Adventures in improv-ing the scalimprov-ing and accuracy of a parallel molecular dynamics program,” Journal of Supercomputing, vol. 11, pp. 255–278, November 1997.

[25] “LAMMPS home page,” [Online]. http://lammps.sandia.gov, [Accessed:March 2010].

[26] “The AMBER molecular dynamics package,” [Online]. http://amber.scripps.edu, [Accessed:March 2010].

[27] S. Plimpton, R. Pollock, and M. Stevens, “Particle-mesh ewald and rRespa for parallel molecular dynamics simulations,” in In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997.

[28] G. Estrin, B. Bussell, R. Turn, and J. Bibb, “Parallel processing in a re-structurable computer system,” Electronic Computers, IEEE Transactions on, pp. 747–755, December 2006.

[29] K. Compton and S. Hauck, “Reconfigurable computing: a survey of systems and software,” ACM Computing Surveys, vol. 34, pp. 171–210, June 2002.

[30] “Stratix II device handbook, volume 1.,” [Online].

http://www.altera.com/literature/hb/stx2/stratix2 handbook.pdf, [Ac-cessed:April 2010].

[31] S. R. Alam, J. S. Vetter, and M. C. Smith, “An application specific memory characterization technique for co-processor accelerators,” pp. 353–358, March 2008.

In document Acceleration of Biomolecular Simulations using FPGA-based Reconfigurable Computing (Page 62-67)