We have presented CloudVal, a fault injection framework that supports injecting faults drawn from different fault models. The experiments demonstrate the use of this framework on KVM and Xen virtualization systems (managed by the virt-manager virtual machine manager) with soft error, guest misbehavior, performance fault, and maintenance fault models. The experimental results show that the presented fault injection mechanism and fault model design are a good starting point for developing a common benchmark for assessing cloud virtualization infrastructures.