Generalizing Architectures for Hardware Tags

All of the systems described above use hardware tags for dynamic analysis. Their common features include association of metadata with data at a fine granularity, and hardware maintenance and checking of that metadata. Analyses that interact with software additionally require software management of the policies governing the metadata, and a low-overhead mechanism for invoking a software handler for further analysis. Specifically, all these systems require that hardware maintain the metadata in order to keep performance overheads low, and that it perform periodic checks on the metadata at certain boundaries (defined by the system). When the analysis interacts with software, the system must provide a software handler that both manages the policies, to ensure flexibility and configurability, and performs further analysis in the case of a tag exception.

As Table 8.1 illustrates, the previously mentioned systems have two fundamental differences. First, not all systems require propagation of tags. While every analysis requires some kind of support for tag checks, only information flow analyses such as DIFT and profiling require support for propagation of tags. The second difference is the decoupling allowed between data and metadata. Some analyses such as DIFT do not require precise tag exceptions, allowing for the use of a coprocessor such as the one described in Chapter 5 to minimize changes required to be made to the main processor core.

A general architecture for tags must thus have the following features:

• Ability to associate metadata with every word of data in the system. Hardware should provide a fine-grained tag management scheme that allows the analysis to specify policies at the granularity of words, or even bytes, of memory. In addition, many analyses have shown that metadata exhibits significant spatial locality [24, 96]. The architecture must therefore also support a multi-granular tag management scheme that can specify metadata at coarser granularities, such as a page of data. This in turn calls for a flexible scheme for maintaining and caching tags, one that manages tags correctly in the caches when configured with the desired tag length.

• Hardware to perform low-level operations on the metadata. The hardware should store the metadata, and perform tag checks. In order for the architecture to be compliant with existing DRAM memory formats, it is necessary to maintain metadata on a separate page. This requires that the operating system be made aware of metadata in order to perform memory allocation and schedule memory swapping accordingly. Tag propagation and decoupling tag analyses onto a dedicated coprocessor are related issues that are not central to all analyses. The techniques described in Chapter 5 are applicable to any analysis that requires information flow propagation. Other analyses that do not fit the information flow paradigm could use a more generalized propagation mechanism such as that implemented in FlexiTaint [88], where software is responsible for setting the propagation policies on a per-instruction basis. While many analyses, such as those using pointer bits or full-empty bits, require tight coupling between data and tags, analyses such as DIFT allow for the decoupling of metadata processing. These analyses differ in the granularity of synchronization required between data and tags. Analyses that do not require synchronization on every instruction can be decoupled to a coprocessor. Analyses such as information flow control require support for precise exceptions. Decoupling such analyses would require that instruction commit be delayed until the metadata is processed and checked by the coprocessor. This is similar to the DIVA architecture for reliability, which shows that the performance overheads of such a scheme, while higher than that of the DIFT coprocessor described in Chapter 5, are acceptable under certain scenarios [3].

• Software management of metadata policies. As argued in Chapter 3, hardcoding policies in hardware restricts the adaptability and malleability of the analysis system. As illustrated by Table 8.1, many analysis systems require the ability to specify and configure the analysis policies in a software handler. Software policies can be encoded in hardware registers which in turn define the check (and if required, propagation) policies. In order to be able to apply an analysis routine on the operating system, the software handler must run in a special operating mode outside supervisor mode.

• Low-overhead hardware exceptions. Many analysis architectures require the ability to invoke the software handler to run further analysis, log data, or terminate the application as the case may be. The frequency of invocation of this handler is dependent upon the analysis chosen. In order to reduce the overhead of the software analysis routine, hardware must provide a low-overhead exception mechanism. Traditional exception mechanisms require context switch operations, which are very expensive. Running the software handler in the same address space as the application allows for an inexpensive transition to the analysis routine when a hardware check fails. This provides the system with the ability to run more complex analyses in software as required, extending its capabilities significantly.
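The multi-granular tag management described in the first feature above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class name `MultiGranularTags`, the 4 KB page size, the 4-byte word size, and the default tag of 0 are all hypothetical choices made for illustration. A page carries one coarse tag until some word in it needs its own tag, at which point a per-word entry overrides the page tag for that word only.

```python
PAGE_SIZE = 4096  # bytes per page (assumed for illustration)
WORD_SIZE = 4     # bytes per word (assumed for illustration)


class MultiGranularTags:
    """Toy tag store: one coarse tag per page, refined per word on demand."""

    def __init__(self):
        self.page_tags = {}  # page number -> tag shared by the whole page
        self.word_tags = {}  # page number -> {word index -> tag}

    def set_page_tag(self, addr, tag):
        """Tag a whole page at once; any per-word tags in it are dropped."""
        page = addr // PAGE_SIZE
        self.page_tags[page] = tag
        self.word_tags.pop(page, None)

    def set_word_tag(self, addr, tag):
        """Tag a single word, overriding the page tag for that word only."""
        page = addr // PAGE_SIZE
        word = (addr % PAGE_SIZE) // WORD_SIZE
        self.word_tags.setdefault(page, {})[word] = tag

    def get_tag(self, addr):
        """Return the word tag if one exists, else the page tag, else 0."""
        page = addr // PAGE_SIZE
        word = (addr % PAGE_SIZE) // WORD_SIZE
        fine = self.word_tags.get(page)
        if fine is not None and word in fine:
            return fine[word]
        return self.page_tags.get(page, 0)
```

In this model, pages whose words all share a tag cost a single entry, which is the storage saving that spatial locality of tags makes possible; only pages with mixed tags pay for word-level state.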

As mentioned earlier, features such as propagation of tags are not central to all analysis systems. The ability to incorporate such features is thus best provided by means of a decoupled coprocessor. This minimizes the changes to the main core, and allows the coprocessor to be updated easily depending upon the choice of analysis.
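The division of labor among mandatory tag checks, optional tag propagation, software-set policies, and a software handler can be sketched as a toy model. Everything named here is an assumption made for illustration: the class `TaggedCore`, the `propagate` and `check_mask` fields standing in for policy registers, the OR-based propagation rule, and the handler signature are hypothetical and do not reproduce the register layout of any system discussed in this chapter. The handler is an ordinary function call, which models an exception that does not require a context switch.

```python
class TaggedCore:
    """Toy core whose check and propagation rules are set by software."""

    def __init__(self, propagate, check_mask, handler):
        self.propagate = propagate    # policy: do results inherit source tags?
        self.check_mask = check_mask  # policy: tag bits that fail a check
        self.handler = handler        # software routine run on a failed check
        self.regs = {}                # register name -> (value, tag)

    def load_imm(self, dst, value, tag=0):
        """Place a value and its tag in a register."""
        self.regs[dst] = (value, tag)

    def add(self, dst, src1, src2):
        """Add two registers; the result tag is the OR of the source tags
        when propagation is enabled, and 0 otherwise."""
        v1, t1 = self.regs[src1]
        v2, t2 = self.regs[src2]
        tag = (t1 | t2) if self.propagate else 0
        self.regs[dst] = (v1 + v2, tag)

    def jump_indirect(self, src):
        """Check the target's tag; on failure, call the handler directly
        and return whatever it decides. Otherwise return the target."""
        value, tag = self.regs[src]
        if tag & self.check_mask:
            return self.handler("jump_indirect", src, value, tag)
        return value
```

An analysis that needs only checks would configure `propagate=False`, while an information flow analysis would enable it; in the architecture described above, the propagation half of this logic is the part that could live in a coprocessor.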

8.8 Related Work

While there has been significant work on adding analysis-specific microarchitectural features to systems [32, 35, 81], very few systems have focused on adding a configurable set of features that can be programmed to serve different needs. Consequently, chip designers are often loath to add such analysis-specific features to their designs, since they cannot be reused for other purposes. The log-based architecture [12, 13] is one such design that attempts to provide a set of hardware primitives that can be used to perform a variety of dynamic analyses. As explained in Chapter 5, this architecture offloads the functionality of the analysis to another core in a multi-core chip. The analysis is performed in a software dynamic binary translation environment. The core running the application generates a trace of executing instructions which is used by the analysis core. While this approach provides the flexibility to implement arbitrarily complex analyses in software, the hardware changes are invasive, and have a high area and performance overhead, as explained in Chapter 5.

Smart Memories [31, 56] is an architecture that provides configurability in memory controllers, and breaks down the on-chip memory system's functionality into a set of basic operations. The system also provides the necessary means for combining and sequencing these operations. This configurability allows the system to dynamically change the data communication protocol implemented by its memory controller. In order to provide this configurability, there are six metadata bits associated with every data word of memory whose functionality can be extensively programmed. The memory controller also has the ability to update these bits on a hardware access, and accesses them concurrently with data. Smart Memories used these bits to implement a variety of memory models by configuring them to implement cache line states, transaction read/write sets, or even fine-grained locks [56]. The system provides both the ability to associate metadata with every word of memory, and the support to maintain and manage this metadata. Combined with a software monitor for managing the metadata policies and a low-overhead hardware exception mechanism, it could potentially serve as a generalized architecture for metadata analysis.

8.9 Summary

Architectural support for dynamic analysis has been a fertile area of research. There have been many architectures proposed that make use of tags for dynamic analyses. For an architectural change to be practically viable to processor vendors, it must be applicable to a suite of applications, thus allowing for the cost of implementation to be amortized. Since most of the applications require a certain common subset of features to be implemented by the analysis system, it is possible to build a general tag architecture framework that can be used by a whole suite of analyses.

In this chapter, we surveyed some of the more common tag architectures, and codified the common primitives exposed by these systems, in order to obtain a blueprint of a generalized tag architecture. Such an architecture would maintain and manage tags in hardware, and manage policies in software, with a low-overhead tag exception mechanism. Other application-specific features such as propagation of tags could be optionally implemented in an off-core coprocessor similar to the one proposed in Chapter 5. This allows hardware vendors to amortize the cost and design complexity of tags over multiple processor designs, and use them for multiple analyses and applications, thereby decreasing the risk of implementation.

Conclusions

Dynamic Information Flow Tracking, or DIFT, is a powerful and flexible security technique that provides comprehensive protection against a variety of critical software threats. This dissertation demonstrated that a well-designed hardware DIFT system can protect unmodified applications, and even the operating system, from a wide range of vulnerabilities, with little or no performance, area, or cost penalty.

We developed Raksha, a flexible hardware DIFT platform that allows specification of DIFT security policies using software-managed tag policy registers. Raksha provides comprehensive protection against low-level memory corruption exploits such as buffer overflows and high-level semantic attacks such as SQL injections on unmodified applications, and even the operating system kernel. We built a full-system prototype of Raksha using a synthesizable SPARC V8 processor and an FPGA board, and demonstrated that the area and performance overheads of the Raksha architecture are minimal.

We developed a coprocessor-based DIFT architecture to address the practicality issue of implementing DIFT in the real world. Using a coprocessor that encapsulates all DIFT functionality greatly reduces the design and validation overheads of implementing DIFT in the main processor pipeline, and allows for easy reuse across different designs. We prototyped this architecture on a synthesizable SPARC V8 core on an FPGA board. This decoupled design had low performance overheads, and did not compromise the security of the DIFT approach.

We provided a practical and fast hardware solution to the problem of inconsistency between data and metadata in multiprocessor systems when DIFT functionality is decoupled from the main core. This solution leverages cache coherence mechanisms to record interleaving of memory operations from application threads and replays the same order on metadata processors to maintain consistency, thereby allowing correct execution of dynamic analysis on multithreaded programs.
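The record-and-replay idea in the preceding paragraph can be modeled in a few lines of software. This is a deliberately simplified sketch and not the hardware mechanism itself: the names `OrderLog` and `replay_tags` are hypothetical, the log here records every memory operation explicitly rather than deriving the order from cache coherence events, and a tag of 0 is assumed for untagged memory. The point it shows is that replaying tag updates in the recorded order lets each load observe the tag written by the same store whose data it observed.

```python
class OrderLog:
    """Records the global order in which threads' memory operations took effect."""

    def __init__(self):
        self.entries = []  # list of (thread_id, op, addr, src_tag)

    def record(self, thread_id, op, addr, src_tag=0):
        """Append one memory operation; src_tag is used only by stores."""
        self.entries.append((thread_id, op, addr, src_tag))


def replay_tags(log):
    """Replay the log in recorded order on the metadata side.

    Returns the final tag memory and, for each load, the tag it observed.
    """
    tags = {}       # address -> tag
    observed = []   # (thread_id, addr, tag seen by that load)
    for thread_id, op, addr, src_tag in log.entries:
        if op == "store":
            tags[addr] = src_tag
        elif op == "load":
            observed.append((thread_id, addr, tags.get(addr, 0)))
    return tags, observed
```

If the metadata side instead processed each thread's operations independently, a load could pair new data with a stale tag, which is the inconsistency the recorded order rules out.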

We also explored using tagged memory architectures to solve security problems other than DIFT. We showed that HiStar, an existing operating system, could take advantage of a tagged memory architecture to enforce its information flow control policies directly in hardware, and thereby reduce the amount of trusted code in its kernel by over a factor of two. Using a full-system prototype built with a synthesizable SPARC core and an FPGA board, we showed that the overheads of such an architecture are minimal.

9.1 Future Work

While there has been significant interest in DIFT in academia, there remain several challenges to the widespread adoption of DIFT in the real world. More study is required to determine what security policies scale to enterprise environments, and what the necessary configurations are. There has also been very little work in exposing APIs that allow system administrators to easily express their security policies in terms of DIFT mechanisms. Additionally, some web-based vulnerabilities will benefit greatly from DIFT support in the language. Very little is known about the implications of adding DIFT support to an existing language [22].

tags. While Chapter 8 identified some critical features required by different dynamic analyses, no current architecture is flexible enough to accommodate all the different requirements of these applications. This would require a flexible software interface, and APIs to allow system administrators and even application developers to specify their policies that would be directly enforced by the hardware. Such a design would also require the ability to run multiple orthogonal analyses simultaneously with minimal performance and power penalties. Multiplexing different policies on the same tag bits would reduce the storage overhead required, but would impose other correctness and performance challenges on the system. Progress in these areas would be an excellent first step in promoting industry-wide adoption of DIFT and hardware analysis techniques.
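As a point of comparison for the multiplexing question raised above, the simpler alternative is to give each analysis its own disjoint field within the tag. The sketch below illustrates that partitioned baseline, not multiplexing of shared bits; the class `TagLayout`, the field names, and the 4-bit tag width used in the example are hypothetical choices for illustration. It makes the storage trade-off concrete: every additional analysis consumes tag bits, and allocation fails once the tag is full.

```python
class TagLayout:
    """Toy allocator that partitions a fixed-width tag into disjoint fields."""

    def __init__(self, tag_width):
        self.tag_width = tag_width
        self.fields = {}   # analysis name -> (shift, width)
        self.next_bit = 0  # first unallocated bit position

    def allocate(self, name, width):
        """Reserve `width` bits for one analysis, or fail if none are left."""
        if self.next_bit + width > self.tag_width:
            raise ValueError("not enough tag bits for " + name)
        self.fields[name] = (self.next_bit, width)
        self.next_bit += width

    def get(self, tag, name):
        """Extract one analysis's field from a full tag."""
        shift, width = self.fields[name]
        return (tag >> shift) & ((1 << width) - 1)

    def set(self, tag, name, value):
        """Return a new tag with one field replaced and the others untouched."""
        shift, width = self.fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in field " + name)
        mask = ((1 << width) - 1) << shift
        return (tag & ~mask) | (value << shift)
```

With a 4-bit tag, a 1-bit taint field and a 2-bit field for a second analysis leave a single bit free, so a third 2-bit analysis cannot be added without either widening the tag or sharing bits between policies.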

[1] AMD. AMD I/O Virtualization Technology Specification, 2007.

[2] AMD. AMD Lightweight Profiling Proposal. http://developer.amd.com/assets/HardwareExtensionsforLightweightProfilingPublic20070720.pdf, 2007.

[3] Todd Austin. DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design. In the Proc. of the 32nd International Symposium on Microarchitecture (MICRO), Haifa, Israel, November 1999.

[4] David E. Bell and Leonard LaPadula. Secure computer system: Unified exposition and Multics interpretation. Technical Report MTR-2997, Rev. 1, MITRE Corp., Bedford, MA, March 1976.

[5] Fabrice Bellard. QEMU, a fast and portable dynamic translator. In Proc. of the 2005 USENIX, Freenix track, Anaheim, CA, April 2005.

[6] Kenneth J. Biba. Integrity considerations for secure computer systems. Technical Report TR-3153, MITRE Corp., Bedford, MA, April 1977.

[7] Christian Bienia, Sanjeev Kumar, and Kai Li. PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors. In the Proc. of the 2008 International Symposium on Workload Characterization (IISWC), Seattle, WA, 2008.

[8] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In the Proc. of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT), Toronto, Canada, October 2008.

[9] Edson Borin, Cheng Wang, Youfeng Wu, and Guido Araujo. Software-based Transparent and Comprehensive Control-flow Error Detection. In the Proc. of the 4th Intl. Symp. Code Generation and Optimization (CGO), New York, NY, March 2006.

[10] The Burroughs 5500 computer architecture.

[11] CERT Coordination Center. Overview of attack trends. http://www.cert.org/archive/pdf/attack_trends.pdf, 2002.

[12] Shimin Chen, Babak Falsafi, et al. Logs and Lifeguards: Accelerating Dynamic Program Monitoring. Technical Report IRP-TR-06-05, Intel Research, Pittsburgh, PA, 2006.

[13] Shimin Chen, Michael Kozuch, Theodoros Strigkos, Babak Falsafi, Phillip B. Gibbons, Todd C. Mowry, Vijaya Ramachandran, Olatunji Ruwase, Michael Ryan, and Evangelos Vlachos. Flexible Hardware Acceleration for Instruction-Grain Program Monitoring. In the Proc. of the 35th International Symposium on Computer Architecture (ISCA), Beijing, China, June 2008.

[14] Shuo Chen, Jun Xu, Nithin Nakka, Zbigniew Kalbarczyk, and Ravishankar Iyer. Defeating Memory Corruption Attacks via Pointer Taintedness Detection. In the Proc. of the 35th International Conference on Dependable Systems and Networks (DSN), Yokohama, Japan, June 2005.

[15] Shuo Chen, Jun Xu, Emre C. Sezer, Prachi Gauriar, and Ravishankar K. Iyer. Non-Control-Data Attacks Are Realistic Threats. In the Proc. of the 14th USENIX Security Symposium, Baltimore, MD, August 2005.

[16] Andy Chou, Junfeng Yang, Benjamin Chelf, and Dawson Engler. An empirical study of operating system errors. In the Proc. of the 18th ACM Symposium on Operating Systems Principles (SOSP), 2001.

[17] Jim Chow, Ben Pfaff, Tal Garfinkel, Kevin Christopher, and Mendel Rosenblum. Understanding Data Lifetime via Whole system Simulation. In the Proc. of the 13th USENIX Security Conference, August 2004.

[18] JaeWoong Chung, Michael Dalton, Hari Kannan, and Christos Kozyrakis. Thread-Safe Dynamic Binary Translation using Transactional Memory. In the Proc. of the 14th International Conference on High-Performance Computer Architecture (HPCA), Salt Lake City, UT, February 2008.

[19] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-end containment of internet worms. In the Proc. of the 20th ACM Symposium on Operating Systems Principles (SOSP), Brighton, UK, October 2005.

[20] Jedidiah R. Crandall and Frederic T. Chong. MINOS: Control Data Attack Prevention Orthogonal to Memory Model. In the Proc. of the 37th International Symposium on Microarchitecture (MICRO), Portland, OR, December 2004.

[21] Cross-Compiled Linux From Scratch. http://cross-lfs.org.

[22] Michael Dalton. The Design and Implementation of Dynamic Information Flow Tracking Systems For Software Security. PhD thesis, Stanford University, December 2009.

[23] Michael Dalton, Hari Kannan, and Christos Kozyrakis. Deconstructing Hardware Architectures for Security. In the 5th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD), Boston, MA, June 2006.

[24] Michael Dalton, Hari Kannan, and Christos Kozyrakis. Raksha: A Flexible Information Flow Architecture for Software Security. In the Proc. of the 34th International Symposium on Computer Architecture (ISCA), San Diego, CA, June 2007.

[25] Michael Dalton, Hari Kannan, and Christos Kozyrakis. Real-World Buffer Overflow Protection for Userspace and Kernelspace. In the Proc. of the 17th Usenix Security Symposium, San Jose, CA, July 2008.

[26] Michael Dalton, Christos Kozyrakis, and Nickolai Zeldovich. Nemesis: Preventing Authentication and Access Control Vulnerabilities in Web Applications. In the Proc. of the 18th Usenix Security Symposium, Montreal, QC, August 2009.

[27] David Culler, Jaswinder Pal Singh, Anoop Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1998.

[28] Dorothy E. Denning and Peter J. Denning. Certification of programs for secure information flow. Communications of the ACM, 20(7), 1977.

[29] E. Marcus and H. Stern. Blueprints for High Availability. John Wiley and Sons, 2000.

[30] Petros Efstathopoulos, Maxwell Krohn, Steve VanDeBogart, Cliff Frey, David Ziegler, Eddie Kohler, David Mazières, Frans Kaashoek, and Robert Morris. Labels and event processes in the Asbestos operating system. In the Proc. of the 20th ACM Symposium on Operating Systems Principles (SOSP), Brighton, UK, October 2005.

[31] Amin Firoozshahian, Alex Solomatnikov, Ofer Shacham, Zain Asgar, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz. A Memory System Design Framework: Creating Smart Memories. In the Proc. of the 36th International Symposium on Computer Architecture (ISCA), Austin, TX, June 2009.

[32] George Davison, Constantine Pavlakos, Claudio Silva. Final Report for the Tera Computer TTI CRADA. Sandia National Labs Report SAND97-0134, January 1997.

[33] Vivek Haldar, Deepak Chandra, and Michael Franz. Dynamic taint propagation for Java. Computer Security Applications Conference, Annual, 0:303–311, 2005.

[34] Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun. Transactional memory coherence and consistency. In the Proc. of the 31st International Symposium on Computer Architecture (ISCA), München, Germany, June 2004.

[35] IBM Corporation. IBM system i. http://www-03.ibm.com/systems/i.

[36] Imperva Inc. How Safe is it Out There: Zeroing in on the vulnerabilities of application security. http://www.imperva.com/company/news/2004-feb-02.html, 2004.

[37] Intel. Intel Itanium Architecture Software Developer’s Manual.

[38] Intel Corporation. Intel i960 processors. http://developer.intel.com/design/i960/.

[39] Intel Virtualization Technology (Intel VTx). http://www.intel.com/technology/virtualization.

[40] Hari Kannan. Ordering Decoupled Metadata Accesses in Multiprocessors. In the Proc. of the 42nd International Conference on Microarchitecture (MICRO), New York City, NY, December 2009.

[41] Hari Kannan, Michael Dalton, and Christos Kozyrakis. Raksha: A Flexible Architecture for Software Security. In the Technical Record of the 19th Hot Chips Symposium, Stanford, CA, August 2007.

[42] Hari Kannan, Michael Dalton, and Christos Kozyrakis. Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor. In the Proc. of the 39th Inter-
