ECS 235A Project - NVD Visualization Using TreeMaps
Kevin Griffin
December 12, 2013
1 Introduction
The National Vulnerability Database (NVD) is a continuously updated United States Government repository of vulnerability data [2]. The repository contains a large set of data dating from around 1997 to the present. The NVD is also a multivariate dataset containing attributes such as vulnerability score, attack vector, access complexity, and integrity impact. The NVD website provides an interface that lets users who already have a priori knowledge and clues conduct targeted searches of the underlying data. There are also applications, like Nessus (http://www.tenable.com/products/nessus), that use various components of this data. However, what is missing is a way to explore and visualize the underlying dataset, without a priori knowledge and clues, to find trends and vulnerabilities of interest for analysis and hypothesis generation.
Traditional visualizations fall short for two main reasons. First, visualization components like bar, line, and pie charts are not space-filling, which allows only a very limited amount of data to be visualized at once. This is an issue with the NVD since it contains over fifteen years of vulnerability data. Second, most traditional visualizations can typically handle only a single attribute of the data at a time. The NVD is a multivariate dataset that reveals a lot of information to the user when subsets of these attributes are visualized together.
The purpose of this research is to demonstrate how a lesser known and less utilized visualization, the treemap [3] [7] [8], can overcome the shortcomings of traditional visualizations: it can visualize large datasets, because it is a space-filling visualization that can use the entire display space, and it is able to handle multivariate data. Multivariate data is visualized with treemaps by mapping the various attributes of the NVD data to the various visual attributes of the treemap, such as size, shape, color, and height.
The main contributions of this project are:
1. Understanding Treemap's utility for visualizing large data sets
2. Measuring Treemap's utility for visualizing multivariate data
3. Showing Treemap's advantages over traditional visualizations (e.g., line and bar charts)
4. Visual Analysis Tool. The current system provides a simple, interactive visual analysis environment to explore the NVD data.
• Coordinated Visualization Views. The system consists of a main overview, which uses a treemap (a visualization invented in the early 1990s by Ben Shneiderman at the University of Maryland), and two secondary bar chart views. All of these views are integrated together and allow the user to perform detailed analysis of the NVD data.
• Filtering. Programmatic filtering of the NVD data has been implemented and is based on the year the vulnerability of interest was discovered. Future enhancements will allow the user to filter on other attributes of the data, like vendor, product, and access complexity, in real time from the user interface. This will give the user the ability to explore the underlying data, without a priori knowledge, to find trends and vulnerabilities of interest for analysis and hypothesis generation.
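The year-based filtering described above amounts to a simple predicate over the vulnerability records. The following Python sketch illustrates the idea; the `Vulnerability` record, its fields, and `filter_by_year` are hypothetical names chosen for illustration, not the project's actual code (which may just as well have filtered in SQL).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Vulnerability:
    """Hypothetical in-memory record for one NVD entry."""
    cve_id: str
    vendor: str
    product: str
    year: int      # year the vulnerability was discovered/published
    score: float   # CVSS base score, 0.0 to 10.0


def filter_by_year(vulns: Iterable[Vulnerability], start: int,
                   end: Optional[int] = None) -> List[Vulnerability]:
    """Keep vulnerabilities whose year lies in [start, end], inclusive.

    If end is omitted, only vulnerabilities from the single year `start` are kept.
    """
    if end is None:
        end = start
    return [v for v in vulns if start <= v.year <= end]


if __name__ == "__main__":
    sample = [  # hypothetical records, not real NVD data
        Vulnerability("CVE-EXAMPLE-1", "vendor_a", "product_x", 2011, 7.5),
        Vulnerability("CVE-EXAMPLE-2", "vendor_a", "product_y", 2013, 4.3),
        Vulnerability("CVE-EXAMPLE-3", "vendor_b", "product_z", 2012, 9.3),
    ]
    print([v.cve_id for v in filter_by_year(sample, 2011, 2012)])
    # ['CVE-EXAMPLE-1', 'CVE-EXAMPLE-3']
```

Filtering on other attributes in real time would follow the same pattern, with the predicate built from the user's current selections.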
2 Related Work
The work done by [5] uses the NVD along with other security data (Nessus scans, router configurations, and firewall rules) to create custom security metrics (Patch Risk, Criticality, Security Score, Time Series) and visualizes them using scatter graphs, pie charts, ring graphs, bar charts, histograms, and quartiles (see Figure 1). They also provide a modest what-if visual analysis of security changes to the computers and networks.
Figure 1: Automatic Security Analysis Dashboard
The Scientific Applications & Visualization Group within the National Institute of Standards and Technology (NIST) created a tool, NVDvis (see Figure 2), that reads the latest version of the National Vulnerability Database [4]. The user can choose Common Vulnerabilities and Exposures (CVE) 1.2 or 2.0. The tool performs an initial analysis that is displayed in its Data Analysis pane. The pane shows which CVE database was selected and how many entries it contains, along with the average vulnerability score and the distribution of the scores. NVDvis also gives the number of elements and the percentage for each value of the six attributes that make up the score, as well as for the part, the Common Weakness Enumeration Identifier (CWE-ID), and the distribution of date-time. The tool enables the user to:
1. Filter the data in a variety of ways. NVDvis can filter on the vulnerability score as well as the six attributes that contribute to the score: access vector, access complexity, authentication, confidentiality impact, integrity impact, and availability impact. It also provides access to Part (application, hardware, operating system), CWE-ID, date-time, and vendor. After each filtering operation, both the Data Analysis pane and the visualization are updated.
2. Plot the data with parallel coordinates. These plots are a way to visualize multidimensional data. They were invented by Alfred Inselberg, who has a tutorial online. The NVDvis visualization can be viewed both on the desktop and in NIST's immersive environment.
3. Output data in CSV, ARFF, or binary format for further analysis.
Figure 2: NVDvis
Other visualization work using this type of data has been primarily in the form of attack graphs. The work by [6] is an example of this type of work. CVE data, which is a subset of the NVD data, is used to identify hosts in a network that have vulnerabilities. An attack graph is then generated that shows the sequence of hosts that an attacker can exploit to gain access to a system. Figure 3 illustrates this type of visualization with the CVE data overlaid on the graph.
Figure 3: Attack Graph
3 System Architecture
The overall system architecture is illustrated in Figure 4. The database is initially populated with data from the NVD XML data feed with Common Vulnerability Scoring System (CVSS) and Common Platform Enumeration (CPE) mappings (version 2.0). Each year's published vulnerabilities are kept in an XML file named nvdcve-2.0-[year|recent|modified].xml, where year ranges from 2002 to 2013. The file nvdcve-2.0-[year].xml contains all of the vulnerabilities found in that year, nvdcve-2.0-recent.xml contains all of the recently published vulnerabilities, and nvdcve-2.0-modified.xml contains all of the recently published and recently updated vulnerabilities. The files are parsed using a SAX parser and inserted into a MySQL (http://www.mysql.com) database. The complete dataset contains over sixteen years of vulnerability data totaling more than 1.5 million database records. Finally, once the view is ready to be made visible, the Viz Pre-Processor formats the data, places it into an appropriate data structure, and hands it off to the visualization interface.
Figure 4: System Architecture
3.1 Data Storage
The data is stored in a MySQL database using the schema shown in Figure 5. The entity table contains most of the data parsed from the XML file, except for the vulnerable software information and the CWE identifiers. The entity table contains over 58,000 records. The software table stores, along with other attributes, the names of the vendors, the vendors' products, and the product versions affected by the vulnerabilities stored in the entity table. The software table contains over 148,000 records. The entity software join table maps each CVE vulnerability in the entity table to the vulnerable product in the software table. This table is the largest, with approximately 1.6 million entries.
Figure 5: Database Schema
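The ingestion path of Figure 4 and the three-table layout of Section 3.1 can be sketched end to end in Python. This is a minimal sketch under several assumptions: it uses the standard library's `xml.sax` and `sqlite3` in place of the project's parser and MySQL, it parses a simplified, namespace-free stand-in for the NVD 2.0 feed (the real feed uses XML namespaces and many more elements), and the table and column names are hypothetical rather than copied from Figure 5.

```python
import sqlite3
import xml.sax

# Simplified, namespace-free stand-in for an NVD 2.0 feed (hypothetical data).
SAMPLE_FEED = b"""<nvd>
  <entry id="CVE-EXAMPLE-1"><score>7.5</score><published>2013-01-15</published>
    <product>cpe:/a:vendor_a:product_x:1.0</product>
    <product>cpe:/a:vendor_a:product_x:1.1</product></entry>
  <entry id="CVE-EXAMPLE-2"><score>4.3</score><published>2013-02-01</published>
    <product>cpe:/a:vendor_b:product_y:2.0</product></entry>
</nvd>"""

# Hypothetical three-table layout mirroring entity / software / join table.
SCHEMA = """
CREATE TABLE entity (cve_id TEXT PRIMARY KEY, score REAL, published TEXT);
CREATE TABLE software (id INTEGER PRIMARY KEY, vendor TEXT, product TEXT,
                       version TEXT, UNIQUE (vendor, product, version));
CREATE TABLE entity_software (cve_id TEXT REFERENCES entity (cve_id),
                              software_id INTEGER REFERENCES software (id));
"""


class EntryHandler(xml.sax.ContentHandler):
    """Streams through the feed, collecting one dict per <entry>."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None
        self._text = []

    def startElement(self, name, attrs):
        if name == "entry":
            self._current = {"id": attrs.get("id"), "score": None,
                             "published": None, "products": []}
        self._text = []  # start collecting text for this element

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._current is None:
            return
        text = "".join(self._text).strip()
        if name == "score":
            self._current["score"] = float(text)
        elif name == "published":
            self._current["published"] = text
        elif name == "product":
            self._current["products"].append(text)
        elif name == "entry":
            self.entries.append(self._current)
            self._current = None


def parse_cpe(cpe):
    """Split a CPE 2.2 URI such as cpe:/a:vendor:product:version."""
    parts = cpe.split(":")
    padded = parts + [""] * (5 - len(parts))
    return padded[2], padded[3], padded[4]


def load(feed_bytes, conn):
    """Parse the feed with SAX, then insert entities, software rows, and links."""
    handler = EntryHandler()
    xml.sax.parseString(feed_bytes, handler)
    conn.executescript(SCHEMA)
    for e in handler.entries:
        conn.execute("INSERT INTO entity VALUES (?, ?, ?)",
                     (e["id"], e["score"], e["published"]))
        for cpe in e["products"]:
            key = parse_cpe(cpe)
            conn.execute("INSERT OR IGNORE INTO software (vendor, product, version) "
                         "VALUES (?, ?, ?)", key)
            (software_id,) = conn.execute(
                "SELECT id FROM software WHERE vendor = ? AND product = ? AND version = ?",
                key).fetchone()
            conn.execute("INSERT INTO entity_software VALUES (?, ?)",
                         (e["id"], software_id))
    conn.commit()
    return len(handler.entries)


def vulns_per_vendor(conn):
    """Distinct CVEs per vendor, resolved through the join table."""
    return conn.execute(
        "SELECT s.vendor, COUNT(DISTINCT es.cve_id) FROM entity_software es "
        "JOIN software s ON s.id = es.software_id "
        "GROUP BY s.vendor ORDER BY 2 DESC, s.vendor").fetchall()
```

Calling `load(SAMPLE_FEED, sqlite3.connect(":memory:"))` inserts two entities, three software rows, and three links. Because one CVE can affect many product versions, the join table grows fastest, which is consistent with it being the largest table described above. A SAX parser is a natural fit here because it streams the large yearly files instead of building the whole document tree in memory.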
3.2 Visualizing Large Data Sets
As Figure 6 shows, treemaps are very good at displaying large datasets because of their space-filling characteristics. The treemap visualization on the left displays over 10,000 software products. In contrast, the bar charts on the right, top and bottom, display only 20 products/vendors combined. Increasing that number to just 100 makes the two bar chart visualizations almost unreadable.
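The space-filling property comes from the layout algorithm: every leaf receives a rectangle whose area is proportional to its weight, and the rectangles tile the display exactly. A minimal sketch of the original slice-and-dice layout from [7] follows. The `Node` class and the example hierarchy are hypothetical, and this is not necessarily the layout the project's treemap used.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Node:
    """Hypothetical tree node: internal nodes group, leaves carry a size."""
    name: str
    size: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def weight(self) -> float:
        # A leaf's weight is its size; an internal node's is the sum of its children.
        if not self.children:
            return self.size
        return sum(child.weight() for child in self.children)


def slice_and_dice(node: Node, rect: Rect, depth: int = 0,
                   out: Optional[List[Tuple[str, Rect]]] = None) -> List[Tuple[str, Rect]]:
    """Slice-and-dice layout: split along x at even depths and along y at odd
    depths, giving each child a strip proportional to its weight."""
    if out is None:
        out = []
    x, y, w, h = rect
    if not node.children:
        out.append((node.name, rect))
        return out
    total = node.weight()
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        frac = child.weight() / total if total > 0 else 0.0
        if depth % 2 == 0:
            child_rect = (x + offset * w, y, frac * w, h)
        else:
            child_rect = (x, y + offset * h, w, frac * h)
        offset += frac
        slice_and_dice(child, child_rect, depth + 1, out)
    return out
```

For example, a vendor/product hierarchy with product sizes 3 and 1 under one vendor and 4 under another, laid out in a 100 × 100 square, yields rectangles of area 3750, 1250, and 5000: the whole display is used and area tracks weight. Plain slice-and-dice tends to produce thin slivers when there are many nodes, which is one motivation for alternative layouts such as the ordered treemaps of [3].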
Figure 6: Microsoft
3.3 Visualizing Multivariate Data
As stated earlier, the NVD is a multivariate dataset. With multivariate data, a subset of the attributes must be visualized together before the user can start extracting useful meaning from the underlying dataset. For example, Figure 7 shows vulnerability data for Apple in both the treemap display on the left and the bar chart at the top right. The bar chart gives the vulnerability count for each Apple product. While this gives the user some information, it falls short of providing a complete understanding of the underlying data. In particular, it does not answer questions such as: What types of vulnerabilities are they? How many vulnerabilities were severe (root access), and how many were just minor nuisances? Which vulnerabilities are easy to exploit? If the size of each treemap node indicates how difficult or easy a vulnerability is to exploit, and its color (red = severe, green = minor) indicates the severity of the exploit, the user starts to get a better understanding of the underlying NVD dataset. At a glance, the user gets a rough idea of how many severe vulnerabilities each product has, how easy they are to exploit, and how the vulnerabilities of each product compare to each other. Mapping further attributes to the height of each node would give an even richer visual interpretation of the dataset. Because several dataset attributes can be mapped to treemap attributes at once, treemaps convey far more of the meaning of the underlying dataset than bar charts, which show a single count per category.
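As a concrete example of such a mapping, the sketch below uses the CVSS version 2 exploitability subscore (20 × AccessVector × AccessComplexity × Authentication) for node size and the CVSS base score for a green-to-red node color, along with NVD's Low/Medium/High severity bands. The metric weights come from the CVSS v2 specification; the choice of these particular attributes, the linear color ramp, and the function names are assumptions for illustration, not necessarily what the project's tool used.

```python
from typing import Dict, Tuple

# CVSS v2 base-metric weights used by the exploitability subscore.
ACCESS_VECTOR = {"L": 0.395, "A": 0.646, "N": 1.0}       # local, adjacent network, network
ACCESS_COMPLEXITY = {"H": 0.35, "M": 0.61, "L": 0.71}    # high, medium, low
AUTHENTICATION = {"M": 0.45, "S": 0.56, "N": 0.704}      # multiple, single, none


def exploitability(av: str, ac: str, au: str) -> float:
    """CVSS v2 exploitability subscore, 20 * AV * AC * Au (about 1.2 to 10.0)."""
    return 20 * ACCESS_VECTOR[av] * ACCESS_COMPLEXITY[ac] * AUTHENTICATION[au]


def severity(base_score: float) -> str:
    """NVD severity bands for CVSS v2 base scores."""
    if base_score >= 7.0:
        return "HIGH"
    if base_score >= 4.0:
        return "MEDIUM"
    return "LOW"


def score_to_rgb(base_score: float) -> Tuple[int, int, int]:
    """Hypothetical linear ramp: 0.0 is pure green, 10.0 is pure red."""
    t = max(0.0, min(base_score, 10.0)) / 10.0
    return (round(255 * t), round(255 * (1 - t)), 0)


def node_visuals(av: str, ac: str, au: str, base_score: float) -> Dict[str, object]:
    """Map NVD attributes onto treemap channels: size and color (plus a label)."""
    return {"size": exploitability(av, ac, au),
            "color": score_to_rgb(base_score),
            "severity": severity(base_score)}
```

A remotely exploitable, low-complexity, unauthenticated vulnerability (`"N", "L", "N"`) gets the largest possible node (about 10.0), so easy-to-exploit, high-severity issues appear as large red rectangles. A third attribute could be mapped to node height in the same way.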
Figure 7: Apple
3.4 Visual Analysis Tool
3.4.1 Overview
The visual analysis tool was designed using a treemap visualization as its main display, with coordinated bar chart views that provide detailed information on selected nodes (see Figure 8). The treemap visualization uses two levels of grouping. The main grouping is based on the vendor (e.g., Microsoft), and the subgrouping is based on the vendor's product (e.g., Internet Explorer). Each node in the treemap represents a single pairing of one vulnerability with one vendor product. A semi-transparent tooltip dialog shows additional details for each node as the user probes the treemap. The top right bar chart provides the vulnerability count for the selected vendor's top ten products. The bottom right bar chart provides the overall vulnerability count for the top ten vendors. The JFreeChart [1] API was used to implement the bar charts.
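The vendor/product grouping and the two coordinated bar charts can all be derived from the same rows, which is roughly the job of the Viz Pre-Processor in Figure 4. The Python sketch below builds the vendor → product → vulnerability hierarchy and the top-ten counts for both bar charts. The row format and function names are hypothetical, and counting distinct CVEs per vendor (so a CVE that affects several of a vendor's products counts once) is a design assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # hypothetical (cve_id, vendor, product), one row per affected product


def build_hierarchy(rows: Iterable[Row]) -> Dict[str, Dict[str, List[str]]]:
    """vendor -> product -> [cve_id, ...]; each leaf is one vulnerability/product pair."""
    tree: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for cve_id, vendor, product in rows:
        tree[vendor][product].append(cve_id)
    return tree


def top_products(tree, vendor: str, n: int = 10) -> List[Tuple[str, int]]:
    """Top-right chart: the selected vendor's n products with the most vulnerabilities."""
    counts = Counter({p: len(cves) for p, cves in tree.get(vendor, {}).items()})
    return counts.most_common(n)


def top_vendors(tree, n: int = 10) -> List[Tuple[str, int]]:
    """Bottom-right chart: the n vendors with the most distinct CVEs."""
    counts = Counter({v: len({c for cves in prods.values() for c in cves})
                      for v, prods in tree.items()})
    return counts.most_common(n)
```

Selecting a vendor node in the treemap would then only require calling `top_products` with that vendor's name to refresh the top-right chart, while `top_vendors` can be computed once per filter setting.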
Figure 8: NVD Visualization
3.4.2 Future Work
Real-Time Filtering:
Currently the data is filtered only by the vulnerability discovery year. A very useful enhancement would be to let the user filter the data in real time on the various attributes of the dataset. The complete list of attributes to filter on can be found in the NVD XSD file (nvd.nist.gov/schema/nvd-cve-feed 2.0.xsd).
Automated Analysis:
Future work in this area will include automatically inferring trends and patterns in the data. Important things to infer would be:
• Vendors/Products that are the worst/best at providing a particular capability (e.g., a web server)
• Products that are potentially targets of the next round of zero-day exploits
• The Vendors/Products most susceptible to a certain type of exploit (e.g., buffer overflow)
TreeMap Enhancements:
Additional enhancements to the treemap include mapping dataset attributes to the height of the treemap nodes, semantic zooming, the ability to drill up/down on a particular group or subgroup of the treemap, and ordering the treemap nodes based on node characteristics such as size.
4 Conclusion
This project allowed me to experiment with visualizing a large, multivariate dataset using treemaps. The preliminary results showed some of the advantages of using treemaps over traditional visualizations. In particular, treemaps proved to be very effective at visualizing large quantities of data and at providing a more accurate visual interpretation of the underlying dataset. Future enhancements will provide a more robust exploration and visualization capability for the National Vulnerability Database.
References
[1] JFreeChart. http://www.jfree.org/jfreechart/.
[2] National Vulnerability Database (NVD). http://nvd.nist.gov/home.cfm.
[3] Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg. Ordered and quantum treemaps: Making effective use of 2d space to display hierarchies. ACM Trans. Graph., 21(4):833–854, 2002.
[4] John Hagedorn, Styvens Belloge, Terence Griffin, Sandy Ressler, Judith E. Terrill, and Kevin Rawlings. Visualization and analysis of the National Vulnerability Database. http://www.nist.gov/itl/math/hpcvg/nvdvis.cfm.
[5] Sun Kun, S. Jajodia, J. Li, Cheng Yi, Tang Wei, and A. Singhal. Automatic security analysis using security metrics. In Military Communications Conference (MILCOM 2011), pages 1207–1212, 2011.
[6] O. Sheyner and J. Wing. Tools for generating and analyzing attack graphs. In Formal Methods for Components and Objects, pages 344–371. Springer.
[7] Ben Shneiderman. Tree visualization with tree-maps: 2-d space-filling approach. ACM Trans. Graph., 11(1):92–99, 1992.
[8] Ben Shneiderman. Treemaps for space-constrained visualization of hierarchies, 2009.