01
Install ggplot2
First run R. The ggplot2 R package isn't installed by default, so check whether you already have it installed by running:
> require(ggplot2)
Loading required package: ggplot2
If it isn't installed, require() returns FALSE with a warning instead. In that case, download and install it with:
> install.packages("ggplot2")
If you run the library() function without arguments, you'll get a list of installed packages. For more detailed output, run installed.packages() without any arguments.
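A common idiom, sketched below, combines the check and the installation so it can sit at the top of a script: requireNamespace() tests whether a package is available without attaching it, and install.packages() runs only when it isn't. packageVersion() then reports which version you have, which is worth knowing because some ggplot2 arguments used later in this tutorial changed between releases.

```r
# Install ggplot2 only if it is missing, then attach it.
# requireNamespace() returns TRUE or FALSE without attaching the package.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Report the installed version, e.g. to check whether newer arguments
# such as linewidth (ggplot2 3.4.0 and later) are available.
print(packageVersion("ggplot2"))
```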
02
About R and data visualisation
R is a command line application, which is fine for plain text output but less convenient for graphical output. RStudio, a graphical environment built around R, makes working with plots easier.
When visualising data, remember that not every plot suits every data set. This knowledge comes from experience, and experience comes from experimentation, so don’t forget to experiment with your data and the various types of plots that R and ggplot2 can generate.
03
Use quickplot()
The ggplot2 package offers two main plotting functions: quickplot() and ggplot(). The quickplot() function, also available under the shorter name qplot(), is similar to R's base plot() function and is good for simple plots. It hides what happens behind the scenes, whereas ggplot() is harder to use but much more flexible.
The following commands inspect the data variable and then plot its column V2 against column V3:
> str(data)
'data.frame': 16180 obs. of 3 variables:
 $ V1: num 0 0 0 0 0 0 0.98 1 1.06 1 ...
 $ V2: num 0.01 0.01 0.01 0.01 0.03 0.01 0.58 0.85 1.01 1.01 ...
 $ V3: num 0.05 0.05 0.05 0.05 0.05 0.05 0.27 0.48 0.65 0.75 ...
> quickplot(data$V2, data$V3)
The following version adds colour to the output:
> quickplot(data$V2, data$V3, color=data$V1)
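For comparison, here is a sketch of the same kind of coloured scatter plot written with ggplot(). Because the file behind the data variable isn't included, it uses R's built-in mtcars data frame instead, with wt, mpg and hp taking the roles of V2, V3 and V1. Note that ggplot2 3.4.0 and later mark qplot() and quickplot() as deprecated, so the ggplot() form is the safer one to learn for new code.

```r
library(ggplot2)

# quickplot(mtcars$wt, mtcars$mpg, colour = mtcars$hp) written with ggplot():
# the data frame is given once, columns are mapped to aesthetics in aes(),
# and the points are added explicitly as a geom_point() layer.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point()

print(p)
```

The extra verbosity buys flexibility: every further layer, such as a smoothing line or facets, is added to p with the same + syntax.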
04
Work with ggpairs()
The ggpairs() function, which comes from the GGally package rather than from ggplot2 itself, draws a matrix of plots showing how each pair of variables relates and, for pairs of numeric variables, calculates the coefficient of correlation. The coefficient of correlation comes from statistical correlation, a technique that shows whether or not two variables are related. As the coefficient of correlation approaches zero there is less of a relationship (no correlation), whereas the closer the coefficient is to -1 or +1, the stronger the correlation (negative or positive) is. A positive correlation shows that if one variable gets bigger then the other does as well. Conversely, a negative correlation denotes that if one variable gets bigger then the other becomes smaller.
The plot was produced using the following commands:
> data <- read.table("uptime.data", header=TRUE)
> require(ggplot2)
> require(GGally)
> ggpairs(data)
05
Generate bar plots
Now use a sample dataset for plotting. The LUD dataset, available from FileSilo, is stored in a plain text file named Lud.data, and each column's title describes its values. You can load the dataset into R using the following command (on case-sensitive file systems, such as those used by Linux, make sure the file name matches exactly):
> LUD <- read.table("lud.data", header=TRUE)
The bar plot is very simple. The following command generates it:
> ggplot(LUD, aes(x=RAM, y=SSD)) + geom_bar(stat="identity")
If you type ggplot(LUD, aes(x=RAM, y=SSD)) without adding a geom layer, older ggplot2 versions show the 'Error: No layers in plot' message, while current versions draw only the empty axes.
To colour the bars according to the value of Uptime, try the following variation:
> ggplot(LUD, aes(x=RAM, y=SSD, fill=Uptime)) + geom_bar(stat="identity")
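If you want to try the bar plot without the LUD file, the sketch below uses R's built-in mtcars data instead, with factor(cyl) standing in for RAM and mpg for SSD. It also shows geom_col(), which recent ggplot2 versions provide as a shorthand for geom_bar(stat="identity"): both take bar heights directly from the y values, whereas plain geom_bar() counts the rows at each x value.

```r
library(ggplot2)

# mtcars stands in for LUD: factor(cyl) plays the role of RAM, mpg of SSD.
# With stat = "identity" the bar heights come straight from y; rows that
# share an x value are stacked on top of each other.
p_bar <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_bar(stat = "identity")

# geom_col() is the shorthand for the same thing.
p_col <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_col()

# Plain geom_bar() uses the default stat = "count": it counts rows
# per x value, so it takes no y aesthetic.
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

print(p_col)
```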
09
Add smooth layers
Another type of layer is the smooth layer. It doesn't display the raw data, but rather a statistical transformation of the data.
The method="lm" argument draws a linear regression line instead of a LOESS (local polynomial regression fitting) curve, which is the default for samples with fewer than 1,000 observations. For larger samples, the default method is GAM (generalised additive model). The plots were generated with the following commands:
> q <- ggplot(LUD, aes(x=RAM, y=SSD))
> q + geom_point() + geom_smooth()
> q + geom_point() + geom_smooth(method="lm")
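The sketch below makes the method choice explicit, using R's built-in mtcars data in place of LUD. With only 32 rows it is under the 1,000-observation threshold, so a bare geom_smooth() would pick LOESS. Passing both method and formula documents the choice and avoids the message that recent ggplot2 versions print when they pick a method for you; se = FALSE drops the grey confidence band.

```r
library(ggplot2)

# mtcars stands in for LUD: wt plays the role of RAM, mpg of SSD.
base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# LOESS curve: what geom_smooth() picks by default for < 1000 rows.
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x)

# Straight linear-regression line, without the confidence band.
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p_loess)
print(p_lm)
```

The line drawn by p_lm is the same fit that lm(mpg ~ wt, data = mtcars) produces, evaluated at evenly spaced points along the x range.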
08
Create histograms
Generate histograms using the geom_histogram() function, which works much like geom_bar(). The binwidth argument sets the width of each bar, and therefore how many bars there are. Plot a simple histogram using the following command:
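> ggplot(LUD, aes(Years)) + geom_histogram(binwidth=1, color="gray")
Use the geom_density() function for a density plot. It has no binwidth argument (ggplot2 ignores it with a warning); use adjust instead to make the curve smoother or more detailed:
> ggplot(LUD, aes(Years)) + geom_density()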
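The following command draws a histogram and a density plot on the same plot. Mapping y to after_stat(density) (written ..density.. in ggplot2 versions before 3.3.0) rescales the bars so that both layers share the same y axis:
> ggplot(LUD) + geom_histogram(aes(Years, after_stat(density)), binwidth=2, color="white") + geom_density(aes(Years), color="red")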
If you add geom_density() first, the histogram is drawn on top of it, so the density curve won't be completely visible.
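The sketch below checks both points from this step on R's built-in faithful data (waiting times between geyser eruptions), which stands in for LUD$Years: a smaller binwidth produces more bars, and layers are drawn in the order they are added. It uses after_stat(), so it assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# faithful$waiting stands in for LUD$Years. binwidth sets the width of
# each bar, so halving it roughly doubles the number of bars.
h5  <- ggplot(faithful, aes(waiting)) + geom_histogram(binwidth = 5)
h10 <- ggplot(faithful, aes(waiting)) + geom_histogram(binwidth = 10)
cat("bars with binwidth = 5: ", nrow(layer_data(h5)), "\n")
cat("bars with binwidth = 10:", nrow(layer_data(h10)), "\n")

# Layers are drawn in the order they are added, so the density curve,
# added last, sits on top of the histogram and stays visible.
# after_stat(density) puts the bars on the same scale as the curve.
p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5, colour = "white") +
  geom_density(colour = "red")

print(p)
```

On the density scale the bar areas add up to 1, which is what lets the histogram and the curve share one y axis.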
07
More about titles and labels
You can change the appearance, size, font and colour of all titles and labels. The following command uses the theme() function to make the title blue and twice its default size:
> ggplot(LUD, aes(x=RAM, y=SSD, fill=Uptime)) + geom_bar(stat="identity") + labs(title="This is a Title") + xlab("X Label") + ylab("Y Label") + theme(plot.title = element_text(size = rel(2), colour = "blue"))
To change the appearance of the X and Y axis lines, set the axis.line element of theme() with element_line(). In ggplot2 3.4.0 and later the line thickness is set with linewidth; older versions use size instead:
> ggplot(LUD, aes(x=RAM, y=SSD, fill=Uptime)) + geom_bar(stat="identity") + labs(title="This is a Title") + xlab("X Label") + ylab("Y Label") + theme(plot.title = element_text(size = rel(2), colour = "blue"), axis.line = element_line(linewidth = 3, colour = "red", linetype = "dotted"))
06
Add titles and labels
Sooner or later, you will probably want to add a title and labels to the output. Adding a main title is simple: use the labs() function. To give the previous bar plot a title, add labs() to the end of the command:
> ggplot(LUD, aes(x=RAM, y=SSD, fill=Uptime)) + geom_bar(stat="identity") + labs(title="This is a Title")
To add X and Y axis labels, enter the following:
> ggplot(LUD, aes(x=RAM, y=SSD, fill=Uptime)) + geom_bar(stat="identity") + labs(title="This is a Title") + xlab("X Label") + ylab("Y Label")
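labs() can also set the axis labels itself, so the xlab() and ylab() calls above can be folded into a single labs() call. The sketch below, which uses the built-in mtcars data in place of LUD, builds the plot both ways; the subtitle and caption arguments it also shows are available in recent ggplot2 versions.

```r
library(ggplot2)

# mtcars stands in for LUD: factor(cyl) for RAM, mpg for SSD, hp for Uptime.
base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = hp)) +
  geom_col()

# Title and axis labels set with separate helper functions...
p1 <- base + labs(title = "This is a Title") + xlab("X Label") + ylab("Y Label")

# ...or all at once with labs(), plus an optional subtitle and caption.
p2 <- base + labs(title = "This is a Title", x = "X Label", y = "Y Label",
                  subtitle = "A subtitle", caption = "Source: mtcars")

print(p2)
```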
11
Visualising Chrome's history file
Chrome stores the history of visited websites in a file simply called History, which uses the SQLite3 database format. You can therefore read it with the RSQLite R package. The 'chrome.R' script generates an impressive output using RSQLite and ggplot2 with many layers.
10
Work with shapes and facets
The following plot draws points with different shapes depending on the value of the discrete (non-continuous) Linux variable:
> ggplot(LUD, aes(x=RAM, y=Uptime)) + geom_point(aes(shape = Linux))
A facet lets you split your data by one or more variables and then plot the subsets together. Facets are also a great way of generating conditional plots. Try the following point plot, which generates two panels, one for each of the two values of the Linux variable:
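> ggplot(LUD, aes(x=RAM, y=Uptime)) + geom_point() + facet_grid(Linux ~ .)
The Linux ~ . formula gives each value of Linux its own row of panels; . ~ Linux places the panels side by side instead. The facet_grid() function also accepts continuous variables, but it then creates one panel for every distinct value, so it works best with discrete variables.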
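The sketch below combines shapes and facets on R's built-in mtcars data, with the 0/1 transmission column am (converted to a factor) standing in for the discrete Linux column. It counts the panels through the PANEL column that layer_data() returns, which shows the one-panel-per-value behaviour for both a factor and a numeric column.

```r
library(ggplot2)

# mtcars stands in for LUD: wt for RAM, mpg for Uptime, and the 0/1
# transmission column am, as a factor, for the discrete Linux column.
mt <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Shape by am, plus one row of panels per value of am ("am ~ .").
p_rows <- ggplot(mt, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = am)) +
  facet_grid(am ~ .)

# The same panels side by side, using ". ~ am".
p_cols <- ggplot(mt, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(. ~ am)

# A numeric column also works, but gives one panel per distinct value:
# cyl takes the values 4, 6 and 8, so this draws three panels.
p_cyl <- ggplot(mt, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(cyl ~ .)

print(p_rows)
```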
12
Use box plots
A box plot quickly and efficiently shows the shape, the variability and the median of a data set. The box plot was generated using the following R command:
> ggplot(LUD, aes(Linux, Uptime)) + geom_point() + geom_boxplot(colour = "red") + labs(title="A Box Plot")
The output is divided into two subsets, one for each of the two discrete values of the Linux variable, and a separate box plot is drawn for each subset.
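By default, each box spans the 25th to 75th percentiles, the line inside it marks the median, and the whiskers reach the most extreme points within 1.5 times the interquartile range; points beyond that are drawn as outliers. The sketch below, with the built-in mtcars data standing in for LUD (am for Linux, mpg for Uptime), reads those statistics back from the built plot with layer_data().

```r
library(ggplot2)

# mtcars stands in for LUD: am (as a factor) for Linux, mpg for Uptime.
mt <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

p <- ggplot(mt, aes(am, mpg)) +
  geom_point() +
  geom_boxplot(colour = "red") +
  labs(title = "A Box Plot")

# Layer 2 is the box plot; each row holds the statistics of one box.
stats <- layer_data(p, 2)[, c("ymin", "lower", "middle", "upper", "ymax")]
print(stats)

print(p)
```

Because geom_point() comes before geom_boxplot(), as in the command above, the boxes are drawn over the points.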
14
Final thoughts
Before we wrap up this tutorial, here are some things to keep in mind. The more you use ggplot2 and the better you know your data, the better your output will be. Above all, remember that ggplot2 works in layers! Also, to take full advantage of plotting you have to plot the right metrics, and finding the right metrics is not always as simple as it might seem.
13
Create R Scripts
It is very useful to learn how to create R scripts, so that you can use ggplot2 inside bigger scripts, for example ones that run as cron jobs. A sample script file named 'ggplot.R' shows how; make it executable and run it:
$ chmod 755 ggplot.R
$ ./ggplot.R
$ ll
total 160
-rwxr-xr-x@ 1 mtsouk staff 234 Nov 14 22:41 ggplot.R
-rw-r--r-- 1 mtsouk staff 73820 Nov 14 22:43 ggplot.png
$ file ggplot.png
ggplot.png: PNG image data, 1280 x 800, 8-bit/color RGBA, non-interlaced
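The contents of ggplot.R aren't reproduced here, so the following is only a guess at what such a script might look like: a shebang line so the shell runs it with Rscript, a plot, and an explicit png() device so the image goes to a named file of a known size rather than to the default Rplots.pdf. The dataset (mtcars) and the plot itself are assumptions; the 1280 x 800 size matches the file listing above.

```r
#!/usr/bin/env Rscript
# Hypothetical ggplot.R: draws a plot and saves it as ggplot.png.
# suppressPackageStartupMessages() keeps cron job output quiet.
suppressPackageStartupMessages(library(ggplot2))

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Generated by ggplot.R")

# Open a PNG device, print the plot into it, then close the device
# so the file is written to disk.
png("ggplot.png", width = 1280, height = 800)
print(p)
invisible(dev.off())
```

A crontab entry such as `0 * * * * cd /path/to/dir && ./ggplot.R` would then regenerate the image at the start of every hour; the directory is a placeholder.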
The chrome.R script described in step 11 can be executed as follows:
$ ./chrome.R
Loading required package: methods
Loading required package: DBI
$ ls -l Rplots.pdf
-rw-r--r--@ 1 mtsouk staff 5089 Nov 27 09:42 Rplots.pdf
Because the script runs non-interactively, its plots go to R's default pdf() device, so the result is automatically stored in a file called Rplots.pdf.
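The chrome.R script itself isn't listed, so here is a minimal sketch of the same idea under stated assumptions: that the History database has a urls table with url and visit_count columns (Chrome's schema is not a stable interface and may change), and that you work on a copy of the file, because Chrome keeps the live one locked while it runs. The helper names read_top_urls() and plot_top_urls() are hypothetical; the database calls are DBI's dbConnect() and dbGetQuery() with the RSQLite driver.

```r
library(DBI)
library(ggplot2)

# Hypothetical helper: return the n most-visited URLs from a copy of
# Chrome's History file. Assumes a 'urls' table with 'url' and
# 'visit_count' columns.
read_top_urls <- function(path, n = 10) {
  con <- dbConnect(RSQLite::SQLite(), path)
  on.exit(dbDisconnect(con))
  urls <- dbGetQuery(con,
    "SELECT url, visit_count FROM urls ORDER BY visit_count DESC")
  head(urls, n)
}

# Hypothetical helper: horizontal bar chart of visit counts per URL,
# with the bars sorted by their visit count.
plot_top_urls <- function(top) {
  ggplot(top, aes(x = reorder(url, visit_count), y = visit_count)) +
    geom_col() +
    coord_flip() +
    labs(x = "URL", y = "Visits", title = "Most visited sites")
}

# Usage, with a placeholder path to a copied History file:
# top <- read_top_urls("History-copy")
# print(plot_top_urls(top))
```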