In two-dimensional plotting, line and point plots are probably the most important types in finance, because many data sets contain time series data, which is generally visualized by such plots. Chapter 6 addresses financial time series data in detail. For the moment, however, we stick with the two-dimensional data set and illustrate some alternative visualization approaches that are useful for financial applications. The first is the scatter plot, where the values of one data set serve as the x values for the other data set. Figure 5-13 shows such a plot. This plot type is used, for example, when you want to plot the returns of one financial time series against those of another one. For this example we will use a new two-dimensional data set with some more data:
In [16]: y = np.random.standard_normal((1000, 2))
In [17]: plt.figure(figsize=(7, 5))
         plt.plot(y[:, 0], y[:, 1], 'ro')
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
         plt.title('Scatter Plot')
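As a minimal sketch of the use case just mentioned, the following plots the returns of one series against those of another. The two return series are simulated, not market data; the names r1 and r2, the seed, and the coefficients 0.8 and 0.006 are assumptions chosen only so that the two series are visibly correlated.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # fixed seed for reproducibility (assumption)

# Hypothetical daily returns: r2 depends partly on r1, partly on its own noise
r1 = 0.01 * rng.standard_normal(1000)
r2 = 0.8 * r1 + 0.006 * rng.standard_normal(1000)

plt.figure(figsize=(7, 5))
plt.plot(r1, r2, 'ro')
plt.grid(True)
plt.xlabel('returns 1st series')
plt.ylabel('returns 2nd series')
plt.title('Scatter Plot of Simulated Returns')

# Sample correlation of the two simulated series
corr = np.corrcoef(r1, r2)[0, 1]
```

By construction, the theoretical correlation of the two simulated series is 0.8, so the point cloud is elongated along an upward-sloping line rather than round as in the uncorrelated case.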
matplotlib also provides a specific function to generate scatter plots. It basically works in the same way, but provides some additional features. Figure 5-14 shows the scatter plot corresponding to Figure 5-13, this time generated using the scatter function:
In [18]: plt.figure(figsize=(7, 5))
         plt.scatter(y[:, 0], y[:, 1], marker='o')
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
         plt.title('Scatter Plot')
Figure 5-14. Scatter plot via scatter function
The scatter plotting function, for example, allows the addition of a third dimension, which can be visualized through different colors and described by a color bar. To this end, we generate a third data set with random data, this time with integers from 0 to 9 (the upper bound of 10 passed to np.random.randint is exclusive):
In [19]: c = np.random.randint(0, 10, len(y))
Figure 5-15 shows a scatter plot where there is a third dimension illustrated by different colors of the single dots and with a color bar as a legend for the colors:
In [20]: plt.figure(figsize=(7, 5))
         plt.scatter(y[:, 0], y[:, 1], c=c, marker='o')
         plt.colorbar()
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
Figure 5-15. Scatter plot with third dimension
Another type of plot, the histogram, is also often used in the context of financial returns. Figure 5-16 puts the frequency values of the two data sets next to each other in the same plot:
In [21]: plt.figure(figsize=(7, 4))
         plt.hist(y, label=['1st', '2nd'], bins=25)
         plt.grid(True)
         plt.legend(loc=0)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.title('Histogram')
Figure 5-16. Histogram for two data sets
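Since the histogram is such an important plot type for financial applications, let us take a closer look at the use of plt.hist. The following signature shows the parameters that are supported in the matplotlib version used here (newer releases replace normed with density and no longer accept hold):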
plt.hist(x, bins=10, range=None, normed=False, weights=None,
         cumulative=False, bottom=None, histtype='bar', align='mid',
         orientation='vertical', rwidth=None, log=False, color=None,
         label=None, stacked=False, hold=None, **kwargs)
Table 5-5 provides a description of the main parameters of the plt.hist function.
Parameter     Description
x             list object(s), ndarray object
bins          Number of bins
range         Lower and upper range of bins
normed        Norming such that integral value is 1
weights       Weights for every value in x
cumulative    Every bin contains the counts of the lower bins
histtype      Options (strings): bar, barstacked, step, stepfilled
align         Options (strings): left, mid, right
orientation   Options (strings): horizontal, vertical
rwidth        Relative width of the bars
log           Log scale
color         Color per data set (array-like)
label         String or sequence of strings for labels
stacked       Stacks multiple data sets
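To make two of these parameters concrete, here is a minimal sketch that checks their effect on the values plt.hist returns. It assumes a recent matplotlib release in which the normalization keyword is density (the counterpart of normed in the table); the sample data, the seed, and the names area and n_cum are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed (assumption)
data = rng.standard_normal(1000)

plt.figure(figsize=(7, 4))
# density=True scales the bar heights so that the total bar area is 1
n, edges, _ = plt.hist(data, bins=25, density=True)
area = np.sum(n * np.diff(edges))

plt.figure(figsize=(7, 4))
# cumulative=True: each bin holds its own count plus the counts of all lower bins
n_cum, _, _ = plt.hist(data, bins=25, cumulative=True, histtype='step')
```

Here area should equal 1 up to floating-point error, and the last entry of n_cum should equal the number of observations, since the default bin range spans the whole data set.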
Figure 5-17 shows a similar plot; this time, the data of the two data sets is stacked in the histogram:
In [22]: plt.figure(figsize=(7, 4))
         plt.hist(y, label=['1st', '2nd'], color=['b', 'g'],
                  stacked=True, bins=20)
         plt.grid(True)
         plt.legend(loc=0)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.title('Histogram')
Figure 5-17. Stacked histogram for two data sets
Another useful plot type is the boxplot. Similar to the histogram, the boxplot allows both a concise overview of the characteristics of a data set and easy comparison of multiple data sets. Figure 5-18 shows such a plot for our data set:
In [23]: fig, ax = plt.subplots(figsize=(7, 4))
         plt.boxplot(y)
         plt.grid(True)
         plt.setp(ax, xticklabels=['1st', '2nd'])
         plt.xlabel('data set')
         plt.ylabel('value')
         plt.title('Boxplot')
This last example uses the function plt.setp, which sets properties for a (set of) plotting instance(s). For example, considering a line plot generated by:
line = plt.plot(data, 'r')
the following code:
plt.setp(line, linestyle='--')
changes the style of the line to “dashed.” This way, you can easily change parameters after the plotting instance (“artist object”) has been generated.
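The plt.setp call can be tried in isolation with a short sketch like the following. The sample data and the additional linewidth setting are assumptions for illustration; the getter calls at the end only confirm that the properties of the existing artist were changed.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

data = np.arange(10) ** 2  # arbitrary sample data (assumption)

plt.figure(figsize=(7, 4))
line = plt.plot(data, 'r')  # plt.plot returns a list of Line2D objects

# Change several properties of the already existing artist in one call
plt.setp(line, linestyle='--', linewidth=2.5)

# Read the properties back from the Line2D object
style = line[0].get_linestyle()
width = line[0].get_linewidth()
```

Because plt.setp accepts a list of artists, the same call would restyle every line if plt.plot had drawn several at once.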
Figure 5-18. Boxplot for two data sets
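The characteristics a boxplot summarizes can also be computed directly. The following is a rough sketch using NumPy only; it assumes matplotlib's default whisker setting of 1.5 times the interquartile range, and the data, seed, and variable names are illustrative rather than taken from the figure.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (assumption)
y = rng.standard_normal((1000, 2))

# Quartiles per column: the box spans q1 to q3, with a line at the median
q1, median, q3 = np.percentile(y, [25, 50, 75], axis=0)
iqr = q3 - q1

# With the default setting, whiskers reach to the most extreme data points
# within 1.5 * IQR of the box; points beyond these fences are drawn as fliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
n_fliers = ((y < lower_fence) | (y > upper_fence)).sum(axis=0)
```

Each of the resulting arrays has one entry per data set, which mirrors the side-by-side comparison the boxplot provides.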
As a final illustration in this section, we consider a mathematically inspired plot that can also be found as an example in the gallery for matplotlib. It plots a function and illustrates graphically the area below the function between a lower and an upper limit; in other words, the integral value of the function between the lower and upper limits. Figure 5-19 shows the resulting plot and illustrates that matplotlib seamlessly handles LaTeX typesetting for the inclusion of mathematical formulae into plots:
In [24]: from matplotlib.patches import Polygon

         def func(x):
             return 0.5 * np.exp(x) + 1

         a, b = 0.5, 1.5  # integral limits
         x = np.linspace(0, 2)
         y = func(x)

         fig, ax = plt.subplots(figsize=(7, 5))
         plt.plot(x, y, 'b', linewidth=2)
         plt.ylim(ymin=0)

         # Illustrate the integral value, i.e. the area under the function
         # between the lower and upper limits
         Ix = np.linspace(a, b)
         Iy = func(Ix)
         verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
         poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
         ax.add_patch(poly)

         plt.text(0.5 * (a + b), 1, r"$\int_a^b f(x)\mathrm{d}x$",
                  horizontalalignment='center', fontsize=20)
         plt.figtext(0.9, 0.075, '$x$')
         plt.figtext(0.075, 0.9, '$f(x)$')

         ax.set_xticks((a, b))
         ax.set_xticklabels(('$a$', '$b$'))
         ax.set_yticks([func(a), func(b)])
         ax.set_yticklabels(('$f(a)$', '$f(b)$'))
         plt.grid(True)
Figure 5-19. Exponential function, integral area, and LaTeX labels
Let us go through the generation of this plot step by step. The first step is the definition of the function to be integrated:
def func(x):
    return 0.5 * np.exp(x) + 1
The second step is the definition of the integral limits and the generation of the needed numerical values:
a, b = 0.5, 1.5  # integral limits
x = np.linspace(0, 2)
y = func(x)
Third, we plot the function itself:
fig, ax = plt.subplots(figsize=(7, 5))
plt.plot(x, y, 'b', linewidth=2)
plt.ylim(ymin=0)
Fourth and central, we generate the shaded area (“patch”) that illustrates the integral area, using the Polygon class:
Ix = np.linspace(a, b)
Iy = func(Ix)
verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
ax.add_patch(poly)
The fifth step is the addition of the mathematical formula and some axis labels to the plot, using the plt.text and plt.figtext functions. LaTeX code is passed between two dollar signs ($ … $). The first two parameters of both functions are coordinate values to place the respective text:
plt.text(0.5 * (a + b), 1, r"$\int_a^b f(x)\mathrm{d}x$",
         horizontalalignment='center', fontsize=20)
plt.figtext(0.9, 0.075, '$x$')
plt.figtext(0.075, 0.9, '$f(x)$')
Finally, we set the individual x and y tick labels at their respective positions. Note that although the labels are variable names rendered in LaTeX, the correct numerical values are used for the placement. We also add a grid, which in this particular case is only drawn for the ticks selected before:
ax.set_xticks((a, b))
ax.set_xticklabels(('$a$', '$b$'))
ax.set_yticks([func(a), func(b)])
ax.set_yticklabels(('$f(a)$', '$f(b)$'))
plt.grid(True)
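The shaded area in the plot corresponds to a concrete number. As a rough cross-check, the following sketch approximates the integral with the trapezoidal rule on the same grid used for the polygon and compares it with the closed-form value. The trapezoidal rule and the names approx and exact are additions for illustration and not part of the plotting example itself.

```python
import numpy as np

def func(x):
    return 0.5 * np.exp(x) + 1

a, b = 0.5, 1.5  # integral limits
Ix = np.linspace(a, b)  # 50 points by default
Iy = func(Ix)

# Trapezoidal rule: sum of the areas of the trapezoids between grid points
approx = np.sum(0.5 * (Iy[1:] + Iy[:-1]) * np.diff(Ix))

# Closed form: an antiderivative of 0.5 * exp(x) + 1 is 0.5 * exp(x) + x
exact = 0.5 * (np.exp(b) - np.exp(a)) + (b - a)
```

The two values should agree closely; because the function is convex, the trapezoidal value lies slightly above the exact one.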