Other Plot Styles - Python for Finance: Analyze Big Financial Data

When it comes to two-dimensional plotting, line and point plots are probably the most important ones in finance, because many data sets embody time series data, which is generally visualized by such plots. Chapter 6 addresses financial time series data in detail. For the moment, however, we stick with the two-dimensional data set and illustrate some alternative visualization approaches that are useful for financial applications. The first is the scatter plot, where the values of one data set serve as the x values for the other data set. Figure 5-13 shows such a plot. This plot type is used, for example, when you want to plot the returns of one financial time series against those of another one. For this example we will use a new two-dimensional data set with some more data:

In [16]: y = np.random.standard_normal((1000, 2))

In [17]: plt.figure(figsize=(7, 5))
         plt.plot(y[:, 0], y[:, 1], 'ro')
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
         plt.title('Scatter Plot')
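To make the use case mentioned above (returns of one series plotted against those of another) more concrete, here is a minimal sketch with simulated data. The names rets_a and rets_b, the seed, the 1% scaling, and the target correlation of 0.6 are assumptions made for illustration only; they are not taken from the text.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # fixed seed, illustrative only
rho = 0.6                        # assumed target correlation

# Two independent standard normal samples ...
z = rng.standard_normal((1000, 2))
# ... combined so that the second series correlates with the first
rets_a = 0.01 * z[:, 0]
rets_b = 0.01 * (rho * z[:, 0] + np.sqrt(1 - rho ** 2) * z[:, 1])

fig, ax = plt.subplots(figsize=(7, 5))
ax.plot(rets_a, rets_b, 'ro')  # red dots, no connecting line
ax.grid(True)
ax.set_xlabel('returns A')
ax.set_ylabel('returns B')
ax.set_title('Scatter Plot of Simulated Returns')

# Sample correlation of the two simulated series
sample_corr = np.corrcoef(rets_a, rets_b)[0, 1]
```

With positively correlated series, the point cloud stretches along an upward-sloping diagonal instead of forming the round cloud produced by independent data.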

matplotlib also provides a specific function to generate scatter plots. It works in basically the same way but provides some additional features. Figure 5-14 shows the scatter plot corresponding to Figure 5-13, this time generated using the scatter function:

In [18]: plt.figure(figsize=(7, 5))
         plt.scatter(y[:, 0], y[:, 1], marker='o')
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
         plt.title('Scatter Plot')

Figure 5-14. Scatter plot via scatter function

The scatter plotting function, for example, allows the addition of a third dimension, which can be visualized through different colors and described by a color bar. To this end, we generate a third data set with random data, this time with integers from 0 to 9 (the upper bound of 10 passed to np.random.randint is exclusive):

In [19]: c = np.random.randint(0, 10, len(y))

Figure 5-15 shows a scatter plot where there is a third dimension illustrated by different colors of the single dots and with a color bar as a legend for the colors:

In [20]: plt.figure(figsize=(7, 5))
         plt.scatter(y[:, 0], y[:, 1], c=c, marker='o')
         plt.colorbar()
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')

Figure 5-15. Scatter plot with third dimension
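The following sketch is one possible variation on the colored scatter plot, written against explicit figure and axes objects. The choice of the coolwarm color map, the color bar label '3rd', and the seeded generator are assumptions added for illustration and do not come from the text.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # fixed seed, illustrative only
y = rng.standard_normal((1000, 2))
c = rng.integers(0, 10, len(y))  # integers 0..9, upper bound exclusive

fig, ax = plt.subplots(figsize=(7, 5))
# c supplies one value per point; cmap maps those values to colors
sc = ax.scatter(y[:, 0], y[:, 1], c=c, marker='o', cmap='coolwarm')
cbar = fig.colorbar(sc, ax=ax)   # color bar tied to this scatter plot
cbar.set_label('3rd')            # name the third dimension
ax.grid(True)
ax.set_xlabel('1st')
ax.set_ylabel('2nd')
```

Passing the object returned by scatter to colorbar makes explicit which plot the color bar describes, which helps once a figure contains more than one colored element.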

Another type of plot, the histogram, is also often used in the context of financial returns. Figure 5-16 puts the frequency values of the two data sets next to each other in the same plot:

In [21]: plt.figure(figsize=(7, 4))
         plt.hist(y, label=['1st', '2nd'], bins=25)
         plt.grid(True)
         plt.legend(loc=0)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.title('Histogram')

Figure 5-16. Histogram for two data sets

Since the histogram is such an important plot type for financial applications, let us take a closer look at the use of plt.hist. The following signature lists the parameters that are supported:

plt.hist(x, bins=10, range=None, normed=False, weights=None,
         cumulative=False, bottom=None, histtype='bar', align='mid',
         orientation='vertical', rwidth=None, log=False, color=None,
         label=None, stacked=False, hold=None, **kwargs)

This is the signature of the matplotlib version used in the text; recent matplotlib releases have replaced normed with density and removed hold.

Table 5-5 provides a description of the main parameters of the plt.hist function.

Parameter    Description
-----------  ---------------------------------------------------
x            list object(s), ndarray object
bins         Number of bins
range        Lower and upper range of bins
normed       Norming such that integral value is 1
weights      Weights for every value in x
cumulative   Every bin contains the counts of the lower bins
histtype     Options (strings): bar, barstacked, step, stepfilled
align        Options (strings): left, mid, right
orientation  Options (strings): horizontal, vertical
rwidth       Relative width of the bars
log          Log scale
color        Color per data set (array-like)
label        String or sequence of strings for labels
stacked      Stacks multiple data sets
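Two of these parameters can be checked numerically using the values that the hist function returns. The sketch below assumes a recent matplotlib release, where the norming parameter is called density rather than normed; the data and variable names are illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # fixed seed, illustrative only
data = rng.standard_normal(1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# density=True: bar heights are scaled so that the total bar area is 1
n_dens, edges, _ = ax1.hist(data, bins=25, density=True)
area = np.sum(n_dens * np.diff(edges))  # sum of height * width

# cumulative=True: each bin also contains the counts of all lower bins
n_cum, _, _ = ax2.hist(data, bins=25, cumulative=True)
total = n_cum[-1]  # last bin holds the number of observations
```

Here area evaluates to 1 up to floating-point error, and total equals the sample size of 1,000.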

Figure 5-17 shows a similar plot; this time, the data of the two data sets is stacked in the histogram:

In [22]: plt.figure(figsize=(7, 4))
         plt.hist(y, label=['1st', '2nd'], color=['b', 'g'],
                  stacked=True, bins=20)
         plt.grid(True)
         plt.legend(loc=0)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.title('Histogram')

Figure 5-17. Stacked histogram for two data sets
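When stacking, it can help to know what the bar heights represent. The sketch below compares the values returned by hist with counts computed separately via np.histogram on the same bin edges. It reflects the behavior of current matplotlib releases as far as I am aware, so it is worth verifying against the installed version; the data is simulated for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)  # fixed seed, illustrative only
y = rng.standard_normal((1000, 2))

fig, ax = plt.subplots(figsize=(7, 4))
# Each column of y is treated as one data set
n, edges, _ = ax.hist(y, bins=20, stacked=True, label=['1st', '2nd'])

# Per-data-set counts computed independently on the same bin edges
counts_1st, _ = np.histogram(y[:, 0], bins=edges)
counts_2nd, _ = np.histogram(y[:, 1], bins=edges)

# With stacked=True the returned heights accumulate across data sets:
# n[0] is the first data set, n[1] is the top of the full stack
top_of_stack = counts_1st + counts_2nd
```

The top edge of the stacked bars therefore shows the combined frequency of both data sets per bin, while the colored segments show each data set's share.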

Another useful plot type is the boxplot. Similar to the histogram, the boxplot allows both a concise overview of the characteristics of a data set and easy comparison of multiple data sets. Figure 5-18 shows such a plot for our data set:

In [23]: fig, ax = plt.subplots(figsize=(7, 4))
         plt.boxplot(y)
         plt.grid(True)
         plt.setp(ax, xticklabels=['1st', '2nd'])
         plt.xlabel('data set')
         plt.ylabel('value')
         plt.title('Boxplot')

This last example uses the function plt.setp, which sets properties for a (set of) plotting instance(s). For example, considering a line plot generated by:

line = plt.plot(data, 'r')

the following code:

plt.setp(line, linestyle='--')

changes the style of the line to “dashed.” This way, you can easily change parameters after the plotting instance (“artist object”) has been generated.
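A small self-contained sketch of this pattern follows. The sample data and the additional linewidth property are assumptions added for illustration; the getter calls at the end merely confirm that the properties were changed on the existing artist.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

data = np.arange(10) ** 2  # illustrative data, not from the text

fig, ax = plt.subplots()
line = ax.plot(data, 'r')  # returns a list containing one Line2D artist

# Change several properties of the already created artist in one call
plt.setp(line, linestyle='--', linewidth=2.5)

style = line[0].get_linestyle()
width = line[0].get_linewidth()
```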

Figure 5-18. Boxplot for two data sets
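To relate the boxplot to the underlying numbers, the following sketch reads the median drawn by boxplot and compares it with quartiles computed directly via np.percentile. The data is simulated and the variable names are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)  # fixed seed, illustrative only
y = rng.standard_normal((1000, 2))

fig, ax = plt.subplots(figsize=(7, 4))
bp = ax.boxplot(y)  # dict of artists: 'boxes', 'medians', 'whiskers', ...

# Median of the first data set as drawn in the plot ...
drawn_median = bp['medians'][0].get_ydata()[0]

# ... and the quartiles computed directly from the data
q1, med, q3 = np.percentile(y[:, 0], [25, 50, 75])
iqr = q3 - q1  # interquartile range, i.e. the height of the box
```

The box spans the first to the third quartile, the line inside marks the median, and by default the whiskers extend at most 1.5 times the interquartile range beyond the box.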

As a final illustration in this section, we consider a mathematically inspired plot that can also be found as an example in the gallery for matplotlib. It plots a function and graphically illustrates the area below the function between a lower and an upper limit; in other words, the integral value of the function between the lower and upper limits. Figure 5-19 shows the resulting plot and illustrates that matplotlib seamlessly handles LaTeX typesetting for the inclusion of mathematical formulae into plots:

In [24]: from matplotlib.patches import Polygon

         def func(x):
             return 0.5 * np.exp(x) + 1

         a, b = 0.5, 1.5  # integral limits
         x = np.linspace(0, 2)
         y = func(x)

         fig, ax = plt.subplots(figsize=(7, 5))
         plt.plot(x, y, 'b', linewidth=2)
         plt.ylim(ymin=0)

         # Illustrate the integral value, i.e. the area under the function
         # between the lower and upper limits
         Ix = np.linspace(a, b)
         Iy = func(Ix)
         verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
         poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
         ax.add_patch(poly)

         plt.text(0.5 * (a + b), 1, r"$\int_a^b f(x)\mathrm{d}x$",
                  horizontalalignment='center', fontsize=20)

         plt.figtext(0.9, 0.075, '$x$')
         plt.figtext(0.075, 0.9, '$f(x)$')

         ax.set_xticks((a, b))
         ax.set_xticklabels(('$a$', '$b$'))
         ax.set_yticks([func(a), func(b)])
         ax.set_yticklabels(('$f(a)$', '$f(b)$'))
         plt.grid(True)

Figure 5-19. Exponential function, integral area, and LaTeX labels
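The shaded area has a concrete value that can be checked. Since an antiderivative of 0.5 * exp(x) + 1 is 0.5 * exp(x) + x, the integral between a = 0.5 and b = 1.5 can be computed in closed form and compared with a numerical approximation. The sketch below does this with a hand-written trapezoidal rule; the variable names exact and approx are illustrative.

```python
import numpy as np

def func(x):
    return 0.5 * np.exp(x) + 1

a, b = 0.5, 1.5  # integral limits from the example

# Closed form, using the antiderivative 0.5 * exp(x) + x
exact = (0.5 * np.exp(b) + b) - (0.5 * np.exp(a) + a)

# Trapezoidal rule on the default 50-point grid of np.linspace
Ix = np.linspace(a, b)
Iy = func(Ix)
approx = np.sum(0.5 * (Iy[1:] + Iy[:-1]) * np.diff(Ix))
```

Both values come out at roughly 2.4165, which is the area of the shaded region in the figure.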

Let us go through the generation of this plot step by step. The first step is the definition of the function to be integrated:

def func(x):
    return 0.5 * np.exp(x) + 1

The second step is the definition of the integral limits and the generation of needed numerical values:

a, b = 0.5, 1.5  # integral limits
x = np.linspace(0, 2)
y = func(x)

Third, we plot the function itself:

fig, ax = plt.subplots(figsize=(7, 5))

plt.plot(x, y, 'b', linewidth=2)

plt.ylim(ymin=0)

Fourth and central, we generate the shaded area (“patch”) by the use of the Polygon function illustrating the integral area:

Ix = np.linspace(a, b)
Iy = func(Ix)
verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
ax.add_patch(poly)

The fifth step is the addition of the mathematical formula and some axis labels to the plot, using the plt.text and plt.figtext functions. LaTeX code is passed between two dollar signs ($ … $). The first two parameters of both functions are coordinate values to place the respective text: data coordinates for plt.text, and figure-relative coordinates between 0 and 1 for plt.figtext:

plt.text(0.5 * (a + b), 1, r"$\int_a^b f(x)\mathrm{d}x$",
         horizontalalignment='center', fontsize=20)
plt.figtext(0.9, 0.075, '$x$')
plt.figtext(0.075, 0.9, '$f(x)$')

Finally, we set the individual x and y tick labels at their respective positions. Note that although we place variable names rendered in LaTeX, the correct numerical values are used for the placing. We also add a grid, which in this particular case is only drawn for the selected ticks highlighted before:

ax.set_xticks((a, b))
ax.set_xticklabels(('$a$', '$b$'))
ax.set_yticks([func(a), func(b)])
ax.set_yticklabels(('$f(a)$', '$f(b)$'))
plt.grid(True)
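As an alternative to building the Polygon patch by hand, the shaded area can also be produced with the fill_between method of the axes object. This is a swapped-in technique and not the approach used in the text; the sketch below reuses the function and limits from the example and is meant only as a hedged illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

def func(x):
    return 0.5 * np.exp(x) + 1

a, b = 0.5, 1.5  # integral limits
x = np.linspace(0, 2)
y = func(x)

fig, ax = plt.subplots(figsize=(7, 5))
ax.plot(x, y, 'b', linewidth=2)

# Shade the area between the curve and the x axis on [a, b]
Ix = np.linspace(a, b)
shaded = ax.fill_between(Ix, func(Ix), 0, facecolor='0.7', edgecolor='0.5')
ax.set_ylim(bottom=0)  # start the y axis at zero

# Ticks are placed at numerical positions but labeled with LaTeX strings
ax.set_xticks((a, b))
ax.set_xticklabels(('$a$', '$b$'))
ax.set_yticks([func(a), func(b)])
ax.set_yticklabels(('$f(a)$', '$f(b)$'))
ax.grid(True)
```

The fill_between call takes the x values, the upper boundary, and the lower boundary directly, so the explicit list of vertices is not needed.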
