3. Fundamentals of NumPy for Quants
3.1. In the Matrix of NumPy
3.2.4. Independent Copy of NumPy Array
For those of you who are familiar with programming, it is nearly intuitive that when you create a matrix-variable, say a, and you wish to create its (modified) copy, say b = a + 1, b will in fact be independent of a. This was not the case within older versions of NumPy due to the "physical" pointing at the same object in memory. Analyse the following case study:
>>> a = np.array([1,2,3,4,5])
>>> b = a + 1
>>> a; b
array([1, 2, 3, 4, 5])
array([2, 3, 4, 5, 6])
but now, if:
>>> b[0] = 7
>>> a; b
array([7, 2, 3, 4, 5])
array([7, 3, 4, 5, 6])
we affect both 0-th elements in the a and b arrays. In order to “break the link” between them, you should create a new copy of a matrix using the .copy function:
>>> a = np.array([1,2,3,4,5])
>>> b = a.copy()
>>> b += 1
>>> b[0] = 7
>>> a; b
array([1, 2, 3, 4, 5])
array([7, 3, 4, 5, 6])
Fortunately, in Python 3.5 with NumPy 1.10.1+ that problem ceases to exist:
>>> import numpy as np
>>> np.__version__
'1.10.1'
>>> a = np.array([1,2,3,4,5])
>>> b = a + 1
>>> a; b
array([1, 2, 3, 4, 5])
array([2, 3, 4, 5, 6])
>>> b[0] = 7
>>> a; b
array([1, 2, 3, 4, 5])
array([7, 3, 4, 5, 6])
However, keep that pitfall in mind and check for potential errors within your future projects. Just in case. ☺
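In current NumPy releases an arithmetic expression such as a + 1 allocates a new array, so the sharing described above is most likely to be met through plain assignment or slicing. The following is a minimal sketch of how to check for it, assuming NumPy 1.11 or later for np.shares_memory; the names alias, view, shifted and clone are illustrative and do not come from the text.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

alias = a          # plain assignment: a second name for the same array
view = a[1:4]      # basic slicing: a view onto a's memory
shifted = a + 1    # arithmetic: a newly allocated array
clone = a.copy()   # explicit copy: a newly allocated array

alias[0] = 7       # visible through a
view[0] = 9        # writes to a[1]
shifted[2] = 0     # leaves a untouched
clone[4] = -1      # leaves a untouched

print(a)                             # [7 9 3 4 5]
print(alias is a)                    # True
print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, shifted))  # False
print(np.shares_memory(a, clone))    # False
```

Whenever an array is about to be modified in place and the original must survive, calling .copy() first is the conservative choice.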
3.2.5. 1D Array Flattening and Clipping
For any already existing row vector you can substitute its elements with a desired value. Have a look:
>>> a = np.array([1,2,3,4,5])
>>> a.fill(0);
>>> a
array([0, 0, 0, 0, 0])
or
>>> a = np.array([1,2,3,4,5])
>>> a.flat = -1
>>> a
array([-1, -1, -1, -1, -1])
This is so-called flattening. On the other hand, clipping in its simplest form looks like:
>>> x = np.array([1., -2., 3., 4., -5.])
>>> i = np.where(x < 0)
>>> x.flat[i] = 0
>>> x
array([ 1., 0., 3., 4., 0.])
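The same replacement of negative entries by zero can be written in several equivalent ways. The sketch below compares the index-based form used above with a boolean mask and with np.maximum; the names x_where, x_mask and x_max are illustrative only.

```python
import numpy as np

x = np.array([1., -2., 3., 4., -5.])

# Index-based, as in the text: np.where(cond) returns a tuple of index arrays
idx = np.where(x < 0)
x_where = x.copy()
x_where[idx] = 0

# Boolean mask: no intermediate index array is needed
x_mask = x.copy()
x_mask[x_mask < 0] = 0

# np.maximum: returns a new array and leaves x unchanged
x_max = np.maximum(x, 0)

print(x_where)  # [1. 0. 3. 4. 0.]
print(np.array_equal(x_where, x_mask), np.array_equal(x_mask, x_max))  # True True
```

The first two forms modify an array in place (here, a copy of x), whereas np.maximum always builds a new one.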
Let’s consider an example. Working daily with financial time-series, sometimes we wish to separate, e.g. a daily return-series into two sub-series storing negative and positive returns, respectively. To do that, in NumPy we can perform the following logic by employing the .clip function.
np.where Returns the indexes (as a tuple of arrays) of the elements that satisfy a specified condition (see Sections 3.3.3 and 3.8)
Say, the vector r holds daily returns of a stock. Then:
>>> r = np.array([0.09,-0.03,-0.04,0.07,0.00,-0.02])
>>> rneg = r.clip(-1, 0)
>>> rneg
array([ 0. , -0.03, -0.04, 0. , 0. , -0.02])
>>> rneg = rneg[rneg.nonzero()]
>>> rneg
array([-0.03, -0.04, -0.02])
Here, we end up with the rneg array storing all negative daily returns.
The .clip(-1, 0) function should be understood as: clip all values less than -1 to -1 and greater than 0 to 0. It makes sense in our case as we set a lower boundary of -1 (-100.00% daily loss) on one side and 0.00% on the other side. Since zero is usually considered a “positive” return, the application of the .nonzero function removes zeros from the rneg array.
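To make the clipping rule concrete, the sketch below checks that .clip(lo, hi) gives the same result as taking an element-wise maximum with lo followed by an element-wise minimum with hi. The names lo, hi, clipped and by_hand are illustrative.

```python
import numpy as np

r = np.array([0.09, -0.03, -0.04, 0.07, 0.00, -0.02])
lo, hi = -1.0, 0.0

clipped = r.clip(lo, hi)                      # new array; r itself is unchanged
by_hand = np.minimum(np.maximum(r, lo), hi)   # raise to lo first, then cap at hi
assert np.array_equal(clipped, by_hand)

# Drop the zeros produced by clipping (and any genuine 0.00 returns)
rneg = clipped[clipped.nonzero()]
print(rneg)  # [-0.03 -0.04 -0.02]
```

Because .clip returns a new array, the original series r stays available for the positive-return branch.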
The situation becomes a bit trickier in the case of positive returns. We cannot simply type rpos = r.clip(0, 1). Why? It will replace all negative returns with zeros. Also, if r contains daily returns equal to 0.00, the extra zeros from clipping would be indistinguishable from them. We solve this problem by replacing “true” 0.00% stock returns with an abstract number of, say, 9, i.e. a 900% daily gain, and proceed as follows:
>>> r2 = r.copy(); r2
array([ 0.09, -0.03, -0.04, 0.07, 0. , -0.02])
>>> i = np.where(r2 == 0.); r2[i] = 9 # alternatively r2[r2==0.] = 9
>>> rpos = r2.clip(0, 9)
>>> rpos
array([ 0.09, 0. , 0. , 0.07, 9. , 0. ])
>>> rpos = rpos[rpos.nonzero()]
>>> rpos
array([ 0.09, 0.07, 9. ])
>>> rpos[rpos == 9.] = 0.
>>> rpos
array([ 0.09, 0.07, 0. ])
If you think for a while, you will discover that in fact all the effort can be shortened down to two essential lines of code providing us with the same results:
>>> r = np.array([0.09,-0.03,-0.04,0.07,0.00,-0.02])
>>> rneg = r[r < 0] # masking
>>> rpos = r[r >= 0] # masking
>>> rneg; rpos
array([-0.03, -0.04, -0.02])
array([ 0.09, 0.07, 0. ])
However, by doing so you’d miss a lesson on the .clip function ☺. More on masking for arrays in Section 3.8.
As you can see, Python offers more than one method to solve the same problem. Knowing the majority of them gives you flexibility and will make you a better programmer over time.
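If the masking approach is needed repeatedly, it can be wrapped in a small helper. This is only a sketch: the function name split_returns is hypothetical, and it assumes that zero returns belong with the non-negative sub-series, as in the text.

```python
import numpy as np

def split_returns(r):
    """Split a 1D return series into (negative, non-negative) sub-series.

    Hypothetical helper. Note that NaN entries, if any, would land in the
    second array, because the comparison NaN < 0 evaluates to False.
    """
    r = np.asarray(r, dtype=float)
    neg_mask = r < 0
    return r[neg_mask], r[~neg_mask]

r = np.array([0.09, -0.03, -0.04, 0.07, 0.00, -0.02])
rneg, rpos = split_returns(r)
print(rneg)                          # [-0.03 -0.04 -0.02]
print(rneg.size + rpos.size == r.size)  # True
```

Using one mask and its negation guarantees that every observation ends up in exactly one of the two sub-series.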
By separating two return-series we gain a possibility of conducting additional research on, for instance, the distribution of extreme losses or extreme gains for a specific stock in a given time period the data come from. In the abovementioned example our return-series is too short for a complete demonstration, however in general, if we want to extract from each series two most extreme losses and two highest gains, then:
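One way to sketch this step, assuming an in-place ascending .sort() followed by slicing, is shown below. The variable names el and eg are illustrative and the short series r from above is reused.

```python
import numpy as np

r = np.array([0.09, -0.03, -0.04, 0.07, 0.00, -0.02])
rneg = r[r < 0]     # boolean masking returns copies, so r is not altered below
rpos = r[r >= 0]

rneg.sort()         # in place, ascending: most negative values come first
rpos.sort()         # in place, ascending: largest values come last

el = rneg[:2]       # two most extreme losses
eg = rpos[-2:]      # two highest gains
print(el)  # [-0.04 -0.03]
print(eg)  # [0.07 0.09]
```

If a sub-series holds fewer than two elements, the slices simply return whatever is available rather than raising an error.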
If repeated for, say, 500 stocks (daily return time-series) traded within the S&P 500 index, the same method would give us an insight into the empirical distribution of extreme values, both negative and positive, which could be fitted with a Gumbel distribution and tested against GEV theory.
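The sketch below illustrates only the collection step for many stocks at once. It uses synthetic, normally distributed returns as a stand-in for real data (an assumption made purely for illustration), needs NumPy 1.17 or later for np.random.default_rng, and takes the two smallest and two largest values per row rather than splitting by sign first.

```python
import numpy as np

# Synthetic placeholder data: 500 "stocks" x 250 "days" (NOT real market returns)
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(500, 250))

ordered = np.sort(returns, axis=1)   # sorted copy, each row in ascending order
extreme_losses = ordered[:, :2]      # two smallest returns per stock
extreme_gains = ordered[:, -2:]      # two largest returns per stock

print(extreme_losses.shape, extreme_gains.shape)  # (500, 2) (500, 2)
```

The resulting samples, extreme_losses.ravel() and extreme_gains.ravel(), are the kind of input a distribution-fitting routine would consume.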
3.2.6. 1D Special Arrays
NumPy delivers an easy solution in the form of special arrays filled with zeros or ones, or left “empty”. Suddenly, you stop worrying about creating an array of specified dimensions and flattening it.
Therefore, in our arsenal we have:
>>> x = np.zeros(5); x
The alternative way to derive the same results would be with an aid of the .repeat function acting on a 1-element array:
>>> x = np.array([0])
.sort() By default this function sorts all elements of a 1D array in ascending order and alters the array itself
>>> x2 = np.full((1, 5), 1, dtype=np.int64)
>>> x2
array([[1, 1, 1, 1, 1]])
where the shapes of the arrays, x1 and x2, have been provided within the inner round brackets: 1 row and 5 columns.
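As a compact recap of the constructors named in this section, the following sketch builds each special array once. The variable names z, o, e and rep are illustrative; the contents of an array created with np.empty are arbitrary and should not be relied upon.

```python
import numpy as np

z = np.zeros(5)                  # five zeros, float64 by default
o = np.ones(5)                   # five ones, float64 by default
e = np.empty(5)                  # uninitialised memory: values are arbitrary
rep = np.array([0]).repeat(5)    # 1-element integer array repeated five times
x2 = np.full((1, 5), 1, dtype=np.int64)   # 1 row, 5 columns, all equal to 1

print(z.shape, o.shape, e.shape, rep.shape, x2.shape)
# (5,) (5,) (5,) (5,) (1, 5)
print(z.dtype, x2.dtype)         # float64 int64
```

Passing a single integer gives a 1D array, while passing a tuple such as (1, 5) gives a 2D array with one row.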
An additional special array, containing the numbers from 0 to N-1, can be created using the arange function. Analyse the following cases:
>>> a = np.arange(11)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> a = np.arange(10) + 1
>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> a = np.arange(0, 11, 2)
>>> a
array([ 0, 2, 4, 6, 8, 10])
>>> a = np.arange(0, 10, 2) + 1
>>> a
array([1, 3, 5, 7, 9])
and also
>>> b = np.arange(5, dtype=np.float32)
>>> b
array([ 0., 1., 2., 3., 4.], dtype=float32)
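The cases above all follow from the fact that arange(start, stop, step) covers the half-open interval [start, stop). The sketch below verifies the resulting element count; np.linspace is not used in the text and is mentioned here only as a commonly used alternative when the step is not an integer.

```python
import numpy as np

start, stop, step = 0, 11, 2
a = np.arange(start, stop, step)           # stop itself is never included

# Element count for a positive step: ceil((stop - start) / step)
n = int(np.ceil((stop - start) / step))
print(len(a), n)                           # 6 6

# With non-integer steps, fixing the number of points avoids rounding surprises
grid = np.linspace(0.0, 1.0, 11)           # 11 points, both end points included
print(grid[0], grid[-1], len(grid))        # 0.0 1.0 11
```

This is why np.arange(11) ends at 10 and np.arange(0, 10, 2) + 1 ends at 9.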
Array—List—Array
A 1D array can be converted into a Python list by applying the .tolist() function:
>>> r = np.array([0.09,-0.03,-0.04,0.07,0.00,-0.02])
>>> l = r.tolist()
>>> l
[0.09, -0.03, -0.04, 0.07, 0.0, -0.02]
On the other hand, to go from a flat Python list to NumPy 1D array employ the asarray function:
>>> type(l)
<class 'list'>
>>> a = np.asarray(l)
>>> a
array([ 0.09, -0.03, -0.04, 0.07, 0. , -0.02])
>>> a.dtype
dtype('float64')
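A short round-trip check, sketched below, confirms that no values are lost on the way from array to list and back. It also shows one design detail worth knowing: np.asarray returns its input unchanged when that input is already an array, whereas np.array copies by default.

```python
import numpy as np

r = np.array([0.09, -0.03, -0.04, 0.07, 0.00, -0.02])

l = r.tolist()                  # list of plain Python floats
a = np.asarray(l)               # new array built from the list

print(type(l[0]))               # <class 'float'>
print(np.array_equal(a, r))     # True
print(np.asarray(r) is r)       # True: no copy for an existing array
print(np.array(r) is r)         # False: np.array copies by default
```

For this reason np.asarray is a convenient way to accept either lists or arrays as function input without paying for an unnecessary copy.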