matrix_power raises a square array or matrix to an integer power, and matrix_power(x,n) is identical to x**n when x is a matrix.
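The equivalence with x**n holds for the matrix type only; for an array, ** operates element by element. The following is a minimal sketch of the difference, using an illustrative 2 by 2 array that is not taken from the text.

```python
import numpy as np
from numpy.linalg import matrix_power

# Illustrative input (an assumption, chosen only for this sketch)
x = np.array([[1.0, 0.5], [0.5, 1.0]])

# Matrix power: the same as x @ x @ x
x_cubed = matrix_power(x, 3)

# For an array, ** raises each element to the power separately
x_elementwise = x ** 3

print(x_cubed)
print(x_elementwise)
```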
svd
svd computes the singular value decomposition of a matrix X, defined as
X = UΣV
where Σ is diagonal, and U and V are unitary arrays (orthonormal if real valued). SVDs are closely related to eigenvalue decompositions when X is a real, positive definite matrix. The returned value is a tuple containing (U,s,V) where Σ = diag(s).
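As a rough check of the definition, the sketch below decomposes a small illustrative array (an assumption, not data from the text) and rebuilds it from the three outputs. The option full_matrices=False requests the reduced decomposition so that the shapes conform when X is not square.

```python
import numpy as np
from numpy.linalg import svd

# Illustrative 3 by 2 input (an assumption for this sketch)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Reduced SVD: U is 3 by 2, s has 2 elements, V is 2 by 2
U, s, V = svd(X, full_matrices=False)

# Rebuild X as U * diag(s) * V
X_rebuilt = U @ np.diag(s) @ V

print(s)
print(np.allclose(X, X_rebuilt))
```

Note that s is returned as a 1-dimensional array of singular values in decreasing order, not as a diagonal array, and so diag is needed to form Σ.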
cond
cond computes the condition number of a matrix, which measures how close to singular a matrix is. Lower numbers indicate that the input is better conditioned (further from singular).
>>> x = matrix([[1.0,0.5],[.5,1]])
>>> cond(x)
3.0
>>> x = matrix([[1.0,2.0],[1.0,2.0]]) # Singular
>>> cond(x)
inf
slogdet
slogdet computes the sign and log of the absolute value of the determinant. slogdet is useful when the determinant may be very large or very small, since working with the log avoids numerical overflow and underflow.
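The numerical problem can be illustrated with a sketch in which the determinant is too small to represent as a floating point number. The 200 by 200 scaled identity used here is an assumption chosen only so that the exact determinant, 10^(-600), is known.

```python
import numpy as np
from numpy.linalg import det, slogdet

# 0.001 times a 200 by 200 identity: the exact determinant is 1e-600
x = 0.001 * np.eye(200)

# det underflows since 1e-600 is smaller than the smallest positive float
direct = det(x)

# slogdet returns the sign and the log of the absolute value
sign, logdet = slogdet(x)

print(direct)
print(sign, logdet)
```

The log determinant equals 200 times log(0.001), and the determinant is represented as sign * exp(logdet) without ever being computed directly.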
solve
solve solves the system Xβ = y when X is square and invertible so that the solution is exact.
>>> X = array([[1.0,2.0,3.0],[3.0,3.0,4.0],[1.0,1.0,4.0]])
>>> y = array([[1.0],[2.0],[3.0]])
>>> solve(X,y)
array([[ 0.625],
       [-1.125],
       [ 0.875]])
lstsq
lstsq solves the system Xβ = y when X is n by k, n > k, by finding the least squares solution. lstsq returns a 4-element tuple where the first element is β and the second element is the sum of squared residuals. The final two outputs are diagnostic: the third is the rank of X and the fourth contains the singular values of X.
>>> X = randn(100,2)
>>> y = randn(100)
>>> lstsq(X,y)
(array([ 0.03414346, 0.02881763]),
 array([ 3.59331858]),
 2,
 array([ 3.045516 , 1.99327863]))
cholesky
cholesky computes the Cholesky factor of a positive definite matrix or array. The Cholesky factor is a lower triangular matrix and is defined as C in
CC′ = Σ
where Σ is a positive definite matrix.
>>> x = matrix([[1,.5],[.5,1]])
>>> C = cholesky(x)
>>> C*C.T
matrix([[ 1. , 0.5],
        [ 0.5, 1. ]])
det
det computes the determinant of a square matrix or array.
>>> x = matrix([[1,.5],[.5,1]])
>>> det(x)
0.75
eig
eig computes the eigenvalues and eigenvectors of a square matrix. When used with one output, the eigenvalues and eigenvectors are returned as a tuple.
>>> x = matrix([[1,.5],[.5,1]])
>>> val,vec = eig(x)
>>> vec*diag(val)*vec.T
matrix([[ 1. , 0.5],
        [ 0.5, 1. ]])
eigh
eigh computes the eigenvalues and eigenvectors of a symmetric array. When used with one output, the eigenvalues and eigenvectors are returned as a tuple. eigh is faster than eig for symmetric inputs since it exploits the symmetry of the input. eigvalsh can be used if only eigenvalues are needed from a symmetric array.
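A short sketch of both functions follows, reusing the symmetric 2 by 2 input from the eig example but stored as an array, so that @ is used for matrix multiplication.

```python
import numpy as np
from numpy.linalg import eigh, eigvalsh

# Symmetric input, the same values as in the eig example
x = np.array([[1.0, 0.5], [0.5, 1.0]])

# Eigenvalues (in ascending order) and eigenvectors (in columns)
val, vec = eigh(x)

# The eigenvectors are orthonormal, so x = vec * diag(val) * vec'
rebuilt = vec @ np.diag(val) @ vec.T

# Eigenvalues only
vals_only = eigvalsh(x)

print(val)
print(np.allclose(rebuilt, x))
```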
inv
inv computes the inverse of an array. inv(x) can alternatively be computed using x**(-1) when x is a matrix.
>>> x = array([[1,.5],[.5,1]])
>>> xInv = inv(x)
>>> dot(x,xInv)
array([[ 1., 0.],
       [ 0., 1.]])
>>> x = asmatrix(x)
>>> x**(-1)*x
matrix([[ 1., 0.],
        [ 0., 1.]])
kron
kron computes the Kronecker product of two arrays,
z = x ⊗ y
and is written as z = kron(x,y).
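The Kronecker product replaces each element of x with that element multiplied by the whole of y. A minimal sketch, using two small integer arrays that are assumptions made for illustration:

```python
import numpy as np

# Illustrative inputs (assumptions for this sketch)
x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 0], [0, 1]])

# Each element x[i,j] is replaced by the 2 by 2 block x[i,j] * y
z = np.kron(x, y)

print(z)
# [[1 0 2 0]
#  [0 1 0 2]
#  [3 0 4 0]
#  [0 3 0 4]]
```

When x is m by n and y is p by q, the result is mp by nq, so z is 4 by 4 here.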
trace
trace computes the trace of a square array (sum of diagonal elements). trace(x) equals sum(diag(x)).
matrix_rank
matrix_rank computes the rank of an array using a SVD.
>>> x = array([[1,.5],[1,.5]])
>>> x
array([[ 1. , 0.5],
       [ 1. , 0.5]])
>>> matrix_rank(x)
1
8.4 Exercises
1. Let x = arange(12.0). Use both shape and reshape to produce 1×12, 2×6, 3×4, 4×3, 6×2 and 2×2×3 versions of the array. Finally, return x to its original size.
2. Let x = reshape(arange(12.0),(4,3)). Use ravel, flatten and flat to extract elements 1, 3, . . ., 11 from the array (using a 0 index).
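3. Let x be a 2 by 2 array, y be a 1 by 1 array, and z be a 3 by 2 array. Construct
\[
w = \begin{bmatrix}
x & \begin{matrix} y & y & y \\ y & y & y \end{matrix} \\
z & \begin{matrix} z' \\ \begin{matrix} y & y & y \end{matrix} \end{matrix}
\end{bmatrix}
\]
using hstack, vstack, and tile.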
4. Let x = reshape(arange(12.0),(2,2,3)). What does squeeze do to x?
5. How can a diagonal matrix containing the diagonal elements of
\[
y = \begin{bmatrix} 2 & .5 \\ .5 & 4 \end{bmatrix}
\]
be constructed using diag?
6. Using the y array from the previous problem, verify that cholesky works by computing the Cholesky factor, and then multiplying to get y again.
7. Using the y array from the previous problem, verify that the sum of the eigenvalues is the same as the trace, and the product of the eigenvalues is the determinant.
8. Using the y array from the previous problem, verify that the inverse of y is equal to VD⁻¹V′ where V is the array containing the eigenvectors, and D is a diagonal array containing the eigenvalues.
9. Simulate some data where x = randn(100,2), e = randn(100,1), B = array([[1],[0.5]]) and y = xβ + ε, where β and ε correspond to B and e. Use lstsq to estimate β from x and y.
10. Suppose
\[
y = \begin{bmatrix} 5 & -1.5 & -3.5 \\ -1.5 & 2 & -0.5 \\ -3.5 & -0.5 & 4 \end{bmatrix}
\]
use matrix_rank to determine the rank of this array. Verify the results by inspecting the eigenvalues using eig and check that the determinant is 0 using det.
11. Let x = randn(100,2). Use kron to compute
Chapter 9
Importing and Exporting Data
9.1 Importing Data using pandas
Pandas is an increasingly important component of the Python scientific stack, and a complete discussion of its main features is included in Chapter 17. All of the data readers in pandas load data into a pandas DataFrame (see Section 17.1.2), and so these examples all make use of the values property to extract a NumPy array. In practice, the DataFrame is much more useful since it retains information such as column names read from the data source. In addition to the three formats presented here, pandas can also read JSON, SQL, HTML tables or from the clipboard, which is particularly useful for interactive work since virtually any source that can be copied to the clipboard can be imported.
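The difference between the DataFrame and the array extracted using values can be sketched with a small in-memory CSV. The column names Date, Open and Close are assumptions made for this illustration; the two rows reuse values from the FTSE examples in this chapter, and StringIO stands in for a file on disk.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for a CSV file; the column names are hypothetical
csv_text = StringIO(
    "Date,Open,Close\n"
    "2012-02-15,5899.9,5892.2\n"
    "2012-02-14,5905.7,5899.9\n"
)

df = pd.read_csv(csv_text)

# The DataFrame keeps the column names read from the header row
print(df.columns.tolist())

# Columns can be selected by name, and each keeps its own data type
open_price = df["Open"]
print(open_price.dtype)

# values returns a plain NumPy array and discards the column names
values = df.values
print(values.dtype)
```

Since the Date column contains strings, the array returned by values has object data type, while the Open column selected from the DataFrame remains float64.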
9.1.1 CSV and other formatted text files
Comma-separated value (CSV) files can be read using read_csv. When the CSV file contains mixed data, the array extracted using values will have an object data type, and so further processing is usually required to extract the individual series.
>>> from pandas import read_csv
>>> csv_data = read_csv('FTSE_1984_2012.csv')
>>> csv_data = csv_data.values
>>> csv_data[:4]
array([['2012-02-15', 5899.9, 5923.8, 5880.6, 5892.2, 801550000L, 5892.2],
       ['2012-02-14', 5905.7, 5920.6, 5877.2, 5899.9, 832567200L, 5899.9],
       ['2012-02-13', 5852.4, 5920.1, 5852.4, 5905.7, 643543000L, 5905.7],
       ['2012-02-10', 5895.5, 5895.5, 5839.9, 5852.4, 948790200L, 5852.4]], dtype=object)
>>> open = csv_data[:,1]
When the entire file is numeric, the data will be stored as a homogeneous array using one of the numeric data types, typically float64. In this example, the first column contains Excel dates as numbers, which are the number of days past January 1, 1900.
>>> csv_data = read_csv('FTSE_1984_2012_numeric.csv')
>>> csv_data = csv_data.values
>>> csv_data[:4,:2]
array([[ 40954. , 5899.9],
       [ 40953. , 5905.7],
       [ 40952. , 5852.4],
       [ 40949. , 5895.5]])
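The numeric dates can be converted back to dates when needed. The sketch below is one possible approach, not necessarily the one used elsewhere in these notes, and applies pandas' to_datetime to the four serial dates shown above. It assumes an origin of December 30, 1899, which is the offset that reproduces Excel's serial dates for modern dates, since Excel counts January 1, 1900 as day 1 and also includes a nonexistent February 29, 1900.

```python
import numpy as np
import pandas as pd

# Excel serial dates from the first column of the output above
serial = np.array([40954.0, 40953.0, 40952.0, 40949.0])

# Count days from 1899-12-30 (an assumption about the serial convention,
# valid for Excel dates on or after March 1, 1900)
dates = pd.to_datetime(serial, unit="D", origin="1899-12-30")

print(dates)
```

The first serial date, 40954, maps to 2012-02-15, which matches the first date in the mixed CSV file.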
9.1.2 Excel files
Excel files, both 97/2003 (xls) and 2007/10/13 (xlsx), can be imported using read_excel. Two inputs are required to use read_excel, the filename and the sheet name containing the data. In this example, pandas makes use of the information in the Excel workbook that the first column contains dates and converts these to datetimes. Like the mixed CSV data, the array returned has object data type.
>>> from pandas import read_excel
>>> excel_data = read_excel('FTSE_1984_2012.xls','FTSE_1984_2012')
>>> excel_data = excel_data.values
>>> excel_data[:4,:2]
array([[datetime.datetime(2012, 2, 15, 0, 0), 5899.9],
       [datetime.datetime(2012, 2, 14, 0, 0), 5905.7],
       [datetime.datetime(2012, 2, 13, 0, 0), 5852.4],
       [datetime.datetime(2012, 2, 10, 0, 0), 5895.5]], dtype=object)
>>> open = excel_data[:,1]
9.1.3 STATA files
Pandas also contains a function, read_stata, to read STATA files.
>>> from pandas import read_stata
>>> stata_data = read_stata('FTSE_1984_2012.dta')
>>> stata_data = stata_data.values
>>> stata_data[:4,:2]
array([[ 0.00000000e+00, 4.09540000e+04],
       [ 1.00000000e+00, 4.09530000e+04],
       [ 2.00000000e+00, 4.09520000e+04],
       [ 3.00000000e+00, 4.09490000e+04]])