Flow matrix factor analysis: Identification of trade regions of interstate

Chapter 2. Spatial Pattern of the U.S. Interstate Commodity Flows

2. Data and Methods

2.2. Exploratory spatial data analysis of interstate commodity flows

2.2.2. Flow matrix factor analysis: Identification of trade regions of interstate

Factor analysis has been used to abstract the underlying structure of flows from a large and complex origin-to-destination interaction matrix by reducing sets of flows to basic flow components. This study employs principal component analysis (PCA) as the extraction method for flow matrix factor analysis.

2.2.2.1. Flow matrix factor analysis

The basic purpose of flow matrix factor analysis is to derive clusters of areas whose interstate trade flows have a similar spatial structure in terms of the geographic origins and destinations of the flows. In effect, trading areas (or trading zones) are identified and categorized by the relative similarity of their flows, that is, by the particular combination of source or destination states that their commodity flows share.

In undertaking a PCA with commodity flow data, two matrices are examined closely: the

(1966), the R-mode and the Q-mode factor analyses need to be carried out separately for the columns and rows of the original flow matrix to identify the factors (or components) that explain the spatial structure of commodity flows. This is because origin-to-destination flow matrices are square but usually not symmetric, so the column and row analyses are likely to yield different factors. The first step of the factor analysis of a commodity flow matrix is to compute the correlations between the patterns of individual states' inflows (R-mode) or outflows (Q-mode). For the R-mode PCA, a set of 48 by 48 matrices of correlation coefficients is produced between the 48 columns, each holding one state's inflows from the origin states, while the Q-mode PCA requires a set of 48 by 48 matrices of correlation coefficients between the 48 rows, each holding one state's outflows to the destination states. These correlation matrices serve as the connection matrices required for flow matrix factor analysis. Two states whose rows have a correlation coefficient of 1 are taken to have proportionally identical outflows to every other state. A coefficient below 1 indicates the extent to which the destinations of the two states' outflows differ. A parallel interpretation holds for the correlation between two states' inflows: two states whose columns have a correlation coefficient of 1 are taken to receive proportionally identical inflows from every other state, and a coefficient below 1 indicates the extent to which the origins of the inflows shipped into the two states differ.
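The correlation step can be sketched in a few lines. This is only an illustration, not the study's SPSS procedure: it assumes the flow matrix is stored with origins as rows and destinations as columns, and it assumes the main diagonal (intrastate flows) is treated as missing so that each pair of states is compared only over the other states they share. The function name `flow_correlations` is hypothetical.

```python
import numpy as np
import pandas as pd


def flow_correlations(flows):
    """Return (R-mode, Q-mode) correlation matrices for a square flow matrix.

    Assumption: flows[i, j] is the shipment from origin state i to
    destination state j.
    """
    f = pd.DataFrame(np.asarray(flows, dtype=float))
    # Assumption: intrastate flows on the diagonal are treated as missing.
    f = f.mask(np.eye(len(f), dtype=bool))
    # DataFrame.corr() correlates columns and skips missing values pair by pair.
    r_mode = f.corr()    # columns = inflow profiles of destination states
    q_mode = f.T.corr()  # rows = outflow profiles of origin states
    return r_mode, q_mode
```

With a 48 by 48 matrix, each pair of states is then correlated over the 46 remaining states, since each of the two profiles has one missing diagonal entry.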

The next step is to extract principal components that group states with similar commodity flow patterns, based on the correlation matrices produced in the first step. The Q-mode PCA extracts a set of components based on similarities in the way origins ship their products to destinations, yielding groups of origin (or producing) regions, while the R-mode PCA does the same based on similarities in the way destinations assemble their needs, yielding groups of destination states. In the former analysis, states with high loadings on a component share proportionally similar destinations. In the latter, states with high loadings share proportionally similar origins for their inflows. To simplify the structure of component loadings, a rotation technique is often used. In conjunction with the loadings, the standardized component scores (usually called ‘factor scores’) for each state's outflows and inflows help characterize the commodity flows. They measure the significance of a destination or an origin for the group of states that have similar outflow or inflow patterns, respectively. For outflow shipments, a large positive score indicates an especially strong destination for the outflows of the states that load highly on the component, whereas a large negative score indicates an extremely weak destination for the outflows of those states. Inflow shipments are interpreted in the same way: for the states with high loadings on the component, a large positive score indicates a particularly strong source of inflows, whereas a large negative score indicates an extremely weak origin of inflows into these states.
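The extraction and rotation steps can be illustrated with a minimal NumPy sketch. It assumes a correlation matrix is already available, keeps components whose eigenvalues exceed a chosen threshold, and applies a plain varimax rotation. The function names are hypothetical, and the rotation omits the Kaiser normalization that statistical packages often apply, so rotated loadings may differ somewhat from package output.

```python
import numpy as np


def principal_loadings(corr, min_eigenvalue=1.0):
    """Unrotated PCA loadings for components with eigenvalue > min_eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(corr, dtype=float))
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    # Loading = eigenvector scaled by the square root of its eigenvalue.
    return eigvecs[:, keep] * np.sqrt(eigvals[keep]), eigvals


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):      # no further improvement
            break
        criterion = s.sum()
    return loadings @ rotation
```

Because the rotation is orthogonal, each state's communality (the row sum of squared loadings) is unchanged. Only the distribution of loadings across components is simplified.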

This study conducts the factor analysis on state-to-state commodity flow matrices for the years 1993, 1997, 2002, and 2007 using SPSS Statistics version 17.0. The commodity flow matrix factor analyses are run on the aggregated shipments of all commodities, measured by value of shipments. In SPSS, PCA is employed as the extraction method and the correlation matrix of the variables is chosen as the object of analysis. These correlation matrices, which serve as the connection matrices, are produced with pairwise deletion, so that two states are compared only over the 46 states (as origins or destinations) that they have in common. Components are extracted when their eigenvalues are greater than 1.4⁸, and the varimax criterion is employed as the rotation technique. A regression-based method is used to calculate the standardized component scores. All of these procedures are applied to the 48 by 48 commodity flow matrix in its original format for the R-mode PCA and to its transpose for the Q-mode PCA. The former extracts components of inflows to each state and the latter identifies components of outflows from each state. Through these flow matrix factor analyses, the structure of commodity flows in the U.S. will be portrayed as “trade regions (or trade zones)” based on spatially similar trading patterns, that is, on shared origins and destinations in the interstate trade system.
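The effect of the eigenvalue cut-off on the number of extracted components can be checked directly from the eigenvalues of a correlation matrix. The sketch below is a hedged illustration: the function name `components_retained` is hypothetical, and the small matrix in the usage note is constructed for the example rather than taken from the study's data.

```python
import numpy as np


def components_retained(corr, cutoffs=(1.0, 1.4)):
    """Count components whose eigenvalue exceeds each cut-off."""
    eigvals = np.sort(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))[::-1]
    counts = {c: int((eigvals > c).sum()) for c in cutoffs}
    return counts, eigvals  # eigvals in descending order, as in a scree plot
```

For example, a 6 by 6 correlation matrix made of one block of four variables correlated at 0.5 and one block of two variables correlated at 0.2 has eigenvalues 2.5, 1.2, 0.8, 0.5, 0.5, and 0.5, so a cut-off of 1.0 retains two components while a cut-off of 1.4 retains one.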

2.2.2.2. Dyadic factor analysis of commodity flow patterns

A dyadic factor analysis approach applied by Berry (1966) is replicated to analyze the flow structure of 13 different commodity groups. If the factor analysis procedure described above were applied to all 13 commodity groups for each year, in terms of both value and weight of shipments, at least 104 factor analyses (13 commodity groups × 4 years × 2 terms) would need to be conducted. Instead, a dyadic factor analysis is conducted on the state-to-state flow matrices of the individual commodities to reduce this large number of steps, and it identifies general flow patterns that can be understood in terms of commodity characteristics. First, the 2,256 (= 48 × (48 − 1)) dyads that remain after excluding the main diagonal are arrayed as row observations, with the 13 commodity groups as columns. The commodity groups are treated as variables in this case. This gives eight (2 terms × 4 years) dyadic data matrices with 2,256 rows and 13 columns. In the second step, the correlations between the columns of these data matrices are computed. These correlations indicate similarities in the way commodities flow over the dyads of the system. Finally, a factor analysis of these correlations groups the commodities on the factors (or components). All detailed methods are the same as those described in the previous section, except that the eigenvalue cut-off is 1.0.

8 The eigenvalue cut-off commonly used in previous research is 1.0 (Berry, 1966; Black, 1973; Plane & Isserman, 1983; Ellis et al., 1993; Pandit, 1994). However, this study uses 1.4 as the cut-off for extracting components. This level was determined by inspecting the scree plots: on average, an elbow appears at eigenvalues between 1.4 and 1.5. Around this level, the number of components extracted (and ultimately the number of trade regions identified from the factor analysis results) falls between seven and nine, which is almost half as many as with a cut-off of 1.0.

Component scores are also calculated for each dyad on each component. Through this dyadic factor analysis of commodity groups and the subsequent mapping of its results, the major commodity flow patterns in the U.S. from 1993 to 2007 will be portrayed.
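The dyadic layout and the scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's SPSS workflow: the 13 commodity flow matrices are assumed to be stacked in an array of shape (13, 48, 48), rotation is omitted for brevity, and component scores use the standard regression formula (standardized data times the inverse correlation matrix times the loadings). The function names are hypothetical.

```python
import numpy as np


def dyad_matrix(flows_by_commodity):
    """Stack K square flow matrices into an (n*(n-1)) x K dyad table."""
    flows = np.asarray(flows_by_commodity, dtype=float)  # shape (K, n, n)
    n = flows.shape[1]
    off_diagonal = ~np.eye(n, dtype=bool)                # drop intrastate dyads
    return flows[:, off_diagonal].T                      # rows = dyads


def dyadic_pca(dyads, min_eigenvalue=1.0):
    """Unrotated PCA of the commodity correlations plus regression scores."""
    z = (dyads - dyads.mean(axis=0)) / dyads.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)                  # commodities as variables
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    # Regression method: scores = Z R^{-1} L, one row per dyad.
    scores = z @ np.linalg.solve(corr, loadings)
    return loadings, scores
```

With 48 states and 13 commodity groups, `dyad_matrix` returns a table of 2,256 rows and 13 columns, and `dyadic_pca` returns one standardized score per dyad for each retained component, which is the quantity that would then be mapped.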

3. Analysis results: Spatial patterns of U.S. interstate commodity flows

In document Trade and Spatial Economic Interdependence: U.S. Interregional Trade and Regional Economic Structure (Page 42-46)