
English or languish - Probing the ramifications
of Hong Kong's language policy

FACTOR ANALYSIS
Explanation of Decision Tree

Variable Selection and Sample Size - Factor analysis is a diagnostic statistical technique used for data reduction and summarization. It looks for the common variation among a large number of variables and, from this variation, creates a smaller set of factors (new variables). Shared variation among variables can arise when different variables respond similarly to the same phenomenon, or when different variables constitute different aspects of the same phenomenon. In the first instance the common variation is brought about by an external force to which all variables respond in a similar manner. In the second instance, an external source is unnecessary to bring about a common pattern of variance, because the variables in question always behave similarly no matter what force might be present. Flotsam moves together atop undulating ocean waves not because the individual pieces of flotsam are related, but because the same waves cause every piece to rise and fall, indifferent to their presence. In contrast, the limbs of one's own body always demonstrate similar movement when the entire body is moved -- no matter the origin of the force that causes the body to move. Thus, it is important to know the approximate relationship, or lack of relationship, among the variables that one enters into the analysis before one can effectively interpret the reduced number of factors that results.

The number of variables and the number of observations in any statistical analysis are crucial. As both are expected to be large in the HKLNA-Project, sample size will become an issue only with regard to the size of the population that is being tested, not the statistical technique itself.

Correlation Matrix - As factor analysis is a statistical procedure that ignores cause and effect relationships, it treats the variance of all variables (R-analysis) or all observations (Q-analysis) similarly. In other words, one can calculate the correlation matrix with respect to either variables or observations simply by turning the input data matrix on its side.
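The difference between the two analyses can be illustrated with a minimal sketch. It assumes the input data matrix holds observations in rows and variables in columns; the function name and the random data are made up for illustration and are not part of the HKLNA-Project.

```python
import numpy as np

def correlation_matrices(data):
    """Return (R, Q) for a data matrix with observations in rows, variables in columns."""
    data = np.asarray(data, dtype=float)
    # R-analysis: correlations among variables (columns of the data matrix).
    r_matrix = np.corrcoef(data, rowvar=False)
    # Q-analysis: transpose the data first, so observations become the columns.
    q_matrix = np.corrcoef(data.T, rowvar=False)
    return r_matrix, q_matrix

# Made-up data: 6 observations on 4 variables.
rng = np.random.default_rng(0)
example_data = rng.normal(size=(6, 4))
r_matrix, q_matrix = correlation_matrices(example_data)
print(r_matrix.shape, q_matrix.shape)
```

With six observations on four variables, the R-analysis matrix is 4 by 4 and the Q-analysis matrix is 6 by 6; nothing else in the calculation changes.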

Factor Model - Common factor analysis and principal component analysis are the two principal techniques employed by factor analysis to obtain factor solutions. The two methods differ in the amount of information they use in calculating a factor solution. Whereas principal component analysis uses all available variance, common factor analysis uses only the variance that is shared among variables. This difference shows up in the principal diagonal of the correlation matrix. The diagonal elements of the principal component matrix consist only of ones and thus reflect the full variance of each variable; the diagonal elements of the common factor matrix are the communalities associated with each of the input variables. In short, the common factor model ignores the unique variance of each variable in calculating the final factor solution.
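The contrast between the two diagonals can be sketched as follows. The text does not say how the communalities are estimated, so this example assumes one widely used starting estimate, the squared multiple correlation of each variable with all the others; the function name and the small correlation matrix are hypothetical.

```python
import numpy as np

def reduced_correlation_matrix(corr):
    """Replace the unit diagonal with initial communality estimates.

    Assumption: communalities are estimated by squared multiple
    correlations (SMC), computed as 1 - 1 / diag(inverse of corr).
    """
    corr = np.asarray(corr, dtype=float)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    reduced = corr.copy()
    np.fill_diagonal(reduced, smc)
    return reduced

# Hypothetical correlation matrix for three variables.
corr_pca = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
corr_common = reduced_correlation_matrix(corr_pca)

print(np.diag(corr_pca))     # principal component model: ones on the diagonal
print(np.diag(corr_common))  # common factor model: communality estimates below one
```

Only the diagonal differs between the two matrices; the off-diagonal correlations are identical.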

Which of the two models to use depends on two considerations: the research objective and prior knowledge about the variance structure.

See under general uses for further clarification.

Method of Extraction - Once the appropriate factor model has been determined, one must choose between an orthogonal and an oblique extraction method (factor solution). When the goal of extraction is to obtain independent factors for use in other statistical techniques that require a high degree of independence among the explanatory variables, orthogonal extraction is the appropriate choice. Thus, principal component analysis and orthogonal extraction often go hand-in-hand.

As may be deduced from the names of these two extraction processes, orthogonal extraction assumes independence among the extracted factors, whereas oblique extraction assumes that the factors are correlated. When the objective of the analysis is to identify underlying factors or latent constructs (common factor analysis), both orthogonal and oblique factor extraction methods can be employed.

Closely associated with the method of extraction is factor rotation.

Factor Rotation (factor extraction criteria) - In order to understand the importance of factor rotation it is useful to examine how the factors of an unrotated orthogonal extraction are obtained.
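As a rough illustration, and assuming the principal component model, an unrotated orthogonal solution can be obtained from the eigendecomposition of the correlation matrix: each factor is an eigenvector scaled by the square root of its eigenvalue, and factors are taken in order of decreasing eigenvalue. The function name and the correlation matrix below are hypothetical.

```python
import numpy as np

def unrotated_loadings(corr, n_factors):
    """Unrotated principal component loadings from a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    # Keep the n_factors largest eigenvalues, largest first.
    order = np.argsort(eigenvalues)[::-1][:n_factors]
    eigenvalues = np.clip(eigenvalues[order], 0.0, None)
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    # The sign of each column is arbitrary.
    return eigenvectors[:, order] * np.sqrt(eigenvalues)

# Hypothetical correlation matrix for three variables.
corr = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
loadings = unrotated_loadings(corr, 3)
variance_per_factor = (loadings ** 2).sum(axis=0)
print(variance_per_factor)
```

In this kind of solution the first factor accounts for the largest share of variance and each later factor for less, with all factors uncorrelated. That ordering is a property of the extraction, not of the data's meaning, which is one reason the unrotated factors can be hard to interpret.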

Factor Interpretation - Once the number of factors has been determined, the factors must be interpreted. Interpreting factors is largely a question of which variables load on which factors and in what amount. Several guidelines have been suggested for determining when a factor loading is significant. Variables that do not load heavily on a particular factor should not be used to interpret that factor.

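Significance level | 5 percent | 5 percent | 1 percent | 1 percent
Sample size | Minimum factor loading | Percent of variance captured | Minimum factor loading | Percent of variance captured
100 | ±0.19 | < 3.6 | ±0.26 | < 6.8
200 | ±0.14 | < 2.0 | ±0.18 | < 3.2
300 | ±0.11 | < 1.2 | ±0.15 | < 2.3
Loadings are given as absolute values. The smaller loading in each row corresponds to the 5 percent level and the larger to the stricter 1 percent level.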
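The percent-of-variance figures in the table follow from the loadings themselves: a loading is a correlation between a variable and a factor, so its square is the share of the variable's variance captured by that factor. A minimal check, using a hypothetical helper name:

```python
def percent_variance(loading):
    """Percent of a variable's variance captured by a factor with this loading."""
    return 100.0 * loading ** 2

# Loadings taken from the table above.
for loading in (0.19, 0.26, 0.14, 0.18, 0.11, 0.15):
    print(f"loading {loading:.2f} -> {percent_variance(loading):.2f} percent")
```

The sign of the loading does not matter here, which is why the table lists absolute values.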

Obviously very high restrictions on the level of significance demand that a far larger number of factors be included in the final solution.

Minimum Factor Loadings
Sample size | Number of variables | 5th Factor | 10th Factor
50 | 20 | 0.292 | 0.393
50 | 50 | 0.267 | 0.274
100 | 20 | 0.216 | 0.261
100 | 50 | 0.202 | 0.214
5 percent significance level

A tabled summary of the above relationships can be found here.

Interpreting the factor matrix

Some helpful steps for interpreting the factor matrix include the following:

  1. Write out the names of the variables.
  2. Find the highest loading of each variable across the factors and highlight it. This is best done one variable at a time: select a variable, find the factor on which it loads highest, and then move on to the next variable.
  3. After identifying the highest factor loading of each variable, identify other loadings that are significant.
  4. Critically evaluate those variables that do not load significantly on any factor. Variables that load significantly on no factor and fail to demonstrate high communality can be eliminated and a new factor solution obtained.
  5. Having identified all variables that load significantly, examine each factor separately and assign it a name based upon those variables which load most heavily on it. If no name can be found, the factor should be labeled as undefined.
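The mechanical part of these steps (steps 2 to 4) can be sketched in a few lines; naming the factors remains a matter of judgment. The function name, the loading matrix, and the 0.30 threshold are all hypothetical placeholders; in practice the threshold would come from a significance guideline such as the tables above.

```python
import numpy as np

def summarize_loadings(loadings, names, threshold):
    """For each variable: highest-loading factor, significant factors, communality."""
    loadings = np.asarray(loadings, dtype=float)
    abs_loadings = np.abs(loadings)
    primary = abs_loadings.argmax(axis=1)         # step 2: highest loading per variable
    significant = abs_loadings >= threshold       # step 3: other significant loadings
    communality = (loadings ** 2).sum(axis=1)     # used in step 4
    summary = {}
    for i, name in enumerate(names):
        summary[name] = {
            "primary_factor": int(primary[i]),
            "significant_factors": [int(j) for j in np.flatnonzero(significant[i])],
            "communality": float(communality[i]),
        }
    return summary

# Hypothetical loading matrix: 4 variables (rows) on 2 factors (columns).
names = ["v1", "v2", "v3", "v4"]
loadings = [
    [0.80, 0.10],
    [0.75, -0.35],
    [0.05, 0.70],
    [0.10, 0.12],
]
summary = summarize_loadings(loadings, names, threshold=0.30)

# Step 4: variables that load significantly on no factor are candidates for removal.
candidates = [n for n in names if not summary[n]["significant_factors"]]
print(candidates)
```

In this made-up matrix only `v4` loads significantly on no factor, and its communality is also low, so it is the kind of variable step 4 suggests evaluating for elimination before a new solution is obtained.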

Factor Scores and Surrogate Variables - Researchers wishing to perform further experiments using different statistical tests can do so in either of two ways: select a surrogate variable for each of the factors, or employ the factor scores associated with each factor.
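Both options can be sketched briefly. The text does not specify how surrogates are chosen or how scores are computed, so this example makes two assumptions: the surrogate for a factor is the variable with the highest absolute loading on it, and factor scores are computed by the regression method (standardized data times the inverse correlation matrix times the loadings). All names and data are made up for illustration.

```python
import numpy as np

def surrogate_variables(loadings, names):
    """Assumed rule: pick, for each factor, the variable with the highest absolute loading."""
    highest = np.abs(np.asarray(loadings, dtype=float)).argmax(axis=0)
    return [names[i] for i in highest]

def regression_factor_scores(data, loadings):
    """Regression-method scores: standardized data @ inv(R) @ loadings."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    weights = np.linalg.solve(corr, np.asarray(loadings, dtype=float))
    return z @ weights

# Made-up data: 50 observations on 4 variables.
rng = np.random.default_rng(1)
example_data = rng.normal(size=(50, 4))
names = ["v1", "v2", "v3", "v4"]

# Unrotated principal component loadings for the two largest factors.
corr = np.corrcoef(example_data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
top = np.argsort(eigenvalues)[::-1][:2]
loadings = eigenvectors[:, top] * np.sqrt(eigenvalues[top])

surrogates = surrogate_variables(loadings, names)
scores = regression_factor_scores(example_data, loadings)
print(surrogates, scores.shape)
```

A surrogate keeps an original, directly interpretable variable but discards the information in the other variables that load on the factor; factor scores use every variable but are composites that can be harder to explain.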
