English or languish - Probing the ramifications of Hong Kong's language policy
Factor Analysis
Principal Component and Common Factor Analysis
Key Features
- Variables
- Metric (interval or ratio) - many
- Dummy variables
- Sample Size - As the number of respondents to the HKLNA study is
expected to be large, sample size with regard to this procedure is
not likely to be a problem. As a general rule the number of
observations should be four to five times greater than the number
of variables.
- Objective - Data reduction and summarization
- General Uses
Factor analysis is an interdependence technique that examines the
interrelationships among a large number of variables in an effort to
determine underlying dimensions (factors). Factor analysis can be
used for the following purposes:
- R-analysis - identifies a set of underlying dimensions among a
large number of variables.
- Q-analysis - condenses a large number of observations into
distinctly different groups whose shared characteristics describe
the population from which the observations are drawn.
- Identify key variables among a large number of variables for the
purpose of further analysis using other, often predictive,
statistical analyses - surrogate variable selection.
- Create an entirely new, but less numerous, set of variables to
replace the original set for the purpose of further analysis using
other statistical techniques. (return to factor model)
- Statistical Procedure - Although a very useful statistical
procedure, factor analysis requires a large number of decisions in
order to obtain meaningful results. For this reason a decision tree
and corresponding explanation have been created. There are two major
approaches to factor analysis (a minimal code sketch contrasting the
two follows this list):
- Principal component analysis
- Common factor analysis
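The contrast between the two approaches can be seen in a minimal
sketch. The code below is illustrative only and assumes numpy and
scikit-learn are available; the data matrix is randomly generated,
not HKLNA data, and the choice of three dimensions is arbitrary.

```python
# Minimal sketch contrasting the two extraction approaches.
# Assumes numpy and scikit-learn are installed; X is an illustrative
# (respondents x variables) matrix, not HKLNA data.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # 200 observations, 8 metric variables
Z = StandardScaler().fit_transform(X)    # standardize so each variance is one

# Principal component analysis: the diagonal of the correlation matrix
# is one, so all variance (systematic and unsystematic) enters the model.
pca = PCA(n_components=3).fit(Z)

# Common factor analysis: only common variance is modelled; the diagonal
# is effectively replaced by communality estimates.
fa = FactorAnalysis(n_components=3).fit(Z)

print("PCA explained variance (~ eigenvalues):", pca.explained_variance_)
print("Common factor loadings (variables x factors):", fa.components_.T.shape)
```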
Key Terms
- Communality - The communality of a variable is the proportion of
the variable's total variance explained by all of the factors on
which it loads. It is the sum of the variable's squared loadings
across all factors. (return to factor model)
H_i = a_{i1}^2 + a_{i2}^2 + ... + a_{iP}^2
The square root of the communality, SQRT(H_i), is the length of the
variable's vector in factor space.
Subtracting the communality of a variable from one yields that
variable's uniqueness (unique variance). (A worked numerical sketch
follows the Key Terms list.)
- Eigenvalues - The sum of the column of squared loadings for each
factor:
EV_j = λ_j = a_{1j}^2 + a_{2j}^2 + ... + a_{Nj}^2
- Eigenvalues are the roots of the characteristic equation of the
correlation matrix. There is one eigenvalue for each factor.
- Σλ_j = ΣH_i for j = 1, 2, ..., P and i = 1, 2, ..., N
- Dividing an eigenvalue by either the number of variables (component
analysis) or the sum of the communalities (common factor analysis)
and multiplying by 100 yields the percent of variation (see percent
of variance) which a single factor takes into account.
- Although the eigensums are the same for both the rotated and
unrotated factor solutions, the eigenvalues themselves have different
meanings. In the initial, unrotated solution the vertical sum of the
squared loadings tells us something about the relative importance of
each factor. This is not true for the rotated solution!
- Factor Loading
- a_{ij} - A factor loading is the correlation between an original
variable (or observation when performing Q-analysis) and a
particular factor.
- a_{ij}^2 - The square of a factor loading is the fraction of
variation that a variable shares in common with a particular factor.
- Factor Matrix - The tabulated numerical output of a factor
solution. A factor matrix generally includes a list of factors with
their associated variable factor loadings. Communalities and
eigenvalues are also provided.
- Factor Pattern Matrix - This matrix contains the coefficients that
express the standardized value of each variable, for any given
observation, as a linear combination of the factors:
Z_i = a_{i1}F_1 + a_{i2}F_2 + ... + a_{iP}F_P + e_i
where Z_i = the standardized value of the i-th variable (X_i),
where a_{ij} = the regression coefficient for the i-th variable on
the j-th factor,
where F_j = the factor score for the j-th factor, and
where e_i = the error (unique) term for the i-th variable.
- Factor Scores
Composite measures that reflect the importance of each factor
relative to individual observations.
- Factor solutions - A factor
solution is simply the set of factors that result from a factor
analysis experiment. Factor solutions fall into two major types:
- Orthogonal Factor Solutions, and
- Oblique Factor Solutions
Orthogonal factor solutions yield factors that are
statistically independent and can be used with other statistical
procedures that must satisfy assumptions of statistical independence.
Oblique factor solutions are solutions for which factors
are correlated. They are usually a better reflection of the underlying
reality which the researcher seeks to describe. (return
to extraction method)
- Percent of Trace - The part of total model variance taken into
account by a single factor.
PT_j = [(a_{1j}^2 + a_{2j}^2 + ... + a_{Nj}^2) / T] · 100 = (λ_j / T) · 100
where j = the j-th factor
where T = the trace of the correlation matrix
Caution: The percents of trace for rotated and unrotated common
factor solutions are derived differently from those of principal
component analysis. For common factor analysis the trace for rotated
solutions is the same as that for unrotated solutions. Whereas the
summed communalities of the unrotated solution are equal to the trace
of the correlation matrix, their sum for rotated solutions is not.
(A worked numerical sketch follows the Key Terms list.)
- Total Percent of Trace
- Summing across the percents of trace for each factor yields
the total percent of trace. The total percent of trace is an
indicator of how well a particular factor solution accounts for
the variance of all the variables. If the variables are all very
different the total percent of trace will be low. If the variables
are similar the total percent of trace will be high.
- Percent of Variance - The part of total variance taken into
account by a single factor. (return to eigenvalue)
PV_j = (λ_j / N) · 100
where N = the total number of variables (the total model variance).
For common factor analysis the percent of trace and percent of
variance differ insofar as the first measures the proportion of
common variance and the second the proportion of total variance
taken into account by the factor in question.
- Percent of Total Variance (PTV) - The common variance explained by
all factors as a percentage of total variance.
PTV = PV_1 + PV_2 + ... + PV_P
- Principal Diagonal - The principal diagonal of the correlation
matrix, whose elements are defined differently according to the
factoring procedure employed. (return to factor model)
- Principal component analysis - the elements of the principal
diagonal are equal to one. In other words, each variable correlates
exactly with itself, and all variance associated with that variable,
both systematic and unsystematic, is included in the analysis.
- Common factor analysis - the elements of the principal diagonal
are not equal to one; rather, they are equal to the communalities
associated with the variables of the original unrotated solution.
- Sources of Variance (return to uniqueness)
- Common variance - the variance that a single variable shares in
common with one or more of the other variables in the analysis.
- Unique variance - the sum of both the systematic and unsystematic
variance that is common to no other variable. (return to factor
model)
- Specific variance - the systematic variance specific to a
particular variable and not shared with any other variable.
- Error variance - the unsystematic variance specific to a
particular variable.
- Total variance - For any given variable the total variance
associated with that variable is equal to the sum of its common
variance and unique variance, or alternatively
total variance = common variance + unique variance
where unique variance = specific variance + error variance
- Surrogate variable - The variable that loads most heavily on a
factor and is used to represent that factor in subsequent analysis.
Surrogate variables can be used in lieu of factor scores. (return to
factor model)
- Trace - The sum of the elements
of the principal diagonal of the correlation matrix (return
to percent of trace)
- Principal component analysis - the trace equals the total
number of variables included in the analysis
- Common factor analysis - the trace equals the sum of the
communalities of all variables of the initial unrotated factor
solution.
- Uniqueness (return to communality) - The unique variance of a
variable with respect to the other variables of a factor analysis is
obtained as follows:
U_i = 1 - H_i
See Sources of Variance for further discussion.
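As noted under communality and percent of trace, the following is a
worked numerical sketch of the quantities defined in this list. The
four-variable, two-factor loading matrix is hypothetical and purely
illustrative; it is not taken from the HKLNA study.

```python
# Worked sketch of the Key Terms using a hypothetical loading matrix.
# Rows = variables, columns = factors; the numbers are illustrative only.
import numpy as np

A = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.85],
    [0.20, 0.70],
])
N, P = A.shape                      # N variables, P factors

# Communality: H_i = sum of squared loadings of variable i across factors.
H = (A ** 2).sum(axis=1)

# Uniqueness: U_i = 1 - H_i.
U = 1.0 - H

# Eigenvalue of factor j: column sum of squared loadings.
ev = (A ** 2).sum(axis=0)

# Trace: N for principal component analysis, the sum of the
# communalities for common factor analysis.
T_pca, T_cfa = float(N), H.sum()

# Percent of trace and percent of variance for each factor.
PT = ev / T_cfa * 100               # share of common variance
PV = ev / N * 100                   # share of total variance
PTV = PV.sum()                      # percent of total variance, all factors

# Surrogate variable: the variable loading most heavily on each factor.
surrogates = np.abs(A).argmax(axis=0)

print("communalities:", H, " uniqueness:", U)
print("eigenvalues:", ev, " sum =", ev.sum(), "= sum of communalities =", H.sum())
print("percent of trace:", PT, " percent of variance:", PV, " PTV:", PTV)
print("surrogate variable (row index) per factor:", surrogates)
```

The printout also makes the identity Σλ_j = ΣH_i visible directly:
the eigenvalues sum to the same value as the communalities.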
Orthogonal Rotation
There are several orthogonal rotation techniques including:
- Quartimax - The Quartimax approach seeks to simplify the rows of
the factor matrix. The goal is to obtain factor loadings such that
each variable loads highly on only one factor and low on all others.
In practice many variables tend to load heavily on a single, general
factor.
- Varimax - The Varimax approach seeks to simplify the columns of
the factor matrix. Thus, for any one factor, variables tend to load
either very high or very low.
- Equimax - The Equimax approach seeks a balance between row and
column simplification.
A thorough analysis might employ them all.
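A minimal sketch of one such rotation follows. It assumes
scikit-learn (version 0.24 or later, where FactorAnalysis accepts a
rotation argument) and uses synthetic data built around two latent
dimensions so that the effect of Varimax on the columns of the
loading matrix is easy to see.

```python
# Minimal sketch: unrotated vs. varimax-rotated common factor loadings.
# Assumes scikit-learn >= 0.24; the data are synthetic and illustrative.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
F = rng.normal(size=(300, 2))                       # two latent dimensions
W = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],   # six observed variables
              [0.1, 0.9], [0.2, 0.8], [0.0, 0.7]])
X = F @ W.T + 0.3 * rng.normal(size=(300, 6))
Z = StandardScaler().fit_transform(X)

unrotated = FactorAnalysis(n_components=2).fit(Z).components_.T
varimax = FactorAnalysis(n_components=2, rotation="varimax").fit(Z).components_.T

# Varimax simplifies the columns: within each factor, loadings are pushed
# toward either very high or very low values.
print("unrotated loadings:\n", np.round(unrotated, 2))
print("varimax loadings:\n", np.round(varimax, 2))
```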
Factor Extraction Criteria
There are several criteria commonly employed to determine the number
of factors to extract.
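The original list of criteria is not reproduced above. Purely as an
illustration, and not necessarily a criterion adopted for the HKLNA
study, one rule in common use is the latent-root criterion: retain
those factors whose eigenvalues of the correlation matrix exceed one.
A minimal sketch, assuming standardized illustrative data:

```python
# Minimal sketch of one common extraction criterion: the latent-root
# (eigenvalue greater than one) rule, applied to illustrative data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 10))                # illustrative data only
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize the variables
R = np.corrcoef(Z, rowvar=False)              # correlation matrix
eigenvalues = np.linalg.eigvalsh(R)[::-1]     # descending order

n_factors = int((eigenvalues > 1.0).sum())
print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained under the latent-root rule:", n_factors)
```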