English or languish - Probing the ramifications of Hong Kong's language policy
Factor Analysis
Principal Component and Common Factor Analysis
Key Features
- Variables
- Metric (interval or ratio) - many
- Dummy variables
- Sample Size - As the number of respondents to the HKLNA study is
expected to be large, sample size with regard to this procedure is
not likely to be a problem. As a general rule the number of
observations should be four to five times greater than the number
of variables.
- Objective - Data reduction and summarization
- General Uses
Factor analysis is an interdependence technique that examines the
interrelationships among a large number of variables in an effort to
determine underlying dimensions (factors). Factor analysis can be
used for the following purposes:
- R-analysis - identifies a set of underlying dimensions among a
large number of variables.
- Q-analysis - condenses a large number of observations into
distinctly different groups whose shared characteristics describe
the population from which the observations are drawn.
- Identify key variables among a large number of variables for the
purpose of further analysis using other, often predictive,
statistical analyses - surrogate variable selection.
- Create an entirely new, but less numerous, set of variables to
replace the original set for the purpose of further analysis using
other statistical techniques. (return to factor model)
- Statistical Procedure - Although a very useful statistical
procedure, factor analysis requires a large number of decisions in
order to obtain meaningful results. For this reason a decision tree
and corresponding explanation have been created. There are two major
approaches to factor analysis (a minimal code sketch contrasting the
two follows this list):
- Principal component analysis
- Common factor analysis
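The contrast between the two approaches can be seen in a minimal
sketch. The code below is illustrative only and assumes numpy and
scikit-learn are available; the data matrix is randomly generated,
not HKLNA data, and the choice of three dimensions is arbitrary.

```python
# Minimal sketch contrasting the two extraction approaches.
# Assumes numpy and scikit-learn are installed; X is an illustrative
# (respondents x variables) matrix, not HKLNA data.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # 200 observations, 8 metric variables
Z = StandardScaler().fit_transform(X)    # standardize so each variance is one

# Principal component analysis: the diagonal of the correlation matrix
# is one, so all variance (systematic and unsystematic) enters the model.
pca = PCA(n_components=3).fit(Z)

# Common factor analysis: only common variance is modelled; the diagonal
# is effectively replaced by communality estimates.
fa = FactorAnalysis(n_components=3).fit(Z)

print("PCA explained variance (~ eigenvalues):", pca.explained_variance_)
print("Common factor loadings (variables x factors):", fa.components_.T.shape)
```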
Key Terms
- Communality - The communality of a variable is the proportion of
the variable's total variance explained by all of the factors on
which it loads. It is the sum of the variable's squared loadings
across all factors. (return to factor model)
H_i = a_{i1}^2 + a_{i2}^2 + ... + a_{iP}^2
The square root of the communality, SQRT(H_i), is the length of the
variable's vector in factor space.
Subtracting the communality of a variable from one yields that
variable's uniqueness (unique variance). (A worked numerical sketch
follows the Key Terms list.)
- Eigenvalues - The sum of the column of squared loadings for each
factor:
EV_j = λ_j = a_{1j}^2 + a_{2j}^2 + ... + a_{Nj}^2
- Eigenvalues are the roots of the characteristic equation of the
correlation matrix. There is one eigenvalue for each factor.
- Σλ_j = ΣH_i for j = 1, 2, ..., P and i = 1, 2, ..., N
- Dividing an eigenvalue by either the number of variables (component
analysis) or the sum of the communalities (common factor analysis)
and multiplying by 100 yields the percent of variation (see percent
of variance) which a single factor takes into account.
- Although the eigensums are the same for both the rotated and
unrotated factor solutions, the eigenvalues themselves have different
meanings. In the initial, unrotated solution the vertical sum of the
squared loadings tells us something about the relative importance of
each factor. This is not true for the rotated solution!
- Factor Loading
- a_{ij} - A factor loading is the correlation between an original
variable (or observation when performing Q-analysis) and a
particular factor.
- a_{ij}^2 - The square of a factor loading is the fraction of
variation that a variable shares in common with a particular factor.
- Factor Matrix - The tabulated numerical output of a factor
solution. A factor matrix generally includes a list of factors with
their associated variable factor loadings. Communalities and
eigenvalues are also provided.
- Factor Pattern Matrix - This matrix contains the coefficients that
express the standardized value of each variable, for any given
observation, as a linear combination of the factors:
Z_i = a_{i1}F_1 + a_{i2}F_2 + ... + a_{iP}F_P + e_i
where Z_i = the standardized value of the i-th variable (X_i),
where a_{ij} = the regression coefficient for the i-th variable on
the j-th factor,
where F_j = the factor score for the j-th factor, and
where e_i = the error (unique) term for the i-th variable.
- Factor Scores
Composite measures that reflect the importance of each factor
relative to individual observations.
- Factor solutions - A factor
solution is simply the set of factors that result from a factor
analysis experiment. Factor solutions fall into two major types:
- Orthogonal Factor Solutions, and
- Oblique Factor Solutions
Orthogonal factor solutions yield factors that are
statistically independent and can be used with other statistical
procedures that must satisfy assumptions of statistical independence.
Oblique factor solutions are solutions for which factors
are correlated. They are usually a better reflection of the underlying
reality which the researcher seeks to describe. (return
to extraction method)
- Percent of Trace - The part of total model variance taken into
account by a single factor.
PT_j = [(a_{1j}^2 + a_{2j}^2 + ... + a_{Nj}^2) / T] · 100 = (λ_j / T) · 100
where j = the j-th factor
where T = the trace of the correlation matrix
Caution: The percents of trace for rotated and unrotated common
factor solutions are derived differently from those of principal
component analysis. For common factor analysis the trace for rotated
solutions is the same as that for unrotated solutions. Whereas the
summed communalities of the unrotated solution are equal to the trace
of the correlation matrix, their sum for rotated solutions is not.
(A worked numerical sketch follows the Key Terms list.)
- Total Percent of Trace
- Summing across the percents of trace for each factor yields
the total percent of trace. The total percent of trace is an
indicator of how well a particular factor solution accounts for
the variance of all the variables. If the variables are all very
different the total percent of trace will be low. If the variables
are similar the total percent of trace will be high.
- Percent of Variance - The part of total variance taken into
account by a single factor. (return to eigenvalue)
PV_j = (λ_j / N) · 100
where N = the total number of variables (the total model variance).
For common factor analysis the percent of trace and percent of
variance differ insofar as the first measures the proportion of
common variance and the second the proportion of total variance
taken into account by the factor in question.
- Percent of Total Variance (PTV) - The common variance explained by
all factors as a percentage of total variance.
PTV = PV_1 + PV_2 + ... + PV_P
- Principal Diagonal - The principal diagonal of the correlation
matrix, whose elements are defined differently according to the
factoring procedure employed. (return to factor model)
- Principal component analysis - the elements of the principal
diagonal are equal to one. In other words, each variable correlates
exactly with itself, and all variance associated with that variable,
both systematic and unsystematic, is included in the analysis.
- Common factor analysis - the elements of the principal diagonal
are not equal to one; rather, they are equal to the communalities
associated with the variables of the original unrotated solution.
- Sources of Variance (return to uniqueness)
- Common variance - the variance that a single variable shares in
common with one or more of the other variables in the analysis.
- Unique variance - the sum of both the systematic and unsystematic
variance that is common to no other variable. (return to factor
model)
- Specific variance - the systematic variance specific to a
particular variable and not shared with any other variable.
- Error variance - the unsystematic variance specific to a
particular variable.
- Total variance - For any given variable the total variance
associated with that variable is equal to the sum of its common
variance and unique variance, or alternatively
total variance = common variance + unique variance
where unique variance = specific variance + error variance
- Surrogate variable - The variable that loads most heavily on a
factor and is used to represent that factor in subsequent analysis.
Surrogate variables can be used in lieu of factor scores. (return to
factor model)
- Trace - The sum of the elements
of the principal diagonal of the correlation matrix (return
to percent of trace)
- Principal component analysis - the trace equals the total
number of variables included in the analysis
- Common factor analysis - the trace equals the sum of the
communalities of all variables of the initial unrotated factor
solution.
- Uniqueness (return to communality) - The unique variance of a
variable with respect to the other variables of a factor analysis is
obtained as follows:
U_i = 1 - H_i
See Sources of Variance for further discussion.
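As noted under communality and percent of trace, the following is a
worked numerical sketch of the quantities defined in this list. The
four-variable, two-factor loading matrix is hypothetical and purely
illustrative; it is not taken from the HKLNA study.

```python
# Worked sketch of the Key Terms using a hypothetical loading matrix.
# Rows = variables, columns = factors; the numbers are illustrative only.
import numpy as np

A = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.85],
    [0.20, 0.70],
])
N, P = A.shape                      # N variables, P factors

# Communality: H_i = sum of squared loadings of variable i across factors.
H = (A ** 2).sum(axis=1)

# Uniqueness: U_i = 1 - H_i.
U = 1.0 - H

# Eigenvalue of factor j: column sum of squared loadings.
ev = (A ** 2).sum(axis=0)

# Trace: N for principal component analysis, the sum of the
# communalities for common factor analysis.
T_pca, T_cfa = float(N), H.sum()

# Percent of trace and percent of variance for each factor.
PT = ev / T_cfa * 100               # share of common variance
PV = ev / N * 100                   # share of total variance
PTV = PV.sum()                      # percent of total variance, all factors

# Surrogate variable: the variable loading most heavily on each factor.
surrogates = np.abs(A).argmax(axis=0)

print("communalities:", H, " uniqueness:", U)
print("eigenvalues:", ev, " sum =", ev.sum(), "= sum of communalities =", H.sum())
print("percent of trace:", PT, " percent of variance:", PV, " PTV:", PTV)
print("surrogate variable (row index) per factor:", surrogates)
```

The printout also makes the identity Σλ_j = ΣH_i visible directly:
the eigenvalues sum to the same value as the communalities.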
Orthogonal Rotation
There are several orthogonal rotation techniques including:
- Quartimax - The Quartimax approach seeks to simplify the rows of
the factor matrix. The goal is to obtain factor loadings such that
each variable loads highly on only one factor and low on all others.
In practice many variables tend to load heavily on a single, general
factor.
- Varimax - The Varimax approach seeks to simplify the columns of
the factor matrix. Thus, for any one factor, variables tend to load
either very high or very low.
- Equimax - The Equimax approach seeks a balance between row and
column simplification.
A thorough analysis might employ them all.
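A minimal sketch of one such rotation follows. It assumes
scikit-learn (version 0.24 or later, where FactorAnalysis accepts a
rotation argument) and uses synthetic data built around two latent
dimensions so that the effect of Varimax on the columns of the
loading matrix is easy to see.

```python
# Minimal sketch: unrotated vs. varimax-rotated common factor loadings.
# Assumes scikit-learn >= 0.24; the data are synthetic and illustrative.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
F = rng.normal(size=(300, 2))                       # two latent dimensions
W = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],   # six observed variables
              [0.1, 0.9], [0.2, 0.8], [0.0, 0.7]])
X = F @ W.T + 0.3 * rng.normal(size=(300, 6))
Z = StandardScaler().fit_transform(X)

unrotated = FactorAnalysis(n_components=2).fit(Z).components_.T
varimax = FactorAnalysis(n_components=2, rotation="varimax").fit(Z).components_.T

# Varimax simplifies the columns: within each factor, loadings are pushed
# toward either very high or very low values.
print("unrotated loadings:\n", np.round(unrotated, 2))
print("varimax loadings:\n", np.round(varimax, 2))
```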
Factor Extraction Criteria
There are several criteria commonly employed to determine the number
of factors to extract.
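The original list of criteria is not reproduced above. Purely as an
illustration, and not necessarily a criterion adopted for the HKLNA
study, one rule in common use is the latent-root criterion: retain
those factors whose eigenvalues of the correlation matrix exceed one.
A minimal sketch, assuming standardized illustrative data:

```python
# Minimal sketch of one common extraction criterion: the latent-root
# (eigenvalue greater than one) rule, applied to illustrative data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 10))                # illustrative data only
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize the variables
R = np.corrcoef(Z, rowvar=False)              # correlation matrix
eigenvalues = np.linalg.eigvalsh(R)[::-1]     # descending order

n_factors = int((eigenvalues > 1.0).sum())
print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained under the latent-root rule:", n_factors)
```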