Dear Google Answers Experts
I posted this question in the maths section but got no
answers. This is more the sort of thing that psychologists
and sociologists do well, so I decided to ask here.
https://answers.google.com/answers/main?cmd=threadview&id=66729
I am a lecturer in English and culture at a Japanese University with
a statistics problem. The nub of the problem is in (5) below.
Our language department is thinking of setting up a degree consisting
of about half language courses and about half "other courses."
We are wondering what "other courses" to put into the mix (culture,
psychology, film studies, literature, commerce, or whatever.)
I asked high school students who came to our university open day to
fill out a survey. I am trying to analyse the survey, which asked
which courses they are interested in. I have 283 respondents rating 60
courses (including "foreign language").
There are also some questions on what respondents expect to gain from
being at university (which includes a response "foreign language
study" as well as "analytical ability," "general education" etc.)
I am using XLstat to analyse the data.
http://www.xlstat.com/
Download page
http://www.xlstat.Com/xlstat5122.exe
(There is a more recent version than this but I have been
told that it has a bug). I recommend XLstat heartily and since it
is shareware you can use it 20 times for free. At the same time
I am not 100% sure if I trust the software - it was so cheap!
($100 at the time of purchase, a bit more for the latest editions
now)
I am doing a factor analysis. I would like to see what courses
factor with foreign language education.
Here is the data and results of the factor analysis -
http://www.bh.wakwak.com/~takemoto/factoranalysis_example3.xls
This is the varimax transformation results
http://www.bh.wakwak.com/~takemoto/varimax.xls
I have added some color to the tables to show what I think the
factors might be. I chose 7 factors since the eigenvalue of the
8th is less than one.
Questions
1) Please perform a factor analysis on the raw data (starting
with "communications theory" (cell AA5) and ending with "Care for the
Elderly" (cell CN5), with the last row of subjects being row 287) to
check my results and XLstat's.
2) There is also a lot of overlap between the factors (the last
column of the chart shows "Communalities", which is a measure of
overlap). Does this matter?
3) In the non-varimaxed results
http://www.bh.wakwak.com/~takemoto/factoranalysis_example3.xls
there is a massive first factor that contains almost everything.
I have called it "Keenness." In the Varimaxed version
http://www.bh.wakwak.com/~takemoto/varimax.xls
the first two factors are rather large and both contain
foreign language education. Why is(are) there such a big
first factor(s)? Does this matter? Does it mean that factors
could not be found? That the data is pretty random?
4) Another problem is that XLstat does not give the data in the
way that I am used to, with the components ordered for the first,
second, third, etc. factors. I think that SPSS and StatView give
you an indication of what elements to include in each factor.
In the statistics package that I was used to (SAS? SPSS?) I got a
result like the example from the page below.
http://trochim.human.cornell.edu/tutorial/flynn/factor.htm
<QUOTE>
Below is an example of what the factors might look
like if we rotated them. Notice that the loadings are
distributed between the factors, and that the results
are easier to interpret.
Table #3: Rotated Factor Matrix

Variables                               Factor 1   Factor 2   Communality
Ability to define problems                *.68        .17         .87
Ability to supervise others               *.87        .24         .79
Ability to make decisions                 *.65        .07         .90
Ability to build consensus                 .16       *.76         .88
Ability to facilitate decision-making      .30       *.83         .67
Ability to work on a team                  .19       *.69         .72
<END of QUOTE>
But with XLstat
4.1) The variables are not ordered, nor is there an indication
of which variables go in which factor. What cut off should I
use? (.3 or .4 or .35? and why)
4.2) There are two tables, one that calls itself "Principal
Factor Analysis" and one called "Results of Varimax Transformation."
I am not sure which table I should be looking at. I guess
I want the "Varimax Transformation of the Principal Factor
Analysis."
I played around with the varimax rotated chart and tried to
identify some factors.
5) Basically I would like help performing a factor analysis
on the data. What are the factors? What does it mean?
What conclusions can I draw?
One of the reasons why I am unsure of myself is that I
performed factor analyses at two stages during the data input
when I had input 100 subjects and 200 subjects. Both times
the factors came out different.
108 respondents
http://www.bh.wakwak.com/~takemoto/factoranalysis_example2.xls
202 respondents data.
http://www.bh.wakwak.com/~takemoto/factoranalysis_example2.xls
6) I would be grateful for some references to Factor Analysis
academic papers and Web pages.
And, by the way, this is me
http://www.mii.kurume-u.ac.jp/~leuers/informal-homepage.htm
I am a British non-tenured language instructor due to get the
sack in a couple of years. However if we can persuade the
management to let us build a new department then perhaps I
can keep my job!
If you have a good answer then please feel free to post to
the other question site on the maths board as well - making
a total of $50. |
Request for Question Clarification by jeremymiles-ga on 08 Oct 2002 13:56 PDT
Hello,
I have had a look at your problem, and had a couple of queries.
First, I count 66 variables in your dataset, where you say that there
are 60. Is that correct?
Second, you have some values of 1.00001 in your data. Are these
missing data? (i.e. where the respondent didn't answer the question).
Third, using eigenvalues over 1, I get 13 factors, not 7 as you
report. If possible could you post (either here or on your web pages)
the correlation/covariance matrix that you are analysing.
And a partial answer to some of the questions you asked:
You said that there is a large factor prior to rotation
(non-varimaxed, as you put it). This is necessarily the case with
factor analysis, and this is why rotations were developed.
> 4.2) There are two tables, one that calls itself "Principal
> Factor Analysis" and one called "Results of Varimax Transformation."
> I am not sure which table I should be looking at. I guess
> I want the "Varimax Transformation of the Principal Factor
> Analysis."
Yes, that's right.
The communality is not the overlap between factors; it is the
proportion of the variance in the variable that is explained by the
factors. It is analogous to the R^2 for the variable (if that means
anything to you).
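To make that definition concrete, here is a minimal sketch. It assumes orthogonal factors, in which case a variable's communality is the sum of its squared loadings across the retained factors. The loadings are made up for illustration and are not taken from the survey.

```python
# Hypothetical loadings for three variables on two orthogonal factors
# (illustrative numbers only, not taken from the survey data).
loadings = {
    "foreign language": [0.70, 0.20],
    "film studies": [0.10, 0.60],
    "commerce": [0.30, 0.30],
}


def communality(row):
    """Sum of squared loadings: the share of variance the factors explain."""
    return sum(x * x for x in row)


communalities = {name: communality(row) for name, row in loadings.items()}

for name, h2 in communalities.items():
    # 1 - h2 is the "uniqueness": variance the factors leave unexplained.
    print(f"{name:18s} communality={h2:.2f}  uniqueness={1 - h2:.2f}")
```

A low communality therefore says that the retained factors describe that variable poorly, not that the factors overlap.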
> 4) Another problem is that XLstat does not give the data in the
> way that I am used to, with the components ordered for the first,
> second, third, etc. factors. I think that SPSS and StatView give
> you an indication of what elements to include in each factor.
They don't really, they just reorder the factor loading matrix, to
help you to interpret it. They don't mean anything by it. You can
reorder the matrix yourself, using the same rules.
There is a lot of argument about cutoffs to use. However, something
around 0.3 - 0.4 should be OK.
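As a rough sketch of reordering the matrix by hand, the snippet below groups each variable under the factor on which it loads most strongly, sorts by loading size within each group, and stars loadings at or above a cutoff. The loadings are the ones in the quoted Table #3. The grouping rule and the 0.4 cutoff are assumptions chosen for illustration, not necessarily what any particular package does.

```python
# Loadings copied from the quoted Table #3 (Factor 1, Factor 2).
rows = [
    ("Ability to define problems", [0.68, 0.17]),
    ("Ability to supervise others", [0.87, 0.24]),
    ("Ability to make decisions", [0.65, 0.07]),
    ("Ability to build consensus", [0.16, 0.76]),
    ("Ability to facilitate decision-making", [0.30, 0.83]),
    ("Ability to work on a team", [0.19, 0.69]),
]

CUTOFF = 0.4  # assumed; values between 0.3 and 0.4 are commonly used


def dominant_factor(loads):
    """Index of the factor with the largest absolute loading."""
    return max(range(len(loads)), key=lambda j: abs(loads[j]))


def sort_key(row):
    """Group by dominant factor, then largest loading first within a group."""
    loads = row[1]
    j = dominant_factor(loads)
    return (j, -abs(loads[j]))


def format_row(name, loads):
    """One table line, starring loadings at or above the cutoff."""
    cells = [("*" if abs(x) >= CUTOFF else " ") + f"{x:.2f}" for x in loads]
    return f"{name:40s}" + "  ".join(cells)


ordered = sorted(rows, key=sort_key)

for name, loads in ordered:
    print(format_row(name, loads))
```

Changing `CUTOFF` only changes which cells are starred; the loadings themselves are untouched, which is why the choice between 0.3 and 0.4 is a matter of interpretation rather than computation.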
> One of the reasons why I am unsure of myself is that I
> performed factor analyses at two stages during the data input
> when I had input 100 subjects and 200 subjects. Both times
> the factors came out different.
That's not surprising. 100 is low for a factor analysis of 60 items,
and 200 is approaching a minimum. In addition, factor analysis is
trying to describe the data that you have, there are an infinite
number of solutions, each of which is equally good, as far as the
mathematics behind the analysis is concerned.
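The "infinite number of solutions" point can be checked numerically. In this small sketch, rotating a two-factor loading matrix by any angle leaves the reproduced correlations (the loading matrix times its transpose) unchanged. The loadings and the 30 degree angle are hypothetical.

```python
import math

# Hypothetical loadings: four variables on two factors (illustrative only).
L = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.6], [0.1, 0.7]]


def rotate(loadings, degrees):
    """Apply an orthogonal rotation of the two factor axes."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


def reproduced(loadings):
    """Common-factor part of the correlation matrix: L times L-transpose."""
    return [
        [sum(x * y for x, y in zip(r1, r2)) for r2 in loadings]
        for r1 in loadings
    ]


rotated = rotate(L, 30)

# Largest elementwise difference between the two reproduced matrices.
max_diff = max(
    abs(p - q)
    for row_p, row_q in zip(reproduced(L), reproduced(rotated))
    for p, q in zip(row_p, row_q)
)
print(f"largest change in reproduced correlations: {max_diff:.2e}")
```

The individual loadings change with the angle, but the fit to the correlations does not, so the data alone cannot pick one rotation over another.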
> 6) I would be grateful for some references to Factor Analysis
> academic papers and Web pages.
I am a little short of time right now to answer this one in detail;
however, a good book is "An Easy Guide to Factor Analysis" by Paul
Kline.
A good web page is here:
http://quantrm2.psy.ohio-state.edu/maccallum/factornew.htm
This is a book that was never completed. It may be a little more
detailed than you require.
I will come back to this in a day or two, however, another researcher
may be able to help you with the rest of your question before I get
back to this.
I hope all of this is some help to you, and that the rest of your
question is answered shortly (either by me, or someone else).
When I went to the page that you gave to download XLstat, I got a
'file not found' error.
Finally, I haven't ever heard of any problems with XLstat. A freeware
program you can use is CEFA, by Michael Browne.
http://quantrm2.psy.ohio-state.edu/browne/
(Go to the bottom of the page.)
jeremymiles-ga
|
Clarification of Question by takemototim-ga on 15 Oct 2002 15:54 PDT
Dear Jeremy Miles,
Thank you very much for getting back to me. Alas I was so busy over the
weekend that I did not have time to respond. And now I must present the
results of my analysis at 2:40 today (Tokyo Time).
> First, I count 66 variables in your dataset, where you say that there
> are 60. Is that correct?
Correct. Sorry.
There are 60 courses but when I performed a factor analysis on them alone
I did not like the results (factors large and meaningless) and so I tried
adding the variables in the "factors that are important to you when selecting
a university" section of the survey. These include "foreign language education."
I should really stick to the last 60 variables which are "How interested are
you in these courses."
> Second, you have some values of 1.00001 in your data. Are these
> missing data? (i.e. where the respondent didn't answer the question).
Yes. I have removed the rows wherein there are many of these in my latest
attempt to analyse the data.
> Third, using eigenvalues over 1, I get 13 factors, not 7 as you
> report. If possible could you post (either here or on your web pages)
> the correlation/covariance matrix that you are analysing.
I get 7 factors no matter how many times I do this. Please mail me
for the full data set. My mail address is given below.
> They don't really, they just reorder the factor loading matrix, to
> help you to interpret it. They don't mean anything by it. You can
> reorder the matrix yourself, using the same rules.
I am not sure about what the rules are.
> That's not surprising. 100 is low for a factor analysis of 60 items,
> and 200 is approaching a minimum. In addition, factor analysis is
> trying to describe the data that you have, there are an infinite
> number of solutions, each of which is equally good, as far as the
> mathematics behind the analysis is concerned.
Oh dear. I don't think that the people to whom I am presenting are
going to like it if I say that "There are a number of solutions,
each of which is equally good." This would mean that each of them
is equally bad.
> http://quantrm2.psy.ohio-state.edu/maccallum/factornew.htm
I am afraid that this link does not seem to work now.
> When I went to the page that you gave to download XLSstat, I got a
> 'file not found' error.
The link I was giving was to an old version of the software because
there was a bug in the latest version at the time. The latest version
now does not have a bug and the trial version can be downloaded from -
http://www.xlstat.com/
I don't have time to post the very large data file to the net now.
Please can I ask interested parties to mail me and I will be happy
to mail them the data.
timothy*at*zd.wakwak.com
The "*at*" should be replaced by an "at mark".
Thanks again,
Timothy Takemoto
I am a British national living and working and married in Japan.
|
Request for Question Clarification by jeremymiles-ga on 19 Oct 2002 15:50 PDT
> Second, you have some values of 1.00001 in your data. Are these
> missing data? (i.e. where the respondent didn't answer the question).
Yes. I have removed the rows wherein there are many of these in my latest
attempt to analyse the data.
---------------
OK, thanks.
---------------
> Third, using eigenvalues over 1, I get 13 factors, not 7 as you
> report. If possible could you post (either here or on your web pages)
> the correlation/covariance matrix that you are analysing.
I get 7 factors no matter how many times I do this. Please mail me
for the full data set. My mail address is given below.
-------------
I have done the analysis in XLstat, and I get 7 factors. In SPSS I
get more.
XLstat says it uses 'principal factors' extraction. This is not an
option in SPSS, although 'principal components' and 'principal
axis' are. Both of these give 12 factors (using eigenvalues over 1).
SPSS and XLstat are analysing the same correlation matrix (it took me
a while to understand your missing data, but I think I got it in the
end).
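One thing that can change an "eigenvalues over 1" count is what sits on the diagonal of the matrix being decomposed. The sketch below compares the eigenvalues of a full correlation matrix (1s on the diagonal, as in principal components) with those of a reduced matrix whose diagonal holds communality estimates (one common form of principal factor extraction). The data are simulated, and using squared multiple correlations as the communality estimates is an assumption; what XLstat actually does, and whether this accounts for 7 versus 12 here, is not established in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for survey data: 283 respondents, 12 items
# driven by 2 latent factors plus noise (not the real survey).
n, p = 283, 12
latent = rng.normal(size=(n, 2))
weights = np.zeros((2, p))
weights[0, :6] = 0.7
weights[1, 6:] = 0.7
data = latent @ weights + rng.normal(scale=0.7, size=(n, p))

R = np.corrcoef(data, rowvar=False)

# Principal components: eigenvalues of the full correlation matrix.
pc_eigs = np.linalg.eigvalsh(R)[::-1]

# Principal factors: put communality estimates on the diagonal first.
# Squared multiple correlations are used here as the estimates (assumed).
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
pf_eigs = np.linalg.eigvalsh(R_reduced)[::-1]

print("eigenvalues > 1, full matrix:   ", int((pc_eigs > 1).sum()))
print("eigenvalues > 1, reduced matrix:", int((pf_eigs > 1).sum()))
```

Because the reduced matrix has a smaller diagonal, each of its eigenvalues is no larger than the corresponding eigenvalue of the full matrix, so the same cutoff of 1 can only retain the same number of factors or fewer.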
BTW, you don't need to remove the cases where there is missing data -
XLstat will remove them on its own.
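For anyone preparing the data outside XLstat, dropping incomplete cases might look like the sketch below. It treats 1.00001 as the missing-data code described above and keeps only respondents with no missing answers (listwise deletion). Whether XLstat itself deletes listwise or pairwise is not stated here, so listwise is an assumption, and the ratings are hypothetical.

```python
MISSING_CODE = 1.00001  # the sentinel value described in this thread

# Hypothetical ratings: each inner list is one respondent (illustrative only).
responses = [
    [3, 4, 2],
    [5, 1.00001, 4],
    [1, 2, 3],
    [1.00001, 1.00001, 5],
]


def is_missing(value, tol=1e-9):
    """True when a cell holds the missing-data sentinel, not a real rating."""
    return abs(value - MISSING_CODE) < tol


# Listwise deletion: keep only respondents with no missing answers.
complete = [row for row in responses if not any(is_missing(v) for v in row)]

print(f"kept {len(complete)} of {len(responses)} respondents")
```

Note that a genuine rating of 1 is kept; only the exact sentinel is treated as missing.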
-------------
Oh dear. I don't think that the people to whom I am presenting are
going to like it if I say that "There are a number of solutions,
each of which is equally good." This would mean that each of them
is equally bad.
------------
Oh sorry, ignore that. It was the sort of flippant, off the cuff
remark that factor analysts sometimes use. All are equally good
mathematically, but not all are equally good in interpretation. You are
trying to understand the relations amongst your measures, and some
solutions will help you, and some won't. The purpose of procedures
for rotation, such as varimax, is to attempt to give you a useful
solution.
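As a toy illustration of what varimax is aiming for, the sketch below takes a two-factor solution with a large general first factor and searches over rotation angles for the one that maximises the varimax criterion (the variance of the squared loadings within each factor, summed over factors). The loadings are hypothetical, and the brute-force angle search is a simplification that only works for two factors; statistics packages use an iterative algorithm instead.

```python
import math

# Hypothetical unrotated loadings: a large general first factor plus a
# bipolar second factor (illustrative numbers only, not the survey's).
unrotated = [[0.66, 0.45], [0.62, 0.40], [0.64, -0.42], [0.60, -0.44]]


def rotate(loadings, degrees):
    """Apply an orthogonal rotation of the two factor axes."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((x - mean) ** 2 for x in sq) / p
    return total


# Brute-force search in 1-degree steps; the criterion repeats every 90 degrees.
best_angle = max(
    range(0, 90), key=lambda d: varimax_criterion(rotate(unrotated, d))
)
rotated = rotate(unrotated, best_angle)

print(f"best angle: {best_angle} degrees")
for before, after in zip(unrotated, rotated):
    print([f"{x:+.2f}" for x in before], "->", [f"{x:+.2f}" for x in after])
```

Before rotation every variable loads heavily on the first factor; after rotation each variable loads mainly on one factor, which is the easier pattern to interpret.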
This question is about to expire - you might want to repost, and we
will keep trying.
jeremymiles-ga
|