Multivariate regression question
Question: I am wondering whether it is statistically sound to include,
as an independent regressor variable, the result of an indirect
transformation of the dependent variable on a third data point.
I know that directly including the dependent variable, or even a
transformation of it (e.g. log, square, etc.), as an independent
variable is unsound, but I am not sure about this case.
For example: I prepare an exam paper with 100 questions, each worth one
point. I then record the number of hours the students study per week
and have them write the exam. I build a regression of the form
%Score = b0 + b1*Hours.
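As a point of reference, the baseline fit described above can be sketched in a few lines. This is only an illustration: the six (hours, score) pairs are made-up numbers, and the helper name fit_simple_ols is hypothetical, not something from the original post.

```python
# Hypothetical data (not from the post): weekly study hours and exam
# score in percent for six students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = [12.0, 19.0, 33.0, 38.0, 52.0, 58.0]


def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing squared residuals of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_simple_ols(hours, score)
print(f"%Score = {b0:.2f} + {b1:.2f} * Hours")
```

Here Hours is measured independently of the exam result, which is what makes it a legitimate regressor.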
I then calculate, assuming that I lower each student's score by 1%,
what that implies for the change in the number of points out of 100
they would get. This is like a sensitivity measure. Can I include this
sensitivity factor in my regression equation as one of the
independent variables?
Request for Question Clarification by mathtalk-ga on 02 Feb 2005 09:44 PST
Hi, floor37-ga:
I don't see in your example how the "sensitivity factor" constitutes
"an indirect transformation of the dependent variable on a third data
point".
Wouldn't knowing how many points out of 100 a student loses when that
student's score is lowered by 1% be equivalent to knowing the
student's score, at least at perfect precision?
If what you have in mind is to truncate or otherwise round that
"extra" data, it would still be positively correlated with the
student's score (the higher their score, the more points they stand to
lose by lowering by 1%) unless the truncation is so severe as to
remove all information.
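A small sketch may make the two observations above concrete. It assumes that "lowering a score by 1%" means a relative cut, so the points lost are 1% of the score; the scores themselves are hypothetical, and pearson is an illustrative helper rather than anything from the thread.

```python
import math

# Hypothetical scores in points out of 100 (not from the thread).
score = [12.0, 19.0, 33.0, 38.0, 52.0, 58.0]

# Assumed reading: a 1% reduction is relative, so points lost = 1% of score.
points_lost = [0.01 * s for s in score]

# A coarsened version of the same quantity, rounded to one decimal place.
points_lost_rounded = [round(p, 1) for p in points_lost]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# At full precision the score is recovered exactly by undoing the scaling.
recovered = [p / 0.01 for p in points_lost]

print("exact:  ", pearson(score, points_lost))
print("rounded:", pearson(score, points_lost_rounded))
```

Under this reading the unrounded factor is just the score rescaled, and rounding only blurs that relationship without removing it.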
So perhaps the example needs a bit of tweaking to uncover the issue of
"a third data point".
regards, mathtalk-ga
Clarification of Question by floor37-ga on 02 Feb 2005 11:38 PST
Hi Mathtalk-ga,
Let me try to clarify my question. I hypothesize that %Score =
(10Points / 100TotalPoints) * Hours. (I am using only one variable for
now to keep the example less confusing for both of us.) So,
obviously, if the student studies 1 hour I say they will get 10 points,
or 10%, on the exam, and so on. Before running the regression I perform
a backward calculation that solves for the number of points correct if
I change the %Score by -1%. Very simple math: the number of points will
be 9 and the percent change in points given a -1% change in score will
be -1%. In this example a direct linear relationship exists. (The third
data point is the number of correct points, for which I am determining
the percent change given an assumed change in the %Score.)
Now, what I am questioning is the soundness of including in my
regression formula, as an independent variable, this calculated change
in the number of correct points given my selected change in the
%Score. So my regression looks like: %Score = b0 + b1*Hours +
b2*(%chg.inCorrectPoints for 1% reduction in %Score). This discussion
is highlighting for me that this may be completely circular and
therefore unsound, but I am still not sure. Also, my actual underlying
model has many more variables, as does my regression formula, but I
don't think this should change the nature of the question?
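One way to probe the circularity worry is to fit such a regression on toy data and look at what happens. The sketch below is an illustration under stated assumptions, not the actual model: the data are hypothetical, the derived regressor is taken to be the change in points (-0.01 * score) rather than the percent change, its coefficient is called b2 here, and fit_ols and solve_linear_system are made-up helper names. If the regressor were instead the percent change, it would be the same -1% for every student in the linear example and so collinear with the intercept.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data (not from the post).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = [12.0, 19.0, 33.0, 38.0, 52.0, 58.0]

# Assumed derived regressor: change in points for a 1% relative cut in score.
sensitivity = [-0.01 * s for s in score]

b0, b1, b2 = fit_ols([hours, sensitivity], score)
residuals = [s - (b0 + b1 * h + b2 * z)
             for s, h, z in zip(score, hours, sensitivity)]

print(f"b0 = {b0:.6f}, b1 = {b1:.6f}, b2 = {b2:.6f}")
print("largest |residual| =", max(abs(r) for r in residuals))
```

In this toy setup the fit is essentially exact, with the derived regressor absorbing all of the explanatory weight and the Hours coefficient collapsing toward zero, which is what one would expect when a regressor is computed from the dependent variable itself.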
Thanks, floor37-ga
Clarification of Question by floor37-ga on 02 Feb 2005 11:54 PST
One further clarification: I am calculating the %change of correct
points using an assumed coefficient for b1, but I don't actually know
what b1 is until I run the regression. I am, however, using the
calculated %Score that I get from my hypothesized formula. Not sure
whether all this helps or just confuses things even more.
floor37-ga