Hi vitaminic!
OK, here is your answer, and on Monday night :-)
a) Regressions I-III used only the age in order to explain the
mortality rate. Regression II and III used sub-samples: regression II
used only the group of smokers, while regression III used only the
group of non-smokers. Regression IV again used the whole sample, but
now introduced information on the smoking habits of each person
(observation), and allowed this information to affect the constant
(through a dummy for smoker/non-smoker). Regression V used this same
information, but also allowed for a different coefficient for the age
for smokers and non-smokers.
The connection between the coefficients can be understood as follows:
regressions II and III show how the mortality rate evolves as a
function of age for smokers (reg. II) and non-smokers (reg. III).
Regression I pooled these two groups. Since there is an equal number
of smokers and non-smokers in the sample, we can see that the
coefficients in regression I are exactly the average of the
coefficients in equations II and III. For example, the constant in
equation I (-0.8080) is the average between the constant of equation
II (0.0104) and III (-1.6264). The same goes for the coefficient of
age. It is exactly the average only because the number of smokers and
non-smokers is the same. If one group were larger, the
coefficients in equation I would be a weighted average of the
coefficients in equations II and III, with more weight on the
group with more observations. Regarding regression IV, we can see that
the coefficient of age is the same as in equation I, since regression IV
again uses the whole sample.
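As a quick arithmetic check of the averaging claim, here is a small Python sketch that uses only the constants quoted above. The variable names are my own, and the age coefficient of regression I is not quoted in this answer, so only the constant is compared.

```python
# Constants quoted in the answer above.
const_smokers = 0.0104       # regression II
const_nonsmokers = -1.6264   # regression III
const_pooled = -0.8080       # regression I

# With equally sized groups, the pooled constant is the simple average.
average = (const_smokers + const_nonsmokers) / 2

print(round(average, 4), const_pooled)
```

Both printed numbers should agree to four decimals, which is what "exactly the average" means here.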
Regression V allows a different constant and coefficient for smokers
and non-smokers. To see how, write regression V as:
M = -1.6264 + 0.1037*X + 1.6368*d -0.0164*d*X
If we want to see the process for mortality for smokers from this
equation, we have to set d=1 (smoker). When d=1, this equation
becomes:
M = -1.6264 + 0.1037*X + 1.6368 -0.0164*X
= 0.0104 + 0.0873*X
which takes us back to equation II. The same can be seen when we want
to see the process for non-smokers, and thus we set d=0. We will get
the same equation as III.
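The substitution above can be sketched in a few lines of Python. The coefficients are the ones quoted for regression V, named after the b1, r and alpha used in part b); the function name group_line is just my own label for illustration.

```python
# Coefficients of regression V as quoted in the answer.
B0, B1, R, ALPHA = -1.6264, 0.1037, 1.6368, -0.0164

def group_line(d):
    """Return (constant, age coefficient) implied by regression V for dummy d."""
    return B0 + R * d, B1 + ALPHA * d

for d, label in [(1, "smokers"), (0, "non-smokers")]:
    const, slope = group_line(d)
    print(f"{label}: M = {const:.4f} + {slope:.4f}*X")
```

Setting d=1 reproduces the constant and age coefficient of equation II, and d=0 reproduces those of regression V's baseline, which is equation III.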
b) The idea of the F-test is to test whether the chosen variables
(besides the constant) jointly have any explanatory power over the dependent
variable. Its null hypothesis is that the coefficients of all the
explanatory variables are equal to zero. The alternative hypothesis is that at
least one is different from zero. More information on this test can be
found at
The F-test
http://biosys.bre.orst.edu/BRE571/regress/f-test.doc
So for equations I-III the null hypothesis is that b1=0
For equation IV, the null is that b1=0 and r=0
For equation V, the null is that b1=0, r=0 and alpha=0
If we can't reject the null hypothesis of this test, it means we have no
evidence that the chosen explanatory variables are related to the dependent
variable. Note that the F-test is equivalent to the t-test whenever
there is only one explanatory variable besides the constant, as
in equations I-III (in that case the F-statistic is the square of the
t-statistic). You can check that the values shown for the F-test
are such that these hypotheses are rejected (using an F-distribution
table and the directions provided in the link above). In equations
I-III, this implies that there IS indeed a relationship between the
mortality rate and age.
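To illustrate why the F-test and the t-test coincide when there is a single explanatory variable, here is a minimal Python sketch. The data are randomly generated purely for illustration and are NOT the sample behind regressions I-V; the numbers -0.8 and 0.09 used to generate them are arbitrary choices of mine.

```python
import random

# Made-up data for illustration only; not the sample from the question.
random.seed(0)
n = 40
age = [random.uniform(20, 80) for _ in range(n)]
mortality = [-0.8 + 0.09 * x + random.gauss(0, 1) for x in age]

# OLS with a constant and one explanatory variable (closed form).
mean_x = sum(age) / n
mean_y = sum(mortality) / n
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, mortality))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual and total sums of squares.
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(age, mortality))
tss = sum((y - mean_y) ** 2 for y in mortality)
sigma2 = rss / (n - 2)          # residual variance, n - 2 degrees of freedom

t_stat = b1 / (sigma2 / sxx) ** 0.5     # t-statistic for H0: b1 = 0
f_stat = ((tss - rss) / 1) / sigma2     # F-statistic with 1 restriction

print(t_stat ** 2, f_stat)
```

The two printed values should match up to rounding error, so rejecting b1=0 with the F-test is the same decision as rejecting it with a two-sided t-test.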
c) There is evidence of a significant difference
between smokers and non-smokers in the relationship between mortality
rate and age. This is so because the coefficient alpha in equation
V is significantly different from zero (its t-value is lower
than -2). Why do we have to look at this coefficient? Let's review
equation V:
M = -1.6264 + 0.1037*X + 1.6368*d -0.0164*d*X
From the equation, we see that ALL the coefficients are statistically
different from zero (all the t-values are outside the [-2,2] range),
so the equation stays like this. We can then rewrite this equation as:
M = -1.6264 + (0.1037-0.0164*d)*X + 1.6368*d
The relationship between age and the mortality rate is given by the
coefficient of X, which shows the effect of an extra year of age on
the mortality rate. Since the coefficient -0.0164 is statistically
different from zero, the coefficient of X does change between d=0 and
d=1. This shows
that the coefficient is different for smokers and non-smokers, thus
implying that the relationship between mortality rate and age is
actually different for smokers and for non-smokers. In particular, we
can see that an extra year of age has a greater impact on the
mortality rate for non-smokers than for smokers.
Google search strategy:
f-test regression
://www.google.com/search?q=f-test+regression&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=0&sa=N
I hope this helps! If you have any doubt regarding this answer, please
don't hesitate to request a clarification. Otherwise, I await your
rating and comments.
Best wishes!
elmarto