Hi 1e2!
In order to find the standard error of each coefficient, one must
first calculate the variance-covariance matrix of the coefficients.
From there, the standard errors follow immediately. Computing this
matrix involves a large number of calculations, so doing it "by hand"
takes an exceedingly long time (with no small probability of a
mistake); in practice, a computer is used. The most difficult step is
the inversion of a (potentially very large) matrix. Here is the
formula. I will first use "Greek" notation (I'm sorry, it can't be
helped :-) ) and then I will show you which numbers to plug into each
of the symbols, so you shouldn't have any trouble understanding it.
We must first define the X matrix. Each row of the X matrix
corresponds to an observation, while each column holds the values of
one explanatory variable across the observations. For example, the
first few lines in the data you provided are:
 Y  X1  X2  X3  X4  X5  X6
43  51  30  39  61  92  45
63  64  51  54  63  73  47
71  70  68  69  76  86  48
61  63  45  47  54  84  35
The X matrix doesn't include the explained variable Y; only the
explanatory ones and the constant. If we didn't include the constant,
the (first few rows of the) X matrix would be:
51 30 39 61 92 45
64 51 54 63 73 47
70 68 69 76 86 48
63 45 47 54 84 35
and so on. It shows the values of the explanatory variables. Since
you're also interested in the constant, we have to include it. The
constant is like another explanatory variable, whose value is always
1. Therefore, the X matrix becomes:
1 51 30 39 61 92 45
1 64 51 54 63 73 47
1 70 68 69 76 86 48
1 63 45 47 54 84 35
Obviously, it has 1 more column than if it didn't include the
constant.
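If you would rather let a computer build X, here is a minimal Python sketch of the step just described. It is only an illustration of mine: the names `rows` and `design_matrix` are made up for this example, and it uses just the four observations listed above rather than your full data set.

```python
# First four observations from the data above, in the order Y, X1..X6.
rows = [
    [43, 51, 30, 39, 61, 92, 45],
    [63, 64, 51, 54, 63, 73, 47],
    [71, 70, 68, 69, 76, 86, 48],
    [61, 63, 45, 47, 54, 84, 35],
]


def design_matrix(data):
    """Drop the first column (Y) and put the constant 1 in front of each row."""
    return [[1] + row[1:] for row in data]


X = design_matrix(rows)
for r in X:
    print(r)
```

Each printed row should start with 1 and contain 7 numbers in total.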
Having defined X, here is the formula for the variance-covariance
matrix:
Cov. Matrix = (s^2)*(X'*X)^(-1)
Let's see how to compute each of the components. We'll see first how
to compute (X'*X)^(-1)
X' means "X transposed". Thus, X' will be like X but with the columns
and rows switched. Again, since the first few rows of X are:
1 51 30 39 61 92 45
1 64 51 54 63 73 47
1 70 68 69 76 86 48
1 63 45 47 54 84 35
Then X' is:
 1   1   1   1  ...
51  64  70  63  ...
30  51  68  45  ...
39  54  69  47  ...
61  ...
92  ...
45  ...
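Transposing is easy to check on a computer. The following Python sketch (my own illustration; `transpose` is just a helper name I chose) switches the rows and columns of the four rows of X shown above.

```python
# The first four rows of X (constant first, then X1..X6).
X = [
    [1, 51, 30, 39, 61, 92, 45],
    [1, 64, 51, 54, 63, 73, 47],
    [1, 70, 68, 69, 76, 86, 48],
    [1, 63, 45, 47, 54, 84, 35],
]


def transpose(m):
    """Return m with its rows and columns switched."""
    return [list(col) for col in zip(*m)]


Xt = transpose(X)
for r in Xt:
    print(r)
```

With only these four observations, X' has 7 rows and 4 columns; with your full data it would have 7 rows and 30 columns.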
Now that you have X', the next step is to multiply X' by X, which
gives X'*X. This step takes an enormous amount of time to complete
without the aid of a computer. It is a matrix multiplication, which is
explained on the following page:
http://www.aps.uoguelph.ca/~gjansen/MBG4030/notes/chap03.pdf
also:
http://www.fw.umn.edu/fw5601/Lecture/matrices/Matrices.html
(find Multiplication)
As you can see, as matrices get larger, matrix multiplication becomes
increasingly tedious. In your case, you have to multiply X', which is
a matrix with 7 rows (the constant plus the 6 explanatory variables)
and 30 columns (the 30 observations), by X, which is a matrix with 30
rows and 7 columns. That is, you multiply a 7x30 matrix by a 30x7
matrix, and the result is a 7x7 matrix. A computer or a calculator
with matrix operations is highly recommended for this step. If you
happen to have Microsoft Excel, matrix multiplication can be done with
the MMULT function.
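For readers without Excel, the multiplication can be sketched in a few lines of Python. This is only an illustration: `transpose` and `matmul` are helper names I made up, and since I use just the four rows of X shown earlier, the numbers are not the X'*X of your full 30-observation data (only the 7x7 shape is the same).

```python
def transpose(m):
    """Return m with its rows and columns switched."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Entry (i, j) is the sum of products of row i of a with column j of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    cols_b = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


# The first four rows of X (constant first, then X1..X6).
X = [
    [1, 51, 30, 39, 61, 92, 45],
    [1, 64, 51, 54, 63, 73, 47],
    [1, 70, 68, 69, 76, 86, 48],
    [1, 63, 45, 47, 54, 84, 35],
]

XtX = matmul(transpose(X), X)
print(len(XtX), "x", len(XtX[0]))  # prints: 7 x 7
```

Note that the top-left entry of X'*X is simply the number of observations used, and that the matrix is symmetric, which is a handy check on a hand calculation.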
The other long step is the following. Recall from the formula that we
actually need (X'*X)^(-1). That is, we have to find the inverse matrix
of X'*X. Finding the inverse of a matrix requires several operations,
which are detailed at:
http://www.fw.umn.edu/fw5601/Lecture/matrices/Matrices.html
(find Inversion)
Again, without the aid of a computer, calculating the inverse matrix
takes a long time and is very tedious. In Microsoft Excel, matrix
inversion is quickly done with the MINVERSE function.
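To give an idea of what the inversion involves, here is a Python sketch using Gauss-Jordan elimination with partial pivoting. That is one standard method, chosen by me for illustration; I am not claiming it is what Excel does internally. The function name `invert` and the small 2x2 matrix are made up for the example.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(a)
    # Build the augmented matrix [A | I] with float entries.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


A = [[4, 7], [2, 6]]  # made-up example matrix
A_inv = invert(A)
print(A_inv)
```

You can check any inverse by multiplying it by the original matrix: the result should be the identity matrix (1s on the diagonal, 0s elsewhere), up to rounding.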
Now you should have a matrix (X'*X)^(-1). We must still compute the
(s^2) component of the covariance matrix formula.
Calculating s^2 is easier than what we've done so far. This component
is simply the sum of the squared residuals of the regression (each
residual being the observed Y minus the Y predicted by the regression
for that observation), divided by (n-k), where n is the number of
observations (n=30 in this case) and k is the number of estimated
coefficients (k=7 in your case: the constant plus 6 explanatory
variables).
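As a small illustration of the s^2 formula, here is a Python sketch. The function name `residual_variance` and the numbers are made up by me (they are not your data or your fitted values); the point is only to show the sum of squared residuals being divided by (n-k).

```python
def residual_variance(y, fitted, k):
    """s^2 = (sum of squared residuals) / (n - k)."""
    n = len(y)
    if n <= k:
        raise ValueError("need more observations than coefficients")
    residuals = [obs - fit for obs, fit in zip(y, fitted)]
    return sum(e * e for e in residuals) / (n - k)


# Hypothetical observed and fitted values, with k = 2 coefficients.
y = [3.0, 5.0, 7.0, 10.0]
fitted = [2.0, 6.0, 7.0, 9.0]
s2 = residual_variance(y, fitted, k=2)
print(s2)  # prints: 1.5
```

Here the residuals are 1, -1, 0 and 1, so the sum of squares is 3, and 3/(4-2) = 1.5.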
Once you have s^2 (which is a number - not a matrix) you just have to
multiply it by each element of the (X'*X)^(-1) matrix. This gives the
covariance matrix. The elements in the diagonal of this matrix are the
variances of the coefficients, in the same order as in the X matrix.
For example, if in the X matrix the constant was the first column, X1
was the second one, etc; then the 1st element of the diagonal is the
variance of the constant coefficient, the 2nd one is the variance of
the coefficient of X1, etc. Finally, in order to find the standard
error, just take the square root of each of the variances.
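The last step can be sketched in Python as follows. Again this is only an illustration of mine: `standard_errors` is a made-up name, and the value of s^2 and the 2x2 matrix standing in for (X'*X)^(-1) are hypothetical, not taken from your regression.

```python
import math


def standard_errors(s2, xtx_inv):
    """Scale (X'X)^(-1) by s^2 and take square roots of the diagonal."""
    cov = [[s2 * v for v in row] for row in xtx_inv]
    se = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return cov, se


# Hypothetical inputs: s^2 and a small symmetric (X'X)^(-1).
s2 = 4.0
xtx_inv = [[0.50, -0.10],
           [-0.10, 0.04]]

cov, se = standard_errors(s2, xtx_inv)
print(cov)  # covariance matrix of the coefficients
print(se)   # standard errors, in the same order as the columns of X
```

In this made-up example the variances on the diagonal are 2.0 and 0.16, so the standard errors are about 1.414 and 0.4.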
I have done all these calculations in Microsoft Excel. If you have
this software, please let me know so I can put the file on my web page
for you to download, so you can see the formulas involved (which are
exactly the ones I've explained here).
More information on the covariance matrix at:
http://www.rci.rutgers.edu/~dhjones/APPLIED_LINEAR_STATISTICAL_MODELS(PHD)/LECTURES/LECTURE06/2-Simple%20linear%20regression%20model%20in%20matrix%20terms.pdf
http://www.roguewave.com/support/docs/sourcepro/analyticsug/3-2.html
Regarding your second question, it's not possible to find the p-values
without a table, unless you use the formula for the t distribution,
which is quite complicated. You can find it at:
t-distribution
http://mathworld.wolfram.com/Studentst-Distribution.html
(it's the formula F(t), the cumulative distribution function).
If you want to use this function, the p-value is: if t is positive,
2*(1-F(t)); if t is negative, 2*F(t). The degrees of freedom of the t
distribution in this case is (n-k), that is, 30-7=23 in your case. I
would guess that this is the calculation the computer does when it
displays the p-value.
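If you want to try this without a table, here is a Python sketch of the rule above. It is my own approximation, not necessarily what any statistics package does: instead of the closed-form F(t), it integrates the t density numerically with Simpson's rule. The names `t_pdf`, `t_cdf` and `two_sided_p` are made up for this example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """F(t): 0.5 plus the signed area under the density from 0 to t."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    if t == 0:
        return 0.5
    h = t / steps  # negative when t is negative, so the area is signed
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def two_sided_p(t, df):
    """p-value: 2*(1-F(t)) if t is positive, 2*F(t) if t is negative."""
    f = t_cdf(t, df)
    return 2 * (1 - f) if t > 0 else 2 * f


print(two_sided_p(0.93, 23))
```

For t = 0.93 with 23 degrees of freedom (the t-value of the constant in your regression), the result should come out close to the 36.2% discussed further down.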
Even with a table, it's usually not possible to find the exact
p-value. t-distribution tables usually list t-values only for the
following p-values: 0.1, 0.05, 0.02, 0.01, 0.005, 0.001. If your
p-value is different from all of those (as is usually the case), your
t-value will not be listed, so the table can only tell you between
which two of those p-values yours lies. Fortunately, I have located a
Java applet that will help you find the p-values:
Student's t-distribution
http://stat-www.berkeley.edu/users/stark/Java/tHiLite.htm
In order to use it for your data, enter the degrees of freedom (23 in
your case). Then enter the t-value with a minus sign as the lower
endpoint and with a plus sign as the upper endpoint. For example, the
t-value of the constant is 0.93, so in the lower endpoint you should
enter -0.93, and in the upper endpoint, 0.93. You will get a
percentage for the highlighted area (it's 63.8% in the 0.93 case). The
p-value is simply 100% minus that percentage: 36.2% in this case. The
same procedure goes for the other t-values.
I hope this helps! Recall that this question is not finished until
you're satisfied with it. If you need any further assistance, please
do let me know through a clarification request. Otherwise, I await
your rating and comments.
Best regards,
elmarto