C3.1
(i)
Positive
(ii)
Yes. The correlation could plausibly go either way:
Positive: families with higher income can afford to buy more goods, including cigarettes.
Negative: higher-income families also tend to be better educated (as fatheduc suggests) and more aware of the dangers of smoking, so they tend to smoke less.
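Sample correlation matrix:
            CIGS         FAMINC
CIGS        1.000000    -0.173045
FAMINC     -0.173045     1.000000
The sample correlation is negative (about -0.17), which is consistent with the second explanation.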
(iii)
Simple regression of bwght on cigs:
Dependent Variable: BWGHT
Method: Least Squares
Date: 12/20/14   Time: 16:35
Sample: 1 1388
Included observations: 1388
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C              119.7719      0.572341      209.2668       0.0000
CIGS            -0.513772    0.090491       -5.677609     0.0000
R-squared              0.022729    Mean dependent var       118.6996
Adjusted R-squared     0.022024    S.D. dependent var        20.35396
S.E. of regression    20.12858     Akaike info criterion      8.843598
Sum squared resid    561551.3      Schwarz criterion          8.851142
Log likelihood       -6135.457     Hannan-Quinn criter.       8.846420
F-statistic           32.23524     Durbin-Watson stat         1.924390
Prob(F-statistic)      0.000000
Multiple regression of bwght on cigs and faminc:
Dependent Variable: BWGHT
Method: Least Squares
Date: 12/20/14   Time: 16:37
Sample: 1 1388
Included observations: 1388
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C              116.9741      1.048984      111.5118       0.0000
CIGS            -0.463408    0.091577       -5.060315     0.0000
FAMINC           0.092765    0.029188        3.178195     0.0015
R-squared              0.029805    Mean dependent var       118.6996
Adjusted R-squared     0.028404    S.D. dependent var        20.35396
S.E. of regression    20.06282     Akaike info criterion      8.837772
Sum squared resid    557485.5      Schwarz criterion          8.849089
Log likelihood       -6130.414     Hannan-Quinn criter.       8.842005
F-statistic           21.27392     Durbin-Watson stat         1.921690
Prob(F-statistic)      0.000000
Adding faminc reduces the magnitude of the coefficient on cigs only modestly, from -0.514 to -0.463, and cigs remains highly significant. Controlling for family income therefore does not substantially change the estimated effect of smoking on birth weight.
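To make the comparison in (iii) concrete, the short sketch below uses only the two cigs coefficients reported above. It computes the predicted change in bwght for 10 extra cigarettes per day (an illustrative scenario chosen here, not part of the exercise) and the relative change in the coefficient once faminc is added.

```python
# Coefficients on cigs copied from the two regressions above
b_simple = -0.513772    # bwght on cigs only
b_multiple = -0.463408  # bwght on cigs and faminc

def effect_of_cigs(coef, extra_cigs):
    """Predicted change in bwght from smoking `extra_cigs` more cigarettes per day."""
    return coef * extra_cigs

# Illustrative scenario (an assumption): 10 more cigarettes per day
print(round(effect_of_cigs(b_simple, 10), 2))    # -5.14
print(round(effect_of_cigs(b_multiple, 10), 2))  # -4.63

# Relative change in the coefficient when faminc is controlled for;
# a positive value means the coefficient moved toward zero.
pct_change = 100 * (b_multiple - b_simple) / abs(b_simple)
print(round(pct_change, 1))  # 9.8
```

The coefficient shrinks by roughly a tenth, which matches the "modest change" reading.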
C3.2
(i) Estimated equation (price in thousands of dollars):
price = -19.315 + 0.128 sqrft + 15.198 bdrms
Dependent Variable: PRICE
Method: Least Squares
Date: 12/20/14   Time: 16:50
Sample: 1 88
Included observations: 88
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C              -19.31500     31.04662       -0.622129     0.5355
SQRFT            0.128436     0.013824       9.290506     0.0000
BDRMS           15.19819      9.483517       1.602590     0.1127
R-squared              0.631918    Mean dependent var       293.5460
Adjusted R-squared     0.623258    S.D. dependent var       102.7134
S.E. of regression    63.04484     Akaike info criterion     11.15907
Sum squared resid    337845.4      Schwarz criterion         11.24352
Log likelihood        -487.9989    Hannan-Quinn criter.      11.19309
F-statistic           72.96353     Durbin-Watson stat         1.858074
Prob(F-statistic)      0.000000
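(ii) Holding sqrft fixed, one more bedroom raises the predicted price by 15.19819 thousand dollars, i.e. about $15,198.19.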
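(iii)
An additional bedroom that adds 140 square feet raises the predicted price by 0.128436(140) + 15.19819 = 33.17923 thousand dollars, i.e. about $33,179.23. This is much larger than in (ii) because the house also gets bigger.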
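The arithmetic in (ii) and (iii) is just a linear combination of the two slope estimates. A minimal sketch, assuming (as in the output above) that price is measured in thousands of dollars:

```python
# Slope estimates from the C3.2 regression above (price in thousands of dollars)
B_SQRFT = 0.128436
B_BDRMS = 15.19819

def price_change(d_sqrft, d_bdrms):
    """Predicted change in price (thousands of $) for given changes in sqrft and bdrms."""
    return B_SQRFT * d_sqrft + B_BDRMS * d_bdrms

# (ii) one more bedroom, floor area held fixed
print(round(price_change(0, 1), 3))    # 15.198
# (iii) one more bedroom that also adds 140 square feet
print(round(price_change(140, 1), 3))  # 33.179
```

Multiplying the results by 1,000 gives the dollar figures quoted in the answers.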
(iv) R-squared = 0.632, so about 63.2% of the variation in price is explained by sqrft and bdrms.
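(v) For the first house in the sample (sqrft = 2,438, bdrms = 4), the predicted price is -19.315 + 0.128436(2,438) + 15.19819(4) ≈ 354.605, i.e. about $354,605.
(vi) The actual selling price was $300,000, so the residual is 300 - 354.605 ≈ -54.605. The buyer paid about $54,605 less than the model predicts, which suggests that, judged only on sqrft and bdrms, the buyer underpaid for the house.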
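The prediction and residual in (v) and (vi) can be reproduced as follows. This is a sketch that takes the first house's characteristics (sqrft = 2,438, bdrms = 4) and its $300,000 selling price from the exercise statement, and keeps price in thousands of dollars.

```python
# OLS estimates from the C3.2 regression above (price in thousands of dollars)
b0, b_sqrft, b_bdrms = -19.31500, 0.128436, 15.19819

def predict_price(sqrft, bdrms):
    """Fitted price (thousands of $) from the estimated regression line."""
    return b0 + b_sqrft * sqrft + b_bdrms * bdrms

# First house in the sample, values as given in the exercise
fitted = predict_price(2438, 4)
actual = 300.0             # $300,000 expressed in thousands
residual = actual - fitted

print(round(fitted, 3))    # 354.605
print(round(residual, 3))  # -54.605
```

A negative residual means the house sold for less than the regression line predicts.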
C3.3
(i) log(salary) = B0 + B1 log(sales) + B2 log(mktval) + u
Estimated: log(salary) = 4.621 + 0.162 log(sales) + 0.107 log(mktval)
Dependent Variable: LSALARY
Method: Least Squares
Date: 12/20/14   Time: 19:38
Sample: 1 177
Included observations: 177
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C                4.620918    0.254408      18.16339       0.0000
LSALES           0.162128    0.039670       4.086899      0.0001
LMKTVAL          0.106708    0.050124       2.128880      0.0347
R-squared              0.299114    Mean dependent var         6.582848
Adjusted R-squared     0.291057    S.D. dependent var         0.606059
S.E. of regression     0.510294    Akaike info criterion      1.509146
Sum squared resid     45.30966     Schwarz criterion          1.562979
Log likelihood       -130.5594     Hannan-Quinn criter.       1.530979
F-statistic           37.12852     Durbin-Watson stat         2.092115
Prob(F-statistic)      0.000000
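Because both salary and sales enter in logs, the coefficient on log(sales) is an elasticity. The sketch below compares the usual approximation (elasticity times percentage change) with the exact change implied by the log-log form, for an illustrative 10% increase in sales that is assumed here, not taken from the exercise.

```python
b_lsales = 0.162128  # elasticity of salary with respect to sales, from (i)

pct_sales = 10.0     # illustrative 10% increase in sales (an assumption)

# Approximation: % change in salary ≈ elasticity * % change in sales
approx = b_lsales * pct_sales
# Exact change implied by the log-log form, holding mktval fixed
exact = 100 * ((1 + pct_sales / 100) ** b_lsales - 1)

print(round(approx, 2))  # 1.62
print(round(exact, 2))   # 1.56
```

For small percentage changes the two numbers are close, which is why the elasticity reading is standard.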
(ii) log(salary) = B0 + B1 log(sales) + B2 log(mktval) + B3 profits + u
Estimated: log(salary) = 4.687 + 0.161 log(sales) + 0.098 log(mktval) + 3.57×10^-5 profits
Dependent Variable: LSALARY
Method: Least Squares
Date: 12/20/14   Time: 19:42
Sample: 1 177
Included observations: 177
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C                4.686924    0.379729      12.34280       0.0000
LSALES           0.161368    0.039910       4.043299      0.0001
LMKTVAL          0.097529    0.063689       1.531333      0.1275
PROFITS          3.57E-05    0.000152       0.234668      0.8147
R-squared              0.299337    Mean dependent var         6.582848
Adjusted R-squared     0.287186    S.D. dependent var         0.606059
S.E. of regression     0.511686    Akaike info criterion      1.520127
Sum squared resid     45.29524     Schwarz criterion          1.591904
Log likelihood       -130.5312     Hannan-Quinn criter.       1.549237
F-statistic           24.63628     Durbin-Watson stat         2.096546
Prob(F-statistic)      0.000000
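Profits enters in levels rather than in logs because some firms in the sample have negative profits, for which the log is undefined. R-squared barely changes (from 0.2991 to 0.2993) and profits is individually insignificant (p = 0.81), so adding it contributes almost nothing to the model.
These firm performance variables do not explain most of the variation in CEO salaries: about 70% of the variation in log(salary) remains unexplained (1 - 0.299 = 0.701).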
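The adjusted R-squared makes the "almost no improvement" point sharper: it penalizes each extra regressor, so it actually falls when profits is added. A minimal sketch using the standard formula adj R² = 1 - (1 - R²)(n - 1)/(n - k - 1), with k slope regressors plus an intercept, reproduces both values reported by EViews:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope regressors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 177
adj_i = adjusted_r2(0.299114, n, k=2)   # model (i): lsales, lmktval
adj_ii = adjusted_r2(0.299337, n, k=3)  # model (ii): adds profits

print(round(adj_i, 4))   # 0.2911
print(round(adj_ii, 4))  # 0.2872
```

The drop from about 0.291 to 0.287 says that profits does not earn its degree of freedom.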
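(iii) log(salary) = B0 + B1 log(sales) + B2 log(mktval) + B3 profits + B4 ceoten + u
Estimated: log(salary) = 4.558 + 0.162 log(sales) + 0.102 log(mktval) + 2.91×10^-5 profits + 0.012 ceoten
Holding the other factors fixed, one more year as CEO is associated with roughly a 1.2% higher salary.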
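Since ceoten enters in levels while salary is in logs, 100 × B4 is only an approximate percentage change. The sketch below compares it with the exact change, 100 × (exp(B4) - 1), using the rounded coefficient 0.012 written in the estimated equation (the full EViews output for this regression is not shown here).

```python
import math

b_ceoten = 0.012  # coefficient on ceoten from (iii), rounded as written above

approx_pct = 100 * b_ceoten                 # usual approximation: 100 * coefficient
exact_pct = 100 * (math.exp(b_ceoten) - 1)  # exact % change in salary for one more year

print(round(approx_pct, 2))  # 1.2
print(round(exact_pct, 2))   # 1.21
```

With a coefficient this small, the approximation and the exact value are practically identical.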
Dependent Variable: LSALARY
Method: Least Squares
Date: 12/20/14   Time: 19:49
Sample: 1 177
C3.4
(i) Descriptive statistics:
                 ATNDRTE       PRIGPA        ACT
Mean              81.70956      2.586775     22.51029
Median            87.50000      2.560000     22.00000
Maximum          100.0000       3.930000     32.00000
Minimum            6.250000     0.857000     13.00000
Std. Dev.         17.04699      0.544714      3.490768
Skewness          -1.578799     0.161246      0.075404
Kurtosis           5.693665     2.760582      2.645701
Jarque-Bera      488.0774       4.570791      4.200994
Probability        0.000000     0.101734      0.122396
Sum            55562.50      1759.007     15307.00
Sum Sq. Dev.  197317.3        201.4684     8273.928
Observations     680           680           680
(ii) Estimate the model
atndrte = 75.70 + 17.261 priGPA - 1.717 ACT
The intercept of 75.70 is the predicted percentage of classes attended by a student with a prior cumulative GPA of 0 and an ACT score of 0. No student is anywhere near those values (the sample minimums are 0.857 for priGPA and 13 for ACT), so I would not call this particular interpretation useful. The intercept is still needed for predictions; only its literal interpretation is not useful.
Dependent Variable: ATNDRTE
Method: Least Squares
Date: 12/20/14   Time: 20:09
Sample: 1 680
Included observations: 680
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C               75.70041     3.884108      19.48978       0.0000
PRIGPA          17.26059     1.083103      15.93624       0.0000
ACT             -1.716553    0.169012     -10.15640       0.0000
R-squared              0.290581    Mean dependent var        81.70956
Adjusted R-squared     0.288486    S.D. dependent var        17.04699
S.E. of regression    14.37936     Akaike info criterion      8.173867
Sum squared resid    139980.6      Schwarz criterion          8.193817
Log likelihood       -2776.115     Hannan-Quinn criter.       8.181589
F-statistic          138.6513      Durbin-Watson stat         2.010991
Prob(F-statistic)      0.000000
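(iii) The coefficient on priGPA is positive: holding ACT fixed, one more point of prior GPA raises predicted attendance by about 17.3 percentage points. The coefficient on ACT is negative: holding priGPA fixed, one more ACT point lowers predicted attendance by about 1.7 percentage points. This is a somewhat unexpected result. Perhaps, for a given prior GPA, students with high ACT scores feel less need to attend class.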
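(iv)
With priGPA = 3.65 and ACT = 20, the predicted attendance rate is 75.70041 + 17.26059(3.65) - 1.716553(20) ≈ 104.37.
Such a student would seem to be a very good student, but no student can attend more than 100% of classes, so the linear model overshoots here.
One student in the sample has these values (observation number 569). Because actual attendance cannot exceed 100, the model necessarily gives this student a negative residual.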
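The prediction in (iv) can be checked with the full-precision estimates from the output above (the rounded equation in (ii) gives a value about 0.01 lower). A minimal sketch:

```python
# Estimates from the C3.4 regression of atndrte on priGPA and ACT
b0, b_prigpa, b_act = 75.70041, 17.26059, -1.716553

def predict_atndrte(prigpa, act):
    """Fitted attendance rate (percent) from the estimated linear model."""
    return b0 + b_prigpa * prigpa + b_act * act

fitted = predict_atndrte(3.65, 20)
print(round(fitted, 2))  # 104.37
print(fitted > 100)      # True: a linear model is not bounded at 100%
```

Nothing in OLS restricts fitted values to the 0 to 100 range of the dependent variable, which is why this can happen.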
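(v)
Student A (priGPA = 3.1, ACT = 21) has a predicted attendance rate of about 93.16, and Student B (priGPA = 2.1, ACT = 26) has about 67.32. The predicted difference is 17.26059(3.1 - 2.1) - 1.716553(21 - 26) ≈ 25.84 percentage points. (With coefficients rounded to two decimals the figures are 93.09 - 67.23 = 25.86.)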
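The comparison in (v) is a difference of two fitted values, so the intercept cancels. The sketch below uses the student characteristics stated in (v) and the full-precision estimates; rounding the coefficients, as in an earlier calculation, moves the answer by a few hundredths.

```python
# Estimates from the C3.4 regression of atndrte on priGPA and ACT
b0, b_prigpa, b_act = 75.70041, 17.26059, -1.716553

def predict_atndrte(prigpa, act):
    """Fitted attendance rate (percent) from the estimated linear model."""
    return b0 + b_prigpa * prigpa + b_act * act

student_a = predict_atndrte(3.1, 21)
student_b = predict_atndrte(2.1, 26)
diff = student_a - student_b

# The intercept cancels: diff = b_prigpa * (3.1 - 2.1) + b_act * (21 - 26)
print(round(student_a, 2), round(student_b, 2), round(diff, 2))  # 93.16 67.32 25.84
```

The difference is measured in percentage points of attendance, not in percent.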
C3.5
log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
Dependent Variable: LWAGE
Method: Least Squares
Date: 12/20/14   Time: 20:35
Sample: 1 526
Included observations: 526
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C                0.284360    0.104190       2.729230      0.0066
EDUC             0.092029    0.007330      12.55525       0.0000
EXPER            0.004121    0.001723       2.391437      0.0171
TENURE           0.022067    0.003094       7.133071      0.0000
R-squared              0.316013    Mean dependent var         1.623268
Adjusted R-squared     0.312082    S.D. dependent var         0.531538
S.E. of regression     0.440862    Akaike info criterion      1.207406
Sum squared resid    101.4556      Schwarz criterion          1.239842
Log likelihood       -313.5478     Hannan-Quinn criter.       1.220106
F-statistic           80.39092     Durbin-Watson stat         1.768805
Prob(F-statistic)      0.000000
Partialling out the coefficient on educ
We want the effect of educ on log(wage), controlling for exper and tenure. This effect equals the effect on log(wage) of the part of educ that is NOT explained by exper and tenure. So we first construct a variable equal to the part of educ that is not explained by exper and tenure. The easiest way to do that is to take the residuals from the regression:
educ = g0 + g1 exper + g2 tenure + u
Estimated: educ = 13.575 - 0.074 exper + 0.048 tenure
Dependent Variable: EDUC
Method: Least Squares
Date: 12/20/14   Time: 20:41
Sample: 1 526
Included observations: 526
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C               13.57496     0.184324      73.64710       0.0000
EXPER           -0.073785    0.009761      -7.559282      0.0000
TENURE           0.047680    0.018337       2.600162      0.0096
R-squared              0.101342    Mean dependent var        12.56274
Adjusted R-squared     0.097906    S.D. dependent var         2.769022
S.E. of regression     2.629980    Akaike info criterion      4.777517
Sum squared resid   3617.483       Schwarz criterion          4.801843
Log likelihood      -1253.487      Hannan-Quinn criter.       4.787042
F-statistic           29.48955     Durbin-Watson stat         1.869826
Prob(F-statistic)      0.000000
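To obtain the residuals from this regression, I subtract the fitted values from educ: r1 = educ - (13.575 - 0.074 exper + 0.048 tenure). Regressing log(wage) on r1 then reproduces the coefficient on educ from the full regression (0.092), because the part of educ explained by exper and tenure has been partialled out.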
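The partialling-out argument (the Frisch-Waugh-Lovell result) can be checked numerically. The sketch below does not use the WAGE1 data: it generates synthetic data loosely shaped like the wage example (all simulated parameters are assumptions), fits OLS with a small hand-written normal-equations solver, and confirms that the coefficient on educ from the full regression equals the slope from regressing lwage on the residuals r1.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(y, regressors):
    """OLS coefficients [intercept, slopes...] of y on an intercept plus the regressor lists."""
    X = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic data loosely mimicking the wage example (NOT the actual WAGE1 data)
random.seed(1)
n = 500
exper = [random.uniform(0, 40) for _ in range(n)]
tenure = [random.uniform(0, 20) for _ in range(n)]
educ = [13.5 - 0.07 * x + 0.05 * t + random.gauss(0, 2.6) for x, t in zip(exper, tenure)]
lwage = [0.3 + 0.09 * e + 0.004 * x + 0.02 * t + random.gauss(0, 0.44)
         for e, x, t in zip(educ, exper, tenure)]

# Full multiple regression: keep the coefficient on educ
b_full = ols(lwage, [educ, exper, tenure])[1]

# Step 1: regress educ on exper and tenure, keep the residuals r1
g0, g1, g2 = ols(educ, [exper, tenure])
r1 = [e - (g0 + g1 * x + g2 * t) for e, x, t in zip(educ, exper, tenure)]

# Step 2: simple regression of lwage on r1
b_partial = ols(lwage, [r1])[1]

print(b_full, b_partial)  # equal up to floating-point rounding
```

The equality is an algebraic property of OLS, so it holds in any sample, including the one used above.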
C3.6
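(i) Simple regression of IQ on educ:
Variable      Coefficient    Std. Error    t-Statistic    Prob.
EDUC             3.533829    0.192210      18.38530       0.0000
(ii) Simple regression of log(wage) on educ:
Variable      Coefficient    Std. Error    t-Statistic    Prob.
EDUC             0.059839    0.005963      10.03492       0.0000
(iii) Multiple regression of log(wage) on educ and IQ:
Variable      Coefficient    Std. Error    t-Statistic    Prob.
EDUC             0.039120    0.006838       5.720784      0.0000
IQ               0.005863    0.000998       5.875413      0.0000
(iv) The simple-regression slope in (ii) should equal the multiple-regression slope on educ plus the IQ coefficient times the slope from (i): 0.039120 + 0.005863 × 3.533829 = 0.039120 + 0.020719 = 0.059839, which matches (ii).
C3.7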
(i)
Dependent Variable: MATH10
Method: Least Squares
Date: 12/20/14   Time: 21:16
Sample: 1 408
Included observations: 408
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C              -20.36076     25.07287       -0.812063     0.4172
LEXPEND          6.229691     2.972634       2.095680     0.0367
LNCHPRG         -0.304585     0.035357      -8.614468     0.0000
R-squared              0.179927    Mean dependent var        24.10686
Adjusted R-squared     0.175877    S.D. dependent var        10.49361
S.E. of regression     9.526228    Akaike info criterion      7.353301
Sum squared resid    36753.36      Schwarz criterion          7.382795
Log likelihood       -1497.073     Hannan-Quinn criter.       7.364972
F-statistic           44.42926     Durbin-Watson stat         1.902822
Prob(F-statistic)      0.000000
math10 = -20.36 + 6.23 lexpend - 0.305 lnchprg
The signs of the coefficients are as expected: the percentage of students passing the math exam increases with expenditure per student and decreases with the percentage of students in the school lunch program (presumably a subsidized lunch program).
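(ii) No. Setting lexpend = 0 means expend = $1 per student (since log(1) = 0), which is not a realistic value. Setting lnchprg = 0 is possible. So the intercept of -20.36, a negative pass rate, has no useful interpretation on its own.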
(iii) Simple regression of math10 on lexpend:
Dependent Variable: MATH10
Method: Least Squares
Date: 12/20/14   Time: 21:27
Sample: 1 408
Included observations: 408
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C              -69.34108     26.53013       -2.613673     0.0093
LEXPEND         11.16439      3.169011       3.522990     0.0005
R-squared              0.029663    Mean dependent var        24.10686
Adjusted R-squared     0.027273    S.D. dependent var        10.49361
S.E. of regression    10.34953     Akaike info criterion      7.516649
Sum squared resid    43487.76      Schwarz criterion          7.536312
Log likelihood       -1531.396     Hannan-Quinn criter.       7.524429
F-statistic           12.41146     Durbin-Watson stat         1.614623
Prob(F-statistic)      0.000475
The slope coefficient has gotten larger: it was 6.23 in (i) and is now 11.16. Since the coefficient on lnchprg in (i) is negative, an upward change when lnchprg is omitted points to a negative correlation between log(expend) and lnchprg.
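(iv) Sample correlation between lexpend and lnchprg:
             LEXPEND      LNCHPRG
LEXPEND      1.000000    -0.192704
LNCHPRG     -0.192704     1.000000
The correlation is negative (about -0.19). This makes sense: schools with a larger share of students in the lunch program (poorer schools) tend to spend less per student.
(v) The inclusion of lnchprg lowers the coefficient on log(expend), from 11.16 to 6.23. The reasoning: (1) when lnchprg increases, math10 decreases; (2) when lexpend increases, lnchprg tends to decrease. So what happens in total when lexpend increases? In the simple regression, a higher lexpend comes with a lower lnchprg, which pushes math10 up. The simple-regression coefficient picks up both effects and therefore overstates the effect of spending (upward omitted variable bias).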
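The sign argument in (v) can be made exact with the OLS omitted-variable identity b_simple = b_multiple + b_lnchprg × delta, where delta is the slope from a simple regression of lnchprg on lexpend (both regressions above use the same 408 observations). That delta is not reported in the document, so the sketch below backs out the value implied by the reported coefficients:

```python
b_simple = 11.16439     # lexpend coefficient, math10 on lexpend only
b_multiple = 6.229691   # lexpend coefficient, math10 on lexpend and lnchprg
b_lnchprg = -0.304585   # lnchprg coefficient in the multiple regression

# OLS identity: b_simple = b_multiple + b_lnchprg * delta,
# where delta is the slope from regressing lnchprg on lexpend. Solve for delta.
delta = (b_simple - b_multiple) / b_lnchprg
bias = b_lnchprg * delta  # part of b_simple due to omitting lnchprg

print(round(delta, 2))  # -16.2
print(round(bias, 2))   # 4.93
```

The implied delta is negative, matching the sign of the sample correlation, and a negative coefficient times a negative delta gives the positive (upward) bias.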
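C3.8
(i) Descriptive statistics:
                 PRPBLCK        INCOME
Mean              0.113486      47053.78
Median            0.041444      46272.00
Maximum           0.981658     136529.0
Minimum           0.000000      15919.00
Std. Dev.         0.182416      13179.29
Skewness          2.700012       0.962831
Kurtosis         10.56841        7.551386
Jarque-Bera    1473.100        416.2135
Probability       0.000000       0.000000
Sum              46.41594     19244998
Sum Sq. Dev.     13.57651      7.09E+10
Observations     409           409
Units: prpblck is a proportion (between 0 and 1), not a percentage; income is measured in dollars. The average of prpblck is about 0.113 (standard deviation 0.182), and average income is about $47,054 (standard deviation $13,179).
(ii)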
psoda = 0.956 + 0.115 prpblck + 1.60×10^-6 income
Dependent Variable: PSODA
Method: Least Squares
Date: 12/20/14   Time: 21:39
Sample: 1 410
Included observations: 401
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C                0.956320    0.018992      50.35379       0.0000
PRPBLCK          0.114988    0.026001       4.422515      0.0000
INCOME           1.60E-06    3.62E-07       4.430130      0.0000
R-squared              0.064220    Mean dependent var         1.044863
Adjusted R-squared     0.059518    S.D. dependent var         0.088798
S.E. of regression     0.086115    Akaike info criterion     -2.058820
Sum squared resid      2.951465    Schwarz criterion         -2.028940
Log likelihood       415.7934      Hannan-Quinn criter.      -2.046988
F-statistic           13.65691     Durbin-Watson stat         1.696180
Prob(F-statistic)      0.000002
The coefficient on prpblck is 0.1149882. Read literally, when prpblck increases by 1, the price of a medium soda increases by about 11.5 cents, holding income fixed. The problem is that increasing prpblck by 1 is not very meaningful. prpblck is the proportion of individuals in a zip code who are black, so it can only increase by 1 if it starts at 0: a zip code with no black residents would have to become a zip code made up entirely of black residents. This is not a useful marginal effect. To interpret the coefficient more usefully, look at smaller, more realistically sized changes. For instance, an increase of 0.01 in prpblck (an increase of 1 percentage point in the share of black residents in a zip code) is predicted to increase the price of a medium soda by 0.1149882 × 0.01 = 0.00114988 dollars, or about one-tenth of a cent.
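The rescaling argument above amounts to multiplying the coefficient by a realistic change in the proportion. A minimal sketch, where the 0.10 change is an extra illustrative value not discussed in the text:

```python
b_prpblck = 0.1149882  # psoda coefficient on prpblck (dollars per unit change in the proportion)

def soda_price_change(d_prpblck):
    """Predicted change in the price of a medium soda, in dollars, holding income fixed."""
    return b_prpblck * d_prpblck

# +1 and +10 percentage points in the share of black residents
for change in (0.01, 0.10):
    print(change, round(soda_price_change(change), 5), "dollars")
```

Because the model is linear, a 10-point change is simply ten times the 1-point effect, about 1.1 cents.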
C4.1
(i) Holding the other factors fixed, a 1% increase in candidate A's expenditures changes the percentage of the vote received by A by about B1/100 percentage points.
(ii) H0: B1 = -B2, or equivalently H0: B1 + B2 = 0
Under H0, a 1% increase in expendA together with a 1% increase in expendB leaves voteA unchanged.
(iii) Estimate the model
voteA = 45.079 + 6.083 lexpendA - 6.615 lexpendB + 0.152 prtystrA
Dependent Variable: VOTEA
Method: Least Squares
Date: 12/21/14   Time: 16:52
Sample: 1 173
Included observations: 173
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C               45.07893     3.926305      11.48126       0.0000
LEXPENDA         6.083316    0.382150      15.91866       0.0000
LEXPENDB        -6.615417    0.378820     -17.46321       0.0000
PRTYSTRA         0.151957    0.062018       2.450210      0.0153
R-squared              0.792557    Mean dependent var        50.50289
Adjusted R-squared     0.788874    S.D. dependent var        16.78476
S.E. of regression     7.712335    Akaike info criterion      6.946369
Sum squared resid    10052.14      Schwarz criterion          7.019277
Log likelihood        -596.8609    Hannan-Quinn criter.       6.975948
F-statistic          215.2266      Durbin-Watson stat         1.604129
Prob(F-statistic)      0.000000
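Yes. Both expenditure variables are highly significant (t = 15.92 and -17.46). Holding the other factors fixed, a 1% increase in expendA raises voteA by about 0.061 percentage points, and a 1% increase in expendB lowers voteA by about 0.066 percentage points. These results alone cannot test the hypothesis in (ii), because that requires the standard error of B1-hat + B2-hat, which depends on the covariance between the two estimates.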
(iv) t-test
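One standard way to get that t statistic directly is to reparameterize. Write theta = B1 + B2, substitute B1 = theta - B2, and the model becomes voteA = B0 + theta lexpendA + B2 (lexpendB - lexpendA) + B3 prtystrA + u. Regressing voteA on lexpendA, (lexpendB - lexpendA) and prtystrA then reports theta-hat and its standard error as an ordinary coefficient. The sketch below does not use the VOTE1 data: it simulates data (parameters are assumptions, with B1 + B2 = 0 built in) and uses a hand-written OLS with conventional standard errors to show that theta-hat equals B1-hat + B2-hat from the original regression.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_with_se(y, regressors):
    """OLS coefficients and conventional standard errors, intercept included."""
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(u * u for u in resid) / (n - k)
    se = []
    for j in range(k):
        # j-th diagonal element of (X'X)^-1, obtained by solving against a unit vector
        e = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(sigma2 * solve(XtX, e)[j]))
    return beta, se

# Synthetic data (NOT the VOTE1 data); true B1 + B2 = 0 by construction
random.seed(2)
n = 173
lexpA = [random.uniform(0, 7) for _ in range(n)]
lexpB = [random.uniform(0, 7) for _ in range(n)]
prty = [random.uniform(20, 70) for _ in range(n)]
vote = [45 + 6.0 * a - 6.0 * b + 0.15 * p + random.gauss(0, 7.7)
        for a, b, p in zip(lexpA, lexpB, prty)]

# Original parameterization: add the two separate estimates
beta, _ = ols_with_se(vote, [lexpA, lexpB, prty])
theta_sum = beta[1] + beta[2]

# Reparameterized model: voteA on lexpendA, (lexpendB - lexpendA), prtystrA
diff = [b - a for a, b in zip(lexpA, lexpB)]
beta_r, se_r = ols_with_se(vote, [lexpA, diff, prty])
theta_hat, se_theta = beta_r[1], se_r[1]

print(theta_sum, theta_hat)         # same estimate of theta = B1 + B2
print("t =", theta_hat / se_theta)  # t statistic for H0: B1 + B2 = 0
```

With the real data, the same regression can be run in EViews by generating the difference variable first; the t statistic on lexpendA is then the test of (ii).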