#################################################
## PPHA 311 Statistics for Data Analysis II Winter 2024
## Professors Yukiko Asai & Dmitri Koustas & Austin Wright
## Coding TA session (Week 3)
## TA: Margot Bond and Jade Jiang
#################################################
#####
#PS3 The Tennessee STAR experiment Revisited
#####
# tell R not to use scientific notation
options(scipen = 999)
options(digits = 10)
# remove all objects from the environment
rm(list = ls())
#set directory
setwd("/Users/uchennaofforjebe/Documents/runitinR")
#read data
data_ta <- read.csv("data/star.csv")
str(data_ta)
## Q1. Estimate the following multivariate OLS regressions:
# treadssk_i = Beta0 + Beta1*T1_i + Beta2*T2_i + u_i
# tmathssk_i = Beta'0 + Beta'1*T1_i + Beta'2*T2_i + u'_i
#Compare your findings (both the estimated effects and statistical significance), to your answers
# in Problem Set #1, Q2.2.
reg1 <- lm(treadssk~ T1+T2, data = data_ta)
summary(reg1)
reg2 <- lm(tmathssk ~ T1+T2, data = data_ta)
summary(reg2)
## Q2. Include the following controls to the regressions you estimated in (1): sch_inner_city, sfemale,
#swhite, sfree_lunch, ttotexpk, tmasters, twhite. How do your estimated coefficients on T1 and
#T2 change? How much more of the variation in test scores is explained by these models
#compared with those in (1)?
reg1_cntrls<-lm(treadssk ~ T1+T2 + sch_inner_city +sfemale +swhite + sfree_lunch + ttotexpk + tmasters + twhite, data = data_ta)
summary(reg1_cntrls)
reg2_cntrls<-lm(tmathssk ~ T1+T2 + sch_inner_city +sfemale +swhite + sfree_lunch + ttotexpk + tmasters + twhite, data = data_ta)
summary(reg2_cntrls)
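# To answer "how much more of the variation is explained", one option is to compare
# the R-squared of each model from (1) with its counterpart with controls.
# The helper below is a hypothetical sketch (r2_compare is not part of the course
# materials); it is demonstrated on R's built-in mtcars data so it runs on its own.
# Caveat: if the controls contain NAs, lm() drops those rows, so the two models
# may be fitted on different samples.
r2_compare <- function(small, big) {
  s_small <- summary(small)
  s_big <- summary(big)
  c(r2_small = s_small$r.squared,
    r2_big = s_big$r.squared,
    r2_gain = s_big$r.squared - s_small$r.squared,
    adj_r2_small = s_small$adj.r.squared,
    adj_r2_big = s_big$adj.r.squared)
}
r2_demo <- r2_compare(lm(mpg ~ am, data = mtcars),
                      lm(mpg ~ am + wt + hp, data = mtcars))
print(r2_demo)
# For this script the calls would be r2_compare(reg1, reg1_cntrls) and
# r2_compare(reg2, reg2_cntrls).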
## Q3. Using the models estimated in (1), conduct statistical tests for Beta1 = Beta2 and Beta'1 = Beta'2 and
#discuss your findings.
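# newvar equals 1 for any student in a treatment arm (T1 or T2) and 0 for the control group.
# Rewriting Beta1*T1 + Beta2*T2 as (Beta1 - Beta2)*T1 + Beta2*(T1 + T2) shows that the
# coefficient on T1 in the regressions below is Beta1 - Beta2, so its t-test tests Beta1 = Beta2.
data_ta$newvar <- data_ta$T1 + data_ta$T2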
reg1_test <- lm(treadssk~T1+newvar, data = data_ta)
summary(reg1_test)
reg2_test <- lm(tmathssk~T1+newvar, data = data_ta)
summary(reg2_test)
### Another option is an F-test.
# With a single restriction it gives exactly the same p-value as the t-test on T1 above.
q <- 1 # number of restrictions
k <-2 # number of regressors in the unrestricted model.
#reading scores
# n = number of rows with non-missing values for the outcome and both treatment indicators
n <- nrow(data_ta[!is.na(data_ta$treadssk) & !is.na(data_ta$T1) & !is.na(data_ta$T2), ])
reg1_r <- lm(treadssk ~ newvar, data = data_ta) # restricted model: imposes Beta1 = Beta2
res1_r <- resid(reg1_r) # save the residuals of reg1_r as an object
SSR1_r <- sum(res1_r^2) # save the sum of squared residuals as an object
res1_ur <- resid(reg1) # residuals of the unrestricted model from Q1
SSR1_ur <- sum(res1_ur^2)
# F = ((SSR_r - SSR_ur) / q) / (SSR_ur / (n - k - 1))
Fvalue1 <- ((SSR1_r - SSR1_ur) / q) / (SSR1_ur / (n - k - 1))
pf(Fvalue1, q, n-k-1, lower.tail = FALSE)
print(SSR1_r)
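# Optional sanity check (a sketch on simulated data, not part of the problem set):
# with a single restriction (q = 1) the F statistic equals the square of the
# t statistic on T1 in the reparameterized regression, so the p-values match.
# Every name inside local() (arm, sim, y, both, n_sim) and all numbers are
# hypothetical, made up for this illustration; local() keeps them out of the
# global environment so q, k and n above are not overwritten.
f_vs_t_check <- local({
  set.seed(311)
  n_sim <- 500
  arm <- sample(c("control", "T1", "T2"), n_sim, replace = TRUE)
  sim <- data.frame(T1 = as.numeric(arm == "T1"), T2 = as.numeric(arm == "T2"))
  sim$both <- sim$T1 + sim$T2
  sim$y <- 430 + 6 * sim$T1 + 1 * sim$T2 + rnorm(n_sim, sd = 30)
  ur <- lm(y ~ T1 + T2, data = sim)   # unrestricted model
  r <- lm(y ~ both, data = sim)       # restricted model (Beta1 = Beta2)
  rp <- lm(y ~ T1 + both, data = sim) # reparameterized model
  ssr_ur <- sum(resid(ur)^2)
  ssr_r <- sum(resid(r)^2)
  df_ur <- n_sim - 2 - 1              # n - k - 1 with k = 2
  f_manual <- ((ssr_r - ssr_ur) / 1) / (ssr_ur / df_ur)
  t_T1 <- coef(summary(rp))["T1", "t value"]
  c(F_manual = f_manual,
    t_squared = t_T1^2,
    F_anova = anova(r, ur)$F[2],
    p_F = pf(f_manual, 1, df_ur, lower.tail = FALSE),
    p_t = coef(summary(rp))["T1", "Pr(>|t|)"])
})
print(f_vs_t_check)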
## Q4. For this question ONLY, you can restrict your attention just to math test scores. Using the
#model estimated in (2), calculate statistical tests that the student and teacher controls that
#were added in (2) matter for math test scores (i.e. the effects are jointly 0). Discuss your
#findings. For this question ONLY, you should subset the data to focus on observations with
#non-missing values for the controls added in (2).
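q <- 7 # number of restrictions (the seven controls added in Q2)
k <- 9 # number of regressors in the unrestricted model (2 treatments + 7 controls)
# To match what Stata would give you, drop observations with missing values before
# fitting the restricted model, so that both models use the same sample.
vars_q4 <- c("tmathssk", "T1", "T2", "sch_inner_city", "sfemale", "swhite",
             "sfree_lunch", "ttotexpk", "tmasters", "twhite")
### flagmissing is a 0/1 indicator equal to 1 when any of these variables is NA.
data_ta$flagmissing <- 1 * !complete.cases(data_ta[, vars_q4])
data_q4 <- data_ta[data_ta$flagmissing == 0, ]
n <- nrow(data_q4) # number of observations with flagmissing == 0
# manual calculation: F2 = ((SSR_r - SSR_ur) / q) / (SSR_ur / (n - k - 1))
reg2_ur <- lm(tmathssk ~ T1 + T2 + sch_inner_city + sfemale + swhite + sfree_lunch + ttotexpk + tmasters + twhite, data = data_q4)
reg2_r <- lm(tmathssk ~ T1 + T2, data = data_q4)
SSR2_ur <- sum(resid(reg2_ur)^2)
SSR2_r <- sum(resid(reg2_r)^2)
F2 <- ((SSR2_r - SSR2_ur) / q) / (SSR2_ur / (n - k - 1))
pf(F2, q, n - k - 1, lower.tail = FALSE)
### anova() is another way of getting the F value: it compares the restricted
### regression against the unrestricted one, both fitted on the same subset.
anova(reg2_r, reg2_ur)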