7.1 Running models

library(stargazer)

# Model 1: the first variable from each category
model1 <- lm(pctGOP ~ median_rent + median_age + median_income + perc_hs, data = metadata)
# Model 2: the second variable from each category
model2 <- lm(pctGOP ~ perc_rent + perc_white + perc_below_pov + perc_doc, data = metadata)
# Model 3: all eight variables
model3 <- lm(pctGOP ~ median_rent + median_age + median_income + perc_hs +
               perc_rent + perc_white + perc_below_pov + perc_doc, data = metadata)
# Model 4: only perc_rent, perc_below_pov, and perc_hs
model4 <- lm(pctGOP ~ perc_rent + perc_below_pov + perc_hs, data = metadata)

# Side-by-side HTML table: coefficients with significance stars and p-values
stargazer(model1, model2, model3, model4,
          type = "html",
          report = "vc*p",
          keep.stat = c("n", "rsq", "adj.rsq"),
          notes = "<em>&#42;p&lt;0.1;&#42;&#42;p&lt;0.05;&#42;&#42;&#42;p&lt;0.01</em>",
          notes.append = FALSE,
          model.numbers = FALSE,
          column.labels = c("(1)", "(2)", "(3)", "(4)"))
Dependent variable: pctGOP

| Variable | (1) | (2) | (3) | (4) |
| --- | --- | --- | --- | --- |
| median_rent | -0.0005*** (p = 0.000) | | -0.0002*** (p = 0.000) | |
| median_age | 0.004*** (p = 0.000) | | -0.003*** (p = 0.000) | |
| median_income | 0.00001*** (p = 0.000) | | -0.00000*** (p = 0.001) | |
| perc_hs | 0.003*** (p = 0.00000) | | 0.003*** (p = 0.000) | 0.005*** (p = 0.000) |
| perc_rent | | -0.010*** (p = 0.000) | -0.008*** (p = 0.000) | -0.022*** (p = 0.000) |
| perc_white | | 0.005*** (p = 0.000) | 0.005*** (p = 0.000) | |
| perc_below_pov | | 0.005*** (p = 0.000) | -0.004*** (p = 0.000) | -0.002*** (p = 0.00002) |
| perc_doc | | -0.058*** (p = 0.000) | -0.036*** (p = 0.000) | |
| Constant | 0.632*** (p = 0.000) | 0.298*** (p = 0.000) | 0.783*** (p = 0.000) | 0.875*** (p = 0.000) |
| Observations | 3,108 | 3,108 | 3,108 | 3,108 |
| R2 | 0.321 | 0.531 | 0.614 | 0.214 |
| Adjusted R2 | 0.320 | 0.530 | 0.613 | 0.213 |

Note: *p<0.1; **p<0.05; ***p<0.01
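
As a quick consistency check on the fit statistics in the table, adjusted R2 can be recomputed from the reported R2, the number of observations, and the number of predictors in each model, using the standard formula 1 - (1 - R2)(n - 1)/(n - k - 1). The sketch below hard-codes the rounded values from the table, so agreement is only expected to about three decimal places. The variable names are ours and are not part of the original analysis.

```r
# Values copied by hand from the regression table (already rounded there)
r2           <- c(m1 = 0.321, m2 = 0.531, m3 = 0.614, m4 = 0.214)
adj_reported <- c(m1 = 0.320, m2 = 0.530, m3 = 0.613, m4 = 0.213)
k            <- c(m1 = 4, m2 = 4, m3 = 8, m4 = 3)  # predictors per model, excluding the constant
n            <- 3108                               # observations in every model

# Standard adjusted R-squared formula
adj_recomputed <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Show reported and recomputed values side by side
print(round(cbind(r2, adj_reported, adj_recomputed), 4))
```

With n = 3,108 and at most eight predictors, the penalty term is very small, which is why R2 and adjusted R2 differ by only about 0.001 in every column.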
library(htmlTable)

# Pairwise correlations among the eight explanatory variables
predictors <- c("median_rent", "median_age", "median_income", "perc_hs",
                "perc_rent", "perc_white", "perc_below_pov", "perc_doc")
htmlTable(round(cor(metadata[, predictors]), digits = 3),
          caption = 'Multicollinearity Test:',
          css.cell = 'padding: 0px 2px 0px; font-size: 10px;',
          css.header = "font-size: 10px; font-weight: normal;")
Multicollinearity Test:

| | median_rent | median_age | median_income | perc_hs | perc_rent | perc_white | perc_below_pov | perc_doc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| median_rent | 1 | -0.237 | 0.666 | -0.294 | 0.187 | -0.168 | -0.394 | 0.453 |
| median_age | -0.237 | 1 | -0.034 | -0.185 | -0.331 | 0.345 | -0.195 | -0.25 |
| median_income | 0.666 | -0.034 | 1 | -0.576 | -0.039 | 0.164 | -0.746 | 0.308 |
| perc_hs | -0.294 | -0.185 | -0.576 | 1 | -0.008 | -0.309 | 0.601 | -0.318 |
| perc_rent | 0.187 | -0.331 | -0.039 | -0.008 | 1 | -0.333 | 0.258 | 0.32 |
| perc_white | -0.168 | 0.345 | 0.164 | -0.309 | -0.333 | 1 | -0.466 | -0.073 |
| perc_below_pov | -0.394 | -0.195 | -0.746 | 0.601 | 0.258 | -0.466 | 1 | -0.128 |
| perc_doc | 0.453 | -0.25 | 0.308 | -0.318 | 0.32 | -0.073 | -0.128 | 1 |
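
Pairwise correlations only catch collinearity between two variables at a time. One common complement, which the original analysis did not run, is the variance inflation factor (VIF): for each predictor it equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. The sketch below re-enters the rounded matrix from the table by hand, so its results are approximate. `R`, `vif`, and the other names are ours.

```r
vars <- c("median_rent", "median_age", "median_income", "perc_hs",
          "perc_rent", "perc_white", "perc_below_pov", "perc_doc")

# Correlation matrix copied from the table above (rounded to three decimals)
R <- matrix(c(
   1,     -0.237,  0.666, -0.294,  0.187, -0.168, -0.394,  0.453,
  -0.237,  1,     -0.034, -0.185, -0.331,  0.345, -0.195, -0.25,
   0.666, -0.034,  1,     -0.576, -0.039,  0.164, -0.746,  0.308,
  -0.294, -0.185, -0.576,  1,     -0.008, -0.309,  0.601, -0.318,
   0.187, -0.331, -0.039, -0.008,  1,     -0.333,  0.258,  0.32,
  -0.168,  0.345,  0.164, -0.309, -0.333,  1,     -0.466, -0.073,
  -0.394, -0.195, -0.746,  0.601,  0.258, -0.466,  1,     -0.128,
   0.453, -0.25,   0.308, -0.318,  0.32,  -0.073, -0.128,  1
), nrow = 8, byrow = TRUE, dimnames = list(vars, vars))

# Guard against transcription slips: a correlation matrix must be symmetric
stopifnot(isSymmetric(R))

# Strongest pairwise correlation, ignoring the diagonal
off <- R
diag(off) <- 0
idx <- which(abs(off) == max(abs(off)), arr.ind = TRUE)[1, ]
strongest_pair  <- unname(c(vars[idx["row"]], vars[idx["col"]]))
strongest_value <- unname(off[idx["row"], idx["col"]])

# VIF of each predictor: diagonal of the inverse correlation matrix
vif <- diag(solve(R))

print(strongest_pair)
print(strongest_value)
print(round(vif, 2))
```

Perfect multicollinearity would make `R` singular, in which case `solve(R)` would fail. Large but finite VIFs would instead point to imperfect multicollinearity, which inflates standard errors without violating the assumption.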

For our estimates to be unbiased, four assumptions must hold: linearity in parameters, independent and random sampling, no perfect multicollinearity, and zero conditional mean.

For the first assumption, we compared the linear fit on each scatter plot with a nonlinear fit (using method = "loess"). For most of the variables (excluding perc_doc), the nonlinear curves appear to overfit the data rather than show a clear departure from linearity, so the comparison is inconclusive about whether the first assumption is violated.

Because our data come from the census and are used as a single cross section, we treat them as a random sample, which meets our second assumption.

For the third assumption, the correlation matrix above shows that no pair of explanatory variables is perfectly correlated (the strongest correlation is -0.746, between median_income and perc_below_pov). In addition, model 3 produces an estimate for all eight coefficients, which would not be possible under perfect multicollinearity. The third assumption is therefore satisfied.

However, we cannot assume that the zero conditional mean assumption holds in these models. So many factors influence the percent of GOP votes that our eight variables cannot feasibly control for all of them. For instance, factors such as % Male or % with kids might have an effect that our models do not capture.
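
The linear-versus-loess comparison described above was done visually. A rough numeric counterpart is to fit both curves and measure how far apart they get. The sketch below uses simulated data because `metadata` is not reproduced here. `fit_gap` is a hypothetical helper that is not part of the original analysis, and base R's `loess()` with default settings stands in for whatever smoothing the plots used.

```r
set.seed(1)

# Largest absolute difference between the straight-line fit and the loess fit
fit_gap <- function(x, y) {
  linear <- predict(lm(y ~ x))     # fitted values from the linear model
  smooth <- predict(loess(y ~ x))  # fitted values from the local regression
  max(abs(linear - smooth))
}

# Simulated stand-ins for a predictor and two possible outcomes
n <- 500
x <- runif(n)
y_linear <- 2 * x + rnorm(n, sd = 0.1)            # truly linear relationship
y_curved <- 4 * (x - 0.5)^2 + rnorm(n, sd = 0.1)  # clearly nonlinear relationship

gap_linear <- fit_gap(x, y_linear)
gap_curved <- fit_gap(x, y_curved)

print(c(linear = gap_linear, curved = gap_curved))
```

A small gap suggests the loess curve is mostly tracing noise around the line, while a large gap is evidence of real curvature. Any cutoff between "small" and "large" is a judgment call and depends on the scale of the outcome.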

Because the zero conditional mean assumption is unlikely to hold, none of these results should be interpreted as causal.
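
To see why a failure of the zero conditional mean assumption rules out a causal reading, consider a small simulation. It does not use the census data: `x`, `z`, and `y` are made-up variables, with `z` playing the role of an unobserved factor (such as % Male) that affects the outcome and is correlated with an included predictor. The true effect of `x` on `y` is set to 1.

```r
set.seed(42)
n <- 5000

z <- rnorm(n)                  # factor left out of the "short" model
x <- 0.8 * z + rnorm(n)        # included predictor, correlated with z
y <- 1 * x + 2 * z + rnorm(n)  # true coefficient on x is 1

# Short regression omits z, so z ends up in the error term
short <- coef(lm(y ~ x))["x"]
# Long regression controls for z
long <- coef(lm(y ~ x + z))["x"]

print(c(short = unname(short), long = unname(long)))
```

In this setup the short regression's slope converges to 1 + 2 * Cov(x, z) / Var(x) = 1 + 1.6 / 1.64, or about 1.98, rather than the true value of 1, while the long regression recovers a value close to 1. The same mechanism may explain why coefficients such as perc_below_pov change sign between models (2) and (3) in the table, although the simulation cannot confirm that.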