7.1 Running models

# Model 1: only the first variable from each category
model1 <- lm(pctGOP ~ median_rent + median_age + median_income + perc_hs, data = metadata)
# Model 2: only the second variable from each category
model2 <- lm(pctGOP ~ perc_rent + perc_white + perc_below_pov + perc_doc, data = metadata)
# Model 3: all eight variables
model3 <- lm(pctGOP ~ median_rent + median_age + median_income + perc_hs +
               perc_rent + perc_white + perc_below_pov + perc_doc, data = metadata)
# Model 4: only perc_rent, perc_below_pov, and perc_hs
model4 <- lm(pctGOP ~ perc_rent + perc_below_pov + perc_hs, data = metadata)

stargazer(model1, model2, model3, model4, 
          type = "html", 
          report=('vc*p'),
          keep.stat = c("n","rsq","adj.rsq"), 
          notes = "<em>&#42;p&lt;0.1;&#42;&#42;p&lt;0.05;&#42;&#42;&#42;p&lt;0.01</em>", 
          notes.append = FALSE,
          model.numbers = FALSE, 
          column.labels = c("(1)","(2)", "(3)", "(4)"))


	Dependent variable: pctGOP
	(1)	(2)	(3)	(4)

median_rent	-0.0005***		-0.0002***
	p = 0.000		p = 0.000
median_age	0.004***		-0.003***
	p = 0.000		p = 0.000
median_income	0.00001***		-0.00000***
	p = 0.000		p = 0.001
perc_hs	0.003***		0.003***	0.005***
	p = 0.00000		p = 0.000	p = 0.000
perc_rent		-0.010***	-0.008***	-0.022***
		p = 0.000	p = 0.000	p = 0.000
perc_white		0.005***	0.005***
		p = 0.000	p = 0.000
perc_below_pov		0.005***	-0.004***	-0.002***
		p = 0.000	p = 0.000	p = 0.00002
perc_doc		-0.058***	-0.036***
		p = 0.000	p = 0.000
Constant	0.632***	0.298***	0.783***	0.875***
	p = 0.000	p = 0.000	p = 0.000	p = 0.000

Observations	3,108	3,108	3,108	3,108
R²	0.321	0.531	0.614	0.214
Adjusted R²	0.320	0.530	0.613	0.213

Note:	*p<0.1; **p<0.05; ***p<0.01

htmlTable(
  round(cor(metadata[, c("median_rent", "median_age", "median_income", "perc_hs",
                         "perc_rent", "perc_white", "perc_below_pov", "perc_doc")]),
        digits = 3),
  caption = 'Multicollinearity Test:',
  css.cell = 'padding: 0px 2px 0px; font-size: 10px;',
  css.header = "font-size: 10px; font-weight: normal;"
)

Multicollinearity Test:
	median_rent	median_age	median_income	perc_hs	perc_rent	perc_white	perc_below_pov	perc_doc
median_rent	1	-0.237	0.666	-0.294	0.187	-0.168	-0.394	0.453
median_age	-0.237	1	-0.034	-0.185	-0.331	0.345	-0.195	-0.25
median_income	0.666	-0.034	1	-0.576	-0.039	0.164	-0.746	0.308
perc_hs	-0.294	-0.185	-0.576	1	-0.008	-0.309	0.601	-0.318
perc_rent	0.187	-0.331	-0.039	-0.008	1	-0.333	0.258	0.32
perc_white	-0.168	0.345	0.164	-0.309	-0.333	1	-0.466	-0.073
perc_below_pov	-0.394	-0.195	-0.746	0.601	0.258	-0.466	1	-0.128
perc_doc	0.453	-0.25	0.308	-0.318	0.32	-0.073	-0.128	1
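
A correlation matrix only shows pairwise relationships, so a more direct check for perfect multicollinearity is whether the design matrix has full column rank. The sketch below illustrates that check, plus variance inflation factors, using base R only. It runs on a small simulated data frame named `toy`, which is a hypothetical stand-in for `metadata`: the column names are borrowed from the report, but every value is made up.

```r
# Simulated stand-in for `metadata`; all values are invented for illustration.
set.seed(1)
n <- 200
toy <- data.frame(
  median_income = rnorm(n, mean = 50000, sd = 10000),
  perc_rent     = runif(n, min = 10, max = 60)
)
# perc_below_pov is built to be strongly, but not perfectly, related to income.
toy$perc_below_pov <- 40 - 0.0005 * toy$median_income + rnorm(n, sd = 3)
# Hypothetical outcome, only needed so that lm() has something to fit.
toy$pctGOP <- 0.6 - 0.005 * toy$perc_rent + rnorm(n, sd = 0.05)

fit <- lm(pctGOP ~ median_income + perc_rent + perc_below_pov, data = toy)

# Perfect multicollinearity makes the design matrix rank-deficient,
# and lm() then reports NA for at least one coefficient.
X <- model.matrix(fit)
full_rank  <- qr(X)$rank == ncol(X)
no_na_coef <- !anyNA(coef(fit))

# Variance inflation factor: 1 / (1 - R^2), where R^2 comes from
# regressing each predictor on all the other predictors.
predictors <- c("median_income", "perc_rent", "perc_below_pov")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = toy))$r.squared
  1 / (1 - r2)
})

print(full_rank)
print(no_na_coef)
print(round(vif, 2))
```

The same three steps should carry over to the real data by replacing `toy` with `metadata` and listing all eight regressors. A VIF well above 1 signals strong but imperfect collinearity, which inflates standard errors without violating the assumption.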

For our estimates to be unbiased, four assumptions must hold: linearity in parameters, independent and random sampling, no perfect multicollinearity, and zero conditional mean. To check the first assumption, we compared the linear fit on each scatter plot with a nonlinear fit (using method = "loess"). For most of the variables (excluding perc_doc), the nonlinear curve appears to overfit the data, so it is inconclusive whether the first assumption is violated. Because our data come from the census and are used as cross-sectional data, we treat them as a random sample, which meets the second assumption. For the third assumption, the correlation matrix shows no pair of explanatory variables with a correlation of 1 or -1 (the largest in magnitude is -0.746, between median_income and perc_below_pov), and model 3 returned a coefficient for every variable, so there is no perfect multicollinearity.

However, we cannot assume that the zero conditional mean assumption holds in these models. So many factors plausibly influence the percentage of GOP votes that our eight variables cannot control for all of them. For instance, factors such as % male or % with kids might have an impact that we are unable to capture.
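
The linear-versus-loess comparison described above can be sketched in base R. This is a minimal illustration and not the code used for the report: it calls `stats::loess()` directly (the fitting routine behind a smoother with method = "loess") and runs on simulated data. The data frame `toy` and its values are assumptions made for the example.

```r
# Simulated stand-in for one predictor and the outcome; values are invented.
set.seed(2)
n <- 300
toy <- data.frame(perc_white = runif(n, min = 20, max = 100))
toy$pctGOP <- 0.2 + 0.005 * toy$perc_white + rnorm(n, sd = 0.08)

linear_fit <- lm(pctGOP ~ perc_white, data = toy)
loess_fit  <- loess(pctGOP ~ perc_white, data = toy)  # default span and degree

# Evaluate both fits on a common grid that stays inside the observed range,
# because loess does not extrapolate by default.
grid <- data.frame(
  perc_white = seq(min(toy$perc_white), max(toy$perc_white), length.out = 50)
)
grid$linear <- predict(linear_fit, newdata = grid)
grid$loess  <- predict(loess_fit, newdata = grid)

# Largest vertical gap between the two fitted curves, in units of pctGOP.
max_gap <- max(abs(grid$linear - grid$loess))
print(max_gap)
```

A gap that is small relative to the spread of the outcome suggests the straight line is an adequate summary. A loess curve that wiggles around the line without a systematic bend is more likely tracking noise than real curvature.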

Because the zero conditional mean assumption likely does not hold, none of these results should be interpreted as causal.
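
To see why a violated zero conditional mean assumption rules out a causal reading, consider a small simulation with an omitted variable. Everything here is hypothetical: `z` plays the role of an unobserved factor (such as % male), `x` is an observed regressor correlated with it, and the true effect of `x` on `y` is set to 0.5 by construction.

```r
set.seed(3)
n <- 5000
z <- rnorm(n)                          # hypothetical unobserved factor
x <- 0.7 * z + rnorm(n)                # observed regressor, correlated with z
y <- 1 + 0.5 * x + 1.0 * z + rnorm(n)  # true effect of x on y is 0.5

short <- lm(y ~ x)      # z omitted, so it ends up in the error term
long  <- lm(y ~ x + z)  # z included as a control

b_short <- coef(short)[["x"]]
b_long  <- coef(long)[["x"]]

print(c(omitted = b_short, controlled = b_long))
```

With `z` left out, the coefficient on `x` absorbs part of the effect of `z` and lands well above 0.5. Once `z` is included, the estimate returns to roughly the true value. The regressions in this section are in the position of the short model whenever a relevant factor is missing from the eight variables.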