Homework 7 – Quantitative Reasoning II (30pts)
Import the following data sets into R: MeasuresofAmerica_QR_II and CEREALS.
Below are several questions that ask you to use various statistical tests. For each question:
Clearly state the null and alternative hypotheses.
Explain your answer using the output of your test and relevant statistical concepts and reasoning.
Submit both the R input and output with each of your answers.
Add visuals for bonus points.
1/ Consider the CEREALS data set and answer the following questions (10pts)
a/ Does the average fat content of cereals differ significantly from 1.0 g?
b/ Does the fat content of Hot and Cold Cereals differ significantly?
c/ Is there a significant difference in fat content between manufacturers? Also determine the mean fat content of each manufacturer. (BONUS: do the same for carbohydrate content, +5pts)
BONUS: Add informative visualizations of the data made in Tableau (+5pts)
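As a starting point for question 1, the sketch below shows one plausible set of R calls on a synthetic stand-in for the CEREALS data. The column names Fat, Type and Manufacturer and all the values are assumptions made for illustration, so replace them with the names in your own import. The choice of test is also only a suggestion, which you should justify in your answer.

```r
# Synthetic stand-in for the CEREALS data; the column names Fat, Type and
# Manufacturer are assumptions, so substitute the names from your own import.
set.seed(1)
cereals <- data.frame(
  Fat = round(runif(60, 0, 3), 1),
  Type = rep(c("Hot", "Cold"), times = c(10, 50)),
  Manufacturer = rep(c("A", "B", "C"), each = 20)
)

# a/ One-sample t-test: the null hypothesis is that mean fat equals 1.0 g.
res_a <- t.test(cereals$Fat, mu = 1.0)

# b/ Two-sample t-test (Welch by default) comparing hot and cold cereals.
res_b <- t.test(Fat ~ Type, data = cereals)

# c/ One-way ANOVA across manufacturers, plus the mean per manufacturer.
res_c <- aov(Fat ~ Manufacturer, data = cereals)
means_c <- aggregate(Fat ~ Manufacturer, data = cereals, FUN = mean)

print(res_a)
print(res_b)
print(summary(res_c))
print(means_c)
```

The printed output of each call contains the test statistic, degrees of freedom and p-value that your written explanation should refer to.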
2/ Consider the MeasuresofAmerica Data (10pts)
a/ Does the proportion of states that voted Democratic in the 2016 presidential election differ significantly from your expectation? First explain what proportion you would reasonably expect, and then test whether the observed proportion differs significantly from it.
b/ Is there a dependence between the legality of cannabis and the level of the federal minimum wage in the United States (Cannabis_Legal, Min_Wage_Fed)?
BONUS: Is there a significant difference between regions in the proportion of states within the region where cannabis is illegal? What if we only look at NorthEast vs South? (+5pts)
BONUS: Add informative visualizations for each of these made in Tableau (+5pts)
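For question 2, a minimal sketch of a proportion test and a test of independence might look like the following. It runs on synthetic data: Cannabis_Legal and Min_Wage_Fed are the names given in the assignment, while Voted_Dem_2016, the category labels and the value of expected_p are assumptions made for illustration only.

```r
# Synthetic stand-in for the MeasuresofAmerica data. Cannabis_Legal and
# Min_Wage_Fed are named in the assignment; Voted_Dem_2016 and the coding of
# all three columns are assumptions.
set.seed(2)
states <- data.frame(
  Voted_Dem_2016 = sample(c("Yes", "No"), 50, replace = TRUE),
  Cannabis_Legal = sample(c("Legal", "Illegal"), 50, replace = TRUE),
  Min_Wage_Fed = sample(c("Above", "At"), 50, replace = TRUE)
)

# a/ Exact binomial test of the observed proportion against an expected one.
expected_p <- 0.5  # placeholder only; state and justify your own expectation
n_dem <- sum(states$Voted_Dem_2016 == "Yes")
res_a <- binom.test(n_dem, nrow(states), p = expected_p)

# b/ Chi-squared test of independence on the 2 x 2 contingency table.
tab <- table(states$Cannabis_Legal, states$Min_Wage_Fed)
res_b <- chisq.test(tab)

print(res_a)
print(tab)
print(res_b)
```

If any expected cell count in the table is small, fisher.test(tab) is a common alternative to the chi-squared test.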
3/ Consider the MeasuresofAmerica Data and answer the following questions: (10pts)
a/ Make a linear regression model between the following two variables (X.Smoking, X.BingeDrink). Determine the correlation coefficient, and make a scatter plot with the trendline in Tableau. Is there a strong correlation between the two variables? How well do the data fit the linear model?
b/ Make a multiple linear regression model that predicts the level of childhood poverty (X.CHLDRNPVRTY) based on 3 or more other quantitative variables of your own choice.
Which factors contribute positively and which negatively? Do you think there are strong correlations? Which factors seem to contribute most?
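For question 3, the sketch below illustrates the R calls for a correlation, a simple regression and a multiple regression on synthetic data. X.Smoking, X.BingeDrink and X.CHLDRNPVRTY are the names given in the assignment; the predictors Pred1, Pred2 and Pred3 are hypothetical placeholders for variables you choose yourself, and treating X.BingeDrink as the response in part a is an assumption.

```r
# Synthetic stand-in for the MeasuresofAmerica data. Pred1, Pred2 and Pred3
# are hypothetical placeholders for predictors of your own choice.
set.seed(3)
n <- 50
moa <- data.frame(X.Smoking = rnorm(n, 18, 3))
moa$X.BingeDrink <- 20 - 0.2 * moa$X.Smoking + rnorm(n, 0, 2)
moa$Pred1 <- rnorm(n)
moa$Pred2 <- rnorm(n)
moa$Pred3 <- rnorm(n)
moa$X.CHLDRNPVRTY <- 20 + 2 * moa$Pred1 - 1.5 * moa$Pred2 +
  0.5 * moa$Pred3 + rnorm(n)

# a/ Pearson correlation and simple linear regression.
# Assumption: X.BingeDrink is the response and X.Smoking the predictor.
r <- cor(moa$X.Smoking, moa$X.BingeDrink)
fit_simple <- lm(X.BingeDrink ~ X.Smoking, data = moa)
r2 <- summary(fit_simple)$r.squared  # equals r^2 in simple regression

# b/ Multiple linear regression with three predictors.
fit_multi <- lm(X.CHLDRNPVRTY ~ Pred1 + Pred2 + Pred3, data = moa)

print(r)
print(summary(fit_simple))
print(summary(fit_multi))
```

The sign of each coefficient in summary(fit_multi) shows the direction of that predictor's contribution. Comparing the size of coefficients across predictors is only meaningful when the predictors are on comparable scales.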