End of Semester Exam 
MATH 161B Spring 2020 Professor Gottlieb 
Tager et al. (in two separate publications in 1979 and 1983) reported analyses of a study aimed at
assessing children’s pulmonary function in the absence and presence of cigarette smoke, including
exposure to second-hand smoke from at least one parent. These papers were some of the earliest attempts to
systematically study the effects of second-hand smoke on children. The variables in the data set are
Fev (forced expiratory volume, the amount of air a child can exhale in the first second of a forceful
exhale, in liters), Age (the age of the child, in years), Sex (coded 0 for female and 1 for male), and
Smoke (smoking status, coded 0 for not exposed to cigarette smoke and 1 for exposed). A total of 654
children participated in the study. 
1. For this problem you will only be using the variables Fev and Age. 
(a) Use JMP to create a scatterplot of Fev against Age and include the scatterplot in your 
exam. Describe the general pattern you observe. 
(b) Fit a simple linear regression model with Fev as response and Age as predictor. Include 
the JMP output (Summary of Fit, ANOVA, and coefficients tables) in your exam. 
(c) Write down the fitted model and use it to predict the Fev of a 14-year-old.
(d) Would it be reasonable to use this model to predict the Fev of a college student? Justify
your answer.
(e) Interpret the estimated slope $\hat{\beta}_1$ in the context of the problem. Also report and interpret a
95% confidence interval for the population slope $\beta_1$.
(f) Is there a meaningful interpretation of the intercept parameter in this problem? If so, provide 
this interpretation. If not, explain why. 
(g) Find the value of $R^2$ and interpret this value in the context of the problem.
(h) The population model for the line you fit in part (b) can be written as
$$y = \beta_0 + \beta_1 x + \epsilon$$
where $y$ = Fev, $x$ = Age, and $\epsilon \overset{\text{IID}}{\sim} N(0, \sigma^2)$. Find an estimate for $\sigma^2$.
(i) Find the values of the largest and smallest residuals.
(j) Conduct a residual analysis to investigate the model assumptions. Use the studentized
residuals and include the qq-plot and the plot of studentized residuals vs. predicted values in your
exam. Are any model assumptions violated? If so, how?
(k) Transform the response variable Fev using a square-root transformation. That is, fit a new
model using $\sqrt{\text{Fev}}$ instead of Fev as the response. Conduct a residual analysis on this
updated model and include the two plots mentioned in part (j). Do the model assumptions
seem reasonably satisfied in this case?
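
For readers who want to check the JMP output of Problem 1 by hand, the quantities in parts (b), (g), (h), (j), and (k) can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `fit_simple_ols` is hypothetical, the data are synthetic (randomly generated, not the study data), and the residuals are internally studentized, which may not match the definition JMP uses. The estimate of $\sigma^2$ is taken to be the mean squared error, SSE / (n - 2).

```python
import math
import random


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with a few basic diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)    # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)
    mse = sse / (n - 2)                 # estimate of sigma^2
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Internally studentized residuals: e_i / sqrt(MSE * (1 - h_i)).
    studentized = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, leverage)]
    return {"b0": b0, "b1": b1, "mse": mse, "r2": 1 - sse / sst,
            "resid": resid, "studentized": studentized}


# Synthetic stand-in data (ASSUMPTION: made-up numbers, NOT the study data).
random.seed(1)
ages = [random.randint(3, 19) for _ in range(200)]
fev = [(0.9 + 0.05 * a + random.gauss(0, 0.1)) ** 2 for a in ages]

fit_raw = fit_simple_ols(ages, fev)                            # Fev as response
fit_sqrt = fit_simple_ols(ages, [math.sqrt(v) for v in fev])   # sqrt(Fev) as response

for label, fit in (("Fev", fit_raw), ("sqrt(Fev)", fit_sqrt)):
    print(label, "slope:", round(fit["b1"], 4),
          "R^2:", round(fit["r2"], 4),
          "sigma^2 estimate:", round(fit["mse"], 4),
          "min/max residual:", round(min(fit["resid"]), 4), round(max(fit["resid"]), 4))
```

Plotting `studentized` against the fitted values, or against normal quantiles, would give the two diagnostic plots requested in parts (j) and (k).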
2. For this problem you will use all the variables in the data set with $x_1$ = Age, $x_2$ = Sex, and
$x_3$ = Smoke. Continue to use the transformed response $\sqrt{\text{Fev}}$.
(a) First, consider the multiple regression model with only two predictors: $x_1$ = Age and $x_2$ =
Sex.
$$\sqrt{\text{Fev}} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon.$$
Write out the assumptions for $\epsilon$ and specify the model for boys and girls separately.
(b) Fit this model in JMP and include a copy of the coefficients table in your exam. (Friendly
Reminder: By default, JMP centers interaction terms. To fit the model without centering, open the
Fit Model dialog and click the red triangle next to the words “Model Specification” in the upper
left corner of the dialog window. This brings up a context menu whose first entry is “Center
Polynomials.” Uncheck this entry.)
(c) Write down the fitted model for males. Use this model to predict the $\sqrt{\text{Fev}}$ for a
10-year-old boy.
(d) For a fixed period of development (for example, 1 year), is the average change in $\sqrt{\text{Fev}}$ the
same for boys and girls? Refer to the model specified in part (a) and identify an appropriate
hypothesis test involving one of the $\beta$ parameters. State both the null and the alternative
hypotheses.
(e) Conduct the test from part (d). Find the test statistic value and corresponding p-value from 
the JMP output, and formulate a conclusion in the context of the problem. 
(f) Finally, consider the full multiple linear regression model with the following interaction terms:
$$\sqrt{\text{Fev}} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \beta_{123} x_1 x_2 x_3 + \epsilon$$
How many different regression lines do we fit in this model? Write down the regression line
equation for each sub-population. What are the relationships (if any) among the intercepts and
slopes of the individual regression lines?
(g) Fit the model from (f) in JMP. Include a copy of the coefficients table in your exam. Which
of the $\beta$ coefficients are significantly different from zero (at significance level $\alpha = 0.05$) given
that the other predictors are in the model?
(h) Write down the fitted line that you would use to predict the average $\sqrt{\text{Fev}}$ of boys who
were not exposed to second-hand smoke. Use this fitted model to predict the Fev of a
non-exposed 10-year-old boy.
(i) Write down the model again, but omit the $\beta$ terms that are not significantly different
from zero. What does the exclusion of these terms mean for the resulting regression model?
That is, describe how the relationships between slopes and intercepts in the updated model 
differ from the ones you have described in (f). In particular, which, if any, of the regression 
lines in this new model are parallel or share an intercept? (You do not need to fit this model 
in JMP.)
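
The bookkeeping behind parts (f), (h), and (i) of Problem 2 can be sketched in Python: with Sex and Smoke both coded 0/1, each combination of the two indicators collapses the full interaction model into a straight line in Age. The snippet below is a minimal illustration, not a solution. The helper name `group_line` and the dictionary keys are hypothetical, and the coefficient values are made-up placeholders, not estimates from the data. Squaring a predicted $\sqrt{\text{Fev}}$ is shown as one simple way to move back to the Fev scale.

```python
from itertools import product


def group_line(coef, sex, smoke):
    """Intercept and Age slope implied by the full interaction model
    for one (sex, smoke) sub-population, with both indicators coded 0/1."""
    intercept = (coef["b0"] + coef["b2"] * sex + coef["b3"] * smoke
                 + coef["b23"] * sex * smoke)
    slope = (coef["b1"] + coef["b12"] * sex + coef["b13"] * smoke
             + coef["b123"] * sex * smoke)
    return intercept, slope


# ASSUMPTION: placeholder coefficients chosen for illustration only.
coef = {"b0": 1.0, "b1": 0.5, "b2": 0.25, "b3": -0.125,
        "b12": 0.0625, "b13": -0.03125, "b23": 0.015625, "b123": 0.0078125}

# One line per combination of the two indicator variables.
lines = {(sex, smoke): group_line(coef, sex, smoke)
         for sex, smoke in product((0, 1), repeat=2)}

for (sex, smoke), (intercept, slope) in lines.items():
    print(f"sex={sex}, smoke={smoke}: sqrt(Fev) = {intercept} + {slope} * Age")

# Prediction for a non-exposed boy (sex=1, smoke=0) at Age 10,
# first on the square-root scale, then squared to return to the Fev scale.
intercept, slope = lines[(1, 0)]
pred_sqrt_fev = intercept + slope * 10
pred_fev = pred_sqrt_fev ** 2
print("predicted sqrt(Fev):", pred_sqrt_fev, "predicted Fev:", pred_fev)
```

Setting a coefficient to zero in `coef` shows directly which intercepts or slopes become equal across groups, which is the comparison part (i) asks for.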