ECON7300
Project
Semester 1, 2025
Instructions
• Questions in this file should be answered by students whose family names start with a letter falling within the range A-K.
• For Part I, use the Excel files Part1_Dataset1 and Part1_Dataset2 to answer the questions.
• For Part II, use the Excel file Part2_Dataset1 to answer the questions.
• For Part III, use the Excel file Part3_Dataset1 to answer the questions.
• A 100% penalty will apply if your answers are not based on the questions and datasets assigned to your family name.
Part I: ANOVA
Note: For questions in Part I, assume the assumptions underlying ANOVA (i.e., randomness and independence, normality, and homogeneity of variance) are met. When the exact denominator degrees of freedom are not listed, use the closest available value to obtain critical values from the F table and the upper critical value Q_U from the studentised range (Q) table.
(1) Using Part1_Dataset1, test at a 5% level of significance whether there is evidence of a significant difference in the average annual salary, in thousands of US dollars (salary), of chief executive officers (CEOs) across four groups defined by the type of firm (type) in which they are employed. Follow all the necessary steps to perform the test and verify your results using Excel/PHStat. Note: In your data, the variable “type” is coded 1 for CEOs of industrial firms, 2 for those of financial firms, 3 for consumer product firms, and 4 for transportation or utilities firms.
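To cross-check the one-way ANOVA table produced by Excel/PHStat, the F statistic can be rebuilt from its sums of squares. The sketch below assumes the salaries have already been split into one list per type code (1 to 4); the function name `one_way_anova` is hypothetical, and the critical value F_U still has to be read from the F table.

```python
from statistics import fmean


def one_way_anova(groups):
    """One-way ANOVA for a list of groups; returns SSA, SSW, MSA, MSW, F and df."""
    c = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    means = [fmean(g) for g in groups]       # group sample means
    grand_mean = fmean([x for g in groups for x in g])
    # Among-group variation: n_j times the squared gap between group and grand mean
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: squared deviations from each group's own mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return {"SSA": ssa, "SSW": ssw, "MSA": msa, "MSW": msw,
            "F": msa / msw, "df": (c - 1, n - c)}
```

H0 (all four population means are equal) is rejected when the returned F exceeds F_U with (c - 1, n - c) degrees of freedom.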
(2) If your results in (1) indicate that it is appropriate, use the Tukey-Kramer procedure to determine which firm-type groups differ in average annual salary. Use a 5% level of significance. Follow all the necessary steps to perform the test and verify your results using Excel/PHStat.
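The Tukey-Kramer comparison for each pair of groups can be sketched as below. It assumes the group means and sizes are listed in type-code order, that MSW comes from the ANOVA table in (1), and that Q_U is looked up from the studentised range table with c and n - c degrees of freedom; `tukey_kramer` is a hypothetical helper name.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer(means, sizes, msw, q_u):
    """Compare every pair of groups; returns (j, k, |mean_j - mean_k|, critical range, differs)."""
    results = []
    for j, k in combinations(range(len(means)), 2):
        diff = abs(means[j] - means[k])
        # Critical range = Q_U * sqrt((MSW / 2) * (1/n_j + 1/n_k))
        crit = q_u * sqrt(msw / 2 * (1 / sizes[j] + 1 / sizes[k]))
        # Labels are 1-based so they match the type codes 1 to 4
        results.append((j + 1, k + 1, diff, crit, diff > crit))
    return results
```

A pair of means is judged significantly different when the absolute difference exceeds its critical range.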
(3) To determine whether alcohol consumption affects students’ cognitive performance, ten students take part in an experiment on three consecutive Saturdays. On each Saturday, the time in seconds a student takes to solve a puzzle (time) is recorded after a given number of standard alcoholic drinks (drinks). The file Part1_Dataset2 displays the time to complete the same puzzle under the randomly assigned drink options: no alcohol on one Saturday (zero standard drinks, coded 1), one standard drink of alcohol on another Saturday (coded 2), and five standard drinks of alcohol on the remaining Saturday (coded 3). Hint: The variable “student” is the blocking variable.
Based on the information given, answer the following questions.
(a) At the 5% level of significance, is blocking effective? Follow all the necessary steps to perform the test and verify your results using Excel/PHStat.
(b) Using a 5% level of significance, is there a significant difference in the mean time to solve the puzzle for the different drink options? Follow all the necessary steps to perform the test and verify your results using Excel/PHStat.
(c) If your results in (b) indicate that it is appropriate, use the Tukey procedure to determine which drink options differ in the mean solving time. Use a 5% level of significance. Follow all the necessary steps to perform the test.
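The randomized block calculations behind (a), (b) and (c) can be sketched as follows. This assumes the data have been rearranged so that each row is one student (block) and each column is one drink option in code order; the function names are hypothetical, and F_U and Q_U must still come from the tables.

```python
from math import sqrt
from statistics import fmean


def randomized_block_anova(data):
    """data[i][j] = time for block i (student) under group j (drink option)."""
    r = len(data)                            # number of blocks
    c = len(data[0])                         # number of groups
    grand = fmean([x for row in data for x in row])
    block_means = [fmean(row) for row in data]
    group_means = [fmean([row[j] for row in data]) for j in range(c)]
    sst = sum((x - grand) ** 2 for row in data for x in row)
    ssa = r * sum((m - grand) ** 2 for m in group_means)    # among groups
    ssbl = c * sum((m - grand) ** 2 for m in block_means)   # among blocks
    sse = sst - ssa - ssbl                                  # random error
    mse = sse / ((r - 1) * (c - 1))
    return {
        "F_groups": (ssa / (c - 1)) / mse,   # compare with F_U on (c-1, (r-1)(c-1)) df
        "F_blocks": (ssbl / (r - 1)) / mse,  # compare with F_U on (r-1, (r-1)(c-1)) df
        "MSE": mse,
        "group_means": group_means,
    }


def tukey_critical_range_rbd(q_u, mse, r):
    """Tukey critical range for a randomized block design: Q_U * sqrt(MSE / r)."""
    return q_u * sqrt(mse / r)
```

For (c), Q_U is taken with c and (r - 1)(c - 1) degrees of freedom, and any pair of drink-option means whose absolute difference exceeds the critical range is flagged as different.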
Part II: Simple Regression Analysis
To study the relationship between expenditure on food (food) and total household expenditure (totexp), a researcher samples 1,519 households in the United Kingdom. The variables in the dataset (Part2_Dataset1) are:
• food (Y, in UK pounds sterling per day)
• totexp (X, in UK pounds sterling per day)
The dependent variable for your analysis is food.
Answer the following questions using Part2_Dataset1.
(1) Estimate a regression model using X to predict Y. Include the regression output and state the simple linear regression equation.
(2) Interpret the meaning of the slope coefficient.
(3) Predict Y when X = 230.
(4) Compute the coefficient of determination and interpret its meaning.
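The least-squares estimates, the prediction in (3) and r² in (4) follow from a handful of sums. The sketch below is a hypothetical helper meant only for checking the Excel regression output; it assumes x and y are plain lists of totexp and food values.

```python
from statistics import fmean


def simple_regression(x, y):
    """Least-squares fit of Y = b0 + b1*X with the ANOVA sums of squares and r^2."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    ssxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = ssxy / ssx                          # slope
    b0 = ybar - b1 * xbar                    # intercept
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse
    return {"b0": b0, "b1": b1, "n": n, "xbar": xbar, "SSX": ssx,
            "SST": sst, "SSR": ssr, "SSE": sse, "r2": ssr / sst}
```

With `fit = simple_regression(x, y)`, the prediction for X = 230 is `fit["b0"] + fit["b1"] * 230`, and `fit["r2"]` is the proportion of the variation in food explained by totexp.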
(5) Complete the t test for the slope, following all the necessary steps. Assume a 5% level of significance.
(6) Complete the F test for the slope, following all the necessary steps. Assume a 5% level of significance.
(7) Complete the test for the correlation coefficient, following all the necessary steps. Assume a 5% level of significance.
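The test statistics in (5), (6) and (7) are closely linked in simple regression: the F statistic equals the square of the slope t statistic, and the correlation t statistic equals the slope t statistic. A minimal sketch, assuming b1, SSR, SSE, SSX and n are read from the regression output (the helper name `slope_tests` is hypothetical):

```python
from math import copysign, sqrt


def slope_tests(b1, ssr, sse, ssx, n):
    """t test for the slope, F test (k = 1) and t test for the correlation coefficient."""
    syx = sqrt(sse / (n - 2))               # standard error of the estimate
    sb1 = syx / sqrt(ssx)                   # standard error of the slope
    t_slope = b1 / sb1                      # H0: beta1 = 0, df = n - 2
    f_stat = ssr / (sse / (n - 2))          # F = MSR / MSE with df (1, n - 2)
    r2 = ssr / (ssr + sse)
    r = copysign(sqrt(r2), b1)              # r carries the sign of the slope
    t_corr = r / sqrt((1 - r2) / (n - 2))   # H0: rho = 0, df = n - 2
    return {"S_YX": syx, "S_b1": sb1, "t_slope": t_slope,
            "F": f_stat, "r": r, "t_corr": t_corr}
```

Each statistic is then compared with its critical value (t with n - 2 degrees of freedom, or F with 1 and n - 2 degrees of freedom) from the tables.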
(8) Construct a 95% confidence interval estimate of the mean Y when X = 230 for all households in the United Kingdom, and interpret its meaning.
(9) Construct a 95% prediction interval of Y when X = 230 for a household in the United Kingdom, and interpret its meaning.
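Both intervals share the same point estimate and differ only in the term under the square root. The sketch below assumes the inputs come from the regression output and that t_crit is the tabled t value for 95% confidence with n - 2 degrees of freedom; `interval_estimates` is a hypothetical name.

```python
from math import sqrt


def interval_estimates(b0, b1, x_value, xbar, ssx, syx, n, t_crit):
    """Return (confidence interval for the mean of Y, prediction interval for one Y)."""
    y_hat = b0 + b1 * x_value
    h = 1 / n + (x_value - xbar) ** 2 / ssx      # leverage of x_value
    ci_half = t_crit * syx * sqrt(h)             # half-width for the mean response
    pi_half = t_crit * syx * sqrt(1 + h)         # half-width for an individual response
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)
```

The prediction interval is always wider than the confidence interval because it must also cover the variation of a single household around the mean.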
Part III: Multiple Regression Analysis
The following information was collected for a random sample of 114 countries: inflation, openness as proxied by imports as a percentage of GDP, per capita income, and whether the country was a major oil producer between 1973 and 1990. The variables in the provided dataset (Part3_Dataset1) are:
• inf (Y, average annual inflation in percent from 1973 to 1990)
• open (X1, openness measured as imports as a percentage of GDP from 1973 to 1990)
• pcinc (X2, 1980 per capita income in US dollars)
• oil (X3, coded 1 if major oil producer between 1973 and 1990 and 0 if not major oil producer in that period)
The dependent variable for your analysis is inf.
Answer the following questions using Part3_Dataset1.
(1) Estimate a regression model using X1 and X2 to predict Y. Include the regression output and state the multiple linear regression equation.
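For two predictors, the least-squares coefficients can be obtained from the centred normal equations, which gives a way to check the intercept and slopes reported by Excel. This is a sketch under the assumption that open, pcinc and inf are supplied as equal-length lists; `two_predictor_ols` is a hypothetical helper.

```python
from statistics import fmean


def two_predictor_ols(x1, x2, y):
    """Least-squares b0, b1, b2 for Y = b0 + b1*X1 + b2*X2 via the normal equations."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    # Centred sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2               # zero only if X1 and X2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det       # Cramer's rule on the 2 x 2 system
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2
```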
(2) Interpret the meaning of each of the slope coefficients.
(3) Perform a residual analysis by analysing the relevant residual plots. Is there any evidence that the regression assumptions have been violated? Explain your answers.
(4) Determine the variance inflation factor (VIF) for each independent variable (X1 and X2) in the model. Is there reason to suspect the existence of collinearity? Explain your answer.
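With only two independent variables, regressing X1 on X2 (or X2 on X1) gives an R² equal to the squared correlation between them, so both VIFs are the same: VIF = 1 / (1 - r12²). A minimal sketch (the helper name is hypothetical); a common rule of thumb treats a VIF above 5 (some texts use 10) as a sign of problematic collinearity.

```python
from statistics import fmean


def vif_two_predictors(x1, x2):
    """VIF shared by X1 and X2 when they are the only two predictors."""
    m1, m2 = fmean(x1), fmean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r12_sq = s12 ** 2 / (s11 * s22)   # R^2 from regressing one predictor on the other
    return 1 / (1 - r12_sq)
```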
(5) At the 5% level of significance, use t tests to determine whether each independent variable (X1 and X2) makes a significant contribution to the regression model. Follow all the necessary steps. Based on these results, suggest which independent variables should be included in the model.
(6) Test for the significance of the overall multiple regression model with two independent variables (X1 and X2) at the 5% level of significance. Follow all the necessary steps.
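The individual t tests in (5) and the overall F test in (6) use the same regression output. A minimal sketch with hypothetical helper names, assuming the coefficients, their standard errors, SSR and SSE are read from the Excel table and k is the number of independent variables:

```python
def coefficient_t_stat(b_j, se_bj):
    """t = b_j / S_bj, compared with +/- t_(alpha/2) on n - k - 1 degrees of freedom."""
    return b_j / se_bj


def overall_f_test(ssr, sse, n, k):
    """F = MSR / MSE, compared with F_U on (k, n - k - 1) degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse
```

With n = 114 countries and k = 2, the t tests use 111 degrees of freedom and the F test uses (2, 111).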
(7) Compute the coefficients of partial determination of the multiple regression model with two independent variables (X1 and X2) and interpret the meaning of each coefficient of partial determination.
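Each coefficient of partial determination compares the extra regression sum of squares contributed by one variable with the variation left unexplained by the other. The sketch assumes SST and the SSR values of the full model and of each single-predictor model have been obtained from three separate Excel regressions; `partial_determination` is a hypothetical name.

```python
def partial_determination(sst, ssr_full, ssr_x1, ssr_x2):
    """Return (r^2_Y1.2, r^2_Y2.1) from the full and single-predictor SSR values."""
    ssr_1_given_2 = ssr_full - ssr_x2    # contribution of X1 given X2 is in the model
    ssr_2_given_1 = ssr_full - ssr_x1    # contribution of X2 given X1 is in the model
    sse_full = sst - ssr_full
    r2_y1_2 = ssr_1_given_2 / (sse_full + ssr_1_given_2)
    r2_y2_1 = ssr_2_given_1 / (sse_full + ssr_2_given_1)
    return r2_y1_2, r2_y2_1
```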
(8) Estimate a regression model using X1, X2 and X3 to predict Y. Include the regression output and state the multiple linear regression equation, the regression equation for major oil-producing countries, the regression equation for countries that are not major oil producers, and interpret the coefficient for X3.
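Because X3 is a 0/1 dummy variable, substituting its two values into the fitted equation gives the two group-specific equations, which share the slopes b1 and b2 and differ only in the intercept:

```latex
\begin{aligned}
\hat{Y} &= b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 \\
X_3 = 1 \text{ (major oil producer):} \quad \hat{Y} &= (b_0 + b_3) + b_1 X_1 + b_2 X_2 \\
X_3 = 0 \text{ (not a major oil producer):} \quad \hat{Y} &= b_0 + b_1 X_1 + b_2 X_2
\end{aligned}
```

So b3 measures the estimated difference in mean inflation between major oil producers and other countries, holding openness and per capita income constant.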
(9) Estimate a regression model using X1, X2, X3, an interaction between X1 and X2, an interaction between X1 and X3, and an interaction between X2 and X3 to predict Y. Include the regression output and state the multiple linear regression equation.
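Each interaction term is simply the product of two predictor columns, created before running the regression. A minimal sketch, where the helper name and the column labels are hypothetical:

```python
def add_interaction_columns(x1, x2, x3):
    """Return the six predictor columns X1, X2, X3, X1*X2, X1*X3 and X2*X3."""
    return {
        "open": list(x1),
        "pcinc": list(x2),
        "oil": list(x3),
        "open_x_pcinc": [a * b for a, b in zip(x1, x2)],
        "open_x_oil": [a * c for a, c in zip(x1, x3)],
        "pcinc_x_oil": [b * c for b, c in zip(x2, x3)],
    }
```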
(10) Test the joint significance of the three interaction terms using a partial F test to determine whether the interaction terms significantly improve the regression model. Assume a 5% level of significance. Follow all the necessary steps. If you reject the null hypothesis, you also need to test the contribution of each interaction term separately (using partial F tests) to determine which interaction terms to include in the model.
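The partial F statistic compares the error sum of squares of the reduced model (without the terms being tested) with that of the full model. A sketch under the assumption that both SSE values come from Excel regression output; `partial_f_test` is a hypothetical name.

```python
def partial_f_test(sse_reduced, sse_full, q, n, k_full):
    """Partial F for q terms dropped from a full model with k_full predictors.

    Compare the result with F_U on (q, n - k_full - 1) degrees of freedom.
    """
    mse_full = sse_full / (n - k_full - 1)
    return ((sse_reduced - sse_full) / q) / mse_full
```

For the joint test, the reduced model is the one from (8), q = 3 and k_full = 6, so the degrees of freedom are (3, 107) with n = 114. For each term tested separately, the reduced model drops only that term and q = 1.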