ECO00032I
DEPARTMENT OF ECONOMICS AND RELATED STUDIES
ECONOMETRIC ANALYSIS
SUMMATIVE ASSIGNMENT - SPRING 2025
The submission deadline for the Econometric Analysis assignment is Thursday 29 May 2025, by 11am (UK time).
Introduction
You are expected to conduct an econometric analysis to answer the two research questions presented on pages 4 and 6 and submit a report of a maximum of 2,500 words. Instructions on how to structure your report are provided below.
The project data are a sample of cross-sectional data from the Quarterly Labour Force Survey (QLFS), collected between October and December 2024. The data set is called Project2025.dta. Instructions on how to download the data set and a description of the variables are presented on page 6. You are advised to use STATA to conduct the econometric analysis. Support after the project is released will be limited to purely technical help with STATA.
On the Econometric Analysis VLE site, you will find references to core undergraduate Labour Economics textbooks that should provide sufficient background for your assignment.
A detailed marking grid, which will be used for the overall assessment of your project and shared as individual feedback, is available at the VLE submission point. Markers will be looking for strong evidence of a sound understanding of key concepts and methods of econometrics, ability to conduct an econometric analysis as well as critical and original thinking.
Therefore, we encourage you to view your project as a way to ‘showcase’ your econometric skills, for example by explaining how a test is constructed, how its results are interpreted and what the implications are for your econometric model and estimates. Further, we encourage you to take the space and time in your project to fully interpret your results, make your answer to the research questions you are investigating as clear as possible and discuss the limitations of your methods and results.
Computing
You are advised to use STATA. This package is the only one for which the course tutors will provide support. The dataset has a .dta format and can be opened directly with STATA.
You can download STATA on your own computer or laptop. You will find instructions on the Econometric Analysis VLE site in Learning by doing / Introduction to STATA. The software is also installed on all computers on campus.
Support
A Padlet collecting anonymous questions is available on the Econometric Analysis VLE site. This will be the only communication channel available. Please do not send emails to your course tutors; any emails will be systematically redirected to the Padlet.
You can ask clarification questions on the project outline and receive help on purely technical issues with STATA. Your course tutors will not provide advice on how to conduct your econometric analysis. This is to ensure fairness and consistency.
Word limit and format
The project report should not be longer than 2,500 words of text and excessive length will be penalised: only the first 2,500 words will be graded. Please note that the project title, exam number, figures, equations, tables and references are not included in the 2,500 word count. Recommended word limits for each section are provided in brackets.
The main results (regression outputs, tests) should be integrated into the main body of your report. Additional tables of results, graphs, diagrams, etc. can be presented in appendices and will not be counted within the 2,500 words. However, the appendices should not exceed eight pages. For example, you might include the calculation of test statistics in the appendices and the hypotheses, explanation of the test, results and interpretation in the main body of the project text.
Please give consideration to the readability of your project:
Use a standard font (Calibri, Arial or Times New Roman), size 12, font colour Black.
Your figures, tables and regression outputs should be legible and captioned.
You can provide screenshots of relevant STATA outputs or export the results in tables.
All materials (academic papers, textbooks) should be appropriately referenced using Harvard referencing style.
Your final report should be compiled in a single PDF document:
It is your responsibility to make sure that the PDF document is legible.
You do not have to submit your STATA logs or do-file.
Your report will be marked anonymously. Do not include your name, student number or exam number.
Academic Integrity
Under no circumstances should you submit a project that you have worked on with another student. This is an individual project for you to complete on your own.
We strongly recommend that you consider the University of York guidance on how you can appropriately use digital tools (including generative Artificial Intelligence) to assist you in the completion of your assignment. The guidance also details inappropriate uses that you must avoid: https://www.york.ac.uk/students/studying/assessment-and-examination/ai/
Submission
The submission deadline for the Econometric Analysis assignment is Thursday 29 May 2025, by 11am (UK time).
Your project will be submitted electronically through the VLE. Please follow the instructions on the Econometric Analysis VLE site. You will also find important information on the exceptional circumstances process.
This formally assessed project forms 90% of your final module mark for Econometric Analysis (ECO00032I).
Research Questions and Project report outline
Your project report should answer the two research questions below and include the following sections (Section 1 – 1.1 to 1.5; Section 2 – 2.1 to 2.2).
Section 1
The wage equation is a fundamental concept in economics, aiming to understand the determinants of wages within a labour market (or “wage structure”). It explores various factors influencing wages including education, experience, skills, gender or firms’ characteristics. By analysing these components, economists seek to grasp the dynamics shaping individuals' earnings, understand disparities in labour market outcomes and formulate policies to address income inequality.
Research Question 1
Using the Quarterly Labour Force Survey (QLFS) data, estimate and interpret an econometric model of the wage equation, with a specific focus on estimating the gender wage gap and returns to education.
Research Question 2
Using the QLFS data, evaluate the evidence that the gender wage gap changes for different levels of education.
1.1 Introduction and description of the economic model
Provide a brief introduction to the wage equation and consider variables that you would like to include in your model. Briefly explain how the gender wage gap (i.e. differences in wages between men and women) could change for different levels of education.
[250 words]
1.2 Description of your econometric model(s)
Present your econometric model(s) in the form of a population regression function. This should be your “preferred” or final model(s). Describe the variables included as well as the functional form that you will be using.
You are advised to choose a semi-log model specification where the dependent variable is a logarithm. You are still encouraged to formally investigate the appropriateness of this functional form in section 1.3. You can present more than one model but should explain why you think this is appropriate or relevant.
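As a purely illustrative sketch of what a semi-log population regression function looks like (not a recommended specification), consider the equation below. The names wage and female are assumptions made for illustration, since the variable list is not reproduced here; edage is the age at which the employee left full-time education.

```latex
\ln(\mathit{wage}_i) = \beta_0 + \beta_1\,\mathit{female}_i + \beta_2\,\mathit{edage}_i + u_i,
\qquad \mathrm{E}[\,u_i \mid \mathit{female}_i,\ \mathit{edage}_i\,] = 0
```

In this form, \beta_1 is the approximate proportional wage difference between women and men with the same value of edage, and \beta_2 is the approximate proportional change in wages associated with leaving full-time education one year later.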
[300 words]
1.3 Presentation of your estimated model(s) and specification tests
Present your estimated model(s) in the form of a sample regression function and provide the relevant STATA output.
Present your specification tests (heteroskedasticity, misspecification tests), explain why they are relevant to consider and how they have been undertaken. Present the results of the specification tests and discuss the implications for your model and estimates.
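To illustrate how a heteroskedasticity test is constructed, here is a sketch of a Breusch-Pagan type test (LM version) for a generic model with k regressors x_1, ..., x_k. It is one common option among several, and the notation is assumed for illustration rather than taken from the module materials.

```latex
% auxiliary regression of the squared OLS residuals on the regressors
\hat{u}_i^{2} = \delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + v_i \\
H_0:\ \delta_1 = \dots = \delta_k = 0 \ \ \text{(homoskedasticity)} \qquad
H_1:\ \text{at least one } \delta_j \neq 0 \\
LM = n\,R^{2}_{\hat{u}^{2}} \ \overset{a}{\sim}\ \chi^{2}_{k} \quad \text{under } H_0
```

Here \hat{u}_i are the OLS residuals from the wage equation, n is the sample size and R^2_{\hat{u}^2} is the R-squared of the auxiliary regression. If H_0 is rejected, the usual OLS standard errors are not valid, which motivates reporting heteroskedasticity-robust standard errors.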
[350 words]
1.4 Statement of the hypotheses to be tested
Your statement of hypotheses should include tests to investigate Research Questions 1 and 2. For example, you can present tests of individual, joint or overall significance, tests for differences in regression functions across groups.
For each test, present the null and alternative hypotheses and explain how you will undertake the test. The actual testing of your hypotheses and interpretation of results should be presented in Section 1.5.
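As an illustrative sketch for Research Question 2, one way to let the gender wage gap differ by education is an interaction term. The names wage and female are assumptions made for illustration; degree is the binary qualification variable described in the Data section. Only one education dummy is used here to keep the example short, so the comparison group is all employees without a degree.

```latex
\ln(\mathit{wage}_i) = \beta_0 + \beta_1\,\mathit{female}_i + \beta_2\,\mathit{degree}_i
  + \beta_3\,(\mathit{female}_i \times \mathit{degree}_i) + u_i \\
H_0:\ \beta_3 = 0 \qquad H_1:\ \beta_3 \neq 0
```

In this sketch the gender gap in log wages is \beta_1 for employees without a degree and \beta_1 + \beta_3 for employees with a degree, so a t test of H_0: \beta_3 = 0 asks whether the gap is the same in both groups. With several education categories, the analogous test would be a joint (F) test on all the interaction coefficients.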
[300 words]
1.5 Interpretation of your results
Provide an interpretation of the sign, magnitude and statistical significance of all estimated coefficients (based on appropriate standard errors given your specification tests undertaken in section 1.3). Make sure that you interpret your results appropriately and fully given the functional form of the model. Consider each of the partial regression coefficients fully in relation to whether the partial regression coefficient is, for example, attached to a dummy variable, or whether there is a quadratic form in the explanatory variable of interest.
Provide and interpret the results of the tests you presented in Section 1.4. Provide an answer to Research Questions 1 and 2.
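The standard formulas below illustrate how functional form affects interpretation in a semi-log model. The symbols D (a dummy variable) and x (a regressor entered with a squared term) are generic placeholders assumed for illustration, not variables from the data set.

```latex
% dummy variable D in a semi-log model: exact percentage difference in wages
\%\Delta\,\widehat{\mathit{wage}} = 100\,\bigl(e^{\hat{\beta}_D} - 1\bigr) \\
% quadratic in x:  \ln(wage) = \dots + \beta_1 x + \beta_2 x^{2} + \dots
\frac{\partial \ln(\mathit{wage})}{\partial x} = \beta_1 + 2\beta_2\,x,
\qquad x^{*} = -\frac{\beta_1}{2\beta_2}
```

For example, with a made-up dummy coefficient of -0.15, the exact difference is 100(e^{-0.15} - 1), or about -13.9 percent, rather than -15 percent. With a quadratic term, the effect of x depends on the level of x and changes sign at the turning point x^*.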
[500 words]
Section 2
2.1 Discussion and limitations
Discuss potential limitations of your data, approach and results. Specifically, discuss whether you can measure a causal effect of education on wages.
[300 words]
2.2 Endogeneity issues and possible remedies
Estimates of the relationship between education and wages are often considered to suffer from problems of endogeneity. Using examples covered during the module, explain how you might be able to overcome that problem if you had access to additional variables as part of the QLFS data set.
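One standard remedy for endogeneity is instrumental variables estimation by two-stage least squares, sketched below. Whether it is among the module examples you should draw on is for you to judge. The instrument z is hypothetical, standing for an additional variable that is not in the data set provided, and wage and female are assumed names.

```latex
\text{First stage:}\quad \mathit{edage}_i = \pi_0 + \pi_1 z_i + \pi_2\,\mathit{female}_i + v_i \\
\text{Second stage:}\quad \ln(\mathit{wage}_i) = \beta_0 + \beta_1\,\widehat{\mathit{edage}}_i
  + \beta_2\,\mathit{female}_i + u_i \\
\text{Relevance:}\ \ \pi_1 \neq 0 \qquad \text{Exogeneity:}\ \ \mathrm{Cov}(z_i, u_i) = 0
```

Relevance can be checked from the first-stage regression, whereas exogeneity cannot be tested directly with a single instrument and has to be argued on economic grounds.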
[500 words]
Data
As part of the project, you will be using real research data from the UK Data Service. This data was collected from real people who agreed for their data to be used for research and learning purposes. Before you can access this data, you need to agree to some important conditions of use. These conditions are presented on the Econometric Analysis VLE site.
The Project2025.dta data set is a sample from the Quarterly Labour Force Survey (QLFS) collected during the period October-December 2024. The QLFS is a voluntary sample survey of private households in the UK. The basic unit of the survey is the household and the data should be considered as a cross-sectional data set.
The sample you have been given consists of employees in permanent jobs, aged 25 to 60 (inclusive), who have left full-time education. There are 4,992 employees in total.
In the QLFS dataset, employees are identified either as male or female. Education can be measured in different ways. The continuous variable edage provides the age at which the employee left full-time education. The binary variables none, gcse, alevels and degree represent the highest education qualification achieved by the employee.
You are allowed to create additional variables based on the variables already provided in the dataset (see list of variables p8, and summary statistics p8-9). For example, you can use a log transformation or create additional binary variables. Make sure that you explain clearly how you built and named these additional variables, and how you interpret them, when you present your econometric models.