COMP5310 Project Stage 2
Develop and evaluate a predictive model
Due: 11:59PM on 15th of May 2025 (Week 11)
This assignment is worth 25% of the final mark of the unit of study.
GROUPS
This stage is done with the same group members you worked with for Stage 1. However, under exceptional circumstances, an alternative group may be created by the tutor when a group is reduced in size due to members discontinuing this unit. If this applies to you, please email the unit coordinator maryam.khaniannajafab[email protected] or the TAs:
[email protected] or w[email protected] to discuss this.
Note: Each member of the group is required to complete individual tasks, but the project will be submitted as a combined effort. The final project will be marked as a whole, with both individual and group components contributing to the final grade. All assessments will be based on the single, submitted document.
Dispute resolution
If, during the course of the assignment work, there is a dispute among group members that you can’t resolve or that will impact your group’s capacity to complete the task well, you need to inform the unit coordinator [email protected] or the TAs: [email protected] or [email protected]. Make sure that your email specifies the group name and is explicit about the difficulty; also, make sure this email is copied to all group members (including anyone you are complaining about).
We need to know about problems in time to help fix them, so set early deadlines for group members, and deal with non-performance promptly (don’t wait until a few days before the work is due to complain that someone is not delivering on their tasks). If necessary, the coordinator will split a group and leave anyone who didn’t participate effectively in a group by themselves (they will need to achieve all the outcomes on their own). This option is only available up until Thursday Week 9, which is the last day with time to resolve the issue before the due date. For any group issues that arise after this time, you will need to try to resolve the problem on your own, and you will continue to be treated as a single group.
If someone doesn’t provide the material required for the report, or their material is not of the agreed standard, you should still have the report show what that person did. Their section of the report may be empty if they don’t produce anything, or it may have material but not enough. In such cases, please put a “Note to marker” on the front page of the report, which describes the circumstances. That way, we can consider how best to apply the marking scheme. Note that it is not expected or sensible for other members to do the work that someone failed to deliver.
PROJECT
Overview
The objective of Stage 2 of the project is to build a robust predictive model using the clean dataset obtained in Stage 1. This stage will involve advanced predictive modelling techniques, as well as thorough model evaluation and optimisation processes.
Important Notes:
1. You MUST work in the same groups you worked on during Stage 1.
2. Further cleaning of the dataset, addition of previously dropped columns, and/or removal of columns are permitted if you wish.
3. Each member must use a different modelling technique to develop their predictive model.
4. Changing the target variable and research question is also permitted, if the group chooses to do so.
DELIVERABLES
Report
The report must have a maximum of 3 pages for each individual section and maximum of 3 pages for the group section (including both group components 1 and 2) for a group of 2, and a maximum of 4 pages for the group section for a group of 3. You must use the high-level headings, as provided below, to indicate the different sections and sub-sections of the report.
You must use line spacing of at least 1.15pt, margins of at least 1.8cm, and body font size of at least 10pt. The goal is to convey the problem clearly and concisely.
The report should be in PDF format, named using the following convention: “GroupX_A2_Report.pdf”, where X is your group number. DO NOT SUBMIT A FOLDER THAT IS NAMED GroupX_A2_Report. It must have a front page that gives the group number, and the list of members involved (giving their SIDs AND unikeys, NOT their names).
The body of the report must have a structure as follows:
Group Component 1
The report must begin with a group section including:
1. Topic and research question: Describe the research problem comprehensively, emphasizing its significance in the domain. All members must agree upon and aim to answer the same research question. Clearly articulate the research question and highlight its implications for various stakeholders. Discuss how addressing this question could lead to actionable insights or improvements in decision-making for the stakeholders.
2. Dataset: Provide a detailed overview of the dataset and discuss any challenges, class imbalances, and/or biases present in the data and how they might impact the modelling process.
3. Setup
3.1. Modelling agreements: Identify an attribute that you will all make predictions about and agree on at least two measures of success for the predictive models you will be producing. These measures should go beyond standard accuracy metrics and may include area under the receiver operating characteristic curve (AUC-ROC), F1-score, precision-recall curves, etc. Explain the rationale behind these measures and their suitability for the research question.
3.2. Data division: Describe the process of how you divided your data into training, validation (if applicable), and test sets. Explain the rationale behind the data division, considering strategies like temporal validation or stratified sampling.
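For illustration only (this is not a required approach), a stratified three-way split might be sketched as follows. The sketch assumes scikit-learn is available; the synthetic data, the 60/20/20 ratios, and all variable names are hypothetical choices rather than part of the assignment.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 1000 rows, 10% positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([1] * 100 + [0] * 900)

# First hold out 20% of the rows as a test set, stratified on the target
# so the class ratio is preserved.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Then split the remaining 80% into training and validation sets
# (25% of the remainder, giving roughly 60/20/20 overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

# Report the size and positive-class rate of each partition.
for name, part in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(part), round(float(part.mean()), 3))
```

Stratifying on the target keeps the positive-class rate similar across the partitions, which matters when the classes are imbalanced. For time-ordered data, a temporal split may be more appropriate than a random one.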
Individual Component
The report must include a dedicated individual section for each group member. Each section should clearly state the member's Unikey to identify their individual sections in the report (THIS IS A UNIKEY: ABCD1234). DO NOT PROVIDE STUDENT ID OR STUDENT NAME TO IDENTIFY ANY OF THE SECTIONS.
1. Predictive model
Note: Each member must choose a different predictive modelling technique.
1.1. Model Description: Name and describe your technique, discuss the assumptions underlying this technique, and critically evaluate their validity in the context of the dataset. Highlight the strengths and limitations of the chosen technique and justify its suitability for the research question and dataset characteristics. Modelling techniques not covered in the tutorial sessions, such as neural networks (CNNs, LSTMs, RNNs, GANs, etc.) or bagging or gradient boosting techniques (GBM, XGBoost, LightGBM, CatBoost, AdaBoost, etc.), are preferred.
1.2. Model Algorithm: Provide a detailed explanation of the algorithm powering your chosen technique, including its underlying principles, such as (but not limited to) mathematical equations, hyperparameters, and potential variations. Using pseudocode or flowchart diagrams, provide the step-by-step execution of the algorithm. (You can type the pseudocode in Jupyter Notebook and put the screenshot of the pseudocode here. You cannot put the screenshot of the pseudocode in the appendix. If you do, it will not be marked). If you choose to draw a flowchart, you can create it on any online tool or software and attach its screenshot here. You must put the screenshot of the flowchart diagram here, in the main report. If you put it in the appendix, it will not be marked.
1.3. Model Development: Describe the process of building the predictive model, including advanced data preprocessing techniques such as feature scaling, dimensionality reduction (e.g., Principal Component Analysis), or feature engineering. Discuss the selection of model-specific functions and hyperparameters, providing theoretical justification and empirical validation. Also identify the Python functions and parameters you selected and explain what they mean.
Note: You don’t have to include the code in the report, as you will submit it separately.
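As a rough sketch of what the model development step can look like in code, the example below chains feature scaling, PCA, and a gradient boosting classifier in a single scikit-learn pipeline. It is an assumption-laden illustration: the synthetic dataset, the choice of GradientBoostingClassifier, and every hyperparameter value are hypothetical and are not prescribed by this assignment.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the cleaned Stage 1 dataset.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling and PCA live inside the pipeline, so they are fitted on the
# training data only and then applied unchanged to the test data.
model = Pipeline(steps=[
    ("scale", StandardScaler()),       # zero mean, unit variance per feature
    ("pca", PCA(n_components=0.95)),   # keep components explaining 95% of variance
    ("clf", GradientBoostingClassifier(
        n_estimators=100,              # number of boosting stages
        learning_rate=0.1,             # shrinkage applied to each stage
        max_depth=3,                   # depth of each individual tree
        random_state=0,
    )),
])
model.fit(X_train, y_train)

n_components_kept = model.named_steps["pca"].n_components_
predictions = model.predict(X_test)
print("PCA components kept:", n_components_kept)
```

Keeping the preprocessing inside the pipeline is one way to avoid leaking information from the test set into the fitted scaler or PCA. The comments next to each parameter show the kind of explanation expected when you describe your chosen functions and parameters.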
2. Model Evaluation and Optimisation
2.1. Model Evaluation: Perform a comprehensive evaluation of your model's performance using the agreed-upon measures of success. Interpret the results in the context of the research question and dataset characteristics, considering factors such as class imbalance, noise, and interpretability. Discuss the implications of the evaluation metrics and identify potential areas for improvement.
2.2. Model Optimisation: Explore advanced optimisation techniques to further enhance your model's performance, explaining your choices clearly. This may involve hyperparameter tuning using techniques like grid search.
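A minimal sketch of evaluation and grid search, assuming scikit-learn and a binary target, is shown below. The parameter grid, the F1 and AUC-ROC metrics, the synthetic data, and the classifier are hypothetical examples; your group's agreed measures and your own technique would take their place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Hypothetical imbalanced binary dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: every combination is scored by cross-validated F1.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_grid=param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)  # the test set is never used during tuning

# Evaluate the refitted best model once on the held-out test set.
best_model = search.best_estimator_
test_f1 = f1_score(y_test, best_model.predict(X_test))
test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_f1, 3), round(test_auc, 3))
```

Tuning on cross-validation folds of the training data and reporting the metrics once on the test set keeps the final estimate separate from the tuning process. Note that AUC-ROC is computed from predicted probabilities, whereas F1 is computed from predicted class labels.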
Group Component 2
Finally, a second group section at the end of the report should include:
1. Discussion: Engage in a critical discussion on the strengths and limitations of each modelling technique employed by group members. Compare and contrast the performance of various models quantitatively and qualitatively. Reflect on the broader implications of model selection for addressing the research question effectively.
2. Conclusion: Synthesize the findings from individual model evaluations and provide a recommendation on the most effective predictive model for answering the research question. Justify your recommendation based on empirical evidence, theoretical considerations, and domain knowledge. Propose potential avenues for future research, including data collection strategies, model refinement techniques, and interdisciplinary collaborations.