COMP3411/9814 Artificial Intelligence  
Term 1, 2020  
Assignment 2 – Machine Learning  
Due: Tuesday 21 April, 10pm  
Marks: 30% of final assessment for COMP3411/9814 Artificial Intelligence  
Question 1: Decision trees  
Consider the decision tree learning algorithm of Figure 7.7 and the data of Figure 7.1  
from Poole  Mackworth [1], also presented below. Suppose, for this question, the  
stopping criterion is that all of the examples have the same classification. The tree of  
Figure 7.6 was built by selecting a feature that gives the maximum information gain.  
This question considers what happens when a different feature is selected.  
Figure 7.1: Examples of a user’s preferences  
a) Suppose you change the algorithm to always select the first element of the list of  
features. What tree is found when the features are in the order [Author, Thread,  
Length, WhereRead]? Does this tree represent a different function than that found  
with the maximum information gain split? Explain.  
b) What tree is found when the features are in the order [WhereRead, Thread, Length,  
Author]? Does this tree represent a different function than that found with the  
maximum information gain split or the one given for the preceding part? Explain.  
c) Is there a tree that correctly classifies the training examples but represents a different  
function than those found by the preceding algorithms? If so, give it. If not, explain  
why.  
Question 2: Decision trees  
The goal is to take out-of-the-box models and apply them to a given dataset. The task is  
to analyse the data and build a model to predict whether income exceeds $50K/yr  
based on census data (also known as "Census Income" dataset).  
Use the data set Adult Data Set from the Machine Learning repository [2].  
Use the supervised learning methods discussed in the lectures, Decision Trees.  
Do not code these methods: instead use the implementations from scikit-learn. Read the  
scikit-learn documentation on Decision Trees [3], and the linked pages describing the  
parameters of the methods.  
This question will help you master the workflow of model building. For example,  
you’ll get to practice how to use the critical steps:  
• Importing data  
• Cleaning data  
• Splitting it into train/test or cross-validation sets  
• Pre-processing  
• Transformations  
• Feature engineering  
Use the sklearn documentation pages for instructions. You should need the classification  
algorithms.  
There are also available Tutorials:  
• Sklearn – official tutorial for the sklearn package  
• Predicting wine quality with scikit-learn – Step-by-step tutorial for training a  
machine learning model  
The data is available here: http://archive.ics.uci.edu/ml/ 
machine-learning-databases/adult/  
Preferences  
1. Poole Mackworth, Artificial Intelligence: Foundations of Computational Agents,  
Chapter 7, Supervised Machine Learning)  
2. http://archive.ics.uci.edu/ml/datasets/Adult).  
3. https://scikit-learn.org/stable/modules/tree.html  
4. https://scikit-learn.org/stable/modules/naive_bayes.html  
Submission  
This assignment must be submitted electronically.  
Put your zID and your name at the top of every page of your submission!  
give cs3411 assign2 ...  
The give script will accept *.pdf *.txt *.doc *.rtf  
Late submissions will incur a penalty of 10% per day, applied to the maximum mark.  
Group submissions will not be allowed. By all means, discuss the assignment with your  
fellow students. But you must write (or type) your answers individually. Do NOT copy  
anyone else’s assignment, or send your assignment to any other student.