CSCI433/CSCI933: Machine Learning - Algorithms and 
Applications 
Assignment Problem Set #2 
Lecturer: Prof. Philip O. Ogunbona() 
School of Computing and Information Technology 
University of Wollongong 
Due date: Saturday May 2, 6:00 p.m. 
Introduction 
Often the number of features collected in a machine learning problem is very large and can be repre- 
sented as data in a large dimensional vector space. For instance one may need to solve a classification 
problem and the number of features collected may number in several hundreds or thousands. Hence 
the feature vector will be of a high dimension. Dealing with such data could be problematic because 
of the so-called curse of dimensionality. It may be the case that the information required to charac- 
terize the classification problem can be represented with a feature vector of much smaller dimension. 
In this situation the information characterising the problem lies in a low-dimensional manifold of the 
original vector space. The problem of dimensionality reduction is how to find this low-dimensional 
manifold. 
In this assignment, you will study some of the non-linear dimensionality reduction methods (van 
der Maaten, Postma,  van den Herik, 2008) used in machine learning. You are to read, study, 
understand and replicate aspects of the paper by van der Maaten et al. (2008). The assignment gives 
you opportunity to generate and visualize artificial data and to work with both artificial and natural 
dataset. You will use the Python programming language and the libraries available for machine 
learning (scikit-learn), plotting and visualization (e.g. matplotlib, seaborn, etc.) to explore some of 
the methods of dimensionality reduction. You will be aiming to replicate the results obtained by the 
authors of the paper cited as (van der Maaten et al., 2008). There is also an extended version of 
the paper that describes how the artificial data was generated (van der Maaten, Postma,  van den 
Herik, 2009). This should help you when implementing code to generate the data. The two papers 
are included in the specification pack provided for this assignment. 
What needs to be done 
1. Read, study and understand the two papers. You are replicating the short paper (van der 
Maaten et al., 2008). The longer paper describes how to generate the artificial datasets and 
includes more details about the techniques. 
2. Generate and plot (visualize) the artificial datasets Swiss roll, Broken Swiss and Helix. 
See for example Fig. 4 in van der Maaten et al. (2008). You will include the plot you generated 
in your report and write about it. 
3. Download and prepare to use the natural datasets: MNIST and Olivetti faces. You can use 
the scikit-learn module in Python to download the MNIST and Olivetti faces datasets as 
shown in this code snippet (or read the scikit-learn documentation). 
1 
.. 
import sklearn 
from sklearn import datasets 
from sklearn.datasets import fetch_openml 
mnist_data = fetch_openml(’mnist_784’, version=1, return_X_y=True) 
olivetti_faces = sklearn.datasets.fectch_olivetti_faces 
. 
. 
Ensure that you really understand the organisation of the datasets. This is absolutely important 
- check the size, shape, etc. 
4. Using Python programming language, implement the dimensionality reduction methods: PCA, 
Kernel PCA, Autoencoders, LLE (see Table 2 in van der Maaten et al. (2008)) as described in 
the paper. Use the parameter settings provided in the paper. As a hint, these techniques are 
implemented in the scikit-learn Python machine learning library. 
5. Using generalization errors of 1-Nearest Neighbour classifier trained on the datasets, compare 
the performance of the dimensionality reduction methods mentioned in item (4) above. Your 
results will be presented as in Table. 4 of the paper for the datasets listed in items (2 and 3) 
above. 
6. Your report will be presented in a conference paper format (see accompanying template) and 
should detail your understanding of theory of the techniques and experiments in the assigned 
paper. You will describe the techniques in your own words with appropriate equations. When 
you write an equation, the meaning of the symbols must be explained as well as the intuition 
behind the equation itself. Your report MUST not be more than nine (9) pages in the format 
specified by the template. 
7. Please cite any other paper or book you have read in gaining deeper understanding of the 
concepts and methods. 
What needs to be submitted 
• You will prepare a “zip” or “rar” file containing your report (9-page PDF file) and Python 
code (named : “dim reduc.py”) file. 
• Your code must run from command line as: 
python3 dim_reduc.py 
and write your results to standard output (stdout). 
• Submit the “zip” or “rar” via Moodle dropbox provided on or before the deadline. 
References 
van der Maaten, L. J. P., Postma, E. O.,  van den Herik, H. J. (2008). Dimen- 
sionality reduction : A comparative review. online. Retrieved March 2020, from 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.112.5472rep=rep1type=pdf 
van der Maaten, L. J. P., Postma, E. O.,  van den Herik, H. J. (2009). Dimen- 
sionality reduction : A comparative review. online. Retrieved March 2020, from 
https://lvdmaaten.github.io/publications/papers/TR Dimensionality Reduction Review 2009.pdf