CS544 Final Project  
Picking the Data Set  
Browse the following sites for examples and select a data set that interests you.
1. https://www.kaggle.com/datasets  
2. https://github.com/fivethirtyeight/data  
3. http://www.kdnuggets.com/datasets/index.html  
4. Any other source of your choice  
Register your Project Title and Dataset  
• Register using this link. Project selections are on a first-come, first-served basis.
http://kalathur.com/projects/cs544_a2.php  
• Your project source should be different from those of other students. The earlier
you decide and register, the better.
Preparing the data  
• Import the data set into R.  
• Document the steps of the import process and any preprocessing that had
to be done before or after the import. Include any R code used in the
process.
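
The import and preprocessing steps above can be sketched as follows. This is a minimal illustration, not a required approach: the file contents, the column names (id, category, value), and the choice to drop incomplete rows are all assumptions made for the example. Replace csv_path with the path to your own downloaded file.

```r
# Hypothetical example: a tiny CSV written to a temporary file stands in
# for the downloaded data set; point `csv_path` at your own file instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,category,value",
  "1,A,10.5",
  "2,B,NA",
  "3,A,7.25",
  "4,,12.0",
  "5,C,9.75"
), csv_path)

# Import: treat empty strings as missing values in addition to "NA".
raw <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Inspect the structure and count missing values per column.
str(raw)
colSums(is.na(raw))

# Preprocess (one possible choice): drop incomplete rows and
# convert the categorical column to a factor.
clean <- raw[complete.cases(raw), ]
clean$category <- factor(clean$category)

summary(clean)
```

Dropping incomplete rows is only one option. Depending on the data set, imputing values or keeping the rows and excluding them per analysis may be more appropriate, and whichever choice is made should be documented.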
Analyzing the data  
• Do the analysis as in Module 3 for at least one categorical variable and at least one
numerical variable. Show appropriate plots for your data.
• Do the analysis as in Module 3 for at least one set of two or more variables. Show
appropriate plots for your data.
• Pick one variable with numerical data and examine the distribution of the data.   
• Draw various random samples of the data and show the applicability of the  
Central Limit Theorem for this variable.  
• Show how various sampling methods can be used on your data. What are your
conclusions if these samples are used instead of the whole dataset?
• Implement any feature(s) not mentioned in the above specification.
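
The Central Limit Theorem and sampling items above can be sketched in base R as follows. This is an illustration under stated assumptions, not the expected solution: the data frame is simulated (a right-skewed variable x and a grouping variable group stand in for your own columns), the seed, sample sizes, and repetition count are arbitrary, and the helper sample_means is a name introduced here. Packages covered in the course may offer more direct functions for the sampling methods.

```r
set.seed(544)  # arbitrary seed so the sketch is reproducible

# Stand-in data (assumption): a right-skewed numeric variable and a group.
n <- 5000
data <- data.frame(
  group = sample(c("A", "B", "C"), n, replace = TRUE, prob = c(0.5, 0.3, 0.2)),
  x     = rexp(n, rate = 1 / 50)
)

# --- Central Limit Theorem: distribution of sample means ---
# Hypothetical helper: draw `reps` samples of a given size, return their means.
sample_means <- function(x, size, reps = 1000) {
  replicate(reps, mean(sample(x, size, replace = FALSE)))
}

sizes <- c(10, 30, 100)
par(mfrow = c(2, 2))
hist(data$x, main = "Full data", xlab = "x")
for (s in sizes) {
  m <- sample_means(data$x, s)
  hist(m, main = paste("Sample size", s), xlab = "Sample mean")
  cat("size =", s, " mean =", round(mean(m), 2), " sd =", round(sd(m), 2), "\n")
}
par(mfrow = c(1, 1))

# --- Sampling methods (about 100 rows each) ---
k <- 100

# Simple random sampling without replacement.
srs <- data[sample(nrow(data), k), ]

# Systematic sampling: random start, then every step-th row.
step  <- nrow(data) %/% k
start <- sample(step, 1)
sys   <- data[seq(start, by = step, length.out = k), ]

# Stratified sampling with proportional allocation per group
# (rounding means the total may differ slightly from k).
strat <- do.call(rbind, lapply(split(data, data$group), function(g) {
  g[sample(nrow(g), round(k * nrow(g) / nrow(data))), ]
}))

# Compare each sample's mean with the mean of the full data.
c(full = mean(data$x), srs = mean(srs$x),
  systematic = mean(sys$x), stratified = mean(strat$x))
```

As the sample size grows, the histograms of the sample means should look more symmetric and their standard deviation should shrink, even though x itself is skewed. Comparing the sample means in the last line with the full-data mean is one way to frame conclusions about using samples instead of the whole data set.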
Presenting the Project   
• You will do your project presentation with the Professor using Zoom.  
• Each presentation is limited to 10 minutes. A signup sheet will be provided
later.
Submitting the Project  
Upload a zip file (CS544Final_lastName.zip) containing all the code (R file), the  
presentation document (PDF or PPT), and all the results in a Word/PDF Document, or the  
appropriate RMarkdown file and knitted HTML.