MSDS 16:954:597 
Data Wrangling and Husbandry 
Final Project Instructions 
Spring 2020 
1 Project Instructions 
For the final project for the course, your assignment is essentially to wrangle some 
data and to show off your skills. I am intentionally not specific about how you do 
so, but you have the weekly assignments as models. Think of the project as the 
equivalent of chaining together multiple weeks of assignments: you should bring 
data into R, clean it, tidy it, perhaps create new variables, perhaps summarize your 
data, and report on it with tables and figures. However, there are some required 
elements: 
• You must get your data from at least two distinct sources, at least one of 
which must be at least somewhat difficult to work with (requires scraping or 
cleaning). 
• You must use Git and GitHub to manage your project. If you have not done 
so already, please create an account. 
• All of your code, including the R Markdown file, should run from within the 
project's own directory, without depending on any files or code outside it. 
• Every code chunk must be labeled. 
• You must include a step where you save a tidy version of (perhaps just some 
of) your data as a csv file. The idea is that the csv file would be an easy place 
for someone else to start from. 
• Your report, generated from an R Markdown file, should be as good looking 
and well formatted as you can make it — that includes tables and figures. Do 
not use echo = TRUE except as truly needed. 
• We have not done statistical analyses more sophisticated than correlation and 
linear regression in this course and there is no need for more advanced analyses 
in your report. You can do so if you wish, however. 
• If some parts of your project are relatively easy, you should balance that out 
by going into more depth in other aspects. 
• Your report should explain the steps you’ve taken and why — I do not want 
to see just a collection of tables and figures. Feel free to describe approaches 
that didn’t work or were more troublesome than expected. 
• I expect that you will discuss this project with others, but please avoid using 
datasets in common (I realize that might still happen by coincidence). All of 
the work submitted must be your own. Be sure to credit the sources of your 
data and any other material — it is better to over-credit than to under-credit. 
If you have any questions about properly crediting others’ work, just see me 
about it. 
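As a rough illustration of how several of the required elements above can fit together, here is a minimal base R sketch, not a template to copy. The two data frames are made-up stand-ins for two sources (a real project would read from files, an API, or a scraped page), and every column name, value, and the output file name is an assumption made only for this example.

```r
# Source 1 (made-up): wide layout, counts stored as text with commas
counts_wide <- data.frame(
  region = c("A", "B"),
  y2018  = c("1,200", "3,400"),
  y2019  = c("1,250", "3,300"),
  stringsAsFactors = FALSE
)

# Source 2 (made-up): one row per region
regions <- data.frame(
  region     = c("A", "B"),
  population = c(50000, 80000),
  stringsAsFactors = FALSE
)

# Tidy: reshape to one row per region-year combination
counts_long <- reshape(
  counts_wide,
  direction = "long",
  varying   = c("y2018", "y2019"),
  v.names   = "count",
  timevar   = "year",
  times     = c(2018L, 2019L),
  idvar     = "region"
)
rownames(counts_long) <- NULL

# Clean: strip the commas and convert the counts to integers
counts_long$count <- as.integer(gsub(",", "", counts_long$count, fixed = TRUE))

# Combine the two sources and create a new variable
tidy <- merge(counts_long, regions, by = "region")
tidy$count_per_1000 <- tidy$count / tidy$population * 1000

# Save the tidy version as a csv file. tempdir() is used here only so the
# sketch runs anywhere; in a project, a relative path inside the repository
# would be the natural choice.
out_path <- file.path(tempdir(), "tidy_counts.csv")
write.csv(tidy, out_path, row.names = FALSE)
```

Someone starting from the csv file can then call read.csv() on it and skip the cleaning steps entirely.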
Data Wrangling and Husbandry April 2, 2020 
2 Presentation or Written Project 
Students will be given the following two options for their final project. 
(a) Give a 5-10 minute presentation of your project during our last class on Monday, 
May 4 (think 5-10 slides). Besides the presentation, you will also turn in 
your slides and other components required for the project, including a 5-page 
report of your project. Students who give a presentation will have until the 
end of that week (May 8) to turn in their project. Because everything is now 
virtual, be sure you have a working webcam with microphone. 
(b) Produce a 10-page formal report. Students who hand in a formal report will 
turn in their project at the time of the last class (May 4). 
In any case, focus on why you were interested in the datasets, some of the issues 
in wrangling them, and a few interesting figures or tables. While keeping in mind that 
what was time-consuming for you may not be interesting for others, remember that 
the course has emphasized mechanics and that your classmates may very well be 
interested in, say, what regular expression you used to reformat a particular column. 
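To make the regular expression point concrete, here is a small sketch of the kind of detail that might be worth showing. It assumes a hypothetical column of dates stored as month/day/year text and rewrites them in year-month-day form; the vector name and its values are invented for the example.

```r
# Hypothetical column of dates stored as "MM/DD/YYYY" text
raw_dates <- c("04/07/2020", "05/04/2020", "05/08/2020")

# Capture month, day, and year, then reorder them as "YYYY-MM-DD"
iso_dates <- sub(
  "^([0-9]{2})/([0-9]{2})/([0-9]{4})$",
  "\\3-\\1-\\2",
  raw_dates
)

# The reformatted strings can now be parsed with as.Date()'s default format
parsed <- as.Date(iso_dates)
```

Strings that do not match the pattern are returned unchanged by sub(), so it is worth checking for leftover unparsed values after a step like this.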
3 Procedures and Dates 
Submit (via Canvas) a short description of your data and plans for it by Tuesday, 
April 7. I will also ask for your preference for a presentation versus report as part 
of that “assignment”. The description should include links to your data sources. 
There is no grade associated with this part. 
You will submit your final project (again, via Canvas) by giving the URL to 
clone your GitHub repository. Also submit any API keys required using the format 
api.key.. <- "abcdefg". 
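One possible way for a report to rely on a key supplied this way, without committing the key to the repository, is sketched below. The helper get_api_key() and the variable name api.key.example are hypothetical names invented for illustration; they are not the naming required by the format above.

```r
# Hypothetical helper: fetch a key that is expected to have been defined
# before the report is run, and fail with a clear message if it is missing.
get_api_key <- function(name, envir = parent.frame()) {
  if (!exists(name, envir = envir)) {
    stop("Please define `", name, "` before running this report.",
         call. = FALSE)
  }
  get(name, envir = envir)
}

# Placeholder definition, standing in for the line submitted via Canvas
api.key.example <- "abcdefg"

key <- get_api_key("api.key.example")
```

With this arrangement the assignment line is submitted separately and the repository itself never contains the key.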
Your final project will be graded holistically, but I will be looking at these elements: 
• That you have demonstrated your ability to use R to accomplish your tasks. 
• That your code is easy to understand with supporting comments. 
• That your report is well-written (i.e., clear and concise) with well-presented 
tables and figures. 