EXAM 2  
MSCI:3250  
SPRING 2020  
• This exam comprises 4 questions and is worth 150 points  
• You will have 90 minutes to complete this exam:  
o Early submission bonus: Submissions before 10:05 PM will receive a 3-point
bonus
o Late penalty: Submissions after 10:15 PM will lose 3 points for
each minute late
• The exam is open-book, open-notes, and you can use the Internet, but no  
communication with other individuals (with the exception of the instructor) is  
allowed  
• By taking this exam, you agree to abide by the Tippie Honor Code and Honor  
Pledge below  
HONOR PLEDGE  
On my honor, I pledge that during this examination I have neither  
given nor received assistance, and that I did not have advance
knowledge of the exam content.  
Specifications  
• Submit an R script file with your code for all questions
o Name your file “lastname_exam2.R”  
o Add a comment with your name at the top of the file and comments denoting  
each question number  
o You may add other comments for clarification   
o Add the command rm(list=ls()) at the top of your file to clear the workspace  
• The solution for each question must be generated as R variables or plots with specific  
names as instructed  
o All solutions should be generated by running your code without any
customization or modification by the instructor
o Load required packages with the library() command. Your script should not
include any unnecessary packages or install.packages() commands
o Assume all input files are in the working directory. Do not include the setwd()  
command in your script  
Background  
April 20th has become known as “Weed  
Day”, prompting annual celebrations and  
rallies across the country. For this exam,  
we will analyze crime and demographic  
data from Denver, CO, where recreational  
marijuana has been decriminalized.  
Carefully review all provided files  
(“mj_crimes1.csv”, “mj_crimes2.csv”, and  
“neighborhoods.txt”) before beginning.  
Then answer the following questions:  
1. (30 points) Read “mj_crimes1.csv” and “mj_crimes2.csv” into data frames and then  
merge them. Do not convert strings to factors. Treat empty cells as missing values.  
Output variables:  
o part1 (10 pt): data frame created from “mj_crimes1.csv”
o part2 (10 pt): data frame created from “mj_crimes2.csv”
o crime (20 pt): data frame created by vertically merging part1 and part2
(Hint: Assume that the column names in part1 are correct. The result should be a
data frame with 1,203 rows and 12 columns.)
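The reading and stacking steps described above can be sketched on toy data. This is a minimal illustration, not a solution: the inline CSV text, the column names, and the variable names (csv_a, csv_b, a, b, stacked) are all hypothetical. With real files, the file name would be passed as the first argument to read.csv() in place of text =.

```r
# Toy CSV contents standing in for two input files (hypothetical data).
csv_a <- "id,offense,city\n1,theft,Denver\n2,,Denver"
csv_b <- "ID,Offense,City\n3,burglary,\n4,assault,Denver"

# stringsAsFactors = FALSE keeps text columns as character vectors;
# na.strings = "" turns empty cells into NA.
a <- read.csv(text = csv_a, stringsAsFactors = FALSE, na.strings = "")
b <- read.csv(text = csv_b, stringsAsFactors = FALSE, na.strings = "")

# rbind() on data frames matches columns by name, so the names of the
# second data frame are aligned with the first before stacking.
names(b) <- names(a)
stacked <- rbind(a, b)

str(stacked)
```

Checking dim() and names() on each piece before calling rbind() is a quick way to spot column-name mismatches.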
2. (50 points) Read “neighborhoods.txt” into a data frame (do not convert strings to factors,  
treat empty cells as missing values). Then merge with crime.  
Output variables:  
o nbhd (13 pt): data frame created from “neighborhoods.txt”.
o mj_df (47 pt): data frame created by horizontally merging crime and nbhd.
Only include neighborhoods that have crime reports. (Hint: Make sure that the
values in the shared columns match. There are 2 values in crime that should be
corrected. Assume that the values in nbhd are correct. The result should be a
data frame with 1,203 rows and 21 columns.)
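One way to check and repair mismatched key values before a horizontal merge is sketched below on toy data. The data frames (reports, areas), their columns, and every value in them are hypothetical and are not taken from the exam files.

```r
# Toy data: one key value in 'reports' is misspelled (hypothetical values).
reports <- data.frame(
  report_id = 1:4,
  area = c("Baker", "baker ", "Capitol Hill", "Baker"),
  stringsAsFactors = FALSE
)
areas <- data.frame(
  area = c("Baker", "Capitol Hill", "Montclair"),
  population = c(5000, 15000, 6000),
  stringsAsFactors = FALSE
)

# Key values that appear in 'reports' but have no match in 'areas'.
unmatched <- setdiff(unique(reports$area), areas$area)
print(unmatched)

# Correct the mismatched value, treating 'areas' as the reference.
reports$area[reports$area == "baker "] <- "Baker"

# all.x = TRUE keeps every report; areas with no reports are dropped.
joined <- merge(reports, areas, by = "area", all.x = TRUE)
print(joined)
```

Running setdiff() again after the correction should return an empty vector, which confirms that every report will find its match.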
3. (30 points) Analyze marijuana industry-related crimes:  
Output variables:  
o industry_table: frequency table that counts the number of crime reports that
were industry or non-industry related for each offense category. Display the  
offense categories as rows, industry vs. non-industry as columns.  
o industry_mod: logistic regression that models the likelihood of a crime being  
marijuana industry-related based on whether the crime was violent, plus the  
neighborhood’s population, age, vacant housing units, and home value  
o industry_r2: calculate the pseudo R2 for industry_mod using the following  
formula  
$$R^2_{\text{pseudo}} = 1 - \frac{D_{\text{full}}}{D_{\text{null}}}$$
where $D_{\text{full}}$ represents the deviance of the full model and $D_{\text{null}}$ represents the deviance of the intercept-only model
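The frequency table and the deviance-based pseudo R2 can be illustrated with R's built-in mtcars data set. This is only a sketch of the technique: the variables used here (cyl, am, wt, hp) and the object names (tab, mod, r2) are stand-ins chosen for illustration and have no connection to the exam data.

```r
# Two-way frequency table: first argument gives the rows,
# second argument gives the columns.
tab <- table(mtcars$cyl, mtcars$am)
print(tab)

# Logistic regression: binary outcome 'am' modeled on two predictors.
mod <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# A fitted glm object stores both deviances:
#   mod$deviance       deviance of the fitted (full) model
#   mod$null.deviance  deviance of the intercept-only model
r2 <- 1 - mod$deviance / mod$null.deviance
print(r2)
```

A value near 0 means the predictors add little over the intercept-only model, while a value near 1 means the full model accounts for most of the null deviance.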
4. (40 points) Analyze crime reports by neighborhood:
Output variables and plots:
o nbhd_summary: dplyr summary table that calculates the total crime reports
(“TotalReports”), median age (“MedianAge”) and median poverty rate
(“MedianPoverty”) for each neighborhood
o age_cor: use nbhd_summary to calculate the correlation between a
neighborhood’s total crime reports and median age
o Create a scatterplot that visualizes the relationship between a neighborhood’s  
total crime reports and poverty rate. Add appropriate labels/titles. Set the points  
to be shaped like a triangle (point up), and filled with the color green (Hint: use  
the pch argument to change the marker type.)
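A grouped summary, a correlation, and a scatterplot with filled triangles can be sketched as follows. The toy data frame (reports), its columns, and the object names (area_summary, age_corr) are hypothetical, and the sketch assumes the dplyr package is installed. In base R graphics, pch = 24 draws a point-up triangle whose fill is set by bg, while pch = 17 draws a solid triangle colored by col.

```r
library(dplyr)

# Toy report-level data (hypothetical values).
reports <- data.frame(
  area    = c("A", "A", "A", "B", "B", "C"),
  age     = c(30, 30, 30, 40, 40, 35),
  poverty = c(12, 12, 12, 20, 20, 8),
  stringsAsFactors = FALSE
)

# One row per area: n() counts the reports, median() summarizes the rest.
area_summary <- reports %>%
  group_by(area) %>%
  summarise(
    TotalReports  = n(),
    MedianAge     = median(age),
    MedianPoverty = median(poverty)
  )

# Pearson correlation between two summary columns.
age_corr <- cor(area_summary$TotalReports, area_summary$MedianAge)

# Scatterplot with green-filled, point-up triangles.
plot(
  area_summary$MedianPoverty, area_summary$TotalReports,
  pch = 24, bg = "green",
  xlab = "Median poverty rate", ylab = "Total reports",
  main = "Reports vs. poverty rate by area"
)
```

When the script is run non-interactively with Rscript, the plot is written to a file named Rplots.pdf in the working directory.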