IST 387/687 Final Project - Spring 2020  
Final Project Submission   
Deliverables:   
One 10-15 slide presentation that summarizes your analysis.  The audience for this
will be the executive-level leadership of an airline company.  Please assume these
executives are not familiar with statistics, so avoid quoting
terms like “R-squared” or “p-value” and instead describe your statistical results in
plain language.
One MS-Word file containing a detailed report of all your work.  This report should
include sections for all the phases of data science discussed in this course.  A
suggested template of the deliverables will be provided.  The audience for this report
is your Data Science professor or lab professor, who understands R
code and Data Science. Please make sure to include all assumptions made and any
analysis completed, whether you found it significant or not.
Rules of Engagement: This is an honor system assignment: You may consult with
IST687 professors and Faculty Assistants (FA), the textbook, and publications on the
Internet at any time. You may not consult, collaborate with, or seek assistance from any
other person besides your professors and Faculty Assistants. Your attribution statement, at the top of your R-code file,
must reflect these constraints. You may not share your results or work in progress
with any other person besides professors and Faculty Assistants (FA). Note that your
data file is unique to you: The results that other students in IST387/687 obtain will be
different from yours. For 687 students, project updates will be due on the
dates provided, to ensure you are on track and have no outstanding
questions.
Project Goal: The goal of this term project is for you to use all of the skills you have
developed in the IST387/687 labs and homework to make sense of a novel dataset, to
perform some essential analyses on the dataset, and to explain and document what you
have done. The dataset contains summaries of air travel within the U.S., one row per
customer, per trip.
Accessing Your Data File: The data will be available to you. The file contains about
32 columns/variables. Each row represents one customer’s airplane trip from an  
origin to a destination.  
Recommended Project Phases  
Data Pre-processing / Data Preparation Phase
• Phase 1: Mitigate Missing Data. There are several columns in the dataset that may contain  
missing data. Write code that examines each column to see if it contains missing data. To  
mitigate missing data, use mean substitution for numeric variables. Use comments in your  
code to document how many missing data values you had to repair.   
• Phase 2: Summarize variables. For each numeric variable, create a histogram. Add a  
comment that describes the shape of the histogram as symmetric, positively skewed (long  
right tail), or negatively skewed (long left tail). For each factor variable (e.g., Gender), use  
the table() command to summarize how many observations are in each category.   
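As an illustration of Phase 1, the following is a minimal sketch of checking for missing data and applying mean substitution. The data frame `trips` and its column names (`Age`, `DepartureDelay`, `Gender`) are hypothetical placeholders, not the columns of your actual data file.

```r
# Hypothetical toy data; column names are placeholders, not the real dataset's.
trips <- data.frame(
  Age = c(34, NA, 51, 27),
  DepartureDelay = c(5, 12, NA, NA),
  Gender = factor(c("Female", "Male", "Female", "Male"))
)

# Count the missing values in every column.
missing_counts <- colSums(is.na(trips))
print(missing_counts)

# Mean substitution: replace each NA in a numeric column with that column's mean.
numeric_cols <- names(trips)[sapply(trips, is.numeric)]
for (col in numeric_cols) {
  col_mean <- mean(trips[[col]], na.rm = TRUE)
  trips[[col]][is.na(trips[[col]])] <- col_mean
}

# For this toy data: repaired 1 value in Age and 2 values in DepartureDelay.
```

Recording the counts before the repair, as `missing_counts` does here, makes it easy to write the required comment about how many values were fixed.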
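For Phase 2, a sketch along the following lines may help. It again uses a hypothetical toy data frame; `DepartureDelay` and `Gender` are assumed names chosen only for illustration.

```r
# Hypothetical toy data; column names are placeholders, not the real dataset's.
trips <- data.frame(
  DepartureDelay = c(0, 0, 2, 3, 5, 8, 15, 40, 120),
  Gender = factor(c("Female", "Male", "Female", "Female", "Male",
                    "Male", "Female", "Male", "Female"))
)

# Histogram of one numeric variable.
hist(trips$DepartureDelay,
     main = "Departure delay",
     xlab = "Minutes")
# Shape: positively skewed (long right tail). Most delays are short,
# and a few very long delays stretch the right side of the plot.

# Count the observations in each category of a factor variable.
gender_counts <- table(trips$Gender)
print(gender_counts)
```

In a real analysis you would repeat the `hist()` call for each numeric column and the `table()` call for each factor column.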
Exploratory Analysis Phase    
• Phase 3: Predictive Modeling. Many columns contain data relating to the characteristics of
each customer’s trip. Using the modeling techniques we learned in class (linear
modeling, association rules, SVM), develop 3-5 different predictive models that analyze the
data.
• Phase 4: Map Low Satisfaction Routes. Subset your data to create a smaller data set  
containing only the trips where customers reported the lowest levels of satisfaction. The  
latitude and longitude of each origin and destination is shown in the data set. Use ggplot to  
place route curves onto an outline map of the U.S. states. The geom_curve() geometry  
supports this kind of plotting.   
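As one possible starting point for Phase 3, here is a sketch of a linear model fitted with base R's `lm()`. The data are simulated purely for illustration, and the column names (`Age`, `DepartureDelay`, `Satisfaction`) are assumptions rather than the names in your data file. Association rules and SVM models would need additional packages and are not shown.

```r
# Simulated data for illustration only; column names are hypothetical.
set.seed(42)
n <- 200
trips <- data.frame(
  Age = sample(18:80, n, replace = TRUE),
  DepartureDelay = rexp(n, rate = 1 / 15)
)
# Simulated outcome: satisfaction drops slightly as the delay grows.
trips$Satisfaction <- 4 - 0.02 * trips$DepartureDelay + rnorm(n, sd = 0.5)

# Fit a linear model predicting satisfaction from two trip characteristics.
model <- lm(Satisfaction ~ Age + DepartureDelay, data = trips)
print(summary(model))

# Predict satisfaction for one new, hypothetical trip.
new_trip <- data.frame(Age = 40, DepartureDelay = 60)
predicted <- predict(model, newdata = new_trip)
print(predicted)
```

The sign and size of each coefficient in the summary indicate how the predicted satisfaction changes with that variable, which is the part worth translating into plain language for the presentation.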
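For Phase 4, the sketch below shows one way to draw route curves over a U.S. state outline. It assumes the ggplot2 and maps packages are installed (`map_data("state")` relies on maps). The `routes` data frame, its column names, and its approximate coordinates are hypothetical stand-ins for the latitude and longitude columns in your data file.

```r
library(ggplot2)  # map_data("state") also requires the 'maps' package

# Hypothetical toy routes with approximate coordinates; names are placeholders.
routes <- data.frame(
  OriginLon = c(-73.78, -87.90, -118.41),
  OriginLat = c(40.64, 41.98, 33.94),
  DestLon = c(-118.41, -84.43, -104.67),
  DestLat = c(33.94, 33.64, 39.86),
  Satisfaction = c(1, 1, 2)
)

# Keep only the trips with the lowest reported satisfaction.
low_sat <- routes[routes$Satisfaction == min(routes$Satisfaction), ]

# Outline polygons for the lower 48 states.
us_states <- map_data("state")

route_map <- ggplot() +
  geom_polygon(data = us_states,
               aes(x = long, y = lat, group = group),
               fill = "white", color = "grey60") +
  geom_curve(data = low_sat,
             aes(x = OriginLon, y = OriginLat,
                 xend = DestLon, yend = DestLat),
             curvature = 0.2, color = "red") +
  coord_fixed(ratio = 1.3)  # Cartesian coordinates, which geom_curve() requires

print(route_map)
```

`coord_fixed()` is used here instead of a map projection because `geom_curve()` only draws in linear coordinate systems.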
Business Recommendations Development Phase  
• Phase 5: Make Sense of Low Satisfaction Segments. The client wants to know why
customers become dissatisfied with their air travel. Use insights from Phase 3 and Phase 4  
to explain why certain trips have low satisfaction. Conduct any appropriate follow-up  
analyses to provide evidence for your ideas. Make sure to document any additional code  
with appropriate comments.   
• Phase 6: Develop Marketing Plan. Identify three interesting Market Segments. Define the
demographic characteristics associated with each Market Segment. Finally, recommend
three ideas for each segment that you believe would increase the Net Promoter Score (NPS) for the segment.
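To compare segments in Phases 5 and 6, it can help to compute the NPS for each one. The sketch below uses the standard NPS definition (percentage of promoters scoring 9-10 minus percentage of detractors scoring 0-6) and assumes a 0-10 "likelihood to recommend" score is available. The `survey` data frame and its column names are hypothetical, so adapt them to whatever your data file provides.

```r
# Hypothetical toy data; column names and the 0-10 scale are assumptions.
survey <- data.frame(
  Segment = c("Business", "Business", "Business", "Business",
              "Leisure", "Leisure", "Leisure", "Leisure"),
  LikelihoodToRecommend = c(10, 9, 7, 3, 9, 6, 5, 8)
)

# NPS = percent promoters (scores 9-10) minus percent detractors (scores 0-6).
nps <- function(scores) {
  promoters <- mean(scores >= 9)
  detractors <- mean(scores <= 6)
  100 * (promoters - detractors)
}

# Compute the NPS separately for each market segment.
nps_by_segment <- tapply(survey$LikelihoodToRecommend, survey$Segment, nps)
print(nps_by_segment)
```

Segments with a low or negative NPS are natural candidates for the three improvement ideas the marketing plan asks for.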
Your presentation should provide the client (presumably the executive leaders of the
airline) with an explanation of your results in language that is suitable for an
executive to understand. Your report should contain the data and visualizations that
support the insights and recommendations you are trying to communicate to the client.