CSE 231 Sumer 2018 Computer Project #4 
Asignment Overview 
This asignment focuses on the implementation of Python programs to read files and proces data by 
using lists and functions. 
Asignment Deliverable 
The deliverable for this asignment is the following file: 
proj04.py – the source code for your Python program 
Be sure to use the specified file name and to submit it for grading via the Mirmir system before the 
project deadline. 
Asignment Background 
One commonly hears reference to “the one percent” refering to the people whose income is in the 
top 1% of incomes. What is the data behind that number and where do others fal? Using the 
National Average age Index (AWI), an index used by the Social Security Administration to gauge 
a individual's earnings for the purpose of calculating their retirement benefit, we can answer such 
questions. 
In this project, you wil proces AWI data. Example data for 2014 and 2015 is provided in the files 
year2014.txt and year2015.txt (2015 is the most recent year of complete data—the 2016 
data isn’t available until October). The data is a table with the first row as the title and the second row 
defining the data fields; remaining rows are data. Note that the 2014 data is nicely formated in 
columns, but the 2015 data is not. The URL for the data is: https:/ww.sa.gov/cgi-
bin/netcomp.cgi?year=2015 
Here is the second line of data from the file followed by descriptions of the data. Notice that some 
data are ints and some are floats: 
5,000.00 - 9,999.99 13,848,841 36,423,281 23.02549 102,586,913,092.61 7,407.62 
Column 0 is bottom of this income range. 
Column 1 is the hyphen separating the bottom of the range from the top. 
Column 2 is the top of this income range. 
Column 3 is the number of individuals in the income range. 
Column 4 is the cumulative number of individuals in this income range and al lower ranges. 
Column 5 is the Column 4 value represented as a cumulative percentage of al individuals. 
Column 6 is the combined income of al the individuals in this range of income. 
Column 7 is the average income of individuals in this range of income. 
Asignment Specifications 
1. The program ust provide following functions to extract some statistics. Note that the 
data_list parameter specified in these functions may be the same for al functions or 
diferent for diferent functions—that is your choice. A skeleton file is provided on Mirmir. 
a) open_file()prompts the user to enter a year number for the data file. The program wil 
check whether the year is betwen 1990 and 2015 (both inclusive). If year number is valid, 
the program wil try to open data file with file name ‘yearXX.txt’, where XX is the 
year. Appropriate eror mesage should be shown if the data file cannot be opened or if the 
year number is invalid. This function wil loop until it receives proper input and 
succesfully opens the file. It returns a file pointer and year. 
i. Hint: use string concatenation to construct the file name 
b) read_file(fp)has one parameter, a file pointer read. This function returns a list of your 
choosing containing data you need for other parts of this project. 
c) find_average(data_list) takes a list of data (of some organization of your 
choosing) and returns the average salary. The function does not print anything. Hints: 
i. This is NOT (!) the average of the last column of data. It is not mathematicaly valid to 
find an average by finding the average of averages—for example, in this case there are 
many more in the lowest category than in the highest category. 
ii. How many wage earners are considered in finding the average (denominator)? There 
are a couple of ways to determine this. I think the easiest uses the “cumulative number” 
column (Column 4), but using Column 3 is not hard and may make more sense to some 
students. 
iii. How does one find the total dollar value of income (numerator)? Notice that “Column 6 
is the combined income of al the individuals in this range of income.” 
iv. For testing your function notice that for the 2014 data the average should be $44,569.20. 
As a check, note that that value is listed on the web page referenced above. 
d) find_median(data_list) takes a list of data (of some organization of your choosing) 
and returns the median income. The function does not print anything. Unfortunately, this 
file of data is not sufficient to find the true median so we need to approximate it. 
i. Here is the rule we wil use: find the data line whose cumulative percentage (Column 5) 
is closest to 50% and return its average income (Column 7). If both data lines are 
equaly close, return either one. 
ii. Hint: Python’s abs() function (absolute value) is potentialy useful here. 
iii. Hint: your get_range() function should be useful here. 
iv. For testing your function, using our rule the median income for the 2014 data is 
$27,457.00 
e) get_range(data_list, percent) takes a list of data (of some organization of your 
choosing) and a percent (float) and returns the salary range as a tuple (Columns 0 and 2) 
for the data line whose cumulative percentage (Column 5) is greater than or equal to the 
percent parameter, the cumulative percentage value (Column 5) and the average income 
(Column 7). Stated another way: ((col_0,col_2),col_5,col_7) The function 
does not print anything. 
i. For testing using the 2014 data and a percent value of 90 your function wil return 
((90000.0, 94999.99), 90.80624, 92420.5) 
f) get_percent(data_list, income) takes a list of data (of some organization of 
your choosing) and an income (float) and returns the cumulative percentage (Column 5) 
for the data line that the specified income is in the income range (Columns 0 and 2), and 
income range (Columns 0 and 2) . Stated another way: ((col_0,col_2),col_5) The 
function does not print anything. 
i. For testing using the 2014 data and an income value of 150,000 your function wil return 
((150000.0, 154999.99), 96.87301) 
g) do_plot(x_vals,y_vals,year) provided by us takes two equal-length lists of 
numbers and plots them. Note that if you plot the whole file of data, the income ranges are 
so skewed that the result is a nearly vertical plot at the leftmost edge so close to the edge that 
you cannot se it in the plot—it looks like nothing was plotted. Plotting the lowest 40 
income ranges results in a more easily readable plot. 
2. main() 
a) Open the data file 
b) Read the data file (using the file pointer from the opened file). 
c) Print the year, the average income, and the median income (and a header). Here is the 
output format that I used: "{:<6d}${:<14,.2f}${:<14,.2f}" 
d) Prompt whether to plot the data and if “yes”, plot the data: cumulative percentage (Column 
5) vs. income (Column 0) – only the lowest 40 income ranges. 
e) Loop, prompting for either “r” for range , “p” for percent, or nothing 
i. r: prompt for a percent (float) and output the income that is below that percent. Print an 
eror mesage, if an invalid number is entered (a percent must be betwen 0 and 100). 
Here is the output format that I used: 
"{:4.2f}% of incomes are below ${:<13,.2f}." 
ii. p: prompt for an income (float) and output the percent that earn more. Print an eror 
mesage, if an invalid income is entered (income must be positive). Here is the output 
format that I used: 
"An income of ${:<13,.2f} is in the top {:4.2f}% of incomes." 
iii. if only a cariage-return is entered, halt the program. 
3. Cal main() using 
if __name__ == "__main__": 
main() 
Asignment Notes 
1. Items 1-9 of the Coding Standard wil be enforced for this project. 
2. Files for year2014.txt and year2015.txt are provided so that you can test your program. 
We wil also test you on the year 2000 data, but are not sharing that file with you. 
3. Note that most data has commas. I wrote functions that converted a string with commas into a 
number without commas. I wrote separate functions for int and float, but you may find that 
one combined function suits your needs. I used a try-except statement in case the string 
wasn’t realy a number. 
4. For output you need to insert commas. There is a format specification, e.g. if you might have 
formated a floating-point value without commas as {:<12.2f} you can simply insert a comma 
before the dot as in {:<12,.2f}. 
5. There are multiple ways to handle the “and over” wording in the last line of the input files. 
One way you might not have thought of uses the special value float(“inf”) which represents 
infinity in the sense of a value bigger than al others. 
Suggested Procedure 
• Solve the problem using pencil and paper first. You cannot write a program until you have 
figured out how to solve the problem. This first step may be done collaboratively with 
another student. However, once the discussion turns to Python specifics and the subsequent 
writing of Python statements, you must work on your own. 
• Construct the program one function at a time—testing before moving on. 
• Use the Mirmir system to turn in the first version of your solution. 
Cycle through the steps to incrementaly develop your program: 
o Edit your program to add new capabilities. 
o Run the program and fix any erors. 
o Use the Mirmir system to submit the current version of your solution. 
• Be sure to log out when you leave the room, if you’re working in a public lab. 
Tests 
There are unit tests for functions: find_average, find_median, get_range, and 
get_percent. The tests al cal your read_file function to get your data structure to pas 
to those functions. The file read for these unit tests is the 2014 data. 
Test 1 
Enter a year where 1990 <= year <= 2015: 2014 
Year Mean Median 
2014 $44,569.20 $27,457.00 
Do you want to plot values (yes/no)? no 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: 
Test 2 
Enter a year where 1990 <= year <= 2015: 2014 
Year Mean Median 
2014 $44,569.20 $27,457.00 
Do you want to plot values (yes/no)? no 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r 
Enter a percent: 90 
90.00% of incomes are below $90,000.00 . 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p 
Enter an income: 100000 
An income of $100,000.00 is in the top 92.57% of incomes. 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: 
Test 3 
Enter a year where 1990 <= year <= 2015: xxxx 
Error in year. Please try again. 
Enter a year where 1990 <= year <= 2015: 1900 
Error in year. Please try again. 
Enter a year where 1990 <= year <= 2015: 1999 
Error in file name: year1999.txt Please try again. 
Enter a year where 1990 <= year <= 2015: 2015 
Year Mean Median 
2015 $46,119.78 $27,459.59 
Do you want to plot values (yes/no)? no 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: x 
Error in selection. 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r 
Enter a percent: 104 
Error in percent. Please try again 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r 
Enter a percent: -2 
Error in percent. Please try again 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r 
Enter a percent: 90 
90.00% of incomes are below $90,000.00 . 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p 
Enter an income: -20 
Error: income must be positive 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p 
Enter an income: 100000 
An income of $100,000.00 is in the top 92.03% of incomes. 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: 
Test 4 
Enter a year where 1990 <= year <= 2015: 2000 
Year Mean Median 
2000 $30,846.09 $17,471.75 
Do you want to plot values (yes/no)? no 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r 
Enter a percent: 40 
40.00% of incomes are below $15,000.00 . 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p 
Enter an income: 20000 
An income of $20,000.00 is in the top 56.96% of incomes. 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: 
Test 5 
(not on Mirmir because this tests the plot – TAs wil run this test.) 
Enter a year where 1990 <= year <= 2015: 2015 
Year Mean Median 
2015 $46,119.78 $27,459.59 
Do you want to plot values (yes/no)? yes 
Enter a choice to get (r)ange, (p)ercent, or nothing to stop: 
Grading Rubric 
Computer Project #4 Scoring Summary 
General Requirements