BISM7217 – 2020 S1 – Assignment 1 
Advanced Business  
Data Analytics  
BISM7217  
ASSIGNMENT 1 
Summary   
• Type: Project report  
• Learning Objectives Assessed: 1, 2, 3, 4, 5  
• Due Date: 20 Apr 2020 11 AM  
• Deliverable: A written report submitted via TurnItIn and a RapidMiner process  
• Weight: 30%  
This assignment is an individual assignment. The aim is to provide experience in the steps involved  
with creating, evaluating, and improving classification models, and finally presenting and interpreting the
model in a business report. You are strongly encouraged to commence this assignment by the end of  
the third week of the semester, and you should progress thoughtfully through the steps. Hasty decisions  
made early in the design process may result in much more work later.   
Feel free to discuss concepts and ideas with peers, but remember your submission must be your own work.
Be careful not to allow anyone to copy your work.  
Specification  
Direct marketing is a form of advertising which allows organizations to communicate directly to  
customers through a variety of media, including cell phone calls and emails. As selecting the best set
of clients, i.e., those that are more likely to subscribe to a product, is a complex task (Nobibon et al., 2011),
various technologies should be employed to improve marketing by focusing on specific customers, thus  
allowing companies to build more extended relations aligned with their business strategies (Rust et al.,  
2010). Centralizing customer remote interactions in a contact center eases operational management of  
campaigns, and communicating with customers through the telephone is one way to conduct direct  
marketing activities (Moro et al., 2014). Marketing operationalized through a contact center is called  
telemarketing (Kotler et al., 2009). In the banking industry, deciding on the target customers for  
telemarketing is of crucial importance, under a growing pressure to increase profits and reduce costs.  
Banks are now pressured to increase capital requirements in various ways, including capturing more  
long term deposits (Moro et al., 2014).   
Under this context, the use of predictive modeling based on previous data to predict the result of a
telemarketing phone call to sell long term deposits is a valuable tool to support client selection decisions  
of bank campaign managers. As an analyst in BOP, a Portuguese bank, you are going to propose a  
classification model that can predict the result of a phone call to sell long term deposits. Such a model  
is valuable to assist managers of BOP bank in prioritizing and selecting the next customers to be  
contacted during bank marketing campaigns. Your model will help managers, including the Director of  
Market Intelligence, to analyze the probability of success. Consequently, the time and costs of such  
campaigns would be reduced, and by performing fewer and more effective phone calls, client stress and  
intrusiveness would be diminished.  
Dataset  
The data is related to direct marketing campaigns of BOP bank. The marketing campaigns were based  
on phone calls. Often, more than one contact with the same client was required to assess if the product  
is of interest to a customer. The provided dataset contains 41188 records and 20 inputs, ordered by date  
(from May 2008 to November 2010). The classification goal is to predict if the client will subscribe
(yes/no) to a term deposit (subscription variable).
There are 4 types of input variables and only 1 target/label/special variable:  
A) Bank client data:  
1. Age (type: numeric)  
2. Job: type of job (type: categorical)  
3. Marital: marital status (type: categorical)  
4. Education (type: categorical)  
5. Default: has credit in default? (type: categorical)  
6. Housing: has a housing loan? (type: categorical)  
7. Loan: has a personal loan? (type: categorical)  
B) Related with the last contact of the current campaign:  
1. Contact: contact communication type (type: categorical)  
2. Month: last contact month of the year (type: categorical)  
3. Day_of_week: last contact day of the week (type: categorical)  
4. Duration: last contact duration, in seconds (type: numeric).   
Important note: The duration attribute profoundly affects the output target. For example, if the
duration is zero, then the target y (Subscription) is most likely “no”. Yet the duration is not known
before a call is performed, and once the call has ended y is already known. Thus, you should discard
the duration attribute if you intend to have a realistic predictive model.
C) Other attributes:  
1. Campaign: number of contacts performed during this campaign and for this client (type:  
numeric)  
Note: This attribute includes the last contact.  
2. Pdays: number of days that passed by after the client was last contacted from a previous  
campaign (type: numeric)  
Note: 999 means the client was not previously contacted.  
3. Previous: number of contacts performed before this campaign and for this client (type: numeric)  
4. Poutcome: outcome of the previous marketing campaign (type: categorical)  
D) Social and economic context attributes  
1. Emp.var.rate: employment variation rate - quarterly indicator (type: numeric)  
2. Cons.price.idx: consumer price index - monthly indicator (type: numeric)  
3. Cons.conf.idx: consumer confidence index - monthly indicator (type: numeric)  
4. Euribor3m: euribor 3 month rate - daily indicator (type: numeric)  
5. Nr.employed: number of employees - quarterly indicator (type: numeric)  
E) Output variable (desired target):  
• Subscription: indicates if the client subscribed to a term deposit (type: binary)  
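The two notes in the attribute list (discard Duration for a realistic model; 999 in Pdays means no previous contact) can be sketched as a small preprocessing step. The Python below only illustrates the logic, since the analysis for this assignment must be carried out in RapidMiner. The record layout, the sample values, and the added `Previously_contacted` flag are assumptions made for the sketch, and the actual column names in the data file may differ.

```python
from typing import Any, Dict, List

NOT_CONTACTED = 999  # per the dataset notes: 999 means no previous contact


def prepare_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one record without Duration and with Pdays == 999 made explicit."""
    # Duration is only known after the call, so it is dropped.
    cleaned = {key: value for key, value in record.items() if key != "Duration"}
    contacted = cleaned.get("Pdays") != NOT_CONTACTED
    # Hypothetical helper attribute; it is not part of the provided dataset.
    cleaned["Previously_contacted"] = contacted
    if not contacted:
        # Avoid treating the 999 sentinel as a real number of days.
        cleaned["Pdays"] = None
    return cleaned


def prepare(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return [prepare_record(record) for record in records]


# Made-up sample rows, for illustration only.
sample = [
    {"Age": 41, "Duration": 0, "Pdays": 999, "Subscription": "no"},
    {"Age": 35, "Duration": 320, "Pdays": 6, "Subscription": "yes"},
]
prepared = prepare(sample)
```

Keeping a separate flag is one possible design choice: it preserves the "never contacted" information while stopping 999 from being read as a very long gap between contacts.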
Deliverables  
Your report should include the following parts:
• Executive summary: Include those results that are most significant for your strategy  
development and recommendations and justify them.   
• Introduction or data exploration  
• Model building.   
• Model evaluation   
It is up to you to decide what proportion of your report goes to each part. You may include tables or
charts of your analysis and models. At the end of your analysis, your RapidMiner process should be
exported to your desktop or laptop in .rmp format and then uploaded along with your report.   
The consistency of your .rmp file will be checked with the results in your report. You do not need to  
provide the screenshots of your RapidMiner process, as the marker can observe them from your .rmp  
file. Consider the following points for designing your process:  
• You need to create only one .rmp file with as many operators and outputs as are needed.
• You should not modify the “BISM7217_2020_S1_A1_Data.xlsx” file before importing it into
RapidMiner.
• All of your analysis should be done after importing “BISM7217_2020_S1_A1_Data.xlsx” into
RapidMiner, not in Excel or any other analytical tool.
• The process should start with loading the “BISM7217_2020_S1_A1_Data.xlsx” file from your
desktop.
Formatting and professionalism  
The project report is to be written to a professional standard. This requires a formal writing style (do
not use dot points) and a professional tone. Given the report’s nature, you may choose to write
this report in the first person. The report must be consistent with the University’s policies on academic
integrity, plagiarism, and consequences as noted below. The report should be typed (in Times Roman  
12-point font or larger, single-spaced) and the Word Count should be 1500 words (+/- 10%) in total  
length. The Word Count excludes the title page, tables, footnotes and references (if required). The word  
limit must be observed or the assessment will be affected as noted in the Rubric.  No appendices are to  
be provided.   
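The word limit of 1500 words (+/- 10%) works out to a range of 1350 to 1650 words. A minimal sketch of that arithmetic follows; treating both bounds as inclusive is an assumption, as is the helper name `within_word_limit`.

```python
TARGET_WORDS = 1500
TOLERANCE_PERCENT = 10


def word_count_bounds(target: int = TARGET_WORDS, percent: int = TOLERANCE_PERCENT):
    """Return the (lower, upper) word counts allowed by a +/- percent tolerance."""
    lower = target * (100 - percent) // 100
    upper = target * (100 + percent) // 100
    return lower, upper


def within_word_limit(count: int) -> bool:
    # Assumption: both ends of the range are acceptable.
    lower, upper = word_count_bounds()
    return lower <= count <= upper


bounds = word_count_bounds()
```

Remember that the count excludes the title page, tables, footnotes and references, so those should be left out before checking.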
Submission  
To be done through Blackboard Assignment Submission and TurnItIn as indicated in Learn.UQ.    
Acceptable submission formats are Microsoft Word and PDF formats for the reports and .rmp for the  
process. The files MUST be named in the format of BISM7217_StudentLastName_StudentID.pdf (or
with a .docx or .doc extension). If your ID is 41724593 and your surname is Mory, the name of your files
would be BISM7217_Mory_41724593.pdf. The written assignment file should not be zipped.  
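The naming rule can be sketched as a small helper. The function name `report_filename` is hypothetical, and restricting the extension to the three report formats named above is an assumption of the sketch.

```python
ALLOWED_REPORT_EXTENSIONS = {"pdf", "docx", "doc"}


def report_filename(last_name: str, student_id: str, extension: str = "pdf") -> str:
    """Build a report file name in the BISM7217_StudentLastName_StudentID format."""
    ext = extension.lower().lstrip(".")
    if ext not in ALLOWED_REPORT_EXTENSIONS:
        raise ValueError(f"unsupported report format: {extension!r}")
    return f"BISM7217_{last_name}_{student_id}.{ext}"


example_name = report_filename("Mory", "41724593")
```

With the surname and ID from the example above, this yields BISM7217_Mory_41724593.pdf.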
Plagiarism  
It is understandable that students talk with each other regularly, and discuss problems and potential  
solutions. However, it is expected that the submitted assignment is a unique document – all parts of the  
assignment are to be completed solely by the individual student.  In cases where an assignment is  
perceived to not be a unique work, a loss of marks and other implications can result.  For further  
information about academic integrity, plagiarism and consequences, please visit:  
http://ppl.app.uq.edu.au/content/3.60.04-student-integrity-and-misconduct.  
Frequently Asked Questions  
Question: How can I format my report?  
Answer: The most common approach is to use 4 parts: 1) Executive summary, 2) Introduction, 3)
Model building and 4) Model evaluation. You may wish to add other sections such as Conclusion (Optional)
or References (Optional).
Question: What should I include in ES?  
Answer: Executive Summary (ES) is the essence of your work that should be very brief. Since your  
report is a maximum of 1500 words, it is better not to aim for more than 200 words for ES, but again it
is your choice, and it is essential to provide a quality and persuasive report.  
Question: What can I discuss in the model building section?   
Answer: You can discuss the following items in this section: How did you build the various models? Did
you change the parameters, and why? Did you try to improve your models, and how? Could you improve
your models?
Question: What should I include in the model evaluation section?  
Answer: How did you evaluate your models? What metrics did you use, and why? Which model
performed better, and why do you think so? Can you rely on your results, and why?
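As one way to think about the choice of metrics, the sketch below derives accuracy, precision, recall and F1 from the four cells of a binary confusion matrix. The counts are invented for illustration and are not results from the assignment data; in practice these figures would come from the performance output of your RapidMiner process.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common metrics from confusion-matrix counts (positive class = "yes")."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 40 true positives, 20 false positives,
# 60 false negatives, 880 true negatives.
metrics = classification_metrics(tp=40, fp=20, fn=60, tn=880)
```

With these made-up counts the accuracy is 0.92 while the recall is only 0.40, which shows why a single metric may not be enough to justify which model performed better.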
Question: What are the expectations when describing a Decision Tree (DT)? Do we need to talk about  
every branch?  
Answer: The advantage of DTs is that they are very intuitive, and you can interpret them by elaborating
on their branches. So, yes, but you do not need to elaborate on all of them. You can pick some of the more
indicative ones and elaborate on them. You can also use model improvement techniques, such as AdaBoost,
Bagging, and Random Forests, along with decision trees, and elaborate on those as well.
Question: Do I have to have all the DTs with different configurations/and different model improving  
methods in the .rmp file, to show how I tried different modeling? Or is it ok to have only the models  
that I am satisfied with and that I decide to use in my report?  
Answer: You can submit only the process of the models that you discuss in your report. But it is worth
mentioning in your report the additional work you have done.
Question: How can I export the figures generated in RapidMiner to my report?  
Answer: You can use the Windows Snipping Tool.
Question: Which one is more important, accuracy or presentation? And how high an accuracy are we
expected to reach?
Answer: Your approach, the undertaken steps, and their justification are more important than the final  
accuracy level. You need to show that you tried your best, but if available data is not enough for  
achieving higher accuracy, it is not your fault. It is the maximum that we can learn from the available  
data.  
Question: Can I upload as many .rmp files as I like?
Answer: We prefer only one process.  
Question: Whenever I choose to export the process from the File menu and save it on my
computer, I am unable to find it. What should I do?
Answer: Make sure to choose your desktop as the location while exporting the .rmp file.
Administrative Requirements  
Consultation sessions  
To ensure that an equal and sufficient amount of time is allocated for every student who attends  
consultation sessions regarding the practical aspects of BISM7217, the average consultation time  
(during busy consultation times) will be limited to 5 minutes per student. The main aim of this restriction  
during busy periods is to ensure student equity and minimise waiting time. However, in circumstances  
where no other students are waiting, longer consultation times will be provided.    
Tutors will advise you of their consultation times during tutorials – these details are also available on  
the BISM7217 Blackboard site under “Contacts”.
Submission Date  
11 AM 9th April 2020. For each calendar day (i.e. including Saturdays and Sundays) or part thereof after
the submission deadline, a penalty of 5% of the total possible assignment marks will be deducted until  
the assignment is submitted.  
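The late penalty rule can be illustrated with a short calculation. This is a sketch under stated assumptions: the deadline in the example is a placeholder (use the official due date), the cap at the total marks is assumed, and the 20 marks in the example is taken from the marking rubric section.

```python
import math
from datetime import datetime, timedelta

PENALTY_PERCENT_PER_DAY = 5  # 5% of total possible marks per calendar day or part thereof


def late_penalty(deadline: datetime, submitted: datetime, total_marks: float) -> float:
    """Return the marks deducted for a submission made after the deadline."""
    if submitted <= deadline:
        return 0.0
    # "or part thereof": any started day counts as a full day, hence the ceiling.
    days_late = math.ceil((submitted - deadline) / timedelta(days=1))
    penalty = total_marks * PENALTY_PERCENT_PER_DAY * days_late / 100
    # Assumption: the deduction cannot exceed the marks available.
    return min(float(total_marks), penalty)


# Placeholder deadline for illustration only.
example_deadline = datetime(2020, 1, 1, 11, 0)
example_penalty = late_penalty(example_deadline, example_deadline + timedelta(hours=25), 20)
```

In this example a submission 25 hours late counts as two days late, so 10% of 20 marks (2 marks) would be deducted.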
Deadline extensions  
An extension to the assignment deadline will only be considered for legitimate reasons and with  
supporting documentation (e.g. medical certificate). A request for an extension is assessed by the  
Assessment, Examinations & Misconducts Coordinator. You may discuss your situation with your
course coordinator, but you still need to make a formal extension request using the form identified on  
the Electronic Course Profile for this course. Extensions will not be granted where the School is not
satisfied that you took reasonable measures to avoid the circumstances that contributed to you not
submitting by the due date. The following are not grounds for an extension:  
• holiday arrangements (including overseas travel)  
• misreading a due date  
• social and leisure events  
• moving house  
• the pressure of work/competing deadlines  
• computer issues  
Please refer to the Electronic Course Profile for this course for more detail.  
Marking rubric  
Your report will be graded on its structure, rationale, arguments, use of academic support/sources, and  
overall presentation quality.  This assignment is worth 20 marks. The marking rubric on the next page  
is designed to reflect a marking schema of 100 points that are scaled back to 20 marks.  Part marks are  
rounded up or down to the nearest half mark.
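The scaling from 100 rubric points to 20 marks, rounded to the nearest half mark, can be sketched as follows. How exact ties are rounded is not stated in the specification; the sketch assumes ties round up, and the function name `scaled_mark` is hypothetical.

```python
import math


def scaled_mark(points: float, max_points: int = 100, max_marks: int = 20) -> float:
    """Scale rubric points to assignment marks, rounded to the nearest half mark."""
    # Work in half-mark units so that rounding to the nearest unit gives half marks.
    half_units = points * max_marks * 2 / max_points
    # Assumption: exact ties round up.
    return math.floor(half_units + 0.5) / 2


example_mark = scaled_mark(73)
```

For instance, 73 rubric points scale to 14.6 marks, which rounds to 14.5.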