MA4601/MAT061 Stochastic Search and Optimisation 
Assignment 4: Multi-armed Bandits 
Due 12:00 mid-day, Thursday 23rd April 
The goal of this assignment is to explore the tradeoff between exploration and exploitation 
in multi-armed bandit heuristics. 
You will need to submit two files: a programme file titled YOUR NAME programme.r (or .py, 
.jl, etc.) and a report as a pdf file titled YOUR NAME report.pdf. Submission by email to 
. The report should be presented as a stand-alone document that 
can be understood without having to read your code. It should be no more than four pages 
long. 
Consider the following modifications of the -greedy, UCB1, and Bayesian decision rules. 
-greedy For some ρ, with probability 1 − ρ/t choose the bandit with highest θˆi, otherwise 
choose a bandit uniformly at random. 
UCB(ρ) 
i(t) = arg maxi 
( 
θˆi(t− 1) + 
√ 
ρ log t 
Ti(t− 1) 
) 
. 
Bayesian Let q(Θi(t), ρ) be the 100ρ percentage point of Θi(t), then 
i(t) = arg maxiq(Θi(t), ρ). 
Implement these decision rules and compare their performance using 10 multi-armed bandits 
with randomly chosen returns. 
Use Bayesian Global Optimisation to find the optimal value of ρ in each case. You may use 
the function BayesianOptimization from the R package rBayesianOptimization. 
Marks will be allocated on the following basis: 
50% Code correctness (how well does it work). 
25% Quality of analysis (what have we learnt about these decision rules). 
25% Clarity of report.