15-122 Programming Homework 11  12 Page 1 of 16 
15-122: Principles of Imperative Computation, Spring 2020 
Programming Homework 11  12: The C0VM 
Due: Thursday 23 
rd 
April, 2020 and 
Thursday 30 
th 
April, 2020 by 9pm 
In this assignment you will implement a virtual machine for C0, the C0VM. It has been 
influenced by the Java Virtual Machine (JVM) and the LLVM, a low-level virtual machine 
for compiler backends. We kept its definition much simpler than the JVM, following the 
design of C0. Bytecode verification, one of the cornerstones of the JVM design, fell victim 
to this simplification; in this way the machine bears a closer resemblance to the LLVM. It 
is a fully functional design and should be able to execute arbitrary C0 code. 
The purpose of this assignment is to give you practice in writing C programs in the kind of 
application where C is indeed often used in practice. C is appropriate here because a virtual 
machine has to perform some low-level data and memory manipulation that is difficult to 
make simultaneously efficient and safe in a higher-level language. It should also help you 
gain a deeper understanding of how C0 (and, by extension, C) programs are executed. 
The code handout for this assignment is on Autolab. The file README.txt in the code 
handout goes over the contents of the handout and explains how to hand the assignment 
in. There is a TWENTY (20) PENALTY-FREE HANDIN LIMIT for the checkpoint , 
and a TWENTY (20) PENALTY-FREE HANDIN LIMIT for the full assignment. Every 
additional handin for each will incur a small (5%) penalty (even if using a late day). Your 
score for each part of this assignment (checkpoint and full) will be the score of your last 
Autolab submission. The checkpoint is for Tasks 1 and 2 only. This is far, far less than half 
the assignment. 
After the checkpoint, you can no longer earn points for Tasks 1 and 2, although the full 
autograder will continue to run tests against them. It will also run tests for Tasks 3 and 4, but 
not for Task 5 until after the handin deadline has passed. This means that you must do all 
your own testing for Task 5 and in order to earn full points you must convince yourself that 
you fully understand it and have tested it for correct behavior and for memory leaks where 
appropriate (you will see later in the writeup that certain memory leaks are unavoidable). 
About this writeup 
The C0VM is defined in stages, and we have test programs which exercise only part of the 
specification. We strongly recommend that you construct your implementation following 
these stages and debug and test each stage (on your own and with the autograder) before 
moving on to the next. Each part has its own challenges, but each part should be relatively 
small and self-contained. 
This document describes the structure of the C0VM first and then the instruction set 
c© Carnegie Mellon University 2020 
15-122 Programming Homework 11  12 Page 2 of 16 
(bytecodes) for the C0VM. After this, the document will describe the tasks you need to 
perform, step by step. Read this document very carefully as you prepare to do your work. 
1 The Structure of the C0VM 
Compiled code to be executed by the C0 virtual machine is represented in a byte code format, 
typically stored in a file ending in extension .bc0 which we call the bytecode file. This file 
contains numerical and string constants as well as byte code for the functions defined in the 
C0 source. The precise form of this file is specified in Appendix A and in lib/c0vm.h. 
1.1 The Type c0_value 
C0 has so-called small types int, bool, char, string, t[], and t*. Values of these types 
can be passed to or from functions and held in variables. In the C0VM, we will represent 
each of these types in one of two ways: the primitive types are represented as 32-bit signed 
integers (which we will abbreviate as w32) and the reference types are represented as void 
pointers (which we will abbreviate as *). 
C0 type C0VM type C type 
int w32 int32_t 
bool w32 int32_t 
char w32 int32_t 
t[] * void* 
t* * void* 
In lib/c0vm.h, we create a special type c0_value of C0 values. A c0_value can 
store both values of primitive type (which we write as x, i, or n) and values of reference or 
pointer type (which we write as a). We can turn primitive types into C0 values with the 
function int2val, and we can turn C0 values which we know to be primitive types back into 
integers with val2int. Similarly, ptr2val(x) and val2ptr(x) move between reference 
types (represented by void*) and C0 values. We always have val2int(int2val(x)) == x 
for any integer x and val2ptr(ptr2val(a)) == a for any pointer a. There's a function 
val_equal(v1,v2) which checks whether the two C0 values v1 and v2 are equal. 
1.2 Runtime Data 
The C0VM defines several types of data that are used during the execution of a program. 
You'll need to consider the first three of these (the operand stack, bytecode, and the program 
counter) to get started with Task 1. 
The execute function you are extending in this assignment is passed a struct bc0_file 
containing all the information from the bytecode. As you read this section, you should refer 
to both Appendix A and lib/c0vm.h where this struct is described. 
c© Carnegie Mellon University 2020 
15-122 Programming Homework 11  12 Page 3 of 16 
1.2.1 The Operand Stack (S) 
The C0VM is a stack machine, similar in design to Clac and the JVM. This means arithmetic 
operations and other instructions pop their operands from an operand stack and push their 
result back onto the operand stack. 
The C0VM is defined in terms of operand stack S, a stack of C0 values. Because these 
values are such an important part of the VM, we define stacks that only contain C0 values in 
lib/c0v_stack.h. We recommend writing some helper functions like push_int(S, i) and 
pop_ptr(S) early on so that you won't have to repetitively write c0v_push(S, int2val(i)) 
and val2ptr(c0v_pop(S)) over and over. 
1.2.2 Bytecode (P) 
In Clac, the program instructions were strings, and they were stored in a queue of strings. 
In the C0VM, instructions for the current C0 function are stored in an array of (unsigned) 
bytes (ubyte *P). Each function in a compiled C0 program is represented by a struct 
function_info that is stored in the array bc0->function_pool. The main() function 
that you want to run first is always stored as the first struct in this array, at index 0. When 
you start your VM, you should initialize P to be the bytecode code stored in that struct. 
You don't ever need to allocate any memory to store bytecode. Just refer to the bytecode 
given to the execute function. 
1.2.3 The Program Counter (pc) 
The program counter pc holds the address of the program instruction currently being exe- 
cuted. It always starts at 0 when you begin executing a function. Unless a non-local transfer 
of control occurs (goto, conditional branch, function call or return), the program counter is 
incremented by the number of bytes in the current instruction before the next instruction is 
fetched and interpreted. 
1.2.4 Local variables (V) 
Locals should be stored in a c0_value array. Every struct function_info has a field 
num_vars which specifies how big this array needs to be in order to store all the locals of 
that function. The four components S, P, pc, and V are everything that we need to know in 
order to represent the current state of any function. 
1.2.5 The Call Stack 
The C0VM has a call stack consisting of frames, each one containing local variables, a local 
operand stack, and a return address. The call stack grows when a function is called and 
shrinks when a function returns, deallocating the frame during the return. 
Every frame consists of the four components that represent the current state of some 
function call that got interrupted in order to call another function. It contains the operand 
stack S for computing expression values, a pointer P to the function's byte code, a return 
address pc, which is the address of the next instruction in the interrupted function, and an 
c© Carnegie Mellon University 2020 
15-122 Programming Homework 11  12 Page 4 of 16 
array of locals V. At any point during the execution there is a current frame as well as a 
calling frame. The latter becomes the current frame when a function returns to its caller. 
1.2.6 Constant Pools 
Numerical constants requiring more than 8 bits and all string constants occurring in the 
program will be allocated in constant pools, called the integer pool and the string pool. They 
never change during program execution. 
1.2.7 Function Pools 
Functions, either C0 functions defined in a source file or library functions, are kept in pools 
called the function pool and the native pool, respectively. Functions in the function pool are 
stored with their bytecode instructions, while functions in the native pool store an index into 
a table of C function pointers that the C0VM implementation can dereference and invoke. 
1.2.8 The Heap 
At runtime, the C0 heap contains C0 strings, C0 arrays (t[]) and C0 cells (t*). C0 arrays 
have to store size information so that dynamic array bounds checks can be performed. You 
will use the C heap to implement the C0 heap: that is, you will allocate strings, arrays, and 
cells directly using calls to xmalloc and xcalloc as defined earlier in this course. 
Since C0 is designed as a garbage-collected language, you will not be able to free space 
allocated on behalf of C0 unless you are willing to implement (or use) a garbage collector. 
We do not view this as a memory leak of the C0VM implementation. On the other hand, 
temporary data structures required for the C0VM's own operation should be properly freed. 
This means that when you are testing your own code using valgrind, you must be 
aware of which bc0 files should be free of valgrind memory leak reports and which should 
produce reports of memory leaks. The autograder will not penalize valgrind reports of 
memory leaks for tests that allocate space on the C0 heap. 
Note that the autograder will also not penalize for memory leaks for tests that are sup- 
posed to end in a runtime error. 
1.3 Runtime Errors 
In order to fully capture the behavior of C0 programs, you must correctly issue errors for 
things like dereferencing NULL, indexing an array outside of its bounds, and dividing by zero. 
Check the C0 Language Reference for details on what kinds of errors you must handle, and 
then use the following provided functions (defined in c0vm_abort.h and c0vm_abort.c) 
to issue appropriate error messages: 
void c0_user_error(char *err); // for calls to error() from C0 
void c0_assertion_failure(char *err); // for failed assertions from C0 
void c0_memory_error(char *err); // for memory-related errors 
void c0_arith_error(char *err); // for arithmetic-related errors 
c© Carnegie Mellon University 2020 
15-122 Programming Homework 11  12 Page 5 of 16 
For unexpected situations that arise while executing bytecode, situations which could indi- 
cate a bug in your VM, you may use the standard C library functions abort or assert to 
abort your program. See Section 3.3 for more details on this distinction. 
2 Instruction Set 
We group the instructions by type, in order of increasing complexity from the implementation 
point of view. We recommend implementing them in order and aborting with an appropriate 
message when an unimplemented instruction is encountered. Each task in this assignment 
corresponds to one or more sections, and tasks are summarized in Section 3.1. 
2.1 Stack Manipulation (Task 1) 
There are three instructions for direct stack manipulation without regard to types. 
0x59 dup S, v -> S, v, v 
0x57 pop S, v -> S 
0x5F swap S, v1, v2 -> S, v2, v1 
2.2 Arithmetic Instructions (Task 1) 
Arithmetic operations in C0 are defined using modular arithmetic based on a two's comple- 
ment signed representation. This does not match your implementation language (C) very 
well, where the result of signed arithmetic overflow is undefined. There are two solutions 
to this problem: first, because unsigned arithmetic overflow is defined to be modular arith- 
metic, you could cast int32_t values to uint32_t, perform unsigned arithmetic, then cast 
back. This should not be required in your implementations, however: we will compile your 
code with the -fwrapv command-line argument that forces gcc to treat integer arithmetic 
to be defined as signed two's complement arithmetic. (Remember, however, that this is not 
part of the C standard.) 
Casting signed integers to unsigned integers and/or using the -fwrapv command-line 
argument does not do everything that is needed to ensure C0 compliance, but it goes a long 
way. We recommend a careful reading of the arithmetic operations in the C0 Reference, as 
well as the notes on casting integers in C. 
For this implementation strategy to be correct, it is important to verify that our C 
environment does indeed use a two's complement representation and that the C type of int 
has 32 bits. The provided main function (see the file c0vm_main.c) performs these checks 
before starting the abstract machine and aborts the execution if necessary. 
In the instruction table below (and for subsequent tables), we use w32 for the type of 
primitive values and * for the type of reference values. Each line has an opcode in hex 
notation, followed by the operation mnemonic, followed by the effect of the operation, first 
on the stack, then any other effect. 
0x60 iadd S, x:w32, y:w32 -> S, x+y:w32 
0x7E iand S, x:w32, y:w32 -> S, xy:w32 
c© Carnegie Mellon University 2020 
15-122 Programming Homework 11  12 Page 6 of 16 
0x6C idiv S, x:w32, y:w32 -> S, x/y:w32 
0x68 imul S, x:w32, y:w32 -> S, x*y:w32 
0x80 ior S, x:w32, y:w32 -> S, x|y:w32 
0x70 irem S, x:w32, y:w32 -> S, x%y:w32 
0x78 ishl S, x:w32, y:w32 -> S, x<0x7A ishr S, x:w32, y:w32 -> S, x>>y:w32 
0x64 isub S, x:w32, y:w32 -> S, x-y:w32 
0x82 ixor S, x:w32, y:w32 -> S, x^y:w32 
Safety violations in idiv, irem, ishr, and ishl can cause run-time errors (use the pro- 
vided function c0_arith_error to generate a message). Please refer to the C0 language 
specification for important details. 
We have omitted negation -x, which the compiler can simulate with 0-x, and bitwise 
negation ~x, which the compiler can simulate with x^(-1). 
2.3 Constants (Tasks 1 and 2) 
We can push constants onto the operand stack. There are three different forms: (1) a 
constant null which is directly coded into the instruction, (2) a small signed (byte-sized) 
constant  which is an instruction operand and must be sign-extended to 32 bits, and (3) 
a constant stored in the constant pool. For the latter, we distinguish constants of primitive 
type from those of reference type because they are stored in different pools. 
The two constant loading instructions ildc and aldc take two unsigned bytes as operands, 
which must be combined into an unsigned integer index for the appropriate constant pool. 
The integer pool stores the constants directly, and the index given to ildc is an index into 
the integer pool. The string pool is one large array of character strings, each terminated by 
’\0’. The index given to aldc indicates the position of the first character; its address is 
therefore of type char* in C and understood by C as a string. 
0x01 aconst_null S -> S, null:* 
0x10 bipush  S -> S, x:w32 (x = (w32)b, sign extended) 
0x13 ildc