# CSAPP cachelab

This is the write-up for the CSAPP cache lab.

## Part A

### General Solution

In Part A, we need to write a cache simulator that replays the memory trace files generated by valgrind. At run time, the cache parameters are given through command-line options.

Luckily, the professor provides us with a reference binary that serves as a template to "guide us through", or rather, to show us the answer up front.

First things first: define a structure that can be used to simulate the cache. From the course videos and the recitation, we know that a cache is organized into three levels: the cache contains sets, each set contains lines, and each line holds a block of data. Since this is only a simulation, the block doesn't need to actually store data; we only need to determine whether each address the program accesses is a hit, a miss, or an eviction.

Generally, a cache line consists of a valid bit, a tag, and a data block. For the reason above we can drop the block, and instead add an lru field that acts as a timestamp to decide which line should be evicted.

The hierarchy is cache line -> cache set -> cache, so in the code below the cache type is `cacheLine ***`.
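A minimal sketch of that structure (the names `cacheLine` and `new_cache` are my own; note that a double pointer, sets x lines, is already enough to index the cache, even though the write-up declares a triple pointer):

```c
#include <stdlib.h>

/* One cache line: valid bit, tag, and an LRU timestamp.
 * The data block is omitted because the simulator only
 * tracks hit/miss/eviction, never the actual contents. */
typedef struct {
    int valid;
    unsigned long tag;
    unsigned long lru;   /* larger value = more recently used */
} cacheLine;

/* Allocate S sets of E lines each, all zeroed (invalid).
 * Indexed as cache[set][line]. */
cacheLine **new_cache(int S, int E) {
    cacheLine **cache = malloc(S * sizeof(cacheLine *));
    for (int s = 0; s < S; s++)
        cache[s] = calloc(E, sizeof(cacheLine));
    return cache;
}
```

Using `calloc` means every line starts with `valid == 0`, which is exactly the empty-cache state the simulator needs.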

The other big thing is understanding how to use getopt, by reading its man page and looking at demo code online. In the optstring, an option followed by one colon (`:`) requires an argument, and one followed by two colons (`::`) takes an optional argument. If the optstring starts with a plus sign (`+`), option processing stops at the first non-option argument.

Then, we have to analyze how the trace file accesses the cache and decide whether each access is a hit, a miss, or an eviction.

To index the cache, we first compute the set index from the provided s, b, and the address. Second, we compare the tag against every valid line in that set. If one matches, it is a hit; otherwise the access misses (and we still need to determine whether it also causes an eviction).

If the set is not full, it is a plain miss, and the cache places the data into an empty line. If the set is full, we have to replace the least recently used line with the new one, which counts as a miss plus an eviction.
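The whole lookup can be sketched as one function per set (names such as `access_set`, `set_of`, and `tag_of` are mine; the struct repeats the `cacheLine` definition above so the snippet is self-contained):

```c
typedef struct {
    int valid;
    unsigned long tag;
    unsigned long lru;   /* larger value = more recently used */
} cacheLine;

enum result { HIT, MISS, EVICTION };

/* Decompose an address using the s and b parameters. */
unsigned long set_of(unsigned long addr, int s, int b) {
    return (addr >> b) & ((1UL << s) - 1);
}
unsigned long tag_of(unsigned long addr, int s, int b) {
    return addr >> (s + b);
}

/* Look up `tag` in one set of E lines; `now` is a running
 * access counter used as the LRU timestamp. */
enum result access_set(cacheLine *set, int E,
                       unsigned long tag, unsigned long now) {
    /* 1. Hit: a valid line already holds the tag. */
    for (int i = 0; i < E; i++)
        if (set[i].valid && set[i].tag == tag) {
            set[i].lru = now;
            return HIT;
        }
    /* 2. Miss: fill the first empty line if the set is not full. */
    for (int i = 0; i < E; i++)
        if (!set[i].valid) {
            set[i].valid = 1;
            set[i].tag = tag;
            set[i].lru = now;
            return MISS;
        }
    /* 3. Eviction: replace the least recently used line
     *    (the one with the smallest timestamp). */
    int victim = 0;
    for (int i = 1; i < E; i++)
        if (set[i].lru < set[victim].lru)
            victim = i;
    set[victim].tag = tag;
    set[victim].lru = now;
    return EVICTION;
}
```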

At this point the program accomplishes all the goals we set out to achieve; it remains to combine the pieces and add a few helper functions such as print_help_msg and a loop that scans through the trace file.

Result:

### Reverse Engineering

This is a trickier way to solve it. Since the professor provides the reference binary, we can reverse engineer csim-ref to recover its contents and logic. Luckily, it was compiled with all symbols preserved, so in a professional reverse engineering tool it looks almost exactly like the source code. Here is just an example:

By reversing the logic, we can also see exactly what our own program needs to do XD.

## Part B

In Part B, we are going to write functions that transpose matrices of three sizes, 32 x 32, 64 x 64, and 61 x 67, under a cache with s = 5, E = 1, b = 5: that is, 32 direct-mapped sets with one line each, where each line holds 32 bytes (8 ints).

### 32 x 32

A 32 x 32 matrix is easy. Each cache line stores exactly 8 ints, so we split the matrix into 8 x 8 blocks; reading out each row of a block in A and copying it into the corresponding column in B solves the problem.
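A sketch of that blocking (the function name `trans_32` is mine; reading a whole row of the A-block into locals before writing the B-column avoids the conflict misses on the diagonal, since A and B map to the same cache sets):

```c
/* 8 x 8 blocking for the 32 x 32 case: one cache line holds
 * 8 ints, so each 8-int row of a block occupies one line. */
void trans_32(int M, int N, int A[N][M], int B[M][N]) {
    for (int bi = 0; bi < N; bi += 8)
        for (int bj = 0; bj < M; bj += 8)
            for (int i = bi; i < bi + 8; i++) {
                /* Read the whole row of the A-block first... */
                int a0 = A[i][bj],     a1 = A[i][bj + 1],
                    a2 = A[i][bj + 2], a3 = A[i][bj + 3],
                    a4 = A[i][bj + 4], a5 = A[i][bj + 5],
                    a6 = A[i][bj + 6], a7 = A[i][bj + 7];
                /* ...then write it as a column of the B-block. */
                B[bj][i] = a0;     B[bj + 1][i] = a1;
                B[bj + 2][i] = a2; B[bj + 3][i] = a3;
                B[bj + 4][i] = a4; B[bj + 5][i] = a5;
                B[bj + 6][i] = a6; B[bj + 7][i] = a7;
            }
}
```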

Result:

### 64 x 64

This is the hardest part of the assignment. I didn't finish it myself, so I took reference from here.

So basically, the idea is to further split each 8 x 8 block into 4 x 4 sub-blocks. However, if we apply that idea to the 8 x 8 block naively, the result is the same as splitting the original matrix into 4 x 4 blocks directly, which is about 1644 misses and fails the test.

That's because the two methods do the same thing: read 4 rows of A and write 4 columns of B. So they produce the same miss count, no matter how the outer blocks are sized.

The trick is this: while reading the first four rows of the 8 x 8 block of A, we also get the upper-right 4 x 4 quarter (block 2) essentially for free, since it sits in the same cache lines, and we stash it temporarily somewhere in matrix B (the contents of B there don't matter yet, because that region isn't finished). We use the same cached lines to read out more data.

Then, while those lines are still cached, we move the stashed block 2 to its proper place in B and copy the rest of the content over.

So the code reorders the reads to "preheat" the cache, preparing for the subsequent reads and writes of blocks 3 and 4 and further reducing the misses.
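Based on the referenced approach, each 8 x 8 block is handled in three passes; here is a sketch (the function name `trans_64` and the local names are my own):

```c
/* 64 x 64 transpose, three passes per 8 x 8 block.
 * Pass 1: transpose A's upper-left quarter into B, and stash
 *   A's upper-right quarter into B's upper-right quarter (the
 *   "wrong" place, but those lines are already cached).
 * Pass 2: swap the stash into its real position (B's lower-left)
 *   while transposing A's lower-left quarter into the freed rows.
 * Pass 3: transpose A's lower-right quarter normally. */
void trans_64(int M, int N, int A[N][M], int B[M][N]) {
    for (int i = 0; i < N; i += 8)
        for (int j = 0; j < M; j += 8) {
            /* Pass 1: rows i..i+3 of A, read 8 ints at a time. */
            for (int k = i; k < i + 4; k++) {
                int a0 = A[k][j],     a1 = A[k][j + 1],
                    a2 = A[k][j + 2], a3 = A[k][j + 3],
                    a4 = A[k][j + 4], a5 = A[k][j + 5],
                    a6 = A[k][j + 6], a7 = A[k][j + 7];
                B[j][k] = a0;     B[j + 1][k] = a1;
                B[j + 2][k] = a2; B[j + 3][k] = a3;
                /* Stash block 2 here for now. */
                B[j][k + 4] = a4;     B[j + 1][k + 4] = a5;
                B[j + 2][k + 4] = a6; B[j + 3][k + 4] = a7;
            }
            /* Pass 2: one row of the stash out, one column of
             * A's lower-left quarter in. */
            for (int k = j; k < j + 4; k++) {
                int a0 = A[i + 4][k], a1 = A[i + 5][k],
                    a2 = A[i + 6][k], a3 = A[i + 7][k];
                int t0 = B[k][i + 4], t1 = B[k][i + 5],
                    t2 = B[k][i + 6], t3 = B[k][i + 7];
                B[k][i + 4] = a0; B[k][i + 5] = a1;
                B[k][i + 6] = a2; B[k][i + 7] = a3;
                B[k + 4][i] = t0;     B[k + 4][i + 1] = t1;
                B[k + 4][i + 2] = t2; B[k + 4][i + 3] = t3;
            }
            /* Pass 3: the lower-right quarter. */
            for (int k = i + 4; k < i + 8; k++) {
                int a4 = A[k][j + 4], a5 = A[k][j + 5],
                    a6 = A[k][j + 6], a7 = A[k][j + 7];
                B[j + 4][k] = a4; B[j + 5][k] = a5;
                B[j + 6][k] = a6; B[j + 7][k] = a7;
            }
        }
}
```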

Result:

### 61 x 67

This one is also very simple to deal with. Since the matrix is not square, finer-grained blocking brings little improvement. We can just split it into 16 x 16 blocks (with bounds checks for the ragged edges) and it will be fine.
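A sketch of that blocking with the edge checks (the function name `trans_61x67` and the block-size constant are mine):

```c
/* 16 x 16 blocking for the 61 x 67 case. The min() checks in
 * the inner loops handle the partial blocks at the right and
 * bottom edges of the non-square matrix. */
void trans_61x67(int M, int N, int A[N][M], int B[M][N]) {
    const int BS = 16;
    for (int bi = 0; bi < N; bi += BS)
        for (int bj = 0; bj < M; bj += BS)
            for (int i = bi; i < bi + BS && i < N; i++)
                for (int j = bj; j < bj + BS && j < M; j++)
                    B[j][i] = A[i][j];
}
```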

Result: