cache-simulator思路
先讀docs, valgrind 可以提供關(guān)于 cache 的 memory access,命令:
linux> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
memory trace 是這樣的狀態(tài):
I 0400d7d4,8
M 0421c7f0,4
L 04f6b868,8
S 7ff0005c8,8
[space]operation address,size
I -> instruction load, L -> data load, S -> data store, M -> data modify.(data load followed by a data store). 然后I前面一定沒(méi)有space,M, L, S 前面都會(huì)有一個(gè)space。addres 是 64-bit hex memory 地址, size 是operation access。
我們用的是trace file,存在 traces 文件夾下,看一下,比如 yi.trace:
L 10,1
M 20,1
L 22,1
S 18,1
L 110,1
L 210,1
M 12,1
看另一個(gè)文件:
I 00400531,2
I 00400581,4
L 7ff000384,4
I 00400585,2
I 00400533,7
S 7ff000388,4
的確有 instruction load 和 data load,所以需要注意,也有f的出現(xiàn),所以確定是 hex 。
然后來(lái)看給的例子:
linux> ./csim-ref -s 4 -E 1 -b 4 -t traces/yi.trace
hits:4 misses:5 evictions:3
verbose模式:
linux> ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss eviction
L 210,1 miss eviction
M 12,1 miss eviction hit
hits:4 misses:5 evictions:3
先來(lái)分析一下結(jié)果:s = 4 -> S = 16, E = 1, b = 4, B = 2^4 = 16.
所以看結(jié)構(gòu),
- L 10,1 miss 必然的,這個(gè)時(shí)候 0x10 - 0x1f 被加載進(jìn) set 1
- M 20,1 miss hit 因?yàn)镸是 load + store,首先 0x20 - 0x2f 被加載進(jìn) set 2,先 miss, 再hit
- L 22,1 hit 必然因?yàn)?0x22 此刻位于 set 2
- L 18,1 hit 此刻0x18 位于 set 1
- L 110,1 miss eviction 0x110 % (16 * 16) = 16 -> 所以這個(gè)也想在set 1, 所以出現(xiàn) miss eviction(驅(qū)逐)
- L 210,1 miss eviction 0x210 % (16 * 16) = 16 -> 同想在set 1, 再出現(xiàn) miss eviction
- M 12,1 miss eviction hit M = load + store 0x12 想在set 1, 所以先 miss eviction 再hit
- 所以數(shù)數(shù): hits:4 misses:5 evictions:3
再看一個(gè)文件例子: trans.trace:
S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
S 7ff000398,8
I 0040051e,1
S 7ff000390,8
I 0040051f,3
I 00400522,4
S 7ff000378,8
I 00400526,4
S 7ff000370,8
I 0040052a,7
S 7ff000384,4
I 00400531,2
I 00400581,4
L 7ff000384,4
I 00400585,2
I 00400533,7
S 7ff000388,4
....
如果運(yùn)行,結(jié)果是:
S 600aa0,1 miss
S 7ff000398,8 miss
S 7ff000390,8 hit
S 7ff000378,8 miss
S 7ff000370,8 hit
S 7ff000384,4 miss
L 7ff000384,4 hit
S 7ff000388,4 hit
...
所以可以看出來(lái),首先這個(gè)I *****, instruction load 并不會(huì)影響數(shù)據(jù),因?yàn)槭?load 指令,S則會(huì)。
再看例子:
./csim-ref -v -s 4 -E 2 -b 4 -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss
L 210,1 miss eviction
M 12,1 miss eviction hit
hits:4 misses:5 evictions:2
這一次 S = 16, E = 2, B = 16, c = 512
大體思路清楚,嘗試coding。
- L 0x10,1 加載入 0x10 / 16 = set 1
- M 0x20,1 加載入 0x20 / 16 = set 2
- L 0x22,1 跟0x20一起 hit 0x22 / 16 = set 2
- S 0x18,1 跟0x10一起 hit 0x18 / 16 = set 1
- L 0x110,1 放入 set 1 中空閑部分。0x110 / 16 = 17, 17 % 16 = 1
- L 0x210,1 需要驅(qū)逐set 1中某個(gè),所以 eviction 0x210 / 16 = 33, 33 % 16 = 1
- M 0x12, 1 同樣需要處于set 1中
這里我需要作出正確的計(jì)算公式
Coding
getopt
模仿slides 代碼先寫 getopt, 注意optString的格式:
char*optstring = “ab:c::”;
單個(gè)字符a 表示選項(xiàng)a沒(méi)有參數(shù) 格式:-a即可,不加參數(shù)
單字符加冒號(hào)b: 表示選項(xiàng)b有且必須加參數(shù) 格式:-b 100或-b100,但-b=100錯(cuò)
單字符加2冒號(hào)c:: 表示選項(xiàng)c可以有,也可以無(wú) 格式:-c200,其它格式錯(cuò)誤
optarg —— 指向當(dāng)前選項(xiàng)參數(shù)(如果有)的指針。
可以參見(jiàn)文章: 命令行選項(xiàng)解析函數(shù)(C語(yǔ)言):getopt()和getopt_long()
while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
...
}
先來(lái)最簡(jiǎn)單的 case 'h': usage(), 直接照抄string,然后可以值得注意的是:對(duì)于一個(gè)比較長(zhǎng)的string 可以用折行符 \
// 折行符'\'是代碼換行連接的標(biāo)記(一行不夠?qū)懀?"a looooooooooong \
string"
Cache
看hint:
A cache is just 2D array of cache lines:
- struct cache_line cache[S][E];
- S = 2^s, is the number of sets
- E is associa-vity
Each cache_line has:
- Valid bit
- Tag
- LRU counter( only if you are not using a queue)
先來(lái)看 cache_line,因?yàn)檫@里的跟實(shí)際的有所區(qū)別,所以: valid_bit, start_address, end_adress 感覺(jué)一定是有必要的,關(guān)于LRU, 只會(huì)在如下的狀態(tài)下會(huì)出現(xiàn),E > 1, 然后都被填滿了,我們需要 evict, 那么我們調(diào)一個(gè)最近都沒(méi)有用過(guò)的block來(lái)驅(qū)逐,最直觀的想法是用一個(gè)timestamp,然后每次檢查最久遠(yuǎn)的timestamp,驅(qū)逐這個(gè)。
看一下有關(guān)C中的時(shí)間:
Unix and POSIX-compliant systems implement time_t as an integer or real-floating type (typically a 32- or 64-bit integer) which represents the number of seconds since the start of the Unix epoch: midnight UTC of January 1, 1970 (not counting leap seconds). Some systems correctly handle negative time values, while others do not. Systems using a signed 32-bit time_t type are susceptible to the Year 2038 problem.
查看此處 C Programming/C Reference/time.h/time_t
#include <stdio.h>
#include <time.h>
int main(){
time_t t;
t = time(NULL);
printf("%ld\n",t);
return 0;
}
運(yùn)行:1543130656 , 1543130661
依舊可能出現(xiàn)問(wèn)題,因?yàn)檫@里給的是 seconds 秒數(shù),可能如果操作間隔時(shí)間很短可能我們無(wú)法分辨(?to be questioned here)
然后看到有很多這樣的問(wèn)題 time in milliseconds
long long current_timestamp() {
struct timeval te;
gettimeofday(&te, NULL); // get current time
long long milliseconds = te.tv_sec*1000LL + te.tv_usec/1000; // calculate milliseconds
return milliseconds;
}
如果出現(xiàn)不能分辨的問(wèn)題可能需要用milliseconds. 暫時(shí)先用 seconds.
所以暫定 cache_line:
typedef struct {
bool valid_bit;
long start_address;
long end_address;
long lru_counter;
} cache_line;
然后需要?jiǎng)討B(tài)分配二維數(shù)組 struct cache_line cache[S][E]。查看 C語(yǔ)言常見(jiàn)問(wèn)題集。
傳統(tǒng)的解決方案是分配一個(gè)指針數(shù)組, 然后把每個(gè)指針初始化為動(dòng)態(tài)分配的 “列”。以下為一個(gè)二維的例子:
#include <stdlib.h>
int **array1 = malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
array1[i] = malloc(ncolumns * sizeof(int));
當(dāng)然, 在真實(shí)代碼中, 所有的 malloc 返回值都必須檢查。你也可以使用 sizeof(array1)和sizeof(*array1)代替sizeof(int *)和sizeof(int)
發(fā)現(xiàn)配套的 csapp.h 和 csapp.c 已經(jīng)具備 wrap 的 Malloc, Free, 無(wú)需我們?cè)馘e(cuò)誤檢查:
// csapp.c
void *Malloc(size_t size)
{
void *p;
if ((p = malloc(size)) == NULL)
unix_error("Malloc error");
return p;
}
根據(jù)網(wǎng)上的提示吧 csapp.h 和 csapp.c 拷貝到 /usr/include 中。然后 csapp.h 的 #endif之前添上#include <csapp.c>, 但是compile 發(fā)現(xiàn)錯(cuò)誤,應(yīng)該是所謂的 " csapp.c文件中有關(guān)于線程中部分,gcc編譯的時(shí)候必須帶 -lpthread,否則會(huì)出錯(cuò)的。"
修改 Makefile:
12 csim: csim.c cachelab.c cachelab.h
13 $(CC) $(CFLAGS) -o csim csim.c cachelab.c -lm -lpthread
然后可以make成功。
讀文件
make 成功的下一步我們來(lái)嘗試讀取文件。
// open the trace file to read
FILE *trace_file;
trace_file = fopen(file_name, "r");
char identifier;
unsigned address;
int size;
while(fscanf(trace_file, " %c %x,%d", &identifier, &address, &size) >0)
{
//
printf(" %c %x,%d\n",identifier, address, size);
}
printf("\n");
fclose(trace_file);
讀取成功,能成功顯示。正式開(kāi)始和 structurelized.
尋找cache_block
按照之前的寫法來(lái)尋找一個(gè)地址應(yīng)當(dāng)對(duì)應(yīng)的 cache_block. 然后發(fā)現(xiàn)最好/需要將一些東西變成全局變量,畢竟也沒(méi)有禁止使用 global variable.
糾錯(cuò)
發(fā)現(xiàn)如果使用 LRU counter 無(wú)論我的 time stamp 用 seconds 或者 milliseconds 都可能導(dǎo)致兩條指令太接近不能分辨,所以干脆用一個(gè) global variable.
終于 ./test-csim 拿到全部分?jǐn)?shù)||||
代碼如下,如果就 ./csim -h 會(huì)有以下結(jié)果:
輸完幫助參數(shù)之后出現(xiàn)|||
Segmentation fault (core dumped)
我猜代碼應(yīng)當(dāng)有很多可以優(yōu)化的地方|||
#include "cachelab.h"
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>
#include <string.h>
#include <math.h>
#include <csapp.h>
#include <time.h>
#define bool char
#define true 1
#define false 0
typedef struct {
bool valid;
long start;
long end;
long lru;
} cache_line;
void usage();
int find_s_index(unsigned long address);
int find_e_index(unsigned long address, int size);
int find_unsed_e_index(unsigned long address, int size);
void modify_cache_block(int s_index, int e_index, unsigned long address);
char* save_data(unsigned long address, int size);
char* load_data(unsigned long address, int size);
void cache_instruction(char identifier, unsigned long address, int size);
unsigned S,E,B; // S = 2^s, B = 2^b
bool verbose;
unsigned hits,misses,evictions;
char* file_name; // trace file name
cache_line** cache;
long lru_timer;
int main(int argc, char** argv)
{
int opt;
unsigned s,b;
// init to omit errors
lru_timer = 0;
verbose = false;
hits = misses = evictions = 0;
// get opt
while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
switch(opt){
case 'h':
usage();
break;
case 'v':
verbose = true;
break;
case 's':
s = atoi(optarg);
break;
case 'E':
E = atoi(optarg);
break;
case 'b':
b = atoi(optarg);
break;
case 't':
file_name = optarg;
break;
default:
printf("wrong argument\n");
break;
}
}
S = pow(2.0, (float)s);
B = pow(2.0, (float)b);
// allocate cache
cache = Malloc(S * sizeof(cache_line *));
for(int i = 0; i < S; i++)
cache[i] = Malloc( E * sizeof(cache_line));
for(int i = 0; i < S ; i++)
for(int j = 0; j < E; j++)
cache[i][j].valid = false;
// open the trace file to read
FILE *trace_file;
trace_file = fopen(file_name, "r");
char identifier;
unsigned long address;
int size;
while(fscanf(trace_file, " %c %lx,%d", &identifier, &address, &size) >0)
{
cache_instruction(identifier, address, size);
}
for (int i = 0; i < S; ++i)
{
Free(cache[i]);
}
fclose(trace_file);
printSummary(hits, misses, evictions);
return 0;
}
/*
* find the set of the address
*/
int find_s_index(unsigned long address)
{
int s_index = address / B;
if (s_index >= S)
s_index = s_index % S;
return s_index;
}
void cache_instruction(char identifier, unsigned long address, int size)
{
switch(identifier){
case 'S':
{
char* description = load_data(address, size);
if (verbose == true)
{
printf("%c %lx,%d ",identifier, address, size);
printf("%s \n", description);
}
break;
}
case 'M':
{
char* load = load_data(address, size);
char* save = save_data(address, size);
if (verbose == true)
{
printf("%c %lx,%d ",identifier, address, size);
printf("%s ", load);
printf("%s \n",save);
}
break;
}
case 'L':
{
char* description = load_data(address, size);
if (verbose == true)
{
printf("%c %lx,%d ",identifier, address, size);
printf("%s \n", description);
}
break;
}
}
lru_timer++;
}
char* save_data(unsigned long address, int size)
{
char* description;
int s_index = find_s_index(address);
int e_index = find_e_index(address, size);
if(e_index == -1)
{
misses += 1;
description = "miss";
} else {
hits += 1;
modify_cache_block(s_index, e_index, address);
description = "hit";
}
return description;
}
/*
* modify cache block
*/
void modify_cache_block(int s_index, int e_index, unsigned long address)
{
cache[s_index][e_index].valid = true;
cache[s_index][e_index].start = (address / B) * B;
cache[s_index][e_index].end = cache[s_index][e_index].start + (B - 1);
cache[s_index][e_index].lru = lru_timer;
}
// here I made the assumption that the size will be valid?
char* load_data(unsigned long address, int size)
{
char* description;
int s_index = find_s_index(address);
int e_index = find_e_index(address, size);
// we have a hit
if (e_index != -1){
description = "hit";
hits +=1;
modify_cache_block(s_index, e_index, address);
return description;
}
// we have not fulled block and a miss
e_index = find_unsed_e_index(address, size);
if (e_index != -1)
{
description = "miss";
misses += 1;
modify_cache_block(s_index,e_index,address);
return description;
}
cache_line* located_set = cache[s_index];
e_index = 0;
// every block is used and we have to pick out lru
for(int i = 0; i < E ; i++)
{
cache_line block = located_set[i];
if(located_set[e_index].lru > block.lru)
e_index = i;
}
description = "miss eviction";
misses += 1;
evictions += 1;
modify_cache_block(s_index,e_index,address);
return description;
}
/*
* find e_index of cache block, -1 stands for not found
*/
int find_e_index(unsigned long address, int size)
{
int s_index = find_s_index(address);
cache_line* located_set = cache[s_index];
for(int i = 0; i < E; i++)
{
cache_line block = located_set[i];
if(block.valid == true && address >= block.start && address <= block.end)
return i;
}
return -1;
}
int find_unsed_e_index(unsigned long address, int size)
{
int s_index = find_s_index(address);
cache_line* located_set = cache[s_index];
for (int i = 0; i < E; i++)
{
if (located_set[i].valid == false)
return i;
}
return -1;
}
void usage()
{
char *usage =
"Usage: ./csim [-hv] -s <num> -E <num> -b <num> -t <file>\n\
Options:\n\
-h Print this help message.\n\
-v Optional verbose flag.\n\
-s <num> Number of set index bits.\n\
-E <num> Number of lines per set.\n\
-b <num> Number of block offset bits.\n\
-t <file> Trace file.\n\
\n\
Examples:\n\
linux> ./csim -s 4 -E 1 -b 4 -t traces/yi.trace \n\
linux> ./csim -v -s 8 -E 2 -b 4 -t traces/yi.trace";
printf("%s\n",usage);
}