CacheLab- Cache Simulator - Part I

cache-simulator思路

先讀docs, valgrind 可以提供關(guān)于 cache 的 memory access,命令:

linux> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l

memory trace 是這樣的狀態(tài):


I 0400d7d4,8
 M 0421c7f0,4
 L 04f6b868,8
 S 7ff0005c8,8

[space]operation address,size

I -> instruction load, L -> data load, S -> data store, M -> data modify.(data load followed by a data store). 然后I前面一定沒(méi)有space,M, L, S 前面都會(huì)有一個(gè)space。addres 是 64-bit hex memory 地址, size 是operation access。

我們用的是trace file,存在 traces 文件夾下,看一下,比如 yi.trace:

 L 10,1
 M 20,1
 L 22,1
 S 18,1
 L 110,1
 L 210,1
 M 12,1

看另一個(gè)文件:

I  00400531,2
I  00400581,4
 L 7ff000384,4
I  00400585,2
I  00400533,7
 S 7ff000388,4

的確有 instruction load 和 data load,所以需要注意,也有f的出現(xiàn),所以確定是 hex 。

然后來(lái)看給的例子:

linux> ./csim-ref -s 4 -E 1 -b 4 -t traces/yi.trace
hits:4 misses:5 evictions:3

verbose模式:

linux> ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace 
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss eviction
L 210,1 miss eviction
M 12,1 miss eviction hit 
hits:4 misses:5 evictions:3

先來(lái)分析一下結(jié)果:s = 4 -> S = 16, E = 1, b = 4, B = 2^4 = 16.

所以看結(jié)構(gòu),

  • L 10,1 miss 必然的,這個(gè)時(shí)候 0x10 - 0x1f 被加載進(jìn) set 1
  • M 20,1 miss hit 因?yàn)镸是 load + store,首先 0x20 - 0x2f 被加載進(jìn) set 2,先 miss, 再hit
  • L 22,1 hit 必然因?yàn)?0x22 此刻位于 set 2
  • L 18,1 hit 此刻0x18 位于 set 1
  • L 110,1 miss eviction 0x110 % (16 * 16) = 16 -> 所以這個(gè)也想在set 1, 所以出現(xiàn) miss eviction(驅(qū)逐)
  • L 210,1 miss eviction 0x210 % (16 * 16) = 16 -> 同想在set 1, 再出現(xiàn) miss eviction
  • M 12,1 miss eviction hit M = load + store 0x12 想在set 1, 所以先 miss eviction 再hit
  • 所以數(shù)數(shù): hits:4 misses:5 evictions:3

再看一個(gè)文件例子: trans.trace:

 S 00600aa0,1
I  004005b6,5
I  004005bb,5
I  004005c0,5
 S 7ff000398,8
I  0040051e,1
 S 7ff000390,8
I  0040051f,3
I  00400522,4
 S 7ff000378,8
I  00400526,4
 S 7ff000370,8
I  0040052a,7
 S 7ff000384,4
I  00400531,2
I  00400581,4
 L 7ff000384,4
I  00400585,2
I  00400533,7
 S 7ff000388,4
 ....

如果運(yùn)行,結(jié)果是:

S 600aa0,1 miss 
S 7ff000398,8 miss 
S 7ff000390,8 hit 
S 7ff000378,8 miss 
S 7ff000370,8 hit 
S 7ff000384,4 miss 
L 7ff000384,4 hit 
S 7ff000388,4 hit 
...

所以可以看出來(lái),首先這個(gè)I *****, instruction load 并不會(huì)影響數(shù)據(jù),因?yàn)槭?load 指令,S則會(huì)。

再看例子:

./csim-ref -v -s 4 -E 2 -b 4 -t traces/yi.trace
L 10,1 miss 
M 20,1 miss hit 
L 22,1 hit 
S 18,1 hit 
L 110,1 miss 
L 210,1 miss eviction 
M 12,1 miss eviction hit 
hits:4 misses:5 evictions:2

這一次 S = 16, E = 2, B = 16, c = 512

大體思路清楚,嘗試coding。

  • L 0x10,1 加載入 0x10 / 16 = set 1
  • M 0x20,1 加載入 0x20 / 16 = set 2
  • L 0x22,1 跟0x20一起 hit 0x22 / 16 = set 2
  • S 0x18,1 跟0x10一起 hit 0x18 / 16 = set 1
  • L 0x110,1 放入 set 1 中空閑部分。0x110 / 16 = 17, 17 % 16 = 1
  • L 0x210,1 需要驅(qū)逐set 1中某個(gè),所以 eviction 0x210 / 16 = 33, 33 % 16 = 1
  • M 0x12, 1 同樣需要處于set 1中

這里我需要作出正確的計(jì)算公式

Coding

getopt

模仿slides 代碼先寫 getopt, 注意optString的格式:

char*optstring = “ab:c::”;
單個(gè)字符a         表示選項(xiàng)a沒(méi)有參數(shù)            格式:-a即可,不加參數(shù)
單字符加冒號(hào)b:     表示選項(xiàng)b有且必須加參數(shù)      格式:-b 100或-b100,但-b=100錯(cuò)
單字符加2冒號(hào)c::   表示選項(xiàng)c可以有,也可以無(wú)     格式:-c200,其它格式錯(cuò)誤

optarg —— 指向當(dāng)前選項(xiàng)參數(shù)(如果有)的指針。

可以參見(jiàn)文章: 命令行選項(xiàng)解析函數(shù)(C語(yǔ)言):getopt()和getopt_long()

while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
  ...
}

先來(lái)最簡(jiǎn)單的 case 'h': usage(), 直接照抄string,然后可以值得注意的是:對(duì)于一個(gè)比較長(zhǎng)的string 可以用折行符 \

// 折行符'\'是代碼換行連接的標(biāo)記(一行不夠?qū)懀?"a looooooooooong \
 string" 

Cache

看hint:

A cache is just 2D array of cache lines:

  • struct cache_line cache[S][E];
  • S = 2^s, is the number of sets
  • E is associa-vity

Each cache_line has:

  • Valid bit
  • Tag
  • LRU counter( only if you are not using a queue)

先來(lái)看 cache_line,因?yàn)檫@里的跟實(shí)際的有所區(qū)別,所以: valid_bit, start_address, end_adress 感覺(jué)一定是有必要的,關(guān)于LRU, 只會(huì)在如下的狀態(tài)下會(huì)出現(xiàn),E > 1, 然后都被填滿了,我們需要 evict, 那么我們調(diào)一個(gè)最近都沒(méi)有用過(guò)的block來(lái)驅(qū)逐,最直觀的想法是用一個(gè)timestamp,然后每次檢查最久遠(yuǎn)的timestamp,驅(qū)逐這個(gè)。

看一下有關(guān)C中的時(shí)間:

Unix and POSIX-compliant systems implement time_t as an integer or real-floating type (typically a 32- or 64-bit integer) which represents the number of seconds since the start of the Unix epoch: midnight UTC of January 1, 1970 (not counting leap seconds). Some systems correctly handle negative time values, while others do not. Systems using a signed 32-bit time_t type are susceptible to the Year 2038 problem.

查看此處 C Programming/C Reference/time.h/time_t

#include <stdio.h>
#include <time.h>

int main(){
  time_t t;
  t = time(NULL);
  printf("%ld\n",t);

  return 0;
}

運(yùn)行:1543130656 , 1543130661

依舊可能出現(xiàn)問(wèn)題,因?yàn)檫@里給的是 seconds 秒數(shù),可能如果操作間隔時(shí)間很短可能我們無(wú)法分辨(?to be questioned here)

然后看到有很多這樣的問(wèn)題 time in milliseconds

long long current_timestamp() {
  struct timeval te;
  gettimeofday(&te, NULL); // get current time
  long long milliseconds = te.tv_sec*1000LL + te.tv_usec/1000; // calculate milliseconds
  return milliseconds;
} 

如果出現(xiàn)不能分辨的問(wèn)題可能需要用milliseconds. 暫時(shí)先用 seconds.

所以暫定 cache_line:

typedef struct {
  bool valid_bit;
  long start_address;
  long end_address;
  long lru_counter;
} cache_line;

然后需要?jiǎng)討B(tài)分配二維數(shù)組 struct cache_line cache[S][E]。查看 C語(yǔ)言常見(jiàn)問(wèn)題集。

傳統(tǒng)的解決方案是分配一個(gè)指針數(shù)組, 然后把每個(gè)指針初始化為動(dòng)態(tài)分配的 “列”。以下為一個(gè)二維的例子:

#include <stdlib.h>
        int **array1 = malloc(nrows * sizeof(int *));
        for(i = 0; i < nrows; i++)
            array1[i] = malloc(ncolumns * sizeof(int));

當(dāng)然, 在真實(shí)代碼中, 所有的 malloc 返回值都必須檢查。你也可以使用 sizeof(array1)和sizeof(*array1)代替sizeof(int *)和sizeof(int)

發(fā)現(xiàn)配套的 csapp.h 和 csapp.c 已經(jīng)具備 wrap 的 Malloc, Free, 無(wú)需我們?cè)馘e(cuò)誤檢查:

// csapp.c
void *Malloc(size_t size) 
{
    void *p;

    if ((p  = malloc(size)) == NULL)
    unix_error("Malloc error");
    return p;
}

根據(jù)網(wǎng)上的提示吧 csapp.h 和 csapp.c 拷貝到 /usr/include 中。然后 csapp.h 的 #endif之前添上#include <csapp.c>, 但是compile 發(fā)現(xiàn)錯(cuò)誤,應(yīng)該是所謂的 " csapp.c文件中有關(guān)于線程中部分,gcc編譯的時(shí)候必須帶 -lpthread,否則會(huì)出錯(cuò)的。"

修改 Makefile:

 12 csim: csim.c cachelab.c cachelab.h
 13         $(CC) $(CFLAGS) -o csim csim.c cachelab.c -lm -lpthread 

然后可以make成功。

讀文件

make 成功的下一步我們來(lái)嘗試讀取文件。

// open the trace file to read
FILE *trace_file;
trace_file = fopen(file_name, "r");
  
char identifier;
unsigned address;
int size;
while(fscanf(trace_file, " %c %x,%d", &identifier, &address, &size) >0)
{
  // 
  printf(" %c %x,%d\n",identifier, address, size);
}
printf("\n");
fclose(trace_file);

讀取成功,能成功顯示。正式開(kāi)始和 structurelized.

尋找cache_block

按照之前的寫法來(lái)尋找一個(gè)地址應(yīng)當(dāng)對(duì)應(yīng)的 cache_block. 然后發(fā)現(xiàn)最好/需要將一些東西變成全局變量,畢竟也沒(méi)有禁止使用 global variable.

糾錯(cuò)

發(fā)現(xiàn)如果使用 LRU counter 無(wú)論我的 time stamp 用 seconds 或者 milliseconds 都可能導(dǎo)致兩條指令太接近不能分辨,所以干脆用一個(gè) global variable.

終于 ./test-csim 拿到全部分?jǐn)?shù)||||

代碼如下,如果就 ./csim -h 會(huì)有以下結(jié)果:

輸完幫助參數(shù)之后出現(xiàn)|||
Segmentation fault (core dumped)

我猜代碼應(yīng)當(dāng)有很多可以優(yōu)化的地方|||

#include "cachelab.h"
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>
#include <string.h>
#include <math.h>
#include <csapp.h>
#include <time.h>

#define bool char
#define true 1
#define false 0

typedef struct {
  bool valid;
  long start;
  long end;
  long lru;
} cache_line;

void usage();
int find_s_index(unsigned long address);
int find_e_index(unsigned long address, int size);
int find_unsed_e_index(unsigned long address, int size);
void modify_cache_block(int s_index, int e_index, unsigned long address);
char* save_data(unsigned long address, int size);
char* load_data(unsigned long address, int size);
void cache_instruction(char identifier, unsigned long address, int size);

unsigned S,E,B; // S = 2^s, B = 2^b
bool verbose;
unsigned hits,misses,evictions; 
char* file_name; // trace file name
cache_line** cache;
long lru_timer;

int main(int argc, char** argv)
{
  int opt;
  unsigned s,b;

  // init to omit errors
  lru_timer = 0;
  verbose = false;
  hits = misses = evictions = 0;

  // get opt 
  while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
    switch(opt){
      case 'h':
        usage();
        break;
      case 'v':
        verbose = true;
        break;
      case 's':
        s = atoi(optarg);
        break;
      case 'E':
        E = atoi(optarg);
        break;
      case 'b':
        b = atoi(optarg);
        break;
      case 't':
        file_name = optarg;
        break;
      default:
        printf("wrong argument\n");
        break;
    }
  }
  
  S = pow(2.0, (float)s);
  B = pow(2.0, (float)b);

  // allocate cache
  cache = Malloc(S * sizeof(cache_line *));
  for(int i = 0; i < S; i++)
    cache[i] = Malloc( E * sizeof(cache_line));

  for(int i = 0; i < S ; i++)
    for(int j = 0; j < E; j++)
      cache[i][j].valid = false;

  // open the trace file to read
  FILE *trace_file;
  trace_file = fopen(file_name, "r");
  
  char identifier;
  unsigned long address;
  int size;
  while(fscanf(trace_file, " %c %lx,%d", &identifier, &address, &size) >0)
  {
    cache_instruction(identifier, address, size);
  }

  for (int i = 0; i < S; ++i)
  {
    Free(cache[i]);
  }
  
  fclose(trace_file);
  printSummary(hits, misses, evictions);

  return 0;
}

/*
 * find the set of the address
 */
int find_s_index(unsigned long address)
{
  int s_index = address / B;
  if (s_index >= S)
    s_index = s_index % S;
 
  return s_index;
}

void cache_instruction(char identifier, unsigned long address, int size)
{
  switch(identifier){
    case 'S':
    {
      char* description = load_data(address, size);
      if (verbose == true)
      {
        printf("%c %lx,%d ",identifier, address, size);
        printf("%s \n", description);
      }     
      break;
    }
    case 'M':
    {
      char* load = load_data(address, size);
      char* save = save_data(address, size);
      if (verbose == true)
      {
        printf("%c %lx,%d ",identifier, address, size);
        printf("%s ", load);
        printf("%s \n",save);
      }
      break;
    }
    case 'L':
    {
      char* description = load_data(address, size);
      if (verbose == true)
      {
        printf("%c %lx,%d ",identifier, address, size);
        printf("%s \n", description);
      }
      break;
    }
  }
  lru_timer++;
}

char* save_data(unsigned long address, int size)
{
  char* description;
  int s_index = find_s_index(address);
  int e_index = find_e_index(address, size);

  if(e_index == -1)
  {
    misses += 1;
    description = "miss";
  } else {
    hits += 1;

    modify_cache_block(s_index, e_index, address);
    description = "hit";
  }

  return description;
}

/*
 * modify cache block
 */
void modify_cache_block(int s_index, int e_index, unsigned long address)
{
  cache[s_index][e_index].valid = true;
  cache[s_index][e_index].start = (address / B) * B;
  cache[s_index][e_index].end = cache[s_index][e_index].start + (B - 1);
  cache[s_index][e_index].lru = lru_timer;
}



// here I made the assumption that the size will be valid?
char* load_data(unsigned long address, int size)
{
  char* description;

  int s_index = find_s_index(address);
  int e_index = find_e_index(address, size);
  
  // we have a hit
  if (e_index != -1){
    description = "hit";
    hits +=1;
    

    modify_cache_block(s_index, e_index, address);
    return description;
  }
 
  // we have not fulled block and a miss
  e_index = find_unsed_e_index(address, size);
  if (e_index != -1)
  {
    description = "miss";
    misses += 1;


    modify_cache_block(s_index,e_index,address);
    return description;
  }

  cache_line* located_set = cache[s_index];
  e_index = 0;
  // every block is used and we have to pick out lru
  for(int i = 0; i < E ; i++)
  {
    cache_line block = located_set[i];
    if(located_set[e_index].lru >  block.lru)
       e_index = i;
  }

  description = "miss eviction";
  misses += 1;
  evictions += 1;

  modify_cache_block(s_index,e_index,address);
  return description;
}

/*
 * find e_index of cache block, -1 stands for not found
 */
int find_e_index(unsigned long address, int size)
{
  int s_index = find_s_index(address);
  cache_line* located_set = cache[s_index];

  for(int i = 0; i < E; i++)
  {
    cache_line block = located_set[i];

    if(block.valid == true && address >= block.start && address <= block.end)
       return i;
   }

  return -1; 
}

int find_unsed_e_index(unsigned long address, int size)
{
  int s_index = find_s_index(address);
  cache_line* located_set = cache[s_index];
  
  for (int i = 0; i < E; i++)
  {
    if (located_set[i].valid == false)
      return i;
  }
  return -1;
}


void usage()
{
  char *usage  =
  "Usage: ./csim [-hv] -s <num> -E <num> -b <num> -t <file>\n\
   Options:\n\
    -h         Print this help message.\n\
    -v         Optional verbose flag.\n\
    -s <num>   Number of set index bits.\n\
    -E <num>   Number of lines per set.\n\
    -b <num>   Number of block offset bits.\n\
    -t <file>  Trace file.\n\
    \n\
  Examples:\n\
    linux>  ./csim -s 4 -E 1 -b 4 -t traces/yi.trace \n\
    linux>  ./csim -v -s 8 -E 2 -b 4 -t traces/yi.trace";
    printf("%s\n",usage);
}

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容