LZ4 Compression Algorithm Explained in Simple Terms

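Introduction

LZ4 is, all things considered, among the most efficient compression algorithms currently available. It prioritizes compression and decompression speed; its compression ratio is not the best. Current Android and Apple operating systems use LZ4 for memory compression, compressing memory pages promptly to free up more memory. In essence it trades time for space.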

How compression works

The LZ4 compression algorithm is actually quite simple. Here is an example:

Input: abcde_bcdefgh_abcdefghxxxxxxx
Output: abcde_(5,4)fgh_(14,5)fghxxxxxxx

The two parenthesized pairs are the repeats detected during compression. (5,4) means "go back 5 bytes; the matching content is 4 bytes long", i.e. "bcde" is a repeat. You could equally say that "cde" is a repeat, but because the implementation scans the input stream in order, it takes the first match it finds and extends it to the longest possible length.

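1. Compressed format

The compressed data has the following format:

Input: abcde_bcdefgh_abcdefghxxxxxxx
Output: [token]abcde_(5,4)[token]fgh_(14,5)[token]fghxxxxxxx
Format: [token]literals(offset,match length)[token]literals(offset,match length)....

Matches can also occur back to back:

Input: fghabcde_bcdefgh_abcdefghxxxxxxx
Output: fghabcde_(5,4)(13,3)_(14,5)fghxxxxxxx
Format: [token]literals(offset,match length)[token](offset,match length)....
Strictly speaking, the length 3 in (13,3) is not valid: the minimum match length is 4, so a real encoder would emit "fgh" as literals here.

Literals are bytes with no earlier repeat (first occurrences), i.e. the incompressible part.
A match is a repeat, i.e. the compressible part.
The token records the literal length and the match length. The decompressor uses them as the length arguments of its memcpy calls.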
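
To make the token layout concrete, here is a minimal sketch of how a decoder might split a token and read a length. The helper names (token_literal_nibble, token_match_nibble, read_length) are made up for this illustration and do not come from any LZ4 library. It assumes the standard LZ4 block layout: the high 4 bits of the token hold the literal length, the low 4 bits hold the match length, and a field value of 15 means extra length bytes follow. Bounds checking is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: split a token into its two 4-bit fields. */
static unsigned token_literal_nibble(uint8_t token) { return token >> 4; }
static unsigned token_match_nibble(uint8_t token)   { return token & 0x0F; }

/* Hypothetical helper: reads one length field.
 * `nibble` is the 4-bit value taken from the token. A value of 15 means
 * more length bytes follow: keep adding bytes until one is below 255.
 * `*pos` is advanced past any extra bytes consumed.
 * No bounds checking: the caller must guarantee the bytes exist. */
static size_t read_length(const uint8_t *buf, size_t *pos, unsigned nibble)
{
    size_t len = nibble;
    if (nibble == 15) {
        uint8_t b;
        do {
            b = buf[(*pos)++];
            len += b;
        } while (b == 255);
    }
    return len;
}
```

For a match, the decoder would still add the minimum match length of 4 to the value returned by read_length.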

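2. Compression ratio

As you would expect, the more repeats there are, or the longer they are, the higher the compression ratio. In the example above, "bcde" is represented as (5,4) after compression, so 4 bytes are stored in 3 bytes: 2 bytes for the offset and 1 byte (the token) for the match length. That saves 1 byte.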
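
The byte accounting can be written down as a small calculation. This is a sketch under the assumption of the standard LZ4 block layout (1 token byte, optional extra length bytes, the literals, a 2-byte offset); sequence_size and extra_length_bytes are hypothetical names used only for this example.

```c
#include <stddef.h>

#define MINMATCH 4  /* shortest match LZ4 can encode */

/* Extra bytes needed once a length no longer fits the 4-bit token field. */
static size_t extra_length_bytes(size_t len)
{
    return (len < 15) ? 0 : (len - 15) / 255 + 1;
}

/* Hypothetical helper: bytes occupied by one sequence, i.e.
 * token + literal length bytes + literals + offset + match length bytes.
 * match_len must be at least MINMATCH. */
static size_t sequence_size(size_t literal_len, size_t match_len)
{
    size_t stored = match_len - MINMATCH;  /* the format stores length - 4 */
    return 1 + extra_length_bytes(literal_len) + literal_len
             + 2 + extra_length_bytes(stored);
}
```

With no literals, sequence_size(0, 4) gives 3, which is the "4 bytes become 3" case described above. Longer matches cost only one extra byte per additional 255 bytes of match, which is why long runs compress so well.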

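3. Compression algorithm implementation

In outline, compression scans the input with a window of at least 4 bytes looking for a match, advancing 1 byte at a time, and compresses whenever it finds a repeat.
Because the offset is stored in 2 bytes, a match can only be found within a distance of less than 2^16 bytes (64 KB). For compressing a 4 KB kernel page, 12 bits are enough.
The 1-byte scan step is adjustable, which is what the LZ4_compress_fast mechanism (its acceleration parameter) corresponds to: a larger step speeds up compression at the cost of a lower compression ratio.

Let's look at Apple's LZ4 implementation: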
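
The scan described above can be sketched roughly as follows. This is a simplified illustration, not the reference LZ4 encoder and not Apple's code: find_first_match, read32, hash4, HASH_BITS and the multiplicative hash constant are all assumptions chosen for the example, and the step parameter only imitates the idea of a larger scan stride.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 12
#define HASH_SIZE (1u << HASH_BITS)

/* Reads 4 bytes without assuming alignment. */
static uint32_t read32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Assumed hash: multiplicative hash of the 4-byte word into HASH_BITS bits. */
static uint32_t hash4(uint32_t word)
{
    return (word * 2654435761u) >> (32 - HASH_BITS);
}

/* Hypothetical helper: scans src in strides of `step` bytes and returns the
 * position of the first 4-byte window that repeats an earlier window less
 * than 64 KB back. The distance is stored in *offset. Returns -1 if no
 * match is found. */
static long find_first_match(const uint8_t *src, size_t len, size_t step,
                             size_t *offset)
{
    int32_t table[HASH_SIZE];
    for (size_t i = 0; i < HASH_SIZE; i++) table[i] = -1;  /* empty slots */

    for (size_t pos = 0; pos + 4 <= len; pos += step) {
        uint32_t word = read32(src + pos);
        uint32_t h = hash4(word);
        int32_t cand = table[h];       /* earlier position with the same hash */
        table[h] = (int32_t)pos;       /* remember the current position */
        if (cand >= 0 && pos - (size_t)cand < 0x10000 &&
            read32(src + cand) == word) {   /* rule out hash collisions */
            *offset = pos - (size_t)cand;
            return (long)pos;
        }
    }
    return -1;
}
```

Calling it on the 8-byte input "aaaaaaaa" with step 1 reports a match at position 1 with offset 1. With step 2 the first match is only seen at position 2, which illustrates how a larger stride does less work but can miss or delay matches.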

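// src is the input stream and dst is the output. A hash table records the strings seen within some distance back, and is used to look up whether an earlier match exists.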
void lz4_encode_2gb(uint8_t ** dst_ptr,
                    size_t dst_size,
                    const uint8_t ** src_ptr,
                    const uint8_t * src_begin,
                    size_t src_size,
                    lz4_hash_entry_t hash_table[LZ4_COMPRESS_HASH_ENTRIES],
                    int skip_final_literals)
{
  uint8_t *dst = *dst_ptr;        // current output stream position
  uint8_t *end = dst + dst_size - LZ4_GOFAST_SAFETY_MARGIN;
  const uint8_t *src = *src_ptr;  // current input stream literal to encode
  const uint8_t *src_end = src + src_size - LZ4_GOFAST_SAFETY_MARGIN;
  const uint8_t *match_begin = 0; // first byte of matched sequence
  const uint8_t *match_end = 0;   // first byte after matched sequence
// Apple uses an early-abort mechanism here: if the scan reaches position LZ4_EARLY_ABORT_EVAL and there is still no match, the input is considered incompressible and encoding stops early instead of scanning everything
#if LZ4_EARLY_ABORT
  uint8_t * const dst_begin = dst;
  uint32_t lz4_do_abort_eval = lz4_do_early_abort;
#endif
  
  while (dst < end)
  {
    ptrdiff_t match_distance = 0;
    // each run of the for loop jumps out to EXPAND_FORWARD as soon as it finds one match
    for (match_begin = src; match_begin < src_end; match_begin += 1) {
      const uint32_t pos = (uint32_t)(match_begin - src_begin);
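      // Apple's implementation is a little unusual here: it looks up matches for four consecutive
      // byte positions at once (this appears to be manual 4x loop unrolling, see the DRKTODO note below)
      const uint32_t w0 = load4(match_begin); // the 4 bytes at this position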
      const uint32_t w1 = load4(match_begin + 1);
      const uint32_t w2 = load4(match_begin + 2);
      const uint32_t w3 = load4(match_begin + 3);
      const int i0 = lz4_hash(w0);
      const int i1 = lz4_hash(w1);
      const int i2 = lz4_hash(w2);
      const int i3 = lz4_hash(w3);
      const uint8_t *c0 = src_begin + hash_table[i0].offset;
      const uint8_t *c1 = src_begin + hash_table[i1].offset;
      const uint8_t *c2 = src_begin + hash_table[i2].offset;
      const uint8_t *c3 = src_begin + hash_table[i3].offset;
      const uint32_t m0 = hash_table[i0].word; // word previously stored in this hash slot, if any
      const uint32_t m1 = hash_table[i1].word;
      const uint32_t m2 = hash_table[i2].word;
      const uint32_t m3 = hash_table[i3].word;
      hash_table[i0].offset = pos;
      hash_table[i0].word = w0;
      hash_table[i1].offset = pos + 1;
      hash_table[i1].word = w1;

      hash_table[i2].offset = pos + 2;
      hash_table[i2].word = w2;
      hash_table[i3].offset = pos + 3;
      hash_table[i3].word = w3;

      match_distance = (match_begin - c0);
      // compare the word stored in the hash table with the word at the current position
      if (w0 == m0 && match_distance < 0x10000 && match_distance > 0) {
        match_end = match_begin + 4;
        goto EXPAND_FORWARD;
      }

      match_begin++;
      match_distance = (match_begin - c1);
      if (w1 == m1 && match_distance < 0x10000 && match_distance > 0) {
        match_end = match_begin + 4;
        goto EXPAND_FORWARD;
      }

      match_begin++;
      match_distance = (match_begin - c2);
      if (w2 == m2 && match_distance < 0x10000 && match_distance > 0) {
        match_end = match_begin + 4;
        goto EXPAND_FORWARD;
      }

      match_begin++;
      match_distance = (match_begin - c3);
      if (w3 == m3 && match_distance < 0x10000 && match_distance > 0) {
        match_end = match_begin + 4;
        goto EXPAND_FORWARD;
      }

#if LZ4_EARLY_ABORT
      //DRKTODO: Evaluate unrolling further. 2xunrolling had some modest benefits
      if (lz4_do_abort_eval && ((pos) >= LZ4_EARLY_ABORT_EVAL)) {
          ptrdiff_t dstd = dst - dst_begin;
          // still no match by this point: give up
          if (dstd == 0) {
              lz4_early_aborts++;
              return;
          }

/*        if (dstd >= pos) { */
/*            return; */
/*        } */
/*        ptrdiff_t cbytes = pos - dstd; */
/*        if ((cbytes * LZ4_EARLY_ABORT_MIN_COMPRESSION_FACTOR) > pos)  { */
/*            return; */
/*        } */
          lz4_do_abort_eval = 0;
      }
#endif
    }
    // reaching this point means the whole for loop found no match, so the rest of src is simply copied to dst as literals
    if (skip_final_literals) { *src_ptr = src; *dst_ptr = dst; return; } // do not emit the final literal sequence
    
    //  Emit a trailing literal that covers the remainder of the source buffer,
    //  if we can do so without exceeding the bounds of the destination buffer.
    size_t src_remaining = src_end + LZ4_GOFAST_SAFETY_MARGIN - src;
    if (src_remaining < 15) {
      *dst++ = (uint8_t)(src_remaining << 4);
      memcpy(dst, src, 16); dst += src_remaining;
    } else {
      *dst++ = 0xf0;
      dst = lz4_store_length(dst, end, (uint32_t)(src_remaining - 15));
      if (dst == 0 || dst + src_remaining >= end) return;
      memcpy(dst, src, src_remaining); dst += src_remaining;
    }
    *dst_ptr = dst;
    *src_ptr = src + src_remaining;
    return;
    
  EXPAND_FORWARD:
    
    // Expand match forward: check whether the match can be extended forward to increase its length
    {
      const uint8_t * ref_end = match_end - match_distance;
      while (match_end < src_end)
      {
        size_t n = lz4_nmatch(LZ4_MATCH_SEARCH_LOOP_SIZE, ref_end, match_end);
        if (n < LZ4_MATCH_SEARCH_LOOP_SIZE) { match_end += n; break; }
        match_end += LZ4_MATCH_SEARCH_LOOP_SIZE;
        ref_end += LZ4_MATCH_SEARCH_LOOP_SIZE;
      }
    }
    
    // Expand match backward: check whether the match can be extended backward to increase its length
    {
      // match_begin_min = max(src_begin + match_distance,literal)
      const uint8_t * match_begin_min = src_begin + match_distance;
      match_begin_min = (match_begin_min < src)?src:match_begin_min;
      const uint8_t * ref_begin = match_begin - match_distance;
      
      while (match_begin > match_begin_min && ref_begin[-1] == match_begin[-1] ) { match_begin -= 1; ref_begin -= 1; }
    }
    
    // Emit match: once the match offset and length are settled, encode them in the compressed format
    dst = lz4_emit_match((uint32_t)(match_begin - src), (uint32_t)(match_end - match_begin), (uint32_t)match_distance, dst, end, src);
    if (!dst) return;
    
    // Update state
    src = match_end;
    
    // Update return values to include the last fully encoded match
    // refresh the src and dst positions, then go back to the while loop and start the for loop again
    *dst_ptr = dst;
    *src_ptr = src;
  }
}
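
The listing calls lz4_emit_match to write each sequence, but its source is not shown here. As a rough idea of what emitting one sequence involves, here is a sketch based only on the compressed format described in section 1. It is not Apple's lz4_emit_match: emit_sequence and put_length are hypothetical names, the signature is invented for the example, and there is no bounds checking against the end of the output buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes the part of a length that did not fit in the token:
 * a run of 255s followed by one final byte below 255. */
static uint8_t *put_length(uint8_t *dst, size_t rest)
{
    while (rest >= 255) { *dst++ = 255; rest -= 255; }
    *dst++ = (uint8_t)rest;
    return dst;
}

/* Hypothetical sketch: writes token, literals, offset and match length.
 * match_len must be at least 4. Returns the new output position.
 * No bounds checking: dst must be large enough. */
static uint8_t *emit_sequence(uint8_t *dst, const uint8_t *literals,
                              size_t literal_len, size_t match_len,
                              uint16_t offset)
{
    size_t stored = match_len - 4;  /* the minimum match of 4 is implicit */
    uint8_t lit_nib = literal_len < 15 ? (uint8_t)literal_len : 15;
    uint8_t mat_nib = stored < 15 ? (uint8_t)stored : 15;

    *dst++ = (uint8_t)((lit_nib << 4) | mat_nib);           /* token */
    if (lit_nib == 15) dst = put_length(dst, literal_len - 15);
    memcpy(dst, literals, literal_len);                      /* literals */
    dst += literal_len;
    *dst++ = (uint8_t)(offset & 0xFF);                       /* offset, low byte first */
    *dst++ = (uint8_t)(offset >> 8);
    if (mat_nib == 15) dst = put_length(dst, stored - 15);   /* long match */
    return dst;
}
```

For the first sequence of the earlier example (literals "abcde_", match (5,4)), this writes 9 bytes: the token 0x60, the six literal bytes, and the offset bytes 05 00.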

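A real example of memory compression on Android

This example is a 4 KB page starting at address 0xffffffc06185f000, consisting mostly of 0s and 1s. Because the match lengths are too long to fit in the token, some extra handling is involved; see the Android LZ4 source for that part.

Two matches are found and the compressed data is 31 bytes. An overview of the compressed result: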
09-15 14:35:06.821 <3>[138, kswapd0][  638.194336]  src 0xffffffc06185f000 literallen 1
09-15 14:35:06.821 <3>[138, kswapd0][  638.194349]  src 0xffffffc06185f000 (1,219)   #(offset,match length)
09-15 14:35:06.821 <3>[138, kswapd0][  638.194359]  src 0xffffffc06185f000 literallen 1
09-15 14:35:06.821 <3>[138, kswapd0][  638.194386]  src 0xffffffc06185f000 (3044,7)
09-15 14:35:06.821 <3>[138, kswapd0][  638.194400]  src 0xffffffc06185f000 count 2 compressed 31
---------------------------corresponding raw compressed data-----------------------------
First match:
09-15 14:35:06.821 <3>[138, kswapd0][  638.194411]   0xffffffc06185f000 31    #token: 0001 1111; the high 4 bits give literal length 1, the low 4 bits (15) mean the match length overflows into the bytes that follow
09-15 14:35:06.821 <3>[138, kswapd0][  638.194422]   0xffffffc06185f000 0     #literal
09-15 14:35:06.821 <3>[138, kswapd0][  638.194433]   0xffffffc06185f000 1     #offset, low byte (little-endian)
09-15 14:35:06.821 <3>[138, kswapd0][  638.194444]   0xffffffc06185f000 0     #offset, high byte
09-15 14:35:06.821 <3>[138, kswapd0][  638.194459]   0xffffffc06185f000 255   #matchLength begin
09-15 14:35:06.821 <3>[138, kswapd0][  638.194469]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194483]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194494]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194505]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194551]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194565]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194579]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194590]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194602]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194612]   0xffffffc06185f000 255   
09-15 14:35:06.822 <3>[138, kswapd0][  638.194624]   0xffffffc06185f000 219   #matchLength end: the extra bytes sum to 219+255*11 = 3024; adding the token's 15 and the implicit minimum of 4 gives a match length of 3043
Second match:
09-15 14:35:06.822 <3>[138, kswapd0][  638.194635]   0xffffffc06185f000 31    #token: 0001 1111; the high 4 bits give literal length 1
09-15 14:35:06.822 <3>[138, kswapd0][  638.194646]   0xffffffc06185f000 1     #literal
09-15 14:35:06.822 <3>[138, kswapd0][  638.194657]   0xffffffc06185f000 228   #offset, low byte
09-15 14:35:06.822 <3>[138, kswapd0][  638.194667]   0xffffffc06185f000 11    #offset, high byte: 228 (1110 0100) and 11 (1011) read as little-endian give 1011 1110 0100, i.e. 3044
09-15 14:35:06.822 <3>[138, kswapd0][  638.194678]   0xffffffc06185f000 255   #matchLength begin
09-15 14:35:06.822 <3>[138, kswapd0][  638.194689]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194701]   0xffffffc06185f000 255
09-15 14:35:06.822 <3>[138, kswapd0][  638.194712]   0xffffffc06185f000 255
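09-15 14:35:06.822 <3>[138, kswapd0][  638.194747]   0xffffffc06185f000 7     #matchLength end: the extra bytes sum to 255*4+7 = 1027; adding the token's 15 and the implicit minimum of 4 gives a match length of 1046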
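
The arithmetic behind the dump can be checked with two small helpers. This is a sketch assuming the standard LZ4 block layout (little-endian 2-byte offset; match length = token nibble + extra bytes + 4); offset_from_bytes and match_length are hypothetical names used only here.

```c
#include <stddef.h>
#include <stdint.h>

/* Combines the two offset bytes, low byte first (little-endian). */
static unsigned offset_from_bytes(uint8_t low, uint8_t high)
{
    return (unsigned)low | ((unsigned)high << 8);
}

/* Hypothetical helper: full match length from the token's low nibble,
 * the number of 255 bytes that follow, and the final length byte.
 * The extra bytes only count when the nibble is saturated at 15. */
static size_t match_length(unsigned nibble, size_t count_255, unsigned last)
{
    size_t len = nibble;
    if (nibble == 15) len += 255 * count_255 + last;
    return len + 4;  /* implicit minimum match length */
}
```

With the logged bytes, offset_from_bytes(228, 11) is 3044, match_length(15, 11, 219) is 3043 and match_length(15, 4, 7) is 1046. The two sequences shown take 16 + 9 = 25 bytes and reproduce 1 + 3043 + 1 + 1046 = 4091 bytes of the 4096-byte page. The remaining 5 bytes and 6 compressed bytes would be consistent with a closing literal-only sequence (1 token plus 5 literals) that the log does not print, though that part is an inference rather than something the dump shows.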

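Decompression algorithm

Once compression is understood, decompression is simple too.

Input: [token]abcde_(5,4)[token]fgh_(14,5)[token]fghxxxxxxx
Output: abcde_bcdefgh_abcdefghxxxxxxx

Walking through the compressed stream, read the lengths from the token and copy the literals straight to the output, i.e. memcpy(dst, src, length).
On a match, copy the bytes from the earlier, already written part of the output to the current output position.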
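
The two steps above can be put together into a small decoder. This is a minimal sketch assuming the standard LZ4 block layout, not the kernel's or Apple's decoder; lz4_decode_sketch is a hypothetical name. The match is copied byte by byte instead of with memcpy because source and destination may overlap when the offset is smaller than the match length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of an LZ4 block decoder.
 * Returns the number of bytes written, or -1 on malformed input
 * or insufficient output space. */
static long lz4_decode_sketch(const uint8_t *src, size_t src_len,
                              uint8_t *dst, size_t dst_cap)
{
    size_t ip = 0, op = 0;  /* input and output positions */
    while (ip < src_len) {
        uint8_t token = src[ip++];
        uint8_t b;

        /* literal length: high nibble, plus extra bytes if it is 15 */
        size_t lit = token >> 4;
        if (lit == 15) {
            do {
                if (ip >= src_len) return -1;
                b = src[ip++];
                lit += b;
            } while (b == 255);
        }
        if (lit > src_len - ip || lit > dst_cap - op) return -1;
        memcpy(dst + op, src + ip, lit);  /* literals go straight to the output */
        ip += lit;
        op += lit;
        if (ip == src_len) break;         /* the last sequence has literals only */

        /* 2-byte little-endian offset, counted back from the output position */
        if (src_len - ip < 2) return -1;
        size_t offset = (size_t)src[ip] | ((size_t)src[ip + 1] << 8);
        ip += 2;
        if (offset == 0 || offset > op) return -1;

        /* match length: low nibble, plus extra bytes if it is 15, plus 4 */
        size_t mlen = token & 0x0F;
        if (mlen == 15) {
            do {
                if (ip >= src_len) return -1;
                b = src[ip++];
                mlen += b;
            } while (b == 255);
        }
        mlen += 4;
        if (mlen > dst_cap - op) return -1;
        for (size_t i = 0; i < mlen; i++) {  /* overlap-safe copy */
            dst[op] = dst[op - offset];
            op++;
        }
    }
    return (long)op;
}
```

For the example input, the three sequences would carry the tokens 0x60 (6 literals, match length 4), 0x41 (4 literals, match length 5) and 0xA0 (10 literals, no match), and decoding them yields the 29-byte string abcde_bcdefgh_abcdefghxxxxxxx.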
