Starting from a failure to get a connection from the Redis connection pool

Problem description

A colleague from another business line found that an application in the test environment could never obtain a Redis connection, so I helped take a look.
First, the application's error log:

Caused by: org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:97)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:143)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:41)
    at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:85)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:55)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:169)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:149)
    ... 76 more
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at redis.clients.util.Pool.getResource(Pool.java:22)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:90)
    ... 83 more
Caused by: java.util.NoSuchElementException: Could not create a validated object, cause: ValidateObject failed
    at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:871)
    at redis.clients.util.Pool.getResource(Pool.java:20)
    ... 84 more

Investigation

Determining the environment

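The application talks to the Redis server through spring-data-redis, with Jedis as the client. Nobody had touched this system's code in a long time and I had forgotten the details, so the first step was to check which jar versions it uses.
List the relevant jars held open by the application process (PID 19377):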

lsof -p 19377 | grep -E "jedis|pool|redis"

The output includes commons-pool-1.3.jar, spring-data-redis-1.1.1.RELEASE.jar and jedis-2.1.0.jar.
Next, the relevant part of commons-pool's GenericObjectPool.borrowObject():

try {
    _factory.activateObject(latch.getPair().value);
    if(_testOnBorrow &&
            !_factory.validateObject(latch.getPair().value)) {
        throw new Exception("ValidateObject failed");
    }
    synchronized(this) {
        _numInternalProcessing--;
        _numActive++;
    }
    return latch.getPair().value;
}
catch (Throwable e) {
    PoolUtils.checkRethrow(e);
    // object cannot be activated or is invalid
    try {
        _factory.destroyObject(latch.getPair().value);
    } catch (Throwable e2) {
        PoolUtils.checkRethrow(e2);
        // cannot destroy broken object
    }
    synchronized (this) {
        _numInternalProcessing--;
        if (!newlyCreated) {
            latch.reset();
            _allocationQueue.add(0, latch);
        }
        allocate();
    }
    if(newlyCreated) {
        throw new NoSuchElementException("Could not create a validated object, cause: " + e.getMessage());
    }
    else {
        continue; // keep looping
    }
}

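So the client is evidently configured with testOnBorrow, and validating the connection failed. Because the NoSuchElementException is only thrown when newlyCreated is true, the object that failed validation was a freshly created connection, not a stale idle one.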
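The control flow above is easier to follow in a minimal, self-contained sketch of the testOnBorrow branch. This is not the commons-pool API. The class `TestOnBorrowSketch`, its nested `Pool`, and the methods `borrow`/`giveBack` are hypothetical names, and a `Supplier`/`Predicate` stand in for the factory's `makeObject`/`validateObject`. If an idle object fails validation, the pool destroys it and loops. If a newly created object fails, the loop ends with the same `NoSuchElementException` message seen in the log.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of the testOnBorrow logic; NOT the commons-pool API.
final class TestOnBorrowSketch {

    static final class Pool<T> {
        private final Deque<T> idle = new ArrayDeque<>();
        private final Supplier<T> create;      // stands in for factory.makeObject()
        private final Predicate<T> validate;   // stands in for factory.validateObject()
        private final boolean testOnBorrow;
        int destroyed = 0;                     // how many objects were destroyed

        Pool(Supplier<T> create, Predicate<T> validate, boolean testOnBorrow) {
            this.create = create;
            this.validate = validate;
            this.testOnBorrow = testOnBorrow;
        }

        T borrow() {
            while (true) {
                T obj = idle.pollFirst();
                boolean newlyCreated = false;
                if (obj == null) {             // no idle object: create a new one
                    obj = create.get();
                    newlyCreated = true;
                }
                if (!testOnBorrow || validate.test(obj)) {
                    return obj;                // validation skipped or passed
                }
                destroyed++;                   // invalid object: destroy it
                if (newlyCreated) {
                    throw new NoSuchElementException(
                        "Could not create a validated object, cause: ValidateObject failed");
                }
                // an idle object was invalid: keep looping
            }
        }

        void giveBack(T obj) {
            idle.addFirst(obj);
        }
    }

    public static void main(String[] args) {
        // Every connection "fails PING", as when Redis answers with MISCONF.
        Pool<String> pool = new Pool<>(() -> "conn", c -> false, true);
        pool.giveBack("old-idle-conn");
        try {
            pool.borrow();
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage() + " (destroyed=" + pool.destroyed + ")");
        }
    }
}
```

In `main`, the stale idle connection is destroyed first and the loop continues. The pool then creates a new connection, which fails the same way, and only that failure produces the exception.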

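Java has several Redis clients. The project uses spring-data-redis, which itself supports different client implementations such as Jedis and Lettuce, and the error log (JedisConnectionFactory) shows that Jedis is the one in use here.
Following the JedisConnectionFactory source, the object-validation code turns out to be defined in JedisPool: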

public boolean validateObject(final Object obj) {
    if (obj instanceof Jedis) {
        final Jedis jedis = (Jedis) obj;
        try {
            return jedis.isConnected() && jedis.ping().equals("PONG");
        } catch (final Exception e) {
            return false;
        }
    } else {
        return false;
    }
}

Inspecting TCP packets with Wireshark to pin down the cause

Anyone familiar with Redis knows that when a client sends "PING", the server answers "PONG", which is why PING is commonly used to validate a connection.
Since validation failed, let's capture the traffic and look at the requests and responses.

First, find the name of the network interface with ip a. Then use tcpdump to capture the traffic on port 6379 of interface eth1:

tcpdump -i eth1 port 6379 -w target.cap

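Finally, open target.cap in Wireshark. The Redis dissector plugin for Wireshark (redis-wireshark) makes the Redis commands readable.
Around the time printed in the application error log, the client (the application server) had sent an RST packet to the server (the Redis server).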

ws_1.png

That looked suspicious, so I searched backwards from there.

ws_2.png

Just above the arrow the client sent a PING command, so at the arrow the server should have replied with PONG. Instead, it returned:

MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.

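In other words, the Redis server is configured to save RDB snapshots but currently cannot persist them to disk, so commands that may modify the data set are disabled. The redis-3.2.9 server.c source shows that PING is also refused, even though it modifies nothing. Read-only commands are unaffected.
The following code from processCommand(client *c) in redis-3.2.9 server.c handles the case where persistence has failed: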
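Looking at the raw reply line shows why validation fails. RESP status replies start with `+` and error replies start with `-`. The server therefore sent an error reply with the code MISCONF where `validateObject` expects the status `PONG`. In Jedis an error reply makes `ping()` throw, and the `catch` in `validateObject` returns false. The sketch below, a hypothetical class `PingReply`, models only the end result on the reply text.

```java
// Hypothetical helper: classifies a single-line RESP reply the way a PING check sees it.
final class PingReply {

    /** Returns "status <text>", "error <CODE>", or "other". */
    static String describe(String reply) {
        String body = reply.endsWith("\r\n") ? reply.substring(0, reply.length() - 2) : reply;
        if (body.startsWith("+")) {
            return "status " + body.substring(1);            // e.g. +PONG
        }
        if (body.startsWith("-")) {
            String msg = body.substring(1);
            int space = msg.indexOf(' ');
            // By convention the first word of an error reply is the error code.
            return "error " + (space < 0 ? msg : msg.substring(0, space));
        }
        return "other";
    }

    /** A connection passes validation only if PING got exactly PONG back. */
    static boolean pingOk(String reply) {
        return "status PONG".equals(describe(reply));
    }

    public static void main(String[] args) {
        String misconf = "-MISCONF Redis is configured to save RDB snapshots, but is currently "
                + "not able to persist on disk.\r\n";
        System.out.println(describe("+PONG\r\n") + " -> valid=" + pingOk("+PONG\r\n"));
        System.out.println(describe(misconf) + " -> valid=" + pingOk(misconf));
    }
}
```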

/* Don't accept write commands if there are problems persisting on disk
 * and if this is a master instance. */
if (((server.stop_writes_on_bgsave_err &&
      server.saveparamslen > 0 &&
      server.lastbgsave_status == C_ERR) ||
      server.aof_last_write_status == C_ERR) &&
    server.masterhost == NULL &&
    (c->cmd->flags & CMD_WRITE ||
     c->cmd->proc == pingCommand))
{
    flagTransaction(c);
    if (server.aof_last_write_status == C_OK)
        addReply(c, shared.bgsaveerr);
    else
        addReplySds(c,
            sdscatprintf(sdsempty(),
            "-MISCONF Errors writing to the AOF file: %s\r\n",
            strerror(server.aof_last_write_errno)));
    return C_OK;
}
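Restating the guard as a plain boolean function makes it easier to see which commands are refused. The sketch below is a hypothetical Java transliteration. The class `MisconfGuard` is invented here, and its parameters stand in for the server fields and command flags above. `stopWritesOnBgsaveErr` corresponds to the `stop-writes-on-bgsave-error` config option. `saveParamsLen > 0` means at least one `save` rule is configured. `isMaster` stands for `server.masterhost == NULL`.

```java
// Hypothetical transliteration of the redis-3.2.9 processCommand() guard above.
final class MisconfGuard {

    static boolean rejects(boolean stopWritesOnBgsaveErr, int saveParamsLen,
                           boolean lastBgsaveFailed, boolean aofLastWriteFailed,
                           boolean isMaster, boolean isWriteCommand, boolean isPing) {
        // Persistence counts as broken if RDB saving failed (with stop-writes enabled
        // and save rules configured) or if the last AOF write failed.
        boolean persistenceBroken =
                (stopWritesOnBgsaveErr && saveParamsLen > 0 && lastBgsaveFailed)
                || aofLastWriteFailed;
        // Only a master refuses, and only write commands plus PING.
        return persistenceBroken && isMaster && (isWriteCommand || isPing);
    }

    public static void main(String[] args) {
        // Disk full on a master with save rules configured and stop-writes-on-bgsave-error on.
        System.out.println("PING rejected: " + rejects(true, 3, true, false, true, false, true));
        System.out.println("GET  rejected: " + rejects(true, 3, true, false, true, false, false));
    }
}
```

In this model, a read such as GET still succeeds on the broken master, but the PING used by testOnBorrow does not. That is exactly what made every new pooled connection look invalid.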

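After that, the client sent QUIT and the server replied OK.
The MISCONF error says that something went wrong while persisting the RDB snapshot. So I checked the disk usage and the Redis log on the Redis server, and sure enough, the disk was out of space.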

linux_df.png

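At this point the cause was essentially clear: the server hosting Redis had run out of disk space. Because it was a test server, no disk monitoring had been configured. Freeing up space resolved the problem.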

Understanding the RST packet

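One question remained, though: why was there an RST packet at all? Without it, the problem would actually have been harder to find. Going through the Redis packets one by one around the time in the error log would eventually have turned it up, but the RST caught my eye from the start and let me locate the problem much faster.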

A first look at RST

So where does the RST come from?
First, some background on RST (see TCP/IP Illustrated, Volume 1, Section 18.7, "Reset Segments").
In short, an RST is generated when any of the following happens:

  • A connection request arrives for a port that does not exist
  • A connection is aborted
  • A half-open connection is detected
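The first case is easy to reproduce. The sketch below, a hypothetical class `ClosedPortProbe`, grabs an ephemeral loopback port, releases it, and then connects to it. On Linux the kernel answers the SYN with an RST, which Java reports as a `ConnectException` ("Connection refused"). The sketch assumes nothing else binds the port in between.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical probe: a connection request to a port nobody listens on is answered with RST.
final class ClosedPortProbe {

    /** Binds an ephemeral loopback port and releases it, so (very likely) nothing listens there. */
    static int unusedPort() {
        try (ServerSocket ss = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return ss.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the connect attempt is refused, i.e. the SYN was answered with RST. */
    static boolean refused(int port) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            return false;                      // somebody is listening after all
        } catch (ConnectException e) {
            return true;                       // "Connection refused"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = unusedPort();
        System.out.println("port " + port + " refused: " + refused(port));
    }
}
```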

How Jedis and Redis close connections

Look at the few packets just before the RST:

ws_3.png

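Wireshark's Expert Information view lists several RST packets. Every one of them is preceded by a QUIT/OK exchange, which suggests the behavior comes from the client framework.
Going back to the GenericObjectPool code above: if an exception occurs in borrowObject, destroyObject() is called on the pool's object factory. Jedis provides that factory in JedisPool, mentioned above: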

public void destroyObject(final Object obj) throws Exception {
    if (obj instanceof Jedis) {
        final Jedis jedis = (Jedis) obj;
        if (jedis.isConnected()) {
            try {
                try {
                    jedis.quit();
                } catch (Exception e) {
                }
                jedis.disconnect();
            } catch (Exception e) {

            }
        }
    }
}

This eventually calls redis.clients.jedis.Connection#disconnect, which closes the input and output streams:

public void disconnect() {
    if (isConnected()) {
        try {
            inputStream.close();
            outputStream.close();
            if (!socket.isClosed()) {
                socket.close();
            }
        } catch (IOException ex) {
            throw new JedisConnectionException(ex);
        }
    }
}

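This seems to explain the RST:
The client sends QUIT and the server replies OK. After receiving the QUIT reply, the client calls disconnect, which tears down its end of the connection. The server then starts the TCP teardown and sends a FIN to the client's port 51311. By then, the client has already closed that connection in disconnect. All it can do is answer with an RST, telling the server "you sent a close request to a port that does not exist".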

In newer Jedis versions, the code that used to live in JedisPool has moved to JedisFactory, but the overall logic is unchanged:

// 2.10 JedisFactory
@Override
public void destroyObject(PooledObject<Jedis> pooledJedis) throws Exception {
  final BinaryJedis jedis = pooledJedis.getObject();
  if (jedis.isConnected()) {
    try {
      try {
        jedis.quit();
      } catch (Exception e) {
      }
      jedis.disconnect();
    } catch (Exception e) {

    }
  }
}

@Override
public boolean validateObject(PooledObject<Jedis> pooledJedis) {
  final BinaryJedis jedis = pooledJedis.getObject();
  try {
    HostAndPort hostAndPort = this.hostAndPort.get();

    String connectionHost = jedis.getClient().getHost();
    int connectionPort = jedis.getClient().getPort();

    return hostAndPort.getHost().equals(connectionHost)
        && hostAndPort.getPort() == connectionPort && jedis.isConnected()
        && jedis.ping().equals("PONG");
  } catch (final Exception e) {
    return false;
  }
}

The Connection#disconnect it ends up calling has changed, though:

public void disconnect() {
  if (isConnected()) {
    try {
      outputStream.flush();
      socket.close();
    } catch (IOException ex) {
      broken = true;
      throw new JedisConnectionException(ex);
    } finally {
      IOUtils.closeQuietly(socket);
    }
  }
}

The earlier inputStream.close() and outputStream.close() have been replaced by outputStream.flush(). Jedis uses its own buffered RedisOutputStream, so the buffered bytes must be written to the socket before socket.close().
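Plain java.io streams are enough to see why the flush matters. In the sketch below, a `ByteArrayOutputStream` stands in for the socket and a `BufferedOutputStream` for Jedis's `RedisOutputStream`. Both are stand-ins, not Jedis classes, and the class name `FlushBeforeClose` is hypothetical. Closing the socket directly bypasses the wrapper, so without a flush the buffered command never reaches the "wire".

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical model: the byte array is "what reached the socket".
final class FlushBeforeClose {

    static byte[] sendThenCloseSocket(byte[] payload, boolean flushFirst) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();              // stands in for the socket
        BufferedOutputStream buffered = new BufferedOutputStream(wire, 8192);  // stands in for RedisOutputStream
        try {
            buffered.write(payload);   // a small write stays in the 8 KB buffer
            if (flushFirst) {
                buffered.flush();      // what the newer disconnect() does first
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // "socket.close()" happens here without closing (and thus flushing) the wrapper,
        // so whatever is still buffered is abandoned.
        return wire.toByteArray();
    }

    public static void main(String[] args) {
        // Some command bytes still sitting in the client-side buffer.
        byte[] cmd = "*1\r\n$4\r\nQUIT\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("without flush: " + sendThenCloseSocket(cmd, false).length + " bytes sent");
        System.out.println("with flush:    " + sendThenCloseSocket(cmd, true).length + " bytes sent");
    }
}
```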
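Calling disconnect on the client does release resources quickly: it closes the client's port and frees the file descriptor.
Conversely, suppose the server has already released its file descriptor and closed the socket after QUIT, but the client never calls disconnect. The client then holds those resources until the process exits.
The figure below confirms this. In the first run, the socket-closing code in disconnect was commented out and the program slept 10 seconds before exiting. The client's connection was only closed when the process exited. In the second run, with the code restored, the client closed the connection and released its resources right after QUIT.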

ws_4.png

System calls when a Redis connection is opened and closed

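This question bothered me for a whole day: where exactly does the RST come from? Whether the client or the server calls close, a normal four-way teardown should follow.
I read the Redis source that closes client connections (redis 3.2.9 networking.c#unlinkClient) several times, and it simply makes the close(fd) system call. To rule out interference, I even started a fresh Redis instance and traced the system calls around the close with strace -f -p $pid -tt -T: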

[pid 25442] 10:29:42.299132 epoll_wait(3, {{EPOLLIN, {u32=4, u64=4}}}, 11024, 100) = 1 <0.004041>
[pid 25442] 10:29:42.303248 accept(4, {sa_family=AF_INET, sin_port=htons(52294), sin_addr=inet_addr("192.168.3.45")}, [16]) = 5 <0.000025>
[pid 25442] 10:29:42.303356 fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) <0.000014>
[pid 25442] 10:29:42.303417 fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000010>
[pid 25442] 10:29:42.303456 setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000012>
[pid 25442] 10:29:42.303499 epoll_ctl(3, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=5, u64=5}}) = 0 <0.000011>
[pid 25442] 10:29:42.303544 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 11024, 96) = 1 <0.073370>
[pid 25442] 10:29:42.376968 read(5, "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n", 16384) = 31 <0.000014>
[pid 25442] 10:29:42.377071 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLOUT, {u32=5, u64=5}}) = 0 <0.000013>
[pid 25442] 10:29:42.377144 epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 11024, 22) = 1 <0.000017>
[pid 25442] 10:29:42.377210 write(5, "+OK\r\n", 5) = 5 <0.000034>
[pid 25442] 10:29:42.377304 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN, {u32=5, u64=5}}) = 0 <0.000025>
[pid 25442] 10:29:42.377377 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 11024, 22) = 1 <0.007943>
[pid 25442] 10:29:42.385376 read(5, "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n", 16384) = 22 <0.000013>
[pid 25442] 10:29:42.385432 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLOUT, {u32=5, u64=5}}) = 0 <0.000011>
[pid 25442] 10:29:42.385477 epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 11024, 14) = 1 <0.000010>
[pid 25442] 10:29:42.385518 write(5, "$3\r\nbar\r\n", 9) = 9 <0.000019>
[pid 25442] 10:29:42.385567 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN, {u32=5, u64=5}}) = 0 <0.000011>
[pid 25442] 10:29:42.385617 epoll_wait(3, {}, 11024, 14) = 0 <0.014075>
[pid 25442] 10:29:42.399742 epoll_wait(3, {}, 11024, 100) = 0 <0.100126>
[pid 25442] 10:29:42.499930 epoll_wait(3, {}, 11024, 100) = 0 <0.100126>
[pid 25442] 10:29:42.600115 epoll_wait(3, {}, 11024, 100) = 0 <0.100071>
[pid 25442] 10:29:42.700276 epoll_wait(3, {}, 11024, 100) = 0 <0.100131>
[pid 25442] 10:29:42.800482 epoll_wait(3, {}, 11024, 100) = 0 <0.100129>
[pid 25442] 10:29:42.900687 epoll_wait(3, {}, 11024, 100) = 0 <0.100141>
[pid 25442] 10:29:43.000895 epoll_wait(3, {}, 11024, 100) = 0 <0.100132>
[pid 25442] 10:29:43.101095 epoll_wait(3, {}, 11024, 100) = 0 <0.100131>
[pid 25442] 10:29:43.201305 epoll_wait(3, {}, 11024, 100) = 0 <0.100134>
[pid 25442] 10:29:43.301521 epoll_wait(3, {}, 11024, 100) = 0 <0.100136>
[pid 25442] 10:29:43.401725 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 11024, 100) = 1 <0.003552>
[pid 25442] 10:29:43.405350 read(5, "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n", 16384) = 22 <0.000016>
[pid 25442] 10:29:43.405425 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLOUT, {u32=5, u64=5}}) = 0 <0.000011>
[pid 25442] 10:29:43.405477 epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 11024, 96) = 1 <0.000014>
[pid 25442] 10:29:43.405531 write(5, "$3\r\nbar\r\n", 9) = 9 <0.000022>
[pid 25442] 10:29:43.405601 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN, {u32=5, u64=5}}) = 0 <0.000011>
[pid 25442] 10:29:43.405660 epoll_wait(3, {}, 11024, 96) = 0 <0.096129>
[pid 25442] 10:29:43.501877 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 11024, 100) = 1 <0.003474>
[pid 25442] 10:29:43.505429 read(5, "*1\r\n$4\r\nQUIT\r\n", 16384) = 14 <0.000018>
[pid 25442] 10:29:43.505514 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLOUT, {u32=5, u64=5}}) = 0 <0.000015>
[pid 25442] 10:29:43.505578 epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 11024, 96) = 1 <0.000012>
[pid 25442] 10:29:43.505623 write(5, "+OK\r\n", 5) = 5 <0.000028>
[pid 25442] 10:29:43.505693 epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN, {u32=5, u64=5}}) = 0 <0.000016>
[pid 25442] 10:29:43.505764 epoll_ctl(3, EPOLL_CTL_DEL, 5, {0, {u32=5, u64=5}}) = 0 <0.000016>
[pid 25442] 10:29:43.505830 close(5)    = 0 <0.000111>
[pid 25442] 10:29:43.505992 epoll_wait(3, {}, 11024, 96) = 0 <0.096134>

The Java client JUnit test code (adapted from the Jedis test case JedisPoolTest#checkConnections):

    JedisPool pool = new JedisPool(new JedisPoolConfig(), hnp.getHost(), hnp.getPort(), 2000);
    Jedis jedis = pool.getResource();
    jedis.set("foo", "bar");
    assertEquals("bar", jedis.get("foo"));
    pool.returnResource(jedis);

    try {
      Thread.sleep(1*1000);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
    System.out.println("hello");
    jedis.get("foo");
    pool.destroy();
    assertTrue(pool.isClosed());

Looking at the server's system calls:

setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
...
close(5) = 0

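When accepting the connection, the server only sets TCP_NODELAY, which disables Nagle's algorithm, and later it simply calls close(5). Nothing on the server side explains the RST.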

Socket settings in the Jedis client

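Just as I was running out of ideas, it occurred to me that the Redis client might be setting some socket options.
Sure enough, in redis.clients.jedis.Connection, the Jedis class that manages the connection, I found the socket settings applied at connect time: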

public void connect() {
    if (!isConnected()) {
      try {
        socket = new Socket();
        // ->@wjw_add
        socket.setReuseAddress(true);
        socket.setKeepAlive(true); // Will monitor the TCP connection is
        // valid
        socket.setTcpNoDelay(true); // Socket buffer Whetherclosed, to
        // ensure timely delivery of data
        socket.setSoLinger(true, 0); // Control calls close () method,
        // the underlying socket is closed
        // immediately
        // <-@wjw_add

        socket.connect(new InetSocketAddress(host, port), connectionTimeout);
        socket.setSoTimeout(soTimeout);

        if (ssl) {
          if (null == sslSocketFactory) {
            sslSocketFactory = (SSLSocketFactory)SSLSocketFactory.getDefault();
          }
          socket = (SSLSocket) sslSocketFactory.createSocket(socket, host, port, true);
          if (null != sslParameters) {
            ((SSLSocket) socket).setSSLParameters(sslParameters);
          }
          if ((null != hostnameVerifier) &&
              (!hostnameVerifier.verify(host, ((SSLSocket) socket).getSession()))) {
            String message = String.format(
                "The connection to '%s' failed ssl/tls hostname verification.", host);
            throw new JedisConnectionException(message);
          }
        }

        outputStream = new RedisOutputStream(socket.getOutputStream());
        inputStream = new RedisInputStream(socket.getInputStream());
      } catch (IOException ex) {
        broken = true;
        throw new JedisConnectionException("Failed connecting to host " 
            + host + ":" + port, ex);
      }
    }
  }

The call socket.setSoLinger(true, 0); caught my attention.
The SCTP sockets API RFC (RFC 6458) explains SO_LINGER as follows:

If the l_linger value is set to 0, calling close() is the same as the ABORT primitive.

And for SCTP_ABORT:

SCTP_ABORT: Setting this flag causes the specified association
to abort by sending an ABORT message to the peer. The ABORT
chunk will contain an error cause of 'User Initiated Abort'
with cause code 12. The cause-specific information of this
error cause is provided in msg_iov.

That was not very clear to me, so let's look at how the TCP RFC (RFC 793) describes ABORT:

This command causes all pending SENDs and RECEIVES to be
aborted, the TCB to be removed, and a special RESET message to
be sent to the TCP on the other side of the connection.
Depending on the implementation, users may receive abort
indications for each outstanding SEND or RECEIVE, or may simply
receive an ABORT-acknowledgment.
Note: the TCB (Transmission Control Block) is the abstract data structure that holds a connection's state.

The SO_LINGER socket option forces an abortive close

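Only now did it make sense. Because the Jedis client calls socket.setSoLinger(true, 0); when connecting, closing the connection is equivalent to a TCP ABORT. Any data still being sent or received is discarded, and a RESET (RST) segment is sent straight to the peer. This is also why Jedis flushes its buffer before socket.close(): the flush hands the bytes buffered in RedisOutputStream to the socket. Note that with a zero linger time, the kernel may still discard data it has not yet transmitted.
After I removed the SO_LINGER setting from the client, the normal TCP teardown finally showed up again.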

ws_5.png
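The difference can also be observed from Java on a single machine. The sketch below, a hypothetical class `LingerZeroDemo`, connects a client to a local `ServerSocket`, closes the client, and reports what the server's `read()` sees. With default settings, the server receives a FIN and `read()` returns -1 (EOF). With `setSoLinger(true, 0)`, the setting Jedis applies in `connect()`, the client's close sends an RST instead. On Linux this surfaces on the server side as a `SocketException` ("Connection reset"). Exact messages may vary by JDK and OS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical demo: how the peer sees a normal close vs. a SO_LINGER=0 close.
final class LingerZeroDemo {

    /** Returns "EOF" (FIN received), "RESET" (RST received) or "DATA". */
    static String serverSeesOnClose(boolean lingerZero) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            if (lingerZero) {
                client.setSoLinger(true, 0);   // same option Jedis sets in Connection.connect()
            }
            client.connect(server.getLocalSocketAddress(), 2000);
            try (Socket accepted = server.accept()) {
                accepted.setSoTimeout(2000);   // don't hang forever if nothing arrives
                client.close();                // FIN by default, RST with linger 0
                try {
                    return accepted.getInputStream().read() == -1 ? "EOF" : "DATA";
                } catch (SocketException e) {
                    return "RESET";            // e.g. "Connection reset" on Linux
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default close: " + serverSeesOnClose(false));
        System.out.println("SO_LINGER=0:   " + serverSeesOnClose(true));
    }
}
```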

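If you want to dig deeper, read the Linux source net/ipv4/tcp.c. I skimmed it, and the logic is clear, although details differ between kernel versions. If SO_LINGER is enabled with a linger time of 0, close calls tcp_disconnect directly, which sends an RST instead of going through the usual four-way teardown. Personally I find this inelegant. A more graceful option might be socket.setSoLinger(true, timeout) with a non-zero timeout.
The Jedis GitHub issue "Improving socket performance" describes adding the following four settings to improve performance: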

socket.setReuseAddress(true);
socket.setKeepAlive(true);
socket.setTcpNoDelay(true);
socket.setSoLinger(true,0);

I left a comment on that issue asking about this and will update this post if I hear back.
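The author's suggestion of a non-zero linger timeout can be combined with Jedis's other three options, with the linger behavior made configurable. This is a hypothetical helper; `SocketOptions.configure` is not a Jedis API. A negative value leaves SO_LINGER disabled, so close() returns immediately and the kernel completes the FIN teardown in the background. A value of 0 gives the abortive RST close discussed above. A positive value makes close() block for up to that many seconds while unsent data is delivered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper, not part of Jedis: Jedis-like socket options with a configurable linger time.
final class SocketOptions {

    static void configure(Socket socket, int lingerSeconds) throws SocketException {
        socket.setReuseAddress(true);
        socket.setKeepAlive(true);
        socket.setTcpNoDelay(true);
        if (lingerSeconds < 0) {
            socket.setSoLinger(false, 0);             // default: normal FIN close, close() does not block
        } else {
            socket.setSoLinger(true, lingerSeconds);  // 0 = abortive close (RST), >0 = bounded graceful close
        }
    }

    /** Configures a fresh, unconnected socket and reads the options back. */
    static String optionsFor(int lingerSeconds) {
        try (Socket socket = new Socket()) {
            configure(socket, lingerSeconds);
            return "reuse=" + socket.getReuseAddress()
                    + " keepalive=" + socket.getKeepAlive()
                    + " nodelay=" + socket.getTcpNoDelay()
                    + " linger=" + socket.getSoLinger();   // -1 means SO_LINGER is disabled
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(optionsFor(0));   // Jedis-style abortive close
        System.out.println(optionsFor(5));   // graceful close bounded by 5 seconds
    }
}
```

Whether a bounded linger is the better trade-off depends on how much a blocking close() costs the caller. Jedis's choice of 0 avoids that blocking at the price of RSTs.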

Summary

The application's Jedis pool could not hand out Redis connections because the Redis server's disk was full, so Redis could not save RDB snapshots. With testOnBorrow set to true, the application validated each connection with PING. Instead of the normal PONG, it received the reply "MISCONF Redis is configured to save RDB snapshots". Jedis therefore judged the connection invalid and forcibly closed it.
I then took a brief look at the RST flag in TCP. Once socket.setSoLinger(true, 0) has been set, closing the socket discards pending data and sends an RST to the peer.
There is plenty left to dig into, and my own network programming knowledge needs work. I plan to study the topic further and look more deeply at excellent open-source projects such as Redis and nginx.


References

  1. Jedis source code - https://github.com/xetorthio/jedis
  2. Commons Pool source code - https://github.com/apache/commons-pool
  3. Spring Data Redis source code - https://github.com/spring-projects/spring-data-redis
  4. redis-wireshark source code - https://github.com/jzwinck/redis-wireshark
  5. Redis source code - https://github.com/antirez/redis
  6. TCP/IP Illustrated (online e-book) - http://www.52im.net/topic-tcpipvol1.html
  7. SCTP RFC - https://tools.ietf.org/html/rfc6458
  8. TCP RFC - https://tools.ietf.org/html/rfc793
  9. Several situations in which RST appears on a TCP connection
  10. setsockopt()--Set Socket Options
  11. StackOverflow: What is AF_INET, and why do I need it?
  12. Socket options series: SO_LINGER (by the author of 深入剖析Nginx) - http://www.lenky.info/archives/2013/02/2220
© Copyright belongs to the author. For reprints or content collaboration, please contact the author.
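Platform statement: the content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and only provides information storage services.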
