背景
某應(yīng)用1.0性能測試
服務(wù)強依賴于mysql, 許多接口都會請求mysql
-
對mysql的請求用GenericObjectPool的連接池來進(jìn)行管理, 設(shè)置如下:
79 connectionPool.setMaxActive(maxActive); 80 connectionPool.setTestOnBorrow(false); 81 connectionPool.setTestOnReturn(true); 82 connectionPool.setTestWhileIdle(true); 83 connectionPool 84 .setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis); 85 86 // Start the Evictor 87 connectionPool 88 .setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);
問題
-
用100并發(fā)打壓服務(wù),發(fā)現(xiàn)拿到連接并在干活的線程數(shù)只有10+, 而其余80+的線程wait在borrowObject的邏輯, 相應(yīng)的stack如下:
有許多線程在blocked在makeObject相關(guān)的邏輯,均在等一把鎖 1462 "resin-tcp-connection-*:3231-321" daemon prio=10 tid=0x000000004dc43800 nid=0x65f5 waiting for monitor entry [0x00000000507ff000] 1463 java.lang.Thread.State: BLOCKED (on object monitor) 1464 at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290) 1465 - waiting to lock <0x00000000b26ee8a8> (a org.apache.commons.dbcp.PoolableConnectionFactory) 1466 at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771) 1467 at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:95) 1468 at outfox.cps.dao.source.CpsDataSource.getConnection(CpsDataSource.java:61) 1469 at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) 1470 at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) 1471 at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:572) 1472 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:636) 1473 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:665) 1474 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:673) 1475 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713) 1476 at outfox.cps.dao.impl.FriendshipDaoImpl.getFansWithTime(FriendshipDaoImpl.java:93) ...... 有一個線程拿到了makeObject相關(guān)的鎖,并在執(zhí)行相關(guān)的邏輯 1208 "resin-tcp-connection-*:3231-149" daemon prio=10 tid=0x000000004d67e800 nid=0x66d7 runnable [0x000000005180f000] 1209 java.lang.Thread.State: RUNNABLE 1210 at java.net.SocketInputStream.socketRead0(Native Method) 1211 at java.net.SocketInputStream.read(SocketInputStream.java:129) 1212 at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113) 1213 at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160) 1214 at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188) 1215 - locked <0x000000009f5a4cc8> (a com.mysql.jdbc.util.ReadAheadInputStream) 1216 at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2329) 1217 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2774) 1218 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763) 1219 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3299) 1220 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1837) 1221 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1961) 1222 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2537) 1223 - locked <0x000000009f5a1600> (a java.lang.Object) 1224 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2466) 1225 at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1383) 1226 - locked <0x000000009f5a1600> (a java.lang.Object) 1227 at com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3752) 1228 at com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3345) 1229 at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2046) 1230 at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:729) 1231 at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46) 1232 at sun.reflect.GeneratedConstructorAccessor26.newInstance(Unknown Source) 1233 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) 1234 at java.lang.reflect.Constructor.newInstance(Constructor.java:513) 1235 at com.mysql.jdbc.Util.handleNewInstance(Util.java:406) 1236 at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:302) 1237 at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:283) 1238 at java.sql.DriverManager.getConnection(DriverManager.java:582) 1239 at java.sql.DriverManager.getConnection(DriverManager.java:207) 1240 at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:46) 1241 at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290) 1242 - locked <0x00000000b26ee8a8> (a org.apache.commons.dbcp.PoolableConnectionFactory) 1243 at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771) 1244 at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:95) 1245 at outfox.cps.dao.source.CpsDataSource.getConnection(CpsDataSource.java:61) 1246 at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) 1247 at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) 1248 at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:572) 1249 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:636) 1250 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:665) 1251 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:673) 1252 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713) 1253 at outfox.cps.dao.impl.FriendshipDaoImpl.getAttentionsWithTime(FriendshipDaoImpl.java:69) 1254 at outfox.cps.api.impl.FriendshipApi$RelationListOperation.copyRealtionListFromDbToRedis(FriendshipApi.java:118) 1255 at outfox.cps.api.impl.FriendshipApi$RelationListOperation.getRelationList(FriendshipApi.java:144) ......
分析
為什么maxActive設(shè)置為100, 還會有如此多的線程在makeObject?這是我們的setMaxActive沒有生效, 還是GenericObjectPool有其他destroy對象的機制被我們忽略了?
查看GenericObjectPool的java doc, 了解對象池關(guān)于對象創(chuàng)建和銷毀相關(guān)的方式以及配置方式
-
打壓開始后, 動態(tài)debug, 發(fā)現(xiàn)pool對象有許多我們未進(jìn)行配置的成員變量,對照doc逐一check各變量
發(fā)現(xiàn)一個可懷疑點: numIdle, 其值一直在2-8之間波動, 且doc中如是描述:maxActive controls the maximum number of objects that can be borrowed from the pool at one time. When non-positive, there is no limit to the number of objects that may be active at one time. When maxActive is exceeded, the pool is said to be exhausted. The default setting for this parameter is 8. maxIdle controls the maximum number of objects that can sit idle in the pool at any time. When negative, there is no limit to the number of objects that may be idle at one time. The default setting for this parameter is 8. 可知默認(rèn)的maxIdle值為8,若未進(jìn)行配置,當(dāng)pool中idle的object超過默認(rèn)值8時,多余的對象就會被destroy
至此,問題基本定位到
驗證
-
增加對maxIdle的配置,即:
79 connectionPool.setMaxActive(maxActive); 80 connectionPool.setMaxIdle(maxActive); //實際中可以設(shè)小一些, 如30 81 connectionPool.setTestOnBorrow(false); 82 connectionPool.setTestOnReturn(true); 83 connectionPool.setTestWhileIdle(true); 84 connectionPool 85 .setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis); 86 87 // Start the Evictor 88 connectionPool 89 .setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis); 重新打壓, 發(fā)現(xiàn)吞吐增長近一倍,響應(yīng)速度也顯著加快,stack中無線程blocked在borrowObject或makeObject相關(guān)的邏輯
另一個重要的問題
-
未設(shè)置maxWait, doc中對maxWait說明如下:
setMaxWait public void setMaxWait(long maxWait) Sets the maximum amount of time (in milliseconds) the borrowObject() method should block before throwing an exception when the pool is exhausted and the "when exhausted" action is WHEN_EXHAUSTED_BLOCK. When less than or equal to 0, the borrowObject() method may block indefinitely. Parameters: maxWait - maximum number of milliseconds to block when borrowing an object. See Also: getMaxWait(), setWhenExhaustedAction(byte), WHEN_EXHAUSTED_BLOCK 可知若不進(jìn)行設(shè)置,當(dāng)被依賴的服務(wù)由于某些故障(如機器資源被某進(jìn)程大量占用)而響應(yīng)極慢時,會有大量線程blocked在borrowObject的邏輯,最終導(dǎo)致resin線程耗盡,服務(wù)卡死,用戶請求被拒絕
可參考之前的一次線上故障:某服務(wù)變慢導(dǎo)致依賴于它的另一web卡死,其原因為改web請求改服務(wù)時的連接池未對borrowObject設(shè)置timeout,導(dǎo)致大量線程blocked在borrowObject的邏輯
-
因此需要對maxWait進(jìn)行設(shè)置,且設(shè)置的時間不宜過大(高吞吐時仍有可能導(dǎo)致web卡死),也不宜過?。赡軐?dǎo)致較多的請求失?。?,具體的配置可參考如下:
基本原則: qt<N, 其中q為服務(wù)最大吞吐(請求/s), t為設(shè)置的maxWait時間(s), N為resin的最大線程數(shù) resin的線程數(shù)當(dāng)前為512, 預(yù)估最大吞吐為100, 因此有t<N/q=512/100=5.12 我們將其配置為3s(即3000ms), 從而當(dāng)該對象池中的對象出現(xiàn)異常時,仍可扛住512/3約為170qps的壓力 -
具體的配置如下:
79 connectionPool.setMaxActive(maxActive); 80 connectionPool.setMaxIdle(maxActive); 81 connectionPool.setMaxWait(maxWait); // maxWait=3000 82 connectionPool.setTestOnBorrow(false); 83 connectionPool.setTestOnReturn(true); 84 connectionPool.setTestWhileIdle(true); 85 connectionPool 86 .setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis); 87 88 // Start the Evictor 89 connectionPool 90 .setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);