一、好言
熬得住就出眾,熬不住就出局,相信糟糕得日志熬過(guò)去了,剩下得就是好運(yùn)氣 。
二、背景
上周一晚上十點(diǎn)多,就開(kāi)始MQ消息累積然后報(bào)警,運(yùn)維早上找我看,那些消息累積了,然后運(yùn)維重啟服務(wù)后消息就消費(fèi)了,然后過(guò)一會(huì)又累積了,到上午十一點(diǎn)多的時(shí)候,開(kāi)始報(bào)內(nèi)存過(guò)高,然后發(fā)現(xiàn)已經(jīng)又兩臺(tái)服務(wù)已經(jīng)掛了,看了日志,有錯(cuò)誤,但是沒(méi)有很多可用的錯(cuò)誤。所以重啟,吃完飯服務(wù)又掛了,jconsole看到結(jié)果如下圖:

然后下午的時(shí)候內(nèi)存的爆發(fā)簡(jiǎn)直更加頻繁了

三:解決
3.1、內(nèi)容
出現(xiàn)內(nèi)存溢出問(wèn)題,所以肯定首要需要dump內(nèi)存溢出文件,所以先讓運(yùn)維導(dǎo)出*.hprof 文件,然后我也查看日志文件,我們使用kibana進(jìn)行日志統(tǒng)計(jì)了,所以查看其中錯(cuò)誤日志,的確又很多錯(cuò)誤的信息,redis錯(cuò)誤,MQ錯(cuò)誤,還有mysql錯(cuò)誤。
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is com.lambdaworks.redis.RedisCommandTimeoutException: Command timed out
at org.springframework.data.redis.connection.lettuce.LettuceExceptionC
onverter.convert(LettuceExceptionConverter.java:66)
at org.springframework.data.redis.connection.lettuce.LettuceExceptionC
onverter.convert(LettuceExceptionConverter.java:41)
at org.springframework.data.redis.PassThroughExceptionTranslationStr
ategy.translate(PassThroughExceptionTranslationStrategy.java:37)
[ActiveMQ Transport: tcp://mqtest.com/10.10.10.10:6161@123] org.apache.activemq.transport.failover.FailoverTransport - Transport (tcp://mqtest.com/10.10.10.10:6161) failed, attempting to automatically reconnect
java.io.IOException: Unexpected error occurred: java.lang.OutOfMemoryError: Java heap space
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:222)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
如上兩個(gè)認(rèn)為是內(nèi)存溢出了,造成沒(méi)有內(nèi)存可使用沒(méi)法開(kāi)線程處理。
看看下面mysql錯(cuò)誤
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackExceptio
n: Lock wait timeout exceeded; try restarting transaction
; SQL []; Lock wait timeout exceeded; try restarting transaction;
nested exception is
com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackExceptio
n: Lock wait timeout exceeded; try restarting transaction
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTransl
ator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:259)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTran
slator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfP
ossible(MyBatisExceptionTranslator.java:74)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke
(SqlSessionTemplate.java:421)
at com.sun.proxy.$Proxy31.update(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.update(SqlSessionTemplate.j
ava:270)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:55)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53)
at com.sun.proxy.$Proxy55.updateByPrimaryKey(Unknown
Source)
at com.mouse.moon.app.service.Userervice.update(
UserService.java:26)
at com.mouse.moon.app.service.Userervice$$FastClassBySpringCGLIB$$3eb1bfc8.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.
proceedWithInvocation(TransactionInterceptor.java:99)
at org.springframework.transaction.interceptor.TransactionAspectSuppor
t.invokeWithinTransaction(TransactionAspectSupport.java:281)
at org.springframework.transaction.interceptor.TransactionInterceptor.inv
oke(TransactionInterceptor.java:96)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proc
eed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvised
Interceptor.intercept(CglibAopProxy.java:655)
at com.mouse.moon.app.service.UserService$$EnhancerBySpringCGLIB$$31a8197.update(<generated>)
at com.mouse.moon.app.service.userService.dealUser(MobileAppRegisterService.java:154)
at com.mouse.moon.app.service.UserService.(userService.java:104)
at com.mouse.moon.app.service.UserService$$FastClassBySpringCGLIB$$a99b8cf4.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:281)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:115)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
at com.mysql.jdbc.Util.getInstance(Util.java:387)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:946)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3878)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3814)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2478)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2551)
at com.mysql.jdbc.MultiHostMySQLConnection.execSQL(MultiHostMySQLConnection.java:157)
at sun.reflect.GeneratedMethodAccessor306.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.mysql.jdbc.LoadBalancedConnectionProxy.invokeMore(LoadBalancedConnectionProxy.java:484)
at com.mysql.jdbc.MultiHostConnectionProxy.invoke(MultiHostConnectionProxy.java:452)
at com.sun.proxy.$Proxy63.execSQL(Unknown Source)
at com.mysql.jdbc.MultiHostMySQLConnection.execSQL(MultiHostMySQLConnection.java:157)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1192)
at sun.reflect.GeneratedMethodAccessor314.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.mysql.jdbc.MultiHostConnectionProxy$JdbcInterfaceProxy.invoke(MultiHostConnectionProxy.java:103)
at com.sun.proxy.$Proxy68.execute(Unknown Source)
這個(gè)地方的錯(cuò)誤我在網(wǎng)上也搜過(guò),其實(shí)基本是鎖等待超時(shí)問(wèn)題,這個(gè)一個(gè)同事之前遇到過(guò),跟他交流了下,這個(gè)錯(cuò)誤說(shuō)明我的事物太大了,造成事物等待超時(shí),并且有查處和更新相同數(shù)據(jù)造成的。這個(gè)現(xiàn)在想想這里代碼的業(yè)務(wù)邏輯的確是存在這種問(wèn)題,由于數(shù)據(jù)量大,如果按照現(xiàn)在的查詢條件,地區(qū)會(huì)查詢出相同數(shù)據(jù),并且由于事物放在servicec層,在for循環(huán)中update數(shù)據(jù)的時(shí)候,會(huì)造成事物最后一起提交,所以造成鎖等待超時(shí)問(wèn)題。所以最后更新代碼業(yè)務(wù)如下
1:對(duì)查詢代碼順序做優(yōu)化
List list = ...
if(符合條件){
list = ...(select ...where)
}
由于數(shù)據(jù)庫(kù)數(shù)據(jù)量有4500w左右,上述代碼先查詢,然后再判斷條件,所以有可能會(huì)查詢兩次數(shù)據(jù)庫(kù),所以代碼優(yōu)化,處理代碼順序,先判斷條件。
2:對(duì)加在service的事物移動(dòng)到dao層,每次更新一次提交一次,把事物縮小。
3:對(duì)于一些處理可以做異步處理
4:做限流操作(使用hystrix做限流)
5:<a href ="http://www.itdecent.cn/p/82c27c58e0b6">對(duì)消費(fèi)做消費(fèi)速度限制</a>
3.2:dump文件分析
由于之前系統(tǒng)是假死狀態(tài),每次沖上去之后,過(guò)十幾分鐘或者半小時(shí)后,可能又會(huì)慢慢回收。所以每次爆的時(shí)候并沒(méi)有dump文件,<a href="http://www.itdecent.cn/p/05e9f46e3f09">腳本</a>
每次沒(méi)dump,最后我們?cè)趦?nèi)存上升期間,只用使用
#查看內(nèi)存
jmap -heap pid
jmap -dump:format=b,file=$dumpfile pid

導(dǎo)出文件,查看如下

上圖是在沒(méi)有處理完sql錯(cuò)誤時(shí)倒出的hprof文件。
下圖是處理完sql異常導(dǎo)出的文件

圖二我們可以看到一個(gè)很大的對(duì)象,就是我畫(huà)紅色的部分,點(diǎn)進(jìn)去看,有三百多萬(wàn)個(gè)對(duì)象,占用內(nèi)存三百多兆,并且我們看到老年代使用的量特別大,基本沒(méi)回收樣。怎么會(huì)有這么大的對(duì)象了,根據(jù)對(duì)象我們找到代碼位置,確定是查詢問(wèn)題,造成大量對(duì)象產(chǎn)生,并且也是之前大量更新,并且數(shù)據(jù)有重復(fù)的概率,所以造成占用大量?jī)?nèi)存。其實(shí)這種發(fā)生也是要有數(shù)據(jù)量的情況下才會(huì)出現(xiàn),發(fā)生溢出.
下面幾張是檢測(cè)到的日志信息數(shù)據(jù):



最后簡(jiǎn)單總結(jié)下處理的過(guò)程,首先需要排除所有可能造成該問(wèn)題的問(wèn)題,也就是先解決掉看到的錯(cuò)誤,然后查看導(dǎo)出dump文件中的大對(duì)象,根據(jù)大對(duì)象查看代碼,進(jìn)一步分析問(wèn)題。其實(shí)在這個(gè)中間,做了很多各種處理看效果,比如MQ限流消費(fèi)處理等。但是現(xiàn)在看來(lái),最終原因應(yīng)該還是出在數(shù)據(jù)庫(kù)層面。
四:感觸
其實(shí)之前沒(méi)怎么解決過(guò)內(nèi)存溢出問(wèn)題,這算是自己第一次全程經(jīng)歷此次內(nèi)存溢出并解決,不過(guò)首先說(shuō),這代碼并不是我寫(xiě)的代碼,是我被按排接手的項(xiàng)目,所以還的感謝這之前寫(xiě)這代碼的人,才使得我有這么一次經(jīng)歷,通過(guò)這次經(jīng)歷,也會(huì)讓自己對(duì)于一個(gè)內(nèi)存溢出問(wèn)題的解決思路更佳清晰,思考問(wèn)題更佳全面,處理問(wèn)題更佳穩(wěn)重,并且對(duì)于一些工具的使用,比如MAT,Jprofile有了接觸。所以有些經(jīng)歷真的是不可求的。因此才有了經(jīng)驗(yàn)之說(shuō)。努力吧,騷年。