Analyzing the stop command of the Erlang application script
A more fitting title for this post would really be "How to shut down an Erlang application safely".
Generating the Erlang application script
Use the rebar tool to create an Erlang node:
<pre>
./rebar create-node nodeid=hook_heroes
</pre>
Then, inside the rel directory, run the packaging command:
<pre>
./rebar generate
</pre>
This generates a complete release package with the following directories:
<pre>
bin erts-6.0 lib log releases
</pre>
Inside bin there is a startup script with the same name as the node, here hook_heroes.
To stop the service we currently run:
<pre>
./hook_heroes stop
</pre>
Analyzing hook_heroes stop
hook_heroes stop executes the following:
<pre>
# Tell nodetool to initiate a stop
$NODETOOL stop
ES=$?
if [ "$ES" -ne 0 ]; then
exit $ES
fi
</pre>
NODETOOL is defined as:
<pre>
NODETOOL="$ERTS_PATH/escript $ERTS_PATH/nodetool
</pre>
That is, the nodetool script shipped in the erts directory, invoked with the argument stop.
nodetool is an escript whose purpose is "Helper Script for interacting with live nodes":
<pre>
case RestArgs of
["getpid"] ->
io:format("~p\n",
[list_to_integer(rpc:call(TargetNode, os, getpid, []))]);
["ping"] ->
io:format("pong\n");
["stop"] ->
io:format("~p\n", [rpc:call(TargetNode, init, stop, [], 60000)]);
.......
</pre>
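As you can see, it simply uses rpc:call(): it calls init:stop on TargetNode with an empty argument list [], and the trailing 60000 is the call timeout in milliseconds. Next, let's look at the stop function of the init module.
The init:stop() call
The documentation describes the init module as "Coordination of System Startup".
The documentation for stop says: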
<pre>
All applications are taken down smoothly, all code is unloaded, and all ports are closed before the system terminates
</pre>
Clearly this is what shuts the system down; the interesting part is how it does so.
The entry point:
<pre>
stop() -> init ! {stop,stop}, ok.
</pre>
This sends a {stop,stop} message to the registered init process.
The init process receives messages in its own loop:
<pre>
loop(State) ->
receive
{'EXIT',Pid,Reason} ->
Kernel = State#state.kernel,
terminate(Pid,Kernel,Reason), %% If Pid is a Kernel pid, halt()!
loop(State);
{stop,Reason} ->
stop(Reason,State);
{From,fetch_loaded} -> %% The Loaded info is cleared in
Loaded = State#state.loaded, %% boot_loop but is handled here
From ! {init,Loaded}, %% anyway.
loop(State);
{From, {ensure_loaded, _}} ->
From ! {init, not_allowed},
loop(State);
Msg ->
loop(handle_msg(Msg,State))
end.
</pre>
The message matches {stop,Reason}, so stop(Reason,State) is called with Reason bound to stop,
which brings us here:
<pre>
stop(Reason,State) ->
BootPid = State#state.bootpid,
{_,Progress} = State#state.status,
State1 = State#state{status = {stopping, Progress}},
clear_system(BootPid,State1),
do_stop(Reason,State1).
</pre>
The functions to focus on are clear_system and do_stop.
The clear_system() function
clear_system() shuts down the processes running in the VM, using just three function calls:
<pre>
clear_system(BootPid,State) ->
Heart = get_heart(State#state.kernel), %A
shutdown_pids(Heart,BootPid,State), %B
unload(Heart). %C
</pre>
A and C both deal with the Erlang startup flag -heart, whose meaning is explained in vm.args:
<pre>
## Heartbeat management; auto-restarts VM if it dies or becomes unresponsive
## (Disabled by default..use with caution!)
##-heart
</pre>
Normally -heart is not used,
so here we only look at what shutdown_pids() does.
The shutdown_pids() function
<pre>
shutdown_pids(Heart,BootPid,State) ->
Timer = shutdown_timer(State#state.flags),
catch shutdown(State#state.kernel,BootPid,Timer,State),
kill_all_pids(Heart), % Even the shutdown timer.
kill_all_ports(Heart),
flush_timout(Timer).
</pre>
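This first sets up the shutdown timer (governed by the -shutdown_time flag), then shuts down the kernel processes, then kills all remaining processes and ports, and finally flushes any pending timeout message.
Shutting down the kernel processes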
<pre>
%%
%% A kernel pid must handle the special case message
%% {'EXIT',Parent,Reason} and terminate upon it!
%%
shutdown_kernel_pid(Pid, BootPid, Timer, State) ->
Pid ! {'EXIT',BootPid,shutdown},
shutdown_loop(Pid, Timer, State, []).
</pre>
What is an Erlang kernel process?
This sentence is the key point: A kernel pid must handle the special case message and terminate upon it!
So what is a kernel process?
Take a look at bin/start.script:
<pre>
...
{kernelProcess,heart,{heart,start,[]}},
{kernelProcess,error_logger,{error_logger,start_link,[]}},
{kernelProcess,application_controller,
{application_controller,start,
[{application,kernel,
...
</pre>
Every process tagged kernelProcess is one, application_controller in particular!
Source: http://blog.yufeng.info/archives/1411
So what the supervision tree ends up receiving is {'EXIT',BootPid,shutdown}.
Killing the remaining processes:
<pre>
kill_all_pids(Heart) ->
case get_pids(Heart) of
[] ->
ok;
Pids ->
kill_em(Pids),
kill_all_pids(Heart) % Continue until all are really killed.
end.
</pre>
Following the calls all the way down, what is ultimately used is
<pre>
exit(Pid,kill)
</pre>
which sends a kill exit signal to each process.
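The difference between a shutdown signal and a kill signal can be checked with a small experiment. The sketch below is only an illustration: the module name exit_signal_demo and its helper functions are made up for this post and are not part of init.erl. A process that traps exits receives exit(Pid, shutdown) as an ordinary message, while exit(Pid, kill) terminates it unconditionally with reason killed.

```erlang
-module(exit_signal_demo).
-export([run/0]).

%% Hypothetical helper: spawn a process that traps exits and reports
%% every 'EXIT' message it receives to the Report process.
start_trapper(Report) ->
    spawn(fun() ->
                  process_flag(trap_exit, true),
                  Report ! {ready, self()},
                  trapper_loop(Report)
          end).

trapper_loop(Report) ->
    receive
        {'EXIT', _From, Reason} ->
            Report ! {trapped, self(), Reason},
            trapper_loop(Report)
    end.

%% Returns {OutcomeOfShutdownSignal, OutcomeOfKillSignal}.
run() ->
    Self = self(),
    P = start_trapper(Self),
    receive {ready, P} -> ok end,
    %% A shutdown signal is converted into a message, the process survives.
    exit(P, shutdown),
    R1 = receive
             {trapped, P, Reason} -> {trapped, Reason}
         after 1000 -> timeout
         end,
    %% A kill signal cannot be trapped, the process dies with reason killed.
    Ref = erlang:monitor(process, P),
    exit(P, kill),
    R2 = receive
             {'DOWN', Ref, process, P, Why} -> {died, Why}
         after 1000 -> timeout
         end,
    {R1, R2}.
```

Calling exit_signal_demo:run() in a shell should return {{trapped,shutdown},{died,killed}}.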
The supervisor terminate function
terminate() in supervisor looks like this:
<pre>
-spec terminate(term(), state()) -> 'ok'.
terminate(_Reason, #state{children=[Child]} = State) when ?is_simple(State) ->
terminate_dynamic_children(Child, dynamics_db(Child#child.restart_type,
State#state.dynamics),
State#state.name);
terminate(_Reason, State) ->
terminate_children(State#state.children, State#state.name).
</pre>
There are two cases: simple_one_for_one and everything else.
The terminate_dynamic_children() function:
<pre>
...
EStack = case Child#child.shutdown of
brutal_kill ->
?SETS:fold(fun(P, _) -> exit(P, kill) end, ok, Pids),
wait_dynamic_children(Child, Pids, Sz, undefined, EStack0);
infinity ->
?SETS:fold(fun(P, _) -> exit(P, shutdown) end, ok, Pids),
wait_dynamic_children(Child, Pids, Sz, undefined, EStack0);
Time ->
?SETS:fold(fun(P, _) -> exit(P, shutdown) end, ok, Pids),
TRef = erlang:start_timer(Time, self(), kill),
wait_dynamic_children(Child, Pids, Sz, TRef, EStack0)
end,
...
</pre>
This shows how the Shutdown field of the child spec affects the way children are stopped:
- brutal_kill: a kill signal is sent, and it cannot be trapped. Even if the worker has called process_flag(trap_exit, true), it never receives the {'EXIT',_From,Reason} message.
- infinity and Time: both send a shutdown signal to the supervised worker. If the worker has called process_flag(trap_exit, true), it receives {'EXIT',_From,shutdown}. The only difference is that infinity waits forever, whereas Time sets a timeout: once it expires, the supervisor sends a kill signal and the worker dies immediately.
This analysis agrees with what the Erlang documentation says about the gen_server terminate() callback:
<pre>
If the gen_server is part of a supervision tree and is ordered by its supervisor to terminate, this function will be called with Reason=shutdown if the following conditions apply:
the gen_server has been set to trap exit signals, and
the shutdown strategy as defined in the supervisor's child specification is an integer timeout value, not brutal_kill.
</pre>
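The effect of the Shutdown field can be reproduced with a small experiment. The sketch below makes several assumptions: the module name shutdown_demo and the run/1 function are invented for illustration, one module serves as both the supervisor callback and the worker callback by dispatching on the init/1 argument (done only to keep the example in a single file), and the old tuple form of the child spec is used so that it also works on OTP 17.

```erlang
-module(shutdown_demo).
-behaviour(gen_server).
-export([run/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% run(Shutdown) starts a one_for_one supervisor with one worker that
%% traps exits, stops the supervisor, and reports whether the worker's
%% terminate/2 callback was invoked.
run(Shutdown) ->
    Self = self(),
    {ok, Sup} = supervisor:start_link(?MODULE, {sup, Shutdown, Self}),
    unlink(Sup),                       % keep the caller alive afterwards
    Ref = erlang:monitor(process, Sup),
    exit(Sup, shutdown),               % the caller is the supervisor's parent
    receive {'DOWN', Ref, process, Sup, _} -> ok end,
    receive
        {terminate_called, Reason} -> {terminate_called, Reason}
    after 200 ->
        terminate_not_called
    end.

%% Supervisor side: one permanent worker with the requested Shutdown value.
init({sup, Shutdown, Report}) ->
    Child = {worker,
             {gen_server, start_link, [?MODULE, {worker, Report}, []]},
             permanent, Shutdown, worker, [?MODULE]},
    {ok, {{one_for_one, 1, 5}, [Child]}};
%% Worker side: trap exits so that a shutdown signal reaches terminate/2.
init({worker, Report}) ->
    process_flag(trap_exit, true),
    {ok, Report}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.

terminate(Reason, Report) ->
    Report ! {terminate_called, Reason},
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

With an integer timeout, shutdown_demo:run(5000) should return {terminate_called,shutdown}; with shutdown_demo:run(brutal_kill) the worker is killed outright and the result should be terminate_not_called.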
When does the supervisor call terminate()?
One last question remains: when does the supervisor call terminate()? As analyzed above, when the kernel processes are shut down, the supervision tree receives the message {'EXIT',BootPid,shutdown} from BootPid. We know that a supervisor is really a gen_server, so let's look at its handle_info() function.
<pre>
-spec handle_info(term(), state()) ->
{'noreply', state()} | {'stop', 'shutdown', state()}.
handle_info({'EXIT', Pid, Reason}, State) ->
case restart_child(Pid, Reason, State) of % restart the child
{ok, State1} -> %A
{noreply, State1};
{shutdown, State1} -> %B
{stop, shutdown, State1}
end;
handle_info(Msg, State) ->
error_logger:error_msg("Supervisor received unexpected message: ~p~n",
[Msg]),
{noreply, State}.
</pre>
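This handle_info code clearly deals with exit signals coming from children, and it calls restart_child(). Tracing into restart_child() did not reveal the answer either: the Pid passed in is not a child but BootPid, so branch A is always taken, which means terminate is never called from here. A dead end.
I then went through the supervisor documentation and found, to my surprise, that it does not describe terminate() at all. Another dead end.
Finally I remembered that a supervisor is really a gen_server, so the place to look is the gen_server documentation for terminate():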
<pre>
...
Even if the gen_server is not part of a supervision tree, this function will be called if it receives an 'EXIT' message from its parent. Reason will be the same as in the 'EXIT' message.
...
</pre>
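In other words, whenever a gen_server receives an 'EXIT' message from its parent, terminate() is called. gen_server intercepts {'EXIT',Parent,Reason} itself, so the message never reaches handle_info(). This matches the message identified earlier: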
<pre>
{'EXIT',BootPid,shutdown}
</pre>
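Whether BootPid is actually the parent of the supervisor is something I have not had time to investigate. Presumably it is: somebody has to tell the top-level sup to shut down, and the name BootPid makes it quite plausible. I am leaving this as an open item to fill in later, mainly around startup via init:start().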
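The parent rule quoted from the gen_server documentation can be verified on its own, without init or a supervisor. In the sketch below the module name parent_exit_demo and the run/0 function are hypothetical. The same trapped shutdown signal is sent twice to a gen_server: once from an unrelated process, where it shows up in handle_info/2, and once from the process that called start_link (the parent), where it triggers terminate/2.

```erlang
-module(parent_exit_demo).
-behaviour(gen_server).
-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Returns {HowTheStrangerSignalWasHandled, HowTheParentSignalWasHandled}.
run() ->
    Self = self(),
    {ok, Pid} = gen_server:start_link(?MODULE, Self, []),
    unlink(Pid),   % the server still remembers Self as its parent
    %% 1. Exit signal from an unrelated process: delivered to handle_info/2.
    Stranger = spawn(fun() -> exit(Pid, shutdown) end),
    R1 = receive
             {handle_info, {'EXIT', Stranger, shutdown}} -> via_handle_info
         after 1000 -> timeout
         end,
    %% 2. Exit signal from the parent: gen_server calls terminate/2 itself.
    exit(Pid, shutdown),
    R2 = receive
             {terminate, shutdown} -> via_terminate
         after 1000 -> timeout
         end,
    {R1, R2}.

init(Report) ->
    process_flag(trap_exit, true),
    {ok, Report}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

handle_info(Info, Report) ->
    Report ! {handle_info, Info},
    {noreply, Report}.

terminate(Reason, Report) ->
    Report ! {terminate, Reason},
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

parent_exit_demo:run() should return {via_handle_info,via_terminate}. This says nothing about who the parent of a top-level supervisor is in a real release; it only shows the rule that gen_server applies once the signal arrives.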
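Miscellaneous
- In our earlier code, the child_spec of the player process had its shutdown value set to brutal_kill, which was clearly a mistake; as a result, terminate is not called when the application shuts down.
- The article "Erlang OTP之terminate 深入分析" (an in-depth analysis of terminate in Erlang/OTP) is based on Erlang R14A and recommends using one_for_one. The reason is simple: in R14A, the supervisor terminate() function is as follows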
<pre>
terminate(_Reason, State) ->
terminate_children(State#state.children, State#state.name),
ok.
</pre>
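Compared with version 17, you can see that there is no separate handling of simple_one_for_one here. The child information for simple_one_for_one and one_for_one is stored differently inside the supervisor: the former keeps its children in the dynamics field, the latter in the children field. R14A only handles the children stored in children and does nothing at all for simple_one_for_one children.
I repeated the experiment from that article on my own machine, and my results do indeed differ from the author's.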
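An experiment of this kind could look like the sketch below. It is not the code from the referenced article: the module name sofo_demo and the functions run/0, start_worker/2 and collect/1 are invented here, and the expected result assumes an OTP release whose supervisor terminates dynamic children, as the version 17 code shown earlier does.

```erlang
-module(sofo_demo).
-behaviour(gen_server).
-export([run/0, start_worker/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start a simple_one_for_one supervisor with two dynamic workers, stop
%% it, and return the sorted list of {Name, Reason} pairs reported by
%% the workers' terminate/2 callbacks.
run() ->
    Self = self(),
    {ok, Sup} = supervisor:start_link(?MODULE, {sup, Self}),
    unlink(Sup),
    {ok, _} = supervisor:start_child(Sup, [a]),
    {ok, _} = supervisor:start_child(Sup, [b]),
    Ref = erlang:monitor(process, Sup),
    exit(Sup, shutdown),
    receive {'DOWN', Ref, process, Sup, _} -> ok end,
    lists:sort(collect([])).

collect(Acc) ->
    receive
        {terminated, Name, Reason} -> collect([{Name, Reason} | Acc])
    after 200 ->
        Acc
    end.

%% start_child/2 appends its argument list to the arguments in the spec,
%% so the worker is started through this wrapper.
start_worker(Report, Name) ->
    gen_server:start_link(?MODULE, {worker, Report, Name}, []).

init({sup, Report}) ->
    Child = {worker, {?MODULE, start_worker, [Report]},
             temporary, 5000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 1, 5}, [Child]}};
init({worker, Report, Name}) ->
    process_flag(trap_exit, true),
    {ok, {Report, Name}}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.

terminate(Reason, {Report, Name}) ->
    Report ! {terminated, Name, Reason},
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

On such a release, sofo_demo:run() should return [{a,shutdown},{b,shutdown}], meaning that both dynamic children had their terminate/2 called.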
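References
- Erlang OTP之terminate 深入分析 (an in-depth analysis of terminate in Erlang/OTP)
- A brief analysis of erlang init stop
- erlang doc
- Clarification of "An extremely silly Erlang supervisor bug": this article touches on what an Erlang kernelProcess is (http://blog.yufeng.info/archives/1411)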