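If you are not yet familiar with run loops, see the detailed write-up 深入理解RunLoop (Understanding RunLoop in Depth).
Apple's official documentation states that CFRunLoop is thread-safe: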
Thread safety varies depending on which API you are using to manipulate your run loop. The functions in Core Foundation are generally thread-safe and can be called from any thread. If you are performing operations that alter the configuration of the run loop, however, it is still good practice to do so from the thread that owns the run loop whenever possible.
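Note, though, that crafty Apple slipped in the vague word "generally".
In practice, some operations performed while a run loop is being stopped carry multithreading hazards.
Unsafe CFRunLoopSource
CFRunLoop itself is thread-safe, but that no longer necessarily holds once a CFRunLoopSource is involved. CFSocket is one example.
Sample code
Consider this custom thread class: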
@interface MyThread()
@property (nonatomic, strong) NSThread *currentThread;
@property (nonatomic, assign) CFRunLoopSourceRef socketSource;
@property (nonatomic, assign) CFSocketRef socket;
@property (nonatomic, assign) CFRunLoopRef currentRunloop;
@end

@implementation MyThread

// Initialize the thread
- (instancetype)init {
    if (self = [super init]) {
        _currentThread = [[NSThread alloc] initWithTarget:self selector:@selector(runThread) object:nil];
    }
    return self;
}

// Start the thread; in practice this method is never called concurrently from multiple threads
- (void)startThread {
    [self.currentThread start];
}

// Thread entry point
- (void)runThread {
    @autoreleasepool {
        // Keep a reference to the run loop so other threads can stop this thread
        self.currentRunloop = CFRunLoopGetCurrent();
        [self addSocketSource];
        CFRunLoopRun();
    }
    NSLog(@"Thread exited");
}

// In practice this method is never called concurrently from multiple threads
- (void)stopThread {
    [self removeSocketSource];
    @synchronized (_currentRunloop) {
        if (_currentRunloop) {
            CFRunLoopStop(_currentRunloop);
            self.currentRunloop = NULL;
        }
    }
}

// In practice this method is never called concurrently from multiple threads
- (void)addSocketSource {
    int sock;
    sock = socket(AF_INET6, SOCK_STREAM, 0);
    CFSocketContext context = {0, (__bridge void *)(self), NULL, NULL, NULL};
    self.socket = CFSocketCreateWithNative(NULL, sock, kCFSocketReadCallBack, socketCallBack, &context);
    self.socketSource = CFSocketCreateRunLoopSource(NULL, self.socket, 0);
    CFRunLoopAddSource(_currentRunloop, _socketSource, kCFRunLoopDefaultMode);
}

- (void)removeSocketSource {
    @synchronized (_socket) {
        if (_socket) {
            // CFSocketInvalidate may be dispatched to another thread, so CFSocketInvalidate
            // and CFRunLoopStop can end up being called concurrently from different threads
            CFSocketInvalidate(_socket);
            CFRelease(_socket);
            self.socket = NULL;
        }
    }
}

@end
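In practice the CFSocket is managed by a separate socket class, so addSocketSource and removeSocketSource actually live in that other class. That makes it possible for CFSocketInvalidate and CFRunLoopStop to be called concurrently from different threads.
Crash analysis
Nothing looks wrong at first glance: locks are taken where they seem necessary, and the CF-prefixed functions are all supposed to be thread-safe. Yet if CFSocketInvalidate and CFRunLoopStop do run concurrently on different threads, a crash is possible. Here is one crash report from our project: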
Thread 0 name: Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0 CoreFoundation 0x000000018e6a9144 CFRunLoopWakeUp + 92
1 CoreFoundation 0x000000018e6a9140 CFRunLoopWakeUp + 88
2 CoreFoundation 0x000000018e6d71e8 CFSocketInvalidate + 712
3 MyApp 0x00000001000fe424 (-[MySocket stop] + 136)
4 MyApp 0x00000001000fcd50 (-[MySocket dealloc] + 56)
5 libsystem_blocks.dylib 0x000000018d6afa28 _Block_release + 144
6 libdispatch.dylib 0x000000018d65a1bc _dispatch_client_callout + 16
7 libdispatch.dylib 0x000000018d65ed68 _dispatch_main_queue_callback_4CF + 1000
8 CoreFoundation 0x000000018e77e810 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 12
9 CoreFoundation 0x000000018e77c3fc __CFRunLoopRun + 1660
10 CoreFoundation 0x000000018e6aa2b8 CFRunLoopRunSpecific + 444
11 GraphicsServices 0x000000019015e198 GSEventRunModal + 180
12 UIKit 0x00000001946f17fc -[UIApplication _run] + 684
13 UIKit 0x00000001946ec534 UIApplicationMain + 208
14 DuoYiIM 0x000000010003ca58 0x100024000 + 100952 (main + 132)
15 libdyld.dylib 0x000000018d68d5b8 start + 4
Thread 0 crashed with ARM-64 Thread State:
cpsr: 0x0000000020000000 fp: 0x000000016fddab30 lr: 0x000000018e6a9140 pc: 0x000000018e6a9144
sp: 0x000000016fddaa00 x0: 0x0000000000000000 x1: 0x0000000000000000 x10: 0x0000000000000000
x11: 0x0000000000000000 x12: 0x0000000000000000 x13: 0x0000000000000000 x14: 0x0000000000000000
x15: 0x0000000000001203 x16: 0x000000000000012d x17: 0x000000018f1eef74 x18: 0x0000000000000000
x19: 0x000000017056cb50 x2: 0x0000000000001000 x20: 0x000000017056cb40 x21: 0x96e73914144e0055
x22: 0x0000000174452990 x23: 0x000000017048bae0 x24: 0x0000000000000000 x25: 0x00000000ffffffff
x26: 0xffffffffffffffff x27: 0x000000017426f1c0 x28: 0x0000000002ffffff x29: 0x000000016fddab30
x3: 0x000000000017e4a6 x4: 0x0000000000012068 x5: 0x0000000000000000 x6: 0x0000000000000036
x7: 0xffffffffffffffec x8: 0x8c8c8c8c8c8c8c8c x9: 0x000000000000000c
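CFSocketInvalidate was called on the main thread. The stack shows that the crash happened when CFSocketInvalidate internally called CFRunLoopWakeUp.
That alone does not reveal the cause, so we need to find out where inside CFRunLoopWakeUp it died. Here is the disassembly of the matching CoreFoundation version: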
_CFRunLoopWakeUp:
0x0000000181521b9c FF0305D1 sub sp, sp, #0x140 ; CODE XREF=_CFRunLoopAddTimer+696, _CFRunLoopTimerSetNextFireDate+592, _CFSocketInvalidate+708, __wakeUpRunLoop+276, __CFXRegistrationPost+344, -[CFPrefsSearchListSource asynchronouslyNotifyOfChangesFromDictionary:toDictionary:]+172, ___CFSocketPerformV0+1408, ___CFSocketManager+2004, ___CFSocketManager+4248, _boundPairRead+604, _boundPairReadClose+124, …
0x0000000181521ba0 FC6F11A9 stp x28, x27, [sp, #0x110]
0x0000000181521ba4 F44F12A9 stp x20, x19, [sp, #0x120]
0x0000000181521ba8 FD7B13A9 stp x29, x30, [sp, #0x130]
0x0000000181521bac FDC30491 add x29, sp, #0x130
0x0000000181521bb0 F40300AA mov x20, x0
0x0000000181521bb4 C80C10F0 adrp x8, #0x1a16bc000
0x0000000181521bb8 084140F9 ldr x8, [x8, #0x80] ; -[_CFXPreferences init]_1a16bc080
0x0000000181521bbc 080140F9 ldr x8, [x8]
0x0000000181521bc0 292013F0 adrp x9, #0x1a7928000
0x0000000181521bc4 29E90791 add x9, x9, #0x1fa ; ___CF120290
0x0000000181521bc8 A8831DF8 stur x8, [x29, #-0x28]
0x0000000181521bcc E8030032 orr w8, wzr, #0x1
0x0000000181521bd0 28010039 strb w8, [x9] ; ___CF120290
0x0000000181521bd4 E8731290 adrp x8, #0x1a639d000
0x0000000181521bd8 08F13F91 add x8, x8, #0xffc ; ___CF120293
0x0000000181521bdc 08014039 ldrb w8, [x8] ; ___CF120293
0x0000000181521be0 48000034 cbz w8, loc_181521be8
0x0000000181521be4 E3560394 bl ___THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__
loc_181521be8:
0x0000000181521be8 93420091 add x19, x20, #0x10 ; CODE XREF=_CFRunLoopWakeUp+68
0x0000000181521bec E00313AA mov x0, x19
0x0000000181521bf0 70300694 bl imp___stubs_-[NSOrderedSet sortedArrayFromRange:options:usingComparator:] // symbols in the on-device system library are obfuscated; this call is actually __CFRunLoopLock
0x0000000181521bf4 882E40F9 ldr x8, [x20, #0x58]
0x0000000181521bf8 080D40B9 ldr w8, [x8, #0xc]
0x0000000181521bfc A8010034 cbz w8, loc_181521c30
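The crash log puts the crash at CFRunLoopWakeUp + 92, which corresponds to address 0x0000000181521b9c + 92 = 0x0000000181521bf8, the instruction ldr w8, [x8, #0xc]. In the register dump x8 is 0x8c8c8c8c8c8c8c8c, which is not a valid pointer and strongly suggests that the memory it was read from had already been freed. x8 was loaded by ldr x8, [x20, #0x58] (the value stored 0x58 bytes past the address in x20), and x20 came from mov x20, x0. x0 is the first argument of CFRunLoopWakeUp, a CFRunLoopRef, so x8 is the field at offset 0x58 in the run loop struct.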
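The address arithmetic and the register check above can be reproduced with a few lines of C. This is only a minimal sketch: the function names are made up for illustration, and the "all eight bytes identical" test is a heuristic assumption, not documented allocator behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address of the faulting instruction: the symbol's start address plus the
   "+ N" offset printed in the crash log frame. */
uint64_t faulting_address(uint64_t symbol_start, uint64_t offset) {
    assert(offset <= UINT64_MAX - symbol_start); /* guard against wrap-around */
    return symbol_start + offset;
}

/* Heuristic (an assumption, not a documented rule): a 64-bit register whose
   eight bytes are all the same nonzero value is more likely a memory fill
   pattern than a real pointer. */
bool looks_like_fill_pattern(uint64_t value) {
    uint8_t first = (uint8_t)(value & 0xff);
    if (first == 0) return false;
    for (int shift = 8; shift < 64; shift += 8) {
        if ((uint8_t)((value >> shift) & 0xff) != first) return false;
    }
    return true;
}
```

With the values from the crash log, faulting_address(0x181521b9c, 92) gives 0x181521bf8, and looks_like_fill_pattern accepts the x8 value while rejecting an ordinary heap pointer such as x19.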
CoreFoundation is open source and can be downloaded here: CF-1153.18.
The corresponding CFRunLoopWakeUp source:
void CFRunLoopWakeUp(CFRunLoopRef rl) {
    CHECK_FOR_FORK();
    __CFRunLoopLock(rl);
    if (__CFRunLoopIsIgnoringWakeUps(rl)) {
        __CFRunLoopUnlock(rl);
        return;
    }
    kern_return_t ret;
    ret = __CFSendTrivialMachMessage(rl->_wakeUpPort, 0, MACH_SEND_TIMEOUT, 0);
    if (ret != MACH_MSG_SUCCESS && ret != MACH_SEND_TIMED_OUT) CRASH("*** Unable to send message to wake up port. (%d) ***", ret);
    __CFRunLoopUnlock(rl);
}

CF_INLINE Boolean __CFRunLoopIsIgnoringWakeUps(CFRunLoopRef rl) {
    return (rl->_perRunData->ignoreWakeUps) ? true : false;
}
The CFRunLoop struct:
struct __CFRunLoop {
    CFRuntimeBase _base;                 // 16 bytes
    pthread_mutex_t _lock;               // 64 bytes
    __CFPort _wakeUpPort;                // mach_port_t (unsigned int), 4 bytes
    Boolean _unused;                     // 1 byte, followed by 3 bytes of padding so the next pointer is 8-byte aligned
    volatile _per_run_data *_perRunData;
    pthread_t _pthread;
    uint32_t _winthread;
    CFMutableSetRef _commonModes;
    CFMutableSetRef _commonModeItems;
    CFRunLoopModeRef _currentMode;
    CFMutableSetRef _modes;
    struct _block_item *_blocks_head;
    struct _block_item *_blocks_tail;
    CFAbsoluteTime _runTime;
    CFAbsoluteTime _sleepTime;
    CFTypeRef _counterpart;
};

typedef struct __CFRuntimeBase {
    uintptr_t _cfisa;      // unsigned long, 8 bytes
    uint8_t _cfinfo[4];    // unsigned char, 4 bytes
#if __LP64__
    uint32_t _rc;          // unsigned int, 4 bytes
#endif
} CFRuntimeBase;

struct pthread_mutex_t {
    long __sig;            // 8 bytes
    char __opaque[56];     // 56 bytes
};
Adding up the field sizes shows that ldr x8, [x20, #0x58] loads runloop->_perRunData. In other words, by the time __CFRunLoopIsIgnoringWakeUps ran, the CFRunLoopRef had already been freed.
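The offset calculation can also be checked mechanically with offsetof. The sketch below uses stand-in types (MockRuntimeBase, MockMutex, MockRunLoop) that only mirror the field sizes listed above on a 64-bit target. They are assumptions for illustration, not the real CoreFoundation or pthread definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CFRuntimeBase on a 64-bit target (16 bytes in total). */
typedef struct {
    uintptr_t cfisa;      /* 8 bytes */
    uint8_t   cfinfo[4];  /* 4 bytes */
    uint32_t  rc;         /* 4 bytes */
} MockRuntimeBase;

/* Stand-in for pthread_mutex_t as listed above (64 bytes in total);
   int64_t plays the role of the 8-byte long. */
typedef struct {
    int64_t sig;          /* 8 bytes */
    char    opaque[56];   /* 56 bytes */
} MockMutex;

/* Only the leading fields of the run loop struct matter for this offset. */
typedef struct {
    MockRuntimeBase base;       /* offset 0x00 */
    MockMutex       lock;       /* offset 0x10 */
    uint32_t        wakeUpPort; /* offset 0x50 */
    unsigned char   unused;     /* offset 0x54, then 3 bytes of padding */
    volatile void  *perRunData; /* offset 0x58 */
} MockRunLoop;

size_t lock_offset(void) {
    return offsetof(MockRunLoop, lock);
}

size_t per_run_data_offset(void) {
    assert(sizeof(void *) == 8); /* the arithmetic assumes 64-bit pointers */
    return offsetof(MockRunLoop, perRunData);
}
```

On a 64-bit build lock_offset() is 0x10, which matches add x19, x20, #0x10 in the disassembly, and per_run_data_offset() is 0x58, which matches ldr x8, [x20, #0x58].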
Analyzing the CFSocket source
The CFSocketInvalidate source:
void CFSocketInvalidate(CFSocketRef s) {
    CHECK_FOR_FORK();
    CFRetain(s);
    __CFLock(&__CFAllSocketsLock);
    __CFSocketLock(s);
    if (__CFSocketIsValid(s)) {
        // some code omitted...
        // Take the socket's array of run loops
        CFArrayRef runLoops = (CFArrayRef)CFRetain(s->_runLoops);
        // CFRunLoop release operation 1
        CFRelease(s->_runLoops);
        s->_runLoops = NULL;
        // some code omitted...
        __CFSocketUnlock(s);
        // Do this after the socket unlock to avoid deadlock (10462525)
        for (idx = CFArrayGetCount(runLoops); idx--;) {
            CFRunLoopWakeUp((CFRunLoopRef)CFArrayGetValueAtIndex(runLoops, idx));
        }
        // CFRunLoop release operation 3
        CFRelease(runLoops);
        // some code omitted...
    } else {
        __CFSocketUnlock(s);
    }
    __CFUnlock(&__CFAllSocketsLock);
    CFRelease(s);
}
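The only place where CFSocketInvalidate calls CFRunLoopWakeUp is the loop over runLoops near the end.
But at that point the CFRunLoopRef is still in the array, which presumably holds a strong reference to it, so how could it have been freed by the time CFRunLoopWakeUp ran?
Note that the loop over runLoops runs outside the lock. This suggests CFSocket may not be managing its runLoops array properly, allowing the array to be released while it is being iterated. Judging from the comment "Do this after the socket unlock to avoid deadlock (10462525)", the loop was probably inside the lock originally and was moved out because it deadlocked. Apple's bug reports are not public; the only possibly related discussion I could find is here: bug #10462525.
The most likely culprit is __CFSocketCancel. When a run loop stops, its sources are removed as well, and CFRunLoopRemoveSource invokes the cancel function of a source0, which here is __CFSocketCancel: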
void CFRunLoopRemoveSource(CFRunLoopRef rl, CFRunLoopSourceRef rls, CFStringRef modeName) {
    CHECK_FOR_FORK();
    Boolean doVer0Callout = false, doRLSRelease = false;
    __CFRunLoopLock(rl);
    if (modeName == kCFRunLoopCommonModes) {
        // code omitted...
    } else {
        CFRunLoopModeRef rlm = __CFRunLoopFindMode(rl, modeName, false);
        if (NULL != rlm && ((NULL != rlm->_sources0 && CFSetContainsValue(rlm->_sources0, rls)) || (NULL != rlm->_sources1 && CFSetContainsValue(rlm->_sources1, rls)))) {
            CFRetain(rls);
            // code omitted...
            if (0 == rls->_context.version0.version) {
                if (NULL != rls->_context.version0.cancel) {
                    doVer0Callout = true;
                }
            }
            doRLSRelease = true;
        }
        // code omitted...
    }
    __CFRunLoopUnlock(rl);
    if (doVer0Callout) {
        // although it looses some protection for the source, we have no choice but
        // to do this after unlocking the run loop and mode locks, to avoid deadlocks
        // where the source wants to take a lock which is already held in another
        // thread which is itself waiting for a run loop/mode lock
        rls->_context.version0.cancel(rls->_context.version0.info, rl, modeName); /* CALLOUT */
    }
    if (doRLSRelease) CFRelease(rls);
}
The __CFSocketCancel source:
static void __CFSocketCancel(void *info, CFRunLoopRef rl, CFStringRef mode) {
    CFSocketRef s = (CFSocketRef)info;
    __CFSocketLock(s);
    if (0 == s->_socketSetCount) {
        // code omitted...
    }
    if (NULL != s->_runLoops) {
        // Remove this run loop from the runLoops array: copy the original array, then release the original
        CFMutableArrayRef runLoopsOrig = s->_runLoops;
        CFMutableArrayRef runLoopsCopy = CFArrayCreateMutableCopy(kCFAllocatorSystemDefault, 0, s->_runLoops);
        idx = CFArrayGetFirstIndexOfValue(runLoopsCopy, CFRangeMake(0, CFArrayGetCount(runLoopsCopy)), rl);
        if (0 <= idx) CFArrayRemoveValueAtIndex(runLoopsCopy, idx);
        s->_runLoops = runLoopsCopy;
        // CFRunLoop release operation 2
        CFRelease(runLoopsOrig);
    }
    __CFSocketUnlock(s);
}
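__CFSocketCancel also performs one release that affects the CFRunLoopRef. Together with the two in CFSocketInvalidate, that makes three release operations.
So if __CFSocketCancel and CFSocketInvalidate run concurrently on different threads, the runLoops array in the CFSocket may be over-released, and the CFRunLoopRef can then be freed while runLoops is being iterated. The crash is rare, but in our project it shows up reliably every so often.
So taking a lock does not settle everything: to be safe, CFSocketInvalidate should retain once more before iterating the array.
Solutions
- Since this is a bug inside CFSocket, the only option is to avoid code in which CFSocketInvalidate and CFRunLoopStop run concurrently on different threads.
- If your socket only runs on this thread, just call CFRunLoopStop; the run loop cleans up all of its sources automatically.
- If the thread needs to be reused, do not stop it. Stop the socket instead, then create a new socket on the same thread.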
A run loop that stops on its own
So if the stop code is changed to the following, everything should be fine, right?
- (void)runThread {
    @autoreleasepool {
        self.currentRunloop = CFRunLoopGetCurrent();
        [self addRunloopSource];
        [self addSocketSource];
        CFRunLoopRun();
    }
    NSLog(@"Thread exited");
}

- (void)stopThread {
    if (_currentRunloop) {
        // removeSocketSource is guaranteed to run only here, never concurrently on another thread
        [self removeSocketSource];
        CFRunLoopStop(_currentRunloop);
        self.currentRunloop = NULL;
    }
}
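Unfortunately, this is still unsafe.
The reason is that after removeSocketSource the run loop has no sources left. When a run loop finds that it has no sources, it exits its loop on its own and the thread is torn down.
So if you call stopThread from another thread, the thread may exit at any moment after removeSocketSource, and by the time CFRunLoopStop is called the run loop may already have been freed.
With the code above the crash is too rare to observe, but a small change makes it reproducible every time:
- (void)stopThread {
    if (_currentRunloop) {
        [self removeSocketSource];
        // Insert a slow operation
        sleep(2);
        // Crashes every time
        CFRunLoopStop(_currentRunloop);
        self.currentRunloop = NULL;
    }
}
The crash in this case is really a memory-management mistake. Retaining the run loop once is enough to fix it: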
- (void)runThread {
    @autoreleasepool {
        // Retain the run loop once
        self.currentRunloop = (CFRunLoopRef)CFRetain(CFRunLoopGetCurrent());
        [self addRunloopSource];
        [self addSocketSource];
        CFRunLoopRun();
    }
    NSLog(@"Thread exited");
}

- (void)stopThread {
    if (_currentRunloop) {
        [self removeSocketSource];
        CFRunLoopStop(_currentRunloop);
        // Balance the CFRetain in runThread
        CFRelease(_currentRunloop);
        self.currentRunloop = NULL;
    }
}
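The effect of that extra retain can be replayed with a toy reference count in plain C. This is a single-threaded sketch under stated assumptions: MockLoop and its helpers are hypothetical stand-ins, not CoreFoundation types, and "deallocation" only flips a flag so the outcome can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reference-counted run loop. */
typedef struct {
    int  refcount;
    bool deallocated;
    bool stop_requested;
} MockLoop;

void mock_init(MockLoop *loop) {
    loop->refcount = 1; /* the reference owned by the loop's own thread */
    loop->deallocated = false;
    loop->stop_requested = false;
}

void mock_retain(MockLoop *loop) {
    assert(!loop->deallocated);
    loop->refcount++;
}

void mock_release(MockLoop *loop) {
    assert(!loop->deallocated);
    if (--loop->refcount == 0) loop->deallocated = true;
}

/* Returns false where real code would be touching freed memory. */
bool mock_stop(MockLoop *loop) {
    if (loop->deallocated) return false;
    loop->stop_requested = true;
    return true;
}

/* Replays the sequence from the text: the controller optionally retains,
   the loop's thread exits on its own and drops its reference, and only
   then does the controller call stop. */
bool stop_after_thread_exit(bool controller_retains) {
    MockLoop loop;
    mock_init(&loop);
    if (controller_retains) mock_retain(&loop);
    mock_release(&loop);              /* thread exits, run loop reference dropped */
    bool stop_was_safe = mock_stop(&loop);
    if (controller_retains) mock_release(&loop);
    return stop_was_safe;
}
```

stop_after_thread_exit(false) reports an unsafe stop, which corresponds to the crash with sleep(2), while stop_after_thread_exit(true) reports a safe one because the controller's own reference keeps the object alive.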
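Conclusions
Be careful when using run loop sources, especially during the stop phase. Other kinds of sources may have similar problems.
When a variable is accessed from multiple threads, operations outside the lock are unsafe even if they only read. It is best to retain the value once more before reading it, so that it cannot be freed in the middle of the read.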
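The second point can be sketched for a collection as well: take a snapshot while holding the lock, retain every element in it, and only then iterate outside the lock. The code below is a deterministic, single-threaded replay of that interleaving with hypothetical MockItem objects. It is not how CFSocket is implemented, only an illustration of the pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ITEM_COUNT 2

/* Toy reference-counted element; "deallocation" only flips a flag. */
typedef struct {
    int  refcount;
    bool deallocated;
} MockItem;

void item_retain(MockItem *item) {
    assert(!item->deallocated);
    item->refcount++;
}

void item_release(MockItem *item) {
    assert(!item->deallocated);
    if (--item->refcount == 0) item->deallocated = true;
}

/* Returns how many snapshot entries are still alive when the reader finally
   iterates, after the owner has dropped all of its references. */
size_t alive_after_owner_clears(bool retain_in_snapshot) {
    /* The owner holds one reference to each item. */
    MockItem storage[ITEM_COUNT] = {{1, false}, {1, false}};
    MockItem *snapshot[ITEM_COUNT];

    /* Step 1 (inside the lock in real code): copy the pointers, optionally retaining. */
    for (size_t i = 0; i < ITEM_COUNT; i++) {
        snapshot[i] = &storage[i];
        if (retain_in_snapshot) item_retain(snapshot[i]);
    }

    /* Step 2 (another thread, once the lock is dropped): the owner releases everything. */
    for (size_t i = 0; i < ITEM_COUNT; i++) item_release(&storage[i]);

    /* Step 3 (outside the lock): count the entries that are still safe to use. */
    size_t alive = 0;
    for (size_t i = 0; i < ITEM_COUNT; i++) {
        if (!snapshot[i]->deallocated) alive++;
    }

    /* Balance the retains taken in step 1. */
    if (retain_in_snapshot) {
        for (size_t i = 0; i < ITEM_COUNT; i++) item_release(snapshot[i]);
    }
    return alive;
}
```

Without the retain every entry in the snapshot is dead by the time it is read; with the retain both entries survive until the reader releases them.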