iOS 11 introduced the Vision framework, which provides face recognition, object detection, object tracking, and related technologies. This article walks through a small demo showing how to use Vision for object detection and object tracking. The demo can be downloaded from GitHub.
1. 關(guān)于Vision框架
Vision 是伴隨ios 11 推出的基于CoreML的圖形處理框架。運(yùn)用高性能圖形處理和視覺技術(shù),可以對(duì)圖像和視頻進(jìn)行人臉檢測(cè)、特征點(diǎn)檢測(cè)和場(chǎng)景識(shí)別等。

2. Object Detection with Vision
Environment
Xcode 9 + iOS 11
Acquiring image data
This step assumes you have already brought up the system camera and are receiving CMSampleBufferRef data. Note that the orientation of the returned sampleBuffer does not match the display orientation of the UIView, so the sampleBuffer is first rotated into the correct orientation.
You can also skip the rotation, but then you must keep the subsequent coordinate conversions consistent.
```objectivec
/*
 * Note: the sample buffer to rotate must be in ARGB or BGRA format; other formats may not be supported.
 * rotationConstant:
 *   0 -- rotate 0 degrees (simply copy the data from src to dest)
 *   1 -- rotate 90 degrees counterclockwise
 *   2 -- rotate 180 degrees
 *   3 -- rotate 270 degrees counterclockwise
 */
+ (CVPixelBufferRef)rotateBuffer:(CMSampleBufferRef)sampleBuffer withConstant:(uint8_t)rotationConstant
{
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(imageBuffer, 0);

    OSType pixelFormatType = CVPixelBufferGetPixelFormatType(imageBuffer);
    // NSAssert(pixelFormatType == kCVPixelFormatType_32ARGB, @"Code works only with 32ARGB format. Test/adapt for other formats!");

    const size_t kAlignment_32ARGB = 32;
    const size_t kBytesPerPixel_32ARGB = 4;

    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);
    size_t width = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);

    BOOL rotatePerpendicular = (rotationConstant == 1) || (rotationConstant == 3); // Use enumeration values here
    const size_t outWidth = rotatePerpendicular ? height : width;
    const size_t outHeight = rotatePerpendicular ? width : height;

    size_t bytesPerRowOut = kBytesPerPixel_32ARGB * ceil(outWidth * 1.0 / kAlignment_32ARGB) * kAlignment_32ARGB;
    const size_t dstSize = bytesPerRowOut * outHeight * sizeof(unsigned char);

    void *srcBuff = CVPixelBufferGetBaseAddress(imageBuffer);
    unsigned char *dstBuff = (unsigned char *)malloc(dstSize);

    vImage_Buffer inbuff = {srcBuff, height, width, bytesPerRow};
    vImage_Buffer outbuff = {dstBuff, outHeight, outWidth, bytesPerRowOut};

    uint8_t bgColor[4] = {0, 0, 0, 0};
    vImage_Error err = vImageRotate90_ARGB8888(&inbuff, &outbuff, rotationConstant, bgColor, 0);
    if (err != kvImageNoError) {
        NSLog(@"%ld", err);
    }

    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);

    CVPixelBufferRef rotatedBuffer = NULL;
    CVPixelBufferCreateWithBytes(NULL,
                                 outWidth,
                                 outHeight,
                                 pixelFormatType,
                                 outbuff.data,
                                 bytesPerRowOut,
                                 freePixelBufferDataAfterRelease,
                                 NULL,
                                 NULL,
                                 &rotatedBuffer);
    return rotatedBuffer;
}

void freePixelBufferDataAfterRelease(void *releaseRefCon, const void *baseAddress)
{
    // Free the memory we malloced for the vImage rotation
    free((void *)baseAddress);
}
```
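For context, a typical call site might look like the sketch below. It assumes a portrait UI, the camera's default landscape buffer orientation (hence `rotationConstant = 1`), and that the capture output was configured for BGRA; all three are assumptions of this sketch, not requirements stated by the demo.

```objectivec
// In the AVCaptureVideoDataOutputSampleBufferDelegate callback.
// Assumes the capture output was configured for BGRA, e.g.:
//   output.videoSettings = @{(id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
- (void)captureOutput:(AVCaptureOutput *)output
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // Rotate 90 degrees counterclockwise so the buffer matches a portrait UIView (assumption).
    CVPixelBufferRef rotated = [[self class] rotateBuffer:sampleBuffer withConstant:1];
    if (rotated) {
        [self detectObjectWithPixelBuffer:rotated];
        // rotateBuffer returns a +1 buffer; its backing memory is freed by
        // freePixelBufferDataAfterRelease when the buffer is fully released.
        CVPixelBufferRelease(rotated);
    }
}
```

Releasing the buffer right after the call is safe here because `-performRequests:error:` (used inside the detection method) runs synchronously.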
Object detection
Once you have the image data, you can run object detection. The flow is simple:
- Create a rectangle-detection request, VNDetectRectanglesRequest
- Create a VNImageRequestHandler from the data source (a pixelBuffer or a UIImage)
- Call -[VNImageRequestHandler performRequests:error:] to run the detection
```objectivec
- (void)detectObjectWithPixelBuffer:(CVPixelBufferRef)pixelBuffer
{
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    void (^VNRequestCompletionHandler)(VNRequest *request, NSError * _Nullable error) = ^(VNRequest *request, NSError * _Nullable error)
    {
        CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
        NSLog(@"Detection took: %f", end - start);
        if (!error && request.results.count > 0) {
            // TODO: handle the detection results here
            return;
        }
    };

    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCVPixelBuffer:pixelBuffer options:@{}];
    VNDetectRectanglesRequest *request = [[VNDetectRectanglesRequest alloc] initWithCompletionHandler:VNRequestCompletionHandler];
    request.minimumAspectRatio = 0.1; // Minimum aspect ratio set to 0.1
    request.maximumObservations = 0;  // 0 = do not limit the number of results
    [handler performRequests:@[request] error:nil];
}
```
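What goes in the TODO depends on your app. As one illustrative sketch, the observations could be cached by uuid so they can later seed tracking and be drawn; `lastObsercationsDic` is the dictionary property this demo uses elsewhere, and the rest is an assumption of this sketch:

```objectivec
// Illustrative completion-handler body: cache each detected rectangle by its
// uuid so objectTrackWithPixelBuffer: and overlayImageWithSize: can use it.
for (VNRectangleObservation *observation in request.results) {
    if (![observation isKindOfClass:[VNRectangleObservation class]]) {
        continue;
    }
    self.lastObsercationsDic[observation.uuid.UUIDString] = observation;
}
```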
Displaying the detection results
Object detection returns a set of VNDetectedObjectObservation results, each carrying three properties: confidence, uuid, and boundingBox. The Vision coordinate system resembles OpenGL's texture coordinate system: the origin is at the bottom-left corner of the screen and coordinates are normalized. So before projecting results onto the screen, you still need to convert between coordinate systems.
How the three coordinate systems differ:

| Coordinate system | Origin | Extent |
|---|---|---|
| UIKit | top-left | screen size |
| AVFoundation | top-left | 0 - 1 |
| Vision | bottom-left | 0 - 1 |

The display code is shown below. It uses a CGAffineTransform to convert coordinates, draws a red border around each transformed rectangle, and prints the confidence values on screen.
```objectivec
- (void)overlayImageWithSize:(CGSize)size
{
    NSDictionary *observationsCopy = [NSDictionary dictionaryWithDictionary:self.lastObsercationsDic];
    NSArray *keys = [observationsCopy allKeys];

    UIGraphicsImageRenderer *renderer = [[UIGraphicsImageRenderer alloc] initWithSize:CGSizeMake(size.width, size.height)];
    void (^UIGraphicsImageDrawingActions)(UIGraphicsImageRendererContext *rendererContext) = ^(UIGraphicsImageRendererContext *rendererContext)
    {
        // Convert Vision coordinates to screen coordinates
        CGAffineTransform transform = CGAffineTransformIdentity;
        transform = CGAffineTransformScale(transform, size.width, -size.height);
        transform = CGAffineTransformTranslate(transform, 0, -1);

        for (NSString *uuid in keys) {
            VNDetectedObjectObservation *rectangleObservation = observationsCopy[uuid];
            // Draw a red border
            [[UIColor redColor] setStroke];
            UIBezierPath *path = [UIBezierPath bezierPathWithRect:CGRectApplyAffineTransform(rectangleObservation.boundingBox, transform)];
            path.lineWidth = 4.0f;
            [path stroke];
        }
    };
    UIImage *overlayImage = [renderer imageWithActions:UIGraphicsImageDrawingActions];

    NSMutableString *trackInfoStr = [NSMutableString string];
    for (NSString *uuid in keys) {
        VNDetectedObjectObservation *rectangleObservation = observationsCopy[uuid];
        [trackInfoStr appendFormat:@"confidence: %.2f \n", rectangleObservation.confidence];
    }

    dispatch_async(dispatch_get_main_queue(), ^{
        self.highlightView.image = overlayImage;
        self.infoLabel.text = trackInfoStr;
    });
}
```
3. Object Tracking
Object tracking processes consecutive video frames, so you need to create a VNSequenceRequestHandler to handle multiple frames. You also need a VNDetectedObjectObservation object as the tracking seed: you can use an object-detection result, or specify a rectangle yourself. Note that because the coordinate systems differ, if you specify a rectangle directly as the seed, you must first convert it into the correct coordinate system.
When tracking multiple objects, use VNDetectedObjectObservation.uuid to distinguish the tracked objects and handle each accordingly.
```objectivec
- (void)objectTrackWithPixelBuffer:(CVPixelBufferRef)pixelBuffer
{
    if (!self.sequenceHandler) {
        self.sequenceHandler = [[VNSequenceRequestHandler alloc] init];
    }

    NSArray<NSString *> *observationKeys = self.lastObsercationsDic.allKeys;
    NSMutableArray<VNTrackObjectRequest *> *trackingRequests = [NSMutableArray array];
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();

    for (NSString *key in observationKeys) {
        VNDetectedObjectObservation *observation = self.lastObsercationsDic[key];
        VNTrackObjectRequest *trackObjectRequest = [[VNTrackObjectRequest alloc] initWithDetectedObjectObservation:observation completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
            CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
            NSLog(@"Tracking took: %f", end - start);
            if (nil == error && request.results.count > 0) {
                // TODO: handle the tracking results
            } else {
                // Handle tracking failure
            }
        }];
        trackObjectRequest.trackingLevel = VNRequestTrackingLevelAccurate;
        [trackingRequests addObject:trackObjectRequest];
    }

    NSError *error = nil;
    [self.sequenceHandler performRequests:trackingRequests onCVPixelBuffer:pixelBuffer error:&error];
}
```
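If you want to seed tracking from a user-specified rectangle rather than a detection result, the seed can be built with +[VNDetectedObjectObservation observationWithBoundingBox:] after converting the rectangle into Vision's normalized, bottom-left-origin coordinates. The sketch below assumes the rectangle is given in the coordinates of a view of size `viewSize`; the method and parameter names are hypothetical, not part of the demo:

```objectivec
// Build a tracking seed from a rectangle chosen in UIKit coordinates.
// `uikitRect` and `viewSize` are hypothetical inputs for this sketch.
- (VNDetectedObjectObservation *)trackingSeedForUIKitRect:(CGRect)uikitRect
                                                 viewSize:(CGSize)viewSize
{
    // UIKit -> Vision: normalize to 0-1 and flip the y axis
    // (UIKit's origin is top-left, Vision's is bottom-left).
    CGRect visionRect = CGRectMake(uikitRect.origin.x / viewSize.width,
                                   1.0 - (uikitRect.origin.y + uikitRect.size.height) / viewSize.height,
                                   uikitRect.size.width / viewSize.width,
                                   uikitRect.size.height / viewSize.height);
    return [VNDetectedObjectObservation observationWithBoundingBox:visionRect];
}
```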
Screenshots

4. Performance
Test device
iPhone 6 Plus, iOS 11.0 (15A5318g)
Processing 1 out of every 10 frames
Object detection
Memory
Stable at around 40 MB

Time
Around 50 ms on average

Object tracking
Memory
Around 40 MB, the same as object detection

Time
Somewhat lower, varying between 20 and 40 ms

5. Summary
Vision is an easy-to-use framework with good performance. Besides object tracking, Vision also provides image classification, face recognition, face-landmark extraction, face tracking, text detection, and more. These are used in much the same way as object detection, so this article won't describe them further.