iOS 11: Object Tracking with Vision

iOS 11 introduced the Vision framework, which provides face detection, object detection, object tracking, and more. This article walks through a simple demo of object detection and object tracking with Vision. The demo can be downloaded from GitHub.

1. About the Vision Framework

Vision is an image-processing framework built on Core ML and introduced with iOS 11. Using high-performance image processing and computer-vision techniques, it can perform face detection, landmark detection, scene recognition, and more on both images and video.


2. Object Detection with Vision

Environment

Xcode 9 + iOS 11

Obtaining Image Data

This step assumes you have already started the system camera and are receiving CMSampleBufferRef data. Note that the orientation of the returned sample buffer does not match the UIView's display orientation, so first rotate the sample buffer to the correct orientation.

You can also skip the rotation, as long as you keep all subsequent coordinate conversions consistent.
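As a rough sketch of the camera setup this step assumes (the class name, queue label, and abbreviated error handling here are my own, not from the original demo), frames can be delivered as CMSampleBufferRef via an AVCaptureVideoDataOutput delegate:

```objc
#import <AVFoundation/AVFoundation.h>

@interface CaptureController : NSObject <AVCaptureVideoDataOutputSampleBufferDelegate>
@property (nonatomic, strong) AVCaptureSession *session;
@end

@implementation CaptureController

- (void)startCapture
{
    self.session = [[AVCaptureSession alloc] init];

    AVCaptureDevice *camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:camera error:nil];
    [self.session addInput:input];

    AVCaptureVideoDataOutput *output = [[AVCaptureVideoDataOutput alloc] init];
    // Request BGRA output so the vImage rotation below works without conversion
    output.videoSettings = @{(id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
    [output setSampleBufferDelegate:self
                              queue:dispatch_queue_create("video.queue", DISPATCH_QUEUE_SERIAL)];
    [self.session addOutput:output];

    [self.session startRunning];
}

- (void)captureOutput:(AVCaptureOutput *)output
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // sampleBuffer arrives here; rotate it before handing it to Vision
}

@end
```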

/*
 * Note: the sample buffer must be in ARGB or BGRA format; other formats may not be supported.
 * rotationConstant:
 *  0 -- rotate 0 degrees (simply copy the data from src to dest)
 *  1 -- rotate 90 degrees counterclockwise
 *  2 -- rotate 180 degrees
 *  3 -- rotate 270 degrees counterclockwise
 */
+ (CVPixelBufferRef)rotateBuffer:(CMSampleBufferRef)sampleBuffer withConstant:(uint8_t)rotationConstant
{
    CVImageBufferRef imageBuffer        = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    
    OSType pixelFormatType              = CVPixelBufferGetPixelFormatType(imageBuffer);
    
//    NSAssert(pixelFormatType == kCVPixelFormatType_32ARGB, @"Code works only with 32ARGB format. Test/adapt for other formats!");
    
    const size_t kAlignment_32ARGB      = 32;
    const size_t kBytesPerPixel_32ARGB  = 4;
    
    size_t bytesPerRow                  = CVPixelBufferGetBytesPerRow(imageBuffer);
    size_t width                        = CVPixelBufferGetWidth(imageBuffer);
    size_t height                       = CVPixelBufferGetHeight(imageBuffer);
    
    BOOL rotatePerpendicular            = (rotationConstant == 1) || (rotationConstant == 3); // Use enumeration values here
    const size_t outWidth               = rotatePerpendicular ? height : width;
    const size_t outHeight              = rotatePerpendicular ? width  : height;
    
    size_t bytesPerRowOut               = kBytesPerPixel_32ARGB * ceil(outWidth * 1.0 / kAlignment_32ARGB) * kAlignment_32ARGB;
    
    const size_t dstSize                = bytesPerRowOut * outHeight * sizeof(unsigned char);
    
    void *srcBuff                       = CVPixelBufferGetBaseAddress(imageBuffer);
    
    unsigned char *dstBuff              = (unsigned char *)malloc(dstSize);
    
    vImage_Buffer inbuff                = {srcBuff, height, width, bytesPerRow};
    vImage_Buffer outbuff               = {dstBuff, outHeight, outWidth, bytesPerRowOut};
    
    uint8_t bgColor[4]                  = {0, 0, 0, 0};
    
    vImage_Error err                    = vImageRotate90_ARGB8888(&inbuff, &outbuff, rotationConstant, bgColor, 0);
    if (err != kvImageNoError)
    {
        NSLog(@"%ld", err);
    }
    
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
    
    CVPixelBufferRef rotatedBuffer      = NULL;
    CVPixelBufferCreateWithBytes(NULL,
                                 outWidth,
                                 outHeight,
                                 pixelFormatType,
                                 outbuff.data,
                                 bytesPerRowOut,
                                 freePixelBufferDataAfterRelease,
                                 NULL,
                                 NULL,
                                 &rotatedBuffer);
    
    return rotatedBuffer;
}

void freePixelBufferDataAfterRelease(void *releaseRefCon, const void *baseAddress)
{
    // Free the memory we malloced for the vImage rotation
    free((void *)baseAddress);
}


Object Detection

Once you have the image data, object detection is straightforward:

  1. Create a rectangle-detection request, VNDetectRectanglesRequest
  2. Create a VNImageRequestHandler from the image source (a pixel buffer or a UIImage)
  3. Call -[VNImageRequestHandler performRequests:error:] to run the detection

- (void)detectObjectWithPixelBuffer:(CVPixelBufferRef)pixelBuffer
{
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    
    void (^ VNRequestCompletionHandler)(VNRequest *request, NSError * _Nullable error) = ^(VNRequest *request, NSError * _Nullable error)
    {
        CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
        
        NSLog(@"Detection time: %f", end - start);
        if (!error && request.results.count > 0) {
            // TODO: handle the detection results here
            return ;
        }
    };
    
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCVPixelBuffer:pixelBuffer options:@{}];
    VNDetectRectanglesRequest *request = [[VNDetectRectanglesRequest alloc] initWithCompletionHandler:VNRequestCompletionHandler];
    request.minimumAspectRatio = 0.1;   // minimum aspect ratio of detected rectangles
    request.maximumObservations = 0;    // 0 = no limit on the number of results
    [handler performRequests:@[request] error:nil];
}

Displaying the Detection Results

Object detection returns a set of VNDetectedObjectObservation results, each with three properties: confidence, uuid, and boundingBox. Vision's coordinate system resembles OpenGL's texture coordinate system: the origin is at the bottom-left corner and the coordinates are normalized. So before projecting the results onto the screen, you still need a coordinate conversion.

The three coordinate systems differ as follows:

Coordinate system      Origin        Extent
UIKit                  top-left      screen size
AVFoundation           top-left      0 – 1 (normalized)
Vision                 bottom-left   0 – 1 (normalized)
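Based on the table, a Vision boundingBox maps to a UIKit rect by scaling up to view size and flipping the y-axis. A minimal helper illustrating the math (the function name is mine; the drawing code further below achieves the same thing with a CGAffineTransform):

```objc
#import <CoreGraphics/CoreGraphics.h>

// Convert a normalized Vision rect (origin at bottom-left) to a UIKit rect
// (origin at top-left) for a view of the given size.
static CGRect VisionRectToUIKitRect(CGRect visionRect, CGSize viewSize)
{
    CGFloat x = visionRect.origin.x * viewSize.width;
    CGFloat w = visionRect.size.width * viewSize.width;
    CGFloat h = visionRect.size.height * viewSize.height;
    // Flip y: Vision's origin.y measures up from the bottom edge
    CGFloat y = (1.0 - visionRect.origin.y - visionRect.size.height) * viewSize.height;
    return CGRectMake(x, y, w, h);
}
```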

The display code is shown below. It uses CGAffineTransform for the coordinate conversion, draws a red border around each transformed rectangle, and also prints the confidence values to the screen.


- (void)overlayImageWithSize:(CGSize)size
{
    
    NSDictionary *lastObsercationDicCopy = [NSDictionary dictionaryWithDictionary:self.lastObsercationsDic];
    NSArray *keyArr = [lastObsercationDicCopy allKeys];
    
    UIGraphicsImageRenderer *renderer = [[UIGraphicsImageRenderer alloc] initWithSize:CGSizeMake(size.width, size.height)];
    
    void (^UIGraphicsImageDrawingActions)(UIGraphicsImageRendererContext *rendererContext) = ^(UIGraphicsImageRendererContext *rendererContext)
    {
         // Convert Vision coordinates to screen coordinates
        CGAffineTransform  transform = CGAffineTransformIdentity;
        transform = CGAffineTransformScale(transform, size.width, -size.height);
        transform = CGAffineTransformTranslate(transform, 0, -1);
        
        for (NSString *uuid in keyArr) {
            VNDetectedObjectObservation *rectangleObservation = lastObsercationDicCopy[uuid];
            
            // Draw the red border
            [[UIColor redColor] setStroke];
            UIBezierPath *path = [UIBezierPath bezierPathWithRect:CGRectApplyAffineTransform(rectangleObservation.boundingBox, transform)];
            path.lineWidth = 4.0f;
            [path stroke];
            
        }
    };
    
    UIImage *overlayImage = [renderer imageWithActions:UIGraphicsImageDrawingActions];
    
    NSMutableString *trackInfoStr = [NSMutableString string];
    
    for (NSString *uuid in keyArr) {
        VNDetectedObjectObservation *rectangleObservation = lastObsercationDicCopy[uuid];
        
        [trackInfoStr appendFormat:@"Confidence: %.2f \n", rectangleObservation.confidence];
    }
    
    dispatch_async(dispatch_get_main_queue(), ^{
        
        self.highlightView.image = overlayImage;
        
        self.infoLabel.text = trackInfoStr;
    });
}

3. Object Tracking

Object tracking processes consecutive video frames, so you need a VNSequenceRequestHandler to handle multiple frames. You also need a VNDetectedObjectObservation as the tracking reference: you can use an object-detection result, or specify a rectangle yourself. Note that because the coordinate systems differ, if you specify a rectangle directly you must apply the correct coordinate conversion first.

When tracking multiple objects, use VNDetectedObjectObservation.uuid to distinguish the tracked objects and handle each accordingly.


- (void)objectTrackWithPixelBuffer:(CVPixelBufferRef)pixelBuffer
{

    if (!self.sequenceHandler) {
        self.sequenceHandler = [[VNSequenceRequestHandler alloc] init];
    }
    
    NSArray<NSString *> *obsercationKeys = self.lastObsercationsDic.allKeys;
    
    NSMutableArray<VNTrackObjectRequest *> *obsercationRequest = [NSMutableArray array];
    
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    for (NSString *key in obsercationKeys) {
        
        VNDetectedObjectObservation *obsercation = self.lastObsercationsDic[key];
        
        VNTrackObjectRequest *trackObjectRequest = [[VNTrackObjectRequest alloc] initWithDetectedObjectObservation:obsercation completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
            
            CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
            NSLog(@"Tracking time: %f", end - start);
            
            if (nil == error && request.results.count > 0) {
                // TODO: handle the tracking results
            } else {
                // handle tracking failure
            }
        }];
        trackObjectRequest.trackingLevel = VNRequestTrackingLevelAccurate;
        
        [obsercationRequest addObject:trackObjectRequest];
    }
    
    
    NSError *error = nil;
    [self.sequenceHandler performRequests:obsercationRequest onCVPixelBuffer:pixelBuffer error:&error];
    
}

Result

4. Performance

Test device

iPhone 6 Plus, iOS 11.0 (15A5318g)

Sampling 1 of every 10 frames
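The 1-in-10 frame sampling used for these measurements can be done with a simple counter in the capture delegate. A sketch (the static counter is my own simplification; in real code you would likely use an instance variable):

```objc
- (void)captureOutput:(AVCaptureOutput *)output
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    static NSUInteger frameCount = 0;
    // Process only every 10th frame to reduce CPU load
    if (++frameCount % 10 != 0) {
        return;
    }
    // hand sampleBuffer to detection/tracking here
}
```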

Object detection

Memory

Stable at around 40 MB

Time

About 50 ms on average

Object tracking

Memory

Around 40 MB, the same as object detection

Time

Somewhat lower, varying between 20 and 40 ms

5. Summary

Vision is an easy-to-use framework with good performance. Besides object tracking, Vision also provides image classification, face detection, face landmark detection, face tracking, text detection, and more. Their usage is similar to object detection, so this article won't cover them further.

References

Getting Started with Vision
