TensorFlow 2.0: Object Detection in Video with keras_cv and the YOLOv8 Model

This example extends the previous one, but first requires downloading a YouTube video.
For the rest of the environment setup, see the previous post: http://www.itdecent.cn/p/3ac3f54636f8
Download yt-dlp, which is used to fetch video files from YouTube.
yt-dlp download link:
https://gitlab.com/zhuge20100104/cpp_practice/-/blob/master/simple_learn/deep_learning/15_tensorflow_object_detection_api_in_video/yt-dlp?ref_type=heads

Write a download.sh script to simplify the download step. The code for download.sh is as follows:

#!/bin/bash

if [[ "$#" -lt 1 ]]; then
    echo "Usage: ./download.sh {YouTube video link}"
    exit 1
fi

# The proxy address below is specific to the author's environment;
# adjust it or drop --proxy entirely if you have direct access.
./yt-dlp "${1}" --proxy http://10.224.0.110:3128 --yes-playlist -f best

Use download.sh to download the cat video needed for detection:

chmod a+x ./download.sh
./download.sh https://www.youtube.com/watch?v=IzluNxh-8_o

After the download finishes, rename the downloaded video to cats.mp4 (the filename the code below expects).

Note that all of the steps above require special network access to work; don't ask why.

Also, download.sh needs to run in a Linux environment; running it directly inside the TensorFlow docker container works well.

The code for detecting objects in the video is shown below.
The full notebook is available at:
https://gitlab.com/zhuge20100104/cpp_practice/-/blob/master/simple_learn/deep_learning/15_tensorflow_object_detection_api_in_video/15.%20Tensorflow%20Objection%20API%20in%20Video.ipynb?ref_type=heads

# imports
import os
# The backend must be set before keras is imported
os.environ['KERAS_BACKEND'] = 'jax'

import tensorflow as tf
from tensorflow import data as tf_data
import tensorflow_datasets as tfds
import keras
import keras_cv
from keras_cv import bounding_box
from keras_cv import visualization
import numpy as np
import tqdm

# env setup
%matplotlib inline

# For more details, see: https://keras.io/guides/keras_cv/object_detection_keras_cv/

# Let's get started by constructing a YOLOV8Detector pretrained on the pascalvoc dataset.
pretrained_model = keras_cv.models.YOLOV8Detector.from_preset(
    "yolo_v8_m_pascalvoc", bounding_box_format="xywh"
)
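The model is configured with bounding_box_format="xywh", i.e. each box is [left, top, width, height]. To make the format concrete, here is a minimal pure-Python conversion between xywh and the corner-based xyxy format (keras_cv.bounding_box.convert_format does this on real tensors; the helpers below are just an illustration):

```python
def xywh_to_xyxy(box):
    """Convert [left, top, width, height] to [left, top, right, bottom]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def xyxy_to_xywh(box):
    """Convert [left, top, right, bottom] back to [left, top, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

box = [100.0, 50.0, 40.0, 30.0]                     # xywh
assert xywh_to_xyxy(box) == [100.0, 50.0, 140.0, 80.0]
assert xyxy_to_xywh(xywh_to_xyxy(box)) == box       # round-trip is lossless
```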

# Resize the image to the model-compatible input size
inference_resizing = keras_cv.layers.Resizing(
    640, 640, pad_to_aspect_ratio=True, bounding_box_format='xywh'
)
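With pad_to_aspect_ratio=True, the layer letterboxes the frame: it scales by the smaller of the two width/height ratios and pads the remainder, so nothing is distorted. A rough pure-Python sketch of that arithmetic (an illustration of the idea, not keras_cv's actual implementation):

```python
def letterbox_dims(src_w, src_h, dst=640):
    """Compute the scaled size and total padding needed to fit a src_w x src_h
    frame into a dst x dst canvas without changing its aspect ratio."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = dst - new_w, dst - new_h   # total padding per axis
    return new_w, new_h, pad_w, pad_h

# A 1280x720 frame scales by 0.5 to 640x360, leaving 280 px of vertical padding.
assert letterbox_dims(1280, 720) == (640, 360, 0, 280)
```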

# keras_cv.visualization.plot_bounding_box_gallery() supports a class_mapping parameter to
# highlight what class each box was assigned to. Let's assemble a class mapping now.

class_ids = [
    "Aeroplane",
    "Bicycle",
    "Bird",
    "Boat",
    "Bottle",
    "Bus",
    "Car",
    "Cat",
    "Chair",
    "Cow",
    "Dining Table",
    "Dog",
    "Horse",
    "Motorbike",
    "Person",
    "Potted Plant",
    "Sheep",
    "Sofa",
    "Train",
    "Tvmonitor",
    "Total",
]

class_mapping = dict(zip(range(len(class_ids)), class_ids))
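The class_mapping above simply pairs each integer class id with its Pascal VOC label, so a predicted class index can be turned into a readable name:

```python
class_ids = [
    "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat",
    "Chair", "Cow", "Dining Table", "Dog", "Horse", "Motorbike", "Person",
    "Potted Plant", "Sheep", "Sofa", "Train", "Tvmonitor", "Total",
]
class_mapping = dict(zip(range(len(class_ids)), class_ids))

# Class index 7 corresponds to "Cat", the label we expect on this video.
assert class_mapping[7] == "Cat"
assert len(class_mapping) == 21
```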

import imageio
from datetime import datetime

input_video = 'cats'

video_reader = imageio.get_reader('{}.mp4'.format(input_video))
video_writer = imageio.get_writer('{}_annotated.mp4'.format(input_video), fps=10)

t0 = datetime.now()
n_frames = 0
for frame in video_reader:
    if n_frames > 10000:
        break
    n_frames += 1
    # print(frame.shape)
    # This can be used as our inference preprocessing pipeline:
    image_batch = inference_resizing([frame])
    y_pred = pretrained_model.predict(image_batch)
    # The annotated image can now be saved
    image_with_boxes = visualization.draw_bounding_boxes(
        image_batch,
        bounding_boxes=y_pred,
        color=(0, 255, 0),
        bounding_box_format="xywh",
        class_mapping=class_mapping,
    )
    
    image_with_boxes = image_with_boxes.reshape(640, 640, 3)
    video_writer.append_data(image_with_boxes)
fps = n_frames / (datetime.now() - t0).total_seconds()
print('Frames processed: {}, speed: {:.1f} fps'.format(n_frames, fps))
# Clean up 
video_writer.close()
video_reader.close()
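The predicted boxes live in the 640x640 letterboxed coordinate space, so the loop above draws them on the resized frames. If you instead wanted boxes on the original frame, you would undo the resize. A minimal sketch of that back-mapping for the xywh format, assuming the resized content is anchored top-left with padding on the bottom/right (a hypothetical helper, not part of the notebook):

```python
def box_to_original(box, src_w, src_h, dst=640):
    """Map an xywh box from the dst x dst letterboxed image back to the
    original src_w x src_h frame. Assumes top-left anchored content, so
    undoing the transform is a pure rescale by 1/scale."""
    scale = min(dst / src_w, dst / src_h)
    x, y, w, h = box
    return [x / scale, y / scale, w / scale, h / scale]

# On a letterboxed 1280x720 frame the scale is 0.5, so coordinates double.
assert box_to_original([100, 50, 40, 30], 1280, 720) == [200.0, 100.0, 80.0, 60.0]
```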

A sample annotated frame from the output video is shown below.

(image: sample frame from cats_annotated.mp4)