Training TimeSformer with mmaction2 on Something-Something V2 and custom data

mmaction2 setup

First, set up and test on Windows:
conda create -n mmaction2 --clone openmmlab
pip install -r requirements/build.txt
pip install -v -e .
Note: mmcv-full must be a version below 1.4.2.
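For example, a pinned install might look like this (the mmcv-full version and the cu/torch tags below are assumptions; pick the combination matching your CUDA and PyTorch install):
pip install mmcv-full==1.3.18 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/index.html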
Test:

import torch
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
device = 'cuda:0' # or 'cpu'
device = torch.device(device)

model = init_recognizer(config_file, device=device)
# run inference on the demo video
inference_recognizer(model, 'demo/demo.mp4')
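inference_recognizer returns the top-scoring (class index, score) pairs. A minimal sketch of mapping them to readable names, assuming the Kinetics-400 label map that ships with mmaction2 (tools/data/kinetics/label_map_k400.txt):

# map predicted class indices to human-readable labels
labels = [line.strip() for line in open('tools/data/kinetics/label_map_k400.txt')]
results = inference_recognizer(model, 'demo/demo.mp4')
for class_idx, score in results:
    print(f'{labels[class_idx]}: {score:.4f}')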

Dataset preparation

The Something-Something dataset is a large labeled dataset of humans performing actions with everyday objects, covering 174 action classes. The main difference between Something-V1 and Something-V2 is the number of videos, which grows from 108,499 in V1 to 220,847 in V2. V2 download: https://pan.baidu.com/s/1NCqL7JVoFZO6D131zGls-A
Extraction code: 07ka
For splitting the dataset, the splitting scripts released by the TSM authors are recommended; they make it easy to split the data into train, validation, and test sets from the original csv files:
https://github.com/mit-han-lab/temporal-shift-module/tree/master/tools
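For reference, the list files used by the training config below follow mmaction2's annotation formats (the entries here are made-up examples). VideoDataset expects one "video_path label" pair per line:

12345.webm 42
67890.webm 7

while RawframeDataset expects "frame_dir total_frames label" per line:

12345 96 42
67890 120 7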
Concatenate and extract the dataset:
cat 20bn-something-something-v2-?? | tar zx
Install ffmpeg
Download a static build:
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz
Extract:
tar -xvf ffmpeg-git-amd64-static.tar.xz
cd ffmpeg-git-20220302-amd64-static/
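The frame-extraction script below calls ffmpeg by name, so make the static binary reachable on your PATH first (the paths here are assumptions):
sudo ln -s "$(pwd)/ffmpeg" /usr/local/bin/ffmpeg
# or: export PATH="$(pwd):$PATH"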


Use the following script to batch-convert the videos into frames.
It calls ffmpeg internally; you can adjust the parameters in the cmd string.

from __future__ import print_function, division
import os
import sys
import subprocess

def class_process(dir_path, dst_dir_path):
  class_path = dir_path
  if not os.path.isdir(class_path):
    return

  dst_class_path = dst_dir_path
  if not os.path.exists(dst_class_path):
    os.mkdir(dst_class_path)

  for file_name in os.listdir(class_path):
    if '.webm' not in file_name:
      continue
    name, ext = os.path.splitext(file_name)
    dst_directory_path = os.path.join(dst_class_path, name)

    video_file_path = os.path.join(class_path, file_name)
    try:
      if os.path.exists(dst_directory_path):
        if not os.path.exists(os.path.join(dst_directory_path, '000001.jpg')):
          subprocess.call('rm -r \"{}\"'.format(dst_directory_path), shell=True)
          print('remove {}'.format(dst_directory_path))
          os.mkdir(dst_directory_path)
        else:
          continue
      else:
        os.mkdir(dst_directory_path)
    except OSError:
      print(dst_directory_path)
      continue
    # call ffmpeg to split the video into frames, rescaled to height 240
    cmd = 'ffmpeg -i \"{}\" -vf scale=-1:240 \"{}/%06d.jpg\"'.format(video_file_path, dst_directory_path)
    print(cmd)
    # run the extraction command
    subprocess.call(cmd, shell=True)
    print('\n')

if __name__=="__main__":
  dir_path = sys.argv[1]
  dst_dir_path = sys.argv[2]

  # all sthv2 .webm files sit in one flat directory, so a single pass is enough
  # (the original loop called class_process once per directory entry, which was redundant)
  class_process(dir_path, dst_dir_path)
 

python video_jpg_ucf101_hmdb51.py /mnt/e/BaiduNetdiskDownload/somethingV2/20bn-something-something-v2/ /mnt/e/workspace/mmaction2/data/somethingv2/
This took roughly 4 to 5 days to run.
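Most of that time is a single process waiting on ffmpeg. A sketch of parallelizing the same per-video command with a process pool (the worker count is a guess; tune it to your machine):

import os
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor

def extract_one(args):
    video_file_path, dst_directory_path = args
    # skip videos that were already fully extracted
    if os.path.exists(os.path.join(dst_directory_path, '000001.jpg')):
        return
    os.makedirs(dst_directory_path, exist_ok=True)
    cmd = 'ffmpeg -i "{}" -vf scale=-1:240 "{}/%06d.jpg"'.format(
        video_file_path, dst_directory_path)
    subprocess.call(cmd, shell=True)

if __name__ == '__main__':
    dir_path, dst_dir_path = sys.argv[1], sys.argv[2]
    jobs = [(os.path.join(dir_path, f),
             os.path.join(dst_dir_path, os.path.splitext(f)[0]))
            for f in os.listdir(dir_path) if f.endswith('.webm')]
    with ProcessPoolExecutor(max_workers=8) as pool:
        list(pool.map(extract_one, jobs))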

Training on sthv2

_base_ = ['../../_base_/default_runtime.py']

# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='TimeSformer',
        pretrained=  # noqa: E251
        'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth',  # noqa: E501
        num_frames=8,
        img_size=224,
        patch_size=16,
        embed_dims=768,
        in_channels=3,
        dropout_ratio=0.,
        transformer_layers=None,
        attention_type='divided_space_time',
        norm_cfg=dict(type='LN', eps=1e-6)),
    cls_head=dict(type='TimeSformerHead', num_classes=174, in_channels=768),
    # model training and testing settings
    train_cfg=None,
    test_cfg=dict(average_clips='prob'))

# dataset settings
# use the video format directly
dataset_type = 'VideoDataset'
data_root = 'data/sthv2/videos'
data_root_val = 'data/sthv2/videos'
ann_file_train = 'data/sthv2/sthv2_train_list_videos.txt'
ann_file_val = 'data/sthv2/sthv2_val_list_videos.txt'
ann_file_test = 'data/sthv2/sthv2_val_list_videos.txt'
# alternative: use the extracted rawframes
#dataset_type = 'RawframeDataset'
#data_root = 'data/sthv2/rawframes'
#data_root_val = 'data/sthv2/rawframes'
#ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt'
#ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt'
#ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt'



img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False)

train_pipeline = [
    dict(type='DecordInit'),  # video input needs a decoder handle first
    dict(type='SampleFrames', clip_len=8, frame_interval=30, num_clips=1),
    # sample 8 frames with a stride of 30 frames along the temporal axis,
    # so one clip spans 8*30 = 240 frames (about 8 s at 30 fps).
    # num_clips=N samples N clips per video and ensembles the results at
    # test time; 1 means a single clip per video.
    dict(type='DecordDecode'),  # decode the sampled video frames
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=224),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),  # reshape to the network input layout
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),  # keep only the keys the model needs
    dict(type='ToTensor', keys=['imgs', 'label'])  # convert to torch Tensors
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=30,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=30,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
data = dict(
    videos_per_gpu=2,  # videos loaded per GPU, i.e. the per-GPU batch size
    workers_per_gpu=2,  # dataloader workers per GPU
    test_dataloader=dict(videos_per_gpu=1),
    # train / val / test dataset configurations
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# evaluation metrics
evaluation = dict(
    interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])

# optimizer for model training
optimizer = dict(
    type='SGD',
    lr=0.005/8/4,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={  # disable weight decay for these backbone embeddings (note: this does not freeze the backbone)
            '.backbone.cls_token': dict(decay_mult=0.0),
            '.backbone.pos_embed': dict(decay_mult=0.0),
            '.backbone.time_embed': dict(decay_mult=0.0)
        }),
    weight_decay=1e-4,
    nesterov=True)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))

 
# learning policy: learning-rate decay schedule
# lr_config = dict(policy='CosineAnnealing', min_lr=0)
lr_config = dict(policy='step', step=[5, 10])
total_epochs = 15

# runtime settings
checkpoint_config = dict(interval=1)  # save a checkpoint every epoch
work_dir = './work_dirs/timesformer_divST_8x32x1_ssv2'

The learning rate is scaled with the number of GPUs and the batch size:
the original schedule assumed 8 GPUs with batch size 8 each; here it is 1 GPU with batch size 2,
so lr = 0.005/8/4.
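This follows the linear scaling rule (lr proportional to the total batch size); a quick check of the arithmetic:

# linear scaling rule: lr_new = lr_base * new_total_batch / base_total_batch
base_lr = 0.005              # tuned for 8 GPUs * batch 8 = 64 videos per step
new_total, base_total = 1 * 2, 8 * 8
print(base_lr * new_total / base_total)  # 0.00015625, i.e. 0.005/8/4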

Here we use the video-format data and train directly.
Train and validate from scratch:
python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py --work-dir work_dirs/timesformer_divST_8x32x1_ssv2 --gpus 0
(On Windows without a GPU, set gpus to 0.)

You can also fix the random seed and validate during training:
python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py --work-dir work_dirs/timesformer_divST_8x32x1_ssv2 --validate --seed 0 --deterministic

Resume training from a checkpoint:
python tools/train.py work_dirs/timesformer_divST_8x32x1_ssv2/timesformer_divST_8x32x1_ssv2.py --work-dir work_dirs/timesformer_divST_8x32x1_ssv2 --gpus 0 --resume-from work_dirs/timesformer_divST_8x32x1_ssv2/epoch_9.pth

Evaluate a checkpoint:
python tools/test.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py work_dirs/timesformer_divST_8x32x1_ssv2/epoch_6.pth --eval top_k_accuracy mean_class_accuracy --out result6.json
This saves the evaluation results to result6.json.



Real-time inference from a webcam:
python .\demo\webcam_demo.py .\work_dirs\timesformer_divST_8x32x1_ssv2\timesformer_divST_8x32x1_ssv2.py .\work_dirs\timesformer_divST_8x32x1_ssv2\epoch_15.pth .\tools\data\sthv2\label_map.txt --average-size 5 --threshold 0.2

Custom dataset

Take the kinetics400_tiny dataset as an example: just two classes, with 30 training videos and 10 validation videos, again training directly from video files.
TSN sanity test

import os.path as osp

from mmaction.datasets import build_dataset
from mmaction.models import build_model
from mmaction.apis import train_model

import mmcv

from mmcv import Config
cfg = Config.fromfile('./configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py')

from mmcv.runner import set_random_seed

# Modify dataset type and path
cfg.dataset_type = 'VideoDataset'
cfg.data_root = 'data/kinetics400_tiny/train/'
cfg.data_root_val = 'data/kinetics400_tiny/val/'
cfg.ann_file_train = 'data/kinetics400_tiny/kinetics_tiny_train_video.txt'
cfg.ann_file_val = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.ann_file_test = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'

#cfg.data.videos_per_gpu=1
#cfg.data.workers_per_gpu=1
cfg.data.test.type = 'VideoDataset'
cfg.data.test.ann_file = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.data.test.data_prefix = 'data/kinetics400_tiny/val/'

cfg.data.train.type = 'VideoDataset'
cfg.data.train.ann_file = 'data/kinetics400_tiny/kinetics_tiny_train_video.txt'
cfg.data.train.data_prefix = 'data/kinetics400_tiny/train/'

cfg.data.val.type = 'VideoDataset'
cfg.data.val.ann_file = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.data.val.data_prefix = 'data/kinetics400_tiny/val/'

# The flag is used to determine whether it is omnisource training
cfg.setdefault('omnisource', False)
# Modify num classes of the model in cls_head
cfg.model.cls_head.num_classes = 2
# We can use the pre-trained TSN model
cfg.load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './test'

# The original learning rate (LR) was set for 8-GPU training.
# Scale the batch size and LR down since we only use one GPU.
cfg.data.videos_per_gpu = cfg.data.videos_per_gpu // 16
cfg.optimizer.lr = cfg.optimizer.lr / 8 / 16  # 1/8 for the GPU count, 1/16 for the smaller batch
cfg.total_epochs = 10

# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 5
# We can set the log print interval to reduce how often logs are printed
cfg.log_config.interval = 5

# Set a seed so the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)

# Save the best
cfg.evaluation.save_best='auto'

# Build the dataset
datasets = [build_dataset(cfg.data.train)]

# Build the recognizer
model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_model(model, datasets, cfg, distributed=False, validate=True)


from mmaction.apis import single_gpu_test
from mmaction.datasets import build_dataloader
from mmcv.parallel import MMDataParallel

# Build a test dataloader
dataset = build_dataset(cfg.data.test, dict(test_mode=True))
data_loader = build_dataloader(
        dataset,
        videos_per_gpu=1,
        workers_per_gpu=cfg.data.workers_per_gpu,
        dist=False,
        shuffle=False)
model = MMDataParallel(model, device_ids=[0])
outputs = single_gpu_test(model, data_loader)

eval_config = cfg.evaluation
eval_config.pop('interval')
eval_res = dataset.evaluate(outputs, **eval_config)
for name, val in eval_res.items():
    print(f'{name}: {val:.04f}')

As before, adjust the lr according to the number of GPUs and videos_per_gpu.
The point here is just to verify that the TSN pipeline runs end to end.

Now for the main part: training with TimeSformer.

timesformer_divST_8x32x1_15e_kinetics_tiny.py

_base_ = ['../../_base_/runtimetiny.py']

# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='TimeSformer',
        pretrained=  # noqa: E251
        'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth',  # noqa: E501
        num_frames=8,
        img_size=224,
        patch_size=16,
        embed_dims=768,
        in_channels=3,
        dropout_ratio=0.,
        transformer_layers=None,
        attention_type='divided_space_time',
        norm_cfg=dict(type='LN', eps=1e-6)),
    cls_head=dict(type='TimeSformerHead', num_classes=2, in_channels=768),
    # model training and testing settings
    train_cfg=None,
    test_cfg=dict(average_clips='prob'))

# dataset settings
dataset_type = 'VideoDataset'
data_root = 'data/kinetics400_tiny/train'
data_root_val = 'data/kinetics400_tiny/val'
ann_file_train = 'data/kinetics400_tiny/kinetics_tiny_train_video.txt'
ann_file_val = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
ann_file_test = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'

img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False)

train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=8, frame_interval=32, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=224),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
data = dict(
    videos_per_gpu=2,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))

evaluation = dict(
    interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])

# optimizer
optimizer = dict(
    type='SGD',
    lr=0.005/8,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={
            '.backbone.cls_token': dict(decay_mult=0.0),
            '.backbone.pos_embed': dict(decay_mult=0.0),
            '.backbone.time_embed': dict(decay_mult=0.0)
        }),
    weight_decay=1e-4,
    nesterov=True)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))

# learning policy
lr_config = dict(policy='step', step=[5, 8])
total_epochs = 10

# runtime settings
checkpoint_config = dict(interval=1)
work_dir = './work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny'
 

python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics_tiny.py --gpus 0

Inference test
cat tinyinfer.py
from mmaction.apis import inference_recognizer, init_recognizer
import os

# Choose to use a config and initialize the recognizer
config = 'configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics_tiny.py'
# Setup a checkpoint file to load
checkpoint = 'work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/epoch_10.pth'
# Initialize the recognizer
model = init_recognizer(config, checkpoint, device='cuda:0')
# Use the recognizer to do inference


label = 'tools/data/kinetics/label_map_k2.txt'
labels = open(label).readlines()
labels = [x.strip() for x in labels]

path = 'data/kinetics400_tiny/val'  # directory of validation videos
for root, dirs, names in os.walk(path):
    for name in names:
        ext = os.path.splitext(name)[1]  # file extension
        if ext == '.mp4':
            video = os.path.join(root, name)
            results = inference_recognizer(model, video)

            # map predicted class indices to label strings
            results = [(labels[k[0]], k[1]) for k in results]
            print(name)
            for result in results:
                print(f'{result[0]}: ', result[1])


A custom label file (class 0 is climbing a rope, class 1 is blowing glass):
cat tools/data/kinetics/label_map_k2.txt
climbing a rope
blowing glass

This prints each video's predicted probability for the two classes.



Logs
python tools/analysis/analyze_logs.py plot_curve work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/20220403_010309.log.json --keys top1_acc --out acc1.pdf


Log analysis

root@83c3d6970b59:/workspace# python tools/analysis/analyze_logs.py cal_train_time work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/20220403_010309.log.json
-----Analyze train time of work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/20220403_010309.log.json-----
slowest epoch 5, average time is 0.8540
fastest epoch 4, average time is 0.8354
time std over epochs is 0.0063
average iter time: 0.8425 s/iter

Model complexity analysis

tools/analysis/get_flops.py is a script adapted from the flops-counter.pytorch library; it computes the FLOPs and parameter count of a model for a given input shape.

python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
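For example, for the TimeSformer config above one might try the following (the shape values are an assumption matching num_frames=8 and img_size=224; check the script's help for the exact layout it expects for 3D recognizers):
python tools/analysis/get_flops.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py --shape 1 3 8 224 224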

Other deployment-related tools

Model conversion

  1. Export an MMAction2 model to ONNX (experimental)
    tools/deployment/pytorch2onnx.py converts a model to ONNX format. The script also supports comparing the outputs of the PyTorch and ONNX models to verify they match. This feature depends on onnx and onnxruntime; install them first with pip install onnx onnxruntime. Note that the --softmax option appends a Softmax layer to a recognizer so the predictions fall within [0, 1].

For an action recognition model, run:

python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify

For a temporal action localization model, run:

python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify
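After exporting, you can sanity-check the ONNX file yourself. A minimal sketch with onnxruntime (the file name 'timesformer.onnx' is an assumption; the dummy input is built from whatever input shape the exported graph declares):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('timesformer.onnx')
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)  # inspect the expected input layout
# replace any dynamic dimensions with 1 to build a dummy clip
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.randn(*shape).astype(np.float32)
scores = sess.run(None, {inp.name: dummy})[0]
print(scores.shape)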

  2. Publish a model
    tools/deployment/publish_model.py prepares a model for release, which mainly involves:
    (1) converting the weight tensors to CPU tensors, (2) removing the optimizer state, and (3) computing the hash of the weight file and appending it to the file name.
python tools/deployment/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}

For example,

python tools/deployment/publish_model.py work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth tsn_r50_1x1x3_100e_kinetics400_rgb.pth

The final output file is named tsn_r50_1x1x3_100e_kinetics400_rgb-{hash id}.pth.

5- Metric evaluation

tools/analysis/eval_metric.py computes a given evaluation metric from a config file and the corresponding result file.

The result file is produced by tools/test.py (via --out ${RESULT_FILE}) and stores the model's predictions on the specified dataset.

python tools/analysis/eval_metric.py ${CONFIG_FILE} ${RESULT_FILE} [--eval ${EVAL_METRICS}] [--cfg-options ${CFG_OPTIONS}] [--eval-options ${EVAL_OPTIONS}]
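For example, to re-score the predictions saved earlier as result6.json without re-running inference:
python tools/analysis/eval_metric.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py result6.json --eval top_k_accuracy mean_class_accuracy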

6- Print the full config

tools/analysis/print_config.py parses all input arguments and prints the complete configuration.

python tools/analysis/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]

Check videos
tools/analysis/check_videos.py uses the specified decoder to iterate over all samples in the video dataset of a given config, looks for invalid video files (corrupt or missing), and saves the invalid paths to an output file. Note that after deleting invalid videos you need to regenerate the video file list.

python tools/analysis/check_videos.py ${CONFIG} [-h] [--options OPTIONS [OPTIONS ...]] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--output-file OUTPUT_FILE] [--split SPLIT] [--decoder ]