[TVM Series 7] An Introduction to TVMC

I. Introduction

TVMC is a tool provided by the TVM Python package. It exposes auto-tuning, compilation, performance profiling, and model execution through a command-line interface. Following TVM's official tutorial, this article walks through how to use TVMC.

II. TVMC Usage Example

1. Installation

After updating TVM to the latest code from GitHub:

cd tvm/python
python gen_requirements.py
python setup.py build
python setup.py install

A problem encountered during installation:

Processing dependencies for tvm==0.9.dev915+g5c29e55be
Searching for synr==0.6.0
Reading https://pypi.python.org/simple/synr/
Couldn't find index page for 'synr' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
No local packages or working download links found for synr==0.6.0
error: Could not find suitable distribution for Requirement.parse('synr==0.6.0')

Fix:

pip install synr
python setup.py build
python setup.py install

安裝完成后輸入tvmc --help會打印:

...
usage: tvmc [-v] [--version] [-h] {run,tune,compile} ...
TVM compiler driver
optional arguments:
-v, --verbose       increase verbosity
--version           print the version and exit
-h, --help          show this help message and exit.
commands:
{run,tune,compile}
run               run a compiled module
tune              auto-tune a model
compile           compile a model.
TVMC - TVM driver command-line interface
2. Usage example

TVMC is an application tool in the TVM Python package; once TVM is installed, it can be invoked with the tvmc command. The following uses tvmc to run an image-classification model.
(1) Downloading the model

Use the ONNX version of the ResNet50 model:

pip install onnx onnxoptimizer
wget https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx

(2) Compiling the model

tvmc compile --target "llvm" \
    --output resnet50-v2-7-tvm.tar \
    resnet50-v2-7.onnx

執(zhí)行完成后:

target={1: llvm -keys=cpu -link-params=0}
target={1: llvm -keys=cpu -link-params=0}, target_host=None
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.

A resnet50-v2-7-tvm.tar archive is generated in the current directory; unpacking it reveals three files:

-rw-rw-r-- 1 user user  88K Mar 29 16:42 mod.json
-rw-rw-r-- 1 user user  98M Mar 29 16:42 mod.params
-rwxrwxr-x 1 user user 582K Mar 29 16:42 mod.so
  • mod.so: the compiled model as a C++ shared library, loadable by the TVM runtime;

  • mod.json: the TVM Relay computation graph, describing the model's nodes and the input/output types and parameters of each node;

  • mod.params: the trained model parameters.
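
Since the package is a plain tar archive, its contents can also be listed programmatically. The sketch below builds a small stand-in archive containing the same three member names (placeholder bytes, not a real compiled model), so it runs without the actual package:

```python
import io
import tarfile

# Build a stand-in archive with the same member names as resnet50-v2-7-tvm.tar,
# so the sketch runs without the real compiled package.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("mod.json", "mod.params", "mod.so"):
        data = b"placeholder"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
buf.seek(0)

# Listing the members works the same way on the real package.
with tarfile.open(fileobj=buf, mode="r") as tar:
    members = sorted(m.name for m in tar.getmembers())
print(members)  # ['mod.json', 'mod.params', 'mod.so']
```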
(3) Tuning

By default, TVMC tunes with the XGBoost tuner and requires an output file for the tuning records. Tuning is essentially a parameter-search process: each operator is run under different parameter configurations, and the configuration that runs fastest is kept. As a search over a parameter space, it is usually quite time-consuming. In this example, --number and --repeat are used to limit the total number of tuning runs:

tvmc tune --target "llvm" \
    --output resnet50-v2-7-autotuner_records.json \
    --number 10 \
    --repeat 10 \
    resnet50-v2-7.onnx

整個過程完成之后,得到一個調(diào)優(yōu)記錄文件resnet50-v2-7-autotuner_records.json:

{"input": ["llvm -keys=cpu -link-params=0", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 224, 224], "float32"], ["TENSOR", [64, 3, 7, 7], "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 42, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 64]], ["unroll_kw", "ot", true]]}, "result": [[0.009887683900000001], 0, 6.844898462295532, 1648601601.8722398], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["llvm -keys=cpu -link-params=0", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 224, 224], "float32"], ["TENSOR", [64, 3, 7, 7], "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 172, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}, "result": [[0.008670090600000001], 0, 1.1055541038513184, 1648601602.1772938], "version": 0.2, "tvm_version": "0.9.dev0"}
...

Each record contains an "input", a "config", and a "result".
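
Because the file is newline-delimited JSON, the records are easy to inspect programmatically. A minimal sketch that picks the fastest of a set of records (the two lines below are hand-abbreviated samples in the same shape as the real records, not actual measurements):

```python
import json

# One tuning record per line (newline-delimited JSON), as autotvm writes them.
# These lines are abbreviated samples, not real measurements.
sample_records = """
{"input": ["llvm -keys=cpu", "conv2d_NCHWc.x86", [], {}], "config": {"index": 42, "entity": []}, "result": [[0.00988768], 0, 6.84, 1648601601.87], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["llvm -keys=cpu", "conv2d_NCHWc.x86", [], {}], "config": {"index": 172, "entity": []}, "result": [[0.00867009], 0, 1.10, 1648601602.17], "version": 0.2, "tvm_version": "0.9.dev0"}
""".strip()

def best_record(lines):
    """Return the record whose first measured time is lowest."""
    records = [json.loads(line) for line in lines.splitlines()]
    return min(records, key=lambda r: r["result"][0][0])

best = best_record(sample_records)
print(best["config"]["index"])  # → 172, the faster of the two samples
```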
(4) Recompiling with the tuning records

With the tuning-record file in hand, the model can be recompiled using the tuned configurations:

tvmc compile --target "llvm" \
    --output resnet50-v2-7-tvm_autotuned.tar \
    --tuning-records resnet50-v2-7-autotuner_records.json \
    resnet50-v2-7.onnx

(5) Comparing the results

  • Input preprocessing

Save the following code as tvmc_pre_process.py:

from tvm.contrib.download import download_testdata
from PIL import Image  # requires: pip install pillow
import numpy as np

img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# ResNet50 expects a 224x224 input image
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# ONNX expects NCHW input, so convert NHWC to NCHW
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize with the ImageNet mean and standard deviation
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]

# Add the batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)

# Save in .npz format, which TVMC supports natively
np.savez("imagenet_cat", data=img_data)
  • Output post-processing

Save the following code as tvmc_post_process.py:

import os.path
import numpy as np
from scipy.special import softmax
from tvm.contrib.download import download_testdata

# Download the class labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

output_file = "predictions.npz"

# Read the inference output
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])  # apply softmax to the raw output
        scores = np.squeeze(scores)  # drop the size-1 dimensions
        ranks = np.argsort(scores)[::-1]  # indices sorted from highest to lowest score

        for rank in ranks[0:5]:  # print the top-5 scores
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))

  • Untuned model result
python tvmc_pre_process.py
tvmc run --inputs imagenet_cat.npz \
    --output predictions.npz \
    --print-time \
    --repeat 100 \
    resnet50-v2-7-tvm.tar
python tvmc_post_process.py

Output:

Execution time summary:
mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
148.2297     142.3968     217.4981     137.0922     15.2046
class='n02123045 tabby, tabby cat' with probability=0.610552
class='n02123159 tiger cat' with probability=0.367180
class='n02124075 Egyptian cat' with probability=0.019365
class='n02129604 tiger, Panthera tigris' with probability=0.001273
class='n04040759 radiator' with probability=0.000261

  • Tuned model result
python tvmc_pre_process.py
tvmc run --inputs imagenet_cat.npz \
    --output predictions.npz \
    --print-time \
    --repeat 100 \
    resnet50-v2-7-tvm_autotuned.tar
python tvmc_post_process.py

Output:

Execution time summary:
mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
108.6551     108.0915     119.1327     106.1360      2.0859
class='n02123045 tabby, tabby cat' with probability=0.610552
class='n02123159 tiger cat' with probability=0.367179
class='n02124075 Egyptian cat' with probability=0.019365
class='n02129604 tiger, Panthera tigris' with probability=0.001273
class='n04040759 radiator' with probability=0.000261

可以看到調(diào)優(yōu)后的速度還是有比較大的提升,而精度幾乎完全不受影響。

III. Basic Implementation Flow

The TVMC code lives under python/tvm/driver/tvmc. TVMC offers many command options; here we focus on the implementation flow of the three subcommands compile, tune, and run.

1. main

tvmc's command parsers are added through registration. main.py first defines a global list REGISTERED_PARSER and a registration function register_parser():

REGISTERED_PARSER = []
def register_parser(make_subparser):
    REGISTERED_PARSER.append(make_subparser)
    return make_subparser

To add a new command parser, e.g. the parser for the compile subcommand, compiler.py does the following:

@register_parser
def add_compile_parser(subparsers, _):
    parser = subparsers.add_parser("compile", help="compile a model.")
    parser.set_defaults(func=drive_compile)  # set the func attribute
    ...

這就相當于將函數(shù) add_compile_parser 對象添加到 REGISTERD_PARSER列表中。然后在main.py中的_main()函數(shù)遍歷這個列表,并執(zhí)行相應的函數(shù):

def _main(argv):
    ...
    for make_subparser in REGISTERED_PARSER:
        make_subparser(subparser, parser)
    ...
    args = parser.parse_args(argv)
    ...
    try:
        return args.func(args)  # call the function stored in the func attribute
    except TVMCImportError as err:
        ...

At this point args contains:

Namespace(FILE='resnet50-v2-7.onnx', cross_compiler='', cross_compiler_options='', desired_layout=None, disabled_pass=[''], dump_code='', executor='graph',
......,
func=<function drive_compile at 0x7f12ca2f78b0>, input_shapes=None, model_format=None, opt_level=3, output='resnet50-v2-7-tvm_autotuned.tar', output_format='so', pass_config=None, runtime='cpp', target='llvm',
......
tuning_records='resnet50-v2-7-autotuner_records.json', verbose=0, version=False)

Here func=<function drive_compile at 0x7f12ca2f78b0> is the handler that the subcommand executes.
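
The registration-and-dispatch mechanism can be reproduced in a few lines of standalone argparse code. This is a simplified sketch of the same pattern, not TVMC's actual source (the lambda handler is a stand-in for drive_compile):

```python
import argparse

REGISTERED_PARSER = []

def register_parser(make_subparser):
    # The decorator only records the function; it runs later in main().
    REGISTERED_PARSER.append(make_subparser)
    return make_subparser

@register_parser
def add_compile_parser(subparsers):
    parser = subparsers.add_parser("compile", help="compile a model")
    # parse_args() will place this callable in args.func
    parser.set_defaults(func=lambda args: f"compiling {args.model}")
    parser.add_argument("model")

def main(argv):
    parser = argparse.ArgumentParser(prog="tvmc-sketch")
    subparsers = parser.add_subparsers()
    for make_subparser in REGISTERED_PARSER:
        make_subparser(subparsers)
    args = parser.parse_args(argv)
    return args.func(args)  # dispatch to the subcommand's handler

print(main(["compile", "resnet50-v2-7.onnx"]))  # → compiling resnet50-v2-7.onnx
```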

2. compiler

drive_compile() performs two main operations:

(1) Loading the model via frontends.load_model()

The frontend interfaces wrapped in frontends.py are:

ALL_FRONTENDS = [
    KerasFrontend,
    OnnxFrontend,
    TensorflowFrontend,
    TFLiteFrontend,
    PyTorchFrontend,
    PaddleFrontend,
]
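
Selecting a frontend by the model file's suffix can be sketched as follows; the suffix table and helper name here are illustrative assumptions for the sketch, not TVMC's internal API:

```python
import os

# Illustrative suffix → frontend-name table in the spirit of ALL_FRONTENDS;
# the suffixes are assumptions for this sketch, not taken from TVMC's source.
SUFFIX_TO_FRONTEND = {
    ".h5": "keras",
    ".onnx": "onnx",
    ".pb": "tensorflow",
    ".tflite": "tflite",
    ".pth": "pytorch",
    ".pdmodel": "paddle",
}

def guess_frontend(path):
    """Pick a frontend based on the model file's suffix."""
    suffix = os.path.splitext(path)[1].lower()
    try:
        return SUFFIX_TO_FRONTEND[suffix]
    except KeyError:
        raise ValueError(f"no frontend for suffix {suffix!r}") from None

print(guess_frontend("resnet50-v2-7.onnx"))  # → onnx
```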

frontends.py first defines an abstract base class; each subclass must implement its three functions. In load(), each frontend reads the model from the given file path with its own API and then calls the corresponding relay.frontend.from_xxx(model, ...) to load and convert it into a TVM model:

class Frontend(ABC):
    @staticmethod
    @abstractmethod
    def name():
        """Name of the frontend."""

    @staticmethod
    @abstractmethod
    def suffixes():
        """File suffixes of this frontend's model format."""

    @abstractmethod
    def load(self, path, shape_dict=None, **kwargs):
        """Load a model from the given file path."""

class OnnxFrontend(Frontend):
    @staticmethod
    def name():
        return "onnx"
    @staticmethod
    def suffixes():
        return ["onnx"]
    def load(self, path, shape_dict=None, **kwargs):
        onnx = lazy_import("onnx")
        model = onnx.load(path)
        return relay.frontend.from_onnx(model, shape=shape_dict, **kwargs)

(2) Compiling the model via compile_model()

This function does two main things: it calls relay.build() to perform the compilation, and it exports the compiled result:

graph_module = relay.build(mod, target=tvm_target, executor=executor, runtime=runtime, params=params)
...
package_path = tvmc_model.export_package(graph_module, package_path, cross, cross_options, output_format)
# This ends up calling export_classic_format(), which writes the model information
# inside graph_module to the relevant files and saves them as a tar package
...

3. autotuner

drive_tune() determines the hardware parameters from the configuration, decides whether to tune remotely over RPC, and then calls tune_model() accordingly. TVMC currently supports two auto-tuning approaches, auto-scheduling and autotvm, with autotvm as the default; their respective tuning entry points are schedule_tasks() and tune_tasks(). Here we look at the default path:

def tune_tasks(
    tasks: List[autotvm.task.Task],
    log_file: str,
    measure_option: autotvm.measure_option,
    tuner: str,
    trials: int,
    early_stopping: Optional[int] = None,
    tuning_records: Optional[str] = None,
):
    if not tasks:
        logger.warning("there were no tasks found to be tuned")
        return

    if not early_stopping:
        early_stopping = trials
    
    # Iterate over the tuning tasks
    for i, tsk in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))

        # Create the tuner
        if tuner in ("xgb", "xgb-rank"):
            tuner_obj = XGBTuner(tsk, loss_type="rank")
        elif tuner == "xgb_knob":
            tuner_obj = XGBTuner(tsk, loss_type="rank", feature_type="knob")
        elif tuner == "ga":
            tuner_obj = GATuner(tsk, pop_size=50)
        elif tuner == "random":
            tuner_obj = RandomTuner(tsk)
        elif tuner == "gridsearch":
            tuner_obj = GridSearchTuner(tsk)
        else:
            raise TVMCException("invalid tuner: %s " % tuner)

        # If earlier tuning records exist, resume from them ("resume from checkpoint")
        if tuning_records and os.path.exists(tuning_records):  
            logger.info("loading tuning records from %s", tuning_records)
            start_time = time.time()
            tuner_obj.load_history(autotvm.record.load_from_file(tuning_records))
            logging.info("loaded history in %.2f sec(s)", time.time() - start_time)

        tuner_obj.tune(
            n_trial=min(trials, len(tsk.config_space)),
            early_stopping=early_stopping,
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(trials, prefix=prefix),
                autotvm.callback.log_to_file(log_file),
            ],
        )

4. runner

drive_run() decides whether to execute remotely over RPC, calls run_module() to run model inference, and finally prints and saves the results. The main steps of run_module() are excerpted below, with the micro-TVM and RPC parts omitted:

def run_module(tvmc_package: TVMCPackage, device: str,...):
    ...
    # Upload and load the compiled model library (.so)
    session.upload(tvmc_package.lib_path)
    lib = session.load_module(tvmc_package.lib_name)
    ...
    # Select the target device
    if device == "cuda":
        dev = session.cuda()
    elif device == "cl":
        dev = session.cl()
    ...
    else:
        dev = session.cpu()
    ...
    # Create the executor module
    module = executor.create(tvmc_package.graph, lib, dev)
    ...
    # Load the model parameters
    module.load_params(tvmc_package.params)
    ...
    # Set the model inputs
    shape_dict, dtype_dict = module.get_input_info()
    inputs_dict = make_inputs_dict(shape_dict, dtype_dict, inputs, fill_mode)
    module.set_input(**inputs_dict)
    ...
    # Run inference (benchmark)
    times = module.benchmark(dev, number=number, repeat=repeat, end_to_end=end_to_end)
    ...
    # Collect the inference outputs
    num_outputs = module.get_num_outputs()
    outputs = {}
    for i in range(num_outputs):
      output_name = "output_{}".format(i)
      outputs[output_name] = module.get_output(i).numpy()
     
    return TVMCResult(outputs, times)

IV. Summary

This article covered how to use the TVMC tool and its basic implementation flow. TVMC offers a rich set of command options and serves as an excellent worked example of the TVM Python API; interested readers are encouraged to study it in depth.
