Background
TensorRT is NVIDIA's model-inference framework and delivers a high speedup for models coming from most training frameworks. However, TensorRT only loads Caffe and ONNX models directly (officially, TensorFlow models are supposed to go through UFF, but in practice they are usually converted to ONNX as well), so the first step is to export the model into a suitable format; here we convert an mxnet model to ONNX.
mxnet does ship an API for exporting to ONNX, but it frequently fails with missing ops or produces inconsistent inference results. This post collects the problems hit during conversion and their workarounds (hardly anyone maintains mxnet anymore...).
Conversion
The conversion itself is simple: export through the built-in API. The only extra work is fixing a few issues that may break the conversion (see the problems section below) and, at the end, making the batch dimension dynamic (if the input size is dynamic too, it is set the same way).
Conversion script:
import onnx
import numpy as np
import mxnet as mx
from mxnet.contrib import onnx as onnx_mx


def mxnet_model_fix(input_symbol_path, input_params_path, rewrite=True):
    # apply any of the fixes described below, if the model needs them
    pass


def export_onnx(input_symbol_path, input_params_path, input_shape, precision, export_onnx_path):
    # mxnet_model_fix(input_symbol_path, input_params_path, rewrite=True)
    onnx_mx.export_model(input_symbol_path, input_params_path, [input_shape], precision, export_onnx_path, verbose=True)
    onnx.checker.check_model(export_onnx_path)
    # set the batch dimension to be dynamic
    model = onnx.load(export_onnx_path)
    model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = "?"
    onnx.save(model, export_onnx_path)


if __name__ == "__main__":
    sym_path = "./model-symbol.json"
    params_path = "./model-0000.params"
    precision = np.float32
    input_shape = (1, 3, 224, 224)
    export_onnx_path = "./model.onnx"
    export_onnx(sym_path, params_path, input_shape, precision, export_onnx_path)
Inference
The script below runs the same input through both the mxnet model and the exported ONNX model so their outputs can be compared:
import cv2
import numpy as np
import mxnet as mx
import onnxruntime as ort
mxnet_model_path = "./model"
onnx_model_path = "./model.onnx"
image_path = "./image.jpg"
# format input
img = cv2.imread(image_path)
# img = cv2.resize(img, (224, 224))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, (2, 0, 1))
input_blob = np.expand_dims(img, axis=0).astype(np.float32) # NCHW
# mxnet runtime
sym, params, aux_params = mx.model.load_checkpoint(mxnet_model_path, 0)
model = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
model.bind(data_shapes=[('data', input_blob.shape)])
model.set_params(params, aux_params)
# mxnet input
mx_data = mx.nd.array(input_blob)
mx_db = mx.io.DataBatch(data=(mx_data,))
# mxnet predict
model.forward(mx_db, is_train=False)
mxnet_result = model.get_outputs()[0].asnumpy()
# onnx runtime
ort_session = ort.InferenceSession(onnx_model_path)
onnx_input_name = ort_session.get_inputs()[0].name
onnx_outputs = ort_session.get_outputs()[0].name
# set input and predict
onnx_result = ort_session.run([onnx_outputs], input_feed={onnx_input_name: input_blob})
onnx_result = onnx_result[0]
print("######mxnet result#########")
print(mxnet_result)
print("######onnx result##########")
print(onnx_result)
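Rather than eyeballing the printed arrays, the two outputs can be compared numerically. A minimal sketch (the helper name and tolerances are my own choice, not from the original post):

```python
import numpy as np

def compare_outputs(a, b, atol=1e-4, rtol=1e-3):
    """Return the max absolute difference and whether the two outputs
    agree within the given tolerances."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    max_abs_diff = float(np.max(np.abs(a - b)))
    return max_abs_diff, bool(np.allclose(a, b, atol=atol, rtol=rtol))

# usage: diff, ok = compare_outputs(mxnet_result, onnx_result)
```

For fp32 inference a max absolute diff on the order of 1e-5 is typical; anything larger usually points at one of the op problems below.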
Potential problems
Mismatched op names
- SoftmaxActivation
The mxnet docs mark this op as deprecated, but softmax's implementation is identical to it, so it can be substituted directly.
Fix: in symbol.json, change SoftmaxActivation to softmax and update the attrs to match, see the figure:
[figure: the edited SoftmaxActivation node in symbol.json]
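The rename can also be scripted instead of edited by hand. A minimal sketch, assuming the standard symbol.json layout ("nodes" is a list of dicts with "op" and "attrs" keys); the function name and the mode=channel → axis=1 attr mapping are my reading of the op semantics, so check them against your model:

```python
import json

def rename_softmax_activation(symbol_path, out_path):
    """Rewrite SoftmaxActivation nodes as softmax in an mxnet symbol.json."""
    with open(symbol_path) as f:
        graph = json.load(f)
    for node in graph.get("nodes", []):
        if node.get("op") == "SoftmaxActivation":
            node["op"] = "softmax"
            attrs = node.get("attrs", {})
            # SoftmaxActivation's mode=channel means softmax over the C axis,
            # which for NCHW input corresponds to softmax with axis=1
            if attrs.pop("mode", None) == "channel":
                attrs["axis"] = "1"
            node["attrs"] = attrs
    with open(out_path, "w") as f:
        json.dump(graph, f, indent=2)
```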
Missing ops
- UpSampling
mxnet's ONNX exporter does not implement this op. Emulating it with a deconvolution can introduce a diff, so Resize is used here instead.
Fix: add the implementation to mxnet/contrib/onnx/mx2onnx/_op_translations.py inside the installed mxnet package:
def create_helper_tensor_node(input_vals, output_name, kwargs):
    """Create an extra tensor node from numpy values."""
    data_type = onnx.mapping.NP_TYPE_TO_TENSOR_TYPE[input_vals.dtype]
    tensor_node = onnx.helper.make_tensor_value_info(
        name=output_name,
        elem_type=data_type,
        shape=input_vals.shape
    )
    kwargs["initializer"].append(
        onnx.helper.make_tensor(
            name=output_name,
            data_type=data_type,
            dims=input_vals.shape,
            vals=input_vals.flatten().tolist(),
            raw=False,
        )
    )
    return tensor_node


@mx_op.register("UpSampling")
def convert_upsample(node, **kwargs):
    """Map MXNet's UpSampling operator attributes to ONNX's Resize operator
    and return the created nodes.
    """
    name, input_nodes, attrs = get_inputs(node, kwargs)
    sample_type = attrs.get('sample_type', 'nearest')
    sample_type = 'linear' if sample_type == 'bilinear' else sample_type
    scale = convert_string_to_list(attrs.get('scale'))
    scaleh = scalew = float(scale[0])
    if len(scale) > 1:
        scaleh = float(scale[0])
        scalew = float(scale[1])
    # Resize scales cover all four NCHW axes; only H and W are scaled
    scale = np.array([1.0, 1.0, scaleh, scalew], dtype=np.float32)
    roi = np.array([], dtype=np.float32)
    node_roi = create_helper_tensor_node(roi, name + 'roi', kwargs)
    node_sca = create_helper_tensor_node(scale, name + 'scale', kwargs)
    node = onnx.helper.make_node(
        'Resize',
        inputs=[input_nodes[0], name + 'roi', name + 'scale'],
        outputs=[name],
        coordinate_transformation_mode='asymmetric',
        mode=sample_type,
        nearest_mode='floor',
        name=name
    )
    return [node_roi, node_sca, node]
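To sanity-check what the Resize node above computes, the nearest/asymmetric/floor combination can be reproduced in a few lines of numpy. This is a toy reimplementation for verification only, not part of the converter:

```python
import numpy as np

def resize_nearest_asymmetric_floor(x, scale_h, scale_w):
    """Nearest-neighbor upsample of an NCHW array, matching ONNX Resize with
    coordinate_transformation_mode='asymmetric' and nearest_mode='floor':
    src_index = floor(dst_index / scale)."""
    n, c, h, w = x.shape
    out_h, out_w = int(h * scale_h), int(w * scale_w)
    rows = np.floor(np.arange(out_h) / scale_h).astype(np.int64)
    cols = np.floor(np.arange(out_w) / scale_w).astype(np.int64)
    # advanced indexing broadcasts rows x cols into the output grid
    return x[:, :, rows[:, None], cols[None, :]]
```

Running this next to onnxruntime on the exported model is a quick way to confirm the Resize attributes reproduce mxnet's UpSampling exactly.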
- Crop
The official mxnet docs say to use slice instead (so why not just register that mapping in the exporter?).
Fix: add the implementation to mxnet/contrib/onnx/mx2onnx/_op_translations.py inside the installed mxnet package:
def create_helper_shape_node(input_node, node_name):
    """Create an extra Shape node that outputs the shape of input_node."""
    trans_node = onnx.helper.make_node(
        'Shape',
        inputs=[input_node],
        outputs=[node_name],
        name=node_name
    )
    return trans_node


@mx_op.register("Crop")
def convert_crop(node, **kwargs):
    """Map MXNet's Crop operator to an ONNX Slice node
    and return the created nodes.
    """
    name, inputs, attrs = get_inputs(node, kwargs)
    start = np.array([0, 0, 0, 0], dtype=np.int64)  # Slice indices must be integers
    start_node = create_helper_tensor_node(start, name + '__starts', kwargs)
    shape_node = create_helper_shape_node(inputs[1], inputs[1] + '__shape')
    crop_node = onnx.helper.make_node(
        "Slice",
        inputs=[inputs[0], name + '__starts', inputs[1] + '__shape'],  # data, starts, ends
        outputs=[name],
        name=name
    )
    return [start_node, shape_node, crop_node]
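The Slice-with-Shape trick above crops the first input down to the shape of the second. Its semantics are easy to mirror in numpy for a sanity check (toy code, not part of the converter):

```python
import numpy as np

def crop_like(data, ref):
    """Crop data from index 0 up to ref's shape along every axis,
    mirroring Slice(starts=[0,0,0,0], ends=Shape(ref))."""
    return data[tuple(slice(0, s) for s in ref.shape)]
```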
Inconsistent op results
- softmax
The exporter's softmax mapping mishandles multi-dimensional (NCHW) input.
Fix: add the implementation below to mxnet/contrib/onnx/mx2onnx/_op_translations.py and comment out the original registration:
@mx_op.register("softmax")
def convert_softmax(node, **kwargs):
    """Map MXNet's softmax operator attributes to ONNX's Softmax operator
    and return the created nodes.
    """
    name, input_nodes, attrs = get_inputs(node, kwargs)
    axis = int(attrs.get("axis", -1))
    c_softmax_node = []
    # after the transpose below, the channel axis is always last
    axis = -1
    transpose_node1 = onnx.helper.make_node(
        "Transpose",
        inputs=input_nodes,
        perm=(0, 2, 3, 1),  # NCHW -> NHWC
        name=name + '_tr1',
        outputs=[name + '_tr1']
    )
    softmax_node = onnx.helper.make_node(
        "Softmax",
        inputs=[name + '_tr1'],
        axis=axis,
        name=name,
        outputs=[name]
    )
    transpose_node2 = onnx.helper.make_node(
        "Transpose",
        inputs=[name],
        perm=(0, 3, 1, 2),  # NHWC -> NCHW
        name=name + '_tr2',
        outputs=[name + '_tr2']
    )
    c_softmax_node.append(transpose_node1)
    c_softmax_node.append(softmax_node)
    c_softmax_node.append(transpose_node2)
    return c_softmax_node
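The transpose → softmax(axis=-1) → transpose round trip above is equivalent to a softmax over the channel axis of an NCHW tensor; this is easy to verify in numpy (a quick check, not converter code):

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def softmax_via_transpose(x):
    """NCHW -> NHWC, softmax over the last axis, then back to NCHW."""
    y = np.transpose(x, (0, 2, 3, 1))
    y = softmax(y, axis=-1)
    return np.transpose(y, (0, 3, 1, 2))
```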
- BatchNorm
In mxnet's BatchNorm, gamma and beta are learnable parameters; when fix_gamma is True, gamma is forced to 1 and its gradient to 0.
If a BatchNorm op has fix_gamma=True but its stored gamma is not 1, mxnet still uses gamma=1 at inference time, so mxnet's results are fine; ONNX inference, however, uses the stored gamma value, which causes a mismatch.
Fix: in the mxnet .params file, set gamma to 1 for every BatchNorm whose fix_gamma is True.
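This fix can be scripted as well. A minimal sketch that operates on already-loaded dictionaries (in a real script you would load and save the files with json plus mx.nd.load / mx.nd.save); the function name is my own, and the `fix_gamma` attribute and `<node name>_gamma` parameter naming follow mxnet's usual conventions, so check them against your model:

```python
import numpy as np

def fix_bn_gamma(symbol_graph, arg_params):
    """For every BatchNorm node with fix_gamma=True, overwrite the stored
    gamma with ones so ONNX inference matches mxnet's behavior."""
    for node in symbol_graph.get("nodes", []):
        if node.get("op") != "BatchNorm":
            continue
        # attrs values in symbol.json are strings
        if node.get("attrs", {}).get("fix_gamma", "False") not in ("True", "true", "1"):
            continue
        gamma_key = node["name"] + "_gamma"
        if gamma_key in arg_params:
            arg_params[gamma_key] = np.ones_like(arg_params[gamma_key])
    return arg_params
```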
