[yolo] - How to understand the variables in YOLO (Darknet) cfg files?

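YOLO's cfg file is fairly rich and can be used to configure many network parameters. I have not yet found a particularly detailed introduction to it, so I have compiled the following from scattered descriptions found online: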

Explanation from the darknet repository (AlexeyAB/darknet issue #279)

  1. saturation, exposure and hue values - ranges for random changes to image colours during training (parameters for data augmentation), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV
    The larger the value, the more invariant the neural network becomes to changes in the lighting and colour of objects.

  2. steps and scales values - steps are the checkpoints (iteration counts) at which scales are applied; scales are the coefficients by which learning_rate is multiplied at those checkpoints.
    Together they determine how learning_rate changes as the number of iterations grows during training.

  3. anchors, bias_match
    anchors are the frequent initial <width,height> of objects, expressed in terms of the network's output resolution.
    bias_match is used only for training: if bias_match=1, the detected object will have the same <width,height> as one of the anchors; if bias_match=0, the <width,height> of the anchor will be refined by the neural network:

    darknet/src/region_layer.c

    Lines 275 to 283 in c190406

        box pred = get_region_box(l.output, l.biases, n, index, i, j, l.w, l.h);
        if(l.bias_match){
            pred.w = l.biases[2*n];
            pred.h = l.biases[2*n+1];
            if(DOABS){
                pred.w = l.biases[2*n]/l.w;
                pred.h = l.biases[2*n+1]/l.h;
            }
        }

    If you train with height=416,width=416,random=0, then max values of anchors will be 13,13.
    But if you train with random=1, then max input resolution can be 608x608, and max values of anchors can be 19,19.

  4. jitter, rescore, thresh
    jitter can be in [0-1] and is used to crop images during training for data augmentation. The larger the value of jitter, the more invariant the neural network becomes to changes in the size and aspect ratio of objects:

    darknet/src/data.c

    Lines 513 to 528 in c190406

        int dw = (ow*jitter);
        int dh = (oh*jitter);

        int pleft = rand_uniform(-dw, dw);
        int pright = rand_uniform(-dw, dw);
        int ptop = rand_uniform(-dh, dh);
        int pbot = rand_uniform(-dh, dh);

        int swidth = ow - pleft - pright;
        int sheight = oh - ptop - pbot;

        float sx = (float)swidth / ow;
        float sy = (float)sheight / oh;

        int flip = random_gen()%2;
        image cropped = crop_image(orig, pleft, ptop, swidth, sheight);

    rescore determines which loss (delta, cost, ...) function will be used - more about this: #185 (comment)

    darknet/src/region_layer.c

    Lines 302 to 305 in c190406

        l.delta[best_index + 4] = l.object_scale * (1 - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
        if (l.rescore) {
            l.delta[best_index + 4] = l.object_scale * (iou - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
        }

    thresh is the minimum IoU at which delta_region_class() is used during training:

    darknet/src/region_layer.c

    Line 235 in c190406

        if (best_iou > l.thresh) {
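The effect of rescore and object_scale on the objectness delta can be sketched as a small standalone function. This is only an illustration of the two branches quoted above, not the layer's actual code: objectness_delta is a hypothetical helper name, and logistic_gradient is re-declared here on the assumption that it computes (1 - x) * x, as in darknet's activations.h.

```c
/* Gradient of the logistic function expressed through its output x.
   Assumed to match darknet's logistic_gradient(). */
static float logistic_gradient(float x)
{
    return (1.0f - x) * x;
}

/* Hypothetical helper: objectness delta for the best-matching prediction.
   rescore == 0: the target confidence is 1.
   rescore != 0: the target confidence is the IoU with the ground truth. */
static float objectness_delta(float output, float iou, float object_scale, int rescore)
{
    float target = rescore ? iou : 1.0f;
    return object_scale * (target - output) * logistic_gradient(output);
}
```

With rescore=1, a prediction whose confidence already equals its IoU receives a zero delta, whereas with rescore=0 it is still pushed towards 1.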


  5. object_scale, noobject_scale, class_scale, coord_scale values - all used for training
  • object_scale - used in the loss (delta, cost, ...) function for objects: #185 (comment)

  • noobject_scale - used in the loss (delta, cost, ...) function for objects and backgrounds:

    darknet/src/region_layer.c

    Lines 232 to 233 in c190406

        l.delta[index + 4] = l.noobject_scale * ((0 - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
        if(l.classfix == -1) l.delta[index + 4] = l.noobject_scale * ((best_iou - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));

  • class_scale - used as scale in the delta_region_class():

    darknet/src/region_layer.c

    Line 108 in c190406

        void delta_region_class(float *output, float *delta, int index, int class, int classes, tree *hier, float scale, float *avg_cat)

  • coord_scale - used as scale in the delta_region_box():

        float delta_region_box(box truth, float *x, float *biases, int n, int index, int i, int j, int w, int h, float *delta, float scale)

  6. absolute - isn't used

Explanation from Stack Overflow

Here is my current understanding of some of the variables. Not necessarily correct though:

[net]

  • batch: That many images+labels are used in the forward pass to compute a gradient and update the weights via backpropagation.
  • subdivisions: The batch is subdivided into this many "blocks". The images of a block are run in parallel on the GPU.
  • decay: Maybe a term to diminish the weights to avoid having large values. For stability reasons I guess.
  • channels: Better explained with an image (not reproduced here). On the left we have a single channel with 4x4 pixels. The reorganization layer halves the size, then creates 4 channels with adjacent pixels placed in different channels.

  • momentum: I guess the new gradient is computed by momentum * previous_gradient + (1-momentum) * gradient_of_current_batch. Makes the gradient more stable.
  • adam: Uses the Adam optimizer? It doesn't work for me, though.
  • burn_in: For the first x batches, slowly increase the learning rate until its final value (your learning_rate parameter value). Use this to decide on a learning rate by monitoring until what value the loss decreases (before it starts to diverge).
  • policy=steps: Use the steps and scales parameters below to adjust the learning rate during training
  • steps=500,1000: Adjust the learning rate after 500 and 1000 batches
  • scales=0.1,0.2: After 500, multiply the LR by 0.1, then after 1000 multiply again by 0.2
  • angle: augment image by rotation up to this angle (in degrees)
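The interaction of burn_in, steps and scales described in the list above can be sketched in C. This is a minimal illustration under stated assumptions, not darknet's own function: current_learning_rate is a hypothetical name, and the warm-up exponent of 4 is assumed to be the default of darknet's power setting.

```c
/* Hypothetical helper modelling policy=steps with an optional burn-in phase. */
static float current_learning_rate(float learning_rate, int batch_num, int burn_in,
                                   const int *steps, const float *scales, int num_steps)
{
    /* Warm-up: ramp from 0 up to learning_rate over the first burn_in batches.
       The fourth power is an assumed default exponent. */
    if (batch_num < burn_in) {
        float ratio = (float)batch_num / (float)burn_in;
        return learning_rate * ratio * ratio * ratio * ratio;
    }

    float rate = learning_rate;
    for (int i = 0; i < num_steps; ++i) {
        if (batch_num < steps[i]) break;  /* this checkpoint is not reached yet */
        rate *= scales[i];                /* scales accumulate multiplicatively */
    }
    return rate;
}
```

With learning_rate=0.001, steps=500,1000 and scales=0.1,0.2, this yields 0.001 before batch 500, 0.0001 from batch 500, and 0.00002 from batch 1000, because the second scale is applied on top of the first.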

layers

  • filters: How many convolutional kernels there are in a layer.
  • activation: Activation function, relu, leaky relu, etc. See src/activations.h
  • stopbackward: Do backpropagation until this layer only. Put it in the penultimate convolution layer before the first yolo layer to train only the layers behind that, e.g. when using pretrained weights.
  • random: Put in the yolo layers. If set to 1 do data augmentation by resizing the images to different sizes every few batches. Use to generalize over object sizes.
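The resizing done by random=1 can be sketched as follows. Two assumptions are baked in and are not confirmed by the text above: the network has a total downsampling stride of 32 (as in YOLOv2), and the training size is drawn from 320 to 608 in multiples of 32. Both function names are hypothetical. Under these assumptions the anchor limits quoted earlier (13 for a 416 input, 19 for a 608 input) are simply the output grid sizes.

```c
/* Assumed total downsampling factor of the network. */
enum { NETWORK_STRIDE = 32 };

/* Hypothetical helper: map a raw random integer to a training resolution
   in {320, 352, ..., 608}. The range is an assumption. */
static int random_input_dim(unsigned raw_random)
{
    return (int)(raw_random % 10u + 10u) * NETWORK_STRIDE;
}

/* Hypothetical helper: side length of the output grid for a square input. */
static int output_grid_size(int input_dim)
{
    return input_dim / NETWORK_STRIDE;
}
```

Because anchors are expressed in output-grid units, training with random=1 means an anchor may need to cover a grid of up to 19x19 cells rather than 13x13.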

Many things are more or less self-explanatory (size, stride, batch_normalize, max_batches, width, height). If you have more questions, feel free to comment.

Again, please keep in mind that I am not 100% certain about many of those.

The above is excerpted from:
https://github.com/AlexeyAB/darknet/issues/279
https://stackoverflow.com/questions/50390836/understanding-darknets-yolo-cfg-config-files
