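YOLO, short for You Only Look Once, is an object detection algorithm based on deep convolutional neural networks. YOLO v3 is the third version of YOLO, with faster and more accurate detection (dated April 8, 2018).
Source code for this article: https://github.com/SpikeKing/keras-yolo3-detection
Feel free to follow me on GitHub: https://github.com/SpikeKing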

YOLO
Dataset
YOLO v3 already ships with model weights trained on the COCO (Common Objects in Context) dataset, which can be used directly for object detection. The weights file is 248 MB. Download:
wget https://pjreddie.com/media/files/yolov3.weights
Convert the weights into Keras model weights (the command below expects the downloaded file under model_data/). The converted model is 248.6 MB. Convert:
python convert.py -w yolov3.cfg model_data/yolov3.weights model_data/yolo_weights.h5
Plot the network structure:
from keras.utils import plot_model
plot_model(model, to_file='./model_data/model.png', show_shapes=True, show_layer_names=True)  # network diagram
COCO contains 80 classes:
person
bicycle, car, motorbike, aeroplane, bus, train, truck, boat
traffic light, fire hydrant, stop sign, parking meter, bench
bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe
backpack, umbrella, handbag, tie, suitcase
frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket
bottle, wine glass, cup, fork, knife, spoon, bowl
banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake
chair, sofa, pottedplant, bed, diningtable, toilet, tvmonitor
laptop, mouse, remote, keyboard, cell phone
microwave, oven, toaster, sink, refrigerator
book, clock, vase, scissors, teddy bear, hair drier, toothbrush
YOLO uses 9 anchors by default, covering three scales with 3 anchors per scale, as follows:
10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
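As a rough illustration of how this line can be consumed, the sketch below parses it into (width, height) pairs and splits them into three groups of three. Grouping by position, with the smallest anchors in the first group, is an assumption about the usual YOLO v3 convention and is not stated above; `parse_anchors` and `group_by_scale` are hypothetical helper names, not functions from the repository.

```python
ANCHORS_LINE = "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"


def parse_anchors(line):
    """Turn 'w1,h1, w2,h2, ...' into a list of (width, height) tuples."""
    values = [float(x) for x in line.split(',')]  # float() ignores the spaces
    if len(values) % 2 != 0:
        raise ValueError('anchors must come in width,height pairs')
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


def group_by_scale(anchors, per_scale=3):
    """Split anchors into consecutive groups, one group per detection scale."""
    return [anchors[i:i + per_scale] for i in range(0, len(anchors), per_scale)]


anchors = parse_anchors(ANCHORS_LINE)
scales = group_by_scale(anchors)
# Assumed labels: small anchors first, large anchors last.
for name, group in zip(('small', 'medium', 'large'), scales):
    print(name, group)
```

The `_get_anchors` method shown later does the same parsing with NumPy and `reshape(-1, 2)`.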
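Detector
Constructor of the YOLO detector class:
- anchors, model and classes are the parameter files. The default anchors can be used as-is, but model and classes must match each other;
- score and iou are the detection parameters: the confidence threshold and the IoU (intersection over union) threshold. The confidence threshold suppresses false detections, and the IoU threshold suppresses overlapping boxes for the same object;
- self.class_names and self.anchors hold the classes and anchors read from the files;
- self.sess is the TensorFlow session;
- self.model_image_size is the detection size; the original image is resized to it with the aspect ratio preserved, and the remaining area is padded;
- self.generate() builds the inference pipeline and outputs the boxes, confidence scores and classes;
Source: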
import colorsys
import os

import numpy as np
from keras import backend as K
from keras.layers import Input
# yolo_body, yolo_eval and letterbox_image come from the project's own modules.


class YOLO(object):
    def __init__(self):
        self.anchors_path = 'configs/yolo_anchors.txt'  # anchors file
        self.model_path = 'model_data/yolo_weights.h5'  # model weights file
        self.classes_path = 'configs/coco_classes.txt'  # class names file
        self.score = 0.3  # confidence threshold
        # self.iou = 0.45
        self.iou = 0.20  # IoU threshold
        self.class_names = self._get_class()  # load class names
        self.anchors = self._get_anchors()  # load anchors
        self.sess = K.get_session()
        self.model_image_size = (416, 416)  # fixed size or (None, None), (height, width)
        self.boxes, self.scores, self.classes = self.generate()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]  # one class name per line
        return class_names

    def _get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return np.array(anchors).reshape(-1, 2)  # rows of (width, height)
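Pipeline: outputs the boxes, confidence scores and classes
- Load the yolo_model weights into the yolo_body network;
- Generate a distinct color for each class and shuffle the colors in a fixed random order;
- Filter the boxes from the model output by the confidence and IoU thresholds;
Source: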
def generate(self):
    model_path = os.path.expanduser(self.model_path)  # expand ~
    assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'

    num_anchors = len(self.anchors)  # number of anchors
    num_classes = len(self.class_names)  # number of classes

    # Load the model weights (3 = anchors per scale)
    self.yolo_model = yolo_body(Input(shape=(None, None, 3)), 3, num_classes)
    self.yolo_model.load_weights(model_path)
    print('{} model, {} anchors, and {} classes loaded.'.format(model_path, num_anchors, num_classes))

    # One color per class, evenly spaced in hue
    hsv_tuples = [(float(x) / len(self.class_names), 1., 1.)
                  for x in range(len(self.class_names))]
    self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))  # RGB
    np.random.seed(10101)  # fixed seed so the colors are the same on every run
    np.random.shuffle(self.colors)
    np.random.seed(None)  # restore the default random seed

    # Filter the boxes according to the detection parameters
    self.input_image_shape = K.placeholder(shape=(2,))
    boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors, len(self.class_names),
                                       self.input_image_shape, score_threshold=self.score, iou_threshold=self.iou)
    return boxes, scores, classes
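To make the two thresholds more concrete, here is a minimal pure-Python sketch of score filtering followed by greedy non-maximum suppression for a single class. It is not the code of `yolo_eval`, which runs inside the TensorFlow graph; `iou` and `filter_boxes` are hypothetical names, and the sample boxes are made up for illustration. Boxes use the (top, left, bottom, right) order seen in the drawing code further down.

```python
def iou(a, b):
    """Intersection over union of two (top, left, bottom, right) boxes."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, bottom - top) * max(0.0, right - left)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(boxes, scores, score_threshold=0.3, iou_threshold=0.2):
    """Return the indices of the boxes that survive, highest score first."""
    # Step 1: drop low-confidence boxes.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    # Step 2: keep a box only if it does not overlap a better box too much.
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Made-up sample data: boxes 0 and 1 overlap heavily, box 3 has a low score.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = filter_boxes(boxes, scores)
print(kept)
```

A lower IoU threshold, such as the 0.20 used in the constructor, removes overlapping boxes more aggressively than 0.45 would.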
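The detect_image method
Step 1, image preprocessing:
- Resize the image to the detection size with the aspect ratio preserved and pad the borders; the detection size must be a multiple of 32;
- Add one dimension to the image (the batch dimension) so that it matches the expected input format;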
if self.model_image_size != (None, None):  # 416x416, 416 = 32 * 13; must be a multiple of 32 because the coarsest scale downsamples by 32
    assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
    assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
    boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))  # resize and pad the image
else:
    # No fixed size: round the image size down to the nearest multiple of 32
    new_image_size = (image.width - (image.width % 32), image.height - (image.height % 32))
    boxed_image = letterbox_image(image, new_image_size)
image_data = np.array(boxed_image, dtype='float32')
print('detector size {}'.format(image_data.shape))
image_data /= 255.  # scale pixel values to 0~1
image_data = np.expand_dims(image_data, 0)  # add the batch dimension
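The geometry behind this step can be sketched without any image library. The helpers below only compute sizes and offsets; they assume that the padding is split evenly on both sides, which is one reading of "pad the borders", and the repository's `letterbox_image` may differ in detail. `round_down_to_32` and `letterbox_geometry` are hypothetical names.

```python
def round_down_to_32(width, height):
    """Largest size not exceeding (width, height) whose sides are multiples of 32."""
    return (width - width % 32, height - height % 32)


def letterbox_geometry(image_size, target_size):
    """Compute the resized size and padding offsets for an aspect-preserving fit.

    image_size and target_size are (width, height).
    Returns (new_width, new_height, x_offset, y_offset).
    """
    iw, ih = image_size
    w, h = target_size
    # Integer comparison of iw / ih against w / h avoids float rounding issues.
    if iw * h >= ih * w:
        # The image is relatively wider than the target: width is the limit.
        nw, nh = w, max(1, ih * w // iw)
    else:
        # The image is relatively taller than the target: height is the limit.
        nw, nh = max(1, iw * h // ih), h
    # Assumed: the resized image is centered, padding is shared by both sides.
    return nw, nh, (w - nw) // 2, (h - nh) // 2


print(letterbox_geometry((640, 480), (416, 416)))
print(round_down_to_32(500, 375))
```

For a 640x480 image and a 416x416 detection size, the image is scaled to 416x312 and padded with 52 rows above and below.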
Step 2, feed the data: the image and the original image size;
out_boxes, out_scores, out_classes = self.sess.run(
    [self.boxes, self.scores, self.classes],
    feed_dict={
        self.yolo_model.input: image_data,
        self.input_image_shape: [image.size[1], image.size[0]],  # (height, width) of the original image
        K.learning_phase(): 0  # inference mode
    })
Step 3, draw the boxes: the line thickness is set automatically, and the boxes and class labels are drawn with Pillow.
font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                          size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))  # font, scaled to the image height
thickness = (image.size[0] + image.size[1]) // 512  # line thickness

for i, c in reversed(list(enumerate(out_classes))):
    predicted_class = self.class_names[c]  # class name
    box = out_boxes[i]  # box
    score = out_scores[i]  # confidence

    label = '{} {:.2f}'.format(predicted_class, score)  # label text
    draw = ImageDraw.Draw(image)  # drawing context
    # Label size in pixels (textsize was removed in Pillow 10; textbbox replaces it)
    label_size = draw.textsize(label, font)

    # Round the box to integer pixels and clip it to the image
    top, left, bottom, right = box
    top = max(0, np.floor(top + 0.5).astype('int32'))
    left = max(0, np.floor(left + 0.5).astype('int32'))
    bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
    right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
    print(label, (left, top), (right, bottom))  # box corners

    if top - label_size[1] >= 0:  # label above the box if there is room
        text_origin = np.array([left, top - label_size[1]])
    else:  # otherwise just inside the top edge
        text_origin = np.array([left, top + 1])

    # My kingdom for a good redistributable image drawing library.
    for t in range(thickness):  # draw the box as nested 1-pixel rectangles
        draw.rectangle(
            [left + t, top + t, right - t, bottom - t],
            outline=self.colors[c])
    draw.rectangle(  # label background
        [tuple(text_origin), tuple(text_origin + label_size)],
        fill=self.colors[c])
    draw.text(text_origin, label, fill=(0, 0, 0), font=font)  # label text
    del draw
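The arithmetic in this step (rounding and clipping the box, choosing where the label goes, and deriving the line thickness) can be checked in isolation. The sketch below restates it in plain Python with hypothetical helper names and made-up input values; it draws nothing.

```python
import math


def clamp_box(box, image_size):
    """Round a (top, left, bottom, right) box to pixels and clip it to the image.

    image_size is (width, height), as returned by a Pillow image's .size.
    """
    width, height = image_size
    top, left, bottom, right = box
    top = max(0, int(math.floor(top + 0.5)))
    left = max(0, int(math.floor(left + 0.5)))
    bottom = min(height, int(math.floor(bottom + 0.5)))
    right = min(width, int(math.floor(right + 0.5)))
    return top, left, bottom, right


def label_origin(top, left, label_height):
    """Put the label above the box when it fits, otherwise just inside the top edge."""
    if top - label_height >= 0:
        return (left, top - label_height)
    return (left, top + 1)


def line_thickness(image_size):
    """Line thickness grows with the image: one pixel per 512 pixels of width + height."""
    return (image_size[0] + image_size[1]) // 512


size = (1280, 720)  # made-up image size (width, height)
top, left, bottom, right = clamp_box((-3.2, 10.6, 730.4, 1290.0), size)
print((top, left, bottom, right), label_origin(top, left, 22), line_thickness(size))
```

A box that sticks out of the image is clipped to the image edges, and because its top is then at 0 the label falls back to the position inside the box.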
Object detection
Use the YOLO detector to detect objects in an image:
def detect_img_for_test(yolo):
    img_path = './dataset/a4386X6Te9ajq866zgOtWKLx18XGW.jpg'
    image = Image.open(img_path)
    r_image = yolo.detect_image(image)
    r_image.show()
    yolo.close_session()


if __name__ == '__main__':
    detect_img_for_test(YOLO())
Result:

output
OK, that's all! Enjoy it!