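CAPTCHA recognition has been a small wish of mine for three years (back then I was building an auto-reply bot to enter an iPhone giveaway), but for the past two years I have mostly worked on distributed architecture. This year I finally found the time, overcame my fear of math, and took Andrew Ng's machine learning and deep learning courses on Coursera. CAPTCHA recognition is a practical exercise for part of that coursework, so below I walk through how the recognition was done.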
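1. Main CAPTCHA recognition pipeline
- Object detection: detect each character's bounding box. The main goal is to obtain the trained weights and output the box coordinates
- Convert the image to black and white
- CNN recognition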
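2. Recognition details
2.1 Object detection: finding the character bounding boxes in the CAPTCHA image
This part uses the YOLOv2 algorithm. A brief introduction to YOLOv2:
A. Theory: broadly speaking, YOLOv2 divides the image into a grid of small cells, and each cell produces an output <pi, x, y, dx, dy>. Here pi is the probability that the cell belongs to class i, x and y give the center position of the object of that class, and dx and dy give the width and height of the object centered at (x, y).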
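To make the per-cell output concrete, here is a minimal sketch of turning one cell's (x, y, dx, dy) into a pixel box. The conventions are assumptions for illustration, not taken from the post or from darknet: x and y are treated as offsets in [0, 1] inside the cell, and dx and dy as fractions of the whole image. The function name `decode_cell` is hypothetical, and the real YOLOv2 parameterization (anchor boxes, objectness score) is more involved.

```python
def decode_cell(col, row, grid_w, grid_h, x, y, dx, dy, img_w, img_h):
    """Convert one grid cell's (x, y, dx, dy) into (left, top, right, bottom) pixels.

    Assumed convention (hypothetical, for illustration only):
    x, y are offsets in [0, 1] inside cell (col, row);
    dx, dy are the box width and height as fractions of the image.
    """
    # Center of the box in pixels: cell index plus in-cell offset, scaled to the image
    cx = (col + x) / grid_w * img_w
    cy = (row + y) / grid_h * img_h
    # Box size in pixels
    w = dx * img_w
    h = dy * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A 100x40 image split into a 4x1 grid; the third cell holds a character
box = decode_cell(col=2, row=0, grid_w=4, grid_h=1,
                  x=0.5, y=0.5, dx=0.25, dy=1.0, img_w=100, img_h=40)
print(box)  # (50.0, 0.0, 75.0, 40.0)
```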
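B. Practice: I only did the original YOLO exercise on Coursera, but for CAPTCHA recognition I used the open-source implementation darknet and rented a Baidu GPU at 5 yuan per hour for training. The bounding boxes were labeled with labelImg. The rest of this section covers how these two tools are used: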
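2.1.1 Using labelImg
http://blog.csdn.net/dcrmg/article/details/78496002
First, label the CAPTCHAs by hand following the blog post above, and convert the bounding boxes into the format darknet uses. Next, if you need to train on a GPU, follow the darknet tutorial in 2.1.2 below.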
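The conversion step can be sketched as follows, assuming labelImg saved its annotations as Pascal VOC XML and that darknet expects one line per box of the form `class_id x_center y_center width height`, all normalized by the image size. The function name `voc_to_darknet` and the single class name `char` are hypothetical; the linked blog post may do this differently.

```python
import xml.etree.ElementTree as ET


def voc_to_darknet(xml_text, class_names):
    """Turn one Pascal VOC annotation into darknet label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = class_names.index(obj.find("name").text)
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(bb.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))
        # Normalized box center and size
        x_c = (xmin + xmax) / 2.0 / img_w
        y_c = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append("%d %.6f %.6f %.6f %.6f" % (cls_id, x_c, y_c, w, h))
    return lines


# Hypothetical annotation: one character box in a 100x40 CAPTCHA
SAMPLE = """<annotation>
  <size><width>100</width><height>40</height><depth>3</depth></size>
  <object><name>char</name>
    <bndbox><xmin>10</xmin><ymin>5</ymin><xmax>30</xmax><ymax>35</ymax></bndbox>
  </object>
</annotation>"""

print(voc_to_darknet(SAMPLE, ["char"]))
# ['0 0.200000 0.500000 0.200000 0.750000']
```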
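2.1.2 darknet tutorial
There are two main things to watch for when using darknet.
The first is training darknet on a GPU.
If you train darknet on a GPU (I rented a Baidu GPU at 5 yuan per hour, and it really was much faster), pay attention to the following installation and configuration steps:
- The Baidu GPU machine needs cuDNN installed
a. First download it from https://developer.nvidia.com/rdp/cudnn-download . Be sure to download cudnn-8.0-linux-x64-v5.1.tgz; other versions may cause problems
b. Install cuDNN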
$ cd ~
$ sudo tar xvf cudnn-8.0-linux-x64-v5.1.tgz
$ cd cuda/include
$ sudo cp *.h /usr/local/include/
$ cd ../lib64
$ sudo cp lib* /usr/local/lib/
$ cd /usr/local/lib
$ sudo chmod +r libcudnn.so.5.1.5
$ sudo ln -sf libcudnn.so.5.1.5 libcudnn.so.5
$ sudo ln -sf libcudnn.so.5 libcudnn.so
$ sudo ldconfig
- Build darknet:
Since we want GPU support, a few settings have to be changed before compiling:
a. Edit the Makefile
GPU=1
CUDNN=1
b. Set the CUDA paths
ifeq ($(GPU), 1)
COMMON+= -DGPU -I/usr/local/cuda-8.0/include/
CFLAGS+= -DGPU
LDFLAGS+= -L/usr/local/cuda-8.0/lib64 -lcuda -lcudart -lcublas -lcurand
#########################
NVCC=/usr/local/cuda-8.0/bin/nvcc
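c. Run make. This produces a GPU-enabled darknet binary; from here, just follow http://blog.csdn.net/dcrmg/article/details/78496002 to run it.
d. Now train. The trained parameters are saved to yolo-voc.backup under the backup directory. Keep this file safe: with the weights in hand, nothing is lost. After running for a day on the Baidu machine, the test set reached roughly 100% IoU and 90% coverage, which is good enough.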
./darknet detector train cfg/voc.data cfg/yolo-voc.2.0.cfg cfg/yolo-voc.weight
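f. Once training has finished (the criterion being 100% IoU and 90% coverage on the test set), the code needs one more change so that darknet prints the box coordinates instead of only drawing them on the image. Open src/image.c in the darknet source and add one line to the following function: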
void draw_detections(image im, int num, float thresh, box *boxes, float **probs, float **masks, char **names, image **alphabet, int classes)
{
    int i,j;
    for(i = 0; i < num; ++i){
        char labelstr[4096] = {0};
        int class = -1;
        for(j = 0; j < classes; ++j){
            if (probs[i][j] > thresh){
                if (class < 0) {
                    strcat(labelstr, names[j]);
                    class = j;
                } else {
                    strcat(labelstr, ", ");
                    strcat(labelstr, names[j]);
                }
            }
        }
        if(class >= 0){
            int width = im.h * .006;
            /*
               if(0){
               width = pow(prob, 1./2.)*10+1;
               alphabet = 0;
               }
             */
            //printf("%d %s: %.0f%%\n", i, names[class], prob*100);
            int offset = class*123457 % classes;
            float red = get_color(2,offset,classes);
            float green = get_color(1,offset,classes);
            float blue = get_color(0,offset,classes);
            float rgb[3];
            //width = prob*20+2;
            rgb[0] = red;
            rgb[1] = green;
            rgb[2] = blue;
            box b = boxes[i];
            int left = (b.x-b.w/2.)*im.w;
            int right = (b.x+b.w/2.)*im.w;
            int top = (b.y-b.h/2.)*im.h;
            int bot = (b.y+b.h/2.)*im.h;
            if(left < 0) left = 0;
            if(right > im.w-1) right = im.w-1;
            if(top < 0) top = 0;
            if(bot > im.h-1) bot = im.h-1;
            draw_box_width(im, left, top, right, bot, width, red, green, blue);
            //------------------ added line: print the box corners ------
            printf("rect:%d, %d, %d, %d\n", left, top, right, bot);
            //----------------------------------------------------------
            if (alphabet) {
                image label = get_label(alphabet, labelstr, (im.h*.03)/10);
                draw_label(im, top + width, left, label, rgb);
                free_image(label);
            }
            if (masks){
                image mask = float_to_image(14, 14, 1, masks[i]);
                image resized_mask = resize_image(mask, b.w*im.w, b.h*im.h);
                image tmask = threshold_image(resized_mask, .5);
                embed_image(tmask, im, left, top);
                free_image(mask);
                free_image(resized_mask);
                free_image(tmask);
            }
        }
    }
}
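g. Run make again to rebuild darknet
h. Run prediction:
result=$(./darknet detector test cfg/voc.data cfg/yolo-voc.2.0.cfg backup/yolo-voc.backup a.png | grep "rect")
# Note: the rect lines are not necessarily printed left to right, so sort them by the left coordinate so the characters are recognized in reading order
echo "${result//rect:/}"|sort -n -t "," -k 1 >"${destPath}"/tmp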



Good, the coordinates of our characters are now printed. Step one is done.
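The same filter-and-sort step can also be sketched in Python, which may be easier to extend than the shell pipeline. This assumes the `rect:left, top, right, bot` line format produced by the printf added to draw_detections; the function name `sort_rects` and the sample output text are hypothetical.

```python
def sort_rects(output_text):
    """Collect 'rect:' lines from darknet's output and sort the boxes left to right."""
    rects = []
    for line in output_text.splitlines():
        line = line.strip()
        if not line.startswith("rect:"):
            continue  # skip everything that is not a coordinate line
        left, top, right, bot = (int(v) for v in line[len("rect:"):].split(","))
        rects.append((left, top, right, bot))
    # Sort by the left edge so characters come out in reading order
    return sorted(rects, key=lambda r: r[0])


# Hypothetical darknet output with boxes printed out of order
sample = "rect:60, 4, 80, 36\nchar: 97%\nrect:5, 3, 28, 35\nrect:33, 2, 55, 37\n"
print(sort_rects(sample))
# [(5, 3, 28, 35), (33, 2, 55, 37), (60, 4, 80, 36)]
```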
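2.2 Black-and-white conversion
Once the bounding boxes have been detected, we crop the image at the given coordinates and binarize each crop into a black-and-white image.
The following Python code does the cropping: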
# -*-coding:utf-8-*-
from PIL import Image
import sys
# Arguments: x1, y1 (top-left corner), x2, y2 (bottom-right corner), picName (file name),
# picPath (source file path), codeName (single-character name), destPath (output directory)
x1 = sys.argv[1]
y1 = sys.argv[2]
x2 = sys.argv[3]
y2 = sys.argv[4]
picName = sys.argv[5]
picPath = sys.argv[6]
codeName = sys.argv[7]
destPath = sys.argv[8]
im = Image.open(picPath)
region = im.crop((float(x1), float(y1), float(x2), float(y2)))
cropPath = destPath + "/" + codeName + "_" + picName + "_ori.png"
# bwPath is not used here; it names the binarized image written in the next step
bwPath = destPath + "/" + codeName + "_" + picName + ".png"
region.save(cropPath)
The following code binarizes a crop into a black-and-white image:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Binarize (make it black and white) an image with Python."""
from PIL import Image
import numpy


def binarize_image(img_path, target_path, threshold):
    """Binarize an image."""
    image_file = Image.open(img_path)
    image = image_file.convert('L')  # convert image to 8-bit grayscale
    image = numpy.array(image)
    image = binarize_array(image, threshold)
    # scipy.misc.imsave was removed in SciPy 1.2, so save through Pillow instead
    Image.fromarray(image).save(target_path)


def binarize_array(numpy_array, threshold=254):
    """Binarize a numpy array: pixels above threshold become 255, the rest 0."""
    return numpy.where(numpy_array > threshold, 255, 0).astype(numpy.uint8)


def get_parser():
    """Get parser object for this script."""
    from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
    parser = ArgumentParser(description=__doc__,
                            formatter_class=ArgumentDefaultsHelpFormatter)
    parser.add_argument("-i", "--input",
                        dest="input",
                        help="read this file",
                        metavar="FILE",
                        required=True)
    parser.add_argument("-o", "--output",
                        dest="output",
                        help="write binarized file here",
                        metavar="FILE",
                        required=True)
    parser.add_argument("--threshold",
                        dest="threshold",
                        default=200,
                        type=int,
                        help="Threshold when to show white")
    return parser


if __name__ == "__main__":
    args = get_parser().parse_args()
    binarize_image(args.input, args.output, args.threshold)
Usage:
# cropPath: source file
# bwPath: target file
python ./convertblack.py -i "$cropPath" -o "$bwPath" --threshold 254
That completes the cropping and binarization.
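To see what the threshold of 254 in the call above implies, here is a small pure-Python sketch of the same "greater than threshold becomes white" rule on a hand-made grayscale grid. The helper name `binarize_rows` and the pixel values are made up for illustration; the actual script works on NumPy arrays.

```python
def binarize_rows(rows, threshold):
    """Apply the 'pixel > threshold -> 255, else 0' rule to a list of pixel rows."""
    return [[255 if p > threshold else 0 for p in row] for row in rows]


# Hypothetical 2x3 grayscale patch (0 = black, 255 = white)
gray = [[255, 250, 120],
        [30, 255, 201]]

# With threshold 254 only pure-white pixels stay white; everything else turns black
print(binarize_rows(gray, 254))  # [[255, 0, 0], [0, 255, 0]]
# With the script's default of 200, light gray pixels also count as white
print(binarize_rows(gray, 200))  # [[255, 255, 0], [0, 255, 255]]
```

A threshold this high suits CAPTCHAs whose background is pure white, since any tinted pixel is then treated as part of a character.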
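2.3 CNN recognition
This part is adapted from the TensorFlow MNIST example. Images are resized to 24*24, and there are 62 classes (0-9, a-z, A-Z). There are two main files. The first is huobi.py, the main body of the recognizer: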
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Train a CNN on cropped CAPTCHA characters (adapted from the TensorFlow MNIST example)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tempfile
import tensorflow as tf
#from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
import read_huobi
reSizePic = 24
# 0-9, a-z, A-Z: 62 classes
classNum = 62
originalPicSize = reSizePic * reSizePic
# Load the CAPTCHA characters through an MNIST-style dataset object
#mnist = read_data_sets('/Users/baidu/PycharmProjects/neural/mnist/date/', one_hot=True)
mnist = read_huobi.load_huobi("/Users/baidu/PycharmProjects/neural/VerifyCodeDetection/darknet/darknet/results/")
x = tf.placeholder("float", shape=[None, originalPicSize], name='input_x')
y_ = tf.placeholder("float", shape=[None, classNum], name='input_y')
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
# Layer 1: 5*5 kernels, 1 input channel, 32 filters
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, reSizePic, reSizePic, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# pool1 output is 12*12
# Layer 2: 5*5 kernels, 32 input channels (layer 1 output), 64 filters
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
# pool2 output is 6*6
# Fully connected layer 1; the 6 depends on the input size (24 / 4)
# Integer division is required: with "from __future__ import division", "/" gives a float shape
W_fc1 = weight_variable([(reSizePic // 4) * (reSizePic // 4) * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, (reSizePic // 4) * (reSizePic // 4) * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# dropout
keep_prob = tf.placeholder("float", name='keep_prob')
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# output layer
W_fc2 = weight_variable([1024, classNum])
# 0-9, a-z, A-Z = 10 + 26 + 26 = 62
b_fc2 = bias_variable([classNum])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
# loss
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
# backpropagation
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
graph_location = tempfile.mkdtemp()  # temporary directory for the graph summary
print('Saving graph to: %s' % graph_location)
train_writer = tf.summary.FileWriter(graph_location)
train_writer.add_graph(tf.get_default_graph())
saver = tf.train.Saver(tf.global_variables())
tf.add_to_collection('pred_network', y_conv)
tf.add_to_collection('accuracy_network', accuracy)
with tf.Session() as sess:
    # tf.initialize_all_variables() is deprecated in favor of this call
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        batch = mnist.train.next_batch(1448)
        if i % 100 == 0:
            # Save a checkpoint and report training accuracy every 100 steps
            save_path = './tf_model/model_' + '%d' % i
            print('%s' % save_path)
            saver.save(sess, save_path)
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    print("test accuracy %g" % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
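Two numbers in the network above are worth checking by hand: the 62-class label space and the size of the flattened feature map after two poolings. The sketch below is an illustration under assumptions: the class order (digits, then lowercase, then uppercase) is a guess, since the post does not state the order read_huobi.py uses, and the helper names `one_hot` and `flat_size` are hypothetical.

```python
import string

# Assumed class order: 0-9, then a-z, then A-Z (62 classes in total)
CLASSES = string.digits + string.ascii_lowercase + string.ascii_uppercase


def one_hot(ch):
    """Return a 62-element one-hot label vector for a single character."""
    vec = [0.0] * len(CLASSES)
    vec[CLASSES.index(ch)] = 1.0
    return vec


def flat_size(side, num_pools=2, channels=64):
    """Flattened size after stride-2 'SAME' poolings on a square input."""
    for _ in range(num_pools):
        side = (side + 1) // 2  # 'SAME' padding with stride 2 gives ceil(side / 2)
    return side * side * channels


print(len(CLASSES), one_hot("a").index(1.0), one_hot("Z").index(1.0))  # 62 10 61
print(flat_size(24))  # 24 -> 12 -> 6, so 6 * 6 * 64 = 2304
```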
The other file is read_huobi.py, which handles image preprocessing and loading: