Splitting CAPTCHA Characters with scikit-learn KMeans

Character segmentation is a necessary step in machine recognition of CAPTCHAs.

A sample CAPTCHA is shown below:

Original CAPTCHA image

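Read the image with PIL and binarize it, then cluster the coordinates of the foreground pixels with KMeans from sklearn.cluster (one cluster per character), and finally plot the result with matplotlib.pyplot.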
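The binarization step can be sketched on its own with NumPy. This is only an illustration under stated assumptions: the helper names `binarize` and `foreground_points`, the threshold of 128, and the "dark ink on a light background" convention are all hypothetical and are not taken from the post's code, which simply treats every nonzero pixel as foreground.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Return a boolean mask that is True where a pixel is darker than `threshold`.

    Assumption: dark ink on a light background; the threshold is illustrative.
    """
    return np.asarray(gray) < threshold

def foreground_points(mask):
    """Return an (N, 2) array of (row, col) coordinates of the True pixels."""
    return np.argwhere(mask)

# Tiny synthetic "image": 255 = white background, 0 = dark ink.
gray = np.full((4, 6), 255, dtype=np.uint8)
gray[1:3, 1] = 0   # a two-pixel vertical stroke
gray[2, 4] = 0     # a single isolated pixel

mask = binarize(gray)
points = foreground_points(mask)
print(points)
```

The resulting coordinate array is the kind of input that KMeans expects: one row per foreground pixel, one column per coordinate.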

The segmentation result is shown below:

Reference: http://dsp.stackexchange.com/questions/23662/k-means-for-2d-point-clustering-in-python

Python code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from PIL import Image

##############################################################################
# Binarize image data

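# Load the CAPTCHA; this assumes the image decodes to a 2-D (single-channel) array
im = np.array(Image.open('zftb.gif'))
h, w = im.shape
# Collect the coordinates of every nonzero pixel; the row index is flipped
# (h - x) so that the plot is drawn upright rather than upside down
X = [(h - x, y) for x in range(h) for y in range(w) if im[x][y]]
X = np.array(X)
# One cluster per character in the CAPTCHA
n_clusters = 4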

##############################################################################
# Compute clustering with KMeans

# n_init is set explicitly because its default differs between scikit-learn versions
k_means = KMeans(init='k-means++', n_clusters=n_clusters, n_init=10)
k_means.fit(X)
k_means_labels = k_means.labels_
k_means_cluster_centers = k_means.cluster_centers_
k_means_labels_unique = np.unique(k_means_labels)

##############################################################################
# Plot result

colors = ['#4EACC5', '#FF9C34', '#4E9A06', '#FF3300']
# plt.hold() was removed in matplotlib 3.0; successive plot calls
# already draw on the same axes, so no replacement is needed
plt.figure()
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    # Pixels belonging to cluster k (x axis = column, y axis = flipped row)
    plt.plot(X[my_members, 1], X[my_members, 0], 'w',
             markerfacecolor=col, marker='.')
    # Centroid of cluster k
    plt.plot(cluster_center[1], cluster_center[0], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)
plt.title('KMeans')
plt.grid(True)
plt.show()
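The listing above only plots the clusters. One possible next step, sketched here as an assumption rather than as part of the original post, is to turn each cluster into a bounding box and crop the characters out in left-to-right order. The helper `character_boxes` and the synthetic mask are hypothetical, and the points are plain (row, col) coordinates without the vertical flip used for plotting.

```python
import numpy as np
from sklearn.cluster import KMeans

def character_boxes(points, n_clusters, random_state=0):
    """Cluster (row, col) points and return one bounding box per cluster.

    Each box is (top, left, bottom, right) with inclusive bounds, and the
    boxes are sorted by their left edge, i.e. in reading order.
    """
    km = KMeans(n_clusters=n_clusters, init='k-means++', n_init=10,
                random_state=random_state)
    labels = km.fit_predict(points)
    boxes = []
    for k in range(n_clusters):
        members = points[labels == k]
        top, left = members.min(axis=0)
        bottom, right = members.max(axis=0)
        boxes.append((int(top), int(left), int(bottom), int(right)))
    return sorted(boxes, key=lambda box: box[1])

# Synthetic stand-in for a binarized CAPTCHA: four well-separated blocks.
mask = np.zeros((9, 40), dtype=bool)
for left in (2, 12, 22, 32):
    mask[2:7, left:left + 3] = True

points = np.argwhere(mask)
boxes = character_boxes(points, n_clusters=4)

# Crop one sub-image per character from the original mask.
crops = [mask[t:b + 1, l:r + 1] for t, l, b, r in boxes]
for box, crop in zip(boxes, crops):
    print(box, crop.shape)
```

Real CAPTCHAs with touching or overlapping characters will not separate this cleanly, so the boxes should be treated as a rough first cut.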

 

Permalink: http://bookshadow.com/weblog/2015/11/21/sklearn-kmeans-captcha-character-cut/
Please respect the author's work and cite the source when reposting. 书影博客 reserves all rights to this article.
