Segmenting CAPTCHA Characters with scikit-learn KMeans

Character segmentation is a necessary step in recognizing CAPTCHAs by machine.

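The approach: read the image with PIL and binarize it, treating every nonzero pixel as foreground. Then cluster the coordinates of the foreground pixels with KMeans from sklearn.cluster, so that each cluster corresponds to one character. Finally, plot the result with matplotlib.pyplot.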
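The listing in this post treats every nonzero pixel value as foreground. For a palette GIF, that means every nonzero palette index, which only works if the background happens to use index 0. If the CAPTCHA has dark text on a light background, an explicit grayscale threshold may be more robust. The sketch below is a minimal version under that assumption. The function name `binarize_points` and the defaults `threshold=128` and `dark_text=True` are hypothetical choices, not taken from the original post. It returns points in the same (h - row, col) layout that the original code feeds to KMeans.

```python
import numpy as np


def binarize_points(gray, threshold=128, dark_text=True):
    """Return foreground pixel coordinates as an (N, 2) array of (h - row, col).

    Assumes `gray` is a 2-D uint8 grayscale array. `threshold` and `dark_text`
    are illustrative assumptions, not values from the original post.
    """
    gray = np.asarray(gray)
    h = gray.shape[0]
    # Foreground mask: dark pixels if the text is darker than the background
    mask = gray < threshold if dark_text else gray >= threshold
    rows, cols = np.nonzero(mask)
    # Flip rows so that "up" in the image is "up" in the plot, as the original code does
    return np.column_stack((h - rows, cols))


# Synthetic 5x6 "image": white background (255) with a dark 2x2 block
demo = np.full((5, 6), 255, dtype=np.uint8)
demo[1:3, 2:4] = 0
print(binarize_points(demo))
```

To use it on a real file, one option is `gray = np.array(Image.open('zftb.gif').convert('L'))`, which converts the image to 8-bit grayscale before thresholding.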

Reference: http://dsp.stackexchange.com/questions/23662/k-means-for-2d-point-clustering-in-python

Python code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from PIL import Image

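##############################################################################
# Binarize image data: every nonzero pixel value counts as foreground

# Assumes a single-channel (e.g. palette) GIF, so the array is 2-D
im = np.array(Image.open('zftb.gif'))
h, w = im.shape
# Collect (h - row, col) for each foreground pixel; flipping the row keeps
# the characters upright when plotted with the origin at the bottom-left
rows, cols = np.nonzero(im)
X = np.column_stack((h - rows, cols))
n_clusters = 4  # one cluster per character in a 4-character CAPTCHA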

##############################################################################
# Compute clustering with KMeans

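# Fixed n_init and random_state make the result reproducible across sklearn versions
k_means = KMeans(init='k-means++', n_clusters=n_clusters, n_init=10, random_state=0)
k_means.fit(X)
k_means_labels = k_means.labels_  # cluster index of each foreground pixel
k_means_cluster_centers = k_means.cluster_centers_  # one centroid per character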

##############################################################################
# Plot result

colors = ['#4EACC5', '#FF9C34', '#4E9A06', '#FF3300']  # one color per cluster
plt.figure()
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    # Pixels of cluster k: column on the x-axis, flipped row on the y-axis
    plt.plot(X[my_members, 1], X[my_members, 0], '.', color=col)
    # Cluster centroid drawn as a larger circle with a black edge
    plt.plot(cluster_center[1], cluster_center[0], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)
plt.title('KMeans')
plt.grid(True)
plt.show()
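The script above only plots the clusters. To cut out the characters, the clusters still have to be ordered left to right and each one cropped. Below is a minimal sketch of that step. It assumes a 2-D binary array in which nonzero marks foreground. The function name `cut_characters` and the synthetic test image are hypothetical, not from the original post. Unlike the plotting code, this version keeps plain image row coordinates, without the `h - row` flip, because it produces crops rather than a plot.

```python
import numpy as np
from sklearn.cluster import KMeans


def cut_characters(binary, n_clusters=4, random_state=0):
    """Cluster foreground pixels and return one cropped 0/1 image per cluster,
    ordered left to right by the cluster centroid's column coordinate.

    `binary` is a 2-D array where nonzero marks a foreground pixel.
    """
    binary = np.asarray(binary)
    rows, cols = np.nonzero(binary)
    points = np.column_stack((rows, cols))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(points)
    # Sort cluster ids by the column (x) coordinate of their centroids
    order = np.argsort(km.cluster_centers_[:, 1])
    crops = []
    for k in order:
        r, c = rows[labels == k], cols[labels == k]
        # Tight bounding box; only pixels assigned to cluster k are copied in,
        # so strokes of a neighboring character inside the box are left out
        crop = np.zeros((r.max() - r.min() + 1, c.max() - c.min() + 1), dtype=np.uint8)
        crop[r - r.min(), c - c.min()] = 1
        crops.append(crop)
    return crops


# Synthetic 10x40 image with four solid "characters" of widths 3, 4, 2 and 5
img = np.zeros((10, 40), dtype=np.uint8)
for c0, c1 in [(2, 5), (12, 16), (22, 24), (32, 37)]:
    img[2:8, c0:c1] = 1
crops = cut_characters(img)
print([c.shape for c in crops])
```

The crops are copied from cluster membership rather than sliced out of the bounding box. This design choice matters when adjacent characters overlap horizontally, because their bounding boxes can then share columns.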

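Article link: http://bookshadow.com/weblog/2015/11/21/sklearn-kmeans-captcha-character-cut/
Please respect the author's work and credit the source when reposting. Bookshadow Blog (书影博客) reserves all rights to this article.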
