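Character segmentation is a necessary step in machine recognition of CAPTCHAs.
This post reads the image with PIL, binarizes it, splits the characters with KMeans from sklearn.cluster, and plots the result with matplotlib.pyplot. The clustering runs on the coordinates of the foreground pixels, with one cluster per character.
Reference: http://dsp.stackexchange.com/questions/23662/k-means-for-2d-point-clustering-in-python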
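The binarization step mentioned above can be sketched on its own. The helper names `binarize` and `foreground_points` and the threshold of 128 are assumptions made for illustration, not part of the original script, and the sketch assumes dark characters on a light background.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Return a boolean mask that is True where a pixel is darker than threshold.

    `gray` is a 2-D array of grayscale intensities (0 = black, 255 = white).
    The threshold of 128 is an assumed default, not a value from the post.
    """
    gray = np.asarray(gray)
    return gray < threshold


def foreground_points(mask):
    """Return an (N, 2) array of (row, col) coordinates of the True pixels."""
    return np.argwhere(mask)


# Tiny synthetic "image": two dark pixels on a white background.
demo = np.full((3, 4), 255, dtype=np.uint8)
demo[0, 1] = 0
demo[2, 3] = 10
mask = binarize(demo)
points = foreground_points(mask)
```

For a real file, the grayscale array could be obtained with something like `np.array(Image.open('zftb.gif').convert('L'))`, and `points` would then play the role of the pixel coordinates passed to the clustering step.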
Python code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from PIL import Image
##############################################################################
# Binarize image data
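# Load the CAPTCHA as a 2-D array; this assumes a single-channel image (e.g. a palette GIF).
im = np.array(Image.open('zftb.gif'))
h, w = im.shape
# Treat every nonzero pixel as a character pixel and store it as (h - row, col);
# h - row flips the vertical axis so the scatter plot is drawn upright.
X = [(h - x, y) for x in range(h) for y in range(w) if im[x][y]]
X = np.array(X)
# One cluster per character; this CAPTCHA is assumed to contain four characters.
n_clusters = 4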
##############################################################################
# Compute clustering with KMeans
k_means = KMeans(init='k-means++', n_clusters=n_clusters)
k_means.fit(X)
k_means_labels = k_means.labels_
k_means_cluster_centers = k_means.cluster_centers_
k_means_labels_unique = np.unique(k_means_labels)
##############################################################################
# Plot result
colors = ['#4EACC5', '#FF9C34', '#4E9A06', '#FF3300']
plt.figure()
# plt.hold was removed in Matplotlib 3.0; successive plot calls share the axes by default.
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    # Pixels belonging to cluster k
    plt.plot(X[my_members, 1], X[my_members, 0], 'w',
             markerfacecolor=col, marker='.')
    # Centroid of cluster k
    plt.plot(cluster_center[1], cluster_center[0], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)
plt.title('KMeans')
plt.grid(True)
plt.show()
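The script above only plots the clusters. To actually cut the characters out, one option is to take the bounding box of each cluster. The following is a minimal sketch under that assumption; `cluster_bounding_boxes` is a hypothetical helper, not something the original post defines, and it expects points as (row, col) pairs.

```python
import numpy as np


def cluster_bounding_boxes(points, labels):
    """Return one (top, bottom, left, right) box per cluster, ordered left to right.

    `points` is an (N, 2) array of (row, col) pixel coordinates and `labels`
    holds the cluster index of each point. All bounds are inclusive.
    """
    points = np.asarray(points)
    labels = np.asarray(labels)
    boxes = []
    for k in np.unique(labels):
        members = points[labels == k]
        top, left = members.min(axis=0)
        bottom, right = members.max(axis=0)
        boxes.append((int(top), int(bottom), int(left), int(right)))
    # K-means labels come in arbitrary order, so sort by the left edge
    # to recover the reading order of the characters.
    return sorted(boxes, key=lambda box: box[2])


# Hand-made example: cluster 1 lies to the left of cluster 0.
demo_points = np.array([[0, 5], [2, 7], [1, 0], [3, 2]])
demo_labels = np.array([0, 0, 1, 1])
boxes = cluster_bounding_boxes(demo_points, demo_labels)
```

With the variables from the script, the points could be recovered by undoing the vertical flip, for example `np.column_stack([h - X[:, 0], X[:, 1]])`, with `k_means_labels` as the labels. Each character could then be cropped as `im[top:bottom + 1, left:right + 1]`. Bounding boxes may overlap when characters touch or when K-means splits a wide character, so the result is worth checking visually.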
Permalink: http://bookshadow.com/weblog/2015/11/21/sklearn-kmeans-captcha-character-cut/