CSE 251A Project 1: K-Means Clustering based Prototype Selection for Nearest Neighbor — Conference'17, July 2017, Washington, DC, USA
label-based partitioning, we showed that our approach outperforms the random one in every experiment.
A Appendix
A.1 Raw Experimental Data
Raw results of all experiments (6 runs for each combination of method and M):
Method M Average Error Rate Standard Deviation
Random 500 0.150517 0.003311
Random 1000 0.117667 0.002391
Random 2000 0.088217 0.000711
Random 5000 0.065617 0.001389
Random 10000 0.053300 0.001646
K-Means 500 0.082783 0.001878
K-Means 1000 0.073517 0.002154
K-Means 2000 0.061267 0.002176
K-Means 5000 0.049750 0.000954
K-Means 10000 0.044500 0.001752
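The table's summary statistics can be reproduced from the six per-run error rates with NumPy. The values below are illustrative placeholders, not the actual run results, and the choice of sample standard deviation (`ddof=1`) is an assumption about how the table was computed:

```python
import numpy as np

# Illustrative per-run error rates for one (method, M) combination;
# the real six-run results are in the table above.
errors = np.array([0.151, 0.148, 0.152, 0.149, 0.153, 0.150])

mean = errors.mean()
std = errors.std(ddof=1)  # sample standard deviation (assumed ddof)
print(f"Average {mean:.6f}  Standard Deviation {std:.6f}")
```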
A.2 K-Means Clustering Selection Implementation
def kmeans_pts(X, Y, M):
    P = []
    Cls = len(np.unique(Y))           # number of classes
    Nc = int(M / Cls)                 # clusters (prototypes) per class
    x_idx = [[] for _ in range(Cls)]
    for i in range(len(Y)):           # group training indices by label
        x_idx[Y[i]].append(i)
    for x in x_idx:
        # cluster one class, then map each center to its nearest real point
        cc = KMeans(n_clusters=Nc).fit(X[x]).cluster_centers_
        px, _ = pairwise_distances_argmin_min(cc, X)
        P.append(px)
    P = np.concatenate(P).ravel()
    return P  # indices of the selected prototypes
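A minimal, self-contained usage sketch of the selection routine above. The two-blob synthetic dataset and the `n_init` argument are illustrative assumptions (the actual experiments use MNIST); the selection logic mirrors A.2:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min
from sklearn.neighbors import KNeighborsClassifier

def kmeans_pts(X, Y, M):
    """Per-class k-means prototype selection, as in A.2."""
    P = []
    classes = np.unique(Y)
    n_per_class = M // len(classes)
    for c in classes:
        idx = np.flatnonzero(Y == c)
        cc = KMeans(n_clusters=n_per_class, n_init=10).fit(X[idx]).cluster_centers_
        px, _ = pairwise_distances_argmin_min(cc, X)  # nearest real points
        P.append(px)
    return np.concatenate(P)

# Illustrative data: two well-separated Gaussian blobs (MNIST in the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
Y = np.array([0] * 100 + [1] * 100)

proto = kmeans_pts(X, Y, M=10)  # 5 prototypes per class
knn = KNeighborsClassifier(n_neighbors=1).fit(X[proto], Y[proto])
acc = knn.score(X, Y)
print(len(proto), acc)  # 10 prototypes; accuracy near 1.0 on this easy data
```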
A.3 Full Source Code for Experiments
import numpy as np
from keras.datasets import mnist
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min
import random
import time
def kmeans_pts(X, Y, M):
    P = []
    Cls = len(np.unique(Y))
    Nc = int(M / Cls)
    x_idx = [[] for _ in range(Cls)]
    for i in range(len(Y)):
        x_idx[Y[i]].append(i)
    for x in x_idx:
        cc = KMeans(n_clusters=Nc).fit(X[x]).cluster_centers_
        px, _ = pairwise_distances_argmin_min(cc, X)
        P.append(px)
    P = np.concatenate(P).ravel()
    return P  # prototype result
def rand_pts(X, _Y, M):
P = random.sample(range(0, X.shape[0]), M)
return P
# Dataset and Pre-Processing
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255  # normalize to [0, 1]
x_test /= 255
# Run 6 rounds of experiments with different M for rand and kmeans
runs = 6
Ms = np.array([500, 1000, 2000, 5000, 10000], dtype=int)
types = np.array(["rand", "kmeans"], dtype=object)
for run in range(runs):
    print("run", run + 1)