I'm trying to predict labels for the MNIST digits using my own k-means algorithm.
What I want to do:
1. MNIST data is loaded and pixel values are normalized to the range [0, 1].
2. Parameters for the K-means algorithm are set: the number of clusters (k), initial centroids (Z0), and the maximum number of iterations (NITERMAX).
3. The K-means algorithm is executed on the MNIST data to classify images into k clusters.
4. A figure with subplots is created, where each subplot represents an image from the dataset.
5. For each image, the image itself is displayed, and the subplot title is set with the label predicted by K-means for that image.
6. The figure with all images and their predicted labels is shown.
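Steps 1 and 2 can be sanity-checked on their own before touching k-means. A minimal sketch, assuming scikit-learn's fetch_openml returns the usual 70,000 x 784 layout (the exact return type, DataFrame or ndarray, depends on the scikit-learn version):

```python
import numpy as np
from sklearn.datasets import fetch_openml

# Load MNIST: 70,000 flattened 28x28 images (784 pixel columns each)
mnist = fetch_openml('mnist_784', version=1)
X = mnist.data / 255.0               # normalize pixel values to [0, 1]

X_arr = np.asarray(X)                # works for both DataFrame and ndarray
print(X_arr.shape)                   # expected: (70000, 784)
print(X_arr.min(), X_arr.max())      # expected: 0.0 1.0
```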
Problems:
1. Error: ValueError: 'tab10' is not a valid color value.
2. The algorithm takes approximately 2 hours to finish running in my Jupyter notebook.
3. Because of the error, I can't tell whether the code actually implements the logic described above.
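For problem 1: the error comes from the ListedColormap('tab10') call in my script below. ListedColormap expects a sequence of color values (e.g. ['red', 'green', ...]) and tries to parse 'tab10' as a single color, which fails. A named colormap has to be looked up in matplotlib's registry, or the name can be passed directly as a string; a small sketch of the difference (the 'Prediction: 7' title is just a placeholder):

```python
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Raises ValueError: 'tab10' is not a valid color value,
# because ListedColormap wants a list of colors, not a colormap name.
# cmap = ListedColormap('tab10')

# Look the named colormap up in the registry instead (newer matplotlib),
# or pass the name as a string straight to imshow/scatter.
cmap = matplotlib.colormaps['tab10']

# For the 28x28 digit images themselves a grayscale colormap is the natural
# choice; 'tab10' is a categorical palette meant for coloring by cluster label.
img = np.random.rand(28, 28)         # stand-in for one MNIST image
plt.imshow(img, cmap='gray')
plt.title('Prediction: 7')           # placeholder label for the sketch
plt.axis('off')
plt.show()
```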
Here is my full script:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import fetch_openml


def k_means_fit(X: pd.DataFrame, Z0: np.ndarray, NITERMAX: int) -> tuple:
    Z = np.array(Z0)
    centroids_history = [Z.copy()]
    inertia_history = []
    for it in range(NITERMAX):
        # Distance of every sample to every centroid, shape (k, n_samples)
        distances = np.linalg.norm(X.values - Z[:, np.newaxis, :], axis=2)
        labels = np.argmin(distances, axis=0)
        # New centroid = mean of the samples assigned to each cluster
        Z_new = np.array([X.values[labels == k].mean(axis=0) for k in range(len(Z))])
        inertia = np.sum(np.min(distances, axis=0))
        inertia_history.append(inertia)
        if np.allclose(Z, Z_new):
            print("Converged at iteration", it)
            break
        Z = Z_new
        centroids_history.append(Z.copy())
    inertia_diff = [inertia_history[i] - inertia_history[i - 1]
                    for i in range(1, len(inertia_history))]
    return Z, labels, centroids_history, inertia_history, inertia_diff


# Load MNIST data
mnist = fetch_openml('mnist_784', version=1)
X = mnist.data / 255.0           # Normalize pixel values to [0, 1]
X_df = pd.DataFrame(X)

# K-means parameters
k = 20                           # Number of clusters
Z0 = X_df.sample(k).values       # Initial centroids randomly sampled from the data
NITERMAX = 100                   # Maximum iterations

# Run K-means
final_centroids, labels, _, _, _ = k_means_fit(X_df, Z0, NITERMAX)

# Visualize the classification for all images
fig, axs = plt.subplots(nrows=X.shape[0], ncols=1, figsize=(15, 10))  # One subplot per image (70,000 rows)
cmap = ListedColormap('tab10')   # Colormap for visualization  <- this line raises the ValueError
for i in range(X.shape[0]):
    # Show the image
    axs[i].imshow(X.iloc[i].values.reshape(28, 28), cmap=cmap)
    # Set the title with the label predicted by K-means
    axs[i].set_title(f'Prediction: {labels[i]}')
    # Turn off the axes
    axs[i].axis('off')

plt.suptitle('K-means Classification for All Images')
plt.tight_layout()               # Adjust spacing between subplots
plt.show()
```
Plot example:
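For problems 2 and 3: two things stand out. First, the broadcasted distance computation inside k_means_fit materialises an array of shape (k, 70000, 784), roughly 9 GB of float64 for k = 20, and the figure with one subplot per image means 70,000 axes; both plausibly explain the multi-hour runtime even before the colormap error is hit. Second, the clustering logic itself can be checked in seconds on a small subset by starting scikit-learn's KMeans from the same initial centroids and comparing the resulting assignments. The sketch below assumes the script above has already defined mnist, X_df, k, NITERMAX and k_means_fit; subset_size, the random seeds and the majority-vote mapping are illustrative choices, not part of my original code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Run both implementations on the same small random subset, from the same centroids
subset_size = 5000                                   # illustrative size, keeps the check fast
rng = np.random.default_rng(0)
idx = rng.choice(len(X_df), size=subset_size, replace=False)
X_sub = X_df.iloc[idx].reset_index(drop=True)

Z0_sub = X_sub.sample(k, random_state=0).values      # shared initial centroids
Z_custom, labels_custom, *_ = k_means_fit(X_sub, Z0_sub, NITERMAX)

ref = KMeans(n_clusters=k, init=Z0_sub, n_init=1, max_iter=NITERMAX).fit(X_sub)

# If the custom logic is right, the two partitions should agree almost perfectly
print("agreement with sklearn KMeans (ARI):",
      adjusted_rand_score(ref.labels_, labels_custom))

# Turn clusters into digit "predictions": each cluster votes for its most
# frequent true digit, then the predictions are scored against the real labels.
y_sub = np.asarray(mnist.target).astype(int)[idx]
pred = np.empty_like(y_sub)
for c in range(k):
    members = labels_custom == c
    if members.any():                                # skip empty clusters
        pred[members] = np.bincount(y_sub[members]).argmax()
print("majority-vote accuracy on the subset:", (pred == y_sub).mean())
```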