In this post, we’ll take the Semeion Handwritten Digits data set and cluster the handwritten digits data using the EM algorithm with a principle components step within each maximization.
First, we’ll read in the data, load the additional libraries, and create our initial data table.
library("mvtnorm")
library("data.table")
# Reading data and convert to data table
setwd("C:/Users/Josh/Documents/GitHub/joshuahancock.github.io/data_sets/")
data <- fread("C:/Users/Josh/Documents/GitHub/joshuahancock.github.io/data_sets/semeion.csv", header = FALSE)
Each row of the data represents one handwritten digit, which were digitally scanned and stretched into a 16x16 pixel box.