Josh Hancock's Personal Site
Classifying Handwritten Digits Using EM and PCA
Sun, 03 Feb 2019
In this post, we’ll take the Semeion Handwritten Digits data set and cluster the digits using the EM algorithm, with a principal components step within each maximization.
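One plausible way to structure that loop is sketched below in base R, as an assumption rather than the method used later in the post: a Gaussian mixture whose M-step replaces each cluster's weighted covariance with its top `q` principal components plus isotropic noise (the probabilistic PCA form). The helper names (`log_dmvn`, `pca_cov`, `em_pca`), the k-means initialization, and the variance floor are hypothetical choices, and the demo runs on synthetic data rather than the Semeion digits.

```r
# Hypothetical helper: multivariate normal log-density in base R
# (plays the role mvtnorm::dmvnorm(x, mean, sigma, log = TRUE) would)
log_dmvn <- function(X, mu, Sigma) {
  R <- chol(Sigma)                                # Sigma = t(R) %*% R
  Z <- backsolve(R, t(X) - mu, transpose = TRUE)  # solves t(R) z = x - mu for each row
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * ncol(X) * log(2 * pi)
}

# Hypothetical PCA step: keep the top q principal components of the weighted
# covariance S and model the remaining variance as isotropic noise sigma2
pca_cov <- function(S, q) {
  e <- eigen(S, symmetric = TRUE)
  d <- nrow(S)
  sigma2 <- max(mean(e$values[(q + 1):d]), 1e-6)  # assumed floor keeps Sigma positive definite
  V <- e$vectors[, 1:q, drop = FALSE]
  V %*% diag(pmax(e$values[1:q] - sigma2, 0), q) %*% t(V) + diag(sigma2, d)
}

# Sketch of EM with a PCA-reduced covariance in every M-step (requires q < ncol(X))
em_pca <- function(X, K, q, iters = 50) {
  n <- nrow(X); d <- ncol(X)
  mu <- kmeans(X, centers = K, nstart = 5)$centers  # assumed k-means initialization
  Sigma <- replicate(K, diag(d), simplify = FALSE)
  w <- rep(1 / K, K)
  for (it in seq_len(iters)) {
    # E-step: responsibilities, normalized with the log-sum-exp trick
    L <- sapply(1:K, function(k) log(w[k]) + log_dmvn(X, mu[k, ], Sigma[[k]]))
    L <- L - apply(L, 1, max)
    R <- exp(L) / rowSums(exp(L))
    # M-step: mixing weights, weighted means, PCA-reduced covariances
    Nk <- colSums(R)
    w <- Nk / n
    for (k in 1:K) {
      mu[k, ] <- colSums(R[, k] * X) / Nk[k]
      Xc <- sweep(X, 2, mu[k, ])
      S <- crossprod(Xc * sqrt(R[, k])) / Nk[k]
      Sigma[[k]] <- pca_cov(S, q)
    }
  }
  list(weights = w, means = mu, cluster = max.col(R))
}

# Demo on two well-separated synthetic clusters in 3 dimensions
set.seed(42)
X <- rbind(matrix(rnorm(300, mean = 0), ncol = 3),
           matrix(rnorm(300, mean = 6), ncol = 3))
truth <- rep(1:2, each = 100)
fit <- em_pca(X, K = 2, q = 1, iters = 30)
table(fit$cluster, truth)
```

On the digits, `X` would presumably be the pixel columns converted with `as.matrix()` and `K = 10`. Keeping only `q` components per cluster is what makes the 256-dimensional covariances cheap to estimate and well conditioned.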
First, we’ll load the additional libraries, read in the data, and create our initial data table.
library("mvtnorm")     # multivariate normal densities (dmvnorm)
library("data.table")  # fast file reading with fread()
# Read the data and convert it to a data table
setwd("C:/Users/Josh/Documents/GitHub/joshuahancock.github.io/data_sets/")
data <- fread("semeion.csv", header = FALSE)  # relative to the directory set above
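Each row of the data represents one handwritten digit; the digits were digitally scanned and stretched into a 16x16 pixel box. In the Semeion data set, the first 256 columns of a row hold the binary (0/1) pixel values and the last 10 columns one-hot encode the digit 0 through 9.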
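To make the row layout concrete, here is a minimal base-R sketch that turns one row back into a 16x16 image and a digit label. It assumes the pixels are stored row by row (16 values per image row) followed by the 10 label columns; the helper `row_to_image` is hypothetical, and the demo uses a randomly generated stand-in row rather than the real file.

```r
# Hypothetical helper: split one 266-column row into a 16x16 pixel matrix and a label.
# Assumes columns 1:256 are pixels in row-major order and 257:266 one-hot encode 0-9.
row_to_image <- function(row) {
  row <- as.numeric(unlist(row))  # works for a one-row data.table or a plain vector
  pixels <- matrix(row[1:256], nrow = 16, ncol = 16, byrow = TRUE)
  label <- which(row[257:266] == 1) - 1
  list(image = pixels, label = label)
}

# Synthetic stand-in row: random binary pixels plus the one-hot code for digit 7
set.seed(1)
fake_row <- c(rbinom(256, 1, 0.3), as.integer(0:9 == 7))
digit <- row_to_image(fake_row)
dim(digit$image)
digit$label
```

With the real data, `row_to_image(data[1])` should give the first digit, assuming the file has the standard 266 columns.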