In this post, we’ll take the Semeion Handwritten Digits data set and cluster the digits using the EM algorithm, with a principal components step within each maximization. First, we’ll load the additional libraries, read in the data, and create our initial data table.

```r
library("mvtnorm")
library("data.table")

# Read the data into a data.table (fread returns one directly)
setwd("C:/Users/Josh/Documents/GitHub/joshuahancock.github.io/data_sets/")
data <- fread("C:/Users/Josh/Documents/GitHub/joshuahancock.github.io/data_sets/semeion.csv", header = FALSE)
```

Each row of the data represents one handwritten digit, which was digitally scanned and stretched into a 16x16 pixel box.
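To make the 16x16 layout concrete, here is a minimal sketch of how one row of pixel values could be folded back into a grid. It uses a randomly generated vector (`pixels`, a hypothetical stand-in) rather than the real data, and it assumes the 256 pixel values are binary and stored row by row; neither assumption is confirmed by the text so far.

```r
# Hypothetical stand-in for the pixel values of a single digit:
# 256 binary values, generated at random purely for illustration.
set.seed(1)
pixels <- rbinom(256, size = 1, prob = 0.3)

# Assumption: pixels are stored row by row, so fill the matrix by row.
digit <- matrix(pixels, nrow = 16, ncol = 16, byrow = TRUE)

# Print the grid as text, one line per pixel row ("#" = ink, "." = blank).
cat(apply(ifelse(digit == 1, "#", "."), 1, paste, collapse = ""), sep = "\n")
```

With the real data, `pixels` would instead be the pixel columns of one row of `data` converted to a numeric vector, for example with `unlist()`.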



Josh Hancock

info@joshuahancock.org

Senior Data Scientist @ Nike

Portland, OR