Gaussian Naive Bayes

Submitted by Xilodyne on Sun, 09/18/2016 - 11:02
Prediction with Naive Bayes

Confronted with implementing a Gaussian Naïve Bayes, I first needed to understand (and implement) the classification and prediction of a plain Naïve Bayes. I found that most of the machine learning frameworks, while implementing some form of the algorithm, never explain why they made certain coding decisions, nor offer obvious ways of testing that the classification / prediction is consistent with the formula. I ended up writing code to implement the Male/Female "Drew" example as explained by professor Eamonn Keogh at UC Riverside (Bayesian Classification with Insect_examples.pdf). By breaking P(c|d) down into P(d|c) and P(c), I was able to reproduce the same results as the example.
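The core of that decomposition is small enough to show directly. Below is a minimal Java sketch of comparing P(d|c) * P(c) across two classes; the counts are illustrative stand-ins in the spirit of the "Drew" example, not necessarily the exact figures from the slides.

    public class NaiveBayesSketch {

        public static void main(String[] args) {
            // Hypothetical training counts: how often the name "drew"
            // appears in each class, and the class priors.
            double drewGivenMale = 1.0 / 3.0;    // P(d|c) for c = male
            double drewGivenFemale = 2.0 / 5.0;  // P(d|c) for c = female
            double priorMale = 3.0 / 8.0;        // P(c) for c = male
            double priorFemale = 5.0 / 8.0;      // P(c) for c = female

            // P(c|d) is proportional to P(d|c) * P(c); the evidence P(d)
            // cancels when comparing classes, so the larger product wins.
            double scoreMale = drewGivenMale * priorMale;
            double scoreFemale = drewGivenFemale * priorFemale;

            System.out.println("male score:   " + scoreMale);
            System.out.println("female score: " + scoreFemale);
            System.out.println("predicted: " + (scoreMale > scoreFemale ? "male" : "female"));
        }
    }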

The next challenge was understanding how a Gaussian distribution changes the Naïve Bayes. I used the formulas listed in Wikipedia's Naïve Bayes article and used its gender sample data to verify my results. Whereas my NB was text based, my GNB is float based. To test the validity of my algorithm I loaded the Pima Indian Diabetes Data Set (from the UCI Machine Learning Repository). My accuracy of 67% matched results reported by others.
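For the Gaussian case, each numeric feature contributes a likelihood p(x|c) = 1/sqrt(2*pi*var) * exp(-(x-mean)^2/(2*var)), and the per-feature likelihoods are multiplied together with the class prior. Here is a small Java sketch of that calculation; the means, variances, and sample values are placeholders for illustration, not the Wikipedia or Pima data.

    public class GaussianLikelihoodSketch {

        // Gaussian probability density for one feature value.
        static double gaussian(double x, double mean, double variance) {
            double coeff = 1.0 / Math.sqrt(2.0 * Math.PI * variance);
            double exponent = -((x - mean) * (x - mean)) / (2.0 * variance);
            return coeff * Math.exp(exponent);
        }

        public static void main(String[] args) {
            // Hypothetical per-class statistics for two features
            // (e.g. height in feet, weight in pounds).
            double[] means = {5.85, 176.25};
            double[] variances = {0.035, 122.9};
            double prior = 0.5;

            // New sample to score against this class.
            double[] sample = {6.0, 180.0};

            // Multiply the prior by each feature's Gaussian likelihood.
            double score = prior;
            for (int i = 0; i < sample.length; i++) {
                score *= gaussian(sample[i], means[i], variances[i]);
            }
            System.out.println("class score: " + score);
        }
    }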

The next step was to refactor my GaussianNB.java to handle the N-dimensional array structure used by the Python scikit-learn implementation (sklearn.naive_bayes.GaussianNB) so that I could use it in my Udacity course (yes, six months later I'm still working on that project). I'm using the nicely done vectorz NDArray. One change was implementing the "fit" method, in which the classes (like the gender class in the Wikipedia example) are automatically determined by parsing the "labels" float array, as sketched below.
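The sketch below shows only the class-discovery part of that "fit" idea in plain Java arrays; the real GaussianNB.java works against the vectorz NDArray structures, and the names here (GaussianNBFitSketch, featureData, getClasses) are illustrative rather than the actual ones in the code.

    import java.util.ArrayList;
    import java.util.List;

    public class GaussianNBFitSketch {

        private List<Double> classes = new ArrayList<>();

        // featureData: one row per sample, one column per feature.
        // labels: one class value per sample.
        public void fit(double[][] featureData, double[] labels) {
            // Discover the distinct class values by parsing the labels array,
            // instead of requiring the caller to list the classes up front.
            for (double label : labels) {
                if (!classes.contains(label)) {
                    classes.add(label);
                }
            }
            // ...per-class means and variances would be computed here...
        }

        public List<Double> getClasses() {
            return classes;
        }
    }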

The final coding result is here.