How to choose a machine learning algorithm

Submitted by Xilodyne on Tue, 04/04/2017 - 14:09

In my Udacity machine learning class I was confronted with this list of supervised learning models (lots of information here) from scikit-learn:

  • Gaussian Naive Bayes (GaussianNB)
  • Decision Trees
  • Ensemble Methods (Bagging, AdaBoost, Random Forest, Gradient Boosting)
  • K-Nearest Neighbors (KNeighbors)
  • Stochastic Gradient Descent Classifier (SGDC)
  • Support Vector Machines (SVM)
  • Logistic Regression

 

Features vs attributes, classes vs labels

Submitted by Xilodyne on Sun, 01/15/2017 - 10:35

While recently reviewing the Naïve Bayes Java routine I wrote last summer, I realized that I had mixed up a number of data and method definitions involving attributes, features, labels, classes, training and prediction.  The mess came from basing my routine on the description given in Wikipedia, which describes features associated with classes, while at the same time trying to translate Python's sklearn, which uses features and labels, into Java.  Si

PKL to ARFF

Submitted by Xilodyne on Sat, 12/10/2016 - 12:16

Java source code for converting PKL files to ARFF is at the bottom of this blog post.  The process is:  use the Jython pickle API to convert the PKL file into text files laid out in the structure the Weka TextDirectoryLoader expects, run the Weka TextDirectoryLoader routine, then write the result out as ARFF.
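The first step of that process can be sketched in a few lines. This is a minimal illustration in plain Python, not the Java/Jython code from the post. It assumes the PKL file holds a list of (text, label) pairs, which is a guess about the layout, and the function name pkl_to_text_directory is hypothetical. It writes one subdirectory per class with one text file per document, which is the layout the Weka TextDirectoryLoader reads.

```python
import pickle
import tempfile
from pathlib import Path


def pkl_to_text_directory(pkl_path, out_dir):
    """Write each pickled document to out_dir/<label>/<index>.txt."""
    with open(pkl_path, "rb") as handle:
        # Assumption: the pickle holds a list of (text, label) pairs.
        # A real PKL file may be structured differently.
        records = pickle.load(handle)

    out_dir = Path(out_dir)
    written = []
    for index, (text, label) in enumerate(records):
        # One subdirectory per class label, one file per document.
        class_dir = out_dir / str(label)
        class_dir.mkdir(parents=True, exist_ok=True)
        target = class_dir / f"{index}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


def demo():
    """Round-trip a tiny made-up dataset through a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        pkl_path = Path(tmp) / "sample.pkl"
        with open(pkl_path, "wb") as handle:
            pickle.dump([("hello world", "ham"), ("buy now", "spam")], handle)
        text_dir = Path(tmp) / "text_dir"
        files = pkl_to_text_directory(pkl_path, text_dir)
        # Return paths relative to the output directory, with "/" separators.
        return sorted(f.relative_to(text_dir).as_posix() for f in files)
```

The resulting directory can then be handed to the TextDirectoryLoader, which treats each subdirectory name as a class value. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.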