Recently, while reviewing the Naïve Bayes Java routine I wrote last summer, I realized that I had mixed up and confused a number of data and method definitions involving attributes, features, labels, classes, training, and prediction. The mess came from basing my routine on the Wikipedia description, which talks about features associated with classes, while at the same time trying to translate the Python sklearn API into Java, which uses features and labels. Since I've also been using Weka, there's a third terminology in play: attributes and classes (most clearly defined in the ARFF description).
| Category | sklearn | Wikipedia | Weka |
|---|---|---|---|
| Input data | features | features | attributes |
| Class association | label | class | class |
| Train method | fit | training | train |
| Test method | predict | testing | test |
It is surprising that there can be so many ways of describing the same thing. The best answer I've found so far comes from a response by Zeeshan Zia on Quora:
> Zeeshan Zia, Research Scientist at NEC Laboratories America (written Apr 9, 2015): What is called a "feature" in machine learning or pattern recognition is traditionally called an "attribute" in data mining!
As Udacity uses Python for machine learning, I'll stick with feature, label, fit, and predict for my Java routines.
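
To show what that vocabulary might look like in Java, here is a minimal sketch: a `Classifier` interface with sklearn-style `fit`/`predict` methods over features and labels, plus a toy Gaussian naive Bayes behind it. The class and method names are hypothetical stand-ins for illustration, not the routine from last summer.

```java
import java.util.*;

/** Sketch of a classifier contract using sklearn-style naming:
 *  features, labels, fit, predict. */
interface Classifier {
    void fit(double[][] features, String[] labels);
    String predict(double[] features);
}

/** Toy Gaussian naive Bayes, just to exercise the naming convention. */
class GaussianNaiveBayes implements Classifier {
    private final Map<String, double[]> means = new HashMap<>();
    private final Map<String, double[]> variances = new HashMap<>();
    private final Map<String, Double> priors = new HashMap<>();

    @Override
    public void fit(double[][] features, String[] labels) {
        // Group feature rows by their label.
        Map<String, List<double[]>> byLabel = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byLabel.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(features[i]);
        }
        int d = features[0].length;
        for (Map.Entry<String, List<double[]>> e : byLabel.entrySet()) {
            List<double[]> rows = e.getValue();
            double[] mean = new double[d];
            double[] var = new double[d];
            for (double[] row : rows)
                for (int j = 0; j < d; j++) mean[j] += row[j] / rows.size();
            for (double[] row : rows)
                for (int j = 0; j < d; j++) var[j] += Math.pow(row[j] - mean[j], 2) / rows.size();
            for (int j = 0; j < d; j++) var[j] += 1e-9; // guard against zero variance
            means.put(e.getKey(), mean);
            variances.put(e.getKey(), var);
            priors.put(e.getKey(), (double) rows.size() / labels.length);
        }
    }

    @Override
    public String predict(double[] features) {
        String best = null;
        double bestLogProb = Double.NEGATIVE_INFINITY;
        for (String label : priors.keySet()) {
            // log P(label) + sum of per-feature Gaussian log-likelihoods
            double logProb = Math.log(priors.get(label));
            double[] mean = means.get(label);
            double[] var = variances.get(label);
            for (int j = 0; j < features.length; j++) {
                logProb += -0.5 * Math.log(2 * Math.PI * var[j])
                         - Math.pow(features[j] - mean[j], 2) / (2 * var[j]);
            }
            if (logProb > bestLogProb) { bestLogProb = logProb; best = label; }
        }
        return best;
    }
}
```

Usage then mirrors the sklearn pattern: `clf.fit(trainingFeatures, trainingLabels)` followed by `clf.predict(newFeatures)`, with no attribute/class vocabulary creeping back in.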