This came up when I was participating in the Kaggle Forest Cover competition. Mostly I have used Weka for these competitions. RandomForest is generally the classifier I have had the best results with, usually in combination with the Bagging meta (ensemble) classifier. At some point I had the idea to try 'chaining' the AdaBoostM1 and Bagging meta classifiers together.

It appeared to result in some improvement in accuracy. I submitted the results to the competition and got a substantial jump in my leaderboard position. I had written some Java to do the ensemble chaining from the command line, because that was the most convenient way to give it more memory. You could probably accomplish the same thing with the Weka command line alone, but my knowledge of how to use that is still lacking at this time. I also consider learning to code against Weka a bit to be a useful exercise in and of itself.
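
For illustration, here is a minimal sketch of that kind of chaining against the Weka Java API. It assumes Weka 3.x on the classpath; the ARFF file name and the class attribute being last are placeholders rather than the actual competition setup.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ChainedEnsembles {
    public static void main(String[] args) throws Exception {
        // Placeholder data set; the class attribute is assumed to be last.
        Instances data = DataSource.read("train.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Innermost learner, all defaults.
        RandomForest rf = new RandomForest();

        // Bagging wraps RandomForest...
        Bagging bagging = new Bagging();
        bagging.setClassifier(rf);

        // ...and AdaBoostM1 wraps the bagged RandomForest (the Ada+Bagging order).
        AdaBoostM1 chained = new AdaBoostM1();
        chained.setClassifier(bagging);

        // 10-fold cross-validation, roughly what the Experimenter reports.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(chained, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Running it with something like java -Xmx4g ChainedEnsembles (the heap size is only an example) is the point of doing this from the command line: the extra memory goes on the JVM invocation itself.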

The Kaggle dataset is too large to run the Experimenter on. I have more memory on my machine now than I did then and tried it anyway, without thinking it through, and it still errored out on memory. This was a while ago and I have a number of variations of the data with different attribute sets, but I think my results were something like this:

forest dataset:
  RF           87.7778
  Ada          87.7712
  Bagging      87.0238
  Ada+Bagging  87.8505
  Bagging+Ada  86.4418

This appears to follow patterns discussed in the conclusions section.

A similar idea of combining multiple ensemble classifiers seemed to be involved in this Weka mailing-list thread. I decided to use the Experimenter with more datasets to see whether my original idea of chaining ensemble classifiers had any validity beyond the one Kaggle competition.

For the experiment I used RandomForest by itself as the baseline classifier. I also included AdaBoostM1 wrapping RandomForest, Bagging wrapping RandomForest, and then the 'chained' ensembles: AdaBoostM1 wrapping Bagging wrapping RandomForest, and finally Bagging wrapping AdaBoostM1 wrapping RandomForest. All of these were run with default Weka parameters, as sketched below.
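
As a rough sketch, this is what those five configurations look like when built with the Weka Java API; the helper method names are just for illustration.

```java
import weka.classifiers.Classifier;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.RandomForest;

public class ExperimentSetups {

    // Wrap a base learner in AdaBoostM1, leaving everything at defaults.
    static AdaBoostM1 boost(Classifier base) {
        AdaBoostM1 ada = new AdaBoostM1();
        ada.setClassifier(base);
        return ada;
    }

    // Wrap a base learner in Bagging, leaving everything at defaults.
    static Bagging bag(Classifier base) {
        Bagging bagging = new Bagging();
        bagging.setClassifier(base);
        return bagging;
    }

    // The five configurations compared in the Experimenter run.
    static Classifier[] setups() {
        return new Classifier[] {
            new RandomForest(),               // RandomForest alone
            boost(new RandomForest()),        // AdaBoostM1 + RandomForest
            bag(new RandomForest()),          // Bagging + RandomForest
            boost(bag(new RandomForest())),   // AdaBoostM1 + Bagging + RandomForest
            bag(boost(new RandomForest()))    // Bagging + AdaBoostM1 + RandomForest
        };
    }
}
```

The two chained variants differ only in which meta classifier sits on the outside, which is what the last two columns of the table below compare.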

I came up with a 'rank' system for the results, giving the five different classifiers points based on their standings for accuracy and least variance. A classifier gets zero points for being the worst (least accurate) on a dataset and four points for being the best. Ties split the adjacent ranks, so each tied classifier gets the half point (for example, a tie for the top two places gives each classifier 3.5 points).
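
To make the scoring concrete, here is a small sketch, my own illustration rather than code from the original write-up, of computing rank points with tie averaging. The accuracies in main are made-up numbers.

```java
import java.util.Arrays;
import java.util.Comparator;

public class RankPoints {

    // Worst accuracy scores 0, best scores n - 1, and tied classifiers
    // share the average of the positions they cover.
    static double[] rankPoints(double[] acc) {
        int n = acc.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort classifier indices from least to most accurate.
        Arrays.sort(order, Comparator.comparingDouble(i -> acc[i]));

        double[] points = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && acc[order[j + 1]] == acc[order[i]]) j++;  // span the tie
            double shared = (i + j) / 2.0;                                // e.g. ranks 3 and 4 -> 3.5 each
            for (int k = i; k <= j; k++) points[order[k]] = shared;
            i = j + 1;
        }
        return points;
    }

    public static void main(String[] args) {
        // Hypothetical accuracies for the five classifiers on one dataset.
        double[] acc = {95.1, 95.3, 95.6, 94.8, 95.6};
        System.out.println(Arrays.toString(rankPoints(acc)));  // -> [1.0, 2.0, 3.5, 0.0, 3.5]
    }
}
```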

Dataset           RandomForest   AdaBoostM1   Bagging   AdaBoostM1/Bagging   Bagging/AdaBoostM1
Iris                   1              2          3.5             0                   3.5
Contact-Lenses         1              2          3.5             0                   3.5
Breast-Cancer          2              1          4               0                   3
German-credit          4              2          3               0                   1
pima_diabetes          1              2          4               0                   3
Glass                  3              2          0               4                   1
ionosphere             1              0          3.5             2                   3.5
Labor-neg-data         0              1          2.5             4                   2.5
segment                2              4          1               3                   0
soybean                4              1          3               0                   2
unbalanced             1              2          3.5             0                   3.5
vote                   3              1          4               0                   2
Average                1.9            1.67       2.96            1.08                2.37

The actual Experimenter results are available, if of interest.

Variance

I had thought I would do something for variance as well, but I don't think I'm going to take the time right now.

Conclusions

These are not really large improvements, and none of them are indicated as significant by the Experimenter. I do believe they are real improvements, though, and not spurious, better-looking results due to over-fitting the training data. This is based on the improvement having held up for my Kaggle Forest Cover competition entry: run against the test set there, it definitely improved.

If the numbers to the right of the decimal point matter to you, and you want the most accuracy possible, then you might find this of interest. Remember that chaining ensembles does involve more overhead: it uses more memory and runs longer.