Weka on Graal


Performance

Running jconsole against the downloaded OS X Weka application and against a GraalVM-based Weka application I built with the jpackage command, I noticed what seemed to be slight increases in speed and possibly significant improvements in memory usage. Since improved memory management would mean Weka could be run against larger datasets, or for longer periods without running out of memory, I decided to do additional profiling to verify the possible improvement.

My initial benchmarking began with a simple command line tool that runs a named Weka classifier with default settings against a given dataset for a given number of iterations. The dataset came from an old Kaggle competition I participated in on classifying forest cover; the actual file, if of interest, is forest-train-sparse.arff, converted as you can see to Weka's sparse arff format. The classifier I used was RandomForest. The command line invocations were, for the JDK...

java -cp .:/Applications/weka-3-9-3/weka.jar us.hall.weka.BenchMarkClassifier weka.classifiers.trees.RandomForest /Users/mjh/forest-train-sparse.arff 5

run against the latest installed JDK, which right now is an early access JDK 14. This should eliminate, at least for this testing, the possibility that any differences come from comparing an older JDK against the JDK 11 based GraalVM. I started out trying to extract information on the individual memory pools. One thought was to determine the maximum usage: if Graal reduced that considerably, it would in turn reduce the chances of memory errors. The pool peak values appeared to provide this data, but I didn't really follow up on the idea. It is easier to ignore the pools and simply look at the "Heap space and non-Heap space" totals; the conclusions I reach can be derived from those.
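
For anyone who wants to pull the same numbers themselves, both the per-pool peaks and the heap/non-heap totals are available through java.lang.management. Something along these lines (a minimal sketch, not my actual benchmark code):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryPoolMXBean;

    public class MemorySnapshot {
        public static void main(String[] args) {
            // Per-pool peak usage, the numbers I initially thought about tracking
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getPeakUsage() != null) {
                    System.out.println(pool.getName() + " peak used: "
                            + pool.getPeakUsage().getUsed());
                }
            }
            // The simpler heap / non-heap totals the comparisons actually rely on
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            System.out.println("Heap used:     " + mem.getHeapMemoryUsage().getUsed());
            System.out.println("Non-heap used: " + mem.getNonHeapMemoryUsage().getUsed());
        }
    }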

For GraalVM, the current JDK 11 based version...
/usr/libexec/java_home -v 11 --exec java -cp .:/Applications/weka-3-9-3/weka.jar us.hall.weka.BenchMarkClassifier weka.classifiers.trees.RandomForest /Users/mjh/forest-train-sparse.arff 5

The source BenchMarkClassifier.java
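
The essential shape of it is roughly the following, a simplified sketch rather than the linked file verbatim:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import weka.classifiers.AbstractClassifier;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BenchMarkClassifier {
        public static void main(String[] args) throws Exception {
            String classifierName = args[0];           // e.g. weka.classifiers.trees.RandomForest
            String arffPath = args[1];                 // e.g. forest-train-sparse.arff
            int iterations = Integer.parseInt(args[2]);

            Instances data = new DataSource(arffPath).getDataSet();
            data.setClassIndex(data.numAttributes() - 1);
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

            for (int i = 0; i < iterations; i++) {
                long start = System.currentTimeMillis();
                Classifier cls = AbstractClassifier.forName(classifierName, new String[0]);
                cls.buildClassifier(data);
                // Build and evaluate on the same data, enough to exercise time and memory
                Evaluation eval = new Evaluation(data);
                eval.evaluateModel(cls, data);
                System.out.println("iteration " + i
                        + " elapsed ms: " + (System.currentTimeMillis() - start)
                        + " heap used: " + memory.getHeapMemoryUsage().getUsed()
                        + " non-heap used: " + memory.getNonHeapMemoryUsage().getUsed());
            }
        }
    }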

The JDK 14 results...
prof_jdk.txt

for GraalVM 11...
prof_graal.txt

The tool ended up running three different loops for the provided iteration count, as I tried to get numbers that better matched what I expected from the initial profiling I had done.

Before getting into a detailed discussion of memory, please notice the elapsed times. GraalVM is consistently faster. Not dramatically faster, not twice as fast or anything like that, but it is quicker on every run.

Looking at memory, the first iteration set should provide the best numbers for gauging actual memory usage. It includes code from The 6 Memory Metrics You Should Track in Your Java Benchmarks, which tries to ensure that a garbage collection has been done and the heap has settled, giving the best static picture of the memory currently in use. The results here were a little confusing and disappointing: JDK 14 memory usage was consistently better than Graal 11's. That didn't agree with what I had been seeing with jconsole and VisualVM against the actual running applications, where Graal had seemed to be doing better.
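
The settle-then-measure idea amounts to roughly this idiom, sketched from memory rather than copied from the article or my source:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class SettledMemory {
        // Force collections until the used-heap reading stops shrinking,
        // then report what is left as the "settled" usage.
        public static long settledHeapUsed() {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            long used = memory.getHeapMemoryUsage().getUsed();
            for (int i = 0; i < 10; i++) {
                System.gc();
                try {
                    Thread.sleep(100);        // give the collector a moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                long now = memory.getHeapMemoryUsage().getUsed();
                if (now >= used) {
                    break;                    // no further improvement, call it settled
                }
                used = now;
            }
            return used;
        }

        public static void main(String[] args) {
            System.out.println("settled heap used: " + settledHeapUsed());
        }
    }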

When doing a number of classifications from the Weka Explorer application, one thing you can do if you run out of memory is to delete the result sets from prior runs. I'm not sure everything that is retained along with those, but for the second set of iterations I tried to roughly simulate it by saving the returned classification evaluations. The JDK version was still doing better.

It then occurred to me that settled usage wasn't actually the measurement of interest. During normal processing there is no guarantee that a garbage collection has been done, let alone that memory allocation has settled; a running application is dynamic, not static. So for the final set of iterations I saved the allocations and did no forced garbage collection.
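
Those two variations amount to roughly the following on top of the harness sketched earlier; the kept list loosely simulates Explorer result sets piling up, and nothing ever calls System.gc(). Again this is a sketch of the idea, not the actual code:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.util.ArrayList;
    import java.util.List;
    import weka.classifiers.AbstractClassifier;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.core.Instances;

    public class AllocationLoop {
        // Keep each Evaluation reachable and never force a collection,
        // just sample the heap after each iteration.
        public static void run(String classifierName, Instances data, int iterations)
                throws Exception {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            List<Evaluation> kept = new ArrayList<>();
            for (int i = 0; i < iterations; i++) {
                Classifier cls = AbstractClassifier.forName(classifierName, new String[0]);
                cls.buildClassifier(data);
                Evaluation eval = new Evaluation(data);
                eval.evaluateModel(cls, data);
                kept.add(eval);               // retained on purpose
                System.out.println("iteration " + i + " heap used, no forced GC: "
                        + memory.getHeapMemoryUsage().getUsed());
            }
        }
    }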

Here is where Graal started showing better results. Looking just at the final allocations from the last iteration, JDK 14 showed 637,141,664 bytes, while Graal showed only 319,768,240. So it appears, from my limited testing, that the JDK is better at making garbage, that is, at freeing objects for collection, while Graal is better at garbage collecting, more actively reclaiming the unused so that the total of used plus uncollected is lower on an ongoing basis. My initial impression that Graal handles memory better seems to be wrong in one sense but correct in another.

That is command line testing; maybe for some reason it isn't the same for the applications? I did one test of 20 RandomForest classifications in each app on the same dataset. Watching in VisualVM, after 10 runs the downloaded Weka app showed 1,558,661,856 B, then 884,841,904 B after hitting "Perform GC". For Graal it was 1,193,506,968 B before and 877,121,184 B after. After 10 more runs on each, the JDK app showed 2,551,598,728 B before GC and 1,705,074,696 B after; Graal showed 1,981,686,032 B before and 1,707,950,984 B after. So more or less the same pattern holds, with Graal showing less allocation before a GC and the downloaded Weka version showing better or about the same after a GC. It was not quite as far ahead after GC here, possibly because the downloaded Weka currently uses an Amazon Corretto 1.8 JDK rather than JDK 14?

I still hoped to have a test where a non-Graal JDK hit an actual memory fault and Graal didn't. My first try at this was BoomBenchMarkClassifier.java, which runs the Weka Bagging meta classifier on top of the RandomForest classifier to chew up more memory. It does so in an infinite loop so I could see at what point a given version hit a memory fault: run until it blew up and went boom. Watching a run in VisualVM, it appeared to churn a lot of memory on its first pass, then did a garbage collection and proceeded in a very stable manner. I decided it would take forever, cancelled the run, and gave up on this approach.
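
The heart of it was roughly this, again a sketch rather than the file verbatim:

    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.Bagging;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BoomBenchMarkClassifier {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource(args[0]).getDataSet();
            data.setClassIndex(data.numAttributes() - 1);
            int run = 0;
            while (true) {                   // run until it goes boom with an OutOfMemoryError
                Bagging bagging = new Bagging();
                bagging.setClassifier(new RandomForest());
                bagging.buildClassifier(data);
                Evaluation eval = new Evaluation(data);
                eval.evaluateModel(bagging, data);
                System.out.println("completed run " + (++run));
            }
        }
    }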

Also, yes I'm aware that RandomForest itself does bagging, so using it with the Bagging meta classifier may seem redundant. However, in a number of Kaggle competitions doing this actually seemed to provide my best Weka results.

So instead, still using RandomForest, I started increasing its iterations parameter from within the application itself. This is better in that the test runs in the application, against the JDK we actually want to compare to. The downloaded application hit a memory fault at 1600 iterations. From weka.log...

INFO: Command: weka.classifiers.trees.RandomForest -P 100 -I 1600 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1
Exception in thread "Thread-6" java.lang.OutOfMemoryError: Java heap space

Note the -I 1600. Graal can successfully run at 1800 iterations.

INFO: Command: weka.classifiers.trees.RandomForest -P 100 -I 1800 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1
2020-04-24 17:38:14 weka.gui.explorer.ClassifierPanel$18 run
INFO: Finished weka.classifiers.trees.RandomForest

This still seemed somewhat artificial. It shows a contrived situation where a Graal based version of the application handles memory better than the downloaded one, but only by increasing a parameter to possibly unrealistic levels. The main situation of interest to me, that Graal could possibly handle larger datasets, had not been shown. My first attempt to demonstrate it didn't work: I duplicated records from the forest cover dataset until I had a version about 39M bytes in size, and that ran slowly but fine with the downloaded Weka.

Having reached this point without anything to show, I started searching the internet for large Weka arff files. I found the Auto-WEKA : Sample Datasets page with the 199M CIFAR-10 dataset; once downloaded, the train.arff file is actually 549.3M. Running default RandomForest against it with the downloaded Weka application memory faulted, showing this in weka.log...

2020-04-25 17:44:28 weka.gui.explorer.ClassifierPanel$18 run
INFO: Started weka.classifiers.trees.RandomForest
2020-04-25 17:44:28 weka.gui.explorer.ClassifierPanel$18 run
INFO: Command: weka.classifiers.trees.RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1

displayed message:
Not enough memory (less than 50MB left on heap). Please load a smaller dataset or use a larger heap size.
- initial heap size: 256MB
- current memory (heap) used: 3959.6MB
- max. memory (heap) available: 4008.5MB

Note:
The Java heap size can be specified with the -Xmx option.
E.g., to use 128MB as heap size, the command line looks like this:
    java -Xmx128m -classpath ...
This does NOT work in the SimpleCLI, the above java command refers
to the one with which Weka is started. See the Weka FAQ on the web
for further info.

exiting...

Running it with the GraalVM application succeeded, with this in weka.log...

2020-04-25 16:49:36 weka.gui.explorer.ClassifierPanel$18 run
INFO: Started weka.classifiers.trees.RandomForest
2020-04-25 16:49:36 weka.gui.explorer.ClassifierPanel$18 run
INFO: Command: weka.classifiers.trees.RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1
2020-04-25 17:39:20 weka.gui.explorer.ClassifierPanel$18 run
INFO: Finished weka.classifiers.trees.RandomForest
Warning : data contains more attributes than can be displayed as attribute bars.
2020-04-25 17:39:21 weka.gui.visualize.VisualizePanel$PlotPanel addPlot
INFO: Warning : data contains more attributes than can be displayed as attribute bars.

So, yes, the GraalVM version can in some cases handle larger datasets that the currently used JDK can't. Default RandomForest didn't do particularly well, at less than 50% accuracy, but it ran.

From my limited experience, to go with my limited testing, the trend these days seems to be toward larger and larger datasets. An improved way to work with those would, I think, be of interest.

GraalVM's R support is still of interest to me, and I may also look at whether it can be incorporated into what Weka and the RPlugin do.

If you have read this far you might actually be interested in trying the GraalVM based application. Be aware that it is a sizable download, at about 667M.
OS X Download: WekaGraal-1.0.dmg

I have not done this for Windows or Unix; if there is interest in that, let me know here.

Or use the jpackage command to come up with your own. I used weka.sh, modified from a build script for another application of mine. I'm not sure it was modified entirely correctly; I think I had to make some manual tweaks to the app/*.cfg file. The main change was to drop the existing JDK memory settings and keep only what Weka includes, -Xss20M, to make sure we are comparing apples to apples. I did just notice, though, that I also include GC startup settings I picked up somewhere, -XX:+UseG1GC -XX:MaxGCPauseMillis=50, which may differ from default Weka and may have given Graal an edge. Maybe additional testing for next week. The weka-in input directory contains only weka.jar.
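
If you go that route, the script boils down to a jpackage invocation along these lines; the runtime path is a placeholder, weka.gui.GUIChooser is the usual Weka GUI entry point, and the exact flags are in weka.sh, so treat this as a rough illustration rather than the script itself:

    jpackage --type dmg --name WekaGraal \
        --input weka-in --main-jar weka.jar --main-class weka.gui.GUIChooser \
        --runtime-image /path/to/graalvm-ce-java11/Contents/Home \
        --java-options "-Xss20M -XX:+UseG1GC -XX:MaxGCPauseMillis=50"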

Weka is copyrighted. Experimenting with these I assume doesn't constitute infringement. However, I am not a lawyer. Please do not involve me in an international incident with New Zealand.

Links

jpackage Packaging Tool, although this now just indicates that it has been merged into the JDK 14 early access builds. Still incubator(?) status though.
JDK 14.0.1 General-Availability Release, which I notice now shows general availability rather than early access.
High-performance polyglot VM GraalVM
GraalVM Native Image, which I just noticed. More improved memory performance?
WEKA The workbench for machine learning
Weka (machine learning)

Might still be busy testing next week.