A collection of benchmark results and findings regarding code optimization.
The performance of MATSim depends on many different factors.
To better understand under which circumstances MATSim performs best, we created a simple benchmark (performance test) that runs 20 iterations of a sample scenario with different settings. If you run the benchmark on your machine, we would be happy if you could send us your results.
Download the following zip-file: benchmark.zip [35MB]
Unzip the downloaded file.
Run the benchmark from the command line:
java -Xmx500m -jar Benchmark.jar
This generates a directory named output that contains files from the run. The test usually takes between 25 and 40 minutes. The benchmark requires Java 1.5 or newer and 150 MB of free disk space.
If you want to re-run the benchmark, rename or delete the output directory and run the test again.
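The re-run procedure described above can be scripted. The following Python sketch is one possible way to do it, not part of the benchmark itself: the function names and the timestamp suffix are hypothetical, and it assumes the script is started in the directory that contains Benchmark.jar.

```python
import os
import subprocess
import time


def archive_output(output_dir="output"):
    """Rename an existing output directory so the benchmark can run again.

    Returns the new directory name, or None if there was nothing to rename.
    """
    if not os.path.isdir(output_dir):
        return None
    # Hypothetical naming scheme: append the current date and time.
    target = f"{output_dir}-{time.strftime('%Y%m%d-%H%M%S')}"
    os.rename(output_dir, target)
    return target


def run_benchmark():
    """Archive any previous results, then start the benchmark.

    Uses the same command line as given in the instructions above and
    returns the exit code of the Java process.
    """
    archive_output()
    return subprocess.call(["java", "-Xmx500m", "-jar", "Benchmark.jar"])
```

Calling run_benchmark() keeps the results of earlier runs in renamed directories instead of deleting them, which makes it easier to compare several runs later.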
Please send an email to benchmark AT matsim DOT org containing:
output/stopwatch.txt
output/logfile.log
We collected some results and did a short analysis of them. Have a look at the results. If you want your results included as well, or have some interesting findings yourself, please send us your benchmark results.
The benchmark contains parts of the code running in parallel (the replanning part, using 4 threads) and other parts running single-threaded.
The benchmark was run on some of our production servers with different versions of Java Virtual Machines. Each server has two single-core "AMD Opteron Processor 248" CPUs running at 2.2 GHz (date of purchase: fall 2004, so they have to be considered "old").
The Java Virtual Machines tested were:
J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423ifx-20080811
"1.6.0_05; Sun Microsystems Inc.; mixed mode; 64-bit"
"-XX:+UseCompressedOops". Compressed Object Pointers should "improve performance of the 64-bit JRE when the Java object heap is less than 32 gigabytes in size" (see Java SE 6 Update 14 Release Notes). As an additional advantage, memory consumption should also be a bit lower when only 32 bits are used for object pointers.
"-XX:+UseCompressedOops -XX:+AggressiveOpts". In addition to Compressed Object Pointers, this also tries out the new "experimental implementation of java.util.TreeMap that can improve the performance", as MATSim makes heavy use of TreeMaps (although it does not necessarily iterate over them that often).
"-XX:+AggressiveOpts". Just for comparison, the 32-bit version of the JVM was also started with the aggressive optimization option.
The numbers shown in the graph above are the averages of two runs of the benchmark for each configuration (yes, two is not a big sample, we know, but it should still be valid to demonstrate some findings).
The most striking observation is the large difference in execution time between the 32-bit and 64-bit versions of the virtual machines. The "compressed object pointers" (COP) feature of Sun's JVM 6u14 seems to compensate for this nicely, making the 64-bit version about as fast as the 32-bit version. The aggressive optimization option (AO), on the other hand, does not seem to influence the performance of MATSim drastically.
While the difference in total execution time between Sun's Java 5 and Java 6 versions seems more or less random, the multithreaded replanning part appears to get faster when changing from Java 5 to Java 6. Interestingly, despite IBM's worse multithreaded performance, its JVM is able to catch up in the single-threaded parts and comes in with a total execution time similar to that of Sun's 64-bit JVMs.
The benchmark was also run on other server machines:

Comparing the AMD servers running at 2.2 GHz and 3.0 GHz, the most obvious difference is the time for the replanning, which can be explained by the different number of cores the machines have. Interestingly, the remaining execution time did not really improve with the higher CPU speed, which suggests that the performance of the memory controller or the memory bus is limiting the speed of MATSim (both AMD servers seem to have a front side bus of 1000 MHz).
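Why only the replanning time reacts to the number of cores can be illustrated with a very simple model that splits an iteration into a parallel part and a single-threaded part. The following Python sketch is an illustration only: the function name and all timings are hypothetical and not taken from the benchmark results, and the model assumes that the parallel part scales ideally with the number of cores.

```python
def iteration_time(parallel_seconds, serial_seconds, threads, cores):
    """Estimate the wall-clock time of one iteration.

    parallel_seconds: time the replanning would take on a single core
    serial_seconds:   time spent in the single-threaded parts
    threads:          worker threads used for replanning (4 in the benchmark)
    cores:            CPU cores available on the machine
    """
    # Threads beyond the number of cores cannot run at the same time,
    # so the effective parallelism is limited by both values.
    effective = min(threads, cores)
    return parallel_seconds / effective + serial_seconds


# Hypothetical timings, chosen only to show the effect.
for cores in (1, 2, 4):
    total = iteration_time(parallel_seconds=40.0, serial_seconds=60.0,
                           threads=4, cores=cores)
    print(cores, "cores:", total, "seconds")
```

In this model, going from 2 to 4 cores shortens only the replanning share, while the single-threaded share stays the same no matter how many cores are added.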
The Intel servers were massively faster than the AMD machines. We do not yet know if it's the different memory controller, the faster memory bus, or if Sun's JDK is just more optimized for Intel processors. Anyway, the difference is striking. And each newer generation of Intel processors seems to deliver a real performance upgrade, even when running at a lower clock speed.
Finally, the benchmark was also run on some of our laptops: an Apple MacBook Pro with an Intel Core 2 Duo processor clocked at 2.33 GHz (model from fall 2006), running Mac OS X 10.5.7, and an IBM/Lenovo laptop with the same Intel Core 2 Duo processor at 2.33 GHz, running a 64-bit Gentoo Linux (kernel 2.6.28). Both laptops have a front side bus of 667 MHz, so from a technical viewpoint they are very similar.
On the Apple MacBook Pro, the following Java Virtual Machines were used:
On the Lenovo Laptop, an OpenJDK 6u0 64-bit JVM was used.

Surprisingly, Apple managed to make their 64-bit Java 6 a lot faster than the older 32-bit Java 5. It could, of course, also mean that their Java 5 offering is just very slow. The Mac port of OpenJDK 6 ("Soylatte") is even a bit faster, but that is likely due to the difference between 64-bit and 32-bit.
Are you able to run the MATSim benchmark even faster? Please tell us so! We're very interested in your benchmark results.
From time to time, MATSim writes information about its memory usage to the logfile. Plotting this information gives a jagged line running from left to right. The peaks and troughs in the plot are caused by the Java garbage collector, which frees memory only from time to time. Still, one can estimate the absolute minimum of memory required by MATSim by looking at the lower parts of the curve (that is, the points right after a garbage collection has run, which show all the memory that could not be collected).
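Reading the lower parts of such a jagged curve can also be done programmatically. The following Python sketch assumes that the memory values have already been extracted from the logfile into a plain list (the log format is not shown here, so the parsing step is left out). The function names are hypothetical, and taking the highest trough as the estimate is an assumption: not every trough corresponds to a full garbage collection, so the result is only a rough guess.

```python
def local_minima(samples):
    """Return the troughs of a memory-usage series.

    samples: used-memory values (for example in MB), in the order logged.
    A trough is a value that is lower than its predecessor and not higher
    than its successor, i.e. a point right after memory was freed.
    """
    minima = []
    for i in range(1, len(samples) - 1):
        if samples[i] < samples[i - 1] and samples[i] <= samples[i + 1]:
            minima.append(samples[i])
    return minima


def estimate_required_memory(samples):
    """Estimate the memory that could not be collected.

    Uses the highest trough, since the heap must be able to hold the
    largest amount of memory that survived a garbage collection.
    """
    minima = local_minima(samples)
    # Fall back to the overall minimum if the series contains no trough.
    return max(minima) if minima else min(samples)


# Hypothetical memory readings in MB, not taken from a real logfile.
readings = [120, 180, 240, 150, 210, 270, 160, 220, 300, 190, 250]
print(local_minima(readings))
print(estimate_required_memory(readings))
```

For the hypothetical readings above, the troughs are 150, 160 and 190 MB, so the sketch reports 190 MB as the estimate.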
Comparing the memory consumption in Sun's (currently) latest Java VM version holds no real surprises.

It can be clearly seen that the 64-bit JVM uses the largest amount of memory, because each object pointer takes up 8 bytes. The 32-bit JVM uses the least memory. The 64-bit JVM with compressed object pointers lies somewhere in between. Although I would have expected it to be comparable to the 32-bit JVM, it still uses a bit more memory for unknown reasons. Anyway, it is handy to know that one can now load larger scenarios on a machine with 32 GB of memory (or less). The memory savings, compared to a 64-bit JVM without compressed object pointers, should grow with the size of the scenario and the level of detail of the simulated network, so this feature really looks promising.
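The effect of the pointer size can be made concrete with a little arithmetic. The following Python sketch counts only the memory taken up by the object pointers themselves (8 bytes each in the plain 64-bit JVM, 4 bytes each with compressed object pointers); the function name and the number of references are hypothetical, and other differences between the JVMs are ignored.

```python
def reference_memory_mb(num_references, bytes_per_reference):
    """Memory used by object pointers alone, in megabytes."""
    return num_references * bytes_per_reference / (1024 * 1024)


# Hypothetical number of object references held by a loaded scenario.
references = 50_000_000

plain_64bit = reference_memory_mb(references, 8)   # uncompressed pointers
compressed = reference_memory_mb(references, 4)    # 32-bit compressed pointers

print(round(plain_64bit, 1), "MB with 8-byte pointers")
print(round(compressed, 1), "MB with 4-byte pointers")
print(round(plain_64bit - compressed, 1), "MB saved")
```

Because the saving is proportional to the number of references, it grows with the size of the scenario, which matches the expectation stated above.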
The following shows two different benchmarks using jdeqsim, parallel events handling, or both.
The first benchmark uses the ivtch-osm network (~60k links), while the second one uses a navteq network with many more links.
Zrh 10%, ivtch-osm network; computing times per iteration (computer = cluster4 = 2x "Dual-Core AMD Opteron Processor 2222", 3.0 GHz, 1000 MHz FSB). Runs 669, 676, 678, 679

Computing times per iteration. Scenario = navteq network of Switzerland; computer = cluster4 = servers with 2x "Dual-Core AMD Opteron Processor 2222", 3.0 GHz, 1000 MHz FSB.

The following list gives some hints on how to speed up MATSim simulation runs.
<module name="controler">
  <param name="writeEventsInterval" value="10" />
</module>
This will instruct the MATSim Controler to write events only every 10th iteration, which is usually enough for analyses.
<module name="controler">
  <param name="routingAlgorithmType" value="AStarLandmarks" />
</module>