jLuger.de - The LMAX Architecture

Recently I stumbled over a question on stackoverflow.com about how to significantly improve Java performance. The question was inspired by Martin Fowler's article The LMAX Architecture, in which he describes the architecture of a trading platform built by the company LMAX. They have built a Business Logic Processor that can handle 6 million orders per second. What's so great about it? Well, it is single-threaded. Yes, there is only one thread that executes the whole business logic.

The main trick is to keep all the data needed by the business logic thread in memory. This brought them a performance of 10K TPS. Then they tweaked memory handling (allocation/garbage collection) and removed unnecessary work (mainly synchronization).
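To make the idea more concrete, here is a minimal sketch of a single business logic thread that keeps its state in memory. It is not LMAX's code: the class SingleThreadedProcessor, the Order type and the account balances are made up for illustration, and a plain ConcurrentLinkedQueue stands in for whatever hand-off mechanism a real system would use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: names and types are illustrative, not LMAX's.
class SingleThreadedProcessor {

    // A made-up order event carrying only what this example needs.
    public static final class Order {
        final String account;
        final long amount;

        public Order(String account, long amount) {
            this.account = account;
            this.amount = amount;
        }
    }

    // All business state lives in a plain HashMap. Only the business logic
    // thread touches it, so no locks or concurrent map are needed.
    private final Map<String, Long> balances = new HashMap<>();

    // Other threads hand orders over through this thread-safe queue.
    private final Queue<Order> inbox = new ConcurrentLinkedQueue<>();

    // May be called from any thread.
    public void submit(Order order) {
        inbox.add(order);
    }

    // Must be called from the single business logic thread only.
    // Applies every queued order and returns how many were processed.
    public int drain() {
        int processed = 0;
        Order order;
        while ((order = inbox.poll()) != null) {
            balances.merge(order.account, order.amount, Long::sum);
            processed++;
        }
        return processed;
    }

    // Also meant for the business logic thread.
    public long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    public static void main(String[] args) {
        SingleThreadedProcessor processor = new SingleThreadedProcessor();
        processor.submit(new Order("alice", 100));
        processor.submit(new Order("alice", -30));
        processor.submit(new Order("bob", 5));
        int processed = processor.drain();
        System.out.println(processed + " orders, alice=" + processor.balance("alice"));
    }
}
```

The point of the sketch is that the only synchronized piece is the hand-off queue. The state itself is an ordinary map that never leaves the one thread.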

It's amazing what performance you get when you respect the memory hierarchy of computers and avoid unnecessary operations. The drawback of this solution is that it doesn't focus on the processing time that a user notices. When their business logic needs some external information, it stops processing the order and lets another thread fetch the data, which is then fed back as input for a later calculation. Imagine that 90% of the time a user spends waiting goes to bytes being sent over the wire. Their solution won't help you much, as it can only affect the remaining 10%. In that case you have to cut the time or amount of network transfer instead.
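The 90%/10% argument can be checked with a bit of arithmetic. The sketch below assumes the waiting time splits cleanly into a network share and a processing share, and that only the processing share gets faster. The class LatencyBudget and the method overallSpeedup are names I made up for this example.

```java
// Hypothetical helper for the 90%/10% thought experiment above.
class LatencyBudget {

    // Overall speedup seen by the user when only the processing part of
    // the wait is accelerated by the factor processingSpeedup.
    // networkShare is the fraction of the wait spent on the wire (0..1).
    public static double overallSpeedup(double networkShare, double processingSpeedup) {
        double processingShare = 1.0 - networkShare;
        return 1.0 / (networkShare + processingShare / processingSpeedup);
    }

    public static void main(String[] args) {
        // Processing becomes 10 times faster, the network stays as it is.
        System.out.println(overallSpeedup(0.9, 10.0));
        // Processing takes no time at all: the upper bound for this split.
        System.out.println(overallSpeedup(0.9, Double.POSITIVE_INFINITY));
    }
}
```

With a network share of 90%, making the processing 10 times faster shortens the wait by only about 9%, and even infinitely fast processing cannot get past a factor of roughly 1.11.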