Mark Price
Bit twiddler
Ever since I read some initial blog posts about the upcoming eBPF tracing functionality in the 4.x Linux kernel, I have been looking for an excuse to get to grips with this technology. With a planned kernel upgrade in progress at LMAX Exchange, I now have access to an interesting environment and workload in order to play around with BCC. BPF Compiler Collection: BCC is a collection of tools that allows the curious to express programs in C or Lua, and then load those programs as optimised kernel modules, hooked in to the runtime via a number of different mechanisms.
9 min read
In my last couple of posts, I’ve been looking at how UDP network packets are received by the Linux kernel. While diving through the source code, it has been shown that there are a number of statistics available for monitoring receive errors, buffer overruns, and queue depths. In the course of investigating network throughput issues in our systems at LMAX Exchange, we have written some tooling for monitoring the available statistics. The result of that work is a small utility that provides an interface for monitoring system-wide or socket-specific statistics from a Java program.
3 min read
Background: At work we practice continuous integration in terms of performance testing alongside different stages of functional testing. In order to do this, we have a performance environment that fully replicates the hardware and software used in our production environments. This is necessary in order to be able to find the limits of our system in terms of throughput and latency, and means that we make sure that the environments are identical, right down to the network cables.
12 min read
Continuing on from my last post, here we’ll be looking at flags used to control the C2 or server compiler of the Hotspot JVM. In writing this article, I discovered that the C2 compiler flags did not operate as I expected, and I’ve drawn some possibly incorrect conclusions about how to achieve the required effects. Any enlightenment from those in the know would be welcomed… Configuration: In order to reduce the noise created in the compilation logs, we’ll be disabling tiered compilation so that only the server compiler will be used.
7 min read
In this post, we will explore some of the various flags that can affect the operation of the JVM’s JIT compiler. Anything demonstrated in this post should come with a public health warning - these options are explored for reference only, and modifying them without being able to observe and reason about their effects should be avoided. You have been warned. The two compilers: The JVM that ships with OpenJDK contains two compiler back-ends: C1, also known as ‘client’, and C2, also known as ‘server’. The C1 compiler has a number of different modes, and will alter its response to a compilation request given a number of system factors, including, but not limited to, the current workload of the C1 & C2 compiler thread pool.
9 min read
LMAX Exchange developers are giving two talks at QCon London this year. Sam Adams, our Head of Software, will be discussing the awesome LMAX Continuous Delivery process in his talk “CD at LMAX: Testing into production and back again”. I will be talking about JVM warm-up strategies and how to inspect the machinations of the Hotspot compiler in “Hot code is faster code - addressing JVM warm-up”. If you’re at the conference, please come and say hello.
1 min read
Monitoring of various metrics is a large part of ensuring that our systems are behaving in the way that we expect. For low-latency systems in particular, we need to be able to develop an understanding of where in the system any latency spikes are occurring. Ideally, we want to be able to detect and diagnose a problem before it’s noticed by any of our customers. In order to do this, at LMAX Exchange we have developed extensive tracing capabilities that allow us to inspect request latency at many different points in our infrastructure.
12 min read
A few months ago, I wrote about how we had improved our journalling write latency at LMAX by upgrading our kernel and file-system. As a follow-up to some discussion on write techniques, I then explored the difference between a seek/write and positional write strategy. The journey did not end at that point, and we carried on testing to see if we could improve things even further. Our initial upgrade work involved changing the file-system from ext3 to ext4 (reflecting the default choice of the kernel version that we upgraded to).
7 min read
For the next instalment of this series on low-latency tuning at LMAX Exchange, I’m going to talk about reducing jitter introduced by the operating system. Our applications typically execute many threads, running within a JVM, which in turn runs atop the Linux operating system. Linux is a general-purpose multi-tasking OS, which can target phones, tablets, laptops, desktops and server-class machines. Due to this broad reach, it can sometimes be necessary to supply some guidance in order to achieve the lowest latency.
10 min read
I have been working on a small tool to measure the effects of system jitter within a JVM; it is a very simple app that measures inter-thread latencies. The tool’s primary purpose is to demonstrate the use of Linux performance tools such as perf_events and ftrace in finding causes of latency. Before using this tool for a demonstration, I wanted to make sure that it was going to actually behave in the way I intended. During testing, I seemed to always end up with a max inter-thread latency of around 100us.
4 min read
For the last few months at LMAX Exchange, we’ve been working on building out our next generation platform. Every few years we refresh our hardware and upgrade the machines that run our systems, and this time we decided to have a look at upgrading the operating system at the same time. When our first generation exchange was built, we were happy with low-millisecond-level mean latencies. After a couple of years of operation, we upgraded to newer hardware, made some significant software changes and ended up with mean end-to-end latencies of around 250 microseconds.
8 min read
My colleague Sam & I will be talking at JAX Finance next week (28th/29th April). I’ll be doing a talk with Vijay from Azul on our experiences at LMAX Exchange with deploying Zing to production. In the talk, we’ll discuss how to go about making such a change in a safe manner, some of the internals of Zing, and lessons learned along the way. Sam’s talk describes how we achieve high-throughput and low-latency at LMAX Exchange, and the architecture that we’ve developed to become the UK’s fastest growing tech firm.
1 min read
In my last post, I focussed on how to go about analysing your application’s inbound traffic in order to create a simulation for performance testing. One part of the analysis was determining how long to leave between each simulated message, and we saw that the majority of messages received by LMAX have less than one millisecond between them. Using this piece of data to create a simulation doesn’t really make sense - if our simulator should send messages with zero milliseconds delay between each, it will just sit in a loop sending messages as fast as possible.
5 min read