What is the best way to describe latency?

The latency of an individual transaction or process is best described by a single number: the duration of the process in an appropriate unit. When summarising many latency measurements, a good choice is to capture their distribution; that is, the minimum, the maximum, and a set of percentiles. The most useful percentiles are typically the median (50th percentile) and the first few nines: the 90th, 99th, and 99.9th.
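As a rough illustration of such a summary, the sketch below computes the minimum, maximum, and the percentiles named above from a list of measurements. The function names, the choice of microseconds as the unit, and the nearest-rank percentile definition are assumptions made for the example; other percentile definitions (for instance, interpolating ones) give slightly different values.

```python
import math
from fractions import Fraction


def percentile(ordered, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not ordered:
        raise ValueError("need at least one measurement")
    # Fraction(str(p)) keeps values such as 99.9 exact, so the rank is
    # not pushed up by floating-point rounding.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarise(latencies_us):
    """Summarise latencies (assumed to be in microseconds) as a distribution."""
    ordered = sorted(latencies_us)
    return {
        "min": ordered[0],
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
        "p99.9": percentile(ordered, 99.9),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical data: 1000 measurements of 1, 2, ..., 1000 microseconds.
    print(summarise(range(1, 1001)))
```

With the nearest-rank definition used here, the 99.9th percentile coincides with the maximum whenever there are fewer than 1000 measurements, so the higher nines only become informative with enough data.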

The distribution of trading latencies generally varies significantly over the day, with the greatest spread between minimum and maximum typically seen around market open and close. As a result, it is often useful to break the trading day into buckets of a few minutes each and to capture the evolving latency profile as the distribution of latency within each bucket. This makes it easy to produce a time series of, for example, the 99th percentile over the day.
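The bucketing idea can be sketched as follows. The input format (pairs of timestamp in seconds and latency in microseconds), the five-minute bucket width, and the function name are assumptions for illustration only; the percentile again uses the nearest-rank definition.

```python
from collections import defaultdict


def p99_by_bucket(samples, bucket_seconds=300):
    """Return a time series of (bucket_start_s, p99_latency), ordered by time.

    `samples` is assumed to be an iterable of (timestamp_s, latency_us) pairs.
    """
    buckets = defaultdict(list)
    for timestamp_s, latency_us in samples:
        # Align each timestamp to the start of its bucket.
        bucket_start = int(timestamp_s // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(latency_us)

    series = []
    for bucket_start in sorted(buckets):
        ordered = sorted(buckets[bucket_start])
        # Nearest-rank 99th percentile, using integer arithmetic for
        # ceil(0.99 * n) to avoid floating-point rounding.
        rank = (99 * len(ordered) + 99) // 100
        series.append((bucket_start, ordered[rank - 1]))
    return series


if __name__ == "__main__":
    # Hypothetical measurements: three in the first bucket, one in each of the next two.
    demo = [(0, 10), (10, 20), (299, 30), (300, 100), (601, 5)]
    for bucket_start, p99 in p99_by_bucket(demo):
        print(bucket_start, p99)
```

Buckets that receive no measurements simply do not appear in the series, so a plot of the result will show gaps rather than zeros for quiet periods.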

It is tempting to try to summarise a distribution by traditional statistics such as the mean, standard deviation, skewness, and kurtosis. However, these statistics are most meaningful when the underlying data follow a normal (Gaussian) distribution, or something close to it. In contrast, even simple latency distributions are often heavily skewed, with a clear minimum and a tail of fluctuations and outliers, and realistic data are often multimodal. As a result, the traditional statistics offer very little value in capturing or describing latency, and raw percentiles are generally much more effective.
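A small synthetic example may make the contrast concrete. The data below are invented for illustration: a fast mode at 100 microseconds, a slower mode at 500 microseconds, and a handful of 20 millisecond outliers. The function name and the nearest-rank percentile definition are likewise assumptions of the sketch.

```python
import statistics


def compare_summaries(latencies_us):
    """Return the mean/standard deviation and the percentiles of the same data."""
    ordered = sorted(latencies_us)
    n = len(ordered)

    def nearest_rank(per_mille):
        # Percentile expressed in tenths of a percent (999 means 99.9%),
        # so ceil(per_mille * n / 1000) can be done in integer arithmetic.
        rank = max(1, (per_mille * n + 999) // 1000)
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.pstdev(ordered),
        "min": ordered[0],
        "p50": nearest_rank(500),
        "p99": nearest_rank(990),
        "p99.9": nearest_rank(999),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Invented, bimodal data with a few large outliers.
    data = [100] * 900 + [500] * 90 + [20000] * 10
    for name, value in compare_summaries(data).items():
        print(f"{name:>6}: {value}")
```

For this data the mean is 335 microseconds, a value that no measurement actually took, and the mean minus one standard deviation is negative, well below the true minimum of 100. The percentiles, by contrast, report the two modes and the outliers directly: a median of 100, a 99th percentile of 500, and a 99.9th percentile of 20000.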