How is latency analyzed and eliminated in high-frequency trading?

Latency is eliminated by changing the trading system's software or infrastructure, and a wide variety of such changes is possible. Network changes include upgrading network switches, increasing bandwidth, and eliminating propagation delay by colocating with counterparties. Software changes are intimately tied to the code in use.

In all cases though, it is essential to properly analyze the profile of latency and to accurately assess which components of the system are driving the overall latency of the business processes. There's no point in spending money upgrading the network if the latency is primarily being incurred in software, and there's no point in wasting time trying to optimize code if the latency is primarily being caused by the exchange you're trading on, or the broker you're connecting through.

Latency analysis consists of breaking down the overall process into logical subprocesses, measuring the latency of each, and repeating the breakdown as necessary until the total latency is fully accounted for and specific actions can be identified to control or eliminate as much latency as possible.

Let us return to our earlier example: a trader reacts to an update to the price of a traded instrument by placing an order to trade at the new price. This process is sometimes referred to as tick-to-trade -- or more precisely as tick-to-order, since the placement of the order does not guarantee that a trade will actually ensue. The tick-to-order process might be broken down into three stages:

  • the price update, once formed in the matching engine, must be published and received by the trader;
  • the trader must make the decision to react to the new price;
  • the instruction to trade must be transmitted back to the exchange.
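
The three-stage breakdown above can be sketched in a few lines of Python. This is only an illustration: the checkpoint names, the sample timestamps, and the assumption that all timestamps come from clocks synchronized to a common epoch are hypothetical and not taken from any particular trading system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickToOrderTimestamps:
    """Hypothetical checkpoints, in nanoseconds since a common epoch."""
    tick_formed_ns: int     # price update formed in the matching engine
    tick_received_ns: int   # price update received by the trader
    order_sent_ns: int      # trader has decided and sends the order
    order_arrived_ns: int   # order instruction reaches the exchange


def stage_latencies(ts: TickToOrderTimestamps) -> dict:
    """Split tick-to-order latency into the three stages described above."""
    return {
        "market_data_delivery": ts.tick_received_ns - ts.tick_formed_ns,
        "trade_decision": ts.order_sent_ns - ts.tick_received_ns,
        "order_transmission": ts.order_arrived_ns - ts.order_sent_ns,
    }


# Made-up sample values, for illustration only.
sample = TickToOrderTimestamps(1_000_000, 1_045_000, 1_052_000, 1_090_000)
stages = stage_latencies(sample)
total = sample.order_arrived_ns - sample.tick_formed_ns

# The stages must account for the whole tick-to-order latency.
assert sum(stages.values()) == total

for name, ns in stages.items():
    print(f"{name}: {ns / 1000:.1f} us ({100 * ns / total:.0f}% of total)")
```

Reporting each stage as a share of the total makes it easier to see which component is worth the effort of optimizing.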

Any of these stages can in turn be further broken down; for example, the first stage, the delivery of the price update,

  • starts with the update to the central limit order book in the matching engine,
  • ends with the consumption of the price update in the trader's strategy,
  • and along the way,
    • the price tick is published by the exchange market-data feed assembler,
    • the multicast packet containing that tick is routed across the network,
    • and the tick is received and decoded by a feed-handler,

to name just a few of the lower-level events that happen as part of market-data publication and delivery.
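
A finer-grained breakdown of this kind might be computed from an ordered list of timestamped checkpoints, as in the sketch below. The checkpoint names and values are hypothetical, and the timestamps are assumed to be comparable across hosts, which in practice depends on clock synchronization.

```python
def segment_latencies(checkpoints):
    """Return the latency between each pair of consecutive checkpoints.

    checkpoints: ordered list of (name, timestamp_ns) tuples.
    """
    return [
        (f"{name_a} -> {name_b}", t_b - t_a)
        for (name_a, t_a), (name_b, t_b) in zip(checkpoints, checkpoints[1:])
    ]


# Made-up checkpoints for the delivery of a single price update.
delivery = [
    ("book_updated", 0),
    ("tick_published", 6_000),
    ("packet_received", 31_000),
    ("tick_decoded", 39_000),
    ("strategy_consumed", 45_000),
]

segments = segment_latencies(delivery)

# The segments must add up to the latency of the stage being decomposed.
assert sum(ns for _, ns in segments) == delivery[-1][1] - delivery[0][1]

for name, ns in segments:
    print(f"{name}: {ns} ns")
```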

In this scenario, latency analysis might be triggered by the observation that fill-rates achieved by the trading strategy have fallen. Measurements of the tick-to-order latency might reveal a corresponding increase in latency, prompting a closer look at the constituent latencies. Order-response times and trade-decision latencies might not have changed, but market-data delivery latency has. In that case, the action to be taken depends on whether:

  • there has been an increase in latency across the feed-handler, or
  • the latency between publication by the exchange and receipt on the trader's network has increased, as evidenced by an increase in the difference between the publication timestamps embedded in the ticks and the network receipt timestamps.

In the latter case, the increase in latency needs to be raised with the market-data provider; in the former case, the feed-handler warrants further inspection -- maybe the market-data rates have increased beyond its design capacity.
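
The decision just described can be sketched as a comparison of median latencies between a baseline period and the current period. The function name, the sample layout, and the 20% tolerance are assumptions made for illustration; a real analysis would choose its own statistics and thresholds.

```python
from statistics import median


def diagnose_market_data(baseline, current, tolerance=1.2):
    """Report which market-data component has slowed down.

    baseline, current: lists of (publish_ns, receipt_ns, decoded_ns) tuples,
    where publish_ns is the publication timestamp embedded in the tick,
    receipt_ns is the network receipt timestamp, and decoded_ns is taken
    when the feed-handler has finished decoding the tick.
    tolerance: hypothetical ratio above which a median counts as increased.
    """
    def medians(samples):
        delivery = median(r - p for p, r, _ in samples)
        handler = median(d - r for _, r, d in samples)
        return delivery, handler

    base_delivery, base_handler = medians(baseline)
    cur_delivery, cur_handler = medians(current)

    findings = []
    if cur_delivery > tolerance * base_delivery:
        findings.append("delivery: raise with the market-data provider")
    if cur_handler > tolerance * base_handler:
        findings.append("feed-handler: inspect the feed-handler")
    return findings


# Made-up samples: delivery is unchanged, the feed-handler has slowed down.
baseline = [(0, 40, 45), (100, 141, 146), (200, 239, 244)]
current = [(0, 40, 52), (100, 141, 154), (200, 239, 250)]

findings = diagnose_market_data(baseline, current)
print(findings)
```

Medians are used here only because they are less sensitive to outliers than means; tail percentiles may matter more in practice.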