"Financial Service Technology America, today's latest financial news now..."

Performance Management for Latency-Intolerant Financial Trading Networks

Real-time performance monitoring and analysis to ensure application availability to end-users 

Overview

The call comes in to the help desk of your financial trading organization: “Pulling up customer files seems slow today.”

Current management tools in your network indicate your application servers, switches and routers, and network circuits are all up and running – “All systems are green!” they say. Individual groups within IT advise: “It’s not the database servers!” “The WAN links are up.” “No issues with the application servers.”

But there is still a problem – is this just one symptom? Could other applications be slow too? What about the multicast traffic distributing market data – is that degrading? Worse yet, could FIX-based electronic trading applications be affected too?

Introduction

The highly competitive financial trading industry is defined by volatile, unpredictable activity. A natural disaster in one corner of the world, political upheaval in a developing country, or an announcement from the U.S. Federal Reserve Chairman can each dramatically impact the global financial markets. And yet maintaining optimal, predictable, consistently high-quality network performance is essential. In fact, according to leading information technology journal Information Week, “a millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm.”

Essential to the day-to-day operation of trading houses, hedge funds, investment banks, foreign exchanges, and other global financial institutions and market exchanges with trading desks is the ability to identify and monitor microbursts, latency, and packet loss for critical market data and trading applications and their associated protocols. Applications and protocols such as FIX, multicast, TIBCO, OPRA, and others serve as the underlying business services that execute electronic trades, deliver market-related pricing and research data, and provide customer account information, all of which are necessary for day-to-day operations.

Additionally, the reliance on algorithmic trading is driving firms to re-architect their trading systems and networks for low latency and optimal performance. Firms are automating trade processes and adding throughput and processing power to their market data and trading systems. Data centers are being relocated closer to trading centers to reduce the distance trading traffic must travel, with the objective of improving performance, even if only by milliseconds. In some cases, market data delayed by tens of milliseconds may no longer be usable for trading purposes, which can make redesigning the network to minimize latency a new corporate initiative.

Just as importantly, customers, brokers, and traders depend on these networks 24 x 7 to check account balances and current stock prices, validate that orders have been executed, view investment and trading history, research investment alternatives, and talk via IP phones. This increased dependency on the network has brought many IT departments additional responsibilities and formidable challenges in managing, maintaining, and optimizing the performance of their valuable global networks.

A New Imperative

What network engineers in financial trading organizations need is a best-practices approach to detecting, diagnosing, and resolving network and application problems in order to reduce overall troubleshooting time. Metrics that reveal the health of market trading services are best quantified using deep packet inspection technology. Packet flow-based analysis delivers complete visibility into real-time operational intelligence at three levels: high-level, services-aware key performance indicators (KPIs); flow-based analysis of all the applications throughout the network; and the actual packets for deep forensic troubleshooting. This combined approach enables IT organizations to reduce the mean time to repair or restore network services (MTTR).

Services-Aware Early Detection

Those responsible for the health and well-being of trading networks need to begin looking to KPI metrics to provide the necessary network-wide analysis and visibility essential for early detection of emerging network or application problems. KPI metrics in the investment trading network environment analyze the health of the electronic trade lifecycle by answering the most challenging questions, including:

  • “How fast is this trading application running?”
  • “Is that response time acceptable?”
  • “Are there any errors for that application or network area?”
  • “Where are there service performance degradations?”

Some important KPIs to incorporate in an overall solution focus on tracking user-affecting issues, such as errors, packet loss, and response times for key applications. These are crucial metrics to investment organizations because they provide early detection of performance issues, potentially enabling them to avert a problem that degrades performance of trading applications, or worse, brings the system down.

An important component in deriving KPIs is using statistical behavior modeling to detect abnormal changes in network and application behavior in order to deliver early warning of performance issues. Performance analytics systems automatically learn the network’s behavior patterns and identify performance anomalies without the manual configuration and guesswork of setting thresholds. By using various time frames, the analytics system is able to detect different classes of problems, including short-term spikes, sustained shifts, and subtle, long-term performance drifts that are virtually impossible to catch manually.
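As a rough illustration of the idea, the following Python sketch learns a rolling baseline for a metric such as application response time and flags samples that deviate sharply from it. The class name `BaselineDetector`, the window size, and the three-standard-deviation limit are illustrative assumptions, not a description of any particular product's analytics.

```python
from collections import deque
from statistics import mean, stdev


class BaselineDetector:
    """Learns a rolling baseline and flags samples that deviate from it.

    Hypothetical sketch: window, z_limit and min_samples are assumed values.
    """

    def __init__(self, window, z_limit=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # most recent normal samples
        self.z_limit = z_limit               # deviations allowed before alarming
        self.min_samples = min_samples       # learning period before any alarm

    def observe(self, value):
        """Return True if value is anomalous against the learned baseline."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:
                anomalous = True
        # Anomalous samples are not learned, so a spike does not distort the baseline.
        if not anomalous:
            self.samples.append(value)
        return anomalous
```

Running one detector with a short window and another with a much longer window over the same metric is one simple way to approximate the different time frames mentioned above: the short window reacts to spikes, while the long window is better placed to notice gradual drift.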

Some specific KPIs essential to performance management of financial trading network environments include:

  • Alerting and KPIs derived from threshold alarms on traffic utilization, application volume, or deviations from acceptable response times for FIX or HTTP traffic.
  • Real-time microburst alarming
  • Identification of inter-packet delay or latency on a particular application stream and alarming if the gap exceeds a pre-defined limit
  • Robust error detection, such as monitoring out-of-sequence packets / retransmissions in real time, and the ability to generate an alarm when the gap exceeds a pre-defined value
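Two of the KPIs above, inter-packet gap alarming and microburst alarming, can be sketched in a few lines of Python. This is a minimal sketch that assumes packet timestamps (in microseconds) and sizes have already been extracted from a capture; the function names, the bucket width, and the 80% utilization threshold are hypothetical choices for illustration.

```python
def gap_alarms(timestamps_us, max_gap_us):
    """Return (packet_index, gap_us) for each inter-packet gap above max_gap_us."""
    pairs = zip(timestamps_us, timestamps_us[1:])
    return [
        (i, later - earlier)
        for i, (earlier, later) in enumerate(pairs, start=1)
        if later - earlier > max_gap_us
    ]


def microbursts(packets, bucket_us, link_bps, threshold=0.8):
    """Return start times (us) of buckets whose load exceeds threshold * link rate.

    packets is an iterable of (timestamp_us, size_bytes) tuples.
    The threshold of 0.8 is an assumed example value.
    """
    bits_per_bucket = {}
    for ts_us, size_bytes in packets:
        bucket = ts_us // bucket_us
        bits_per_bucket[bucket] = bits_per_bucket.get(bucket, 0) + size_bytes * 8
    # Bits the link can carry in one bucket, scaled by the alarm threshold.
    limit_bits = threshold * link_bps * bucket_us / 1_000_000
    return sorted(b * bucket_us for b, bits in bits_per_bucket.items() if bits > limit_bits)
```

The point of the small bucket is that a link can look lightly loaded when averaged over a second and still be saturated for a few hundred microseconds, which is exactly the condition that coarse utilization counters tend to miss.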

Problem Diagnosis with Flow-Based Analysis

Once a potential problem is identified using established KPIs, effective use of the application and packet flow-based analysis collected by monitoring major data centers, trading floors, and Internet access segments will help answer questions such as:

  • “Which network and application resources are being utilized?”
  • “How much bandwidth is each application consuming?”
  • “Are the applications operating in the proper QoS class?”
  • “Who is using the application?”
  • “How are multiple applications traversing the network affecting one another?”
  • “Where is the worst-performing CRM server or location?”

In short, flow-based analysis addresses how resources are being used. The needs of market trading companies demand real-time and historical packet flow information to keep the network running at peak performance with the lowest possible latency. Part of the challenge of fully realizing this level of flow analysis is being able to identify all applications in the network including well-known applications such as e-mail, HTTP and VoIP; web-based applications (by URL); complex applications like Citrix and SAP; and specialized financial trading applications.

Any solution selected should incorporate measuring, analyzing, and alarming in key areas of the investment trading enterprise network to troubleshoot and evaluate performance metrics with the following functionality:

  • Real-time analysis and historical reporting, including utilization, response time, hosts and conversations, of market trading applications such as FIX, OPRA, MDP, and PGM.
  • Evaluate FIX protocol application utilization and, when possible, break down and analyze activity by specific transaction type, e.g. administrative messages, FIX Order Single messages, and execution messages.
  • Examine traffic activity and application response time metrics and notify of degradations and identified failures in FIX trades, e.g. when a specific FIX Order Single does not have a corresponding execution message.
  • Ability to identify IP Multicast groups as unique business applications as well as view the interaction between publishers and groups
  • Ability to view TIBCO statistics, including error traffic and retransmissions
  • Alerting and KPIs derived from threshold alarms on traffic utilization, application volume, or deviations from acceptable response times for FIX or HTTP traffic.
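The check for a FIX Order Single with no corresponding execution message can be illustrated with a short Python sketch. It relies on standard FIX tag=value conventions: fields are separated by the SOH character, tag 35 is MsgType (`D` for Order Single, `8` for Execution Report), and tag 11 is ClOrdID. The function names are hypothetical, and the sketch assumes messages arrive in capture order and that orders and executions share the same ClOrdID; a production tool would also apply a timeout and handle cancel and replace flows.

```python
SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw):
    """Split a raw FIX tag=value message into a {tag: value} dict."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields


def unmatched_orders(messages):
    """Return ClOrdIDs of Order Single messages that saw no Execution Report."""
    pending = {}
    for raw in messages:
        msg = parse_fix(raw)
        msg_type, cl_ord_id = msg.get("35"), msg.get("11")
        if msg_type == "D":        # Order Single: start waiting for a response
            pending[cl_ord_id] = msg
        elif msg_type == "8":      # Execution Report: order was answered
            pending.pop(cl_ord_id, None)
    return list(pending)
```

Any ClOrdID still pending at the end of the observation window is a candidate for an alarm.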

Focused Troubleshooting with Deep Packet Analysis

For many situations, traditional monitoring approaches provide excellent analysis of networks, applications, conversations, response times and trending. However, at times, packet-level details are necessary to troubleshoot and identify the most challenging problems with complex, latency-sensitive trading services transported across global networks.

Continuous recording of the actual packets in the network provides an insurance policy of sorts to leverage in cases when post-event forensics and retrospective analysis are necessary. The addition of 24x7 continuous packet capture helps answer questions like:

  • “What exactly was the conversation exchange for a specific application?”
  • “How can we reconstruct a session from the recent past to see what happened?”
  • “Is there a poorly designed application causing the degradation?”

Your solution needs to retain continuously recorded packet flows from target interfaces in the core, distribution, and access layers of the network. This provides a complete set of trace files with both header and payload information for in-depth, post-event forensic data mining with microsecond granularity.

In diagnosing the most challenging packet-level problems, IT staff in financial services networks should look to engage sophisticated, integrated filters and automated analysis of monitored applications, such as OPRA or multicast TIBCO, and to launch bounce charts with built-in, contextual drill-downs to the trace file packet for quick viewing of the session details.

Packet-level analysis should include essential functionality for troubleshooting FIX-based applications, including the ability to:

  • Track Order ACK times
  • Summarize sessions per trading station
  • Drill down by ECN (Electronic Communications Network), Market, Currency, or Security Type
  • Search by Order ID, Symbol, or Trading Station
  • Analyze latency distribution by such elements as Trading Station or Time of Day
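To make the Order ACK and latency distribution items concrete, here is a minimal Python sketch that groups order-to-acknowledgement times by trading station and reports nearest-rank percentiles. The record layout `(station, order_ts_us, ack_ts_us)` and the function names are assumptions for illustration; in practice the timestamps would come from matched order and acknowledgement packets in the capture.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def ack_latency_by_station(records):
    """Summarize order ACK latency per station.

    records: iterable of (station, order_ts_us, ack_ts_us) tuples
    (a hypothetical layout, not a standard format).
    """
    latencies = defaultdict(list)
    for station, order_ts_us, ack_ts_us in records:
        latencies[station].append(ack_ts_us - order_ts_us)
    summary = {}
    for station, values in latencies.items():
        values.sort()
        summary[station] = {
            "count": len(values),
            "p50_us": percentile(values, 50),
            "p99_us": percentile(values, 99),
            "max_us": values[-1],
        }
    return summary
```

Reporting percentiles rather than a single average matters here because a handful of slow acknowledgements can hide behind a healthy-looking mean.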

Multicast metrics derived from forensic analysis of packet flow data need to cover:

  • Message loss identification
  • Message arrival rate trends per (S, G) source and group pair
  • IP delay variation – identify & analyze between packets
  • Retransmissions – identify retransmission requests
  • Out-of-sequence packet analysis
  • Microburst analysis – sub-millisecond bit rate capture & analysis
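Several of these multicast metrics reduce to bookkeeping over sequence numbers. The Python sketch below assumes each message in a given (S, G) stream carries a monotonically increasing application-level sequence number, which is common for market data feeds but is an assumption here, and classifies what is seen on the wire as lost, out-of-sequence, or duplicate. The function name and return layout are hypothetical.

```python
def analyze_sequence(seq_numbers):
    """Classify sequence numbers of one stream in arrival order.

    Returns messages still missing, the count of late (out-of-sequence)
    arrivals, and the count of duplicates (possible retransmissions).
    """
    missing = set()
    seen = set()
    highest = None
    out_of_sequence = 0
    duplicates = 0
    for seq in seq_numbers:
        if seq in seen:
            duplicates += 1
            continue
        seen.add(seq)
        if highest is None:
            highest = seq
        elif seq > highest:
            missing.update(range(highest + 1, seq))  # a gap opened
            highest = seq
        else:
            out_of_sequence += 1   # arrived after a later message
            missing.discard(seq)   # the gap was filled late
    return {
        "lost": sorted(missing),
        "out_of_sequence": out_of_sequence,
        "duplicates": duplicates,
    }
```

For example, feeding it the arrival order 1, 2, 3, 5, 4, 8, 8, 9 reports messages 6 and 7 as lost, one out-of-sequence arrival, and one duplicate.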

Performance Management in Use

The value of real-time performance management is best demonstrated by stories from IT organizations that have already implemented such solutions. In one case, an East Coast financial institution was experiencing intermittent delays with one of its market data applications and users were starting to complain. Using a packet-flow-based network and application performance management solution, they investigated the application response time and discovered that the delay was attributable to one specific, overwhelmed server, not the network. Once network operations identified the problem, the support team was able to redirect the clients to alternate servers for production data.

In another situation, the traffic to all the remote offices of a global investment services firm was monitored by a performance management solution that identified traffic volume increases and notified the IT organization. The IT staff was able to get ahead of a potential congestion problem by researching the reason for the increased traffic. They used the packets themselves to identify the specific middleware applications and discover where multiple sessions were pulling identical data across the wire to multiple endpoints, adding an unnecessary and potentially harmful volume of traffic to some of the circuits.

Having the detailed evidence from their own network in screens and reports, the Network Team was able to help application owners and middleware engineers architect an approach to combine some of the data streams across the WAN for local distribution, thereby sending unique traffic in only one session. This reconfiguration opened up bandwidth availability and in some cases avoided having to add bandwidth. Most importantly, it averted a potential degradation in trading services due to bandwidth congestion.

Decisive Returns

In what other industry but investment services could the saying “time is money” be more apropos? Real-time performance and transaction monitoring, analysis, and troubleshooting capabilities have greatly evolved. In fact, solutions today provide demonstrable reductions in mean time to restore services to their necessary business levels in many financial services organizations. A March 2007 Ashton, Metzler and Associates sponsored survey of 138 enterprise network engineering professionals revealed that engaging a real-time network monitoring and management solution had reduced their time to diagnose performance problems by approximately 69%. Users reported that problem diagnosis took an average of 9.1 hours with other tools and approaches, and an average of 2.8 hours once they deployed more holistic solutions incorporating KPI-to-Flow-to-Packet technologies.

For global financial services organizations, such dramatic IT productivity improvements provide a compelling incentive for deploying these real-time performance management solutions. In the case of that help desk call at the beginning of this article, putting such performance management capabilities in place will first reduce the need for, and frequency of, these calls through ongoing proactive analysis. Further, when users do experience poor performance, the same tools can be engaged to troubleshoot the problems as fast as possible. When money flows through the network, the ultimate payoff for rapid problem resolution is going to be recognized in customer service, retention, and financial rewards for your business.

“Wall Street's Quest to Process Data at the Speed of Light,” Information Week, April 23, 2007.

