"Financial Service Technology America, today's latest financial news now..."
25 May 2011

Business intelligence needs automation for information to flow

Tidal Software | www.tidalsoftware.com

Business Intelligence (BI) systems have increasingly moved into the role of providing the operational database for enterprises. For many businesses, this has been driven by the need for a whole-enterprise view of the daily state of the business, which in turn drives more effective, informed decisions.

By Wayne Greene

Competitive businesses have, in general, mastered the basics of effective operations of each silo or component of their business, because competing in today’s markets simply demands this level of core competence. Businesses must understand the full scope of their operations and the impact of each functional area if they are to improve operations, better serve customers, and speed delivery of new products and services to market. The only way to achieve a whole-enterprise view is for businesses to make better use of the vast amounts of data collected by all the enterprise systems and applications, and BI solutions are leading the way.

An example of where this form of data integration can be particularly important is in compiling whole customer profiles. These consolidate customer information and activity from a number of systems. This view is central to directing real-time upsell campaigns during customer service calls. Another application of the customer profile is as input to fraud detection systems. Many environments use multiple portfolio accounting systems and require consolidation of data across these systems for risk mitigation and portfolio analysis applications. Additional applications include financial modeling for cost allocation of services; trades and transfers reconciliation checking; and compliance and audit reporting. As the operational warehouse matures, new applications for the data typically emerge. These are based on the business’s experience with the information that is available.

With an Enterprise Job Scheduler, three critical steps in your data flow can be automated and orchestrated: preprocessing, ETL, and analysis.

A BI approach is just one of the solutions to enterprise-wide data integration. Its benefits are ready accessibility and scalability, but it does have its challenges. The information retrieved from a BI system is only as trustworthy as the data put into it. Generally, bad information is not the result of bad source data. More often, it is the result of pulling out-of-sync data into the system because of problems in the data processing flow. This process flow, referred to as Extract-Transform-Load (ETL), can not only produce bad or inaccurate information but can also suffer from problems of information availability, process auditability, and agility in responding to changing business needs. Bad information can lead to bad decisions, which can negatively affect the course of the business. The full scope of problems also leads to high costs for IT, potential issues in governance and compliance, and an inability to provide a whole-enterprise view that is kept current with business needs.
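The out-of-sync problem described above can be illustrated with a small pre-load check. This is only a sketch under stated assumptions: the source names, the dates, and the `out_of_sync_sources` helper are hypothetical and do not come from the article or from any particular ETL product.

```python
from datetime import date

# Hypothetical source names and dates, for illustration only.
LATEST_EXTRACTS = {
    "crm": date(2024, 1, 16),
    "portfolio_system_a": date(2024, 1, 16),
    "portfolio_system_b": date(2024, 1, 15),  # the newest file has not arrived yet
}


def out_of_sync_sources(latest_extracts, business_date):
    """Return the names of sources whose latest extract is not for business_date."""
    return sorted(
        name for name, as_of in latest_extracts.items() if as_of != business_date
    )


if __name__ == "__main__":
    stale = out_of_sync_sources(LATEST_EXTRACTS, date(2024, 1, 16))
    if stale:
        print("Hold the load; out-of-sync sources:", ", ".join(stale))
    else:
        print("All sources are in sync; safe to load.")
```

A check like this only detects the mismatch. Deciding whether to wait, retry, or alert an operator is the kind of logic that the rest of the article assigns to the scheduler.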

Scripts, Custom Code and Islands of Automation
In typical environments, the ETL processing flows are generally set up and maintained by the enterprise’s IT group. The group works in conjunction with other departments to identify the needed information and its role in the overall data warehouse. There can be hundreds if not thousands of data sources, including legacy databases, application databases, departmental data marts, inbound data sources from partners, and many other sources of information scattered throughout the IT landscape. Each source often has its own unique issues with access method, data content and quality, update and arrival schedules, and transformation requirements. To consolidate and integrate this information, each portion of the ETL process is often performed by a different type of technology, most of which have little if any management or administrative control.

Within the ETL tools themselves there is often a basic process scheduler to initiate, coordinate, and manage the process. However, these process schedulers are limited in their reach and typically can only manage operations inside the ETL tool. For process flows outside the tool, scripting and custom code are required. These individual custom and scripted solutions manage everything from preprocessing steps, to moving data between systems, to notifying users of data availability. Alerting operators and reporting errors is an afterthought in many of these systems. The overall result is a brittle solution that provides little visibility into the progress of the ETL process, little notification when a portion of the process fails, and no help in recovering from errors, and that is difficult to understand, fix, or evolve.

End-to-end Automation is the Key
To get the ETL process flows under control, it’s necessary to target the heart of the problem by standardizing on a platform for end-to-end automation and visibility of the data flow. A tool that is particularly well suited to providing this kind of automation is a distributed job scheduler. Enterprise distributed job schedulers provide the reach and the control to manage the ETL process as well as all the related input, output, and notification processes associated with the complete data flow.

Key to proper ETL processing is managing the sequence of events that orchestrates the steps in the process. This involves coordinating steps that may operate relatively autonomously. Enterprise job schedulers manage this orchestration as a series of dependencies between steps in the overall process. These dependencies can be defined in very sophisticated ways, involving a broad range of events and triggers that indicate the completion and outcome of a step, making it easier to ensure consistent execution. Through this dependency management mechanism, each subsequent step in a process is assured of having complete and accurate data. It also ensures that the next step in the process proceeds as soon as possible, maximizing throughput for the BI system. This standard feature of the job scheduler dramatically reduces the complexity involved in creating the end-to-end ETL process, with some companies building their complete ROI on this result alone. This level of automation is the only way to consistently ensure that the data needed will be in the right place at the right time. Typically, there are many different data pathways, all of which have multiple dependencies on each other and on the hardware and applications that are also part of the process.
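The dependency-driven ordering described above can be sketched with Python's standard `graphlib` module. This is a minimal illustration, not how any particular enterprise scheduler is implemented: the step names and the `execution_waves` helper are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the set of steps it depends on.
DEPENDENCIES = {
    "preprocess_trades": set(),
    "preprocess_accounts": set(),
    "etl_load_warehouse": {"preprocess_trades", "preprocess_accounts"},
    "run_risk_analysis": {"etl_load_warehouse"},
    "notify_users": {"run_risk_analysis"},
}


def execution_waves(dependencies):
    """Group steps into waves; every step in a wave has all its predecessors done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are complete
        waves.append(ready)
        sorter.done(*ready)  # mark them complete so their dependents become ready
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Steps in the same wave have no dependency on each other, so they could run in parallel, while the load step is released only once both preprocessing steps have finished. A real scheduler would mark a step done when an event or trigger reports its completion rather than immediately, as this sketch does.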

Importance of Visibility into the End-to-end Process
In most systems, there is also a lack of visibility into the end-to-end data process that can further hamper operation of the data stream. For example, with scripting, custom code, and embedded scheduling, it is generally impossible to know for certain exactly where a specific ETL process is at any point in time, when a particular process will be completed, or even what the entire process consists of. The result is no visibility into the process status for business users and downstream systems, no proactive notification of process issues at the source of the problem, and brittleness in the overall process implementation.
The enterprise job scheduler provides a single console and view into the end-to-end process execution. Not only does this console provide ready visibility of the progress of a specific process, but it also provides almost immediate visibility into any errors in processing steps as well as the location of these errors.
The single console can also provide a consolidated graphical view, assisting in documenting the overall process. This greatly enhances agility compared with the scripts, custom code, and embedded scheduler approach: the planning, design, and implementation of new ETL processes become much quicker, and the addition of new functionality becomes more resilient.

Recovery and Cascading Errors
No matter how well the system is managed, errors and failures inevitably occur: periodic problems with networks, servers, storage devices and applications cause processing steps to fail; data or application errors may produce corrupted data or truncated files; or unplanned system maintenance may take a required system offline. When everything is online and working as it should, the data flows as expected, but if a single resource is unavailable, the data flow stops working. This throws IT into problem-solving mode. For a system based on scripting, custom code, and embedded schedulers, these numerous problem points make the BI system fragile enough that IT spends most of its time identifying and solving problems.

A major problem is cascading errors. These occur when the custom solutions lack sufficient error detection or dependency management to properly halt execution of the data flow when a step fails. The result is a series of errors in the downstream steps, some of which fail completely or, even worse, go unnoticed and proceed with processing using erroneous or missing data.

With the automation of an enterprise job scheduler, the situation is quite different. Cascading errors are prevented right from the start, because dependency management ensures that each step in the process completes successfully before the next activity is launched. Additionally, isolation of errors is immediate because the scheduling console immediately highlights any failed processing steps, thus localizing the problem to the specific system component.
Reducing the time it takes to isolate errors and recover from their side effects, and eliminating delays in delivery of the data to the business, have also provided excellent ROI for justifying the investment in an enterprise scheduler for these systems.
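The way dependency management stops a cascade can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of any vendor's product: the step names, the simulated failure, and the `run_flow` helper are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_flow(dependencies, actions):
    """Run steps in dependency order; skip any step whose upstream did not succeed."""
    status = {}
    for step in TopologicalSorter(dependencies).static_order():
        blocked = [dep for dep in dependencies.get(step, ()) if status[dep] != "ok"]
        if blocked:
            # An upstream step failed or was skipped, so do not touch its output.
            status[step] = "skipped"
            continue
        try:
            actions[step]()
            status[step] = "ok"
        except Exception as error:
            status[step] = "failed"
            print(f"ALERT: {step} failed: {error}")  # stand-in for operator alerting
    return status


def transform_with_truncated_input():
    """Simulate a step that detects a truncated input file."""
    raise ValueError("input file truncated")


# Hypothetical four-step flow: extract -> transform -> load -> report.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}
ACTIONS = {
    "extract": lambda: None,
    "transform": transform_with_truncated_input,
    "load": lambda: None,
    "report": lambda: None,
}

if __name__ == "__main__":
    for step, state in run_flow(DEPENDENCIES, ACTIONS).items():
        print(f"{step}: {state}")
```

In this run the transform step fails, so the load and report steps are skipped instead of processing erroneous or missing data, and the failure is reported at the step where it occurred.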

Processing Windows and Tuning the System
The amount of data that enterprises use to run their businesses continues to increase. By creating a whole-enterprise view, this vast amount of information is being leveraged in ways that create tremendous value out of what would otherwise be mostly incomprehensible. However, processing this growing information base puts tremendous pressure on the IT team and on the IT infrastructure to keep up with the demands.

Much of the ETL processing occurs after business hours. However, with the increasing amount of data, this processing window becomes ever tighter, leaving less time to complete all the processing. Fortunately, automation with an enterprise job scheduler can help.

During this time many of the resources utilized during business hours for online transaction processing can be redirected towards the ETL processing, and many IT teams are beginning to utilize virtualization to facilitate this repurposing of resources. The enterprise job scheduler can manage this repurposing process, weaving in the management of the virtualization infrastructure as another dependency in the ETL process flow.

Other ETL processing is moving into business hours as globalization is further reducing the meaning of after hours and as the business demands more near-real-time access to the whole-enterprise view. Enterprise schedulers help here as well by driving more and more processing off real-time events and ensuring optimal utilization of resources by tightly managing dependencies. New technologies for Workload Automation are also helping to dynamically manage the application of processing resources and to mitigate contention between processes. This allows utilization of the IT infrastructure to be optimized to satisfy the business information needs while avoiding unnecessary hardware proliferation.
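The processing-window pressure discussed in this section can be estimated with a simple calculation. The sketch below compares a strictly sequential run with a dependency-driven run in which each step starts as soon as its predecessors finish. The step names, durations, and window length are invented for illustration, and the calculation assumes enough resources to run independent steps in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical steps, dependencies, and durations in minutes.
DEPENDENCIES = {
    "preprocess_trades": set(),
    "preprocess_accounts": set(),
    "etl_load_warehouse": {"preprocess_trades", "preprocess_accounts"},
    "run_analysis": {"etl_load_warehouse"},
}
MINUTES = {
    "preprocess_trades": 40,
    "preprocess_accounts": 25,
    "etl_load_warehouse": 120,
    "run_analysis": 60,
}
WINDOW_MINUTES = 240  # assumed length of the after-hours window


def earliest_finish(dependencies, minutes):
    """Return each step's earliest finish time, in minutes from the window start."""
    finish = {}
    for step in TopologicalSorter(dependencies).static_order():
        # A step can start once its slowest dependency has finished.
        start = max((finish[dep] for dep in dependencies.get(step, ())), default=0)
        finish[step] = start + minutes[step]
    return finish


if __name__ == "__main__":
    sequential = sum(MINUTES.values())
    dependency_driven = max(earliest_finish(DEPENDENCIES, MINUTES).values())
    print(f"sequential: {sequential} min, fits window: {sequential <= WINDOW_MINUTES}")
    print(
        f"dependency-driven: {dependency_driven} min, "
        f"fits window: {dependency_driven <= WINDOW_MINUTES}"
    )
```

With these made-up numbers, the sequential run needs 245 minutes and overruns the 240-minute window, while the dependency-driven run finishes in 220 minutes because the two preprocessing steps overlap.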

Conclusion
By improving the way in which data flows to the BI system are managed, businesses can gain greater value from their BI investment, enable their IT staff to focus on increasing the intelligence put out by the system, and improve the overall functioning of the business itself. This can be accomplished with an enterprise scheduling solution that is easy to use and covers heterogeneous environments. Many problems result when the ETL process is handled by a patchwork of scripting, custom coding, and the various built-in schedulers that are part of existing ETL solutions, because these systems do not provide end-to-end execution, monitoring, and control of the ETL process. An enterprise scheduler, with its broader coverage and important features such as the ability to handle dependencies, can dramatically raise the reliability and quality of BI and improve how BI systems teams use their time and resources to deliver rapid ROI. Ultimately, this approach frees resources to focus on exploring the type of information that can be extracted from the system, on how that information can be used, and on creative solutions to answering business questions based on reliable and meaningful data.

The above is an abbreviated version of the article 'Automating Data Flows to Your Business Intelligence Application'. For the full version, please visit www.tidalsoftware.com/BI_FST
