For those organisations that have moved into live running of business applications based on SOA, one of the (many) current headaches is monitoring and managing end-to-end transactions. Although application, network and infrastructure monitoring tools have been around for many years, the loosely coupled nature of SOA makes it hard to provide the transaction visibility, integrity and recovery capability that mainframe users have enjoyed since the 1970s.
Of course, this state of affairs is nothing new for an emerging technology standard such as SOA. The move from host-based computing to distributed client-server environments produced similar problems for transaction monitoring, and the introduction of two-phase commit protocols helped to provide better distributed transaction management.
Nowadays, application architectures typically have several layers that are not tightly integrated with each other. Most modern applications are accessed via a web browser across the internet/intranet, hosted on a portal server which in turn calls web services, possibly choreographed via an ESB, orchestrated by a process engine, run on one or more application servers, using business rules from a rules engine, calling legacy applications and databases on mainframes or servers in one or more data centres. And don’t get me started on where Software as a Service or Cloud Computing fits in…
So, when a web user of the business service experiences a problem (the application not responding, misbehaving or returning errors), how do we identify where the problem lies and rectify it?
It would be a fair assumption that you already have an enterprise monitoring framework providing monitoring data on networks, servers, security, databases and some applications. The main additional components that an SOA environment brings are the portal, web server, processes, enterprise service bus, services and application server. For many of these components discrete monitoring tools are either built in or available; in fact, if any layer of your current stack is not instrumented, you should consider replacing it with a product that provides the relevant performance statistics.
The two parts of the stack that won’t initially be ready for monitoring, not surprisingly, are the process layer and the services themselves. The business process typically runs either as Business Process Execution Language (BPEL) or within the black box of a specialist BPM tool. It is also possible that the process is choreographed within the portal layer, or written as an old-fashioned program-like service running in the application server; both of these architecturally unsound (but sometimes more practical) approaches still require the same monitoring instrumentation as processes and services do, because in all these cases there is limited built-in monitoring. Services, be they web services or other standard encapsulated code, are by their nature programs transforming an input into an output as specified. To find out what happens within a service, we need to ensure that the service tells us.
Therefore we are back to the old programming approach to providing insight into what is happening within code: alerts and flags. My experience is that you will currently still need to architect some code-based alerting into the processes and services. This is complicated by the need to understand the context in which the process or service will be invoked and consumed. One common approach is for the service to return a status message containing the transaction ID to a monitoring console or database on completion of the task; if there has been a problem, it instead returns an error code that can be actioned by your service monitoring infrastructure. However, in the loosely coupled world of SOA, the same service may be consumed by a number of different business processes, so the response to an error condition will need to meet the particular requirements of each consuming process.
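To make that concrete, here is a minimal sketch of this kind of code-based alerting. Everything here is hypothetical and illustrative: the `monitoring_events` list stands in for a real monitoring console or database, and `credit_check` is an invented example service.

```python
import time
import uuid

# Hypothetical in-memory sink standing in for a monitoring console or database.
monitoring_events = []

def report_status(txn_id, service, status, error_code=None, elapsed_ms=None):
    """Emit one status message for a service invocation (illustrative only)."""
    monitoring_events.append({
        "txn_id": txn_id,
        "service": service,
        "status": status,
        "error_code": error_code,
        "elapsed_ms": elapsed_ms,
    })

def monitored(service_name):
    """Decorator that makes a service 'tell us' what happened inside it:
    a status message with the transaction ID on completion, an error code
    on failure."""
    def wrap(fn):
        def inner(*args, txn_id=None, **kwargs):
            txn_id = txn_id or str(uuid.uuid4())
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                report_status(txn_id, service_name, "FAILED",
                              error_code=type(exc).__name__,
                              elapsed_ms=(time.monotonic() - start) * 1000)
                raise
            report_status(txn_id, service_name, "COMPLETED",
                          elapsed_ms=(time.monotonic() - start) * 1000)
            return result
        return inner
    return wrap

@monitored("credit-check")
def credit_check(customer_id):
    # Invented example service: approves any known (non-negative) customer.
    if customer_id < 0:
        raise ValueError("unknown customer")
    return "APPROVED"
```

A successful call records a COMPLETED event against the transaction ID; a failing call records FAILED with an error code, which the monitoring infrastructure can then action according to the needs of whichever process invoked the service.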
To understand this context requires an overall end-to-end Service Management Strategy, comprising the following:
- Business Service Monitoring Strategy. This defines the business metrics and events that need to be captured and measured during execution of the high-level business process or service.
- Business Transaction Management. In a traditional CICS-like Transaction Processing (TP) environment, each transaction is managed to provide trusted commit, rollback and recovery. In an SOA world you will need to emulate this across the whole business process. Some of the more sophisticated process engines and ESBs provide tooling that makes it reasonably easy to track each transaction, providing the audit trail to prove completion or enable rollback. However, you will still need to define and develop the recovery procedures yourself to cover the round trip.
- Business Activity Monitoring. Even if the transaction completes, there could still be performance issues or delays in returning its results. This requires more detailed activity monitoring of each component of your SOA stack to identify potential and actual bottlenecks. As you can imagine, in a large stack there could be a considerable number of components to monitor across the complete journey; expecting your monitoring team to keep track of all of this manually is unreasonable. An automated, script-based intelligent tracking system is required to meet the service levels your support teams or outsourcer will be held to.
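The rollback requirement under Business Transaction Management above is usually met with a compensation pattern, since a long-running business process cannot hold a database-style lock across loosely coupled services. A minimal sketch, with all class and step names invented for illustration:

```python
class BusinessTransaction:
    """Tracks each step of a business process together with a compensating
    action, so the whole round trip can be rolled back if a later step fails,
    and keeps an audit trail to prove completion."""

    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.audit_trail = []      # proves completion, step by step
        self._compensations = []   # undo actions, run in reverse on failure

    def run_step(self, name, action, compensate):
        try:
            action()
        except Exception:
            self.audit_trail.append((name, "FAILED"))
            self.rollback()
            raise
        self.audit_trail.append((name, "COMPLETED"))
        self._compensations.append((name, compensate))

    def rollback(self):
        # Compensate completed steps in reverse order of execution.
        for name, compensate in reversed(self._compensations):
            compensate()
            self.audit_trail.append((name, "COMPENSATED"))
        self._compensations.clear()
```

If, say, an order process reserves stock and then fails to take payment, the stock reservation is compensated and the audit trail records COMPLETED, FAILED and COMPENSATED entries for the round trip. The recovery procedures themselves (what each `compensate` callable actually does) remain yours to define, as noted above.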
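The automated tracking that Business Activity Monitoring calls for can be sketched very simply: compare the measured latency of each component in one end-to-end journey against a per-component threshold and surface the breaches. The component names and threshold figures below are invented for illustration, not real SLA values.

```python
# Per-component latency thresholds in milliseconds (illustrative figures only).
THRESHOLDS_MS = {
    "portal": 200,
    "esb": 50,
    "process-engine": 150,
    "legacy-db": 300,
}

def find_bottlenecks(journey):
    """Given measured latencies (ms) for one end-to-end journey, return the
    components that breached their threshold, worst overrun first."""
    breaches = [
        (component, elapsed - THRESHOLDS_MS[component])
        for component, elapsed in journey.items()
        if elapsed > THRESHOLDS_MS.get(component, float("inf"))
    ]
    return sorted(breaches, key=lambda b: b[1], reverse=True)
```

A real system would feed this from the instrumentation of each layer and raise alerts automatically, rather than relying on the monitoring team to spot the breaches by hand.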
Having worked on a large SOA transformation programme for the past few months, I have been wrestling with the challenges of delivering this. If you deliver all of the above, will you achieve end-to-end SOA monitoring? My experience is that this provides the groundwork. In the next article I will cover making this work within an ITIL Service Management framework, and with multiple service providers, as the resulting Service Level and Operational Level Agreements (SLAs/OLAs) and Underpinning Contracts (UPCs) significantly complicate the story.