
6.1 Introduction

The previous three chapters have described the basic middleware building blocks that can be used to implement distributed systems architectures for large-scale enterprise systems. Sometimes, however, these building blocks are not sufficient to enable developers to easily design and build complex architectures. In such cases, more advanced tools and designs are needed, which make it possible to address architectural issues with more powerful middleware technologies. This chapter describes two of these, namely message brokers and workflow engines, and analyses the strengths and weaknesses of these approaches.

6.2 Message Brokers

Basic messaging using MOM and publish–subscribe technologies suffices for many applications. It’s a simple, effective, and proven approach that can deliver high levels of performance and reliability.

MOM deployments start to get a little more complex though when message formats are not totally agreed among the various applications that communicate using the MOM. This problem occurs commonly in the domain of enterprise integration, where the basic problem is building business applications from large, complex legacy business systems that were never designed to work together and exchange information.

Enterprise integration is a whole field of study in itself (see Further Reading). From the perspective of this book however, enterprise integration has spawned an interesting and widely used class of middleware technologies, known as message brokers.

Let’s introduce message brokers by way of a motivating example. Assume an organization has four different legacy business systems that each hold information about customers.Footnote 1 Each of these four stores some common data about customers, as well as some unique data fields that others do not maintain. In addition, each of the applications has a different format for a customer record, and the individual field names are different across each (e.g., one uses ADDRESS, another LOCATION, as a field name for customer address data). To update customer data, a proprietary API is available for each legacy system.

While this is conceptually pretty simple, it’s a problem that many organizations have. So, let’s assume keeping the data consistent in each of these four applications is a problem for our hypothetical organization. Hence, they decide to implement a web site that allows customers to update their own details online. When this occurs, the data entered into the web page is passed to a web component in the web server (e.g., a servlet or ASP.NET page). The role of this component is to pass the updated data to each of the four legacy applications, so they can update their own customer data correctly.

The organization uses MOM to communicate between applications. Consequently, the web component formats a message with the new customer data and uses the MOM to send the message to each legacy system.Footnote 2 The message format, labeled In-format in Fig. 6.1, is an agreed format that the web component and all the legacy applications understand.

Fig. 6.1 Using MOM to communicate a customer data update to four legacy systems
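To make this concrete, here is a minimal sketch of what the web component's messaging code might look like using JMS. The queue names, the JNDI lookups, and the In-format fields are invented for the example; the real In-format would be whatever the teams have agreed on.

```java
// Sketch only: web component publishing the agreed In-format message to each
// legacy system's queue. Queue names and JNDI names are assumptions.
import javax.jms.*;
import javax.naming.InitialContext;

public class CustomerUpdatePublisher {

    private static final String[] LEGACY_QUEUES = {
        "queue/legacySystem1", "queue/legacySystem2",
        "queue/legacySystem3", "queue/legacySystem4"
    };

    public void publishUpdate(String customerId, String name, String address) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("ConnectionFactory");
        Connection connection = cf.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

            // Build the common In-format payload once (shown here as simple XML).
            String inFormat = "<customerUpdate>"
                    + "<id>" + customerId + "</id>"
                    + "<name>" + name + "</name>"
                    + "<address>" + address + "</address>"
                    + "</customerUpdate>";

            // Send the same In-format message to every legacy system's queue.
            for (String queueName : LEGACY_QUEUES) {
                Queue queue = (Queue) ctx.lookup(queueName);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage(inFormat));
            }
        } finally {
            connection.close();
        }
    }
}
```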

Each legacy system has a queue interface component that can read messages from the queue, and using the data in the message, create a call to the customer data update API that the legacy system supports. In this example, the interface component would read the message from the queue, extract the specific data fields from the message that it needs to call its legacy system’s API, and finally issue the API call. As shown in Fig. 6.2, the interface component is basically performing a transformation from the In-format to a format suitable for its associated legacy system.

Fig. 6.2 Message transformation from common to a legacy-specific format

So, for each legacy application, there is a dedicated component that executes the logic to transform the incoming message into a correctly formatted legacy system API call. The transformation is implemented in the program code of the component.
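A minimal sketch of one such interface component follows, again using JMS. The LegacyCustomerApi wrapper and the field names are assumptions made for the example; the point is simply that the In-format-to-legacy transformation lives in hand-written program code.

```java
// Sketch only: a legacy-system queue interface component that reads the
// In-format message, extracts the fields it needs, and calls the legacy API.
import javax.jms.*;

public class LegacyQueueInterface implements MessageListener {

    /** Hypothetical wrapper around this legacy system's proprietary update API. */
    public interface LegacyCustomerApi {
        void updateCustomer(String customerId, String location);
    }

    private final LegacyCustomerApi legacyApi;

    public LegacyQueueInterface(LegacyCustomerApi legacyApi) {
        this.legacyApi = legacyApi;
    }

    @Override
    public void onMessage(Message message) {
        try {
            String inFormat = ((TextMessage) message).getText();

            // The transformation lives here, in program code: pull out the fields
            // this system cares about and reshape/rename them as it requires.
            String id = extractField(inFormat, "id");
            String location = extractField(inFormat, "address"); // this system calls it LOCATION

            legacyApi.updateCustomer(id, location);
        } catch (JMSException e) {
            throw new RuntimeException("Failed to process customer update", e);
        }
    }

    // Naive field extraction, just for illustration.
    private String extractField(String xml, String tag) {
        int start = xml.indexOf("<" + tag + ">") + tag.length() + 2;
        int end = xml.indexOf("</" + tag + ">");
        return xml.substring(start, end);
    }
}
```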

This solution has some interesting implications:

  • If the common In-format message format changes, then the web component and every legacy system component that executes the transformation must be modified and tested.

  • If any legacy system API changes, then only the transformation for that system must be modified and tested.

  • Modifying any of the transformations most likely requires coordinating with the development team who are responsible for the upkeep of the legacy system(s). These development teams are the ones who know the intimate details of how to access the legacy system API.

Hence, there is a tight coupling between all the components in this architecture. This is caused by the need for them to agree on the message format that is communicated. In addition, in large organizations (or even harder, across organizational boundaries), communicating and coordinating changes to the common message format across multiple legacy system development teams can be slow and painful. It’s the sort of thing you’d like to avoid if possible.

The obvious alternative solution is to move the responsibility for the message format transformation to the web component. This guarantees that messages are sent to each legacy system interface component in exactly the format it needs to simply call the legacy API. The transformation complexity is now all in one place, the web component, and each legacy system interface component becomes simple: it basically reads a message from the queue and calls the associated API using the data in the message. Changes to the In-format message do not cause changes in legacy interface components, as only the web component needs modifying and testing. Changes to any legacy API, though, require the specific legacy system development team to request a new message format from the web component development team.

This is a much better solution as it reduces the number of changes needed to the various software systems involved (and remember, “change” means “test”). The major downside of this solution is the complexity of the web component. The transformation for each legacy system is embedded in its program code, making it prone to modification as it is effectively coupled to the message formats of every legacy system it communicates with.

This is where message brokers offer a potentially attractive alternative solution. Architecturally, a broker is a known architecture pattern Footnote 3 incorporating a component that decouples clients and servers by mediating the communications between them. Similarly, message broker middleware augments the capabilities of a MOM platform so that business logic related to integration can be executed within the broker. In our example, using a broker we could embed the message transformation rules for each legacy system within the broker, giving a solution as in Fig. 6.3.

Fig. 6.3 Decoupling clients and servers with a message broker

A message broker solution is attractive because it completely decouples the web component and the legacy interface components. The web component simply assembles and emits a message, and the broker transforms the message into the necessary format for each legacy system. It then sends an output message to the legacy system interface components in the precise format they desire.

A further attraction is the simplification of all the components in the system, as they now do not have to be concerned with message format transformation. The message transformation logic is localized within the message broker and becomes the responsibility of the integration group to maintain. Consequently, if changes are needed in the web or legacy system message formats, the development team responsible need only liaise with the integration group, whose job it is to correctly update the transformations.

It’s not a massive job to implement the broker pattern in conjunction with a standard MOM platform.Footnote 4 Such a solution would still have the disadvantage of defining the transformation logic in the program code. For simple transformations, this is no big deal, but many such applications involve complex transformations with fiddly string formatting and concatenations, formulas to calculate composite values, and so on. Nothing too difficult to write, but if there were a better solution that made creating complex transformations simple, I doubt many people would complain.

Message broker technologies begin to excel at this stage, because they provide specialized tools for:

  • Graphically describing complex message transformations between input formats and output formats. Transformations can be simple in terms of moving an input field value to an output field, or they can be defined using scripting languages (typically product specific) that can perform various formatting, data conversions, and mathematical transforms.

  • Executing transformations on high-performance, multithreaded transformation engines that can handle multiple simultaneous transformation requests.

  • Describing and executing message flows, in which an incoming message can be routed to different transformations and outputs depending on the values in the incoming message.

An example of a message mapping tool is shown in Fig. 6.4. This is Microsoft’s BizTalk Mapper and is typical of the class of mapping technologies. In BizTalk, the mapper can generate the transformations necessary to move data between two XML schemas, with the lines depicting the mapping between source and destination schemas. Scripts (not shown in the figure) can be associated with any mapping to define more complex mappings.

Fig. 6.4 A message broker mapping tool example

An example of a typical message routing definition tool is shown in Fig. 6.5. This is IBM’s WebSphere MQSI technology. It shows how an incoming message, delivered on a queue, can be processed according to some data value in the message. In the example, a Filter component inspects the incoming message field values, and based on specified conditions, executes one of two computations, or sends the message to one of two output queues. The message flow also defines exception handling logic, which is invoked when, for example, invalidly formatted messages are received.

Fig. 6.5 Message routing and processing
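To give a feel for what such a message flow encodes, here is a hand-coded sketch of the same filter-and-route idea using JMS. It is not MQSI itself, and the field name, threshold, and output queues are invented for the example; a broker lets you express this logic graphically rather than in code.

```java
// Sketch only: content-based routing with an exception-handling path.
import javax.jms.*;

public class OrderMessageRouter implements MessageListener {

    private final MessageProducer largeOrderQueue;    // orders over the threshold
    private final MessageProducer standardOrderQueue; // everything else
    private final MessageProducer invalidMessageQueue; // exception-handling path

    public OrderMessageRouter(MessageProducer largeOrderQueue,
                              MessageProducer standardOrderQueue,
                              MessageProducer invalidMessageQueue) {
        this.largeOrderQueue = largeOrderQueue;
        this.standardOrderQueue = standardOrderQueue;
        this.invalidMessageQueue = invalidMessageQueue;
    }

    @Override
    public void onMessage(Message message) {
        try {
            // Filter: inspect a field value in the incoming message ...
            double orderValue = message.getDoubleProperty("orderValue");

            // ... and route to one of two outputs based on a condition.
            if (orderValue > 10000.0) {
                largeOrderQueue.send(message);
            } else {
                standardOrderQueue.send(message);
            }
        } catch (JMSException badMessage) {
            // Invalidly formatted messages go to the exception-handling flow.
            try {
                invalidMessageQueue.send(message);
            } catch (JMSException ignored) {
                // A real flow would log and alert here.
            }
        }
    }
}
```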

Hence, message brokers are essentially highly specialized message transformation and routing engines. With their associated customized development tools, they make it simpler to define message transformations that can be:

  • Easily understood and modified without changing the participating applications.

  • Managed centrally, allowing a team responsible for application integration to coordinate and test changes.

  • Executed by a high-performance, multithreaded transformation engine.

Of course, as integration logic gets more and more complex, using a message broker to implement it essentially moves the complexity from the integration end points into the broker. Whether this is a good trade-off is an architectural design decision, based on the specifics of an enterprise and its technical and social environment. There are no simple answers, remember.

Importantly, message brokers operate on a per-message level. They receive an input message, transform it according to the message routing rules and logic, and output the resulting message or messages to their destinations. Brokers work best when these transformations are short lived and execute quickly, in a few milliseconds for example. This is because they are typically optimized for performance and hence try to avoid overheads that would slow down transformations. Consequently, if a broker or its host machine crashes, it relies on the fact that a failed transformation can simply be executed again from the beginning, meaning expensive state and transaction management is not needed. Note, however, that many message brokers do optionally support transactional messaging and even allow the broker to modify databases transactionally during transformation execution. These transactions are coordinated by an ACID transaction manager, such as the one supplied with the underlying MOM technology.
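The following sketch shows this receive-transform-send pattern inside a transacted JMS session. The queues and the transform step are placeholders; the point is that a rollback simply returns the input message to its queue, so the transformation can be rerun from the beginning.

```java
// Sketch only: broker-style transformation wrapped in a transacted JMS session.
import javax.jms.*;

public class TransactionalTransformer {

    public void run(Connection connection, Queue inQueue, Queue outQueue) throws JMSException {
        // 'true' makes the session transacted: the receive and the send are
        // grouped into one unit of work, committed or rolled back together.
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        MessageConsumer consumer = session.createConsumer(inQueue);
        MessageProducer producer = session.createProducer(outQueue);

        while (true) {
            TextMessage in = (TextMessage) consumer.receive();
            try {
                String transformed = transform(in.getText()); // the broker's mapping logic
                producer.send(session.createTextMessage(transformed));
                session.commit();   // input consumed and output sent atomically
            } catch (Exception e) {
                session.rollback(); // input goes back; transformation retried from scratch
            }
        }
    }

    private String transform(String payload) {
        // Placeholder for the formatting/mapping rules a broker would execute.
        return payload.toUpperCase();
    }
}
```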

For a large class of application integration scenarios, high-speed transformation is all that’s required. However, many business integration problems require the definition of a series of requests flowing between different applications. Each request may involve several message transformations, reads and updates to external database systems, and complex logic to control the flow of messages between applications and potentially even humans for offline decision making. For such problems, message brokers are insufficient, and well, you guessed it, even more technology is required. This is described in the next section.

Before moving on though, it should be emphasized that message brokers, like everything in software architecture and technologies, do have their downsides. First, many are proprietary technologies, and this leads to vendor lock-in. It's the price you pay for all those sophisticated development and deployment tools. Second, in high-volume messaging applications, the broker can become a bottleneck. Most message broker products support broker clustering to increase performance, scalability, and reliability, but this comes at the cost of complexity and dollars. Recently, open-source brokers such as MuleFootnote 5 have emerged. These are high-quality implementations and well worth considering in many integration scenarios.

6.3 Business Process Orchestration

Business processes in modern enterprises can be complex in terms of the number of enterprise applications that must be accessed and updated to complete the business service. As an example, Fig. 6.6 is a simple depiction of a sales order business process, in which the following sequence of events occurs.

Fig. 6.6 A typical business process

A customer places an order through a call center. Customer data is stored in a customer relationship management package (e.g., Oracle Siebel). Once the order is placed, the customer’s credit is validated using an external credit service, and the accounts payable database is updated to record the order and send an invoice to the customer.

Placing an order causes a message to be sent to Shipping, who update their inventory system and ship the order to the customer. When the customer receives the order, they pay for the goods and the payment is recorded in the accounts receivable system. All financial data are periodically extracted from the accounts systems and stored in an Oracle data warehouse for management reporting and archiving.

Implementing such business processes poses two major challenges. First, the time from when an order is placed to when payment is received might be several days or weeks, or even longer if items are out of stock. Somewhere, then, the current state of the business process for a given order, representing exactly which stage it has reached, must be stored, potentially for a long time. Losing this state, and hence the status of the order, is not a desirable option.

Second, exceptions in the order process can cause the state of the order to fail and rollback. For example, an order is taken for some stock item. Let’s assume that this stock is not available in the warehouse, and when it is reordered, the supplier tells the warehouse that the old stock is now obsolete, and that a newer, more expensive model will replace it. The customer is informed of this, and they decide to cancel the order. Canceling requires the order data to be removed from the warehouse, accounts payable, and Siebel systems. This is potentially a complex task to reliably and correctly perform.

This style of rollback behavior can be defined by the process designer using a facility known as a compensating transaction. Compensating transactions allow the process designer to explicitly define the logic required to undo a failed transaction that partially completed.

In long-running business processes such as sales order processing, standard ACID transactions, which lock all resources until the transaction completes, are not feasible. This is because they lock data in the business systems for potentially minutes, hours, or even weeks in order to achieve transaction isolation. Locked data cannot be accessed by concurrent transactions, and hence lock contention will cause these to wait (or more likely fail through timing out) until the locks are released. Such a situation is unlikely to produce high-performance and scalable business process implementations for long-running business processes.

Transactional behavior for long-running processes is therefore usually handled by grouping a number of process activities into a long-running transaction scope. Long-running transactions comprise multiple process activities that do not place locks on the data items they modify in the various business systems. Updates are made and committed locally at each business system. However, if any activity in the transaction scope fails, the designer must specify a compensating function. The role of the compensator is to undo the effects of the transaction that have already committed. Essentially this means undoing any changes the transaction had made, leaving the data in the same state as it was before the transaction commenced.
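As a rough illustration of the mechanics, the sketch below models a long-running transaction scope as a sequence of activities, each paired with a compensator that is invoked, in reverse order, if a later activity fails. It is a generic pattern, not any particular BPO product's API.

```java
// Sketch only: a long-running transaction scope with compensating actions.
import java.util.ArrayDeque;
import java.util.Deque;

public class LongRunningScope {

    /** An activity commits locally (holding no locks) and knows how to undo itself. */
    public interface Activity {
        void execute() throws Exception;  // e.g., update a business system and commit
        void compensate();                // e.g., remove or reverse that update
    }

    public void run(Iterable<Activity> activities) throws Exception {
        Deque<Activity> completed = new ArrayDeque<>();
        try {
            for (Activity activity : activities) {
                activity.execute();       // committed immediately at the business system
                completed.push(activity);
            }
        } catch (Exception failure) {
            // Undo the partial work: invoke compensators in reverse order.
            while (!completed.isEmpty()) {
                completed.pop().compensate();
            }
            throw failure;
        }
    }
}
```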

Long-running transactions are notoriously difficult to implement correctly. And sometimes they are simply impossible to implement sensibly – how do you compensate for a business process that has sent an e-mail confirming an order has shipped, or has mailed an invoice? So, technology support for compensating transactions doesn't eradicate these fundamental problems. However, it does provide the designer with a tool to make the existence of a long-running transaction explicit, and an execution framework that automatically calls the compensator when failures occur. For many problems, this is sufficient for building a workable solution.

As Fig. 6.7 illustrates, business process orchestration (BPO) platforms are designed to make implementing these long-running, highly integrated business processes relatively straightforward. BPO platforms are typically built as a layer that leverages some form of messaging infrastructure such as an SOA or a message broker. They augment the messaging layer with:

Fig. 6.7 Anatomy of a business process orchestration platform

  • State management: the state of an executing business process is stored persistently in a database. This makes it resilient to BPO server failure. Also, once the process state is stored in the database, it does not consume any computational resources in the BPO engine until that particular workflow instance is resumed (a minimal sketch of this idea follows the list).

  • Development tools: visual process definition tools are provided for defining business processes.

  • Deployment tools: these enable developers to easily link logical business process steps to the underlying business systems using various types of connectivity, including message queues, web protocols, SOAP, and file systems.
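The sketch below illustrates the dehydration and rehydration idea behind the state management point above: while a process instance waits (say, for a shipment or a payment), its state sits in a database and it holds no resources in the engine. The table and column names are invented for the example; a real BPO engine manages this persistence for you.

```java
// Sketch only: persisting and restoring process instance state via JDBC.
import java.sql.*;

public class ProcessStateStore {

    private final Connection db;  // JDBC connection to the BPO engine's database

    public ProcessStateStore(Connection db) {
        this.db = db;
    }

    /** Dehydrate: write the instance's state out and free engine resources. */
    public void dehydrate(String processInstanceId, String currentStep, byte[] serializedState)
            throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO process_instance (id, current_step, state) VALUES (?, ?, ?)")) {
            ps.setString(1, processInstanceId);
            ps.setString(2, currentStep);
            ps.setBytes(3, serializedState);
            ps.executeUpdate();
        }
    }

    /** Rehydrate: reload the instance's state when its next message arrives. */
    public byte[] rehydrate(String processInstanceId) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT state FROM process_instance WHERE id = ?")) {
            ps.setString(1, processInstanceId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBytes("state") : null;
            }
        }
    }
}
```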

An example from Microsoft’s BizTalk technology is shown in Fig. 6.8. This shows the design of a simple business process for the ordering example in Fig. 6.6. Messages are sent and received by activities in the process using ports. Ports basically connect to the business systems using a port-defined transport mechanism, for example, HTTP, a message queue or a file. All messages handled inside a BizTalk orchestration must be defined by XML schemas. Activities can be carried out in sequence or in parallel as shown in the example.

Fig. 6.8 BizTalk business process definition

BPO engines are the most recent addition to the IT middleware stack. The need for their functionality has been driven by the desire to automate more and more business processes that must access numerous independent business applications. There seems little doubt that this trend will continue as enterprises drive down costs by better integrating and coordinating their internal applications, and seamlessly connecting to external business partners.

6.4 Integration Architecture Issues

The difficulty of integrating heterogeneous applications in large enterprises is a serious one. While there are many issues to deal with in enterprise integration, at the core is an architectural problem concerning modifiability. The story goes like this.

Assume your enterprise has five different business applications that need integrating to support some new business processes. Like any sensible architect, you decide to implement these business processes one at a time (as you know a “big bang” approach is doomed to fail!).

The first process requires one of the business systems to send messages to each of the other four, using their published messaging interfaces. To do this, the sender must create a message payload in the format required by each business application. Assuming one-way messages only, this means our first business process must be able to transform its source data into four different message formats. Of course, if the other business systems decide to change their formats, then these transformations must be updated. What we’ve created with this design is a tight coupling, namely the message formats, between the source and destination business systems. This scenario is depicted in the left side of Fig. 6.9.

Fig. 6.9 Integrating applications in a point-to-point architecture

With the first business process working, and with many happy business users, you go on to incrementally build the remainder. When you’ve finished, you find you’ve created an architecture like that in the right side of Fig. 6.9. Each application sends messages to each of the other four, creating 20 interfaces, or dependencies, that need to be maintained. When one business application is modified, it’s possible that each of the others will need to update their message transformations to send messages in a newly required format.

This is a small-scale illustration of a problem that exists in thousands of organizations. I’ve seen enterprise software architectures that have 300 point-to-point interfaces between 40 or so standalone business applications. Changing an application’s message interface becomes a scary exercise in such enterprises, as so many other systems are dependent on it. Sometimes making changes is so scary, development teams just won’t do it. It’s simply too risky.

In the general case, the number of interfaces between N applications is (N² − N). For the five applications above that is 20; for 40 applications it is 1,560. So as N grows, the number of possible interfaces grows quadratically, making such point-to-point architectures nonscalable in terms of modifiability.

Now it’s true that very few enterprises have a fully connected point-to-point architecture such as that on the right side of Fig. 6.9. But it’s also true that many interfaces between two applications are two way, requiring two transformations. And most applications have more than one interface, so in reality the number of interfaces between two tightly coupled applications can be considerably greater than one.

Another name for a point-to-point architecture is a “spaghetti architecture”, hopefully for obvious reasons. When using this term, very few people are referring to spaghetti with the positive connotations usually associated with tasty Italian food. In fact, as the discipline of enterprise integration blossomed in the late 1990s, the emerging dogma was that spaghetti architectures should be avoided at all costs. The solution promoted, for many good reasons, was to use a message broker, as explained earlier in this chapter.

Let’s analyze exactly what happens when a spaghetti architecture is transformed using a message broker, as illustrated in Fig. 6.10. Complexity in the integration end points, namely the business applications, is greatly reduced as they just send messages using their native formats to the broker, and these are transformed inside the broker to the required destination format. If you need to change an end point, then you just need to modify the message transformations within the broker that are dependent on that end point. No other business applications know or care.

Fig. 6.10 Eliminating a point-to-point architecture with a message broker

Despite all these advantages to introducing a message broker, the no free lunch Footnote 6 principle, as always, applies. The downsides are:

  • The spaghetti architecture really still exists. It’s now resident inside the message broker, where complex dependencies between message formats are captured in broker-defined message transformations.

  • Brokers are potentially a performance bottleneck, as all the messages between applications must pass through the broker. Good brokers support replication and clustered deployments to scale their performance. But of course, this increases deployment and management complexity, and more than likely the license costs associated with a solution. Message broker vendors, perhaps not surprisingly, rarely see this last point as a disadvantage.

So message brokers are very useful, but not a panacea by any means for integration architectures. There is, however, a design approach that combines the scalability of a point-to-point architecture with the modifiability characteristics of a broker-based solution.

The solution is to define an enterprise data model (also known as a canonical data model) that becomes the target format for all message transformations between applications. For example, a common issue is that all your business systems have different data formats to define customer information. When one application integrates with another, it (or a message broker) must transform its customer message format to the target message format.

Now let’s assume we define a canonical message format for customer information. This can be used as the target format for any business application that needs to exchange customer-related data. Using this canonical message format, a message exchange is now reduced to the following steps:

  • Source application transforms local customer data into canonical customer information format.

  • Source sends message to target with canonical message format as payload.

  • Target receives message and transforms the canonical format into its own local customer data representation.

This means that each end point (business application) must know:

  • How to transform all messages it receives from the canonical format to its local format

  • How to transform all messages it sends from its local format to the canonical format

As Fig. 6.11 illustrates, by using the enterprise data model to exchange messages, we get the best of both worlds. The number of transformations is reduced to 2 * N (assuming a single interface between each end point). This gives us much better modifiability characteristics. Also, as there are now considerably fewer and less complex transformations to build, the transformations can be executed in the end points themselves. We have no need for a centralized, broker-style architecture. This scales well, as there's inherently no bottleneck in the design, and there's no need for additional hardware for the broker or the license costs of a broker solution.

Fig. 6.11 Integration using an enterprise data model
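The following sketch shows what the two end point transformations look like for a canonical customer format. The record types and field names are assumptions made for the example; the essential point is that each end point only ever maps between its own local format and the canonical one, which is what keeps the total at 2 * N.

```java
// Sketch only: an end point's pair of transformations to/from the canonical format.
public class CustomerEndpointTranslator {

    /** Canonical (enterprise data model) representation of a customer. */
    public record CanonicalCustomer(String id, String name, String address) {}

    /** This system's local representation; note it uses LOCATION for the address. */
    public record LocalCustomer(String custId, String fullName, String location) {}

    // Outbound: local format -> canonical format, applied before sending a message.
    public CanonicalCustomer toCanonical(LocalCustomer local) {
        return new CanonicalCustomer(local.custId(), local.fullName(), local.location());
    }

    // Inbound: canonical format -> local format, applied after receiving a message.
    public LocalCustomer fromCanonical(CanonicalCustomer canonical) {
        return new LocalCustomer(canonical.id(), canonical.name(), canonical.address());
    }
}
```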

I suspect some of you might be thinking that this is too good to be true. Perhaps there is at least a low cost lunch option here?

I’m sorry to disappoint you, but there are real reasons why this architecture is not ubiquitous in enterprise integration. The main one is the sheer difficulty of designing, and then getting agreement on, an enterprise data model in a large organization. In a green field site, the enterprise data model is something that can be designed upfront, and all end points can be mandated to adhere to it. But green field sites are rare, and most organizations’ enterprise systems have grown organically over many years, and rarely in a planned and coordinated manner. This is why broker-based solutions are successful. They recognize the reality of enterprise systems and the need for building many ad hoc transformations between systems in a maintainable way.

There are other impediments to establishing canonical data formats. If your systems integrate with a business partner’s applications over which you have no control, then it’s likely impossible to establish a single, agreed set of message formats. This problem has to be addressed on a much wider scale, where whole industry groups get together to define common message formats. A good example is RosettaNet,Footnote 7 which has defined protocols for automating supply chains in the semiconductor industry. As I’m sure you can imagine, none of this happens quickly.Footnote 8

For many organizations, the advantages of using an enterprise data model can only be incrementally exploited. For example, a new business systems installation might present opportunities to start defining elements of an enterprise data model and to build point-to-point architectures that exploit end point transformations to canonical formats. Or your broker might be about to be deprecated, forcing you to upgrade your transformation logic anyway. I’d recommend taking any chance you get.

6.5 What Is an Enterprise Service Bus?

You’ll see the term “ESB” used widely in the Service-Oriented Architecture literature. When I first heard this, I wondered what “Extra Special Bitter” had to do with software integration architectures, and when I found out it stood for Enterprise Service Bus, I was sorely disappointed. Anyway, here’s my admittedly somewhat cynical interpretation of where the acronym ESB came from.

Somewhere in the middle of the last decade (~2003–2005), SOA was becoming the “next big thing” in enterprise integration. Software vendors needed something new to help them sell their integration technology to support an SOA, so one of them (I’m not sure who was first) coined the term ESB. Suddenly, every vendor had an ESB, which was basically their message broker and business process orchestration technologies rebadged, with, of course, the ability to integrate web service end points. If you look under the covers of an ESB, you find all the technical elements and software integration approaches described in this and the last two chapters.

There are a lot of definitions out there for ESBs. All more or less agree that an ESB provides fundamental mechanisms for complex integration architectures via an event-driven and standards-based messaging engine. There’s some debate about whether an ESB is a technology or a software integration design pattern, but some debates really aren’t worth getting involved in. You can buy or download products called ESBs, and these typically provide a messaging-based middleware infrastructure that has the ability to connect to external system endpoints over a variety of protocols – TCP/IP, SOAP, JMS, FTP, and many more. If what you’ve read so far in this book has sunk in to some degree, I don’t think you really need to know more.

6.6 Further Reading

There’s an enormous volume of potential reading on the subject matter covered in this chapter. The references that follow should give you a good starting point to delve more deeply.

  • D. S. Linthicum. Next Generation Application Integration: From Simple Information to Web Services. Addison-Wesley, 2003.

  • D. Chappell. Enterprise Service Bus: Theory in Practice. O’Reilly Media, 2004.

  • G. Mühl, L. Fiege, P. Pietzuch. Distributed Event-Based Systems. Springer-Verlag, 2006.

The following three books have broad and informative coverage of design patterns for enterprise integration and messaging.

  • M. Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley, 2002.

  • G. Hohpe, B. Woolf. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley, 2003.

  • C. Bussler. B2B Integration: Concepts and Architecture. Springer-Verlag, 2003.

In terms of technologies, the following books cover some quality open-source message brokers and ESBs worth looking at:

  • D. Dossot, J. D’Emic. Mule in Action. Manning Publications, 2009.

  • T. Rademakers, J. Dirksen. Open-Source ESBs in Action: Example Implementations in Mule and ServiceMix. Manning Publications, 2008.