Thursday, December 29, 2016

Following up on Spraint

This posting isn't just a blog entry; it is a magnum opus. The TL;DR version is "don't send the complete state of the system object downstream if the downstream systems are trying to deliver events". It fits into EA (admittedly on the technical side of things) because it digs into what being event driven means to the enterprise.

In a previous posting I introduced the notion of having to dig through old copies of data to figure out what happened. That post didn't dwell on ways to avoid doing that, so this one will.
The question on the table is a simple one. How can a system inform other systems that an event has occurred? I work mostly in the domain of airlines, reservation and operational systems, and tying systems together. So this post will draw from airline reservations for its examples.
In the major reservation systems, a "transaction" can be quite a big thing. A passenger can establish a booking, with a suitable itinerary, have it priced and booked all in a single business transaction. A "transaction" can also be quite a small thing. A passenger can add a passport number to the booking. These are clearly at wildly different levels of granularity. So what's a poor architect to do?

The brute force approach is to change the state of the reservation object (or PNR in travel industry parlance) and then ship the whole PNR off downstream to be consumed by other applications. Oh and by the way, a PNR in XML might have a signal to noise ratio of about 10% and it might be as large as 1MB. If a receiving application needs to know what happened to cause the second message to be sent, it has to compare it with the first one and deduce the difference. Lots of compute energy consumed to figure out what the upstream system already knew. We will refer to this approach as BF.
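To make the cost of BF concrete, here is a minimal sketch of the work a receiver has to do. It assumes the PNR has already been parsed into a flat dictionary; the field names and the diff_snapshots helper are made up for illustration, and a real PNR is deeply nested and far larger.

```python
def diff_snapshots(previous, current):
    """Return {field: (old, new)} for every field that differs.

    Assumes flat dictionaries; a missing field is treated the same as None.
    """
    changes = {}
    for key in previous.keys() | current.keys():
        old = previous.get(key)
        new = current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Two consecutive full snapshots of the same (hypothetical) booking.
first = {"record_locator": "ABC123", "passenger": "SMITH/JOHN", "passport": None}
second = {"record_locator": "ABC123", "passenger": "SMITH/JOHN", "passport": "X1234567"}

# The receiver walks every field just to rediscover the one thing that changed.
changes = diff_snapshots(first, second)
```

Every field gets compared even though the upstream system knew all along that only the passport was added.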

Another approach might be for the upstream system to ship the whole PNR with some data somewhere in it telling downstream what changed. Still pretty heavyweight, but at least the decoder ring is just the decoder ring for the header and doesn't require decoding of the whole PNR. We will refer to this approach as BFD.
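A BFD message might look something like the sketch below. The BfdMessage shape, the changed_fields header and the event naming are all assumptions for illustration, not any reservation system's actual format.

```python
from dataclasses import dataclass


@dataclass
class BfdMessage:
    changed_fields: list  # hypothetical header: names of the fields this transaction touched
    pnr: dict             # the full PNR, still shipped in its entirety


def events_from_header(message):
    """Derive event names from the header alone, without reading message.pnr."""
    return [f"{name}_changed" for name in message.changed_fields]


message = BfdMessage(
    changed_fields=["passport"],
    pnr={"record_locator": "ABC123", "passenger": "SMITH/JOHN", "passport": "X1234567"},
)
events = events_from_header(message)
```

The payload is no smaller than in BF, but a consumer that only cares about what changed never has to open it.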

A third approach might be for an upstream system to send the whole PNR only for the first transaction, and then send only deltas (actions perhaps) for subsequent transactions. We will refer to this approach as DO.
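As a rough sketch of DO from the receiving side: the first message carries the full PNR, and each later message carries one small action. The delta format used here (an "action" of "set" or "remove" against a named field) is an assumption made for the example.

```python
import copy


def apply_delta(state, delta):
    """Return a new state with one delta applied; the input state is left untouched."""
    new_state = copy.deepcopy(state)
    action = delta["action"]
    if action == "set":
        new_state[delta["field"]] = delta["value"]
    elif action == "remove":
        new_state.pop(delta["field"], None)
    else:
        raise ValueError(f"unknown action: {action}")
    return new_state


# The first transaction arrives as a full (hypothetical) PNR.
initial = {"record_locator": "ABC123", "passenger": "SMITH/JOHN"}

# Later transactions arrive as deltas only.
deltas = [
    {"action": "set", "field": "passport", "value": "X1234567"},
    {"action": "set", "field": "seat", "value": "12A"},
    {"action": "remove", "field": "seat"},
]

state = initial
for delta in deltas:
    state = apply_delta(state, delta)
```

Note that the result is only right if every delta arrives, and arrives in order, which is where the reliability concerns come in.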

There is also an information reliability aspect to contend with. Because the systems that need to communicate can have a variety of undesirable traits (they might receive data out of sequence, data might be lost somewhere in the network, a downstream system might have forgotten to upgrade the schema, and so on) we also need an approach that provides sufficient reliability.

So let's look at the needs of systems from the point of view of a variety of potential consumers.

A Data Warehouse that needs all of the "state" information for each transaction

If the whole architectural approach to the enterprise is based on a collection of data stores (domain oriented operational and data warehouses), then this predominant pattern is for you. But it doesn't necessarily deliver the greatest business agility.

BF Approach

Taking the BF approach, the data warehouse has pretty much what it needs. There are the sequence issues to contend with, but by and large this is the easiest approach for the warehouse. You have complete information at every stage, so it is easy enough just to store the data as it comes in.

Except of course it's a lot of data. And this kind of data storage is often the most expensive storage in the enterprise. So maybe a Change Data Capture (CDC) approach makes sense. So what has actually happened is that a producing system has sent a stateful thing to the data warehouse. The data warehouse breaks it down to see what changed and stores the changed bits. Hmm, sounds like the upstream system is carefully packaging something only for the data warehouse to unpackage it to deduce what happened. Essentially (continuing with the scatological metaphors) looking for the pony in there.

BFD approach

The BFD approach has the advantages of the BF approach in that the data warehouse is in some senses complete. So no real impact there.

DO approach

The DO approach is the hardest approach for the data warehouse. Since the whole transactional history is transmitted in the header, the warehouse will need to apply changes forward (i.e. it already knows what changed, so there is nothing to detect, only changes to apply). A kind of reverse CDC. Potentially no worse than the forwards CDC.

Downstream systems need to deliver lightweight "something happened" events

This architectural pattern for the enterprise assumes that information can be acted upon as soon as it becomes available. It doesn't mean it has to be, but it could be. Transactional systems typically execute the business transactions (statement of the obvious, I know), but rarely have the scope to deal with the implications. The implications are left to be dealt with by other systems.

BF approach

This is the least convenient approach for systems with an event generation requirement. To figure out what happened (and thus which events to emit), the application must determine the difference between the current message and its predecessor. This can be an expensive operation. It is also inherently unreliable because:
  • The eventing system has to fully process the messages in order so that it can determine state change 
  • The messages may arrive out of sequence
  • It may not be possible to determine that there is a gap in message sequence a priori.
The limiting factor here is that the processing of messages has to be a sequence preserving activity. Such sequence preserving activities are, by nature, governors on throughput.
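To show why sequence preservation throttles things, here is a small sketch of a resequencer. It assumes something the brute force approach does not promise: that the upstream system stamps each message for a PNR with a consecutive sequence number. Without that, the gap in the third bullet can't be detected at all. The Resequencer class is invented for this example.

```python
class Resequencer:
    """Hold back out-of-order messages until their predecessors have arrived."""

    def __init__(self, first_expected=1):
        self.next_expected = first_expected
        self.pending = {}  # sequence number -> payload waiting for a predecessor

    def accept(self, sequence, payload):
        """Return the payloads that can now be processed, in order."""
        if sequence < self.next_expected:
            return []  # duplicate or stale message, already processed
        self.pending[sequence] = payload
        released = []
        while self.next_expected in self.pending:
            released.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return released

    def has_gap(self):
        """True while something is buffered, i.e. an earlier message is still missing."""
        return bool(self.pending)


resequencer = Resequencer()
out_first = resequencer.accept(1, "snapshot-1")
out_third = resequencer.accept(3, "snapshot-3")  # arrives early, must wait
gap_while_waiting = resequencer.has_gap()
out_second = resequencer.accept(2, "snapshot-2")  # fills the gap, releases both
```

While message 2 is missing, message 3 just sits there. Nothing downstream can be diffed, and no event can be emitted, until the gap is filled.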

BFD approach

In the BFD approach, the downstream, event producing system can identify what changed from the delta information it was given. At least it says what changed from the previous message. Couple that with data that identifies what the previous values were, and it becomes possible to generate events properly. Except, again, for missing messages. Quite complex logic has to be put in place to deal with gaps in sequence when they are detected.

DO approach

In the DO approach, the downstream, event producing system can determine what happened from the transaction history. It doesn't have to wade through full state to figure it out. But there is some need to make sure that the full transaction history is sent with each event because you can't recreate the history if there are gaps. So this is a bit of a hybrid approach. It is a bit like a banking system, whereby you have a periodic statement, and you can see the individual transactions between statements.
This approach gives a degree of flexibility - allowing for a kind of duality between state and event. But it still feels unsatisfactory.
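One way to picture the hybrid: if every message carries the full transaction history, a consumer only needs to remember how far it has got. The sketch below assumes the history is an append-only list, oldest first; the EventEmitter class and the transaction names are hypothetical.

```python
class EventEmitter:
    """Emit each transaction once, even if messages are lost or arrive late."""

    def __init__(self):
        self.processed = {}  # record locator -> number of transactions already emitted

    def handle(self, locator, history):
        """Return the transactions in this history that have not been emitted yet."""
        seen = self.processed.get(locator, 0)
        fresh = history[seen:]
        # A late message with a shorter history must not move the marker backwards.
        self.processed[locator] = max(seen, len(history))
        return fresh


emitter = EventEmitter()
from_first = emitter.handle("ABC123", ["booked"])
# The second message (history: booked, passport_added) is lost in the network.
from_third = emitter.handle("ABC123", ["booked", "passport_added", "seat_assigned"])
# The lost message turns up late and contributes nothing new.
from_late = emitter.handle("ABC123", ["booked", "passport_added"])
```

The lost message costs nothing here because the next one carries the same history, but the price is a message that grows with every transaction.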


There really isn't a one size fits all approach to information management when you have such diverse temporal use cases. The immediate action systems need it fast and light. The historical reporting systems need it less fast, but in full, fine grained detail. So a poor architect has to think carefully about the relevant patterns, and decide which trade offs to make.