Thursday, December 29, 2016

Following up on Spraint

This posting isn't just a blog entry it is a magnum opus. TL;dr version is "don't send the complete state of the system object downstream if the downstream systems are trying to deliver events". It fits into EA (admittedly on the technical side of things) because it digs into what being event driven means to the enterprise.

In a previous posting I introduced the notion of having to dig through old copies of data to figure out what happened. That post didn't dwell on ways to avoid doing that, so this one will.
The question on the table is a simple one. How can a system inform other systems that an event has occurred. I work mostly in the domain of airlines, reservation and operational systems and tying systems together. So this post will draw from airline reservations for its examples.
In the major reservation systems, a "transaction" can be quite a big thing. A passenger can establish a booking, with a suitable itinerary, have it priced and booked all in a single business transaction. A "transaction" can also be quite a small thing. A passenger can add a passport number to the booking. These are clearly at wildly different levels of granularity. So what's a poor architect to do?

The brute force approach is to change the state of the reservation object (or PNR in travel industry parlance) and then ship the whole PNR off downstream to be consumed by other applications. Oh and by the way, a PNR in XML might have a signal to noise ratio of about 10% and it might be as large as 1MB. If a receiving application were to need to know what happened to cause the second message to be sent, then it could look at the first one and deduce the difference. Lots of compute energy consumed to figure out what the upstream system already knew. We will refer to this as BF

Another approach might be for the upstream system to ship the whole PNR with some data somewhere in it telling downstream what changed. Still pretty heavyweight, but at least the decoder ring is just the decoder ring for the header and doesn't require decoding of the whole PNR. We will refer to this approach as BFD

A third approach might be for an upstream system to send the whole PNR only for the first transaction, and then send only deltas (actions perhaps) for subsequent transactions. We will refer to this approach as DO

There is also an information reliability aspect to contend with. Because the systems that need to communicate can have a variety of undesirable traits (they might receive data out of sequence, data might be lost somewhere in the network, a downstream system forgot to upgrade the schema,...) we also need an approach that provides sufficient reliability.

So looking at the needs of systems from a variety of potential consumers.

A Data Warehouse that needs all of the "state" information for each transaction

If the whole architectural approach to the enterprise is based on a collection of data stores (domain oriented operational and data warehouses), then this predominant pattern is for you. But it doesn't necessarily deliver the greatest business agility.

BF Approach

Taking the BF approach, the data warehouse has pretty much what it needs. There are the sequence issues to contend with, but by and large this is the easiest approach for the warehouse. You have complete information at every stage, so it is easy enough just to store the data as it comes in.

Except of course it's a lot of data. And this kind of data storage is often the most expensive storage in the enterprise. So maybe a Change Data Capture (CDC) approach makes sense. So what has actually happened is that a producing system has sent a stateful thing to the data warehouse. The data warehouse breaks it down to see what changed and stores the changed bits. Hmm, sounds like the upstream system is carefully packaging something only for the data warehouse to unpackage it to deduce what happened. Essentially (continuing with the scatalogical metaphors) looking for the pony in there.

BFD approach

The BFD approach has the advantages of the BF approach in that the data warehouse is in some senses complete. So no real impact there.

DO approach

The DO approach is the hardest approach for the data warehouse. Since the whole transactional history is transmitted in the header, the warehouse will need to apply changes forward (i.e. it knows what's changed so CDC). A kind of reverse CDC. Potentially no worse than the forwards CDC.

Downstream systems need to deliver lightweight "something happened" events

This architectural pattern for the enterprise assumes that information can be acted upon as soon as it becomes available. It doesn't mean it has to be, but it could be. Transactional systems typically execute the business transactions (statement of the obvious, I know), but rarely have the scope to deal with the implications. The implications are left to be dealt with by other systems.

BF approach

This is the least convenient approach for systems with an event generation requirement. To figure out what happened (and thus which events to emit), the application must determine the difference between the current message and its predecessor. This an be an expensive operation. It is also inherently unreliable because:
  • The eventing system has to fully process the messages in order so that it can determine state change 
  • The messages may arrive out of sequence
  • It may not be possible to determine that there is a gap in message sequence a priori.
This has a limiting factor that the processing of messages is a sequence preserving activity. Such sequence preserving activities are, by nature, governors on throughput.

BFD approach

In the BFD approach, the downstream, event producing system can identify what changed from the delta information it was given,  At least it says what changed from the previous message. Coupled with data that identifies what the previous values were, and it becomes possible to generate events properly. Except, again, for missing messages. Quite complex logic has to be put in place to deal with gaps in sequence when they are detected.

DO approach

In the DO approach, the downstream, event producing system can determine what happened from the transaction history. It doesn't have to wade through full state to figure it out. But there is some need to make sure that full transaction history be sent with each event because you can't recreate the history if there are gaps. So this is a bit of a hybrid approach. It is a bit like a banking system, whereby you have a periodic statement, and you can see ithe individual transactions between statements.
This approach gives a degree of flexibility - allowing for a kind of duality between state and event. But it still feels unsatisfactory.

Conclusion

There really isn't a one size fits all approach to information management when you have such diverse temporal use cases. The immediate action systems need it fast and light. The historical reporting systems need it less fast, but in full, fine grained detail. So a poor architect has to think varefully about the relevant patterns, and decide which trade offs to make.




Wednesday, November 9, 2016

Events and Context, and cauliflower.

I was chatting with a friend yesterday about composability, events, and other distributed computing concepts.
He is quite a fan of doing composition at the point of consumption. Using the idea of eating family style - a group of people, a common table and piles of food. Each diner chooses what to put on their own plate and how to arrange it.

But there is a flaw which we discussed (maybe many flaws), but the one we talked about was what memories exist as well. So for example, when the food arrives, one of the vegetables is cauliflower. I generally like cauliflower (Hmm, how do I know that - where is that information stored?). However, I have had the cauliflower at this restaurant before (a couple of years ago) and it was awful. (Where is that information stored?).  The key point here is that when events (food delivered to table, say), there is both information that is directly pertinent to the event (what kind of food, when did it arrive, who brought it, what was the temperature) and information that lives in history (I don't like cauliflower here).  We need both sets of data in order for me to have a satisfactory meal.
And that is why in our systems we do have to manage and make available historical state - even when our systems are driven by events.

Wednesday, November 2, 2016

System Spraint

Spraint is a quaint English word for otter droppings. Analysis of otter populations and their dietary habits can be performed by analysis of their spraint.

I see the same kind of analysis being required in systems that send their "spraint" - often in the form of messages downstream for subsequent systems to figure out what happened.

We want events (a passenger checked in), but instead we get a giant message with all of the reservation data and somehow we have to deduce what happened.

Knowing the current state of something doesn't tell us how it got to be in that state. If we are to try to figure out what happened from the state models we have to compare a previous piece of system spraint with the current one and look for differences. That is only sensible if the agent making the change can't or won't tell you what the change was.

In this day and age when we are in a mobile, somewhat reliable, but still constrained network world, it is extremely expensive to send giant messages around the system of systems when all that one of the systems might need is a little piece of knowledge that "something happened".

Being told that something happened, vs having to deduce it makes the job of downstream systems way easier.

This line of thinking is fundamental to the architecture of the enterprise - organizing the enterprise around the business events that can happen, and then having meaningful interpretation and use of those events available instantly should be a goal.

Ask the executive out of whose budget the analysis os coming, "Would you like to be told what happened? Or would you like us to figure it out?" A business taxonomy of events becomes vital.

Would you prefer to know, "Passenger Chris Bird just boarded flight 1234" or here's are a pair of booking records that show that Chris wasn't boarded and then he was?