Sunday, September 27, 2009

Watching Events

It seems that when thinking about events, we have a tendency to put some of the responsibilities in the wrong place. Of course every time we don't have a proper separation of responsibilities, we get extra complexity. So in this post I will look at some of the issues around the responsibilities and see where they should be allocated.

Short political rant that can safely be ignored now. Why is health care insurance (in the USA) handled essentially through employers and employment contracts? They simply don't belong to each other. The time base is wrong, the administration is wrong, the result is wrong.

End of rant!

I think a little naming context would be helpful here. This will be greatly simplified - just so that the essence is laid bare.

E is an event publisher
E has an output queue onto which it publishes. This is called EQ
E publishes events of type V1, V2, V3 and V4. Individual events are named v1-1, (the first event of type V1), v1-2 (the second event of type V1), etc.
C1 is a consumer of events - it can consume events of type V1, V2
C2 is a consumer of events - it can consume events of type V2, V3, V4
C3 is a consumer of events - it can consume all kinds of events.
Each consumer has an input queue - C1 has an input queue C1Q, C2 has C2Q, C3 has C3Q

The event network behaves as follows:

E publishes v1-1. The event network must transfer v1-1 to C1Q and C3Q. C1 and C3 are now able to process the event by reading their respective queues.
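The routing just described can be sketched in a few lines of Python. This is only an illustration under assumptions: the `SUBSCRIPTIONS` table, the `queues` dictionary and the `route` function are names invented here, and a real event network would sit on top of whatever messaging technology is in use.

```python
from collections import deque

# Which event types each consumer accepts (taken from the naming list above).
SUBSCRIPTIONS = {
    "C1": {"V1", "V2"},
    "C2": {"V2", "V3", "V4"},
    "C3": {"V1", "V2", "V3", "V4"},
}

# One input queue per consumer: C1Q, C2Q, C3Q.
queues = {name: deque() for name in SUBSCRIPTIONS}


def route(event_type, event_name):
    """Copy an event published on EQ to every interested consumer's queue."""
    delivered_to = []
    for consumer, accepted in SUBSCRIPTIONS.items():
        if event_type in accepted:
            queues[consumer].append(event_name)
            delivered_to.append(consumer)
    return delivered_to


targets = route("V1", "v1-1")
print(targets)  # ['C1', 'C3']
```

Note that E never appears in the routing table. It publishes, and the network decides who gets a copy.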

Thinking about the possible outcomes (assuming E was successfully able to publish v1-1):

Both C1 and C3 receive the event v1-1
C1 receives the event v1-1, but C3 does not.
C1 does not receive the event v1-1, but C3 does
Neither C1 nor C3 receives the event.

Question for the reader: where should the responsibility lie for recognizing that one of these conditions obtains, and for taking action on it? The possibilities are:
(a) E
(b) C1
(c) C3
(d) none of the above

My answer is (d) - none of the above. It isn't any of these players' responsibility to do this. E's job was to publish the events. C1 and C3's jobs are to handle the events on behalf of their scope. But C1 and C3 are autonomous, so they can't be dealing with each others' failures.

So if it is (d), then there must be some other participant - yet to be identified - that takes the responsibility. That something else is a proxy for the overall business policy. It needs to exist independently of everything else - and therefore needs to have the proper information fed to it.

Now imagine that C1 in some sense "fails" - it completes, but delivers an exception condition. If that condition is serious enough then something has to know. So it would be sensible if C1 were to notify something of its own outcome.

Likewise C2 and C3.

So we have a kind of triplet of notifications (which expand into greater complexity for real sized problems).

E says, "I sent v1-1"
C1 says, "I got v1-1"
C1 says, "I handled v1-1 with a normal result"
C3 says, "I got v1-1"
C3 says, "I couldn't do v1-1 - I failed because of a business problem."

So we could then implement a policy rule about what to do under these circumstances. That rule could of course inject new events into the "system." That however is a subject for another time.
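To make the idea of an independent participant a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Watcher` class and its `sent`, `got`, `handled` and `problems` methods are invented names, and the event is called v1-1 following the naming list earlier in the post. The point is only that the publisher and the consumers each report their own facts, and something else compares them.

```python
class Watcher:
    """Hypothetical observer standing in for the business policy."""

    def __init__(self, subscriptions):
        # consumer name -> set of event types it should receive
        self.subscriptions = subscriptions
        self.expected = {}   # event name -> consumers that should receive it
        self.received = {}   # event name -> consumers that reported receipt
        self.failed = {}     # event name -> consumers that reported a failure

    def sent(self, event_type, event):
        """E reports 'I sent <event>'; E does not need to know who consumes it."""
        self.expected[event] = {
            c for c, types in self.subscriptions.items() if event_type in types
        }

    def got(self, consumer, event):
        self.received.setdefault(event, set()).add(consumer)

    def handled(self, consumer, event, ok):
        if not ok:
            self.failed.setdefault(event, set()).add(consumer)

    def problems(self, event):
        """Return (consumers that never got the event, consumers that failed)."""
        missing = self.expected.get(event, set()) - self.received.get(event, set())
        return sorted(missing), sorted(self.failed.get(event, set()))


watcher = Watcher({"C1": {"V1", "V2"}, "C3": {"V1", "V2", "V3", "V4"}})
watcher.sent("V1", "v1-1")
watcher.got("C1", "v1-1")
watcher.handled("C1", "v1-1", ok=True)
watcher.got("C3", "v1-1")
watcher.handled("C3", "v1-1", ok=False)

missing, failed = watcher.problems("v1-1")
print(missing, failed)  # [] ['C3']
```

A policy rule would then look at `missing` and `failed` and decide what to do, perhaps by injecting a new event. In practice the watcher would also need some notion of how long to wait before calling a receipt "missing", which this sketch leaves out.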

Bottom line is that we have a three part system now for handling events. There is the standard pub/sub behavior (and don't get too hung up on the specific technologies). There is the ability to send out content references using a mechanism that is not part of the event channel (quite RESTful really), and then there is the ability to act on non-receipt or improper handling of the event by one of the event subscribers.

A nice compact model that separates concerns, so that the individual components can be focussed on their own responsibilities and not prying into each others' business.

A new twist on the Taj Chaat process...

Those who have seen the previous post on process improvement at my local Indian chaat restaurant will probably be intrigued by yesterday's twist.

We were buying some appetizers to go. In this case 2 samosas and an order of golgappe. Normally the process (at least for dining) is to write the items on a form, be assigned a number and given the vibrating pager to let us know when ready. Collect food, eat, check out by handing over the pager so they could locate the items to be billed from an accordion file.

For take-out, the process is very different. Again I fill out the little 2-part form. But before handing it over to central ordering, I take both parts of the form to the cash register. I pay for the order. Both parts of the form are stamped paid. I take the form back to central ordering where I am given a number and a vibrator. The order is prepared, vibrator goes off and now what? I go to the station to pick up the order, but since the collection point of the vibrators is the cash register how do I return the vibrator?

This is made harder because the worker at the cash register is a different person from the one I paid. So there is no session state anywhere. I have a vibrator that has vibrated and a cashier who expects to take cash. She can't find the form parts in the accordion file.....

So why is this important?
Again, seeing how non-IT organizations think about the processes that run their own businesses makes us aware that we shouldn't be overengineering. There are exceptions - things that don't follow the normal path that we just have to work around. Of course each workaround decreases efficiency, but if they are sufficiently rare, then the overall efficiency is not decreased by having workarounds. However, if the workarounds do become cumbersome, expect them to be worked on without negatively affecting the frequent path activities.

Oh, and by the way yes the golgappe and samosas were worth it! The whole bill was $6.56!

Processes and events

Another day of talking to Nigel Green - thank you Skype! And some thinking around processes and their relationship with events. Again sounds innocent - but it seems as if both of us strongly event-oriented thinkers come to common ground when thinking about processes and orchestration - namely that while you might use low level messaging semantics for implementing processes, event modeling doesn't really help when trying to model processes. However, and here the lightbulb began to flicker dimly, the result of executing a process or process step can become a source of events.

We chose an example from the airline industry - and from our experiences of being travelers. Not from any great insights from the internals of the business. The focus was the check in process at the airport itself.

Clearly we see interesting policies at different places and for different carriers. For example at Mumbai (at least the last time I went through there), they seal your bag with some kind of security strap, so it can be seen whether the bag has been tampered with. That is less common in the USA. However, at Miami International Airport when I went through a month or so ago, I saw the ability to wrap the bag in a kind of cling wrap. I presume that can be done elsewhere too. That is all by way of background.

Airlines nowadays can and do charge fees for checking baggage. All of the rules require that checked bags undergo a security check. Bags are subject to weight limits. Passengers are subject to bag limits (no more than n per passenger). Ignoring further complexity like whether actually to collect the fees (elite passengers are exempt, for example) there are some quite interesting process decisions to be made.

If the airline chooses to impose the bag fee immediately that the passenger offers the bag for checkin, then there may be some undo logic if the passenger decides not to check the bag after all. It is worse though. Imagine that the bag is too heavy. When is that discovered? For example if the passenger has checked in (and been charged) at a kiosk, then on presentation of the bag it is discovered to be too heavy so an extra collection is made - that could make the passenger decide not to check it after all, so a refund is in order. Or perhaps the passenger may decide to open it and remove some of the heavier items, and get it to the correct weight. That's fine - but what if it has already been secured with tape or wrapped in a cling wrap. Kinda tricky to undo...

Then there is security. Another opportunity for the passenger to open the bag and remove things if they should not be in checked baggage. And so it goes.

Different airlines and different jurisdictions will implement the Policies - "maximize revenue for bags", "make sure the passengers' possessions are safe", and "transport passengers safely" with different process paths. Those paths need to be orchestrated. It isn't clear how an event network will really help that orchestration. In fact I would go so far as to say it complicates it. However at each step of the various process steps (or sub-processes), it would be very useful to spit out an event that provided useful (possibly actionable) information to trigger some other behaviors.

For example, if during the security screen a weapon were found, we would expect an event to be raised to trigger a whole raft of other processes. We would be jumping outside one process domain into another: from that of airport behaviors to that of criminal behaviors. So that looks like a terrific event.

Even the mundane events may be interesting to somebody. That a passenger decided not to check the bag after a fee was assessed can be helpful when looking at the behavior as a whole for market and planning purposes. Opportunities for process improvement abound.

The weight of the checked luggage is also useful for "weight and balance" on the aircraft. Necessary so proper takeoff parameters can be computed, proper fuel calculations can be performed, etc. So the event raised as a result of successful baggage check-in is quite a handy event to have.

So bottom line, it seems from this (and a whole raft of other possible examples) that we will typically see events being generated as a result of a process step happening - at least when in a process.
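A small Python sketch of a process step that spits out events as a by-product, using the baggage example. The names here (`check_bag`, `emit`, the `BagChecked` and `BagOverweight` event types) and the 23 kg limit are assumptions invented for the illustration, not anything taken from a real airline system. The orchestration of the check-in process would live elsewhere; the step just reports what happened.

```python
MAX_WEIGHT_KG = 23  # assumed limit, for illustration only


def check_bag(passenger, weight_kg, emit):
    """One process step; returns whether the bag was accepted and emits an event."""
    if weight_kg > MAX_WEIGHT_KG:
        emit({"type": "BagOverweight", "passenger": passenger, "weight_kg": weight_kg})
        return False
    emit({"type": "BagChecked", "passenger": passenger, "weight_kg": weight_kg})
    return True


events = []
check_bag("P1", 19.5, events.append)
check_bag("P2", 27.0, events.append)

# A weight-and-balance consumer only cares about bags that were actually checked.
load_kg = sum(e["weight_kg"] for e in events if e["type"] == "BagChecked")
print(load_kg)  # 19.5
```

The same `BagOverweight` event could just as well feed a planning system that counts how often passengers change their minds, without the check-in step knowing that such a consumer exists.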

Of course there are lots of other ways of generating events; we don't need to formalize processes so that events can be generated. Relatively random behaviors give rise to events too.

Sunday, September 13, 2009

Event and Content

This conversation with Nigel started innocently enough. Two people who have very similar views talking about architecture in the large. We described our current problem spaces - they looked very similar, but as is often the case we used the same words to mean different things. Of course once we realized this we got back on track quickly. I'll write this from my perspective, but in reality it would be better written by a dispassionate observer.

I set out to describe the separation that I am seeing in VPEC-T especially around P-E-C. In my definition, I was thinking of Content as being everything about the Event. I was merrily proceeding down this path, thinking that the view was shared until Nigel spoke. We realized simultaneously that the word covered several concepts. There is the "stuff" about the event itself - the event properties like when it happened, what kind of event it is, what channel it was communicated on - essentially a kind of tag cloud for the event. I suppose in the current vernacular this might be the event meta-data. There is also the "state" that exists as a result of something doing a piece of work and generating the event. That may well need to be made available somehow. So for example, in my sales system the "event" of a sale being made has something useful like when it happened, however there is a whole lot of other information like who the parties to the sale were, what the value of the sale is, who the sales team is, what was sold... - in other words the "business document" representing the state as perceived by the event emitter.

In the Chris world, the tags (event meta data) were part of content. In the Nigel world they aren't part of content per se. Actually that is unfair, they aren't ONLY content. In other words, the event meta data can easily travel with the event. But the state information typically won't.

Reversing that point of view for a second, we get the notion that there is definitely a content "store" somewhere that the processor of the event might go to get the contents, and an event store - somewhere the event data would be stored. The content store doesn't have to be explicitly created by the event creating component, it could equally be some external source (e.g. a weather forecast). The event store's job is to store the events (duh) so that they can be replayed for a variety of reasons:
  • An event processor has been out of the loop for an hour and wants to catch up: "get me all the sales events for the last hour"
  • A simulation wants to be done using some "live" state. What would the impact be if we had decided to close this airport at 2pm yesterday? So inject a new event into the event store and then replay forward with the existing events after that point. Of course there will quickly be diversion from the reality as processed, but that is OK. We have been able to run the simulation with a known starting point

Considering the second bullet above, it is quite convenient to use the same semantics for processing "missed events" as for processing the events in real time. A kind of rule: never pass the content with the event, always pass a reference to content, but make sure that the event tag (meta-) data do come with the event. Make the event store readable so that the event sender isn't responsible for hanging on to the event, and having to remember who has received it and who hasn't.
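As a rough sketch of that rule, here is what an event carrying its tag data plus a content reference, and a readable event store, might look like in Python. The `Event` and `EventStore` classes, the `replay` method and the `content/...` reference strings are all hypothetical names made up for this example. The store here is an in-memory list, where a real one would be durable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    kind: str              # tag data travels with the event
    occurred_at: datetime  # more tag data
    content_ref: str       # a reference to the content, never the content itself


class EventStore:
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, kind, since):
        """Return stored events of one kind at or after `since`, oldest first."""
        matches = [e for e in self._events if e.kind == kind and e.occurred_at >= since]
        return sorted(matches, key=lambda e: e.occurred_at)


store = EventStore()
now = datetime(2009, 9, 13, 12, 0)
store.append(Event("sale", now - timedelta(hours=2), "content/sales/1001"))
store.append(Event("sale", now - timedelta(minutes=30), "content/sales/1002"))
store.append(Event("refund", now - timedelta(minutes=10), "content/refunds/17"))

# "Get me all the sales events for the last hour"
missed = store.replay("sale", since=now - timedelta(hours=1))
print([e.content_ref for e in missed])  # ['content/sales/1002']
```

A consumer catching up reads from the store exactly as it would handle live events, then follows each `content_ref` to the content store if it needs the business document. The sender keeps no record of who has read what.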

At the end of our time together on this call we were, as expected, in agreement that for a proper separation of concerns, we should treat content as being the state data and the event store as being the place where the events and associated metadata are stored to allow for retries and simulations (among other things).

The Policy part of the call will come next.....