Tuesday, January 18, 2011

More on what event models can learn from database models

One of my earliest posts was entitled, "What can SOA learn from RDBMS?" In that posting I was arguing for "Event Triggers" that fire when "interesting" business objects change. Of course, "interesting" needs further definition and so for tha matter does "change". However that isn't where I am going in this post.

Historically in the database world, most systems implemented "pessimistic" locking mechanisms. In a pessimistic approach, access (with possible intent to update) to a data resource could not be obtained unless the requestor were able to get exclusive control of the resource. More recent systems employ an "optimistic" approach whereby the data resource is only locked while it is actually being updated. The semantics are quite different, but the intent is the same - makle sure that we don't have conflicting updates.

In general pessimistic locking delivers absolute guarantees, but may cause throughput reductions or excessive resource consumption. Optimistic locking assumes that there will be little contention and therefore doen't consume resources not hold locks "until it has to". In a system with optimistic locks the update requestor may have to deal with the contention behavior (retrying operations or whatever). In the pessimistic scenario, the requesting application can't proceed unless it is safe to do so.

In eventing models, we have similar trade offs that we need to make. No, they aren't lock based, but they do have implications on throughput. These are all focussed around guranteed sequence and guaranteed delivery. Both of these impose throughput limitations on an event based system. Guaranteed sequence isn't really any kind of guarantee however. It is a guarantee that over a certain time frame, messages will arrive in sequence. However this unpacks into not applying events in consuming systems until the guarantee period has expired in some cases.

Why you might ask? Imagine that you have timestamped events coming in, you may not know that an event that comes in does not have an earlier predecessor that has somehow been held up. So you might choose to wait for the appropriate time interval "just to make sure nothing is coming in later" and then apply all events in proper sequence. Set the event sequence window to 5 minutes and in the worst case you are processing 5 minute batches. That doesn't sound very friendly.

So maybe you think about applying sequence numbers to the events. That's fine (well kind of) unless the sequence number calculator is a central shared resource, in which case there may be contention for that. At least however, you can detect that events are out of sequence because if a receiving system perceives a gap in sequence numbers, then it can wait for the "time out period" or the appearance of the missing sequence number(s) whichever happens first and then apply the updates. This is a bit intrusive to the event creators - managing sequence numbers was probably not in their original plan.

In neither of the above scenarios do we get maximal throughput. They are both intrusive and will deliver quite inconsistent throughput. They are rather akin to the pessimistic locking approach taken in database systems.

Looking at it a different way, we can perhaps detect (externally to the applications) that things have happened out of sequence. maybe using an out of band control infrastructure. Thus if a sender identifies when something was sent (using a messaging infrastructure), the receiver's act of receiving will identify when it was received. If the receipts are out of sequence the infrastructure can alert and trigger actions. The assumptions are that:
  • Sequence errors are few and far between
  • It is OK to recognize that the sequence has been violated
  • The infrastructure is capable of re-delivering the events in an appropriate sequence
  • There have been no irrevocable side effects as a result of the out of sequence receipt.
Attempting to detect post-hoc is likely (and I have not done the work to prove this yet) to deliver mmore consistent (and higher) throughput than attempting to guarantee, with a slight increase in risk.

This looks to be a fruitful area of research (or if the research has already been done, a fruitful area for some best practice patterns) as we treat 2011 as "The Year of the Event" as Todd Biske proposes.

Thursday, January 13, 2011

ROI Based Business Cases and Long Range Value

In a series of interesting twitter posts @taotwit(Nigel Green), @erikproper, @oscarberg, @jdevoo have been discussing modelling Value - other than monetary value. In VPEC-T (modeling based on Values, Policy, Events, Contents and Trust) the abstract concept of Value recognizes things other than monetary value.

In other posts, I have seen reference (and I am sorry that I cannot cite the references) to  a primary role of Enterprise Architecture as the team/approach for encouraging project funding across siloes. It was expressed more elegantly in the other posts, I admit!

These various posts and thoughts about Value led me to thinking about entrapreneurial organizations and risk averse organizations. Some of these arguments are kind of old hat, but in companies where innovation is a core driver (eg apple) the introduction of products isn't based (entirely) on ROI. Compare and contrast with very "mature" organizations where business as usual (in the sense that the company delivers the same products, with the same methods, with the same focus on return) because of shareholder or other leadership demands.

Architects are also sometimes thought to be ivory tower/"build it and they will come" kinds of people. This is an accusation that does have merit - we have been guilty of such sins, but it is also used as a weapon for politcal/organizational purposes. The Trust relationships are key here - because architects are likely to be destabilizers (of the status quo and current power bases) means that there will be a failure to Trust the EAs by those who will be affected/disempowered by innovation.

We see this happening frequently in enterprises where there is a process or organization for innovation. "Innovation will be handled only in the xxx organization". Anyone else onnovating outside that organization will be ignored. That's an intersection along the Value/Trust axes.

Wednesday, January 12, 2011

2011 The year of the event

Todd Biske, on his excellent Blog has offereed the imperative, "Let's make 2011 the year of the event" That is a truly wonderful cause. I am shamelessly taking the idea and adding some of my own thinking here. I urge you to read Todd's original post, though.

I think we have had misgivings about eventing models since the early EAI days. With the advent of EAI we worked with events as integration mechanisms. Without attempting to do anything about the underlying systems, and without a sensible taxonomy.

Now perhaps as we think seriously about events and event models there are 2 big ideas that we need to consider.

The first is that when some business object changes state, it should let "the world" know. Think Business Object trigger - here's a link back to 2008 on that.

The second is situational awareness, Here we really need to have a proper mechanism for understanding the effects of an event across a rabge of "observers" might be. It becomes a question of responsibility. When multiple "observers" see the same event they may all choose to handle it. However they are likely to be inconsistent (not necessarily any harm in that), some might claim ignorance (you never told me that), some might have taken different action knowing how another observer behaved...

Using human communications and human triggered interactions as a model, we need to look at the "situational awareness" aspect and decide what to do about conflicts in understanding, conflicts in behavior

Saturday, January 1, 2011

Time Part 2 - Januray 1, 2011

This post follows on from the introduction to thinking about time posted here.

In the previous post, I introduced a few of the nuanaces of time. It isn't about the obvious things like format, timezone, precision, calendar, etc. Those are important and can be the source of errors in coding, but they aren't the worst issues by any means. Of more importance is the meaning of time in a specific context. So to start with. let's look at the kinds of time we might be concerned with. I like to think of time as either a"point in time" or duration. Then associated with times there are various "operations" that are to be applied - dealing with relative time.

So thinking about the common "kinds" of time we see, we might classify this way:
  • Points in time
    • Real time - coincident (to a fine tolerance) with something else
    • Near real time - within the same "transaction" scope as something else
    • Actual time - When something happened
    • Notification time - When an event was notified. This may or not be the time the event happened
    • Observation time - when an observer receives notification that something happened
    • Effective time - a point in time when something becomes effective
    • Expiry time - a point in time when something is no longer valid
    • Cut off time - a point in time when changes are no longer accepted (may behve like an expiry time)
    • ?
  • Durations
    • Validity period - a time duration where something is valid
    • Length of time - a time duration where something is happening
    • Relative duration - a duration that is defined without there being an actual start time. For example, a half in soccer. A type level concept and not an instance level concept.
Clearly in this intial writing there are some quite nebulous words like "something" or "transaction" or "Event". Perhaps happening would be a more useful word.

Anyhow much of the difficulty in systems comes about because of ambiguity related to time. Something may "happen" causing the place at which it happens to think that the happening has occurred, but it has yet to be observed by other entities. Until it does the other entities cannot know that the state has changed.

This is not just a systems phenomenon, but a physics phenomen too. Light moves at a finite speed, so when we see an object (eg a star) that has emitted light, we are seeing the object as it was at the time the light was emitted, and not as it is "now" - or in the frame of reference of us, the observers. We don't (and cannot) know whether the object even still exists. We make assumptions, of course, but in reality we don't know. In other words there is considerable ambiguity.

The big question then is how to handle such ambiguities/inconsistencies of existence. We could take the "GOD" view. There is only one view of the truth, all questions must be asked of that view and there shall be  no other views.  That's fine, except it does take finite time to ask questions, so even if there is a "GOD" view there is and always will be inconsistency.

We can take a view that allows each observer to have its own "correct" view. But that doens't feel any better. We can never be sure when asking that observer if the same set of data has been included as in another observer's set. So for example, when I am presenting some conclusions, it might be quite reasonable for a questioner to ask if I had included the research that was published in December 2010. The point is that unless we are pretty explicit about what is in and what isn't, it is hard to determine what something actually means.

Ultimately we cop out and attempt to have a GOD view of our own data and then allow for other pockets of data which we "update" by taking extracts from the "GOD" view. That has sort of worked - we batch things together and know that the information was correct at midnight yesterday or other known time.

All changes in systems that work in individual events, where data are not batched, where a system is as up to the moment as possible because we don't know and cannot in today's systems know' what has actually been included and what hasn't.

The taxonomy of time that I presented earlier is really only useful as an analysis set. It helps us as analysts answer questions about the data as the data are purposed and repurposed in the (extended) enterprise. It gives a vocabulary to use when asking questions around a particular happening or duration.