Monday, November 30, 2009

Updates Harmful

I have written about this before, I suspect. So forgive me if this is another take on the same theme.

Hanging out with Nigel Green and John Schlesinger is dangerous, so be warned. There be sacred cows being slaughtered in this post.

It all started innocently enough a year or so ago when Nigel and I were discussing the role of databases (record keeping) vs the role of databases (transaction scheduling, message management, etc.). Faulty normalization came into the picture too. Then the harm done by data modeling (where we thought we could model the rules in data). Then large scale data modeling efforts requiring significant investment, where it becomes very hard to see where the return comes from. Then an aha. If we go back a bit in time, the worst office job used to be "filing clerk." An imaginary discussion when someone comes home from work: "Well dear, what did you do at work today?" "We opened a new file for Acme enterprises. That meant that our index cards were all messed up because we had run out of space for companies starting with A, so we had to rewrite some of those cards, but while we were there we saw that some cards pointed to files that no longer existed so we removed those - but only after we had added more space for the letter A companies (which we didn't need right now anyway.)" "That's nice dear, how about a nice cold beer?"

The point is that filing clerks used to be the lowliest members of the office - and yet in their electronic reincarnation they have acquired really expensive care and feeding. Of course the new clerks are the databases, and the expensive care and feeding is manifested by having a group of thugs (DBAs) who hold everyone to ransom with their talk of normalization, SGA, proper keys... all things we did pretty easily with clerks. So what's going wrong? What is normalization for?

Taking normalization first - it is simply for ensuring that we don't get update anomalies. It ensures that something which should have the same value regardless of where it is used actually does have that value. You don't have to have a normalized database to ensure that update anomalies aren't present, although a normalized design makes it a bit easier.
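A toy sketch of the anomaly in question, using entirely made-up data: when the same fact is copied into several rows, an in-place update can miss some of the copies, and suddenly "the" value depends on which row you read.

```python
# Denormalized order rows: the customer's address is repeated in each row.
orders = [
    {"order_id": 1, "customer": "Acme", "address": "12 Elm St"},
    {"order_id": 2, "customer": "Acme", "address": "12 Elm St"},
]

# A careless update touches only one row...
orders[0]["address"] = "99 Oak Ave"

# ...and now Acme's address depends on which row you read: an update anomaly.
addresses = {row["address"] for row in orders if row["customer"] == "Acme"}
assert len(addresses) == 2  # two conflicting values for one fact

# The normalized alternative: store the fact once and reference it.
customers = {"Acme": {"address": "99 Oak Ave"}}
orders_normalized = [
    {"order_id": 1, "customer": "Acme"},
    {"order_id": 2, "customer": "Acme"},
]
# Every order now sees the same, single address.
assert customers["Acme"]["address"] == "99 Oak Ave"
```

As the text says, you can avoid the anomaly without normalizing - but then every updater has to remember to touch every copy.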

What is going wrong is in many ways a harder question. One fundamental thing going wrong is that we use the "filing cabinet" as a scratch pad. So let's return to the physical world for a bit. Imagine a filing cabinet in which we store the active accounts (perhaps bank accounts). When someone wishes to open an account, we give them a whole bunch of forms to fill in; they go off and fill them in and hand them back to us. We transcribe those forms and do some checking of the data contained. Once we are happy with the data, we can give the stuff to the filing clerk and have the clerk create the new file folder. So where were the forms and the checking? In some kind of "blotter" or case management pile on the clerk's desk. They weren't in the active accounts cabinets. And nor should they be.

Now we go to a computerized system. We enter the data from the completed forms into the system and "poof" they create an active account. But actually it is more insidious than that. We go through a series of screens putting in different bits of the account - each leading us to a more perfect account, but we aren't there yet. Eventually they will be in the active accounts database (but probably with an inactive flag) so that they can at some point be transacted. This is nuts. We are using the record keeping database (aka the filing cabinet) to manage work in process. This is not a proper separation of duties.

It gets worse. The company decides to "go online". Expensive consultants are hired, golf outings are scheduled, expensive dinners eaten and the "new account workflow" is eventually unveiled. It, too, is a sequence of steps. However, the poor schmuck filling this in has to complete each page of the form before moving on. That means that s/he cannot stop for a break, store a scratchpad version, or do it out of sequence because they can't remember their spouse's Social Security number or whatever. The people in charge of the design of the system understand that THE SYSTEM needs accurate record keeping, have heard that "it is ALWAYS better to validate the data at the point of capture" and other platitudes, but forget that at the end of the line there is the poor user. For these kinds of data entry systems (and a whole host of housekeeping systems) we need to store the "process state" separately. Don't use the state of the key entity as a substitute for that. Store where I am in the account-opening process, not in the entity that represents the account.
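A minimal sketch of that separation, with made-up names throughout: the work in process lives in its own case-management store (the clerk's "blotter"), and only a finished, checked case ever becomes a record in the account store.

```python
# The record-keeping store holds only completed, real accounts.
accounts = {}  # account_id -> account record

# Work in process lives in its own case-management store - the "blotter".
account_opening_cases = {}  # case_id -> {"captured": {...}, "step": ...}

def start_case(case_id):
    account_opening_cases[case_id] = {"captured": {}, "step": "started"}

def capture(case_id, field, value):
    # Fields can arrive in any order, and the applicant can stop for a break:
    # the process state lives here, not in the account entity.
    case = account_opening_cases[case_id]
    case["captured"][field] = value
    case["step"] = "in progress"

def complete_case(case_id, account_id):
    # Only a finished, checked case ever becomes an account record.
    case = account_opening_cases.pop(case_id)
    accounts[account_id] = dict(case["captured"], status="active")

start_case("case-1")
capture("case-1", "name", "Pat")
capture("case-1", "address", "12 Elm St")
complete_case("case-1", "acct-42")

assert "case-1" not in account_opening_cases  # no half-done data in the books
assert accounts["acct-42"]["status"] == "active"
```

Notice there is no "inactive flag" anywhere in `accounts`: the filing cabinet never sees an account that isn't real.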

So what got this diatribe really going? The notion that updates are unnatural - and probably harmful. I posit that the reason we do updates is mostly because the common need is for retrieval of the most recent version of something. So it makes sense to have access to the most recent version and update in place. But that isn't always the most expedient behavior. Certainly the most recent value is often the value you need - especially in an operational system. However more and more systems really need the ability to look back. Even something as simple (looking) as your medical record is not something you want to update. Patient history is key. We don't need to know the current cholesterol level (in isolation), we need its trend. So we don't just update the "cholesterol value" in the patient record. We add a new item for the cholesterol and keep the history. We keep the record sorted in time sequence so we can see the latest. We don't just overwrite the value. Our uses of data are so unpredictable that simply updating the database arbitrarily is going to give us data loss. We don't know in advance how serious that data loss might be. Perhaps it would be better to assume that we will need everything and come up with a scheme that, at some backbone level, ensures that the current view can be reconstructed by replaying the operational events.
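The cholesterol example can be sketched as an append-only history (values and dates invented for illustration): the "current" value is just a view over the records, and the trend - which an update-in-place would have destroyed - is still there.

```python
from datetime import date

# Append-only patient history: never overwrite, always add a new item.
history = []

def record(when, measure, value):
    history.append({"when": when, "measure": measure, "value": value})

record(date(2008, 1, 10), "cholesterol", 240)
record(date(2009, 6, 2), "cholesterol", 210)
record(date(2009, 11, 1), "cholesterol", 195)

# The "current" value is just a view: sort in time sequence, take the latest.
readings = sorted((e for e in history if e["measure"] == "cholesterol"),
                  key=lambda e: e["when"])
current = readings[-1]["value"]
assert current == 195

# The trend, which an in-place update would have silently discarded, survives.
trend = [e["value"] for e in readings]
assert trend == [240, 210, 195]
```

This is the same move as reconstructing the current view by replaying operational events: the latest value is derived, not stored destructively.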

Musings with John Schlesinger

John Schlesinger is an event thinker par excellence. So whenever I get the chance, I visit him in London to validate some thinking - or just to spend time with a terrific guy! So on a recent trip to London the subject turned to the rise of event thinking and the downplaying of the traditional SOA patterns. Of course the SOA traditions are being reborn to encompass the events brigade, but because SOA is so broadly and imprecisely defined that's perfectly OK. The SOA hype is over, long live the SOA hype. But that's perhaps a topic for another time.

The key observation from my lunch with John was one I had suspected, but had not been able to frame properly. With a few well chosen sentences John framed it for me.

This is all concerned with orchestration and control. So (deep breath), here goes. Where an event is raised and that event is to be processed by some subscriber, any intent to orchestrate the handling of the event by the subscriber results in a massive increase in complexity. (Roger Sessions will love this!) Naively one starts to think you have the "OK/Not OK" pair of possible responses. But then the "Not OK" responses blossom out of control. We have situations where the "Not OK" response must result in the retransmission of the event (and how does that happen?) and other cases where it must not. We have cases where the originator of the event has to interpret the behavior of the recipient. That sounds like some awfully nasty coupling to me. So instead of the "OK/Not OK" duality from the recipient's viewpoint, what you actually have is an "OK/{set of lots of possible not OKs which the sender has to know about}" multiplicity. In short, that's just crappy design!

Thanks John

Zachman, Frameworks and EA

This post comes out of a quick, but deep, conversation with @cybersal after the first dinner of the architect irregulars twittergroup at Gopals on 20091125. Other members in attendance were: @richardveryard, @taotwit, @Rsessions, @mattdeacon, and @hstrover. As is often the case when a bunch of EAs get together, the subject of Frameworks came up. And whenever we discuss frameworks, the venerable Zachman framework is mentioned. Often with much facial contortion and questions like "How do you actually build it?" or "What are the interesting bits between the rows?"

And then, as @cybersal and I were hoofing it back to Charing Cross - avoiding the crowds where possible - it struck us that the Framework (at least thinking about the titles of the rows) simply gives a context for discussion. You don't really need the columns. So, for example, when thinking about schemas that business services might use in communication, you are working at "row 3". This tells you as much about what you are NOT supposed to be doing as what you are supposed to be doing. It is a really nice shorthand when one is talking to another EA - since EAs have typically all read or heard John Z. So it isn't about using the Zachman Framework as a "Methodology" (whatever that means) but more as a classification system. If you like, a set of membership rules.

Now just because you have a set of membership rules, that doesn't mean you have to have the formal club (and if you are Groucho Marx, "I don't care to be a member of any club that would have me as a member" - but I digress). So, no you don't have to instantiate all the rows of the framework and figure out the mappings between them. However you can say to someone, "Come out of Row 4 and think in Row 3." That is in itself a powerful and useful observation, but doesn't really move EA forward much.

Sunday, November 8, 2009

IT Profession? I think not

Recent tweets from @rsessions, @richardveryard, @j4ngis, and @cybersal have been looking at how hard various professions are. @richardveryard's observation this morning that "@j4ngis @oscarberg Rocket science isn't even particularly complicated. Goes up, comes down. It is rocket technology that is complicated." reminds me of a conversation I had on the golf course with a very good doctor. Let's call him John.
John is, as I have said, a very good doctor. His speciality is anesthesia, but his passion is technology. He is always coming up with schemes to invent solutions to make doctors' lives easier. So much so, that he would probably prefer to do that than what he is trained to do.
So after a particularly inept (actually about normal for us, but inept by anyone else's standards) round of golf, we were trudging wearily back to the 19th hole when John announced yet another good idea - linking wireless technology, handhelds, voice transcription, remote printing.... His question to me was, "How hard can this be?"
My response was something along the following lines.
John, you are breaking my heart. You are essentially saying that anyone without a modicum of training, experience, or expertise, but just with the passion and the idea, can bust into my field and take over. Have you no respect? Imagine the situation being reversed. Be me for a day, and I will be you. After all, how hard can it be to administer anesthesia to a patient? You figure out the necessary cocktail, inject it, and out they go. I can imagine that there might be a few kinks along the way - like making sure that they wake up - but we can leave that to iteration 2. He was, of course, horrified. He asked if I was trying to imply that my chosen line of work was as disciplined as his profession. And for the most part, it probably isn't. The key is I don't work in a profession by any normal definition.
So since we are not in a profession, any enthusiastic amateur can build "cool stuff". Who cares about the error cases? Who cares about the edge conditions? It is all about the app after all. To take a phrase from the movie industry, "We can fix it in post."
Who cares that the patient lives? Who cares that the patient suffers a quality of life decline? In medicine when we have post - it usually means post mortem. There is no fixing it in post in medicine.

Wednesday, November 4, 2009

A rant on "SOA Projects"

The appearance of the SOA Manifesto has led me to look closely at the naming of projects and the implications of such names.
Time and time again I see projects titled or described with a technology or architecture in the name. How often do we hear, "The SAP project failed"? It isn't necessarily because the software doesn't work; there are a host of other potential reasons - all having to do with the human factors. Likewise with "SOA Projects."
The SOA community (huge generalization here) talks about "SOA Projects." Hogwash, I say. There are very few "SOA Projects." There are and should be many projects where the underlying approach is Service Oriented. There are very good reasons for deploying SOA in the enterprise/division or wherever appropriate. The deployment of SOA governance, technologies, etc. might be considered a SOA project, but creating a business application according to those tenets doesn't make that business application a "SOA Project."
Does this really matter? Isn't calling it a SOA project a convenient shorthand? Isn't calling it a SOA Project a convenient way of getting to the right funding bucket? Well, if that's the way the business operates, I, begrudgingly, guess so.
I think it is more insidious than that, however. By putting the technology or architecture top dead center in the name, it gives us an opportunity to make that the primary goal. Rather like hearing the requirement, "I need a database that..." Well databases are fine things, but a requirement that leads us to a "Database Project" again focuses on the wrong things.
So next time your SAP project fails, ask yourself the question, "Is it SAP that failed, or did the business not realize the anticipated benefits for other reasons?" It's easy to blame the technology.

Thursday, October 15, 2009

Technical Debt

There seems to be a theme developing in these posts. I start to see everyday situations and then relate them to everyday happenings - then attempt to do some analysis. Today's is no exception.

Ward Cunningham introduced us to the concept of technical debt in a 1992 experience report. Quoting from Wikipedia: "Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite.... The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise." An excellent follow-on blog post from Steve McConnell expands on the point.

I am going to think a bit more prosaically! I was sitting at the computer minding my own business yesterday afternoon, when from behind me I heard a crash. A picture had fallen off the wall, spreading glass shards everywhere. On closer inspection, I saw that the method of attachment to the wall was inadequate. It was a simple nail, not a proper hanger. What's this got to do with technical debt, you may ask? Well, presumably when the picture was hung, the hanger (perhaps me) chose a quick and dirty solution - a nail. I could perhaps have used the proper fixture - but that may have entailed considerably more "development effort". I would have had to drive to Lowes or Home Depot to buy the fixture, install it properly... you get the point. Oh and of course there was a deadline. We had a party that night and had to have the picture hung.

That was about 8 years ago. Eventually the hanging approach failed and I now have to pay back the debt for my slapdash method of 8 years ago. The repair cost will far exceed the cost of "doing it right in the first place." However, I made the decision to do it the way I did because I didn't have a good handle on what the longer term implications would be and because I had a deadline. These are both key observations.

When we build systems we often don't actually know what the long term implications might be. We don't for example know what "long term" actually means. We often can't explain why doing it right is more cost effective "eventually". I, for example, wasn't prepared to say to madame, "I'll need to go to the store to get the right fixture, hold the party until I come back and put up the picture." Nor was I prepared to say, "let's not have that feature in the house until after the party."

The point here is that we often make conscious choices about the way we do things. These may be suboptimal in the long run - and they will come back to bite us. However we must sometimes subordinate the future to the present to make sure things get done.

What is becoming clearer to me is that we must understand the operational implications of our choices. We need to build our solutions "hard enough". We need explicit statements about the "-ilities", so we can apply the right amount of engineering. Let's make sure we get the failure stories as explicit as the success stories when putting our plans together - when deciding what goes into an iteration. Make sure we incur the technical debt for the right reasons.

Sunday, September 27, 2009

Watching Events

It seems that when thinking about events, we have a tendency to put some of the responsibilities in the wrong place. Of course every time we don't have a proper separation of responsibilities, we get extra complexity. So in this post I will look at some of the issues around the responsibilities and see where they should be allocated.

Short political rant that can safely be ignored now. Why is health care insurance (in the USA) handled essentially through employers and employment contracts? They simply don't belong to each other. The time base is wrong, the administration is wrong, the result is wrong.

End of rant!

I think a little naming context would be helpful here. This will be greatly simplified - just so that the essence is laid bare.

E is an event publisher
E has an output queue onto which it publishes. This is called EQ.
E publishes events of type V1, V2, V3 and V4. Individual events are named v1-1, (the first event of type V1), v1-2 (the second event of type V1), etc.
C1 is a consumer of events - it can consume events of type V1, V2
C2 is a Consumer of events - it can consume events of type V2, V3, V4
C3 is a consumer of events - it can consume all kinds of events.
Each consumer has an input queue - C1 has an input queue C1Q, C2 has C2Q, C3 has C3Q

The event network behaves as follows:

E publishes v1-1. The event network must transfer v1-1 to C1Q and C3Q. C1 and C3 are now able to process the events by reading their respective queues.
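The routing behavior described above can be sketched directly from the naming context (the subscription sets are exactly those given for C1, C2, and C3):

```python
from collections import deque

# Subscriptions by event type, as defined in the text:
# C1 takes V1/V2, C2 takes V2/V3/V4, C3 takes everything.
subscriptions = {
    "C1": {"V1", "V2"},
    "C2": {"V2", "V3", "V4"},
    "C3": {"V1", "V2", "V3", "V4"},
}
# Each consumer's input queue: C1Q, C2Q, C3Q.
queues = {name: deque() for name in subscriptions}

def publish(event_type, event_id):
    # The event network's whole job: copy the event to each interested queue.
    for consumer, types in subscriptions.items():
        if event_type in types:
            queues[consumer].append(event_id)

publish("V1", "v1-1")

assert list(queues["C1"]) == ["v1-1"]  # C1 subscribes to V1
assert list(queues["C2"]) == []        # C2 does not
assert list(queues["C3"]) == ["v1-1"]  # C3 takes everything
```

Note that `publish` knows nothing about what the consumers do with the events - which is exactly why none of E, C1, or C3 can own the failure cases discussed next.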

Thinking about the outcome possibilities (assuming E was successfully able to publish v1-1):

Both C1 and C3 receive the event v1-1.
C1 receives the event v1-1, but C3 does not.
C1 does not receive the event v1-1, but C3 does.
Neither C1 nor C3 receives the event.

Question for the reader: where should the responsibility lie for recognizing that one of these conditions obtains, and taking action? The possibilities are:
(a) E
(b) C1
(c) C3
(d) none of the above

My answer is (d) - none of the above. It isn't any of these players' responsibility to do this. E's job was to publish the events. C1 and C3's jobs are to handle the events on behalf of their scope. But C1 and C3 are autonomous, so they can't be dealing with each others' failures.

So if it is (d), then there must be some other participant - yet to be identified - that takes the responsibility. That something else is a proxy for the overall business policy. It needs to exist independently of everything else - and therefore needs to have the proper information fed to it.

Now imagine that C1 in some sense "fails" - it completes, but delivers an exception condition. If that condition is serious enough then something has to know. So it would be sensible for C1 to notify something of its own outcome.

Likewise C2 and C3.

So we have a kind of triplet of notifications (which expand into greater complexity for real sized problems).

E says, "I sent v1-1"
C1 says, "I got v1-1"
C1 says, "I handled v1-1 with a normal result"
C3 says, "I got v1-1"
C3 says, "I couldn't do v1-1 - I failed because of a business problem."

So we could then implement a policy rule about what to do under these circumstances. That rule could of course inject new events into the "system." That however is a subject for another time.
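A sketch of that independent policy proxy, with a deliberately toy policy rule (all names invented): it simply collects the notification triplet and decides which consumers need attention.

```python
# A separate observer - none of E, C1, or C3 - collects the notifications
# and applies the business policy to the outcome.
notifications = []

def notify(who, what, event_id, detail=None):
    notifications.append(
        {"who": who, "what": what, "event": event_id, "detail": detail})

# The notifications from the text:
notify("E", "sent", "v1-1")
notify("C1", "received", "v1-1")
notify("C1", "handled", "v1-1", "normal")
notify("C3", "received", "v1-1")
notify("C3", "failed", "v1-1", "business problem")

def policy_check(event_id):
    # Toy policy rule: flag any consumer that failed to handle the event.
    # A real rule might inject new events into the system instead.
    return [n["who"] for n in notifications
            if n["event"] == event_id and n["what"] == "failed"]

assert policy_check("v1-1") == ["C3"]  # the policy proxy now knows about C3
```

The key design point survives even in the toy: E, C1, and C3 only report their own outcomes; none of them interprets anyone else's behavior.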

Bottom line: we now have a three part system for handling events. There is the standard pub/sub behavior (and don't get too hung up on the specific technologies). There is the ability to send out content references using a mechanism that is not part of the event channel (quite RESTful really), and then there is the ability to act on non-receipt or improper handling of the event by one of the event subscribers.

A nice compact model that separates concerns, so that the individual components can be focused on their own responsibilities and not prying into each others' business.

A new twist on the Taj Chaat process...

Those who have seen the previous post on process improvement at my local Indian chaat restaurant will probably be intrigued by yesterday's twist.

We were buying some appetizers to go. In this case 2 samosas and an order of golgappe. Normally the process (at least for dining in) is to write the items on a form, be assigned a number and given the vibrating pager to let us know when the food is ready. Collect food, eat, check out by handing over the pager so they can locate the items to be billed from an accordion file.

For take-out, the process is very different. Again I fill out the little 2-part form. But before handing it over to central ordering, I take both parts of the form to the cash register. I pay for the order. Both parts of the form are stamped paid. I take the form back to central ordering where I am given a number and a vibrator. The order is prepared, the vibrator goes off and now what? I go to the station to pick up the order, but since the collection point of the vibrators is the cash register, how do I return the vibrator?

This is made harder because the worker at the cash register is a different person from the one I paid. So there is no session state anywhere. I have a vibrator that has vibrated and a cashier who expects to take cash. She can't find the form parts in the accordion file.....

So why is this important?
Again, seeing how non-IT organizations think about the processes that run their own businesses makes us aware that we shouldn't be overengineering. There are exceptions - things that don't follow the normal path - that we just have to work around. Each workaround decreases efficiency, but if they are sufficiently rare, the overall efficiency is barely affected. However, if the workarounds do become cumbersome, expect them to be redesigned - ideally without negatively affecting the frequent path activities.

Oh, and by the way yes the golgappe and samosas were worth it! The whole bill was $6.56!

Processes and events

Another day of talking to Nigel Green - thank you Skype! And some thinking around processes and their relationship with events. Again it sounds innocent - but it seems as if both of us, strongly event-oriented thinkers, come to common ground when thinking about processes and orchestration - namely that while you might use low level messaging semantics for implementing processes, event modeling doesn't really help when trying to model processes. However - and here the lightbulb began to flicker dimly - the result of executing a process or process step can become a source of events.

We chose an example from the airline industry - and from our experiences as travelers, not from any great insights into the internals of the business. The focus was the check-in process at the airport itself.

Clearly we see interesting policies at different places and for different carriers. For example at Mumbai (at least the last time I went through there), they seal your bag with some kind of security strap, so it can be seen whether the bag has been tampered with. That is less common in the USA. However, at Miami International Airport when I went through a month or so ago, I saw the ability to wrap the bag in a kind of cling wrap. I presume that can be done elsewhere too. That is all by way of background.

Airlines nowadays can and do charge fees for checking baggage. All of the rules require that checked bags undergo a security check. Bags are subject to weight limits. Passengers are subject to bag limits (no more than n per passenger). Ignoring further complexity like whether actually to collect the fees (elite passengers are exempt, for example) there are some quite interesting process decisions to be made.

If the airline chooses to impose the bag fee as soon as the passenger offers the bag for check-in, then there may be some undo logic if the passenger decides not to check the bag after all. It is worse though. Imagine that the bag is too heavy. When is that discovered? For example, if the passenger has checked in (and been charged) at a kiosk, then on presentation of the bag it is discovered to be too heavy, so an extra collection is made - and that could make the passenger decide not to check it after all, so a refund is in order. Or perhaps the passenger may decide to open it and remove some of the heavier items to get it to the correct weight. That's fine - but what if it has already been secured with tape or wrapped in cling wrap? Kinda tricky to undo...
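One common way to model this kind of undo logic is a compensating-action stack (a saga-style sketch; all names and the $25 fee are invented). Each forward step registers its own undo; backing out replays them in reverse. The cling-wrap step is exactly the awkward case: it has no cheap compensation to register.

```python
# Each forward step registers a compensating action; backing out replays
# them in reverse order.
compensations = []

booking = {"fees": 0, "sealed": False}

def charge_bag_fee():
    booking["fees"] += 25
    # The fee has a clean undo: a refund.
    compensations.append(lambda: booking.update(fees=booking["fees"] - 25))

def seal_bag():
    booking["sealed"] = True
    # Deliberately no compensation registered: once the bag is wrapped in
    # cling wrap there is no cheap undo - the "kinda tricky" step above.

charge_bag_fee()
seal_bag()

# Passenger decides not to check the bag after all:
for undo in reversed(compensations):
    undo()

assert booking["fees"] == 0       # the fee can be compensated...
assert booking["sealed"] is True  # ...the sealing cannot
```

The asymmetry in the asserts is the point: process design has to decide where the non-undoable steps sit relative to the refundable ones.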

Then there is security. Another opportunity for the passenger to open the bag and remove things if they should not be in checked baggage. And so it goes.

Different airlines and different jurisdictions will implement the Policies - "maximize revenue for bags", "make sure the passengers' possessions are safe", and "transport passengers safely" with different process paths. Those paths need to be orchestrated. It isn't clear how an event network will really help that orchestration. In fact I would go so far as to say it complicates it. However at each step of the various process steps (or sub-processes), it would be very useful to spit out an event that provided useful (possibly actionable) information to trigger some other behaviors.

For example, if during the security screen a weapon were found, we would expect an event to be raised to trigger a whole raft of other processes. We would be jumping outside one process domain into another - from airport behaviors to criminal behaviors. So that looks like a terrific event.

Even the mundane events may be interesting to somebody. That a passenger decided not to check the bag after a fee was assessed can be helpful when looking at the behavior as a whole for market and planning purposes. Opportunities for process improvement abound.

The weight of the checked luggage is also useful for "weight and balance" on the aircraft. Necessary so proper takeoff parameters can be computed, proper fuel calculations can be performed, etc. So the event raised as a result of successful baggage check-in is quite a handy event to have.

So bottom line, it seems from this (and a whole raft of other possible examples) that we will typically see events being generated as a result of a process step happening - at least when in a process.

Of course there are lots of other ways of generating events, we don't need to formalize processes so that events can be generated. Relatively random behaviors give rise to events too.

Sunday, September 13, 2009

Event and Content

This conversation with Nigel started innocently enough. Two people who have very similar views talking about architecture in the large. We described our current problem spaces - they looked very similar, but as is often the case we used the same words to mean different things. Of course once we realized this we got back on track quickly. I'll write this from my perspective, but in reality it would be better written by a dispassionate observer.

I set out to describe the separation that I am seeing in VPEC-T, especially around P-E-C. In my definition, I was thinking of Content as being everything about the Event. I was merrily proceeding down this path, thinking that the view was shared, until Nigel spoke. We realized simultaneously that the word covered several concepts. There is the "stuff" about the event itself - the event properties like when it happened, what kind of event it is, what channel it was communicated on - essentially a kind of tag cloud for the event. I suppose in the current vernacular this might be the event meta-data. There is also the "state" that exists as a result of something doing a piece of work and generating the event. That may well need to be made available somehow. So for example, in my sales system the "event" of a sale being made has something useful like when it happened, however there is a whole lot of other information like who the parties to the sale were, what the value of the sale is, who the sales team is, what was sold... - in other words the "business document" representing the state as perceived by the event emitter.

In the Chris world, the tags (event meta data) were part of content. In the Nigel world they aren't part of content per se. Actually that is unfair, they aren't ONLY content. In other words, the event meta data can easily travel with the event. But the state information typically won't.

Reversing that point of view for a second, we get the notion that there is definitely a content "store" somewhere that the processor of the event might go to get the contents, and an event store - somewhere the event data would be stored. The content store doesn't have to be explicitly created by the event creating component, it could equally be some external source (e.g. a weather forecast). The event store's job is to store the events (duh) so that they can be replayed for a variety of reasons:
  • An event processor has been out of the loop for an hour and wants to catch up: "get me all the sales events for the last hour"
  • A simulation wants to be done using some "live" state. What would the impact be if we had decided to close this airport at 2pm yesterday? So inject a new event into the event store and then replay forward with the existing events after that point. Of course there will quickly be divergence from the reality as processed, but that is OK. We have been able to run the simulation from a known starting point.

Considering the second bullet above, it is quite convenient to use the same semantics for processing "missed events" as for processing events in real time. A kind of rule: never pass the content with the event, always pass a reference to the content, but make sure that the event tag (meta-)data do come with the event. Make the event store readable so that the event sender isn't responsible for hanging on to the event, or for remembering who has received it and who hasn't.
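That rule can be sketched in a few lines (stores, names, and the sales data are all made up; sequence numbers stand in for timestamps): content goes to a content store and only a reference travels with the event, while the readable event store lets a consumer replay what it missed.

```python
event_store = []    # every event, with its tag (meta-)data, kept for replay
content_store = {}  # state documents, fetched by reference, not sent inline

def emit(event_type, content):
    ref = f"doc-{len(content_store)}"
    content_store[ref] = content              # state goes to the content store
    event_store.append({"seq": len(event_store) + 1,
                        "type": event_type,   # tag data travels with the event
                        "content_ref": ref})  # content travels by reference

def replay(after_seq):
    # A consumer that was out of the loop catches up by reading the store
    # itself; the sender never has to remember who has seen what.
    return [e for e in event_store if e["seq"] > after_seq]

emit("sale", {"parties": ["us", "Acme"], "value": 100})
emit("sale", {"parties": ["us", "Widgets Inc"], "value": 250})

missed = replay(after_seq=1)  # a consumer that had only seen event 1
assert len(missed) == 1
assert content_store[missed[0]["content_ref"]]["value"] == 250
```

The same `replay` path serves both the catch-up and the simulation cases: inject a hypothetical event and replay forward from a known starting point.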

At the end of our time together on this call we were, as expected, in agreement that for a proper separation of concerns, we should treat content as being the state data, and the event store as the place where the events and associated metadata are stored to allow for retries and simulations (among other things).

The Policy part of the call will come next.....

Friday, July 10, 2009

Signal to Noise

This posting is sparked by a conversation between Nigel Green and me. We were having a spirited discussion about choices in following people in Twitter. But the concept generally applies in almost all event based architectures.

The fundamental principle is that it has to be somehow worth it to sort through the whole mess of communication to get what you need or want. Thinking in interface architecture terms, a point-to-point interface has very high signal with very little noise. However, tailoring the communication channel is very expensive - at least for the originator of a message. Similarly a total mind dump of everything you are thinking about at a particular moment can have a very poor signal to noise ratio when viewed from the perspective of someone trying to decode the message.

I have used the example of the Christmas Letter before - the letter that contains all the family news and is sent to every acquaintance. For the originator of the message, it is a very efficient way of communicating stuff. No thinking about what each individual subscriber cares about - let the subscribers figure it out. If there is enough signal in the letter, then filtering through the noise is worth it. If not, it goes quickly to the recycling bin.

Likewise in social media. As a twitterer and as a blogger I don't know what individual followers are going to want. As a blogger, I don't even know who the followers are. And I don't want to. If people find that the signal is rich enough they will follow; if they don't they won't. A very efficient way of information delivery – and a very handy simplification technique in event driven thinking. The basic question is, "Is the signal to noise ratio high enough that it is worth me continuing to listen on this channel, or are there other channels with better ratios?" As a subscriber I can often make that choice.
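That subscriber-side decision can be sketched as a simple ratio check; the threshold and message counts below are invented for illustration:

```python
def keep_following(relevant_msgs: int, total_msgs: int, threshold: float = 0.1) -> bool:
    """Stay subscribed only while the signal-to-noise ratio beats a threshold."""
    if total_msgs == 0:
        return True  # no evidence yet; give the channel a chance
    return relevant_msgs / total_msgs >= threshold

# A channel where 3 of 50 messages were useful falls below a 0.1 threshold.
print(keep_following(3, 50))   # unfollow
print(keep_following(12, 50))  # keep listening
```

The point is that the cost of filtering sits entirely with the subscriber, which is exactly what makes broadcast channels cheap for the publisher.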

Twitter is such a great example, because the space is flat. When I tweet, it could be on any topic that I find interesting and relevant. To me, the twitterer, it is all signal. To you, the follower, it is probably mostly noise. If there are occasional useful signals you will continue to follow; if not you won't. Attempting to put some kind of ontology over Twitter defeats the purpose because the publisher has no clue which permutations of tags are relevant to any given listener. I regularly unfollow twitterers when the signal to noise ratio degrades too much.

Sunday, June 21, 2009

Real Life Caching

The thought behind this post comes from the rich pageant of daily life with Madame. Yesterday morning we were discussing when she should visit her mother (a sprightly 96 year old who lives about 1,000 miles from us). So it was a matter of looking for decent airfares, timing the visit to coincide with my absence, minimizing the flight time (it always requires a change in either Chicago or Detroit)… What, you may wonder, is relevant about this to architecture? It all has to do with the location of the process, and the location of some of the data required to complete the process.

The computer was in her home office – up a flight of stairs from the main living areas of the house. Her purse was down in the kitchen. She had dutifully retrieved her credit card from her purse prior to starting the transaction (pre-cached the data), and carried it upstairs. Because of where I work, I almost always try to ensure that bookings are made on Travelocity – keep all possible revenue in the family.

So the process goes fairly smoothly – Madame is using my Travelocity account to make the booking (easier than creating a new one at that moment). Of course this would be a security violation, so let's say, I was acting as the operator for this purpose! We get all the way through the process (well almost) when we realize that her frequent flier number is not in the account anywhere. Also that data has not been cached anywhere (not in her head, not by bringing the physical card to the point where the transaction is being executed.) So an IO request is issued – resulting in the slow IO processor (me), walking from her office, downstairs to her purse, finding it, and returning the whole purse. Why the whole purse you ask? One reason is privacy – it is hers and I don't access its contents without permission (PII rules). Second is that I don't know what other esoterica might be needed, so I want to save my weary legs – and not have to make another trip to the purse data store (which by the way is a heap storage model).

The IO request is completed. Madame parses the purse data structure until she finds the relevant card, she then types in the number and completes the transaction.

As always with these stories there are a couple of questions:

  • Would it have been more sensible for her to bring the whole purse data structure instead of just caching the credit card? In hindsight, yes. But how would she make that determination? The stairs are steep and it is a long way. The purse could be heavy (too much payload) and unwieldy.
  • Should I have brought the whole purse or just the needed card, or better still a lightweight interpretation of the number (she didn't need all of the information on the frequent flier card)?
  • Does it make sense to cache the whole content of her purse in any room where she might execute transactions? After all with wi-fi that could be anywhere, including outside the firewall (i.e. in the garden or yard)? What about security? The gardeners and the maids (would that we were so lucky!) would then have access to the cached data. That could be troublesome.
  • Is the network transport reliable? Could I have been distracted in the bringing of the data back upstairs – through answering the telephone, needing some refreshment prior to attacking the stairs, pure absent-mindedness or what?
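The purse story maps onto a familiar fetch-granularity tradeoff: each cache miss costs a trip downstairs, but fetching the whole purse carries a heavier payload. A hedged sketch, with all names and data invented:

```python
class Cache:
    """Fetch-granularity tradeoff: one item per miss, or the whole 'purse'."""
    def __init__(self, backing_store: dict, fetch_whole: bool = False):
        self._store = backing_store       # the purse downstairs (slow IO)
        self._fetch_whole = fetch_whole
        self._cache: dict = {}            # what we carried upstairs
        self.trips = 0                    # each miss costs a trip down the stairs

    def get(self, key: str):
        if key not in self._cache:
            self.trips += 1
            if self._fetch_whole:
                self._cache.update(self._store)      # bring the whole purse
            else:
                self._cache[key] = self._store[key]  # bring just one card
        return self._cache[key]

purse = {"credit_card": "4111...", "frequent_flier": "FF123", "library_card": "L9"}

item_at_a_time = Cache(purse)
item_at_a_time.get("credit_card")
item_at_a_time.get("frequent_flier")
print(item_at_a_time.trips)  # 2 trips: one per miss

whole_purse = Cache(purse, fetch_whole=True)
whole_purse.get("credit_card")
whole_purse.get("frequent_flier")
print(whole_purse.trips)  # 1 trip: heavier payload, fewer trips
```

Neither policy is right in general; as in the story, it depends on how steep the stairs are and how heavy the purse is.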

Of course the point isn't to think about home processes much – we will do things that seem expedient without a whole lot of thought. But when designing systems, we have to think quite carefully about these kinds of things. By putting them in this sort of context, it can sometimes help to think things through.

And, oh yes, this is not really architecture – it is design. However, there are patterns that emerge, and they become important aspects of architecture.

Friday, June 5, 2009

Architecture on the Product Side

In a previous post, I wrote about the difference between the application of technology to the building of products, vs the application of technology in running the business. They are pretty different animals, if nothing else because of the way they are deployed.

So, I get to wonder what the head of architecture for the product side of one of these companies does. I can take one of the architecture frameworks and apply it – but to what? Individual products (each with its own P&L, development teams, practices, and degree of product maturity)? Across the whole product organization itself, so that we can begin to standardize and use the same services across the product suites? And the products can be deployed in several ways (operated in-house and sold as an ASP, on-premise at the customer, on-premise hosted by one customer and used by another, …). So as the products are deployed we have to think about how the platform affinities work.

The customers want to customize the way the work is performed using the products/services that they have bought. So somehow we have to think of ripping the process logic out of fixed applications.

There are cross product dependencies as well. So sometimes we have to rip code out of one product and insert it into another.

Oh, and the up time for these products must be in the 99.999% range – the products handle very time/life critical activities.

And finally a legacy that dates back a long way – pretty much bulletproof, but definitely has its own way of presenting itself to the world.

This Architect's world feels very different to me than the "traditional" EA world where the need to build for wide deployment and customization is less important.

Any thoughts anyone?


Thursday, May 21, 2009

Offbeat Integration Approaches

I am collecting humorous terms for different kinds of integration. I will seed the discussion with a few, but would be interested in having others added, with descriptions, in the comments to this post. It would be delightful to have a large glossary of terms.

I'll start the ball rolling with some oldies but goodies.

SneakerNet (n, vt) Delivering (usually) files from one place to another on some form of removable media. The sneakers, of course, refer to the need to get it there quickly. Often used to bypass security or other limits, e.g. email attachment size limitations, one system unable to connect to the same network as another system, or file type limitations. Usage: "I can't email this to you. Have you got a thumb drive? I'll sneakernet it to you."

FunnelCast Delivering information to a large number of people by shouting through a megaphone. The most common means of getting data from the page of a professor's notebook to the page of a student's notebook.

Swivel Chair I first heard this from my colleague Adrian Apthorp. The need to take data from one system and enter it into another can be solved by sitting on a swivel chair so you can switch between the screens with minimal effort.

Have at it and exercise your creativity.


Friday, May 8, 2009

A gedanken on identity

This post arises from conversations (over many years) with John Hall, the late Keith Robinson, Bob Brown, Keri Healy, Nigel Green, Richard Veryard and Fred Fickling. It has to do with how we identify things, immutability and, most recently, REST. So first the story (originally related to me by John Hall):

Jason has just commissioned a wonderful boat (the Argo) so he could have adventures in the Mediterranean. He recruited a crew of likely volunteers and set sail in search of (among other things) the Golden Fleece. This was not a short voyage – in fact it lasted many years and they had many adventures – none of which are relevant to this gedanken. Like all boats (holes in the water into which you throw money), Argo needed repairing quite often. So every winter, Argo would be taken to a boatyard and refitted. Old parts were replaced, holes patched, etc.

After several years of this, there were no original parts left on the Argo at all. Everything had been replaced, even down to the smallest dowels. Question 1. Is this boat the "same" Argo as the one originally built? Certainly the crew, the local registrar of ships and taxation authorities would think so. After all, there was no need to reregister it (her, for lovers of boats). However, we know for sure it isn't the same, in fact there isn't a single original component on the boat. So, in an information system, what is the "identity" of the Argo? Does it depend on which information system (registrar/taxation/crew sign-up, repair management for example) we are thinking about?

What we didn't know is that the wily repair shop had kept all the old parts and had been secretly reassembling them into a boat. When the final wraps came off this new boat, the shop announced that this was the real Argo, and that it knew the other to be a fake. Question 2. Which is the "real" Argo? Certainly this depends on who wants to know and why. The registrar of ships might well take a pragmatic point of view and decide that the repaired Argo was the "real one" and that the rebuilt one is "something else". Of course the fine piece of woodwork on which the boat's name was carefully inscribed says Argo in each case.

Question 3 (a-f and beyond). Which of our various lenses can/should we be looking at this through? The Checkland CATWOE approach becomes an important set here because the Weltanschauung (loosely translated to be the worldview) partially determines that. I suspect that the Weltanschauung is the overarching concept that allows us to choose our lenses. So what does VPEC-T say here? What does Cynefin say here? What does Systems Thinking have to say? What do Value Networks have to say?...

Question 4. Why bother? This gedanken on its own is a fun mental exercise, but there are some underlying issues that crop up time and again in our real world systems. As we start to look at the world through RESTful lenses, we come head-on into representations and the state transitions of representations, and making a clear distinction between the thing and a representation of it. So we have the Argo – and then some systematic representations of Argo. When an event (a repair perhaps) happens to Argo, we may decide to change the state of the Argo representation. Or some system may decide to, while another may not. For example crew scheduling for Argo may care less about repair events than registration. So is there just one representation of Argo (with one URI) or are there many representations of Argo?
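The idea that each system keeps its own representation of Argo, updated only by the events it cares about, can be sketched roughly like this (all system names and event kinds are invented for illustration):

```python
# Each system keeps its own representation of Argo and its own policy
# about which events change that representation's state.
interests = {
    "registration": {"registered", "renamed"},
    "repair_management": {"part_replaced", "hull_patched"},
    "crew_scheduling": {"crew_signed", "voyage_started"},
}

representations = {system: {"name": "Argo", "version": 0} for system in interests}

def publish(event_kind: str) -> None:
    # Broadcast the event; each subscriber decides whether it cares.
    for system, cares_about in interests.items():
        if event_kind in cares_about:
            representations[system]["version"] += 1

publish("part_replaced")   # only repair_management updates its state
publish("part_replaced")
print(representations["repair_management"]["version"])  # 2
print(representations["registration"]["version"])       # 0
```

There is no single "true" representation here, just one per concern; which one counts as "the" Argo depends on who is asking, exactly as in the story.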

As we wrestle with the knotty problems of identification we are forced to confront our views of history – what has happened in the past to the things we care about; how we can or cannot change other people's views of the same thing (the event streams we care about when changes are made are different from other people's). We have to worry about placing ourselves at a point in time and asking questions like, "If I were asking the question last January, what would your answer have been?" (very useful in market research, courts of law, etc.)

At the bottom of this is really the "who cares about what?" question and how, if we attempt to create universal data models and databases of things, we are doomed. Perhaps the best we can do is to keep track of the Event and Content streams as they apply to Policy, manage our own representations of things and broadcast our state changes against those representations, pushing the responsibility onto the "subscribers" to decide whether they care.

Wednesday, May 6, 2009

Services, SOA and Web Services

This article (Can SOA Give You Good Service) identifies some of the troubles with words when dealing with all the Service terms.

I found it a very helpful article indeed because it helps keep Service and SOA on the straight and narrow. Nice call out of the Web Service vs Service.

Interesting observation that "A service is a logical representation of a repeatable business activity that has a specified outcome (e.g. check customer credit; provide weather data; consolidate drilling reports). It is self-contained, may be composed of other services, and is a "black box" to its consumers."

Nowhere in this definition is there any requirement to get an answer back, except maybe a status of "yup, I've done that" or "no, couldn't do it, sorry". For sure a result could come back. For sure some system state may be changed (perhaps because of a side effect), but the key is that it is a black box. Where we do see some confusion is in granularity. I occasionally hear the word operation being applied to Services.

So in your example "Check Customer Credit" above, is that really a Service invocation, or is it an invocation of an operation on another Service (perhaps a Customer Management service)?

Just dealing with the granularity would help me in understanding the kinds of numbers that are bandied about at conferences: "We have over 3000 services" vs "We have 40 services (and 3000 operations)". Is that simply a packaging issue? Coming back to the whole WS-something discussion, you make a telling point that "A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards".

First up this says machine-to-machine. And while the computer at which a user is sitting, operating the browser is most definitely a machine, I am not convinced that WS-something is the most appropriate way of managing that interaction. It may be, but the point is not proven. Secondly, it is not completely clear to me when services actually need to be exposed (assuming that we have defined service correctly). So how much WSDL wrapping do I do? And why? In some ways - especially for internal code, WSDL wrapped service invocation is just really expensive function or subroutine calling. So even though we can define what a service is, it is harder to define what a service isn't, and harder still to choose the right architectural approach to allow for machine/machine interoperability, and human/system interoperability.
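The granularity question (is "Check Customer Credit" a service, or an operation on a service?) can be made concrete with a rough sketch; both class names and the operations are purely illustrative, not from any real system:

```python
class CustomerManagementService:
    """One service exposing several operations ('40 services, 3000 operations' counting)."""

    def check_credit(self, customer_id: str) -> str:
        # Black box to consumers: only the outcome is promised, not the mechanism.
        return "approved"

    def update_address(self, customer_id: str, address: str) -> str:
        return "done"

class CheckCustomerCreditService:
    """The same business activity packaged as a stand-alone service
    ('over 3000 services' counting)."""

    def invoke(self, customer_id: str) -> str:
        return "approved"

# Same repeatable business activity, same outcome, two packaging choices;
# the number you report at conferences depends on which level you call a 'service'.
svc = CustomerManagementService()
print(svc.check_credit("C-42"))
print(CheckCustomerCreditService().invoke("C-42"))
```

Seen this way, the "3000 services" vs "40 services" gap really is largely a packaging decision, which is why the raw counts say so little on their own.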

Thursday, April 30, 2009

More Fun(?) with words

Sometimes the TwitterSphere is just too constraining to get a thought across fully. I naively posted a question about the definition of "Application" – to see what would come back. It turned into a delightfully healthy discussion with some twists and turns along the way. It isn't every day that Duns Scotus and Humpty Dumpty show up in the same post – at least not unless Richard Veryard (@richardveryard) is involved!

So a good question is "Why do you want to define application?" My answer is actually that I don't want to DEFINE it, I just want to know what people might mean when they bandy the term about. As Nigel Green (@taotwit) and I chatted this morning at length on this and other topics, the recurring theme was, "I'd like a quick way to parse a conversation." In other words, when I am talking to someone and trying to understand what it is they want (requirements anyone?) I would like to know their frame of reference so that we can communicate.

How often have we heard requirements that say things like, "I want a database that …." Actually, I suspect it isn't usually the database that the requestor wants, it is some way of manipulating the data with a purpose in mind. Maybe, even, an application (gasp).

So just as in a previous post where I was wondering about type/instance nomenclature, so here I am wondering about other opportunities for miscommunication.

Richard makes an interesting point, "@seabird20 If something persistently escapes or evades precision, then maybe it was the wrong concept in the first place." That may well be true, but we can't unbreak the egg. The words are out there, their meanings are many, we can't (and shouldn't) attempt to unify the vocabulary (even the Académie française has stopped trying to keep French completely pure – le weekend anyone?)

So this is not a cry for definition and ontology for its own sake. It is a means to collect lots of definitions so we can understand each other better. Of course, the more we share context, the more "shorthand" we can use to express ourselves – because we have either tacitly or explicitly agreed to the vocabulary. It is in the "getting to know you" stages where shared context and trust are established. It is at those early stages where slight misunderstandings can blossom into full-fledged disagreements and a loss of opportunity to trust.

Sunday, April 19, 2009

Process Improvement and the Chaat Cafe

My favorite Indian restaurant is a 20 minute walk from my house. It has the most wonderful Indian snacks/appetizers and of course kulfi. It is also a thoroughly confusing place. I will try to describe the changes in process that I have seen over the last 18 months – since it opened.

First, some background. It is set up with three major "stations" where things are cooked. The flat top where the dosas, parathas and other flatbread dishes are prepared. The drink area (lassis, etc.) and the chaat area (samosa, golgappe, etc.)

Version 1.

You wander into the restaurant, decide what you are going to have and fill out a 2 part form in front of the station. Hand it to the cook. The cook makes the dish and when ready calls out over the speakers. You go and collect it from the counter. When time to pay, you take the white parts (tops) of the 2 part form to the register. They total up the bill, you pay and leave. Problem – this didn't scale well because we could never hear the loudspeaker announcements. Well we could hear that there was an announcement, but not what was being said.

First Process Improvement

Numbers were attached to each table. Your table number was required to be placed on the bottom of the form. The problem then was you had to select your table prior to placing an order. How do you keep it while ordering? Can someone else grab it? Much chaos ensued. The advantage was that the loudspeaker announcer just had to call the table number. You still handed your forms to the cooks, etc. There is still the possibility that unscrupulous diners would forget the white chits or (as I have to admit to having done) throw one away by mistake when clearing away the dishes. So my guess is that auditing showed some yellow chits (what the cooks did) which didn't have matching white chits. And no way of running down the offenders until much too late. Everyone seems to pay cash there!

Second Process Improvement

Technology was introduced! Central order taking was introduced. So now, the restaurant has done away with the need to grab a table before creating an order. It has introduced a central ordering location. The forms, however are essentially the same and are not at the central location. The forms are still in front of the cooking stations. So you fill the form out at the cooking station and carry it to the central ordering facility. There (if you have not already been given one) you are given a vibrating pager. Presumably local in-restaurant network only. This is similar to the pagers often used in restaurants to tell you when your table is ready. When you place the order (on the 2 part form) you enter the number of the pager device. Now when each part of your order is ready, the pager device goes off. As you collect the dish, you need to stop by central ordering, so they can turn the thing off. If you fail to do that, it keeps vibrating and you have no indication when another part of your order is ready. Also, unknown to the first time user, the device will be used for computing the bill, so if you are looking for individual receipts, you need one vibrator each.

When you go to pay, you hand your vibrator to the checkout clerk. This automatically brings up the bill. I haven't seen the behind the scenes magic that does this yet. They also have an accordion file into which they put all the yellow copies into the slot numbered with the same number as the vibrating pager. You pay and leave…

So what you may ask?

This is an interesting exercise in changing processes and system state knowledge to adapt to conditions. In the first 2 iterations all the state was pushed to the diners (white copies) and that state information was used for "billing" and "payments". In the final case we see an audit database (the accordion file) and a shared key (vibrator number and slot number).

So by looking at what was causing trouble (what Policies and Values were not being dealt with correctly), the restaurant owner instituted quite significant changes that make the process of ordering/getting the food/eating it/paying for it (the order to burp process) a great deal more streamlined. Very little automation needed, almost no effect on the primary Value of the place – "Somewhere that serves high quality, tasty, vegetarian chaat with a flavor of home."

It has been fun to watch this humble restaurant put so much thought and effort into helping itself run smoothly and maintain a great connection with its customers. Would that we in IT be as effective.

The trials of pedantry – aka type/instance confusion and other adventures in meaning

"I had Taco Bell for lunch today". I keep hearing that usage from co-workers, students, random people in passing. At some level it makes complete sense – I get it that the person speaking ate food obtained at a Taco Bell "restaurant" (why anyone would do that is beyond the scope of this posting). However, compare and contrast that with, "I had salad for lunch today." The second describes the food (pretty generically), the first describes the place where the food came from.

Now imagine you don't know that Taco Bell is a food chain (and maybe you are not doing analysis in your primary language). What assumption would you make about the statement, "I had Taco Bell for lunch today"? I think the first reaction would be that "Taco Bell" would be classified as a kind of food, not a kind of place. And because that is a first impression, it would set a context that could be hard to shake.

As an aside: while I was working in France a couple of years back, the nearby pizza joint was called "Speedy Rabbit". A project team member suggested we have "Speedy Rabbit" for a working dinner – and that I, being the project manager and therefore an impediment to progress, should go and fetch it. Since in that part of France rabbit is often served, and often really tasty, I had visions of having to chase this thing down in my underpowered rental car. Turns out it was a pizza joint and that it was a simple pickup. Not a rabbit in sight. Oh well.

As architects/analysts we often find words or phrases that are used in surprising ways (idiomatically one might say if one were being charitable). The insiders to the group get it, understand the meaning, know when the statement refers to Taco Bell the company HQ, the Taco Bell on the corner, the specific lunch (without knowing the actual details of the meal). So we have to be careful when we are faced with words like "office" or "transaction". We need to know the context that the speaker is coming from and to be able to decouple our understanding of the word from the understanding that the speaker means.

How often have you heard statements like "we have 7 transactions in our banking system"? My first reaction is, "wow, those must be huge transactions," or "this bank is going under soon." And then I realize that of course it is transaction types or kinds of transactions, not the instances. Once again context rules. If you ask the question, "How many transactions did you do today?" you may well get an answer of 100 million. If you ask the question, "How many transactions can be executed?" you might get the answer 7.
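The type/instance distinction in the banking example can be sketched directly; the seven transaction types and the day's sample below are invented placeholders:

```python
from collections import Counter

# The 7 transaction *types* the system can execute...
TRANSACTION_TYPES = {"deposit", "withdrawal", "transfer", "balance_inquiry",
                     "statement", "stop_payment", "fee_reversal"}

# ...versus the *instances* executed today (a tiny stand-in for the 100 million).
todays_transactions = ["deposit", "withdrawal", "deposit", "transfer", "deposit"]

print(len(TRANSACTION_TYPES))    # "How many transactions CAN be executed?" -> 7
print(len(todays_transactions))  # "How many transactions DID you do today?" -> 5
print(Counter(todays_transactions)["deposit"])  # instances of one type -> 3
```

The two questions query different things entirely (a set of types vs a stream of instances), which is exactly why the answers differ by orders of magnitude.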

So, how do we know when we are getting a "type" answer? How do we know when we are getting an "instance" answer? How do we make sure we handle the precision that we need, without pushing all the annoying explicit context onto the people we are striving to understand? We have to learn the context, apply the context properly, and hide the distinctions unless it becomes absolutely necessary. One way of doing this nicely is to have a stash of instance models handy. So when talking about concepts we should talk about them extensionally when we mean to describe the instances, and intensionally when talking about the type.


Wednesday, April 15, 2009

EA and Finance

This post comes out of an initial simple Twitter question, "Should EAs understand finance?" The conversation went all over the place, in 140-character bursts. It is hard to get the essence of what I was thinking across in such short bursts.

Some of the slices:

  • How does EA justify its own costs? Interesting, important, but not where I had hoped to take things.
  • General discussion on values – again interesting, but not what I was looking for.
  • Simplification – again, doesn't get to the meat.
  • IT should understand spend on everything (not just IT) – this is getting closer. It isn't necessarily about spend, though.

I guess what I should have asked is broader and better categorized than what I did ask.

When I say finance, my first thought was should the EA be able to understand (at a deep level) the balance sheet OF THE BUSINESS?

Second when it comes time to look at an annual report, should the EA be able to interpret the financials? Including the footnotes?

When it comes to investment decisions – should the EA be in a position to understand the implications of different kinds of investments and tradeoffs? A new airplane hangar vs an increase in parts inventory vs a new data center?

Note these are ALL business decisions.

Now, since the spend on "IT" is so relatively large (1.3% to 5% of turnover or revenue) - depending on what's counted and how you count it, should the EA be an active participant in the allocations of this spend (budgetary vs actual)?

As technologies evolve – the shift from in-house data centers to outsourced (and back to insourced: outsource costing, in-house operated), to cloud – the economics become driving factors. How well should the EA understand these economics? How much should the EA be involved in these discussions? Is the EA involvement any different in these IT kinds of decisions than in other technology-shifting decisions – for example, changing the delivery fleet from gas (petrol) burning to biodiesel/electric/hybrid/CNG/LPG/LNG? All of these have huge effects on the business and its bottom line.

When a business has the opportunity to allocate "money" across its various operations, should the EAs be in that discussion? Normally it is a hierarchical, line-of-business, roll-up approach with little input from EA except that the CIO might ask, "What's your budget?" This is nonsense. The EA must have a budget, for sure, but to properly assist the business, it must be an active participant in budgeting decisions.

So as the EA has the opportunity to influence the business (used loosely) that s/he supports, where are the lines around finance?

Sunday, April 5, 2009

Innovation and incremental improvement

I was watching the Formula 1 race from Malaysia this morning. It was on live and early, so I had the TV to myself. For the past several years F1 has been dominated by McLaren and Ferrari. For the past few years the changes to the formula have been relatively small.

This year, however, the FIA have made some pretty dramatic changes – resulting in a major shake-up. The old factory Honda team is no more, but has been reborn as the Brawn GP team (Ross Brawn, the man behind the rise of Michael Schumacher, being the team principal). Toyota have also done well, with the Red Bull/Toro Rosso teams also having good outings.

So, what changed? I contend that this is a great example of the difference between steady, linear improvement as managed in lean/six sigma kinds of processes, and the need to be radically different as is the case when major innovation is necessary.

The FIA changed the rules dramatically: back to slick tires, much reduced rear wing area, and the Kinetic Energy Recovery System (KERS), which charges batteries under braking so the power can be deployed at times to suit the driver, adding an extra 80 hp for about 6 seconds a lap. Engine revs were reduced to 18,000 rpm, etc.

The teams that appeared to treat the rule changes as evolutionary are doing poorly. The teams that recognized the radical nature of the changes – and with little to lose – are doing much better.

So, perhaps where innovation is concerned we should not try to put it into well-defined, six sigma, DMAIC-based processes. Let the creative juices flow, make changes in a non-linear fashion until the platform has become relatively stable, and then shift to six sigma type approaches.

Saturday, March 28, 2009

Entropy in Systems

The concept of entropy in thermodynamics is well known. This article covers the landscape well. In systems, especially systems that involve human and computer interactions, we have a similar notion.

Entropy, historically, has often been associated with the amount of order, disorder, and/or chaos in a thermodynamic system. The traditional definition of entropy is that it refers to changes in the status quo of the system and is a measure of "molecular disorder" and the amount of wasted energy in a dynamical energy transformation from one state or form to another.

So too in our information systems. The amount of effort that we have to undertake to move a system from one state to another (essentially to make a modification to it) falls into two parts: the effort expended towards making the change useful, and all the rest of the effort.

This applies both to the change to the system itself and any changes that we make as a result of executing the system. So, for example, if we consider some system fragment, "Visit the doctor because of pain in the shoulder", then any time/effort spent in doing something other than getting the diagnosis/treatment is wasted and increases the system entropy. This time/effort may include (but is not limited to)

  • Getting to the Dr.
  • Filling in new patient paperwork
  • Having payer status checked
  • Explaining symptoms to receptionist
  • Waiting in waiting room
  • Waiting in treatment room
  • Waiting for X-Ray results
  • Driving to radiology lab
  • Filling in radiology lab paperwork
  • Waiting at radiology lab

The point of this is that this "system" is unbelievably wasteful of a precious resource (at least precious to me), my time. So from my perspective all of the above steps indicate great inefficiency. Perhaps because of complexity, perhaps because of a lack of cohesive thinking.

Considering another example, perhaps closer to work for many – the HR portal. That is often the one information system that has been designed to almost entirely ignore the majority of its users. There is often a huge learning curve for the majority of the employees. Of course the users who specify the system, the HR department have it well designed for their own convenience – and what they believe is the convenience of the employees. I leave you to draw your own conclusions!

So at one level, we have the idea of the system in use with every use increasing the entropy of the system.

Now think about attempting to make a change to a system. A whole new dynamic sets in. The need to understand the system in place so it can be changed. This can involve very detailed analysis – deep understanding. The more pieces there are – and the more interconnections, the more understanding there has to be. So depending on the design of the system in place there can be a greater or lesser effect on its entropy. If the system is very involved/convoluted then the understanding as a percentage of the useful work done will be high. If the system is relatively straightforward then the understanding as a percentage of the useful work done will be less.

So systems entropy might be thought of as ∑_{i=0}^{n} (W_i − V_i), where W_i is the work performed at any state change i and V_i is the valuable work performed at that state change.

Defining and normalizing what we mean by work - and constructing some normalized work-value equation - is, of course, complex. For example, in getting the shoulder diagnosed and treated, the system implicitly values the Dr.'s time as more valuable than mine, so the system is optimized to minimize the Dr.'s change in entropy. What should be happening is that the total change in entropy is minimized.
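The sum above can be sketched in a few lines of Ruby. The step names and minute values below are invented for the shoulder-visit example; only the ∑(Wᵢ − Vᵢ) formula itself comes from the discussion.

```ruby
# A toy illustration of the systems-entropy sum: total effort minus the
# effort that was actually useful, across each step (state change).
Step = Struct.new(:name, :work, :valuable_work)

def entropy_increase(steps)
  # sum of (W_i - V_i) over all state changes i
  steps.sum { |s| s.work - s.valuable_work }
end

# Invented minutes for a few steps of the doctor-visit "system"
visit = [
  Step.new("drive to the Dr.",        30, 0),
  Step.new("new-patient paperwork",   15, 2),
  Step.new("wait in waiting room",    40, 0),
  Step.new("examination & diagnosis", 20, 20)
]

puts entropy_increase(visit)  # => 83 wasted minutes
```

The interesting design question is then which steps you can redesign to drive each Wᵢ − Vᵢ toward zero, rather than optimizing any single stakeholder's steps in isolation.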

Typically systems that are overly complex, overly bureaucratic or optimized to support a minimal number of stakeholders will exhibit the greatest increases in entropy under a given state change.

As architects we have a responsibility to be looking out across the landscape of a system as a whole and finding ways of minimizing the increases in entropy across common state changes.

Wednesday, March 4, 2009

Prototyping in Rails

Now this is a truly bizarre thought. Many business apps and the underlying databases are a bit more complicated than the standard web app. Occasionally it is even necessary to populate the model that you plan to implement and let the users test drive the model through the UI.

This approach used to be called rapid-prototyping, but has rather fallen out of favor.

However, as the browser is becoming (has become) the dominant UI now, there is perhaps an opportunity to do a rapid prototype of some system functionality so the users can be assured that you have the relationships right and that you can quickly demonstrate some scenarios.

This tends to be a royal pain - and then it hit me. Rails is such a quick-to-develop framework that getting some gnarly prototypes out quickly is very little problem. Of course we don't want to be too convincing, because the users might think we have done the hard bits. But really what we will have done is:

  • Created a simple database model with the interesting relationships (keys/FKs, etc.)
  • Populated that model with some made-up data - careful cases
  • Allowed the users to manipulate/navigate the data through a familiar approach. For example, in an auction site you may have many bids - a 1:m relationship. We could show an auction with its many bids easily, then click on a bid to go deeper, ensuring we have a link back to the auction.

Using RESTful principles and Rails, quite a lot of this is generated by the scaffold. That's a lot of power.
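As a rough sketch of the relationship the prototype would expose - with Rails and ActiveRecord stripped away so it runs standalone - the auction/bid classes below are invented stand-ins for what `has_many`/`belongs_to` and the scaffold would wire up for you:

```ruby
# Framework-free sketch of the 1:m auction/bids relationship. In the real
# prototype ActiveRecord supplies this via associations; these plain classes
# just show the navigation the users would click through.
class Auction
  attr_reader :title, :bids

  def initialize(title)
    @title = title
    @bids  = []          # the "many" side of the 1:m relationship
  end

  def place_bid(amount)
    bid = Bid.new(self, amount)
    @bids << bid         # auction -> bids navigation
    bid
  end
end

class Bid
  attr_reader :auction, :amount

  def initialize(auction, amount)
    @auction = auction   # bid -> auction link back, as in the prototype UI
    @amount  = amount
  end
end

auction = Auction.new("Antique clock")
bid = auction.place_bid(150)
puts auction.bids.size       # => 1
puts bid.auction.title       # => Antique clock
```

In the actual Rails prototype, `rails generate scaffold` plus the association declarations would give you the forms, list pages, and links for free - which is precisely what makes it viable as a rapid-prototyping tool.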

Evaluating enterprise software

Two things have happened at pretty much the same time. The first is an evaluation of some enterprise class software in conjunction with and on behalf of a client. The second, the appearance of this article in my email - from Alex Rosen.

While the problem to be solved for the client is not BI, there are a lot of behaviors in common between the paper and the experience with the vendor at this client.

We had gone through many of the due diligence stages, spoken to various vendors, created shortlists, evaluated Proofs of Concept until we were down to a vendor that we wished to trial more deeply.

There were some key things we wanted to find out, so we devised tests, questionnaires, scheduled calls, read as much documentation as the vendor would let us see, but we really felt that the vendor was being very stingy with information.

Often direct questions would be answered with, "It doesn't work that way" or "no we can't do that". In the vendor's mind this might have meant, "This is not in scope for the pilot." That isn't how it came across.

It took a huge amount of probing before the vendor finally started to be more forthcoming with information - at which point things went a lot more smoothly.

So, since this is a pattern oft repeated, why do many vendors behave this way?

I can't answer for this specific case, but some ideas include:
  • The process is still sales controlled and the sales team wants to control everything.
  • The process is still sales controlled, so it is important to marginalize the people who can't say yes (but who can say no).
  • The process is still sales controlled and the teams aren't used to the more implementation oriented lines of questioning
  • There are opportunities for consulting dollars, so by keeping the prospect in the dark there is more revenue.
  • The vendor believes that the solution takes some getting used to, so is rationing the information to prevent the customer from being overwhelmed.
  • The vendor doesn't have adequate documentation and doesn't wish to expose that fact just yet.
  • The vendor wants to see the possibilities of hard questions coming so that they can prepare for answers.
There may be other reasons that I am not aware of - and the ones I mention are pure speculation. On the current project, I can't say that any of these reasons are correct.

The key takeaway for me is to learn as much as possible about the solution before the pilot. Get any FAQ documentation; get the configuration documentation so you can see what's possible from the configuration. Ask the hard questions, and don't accept the answer, "You'll see it in the pilot."

Monday, March 2, 2009

You are judged by the company you keep

In the past week or so I have been "followed" by several quite unsavory characters offering services in which I have not the least interest. However, if someone were to plot my social graph, they might see these characters "following me", and draw a conclusion that I might have an interest.

Now if that someone were a prospective employer, or someone trying to establish that I was an unfit parent, I can imagine that my Twitter social graph could be of interest - and possibly grounds for non-hiring or withdrawal of parental rights.

Is that right? No, I don't think it is - since I didn't do anything except exist. It's the online equivalent of someone putting a flyer under my windshield wipers while my car is parked at the airport. The trouble is that I at least know the flyer is there, and the person who put it there (kinda) knows too, but no one else does - unless it is a deliberate campaign of misinformation (putting kiddie porn magazines under an ex's wipers and then alerting the authorities anonymously).

So, bottom line: the associations you have are knowable to anyone with a mind to discover them, and thus open to all manner of misuse - of which blackmail is one of the less benign.

Sunday, March 1, 2009

Agile methods and Design-Build

I was intrigued by a term in my Sunday paper this morning. There is a project underway to extend the light rail line from Dallas to the DFW airport. It passes within a few miles of my house, so I am quite interested in seeing how it is going to look. The Mayor of Irving (Herbert Gears) wrote an opinion piece in the paper. In it he stressed the Design-Build approach to the project.

Design-Build sounds to me (at least on the face of it) an awful lot like Agile Development for software solutions. Perhaps it is the concrete version. Off to Wikipedia - (among other places) where I found these gems...

"Design-build focuses on combining the design, permit, and construction schedules in order to streamline the traditional design-bid-build environment. This does not shorten the time it takes to complete the individual tasks of creating construction documents (working drawings and specifications), acquiring building and other permits, or actually constructing the building. Instead, a design-build firm will strive to bring together design and construction professionals in a collaborative environment to complete these tasks at the same time."

"Potential problems of the design-build process include:
Premature cost estimating,
A short-cut design process,
Decreased accountability by the service provider, and
Correction of work. "

"It is important to note that the design-build method, while not focused on saving the owner construction costs, nonetheless often saves the owner money on the overall project. The combined effects of carrying a construction loan (which typically carries a higher interest rate than permanent financing) and an earlier useful on-line date usually yields considerable overall profitability to the project and may make seemingly unfeasible projects into genuine opportunities.
The compression is an important aspect of the implementation of this system. Other attributes include:
Enhanced communication between the service provider and the client,
Increased accountability by the service provider,
Single source project delivery, and
A value based project feedback system"

Sound familiar? - It sure does to me.

This is in contrast (as the rather opinionated Wikipedia article states) to

"For nearly the entire twentieth century, the concept of Design-Build was classified as a non-traditional construction method in the United States, which is the last country to still embrace the old standard of Design-Bid-Build"

So if we think this through, we see the following important ideas:
  • The documents that are produced during the project are for the benefit of moving the project along, not for preparing bid-packages. That alone would appear to have tremendous potential to save time and money
  • There had better be some kind of a major plan in place first (City Plan anyone?) to ensure that the design-build doesn't go off the rails.
  • There had better be standards/codes in place so that we don't end up with shanty towns - the gauge of the railway lines should be the same throughout, and connections to common services should be standardized (electricity voltage, connectors, phases, frequency)
  • The approach is "owner driven" not architect or contractor driven
  • There is opportunity to adjust for changing requirements - new materials/material standards, unexpected terrain or landscape needs, changes in aesthetics,...

Of course Design-Build as an approach is not an excuse for no requirements - just as Agile Software Development does not mean, "We will come into the project with a bunch of good ideas and figure out the real requirements as we go along."

Thursday, February 26, 2009

The Fat Controller

In the wonderful "Railway Series" of children's books by the Rev. W.V. Awdry there is an officious character named "The Fat Controller". He is in charge of the railway and is often involved in activities that perhaps would have been better delegated. So, moving on many years and I have embraced the MVC architectural pattern (from my early Smalltalk times) and am actively involved in building a web application that uses Rails.

My own history is that I tend to work outwards from the model - after all the domain is where the interesting rules are enforced, and I can figure out how to implement those without stretching my lousy artistic design sense. After all who needs more than a command line to test a domain model?

Taking this inside-out approach certainly isn't typical in many of the Rails books I have been reading of late. I see crazy statements from "experts" like, "The only things that are models are wrappers for ActiveRecords; if it doesn't inherit from ActiveRecord it should not be in the models directory." So where do these soi-disant experts recommend putting domain logic? Ahhh, in the controller, of course. How they reconcile that with the DRY (Don't Repeat Yourself) principle defeats me. Oh, and what if we want several different ways of receiving and managing the same data (maybe a b2b feed, an email message, a web form, a tweet, a txt message, an iPhone app...)? Where should that logic reside?

The cognoscenti imply that there should be a single controller for this, so we now have a soup of concerns, not a separation of concerns. Putting all the above into a single controller results in creating a Fat Controller of whom the Rev. W.V. Awdry would be proud.
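A minimal sketch of the alternative, in plain Ruby (all class and method names here are invented): the business rule lives once, in a model object, and whatever "controller" receives the data - web form, feed, SMS - stays thin by delegating to it.

```ruby
# Invented example: the domain rule lives in a plain model object, so every
# channel (web form, b2b feed, email, SMS...) reuses it instead of each
# controller re-implementing the logic.
class BidPlacement
  MINIMUM_INCREMENT = 5

  def initialize(current_high)
    @current_high = current_high
  end

  # The business rule lives here, once, regardless of how the bid arrived.
  def acceptable?(amount)
    amount >= @current_high + MINIMUM_INCREMENT
  end
end

# A thin "controller" for any channel just parses the input and delegates:
def handle_bid(raw_amount, current_high)
  placement = BidPlacement.new(current_high)
  placement.acceptable?(raw_amount.to_i) ? :accepted : :rejected
end

puts handle_bid("110", 100)  # => accepted
puts handle_bid("101", 100)  # => rejected
```

Each delivery channel gets its own thin controller doing channel-specific parsing, while the rule itself stays DRY in the model - separation of concerns rather than a soup of them.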

Just say no to fat controllers

Controllers don't have to be activated by views.
Models don't just wrap active records.
We don't want fat controllers gumming up the works with their officiousness.

Sir Topham Hatt, please don't teach my controllers to become overweight!

Monday, February 9, 2009

Hijacking of Architecture

In the beginning there was Zachman, three columns and the "Framework for Information Systems Architecture." This was originally published in 1987 in the IBM Systems Journal, I believe.

There was much wailing and gnashing of teeth as it was realized that the three columns weren't enough. It needed to embody the six questions - "Who, What, Where, When, Why, How" - at the six levels, giving us a 36-cell matrix. This matrix described what the "Information Systems Architecture" was all about - but didn't offer any prescriptions. How do you create a Business Logical "What" (data) model? And how is that different from an Application Logical "What" (data) model? And while you are about it, how do you get traceability from one to the other?

Meanwhile the technologists got hold of the word architecture and immediately subverted it into the technology layers - essentially taking on the dimensions of design. So we have architects instead of designers, and those of us at the overall business and IT level were left without a word to categorize ourselves. Job titles such as "Java Architect" start to crop up - and we immediately get pollution of the namespace, or if you prefer, overloading of the architect term. So when I would rock up and say to the business, "I am an architect and I am here to help you", I would get about the same reception as someone from the IRS - not very popular. The sorts of things I would hear were, "We aren't ready for programming yet" or "Has someone designed how this system is going to be put together?". In other words, architecture became tactical, project-based, and confined to the technology realms.

So we look to other terms/phrases, and two jump out: one is "City Planning" and the other is "Enterprise Architecture". Well, it is hard to sell city planning to a company that makes shoes - not the sexiest term there is. And Enterprise Architecture? That's another term that has been taken over by IT. Even frameworks like TOGAF (The Open Group Architecture Framework) have been heavily focussed on the technology realm - but that does appear to be changing with version 9 (released February 2009).

Enterprise Architecture Forums on Linkedin and Google seem also to be focussed on keeping track of physical artifacts and dealing (again) with the technology realm.

So it appears to me that the architects are out of luck again. The technologists have coopted architecture at all levels.

So perhaps we should not be using the term architecture at all when trying to have a sensible dialog with our business colleagues. We have done such a good job (as an industry) of confusing the term that it is time for a new one. Of course, no sooner will we have coined it than it will become another victim of grandiloquence. Maybe we should use a term universally despised by the technology community (Analyst, anyone?) because then it won't be coopted.

My friend Nigel Green talks about some of these issues in his blog. What those of us who straddle the business/IT divide have to do is facilitate communication across that divide - using the language of business, and a framework like VPEC-T.

Wednesday, January 28, 2009

Architecture and web companies

I have been doing consulting work for a couple of companies whose products are entirely informational - essentially companies that provide information services over the web. At both of them I have been struck by the mixing of the technology that delivers the "product" (often really a service, but it helps my head to think in terms of a product!) and the technology that runs the business.

An example from the genuine product world will help illustrate what I am thinking about. When making and selling hamburgers, there is a clear separation between what is delivered and how it is accounted for and tracked - essentially how the back office runs. The selling of hamburgers is a sufficiently different proposition from the creation of the back-office business systems that I wouldn't attempt to combine the two. Yes, it is important that the sales information flows... (see the earlier post on flow of goods, flow of money, flow of information). However, I don't have my fryer installation crew and my cooks building the systems.

Where the product is informational, companies often think that the product and the back office rely on IT, so they must be the same. So we have the people who think in terms of product, features, etc. responsible for the potentially more mundane chores of installing and managing the back end systems - giving the internal business the data it needs to run and manage the business, and the sales/support and other staff the tools they need to do the job.

In reality these are entirely different groups - and should be. Yes, they might share common technology needs/data/platforms (although there is little guarantee even of that). Yes, they may share communications infrastructure and communication methods/platforms. But in reality the activities required to deliver a world-class product and the activities to provide robust "run/manage the business" systems are as different as flipping burgers and accounting for the flipped burgers. Mixing the teams (and thus not getting a proper separation of "IT" responsibilities) leads to some very brittle systems - often because the value of the "run/manage the business" applications is almost always subordinated to the "develop and operate the product" systems.

Tuesday, January 27, 2009

327x and web scalability

These are strange bedfellows at first blush. But the more I think about them the more parallels I see.

Every book I read (and every web solution I build) talks at length about statelessness - especially session statelessness. Obviously data state is important, I really would like my bank account to know its balance and not derive it by applying all the transactions since the day I opened it every time I want to know my balance.

But I digress.

When I was a neophyte developer, the IBM 3270 family of "green screens" were just becoming mainstream. I had the enviable task of writing a series of macros in PL/I to emulate the behavior of the assembler macros for "basic mapping support." Fun project...

Anyhow in doing said project, I learned more about the 3270 than any human should have to. The key lesson of the device was that the hardware was directly addressable, had the concept of the "field" and would send back only fields that had the "Modified Data Tag" bit turned on. That meant that if a field were modified by a user, that field would be sent, but unchanged data wasn't. If nothing else that cut down the amount of data transmitted compared with approaches that refreshed the whole screen.

One much exploited approach was that the serving application could send the data out with the modified data tag already turned on. This of course meant that the device would send that data back regardless of whether the user actually modified the data or not. Immediately there was an opportunity to manage session state. Just send the stuff you needed next time back in a field with the modified data tag on. That way you have enough context for the next invocation.

The next leap was the ability to use "invisible" fields - fields that were mapped to the screen but marked invisible (so you couldn't see their contents). Handy for passwords, etc. However, if you set a field to invisible + modified data tag on, you could send suitable session data back, but the user at the screen didn't have to deal with it. You got the best of both worlds: context information sent with the request and no visible impact to the user.

Does this all sound familiar? Of course nowadays it comes in the header instead of the data, but it is the same general idea. If an architectural approach demands context data with every call, have the server send it back as part of the resource, so it automatically comes in on the transmission.
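A sketch of the modern analogue: the server embeds its context in a hidden form field so the browser hands it back on the next request, just as the invisible MDT-on field did. The field names and context string below are invented for illustration.

```ruby
require "cgi"

# The server renders its context into a hidden field; the browser will post
# it back untouched on the next request, giving the server its state without
# the user ever seeing (or retyping) it - the 3270 trick, reborn in HTML.
def render_form(context)
  <<~HTML
    <form method="post" action="/next_step">
      <input type="hidden" name="context" value="#{CGI.escapeHTML(context)}">
      <input type="text" name="answer">
    </form>
  HTML
end

html = render_form("step=2;cust=42")
puts html.include?('type="hidden"')   # the user never sees the context...
puts html.include?("step=2;cust=42")  # ...but it travels with the next request
```

Cookies and headers play the same role today, but the hidden-field version makes the lineage back to the invisible 3270 field particularly clear.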

Plus ca change, plus c'est la meme chose!

Monday, January 19, 2009

Data ambiguity Part 1

I was having breakfast on Saturday with an old friend. He pointed me to this article by Werner Vogels (Amazon's CTO).

This posting by Mr. Vogels is very insightful - along the data replica dimension of distributed data management. This is clearly an important dimension, but it isn't the only one. There is a more general problem of data ambiguity. This isn't necessarily just a database problem, but an overall systems problem.

The basic thought is that when you have two representations of a piece of data that are supposed to have the same value, when do they actually have the same value (and who cares?).

We can imagine the following cases.

  1. The 2 values must always have identical values (to an outside observer). Inside the "transaction" that sets the state of a datum, the values can be mutually inconsistent, but that inconsistency is not manifested to an observer outside of the transaction.

  2. The 2 values will need to be "eventually consistent" - this case is admirably covered by Mr. Vogels.

  3. The 2 values will rarely have identical values, but there are mechanisms for "explaining" the discrepancies.

The first case is almost a default case - yes, we would like that please. The second case is well handled from a data replication perspective - essentially dealing with a common schema. The third case is the tricky one.

The first case is unattainable at large scale - using ACID transactions across replicas of data at Internet scale is simply impractical for performance reasons.

The third case is interesting because of situations where "transactions" can occur against either copy of the data independently and in arbitrary sequences. The communication mechanisms between the systems that can update copies of the data may be reliable, or they may be intermittent - but that isn't really the issue.

So, to illustrate this kind of system, let's take a popular application - Quicken. Many people use Quicken to manage their household accounts. The idea is to be able to use Quicken as a kind of front end to bank accounts - but it is only intermittently connected.

At any moment, the balance that Quicken reports and the balance that the bank reports are very likely to be different values. Of course from a data management perspective they are actually different fields, however that subtlety will be lost on the majority of users. Why will the 2 have different values for the balance field? There are lots of reasons, e.g.

  • Transactions have arrived at the bank without being notified to Quicken yet. For example, in an interest bearing account, the interest payment will be automatically added to the balance on the bank's view of the account. Or possibly a paid in check has bounced - the bank will have debited the check amount and (possibly) added a penalty.
  • Transactions are processed in a different sequence in general. When a user writes the checks, there is no guarantee that they will be processed by the bank in the order in which they were written (in fact, the policy varies, e.g. process the biggest checks first if there are many to be processed because that maximizes overdraft charges in the event that an account goes overdrawn).

These reasons boil down to the need to have system autonomy for throughput (imagine having to wait at the bank to process check 101 until check 100 had been processed).

Of course it doesn't matter to us that the systems are rarely fully synchronized, that the "balance" doesn't agree across them - we have accounting methods to help us reconcile. In other words we can accept that everything is OK without caring whether the systems have the same value of the balance.
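The Quicken/bank example can be sketched as a toy reconciliation: the two "balance" values disagree, but the in-flight transactions explain the gap exactly. All transaction ids and amounts below are made up.

```ruby
# Two ledgers for the "same" account, starting from a shared opening
# balance of 1000. The bank has seen an interest payment that Quicken
# hasn't been told about yet.
bank    = { "chk100" => -50, "chk101" => -20, "interest" => 3 }
quicken = { "chk100" => -50, "chk101" => -20 }

bank_balance    = 1000 + bank.values.sum      # => 933
quicken_balance = 1000 + quicken.values.sum   # => 930

# Reconciliation: transactions known to the bank but not yet to Quicken
in_flight = bank.keys - quicken.keys          # => ["interest"]
explained = in_flight.sum { |id| bank[id] }   # => 3

# The balances differ, yet the systems are consistent once explained:
puts bank_balance - quicken_balance == explained  # => true
```

This is exactly case 3: nobody insists the two balance fields ever be identical, only that an accounting mechanism exists to explain the discrepancy at any moment.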

Friday, January 16, 2009

Which communication method and when?

While this posting doesn't deal only with Enterprise Architecture, it does begin to explore how we choose a tool, from quite a wide array, for a communication at any given moment.

Just thinking of my own case, I have an unseemly number of communication mechanisms/paths.

For work - 3 email accounts (my employer and 2 clients)
For my business 1 email account
For other purposes 5 email accounts

2 Instant Messenger accounts
2 Groove Ids (for secure file sharing and messaging)
2 blogs (cooking and this one)

Telephone/voicemail (4 numbers) - 1 mobile 1 home, 2 clients
Several RSS feeds on news, technology, etc.
Text messaging
Corporate sharepoint
Client wiki

These obviously aren't all 2-way, but with 25 major channels - plus several news sources and the Twitter folks I follow (about 50) - it is clear that I have too much time on my hands!

So why do it? It really comes down to personae and convenience. Taking just the corporate emails - each company (including my employer) has its own email infrastructure. Each client uses its own email addressing scheme to send stuff around. I can't get from one client's system into another's (and nor should I be able to).

If I am doing frivolous things, I tend to use my hotmail account. If I am doing semi-serious, but still relatively public things, I use my gmail account. For my own business, and when I know the person at the other end, I typically use my own business email.

Twitter is a great source of interesting updates. Admittedly of the 500 or so Tweets/day that I receive, about 50 are interesting to me and about 30 really interesting. So my filters are not as good as they could be.

I use the phone, but not a lot. Most of my communication is asynchronous. I text a lot, contribute to my own blogs, read a bunch of news sources. The only things I don't seem to do are listen to/download music or video.

So why is this important from a business perspective? Because we each make our own choices about which media to use. The enterprise needs to enable many different channels for the various purposes.

Is Twitter a corporate tool? Absolutely - especially for corporate travel departments. It's the easiest way to get information out quickly.

Is Facebook a corporate tool? Absolutely - keeping track of alumni, enabling corporate communities (extending the ecosystem).

Are blogs corporate tools? Absolutely - a great way for the corporation to provide an authentic experience to the community.

Is Groove a corporate tool? Absolutely - secure internal and external file and message sharing.

Is IM a corporate tool? Absolutely - again enabling community.

Is email a corporate tool? Sadly, yes. But as we have observed many times, it is very heavyweight - yet sometimes the only way to get information in and out of corporations.

Phone/Voicemail? Absolutely.

I would argue that every form of communication that I use has its place in my daily corporate life. Even hotmail and gmail have helped when the corporate network is down and I have to get a response out.

Enterprises are really going to have to rethink communication - recognizing that critical information is going to leak over many channels. Draconian security groups will simply be bypassed, since information will continue to flow.

Then we have the symmetry/asymmetry question. How much of what I do is simply reading other people's stuff (following them personally, subscribing to their publications or what?) vs engaging in dialog.

When dialog of some kind is needed, which of the many tools at my disposal do I use? My rule of thumb is whatever the person I am communicating with last used when talking to me. Of course it depends on whether it is a single short thought (Twitter), a complex large file (Groove/email/SharePoint), or something in between....