Monday, November 30, 2009

Updates Harmful

I have written about this before, I suspect. So forgive me if this is another representation of that resource.

Hanging out with Nigel Green and John Schlesinger is dangerous, so be warned. There be sacred cows being slaughtered in this post.

It all started innocently enough a year or so ago when Nigel and I were discussing the role of databases in record keeping vs. the role of databases in transaction scheduling, message management, etc. Faulty normalization came into the picture too. Then the harm done by data modeling (where we thought we could model the rules in data). Large scale data modeling efforts require significant investment, and it becomes very hard to see where the return comes from. Then an aha. If we go back a bit in time, the worst office job used to be "filing clerk." Imagine the discussion when someone comes home from work. "Well dear, what did you do at work today?" "We opened a new file for Acme enterprises. That meant that our index cards were all messed up because we had run out of space for companies starting with A, so we had to rewrite some of those cards, but while we were there we saw that some cards pointed to files that no longer existed so we removed those - but only after we had added more space for the letter A companies (which we didn't need right now anyway)." "That's nice dear, how about a nice cold beer?"

The point is that filing clerks used to be the lowliest members of the office - and yet in their electronic reincarnation they have acquired really expensive care and feeding. Of course the new clerks are the databases, and the expensive care and feeding is manifested by having a group of thugs (DBAs) who hold everyone to ransom with their talk of normalization, SGA, proper keys... All things which we did pretty easily with clerks. So what's going wrong? What is normalization for?

Taking normalization first - it is simply for ensuring that we don't get update anomalies: that something which is supposed to have the same value regardless of usage actually does have that value. You don't have to have a normalized database to ensure that update anomalies aren't present, although a normalized one makes it a bit easier.
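
To make the update-anomaly point concrete, here is a minimal sketch in Python (all names and data invented) of the same fact stored denormalized on every order versus normalized in one place.

```python
# A minimal sketch of an update anomaly; all data invented.
# Denormalized: every order carries its own copy of the customer's address.
orders_denormalized = [
    {"order_id": 1, "customer": "Acme", "ship_to": "12 Old Street"},
    {"order_id": 2, "customer": "Acme", "ship_to": "12 Old Street"},
]

# Acme moves. Update one row and miss the other: the "same" fact now disagrees with itself.
orders_denormalized[0]["ship_to"] = "99 New Road"
assert orders_denormalized[0]["ship_to"] != orders_denormalized[1]["ship_to"]  # the anomaly

# Normalized: the address lives in exactly one place; orders just reference the customer.
customers = {"Acme": {"ship_to": "12 Old Street"}}
orders_normalized = [
    {"order_id": 1, "customer": "Acme"},
    {"order_id": 2, "customer": "Acme"},
]
customers["Acme"]["ship_to"] = "99 New Road"  # one update, nothing left to disagree
```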

What is going wrong is in many ways a harder question. One fundamental thing going wrong is that we use the "filing cabinet" as a scratch pad. So, returning to the physical world for a bit, let's imagine a filing cabinet in which we store the active accounts (perhaps bank accounts). When someone wishes to open an account, we give them a whole bunch of forms to fill in; they go off and fill them in and hand them back to us. We transcribe those forms and do some checking of the data they contain. Once we are happy with the data, we can give the stuff to the filing clerk and have the clerk create the new file folder. So where were the forms and the checking kept? In some kind of "blotter" or case management pile on the clerk's desk. They weren't in the active accounts cabinets. Nor should they be.


Now we go to a computerized system. We enter the data from the completed forms into the system and "poof", an active account is created. But actually it is more insidious than that. We go through a series of screens putting in different bits of the account - each leading us to a more perfect account, but we aren't there yet. Eventually the data will be in the active accounts database (but probably with an inactive flag) so that the account can at some point be transacted. This is nuts. We are using the record keeping database (aka the filing cabinet) to manage work in process. This is not a proper separation of duties.

It gets worse. The company decides to "go online". Expensive consultants are hired, golf outings are scheduled, expensive dinners are eaten, and the "new account workflow" is eventually unveiled. It, too, is a sequence of steps. However, the poor schmuck filling this in has to complete each page of the form before moving on. That means that s/he cannot stop for a break, store a scratchpad version of it, or do it out of sequence because they can't remember their spouse's Social Security number or whatever. The people in charge of the design of the system understand that THE SYSTEM needs accurate record keeping, have heard that "it is ALWAYS better to validate the data at the point of capture" and other platitudes, but forget that at the end of the line there is the poor user. For these kinds of data entry systems (and a whole host of housekeeping systems) we need to store the "process state" separately. Don't use the state of the key entity as a substitute for that. Store where I am in the account opening process with that process, not in the entity that represents the account.
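
A minimal sketch of that separation, in Python with invented names: the in-flight application lives in its own work-in-process store, carries its own process state, and only on completion does anything land in the record-keeping store of accounts.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical sketch: work in process lives in its own store, separate from the
# record-keeping "filing cabinet" of active accounts.

@dataclass
class AccountApplication:          # the blotter / case-management pile
    applicant: str
    step: str = "started"          # process state lives here, not on the account
    captured: dict = field(default_factory=dict)  # partial, possibly invalid data

    def save_progress(self, step, **fields):
        self.step = step
        self.captured.update(fields)   # the applicant can stop, skip, come back later

@dataclass
class Account:                     # the filing cabinet: only complete, validated records
    account_id: str
    holder: str
    opened: datetime

def complete(app: AccountApplication, next_id: str) -> Account:
    # Only when the process finishes does anything reach the record-keeping store.
    assert app.step == "verified", "application still in process"
    return Account(account_id=next_id, holder=app.applicant, opened=datetime.now())

app = AccountApplication("Acme Enterprises")
app.save_progress("personal_details", spouse_ssn=None)     # can't remember it - fine
app.save_progress("verified", spouse_ssn="provided later")
account = complete(app, "ACCT-0001")
```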

So what got this diatribe really going? The notion that updates are unnatural - and probably harmful. I posit that the reason we do updates is mostly because the common need is for retrieval of the most recent version of something. So it makes sense to have access to the most recent version and update in place. But that isn't always the most expedient behavior. Certainly the most recent value is often the value you need - especially in an operational system. However, more and more systems really need the ability to look back. Even something as simple (looking) as your medical record is not something you want to update. Patient History is key. We don't need to know the current cholesterol level (in isolation), we need its trend. So we don't just update the "cholesterol value" in the patient record. We add a new item for the cholesterol and keep the history. We keep the record sorted in time sequence so we can see the latest. We don't just overwrite the value. Our uses of data are so unpredictable that simply updating the database arbitrarily is going to give us data loss. We don't know in advance how serious that data loss might be. Perhaps it would be better to assume that we will need everything and come up with a scheme that, at some backbone level, ensures that the current view can be reconstructed by replaying the operational events.
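
A minimal sketch of that idea, in Python with invented names: readings are appended rather than overwritten, the "current" value and the trend are both just views over the history, and the operational state can be rebuilt by replaying the events.

```python
from datetime import date

# Hypothetical sketch: append-only patient history instead of update-in-place.
history = []  # every reading is an event; nothing is ever overwritten

def record_cholesterol(when, value):
    history.append({"when": when, "measure": "cholesterol", "value": value})

record_cholesterol(date(2008, 3, 1), 240)
record_cholesterol(date(2009, 3, 1), 215)
record_cholesterol(date(2009, 11, 1), 190)

# The "current" value is just a view over the history...
readings = sorted(
    (e for e in history if e["measure"] == "cholesterol"), key=lambda e: e["when"]
)
current = readings[-1]["value"]          # 190 - what update-in-place would have kept
trend = [e["value"] for e in readings]   # [240, 215, 190] - what it would have thrown away

# ...and the operational "current view" can always be reconstructed by replaying events.
def replay(events):
    state = {}
    for e in sorted(events, key=lambda e: e["when"]):
        state[e["measure"]] = e["value"]
    return state

assert replay(history) == {"cholesterol": 190}
```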

Musings with John Schlesinger

John Schlesinger is an event thinker par excellence. So whenever I get the chance, I visit him in London to validate some thinking - or just to spend time with a terrific guy! So on a recent trip to London the subject turned to the rise of event thinking and the downplaying of the traditional SOA patterns. Of course the SOA traditions are being reborn to encompass the events brigade, but because SOA is so broadly and imprecisely defined that's perfectly OK. The SOA hype is over, long live the SOA hype. But that's perhaps a topic for another time.

The key observation from my lunch with John was one I had suspected, but had not been able to frame properly. With a few well chosen sentences John framed it for me.

This is all concerned with orchestration and control. So (deep breath), here goes. Where an event is raised and that event is to be processed by some subscriber, any intent by the originator to orchestrate the handling of the event by the subscriber results in a massive increase in complexity. (Roger Sessions will love this!) Naively one starts to think you have the "OK/Not OK" pair of possible responses. But then the "Not OK" responses blossom out of control. We have situations where the "Not OK" response must result in the retransmission of the event (and how does that happen?) and other cases where it must not. We have cases where the originator of the event has to interpret the behavior of the recipient. That sounds like some awfully nasty coupling to me. So instead of the "OK/Not OK" duality from the recipient's viewpoint, what you actually have is the "OK/{set of lots of possible Not OKs which the sender has to know about}" multiplicity. In short, that's just crappy design!
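
A minimal sketch of the contrast, in Python with invented names: in the orchestrated version the sender has to know about and interpret every flavour of "Not OK"; in the decoupled version it just publishes, and each subscriber owns its own failure handling.

```python
# Hypothetical sketch contrasting the two styles; all names invented.

def log_failure(event): ...
def raise_ticket(event): ...

# Orchestrated: the sender must interpret every flavour of "Not OK" from the recipient.
def send_and_orchestrate(event, recipient):
    response = recipient.handle(event)
    if response == "OK":
        return
    if response == "NOT_OK_RETRY":           # sender must know to retransmit...
        recipient.handle(event)
    elif response == "NOT_OK_DO_NOT_RETRY":  # ...and when it must not...
        log_failure(event)
    elif response == "NOT_OK_MANUAL":        # ...and the set keeps growing.
        raise_ticket(event)
    # Every new recipient behaviour leaks back into the sender: nasty coupling.

# Decoupled: the sender raises the event and stops caring what happens next.
class Subscriber:
    def __init__(self, handler):
        self.handler = handler
        self.queue = []

    def enqueue(self, event):
        self.queue.append(event)             # fire and forget from the sender's side

    def process(self):
        while self.queue:
            event = self.queue.pop(0)
            try:
                self.handler(event)
            except Exception:
                # Retry, dead-letter, alert - the subscriber's problem, not the sender's.
                # (A real subscriber would cap retries rather than loop forever.)
                self.queue.append(event)

def publish(event, subscribers):
    for subscriber in subscribers:
        subscriber.enqueue(event)            # no response to interpret, no coupling
```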

Thanks John

Zachman, Frameworks and EA

This post comes out of a quick, but deep, conversation with @cybersal after the first dinner of the architect irregulars twittergroup at Gopals on 20091125. Other members in attendance were: @richardveryard, @taotwit, @Rsessions, @mattdeacon, and @hstrover. As is often the case when a bunch of EAs get together, the subject of Frameworks comes up. And whenever we discuss frameworks, the venerable Zachman framework is mentioned. Often with much facial contortion and questions like "How do you actually build it?" or "What goes in the interesting bits between the rows?"

And then, as @cybersal and I were hoofing it back to Charing Cross - avoiding the crowds where possible - it struck me that the Framework (at least thinking about the titles of the rows) simply gives a context for discussion. You don't really need the columns. So, for example, when thinking about schemas that business services might use in communication, you are working at "row 3". This tells you as much about what you are NOT supposed to be doing as what you are supposed to be doing. It is a really nice shorthand when one is talking to another EA - since EAs have typically all read or heard John Z. So it isn't about using the Zachman Framework as a "Methodology" (whatever that means) but more of a classification system. If you like, a set of membership rules.

Now just because you have a set of membership rules, that doesn't mean you have to have the formal club (and if you are Groucho Marx, "I don't care to be a member of any club that would have me as a member" - but I digress). So, no, you don't have to instantiate all the rows of the framework and figure out the mappings between them. However, you can say to someone, "Come out of Row 4 and think in Row 3." That is in itself a powerful and useful observation, but doesn't really move EA forward much.

Sunday, November 8, 2009

IT Profession? I think not

Recent tweets from @rsessions, @richardveryard, @j4ngis, and @cybersal have been looking at how hard various professions are. @richardveryard's observation this morning that "@j4ngis @oscarberg Rocket science isn't even particularly complicated. Goes up, comes down. It is rocket technology that is complicated." reminds me of a conversation I had on the golf course with a very good doctor. Let's call him John.
John is, as I have said, a very good doctor. His speciality is anesthesia, but his passion is technology. He is always coming up with schemes to invent solutions that make doctors' lives easier. So much so that he would probably prefer to do that than what he is trained to do.
So after a particularly inept (actually about normal for us, but inept by anyone else's standards) round of golf, we were trudging wearily back to the 19th hole when John announced yet another good idea - linking wireless technology, handhelds, voice transcription, remote printing.... His question to me was, "How hard can this be?"
My response was something along the following lines.
John, you are breaking my heart. You are essentially saying that anyone without a modicum of training, experience, or expertise, but just with the passion and the idea, can bust into my field and take over. Have you no respect? Imagine the situation being reversed. Be me for a day, and I will be you. After all, how hard can it be to administer anesthesia to a patient? You figure out the necessary cocktail, inject it and out they go. I can imagine that there might be a few kinks along the way - like making sure that they wake up - but we can leave that to iteration 2.
He was, of course, horrified. He asked if I was trying to imply that my chosen line of work was as disciplined as his profession. And for the most part, it probably isn't. The key is I don't work in a profession by any normal definition.
So, since we are not in a profession, any enthusiastic amateur can build "cool stuff". Who cares about the error cases? Who cares about the edge conditions? It is all about the app after all. To take a phrase from the movie industry, "We can fix it in post."
Who cares that the patient lives? Who cares that the patient suffers a quality of life decline? In medicine when we have post - it usually means post mortem. There is no fixing it in post in medicine.

Wednesday, November 4, 2009

A rant on "SOA Projects"

The appearance of the SOA Manifesto has led me to look closely at the naming of projects and the implications of such names.
Time and time again I see projects titled or described with a technology or architecture in the name. How often do we hear, "The SAP project failed"? It isn't because the software doesn't work; there are a host of other potential reasons, all having to do with human factors. Likewise with "SOA Projects."
The SOA community (huge generalization here) talks about "SOA Projects." Hogwash, I say. There are very few "SOA Projects." There are and should be many projects where the underlying approach is Service Oriented. There are very good reasons for deploying SOA in the enterprise/division or wherever appropriate. The deployment of SOA governance, technologies, etc. might be considered a SOA project, but creating a business application according to those tenets doesn't make that business application a "SOA Project."
Does this really matter? Isn't calling it a SOA project a convenient shorthand? Isn't calling it a SOA Project a convenient way of getting to the right funding bucket? Well, if that's the way the business operates, I begrudgingly guess so.
I think it is more insidious than that, however. Putting the technology or architecture top dead center in the name gives us an opportunity to make that the primary goal. Rather like hearing the requirement, "I need a database that..." Well, databases are fine things, but a requirement that leads us to a "Database Project" again focuses on the wrong things.
So next time your SAP project fails, ask yourself the question, "Is it SAP that failed, or did the business not realize the anticipated benefits for other reasons?" It's easy to blame the technology.