Thursday, March 4, 2010

That Can Never Happen

Some of the most ominous words I hear from development teams.

I will illustrate this with a rather contrived example - but one that I hope makes the point easily. No, I am not advocating the writing of yet another date handler, but the problem is neatly bounded, well understood and has sufficient complexity to make a good posting.

You might also be wondering what on earth a small coding problem has to do with Enterprise Architecture. I'll get that out of the way up front. It is relevant because we will get to the core of some of the questions around reuse, system development/deployment philosophy, good practices, etc. Not your typical fare in everyday EA, but if you view one of the roles of EA as influencer on "development", we have a nice teaching opportunity.

So here's the situation. A team discovers that it needs to handle a variety of date formats and in its environment of choice there isn't a robust date package that has been thoroughly tested. They mostly know the rules (leap years, time zones, Daylight Saving Time, etc.). They also know the source of the data they need to convert/check. It's coming from a system where, "If the date is sent to us wrongly by the source system, then there is a whole lot more wrong than this minor blip. Those issues will have been caught elsewhere." If a statement like that doesn't make you very suspicious, then nothing will. But why is it a problem?

First off, the statement is true - at least in the narrow context. If the source system gets a date "wrong" then indeed this is symptomatic of a larger problem. So far, so good. The developer doesn't do a proper job of checking error possibilities, "Because they can never happen". So if the system is expecting the month to be the three letter abbreviation (e.g. JAN for January, etc.) and it is in English, then seeing FEV for February is a problem. There is no English month that starts FEV, but there is a French one. So is the error a typing error (B and V are very close on a standard US keyboard), is it a semantic one - it was supposed to be the French version - or what? Should the developer have to know every language to make sure that all possible month abbreviations are accounted for? Probably not, we think. Treat it as an error and move on. But what if the code is badly written and mismatches are not caught because the programmer used some fall through logic and returned DEC for all invalid months? DEC is the last month and so a good candidate for being returned in error, "Because we can never get an invalid month string." OK, perhaps we can't.
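To make the fall through concrete, here is a minimal sketch in Python. The function names and the MONTHS table are mine, invented for illustration; the team's real code is not shown in this post, so treat this as one plausible shape of the problem rather than what was actually written.

```python
# Hypothetical illustration: these names are assumptions, not the team's code.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def month_number_fall_through(abbrev):
    """The 'that can never happen' version: check JAN..NOV, assume DEC otherwise."""
    for number, name in enumerate(MONTHS[:-1], start=1):
        if abbrev == name:
            return number
    # Nothing matched, so it "must" be DEC. FEV, XYZ and "" all land here too.
    return 12


def month_number_strict(abbrev):
    """Accept only the twelve expected abbreviations; reject everything else."""
    try:
        return MONTHS.index(abbrev) + 1
    except ValueError:
        raise ValueError(f"unrecognised month abbreviation: {abbrev!r}") from None
```

With the first version, FEV quietly becomes December. With the second, FEV raises an error at the point where the bad data arrives, which is where you want to hear about it.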

And then project #2 comes along. Developer John says to developer Chris, "Didn't you code up that weird date handling routine last year? I want to use it, can you point me to it?" "Sure it is in the project library at...."

So John does some "copy/paste reuse" for a piece of trusted code. After all, it has been in production for a good long time with no problems found. He inserts it into his application, all is well, and about 6 months later it blows up. Turns out that application 2 was not getting the data from the same source as application 1, so it was possible for invalid data to show up. "That Can Never Happen" suddenly became, "How the !@#* DID THAT HAPPEN, I THOUGHT WE HAD TESTED CODE" with recriminations from customers, senior management, Uncle Tom Cobbley and all.

Long story for some short points:

  • Just because you reuse something doesn't mean it is tested for your situation
  • Copy/paste reuse is often worrying anyway - code handed over that way is very context dependent
  • Consider the cost of hardening and making the routine a service of some sort.
  • Promotion of an item to a reusable artifact puts extra stress on development and testing because the more general corner cases have to be considered.
  • Governance and management of reusable components is an important practice
  • When something is promoted, make sure its assumptions are known and its tests are included with it - that way at least a potential user of the code can see what conditions have been explicitly tested.
And finally, if it genuinely "Can't happen," you can be sure that someone, somewhere will make it happen! So again, make the assumptions explicit. Obvious, isn't it (especially with hindsight)?
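As a rough sketch of what "assumptions known and tests included" might look like for the month example, here is one way to do it in Python. The parse_month name, the wording of the assumptions and the choice of unittest are all mine, chosen for illustration only.

```python
import unittest

# Hypothetical illustration: names and assumptions are invented for this sketch.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def parse_month(abbrev):
    """Return 1-12 for an upper-case English three-letter month abbreviation.

    Assumptions, stated so that a would-be reuser can check them:
      - input is upper-case English (JAN, FEB, ...); other languages are rejected
      - anything unrecognised raises ValueError; there is no default month
    """
    if abbrev not in MONTHS:
        raise ValueError(f"unrecognised month abbreviation: {abbrev!r}")
    return MONTHS.index(abbrev) + 1


class ParseMonthTests(unittest.TestCase):
    """These travel with parse_month and show which conditions were tested."""

    def test_english_abbreviations(self):
        self.assertEqual(parse_month("JAN"), 1)
        self.assertEqual(parse_month("DEC"), 12)

    def test_french_abbreviation_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_month("FEV")

    def test_lower_case_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_month("jan")
```

John can now read the docstring and the test names and see in a minute whether his data source fits, instead of finding out six months into production.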

3 comments:

Roger Sessions said...

While I completely agree that IT should be watching scenarios like this carefully, it is not in the domain of EA to ensure that this is happening any more than it is in the domain of EA to make sure the trash is being emptied properly.

EA needs to be completely focused on problems that can only be solved at the juncture of IT and business. These are a very special category of problems, they are very important problems, they require very specialized skills, and they can't be solved by any group other than EA.

Christopher Bird said...

I got around to reading Roger's comment today. The issue isn't around the date routines, but more what happens when you hear the phrase, "That can never happen". That principle matters at the higher levels of EA as well as at lower levels of coding and development. Perhaps at the business/IT juncture you hear business teams say, "We will never sell your information." However, at some point they may. Absolutes are always tough.

John Schlesinger said...

A related issue which is also of relevance to EA is the phrase "That is a million to one chance". We were working on the code to ingest 12 million financial transactions a day and I pointed out what I thought was an error. The programmer said 'but that is a million to one chance' and I replied 'so we will get it 12 times a day'. The underlying truth is that everything you do in Enterprise Applications has to scale. This is as true of coding for race conditions as it is for coding for pre-conditions (invalid dates).

John