Friday, December 31, 2010

Time Part One - Posted Dec 31 2010

I fear that this is going to be a difficult post. But I feel so passionately about time that I will give it a go. What uses of time do we need to think about? In some ways it's pretty straightforward: there are instants and durations. But there is often complex interplay between people/systems regarding these.
For example, when I see a programme on TV that covers something I haven't seen before, I make an assumption that it is new. But in reality it probably isn't - others will likely have seen it. I still may tweet or otherwise notify a social group that I saw this "new" programme... So I find it annoying not to know the time context in which something is published or made available. Of course this allows for jokes like the one Simon Wardley made at his excellent OSCON presentation.
"...Opens up exciting new prospects for the employment of computers in ways and on a scale that would have seemed pure fantasy only five years ago" Simon was applying the idea to the cloud, but actually the quotation stems from 1966. Without the time context we simply don't know.

And that leads to another area of time complexity - relative time words. Yesterday, today, five minutes ago. Of course functions that use relative time are not idempotent. I get a different answer to "yesterday()" today than I will tomorrow.
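To make that concrete, here is a minimal sketch in Python. The name `yesterday` comes from the sentence above; giving it an explicit reference date is just my illustration of one way to pin down the time context, not a claim about any particular library.

```python
from datetime import date, timedelta


def yesterday(today: date) -> date:
    """Return the day before the supplied reference date.

    Passing the reference date in, rather than reading the system clock
    inside the function, makes the time context explicit and repeatable.
    """
    return today - timedelta(days=1)


# The same question, asked from two different days, gets two different answers.
asked_on_friday = yesterday(date(2010, 12, 31))
asked_on_saturday = yesterday(date(2011, 1, 1))
```

Once the reference date is an argument, the answer no longer depends on when the code happens to run, only on what the caller says "today" is.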

As an aside, in synchronous conversation we know what time we are talking about. In asynchronous conversation we don't, yet we still use synchronous modes of speech. So, getting a bit more concrete:

Madame had a tennis game scheduled. I am absent-minded. Madame needs reading glasses. On her phone there was a text message from Mary saying, "Are we playing tennis tomorrow?" Madame asked me to read it to her on Tuesday morning. My assumption was that the tennis game was to be on Wednesday (tomorrow viewed from my contextual frame which was Tuesday morning). Actually the game was Tuesday because Mary sent the message on Monday. Madame and Mary had had previous communication on the subject and so had further context. So there are lots of different times and interpretations lurking here. There's the contextual time of each participant, there's the actual time of the event, there's the time the message was sent, there's the time Madame's phone received the message... Much scope for ambiguity.
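The tennis muddle can be sketched in a few lines of Python. Everything here is illustrative: the `TextMessage` type, the `resolve` helper and the calendar dates are my own assumptions (I picked a Monday and Tuesday arbitrarily), chosen only to show that the same word resolves differently depending on whose temporal frame you use.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical vocabulary of relative-time words, as offsets in days.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


@dataclass(frozen=True)
class TextMessage:
    sender: str
    body: str
    sent_on: date  # the sender's frame travels with the message


def resolve(word: str, frame: date) -> date:
    """Turn a relative-time word into an actual date, given a frame of reference."""
    return frame + timedelta(days=RELATIVE_DAYS[word])


# Assumed dates: a Monday for sending, the following Tuesday for reading.
message = TextMessage("Mary", "Are we playing tennis tomorrow?", sent_on=date(2010, 12, 27))
read_on = date(2010, 12, 28)

what_mary_meant = resolve("tomorrow", message.sent_on)  # sender's frame
what_i_assumed = resolve("tomorrow", read_on)           # reader's frame
```

Nothing in the message body tells the reader which frame applies. The sent date has to be carried alongside the text, and the reader has to remember to use it.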

Our command and control/standardization brethren might seek to impose unambiguous temporal semantics onto tennis scheduling, our family life, our friends.... However we can see that's going nowhere. Standardizing the terms - yeah right.

However, as thinkers about systems, these things do matter. How do we understand the temporal frames of the various participants, including those participants that are systems as well as those that are humans? When do we have to worry about these differences? How do we describe them internally (so that the things that care deal with them properly) without attempting to impose an external semantics onto every participant?

These are hard questions and get to the heart of interaction/integration and what happens in architecture.

In subsequent posts I will lay out some starting ideas around thinking about time (something delicious here - time from someone who is often late....)

What do you know and when did you know it?

In my Architect role at Sabre, I presented the following at the IATA Commercial conference in Istanbul.

“The Joined UP Airline”

I am using two major themes today when discussing the notion of the joined up airline. The first harks back to the Watergate Conspiracy in the 1970s in the USA. The President was asked repeatedly, “What do you know and when did you know it?” The second theme is even older – it is a quotation from the American Humorist and author Mark Twain, “A lie is half way around the world before the truth has its boots on”. These two themes coincide suggesting that carriers “Think like your customers”.

So let’s unpack these thoughts – and I will lead with an example here. The case of the missing bag. The passenger knows the bag is missing only once all the baggage has arrived at the carousel. But you carriers knew a lot earlier. What do you know? That the bag isn’t on the proper flight. When did you know it? When it was not loaded or when it was loaded onto a different flight. That is certainly before the customer can know it.

You now have an annoyed customer with a powerful weapon – the weapon of instant communication. The customer will of course send a message like, “Those idiots lost my bag….again.” That message is heard instantly – before you have had a chance to manage the message. That opinion is heard while you are still handling the problem operationally. Your “truth” is still tying its boots while the negative message is already circulating. Wouldn’t it have been better to use the passenger’s contact data to notify him or her as soon as you knew? No matter that the passenger is probably out of electronic communication – at least send the message so that s/he doesn’t have to wait at the carousel. Apologize and offer some kind of trip appropriate/status appropriate recompense. In other words act on the information that you have – and do it as soon as you can.

This approach is at the forefront of Sabre’s SabreSonic CSS solution. That is true Customer Sales and Service. Discover the meaningful happenings and act on them immediately. It looks simple from the outside, but joining up the data is hard. It’s easy for the passenger but hard for systems. The transactional systems are simply not designed (and nor should they be) to do the necessary analysis. However they have the raw information. Making sense out of the raw information and then acting on it in an appropriate way turns the negative into a positive. You may still see the angry posting, but through proactive message management you can do something to handle the “flame”.

Of course this looks like just plain good customer relationship management. And so from a practices point of view it is. What we have been lacking up to now is platforms that collect data as it becomes available, organize and standardize it, join it with other relevant data and deliver that joined up data to action oriented systems according to your policies. SabreSonic CSS does just that. It allows you to do things that you were unable to do before. Imagine having the freedom and flexibility to do business the way you want to. Let’s turn the previously impossible into current and probable. And remember data without action may be interesting but it isn’t useful.

Tuesday, July 20, 2010

The role of channels

On the tnooz web site there is a whole lot of discussion about the pricing of travel through channels and what the responsibilities of the various players are. Specifically the link poses the question, "Should airlines be forced to disclose equal pricing and fees in all channels?"

Now pricing is an interesting and really important part of any business/business strategy, and thence business architecture. The fundamental question of "How do we price?" affects how we market, how we sell, to some extent how we produce. So for example in a business with only a direct sales model, the systems (in the broadest sense) that deal with sales don't have to handle other channels. But businesses with multiple channels have to make the role of the channel explicit in all systems that are concerned with sales.

If the channels don't control the pricing, what use are they? Surely the idea (really simplistically) is for the channels to provide the conduits for products from suppliers to consumers and thus to set the price for the product via that conduit.

To be successful as a channel, you presumably have to price items higher than what you get them for. Not all items - you can do loss leading if you wish. A pretty simple market place really.

The airlines are responsible for setting the cost to the channel, not the price to the consumer - except where the airline is the channel. The role of the airline acting as its own channel is different from the airline operating its own business. The travel business is a bit different from other retail businesses where, typically the manufacturer doesn't operate its own channel. Although when we see store within a store concepts (like the cases where a makeup provider sets up shop inside a retailer) the makeup provider is acting as its own channel. The store in which the make-up provider is operating is simply another supplier and associated costs must be factored in.

Purchasing is different again. In a world of competition the channels should be clamoring for the consumers' business. Through all manner of inducements. As a consumer I can choose which channel I purchase through depending on a number of factors. Just as I can choose whether to shop at Harrods (Cool, easily identifiable status symbol bag) or Walmart (Look, I am a thrifty person in the new economy).

So in the travel business we really do have to separate out the responsibilities of the various parties in the transaction. There are more parties than just the airline, channel, and purchaser involved. Hotels, rental car companies, restaurants, golf clubs, credit cards,.... All have parts to play. Each may have multiple parts - the part as the supplier, the part as the purchaser, the part as a channel. If we don't separate the parts, we can create interesting opacity. In some industries that is very desirable. However in a business oriented towards consumers, the consumers want transparency.

The job of the channel is to provide that transparency.

So any player in the relationship that wants to be in the role of the channel (whether it be a traditional travel agent, an airline, a hotel, an online travel agent,...) has to act like a proper channel and take the proper channel responsibilities. Simply put:

* Acquire "inventory" at the proper time/price

* Package acquired "inventory" into offers

* Offer the acquired "inventory" under terms that are attractive

* Accept offers to purchase such inventory

* Fulfill those offers

It gets a bit tricky because there are actually multiple channels involved in the end. I can purchase the basic travel through, say, Travelocity or Orbitz. But when I eventually travel, I may end up buying something directly from the airline itself (say a meal). Actually, at that point the airline is acting as a channel for a meal preparation company, so it can choose how to set its prices.
To really understand a complex, networked business, we architects need to think through the roles in the ecosystem, and pay attention to and manage the business architectures that support the pricing models that matter.
When there is turbulence, aim for flexibility. When there is stability, aim for simplicity.

Tuesday, July 6, 2010

Controls and Trust

I am appalled, as I look at systems in the various companies that I have consulted with or that have employed me, at the lack of system controls in key places. If you are in the data delivery business and you have agreements with your customers, wouldn't you want to know that you are meeting your service level agreements? Or, better still, to know when you are not going to meet them (for whatever reason), so that you can issue warnings or do something about it?

Similarly when looking at flow through from one system to another, can you reasonably be assured that everything that was supposed to be processed was?

Do you count your cash after going to the ATM? Maybe the machine didn't deliver correctly because a couple of notes were stuck together. Maybe a new software version caused a miscount under some weird circumstances. The ATM is a "black box" to me. That means that at its boundaries I have to decide what my trust relationship with it will be.

So when I have systems which are supposed to communicate in some way (e.g. by passing data) what controls should be in place to make sure everything is properly accounted for? Should a sending system keep a count of what it has sent? Should receiving systems similarly keep track? How do we reconcile? Should the reconciliation be in-band? Should it be out-of-band? Is logging adequate? Do we have to account for the "value" of the transmission as well as just counts? What tolerances matter if we are concerned with value (perhaps one system rounds off the value differently from another so at the end of the day the total value has a discrepancy)?
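One way to picture those questions is a small control-total check. This is only a sketch under my own assumptions: the `ControlTotal` type, the `reconcile` function and the sample amounts are hypothetical, and a real control would also have to decide where the totals live (in-band or out-of-band) and who acts on a mismatch.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ControlTotal:
    count: int      # how many items passed the boundary
    value: Decimal  # their total "value"


def control_total(amounts) -> ControlTotal:
    """Summarize a batch as a count plus a total value."""
    amounts = list(amounts)
    return ControlTotal(len(amounts), sum(amounts, Decimal("0")))


def reconcile(sent: ControlTotal, received: ControlTotal,
              tolerance: Decimal = Decimal("0")) -> list:
    """Compare both sides' totals; return a list of discrepancies (empty if in balance)."""
    problems = []
    if sent.count != received.count:
        problems.append(f"count mismatch: sent {sent.count}, received {received.count}")
    drift = abs(sent.value - received.value)
    if drift > tolerance:
        problems.append(f"value mismatch: off by {drift}")
    return problems


# The sender keeps three decimal places; the receiver has rounded to two.
sent = control_total([Decimal("10.004"), Decimal("20.004")])
received = control_total([Decimal("10.00"), Decimal("20.00")])

within_tolerance = reconcile(sent, received, tolerance=Decimal("0.01"))
strict = reconcile(sent, received)
```

With a tolerance of one cent the two sides balance; with no tolerance the rounding difference is reported. Which of those is right is a business decision, not a coding one.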

This need for controls is exacerbated by systems that use Events as the primary means of notification. Because at the individual event level we can indeed count, maintain value, etc. But often the controls need to be at an aggregate level. One would think in, for example, an airline boarding system that as long as every boarding event is properly received by the "flight", then the system should be in balance. Try telling that to Easyjet. There is a manual control system whereby the Flight Attendants actually count the number of passengers on the plane and attempt to reconcile that with the "expected" number. How the expected number is derived, I have no idea. It could be simply the number of boarding cards collected - but what about electronic boarding? It could be the "system's" view of how many bums there should be on seats. Whatever it is, it doesn't appear to be reliable. Chris Potts (Twitter @chrisdpotts) told me the story of what happens when the count is wrong. They recount, they look for people in bathrooms, they delay the flight. It's all a mess.

In the 1960s when phone phreaking was at its peak, people could make free calls because the control signals (tones) for managing the connection system were on the same band of the infrastructure as the call itself. So when a signal tone was detected (and you could get whistles to generate these tones), the system went into a signalling state. By signalling the correct sequence you could generate the sequence to make free calls. Simple fix - put the controls out of band with what you want to transmit.

In a properly reliable infrastructure, the appropriate controls should be built in from the beginning. Again, you may ask, "What's this got to do with Enterprise Architecture?". I argue that it has a great deal to do with the architecture of the enterprise. Good controls make for good compliance and a high level of confidence in our business practices. Bad controls can make your corporation star in places you don't want to be - the front page of the WSJ, in anecdotes among the social networks, resulting in a loss of confidence in your organization.

Thursday, July 1, 2010

Observations on Silos

In this posting Richard Veryard examines the value of silos and silo thinking in organizations. In some ways silos are necessary organizational mechanisms because they place boundaries on functional activities. However they are also creatures of organizational politics - each silo is headed by "someone important". At the European Enterprise Architecture Conference 2010, Alec Sharp observed that silos have negative connotations and that perhaps we should use the term, "Cylinders of Excellence". That at least sounds more empowering.

Often, however, the success metrics of a silo are actually in conflict with those of the organization. Well intentioned success metric/motivational policy in a sales team (e.g. "Orders taken in the last week of the quarter will pay extra commissions") will in all likelihood have exactly the wrong outcome. Commissioned sales force team members are actually incented to do the wrong thing - to sandbag until the last few days of the quarter. This has potential downstream effects. It places extra burden on finance and other back office "systems" (silos?). It doesn't allow the organization to get a true picture of sales performance through the quarter. Somehow the miracle always seems to occur and some huge deal comes in at the end of the quarter to "rescue the quarter", keep the investors happy, commission the sales team...

So as architects we should look at the effects of the silos on organizational goals. Moving the silos around is likely to be a bad idea. Powerful people sit at the tops of silos and the only time to effect change at that scale is when new senior management take over. So rather than being perceived as people who are trying to undermine power, enterprise architects need to be seen as people who facilitate the planning and execution of the enterprise.

A fundamental (inside out) question then is, "How do the goals/reward systems in the current silos ensure that the goals and reward systems of the enterprise are met?" In the example above, the Sales silo would argue that it is helping the enterprise meet its goals by making sure that targets are indeed met. However, there is nothing in the enterprise goals that says "at whatever cost to the rest of the enterprise".

Enterprise architects are fundamentally enterprise level thinkers and thus must be thinking about the enterprise (and possibly the extended enterprise) as a whole. We must be able to understand the implications of conflicts in the enterprise - where value systems in the silos are either at odds with each other, or at odds with the value systems of the organization as a whole.

Monday, April 12, 2010

Ugliness in the sink

An architectural fundamental is that things that are not related should not cause each other to fail. The printer breaking should not stop the processing of orders. It does stop the printing of orders (duh), but should not stop the processing.
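Here is the printer example as a rough Python sketch. The order shape, the `process_order` function and the `broken_printer` stand-in are all hypothetical; the only point is that a failure in printing is recorded and contained rather than allowed to abort the processing.

```python
def process_order(order: dict, print_order, problems: list) -> dict:
    """Process an order; treat printing as a separate concern that may fail."""
    processed = dict(order, status="processed")
    try:
        print_order(processed)
        processed["printed"] = True
    except Exception as exc:
        # Note the failure for follow-up, but do not let it stop the order.
        problems.append(f"order {processed['id']}: printing failed ({exc})")
        processed["printed"] = False
    return processed


def broken_printer(order: dict) -> None:
    """Stand-in for a printer that is out of action."""
    raise IOError("paper jam")


problems = []
result = process_order({"id": 42}, broken_printer, problems)
```

The order still ends up processed, and the printing problem is visible to whoever needs to deal with it.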

Some of the systems I deal with put me in mind of a sewage system. Bear with me here...

So, imagine the following undesirable situation. You are standing at the sink after dinner washing the dishes/pots/pans etc. It had been raining very hard all day, so the drainage systems had been well overtaxed. You are startled to see stuff coming UP the drain (perhaps from your neighbor's house - you can tell what they had for dinner too). This is clearly an undesirable effect.

On further analysis you realize that the drainage system couldn't take the runoff fast enough, so your house, being lower than your neighbours', became the lowest point their effluent could reach. Oh dear.

Of course restaurant codes (at least in the USA) have a cure. They make sure there is a buffer zone between the sewage system and any sink where food is prepared. It is an airgap and a sufficiently large gap that any attempt by effluent to rise up will be nullified. It will spread all over the floor instead. Not much better, but definitely not contaminating food in a sink.

So what's the lesson here? Denial of Service through increased load on a system is something to watch out for. Make sure you understand the implications and design accordingly.
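A loose software analogue of the air gap is a bounded buffer that refuses work when it is full, so the overflow is dealt with at the boundary instead of backing up into something unrelated. This is a sketch only; the `offer` helper and the queue size are my own assumptions, using Python's standard `queue` module.

```python
import queue


def offer(work_queue: queue.Queue, item) -> bool:
    """Try to enqueue an item without blocking; report whether it was accepted."""
    try:
        work_queue.put_nowait(item)
        return True
    except queue.Full:
        # Overflow lands here, "on the floor", where it can be seen and handled.
        return False


incoming = queue.Queue(maxsize=2)  # deliberately small buffer
accepted = [offer(incoming, request) for request in ("a", "b", "c")]
```

Rejected work is still a problem, just as effluent on the floor is, but it is a contained and visible one.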

This is as important at a business level as it is at a technical level. Who hasn't experienced a lack of planning - perhaps an airline offering a fare sale and not having enough call center representatives to handle it. Or the fiasco in Texas where the "business" (the State) offered rebates for trading in old appliances. They underestimated demand horribly.

The enterprise really has to understand its business models and take care to scale appropriately - and communicate that through to all interested and relevant parties.

Thursday, March 4, 2010

That Can Never Happen

Some of the most ominous words I hear from development teams.

I will illustrate this with a rather contrived example - but one that I hope makes the point easily. No I am not advocating the writing of yet another date handler, but the problem is neatly bounded, well understood and has sufficient complexity to make a good posting.

You might also be wondering what on earth a small coding problem has to do with Enterprise Architecture. I'll get that out of the way up front. It is relevant because we will get to the core of some of the questions around reuse, system development/deployment philosophy, good practices, etc. Not your typical fare in everyday EA, but viewing one of the roles of EA as influencer on "development" we have a nice teaching opportunity.

So here's the situation. A team discovers that it needs to handle a variety of date formats and in its environment of choice there isn't a robust date package that has been thoroughly tested. They mostly know the rules (leap years, time zones, Daylight Saving Time, etc.). They also know the source of the data they need to convert/check. It's coming from a system where, "If the date is sent to us wrongly by the source system, then there is a whole lot more wrong than this minor blip. Those issues will have been caught elsewhere." If a statement like that doesn't make you very suspicious, then nothing will. But why is it a problem?

First off, the statement is true - at least in the narrow context. If the source system gets a date "wrong" then indeed this is symptomatic of a larger problem. So far, so good. The developer doesn't do a proper job of checking error possibilities, "Because they can never happen". So if the system is expecting the month to be the three letter abbreviation (e.g. JAN for January, etc.) and it is in English then seeing FEV for February is a problem. There is no English month that starts FEV, but there is a French one. So is the error a typing error (B and V are very close on a standard US keyboard), is it a semantic one - it was supposed to be the French version, or what? Should the developer have to know every language to make sure that all possible month abbreviations are accounted for? Probably not, we think. Treat it as an error and move on. But what if the code is badly written and mismatches are not caught because the programmer used some fall through logic and returned DEC for all invalid months? DEC being the last month, it is a good candidate for being returned in error, "Because we can never get an invalid month string." OK, perhaps we can't.
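To show what that fall through logic might look like, here is a contrived Python sketch. Both functions are my own invention for illustration, not the team's actual code: one quietly answers December for anything it doesn't recognize, the other makes its assumption explicit by refusing.

```python
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def month_number_fallthrough(abbrev: str) -> int:
    """The "that can never happen" version: unknown input silently becomes 12."""
    for number, name in enumerate(MONTHS[:-1], start=1):
        if abbrev == name:
            return number
    return 12  # assumed to be DEC, because nothing else "can" arrive here


def month_number_strict(abbrev: str) -> int:
    """The explicit version: unknown input is reported as an error."""
    try:
        return MONTHS.index(abbrev) + 1
    except ValueError:
        raise ValueError(f"unrecognized month abbreviation: {abbrev!r}") from None


silently_wrong = month_number_fallthrough("FEV")  # a February that became December
```

The first version is fine for exactly as long as its unstated assumption about the source holds. The second fails loudly the moment it doesn't.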

And then project #2 comes along. Developer John says to developer Chris, "Didn't you code up that weird date handling routine last year? I want to use it, can you point me to it?" "Sure it is in the project library at...."

So John does some "copy/paste reuse" for a piece of trusted code. After all it has been in production for a good long time. No problems found. Inserts it into his application, all is well and about 6 months later it blows up. Turns out that application 2 was not getting the data from the same source as application 1, so it was possible for invalid data to show up. "That Can Never happen" suddenly became, "How the !@#* DID THAT HAPPEN, I THOUGHT WE HAD TESTED CODE" with recriminations from customers, senior management, Uncle Tom Cobbley and all.

Long story for some short points:

  • Just because you reuse something doesn't mean it is tested for your situation
  • Copy/paste reuse is often worrying anyway - code reused that way is very context dependent
  • Consider the cost of hardening and making the routine a service of some sort.
  • Promotion of an item to a reusable artifact puts extra stress on development and testing because the more general corner cases have to be considered.
  • Governance and management of reusable components is an important practice
  • When something is promoted, make sure its assumptions are known and its tests are included with it - that way at least a potential user of the code can see what conditions have been explicitly tested.
And finally, if it genuinely "Can't happen," you can be sure that someone, somewhere will make it happen! So again, make the assumptions explicit. Obvious, isn't it (especially with hindsight)?

Tuesday, February 2, 2010

Openness, opensource, lock in, the downfall of authority

In the last couple of days a few related ideas came floating by. It started when Madame was bemoaning behavior in her profession (she is a University lecturer). Then I got to thinking about Open Source - at the behest of a customer. This led down the path of no longer being able to dictate because you are the authority.
You might wonder where this is going, so let me elaborate.
In school there is a certain amount of concern that the students are tweeting, surfing the net, multitasking, getting dates on Facebook or something like that. The gut reaction of the authoritarians is to find a way to punish the students. Of course in some ways the lecturers have the ultimate authority - dropping a grade - but that is patently unfair. Madame, of course, came up with the practical observation - make the classes interesting enough and they won't be distracted. That does require effort on the part of the lecturer though.
And now to open source. There is a fear that making something open will lower the switching cost and therefore customers will leave. So, the argument goes, increase the lock in by a proprietary method and tie the customers down. The Madame principle, of course, is to provide enough value so that the customers won't want to switch. That of course also requires effort.
We have much anecdotal evidence that lock-in of any sort is despised - sometimes in the software world a reason not to buy. In the current times, where education has become dialog, where software acquisition is also a dialog, we must be prepared to engage directly as equals and not assume positions of authority "because it always worked that way."
There are a few excellent companies that actually can get away (at least for a while) with their authoritarian (aka my way or the highway) stance. That happens when the perceived value of the item is so great that the proprietary nature is irrelevant (Apple anyone?), but the advantage can erode quickly when another competitor enters the market (even if that competitor is no more open). The spat over the pricing model between Amazon and Apple comes to mind here. It just took Macmillan having an alternative, and standing up to the lock-in-through-monopoly bully, for the market to fracture.