Saturday, February 15, 2014

Narcissistic Applications and Architecture

This post comes out of some work I am doing with a client: getting to the essence of event processing and what needs to be in place.
As many have observed, the metadata is often as interesting to the enterprise as the actual data. The trouble is that the enterprise doesn't necessarily know ahead of time what may or may not be interesting. Perhaps applications that manage the state of domain objects should tell the world whenever they change that state.
It is only when applications start bragging about what they have done that the enterprise has the ability to draw conclusions that range across the domains.
So while the current state of an interesting domain object may well be locked up in a transactional database somewhere, the fact that a state change occurred could (and should) be made available to any and all interested parties.
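As a minimal sketch (mine, not the client's - the topic name and the publish() function stand in for whatever messaging fabric is actually available), the "bragging" might look like this:

```python
import json
import uuid
from datetime import datetime, timezone

def publish(topic: str, payload: dict) -> None:
    """Stand-in for whatever messaging fabric the enterprise has (queue, bus, webhook...)."""
    print(topic, json.dumps(payload))

def change_order_status(order_id: str, new_status: str) -> None:
    # 1. Do the normal transactional work against the application's own database (elided).
    # 2. Then brag: a small, self-describing event saying that the state change happened.
    publish("orders.status-changed", {
        "event_id": str(uuid.uuid4()),
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "domain_object": "Order",
        "object_id": order_id,
        "new_status": new_status,
    })

change_order_status("ORD-42", "shipped")
```

The detailed state stays in the system of record; the event just announces that something changed, for whoever turns out to care.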
Let's think in terms of an intelligent (but fictitious) home environment that we will call the IHE.
Our daily activities in the house include:
  • Using hot water 
  • Turning lights on and off
  • Accessing computers
  • Watching TV
  • Sleeping
  • Opening/closing the refrigerator
  • Cooking
  • Eating
  • Managing the trash
  • Managing the recycling
  • Filling the dishwasher
  • Dressing
  • Doing (or having done) the laundry 
  • Opening/Closing exterior doors
  • ...
I have this feeling that my home bills are too high, so it might be interesting to see if any of my activities are inefficient (I leave lights on sometimes - so we have an event, followed by a negative event), or if some of my activities can be correlated. Perhaps a change in one suggests an opposite change in another.

Now if, hypothetically, all my activities resulted in events being published and somehow analyzed, then perhaps (and this is a big perhaps) I would have the opportunity to look at my patterns and make some changes that result in savings in time, energy or general annoyance.

Of course we do the obvious ones. When we sleep, running the dishwasher is a no-brainer. But what about multiple uses of the oven? What about leaving lights on? What about leaving doors/windows open while the heating/cooling is running?

The point is that all of these state changes describe the minutiae of my life, and I don't have the time, nor the energy, to capture them. That detail should be captured at the time it happens - if I am truly interested. It shouldn't wait until after the event, when my recollection is hopelessly flawed.

Johnny Cash on Technical Architecture

Yes, that Johnny Cash aka the man in black. He of the deep voice, great songs, San Quentin concert...
I was having a cup of coffee at a local Starbucks yesterday when a friend showed me an "architectural diagram" of the technology components that a customer of his had shown him. Very proudly, and all based on open source (because they don't want to pay license fees), they had unveiled this masterpiece that had taken them several years to build.

Immediately I was reminded of this terrific song....

http://youtu.be/5GhnV-6lqH8

We architects do need to work on ensuring a few things:
  • Don't overdo the technology
  • Open source may be the way to go, but joining disparate things up can get expensive fast
  • Ensure that the pieces can connect (bolts and bolt holes, anyone? 2:05 in the song)

Tuesday, August 27, 2013

Shearing Layers - Part 2, Stuff

In the previous post I introduced the notion of shearing layers - taken from Stewart Brand's book "How Buildings Learn". In this one I am going to look at how having better data can possibly affect the "Stuff" layer.
For example, shopping habits on the web site can show buying patterns and trends that could translate to the brick and mortar store. Looking at what people search for together online could give a clue to what they are looking for when they get to a store.
Note that the "could" in the previous observation is very much an imponderable. While shearing the "Stuff" on the web site is really easy (facilitating A/B testing, for example), it is still tricky in the brick and mortar store.
Reorganizing store shelves/layout runs the risk of confusing staff and customers. Things aren't where they were yesterday. Our ingrained habits and expectations no longer work for us. So the risk is definitely there, but there could be some interesting small experiments.
Perhaps it is worth grouping trousers by style/size and not by color. Perhaps it is worth grouping shoes by size, mixing up the brands. Of course that one is very tricky because we buy shoes with our eyes, so we may need to see a floor sample which will be of a single size.
The desires of the store, the desires of the brands and the desires of the customer may well come into opposition.
The online shopping experience can give us a rate of change greater than that in the physical store - delivering data to the store planners and merchandisers that can influence product placement - and serve the ultimate goal of selling more "Stuff" to the customers.

Sunday, August 25, 2013

Shearing Layers - Part 1, Physical Buildings

In Stewart Brand's terrific work, "How Buildings Learn", there are some great analogies to what we do in Enterprise Architecture. He expanded on the concept of "shearing layers" introduced by Robert V. O'Neill in his "A hierarchical concept of ecosystems". The primary notion is that we can understand our ecosystems better, hierarchically, by understanding the different rates of change possible at the different layers.
[Diagram from "How Buildings Learn": the shearing layers of a building - Site, Structure, Skin, Services, Space Plan, Stuff - arranged from outside in.]
The diagram above is reproduced from How Buildings Learn, and represents the parts of a building which change at different rates. It is arranged from outside in with, in this representation, no absolute correlation between the parts and rate of change. 
Using Brand's own explanation, the layers have the following descriptions:

Site

The site is the geographical setting, the urban location and the legally defined lot whose boundaries and context outlast generations of ephemeral buildings.

Structure

The foundation and load-bearing elements are perilous and expensive to change, so people don't. These are the building.

Skin

External surfaces can change more frequently than the structure of the building. Changes in fashion, energy cost, safety, etc. cause changes to be made to the skin. However tradition and preservation orders often inhibit changes to the skin since the skin is very much the aesthetic.

Services

These are the working guts of the building, communications/wiring, plumbing, air handling, people moving. Ineffective services can cause buildings to be demolished early, even if the surrounding structure is still sound.

Space Plan

The space plan represents the interior layout - the placement of walls, ceilings, doors, etc. As buildings are re-purposed and fashions change, the interior can quickly be reconfigured.

Stuff

The appurtenances that make the space useful for its intended purpose: the placement of tables, chairs, walls, cubicles, etc.
 
In further articles, I will develop this theme in two directions. First, in thinking about how data can affect the way that retail organizations think about their layout and organization (shearing at the Stuff/Space Plan layers of both brick and mortar and web stores). Second, in looking at Enterprise Architecture through the lens of shearing layers - by analogy with Brand's writing and thinking.

Saturday, June 8, 2013

Trying to understand IaaS, and other nonsense

There's been something upsetting me about the whole notion of Infrastructure as a Service. It has taken me a while to put my finger on it, but here goes. First, though, an analogy with electricity usage and provisioning.
When I flick on the light switch, I am consuming electricity. It doesn't matter to me at the moment of consumption where it is coming from as long as:
  1. It is there when I want it
  2. It comes on instantly
  3. There is enough of it to power the bulb
  4. It doesn't cause a breaker to trip
  5. It doesn't cause the bulb to explode
  6. ...
So that's the consumption model. That's independent of the provisioning model - at least as long as those requirements are met.
I could satisfy that need through several mechanisms:
  1. I could have it delivered to my house from a central distribution facility
  2. I could make it myself
  3. I could steal it from a neighbour
  4. ....
Regardless of which provisioning method I use, I am still consuming the electricity. The lightbulb doesn't care. However the CFO of the birdhouse does care. Thinking about the service of electricity - it's about how I procure it and pay for it, not how I consume it. Sure, I can add elasticity of demand: it's summer and I am running the air conditioners throughout the house and both ovens are on and every light..... But that is a requirement on how I procure the service, not on how the devices use it.

Similarly in the software defined infrastructure world, the application that is running doesn't really care how the infrastructure it is running on was provisioned. The "as a service" part of IaaS is about the procurement of the environment on which the application runs.
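A tiny sketch of that separation (the environment variable names are invented): the application consumes what it needs and contains no provisioning logic at all.

```python
import os

# The consumption model: the application asks its environment for what it needs.
# Whether that endpoint was procured from a cloud provider, built in-house, or
# "borrowed from a neighbour" is invisible here - and that is the point.
DB_URL = os.environ.get("DATABASE_URL", "postgres://localhost/app")   # invented names
WORKER_COUNT = int(os.environ.get("WORKER_COUNT", "4"))

def run_application() -> None:
    # Normal application work; no provisioning logic anywhere in sight.
    print(f"connecting to {DB_URL} with {WORKER_COUNT} workers")

run_application()
```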

The procurement model can, of course, affect the physical environment of the equipment. Just as delivering electricity to my house requires cables, effects on the landscape, metering and centralized capacity management, we have to have those kinds of capabilities in our IaaS procurement world. No argument there, but at the end of the day it is how the capability is delivered and paid for, not how it operates, that really matters.

Monday, March 11, 2013

Rate of Change

I have been trying to help operational IT groups understand how important the rate of change of resource consumption is. Finally I came up with an analogy that helps. In the airline industry, it isn't necessarily a bad thing if an aircraft is on the ground. It may not be returning anything to the business, but it isn't necessarily awful. However the RATE at which it got to the ground is very important. Impact at 300 kts is unlikely to be what anyone had in mind. Graceful "impact" at 150 kts may be perfectly OK.
So while the lower rate of change is no guarantee that things are good, the higher rate of change means that immediate action will need to be taken.
Likewise in systems, if a disk gets to use 80% of its capacity gradually, there is probably no need to panic. A careful plan will allow operations to ensure that disaster doesn't strike. If it spikes suddenly then there is a definite need to do something quickly before the system locks up or starts losing transactions.
Knowing that there will be an issue is even more important than recovering from an issue that has already happened.
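Here is a rough sketch of what that means operationally (the thresholds are invented for illustration): alert on the slope of consumption, not just its level.

```python
def disk_check(samples, level_threshold=0.8, rate_threshold=0.05):
    """samples: (hour, fraction_of_capacity_used) readings, oldest first.
    React to the rate of change, not just the level."""
    (t0, u0), (t1, u1) = samples[-2], samples[-1]
    rate = (u1 - u0) / (t1 - t0)          # fraction of capacity per hour
    if rate > rate_threshold:
        return "ACT NOW: usage climbing at {:.0%}/hour".format(rate)
    if u1 > level_threshold:
        return "PLAN: usage at {:.0%} and rising gradually".format(u1)
    return "OK"

print(disk_check([(0, 0.55), (1, 0.78)]))   # sudden spike -> immediate action
print(disk_check([(0, 0.79), (1, 0.81)]))   # gradual creep -> careful plan
```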

Tuesday, February 12, 2013

Operational Data Stores and canonical models

One way of thinking about an operational data store is as a near real time repository of transactional data, organized to support an essentially query-oriented workload against transactions within their operational life. This is particularly handy when you need a perspective on "objects" that have a longer life than their individual transactions might allow. Examples might include large-scale supply chain apps and airline reservations - business objects that may well have transactions against them stretching over time.
In both cases, the main "object" (large, business grained) has a life span that could be long - surviving system resets, versions of the underlying systems, etc.
Considering the case of an airline reservation, it can have a lifespan of a couple of years - the reservation can be "opened" 330 days prior to the last flight, and (especially in the case of refund processing) it might last up to a year or so beyond that. Give or take.
The pure transactional systems (reservations, check in, etc.) are most concerned (in normal operations) with the current transactional view. However there are several processes that care about the historical view while the reservation is still active. There are other processes that deal with and care about history of completed flights, going back years. Taxes, lawsuits, and other requests that can be satisfied from a relatively predictable data warehouse view.
It's the near term stuff that is tricky. We want fast access to the data, the requests might be a bit unpredictable, the transactional systems may have a stream of XML data available when changes happen, ...
So how do we manage that near real time, or close-in, store? How do we manage some kind of standard model without going through a massive data modeling exercise? How do we get value with limited cost? How do we deal with unknown specific requirements (while recognizing the need for some overarching patterns)?
Several technologies are springing up in the NoSQL world (MongoDB, hybrid SQL/XML engines, Couchbase, Cassandra, DataStax, Hadoop/Cloudera) which might fit the bill. But are these really ready for prime time and sufficiently performant?
We are also not dealing with very big data in these cases, though the data might become big as we scale out. It is kind of intermediate sized data. For example, in a reservation system for an airline serving 50 million passengers/year (a medium sized airline), the data size of such a store is only of the order of 5TB. It is not like the system is "creating" tens of MB/second as one might see in the log files of a large ecommerce site.
If we intend to use the original XML structures as the "canonical" structure - i.e. the structure that will be passed around and shared by consuming apps, then we need a series of transforms to be able to present the data in ways that are convenient for consuming applications.
However, arbitrary search isn't very convenient or efficient against complex XML structures. Relational databases (especially at this fairly modest scale) are very good at searching, but rather slow at joining things up from multiple tables. So we have a bit of a conundrum.
One way might be to use the RDB capabilities to construct the search capabilities that we need, and then retrieve the raw XML for those XML documents that match. In other words, a hybrid approach. That way we don't have to worry too much about searching the XML itself. We do have to worry, however, about ensuring that the right transforms are applied to the XML so we can reshape the returned data, while still knowing that it was derived from the standard model. Enter XSLT. We can thus envisage a multi-part environment in which the search is performed using the relational engine's search criteria, but the real data storage and returned set come from the XML. The service request would therefore (somehow!) specify the search, and then the shaping transform required as a second parameter.
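Here is a minimal sketch of that hybrid pattern, using SQLite and lxml purely as stand-ins (the table, column and transform names are invented):

```python
import sqlite3
from lxml import etree   # stand-in XSLT processor

def find_reservations(conn: sqlite3.Connection, last_name: str, shaping_xslt_path: str):
    """Search via the relational index; return the matching canonical XML documents
    reshaped by the caller-supplied transform (the 'second parameter')."""
    transform = etree.XSLT(etree.parse(shaping_xslt_path))
    rows = conn.execute(
        "SELECT raw_xml FROM reservation_index WHERE passenger_last_name = ?",
        (last_name,),
    )
    return [transform(etree.fromstring(raw)) for (raw,) in rows]

# The index table holds only the searchable attributes plus the raw canonical XML;
# consumers never see the relational shape, only the transformed XML.
```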
It is a bit of a kluge pattern, perhaps, but it achieves some enterprise level objectives:
  • Use existing technologies where possible. Don't go too far out on a limb with all the learning curve and operational complexity of introducing radical technology into a mature organization
  • Don't bump into weird upper bound limits (like the 16MB limit in MongoDB)
  • Don't spend too much time in a death by modeling exercise
  • Most access to the underlying data comes through service calls, so data abstraction is minimized
  • Use technology standards where possible.
  • Rebuild indexes, etc. from original data when search schema extensions are needed
  • Possibly compress the raw XML since it is only required at the last stage of the processing pipe
It also has some significant disadvantages:
  • Likely to chew up considerable cycles when executing
  • Some management complexity
  • Possible usage anarchy - teams expressing queries that overconsume resources
  • Hard to predict resource consumption
  • Maybe some of the data don't render cleanly this way
  • Must have pretty well defined XML structures
So this pattern gives us pause for thought. Do we need to go down the fancy new technology path to achieve some of our data goals? Perhaps not for this kind of data case. Of course there are many other data cases where it would be desirable to have specially optimized technology. This doesn't happen to be one of them.

Friday, November 23, 2012

Cash to Delivery

This post looks at a specific high level process drawn from the "eating out" industry. Its origins are from that fine barbecue establishment in Dallas - the original Sonny Bryan's. The time: 1985. The conversation took place in the shack, which was as crowded as ever at lunch time. The method was: place order, pay for order, hang around and wait for order, pick up order, attempt to squeeze oversized bottom into school chairs, devour product. While casually waiting for the order to be prepared, I idly asked my colleague, "How do they match the orders with the people?". It seemed as if the orders always came out in sequence, so, being a systems person, I got to wondering about the nature of the process.

I had clearly paid in a single "transaction". Sonny Bryan's had my money. In return I had a token (numbered receipt) that stated my order content as evidence of what I had paid for. However that transaction was not synchronous with the delivery of the food, nor was the line held up while the food was delivered. Had it been, the place would have emptied rapidly because the allotted time for lunch would have expired.

I, as the customer, think that the transaction is done when I have wiped my mouth for the last time, vacated my seat and thrown away the disposable plates, etc. But the process doesn't work like that.
There are intermediate "transactions". The "I paid and got a receipt" transaction (claim check, perhaps?). The "I claimed my order" transaction. The "I hung around looking for somewhere to sit" transaction. The "I threw away the disposables" transaction.

Each of these transactions can fail, of course. I can place my order and then discover I can't pay for it. No big deal (from a system perspective, but quite embarrassing from a personal perspective). I could be waiting for my order, and get called away, so my food is left languishing on the counter. Sonny Bryan's could have made my order up incorrectly. I could pick up the wrong order. I could have picked up the order and discovered no place to sit. Finally, I could look for a trash bin, and discover that there isn't one available (full or non-existent).

I definitely want to view these as related transactions, not one single overarching transaction (in the system's sense). In reality what I have is a series of largely synchronous activities, buffered by asynchronous behavior between them.

Designing complete systems with a mixture of synchronous and asynchronous activities is a very tricky business indeed. It isn't the "happy path" that is hard; it is the effect of failure at various stages in an asynchronous world that makes it so tough.
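One way to sketch that series of transactions is as an explicit list of steps, each with its own compensation when something goes wrong after it (the steps and compensations here are my invention, not Sonny Bryan's):

```python
# Each step is synchronous on its own; the gaps between steps are asynchronous,
# so each step needs its own failure handling rather than one big rollback.
ORDER_STEPS = [
    ("order_placed",    "apologize and cancel the order"),       # couldn't pay
    ("order_paid",      "refund against the numbered receipt"),  # kitchen failure
    ("order_ready",     "re-call the customer or remake it"),    # abandoned / wrong order
    ("order_picked_up", "comp the meal if it was the wrong one"),
]

def compensate(failed_after: str) -> str:
    """Return the compensation for a failure occurring after the given step."""
    for step, compensation in ORDER_STEPS:
        if step == failed_after:
            return compensation
    raise ValueError(f"unknown step: {failed_after}")

print(compensate("order_paid"))   # -> refund against the numbered receipt
```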

Wednesday, November 21, 2012

A data hoarder's delight

With big data, I sometimes feel like the character Davies in the masterful spy novel, "The Riddle of the Sands" by Erskine Childers: "We laughed uncomfortably, and Davies compassed a wonderful German phrase to the effect that 'it might come in useful'".

Some of the "big data" approaches that I have seen are a bit like that. We keep stuff because we can, and because it "might come in useful." For sure there are some very potent use cases. Forbes, in this piece, describes how knowledge about the customer can drive predictive analytics. And valuable it is.

However the other compelling use-cases are a bit harder to find. We can certainly do useful analysis of log files, click through rates, etc., depending on what has been shown to a customer or "tire kicker." But beyond that the cases are harder to come by. That is to some extent why much of the focus has been on the technology and technology vendor side. 

There is a pretty significant dilemma here, though. If we wait to capture the data until we know what we need, then we will have to wait until we have sufficient data. If we capture all the data we can as it is flying through our systems, and we don't yet know how we might use it, we need to make sure it is kept in some original form. Apply schema at the time of use. That makes us quake in our collective boots if we are database designers, DBAs, etc. We can't ensure the integrity. Things change. Where are the constraints?... In my house that is akin to me throwing things I don't know what to do with (pieces of mail, camping gear, old garden tools, empty paint cans, bicycle pumps,... you get the idea) onto a pile. So when Madame asks for something - say a suitable weight for holding the grill cover down - I can say, "Aha, I have just the right thing. Here, use this old, half-full can of paint." A properly ordered household might have had the paint arranged in a tidy grouping. But actually that primary classification inhibits out of the box use.
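A tiny sketch of "apply schema at the time of use" (the record shapes are invented): keep the raw records exactly as they arrived, and impose a view only when a question is asked.

```python
import json

# Keep whatever flew through the system, untouched: one raw JSON document per line.
raw_pile = [
    '{"type": "click", "page": "/shoes", "user": "a1", "ts": 1351000000}',
    '{"type": "call", "to": "+44 20 7946 0000", "user": "a1"}',
    '{"type": "click", "page": "/trousers", "user": "b2", "ts": 1351000100}',
]

def ask(pile, predicate):
    """Impose a 'schema' only at read time: parse, filter and shape on demand."""
    for line in pile:
        record = json.loads(line)
        if predicate(record):
            yield record

# A question nobody planned for when the data was thrown on the pile:
uk_calls = [r["to"] for r in ask(raw_pile, lambda r: r.get("to", "").startswith("+44"))]
print(uk_calls)
```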

Similarly with data. I wonder what the telephone country code for the UK is... Oh yes, my sister lives there; I can look up her number and find it out. Not exactly why I threw her number onto the data pile, but handy nonetheless.

So with the drivers of cheap storage and cheap processing, we can suddenly start to manage big piles of data instead of managing things all neatly.

This thinking model started a while back with the advent of tagging models for email. Outlook vs Gmail as thinking models. If I have organized my emails in the Outlook folders way, then I have to know roughly where to look for the mail - all very well if I am accessing by some primary classification path, but not so handy when asked to provide all the documents containing a word for a legal case... It turns out - at least for me - that I prefer a tag based model, a flat space like Gmail where I use search as my primary approach, as opposed to a categorization model where I go to an organized set of things.

There isn't much "big" in these data examples. It is really about the new ways we have to organize and manage the data we collect - and accepting that we can collect more of it. Possibly even collecting every state of every data object, every transaction that changed that state, etc. Oh, and perhaps the update-dominant model of data management that we see today will be replaced with something less destructive.

Thursday, October 25, 2012

Data ambiguity and tolerance of errors

In the current election "season" in the USA there has been much ado about ensuring that only registered voters are allowed to vote. The Republican Party describes this as ensuring that any attempts at fraud are squelched. The Democratic Party describes this as being an attempt to reduce the likelihood that certain groups (largely Democrat voting) will vote. I certainly don't know which view is correct, and that is not the purpose of this post, but it does inform the thinking.

The "perfect" electoral system would ensure that everyone who has the right to vote can indeed do so, does so only once, and that no one who is not entitled to vote does so. Simple, eh? Not so much! Let me itemize some of the complexities that lead to data ambiguity.
  • Registration to vote has to be completed ahead of time (in many places).
  • The placement of a candidate on the ballot has to be done ahead of time, but write-in candidates are permissible under some circumstances.
  • Voters may vote ahead of time.
  • Voters vote in the precinct to which they are assigned (at least in some places)
  • Voters may mail in their votes (absentee ballots)
Again, these don't appear insurmountable, except that the time element causes some issues. Here are some to think about:
  • What if a person votes ahead of time, and then becomes "ineligible" prior to voting day? Possible causes include death, conviction of a felony, or being certified insane.
  • What if a person moves after registration, but before they vote?
  • What if a candidate becomes unfit after the ballots are printed and before early voting (death, conviction of a felony, determination of status - e.g. not a natural born citizen)?
  • What if a candidate becomes unfit after early votes for that candidate have been cast?
  • .....
These are obviously just a few of the issues that might arise, but enough to give pause in thinking about the process. If we really want 100% accuracy we have a significant problem, because we can't undo the history. Now if a voter has become ineligible after casting the vote (by early voting or absentee ballot, or before the closure of the polls if voting on election day), then how could the system determine that? It would be possible to cross-reference people who have voted with the death rolls (except of course if someone voted early so they could take their trip to see Angel Falls, where they were killed by local tribespeople and no one knew until after the election).

On a more serious note, voting systems deliver inherently ambiguous results. Fortunately that ambiguity is tiny, but in ever closer elections it gives those of us who think about systems some things that are very hard to think about. That is, "How do we ensure the integrity of the total process?" and "How good is good enough?"

Actually that thinking should always apply. While we focus on the happy paths (the majority case), we should always be thinking about what the tolerance for error should be. It is, of course, political suicide to say that there is error in the voting system, but rest assured - even without malice, there is plenty of opportunity for errors to creep in.

Tuesday, September 4, 2012

Intension vs Extension

Sometimes I feel really split brained! On the one hand I am thinking about the importance of controlling data, data quality, data schema, etc. On the other hand, I realize I can't! So the DBA in me would like the data to be all orderly and controlled - an intensional view of the data: what the model looks like, as defined by the kinds of things.
But then I look outside the confines of a system and realize that, at least this human, tends to work extensionally. I look at the pile of data and create some kind of reality around it - probably making many leaps of faith, many erroneous deductions, drawing erroneous conclusions, positing theories and adding to my own knowledge base.
So a simple fact (you are unable to meet me for a meeting) + the increase in your LinkedIn activity + a TripIt notification that you have flown to SJC will at least give me pause for thought. Perhaps you are job hunting! I don't know, but I might posit that thought in my head and then look for things to confirm or deny it (including phoning you to ask). How do I put that into a schema? How do I decide that it is relevant?

I don't. In fact I may never have had the explicit job-hunt "object" or at least never had explicit properties for it, but somehow this coming together of data has led me to think about it.

The point here is, of course, that if we attempt to model everything about our data intensionally we are doomed. We will be modeling for ever. If we don't model the right things intensionally, we are equally doomed.

This is the fundamental dichotomy pervading the SQL/NoSQL movement today. We want to have the control that intensional approaches give us so that we can be accurate and consistent - especially with our transactional data - but we also want the ability to make new discoveries based on the data that we find.
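Roughly, and with invented names, the split looks like this: the transactional core stays intensional (a declared shape), while the discovery side is extensional (a pile of observed facts, queried after the fact).

```python
from dataclasses import dataclass

@dataclass
class Meeting:            # intensional: the kind of thing is declared up front
    who: str
    when: str
    where: str

booked = Meeting("alex", "Wed 11:00", "coffee shop")

# Extensional: a pile of observed facts; no job-hunt "object" was ever modeled.
facts = [
    {"kind": "declined_meeting", "who": "alex"},
    {"kind": "linkedin_activity", "who": "alex", "delta": "+40%"},
    {"kind": "tripit", "who": "alex", "airport": "SJC"},
]

def maybe_job_hunting(person: str, pile) -> bool:
    """A conclusion drawn from the extension of the data, not from any schema."""
    kinds = {f["kind"] for f in pile if f["who"] == person}
    return {"declined_meeting", "linkedin_activity", "tripit"} <= kinds

print(maybe_job_hunting("alex", facts))   # True - pause for thought, not proof
```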

We can't just have a common set of semantics and have everyone expect to agree. In Women, Fire and Dangerous Things, George Lakoff describes some categories that are universal across the human race. Those are to some extent intensional. Then there are all the others that we make up and define newly, refine membership rules, etc. and those are largely extensional.

Friday, June 8, 2012

In stream and out of band

Big data seems to be popping up everywhere. The focus seems to be on the data and the engines and all the shiny toys for doing the analysis. However the tricky part is often getting hold of the slippery stuff in the first place.
In the cryptography world, one of the most useful clues that something big is about to "go down" is traffic analysis. Spikes in traffic activity provide signals to the monitoring systems that further analysis is required. There is useful information in changes in rate of signals over and above the information that may be contained in the message itself.
Deducing information just from the traffic analysis is an imprecise art, but knowing about changes in volume and frequency can help analysts decide whether they should attempt to decrypt the actual messages.
In our systems, this kind of Signal Intelligence is itself useful too. We see it in A/B testing. We see it in prediction about volume for capacity planning. In other words we are losing a valuable source of data about how the business and the technology environments are working if we ignore the traffic data.
Much of "big data" is predicated on getting hands (well machines) on this rich vein of data and performing some detailed analysis.
However there are some challenges:
  • Getting access to it
  • Analyzing it quickly enough, but without impacting its primary purpose.
  • Making sense of it - often looking for quite weak signals
That's where the notion of in-stream and out of band comes from. You want to grab the information as it is flying by (on what? you may ask), and yet not disturb its throughput rate or at least not much. The analysis might be quite detailed and time consuming. But the transaction must be allowed to continue normally.
In SOA environments (especially those where web services are used), all of the necessary information is in the message body so intercepts are straightforward. 
Where there is file transfer (eg using S/FTP) the situation is trickier because there are often no good intercept points.
Continuing the cryptography example, traffic intercepts allow for the capturing of the messages. These messages flow through apparently undisturbed. But having been captured, the frequency/volume is immediately apparent. However the analysis of content may take some while. The frequency/volume data are "in stream" the actual analysis is "out of band".
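A minimal sketch of that split (the queue and handler names are invented): the in-stream part records the traffic facts cheaply and passes the message on; the content analysis happens out of band.

```python
import queue
import threading
import time

analysis_backlog = queue.Queue()             # out-of-band work waits here

def intercept(message: bytes) -> bytes:
    """In stream: note the traffic fact cheaply, hand off the body, pass the message on."""
    analysis_backlog.put((time.time(), message))
    return message                           # the transaction continues normally

def analyst_worker() -> None:
    """Out of band: slow, detailed content analysis that never delays the stream."""
    while True:
        ts, message = analysis_backlog.get()
        # ... expensive decryption / content analysis would go here ...
        analysis_backlog.task_done()

threading.Thread(target=analyst_worker, daemon=True).start()
print(intercept(b"<booking>...</booking>"))
```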

Thursday, June 7, 2012

CAP Theorem, partitions, ambiguity, data trust


This posting was written in response to Eric Brewer's excellent piece entitled

CAP Twelve Years Later: How the "Rules" Have Changed

I have copied the statement of the theorem here to provide some context:

The CAP theorem states that any networked shared-data system can have at most two of three desirable properties:
  • consistency (C) equivalent to having a single up-to-date copy of the data;
  • high availability (A) of that data (for updates); and
  • tolerance to network partitions (P).
The original article is an excellent read. Eric makes his points with crystal clarity.

Eric,
I have found the CAP theorem and this piece to be very helpful when thinking about tradeoffs in database design - especially of course in distributed systems. It is rather unsettling to trade consistency for anything, but we have of course been doing that for years.

I am interested in your thinking about the topic more broadly - where we don't have partitions that are essentially of the same schema, but cases where we have the "same data" yet, because of a variety of constraints, we don't necessarily see the same value for it at a moment in time.
An example here. One that we see every day and are quite happy with. That of managing meetings.
Imagine that you and I are trying to meet. We send each other asynchronous messages suggesting times - with neither of us having insight into the other's calendar. Eventually we agree to meet next Wednesday at 11am at a coffee shop. Now there is a shared datum - the meeting. However there are (at least) two partitions of that datum: mine and yours. I can tell my system to cancel the meeting, so my knowledge of the state is "canceled", but you don't know that yet. So we definitely don't have atomicity in this case. We also don't have consistency at any arbitrary point in time. If I am ill-mannered enough not to tell you that I don't intend to show, the eventually consistent state is that the meeting never took place - even if you went at the appointed hour.

I would argue that almost all the data we deal with is in some sense ambiguous. There is some probability function (usually implicit) that informs one partition about the reliability of the datum. So if, for example, I have a reputation for standing you up, you might attach a low likelihood of accuracy to the meeting datum. That low probability would then offer you the opportunity to check the state of the datum more frequently. So perhaps there is a trust continuum in the data, from a high likelihood of it being wrong to a high likelihood of it being right. As we look at shades of probability we can make appropriate risk management decisions.
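A sketch of that trust continuum (the numbers are invented): each shared datum carries a probability of still being right, and that probability drives how often the other party re-checks it.

```python
def recheck_interval_hours(trust: float, base_interval: float = 24.0) -> float:
    """trust: 0.0 (almost certainly stale or wrong) .. 1.0 (almost certainly right).
    Lower trust -> re-check the shared datum more often."""
    return max(0.25, base_interval * trust)

meeting = {"when": "Wed 11:00", "with": "serial no-show", "trust": 0.3}
print(f"confirm again in {recheck_interval_hours(meeting['trust']):.1f} hours")
```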

I realize of course that this is broader than the area that you were exploring initially with CAP, but as we see more on the fly analytics, decision making, etc. we will discover the need for some semantics around data synchronization risk. It's not that these issues are new - they assuredly are not. But we have often treated them implicitly, building rules of thumb into our systems, but that approach doesn't really scale.

I would be interested to hear your thoughts.
PS I have cross posted this against the original article as well.

Tuesday, May 15, 2012

The importance of context

I am about to display my programming roots. History alert.
In a far off kingdom computers were made by an all powerful company - called IBM. IBM had the most magnificent Operating System, inventively called "OS". This operating system came in a number of dialects (OS/MFT, OS/MVT - eventually morphing to SVS and MVS before becoming Z/OS). The people marveled. What wondrous naming! But I digress.
To get work done on these behemoths - especially batch work, a special dialect, conjured from an unfettered imagination, was created. This dialect - whose name is uttered in hushed tones was "JCL" or "Job Control Language".
JCL provided the context under which jobs were scheduled, programs executed, and files created or disposed of (disposition processing). The JCL sorcerers were much in demand in the early devops days.
IBM provided a series of utilities for doing useful tasks to files, jobs, etc. But the most cunning, the most fiendish of all was the well named IEFBR14. Before describing its inner workings in gory detail, we need to step back and look at the JCL some more.
When a program executes in an "OS" environment, it can indicate to the environment that it has been successful or has failed. This is done using a "Return Code". Nothing strange there - at least not on the surface. However the return code value can be used to control what happens next. For example, if a program is supposed to create a file but somehow aborts, one can, through the magic of JCL, say that the system is to delete the file. If the program is successful, one could tell the system that the file is to be kept, etc.
Genius.
So where was this return code kept? In a general purpose "register" called register 15 (R15 for short). Why there? Because R15 had a use at the beginning of the program and not much thereafter. When a program executes, R15 contains the memory address of the entry point of the program (well, almost, but that's close enough for government work). So the one value one would not expect in R15 was 0. It was thus important to explicitly set R15 to the proper value before the program terminated. Otherwise the return code would be the starting address of the program. Awkward.
Now let's look at the program IEFBR14. Its genius was that it did absolutely nothing. It started and immediately exited. It used to consist of a single machine instruction: the instruction (BR 14) that causes the program to terminate (actually, branch to the address held in register 14, which by convention at the end of the program is back to the OS). When the program terminates, disposition - as controlled by JCL - takes over. Since the return code value was random and arbitrary (except that its value is always nonzero and evenly divisible by 4), no execution of IEFBR14 ever completed cleanly. Thus messing up disposition processing.
To end a long story, the size of the program IEFBR14 was doubled - from one instruction to two. First, R15 was cleared to zero, so at least its value is now predictable. Then the BR 14 instruction executes. Victory!
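The same lesson, sketched well away from the mainframe (this is a Python analogy, not the real two-instruction IEFBR14): downstream disposition keys off the step's return code, so even a "do nothing" step must set its exit status deliberately.

```python
import os
import subprocess
import sys

def run_step_and_dispose(step_cmd, workfile):
    """Run a job step; keep the file on RC=0, delete it otherwise (JCL-style disposition)."""
    rc = subprocess.run(step_cmd).returncode
    if rc == 0:
        print(f"RC=0: keeping {workfile}")
    else:
        print(f"RC={rc}: deleting {workfile}")
        if os.path.exists(workfile):
            os.remove(workfile)

# A "do nothing" step must still exit 0 explicitly - the moral of IEFBR14's second instruction.
run_step_and_dispose([sys.executable, "-c", "raise SystemExit(0)"], "scratch.dat")
```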
The important lesson, however, is that you cannot ignore the context in which your systems (in this case a simple program) execute. The environmentals are key.

Law is the requirements specification for the system of being a resident

Our (US) government uses taxation and tax relief as a way of enacting all sorts of public policy legislation. For example, we have a way of paying medical bills through the use of a tax relieved account. Sounds great, doesn't it? I put so much aside every month into an account (a Flexible Spending Account or some such TLA). Then when I have to pay for some procedure (like new glasses, eye exams, dental stuff, etc.) that is somehow not covered by my healthcare payment system (aka "Insurance", which it patently isn't), I use money from this tax relieved account. "Sounds great - I'll have some of that," think the politicians who prepared the bill and sold it to banks, lobbyists, lawyers and eventually the people.

However it doesn't quite work! On every occasion that I have used the account this year I get a letter from the account administrators that essentially says, "I don't believe that you have used this for a legitimate purpose, so please provide suitable documentation".
I bought eyeglasses and lenses; I had my teeth cleaned. The credit card receipts showed where I spent the money, but not on what. So I now have to go through and find the receipts with the actual things I paid for on them, and submit them to the processing company - who presumably have a bunch of employees doing low value work verifying that I haven't somehow spent the money on something not covered (toothpaste for $185 at the dentist? Glasses cleaner for $350 at the optometrist?).

When our lawmakers specify the "system", they don't seem to take the possibility of fraud into consideration. The initial assumption is all unicorns and rainbows. It is assumed that people won't cheat the system, that the happy path is the only path....

That thinking, however, fails to take into account the inventiveness of part of the population - the part that will attempt to use the system in a way it was not designed to be used, for personal gain. So the cycle seems to be:

  1. Create legislation that makes things look really rosy for the populace, vested interests, lawyers, etc.
  2. Roll out the "system" that embodies that legislation
  3. Be shocked that there is abuse
  4. Place layer upon layer of administrative/bureaucratic overhead to prevent the potential abuse
  5. Ignore fraudsters
  6. Proclaim that jobs have been created
  7. Rinse and repeat.
If we can't be sure that a relatively small system will behave properly - even with iterative development methods, what hope is there for the waterfall approach in the legislative process?

Thursday, May 10, 2012

A rant against 1:1

Every now and again, I get really annoyed with sites that assume you have only one of something. "Please enter your email address" is a common request - except that I have several, and would like the opportunity to use any of them as my login id. After all, they are each unique. Tripit.com does it right. Many other sites do it wrong. This posting from Robert Scoble illustrates the kind of muddy thinking: Apple making the assumption that there is one credit card.
Years ago, I used Plaxo. However the geniuses behind that didn't think I might have more than one email address, or more than one set of followers, so I would get suggestions from them to follow people I was already following.
In the world there are very few 1:1 correspondences that are timeless. So any time a system assumes that there is a pair of things that are in absolute 1:1 correspondence, I am mightily suspicious.
There are 2 interesting cases to ponder:
1:1 at a time and 1:1 over time.
1:1 at a time, I get. However there have to be rules/policies/processes or whatever to change to a new one. But even those are suspicious because we may have to account for the zero case. And it usually isn't bilaterally 1:1.
1:1 over time is much harder. If it isn't possible for two things to exist independently of each other (for each one there is always exactly one of the other), then we have to question why they are not combined. By the way there are often good technical reasons, but maybe not so many good business or policy reasons.
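Here is a sketch of what "not 1:1" looks like in a data model (the names are invented): many email addresses per person, any of which can be the login, with "primary" as a 1:1-at-a-time policy rather than a structural assumption.

```python
from dataclasses import dataclass, field

@dataclass
class EmailAddress:
    address: str
    is_primary: bool = False   # "1:1 at a time" is a policy flag, not the shape of the data

@dataclass
class Person:
    name: str
    emails: list = field(default_factory=list)   # 1:many, including the zero case

    def can_login_with(self, address: str) -> bool:
        return any(e.address == address for e in self.emails)

me = Person("CB", [EmailAddress("work@example.com", is_primary=True),
                   EmailAddress("home@example.com")])
print(me.can_login_with("home@example.com"))   # True - any of my unique addresses will do
```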
So a word to the wise: when someone tells you there is a 1:1 correspondence, they may be talking about a single world view, and you should at least explore the alternatives lest you be trapped in an expensive rethinking process.

Saturday, April 28, 2012

Event Distribution and Event Processing

I have recently been involved in several discussions (sales opportunities perhaps) where the answer seems to be, "We need a CEP engine". Of course if one chooses solutions based on products there's something wrong. And then, working with the sales force, I hear, "Customer X wants to buy our CEP engine; you know something about the industry, what use cases should we propose?" When I delicately suggest that nothing they have said so far establishes a CEP need, that the problem is bigger (based on industry knowledge), and that it will require more than the CEP engine, I get the message, "That will drive the price up too much, and anyway we have told them that the CEP engine is the way to go, so we can't change..." So why ask me?

But that isn't the whole point of this post.

There are things that CEP engines are *really* good at. However, distributing events isn't necessarily one of them. When it comes to interpreting events in relationship to each other in a tight time window - now we are talking. When it comes to creating events out of that interpretation, we again have good cases. But that isn't distribution either - that's just notification.

But the nagging question is there. "How does the CEP engine (or indeed any other kind of event processor) get to hear about the events it is monitoring?"

A way of looking at that is in terms of the Event Distribution Network. Now that is serious architecture and infrastructure. Not to speak of some mental gymnastics on behalf of both the business and technology communities.

Conceptually, events are easy things. "Something happened". Of greater trouble is making sure that the knowledge that "something happened" gets to the right place.

The right place might be a CEP engine - we want to see the implications of what happened with a whole bunch of other things that happened. Oh, and do it in Near Real Time (NRT) (Whatever that means!).

But another place might be at the next stage of a business process. "The customer paid their bill, let's ship the goods." In other words, the event as the trigger, with a process call-to-action. These aren't exclusive alternatives.
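A toy sketch of the distinction (all the names are invented): distribution is just getting "something happened" to every interested place; one of those places might be a CEP-style window, another a process trigger.

```python
from collections import defaultdict, deque

subscribers = defaultdict(list)            # a (very) poor man's event distribution network

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    for handler in subscribers[topic]:     # fan-out: many consumers of the same event
        handler(event)

recent_payments = deque(maxlen=100)        # a crude stand-in for a CEP time window
def watch_for_patterns(event):             # the "interpretation" consumer
    recent_payments.append(event)

def ship_the_goods(event):                 # the "next step of the process" consumer
    print("customer", event["customer"], "paid - ship the goods")

subscribe("bill.paid", watch_for_patterns)
subscribe("bill.paid", ship_the_goods)
publish("bill.paid", {"customer": "C123", "amount": 42})
```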

Of course there can be many things that need to know about the same event.
So just because you see a customer need that says "event" and you have a product that has the word "event" in its description, don't make the mistake of assuming that one matches the other.

It's as absurd as the trouble compilers have with English in examples like these: "Fruit flies like a banana." "Time flies like an arrow."

Thursday, December 15, 2011

Clouds and scalability

This post comes from an online exchange with Roger Sessions (@rsessions on twitter) Leo de Sousa (@leodesousa) and Chris Potts (@chrisdpotts).

Roger makes the point that the various cloud vendors make their case on "scalability" without defining the term sufficiently - as marketing (almost) always does. So he has a point. The question for me then is, "What scales?" It is my firm conviction that when using terms you intend to quantify, you had better get the dimensions correct. Is scalability a benefit? Of course that depends on what it means. It feels good; it hits us in the unthinking (or, as Daniel Kahneman calls it, "System 1") area. It's only when we look more deeply that we realize we have no idea what it means. Yes, I'll have 7kg of scalability please.

It all gets to the economics of what you think you want to do. Here are some examples:
  • I want to be able to increase the workload that my system is capable of without having to buy, provision, manage a bunch of servers - Scaling for workload
  • I want to be able to add lots of new users without having to.....- Scaling for users
  • I want my system to be available and priced according to the actual usage. Kinda like electricity. So when all my users are signing in, I want to allocate lots of capacity because that's intensive. But when it is running along smoothly I need less. Scaling for peak demand
  • I want to empower a demonstration team so they can bring up new instances of a standardized template and demonstrate something to a customer/prospect and then tear it down while incurring as little cost as possible. - Scaling for efficiency of people
  • I want to be able to add new functionality with less effort/cost than previously. Scaling for functionality
  • I want to reduce the burden on in house departments (finance, legal, HR or other "overhead" departments) in the deployment of equipment. - Scaling for organizational effectiveness
While I am about it, I wonder what the effective scaling order looks like. For example, maybe I want to scale linearly for workload. In other words as demand increases, supply increases at the same rate. No effective reduction in cost/transaction.

Or I might be prepared for slightly more - the ratio being that for each unit increase in demand, I get a 1.1 unit increase in cost of supply.

Or I might want to see a reduction - for each increase in demand, my cost per transaction goes down.
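As a back-of-the-envelope sketch (the numbers are purely illustrative): model total cost as demand raised to a "scaling order" and watch what happens to the cost per transaction.

```python
def cost_per_transaction(demand: float, unit_cost: float = 1.0, scaling_order: float = 1.0) -> float:
    """Total cost modelled as unit_cost * demand**scaling_order,
    so the per-transaction cost is unit_cost * demand**(scaling_order - 1)."""
    return unit_cost * demand ** (scaling_order - 1)

for order, label in [(1.0, "linear"), (1.1, "diseconomies"), (0.9, "economies")]:
    costs = [round(cost_per_transaction(d, scaling_order=order), 2) for d in (10, 100, 1000)]
    print(f"{label:13} {costs}")   # flat, rising, or falling cost per transaction
```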

So, as Roger observed, make the vendors of the cloud services be specific about what they are selling when they sell scalability.

Friday, September 9, 2011

We call that government

I was reading this post about QANTAS having to stop on the way from Dallas to Brisbane to refuel several times since starting the "nonstop" service. The service is "direct" from DFW to Sydney - which in the strained parlance of the travel industry doesn't mean nonstop; it just means that you have the same flight number all the way, regardless of the number of stops. But I digress...

The posting got me to thinking about the diminishing returns when you add overhead. To fly further, you have to add more fuel. But adding more fuel doesn't give you a linear increase: you have to add more fuel to compensate for the fuel you added to make you fly further... At the limit, all you do is fly a plane carrying just fuel and the necessary flight crew. Nothing useful comes of it - except the corner cases where you are testing limits on purpose. The focus is wrong. It isn't about getting the plane there (again, with some corner cases like getting the plane to a warzone); it's about getting the passengers and freight there.

The mission gets mangled if the focus is on the plane, not the passengers or freight.
Similarly in many corporations, if we consider the "running of the company" to be equivalent to the fuel, then as we add more "running the company" resources, so we get diminishing returns.

Eventually we have a company that is dedicated to just running itself, does nothing useful, but everyone is busy.

We call that government

Wednesday, August 17, 2011

Social Media and CRM

I am pretty late to the blogosphere about the differences between social media and CRM. But in customer meetings I see this kind of confusion all the time. So here goes. Oh, and since I mainly work in the travel industry, my examples come from there.

So first what are the goals of CRM? Knowing the customer. Being able to have enough insight into the customer to persuade the customer to buy more, become more intimate - generally increase their direct value. Maybe to right wrongs - by providing compensation if bad things happen. So CRM tends to foster a set of 1:1 exchanges. Necessary but not sufficient.

In contrast, social media is about somehow leveraging a network of relationships. It's one stage on from CRM (or maybe many stages beyond CRM!). So if I have a bad experience on an airline, I need the airline to do what it needs to do for compensation. But the social aspect doesn't stop there. I am heavily armed with well connected devices, a network of acquaintances and friends, and time - especially on the flight. So in some ways the Social Network space is a competition between the airline provider and the customer, each trying to get "the story" out to their social network orbits. The magic happens when the stories coincide - when the CRM aspect of looking after me is told both by the airline and by me. The double dip of great publicity.

But when things go badly for a customer (e.g. my luggage misconnected), then the bad story needs to be acknowledged through the social channels - posting on the passenger's FB wall, for example - with some compensation offered at the same time. Otherwise the annoyed passenger with lots of time will send out a stream of invective to any/all who may listen.

So when thinking about Social Media realize:
  • It's a conversation
  • It's about the whole orbit
  • Don't forget to do CRM blocking and tackling