Tuesday, August 27, 2013

Shearing Layers - Part 2, Stuff

In the previous post I introduced the notion of shearing layers - taken from Stewart Brand's book "How Buildings Learn". In this one I am going to look at how better data might affect the "Stuff" layer.
For example, shopping habits on the web site can show buying patterns and trends that could translate to the brick and mortar store. Looking at what people search for together online could give a clue to what they are looking for when they get to a store.
Note that the "could" in the previous observation is very much an imponderable. While shearing "Stuff" on the web site is really easy (facilitating A/B testing, for example), it is still tricky in the brick-and-mortar store.
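To make the data side a little more concrete, here is a minimal sketch (Python, with invented session data) of the kind of co-occurrence analysis the web site makes cheap: count which search terms show up together in the same online session, and treat the most frequent pairs as candidates for adjacency in the physical store.

  from collections import Counter
  from itertools import combinations

  # Hypothetical input: the set of terms each shopper searched for in one online session.
  sessions = [
      {"trousers", "belt", "shoes"},
      {"shoes", "socks"},
      {"trousers", "shoes"},
  ]

  pair_counts = Counter()
  for terms in sessions:
      # Count every unordered pair of terms that co-occur within a session.
      for pair in combinations(sorted(terms), 2):
          pair_counts[pair] += 1

  # The most frequent pairs suggest products worth grouping together in-store.
  for (a, b), n in pair_counts.most_common(5):
      print(f"{a} + {b}: seen together in {n} sessions")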
Reorganizing store shelves/layout runs the risk of confusing staff and customers. Things aren't where they were yesterday. Our ingrained habits and expectations no longer work for us. So the risk is definitely there, but there could be some interesting small experiments.
Perhaps it is worth grouping trousers by style/size and not by color. Perhaps it is worth grouping shoes by size, mixing up the brands. Of course that one is very tricky because we buy shoes with our eyes, so we may need to see a floor sample which will be of a single size.
The desires of the store, the desires of the brands and the desires of the customer may well come into opposition.
The online shopping experience can give us a rate of change greater than that in the physical store - delivering data to the store planners and merchandisers that can influence product placement, in pursuit of the ultimate goal of selling more "Stuff" to the customers.

Sunday, August 25, 2013

Shearing Layers - Part 1, Physical Buildings

In Stewart Brand's terrific work, "How Buildings Learn", there are some great analogies to what we do in Enterprise Architecture. He expanded on the concept of "shearing layers" introduced by Robert V. O'Neill in his "A Hierarchical Concept of Ecosystems". The primary notion is that we can understand our ecosystems better, hierarchically, by understanding the different rates of change possible at the different layers.
[Diagram: Brand's shearing layers of a building - Site, Structure, Skin, Services, Space Plan, Stuff]
The diagram above is reproduced from How Buildings Learn, and represents the parts of a building that change at different rates. It is arranged from the outside in, although in this representation there is no absolute correlation between a part's position and its rate of change.
Using Brand's own explanation, the layers have the following descriptions:

Site

The site is the geographical setting, the urban location and the legally defined lot whose boundaries and context outlast generations of ephemeral buildings.

Structure

The foundation and load bearing elements are perilous and expensive to change, so people don't. These are the building.

Skin

External surfaces can change more frequently than the structure of the building. Changes in fashion, energy cost, safety, etc. cause changes to be made to the skin. However, tradition and preservation orders often inhibit changes to the skin, since the skin carries much of the building's aesthetic.

Services

These are the working guts of the building: communications/wiring, plumbing, air handling, people moving. Ineffective services can cause buildings to be demolished early, even if the surrounding structure is still sound.

Space Plan

The space plan represents the interior layout - the placement of walls, ceilings, doors, etc. As buildings are re-purposed and fashions change, the interior can quickly be reconfigured.

Stuff

The appurtenances that make the space useful for its intended purpose: the placement of tables, chairs, cubicle walls, etc.
 
In further articles, I will develop this theme in two directions. First, thinking about how data can affect the way retail organizations think about their layout and organization (shearing at the Stuff/Space Plan layers of both brick-and-mortar and web stores). Second, looking at Enterprise Architecture through the lens of shearing layers, by analogy with Brand's writing and thinking.

Saturday, June 8, 2013

Trying to understand IaaS, and other nonsense

There's been something upsetting me about the whole notion of Infrastructure as a Service. It has taken me a while to put my finger on it, but here goes. But first, an analogy with electricity usage and provisioning.
When I flick on the light switch, I am consuming electricity. It doesn't matter to me at the moment of consumption where it is coming from as long as:
  1. It is there when I want it
  2. It comes on instantly
  3. It delivers enough of it to power the bulb
  4. It doesn't cause a breaker to trip
  5. It doesn't cause the bulb to explode
  6. ...
So that's the consumption model. That's independent of the provisioning model - at least as long as those requirements are met.
I could satisfy that need through several mechanisms:
  1. I could have it delivered to my house from a central distribution facility
  2. I could make it myself
  3. I could steal it from a neighbour
  4. ....
Regardless of which provisioning method I use, I am still consuming the electricity. The lightbulb doesn't care. However, the CFO of the household does care. Thinking about the service of electricity - it's about how I procure it and pay for it, not how I consume it. Sure, I can add elasticity of demand - it's summer, I am running the air conditioners throughout the house, both ovens are on, and every light too... But that is a requirement on how I procure the service, not on how the devices use it.

Similarly, in the software-defined infrastructure world, the application that is running doesn't really care how the infrastructure it is running on was provisioned. The "as a service" part of IaaS is about the procurement of the environment on which the application runs.
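To make that separation concrete, here is a minimal sketch (Python, with invented class and method names) in which the application consumes a server through one interface while the procurement detail - public cloud, in-house pool, whatever - lives entirely behind it.

  from abc import ABC, abstractmethod

  class Server:
      """What the application consumes; it has no idea how it was procured."""
      def __init__(self, address: str):
          self.address = address

      def run(self, command: str) -> None:
          print(f"running '{command}' on {self.address}")

  class ServerProvisioner(ABC):
      """How the capacity is procured and paid for - the 'as a service' part."""
      @abstractmethod
      def provision(self, cpus: int, ram_gb: int) -> Server: ...

  class PublicCloudProvisioner(ServerProvisioner):
      def provision(self, cpus: int, ram_gb: int) -> Server:
          # Metering, billing and calls to the provider's API would live here.
          return Server("10.0.0.42")

  class OnPremProvisioner(ServerProvisioner):
      def provision(self, cpus: int, ram_gb: int) -> Server:
          # Capacity planning against our own hypervisor pool would live here.
          return Server("192.168.1.7")

  def deploy_app(provisioner: ServerProvisioner) -> None:
      # The application code is identical regardless of how the server was procured.
      server = provisioner.provision(cpus=4, ram_gb=16)
      server.run("start-application")

  deploy_app(PublicCloudProvisioner())
  deploy_app(OnPremProvisioner())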

The procurement model can, of course, affect the physical environment of the equipment. Just as delivering electricity to my house requires cabling, effects on the landscape, metering and centralized capacity management, we have to have those kinds of capabilities in our IaaS procurement worlds. No argument there, but at the end of the day it is how the capability is delivered and paid for, not how it operates, that really matters.

Monday, March 11, 2013

Rate of Change

I have been trying to help operational IT groups understand how important the rate of change of resource consumption is. Finally I came up with an analogy that helps. In the airline industry, it isn't necessarily a bad thing if an aircraft is on the ground. It may not be returning anything to the business, but it isn't necessarily awful. However, the RATE at which it got to the ground is very important. Impact at 300 kts is unlikely to be what anyone had in mind. Graceful "impact" at 150 kts may be perfectly OK.
So while a lower rate of change is no guarantee that things are good, a higher rate of change means that immediate action will need to be taken.
Likewise in systems: if a disk fills to 80% of its capacity gradually, there is probably no need to panic - a careful plan will allow operations to ensure that disaster doesn't strike. If usage spikes suddenly, then there is a definite need to do something quickly before the system locks up or starts losing transactions.
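A minimal sketch of the idea (Python; the thresholds and sampling interval are made up for illustration) - it is the rate at which utilisation is climbing, more than the absolute level, that decides whether to act immediately:

  def assess_disk(samples, interval_hours=1.0,
                  capacity_alert=0.80, growth_alert=0.05):
      """samples: disk utilisation as fractions (0.0-1.0), oldest first,
      taken at a fixed interval. Thresholds are illustrative only."""
      level = samples[-1]
      growth_per_hour = (samples[-1] - samples[-2]) / interval_hours

      if growth_per_hour >= growth_alert:
          return "ACT NOW: utilisation climbing fast, whatever the current level"
      if level >= capacity_alert:
          return "PLAN: high but stable utilisation - schedule a fix calmly"
      return "OK"

  print(assess_disk([0.78, 0.79, 0.80]))   # gradual growth -> plan, don't panic
  print(assess_disk([0.50, 0.55, 0.72]))   # sudden spike -> act immediately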
Knowing that there will be an issue is even more important than recovering from an issue that has already happened.

Tuesday, February 12, 2013

Operational Data Stores and canonical models

One way of thinking about an operational data store is as a near-real-time repository of transactional data, organized to support an essentially query-oriented workload against transactions within their operational life. This is particularly handy when you need a perspective on "objects" that have a longer life than their individual transactions might allow. Examples might include supply chain apps at a large scale, or airline reservations - business objects that may well have transactions against them stretching over time.
In both cases, the main "object" (large, business grained) has a life span that could be long - surviving system resets, versions of the underlying systems, etc.
Consider the case of an airline reservation: it can have a lifespan of a couple of years - the reservation can be "opened" up to 330 days prior to the last flight, and (especially in the case of refund processing) it might last up to a year or so beyond that. At least, give or take.
The pure transactional systems (reservations, check-in, etc.) are most concerned (in normal operations) with the current transactional view. However, there are several processes that care about the historical view while the reservation is still active. There are other processes that deal with and care about the history of completed flights, going back years - taxes, lawsuits, and other requests that can be satisfied from a relatively predictable data warehouse view.
It's the near-term stuff that is tricky. We want fast access to the data, the requests might be a bit unpredictable, the transactional systems may have a stream of XML data available when changes happen, ...
So how do we manage that near-real-time or close-in store? How do we manage some kind of standard model without going through a massive data modeling exercise? How do we get value with limited cost? How do we deal with unknown specific requirements (recognizing the need for some overarching patterns)?
Several technologies are springing up in the NoSQL world (MongoDB, hybrid SQL/XML engines, Couchbase, Cassandra, DataStax, Hadoop/Cloudera) which might fit the bill. But are these really ready for prime time and sufficiently performant?
We are also not dealing with very big data in these cases, though the data might become big as we scale out. It is kind of intermediate-sized data. For example, in a reservation system for an airline serving 50 million passengers/year (a medium-sized airline), the data size of such a store is only of the order of 5TB. It is not as if the system is "creating" tens of MB/second as one might see in the log files of a large ecommerce site.
If we intend to use the original XML structures as the "canonical" structure - i.e. the structure that will be passed around and shared by consuming apps - then we need a series of transforms to present the data in ways that are convenient for those consuming applications.
However, arbitrary search isn't very convenient or efficient against complex XML structures. Relational databases (especially at this fairly modest scale) are very good at searching, but rather slow at joining things up from multiple tables. So we have a bit of a conundrum.
One way might be to use the RDB to provide the search capabilities that we need, and then retrieve the raw XML for those XML documents that match. In other words, a hybrid approach. That way we don't have to worry too much about searching the XML itself. We do have to worry, however, about ensuring that the right transforms are applied to the XML so we can reshape the returned data, while still knowing that it was derived from the standard model. Enter XSLT. We can thus envisage a multi-part environment in which the search is performed using the relational engine's search criteria, but the real data storage and the returned set come from the XML. The service request would therefore (somehow!) specify the search, and then the shaping transform required as a second parameter.
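A minimal sketch of that hybrid shape (Python with sqlite3 and lxml; the table, column and stylesheet names are invented for illustration): the relational engine answers the search, the raw XML documents come back for the matching rows, and an XSLT stylesheet supplied by the caller reshapes them.

  import sqlite3
  from lxml import etree

  # Hypothetical schema: a few indexed search columns plus the raw canonical XML.
  #   CREATE TABLE reservations (pnr TEXT PRIMARY KEY, last_name TEXT,
  #                              first_flight_date TEXT, raw_xml TEXT);

  def find_reservations(conn: sqlite3.Connection, last_name: str,
                        stylesheet_path: str) -> list[str]:
      """Search via the relational engine; return XML reshaped by the caller's XSLT."""
      transform = etree.XSLT(etree.parse(stylesheet_path))

      cur = conn.execute(
          "SELECT raw_xml FROM reservations WHERE last_name = ?",
          (last_name,))

      results = []
      for (raw_xml,) in cur:
          doc = etree.fromstring(raw_xml.encode("utf-8"))
          # Reshape the canonical XML into the form this consumer asked for.
          results.append(str(transform(doc)))
      return results

  # The service request carries both the search criteria and the shaping transform:
  #   matches = find_reservations(conn, "SMITH", "itinerary_summary.xsl")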
It is a bit of a kluge pattern, perhaps, but it achieves some enterprise-level objectives:
  • Use existing technologies where possible. Don't go too far out on a limb with all the learning curve and operational complexity of introducing radical technology into a mature organization
  • Don't bump into weird upper-bound limits (like the 16MB document size limit in MongoDB)
  • Don't spend too much time in a death by modeling exercise
  • Most access to the underlying data comes through service calls, so data abstraction is minimized
  • Use technology standards where possible.
  • Rebuild indexes, etc. from original data when search schema extensions are needed
  • Possibly compress the raw XML, since it is only required at the last stage of the processing pipeline
It also has some significant disadvantages:
  • Likely to chew up considerable cycles when executing
  • Some management complexity
  • Possible usage anarchy - teams expressing queries that overconsume resources
  • Hard to predict resource consumption
  • Maybe some of the data don't render cleanly this way
  • Must have pretty well defined XML structures
So this pattern gives us pause for thought. Do we need to go down the fancy new technology path to achieve some of our data goals? Perhaps not for this kind of data case. Of course there are many other data cases where it would be desirable to have specially optimized technology. This doesn't happen to be one of them.