Friday, June 8, 2012

In stream and out of band

Big data seems to be popping up everywhere. The focus seems to be on the data, the engines, and all the shiny toys for doing the analysis. However, the tricky part is often getting hold of the slippery stuff in the first place.
In the cryptography world, one of the most useful clues that something big is about to "go down" is traffic analysis. Spikes in traffic activity signal to the monitoring systems that further analysis is required. Changes in the rate of signals carry useful information over and above whatever may be contained in the messages themselves.
Deducing information just from the traffic analysis is an imprecise art, but knowing about changes in volume and frequency can help analysts decide whether they should attempt to decrypt the actual messages.
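To make the idea of a traffic spike concrete, here is a minimal sketch in Python. It assumes all we have is a list of message arrival times in seconds; the function names, the one-minute window, and the "three standard deviations above the recent average" rule are illustrative assumptions, not a recommended detector.

```python
from collections import Counter
from statistics import mean, pstdev


def count_per_window(timestamps, window_seconds=60):
    """Count messages in each fixed-size time window, oldest window first."""
    counts = Counter(int(ts // window_seconds) for ts in timestamps)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Include empty windows so that quiet periods show up as zero traffic.
    return [counts.get(w, 0) for w in range(first, last + 1)]


def find_spikes(counts, history=5, threshold=3.0):
    """Return indexes of windows whose count is far above the recent baseline."""
    spikes = []
    for i in range(history, len(counts)):
        baseline = counts[i - history:i]
        avg = mean(baseline)
        spread = pstdev(baseline)
        # Floor the spread at 1.0 (an arbitrary choice) so that a perfectly
        # flat baseline does not turn tiny changes into spikes.
        if counts[i] > avg + threshold * max(spread, 1.0):
            spikes.append(i)
    return spikes
```

Note that nothing here looks inside a message. The only input is when messages arrived, which is exactly what makes this kind of check cheap enough to run all the time.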
In our systems, this kind of signals intelligence is useful too. We see it in A/B testing. We see it in volume predictions for capacity planning. In other words, if we ignore the traffic data we lose a valuable source of data about how the business and technology environments are working.
Much of "big data" is predicated on getting hands (well, machines) on this rich vein of data and performing some detailed analysis.
However, there are some challenges:
  • Getting access to it
  • Analyzing it quickly enough, without impacting its primary purpose
  • Making sense of it - often looking for quite weak signals
That's where the notion of in-stream and out of band comes from. You want to grab the information as it is flying by (on what? you may ask), and yet not disturb its throughput rate, or at least not by much. The analysis might be quite detailed and time-consuming, but the transaction must be allowed to continue normally.
In SOA environments (especially those where web services are used), all of the necessary information is in the message body, so intercepts are straightforward.
Where there is file transfer (e.g. using S/FTP), the situation is trickier because there are often no good intercept points.
Continuing the cryptography example, traffic intercepts allow the messages to be captured. The messages flow through apparently undisturbed, but once they have been captured, the frequency/volume is immediately apparent. The analysis of content, however, may take some while. The frequency/volume data are "in stream"; the actual analysis is "out of band".
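
Here is a rough sketch of that split in Python, assuming messages arrive as bytes and that the slow content analysis is some function we supply. The `TrafficTap` class and all of its names are hypothetical, made up for illustration. The counting happens in stream, on the caller's thread; the analysis happens out of band, on a background thread fed by a bounded queue.

```python
import queue
import threading


class TrafficTap:
    """Counts messages in stream and hands copies to a slow analyzer out of band."""

    def __init__(self, analyze, max_pending=1000):
        self._analyze = analyze  # slow content analysis, supplied by the caller
        self._pending = queue.Queue(maxsize=max_pending)
        self._lock = threading.Lock()
        self.message_count = 0
        self.byte_count = 0
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def observe(self, message: bytes) -> bytes:
        """In stream: cheap bookkeeping only, then pass the message on unchanged."""
        with self._lock:
            self.message_count += 1
            self.byte_count += len(message)
        try:
            # Never block the transaction; if the analyzer is behind, drop the copy.
            self._pending.put_nowait(message)
        except queue.Full:
            with self._lock:
                self.dropped += 1
        return message

    def _run(self):
        """Out of band: work through captured messages at the analyzer's own pace."""
        while True:
            message = self._pending.get()
            try:
                self._analyze(message)
            except Exception:
                pass  # a failed analysis must never affect the live traffic
            finally:
                self._pending.task_done()

    def wait_for_analysis(self):
        """Block until every queued message has been analyzed (handy in tests)."""
        self._pending.join()
```

The design choice worth noticing is the bounded queue with `put_nowait`: when the analysis cannot keep up, the tap loses analysis samples rather than slowing the transaction. The volume counters stay accurate either way, because they are updated before the hand-off.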