Big Data: Important and Tricky

One area of so-called big data that is heating up this year is sales analytics. According to Ventana Research, more than 60 percent of the organizations it polled plan to invest in the technology this year. But the predictable forecasts that companies want may never materialize.

That’s because companies fundamentally misunderstand what mining sales data can provide. Historical data can only tell you what sold in the past, not what will sell in the future. Furthermore, forecasting often depends on the subjective judgment of salespeople, who tend to be hired for their optimism, and that optimism can skew projections unrealistically.

The more of these optimistic forecasts you combine, the more inaccuracy gets baked into the results. There are dangers in using big data, and they aren’t necessarily obvious from the way many traditional computer information systems degree programs approach the subject. Although large amounts of data can provide insights, they can also easily send you down the wrong path if you’re not careful. Here are some ways big data can mislead:

  • Data isn’t representative. Big data techniques depend heavily on statistics to find the underlying patterns that managers seek. That means you work under a set of mathematical assumptions. A key one is that results describe only the group the data actually represents. Draw conclusions outside of that group and they may not hold true. For example, you might examine data from a particular social network to better understand what your customers think about your brand. But if your customers don’t typically use that network, you may have learned nothing about them.
  • Know what’s in the driver’s seat. In statistics, there is a difference between causality and simultaneity, where two things merely happen at the same time. Consider a farcical case: a little old man walks to the beach every morning for a year, rain or shine, arriving right before dawn and staying until after the sun is up. His walk and the sunrise happen together, but neither causes the other. The man wakes and leaves his house before dawn, so the sun’s appearance doesn’t seem to be the direct cause of his routine, and the sun clearly doesn’t wait for his walk to the beach. In real data, the line between simultaneity and causality is rarely so clear-cut, especially to someone eager to connect two sets of data.
  • People make mistakes. The biggest source of error is human. People can look at the wrong data, perform calculations incorrectly, or misuse data to support conclusions they had already drawn. When they do, chances are good that some will blame the data or the tools for not delivering what they expected.
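To make the representativeness problem concrete, here is a small Python sketch. The customer records, the 1 to 5 sentiment scale, and the flag for whether each customer uses a given social network are all invented for illustration; they are not drawn from any real survey. The sketch compares average sentiment among customers active on the network with the average across all customers.

```python
from statistics import mean

# Hypothetical customer records (invented data, not from any real survey):
# (sentiment score on a 1-5 scale, active on the social network?)
customers = [
    (5, True), (5, True), (4, True),
    (2, False), (3, False), (2, False), (3, False),
    (2, False), (3, False), (1, False),
]

# Sentiment we would see by mining only the social network.
network_sample = [score for score, on_network in customers if on_network]

# Sentiment across the whole customer base.
everyone = [score for score, _ in customers]

print(f"Mean sentiment on the network:   {mean(network_sample):.2f}")  # 4.67
print(f"Mean sentiment of all customers: {mean(everyone):.2f}")        # 3.00
```

In this made-up example, the few customers active on the network happen to be the most enthusiastic, so the network data overstates overall satisfaction. Nothing in the network data alone would reveal that gap.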

Big data can provide useful views of what is happening in a business. It can also lead to disastrous decisions. The difference is all in how you use it.