Not PC: No, "Big Data" Can’t Predict the Future

Monday, 14 December 2015

We've been told that with enough data, we can use sophisticated computing methods to predict the future. That often works with the physical sciences, acknowledges Per Bylund in this guest post, but predicting human action is something else altogether...

With Google’s dominance in the online search engine market we entered the Age of Free. Indeed, services offered online are nowadays expected to be offered at no cost. Which, of course, does not mean that there is no cost to it, only that the consumer doesn’t pay it. Early attempts financed the services with ads, but we soon saw a move toward making the consumer the product. Today, free and un-free services alike compete for “users” and then make money off the data they collect.

Data has always been used, but what’s new for our time is the very low (or even zero) marginal cost for collecting and analysing huge amounts of data. The concept of “Big Data” is taking over and is predicted to be “the future” of business.

There’s a problem here, and it is the over-reliance on the Law of Large Numbers in social forecasting. Observed frequencies of events may mathematically converge to their expected values, but does that convergence apply in the real world? The answer is most definitely yes in the natural sciences. Repeated controlled experiments will weed out erroneous explanations or causes of phenomena, at least assuming we’re good enough at separating and controlling those causes.
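
To make the convergence claim concrete, here is a minimal Python sketch (not part of the original post). It simulates repeated trials and tracks the running frequency of an event. The function name `running_frequency`, the probabilities 0.3 and 0.7, and the trial count are all illustrative assumptions. With a fixed underlying probability, the frequency settles near that value. If the probability shifts halfway through, the long-run frequency settles on a blend that describes neither regime.

```python
import random


def running_frequency(probabilities, seed=0):
    """Return the running frequency of 'hits' over a sequence of trials.

    probabilities[i] is the (hypothetical) chance of a hit on trial i.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    hits = 0
    freqs = []
    for i, p in enumerate(probabilities, start=1):
        hits += rng.random() < p  # True counts as 1
        freqs.append(hits / i)
    return freqs


n = 100_000

# Case 1: a constant underlying probability, as in a controlled experiment.
fixed = running_frequency([0.3] * n)

# Case 2: the underlying probability changes halfway through (assumed values).
shifting = running_frequency([0.3] * (n // 2) + [0.7] * (n // 2))

print(f"fixed p=0.3, final frequency: {fixed[-1]:.3f}")
print(f"p shifts 0.3 -> 0.7, final frequency: {shifting[-1]:.3f}")
```

The first case is the situation the Law of Large Numbers describes. The second is only a toy stand-in for a world without constants: the arithmetic still converges, but the number it converges to says little about the next trial.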

What about the social sciences? In this age of scientism, as Hayek called it, we’re told “Big Data” will completely transform production, logistics, and sales. The reason for this is that vendors can better target customers and even foresee what they might want next. Amazon.com does this on its web site in crude form, where it makes suggestions based on your purchase history and what others with similar purchase histories have searched for. Sometimes it works, and sometimes it doesn’t.

There is some regularity to our interests and behaviour. All of us are, after all, human beings — and we’re formed in certain cultures. So one American with interests x, y, and z may have other interests similar to another American who also has an interest in x, y, and z.

Human Behaviour Is Unpredictable

But similarity is not the same thing as prediction. Amazon.com’s suggestions or the highly annoying ads following you around web sites are useful methods for sellers because they can somewhat accurately identify what not to offer. Excluding very low-probability interests increases the probability of suggesting something that the person behind the eyeballs focusing on the computer screen may be interested in.

To serve as prediction, however, exclusion of almost-zero probability events is far from sufficient. Indeed, prediction requires that we are able to accurately exclude all but one or a couple of highly probable outcomes. And we have to be able to rely on these predictions turning out to be true.
Otherwise we’re just playing games, and so we’re making guesses. Sure, they’re educated guesses (because we’ve excluded the impossible and almost-impossible), but they’re still games and guesses.

Where Big Data Fails

Speaking of guesses, Microsoft’s Bing search engine, which powers the Windows digital assistant Cortana among other things, has produced a prediction engine with the purpose of predicting sports and other results. They rely on very advanced algorithms and huge amounts of collected data.
Amazingly, they did very well initially and predicted the outcomes of the Soccer World Cup perfectly. So maybe we can use Big Data to get a glimpse of the future?

No, not so. The Bing teams are learning a lesson that only Austrian economists and, more specifically, Misesian praxeologists seem to grasp: that there are no constants in human action, and therefore that predictions of social phenomena are impossible. Pattern predictions, as Hayek called them, may not be impossible, but predictions of exact magnitudes are. For instance, we can rely on economic law (such as “demand curves slope downward”) to estimate an outcome such as “the price will be lower than it otherwise would have been,” but we can’t say exactly what that price will be.

When it comes to sports, reality shows and other competitions between individuals or teams, the story is exactly the same. The team with a better track record doesn’t always win. Why? They have objectively performed better than the other team, perhaps exclusively so, but this doesn’t say anything about the future. We’re not referring here to philosophical doubt of the “will the sun shine tomorrow?” kind (maybe something completely changes the sun’s ability to shine during the night).

The Social Sciences Are Different

In the social sciences we’re dealing with complex phenomena. Action and, especially, its outcome are the result of a complex system of social interaction, psychology, and much more. Are the players in both teams as motivated and focused as they were before? Did anything in their personal lives affect their mindsets or psyches? How do the players within their teams and players in other teams react to each other before and during the game? A team with a poor track record can upset a team with an objectively better track record; this happens all the time, sometimes for the sole reason that the better team underestimates the worse team, or because the underdog feels no pressure to perform and therefore plays less defensively.

Bing’s prediction engine struggles with this, just as we would predict. As Windows Central reported recently, the prediction engine had its “worst week yet” picking only four of fourteen winners in the NFL. Overall, its track record was approximately two-thirds right and one-third wrong (95–53). It’s definitely better than tossing a coin, but pretty far from actually predicting the results.
In other words, if you’re placing bets you may want to use the Bing prediction engine. That is, unless you have the type of tacit, implicit understanding of what’s going on that the engine is missing. Maybe you can beat it, or maybe not. In either case, you cannot count on coming out a victor each and every time.
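
As a rough check on the “better than tossing a coin” remark, the following Python sketch (an addition, not part of the original post) takes the 95–53 record quoted above and computes the hit rate, along with the chance that pure coin flipping would pick at least that many winners. The function names are hypothetical, and the coin-flip model assumes every game is an independent 50/50 pick, which real games are not.

```python
from math import comb


def hit_rate(right, wrong):
    """Fraction of picks that were correct."""
    return right / (right + wrong)


def coin_flip_tail(right, wrong):
    """Probability that fair, independent coin flips get at least
    `right` picks correct out of right + wrong games (exact binomial tail)."""
    n = right + wrong
    return sum(comb(n, k) for k in range(right, n + 1)) / 2 ** n


rate = hit_rate(95, 53)        # record quoted in the article
tail = coin_flip_tail(95, 53)  # how often a coin would do at least as well

print(f"hit rate: {rate:.3f}")
print(f"chance a coin flipper matches or beats it: {tail:.5f}")
```

Under these assumptions the record is very unlikely to come from coin flipping, yet roughly one pick in three still fails, which is the distinction the article draws between an educated guess and a prediction.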

The reason for this is that the outcome simply cannot be predicted perfectly — or even close to it. Even the players themselves cannot predict who’ll win a game, but they may have inside information about whether their own team seems motivated and focused. It is not a perfect method, however, and it certainly cannot be scientific.

Even with Big Data there’s no predicting of social events — there’s only guessing.

Yes, guessing with access to huge amounts of data is easier, at least if the data is reliable and relevant. But a good guess is not the same thing as a prediction; it is still a guess, and it can be wrong.

Winning every time requires luck.

Per Bylund is Assistant Professor of Entrepreneurship and Records-Johnston Professor of Free Enterprise in the School of Entrepreneurship at Oklahoma State University.
Visit his website at PerBylund.com.
This post first appeared at the Mises Daily.
Image source: iStockphoto

7 comments:

Anonymous said...: A minor point: services such as Google make money mostly because our privacy laws are archaic.

A modern update of traditional understanding of privacy could quickly put Google out of business.

Another point: as a wise man has said, "social science" is neither social nor science. By its very nature it cannot be science in the modern meaning of the term. Here we move beyond mere scientism and into pseudoscience. This tendency is further exacerbated by the term "data science", which from a scientific perspective is literally an unintelligible utterance. It is not science; moreover, there can hardly be a "science" about "data" per se.

In actuality, marketers have been using statistics for years; all things considered all "big data" does is broaden the sample set.

The whole craze shows a deep lack of seriousness and a profound misunderstanding of what science, or even mathematics, is about.; 14 Dec 2015, 17:25:00
Falafulu Fisi said...: I think that the Professor doesn't understand what prediction, estimation, and forecasting are. He needs to learn about those concepts first (in their mathematical context) before spewing out his word-smithing guesses.; 14 Dec 2015, 18:37:00
Anonymous said...: The Professor appears to know well what he is talking about, Fisi. It appears to have gone right over your head. Try re-reading it.; 15 Dec 2015, 00:43:00
Falafulu Fisi said...: The professor is ignorant about the analytics & big-data field & that's my point. How can someone rail against big data when in fact he has zilch knowledge of what analytics is all about?

Quote "It’s definitely better than tossing a coin, but pretty far from actually predicting the results."

The professor's quote above simply says it all. Does he understand what prediction means, or is he just babbling? One predicts an event 'A' with a probability (upper-bound/lower-bound). That's what prediction is in the field of analytics. It doesn't say that event 'A' will occur at date 'B' with a probability of 1. Prediction says that event 'A' will happen with a confidence of C percent. That's it. This is how the human mind works. The mind works in probabilistic reasoning, not certainty.; 15 Dec 2015, 07:17:00
Falafulu Fisi said...: The mind works as follows: it performs rough probability calculations without the person being aware of it. This is built into our mechanism of thinking. An Objectivist can go further & say, aha, this is one part of knowledge integration, with the mind computing probabilistic scenarios to make decisions without the person being aware of it. Objectivists know what knowledge integration is about. But when you probe further for a quantitative description, you can't get one; instead you're bombarded with, oh, it's how the mind combines separate nuggets of knowledge to form new facts or nuggets. Well, such an explanation is too vague to understand. But how? The answer is what I've just stated above. When the mind computes & weighs facts/scenarios, it is in fact doing knowledge integration. Such knowledge integration is often vague (fuzzy & imprecise) or uncertain (probabilistic). When a person is making a decision, prediction (either fuzzy or probabilistic) enters his/her thought. The mind does its job by weighing/computing its premises to come up with roughly accurate consequents or facts.

"Think Rationally via Bayes' Rule"
https://www.youtube.com/watch?v=NEqHML98RgU; 15 Dec 2015, 07:39:00
Falafulu Fisi said...: Quote from the Epistemology site:

Quote : "It is concerned with how our minds are related to reality, and whether these relationships are valid or invalid."

"What is Epistemology?"
http://www.importanceofphilosophy.com/Epistemology_Main.html

It is easy to understand the description above, even for five-year-olds. The question is how does the mind do it? How can the mind determine the validity of a hypothesis in a way that's consistent with the qualitative description above? I just explained it in my previous posts. The mind is a computation engine, whether or not the person is aware of mathematics; that's how the mind works. In fact all humans (their minds) are mathematicians in their thinking process, even without any knowledge of math at all.; 15 Dec 2015, 08:07:00
Ben said...: Of course he's ignorant about big data. Austrian Economics folk pride themselves on ignoring empiricism.; 15 Dec 2015, 11:21:00