The Philosophy of Data - NYTimes.com


If you asked me to describe the rising philosophy of the day, I’d say it is data-ism. We now have the ability to gather huge amounts of data. This ability seems to carry with it certain cultural assumptions: that everything that can be measured should be measured; that data is a transparent and reliable lens that allows us to filter out emotionalism and ideology; that data will help us do remarkable things, like foretell the future.

Over the next year, I’m hoping to get a better grip on some of the questions raised by the data revolution: In what situations should we rely on intuitive pattern recognition and in which situations should we ignore intuition and follow the data? What kinds of events are predictable using statistical analysis and what sorts of events are not?
I confess I enter this in a skeptical frame of mind, believing that we tend to get carried away in our desire to reduce everything to the quantifiable. But at the outset let me celebrate two things data does really well.
First, it’s really good at exposing when our intuitive view of reality is wrong. For example, every person who plays basketball and nearly every person who watches it believes that players go through hot streaks, when they are in the groove, and cold streaks, when they are just not feeling it.
But Thomas Gilovich, Amos Tversky and Robert Vallone found that a player who has made six consecutive foul shots has the same chance of making his seventh as he would have if he had missed the previous six.
When a player has hit six shots in a row, we imagine that he has tapped into some elevated performance groove. In fact, it’s just random statistical noise, like having a coin flip come up tails repeatedly. Each individual shot’s chance of success still reverts to the player’s career shooting percentage.
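The coin-flip comparison can be checked with a short simulation. The Python sketch below is a minimal illustration, assuming every shot is independent and succeeds with the same fixed probability (a hypothetical 70 percent shooter); the function name and parameters are illustrative and are not taken from the Gilovich, Tversky and Vallone study. It estimates how often the shooter makes the next shot after six straight makes and after six straight misses.

```python
import random


def conditional_make_rates(p_make, n_shots, streak_len, seed=0):
    """Simulate independent shots and estimate the chance of making the next shot
    after `streak_len` straight makes and after `streak_len` straight misses.

    Assumption: every shot succeeds with the same probability `p_make`,
    independent of earlier shots (so there is no hot hand by construction).
    """
    rng = random.Random(seed)
    run_makes = run_misses = 0   # length of the current make / miss streak
    after_makes = [0, 0]         # [makes, attempts] following a make streak
    after_misses = [0, 0]        # [makes, attempts] following a miss streak

    for _ in range(n_shots):
        made = rng.random() < p_make
        # Record this shot if the previous `streak_len` shots were all makes or all misses.
        if run_makes >= streak_len:
            after_makes[0] += int(made)
            after_makes[1] += 1
        if run_misses >= streak_len:
            after_misses[0] += int(made)
            after_misses[1] += 1
        # Update the current streak counters.
        if made:
            run_makes, run_misses = run_makes + 1, 0
        else:
            run_makes, run_misses = 0, run_misses + 1

    def rate(counts):
        return counts[0] / counts[1] if counts[1] else float("nan")

    return rate(after_makes), rate(after_misses)


if __name__ == "__main__":
    # Hypothetical 70 percent shooter, one million simulated foul shots.
    hot, cold = conditional_make_rates(p_make=0.7, n_shots=1_000_000, streak_len=6, seed=1)
    print(f"after six makes:  {hot:.3f}")
    print(f"after six misses: {cold:.3f}")
```

Because the simulated shots are independent by construction, both estimates land close to 0.70, even though the simulated sequence contains plenty of streaks. The streaks are the noise; they say nothing about the next shot.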
Similarly, nearly every person who runs for political office has an intuitive sense that they can powerfully influence their odds of winning the election if they can just raise and spend more money. But this, too, is largely wrong.
The data show that in state and national elections that are well-financed, television ad buys barely matter. After the 2004 election, political scientists tried to measure the effectiveness of campaign commercials. They found that if one candidate ran 1,000 more commercials than his opponent in a county — a huge disproportion — that translated into a paltry 0.19 percent advantage in the vote.
After the 2006 election, Sean Trende constructed a graph comparing incumbents’ campaign spending advantages with their eventual margins of victory. There was barely any relationship between more spending and a bigger victory.
In May and June of 2012, the Obama campaign unleashed a giant ad barrage against Mitt Romney, but as political scientist John Sides wrote in The Times’s FiveThirtyEight blog recently, the ads had no lasting effect.
Likewise, many teachers have an intuitive sense that different students have different learning styles: some are verbal and some are visual; some are linear, some are holistic. Teachers imagine they will improve outcomes if they tailor their presentations to each student. But there’s no evidence to support this either.
Second, data can illuminate patterns of behavior we haven’t yet noticed. For example, I’ve always assumed that people who frequently use words like “I,” “me,” and “mine” are probably more egotistical than people who don’t.
But as James Pennebaker of the University of Texas notes in his book, “The Secret Life of Pronouns,” when people are feeling confident, they are focused on the task at hand, not on themselves. High-status, confident people use fewer “I” words, not more.
Pennebaker analyzed the Nixon tapes. Nixon used few “I” words early in his presidency, but used many more after the Watergate scandal ravaged his self-confidence. Rudy Giuliani used few “I” words through his mayoralty, but used many more later, during the two weeks when his cancer was diagnosed and his marriage dissolved. Barack Obama, a self-confident person, uses fewer “I” words than any other modern president.
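Tallies like these are straightforward to automate. The Python sketch below is a minimal illustration, assuming a small hand-picked list of first-person singular words; the word list and the function name `i_word_rate` are hypothetical and are not the dictionaries Pennebaker’s software actually uses. It reports what share of a text’s words are “I” words.

```python
import re

# Hypothetical word list for illustration only; not the dictionary used in
# Pennebaker's research software.
I_WORDS = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'd", "i'll"}


def i_word_rate(text: str) -> float:
    """Return "I" words as a percentage of all words in `text`."""
    # Normalize curly apostrophes so "I’m" and "I'm" are counted the same way.
    normalized = text.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z']+", normalized)
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in I_WORDS)
    return 100.0 * hits / len(words)


print(i_word_rate("I know my plan works."))    # 40.0
print(i_word_rate("We know the plan works."))  # 0.0
```

Comparing this rate across one speaker’s texts from different periods is the kind of comparison described for Nixon and Giuliani.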
Our brains often don’t notice subtle verbal patterns, but Pennebaker’s computers can. Younger writers use more downbeat and past-tense words than older writers, who use more positive and future-tense words.
Liars use more upbeat words like “pal” and “friend” but fewer excluding words like “but,” “except” and “without.” (When you are telling a false story, it’s hard to include the things you did not see or think about.)
We think of John Lennon as the most intellectual of the Beatles, but, in fact, Paul McCartney’s lyrics had more flexible and diverse structures and George Harrison’s were more cognitively complex.
In sum, the data revolution is giving us wonderful ways to understand the present and the past. Will it transform our ability to predict and make decisions about the future? We’ll see.