What to do when data disagrees

Data should be used to inform your decision-making, not to make your decisions for you.

"More data means more information, perhaps, but it also means more false information." —Nassim Taleb

At the 2024 Formula 1 British Grand Prix at Silverstone, seven-time world champion Lewis Hamilton received a call on his radio in the middle of the race from his engineer:

"This rain is going to last for another 6 laps, at least."

At the time, Lewis had dry weather tires on. They're called "slicks" because they have no treads or grooves like wet weather tires ("wets") or light rain intermediate tires ("inters"). The tradeoff is that slicks are much faster but have no grip in the rain, while wets and inters are optimal during wet weather but are slow once the track dries out.

So the engineer was basically saying: come in to the pit, Lewis, and change out those slicks. Our radar and weather data say that the rain is going to remain for a while, so you should be on wet weather tires.

Hamilton, the driver, replied:

"There's no rain anymore, mate."

He rejected the engineer's suggestion and stayed out on the slick tires for the rest of the race. The rain never came, and Lewis went on to take his record-breaking 9th win at the Silverstone track, extending his F1 record to 104 all-time wins.

NORTHAMPTON, ENGLAND - JULY 07: Lewis Hamilton of Great Britain driving the (44) Mercedes AMG

Formula 1 is the pinnacle of motorsport where cars and drivers are pushed to their physical limits, and the importance of data cannot be overstated. Most of the data comes from the more than 250 sensors placed throughout the car. They measure physical quantities such as temperatures, pressures, torques, and speeds, as well as internal system statuses that track the state of more complex digital and physical components. Taken all together and in combination with video feeds, there can be upwards of 1 terabyte (TB) of data produced by each car during a single race weekend. A terabyte is about 1,000 gigabytes: the equivalent of 200,000 5-minute songs, 310,000 pictures, or 500 hours of movies.
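As a rough sanity check on those comparisons, the per-item sizes they imply can be worked out directly. This is only a small sketch, assuming a decimal terabyte (10^12 bytes); the variable names are illustrative and do not come from any F1 source.

```python
# Assumption: decimal units, so 1 TB = 10**12 bytes.
TB = 10**12

songs = 200_000        # 5-minute songs per terabyte, as quoted above
pictures = 310_000     # pictures per terabyte, as quoted above
movie_hours = 500      # hours of movies per terabyte, as quoted above

# Implied size of a single item in each comparison.
mb_per_song = TB / songs / 10**6
mb_per_picture = TB / pictures / 10**6
gb_per_movie_hour = TB / movie_hours / 10**9

print(f"{mb_per_song:.1f} MB per 5-minute song")
print(f"{mb_per_picture:.1f} MB per picture")
print(f"{gb_per_movie_hour:.1f} GB per hour of movie")
```

Those work out to roughly 5 MB per song, 3 MB per picture, and 2 GB per hour of video, which are plausible everyday file sizes.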

However, the Silverstone example involved weather data, which is external to the car and its systems. Formula 1 has its own traveling weather service to focus on narrow patches of sky (even individual clouds) and to track storms:

“The more information we give to teams, the happier they are,” says Dietz. “Via a web portal we provide the radar but also the station data, the different model forecasts and a live ticker during the track sessions. For rainfall, our aim is to forecast to the minute when rain will arrive, when it will leave, how much rain will fall and so on.”

Despite the mountains of data, analysis, and suggestion, ultimately it's the driver's call. And, in this example, the driver's call was the right one.

"The information, of course, only gets a team and a driver so far; using the information is what makes the difference between being caught out by changing weather and taking full advantage of it. It’s not just a question of making good decisions, but also knowing with whom those decisions reside."

So, what happened here? How do we know when to trust the data, our own judgment, or other people's anecdotes? How much weight should different types of information be given?

The key principle is this:

Data should be used to inform your decision-making, not to make your decisions for you.

Data, in itself, is neutral. Data needs to be interpreted first, and there are many different ways to interpret a given dataset. (See the classic book How To Lie With Statistics.)

Once interpreted in a particular way, data becomes information. It has been used to inform. (The English word information comes from Latin by way of French: "informare: to train, instruct, educate; shape, give form to.")

Differing interpretations result in the appearance of different information. The "right" way to interpret data is the one which conforms best to the actual reality of things. That is: reality is the final arbiter of truth, accuracy, and validity.

The driver on the track can be thought of as the one most directly in contact with reality—the provider of anecdotal evidence and experiencer of things—but he can also be thought of as the car's most important sensor.

Gathering data is important, of course, but interpreting it is even more important, and the driver is the sensor, the data, and the interpreter of data all at once. The driver is also the only one who can act on the information his team gathers and relays over the radio.

The more data you gather, the more information you have for making the right decision, but you will also have more false, erroneous, or misleading information that can lead you to the wrong one. You will also be short on time, because decisions need to be made quickly.

You will never have complete information. You will never have all the data. You will never have enough time. But you will have actual reality. You will have previous experience. And you will have what actually works.

Data can be misinterpreted. Sensors can be defective and measure incorrectly. Analysts can make mistakes. Data should not be treated as objective truth, but simply as one source of information to be weighed and factored into decisions.

Consider using this algorithm, which is likely similar to the one run by professional racing teams:

  • Here is the data
  • Here is what we think the data is telling us
  • Here is which data we are more confident in, and which we are less confident in
  • Here's what we know has happened before
  • Here is what we think is happening now
  • Therefore: here is our recommendation
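
The steps above can be sketched in code as a confidence-weighted vote. This is a minimal illustration, not how any racing team actually does it: the `Source` class, the `recommend` function, the confidence numbers, and the 0.5 threshold are all hypothetical and chosen only to mirror the Silverstone story.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One input to the decision (hypothetical structure)."""
    name: str
    says_rain: bool     # what we think this data is telling us
    confidence: float   # 0.0 (no trust) to 1.0 (full trust); made-up values


def recommend(sources: list[Source], threshold: float = 0.5) -> str:
    """Turn interpreted, weighted sources into a single recommendation."""
    total = sum(s.confidence for s in sources)
    if total == 0:
        return "no recommendation"
    # Share of total confidence held by sources that expect rain.
    rain_share = sum(s.confidence for s in sources if s.says_rain) / total
    if rain_share > threshold:
        return "pit for wet-weather tires"
    return "stay on slicks"


sources = [
    Source("weather radar", says_rain=True, confidence=0.6),
    Source("forecast model", says_rain=True, confidence=0.4),
    # What is happening now, as seen by the driver on the track.
    Source("driver on track", says_rain=False, confidence=0.9),
    # What we know has happened before.
    Source("similar past races", says_rain=False, confidence=0.3),
]

print(recommend(sources))
```

With these made-up numbers the driver's direct observation outweighs the radar and the forecast, so the sketch recommends staying on slicks. The result is still only a recommendation; the final call stays with whoever owns the decision.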

The world is a complex, confusing, extremely challenging thing to understand, and the best we can do boils down to approximations and predictions based on limited visibility and understanding.

Use every source of data you can get to inform your decisions, but if you mistake the data for actual reality, you may find yourself losing out on the wrong tire.

“The thing I have noticed is that when the anecdotes and the data disagree, the anecdotes are usually right. There is something wrong with the way that you are measuring it.” —Jeff Bezos