Data, Anecdotes, Maps, Territories

Incredibly, I am now N years old. In the years that have passed since I was N - 20, I have found that one of the most important questions I can ask myself at any time is, “What the hell is going on?” If I can answer that question, most of the time I know what to do.

On LinkedIn I had an interesting discussion with an old friend about data and decision making. I mentioned the popular Jeff Bezos saying, "when data and anecdotes disagree, the anecdotes are usually right." Some had doubts about whether it is true. Others felt it is not well expressed. I empathize with both points but I have found this principle quite useful.

Lex Fridman’s recent podcast with Jeff Bezos covered this very topic, starting about 83 minutes in.

https://youtu.be/DcWqzZ3I2cY?t=5064

The key bit is this:

“You forget the truth of why you were watching that metric in the first place. [..] You need to constantly be on guard. [..] managing to metrics that they don’t really understand. They don’t know why they exist, and the world may have shifted under them a little, and the metrics are no longer as relevant as they were when somebody ten years ago invented the metric.”

This is a restatement of the even older aphorism, “the map is not the territory”.

Before I continue, I want to make clear that data is good, and data should be used for decision making. I believe that most organizations vastly underutilize or misinterpret their data. The first order of business for such organizations should be to try to fix this problem, and not get too caught up in anecdotes. Why is data good? Because our judgment is bad - but bad in very useful ways. This has been discussed at length by social scientists, most notably Kahneman and Tversky, whose work Kahneman summarizes in “Thinking, Fast and Slow”. Our “System 1” brain is frequently biased: we overemphasize negative outcomes, rely on information that readily comes to mind, and anchor our assessments in previously set expectations. The list goes on. Data helps us to push past these biases and try to take a more reasoned “System 2” view.

So why are anecdotes useful? Why should we listen to them? Jeff’s point is nuanced. He is not suggesting that a single anecdote trumps a terabyte of data. He is not suggesting that we should not collect data. He is pointing out that the map is not the territory. The map may be out of date. The map may not be detailed enough to make the decision we want to make. We may have the map upside down. We may have sailed off the edge of the earth.

Moreover, even if the data is “right”, we may have done a crappy job of interpreting it. For example, Sam Savage’s The Flaw of Averages entertainingly describes the perils of making decisions on single point estimates. Customers may like your product 90% of the time. That’s pretty good. But what if we are making a decision that is much more pertinent to the remaining 10%? The objection to this is “aha, but you can still make a data-informed decision.” And that is true. But if the relevant metric, that is, the relevant transformation and pivot of the data, is not brought to the fore, the relevant assessment will not happen. Interesting anecdotes often serve the purpose of highlighting the blind spots in how we look at our data.
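To make the 90% example concrete, here is a minimal sketch in Python. The survey rows, segment names, and counts are all invented for illustration; the only point is that the same data gives a very different answer once it is pivoted by the segment the decision actually concerns.

```python
from collections import defaultdict

# Hypothetical survey responses: (customer_segment, liked_product).
# Segment names and counts are made up for illustration only.
responses = (
    [("small_business", True)] * 880
    + [("small_business", False)] * 20
    + [("enterprise", True)] * 20
    + [("enterprise", False)] * 80
)


def satisfaction(rows):
    """Fraction of rows in which the customer liked the product."""
    rows = list(rows)
    return sum(liked for _, liked in rows) / len(rows)


def satisfaction_by_segment(rows):
    """Group rows by segment, then compute satisfaction within each group."""
    groups = defaultdict(list)
    for segment, liked in rows:
        groups[segment].append((segment, liked))
    return {segment: satisfaction(group) for segment, group in groups.items()}


overall = satisfaction(responses)
by_segment = satisfaction_by_segment(responses)

print(f"overall: {overall:.0%}")
for segment, rate in by_segment.items():
    print(f"{segment}: {rate:.0%}")
```

With these made-up numbers the headline figure is 90%, yet only 20% of the hypothetical enterprise customers like the product. Nothing in the aggregate prompts anyone to run the second query; a complaint from one enterprise customer might.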

I once worked for a manager who frequently reduced complicated problems down to rather pithy statements about data. “Ah, we just need to look at the P95.” An overreliance on aggregate metrics can blind us to the variability and variety in the territory. This is often where key battles are won and lost.
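A small sketch of how a P95 can look healthy while the territory is on fire. The latency numbers below are hypothetical, and the `percentile` helper is just a thin wrapper I am assuming for convenience around the standard library's `statistics.quantiles`.

```python
import statistics

# Hypothetical request latencies in milliseconds, invented for illustration.
# 97% of requests are fast; 3% hit a pathological slow path.
latencies_ms = [100] * 970 + [30_000] * 30


def percentile(values, p):
    """Return the p-th percentile (p from 1 to 99) of values."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th one.
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return cut_points[p - 1]


p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
slow_requests = sum(1 for ms in latencies_ms if ms > 1_000)

print(f"P95: {p95:.0f} ms")
print(f"P99: {p99:.0f} ms")
print(f"requests slower than 1 second: {slow_requests}")
```

In this toy data set the P95 is 100 ms, which looks fine, while 30 requests take 30 seconds each. The dashboard built on P95 never sees them. The customer who writes in to complain does.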

I hesitate to use military analogies, but anecdotes can be much like “letters from the front lines”. They tell us what is going on in the world itself; the terra firma. Data, on the other hand, is indirect and necessarily lower fidelity (for as they say, the plural of anecdote is data). A general who relies only on maps, computer screens, and PowerPoints, and fails to understand what the troops are saying, is a poor one. (Orson Welles tells a likely apocryphal story about George Marshall that makes this point rather well.)

Good data scientists notice weird things and look into them. Little outliers, or blips, or things that don’t seem right. They do this even when the big important metrics say everything is fine. These little blips are, in essence, anecdotes. They should not be ignored. Nor should they be blindly followed. They should be investigated. For me, this is the point of the saying.

One last philosophical point. Metrics, on their own, do not require us to form a theory. Neither does code, on its own. I believe that good decisions, if they are good on purpose and not by accident, rely on some kind of theory about the world. A theory that can be tested and re-evaluated. This is entirely compatible with science, and compatible with data, and compatible with anecdotes. Anecdotes can point the way to situations where we do not understand things; where our theory is incomplete. Feynman used the analogy of chess to make this point.

https://www.youtube.com/watch?v=o1dgrvlWML4

So, yes, the aphorism is probably not well stated. The point is that quite often you ignore anecdotes at your peril!