By Joy Phillips
As you may or may not know, 2013 is the International Year of Statistics, branded “Statistics2013” – a celebration meant to increase public awareness of the power and impact of statistics on all aspects of society. With that in mind, I could not help but recall the phrase “Lies, d***** lies, and statistics,” which is said to have originated in England in the 19th century but was made popular in America by Mark Twain in the 20th century. The phrase describes the persuasive power of numbers to support an argument. But wait: is it fair to refer to statistics as the highest form of lying under any circumstance?
First, let’s define statistics. In their book A Career in Statistics: Beyond the Numbers, Gerald Hahn and Necip Doganaksoy give a definition that includes these parts:
- The science of learning from or making sense out of data;
- The theory and methods of extracting information from observational data; and
- The art of telling a story with numerical data.
What grabs my attention is that statistics is defined as both a science and an art. Can statistics be both? Is this the source of the confusion that ultimately leads some people to equate statistics with lies? For statistics to be a science, it must follow some standard and proven methodology to arrive at its conclusions. For it to be an art, it must lend itself to some notion of manual skill, intellectual manipulation, or personal expression. My belief is that while statistics can be both a science and an art, it cannot be both at the same time. The data is generated first using a scientific process (which can be as simple as counting); it can then become an art when it is subjected to human interpretation. Human interpretation can be regarded as a lie only if others know the truth lies elsewhere and the person giving the interpretation is aware of that.
Case in point: For the District of Columbia, the U.S. Census Bureau has multiple 2010 population numbers: 601,723 (Census 2010); 604,453 (American Community Survey [ACS] 2010 1-year data); and 604,912 (2010 population estimate). Which one is the true population number? The decennial Census, last held in 2010, is a count of the population at a specific point in time. ACS 1-year data is based on a sample of the population taken over a twelve-month period from January to December. The population estimate for a particular year is based on administrative data such as births, deaths and migration. The average person will not know the intricacies of the Census Bureau’s methodology and may be forced to conclude that none of these numbers is correct (or that they are all lies). Scientifically, it can be argued that each is correct based on the method applied. Artfully, each number is subject to its own interpretation based on the information known or assumed in its derivation.
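To see how far apart the three figures really are, here is a minimal Python sketch that lines them up. The names `figures` and `spread` are illustrative only, and using the Census 2010 count as the baseline is an assumption made for comparison, not a claim that it is the "true" number.

```python
# The three 2010 District of Columbia figures quoted in the text above.
figures = {
    "Census 2010": 601_723,
    "ACS 2010 1-year": 604_453,
    "2010 population estimate": 604_912,
}

# Assumption: the decennial count is used as the baseline purely for comparison.
baseline = figures["Census 2010"]


def spread(values):
    """Return the gap between the largest and smallest value."""
    values = list(values)
    return max(values) - min(values)


gap = spread(figures.values())
gap_pct = 100 * gap / baseline

# Show each figure and how far it sits from the baseline.
for name, value in figures.items():
    diff = value - baseline
    print(f"{name}: {value:,} ({diff:+,} vs. Census 2010)")

print(f"Spread: {gap:,} people, about {gap_pct:.2f}% of the census count")
```

The largest and smallest figures differ by 3,189 people, roughly half a percent of the census count. The methods disagree, but only modestly, which fits the point that each number can be correct on its own terms.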
From my viewpoint, whatever the issue or argument, people can usually find a number to support it, or another number that they believe should be used instead. Does this mean that statistics is a lie? Certainly not! However, for some, the jury is still out. Where do you stand?
Joy Phillips is the Associate Director of the State Data Center, OP