"Facts are stubborn, but statistics are more pliable" - Mark Twain
As an audience member you sit through presentations, and most presentations contain some kind of statistics, from simple percentages to detailed market surveys. If you are a CEO, you will be flooded with statistics in almost every presentation.
It is very easy to lie with statistics. As Mark Twain observed, statistics are very pliable and can be molded to suit your ends. How will you, as an audience, avoid getting fooled by the statistics thrown at you?
This is the theme of today's post. It's interesting, so read on.
While travelling from Tirupathi to Chennai last month I read a book. All that you are going to read now comes from that book. It is called "How to Lie with Statistics" and was written by Darrell Huff. Without further ado, let me start.
1. Sample
If you are reading research based on a sample, ask yourself two questions: (1) Is the sample size large enough? If it is too small, you cannot draw any conclusion. (2) Is the method of collecting the sample proper? We need to dwell on both a bit more.
Sample Size
The analysis being presented to you holds water only if the sample is large enough. I remember sitting in a presentation where the presenter (an ad agency person) was presenting some findings. She had gone to the market, met retailers and was presenting what they said about the brand: how many retailers were selling the product and what their complaints were. The problem with the research was that the city of Hyderabad has over 8000 retailers and she had gone to only 10. A sample that small tells you very little, so you should not let the presenter convince you of anything based on it. Always ask: "What is the sample size? Is it good enough to conclude anything?"
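To get a feel for why 10 retailers is too few, here is a rough sketch using the textbook margin-of-error formula for a percentage. It is my own illustration, not something from the book. It assumes the retailers were picked at random and uses the normal approximation, which is itself shaky for a sample as small as 10. The sample size of 400 is only there for comparison.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation).

    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# 10 is the presenter's sample; 30 and 400 are for comparison only.
for n in (10, 30, 400):
    print(f"n = {n:>3}: +/- {margin_of_error(n) * 100:.0f} percentage points")
# n =  10: +/- 31 percentage points
# n =  30: +/- 18 percentage points
# n = 400: +/- 5 percentage points
```

With 10 retailers, a finding like "50% stock the brand" could plausibly be anywhere between about 20% and 80%.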
Sample Collection
If the lady had gone to 30 retailers I would have been more comfortable listening to her findings. As a rule of thumb, a sample of 30 is often treated as the minimum for drawing conclusions. But even if she did go to 30 retailers, the next question which should come to our mind is: "Did every retailer in Hyderabad have an equal chance of being visited?" Ask her how she chose these 30 retailers. Do they all fall in and around her office, and is that why she went there? If yes, then these findings are relevant only to that locality and cannot be extended (extrapolated) to the entire city of Hyderabad. The sample should be representative of the population. In our case, the population is all the retailers of Hyderabad and the sample is the 30 retailers she met personally. Always ask: "How well has the sample been collected?"
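Giving every retailer an equal chance of being visited is what statisticians call a simple random sample. A minimal sketch of the idea follows. The retailer IDs are hypothetical, and I am assuming the list happens to be ordered by distance from the presenter's office, so the first 30 entries stand in for the "shops near the office" sample.

```python
import random

# Hypothetical IDs for the roughly 8000 retailers in the city,
# assumed to be ordered by distance from the presenter's office.
retailers = list(range(1, 8001))

# Convenience sample: the 30 retailers closest to the office.
convenience_sample = retailers[:30]

# Simple random sample: every retailer has the same chance of being picked.
rng = random.Random(42)  # fixed seed only so the draw can be repeated
random_sample = rng.sample(retailers, 30)

print(sorted(random_sample))
```

The convenience sample can only tell you about one locality. The random sample is spread across the whole list, so its findings can reasonably be extended to the city.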
2. Average
My biggest learning from the book was about averages. Averages are used in everyday life and we think we know what they mean. But that is where the catch lies!
Average Marks in exam = (Total of marks of all subjects) divided by (number of subjects)
This is the mean (the arithmetic average). There are two more kinds of averages: median and mode. Median = arrange all values in ascending order; the median is the middle value. Mode = the value which occurs the most number of times. The mean is not the right measure every time.
If we are talking about the average salary of a batch of MBA students, the best measure to use is the median. Assume the mean is Rs. 14 lakhs and the median is Rs. 7 lakhs. What does Rs. 7 lakhs mean? It means half of the batch is earning more than Rs. 7 lakhs per year and half is earning less. This is a good measure. The mean gets easily influenced by extreme values. It is possible that 5 or 10 students are earning Rs. 100 lakhs. If you add all the salaries and divide by the number of students (to get the mean), you can get an unusually high figure.
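Here is a small sketch of how a couple of extreme salaries pull the mean away from the median. The batch of 20 salaries is made up purely for illustration (it is not data from the book or from any real batch), chosen so that the mean comes to 14 lakhs and the median to 7 lakhs.

```python
import statistics

# Hypothetical salaries of 20 MBA students, in Rs. lakhs per year.
# Eighteen earn between 5 and 9 lakhs; two earn 80 lakhs each.
salaries = [5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 80, 80]

mean_salary = statistics.mean(salaries)      # total of all salaries / 20
median_salary = statistics.median(salaries)  # middle of the sorted list
mode_salary = statistics.mode(salaries)      # most frequent value

print("Mean:  ", mean_salary)    # 14
print("Median:", median_salary)  # 7.0
print("Mode:  ", mode_salary)    # 7
```

Eighteen of the twenty students earn 9 lakhs or less, yet the "average" can honestly be quoted as 14 lakhs.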
If the presenter wants to show a higher figure, she can use the mean. If she wants to show a lower figure, she has the median. Both are averages. She can use either one and tell you it is the average. Unless you ask which average she is using, you can easily get fooled.
When someone says 'this is the average', always ask: "Which average are you talking about? Mean, median or mode?" Also ask, "Which of these three should be used in this situation?" Better still, ask the presenter for all three values and ask him/her to justify which average is the best measure to use.
For example, for data about human traits like height or weight, all three measures give almost equal results. For things like salary, the median is better.
3. Graphs
This part is pretty interesting. Graphs can be used to influence your reaction and response. Graphs can be distorted easily to suit one's needs.
If a newspaper wants to say, "inflation is hitting the roof" it can make the inflation graph dramatic. Look at this example from a newspaper.
The last figure of 9.01% looks like a big jump from 8.06%. Inflation has shot up! Wait, where does the vertical axis (the y-axis) start? At 7. Should it not start at 0? This is what we were taught in school.
Here lies the trick. To make the jump look significant, start the axis at 7. If you actually start the axis at zero, the jump will not look high, and hence it will not be a 'saleable story' and will never make the front page.
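You can put a number on how much the axis trick exaggerates the jump. The sketch below is my own illustration: it compares how tall the 9.01% point looks relative to the 8.06% point when the vertical axis starts at 7 and when it starts at 0. The function name is made up for this example.

```python
def visual_ratio(old, new, axis_start):
    """How many times taller the new value looks than the old one,
    measured from where the vertical axis starts."""
    return (new - axis_start) / (old - axis_start)

truncated = visual_ratio(8.06, 9.01, axis_start=7)
from_zero = visual_ratio(8.06, 9.01, axis_start=0)

print(f"Axis starts at 7: looks {truncated:.2f} times taller")  # 1.90
print(f"Axis starts at 0: looks {from_zero:.2f} times taller")  # 1.12
```

The same rise of under one percentage point looks like inflation has almost doubled on the first chart, and like a 12% rise on the second.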
Look at one more on the same topic.
Did you catch it? There are no labels on the vertical axis. We have no clue where it starts. But how scary it looks. It zooms up from some point at the bottom to 9.9%. Oh my god! The prices are going up and times are bad.
There is one more disease which newspapers suffer from and others are getting addicted to. Look at this one below.
This is a new trend. The presenter wants to show that the sales of Brand X have doubled. The height of the second image is double that of the first. So what's wrong?
The flaw is that when we double the height of an image, its width also doubles. Even though the label says 40 million, the second image covers 4 times the area of the first. Hence to the eye and the mind the growth 'looks' much bigger than it is.
When you are shown a graph look for: (a) Where does the vertical axis start? (b) Has the chart been labelled properly? (c) Is the infographic (like the last example above) scaled incorrectly?
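The arithmetic behind this is simple enough to sketch. This is my own illustration of the scaling, with made-up function names: if both height and width are scaled by the same factor, the area grows by the square of that factor. To make the area (what the eye actually sees) double, each side should grow only by the square root of 2.

```python
import math

def area_factor(height_factor):
    """Growth in area when height and width are both scaled by height_factor."""
    return height_factor ** 2

def fair_height_factor(value_factor):
    """Scale for each side so that the picture's area grows by value_factor."""
    return math.sqrt(value_factor)

print(area_factor(2))                   # 4
print(f"{fair_height_factor(2):.2f}")   # 1.41
```

So a picture meant to show sales doubling should be about 1.41 times as tall, not twice as tall.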
4. Percentages
Percentages make sense only when the numbers are large. If 4 employees left the company last year and only 2 did so this year, do not announce a 50% decrease in attrition. It is arithmetically correct, but it describes just two people.
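A quick check of the arithmetic for the attrition example: going from 4 exits to 2 is a fall of 50%, while going from 2 to 4 would be a rise of 100%. The sketch below is my own illustration (the helper names are made up) and it always prints the raw numbers next to the percentage, so the reader can see how small the base is.

```python
def percent_change(old, new):
    """Percentage change from old to new, relative to the old value."""
    return (new - old) / old * 100

def describe_change(old, new):
    """Report the percentage together with the numbers behind it."""
    return f"{percent_change(old, new):+.0f}% (from {old} to {new})"

print(describe_change(4, 2))  # -50% (from 4 to 2)
print(describe_change(2, 4))  # +100% (from 2 to 4)
```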
Percentages should always be mentioned along with their base figures. If your agency says "40% of people like our ad", ask them how many people they met. What is the base figure? Is it 10 people out of 25 or 80 people out of 200? The second one is more trustworthy and should be taken more seriously.
When a percentage is presented to you, ask: "What are the actual numbers (numerator and base)?" This will give you a clearer picture.
There are many more things which the book covers and explains in more depth. If this post has got you excited, I recommend you buy the book. Click here to buy on Amazon ($6.55) and click here to buy on Flipkart (Rs. 439). It will take 3 hours to read and it's worth the price.
Nice to see an up-to-date re-working of some of Darrell Huff's ideas -- which I first read as a student in California in 1972.
I was delighted, and even though I never considered myself good at maths, with that book to lead in to the subject, and an excellent professor teaching our Statistics course, I ended up writing a paper about how we unconsciously use "statistics" in our daily lives -- AFTER the course had finished.
The professor said if I had done it and shown him before he did all the grades, he would have increased mine! Oh well, it was learning we went there for, rather than simply grades -- and I certainly got that.
-- A tutor and journalist in Hong Kong....
A good read.
What I liked best is that you make it so simple to read. I am sure you have the talent to write a book.