Monday, March 12, 2012

Graphically Significant


Whilst R², the coefficient of determination, is a terrific tool for quickly interpreting a calibration dataset, it has limitations. Sometimes you need a human to have a look. I was trying to explain this to somebody at work and didn’t have a lot of success. Afterwards I found out about Anscombe’s Quartet. These are four graphs that show why statisticians exist. I changed the datasets a little so that the line of best fit would be near on perfect.
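To see the point in numbers rather than pictures, here is a minimal Python sketch. It assumes Anscombe's original published values (1973), not the tweaked datasets plotted in this post, and the helper name `linear_fit` is purely illustrative. It fits an ordinary least-squares line to each of the four datasets and reports R².

```python
# Anscombe's original quartet (1973); NOT the modified values graphed in this post.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


results = {name: linear_fit(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept, r2) in results.items():
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
```

All four datasets should come out with essentially the same line and the same R², which is exactly why the summary number alone cannot tell them apart.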

The first graph shows a fairly normal calibration dataset with too few samples. I’d be quite pleased if my first 10 samples showed this graph; the only remaining work would be to reduce the variation and collect more extreme samples. From this graph the accuracy looks to be about ±1 unit.

The second graph would show perfect correlation if I was looking for a polynomial (y = -0.507x² + 5.56x - 9.00 to be precise).
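One way a program could flag this sort of curvature without a human looking at the plot is to check the residuals from the straight-line fit. The sketch below assumes Anscombe's original second dataset (not the modified one graphed here) and simply prints the sign of each residual in order of increasing x. A run of one sign at both ends and the opposite sign in the middle suggests a curve rather than random scatter.

```python
# Anscombe's original dataset II; NOT the modified values graphed in this post.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Ordinary least-squares straight line through the data.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus fitted y, listed in order of increasing x.
residuals = sorted((x, y - (slope * x + intercept)) for x, y in zip(xs, ys))
for x, r in residuals:
    print(f"x = {x:2d}  residual = {r:+.2f}")

# Compact view of the pattern: one character per point.
signs = "".join("+" if r > 0 else "-" for _, r in residuals)
print(signs)
```

Random noise would tend to scatter the plus and minus signs; a systematic block of them is a hint that the straight-line model is the wrong shape.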

The third graph indicates to me that the data point at x = 6.5, y = 9.74 must be in error; possibly there was a typo in entering the data. All the other data points here fit the line y = 0.6908x + 1.0056 perfectly.
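The same check can be roughed out in code. This sketch assumes Anscombe's original third dataset, so the numbers differ from the modified graph described above. It takes the point with the largest residual from the first fit as the suspect, drops it, and refits. The helper name `linear_fit` is hypothetical, and throwing away a point like this is only reasonable once you have a real cause to suspect it, such as a data-entry typo.

```python
# Anscombe's original dataset III; NOT the modified values graphed in this post.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def linear_fit(points):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


points = list(zip(xs, ys))
slope_all, intercept_all, r2_all = linear_fit(points)

# Suspect = the point that sits furthest from the first fitted line.
suspect = max(points, key=lambda p: abs(p[1] - (slope_all * p[0] + intercept_all)))

# Refit with the suspect removed.
kept = [p for p in points if p != suspect]
slope, intercept, r2 = linear_fit(kept)

print(f"all points:      R^2 = {r2_all:.3f}")
print(f"suspect point:   {suspect}")
print(f"without suspect: y = {slope:.4f}x + {intercept:.4f}, R^2 = {r2:.5f}")
```

With the one suspect point removed, the remaining points should fall on an almost perfect straight line, which matches what the eye picks out from the graph straight away.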

The fourth graph on the other hand is terrible; there really is no discernible line of best fit. Possibly there are only two samples and one of them has been repeated ten times.

Don’t try it with your English teacher, but for everybody else a picture is worth a thousand words.

1 comment:

  1. .. on the flipside, our brains have an over-zealous pattern recognition system and are quite happy to see patterns where none exist. The Anscombe quartet are certainly extreme examples of data that seem to be following another pattern, but still may be constructs of a purely random sample set.

    Looks like the proper term is Apophenia
