Just about every introductory statistics book will tell you that correlation does *not* imply causation. And indeed, assuming that **correlation implies causation** is a logical fallacy. For example: chocolate-eating is correlated with acne; therefore, chocolate-eating causes acne. This argument is an example of a false categorical syllogism. One account of the fallacy is that it ignores the possibility that the correlation is a coincidence. But we can always pick an example where the correlation is as robust as we please. If chocolate-eating and acne were strongly correlated across cultures, and remained strongly correlated for decades or centuries, it probably is not a coincidence. The "fallacy" is ignoring something *besides* coincidence.

The "fallacy" ignores the possibility that there is a common cause of eating chocolate and having acne. Take another example: apparently it is true that ice-cream sales are strongly (and robustly) correlated with crime rates. The explanation is that high temperatures increase crime rates (presumably by making people irritable) as well as ice-cream sales.

The statement "correlation does not imply causation" is applied as a warning not to deduce causation from a statistical correlation. But while often ignored, the advice is often overstated, as if to say there is no way to infer causal structure from statistical data. Clearly we should not conclude that ice-cream causes criminal tendencies (or that criminals prefer ice-cream to other refreshments!), but the previous story shows that we expect the correlation to point us towards the real causal structure.

If you believe this, then you believe that robust correlations imply some sort of causal story, whether a common cause or something more complicated. Hans Reichenbach formulated the Principle of the Common Cause, which asserts, roughly, that robust correlations have causal explanations: if there is no causal path from A to B (or vice versa), then there must be a common cause, though possibly a remote one.

Reichenbach's principle is closely tied to the Causal Markov Condition used in Bayesian networks. The theory behind Bayesian networks sets out conditions under which you *can* infer causal structure, when you have not only correlations, but also partial correlations. In that case, certain nice things happen. For example, once you consider the temperature, the correlation between ice-cream sales and crime rates vanishes, which is consistent with a common cause (though not, by itself, diagnostic of one).
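Continuing the simulated sketch from above, here is one way to see that partial correlation "vanish": regress temperature out of both variables and correlate the residuals. Under the assumed common-cause structure (an assumption of the simulation, not a fact about real crime data), the residual correlation comes out near zero.

```python
import numpy as np

# Same simulated data as in the earlier sketch.
rng = np.random.default_rng(0)
n = 10_000
temperature     = rng.normal(20, 8, n)
ice_cream_sales = 50 + 3.0 * temperature + rng.normal(0, 10, n)
crime_rate      = 10 + 0.5 * temperature + rng.normal(0, 5, n)

def residuals(y, x):
    """Residuals of y after an ordinary least-squares fit on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Raw correlation: clearly positive.
print(np.corrcoef(ice_cream_sales, crime_rate)[0, 1])

# Partial correlation given temperature: approximately zero,
# consistent with temperature being a common cause.
print(np.corrcoef(residuals(ice_cream_sales, temperature),
                  residuals(crime_rate, temperature))[0, 1])
```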

While "fallacy" has been used in quotes, it is still a logical fallacy. If you only have A and B, a correlation between them does not let you infer A causes B, or vice versa, much less deduce the connection. In fact, if you only have these two variables, even the most powerful inference techniques built on Bayesian Networks won't help much. But if there was a common cause, and you had that data as well, then often you can establish what the correct structure is. Likewise (and more usefully) if you have a common effect of two independent causes.

Another example illustrating this fallacy is a study which found that British arts funding levels had an extremely close correlation with Antarctic penguin populations.

In the statistics literature, this issue is often discussed under the headings of spurious correlation or Simpson's paradox.