Establishing causality from observational data is extremely difficult: just think how long it took to prove that smoking causes lung cancer. But there are a number of things one can do.
One important step is to pay attention to time ordering: causes must precede effects, so retaining temporal information helps. This enables a number of statistical approaches, most notably Bayesian networks and Granger causality, the latter named after Clive Granger, who shared the 2003 Nobel Prize in Economics. These are not perfect, but can be powerful.
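The idea behind Granger causality fits in a few lines: x is said to Granger-cause y if past values of x improve predictions of y beyond what y's own past already provides. The sketch below is a minimal illustration, assuming NumPy, a linear autoregression and a fixed lag. It fits both regressions by least squares and compares their residual sums of squares with an F-statistic. `granger_f_stat` is a hypothetical helper and the data are simulated. For real analyses, statsmodels provides `grangercausalitytests`, which also reports p-values.

```python
import numpy as np


def granger_f_stat(x, y, lag=1):
    """F-statistic for 'past x helps predict y beyond y's own past' (linear, fixed lag)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[lag:]  # y_t for t = lag .. n-1
    # Column k-1 holds the series shifted back by k steps, aligned with target.
    y_lags = np.column_stack([y[lag - k : n - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k : n - k] for k in range(1, lag + 1)])
    ones = np.ones((n - lag, 1))
    restricted = np.hstack([ones, y_lags])            # y's own past only
    unrestricted = np.hstack([ones, y_lags, x_lags])  # y's past plus x's past

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_num = lag                                # number of added x terms
    df_den = (n - lag) - unrestricted.shape[1]  # residual degrees of freedom
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


# Simulated data in which x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

f_xy = granger_f_stat(x, y, lag=2)  # does x's past help predict y?
f_yx = granger_f_stat(y, x, lag=2)  # does y's past help predict x?
print(f"x -> y: F = {f_xy:.1f}")
print(f"y -> x: F = {f_yx:.1f}")
```

A large F in one direction and a small one in the other is consistent with x driving y. An unobserved common driver of both series could still produce the same pattern, which is one reason these methods are not perfect.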
The best way is of course to perform experiments: this is the rationale behind randomised drug trials. Even where controlled trials are not possible, introducing true randomness somehow can be enough. One example is Mendelian randomisation, which relies on meiosis, the cell division behind reproduction. Here nature performs the randomised trial for us, because each parent passes on a randomly chosen one of their two copies of each gene to the child.
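Why randomisation is enough can be illustrated with a small simulation. This is a toy model with made-up numbers, assuming NumPy. A hypothetical confounder (`health`) influences both who gets treated and the outcome, so a naive comparison of treated and untreated people in observational data overstates the effect. Assigning treatment by coin flip breaks that link.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_effect = 1.0

# Hypothetical confounder (e.g. underlying health) that also affects the outcome.
health = rng.normal(size=n)

# Observational data: healthier people are more likely to choose the treatment.
treated_obs = (health + rng.normal(size=n)) > 0
outcome_obs = true_effect * treated_obs + 2.0 * health + rng.normal(size=n)
naive_estimate = outcome_obs[treated_obs].mean() - outcome_obs[~treated_obs].mean()

# Randomised trial: a coin flip decides treatment, independent of health.
treated_rct = rng.random(n) < 0.5
outcome_rct = true_effect * treated_rct + 2.0 * health + rng.normal(size=n)
rct_estimate = outcome_rct[treated_rct].mean() - outcome_rct[~treated_rct].mean()

print(f"true effect:            {true_effect:.2f}")
print(f"observational estimate: {naive_estimate:.2f}")  # biased upward by confounding
print(f"randomised estimate:    {rct_estimate:.2f}")    # close to the true effect
```

In this setup the observational difference comes out well above the true effect of 1.0, while the randomised one lands close to it. Mendelian randomisation relies on the same principle, with gene inheritance playing the role of the coin.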
In e-commerce, with a large customer base, it is of course possible to perform experiments, A/B testing on the web being the simplest case. Experimentation becomes more sophisticated in the testing and placement of online advertising, and more complex still when detailed information about individuals is used to target content. And with the advent of big data, it is now possible to run sophisticated, coordinated experiments that truly determine causal relationships in customer behaviour.
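At its simplest, an A/B test randomly splits visitors between two versions and checks whether the difference in conversion rate is larger than chance would explain. The sketch below shows one common way to do this, assuming conversions are independent yes/no events. It uses a hash-based `assign_variant` for stable, random-looking assignment and a pooled two-proportion z-test, with only the Python standard library. The function names, the experiment label and the counts are hypothetical.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str = "homepage-test") -> str:
    """Deterministically bucket a user into 'A' or 'B' (hypothetical helper)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate if A and B were identical
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


print(assign_variant("user-42"))
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With 200 conversions out of 10,000 for A and 260 out of 10,000 for B, z is about 2.83 and the two-sided p-value about 0.005. Hashing the user ID, rather than flipping a coin on every visit, keeps a returning customer in the same variant.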
In this talk I’ll survey the difficulties and pitfalls of establishing cause and effect from observed data, and discuss ways to incorporate experimentation or randomised controlled trials.
Jason McFall is the CTO at Causata, a startup using Big Data to automate real-time marketing. This combines machine learning with large scale data analysis and structured experimentation, to intelligently market to individuals.
Jason started out as an experimental physicist, working on particle physics collider experiments. The connection between the two jobs is uncanny: using big data and low-latency technology to analyse data fast, and of course performing and understanding experiments with rigorous statistical confidence.