Forensic Statistics and Graphical Models
Autumn Semester, 2011
"FS", Tuesdays, 11:15--13:00, Snellius 401, Niels Bohrweg 1, Leiden.
Advanced Bachelor's level --- Master's level.
Forensic statistics is the
art and science of doing statistics in the context of criminal
investigation or prosecution. Especially in the latter context, it
makes particular demands on the statistician, who is called to
communicate to the court the meaning of statistical data with respect
to the questions of interest to the court. Judges, jury, defence,
prosecution, ... all have different interests and different
information. Testimony of a scientific expert, such as a statistician,
has to be neutral and ... scientific. Many of the consumers (jury,
public, lawyers) have no prior understanding of probability or
statistics at all.
The present "dogma of forensic statistics", as I would call it,
contends that the task of the statistician is to impart to the court
the meaning of a piece of evidence, thought of as statistical, i.e.,
partly formed by chance processes, by stating its likelihood ratio with
respect to typically two important and competing hypotheses, usually
referred to as the hypothesis of the prosecution and the hypothesis of
the defence. For instance, if we regard a DNA profile measured from
some trace of human cells at the scene of a crime as the result of a
chance process involving measurement errors, the probability laws of
genetics, and so on, we might like to report the ratio:
Prob(observed profile | the organic material comes from the suspect) :
Prob(observed profile | the organic material comes from an unknown
person, thought of as a random member of the population at large)
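As a rough illustration of such a ratio, here is a minimal sketch under strong simplifying assumptions that are not part of the course text: no measurement error (so the numerator equals 1 when the suspect's profile matches), Hardy-Weinberg proportions within each locus, and independence between loci. The locus names and allele frequencies are invented for the example.

```python
from fractions import Fraction

def genotype_frequency(genotype, allele_freqs):
    """Population frequency of a genotype at one locus under Hardy-Weinberg."""
    a, b = genotype
    if a == b:
        return allele_freqs[a] ** 2              # homozygote: p^2
    return 2 * allele_freqs[a] * allele_freqs[b]  # heterozygote: 2pq

def likelihood_ratio(profile, freqs_by_locus):
    """Prob(profile | suspect is source) : Prob(profile | random person).

    Assumes the suspect's profile matches and there is no measurement
    error, so the numerator is 1; loci are treated as independent.
    """
    numerator = Fraction(1)
    denominator = Fraction(1)
    for locus, genotype in profile.items():
        denominator *= genotype_frequency(genotype, freqs_by_locus[locus])
    return numerator / denominator

# Hypothetical two-locus profile and hypothetical allele frequencies.
profile = {"locus1": ("a", "b"), "locus2": ("c", "c")}
freqs_by_locus = {
    "locus1": {"a": Fraction(1, 10), "b": Fraction(1, 5)},
    "locus2": {"c": Fraction(1, 2)},
}

lr = likelihood_ratio(profile, freqs_by_locus)
print(lr)  # 100
```

In real casework the numerator is rarely exactly 1 and the loci are not modelled this crudely; the sketch only shows how the two conditional probabilities combine into a single number.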
Graphical models, or Bayes nets, are probability models for the dependence structure of a collection of random variables that are related to one another through a directed acyclic graph, with each node (vertex) representing one of the random variables. Their joint probability distribution is built up as follows. Arrange the graph in two dimensions so that all arrows (directed connections between nodes) point downwards. First generate all variables corresponding to root nodes (nodes with no arrows pointing to them) by drawing them independently according to specified marginal distributions. Then move down the graph, each time drawing the random variable corresponding to a given node from a specified conditional probability distribution, conditional on the values of the variables corresponding to that node's graph parents, that is, the nodes with arrows pointing directly to it.
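The generation procedure just described can be sketched in a few lines of Python. This is only an illustration: the four binary variables A, B, C, D, the shape of the graph, and all the probabilities are made up, and the dictionary layout (parents plus a table giving the probability that the node is True) is just one convenient representation.

```python
import random

# Each node maps to (parents, table), where table gives P(node = True)
# for every combination of parent values. All numbers are hypothetical.
NETWORK = {
    "A": ((), {(): 0.3}),
    "B": ((), {(): 0.6}),
    "C": (("A", "B"), {(False, False): 0.05, (False, True): 0.4,
                       (True, False): 0.5, (True, True): 0.9}),
    "D": (("C",), {(False,): 0.1, (True,): 0.8}),
}

def topological_order(network):
    """Order the nodes so that every node comes after all of its parents."""
    order, placed = [], set()
    while len(order) < len(network):
        progressed = False
        for node, (parents, _table) in network.items():
            if node not in placed and all(p in placed for p in parents):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("graph contains a cycle or an undefined parent")
    return order

def sample_once(network, rng):
    """Draw one joint realisation: roots first, then down the graph."""
    values = {}
    for node in topological_order(network):
        parents, table = network[node]
        p_true = table[tuple(values[p] for p in parents)]
        values[node] = rng.random() < p_true
    return values

rng = random.Random(2011)
print(sample_once(NETWORK, rng))
```

Root nodes have an empty parent tuple, so their table holds a single marginal probability; every other node looks up its conditional probability from the parent values already drawn.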
Graphical models turn out to have wonderful probabilistic and
computational properties. A beautiful algorithm, which will be one of
the highlights of the course, lets us rapidly and accurately
compute conditional probability distributions of some of the variables in
the model given values of some of the other variables. Because of
their graphical representation, these models lend themselves very well to
communication between experts from different fields, and laypersons,
about the model for the phenomenon at hand.
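To make concrete what "computing a conditional probability distribution given values of other variables" means, here is a brute-force sketch that simply sums the joint distribution over all configurations. It is emphatically not the efficient algorithm referred to above: it takes time exponential in the number of variables and is only usable for toy models. The three-node network (A and B both pointing to C) and its probabilities are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Each node maps to (parents, table) with table giving P(node = True)
# for each combination of parent values. All numbers are hypothetical.
NETWORK = {
    "A": ((), {(): F(3, 10)}),
    "B": ((), {(): F(1, 2)}),
    "C": (("A", "B"), {(False, False): F(0), (False, True): F(1, 2),
                       (True, False): F(1, 2), (True, True): F(1)}),
}

def joint_probability(network, assignment):
    """Product over nodes of P(node value | values of its parents)."""
    prob = F(1)
    for node, (parents, table) in network.items():
        p_true = table[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1 - p_true
    return prob

def conditional_probability(network, query, evidence):
    """P(query | evidence) by enumerating every joint configuration.

    Raises ZeroDivisionError if the evidence has probability zero.
    """
    nodes = list(network)
    numerator = denominator = F(0)
    for values in product([False, True], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # configuration contradicts the evidence
        p = joint_probability(network, assignment)
        denominator += p
        if all(assignment[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator

print(conditional_probability(NETWORK, {"A": True}, {"C": True}))  # 9/16
```

Observing C = True raises the probability of A = True from 3/10 to 9/16 in this toy model; the point of the efficient algorithms is to obtain such numbers without visiting every configuration.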
forensic_statistics.pdf, talk by
RDG giving overview "what is forensic statistics"
Lauritzen_EMS.pdf, talk by Steffen
Lauritzen at European Meeting of Statisticians at Toulouse, 2009,
about graphical models for analyzing DNA mixtures.
In the first (introductory) lecture of the present course I referred to
two specific recent Dutch cases where the analysis of DNA mixtures
was crucial: "The Deventer Murder Case (the widow Wittenberg)",
and "The case of Tamara Wolvers (Alphen aan den Rijn)". In both cases
I am pretty sure that a miscarriage of justice followed from a wrong
analysis of a DNA mixture. To be more precise: I believe that the
wrong conclusions were drawn from the DNA evidence.
Old course description, to be rewritten. In the course we will study theory and applications of graphical models (wikipedia/Graphical_model). In statistics, a graphical model specifies conditional independence relations among a set of random variables, some observable, some unobservable. It thereby provides statistical models for the joint distribution of the observed variables. The graph not only provides an attractive visual representation of the model but also serves as a computational tool.
For applications, we will focus on genetics and forensic science, where graphical models have proven to be particularly effective, since the laws of genetic inheritance are very neatly expressed in graphical models.
From the point of view of probability theory, conditional independence is a Markov property, and graphical models are "just" Markov fields.
In computer science, the same graphs are used to represent causality and are there called Bayes nets.
Literature:
The definitive resource for the mathematical foundations of the theory of graphical models (a number of chapters of which are essential reading) is the book
S.L. Lauritzen (1996), Graphical Models, Clarendon Press, Oxford, United Kingdom.
A very nice introduction built around applications in genetics is
http://www.math.auc.dk/~steffen/papers/grgenet.ps, published as
Lauritzen and Sheehan (2003),
Graphical Models for Genetic Analyses,
Statistical Science
18, 489--514.
See also George and Thompson (2003), Discovering Disease Genes, Statistical Science 18, 515--531.
For a somewhat different but also very interesting approach see Judea Pearl (2000), Causality: Models, Reasoning, and Inference, Cambridge University Press. Yet another excellent book is Probabilistic Networks and Expert Systems by Robert G. Cowell, A. Philip Dawid, Steffen L. Lauritzen and David J. Spiegelhalter (1999), Springer-Verlag.
Slides of the lectures so far: html, pdf
Workform, examination:
The course will include assignments, papers, and presentations by the students; the final evaluation will be a "mondeling" (oral) examination. Incidentally, since the topic can be approached with many different emphases (probabilistic, statistical, algorithmic, ...), the participants will also be able to influence the choice of topics.
Web resources:
On the internet you will easily find a wealth of material on graphical models and/or Bayes nets. Here are just a few links.
Tutorial on Graphical Models: Kevin Murphy's tutorial.
Interesting course on graphical models, many useful links and resources: Helsinki course.
GeNIe: free computer package for Bayes nets, unfortunately only available for
Windoze.
GeNIe runs well via WINE on Linux (Intel machines), including the
fantastic new Intel Macs: you can use DARWINE and stay inside OS X if
you are not into Parallels Virtual Desktop or dual booting with Windoze.
Much work is being done to give us graphical models in R.