NIAS/Lorentz Centre Research project and workshop

Science meets Justice: forensic statistics at the interface


Legal institutions in the modern world work more and more with modern technology and science. In particular, in criminal investigation and prosecution (and defence!), scientific evidence is increasingly central. By definition, this is evidence which is understood and interpreted by experts. By definition, lawyers, judges, and juries are “consumers” of science and technology. Legal institutions are necessarily and in a positive sense conservative: slow to adapt to societal changes. Democratic society makes laws through democratic government; the legal system interprets and implements them. It cannot in itself change the laws. Science and technology are by definition innovative. However, justice is made for man, not man for justice. Justice should be understood by society. When society takes ultimate recourse against individuals who have transgressed its laws, it is crucial that this is seen and felt to be just. The rule of law in our society is only safeguarded when the systems and the persons who implement it earn the respect of society by their wise decisions.

In all modern countries, one may identify the same crisis in the legal systems, though in different systems expressed in different ways. In the Anglo-Saxon confrontational model, scientists need to behave like movie stars in movies, or TV talk-show personalities on TV talk shows, in order to win not just the minds but also the hearts (if not the guts) of lay-juries. They are hired by one side or the other, and can be disqualified in the eyes of the jury by attacks on their personal integrity rather than on the logic of their arguments. In the mainland European inquisitorial model, judges have the responsibility to choose which expert to believe, what information to give that expert, what questions to ask; the legal system expects all true experts to agree; if experts disagree, then apparently some of them are not experts after all. In both systems, judges and jurors prefer experts who confidently make clear, definitive and assertive statements. However, it is often the better expert who realizes that the evidence which he or she should interpret is not clear-cut, that there are important but subtle or complex provisos to the conclusions of any sensible analysis of it, and that the interpretation of that particular item of evidence depends crucially on many contextual aspects about which he or she does not have full information.

In science, when well qualified and respected experts disagree, it usually means that they have different information, or have made different assumptions. If the difference persists, and is maintained when more experts are consulted, then this typically means that science itself does not know the answer. Lack of consensus means that more science needs to be done. Lack of consensus is an opportunity for science to step forward. As Niels Bohr is supposed to have said on the publication of the EPR paradox, concerning the proper interpretation of quantum physics: “great, now we have a contradiction, at last we can make progress!” (Which was indeed the case; though Einstein, the E of EPR, seems finally, with the work of John Bell, to have lost the case through EPR’s own “counter-example” to Bohr’s position).

Statistical science plays a rather special role in these developments. While mathematics is the universal language (by definition) of the exact sciences, statistics is perhaps not the language but certainly the universal critical tool of many empirical sciences. Statistical reasoning is full of pitfalls. It is no coincidence that two of the most common fallacies in probabilistic reasoning are called the prosecutor’s and the defence lawyer’s fallacy, respectively. Lawyers are not trained in dealing with uncertainties, yet statistics is about quantifying uncertainties, and taking decisions in the face of uncertainties. Legal proceedings can only cope with two contradictory “facts” by rejecting one so-called fact or the other (black or white; everything or nothing). However, statistics is needed in science when the observed or experimental facts do not logically exclude any particular explanation. Some explanations simply make the ensemble of observed facts collectively more likely than others. Probabilistic reasoning, especially as implemented in so-called Bayes nets or graphical models, allows (in principle!) combination of all facts; it allows exclusions of those explanations of the facts which make the ensemble of all the observed facts highly implausible relative to other explanations. On the other hand it relies on a common framework in which both positions can be expressed. Since the chosen framework makes assumptions, the choice of framework is not neutral. The dream of a “computational” or algorithmic solution of debates simply results in an infinite regress.
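The prosecutor’s fallacy mentioned above can be made concrete with a small calculation. The following is only a sketch under stated assumptions: the function name, the random match probability of one in a million, and the pool of one million equally likely alternative sources are all hypothetical numbers chosen for illustration, and the true source is assumed to match with certainty.

```python
def posterior_probability_of_source(random_match_probability, population_size):
    """P(suspect is the source | match), via Bayes' rule in odds form.

    Assumptions (illustrative only): every one of `population_size` people
    (population_size > 1) is a priori equally likely to be the source, and
    the true source always matches.
    """
    prior = 1.0 / population_size
    prior_odds = prior / (1.0 - prior)
    # Likelihood ratio: P(match | suspect is source) / P(match | someone else is)
    likelihood_ratio = 1.0 / random_match_probability
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


random_match_probability = 1e-6  # hypothetical value

# The fallacy: reading P(match | not source) as P(not source | match)
fallacious_claim = 1.0 - random_match_probability

# The Bayesian answer, with a hypothetical pool of one million possible sources
posterior = posterior_probability_of_source(random_match_probability, 1_000_000)

print(f"fallacious 'probability of guilt': {fallacious_claim:.6f}")
print(f"posterior probability of source:   {posterior:.6f}")
```

Under these assumptions the same match that is presented as near-certainty leaves the posterior probability close to one half; the defence lawyer’s fallacy errs in the opposite direction, by treating such evidence as worthless.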

Statistical evidence has the feature that uncertainty is exactly quantified. On the other hand, in the Dutch legal system for instance, a medical expert or a finger-print expert is supposed either to say definitely what was the cause of death, or whether two prints are of the same person, or else to say that they cannot say anything at all. A witness recognises a suspect as the perpetrator of a crime, or doesn’t. The judge accepts this witness’s evidence as a true (and moreover legal) fact, or rejects it. Thus evidence in a criminal case either becomes a legal fact, or is discarded altogether. Statistical evidence, however, always comes in gradations, and moreover in quantitatively expressed gradations. Rutherford said: “if you need statistics, you did the wrong experiment”. But the experiment has been done, and we are actually interested in determining what precisely went wrong in what was clearly a wrong experiment (since it ended in murder, or whatever). However, a criminal investigation and a criminal court case are, just like a scientific research project, investigations to determine the truth. Both in science and in justice we believe that there is an objective, or at least inter-subjective, truth which we need to uncover, describe, approximate as well as possible.

More and more, statistical evidence needs to be used in police investigations and needs to be interpreted in courts. 

From the point of view of statistics, many have thought that criminal investigation and prosecution is just another field where the same basic statistical methodologies can be used as anywhere else. And this has led to enough disasters already! However, there are actually two huge differences between the way a statistician should operate in the judicial context and his or her “mode” of operation in typical scientific collaboration.

The first big difference concerns the type of question which the statistician has to answer. The archetypical forensic statistical questions concern the matching of evidence found at a crime scene with evidence found on a suspect. For instance, a window may have been smashed during a burglary; and there may be fragments of glass in the clothing of a suspect. Statistics is all about variation and here there are two important kinds of variation, conventionally called “within” and “between” (actually, there are complex hierarchies of variation). Taking a window-pane of glass as a basic unit, the “within” variation would refer to the following fact: the composition of (chemical) elements in fragments of glass varies throughout a single pane of glass. Secondly, the “between” variation: the composition of glass varies from pane to pane, from batch to batch, from factory to factory, from type to type. There are rare kinds of glass and common kinds of glass. When asking the question, ‘is the glass found in the clothing of the suspect, glass from the window which was smashed during the burglary?’, both variations can be important. If the fragments in the suspect’s clothing and fragments from the crime scene are similar in composition but the type of glass is common, a strong similarity is not decisive. If the fragments are similar in composition and the type of glass is very rare, the similarity is more informative.  Some types of glass actually show more variation over a windowpane than others, which again makes similarity less significant (and probably means, cf. Rutherford, that we should go back and get more data!). Forensic glass experts build up a lifetime of experience and go to forensic glass conferences to share their knowledge with others. Statistical data is provided by data-bases of case-material collected by forensic institutes, police forces, and so on. Statistical methodology is in its infancy. 
(In fact I have strong criticisms of the “latest” proposals of some of the leaders in the field; we discuss them together and search for ways forward!)
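The interplay of “within” and “between” variation described above can be sketched with a deliberately simplified two-level normal model. Everything here is an assumption made for illustration: a single standardised measurement per fragment, known population mean and variances, and hypothetical function and parameter names. It is not the methodology of any forensic institute, nor one of the proposals referred to above.

```python
import math


def bivariate_normal_density(x, y, mean, variance, covariance):
    """Density of two measurements with equal means and equal variances."""
    dx, dy = x - mean, y - mean
    det = variance * variance - covariance * covariance
    quad = (variance * dx * dx - 2.0 * covariance * dx * dy
            + variance * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def glass_likelihood_ratio(control, recovered, population_mean,
                           between_var, within_var):
    """LR for 'same pane' versus 'different panes' (illustrative model).

    Same pane: the two fragments share the pane's mean, so their
    covariance equals the between-pane variance.
    Different panes: the two fragments are independent.
    """
    total_var = between_var + within_var
    same_source = bivariate_normal_density(
        control, recovered, population_mean, total_var, between_var)
    different_source = bivariate_normal_density(
        control, recovered, population_mean, total_var, 0.0)
    return same_source / different_source


# Hypothetical standardised measurements (population mean 0, between-pane variance 16)
lr_common = glass_likelihood_ratio(0.0, 0.0, 0.0, between_var=16.0, within_var=1.0)
lr_rare = glass_likelihood_ratio(6.0, 6.0, 0.0, between_var=16.0, within_var=1.0)
lr_mismatch = glass_likelihood_ratio(-3.0, 3.0, 0.0, between_var=16.0, within_var=1.0)
lr_noisy = glass_likelihood_ratio(0.0, 0.0, 0.0, between_var=16.0, within_var=4.0)

print(lr_common, lr_rare, lr_mismatch, lr_noisy)
```

In this toy model an exact agreement between fragments of a common type supports the same-pane scenario only mildly, the same agreement for a rare type supports it more strongly, a clear disagreement counts against it, and a larger within-pane variance weakens the value of agreement.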

Forensic statistical investigations are typically about individual cases, not about populations, and often concern the question whether material from a crime-scene is related to material found elsewhere: is this glass from the same window, is this print from the same person, are this CCTV image and this passport photo images of the same person? More and more, these questions need to be dealt with using statistics, and statistics is needed precisely when the answer is not clear-cut, when different levels of variation must be taken into account, and hence variation must be measured, both in the samples and in the population.

The second main difference in mode of operation of a statistician in the judicial context is that, in court proceedings, a statistician has got to communicate with non-scientists. He or she is called on to say what his or her expertise allows him or her to say about the significance of a particular piece of evidence. The usual language of statistical tests, estimates, confidence intervals, or p-values, is simply inappropriate; it is typically totally misunderstood by laypersons and legal professionals alike. It is especially misunderstood by legal professionals, by medical professionals, and by professional journalists, who on the other hand are well known for their decisive and authoritative statements. They use this authoritative style not only in their own specialist fields, since it is a personality trait needed to be successful in those professions.

There is growing consensus among forensic statistical professionals that the statistician should report the so-called likelihood ratio only, that is, the ratio of the probabilities of obtaining precisely this piece of evidence under the scenarios favoured by the prosecution and the defence respectively. Consider the apparently simple case of DNA evidence. The prosecution claims that the DNA found at the crime scene comes from the suspect, in which case (because the profiles match) that particular DNA profile was 100% certain to be found; the defence might argue that the apparent match is due to coincidence, since an arbitrary (unidentified) individual has a certain chance of having that same profile. It seems that we just need to know this chance. However, the chances which are routinely calculated in DNA profiling are based on estimates of gene frequencies whose reliability is unknown, typically based on too small samples from the wrong population; the multiplication of gene frequencies to find the probabilities of matches of whole profiles is based on disputable theory of population genetics, possibly true for ideal populations of drosophila flies in laboratory conditions, but certainly false for the complex structures of human populations. Still, the fact that DNA profiles can be determined from smaller and smaller quantities of biological material, of worse and worse quality, with ever-increasing possibilities of errors and contamination, means that more and more sophisticated statistics has to be used, and more kinds of uncertainties have to be simultaneously quantified.
Especially the analysis of “low copy number”, DNA mixtures, is presently at the boundary of what modern statistics can do, and at the boundary of forensic DNA analysis; it requires expert knowledge from genetics and biochemistry, data from population genetics, sophisticated statistical modelling, and it requires finally that the meaning and limitations of the findings be communicated to lay-persons: to lawyers, jurors, to the public, to the accused, to victims and to friends and relatives of victims, to journalists, and indeed, to TV talk-show hosts.
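To make the discussion of DNA match probabilities concrete, here is a minimal sketch of the routine “product rule” calculation and of one commonly used adjustment for population substructure (the co-ancestry or “theta” correction, usually attributed to Balding and Nichols). The allele frequencies, the two-locus profile, the value theta = 0.03 and all function names are hypothetical; the sketch ignores sampling uncertainty in the frequency estimates, laboratory error, contamination and relatives, which are exactly the issues raised above.

```python
def genotype_match_probability(p, q=None, theta=0.0):
    """Match probability for one locus.

    p, q  : allele frequencies (q=None means a homozygous genotype).
    theta : co-ancestry coefficient; theta=0 gives the Hardy-Weinberg
            values p*p and 2*p*q used by the plain product rule.
    """
    denominator = (1.0 + theta) * (1.0 + 2.0 * theta)
    if q is None:
        numerator = (2.0 * theta + (1.0 - theta) * p) * \
                    (3.0 * theta + (1.0 - theta) * p)
        return numerator / denominator
    numerator = 2.0 * (theta + (1.0 - theta) * p) * \
                      (theta + (1.0 - theta) * q)
    return numerator / denominator


def profile_match_probability(loci, theta=0.0):
    """Multiply per-locus probabilities (assumes independence between loci)."""
    probability = 1.0
    for p, q in loci:
        probability *= genotype_match_probability(p, q, theta)
    return probability


# Hypothetical two-locus profile: one heterozygous and one homozygous locus
hypothetical_profile = [(0.1, 0.2), (0.1, None)]

rmp_independent = profile_match_probability(hypothetical_profile)
rmp_structured = profile_match_probability(hypothetical_profile, theta=0.03)

# With a certain match under the prosecution scenario, LR = 1 / match probability
lr_independent = 1.0 / rmp_independent
lr_structured = 1.0 / rmp_structured

print(f"product rule:      match probability {rmp_independent:.2e}, LR {lr_independent:.0f}")
print(f"theta correction:  match probability {rmp_structured:.2e}, LR {lr_structured:.0f}")
```

Even in this toy example, allowing for a modest amount of population structure makes the reported likelihood ratio several times smaller, which illustrates why the choice of population-genetic model cannot be treated as a technical detail.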

Despite the consensus among forensic statisticians, lawyers do not know what a likelihood ratio is, and there is strong opposition, some on good grounds, some on bad, to it ever gaining a foothold in court. Moreover there is internal controversy about what is the right likelihood ratio. As long as this controversy persists, it is probably best to keep likelihood ratios far from the eyes of lawyers. I have ideas of how to resolve, or at least, how to approach these controversies.

Research plans

Since my involvement starting three years ago in the Lucia de B. case, I have become more and more fascinated by the challenges and opportunities for statistics in forensic investigation and in legal proceedings and judicial enquiry. In my biased opinion, statistics is one of the most interdisciplinary disciplines of all. Outside of the pure mathematical analysis of existing statistical models within existing statistical paradigms, real life applied statistics involves to the greatest possible degree multi-way interdisciplinary communication. This is not typically or usefully the kind of scientific collaboration where the statistician is merely a kind of computational machine who is able to “give the right answer” when the scientist asks for a p-value or a confidence region or the outcome of a test of a null-hypothesis. In real life statistics, statistical collaboration requires a constant cycling between data and models, the confrontation between the two yielding new questions, new insights, the necessity to refine or reject models, to reject or reinterpret data, to reject or reformulate questions, to ask for new data, to create new models, to pose more meaningful questions. The statistician needs to learn the language of the scientist he is collaborating with, the logical structure of the investigation, the abstract nature of the models being used. This is where mathematical analysis, mathematical abstraction, plays a role; at some point it does not matter whether one is talking about mice or men, proteins or pullovers. But the results of statistical analysis need to be explained in terms of the applied field; their scientific significance (as opposed to their statistical significance) requires appreciation of the real scientific issues at hand.

I am now giving master’s courses on forensic statistics, collaborating with forensic scientists at NFI and with other forensic science researchers and businesses in the Netherlands and abroad, organising workshops and study groups, working with PhD students, and working for lawyers, public prosecution, and private investigators in a wide variety of projects, all having in common a major and challenging statistical component which requires innovative statistical science. I have a lot of ideas for new methodology, in particular in the evaluation and calibration of empirical likelihood ratios, combining empirical Bayes, semiparametric modelling, and insights from computational and statistical learning. Another area concerns the proper incorporation of population heterogeneity in DNA profiling, and the proper way to estimate and extrapolate rare DNA profiles (in the case of Y-chromosome and mitochondrial DNA profiles). At the same time, my activities in the controversial Lucia case led first to enmity with many representatives of the legal profession in the Netherlands, but now to mutual understanding and mutual respect with many key figures in legal academia and judicial practice. The message did get through that there is some kind of crisis in the way scientific methodology is used in legal proceedings (all over the world, this is the case, fortunately). Police, judges, lawyers, are all making steps to improve their knowledge, to build networks of scientific contacts; and all are becoming more critical of the way the other legal parties employ forensic science and scientific reasoning. Which is of course as it should be. I am actively collaborating with several lawyers (in the broad sense of people from the legal profession) both from the judiciary and from the public prosecution service.

I was invited to organise a session and speak on forensic statistics at the biggest applied statistics conference in the world, JSM, Vancouver, 2010 (declined for family reasons), and invited to speak in a similar session at the European Meeting of Statisticians, Piraeus, 2010 (accepted). I am regularly consulted by investigative journalists. I have been commissioned by the Annals of Applied Statistics to write a big paper on the Lucia case.

However, publications are lagging behind. I also have uncompleted research collaborations from my previous great passion, quantum statistics, and am involved in new research in medical statistics coming out of the famous Utrecht probiotics trial. I am president of the Dutch society for statistics and operations research, and one of the initiators of a brand new interdisciplinary and interuniversity master’s programme on statistical science. I teach several courses per semester, and have my full share of administrative and organisational tasks in the department, in the faculty and university, and in the scientific community.

A Lorentz fellowship would come at the perfect moment to keep the momentum and start reaping the fruits of the present passion of forensic statistics. I have sketched above some of the within-discipline areas in which I want to further develop my ideas, and above all I want to further explore the way this can be shared with the non-scientific community.


Forensic statistics by its nature requires interdisciplinary cooperation far beyond the traditional experimental and observational sciences. The academic tradition in law is that of the humanities. Law students (in Dutch universities) have presently, on average, the least exposure to the natural sciences of all possible university curricula. They are possibly the only academic discipline which typically does not include a compulsory course in basic statistics and basic scientific method (not that those courses do much good in the humanities, social science, or even medicine). [Physics at many Dutch universities is possibly the only other – Rutherford strikes again]. The nearest that forensic science comes in law faculties, is in the discipline of legal psychology. Consequently the legal psychologists have almost a monopoly position at the information gateway between law and (exact) science. 

Many legal cases involve medical expertise. Like the legal profession, the medical profession is a somewhat closed community with very strong “team-spirit” and resistance to change demanded by outside influences. Modern medical research depends to a huge degree on modern statistical science, yet the gulf between the worlds of a medical doctor and a statistician remains wide and deep, and fraught with communication problems.

From my position in applied statistics and experience (before quantum statistics, my passion was “survival analysis”, a branch of medical statistics) and especially from my experience in the Lucia case, I have built up the necessary scientific and social network for embarking on the present project. I enjoy now very cordial relations with many key figures in the Dutch academic law profession (Ybo Buruma, Marc Groenhuisen, ..), and have had contacts (and expect future contacts) with Minister of Justice Ernst Hirsch-Ballin, Chief Public Prosecutor Harm Brouwer; also in the law psychology business (Crombag, van Koppen, …). I have intensive research contacts and burgeoning research projects with forensic scientists such as Ate Kloosterman (NFI), Peter de Knijff (LUMC), Marjan Sjerps (NFI), and have (or have had) warm contacts with other key persons in forensic scientific Netherlands such as Ton Broeders; Richard and Selma Eikelenboom. I have communicated extensively with toxicologists and pathologists such as Freek de Wolfe and Don Uges. The differences in insight which do certainly occur in many of these relations are precisely the things which stimulate inquisitiveness and research. I have been working with the maverick scientist “outsiders” Jan Frijters (dog scent-trials) and Fred Vos (fire: Schiphol, Sweeney case…).

In my activities in the Lucia case my method was always to meet and talk with people who held strongly dissonant opinions from mine, whether they were medical researchers or journalists. In fact, some of my deepest (though admittedly amateur) insights into the psychology and sociology of that trial come from “scene of the crime” witnesses: journalists present at the very first court hearings in the Lucia case; decent and intelligent men who had been totally convinced by statistics and by body language and by the skills of the judges in eliciting contradiction from the suspect, that Lucia was guilty as hell, and who maintained their position even after most informed people had been won over to the Lucia side.

I have spoken, by invitation, at a large number of interdisciplinary workshops and conferences in the Netherlands on the exact science – law communication problem; as well as at Science Cafés, debate evenings, student societies, and most recently at the international annual hackers’ conference at Vierhouten.

An orthogonal scientific/social network is my international network in forensic statistics; I correspond with a greater or lesser frequency with such persons as Aitkin, Taroni, Balding, Dawid, Lauritzen, Peter Gill (no relative – at least, not a close relative), Donnely,  … and of course I meet and talk with them at conferences.

To make a year at NIAS into a true success, and in particular to prepare for a truly interdisciplinary and international conference towards its close, I propose to collaborate specifically and intensively with Hans Nijboer, of the Leiden law faculty, and moreover a frequent NIAS visitor and fellow. We have already discussed this at length and I give his name and our plans here with his warm approval and encouragement. We already have had the splendid experience of an informal workshop, held at my institute, called “Conviction Intime: A Research Kitchen on legal versus scientific proof and investigation”. The dozen or so participants came from our joint network of lawyers, medical scientists, pathologists, forensic scientists, statisticians. The informal nature of the meeting, with the rules that everyone could say what they liked, but that what they said should be thought of as confidential and reserved for the intimate circle of those present, allowed heated, passionate, no-holds-barred debate; which led to clarification of positions, discovery of common ground, and great hope for future collaboration. (Persons should be respected, but their prior convictions should be open to criticism.) A number of persons previously totally sceptical about the possibility of any useful communication between the two camps, and on both sides, had totally discarded their scepticism by the end of the gruelling day.