Statistical notes, data, analyses in the Lucia de Berk case
===========================================================
Statistical analyses with the open
source and free package R
http://www.r-project.org/
=====================================
The data consists of two sets of three 2x2
tables: the "new data" and the "original data".
The "new data" has been obtained by scrutiny of
original hospital records by Metta de Noo, and also
by applying a legal constraint: suspicious incidents
(deaths and reanimations), suspicious meaning that there
is no medical explanation of why "precisely then".
The incidents are split according to whether
or not Lucia was on duty, and split over
three wards at two hospitals: one ward at JKZ,
two wards at RKZ (wards 41 and 42).
The legal constraint is that incidents during
shifts of Lucia but where Lucia was NOT finally
judged guilty of (attempted) murder are removed.
She could not be called responsible,
because it was too plausible that these
were "natural" events. There are not only legal but
also (good) statistical (noise-reduction) reasons why such
incidents should be removed. However, the question now
arises whether non-Lucia incidents, which have only
recently been uncovered, are suspicious enough,
that had Lucia been on duty, she would have been
sentenced for those cases also. A uniform and objective
definition of incident should be used throughout,
and it should be applied without any information as to
which nurses were on duty. (Some of the existing
incidents were ony deemed suspicious by medical
specialists because Lucia was present)
Why two wards at RKZ are taken separately --
they are adjacent, the same nurses work in both,
they have the same function) -- but a sister-ward
of the one ward at JKZ is not included --
where Lucia also worked and where she was
initially also under suspicion -- is a mystery.
Biased data selection? Leaving out the other JKZ ward
works against Lucia, splitting the two RKZ wards
works against her too, when Elffers' unjustified
and unjustifiable method of combining wards is used
(multiplication of independent p-values without
correction for the number of p-values so combined).
######################################################
# The new data
Nurse <- c("Lucia", "Others")
Shift <- c("Incident", "Normal")
Ward <- c("JKZ", "RKZ41", "RKZ42")
JKZnew<-matrix(c(5,2,137,885),nr=2, dimnames = list(Nurse, Shift))
RKZ41new<-matrix(c(1,4,2,359),nr=2, dimnames = list(Nurse, Shift))
RKZ42new<-matrix(c(1,10,57,271),nr=2, dimnames = list(Nurse, Shift))
ALLnew<-array(c(JKZnew,RKZ41new,RKZ42new),c(2,2,3),list(Nurse,Shift,Ward))
ALLnew
#### output ###############
#
#
, , JKZnew
Incident Normal
Lucia 5 137
Others 2 885
, , RKZ41new
Incident Normal
Lucia 1 2
Others 4 359
, , RKZ42new
Incident Normal
Lucia 1 57
Others 10 271
#
#
###########################
fisher.test(JKZnew,alternative="greater")
#### output ###############
#
#
data: JKZnew
p-value = 0.0007805
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
3.295461 Inf
sample estimates:
odds ratio
16.07117
#
#
###########################
fisher.test(RKZ41new,alternative="greater")
#### output ###############
#
#
data: RKZ41new
p-value = 0.04054
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
1.242436 Inf
sample estimates:
odds ratio
41.81938
#
#
###########################
fisher.test(RKZ42new,alternative="greater")
#### output ###############
#
#
data: RKZ42new
p-value = 0.8773
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
0.02187728 Inf
sample estimates:
odds ratio
0.4762666
#
#
###########################
pvalues<-c(fisher.test(JKZnew,alternative="greater")$p.value,
fisher.test(RKZ41new,alternative="greater")$p.value,
fisher.test(RKZ42new,alternative="greater")$p.value)
stat <- -sum(log(pvalues))
1-pgamma(stat,3,1) # Fisher combination method
#### output ###############
#
#
0.001846793
#
#
###########################
27*(1-pgamma(stat,3,1)) # idem, plus Bonferoni "post-hoc correction"
#### output ###############
#
#
0.04986342
#
#
###########################
######################################################
# Quick and Dirty: just add the three tables.
MERGEnew<-JKZnew+RKZ41new+RKZ42new
MERGEnew
#### output ###############
#
#
Incident Normal
Lucia 7 196
Others 16 1515
#
#
###########################
fisher.test(MERGEnew,alternative="greater")
#### output ###############
#
#
data: MERGEnew
p-value = 0.01288
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
1.373202 Inf
sample estimates:
odds ratio
3.377780
#
#
###########################
27*fisher.test(MERGEnew,alternative="greater")$p
#### output ###############
#
#
0.3477224
#
#
###########################
######################################################
#
#The professional solution:
#
#Cochran-Mantel-Haenszel Chi-Squared Test for Count Data:
#a test of the null that two nominal variables are
#conditionally independent in each stratum,
#with large power against the alternative of dependence
#without three-way interaction (ie, Lucia is equally
#murderous -- or at least, equally different, in terms
#of her odds ratio -- in all wards).
#
mantelhaen.test(ALLnew,alternative ="greater",correct
=TRUE,exact=TRUE, conf.level=0.95)
#### output ###############
#
#
data: ALLnew
S = 7, p-value = 0.01605
alternative hypothesis: true common odds ratio is greater than 1
95 percent confidence interval:
1.318301 Inf
sample estimates:
common odds ratio
3.410813
#
#
###########################
27*mantelhaen.test(ALL,alternative ="less",correct
=TRUE,exact=TRUE, conf.level=0.95)$p
#### output ###############
#
#
0.4333131
#
#
###########################
##############################################################
#
#The original data (collected by the public prosecutor and
#analysed by Elffers). The definition of suspicious/unexplained
#incident is rather vague. The police/prosecution felt that
#incidents at JKZ occuring when Lucia was not on duty
#can't have been suspicious so they are excluded. A number
#of incidents are included where Lucia was on duty but
#for which she was not finally charged
#
##############################################################
Nurse <- c("Lucia", "Others")
Shift <- c("Incident", "Normal")
Ward <- c("JKZold", "RKZ41old", "RKZ42old")
JKZold <-matrix(c(8,0,134,887),nr=2,dimnames = list(Nurse,Shift))
RKZ41old<-matrix(c(1,4,0,361),nr=2,dimnames = list(Nurse,Shift))
RKZ42old<-matrix(c(5,9,53,272),nr=2,dimnames = list(Nurse,Shift,))
ALLold<-array(c(JKZold,RKZ41old,RKZ42old),c(2,2,3),list(Shift,Nurse,Ward))
ALLold
#### output ###############
#
, , JKZold
Lucia Others
Incident 8 134
Normal 0 887
, , RKZ41old
Lucia Others
Incident 1 0
Normal 4 361
, , RKZ42old
Lucia Others
Incident 5 53
Normal 9 272
#
#
###########################
pvaluesold<-c(fisher.test(JKZold,alternative="greater")$p.value,
fisher.test(RKZ41old,alternative="greater")$p.value,
fisher.test(RKZ42old,alternative="greater")$p.value)
1/(27*prod(pvaluesold))
#### output ###############
#
#
342638149 # Elffers "one in 342 million"
#
#
###########################
round(1/(27*mantelhaen.test(ALLold,alternative ="greater",correct=TRUE,exact=TRUE, conf.level=0.95)$p))
#### output ###############
#
#
97707 # only one in 100 thousand, by Mantel-Haenszel
#
#
###########################
MERGEold<-JKZold+RKZ41old+RKZ42old
round(1/(27*fisher.test(MERGEold,alternative="greater")$p))
#### output ###############
#
#
141494 # only one in 140 thousand, by quick and dirty merge
#
#
###########################
statold <- -sum(log(pvaluesold))
1/(27*(1-pgamma(statold,3,1)))
#### output ###############
#
#
1192806 # one in a million, by Fisher combination
#
#
###########################
########################################################################
The Juliana Kinder-Ziekenhuis is the hospital where Lucia was
working when Amber died. The Rode Kruis Ziekenhuis was being
merged with JKZ and shared the same director (Smits). There can
be many nurses on duty at the same time, we classify shifts
according to whether or not Lucia was on duty (possibly
together with others), and whether or not a medically unexpected
incident (or death) occurred: a medical crisis
necessitating an attempt at reanimation. Incidents which
retrospectively could be given a convincing natural
medical explanation as to why they occurred at that
particular moment, are excluded. The incidents which remain
are actually unexpected incidents in the illness of rather
ill patients; though no medical reason could be given that that
particular incident occurred at that particular moment
(unless it was by poison administered by Lucia?)
it need not be unlikely that such an incident would
have occurred "one of those days".
The wards are all "medium care" wards. This does not mean they
are safer; a glance at some statistics shows that you are much
likelier to die in medium-care than in intensive-care.
The new data is as I obtained if from Ton Derksen
recently (22/11/06). Study of original medical records
from time periods when Lucia was not on duty - something
which the prosecution failed to do - had turned
up several new "non Lucia" incidents (Metta de Noo).
I use a Bonferoni-like post-hoc correction with a factor
27 (the number of nurses in JKZ) since I am imitating
Henk Elffers, who believed that it had already been
established that many deaths were occurring (there weren't),
and wanted to investigate if they were associated with one
of the nurses. The number of nurses at RKZ 41 and 42 is not
available. Those two wards were so similar and so close by
that the same nurses work at different times in both wards.
The ward at JKZ also has a "sister ward" but no data is
available from there, even though Lucia was originally
suspected of murders there too.
The children's hospital JKZ is obviously very different
from the "ordinary " hospital RKZ, but there seems to be
no real difference between wards 41 and 42 at the latter.
Probably 41 is kept separate because of the striking
feature that (originally) there was only one incident
at 41, and Lucia was present; the case of this old
lady (with terminal cancer) was made a lot of
in the prosecution's medical arguments. Another patient
at RKZ was a notable (and very elderly) Chinese judge
from the International Court of Justice. At JKZ
(a children's hostpital) many patients were babies or
young children with severe birth defects associated
with marriage between close blood relatives
between Moroccan immigrants. Several were being
treated under a "NTBR" regime - "not to be resuscitated".
I have the impression that this could be the reason
that one child, for instance, was being given amounts
of (calming) medicine far above the highest recommended
dose and in disapproved combination with other
psycho-pharmaceutical treatment. The poison
which allegedly killed another had been used
medicinally in large quantities in a recent
heart operation which seems to have also left
some "gauze" behind, inside the body of the child.
My present opinion is that it is plausible
that Lucia experienced more than the average number
of incidents, but not to a large degree.
I think it highly plausible that there is
natural / innocent heterogeneity between nurses.
This means that it is likely for one nurse to experience
strikingly more incidents than others, for purely innocent
reasons. In the Netherlands this means that innocent nurses
are going to be sent to jail for life much more frequently than
serial killer nurses are active, let alone caught out. So far,
no quantitative evidence is available, to my knowledge, for or
against this heterogeneity. The medical specialist community does
not believe in it, members of the nursing community do.
Nurses do have a large part to play in choosing their shifts,
for instance how many night shifts they get, when they are
likely to be on their own and have a more responsible task,
and at a time of day when there are more likely to be
incidents; terminally ill people tend to prefer to die
at night.
I have also analysed this data supposing a gamma distributed
intensity of incidents per nurse, with standard deviation roughly
equal to the mean (ie, an exponential distribution), ie,
coefficient of variation approximately 1; this seems
very plausible if one listens to nurses' experiences.
This amount of heterogeneity made Lucia's share in the
total number of incidents high, but not surprisingly high.
I am both combining the three 2x2 tables using Fisher's
method, and using the Mantel-Haenszel test. I consider the
Cochran-Maentel-Haenszel test much more more reasonable,
since Fisher's method is known to have poor
power against sensible alternatives. Fisher's method
treats all tables equally, even though the amount of data
in each table is different and the marginal probabilities
are different, hence the statistical value of the tables
can vary strikingly. The advantage of the Fisher method is that
it is quick and easy. Calculation of the power of the (per table)
chi square test leads to another almost quick and easy way to
combine: convert the three p-vales to normal quantiles, then weight
according to the non-centrality parameters of the (root)
chi-square one statistics. This should give very similar
results as Mantel-Haenszel if we assume that the odds ratios
are equal over the three tables. There doesn't seem to be any reason
to prefer any other alternative. A more "omnibus" approach -- eg,
add the three chi-square(1) statistics to get a chi-square(3) --
results in lower power still against interesting alternatieves,
in favour of power against alternatives that are not on the cards
at all.
Whatever statistic is used, the null hypothesis distribution
can always be computed under the null hypothesis
of symmetry within each 2x2 table.
In other words, assuming a particular kind of alternative
does not compromize the null-hypothesis validity of the test.
By the way, the weights on the normal quantiles
to convert the three p-values to something
locally asymptotically equivalent to Mantel-Haensel under
constant odds ratio alternatives, are proportional to the
square root of the product of all four marginal probabilities
with the total sample sizes. One must be careful
doing the conversion from p-value of chi-squared(1) to
normal quantile. We want to convert from a one-sided test
to a one-sided test. Thus if the observed odds ratio
is actually in favour of Lucia, which happens in one ward, with
the new data, then the p-value is much larger than one half.
Derksen has simply merged (added) the three 2x2 tables obtaining essentially
the same result as I did with Mantel-Haenszel. This would be exactly
what we would expect if there is no difference in marginal frequency
of incidents in the three wards, since in that case, merging tables
corresponds to wanting power against constant odds ratio. Fisher
gives a somewhat lower p-value. All this is consistent with the idea
that a coincidental large number of incidents in Lucia's shifts was
first observed at JKZ but could only be barely confirmed at RKZ
41 and 42 (now 42 shows Lucia having less than expected incidents;
originally this was the other way round!).
Another ward at RKZ and wards at the other hospitals
where Lucia was supposed to be serially
killing people has never been made available. The suggestion is that
the data was collected till a small enough p-value was obtained using
the totally incorrect method of multiplying p-values of independent
tests. This multiplication was motivated by the judge by recourse
to a medical expert who said that of course
you can multiply independent probabilities.
Of course you can, if you are really interested in the probability of
A and B and C and ... The more you go on multiplying, the smaller
the result gets and eventually every nurse in the Netherlands is
sent to jail - she just needs to work in enough different
wards .. If not we can always look at her data on a per year basis.
20 years will be plenty, to sentence most of the nurses in Holland.