Statistical notes, data, analyses in the Lucia de Berk case =========================================================== Statistical analyses with the open source and free package R http://www.r-project.org/ ===================================== The data consists of two sets of three 2x2 tables: the "new data" and the "original data". The "new data" has been obtained by scrutiny of original hospital records by Metta de Noo, and also by applying a legal constraint: suspicious incidents (deaths and reanimations), suspicious meaning that there is no medical explanation of why "precisely then". The incidents are split according to whether or not Lucia was on duty, and split over three wards at two hospitals: one ward at JKZ, two wards at RKZ (wards 41 and 42). The legal constraint is that incidents during shifts of Lucia but where Lucia was NOT finally judged guilty of (attempted) murder are removed. She could not be called responsible, because it was too plausible that these were "natural" events. There are not only legal but also (good) statistical (noise-reduction) reasons why such incidents should be removed. However, the question now arises whether non-Lucia incidents, which have only recently been uncovered, are suspicious enough, that had Lucia been on duty, she would have been sentenced for those cases also. A uniform and objective definition of incident should be used throughout, and it should be applied without any information as to which nurses were on duty. (Some of the existing incidents were ony deemed suspicious by medical specialists because Lucia was present) Why two wards at RKZ are taken separately -- they are adjacent, the same nurses work in both, they have the same function) -- but a sister-ward of the one ward at JKZ is not included -- where Lucia also worked and where she was initially also under suspicion -- is a mystery. Biased data selection? Leaving out the other JKZ ward works against Lucia, splitting the two RKZ wards works against her too, when Elffers' unjustified and unjustifiable method of combining wards is used (multiplication of independent p-values without correction for the number of p-values so combined). ###################################################### # The new data Nurse <- c("Lucia", "Others") Shift <- c("Incident", "Normal") Ward <- c("JKZ", "RKZ41", "RKZ42") JKZnew<-matrix(c(5,2,137,885),nr=2, dimnames = list(Nurse, Shift)) RKZ41new<-matrix(c(1,4,2,359),nr=2, dimnames = list(Nurse, Shift)) RKZ42new<-matrix(c(1,10,57,271),nr=2, dimnames = list(Nurse, Shift)) ALLnew<-array(c(JKZnew,RKZ41new,RKZ42new),c(2,2,3),list(Nurse,Shift,Ward)) ALLnew #### output ############### # # , , JKZnew Incident Normal Lucia 5 137 Others 2 885 , , RKZ41new Incident Normal Lucia 1 2 Others 4 359 , , RKZ42new Incident Normal Lucia 1 57 Others 10 271 # # ########################### fisher.test(JKZnew,alternative="greater") #### output ############### # # data: JKZnew p-value = 0.0007805 alternative hypothesis: true odds ratio is greater than 1 95 percent confidence interval: 3.295461 Inf sample estimates: odds ratio 16.07117 # # ########################### fisher.test(RKZ41new,alternative="greater") #### output ############### # # data: RKZ41new p-value = 0.04054 alternative hypothesis: true odds ratio is greater than 1 95 percent confidence interval: 1.242436 Inf sample estimates: odds ratio 41.81938 # # ########################### fisher.test(RKZ42new,alternative="greater") #### output ############### # # data: RKZ42new p-value = 0.8773 alternative hypothesis: true odds ratio is greater than 1 95 percent confidence interval: 0.02187728 Inf sample estimates: odds ratio 0.4762666 # # ########################### pvalues<-c(fisher.test(JKZnew,alternative="greater")\$p.value, fisher.test(RKZ41new,alternative="greater")\$p.value, fisher.test(RKZ42new,alternative="greater")\$p.value) stat <- -sum(log(pvalues)) 1-pgamma(stat,3,1) # Fisher combination method #### output ############### # # 0.001846793 # # ########################### 27*(1-pgamma(stat,3,1)) # idem, plus Bonferoni "post-hoc correction" #### output ############### # # 0.04986342 # # ########################### ###################################################### # Quick and Dirty: just add the three tables. MERGEnew<-JKZnew+RKZ41new+RKZ42new MERGEnew #### output ############### # # Incident Normal Lucia 7 196 Others 16 1515 # # ########################### fisher.test(MERGEnew,alternative="greater") #### output ############### # # data: MERGEnew p-value = 0.01288 alternative hypothesis: true odds ratio is greater than 1 95 percent confidence interval: 1.373202 Inf sample estimates: odds ratio 3.377780 # # ########################### 27*fisher.test(MERGEnew,alternative="greater")\$p #### output ############### # # 0.3477224 # # ########################### ###################################################### # #The professional solution: # #Cochran-Mantel-Haenszel Chi-Squared Test for Count Data: #a test of the null that two nominal variables are #conditionally independent in each stratum, #with large power against the alternative of dependence #without three-way interaction (ie, Lucia is equally #murderous -- or at least, equally different, in terms #of her odds ratio -- in all wards). # mantelhaen.test(ALLnew,alternative ="greater",correct =TRUE,exact=TRUE, conf.level=0.95) #### output ############### # # data: ALLnew S = 7, p-value = 0.01605 alternative hypothesis: true common odds ratio is greater than 1 95 percent confidence interval: 1.318301 Inf sample estimates: common odds ratio 3.410813 # # ########################### 27*mantelhaen.test(ALL,alternative ="less",correct =TRUE,exact=TRUE, conf.level=0.95)\$p #### output ############### # # 0.4333131 # # ########################### ############################################################## # #The original data (collected by the public prosecutor and #analysed by Elffers). The definition of suspicious/unexplained #incident is rather vague. The police/prosecution felt that #incidents at JKZ occuring when Lucia was not on duty #can't have been suspicious so they are excluded. A number #of incidents are included where Lucia was on duty but #for which she was not finally charged # ############################################################## Nurse <- c("Lucia", "Others") Shift <- c("Incident", "Normal") Ward <- c("JKZold", "RKZ41old", "RKZ42old") JKZold <-matrix(c(8,0,134,887),nr=2,dimnames = list(Nurse,Shift)) RKZ41old<-matrix(c(1,4,0,361),nr=2,dimnames = list(Nurse,Shift)) RKZ42old<-matrix(c(5,9,53,272),nr=2,dimnames = list(Nurse,Shift,)) ALLold<-array(c(JKZold,RKZ41old,RKZ42old),c(2,2,3),list(Shift,Nurse,Ward)) ALLold #### output ############### # , , JKZold Lucia Others Incident 8 134 Normal 0 887 , , RKZ41old Lucia Others Incident 1 0 Normal 4 361 , , RKZ42old Lucia Others Incident 5 53 Normal 9 272 # # ########################### pvaluesold<-c(fisher.test(JKZold,alternative="greater")\$p.value, fisher.test(RKZ41old,alternative="greater")\$p.value, fisher.test(RKZ42old,alternative="greater")\$p.value) 1/(27*prod(pvaluesold)) #### output ############### # # 342638149 # Elffers "one in 342 million" # # ########################### round(1/(27*mantelhaen.test(ALLold,alternative ="greater",correct=TRUE,exact=TRUE, conf.level=0.95)\$p)) #### output ############### # # 97707 # only one in 100 thousand, by Mantel-Haenszel # # ########################### MERGEold<-JKZold+RKZ41old+RKZ42old round(1/(27*fisher.test(MERGEold,alternative="greater")\$p)) #### output ############### # # 141494 # only one in 140 thousand, by quick and dirty merge # # ########################### statold <- -sum(log(pvaluesold)) 1/(27*(1-pgamma(statold,3,1))) #### output ############### # # 1192806 # one in a million, by Fisher combination # # ########################### ######################################################################## The Juliana Kinder-Ziekenhuis is the hospital where Lucia was working when Amber died. The Rode Kruis Ziekenhuis was being merged with JKZ and shared the same director (Smits). There can be many nurses on duty at the same time, we classify shifts according to whether or not Lucia was on duty (possibly together with others), and whether or not a medically unexpected incident (or death) occurred: a medical crisis necessitating an attempt at reanimation. Incidents which retrospectively could be given a convincing natural medical explanation as to why they occurred at that particular moment, are excluded. The incidents which remain are actually unexpected incidents in the illness of rather ill patients; though no medical reason could be given that that particular incident occurred at that particular moment (unless it was by poison administered by Lucia?) it need not be unlikely that such an incident would have occurred "one of those days". The wards are all "medium care" wards. This does not mean they are safer; a glance at some statistics shows that you are much likelier to die in medium-care than in intensive-care. The new data is as I obtained if from Ton Derksen recently (22/11/06). Study of original medical records from time periods when Lucia was not on duty - something which the prosecution failed to do - had turned up several new "non Lucia" incidents (Metta de Noo). I use a Bonferoni-like post-hoc correction with a factor 27 (the number of nurses in JKZ) since I am imitating Henk Elffers, who believed that it had already been established that many deaths were occurring (there weren't), and wanted to investigate if they were associated with one of the nurses. The number of nurses at RKZ 41 and 42 is not available. Those two wards were so similar and so close by that the same nurses work at different times in both wards. The ward at JKZ also has a "sister ward" but no data is available from there, even though Lucia was originally suspected of murders there too. The children's hospital JKZ is obviously very different from the "ordinary " hospital RKZ, but there seems to be no real difference between wards 41 and 42 at the latter. Probably 41 is kept separate because of the striking feature that (originally) there was only one incident at 41, and Lucia was present; the case of this old lady (with terminal cancer) was made a lot of in the prosecution's medical arguments. Another patient at RKZ was a notable (and very elderly) Chinese judge from the International Court of Justice. At JKZ (a children's hostpital) many patients were babies or young children with severe birth defects associated with marriage between close blood relatives between Moroccan immigrants. Several were being treated under a "NTBR" regime - "not to be resuscitated". I have the impression that this could be the reason that one child, for instance, was being given amounts of (calming) medicine far above the highest recommended dose and in disapproved combination with other psycho-pharmaceutical treatment. The poison which allegedly killed another had been used medicinally in large quantities in a recent heart operation which seems to have also left some "gauze" behind, inside the body of the child. My present opinion is that it is plausible that Lucia experienced more than the average number of incidents, but not to a large degree. I think it highly plausible that there is natural / innocent heterogeneity between nurses. This means that it is likely for one nurse to experience strikingly more incidents than others, for purely innocent reasons. In the Netherlands this means that innocent nurses are going to be sent to jail for life much more frequently than serial killer nurses are active, let alone caught out. So far, no quantitative evidence is available, to my knowledge, for or against this heterogeneity. The medical specialist community does not believe in it, members of the nursing community do. Nurses do have a large part to play in choosing their shifts, for instance how many night shifts they get, when they are likely to be on their own and have a more responsible task, and at a time of day when there are more likely to be incidents; terminally ill people tend to prefer to die at night. I have also analysed this data supposing a gamma distributed intensity of incidents per nurse, with standard deviation roughly equal to the mean (ie, an exponential distribution), ie, coefficient of variation approximately 1; this seems very plausible if one listens to nurses' experiences. This amount of heterogeneity made Lucia's share in the total number of incidents high, but not surprisingly high. I am both combining the three 2x2 tables using Fisher's method, and using the Mantel-Haenszel test. I consider the Cochran-Maentel-Haenszel test much more more reasonable, since Fisher's method is known to have poor power against sensible alternatives. Fisher's method treats all tables equally, even though the amount of data in each table is different and the marginal probabilities are different, hence the statistical value of the tables can vary strikingly. The advantage of the Fisher method is that it is quick and easy. Calculation of the power of the (per table) chi square test leads to another almost quick and easy way to combine: convert the three p-vales to normal quantiles, then weight according to the non-centrality parameters of the (root) chi-square one statistics. This should give very similar results as Mantel-Haenszel if we assume that the odds ratios are equal over the three tables. There doesn't seem to be any reason to prefer any other alternative. A more "omnibus" approach -- eg, add the three chi-square(1) statistics to get a chi-square(3) -- results in lower power still against interesting alternatieves, in favour of power against alternatives that are not on the cards at all. Whatever statistic is used, the null hypothesis distribution can always be computed under the null hypothesis of symmetry within each 2x2 table. In other words, assuming a particular kind of alternative does not compromize the null-hypothesis validity of the test. By the way, the weights on the normal quantiles to convert the three p-values to something locally asymptotically equivalent to Mantel-Haensel under constant odds ratio alternatives, are proportional to the square root of the product of all four marginal probabilities with the total sample sizes. One must be careful doing the conversion from p-value of chi-squared(1) to normal quantile. We want to convert from a one-sided test to a one-sided test. Thus if the observed odds ratio is actually in favour of Lucia, which happens in one ward, with the new data, then the p-value is much larger than one half. Derksen has simply merged (added) the three 2x2 tables obtaining essentially the same result as I did with Mantel-Haenszel. This would be exactly what we would expect if there is no difference in marginal frequency of incidents in the three wards, since in that case, merging tables corresponds to wanting power against constant odds ratio. Fisher gives a somewhat lower p-value. All this is consistent with the idea that a coincidental large number of incidents in Lucia's shifts was first observed at JKZ but could only be barely confirmed at RKZ 41 and 42 (now 42 shows Lucia having less than expected incidents; originally this was the other way round!). Another ward at RKZ and wards at the other hospitals where Lucia was supposed to be serially killing people has never been made available. The suggestion is that the data was collected till a small enough p-value was obtained using the totally incorrect method of multiplying p-values of independent tests. This multiplication was motivated by the judge by recourse to a medical expert who said that of course you can multiply independent probabilities. Of course you can, if you are really interested in the probability of A and B and C and ... The more you go on multiplying, the smaller the result gets and eventually every nurse in the Netherlands is sent to jail - she just needs to work in enough different wards .. If not we can always look at her data on a per year basis. 20 years will be plenty, to sentence most of the nurses in Holland.