Here's a critique I wrote previously on this study. I'd welcome any comments or questions - it is somewhat technical, but certainly not rocket science:
Here’s a scattering of thoughts and some lengthier critiques. As I stated before, the underlying statistical modeling is sound, but its practical application is flawed. Because it’s such a long post, I’ve included xhtml formatting to hopefully make it more readable. Here’s the actual paper once again: Mortality before and after the 2003 invasion of Iraq: cluster sample survey.
1. Misreporting.
This isn’t the fault of the authors of the paper. This is due to reporters trying to draw conclusions that can’t be made from the paper and data. Some examples below:
*You cannot conclude that the 98K deaths are only civilian, since the authors don’t even attempt to exclude insurgents/terrorists/military. So, this estimated number of deaths includes terrorists/insurgents/military.
*The data presented in the paper doesn’t establish any statistical correlation between the increased death rate in the post-OIF period and coalition actions. Now, deaths from coalition use of force don’t require a statistical correlation, since they are by definition correlated; however, is it fair, for example, to attribute increased accidents to the invasion? So, you cannot conclude from this report that the invasion is responsible for their statistical findings. Instead, a regression analysis would be required to establish these relationships.
2. Fundamental flaws.
The authors of this study chose a cluster sample technique to develop their sample population. Using this technique, elements of the population are divided into a number of clusters. You then choose a sample of these clusters at random, after which a simple random sample of the elements in each chosen cluster is selected. The advantage is that it is easier to sample a small number of clusters while still getting a large sample that should better approximate a normal distribution. The disadvantage is that your results are less precise than those from a simple random sample of the same size.
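For readers who haven't seen the technique, here is a minimal sketch of two-stage cluster sampling as described above. The population, the cluster names, and the function `two_stage_cluster_sample` are all hypothetical and made up for illustration; this is not the study's actual sampling procedure.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Stage 1: pick clusters at random. Stage 2: simple random sample in each."""
    rng = random.Random(seed)
    # Stage 1: choose which clusters to visit.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # Stage 2: simple random sample of elements within the chosen cluster.
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: 10 clusters of 50 households each.
population = {
    f"cluster_{i}": [f"c{i}_hh{j}" for j in range(50)] for i in range(10)
}

picked = two_stage_cluster_sample(population, n_clusters=3, n_per_cluster=5)
for name, households in picked.items():
    print(name, households)
```

Only 3 of the 10 clusters are ever visited, which is what makes the design cheap in the field, and also why a few unusual clusters can move the results a lot.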
With that in mind, the study determined that they should visit 33 clusters spread across the 18 different governorates, with each governorate receiving a number of clusters proportional to its share of the overall population. At this point, their methodology is sound (although it assumes that you have correct census data; this assumption may be flawed and is addressed below). However, due to potential security risks, they then deviated from this setup and began pairing governorates under the false assumption that they had similar economic and violence statistics during the three years preceding the study:
“To lessen risks to investigators, we sought to minimise travel distances and the number of Governorates to visit, while still sampling from all regions of the country. We did this by clumping pairs of Governorates. Pairs were adjacent Governorates that the Iraqi study team members believed to have had similar levels of violence and economic status during the preceding 3 years. The paired Governorates were: Basrah and Missan, Dhi Qar and Qadisiyah, Najaf and Karbala, Salah ad Din and Tamin, Arbil and Sulaymaniya, and Dehuk and Ninawa.”
There are two pairings that are incompatible, based on my experience in these governorates during my time in Iraq as well as on following OIF in the news:
Dahuk is above the Green Line in Kurdistan, while Ninevah is below the Green Line; with its cross-sect population and its status as one of the four provinces where the insurgency was not under full control, pairing it with Dahuk skews the results. As a point of reference, US soldiers in Dahuk take off all their protective body armor, are allowed to roam freely through the city, and rarely have to pay for meals, as the Kurds often won’t let them pay. By contrast, in Mosul, units rarely operate in less than platoon strength and never remove their protective body armor while on patrol. Next, the economies of the two cities are vastly different. To give you a better perspective, here’s a great blog entry: Fork in the Road
Salah ad Din is also one of the four provinces where the insurgency is not considered under control. This pairing is not as blatant as the previous example, since there has been violence in Kirkuk (in Tamin), but that violence definitely doesn’t compare to what has been seen in Tikrit, which was Saddam’s hometown, Bayji, which contains key oil pumps, and Samarra.
So, the result of this altered methodology was to skew the sample towards areas that suffered more violence and to skew the results. How does this skew the results? First, the sample mean would be lowered. Second, by taking statistics that are close to the mean and replacing them with statistics that would be closer to zero, you would actually increase the variance and hence widen the confidence intervals. So, you end up with a much greater chance of statistically insignificant results, meaning that you cannot make a scientific finding that the death rate increased in the post-OIF period. Obviously, this is an intuitive deduction, so while it is more likely than not to be true, actual sample results would be required to draw a definitive conclusion.
3. Biased conclusions implied based on their language
Fallujah. “In Falluja, the team noted that vast areas of the city had been devastated to an equal or worse degree than the area they had randomly chosen to survey.”
Here are some calculations that make the above statement appear very biased. Now, since I don’t have access to their data, I use the average persons per cluster calculation below to extrapolate the findings’ predicted number of deaths in Fallujah. This means that the calculations will most likely be slightly off, but that doesn’t detract from the solid conclusion that Fallujah is an outlier and that the study in the same breath calls it an outlier while implying that it’s the norm as well.
7868 people surveyed / 33 clusters = 238 people per cluster
256K people living in Fallujah / 238 people per cluster = 1075, i.e. the sample is 1/1075th of the population
1075 x 52 violent deaths = 55,900 Fallujans dead
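The three lines of arithmetic above can be checked with a few lines of Python. This is just a sketch that follows the same rounding as the hand calculation; the variable names are mine, and the inputs are the figures quoted in this post, not the study's raw data.

```python
# Figures quoted in the post (not the study's raw cluster data).
surveyed_people = 7868
clusters = 33
fallujah_population = 256_000
violent_deaths_in_cluster = 52

# Average cluster size, rounded the same way as the hand calculation.
people_per_cluster = round(surveyed_people / clusters)

# How many cluster-sized samples fit into Fallujah's population.
scale_factor = fallujah_population // people_per_cluster

# Scale the cluster's violent deaths up to the whole city.
extrapolated_deaths = scale_factor * violent_deaths_in_cluster

print(people_per_cluster, scale_factor, extrapolated_deaths)  # 238 1075 55900

# Same extrapolation without the intermediate rounding.
unrounded = fallujah_population / (surveyed_people / clusters) * violent_deaths_in_cluster
print(round(unrounded))
```

Skipping the intermediate rounding moves the result by well under one percent, so the rough figure of about 55,900 does not depend on how the steps are rounded.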
If there are that many dead, how many were wounded?
So, if you look at the calculations above coupled with their qualitative statement, the conclusion that is drawn from the study is that surveying a different neighborhood means that they still would have arrived at sample statistics resulting in at least 55,900 Fallujans dead. WTF?
Pre-invasion deaths and discussion bias. Next, look at the number of violent deaths prior to OIF across all of Iraq. Only 1 violent death in the sample prior to the invasion? Saddam the benevolent?
They were quick to point out that they didn’t visit Ramadi, Tal Afar, Najaf, and that the Sadr City neighborhood was unscathed. To be fair, they do discuss ways in which their estimate is biased upwards, one-sided presentation bias; maybe this particular point would be better off in the misreporting section, as it does provide specific ammunition for those who wish to champion the thought that the study’s estimates were low.
4. Failure to separate insurgent/terrorist deaths from civilian deaths.
“Many of the Iraqis reportedly killed by US forces could have been combatants. 28 of 61 killings (46%) attributed to US forces involved men age 15–60 years . . . [I]t is not clear if the greater number of male deaths was attributable to legitimate targeting of combatants who may have been disproportionately male, or if this was because men are more often in public and more likely to be exposed to danger.”
5. Mortality Rates.
The following discussions point out that the study fails to adequately address data from other studies/estimates. Additionally, I would point out that based on hypothesis testing at their 95% confidence interval, which is a standard CI used in statistical testing, the difference between the pre-war and post-war mortality rates is statistically insignificant (i.e. the pre-war mortality rate lies within the confidence interval of the post-war mortality rate). Therefore, you cannot conclude that these two death rates are statistically different. This doesn’t affect the overall death rate calculations, but hopefully this example shows how you can’t just start drawing conclusions from the study without looking deeper.
During the period before the invasion, from Jan 1, 2002, to March 18, 2003, the interviewed households had 275 births and 46 deaths. The crude mortality rate was 5.0 per 1000 people per year.
After the invasion, from March 19, 2003, to mid-September, 2004, in the interviewed households there were 366 births and 142 deaths; 21 deaths were children younger than 1 year. The crude mortality rate during the period of war and occupation was 12.3 per 1000 people per year (95% CI 1.4–23.2)
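Here is a small sketch of the check I'm describing, using only the rates and the interval quoted above. The helper `lies_within` is my own name, and the standard error is backed out under the assumption that the interval is a symmetric normal-approximation interval (estimate plus or minus 1.96 standard errors); the paper may have computed it differently.

```python
def lies_within(value, low, high):
    """True if value falls inside the closed interval [low, high]."""
    return low <= value <= high


pre_war_rate = 5.0          # deaths per 1000 people per year
post_war_rate = 12.3        # deaths per 1000 people per year
ci_low, ci_high = 1.4, 23.2  # 95% CI quoted for the post-war rate

# Assumption: symmetric interval, estimate +/- 1.96 standard errors.
half_width = (ci_high - ci_low) / 2
implied_se = half_width / 1.96

print("pre-war rate inside post-war CI:", lies_within(pre_war_rate, ci_low, ci_high))
print("half-width of CI:", round(half_width, 1))
print("implied standard error:", round(implied_se, 2))
```

The interval is nearly 22 deaths per 1000 wide, which is why the pre-war figure of 5.0 falls comfortably inside it.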
For this portion of the discussion, I will be cutting and pasting some large portions from the following site: Crooked Timber. None of the following is my work, other than cutting and pasting. It will be easier to just go to the site and read the discussion so you can see the give and take, get both sides, and decide which posts are more convincing. I’ve included the ones I found more convincing.
Comment 99. I haven’t read any of the other threads on this issue at CT, so I don’t know whether it has already been covered or not. But there are two things with the mortality study in the Lancet:
(1) Most people concentrated on the issue of how representative the sample is. Yes, the small sample size is quite normal for health statisticians, but only under three circumstances: (a) that the issue you study isn’t too complex, (b) that you have a stable environment where you can run repeated tests, (c) that you have comparative data to see whether your sample is an outlier or not.
ad (a) general mortality rates in a country are complex at the best of times, even more so under conditions of civil war
ad (b) the Lancet didn’t run repeated tests, the study isn’t reproducible and there are hardly any good comparative data, and those which exist suggest that the Lancet study is an outlier (see below)
In conclusion, the Lancet sample is sufficient for hypothesis-generating and to guide further research, but in itself it would be unwise to prove anything with it as it doesn’t meet two crucial scientific tests (repeated, reproducible tests; comparative data)
(2) But the more important point is not sample size; what’s important is what they compare, ie the “before” and “after”. The 98,000 (8,000-194,000) number comes from a 1.58-fold increase in the overall mortality rate in Iraq from 5 before the war to 7.9 after the war. There’s no other data than the Lancet’s study of 7.9 for after the war as of yet. But there are other numbers for the before the war scenario. In the 1980s, the last time international agencies could check any real numbers, Iraq’s mortality rate was always in the region of 7 to 8 (per 1,000 people). According to the World Bank’s “World Development Indicators” (http://www.worldbank.org/data/wdi200...s/table2-1.pdf) report, the 2002 figure for Iraq was 8 as well.
And if you look at some other reports from various sources, you will find that most put the figure somewhere around 8. Let’s compare this with some other countries. The US had a mortality rate of 9 in 2002, the UK 10, and the average for high-income countries is 9 too. The average for low-income countries, to which Iraq belongs as well, was 8. The world’s average was 9. Now think back: The Lancet study suggests that in 2002 the mortality rate in Iraq was 5. This means that prior to the war, (a) Iraq had one of the lowest mortality rates in its recent history and (b) Iraq had one of the world’s lowest mortality rates. In other words, if you believe the Lancet study, then Iraq before the war was one of the safest, healthiest and best places to live in the world. Do they really believe that? In conclusion, whatever the problems of their small sample size, even if we accept their mortality rate of 7.9 after the war, set against the most likely number of around 8 before the war, it would suggest that there hasn’t been a great increase in excess deaths. So unless some better data comes out, the civilian death toll of the Iraq war stands at about 15,000 (from Iraq Body Count)
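To see how much the pre-war baseline drives the argument in Comment 99, here is a rough sketch of the excess-death arithmetic. It is not the study's calculation. The population of 25 million is an assumption (it is the figure implied by the "3% ... 750,000 people" remark in Comment 156 below), and the 1.5-year window is a rounded version of the March 19, 2003 to mid-September 2004 period; `excess_deaths` is a name I made up for illustration.

```python
POPULATION = 25_000_000   # assumption: implied by "3% = 750,000" in Comment 156
YEARS_POST_INVASION = 1.5  # assumption: roughly March 2003 to mid-September 2004
POST_WAR_RATE = 7.9        # deaths per 1000 per year, as cited in Comment 99


def excess_deaths(pre_war_rate, post_war_rate=POST_WAR_RATE):
    """Excess deaths implied by a change in the crude mortality rate."""
    rate_change = (post_war_rate - pre_war_rate) / 1000
    return rate_change * POPULATION * YEARS_POST_INVASION


# Baseline of 5 (the study) versus 8 (the figure the commenter prefers).
for baseline in (5.0, 8.0):
    print(f"baseline {baseline}: {excess_deaths(baseline):,.0f} excess deaths")
```

With these round inputs, a baseline of 5 gives roughly 109,000 excess deaths (in the same range as the 98,000 headline figure), while a baseline of 8 gives a slightly negative number. The whole result hinges on which pre-war rate you believe.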
Comment 106. Some more data on the mortality or death rate per 1,000 in Iraq before the war: Source: UN Population Division & Unicef. Downloadable for example at: pdf.wri.org/wr98_hh2.pdf. Mortality rate in Iraq: 1975-1980: 8.8; 1995-2000: 8.5. 1980-1991 doesn’t produce any good data because of the Gulf wars; “smart sanctions” were introduced in 1996, so sanctions don’t have a great influence on the data (beyond what was the government’s responsibility). Average figures are especially valuable as they eliminate statistical outliers. I think it can be safely said that Iraq’s “natural” mortality rate is somewhere around 8 per 1,000. Of course, Les Roberts et al. think that the mortality rate in 2002 was 5 per 1,000. Here’s the challenge: Can anyone find anywhere in the academic literature and scientific databases a source which would correspond to Roberts et al’s finding of 5 (+/- 10%) for a year before the Iraq invasion? All the statistics for various years before the invasion lie somewhere around 8, which is the best scientific data we’ve got so far. Up to now, Roberts et al are the only ones who claim it was 5. Now the question is, which claim looks more likely, is based on more solid methodological grounds, is validated by reiterated tests, and provides greater overall consistency with other comparative historical and cross-country datasets? I think it’s pretty obvious that Roberts’ sample is the outlier here, not all the other studies before. A more careful peer-review process would have pointed this out and given Roberts the opportunity to revise & resubmit his work after another round of sampling and a more solid methodological basis. Roberts should have published his initial data on his website, if he wanted to influence the US election with it. But with the premature publication, the Lancet unfortunately lost a lot of academic credibility.
(Counterexample: Suppose a Bush-supporting team had found that the mortality rate in 2002 was 10 and after the invasion 8, with a reduction in deaths by probably somewhere around 50,000—do you think the Lancet would have published it?)
Comment 154. the rather unsatisfactory way dsquared has answered your question in the past is to say that the two estimates (pre- and post-liberation) use the same methodology, so any error should affect both the same (fingers crossed ;-), and even if both numbers are wrong, at least the increase will be correct. But that’s incorrect. The Lancet study is one study, but it consists of two estimates. (1) One estimate, the pre-invasion, relies on memory reporting; the other, the post-invasion, is based more on actual present-day evidence. (2) The sampling population situation pre and post is very different. The war produced great movements of people and other disruptions. As such, for example, the result of the pre-war rate may reflect a post-war sampling error, etc. (3) Most important point: Consistency within one study is nice. Overall consistency with other data points is more important. (4) The same-rate-increase argument would only hold if Roberts et al had done the same study in 2002 (rather than a backward estimate in 2004). To repeat: Roberts et al can only tell us something interesting (however limited) about a first estimate for 2004. Their 2002 estimate is another thing, under different circumstances, needing another methodology (because you have to sample the people as if it were 2002, not after all the changes as if it is 2004). Big picture: I’m getting tired of all this. The main conclusion is that the Lancet study provides some initial results. Further research is urgently needed. Responsible scientists wait for better results before rushing to judgement. To produce a 100,000 excess deaths figure was a highly political move and bad science. Though my politics is closer to Roberts than many people who attack him, I’m just deeply worried about the reputation of proper and responsible science here. People should shut up and forget about the Lancet study for the moment, while concentrating on the things that really matter such as reconstructing Iraq. 
The blame game can be played later, dsquared.
Comment 156. Let’s take up dsquared’s points:
(a) “Because it’s the most recent piece of fieldwork, and because it’s the only fieldwork-based estimate from more recently than 2000.” Yes, it is the most recent after the war, and I accepted that we should take seriously Roberts et al’s estimate for 2004. But that leaves the issue of what the mortality rate was before. I would need to check where bodies such as the World Bank or the UN Population Division get their 2002 data from. But even if you’re right that there hasn’t been a fieldwork estimate since 2000, it is highly unlikely that the mortality rate has so dramatically declined from roughly 8 to 5 in just two years. Again, Roberts et al need to engage with previous estimates and show why they have been wrong; otherwise the odds are that they represent the outlier and not the other studies.
(b) “Also, because it is in no sense an ‘exploratory’ sample; it consisted of over 7000 individuals in 33 clusters.” First, it’s 30 households in 33 locations, which makes 990 data points (which happen to be roughly 7,000 people large, but that’s not the sample size, the sample size is 30×33). Second, it is largely exploratory because key environmental and demographic basis census data are missing so that in order to estimate the sample they had to estimate an awful lot of fundamental data which are normally given and not much debated in stable conditions.
(c) “You will notice that no serious epidemiologist has made any of the criticisms you’re scrabbling for.” First, I notice that there hasn’t been any serious scientific debate on it; largely because there can’t be at the moment because it’s too early. Most scientists have (rightly) defended Roberts et al. against some of the wildest claims, but the verdict on his scientific work is still out and we all need to wait for further research.
Second, I’m a statistician and the basic principles underlying Roberts et al’s study are very simple and can be grasped by a normally educated person, so we don’t need the authority of scientists as the ultimate judge whether the study is flawed or not. Third, proper science is slow. Proper peer-review processes take months and rejoinders take another couple of months. I’m sure in a year’s time or so other teams of scientists will have evaluated the Lancet study and written their responses or even done alternative estimates etc. We only had a political debate about the Lancet study so far, let’s wait a bit to let the scientific debate begin.
(d) “anyone who starts talking about sampling error without so much as considering the possibility of an underestimate (despite the fact that we know that there were clusters like Fallujah which had much higher death rates), loses a whole lot of credibility in my eyes.” I didn’t rule out an underestimate. (As I wrote before, I’m sure the mortality rate today is higher than before. It might be by 8,000, or by 98,000 or by 300,000. We don’t know. Unfortunately Roberts et al’s estimate doesn’t look very solid).
(e) “if there was not a very material rise in the death rate in Iraq due to the war, how likely would it be that you would get a sample of 7000 Iraqis which reported such a large rise in the death rate?” I’m a bit puzzled by this question, partly because I don’t fully understand it. First, you don’t have 7,000 Iraqis reporting a higher death rate. You’ve got 990 data points consisting of 7,000 Iraqis, where I think something like 160 or so deaths were registered (I’m not sure I correctly remember the actual death figure they’ve found; I would need to look it up). It’s not the case, as dsquared seems a bit to imply, that you’ve got 7,000 Iraqis who all reported a higher death rate.
Second, dsquared seems to mean that whenever you find a large increase in one sample, it must be true for the entire population (or at least very likely). Well, yes, if you do your sampling correctly (and even then this leaves a significant error margin in such large-scale studies). Again, if the sample is skewed in one direction, it’s very easy for the sample to be out of the loop with the entire population. Think of a very easy error source: Official demographics before the war inflated the percentage of the Sunni and after the war you had an extra inflow of Kurdish and Shiite refugees. Sunnis probably had a disproportionately low mortality rate under Saddam and a disproportionately high one after the invasion. If that’s the case, and you estimate your sample size according to old Saddam-era census data (which Roberts et al seem to have done), then it’s easy to overrepresent the Sunni in your sample. That would result in two errors: You would underestimate the mortality rate before the war and overestimate it afterwards. If you get your basis data for demographics wrong by 3%, it means that in a country like Iraq you’re wrong by 750,000 people. If you then build this additional error into your estimate model, your CI is virtually destroyed and your data useless. This is very likely what happened in Roberts’ case.


