Monday, 5 November 2012

External Validity and False Positives in Randomised Controlled Trials



Two presentations at the latest NEUDC grabbed my attention by providing some necessary caveats on the conclusions we can draw from RCTs.

First, a paper looked at the scalability of a proven intervention. Researchers intentionally select capable NGOs to implement their projects (and often also put in a lot of individual effort to ensure quality implementation). However, policy conclusions often involve large-scale government roll-out. Can a government replicate, at scale, the success of a highly motivated and able NGO?

This paper looked specifically at the use of contract teachers in education (there is a great summary of this on the CSAE blog). Duflo, Dupas and Kremer (2009) show that the use of contract teachers can significantly increase educational outcomes in Kenya, partly because contract teachers face stronger incentives to teach well. However, it turns out that implementation relied on a good NGO. When the NGO scaled up the project it worked, but when the government scaled it up it didn't.

This places some caveats on the policy conclusions we can reach from many RCTs.

The second paper applies the standards required of RCTs in medical trials to economics papers and finds us severely lacking. We have borrowed the method of RCTs from medicine, but we have ignored what medical researchers have learnt about its shortcomings. This is a glaring gap, and I can't believe that this paper is the first to address it.

Randomisation solves endogeneity problems; however, biases can still emerge in the way that we conduct and report our studies, and these can lead to false positives.

One big source of bias is lack of "blinding". Participants respond or act differently because they know they are being treated, a kind of "Hawthorne effect". This change in behaviour could have nothing to do with the actual treatment. In medical trials this is solved by giving the control group a placebo, but that is far more difficult in social projects. Furthermore, data collectors might ask questions differently in treatment units because of perceived pressure from the researcher to get a positive result.

The big problem, of course, is that researchers are biased. Aspiring graduate students (like myself!) invest years in a project, and future job prospects often depend on finding a positive result. So the more discretion is left to the researcher (in sample selection or reporting of results, for example), the greater the bias.
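To see why discretion matters, here is a toy simulation (my own illustration, not from the paper) of one common form of it: measuring several outcomes and reporting whichever looks best. Even when the treatment has no effect at all, the chance of finding at least one "significant" result grows quickly with the number of outcomes tested.

```python
import math
import random
import statistics

random.seed(0)

def two_sided_p(treat, control):
    # Two-sample z-test p-value, assuming unit variance (fine for a simulation
    # where both arms are drawn from a standard normal).
    n = len(treat)
    diff = statistics.fmean(treat) - statistics.fmean(control)
    z = diff / math.sqrt(2.0 / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def trial(n_outcomes, n_per_arm=100):
    # One RCT with NO true treatment effect on any outcome.
    # The "researcher" reports only the smallest p-value found.
    pvals = []
    for _ in range(n_outcomes):
        treat = [random.gauss(0, 1) for _ in range(n_per_arm)]
        control = [random.gauss(0, 1) for _ in range(n_per_arm)]
        pvals.append(two_sided_p(treat, control))
    return min(pvals)

def false_positive_rate(n_outcomes, n_trials=2000):
    # Share of null trials that nevertheless yield a "significant" headline result.
    hits = sum(trial(n_outcomes) < 0.05 for _ in range(n_trials))
    return hits / n_trials

for k in (1, 5, 10):
    print(f"{k} outcomes, report the best: "
          f"false positive rate ~ {false_positive_rate(k):.2f}")
```

With one pre-specified outcome the false positive rate sits near the nominal 5%, but with ten outcomes and selective reporting it climbs towards 40% (roughly 1 − 0.95¹⁰). Pre-registered analysis plans, of the kind medical trials require, exist precisely to shut down this freedom.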

The authors propose introducing standards for conducting RCTs and reporting results, similar to those in medical trials. The more we can tie the hands of the researcher, the less chance that he or she can bias the results. This would be a massive contribution to the field.
