NEED EVIDENCE OF WHAT WORKS: COULD RANDOMISED CONTROLLED TRIALS BE THE ANSWER?

In 1747, James Lind took 12 sailors with scurvy who were “as similar as I could have them”. He divided them into 6 pairs and gave each pair a different treatment.

The treatments varied quite a bit. One pair received two oranges and one lemon per day. Another received a measure of salt water. Perhaps the luckiest got a quart of cider, while the unluckiest took ‘elixir of vitriol’ (a mixture of sulphuric acid and alcohol).

Lind’s results, published in 1753, found that eating oranges and lemons cured scurvy. Cider was next best. Elixir of vitriol was as effective as it was tasty. It was the first modern controlled comparative trial.

Fast forward 200 years and Sir Austin Bradford Hill – a stern-looking chap with a predilection for bow ties – trialled the use of streptomycin to treat tuberculosis. One group received streptomycin and bed rest, the other bed rest alone. Patients were allocated to groups randomly, and neither patient nor clinician knew who received which treatment. The randomised controlled trial was being perfected.

Fortunately for the world, streptomycin was a success. Bradford Hill, who later helped link smoking and lung cancer, went on to identify nine criteria for determining epidemiologic causality. He is one person who seems to have put his privileged upbringing to good use.

Fast forward again and randomised controlled trials (RCTs for those in the know) are entrenched as the ‘gold standard’ for testing new medical treatments. They have made an enormous contribution to improving health worldwide.

Increasingly, RCTs are also being used to assess other government interventions. As one academic sees it:

“For an ambitious government, there is little limit to the policy questions that might be answered … through well-designed randomised trials.”

It’s a big statement. One that sees empirical evidence driving public policy. The academic in question is now Australia’s Assistant Treasurer, Andrew Leigh. 

Leigh wrote the above words in 2003. Later he wrote a whole book on the topic – Randomistas: How Radical Researchers Changed Our World. World Bank economist Berk Özler describes him as a true believer.

Leigh is not alone. UK-based Ben Goldacre borders on the evangelistic. Goldacre sees randomised trials as the “best way to find out if something works”. His dream was to open a Number 10 Policy Trials Unit in place of the British Behavioural Insights Team.

In a stunning case of ‘if you can’t beat them, join them’, Goldacre later worked with that same team to co-author a report on RCTs. Published in 2012, it identified nine steps for using RCTs as part of a ‘test, learn and adapt’ framework. The report spawned a network of independent ‘what works’ centres in the UK.

RCTs are not the only way of establishing what works. But fans argue they create the neutral evidence needed to overcome the self-interest, expert bickering and blind ideology that often skews public policy.

Not everyone is a fan though. Most experts agree that RCTs can be useful. However, some argue that practical and conceptual problems can also result in them being less panacea and more placebo.

Sujatha Raman and Warren Pearce, for example, caution against over-reliance on RCTs and emphasise the need to place evidence from trials into a broader context. Chad Cook and Charles Thigpen outline five good reasons to be disappointed with medical RCTs and argue there are many more. Even the mostly positive Özler gently chided Leigh for not giving enough consideration to concerns about RCTs.

A large gap exists between academic concept and effective practice. It is a gap the Assistant Treasurer has an opportunity to fill when Australia’s first Evaluator General (EG) is appointed. The EG will do more than RCTs, but making the most of RCTs will be high on the list. Here are four things for the Evaluator General to consider.

Cost versus Value

In 2014, Aylin Sertkaya and colleagues estimated that medical RCTs in the US cost between $22m and $71m (USD), depending on the treatment involved. This is before the costs governments incur in making decisions are considered. It is, by any measure, a lot of money.

Costs, of course, need to be set against benefits. The high cost of medical RCTs looks much more reasonable (a bargain even) when the potential for harm from inappropriate medical treatments is considered.

Not all policy decisions warrant such large investments. Fortunately, less expensive RCT options do exist. Brookings argues that, where data are both easily available and fit for purpose, costs for simple non-medical policy trials can be as low as $50K (USD). Sadly, trials are rarely so simple and relevant data are rarely available and organised quite so conveniently. 

Australian experience suggests that low cost RCTs can provide some useful insights. But there is a world of difference between helpful insight and demonstrating what works.

Timing and focus

One of the earliest RCTs of a social program commenced in 1935. Known as the Cambridge-Somerville Youth Study (CSYS), it tracked two sets of ‘pre-delinquent’ youths. One set received an intensive program of individual support which lasted around 5 years. The other did not.

Follow-ups 15 and 21 years later found the program had no measurable effect on criminal activity (a key program focus). Thirty years after the intervention, a more comprehensive study was conducted. It found that those who received services were more likely to have a range of wellbeing problems than those who did not.

The CSYS experience reveals a tension. The 5-year program was designed to improve (in modern parlance) life outcomes for at-risk youths. Yet it was only after 30 years that the program’s impacts became clear. Waiting this long for an answer, especially a negative one, would trouble even the most patient government.

Unlike the CSYS, most evaluations are timed around budget cycles. This has some logic. New programs are often given a limited funding life. To continue they must prove their worth in the Budget process – hence an evaluation.

The catch is that budget cycles can be too short for the true impact of an intervention to be seen. Political cycles further compound the issue, with new governments and ministers often changing trial programs ‘mid-flight’. Fail-fast models are useful, but only if the program is given a genuine chance to succeed.

This opens a big question: should RCTs be designed and timed to meet the needs of the budget process, or should they be designed and timed to allow a true picture of impact to emerge?

One thing or many

Unravelling the mysteries of DNA created much excitement. One early idea was that key traits (disease risks etc) could be tracked back to a single gene. Identifying these genes would tell us our future and open the door for potential treatments.

Ultimately, few such genes have been found. Examples exist, but our traits/outcomes generally flow from complex interactions between constellations of genes and external factors.

What is true in genetics is true in policy. Most outcomes flow from a complex and dynamic constellation of interrelated factors, not just one.

RCTs (and evaluations generally) adopt a single gene approach. Rather than examining combinations of policies and programs, RCTs seek to isolate the impact of one. In doing so, they ignore the rest of the constellation.

This approach has value. But treating policy as an atomised set of individual actions, rather than a coordinated whole, has distinct limitations. For one thing, it can distract attention from ensuring the overall coherency of policy. The experimental approach implicit in an RCT model can also undermine human agency by reducing policy certainty and stability.

Defining success

The Hippocratic Oath is possibly the most famous oath in history. Parts of the original 2400-year-old text are now anachronistic. But two promises – to act diligently and solely for the “benefit of my patients” and to “abstain from whatever is deleterious” – have stood the test of time.

The focus of the oath is on intent, not outcome. It reflects inherent uncertainties in the provision of medical treatment. Individual outcomes vary, even with the best of medical care.

In public policy, like medicine, well-meaning action delivers outcomes which range from the beneficial to the deleterious for individuals. Defining policy success is not simple and requires judgements that the balance of outcomes (good and bad) is in the public interest.

Vaccines provide a classic example. Governments (and doctors) promote vaccines on the evidence-based expectation of public and individual benefit, despite knowing some people will be unintentionally harmed.

Evaluations of income management similarly found that some people have been harmed by the program and others helped. These results have been used by different groups to argue opposite positions on the success of the program.

The promise of RCTs – to identify what works – is seductive. However, reality is more complicated and involves a question RCTs alone cannot answer: is the balance of benefits in the public interest?

A final thought

Like all good tools, RCTs add value when used well. Finding a sweet spot for the use of RCTs will be an early challenge for the Evaluator General.

Government’s commitment to better evidence should be welcomed. But perhaps an even bigger challenge lies in translating better evidence into better policy.

It is worth remembering that, despite Lind’s clear findings about scurvy, it took the Royal Navy another 50 years to introduce preventative treatments as standard practice.
