The preceding tutorial presented a list of criteria which readers can use to differentiate studies that are likely to be valid from those that may not be. Studies which do not satisfy most of the methodological filters are usually best ignored. This section considers how therapists should interpret those trials which satisfy most of the methodological filters. The message is that it is not sufficient to look simply for evidence of a statistically significant effect of the therapy. You need to be satisfied that the trial measures outcomes that are meaningful, and that the positive effects of the therapy are big enough to make the therapy worthwhile. The harmful effects of the therapy must be infrequent or small so that the therapy does more good than harm. Lastly, the therapy must be cost-effective.
Of course, for a trial to be useful it must investigate meaningful effects of treatment. This means that the outcomes must be measured in a valid way. In general, because we usually judge the primary worth of a treatment by whether it satisfies patients’ needs, measurement outcomes should be meaningful to patients. Thus a trial which shows that low-energy laser lowers serotonin levels is much less useful than one which shows that it reduces pain, and a trial which shows that motor training reduces spasticity is much less useful than one which shows it enhances functional independence.
The size of the therapy’s effect is obviously important, but often overlooked. Perhaps this is because many readers of clinical trials do not appreciate the distinction between “statistical significance” and “clinical significance”. Or perhaps it reflects the preoccupation of many authors of clinical trials with whether “p < 0.05” or not. Statistical significance (“p < 0.05”) refers to whether the effect of the therapy is bigger than can reasonably be attributed to chance alone. That is important (we need to know that the observed effects of therapy were not just chance findings) but on its own it tells us nothing about how big the effect actually was. The best estimate of the size of the effect of a therapy is the average difference between groups. Thus, if a hypothetical trial on the effects of mobilisation reports that shoulder pain, as measured on a 10 cm visual analogue scale (VAS), was reduced by a mean of 4 cm in the treatment group and 1 cm in the control group, our best estimate of the mean effect of treatment is a 3 cm reduction in VAS score (as 4 cm minus 1 cm is 3 cm). Another hypothetical trial on muscle stretching before sport might report that 2% of patients in the stretch group were subsequently injured, compared to 4% in the control group. In that case our best estimate is that stretching reduced the absolute risk of injury by 2% (as 4% minus 2% is 2%). Readers of clinical trials need to look at the size of the reported effect to decide if the effect is big enough to be clinically worthwhile. Remember that patients often come to therapy looking for cures (of course this generalisation may not hold in all areas of clinical practice) – most are not interested in therapies which have only small effects.
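The arithmetic behind these two hypothetical trials can be written out as a minimal sketch. The function names below are illustrative only and do not come from any trial report or statistics package.

```python
def mean_effect(treatment_change, control_change):
    """Between-group difference in mean change, in the units of the outcome."""
    return treatment_change - control_change


def risk_difference(control_risk, treatment_risk):
    """Absolute difference in risk between groups, as a proportion."""
    return control_risk - treatment_risk


# Hypothetical mobilisation trial: mean reduction in pain on a 10 cm VAS.
vas_effect_cm = mean_effect(treatment_change=4.0, control_change=1.0)

# Hypothetical stretching trial: proportion of each group injured.
injury_effect = risk_difference(control_risk=0.04, treatment_risk=0.02)

print(f"Estimated effect on pain: {vas_effect_cm:.1f} cm reduction in VAS")
print(f"Estimated reduction in injury risk: {injury_effect:.1%}")
```

Whether a 3 cm reduction in pain or a 2% reduction in injury risk is worthwhile is a clinical judgement that the arithmetic cannot make for you.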
There is an important subtlety in looking at the size of a therapy’s effects. It applies to studies whose outcomes are dichotomous (dichotomous outcomes can have one of two values, such as dead or alive, injured or not injured, admitted to nursing home or not admitted; this contrasts with variables such as VAS measures of pain, which can have any value between and including 0 and 10). Many studies that measure dichotomous outcomes will report the effect of therapy in terms of ratios, rather than in terms of differences. (The ratio is sometimes called a “relative risk” or “odds ratio” or “hazard ratio”, but it comes by other names as well.) Expressed in this way, the findings of our hypothetical stretching study would be reported as a 50% reduction in injury risk (as 2% is half of 4%). Usually the effect of expressing treatment effects as ratios is to make the effect of the therapy appear large. The better measure is the difference between the two groups. (In fact, the most useful measure may well be the inverse of the difference. This is sometimes called the “number needed to treat”, or NNT, because it tells us, on average, how many patients we need to treat to prevent one adverse event – in the stretching example the NNT is 1/0.02 = 50, so one injury is prevented for every 50 subjects who stretch.)
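To see why the ratio can flatter a therapy, it may help to compute the ratio, the difference and the number needed to treat side by side. The sketch below uses the stretching figures from the text. The second scenario (40% versus 20%) is invented here purely for contrast, and the function and key names are illustrative.

```python
def summarise_dichotomous(control_risk, treatment_risk):
    """Summarise a two-group trial with a dichotomous outcome.

    Risks are proportions (0.04 means 4%). Assumes the treatment
    lowers the risk, so the difference is positive.
    """
    relative_risk = treatment_risk / control_risk
    absolute_reduction = control_risk - treatment_risk
    return {
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "absolute_risk_reduction": absolute_reduction,
        "number_needed_to_treat": 1 / absolute_reduction,
    }


# Stretching example from the text: 4% injured in control, 2% with stretching.
stretching = summarise_dichotomous(control_risk=0.04, treatment_risk=0.02)

# Invented contrast: the same ratio, but a much more common adverse event.
common_event = summarise_dichotomous(control_risk=0.40, treatment_risk=0.20)

for label, result in [("stretching", stretching), ("common event", common_event)]:
    print(
        f"{label}: relative reduction {result['relative_risk_reduction']:.0%}, "
        f"absolute reduction {result['absolute_risk_reduction']:.0%}, "
        f"NNT about {result['number_needed_to_treat']:.0f}"
    )
```

Both scenarios give a 50% relative reduction, yet the absolute reductions and the numbers needed to treat differ tenfold, which is why the difference is the more informative figure.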
Many studies do not report the harmful effects of therapies (ie, the “side effects” or “complications” of therapy). That is unfortunate, because the absence of reports of harmful effects is often interpreted as indicating that the therapy does no harm, but clearly that need not be so. Glaziou and Irwig (BMJ 311: 1356-1359, 1995) have argued that the effects of therapy are usually most pronounced when given to patients with the most severe conditions (for example, bronchial suction might be expected to produce a greater reduction in risk of respiratory arrest in a head-injured patient with copious sputum retention than in a head-injured patient with little sputum retention). In contrast, the risks of therapy (in this case, from raised intracranial pressure) tend to be relatively constant, regardless of the severity of the condition. Thus a therapy is more likely to do more good than harm when it is applied to patients with severe conditions, and therapists should be relatively reluctant to give a therapy which has potentially serious side effects when the patient has a less serious condition.
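One simple way to picture this argument is sketched below. It assumes, for illustration only, that the therapy removes a fixed fraction of the patient's baseline risk while the risk of harm stays constant across severities. All of the numbers are invented and do not describe bronchial suction or any other real therapy.

```python
def net_benefit(baseline_risk, relative_risk_reduction, harm_risk):
    """Adverse events prevented minus harms caused, per patient treated.

    Assumes (for this sketch only) that benefit scales with baseline
    risk and that the risk of harm does not depend on severity.
    """
    events_prevented = baseline_risk * relative_risk_reduction
    return events_prevented - harm_risk


# Invented values: the therapy halves the risk of the adverse event,
# and causes a serious complication in 1% of patients treated.
RELATIVE_RISK_REDUCTION = 0.5
HARM_RISK = 0.01

# Invented baseline risks of the adverse event, by severity of condition.
baseline_risks = {"mild": 0.01, "moderate": 0.05, "severe": 0.20}

net = {
    severity: net_benefit(risk, RELATIVE_RISK_REDUCTION, HARM_RISK)
    for severity, risk in baseline_risks.items()
}

for severity, value in net.items():
    verdict = "more good than harm" if value > 0 else "no net benefit"
    print(f"{severity}: net benefit {value:+.3f} per patient ({verdict})")
```

Under these assumptions the same therapy does net harm in the mildest patients and clear net good in the most severe.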
In practice, it is often difficult for clinical trials to detect harmful effects, because harmful effects tend to occur infrequently, and most studies will have insufficient sample sizes to detect harmful effects when they occur. Thus, even after good randomised controlled trials of a therapy have been performed there is an important role for large scale “monitoring” studies which follow large cohorts of treated patients to ascertain that harmful events do not occur excessively. Until such studies have been performed, therapists should be wary about applying potentially harmful therapies, particularly to patients who stand to gain relatively little from the therapy.
An extra level of sophistication in critical appraisal involves consideration of the degree of imprecision of estimates of effect size offered by clinical trials. Trials are performed on samples of subjects that are expected to be representative of certain populations. This means that the best a trial can provide is an (imperfectly precise) estimate of the size of the treatment effect. Clinical trials on large numbers of subjects provide better (more precise) estimates of the size of treatment effects than trials on small numbers of subjects. Ideally readers should consider the degree of imprecision of the estimate when deciding what a clinical trial means, because this will often affect the degree of certainty that can be attached to the conclusions drawn from a particular trial. The best way to do this is to calculate confidence intervals about the estimate of the treatment effect size, if these are not explicitly supplied in the trial report. A tutorial on how to calculate and interpret confidence intervals about common measures of effect size is given in Herbert RD (2000). How to estimate treatment effects from reports of clinical trials. I: Continuous outcomes. Australian Journal of Physiotherapy 46: 229-235 and Herbert RD (2000). How to estimate treatment effects from reports of clinical trials. II: Dichotomous outcomes. Australian Journal of Physiotherapy 46: 309-313. Readers who are confident (sorry) with confidence intervals may find it useful to download PEDro’s confidence interval calculator. The calculator is in the form of an Excel spreadsheet.
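As a rough illustration of how sample size affects precision, the sketch below computes an approximate 95% confidence interval for a difference in risk using the simple normal approximation. This is one common textbook method and is not necessarily the one used in the tutorials or the calculator mentioned above. The group sizes and event counts are invented to match the 4% versus 2% stretching example.

```python
import math


def risk_difference_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Risk difference with an approximate 95% confidence interval.

    Uses the normal (Wald) approximation, which can be inaccurate
    when groups are small or events are rare.
    """
    p_control = events_control / n_control
    p_treated = events_treated / n_treated
    difference = p_control - p_treated
    standard_error = math.sqrt(
        p_control * (1 - p_control) / n_control
        + p_treated * (1 - p_treated) / n_treated
    )
    return difference, difference - z * standard_error, difference + z * standard_error


# Invented trial sizes, both giving 4% injured in control and 2% with stretching.
small_trial = risk_difference_ci(20, 500, 10, 500)
large_trial = risk_difference_ci(200, 5000, 100, 5000)

for label, (diff, lower, upper) in [("500 per group", small_trial),
                                    ("5000 per group", large_trial)]:
    print(f"{label}: difference {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With these invented numbers both trials estimate the same 2% reduction, but the interval from the smaller trial extends below zero, so that trial alone could not rule out the possibility that stretching has no effect.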
The last part of deciding the usefulness of a therapy involves deciding if the therapy is cost-effective. This is particularly important when health care is paid for, or subsidised, by the public purse. There will never be enough resources to fund all innovations in health care (probably not even all good innovations). Thus the cost of any therapy is that money spent on it cannot be spent on other forms of health care. Sensible allocation of finite funds involves spending money where the effect per dollar is greatest. Of course a therapy cannot be cost-effective if it is not effective. But effective therapies can be cost-ineffective. The methods used to determine cost-effectiveness are outside this author’s expertise, and it is probably better if I defer to more authoritative sources. If you are interested, you might like to read:
- Drummond MF, Richardson WS, O’Brien BJ, Levine M, Heyland D (1997). User’s guide to the medical literature: XIII. How to use an article on economic analysis of clinical practice: A. Are the results of the study valid? JAMA 277: 1552-1557.
- O’Brien BJ, Heyland D, Richardson WS, Levine M, Drummond MF (1997). User’s guide to the medical literature: XIII. How to use an article on economic analysis of clinical practice: B. What are the results and will they help me in caring for my patients? JAMA 277: 1802-1806.
To summarise this section:
Statistical significance does not equate to clinical usefulness. To be clinically useful, a therapy must:
- affect outcomes that patients are interested in
- have big enough effects to be worthwhile
- do more good than harm
- be cost-effective.
If you want to read further on assessing effect size, you could consult:
Guyatt GH, Sackett DL, Cook DJ (1994). User’s guide to the medical literature: II. How to use an article about therapy or prevention: B. What were the results and will they help me in caring for my patients? JAMA 271: 59-63.