Conflict of Interest and Campbellian Validity
Abstract: The conflicting interests of evaluators are biasing the findings of some evaluations and experiments, even technically rigorous studies. One improvement would be to emphasize evaluator and investigator conflict of interest threats in conceptions of validity. Such discussions could suggest ways to assess and avoid conflicts of interest. I explore the possibilities in one highly regarded framework, the Campbellian conception of experimental validity. However, all evaluations and experiments, whatever their methods, are vulnerable, and all conceptions of validity should address such threats.
When I reviewed the medical research literature a few years ago, I discovered statements in journals that the pharmaceutical drug evaluations under review were biased and that the same studies were of high quality (House, 2008). What could this mean? How could the same studies be rigorous and biased simultaneously? What the reviewers seemed to mean was that the studies had randomized assignment and double-blind controls, the markers of rigorous design, but were manipulated in other ways to produce positive results. Here are the sources of bias in these studies:
Opportunistic choice of comparison (placebo rather than competitor)
Improper choice of sample (younger subjects suffer fewer side effects)
Manipulation of dosages (higher dosages for sponsor drugs)
Incorrect administration of drugs (e.g., oral instead of injected)
Manipulation of time scales (chronic-use drugs tested for short periods)
Opportunistic outcome selection (ignoring possible side effects)
Ignoring actual negative effects
Redefining goals of the study after the findings to achieve success
Opportunistic data analyses
Opportunistic interpretations (“This drug is now the treatment of choice.”)
Concealing unfavorable data
Control of authorship (company employees, rather than researchers, writing reports)
Selective publishing (publishing only positive findings)
Deceptive publishing (publishing positive findings repeatedly under different authors)
These biases are deliberate. That is, many drug evaluations are being designed and conducted to deliver positive findings (Angell, 2004). The studies are biased even while adhering to rigorous design safeguards. There is a sense in which rigorous methods, as we normally think of them, are insufficient for claiming the studies are valid. That is not to say that rigorous methods are unnecessary; in these late-stage clinical drug trials, there is no doubt that they are needed.
Drug evaluations are controlled by sponsoring companies for whom positive findings mean billions of dollars in profits. Testing a new drug for the market is risky and expensive. Drug companies bear the cost, and, apparently, leaving the results to honest evaluation is too chancy for some. Factors that inappropriately influence findings include these:
Sponsorship of the study
Terms of the contract
Financial ties
Proprietary ties
Personal ties
Gifts
Indeed, vested interests have been shown empirically to influence findings (Als-Nielsen, Chen, Gluud, & Kjaergard, 2003). In spite of such evidence, some researchers claim that they are not influenced by the large sums of money paid by companies, nor by the patents they hold. It does not help their credibility that many conceal their financial ties to companies.
How serious is the problem? In 2008 Senator Grassley held hearings comparing drug company payout records to the fees medical researchers said they had received. For example, from 2000 to 2007, Charles Nemeroff, an eminent psychiatrist and editor of Neuropsychopharmacology, was chief investigator of a $3.9 million NIH grant that used GlaxoSmithKline drugs. NIH rules require investigators to report income of $10,000 or more in any year to NIH and require universities to replace such investigators. Nemeroff reported earning $35,000 from GlaxoSmithKline; he actually earned $960,000 (Harris, 2008). In total, he received $2.8 million from drug companies and failed to report $1.2 million of it, a violation of NIH rules. Child psychiatrist Joseph Biederman of Harvard Medical School and his colleague Timothy Wilens reported receiving several hundred thousand dollars each from drug makers; they actually earned $1.6 million each. Such researchers serve on panels that approve drugs. Senator Grassley said universities seem incapable of policing these conflicts of interest.
Drug evaluations are not the only manifestation
of conflict of interest (COI). Gorman and colleagues conducted analyses showing
conflicts of interest in the evaluation of school-based drug and violence
prevention programs (Gorman & Conde, 2007; Gorman & Huber, 2009;
Gorman, Conde, & Huber, 2007; Gorman, 2005; Gorman, 2002). Evaluator
independence was lacking in these studies, and the financial interests of the
evaluators were affected by the findings. Studies contained questionable
analyses, opportunistic changes in outcome variables, and weaknesses in selection
and retention. Based on these studies, the programs were placed on the Department of Education's list of evidence-based model programs. The push to be included on evidence-based recommendation lists has intensified conflicts of interest because the designation is marketable (Gorman, 2006).
In noting that drug use prevention program findings had little validity, Moskowitz (1993) reported that the studies suffered from conflicts of interest. Similarly, Eisner (2009) questioned disparities between findings from criminology studies done independently and those done by evaluators with conflicts of interest. Although there are unresolved issues, it is safe to conclude that conflicts of interest are widespread and worsening (Feinstein, 1988; MacCoun, 1998).
Campbell and Stanley's Conception of Validity
One step to deal with conflict of interest threats would be to emphasize them in our conceptions of validity. My example here is Campbell and Stanley's conception of experimental validity and its later revisions, but these concerns apply to all conceptions of validity. As my colleagues in this volume note, Campbell and Stanley's typology is a conception of experimental validity, not of evaluation. However, Campbell and Stanley's framework and approaches derived from it are used as the core of many evaluations, and these validity discussions influence evaluations employing experimental methods. Drug studies are both evaluations and experiments. Although I am focused on evaluations, similar considerations apply to experimental studies in which the stakes are high and sponsorship varies, as with many environmental, biological, and ecological studies. Experimental studies are vulnerable to investigator conflict of interest, depending on the circumstances.
The seminal work on experimental design is Campbell and Stanley (1963),
followed by Cook and Campbell (1979) and Shadish,
Cook, and Campbell (2002). I start with the original conception to illustrate
that conceptions of validity are products of their time, that validity conceptions
take shape from the situation, and that situations change. Here are the threats
to validity enumerated in Campbell and Stanley (i.e., other effects confounded
with treatment effects).
History—Events occurring between the first and second measurements
Maturation—Processes occurring within respondents
Testing—Effects of test taking on later testing
Instrumentation—Changes in instruments, observers, and scorers
Statistical regression—Operates where groups are selected on extreme scores
Selection biases—Differential selection of respondents for comparison groups
Experimental mortality—Differential loss of respondents from groups
Selection-maturation interactions
Reactive or interactive effects of testing
Interactive effects of selection biases and experimental variables
Reactive effects of experimental treatment precluding generalization
Multiple-treatment interference
These threats are not deliberate; they are accidents and artifacts of experimental conditions. Colleagues inform me that Campbell was aware of potential investigator biases: “…scientists are thoroughly human beings: greedily ambitious, competitive, unscrupulous, self-interested, clique-partisan, biased by tradition and cultural misunderstandings….” (Campbell, 1984, p. 31). He goes on: “The complete sociology of applied-science validity…would take into account environmental impacts on commitments to validity which applied science careers involve” (Campbell, 1984, p. 41).
Why then wasn’t conflict of interest included in the original framework? My view is that there was no reason to include it at the time. Conflicts of interest were not major threats. In the 1950s and 1960s it made good sense to ignore deliberate manipulation of findings. Perhaps we were too naïve, but the belief was that researchers aimed for findings that contributed to theory and practice. What was the point in manipulating findings? In any case, findings were subject to review and replication.
I did not emphasize
conflict of interest in my work at the time, nor did most evaluators. The
exception is Michael Scriven, who worried about government
reporting (Scriven, 1973). In the 1980s the situation
changed. Reagan began privatizing and deregulating many functions of government, including research and evaluation, trends that continued in ensuing administrations and led to a different environment.
Revised Conceptions
The original Campbellian conception of validity was brilliantly conceived, tremendously influential, and a product of its time. Campbell created the internal/external distinction because students mistakenly believed that Fisherian randomization controlled all threats to validity (Shadish, Cook, & Campbell, 2002, p. 37, footnote 3). Cook (2004) has explicated the changes in the conception made in Cook and Campbell (1979) and Shadish, Cook, and Campbell (2002).
In Shadish, Cook, and Campbell (2002), the latest version, validity refers to the approximate truth of an inference. According to the authors, inferences from studies invariably involve human judgment and are not absolute or automatically derived. Nor are inferences guaranteed by particular methods or designs. The conception is pragmatic in that it emphasizes ruling out alternative explanations. Some threats can be recognized early and blunted by design controls. Others are not easily anticipated and must be considered post hoc. Researchers can investigate threats by asking how they apply in particular cases, whether they are plausible, and whether they operate in the same direction as program effects. Threats change over time, and their saliency varies from study to study. Lists of threats serve as useful heuristics for investigators, though such lists are not complete. I find this conception quite reasonable, without delving into the sub-categories of statistical, internal, construct, and external validity. It is admirably flexible and allows for changing times.
The revised conception also fits my notion of how validity is determined generally. Determining validity requires evaluating the study itself. In analyzing the logic of evaluative argument, I suggested that evaluations consist of arguments that piece together strands of information, quantitative and qualitative, general and particular (House, 1977). Cronbach used this idea to recast standardized test validity, which at that time consisted of establishing correlations among test scores (Cronbach, 1989). He contended that test validation should be based on broad justifying arguments. The idea of validity based on arguments makes sense.
Elsworth (1994) extended the original conception in another direction. Working from Campbell's relabeling of internal validity as local molar causal validity, Elsworth applied a scientific realist view to validating field studies. For scientific realists, social systems are open, and inferences are validated by explanatory power. In social systems unknown intervening causal interactions are always possible and often undetected. Elsworth also construed threats as plausible alternative explanations, with threats countered by design and by post hoc analyses involving the construction of arguments for and against (Dunn, 1982). Threats can be discovered by examining particular situations. Elsworth's conception is compatible with Shadish, Cook, and Campbell. (See Mark, Henry, & Julnes, 2000, for another conception based on scientific realism and Norris, 2005, for an overview of the concept of validity.)
Most discussions of validity do not emphasize conflict of interest threats. In Shadish, Cook, and Campbell, motivational threats to construct validity include these:
Self-report motivation
Participant perceptions of situation
Experimenter expectancies
Novelty and disruption effects
Compensatory equalization
Compensatory rivalry
Resentful demoralization
These threats result from participant motivations, not investigator motivations, except for experimenter expectancies. The latter refers to Rosenthal's work on how experimenter expectations influence participants, not on how researchers deliberately bias findings. Elsworth included “coercion” of the investigator toward the perspectives of the group studied, addressing the danger of “going native,” but not evaluator motivation to change findings out of self-interest. Remedies presented by Shadish, Cook, and Campbell and by Elsworth include placebos, masking procedures, less obvious measures, delayed outcome measures, multiple experimenters, and minimizing contacts between researchers and participants.
Conflicts of interest are a different kind of threat, arising from the objective situation. They cause investigators to behave differently. Conflicts of interest require bringing the investigators themselves into the validity analysis. Some might argue that these threats should be handled outside validity frameworks, but that would leave these schemas ignoring serious threats. To echo Campbell's concern about Fisherian randomization, the omission might lead students and others to think all threats are controlled when they are not. Others might argue that these threats can be handled solely through technical controls, but this remedy has not worked well in drug studies. A better solution is to address conflicts of interest directly. Judgments of validity should consider not only technical safeguards but also the conflicts of interest of investigators.
Including Conflict of Interest Threats
How might such threats be included in the Shadish, Cook, and Campbell conception of validity? The framework is an individual-psychological model of causal inference that posits investigators drawing inferences backed by evidence. Right now, it does not emphasize conflict of interest. Because they are a different kind of threat, evaluator conflicts of interest can adversely affect anything in the evaluation, as the drug studies demonstrate, and can be manifested in all four validity types. For example, drawing inappropriate conclusions might threaten external, internal, or construct validity. Tracking all such possibilities would require enormous effort.
Another way to address such threats would be to discuss whether the evaluator/investigator is without conflict at the beginning, that is, to handle conflict of interest or its absence as an explicit assumption early on. Conflict of interest threats could be addressed at the front edge of the framework, so to speak, as an assumption that might be violated. That would keep the four validity types (statistical conclusion, internal, construct, and external) intact. By modifying the front part of the causal inference model, one could discuss conflicts of interest, how they can be discovered, and remedies for dealing with them. In other words, one would specify what happens when a key assumption is violated and how to remedy it.
Such an expansion of the conception makes sense theoretically, by placing the assumption where it fits the causal inference model, and practically, by treating conflict of interest threats succinctly without disturbing other components. (There might be other such assumptions.) Cook (2004) noted that the Campbellian schema has sometimes valued practical fixes even at the cost of building inconsistencies into major concepts. In this case there is no need for theoretical inconsistency; the proposed revision fits the causal inference model.
In my opinion, changes to the schema should be made by those who have deep tacit knowledge of the framework. Nonetheless (prodded by editors), I would enumerate conflict of interest threats and inquire whether the evaluator/investigator harbors such conflicts. The relationship of the evaluator to sponsors and program is central. If conflicts of interest do obtain, I would ask whether the study should proceed or, if it is underway or completed, whether it should receive close independent scrutiny. I would explore remedies developed by those who have wrestled with these problems, such as the FDA, the medical journals, and the medical schools. Of course, these are suggestions; there may be better ways to handle such threats. The important point is to handle them somewhere in the framework, as sketched below.
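To make the proposal concrete, here is a minimal sketch, in Python with hypothetical names throughout, of conflict of interest treated as a front-edge assumption that gates the rest of the validity analysis. It is one way to picture the idea, not a prescribed procedure.

```python
# A minimal sketch (hypothetical names) of conflict of interest as a "front
# edge" assumption: the assumption is checked before the four validity types
# are assessed, and a violation triggers remedies rather than silence.

from dataclasses import dataclass, field

@dataclass
class Evaluation:
    evaluator: str
    sponsor: str
    financial_ties: list = field(default_factory=list)    # e.g., consulting fees
    proprietary_ties: list = field(default_factory=list)  # e.g., patents held

def coi_assumption_holds(ev: Evaluation) -> bool:
    """The front-edge assumption: the evaluator has no material stake in the findings."""
    return not ev.financial_ties and not ev.proprietary_ties

def assess_validity(ev: Evaluation) -> dict:
    """Check the assumption first; the four validity types remain intact."""
    if not coi_assumption_holds(ev):
        # Assumption violated: disclose, seek independent scrutiny, or halt.
        return {"proceed": False,
                "remedy": "independent scrutiny of design, data, and analysis"}
    return {"proceed": True,
            "validity_types": ["statistical conclusion", "internal",
                               "construct", "external"]}

study = Evaluation(evaluator="contract firm", sponsor="drug company",
                   financial_ties=["funding contingent on renewal"])
print(assess_validity(study))  # flags the violated assumption up front
```

The design point is simply that the check precedes, rather than replaces, the usual four-part analysis.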
Remedies
Strategies for dealing with conflicts of
interest include transparency, oversight, and organization.
Transparency. Revealing conflicts of interest is
critical. When participants serve on study panels for the National Academy of Sciences,
they serve unpaid, a requirement set down by Abraham Lincoln when he founded the
Academy (Feuer, 2009). Lincoln stipulated that the
organization should serve the public and not merely enhance professional
status. Panel members are asked to reveal potential conflicts of interest that
might impair the task at hand. These written statements are reviewed by Academy
staff and discussed by fellow panel members to determine how serious the conflicts
might be.
For example, in a review of NASA's education programs, some panelists, particularly those with previous NASA affiliations, might have conflicts. One left the panel when he decided to bid on a NASA contract. Revealing such possibilities in writing is not an onerous requirement, and similar safeguards would not be asking too much of evaluators. Transparency—revealing potential conflicts of interest—might be required of evaluators conducting studies, with written declarations part of the record.
Another transparency tactic would be to make data and methods accessible for criticism. Although transparency seems critical in any scientific endeavor, it is not always practiced. Indeed, drug companies often do not make data available, though formally required to do so.
Oversight. Revealing conflicts of interest voluntarily is not enough. For example, the credibility of NAS reviews is reinforced by oversight and organizational arrangements. First, the review is performed by a presumably impartial scientific organization. Second, panel members are chosen for expertise, and the panel is balanced with members representing different viewpoints. Third, the panel is given a formal charge, like a jury; most panelists take such charges seriously. Fourth, part of the charge is that panel members agree on a consensus report, if possible; this requirement forces members to engage in extended discussions and arguments. Serious deliberation, done properly, leads to better-informed and more impartial findings. Finally, the draft report is subject to external reviews.
The FDA requires advisory panel members to reveal conflicts of interest in writing, and these declarations are assessed by FDA staff (Committee on the Assessment of the US Drug Safety System, 2006). Panel members who receive $50,000 or more from a company with a product at issue are not allowed to serve. Those who receive less can serve but are not allowed to vote on drug approval. Vioxx would not have been returned to the marketplace if this rule had been in effect: ten of the 32 advisers voting for approval would have been disqualified (Harris, 2007).
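The rule amounts to a simple screening function. As an illustration only, the sketch below (Python, hypothetical names) encodes the two thresholds as reported above, not the FDA's actual procedures.

```python
# Hypothetical sketch of the reported FDA advisory-panel rule: $50,000 or more
# from the affected company bars service; any smaller payment allows service
# but bars voting on approval.

def panel_status(payments_from_company: float) -> str:
    """Classify an adviser under the reported conflict rule."""
    if payments_from_company >= 50_000:
        return "may not serve"
    if payments_from_company > 0:
        return "may serve but not vote"
    return "may serve and vote"

assert panel_status(60_000) == "may not serve"
assert panel_status(5_000) == "may serve but not vote"
assert panel_status(0) == "may serve and vote"
```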
Some medical schools have restricted gifts to students and faculty (Moynihan, 2003). A recent study found that medical students were influenced favorably toward products by simple gifts like coffee mugs and pens bearing a drug's name, even while believing they were not susceptible to influence (Nagourney, 2009). The American Medical Student Association endorses stricter guidelines, such as limiting visits by drug representatives, restricting events sponsored by industry, and limiting the advertising that journals accept (Moynihan, 2003).
Until recently, medical research journals regarded themselves as neutral conduits for scientific information, but several incidents have led to reassessment. The New England Journal of Medicine retracted a key article about Vioxx because Merck failed to include negative data about cardiovascular risk, though the missing data were reported to the FDA (Armstrong, 2006; Zimmerman & Tomsho, 2005). Eventually the drug was recalled. A series of missteps had led the journal's editors not to challenge the original manuscript. Drug advertising contributes significant journal revenue: the journal received $88 million in publishing revenue in 2005 and $697,000 from Merck for reprints of the Vioxx article.
Journal authors are asked to reveal financial ties and to sign statements that they wrote the article; company employees have been ghostwriting articles and having MDs sign them (Singer, 2009). Journals are also requesting that all studies be registered in an NIH database so that studies never reported can be tracked. Sometimes positive findings for a drug are published by different authors in different journals without cross-reference, as if they were separate studies. Without access to all studies, reviewers are likely to base recommendations on biased evidence unknowingly (Melander, Ahlqvist-Rastad, Meijer, & Beermann, 2003).
Organization. Often conflicts of interest arise from the organization of the evaluation enterprise. For example, drug companies have become involved in every aspect of studies (Bodenheimer, 2000). Companies hire contract research organizations to implement company designs, and these contract organizations are heavily dependent on the firms for funding (Bodenheimer, 2006). In the 1990s, the FDA began accepting user fees for clinical trials and negotiating with companies about how to do reviews, giving the industry leverage over the agency (Mathews, 2006).
The obstacles to honest evaluation are less obvious in the internal organization of the FDA (Committee on the Assessment of the US Drug Safety System, 2006). Two offices inside the agency are responsible for evaluating drugs: one oversees clinical trials, while the other tracks the long-term effects of drugs already on the market. The clinical trials office receives most of the funding and authority because trials make or break a drug's initial success. The clinical trials office also employs higher-status research methods, while the follow-up office uses epidemiological methods to track effects.
The thousands of drugs on the market could all interact with one another, and no clinical trial, no matter how large, could anticipate interactions across all patients. Clinical trials necessarily focus on efficacy, not drug safety. An Institute of Medicine review recommended more funding and authority for the follow-up office, which is seriously underfunded (Committee on the Assessment of the US Drug Safety System, 2006). These problems reside in the organization and in the interaction of the industry with its regulators.
Some time ago Scriven (1976) posited three principles of organizational bias control. The principle of independent feedback is that no unit should rely entirely on a subunit for evaluative information about that subunit. The second principle, the instability of independence, is that independence is fleeting and subject to compromise. Hence, there must be arrangements for renewal and replacement as evaluators become co-opted. The third principle, dynamic equilibrium, is that although there are no totally unbiased evaluators, there are arrangements that can reduce the influence of the most damaging biases. “…independence is essential, impermanent, and situational” (Scriven, 1976, p. 139).
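Scriven's first principle lends itself to a simple structural check. The sketch below, with hypothetical names throughout, flags units whose only source of evaluative information is the subunit being evaluated; it is one way to picture the principle, not Scriven's own formulation.

```python
# Hypothetical sketch of Scriven's principle of independent feedback: no unit
# should rely entirely on a subunit for evaluative information about that
# subunit. Given each subunit's sources of evaluative information, flag
# subunits that are evaluated only by themselves.

def independent_feedback_violations(sources: dict) -> list:
    """Return subunits whose sole evaluative source is themselves."""
    return [unit for unit, evaluators in sources.items()
            if set(evaluators) == {unit}]

sources = {
    "clinical_trials_office": ["clinical_trials_office"],            # violation
    "postmarket_office": ["postmarket_office", "external_auditor"],  # acceptable
}
print(independent_feedback_violations(sources))  # ['clinical_trials_office']
```

Per the second and third principles, even the acceptable arrangement above would need periodic renewal as the external auditor is co-opted over time.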
Scriven’s clarification of the concept of “bias” is also worth reiterating. Sometimes bias refers to the systematic tendency to make errors and sometimes to actually making the errors. The former meaning is crucial in maintaining the credibility of evaluations, while the latter affects validity. We need both credibility and validity. Conflicts of interest increase the tendency to commit errors. In spite of this, evaluators sometimes conduct valid evaluations even when the findings weigh against their own material interests; unfortunately, the track record is not encouraging. By reducing conflict of interest, the potential for bias that gives rise to actual bias, we can improve both credibility and validity.
Summary
Although drug evaluations are technically rigorous, some are manipulated to produce favorable findings. Evaluator and investigator conflicts of interest are becoming increasingly serious and widespread. Our conceptions of validity do not emphasize conflict of interest threats, including Campbell and Stanley's conception of experimental validity and its later revisions. Including conflict of interest threats in validity frameworks is important; it is best to address such threats directly so that everyone is alerted. Such threats could be addressed in various places within the Campbellian framework. One reason the tradition has endured nearly fifty years is that it has long prized practical responses to new problems. Campbell noted, “A validity producing social system of science is nothing we should take for granted” (Campbell, 1984, p. 32). Of course, other validity conceptions should also address these threats.
References
Als-Nielsen, B., Chen, W., Gluud, C., & Kjaergard, L. L. (2003). Association of funding and conclusions in randomized drug trials. Journal of the American Medical Association, 290(7), 921-928.
Angell, M. (2004). The truth about the drug companies. New York: Random House.
Armstrong, D. (2006, May 15). How the New England Journal missed warning signs on Vioxx. Wall Street Journal.
Bodenheimer, T. (2000). Uneasy alliance: Clinical investigators and the pharmaceutical industry. New England Journal of Medicine, 342(20), 1539-1544.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Campbell, D. T. (1984). Can we be scientific in applied social science? In R. F. Connor, D. G. Altman, & C. Jackson (Eds.), Evaluation studies review annual. Beverly Hills, CA: Sage.
Committee on the Assessment of the US Drug Safety System. A. Baciu, K. Stratton, & S. P. Burke (Eds.). (2006). The future of drug safety: Promoting and protecting the health of the public. Institute of Medicine. Washington, DC: National Academies Press.
Cook, T. D. (2004). Causal generalization. In M. C. Alkin (Ed.), Evaluation roots (pp. 88-113). Thousand Oaks, CA: Sage.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Designs and analysis issues for field settings. Chicago: Rand McNally.
House, E. R. (2008). Blowback: Consequences of evaluation for evaluation. American Journal of Evaluation, 29(4), 416-426.
House, E. R. (1977). The logic of evaluative argument. Los Angeles: Center for the Study of Evaluation, UCLA.
MacCoun, R. J. (1998). Biases in the interpretation and use of research results. Annual Review of Psychology, 49, 259-287.
Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation. San Francisco: Jossey-Bass.
Mathews, A. W. (2006, September 1). Drug firms use financial clout to push industry agenda at FDA. Wall Street Journal.
Melander, H., Ahlqvist-Rastad, J., Meijer, G., & Beermann, B. (2003, May 31). Evidence b(i)ased medicine—selective reporting from studies sponsored by pharmaceutical industry: Review of studies in new drug applications. British Medical Journal, 326, 1171-1173.
Moskowitz, J. M. (1993). Why reports of outcome evaluations are often biased or uninterpretable. Evaluation and Program Planning, 16, 1-9.
Moynihan, R. (2003, May 31). Who pays for the pizza? Redefining the relationships between doctors and drug companies. 2: Disentanglement. British Medical Journal, 326, 1193-1196.
Nagourney, E. (2009, May 19). Small gifts found to influence doctors. New York Times, D6.
Norris, N. (2005). Validity. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 439-442). Thousand Oaks, CA: Sage.
Pollack, A. (2008, September 12). Stanford to ban drug makers’ gifts to doctors, even pens. New York Times, C2.
Zimmerman, R., & Tomsho, R. (2005, May 26). Medical editor turns activist on drug trials. Wall Street Journal.
Special thanks to G. Elsworth; Gene V Glass; D. Gorman; Steve Lapan and his students; and M. Mark for help in shaping these ideas.