Friday, October 7, 2022

The Evaluator as Moral Fiduciary

January 8, 2021

 


 

Ernest R. House

 

            I have long advocated the idea of justice in evaluation and, in particular, the deliberative democratic approach. As society changes, we might add new techniques and practices to meet new challenges and strengthen our approach. What might these challenges be, and what should we do to meet them?

Evaluations don’t seem as effective as they once were. Evaluator Tom Schwandt (2019) contends that evaluation has entered a post-normal phase similar to that of post-normal science. In normal science, scientific studies are widely accepted by the government and the public and often acted upon. In post-normal times, findings are frequently disputed, criticized, and ignored. Environmental science is an example, as is the epidemiology of Covid-19.

            I believe these changes in evaluation and science reflect changes in the larger society. Over decades, American society has become increasingly fragmented, polarized, distrustful, unequal, and corrupt (House, 2020). In such a society, evaluations become less effective. How can we make them more effective? Evaluators might act as moral fiduciaries, cultivate cognitive empathy, focus on deep stories and deep values, control for biases, and be transparent. In this chapter, I outline significant changes in American society, delineate several practices that can strengthen evaluations, discuss conditions of their implementation, and briefly tie these practices to evaluation theory.

 

Societal Changes

 

Over the past seventy years, American society has changed dramatically. When I was young, say about 1960, there was a consensus among the majority that the government was good and that society was on the right track. This consensus emerged from World War II and Roosevelt’s New Deal (Kennedy, 1999). There was a palpable sense of unity and national purpose. During the 1960s, the consensus and trust in government began breaking down. There was the Vietnam War and the Civil Rights movement, followed by Nixon and Watergate. These events engendered discord and distrust.

By 1980 Reagan could run on the slogan, “Government is not the solution to our problem; government is the problem.” He initiated an era of privatization, deregulation, and strong anti-government sentiment. In the new century came the contested Bush/Gore election, the trauma of 9/11, the Afghan and Iraq wars, and the Great Financial Crisis. Distrust deepened. American society fragmented.

Three powerful trends emerged: a deep decline in trust, a sharp rise in inequality, and an increase in corruption. People did not trust the government, they did not trust their institutions, and they did not trust each other. The decline in trust was precipitous during Vietnam and Watergate, when trust in government dropped from 77% in 1964 to 30% by 1980 (Rothstein, 2018). By 2014 trust in government was 20%; trust in Congress was 6%. Not all countries are like this: trust in government in the Nordic countries is 70%.

Another powerful trend has been steepening inequality. In 1981 Reagan lowered the tax rate on the wealthy, followed by more tax cuts under later administrations. America now has the most extreme inequality of income and wealth of any developed country (Saez and Zucman, 2019; Stiglitz, 2018; Piketty, 2014). The three richest Americans own more than the bottom 160 million. The richest 0.1 percent have as much wealth as the lower 90 percent (Hacker and Pierson, 2020; Krugman, 2017). CEOs of major corporations used to make 20 times the average worker wage; they now make 360 times as much (Bebchuk and Fried, 2004). Top hedge fund managers make 100 million dollars per month (Saez and Zucman, 2019). Worker wages have stagnated for decades. Social mobility has declined. Nobel economist Joseph Stiglitz (2018) said the economy is rigged in favor of the wealthy. Even conservative media mogul Rupert Murdoch noted that America is developing permanent social classes.

A third trend has been worsening corruption. When Governor Blagojevich of Illinois set out to fill the Senate seat vacated by Obama, few expected him to auction the seat to the highest bidder for personal gain. Patronage had become corruption. American society was beset with cronyism, favoritism, and conflicts of interest. Special interests dominated Congress and the legislatures to secure special favors (Lessig, 2018). Seventy-five percent of Americans believe corruption is widespread throughout the government (Rothstein, 2018). Political scientists call such behavior institutional corruption.

Distrust, inequality, and corruption interact in downward spirals. The more a government is distrusted, the less it can remedy corruption or inequality, and the more it becomes vulnerable to cronyism and conflicts of interest. Distrust allows corruption that generates more distrust and inequality. These trends are reflected in dystopian novels, apocalyptic films, and polarized politics (Hacker and Pierson, 2020).

 

Act as a Moral Fiduciary

 

            How should evaluators behave in such a society? Problems in the larger society are manifested in evaluation. Evaluators encounter fragmentation, distrust, inequality, and corruption, such as conflicts of interest, in their work. They are often mistrusted and work with stakeholders holding quite diverse views. Some sources of inequality, such as tax policies, lie beyond evaluation, but evaluators can attend to inequalities, fragmentation, distrust, and corruption within the evaluation space. They can act as moral fiduciaries, assuming some responsibility for helping those lower in the unequal socio-economic structure and striving to protect the interests of the less advantaged. Here’s an example.

In 2009 eleven thousand forensic rape kits were discovered in a police warehouse in Detroit. The kits had not been processed. For the victims, undergoing a forensic rape exam is invasive; the process takes hours and involves swabbing samples from every orifice of the body. Rape is a crime often committed by repeat offenders, and rape kits provide an opportunity to identify serial rapists. Yet the kits remained unexamined for years.

In an examination of police files, Rebecca Campbell and her colleagues (Campbell, Shaw, and Fehler-Cabral, 2015) discovered that the police investigating the crimes repeatedly dismissed victim claims of rape on the grounds that the women were prostitutes, sexually permissive, or did not want their parents to know they had sexual partners. In the judgment of the police, the women’s claims were not worth pursuing as crimes. Such judgments reflected police racial framing about the presumed character of minority women. Police reports often referred to victims in highly pejorative terms.

Campbell and her colleagues worked through the cases with police until the rape kits were processed, an exhausting exercise, with police officials admitting that classifying women this way was mistaken. In a sense, the researchers appealed to empirical evidence and the police sense of fairness. This was a difficult study to conduct because it involved conflicts of deep stories and deep values. During the study, investigators kept in mind the welfare of those who had been abused. Their welfare was most at stake, not that of the police nor those who funded the study. In such conditions of inequality, evaluators might give priority to the interests of those less able to defend their interests.

What is a fiduciary? In the financial community, a fiduciary is an agent who must act in the best interests of the client. As an experienced investor, I can attest that investment transactions are riven with conflicts of interest. For example, if an investor asks the advice of a financial advisor, the advisor is free to recommend mutual funds that enrich the advisor, not the client. Advisors often receive fees or a portion of the investment from the mutual funds they recommend. Clients suffer because these are not the best investments for them. This is standard practice in finance.

Fortunately, there are advisors who swear to a higher fiduciary ethical standard. That is, they will offer advice that is in the best interests of the client. If they fail, they can be sued. Reformers have tried to make the fiduciary ethic the legal standard for investment professionals, but professionals have fought vigorously against it, an indication of how common it is to act against clients’ best interests. Nonetheless, some advisors do choose to act as financial fiduciaries.

Evaluators might act as moral fiduciaries, meaning they act in the best interests of those less able to defend their interests, rather than in the interests of the sponsors or other stakeholders. Such an ethic doesn’t preclude acting in other stakeholder interests as well, but it does mean those less advantaged will be given priority. In most programs, it’s not difficult to determine who’s less advantaged. In Detroit, Campbell and her colleagues chose to work through case records with police to process the kits. They did not seek to reform the police view overall, but to change their view about specific cases. Evaluators can take some responsibility for the severe inequality in their society by doing something about it in actual evaluations.

 

Cultivate Cognitive Empathy

 

In conducting the Detroit study, the evaluators exercised some complex skills. One was to understand what the police were thinking. That insight was key to figuring out why the rape kits had not been processed. In his analysis, Schwandt recommends that evaluators strive to understand other people’s perspectives. Evaluators cannot assume their perspective is the only perspective. This is critical in a fragmented society.

            Sociologist Mario Small (2019) calls such understanding “cognitive empathy.” He contends that social researchers are quantitatively literate, but not as qualitatively literate. Qualitative literacy entails understanding other perspectives in depth. Cognitive empathy is not feeling empathy. It’s not feeling the same as others. Nor is it sympathy, feeling sorrow or pity for them. Rather it’s the ability to understand people’s predicament as they understand it. Their view will seem rational within their perspective. We can understand why they believe the way they do. That doesn’t mean we agree with them. Certainly, Campbell and her colleagues did not agree with the police stereotyping of minority women.

Small cautions against overgeneralizing other views. He calls this avoiding “out-group homogeneity bias.” When groups are far removed from our perspective, their views appear less diverse than they are. Not all police are alike. Another caution is to be sensitive to data used as supporting evidence. For example, when journalists and others report standardized test scores, they often explain the results based on no empirical evidence whatsoever. They may contend scores are low because the education system is dysfunctional. But there is no supporting evidence about causes in the test score results. Where do such explanations come from? From beliefs and stereotypes held by those making the claims.

 

Focus on Deep Stories

 

Deep stories are the subjective prisms through which we view the world, including how we feel about it. They are stories that “feel right” emotionally. They are interpretations of events and situations that people act on. The idea of deep stories comes from sociologist Arlie Hochschild (2016), who studied Tea Party voters in Lake Charles, Louisiana. They voted for Trump overwhelmingly in 2016. In their view, life is a long march towards the American dream, which lies just over the hill. But the line they’re in is stalled. Others are cutting in line ahead of them, minorities and immigrants, people who used to be behind them. That’s not fair, in their view. Government agencies are helping these people. It’s the government’s fault. Trump also blames the government, minorities, and immigrants. He derides the elites. This is the “deep story” of the Tea Party. It’s also the deep story of Fox News, their source of news. Fairness is critical, construed a particular way.

Similarly, sociologist Katherine Cramer (2016) studied people in small towns and rural areas of Wisconsin. Their towns are struggling, though people work hard. They believe the government is taking money away from them and giving it to minorities and immigrants in the big cities. In their view, decision-makers don’t respect rural people. These resentful voters tipped the state to Trump in 2016. People act on their deep stories.

There are other deep stories, like that of the progressives in Berkeley. In their view, Americans built a magnificent public square, but marauders invaded the square, dismantled it, and stole pieces to build private mansions. Massive accumulations of wealth threaten democracy itself. That’s not fair, either. They supported Bernie Sanders (Hochschild, 2016). These are a few of America’s deep stories.

If I were evaluating an environmental education program in these communities, it would help considerably to understand their deep stories. Lake Charles has oil refineries and toxic industries. Even though people die of cancer at high rates, the residents see these industries as vital to their livelihoods and have ways of thinking about the problem, including religion. In rural Wisconsin my evaluation would be different, and different yet again in Berkeley.

 

Concentrate on Deep Values

 

Deep values are entwined in deep stories. The Louisiana story is built around deep values about race, gender, and social class, around assumptions of a hierarchy of race, gender, and social class. It’s assumed that some should be behind others in the natural order, a vision derived in part from Louisiana history, with its tradition of populist movements led by politicians like Huey Long (Hochschild, 2016). Within this perspective, social status is determined by how far a person is from the bottom. Being ahead of minorities and immigrants is a matter of entitlement.

Everyone has a hierarchy of values, with some values more central to the belief structure. These might be called deep values or core values. For most, they include family, fairness, and in-group loyalty. For some, they also include racism and sexism, deep-seated beliefs about who belongs where and deserves what. Not all deep values are good or benign. Everyone has a deep story and deep values, and it’s worthwhile for evaluators to reflect on theirs. I’ve discussed my own elsewhere; it began in childhood as the story of a Roosevelt Democrat (House, 2015). Evaluators might locate their story within the population of deep stories. Evaluators grow up in a particular region and social class and identify with those in their vocation. Having an idea of your position helps in understanding others and the constellation of perspectives in the evaluation space. An inability or reluctance to understand other perspectives is a common failure of American foreign policy. It can damage evaluations.

            I’m not suggesting that all views are relative or equally good. Some views are better than others because they are more moral and soundly based. An informed view includes a grasp of other views. A moral view takes into account the welfare of others. Even in fragmented societies, there may be room for agreement based on shared deep values. For example, in conducting an evaluation of environmental education in Lake Charles, I would focus on shared deep values. I would be unlikely to change their world view, but I might find agreement about the value of environmental education in specific areas, such as how pollution affects their children long term. They care deeply about their children’s future, and they have a sense of fairness. They want to be fair to their children. The deep value of fairness plays a central role for everyone.

 

Control for Biases to Enhance Fairness and Honesty

 

Being biased means being influenced by things evaluators should not be influenced by. Biases might be technical, like sampling error; social, like racial and sexual framing; situational, like conflicts of interest; or psychological, like inappropriate anchoring. Whatever the biases, they can result in distorted findings. Stakeholders are justified in seeing biased studies as unfair. Taking care to mitigate bias is critical in conducting fair evaluations. Fortunately, there are analyses of how to protect against different biases (Scriven, 1976; Shadish, Cook, and Campbell, 2002; Kahneman, 2011; House, 2011).

 

Focus on Racial and Sexual Framing

 

Racial framing is a key mechanism through which racist beliefs are perpetuated. It has played a huge role in American history (Feagin, 2013; DuBois, 1986). The white racial frame asserts that African Americans are violent, criminal, unintelligent, lazy, and oversexed (Feagin, 2013, p. 101), and that whites are superior. Whites are immersed in racial framing in childhood and often act on it unconsciously (House, 2017). The effects are pernicious. Something similar happens with the stereotyping of women. The tacit nature of racial and sexual framing makes such biases extremely difficult to eradicate. Evaluators should look carefully for social biases in programs and evaluations, including program effects (House, 2017). They should check their own predispositions and those of colleagues, and expect colleagues to check theirs in return.

 

Be Transparent

 

Trust is at a premium in fragmented societies. Transparency engenders trust; lack of transparency engenders mistrust. In climates of distrust, people imagine bad things are happening. Evaluators should be transparent about what they’re doing. As federal court monitor for a Denver bilingual education program beset by years of distrust between the school administration and the Latino community, I made clear to each group what I was doing. I was open to discussion and recommendations from them. I shared the data we collected and solicited advice as to what to collect next to determine whether the program was being implemented properly. Over time transparency helped establish trust in the evaluation and improved trust among the groups. Transparency is no panacea, but it helps.

 

Implementing the Practices

 

The purpose of these practices is to enhance the effectiveness of evaluations and address societal problems. The practices, singly or collectively, could strengthen many evaluations.

Acting as a moral fiduciary would be most useful in projects where there is a steep hierarchy. Inequalities in the larger society are manifested in evaluations. This can occur in large or small, formative or summative evaluations. An early evaluation task would be to carefully consider the hierarchy. Race, gender, and social class are key. The criterion for success is how well the evaluation protects and prioritizes the interests of those least advantaged.

For example, in the “Women Affirming Motherhood” program, two women developed a program to provide pre- and postnatal care to minority mothers (Alkin and Christie, 2019). Later, a large NYC foundation offered funding if the program became one of several demonstrating interagency cooperation, a foundation goal. There evolved a steep socioeconomic hierarchy extending from the mothers and infants to the chair of the foundation: the mothers and infants, those working directly with them, the program founders, the coordinating agencies, the foundation officers, and the foundation board. The hierarchy itself presented problems of understanding and communication.

In such a hierarchy, the interests of the mothers and infants might well be neglected. Those in each position have different perspectives, values, and interests, as well as differences of age, race, gender, and class. Since the program was developing, I proposed a formative evaluation focused foremost on helping the mothers and infants, those who were least advantaged. A central evaluation focus was to determine whether they were getting the services they needed. Even with the best intentions, their welfare might be overlooked amid the micropolitics and machinations of a large endeavor. We have seen the value of patient advocates who help patients in large hospitals. I envisioned the program founders, mothers, and core workers as prime audiences for the evaluation, since they had the most effect on the infants and the most urgent need for information. (For details about how this might play out, see House, 2019.)

Cultivating cognitive empathy complements any evaluation strategy, large or small, and is perhaps most useful in formative studies. There are several ways to accomplish it, depending on evaluation resources. The most likely path is for the evaluator to grasp other viewpoints through extensive discussions. Cognitive empathy should make a difference in how the evaluation is designed and received; if not, the evaluator isn’t delving deeply enough. Attaining cognitive empathy can be built into the evaluation design by including regular interactions with stakeholders as ongoing data collection in the study.

Focusing on deep stories and deep values provides a sense of direction about what to look for. What are these people about? What do they want? How do they see the world? There is a logic to their actions from their perspective. Some glimpse of this logic is invaluable in designing and carrying out an effective evaluation. Evaluators don’t need polished studies, such as those constructed by sociologists like Hochschild (2016), Cramer (2016), and Desmond (2016). Generally, the more interactions evaluators have with stakeholders, the better the possibility of understanding.

There are a few deep values that bind us as a community, including family, fairness, and in-group loyalty, as well as sexual and racial attitudes. How these are woven into deep stories marks us and leads us to act in certain ways. Since people share deep values, there is usually room for some cooperation, even amidst fragmentation.

Controlling for biases is obligatory in all evaluations. By biases I mean mistakes that can invalidate the study. If an evaluator draws an inappropriate sample, employs the wrong statistic, or uses racial or sexual stereotypes, observers might rightfully claim that the study is biased, unfair, and even invalid. Over decades evaluators and others have assembled lists of actions and conditions that can lead people to incorrect findings under certain conditions. We call these biases. Knowledge of biases is core knowledge in evaluation. These biases have been demonstrated empirically to cause errors in drawing conclusions.
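To make the point concrete, here is a minimal sketch in Python. It is my own illustration with invented numbers, not an example from any study cited here: a convenience sample that over-represents the most motivated participants manufactures a program “effect” where none exists.

    import random

    random.seed(1)

    # Hypothetical population: the program has no effect on anyone,
    # but personal "motivation" independently raises the outcome score.
    population = []
    for _ in range(100_000):
        motivation = random.gauss(0, 1)
        outcome = 50 + 5 * motivation + random.gauss(0, 10)
        population.append((motivation, outcome))

    def mean_outcome(people):
        return sum(outcome for _, outcome in people) / len(people)

    # A proper random sample recovers the true mean of 50.
    random_sample = random.sample(population, 500)

    # A convenience sample of eager volunteers (high motivation) does not.
    volunteers = [p for p in population if p[0] > 1.0]
    convenience_sample = random.sample(volunteers, 500)

    print(f"random sample mean:      {mean_outcome(random_sample):.1f}")   # about 50
    print(f"convenience sample mean: {mean_outcome(convenience_sample):.1f}")  # about 58

Both samples are large; size offers no protection against drawing from the wrong pool, which is why observers can rightly call such a study biased however carefully its statistics are computed.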

How transparent should evaluators be? There should be no secret side deals with various stakeholders; even if innocuous, these destroy trust. Explaining the study may be enough, but if some stakeholders are more interested in particular aspects of the study, evaluators should accommodate them. There is no need to explain every detail. Rather, tell them what they need to know and what they believe they need to know. Interact with them frequently and keep them informed periodically. Face-to-face meetings are most useful: people can “read” how credible you are through non-verbal cues, and you can learn much important information you never suspected. Transparency establishes trust in times of distrust.

 

Connection to Theory

 

The ethic of the moral fiduciary fits a long tradition of moral philosophy. Rawls’s (1971) theory of justice as fairness changed the dominant utilitarian conception of justice. With utilitarianism, you could justify fighting inflation by inducing a recession that forced large numbers of workers out of jobs. They would suffer, but the larger society benefited: the greatest good for the greatest number, without regard for how benefits and suffering were distributed within society. In elaborate arguments, Rawls said this was unfair. We should attend to those less advantaged as a moral duty.

Later critics of Rawls said those left out of the decision processes should also have some voice in making decisions that affect them. They could not always rely on decision makers to make the right decisions without being included in the discussions. Deliberative democratic advocates recommended the inclusion, discussion, and deliberation of stakeholders in decision making and evaluation (Gutmann and Thompson, 1996; House and Howe, 1999).

However, democratic participation alone doesn’t seem potent enough in a society so unequal. Evaluators should take stronger actions to protect the interests of those less advantaged. Acting as moral fiduciaries is one way to help. Instead of balancing the interests of different stakeholders equally or putting the interests of sponsors first, evaluators might give priority to those who need help most. The core value is fairness, fairness in both society and evaluation.

 

References

 

Alkin, M. and Christie, C. A. (Eds.). (2019). Theorists’ models in action. New Directions for Evaluation, 163, Fall.

Bebchuk, L. and Fried, J. (2004). Pay without performance. Cambridge, MA: Harvard University Press.

Campbell, R., Shaw, J., and Fehler-Cabral, G. (2015). Shelving justice: The discovery of thousands of untested rape kits in Detroit. City and Community, 14(2), June, 151-166.

Cramer, K. (2016). The politics of resentment. Chicago: University of Chicago Press.

Desmond, M. (2016). Evicted. New York: Crown.

DuBois, W. E. B. (1986). Writings. New York: Library of America.

Feagin, J. R. (2013). The white racial frame. (2nd ed.). New York: Routledge.

Gutmann, A. and Thompson, D. (1996). Democracy and disagreement. Cambridge, MA: Belknap.

Hacker, J. S. and Pierson, P. (2020). Let them eat tweets. New York: Liveright.

Hochschild, A. R. (2016). Strangers in their own land. New York: New Press.

House, E. R. and Howe, K. R. (1999). Values in evaluation and social research. Thousand Oaks, CA: Sage.

House, E. R. (2011). Conflict of interest and Campbellian validity. New Directions for Evaluation, 130, 69-80.

House, E. R. (2015). Evaluating: Values, biases, and practical wisdom. Charlotte, NC: Information Age.

House, E. R. (2017). Evaluation and the framing of race. American Journal of Evaluation, 38(2), 167-189.

House, E. R. (2019). Evaluation with a focus on justice. In M. Alkin and C. A. Christie (Eds.), Theorists’ models in action. New Directions for Evaluation, 163, Fall, 161-172.

House, E. R. (2020). Evaluating in a fragmented society. Journal of MultiDisciplinary Evaluation, 16(36), 26-36.

Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux.

Kennedy, D. M. (1999). Freedom from fear. New York: Oxford University Press.


Krugman, P. (2017). The gilded age. In H. Boushey, J. B. DeLong, and M. Steinbaum (Eds.), After Piketty. Cambridge, MA: Harvard University Press, 60-71.

Lessig, L. (2018). America, compromised. Chicago: University of Chicago Press.

Piketty, T. (2014). Capital in the twenty-first century. Cambridge, MA: Belknap.

Rawls, J. (1971). A theory of justice. Cambridge, MA: Belknap.

Rothstein, B. (2018). How the trust trap perpetuates inequality. Scientific American, 319(5), 7.

Saez, E. and Zucman, G. (2019). The triumph of injustice. New York: W. W. Norton.

Schwandt, T. A. (2019). Post-normal evaluation? Evaluation, 25(3), 317-329.

Scriven, M. (1976). Evaluation bias and its control. In G. V Glass (Ed.), Evaluation studies review annual. Beverly Hills, CA: Sage, 119-139.

Shadish, W. R., Cook, T. D., and Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Small, M. L. (2019). Rhetoric and social science in a polarized society. 2019 Spencer Lecture, American Educational Research Association, Toronto, Canada, May 22.

Stiglitz, J. E. (2018). A rigged economy. Scientific American, 319(5), 57-61.

Thursday, October 6, 2022

Conflict of Interest and Campbellian Validity

Validity Reconsidered
August 19, 2010


Ernest R. House

 

Abstract: The conflicting interests of evaluators are biasing the findings of some evaluations and experiments, even technically rigorous studies. One improvement would be to emphasize evaluator and investigator conflict of interest threats in conceptions of validity. Such discussions could suggest ways to assess and avoid conflicts of interest. I explore the possibilities in one highly regarded framework, the Campbellian conception of experimental validity. However, all evaluations and experiments, whatever their methods, are vulnerable, and all conceptions of validity should address such threats.

 

When I reviewed the medical research literature a few years ago, I discovered statements in journals that the pharmaceutical drug evaluations under review were biased and that the same studies were high quality (House, 2008). What could this mean? How could the same studies be rigorous and biased simultaneously? What the reviewers seemed to mean was that the studies had randomized assignment and double-blind controls, markers of rigorous design, but were manipulated in other ways to produce positive results. Here are some sources of bias in these studies:

Opportunistic choice of comparison (Placebo rather than competitor)

Improper choice of sample (Younger subjects suffer fewer side-effects)

Manipulation of dosages (Higher dosages for sponsor drugs)

Incorrect administration of drugs (E.g., oral instead of injected)

Manipulation of time scales (Chronic use drugs tested for short periods)

Opportunistic outcome selection (Ignoring possible side-effects)

Ignoring actual negative effects

Redefining goals of study after findings to achieve success

Opportunistic data analyses

Opportunistic interpretations (“This drug is now the treatment of choice.”)

Concealing unfavorable data

Control of authorship (Company employees, rather than researchers, writing reports)

Selective publishing (Publishing only positive findings)

Deceptive publishing (Publishing positive findings repeatedly under different authors)

These biases are deliberate. That is, many drug evaluations are being designed and conducted to deliver positive findings (Angell, 2004). The studies are biased even while adhering to rigorous design safeguards. There is a sense in which rigorous methods, the way we normally think of them, are insufficient for claiming the studies are valid. That’s not to say that rigorous methods are not necessary. In these late-stage clinical drug trials, there is no doubt that they are.
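The cumulative force of these manipulations is easy to demonstrate. Here is a minimal sketch in Python, a hypothetical simulation of my own rather than data from any cited study, of a single item on the list, opportunistic outcome selection: a trial of a worthless drug measures fourteen outcomes and reports whichever one happens to reach statistical significance.

    import random
    import statistics

    random.seed(0)

    def trial_reports_success(n_outcomes=14, n_per_arm=100):
        # Two-arm trial of a drug with zero true effect. Returns True if any
        # one of the measured outcomes looks "significant" (roughly p < .05).
        for _ in range(n_outcomes):
            treated = [random.gauss(0, 1) for _ in range(n_per_arm)]
            control = [random.gauss(0, 1) for _ in range(n_per_arm)]
            diff = statistics.mean(treated) - statistics.mean(control)
            se = (statistics.variance(treated) / n_per_arm
                  + statistics.variance(control) / n_per_arm) ** 0.5
            if abs(diff / se) > 1.96:
                return True  # report this outcome; quietly drop the rest
        return False

    positives = sum(trial_reports_success() for _ in range(1_000))
    print(f"trials reporting a positive finding: {positives / 10:.0f}%")  # roughly 50%, not 5%

The nominal false-positive rate of 5 percent holds only if one outcome is specified in advance. Testing many outcomes and reporting the best one multiplies the error rate roughly tenfold here, with randomization and double-blinding fully intact.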

Drug evaluations are controlled by sponsoring companies for whom positive findings mean billions of dollars in profits. Testing a new drug for the market is risky and expensive. Drug companies bear the cost. And, apparently, leaving the results to honest evaluation is too chancy for some. Factors that strongly influence findings inappropriately include these:

Sponsorship of the study

Terms of the contract

Financial ties

Proprietary ties

Personal ties

Gifts

 

Indeed, vested interests have been shown empirically to influence findings (Als-Nielsen, Chen, Gluud, & Kjaergard, 2003). In spite of such evidence, some researchers claim that they are not influenced by the large sums of money paid by companies, nor are they influenced by holding patents. It does not help their credibility that many conceal their financial ties to companies.

How serious is the problem? In 2008 Senator Grassley held hearings comparing drug company payout records to the fees medical researchers said they received. For example, from 2000 to 2007, Charles Nemeroff, an eminent psychiatrist and editor of Neuropsychopharmacology, was chief investigator of a $3.9 million NIH grant that used GlaxoSmithKline drugs. NIH rules require investigators to report income of $10,000 or more in any year to NIH and require universities to replace such investigators. Nemeroff reported earning $35,000 from Glaxo; actually he earned $960,000 (Harris, 2008). In total, he received $2.8 million from drug companies and failed to report $1.2 million of it, a violation of NIH rules. Child psychiatrist Joseph Biederman of Harvard Medical School and his colleague Timothy Wilens reported receiving several hundred thousand dollars each from drug makers; actually they earned $1.6 million each. Such researchers serve on panels that approve drugs. Senator Grassley said universities seem incapable of policing these conflicts of interest.

Drug evaluations are not the only manifestation of conflict of interest (COI). Gorman and colleagues conducted analyses showing conflicts of interest in the evaluation of school-based drug and violence prevention programs (Gorman & Conde, 2007; Gorman & Huber, 2009; Gorman, Conde, & Huber, 2007; Gorman, 2005; Gorman, 2002). Evaluator independence was lacking in these studies, and the financial interests of the evaluators were affected by the findings. Studies contained questionable analyses, opportunistic changes in outcome variables, and weaknesses in selection and retention. Based on these studies the programs were placed on the Department of Education’s list of evidence-based model programs. The push to be included on evidence-based recommendation lists has intensified conflicts of interest because the designation is marketable (Gorman, 2006).

In noting that drug use prevention program findings had little validity, Moskowitz (1993) reported the studies suffered from conflicts of interest. Similarly, Eisner (2009) questioned disparities in findings from criminology studies done independently versus those done by evaluators with conflicts of interest. Although there are unresolved issues, it is safe to conclude that conflicts of interest are widespread and worsening (Feinstein, 1988; MacCoun, 1998).

 

Campbell and Stanley’s Conception of Validity

 

One step in dealing with conflict of interest threats would be to emphasize them in our conceptions of validity. My example here is Campbell and Stanley’s conception of experimental validity and its later revisions, but these concerns apply to all conceptions of validity. As my colleagues in this volume note, Campbell and Stanley’s typology is a conception of experimental validity, not of evaluation. However, the Campbell and Stanley framework and approaches derived from it serve as the core of many evaluations, and these validity discussions influence evaluations employing experimental methods. Drug studies are both evaluations and experiments. Although I am focused on evaluations, similar considerations apply to experimental studies in which the stakes are high and variously sponsored, as with many environmental, biological, and ecological studies. Experimental studies are vulnerable to investigator conflict of interest, depending on circumstances.

The seminal work on experimental design is Campbell and Stanley (1963), followed by Cook and Campbell (1979) and Shadish, Cook, and Campbell (2002). I start with the original conception to illustrate that conceptions of validity are products of their time, that validity conceptions take shape from the situation, and that situations change. Here are the threats to validity enumerated in Campbell and Stanley (i.e., other effects confounded with treatment effects):

History—Events occurring between first and second measurements

Maturation—Processes occurring within respondents

Testing—Effects of test taking on later testing

Instrumentation—Changes in instruments, observers, and scorers

Statistical regression—Operates where groups are selected on extreme scores

Biases—Differential selection of respondents for comparison groups

Experimental mortality—Differential loss of respondents from groups

Selection-maturation interactions

Reactive or interactive effects of testing

Interactive effects of selection biases and experimental variables

Reactive effects of experimental treatment precluding generalization

Multiple treatment interference

These threats are not deliberate. They are accidents and artifacts of experimental conditions. Colleagues inform me that Campbell was aware of potential investigator biases: “…scientists are thoroughly human beings: greedily ambitious, competitive, unscrupulously self-interested, clique-partisan, biased by tradition and cultural misunderstandings…” (Campbell, 1984, p. 31). He goes on: “The complete sociology of applied-science validity…would take into account environmental impacts on commitments to validity which applied science careers involve” (Campbell, 1984, p. 41).
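Most of these accidental threats are easy to reproduce. Here is a minimal sketch in Python, with assumed numbers of my own, of one threat from the list above, statistical regression: students selected for extremely low pretest scores “improve” on a posttest even though nothing whatever was done to them.

    import random

    random.seed(2)

    # Each student has a stable true ability; each test adds independent luck.
    abilities = [random.gauss(100, 10) for _ in range(10_000)]
    pretest = [a + random.gauss(0, 10) for a in abilities]
    posttest = [a + random.gauss(0, 10) for a in abilities]

    # Select a "remedial" group: the lowest-scoring tenth on the pretest.
    cutoff = sorted(pretest)[len(pretest) // 10]
    remedial = [i for i, score in enumerate(pretest) if score < cutoff]

    pre_mean = sum(pretest[i] for i in remedial) / len(remedial)
    post_mean = sum(posttest[i] for i in remedial) / len(remedial)
    print(f"remedial group pretest mean:  {pre_mean:.1f}")   # about 75
    print(f"remedial group posttest mean: {post_mean:.1f}")  # about 88

The apparent gain is pure regression toward the mean: the lowest scorers were partly unlucky on the pretest, and luck does not repeat.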

Why then wasn’t conflict of interest included in the original framework? My view is that there was no reason to include it at the time. Conflicts of interest were not major threats. In the 1950s and 1960s it made good sense to ignore deliberate manipulation of findings. Perhaps we were too naïve, but the belief was that researchers aimed for findings that contributed to theory and practice. What was the point in manipulating findings? In any case, findings were subject to review and replication.

I did not emphasize conflict of interest in my work at the time, nor did most evaluators. The exception is Michael Scriven, who worried about government reporting (Scriven, 1973). In the 1980s the situation changed. Reagan began privatizing and deregulating many functions of government, including research and evaluation, trends which continued in ensuing administrations and led to a different environment.

 

Revised Conceptions

 

The original Campbellian conception of validity was brilliantly conceived, tremendously influential, and a product of its time. Campbell created the internal/external distinction because students mistakenly believed that Fisherian randomization controlled all threats to validity (Shadish, Cook, and Campbell, 2002, p. 37, footnote 3). Cook (2004) has explicated the changes in conception in Cook and Campbell (1979) and Shadish, Cook, and Campbell (2002).

In Shadish, Cook, and Campbell (2002), the latest version, validity refers to the approximate truth of an inference. According to the authors, inferences from studies invariably involve human judgment and are not absolute or automatically derived. Nor are inferences guaranteed by particular methods or designs. The conception is pragmatic in that it emphasizes ruling out alternative explanations. Some threats can be recognized early and blunted by design controls. Others are not easily anticipated and must be considered post hoc. Researchers can investigate threats by asking how they apply in particular cases, whether they are plausible, and whether they operate in the same direction as program effects. Threats change over time, and their saliency varies from study to study. Lists of threats serve as useful heuristics for investigators, though such lists are not complete. I find this conception quite reasonable, without delving into the sub-categories of statistical, internal, construct, and external validity. It is admirably flexible and allows for changing times.

The revised conception also fits my notion of how validity is determined generally. Determining validity requires evaluating the study itself. In analyzing the logic of evaluative argument, I suggested that evaluations consist of arguments that piece together strands of information, quantitative and qualitative, general and particular (House, 1977). Cronbach used this idea to recast standardized test validity, which at that time consisted of establishing correlations among test scores (Cronbach, 1989). He contended that test validation should be based on broad justifying arguments. The idea of validity based on arguments makes sense.

Elsworth (1994) extended the original conception in another direction. Working from Campbell’s relabeling of internal validity as local molar causal validity, Elsworth applied a scientific realist view to validating field studies. For scientific realists, social systems are open, and inferences are validated by explanatory power. In social systems, unknown intervening causal interactions are always possible and often undetected. Elsworth also construed threats as plausible alternative explanations, with threats countered by design and by post hoc analyses involving constructing arguments for and against (Dunn, 1982). Threats can be discovered by examining particular situations. Elsworth’s conception is compatible with Shadish, Cook, and Campbell’s. (See Mark, Henry, and Julnes (2000) for another conception based on scientific realism and Norris (2005) for an overview of the concept of validity.)

Most discussions of validity do not emphasize conflict of interest threats. In Shadish, Cook, and Campbell, motivational threats to construct validity include these:

Self-report motivation

Participant perceptions of situation

Experimenter expectancies

Novelty and disruption effects

Compensatory equalization

Compensatory rivalry

Resentful demoralization

These threats result from participant motivations, not investigator motivations, except for experimenter expectancies. The latter refers to Rosenthal’s work on how experimenter expectations influence participants, not on how researchers deliberately bias findings. Elsworth included “coercion” of the investigator to the perspectives of the group studied, addressing the danger of “going native,” but not evaluator motivation to change findings out of self-interest. Remedies presented by Shadish, Cook, and Campbell and by Elsworth include placebos, masking procedures, less obvious measures, delayed outcome measures, multiple experimenters, and minimizing contacts between researchers and participants.

Conflicts of interest are a different kind of threat, arising from the objective situation. They cause investigators to behave differently. Conflicts of interest require bringing the investigators themselves into the validity analysis. Some might argue that these threats should be handled outside validity frameworks, but that would leave these schema ignoring serious threats. To echo Campbell’s concern about Fisherian randomization, the omission might lead students and others to think all threats are controlled when they are not. Others might argue that these threats can be handled solely through technical controls, but that remedy has not worked well in drug studies. A better solution is to address conflicts of interest directly. Judgments of validity should consider not only technical safeguards but also the conflicts of interest of investigators.

Including Conflict of Interest Threats

How might such threats be included in the Shadish, Cook, and Campbell conception of validity? The framework is an individual-psychological model of causal inference that posits investigators drawing inferences backed by evidence. Right now, it does not emphasize conflict of interest. Being a different kind of threat, evaluator conflicts of interest can adversely affect anything in the evaluation, as the drug studies demonstrate, and can be manifested in all four validity types. For example, drawing inappropriate conclusions might threaten external, internal, or construct validity. Tracking all such possibilities would require enormous effort.

Another way to address such threats would be to discuss whether the evaluator/investigator is without conflict at the beginning, to handle conflict of interest or its absence as an explicit assumption early on. Conflict of interest threats could be addressed at the front edge of the framework, so to speak, as an assumption that might be violated. That would keep the four validity types (statistical conclusion, internal, construct, and external) intact. By modifying the front part of the causal inference model, one could discuss conflicts of interest, how they can be discovered, and remedies for dealing with them. In other words, this is what happens when a key assumption is violated and how to remedy it.

Such an expansion of the conception makes sense theoretically by placing the assumption where it fits the causal inference model, and practically by treating conflict of interest threats succinctly without disturbing other components. (There might be other assumptions.) Cook (2004) noted that the Campbellian schema has sometimes valued practical fixes at the expense of building inconsistencies into major concepts. In this case there is no need for theoretical inconsistency. The proposed revision fits the causal inference model.

In my opinion, changes to the schema should be done by those who have deep tacit knowledge of the framework. Nonetheless (prodded by editors), I would enumerate conflict of interest threats and inquire if the evaluator/investigator harbors such conflicts. The relationship of the evaluator to sponsors and program is central. If conflicts of interests do obtain, I would ask if the study should proceed, or if underway or completed, whether the study should receive close independent scrutiny. I would explore remedies developed by those who have wrestled with these problems, such as the FDA, the medical journals, and the medical schools. Of course, these are suggestions. There may be better ways to handle such threats. The important point is to handle them somewhere in the framework.

 

Remedies

 

Strategies for dealing with conflicts of interest include transparency, oversight, and organization.

Transparency. Revealing conflicts of interest is critical. When participants serve on study panels for the National Academy of Sciences, they serve unpaid, a requirement set down by Abraham Lincoln when he founded the Academy (Feuer, 2009). Lincoln stipulated that the organization should serve the public and not merely enhance professional status. Panel members are asked to reveal potential conflicts of interest that might impair the task at hand. These written statements are reviewed by Academy staff and discussed by fellow panel members to determine how serious the conflicts might be.

For example, in a review of NASA’s education programs, some panelists, particularly those with previous NASA affiliations, might have conflicts. One left the panel when he decided to bid on a NASA contract. Revealing such possibilities in writing is not an odious requirement, and similar safeguards would not be asking too much of evaluators. Transparency, revealing potential conflicts of interest, might be required of evaluators conducting studies, with written declarations part of the record.

Another transparency tactic would be to make data and methods accessible for criticism. Although transparency seems critical in any scientific endeavor, it is not always practiced. Indeed, drug companies often do not make data available, though formally required to do so.

Oversight. Revealing conflicts of interest voluntarily is not enough. For example, the credibility of NAS reviews is reinforced by oversight and organizational arrangements. First, the review is performed by a presumably impartial scientific organization. Second, panel members are chosen for expertise, and the panel is balanced with members representing different viewpoints. Third, the panel is given a formal charge, like a jury; most panelists take such charges seriously. Fourth, part of the charge is that panel members agree on a consensus report, if possible. This requirement forces members to engage in extended discussions and arguments. Serious deliberation leads to better informed and more impartial findings (if done properly). Finally, the draft report is subject to external reviews.

The FDA requires advisory panel members to reveal conflicts of interest in writing, and these declarations are assessed by FDA staff (Committee on the Assessment of the US Drug Safety System, 2006). Panel members who receive $50,000 or more from a company with a product at issue are not allowed to serve. Those who receive less can serve but are not allowed to vote on drug approval. Vioxx would not have been returned to the marketplace if this rule had been in effect (Harris, 2007): ten of the 32 advisers voting for approval would have been disqualified.

            Some medical schools have restricted gifts to students and faculty (Moynihan, 2003). A recent study found medical students were influenced favorably toward products by simple gifts like coffee mugs and pens with the drug’s name on them, even while believing they were not susceptible to influence (Nagourney, 2009). The American Medical Student Association endorses stricter guidelines such as limiting visits of drug representatives, restricting events sponsored by industry, and limiting journals accepting advertising (Moynihan, 2003).

Until recently, medical research journals regarded themselves as neutral conduits for scientific information, but several incidents have led to reassessment. The New England Journal of Medicine retracted a key article about Vioxx because Merck failed to include negative data about cardiovascular risk, though the missing data were reported to the FDA (Armstrong, 2006; Zimmerman and Tomsho, 2005). Eventually the drug was recalled. Editorial missteps led the journal not to challenge the original manuscript. Drug advertising contributes significant journal revenue: the journal received $88 million in publishing revenue in 2005 and $697,000 from Merck for reprints of the Vioxx article.

Journal authors are asked to reveal financial ties and sign statements that they wrote the article. Company employees have been ghostwriting articles and having MDs sign (Singer, 2009). Journals also are requesting that all studies be registered in an NIH database so they can track studies never reported. Sometimes positive findings for drugs are published by different authors in different journals without cross-reference, as if these were separate studies. Without access to all studies, reviewers are likely to base recommendations on biased evidence unknowingly (Melander, Ahlqvist-Rastad, Meijer, & Beermann, 2003).

 

Organization. Often conflicts of interest arise from the organization of the evaluation enterprise. For example, drug companies have become involved in every aspect of studies (Bodenheimer, 2000). Companies hire contract research organizations to implement company designs, and these contract organizations are heavily dependent on the firms for funding (Bodenheimer, 2006). In the 1990s, the FDA began accepting user fees for clinical trials and negotiating with companies about how to do reviews, giving the industry leverage over the agency (Mathews, 2006).

            Less obvious obstacles to honest evaluation lie in the internal organization of the FDA (Committee on the Assessment of the US Drug Safety System, 2006). Two offices inside the agency are responsible for evaluating drugs. One oversees clinical trials, while the other tracks the long-term effects of drugs already on the market. Clinical trials receive most of the funding and authority because they make or break success initially. Also, the clinical trial office employs higher-status research methods, while the follow-up office uses epidemiological methods to track effects.

The thousands of drugs on the market could all interact with each other. No clinical trial, no matter how huge, could anticipate interactions across all patients. Clinical trials necessarily focus on efficacy, not drug safety. An Institute of Medicine review recommended more funding and authority for the follow-up office, which is seriously underfunded (Committee on the Assessment of the US Drug Safety System, 2006). These problems reside in the organization of the industry and its interaction with its regulators.

            Some time ago Scriven (1976) posited three principles of organizational bias control. The principle of independent feedback is that no unit should rely entirely on a subunit for evaluative information about that subunit. The second principle, the instability of independence, is that independence is fleeting and subject to compromise. Hence, there must be arrangements for renewal and replacement as evaluators become co-opted. The third principle, dynamic equilibrium, is that although there are no totally unbiased evaluators, there are arrangements that can reduce the influence of the most damaging biases. “…independence is essential, impermanent, and situational” (Scriven, 1976, p. 139).

Scriven’s clarification of the concept of “bias” is also worth reiterating. Sometimes bias refers to the systematic tendency to make errors and sometimes to actually making the errors. The former meaning is crucial in maintaining the credibility of evaluations, while the latter affects validity. We need both credibility and validity. Conflicts of interest increase the tendency to commit errors. In spite of this, evaluators sometimes conduct valid evaluations even when the findings weigh against their own material interests. Unfortunately, the track record is not encouraging. By reducing conflict of interest, the potential for bias that gives rise to actual bias, we can improve both credibility and validity.

 

Summary

 

Although drug evaluations are technically rigorous, some are manipulated to produce favorable findings. Evaluator and investigator conflicts of interest are becoming increasingly serious and widespread. Our conceptions of validity do not emphasize conflict of interest threats, including Campbell and Stanley’s conception of experimental validity and its later revisions. Including conflict of interest threats in validity frameworks is important; it is best to address such threats directly so everyone will be alerted. Such threats could be addressed in various places within the Campbellian framework. One reason the tradition has endured nearly fifty years is that it has long prized practical responses to new problems. Campbell noted, “A validity producing social system of science is nothing we should take for granted” (Campbell, 1984, p. 32). Of course, other validity conceptions should also address these threats.

 

 

           

References

 

Als-Nielsen, B., Chen, W., Gluud, C., & Kjaergard, L. L. (2003). Association of funding and conclusions in randomized drug trials. Journal of the American Medical Association, 290(7), 921.

Angell, M. (2004). The truth about the drug companies. New York: Random House.

Armstrong, D. (2006). How the New England Journal missed warning signs on Vioxx. Wall Street Journal, May 15.

Bodenheimer, T. (2000). Uneasy alliance: Clinical investigators and the pharmaceutical industry. New England Journal of Medicine, 342(20), 1539-1544.

Campbell, D. T. & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Campbell, D. T. (1984). Can we be scientific in applied social science? In R. F. Connor, D. G. Altman, and C. Jackson (Eds.), Evaluation studies review annual. Beverly Hills, CA: Sage.

Committee on the Assessment of the US Drug Safety System. A. Baciu, K. Stratton, & S. P. Burke (Eds.). (2006). The future of drug safety: Promoting and protecting the health of the public. Institute of Medicine. Washington, DC: National Academies Press.

Cook, T. D. (2004). Causal generalization. In M. C. Alkin (Ed.), Evaluation roots. Thousand Oaks, CA: Sage, 88-113.

Cook, T. D. & Campbell, D. T. (1979). Quasi-experimentation: Designs and analysis issues for field settings. Chicago: Rand McNally.

Cronbach, L. J. (1989). Construct validation after thirty years. In R. L. Linn (Ed.), Intelligence: Measurement, theory, and public policy. Urbana, IL: University of Illinois Press, 147-171.

Dunn, W. N. (1982). Reforms as arguments. Knowledge: Creation, Diffusion, Utilization, 3, 293-326.

Eisner, M. (2009). No effects in independent trials: Can we reject the cynical view? Journal of Experimental Criminology, 5, 163-183.

Elsworth, G. R. (1994). Arguing challenges to validity in field research. Knowledge: Creation, Diffusion, Utilization, 15 (3), 321-343.

Feinstein, A. R. (1988). Fraud, distortion, delusion, and consensus: The problems of human and natural deception in epidemiologic science. American Journal of Medicine, 84, 475-478.

Feuer, M. (2009). Science advice as procedural rationality: Reflections on the National Research Council. Centre for the Philosophy of Natural and Social Science. London: London School of Economics.

Gorman, D. M. (2002). Defining and operationalizing “research-based” prevention: A critique (with case studies) of the US Department of Education safe, disciplined, and drug-free schools exemplary programs. Evaluation and Program Planning.

Gorman, D. M. (2005). Does measurement dependence explain the effects of the Life Skills Training program on smoking outcomes? Preventive Medicine, 40, 479-487.

Gorman, D. M. (2006). Conflicts of interest in the evaluation and dissemination of drug use prevention programs. In J. Kleinig & S. Einstein (Eds.), Intervening in drug use: Ethical challenges. Huntsville, TX: Sam Houston State University, 171-187.

Gorman, D. M. & Conde, E. (2007). Conflict of interest in the evaluation and dissemination of “model” school-based drug and violence prevention programs. Evaluation and Program Planning, 30, 422-429.

Gorman, D. M., Conde, E., & Huber, J. C., Jr. (2007). Drug and Alcohol Review, 26, November, 585-593.

Gorman, D. M. & Huber, J. C. (2009). The social construction of “evidence-based” drug prevention programs. Evaluation Review.

Harris, G. (2006). Study condemns F.D.A.’s handling of drug safety. New York Times, September 22, A1.

Harris, G. (2007). F.D.A. limits role of advisers tied to industry. New York Times, March 22.

Harris, G. (2008). Top psychiatrist didn’t report drug makers’ pay, files show. New York Times, October 4, A1.

House, E. R. (2008). Blowback: Consequences of evaluation for evaluation. American Journal of Evaluation, 29(4), 416-426.

House, E. R. (1977). The logic of evaluative argument. Los Angeles: Center for the Study of Evaluation, UCLA.

MacCoun, R. J. (1998). Biases in the interpretation and use of research results. Annual Review of Psychology, 49, 259-287.

Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation. San Francisco: Jossey-Bass.

Mathews, A. W. (2006). Drug firms use financial clout to push industry agenda at FDA. Wall Street Journal, September 1.

Melander, H., Ahlqvist-Rastad, J., Meijer, G., & Beermann, B. (2003). Evidence b(i)ased medicine—selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. British Medical Journal, 326: 1171-1173, 31 May.

Moskowitz, J. M. (1993). Why reports of outcome evaluations are often biased or uninterpretable. Evaluation and Program Planning, 16, 1-9.

Moynihan, R. (2003). Who pays for the pizza? Redefining the relationships between doctors and drug companies. 2: Disentanglement. British Medical Journal, 326, 1193-1196, 31 May.

Nagourney, E. (2009). Small gifts found to influence doctors. New York Times, May 19, D6.

Norris, N. (2005). Validity. In S. Mathison (Ed.) Encyclopedia of Evaluation, Thousand Oaks, CA: Sage, 439-442.

Pollack, A. (2008). Stanford to ban drug makers’ gifts to doctors, even pens. New York Times, September 12, C2.

Scriven, M. (1976). Evaluation bias and its control. In G. V Glass (Ed.), Evaluation studies review annual. Beverly Hills, CA: Sage, 119-139.

Singer, N. (2009). Medical papers by ghostwriters pushed therapy. New York Times, August 5, A1, B2.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Zimmerman, R. & Tomsho, R. (2005). Medical editor turns activist on drug trials. Wall Street Journal, May 26.

 

Special thanks to G. Elsworth; Gene V Glass; D. Gorman; Steve Lapan and his students; and M. Mark for help in shaping these ideas.
