Thursday, October 6, 2022

Evaluation’s Conflicted Future

August 15, 2013

Ernest R. House

Abstract. During the past forty years the social context of evaluation has changed. The focus has shifted from concern for the public interest to fostering private interests. As a consequence of privatization and deregulation, conflicts of interest have become pervasive. In the spirit of evaluation as a transdiscipline, what can evaluators learn from other areas to prevent evaluator conflicts of interest that lead to biased studies?

During my career, the social context of evaluation has changed greatly. Evaluation gained its impetus during Lyndon Johnson’s presidency with passage of the Great Society legislation in 1965, which mandated evaluation for some federal programs for the first time. Historians interpreted these initiatives as a continuation of Roosevelt’s New Deal, going back to the reforms of the 1930s. Through the sixties and seventies, the role of evaluation seemed to be to legitimate government initiatives by evaluating them. (“We aren’t sure this will work, but we will evaluate it to see.”)

However, in the eighties Reagan began reversing fifty years of these policies by privatizing, deregulating, and discrediting government endeavors. The private sector could do things better, he maintained. In addition to new government priorities, this decade also saw the rise of conservative “think tanks,” organizations funded with private money to conduct research that furthered conservative causes.

During the Clinton administration, privatization and deregulation continued, including repeal of the Glass-Steagall Act, which had separated riskier investment banking from other banking activities. The government refused to regulate the burgeoning derivatives trade. These decisions led directly to the financial crisis in 2008. Clinton and Gore also tried “reinventing” government by making government the manager rather than the producer of social services. In such a framework, evaluators would supply timely information to managers.

In the new century Bush embarked on more radical privatizing and deregulating initiatives. Attention to private interests, rather than the public interest, became increasingly important. In many areas there was no significant government regulation or oversight. In the education field, privatization, deregulation, and de-professionalization crossed new boundaries. Private foundations and other agents of concentrated wealth sponsored many of these changes.

These powerful social currents also affected evaluation. Conservative think tanks conducted evaluations, including studies biased by the agenda of the sponsors. In some areas, private entities captured the evaluation process itself. For example, as recently as the eighties, evaluations of pharmaceutical drugs were funded by federal agencies and conducted mostly by universities. However, pharmaceutical companies took increasing control and became involved in all aspects of drug evaluations. The result has been an increase in biased studies that produce findings favorable to the sponsored drugs.

There is no reason to believe these powerful influences will not affect social and educational evaluations. The capture of evaluation by its sponsors is the greatest threat the evaluation community has faced for some time. In fact, the credibility of the field is at risk. In the spirit of Scriven’s conception of evaluation as a transdiscipline that provides insights and methods useful across different fields, what might we learn from other areas about threats emanating from conflicts of interest?

Conflicts of Interest in Pharmaceutical Studies

In the evaluation of pharmaceutical drugs, the latest scandal involves Medtronic’s Infuse, a bone growth protein used in a quarter of the 432,000 spine surgeries performed annually. In an unusual action, five spinal surgeons, led by Eugene Carragee of Stanford, challenged the published work of 15 other surgeons who conducted 13 clinical trials evaluating Infuse. In journal articles, the Medtronic-sponsored surgeons had failed to report serious complications resulting from use of Infuse (Meier and Wilson, 2011; Carreyrou and McGinty, 2011). The unreported complications included cancer, sterility, infections, and bone loss in 10% to 50% of patients. The surgeons who failed to report these findings had received $62 million collectively from Medtronic. Often, these surgeons did not reveal their financial ties to Medtronic. The challenging surgeons said, “It harms patients to have biased and corrupted research published. It harms patients to have unaccountable special interests permeate medical research” (Meier and Wilson, 2011, pp. B1-2).

This is not an unusual case. Even though most drug evaluations are randomized and double-blinded, there are many ways to bias studies to produce positive findings (House, 2008). Sources of bias include these:

  • Opportunistic choice of comparison (Placebo rather than competitor)
  • Improper choice of sample (Younger subjects suffer fewer side-effects)
  • Manipulation of dosages (Higher dosages for sponsor drugs)
  • Incorrect administration of drugs (E.g., oral instead of injected)
  • Manipulation of time scales (Chronic use drugs tested for short periods)
  • Opportunistic outcome selection (Ignoring possible side-effects)
  • Ignoring actual negative findings
  • Redefining goals of the study after findings to achieve success
  • Opportunistic data analyses
  • Opportunistic interpretations (“This drug is now the treatment of choice.”)
  • Concealing unfavorable data
  • Controlling authorship (Company employees, not researchers, writing reports)
  • Selective publishing (Publishing only positive findings)
  • Deceptive publishing (Publishing positive findings repeatedly under different names)

These biases are deliberate. That is, many drug evaluations are designed and conducted to deliver positive findings (Angell, 2004). The studies are biased even while adhering to some rigorous design standards. There is a sense in which rigorous methods, as we normally think about them, are insufficient. That’s not to say that rigorous methods are not necessary. In these clinical drug trials, they certainly are.

Currently, drug companies have great control over the evaluations. Positive findings can mean billions of dollars in profits, and testing a new drug is risky and expensive. Apparently, leaving the results to honest evaluation is too chancy for some. Factors that influence findings inappropriately include sponsorship of the study, terms of the contract, financial ties, proprietary rights, personal ties, and gifts.

Not surprisingly, vested interests have been shown empirically to influence medical research findings (Als-Nielsen, Chen, Gluud, & Kjaergard, 2003). In one study, even the gift of a free coffee mug with the name of the drug imprinted led medical students to have a favorable view of the drug (Nagourney, 2009). In spite of such evidence, many medical researchers claim that they are not influenced by the large sums of money paid by companies, nor by holding patents on the drugs. It doesn’t help their credibility that many conceal their financial ties to companies.

Conflicts of Interest in the Great Financial Crisis

Evaluative and quasi-evaluative judgments in the financial world are also revealing. The financial event of our time has been the collapse of the markets in 2008, the worst crisis since the 1930s. Conflict of interest was a major cause. In the old days when you wanted a mortgage to buy a house, you went to your local banker or savings and loan officer, who evaluated your creditworthiness, issued you a mortgage, and held the mortgage for the next 30 years. If the loan went bad, the lender suffered a loss. Ensuring that you could handle the loan was in the lender’s interest as well as yours. It was important to evaluate the mortgage applicant accurately.

In the new mortgage system, the lender didn’t keep the loan but sold it to others, who sold it to others. The profit for the local lender was in making loans and passing them on as quickly as possible. Because lenders were not stuck with the mortgages long-term, they had fewer incentives to assess the borrower’s reliability carefully. Mortgage quality deteriorated into “liar” and NINJA loans, made to borrowers with no income, no job, and no assets who lied to obtain mortgages (Morgenson and Rosner, 2011). (In 2009 after the crisis, I talked to a lender of sub-prime mortgages in Chicago. When asked why these practices were followed, he said, “Among my thirty sales people, I had a young woman in her twenties who had no previous experience with mortgages. The first year she made a million dollars. The second year, she made more.” She had no incentive to care about the borrowers or those buying the mortgages. Nor did her boss.)

But who would buy such risky mortgages? Investment bankers came up with the idea of packaging them with more traditional mortgages, and they developed quantitative models showing the probability of all failing together was low. Why were the bankers keen on promoting such unknown risky securities? Their personal compensation depended on how much they sold. And the bonuses were huge. If the securities failed later, there was no penalty for them. What did they care what happened to the buyers (Roubini and Mihm, 2010)?
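
To see how a quantitative model could report a low probability of all the mortgages failing together, consider a stylized sketch. It is not a reconstruction of the bankers' actual models, which are not described here. It simply contrasts two assumptions: that loans default independently, and that a single shared event (for instance, a housing downturn) raises every loan's default rate at once. The function names and all of the numbers are hypothetical.

```python
# Stylized illustration only; every number below is hypothetical.

def prob_all_default_independent(p: float, n: int) -> float:
    """Chance that all n mortgages default if each defaults independently."""
    return p ** n


def prob_all_default_common_shock(
    p_shock: float, p_in_shock: float, p_in_calm: float, n: int
) -> float:
    """Chance that all n default when one shared event (say, a national
    housing downturn) raises every loan's default probability at once."""
    return p_shock * p_in_shock ** n + (1 - p_shock) * p_in_calm ** n


n_loans = 5
p_default = 0.10          # assumed default rate for any single loan
p_shock = 0.10            # assumed chance of a housing downturn
p_in_shock = 0.80         # assumed default rate during a downturn
# Pick the calm-period rate so each loan still defaults 10% of the time overall.
p_in_calm = (p_default - p_shock * p_in_shock) / (1 - p_shock)

independent = prob_all_default_independent(p_default, n_loans)
correlated = prob_all_default_common_shock(p_shock, p_in_shock, p_in_calm, n_loans)

print(f"all default, independent loans: {independent:.6f}")
print(f"all default, shared shock:      {correlated:.6f}")
```

With these made-up inputs, each loan has the same 10% chance of default in both cases, yet the chance that all five fail together rises from roughly one in a hundred thousand to about three in a hundred once the loans share a common risk. The point is only that the answer depends heavily on an assumption the modeler chooses.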

The bankers took these securities, called CDOs (collateralized debt obligations), to the bond rating agencies: Fitch, Moody’s, and Standard and Poor’s. In their assessments, the professional bond raters (bond evaluators, in effect) realized there were serious issues with these securities. But the rating agencies had their own problem. If they gave these securities bad ratings, the bankers could take their business to other rating companies. Plus, the rating agencies ran lucrative consulting contracts with these same banks. They might lose their consulting business as well. There were arguments between the bankers and rating agencies, but, in the end, the rating agencies gave these securities triple-A ratings, thus falsely assuring the investors who bought them, such as pension funds, municipalities, and financial institutions all over the world (Roubini and Mihm, 2010).

The financial debacle might have been limited to the mortgage and housing industries except for one more financial innovation, credit default swaps. These are derivatives that offer insurance against something failing, such as collateralized debt obligations. The swaps aren’t called insurance because if they were, they would have to be regulated, and the insurers would have to offer proof that they could cover their obligations. Derivatives like these are unregulated. The bankers knew they held many toxic securities on their books until they could unload them. AIG, the giant insurance company, said it would sell credit default swaps to insure the CDOs (Stiglitz, 2010).

The banks went a step beyond. Hedge fund manager John Paulson asked Goldman Sachs to create special packages of CDOs that were highly likely to fail by design. Goldman Sachs put together such suspect packages and sold them to their clients, telling the clients these securities were high quality. Both Paulson and Goldman bought credit default swaps against the securities. You can buy swaps insuring securities you don’t actually own. Other banks did the same. When all this came tumbling down, Paulson personally made about two billion dollars from the swaps. AIG went bankrupt (Story, 2011).

While all this was happening, where were the government agencies that were supposed to protect the public? Alan Greenspan headed the Federal Reserve when the housing bubble started, followed by Ben Bernanke. There was considerable criticism of subprime mortgages, and the Federal Reserve had the authority to stop these questionable practices. However, Greenspan believed that markets were self-correcting. In his view, bankers would not engage in practices that were damaging because that would hurt them too. They were too prudent. After the crash, Greenspan recanted, noting that his beliefs about self-correcting markets had been wrong. In Greenspan’s case, the problem seems to have been his mistaken beliefs about markets rather than conflict of interest.

Not only did the Federal Reserve ignore dangerous lending practices, but earlier, when the big investment banks lobbied the federal government to repeal the Glass-Steagall Act during the Clinton administration, Greenspan had been in favor of repeal and also of allowing banks to increase their leverage from about 15 to one to 30 or 40 to one (Johnson and Kwak, 2011). This increased leverage was another reason some banks collapsed eventually. Clinton’s Treasury Secretary Robert Rubin, who came from Goldman Sachs and returned to Wall Street later, championed these changes inside the administration. The Clinton finance team, which included Lawrence Summers and Timothy Geithner, protégés of Rubin, also strongly opposed regulating derivatives like CDOs and CDSs (Hirsh, 2010). Banks make ten times as much money on unregulated derivatives as on regulated ones.
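
The effect of higher leverage can be shown with simple arithmetic. The sketch below assumes that leverage means total assets divided by the bank's own equity, which is one common definition and may not be exactly the measure used in the sources cited. Under that assumption, a fall in asset values of one divided by the leverage ratio is enough to wipe out the equity. The function names are hypothetical.

```python
# Stylized arithmetic; leverage is taken here to mean total assets divided by equity.

def wipeout_loss(leverage: float) -> float:
    """Fractional fall in asset value that reduces equity to zero."""
    return 1.0 / leverage


def equity_after_loss(equity: float, leverage: float, asset_loss: float) -> float:
    """Equity remaining after assets lose the fraction asset_loss of their value."""
    assets = equity * leverage
    return equity - assets * asset_loss


for leverage in (15, 30, 40):
    print(f"{leverage}:1 leverage -> equity gone after a {wipeout_loss(leverage):.1%} fall in assets")
```

On these assumptions, a bank at 15 to one that loses 5% of its asset value still keeps a quarter of its equity, while a bank at 30 to one facing the same 5% loss has lost more than all of it.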

When the big crash came, AIG couldn’t afford to pay the money to the banks and hedge funds that its swap obligations required. The insurer had seriously miscalculated. (I don’t know the cause of this error. I have yet to see a definitive account of the events inside AIG.) Normally, when a firm goes bankrupt, its creditors receive a fraction of the money owed them, typically twenty or thirty cents on the dollar. In this case, the US Treasury and New York Federal Reserve came to the rescue. They took the unusual step of bailing out AIG’s debts and paying out 100 percent of what AIG owed to banks like Goldman Sachs and the hedge funds, though the government had no obligation to do so. At the time, the Treasury Secretary was Henry Paulson, former head of Goldman Sachs, and the head of the New York Federal Reserve was Timothy Geithner. So far, this unusual bailout has cost US taxpayers $180 billion (Stiglitz, 2010).

What about the oversight committees in Congress? Unfortunately, they were not as vigilant as they might have been. A sense of the problem is indicated by a study showing the investments of members of the House of Representatives outdid the stock market averages by 6.8% a year between 1985 and 2001. Those lawmakers investing in industries regulated by committees they served on beat the market by eight percentage points. Senators beat the market by 10.7 percentage points (Stryjewski, 2011). These are phenomenal investment returns. The best investment firms in the country can’t attain such results. Either these members of Congress were financial geniuses or they might have traded on inside information gleaned from their privileged positions.

I could discuss the actions of other key actors and agencies like Fannie Mae and the SEC, but you get the picture. In summary, conflicts of interest biased evaluative judgments at several levels of the financial system and contributed heavily to the crisis. Of the ten or so evaluative judgments embedded in this scenario, at least eight were seriously biased. (For example, the woman evaluating mortgage applicants, her boss evaluating her performance, the bankers’ quantitative models assessing CDO risks, the evaluation of policy changes during the Clinton administration, and so on.) Unfortunately, in spite of much talk, there has been little reform, mostly because of Wall Street influence in Washington. The face value of derivatives currently outstanding is ten times the global gross domestic product, and they are still largely unregulated. The lack of reform in the aftermath of the 2008 crisis contrasts sharply with the 1930s when the Roosevelt administration established financial safeguards that worked for decades.

Preventive Strategies

What lessons can we learn from these events for social and educational evaluation? Conflicts of interest are different kinds of threats; they are powerful and pernicious. They can cause evaluators and others making evaluative judgments to behave differently. By conflicts of interest I mean situations where evaluators have their own material interests weighing significantly on one side of the outcome. Traditional research methods alone are insufficient to control such biases. A better approach to diminishing such threats is to address them directly, to bring the situation of the evaluators into the analysis of potentially biasing factors. Scriven’s (1976) clarification of the concept of “bias” is useful here. Sometimes bias refers to the systematic tendency to make errors and sometimes to actually making the errors. The first meaning is crucial in maintaining credibility while the second affects validity. Conflicts of interest damage credibility and increase the tendency to commit errors. By reducing conflicts of interest (the potential for bias that often gives rise to actual bias), we can improve both credibility and validity.

Strategies for combating conflicts of interest include transparency, oversight, and organization (House, 2011). To begin with, we should discuss such threats more extensively in our training programs and theories. The biasing effects are larger and more pervasive than other sources of bias, and they don’t receive enough attention. Doing so does require examining our own behavior and that of our colleagues.

We could require evaluators to reveal sponsors and potential conflicts in writing. The National Academy of Sciences requires such statements for panel members on its committees, and so does the FDA. Journals could ask authors of submitted manuscripts to reveal sponsors, funding, and potential conflicts. Some medical journals now require such information. Knowing about such ties would help editors, reviewers, and readers assess the articles.

We might also ask evaluators to make their data available so it can be examined and re-analyzed, especially in situations where there might be conflicts. Drug companies are required to do so by the FDA, though they don’t always comply. Independent reassessments have resulted in changed recommendations. For example, examining data from the Medtronic Infuse studies enabled critics to raise concerns. A surprising (and hopeful) development is that Medtronic has given Yale a grant to review the Infuse studies using the original evaluation data, the first time a medical device company has turned over its data to independent experts (Meier, 2011).

In addition to transparency and oversight, the organization of the evaluation enterprise is critical. For example, the FDA has two internal groups that evaluate drugs. One is in control of field trials, and this group is well empowered and funded. The other group tracks drug effects over long periods. No matter how well the field trials are conducted, there is no way the trials can assess the interactions of hundreds of drugs on diverse populations. The task is overwhelming. A thorough evaluation requires tracking effects. Unfortunately, the tracking office is underfunded and sometimes ignored, according to an Institute of Medicine report (Committee on the Assessment of the US Drug Safety System, 2006). There is a subtle bias within the organizational structure downplaying long-term negative effects. After all, it’s not in the interest of drug companies to pursue this information.

Scriven (1976) posited three principles of organizational bias control. The first, the principle of independent feedback, is that no unit should rely entirely on a subunit for evaluative information about that subunit. The second principle, the instability of independence, is that independence is fleeting and subject to compromise. Hence, there must be arrangements for renewal and replacement as evaluators become co-opted. The third principle, dynamic equilibrium, stipulates that although there are no totally unbiased evaluators, there are arrangements that can reduce the influence of the most damaging biases. “… independence is essential, impermanent, and situational” (Scriven, 1976, p. 139). These principles seem even more relevant today.

For example, it’s noteworthy that during the buildup to the financial crisis, there were some brave dissenters in Washington who objected to what was happening. For the most part, they were ignored or silenced. Only those with an independent base like the Congressional Budget Office could maintain their critiques in the face of intense pressures (Morgenson and Rosner, 2011).

We could also formulate guidelines for when and under what conditions evaluators should take on studies in which they have potential conflicts. We could recommend that evaluations be conducted by organizations not under control of sponsors. For example, drug companies design studies and sub-contract them to smaller companies that are totally dependent on drug company funding. These companies often enlist individual physicians to carry out the work, increasingly in third world countries. This is not a good arrangement for producing valid results. Admittedly, the degree of independence is not always easy to assess (Kaiser and Brass, 2010).

Another idea would be to review evaluations that are not otherwise subject to scholarly review. Think tanks typically report findings claiming scientific authority. Kevin Welner at Colorado has developed procedures for critiquing educational policy studies as soon as they emerge and providing these independent critiques to the media. We could do something similar for high profile evaluations. We should subject important evaluations with conflicts of interest to close scrutiny. All of this requires action by the organized evaluation profession. Ultimately, we have to exercise serious professional review. We are past the era of stating standards for evaluators to follow and hoping for the best. The countervailing pressures are too strong.

Conclusion

Finally, why has the American social context changed? Why are we beset with pervasive conflicts of interest throughout society? In addition to drug evaluations and finance, I could have discussed education, military contracting, and many other areas. Some would say it’s only human nature to become embroiled in conflicts of interest. No doubt, just as it’s human nature to commit fraud, but crime rates vary significantly from place to place and time to time. Other factors must be in play.

For what it’s worth, my sense of an explanation would lead towards changes in the governing class, to changes in the situation, composition, beliefs, and behavior of the political and financial leadership. We can identify decisions that led to pervasive conflicts of interest, that allowed drug companies to have a greater role in drug evaluations, and that removed long-standing financial safeguards. Since conflicts of interest account for bad evaluative judgments across so many areas, I would look for conflicts of interest within the leadership class itself.

This overall analysis supports Scriven’s conception of evaluation as a transdiscipline, that is, an alpha discipline useful in helping sort out biases in evaluative judgments across other disciplines. After all, if similar forces, factors, and causes operate in different areas, similar insights and methods might apply. (That’s not to say that other disciplines would welcome such advice.) And, beyond that, perhaps evaluation could serve as a paradigm discipline for the social sciences. Dare one imagine a value-based discipline of economics in which mainstream economists would challenge decisions like those leading to the great financial debacle? Of course, such a grand vision for evaluation presumes that the field of evaluation itself can maintain its own honesty and integrity by withstanding the corrupting forces it seeks to help others with. That will be no easy task.

References

Angell, M. (2004). The truth about the drug companies. New York: Random House.

Als-Nielsen, B., Chen, W., Gluud, C., & Kjaergard, L. L. (2003). Association of funding and conclusions in randomized drug trials. Journal of the American Medical Association, 290(7), 921.

Carreyrou, J. and McGinty, T. (2011, June 24). Medtronic surgeons held back, study says. Wall Street Journal, pp. B1, B2.

Committee on the Assessment of the US Drug Safety System. A. Baciu, K. Stratton, & S. P. Burke (Eds.). (2006). The future of drug safety: Promoting and protecting the health of the public. Washington DC: National Academies Press.

Hirsh, M. (2010). Capital offense: How Washington’s wise men turned America’s future over to Wall Street. New York: Wiley.

House, E. R. (2008). Blowback: Consequences of evaluation for evaluation. American Journal of Evaluation, 29(4), 416-426.

House, E. R. (2011). Conflict of interest and Campbellian validity. In H. T. Chen, S. I. Donaldson, & M. M. Marks (Eds.). Advancing validity in outcome evaluations. New Directions in Evaluation, 130, 69-80.

Johnson, S. & Kwak, J. (2011). 13 Bankers. New York: Vintage.

Kaiser, F. M. & Brass, C. J. (2010). Independent Evaluators of Federal Programs: Approaches, devices, and examples. Congressional Research Service. R41337.

Meier, B. (2011, August 4). Medtronic giving Yale grant to review bone growth data. New York Times, p. B7.

Meier, B. and Wilson, D. (2011, June 29). Spine experts repudiate Medtronic studies. New York Times, pp. B1, B2.

Morgenson, G. & Rosner, J. (2011). Reckless endangerment. New York: Henry Holt.

Nagourney, E. (2009, May 19). Small gifts found to influence doctors. New York Times, p. D6.

Roubini, N. and Mihm, S. (2010). Crisis economics: A crash course in the future of finance. New York: Penguin.

Scriven, M. (1976). Evaluation bias and its control. In G. V. Glass (Ed.). Evaluation studies review annual (pp. 119-139). Beverly Hills, CA: Sage.

Stiglitz, J. E. (2010). Freefall: America, free markets, and the sinking of the world economy. New York: W. W. Norton.

Story, L. (2011, August 18). U.S. eyes S.&P. ratings of mortgages. New York Times, pp. A1, B10.

Stryjewski, L. (2011, June 13). Psst! The ultimate insiders? Barron’s, p. 10.
