The Oral History of Evaluation: The Professional Development of Ernest House
Robin Lin Miller, Jean King, Melvin Mark, Valerie Caracelli, and The Oral History Project Team
Since 2003, the
Oral History Project Team has conducted interviews with individuals who have
made
particularly noteworthy contributions to the theory and practice of evaluation.
In 2011, Mel
Mark, Robin
Miller, and Miles McNall sat with Ernest (Ernie)
House in Anaheim, CA, at the site
of the
American Evaluation Association (AEA) annual conference. The interview was
taped and
subsequently transcribed verbatim into a 20-page document. Working with House,
we edited the
transcript
for clarity and length. House reviewed and approved the article prior to its
submission
to
American Journal of Evaluation as well as the final version following revisions
made in response
to
editorial feedback.
Ernest R. House
is Professor Emeritus at the University of Colorado, Boulder, where he was
Professor of
Education from 1985 to 2001. He was on the faculty of the University of
Illinois,
Urbana, where
he was affiliated with the Center for Instructional Research and Curriculum
Evaluation
from 1969
to 1985. House has been a visiting scholar at numerous universities nationally
and
internationally, and at the Center for Advanced Study in the Behavioral
Sciences.
House is
probably best known for his writing and practice regarding the inclusion of
values in
evaluation.
In an article published in 1976 in the first volume of Evaluation Studies
Review Annual,
House drew on a
major work in philosophy, John Rawls’s A Theory of Justice. In that article and
in other
publications, House advocated for evaluators to incorporate social justice
concerns in
their
work, and particularly to consider a program’s consequences from the vantage of
the least
well-off.
More recently, with colleague Ken Howe, House has written a book on
deliberative democratic
evaluation.
In this work, House and Howe argue against traditional stances on the so-called
fact-value
dichotomy. Instead, they argue that through careful processes involving
relevant stakeholder
groups,
evaluators can achieve relatively unbiased value judgments. Consider one
indication
of the
importance of these two approaches to values in evaluation: House is the only
person to
have been
listed in two places on any single version of Christie and Alkin’s
(2008) evaluation
‘‘theory
tree’’ (both on the valuing branch). House is the author of many other books,
articles,
chapters,
and reports. In addition to his work as a practicing evaluator, he has been
called upon
to
conduct meta-evaluations of several major evaluations. Among House’s awards is
AEA’s Lazarsfeld
Award for evaluation theory.
In this
contribution to the oral history project, House describes his journey from
English major
to
evaluator and evaluation theorist. He talks about his first evaluation project,
as well as the major
evaluation
figures he worked with early in his career. House notes an important lesson
from his
early
experience as an evaluator, that is, ‘‘just how political evaluation was.’’ He
explains that
this
realization helped lead to his interest in values. After talking about a wide
range of topics,
including
meta-evaluation, randomized trials, the qualitative–quantitative debate,
and the risk
of
conflicts of interest in evaluation, House comes back to his early experiences
and their role
in his
focus on social justice.
The interview
includes references to numerous individuals with whom House has worked over
the
years. In the case of people outside the evaluation community, such as program
and agency
staff
involved in an evaluation, a brief explanation of their roles is given. Many of
the individuals
to whom
House refers are luminaries in the history of evaluation. These include Egon Guba,
Michael Scriven, Bob Stake, and Dan Stufflebeam,
with whom House interacted early in his career
at the
University of Illinois, as well as Don Campbell and Lee Cronbach,
who made important contributions to social science methods as well as
evaluation theory and practice. Their influential
work is
described variously in Alkin (2004, 2013), Shadish, Cook, and Leviton (1991), other
entries
in this oral history project, evaluation textbooks, and in their own many
publications.
House also
refers to Gene Glass, a research methods and statistics expert, perhaps best
known for
early
contributions to the practice and methods of meta-analysis; Lloyd Humphreys, a
noted individual
difference
psychologist and measurement specialist; Barry MacDonald, a British evaluator
who
emphasized democratic participation in evaluation; Roy Bhaskar,
a British philosopher who
espoused
a particular approach to realist philosophy; House collaborators Ken Howe,
Steve
Lapan,
Decker Walker, and Les McLean; and current evaluation scholars and
practitioners Leslie
Cooksy,
Valerie Caracelli, and Gary Henry.
Interview With Ernie House
Mel: You were an English major at
Washington University in St. Louis. How did you get from that as
your undergraduate interest to
evaluation?
Ernie: I started
in Engineering, but found it not that interesting. And then I decided to be an
English
scholar. When I started going through all these
books in the Washington University library in St.
Louis, I found
these books that hadn’t been read for 50 years, 70 years. I thought, ‘‘Do I
want to
do this?’’ I decided to go into medicine, to
become a doctor. I have some science background. After
carrying this cat soaked in formaldehyde around in
a bag for a semester, I decided, ‘‘I don’t want to
do this either.’’ I went back to my hometown
and thought about what I would do. Within a month
of being drafted, I decided to teach English.
This was between the [Korean and Vietnam] wars. You
could get any kind of a job, just about,
particularly teaching, and you’d get out of the draft. I started
teaching English in high school. I taught for 4
years. People from the University of Illinois asked me to
teach a new curriculum, semantics and
linguistics. I could do that. They liked what I did and asked me
to help write the materials and be the state
consultant for the program. After that, the College of
Education there
offered me a fellowship with James Gallagher, who was a special education
expert
in the area of the gifted. When I finished my
degree, they asked me to evaluate the Illinois Gifted Program.
I never intended
to be an evaluator.
Mel: I wanted to ask you about that. How
did you get chosen at this relatively early career stage to
evaluate
that program?
Gallagher was
the leading authority in the United States on the gifted. I had a fellowship
with
him, and they asked him. Actually, evaluating
this program was a political move on the part of David
Jackson,
who was the main political actor in the College of Education at Urbana. He wanted the
program evaluated, and so did the state
legislature, because gifted programs are not very popular.
Maybe they could
show some effectiveness. Evaluation was all new territory. Jackson got the
legislature
to provide the money. He asked Gallagher, ‘‘Who would you recommend to do the evaluation?’’
I was one of his
students.
Mel: Well, what was the program like? It
was a statewide gifted program?
The program had
20 demonstration centers all over the state demonstrating ideal teaching
methods,
model teaching methods, new curricula, new math,
new science, new English materials, most of them
out of the University of Illinois.
Mel: What did the evaluation look like?
This was your first evaluation, right?
Well, I had a
half million dollars, which was really a lot of money in 1967 terms and 4 years
to
evaluate the program. I did many different kinds of
studies. The program had five parts. The experimental
program funded studies. The demonstration program
set up and funded the 20 centers. The
reimbursement program provided funds that districts
could apply for to adopt these programs. The idea
was that you would do experiments and have
proven methods for teaching and dealing with gifted
kids, and then you would have these things
demonstrated someplace like Evanston High School.
Teachers would
see these practices and put them into practice in Elgin or Alton, Illinois,
where I’m
from, or someplace like that. It was a
research, development, diffusion model of educational
change.
We selected a
random sample of school districts, and I tried a survey the first year because
I had to
provide something for the state legislature right
away. And I found out after I did that, that people lied
like hell on what they were doing when you sent
them this form. So I selected a stratified random sample
of school districts in Illinois that had this
program. I sent in trained teams of people to these districts.
The teams spent
a couple of days there collecting materials and
interviewing people intensively about
what they were actually doing, enough that we
could make an assessment of what they were doing.
Sometimes our
team would show up and the principal would say, ‘‘Oh, hell, we’re not doing
anything,
really. We just wanted the money from the state
to do this.’’ That’s the kind of stuff you’d find. We had
a set of instruments, 40 or 50 different
kinds of measures and interviews. We developed a special
instrument to measure classroom atmosphere, a
classroom climate–type instrument. One of the guys
working for me, Joe Steele, had developed this for
his dissertation. We had a lot of different measures.
We didn’t use
test scores as the outcome measure. We were looking at many different grade
levels.
Mel: Well, what did you draw on in trying
to put together this evaluation design at that point?
I read the whole
literature in evaluation, which I put into a small cardboard box. I read the
whole
thing in 30 days. I mean, that’s all there was, really. I talked to my statistics instructor, who
happened
to be Gene Glass. I said, ‘‘Look, I don’t know very much about evaluation.’’ He said, ‘‘Nobody does.’’
He said Stake knew a lot about it and suggested I put Scriven and Stake on an advisory panel. Scriven had
moved to California and didn’t want to do it. I
had Guba, Stufflebeam,
Stake, and a couple of other
people. They provided me with advice. I used
Bob’s countenance paper as the model (Stake, 1967).
That’s how I
laid out the program. I set up a data collection team headed by Steve Lapan. By that time,
Jim Gallagher
was the U.S. Deputy Commissioner of Education. I took the plan to him. He
looked at it
and said, ‘‘This is a great plan. Take you
about 15 years to do this and probably as much money as the
Vietnam War.
You’re going to have to cut it down.’’ So I cut it way back.
Mel: What, if any, were the kinds of
lessons from that experience that stuck with you?
Well, for one
thing, how political evaluation is. And people taking money. At the end of the
evaluation, we had our results, and I had battles
with the demonstration centers because by tracking
teachers going to the centers and going back into
their district, we found that teachers would go to
the centers and see these model materials.
They would say, ‘‘These materials are fantastic.’’
Then
they’d go back home and say, ‘‘My principal will
never let me use this.’’ You know? ‘‘My principal
wouldn’t agree to this.’’ And so you’d have tremendous enthusiasm at the demonstration centers, and everybody assumed that this would translate into practice when they went back home.
But it didn’t.
That was a big fight with the demonstration directors, who by that time had
become
a powerful force within the program, the 20
of them. And they fought. They even tried to do their
own study to counter ours. In the end, the
demonstration centers did some good. What we found is
that you have to go to work in the person’s
district, in the teachers’ districts themselves, to get things
done. In order to implement, you had to do it
there. The other thing was that when we got the results
after 4 years, I said to a guy who was a state
representative, ‘‘I want this to be used.’’ He said, ‘‘Well,
here’s what you do. There’s this guy. Here’s the
guy’s name,’’ I think it was Gene Hoffman. ‘‘He’s
the head of the School Problems Commission.
Convince him that you’ve got something and take
it through that way.’’ So I asked the head
guy to go out to dinner in Springfield. We drank about
three or four martinis apiece. I told him about
the evaluation. At the end, he said, ‘‘Oh, hell, I believe
you.’’ I gave a presentation of our findings
to the legislative committees. I don’t know if they
knew what they were looking at, but the program
got a 20% increase in their budget because we did
find that the program had a lot of effective
parts. The program was restructured based on the work
that we had done. I thought the evaluation was
pretty satisfactory. But there were many political
interactions.
Mel: You mentioned an impressive set of
people who were involved in this and who you studied
with.
Gene Glass, Egon Guba, and Scriven. I think you
worked with Stufflebeam and Bob Stake. What was it like, working with this group of people we now look at as evaluation luminaries while you were this young buck trying to figure out how to do this project and learn this field?
I was 30 years
old. And, they were good. Now Gene had gone off to Colorado by that time, so
he didn’t actually serve on the committee. He
gave some advice about the program. Bob and the
others were really good. I just said, ‘‘I don’t
know what I’m doing. Here’s what I plan to do. What
do you think I should do?’’ It was
really—actually it was one of the best advisory groups I’ve ever
served on or been served by. We’d
meet once every 3 or 4 months. I had money to bring them in and
pay them, and they were great. Guba and Stufflebeam, Stake, all
of them. Dan was still developing
his CIPP (context, input, process, and product)
model. Egon was changing his view of things. He had
not yet gone qualitative. They were just
really a great group to work with. Intellectually, a very
exciting time. Really, really
great time.
Mel: And you knew it?
Oh, I knew it at
the time. I knew it was very exciting. This new field and new ideas, a lot of people to talk to whom I didn’t know. These were the leading
guys in the field, I thought, at the time.
Mel: One of the things you’re probably best
known for is addressing the issue of values in evaluation.
What do you think it is that drew you to
that as an issue to be addressed?
Thinking about
how political evaluation was and thinking, ‘‘Is there
anything beyond the politics?
Is there
something beyond this other than just who’s got the most power and influence?’’
I started
thinking about value judgments and, of course, Scriven later on had written something about this
(Scriven, 1980). I started reading material on this, and I eventually ended up
conceiving this idea
in terms of social justice. Social justice is
an important thing to do. John Rawls’ book, A Theory
of Justice came out in the early ’70s. I had
the traditional view that most people have when they start
out in social sciences: values are subjective.
I think Scriven more than anybody convinced me both
in reading and talking to him personally that
value claims can be objective, if we understand what we
mean by that. That is, they can be relatively
unbiased. That doesn’t mean some other things that people
often take the word objective to mean. You can
have well-founded value claims. I got interested
in the idea of values.
After finishing
the gifted programs, I needed something. I thought, ‘‘Is
there anything beyond
the politics of this? Is this all politics?’’
I saw this book, A Theory of Justice, reviewed in the New York Review of Books as the leading book of our time in terms of social justice. So I bought a
copy,
took it to England with me on leave and read
the book. From that came my article on justice in
evaluation.
Mel: Did it feel at all dissonant? I mean,
Rawls’ book was thick in more ways than one and you were
trying
to apply it to this very practice-based field.
Well, I think
that was the difficult part. I read the book thoroughly—read it a couple
times, and so I
had actually a good conception of it. But then
how do you apply it? The application was the hard part.
There were
different ways you could think about this, but I didn’t want to take it as the
‘‘original position.’’
The idea would
be taking Rawls’ two principles of justice and saying ‘‘we should adhere to
these in evaluation,’’ one of which deals with
inequality. Inequality, if it’s allowed, should be to the
benefit of those least advantaged in society. I
saw that as being an entrée into programs for the
poor,
the impoverished. The other principle is a
rights issue. You shouldn’t violate student rights.
Mel: In a commentary in a handbook, you
wrote about the work of several evaluation theorists, and I
want to quote you. ‘‘For their
part, the theorists know from experience that the half-formed ideas they
toy with today may form the
basis for proposals in evaluation studies tomorrow, or that the ideas may
be adopted by governments and
organizations as a basis for policy. Fuzzy ideas of yesterday have a
way of becoming orthodoxy
tomorrow.’’ When I read that, I wondered how much that is an autobiographical
statement
because there was some skepticism originally, I think, in terms of the response
of
many people to your ideas about
social justice and Rawls.
Oh, yeah,
tremendous disagreement, I mean, just shock. First shock, ‘‘What’s
justice got to do with
evaluation?’’ They just didn’t conceive it in the
same—those two things live in two different worlds.
You’ve got this
science, which is value neutral or value-free, and then you’ve got social justice,
which
is connected to values. That’s how a lot of
people still conceive the role. Economics is supposed to be
value-free. How evaluation can be value-free is
hard to understand. So people were shocked and had
tremendous reactions. People at the university, some
of the leading professors of psychology and education,
decided that they didn’t know if I was right. They
would have a seminar with me leading the
seminar. There was Lloyd Humphreys. We got
together, about 10 or 15 people, and started reading the
Rawls’
book, which they found very tough sledding. And we went through it. They were so
concerned
about it that they actually wanted to have this
seminar with no students in there. They didn’t want to
make fools of themselves in front of the
students. So, you know, we just had this 30-year-old guy, I
guess I was about 35 then, doing this seminar.
That was interesting. Lloyd never did agree with me. He
came to terms with it, but he never did buy it.
Don Campbell was one of the first guys who asked for
some copies. Campbell was always pretty
open-minded. He never bought into it fully because he still
held to some kind of value-free position; this
in spite of the fact that the same arguments that he used
against foundationalism [of knowledge] apply to
foundationalism in values. Campbell espoused anti-foundationalism with
respect to knowledge claims, which involves arguing that all knowledge is
fallible and contingent.
That is, if he
used the same reasoning on values, he would have to go with a different conception
of values. But he was
always open-minded. He was great to talk to.
And even the
government people started picking it up in Washington and mentioning social
justice.
I remember one
guy at the Department of Education in charge of evaluations. He said, ‘‘Damn, I
wish
I’d thought of
that. If I had read that book first, I could have written that paper before you
did.’’ And
my response was, ‘‘Well, take it. I don’t
have any ownership of it. You can take it and do with it
whatever you want to.’’
Mel: As you know, a year ago at the AEA
conference, some of your work was highlighted, including the
truth, beauty, and justice ideas.
It’s nice to see
it still might be relevant, a book you did 30 years ago. Yeah, that’s a nice
feeling
to have. I don’t want to get too carried away
because I always think that our influence is very limited.
In my view, our
influence is a lot more limited than what we realize. You know, you’re a big
name
guy, and you go give talks and a lot of people
show up, or maybe they don’t. But you give these talks,
man, just going on forever. Well, a few years
after you die, hey, that’s pretty much gone. I don’t think a
lot of it hangs on. Campbell’s perhaps the
exception to that. Take even somebody like [Lee] Cronbach.
You don’t hear
of his name so much anymore, and what an enormous figure he was in the field.
Mel: And he wrote stuff worth rereading
today.
Oh, yeah, Cronbach had great stuff. No, it’s nothing about the
quality. It’s something—there’s
another dynamic in operation, where the work gets
replaced with other ideas, which probably aren’t
as good as the work replaced. What has
happened with Campbell is that you’ve got a set of proprietors
who have taken these ideas and reformulated
them. And have re-done them, considerably, I might add.
By
the third edition, those ideas have been redone. That’s a difference between Campbell and Cronbach.
Cronbach’s
ideas were terrific. Other than maybe in testing, he didn’t have a group of
people
established who have carried on the evaluation ideas.
Maybe the ideas will return. Usually, when they
do, they are renamed.
Mel: You more recently, working with Ken
Howe, have been advocating notions of democratic deliberative
evaluation.
The idea would
be, let’s try to make evaluation more democratic. If you’re going to bring
people’s
values into evaluation, how could you bring a lot
of values together? Bring the values together with the
stakeholders and make sense out of them somehow. Our
idea would be to base values, the evaluation,
and the value judgments on people involved in
the program, the beneficiaries of the program, and all
the major stakeholders in the program. Bring
those together, and you’ve got inclusion, discussion, and
deliberation, three broad principles. You try to
include the people who are important and not exclude
people unless you have good reasons for excluding
them. You try to have serious discussions and listen
to them and their point of view. That’s an
error evaluators often make. They listen to the sponsors,
but not to people in the program and in the
field. Then bring it all together in a process of deliberation.
We haven’t
defined deliberation very well, I think. Deliberation is a tough concept. You
get people
together in a room, you sit around, you talk about
this theory, that theory. You argue back and forth.
And you have an
evaluation based on the values of the community.
Mel: What’s your sense of the state of the
art in practice in carrying this out?
I think it’s
hard to do. I did the project in Denver, which worked all right when I was a
federal
monitor for the bilingual program for about 5 or 6
years. Ken and I talked about this. I said, ‘‘You
know people. You have to give people a set of
procedures.’’ He was arguing, being a philosopher,
‘‘No, let’s just
go with the principles because there are many ways to do it, and people have to
fill
in.’’ I think it’s difficult to do, to tell
people to be inclusive without telling them how to do it. I can
say, ‘‘Be deliberative.’’ But how do you do
it? You need some ideas about how to do it. I’d say that
where the ideas get caught right now is in the
implementation. I’ve always been better at conceiving
abstract stuff than implementing. I can do the
evaluation and can write about how I did it, but it
would be difficult to write a protocol for,
‘‘You do this. You do that.’’ I think that’s where it sits;
a conceptual scheme right now. Some people
can do it. It doesn’t take that much to do, and it may
not be far from what a lot of people do
already.
Mel: You think there’s much risk of things
exploding when you put these diverse value bases of people
on the table?
Yeah, it’s a
possibility.
Mel: How do you avoid that?
You have to be
careful. I think you try not to get some groups together. In the Denver
project, we
always had a lot of militant people who wanted
Spanish taught in all the schools. I didn’t try to bring
them in directly. I never had a town hall
meeting where everybody got together. In the past, those
ended in shouting matches, if not worse. Rather,
I talked to the militants. I went to their headquarters,
talked to them, recorded their views, and
listened. I didn’t try to have them confront the school district
people. I did get the lawyers together and
present them with data twice a year. Part of the evaluation is
to accept that you have to structure the
interaction just as you have to structure the data collection.
Mel: In the late ’70s, you and a group of
people published a critique, a kind of meta-evaluation of
Follow Through, a program designed for kids
after completing Head Start. You criticized the evaluation
in terms of a couple of things,
one being that the measures that they selected fit better with some
program
models than others; they were comparing across a few different variants or
types of Follow
Through programs.
And I think part of the point was that any claims about relative success were
kind
of illusory given that
differential fit. Now, I think you also used the concept of justice in making
that
criticism.
Can you talk about that project, including how it came to be that you and
others were doing
this critique?
The Ford
Foundation asked me to consider doing it. When the Follow Through evaluation
was
coming out, a lot of people—this had been
going on for a while—a lot of the people with the programs
were unhappy with the way the evaluation was
going and how the evaluation had developed. The evaluation
started with a very broad array of measures. The
original idea was to have [a collection of]
measures [that together would be appropriate] for
all 12 types of programs. In other words, you’d have
measures across all types of programs. You’ve got
rather different kinds of programs. But the evaluation
was cut back to only a few evaluation
measures. Some program people complained to Ford.
A woman at Ford
asked me if I would be interested in taking a look at the evaluation. I said,
‘‘Well,
let me check it out.’’ I went to a Follow
Through meeting and talked to a bunch of people. I thought,
‘‘Yeah, maybe
they’ve got a point here.’’ What had happened was that the Department of Education
had started with this vast array of measures
and at the first data collection the evaluators had collected
two tons of data. They couldn’t possibly
manage it. Nobody could manage it. So they reduced it down
to a few outcome measures. You’ve got all
these measures for everybody, now they’ve reduced it
down to a few, such as the Metropolitan reading
test. That reduction of outcome measures tended to
favor certain programs. When we looked at the
teaching materials of the programs, we found that a
particular program would perform very well in math.
When you’d look at the internal subtests you’d
see they were about the same as everybody
else. Except, oh, this one subtest jumps up like this, and
that’s what brings their average up. That’s why
they scored high in math. Then I track this back to why
they do so well on that particular subtest. I
look at their teaching materials and I look at the subtests,
and I find what they’ve done is teach
materials very close to the subtest. For example, instead of
[formatting] multiplication problems up and down, the test
items multiply [horizontally]. Some of the
programs knew that’s what was on the tests. This is
an open test and they structured their materials that
way, a very close match to that test. It
wasn’t exactly the same items, but very close. The whole format
was very close. That was suspicious because
that’s where all their high scores are coming. That’s how
they get this high score in math. By doing that
kind of analysis, we developed the idea that the outcomes
were a bit arbitrary because of the closeness
of the materials to what was on the test. Perhaps
it was accidental. Nonetheless, it gave a
great advantage to certain programs. Decker Walker was on
the panel, along with Gene Glass and Les
McLean. Decker had done some research looking at the
closeness of fit in other tests. That’s why I had
him on there. His research indicated that how closely
the curriculum fits the test makes a big
difference. That’s been replicated over and over again since
then. The other big factor was that you had
programs with six different sites around the country. The
outcome variation among those sites was terrific,
huge variation. So the variation across sites within a
program was about the same as between-program
variation, if you follow me.
How can you say
that Program A is better than another program if there is so much internal variation
within a program? There was no consistency,
internally, within those programs. For those reasons, our
critique was entitled, ‘‘No Simple Answer.’’ Glass
also did some effect size comparisons. We ended up
with a critique challenging the Follow Through
evaluation, which was the big evaluation of the day. It
cost a lot of money and was supposed to be
definitive. The Department of Education was figuring they
were going to tell everybody in the country,
‘‘Use this, use this, and use this.’’ We thought that was
going too far with the results they had.
Mel: I think that your critique helped
contribute to skepticism about the potential of these large-scale
federal
evaluations as definitive guides to action.
Yeah. Yeah. I
think it did. I think it wasn’t only our critique, but the fact they put so
much money
into this evaluation over so many years, and
they end up with an evaluation that is, one can argue,
rather arbitrary in the results and that has
inconsistent results in many ways. Maybe these big definitive
evaluations just don’t pay off. There have been
several other big evaluations. That was just one of
them. And that may have been the final one to tip
over that had people thinking. ‘‘Well, let’s not
do these big, huge, massive things. We’ve got
to wait for years and years to get the results from them,
and then find the results are rather
arbitrary.’’ In Washington, they expected definitive results out of
this big chunk of money. And so the people in
charge of it just thought something went wrong. Something
was wrong with the evaluation since it didn’t
produce definitive results. Another way to look at it
is that the reality they’re trying to map is
too complex. Reality is more complex than what they realize.
And you’re not
going to get simple results and totally consistent results. Because if you put
a program
in one place and you put the same program in
another place, you end up with different results. You’ve
got different contexts in different places,
and you end up with rather different results sometimes.
Mel: Any sense
of irony that this critique had such widespread implications?
Well, I don’t
know. I didn’t expect it would be quite that—the Washington people were
very
unhappy with it and so were some of the sponsors
who had won what they called the ‘‘horserace.’’
They thought
they had won the horserace. So we say, ‘‘Well, the horse was doped’’ or
whatever. We
didn’t say that, of course. But, it’s not that
those programs weren’t good programs; it’s just that
the claims were too strong based on the
evidence they had. Yeah, that was one of the times I
got—one of many times—I got
blackballed in Washington for running counter to federal policy.
Mel: What’s your sense of how evaluations
have done since then with these kind of issues of
context
specificity
and generalization?
Well, I think
the evaluation community is far more sensitive to those issues now. I think
that
we’ve tended to do other things, have other
kinds of studies. I would say there are various things
that people have done. One is program theory by
mapping out what the program is like. That limits
the possibilities if you do that, I think.
Another is meta-analysis. If you put a program in place here,
here, here, here, and here, even though you get
different results and under different circumstances,
I actually think
that adds to the robustness of the overall findings you end up with. It doesn’t
have
to be one massive study. Maybe it’s better to
have a bunch of smaller scale studies in different
places and different contexts and then try to
summarize with something like a meta-analysis. Even
if you’re going to do randomized studies.
That makes more sense to me, and I think we’re more
sophisticated now. Sometimes the randomized stuff
worries me in terms of claims made for it, such
as this huge focus during the Bush
administration. I think people are over-relying on randomized
studies. Indeed, my understanding of why Don
Campbell invented internal and external validity was
to get away from Fisherian
randomization. Because the students thought it solved all the problems,
and it doesn’t.
Mel: You’ve done actually a number of
things that are critiques or meta-evaluations: The Follow
Through, Jesse Jackson’s PUSH-Excel, and
the New York Promotional Gates program. Is there something
that draws you to doing that kind
of work, or something that draws people who are looking for
that kind of work to you?
I think I like
the political stuff. I’ve always had a penchant for the political: political
theory and
political philosophy. Maybe if I did a degree again,
I might do it in political science. It has a draw
to it, and every once in a while I think from
my own personal background I have an urge to engage
in conflict. I also have this sense of trying
to seek out social justice. So I’m willing to take on some
critiques. Like the New York people called me for
the Promotional Gates program, which was the big
New
York City program. The
Mayor’s office called and said, ‘‘We’ve got this
program set up and
we’re going to do an evaluation, but we need
somebody to monitor the evaluation. We have to have
somebody who’s pretty tough. We’ve talked around,
and people say you’re the toughest guy we could
find.’’ At this point, I was at the University
of Illinois. That was flattery, right, from my point of view.
And so, of
course, I signed on. I couldn’t help myself, could I? I fundamentally got
myself in a lot of
trouble. We had a lot of newspaper coverage over
the whole thing. I think it was the only time I ever
agreed to make my reports to the Mayor’s office
and the Chancellor’s office confidential. They wanted
it confidential. I said, ‘‘Okay.’’ Well, we
did it. But their bureaucracy leaked the reports. You put stuff
in one end and it runs out the bureaucracy.
They had a leaking bureaucracy. Some of what we were
writing said, ‘‘You know, all you’ve got here
are no effects in your summer programs, where you
think you’ve got these big test score gains.’’
They had failed to adjust for regression to the mean. ‘‘And
so you really don’t have these results that
you’ve been touting in the newspaper.’’ The Mayor was
running an election campaign, and he was on
television saying, ‘‘I raised the test scores in New York
City.’’ Well, the
results he was talking about didn’t exist in reality when you analyzed the data
the
proper way. They didn’t do it deliberately; they
just didn’t know any better. Bob Linn served on that
panel with me. We had to go tell them they
didn’t have any effects from their summer training program.
At the same
time, they finally decided, ‘‘Okay, just help us do it the right way.’’ I
thought they
turned out to be very reasonable about it. We had
lots of ups and downs, and they tried intimidation,
you know, the usual stuff.
Mel: Despite doing several of these kinds
of critiques, monitoring, and meta-evaluations, I don’t
think that you’re widely
associated with meta-evaluation as a topic. Any thoughts you want to share
about the practice of
meta-evaluation?
Well, I’ve taken
on the meta-evaluations. I have written some, and Leslie Cooksy
and Valerie
Caracelli
interviewed me and used some of those critiques in their work about
meta-evaluation.
I’ve done some
writing, but I never attempted to do what they’re doing, which is to pull some of
those ideas together about how we approach the
issue of meta-evaluation. I’ve tended to philosophize,
and although that’s interesting, I find the
meta-evaluations interesting too. The other thing about it
that appeals to me is that it acts as a
conscience for the field. I think we need to have people in an
important role, and possibly a critical role,
evaluating people for their careers and their work.
There’s a lot
riding on it. You really need checks within the field itself to say, ‘‘Look, we
have
to police ourselves on this stuff. We can’t
just let people do anything and everything.’’ The meta-evaluations,
particularly if they’re big enough to be an example,
have real appeal as one way of
saying, as with Follow Through, ‘‘Don’t do the studies
this way.’’ The Follow Through critique
influenced a lot of people because it had a high
profile. If it had been a small thing, it wouldn’t have
had quite so much influence. You need some
meta-evaluation, some checks on what people are
doing. It serves as a conscience for the field
or a way of exercising the conscience of the field.
We should check
ourselves. We should evaluate ourselves. Scriven’s
absolutely right on it.
Mel: I want to
ask—thinking back to a past AEA conference when Chip Reichardt
and Sharon Rallis
were Presidential Strand chairs. You were one
of the people that they invited to talk on the qualitative/
quantitative debates. As I recall you ended up saying,
‘‘Let’s stop wasting all this human
capital on this debate. Let’s do the hard work of
evaluation.’’ What’s your sense of where that discussion
has gone?
I think we have
made a lot of progress. I think that conference helped. We still have a split,
of
course, between the quantitative and qualitative
people, but that’s all right. The randomization issue
got bent out of shape. I think it bent people
out of shape somehow. So that was a little bit of a throwback. Not that I’m against randomization, but pushing it very hard was a bit of a throwback for a while.
Generally, I
think we’re in really good shape in terms of quantitative/qualitative. I can
understand why
quantitative people see themselves as playing an
irreplaceable, important role. And they see themselves
as quantitative people, you know? Gary Henry
showed me the latest analysis he had done
recently. He said, ‘‘I finally think I’ve done a
real piece of work that I can show my granddaughter
and say, ‘This is really a serious piece of
work that I’ve done.’’’ I think quantitative people identify
themselves with the rigor. Part of their identity is
with that. It’s more than an issue about method. What
happened to the quantitative group was they felt
their status was being diminished in some ways. It’s a
legitimate concern on their part. They wanted to
reassert themselves as being important in the area.
Sometimes people
with all the talk about the qualitative stuff tend to diminish the importance
of the
quantitative. And I can see that would affect people
who put a lot of their identity into the quantitative.
On the other
hand, the qualitative people are sensitive about their methods as being too
soft. If you say,
‘‘This
evaluation isn’t worth anything,’’ they’re going to react. That’s why we get
into these shouting
matches. I’ve been in a few back in the early days
with Tom Cook, before Tom decided he liked the
qualitative as well as quantitative. Those were some
of the early battles we had back in those days.
Mel: Is there a place for randomized trials
from the perspective of social justice?
Oh, yeah, sure,
absolutely, yeah. The trouble with randomization from my perspective
is—well,
it’s hard to do sometimes, but it’s more about
the metaphysics of what we’re dealing with than it is
about the methods themselves. Sometimes
experimental design tends to treat the program like an X.
Like an X is
unchanging. But we now realize when you put the X into a setting it’s not
really X, it’s
really X sub-one, and you’ve got all these
variations, and those variations in something like an educational
program can be very, very substantial. Put the same
program with a different teacher, and
you’ve got pretty damn substantially different
programming. That kind of program variation makes
the randomization more difficult, or at least
what you can derive from it. It’s not that you can’t do it.
So I don’t agree
with some of our realist colleagues. You and I are realists, of course, in our
own
frameworks. But as you know, some British realists
say, ‘‘Well, you shouldn’t do experiments.’’
Well, that’s not
my view. Of course, you can do experiments. You just have to be careful of the
conclusions you draw from them. I mean, they take Bhaskar, who says that
you cannot do social
experiments that are definitive in social work the
same way you can do definitive experiments in
physical work (Bhaskar,
1978). The nature of reality is somewhat different. Social reality is much
more plastic. But that doesn’t mean you can’t
do experiments. You just have to be more careful
about how you do them and interpret them. So
there’s nothing wrong with randomization.
Mel: How did you get to the realist
philosophers, by the way? I’m thinking of that Educational
Researcher paper (House, 1991).
I was down at
the University of Sussex giving a talk back in the late ’70s probably or early
’80s.
There was a
conference of social philosophers. I looked at the books they had there, and I
saw these
Bhaskar
books. I thought, ‘‘realist theory of science.’’ You know that I read
philosophy all the time and
I used to do it
for fun. I mean I read a lot of philosophy, and I read widely in lots of fields.
I thought,
‘‘Well, I’ll
take a read at this.’’ And I read that and I thought, ‘‘You
know, that would address the quantitative/
qualitative thing.’’ If you have a particular
conception of a reality, and here’s the conception
and you could approach it, you could do that
with quantitative methods, and you could do it with
qualitative methods. It’s still the same reality. I
saw that as a way of getting through the quantitative/
qualitative dispute, and that’s how I wrote that
original realist article I did. Well, I didn’t do so
much realist evaluation as I did a realist
research perspective.
Mel: More recently, you’ve been paying a
good deal of attention to conflicts of interest, drawing on
medical
evaluations but applying it to evaluation more generally. Can you talk a little
bit about how
serious a
threat you think this poses to our work?
I think it’s a
very big threat. I think it’s a threat to the whole society. And I got into
doing this
because I was looking at how politics influences
experiments, even randomized experiments. I saw
all these studies, these medical studies,
changing their findings, such as ‘‘This medication has these
effects, and nobody found this out.’’ I started
looking into that issue, which some of the medical
people had looked into as well. I’m reading these
medical articles and discovering there’s a much
deeper issue. This phenomenon is systemic. It
doesn’t just happen from time to time, rather there
turns out to be a systemic bias built into the
system itself. I became alarmed the more I looked at
it. It is quite widespread in medicine,
unfortunately, and I found it’s widespread in other fields, too.
As you know, I’m
an active investor and read a lot in economics and financial investing. One of
the big things that happened in the 2008
financial crisis was that there was conflict of interest in the
financial system from top to bottom. The economist
Jeffrey Sachs calls it the collapse of civic virtue
by our financial and political elite. It’s a
pretty big issue. And it’s important to us because if you
have evaluation being bought and sold for its
results, that is, ‘‘We’re going to have you do the evaluation,
and here are the results that we want.’’ Well,
what use is evaluation? Conflict of interest has
tremendous pernicious effects when you start doing
evaluations of medical and education programs.
If you let the
sponsors call the shots on the findings, it’s dangerous. It’s the biggest
danger to the
field for the future, in my view.
Mel: I want to ask a couple of questions
about writing. If I remember correctly, in some of your
early
books, the chapters had appeared
in earlier versions as journal articles or book chapters elsewhere.
Did you have a map as you were working on
the individual projects that led to the book, or did you sort
of connect the dots after the
fact and create a larger work?
I think the 1980
validity book was more like that (House, 1980a). I did the article on social
justice,
and I did an article on truth, the argument
logic one. These were responses to particular issues, like
the persuasion paper, the monograph I did at
the University of California, Los Angeles, one summer
(House,
1977, 1980b). That was
a response to people who were arguing that all you had to have was
a method. I was saying, ‘‘A method’s not
enough. You really have to make an argument. The method
fits into the argument, but you’re making an
argument. You have to make an argument. Science
makes arguments. There’s a method within the
argument, that’s part of the argument, but it’s the
total argument that counts, not just the method
that counts. Although the method is important, of
course.’’ Some articles were to address social
justice. Social justice addressed the question, ‘‘Is
it
only politics out there? Have we got anything
beyond politics that we can value?’’ So, there were
specific topics of interest. But then I took some
of them for the validity book and put them all
together. Some places I needed to fill in the
pattern, you know, the dots between the political and
social justice and methods with the political
theory. I found ideas to fill in the things I wanted.
Mel: How did you start working on writing
fiction?
I was an English
literature major, going way back, and every English lit major wants to write a
novel. Probably psychologists do, too; they just
don’t want to admit it. But I always wanted to write
something that’s creative—creative writing. I
gave a talk at the Center for Advanced Study in the
Behavioral
Sciences at Stanford where I was a fellow for a year. I wanted to explain to these other
fellows, who were very bright people, economists
and historians, and so on, what evaluation was.
A few knew about
it, but most didn’t. I picked out three concepts: causes, values, and politics.
Now
I could handle
causes with my realist framework, with how I thought our conception of causes
had
changed. For values, I had how our conception of
values had changed from value-free to value-embedded.
However, when I
got to politics, the third concept, I didn’t have a ready analysis. And
you have to understand the politics. So I just
told a story. I told a story of a project that seemed
to work pretty well. And I couldn’t think of
an analytic framework that would handle that concept.
The speech went
over very well, I thought, from my own point of view. The staff thought it was
the
best talk of the year. Totally
unbiased opinion, of course. I got to thinking, ‘‘Well, maybe you could
do that in a larger framework.’’ I was
talking to this old colleague and good friend of mine, Barry
MacDonald,
in England. He was
retiring and complaining that the young people didn’t listen to him
anymore. He
knew all about how to manage projects and deal with the politics, and all that.
But he
couldn’t pass it on. So I thought one thing I could
do is write a novel, and students can read that. I’ll
make it more fun. So the novel was written for
students (House, 2007). I tried to include some of the
knowledge that we have and incorporate it in story
form for them, which seems to be a natural form
for that kind of knowledge anyway. A political
novel is a hell of a lot more interesting to read than
any political science text you’ve picked up
recently. You would have that in there. And a lot of it is
detail. It has to be. There’s a reason why the
narrative form fits politics so well, because it’s the little
ins-and-outs and backs and forth and nuances
and all that detail that’s so important. I decided to
write the novel and put it online, but few
people found it. Finally, I just put it in hardback.
I think part of
the thing about writing is that if you can take a complex idea, a philosophic idea, and make it transparent, then you’ve done a
very good job of writing. That’s one of the criteria I use.
Could I take
this idea that may be confusing and very complicated and make it clear? You
know, that’s
the challenge in trying to do it. If you do
that, you’ll probably end up with something worthy.
Mel: In that quote I mentioned earlier
there was a little earlier part to it and it was that ‘‘The
role of
the theorist in the world of
evaluation practice is ambivalent in some ways. The theorists are high
profile,
lauded and sought after to deliver speeches and lend legitimacy to projects,
proposals, and
meetings.
At the same time practitioners lament that the ideas are far too theoretical,
too impractical.
Practitioners have to do the project work
tomorrow, not jawbone fruitlessly forever.’’ Again, a
little
autobiographical?
Oh, yeah, sure,
definitely. Been to many places, you know, and given a lot of advice that
probably
wasn’t very useful to people, particularly in
other countries. No, no, of course. I gave a talk in New
Zealand
a couple of years ago in Maori country. You know the Maori have a huge influence there.
They were
looking at culturally sensitive issues. And so I told the story of the Denver
evaluation that
I did. I told
the story rather than talk about democratic evaluation as a concept. I think it
went over
much better—they could take the story and
adjust it for whatever their own circumstances were.
Because I
couldn’t conceivably know what their circumstances are. They could make
whatever
adjustments they wanted to, the idea is there. That’s
one way of dealing with the relevance issue,
but it’s always an issue. You get paid, and
you become well known for writing books, and articles,
and more articles. You’ve got to put the ideas
in a certain form to go into certain journals. But as
every practitioner will tell you, that doesn’t
help the practitioners very much. I’ve always lamented
that. I mean, I always have. I always see that
gap there. It’s also true—because I started as a teacher
for a few years, and I saw the big difference
between what people are teaching teachers and what
the teachers actually have to deal with.
There’s a huge difference. Actually, that problem exists
in all the professions to some degree.
Mel: Any ambivalence about sort of reaping
the benefits of this highly lauded career—you know, the
flights
to give talks at places, the requests, ‘‘Please be on our advisory board,’’ the
honors—and the
focus on social justice? Does that
ever twinge a little bit?
Yeah, yeah,
yeah, sure, I notice the difference. You’re trying to do what you can. You go
someplace
and people spend a lot of money to support
these boards and all that, and sometimes you wonder,
‘‘Does this
really make any difference for the people out there who are doing or suffering
that
the programs are supposed to be for?’’ I mean,
‘‘Does that really help? Does this help them that
we’re doing all this?’’ I think anybody with
some conscience would think about that. If you have
a sense of it, there are people out there
suffering who need these programs. You have some sense
of that in the end. So yeah, there’s a twinge
I can’t do more. Now I don’t know what else I would
do. I mean that’s the other thing. What would
I do that . . . what would I do to help? So there’s still a
question of could I have done something else? I
haven’t thought much about that. I think with evaluation
I’ve done about
as well as I could for social betterment, given my position where it was and
where my
thinking was at that time, certainly. But I would
have killed a lot of people if I’d gone into medicine,
been a doctor, I’m sure. So, you know, at least
I don’t think I’ve damaged anybody in that way.
Mel: Does Alton come back to you at times?
Is that part of your concern for social justice?
Yeah,
sure. I grew up in
difficult circumstances. A lot of people are still back there. Those people
didn’t have a very good life and somehow I was
lucky, fortunate, to get out of the circumstances we
were in. Some people did, but a lot of people
didn’t. Yeah, that’s part of it, sure. I never forgot those
people, the kind of circumstances they were in. I
know what it’s like to go to bed hungry. I’ve been
there a few times in my life. And, you know, my
relatives didn’t get out of there since they stayed in
the same social class, same kind of
circumstances. There’s first-hand experience that fuels the social
justice. It’s not simply an abstract concern. You
know people who went through all this and who could
have used help from somebody at some time. And
so evaluation, for me, is a way of trying to keep the
programs honest, but then you’ve got to keep the
evaluations honest, too, to try to help these people.
Robin: A lot of these ideas were
germinating during critical shifts in our country socially, politically,
culturally;
how did that fit into the mix of the evolution of your ideas, reading about
theories of justice,
thinking
about where you came from? Can you situate some of your thinking in the context
of those
times politically and culturally?
Well, context
definitely has a strong influence on the work that I did and why I wanted to do
that
work at that time. The social justice ideas I
developed during the time of the great expansion of social
programs in the ’70s, following the Great Society
programs, which really set evaluation on its feet,
got it going. This was the expansion of
programs and program evaluation. The idea Campbell developed
was this social experimentation idea
(Campbell, 1998). But I was trying to develop the idea
that this should include social justice too as
we looked at these programs. When you get to the
1980s, Reagan
comes into power, and the whole tenor of the country changes. It has taken a
different
direction since that time. So what are we involved
with now? Conflict of interest is where we’ve
ended up. Serious, big,
damaging conflict of interest from the top of the government to the bottom.
I worry about
that now looking at it from the role of the evaluators, whether it’s evaluators
doing
drug studies or doing education evaluations.
The society has somehow become infected with conflict
of interest and other forms of corruption.
Now, I can’t quite blame Reagan for all that, although
the trend started with Reagan. I don’t think
anybody’s got a very good story on why the social backdrop
has changed. I find myself dealing with
program issues that the context delivers. And social
justice is always there. I’ve discovered that from
my own background and from the theory of John
Rawls. As of
now, 30, 40 years later, you end up with conflict of interest and this systemic
self-interest
eating up the society. The whole society has
changed since that time.
Miles: Can evaluation play a role in saving
us from ourselves?
I would say
‘‘yes,’’ but first you have to save evaluation. You have to save evaluation
from conflict
of interest. That is, you can’t have
evaluation for hire for results. You can’t have drug companies saying,
‘‘I’m giving
your company the funds to do this study for me, but you know damn good and well
if
you don’t get the results to come out the way
we want, you’re not going to get another study from us.’’
We’ve got to
save evaluation from that. We’ve got to save the field before we can do
anything beyond
the field. That’ll be tough, tough.
Acknowledgments
We gratefully
acknowledge the donation of transcription services by Gargani
and Company and the editorial
assistance of Katherine Cloutier.
Authors’ Note
The Oral History
Project Team consists of Robin Lin Miller (Michigan State University), Jean King
(University
of Minnesota), Melvin Mark (The Pennsylvania
State University), and Valerie Caracelli (U.S.
Government
Accountability
Office).
Declaration of Conflicting Interests
The author(s)
declared no potential conflicts of interest with respect to the research,
authorship, and/or publication
of this article.
Funding
The author(s)
received no financial support for the research, authorship, and/or publication
of this article.
References
Alkin, M. C. (Ed.). (2004). Evaluation roots: Tracing
theorists’ views and influences. Thousand Oaks, CA: Sage.
Alkin, M. C. (Ed.). (2013). Evaluation roots: A wider
perspective on theorists’ views and influences. Thousand
Oaks, CA: Sage.
Bhaskar,
R. (1978). A realist theory of science. London: Leeds.
Campbell,
D. T. (1998). The experimenting society. In W. N. Dunn (Ed.), The
experimenting society: Essays in
honor of Donald T. Campbell. Piscataway, NJ:
Transaction Publishers.
Christie,
C. A., & Alkin, M. C. (2008). Evaluation theory tree re-examined.
Studies in Educational Evaluation,
34,
131–135.
House, E. R.
(1976). Justice in evaluation. In G. V. Glass (Ed.), Evaluation studies review annual (Vol. 1, pp. 75–100). Beverly Hills, CA: Sage.
House, E. R.
(1977). The logic of the evaluative argument. Los Angeles: University of California, Center for the Study of Evaluation.
House, E. R.
(1980a). Evaluating with validity. Beverly Hills, CA: Sage.
House, E. R.
(1980b). Evaluation as persuasion: A reply to Kelly’s critique. Educational
Evaluation and Policy
Analysis, 2(5),
39–40.
House, E. R.
(1991). Realism in research. Educational Researcher,
20, 2–9.
House, E. R.
(2007). Regression to the mean: A novel of evaluation politics. Charlotte, NC:
Information Age
Publishing.
Scriven,
M. (1980). The logic of evaluation. Inverness, CA: Edge
Press.
Shadish,
W. R., Cook, T. D., & Leviton, L. C. (1991). Foundations of program
evaluation: Theories of practice.
Newbury Park,
CA: Sage.
Stake,
R. E. (1967). The countenance of educational evaluation. Teachers College
Record, 68, 523–540.