The Oral History of Evaluation: The Professional Development of Ernest House
Robin Lin Miller, Jean King, Melvin Mark, Valerie Caracelli, and The Oral History Project Team
Since 2003, the
Oral History Project Team has conducted interviews with individuals who have
made
particularly noteworthy contributions to the theory and practice of evaluation.
In 2011, Mel
Mark, Robin
Miller, and Miles McNall sat with Ernest (Ernie)
House in Anaheim, CA, at the site
of the
American Evaluation Association (AEA) annual conference. The interview was
taped and
subsequently transcribed verbatim into a 20-page document. Working with House,
we edited the
transcript
for clarity and length. House reviewed and approved the article prior to its
submission
to
American Journal of Evaluation as well as the final version following revisions
made in response
to
editorial feedback.
Ernest R. House
is Professor Emeritus at the University of Colorado, Boulder, where he was
Professor of
Education from 1985 to 2001. He was on the faculty of the University of
Illinois,
Urbana, where
he was affiliated with the Center for Instructional Research and Curriculum
Evaluation
from 1969
to 1985. House has been a visiting scholar at numerous universities nationally
and
internationally, and at the Center for Advanced Study in the Behavioral
Sciences.
House is
probably best known for his writing and practice regarding the inclusion of
values in
evaluation.
In an article published in 1976 in the first volume of Evaluation Studies
Review Annual,
House drew on a
major work in philosophy, John Rawls’s A Theory of Justice. In that article and
in other
publications, House advocated for evaluators to incorporate social justice
concerns in
their
work, and particularly to consider a program’s consequences from the vantage of
the least
well-off.
More recently, with colleague Ken Howe, House has written a book on
deliberative democratic
evaluation.
In this work, House and Howe argue against traditional stances on the so-called
fact-value
dichotomy. Instead, they argue that through careful processes involving
relevant stakeholder
groups,
evaluators can achieve relatively unbiased value judgments. Consider one
indication
of the
importance of these two approaches to values in evaluation: House is the only
person to
have been
listed in two places on any single version of Christie and Alkin’s
(2008) evaluation
‘‘theory
tree’’ (both on the valuing branch). House is the author of many other books,
articles,
chapters,
and reports. In addition to his work as a practicing evaluator, he has been
called upon
to
conduct meta-evaluations of several major evaluations. Among House’s awards is
AEA’s Lazarsfeld
Award for evaluation theory.
In this
contribution to the oral history project, House describes his journey from
English major
to
evaluator and evaluation theorist. He talks about his first evaluation project,
as well as the major
evaluation
figures he worked with early in his career. House notes an important lesson
from his
early
experience as an evaluator, that is, ‘‘just how political evaluation was.’’ He
explains that
this
realization helped lead to his interest in values. After talking about a wide
range of topics,
including
meta-evaluation, randomized trials, the qualitative–quantitative debate,
and the risk
of
conflicts of interest in evaluation, House comes back to his early experiences
and their role
in his
focus on social justice.
The interview
includes references to numerous individuals with whom House has worked over
the
years. In the case of people outside the evaluation community, such as program
and agency
staff
involved in an evaluation, a brief explanation of their roles is given. Many of
the individuals
to whom
House refers are luminaries in the history of evaluation. These include Egon Guba,
Michael Scriven, Bob Stake, and Dan Stufflebeam,
with whom House interacted early in his career
at the
University of Illinois, as well as Don Campbell and Lee Cronbach,
who made important contributions to social science methods as well as
evaluation theory and practice. Their influential
work is
described variously in Alkin (2004, 2013), Shadish, Cook, and Leviton (1991), other
entries
in this oral history project, evaluation textbooks, and in their own many
publications.
House also
refers to Gene Glass, a research methods and statistics expert, perhaps best
known for
early
contributions to the practice and methods of meta-analysis; Lloyd Humphreys, a
noted individual
difference
psychologist and measurement specialist; Barry MacDonald, a British evaluator
who
emphasized democratic participation in evaluation; Roy Bhaskar,
a British philosopher who
espoused
a particular approach to realist philosophy; House collaborators Ken Howe,
Steve
Lapan,
Decker Walker, and Les McLean; and current evaluation scholars and
practitioners Leslie
Cooksy,
Valerie Caracelli, and Gary Henry.
Interview With Ernie House
Mel: You were an English major at
Washington University in St. Louis. How did you get from that as
your undergraduate interest to
evaluation?
Ernie: I started
in Engineering, but found it not that interesting. And then I decided to be an
English
scholar. When I started going through all these
books in the Washington University library in St.
Louis, I found
these books that hadn’t been read for 50 years, 70 years. I thought, ‘‘Do I
want to
do this?’’ I decided to go into medicine, to
become a doctor. I have some science background. After
carrying this cat soaked in formaldehyde around in
a bag for a semester, I decided, ‘‘I don’t want to
do this either.’’ I went back to my hometown
and thought about what I would do. Within a month
of being drafted, I decided to teach English.
This was between the [Korean and Vietnam] wars. You
could get any kind of a job, just about,
particularly teaching, and you’d get out of the draft. I started
teaching English in high school. I taught for 4
years. People from the University of Illinois asked me to
teach a new curriculum, semantics and
linguistics. I could do that. They liked what I did and asked me
to help write the materials and be the state
consultant for the program. After that, the College of
Education there
offered me a fellowship with James Gallagher, who was a special education
expert
in the area of the gifted. When I finished my
degree, they asked me to evaluate the Illinois Gifted Program.
I never intended
to be an evaluator.
Mel: I wanted to ask you about that. How
did you get chosen at this relatively early career stage to
evaluate
that program?
Gallagher was
the leading authority in the United States on the gifted. I had a fellowship
with
him, and they asked him. Actually, evaluating
this program was a political move on the part of David
Jackson,
who was the main political actor in the College of Education at Urbana. He wanted the
program evaluated, and so did the state
legislature, because gifted programs are not very popular.
Maybe they could
show some effectiveness. Evaluation was all new territory. Jackson got the
legislature
to provide the money. He asked Gallagher, ‘‘Who would you recommend to do the evaluation?’’
I was one of his
students.
Mel: Well, what was the program like? It
was a statewide gifted program?
The program had
20 demonstration centers all over the state demonstrating ideal teaching
methods,
model teaching methods, new curricula, new math,
new science, new English materials, most of them
out of the University of Illinois.
Mel: What did the evaluation look like?
This was your first evaluation, right?
Well, I had a
half million dollars, which was really a lot of money in 1967 terms and 4 years
to
evaluate the program. I did many different kinds of
studies. The program had five parts. The experimental
program funded studies. The demonstration program
set up and funded the 20 centers. The
reimbursement program provided funds that districts
could apply for to adopt these programs. The idea
was that you would do experiments and have
proven methods for teaching and dealing with gifted
kids, and then you would have these things
demonstrated someplace like Evanston High School.
Teachers would
see these practices and put them into practice in Elgin or Alton, Illinois,
where I’m
from, or someplace like that. It was a
research, development, diffusion model of educational
change.
We selected a
random sample of school districts, and I tried a survey the first year because
I had to
provide something for the state legislature right
away. And I found out after I did that, that people lied
like hell on what they were doing when you sent
them this form. So I selected a stratified random sample
of school districts in Illinois that had this
program. I sent in trained teams of people to these districts.
The teams spent
a couple of days there collecting materials and
interviewing people intensively about
what they were actually doing, enough that we
could make an assessment of what they were doing.
Sometimes our
team would show up and the principal would say, ‘‘Oh, hell, we’re not doing
anything,
really. We just wanted the money from the state
to do this.’’ That’s the kind of stuff you’d find. We had
a set of instruments, 40 or 50 different
kinds of measures and interviews. We developed a special
instrument to measure classroom atmosphere, a
classroom climate–type instrument. One of the guys
working for me, Joe Steele, had developed this for
his dissertation. We had a lot of different measures.
We didn’t use
test scores as the outcome measure. We were looking at many different grade
levels.
Mel: Well, what did you draw on in trying
to put together this evaluation design at that point?
I read the whole
literature in evaluation, which I put into a small cardboard box. I read the
whole
thing in 30 days. I mean, that’s all there was, really. I talked to my statistics instructor, who
happened
to be Gene Glass. I said, ‘‘Look, I don’t know very much about evaluation.’’ He said, ‘‘Nobody does.’’
He said Stake knew a lot about it and suggested I put Scriven and Stake on an advisory panel. Scriven had
moved to California and didn’t want to do it. I
had Guba, Stufflebeam,
Stake, and a couple of other
people. They provided me with advice. I used
Bob’s countenance paper as the model (Stake, 1967).
That’s how I
laid out the program. I set up a data collection team headed by Steve Lapan. By that time,
Jim Gallagher
was the U.S. Deputy Commissioner of Education. I took the plan to him. He
looked at it
and said, ‘‘This is a great plan. Take you
about 15 years to do this and probably as much money as the
Vietnam War.
You’re going to have to cut it down.’’ So I cut it way back.
Mel: What, if any, were the kinds of
lessons from that experience that stuck with you?
Well, for one
thing, how political evaluation is. And people taking money. At the end of the
evaluation, we had our results, and I had battles
with the demonstration centers because by tracking
teachers going to the centers and going back into
their district, we found that teachers would go to
the centers and see these model materials.
They would say, ‘‘These materials are fantastic.’’
Then
they’d go back home and say, ‘‘My principal will
never let me use this.’’ You know? ‘‘My principal
wouldn’t agree to this.’’ And so you’d have tremendous enthusiasm at the demonstration centers, and everybody assumed that this would translate into practice when they went back home.
But it didn’t.
That was a big fight with the demonstration directors, who by that time had
become
a powerful force within the program, the 20
of them. And they fought. They even tried to do their
own study to counter ours. In the end, the
demonstration centers did some good. What we found is
that you have to go to work in the person’s
district, in the teachers’ districts themselves, to get things
done. In order to implement, you had to do it
there. The other thing was that when we got the results
after 4 years, I said to a guy who was a state
representative, ‘‘I want this to be used.’’ He said, ‘‘Well,
here’s what you do. There’s this guy. Here’s the
guy’s name,’’ I think it was Gene Hoffman. ‘‘He’s
the head of the School Problems Commission.
Convince him that you’ve got something and take
it through that way.’’ So I asked the head
guy to go out to dinner in Springfield. We drank about
three or four martinis apiece. I told him about
the evaluation. At the end, he said, ‘‘Oh, hell, I believe
you.’’ I gave a presentation of our findings
to the legislative committees. I don’t know if they
knew what they were looking at, but the program
got a 20% increase in their budget because we did
find that the program had a lot of effective
parts. The program was restructured based on the work
that we had done. I thought the evaluation was
pretty satisfactory. But there were many political
interactions.
Mel: You mentioned an impressive set of
people who were involved in this and who you studied
with.
Gene Glass, Egon Guba, and Scriven. I think you
worked with Stufflebeam and Bob Stake. What was it like, working with this group of people we now look at as evaluation luminaries while you were this young buck trying to figure out how to do this project and learn this field?
I was 30 years
old. And, they were good. Now Gene had gone off to Colorado by that time, so
he didn’t actually serve on the committee. He
gave some advice about the program. Bob and the
others were really good. I just said, ‘‘I don’t
know what I’m doing. Here’s what I plan to do. What
do you think I should do?’’ It was
really—actually it was one of the best advisory groups I’ve ever
served on or been served by. We’d
meet once every 3 or 4 months. I had money to bring them in and
pay them, and they were great. Guba and Stufflebeam, Stake, all
of them. Dan was still developing
his CIPP (context, input, process, and product)
model. Egon was changing his view of things. He had
not yet gone qualitative. They were just
really a great group to work with. Intellectually, a very
exciting time. Really, really
great time.
Mel: And you knew it?
Oh, I knew it at
the time. I knew it was very exciting. This new field and new ideas, a lot of people to talk to whom I didn’t know. These were the leading
guys in the field, I thought, at the time.
Mel: One of the things you’re probably best
known for is addressing the issue of values in evaluation.
What do you think it is that drew you to
that as an issue to be addressed?
Thinking about
how political evaluation was and thinking, ‘‘Is there
anything beyond the politics?
Is there
something beyond this other than just who’s got the most power and influence?’’
I started
thinking about value judgments and, of course, Scriven later on had written something about this
(Scriven, 1980). I started reading material on this, and I eventually ended up
conceiving this idea
in terms of social justice. Social justice is
an important thing to do. John Rawls’ book, A Theory
of Justice came out in the early ’70s. I had
the traditional view that most people have when they start
out in social sciences: values are subjective.
I think Scriven more than anybody convinced me both
in reading and talking to him personally that
value claims can be objective, if we understand what we
mean by that. That is, they can be relatively
unbiased. That doesn’t mean some other things that people
often take the word objective to mean. You can
have well-founded value claims. I got interested
in the idea of values.
After finishing
the gifted programs, I needed something. I thought, ‘‘Is
there anything beyond
the politics of this? Is this all politics?’’
I saw this book, A Theory of Justice, reviewed in the New York Review of Books as the leading book of our time in terms of social justice. So I bought a
copy,
took it to England with me on leave and read
the book. From that came my article on justice in
evaluation.
Mel: Did it feel at all dissonant? I mean,
Rawls’ book was thick in more ways than one and you were
trying
to apply it to this very practice-based field.
Well, I think
that was the difficult part. I read the book thoroughly—read it a couple
times, and so I
had actually a good conception of it. But then
how do you apply it? The application was the hard part.
There were
different ways you could think about this, but I didn’t want to take it as the
‘‘original position.’’
The idea would
be taking Rawls’ two principles of justice and saying ‘‘we should adhere to
these in evaluation,’’ one of which deals with
inequality. Inequality, if it’s allowed, should be to the
benefit of those least advantaged in society. I
saw that as being an entrée into programs for the
poor,
the impoverished. The other principle is a
rights issue. You shouldn’t violate student rights.
Mel: In a commentary in a handbook, you
wrote about the work of several evaluation theorists, and I
want to quote you. ‘‘For their
part, the theorists know from experience that the half-formed ideas they
toy with today may form the
basis for proposals in evaluation studies tomorrow, or that the ideas may
be adopted by governments and
organizations as a basis for policy. Fuzzy ideas of yesterday have a
way of becoming orthodoxy
tomorrow.’’ When I read that, I wondered how much that is an autobiographical
statement
because there was some skepticism originally, I think, in terms of the response
of
many people to your ideas about
social justice and Rawls.
Oh, yeah,
tremendous disagreement, I mean, just shock. First shock, ‘‘What’s
justice got to do with
evaluation?’’ They just didn’t conceive it in the
same—those two things live in two different worlds.
You’ve got this
science, which is value neutral or value-free, and then you’ve got social justice,
which
is connected to values. That’s how a lot of
people still conceive the role. Economics is supposed to be
value-free. How evaluation can be value-free is
hard to understand. So people were shocked and had
tremendous reactions. People at the university, some
of the leading professors of psychology and education,
decided that they didn’t know if I was right. They
would have a seminar with me leading the
seminar. There was Lloyd Humphreys. We got
together, about 10 or 15 people, and started reading the
Rawls’
book, which they found very tough sledding. And we went through it. They were so
concerned
about it that they actually wanted to have this
seminar with no students in there. They didn’t want to
make fools of themselves in front of the
students. So, you know, we just had this 30-year-old guy, I
guess I was about 35 then, doing this seminar.
That was interesting. Lloyd never did agree with me. He
came to terms with it, but he never did buy it.
Don Campbell was one of the first guys who asked for
some copies. Campbell was always pretty
open-minded. He never bought into it fully because he still
held to some kind of value-free position; this
in spite of the fact that the same arguments that he used
against foundationalism [of knowledge] apply to
foundationalism in values. Campbell espoused anti-foundationalism with
respect to knowledge claims, which involves arguing that all knowledge is
fallible and contingent.
That is, if he
used the same reasoning on values, he would have to go with a different conception
of values. But he was
always open-minded. He was great to talk to.
And even the
government people started picking it up in Washington and mentioning social
justice.
I remember one
guy at the Department of Education in charge of evaluations. He said, ‘‘Damn, I
wish
I’d thought of
that. If I had read that book first, I could have written that paper before you
did.’’ And
my response was, ‘‘Well, take it. I don’t
have any ownership of it. You can take it and do with it
whatever you want to.’’
Mel: As you know, a year ago at the AEA
conference, some of your work was highlighted, including the
truth, beauty, and justice ideas.
It’s nice to see
it still might be relevant, a book you did 30 years ago. Yeah, that’s a nice
feeling
to have. I don’t want to get too carried away
because I always think that our influence is very limited.
In my view, our
influence is a lot more limited than what we realize. You know, you’re a big
name
guy, and you go give talks and a lot of people
show up, or maybe they don’t. But you give these talks,
man, just going on forever. Well, a few years
after you die, hey, that’s pretty much gone. I don’t think a
lot of it hangs on. Campbell’s perhaps the
exception to that. Take even somebody like [Lee] Cronbach.
You don’t hear
of his name so much anymore, and what an enormous figure he was in the field.
Mel: And he wrote stuff worth rereading
today.
Oh, yeah, Cronbach had great stuff. No, it’s nothing about the
quality. It’s something—there’s
another dynamic in operation, where the work gets
replaced with other ideas, which probably aren’t
as good as the work replaced. What has
happened with Campbell is that you’ve got a set of proprietors
who have taken these ideas and reformulated
them. And have re-done them, considerably, I might add.
By
the third edition, those ideas have been redone. That’s a difference between Campbell and Cronbach.
Cronbach’s
ideas were terrific. Other than maybe in testing, he didn’t have a group of
people
established who have carried on the evaluation ideas.
Maybe the ideas will return. Usually, when they
do, they are renamed.
Mel: You more recently, working with Ken
Howe, have been advocating notions of democratic deliberative
evaluation.
The idea would
be, let’s try to make evaluation more democratic. If you’re going to bring
people’s
values into evaluation, how could you bring a lot
of values together? Bring the values together with the
stakeholders and make sense out of them somehow. Our
idea would be to base values, the evaluation,
and the value judgments on people involved in
the program, the beneficiaries of the program, and all
the major stakeholders in the program. Bring
those together, and you’ve got inclusion, discussion, and
deliberation, three broad principles. You try to
include the people who are important and not exclude
people unless you have good reasons for excluding
them. You try to have serious discussions and listen
to them and their point of view. That’s an
error evaluators often make. They listen to the sponsors,
but not to people in the program and in the
field. Then bring it all together in a process of deliberation.
We haven’t
defined deliberation very well, I think. Deliberation is a tough concept. You
get people
together in a room, you sit around, you talk about
this theory, that theory. You argue back and forth.
And you have an
evaluation based on the values of the community.
Mel: What’s your sense of the state of the
art in practice in carrying this out?
I think it’s
hard to do. I did the project in Denver, which worked all right when I was a
federal
monitor for the bilingual program for about 5 or 6
years. Ken and I talked about this. I said, ‘‘You
know people. You have to give people a set of
procedures.’’ He was arguing, being a philosopher,
‘‘No, let’s just
go with the principles because there are many ways to do it, and people have to
fill
in.’’ I think it’s difficult to do, to tell
people to be inclusive without telling them how to do it. I can
say, ‘‘Be deliberative.’’ But how do you do
it? You need some ideas about how to do it. I’d say that
where the ideas get caught right now is in the
implementation. I’ve always been better at conceiving
abstract stuff than implementing. I can do the
evaluation and can write about how I did it, but it
would be difficult to write a protocol for,
‘‘You do this. You do that.’’ I think that’s where it sits;
a conceptual scheme right now. Some people
can do it. It doesn’t take that much to do, and it may
not be far from what a lot of people do
already.
Mel: You think there’s much risk of things
exploding when you put these diverse value bases of people
on the table?
Yeah, it’s a
possibility.
Mel: How do you avoid that?
You have to be
careful. I think you try not to get some groups together. In the Denver
project, we
always had a lot of militant people who wanted
Spanish taught in all the schools. I didn’t try to bring
them in directly. I never had a town hall
meeting where everybody got together. In the past, those
ended in shouting matches, if not worse. Rather,
I talked to the militants. I went to their headquarters,
talked to them, recorded their views, and
listened. I didn’t try to have them confront the school district
people. I did get the lawyers together and
present them with data twice a year. Part of the evaluation is
to accept that you have to structure the
interaction just as you have to structure the data collection.
Mel: In the late ’70s, you and a group of
people published a critique, a kind of meta-evaluation of
Follow Through, a program designed for kids
after completing Head Start. You criticized the evaluation
in terms of a couple of things,
one being that the measures that they selected fit better with some
program
models than others; they were comparing across a few different variants or
types of Follow
Through programs.
And I think part of the point was that any claims about relative success were
kind
of illusory given that
differential fit. Now, I think you also used the concept of justice in making
that
criticism.
Can you talk about that project, including how it came to be that you and
others were doing
this critique?
The Ford
Foundation asked me to consider doing it. When the Follow Through evaluation
was
coming out, a lot of people—this had been
going on for a while—a lot of the people with the programs
were unhappy with the way the evaluation was
going and how the evaluation had developed. The evaluation
started with a very broad array of measures. The
original idea was to have [a collection of]
measures [that together would be appropriate] for
all 12 types of programs. In other words, you’d have
measures across all types of programs. You’ve got
rather different kinds of programs. But the evaluation
was cut back to only a few evaluation
measures. Some program people complained to Ford.
A woman at Ford
asked me if I would be interested in taking a look at the evaluation. I said,
‘‘Well,
let me check it out.’’ I went to a Follow
Through meeting and talked to a bunch of people. I thought,
‘‘Yeah, maybe
they’ve got a point here.’’ What had happened was that the Department of Education
had started with this vast array of measures
and at the first data collection the evaluators had collected
two tons of data. They couldn’t possibly
manage it. Nobody could manage it. So they reduced it down
to a few outcome measures. You’ve got all
these measures for everybody, now they’ve reduced it
down to a few, such as the Metropolitan reading
test. That reduction of outcome measures tended to
favor certain programs. When we looked at the
teaching materials of the programs, we found that a
particular program would perform very well in math.
When you’d look at the internal subtests you’d
see they were about the same as everybody
else. Except, oh, this one subtest jumps up like this, and
that’s what brings their average up. That’s why
they scored high in math. Then I track this back to why
they do so well on that particular subtest. I
look at their teaching materials and I look at the subtests,
and I find what they’ve done is teach
materials very close to the subtest. For example, instead of
[formatting] multiplication problems up and down, the test
items multiply [horizontally]. Some of the
programs knew that’s what was on the tests. This is
an open test and they structured their materials that
way, a very close match to that test. It
wasn’t exactly the same items, but very close. The whole format
was very close. That was suspicious because
that’s where all their high scores are coming. That’s how
they get this high score in math. By doing that
kind of analysis, we developed the idea that the outcomes
were a bit arbitrary because of the closeness
of the materials to what was on the test. Perhaps
it was accidental. Nonetheless, it gave a
great advantage to certain programs. Decker Walker was on
the panel, along with Gene Glass and Les
McLean. Decker had done some research looking at the
closeness of fit in other tests. That’s why I had
him on there. His research indicated that how closely
the curriculum fits the test makes a big
difference. That’s been replicated over and over again since
then. The other big factor was that you had
programs with six different sites around the country. The
outcome variation among those sites was terrific,
huge variation. So the variation across sites within a
program was about the same as between-program
variation, if you follow me.
How can you say
that Program A is better than another program if there is so much internal variation
within a program? There was no consistency,
internally, within those programs. For those reasons, our
critique was entitled, ‘‘No Simple Answer.’’ Glass
also did some effect size comparisons. We ended up
with a critique challenging the Follow Through
evaluation, which was the big evaluation of the day. It
cost a lot of money and was supposed to be
definitive. The Department of Education was figuring they
were going to tell everybody in the country,
‘‘Use this, use this, and use this.’’ We thought that was
going too far with the results they had.
Mel: I think that your critique helped
contribute to skepticism about the potential of these large-scale
federal
evaluations as definitive guides to action.
Yeah. Yeah. I
think it did. I think it wasn’t only our critique, but the fact they put so
much money
into this evaluation over so many years, and
they end up with an evaluation that is, one can argue,
rather arbitrary in the results and that has
inconsistent results in many ways. Maybe these big definitive
evaluations just don’t pay off. There have been
several other big evaluations. That was just one of
them. And that may have been the final one to tip
over that had people thinking. ‘‘Well, let’s not
do these big, huge, massive things. We’ve got
to wait for years and years to get the results from them,
and then find the results are rather
arbitrary.’’ In Washington, they expected definitive results out of
this big chunk of money. And so the people in
charge of it just thought something went wrong. Something
was wrong with the evaluation since it didn’t
produce definitive results. Another way to look at it
is that the reality they’re trying to map is
too complex. Reality is more complex than what they realize.
And you’re not
going to get simple results and totally consistent results. Because if you put
a program
in one place and you put the same program in
another place, you end up with different results. You’ve
got different contexts in different places,
and you end up with rather different results sometimes.
Mel: Any sense
of irony that this critique had such widespread implications?
Well, I don’t
know. I didn’t expect it would be quite that—the Washington people were
very
unhappy with it and so were some of the sponsors
who had won what they called the ‘‘horserace.’’
They thought
they had won the horserace. So we say, ‘‘Well, the horse was doped’’ or
whatever. We
didn’t say that, of course. But, it’s not that
those programs weren’t good programs; it’s just that
the claims were too strong based on the
evidence they had. Yeah, that was one of the times I
got—one of many times—I got
blackballed in Washington for running counter to federal policy.
Mel: What’s your sense of how evaluations
have done since then with these kind of issues of
context
specificity
and generalization?
Well, I think
the evaluation community is far more sensitive to those issues now. I think
that
we’ve tended to do other things, have other
kinds of studies. I would say there are various things
that people have done. One is program theory by
mapping out what the program is like. That limits
the possibilities if you do that, I think.
Another is meta-analysis. If you put a program in place here,
here, here, here, and here, even though you get
different results and under different circumstances,
I actually think
that adds to the robustness of the overall findings you end up with. It doesn’t
have
to be one massive study. Maybe it’s better to
have a bunch of smaller scale studies in different
places and different contexts and then try to
summarize with something like a meta-analysis. Even
if you’re going to do randomized studies.
That makes more sense to me, and I think we’re more
sophisticated now. Sometimes the randomized stuff
worries me in terms of claims made for it, such
as this huge focus during the Bush
administration. I think people are over-relying on randomized
studies. Indeed, my understanding of why Don
Campbell invented internal and external validity was
to get away from Fisherian
randomization. Because the students thought it solved all the problems,
and it doesn’t.
Mel: You’ve done actually a number of
things that are critiques or meta-evaluations: The Follow
Through, Jesse Jackson’s PUSH-Excel, and
the New York Promotional Gates program. Is there something
that draws you to doing that kind
of work, or something that draws people who are looking for
that kind of work to you?
I think I like
the political stuff. I’ve always had a penchant for the political: political
theory and
political philosophy. Maybe if I did a degree again,
I might do it in political science. It has a draw
to it, and every once in a while I think from
my own personal background I have an urge to engage
in conflict. I also have this sense of trying
to seek out social justice. So I’m willing to take on some
critiques. Like the New York people called me for
the Promotional Gates program, which was the big
New
York City program. The
Mayor’s office called and said, ‘‘We’ve got this
program set up and
we’re going to do an evaluation, but we need
somebody to monitor the evaluation. We have to have
somebody who’s pretty tough. We’ve talked around,
and people say you’re the toughest guy we could
find.’’ At this point, I was at the University
of Illinois. That was flattery, right, from my point of view.
And so, of
course, I signed on. I couldn’t help myself, could I? I fundamentally got
myself in a lot of
trouble. We had a lot of newspaper coverage over
the whole thing. I think it was the only time I ever
agreed to make my reports to the Mayor’s office
and the Chancellor’s office confidential. They wanted
it confidential. I said, ‘‘Okay.’’ Well, we
did it. But their bureaucracy leaked the reports. You put stuff
in one end and it runs out the bureaucracy.
They had a leaking bureaucracy. Some of what we were
writing said, ‘‘You know, all you’ve got here
are no effects in your summer programs, where you
think you’ve got these big test score gains.’’
They had failed to adjust for regression to the mean. ‘‘And
so you really don’t have these results that
you’ve been touting in the newspaper.’’ The Mayor was
running an election campaign, and he was on
television saying, ‘‘I raised the test scores in New York
City.’’ Well, the
results he was talking about didn’t exist in reality when you analyzed the data
the
proper way. They didn’t do it deliberately; they
just didn’t know any better. Bob Linn served on that
panel with me. We had to go tell them they
didn’t have any effects from their summer training program.
At the same
time, they finally decided, ‘‘Okay, just help us do it the right way.’’ I
thought they
turned out to be very reasonable about it. We had
lots of ups and downs, and they tried intimidation,
you know, the usual stuff.
Mel: Despite doing several of these kinds
of critiques, monitoring, and meta-evaluations, I don’t
think that you’re widely
associated with meta-evaluation as a topic. Any thoughts you want to share
about the practice of
meta-evaluation?
Well, I’ve taken
on the meta-evaluations. I have written some, and Leslie Cooksy
and Valerie
Caracelli
interviewed me and used some of those critiques in their work about
meta-evaluation.
I’ve done some
writing, but I never attempted to do what they’re doing, which is to pull some of
those ideas together about how we approach the
issue of meta-evaluation. I’ve tended to philosophize,
and although that’s interesting, I find the
meta-evaluations interesting too. The other thing about it
that appeals to me is that it acts as a
conscience for the field. I think we need to have people in an
important role, and possibly a critical role,
evaluating people for their careers and their work.
There’s a lot
riding on it. You really need checks within the field itself to say, ‘‘Look, we
have
to police ourselves on this stuff. We can’t
just let people do anything and everything.’’ The meta-evaluations,
particularly if they’re big enough to be an example,
have real appeal as one way of
saying, as with Follow Through, ‘‘Don’t do the studies
this way.’’ The Follow Through critique
influenced a lot of people because it had a high
profile. If it had been a small thing, it wouldn’t have
had quite so much influence. You need some
meta-evaluation, some checks on what people are
doing. It serves as a conscience for the field
or a way of exercising the conscience of the field.
We should check
ourselves. We should evaluate ourselves. Scriven’s
absolutely right on it.
Mel: I want to
ask—thinking back to a past AEA conference when Chip Reichardt
and Sharon Rallis
were Presidential Strand chairs. You were one
of the people that they invited to talk on the qualitative/
quantitative debates. As I recall you ended up saying,
‘‘Let’s stop wasting all this human
capital on this debate. Let’s do the hard work of
evaluation.’’ What’s your sense of where that discussion
has gone?
I think we have
made a lot of progress. I think that conference helped. We still have a split,
of
course, between the quantitative and qualitative
people, but that’s all right. The randomization issue
got bent out of shape. I think it bent people
out of shape somehow. So that was a little bit of a throwback. Not that I’m against randomization, but pushing it very hard was a bit of a throwback for a while.
Generally, I
think we’re in really good shape in terms of quantitative/qualitative. I can
understand why
quantitative people see themselves as playing an
irreplaceable, important role. And they see themselves
as quantitative people, you know? Gary Henry
showed me the latest analysis he had done
recently. He said, ‘‘I finally think I’ve done a
real piece of work that I can show my granddaughter
and say, ‘This is really a serious piece of
work that I’ve done.’’’ I think quantitative people identify
themselves with the rigor. Part of their identity is
with that. It’s more than an issue about method. What
happened to the quantitative group was they felt
their status was being diminished in some ways. It’s a
legitimate concern on their part. They wanted to
reassert themselves as being important in the area.
Sometimes people
with all the talk about the qualitative stuff tend to diminish the importance
of the
quantitative. And I can see that would affect people
who put a lot of their identity into the quantitative.
On the other
hand, the qualitative people are sensitive about their methods as being too
soft. If you say,
‘‘This
evaluation isn’t worth anything,’’ they’re going to react. That’s why we get
into these shouting
matches. I’ve been in a few back in the early days
with Tom Cook, before Tom decided he liked the
qualitative as well as quantitative. Those were some
of the early battles we had back in those days.
Mel: Is there a place for randomized trials
from the perspective of social justice?
Oh, yeah, sure,
absolutely, yeah. The trouble with randomization from my perspective
is—well,
it’s hard to do sometimes, but it’s more about
the metaphysics of what we’re dealing with than it is
about the methods themselves. Sometimes
experimental design tends to treat the program like an X.
Like an X is
unchanging. But we now realize when you put the X into a setting it’s not
really X, it’s
really X sub-one, and you’ve got all these
variations, and those variations in something like an educational
program can be very, very substantial. Put the same
program with a different teacher, and
you’ve got pretty damn substantially different
programming. That kind of program variation makes
the randomization more difficult, or at least
what you can derive from it. It’s not that you can’t do it.
So I don’t agree
with some of our realist colleagues. You and I are realists, of course, in our
own
frameworks. But as you know, some British realists
say, ‘‘Well, you shouldn’t do experiments.’’
Well, that’s not
my view. Of course, you can do experiments. You just have to be careful of the
conclusions you draw from them. I mean, they take Bhaskar, who says that
you cannot do social
experiments that are definitive in social work the
same way you can do definitive experiments in
physical work (Bhaskar,
1978). The nature of reality is somewhat different. Social reality is much
more plastic. But that doesn’t mean you can’t
do experiments. You just have to be more careful
about how you do them and interpret them. So
there’s nothing wrong with randomization.
Mel: How did you get to the realist
philosophers, by the way? I’m thinking of that Educational
Researcher paper (House, 1991).
I was down at
the University of Sussex giving a talk back in the late ’70s probably or early
’80s.
There was a
conference of social philosophers. I looked at the books they had there, and I
saw these
Bhaskar
books. I thought, ‘‘realist theory of science.’’ You know that I read
philosophy all the time and
I used to do it
for fun. I mean I read a lot of philosophy, and I read widely in lots of fields.
I thought,
‘‘Well, I’ll
take a read at this.’’ And I read that and I thought, ‘‘You
know, that would address the quantitative/
qualitative thing.’’ If you have a particular
conception of a reality, and here’s the conception
and you could approach it, you could do that
with quantitative methods, and you could do it with
qualitative methods. It’s still the same reality. I
saw that as a way of getting through the quantitative/
qualitative dispute, and that’s how I wrote that
original realist article I did. Well, I didn’t do so
much realist evaluation as I did a realist
research perspective.
Mel: More recently, you’ve been paying a
good deal of attention to conflicts of interest, drawing on
medical
evaluations but applying it to evaluation more generally. Can you talk a little
bit about how
serious a
threat you think this poses to our work?
I think it’s a
very big threat. I think it’s a threat to the whole society. And I got into
doing this
because I was looking at how politics influences
experiments, even randomized experiments. I saw
all these studies, these medical studies,
changing their findings, such as ‘‘This medication has these
effects, and nobody found this out.’’ I started
looking into that issue, which some of the medical
people had looked into as well. I’m reading these
medical articles and discovering there’s a much
deeper issue. This phenomenon is systemic. It
doesn’t just happen from time to time, rather there
turns out to be a systemic bias built into the
system itself. I became alarmed the more I looked at
it. It is quite widespread in medicine,
unfortunately, and I found it’s widespread in other fields, too.
As you know, I’m
an active investor and read a lot in economics and financial investing. One of
the big things that happened in the 2008
financial crisis was that there was conflict of interest in the
financial system from top to bottom. The economist
Jeffrey Sachs calls it the collapse of civic virtue
by our financial and political elite. It’s a
pretty big issue. And it’s important to us because if you
have evaluation being bought and sold for its
results, that is, ‘‘We’re going to have you do the evaluation,
and here are the results that we want.’’ Well,
what use is evaluation? Conflict of interest has
tremendous pernicious effects when you start doing
evaluations of medical and education programs.
If you let the
sponsors call the shots on the findings, it’s dangerous. It’s the biggest
danger to the
field for the future, in my view.
Mel: I want to ask a couple of questions
about writing. If I remember correctly, in some of your
early
books, the chapters had appeared
in earlier versions as journal articles or book chapters elsewhere.
Did you have a map as you were working on
the individual projects that led to the book, or did you sort
of connect the dots after the
fact and create a larger work?
I think the 1980
validity book was more like that (House, 1980a). I did the article on social
justice,
and I did an article on truth, the argument
logic one. These were responses to particular issues, like
the persuasion paper, the monograph I did at
the University of California, Los Angeles, one summer
(House,
1977, 1980b). That was
a response to people who were arguing that all you had to have was
a method. I was saying, ‘‘A method’s not
enough. You really have to make an argument. The method
fits into the argument, but you’re making an
argument. You have to make an argument. Science
makes arguments. There’s a method within the
argument, that’s part of the argument, but it’s the
total argument that counts, not just the method
that counts. Although the method is important, of
course.’’ Some articles were to address social
justice. Social justice addressed the question, ‘‘Is
it
only politics out there? Have we got anything
beyond politics that we can value?’’ So, there were
specific topics of interest. But then I took some
of them for the validity book and put them all
together. Some places I needed to fill in the
pattern, you know, the dots between the political and
social justice and methods with the political
theory. I found ideas to fill in the things I wanted.
Mel: How did you start working on writing
fiction?
I was an English
literature major, going way back, and every English lit major wants to write a
novel. Probably psychologists do, too; they just
don’t want to admit it. But I always wanted to write
something that’s creative—creative writing. I
gave a talk at the Center for Advanced Study in the
Behavioral
Sciences at Stanford where I was a fellow for a year. I wanted to explain to these other
fellows, who were very bright people, economists
and historians, and so on, what evaluation was.
A few knew about
it, but most didn’t. I picked out three concepts: causes, values, and politics.
Now
I could handle
causes with my realist framework, with how I thought our conception of causes
had
changed. For values, I had how our conception of
values had changed from value-free to value-embedded.
However, when I
got to politics, the third concept, I didn’t have a ready analysis. And
you have to understand the politics. So I just
told a story. I told a story of a project that seemed
to work pretty well. And I couldn’t think of
an analytic framework that would handle that concept.
The speech went
over very well, I thought, from my own point of view. The staff thought it was
the
best talk of the year. Totally
unbiased opinion, of course. I got to thinking, ‘‘Well, maybe you could
do that in a larger framework.’’ I was
talking to this old colleague and good friend of mine, Barry
MacDonald,
in England. He was
retiring and complaining that the young people didn’t listen to him
anymore. He
knew all about how to manage projects and deal with the politics, and all that.
But he
couldn’t pass it on. So I thought one thing I could
do is write a novel, and students can read that. I’ll
make it more fun. So the novel was written for
students (House, 2007). I tried to include some of the
knowledge that we have and incorporate it in story
form for them, which seems to be a natural form
for that kind of knowledge anyway. A political
novel is a hell of a lot more interesting to read than
any political science text you’ve picked up
recently. You would have that in there. And a lot of it is
detail. It has to be. There’s a reason why the
narrative form fits politics so well, because it’s the little
ins-and-outs and backs and forth and nuances
and all that detail that’s so important. I decided to
write the novel and put it online, but few
people found it. Finally, I just put it in hardback.
I think part of
the thing about writing is that if you can take a complex idea, a philosophic idea, and make it transparent, then you’ve done a
very good job of writing. That’s one of the criteria I use.
Could I take
this idea that may be confusing and very complicated and make it clear? You
know, that’s
the challenge in trying to do it. If you do
that, you’ll probably end up with something worthy.
Mel: In that quote I mentioned earlier
there was a little earlier part to it and it was that ‘‘The
role of
the theorist in the world of
evaluation practice is ambivalent in some ways. The theorists are high
profile,
lauded and sought after to deliver speeches and lend legitimacy to projects,
proposals, and
meetings.
At the same time practitioners lament that the ideas are far too theoretical,
too impractical.
Practitioners have to do the project work
tomorrow, not jawbone fruitlessly forever.’’ Again, a
little
autobiographical?
Oh, yeah, sure,
definitely. Been to many places, you know, and given a lot of advice that
probably
wasn’t very useful to people, particularly in
other countries. No, no, of course. I gave a talk in New
Zealand
a couple of years ago in Maori country. You know the Maori have a huge influence there.
They were
looking at culturally sensitive issues. And so I told the story of the Denver
evaluation that
I did. I told
the story rather than talk about democratic evaluation as a concept. I think it
went over
much better—they could take the story and
adjust it for whatever their own circumstances were.
Because I
couldn’t conceivably know what their circumstances are. They could make
whatever
adjustments they wanted to, the idea is there. That’s
one way of dealing with the relevance issue,
but it’s always an issue. You get paid, and
you become well known for writing books, and articles,
and more articles. You’ve got to put the ideas
in a certain form to go into certain journals. But as
every practitioner will tell you, that doesn’t
help the practitioners very much. I’ve always lamented
that. I mean, I always have. I always see that
gap there. It’s also true—because I started as a teacher
for a few years, and I saw the big difference
between what people are teaching teachers and what
the teachers actually have to deal with.
There’s a huge difference. Actually, that problem exists
in all the professions to some degree.
Mel: Any ambivalence about sort of reaping
the benefits of this highly lauded career—you know, the
flights
to give talks at places, the requests, ‘‘Please be on our advisory board,’’ the
honors—and the
focus on social justice? Does that
ever twinge a little bit?
Yeah, yeah,
yeah, sure, I notice the difference. You’re trying to do what you can. You go
someplace
and people spend a lot of money to support
these boards and all that, and sometimes you wonder,
‘‘Does this
really make any difference for the people out there who are doing or suffering
that
the programs are supposed to be for?’’ I mean,
‘‘Does that really help? Does this help them that
we’re doing all this?’’ I think anybody with
some conscience would think about that. If you have
a sense of it, there are people out there
suffering who need these programs. You have some sense
of that in the end. So yeah, there’s a twinge
I can’t do more. Now I don’t know what else I would
do. I mean that’s the other thing. What would
I do that . . . what would I do to help? So there’s still a
question of could I have done something else? I
haven’t thought much about that. I think with evaluation
I’ve done about
as well as I could for social betterment, given my position where it was and
where my
thinking was at that time, certainly. But I would
have killed a lot of people if I’d gone into medicine,
been a doctor, I’m sure. So, you know, at least
I don’t think I’ve damaged anybody in that way.
Mel: Does Alton come back to you at times?
Is that part of your concern for social justice?
Yeah,
sure. I grew up in
difficult circumstances. A lot of people are still back there. Those people
didn’t have a very good life and somehow I was
lucky, fortunate, to get out of the circumstances we
were in. Some people did, but a lot of people
didn’t. Yeah, that’s part of it, sure. I never forgot those
people, the kind of circumstances they were in. I
know what it’s like to go to bed hungry. I’ve been
there a few times in my life. And, you know, my
relatives didn’t get out of there since they stayed in
the same social class, same kind of
circumstances. There’s first-hand experience that fuels the social
justice. It’s not simply an abstract concern. You
know people who went through all this and who could
have used help from somebody at some time. And
so evaluation, for me, is a way of trying to keep the
programs honest, but then you’ve got to keep the
evaluations honest, too, to try to help these people.
Robin: A lot of these ideas were
germinating during critical shifts in our country socially, politically,
culturally;
how did that fit into the mix of the evolution of your ideas, reading about
theories of justice,
thinking
about where you came from? Can you situate some of your thinking in the context
of those
times politically and culturally?
Well, context
definitely has a strong influence on the work that I did and why I wanted to do
that
work at that time. The social justice ideas I
developed during the time of the great expansion of social
programs in the ’70s, following the Great Society
programs, which really set evaluation on its feet,
got it going. This was the expansion of
programs and program evaluation. The idea Campbell developed
was this social experimentation idea
(Campbell, 1998). But I was trying to develop the idea
that this should include social justice too as
we looked at these programs. When you get to the
1980s, Reagan
comes into power, and the whole tenor of the country changes. It has taken a
different
direction since that time. So what are we involved
with now? Conflict of interest is where we’ve
ended up. Serious, big,
damaging conflict of interest from the top of the government to the bottom.
I worry about
that now looking at it from the role of the evaluators, whether it’s evaluators
doing
drug studies or doing education evaluations.
The society has somehow become infected with conflict
of interest and other forms of corruption.
Now, I can’t quite blame Reagan for all that, although
the trend started with Reagan. I don’t think
anybody’s got a very good story on why the social backdrop
has changed. I find myself dealing with
program issues that the context delivers. And social
justice is always there. I’ve discovered that from
my own background and from the theory of John
Rawls. As of
now, 30, 40 years later, you end up with conflict of interest and this systemic
self-interest
eating up the society. The whole society has
changed since that time.
Miles: Can evaluation play a role in saving
us from ourselves?
I would say
‘‘yes,’’ but first you have to save evaluation. You have to save evaluation
from conflict
of interest. That is, you can’t have
evaluation for hire for results. You can’t have drug companies saying,
‘‘I’m giving
your company the funds to do this study for me, but you know damn good and well
if
you don’t get the results to come out the way
we want, you’re not going to get another study from us.’’
We’ve got to
save evaluation from that. We’ve got to save the field before we can do
anything beyond
the field. That’ll be tough, tough.
Acknowledgments
We gratefully
acknowledge the donation of transcription services by Gargani
and Company and the editorial
assistance of Katherine Cloutier.
Authors’ Note
The Oral History
Project Team consists of Robin Lin Miller (Michigan State University), Jean King
(University
of Minnesota), Melvin Mark (The Pennsylvania
State University), and Valerie Caracelli (U.S.
Government
Accountability
Office).
Declaration of Conflicting Interests
The author(s)
declared no potential conflicts of interest with respect to the research,
authorship, and/or publication
of this article.
Funding
The author(s)
received no financial support for the research, authorship, and/or publication
of this article.
References
Alkin, M. C. (Ed.). (2004). Evaluation roots: Tracing
theorists’ views and influences. Thousand Oaks, CA: Sage.
Alkin, M. C. (Ed.). (2013). Evaluation roots: A wider
perspective on theorists’ views and influences. Thousand
Oaks, CA: Sage.
Bhaskar,
R. (1978). A realist theory of science. London: Leeds.
Campbell,
D. T. (1998). The experimenting society. In W. N. Dunn (Ed.), The
experimenting society: Essays in
honor of Donald T. Campbell. Piscataway, NJ:
Transaction Publishers.
Christie,
C. A., & Alkin, M. C. (2008). Evaluation theory tree re-examined.
Studies in Educational Evaluation,
34,
131–135.
House, E. R.
(1976). Justice in evaluation. In G. V. Glass (Ed.), Evaluation studies review annual (Vol. 1, pp. 75–100). Beverly Hills, CA: Sage.
House, E. R.
(1977). The logic of the evaluative argument. Los Angeles: University of California, Center for the Study of Evaluation.
House, E. R.
(1980a). Evaluating with validity. Beverly Hills, CA: Sage.
House, E. R.
(1980b). Evaluation as persuasion: A reply to Kelly’s critique. Educational
Evaluation and Policy
Analysis, 2(5),
39–40.
House, E. R.
(1991). Realism in research. Educational Researcher,
20, 2–9.
House, E. R.
(2007). Regression to the mean: A novel of evaluation politics. Charlotte, NC:
Information Age
Publishing.
Scriven,
M. (1980). The logic of evaluation. Inverness, CA: Edge
Press.
Shadish,
W. R., Cook, T. D., & Leviton, L. C. (1991). Foundations of program
evaluation: Theories of practice.
Newbury Park,
CA: Sage.
Stake,
R. E. (1967). The countenance of educational evaluation. Teachers College
Record, 68, 523–540.