Recently there has been a lot of discussion of the value of the Implicit Association Test (IAT) as a measure of implicit bias — discussion generated largely by a new paper by Calvin Lai, Patrick Forscher and their colleagues that presents the results of a meta-analysis of studies conducted using the IAT, plus a provocative article in New York magazine by Jesse Singal that discusses that paper and the methodological controversy it’s a part of. The title of Singal’s article? “Psychology’s Favorite Tool for Measuring Racism Isn’t Up to the Job: Almost two decades after its introduction, the implicit association test has failed to deliver on its lofty promises”. (Please bear in mind that headlines are usually written by someone other than the author.)
In light of this I invited several philosophers to share their views in a roundtable discussion of the value of the IAT and the general question of how to understand, and properly measure, implicit bias. (For other coverage, see this post at Daily Nous, as well as our series of posts last year from the authors of chapters in Michael Brownstein and Jennifer Saul’s Implicit Bias and Philosophy.) The participants in this roundtable are Michael Brownstein (John Jay College of Criminal Justice, CUNY), Nick Byrd (Florida State University), Keith Frankish (The Open University), Jules Holroyd (Sheffield), Neil Levy (Oxford / Melbourne), Edouard Machery (University of Pittsburgh), Alex Madva (Cal Poly Pomona), Shannon Spaulding (Oklahoma State University), and Chandra Sripada (University of Michigan).
You can read contributions from each author below. Many thanks to all those involved!
***
Michael Brownstein
I won’t say much about the recent coverage of implicit attitude research in, for example, the Chronicle of Higher Education and New York Magazine. Both articles have their virtues and vices. The New York Magazine piece, while unusually detailed and careful compared with science reporting elsewhere, is one-sided. Several of the researchers quoted in it have complained on social media that the author included comments in which they described problems with implicit attitude research but excluded the rest of what they said (which was more positive).
I’ll focus instead on some take-aways from recent analyses of the implicit attitude literature (in particular, from Patrick Forscher, Calvin Lai, and colleagues’ meta-analysis of change in implicit attitudes (here) and a bit from Fredrick Oswald and colleagues’ meta-analysis of the predictive validity of the race-IAT (here)).
I think it is clear that general measures of implicit attitudes (e.g., as represented on a race-evaluation IAT) don’t predict specific individual behavior (e.g., biased grading) very well. However, this should be unsurprising, for several reasons.
First, predicting behavior is hard. Research on implicit social cognition arose out of the recognition that self-reported measures of attitudes don’t predict behavior very well. In some contexts, self-report measures outperform indirect measures; in other contexts, it is the opposite. Likewise, changing attitudes in a way that changes behavior is hard. Identifying manipulations that change behavior by changing attitudes is not a problem special to implicit attitude research.
Second, general measures of preferences shouldn’t be expected to predict specific behaviors in specific contexts very well. The attitude-behavior link is highly context- and person-specific. Some models of implicit attitudes identify key moderators (e.g., Olson and Fazio’s MODE model), but more work in this vein is needed. Relatedly, Alex Madva and I have recently argued (here) that indirect measures may be improved by targeting the activation of specific associations in specific contexts with specific behavioral outcomes. Broadly speaking, I think this should be the take-away from the recent meta-analyses: not that implicit attitude research is invalid, but that there is much room to improve measures of implicit attitudes (again, just as there is for measures of self-reported attitudes). Note also that a related point can be made in response to other psychometric critiques of implicit attitude research (e.g., low test-retest reliability; low correlations between measures): what we need are theories that make principled predictions about the specific conditions under which test-retest correlations will be high or low and conditions under which various measures will or won’t correlate. For examples of what I have in mind, see here and here.
Third, while some researchers have been guilty of overselling the power of indirect measures, others have been recommending caution for years. For example, in 2012, Brian Nosek cautioned against using the IAT as a diagnostic tool for predicting individual behavior (here). The fact that the IAT is not suitable as a diagnostic tool, or, worse, as a tool for classifying kinds of people (e.g., “implicit racists”), does not mean that it cannot make predictions about human behavior that are both theoretically interesting and socially important. Greenwald, Banaji, and Nosek’s reply to the Oswald et al. meta-analysis discusses this issue (here).
The Forscher, Lai, et al. meta-analysis focuses specifically on change in implicit attitudes. I think there are important findings in this paper. (A note of caution: the “multivariate network meta-analysis” that they use is new and, so far as I understand, relatively untested, so we should be careful in drawing conclusions from their analysis. I say this tentatively, however, and hope those with more statistical savvy than I have will weigh in.) Arguably the most striking finding is that changes in implicit attitudes don’t appear to cause changes in behavior. While related, this is not a claim about the predictive validity of indirect measures. However, I think we should be extra cautious about this claim. Of the 426 studies Forscher, Lai, et al. examined (427 in the pre-print was a typo; also, the final number will be closer to 500), only 22 included a longitudinal component, and of those, only a few included a behavioral outcome measure. In fact, only 15% of all the studies included in the meta-analysis included a behavioral outcome measure (and this includes measures of intentions to behave in such-and-such a way and measures that simply ask people how they would behave in a hypothetical situation). This means that the vast majority of the studies under consideration in this paper used a one-shot manipulation of implicit attitudes; of those that examined changes in attitudes over time, most didn’t examine changes in behavior; and those that did examine behavior sometimes took intentions and hypothetical predictions as proxies for actual behavior. A priori, one might think that the latter would be the crucial studies. In other words, to really know whether changes in implicit attitudes cause changes in behavior, one might want to look at studies that create durable attitude change and then examine the effects of those lasting changes on behavior.
It’s not that Forscher, Lai, and colleagues chose not to include such studies; rather, such studies have, for the most part, not been done. (Calvin Lai’s earlier studies showing that the effects of manipulations of implicit attitudes don’t last long (here) use extremely minimal manipulations (e.g., lasting 5 minutes). Future research will hopefully look at more robust forms of implicit attitude change over time.)
In their paper, Forscher, Lai, and colleagues echo these sentiments, writing, “the present meta-analysis speaks more to the processes that produce short-term shifts in implicit bias than to the processes that produce lasting changes. Knowledge of how long-term change in implicit bias occurs is critical for developing a complete theoretical account of implicit bias. Insofar as some forms of problematic behavior are the result of automatic processes, understanding long-term change is also critical for developing interventions to resolve problems caused by this behavior. What processes determine whether a shift in implicit bias will be temporary or long-lasting? When will a shift in implicit bias translate into a permanent change in orientation? Theory and practice-oriented researchers alike would be well-served to contend with these questions.”
As Kate Ratliff put it on Facebook, it is a long jump from the very real challenges facing implicit attitude research to the “implicit bias isn’t real and the IAT is bogus” rhetoric found in recent press articles. (I would encourage readers to check out the discussion of these issues on the Psych Map Facebook group.)
Nick Byrd – What Can We Infer From Debiasing Experiments?
 The implicit association test (IAT) indirectly measures biases in behavior. In the IAT, “participants […] are asked to rapidly categorize two [kinds of stimuli] (black vs. white [faces]) [into one of] two attributes (‘good’ vs. ‘bad’). Differences in response latency (and sometimes differences in error-rates)” serve as an indirect measure of bias—a.k.a., implicit bias (Huebner, 2016). So, experimentally manipulating these indirect measures of bias is supposed to tell us something about implicit bias.
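To make the latency-difference idea concrete, here is a minimal Python sketch of a simplified IAT-style score, loosely modeled on the widely used D-score: the difference between mean response latencies in the two pairing blocks, divided by the standard deviation of all latencies pooled across both blocks. The function name `simplified_d_score` and the latencies are hypothetical. Published scoring procedures also exclude extreme trials, penalize errors, and average over practice and test blocks; none of that is modeled here, so treat this as an illustration rather than a scoring tool.

```python
from statistics import mean, stdev


def simplified_d_score(compatible_ms, incompatible_ms):
    """Simplified IAT-style score (illustrative only, not a published algorithm).

    compatible_ms:   response latencies (milliseconds) from the block whose
                     category pairing is labeled "compatible"
    incompatible_ms: latencies from the block with the reversed pairing
    """
    # Slower responses in the incompatible block yield a positive difference.
    latency_gap = mean(incompatible_ms) - mean(compatible_ms)
    # Scale by the sample standard deviation of all trials from both blocks,
    # so the score is expressed relative to the participant's own variability.
    pooled_sd = stdev(list(compatible_ms) + list(incompatible_ms))
    return latency_gap / pooled_sd


# Hypothetical latencies for one participant (not real data).
compatible = [600, 650, 700, 620, 680]
incompatible = [750, 800, 780, 820, 760]
print(round(simplified_d_score(compatible, incompatible), 2))  # prints 1.71
```

A positive score only means the participant responded faster, on average, when the categories were paired in the way labeled "compatible"; which pairing gets that label is a choice made by the test designer, and what the difference reflects psychologically is exactly what the contributors here dispute.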
As philosophers, we are in the business of arguments and evidence. So we might wonder whether arguments and evidence can manipulate implicit biases in behavior (as measured by the IAT). There is some evidence that they can. Some think that this effect of arguments and evidence on IAT performance falsifies the idea that implicitly biased behavior is realized by associations (Mandelbaum, 2016). The idea is that propositions are fundamentally different from associations, so if implicit bias is associative, then arguments and evidence could not change participants’ implicitly biased behavior. Since there is some evidence that arguments and evidence change implicit biases, we are supposed to conclude that “implicit biases are not predicated on any associative structures or associative processes but instead arise because of unconscious propositionally structured beliefs” (ibid.). However, there are two concerns about this falsification: it might rely on an oversimplification of behavior and on an overestimation of the evidence. After all, there are many processes involved in our behavior (De Houwer, 2006). So there are many processes that need to be accounted for when trying to measure the effect of a manipulation on our implicitly biased behavior, such as concern about discrimination, motivation to respond without prejudice (Plant & Devine, 1998), the likelihood that an association has been activated (Conrey, Sherman, Gawronski, Hugenberg, & Groom, 2005), and one’s awareness of one’s own implicit biases (Hahn, Judd, Hirsh, & Blair, 2014). Further, some of the evidence that arguments and evidence change implicit bias is underpowered or else underdescribed. When we address these two concerns, we find evidence that manipulating implicitly biased behavior can involve merely associative processes like conditioning and counterconditioning.
Moreover, when counterconditioning is taught in person during extended training sessions, reductions in implicit bias can last weeks (Devine, Forscher, Austin, & Cox, 2012). What Lai and colleagues add to this story is that short, online counterconditioning and other manipulations result in less durable reductions in implicit bias.
For more, see “What We Can (And Can’t) Infer About Implicit Bias From Debiasing Experiments” in Synthese.
Keith Frankish – Implicit Bias and the IAT
To be implicitly biased is to display discriminatory behaviour that one does not consciously intend or endorse. One sincerely affirms (say) that black people are no less smart than white people, yet behaves as if they are. The IAT is widely thought to provide strong evidence for the existence of implicit bias, but I am sceptical. There are many methodological concerns about the test (summarized by Jesse Singal in his recent New York Magazine article), but the core problem is simpler and deeper. The IAT aims to measure associations between stimuli (typically words and images), and we cannot extrapolate from such associations to behaviour. I assume that intelligent behaviour is the product of practical reasoning, and if it is systematically biased, then this is because the agent holds biased beliefs (nonconscious ones, if the bias is implicit). Yet we cannot infer a person’s beliefs from their associations between stimuli. A given word-image association might be accompanied by a wide range of different beliefs about the relation between the represented objects, or by no specific belief at all. We shouldn’t expect an association test to tell us much about behaviour.
This doesn’t mean that I doubt the existence of implicit bias itself. In fact, I suspect that it is widespread. Much of our behaviour is under the control of nonconscious mental processes, and it wouldn’t be surprising if these do not always mirror our avowed attitudes. We know from everyday observation that people often fail to live up to their ideals and lack insight into their motivations, and the human propensity for self-deception has been a common theme in literature since ancient times. Indeed, it may be that belief in implicit bias has fostered interest in the IAT, rather than the other way round; the test seems to offer a scientific basis for something we find intuitively plausible.
So I believe in implicit bias while doubting that the IAT does much to confirm its existence. In fact, I’d go further. The IAT may offer too comforting a picture of implicit bias. It encourages us to think of our biases as peripheral factors — culturally acquired associations that interfere with our explicit egalitarian attitudes. But they may in fact be much more central to who we are. It may be that our behaviour is wholly the product of implicit, and often biased, mental states, and that our avowed views are merely window dressing. Perhaps we assert egalitarian views, not because we really believe them, but because we have a strong implicit desire to conceal our biases. Perhaps we are unwitting hypocrites, mistaking pragmatic self-presentation for sincere belief. This isn’t a view I endorse (for my considered view, see my 2016), but in treating implicit cognition as the default, I think it puts the emphasis in the right place.
Reference
Frankish, K. (2016). Playing double: Implicit bias, dual levels, and self-control. In M. Brownstein and J. Saul (Eds.), Implicit Bias and Philosophy, Volume 1: Metaphysics and Epistemology (pp. 23–46). Oxford University Press.
Jules Holroyd
We have plenty of evidence that people discriminate unintentionally. Critical race scholars have long discussed the testimony of victims and witnesses of such discrimination (see Baldwin, Lorde, hooks, Yamato, and more recently Rankine, Coates inter alia). Research programs into implicit bias are valuable in bringing a richer understanding of the cognitive mechanisms underpinning such unintentional discrimination. But what of the recent challenges to these research programs (e.g. here and here)?
Does the IAT tell us that we are implicitly biased? No: it tells us that individuals – pervasively – show certain patterns of response, from which it is (often) inferred that they harbour certain kinds of associations. Whether these associations are implicit biases will depend on what implicit biases are, and this is a hotly contested philosophical issue.
Does the IAT tell us that we will discriminate? No. It tells us about the presence of a risk factor for discrimination – our very own cognitions – and one we are often ill-placed to mitigate or even detect.
If the IAT turns out not to be a reliable measure or valid predictor of behaviour, does that debunk the whole research program on implicit bias? No: many other indirect measures have tracked unreported cognitive associations, or unintended behavioural responses (shooter bias tasks, studies on microbehaviours, CV & hiring decision studies, monitoring of doctors’ prescribing patterns, and so on).
Does the fact that interventions to change implicit bias have been found largely ineffective in changing behaviour mean that “we cannot claim that implicit bias is a useful target of intervention” (Forscher, quoted in this article)? My view is that, whilst biases are malleable, changing individual cognition is not obviously the right starting place. It is no surprise that manipulating one aspect of our vastly complex and socially embedded cognitions does not result directly in a change in behaviour (note: changing our explicit beliefs is also often ineffective in bringing about changes in behaviour, but that can still have value and important downstream effects). Discussions of implicit bias have motivated many people to consider and enact changes to institutional structures and procedures that may make them more robust against the possibility of implicit bias. In our own discipline, this has included working to challenge under-representation on reading lists, in conference programs, and amongst our academic staff; the anonymous marking of undergraduate essays, evaluation of applications; more rigorous uniformity in interviewing processes… These sorts of interventions seem multiply justified: they may insulate procedures from bias; they may even, downstream, change biases; but more importantly they directly target the goals of tackling marginalisation and under-representation. To my mind, the research programs on implicit bias help us to motivate and think about how best to formulate these kinds of interventions, and it is here that our efforts are best placed.
Neil Levy
The reception of an exciting idea often passes through three stages. First, it is embraced as the key to explaining something we care about. Then there is a backlash, and the idea is rejected as explaining nothing. Of course, sometimes the backlash is fully justified, but when the idea actually has something going for it, we often move to a third stage, in which it is accepted as a useful tool, explaining, perhaps, some of the variance in the phenomenon we’re interested in. After all the hype surrounding implicit bias, we were due for a correction, and we’re certainly getting one. While implicit bias has been overhyped, though, we seem in danger of rejecting it entirely, on grounds that are spurious.
Jesse Singal is certainly right in suggesting that there is ongoing controversy over what the IAT measures. Is the Black/Bad association driven by an implicit belief that blacks are bad, or that black people are treated badly (or by something else again)? But we need to distinguish questions about the content and the structure of implicit attitudes (are they mere associations, unconscious beliefs, patchy endorsements, or something else altogether?) from questions about their effects on behaviour. Quite different states can have similar effects in some conditions. Implicit processes play an essential role in all deliberation – winnowing options and automatically assigning weights to them – without which we would face an intractable problem of combinatorial explosion. My implicit belief that blacks are bad (potentially a condemnable state of mine) might cause me to prefer a white job applicant over a black one. But so might my (potentially praiseworthy) implicit belief that blacks are treated badly: having this laudable content may be consistent with the state leading me to prefer the white candidate. It may lead me to take the same range of options seriously.
It is true that implicit bias apparently explains only a small percentage of the variance in behaviour. That fact does not make it unimportant. As Greenwald et al. note, the effect size estimated by Oswald et al. is higher than the effect size of a daily aspirin, yet taking a daily aspirin would prevent more than 400,000 heart attacks in the United States annually. Conscious deliberation is extremely powerful: more powerful than many writers on implicit bias appreciate. Most of the time, our behaviour is much better explained by our conscious attitudes than by our nonconscious ones. But in certain circumstances (when we respond under time pressure, stress, or cognitive load, and when matters are very evenly balanced) we can expect implicit bias to make a difference. When things are evenly poised, as they routinely are in the context of jobs in philosophy, for example, a very small effect size can make a decisive difference.
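The claim that a small effect can be decisive when options are evenly balanced can be illustrated with a toy simulation. The model is an assumption made for this sketch, not Levy’s or Oswald et al.’s: each decision compares two candidates whose merit gap is drawn from a normal distribution, and a biased chooser applies a small, fixed handicap (`bias`) to one candidate before comparing. The hypothetical function `flip_rate` counts how often that handicap reverses the choice an unbiased chooser would have made.

```python
import random


def flip_rate(spread, bias, trials=10_000, seed=0):
    """Fraction of simulated two-candidate decisions reversed by a fixed bias.

    Toy model (hypothetical): `gap` is candidate A's merit minus candidate B's,
    drawn from a normal distribution with standard deviation `spread`.
    An unbiased chooser picks A whenever gap > 0; a biased chooser first
    subtracts `bias` from A's side. For bias >= 0, the two disagree exactly
    when 0 < gap <= bias.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    flips = 0
    for _ in range(trials):
        gap = rng.gauss(0.0, spread)
        unbiased_picks_a = gap > 0
        biased_picks_a = gap - bias > 0
        if unbiased_picks_a != biased_picks_a:
            flips += 1
    return flips / trials


# Same small bias, different spreads of candidate merit.
print(flip_rate(spread=1.0, bias=0.05))   # candidates usually far apart
print(flip_rate(spread=0.05, bias=0.05))  # candidates evenly poised
```

With the bias held fixed, narrowing the spread of merit gaps sharply raises how often the bias decides the outcome: in expectation the flip rate is the probability that the gap falls between 0 and `bias`, roughly 2% in the first call and roughly 34% in the second.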
Edouard Machery – Should We Throw the IAT on the Scrap Heap of Indirect Measures?
Social psychology emerged as a distinct field of psychology in part to measure people’s preferences or, as they are known in psychology, people’s attitudes. Being able to measure people’s attitudes accurately would of course be of great interest to many potential consumers and, even more interestingly, funders of psychology, from politicians to advertisers to corporations. Social psychologists have long tried to develop measures (“indirect measures”) that get around the limitations that affect self-reports of attitudes, such as self-presentation concerns and limited awareness of one’s own attitudes. Sadly, it is fair to say that these efforts have been for naught: The history of attitude measurement in psychology is one of exuberant, irrational enthusiasm followed by disappointment when the shortcomings of the new indirect measures come to light.
The recent history of the implicit association test is just the most recent episode in this sad history of irrational exuberance followed by disappointment. We were told that the IAT measures a novel type of attitude (mental states that are both unconscious and beyond intentional control, which we’ve come to know as “implicit attitudes”) and that people’s explicit and implicit attitudes can diverge dramatically: As we’ve been told dozens of times, the racial egalitarian can be implicitly racist, and the gender egalitarian can implicitly be a sexist pig! And law enforcement agencies, deans and provosts at universities, pundits, and philosophers concerned with the sad gender and racial distribution of philosophy have swallowed this story.
But then we’ve learned that people aren’t really unaware of whatever it is that the IAT measures. So, whatever it is that the IAT measures isn’t really unconscious. And we’ve learned that the IAT predicts only a very small proportion of the variance in behavior. In particular, only a tiny proportion of biased behavior correlates with IAT scores. We have also learned that your IAT score today will be quite different from your IAT score tomorrow. And it is now clear that there is precious little, perhaps no, evidence that whatever it is that the IAT measures causes biased behavior. So, we have a measure of attitude that is not reliable, does not predict behavior well, may not measure anything causally relevant, and does not give us access to the unconscious causes of human behavior. It would be irresponsible to put much stock in it and to build theoretical castles on such quicksand.
Lesson: Those who ignore the history of psychology are bound to repeat its mistakes.
Alex Madva
 I love cheesesteaks. If I took an IAT comparing cheesesteaks to just about any other food (except maybe cheeseburgers, or my grandmother’s kibbeh), I bet that I’d more strongly associate images of cheesesteaks with words like “good,” “pleasant,” or “tasty.” However, I don’t eat cheesesteaks, for ethical reasons. So the range of behavior predicted by my abiding love of cheesesteaks is relatively small. It still predicts some behavior: for example, when you ask me if I like them, I’ll tell you. But suppose I inhabit a social world that says it’s immoral to eat or even to like cheesesteaks. Then maybe I wouldn’t openly admit my love of cheesesteaks (perhaps I’m embarrassed, or I have a bunch of conflicting feelings and I only report the ones that sound ethical… or maybe living in a world where it’s taboo to like or even talk about cheesesteaks makes self-knowledge about this topic unusually difficult). Then maybe indirect measures like the IAT would be the best way to acquire (partial, non-decisive) evidence about whether I like them. Of course, if my love of cheesesteaks is correlated with so little of my behavior, you might wonder why anybody would be interested in uncovering my cheesesteak preferences in the first place.
Well, let’s dwell a little longer in the nearby world where liking cheesesteaks is widely regarded as immoral. Suppose that in this world it’s nevertheless the case that a lot of cheesesteaks are getting eaten. Field studies find many people eating cheesesteaks despite insisting that they don’t like them. Lab experiments show that sometimes people nibble on cheesesteaks when they think nobody else is looking, or even gobble down whole cheesesteaks without realizing that they’ve done so! How bizarre! This requires explaining. We identify some conditions in which people can be coaxed into admitting that they like cheesesteaks (for example, when they see prominent politicians boast about loving cheesesteaks and thereby re-normalize unacceptable behavior), and we develop a bunch of powerful theories and evidence about what those conditions are. We also identify conditions where people deny liking cheesesteaks, but eat them nevertheless, and this behavior is predicted to some extent by cheesesteak IATs. We develop compelling theories and evidence about the conditions under which cheesesteak IATs are more and less likely to predict behavior. For example, if I’ve just eaten three cheesesteaks and I feel uncomfortably full, or if I’ve just watched a film about where cheesesteak ingredients come from, then I am temporarily less likely to have pro-cheesesteak IAT scores.
We even deliver principled reasons and empirical evidence to explain why cheesesteak IATs are often worse at predicting behavior than IATs about other topics are. What I see when I look at all this evidence is us gaining knowledge about ways to improve cheesesteak IATs, and us learning more about the precise conditions when cheesesteak IATs predict behavior and when they don’t. All these complexities are washed away, however, when someone comes along and does a meta-analysis.
Here’s an example (moving on from the belabored cheesesteak analogy, which sounds like a good name for a jam band). I recently came across a paper by Levinson, Smith, and Young (2014) which developed a sort of “do blacks matter?” IAT, and found that individuals tended to associate white faces with words like “merit” and “value” and black faces with words like “expendable” and “worthless.” This measure predicted several things. For example, mock jurors with stronger racial bias on this measure were comparatively more likely to sentence a black defendant to death rather than life in prison. That correlation provides some initial evidence that the race-value IAT really is tracking, at least to some extent, something like a disposition to devalue black lives. They also found that another IAT, which used words including “lazy” and “unemployed,” did not predict death sentencing. Now, it could be that the significant correlations found with the race-value IAT were flukes. We’d need more studies to know. But from the perspective of a meta-analysis, what we have is one IAT that predicted behavior and one that didn’t: more grist for the miller who says the IAT is an inconsistent predictor of behavior. But in hindsight it actually makes sense that one of these predicted this particular judgment and one didn’t. Also, both measures are exploratory. We are figuring out as we go which forms of the IAT better predict which behaviors in which contexts. Using either of these findings in a meta-analysis is, to my mind, inappropriate and misleading. Given that they are two distinct IATs, it’s not even clear to me why we are lumping them together in one meta-analysis.
Inferring from the fact that some IATs are bad predictors in some contexts to the general conclusion that IATs are bad predictors strikes me as pretty flawed reasoning. Suppose we want to know whether voting patterns are predicted by beliefs about cutting taxes for the rich. Suppose we find in several studies that there’s little to no correlation (e.g., because the Republican base does not support cutting taxes for the rich). Should we conclude that BELIEFS IN GENERAL don’t predict voting patterns? Or that MEASURES OF BELIEF IN GENERAL don’t predict voting patterns? Or even MEASURES OF BELIEFS ABOUT TAXES? Such inferences would be absurd. What about other beliefs that might predict voting patterns, other measures of belief with slightly different wording, etc.? Suppose we’ve been trying to figure out which beliefs predict voting behavior (and how) and we’ve been trying out a whole bunch of different measures of belief in an exploratory way, including bizarre measures and beliefs that a priori we wouldn’t expect to correlate with voting behavior. A meta-analysis on the relationship between “measures of belief” and voting behavior would then surely reveal low correlations.
There is clearly room for improving the IAT and other indirect measures. There’s already tons of theory and evidence about their limitations and possibilities for improvement, and more is coming out all the time. One recent example is Cooley and Payne’s finding that using images of groups of people, rather than isolated individuals, improved the Affect Misattribution Procedure (AMP) in various ways, including test-retest reliability. Maybe this change would also improve the IAT. Even if it does, we likely won’t see Project Implicit “switch over” to a new and improved version, for a variety of reasons, but mostly, it seems to me, because of institutional inertia. I think it’s understandable but unfortunate that researchers basically settled on what they took to be a “good enough” measure.
All the improvement in the world will only take associative and attitudinal measures so far. Any given attitude could lead to radically different behavior depending on what else is going on in a person’s mind or context. For example, a working paper by Meier, Schmid, and Stutzer found that people were less likely to vote against the status quo if it was raining on election day. This tendency has evidently swayed several elections in Switzerland. So even if we develop a bunch of good evidence and theories to explain how attitudes predict voting behavior, there will always be further contextual monkey wrenches like this getting thrown into the outcome. (However, I also can’t help but wonder whether we could try to debias this very disposition—which could itself be construed as an attitude—perhaps by encouraging people to think “when it rains, I’ll vote for change!” or to form rain-change automatic associations.)
There’s also plenty of room for exploring alternative ways to get at people’s attitudes, beyond both associative measures and explicit measures like feeling thermometers. In a forthcoming paper, Guillermo del Pinal, Kevin Reuter, and I present some initial evidence that one of the notorious gender stereotypes plaguing fields like philosophy—that women have less innate brilliance or raw talent—has a different conceptual structure from a brute association. We offer principled reasons for thinking this stereotype won’t show up on associative measures like the IAT but will affect various judgments and behaviors. We also think this bias will be less susceptible to contextual variation. Now, we do not frame these findings as a criticism of the IAT or of research on associative biases in general. There is a whole lot of discrimination to explain, and the mind is populated with an abundance of biases to help explain it (and of course there will also be lots of structural factors as well!). We simply suggest it’s a mistake to think that all biases should be understood in associative terms.
On the specific meta-analysis by Forscher, Lai, et al., I would just point out that the possibility of interventions that change test scores without equally strong changes in the actual construct of interest is a ubiquitous problem, which affects everything from medicine (e.g., changing cholesterol without improving heart conditions) to education (e.g., teaching to the test rather than teaching real skills). I don’t personally care much about changing IAT scores. I care about changing the affective-cognitive-motivational-behavioral dispositions that IAT scores are intended to track. If, after going through some prejudice-reducing intervention, people’s IAT scores become less reliable but their behavior becomes less biased, so be it. Suitably improved indirect measures might still be useful for helping us identify our biases, even if they become less useful after debiasing interventions.
Shannon Spaulding
The IAT is meant to measure implicit bias. Recent critiques of the IAT tend to focus on the robustness of IAT results, the relation between IAT scores and other measures of bias, and whether IAT scores predict discriminatory behavior. Although these critiques are legitimate in a way, the problem is not so much with the IAT as with the fact that we often reduce implicit bias to an IAT score. The IAT is just one way of measuring implicit bias. (Other measures include lexical decision tasks, sequential priming, word completion tasks, go/no-go association tasks, false memory tasks, etc.) In my view, the IAT tracks salient associations between categories (e.g., race or gender) and features (e.g., dangerous or family-oriented). These salient associations are real (the IAT really can detect biases in one’s associations), but they are highly unstable and can vary with even small changes in context. For example, a typical White American’s association between a visual representation of a Black face and a negative feature can (and does) break down when that Black face is presented against the background of a church interior rather than an urban street corner (Wittenbrink, Judd, & Park, 2001). The negative association that is salient in the latter context is not salient in the former. A similar thing happens when the faces presented are not generic faces of White or Black men but faces of well-known liked or disliked people, e.g., Adolf Hitler vs. Michael Jordan (Govan & Williams, 2004). This kind of instability in IAT results is similar to the instability seen in priming studies of implicit bias. The fact that the salient associations are unstable does not mean they aren’t tapping into a real bias. They are. But it’s just not the kind of bias that really concerns us when we talk about implicit bias. What deeply concerns us are the more stable, deep-seated biases: the kind that predict discriminatory and prejudicial behavior.
One way of understanding the more deep-seated biases is in terms of how central a feature is to our concept, e.g., how central being family-oriented is to our concept of women or how central being dangerous is to our concept of Black men. The centrality of a feature to a concept determines its cross-contextual stability, i.e., whether the association will survive changes in background context. Centrality is also the kind of property that predicts our inferences and behavior. The IAT is a limited tool, and it simply is not designed to detect and measure such biases. So, if we are interested in implicit biases that reliably predict how we think and behave, it would be useful to focus more on something like conceptual centrality. There are existing measures of conceptual centrality that could be deployed to study implicit biases (e.g., Carey, 2009; Johnson & Keil, 2000; Sloman et al., 1998). Centrality hasn’t been the focus of empirical or philosophical discussions of implicit bias, but going forward it should be.
References:
Carey, S. (2009). The Origin of Concepts. Oxford University Press.
Govan, C. L., & Williams, K. D. (2004). Changing the affective valence of the stimulus items influences the IAT by re-defining the category labels. Journal of Experimental Social Psychology, 40(3), 357-365.
Johnson, C., & Keil, F. C. (2000). Explanatory understanding and conceptual combination. In F. C. Keil & R. A. Wilson (Eds.), Explanation and Cognition (pp. 328-359). Cambridge, MA: MIT Press.
Sloman, S. A., Love, B. C., & Ahn, W. K. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22(2), 189-228.
Wittenbrink, B., Judd, C. M., & Park, B. (2001). Spontaneous prejudice in context: Variability in automatically activated attitudes. Journal of Personality and Social Psychology, 81(5), 815.
Chandra Sripada – Putting “tiny” correlations between implicit attitudes and behavior in perspective
Race IAT scores account for just 1% or 2% of the variance in laboratory measures of discriminatory behavior. Many critics seize on this observation to dismiss the race IAT: the test, it is said, has inadequate predictive validity. I believe this criticism is misguided, and an example from baseball (inspired by Abelson 1985) helps to show why. There are one hundred batters who have differing levels of skill at hitting the ball, captured by their batting averages. The batting averages (and thus the players’ skill levels) are normally distributed with a mean of 0.2 and a standard deviation of 0.05 (not too dissimilar from the distribution in Major League Baseball). Skill level here is clearly a powerful predictor of hitting: over a 1,000-at-bat season, a player in the 95th percentile of skill gets well over a hundred more hits than a player in the 5th percentile. But what is the correlation between skill level and getting a hit in a single at-bat? It is about 0.12, so skill level accounts for about 1.4% of the variance in hitting. Clearly, anyone citing this meager correlation to dismiss the predictive validity of batting skill is making a serious mistake.
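The baseball arithmetic can be checked with a short Python sketch. It assumes, as a simplification not spelled out in the example, that each at-bat is an independent hit-or-out trial whose hit probability equals the batter's true average. Under that assumption the single-at-bat correlation has a closed form, σ/√(μ(1 − μ)) = 0.05/0.4 = 0.125 (about 1.6% of the variance), in line with the roughly 0.12 figure above, and the 95th-vs-5th-percentile gap over 1,000 at-bats comes to about 164 hits. The function names and the fixed random seed are illustrative choices; the simulation is only a cross-check on the formula.

```python
import math
import random
from statistics import NormalDist

MEAN_SKILL = 0.2   # mean batting average, from the example
SD_SKILL = 0.05    # standard deviation of batting averages, from the example

def single_at_bat_r(mean: float, sd: float) -> float:
    """Analytic correlation between true skill p and one at-bat (hit = 1, out = 0).

    With Y ~ Bernoulli(p): Cov(p, Y) = Var(p) and Var(Y) = E[p](1 - E[p]).
    """
    return sd ** 2 / (sd * math.sqrt(mean * (1 - mean)))

def season_gap(mean: float, sd: float, at_bats: int) -> float:
    """Expected extra hits for a 95th- vs. 5th-percentile batter over a season."""
    skill = NormalDist(mean, sd)
    return (skill.inv_cdf(0.95) - skill.inv_cdf(0.05)) * at_bats

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulated_r(n_batters: int = 100, at_bats: int = 1000, seed: int = 1) -> float:
    """Monte Carlo check: correlate skill with the outcome of every simulated at-bat."""
    rng = random.Random(seed)
    skills, hits = [], []
    for _ in range(n_batters):
        p = min(max(rng.gauss(MEAN_SKILL, SD_SKILL), 0.0), 1.0)  # clip to a valid probability
        for _ in range(at_bats):
            skills.append(p)
            hits.append(1.0 if rng.random() < p else 0.0)
    return pearson(skills, hits)

r = single_at_bat_r(MEAN_SKILL, SD_SKILL)
print(f"r = {r:.3f}, r^2 = {r * r:.1%}")                                  # r = 0.125, r^2 = 1.6%
print(f"season gap = {season_gap(MEAN_SKILL, SD_SKILL, 1000):.0f} hits")  # season gap = 164 hits
print(f"simulated r = {simulated_r():.3f}")
```

The simulated value varies a little with the seed, since only 100 batters are drawn, but it stays near the analytic 0.125.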
The lesson here is that when the construct of interest is expected to influence behavior repeatedly across a multitude of occasions, we need a more nuanced understanding of predictive validity. One’s racial attitudes obviously fit this expectation. The tiny correlations referenced above for the race IAT are mostly based on observations of a single occasion of behavior. It is possible, then, that, as in the baseball example, “cumulative” prediction of behavior across multiple occasions remains strong.
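Under the same toy assumptions as the batting example (independent at-bats, hit probability equal to true skill, skill normally distributed with mean 0.2 and standard deviation 0.05), one can compute how the correlation between skill and total hits grows as occasions accumulate. This is the ordinary aggregation effect: occasion-to-occasion noise averages out while the stable contribution of skill adds up. The helper `cumulative_r` is a hypothetical illustration, not a model of IAT data; it shows only that strong cumulative prediction is possible, not that it holds for the race IAT.

```python
import math

def cumulative_r(mean: float, sd: float, at_bats: int) -> float:
    """Correlation between true skill p and total hits H over `at_bats` independent at-bats.

    Given p, H ~ Binomial(n, p), so
      Cov(p, H) = n * Var(p)
      Var(H)    = n * E[p(1 - p)] + n^2 * Var(p),
    where E[p(1 - p)] = mean * (1 - mean) - sd^2.
    """
    n = at_bats
    within = mean * (1 - mean) - sd ** 2  # average per-at-bat noise variance
    return math.sqrt(n) * sd / math.sqrt(within + n * sd ** 2)

for n in (1, 10, 100, 1000):
    print(n, f"{cumulative_r(0.2, 0.05, n):.3f}")
# 1 0.125
# 10 0.370
# 100 0.783
# 1000 0.970
```

The same skill distribution that yields a correlation of about 0.12 for one at-bat yields about 0.97 over a full season.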
That such a possibility exists says nothing about whether it is actual. Studies in the race IAT literature simply have not looked much beyond single-occasion observations (though this is perhaps starting to change). So let me be clear: I am not here affirming the predictive credentials of the race IAT. My point, rather, is a narrow one: the charge of “tiny correlations with behavior” we hear repeated so often is itself based on a simplistic and blinkered understanding of predictive validity.
***
Chandra Sripada’s analysis of the “tiny correlations” and their detractors seems right.
Michael Brownstein succinctly summarizes objections to implicit attitude research when he states: “In other words, to really know whether changes in implicit attitudes cause changes in behavior, one might want to look at studies that create durable attitude change and then examine the effects of those lasting changes on behavior. It’s not that Forscher, Lai, and colleagues chose not to include such studies; rather, such studies almost entirely have not been done.”
Yet, curiously, Forscher et al.’s well-researched article omits studies that directly address this question. I refer to studies that employ a “shoot/don’t shoot” paradigm, a task that has real-world implications and is identical to the type of training police officers receive. Participants watch as images of armed and unarmed individuals suddenly appear, and their task is to decide whether or not to shoot.
Results show repeatedly that civilians of all races are faster to decide and more likely to shoot black men, regardless of whether or not they are armed. Conversely, they are also faster to decide and less likely to shoot unarmed white men. https://www.ncbi.nlm.nih.gov/pubmed/17547485
The picture that has emerged from these studies is that there seems to be an alarming baseline of implicit bias among most of us that registers black faces as threatening. But, surprisingly, trained police officers don’t always show this kind of bias. The crucial factor appears to be the type of training they receive: when the training undermines negative racial stereotypes, split-second shooting decisions are less likely to show implicit racial bias. This is evidence of durable change. https://www.ncbi.nlm.nih.gov/pubmed/23401478
Subsequent neuroimaging research indicates that these implicit biases are not eliminated; they are overridden by executive function. The interesting thing about this research is how fast that correction takes place. These shoot/don’t shoot decisions take place in a fraction of a second. https://www.ncbi.nlm.nih.gov/pubmed/17133388
I discuss more of the implications of this research for real world policy in the following PBS Newshour article: https://www.pbs.org/newshour/updates/police-shootings-racially-biased/
Denise D. Cummins, PhD
Hi Denise,
I agree that shooter task studies are a crucial part of implicit attitude research. Just a couple quick thoughts:
–Forscher, Lai, et al.’s paper does include some shooter task studies, e.g., from Josh Correll. To my knowledge, there aren’t many studies examining whether changes in implicit attitudes cause changes in shooting decisions. But there are some, and some are included in the meta-analysis (e.g., Mendoza, Gollwitzer, and Amodio 2010 used implementation intentions to shift scores on the shooter task).
–I think the research you cite showing that changes in shooting decisions might be due to greater control over biased associations (rather than changes in associations themselves) might actually support Forscher, Lai, et al.’s findings. What would conflict with their findings, I think, would be evidence of changing associations themselves leading to changes in behavior.
–You probably know about it, but Mekawi and Bresin 2015 is a meta-analysis of shooter studies. Overall they found no difference between police officers, community members, and undergraduates in shooter bias. But I don’t believe they were able to consider the effects of training on police officers’ decisions. (Interestingly, what they did find was that anti-black shooter bias was worse in states with permissive (rather than restrictive) gun laws.) I’m hopeful that shooter bias trainings are effective, though!
Best,
Michael.
They don’t cite the papers I cite in my comment or my PBS article, and I believe those papers are crucial to this discussion.
As I mentioned, the consensus interpretation seems to be that implicit biases are overridden by executive processes, so it isn’t clear how to interpret your objection regarding “changing associations themselves”.
With respect to Mekawi and Bresin’s 2015 meta-analysis, I think you might have overlooked the conclusion of the paper regarding training/context, namely, “In addition, we found that in states with permissive (vs. restrictive) gun laws, the false alarm rate for shooting Black targets was higher and the shooting threshold for shooting Black targets was lower than for White targets. These results help provide critical insight into the psychology of race-based shooter decisions, which may have practical implications for intervention (e.g., training police officers) and prevention of the loss of life of racial and ethnic minorities.”
Hi Denise, I can’t speak to whether there were relevant studies left out of the meta-analysis (that’s beyond my paygrade), but re: what’s meant by “changing associations themselves”, I take it the distinction here is between, well, changing or reducing implicit associations on the one hand and giving people the means to control or override them, on the other. If the reduction in shooter bias was due to the latter — as the evidence of increased executive function would seem to show — then it doesn’t bear against the point Michael was making in his commentary, as it would suggest that the biases remained in place but subjects learned to behave appropriately despite them.
It sounds like the concern here is whether intervention changes attitudes or “just teaches people to hide them.” I think the thing to keep in mind is that we are talking about extraordinarily fast categorization and cognitive override, all of which takes place on the order of milliseconds. It isn’t simply a matter of holding onto explicit racist beliefs but behaving otherwise in order to “fake good” in a consciously duplicitous way.
Right, I don’t mean to say that there’s anything conscious going on there, and certainly not anything duplicitous. Just that changing, weakening, or eliminating the associations themselves would amount to something different. (A difference we might try to tease out in conditions of cognitive overload, say.) I can’t speak to how important this difference is to researchers in the field.
Hi Denise,
RE: the inclusion of studies in the Forscher, Lai et al. paper: all I meant was that they included some studies on shooter bias. You said that they omitted studies of this type. I don’t know whether inclusion of the specific studies you mention would have changed their findings.
RE: control over associations vs. changing associations themselves: I was referring to process dissociation models, such as the Quad Model, which disentangle the contributions of, among other things, the activation of associations from control over the influence of activated associations on behavior. Here is one paper which discusses the use of the Quad model to disentangle responses on the shooter task: https://escholarship.org/uc/item/4x59z6nr
As John said, there is no presumption of consciousness or intent here. The point is just about what would and wouldn’t count as changing implicit associations.
RE: Mekawi and Bresin, what you quoted from the abstract of their paper suggests that their review has implications for the training of police officers. That’s true. But they did not analyze the specific effects of training (e.g., by comparing police officers who have received IB training with those who haven’t). It may be that training is crucial, as you originally suggested, but I think it is premature to conclude this. (This isn’t to say that I don’t support such trainings. I do!)