Do Women Have Different Philosophical Intuitions than Men? Responding to Buckwalter and Stich*

Recently there has been a lot of discussion in our profession (e.g., on philosophy blogs) about the underrepresentation of women in philosophy. Most of the proposed solutions have focused on the underrepresentation of female graduate students and professors. Hiring practices are being revised, conferences with more female speakers are being advocated, climate surveys are being given to faculty and graduate students, and sexism and sexual harassment are being called out. (Needless to say, we think these are important developments.) However, less attention has been paid to the underrepresentation of women at the undergraduate level, especially before students choose a major. This lack of attention is problematic, since a recent study by Paxton, Figdor, and Tiberius (2012) shows that the most significant drop-off for women in philosophy is between Intro courses and majoring. Given that fewer than one third of philosophy majors are women, we must address this underrepresentation in order to increase the proportion of female grad students and professors in philosophy.

In “Gender and Philosophical Intuition,” Wesley Buckwalter and Stephen Stich (2010) offered a partial explanation of the underrepresentation of women in philosophy that focused on the undergraduate level drop-off. They presented evidence that women have different intuitions than men about thought experiments typically used in Intro courses.  And they proposed that instructors may treat female students’ intuitions as “incorrect” because they differ from mainstream accepted philosophical views. Buckwalter and Stich explain: “[t]he more courses a woman takes, the more likely it is that she will be exposed to thought experiments on which her intuitions and those of her instructor diverge – and the more likely it is that she will decide not to take another course” (2010: 29).  So, on Buckwalter and Stich’s hypothesis, the underrepresentation of women in philosophy is partially explained by the weeding out of those students with “incorrect” intuitions. Their view has received a lot of attention (and some responses). We had our doubts about their hypothesis, decided to test it more fully, and found that our doubts had merit.

In their paper, Buckwalter and Stich provide numerous examples of gender differences on thought experiments, though they do not propose that these gender differences are due to biological differences. Their evidence for gender differences in philosophical intuitions was garnered from philosophers and psychologists contacted by Buckwalter and Stich as well as from thought experiments they tested themselves.

Although Buckwalter and Stich’s hypothesis seems well supported by the evidence they provide, we question their methodology. First, for the data solicited from other philosophers, Buckwalter and Stich do not report the total number of measures checked (including those without gender differences), or whether they obtained information about studies that showed no gender differences. So, they have a high risk for Type-I errors. Suppose there were 100 measures total in the sets of surveys among the experimenters solicited, and these experimenters reported to Buckwalter and Stich that 5 of those measures showed gender differences. With a standard Type-I error rate of 0.05, we would expect 5 responses to indicate significant gender differences by chance alone. Buckwalter and Stich did not use any statistical technique for adjusting the significance level in order to account for multiple comparisons. It is hard to know whether the gender differences on the solicited data are due to actual gender differences or due to chance. In addition to their presentation of the solicited data, Buckwalter and Stich present four gender differences from their own studies. However, even on these four thought experiments there appears to be no statistical correction for multiple comparisons. Further, Buckwalter and Stich seem to have tested more than four thought experiments while looking for gender differences. Their own series of thought experiments on Amazon’s M-Turk had a total of 1,836 subjects, but they only report four cases with gender differences, which account for only 384 of the subjects. If each case used around 95 participants, it is likely Buckwalter and Stich ran 15-20 studies, of which only four were reported as showing evidence of gender differences. It is unclear how many of the putative gender differences would be statistically significant had they accounted for multiple comparisons, assuming they checked for gender differences on all 15-20 measures.
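
The multiple-comparisons worry can be made concrete with a little arithmetic. The sketch below assumes that the tests are independent and that no real gender difference exists on any measure (both are simplifications). The function names are illustrative, and the counts of 4, 20, and 100 measures are the hypothetical figures discussed above, not a known tally of what Buckwalter and Stich tested. It also shows the per-test threshold that one standard adjustment, the Sidak correction, would require.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' results when every null hypothesis is true."""
    return num_tests * alpha


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def sidak_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that holds the family-wise error rate at alpha (Sidak)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / num_tests)


if __name__ == "__main__":
    # Hypothetical numbers of measures, taken from the discussion above.
    for m in (4, 20, 100):
        print(
            f"{m:>3} tests: "
            f"expected false positives = {expected_false_positives(m):.2f}, "
            f"chance of at least one = {familywise_error_rate(m):.3f}, "
            f"Sidak per-test alpha = {sidak_alpha(m):.5f}"
        )
```

Under these assumptions, 100 uncorrected tests yield about 5 nominally significant results on average, and the chance of at least one false positive among 20 tests is well over one half.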

Moreover, even if there are some gender differences on thought experiments, there are at least three reasons to doubt that women drop out of philosophy because of these differences. First, even in cases where Buckwalter and Stich report a small but significant gender difference in responses, it is unclear that the difference implies that men and women actually make different judgments. For example, they report a gender difference on the Plank of Carneades thought experiment, in which one shipwrecked sailor, Ricki, pushes another sailor, Jamie, off a plank that could not support both sailors. On a seven-point scale, women attributed a greater degree of blameworthiness to Ricki than men did. Yet both women and men agreed that Ricki is morally blameworthy. Second, in many cases, there is no accepted philosophical intuition. Buckwalter and Stich present a gender difference in intuitions about Compatibilism regarding free will and determinism. However, there is heated debate between Compatibilists and Incompatibilists, and it is unlikely that most instructors present one side of the debate as correct. Third and finally, when there is an intuition accepted by the philosophy profession, the reported gender difference sometimes suggests that women were more likely than men to report the accepted intuition. Take the thought experiment on Putnam’s Twin Earth: women are reported to be less likely than men to agree that Oscar and Twin-Oscar mean the same thing when they say ‘water’, and denying that they mean the same thing is the verdict Putnam’s argument is meant to elicit.

But, in any case, we wanted to test whether any of the just-described gender differences were genuine. In an attempt to replicate Buckwalter and Stich’s findings, we gave a survey to over 300 critical thinking students at Georgia State University (see here for a summary of findings). Note that our sample is more representative of the population of undergraduates taking philosophy for the first time than Buckwalter and Stich’s M-Turk sample. We re-ran nearly all of the thought experiments discussed in their paper, using the same wording, but we found a gender difference in only one case (women were more likely than men to agree that George knew he was not a virtual reality brain, which is consistent with Buckwalter and Stich’s report). Yet when we performed a Sidak correction for multiple comparisons, that gender difference was no longer significant. Our colleague and statistics aide, Sam Sims, has conducted a power analysis for our replication. Our study has an 80% chance of detecting the effect of gender on responses to any given thought experiment provided that gender explains at least 9% of the variance in responses. Given the concerns expressed above, we doubt that any smaller gender effects, even if they exist, would be enough to contribute to women leaving philosophy.
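
To give a rough feel for how statistical power, effect size, and sample size trade off, here is a minimal sketch. It is not the power analysis reported above. It assumes the effect is expressed as a correlation r, uses the textbook Fisher z approximation, takes a two-sided significance level of 0.05 with no correction for multiple comparisons, and ignores every detail of the actual study design, so its outputs are not comparable to the reported figure. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def variance_explained_to_r(r_squared: float) -> float:
    """Convert a proportion of variance explained into a correlation magnitude."""
    return math.sqrt(r_squared)


def required_n(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed to detect correlation r (Fisher z method).

    Assumption: a single two-sided test at level alpha, no multiplicity correction.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)              # quantile for the target power
    fisher_z = math.atanh(abs(r))                      # Fisher transform of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


if __name__ == "__main__":
    r = variance_explained_to_r(0.09)  # 9% of variance corresponds to |r| of about 0.3
    print(f"|r| = {r:.2f} needs roughly n = {required_n(r)}")
    print(f"|r| = 0.10 needs roughly n = {required_n(0.10)}")
```

On this approximation, an effect of |r| = 0.1 requires roughly nine times as many participants as an effect of |r| = 0.3, which is why a study powered for larger effects cannot rule out small ones.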

Because we greatly appreciate Buckwalter and Stich’s attempt to find explanations for the early drop-off of women in philosophy, we decided to look for others. We developed a climate survey for undergraduates (over 700 in Intro to Philosophy at Georgia State) to look for other explanations for why women say goodbye to philosophy so early in the game. We found many differences between genders (and also between white and black students, another issue that needs to be addressed) in their perceptions of their Intro class, some of which provide clues for where to look further (e.g., the number of women on the syllabi, as discussed here and here). One interesting finding from our results is that students’ perception of themselves as having different opinions from their classmates is a “partial mediator” between gender and the intent to persist in philosophy. That is, students’ intentions to persist in philosophy are partly driven by whether they perceive themselves as having different opinions from other students, and whether they perceive themselves as having different opinions from other students varies by gender. Buckwalter and Stich’s hypothesis seems to suggest that women are less likely to persist in philosophy in part because they are less likely to perceive themselves as having similar opinions to their classmates. However, our data suggest that women are less likely to persist in philosophy in part because they are actually more likely to perceive themselves as having similar opinions to their classmates. This finding seems to provide evidence against Buckwalter and Stich’s (2010: 34) claim that “differences in intuition tout court” make one less likely to continue in philosophy.

We have some emerging hypotheses for why women, and black students, don’t go into philosophy right from the start.  But it’s likely going to be a complicated story with a lot of contributing causes. We hope that more effort will be made to understand these causes and, where appropriate, to counteract them. We suspect that most of these efforts will make undergraduate philosophy courses more relevant, useful, and enjoyable for all students.

-Toni Adleberg,* Morgan Thompson,* and Eddy Nahmias, Georgia State University

* Primary (and equal) authors of this work (including this blog post), both GSU MA 2013



  1. Eddy, this is a really interesting response to the analysis offered by Buckwalter and Stich.

    The original Buckwalter and Stich paper (as I recall it) argues roughly that there are some statistically significant gender differences, and that whatever explains this might partially explain the under-representation of women.

    I think it is worth keeping some issues separate that tend to be run together. The first issue is the matter of why we have widespread under-representation of women in academic philosophy. The second issue is whether it is descriptively accurate to say that there ARE statistically meaningful gender differences in how men and women respond to various philosophical questions. Part of this second issue is a further distinction: we need to describe any differences that may exist and as a distinct project we need to offer possible explanations.

    Is the point of the critique your group makes a) that there are no actual gender differences, b) that there are observed differences, but these are explained by some other factor–they aren’t really gender differences (or may only be statistical noise), c) that there are differences, but that they do not explain under-representation in the field, or d) some other specific claim?

  2. Morgan Thompson

    Hi Robert (if I may),

    We take Buckwalter and Stich’s argument to go something like the following:

    1. If women have different intuitions about philosophical thought experiments than men, then this would likely lead more women than men to stop taking more philosophy classes.
    2. Women do have different intuitions about philosophical thought experiments than men.
    3. So, this likely leads more women than men to stop taking more philosophy classes, hence accounting for some of the underrepresentation of women in philosophy.

    We question both their premises and the warrant of their conclusion. In the post, we describe some of the reasons we disagree with (1). We also question their methodology used to provide the evidence for (2). Because Buckwalter and Stich did not statistically correct for multiple comparisons (e.g., using Sidak or Bonferroni corrections), there is no way to know whether the differences they found reflect actual gender differences or mere chance (“statistical noise”, as you suggest). That is, we think Buckwalter and Stich cannot reject their null hypothesis, (a). We then tried to replicate their results in order to weigh in on (b). Our replication failed to produce significant differences. The one difference we found (on the brain-in-a-vat scenario) is not significant when we adjust our significance level for multiple comparisons.

    Now there is a concern that there may be gender differences we did not detect because our replication may not have enough statistical power. When designing their studies, researchers typically aim to have at least 80% power to detect an effect of a given size. Effect sizes can be small (|r| = 0.1), medium (|r| = 0.3), or large (|r| = 0.5). In order to conclude (a), we would need over 80% power for small effect sizes. As it turns out, our replication has 80+% power for effect sizes slightly larger than medium. We cannot confidently claim (a) because we did not have enough power to detect small differences. Instead, we can claim (a’): there are no gender effects (of at least slightly larger than medium size) in philosophical intuitions. For any gender effect that is medium or smaller, we suggest (c’): if there are any gender effects that are medium size or smaller, then these would be inadequate to explain the underrepresentation of women in philosophy.

    Given our failed replication, we began searching for other factors in the early drop-off of women in philosophy, some of which we are currently exploring. For example, we’re interested in seeing whether philosophy instructors have “fixed” mindsets about philosophical skill, and so are more likely to perceive their students as being either good or bad at philosophy, with little the student can do to alter that fact. If they do have “fixed” mindsets, then philosophy instructors may encourage only those students perceived as immediately being good at philosophy to take more philosophy courses. Another potential factor comes from the results of the climate survey we ran last fall, which showed that female students are less likely than male students to find philosophy useful for getting a job and to think philosophy is relevant to their lives. We are interested in finding out if women are more concerned with majoring in a field that they find useful for getting a job and relevant to their lives. If they are, then perhaps philosophy instructors can emphasize the relevance of philosophy to the students’ everyday lives, for example, by showing movie clips with philosophical elements in their courses. Instructors might also provide students with a sense of how philosophical skills can help them get a job.

  3. Howard

    I imagine similar claims are made with regard to women taking up subjects other than philosophy, e.g., math and engineering.
    Might such comparisons affect your argument or otherwise prove instructive?

  4. Toni Adleberg

    Hi Howard,

    Good point; women are also quite underrepresented in some STEM fields, especially engineering, computer science, and physics. (See here.) Insight into the causes of women’s underrepresentation in those fields certainly might be instructive in philosophy.

    A helpful research report from 2010 about women in STEM fields is available here. One thing it mentions is Carol Dweck’s research on “fixed” and “growth” mindsets. Female students are evidently more likely to persist in math if they have been taught that intelligence can be developed, rather than that it is fixed. That is why, as Morgan mentions in her comment, we are conducting an instructor survey in the fall to investigate our instructors’ attitudes towards intelligence.

    Thanks for the comment, and let us know if you had in mind any other ways in which our project might benefit from research in other fields!

    • howard

      For what it’s worth, here’s my proposal:
      1) following Badcock people can think mechanistically or mentalistically
      2) women think more mentalistically
      3) philosophy involves more mechanistic thinking
      4) logic is the most explicitly mechanistic aspect of philosophy.
      5) (I’m leaving out a few steps) in introductory courses for women along with making philosophy more palatable, teach them logic while showing how logic applies to philosophy in general
      6) pilot this approach at women’s colleges where you don’t have to worry about alienating the men

  7. James

    Great stuff! One thing: I take it that an intuition, if not presented as being correct, can be presented as being standard. And I can imagine that being pretty discouraging to someone who fails to share the standard intuition. It seems that there are many debates in which instructors would be unlikely to present one side of the debate as being correct, for the reasons you cite, but in which certain intuitions would nonetheless be commonly presented as the standard intuitions to have. (More of a comment for B&S than you, I suppose.)

  8. Wesley Buckwalter

    I wanted to write in to thank Toni, Morgan and Eddy for all of their careful and fantastic work. Just speaking for myself, this is exactly the sort of productive engagement and further empirical study I was hoping our paper would help inspire. The fact that some of our results back in 2009 were not found among Georgia State students might begin to call into question the degree to which differences in these thought experiments are so widespread. But as the research on diversity in philosophical intuitions continues, I think Toni, Morgan and Eddy have also provided a very valuable new set of results at Georgia State pointing to other promising hypotheses philosophers should also begin to take seriously. Further testing will be essential here too, and I look forward to learning even more in the future about how experimental philosophy can contribute to our understanding of this important problem.

