Welcome to the Brains Blog’s Symposium series on the Cognitive Science of Philosophy! The aim of the series is to examine the use of methods from the cognitive sciences to generate philosophical insight. Each symposium comprises two parts. In the target post, a practitioner describes their use of the method under discussion and explains why they find it philosophically fruitful. A commentator then responds to the target post and discusses the strengths and limitations of the method.
* * * * * * * * * * * *
Metaethics and Experimental Philosophy:
A Journey Through Thick and Thin
Experimental philosophy is an interdisciplinary approach to philosophical questions and problems that uses empirical methods from various cognitive-scientific disciplines such as psychology, experimental linguistics, and neurosciences. Even though experimental philosophy is a relatively recent movement and has only been around for 25 years, its practitioners have shown remarkable productivity. In January 2021, roughly 2000 papers were listed as ‘Experimental Philosophy’ on PhilPapers.org. Roughly one-fourth of these papers were categorised as ‘Ethics’. In this article, I would like to focus on one specific sub-discipline of moral philosophy that, I argue, can benefit greatly from engaging with experimental philosophy, namely metaethics.
As Sayre-McCord (2014) said, ‘Metaethics is the attempt to understand the metaphysical, epistemological, semantic, and psychological presuppositions and commitments of moral thought, talk, and practice’. Central to metaethics are, among others, questions about the meaning of ethical terms such as ‘good’ and ‘bad’, whether moral statements containing these terms can be true or false, what it is that people express or do by using moral language, and how moral language relates to motivation and behaviour. It is clear that these questions demand at least partly empirical answers. It seems absurd to claim that we can properly understand the meaning of ethical terms and what people express and do with them without looking at the way people actually talk. Additionally, it would be highly questionable to make any claims about moral language and its relationship to motivation and subsequent behaviour without consulting or conducting empirical studies.
Within a single article, it is impossible to cover all there is to say about the relevance of experimental methods for metaethics. What I wish to do instead is draw attention to a recent project in experimental metaethics that aims to understand thick ethical terms.
Concepts such as ‘rude’, ‘cruel’, ‘friendly’, and ‘compassionate’ are what philosophers call thick ethical concepts. They are characterised by their provision of both evaluative and descriptive content. They communicate that an action, behaviour, custom, person, or character trait is viewed with approval or disapproval, and they further communicate in virtue of what descriptive features they are evaluated in this way. In contrast, thin ethical concepts, such as ‘good’ and ‘bad’, are said to be merely evaluative. With such a rough-and-ready notion of thick concepts in mind, philosophers have sought to provide a proper definition for these concepts and spell out more clearly how thick concepts differ from thin concepts, as well as from descriptive concepts, such as ‘green’ or ‘round’.
As many philosophers in the debate readily admit, the attempt to define thick concepts is a long way from being accomplished. Discussions and significant disagreement have revolved around five central questions:
- Separability Question
- Location Question
- Centrality Question
- Variability Question
- Action-Guidingness Question
It is beyond the scope of this article to do justice to the complexity of the entire debate. For now, let us focus on two questions in more detail, determine what empirically testable predictions they make, and analyse how experimental philosophers have started examining them.
The Location Question asks where exactly we can find the evaluative dimension of a thick term or concept. The two options discussed are (a) that the evaluation is part of the semantic content of a thick term or (b) that it belongs to what is pragmatically conveyed beyond what is literally said. The philosophical literature often relies on intuitions about whether statements such as ‘Tom is cruel, but he is not bad’ sound contradictory. If such a statement does sound non-contradictory, we can conclude that a negative evaluation of Tom is not intrinsic to saying that he is cruel and, thus, not semantically conveyed. Despite the widely accepted relevance of such ‘linguistic data’ (that is, authors’ own intuitions about what linguistic intuitions most people have), no systematic empirical studies have been conducted on thick concepts and their evaluative dimension. This is even more surprising, given that in experimental linguistics, empirical means to test whether a statement sounds contradictory are readily available. Perhaps the most widely used test is the Cancellability Test: take a bit of information for which it is unclear whether it is merely conversationally implicated or semantically connected to a concept or sentence, then explicitly cancel this bit of information and see if the resulting phrase sounds contradictory.
In two recent papers (2020, 2021), Kevin Reuter and I aimed to determine whether the evaluation of thick concepts is communicated by semantic or pragmatic means by using the cancellability paradigm. We reasoned that if pragmatists are correct and the evaluative aspect is only conversationally implicated, cancelling the evaluation should not lead to a contradiction. Take, for instance, the sentence: ‘There is the door’. This statement not only communicates the location of a door but in some contexts carries the particularised conversational implicature that the addressee is asked to leave the room. Still, saying ‘There is the door, but by that, I am not saying you should leave’ does not yield a contradiction. Generalised conversational implicatures work similarly but depend less on the specific context. If the evaluation of a thick term is conversationally implicated, cancelling the evaluation should be equally non-contradictory. If such empirical evidence were to be found, this would count as direct support for the pragmatist position, which treats the evaluation as a conversational implicature. In contrast, semantic separabilists claim that the evaluative component cannot be cancelled. For example, a person who says, ‘What Tom did was rude, but by that, I am not saying something negative about Tom’ makes an infelicitous statement.
Our study yielded surprising results. First, neither the prediction of the pragmatist view nor that of the semanticist view was met. Against the pragmatists’ prediction, the evaluation of a thick concept was significantly harder to cancel than the conversationally implicated content and resulted in higher contradiction ratings. This effect persisted for two different embeddings of thick terms (Behaviour and Character). Challenging the semanticist, the evaluation of thick concepts was significantly easier to cancel than the semantically entailed content.
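The logic of the two competing predictions can be made concrete with a small sketch. It is only an illustration: the ratings below are invented, the rating scale is a hypothetical one on which higher numbers mean ‘sounds more contradictory’, and the `tolerance` threshold is an arbitrary stand-in for the significance tests a real analysis would use. The function name `classify` and the condition labels are likewise assumptions, not taken from the papers.

```python
from statistics import mean

# Hypothetical contradiction ratings (higher = sounds more contradictory).
# These numbers are invented for illustration; they are NOT the study's data.
ratings = {
    "implicature": [2, 3, 2, 1, 3],  # cancelled conversational implicatures
    "thick_term": [5, 4, 5, 6, 4],   # cancelled evaluation of a thick term
    "entailment": [8, 9, 8, 7, 9],   # cancelled semantic entailments
}


def classify(ratings, tolerance=1.0):
    """Say which prediction a pattern of mean ratings resembles.

    `tolerance` is an arbitrary stand-in for a proper significance test.
    """
    imp = mean(ratings["implicature"])
    thick = mean(ratings["thick_term"])
    ent = mean(ratings["entailment"])

    near_implicature = abs(thick - imp) <= tolerance
    near_entailment = abs(thick - ent) <= tolerance

    if near_implicature and not near_entailment:
        # Evaluation cancels as easily as an implicature.
        return "pragmatist prediction met"
    if near_entailment and not near_implicature:
        # Evaluation resists cancellation like an entailment.
        return "semanticist prediction met"
    if imp < thick < ent:
        # Harder to cancel than implicatures, easier than entailments.
        return "neither prediction met: in between"
    return "unclear pattern"


result = classify(ratings)
print(result)
```

With the invented numbers above, the thick-term mean falls clearly between the other two conditions, which is the qualitative shape of the result described in the text.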
Second, going beyond the philosophical dispute and each side’s respective predictions, we assumed that polarity (positive vs. negative) might play a role in how simple it would be to cancel the evaluation of a thick concept. We have not seen any suspicion along these lines in the metaethical literature, but given what we know from the experimental philosophy of morality, this was a possibility worth exploring. Our study revealed a strong polarity effect on contradiction ratings. For positive thick terms, contradiction ratings were significantly lower than those of negative thick terms as well as semantic entailments. This polarity effect is hitherto unknown and has not been predicted by any of the various accounts of thick concepts. In fact, the effect challenges the tacit assumption that thick terms and concepts form a homogeneous group of which we can ask broad questions about separability and how evaluation and description are connected.
Kevin Reuter and I happily admit that for the time being, we can only speculate as to why the polarity effect occurs. We outline several possibilities in our paper. Together with Lucien Baumgartner, we are in the process of exploring these explanations more systematically, and we are also developing new experimental approaches to evaluative language, including other experimental paradigms and corpus linguistic approaches (see Reuter, Baumgartner & Willemsen, ms). The investigation of thick concepts is very much in its infancy. However, by applying a very simple experimental paradigm to sentences that so far have only been used as ‘thought experiments’ and philosophical ‘intuition pumps’, we have already empirically challenged two of the most prominent views on thick concepts, as well as the shared assumption that positive and negative thick concepts communicate evaluation in the same way. This seems to be enough to motivate a much larger empirical investigation of thick concepts and normative language more generally.
Independent of the linguistically driven debate in metaethics, it has been argued that thick concepts possess an important connection to actions. This is what I have called the Action-Guidingness Question. Take as an example a friend telling you that your behaviour at the party last night was rude. Beyond her simply communicating disapproval, you might infer an even more far-reaching communicative goal: your friend does not want you to behave in the same way at the next party. What she tries to do is make you change your behaviour. Bernard Williams (1985) offered one of the earliest and most influential attempts to define thick concepts in terms of their potential to guide actions:
The way these notions are applied is determined by what the world is like (for instance, by how someone has behaved), and yet, at the same time, their application usually involves a certain valuation of the situation, of a person or actions. Moreover, they usually (though not necessarily directly) provide reasons for actions. (Williams, 1985, p. 143 f.; own emphasis)
Many have adopted this suggestion. In addition to being endorsed by many scholars, it seems that the idea of thick concepts being connected to actions adequately captures what people mean to communicate by calling someone else ‘rude’ or ‘cruel’.
After speaking at length about how metaethics could profit from experimental results, it might not come as a surprise that I believe that even an idea this plausible requires empirical support. Ultimately, whether thick concepts have the disposition to guide actions is a matter of their psychological effects on people. Judith Martens and I have started to investigate the action-guidingness of thick concepts. We believe that to develop a proper metaethical theory of thick concepts and their relationship to actions, we need to understand a) whether there are circumstances in which thick concepts provide reasons for action, b) whether there are circumstances in which thick concepts do not provide reasons for action, and c) how these two classes of circumstances differ from one another.
As a first study (Willemsen & Martens, ms), we tested the idea that thick concepts have the disposition to provide reasons for action, especially when a person is reasoning about what would be best to do. Participants were presented with the following prompt:
“Sally is struggling with a decision on how she should act. The situation is tricky, and Sally has several alternative options. To decide which option she should choose, Sally makes a list of things that count against and in favour of each of these options, and also of things that speak neither against nor in favour of these options. Sally thinks about Option A and writes down ‘Doing this would be [term]’.”
The results suggest that in contexts of self-reflection, when an individual is attempting to determine the best course of action, thick terms strongly count in favour of or against an option. Descriptive terms do not share this disposition in this context. It seems that philosophers have been right all along in their assumption that thick concepts have the potential to guide actions.
Building on Willemsen and Reuter (2020, 2021), Judith Martens and I also wondered how exactly reasons for actions are communicated. Again, there are at least two obvious options on the table. First, reasons for actions are communicated as a matter of the semantic meaning of thick concepts. Alternatively, uttering a thick term might simply conversationally implicate a reason for action. Judith Martens and I used a variation of the cancellability test to determine how reasons for actions are communicated. We asked participants to ‘Please imagine that Sally said the following sentence: “What Jim did last week was cruel, but by that, I am not saying that Jim should have acted in a different way.”’
To our surprise, and potentially that of many philosophers, we made two findings. First, the contradiction ratings for every single thick concept we tested were significantly below the neutral midpoint and even lower than the contradiction ratings for cancelled conversational implicatures. The action-related, reason-giving component seems to be only very loosely connected to a thick concept. Second, mirroring the findings of Willemsen and Reuter (2020, 2021), Judith Martens and I also found a significant polarity effect, such that negative terms received higher contradiction ratings.
Again, I do not pretend to have a sufficient grasp on what is going on here, and I believe that the journey into the exciting world of evaluative concepts has just begun. The reason I believe that this journey is worth travelling is that even in the first few steps, we have made exciting, surprising discoveries that suggest more questions and more answers.
* * * * * * * * * * * *
Commentary: Testing the Loaded Side of Language
I share Pascale’s enthusiasm for the new insights that an empirical approach to the philosophical study of language can offer, especially in the domain of what we may call loaded language, i.e. speech that does not only describe the world, but evaluates it. Within this broad field, there are entire areas of investigation that have been explored only relatively recently on theoretical grounds, let alone on experimental ones. As a matter of fact, the domain of loaded language encompasses not only moral discourse – on which Pascale’s post focuses – but also expressive speech, ranging from insults (jerk, bastard), interjections (shit!, fuck!), intensifiers (damn, fucking), to slurring terms, that is, derogatory words that target groups and individuals on the basis of their belonging to a certain category (think of racial and homophobic epithets, for instance). Many crucial questions arise around expressive discourse: what expressive speech is, how it works, how expressive content is encoded in language, what functions it fulfils with respect to the speaker and to their audience, in what relation it stands to morality (if any), when it should be censored (if ever).
In the last few years, philosophers and linguists have started experimenting on expressive language, whereas until recently only psychologists and cognitive scientists had done so (see among others Bowers and Pleydell-Pearce 2011, Fasoli et al. 2015). This has been a pivotal turn in the study of expressives, just as much as for the investigation of moral terms discussed by Pascale: so far, scholars have solely relied on their own intuitions, assuming they are a reliable source of information. But of course philosophers’ linguistic intuitions can diverge widely, are imbued with theoretical assumptions, and are ultimately a very biased source of information. In contrast, testing hundreds of untrained participants seems to be a more promising strategy to get a better picture of loaded language. Resorting to these empirical approaches has proved very interesting and fruitful for my own research on expressives, leading to surprising findings as to how the context affects the way in which slurs and insults are perceived (Cepollaro et al. 2019, Cepollaro et al. 2020).
However, having voiced enthusiasm for testing the loaded side of language, I feel like sounding a note of caution. When we experimentally investigate loaded language, we typically try to squeeze precise empirical predictions out of theoretical accounts. What we then find often doesn’t match any of those simplified predictions. Pascale mentioned how in two recent papers with Kevin Reuter (2020, 2021), they employed the cancellability paradigm in order to test semantic and pragmatic theories of thick terms. They found that the evaluative content of these expressions is – roughly speaking – harder to cancel than conversational implicatures (which goes against pragmatic views) and easier to cancel than semantically entailed contents (which speaks against semantic accounts). These findings are of great interest, but they leave room for some hermeneutical frustration nevertheless. In fact, an advocate of the semantic view can take these results as showing that the evaluative content of thick terms can’t be suspended like pragmatically implicated contents; they will then add that the reason why these evaluations are nevertheless easier to cancel than semantic entailments is that, in the absence of suitable non-evaluative counterparts, we are willing to force a non-literal reading of thick terms in utterances which would otherwise sound totally contradictory. A supporter of the pragmatic approach, on the other hand, can interpret these data as suggesting that the evaluative content of thick terms can be cancelled (it is much easier to suspend than semantic entailments), but since evaluations are so routinely associated with these expressions, it can be hard to get rid of them altogether. In other words, each theorist can stress the importance of a certain finding by taking it as the primary and reliable result, and then come up with a secondary mechanism that explains those results that fail to align with their favourite theoretical approach.
This is to say that these (and similar) studies should be seen as a preliminary exploration into a relatively unknown domain and – because of the subtleties of the matter – we should expect to often run into similar hermeneutical aporias. I’ve found myself in a similar situation too. In How Bad Is It to Report a Slur? (2019), together with psychologist Simone Sulpizio and philosopher Claudia Bianchi, we looked at how slurs are perceived in reported speech. Scholars disagree on whether a speaker who reports a slurring utterance is herself engaging in slurring (take an utterance like “My boss said that they aren’t going to hire a S” – where S is a slur, for instance a racial or homophobic epithet). This question is interesting for a bunch of reasons. First, it is a clue for understanding how slurs encode their pejorative content: is it semantically encoded or rather pragmatically conveyed? Second, whether slurs can be reported without being derogatory affects our online and offline language policies. Now, when armchair philosophers and linguists have examined their own intuitions, they came to diverging conclusions. According to some, when a speaker reports a slurring utterance, they – and not necessarily the reported speaker – are perceived as slurring. For these scholars, slurs should be banned not only from direct but also from reported speech (let’s call them prohibitionists; see Anderson and Lepore, 2013; Anderson, 2016). According to others, only the reported and not the reporting speaker is taken to be slurring; slurs don’t need to be banned from reported, but only from direct speech (let’s call them non-prohibitionists; see Schlenker 2007).
In order to shed some light on this debate between prohibitionists and non-prohibitionists, we asked participants to rate the offensiveness of utterances featuring slurs in two conditions: (i) direct speech of the form Y: ‘X is a S’, and (ii) reported speech of the form Z: ‘Y said that X is a S’, where X, Y, Z are proper names and S is a slur. We found that reported speech decreases the offensiveness of utterances featuring slurs, without entirely deleting it. Once again, these results do not exactly fit within any of the theories on the market. The prohibitionist could read these findings as showing that the pejorative content of slurs is ascribed to the reporting speaker and then appeal to a supplementary mechanism to explain why the derogatory power is nevertheless diminished: participants grant the possibility that the reporting speaker didn’t fully mean to slur and thus perceive slurs in reported speech as less derogatory than in direct speech. The non-prohibitionist, on the other hand, can take these very results as showing that the derogatory content of slurs is significantly diminished by report, and then appeal to a secondary mechanism to explain why it is not entirely cancelled: since competent speakers are supposed to know that slurs are tabooed to a certain extent, they are expected to avoid them in reported speech, or they will run the risk of sounding derogatory. Here comes the same pattern again: each theorist could in principle take one result as the primary and reliable finding and then come up with a secondary mechanism that explains those results that fail to align with their favourite approach.
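To see why such a result sits awkwardly between the two positions, it can help to express it as a single number: the share of direct-speech offensiveness that survives in reported speech. The sketch below is purely illustrative. The ratings are invented, the scale floor `FLOOR` (the lowest point of a hypothetical 1-to-7 offensiveness scale) is an assumption, and `retained_share` is a made-up helper, not a measure used in the study.

```python
from statistics import mean

# Hypothetical offensiveness ratings (higher = more offensive).
# Invented for illustration; these are NOT the study's data.
direct = [6, 7, 6, 7, 6]    # Y: 'X is a S'
reported = [4, 5, 4, 4, 5]  # Z: 'Y said that X is a S'
FLOOR = 1                   # assumed lowest scale point ("not offensive at all")


def retained_share(direct, reported, floor):
    """Share of direct-speech offensiveness (above the floor) kept under report."""
    return (mean(reported) - floor) / (mean(direct) - floor)


share = retained_share(direct, reported, FLOOR)

# Roughly: a share near 1 would fit the prohibitionist reading (report does
# not help), a share near 0 the non-prohibitionist reading (report removes
# the offence). A value strictly in between is the pattern described above.
if 0 < share < 1:
    verdict = "diminished but not deleted"
else:
    verdict = "fits one of the simple predictions"
print(verdict)
```

On this toy measure, each camp can still claim the intermediate value for itself by positing a secondary mechanism, which is exactly the interpretive difficulty at issue.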
Of course, conflicting interpretations are not entirely unconstrained: diverging readings of empirical results are typically not on a par, and any supplementary hypothesis concerning secondary mechanisms needs to be tested further and experimentally supported. However, these two cases of open-ended interpretation show that we should always keep in mind that, when we look into loaded language on empirical grounds, the matter at stake is so complex and full of subtleties that one study will neither prove nor disprove a theory by itself, but can at best suggest promising lines of investigation and further thought.
With this caveat, and many others, in mind: long live the experimental philosophy of (loaded) language!
* * * * * * * * * * * *
Anderson, L. 2016. “When Reporting Others Backfires.” In Indirect Reports and Pragmatics, edited by Alessandro Capone, Ferenc Kiefer, and Franco Lo Piparo, 253–64. Cham: Springer International Publishing.
Anderson, L., and E. Lepore. 2013. “Slurring words.” Noûs 47, no. 1: 25–48.
Bowers, J.S., and C.W. Pleydell-Pearce. 2011. “Swearing, euphemisms, and linguistic relativity.” PLoS ONE 6, no. 7: e22341.
Cepollaro, B., S. Sulpizio, and C. Bianchi. 2019. “How bad is it to report a slur? An empirical investigation.” Journal of Pragmatics 146: 32–42.
Cepollaro, B., F. Domaneschi, and I. Stojanovic. 2020. “When is it ok to call someone a jerk? An experimental investigation of expressives.” Synthese.
Fasoli, F., A. Maass, and A. Carnaghi. 2015. “Labelling and discrimination: Do homophobic epithets undermine fair distribution of resources?” British Journal of Social Psychology 54, no. 2: 383–93.
Reuter, K., L. Baumgartner, and P. Willemsen. ms. “Tracing Thick Concepts Through Corpora.”
Schlenker, P. 2007. “Expressive presuppositions.” Theoretical Linguistics 33, no. 2: 237–45.
Willemsen, P., and J. Martens. ms. “Do Thick Concepts Provide Reasons for Action?”
Willemsen, P., and K. Reuter. 2020. “Separability and the Effect of Valence.” In Proceedings of the 42nd Annual Conference of the Cognitive Science Society, edited by Denison, Mack, Xu, and Armstrong, 794–800.
Willemsen, P., and K. Reuter. 2021. “Separating the Evaluative from the Descriptive: An Empirical Study of Thick Concepts.” Thought: A Journal of Philosophy.
Williams, B. 1985. Ethics and the Limits of Philosophy. Cambridge, MA: Harvard University Press.