Cognitive Science of Philosophy Symposium: Idealized Models

Welcome to the Brains Blog’s Symposium series on the Cognitive Science of Philosophy. The aim of the series is to examine the use of diverse methods to generate philosophical insight. Each symposium consists of two parts. In the target post, a practitioner describes the method under discussion and explains why they find it fruitful. A commentator then responds to the target post and discusses the strengths and limitations of the method.

In this symposium, Hannah Rubin (University of Missouri) argues that highly idealized models have at least two important uses in policymaking. In her commentary, Dunja Šešelja (Ruhr University Bochum/Eindhoven University of Technology) discusses how we can establish a link between an idealized model and its intended target.

* * * * * * * * * * * *

Should we use idealized models in policymaking?

Hannah Rubin

* * * * * * * * * *

I know the rule is that the answer to questions in a title is supposed to be ‘no’, but I’m going to argue for ‘yes’. Idealized models can be very useful in our thinking about the likely efficacy and effects of enacting certain policies.

I mean ‘policy’ in a very broad sense: any sort of formalized action taken to achieve some goal(s). However, I will not attempt to talk broadly about the role of models in general, or even the general role of models in policymaking. What I will do is highlight two particularly useful things that highly simplified models can do: (1) provide counterexamples to empirical statements we might otherwise base policy decisions on and (2) point out potential pitfalls of policy proposals. (To be clear, these are not the only things accomplished by the models I’ll describe, and I don’t claim that the authors would necessarily agree their models play the role I outline.)

Counterexamples

Many policy-relevant claims are intuitively convincing. This doesn’t always mean that they are actually true. Idealized models are great at providing counterexamples to intuitively convincing policy reasoning. To see how, let’s take three policy-relevant claims we might be tempted to think are true, along with three models showing them to be false.

The way industry funding corrupts science is by corrupting researchers. This statement seems empirically plausible, and there are policy decisions based on it – requiring conflict of interest statements, ethics training, and so on. However, Holman and Bruner’s (2017) model shows that it is false. When industry funds researchers whose (legitimate, but industry-favorable) methods suit its interests, those researchers become more productive and train future researchers in their methods. Those methods then become overrepresented as time goes on, skewing research in favor of industry interests. Yet no individual researcher ever changes their methods based on industry funding; none of the researchers have been individually corrupted. So, to prevent industry influence, we need additional kinds of policies to counteract this biasing at the community level.
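To make this selection mechanism concrete, here is a minimal sketch in Python. It is a toy reconstruction of the idea rather than Holman and Bruner’s actual model: the training rates, the population cap, and the rule that industry funds only its favored method are all assumptions chosen purely for illustration.

```python
import random

# Toy sketch of selection on methods (an illustration, not Holman and Bruner's 2017 model).
# No researcher ever changes methods; industry funding simply makes researchers who already
# use the industry-favorable method more productive, so they train more students, and the
# method spreads through the community anyway.

random.seed(0)
GENERATIONS = 20
STUDENTS_IF_FUNDED = 3      # assumed: funded researchers train more students
STUDENTS_IF_UNFUNDED = 2
POP_CAP = 200               # community size is capped each generation

# start with a community split evenly between two legitimate methods
community = ["neutral"] * 50 + ["industry-favorable"] * 50

for _ in range(GENERATIONS):
    next_gen = []
    for method in community:
        funded = (method == "industry-favorable")      # industry funds only its favored method
        n_students = STUDENTS_IF_FUNDED if funded else STUDENTS_IF_UNFUNDED
        next_gen.extend([method] * n_students)         # students inherit their teacher's method
    community = random.sample(next_gen, min(POP_CAP, len(next_gen)))

share = community.count("industry-favorable") / len(community)
print(f"share using the industry-favorable method after {GENERATIONS} generations: {share:.0%}")
```

With these assumed numbers, the industry-favorable method ends up dominating the community even though not a single researcher ever switched methods or altered a result.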

More communication leads to better beliefs. Again, this seems totally plausible – more information and better communication should promote better inquiry. This might lead to policies to promote sharing of results or increase communication within large-scale collaborations. However, as Zollman (2007) shows, reducing communication can lead to more accurate beliefs. Limiting information flow can prevent a group from erroneously converging on a false belief. If there happens to be misleading evidence, which is always a possibility when agents get noisy data about the world, then having mechanisms to contain that misleading evidence and not share it widely will benefit the group in the long term. This doesn’t tell us exactly how to set up optimal communication structures, but it does caution against indiscriminately implementing policies aimed at increasing information flow.   
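For readers who want the mechanism spelled out, here is a minimal Python sketch in the spirit of Zollman’s model. It is a rough reconstruction with arbitrary parameter values (the true success rates, number of agents, rounds, and priors are all my assumptions), so it will not reproduce his quantitative results; it only illustrates how limiting communication changes the dynamics.

```python
import random

# A rough, minimal reconstruction in the spirit of Zollman (2007), not his exact model.
# Agents face two actions; action 1 is objectively (slightly) better. Each round, every
# agent takes the action it currently believes is better, then updates Beta beliefs on
# its own results and those of its network neighbors. All parameter values are assumed.

TRUE_RATES = (0.5, 0.55)   # success probabilities; action 1 is the better one
N_AGENTS = 10
TRIALS_PER_ROUND = 10
ROUNDS = 200

def neighbors(i, network):
    if network == "complete":                            # everyone shares with everyone
        return [j for j in range(N_AGENTS) if j != i]
    return [(i - 1) % N_AGENTS, (i + 1) % N_AGENTS]      # cycle: only two neighbors

def run(network):
    # beliefs[i][arm] = [alpha, beta] of a Beta distribution over that arm's success rate
    beliefs = [[[1 + random.random(), 1 + random.random()] for _ in range(2)]
               for _ in range(N_AGENTS)]
    for _ in range(ROUNDS):
        results = []
        for i in range(N_AGENTS):
            means = [a / (a + b) for a, b in beliefs[i]]
            arm = 0 if means[0] > means[1] else 1
            successes = sum(random.random() < TRUE_RATES[arm]
                            for _ in range(TRIALS_PER_ROUND))
            results.append((arm, successes))
        for i in range(N_AGENTS):                        # update on own and neighbors' data
            for j in [i] + neighbors(i, network):
                arm, successes = results[j]
                beliefs[i][arm][0] += successes
                beliefs[i][arm][1] += TRIALS_PER_ROUND - successes
    # did the whole community end up preferring the objectively better action?
    return all(b[1][0] / sum(b[1]) > b[0][0] / sum(b[0]) for b in beliefs)

for net in ("complete", "cycle"):
    wins = sum(run(net) for _ in range(100))
    print(f"{net:8s} network: community converged on the better action in {wins}/100 runs")
```

The comparison of the complete network with the cycle illustrates the qualitative point: when everyone immediately sees everyone else’s data, an early run of misleading results can pull the whole community onto the worse action before anyone gathers enough evidence to correct it, whereas sparser communication keeps parts of the community generating evidence about the better option for longer.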

Demographic diversity only benefits inquiry if it increases cognitive diversity. Research shows that cognitive diversity – e.g., different research methods, questions, background assumptions – is beneficial to inquiry. It is often assumed that demographic diversity, and the conflicts associated with it, is beneficial just in case it correlates with or causes this beneficial cognitive diversity. This assumption might limit diversity efforts to cases where they will have the most ‘impact’, e.g., enacting diversity-promoting policies only when target demographic groups are associated with different cognitive styles. However, as Fazelpour and Steel (2022) show, intergroup frictions in demographically diverse groups can improve the accuracy of inquiry, even in the absence of cognitive diversity. Lack of trust can slow the spread of information, preventing erroneous convergence on false beliefs. Additionally, people tend to conform to others in their demographic group, and the mere presence of people from other demographic groups reduces conformity pressure. Since conformity is generally bad for successful inquiry, demographic diversity improves inquiry by reducing conformity.

In all these cases, I’ve given an intuitive explanation for why the model’s results hold. Assuming I’ve done a decent job, this might lead one to object: “Yes, that counterexample makes sense, but why do we need the model?” Here’s how I think about it. When engaging with a counterexample, it’s reasonable for someone to ask how exactly that counterexample is supposed to work. Someone committed to the view that industry funding corrupts science exclusively via corrupting researchers won’t necessarily be convinced if you just tell them “No, actually, it can corrupt science by changing what research methods are dominant in the field.” They’ll want an explanation of how exactly that’s supposed to happen without corrupting individual researchers, which is just what the model provides. You might be able to demonstrate how it works without a model, but at a certain point in clearly spelling out the details of the mechanism you’ll have basically made a model anyway. So, you might as well analyze it.

Now, how seriously we take the counterexamples depends in large part on how plausible the assumptions of the model are and the robustness of the results. If demographic diversity (without cognitive diversity) only matters in very specific cases or in cases where we assume people act completely unlike actual real-world people, we might think that our counterexample is of little importance. It shows a universal statement is technically false, but we might still be justified in acting as if the universal were true for any policy we’re considering. So, we want some connection to the world.

Both Zollman’s (2007) and Fazelpour and Steel’s (2022) results match empirical findings. But this raises another question: if we have this empirical data, why then do I say the model provides the counterexample? In Zollman’s (2007) case, the relevant empirical data came later (e.g., Derex and Boyd, 2016). Fazelpour and Steel (2022) are explicitly motivated by existing empirical results, but their model is a clearer counterexample – in their model, we can be sure that there is no cognitive diversity to explain the beneficial effects of demographic diversity, while in any experiment on real people we can’t be sure. Further, while a match of model results with empirical results gives us a certain kind of confidence, it’s not necessary. If the model achieves what’s called face validation, where the agents ‘on face’ act like we think real people act (Wilensky and Rand 2015), we have reason to take the counterexample seriously and therefore to question the potential efficacy of a policy based on a shown-to-be-false claim.

Potential policy pitfalls

When enacting policy, especially policy relevant to regulating some kind of human behavior, there will always be uncertainty. Particularly pernicious are ‘unknown unknowns,’ things you don’t even know to look out for. Idealized models provide a low-cost way to know some of these unknowns and to point out ways in which policies might fail or backfire. This gives us an idea of what potential negative side effects to preemptively counteract, or at least what data is important to collect to check for new negative effects.

For example, some propose policies to eliminate the current journal peer review process and switch to a system of posting papers online to an archive, in effect ‘publishing’ them, while letting peer review happen afterwards. I’ve used an idealized model to argue that this policy increases the importance of social positioning relative to our current system and, as a result, will likely increase disparities in impact and visibility, e.g., along gender and race lines (Rubin 2022). Heesen and Bright (2021) discussed this possibility and concluded it was highly speculative; the model makes it less speculative by demonstrating the existence of the following feedback loop. Social positioning depends, in part, on previous impact, so researchers initially on the peripheries are less likely to have their work seen and engaged with. They’ll then accumulate less prestige and impact, which affects their social positioning, creating a runaway effect where they are pushed more and more to the peripheries over time. Further, this effect would be hard to detect before fully implementing the policy. Beta testing limited to certain journals wouldn’t create the feedback loop, as there would be other quality journals that researchers on the peripheries could submit to. In this case, at least, I think the model gives us what empirical testing can’t.
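To see how quickly such a loop can lock in, consider a deliberately crude sketch: a generic rich-get-richer dynamic with assumed numbers, not the model analyzed in Rubin (2022). Everyone’s work is of identical quality, attention is limited and allocated by current visibility, and engagement feeds back into visibility.

```python
import random

# Toy rich-get-richer sketch of the feedback loop described above (assumed numbers,
# not the model in Rubin 2022). Work is of identical quality; readers only engage with
# the currently most visible researchers, and engagement feeds back into visibility.

random.seed(0)
N_RESEARCHERS = 50
TOP_K = 10          # attention is limited: only the 10 most visible get read each round
ROUNDS = 100

# initial visibility differs only slightly and at random (e.g., by social positioning)
visibility = [1.0 + random.uniform(0, 0.1) for _ in range(N_RESEARCHERS)]
initial_top = set(sorted(range(N_RESEARCHERS), key=lambda i: visibility[i])[-TOP_K:])

for _ in range(ROUNDS):
    current_top = sorted(range(N_RESEARCHERS), key=lambda i: visibility[i])[-TOP_K:]
    for i in current_top:
        visibility[i] += 1.0    # engagement (reads, citations) accrues only to the visible

final_top = set(sorted(range(N_RESEARCHERS), key=lambda i: visibility[i])[-TOP_K:])
print(f"overlap between initial and final top {TOP_K}: {len(initial_top & final_top)}/{TOP_K}")
print(f"visibility range: min = {min(visibility):.1f}, max = {max(visibility):.1f}")
```

Because attention goes only to whoever is currently most visible, the initial ranking, which has nothing to do with quality, never changes, and the gap between the most visible researchers and everyone else keeps growing.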

Additionally, policies aimed at addressing inequities might fail to properly address underlying social dynamics. Let’s say we want to increase the diversity of research teams. So, we might, quite intuitively, enact a policy to give grants to people who put together a diverse research team. However, if the lack of diversity is partly due to people leaving inequitable collaborations (as seems likely, since there are persistent inequities across demographic lines), this policy encourages people to enter, or stay in, those inequitable collaborations. The well-intentioned diversity policy then actually increases inequity. And since inequity is a cause of the lack of diversity, the policy can also backfire and decrease diversity in the long term. Knowing this, we might reconsider the grant-giving policy or take steps to counteract its negative consequences (Schneider, Rubin, and O’Connor 2022).

Policymakers may also fail to account for the consequences of partial compliance. For instance, if half of businesses adopt certain fair-hiring practices, will we see half of the expected benefits of those practices? As Dai et al. (2021) show, it depends on the practice adopted and how applicants react. For example, a policy might require that businesses’ hires reflect the demographics of the overall applicant pool across all businesses. This works fine if everyone follows the rules, but, under partial compliance, people from discriminated-against demographic groups may strategically apply to compliant companies. Compliant companies then hire members of that group in proportion to their share of the overall applicant pool, which is lower than their now-outsized share of the compliant companies’ own applicant pools; relative to how many of them apply, they are hired at a lower rate than other applicants to those companies. Meanwhile, non-compliant companies can hire them at an even lower rate. So, overall, the benefit (the gain in fairness) is less than the percent compliance would suggest. Other policies may fare better.
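A toy calculation helps unpack this. The numbers below are invented for illustration and are not taken from Dai et al. (2021); they only make the reasoning in the previous paragraph concrete.

```python
# Toy arithmetic for the partial-compliance scenario above (assumed numbers, not the
# model in Dai et al. 2021). Group B is 30% of the overall applicant pool. Compliant
# firms hire group B in proportion to that overall 30%, but strategic application makes
# group B 50% of compliant firms' own applicant pools.

applicants_per_firm, hires_per_firm = 100, 10    # stylized firm size
overall_share_B = 0.30
share_B_at_compliant = 0.50                      # assumed strategic sorting

# Compliant firm: 30% of its 10 hires go to group B = 3 hires, out of 50 group-B applicants.
rate_B = (hires_per_firm * overall_share_B) / (applicants_per_firm * share_B_at_compliant)
rate_others = (hires_per_firm * (1 - overall_share_B)) / (applicants_per_firm * (1 - share_B_at_compliant))

print(f"per-applicant hire rate at a compliant firm: group B {rate_B:.0%}, others {rate_others:.0%}")
# -> group B 6%, others 14%: even a 'compliant' firm gives group-B applicants worse odds,
#    and non-compliant firms can do still worse, so the aggregate gain in fairness falls
#    short of what the compliance rate alone would suggest.
```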

Again, assuming I’ve done an adequate job of explaining the results of these models, you might object: why do we need the model? Why can’t you just point out the potential pitfalls? It is impossible to consider all the ways a policy might go wrong. We need some reason to take these possibilities seriously before we can say we ought to account for them in policy decisions. As with the counterexamples, providing a model with plausible assumptions regarding how people act (at least achieving face validation) gives us reason to take these potential pitfalls seriously. And again, we might not need the models to see the possibilities, but the models add something extra in terms of clearly demonstrating how and when we might fall into the pit.

We shouldn’t base policy decisions on highly simplified models alone. However, simple models are used to advocate for policies, especially in cases where it’s hard to acquire empirical data. For example, Hong and Page’s (2004) model has been used as evidence for diversity policies in a variety of contexts, and Strevens (2003) uses a simple model to argue for maintaining the priority rule in science. This makes it important to spot pitfalls or a lack of robustness within the model before large-scale change happens. Simple models can also be used to argue that other simple models are not good evidence for the effectiveness of these policies, e.g., Grim et al. (2019) in response to the first example and Rubin and Schneider (2021) in response to the second.

I don’t think that simple models are unique in either of these roles. Basing policy on empirical studies can be problematic if those studies fail to control for important factors or if they aren’t transportable across contexts. We can use empirical studies to critique other empirical studies just as we can use simple models to critique other simple models, or simple models to critique empirical studies, or whatever. Pointing out issues with past efforts is just part of how science works! We should use all the tools available to us whenever we can, and when it comes to policymaking, idealized models can be helpful tools.

* * * * * * * * * * * *

Commentary

Dunja Šešelja

* * * * * * * * * *

In her thought-provoking blog post, Hannah gives an insightful and comprehensive overview of how models can provide counterexamples to policy proposals and indicate their potential pitfalls. Against the backdrop of models also being used as evidence for policy proposals, Hannah highlights several cases in which subsequent studies cast doubt on their usefulness. In light of these examples one may wonder: how can we evaluate whether a model is useful for policymaking? How can we decide whether to take its result seriously, for the purposes of supporting or criticizing a given policy measure? Modelers often appeal to the robustness of results or to their coherence with empirical findings, but does either of these procedures suffice for taking a result seriously, and if so, why? I will use this commentary to address these issues. While Hannah highlighted specific ways in which highly idealized models can be useful for discussions on policy, I will elaborate on the criteria we can use to assess whether to take them seriously.

Let’s start with Hong and Page’s (2004) model, mentioned in Hannah’s post. The result of this model, often summarized as the slogan ‘diversity trumps ability’, has been used to support policies aimed at prioritizing cognitive diversity over expertise (for an overview of such applications see Grim et al., 2019). However, Hong and Page’s result doesn’t necessarily hold once more realistic assumptions about expertise are added to the model (e.g. Grim et al., 2019, Reijula and Kuorikoski, 2021). In other words, the result failed to be robust under certain changes in structural assumptions about the target phenomenon – in this case, the notion of expertise and how experts are represented in the model. This case nicely illustrates why robustness analysis is important before we take the modeling results seriously as applicable across various contexts.

But if a modeling result does turn out to be robust, should we then take it seriously for the purposes of policy discussions? Not necessarily. Take, for example, Schelling’s checkerboard model of social segregation, which shows how individuals’ preferences about their nearest neighbors can lead to segregated communities (Schelling, 1971, 1978). The model was shown to be highly robust under changes in both parameter values and structural properties of individual preferences (Muldoon et al., 2012). At the same time, Schelling’s model is a ‘minimal model’ representing a specific causal mechanism in isolation from other factors: it examines the process of segregation emerging from individual preferences, in the absence of other potential causes of segregation. As commonly argued, it demonstrates a possible causal mechanism of segregation and thereby provides a how-possibly explanation of this phenomenon (Aydinonat, 2007; Verreault-Julien, 2019). In contrast, real-world segregation may well be caused and better explained by, for instance, institutional racism. Indeed, it is precisely the absence of well-understood causes of segregation (such as institutional racism) from Schelling’s model that makes its result surprising and thus interesting. But for the very same reason, the model is not fit to provide policy suggestions in contexts involving various other causes of segregation, nor did Schelling develop it with this purpose in mind.
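For readers unfamiliar with the model, here is a minimal Schelling-style sketch; it is a standard textbook-style reconstruction with assumed parameter values (grid size, vacancy rate, a 30% same-group threshold), not Schelling’s original specification.

```python
import random

# Minimal Schelling-style checkerboard sketch (a common reconstruction, not Schelling's
# 1971 specification). Two groups live on a grid; an agent is unhappy if fewer than
# THRESHOLD of its occupied neighboring cells hold agents of its own group, and unhappy
# agents move to a randomly chosen empty cell.

random.seed(0)
SIZE = 20
THRESHOLD = 0.3     # a mild preference: happy with as few as 30% same-group neighbors
STEPS = 50

def occupied_neighbors(grid, x, y):
    cells = [grid[(x + dx) % SIZE][(y + dy) % SIZE]
             for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return [c for c in cells if c is not None]

def unhappy(grid, x, y):
    nbrs = occupied_neighbors(grid, x, y)
    return bool(nbrs) and sum(n == grid[x][y] for n in nbrs) / len(nbrs) < THRESHOLD

def same_group_share(grid):
    """Average, over all agents, of the share of same-group agents among occupied neighbors."""
    scores = []
    for x in range(SIZE):
        for y in range(SIZE):
            if grid[x][y] is None:
                continue
            nbrs = occupied_neighbors(grid, x, y)
            if nbrs:
                scores.append(sum(n == grid[x][y] for n in nbrs) / len(nbrs))
    return sum(scores) / len(scores)

# 45% group A, 45% group B, 10% empty cells, placed at random
grid = [[random.choices(["A", "B", None], weights=[0.45, 0.45, 0.10])[0]
         for _ in range(SIZE)] for _ in range(SIZE)]
print(f"initial same-group neighbor share: {same_group_share(grid):.2f}")

for _ in range(STEPS):
    movers = [(x, y) for x in range(SIZE) for y in range(SIZE)
              if grid[x][y] is not None and unhappy(grid, x, y)]
    empties = [(x, y) for x in range(SIZE) for y in range(SIZE) if grid[x][y] is None]
    random.shuffle(movers)
    for (x, y) in movers:
        if not empties:
            break
        ex, ey = empties.pop(random.randrange(len(empties)))
        grid[ex][ey], grid[x][y] = grid[x][y], None   # move to the empty cell
        empties.append((x, y))                        # the old cell is now empty

print(f"final same-group neighbor share:   {same_group_share(grid):.2f}")
```

Even with the mild 30% threshold, the same-group neighbor share typically ends up well above its initial, near-random level, which is exactly the surprising part of Schelling’s result.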

So while Hong and Page’s model fails to adequately represent the intended target scenario, Schelling’s model represents a target that may not be relevant for policymaking. In both cases, the link between the model and the target relevant to policy issues has not been successfully established. These cases illustrate a point raised by Sullivan (2022): the link between a model and its intended target is uncertain, and to reduce this link uncertainty, we have to empirically validate the model. So what does empirical validation of highly idealized models entail?

Let’s start by clarifying the role of robustness analysis (RA) in this process. The primary purpose of RA is to help us understand the conditions under which a modeling result holds in the model-world. For instance, in the case of computer simulations in the form of agent-based models, we can examine whether the result remains stable when we vary the number of agents within the modeled community. Similar checks can be performed by altering the values of other parameters in the model. As we saw above in the case of Hong and Page’s model, RA can also be used to assess whether the result holds when specific structural assumptions governing agent behavior are modified. Moreover, RA can involve developing models based on different representational frameworks, allowing for an investigation into whether the obtained result is contingent on specific idealizations inherent to each framework.
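In computational practice, this kind of RA often amounts to a systematic sweep over parameter settings. The sketch below is purely schematic: `run_model` is a placeholder for whatever simulation is under analysis (the rule inside it is a trivial stand-in so the snippet runs), and the parameter grid is invented.

```python
from itertools import product

# Schematic robustness check: sweep a grid of parameter settings and record whether the
# qualitative result of interest still holds under each of them.

def run_model(n_agents: int, noise: float, network: str) -> bool:
    """Placeholder for the model under analysis; returns True if the result of interest
    obtains under these settings. The rule below is a trivial stand-in, not a real model."""
    return noise < 0.10 or network == "cycle"

grid = {
    "n_agents": [10, 50, 100],
    "noise": [0.01, 0.05, 0.20],
    "network": ["complete", "cycle"],
}

for combo in product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    holds = run_model(**params)
    print(params, "-> result holds" if holds else "-> result FAILS")
```

Empirically guided RA, discussed below, then amounts to restricting such sweeps to parameter values and mechanisms that are plausible in light of what we know about the target.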

While RA can thus aid in understanding a range of modeling assumptions under which the result holds, it alone is insufficient to establish the link between the model and a specific real-world target. To see why, consider a modeling result obtained through multiple models that employ different assumptions and representational frameworks. It is still possible that all of these models incorporate one or more unrealistic assumptions, which are necessary for obtaining the given result but have not been scrutinized through RA. These assumptions may not align with the intended empirical target, yet their influence on the model’s outcome remains unexplored (for an overview of the discussion on the evidential value of RA see Houkes et al., 2023).

To address this issue, RA needs to be complemented with an empirical embedding of the model. This entails conducting RA by focusing on the stability of the results under assumptions that are informed by empirical knowledge of the target. In this way we can adjust the model to adequately capture relevant aspects of the real-world phenomenon. One way to do this is by adjusting parameter values of the model based on numerical information obtained empirically about the target (Boero and Squazzoni, 2005). For instance, some models of scientific communities can be calibrated using bibliometric data (Martini and Pinto, 2016, Harnagel, 2018). Another way is to adjust specific mechanisms in the model so that they better reflect our empirical knowledge about the target. For example, in simulations of traffic flow in a city, mechanisms representing driver behavior can be adjusted based on empirical knowledge of how drivers respond to congestion.

Besides using empirically guided RA to gather evidence linking the model to an empirical target, we can also conduct so-called empirical output validation (Gräbner, 2018). The idea of this procedure is to check whether the result of the model is supported by empirical studies, that is, whether it predicts their findings. For instance, Mohseni et al. (2021) conducted an experimental study that provided empirical support for the result of Bruner’s (2019) model. The study found evidence that minority members can become disadvantaged merely due to the smaller size of their group, as originally proposed in Bruner’s model. The suite of such methods – from empirically guided RA to the replication of results by means of empirical studies – is known in the literature as empirical validation of models (I discuss this in more detail in Šešelja, 2022).

But if highly idealized models need to be empirically validated before we can take them seriously, doesn’t that sound hopelessly demanding? And what shall we make of numerous models that have not passed any empirical checks? My take is that models which have only passed a basic plausibility check (referred to by Hannah as ‘face validation’) can still provide hypotheses that initiate discussions on policy measures. While we may not have sufficient grounds to fully endorse such results, they can serve as motivation for further investigation, including the use of other models and empirical studies. In this way, models that haven’t undergone empirical validation can still offer hypotheses worthy of further pursuit.

This also means that to increase the relevance of highly idealized models for policy debates, it is important to encourage their further investigation. This includes conducting RA by means of additional iterations of the initial model, RA through models employing different representational frameworks, incorporating empirical guidance into the RA, and conducting empirical studies to examine the presence of similar findings in real-world contexts.

To conclude, highly idealized models can indeed stimulate discussions on policy measures, but it is important to exercise caution when interpreting their results. It is also worth noting that providing suggestions for policy guidance is only one function of models. As Schelling’s model illustrates, highly idealized models can provide fascinating insights even when the direct connection to policy discussions may not be immediately apparent.

(Many thanks to Christian Straßer for valuable discussions that informed this commentary.)

* * * * * * * * * * *

* * * * * * * * * * * *

References

* * * * * * * * * *

Aydinonat, N Emrah (2007). “Models, conjectures and exploration: An analysis of Schelling’s checkerboard model of residential segregation”. In: Journal of Economic Methodology 14.4, pp. 429–454.

Boero, Riccardo and Flaminio Squazzoni (2005). “Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science”. In: Journal of Artificial Societies and Social Simulation 8.4.

Bruner, Justin P (2019). “Minority (dis) advantage in population games”. In: Synthese 196.1, pp. 413–427.

Dai, J., Fazelpour, S., & Lipton, Z. (2021). “Fair machine learning under partial compliance”. In: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 55-65.

Derex, M., & Boyd, R. (2016). “Partial connectivity increases cultural accumulation within groups”. In: Proceedings of the National Academy of Sciences, 113(11), pp. 2982-2987.

Fazelpour, S., & Steel, D. (2022). “Diversity, trust, and conformity: A simulation study”. In: Philosophy of Science, 89(2), pp. 209-231.

Gräbner, Claudius (2018). “How to relate models to reality? An epistemological framework for the validation and verification of computational models”. In: Journal of Artificial Societies and Social Simulation 21.3, p. 8. issn: 1460-7425. doi: 10.18564/jasss.3772. url: http://jasss.soc.surrey.ac.uk/21/3/8.html.

Grim, Patrick, Daniel J Singer, Aaron Bramson, Bennett Holman, Sean McGeehan, and William J Berger (2019). “Diversity, ability, and expertise in epistemic communities”. In: Philosophy of Science 86.1, pp. 98–123.

Harnagel, Audrey (2018). “A Mid-Level Approach to Modeling Scientific Communities”. In: Studies in History and Philosophy of Science.

Heesen, R., & Bright, L. K. (2021). “Is peer review a good idea?” In: The British Journal for the Philosophy of Science.

Holman, B., & Bruner, J. (2017). “Experimentation by industrial selection”. In: Philosophy of Science, 84(5), pp. 1008-1019.

Hong, Lu and Scott E Page (2004). “Groups of diverse problem solvers can outperform groups of high-ability problem solvers”. In: Proceedings of the National Academy of Sciences 101.46, pp. 16385–16389.

Houkes, Wybo, Dunja Šešelja, and Krist Vaesen (2023). “Robustness analysis”. In: The Routledge Handbook of Philosophy of Scientific Modeling. Ed. by Tarja Knuuttila, Natalia Carrillo, and Rami Koskinen. Accepted for publication; preprint available online. url: http://philsci-archive.pitt.edu/22010/.

Martini, Carlo and Manuela Fernández Pinto (2016). “Modeling the social organization of science”. In: European Journal for Philosophy of Science, pp. 1–18.

Mohseni, Aydin, Cailin O’Connor, and Hannah Rubin (2021). “On the emergence of minority disadvantage: testing the cultural Red King hypothesis”. In: Synthese 198.6, pp. 5599–5621.

Muldoon, Ryan, Tony Smith, and Michael Weisberg (2012). “Segregation that no one seeks”. In: Philosophy of Science 79.1, pp. 38–62.

Reijula, Samuli and Jaakko Kuorikoski (2021). “The diversity-ability tradeoff in scientific problem solving”. In: Philosophy of Science 88.5, pp. 894–905.

Rubin, H. (2022). “Structural causes of citation gaps”. In: Philosophical Studies, 179(7), pp. 2323-2345.

Rubin, H., & Schneider, M. D. (2021). “Priority and privilege in scientific discovery”. In: Studies in History and Philosophy of Science Part A, 89, pp. 202-211.

Schelling, Thomas C (1971). “Dynamic models of segregation”. In: Journal of mathematical sociology 1.2, pp. 143–186.

——— (1978). Micromotives and macrobehavior. W. W. Norton & Company.

Schneider, M. D., Rubin, H., & O’Connor, C. (2022). “Promoting diverse collaborations”. In: The Dynamics of Science: Computational Frontiers in History and Philosophy of Science, edited by Grant Ramsey and Andreas De Block, pp. 54-72.

Šešelja, Dunja (2022). “What Kind of Explanations Do We Get from Agent-Based Models of Scientific Inquiry?” In: Logic, Methodology and Philosophy of Science and Technology. Proceedings of the Sixteenth International Congress. Ed. by Hasok Chang, Benedikt Löwe, Hanne Andersen, Tomáš Marvan, and Ivo Pezlar. Bridging Across Academic Cultures. Rickmansworth: College Publications.

Sullivan, Emily (2022). “Understanding from machine learning models”. In: The British Journal for the Philosophy of Science, 73(1), pp. 109-33.

Strevens, M. (2003). “The role of the priority rule in science”. In: The Journal of Philosophy, 100(2), pp. 55-79.

Verreault-Julien, Philippe (2019). “How could models possibly provide how-possibly explanations?” In: Studies in History and Philosophy of Science Part A 73, pp. 22–33.

Wilensky, U., & Rand, W. (2015). An introduction to agent-based modeling: modeling natural, social, and engineered complex systems with NetLogo. MIT Press.

Zollman, K. J. (2007). “The communication structure of epistemic communities”. In: Philosophy of science, 74(5), pp. 574-587.
