Going Against the Grain of Proportionality

In Chapter 1 of Causation with a Human Face, James Woodward articulates the metaphilosophical outlook on causation that gives his book its title. He tells us that his aim is to articulate a normative theory about how human beings ought to engage in causal reasoning. However, he believes that when building such a theory, one must also account for empirical data describing how human beings actually reason about causation. The human mind, he argues, is finely-tuned by millennia of evolution and years of learning, such that data showing how we actually reason about causation should serve as a good guide to how to be an effective causal reasoner, and hence to how we ought to reason about causation. In the background is a metaphilosophy according to which the sole good-making feature of an account of causal reasoning is its usefulness in helping human beings achieve their goals as agents. That is, our actual goals as agents define a functional standard according to which we assess normative accounts of causal reasoning.  

We believe that Woodward is correct with respect to what a theory of causation and causal reasoning ought to do. What we want to explore in this post is the normative issue of how we ought to choose causal variables for use in descriptive and explanatory reasoning, with a particular focus on the “grain problem” in causal reasoning.

The grain problem concerns how we choose an appropriate level of granularity for the variables that we use in our causal models of the world. We posit that human beings often choose to represent the causal structure of the world using coarse-grained generalizations such as “smoking causes lung cancer,” rather than more fine-grained generalizations such as “smoking at least 100,000 high-tar cigarettes over the course of twenty years causes lung cancer.” Choosing the former generalization over the latter often involves a process of lossy compression. Even though the latter generalization allows for the representation of more dependencies between cause and effect than the former, we choose the former, more compressed generalization, plausibly due to its simplicity, tractability, or generalizability.

As Woodward himself puts it, “it is a striking empirical fact that the difference-making features cited in many lower-level theories sometimes can […] be absorbed into variables that figure in upper-level theories without a significant loss of difference-making information with respect to many of the effects explained by those upper-level theories” (emphasis ours) (p. 383). This suggests that insignificant difference-making information can be elided from our causal model of some system of interest. Indeed, this is exactly what happens in cases of lossy compression: a certain amount of information about difference-making relationships is judged to be insignificant when we represent the causal dynamics of our environment.

The question we set out to answer here is the following: are there cases in which the normative theory of causal reasoning laid out in Woodward’s book recommends lossy compression when choosing causal variables?

The place to look for an answer to this question is in Chapter 8, where Woodward discusses his principle of proportionality. For Woodward, proportionality is the criterion that humans should use when they choose the optimal level of granularity for a causal variable, given some effect. To briefly summarize what is presented with a high degree of technicality in the chapter, for a given effect variable E, the choice of a causal variable V from a set of alternative variables is proportional to the extent that, ceteris paribus, the variable V “correctly represents more rather than fewer of the dependency relations concerning the effect or explanandum that are present, up to the point at which we have specified dependency relations governing all possible values of E” (p. 372). Importantly, the alternative variables should be understood as all representing the same type of phenomenon at different levels of granularity (e.g., smoking cigarettes, smoking high-tar cigarettes, smoking high-tar Marlboro cigarettes, and so on).

When the dynamics of the system under study are such that each variable in the set V is deterministically related to E, applying the proportionality criterion is straightforward. We select the variable V which, to the greatest extent possible, allows us to intervene on V so as to induce E to take any of its values. To illustrate, if E is a variable denoting whether a lightbulb is on or off, and there is a light switch that perfectly determines whether the lightbulb is on or off, then a maximally proportionate choice of causal variable V would be a binary variable denoting whether the switch is in the on or off position. Importantly, if there were also a dial with ten positions, five of which ensured that the bulb was on and five of which ensured that the bulb was off, then a choice of causal variable V denoting the position of this dial would also be a maximally proportional choice with respect to E. Even though there is some redundancy in the system, both variables allow for maximal control over whether the lightbulb is on or off.[1]
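To make the deterministic case concrete, here is a minimal sketch of the test it implies (the variable names and toy mappings below are ours, purely for illustration): a candidate variable passes if and only if interventions on it can induce every value of E.

```python
def covers_all_effects(value_map, effect_values):
    """A candidate causal variable is maximally proportional, in the
    deterministic sense sketched above, iff interventions on it can induce
    every value of E, i.e. the map from its values to E's values is onto."""
    return set(value_map.values()) == set(effect_values)

E_VALUES = {"on", "off"}

# Binary switch: each position determines the bulb's state.
switch = {"up": "on", "down": "off"}

# Ten-position dial: five positions turn the bulb on, five turn it off.
dial = {i: ("on" if i < 5 else "off") for i in range(10)}

# A hypothetical "half switch" that can only turn the bulb on fails the test.
half_switch = {"up": "on", "middle": "on"}

print(covers_all_effects(switch, E_VALUES))       # True
print(covers_all_effects(dial, E_VALUES))         # True (redundant, but still maximal)
print(covers_all_effects(half_switch, E_VALUES))  # False
```

Both the switch and the ten-position dial pass, matching the point above that redundancy does not diminish proportionality in the deterministic case; a variable that could only force the bulb on would fail.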

Note that this application of the proportionality criterion militates against lossy compression. When we engage in lossy compression, we arrive at a coarse-grained causal model of some system by tolerating some reduction in the amount of information that can be passed between cause and effect, compared to a more fine-grained representation of the same system.

What about when the dynamics governing the system of interest are not deterministic? Woodward has far less to say about this case than the deterministic one, even though his theory of causation is intended primarily as an analysis of statistical causal claims such as “smoking causes lung cancer.” When the causal variable V and effect variable E are both binary, we can make a proportionate choice of causal variable by choosing the variable that maximizes the following quantity:
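Put roughly (this is our reconstruction of the idea rather than a verbatim transcription of the formula on p. 377), the quantity is the difference that switching between the two possible interventions on V makes to the probability of e:

$$\bigl|\, P(E = e \mid do(V = v)) \;-\; P(E = e \mid do(V = \sim v)) \,\bigr|$$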

where V has the possible values {v, ~v}, and where e is either of the two values of E (p. 377). Here too, Woodward is effectively recommending against lossy compression, as this quantity measures how informative interventions on V are with respect to the value of E.

Woodward says nothing about how to measure proportionality in probabilistic systems with non-binary variables. However, Pocheville et al. (2017) propose the following measure, based on the information-theoretic measure of mutual information:
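In rough form (again our reconstruction rather than a verbatim transcription of their formula), the measure is the mutual information between interventions on V and the value of E:

$$I\bigl(E;\, do(V)\bigr) \;=\; \sum_{j=1}^{m} \sum_{i=1}^{n} p\bigl(do(v_j)\bigr)\, p\bigl(e_i \mid do(v_j)\bigr)\, \log \frac{p\bigl(e_i \mid do(v_j)\bigr)}{\sum_{k=1}^{m} p\bigl(do(v_k)\bigr)\, p\bigl(e_i \mid do(v_k)\bigr)}$$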

where E has n total values and V has m total values.[2] If we take this measure on board, then once again it seems that the proportionality criterion recommends against lossy compression, since proportionality is formalized as the average amount of information that interventions on the causal variable transmit about the value of the effect variable.
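To see how a measure of this kind penalizes lossy compression, here is a minimal sketch (the probabilities and variable names are ours, purely illustrative) that computes the mutual information between interventions and effect for a fine-grained smoking variable that distinguishes cigarette brands, and for its binary coarsening:

```python
import math

def causal_mutual_info(p_do, p_e_given_do):
    """Mutual information between interventions on V and the effect E,
    given a distribution p_do over interventions and conditional
    distributions p(E | do(v)) for each value v of V."""
    effects = next(iter(p_e_given_do.values())).keys()
    # Marginal distribution of E induced by the intervention distribution.
    p_e = {e: sum(p_do[v] * p_e_given_do[v][e] for v in p_do) for e in effects}
    mi = 0.0
    for v in p_do:
        for e in effects:
            joint = p_do[v] * p_e_given_do[v][e]
            if joint > 0:
                mi += joint * math.log2(p_e_given_do[v][e] / p_e[e])
    return mi

# Toy numbers (ours): brand makes a small difference to lung-cancer risk.
p_do_fine = {"marlboro": 0.25, "other_brand": 0.25, "non_smoker": 0.5}
p_cancer_fine = {
    "marlboro":    {"cancer": 0.17, "no_cancer": 0.83},
    "other_brand": {"cancer": 0.15, "no_cancer": 0.85},
    "non_smoker":  {"cancer": 0.01, "no_cancer": 0.99},
}

# Lossy coarsening: collapse the two brands into a single "smoker" value.
p_do_coarse = {"smoker": 0.5, "non_smoker": 0.5}
p_cancer_coarse = {
    "smoker":     {"cancer": 0.16, "no_cancer": 0.84},
    "non_smoker": {"cancer": 0.01, "no_cancer": 0.99},
}

print(causal_mutual_info(p_do_fine, p_cancer_fine))     # slightly larger
print(causal_mutual_info(p_do_coarse, p_cancer_coarse)) # slightly smaller
```

With the intervention distributions matched as above, the coarse-grained score can never exceed the fine-grained one, and it falls strictly below it whenever the collapsed values make different differences to E.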

We find ourselves at a juncture. On the one hand, we posit that humans do engage in lossy compression when representing the causal structure of their environment, and some of Woodward’s comments in this book seem to agree with this posit (e.g., his comment, quoted above, on the ability of higher-level theories to retain significant information from lower-level theories). Moreover, Woodward is clear that he takes the actual cognitive practices of human beings to provide relevant data for constructing a normative theory of causation. Taken together, these two claims appear to be in tension with the recommendation against lossy compression that is implicit in Woodward’s account of proportionality. Thus, it seems that we will have to either: i) say something about why, in this instance, the practice of lossy compression by actual agents is not a good guide to a normative theory of causal proportionality, ii) identify a different good-making feature of a causal explanation that is traded off with proportionality in cases of lossy compression, iii) argue that in fact, humans do not engage in lossy compression as part of causal reasoning, or iv) amend the account of proportionality to allow for lossy compression.

Let us begin by pursuing the first route. For Woodward, the reason why we ought to prefer proportional causal models is straightforward: proportional models identify “the level [of causal description] that is most informative about the conditions under which the effect will and will not occur” (p. 389). But this just invites the further question of what it is about the informativeness of a cause with respect to its effect that is valuable to human agents. Woodward’s answer here is that an informative causal relationship is valuable because it does “a better job at providing information that is associated with such distinctive aims of causal thinking as manipulation and control” (p. 373). While enhanced manipulation and control of the value of some variables is certainly desirable for human agents in many contexts, these desiderata can be traded off against other agential interests. Moreover, given agential interests, there may be increases in the informativeness of a cause with respect to its effect that have no value to an agent.

To illustrate, suppose that we knew that if a patient smoked 20 Marlboro cigarettes a day for ten years, then their probability of developing lung cancer was slightly higher than if they smoked 20 cigarettes of some other brand every day for the same time period. Under these conditions, the more proportional causal variable for an agent who cares about preventing lung cancer would be one that distinguishes between these two types of cigarette brands. But one can easily imagine an agent who says, despite the difference that brand makes to one’s probability of lung cancer, that no matter what, if one wants to avoid lung cancer, then one simply should not smoke. That is, for the sake of manipulation and control, they don’t care enough about the differences between cigarette brands to account for this difference when building a causal model of their environment. By Woodward’s own standards, such an agent would favor a binary causal variable with the values {smoker, non-smoker}, even though this is not consistent with Woodward’s proportionality criterion for variable choice, since it amounts to a case of lossy compression.

If we pursue the second route, and aim to find an alternative desideratum for a causal model that is satisfied at the expense of proportionality in cases of lossy compression, a natural candidate is simplicity. Compressed models are simpler models, in the sense that they use variables with fewer values. In cases of lossy compression, we may want to say that the loss in difference-making information between cause and effect diminishes the proportionality of the causal representation, but increases its simplicity, achieving an optimal balance. This is plausible, but there are indications that it is not the move Woodward would make. This kind of simplicity, he writes, is best understood as “descriptive simplicity,” which he contrasts with “the kind of simplicity that is supposedly relevant to choosing among different empirical alternatives” (p. 382, footnote 37). This suggests that Woodward does not take merely descriptive simplicity to be a genuine desideratum for causal representations, such that this option is not one he wishes to take.

Third, one could argue that humans do not, in fact, engage in lossy compression as part of causal reasoning. Given Woodward’s comments in this book, there is no indication that he would take this route, and, for our part, it does not seem especially plausible. But we also note that there are, to our knowledge, no empirical studies directly testing whether people engage in lossy compression when formulating type-level causal hypotheses or explanations.[3] This suggests an intriguing line of future research for cognitive science and experimental philosophy.

This leaves our final option, which is to hold that a normatively-grounded approach to the granularity problem in causal modelling should allow for lossy compression, and that Woodward’s proportionality criterion should be modified to allow for this. As a guide to how this could go, there is some work in the philosophy of science and machine learning literature on coarse-graining and variable choice in causal models that explicitly allows for lossy compression in accordance with agential interests; see for instance Brodu (2011), Icard and Goodman (2015), Kinney (2019), Beckers, Eberhardt and Halpern (2020), and Kinney and Watson (2020). Each of these approaches uses utility functions or stipulates an acceptable level of information loss when coarse-graining causal variables in order to arrive at a tractable causal model of some system of interest.

There is some evidence that Woodward wants to resist approaches like these. He writes that proportionality is a “pragmatic virtue in the sense that it has a means/ends rationale or justification, [but] it is not ‘pragmatic’ in the sense that it depends on the idiosyncrasies of particular people’s interests or other similarly ‘subjective’ factors” (p. 373). This suggests that if Woodward’s proportionality criterion is to be amended to allow for lossy compression, then the amount of loss that is to be tolerated will have to be fixed not by the contextual conditions of a given agent’s motivations for engaging in causal reasoning, but rather by some more universal or objective determinant of what counts as acceptable information loss. There are reasons for skepticism that such a sharp contrast between subjective and objective determinants of tolerable information loss can be drawn. Even in stock examples like the causal generalization “smoking causes lung cancer,” the value of the generalization derives from a subjective concern with preventing cancer that may not be universally shared.

To make this more concrete, consider the analysis offered in Kinney (2019). Imagine an expected-utility-maximizing agent with a utility function defined over all pairs consisting of an action from a set A and a value of an effect variable E. Given the relationship between a causal variable C and its effect E, we can ask such an agent how much they would be willing to pay to learn which intervention had been performed on C, before they choose an action from A. Suppose that there is some coarsening C′ of C such that there is greater proportionality between C and E (by the lights of Pocheville et al.’s measure) than there is between C′ and E. That is, the choice of C′ over C amounts to an instance of lossy compression. Nevertheless, there are scenarios in which such an agent would pay no more to learn which intervention had been performed on C than they would pay to learn which intervention had been performed on C′. For instance, an agent concerned with preventing lung cancer might pay just as much to learn about interventions that determine whether a patient smokes or does not as she would to learn about interventions in which the brand of cigarettes smoked is specified, even when some cigarettes are more carcinogenic than others. In these cases, it seems that this agent has a clear pragmatic justification for engaging in lossy compression, even though this goes against the recommendation of proportionality, and even though it seems to depend on the kinds of personal idiosyncrasies that Woodward wants to avoid when formulating his own solution to the grain problem.
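To give a feel for the kind of calculation involved, here is a toy sketch (the numbers, actions, and utility function are our own inventions, not Kinney’s actual model) of an agent deciding whether to buy insurance: because the small brand-specific difference in cancer risk never changes her optimal action, learning the fine-grained intervention is worth exactly as much to her as learning the coarse-grained one.

```python
def expected_utility(action, p_cancer, utility):
    """Expected utility of an action given the probability of developing cancer."""
    return (p_cancer * utility[(action, "cancer")]
            + (1 - p_cancer) * utility[(action, "no_cancer")])

def value_of_information(p_do, p_cancer_given_do, actions, utility):
    """How much an expected-utility maximizer would pay to learn which
    intervention was performed before choosing an action, relative to
    choosing on the basis of the prior over interventions alone."""
    informed = sum(
        p_do[v] * max(expected_utility(a, p_cancer_given_do[v], utility) for a in actions)
        for v in p_do
    )
    p_cancer_prior = sum(p_do[v] * p_cancer_given_do[v] for v in p_do)
    uninformed = max(expected_utility(a, p_cancer_prior, utility) for a in actions)
    return informed - uninformed

actions = ["buy_insurance", "skip_insurance"]
utility = {
    ("buy_insurance", "cancer"): -10, ("buy_insurance", "no_cancer"): -1,
    ("skip_insurance", "cancer"): -50, ("skip_insurance", "no_cancer"): 0,
}

# Fine-grained variable C distinguishes cigarette brands (toy probabilities, ours).
p_do_fine = {"marlboro": 0.25, "other_brand": 0.25, "non_smoker": 0.5}
p_cancer_fine = {"marlboro": 0.17, "other_brand": 0.15, "non_smoker": 0.01}

# Lossy coarsening C' collapses the two brands into a single "smoker" value.
p_do_coarse = {"smoker": 0.5, "non_smoker": 0.5}
p_cancer_coarse = {"smoker": 0.16, "non_smoker": 0.01}

print(value_of_information(p_do_fine, p_cancer_fine, actions, utility))
print(value_of_information(p_do_coarse, p_cancer_coarse, actions, utility))
# The two values agree (up to floating-point rounding): the brand-level detail
# never changes the optimal action, so it is worth nothing extra to this agent.
```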

We hope that this piece will be read mostly as an invitation (to both Woodward and other philosophers and psychologists) to say more about how humans coarse-grain their implicit causal models of their environment, and how this coarse-graining reflects agential goals. This process of coarse-graining is central to how we understand our environment (i.e., how we form what Sellars (1963) called our “manifest image” of the world) in a way that allows for systematic and robust causal inference. Thus, we take coarse-graining to be a crucial part of the conceptual link between human psychology and the theory and practice of science. And it is the investigation of just this conceptual link that forms the core of Woodward’s project, both in this book and throughout his rich catalogue.

References

Beckers, S., Eberhardt, F., & Halpern, J. Y. (2020, August). Approximate Causal Abstractions. In Uncertainty in Artificial Intelligence (pp. 606-615). PMLR.

Blanchard, T., Murray, D., & Lombrozo, T. (2021). Experiments on causal exclusion. Mind & Language.

Brodu, N. (2011). Reconstruction of epsilon-machines in predictive frameworks and decisional states. Advances in Complex Systems, 14(05), 761-794.

Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140(2), 168.

Fernbach, P. M., Darlow, A., & Sloman, S. A. (2010). Neglect of alternative causes in predictive but not diagnostic reasoning. Psychological Science, 21(3), 329-336.

Ho, M. K., Abel, D., Correa, C. G., Littman, M. L., Cohen, J. D., & Griffiths, T. L. (2021). Control of mental representations in human planning. arXiv preprint arXiv:2105.06948.

Icard, T., & Goodman, N. D. (2015, July). A Resource-Rational Approach to the Causal Frame Problem. In CogSci.

Johnson, S. G., & Keil, F. C. (2014). Causal inference and the hierarchical structure of experience. Journal of Experimental Psychology: General, 143(6), 2223.

Johnston, A. M., Sheskin, M., Johnson, S. G., & Keil, F. C. (2018). Preferences for explanation generality develop early in biology but not physics. Child Development, 89(4), 1110-1119.

Kinney, D. (2019). On the explanatory depth and pragmatic value of coarse-grained, probabilistic, causal explanations. Philosophy of Science, 86(1), 145-167.

Kinney, D., & Watson, D. (2020, February). Causal feature learning for utility-maximizing agents. In International Conference on Probabilistic Graphical Models (pp. 257-268). PMLR.

Pearl, J. (1994, July). A probabilistic calculus of actions. In Proceedings of the Tenth international conference on Uncertainty in artificial intelligence (pp. 454-462).

Pocheville, A., Griffiths, P. E., & Stotz, K. (2017, August). Comparing causes—an information-theoretic approach to specificity, proportionality and stability. In Proceedings of the 15th congress of logic, methodology and philosophy of science (pp. 93-102). London: College Publications.

Sellars, Wilfrid (ed.) (1963). Science, Perception and Reality. New York: Humanities Press.

Walker, C. M., Lombrozo, T., Williams, J. J., Rafferty, A. N., & Gopnik, A. (2017). Explaining constrains causal learning in childhood. Child Development, 88(1), 229-246.

Williams, J. J., Lombrozo, T., & Rehder, B. (2013). The hazards of explanation: Overgeneralization in the face of exceptions. Journal of Experimental Psychology: General, 142(4), 1006.

Woodward, J. (2010). Causation in biology: stability, specificity, and the choice of levels of explanation. Biology & Philosophy, 25(3), 287-318.


[1] In this respect, Woodward’s definition of proportionality revises (correctly, in our view) the definition of proportionality given in Woodward (2010), which held that maximally proportional causal relationships were characterized by a one-to-one function between the causal variable and the effect variable.

[2] Note here that while Pocheville et al.’s proposed measure of proportionality contains the term p(do(vj)), strictly speaking this term is not well-defined according to the standard theory of causal Bayesian networks. However, using “intervention variables” (see Pearl 1994), one can augment the causal Bayes nets framework so that this term is well-defined.

[3] It is worth noting that there is some indirect evidence that people might engage in lossy compression when pursuing causal reasoning. For instance, there is evidence that in estimating the probability of an effect given a cause, people operate with a causal model that is lossy in the sense that it omits alternative causes (Fernbach, Darlow, & Sloman, 2010, 2011). In choosing between two causal hypotheses, children sometimes favor the cause consistent with prior beliefs over the cause that better fits their observations, thus tolerating a loss in fidelity (Walker et al., 2017). For token-level causal claims, people sometimes favor claims at higher levels over those that are maximally specific (Blanchard, Murray, & Lombrozo, 2021), and favor explanations that cite higher-level generalizations (Johnston et al., 2018) or match the level of the explanandum (Johnson & Keil, 2014). In Williams et al. (2013), participants prompted to explain why category members belong to their categories rely, more so than those in control conditions, on something like a single causal variable that explains most (but not all) cases, rather than on an unwieldy disjunction that explains all cases. Finally, in Ho et al. (2021), participants who attempted to navigate an environment in order to achieve the goal of reaching an endpoint while avoiding obstacles were found to elide goal-irrelevant details in their representation of that environment. The authors of this study adopt an explicitly causal understanding of what counts as both a goal and an obstacle. This is not an exhaustive list of the evidence, but to our knowledge no prior work has directly investigated whether people exhibit lossy compression in solving the grain problem for type-level causal hypotheses or explanations.

Comments

  1. James Woodward

    I thank David Kinney and Tania Lombrozo (hereafter DT) for their very rich and stimulating commentary on my chapter on proportionality. In that chapter I had several goals. First, I wanted to formulate a notion of proportionality as a sort of ceteris paribus ideal, motivated by the idea that we (at least often) prefer more information about dependency relations to less. In this way, I hoped to account for our preferences in cases like Sophie the pigeon. Second, I wanted to develop the idea that, as explained in my response to Chris Hitchcock’s commentary, we often think we don’t have to go beyond what proportionality requires, so that leaving out certain kinds of information that does not add to our understanding of what an effect depends on is permissible. When I wrote, in a passage that DT quote, that “it is a striking empirical fact that the difference-making features cited in many lower-level theories sometimes can […] be absorbed into variables that figure in upper-level theories without a significant loss of difference-making information with respect to many of the effects explained by those upper-level theories”, it was this second possibility that I was concerned with. I was trying to make the point that it is sometimes possible to pretty much fully preserve lower-level difference-making relations in more coarse-grained upper-level theories. For example, as discussed previously, virtually all of the difference-making information associated with a full specification of the positions and momenta of the molecules making up a dilute gas can be absorbed into a specification of the values of a very small number of thermodynamic variables such as temperature and pressure, insofar as the effect in which we are interested is the value of another thermodynamic variable. When I wrote that this can be done without a significant loss of difference-making information, what I had in mind was that thermodynamics is what physicists call an “effective” theory, with large deviations from its predictions occurring only in “measure zero” cases. Here, and arguably in some other cases, “without significant loss” has a straightforward meaning grounded in the physics. It does not seem to me to require reference to agents’ cost functions or how they trade off different desiderata. I thus think of such cases not (in any interesting sense) as cases of “lossy compression” but rather as cases of compression pretty much without loss.

    Similarly, in the pigeon-pecking and similar examples, I took it that a background assumption is that the causal reasoner is not constrained by a need for trade-offs with other desiderata: citing the redness of the target is not more or less costly (in terms of some other goal) than citing its scarlet color. I thus wouldn’t see this as a case involving lossy compression either, although perhaps I am wrong about that.

    In their commentary DT pursue what I am inclined to think of as a somewhat different (although no less interesting) set of issues, where compression of difference-making information does involve non-trivial “loss”, so that we are faced with a trade-off between proportionality-based considerations and others. I did not try to explore such issues in CHF, partly because the proportionality chapter was already complicated and partly because I was not sure what to say about them. (As I suggest below, introducing trade-offs, although it captures the more general case, makes it less obvious how to proceed.)
    In any event, I welcome their discussion of such trade-off cases. To use one of their examples, consider a generalization G1 that specifies the exact probability p1 of someone like Jones developing lung cancer if he smokes 20 Marlboro cigarettes a day for 10 years (assume this includes a specification of the other factors on which this probability depends) and compare this with two other generalizations: G2 just specifies the probability p2 of lung cancer for someone like Jones if he smokes 20 cigarettes of any brand for ten years, where we suppose that p2 is slightly less than p1. G3 is even more generic: it just says that smoking causes lung cancer. G1 thus provides more detailed information about the dependency relations governing Jones’ risk of lung cancer than G2 or G3. Nonetheless, DT argue, in building a causal model relevant to his smoking decision Jones may not “care enough” about the difference between p2 and p1 to use G1 rather than G2, even though proportionality (at least as they see it) favors G1. For similar reasons Jones might prefer to be guided by G3. More generally, DT suggest that agents will often or usually trade off proportionality-based considerations against others such as “simplicity”, with less proportional but simpler causal claims being preferred to more complex ones that better satisfy proportionality. Modeling this will require a cost or utility function specifying the terms of such a trade-off for the agent.

    I find this proposal mostly plausible but wonder how exactly it might be developed in a way that is illuminating about either normative or descriptive issues. First, though, let me distinguish two sorts of cases. In the first kind of case the subject trades off proportionality-like considerations against other widely accepted (and non-agent-specific) goals in causal reasoning. For example, generalization G3 above does less well with respect to proportionality than G1, but G3 is more portable and applicable to a larger range of other situations than G1. For this reason (among others), an agent might prefer to operate with G3. Somewhat question-beggingly, I’m going to describe this as a case involving trade-offs among epistemic goals. (Question-begging because I haven’t specified what “epistemic” means.)

    In what seems to me to be a somewhat different kind of case, the agent also trades off proportionality against other considerations, but these “other considerations” have a somewhat different status. Here I have in mind an agent who takes into account considerations like the cost of acquiring additional information, of storing it in memory, or of using it to guide action, or the efficiency of communication with others. I will call these “resource constraints”.

    It seems very plausible, as DT suggest, that both kinds of trade-off cases are common. The question, as I see it, is what this suggests about how to proceed, either for the purposes of normative theorizing or from the point of view of empirical modeling. Looking first at descriptive issues, it seems possible in principle that we might discover that there are relatively uniform trade-offs among different epistemic values that are widely shared across different agents: most people trade off proportionality against invariance and so on in pretty much the same way. Perhaps people usually have similar resource constraints and also trade these off against proportionality and other epistemic considerations in similar ways. In other words, people have very similar utility functions regarding such decisions. If this turns out to be the case, it would be extremely interesting. It would then be important to try to understand why people have such uniform trade-off schedules and whether there is some rational basis for them.

    On the other hand, it might turn out that there is no such uniformity: people have very different utility functions or trade-off schedules, or at least these are highly context-specific, depending on the details of the goals and situations of individual agents. In this second kind of case, I wonder how model construction and testing should proceed. Do we try to model these individual utility functions in all of their variety? Do we just say that people trade off proportionality considerations against others in various idiosyncratic ways and leave it at that? (I realize that I am raising difficult questions about the treatment of individual differences, which is a vexed issue in psychology generally. There is of course a third, intermediate possibility, which is that people can be grouped into a small number of types.)

    Turning to the normative side of things, my concerns are similar. Is it plausible that there is a single objectively normatively correct standard that specifies how we ought to make trade-offs between proportionality and other considerations? It would be wonderful if there were such a thing, but I’m skeptical for reasons of the sort alluded to above. It sounds as though DT are skeptical as well.

    So my question to DT is this: Once we acknowledge that people trade off proportionality considerations against others, or that they engage in lossy compression, how exactly should descriptive and normative theorizing proceed? Just saying that agents have utility or loss functions for trade-offs and that they maximize or minimize doesn’t seem to me to take us very far. I don’t say this to discourage the sort of approach that they outline; it would be fabulous if it could be worked out in detail. And I completely concur with their general point that researchers need to better understand how and when people engage in abstraction and coarse-graining in causal representation. Proportionality is at best only part of this story.

    Finally, let me add that because I am pressed for time in composing this response, I have not been able to read the various papers in DT’s bibliography that contain technical results about incorporating utility-based considerations into various causal coarsening algorithms. (A number of these are unfamiliar to me.) It may be that some of my concerns are addressed in these papers. Perhaps it would be worthwhile to distinguish problems that arise for an individual decision-maker with a known, well-defined loss function, where integration of that function with other modeling considerations is obviously important, from cases in which one tries to model, e.g., groups of experimental subjects whose loss functions have to be inferred.
