{"id":17,"date":"2012-01-23T21:52:55","date_gmt":"2012-01-24T02:52:55","guid":{"rendered":"https:\/\/michaelnielsen.org\/ddi\/?p=17"},"modified":"2015-11-25T13:50:37","modified_gmt":"2015-11-25T18:50:37","slug":"if-correlation-doesnt-imply-causation-then-what-does","status":"publish","type":"post","link":"https:\/\/michaelnielsen.org\/ddi\/if-correlation-doesnt-imply-causation-then-what-does\/","title":{"rendered":"If correlation doesn&#8217;t imply causation, then what does?"},"content":{"rendered":"<p>It is a commonplace of scientific discussion that correlation does not imply causation.  Business Week recently ran an <a href=\"http:\/\/www.businessweek.com\/magazine\/correlation-or-causation-12012011-gfx.html\">spoof   article<\/a> pointing out some amusing examples of the dangers of inferring causation from correlation.  For example, the article points out that Facebook&#8217;s growth has been strongly correlated with the yield on Greek government bonds: (<a href=\"http:\/\/www.businessweek.com\/magazine\/correlation-or-causation-12012011-gfx.html\">credit<\/a>)<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/correlation_greece_facebook.png\" width=\"360px\"><\/p>\n<p>Despite this strong correlation, it would not be wise to conclude that the success of Facebook has somehow <em>caused<\/em> the current (2009-2012) Greek debt crisis, nor that the Greek debt crisis has caused the adoption of Facebook!<\/p>\n<p>Of course, while it&#8217;s all very well to piously state that correlation doesn&#8217;t imply causation, it does leave us with a conundrum: under what conditions, exactly, can we use experimental data to deduce a causal relationship between two or more variables?  
<\/p>\n<p>The standard scientific answer to this question is that (with some caveats) we can infer causality from a well designed <a href=\"http:\/\/en.wikipedia.org\/wiki\/Randomized_controlled_trial\">randomized   controlled experiment<\/a>.  Unfortunately, while this answer is satisfying in principle and sometimes useful in practice, it&#8217;s often impractical or impossible to do a randomized controlled experiment. And so we&#8217;re left with the question of whether there are other procedures we can use to infer causality from experimental data.  And, given that we can find more general procedures for inferring causal relationships, what does causality mean, anyway, for how we reason about a system?<\/p>\n<p>It might seem that the answers to such fundamental questions would have been settled long ago.  In fact, they turn out to be surprisingly subtle questions.  Over the past few decades, a group of scientists have developed a theory of <em>causal inference<\/em> intended to address these and other related questions.  This theory can be thought of as an algebra or language for reasoning about cause and effect.  Many elements of the theory have been laid out in a <a href=\"http:\/\/www.amazon.com\/Causality-Reasoning-Inference-Judea-Pearl\/dp\/0521773628\">famous   book<\/a> by one of the main contributors to the theory, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Judea_Pearl\">Judea Pearl<\/a>. Although the theory of causal inference is not yet fully formed, and is still undergoing development, what has already been accomplished is interesting and worth understanding.<\/p>\n<p>In this post I will describe one small but important part of the theory of causal inference, a <em>causal calculus<\/em> developed by Pearl.  This causal calculus is a set of three simple but powerful algebraic rules which can be used to make inferences about causal relationships.  In particular, I&#8217;ll explain how the causal calculus can sometimes (but not always!) 
be used to infer causation from a set of data, even when a randomized controlled experiment is not possible. Also in the post, I&#8217;ll describe some of the limits of the causal calculus, and some of my own speculations and questions.<\/p>\n<p>The post is a little technically detailed at points.  However, the first three sections of the post are non-technical, and I hope will be of broad interest.  Throughout the post I&#8217;ve included occasional &#8220;Problems for the author&#8221;, where I describe problems I&#8217;d like to solve, or things I&#8217;d like to understand better.  Feel free to ignore these if you find them distracting, but I hope they&#8217;ll give you some sense of what I find interesting about the subject.  Incidentally, I&#8217;m sure many of these problems have already been solved by others; I&#8217;m not claiming that these are all open research problems, although perhaps some are.  They&#8217;re simply things I&#8217;d like to understand better.  Also in the post I&#8217;ve included some exercises for the reader, and some slightly harder problems.  You may find it informative to work through these exercises and problems.<\/p>\n<p>Before diving in, one final caveat: I am not an expert on causal inference, nor on statistics.  The reason I wrote this post was to help me internalize the ideas of the causal calculus.  Occasionally, one finds a presentation of a technical subject which is beautifully clear and illuminating, a presentation where the author has seen right through the subject, and is able to convey that crystallized understanding to others.  That&#8217;s a great aspirational goal, but I don&#8217;t yet have that understanding of causal inference, and these notes don&#8217;t meet that standard.  
Nonetheless, I hope others will find my notes useful, and that experts will speak up to correct any errors or misapprehensions on my part.<\/p>\n<h3>Simpson&#8217;s paradox<\/h3>\n<p>Let me start by explaining two example problems to illustrate some of the difficulties we run into when making inferences about causality. The first is known as <a href=\"http:\/\/en.wikipedia.org\/wiki\/Simpson's_paradox\">Simpson&#8217;s   paradox<\/a>.  To explain Simpson&#8217;s paradox I&#8217;ll use a concrete example based on the passage of the Civil Rights Act in the United States in 1964.<\/p>\n<p>In the US House of Representatives, 61 percent of Democrats voted for the Civil Rights Act, while a much higher percentage, 80 percent, of Republicans voted for the Act.  You might think that we could conclude from this that being Republican, rather than Democrat, was an important factor in causing someone to vote for the Civil Rights Act. However, the picture changes if we include an additional factor in the analysis, namely, whether a legislator came from a Northern or Southern state.  If we include that extra factor, the situation <em>completely<\/em> reverses, in both the North <em>and<\/em> the South. Here&#8217;s how it breaks down:<\/p>\n<p><strong>North:<\/strong> Democrat (94 percent), Republican (85 percent)<\/p>\n<p><strong>South:<\/strong> Democrat (7 percent), Republican (0 percent)<\/p>\n<p>Yes, you read that right: in <em>both<\/em> the North and the South, a larger fraction of Democrats than Republicans voted for the Act, despite the fact that <em>overall<\/em> a larger fraction of Republicans than Democrats voted for the Act.<\/p>\n<p>You might wonder how this can possibly be true.  I&#8217;ll quickly state the raw voting numbers, so you can check that the arithmetic works out, and then I&#8217;ll explain why it&#8217;s true.  
You can skip the numbers if you trust my arithmetic.<\/p>\n<p><strong>North:<\/strong> Democrat (145\/154, 94 percent), Republican (138\/162, 85 percent)<\/p>\n<p><strong>South:<\/strong> Democrat (7\/94, 7 percent), Republican (0\/10, 0 percent)<\/p>\n<p><strong>Overall:<\/strong> Democrat (152\/248, 61 percent), Republican (138\/172, 80 percent)<\/p>\n<p>One way of understanding what&#8217;s going on is to note that a far greater proportion of Democrat (as opposed to Republican) legislators were from the South.  In fact, at the time the House had 94 Democrats from the South, but only 10 southern Republicans.  Because of this enormous difference, the very low fraction (7 percent) of southern Democrats voting for the Act dragged down the Democrats&#8217; overall percentage much more than did the even lower fraction (0 percent) of southern Republicans who voted for the Act.<\/p>\n<p>(The numbers above are for the House of Representatives.  The numbers were different in the Senate, but the same overall phenomenon occurred. I&#8217;ve taken the numbers from <a href=\"http:\/\/en.wikipedia.org\/wiki\/Simpson's_paradox#Civil_Rights_Act_of_1964\">Wikipedia&#8217;s   article about Simpson&#8217;s paradox<\/a>, and there are more details there.)<\/p>\n<p>If we take a naive causal point of view, this result looks like a paradox.  As I said above, the overall voting pattern seems to suggest that being Republican, rather than Democrat, was an important causal factor in voting for the Civil Rights Act.  Yet if we look at the individual statistics in <em>both<\/em> the North and the South, then we&#8217;d come to the exact <em>opposite<\/em> conclusion.  To state the same result more abstractly, Simpson&#8217;s paradox is the fact that the correlation between two variables can actually be <em>reversed<\/em> when additional factors are considered.  
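The reversal in the voting data is pure arithmetic, and easy to check for yourself. Here is a short Python verification I've added, using exactly the House counts quoted above:

```python
from fractions import Fraction

# (voted yes, total) in the House, by party and region -- the numbers above
votes = {
    "Democrat":   {"North": (145, 154), "South": (7, 94)},
    "Republican": {"North": (138, 162), "South": (0, 10)},
}

def pct(yes, total):
    """Exact percentage as a rational number, to avoid rounding issues."""
    return 100 * Fraction(yes, total)

# Within EACH region, Democrats supported the Act at a higher rate...
for region in ("North", "South"):
    assert pct(*votes["Democrat"][region]) > pct(*votes["Republican"][region])

# ...yet OVERALL, Republicans supported it at a higher rate.
overall = {
    party: pct(sum(y for y, _ in r.values()), sum(t for _, t in r.values()))
    for party, r in votes.items()
}
assert overall["Republican"] > overall["Democrat"]
print({p: f"{float(v):.0f}%" for p, v in overall.items()})
# prints {'Democrat': '61%', 'Republican': '80%'}
```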
So two variables which appear correlated can become anticorrelated when another factor is taken into account.<\/p>\n<p>You might wonder if results like those we saw in voting on the Civil Rights Act are simply an unusual fluke.  But, in fact, this is not that uncommon. <a href=\"http:\/\/en.wikipedia.org\/wiki\/Simpson's_paradox\">Wikipedia&#8217;s   page on Simpson&#8217;s paradox<\/a> lists many important and similar real-world examples ranging from understanding whether there is gender-bias in university admissions to which treatment works best for kidney stones.  In each case, understanding the causal relationships turns out to be much more complex than one might at first think.<\/p>\n<p>I&#8217;ll now go through a second example of Simpson&#8217;s paradox, the kidney stone treatment example just mentioned, because it helps drive home just how bad our intuitions about statistics and causality are.<\/p>\n<p>Imagine you suffer from kidney stones, and your Doctor offers you two choices: treatment A or treatment B.  Your Doctor tells you that the two treatments have been tested in a trial, and treatment A was effective for a higher percentage of patients than treatment B.  If you&#8217;re like most people, at this point you&#8217;d say &#8220;Well, okay, I&#8217;ll go with treatment A&#8221;.<\/p>\n<p>Here&#8217;s the gotcha.  Keep in mind that this <em>really happened<\/em>. Suppose you divide patients in the trial up into those with large kidney stones, and those with small kidney stones.  Then even though treatment A was effective for a higher overall percentage of patients than treatment B, treatment B was effective for a higher percentage of patients in <em>both groups<\/em>, i.e., for both large and small kidney stones.  So your Doctor could just as honestly have said &#8220;Well, you have large [or small] kidney stones, and treatment B worked for a higher percentage of patients with large [or small] kidney stones than treatment A&#8221;.  
If your Doctor had made either one of these statements, then if you&#8217;re like most people you&#8217;d have decided to go with treatment B, i.e., the exact opposite treatment.<\/p>\n<p>The kidney stone example relies, of course, on the same kind of arithmetic as in the Civil Rights Act voting, and it&#8217;s worth stopping to figure out for yourself how the claims I made above could possibly be true.  If you&#8217;re having trouble, you can click through to the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Simpson's_paradox#Kidney_stone_treatment\">Wikipedia   page<\/a>, which has all the details of the numbers.<\/p>\n<p>Now, I&#8217;ll confess that before learning about Simpson&#8217;s paradox, I would have unhesitatingly done just as I suggested a naive person would.  Indeed, even though I&#8217;ve now spent quite a bit of time pondering Simpson&#8217;s paradox, I&#8217;m not entirely sure I wouldn&#8217;t still sometimes make the same kind of mistake.  I find it more than a little mind-bending that my heuristics about how to behave on the basis of statistical evidence are obviously not just a little wrong, but utterly, horribly wrong.<\/p>\n<p>Perhaps I&#8217;m alone in having terrible intuition about how to interpret statistics.  But frankly I wouldn&#8217;t be surprised if most people share my confusion.  I often wonder how many people with real decision-making power &#8211; politicians, judges, and so on &#8211; are making decisions based on statistical studies, and yet they don&#8217;t understand even basic things like Simpson&#8217;s paradox.  Or, to put it another way, they have not the first clue about statistics.  Partial evidence may be worse than no evidence if it leads to an illusion of knowledge, and so to overconfidence and certainty where none is justified.  
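If you'd rather verify the kidney stone claims than take them on faith (or click through), here is the same kind of check. The figures are the trial numbers reported on the linked Wikipedia page, relabelled so that, as in the story above, treatment A is the one that looks better overall:

```python
# (successes, patients) by treatment and stone size.  Figures are from the
# kidney stone trial on the linked Wikipedia page, with the treatments
# relabelled to match the story above: A is the better-looking one overall.
trial = {
    "A": {"small": (234, 270), "large": (55, 80)},
    "B": {"small": (81, 87),   "large": (192, 263)},
}

def rate(groups):
    """Overall success rate across the given (successes, patients) groups."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return successes / patients

# Treatment A wins overall (roughly 83 percent versus 78 percent)...
assert rate(trial["A"]) > rate(trial["B"])

# ...but treatment B wins for small stones AND for large stones.
for size in ("small", "large"):
    assert rate({size: trial["B"][size]}) > rate({size: trial["A"][size]})
```

As in the voting example, the reversal happens because the groups are very unbalanced: treatment B was mostly tried on the easy (small stone) cases, treatment A mostly on the hard ones.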
It&#8217;s better to know that you don&#8217;t know.<\/p>\n<h3>Correlation, causation, smoking, and lung cancer<\/h3>\n<p>As a second example of the difficulties in establishing causality, consider the relationship between cigarette smoking and lung cancer. In 1964 the United States&#8217; Surgeon General issued a <a href=\"http:\/\/profiles.nlm.nih.gov\/ps\/access\/NNBBMQ.pdf\">report<\/a> claiming that cigarette smoking causes lung cancer.  Unfortunately, according to Pearl the evidence in the report was based primarily on correlations between cigarette smoking and lung cancer.  As a result the report came under attack not just by tobacco companies, but also by some of the world&#8217;s most prominent statisticians, including the great <a href=\"http:\/\/en.wikipedia.org\/wiki\/Ronald_Fisher\">Ronald   Fisher<\/a>.  They claimed that there could be a hidden factor &#8211; maybe some kind of genetic factor &#8211; which caused both lung cancer <em>and<\/em> people to want to smoke (i.e., nicotine craving).  If that was true, then while smoking and lung cancer would be correlated, the decision to smoke or not smoke would have no impact on whether you got lung cancer.<\/p>\n<p>Now, you might scoff at this notion.  But derision isn&#8217;t a principled argument.  And, as the example of Simpson&#8217;s paradox showed, determining causality on the basis of correlations is tricky, at best, and can potentially lead to contradictory conclusions.  It&#8217;d be much better to have a principled way of using data to conclude that the relationship between smoking and lung cancer is not just a correlation, but rather that there truly is a causal relationship.<\/p>\n<p>One way of demonstrating this kind of causal connection is to do a randomized, controlled experiment.  We suppose there is some experimenter who has the power to <em>intervene<\/em> with a person, literally forcing them to either smoke (or not) according to the whim of the experimenter.  
The experimenter takes a large group of people, and randomly divides them into two halves.  One half are forced to smoke, while the other half are forced not to smoke.  By doing this the experimenter can <em>break<\/em> the relationship between smoking and any hidden factor causing both smoking and lung cancer.  By comparing the cancer rates in the group who were forced to smoke to those who were forced not to smoke, it would then be possible to determine whether or not there is truly a causal connection between smoking and lung cancer.<\/p>\n<p>This kind of randomized, controlled experiment is highly desirable when it can be done, but experimenters often don&#8217;t have this power. In the case of smoking, this kind of experiment would probably be illegal today, and, I suspect, even decades into the past.  And even when it&#8217;s legal, in many cases it would be impractical, as in the case of the Civil Rights Act, and for many other important political, legal, medical, and economic questions.<\/p>\n<h3>Causal models<\/h3>\n<p>To help address problems like the two example problems just discussed, Pearl introduced a <a href=\"http:\/\/ftp.cs.ucla.edu\/pub\/stat_ser\/R212.pdf\">causal calculus<\/a>. In the remainder of this post, I will explain the rules of the causal calculus, and use them to analyse the smoking-cancer connection. We&#8217;ll see that even without doing a randomized controlled experiment it&#8217;s possible (with the aid of some reasonable assumptions) to <em>infer<\/em> what the outcome of a randomized controlled experiment would have been, using only relatively easily accessible experimental data, data that doesn&#8217;t require experimental intervention to force people to smoke or not, but which can be obtained from purely observational studies.<\/p>\n<p>To state the rules of the causal calculus, we&#8217;ll need several background ideas.  I&#8217;ll explain those ideas over the next three sections of this post.  
The ideas are <em>causal models<\/em> (covered in this section), <em>causal conditional probabilities<\/em>, and <em>d-separation<\/em>, respectively.  It&#8217;s a lot to swallow, but the ideas are powerful, and worth taking the time to understand.  With these notions under our belts, we&#8217;ll be able to understand the rules of the causal calculus.<\/p>\n<p>To understand causal models, consider the following graph of possible causal relationships between smoking, lung cancer, and some unknown hidden factor (say, a hidden genetic factor):<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/smoking_basic_causal_model.png\" width=\"260px\"><\/p>\n<p>This is a quite general model of causal relationships, in the sense that it includes both the suggestion of the US Surgeon General (smoking causes cancer) and also the suggestion of the tobacco companies (a hidden factor causes both smoking and cancer).  Indeed, it also allows a third possibility: that perhaps both smoking and some hidden factor contribute to lung cancer.  This combined relationship could potentially be quite complex: it could be, for example, that smoking alone actually reduces the chance of lung cancer, but the hidden factor increases the chance of lung cancer so much that someone who smokes would, on average, see an increased probability of lung cancer.  This sounds unlikely, but later we&#8217;ll see some toy model data which has exactly this property.<\/p>\n<p>Of course, the model depicted in the graph above is not the most general possible model of causal relationships in this system; it&#8217;s easy to imagine much more complex causal models.  But at the very least this is an <em>interesting<\/em> causal model, since it encompasses both the US Surgeon General and the tobacco company suggestions.  
I&#8217;ll return later to the possibility of more general causal models, but for now we&#8217;ll simply keep this model in mind as a concrete example of a causal model.<\/p>\n<p>Mathematically speaking, what do the arrows of causality in the diagram above mean?  We&#8217;ll develop an answer to that question over the next few paragraphs.  It helps to start by moving away from the specific smoking-cancer model to allow a causal model to be based on a more general graph indicating possible causal relationships between a number of variables:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/general_causal_model.png\" width=\"220px\"><\/p>\n<p>Each vertex in this causal model has an associated random variable, <img src='https:\/\/s0.wp.com\/latex.php?latex=X_1%2CX_2%2C%5Cldots&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_1,X_2,\\ldots' title='X_1,X_2,\\ldots' class='latex' \/>.  For example, in the causal model above <img src='https:\/\/s0.wp.com\/latex.php?latex=X_2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_2' title='X_2' class='latex' \/> could be a two-outcome random variable indicating the presence or absence of some gene that exerts an influence on whether someone smokes or gets lung cancer, <img src='https:\/\/s0.wp.com\/latex.php?latex=X_3&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_3' title='X_3' class='latex' \/> indicates &#8220;smokes&#8221; or &#8220;does not smoke&#8221;, and <img src='https:\/\/s0.wp.com\/latex.php?latex=X_4&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_4' title='X_4' class='latex' \/> indicates &#8220;gets lung cancer&#8221; or &#8220;doesn&#8217;t get lung cancer&#8221;. 
The other variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X_1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_1' title='X_1' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=X_5&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_5' title='X_5' class='latex' \/> would refer to other potential dependencies in this (somewhat more complex) model of the smoking-cancer connection.<\/p>\n<p>A notational convention that we&#8217;ll use often is to interchangeably use <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> to refer to a random variable in the causal model, and also as a way of labelling the corresponding vertex in the graph for the causal model.  It should be clear from context which is meant.  We&#8217;ll also sometimes refer interchangeably to the causal model or to the associated graph.<\/p>\n<p>For the notion of causality to make sense we need to constrain the class of graphs that can be used in a causal model.  Obviously, it&#8217;d make no sense to have loops in the graph:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/graph_with_loops.png\" width=\"173px\"><\/p>\n<p>We can&#8217;t have <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> causing <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> causing <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> causing <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>!  At least, not without a time machine.  
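This "no loops" requirement is easy to state computationally. Here is a minimal sketch of my own (the adjacency-list representation is a hypothetical choice, not anything from the post) that uses Kahn's topological-sort algorithm to test whether a candidate causal graph is loop-free:

```python
from collections import deque

def is_acyclic(graph):
    """graph: dict mapping each vertex to the list of vertices it points to.
    Returns True iff the directed graph has no loops (Kahn's algorithm:
    repeatedly peel off vertices with no remaining incoming edges)."""
    indegree = {v: 0 for v in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(v for v, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        v = queue.popleft()
        seen += 1
        for t in graph[v]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    # If a loop exists, its vertices never reach indegree 0 and are never seen.
    return seen == len(graph)

# The smoking model: hidden factor -> smoking, hidden factor -> cancer,
# and smoking -> cancer.  No loops, so it's an allowed causal graph.
smoking_model = {"hidden": ["smoking", "cancer"],
                 "smoking": ["cancer"],
                 "cancer": []}
assert is_acyclic(smoking_model)

# X causing Y causing Z causing X -- the "time machine" graph -- is rejected.
assert not is_acyclic({"X": ["Y"], "Y": ["Z"], "Z": ["X"]})
```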
Because of this we constrain the graph to be a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Directed_acyclic_graph\">directed   acyclic graph<\/a>, meaning a (directed) graph which has no loops in it.<\/p>\n<p>By the way, I must admit that I&#8217;m not a fan of the term directed acyclic graph. It sounds like a very complicated notion, at least to my ear, when what it means is very simple: a graph with no loops.  I&#8217;d really prefer to call it a &#8220;loop-free graph&#8221;, or something like that.  Unfortunately, the &#8220;directed acyclic graph&#8221; nomenclature is pretty standard, so we&#8217;ll go with it.<\/p>\n<p>Our picture so far is that a causal model consists of a directed acyclic graph, whose vertices are labelled by random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X_1%2CX_2%2C%5Cldots&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_1,X_2,\\ldots' title='X_1,X_2,\\ldots' class='latex' \/>.  To complete our definition of causal models we need to capture the allowed relationships between those random variables.<\/p>\n<p>Intuitively, what causality means is that for any particular <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> the only random variables which directly influence the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> are the <em>parents<\/em> of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>, i.e., the collection <img src='https:\/\/s0.wp.com\/latex.php?latex=X_%7B%5Cmbox%7Bpa%7D%28j%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_{\\mbox{pa}(j)}' title='X_{\\mbox{pa}(j)}' class='latex' \/> of random variables which are connected directly to <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' 
\/>.  For instance, in the graph shown below (which is the same as the complex graph we saw a little earlier), we have <img src='https:\/\/s0.wp.com\/latex.php?latex=X_%7B%5Cmbox%7Bpa%7D%284%29%7D+%3D+%28X_2%2CX_3%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_{\\mbox{pa}(4)} = (X_2,X_3)' title='X_{\\mbox{pa}(4)} = (X_2,X_3)' class='latex' \/>:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/general_causal_model.png\" width=\"220px\"><\/p>\n<p>Now, of course, vertices further back in the graph &#8211; say, the parents of the parents &#8211; could influence the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_4&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_4' title='X_4' class='latex' \/>.  But it would be indirect, an influence mediated through the parent vertices.<\/p>\n<p>Note, by the way, that I&#8217;ve overloaded the <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> notation, using <img src='https:\/\/s0.wp.com\/latex.php?latex=X_%7B%5Cmbox%7Bpa%7D%284%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_{\\mbox{pa}(4)}' title='X_{\\mbox{pa}(4)}' class='latex' \/> to denote a collection of random variables.  I&#8217;ll use this kind of overloading quite a bit in the rest of this post.  
In particular, I&#8217;ll often use the notation <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> (or <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/>, <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> or <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>) to denote a subset of random variables from the graph.<\/p>\n<p>Motivated by the above discussion, one way we could define causal influence would be to require that <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> be a function of its parents:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+X_j+%3D+f_j%28X_%7B%5Cmbox%7Bpa%7D%28j%29%7D%29%2C+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' X_j = f_j(X_{\\mbox{pa}(j)}), ' title=' X_j = f_j(X_{\\mbox{pa}(j)}), ' class='latex' \/>\n<p>where <img src='https:\/\/s0.wp.com\/latex.php?latex=f_j%28%5Ccdot%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_j(\\cdot)' title='f_j(\\cdot)' class='latex' \/> is some function.  In fact, we&#8217;ll allow a slightly more general notion of causal influence, allowing <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> to not just be a deterministic function of the parents, but a random function.  
We do this by requiring that <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> be expressible in the form:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+X_j+%3D+f_j%28X_%7B%5Cmbox%7Bpa%7D%28j%29%7D%2CY_%7Bj%2C1%7D%2CY_%7Bj%2C2%7D%2C%5Cldots%29%2C+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' X_j = f_j(X_{\\mbox{pa}(j)},Y_{j,1},Y_{j,2},\\ldots), ' title=' X_j = f_j(X_{\\mbox{pa}(j)},Y_{j,1},Y_{j,2},\\ldots), ' class='latex' \/>\n<p>where <img src='https:\/\/s0.wp.com\/latex.php?latex=f_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_j' title='f_j' class='latex' \/> is a function, and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/> is a collection of random variables such that: (a) the <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/> are independent of one another for different values of <img src='https:\/\/s0.wp.com\/latex.php?latex=j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='j' title='j' class='latex' \/>; and (b) for each <img src='https:\/\/s0.wp.com\/latex.php?latex=j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='j' title='j' class='latex' \/>, <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/> is independent of all variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X_k&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_k' title='X_k' class='latex' \/>, except when <img src='https:\/\/s0.wp.com\/latex.php?latex=X_k&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_k' title='X_k' class='latex' \/> is <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> 
itself, or a descendant of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>.  The intuition is that the <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/> are a collection of auxiliary random variables which inject some extra randomness into <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> (and, through <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>, its descendants), but which are otherwise independent of the variables in the causal model.<\/p>\n<p>Summing up, a causal model consists of a directed acyclic graph, <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>, whose vertices are labelled by random variables, <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>, and each <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> is expressible in the form <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j+%3D+f_j%28X_%7B%5Cmbox%7Bpa%7D%28j%29%7D%2CY_%7Bj%2C%5Ccdot%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j = f_j(X_{\\mbox{pa}(j)},Y_{j,\\cdot})' title='X_j = f_j(X_{\\mbox{pa}(j)},Y_{j,\\cdot})' class='latex' \/> for some function <img src='https:\/\/s0.wp.com\/latex.php?latex=f_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_j' title='f_j' class='latex' \/>.  
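To make this functional form concrete, here is a toy sketch of the smoking model written exactly as X_j = f_j(X_pa(j), Y_j): each variable is a deterministic function of its parents plus fresh, independent noise. All the numerical probabilities below are my own invention, purely for illustration:

```python
import random

def sample(rng):
    """One joint draw (hidden, smokes, cancer) from a toy causal model in
    structural form: each X_j is a deterministic function of its parents
    plus an independent noise variable Y_j.  Probabilities are invented."""
    # The auxiliary noise Y_j: independent uniform draws, one per variable.
    y_hidden, y_smokes, y_cancer = rng.random(), rng.random(), rng.random()

    hidden = y_hidden < 0.3                         # root vertex: noise only
    smokes = y_smokes < (0.8 if hidden else 0.2)    # parent: hidden
    cancer = y_cancer < 0.05 + 0.3 * hidden + 0.2 * smokes  # parents: both
    return hidden, smokes, cancer

rng = random.Random(0)
samples = [sample(rng) for _ in range(100_000)]

# The joint distribution this induces factorises vertex-by-vertex as
#   p(hidden, smokes, cancer) = p(hidden) p(smokes|hidden) p(cancer|hidden, smokes),
# the product-of-conditionals form the post works with.
frac_hidden = sum(h for h, _, _ in samples) / len(samples)
assert 0.28 < frac_hidden < 0.32   # empirical p(hidden) is close to the 0.3 we set
```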
The <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/> are independent of one another, and each <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/> is independent of all variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X_k&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_k' title='X_k' class='latex' \/>, except when <img src='https:\/\/s0.wp.com\/latex.php?latex=X_k&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_k' title='X_k' class='latex' \/> is <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> or a descendant of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>.<\/p>\n<p>In practice, we will not work directly with the functions <img src='https:\/\/s0.wp.com\/latex.php?latex=f_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_j' title='f_j' class='latex' \/> or the auxiliary random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/>.  Instead, we&#8217;ll work with the following equation, which specifies the causal model&#8217;s joint probability distribution as a product of conditional probabilities:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+++p%28x_1%2Cx_2%2C%5Cldots%29+%3D+%5Cprod_j+p%28x_j+%7C+%5Cmbox%7Bpa%7D%28x_j%29%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='   p(x_1,x_2,\\ldots) = \\prod_j p(x_j | \\mbox{pa}(x_j)). ' title='   p(x_1,x_2,\\ldots) = \\prod_j p(x_j | \\mbox{pa}(x_j)). 
' class='latex' \/>\n<p>I won&#8217;t prove this equation, but the expression should be plausible, and is pretty easy to prove; I&#8217;ve asked you to prove it as an optional exercise below. <\/p>\n<h3>Exercises<\/h3>\n<ul>\n<li> Prove the above equation for the joint probability distribution. <\/ul>\n<h3>Problems<\/h3>\n<ul>\n<li><strong>(Simpson&#8217;s paradox in causal models)<\/strong> Consider the causal model of   smoking introduced above. Suppose that the hidden factor is a gene   which is either switched on or off.  If on, it tends to make people   both smoke and get lung cancer.  Find explicit values for   conditional probabilities in the causal model such that   <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bsmokes%7D%29+%3E+p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bdoesn%27t+++++smoke%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer} | \\mbox{smokes}) &gt; p(\\mbox{cancer} | \\mbox{doesn&#039;t     smoke})' title='p(\\mbox{cancer} | \\mbox{smokes}) &gt; p(\\mbox{cancer} | \\mbox{doesn&#039;t     smoke})' class='latex' \/>, and yet if the additional genetic factor is taken into   account this relationship is reversed.  
That is, we have both <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bsmokes%2C+gene+on%7D%29+%5C%2C%5C%2C+%5Ctextless+%5C%2C%5C%2C+p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bdoes+not+smoke%2C+gene+on%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer} | \\mbox{smokes, gene on}) \\,\\, \\textless \\,\\, p(\\mbox{cancer} | \\mbox{does not smoke, gene on})' title='p(\\mbox{cancer} | \\mbox{smokes, gene on}) \\,\\, \\textless \\,\\, p(\\mbox{cancer} | \\mbox{does not smoke, gene on})' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bsmokes%2C+gene+off%7D%29+%5C%2C%5C%2C+%5Ctextless+%5C%2C%5C%2C+p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bdoesn%27t+smoke%2C+gene+off%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer} | \\mbox{smokes, gene off}) \\,\\, \\textless \\,\\, p(\\mbox{cancer} | \\mbox{doesn&#039;t smoke, gene off})' title='p(\\mbox{cancer} | \\mbox{smokes, gene off}) \\,\\, \\textless \\,\\, p(\\mbox{cancer} | \\mbox{doesn&#039;t smoke, gene off})' class='latex' \/>.\n<\/li>\n<\/ul>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> An alternate, equivalent approach to defining causal models is   as follows: (1) all root vertices (i.e., vertices with no parents)   in the graph are labelled by independent random variables. (2)   augment the graph by introducing new vertices corresponding to the   <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2Ck%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,k}' title='Y_{j,k}' class='latex' \/>.  These new vertices have single outgoing edges, pointing   to <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>.  (3) Require that non-root vertices in the augmented graph   be deterministic functions of their parents.  The disadvantage of   this definition is that it introduces the overhead of dealing with   the augmented graph.  
But the definition also has the advantage of   cleanly separating the stochastic and deterministic components, and   I wouldn&#8217;t be surprised if developing the theory of causal inference   from this point of view would be stimulating, at the very least, and might   possibly have some advantages compared to the standard approach.  So   the problem I set myself (and anyone else who is interested!) is to   carry the consequences of this change through the rest of the theory   of causal inference, looking for advantages and disadvantages. <\/ul>\n<p>I&#8217;ve been using terms like &#8220;causal influence&#8221; somewhat indiscriminately in the discussion above, and so I&#8217;d like to pause to discuss a bit more carefully what is meant here, and what nomenclature we should use going forward.  All that the arrows in a causal model indicate is the <em>possibility<\/em> of a <em>direct<\/em> causal influence.  This results in two caveats on how we think about causality in these models.  First, it may be that a child random variable is actually completely independent of the value of one (or more) of its parent random variables.  This is, admittedly, a rather special case, but is perfectly consistent with the definition.  For example, in a causal model like<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/smoking_basic_causal_model.png\" width=\"260px\"><\/p>\n<p>it is possible that the outcome of cancer might be independent of the hidden causal factor or, for that matter, that it might be independent of whether someone smokes or not.  (Indeed, logically, at least, it may be independent of both, although of course that&#8217;s not what we&#8217;ll find in the real world.)  The second caveat in how we think about the arrows and causality is that the arrows only capture the <em>direct<\/em> causal influences in the model.  
It is possible that in a causal model like<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/general_causal_model.png\" width=\"220px\"><\/p>\n<p><img src='https:\/\/s0.wp.com\/latex.php?latex=X_1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_1' title='X_1' class='latex' \/> will have a causal influence on <img src='https:\/\/s0.wp.com\/latex.php?latex=X_5&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_5' title='X_5' class='latex' \/> through its influence on <img src='https:\/\/s0.wp.com\/latex.php?latex=X_2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_2' title='X_2' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=X_3&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_3' title='X_3' class='latex' \/>.  This would be an indirect causal influence, mediated by other random variables, but it would still be a causal influence. In the next section I&#8217;ll give a more formal definition of causal influence that can be used to make these ideas precise.<\/p>\n<h3>Causal conditional probabilities<\/h3>\n<p>In this section I&#8217;ll explain what I think is the most imaginative leap underlying the causal calculus.  It&#8217;s the introduction of the concept of <em>causal conditional probabilities<\/em>.<\/p>\n<p>The notion of ordinary conditional probabilities is no doubt familiar to you.  It&#8217;s pretty straightforward to do experiments to estimate conditional probabilities such as <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bsmoking%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{smoking})' title='p(\\mbox{cancer}| \\mbox{smoking})' class='latex' \/>, simply by looking at the population of people who smoke, and figuring out what fraction of those people develop cancer.  
Unfortunately, for the purpose of understanding the causal relationship between smoking and cancer, <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bsmoking%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{smoking})' title='p(\\mbox{cancer}| \\mbox{smoking})' class='latex' \/> isn&#8217;t the quantity we want.  As the tobacco companies pointed out, there might well be a hidden genetic factor that makes it very likely that you&#8217;ll see cancer in anyone who smokes, but that wouldn&#8217;t therefore mean that smoking causes cancer.<\/p>\n<p>As we discussed earlier, what you&#8217;d really like to do in this circumstance is a randomized controlled experiment in which it&#8217;s possible for the experimenter to force someone to smoke (or not smoke), breaking the causal connection between the hidden factor and smoking.  In such an experiment you really could see if there was a causal influence by looking at what fraction of people who smoked got cancer.  In particular, if that fraction was higher than in the overall population then you&#8217;d be justified in concluding that smoking helped cause cancer.   In practice, it&#8217;s probably not practical to do this kind of randomized controlled experiment.  But Pearl had what turns out to be a very clever idea: to imagine a hypothetical world in which it really <em>is<\/em> possible to force someone to (for example) smoke, or not smoke.  In particular, he introduced a <em>conditional causal   probability<\/em> <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/>, which is the conditional probability of cancer in this hypothetical world.  
This should be read as the (causal conditional) probability of cancer given that we &#8220;do&#8221; smoking, i.e., someone has been forced to smoke in a (hypothetical) randomized experiment.<\/p>\n<p>Now, at first sight this appears a rather useless thing to do.  But what makes it a clever imaginative leap is that although it may be impossible or impractical to do a controlled experiment to determine <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C%5Cmbox%7Bdo%7D%28%5Cmbox%7Bsmoking%7D%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}|\\mbox{do}(\\mbox{smoking}))' title='p(\\mbox{cancer}|\\mbox{do}(\\mbox{smoking}))' class='latex' \/>, Pearl was able to establish a set of rules &#8211; a causal calculus &#8211; that such causal conditional probabilities should obey.  And, by making use of this causal calculus, it turns out to sometimes be possible to <em>infer<\/em> the value of probabilities such as <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C%5Cmbox%7Bdo%7D%28%5Cmbox%7Bsmoking%7D%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}|\\mbox{do}(\\mbox{smoking}))' title='p(\\mbox{cancer}|\\mbox{do}(\\mbox{smoking}))' class='latex' \/>, even when a controlled, randomized experiment is impossible.  And that&#8217;s a very remarkable thing to be able to do, and why I say it was so clever to have introduced the notion of causal conditional probabilities.<\/p>\n<p>We&#8217;ll discuss the rules of the causal calculus later in this post. For now, though, let&#8217;s develop the notion of causal conditional probabilities.  
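<\/p>
<p>Before developing the formal machinery, a quick simulation may help make the gap between conditioning and intervening vivid.  The following sketch is my illustration, not part of Pearl&#8217;s formalism, and all the numbers in it are invented: it posits a toy world in which a hidden gene causes both smoking and cancer, while smoking itself has no effect at all, and then compares the observational conditional probability with the outcome of the hypothetical randomized experiment:<\/p>

```python
import random

random.seed(0)

# Toy world, with invented numbers: a hidden gene makes people likely
# both to smoke and to get cancer; smoking itself has no effect here.
def sample(force_smoke=None):
    gene = random.random() < 0.3
    if force_smoke is None:
        smokes = random.random() < (0.8 if gene else 0.2)  # gene -> smoking
    else:
        smokes = force_smoke                               # "do(smoking)"
    cancer = random.random() < (0.6 if gene else 0.1)      # gene -> cancer only
    return smokes, cancer

# Observational conditional probability, p(cancer | smoking):
obs = [sample() for _ in range(100000)]
p_obs = sum(c for s, c in obs if s) / sum(s for s, c in obs)

# Hypothetical randomized experiment, p(cancer | do(smoking)):
forced = [sample(force_smoke=True) for _ in range(100000)]
p_do = sum(c for s, c in forced) / len(forced)

print(p_obs)  # well above the base rate, since smokers are enriched for the gene
print(p_do)   # close to the base rate 0.3*0.6 + 0.7*0.1 = 0.25
```

<p>In this toy world the two quantities come out quite different, even though smoking does nothing: that difference is exactly what the do notation is designed to track.<\/p>
<p>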
Suppose we have a causal model of some phenomenon:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/general_causal_model.png\" width=\"220px\"><\/p>\n<p>Now suppose we introduce an external experimenter who is able to intervene to deliberately set the value of a particular variable <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_j' title='x_j' class='latex' \/>.  In other words, the experimenter can override the other causal influences on that variable.  This is equivalent to having a new causal model:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/intervention_in_causal_model.png\" width=\"280px\"><\/p>\n<p>In this new causal model, we&#8217;ve represented the experimenter by a new vertex, which has as a child the vertex <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>.  All other parents of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> are cut off, i.e., the edges from the parents to <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> are deleted from the graph.  In this case that means the edge from <img src='https:\/\/s0.wp.com\/latex.php?latex=X_2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_2' title='X_2' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=X_3&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_3' title='X_3' class='latex' \/> has been deleted.  This represents the fact that the experimenter&#8217;s intervention overrides the other causal influences. 
(Note that the edges to the children of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> are left undisturbed.) In fact, it&#8217;s even simpler (and equivalent) to consider a causal model where the parents have been cut off from <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>, and no extra vertex added:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/intervention_in_causal_model_simplified.png\" width=\"280px\"><\/p>\n<p>This model has no vertex explicitly representing the experimenter, but rather the relation <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j+%3D+f_j%28X_%7B%7B%5Crm+pa%7D%28j%29%7D%2CY_%7Bj%2C%5Ccdot%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j = f_j(X_{{\\rm pa}(j)},Y_{j,\\cdot})' title='X_j = f_j(X_{{\\rm pa}(j)},Y_{j,\\cdot})' class='latex' \/> is replaced by the relation <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j+%3D+x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j = x_j' title='X_j = x_j' class='latex' \/>.  We will denote this graph by <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+X_j%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline X_j}' title='G_{\\overline X_j}' class='latex' \/>, indicating the graph in which all edges pointing to <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> have been deleted.  We will call this a <em>perturbed   graph<\/em>, and the corresponding causal model a <em>perturbed causal   model<\/em>.  
In the perturbed causal model the only change is to delete the edges to <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/>, and to replace the relation <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j+%3D+f_j%28X_%7B%7B%5Crm+++++pa%7D%28j%29%7D%2CY_%7Bj%2C%5Ccdot%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j = f_j(X_{{\\rm     pa}(j)},Y_{j,\\cdot})' title='X_j = f_j(X_{{\\rm     pa}(j)},Y_{j,\\cdot})' class='latex' \/> by the relation <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j+%3D+x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j = x_j' title='X_j = x_j' class='latex' \/>.<\/p>\n<p>Our aim is to use this perturbed causal model to compute the causal conditional probability <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28x_1%2C%5Cldots%2C%5Chat+x_j%2C+%5Cldots%2C+x_n+%7C+%5Cmbox%7Bdo%7D%28x_j%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(x_1,\\ldots,\\hat x_j, \\ldots, x_n | \\mbox{do}(x_j))' title='p(x_1,\\ldots,\\hat x_j, \\ldots, x_n | \\mbox{do}(x_j))' class='latex' \/>.  In this expression, <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Chat+x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\hat x_j' title='\\hat x_j' class='latex' \/> indicates that the <img src='https:\/\/s0.wp.com\/latex.php?latex=x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_j' title='x_j' class='latex' \/> term is omitted before the <img src='https:\/\/s0.wp.com\/latex.php?latex=%7C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='|' title='|' class='latex' \/>, since the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_j' title='x_j' class='latex' \/> is set on the right.  
By definition, the causal conditional probability <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28x_1%2C%5Cldots%2C%5Chat+x_j%2C+%5Cldots%2C+x_n+%7C+%5Cmbox%7Bdo%7D%28x_j%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(x_1,\\ldots,\\hat x_j, \\ldots, x_n | \\mbox{do}(x_j))' title='p(x_1,\\ldots,\\hat x_j, \\ldots, x_n | \\mbox{do}(x_j))' class='latex' \/> is just the value of the probability distribution in the perturbed causal model, <img src='https:\/\/s0.wp.com\/latex.php?latex=p%27%28x_1%2C%5Cldots%2Cx_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p&#039;(x_1,\\ldots,x_n)' title='p&#039;(x_1,\\ldots,x_n)' class='latex' \/>.  To compute the value of the probability in the perturbed causal model, note that the probability distribution in the original causal model was given by<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+++p%28x_1%2C%5Cldots%2Cx_n%29+%3D+%5Cprod_k+p%28x_k%7C+%5Cmbox%7Bpa%7D%28x_k%29%29%2C+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='   p(x_1,\\ldots,x_n) = \\prod_k p(x_k| \\mbox{pa}(x_k)), ' title='   p(x_1,\\ldots,x_n) = \\prod_k p(x_k| \\mbox{pa}(x_k)), ' class='latex' \/>\n<p>where the product on the right is over all vertices in the causal model.  This expression remains true for the perturbed causal model, but a single term on the right-hand side changes: the conditional probability for the <img src='https:\/\/s0.wp.com\/latex.php?latex=x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_j' title='x_j' class='latex' \/> term.  
In particular, this term gets changed from <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28x_j%7C+%5Cmbox%7Bpa%7D%28x_j%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(x_j| \\mbox{pa}(x_j))' title='p(x_j| \\mbox{pa}(x_j))' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1' title='1' class='latex' \/>, since we have fixed the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X_j' title='X_j' class='latex' \/> to be <img src='https:\/\/s0.wp.com\/latex.php?latex=x_j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_j' title='x_j' class='latex' \/>.  As a result we have:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+p%28x_1%2C%5Cldots%2C%5Chat+x_j%2C%5Cldots%2Cx_n+%7C+%5Cmbox%7Bdo%7D%28x_j%29%29++%3D+%5Cfrac%7Bp%28x_1%2C%5Cldots%2Cx_n%29%7D%7Bp%28x_j%7C%5Cmbox%7Bpa%7D%28x_j%29%29%7D.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' p(x_1,\\ldots,\\hat x_j,\\ldots,x_n | \\mbox{do}(x_j))  = \\frac{p(x_1,\\ldots,x_n)}{p(x_j|\\mbox{pa}(x_j))}. ' title=' p(x_1,\\ldots,\\hat x_j,\\ldots,x_n | \\mbox{do}(x_j))  = \\frac{p(x_1,\\ldots,x_n)}{p(x_j|\\mbox{pa}(x_j))}. ' class='latex' \/>\n<p>This equation is a fundamental expression, capturing what it means for an experimenter to intervene to set the value of some particular variable in a causal model.  
It can easily be generalized to a situation where we partition the variables into two sets, <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, where <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> are the variables we suppose have been set by intervention in a (possibly hypothetical) randomized controlled experiment, and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are the remaining variables:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+%5B1%5D+%5C%2C%5C%2C%5C%2C%5C%2C+p%28Y%3Dy%7C+%5Cmbox%7Bdo%7D%28X%3Dx%29%29+%3D+%5Cfrac%7Bp%28X%3Dx%2CY%3Dy%29%7D%7B%5CPi_j+p%28X_j+%3D+x_j%7C+++%5Cmbox%7Bpa%7D%28X_j%29%29%7D.++&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' [1] \\,\\,\\,\\, p(Y=y| \\mbox{do}(X=x)) = \\frac{p(X=x,Y=y)}{\\Pi_j p(X_j = x_j|   \\mbox{pa}(X_j))}.  ' title=' [1] \\,\\,\\,\\, p(Y=y| \\mbox{do}(X=x)) = \\frac{p(X=x,Y=y)}{\\Pi_j p(X_j = x_j|   \\mbox{pa}(X_j))}.  ' class='latex' \/>\n<p>Note that on the right-hand side the values for <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Cmbox%7Bpa%7D%28X_j%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\mbox{pa}(X_j)' title='\\mbox{pa}(X_j)' class='latex' \/> are assumed to be given by the appropriate values from <img src='https:\/\/s0.wp.com\/latex.php?latex=x&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x' title='x' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y' title='y' class='latex' \/>.  The expression [1] can be viewed as a <em>definition<\/em> of causal conditional probabilities.  
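<\/p>
<p>To make the arithmetic in [1] concrete, here is a short sketch (my own illustration, with invented numbers) of a fully specified version of the smoking model, in which we pretend the hidden factor&#8217;s conditional probabilities are known.  All the truncated product does is divide out the factor for the intervened variable:<\/p>

```python
# Fully specified toy model, with invented numbers: hidden factor H
# influences smoking S, and (H, S) jointly determine cancer C.
p_h = {True: 0.3, False: 0.7}                   # p(h)
p_s_given_h = {True: 0.8, False: 0.2}           # p(S=1 | h)
p_c_given_hs = {(True, True): 0.6, (True, False): 0.6,
                (False, True): 0.1, (False, False): 0.1}  # p(C=1 | h, s)

def p_joint(h, s, c):
    """p(h, s, c) = p(h) p(s | h) p(c | h, s), the product over the graph."""
    ps = p_s_given_h[h] if s else 1 - p_s_given_h[h]
    pc = p_c_given_hs[(h, s)] if c else 1 - p_c_given_hs[(h, s)]
    return p_h[h] * ps * pc

def p_cancer_do_smoking(s=True):
    """Equation [1]: p(h, c | do(s)) = p(h, s, c) / p(s | h); then sum over h."""
    return sum(p_joint(h, s, True) / (p_s_given_h[h] if s else 1 - p_s_given_h[h])
               for h in (True, False))

# Ordinary conditioning, for comparison:
p_s = sum(p_joint(h, True, c) for h in (True, False) for c in (True, False))
p_s_and_c = sum(p_joint(h, True, True) for h in (True, False))

print(p_s_and_c / p_s)        # p(cancer | smoking): inflated by the hidden factor
print(p_cancer_do_smoking())  # p(cancer | do(smoking)): analytically 0.3*0.6 + 0.7*0.1 = 0.25
```

<p>For these invented numbers smoking has no real effect, so the interventional probability is just the population base rate of cancer, while the ordinary conditional probability is inflated by the hidden factor.<\/p>
<p>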
But although this expression is fundamental to understanding the causal calculus, it is not always useful in practice.  The problem is that the values of some of the variables on the right-hand side may not be known, and cannot be determined by experiment.  Consider, for example, the case of smoking and cancer.  Recall our causal model:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/smoking_basic_causal_model.png\" width=\"230px\"><\/p>\n<p>What we&#8217;d like is to compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/>. Unfortunately, we immediately run into a problem if we try to use the expression on the right of equation [1]: we&#8217;ve got no way of estimating the conditional probabilities for smoking given the hidden common factor.  So we can&#8217;t obviously compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/>.  And, as you can perhaps imagine, this is the kind of problem that will come up a lot whenever we&#8217;re worried about the possible influence of some hidden factor.<\/p>\n<p>All is not lost, however.  Just because we can&#8217;t compute the expression on the right of [1] directly doesn&#8217;t mean we can&#8217;t compute causal conditional probabilities in other ways, and we&#8217;ll see below how the causal calculus can help solve this kind of problem.  It&#8217;s not a complete solution &#8211; we shall see that it doesn&#8217;t always make it possible to compute causal conditional probabilities.  But it does help.  
In particular, we&#8217;ll see that although it&#8217;s not possible to compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/> for this causal model, it is possible to compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/> in a very similar causal model, one that still has a hidden factor.<\/p>\n<p>With causal conditional probabilities defined, we&#8217;re now in position to define more precisely what we mean by causal influence.  Suppose we have a causal model, and <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are distinct random variables (or disjoint subsets of random variables).  
Then we say <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> has a <em>causal influence<\/em> over <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> if there are values <img src='https:\/\/s0.wp.com\/latex.php?latex=x_1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_1' title='x_1' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=x_2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_2' title='x_2' class='latex' \/> of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y' title='y' class='latex' \/> of <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> such that <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x_1%29%29+%5Cneq+p%28y%7C%5Cmbox%7Bdo%7D%28x_2%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x_1)) \\neq p(y|\\mbox{do}(x_2))' title='p(y|\\mbox{do}(x_1)) \\neq p(y|\\mbox{do}(x_2))' class='latex' \/>.  In other words, an external experimenter who can intervene to change the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> can cause a corresponding change in the distribution of values at <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  
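<\/p>
<p>This definition can be checked mechanically in a small model.  The sketch below is my illustration, with invented numbers: a hidden factor H feeds both X and Y, and a direct X &#8594; Y effect can be switched on or off.  Intervening on X cuts the H &#8594; X edge, so the interventional distribution is just a sum over the hidden factor:<\/p>

```python
# Toy model, invented numbers: hidden H -> X and H -> Y, plus an optional
# direct X -> Y effect whose strength is the `effect` parameter.
def p_y_do_x(x, effect):
    p_h = {True: 0.3, False: 0.7}
    def p_y(h, x):
        # p(Y=1 | h, x): a baseline set by the hidden factor, plus the direct effect
        return (0.6 if h else 0.1) + (effect if x else 0.0)
    # do(x) cuts the H -> X edge, so p(y | do(x)) = sum_h p(h) p(y | h, x)
    return sum(p_h[h] * p_y(h, x) for h in (True, False))

# With a direct effect, the distribution at Y responds to the intervention,
# so X has a causal influence over Y in the sense just defined:
print(p_y_do_x(True, 0.2), p_y_do_x(False, 0.2))   # two different values

# With the direct effect switched off there is no causal influence, even
# though X and Y would still be correlated observationally (through H):
print(p_y_do_x(True, 0.0), p_y_do_x(False, 0.0))   # identical values
```

<p>The second case is the situation the tobacco companies were hypothesizing: correlation produced entirely by a hidden common cause, with no causal influence at all.<\/p>
<p>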
The following exercise gives an information-theoretic justification for this definition of causal influence: it shows that an experimenter who can intervene to set <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> can transmit information to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> if and only if the above condition for causal influence is met.<\/p>\n<h3>Exercises<\/h3>\n<ul>\n<li><strong>(The causal capacity)<\/strong> This exercise is for   people with some background in information theory.  Suppose we   define the causal capacity between <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> to be <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Cmax_%7Bp%28%5Chat+++++x%29%7D+H%28%5Chat+X%3A+%5Chat+Y%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\max_{p(\\hat     x)} H(\\hat X: \\hat Y)' title='\\max_{p(\\hat     x)} H(\\hat X: \\hat Y)' class='latex' \/>, where <img src='https:\/\/s0.wp.com\/latex.php?latex=H%28%5Ccdot%3A%5Ccdot%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H(\\cdot:\\cdot)' title='H(\\cdot:\\cdot)' class='latex' \/> is the mutual   information, the maximization is over possible distributions <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Chat+++x%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\hat   x)' title='p(\\hat   x)' class='latex' \/> for <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Chat+X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\hat X' title='\\hat X' class='latex' \/> (we use the hat to indicate that the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>   is being set by 
intervention), and <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Chat+Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\hat Y' title='\\hat Y' class='latex' \/> is the corresponding   random variable at <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, with distribution <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Chat+y%29+%3D+%5Csum_%7B%5Chat+x%7D+++p%28%5Chat+y%7C%5Cmbox%7Bdo%7D%28%5Chat+x%29%29+p%28%5Chat+x%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\hat y) = \\sum_{\\hat x}   p(\\hat y|\\mbox{do}(\\hat x)) p(\\hat x)' title='p(\\hat y) = \\sum_{\\hat x}   p(\\hat y|\\mbox{do}(\\hat x)) p(\\hat x)' class='latex' \/>.  Shannon&#8217;s noisy channel   coding theorem tells us that an external experimenter who can   intervene to set the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> can transmit information to an   observer at <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> at a maximal rate set by the causal capacity.  Show   that the causal capacity is greater than zero if and only if <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> has   a causal influence over <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>. <\/ul>\n<p>We&#8217;ve just defined a notion of causal influence between two random variables in a causal model.  What about when we say something like &#8220;Event A&#8221; causes &#8220;Event B&#8221;?  What does this mean?  
Returning to the smoking-cancer example, it seems that we would say that smoking causes cancer provided <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bdo%7D%28%5Cmbox%7Bsmoking%7D%29%29+%3E+p%28%5Cmbox%7Bcancer%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer} | \\mbox{do}(\\mbox{smoking})) &gt; p(\\mbox{cancer})' title='p(\\mbox{cancer} | \\mbox{do}(\\mbox{smoking})) &gt; p(\\mbox{cancer})' class='latex' \/>, so that if someone makes the choice to smoke, uninfluenced by other causal factors, then they would increase their chance of cancer.  Intuitively, it seems to me that this notion of events causing one another should be related to the notion of causal influence just defined above.  But I don&#8217;t yet see quite how to do that.  The first problem below suggests a conjecture in this direction:<\/p>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> Suppose <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are random variables in a causal model such   that <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28Y%3Dy+%7C+%5Cmbox%7Bdo%7D%28X%3Dx%29%29+%3E+p%28Y%3Dy%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(Y=y | \\mbox{do}(X=x)) &gt; p(Y=y)' title='p(Y=y | \\mbox{do}(X=x)) &gt; p(Y=y)' class='latex' \/> for some pair of values <img src='https:\/\/s0.wp.com\/latex.php?latex=x&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x' title='x' class='latex' \/>   and <img src='https:\/\/s0.wp.com\/latex.php?latex=y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='y' title='y' class='latex' \/>.  
Does this imply that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> exerts a causal influence on <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>?\n<li><strong>(Sum-over-paths for causal conditional probabilities?)<\/strong> I believe   a kind of sum-over-paths formulation of causal conditional   probabilities is possible, but haven&#8217;t worked out details.  The idea   is as follows (the details may be quite wrong, but I believe   something along these lines should work).  Suppose <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are   single vertices (with corresponding random variables) in a causal   model. Then I would like to show first that if <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> is not an   ancestor of <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> then <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29+%3D+p%28y%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x)) = p(y)' title='p(y|\\mbox{do}(x)) = p(y)' class='latex' \/>, i.e., intervention   does nothing.  
Second, if <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> is an ancestor of <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> then   <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/> may be obtained by summing over all directed   paths from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> in <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline X}' title='G_{\\overline X}' class='latex' \/>, and computing for each   path a contribution to the sum which is a product of conditional   probabilities along the path.  (Note that we may need to consider   the same path multiple times in the sum, since the random variables   along the path may take different values).\n<li> We used causal models in our definition of causal conditional   probabilities.  But our informal definition &#8211; imagine a   hypothetical world in which it&#8217;s possible to force a variable to   take a particular value &#8211; didn&#8217;t obviously require the use of a   causal model.  Indeed, in a real-world randomized controlled   experiment it may be that there is no underlying causal model.  This   leads me to wonder if there is some other way of formalizing the   informal definition we&#8217;ve given?\n<li> Another way of framing the last problem is that I&#8217;m concerned   about the empirical basis for causal models.  How should we go about   constructing such models?  
Are they fundamental, representing true   facts about the world, or are they modelling conveniences?  (This is   by no means a dichotomy.)  It would be useful to work through many   more examples, considering carefully the origin of the functions   <img src='https:\/\/s0.wp.com\/latex.php?latex=f_j%28%5Ccdot%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f_j(\\cdot)' title='f_j(\\cdot)' class='latex' \/> and of the auxiliary random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=Y_%7Bj%2C%5Ccdot%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y_{j,\\cdot}' title='Y_{j,\\cdot}' class='latex' \/>. <\/ul>\n<h3>d-separation<\/h3>\n<p>In this section we&#8217;ll develop a criterion that Pearl calls <em>directional separation<\/em> (<em>d-separation<\/em>, for short).  What d-separation does is let us inspect the graph of a causal model and conclude that a random variable <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> in the model can&#8217;t tell us anything about the value of another random variable <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> in the model, or vice versa. <\/p>\n<p>To understand d-separation we&#8217;ll start with a simple case, and then work through increasingly complex cases, building up our intuition. 
I&#8217;ll conclude by giving a precise definition of d-separation, and by explaining how d-separation relates to the concept of conditional independence of random variables.<\/p>\n<p>Here&#8217;s the first simple causal model:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_traverse.png\" width=\"285px\"><\/p>\n<p>Clearly, knowing <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> can in general tell us something about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> in this kind of causal model, and so in this case <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are <em>not<\/em> d-separated.  
We&#8217;ll use the term <em>d-connected<\/em> as a synonym for &#8220;not d-separated&#8221;, and so in this causal model <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-connected.<\/p>\n<p>By contrast, in the following causal model <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> don&#8217;t give us any information about each other, and so they are d-separated:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_collider.png\" width=\"287px\"><\/p>\n<p>A useful piece of terminology is to say that a vertex like the middle vertex in this model is a <em>collider<\/em> for the path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, meaning a vertex at which both edges along the path are incoming.<\/p>\n<p>What about the causal model:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_fork.png\" width=\"285px\"><\/p>\n<p>In this case, it is possible that knowing <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> will tell us something about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, because of their common ancestry.  
It&#8217;s like the way knowing the genome for one sibling can give us information about the genome of another sibling, since similarities between the genomes can be inferred from the common ancestry. We&#8217;ll call a vertex like the middle vertex in this model a <em>fork<\/em> for the path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, meaning a vertex at which both edges are outgoing.<\/p>\n<h3>Exercises<\/h3>\n<ul>\n<li> Construct an explicit causal model demonstrating the assertion   of the last paragraph.  For example, you may construct a causal   model in which <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are joined by a fork, and where <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is   actually a function of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>.\n<li> Suppose we have a path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> in a causal model.  
Let   <img src='https:\/\/s0.wp.com\/latex.php?latex=c&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='c' title='c' class='latex' \/> be the number of colliders along the path, and let <img src='https:\/\/s0.wp.com\/latex.php?latex=f&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f' title='f' class='latex' \/> be the   number of forks along the path.  Show that <img src='https:\/\/s0.wp.com\/latex.php?latex=%7Cf-c%7C&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='|f-c|' title='|f-c|' class='latex' \/> can only take the   values <img src='https:\/\/s0.wp.com\/latex.php?latex=0%2C+1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0, 1' title='0, 1' class='latex' \/>, i.e., the number of forks and colliders is   either the same or differs by at most one.  <\/ul>\n<p>We&#8217;ll say that a path (of any length) from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> that contains a collider is a <em>blocked<\/em> path.  By contrast, a path that contains no colliders is called an <em>unblocked<\/em> path.  (Note that by the above exercise, an unblocked path must contain either one or no forks.)  In general, we define <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> to be <em>d-connected<\/em> if there is an unblocked path between them.  
We define them to be <em>d-separated<\/em> if there is no such unblocked path.<\/p>\n<p>It&#8217;s worth noting that the concepts of d-separation and d-connectedness depend only on the graph topology and on which vertices <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> have been chosen.  In particular, they don&#8217;t depend on the nature of the random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, merely on the identity of the corresponding vertices.  As a result, you can determine d-separation or d-connectedness simply by inspecting the graph.  This fact &#8211; that d-separation and d-connectedness are determined by the graph &#8211; also holds for the more sophisticated notions of d-separation and d-connectedness we develop below.<\/p>\n<p>With that said, it probably won&#8217;t surprise you to learn that the concept of d-separation is closely related to whether or not the random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are independent of one another.  This is a connection you can (optionally) develop through the following exercises.  
I&#8217;ll state a much more general connection below.<\/p>\n<h3>Exercises<\/h3>\n<ul>\n<li> Suppose that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-separated.  Show that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>   are independent random variables, i.e., that <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28x%2Cy%29+%3D+p%28x%29p%28y%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(x,y) = p(x)p(y)' title='p(x,y) = p(x)p(y)' class='latex' \/>.\n<li> Suppose we have two vertices which are d-connected in a graph   <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>.  Explain how to construct a causal model on that graph such   that the random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> corresponding to those two   vertices are <em>not<\/em> independent.\n<li> The last two exercises almost but don&#8217;t quite claim that random   variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> in a causal model are independent if and only   if they are d-separated.  Why does this statement fail to be true?   How can you modify the statement to make it true? 
<\/ul>\n<p>So far, this is pretty simple stuff. It gets more complicated, however, when we extend the notion of d-separation to cases where we are conditioning on already <em>knowing<\/em> the value of one or more random variables in the causal model.  Consider, for example, the graph:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_conditioned_traverse.png\" width=\"285px\"><\/p>\n<p>(Figure A.)<\/p>\n<p>Now, if we know <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, then knowing <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> doesn&#8217;t give us any additional information about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, since by our original definition of a causal model <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is already a function of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> and some auxiliary random variables which are independent of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>.  So it makes sense to say that <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> blocks this path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, even though in the unconditioned case this path would not have been considered blocked.  
We&#8217;ll also say that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-separated, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.<\/p>\n<p>It is helpful to give a name to vertices like the middle vertex in Figure A, i.e., to vertices with one ingoing and one outgoing edge. We&#8217;ll call such vertices a <em>traverse<\/em> along the path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  Using this language, the lesson of the above discussion is that if <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> is in a traverse along a path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, then the path is blocked.<\/p>\n<p>By contrast, consider this model:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_conditioned_multiway.png\" width=\"227px\"><\/p>\n<p>In this case, knowing <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> will in general give us additional information about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, even if we know <img 
src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.  This is because while <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> blocks one path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> there is another unblocked path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  And so we say that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-connected, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.<\/p>\n<p>Another case similar to Figure A is the model with a fork:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_conditioned_fork.png\" width=\"288px\"><\/p>\n<p>Again, if we know <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, then knowing <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> as well doesn&#8217;t give us any extra information about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> (or vice versa).  
So we&#8217;ll say that in this case <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> is blocking the path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, even though in the unconditioned case this path would not have been considered blocked. Again, in this example <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-separated, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.<\/p>\n<p>The lesson of this model is that if <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> is located at a fork along a path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, then the path is blocked.<\/p>\n<p>A subtlety arises when we consider a collider:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_conditioned_collider.png\" width=\"285px\"><\/p>\n<p>(Figure B.)<\/p>\n<p>In the unconditioned case this would have been considered a blocked path.  
And, naively, it seems as though this should still be the case: at first sight (at least according to my intuition) it doesn&#8217;t seem very likely that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> can give us any additional information about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> (or vice versa), even given that <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> is known.  Yet we should be cautious, because the argument we made for the graph in Figure A breaks down: we can&#8217;t say, as we did for Figure A, that <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is a function of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> and some auxiliary independent random variables.<\/p>\n<p>In fact, we&#8217;re wise to be cautious because <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> really <em>can<\/em> tell us something extra about one another, given a knowledge of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.  This is a phenomenon which Pearl calls <em>Berkson&#8217;s paradox<\/em>.  
He gives the example of a graduate school in music which will admit a student (a possibility encoded in the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>) if either they have high undergraduate grades (encoded in <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>) or some other evidence that they are exceptionally gifted at music (encoded in <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>).  It would not be surprising if these two attributes were anticorrelated amongst students in the program, e.g., students who were admitted on the basis of exceptional gifts would be more likely than otherwise to have low grades.  And so in this case knowledge of <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> (exceptional gifts) would give us knowledge of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> (likely to have low grades), conditioned on knowledge of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> (they were accepted into the program).<\/p>\n<p>Another way of seeing Berkson&#8217;s paradox is to construct an explicit causal model for the graph in Figure B.  
Consider, for example, a causal model in which <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are independent random bits, <img src='https:\/\/s0.wp.com\/latex.php?latex=0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0' title='0' class='latex' \/> or <img src='https:\/\/s0.wp.com\/latex.php?latex=1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1' title='1' class='latex' \/>, chosen with equal probabilities <img src='https:\/\/s0.wp.com\/latex.php?latex=1%2F2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1\/2' title='1\/2' class='latex' \/>.  We suppose that <img src='https:\/\/s0.wp.com\/latex.php?latex=Z+%3D+X+%5Coplus+Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z = X \\oplus Y' title='Z = X \\oplus Y' class='latex' \/>, where <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Coplus&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\oplus' title='\\oplus' class='latex' \/> is addition modulo <img src='https:\/\/s0.wp.com\/latex.php?latex=2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='2' title='2' class='latex' \/>.  This causal model does, indeed, have the structure of Figure B.  
But given that we know the value <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, knowing the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> tells us everything about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, since <img src='https:\/\/s0.wp.com\/latex.php?latex=Y+%3D+Z+%5Coplus+X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y = Z \\oplus X' title='Y = Z \\oplus X' class='latex' \/>.<\/p>\n<p>As a result of this discussion, in the causal graph of Figure B we&#8217;ll say that <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> unblocks the path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, even though in the unconditioned case the path would have been considered blocked.  
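This XOR model is simple enough to check by direct simulation.  Here's a quick sketch (the sampling code is my own illustration, not part of the construction above): unconditionally, knowing X tells us nothing about Y, but once Z is fixed, X determines Y completely.

```python
import random

random.seed(0)
trials = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(100000)]
# Z = X xor Y, as in the explicit causal model for Figure B
data = [(x, y, x ^ y) for x, y in trials]

# Unconditionally, p(Y=1) and p(Y=1 | X=1) agree: X carries no information.
p_y1 = sum(y for _, y, _ in data) / len(data)
x1 = [y for x, y, _ in data if x == 1]
p_y1_given_x1 = sum(x1) / len(x1)

# Conditioned on Z = 0, X determines Y exactly, since Y = Z xor X.
x1z0 = [y for x, y, z in data if x == 1 and z == 0]
p_y1_given_x1_z0 = sum(x1z0) / len(x1z0)

print(p_y1, p_y1_given_x1)   # both close to 0.5
print(p_y1_given_x1_z0)      # exactly 1.0
```

The first two estimates hover near 1/2, while the conditional estimate is exactly 1: that's Berkson's paradox in its starkest form.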
And we&#8217;ll also say that in this causal graph <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-connected, conditional on <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.<\/p>\n<p>The immediate lesson from the graph of Figure B is that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> can tell us something about one another, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, if there is a path between <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> where the only collider is at <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.  
In fact, the same phenomenon can occur even in this graph:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_conditioned_collider_ancestor.png\" width=\"285px\"><\/p>\n<p>(Figure C.)<\/p>\n<p>To see this, suppose we choose <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> as in the example just described above, i.e., independent random bits, <img src='https:\/\/s0.wp.com\/latex.php?latex=0&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='0' title='0' class='latex' \/> or <img src='https:\/\/s0.wp.com\/latex.php?latex=1&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1' title='1' class='latex' \/>, chosen with equal probabilities <img src='https:\/\/s0.wp.com\/latex.php?latex=1%2F2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1\/2' title='1\/2' class='latex' \/>.  We will let the unlabelled vertex be <img src='https:\/\/s0.wp.com\/latex.php?latex=W+%3D+X+%5Coplus+Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W = X \\oplus Y' title='W = X \\oplus Y' class='latex' \/>.  And, finally, we choose <img src='https:\/\/s0.wp.com\/latex.php?latex=Z+%3D+W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z = W' title='Z = W' class='latex' \/>.  
Then we see as before that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> can tell us something about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, given that we know <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, because <img src='https:\/\/s0.wp.com\/latex.php?latex=X+%3D+Y+%5Coplus+Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X = Y \\oplus Z' title='X = Y \\oplus Z' class='latex' \/>.<\/p>\n<p>The general intuition about graphs like that in Figure C is that knowing <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> allows us to infer something about the ancestors of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, and so we must act as though those ancestors are known, too.  As a result, in this case we say that <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> unblocks the path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, since <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> has an ancestor which is a collider on the path from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  
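As with Figure B, we can sanity-check this numerically.  In the sketch below (my own illustration), W stands for the unlabelled collider vertex:

```python
import random

random.seed(1)
rows = []
for _ in range(100000):
    x, y = random.randint(0, 1), random.randint(0, 1)
    w = x ^ y            # the unlabelled collider vertex in Figure C
    rows.append((x, y, w))  # Z = W, so W's value is Z's value

# Knowing Z alone leaves Y completely uncertain...
z1 = [y for x, y, z in rows if z == 1]
p_y1_given_z1 = sum(z1) / len(z1)

# ...but knowing X as well fixes Y, since X = Y xor Z.
x0z1 = [y for x, y, z in rows if x == 0 and z == 1]
p_y1_given_x0_z1 = sum(x0z1) / len(x0z1)

print(p_y1_given_z1)      # close to 0.5
print(p_y1_given_x0_z1)   # exactly 1.0
```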
And so in this case <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> is d-connected to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.<\/p>\n<p>Given the discussion of Figure C that we&#8217;ve just had, you might wonder why forks or traverses which are ancestors of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> can&#8217;t block a path, for similar reasons.  For instance, why don&#8217;t we consider <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> to be d-separated, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, in the following graph:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_conditioned_traverse_ancestor.png\" width=\"287px\"><\/p>\n<p>The reason, of course, is that it&#8217;s easy to construct examples where <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> tells us something about <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> in <em>addition<\/em> to what we already know from <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.  
And so we can&#8217;t consider <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> to be d-separated, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, in this example.<\/p>\n<p>These examples motivate the following definition:<\/p>\n<p><strong>Definition:<\/strong> Let <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>, <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> be disjoint subsets of vertices in a causal model.  Consider a path from a vertex in <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to a vertex in <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  We say the path is <em>blocked<\/em> by <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> if the path contains either: (a) a collider which is not an ancestor of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, or (b) a fork which is in <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, or (c) a traverse which is in <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.  
We say the path is <em>unblocked<\/em> if it is not blocked.  We say that <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are <em>d-connected<\/em>, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, if there is an unblocked path between some vertex in <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and some vertex in <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are <em>d-separated<\/em>, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, if they are not d-connected.<\/p>\n<p>Saying &#8220;<img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-separated given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>&#8221; is a bit of a mouthful, and so it&#8217;s helpful to have an abbreviated notation.  We&#8217;ll use the abbreviation <img src='https:\/\/s0.wp.com\/latex.php?latex=%28X+%5Cperp+Y%7CZ%29_G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(X \\perp Y|Z)_G' title='(X \\perp Y|Z)_G' class='latex' \/>.  
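The definition above is mechanical enough to turn straight into code. Here's a minimal sketch (my own illustration, not from Pearl): a DAG is represented as a set of directed edges, we enumerate the undirected paths between two vertices, and apply the blocking conditions (a)-(c), taking the convention used above that a vertex counts among its own ancestors:

```python
def ancestors(dag, nodes):
    """Vertices with a directed path into `nodes` (a vertex is its own ancestor)."""
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for a, b in dag:  # directed edge a -> b
            if b in result and a not in result:
                result.add(a)
                changed = True
    return result

def paths(dag, x, y, visited=()):
    """Yield every undirected path from x to y as a tuple of vertices."""
    if x == y:
        yield (x,)
        return
    for a, b in dag:
        nxt = b if a == x else a if b == x else None
        if nxt is not None and nxt not in visited:
            for rest in paths(dag, nxt, y, visited + (x,)):
                yield (x,) + rest

def blocked(dag, path, z):
    """Apply conditions (a)-(c) of the definition to a single path."""
    anc_z = ancestors(dag, z)
    for u, v, w in zip(path, path[1:], path[2:]):
        u_in, w_in = (u, v) in dag, (w, v) in dag  # do the edges point into v?
        if u_in and w_in and v not in anc_z:
            return True   # (a) a collider which is not an ancestor of Z
        if not u_in and not w_in and v in z:
            return True   # (b) a fork which is in Z
        if u_in != w_in and v in z:
            return True   # (c) a traverse which is in Z
    return False

def d_separated(dag, xs, ys, z):
    return all(blocked(dag, p, z)
               for x in xs for y in ys for p in paths(dag, x, y))

# Example: A -> B <- C (collider at B), with B -> D.
dag = {('A', 'B'), ('C', 'B'), ('B', 'D')}
print(d_separated(dag, {'A'}, {'C'}, set()))   # True: the collider blocks
print(d_separated(dag, {'A'}, {'C'}, {'D'}))   # False: conditioning on D unblocks
```

In the example, A and C are d-separated unconditionally, but conditioning on D, which has the collider B as an ancestor, unblocks the path, just as in the discussion of Figure C above.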
Note that this notation includes the graph <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>; we&#8217;ll sometimes omit the graph when the context is clear.  We&#8217;ll write <img src='https:\/\/s0.wp.com\/latex.php?latex=%28X+%5Cperp+Y%29_G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(X \\perp Y)_G' title='(X \\perp Y)_G' class='latex' \/> to denote unconditional d-separation.<\/p>\n<p>As an aside, Pearl uses a similar but slightly different notation for d-separation, namely <img src='https:\/\/s0.wp.com\/latex.php?latex=%28X+%5Cperp+%5C%21+%5C%21+%5Cperp+Y%7CZ%29_G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(X \\perp \\! \\! \\perp Y|Z)_G' title='(X \\perp \\! \\! \\perp Y|Z)_G' class='latex' \/>.  Unfortunately, while the symbol <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Cperp+%5C%21+%5C%21+%5Cperp&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\perp \\! \\! \\perp' title='\\perp \\! \\! \\perp' class='latex' \/> looks like a LaTeX symbol, it&#8217;s not, but is most easily produced using a rather dodgy LaTeX hack. Instead of using that hack over and over again, I&#8217;ve adopted a more standard LaTeX notation.<\/p>\n<p>While I&#8217;m making asides, let me make a second: when I was first learning this material, I found the &#8220;d&#8221; for &#8220;directional&#8221; in d-separation and d-connected rather confusing.  It suggested to me that the key thing was having a directed path from one vertex to the other, and that the complexities of colliders, forks, and so on were a sideshow.  Of course, they&#8217;re not, they&#8217;re central to the whole discussion.  For this reason, when I was writing these notes I considered changing the terminology to i-separated and i-connected, for informationally-separated and informationally-connected. 
Ultimately I decided not to do this, but I thought mentioning the issue might be helpful, in part to reassure readers (like me) who thought the &#8220;d&#8221; seemed a little mysterious.<\/p>\n<p>Okay, that&#8217;s enough asides, let&#8217;s get back to the main track of discussion.<\/p>\n<p>We saw earlier that (unconditional) d-separation is closely connected to the independence of random variables.  It probably won&#8217;t surprise you to learn that conditional d-separation is closely connected to conditional independence of random variables.  Recall that two sets of random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are <em>conditionally independent<\/em>, given a third set of random variables <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, if <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28x%2Cy%7Cz%29+%3D+p%28x%7Cz%29p%28y%7Cz%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(x,y|z) = p(x|z)p(y|z)' title='p(x,y|z) = p(x|z)p(y|z)' class='latex' \/>.  
The following theorem shows that d-separation gives a criterion for when conditional independence occurs in a causal model:<\/p>\n<p><strong>Theorem (graphical criterion for conditional independence):<\/strong> Let <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/> be a graph, and let <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>, <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> be disjoint subsets of vertices in that graph.  Then <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are d-separated, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, if and only if for all causal models on <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/> the random variables corresponding to <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> are conditionally independent, given <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.<\/p>\n<p><em>(Update: Thanks to Rob Spekkens for pointing out an error in my original statement of this theorem.)<\/em><\/p>\n<p>I won&#8217;t prove the theorem here.  
However, it&#8217;s not especially difficult if you&#8217;ve followed the discussion above, and is a good problem to work through:<\/p>\n<h3>Problems<\/h3>\n<ul>\n<li> Prove the above theorem. <\/ul>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> The concept of d-separation plays a central role in the causal   calculus.  My sense is that it should be possible to find a cleaner   and more intuitive definition that substantially simplifies many   proofs.  It&#8217;d be good to spend some time trying to find such a   definition. <\/ul>\n<h3>The causal calculus<\/h3>\n<p>We&#8217;ve now got all the concepts we need to state the rules of the causal calculus.  There are three rules.  The rules look complicated at first, although they&#8217;re easy to use once you get familiar with them.  For this reason I&#8217;ll start by explaining the intuition behind the first rule, and how you should think about that rule.  Having understood how to think about the first rule it&#8217;s easy to get the hang of all three rules, and so after that I&#8217;ll just outright state all three rules.<\/p>\n<p>In what follows, we have a causal model on a graph <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>, and <img src='https:\/\/s0.wp.com\/latex.php?latex=W%2C+X%2C+Y%2C+Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W, X, Y, Z' title='W, X, Y, Z' class='latex' \/> are disjoint subsets of the variables in the causal model.  
Recall also that <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline X}' title='G_{\\overline X}' class='latex' \/> denotes the perturbed graph in which all edges pointing to <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> from the parents of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> have been deleted.  This is the graph which results when an experimenter intervenes to set the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>, overriding other causal influences on <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>.<\/p>\n<p><strong>Rule 1: When can we ignore observations:<\/strong> I&#8217;ll begin by stating the first rule in all its glory, but don&#8217;t worry if you don&#8217;t immediately grok the whole rule.  Instead, just take a look, and try to start getting your head around it.  What we&#8217;ll do then is look at some simple special cases, which are easily understood, and gradually build up to an understanding of what the full rule is saying.<\/p>\n<p>Okay, so here&#8217;s the first rule of the causal calculus.  
What it tells us is that when <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Y+%5Cperp+Z%7CW%2CX%29_%7BG_%7B%5Coverline+X%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Y \\perp Z|W,X)_{G_{\\overline X}}' title='(Y \\perp Z|W,X)_{G_{\\overline X}}' class='latex' \/>, then we can ignore the observation of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> in computing the probability of <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, conditional on both <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/> and an intervention to set <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%2Cz%29+%3D+p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%29+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' p(y|w,\\mbox{do}(x),z) = p(y|w,\\mbox{do}(x)) ' title=' p(y|w,\\mbox{do}(x),z) = p(y|w,\\mbox{do}(x)) ' class='latex' \/>\n<p>To understand why this rule is true, and what it means, let&#8217;s start with a much simpler case.  Let&#8217;s look at what happens to the rule when there are no <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> or <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/> variables in the mix.  
In this case, our starting assumption simply becomes that <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is d-separated from <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> in the original (unperturbed) graph <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>.  There&#8217;s no need to worry about <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline X}' title='G_{\\overline X}' class='latex' \/> because there&#8217;s no <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> variable whose value is being set by intervention.  In this circumstance we have <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Y+%5Cperp+Z%29_G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Y \\perp Z)_G' title='(Y \\perp Z)_G' class='latex' \/>, so <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is independent of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>.  
But the statement of the rule in this case is merely that <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7Cz%29+%3D+p%28y%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|z) = p(y)' title='p(y|z) = p(y)' class='latex' \/>, which is, indeed, equivalent to the standard definition of <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> being independent.<\/p>\n<p>In other words, the first rule is simply a generalization of what it means for <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> to be independent.  The full rule generalizes the notion of independence in two ways: (1) by adding in an extra variable <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/> whose value has been determined by passive observation; and (2) by adding in an extra variable <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> whose value has been set by intervention.  We&#8217;ll consider these two ways of generalizing separately in the next two paragraphs.<\/p>\n<p>We begin with generalization (1), i.e., there is no <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> variable in the mix.  
In this case, our starting assumption becomes that <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is d-separated from <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, given <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/>, in the graph <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>.  By the graphical criterion for conditional independence discussed in the last section this means that <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is conditionally independent of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, given <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/>, and so <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7Cz%2Cw%29+%3D+p%28y%7Cw%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|z,w) = p(y|w)' title='p(y|z,w) = p(y|w)' class='latex' \/>, which is exactly the statement of the rule.  
And so the first rule can be viewed as a generalization of what it means for <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> to be independent, conditional on <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/>.<\/p>\n<p>Now let&#8217;s look at the other generalization, (2), in which we&#8217;ve added an extra variable <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> whose value has been set by intervention, and where there is no <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/> variable in the mix.  In this case, our starting assumption becomes that <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is d-separated from <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, given <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>, in the perturbed graph <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline X}' title='G_{\\overline X}' class='latex' \/>.  
In this case, the graphical criterion for conditional independence tells us that <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is independent of <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/>, conditional on the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> being set by experimental intervention, and so <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%2Cz%29+%3D+p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x),z) = p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x),z) = p(y|\\mbox{do}(x))' class='latex' \/>. Again, this is exactly the statement of the rule.<\/p>\n<p>The full rule, of course, merely combines both these generalizations in the obvious way.  It is really just an explicit statement of the content of the graphical criterion for conditional independence, in a context where <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/> has been observed, and the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> set by experimental intervention.<\/p>\n<p><strong>The rules of the causal calculus:<\/strong> All three rules of the causal calculus follow a similar template to the first rule: they provide ways of using facts about the causal structure (notably, d-separation) to make inferences about conditional causal probabilities.  I&#8217;ll now state all three rules.  The intuition behind rules 2 and 3 won&#8217;t necessarily be entirely obvious, but after our discussion of rule 1 the remaining rules should at least appear plausible and comprehensible.  
I&#8217;ll have a bit more to say about intuition below.<\/p>\n<p>As above, we have a causal model on a graph <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>, and <img src='https:\/\/s0.wp.com\/latex.php?latex=W%2C+X%2C+Y%2C+Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W, X, Y, Z' title='W, X, Y, Z' class='latex' \/> are disjoint subsets of the variables in the causal model.  <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+++X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline   X}' title='G_{\\overline   X}' class='latex' \/> denotes the perturbed graph in which all edges pointing to <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> from the parents of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> have been deleted.  <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Cunderline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\underline X}' title='G_{\\underline X}' class='latex' \/> denotes the graph in which all edges pointing out from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to the children of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> have been deleted.  
We will also freely use notations like <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+W%2C+%5Coverline+X%2C+%5Cunderline+Z%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline W, \\overline X, \\underline Z}' title='G_{\\overline W, \\overline X, \\underline Z}' class='latex' \/> to denote combinations of these operations.<\/p>\n<p><strong>Rule 1: When can we ignore observations:<\/strong> Suppose <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Y+%5Cperp+Z%7CW%2CX%29_%7BG_%7B%5Coverline+X%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Y \\perp Z|W,X)_{G_{\\overline X}}' title='(Y \\perp Z|W,X)_{G_{\\overline X}}' class='latex' \/>.  Then:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+++p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%2Cz%29+%3D+p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='   p(y|w,\\mbox{do}(x),z) = p(y|w,\\mbox{do}(x)). ' title='   p(y|w,\\mbox{do}(x),z) = p(y|w,\\mbox{do}(x)). ' class='latex' \/>\n<p><strong>Rule 2: When can we ignore the act of intervention:<\/strong> Suppose <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Y+%5Cperp+Z%7CW%2CX%29_%7BG_%7B%5Coverline+X%2C%5Cunderline+Z%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Y \\perp Z|W,X)_{G_{\\overline X,\\underline Z}}' title='(Y \\perp Z|W,X)_{G_{\\overline X,\\underline Z}}' class='latex' \/>.  Then:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%2C%5Cmbox%7Bdo%7D%28z%29%29+%3D+p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%2Cz%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' p(y|w,\\mbox{do}(x),\\mbox{do}(z)) = p(y|w,\\mbox{do}(x),z). ' title=' p(y|w,\\mbox{do}(x),\\mbox{do}(z)) = p(y|w,\\mbox{do}(x),z). 
' class='latex' \/>\n<p><strong>Rule 3: When can we ignore an intervention variable entirely:<\/strong> Let <img src='https:\/\/s0.wp.com\/latex.php?latex=Z%28W%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z(W)' title='Z(W)' class='latex' \/> denote the set of nodes in <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> which are not ancestors of <img src='https:\/\/s0.wp.com\/latex.php?latex=W&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='W' title='W' class='latex' \/>.  Suppose <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Y+%5Cperp+Z%7CW%2CX%29_%7BG_%7B%5Coverline+X%2C+%5Coverline%7BZ%28W%29%7D%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Y \\perp Z|W,X)_{G_{\\overline X, \\overline{Z(W)}}}' title='(Y \\perp Z|W,X)_{G_{\\overline X, \\overline{Z(W)}}}' class='latex' \/>. Then:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%2C%5Cmbox%7Bdo%7D%28z%29%29+%3D+p%28y%7Cw%2C%5Cmbox%7Bdo%7D%28x%29%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' p(y|w,\\mbox{do}(x),\\mbox{do}(z)) = p(y|w,\\mbox{do}(x)). ' title=' p(y|w,\\mbox{do}(x),\\mbox{do}(z)) = p(y|w,\\mbox{do}(x)). ' class='latex' \/>\n<p>In a sense, all three rules are statements of conditional independence.  The first rule tells us when we can ignore an observation.  The second rule tells us when we can ignore the <em>act<\/em> of intervention (although that doesn&#8217;t necessarily mean we can ignore the value of the variable being intervened with).  And the third rule tells us when we can ignore an intervention entirely, both the act of intervention, and the value of the variable being intervened with.<\/p>\n<p>I won&#8217;t prove rule 2 or rule 3 &#8211; this post is already quite long enough.  (If I ever significantly revise the post I may include the proofs).  
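To make rule 2 a little more concrete, here's a minimal invented example where it applies: Z and W are independent causes of Y. Cutting the edge out of Z leaves Z disconnected from Y, so the precondition (Y ⊥ Z|W) holds in the graph with Z's outgoing edge deleted, and intervening on Z should be indistinguishable from observing it:

```python
from itertools import product

# Graph: Z -> Y <- W, with Z and W independent causes.  Tables invented.
p_zv = {0: 0.45, 1: 0.55}
p_wv = {0: 0.3, 1: 0.7}
p_y1 = {(0, 0): 0.05, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.95}  # p(y=1|z,w)

def p_y(y, z, w):
    q = p_y1[(z, w)]
    return q if y == 1 else 1 - q

def joint(z, w, y):          # observational distribution
    return p_zv[z] * p_wv[w] * p_y(y, z, w)

def joint_do(z_set, w, y):   # interventional: the factor p(z) is removed
    return p_wv[w] * p_y(y, z_set, w)

for z, w, y in product((0, 1), repeat=3):
    # p(y | w, do(z)):
    lhs = joint_do(z, w, y) / sum(joint_do(z, w, yy) for yy in (0, 1))
    # p(y | w, z):
    rhs = joint(z, w, y) / sum(joint(z, w, yy) for yy in (0, 1))
    assert abs(lhs - rhs) < 1e-12  # intervening and observing agree here
print("p(y|w,do(z)) = p(y|w,z): nothing upstream of Z confounds it")
```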
The important thing to take away from these rules is that they give us conditions on the structure of causal models so that we know when we can ignore observations, acts of intervention, or even entire variables that have been intervened with.  This is obviously a powerful set of tools to be working with in manipulating conditional causal probabilities!<\/p>\n<p>Indeed, according to Pearl there&#8217;s even a sense in which this set of rules is <em>complete<\/em>, meaning that using these rules you can identify all causal effects in a causal model.  I haven&#8217;t yet understood the proof of this result, or even exactly what it means, but thought I&#8217;d mention it.  The proof is in papers by <a href=\"http:\/\/ftp.cs.ucla.edu\/pub\/stat_ser\/r329-uai.pdf\">Shpitser and   Pearl<\/a> and <a href=\"http:\/\/www.cse.sc.edu\/~mgv\/papers\/HuangValtortaUAI06.pdf\">Huang   and Valtorta<\/a>.  If you&#8217;d like to see the proofs of the rules of the calculus, you can either have a go at proving them yourself, or you can <a href=\"http:\/\/ftp.cs.ucla.edu\/pub\/stat_ser\/R218-B.pdf\">read the   proof<\/a>.<\/p>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> Suppose the conditions of rules 1 and 2 hold.  Can we deduce   that the conditions of rule 3 also hold? <\/ul>\n<h3>Using the causal calculus to analyse the smoking-lung cancer   connection<\/h3>\n<p>We&#8217;ll now use the causal calculus to analyse the connection between smoking and lung cancer.  
Earlier, I introduced a simple causal model of this connection:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/smoking_basic_causal_model.png\" width=\"230px\"><\/p>\n<p>The great benefit of this model was that it included as special cases both the hypothesis that smoking causes cancer and the hypothesis that some hidden causal factor was responsible for both smoking and cancer.<\/p>\n<p>It turns out, unfortunately, that the causal calculus doesn&#8217;t help us analyse this model.  I&#8217;ll explain why that&#8217;s the case below.  However, rather than worrying about this, at this stage it&#8217;s more instructive to work through an example showing how the causal calculus <em>can<\/em> be helpful in analysing a similar but slightly modified causal model. So although this modification looks a little mysterious at first, for now I hope you&#8217;ll be willing to accept it as given.<\/p>\n<p>The way I&#8217;m going to modify the causal model is by introducing an extra variable, namely, whether someone has appreciable amounts of tar in their lungs or not:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_modified_smoking_model.png\" width=\"260px\"><\/p>\n<p>(By tar, I don&#8217;t mean &#8220;tar&#8221; literally, but rather all the material deposits found as a result of smoking.)<\/p>\n<p>This causal model is a plausible modification of the original causal model.  It is at least plausible to suppose that smoking causes tar in the lungs and that those deposits in turn cause cancer.  But if the hidden causal factor is genetic, as the tobacco companies argued was the case, then it seems highly unlikely that the genetic factor caused tar in the lungs, except by the indirect route of causing those people to smoke.  (I&#8217;ll come back to what happens if you refuse to accept this line of reasoning.  
For now, just go with it.)<\/p>\n<p>Our goal in this modified causal model is to compute probabilities like <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C%5Cmbox%7Bdo%7D%28smoking%29%29+%3D+p%28y%7C+%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}|\\mbox{do}(smoking)) = p(y| \\mbox{do}(x))' title='p(\\mbox{cancer}|\\mbox{do}(smoking)) = p(y| \\mbox{do}(x))' class='latex' \/>.  What we&#8217;ll show is that the causal calculus lets us compute this probability <em>entirely<\/em> in terms of probabilities like <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7Cz%29%2C+p%28z%7Cy%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|z), p(z|y)' title='p(y|z), p(z|y)' class='latex' \/> and other probabilities that <em>don&#8217;t<\/em> involve an intervention, i.e., that don&#8217;t involve <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Cmbox%7Bdo%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\mbox{do}' title='\\mbox{do}' class='latex' \/>.  <\/p>\n<p>This means that we can determine <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C%5Cmbox%7Bdo%7D%28smoking%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}|\\mbox{do}(smoking))' title='p(\\mbox{cancer}|\\mbox{do}(smoking))' class='latex' \/> <em>without<\/em> needing to know anything about the hidden factor.  We won&#8217;t even need to know the <em>nature<\/em> of the hidden factor.  
It also means that we can determine <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C%5Cmbox%7Bdo%7D%28smoking%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}|\\mbox{do}(smoking))' title='p(\\mbox{cancer}|\\mbox{do}(smoking))' class='latex' \/> without needing to intervene to force someone to smoke or not smoke, i.e., to set the value for <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>.<\/p>\n<p>In other words, the causal calculus lets us do something that seems almost miraculous: we can figure out the probability that someone would get cancer given that they are in the smoking group in a randomized controlled experiment, without needing to do the randomized controlled experiment.  And this is true even though there may be a hidden causal factor underlying both smoking and cancer.<\/p>\n<p>Okay, so how do we compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C%5Cmbox%7Bdo%7D%28smoking%29%29+%3D+p%28y%7C+%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}|\\mbox{do}(smoking)) = p(y| \\mbox{do}(x))' title='p(\\mbox{cancer}|\\mbox{do}(smoking)) = p(y| \\mbox{do}(x))' class='latex' \/>?<\/p>\n<p>The obvious first question to ask is whether we can apply rule 2 or rule 3 directly to the conditional causal probability <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/>.<\/p>\n<p>If rule 2 applies, for example, it would say that intervention doesn&#8217;t matter, and so <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29+%3D+p%28y%7Cx%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x)) = p(y|x)' title='p(y|\\mbox{do}(x)) = p(y|x)' class='latex' \/>.  Intuitively, this seems unlikely.  
We&#8217;d expect that intervention really can change the probability of cancer given smoking, because intervention would override the hidden causal factor.<\/p>\n<p>If rule 3 applies, it would say that <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29+%3D+p%28y%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x)) = p(y)' title='p(y|\\mbox{do}(x)) = p(y)' class='latex' \/>, i.e., that an intervention to force someone to smoke has no impact on whether they get cancer.  This seems even more unlikely than rule 2 applying.<\/p>\n<p>However, as practice and a warm up, let&#8217;s work through the details of seeing whether rule 2 or rule 3 can be applied directly to <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/>.<\/p>\n<p>For rule 2 to apply we need <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Y%5Cperp+X%29_%7BG_%7B%5Cunderline+X%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Y\\perp X)_{G_{\\underline X}}' title='(Y\\perp X)_{G_{\\underline X}}' class='latex' \/>.  
To check whether this is true, recall that <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Cunderline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\underline X}' title='G_{\\underline X}' class='latex' \/> is the graph with the edges pointing out from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> deleted:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_calculation_rule_2.png\" width=\"199px\"><\/p>\n<p>Obviously, <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is not d-separated from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> in this graph, since <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> have a common ancestor.  This reflects the fact that the hidden causal factor indeed does influence both <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  So we can&#8217;t apply rule 2.<\/p>\n<p>What about rule 3?  For this to apply we&#8217;d need <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Y+%5Cperp+X%29_%7BG_%7B%5Coverline+X%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Y \\perp X)_{G_{\\overline X}}' title='(Y \\perp X)_{G_{\\overline X}}' class='latex' \/>.  
Recall that <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Coverline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\overline X}' title='G_{\\overline X}' class='latex' \/> is the graph with the edges pointing toward <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> deleted:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_calculation_rule_3.png\" width=\"197px\"><\/p>\n<p>Again, <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> is not d-separated from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>, in this case because we have an unblocked path directly from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  This reflects our intuition that the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> can influence <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>, even when the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> has been set by intervention.  So we can&#8217;t apply rule 3.<\/p>\n<p>Okay, so we can&#8217;t apply the rules of the causal calculus directly to determine <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/>.  Is there some indirect way we can determine this probability?  
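<\/p>\n<p>To make those two checks concrete, here&#8217;s a short Python sketch of the reasoning.  (This is my own illustration, not part of the formal development.  It uses the fact that with an empty conditioning set every collider on a path is blocked, so two variables are d-connected exactly when they share a common ancestor, counting each variable as an ancestor of itself.  I&#8217;ve written U for the hidden causal factor.)<\/p>

```python
# A toy d-separation check for the smoking model (a sketch, not from Pearl).
# Nodes: U (the hidden factor), X (smoking), Z (tar), Y (cancer).
# With an empty conditioning set, every collider is blocked, so X and Y are
# d-connected iff they share a common ancestor (each node counts as its own
# ancestor).

def ancestors(node, edges):
    """All ancestors of node (including node itself) in the DAG."""
    found = {node}
    frontier = [node]
    while frontier:
        n = frontier.pop()
        for a, b in edges:
            if b == n and a not in found:
                found.add(a)
                frontier.append(a)
    return found

def d_separated_marginally(u, v, edges):
    """d-separation given the empty conditioning set: no common ancestor."""
    return not (ancestors(u, edges) & ancestors(v, edges))

model = [("U", "X"), ("X", "Z"), ("Z", "Y"), ("U", "Y")]

# Rule 2 needs (Y indep X) in the graph with edges *out of* X deleted:
g_underline_x = [(a, b) for (a, b) in model if a != "X"]
print(d_separated_marginally("X", "Y", g_underline_x))  # False: common ancestor U

# Rule 3 needs (Y indep X) in the graph with edges *into* X deleted:
g_overline_x = [(a, b) for (a, b) in model if b != "X"]
print(d_separated_marginally("X", "Y", g_overline_x))   # False: path X -> Z -> Y

# But Z *is* d-separated from X in the first graph; this is the fact that
# will let us apply rule 2 to p(z|do(x)) shortly:
print(d_separated_marginally("X", "Z", g_underline_x))  # True
```

\n<p>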
An experienced probabilist would at this point instinctively wonder whether it would help to condition on the value of <img src='https:\/\/s0.wp.com\/latex.php?latex=z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='z' title='z' class='latex' \/>, writing:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+%5B2%5D+%5C%2C%5C%2C%5C%2C%5C%2C+p%28y%7C+%5Cmbox%7Bdo%7D%28x%29%29+%3D+%5Csum_z+p%28y%7Cz%2C%5Cmbox%7Bdo%7D%28x%29%29+p%28z%7C%5Cmbox%7Bdo%7D%28x%29%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' [2] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_z p(y|z,\\mbox{do}(x)) p(z|\\mbox{do}(x)). ' title=' [2] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_z p(y|z,\\mbox{do}(x)) p(z|\\mbox{do}(x)). ' class='latex' \/>\n<p>Of course, saying an experienced probabilist would instinctively do this isn&#8217;t quite the same as explaining <em>why<\/em> one should do this! However, it is at least a moderately obvious thing to do: the only extra information we potentially have in the problem is <img src='https:\/\/s0.wp.com\/latex.php?latex=z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='z' title='z' class='latex' \/>, and so it&#8217;s certainly somewhat natural to try to introduce that variable into the problem.  As we shall see, this turns out to be a wise thing to do.<\/p>\n<h3>Exercises<\/h3>\n<ul>\n<li> I used without proof the equation <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C+%5Cmbox%7Bdo%7D%28x%29%29+%3D+%5Csum_z+++p%28y%7Cz%2C%5Cmbox%7Bdo%7D%28x%29%29+p%28z%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y| \\mbox{do}(x)) = \\sum_z   p(y|z,\\mbox{do}(x)) p(z|\\mbox{do}(x))' title='p(y| \\mbox{do}(x)) = \\sum_z   p(y|z,\\mbox{do}(x)) p(z|\\mbox{do}(x))' class='latex' \/>.  This should be intuitively   plausible, but really requires proof.  Prove that the equation is   correct. 
<\/ul>\n<p>To simplify the right-hand side of equation [2], we first note that we can apply rule 2 to the second term on the right-hand side, obtaining <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28z%7C%5Cmbox%7Bdo%7D%28x%29%29+%3D+p%28z%7Cx%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(z|\\mbox{do}(x)) = p(z|x)' title='p(z|\\mbox{do}(x)) = p(z|x)' class='latex' \/>.  To check this explicitly, note that the condition for rule 2 to apply is that <img src='https:\/\/s0.wp.com\/latex.php?latex=%28Z+%5Cperp+X%29_%7BG_%7B%5Cunderline+++++X%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(Z \\perp X)_{G_{\\underline     X}}' title='(Z \\perp X)_{G_{\\underline     X}}' class='latex' \/>. We already saw the graph <img src='https:\/\/s0.wp.com\/latex.php?latex=G_%7B%5Cunderline+X%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G_{\\underline X}' title='G_{\\underline X}' class='latex' \/> above, and, indeed, <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> is d-separated from <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> in that graph, since the only path from <img src='https:\/\/s0.wp.com\/latex.php?latex=Z&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Z' title='Z' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/> is blocked at <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/>.  As a result, we have:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+%5B3%5D+%5C%2C%5C%2C%5C%2C%5C%2C+p%28y%7C+%5Cmbox%7Bdo%7D%28x%29%29+%3D+%5Csum_z+p%28y%7Cz%2C%5Cmbox%7Bdo%7D%28x%29%29+p%28z%7Cx%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' [3] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_z p(y|z,\\mbox{do}(x)) p(z|x). 
' title=' [3] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_z p(y|z,\\mbox{do}(x)) p(z|x). ' class='latex' \/>\n<p>At this point in the presentation, I&#8217;m going to speed the discussion up, telling you what rule of the calculus to apply at each step, but not going through the process of explicitly checking that the conditions of the rule hold.  (If you&#8217;re doing a close read, you may wish to check the conditions, however.)  <\/p>\n<p>The next thing we do is to apply rule 2 to the first term on the right-hand side of equation [3], obtaining <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7Cz%2C%5Cmbox%7Bdo%7D%28x%29%29+%3D+p%28y%7C%5Cmbox%7Bdo%7D%28z%29%2C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|z,\\mbox{do}(x)) = p(y|\\mbox{do}(z),\\mbox{do}(x))' title='p(y|z,\\mbox{do}(x)) = p(y|\\mbox{do}(z),\\mbox{do}(x))' class='latex' \/>.  We then apply rule 3 to remove the <img src='https:\/\/s0.wp.com\/latex.php?latex=%5Cmbox%7Bdo%7D%28x%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\mbox{do}(x)' title='\\mbox{do}(x)' class='latex' \/>, obtaining <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7Cz%2C%5Cmbox%7Bdo%7D%28x%29%29+%3D+p%28y%7C%5Cmbox%7Bdo%7D%28z%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|z,\\mbox{do}(x)) = p(y|\\mbox{do}(z))' title='p(y|z,\\mbox{do}(x)) = p(y|\\mbox{do}(z))' class='latex' \/>. Substituting back in gives us:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+%5B4%5D+%5C%2C%5C%2C%5C%2C%5C%2C+p%28y%7C+%5Cmbox%7Bdo%7D%28x%29%29+%3D+%5Csum_z+p%28y%7C%5Cmbox%7Bdo%7D%28z%29%29+p%28z%7Cx%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' [4] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_z p(y|\\mbox{do}(z)) p(z|x). ' title=' [4] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_z p(y|\\mbox{do}(z)) p(z|x). 
' class='latex' \/>\n<p>So this means that we&#8217;ve reduced the computation of <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/> to the computation of <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28z%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(z))' title='p(y|\\mbox{do}(z))' class='latex' \/>.  This doesn&#8217;t seem terribly encouraging: we&#8217;ve merely substituted the computation of one causal conditional probability for another.  Still, let us continue plugging away, and see if we can make progress.  The obvious first thing to try is to apply rule 2 or rule 3 to simplify <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28z%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(z))' title='p(y|\\mbox{do}(z))' class='latex' \/>.  Unfortunately, though not terribly surprisingly, neither rule applies.  So what do we do?  Well, in a repeat of our strategy above, we again condition on the other variable we have available to us, in this case <img src='https:\/\/s0.wp.com\/latex.php?latex=x&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x' title='x' class='latex' \/>:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+p%28y%7C%5Cmbox%7Bdo%7D%28z%29%29+%3D+%5Csum_x+p%28y%7Cx%2C%5Cmbox%7Bdo%7D%28z%29%29+p%28x%7C%5Cmbox%7Bdo%7D%28z%29%29.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' p(y|\\mbox{do}(z)) = \\sum_x p(y|x,\\mbox{do}(z)) p(x|\\mbox{do}(z)). ' title=' p(y|\\mbox{do}(z)) = \\sum_x p(y|x,\\mbox{do}(z)) p(x|\\mbox{do}(z)). ' class='latex' \/>\n<p>Now we&#8217;re cooking!  
Rule 2 lets us simplify the first term to <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7Cx%2Cz%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|x,z)' title='p(y|x,z)' class='latex' \/>, while rule 3 lets us simplify the second term to <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28x%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(x)' title='p(x)' class='latex' \/>, and so we have <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28z%29%29+%3D+%5Csum_x+p%28y%7Cx%2Cz%29+p%28x%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(z)) = \\sum_x p(y|x,z) p(x)' title='p(y|\\mbox{do}(z)) = \\sum_x p(y|x,z) p(x)' class='latex' \/>.  To substitute this expression back into equation [4] it helps to change the summation index from <img src='https:\/\/s0.wp.com\/latex.php?latex=x&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x' title='x' class='latex' \/> to <img src='https:\/\/s0.wp.com\/latex.php?latex=x%27&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x&#039;' title='x&#039;' class='latex' \/>, since otherwise we would have a duplicate summation index.  This gives us:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=++%5B5%5D+%5C%2C%5C%2C%5C%2C%5C%2C+p%28y%7C+%5Cmbox%7Bdo%7D%28x%29%29+%3D+%5Csum_%7Bx%27z%7D+p%28y%7Cx%27%2Cz%29+p%28z%7Cx%29+p%28x%27%29+.+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='  [5] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_{x&#039;z} p(y|x&#039;,z) p(z|x) p(x&#039;) . ' title='  [5] \\,\\,\\,\\, p(y| \\mbox{do}(x)) = \\sum_{x&#039;z} p(y|x&#039;,z) p(z|x) p(x&#039;) . 
' class='latex' \/>\n<p>This is the promised expression for <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/> (i.e., for probabilities like <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/>, assuming the causal model above) in terms of quantities which may be observed directly from experimental data, and which don&#8217;t require intervention to do a randomized, controlled experiment.  Once <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/> is determined, we can compare it against <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer})' title='p(\\mbox{cancer})' class='latex' \/>.  If <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%28smoking%29%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do(smoking)})' title='p(\\mbox{cancer}| \\mbox{do(smoking)})' class='latex' \/> is larger than <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer})' title='p(\\mbox{cancer})' class='latex' \/> then we can conclude that smoking does, indeed, play a causal role in cancer.<\/p>\n<p>Something that bugs me about the derivation of equation [5] is that I don&#8217;t really know how to &#8220;see through&#8221; the calculations.  Yes, it all works out in the end, and it&#8217;s easy enough to follow along.  
Yet that&#8217;s not the same as having a deep understanding.  Too many basic questions remain unanswered: Why did we have to condition as we did in the calculation?  Was there some other way we could have proceeded? What would have happened if we&#8217;d conditioned on the value of the hidden variable?  (This is not obviously the wrong thing to do: maybe the hidden variable would ultimately drop out of the calculation).  Why is it possible to compute causal probabilities in this model, but not (as we shall see) in the model without tar?  Ideally, a deeper understanding would make the answers to some or all of these questions much more obvious. <\/p>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> Why is it so much easier to compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28z%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(z))' title='p(y|\\mbox{do}(z))' class='latex' \/> than   <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/> in the model above?  Is there some way we could   have seen that this would be the case, without needing to go through   a detailed computation?\n<li> Suppose we have a causal model <img src='https:\/\/s0.wp.com\/latex.php?latex=G&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='G' title='G' class='latex' \/>, with <img src='https:\/\/s0.wp.com\/latex.php?latex=S&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='S' title='S' class='latex' \/> a subset of   vertices for which all conditional probabilities are known.  
Is it   possible to give a simple characterization of for which subsets <img src='https:\/\/s0.wp.com\/latex.php?latex=X&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='X' title='X' class='latex' \/>   and <img src='https:\/\/s0.wp.com\/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Y' title='Y' class='latex' \/> of vertices it is possible to compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28y%7C%5Cmbox%7Bdo%7D%28x%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(y|\\mbox{do}(x))' title='p(y|\\mbox{do}(x))' class='latex' \/>   using just the conditional probabilities from <img src='https:\/\/s0.wp.com\/latex.php?latex=S&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='S' title='S' class='latex' \/>? <\/ul>\n<p>Unfortunately, I don&#8217;t know what the experimentally observed probabilities are in the smoking-tar-cancer case.  If anyone does, I&#8217;d be interested to know.  In lieu of actual data, I&#8217;ll use some toy model data suggested by Pearl; the data is quite unrealistic, but nonetheless interesting as an illustration of the use of equation [5]. 
The toy model data is as follows:<\/p>\n<p>(1) 47.5 percent of the population are nonsmokers with no tar in their lungs, and 10 percent of these get cancer.<\/p>\n<p>(2) 2.5 percent are smokers with no tar, and 90 percent get cancer.<\/p>\n<p>(3) 2.5 percent are nonsmokers with tar, and 5 percent get cancer.<\/p>\n<p>(4) 47.5 percent are smokers with tar, and 85 percent get cancer.<\/p>\n<p>In this case, we get:<\/p>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=+++p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bdo%7D%28smoking%29%29+%3D+45.25+%5C+&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='   p(\\mbox{cancer} | \\mbox{do}(smoking)) = 45.25 \\ ' title='   p(\\mbox{cancer} | \\mbox{do}(smoking)) = 45.25 \\ ' class='latex' \/>\n<p>By contrast, <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%29+%3D+47.5&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}) = 47.5' title='p(\\mbox{cancer}) = 47.5' class='latex' \/> percent, and so if this data was correct (obviously it&#8217;s not even close) it would show that smoking actually somewhat <em>reduces<\/em> a person&#8217;s chance of getting lung cancer.  This is despite the fact that <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D+%7C+%5Cmbox%7Bsmoking%7D%29+%3D+85.25&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer} | \\mbox{smoking}) = 85.25' title='p(\\mbox{cancer} | \\mbox{smoking}) = 85.25' class='latex' \/> percent, and so a naive approach to causality based on correlations alone would suggest that smoking causes cancer. In fact, in this imagined world smoking might actually be useable as a preventative treatment for cancer!  Obviously this isn&#8217;t truly the case, but it does illustrate the power of this method of analysis.<\/p>\n<p>Summing up the general lesson of the smoking-cancer example, suppose we have two competing hypotheses for the causal origin of some effect in a system, A causes C or B causes C, say.  
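<\/p>\n<p>As a quick check of the arithmetic in the toy example above, here&#8217;s a short Python sketch (my own, using only the toy numbers quoted from Pearl) that evaluates equation [5] directly:<\/p>

```python
# Evaluating equation [5] on the toy smoking-tar-cancer data.
# x = smoking (0 or 1), z = tar (0 or 1); the numbers come from the four
# population groups listed in the text.

# Joint distribution p(x, z) over smoking and tar:
p_xz = {(0, 0): 0.475, (1, 0): 0.025, (0, 1): 0.025, (1, 1): 0.475}
# Probability of cancer in each group, p(cancer | x, z):
p_c_given_xz = {(0, 0): 0.10, (1, 0): 0.90, (0, 1): 0.05, (1, 1): 0.85}

def p_x(x):
    """Marginal p(x)."""
    return sum(p_xz[(x, z)] for z in (0, 1))

def p_z_given_x(z, x):
    """Conditional p(z|x)."""
    return p_xz[(x, z)] / p_x(x)

def p_cancer_do(x):
    """Equation [5]: p(y|do(x)) = sum_{x', z} p(y|x', z) p(z|x) p(x')."""
    return sum(
        p_c_given_xz[(xp, z)] * p_z_given_x(z, x) * p_x(xp)
        for xp in (0, 1) for z in (0, 1)
    )

p_cancer = sum(p_xz[k] * p_c_given_xz[k] for k in p_xz)
p_cancer_given_smoking = sum(
    p_xz[(1, z)] * p_c_given_xz[(1, z)] for z in (0, 1)
) / p_x(1)

print(round(p_cancer_do(1), 4))          # 0.4525  (45.25 percent)
print(round(p_cancer, 4))                # 0.475   (47.5 percent)
print(round(p_cancer_given_smoking, 4))  # 0.8525  (85.25 percent)
```

\n<p>This reproduces the figures in the text: intervening to make someone smoke gives a cancer probability of 45.25 percent, slightly <em>below<\/em> the population rate of 47.5 percent, even though the observed rate among smokers is 85.25 percent.<\/p>\n<p>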
Then we should try to construct a realistic causal model which includes both hypotheses, and then use the causal calculus to attempt to distinguish the relative influence of the two causal factors, on the basis of experimentally accessible data.<\/p>\n<p>Incidentally, the kind of analysis of smoking we did above obviously wasn&#8217;t done back in the 1960s.  I don&#8217;t actually know how causality was established over the protestations that correlation doesn&#8217;t imply causation.  But it&#8217;s not difficult to think of ways you might have come up with truly convincing evidence that smoking was a causal factor.  One way would have been to look at the incidence of lung cancer in populations where smoking had only recently been introduced. Suppose, for example, that cigarettes had just been introduced into the (fictional) country of Nicotinia, and that this had been quickly followed by a rapid increase in rates of lung cancer.  If this pattern was seen across many new markets then it would be very difficult to argue that lung cancer was being caused solely by some pre-existing factor in the population.<\/p>\n<h3>Exercises<\/h3>\n<ul>\n<li> Construct toy model data where smoking increases a person&#8217;s   chance of getting lung cancer. <\/ul>\n<p>Let&#8217;s leave this smoking-tar-cancer model, and come back to our original model of smoking and lung cancer:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/smoking_basic_causal_model.png\" width=\"230px\"><\/p>\n<p>What would have happened if we&#8217;d tried to use the causal calculus to analyse this model?  I won&#8217;t go through all the details, but you can easily check that whatever rule you try to apply you quickly run into a dead end.  And so the causal calculus doesn&#8217;t seem to be any help in analysing this problem.<\/p>\n<p>This example illustrates some of the limitations of the causal calculus.  
In order to compute <img src='https:\/\/s0.wp.com\/latex.php?latex=p%28%5Cmbox%7Bcancer%7D%7C+%5Cmbox%7Bdo%7D%28smoking%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(\\mbox{cancer}| \\mbox{do}(smoking))' title='p(\\mbox{cancer}| \\mbox{do}(smoking))' class='latex' \/> we needed to assume a causal model with a particular structure:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_modified_smoking_model.png\" width=\"260px\"><\/p>\n<p>While this model is plausible, it is not beyond reproach.  You could, for example, criticise it by saying that it is not the presence of tar deposits in the lungs that causes cancer, but rather some other factor, perhaps something that is currently unknown.  This might lead us to consider a causal model with a revised structure:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_doubly_modified_smoking_model.png\" width=\"260px\"><\/p>\n<p>So we could try instead to use the causal calculus to analyse this new model.  I haven&#8217;t gone through this exercise, but I strongly suspect that if we did, we wouldn&#8217;t be able to use the rules of the causal calculus to compute the relevant probabilities.  The intuition behind this suspicion is that we can imagine a world in which the tar is a spurious side-effect of smoking, in fact entirely unrelated to lung cancer.  What causes lung cancer is really an entirely different mechanism, but we couldn&#8217;t distinguish the two from the statistics alone.  <\/p>\n<p>The point of this isn&#8217;t to say that the causal calculus is useless. It&#8217;s remarkable that we can plausibly get information about the outcome of a randomized controlled experiment without actually doing anything like that experiment.  But there are limitations.  To get that information we needed to make some presumptions about the causal structure in the system.  
Those presumptions are plausible, but not logically inevitable.  If someone questions the presumptions then it may be necessary to revise the model, perhaps adopting a more sophisticated one.  One can then use the causal calculus to attempt to analyse that more sophisticated model, but we are not guaranteed success.  It would be interesting to understand systematically when this will be possible and when it will not be. The following problems start to get at some of the issues involved.<\/p>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> Is it possible to make a more precise statement than &#8220;the   causal calculus doesn&#8217;t seem to be any help&#8221; for the original   smoking-cancer model?\n<li> Given a probability distribution over some random variables, it   would be useful to have a classification theorem describing all the   causal models in which those random variables could appear.\n<li> Extending the last problem, it&#8217;d be good to have an algorithm to   answer questions like: in the space of all possible causal models   consistent with a given set of observed probabilities, what can we   say about the possible causal probabilities?  It would also be   useful to be able to input to the algorithm some constraints on the   causal models, representing knowledge we&#8217;re already sure of.\n<li> In real-world experiments there are many practical issues that   must be addressed to design a reliable randomized, controlled   experiment.  These issues include   <a href=\"http:\/\/en.wikipedia.org\/wiki\/Selection_bias\">selection bias<\/a>,   <a href=\"http:\/\/en.wikipedia.org\/wiki\/Blind_experiment\">blinding<\/a>, and   many others.  There is an entire field of   <a href=\"http:\/\/en.wikipedia.org\/wiki\/Design_of_experiments\">experimental     design<\/a> devoted to addressing such issues.  By comparison, my   description of causal inference ignores many of these practical   issues.  
Can we integrate the best thinking on experimental design   with ideas such as causal conditional probabilities and the causal   calculus?\n<li> From a pedagogical point of view, I wonder if it might have been   better to work fully through the smoking-cancer example   <em>before<\/em> getting to the abstract statement of the rules of the   causal calculus.  Those rules can all be explained and motivated   quite nicely in the context of the smoking-cancer example, and that   may help in understanding. <\/ul>\n<h3>Conclusion<\/h3>\n<p>I&#8217;ve described just a tiny fraction of the work on causality that is now going on.  My impression as an admittedly non-expert outsider to the field is that this is an exceptionally fertile field which is developing rapidly and giving rise to many fascinating applications. Over the next few decades I expect the theory of causality will mature, and be integrated into the foundations of disciplines ranging from economics to medicine to social policy.<\/p>\n<p><strong>Causal discovery:<\/strong> One question I&#8217;d like to understand better is how to <em>discover<\/em> causal structures inside existing data sets. After all, human beings do a pretty good (though far from perfect) job at figuring out causal models from their observation of the world. I&#8217;d like to better understand how to use computers to automatically discover such causal models.  I understand that there is already quite a literature on the automated discovery of causal models, but I haven&#8217;t yet looked in much depth at that literature.  I may come back to it in a future post.<\/p>\n<p>I&#8217;m particularly fascinated by the idea of extracting causal models from very large unstructured data sets.  
The <a href=\"http:\/\/www.cs.washington.edu\/research\/knowitall\/\">KnowItAll   group<\/a> at the University of Washington (see <a href=\"https:\/\/plus.google.com\/108035303158224422698\/posts\">Oren   Etzioni<\/a> on Google Plus) have done fascinating work on a related but (probably) easier problem, the problem of open information extraction. This means taking an unstructured information source (like the web), and using it to extract facts about the real world.  For instance, using the web one would like computers to be able to learn facts like &#8220;Barack Obama is President of the United States&#8221;, without needing a human to feed it that information.  One of the things that makes this task challenging is all the misleading and difficult-to-understand information out on the web.  For instance, there are also webpages saying &#8220;George Bush is President of the United States&#8221;, which was probably true at the time the pages were written, but which is now misleading.  We can find webpages which state things like &#8220;[Let&#8217;s imagine] <a href=\"http:\/\/radar.oreilly.com\/2011\/03\/steve-jobs-president.html\">Steve   Jobs is President of the United States<\/a>&#8220;; it&#8217;s a difficult task for an unsupervised algorithm to figure out how to interpret that &#8220;Let&#8217;s imagine&#8221;.  What the KnowItAll team have done is made progress on figuring out how to learn facts in such a rich but uncontrolled environment.<\/p>\n<p>What I&#8217;m wondering is whether such techniques can be adapted to extract causal models from data?  It&#8217;d be fascinating if so, because of course humans don&#8217;t just reason with facts, they also reason with (informal) causal models that relate those facts. Perhaps causal models or a similar concept may be a good way of representing some crucial part of our knowledge of the world.<\/p>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> What systematic causal fallacies do human beings suffer from?   
We certainly often make mistakes in the causal models we extract   from our observations of the world &#8211; one example is that we often   do assume that correlation implies causation, even when that&#8217;s not   true &#8211; and it&#8217;d be nice to understand what systematic biases we   have.\n<li> Humans aren&#8217;t just good with facts and causal models.  We&#8217;re   also really good at juggling multiple causal models, testing them   against one another, finding problems and inconsistencies, and   making adjustments and integrating the results of those models, even   when the results conflict.  In essence, we have a (working,   imperfect) theory of how to deal with causal models.  Can we teach   machines to do this kind of integration of causal models?\n<li> We know that in our world the sun rising causes the rooster to   crow, but it&#8217;s possible to imagine a world in which it is the   rooster crowing that causes the sun to rise.  This could be achieved   in a suitably designed virtual world, for example.  The reason we   believe the first model is correct in our world is not intrinsic to   the data we have on roosters and sunrise, but rather depends on a   much more complex network of background knowledge.  For instance,   given what we know about roosters and the sun we can easily come up   with plausible causal mechanisms (solar photons impinging on the   rooster&#8217;s eye, say) by which the sun could cause the rooster to   crow.  There do not seem to be any similarly plausible causal models   in the other direction.  How do we determine what makes a particular   causal model plausible or not?  How do we determine the class of   plausible causal models for a given phenomenon?  Can we make this   kind of judgement automatically?  
(This is all closely related to   the last problem).\n<\/ul>\n<p><strong>Continuous-time causality:<\/strong> A peculiarity in my post is that even though we&#8217;re talking about causality, and time is presumably important, I&#8217;ve avoided any explicit mention of time.  Of course, it&#8217;s implicitly there: if I&#8217;d been a little more precise in specifying my models they&#8217;d no doubt be conditioned on events like &#8220;smoked at least a pack a day for 10 or more years&#8221;.  Of course, this way of putting time into the picture is rather coarse-grained.  In a lot of practical situations we&#8217;re interested in understanding causality in a much more temporally fine-grained way.  To explain what I mean, consider a simple model of the relationship between what we eat and our insulin levels:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/michaelnielsen.org\/ddi\/wp-content\/uploads\/2012\/01\/causality_eating_insulin.png\" width=\"250px\"><\/p>\n<p>This model represents the fact that what we eat determines our insulin levels, and our insulin levels in turn play a part in determining how hungry we feel, and thus what we eat.  But as a model, it&#8217;s quite inadequate.  In fact, there&#8217;s a much more complex feedback relationship going on, a constant back-and-forth between what we eat at any given time, and our insulin levels.  Ideally, this wouldn&#8217;t be represented by a few discrete events, but rather by a causal model that reflects the continual feedback between these possibilities. What I&#8217;d like to see developed is a theory of continuous-time causal models, which can address this sort of issue.  It would also be useful to extend the calculus to continuous spaces of events.  
So far as I know, at present the causal calculus doesn&#8217;t work with these kinds of ideas.<\/p>\n<h3>Problems for the author<\/h3>\n<ul>\n<li> Can we formulate theories like electromagnetism, general   relativity and quantum mechanics within the framework of the causal   calculus (or some generalization)?  Do we learn anything by doing so? <\/ul>\n<p><strong>Other notions of causality:<\/strong> A point I&#8217;ve glossed over in the post is how the notion of causal influence we&#8217;ve been studying relates to other notions of causality.  <\/p>\n<p>The notion we&#8217;ve been exploring is based on the notion of causality that is established by a (hopefully well-designed!) randomized controlled experiment.  To understand what that means, think of what it would mean if we used such an experiment to establish that smoking does, indeed, cause cancer.  All this means is that <em>in the   population being studied<\/em>, forcing someone to smoke will increase their chance of getting cancer.<\/p>\n<p>Now, for the practical matter of setting public health policy, that&#8217;s obviously a pretty important notion of causality.  But nothing says that we won&#8217;t tomorrow discover some population of people where no such causal influence is found.  Or perhaps we&#8217;ll find a population where smoking actively helps prevent cancer.  Both these are entirely possible.<\/p>\n<p>What&#8217;s going on is that while our notion of causality is useful for some purposes, it doesn&#8217;t necessarily say anything about the details of an underlying causal mechanism, and it doesn&#8217;t tell us how the results will apply to other populations.  In other words, while it&#8217;s a useful and important notion of causality, it&#8217;s not the only way of thinking about causality.  
Something I&#8217;d like to do is to understand better what other notions of causality are useful, and how the intervention-based approach we&#8217;ve been exploring relates to those other approaches.<\/p>\n<h3>Acknowledgments<\/h3>\n<p>Thanks to Jen Dodd, Rob Dodd, and Rob Spekkens for many discussions about causality.  Especial thanks to Rob Spekkens for pointing me toward the epilogue of Pearl&#8217;s book, which is what got me hooked on causality!  <\/p>\n<h3>Principal sources and further reading<\/h3>\n<p>A readable and stimulating overview of causal inference is the epilogue to <a href=\"http:\/\/www.amazon.com\/Causality-Reasoning-Inference-Judea-Pearl\/dp\/0521773628\">Judea   Pearl&#8217;s book<\/a>.  The epilogue, in turn, is based on a <a href=\"http:\/\/singapore.cs.ucla.edu\/LECTURE\/lecture_sec1.htm\">survey   lecture<\/a> by Pearl on causal inference.  I highly recommend getting a hold of the book and reading the epilogue; if you cannot do that, I suggest looking over the survey lecture.  A draft copy of the first edition of the entire book is <a href=\"http:\/\/bayes.cs.ucla.edu\/BOOK-99\/book-toc.html\">available<\/a> on Pearl&#8217;s website.  Unfortunately, the draft does not include the full text of the epilogue, only the survey lecture.  The lecture is still good, though, so you should look at it if you don&#8217;t have access to the full text of the epilogue.  I&#8217;ve also been told good things about the book on causality by <a href=\"http:\/\/www.amazon.com\/Causation-Prediction-Adaptive-Computation-Learning\/dp\/0262194406\">Spirtes,   Glymour and Scheines<\/a>, but haven&#8217;t yet had a chance to have a close look at it.  An unfortunate aspect of the current post is that it gives the impression that the theory of causal inference is entirely Judea Pearl&#8217;s creation.  Of course that&#8217;s far from the case, a fact which is quite evident from both Pearl&#8217;s book, and the Spirtes-Glymour-Scheines book.  
However, the particular facets I&#8217;ve chosen to focus on are due principally to Pearl and his collaborators: most of the current post is based on <a href=\"http:\/\/bayes.cs.ucla.edu\/BOOK-99\/ch3.pdf\">chapter 3<\/a> and <a href=\"http:\/\/bayes.cs.ucla.edu\/BOOK-99\/ch1.pdf\">chapter 1<\/a> of Pearl&#8217;s book, as well as a <a href=\"http:\/\/ftp.cs.ucla.edu\/pub\/stat_ser\/R212.pdf\">1994 paper<\/a> by Pearl, which established many of the key ideas of the causal calculus. Finally, for an enjoyable and informative discussion of some of the challenges involved in understanding causal inference I recommend Jonah Lehrer&#8217;s <a href=\"http:\/\/www.wired.com\/magazine\/2011\/12\/ff_causation\/all\/1\">recent   article<\/a> in <em>Wired<\/em>.<\/p>\n<p>  <em>Interested in more?  Please <a href=\"http:\/\/www.michaelnielsen.org\/ddi\/feed\/\">subscribe to this blog<\/a>, or <a href=\"http:\/\/twitter.com\/#!\/michael_nielsen\">follow me on Twitter<\/a>.  You may also enjoy reading my new book about  open science, <a href=\"http:\/\/www.amazon.com\/Reinventing-Discovery-New-Networked-Science\/dp\/product-description\/0691148902\">Reinventing Discovery<\/a>.<\/em> <\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is a commonplace of scientific discussion that correlation does not imply causation. Business Week recently ran a spoof article pointing out some amusing examples of the dangers of inferring causation from correlation. 
For example, the article points out that Facebook&#8217;s growth has been strongly correlated with the yield on Greek government bonds: (credit) Despite&hellip; <a class=\"more-link\" href=\"https:\/\/michaelnielsen.org\/ddi\/if-correlation-doesnt-imply-causation-then-what-does\/\">Continue reading <span class=\"screen-reader-text\">If correlation doesn&#8217;t imply causation, then what does?<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-17","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"_links":{"self":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/posts\/17","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/comments?post=17"}],"version-history":[{"count":0,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/posts\/17\/revisions"}],"wp:attachment":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/media?parent=17"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/categories?post=17"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/tags?post=17"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}