Joint Probability Distribution

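A Joint Probability Distribution (JPD) is a table used in probability modeling that gives the probability of every combination of values a set of variables can take. From it, one can determine the likelihood of certain events happening, given certain other events happening.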

For example, I wish to know the likelihood that my car is broken, given the fact that smoke may or may not be billowing out from under the hood. I will use two variables, Broken and Smoke, to indicate the probabilities of these events taking place. Here is a simple JPD for these variables.

===========================
       | Broken | -Broken |
===========================
Smoke  |  0.02  |  0.01   |
===========================
-Smoke |  0.06  |  0.91   |
===========================

What the table is telling us is that the probability P of my car being broken (Broken) and smoke coming up from under my hood (Smoke), P(Broken ^ Smoke), is 0.02, whereas the probability of my car not being broken (-Broken), even though smoke is coming from under my hood (Smoke), P(-Broken ^ Smoke) is 0.01.
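
As a rough sketch, the table above could be held in a Python dictionary keyed by the truth values of Smoke and Broken. The name `jpd` and the (smoke, broken) key layout are assumptions made purely for illustration.

```python
import math

# Joint probabilities from the table, keyed by (smoke, broken).
# The name `jpd` and this key layout are illustrative assumptions.
jpd = {
    (True, True): 0.02,    # P(Broken ^ Smoke)
    (True, False): 0.01,   # P(-Broken ^ Smoke)
    (False, True): 0.06,   # P(Broken ^ -Smoke)
    (False, False): 0.91,  # P(-Broken ^ -Smoke)
}

# The four entries cover every possible outcome, so they should sum to 1.
total = sum(jpd.values())
print(math.isclose(total, 1.0))
```

Looking up a single cell, such as `jpd[(True, True)]`, gives the probability of that exact combination of events.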

This comes in extremely handy when trying to determine the conditional probability of one event given another. The probability of X happening given that we know Y and only Y is written P(X|Y). This is equal to the probability of both X and Y happening divided by the probability of Y happening, provided P(Y) is not zero. So, P(X|Y) = P(X ^ Y)/P(Y).

We can use the JPD to easily compute conditional probabilities by expanding it a bit, to include the probabilities of the individual events.

==================================================
       |  Broken  | -Broken   |                  |
==================================================
Smoke  |   0.02   |   0.01    | 0.03 = P(Smoke)  |
==================================================
-Smoke |   0.06   |   0.91    | 0.97 = P(-Smoke) |
==================================================
       |   0.08   |    0.92   |                  |
       |=P(Broken)|=P(-Broken)|                  |
==================================================

The probabilities of the individual events can be obtained by summing over the rows or columns they occupy. Also, notice that P(Smoke) + P(-Smoke) = 1.00; obviously, the probability that something happens plus the probability that it doesn't must be 100%. So, using the JPD, the probability that my car is broken given the fact that smoke is coming from under my hood is:

P(Broken|Smoke) = P(Broken ^ Smoke) / P(Smoke) = 0.02 / 0.03 ≈ 0.67
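
The same calculation can be sketched in Python: sum a row or column of the table to get the probability of an individual event, then divide the joint entry by that sum. The dictionary `jpd` and the helper names below are hypothetical, chosen only for this example.

```python
import math

# Joint probabilities from the table, keyed by (smoke, broken).
# `jpd` and the helper names below are illustrative assumptions.
jpd = {
    (True, True): 0.02,
    (True, False): 0.01,
    (False, True): 0.06,
    (False, False): 0.91,
}

def marginal_smoke(smoke):
    """Sum the row for the given Smoke value over both Broken values."""
    return sum(p for (s, _b), p in jpd.items() if s == smoke)

def marginal_broken(broken):
    """Sum the column for the given Broken value over both Smoke values."""
    return sum(p for (_s, b), p in jpd.items() if b == broken)

def p_broken_given_smoke(broken, smoke):
    """P(Broken=broken | Smoke=smoke) = joint entry / P(Smoke=smoke)."""
    return jpd[(smoke, broken)] / marginal_smoke(smoke)

print(round(marginal_smoke(True), 2))
print(round(marginal_broken(True), 2))
print(round(p_broken_given_smoke(True, True), 2))
```

Swapping the arguments, for instance `p_broken_given_smoke(True, False)`, gives the chance the car is broken when no smoke is visible.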

As nice as this is, a JPD does not end up being quite as useful as one would like. For starters, these probabilities may not be easy to obtain. Second, a JPD over n boolean variables requires 2^n entries in the table, one for every combination of truth values. So, storing a JPD on a computer takes a great deal of space, and building one in the first place is hard. Using Bayes' Theorem is much more efficient, and is at the base of most artificial intelligence probabilistic inference systems. In fact, it is entirely possible that one may not be able to create a JPD without using Bayes' Theorem in the first place.
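
To get a feel for the space problem, here is a small sketch that counts the entries in a full joint table over boolean variables. Each variable doubles the number of truth-value combinations, so the count is 2 raised to the number of variables. The function name `joint_table_size` is made up for this example.

```python
from itertools import product

def joint_table_size(n):
    """Number of entries in a full joint table over n boolean variables."""
    return 2 ** n

# Enumerate the combinations explicitly for two variables as a sanity check;
# this matches the four cells of the Broken/Smoke table.
combos = list(product([True, False], repeat=2))
print(len(combos) == joint_table_size(2))

# Show how quickly the table grows as variables are added.
for n in (2, 10, 20, 30):
    print(n, joint_table_size(n))
```

Even at thirty variables the table already has over a billion entries, which is why the full table is rarely stored directly.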
