This node is meant to be a somewhat more technical retelling of the above, with a focus on how to avoid being duped.

The first way in which statistics can be distorted is through the type of sample used. Rarely is an entire population surveyed in a study. More often, a sample is taken and the data from that sample is extrapolated onto the rest of the population. It is thus vitally important that the sample be judiciously chosen.

Let’s imagine we are looking for the average height of Canadians. We choose to sample three random Canadians and measure their heights. The mean value we compute from that sample is a random variable (do follow the linkpipe on that one; the term random variable has an important meaning here). The sample mean could be any number within a particular range, but not every value in that range is equally probable. Let’s imagine that we take 50 samples of three people each and plot the 50 resulting sample means on a histogram. The mean of this histogram (the average of the fifty averages) is the population mean as nearly as we can determine it.

Each of the fifty data sets could also be plotted on a histogram. The small size of each sample means that there is a relatively high chance of an unusually tall or short person turning up in it and pulling its mean well away from the mean of the entire population. Conversely, as our sample size gets larger, any one unusual person matters less, and the distribution of the sample averages will have less spread.
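The shrinking spread is easy to see in a quick simulation. What follows is only a sketch: the population is assumed to be normally distributed with a mean of 170 cm and a standard deviation of 10 cm, numbers I made up for illustration rather than real Canadian data, and the function name `sample_means` is my own.

```python
import random
import statistics

def sample_means(sample_size, trials=50, mu=170.0, sigma=10.0, seed=0):
    """Draw `trials` samples of `sample_size` heights; return each sample's mean.

    mu and sigma (in cm) are invented population values, not real data.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        heights = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.mean(heights))
    return means

small = sample_means(3)    # fifty samples of three people each
large = sample_means(300)  # fifty samples of three hundred people each

print("spread of the fifty averages, n=3:  ", round(statistics.stdev(small), 2))
print("spread of the fifty averages, n=300:", round(statistics.stdev(large), 2))
print("average of the fifty averages, n=3: ", round(statistics.mean(small), 1))
```

Under these assumptions the fifty averages from the large samples cluster far more tightly than those from the samples of three.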

If we knew the true mean height of the Canadian population, we could mark it on the histogram from one trial. It will usually be either larger or smaller than our estimated value. How far off it is depends on the sample size of the trial. Since a larger sample represents the whole population more effectively, it makes sense that it would do a better job of estimating the true value.

About 95% of the time, the true value of the mean height of the Canadian population will be within two standard deviations of the estimated value, where the standard deviation in question is that of the histogram of sample means (often called the standard error). That standard deviation will become smaller as the sample becomes larger. This means that the region of the histogram in which the true value almost certainly lies becomes narrower when a larger sample size is used. This concept may be more familiar than you think.
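The 95% figure can be checked by brute force. The sketch below reuses the same invented population (normal, mean 170 cm, standard deviation 10 cm) and counts how often the true mean lands within two standard errors of a sample's estimate; `coverage` is a name I chose for illustration, and the standard error is estimated from each sample as its standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

def coverage(sample_size=30, trials=2000, mu=170.0, sigma=10.0, seed=1):
    """Fraction of trials in which the true mean mu falls within two
    standard errors of the sample mean. mu and sigma are made-up values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        estimate = statistics.mean(sample)
        std_error = statistics.stdev(sample) / math.sqrt(sample_size)
        if abs(estimate - mu) <= 2 * std_error:
            hits += 1
    return hits / trials

print("fraction of trials that caught the true mean:", coverage())
```

With samples of 30 the fraction should come out close to, though not exactly, 0.95; very small samples do somewhat worse because the standard error itself is poorly estimated.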

Consider polls. When a poll result is stated, it is usually in the form: “55% of Canadians say Jean Chretien should play more golf, plus or minus 5%, 19 times out of 20.” The “19 times out of 20” is the same 95% from the above paragraph. This means that the 5% is roughly twice the standard deviation of the estimate that produced the 55% value. The pollsters are giving you the standard deviation in disguise!
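You can run the disguise backwards, too. Assuming the poll is a simple random sample (real pollsters often use more elaborate designs, so treat this as a rough sketch), the usual formula for the standard deviation of a proportion lets us guess how many people were asked. The function names are mine, and the multiplier 1.96 is the more precise version of the "two" used above.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

def implied_sample_size(p, margin, z=1.96):
    """Sample size that would produce the stated margin of error."""
    return p * (1 - p) * (z / margin) ** 2

# The poll from the text: 55%, plus or minus 5 points, 19 times out of 20.
n = implied_sample_size(0.55, 0.05)
print("implied number of respondents:", round(n))
print("standard deviation of the estimate:", round(0.05 / 1.96, 4))
```

Under that assumption a plus-or-minus-5 poll corresponds to only a few hundred respondents.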

Another common method by which statistics are fudged is conditioning. This is the process of selecting specific subsamples within a data set for comparison. An example is the average male wage compared with the average female wage. Which subsamples are chosen, and what else is held constant, affects the results you get.

Studies have shown that kids who go to private schools earn 10% more, on average, than those who go to public schools. What does this mean? If we change the conditioning to take neighbourhood and background into account, we see the difference reduced to zero for students from good areas. This essentially means that the marginal impact of going to private school if you already live in a good area (high average income, low unemployment) is quite small. In contrast, students coming from a poor area stand to gain 10% in their average income by going to private school. Such statistical evidence (keep in mind that this is just an example) can lead to government policy decisions. The above conclusion would support a proposal for vouchers allowing poor kids to go to private school, for example.
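To see how much the choice of conditioning matters, here is a toy data set. Every number in it is invented purely to illustrate the mechanism, and the overall gap it produces is not meant to match the 10% figure above. The point is that the same records give one answer when lumped together and quite different answers once split by neighbourhood.

```python
from statistics import mean

# Invented records: (neighbourhood, school type, annual income).
records = (
    [("well-off", "private", 60_000)] * 6
    + [("well-off", "public", 60_000)] * 4
    + [("poor", "private", 33_000)] * 1
    + [("poor", "public", 30_000)] * 9
)

def premium(rows):
    """Percent by which mean private-school income exceeds mean public-school income."""
    private = mean([income for _, school, income in rows if school == "private"])
    public = mean([income for _, school, income in rows if school == "public"])
    return 100 * (private / public - 1)

overall = premium(records)
by_area = {
    area: premium([r for r in records if r[0] == area])
    for area in ("well-off", "poor")
}

print("overall premium (%):", round(overall, 1))
print("premium by neighbourhood (%):", {a: round(v, 1) for a, v in by_area.items()})
```

In this made-up sample most private-school kids come from the well-off area, so the lumped-together comparison mostly measures neighbourhood rather than schooling.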

One final statistical trick I shall examine is that of scale. Somebody can call an increase in inflation from 2% to 3% (as calculated by the Consumer Price Index, for example) a “50%” jump. In actuality, the change was only one percentage point. Whenever percentage changes are used to describe changes in small values, alarmingly large percentages can result. For this reason, if you are presented with very large percentage changes, you ought to keep in mind that they may simply represent small variations in small quantities. For the GDP of Luxembourg to grow by 10 or even 50% represents very little actual growth compared with the GDP of the United States growing even 1%.
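The arithmetic behind the trick is short enough to write out. In this sketch the two economy sizes are round placeholder numbers I picked for illustration, not actual GDP figures, and the function names are my own.

```python
def relative_change(old, new):
    """Change expressed as a percentage of the starting value."""
    return 100 * (new - old) / old

def absolute_change(old, new):
    """Change expressed in the original units."""
    return new - old

# Inflation going from 2% to 3%.
print("relative change (%):", relative_change(2.0, 3.0))
print("absolute change (percentage points):", absolute_change(2.0, 3.0))

# Hypothetical economies, in billions of dollars (placeholder sizes only).
small_economy = 50.0
large_economy = 15_000.0
small_growth = small_economy * 0.50  # a 50% boom in the small economy
large_growth = large_economy * 0.01  # a 1% uptick in the large one
print("small economy grows by", small_growth, "billion")
print("large economy grows by", large_growth, "billion")
```

Whenever you meet a dramatic percentage, it is worth asking for the starting value so you can work out the absolute change yourself.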

Remember, people can only lie to you with statistics if you let them! Be aware of how they work and you will be a less gullible member of society.