Statistics tends to be an imposing topic for many medical students. While this page won't even go so far as to cover the material you'd expect in a basic statistics course, it will introduce you to the underlying concepts and terms that you need to know.

Be aware that an important and often discussed area of medicine and public health which involves statistics is epidemiology, which we cover on a different page.

When we analyze something, our assumption is that we are trying to determine if something is true. Unfortunately, proving something as absolutely true is almost impossible - what we settle for in science is the absence of falsehood. This, in fact, is what separates scientists from people who claim to know the absolute truth - what we claim has to be stated in such a way that there are available methods to prove we might be wrong. In other words, if someone makes a statement which has no way of being invalidated, or proven to be false, then it isn't science.

The culture of science is one that tends to be aggressive in searching for error. When something appears to be true, scientists' initial instinct is to expose faulty assumptions, to find explanations for the apparent result other than the obvious one, and to assume that what appears to be true is, in fact, false.

If I take an aspirin and then perform well on a biochemistry test, it may appear "obvious" that the aspirin improved my performance. A scientist would ask if, in fact, one caused the other, or if it was more likely to be a matter of coincidence. What does it mean to perform "well" on an exam? How would I have performed without the aspirin? Will I consistently perform better with the aspirin than without it? Does it only help if I started off with a headache? Can something else account for my performance - increased studying, better sleep, or simply my belief that aspirin improves grades?

There are generally considered to be 4 sources of error. These need to be "ruled out" before a scientist, grudgingly, accepts that an apparent effect is probably not due to error, and may, therefore, be true.

These sources are:

"p" is the way we describe how likely it is that an apparent effect was produced by random error. In other words: how often would chance alone produce an apparent effect at least as big as the one we are seeing? The answer is "p." The smaller the "p," the more likely it is that the effect is systematic and not random.
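The logic of p can be made concrete with a short simulation. The sketch below uses hypothetical exam scores (echoing the aspirin example earlier; the numbers are invented, not from the text) and estimates p by shuffling: if aspirin truly did nothing, the group labels are arbitrary, so we count how often random relabeling produces a difference at least as big as the one observed.

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

# Hypothetical data: 5 exam scores with aspirin, 5 without.
aspirin = [88, 92, 85, 91, 87]
control = [84, 86, 83, 88, 82]

observed_diff = sum(aspirin) / 5 - sum(control) / 5  # 4.0 points

# Permutation test: shuffle the pooled scores many times and count how
# often a random split produces a difference at least as large as the
# real one.  That fraction is an estimate of p.
pooled = aspirin + control
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:5]) / 5 - sum(pooled[5:]) / 5
    if diff >= observed_diff:
        count += 1

p = count / trials
print(f"observed difference: {observed_diff:.1f} points, estimated p = {p:.3f}")
```

A small p here means chance alone rarely produces a gap that big; it does not, by itself, prove the aspirin caused it.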

Regardless of the size of p, the possibility of error still remains - of which there are two major types: a Type I error (a "false positive" - concluding that an effect is real when it is actually due to chance) and a Type II error (a "false negative" - failing to detect an effect that really exists).

The rule of thumb is that something is accepted to be "statistically significant" if the p value is less than 5% - written as p<.05. While this number is arbitrary (how comfortable would you be boarding a plane knowing the odds were "less than 5%" that it would crash?), it has proven to be a workable convention for most purposes. Obviously, the lower the p value, the better.

Be certain not to confuse statistical significance with clinical significance. A drug shown to decrease the average length of a migraine by 30 seconds (a small effect) - so consistently that the odds are less than 5% that the decrease is due to chance - is statistically significant, yet offers little to no real benefit to patients or their conditions.
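The arithmetic behind this distinction is worth seeing once. The sketch below (all numbers hypothetical: a fixed 30-second effect on a migraine lasting hours, with an assumed standard deviation of 60 minutes in both arms) computes a two-sided p value at several sample sizes. With enough patients, even a clinically trivial difference becomes statistically significant.

```python
from math import sqrt
from statistics import NormalDist

effect = 0.5   # minutes: a 30-second reduction in average migraine length
sd = 60.0      # assumed standard deviation of migraine length in both arms

ps = []
for n in (1_000, 100_000, 1_000_000):   # patients per arm
    se = sd * sqrt(2 / n)               # standard error of the difference in means
    z = effect / se
    p = 2 * (1 - NormalDist().cdf(z))   # two-sided p value
    ps.append(p)
    print(f"n = {n:>9,} per arm -> p = {p:.4g}")
```

The effect never changes; only our certainty that it is real does. Statistical significance answers "is the difference due to chance?", not "does the difference matter?"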

Statistical Tests
Mean Mathematical "average." Calculated by taking the sum of all the values and dividing by the number of values. The mean of 0, 1, 1, 1, 2, 3, 3, 4, 5, 9, 70 is 99/11, or 9. Can be highly influenced by "outliers," such as the 70 in our example.
Median The "middle" number. Calculated by listing all values in numerical order and picking the one in the middle of the list. The median of 0, 1, 1, 1, 2, 3, 3, 4, 5, 9, 70 is 3. Is not influenced by outliers.
Mode The "most popular" number. Calculated by listing all values and seeing which value gets listed the most times. The mode of 0, 1, 1, 1, 2, 3, 3, 4, 5, 9, 70 is 1.
Mean, median, and mode are all "measures of central tendency," meaning that they describe something common about a group of numbers. They are often combined with information about how those numbers are distributed (using either standard deviation or variance). Knowing just two numbers - a measure of central tendency and a measure of distribution - all by themselves, can allow us to capture much that is meaningful about a large collection of numbers. For this reason, results are often reported in journals in the following format: 100(15), where 100 is the mean and 15 is the standard deviation.
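All three measures can be checked with Python's standard library, using the example list from the definitions above:

```python
from statistics import mean, median, mode

values = [0, 1, 1, 1, 2, 3, 3, 4, 5, 9, 70]

m = mean(values)      # 9: the sum, 99, divided by the count, 11 - dragged up by the outlier 70
med = median(values)  # 3: the 6th of the 11 sorted values
mo = mode(values)     # 1: it appears three times, more than any other value
print(m, med, mo)
```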
Standard Deviation Assuming a "bell-shaped" distribution, what range of values are close enough to the mean as to be statistically indistinguishable? In other words, how far from the mean can a value be and still not be considered an "outlier?" In a bell-shaped curve, about 68% of all results lie within a range that extends from one standard deviation above the mean to one standard deviation below the mean, and about 95% fall within an area covered by two standard deviations in either direction. 100(15) is the standard for IQ measurement, so about 68% of the population has an IQ between 85 and 115, and about 95% falls between 70 and 130 - this is where the definitions of mental retardation (IQ < 70) and genius (IQ > 130) originate - these scores are sufficiently unlike the rest of the population as to warrant attention. Think of standard deviation as simply being a number that describes "how spread out" a set of numbers is. Scores with higher standard deviations (or higher variance - which is simply the standard deviation squared) are just more dispersed. If the standard deviation of the IQ test were 20, then a score of 65 would no longer be all that abnormal, since it would fall within the expected spread.
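These proportions can be computed directly with the standard library's normal distribution, using the IQ example's mean of 100 and standard deviation of 15:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # the "100(15)" reporting format

within_1_sd = iq.cdf(115) - iq.cdf(85)   # P(85 <= IQ <= 115), about 68.3%
within_2_sd = iq.cdf(130) - iq.cdf(70)   # P(70 <= IQ <= 130), about 95.4%

print(f"within one SD:  {within_1_sd:.1%}")
print(f"within two SDs: {within_2_sd:.1%}")
```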
t-Test This is probably the most basic statistic you should know (other than mean, median, and mode). A t-test is used to compare one group or population to another group or population - this form of comparison is referred to as "between groups." Usually one group is the experimental group (the one receiving treatment, for example) and one is the control (a similar group, but not receiving treatment). An example of an experiment that might utilize the t-test is a comparison of a new beta-blocker to labetalol in preventing heart attacks.
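In practice the t-test is a single function call. The sketch below uses invented blood-pressure-reduction numbers and the scipy library - both are my additions for illustration, not from the text.

```python
from scipy import stats

# Hypothetical data: systolic BP reduction in mmHg for eight patients
# on a new beta-blocker vs. eight patients on labetalol.
new_drug = [12.1, 14.3, 11.8, 15.0, 13.2, 12.7, 14.8, 13.5]
labetalol = [10.2, 11.9, 10.8, 12.4, 11.1, 10.5, 12.0, 11.4]

# Independent-samples ("between groups") t-test.
t_stat, p_value = stats.ttest_ind(new_drug, labetalol)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```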
Z-Test This test is a variation on the t-test - in the Z-test, more is known about the statistical properties (such as the means and the standard deviations) of each of the populations. Again, this is a "between groups" statistic. t-tests and Z-tests are used to compare exactly two groups - when more than two groups need to be compared, ANOVA is used.
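A two-sample Z-test is simple enough to compute by hand. The sketch below uses hypothetical blood-pressure summaries, with the population standard deviations treated as known - that known-variance assumption is exactly what distinguishes it from the t-test.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics: mean, known SD, and sample size per group.
mean_a, sd_a, n_a = 132.0, 12.0, 50   # e.g., systolic BP in group A
mean_b, sd_b, n_b = 127.0, 11.0, 60   # e.g., systolic BP in group B

se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)  # standard error of the difference in means
z = (mean_a - mean_b) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p value
print(f"z = {z:.2f}, p = {p:.4f}")
```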
ANOVA The same principles as the t-test (and Z-test) are used in this test, which is employed when more than two groups need to be compared. The ANOVA (ANalysis Of VAriance) is a more complicated application of between-groups testing. For example, it might be used to compare the efficacy of 4 beta blockers in preventing heart attack.
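With scipy, a one-way ANOVA is likewise a single call. The four groups below are invented outcome data for four hypothetical beta blockers; the variable names and numbers are mine, not from the text.

```python
from scipy import stats

# Hypothetical data: days until a recurrent cardiac event, five patients
# per beta blocker.
group_a = [180, 200, 210, 190, 205]
group_b = [160, 175, 170, 165, 172]
group_c = [150, 148, 162, 155, 158]
group_d = [210, 220, 215, 205, 225]

# One-way ANOVA: do the four group means differ more than chance predicts?
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c, group_d)
print(f"F = {f_stat:.1f}, p = {p_value:.2e}")
```

Note that a significant F only says "at least one group differs"; follow-up (post hoc) comparisons are needed to say which.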
Chi-Square This test is used to analyze data which have been organized in terms of frequency - how often does an event occur? It answers the question of whether a given frequency differs from what is expected. It can be used as either a within-groups or a between-groups test.
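The simplest case is a goodness-of-fit chi-square: compare observed event counts against the counts a hypothesis predicts. The scenario and counts below are hypothetical, and scipy is my addition for illustration.

```python
from scipy import stats

# Hypothetical counts: 120 ER admissions across four 6-hour shifts.
observed = [38, 25, 27, 30]
expected = [30, 30, 30, 30]   # what "admissions are spread evenly" predicts

# Chi-square goodness-of-fit test against the expected frequencies.
chi2, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Here a large p would mean the uneven-looking counts are well within what chance produces, so there is no evidence of a shift effect.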