Institutional Research - Statistics Primer

This page offers some basic information about common statistical analyses you might see in reports published by the Office of Institutional Research or other research providers. It is organized into three sections:

  • Common Terms
  • Common Statistical Tests
  • Further Reading


Common Terms

bell curve, bivariate analysis, case, categorical variable, continuous variable, control variable, correlation, dependent variable, independent variable, interval measure, mean, median, mode, multivariate analysis, nominal measure, nonsampling error, normal distribution, null hypothesis, ordinal measure, population, ratio measure, regression coefficient, sample, sampling error, standard deviation, statistical significance, substantive significance, univariate analysis, variable, z-score


bell curve - See normal distribution. (Return to top)

bivariate analysis - The analysis of two variables simultaneously to determine if there is a relationship between them. (Return to top)

case - A specific instance of the general thing being studied in scientific research. For example, if cities were being studied, New York might be a case. In database terms, a case is the same as a record. (Return to top)

categorical variable - A variable measured at the nominal or ordinal level of measurement; also known as a discrete variable. (Return to top)

continuous variable - A variable measured at the interval or ratio level of measurement. (Return to top)

control variable - A variable that is held constant in an attempt to further clarify the relationship between two other variables. For example, if one found a relationship between level of education and level of prejudice, it might be useful to control for gender since the relationship between education and prejudice might differ for men and women. (Return to top)

correlation - A quantitative measure of how strongly two variables relate to each other. In colloquial terms, correlation is used as a way to say that two variables are simply related (e.g., geographic location and political party affiliation are correlated). The most common type of correlation is the Pearson correlation. Correlation and causation are not the same thing. (Return to top)

dependent variable - The variable whose values are (partially) affected by one or more other variables in a statistical analysis. The values of the dependent variable depend on the values of independent variables. For example, in a study of whether gender relates to playing on a sporting team, sports participation would be the dependent variable, also known as the outcome or response variable. (Return to top)

independent variable - A variable whose values are taken as given and presumed to affect the values on a dependent variable. For example, in a study of whether gender relates to playing on a sporting team, gender would be the independent variable, also known as the explanatory or predictor variable. (Return to top)

interval measure - A variable in which 1) the values can be rank-ordered, 2) the intervals between adjacent values are equal, and 3) there is no absolute zero point. Examples are SAT scores, IQ scores, and net worth. (Return to top)

mean - A descriptive statistic that measures the "central tendency" of a distribution of values for a given variable. It is computed by summing all observed values and dividing by the number of observations. The mean is sensitive to extremely high or low values (as compared to the rest of the observations). (Return to top)

median - A descriptive statistic that measures the "central tendency" of a distribution of values for a given variable. It represents the "middle" value in a rank-ordered set of observations. For example, in the set 1, 5, 9, 20, 50, the median is 9, while in 1, 5, 9, 20, the median is 7. The median is preferred when the mean is being drastically affected by extreme values. In a normal distribution, the mean and median are the same. (Return to top)

mode - A descriptive statistic that measures the "central tendency" of a distribution of values for a given variable. It represents the most frequently observed value in the set of observations and is used for nominal measures. For example, in a group of college students composed of 100 majors in literature, 50 in sociology, 30 in philosophy, and 20 in economics, the modal category is literature. (Return to top)
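For the computationally inclined, the three measures of central tendency can be checked with Python's standard statistics module. The numbers below are taken from the examples in the entries above.

```python
import statistics

values = [1, 5, 9, 20, 50]

print(statistics.mean(values))           # 17 (pulled upward by the extreme value 50)
print(statistics.median(values))         # 9 (the middle value)
# with an even number of observations, the median averages the two middle values
print(statistics.median([1, 5, 9, 20]))  # 7.0

# the mode also works for nominal measures, such as college majors
majors = (["literature"] * 100 + ["sociology"] * 50
          + ["philosophy"] * 30 + ["economics"] * 20)
print(statistics.mode(majors))           # literature
```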

multivariate analysis - The analysis of the simultaneous relationships among three or more variables. (Return to top)

nominal measure - A variable in which the values are simply different from each other and cannot be rank-ordered. Examples are race, gender, and marital status. (Return to top)

nonsampling error - The statistical imprecision in an estimated population parameter that cannot be attributed to the sample used to make the estimate. This imprecision is unavoidable and difficult or impossible to quantify, so care must be taken to minimize it as much as possible. Sources of nonsampling error in survey research include poorly worded questions, misunderstood questions, question ordering, question response options, incorrectly checked boxes, data entry errors, nonresponse, the provision of false data, and so on. (Return to top)

normal distribution - A symmetrical, bell-shaped curve plotted on two axes. The y-axis represents the number of cases with a particular value on a single variable, and the x-axis represents the value itself. In a normal distribution, the mean, median, and mode are all the same, and 68.3% of the cases fall within one standard deviation of the mean, 95.5% within two standard deviations, and 99.7% within three standard deviations. (Return to top)
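The 68/95/99.7 percentages can be verified numerically: for a normal distribution, the proportion of cases within k standard deviations of the mean equals erf(k/√2), where erf is the error function available in Python's standard math module.

```python
import math

def within_k_sd(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # prints approximately 68.3%, 95.4%, and 99.7%
    print(f"within {k} SD: {within_k_sd(k):.1%}")
```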

null hypothesis - The hypothesis that is directly tested in statistical significance testing. It states that there is no relationship between the variables being analyzed. If one can statistically reject the null hypothesis, then one can conclude with relatively high certainty that the observed relationship is not due to sampling error, but to other reasons, such as theorized causes or unexamined confounding variables. (Return to top)

ordinal measure - A variable in which the values can be rank-ordered, but have no standard unit of measurement. Examples are movie ratings (thumbs up or thumbs down), socioeconomic status (low, middle, or high), and level of appreciation for coffee (love it, like it, or hate it). (Return to top)

population - The set of individuals or other things from which a sample is drawn. Ideally (though infrequently), a population is fully enumerated to ensure the selection of a well-drawn sample. A population is sometimes called a universe. (Return to top)

ratio measure - A variable in which 1) the values can be rank-ordered, 2) the intervals between adjacent values are equal, and 3) there is an absolute zero point. Examples are the Kelvin temperature scale, people's salaries, and the number of children a couple has. (Return to top)

regression coefficient - A measure expressing how an independent variable relates to a dependent variable in a regression model. In linear regression (the most common kind), a coefficient is shown in unstandardized and standardized form. The unstandardized coefficient, B, indicates how much the dependent variable changes when there is a one-unit change in the independent variable. The standardized coefficient, Beta, indicates the same thing, but in terms of the z-scores of both variables rather than their original units. Comparing Beta coefficients is a common way to assess the predictive strength of one independent variable against others measured in different units. (Return to top)

sample - A set of cases drawn from and analyzed to estimate the parameters of a population. A simple random sample from an enumerated list of the population of interest is ideal. However, such samples are difficult to obtain, so other sampling techniques are often employed. These fall into two categories: probability sampling (e.g., systematic, stratification, multistage cluster, and probability proportional to size sampling) and nonprobability sampling (e.g., quota, convenience, purposive, and snowball sampling). (Return to top)

sampling error - The statistical imprecision in an estimated population parameter that results from using a random sample to make the estimate. The imprecision comes from the fact that the sample used for a particular estimate is only one of a large number of samples of the same size that could have been selected. If one drew multiple samples of a given size from the same population, the composition of the samples would differ due to random chance and the estimates based on the samples would differ as well. For example, if the true population mean was 50, one sample might provide an estimate of 49, another of 35, still another of 55, and so on up to the maximum number of samples that could be drawn. Statistical significance testing deals with the distribution of all of these many estimated means to determine how likely it would be to get the one particular estimate from the one sample that was selected. (Return to top)

standard deviation - A unit of measurement that describes how dispersed or spread out a group of values is around their mean. In a normal distribution or bell curve, 68.3% of the cases fall within one standard deviation of the mean, 95.5% within two standard deviations, and 99.7% within three standard deviations. (Return to top)
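As a minimal sketch, the standard deviation is the square root of the average squared deviation from the mean. The values below are invented for illustration; note that the sample standard deviation (statistics.stdev) divides by n - 1 instead of n.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(values) / len(values)  # 5.0

# population standard deviation: square root of the mean squared deviation
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
print(sd)                         # 2.0
print(statistics.pstdev(values))  # same result from the standard library
```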

statistical significance - The probability, p, that an observed relationship between two variables could be attributed to sampling error or random chance alone. By convention, a relationship between two variables is called statistically significant when p < .05. In other words, when there is a relatively small chance (less than 5 in 100) that the observed relationship could be caused by sampling error, one has identified a statistically significant relationship. The value of p is affected by the size of the sample and the strength of the observed relationship. Thus, it is common for trivial or weak relationships to be statistically significant in large samples. Similarly, strong relationships might not be statistically significant if a small sample is used. In any case, statistical significance should not be confused with substantive significance. See the Further Reading section below for more information about statistical significance testing. (Return to top)

substantive significance - The extent to which a relationship between two variables has an important or practical effect in the real world. For example, a researcher might find a statistically significant relationship between whether students take a math refresher course and their scores on a math placement test, but if the observed relationship between the two variables is such that taking the course results in an average increase of only a point or two, then decision-makers might conclude that the increase is not substantively significant (especially in relation to other concerns, such as the cost of providing math refresher courses). (Return to top)

univariate analysis - The analysis of a single variable for purposes of description. (Return to top)

variable - An attribute or characteristic of a case that is capable of assuming any of a set of values. Examples are colors of cars, breeds of dogs, and salaries of people. In database terms, a variable is the same as a field. (Return to top)

z-score - A standardized unit of measurement that is defined relative to the mean of a variable. This relativity lets researchers compare scores on variables that have different units of measurement. A value with a z-score of 1 lies one standard deviation above the mean; a z-score of -1 lies one standard deviation below it. For example, if the mean score on a test is 85 with a standard deviation of 5 points, then a student scoring 90 will have a z-score of 1, while a student scoring 80 will have a z-score of -1. (Return to top)
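The definition amounts to subtracting the mean and dividing by the standard deviation; a minimal sketch using the test-score example above:

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / sd

# mean score of 85 with a standard deviation of 5 points
print(z_score(90, 85, 5))  # 1.0
print(z_score(80, 85, 5))  # -1.0
```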

Common Statistical Tests


The table below shows some common statistical tests used for different combinations of independent and dependent variables at various levels of measurement. Each test is described briefly below the table. More detailed information about test assumptions, null hypotheses tested, sampling distributions used, and computation of test statistics can be found in any undergraduate or graduate textbook on statistics (see the Further Reading section below).

Determining Appropriate Statistical Tests

                                 Dependent Variable
Independent Variable             Categorical (Two Values)   Categorical (Over Two Values)     Continuous
Categorical (Two Values)         chi-square                 chi-square                        t-test
Categorical (Over Two Values)    chi-square                 chi-square                        analysis of variance
Continuous                       logistic regression        multinomial logistic regression   Pearson correlation / linear regression

analysis of variance (ANOVA) - Used to test for differences among the means of three or more groups. For example, one would use ANOVA to see if the average contribution to disaster relief differed among liberals, moderates, and conservatives. In this case, the amount of money donated is the dependent variable (measured at the ratio level) and political orientation is the independent variable (measured at the nominal level). If the ANOVA's F statistic is statistically significant, then one can say that at least one mean differs from one of the others. One cannot say which means differ. To make that determination, one uses a post-hoc test (e.g., Bonferroni) to statistically compare the pairs. In this example, there are three pairs to test (i.e., liberal/moderate, liberal/conservative, and moderate/conservative). (Return to Common Statistical Tests)
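As an illustration, the F statistic for a one-way ANOVA can be computed directly from its definition: between-group variance divided by within-group variance. The donation amounts below are made up for demonstration.

```python
from statistics import mean

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    k = len(groups)                  # number of groups
    n = sum(len(g) for g in groups)  # total observations
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# hypothetical disaster-relief contributions by political orientation
liberals = [40, 50, 60]
moderates = [45, 55, 65]
conservatives = [20, 25, 30]
print(anova_f(liberals, moderates, conservatives))  # F ≈ 10.33 for these made-up numbers
```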

chi-square (crosstab) - Used to see whether two categorical variables relate to each other. For example, the crosstab below fictitiously shows how people's race relates to the type of music they like best. By putting race (independent variable) in the columns, music type (dependent variable) in the rows, and comparing the column percentages across the rows, we see that whites are most likely to choose rock as their favorite music, blacks are most likely to choose R&B, and Hispanics and those of other races are most likely to choose "other" types of music (e.g., jazz, world, or classical).

Fictitious Data on Race and Musical Preference
(each cell shows the count and the column percentage)

Music    White        Black        Hispanic     Other        Total
Rock     20   (50%)    2   (10%)    4   (20%)    4   (20%)    30   (30%)
R&B       4   (10%)   12   (60%)    4   (20%)    4   (20%)    24   (24%)
Other    16   (40%)    6   (30%)   12   (60%)   12   (60%)    46   (46%)
Total    40  (100%)   20  (100%)   20  (100%)   20  (100%)   100  (100%)

In the example above, the differences in the percentages are fairly large, making the interpretation of the crosstab easy. Interpreting smaller crosstabs, like a 2 x 2 table composed of two binary variables, is also easy. However, difficulties arise when tables are large, differences are small, or both. To help interpret such tables, looking at the statistical significance of the chi-square statistic is useful. If the chi-square is statistically significant, then one knows there is at least one statistical difference in the crosstab. On the other hand, if it is not statistically significant, then from a statistical perspective there are no differences to be found. (Return to Common Statistical Tests)
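For illustration, the chi-square statistic for the fictitious crosstab above can be computed directly from its definition, comparing each observed count to the count expected if the two variables were independent.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a crosstab given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / n
            stat += (o - expected) ** 2 / expected
    return stat

# the fictitious race-by-music counts above (rows: Rock, R&B, Other)
observed = [[20, 2, 4, 4],
            [4, 12, 4, 4],
            [16, 6, 12, 12]]
# ≈ 26.80 with (3-1)*(4-1) = 6 degrees of freedom, well past the .05 cutoff of about 12.59
print(chi_square(observed))
```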

linear regression - A technique that allows one to examine how a set of continuous variables relates to a continuous dependent variable. Linear regression assumes that the effects of the independent variables are additive, and that the relationships between the independent and dependent variables are linear. One use of regression analysis would be if a researcher were interested in how scores on a test of English fluency are affected by the number of days spent on an English immersion retreat. The control variables used might include how many years a person has been learning English and how old a person is. The primary output generated through regression analysis is a table of regression coefficients. These allow one to estimate how much effect one variable has on the dependent variable (independent of the effects of the control variables). (Return to Common Statistical Tests)
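With a single predictor, the coefficient B and the intercept can be computed directly from the least-squares formulas. The retreat data below are invented for illustration; a real analysis with control variables requires matrix methods or a statistics package.

```python
def ols(x, y):
    """Least-squares slope (B) and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return b, a

# hypothetical data: days at an English immersion retreat vs. fluency score
days = [0, 2, 4, 6, 8]
score = [50, 55, 60, 65, 70]
b, a = ols(days, score)
print(b, a)  # 2.5 50.0 (each extra day predicts a 2.5-point gain)
```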

logistic regression - A technique that allows one to examine how a set of continuous variables relates to a dichotomous (a.k.a. binary, indicator, or dummy) dependent variable. Logistic regression assumes the effects of the independent variables are additive. One use of logistic regression would be if a researcher were interested in whether developing lung cancer is dependent on whether people worked in a chemical manufacturing plant. The control variables used might include how many years people smoked and whether they have a history of lung cancer in the family. In contrast to the regression coefficients produced by linear regression, those in logistic regression relate to the odds of something being the case versus not being the case (e.g., having lung cancer versus not having lung cancer). Output produced in logistic regression analysis includes odds ratios, each defined as e, the base of the natural logarithm, raised to the power of a regression coefficient. In other words, an odds ratio is e^B, where e is approximately 2.72 and B is a coefficient produced in the logistic regression analysis. The odds ratio is interpreted as the factor by which the odds of something being the case are changed with a one-unit change in an independent variable. For example, if the odds ratio associated with working in a chemical manufacturing plant were 2, then the odds of a plant worker having lung cancer would be twice those of someone who did not work in a plant. (Return to Common Statistical Tests)
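As a quick sketch, converting a logistic regression coefficient into an odds ratio is a single exponentiation. The coefficient below is hypothetical, chosen so the odds ratio comes out to roughly 2.

```python
import math

# hypothetical coefficient for working in a chemical manufacturing plant
B = 0.693
odds_ratio = math.exp(B)  # e^B
print(round(odds_ratio, 2))  # 2.0: plant employment doubles the odds of lung cancer
```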

multinomial logistic regression (MLR) - An extension of logistic regression that allows one to analyze dependent categorical variables with three or more value categories (rather than dependent variables that are simply binary). The interpretation of MLR coefficients is the same as in logistic regression, though one of the categories in the dependent variable is taken as the reference category throughout the entire analysis. For example, if a market researcher were studying alcohol preference, he or she might have an outcome variable with four potential values: beer, wine, liquor, and dislikes alcohol. The researcher might want to compare the odds of liking one of the first three categories to disliking alcohol altogether. In that case, the last category would be the reference category, and the regression output would result in three tables of coefficients (for beer, wine, and liquor, respectively). (Return to Common Statistical Tests)

Pearson correlation - Used to test whether a linear relationship exists between two continuous variables and measure how strong the relationship is. The Pearson correlation coefficient, r, ranges from -1 to 1, where -1 is a perfect negative relationship (as one variable goes up, the other goes down), 0 is no relationship (the variables are independent of each other), and 1 is a perfect positive relationship (as one variable goes up, so does the other). For example, one could use a Pearson correlation to see if there is a relationship between the number of years of education people have and their income. If one found a statistically significant value of r = .21, then one would conclude that there is a weak positive relationship between the two variables: people with more education tend to have higher incomes, and vice versa. The correlation is weak because r² = .04, which means that each variable explains only about 4% of the variation in the other. In other words, 96% of the variation must be explained by other factors. (Return to Common Statistical Tests)
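The coefficient r itself can be computed from its definition, the covariance divided by the product of the two standard deviations. The education and income figures below are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# made-up years of education vs. income (in thousands of dollars)
educ = [10, 12, 12, 14, 16, 16, 18, 20]
income = [30, 28, 45, 40, 50, 38, 60, 55]
r = pearson_r(educ, income)
print(r, r ** 2)  # r, and the proportion of variation each variable explains in the other
```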

t-test - Used to test for differences between two group means, or between the mean of one group and a given number, such as a known population parameter or empirical constant. For example, a researcher would use a t-test to find out if exam scores differed for men and women, or determine whether the estimates of the speed of light from a series of physics experiments differed from 299,792,458 m/s. If the t statistic in either example were statistically significant, then one would conclude that there was a statistical difference between the average scores of men and women, or that the average estimate of the speed of light statistically differed from the scientifically accepted constant. (Return to Common Statistical Tests)
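A pooled two-sample t statistic can likewise be computed from its definition; the exam scores below are made up for demonstration. (Statistical packages also offer the Welch variant, which does not assume equal group variances.)

```python
import math

def two_sample_t(x, y):
    """Pooled two-sample t statistic for the difference between two group means."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled_var = (ssx + ssy) / (nx + ny - 2)  # pooled variance estimate
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))

# hypothetical exam scores for two groups
men = [78, 82, 85, 75, 80]
women = [88, 84, 90, 86, 82]
print(two_sample_t(men, women))  # ≈ -2.71: the first group's mean is lower
```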

Further Reading


Most of the information provided above can be found in any undergraduate or graduate textbook on statistics. Examples of each are:

  • Statistical Methods for the Social Sciences, Third Edition, by Alan Agresti and Barbara Finlay
  • Social Statistics, Revised Second Edition, by Hubert M. Blalock, Jr.

For more in-depth discussions on specific topics, Sage Publications has a collection of short books in their Quantitative Applications in the Social Sciences series.

Note: Statistical significance testing is a common practice in social scientific research, though it is not without its problems and controversies. A good guide is Ramon Henkel's Tests of Significance, which is in the Sage series mentioned above. Critical discussions of statistical significance testing are:

  • Selvin, Hanan. 1957. "A Critique of Tests of Significance in Survey Research." American Sociological Review 22:519-27.
  • Kish, Leslie. 1959. "Some Statistical Problems in Research Design." American Sociological Review 24:328-38.
  • Carver, Ronald. 1978. "The Case Against Statistical Significance Testing." Harvard Educational Review 48(3):378-99.
  • Cohen, Jacob. 1990. "Things I Have Learned (So Far)." American Psychologist 45:1304-12.
  • Carver, Ronald. 1993. "The Case Against Statistical Significance Testing, Revisited." Journal of Experimental Education 61(4):287-92.
  • Cohen, Jacob. 1994. "The Earth Is Round (p < .05)." American Psychologist 49:997-1003.
Contact Information
Lisa M Rodrigues-Doolabh
203-596-2104 (p)
203-575-8051 (f)
Room: K709
750 Chase Parkway
Waterbury, CT 06708