Contrast (statistics)

In statistics, particularly in analysis of variance and linear regression, a contrast is a linear combination of variables (parameters or statistics) whose coefficients add up to zero, allowing comparison of different treatments.^[1]^[2]

Definitions

Let $\theta _{1},\ldots ,\theta _{t}$ be a set of variables, either parameters or statistics, and let $a_{1},\ldots ,a_{t}$ be known constants. The quantity $\sum _{i=1}^{t}a_{i}\theta _{i}$ is a linear combination. It is called a contrast if $\sum _{i=1}^{t}a_{i}=0$.^[3]^[4] Furthermore, two contrasts, $\sum _{i=1}^{t}a_{i}\theta _{i}$ and $\sum _{i=1}^{t}b_{i}\theta _{i}$, are orthogonal if $\sum _{i=1}^{t}a_{i}b_{i}=0$.^[5]
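The two defining conditions can be checked mechanically. The following is a minimal sketch in Python; the function names `is_contrast` and `are_orthogonal` and the tolerance argument are illustrative assumptions, not part of any standard library.

```python
from typing import Sequence


def is_contrast(coeffs: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True when the coefficients sum to zero (within tol)."""
    return abs(sum(coeffs)) <= tol


def are_orthogonal(a: Sequence[float], b: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True when the sum of the products a_i * b_i is zero (within tol)."""
    if len(a) != len(b):
        raise ValueError("coefficient vectors must have the same length")
    return abs(sum(x * y for x, y in zip(a, b))) <= tol


# Hypothetical coefficient vectors for three variables.
a = [1, -1, 0]
b = [1, 1, -2]
c = [1, 0, -1]

print(is_contrast(a), is_contrast(b), is_contrast([1, 1, 1]))  # True True False
print(are_orthogonal(a, b))  # True:  1*1 + (-1)*1 + 0*(-2) = 0
print(are_orthogonal(a, c))  # False: 1*1 + (-1)*0 + 0*(-1) = 1
```

The tolerance only matters for fractional coefficients, where floating-point rounding can leave a sum slightly different from zero.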

Examples

Suppose we are comparing four means, $\mu _{1},\mu _{2},\mu _{3},\mu _{4}$. The following table gives the coefficients of three possible contrasts, one contrast per row:

$\mu _{1}$	$\mu _{2}$	$\mu _{3}$	$\mu _{4}$
1	-1	0	0
0	0	1	-1
1	1	-1	-1

The first contrast allows comparison of the first mean with the second, the second contrast allows comparison of the third mean with the fourth, and the third contrast allows comparison of the average of the first two means with the average of the last two.^[4]

In a balanced one-way analysis of variance, using orthogonal contrasts has the advantage of completely partitioning the treatment sum of squares into non-overlapping additive components that represent the variation due to each contrast.^[6] Consider the coefficients in the table above: each row sums to zero, so each row is a contrast. Multiplying each element of the first row by the corresponding element of the second row and adding the products also gives zero, so the first and second contrasts are orthogonal. The same holds for the other two pairs of rows.
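The partition can be illustrated numerically. The sketch below uses the three contrasts from the table together with made-up data (four groups of three observations, chosen only for illustration) and the sum-of-squares formula $n(\sum c_{j}{\bar {X}}_{j})^{2}/\sum c_{j}^{2}$ given in the Background section. The helper name `ss_contrast` is an assumption.

```python
from statistics import mean

# Hypothetical balanced data: 4 groups, n = 3 observations each.
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3], [7, 8, 9]]

# The three contrasts from the table above.
contrasts = [
    [1, -1, 0, 0],
    [0, 0, 1, -1],
    [1, 1, -1, -1],
]

n = len(groups[0])
means = [mean(g) for g in groups]   # group means: 5, 8, 2, 8
grand = mean(means)                 # grand mean (valid because the design is balanced)

# Treatment (between-groups) sum of squares.
ss_treatment = n * sum((m - grand) ** 2 for m in means)


def ss_contrast(coeffs, group_means, n_per_group):
    """Sum of squares for one contrast in a balanced design."""
    value = sum(c * m for c, m in zip(coeffs, group_means))
    return n_per_group * value ** 2 / sum(c ** 2 for c in coeffs)


parts = [ss_contrast(c, means, n) for c in contrasts]
print(parts)                    # [13.5, 54.0, 6.75]
print(sum(parts), ss_treatment)  # 74.25 74.25
```

With these data the three contrast sums of squares add up to the treatment sum of squares, as expected for k – 1 = 3 mutually orthogonal contrasts among k = 4 group means.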

Sets of contrasts

Orthogonal contrasts are a set of contrasts in which, for any distinct pair, the sum of the cross-products of the coefficients is zero (assuming equal sample sizes).^[7] Although there are infinitely many possible sets of orthogonal contrasts, any given set contains at most k – 1 mutually orthogonal contrasts (where k is the number of group means).^[8]
Polynomial contrasts are a special set of orthogonal contrasts that test polynomial patterns in data with more than two means (e.g., linear, quadratic, cubic, quartic, etc.).^[9]
Orthonormal contrasts are orthogonal contrasts which satisfy the additional condition that, for each contrast, the sum of the squares of the coefficients equals one.^[7]
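As a rough illustration of these three kinds of sets, the sketch below takes the commonly tabulated polynomial contrast coefficients for four equally spaced groups, confirms that they are mutually orthogonal, and rescales them into an orthonormal set. The helper names `dot` and `normalize` are assumptions made for this example.

```python
import math
from itertools import combinations

# Commonly tabulated polynomial contrasts for k = 4 equally spaced levels.
poly = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def dot(a, b):
    """Sum of the products of corresponding coefficients."""
    return sum(x * y for x, y in zip(a, b))


def normalize(coeffs):
    """Rescale a contrast so that its squared coefficients sum to one."""
    norm = math.sqrt(dot(coeffs, coeffs))
    return [c / norm for c in coeffs]


# Each row sums to zero, and every distinct pair has zero cross-product sum.
all_contrasts = all(sum(c) == 0 for c in poly.values())
all_orthogonal = all(dot(a, b) == 0 for a, b in combinations(poly.values(), 2))
print(all_contrasts, all_orthogonal)  # True True

# Rescaling preserves orthogonality and makes the set orthonormal.
orthonormal = {name: normalize(c) for name, c in poly.items()}
for name, c in orthonormal.items():
    print(name, [round(x, 4) for x in c])
```

The set has k – 1 = 3 members, the maximum possible for four group means. Rescaling changes the numerical value of each contrast but not the hypothesis it tests.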

Background

A contrast is defined as the sum of each group mean multiplied by a coefficient for each group (i.e., a signed number, c_j).^[10] In equation form, $L=c_{1}{\bar {X}}_{1}+c_{2}{\bar {X}}_{2}+\cdots +c_{k}{\bar {X}}_{k}\equiv \sum _{j}c_{j}{\bar {X}}_{j}$, where L is the weighted sum of group means, the c_j coefficients represent the assigned weights of the means (these must sum to 0 for L to be a contrast), and ${\bar {X}}_{j}$ represents the group means.^[8] Coefficients can be positive or negative, and fractions or whole numbers, depending on the comparison of interest. Linear contrasts can be used to test complex hypotheses in conjunction with ANOVA or multiple regression. In essence, each contrast defines and tests for a particular pattern of differences among the means.^[10]

Contrasts should be constructed "to answer specific research questions", and do not necessarily have to be orthogonal.^[11]

A simple (not necessarily orthogonal) contrast is the difference between two means. A more complex contrast can test differences among several means (e.g., with four means, assigning coefficients of –3, –1, +1, and +3, which tests for a linear trend), test the difference between a single mean and the combined mean of several groups (e.g., with four means, assigning coefficients of –3, +1, +1, and +1), or test the difference between the combined mean of several groups and the combined mean of several other groups (e.g., with four means, assigning coefficients of –1, –1, +1, and +1).^[8] The coefficients for the means to be combined (or averaged) must be the same in magnitude and direction, that is, equally weighted. When means are assigned different coefficients (either in magnitude or direction, or both), the contrast is testing for a difference between those means. The term contrast may refer to any of the following: the set of coefficients used to specify a comparison; the specific value of the linear combination obtained for a given study or experiment; or the random quantity defined by applying the linear combination to treatment effects when these are themselves considered as random variables. In the last context, the term contrast variable is sometimes used.

Contrasts are sometimes used to compare mixed effects. A common example is the difference between two test scores, one at the beginning of the semester and one at its end. Here the interest is not in either score by itself, but only in the contrast (in this case, the difference). If the two scores are treated as independent random variables, the variance of this linear combination equals the weighted sum of the summands' variances, with the squared coefficients as weights; in this case both weights are one. This "blending" of two variables into one can be useful in ANOVA, in regression, or as a descriptive statistic in its own right.

An example of a complex contrast would be comparing 5 standard treatments to a new treatment, giving each standard treatment mean a weight of 1/5 and the new sixth treatment mean a weight of −1 (using the equation above), so that the weights sum to zero. If the value of this linear combination is zero, there is no evidence that the standard treatments differ, on average, from the new treatment. If the value of the linear combination is positive, there is some evidence (the strength of the evidence is often associated with the p-value computed on that linear combination) that the combined mean of the 5 standard treatments is higher than the new treatment mean. Analogous conclusions hold when the linear combination is negative.^[10] However, the value of the linear combination is not itself a significance test; see Testing significance (below) for how to determine whether the contrast computed from the sample is significant.
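The arithmetic of this example can be sketched as follows. The six treatment means are invented purely for illustration, and exact fractions are used so that the weights sum to exactly zero.

```python
from fractions import Fraction

# Weights: 1/5 for each of the five standard treatments, -1 for the new one.
weights = [Fraction(1, 5)] * 5 + [Fraction(-1)]
print(sum(weights))  # 0, so the weights define a contrast

# Hypothetical treatment means: five standard treatments, then the new one.
treatment_means = [12, 11, 13, 14, 10, 10]

# Value of the contrast: mean of the standard treatments minus the new treatment.
L = sum(w * m for w, m in zip(weights, treatment_means))
print(L)  # 2
```

A positive value such as this one only describes the sample: the standard treatments average 12 against 10 for the new treatment. Whether the difference is statistically significant has to be assessed separately.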

The usual results for linear combinations of independent random variables mean that the variance of a contrast is equal to the weighted sum of the variances, where the weights are the squared coefficients.^[12] If two contrasts are orthogonal, estimates created by using such contrasts will be uncorrelated. If orthogonal contrasts are available, it is possible to summarize the results of a statistical analysis in the form of a simple analysis of variance table, in such a way that it contains the results for different test statistics relating to different contrasts, each of which is statistically independent. Linear contrasts can be easily converted into sums of squares. SS_contrast = ${\tfrac {n(\sum c_{j}{\bar {X}}_{j})^{2}}{\sum c_{j}^{2}}}$, with 1 degree of freedom, where n represents the number of observations per group. If the contrasts form a complete set of k – 1 orthogonal contrasts, the sum of the SS_contrasts = SS_treatment. Testing the significance of a contrast requires the computation of SS_contrast.^[8]
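The variance and correlation statements can be written out explicitly. The following is a sketch under stated assumptions that the text does not spell out: the group means ${\bar {X}}_{j}$ are independent, each is based on n observations, and all groups share a common variance $\sigma ^{2}$. The symbols $L_{a}$ and $L_{b}$ denote two contrasts with coefficients $a_{j}$ and $b_{j}$.

```latex
\begin{aligned}
\operatorname{Var}(L) &= \sum_{j} c_{j}^{2}\,\operatorname{Var}(\bar{X}_{j})
                       = \frac{\sigma^{2}}{n}\sum_{j} c_{j}^{2}, \\
\operatorname{Cov}(L_{a}, L_{b}) &= \sum_{j} a_{j} b_{j}\,\operatorname{Var}(\bar{X}_{j})
                       = \frac{\sigma^{2}}{n}\sum_{j} a_{j} b_{j}
                       = 0 \quad \text{when } \sum_{j} a_{j} b_{j} = 0 .
\end{aligned}
```

Under these assumptions, orthogonality of the coefficients is exactly the condition that makes the two estimated contrasts uncorrelated. With unequal group sizes or variances, the covariance involves $a_{j}b_{j}$ weighted by each group's own $\operatorname {Var} ({\bar {X}}_{j})$, and the simple condition no longer suffices.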

Testing significance

SS_contrast is also a mean square, $MS_{contrast}$, because every contrast has 1 degree of freedom. Dividing $MS_{contrast}$ by $MS_{error}$ produces an F-statistic with one and $df_{error}$ degrees of freedom. The statistical significance of F_contrast can be determined by comparing the obtained F statistic with a critical value of F with the same degrees of freedom.^[8]
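The computation can be sketched for a balanced one-way design as follows. The data are made up for illustration, and the variable names are assumptions. The critical value is not computed here, since that requires an F table or a statistics library.

```python
from statistics import mean

# Hypothetical balanced data: k = 4 groups, n = 3 observations each.
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3], [7, 8, 9]]
coeffs = [1, 1, -1, -1]  # first two groups versus last two groups

k = len(groups)
n = len(groups[0])
means = [mean(g) for g in groups]

# SS_contrast equals MS_contrast because a contrast has 1 degree of freedom.
value = sum(c * m for c, m in zip(coeffs, means))
ms_contrast = n * value ** 2 / sum(c ** 2 for c in coeffs)

# Error (within-groups) mean square.
ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
df_error = k * (n - 1)
ms_error = ss_error / df_error

f_stat = ms_contrast / ms_error
print(f_stat, (1, df_error))  # 6.75 (1, 8)
```

For these data F = 6.75 on (1, 8) degrees of freedom. A standard F table gives a critical value of about 5.32 at the 5% level for those degrees of freedom, so this particular contrast would be judged significant at that level.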

References

Casella, George; Berger, Roger L (2001). Statistical inference. Cengage Learning. ISBN 9780534243128.
Casella, George (2008). Statistical design. Springer. ISBN 978-0-387-75965-4.
Everitt, B S; Skrondal, A (2010). Cambridge dictionary of statistics (4th ed.). Cambridge University Press. ISBN 9780521766999.
Dean, Angela M.; Voss, Daniel (1999). Design and analysis of experiments. Springer. ISBN 9780387985619.

Notes

[1] Casella, George; Berger, Roger L (2001). Statistical inference. Cengage Learning. ISBN 9780534243128.
[2] Casella, George (2008). Statistical design. Springer. ISBN 978-0-387-75965-4.
[3] Casella & Berger 2001, p. 526.
[4] Casella 2008, p. 11.
[5] Casella 2008, p. 12.
[6] Casella 2008, p. 13.
[7] Everitt, B.S. (2002). The Cambridge Dictionary of Statistics. CUP. ISBN 0-521-81099-X (entry for "Orthogonal contrasts").
[8] Howell, David C. (2010). Statistical methods for psychology (7th ed.). Belmont, CA: Thomson Wadsworth. ISBN 978-0-495-59784-1.
[9] Kim, Jong Sung. "Orthogonal Polynomial Contrasts" (PDF). Retrieved 27 April 2012.
[10] Clark, James M. (2007). Intermediate Data Analysis: Multiple Regression and Analysis of Variance. University of Winnipeg.
[11] Kuehl, Robert O. (2000). Design of experiments: statistical principles of research design and analysis (2nd ed.). Pacific Grove, CA: Duxbury/Thomson Learning. ISBN 0534368344.
[12] NIST/SEMATECH e-Handbook of Statistical Methods.
