© Copyright 2003
Walter & Eliza Hall Institute

Help Page for "Testing between Curves".

The Situation

Let's say you wanted to know whether there was any difference between two groups, for example between the heights of 12-year-olds who did and didn't eat meat. With no assumptions about distribution, an appropriate test may be the Mann-Whitney (or Wilcoxon Rank-Sum) test. But what if you had height measurements for the children over a number of years? A separate test for each year would lose power, and would not answer a question about the whole time period. This is the situation this test is designed for.

The Program

The program will calculate a test statistic (D), given the data. This is found using the formula:
\[
D = \sum_t \frac{ \left| \bar{x}_t - \bar{y}_t \right| }{ s_t \left( \frac{1}{n_{x_t}} + \frac{1}{n_{y_t}} \right) }
\]
where \bar{x}_t and \bar{y}_t are the averages of groups x and y at time t, s_t is the standard deviation, and n_{x_t} and n_{y_t} are the group sizes.

The absolute value of the difference between means (|\bar{x}_t-\bar{y}_t|, as given above) is used when the question is whether there is any difference between the groups (a 2-sided test). When the question is whether one group is larger than the other (a 1-sided test), the signed difference (\bar{x}_t-\bar{y}_t) is used instead.
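As a rough illustration of the formula, here is a minimal Python sketch. It is not the program's actual code. The function name `d_statistic` and its arguments are hypothetical. It also makes assumptions this page does not state: s_t is taken to be the sample standard deviation of all observations at time t with both groups combined, there are no missing values, and time points where s_t is zero are skipped to avoid dividing by zero.

```python
from statistics import mean, stdev


def d_statistic(x, y, two_sided=True):
    """Sum over time points of the scaled difference between group means.

    x, y: lists of individuals; each individual is a list with one value
    per time point (no missing values assumed).
    """
    total = 0.0
    for t in range(len(x[0])):
        x_t = [row[t] for row in x]
        y_t = [row[t] for row in y]
        # Assumption: s_t is the SD of both groups combined at time t.
        s_t = stdev(x_t + y_t)
        if s_t == 0:
            # Assumption: a time point with no variation contributes nothing.
            continue
        diff = mean(x_t) - mean(y_t)
        if two_sided:
            diff = abs(diff)
        total += diff / (s_t * (1 / len(x_t) + 1 / len(y_t)))
    return total


# Two small made-up groups, two individuals each, two time points.
x = [[1, 2], [3, 4]]
y = [[2, 2], [4, 6]]
d_two_sided = d_statistic(x, y)
d_one_sided = d_statistic(x, y, two_sided=False)
```

With the signed difference, the statistic can be negative when the first group tends to be smaller than the second, which is why the 1-sided and 2-sided versions can give different values for the same data.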

To carry out a hypothesis test, the data is permuted. Each individual is randomly allocated to one of the two groups, keeping the group sizes the same. D is then recalculated using the newly permuted data set. This represents a possible value of D if there were no difference between the groups. The permutation process is repeated many times, each time noting whether the D calculated from the permuted data set is greater than the value of D for the original data set. The proportion of times this is true is the p-value (2-sided when the absolute difference is used).
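The permutation procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation. The names `permutation_p_value` and `sum_abs_mean_diff` are hypothetical, and the stand-in statistic is a simplified unscaled version of D, used only to keep the example short. The p-value is counted as the proportion of permutations whose statistic is strictly greater than the observed one, matching the description on the results page.

```python
import random
from statistics import mean


def permutation_p_value(x, y, statistic, n_runs=10000, seed=None):
    """Return (d_obs, p) for a permutation test on two groups of individuals."""
    rng = random.Random(seed)
    d_obs = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_runs):
        # Reallocate individuals at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        if statistic(pooled[:n_x], pooled[n_x:]) > d_obs:
            exceed += 1
    return d_obs, exceed / n_runs


def sum_abs_mean_diff(x, y):
    """Stand-in statistic (assumption): unscaled sum of |mean difference|."""
    return sum(
        abs(mean(r[t] for r in x) - mean(r[t] for r in y))
        for t in range(len(x[0]))
    )


# Made-up scores for four individuals per group at three time points.
group_1 = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [0, 1, 2]]
group_2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
d_obs, p = permutation_p_value(group_1, group_2, sum_abs_mean_diff,
                               n_runs=2000, seed=1)
```

Whole individuals are shuffled, not single measurements, so each individual's curve over time stays intact in every permutation.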

An example.

So far, the program has been used to test between groups of mice with different genotypes. In this example, the size of a Leishmania lesion is measured by a number between 0 and 4, where 0 is no lesion.
The data file can be found here.

Entering the data.

This file can be cut and pasted into the data entry box on the previous page. Since each row of the data file contains lesion scores for a particular mouse, the default option "Each row contains a new individual from one of the two groups and each column contains a different time point." should be used. The columns are separated by single tabs, so either the option "tab" or "whitespace" could be used.

If you then click on the Run box, the resultant page should look like this:
[screenshot of the resulting page]

Now if you scroll down, you should be able to fill in the variables to carry out the test. The page will have taken a stab in the dark at what some variables are, but undoubtedly you will need to correct them.
In this example, the names of the first and second groups are C57BL/6 and B6.lmr1/2 respectively.
The data does contain sex information, in column 1.
The row containing time information is 1 (note that the program will ignore all letters in the time row, including the w's in this example).
The first column containing data information is 2, and the last is 17.
The first row containing group one data is 2, and the last is 41.
The first row containing group two data is 43, and the last is 85.
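To make the row and column settings concrete, the following Python sketch shows one way such 1-based settings could pick the time points and the two groups out of tab-separated text. How the program itself parses the file is not documented here, so everything in this block is an assumption: the function name `extract_groups`, its parameters, and the tiny `sample` data are hypothetical, and missing values are not handled.

```python
import re


def extract_groups(text, time_row, first_col, last_col,
                   group1_rows, group2_rows, sep="\t"):
    """Slice time points and two groups from delimited text.

    All row and column numbers are 1-based and inclusive, as on the form.
    """
    rows = [line.split(sep) for line in text.splitlines()]

    def cells(row_number):
        return rows[row_number - 1][first_col - 1:last_col]

    # Letters in the time row are ignored, so "1w" is read as 1.
    times = [float(re.sub(r"[A-Za-z]", "", c)) for c in cells(time_row)]

    def block(first, last):
        return [[float(c) for c in cells(r)] for r in range(first, last + 1)]

    return times, block(*group1_rows), block(*group2_rows)


# Made-up miniature file: sex in column 1, times in row 1, a blank
# line between the two groups.
sample = "sex\t1w\t2w\t3w\nM\t0\t1\t2\nF\t0\t1\t1\n\nM\t0\t0\t1"
times, group_1, group_2 = extract_groups(
    sample, time_row=1, first_col=2, last_col=4,
    group1_rows=(2, 3), group2_rows=(5, 5),
)
```

For the example data file on this page, the corresponding settings would be time row 1, columns 2 to 17, rows 2 to 41 for group one, and rows 43 to 85 for group two.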

The number of runs is the number of permutations carried out. I recommend 10000.
The box-plots option, when clicked, gives box-plots for both groups at each given time point.
If the data contains sex information, you can test between the sexes (as opposed to between groups).
The last option allows you to carry out either a 1-sided or a 2-sided hypothesis test.
Once you have chosen your options, click the Run button. After some time, the results page should load. If it's taking forever, try a smaller number of runs first.
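As a guide to choosing the number of runs, the short Python sketch below shows how the precision of an estimated p-value depends on it. It relies on a standard result that this page does not state: if the runs are independent random permutations, the count of exceedances is binomial, so the standard error of the estimated p-value is sqrt(p(1-p)/N). The function name `p_value_standard_error` is hypothetical.

```python
from math import sqrt


def p_value_standard_error(p, n_runs):
    """Binomial standard error of a p-value estimated from n_runs permutations."""
    return sqrt(p * (1 - p) / n_runs)


# Smallest non-zero p-value step and standard error near p = 0.05.
for n_runs in (100, 1000, 10000):
    step = 1 / n_runs
    se = p_value_standard_error(0.05, n_runs)
    print(f"runs={n_runs:>6}  smallest step={step:.4f}  SE at p=0.05: {se:.4f}")
```

A trial with fewer runs is therefore fine for a quick check, but a p-value close to a decision threshold is better confirmed with the full number of runs.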

The results page.

Since the test statistic varies slightly according to whether you have a one- or two-sided test, the type of test chosen is the first thing shown on the results page.

The next section contains the number of individuals in each group (and the number of males and females, if sex has been determined). This information is useful for checking whether all the data has been entered into the program.

In the section titled "Visual Comparison of the Data", the curves are plotted. That is, the means of the two groups are plotted against each other. If the box-plots option was chosen, then two graphs (one for each group) are plotted, showing a box-plot for each time point. This is useful for comparing the spread of the data, as well as just the means.

Lastly, the results of the test are displayed. The value D_obs is the value of the test statistic as calculated using the observed data. The data is then permuted a given number of times, with the test statistic (D) calculated for each permutation. The p-value is equal to the proportion of permutations for which the test statistic was found to be greater than D_obs.

The distribution of D values for all permutations is plotted. This gives the distribution of D, if there was no difference between the groups. On this graph, the observed value of D is shown, with a ^. If the ^ is found to be to the right of the distribution, then the null hypothesis of no difference between groups should be rejected. If the distribution function looks disjointed, the number of permutations could be increased.

Comments/Questions? Contact Russell Thomson.
Last modified: 08 October 2003