© Copyright 2003
Walter & Eliza Hall Institute
Help Page for "Testing between Curves".
The Situation
Suppose you want to know whether there is any difference between two
groups, for example the heights of 12-year-olds who did and did not eat
meat. With no assumptions about distribution, an appropriate test may be
the Mann-Whitney (or Wilcoxon rank-sum) test. But what if you had height
measurements for the children over a number of years? A separate test for
each year would lose power, and would not answer a question about the whole
time period. This is the situation the test is designed for.
The Program
The program calculates a test statistic, D, from the data. D is built from
the following quantities at each time t:
xbar_t and ybar_t are the averages of groups x and y at time t,
s_t is the standard deviation, and
n_xt and n_yt are the group sizes.
The absolute value of the difference between means, |xbar_t - ybar_t|, is used when the question is whether there is any difference between the groups. When the question is whether one group is larger than the other, the signed difference, (xbar_t - ybar_t), is used.
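As a rough sketch, one statistic consistent with the quantities listed above sums a t-like term over the time points. The exact form the program uses is not confirmed here. Both the sum over t and the reading of s_t as a pooled standard deviation are assumptions.

```latex
% 2-sided form (assumed): absolute difference in means at each time t
D = \sum_{t} \frac{\left|\bar{x}_t - \bar{y}_t\right|}
                  {s_t \sqrt{\dfrac{1}{n_{xt}} + \dfrac{1}{n_{yt}}}}

% 1-sided form (assumed): signed difference in means at each time t
D = \sum_{t} \frac{\bar{x}_t - \bar{y}_t}
                  {s_t \sqrt{\dfrac{1}{n_{xt}} + \dfrac{1}{n_{yt}}}}
```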
To carry out a hypothesis test, the data is permuted.
Each individual is randomly
allocated to one of the two groups, keeping the group sizes the same.
D is then recalculated using the newly permuted data
set. This represents a possible value of D if there were no
difference between the groups.
The permutation process is repeated many times, each time noting whether
the D calculated from the permuted data set is greater than the value of D
for the original data set. The proportion of times this is true
is the p-value. When the absolute difference is used, this is a 2-sided test.
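The permutation steps above can be sketched in a few lines of Python. This is an illustration only, not the program's own code: the function names are hypothetical, the form of D (a sum of pooled-SD t-like terms) is an assumption, and the sketch assumes no missing values and at least two individuals per group.

```python
import math
import random
import statistics


def curve_statistic(group_x, group_y, two_sided=True):
    """Sum over time points of a standardized difference in group means.

    Each group is a list of rows: one row per individual, one column per
    time point. The pooled-SD form of the statistic is an assumption.
    """
    total = 0.0
    for xs, ys in zip(zip(*group_x), zip(*group_y)):
        nx, ny = len(xs), len(ys)
        diff = statistics.mean(xs) - statistics.mean(ys)
        pooled_var = ((nx - 1) * statistics.variance(xs)
                      + (ny - 1) * statistics.variance(ys)) / (nx + ny - 2)
        if pooled_var == 0:
            continue  # no spread at this time point; skipped as a simplification
        scaled = diff / math.sqrt(pooled_var * (1 / nx + 1 / ny))
        total += abs(scaled) if two_sided else scaled
    return total


def permutation_p_value(group_x, group_y, runs=10000, two_sided=True, seed=None):
    """Return (D_obs, p-value) from randomly reallocating individuals."""
    rng = random.Random(seed)
    d_obs = curve_statistic(group_x, group_y, two_sided)
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)  # group sizes are kept the same in every permutation
    exceed = 0
    for _ in range(runs):
        rng.shuffle(pooled)  # whole rows move, so each individual's curve stays intact
        d_perm = curve_statistic(pooled[:n_x], pooled[n_x:], two_sided)
        if d_perm >= d_obs:  # ties counted here; the text above says "greater than"
            exceed += 1
    return d_obs, exceed / runs
```

Shuffling whole rows rather than single measurements is the point of the design: each individual's measurements over time are kept together, so the correlation between time points is preserved under the null hypothesis.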
An example.
So far, the program has been used to test between groups of mice
with different genotypes. In this example, the size of a Leishmania lesion is
scored by a number between 0 and 4, where 0 is no lesion.
The data file can be found
here.
Entering the data.
This file can be copied and pasted into the data entry box on the
previous page.
Since each row of the data file contains lesion scores for a particular mouse,
the default option "Each row contains a new individual from one of the
two groups and each column contains a different time point." should be
used. The columns are separated by single tabs, so either the
"tab" or the "whitespace" option could be used.
If you then click on the Run box, the resultant page should look like this:
Now if you scroll down, you should be able to fill in the variables to carry out the test. The page will have guessed values for some of the variables,
but you will probably need to correct them.
In this example, the names of the first and second groups are C57BL/6 and
B6.lmr1/2 respectively.
The data does contain sex information, in column 1.
The row containing time information is 1 (note that the program will ignore all
letters in the time row, including the w's in this example).
The first column containing data is 2, and the last is 17.
The first row containing group one data is 2, and the last is 41.
The first row containing group two data is 43, and the last is 85.
The number of runs is the number of permutations carried out. I recommend 10000.
The box-plots option, when selected, gives box-plots for both groups at each
given time point.
If the data contains sex information, you can test between the sexes (as
opposed to between groups).
The last option allows you to carry out either a 1-sided or a 2-sided hypothesis test.
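The row and column settings described above can be read as 1-based, inclusive slices of the pasted table. The following Python sketch shows that reading. The function names are hypothetical, the handling of blank lines is an assumption, and missing values are not handled.

```python
import re


def parse_table(text, delimiter="tab"):
    """Split pasted text into rows of cells (tab- or whitespace-separated)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            rows.append([])  # keep blank lines so row numbers still match the file
            continue
        rows.append(line.split("\t") if delimiter == "tab" else line.split())
    return rows


def time_values(row, first_col, last_col):
    """Read time points from the time row, ignoring letters (e.g. '3w' -> 3.0)."""
    cells = row[first_col - 1:last_col]
    return [float(re.sub(r"[A-Za-z]", "", cell)) for cell in cells]


def group_rows(rows, first_row, last_row, first_col, last_col):
    """Extract one group's scores using 1-based, inclusive row/column numbers."""
    return [[float(cell) for cell in row[first_col - 1:last_col]]
            for row in rows[first_row - 1:last_row]]
```

With the settings from the example, group one would be `group_rows(rows, 2, 41, 2, 17)` and group two `group_rows(rows, 43, 85, 2, 17)`, with the times read by `time_values(rows[0], 2, 17)`.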
Once you have chosen your options, click the Run button. After some time, the results page should load. If it is taking forever, try a smaller number of runs first.
The results page.
Since the test statistic differs slightly between a 1-sided and a
2-sided test, the type of test is the first thing shown
on the results page.
The next section contains the number of individuals in each group (and
the number of males and females, if sex has been
determined). This information is useful for
checking whether all the data has been entered into
the program.
In the section titled "Visual Comparison of the Data", the curves are
plotted. That is, the means of the two groups are
plotted together for comparison. If the box-plots option
was chosen then two graphs (one for each group) are
plotted, showing a box-plot for each time point.
This is useful for comparing the spread of the data,
as well as just the means.
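The numbers behind such plots can be sketched as a per-time-point summary for one group. This is illustrative only: the function name is hypothetical, and the quartile convention (the default of Python's statistics.quantiles) is an assumption that may differ from the one used for the program's box-plots.

```python
import statistics


def summarize_by_time(group):
    """Return one summary per time point for a group.

    The group is a list of rows: one row per individual, one column per
    time point, with at least two individuals and no missing values.
    """
    summaries = []
    for column in zip(*group):
        q1, median, q3 = statistics.quantiles(column, n=4)
        summaries.append({
            "mean": statistics.mean(column),  # the point plotted on the curve
            "min": min(column),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(column),
        })
    return summaries
```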
Lastly, the results of the test are displayed. The value D_obs is
the value of the test statistic, as calculated using
the observed data. The data is then permuted a
given number of times, with the test statistic (D)
calculated for each permutation. The p-value is
equal to the proportion of permutations for which the
test statistic was found to be greater than D_obs.
The distribution of D values for all permutations is plotted. This
gives the distribution of D, if there was no
difference between the groups. On this graph, the
observed value of D is shown, with a ^. If the ^ is
found to be to the right of the distribution, then
the null hypothesis of no difference between groups
should be rejected. If the distribution function
looks disjointed, the number of permutations could
be increased.
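One rough way to judge whether the number of permutations is large enough is the Monte Carlo standard error of the estimated p-value, which for a proportion p estimated from N random permutations is sqrt(p(1 - p)/N). The sketch below is not part of the program, and the function name is hypothetical.

```python
import math


def p_value_standard_error(p, runs):
    """Monte Carlo standard error of a p-value estimated from `runs` permutations."""
    return math.sqrt(p * (1 - p) / runs)
```

For a p-value near 0.05, 10000 runs give a standard error of about 0.002, while 100 runs give about 0.02, which is too coarse to distinguish 0.05 from 0.03.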
Comments/Questions? Contact Russell Thomson.
Last modified:
08 October 2003