The Situation

For this program to be useful to you, you will have a number of markers between which you believe no recombination has occurred. You want to know whether these markers are also linked to a quantitative trait locus (QTL).

The Data

For a number of individuals, you have a phenotypic value and the genotypic information. The phenotypic value is assumed to be continuous and normally distributed. Individuals whose phenotype could not be measured are also useful, as they help estimate haplotype frequencies within the population. For these individuals, the missing value should be labelled "." or "?" or "*".

The Program

Using the EM algorithm, the program estimates the frequencies of the haplotypes found within the population. This gives, for each individual, a list of possible pairs of haplotypes and their associated probabilities. The output of the program contains estimates of the expected value of the quantitative trait for every possible pair of haplotypes. A hypothesis test is available to ascertain whether a QTL lies in the vicinity of the markers.
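To make the haplotype-frequency step more concrete, here is a minimal Python sketch of a textbook EM algorithm for unphased genotypes. It is not the program's own code: the function names, the Hardy-Weinberg assumption used in the E-step, the uniform starting frequencies and the stopping rule are all assumptions made for illustration. The genotypes are those of the example data set given further down this page.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with an
    unphased genotype, given as one (allele, allele) tuple per marker."""
    pairs = set()
    # Keep the first marker's orientation fixed; try both orientations
    # of every remaining marker.
    for flips in product([False, True], repeat=len(genotype) - 1):
        hap1, hap2 = [genotype[0][0]], [genotype[0][1]]
        for (a, b), flip in zip(genotype[1:], flips):
            if flip:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pairs.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM (assumes Hardy-Weinberg)."""
    pair_sets = [sorted(compatible_pairs(g)) for g in genotypes]
    haps = sorted({h for ps in pair_sets for pair in ps for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for ps in pair_sets:
            # E-step: probability of each compatible pair for this individual.
            weights = [(1 if h1 == h2 else 2) * freq[h1] * freq[h2]
                       for h1, h2 in ps]
            total = sum(weights)
            for (h1, h2), w in zip(ps, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


# Genotypes of the 12 individuals in the example data set:
# one (allele, allele) tuple per marker.
genotypes = [
    [(1, 1), (1, 1)], [(1, 1), (1, 1)], [(1, 2), (1, 1)], [(1, 2), (1, 2)],
    [(1, 1), (2, 2)], [(1, 2), (1, 2)], [(2, 2), (1, 1)], [(2, 2), (1, 1)],
    [(2, 2), (1, 2)], [(2, 2), (2, 2)], [(2, 2), (2, 2)], [(2, 2), (2, 2)],
]
freqs = em_haplotype_frequencies(genotypes)
for hap, f in freqs.items():
    print(hap, round(f, 4))
```

Only individuals who are heterozygous at more than one marker have more than one compatible pair, so they are the only ones whose phase the EM algorithm has to weigh up.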

An example

Blood_Pressure Marker1_geno1 Marker1_geno2 Marker2_geno1 Marker2_geno2
73 1 1 1 1
74 1 1 1 1
75 1 2 1 1
77 1 2 1 2
64 1 1 2 2
74 1 2 1 2
67 2 2 1 1
68 2 2 1 1
70 2 2 1 2
68 2 2 2 2
64 2 2 2 2
? 2 2 2 2

In this example, 12 patients have been genotyped at two linked markers, which are thought to be linked to a QTL responsible for differences in blood pressure. The phenotype of the last patient is missing. This data can be copied and pasted into the box on the previous page.
Note: You can copy and paste your data in from most other programs, including Excel.

In the box labelled "Enter number of markers:" on the previous page, the number "2" should be given, since this example has 2 markers.

The option, "Columns are separated by a tab, space, comma or any whitespace."
can be left as whitespace (either tabs or spaces), since the columns in the data set are separated by spaces.
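If you want to check your data before pasting it in, the layout described above can be read with a few lines of Python. This is only a sketch of the file format, not the program's own parser: the function name parse_data, its sep parameter and the assumption that the first line is a header are all hypothetical.

```python
MISSING = {".", "?", "*"}  # labels for a missing phenotype


def parse_data(text, n_markers, sep=None):
    """Parse pasted data into (phenotype, genotype) rows.

    sep=None splits on any whitespace (tabs or spaces); pass "," for
    comma-separated data. The first non-empty line is assumed to be a header.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:
        fields = [f.strip() for f in line.split(sep)]
        phenotype = None if fields[0] in MISSING else float(fields[0])
        alleles = fields[1:1 + 2 * n_markers]
        # Two allele columns per marker.
        genotype = [(alleles[2 * i], alleles[2 * i + 1])
                    for i in range(n_markers)]
        rows.append((phenotype, genotype))
    return rows


sample = """Blood_Pressure Marker1_geno1 Marker1_geno2 Marker2_geno1 Marker2_geno2
73 1 1 1 1
77 1 2 1 2
? 2 2 2 2
"""
rows = parse_data(sample, n_markers=2)
for phenotype, genotype in rows:
    print(phenotype, genotype)
```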

If you now click on the button marked "run", the next page should appear.

If you scroll down that page, you will see the question "Enter the number of alleles at..."
This will need a "2" in both boxes, since there are 2 alleles (labelled 1 and 2) for both markers. The smallest allele is 1 at both markers. The program will assume that the allele labels are in ascending order, starting with the smallest allele. If there are gaps, the program should still work (although it may take a bit longer to realise the frequency of the missing alleles is zero).
The default values on this form are correct for the example data set, so you won't actually need to change anything unless you want to carry out a hypothesis test.

Interpreting the Results

The model

The program has first used an EM algorithm to estimate the frequency of each of the haplotypes.

The program then used an EM algorithm to estimate the influence of a haplotype on the phenotype. It is assumed that the influence of each haplotype is additive. This means that the expected phenotype of an individual given its haplotypes is the sum of the influence from the two haplotypes. The expected phenotypic value is then modelled with a normal mixture model with parameters µhap, for all possible haplotypes, and σ².

µhap1 is the influence of haplotype hap1 on an individual's phenotype: an individual with haplotypes hap1 and hap2 has an expected phenotypic value of µhap1 + µhap2.
σ² is the variance of the phenotype about this expected value.
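The additive normal mixture model can be sketched in Python as follows. This is an illustration of the model as described above, not the program's code. The function names are hypothetical, and the µhap values, variance and pair probabilities in the usage lines are made-up numbers chosen only to show the calculation.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def expected_phenotype(mu, hap_a, hap_b):
    """Additive model: the expected phenotype is the sum of the two µhap's."""
    return mu[hap_a] + mu[hap_b]


def individual_log_likelihood(y, pair_probs, mu, var):
    """Log likelihood of one phenotype y under the normal mixture.

    pair_probs maps each compatible (hap_a, hap_b) pair to its probability
    for this individual; each pair contributes one normal component.
    """
    likelihood = sum(
        p * normal_pdf(y, expected_phenotype(mu, hap_a, hap_b), var)
        for (hap_a, hap_b), p in pair_probs.items()
    )
    return math.log(likelihood)


# Hypothetical parameter values, for illustration only.
mu = {"1-1": 38.0, "1-2": 32.0, "2-1": 34.0, "2-2": 33.0}
var = 4.0
# A double heterozygote has two compatible haplotype pairs.
pair_probs = {("1-1", "2-2"): 0.7, ("1-2", "2-1"): 0.3}
print(individual_log_likelihood(72.0, pair_probs, mu, var))
```

Summing this quantity over all individuals with a measured phenotype gives the log likelihood of the whole data set under this sketch.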

The example

If you entered the example data set, then the results page would contain the following table of estimated parameters.

Haplotype Frequency µhap

In this example, an individual with haplotypes 1-1 and 1-2 has an expected phenotypic value of

E(phenotype | 1-1 , 1-2) = 38.373 + 32.001 = 70.374 .

Similarly, the expected phenotype can be found for an individual with any of the 16 possible pairs of haplotypes here.

The hypothesis test is designed to see if the phenotype is influenced by the genetic information at the linked markers. If the phenotype is independent of the haplotypes an individual has at the linked markers (H0), then the µhap's will be the same for all haplotypes. In fact, they will each be equal to half the mean phenotypic value over all individuals.

In the example, E(phenotype) = 70.364, so under H0 each µhap will be 35.182. The hypothesis test assesses whether the estimates of the µhap's are significantly different from this.
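These two numbers can be checked directly from the example data set. The short Python check below assumes that the individual with the missing phenotype is left out of the mean, which is consistent with the values quoted above.

```python
# Blood pressure values from the example data set; None marks the
# individual whose phenotype is missing ("?").
phenotypes = [73, 74, 75, 77, 64, 74, 67, 68, 70, 68, 64, None]

observed = [p for p in phenotypes if p is not None]
mean = sum(observed) / len(observed)
mu_null = mean / 2  # each haplotype contributes half the mean under H0

print(f"E(phenotype) = {mean:.3f}")      # E(phenotype) = 70.364
print(f"µhap under H0 = {mu_null:.3f}")  # µhap under H0 = 35.182
```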

How the p-value is calculated

The test is carried out using a permutation method. The result is given as a p-value; you reject H0 in favour of H1 at the 5% level when the p-value is less than 0.05.

The permutation test will permute the phenotypic values with respect to the genotype information. Each permutation represents an example data set under the null hypothesis. The log likelihood is calculated for each permutation and the p-value is given by the proportion of permutations where the log likelihood value is greater than that of the original data set.
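The permutation procedure can be sketched in Python as below. The function permutation_p_value follows the description above, but it is not the program's code, and its name, the number of permutations and the seed are assumptions. The program scores each permutation with the log likelihood of its fitted model; to keep the sketch short and runnable, a simple stand-in statistic (minus the within-genotype sum of squares, where larger means a better fit) is used instead.

```python
import random


def permutation_p_value(phenotypes, genotypes, statistic, n_perm=1000, seed=0):
    """Proportion of permutations whose statistic exceeds the observed one."""
    rng = random.Random(seed)
    observed = statistic(phenotypes, genotypes)
    shuffled = list(phenotypes)  # copy, so the caller's list is untouched
    exceed = 0
    for _ in range(n_perm):
        # Permute phenotypes relative to the (fixed) genotypes.
        rng.shuffle(shuffled)
        if statistic(shuffled, genotypes) > observed:
            exceed += 1
    return exceed / n_perm


def neg_within_group_ss(phenotypes, genotypes):
    """Stand-in for the log likelihood: minus the sum of squared deviations
    of each phenotype from the mean of its genotype group."""
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    ss = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        ss += sum((y - m) ** 2 for y in ys)
    return -ss


# The 11 individuals of the example data set with a measured phenotype;
# each genotype is written as "marker1 marker2".
phenotypes = [73, 74, 75, 77, 64, 74, 67, 68, 70, 68, 64]
genotypes = ["11 11", "11 11", "12 11", "12 12", "11 22", "12 12",
             "22 11", "22 11", "22 12", "22 22", "22 22"]
p = permutation_p_value(phenotypes, genotypes, neg_within_group_ss)
print("p-value:", p)
```

Fixing the seed makes the result reproducible; with a different seed or number of permutations the p-value will vary slightly.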

There is an option to give a graph of the distribution function of the log likelihood under H0. This gives a histogram of the log likelihood values for all the permutations. The "^" marks where the log likelihood value for the original data set falls.

Linkage Disequilibrium

If the data set contains two markers, each with two alleles, then Lewontin's D' will be calculated. This is given by
D' = D / Dmax

where D = p(1-1) × p(2-2) - p(1-2) × p(2-1)
Dmax = min( p1(1-p2) , (1-p1)p2 ) when D > 0
Dmax = min( p1p2 , (1-p1)(1-p2) ) when D < 0
p(1-1) is the frequency of haplotype 1-1 (and similarly for 1-2 , 2-1 , 2-2 )
p1 is the frequency of allele 1 at marker 1
p2 is the frequency of allele 1 at marker 2
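
As a check on the formula, D' can be computed from the four haplotype frequencies with a few lines of Python. The function name is hypothetical, and returning 0 when D = 0 is an assumption, since that case is not covered by the definition above.

```python
def lewontin_d_prime(p11, p12, p21, p22):
    """Lewontin's D' from the frequencies of haplotypes 1-1, 1-2, 2-1, 2-2."""
    d = p11 * p22 - p12 * p21
    p1 = p11 + p12  # frequency of allele 1 at marker 1
    p2 = p11 + p21  # frequency of allele 1 at marker 2
    if d > 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    elif d < 0:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    else:
        return 0.0  # assumption: no disequilibrium when D = 0
    return d / d_max


print(lewontin_d_prime(0.5, 0.0, 0.0, 0.5))      # 1.0
print(lewontin_d_prime(0.0, 0.5, 0.5, 0.0))      # -1.0
print(lewontin_d_prime(0.25, 0.25, 0.25, 0.25))  # 0.0
```

With the definition used here, D' keeps the sign of D, so it ranges from -1 to 1.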

Russell Thomson
Last modified: 17 September 2009