Linkdatagen is a Perl script that generates LINKAGE-style files for ALLEGRO, MERLIN, PREST, MORGAN, PLINK, FEstim, BEAGLE, RELATE and fastPHASE. As input it takes genotype calls from Affymetrix SNP chips, Illumina SNP chips, or SNP genotypes inferred from massively parallel sequencing (MPS) data, such as whole exome or whole genome sequence data.
- The first incarnation of linkdatagen.pl was only able to process genotypes from Affymetrix SNP chips.
- Subsequently, linkdatagen.pl was renamed linkdatagen_affy.pl. Separate scripts were developed for Illumina SNP chip genotypes (linkdatagen_illumina.pl) and MPS genotypes (linkdatagen_mps.pl and companion script vcf2linkdatagen.pl).
- As of 15th May 2012, the three linkdatagen scripts have been combined into a single script named linkdatagen.pl. The type of genotypes being processed is indicated by the -data option ('a' for Affymetrix SNP chip data, 'i' for Illumina SNP chip data, or 'm' for SNP genotypes from MPS data). vcf2linkdatagen.pl remains a separate script that must be run before using linkdatagen.pl with the -data m option.
- On 3rd February 2016, a new annotation format was released that can be used for both MPS and Illumina SNP chip data. Where no annotation file has been released for a newer Illumina SNP chip, the new annotation file may be of benefit. MPS genotypes can now be called with GATK's UnifiedGenotyper.
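As an illustration of the -data option described above, the following Python sketch maps each genotype source to its code and builds an argument list for linkdatagen.pl. The helper name `linkdatagen_args` and the `extra_args` placeholder are hypothetical; the remaining options that linkdatagen.pl needs are covered in its documentation and are deliberately not reproduced here.

```python
# Map each genotype source to the -data code listed above.
DATA_CODES = {
    "affymetrix": "a",  # Affymetrix SNP chip data
    "illumina": "i",    # Illumina SNP chip data
    "mps": "m",         # SNP genotypes from MPS data (run vcf2linkdatagen.pl first)
}


def linkdatagen_args(source, extra_args=()):
    """Return an argv list for linkdatagen.pl with the matching -data code.

    `extra_args` is a hypothetical placeholder for the other options,
    which are described in the linkdatagen documentation.
    """
    try:
        code = DATA_CODES[source.lower()]
    except KeyError:
        raise ValueError(f"unknown genotype source: {source!r}") from None
    return ["perl", "linkdatagen.pl", "-data", code, *extra_args]


if __name__ == "__main__":
    # Show the command prefix for MPS genotypes.
    print(" ".join(linkdatagen_args("mps")))
```

The list form can be passed to `subprocess.run` once the remaining options have been appended; rejecting unknown sources early avoids launching the script with an invalid code.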
Additional files for processing Affymetrix SNP chip data
Download the annotation files. These files are required by LINKDATAGEN and are generated by us from the (very large) Affymetrix annotation files and HapMap Phase I, II and III data.
For human genome build NCBI36/hg18 download the Affymetrix build 36 annotation files. Last updated 29th April 2013.
For human genome build GRCh37/hg19 download the Affymetrix build 37 annotation files. Last updated 15th November 2012.
For an example download the test data set affymetrix_testdata.tar.gz. Last updated 15th November 2012.
Additional files for processing most Illumina SNP chip data
The annotation files are required by LINKDATAGEN and are generated by us from the Illumina annotation files and HapMap Phase I, II and III data. We support a range of Illumina chips including the 370Duo, 610Quad, Cyto12, OmniExpress, and 1M chips. Use these annotation files when your data come from one of these chips. For several other chips, the linkdatagen documentation suggests the best annotation file to use.
For human genome build GRCh37/hg19 download the Illumina annotation files. Last updated 3rd February 2016.
Additional files for processing SNP genotypes obtained from MPS and some Illumina SNP chip data
For MPS data you will need to download vcf2linkdatagen.pl, a companion script that converts VCF files into a BRLMM genotype call file that can be processed by linkdatagen.pl. Last updated 3rd February 2016.
Download the vcf2linkdatagen documentation. Last updated 3rd February 2016.
Download our quick-start guide to processing MPS genotypes for linkage analysis. Last updated 3rd February 2016.
Download the HapMap Phase II annotation files (genome build b37). Last updated 3rd February 2016. Annotation for up to 4,031,388 SNPs for the four HapMap Phase II populations (CHB, CEU, JPT, YRI).
Download the HapMap Phase III annotation files (genome build b37). Last updated 3rd February 2016. Annotation for up to 1,582,941 SNPs for the eleven HapMap Phase III populations.
If you wish to perform genotype calling with GATK's UnifiedGenotyper, download the HapMap Phase II or HapMap Phase III VCF file that matches the annotation file you plan to use.
Test data for MPS data
We provide the test data presented in Smith KR et al. (2011). Last updated 3rd February 2016.
Included in the test data:
VCF files containing SNP genotypes at the location of HapMap Phase II SNPs:
(i) Family A: Single affected individual - A-7.HapMapII.SNPs.vcf, recessive family, homozygosity mapping
(ii) Family T: Single affected individual - T-1.HapMapII.SNPs.vcf, recessive family, homozygosity mapping
(iii) Family M: Two affected siblings - M-3.HapMapII.SNPs.vcf and M-4.HapMapII.SNPs.vcf, dominant family.
SNP genotypes from Illumina genotyping arrays for the same individuals:
(i) Family A: A-7.FinalReport.txt
(ii) Family T: T-1.FinalReport.txt
(iii) Family M: M-3.FinalReport.txt and M-4.FinalReport.txt
VCF files containing SNP genotypes at the location of genotyping array SNPs (for concordance checks):
(i) Family A: A-7.IL610Q.SNPs.vcf
(ii) Family T: T-1.IL610Q.SNPs.vcf
(iii) Family M: M-3.ILOE.SNPs.vcf and M-4.ILOE.SNPs.vcf
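Because each individual has three files (a HapMap Phase II VCF, an array FinalReport and an array-position VCF), it can help to group them before running concordance checks. The Python sketch below does this using only the file names listed above; the `<family>-<individual>.<rest>` pattern is an assumption inferred from those names, and `group_by_individual` is a hypothetical helper, not part of linkdatagen.

```python
import re
from collections import defaultdict

# File names exactly as listed in the test data description.
TEST_FILES = [
    "A-7.HapMapII.SNPs.vcf", "T-1.HapMapII.SNPs.vcf",
    "M-3.HapMapII.SNPs.vcf", "M-4.HapMapII.SNPs.vcf",
    "A-7.FinalReport.txt", "T-1.FinalReport.txt",
    "M-3.FinalReport.txt", "M-4.FinalReport.txt",
    "A-7.IL610Q.SNPs.vcf", "T-1.IL610Q.SNPs.vcf",
    "M-3.ILOE.SNPs.vcf", "M-4.ILOE.SNPs.vcf",
]

# Assumed naming pattern: <family letter>-<individual number>.<rest>
NAME_RE = re.compile(r"^(?P<family>[A-Z])-(?P<individual>\d+)\.(?P<rest>.+)$")


def group_by_individual(names):
    """Group file names by (family, individual), keeping input order."""
    groups = defaultdict(list)
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name!r}")
        groups[(match["family"], match["individual"])].append(name)
    return dict(groups)


if __name__ == "__main__":
    for (family, individual), files in group_by_individual(TEST_FILES).items():
        print(f"Family {family}, individual {individual}: {', '.join(files)}")
```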
Email bug reports & questions to Melanie Bahlo (firstname.lastname@example.org).
If you use linkdatagen.pl, please acknowledge by citing:
If you use linkdatagen.pl and/or vcf2linkdatagen.pl to process MPS genotypes, please also cite:
Smith KR, Bromhead CJ, Hildebrand MS, Shearer AE, Lockhart PJ, Najmabadi H, Leventer RJ, McGillivray G, Amor DJ, Smith RJ, Bahlo M (2011). Reducing the exome search space for Mendelian diseases using genetic linkage analysis of exome genotypes. Genome Biology 12:R85.