Sample data generation for PLINK Pedigree and Map files

The tutorial uses the following PLINK-format sample data files:



They have been generated as follows:

  1. Go to the Genapha project page.
  2. Navigate to Mapping Tools → Pathway to gene. Search for the “asthma” pathway, selecting “Text” as the pathway input.
  3. In Additional Options, select “skip the results page and export to file” and enter a file name.
  4. In the downloaded file, remove the header line and the first column, which contains the pathway name (asthma).
    awk 'NR > 1 {print $2 "\t" $3}' asthmaGenesIn > asthmaGenesOut
  5. In Genapha, navigate to Mapping Tools → Gene to SNP. Upload the modified gene list via File Upload.
  6. Select “skip the results page and export to file” and enter a file name.
  7. After pressing the Next button, wait a few minutes while the Genapha server processes your request.
  8. When a Save As ... dialogue appears, save the list of SNPs corresponding to the asthma pathway.
  9. Use awk to delete the header row of this SNP list file and keep only the SNP column (the second column, as in the command below).
    awk '{if (NR!=1) {print $2;}}' snpListIn > snpListOut
  10. Run PLINK on the HapMap Phase 3 CEU data (in PLINK format) to create new PED/MAP files restricted to the SNPs in the newly created list.
    plink --file hapmap3_r2_b36_fwd.CEU.qc.poly --extract snpListOut --out asthma_pathway --noweb --recode
  11. Replace the missing phenotypes (column 6 of the PED file is -9 for every individual) with random 1s and 2s (unaffected/affected status).
    perl -r 6 asthma_pathway.ped asthma_pathway_rand.ped
  12. (Optional) Tests that require complete trios (two parents plus offspring, all genotyped; e.g., the Nyholt test in Path2) need offspring without two genotyped parents removed. To generate the sample file asthma_pathway_rand_onlytrios.ped, the following offspring were removed: 10852, 10853, 12375, 12707, 12708, 10830, 10836, 10838
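
Steps 11 and 12 can also be sketched in Python. This is a minimal illustration under stated assumptions, not the script that produced the sample files: the function names randomize_phenotypes, drop_individuals, and rewrite_ped are hypothetical, and the sketch assumes whitespace-delimited PED lines with the individual ID in column 2 and the phenotype in column 6.

```python
import random


def randomize_phenotypes(ped_lines, column=6, seed=None):
    """Return PED lines with the 1-based `column` set to a random 1 or 2.

    Assumption: fields are whitespace-delimited; output fields are
    re-joined with single spaces.
    """
    rng = random.Random(seed)  # pass a seed for a reproducible assignment
    out = []
    for line in ped_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        fields[column - 1] = str(rng.choice((1, 2)))
        out.append(" ".join(fields))
    return out


def drop_individuals(ped_lines, ids):
    """Return PED lines whose individual ID (column 2) is not in `ids`.

    Assumption: `ids` are written exactly as they appear in column 2.
    """
    ids = set(ids)
    kept = []
    for line in ped_lines:
        fields = line.split()
        if fields and fields[1] not in ids:
            kept.append(line)
    return kept


def rewrite_ped(src, dst, drop_ids=(), seed=None):
    """Read `src`, drop the listed individuals, randomize phenotypes, write `dst`."""
    with open(src) as fh:
        lines = [line.rstrip("\n") for line in fh]
    lines = drop_individuals(lines, drop_ids)
    lines = randomize_phenotypes(lines, seed=seed)
    with open(dst, "w") as fh:
        fh.writelines(line + "\n" for line in lines)
```

For example, rewrite_ped("asthma_pathway.ped", "asthma_pathway_rand.ped", seed=1) would write a copy with randomized phenotypes. Passing the offspring IDs from step 12 as drop_ids would also remove those individuals, provided the IDs are given in the same form as column 2 of the PED file. The MAP file needs no change, since only individuals and phenotypes are altered.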
