Path2 Tutorial



Analysis Options tab

Once the genotype data is loaded and all mapping files have been retrieved, Path2 will advance to the “Analysis Options” tab. The Analysis Options tab has three purposes:

  • Run PLINK-based single SNP association analyses
  • Run Perl-based genetic pathway association analyses
  • Generate Linkage Disequilibrium (LD) plots

If you have been following the tutorial, the Analysis Options tab will look like this:

Analysis Options tab

Order of operations when the “Run” button is pressed

  1. First, the *one* selected PLINK-based single SNP association analysis is run to generate an analysis results file. PLINK writes this results file to:

    Path2/analysis/singlesnp/Combined/default/dat.out.(assoc.linear, assoc.logistic, tdt)

    The file's SNPs, P-values, and test statistics are then copied to:

    Path2/analysis/singlesnp/Combined/default/singleSNP.results.forDB

  2. This single results file is then used in *all* of the Perl-based genetic pathway association analyses. To generate pathway analysis results for several single SNP association tests, you must select and “Run” each single SNP test separately, and either (a) copy the pathway association analysis results to a different location after each run, or (b) specify a different output folder for the pathway tests before each run.
  3. The Generate LD Plots feature uses only the single SNP association analysis results file, and is run after all Pathway/Ontology tests are completed.
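
The order of operations above can be summarized as a short sketch. It is a hypothetical illustration written in Python, not Path2's Java code; the function `plan_run`, its parameters, and the step labels are assumptions made for illustration.

```python
def plan_run(single_snp_test, pathway_tests, ld_plots):
    """List, in order, the steps performed when "Run" is pressed (hypothetical).

    single_snp_test: "logistic", "linear", "tdt", or "skip" (use imported results).
    pathway_tests:   names of the checked Pathway/Ontology tests.
    ld_plots:        True if "Generate LD Plots" is checked.
    """
    results = "Path2/analysis/singlesnp/Combined/default/singleSNP.results.forDB"
    # Step 1: exactly one single SNP result set is produced (or imported) and copied.
    if single_snp_test == "skip":
        steps = [f"copy imported results to {results}"]
    else:
        steps = [f"run PLINK --{single_snp_test}, copy results to {results}"]
    # Step 2: every checked pathway/ontology test reads that same file.
    steps += [f"run {test} on {results}" for test in pathway_tests]
    # Step 3: LD plots are generated last, after all pathway/ontology tests.
    if ld_plots:
        steps.append("generate LD plots with Haploview")
    return steps


for step in plan_run("logistic", ["SRT", "Sidak"], ld_plots=True):
    print(step)
```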

Performing single SNP association analyses

The “Single SNP Association Analysis Options” panel lets you specify which type of PLINK single SNP analysis Path2 runs. The panel has four tabs, and Path2 automatically enables or disables certain options based on the dataset imported in the Import Data panel in the previous step:

  • Logistic regression: for binary phenotype data. Calls PLINK with the --logistic flag.
  • Linear regression: for continuous phenotype data. Calls PLINK with the --linear flag.
  • Family-based analysis: calls PLINK with the --tdt flag to run a Transmission Disequilibrium Test (TDT).
  • Skip (use imported results): PLINK is not called. The association results imported in the Import Data panel are used instead.
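
Path2 chooses which tabs to enable from the imported dataset, but the tutorial does not spell out its rules. The sketch below shows one plausible way to tell a binary phenotype from a continuous one, assuming PLINK's default coding (1 = unaffected, 2 = affected, -9 or 0 = missing). The function `suggest_test` and its decision rule are hypothetical.

```python
# Codes treated as missing here; PLINK uses -9 (and, for case/control traits, 0) by default.
MISSING = {"0", "-9"}


def suggest_test(phenotypes, family_based=False):
    """Suggest a Single SNP tab from phenotype values (a hypothetical rule)."""
    if family_based:
        return "Family-based analysis (--tdt)"
    observed = {value for value in phenotypes if value not in MISSING}
    if observed and observed <= {"1", "2"}:
        # Only PLINK case/control codes remain: 1 = unaffected, 2 = affected.
        return "Logistic regression (--logistic)"
    return "Linear regression (--linear)"


print(suggest_test(["1", "2", "2", "-9"]))
print(suggest_test(["0.53", "1.70", "-9"]))
```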

The Logistic and Linear regression tests each offer several analysis models (see the PLINK documentation for details). Path2 includes the following options:

  • Allelic association test: PLINK default.
  • Genotypic association test: uses PLINK --genotypic flag.
  • Dominant gene action test: uses PLINK --dominant flag.
  • Recessive gene action test: uses PLINK --recessive flag.
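
As a rough illustration of how the tabs and model options above translate into a PLINK invocation, the sketch below builds an argument list and the name of the results file PLINK should write. The executable name `plink`, the `--bfile` input prefix, and the function `plink_command` are assumptions. The `--out` prefix mirrors the dat.out.* names described in this tutorial, and Path2's actual command line may include other options.

```python
# Single SNP tabs and model options from this section, mapped to PLINK 1.x flags.
TEST_FLAGS = {"logistic": "--logistic", "linear": "--linear", "tdt": "--tdt"}
MODEL_FLAGS = {"allelic": None, "genotypic": "--genotypic",
               "dominant": "--dominant", "recessive": "--recessive"}
# Extensions PLINK appends to the --out prefix for each test.
OUTPUT_EXT = {"logistic": ".assoc.logistic", "linear": ".assoc.linear", "tdt": ".tdt"}


def plink_command(bfile, test, model="allelic",
                  out="Path2/analysis/singlesnp/Combined/default/dat.out"):
    """Return a PLINK argument list and the results file it should write (a sketch)."""
    cmd = ["plink", "--bfile", bfile, TEST_FLAGS[test]]
    model_flag = MODEL_FLAGS[model]
    if model_flag is not None:
        if test == "tdt":
            raise ValueError("model options apply only to logistic/linear regression")
        cmd.append(model_flag)
    cmd += ["--out", out]
    return cmd, out + OUTPUT_EXT[test]
```

For example, `plink_command("mydata", "logistic", "dominant")`, where `mydata` is a hypothetical input prefix, returns `plink --bfile mydata --logistic --dominant --out Path2/analysis/singlesnp/Combined/default/dat.out` as an argument list. It also returns `Path2/analysis/singlesnp/Combined/default/dat.out.assoc.logistic` as the expected results file. You can pass the list directly to `subprocess.run(cmd, check=True)`.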

When you click the Run button, Path2 calls PLINK to perform the selected single SNP analysis. If PLINK runs successfully, its results file will be in Path2/analysis/singlesnp/Combined/default. The original association results file generated by PLINK is named, for example, dat.out.assoc.logistic for a Logistic regression test or dat.out.assoc.linear for a Linear regression test. Regardless of the test, and also when imported results are used, Path2 copies the results to Path2/analysis/singlesnp/Combined/default/singleSNP.results.forDB. This file is used for pathway analysis and LD plot generation.
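
The copy step described above can be sketched as follows. It extracts the SNP, test statistic, and P-value columns by header name, which works for both regression output (STAT) and TDT output (CHISQ). The function `write_results_for_db`, its `test_row` filter, and the tab-separated output layout are assumptions. Path2 itself defines the exact format of singleSNP.results.forDB.

```python
import csv


def write_results_for_db(plink_results, for_db, test_row=None):
    """Copy the SNP, test statistic and P-value columns of a PLINK results file.

    The tab-separated SNP/STAT/P layout written here is an assumption; the real
    singleSNP.results.forDB format is whatever Path2 writes.
    """
    with open(plink_results) as src, open(for_db, "w", newline="") as dst:
        header = src.readline().split()  # PLINK output is whitespace-delimited
        # Regression output names its statistic STAT; PLINK's .tdt output uses CHISQ.
        stat_name = "STAT" if "STAT" in header else "CHISQ"
        snp_i, stat_i, p_i = (header.index(name) for name in ("SNP", stat_name, "P"))
        test_i = header.index("TEST") if "TEST" in header else None
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(["SNP", "STAT", "P"])
        for line in src:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if test_row and test_i is not None and fields[test_i] != test_row:
                continue  # e.g. keep only "ADD" rows when covariates add extra rows
            writer.writerow([fields[snp_i], fields[stat_i], fields[p_i]])
```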

Ensure that the Logistic regression tab and the Allelic association test option are selected, then click the Run button.

Running a Logistic regression Allelic association analysis on the sample dataset

Here are the header and an example line of a logistic association results file produced from the sample dataset:

CHR  SNP         BP         A1  TEST  NMISS  OR     STAT    P
1    rs12135788  157531167  G   ADD   165    1.142  0.3589  0.7197
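
The row above can be checked by hand. For logistic regression, PLINK's STAT column is the Wald statistic for the test (the log odds ratio divided by its standard error), and P is the corresponding two-sided normal P-value. This minimal sketch uses only the Python standard library to parse the row and recompute P from STAT. The variable names are illustrative.

```python
import math

HEADER = "CHR SNP BP A1 TEST NMISS OR STAT P".split()
ROW = "1 rs12135788 157531167 G ADD 165 1.142 0.3589 0.7197".split()

record = dict(zip(HEADER, ROW))

# Two-sided P-value for a Wald z statistic: P = erfc(|z| / sqrt(2)).
z = float(record["STAT"])
p_from_stat = math.erfc(abs(z) / math.sqrt(2))

# Standard error implied by OR and STAT: SE = ln(OR) / z.
implied_se = math.log(float(record["OR"])) / z

print(record["SNP"], round(p_from_stat, 4), round(implied_se, 3))
```

The recomputed P-value matches the reported 0.7197 to within rounding, and the implied standard error is about 0.37.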

Running Pathway/Ontology association analyses

Perhaps the most useful application of Path2 is performing pathway- and ontology-based association analyses. Path2 currently includes four built-in pathway tests and one built-in gene ontology test.

These tests are implemented in Perl, and the algorithms for all of them are described in the literature. The Perl scripts are in the folder Path2/Perl. (Additional documentation on running the scripts separately from Path2’s Java interface can be found in the Word documents in Path2/Perl.) For example, the source for the SNP Ratio Test consists of the following files:

  • runSRT.pl
  • SRT/Constants.pm
  • SRT/MyTools.pm
  • SRT/SRT.pm
  • SRT/Tools.pm
  • SRT/R/computeSRT.R

Please note that these Perl scripts rely on the R statistical programming language for the backend computational work. This is why Java, Perl, and R must all be installed for Path2 to run correctly.
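
Before launching Path2, you can check that all three runtimes are on your PATH. The sketch below uses Python's `shutil.which`. The command names it looks for (`java`, `perl`, and `R` or `Rscript`) are assumptions about a typical installation.

```python
import shutil

# Command names are assumptions: R may be reachable only as "Rscript" on some systems.
REQUIRED = {"Java": ["java"], "Perl": ["perl"], "R": ["R", "Rscript"]}


def find_first(names):
    """Return the path of the first executable in `names` found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def check_prerequisites():
    """Map each runtime Path2 needs to its executable path (None if not found)."""
    return {tool: find_first(names) for tool, names in REQUIRED.items()}


missing = [tool for tool, path in check_prerequisites().items() if path is None]
print("Missing:", ", ".join(missing) if missing else "none")
```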

To run a particular pathway/ontology test, select its check box in the “Pathway/Ontology Association Analysis Options” panel, then click the Run button. As noted in “Performing single SNP association analyses”, the tests use the single SNP association results file at:

Path2/analysis/singlesnp/Combined/default/singleSNP.results.forDB

Path2 always runs the specified single SNP association test immediately before running the selected pathway tests.

** PLEASE NOTE! Several of the pathway tests (for example, the SNP Ratio Test) can take a very long time to run, depending on the size of the dataset. Genome-wide association study (GWAS) level datasets must be run on a dedicated computational cluster. At this time, we highly recommend that you first run Path2 on the synthetic asthma sample dataset provided in “Path2/data/sampleData”. This confirms that your installation is set up correctly and lets you become familiar with the application. In addition, a few potential problems with running Path2 on GWAS datasets have been identified and will be addressed in future versions of the application. **

Specifying the output folder for Pathway/Ontology tests

By default, pathway/ontology test results are written to the Path2/analysis/pathway folder. If, for example, you run the Sidak test and find no results under Path2/analysis/pathway/Sidak/Combined/default, check the terminal output or the log file Path2/log.txt for errors.
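
If you script your runs, a check like the following can save a trip through the file system. It looks in the default results folder described above and, if that folder is empty, prints the end of Path2/log.txt. The function `check_pathway_results` is hypothetical. The folder layout comes from the Sidak example in this section, and other tests may write to differently named folders.

```python
from pathlib import Path


def check_pathway_results(path2_root, test_name, log_lines=20):
    """Return result files for a pathway test; if there are none, show the log tail.

    Layout assumed from the tutorial: <Path2>/analysis/pathway/<Test>/Combined/default.
    """
    root = Path(path2_root)
    out_dir = root / "analysis" / "pathway" / test_name / "Combined" / "default"
    files = []
    if out_dir.is_dir():
        files = sorted(p for p in out_dir.iterdir() if p.is_file())
    if not files:
        log = root / "log.txt"
        if log.exists():
            tail = log.read_text(errors="replace").splitlines()[-log_lines:]
            print(f"No results in {out_dir}; last lines of {log}:")
            print("\n".join(tail))
        else:
            print(f"No results in {out_dir} and no log file at {log}")
    return files
```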

To change the output folder for pathway tests, click the “Browse” button in the “Select Output Folder for Pathway/Ontology Association Analysis Options” panel. The file chooser expects you to select a directory, in which subsequent Pathway/Ontology test results will be stored.

This feature is useful, for example, if you wish to run several slightly different pathway analyses on the same dataset without having to copy the Path2 output to a different folder after each analysis.
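
If you prefer the other approach, copying results to a different location after each run, a small script can do it. This sketch assumes the default output folder Path2/analysis/pathway. The function `archive_pathway_results`, the `pathway_archive` destination, and the label-plus-timestamp naming are hypothetical choices.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_pathway_results(path2_root, label, dest_root="pathway_archive"):
    """Copy <Path2>/analysis/pathway to <dest_root>/<label>_<timestamp>; return the copy."""
    src = Path(path2_root) / "analysis" / "pathway"
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"{label}_{stamp}"
    # copytree raises FileExistsError if dest exists, so earlier archives are never overwritten.
    shutil.copytree(src, dest)
    return dest
```

For example, after a logistic/dominant run you might call `archive_pathway_results("Path2", "logistic_dominant")`, then change the single SNP options and run again. Because `shutil.copytree` refuses to write into an existing folder, two archives made with the same label within the same second raise an error instead of silently merging.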

Generating LD (Linkage Disequilibrium) plots

Path2 can optionally generate LD plots for each gene in the loaded dataset. SNPs are assigned to genes based on NCBI designations. This option is not available when only an analysis results file is loaded in the “Import Data” panel. The LD plots are generated by Haploview, which is included with Path2 as Path2/Perl/Haploview.jar. Path2 uses Haploview version 4.1, built from the source available on the project’s SourceForge page. To run Haploview, Path2 converts its binary PLINK data to Haploview-format pedigree and position files, then runs Haploview with the following options:

  • maxDistance : the maximum intermarker distance for LD comparisons (in kilobases)
  • ldvalues : the LD value Haploview prints in each cell of the LD plot (one of "DPRIME", "RSQ", or "NONE")
  • ldcolorscheme : the color scheme used for Haploview LD plots (one of "DEFAULT", "RSQ", "DPALT", "GAB", or "GAM")
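
The conversion and Haploview steps can be approximated with two command lines. PLINK's `--recodeHV` option writes Haploview-format `.ped` and `.info` files, and Haploview's batch mode (`-nogui`) draws the plot. This is a sketch under assumptions, not Path2's actual invocation. The function `ld_plot_commands`, the per-gene SNP list passed to `--extract`, the `plink` and `java` executable names, and the default `max_distance_kb=500` are illustrative choices. The accepted option values follow the Haploview command-line documentation.

```python
# Accepted values for Haploview's -ldvalues and -ldcolorscheme options.
LD_VALUES = {"DPRIME", "RSQ", "NONE"}
LD_COLOR_SCHEMES = {"DEFAULT", "RSQ", "DPALT", "GAB", "GAM"}


def ld_plot_commands(bfile, snp_list, out, max_distance_kb=500,
                     ld_values="DPRIME", color_scheme="DEFAULT",
                     haploview_jar="Path2/Perl/Haploview.jar"):
    """Return (plink_cmd, haploview_cmd) argument lists for one gene's LD plot."""
    if ld_values not in LD_VALUES:
        raise ValueError(f"ldvalues must be one of {sorted(LD_VALUES)}")
    if color_scheme not in LD_COLOR_SCHEMES:
        raise ValueError(f"ldcolorscheme must be one of {sorted(LD_COLOR_SCHEMES)}")
    # Keep only the gene's SNPs, then write Haploview-format <out>.ped and <out>.info.
    plink_cmd = ["plink", "--bfile", bfile, "--extract", snp_list,
                 "--recodeHV", "--out", out]
    # Run Haploview without its GUI and save the LD plot as a PNG under the <out> prefix.
    haploview_cmd = ["java", "-jar", haploview_jar, "-nogui",
                     "-pedfile", out + ".ped", "-info", out + ".info",
                     "-maxDistance", str(max_distance_kb),
                     "-ldvalues", ld_values, "-ldcolorscheme", color_scheme,
                     "-png", "-out", out]
    return plink_cmd, haploview_cmd
```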

To generate LD plots for the sample dataset genes:

  1. Select an appropriate single SNP analysis option. See “Performing single SNP association analyses” above.
  2. (Optional) Deselect all Pathway/Ontology association analysis options, since they can take a while to run even for small datasets.
  3. Select the “Generate LD Plots” check box and click on the “Run” button.

If Haploview runs successfully, the LD plots will be in Path2/analysis/LD.

Example LD plot - gene 2205
