Path2 Tutorial



Running Path2

Windows - To start Path2, double-click startPath2.bat. This opens Path2 from a terminal so that you can see extra feedback from the program. (You can also run the JAR file Path2.jar directly and refer to the log.txt file for the same feedback.)

Linux - To run Path2 from a terminal you may use: java -jar Path2.jar

With a fresh install of Path2, the first thing that you will see on start-up is the Import Data tab.

Import Data tab

Restarting Path2

** PLEASE NOTE! Once you have imported data into Path2, you must have Path2 delete that data before you can import more and run a new analysis. To have Path2 delete all data and analysis results and return to the Import Data tab, select File → Clean the Database. The files that will be deleted are listed below. Please make sure you have saved all desired analysis files before cleaning the database! **

data/dat.* (for example, dat.fam, dat.bim, dat.bed, dat.log, dat.opt)

analysis/* (all files in the analysis folder)

Perl/Hash/* (all files in this folder)
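Since cleaning the database removes these files, it can help to copy them elsewhere first. The following is a minimal Python sketch, not part of Path2, that collects the files matching the three patterns above and copies them to a backup folder. The function names and the backup location are hypothetical, and the sketch assumes the patterns are relative to the Path2 installation folder and only looks at files directly inside each folder.

```python
import glob
import os
import shutil


def files_to_be_cleaned(path2_dir):
    """Return the files matching the three patterns listed in the tutorial.

    Assumption: the patterns are relative to the Path2 installation folder.
    Only plain files directly inside each folder are returned.
    """
    patterns = [
        os.path.join(path2_dir, "data", "dat.*"),
        os.path.join(path2_dir, "analysis", "*"),
        os.path.join(path2_dir, "Perl", "Hash", "*"),
    ]
    matches = []
    for pattern in patterns:
        matches.extend(p for p in glob.glob(pattern) if os.path.isfile(p))
    return sorted(matches)


def back_up(path2_dir, backup_dir):
    """Copy each matching file into backup_dir, keeping its relative path."""
    copied = []
    for src in files_to_be_cleaned(path2_dir):
        dest = os.path.join(backup_dir, os.path.relpath(src, path2_dir))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

For example, back_up("Path2", "Path2_backup") could be run before selecting File → Clean the Database.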

Importing genotype data

Path2 currently accepts genotype data in the form of PLINK PED/MAP files, binary PLINK BIM/BED/FAM files, or association results files. In the Import Genotype Information panel, PED/MAP files are specified in the Standard PLINK tab, binary PLINK files in the Binary PLINK tab, and association results in the Summary Results tab.

If specifying an association results file, it should have at least two tab-delimited columns titled exactly SNP and P:

  • SNP (rs number) = rs12345 for example
  • P (p-value) = 0.7194 for example

For example, the GABRIEL study results file “sampleData/gabriel_asthma_rs-P_fix.txt” may be imported via the Import Genotype Information → Summary Results tab, then used to run the Sidak or ALIGATOR pathway tests (which require only SNP IDs and p-values).
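Before importing a summary results file, it may be worth checking that it has the two required columns. The sketch below is an illustration only and not part of Path2: the function name check_summary_results is hypothetical, and the check that P lies between 0 and 1 is an assumption about what a valid p-value column looks like, not a rule stated by Path2.

```python
import csv


def check_summary_results(path):
    """Check a tab-delimited file for SNP and P columns.

    Returns (rows_read, problems), where problems is a list of messages.
    """
    problems = []
    rows_read = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        # The column titles must be exactly "SNP" and "P".
        for required in ("SNP", "P"):
            if required not in header:
                problems.append(f"missing column: {required}")
        if problems:
            return 0, problems
        # Data rows start on line 2, after the header line.
        for line_no, row in enumerate(reader, start=2):
            rows_read += 1
            try:
                p = float(row["P"])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: P is not a number: {row['P']!r}")
                continue
            # Assumption: a p-value should fall within [0, 1].
            if not 0.0 <= p <= 1.0:
                problems.append(f"line {line_no}: P outside [0, 1]: {p}")
    return rows_read, problems
```

An empty problems list suggests the file has the expected layout; extra columns are ignored by this check.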

(Please note: future versions of Path2 will add support for DATA/PED/MAP files. A DATA file lets you specify the format of the PED file used.)

Using the file choosers in the Standard PLINK tab, select the sample PED and MAP files asthma_pathway.ped and asthma_pathway.map. You may find them in: Path2/data/sampleData


How are insertions, deletions, and copy number variants treated?

The "SNP" column must contain rs numbers of the form rs1234 (rs[0-9]+). These rs numbers *must* correspond only to SNPs, and not, for example, to insertions or deletions. All entries not of the rs form will be safely ignored by Path2. For example, copy number variants (CNVs) of the form cnv1234 will not be included in PLINK output files or in any subsequent step in running Path2.
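The rs form described above can be checked with a regular expression. This is a minimal sketch of such a filter, not Path2's own code; the function names are hypothetical, and treating the match as case-sensitive is an assumption. It only checks the form of the identifier and cannot tell whether an rs number refers to a SNP or to another kind of variant.

```python
import re

# The pattern given in the tutorial: "rs" followed by one or more digits.
RS_ID = re.compile(r"rs[0-9]+")


def is_rs_id(identifier):
    """Return True if the whole identifier has the form rs[0-9]+."""
    return RS_ID.fullmatch(identifier) is not None


def keep_rs_ids(identifiers):
    """Keep only identifiers of the rs form, preserving their order."""
    return [i for i in identifiers if is_rs_id(i)]
```

For instance, keep_rs_ids(["rs1234", "cnv1234"]) keeps rs1234 and drops cnv1234.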


Mapping files

Path2 performs genetic pathway association analyses using “mapping files”, which contain information about how SNPs, genes, KEGG pathways, and gene ontologies are related. You may manually select these mapping files via the Import Mapping files tab. Alternatively, you may select the “Download from Internet?” check box and Path2 will fetch the required information from the internet. At this time, Path2 cannot fetch Gene Ontology information from the internet, so the next step is to manually specify two gene ontology information files that work with our asthma pathway sample dataset.

** PLEASE NOTE! The mapping files generated automatically by Path2 are dataset specific (meaning that Path2 fetches only the information that it needs for the specified dataset) and must be regenerated for each dataset that is loaded into the application. In addition, it is a good idea to periodically (for example, weekly) re-download any needed mapping files, since the KEGG and NCBI databases are updated quite frequently. **

Using the file choosers in the Import Mapping files tab, select the “Gene to gene ontology category” and “Gene ontology category to type” files. For this tutorial these are:

“Gene to gene ontology category” : Path2/Perl/data/ALIGATOR/gene_go_17_03_09.dat

“Gene ontology category to type” : Path2/Perl/data/ALIGATOR/pfc_17_03_09.dat

(Please note: future versions of Path2 will add support for automatically downloading gene ontology information.)

At this point your Import data tab should look like the following figure:

Import Data tab with fields specified

Your selections for the various file choosers will be saved in the plaintext file Path2/path.properties and reloaded upon subsequent runs of Path2 for convenience. You may edit this file while Path2 is not running.

Continuing

Press the “Next” button to start processing the provided data files.

When the Next button is pressed in the Import data tab, assuming all needed files are specified, Path2 will call your system’s installed PLINK version to process the genotype data files. The command used to run PLINK depends on the visible sub-tab in the Import Genotype Information panel:

  • Standard PLINK: PLINK will be called to make Binary PLINK files from the specified PED/MAP files. The output will be Path2/data/dat.fam, dat.bim, dat.bed (the Binary PLINK files), dat.opt (options used to run PLINK), and dat.log (PLINK log file). If the “Set missing phenotype as 0” option is specified then PLINK will treat 0 as meaning missing, otherwise -9 will indicate missingness (default PLINK behaviour).
  • Binary PLINK: PLINK will be called to make Binary PLINK files from the specified BED/BIM/FAM files. The output will be the same as for Standard PLINK (dat.fam, dat.bim, dat.bed, dat.opt, dat.log).
  • Summary Results: The association summary results file will be copied to Path2/data/dat.assoc . PLINK will not be called.
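The tutorial does not show the exact command line that Path2 passes to PLINK. As a rough illustration of the Standard PLINK case, the sketch below builds a comparable command from standard PLINK 1.x options (--ped, --map, --make-bed, --out, --missing-phenotype). The function name, the executable name "plink", and the output prefix data/dat are assumptions made for this example.

```python
def plink_make_bed_command(ped, map_file, out_prefix="data/dat",
                           missing_phenotype_zero=False):
    """Build a PLINK command that converts PED/MAP files to binary files.

    Assumptions: the PLINK executable is called "plink" and the output
    prefix data/dat yields dat.bed, dat.bim, dat.fam and dat.log.
    """
    command = [
        "plink",
        "--ped", ped,
        "--map", map_file,
        "--make-bed",
        "--out", out_prefix,
    ]
    if missing_phenotype_zero:
        # Mirrors the "Set missing phenotype as 0" option described above.
        command += ["--missing-phenotype", "0"]
    return command
```

The returned list could be passed to subprocess.run(command, check=True) on a system where PLINK is installed and on the PATH.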

If the “Download from Internet?” check box is enabled, Path2 will also attempt to fetch mapping files from the internet. It uses NCBI’s dbSNP database via Entrez queries to obtain SNP to gene information, and KEGG’s database to obtain KEGG pathway to gene information. By default, it saves the files to the following locations, overwriting any existing copies:

Path2/Perl/data/KEGG/geneKeggFile.txt

Path2/Perl/data/KEGG/snpKeggFile.txt

Path2/Perl/data/dbSNP/snpGeneFile.txt

If you are not connected to the internet, or the mapping files do not download correctly for some reason, for the sake of the tutorial you may use the provided mapping files:

Path2/Perl/data/KEGG/asthma_pathway_geneKeggFile.txt

Path2/Perl/data/KEGG/asthma_pathway_snpKeggFile.txt

Path2/Perl/data/dbSNP/asthma_pathway_snpGeneFile.txt

To use them:

  1. Restart Path2
  2. Select File → Clean the database
  3. You should be at the Import data tab
  4. Reselect the sample PED/MAP files as necessary
  5. Uncheck the Download from Internet check box
  6. Specify the mapping files manually via the file choosers as follows:

    “SNP ID to Gene ID File” : Path2/Perl/data/dbSNP/asthma_pathway_snpGeneFile.txt

    “Gene ID to KEGG pathway file” : Path2/Perl/data/KEGG/asthma_pathway_geneKeggFile.txt

    “SNP ID to KEGG pathway file” : Path2/Perl/data/KEGG/asthma_pathway_snpKeggFile.txt

    “Gene to gene ontology category” : Path2/Perl/data/ALIGATOR/gene_go_17_03_09.dat

    “Gene ontology category to type” : Path2/Perl/data/ALIGATOR/pfc_17_03_09.dat

  7. Click the Next button again to retry loading the data
