BPHunter standalone is for users who want to run BPHunter locally from the command line
(it is also available on GitHub).
Current Version: 2
Please download and replace the programs and reference datasets.
Programs:
BPHunter_VCF.py
BPHunter_VCF_batch.py
Dependency:
The code is written in python3 and requires bedtools to be installed.
Reference Datasets:
Download them and put them in the same folder as the BPHunter programs.
Data_BPHunter_GRCh37_detection_all.bed
Data_BPHunter_GRCh37_detection_canonical.bed
Data_BPHunter_GRCh38_detection_all.bed
Data_BPHunter_GRCh38_detection_canonical.bed
The following two BED files can be loaded into IGV to visualize BP locations alongside the variants of interest.
Data_BPHunter_GRCh37_BP_IGV.bed
Data_BPHunter_GRCh38_BP_IGV.bed
File Format:
Input: Variants in VCF format, with 5 mandatory and tab-delimited fields (CHROM, POS, ID, REF, ALT).
Output: BPHunter-detected variants will be output with the following annotations.
SAMPLE (only for BPHunter_VCF_batch.py)
CHROM, POS, ID, REF, ALT (exactly the same as input)
STRAND
VAR_TYPE (snv, x nt-del, x nt-ins)
GENE
TRANSCRIPT_IVS (e.g., ENST123456789_IVS10)
CANONICAL (canonical transcript_IVS)
BP_NAME (m/e/cBP_chrom_pos_strand_nucl)
BP_ACC_DIST (distance from BP to acceptor site)
BP_RANK (rank of BP in this intron)
BP_TOTAL (total number of BP in this intron)
BP_HIT (BP position (-2, -1, 0) hit by the variant)
BP_SOURCE (number of sources supporting this BP position)
CONSENSUS (1:YTNAY, 2:YTNA, 3:TNA, 4:YNA, 0:none)
BP/BP2_GERP (conservation score GERP for BP and BP-2 positions)
BP/BP2_PHYL (conservation score PHYLOP for BP and BP-2 positions)
BPHunter_HIGHRISK (YES/NO, whether the BP variant is considered high-risk)
BPHunter_SCORE (score of a BP variant (suggested cutoff >= 3, max = 10))
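Before running BPHunter, it can help to confirm that the input file really has the five mandatory tab-delimited fields described under Input. The following is a minimal sketch of such a check, not part of BPHunter itself; the function name read_vcf_records is hypothetical, and it assumes that lines starting with "#" are header lines to be skipped.

```python
def read_vcf_records(path):
    """Yield (CHROM, POS, ID, REF, ALT) tuples from a tab-delimited VCF file.

    Hypothetical helper for checking input before running BPHunter.
    Assumption: lines starting with '#' are header/meta lines and are skipped.
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 5:
                raise ValueError(
                    f"line {line_no}: expected at least 5 tab-delimited "
                    f"fields, found {len(fields)}"
                )
            chrom, pos, var_id, ref, alt = fields[:5]
            if not pos.isdigit():
                raise ValueError(f"line {line_no}: POS is not an integer: {pos!r}")
            yield chrom, int(pos), var_id, ref, alt
```

Calling list(read_vcf_records("variants.vcf")) raises a ValueError at the first malformed line, which is usually easier to diagnose than a failure further down the pipeline.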
Command:
python BPHunter_VCF.py -i variants.vcf
python BPHunter_VCF.py -i variants.vcf -g GRCh37/GRCh38 -t all/canonical
arguments:
-h, --help show help message
-i, --input input variants in VCF-format file
-g, --genome human genome assembly {GRCh37, GRCh38}, default: GRCh37
-t, --transcript all/canonical transcript? {all, canonical}, default: all
python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.txt
python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.txt -g GRCh37/GRCh38 -t all/canonical
arguments:
-h, --help show help message
-d, --dir directory of VCF files
-s, --sample sample list (without .vcf extension) to be screened in the above directory
-o, --output output CSV filename, comma-delimited
-g, --genome human genome assembly {GRCh37, GRCh38}, default: GRCh37
-t, --transcript all/canonical transcript? {all, canonical}, default: all
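The batch program takes a sample list whose entries omit the .vcf extension. As a convenience, such a list could be generated from the VCF directory itself. The sketch below is not part of BPHunter; the function name write_sample_list is hypothetical, and it assumes the sample list is a plain text file with one sample name per line, which should be checked against the program's own help message.

```python
from pathlib import Path


def write_sample_list(vcf_dir, out_path):
    """Write the names of all *.vcf files in vcf_dir, without the extension.

    Hypothetical helper. Assumption: the sample list expected by
    BPHunter_VCF_batch.py is one sample name per line.
    """
    names = sorted(p.stem for p in Path(vcf_dir).glob("*.vcf"))
    Path(out_path).write_text("".join(name + "\n" for name in names))
    return names
```

The resulting file can then be passed to the batch program with -s, together with the same directory given to -d.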
BPHunter Scoring Scheme:
We recommend keeping SNVs and deletions with BPHunter_HIGHRISK = 'YES', and then prioritizing them by BPHunter_SCORE.
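The recommendation above can be applied to the comma-delimited output of BPHunter_VCF_batch.py with a short script. This is only a sketch under stated assumptions: the function name prioritize is hypothetical, the output is assumed to have a header row using the annotation names listed under File Format, and deletions are assumed to have a VAR_TYPE value ending in "del".

```python
import csv


def prioritize(csv_path):
    """Return high-risk SNVs and deletions, highest BPHunter_SCORE first.

    Hypothetical helper. Assumptions: the CSV has a header row with the
    column names VAR_TYPE, BPHunter_HIGHRISK and BPHunter_SCORE, and
    deletion rows have a VAR_TYPE ending in 'del'.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))

    def is_snv_or_deletion(row):
        var_type = row["VAR_TYPE"].strip()
        return var_type == "snv" or var_type.endswith("del")

    kept = [
        row for row in rows
        if row["BPHunter_HIGHRISK"].strip().upper() == "YES"
        and is_snv_or_deletion(row)
    ]
    # Sort so that the highest-scoring variants are reviewed first.
    kept.sort(key=lambda row: float(row["BPHunter_SCORE"]), reverse=True)
    return kept
```

Insertions and variants with BPHunter_HIGHRISK = 'NO' are dropped, and the remaining rows are returned as dictionaries keyed by column name.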