BPHunter

The standalone version of BPHunter is for users who want to run it locally in a command-line environment (it is also available on GitHub).

Current Version: 2

Please download the programs and reference datasets listed below, replacing any older copies.

Programs:

BPHunter_VCF_one.py
BPHunter_VCF_batch.py

Dependency:

The code is written in Python 3 and requires bedtools to be installed.
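
Both requirements can be checked before running BPHunter. The snippet below is a minimal sketch, not part of BPHunter itself: the helper name `check_dependencies` is hypothetical, and it only tests that the interpreter is Python 3 and that a `bedtools` executable is found on `PATH`.

```python
import shutil
import sys


def check_dependencies(tool="bedtools"):
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems = []
    # BPHunter is written for Python 3.
    if sys.version_info < (3,):
        problems.append("Python 3 is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print("WARNING:", problem)
```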

Reference Datasets (GRCh38):

Download them and put them in the same folder as the BPHunter programs.

BPHunter_Detection_all.bed

BPHunter_Detection_canonical.bed

The following BED file can be loaded into IGV to visualize BP locations alongside the variants of interest.

BPHunter_IGV.bed

File Format:

Input: Variants in VCF format, with five mandatory tab-delimited fields (CHROM, POS, ID, REF, ALT).
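
A quick pre-check of the input can catch files that are space-delimited or missing columns. This is a minimal sketch under stated assumptions: the function name `check_vcf_lines` is hypothetical, lines starting with `#` are treated as header lines and skipped, and POS is expected to be an integer. It does not reproduce BPHunter's own parsing.

```python
def check_vcf_lines(lines):
    """Yield (line_number, message) for data lines lacking the five mandatory fields."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header/meta lines (assumption)
        fields = line.split("\t")
        if len(fields) < 5:
            yield number, "fewer than 5 tab-delimited fields"
        elif not fields[1].isdigit():
            yield number, "POS is not an integer"
```

For a file on disk, `list(check_vcf_lines(open("variants.vcf")))` returns an empty list when every data line passes.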

Output: BPHunter-detected variants will be output with the following annotations.

SAMPLE (only for BPHunter_VCF_batch.py)

CHROM, POS, ID, REF, ALT (exactly the same as input)

STRAND

VAR_TYPE (snv, x nt-del, x nt-ins)

GENE

TRANSCRIPT_IVS (e.g., ENST123456789_IVS10)

CANONICAL (canonical transcript_IVS)

BP_NAME (m/e/cBP_chrom_pos_strand_nucl)

BP_ACC_DIST (distance from BP to acceptor site)

BP_RANK (rank of BP in this intron)

BP_TOTAL (total number of BPs in this intron)

BP_HIT (BP position (-2, -1, 0) hit by the variant)

BP_SOURCE (number of sources supporting this BP position)

CONSENSUS (1:YTNAY, 2:YTNA, 3:TNA, 4:YNA, 0:none)

BP/BP2_GERP (conservation score GERP for BP and BP-2 positions)

BP/BP2_PHYL (conservation score PHYLOP for BP and BP-2 positions)

BPHunter_HIGHRISK (YES/NO, whether the BP variant is considered high-risk)

BPHunter_SCORE (score of a BP variant (suggested cutoff >= 3, max = 10))
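
The CONSENSUS levels above are written with IUPAC codes (Y = C or T, N = any base). The sketch below illustrates how the four motifs relate to each other; it is not BPHunter's implementation. It assumes that the BP is the A at position 0, that the motifs are aligned so that YTNAY covers positions -3 to +1, and that the most specific matching motif wins. The function name `consensus_level` is hypothetical.

```python
import re

# IUPAC codes used in the motifs: Y = pyrimidine (C or T), N = any base.
IUPAC = {"Y": "[CT]", "N": "[ACGT]", "T": "T", "A": "A"}

# (level, motif, position of the motif's first base relative to the BP at 0).
# The alignment of each motif to the BP is an assumption of this sketch.
MOTIFS = [(1, "YTNAY", -3), (2, "YTNA", -3), (3, "TNA", -2), (4, "YNA", -2)]


def consensus_level(window):
    """Classify a 5-nt window covering positions -3..+1 (BP at index 3)."""
    window = window.upper()
    if len(window) != 5:
        raise ValueError("window must be exactly 5 nucleotides long")
    for level, motif, offset in MOTIFS:
        start = offset + 3  # convert BP-relative position to string index
        pattern = "".join(IUPAC[code] for code in motif)
        if re.fullmatch(pattern, window[start:start + len(motif)]):
            return level  # motifs are tested from most to least specific
    return 0
```

For example, `consensus_level("CTGAC")` gives 1, while `consensus_level("GCGAG")` matches only YNA and gives 4.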

Command:


    python BPHunter_VCF.py -i variants.vcf

    python BPHunter_VCF.py -i variants.vcf -g GRCh37/GRCh38 -t all/canonical

    arguments:
      -h, --help         show help message
      -i, --input        input variants in VCF-format file
      -g, --genome       human genome assembly {GRCh37, GRCh38}, default: GRCh37
      -t, --transcript   all/canonical transcript? {all, canonical}, default: all


    python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.txt
    
    python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.txt -g GRCh37/GRCh38 -t all/canonical

    arguments:
      -h, --help         show help message
      -d, --dir          directory of VCF files
      -s, --sample       sample list (without .vcf extension) to be screened in the above directory
      -o, --output       output CSV filename, comma-delimited
      -g, --genome       human genome assembly {GRCh37, GRCh38}, default: GRCh37
      -t, --transcript   all/canonical transcript? {all, canonical}, default: all
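
The sample list passed with `-s` holds sample names without the `.vcf` extension. A minimal sketch for generating such a list from a directory of VCF files is shown below. The helper name `write_sample_list` is hypothetical, and the one-name-per-line layout is an assumption, since the exact file layout is not specified here.

```python
from pathlib import Path


def write_sample_list(vcf_dir, list_path):
    """Write the name (minus .vcf) of every VCF file in vcf_dir, one per line."""
    # Path.stem drops only the final extension, so "s1.vcf" becomes "s1".
    samples = sorted(p.stem for p in Path(vcf_dir).glob("*.vcf"))
    Path(list_path).write_text("".join(name + "\n" for name in samples))
    return samples
```

Calling `write_sample_list("/dir", "samplelist.txt")` would produce a list matching the `-d` and `-s` arguments in the commands above.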

BPHunter Scoring Scheme:

We recommend keeping SNVs and deletions with BPHunter_HIGHRISK = 'YES', and then prioritizing them by BPHunter_SCORE.
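
This recommendation can be applied to the comma-delimited output with a few lines of Python. The sketch below rests on assumptions not confirmed here: that the output file has a header row using the column names listed under File Format, and that deletion entries in VAR_TYPE end in `del`. The function names `prioritize` and `prioritize_file` are hypothetical.

```python
import csv


def prioritize(rows):
    """Keep high-risk SNVs and deletions, sorted by descending BPHunter_SCORE."""
    kept = [
        row for row in rows
        if row["BPHunter_HIGHRISK"] == "YES"
        # Assumption: VAR_TYPE is "snv" or ends with "del" / "ins".
        and (row["VAR_TYPE"] == "snv" or row["VAR_TYPE"].endswith("del"))
    ]
    return sorted(kept, key=lambda row: float(row["BPHunter_SCORE"]), reverse=True)


def prioritize_file(path):
    """Apply prioritize() to a comma-delimited output file with a header row."""
    with open(path, newline="") as handle:
        return prioritize(csv.DictReader(handle))
```

The suggested cutoff of BPHunter_SCORE >= 3 can then be applied to the sorted rows if a shorter list is needed.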