Standalone & Datasets

BPHunter standalone is for users who want to run it locally in a command-line environment. It is also available on GitHub.


Current Version: 2


Please download the programs and reference datasets below, replacing any earlier versions.

Programs:


BPHunter_VCF.py
BPHunter_VCF_batch.py

Dependency:


The code is written in Python 3 and requires bedtools to be installed.
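A quick way to confirm the dependency before running BPHunter is to look for the bedtools executable on the PATH. The sketch below is only an illustration: the helper name `missing_tools` is hypothetical and is not part of BPHunter.

```python
import shutil


def missing_tools(tools=("bedtools",), which=shutil.which):
    """Return the names of required command-line tools that cannot be found on PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"WARNING: {tool} was not found on PATH")
```

An empty list means every listed tool was found.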

Reference Datasets:


Download them and put them in the same folder as the BPHunter programs.

  • Data_BPHunter_GRCh37_detection_all.bed
  • Data_BPHunter_GRCh37_detection_canonical.bed

  • Data_BPHunter_GRCh38_detection_all.bed
  • Data_BPHunter_GRCh38_detection_canonical.bed

  The following two BED files can be loaded into IGV to visualize BP locations alongside the variants of interest.

  • Data_BPHunter_GRCh37_BP_IGV.bed
  • Data_BPHunter_GRCh38_BP_IGV.bed
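Since the programs expect the reference datasets in their own folder, a small check can catch a missing file early. This is a minimal sketch under one assumption: that the detection file used for a given assembly and transcript setting follows the naming pattern of the four detection files listed above. The helper names are hypothetical.

```python
from pathlib import Path


def reference_file(genome="GRCh37", transcript="all"):
    """Build the detection dataset file name for an assembly and transcript set."""
    # Assumption: the name follows the pattern of the listed detection files.
    return f"Data_BPHunter_{genome}_detection_{transcript}.bed"


def is_reference_present(folder, genome="GRCh37", transcript="all"):
    """Return True if the expected detection file exists in the given folder."""
    return (Path(folder) / reference_file(genome, transcript)).is_file()
```

For example, `is_reference_present(".", "GRCh38", "canonical")` looks for Data_BPHunter_GRCh38_detection_canonical.bed in the current folder.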

File Format:


    Input: Variants in VCF format, with 5 mandatory tab-delimited fields (CHROM, POS, ID, REF, ALT).
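    The input requirement can be checked before a run. The sketch below flags data lines that do not have at least five tab-delimited fields or whose POS is not an integer. It assumes that lines starting with "#" are header or meta lines, as in standard VCF, and skips them; the function name `check_vcf_lines` is hypothetical.

```python
def check_vcf_lines(lines):
    """Yield (line_number, message) for data lines that lack the mandatory fields."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        if len(fields) < 5:
            yield number, f"expected at least 5 tab-delimited fields, found {len(fields)}"
        elif not fields[1].isdigit():
            yield number, f"POS is not an integer: {fields[1]!r}"
```

To check a file, pass an open handle, for example `with open("variants.vcf") as handle: problems = list(check_vcf_lines(handle))`.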

    Output: Variants detected by BPHunter are reported with the following annotations.

  • SAMPLE (only for BPHunter_VCF_batch.py)
  • CHROM, POS, ID, REF, ALT (exactly the same as input)
  • STRAND
  • VAR_TYPE (snv, x nt-del, x nt-ins)
  • GENE
  • TRANSCRIPT_IVS (e.g., ENST123456789_IVS10)
  • CANONICAL (canonical transcript_IVS)
  • BP_NAME (m/e/cBP_chrom_pos_strand_nucl)
  • BP_ACC_DIST (distance from BP to acceptor site)
  • BP_RANK (rank of BP in this intron)
  • BP_TOTAL (total number of BPs in this intron)
  • BP_HIT (BP position (-2, -1, 0) hit by the variant)
  • BP_SOURCE (number of sources supporting this BP position)
  • CONSENSUS (1:YTNAY, 2:YTNA, 3:TNA, 4:YNA, 0:none)
  • BP/BP2_GERP (GERP conservation score for the BP and BP-2 positions)
  • BP/BP2_PHYL (PhyloP conservation score for the BP and BP-2 positions)
  • BPHunter_HIGHRISK (YES/NO, whether the BP variant is considered high-risk)
  • BPHunter_SCORE (score of a BP variant (suggested cutoff >= 3, max = 10))

Command:


    
        python BPHunter_VCF.py -i variants.vcf
    
        python BPHunter_VCF.py -i variants.vcf -g GRCh37/GRCh38 -t all/canonical
    
        arguments:
          -h, --help         show help message
          -i, --input        input variants in VCF-format file
          -g, --genome       human genome assembly {GRCh37, GRCh38}, default: GRCh37
          -t, --transcript   all/canonical transcript? {all, canonical}, default: all
        
    
        python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.txt
        
        python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.txt -g GRCh37/GRCh38 -t all/canonical
    
        arguments:
          -h, --help         show help message
          -d, --dir          directory of VCF files
          -s, --sample       sample list (without .vcf extension) to be screened in the above directory
          -o, --output       output CSV filename, comma-delimited
          -g, --genome       human genome assembly {GRCh37, GRCh38}, default: GRCh37
          -t, --transcript   all/canonical transcript? {all, canonical}, default: all
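
The sample list for the batch program can be generated from the VCF directory. This is a minimal sketch that assumes the list is a plain-text file with one sample name per line (the file layout is not specified above); the helper name `write_sample_list` is hypothetical.

```python
from pathlib import Path


def write_sample_list(vcf_dir, list_path):
    """Write the names of all .vcf files in vcf_dir, without the extension, to list_path."""
    # Assumption: the sample list holds one sample name per line.
    names = sorted(path.stem for path in Path(vcf_dir).glob("*.vcf"))
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

For example, `write_sample_list("/dir", "samplelist.txt")` produces a list that matches the `-d` and `-s` arguments shown in the commands above.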
        

    BPHunter Scoring Scheme:


    We recommend keeping SNVs and deletions with BPHunter_HIGHRISK = 'YES', and then prioritizing them by BPHunter_SCORE.
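
    This recommendation can be applied to the comma-delimited output of BPHunter_VCF_batch.py. The sketch below makes two assumptions: that the CSV has a header row whose column names match the annotation names listed above, and that deletion types end in "nt-del" as in the VAR_TYPE description. The function names are hypothetical.

```python
import csv


def prioritize(rows):
    """Keep high-risk SNVs and deletions, sorted by BPHunter_SCORE from highest to lowest."""
    kept = []
    for row in rows:
        var_type = row["VAR_TYPE"].strip().lower()
        # Assumption: VAR_TYPE is "snv" or ends with "nt-del" / "nt-ins".
        is_snv_or_del = var_type == "snv" or var_type.endswith("nt-del")
        if is_snv_or_del and row["BPHunter_HIGHRISK"].strip().upper() == "YES":
            kept.append(row)
    kept.sort(key=lambda row: float(row["BPHunter_SCORE"]), reverse=True)
    return kept


def prioritize_csv(path):
    """Read a batch output CSV (assumed to have a header row) and prioritize its rows."""
    with open(path, newline="") as handle:
        return prioritize(csv.DictReader(handle))
```

Calling `prioritize_csv("output.txt")` returns the retained rows as dictionaries, with the highest-scoring variants first.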