Standalone & Dataset

DIVERS-standalone is intended for users who want to run DIVERS locally from the command line. The code is also available on GitHub.


Current Version: 1


Programs:


DIVERS_VCF_one.py
DIVERS_VCF_batch.py

Dependency:


The code is written in Python 3 and requires bedtools to be installed.
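
Before running DIVERS, it can help to confirm that bedtools is reachable from your shell. The following is a minimal sketch, not part of DIVERS itself; the helper name `have_tool` is hypothetical, and it assumes bedtools is installed under the executable name `bedtools`.

```python
import shutil


def have_tool(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # Assumption: the bedtools executable is named "bedtools".
    if have_tool("bedtools"):
        print("bedtools found")
    else:
        print("bedtools not found on PATH; install it before running DIVERS")
```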

Dataset:


Download the dataset file and place it in the same folder as the DIVERS programs.

  • DIVERS_Detection.bed

File Format:


    Input: Variants in VCF format, with 5 mandatory tab-delimited fields (CHROM, POS, ID, REF, ALT).
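
    A quick pre-check of the input can catch space-delimited or truncated lines before they reach DIVERS. The sketch below is not part of DIVERS; the function name `check_vcf_lines` is hypothetical, and it assumes that lines starting with "#" are header or meta lines, as in standard VCF. It does not check how DIVERS itself treats headers.

```python
MANDATORY = ["CHROM", "POS", "ID", "REF", "ALT"]


def check_vcf_lines(lines):
    """Return a list of (line_number, message) problems found in VCF text lines."""
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        if len(fields) < len(MANDATORY):
            problems.append(
                (n, "expected at least 5 tab-delimited fields, found %d" % len(fields))
            )
            continue
        if not fields[1].isdigit():
            problems.append((n, "POS is not an integer: %r" % fields[1]))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_vcf.py variants.vcf
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for line_number, message in check_vcf_lines(handle):
                print("line %d: %s" % (line_number, message))
```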

    Output: DIVERS-detected variants are written in CSV format with the following annotations.

  • SAMPLE: sample name (only for DIVERS_VCF_batch.py)
  • CHROM, POS, ID, REF, ALT: (exactly the same as input)
  • STRAND: the strand (+/-) on which the variant is found to affect the RS
  • GENE: gene symbol
  • TRANSCRIPT: transcript ID (e.g. ENST123456789)
  • IVS#: the ranking number of the intron in the gene (e.g. IVS1, IVS2, IVS3)
  • IVS_SIZE: the size of the intron
  • RS#: the ranking number of the RS in this intron (e.g. RS1, RS2, RS3)
  • RS_CONSEQ: the predicted consequence (RS-AGGT, RS-BP/BP2, RS-AGAIN, RS-DW5SS-xnt, CRYPRS-DW5SS/UP3SS-xnt), where xnt indicates the size between the paired cryptic splice sites
  • RS_SCORE: the weighted confidence score of RS-site (1-5, the higher the better)
  • RS_POS: the first position of the essential RS-site AGGT
  • BP_POS: the BP position of the RS-site
  • PPT: the pyrimidine content in the PPT region
  • CLIP: whether the RS-site is supported by eCLIP-U2AF data (Y/N)
  • RARE: whether the RS-site is free of common human variants (Y/N)
  • CONSERV: the averaged conservation score phyloP of the essential RS-site (numerical)
  • RNALM: the predicted probability by RNA language model (0-1)
  • All other annotation fields from the input data. Note: if you are using DIVERS_VCF_batch.py, all VCF files must have the same annotation fields.
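
The output CSV can be post-processed with standard tools. The sketch below keeps only rows at or above a chosen RS_SCORE. It is an illustration rather than part of DIVERS: the function name `filter_hits` and the default threshold are hypothetical, and it assumes the CSV header row uses the annotation names listed above.

```python
import csv


def filter_hits(csv_file, min_score=4.0):
    """Yield rows (as dicts) whose RS_SCORE is at least min_score."""
    for row in csv.DictReader(csv_file):
        try:
            score = float(row["RS_SCORE"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric score
        if score >= min_score:
            yield row


if __name__ == "__main__":
    import sys

    # Usage: python filter_hits.py output.csv
    if len(sys.argv) > 1:
        with open(sys.argv[1], newline="") as handle:
            for hit in filter_hits(handle):
                print(hit.get("CHROM"), hit.get("POS"), hit.get("GENE"), hit["RS_SCORE"])
```

Rows with a missing or non-numeric RS_SCORE are skipped rather than raising an error, which keeps the sketch tolerant of unexpected lines.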

Command:


    
        python DIVERS_VCF_one.py -i variants.vcf
    
        arguments:
          -i, --input        input variants in VCF format file
        
    
        python DIVERS_VCF_batch.py -d ./foldername/ -s samplelist.txt -o output.csv
        
        arguments:
          -d, --dir          directory of VCF files
          -s, --sample       sample list in the above directory (without .vcf extension)
          -o, --output       output filename
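
For batch runs, the sample list can be generated from the VCF directory instead of being typed by hand. The sketch below is not part of DIVERS and rests on an assumption the text above does not confirm: that the sample list is a plain-text file with one sample name per line. The function names `sample_names` and `write_sample_list` are hypothetical.

```python
from pathlib import Path


def sample_names(vcf_dir):
    """Return sorted base names of *.vcf files in vcf_dir, without the extension."""
    return sorted(p.stem for p in Path(vcf_dir).glob("*.vcf"))


def write_sample_list(vcf_dir, list_path):
    """Write one sample name per line (assumed format) and return the names."""
    names = sample_names(vcf_dir)
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names


if __name__ == "__main__":
    import sys

    # Usage: python make_samplelist.py ./foldername/ samplelist.txt
    if len(sys.argv) == 3:
        written = write_sample_list(sys.argv[1], sys.argv[2])
        print("wrote %d sample names" % len(written))
```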