DIVERS-standalone is intended for users who want to run DIVERS locally from the command line (the code is also available on GitHub).
Current Version: 1
Programs:
DIVERS_VCF_one.py
DIVERS_VCF_batch.py
Dependency:
The code is written in Python 3 and requires bedtools to be installed.
Dataset:
Download the following file and put it in the same folder as the DIVERS programs:
DIVERS_Detection.bed
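Before a first run, it can help to confirm the two requirements above. The sketch below is not part of DIVERS: `preflight` is a hypothetical helper that assumes the DIVERS scripts and DIVERS_Detection.bed share one folder and that bedtools is invoked as `bedtools` from PATH.

```python
import os
import shutil
from typing import List


def preflight(script_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means the setup looks ready.

    Hypothetical pre-run check (not part of DIVERS). It assumes:
    - the `bedtools` executable is looked up on PATH,
    - DIVERS_Detection.bed sits in `script_dir` (the folder holding the DIVERS scripts).
    """
    problems = []
    # shutil.which returns None when the executable cannot be found on PATH.
    if shutil.which("bedtools") is None:
        problems.append("bedtools was not found on PATH")
    bed_path = os.path.join(script_dir, "DIVERS_Detection.bed")
    if not os.path.isfile(bed_path):
        problems.append(f"missing dataset: {bed_path}")
    return problems
```

For example, calling `preflight(".")` from the DIVERS folder should return an empty list when both checks pass.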
File Format:
Input: Variants in VCF format, with 5 mandatory tab-delimited fields (CHROM, POS, ID, REF, ALT).
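As a rough illustration of the input requirement, the following sketch reads only the five leading tab-delimited fields of each VCF data line and skips `#` header lines. `read_variants` and `Variant` are hypothetical names, the example coordinates are made up, and DIVERS's own parser may apply different checks.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Variant(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alt: str


def read_variants(handle: TextIO) -> Iterator[Variant]:
    """Yield CHROM, POS, ID, REF, ALT from each VCF data line.

    Lines starting with '#' (meta-information and the #CHROM header) are skipped.
    Raises ValueError if a data line has fewer than five tab-delimited fields
    or a non-integer POS.
    """
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"line {lineno}: expected >= 5 tab-delimited fields, got {len(fields)}")
        chrom, pos, vid, ref, alt = fields[:5]
        if not pos.isdigit():
            raise ValueError(f"line {lineno}: POS is not an integer: {pos!r}")
        yield Variant(chrom, int(pos), vid, ref, alt)


# Hypothetical minimal input: extra columns after ALT are allowed and ignored here.
example = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t123456\t.\tA\tG\n"
for v in read_variants(io.StringIO(example)):
    print(v)  # prints: Variant(chrom='chr1', pos=123456, id='.', ref='A', alt='G')
```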
Output: Variants detected by DIVERS are written in CSV format with the following annotation columns:
SAMPLE: sample name (only for DIVERS_VCF_batch.py)
CHROM, POS, ID, REF, ALT: identical to the input
STRAND: the strand (+/-) on which the variant affects the RS
GENE: gene symbol
TRANSCRIPT: transcript ID (e.g. ENST123456789)
IVS#: the ordinal number of the intron in the gene (e.g. IVS1, IVS2, IVS3)
IVS_SIZE: the size of the intron
RS#: the ordinal number of the RS within the intron (e.g. RS1, RS2, RS3)
RS_CONSEQ: the predicted consequence (RS-AGGT, RS-BP/BP2, RS-AGAIN, RS-DW5SS-xnt, CRYPRS-DW5SS/UP3SS-xnt), where xnt gives the size, in nucleotides, between the paired cryptic splice sites
RS_SCORE: the weighted confidence score of the RS-site (1-5; higher is better)
RS_POS: the first position of the essential RS-site motif AGGT
BP_POS: the branch point (BP) position of the RS-site
PPT: the pyrimidine content of the polypyrimidine tract (PPT) region
CLIP: whether the RS-site is supported by eCLIP-U2AF data (Y/N)
RARE: whether the RS-site is free of common human variants (Y/N)
CONSERV: the average phyloP conservation score of the essential RS-site (numerical)
RNALM: the probability predicted by an RNA language model (0-1)
All other annotation fields from the input data are carried over. Note: if you are using DIVERS_VCF_batch.py, all VCF files should have the same annotation fields.
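The output CSV can be post-processed with the standard library. This sketch assumes the file has a header row using the column names listed above; `high_confidence`, `consequence_size`, the score threshold of 4, and the example rows are illustrative choices, not DIVERS defaults.

```python
import csv
import io
import re
from typing import Iterator, Optional, TextIO

# Matches the trailing "-<x>nt" size in consequences such as "RS-DW5SS-<x>nt".
_SIZE_RE = re.compile(r"-(\d+)nt$")


def consequence_size(rs_conseq: str) -> Optional[int]:
    """Return x from an RS_CONSEQ value ending in '-xnt', else None."""
    match = _SIZE_RE.search(rs_conseq)
    return int(match.group(1)) if match else None


def high_confidence(handle: TextIO, min_score: float = 4) -> Iterator[dict]:
    """Yield rows with RS_SCORE >= min_score whose RS-site is rare (RARE == 'Y')."""
    for row in csv.DictReader(handle):
        # float() tolerates both integer and decimal scores.
        if float(row["RS_SCORE"]) >= min_score and row["RARE"] == "Y":
            yield row


# Hypothetical rows; a real DIVERS output file has more columns.
sample = io.StringIO(
    "CHROM,POS,RS_CONSEQ,RS_SCORE,RARE\n"
    "chr1,1000,RS-DW5SS-12nt,5,Y\n"
    "chr1,2000,RS-AGGT,2,Y\n"
)
for row in high_confidence(sample):
    print(row["POS"], row["RS_CONSEQ"], consequence_size(row["RS_CONSEQ"]))
    # prints: 1000 RS-DW5SS-12nt 12
```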
Command:
python DIVERS_VCF_one.py -i variants.vcf
arguments:
-i, --input input variant file in VCF format
python DIVERS_VCF_batch.py -d ./foldername/ -s samplelist.txt -o output.csv
arguments:
-d, --dir directory containing the VCF files
-s, --sample sample list for the VCF files in the above directory (sample names without the .vcf extension)
-o, --output output CSV filename
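For the batch mode, the sample list and the matching-annotation-fields requirement can be prepared together. `write_sample_list` and `vcf_columns` are hypothetical helpers that assume one sample name per line in samplelist.txt and treat "same annotation fields" as identical `#CHROM` header columns; check both assumptions against your DIVERS version.

```python
import os
from typing import List, Optional


def vcf_columns(path: str) -> Optional[List[str]]:
    """Return the column names from the '#CHROM' header line, or None if absent."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return line.lstrip("#").rstrip("\r\n").split("\t")
            if not line.startswith("#"):
                break  # reached data lines without finding a #CHROM header
    return None


def write_sample_list(vcf_dir: str, list_path: str) -> List[str]:
    """Write one sample name per line (VCF filenames minus '.vcf') and return them.

    Raises ValueError if no VCF files are found or if their header columns differ.
    """
    samples = sorted(name[:-4] for name in os.listdir(vcf_dir) if name.endswith(".vcf"))
    if not samples:
        raise ValueError(f"no .vcf files found in {vcf_dir}")
    headers = {s: vcf_columns(os.path.join(vcf_dir, s + ".vcf")) for s in samples}
    distinct = {tuple(h) if h else None for h in headers.values()}
    if len(distinct) > 1:
        raise ValueError(f"VCF files have differing header columns: {headers}")
    with open(list_path, "w") as out:
        out.write("\n".join(samples) + "\n")
    return samples
```

After `write_sample_list("./foldername/", "samplelist.txt")` succeeds, the batch command above can be run with the same directory and list file.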