DIVERS-standalone is intended for users who want to run DIVERS locally in a command-line environment
(also available on GitHub).
Current Version: 1
Programs:
DIVERS-single.py
DIVERS-batch.py
Dependency:
The code is written in Python 3 and requires bedtools to be installed.
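Before running DIVERS, it can be handy to confirm that both dependencies are available. The following is a minimal sketch, not part of DIVERS itself; the function name check_dependencies is hypothetical, and it only checks that an executable called bedtools is on PATH, not which version is installed.

```python
import shutil
import sys


def check_dependencies(tool="bedtools"):
    """Return a list of human-readable problems; empty if the environment looks OK."""
    problems = []
    # DIVERS is stated to be written in Python 3.
    if sys.version_info < (3,):
        problems.append("Python 3 is required")
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which(tool) is None:
        problems.append("{} was not found on PATH".format(tool))
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print("WARNING:", problem)
```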
Datasets:
Download the datasets and place them in the same folder as the DIVERS programs.
File Format:
Input: Variants in VCF format, with 5 mandatory tab-delimited fields (CHROM, POS, ID, REF, ALT).
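As an illustration of the input requirement, the sketch below checks that each data line has at least the 5 mandatory tab-delimited fields and an integer POS. It is an assumed pre-check, not DIVERS's own parser; the function name check_vcf_lines is hypothetical, and lines starting with "#" are treated as VCF header lines and skipped.

```python
MANDATORY_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT"]


def check_vcf_lines(lines):
    """Yield (line_number, message) for data lines that look malformed."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header/meta lines
        fields = line.split("\t")
        if len(fields) < len(MANDATORY_FIELDS):
            yield number, "expected at least 5 tab-delimited fields, found {}".format(len(fields))
        elif not fields[1].isdigit():
            yield number, "POS is not an integer: {!r}".format(fields[1])


if __name__ == "__main__":
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "chr1\t12345\t.\tA\tG\n",
        "chr1 12345 . A G\n",  # space-delimited, so it counts as one field
    ]
    for number, message in check_vcf_lines(example):
        print("line {}: {}".format(number, message))
```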
Output: DIVERS-detected variants will be output in CSV format, with the following annotations.
- SAMPLE: sample name (only for DIVERS-batch.py)
- CHROM, POS, ID, REF, ALT: (exactly the same as input)
- STRAND: +/-
- GENE: gene symbol
- TRANSCRIPT: transcript ID (e.g. ENST123456789)
- IVS#: rank of the intron in this gene (e.g., IVS1, IVS2, IVS3)
- RS#: rank of the RS in this intron (e.g., RS1, RS2, RS3)
- RS_CONSEQ: predicted consequences (RS_AGGT_loss, RS_BP_loss, RS_3SS_gain, RS_5SS_gain, RS_CRYP)
- RS_SCORE: weighted confidence score of the RS-site (1-9, the higher the better)
- IVS_SIZE: intron size
- BP_POS: BP position of the RS-site
- PPT_Y: pyrimidine content (%) in the [BP+2, RS-5] region
- RS_START: start position of the essential RS-site
- RS_END: end position of the essential RS-site
- RNASEQ: support from RNA-seq splice junction reads? (Y/N)
- CLIP: support from U2AF eCLIP peaks? (Y/N)
- RNALM: support from RNA language model prediction? (Y/N)
- PHYLOP: all four essential RS-site nucleotides show positive phyloP conservation scores? (Y/N)
- PHASTCONS: RS-site overlaps a conserved element with PhastCons score > 0.5? (Y/N)
- RARE: no human variants with MAF > 0.01% at all four essential RS-site nucleotides? (Y/N)
- All other annotation fields from the input data. Note: if you are using DIVERS-batch.py, all VCF files must have the same annotation fields.
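The output columns above can be used to prioritize hits downstream. The sketch below is one possible approach under stated assumptions: the CSV has a header row using the column names listed above, Y/N fields contain the literal letters Y or N, and RS_SCORE parses as a number. The thresholds (min_score, min_support) and the function name filter_hits are illustrative choices, not recommendations from DIVERS, and the example rows are made up.

```python
import csv
import io

# Evidence columns taken from the annotation list; each is Y or N.
SUPPORT_COLUMNS = ["RNASEQ", "CLIP", "RNALM"]


def filter_hits(rows, min_score=7, min_support=2):
    """Keep rows with RS_SCORE >= min_score and at least min_support 'Y' flags."""
    kept = []
    for row in rows:
        score = float(row["RS_SCORE"])
        support = sum(row[col] == "Y" for col in SUPPORT_COLUMNS)
        if score >= min_score and support >= min_support:
            kept.append(row)
    return kept


# Made-up rows with a subset of the output columns, for illustration only.
example = io.StringIO(
    "CHROM,POS,GENE,RS_SCORE,RNASEQ,CLIP,RNALM\n"
    "chr1,100,GENE_A,8,Y,Y,N\n"
    "chr2,200,GENE_B,9,N,N,Y\n"
    "chr3,300,GENE_C,4,Y,Y,Y\n"
)
hits = filter_hits(csv.DictReader(example))
print([row["GENE"] for row in hits])  # ['GENE_A']
```

For a real run, the io.StringIO object would be replaced by an opened output file, for example open("output.csv", newline="").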
Command:
python DIVERS-single.py -i variants.vcf
arguments:
-i, --input input file of variants in VCF format
python DIVERS-batch.py -d foldername/ -s samplelist.txt -o output.csv
arguments:
-d, --dir directory of VCF files
-s, --sample list of sample names in the above directory (without the .vcf extension)
-o, --output output filename
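For batch mode, the sample list can be generated from the VCF directory rather than typed by hand. The sketch below assumes the list is a plain-text file with one sample name per line, which the text above does not state explicitly; the function names sample_names and write_sample_list are hypothetical.

```python
from pathlib import Path


def sample_names(vcf_dir):
    """Return sorted sample names: the .vcf filenames in vcf_dir minus the extension."""
    return sorted(path.stem for path in Path(vcf_dir).glob("*.vcf"))


def write_sample_list(vcf_dir, list_path):
    """Write one sample name per line (assumed format) and return the names."""
    names = sample_names(vcf_dir)
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names


if __name__ == "__main__":
    # Matches the paths used in the batch command shown above.
    print(write_sample_list("foldername/", "samplelist.txt"))
```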