The standalone version of AGAIN is for users who want to run it locally in a command-line environment
(it is also available on GitHub).
Current Version: 1
Programs:
AGAIN_VCF.py
AGAIN_VCF_batch.py
Dependency:
The code is written in Python 3 and requires bedtools to be installed.
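A quick way to confirm both requirements before running AGAIN is sketched below. The function name check_dependencies is hypothetical and not part of AGAIN; it only looks for a bedtools executable on PATH and inspects the running interpreter's major version.

```python
import shutil
import sys


def check_dependencies(tool="bedtools"):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    # AGAIN is described as Python 3 code.
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print("Missing dependency:", problem)
```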
Reference Datasets:
Download the following files and place them in the same folder as the AGAIN programs.
Data_AGAIN_coding_map_GRCh37.txt
Data_AGAIN_detection_GRCh37_all.bed
Data_AGAIN_detection_GRCh37_canonical.bed
Data_AGAIN_coding_map_GRCh38.txt
Data_AGAIN_detection_GRCh38_all.bed
Data_AGAIN_detection_GRCh38_canonical.bed
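The file names above follow a regular pattern (genome assembly, then transcript set), so a small check can report which files are absent from the program folder. This is a sketch only: the helper names are hypothetical, and the idea that a given run needs the coding map plus the detection file matching its genome and transcript options is an assumption inferred from the file names.

```python
from pathlib import Path


def reference_files(genome="GRCh37", transcript="all"):
    """File names built from the naming pattern of the listed datasets."""
    return [
        f"Data_AGAIN_coding_map_{genome}.txt",
        f"Data_AGAIN_detection_{genome}_{transcript}.bed",
    ]


def missing_reference_files(folder, genome="GRCh37", transcript="all"):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [
        name
        for name in reference_files(genome, transcript)
        if not (folder / name).is_file()
    ]
```

For example, missing_reference_files(".", "GRCh38", "canonical") returns an empty list when both GRCh38 files sit in the current folder.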
File Format:
Input: Variants in VCF format, with 5 mandatory tab-delimited fields (CHROM, POS, ID, REF, ALT).
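To check that an input file meets this requirement, a minimal reader might look like the following. It is not AGAIN's own parser; the function name read_vcf_variants is hypothetical, and skipping lines that start with '#' is the usual VCF header convention rather than something stated here.

```python
def read_vcf_variants(lines):
    """Yield (chrom, pos, id, ref, alt) tuples from an iterable of VCF lines."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and VCF header/meta lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(
                f"line {line_no}: expected at least 5 tab-delimited fields, "
                f"got {len(fields)}"
            )
        chrom, pos, var_id, ref, alt = fields[:5]
        yield chrom, int(pos), var_id, ref, alt
```

Passing an open file handle works as well, e.g. list(read_vcf_variants(open("variants.vcf"))).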
Output: AGAIN-detected variants are reported with the following annotations.
SAMPLE (only for AGAIN_VCF_batch.py)
CHROM, POS, ID, REF, ALT (exactly the same as input)
STRAND
VAR_TYPE (snv, x nt-del, x nt-ins)
GENE
TRANSCRIPT_IVS (e.g., ENST123456789_IVS10)
CANONICAL (canonical transcript_IVS)
AGAIN_ZONE (ZONE1/ZONE2; ZONE1 spans from the 1st BP to the ACC, ZONE2 from the 2nd BP to the 1st BP, where BP = branchpoint and ACC = acceptor site)
AGAIN_YAG (YES/NO, whether the AG-gain variant also fits the YAG motif)
AGAIN_BP_DIST (the distance from the created AG to the BP)
AGAIN_ACC_DIST (the distance from the created AG to the ACC)
AGAIN_HIGHRISK (YES/NO, whether the AG-gain variant falls inside the high-risk [BP+8, ACC-4] region)
AGAIN_SCORE (score of the AG-gain variant; suggested cutoff >= 3, max = 5)
PROT_SEQ_WT (wild-type protein sequence)
PROT_SEQ_NEW_ACC (resulting protein sequence if a new acceptor site is created)
HGVS_NEW_ACC (protein-level HGVS annotation if a new acceptor site is created)
PROT_SEQ_EXON_SKIP (resulting protein sequence if exon skipping occurs)
HGVS_EXON_SKIP (protein-level HGVS annotation if exon skipping occurs)
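When post-processing the output, the TRANSCRIPT_IVS value can be split into a transcript ID and an intron number. The sketch below assumes every value follows the form of the example above (ENST123456789_IVS10); the function name split_transcript_ivs is hypothetical.

```python
def split_transcript_ivs(value):
    """Split 'ENST123456789_IVS10' into ('ENST123456789', 10)."""
    # rpartition splits on the last '_IVS', so the transcript ID is kept whole.
    transcript, separator, intron = value.rpartition("_IVS")
    if not separator or not transcript or not intron.isdigit():
        raise ValueError(f"unexpected TRANSCRIPT_IVS value: {value!r}")
    return transcript, int(intron)
```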
Command:
python AGAIN_VCF.py -i variants.vcf
python AGAIN_VCF.py -g GRCh37 -t all -i variants.vcf
arguments:
-h, --help show help message
-g, --genome human genome assembly {GRCh37, GRCh38}, default: GRCh37
-t, --transcript transcript set to use {all, canonical}, default: all
-i, --input input variant file in VCF format
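If AGAIN_VCF.py needs to be called from another Python script, the command line above can be assembled and launched with subprocess. This is a sketch using only the options listed here; the helper names build_again_command and run_again are hypothetical, and the script is assumed to sit in the current working directory.

```python
import subprocess
import sys


def build_again_command(vcf_path, genome="GRCh37", transcript="all",
                        script="AGAIN_VCF.py"):
    """Build the argument list for a single-file AGAIN run."""
    if genome not in {"GRCh37", "GRCh38"}:
        raise ValueError(f"unsupported genome: {genome}")
    if transcript not in {"all", "canonical"}:
        raise ValueError(f"unsupported transcript set: {transcript}")
    # sys.executable reuses the interpreter that is running this script.
    return [sys.executable, script, "-g", genome, "-t", transcript,
            "-i", vcf_path]


def run_again(vcf_path, **options):
    """Run AGAIN; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_again_command(vcf_path, **options), check=True)
```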
python AGAIN_VCF_batch.py -d directory -s samplelist.txt -o output.txt
python AGAIN_VCF_batch.py -g GRCh37 -t all -d directory -s samplelist.txt -o output.txt
arguments:
-h, --help show help message
-g, --genome human genome assembly {GRCh37, GRCh38}, default: GRCh37
-t, --transcript transcript set to use {all, canonical}, default: all
-d, --dir directory of VCF files
-s, --sample list of samples (names without the .vcf extension) in the above directory
-o, --output output filename
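The sample list for batch mode can be generated from the VCF directory itself. The sketch below assumes the list is a plain-text file with one sample name per line, which is not stated above; the helper names are hypothetical.

```python
from pathlib import Path


def sample_names(directory):
    """Return the sorted names of the .vcf files in `directory`, extension removed."""
    return sorted(
        path.stem
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix == ".vcf"
    )


def write_sample_list(directory, list_path):
    """Write one sample name per line (assumed format) and return the names."""
    names = sample_names(directory)
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

For example, write_sample_list("directory", "samplelist.txt") produces a file that can then be passed to the -s option.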
AGAIN Scoring Scheme:
We recommend focusing on the AG-gain variants with AGAIN_HIGHRISK = 'YES', and then prioritizing them by AGAIN_SCORE.
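This recommendation can be applied to an AGAIN output file along the following lines. The sketch assumes the output is tab-delimited with a header row that uses the annotation names listed under File Format; that layout is an assumption, and the helper names load_rows and prioritize are hypothetical. The default minimum score of 3 follows the suggested cutoff.

```python
import csv


def load_rows(path):
    """Read an output file, assumed tab-delimited with a header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def prioritize(rows, min_score=3.0):
    """Keep high-risk variants at or above min_score, highest score first."""
    kept = [
        row
        for row in rows
        if row["AGAIN_HIGHRISK"] == "YES"
        and float(row["AGAIN_SCORE"]) >= min_score
    ]
    return sorted(kept, key=lambda row: float(row["AGAIN_SCORE"]), reverse=True)
```

A typical call would be prioritize(load_rows("output.txt")).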