fastq2BP

BPHunter_fastq2BP is for users who want to identify BP positions from RNA-seq data.


Package:


BPHunter_fastq2BP.zip

Dependency:


The code is written in python3, and requires Biopython and BLAST+ installed.

Workflow:


  • Preparation: users should have aligned the RNA-seq raw reads to human reference genome first, and retain the unmapped reads (in fastq) for BP identification.
  • Pre-construction of BLAST index databases (inside the package) for the 20-nt intronic sequences downstream of 5’ss (as 20-nt 5’ss library), and the inverted 200-nt intronic sequences upstream of 3’ss (200-nt 3’ss library).
  • Align all input RNA-seq reads to the 20-nt 5’ss library, and retain the reads that have a perfect 20-nt match or only one mismatch, as BP-searching 5’ss-hit reads.
  • For each 5’ss-hit read, trim away the read sequence from the start of its alignment to the 20-nt 5’ss library, and invert the remaining sequence.
  • Align the trimmed and inverted 5’ss-hit reads to the 200-nt 3’ss library, and retain the reads that have at least a 20-nt alignment with at least 95% identity in the same intron, as BP-searching 5’ss-3’ss-hit reads.
  • For 5’ss-3’ss-hit reads, the ends of the aligned sequence in the 200-nt 3’ss library were used to determine the genomic positions of BP.
  • File Format:


    Input: RNA-seq reads, that are unmapped to human genome, in fastq format.

    Output: The identified BP positions with following informaiton: CHROM, POS, STRAND, NUCLEOTIDE, MOTIF SEQUENCE [-7, +3], GENE, and #READS.
    The output genomic position is on human genome assembly hg19. To convert to hg38 coordinates, users could use LiftOver tool.

    Command:


    
        python BPHunter_fastq2BP.py -i reads.fastq
    
        python BPHunter_fastq2BP.py -i reads.fastq -e 0.01 -t 5 -r no
    
        arguments:
          -h, --help             show help message
          -i, --input            input RNA-seq reads in fastq-format file
          -e, --evalue           evalue cutoff in BLAST alignment {float}, default: 0.01
          -t, --threads          number of threads in running BLAST {int}, default: 1
          -r, --blast_removal    remove the intermediate files generated by BLAST? {yes/no}, default: no