BPHunter

BPHunter_fastq2BP is for users who want to identify BP positions from RNA-seq data.

Package:

Dependency:

The code is written in python3, and requires Biopython and BLAST+ installed.

Workflow:

Preparation: users should have aligned the RNA-seq raw reads to human reference genome first, and retain the unmapped reads (in fastq) for BP identification.

Pre-construction of BLAST index databases (inside the package) for the 20-nt intronic sequences downstream of 5’ss (as 20-nt 5’ss library), and the inverted 200-nt intronic sequences upstream of 3’ss (200-nt 3’ss library).

Align all input RNA-seq reads to the 20-nt 5’ss library, and retain the reads that have a perfect 20-nt match or only one mismatch, as BP-searching 5’ss-hit reads.

For each 5’ss-hit read, trim away the read sequence from the start of its alignment to the 20-nt 5’ss library, and invert the remaining sequence.

Align the trimmed and inverted 5’ss-hit reads to the 200-nt 3’ss library, and retain the reads that have at least a 20-nt alignment with at least 95% identity in the same intron, as BP-searching 5’ss-3’ss-hit reads.

For 5’ss-3’ss-hit reads, the ends of the aligned sequence in the 200-nt 3’ss library were used to determine the genomic positions of BP.

File Format:

Input: RNA-seq reads, that are unmapped to human genome, in fastq format.

Output: The identified BP positions with following informaiton: CHROM, POS, STRAND, NUCLEOTIDE, MOTIF SEQUENCE [-7, +3], GENE, and #READS.
The output genomic position is on human genome assembly hg19. To convert to hg38 coordinates, users could use LiftOver tool.

Command:


    python BPHunter_fastq2BP.py -i reads.fastq

    python BPHunter_fastq2BP.py -i reads.fastq -e 0.01 -t 5 -r no

    arguments:
      -h, --help             show help message
      -i, --input            input RNA-seq reads in fastq-format file
      -e, --evalue           evalue cutoff in BLAST alignment {float}, default: 0.01
      -t, --threads          number of threads in running BLAST {int}, default: 1
      -r, --blast_removal    remove the intermediate files generated by BLAST? {yes/no}, default: no