RNASeq aligners

books aligned.jpgI would say the match has now four competitors:

Tophat2:

  • Pros: the classic, the first universally used, still widely adopted in pipelines all over the World, basically people keep using it so their new results are comparable to the old ones
  • Cons: slow (several CPU hours per alignment on a human genome with 10M reads), limited to 4Gbases genomes (so, no complex metatranscriptomics for him) and on their very website they say to use HISAT2

STAR:

  • Pros: super, wicked fast, the standard used by ENCODE and the big RNASeq projects
  • Cons: uses a LOT of RAM, like really a lot (64GB for a human index)

HISAT2:

  • Pros: fast and low RAM requirements. If you start from scratch, this is the aligner to pick
  • Cons: it’s still new and so many people don’t trust it yet

Salmon/Kallisto:

These are actually not strictly aligners, but rather transcript counters. I put them together for simplicity, but they are different softwares

  • Pros: high speed and low RAM requirements. Ideal for quick RNA-Seq gene expression measurements
  • Cons: they cannot do de novo transcript detection, sad. They don’t produce counts, which are the expected input for many downstream analysis tools. However, some tools are starting to accept Salmon/Kallisto outputs (in R you can use the transcript abundance import package tximport)

 

So… USE HISAT! 😀

 

 

Advertisements

Quantifying RNA-Seq Transcripts

About ten years ago, when RNA-Seq was young, we struggled to make sense of the huge quantity of data that came out of Next-Generation Sequencers. The RNA-Seq pipelines were founded on the simple scheme:

Reads -> Alignments -> Quantification

The most popular RNA-Seq alignment tool, Tophat (now Tophat2) was actually built on the Bowtie aligner to focus on transcribed genomic regions (the Transcriptome), with the optional feature of aligning reads in the whole Genome, for de-novo transcript discovery.

Continue reading Quantifying RNA-Seq Transcripts

Network Biology has never been this easy