graph-based alignment of short nucleotide reads to many genomes
HISAT2 is a fast and sensitive alignment program for mapping next-generation
sequencing reads (both DNA and RNA) to a population of human genomes (as well
as to a single reference genome). Based on an extension of BWT for graphs,
a graph FM index (GFM) was designed and implemented. In addition to using
one global GFM index that represents a population of human genomes, HISAT2
uses a large set of small GFM indexes that collectively cover the whole genome
(each index representing a genomic region of 56 Kbp, with 55,000 indexes
needed to cover the human population). These small indexes (called local
indexes), combined with several alignment strategies, enable rapid and
accurate alignment of sequencing reads. This new indexing scheme is called a
Hierarchical Graph FM index (HGFM).
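
The figures above can be checked with a little arithmetic. The sketch below is
illustrative only and is not HISAT2 code: it assumes 1 Kbp = 1,000 bp, treats
the local indexes as equal-sized, non-overlapping windows over a single
concatenated coordinate, and uses hypothetical names (REGION_BP,
NUM_LOCAL_INDEXES, covered_bp, local_index_for). How HISAT2 actually lays out
and selects its local indexes may differ.

```python
# Illustrative arithmetic only; not taken from the HISAT2 source.
REGION_BP = 56_000          # assumed size of one local index region (56 Kbp)
NUM_LOCAL_INDEXES = 55_000  # number of local indexes quoted in the description


def covered_bp(region_bp: int = REGION_BP, n: int = NUM_LOCAL_INDEXES) -> int:
    """Total bases covered if the regions are laid end to end without overlap."""
    return region_bp * n


def local_index_for(position: int, region_bp: int = REGION_BP) -> int:
    """Hypothetical lookup: which window a 0-based coordinate falls into."""
    return position // region_bp


if __name__ == "__main__":
    total = covered_bp()
    print(f"{total:,} bp covered (about {total / 1e9:.2f} Gbp)")
    print("position 1,000,000 falls in window", local_index_for(1_000_000))
```

Under these assumptions, 55,000 regions of 56 Kbp cover about 3.08 Gbp, which
is on the order of the size of the human genome.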