fast and exact comparison and clustering of sequences
With the development of next-generation sequencing, efficient tools are
needed to handle millions of sequences in reasonable amounts of time.
Sumatra, a program developed by the LECA, aims to compare sequences
both quickly and exactly. It is designed for the type of data generated
by DNA metabarcoding, i.e. short, fully sequenced markers. Sumatra computes
the pairwise alignment scores from one dataset or between two datasets,
optionally with a similarity threshold below which pairs of sequences
are not reported. The output
can then go through a classification process with programs such as MCL
or MOTHUR.
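
The idea of thresholded pairwise comparison can be sketched in a few lines
of Python. This is only an illustration, not Sumatra's implementation or
command-line interface: the score used here (length of the longest common
subsequence divided by the length of the longer sequence) is an assumed
stand-in, and the names lcs_length, similarity and pairwise_above_threshold
are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed score: LCS length divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def pairwise_above_threshold(seqs, threshold):
    """Return (id1, id2, score) for every pair whose score is >= threshold."""
    results = []
    for (id1, s1), (id2, s2) in combinations(seqs.items(), 2):
        score = similarity(s1, s2)
        if score >= threshold:
            results.append((id1, id2, score))
    return results


# Toy dataset; real metabarcoding data would be read from a sequence file.
sequences = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "TTTTGGGG",
}
pairs = pairwise_above_threshold(sequences, threshold=0.8)
for id1, id2, score in pairs:
    print(id1, id2, score)
```

With a threshold of 0.8, only the seq1/seq2 pair is reported; the two pairs
involving seq3 fall below the threshold and are dropped. A list of
(id1, id2, score) triples like this one is the kind of pairwise output that a
clustering step can take as input.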