[ Source: any2fasta ]
Package: any2fasta (0.4.2-2)
convert various sequence formats to FASTA
Established tools such as readseq and seqret (from EMBOSS) both create mangled IDs containing | or . characters, and there is no way to fix this behaviour. This results in inconsistencies between the .gbk and .fna versions of files in pipelines.
This script uses only core Perl modules, has no other dependencies like Bioperl or Biopython, and runs very quickly.
It supports the following input formats:
1. Genbank flat file, typically .gb, .gbk, .gbff (starts with LOCUS)
2. EMBL flat file, typically .embl (starts with ID)
3. GFF with sequence, typically .gff, .gff3 (starts with ##gff)
4. FASTA DNA, typically .fasta, .fa, .fna, .ffn (starts with >)
5. FASTQ DNA, typically .fastq, .fq (starts with @)
6. CLUSTAL alignments, typically .clw, .clu (starts with CLUSTAL or MUSCLE)
7. STOCKHOLM alignments, typically .sth (starts with # STOCKHOLM)
8. GFA assembly graph, typically .gfa (starts with ^[A-Z]\t)
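
The "starts with" markers in the list above suggest that a format can be guessed from the first line of a file alone. The following is a minimal Python sketch of that idea, not the tool's own Perl code: the function name `guess_format`, the `SIGNATURES` table, and the returned labels are hypothetical, and the patterns are simply transcribed from the list.

```python
import re

# (label, pattern) pairs transcribed from the format list above.
# The labels are illustrative names, not identifiers used by any2fasta.
SIGNATURES = [
    ("genbank", re.compile(r"^LOCUS")),
    ("embl", re.compile(r"^ID")),
    ("gff", re.compile(r"^##gff")),
    ("fasta", re.compile(r"^>")),
    ("fastq", re.compile(r"^@")),
    ("clustal", re.compile(r"^(CLUSTAL|MUSCLE)")),
    ("stockholm", re.compile(r"^# STOCKHOLM")),
    ("gfa", re.compile(r"^[A-Z]\t")),
]


def guess_format(first_line):
    """Return a label for the first matching signature, or None."""
    for label, pattern in SIGNATURES:
        if pattern.search(first_line):
            return label
    return None
```

For example, `guess_format(">contig1")` returns `"fasta"`, while a line that matches none of the markers returns `None`.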
Files may be compressed with:
1. gzip, typically .gz
2. bzip2, typically .bz2
3. zip, typically .zip
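
As an illustration of handling the three compression types above, here is a small Python sketch that reads a file as text whether or not it is compressed. It rests on assumptions not stated in the description: the helper name `read_text` is hypothetical, compression is detected from the standard magic bytes of each format rather than from the file extension, and for zip archives only the first member is read.

```python
import bz2
import gzip
import zipfile


def read_text(path):
    """Return the decompressed text content of path (hypothetical helper)."""
    # Peek at the leading bytes to identify the compression type.
    with open(path, "rb") as handle:
        magic = handle.read(4)

    if magic[:2] == b"\x1f\x8b":  # gzip magic number
        with gzip.open(path, "rt") as handle:
            return handle.read()
    if magic[:3] == b"BZh":  # bzip2 magic number
        with bz2.open(path, "rt") as handle:
            return handle.read()
    if magic == b"PK\x03\x04":  # zip local file header
        with zipfile.ZipFile(path) as archive:
            # Assumption: the sequence data is the first archive member.
            first_member = archive.namelist()[0]
            return archive.read(first_member).decode()

    # No known signature: treat the file as plain text.
    with open(path, "r") as handle:
        return handle.read()
```

The first line of the returned text could then be inspected for the format markers listed earlier.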