Package: samclip (0.4.0-4)
Filter SAM files for soft- and hard-clipped alignments
Most short-read aligners perform local alignment of reads to the reference genome. Examples include bwa mem, minimap2, and bowtie2 (unless run in --end-to-end mode). This means the ends of a read may not be part of the best alignment.
This can be caused by:
* adapter sequences (aren't in the reference)
* poor quality bases (mismatches only make the alignment score worse)
* structural variation in your sample compared to the reference
* reads overlapping the start and end of contigs (including circular genomes)
Read aligners output a SAM file. Column 6 of this format stores the CIGAR string, which describes which parts of the read aligned and which didn't. The unaligned ends of the read can be "soft" or "hard" clipped, denoted with S and H at each end of the CIGAR string. Both types can be present in the same alignment, but that is not common. The terms soft and hard have no biological meaning. They only indicate whether the full read sequence is kept in the SAM file: soft-clipped bases remain in the SEQ column (column 10), while hard-clipped bases are removed from it.
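To make the CIGAR description concrete, here is a minimal Python sketch that reads clipping information out of a CIGAR string. It assumes the standard SAM operation letters (M, I, D, N, S, H, P, =, X) and that `*` means no CIGAR is available. The helper names are hypothetical, and the `is_clipped` threshold rule is only an illustration of the idea of filtering clipped alignments, not samclip's own filtering logic.

```python
from __future__ import annotations

import re

# A valid CIGAR is one or more "<length><op>" pairs using the SAM operation letters.
CIGAR_FULL_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")

CLIP_OPS = {"S", "H"}                   # soft and hard clips
SEQ_OPS = {"M", "I", "S", "=", "X"}     # operations whose bases appear in SEQ


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '5S90M5S' into (length, op) pairs."""
    if cigar == "*":  # '*' means no CIGAR is available, e.g. an unmapped read
        return []
    if not CIGAR_FULL_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP_RE.findall(cigar)]


def end_clips(cigar: str) -> tuple[int, int]:
    """Return (left, right) clipped base counts, summing S and H at each end."""
    ops = parse_cigar(cigar)
    left = 0
    for n, op in ops:
        if op not in CLIP_OPS:
            break
        left += n
    right = 0
    for n, op in reversed(ops):
        if op not in CLIP_OPS:
            break
        right += n
    return left, right


def seq_length(cigar: str) -> int:
    """Number of bases expected in SEQ. Hard-clipped (H) bases are not counted
    because they are removed from SEQ; soft-clipped (S) bases are kept."""
    return sum(n for n, op in parse_cigar(cigar) if op in SEQ_OPS)


def is_clipped(cigar: str, max_clip: int = 0) -> bool:
    """Hypothetical filter rule: flag the alignment if either end is clipped
    by more than max_clip bases."""
    left, right = end_clips(cigar)
    return left > max_clip or right > max_clip


if __name__ == "__main__":
    for cigar in ["100M", "5S95M", "3H2S90M5S", "60M40H"]:
        print(cigar, end_clips(cigar), seq_length(cigar), is_clipped(cigar, max_clip=4))
```

For example, `3H2S90M5S` has 5 clipped bases at each end, but only 97 bases in SEQ, because the 3 hard-clipped bases are not stored. To apply this to a SAM record, skip header lines starting with `@` and take the CIGAR with `line.split("\t")[5]`.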