The SAM format.
#include <seqan3/io/alignment_file/format_sam.hpp>
Public Member Functions
template<typename stream_type, typename seq_type, typename id_type, typename offset_type, typename ref_seq_type, typename ref_id_type, typename ref_offset_type, typename align_type, typename flag_type, typename mapq_type, typename qual_type, typename mate_type, typename tag_dict_type, typename e_value_type, typename bit_score_type>
void | write (stream_type &stream, alignment_file_output_options const &options, std::unique_ptr<alignment_file_header> &header_ptr, seq_type &&seq, qual_type &&qual, id_type &&id, offset_type &&offset, ref_seq_type &&ref_seq, ref_id_type &&ref_id, ref_offset_type &&ref_offset, align_type &&align, flag_type &&flag, mapq_type &&mapq, mate_type &&mate, tag_dict_type &&tag_dict, e_value_type &&e_value, bit_score_type &&bit_score) |
Write the given fields to the specified stream.
Constructors, destructor and assignment
Rule of five explicitly declared: the default constructor, the move operations and the destructor are defaulted; copy construction and copy assignment are deleted.
alignment_file_format_sam () = default
alignment_file_format_sam (alignment_file_format_sam const &) = delete
alignment_file_format_sam & | operator= (alignment_file_format_sam const &) = delete |
alignment_file_format_sam (alignment_file_format_sam &&) = default
alignment_file_format_sam & | operator= (alignment_file_format_sam &&) = default |
~alignment_file_format_sam () = default
Static Public Attributes
static std::vector<std::string> | file_extensions |
The valid file extensions for this format; note that you can modify this value.
The SAM format.
SAM is often used for storing alignments of several read sequences against one or more reference sequences. See the Wikipedia article for an introduction to the format, or consult the official SAM format specification. SeqAn implements version 1.6 of the SAM specification.
The SAM format provides the following fields: seqan3::field::ALIGNMENT, seqan3::field::SEQ, seqan3::field::QUAL, seqan3::field::ID, seqan3::field::REF_SEQ, seqan3::field::REF_ID, seqan3::field::REF_OFFSET, seqan3::field::OFFSET, seqan3::field::FLAG, seqan3::field::MAPQ and seqan3::field::MATE. In addition, there is seqan3::field::HEADER_PTR, which is usually not set but is needed to provide the range-based functionality of the file.
None of the fields are required when writing; any field that is not provided defaults to '0' for numeric fields and '*' for all other fields.
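To illustrate the defaulting rule, here is a minimal sketch that does not use SeqAn3 at all. The names `sam_record_sketch`, `or_default` and `to_sam_line` are hypothetical, and the record layout (one `std::optional` per mandatory column) is an assumption, not the library's record type.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical record type (NOT seqan3's record): every column is optional,
// mirroring the rule that no field is required when writing.
struct sam_record_sketch
{
    std::optional<std::string> qname, rname, cigar, rnext, seq, qual;
    std::optional<unsigned>    flag, pos, mapq, pnext;
    std::optional<long>        tlen;
};

// Returns the field rendered as text, or `placeholder` if the field is unset.
template <typename T>
std::string or_default(std::optional<T> const & field, std::string const & placeholder)
{
    if (!field)
        return placeholder;
    std::ostringstream out;
    out << *field;
    return out.str();
}

// Joins the 11 mandatory SAM columns with tabs, using '0' for missing numeric
// columns and '*' for missing textual ones.
std::string to_sam_line(sam_record_sketch const & r)
{
    std::string const cols[11] = {
        or_default(r.qname, "*"), or_default(r.flag, "0"),  or_default(r.rname, "*"),
        or_default(r.pos, "0"),   or_default(r.mapq, "0"),  or_default(r.cigar, "*"),
        or_default(r.rnext, "*"), or_default(r.pnext, "0"), or_default(r.tlen, "0"),
        or_default(r.seq, "*"),   or_default(r.qual, "*")};

    std::string line = cols[0];
    for (std::size_t i = 1; i < 11; ++i)
        line += '\t' + cols[i];
    return line;
}
```

An empty `sam_record_sketch` therefore yields `*	0	*	0	0	*	*	0	0	*	*`.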
As many users will be accustomed to the columns of the SAM format, here is a mapping of the common SAM format columns to the SeqAn3 record fields:
# | SAM Column ID | FIELD name |
---|---|---|
1 | QNAME | seqan3::field::ID |
2 | FLAG | seqan3::field::FLAG |
3 | RNAME | seqan3::field::REF_ID |
4 | POS | seqan3::field::REF_OFFSET |
5 | MAPQ | seqan3::field::MAPQ |
6 | CIGAR | implicitly stored in seqan3::field::ALIGNMENT |
7 | RNEXT | seqan3::field::MATE (tuple pos 0) |
8 | PNEXT | seqan3::field::MATE (tuple pos 1) |
9 | TLEN | seqan3::field::MATE (tuple pos 2) |
10 | SEQ | seqan3::field::SEQ |
11 | QUAL | seqan3::field::QUAL |
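The MATE rows of the table can be read as a plain tuple split. The sketch below assumes a hypothetical `mate_sketch` type of `std::tuple<std::string, long, long>`; SeqAn3's actual element types for seqan3::field::MATE may differ, and values are copied verbatim without any coordinate conversion a real writer might apply.

```cpp
#include <string>
#include <tuple>

// Hypothetical mate representation: (RNEXT, PNEXT, TLEN) at tuple positions 0, 1, 2.
using mate_sketch = std::tuple<std::string, long, long>;

// Renders SAM columns 7-9 from the tuple, tab-separated.
std::string mate_columns(mate_sketch const & mate)
{
    return std::get<0>(mate) + '\t' +
           std::to_string(std::get<1>(mate)) + '\t' +
           std::to_string(std::get<2>(mate));
}
```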
The query (read sequence) OFFSET is required to store the soft-clipping information at the start of the read. Soft clipping at the end is deduced automatically from how much the read sequence length exceeds the offset plus the number of read characters covered by the alignment.
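The clipping arithmetic can be sketched as follows. The helpers `compute_soft_clips` and `with_soft_clips` are hypothetical illustrations, not SeqAn3 functions; `aligned_query` is assumed to count only read characters in the alignment (gaps in the read row excluded).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Leading and trailing soft-clip lengths of a read (hypothetical helper type).
struct soft_clips
{
    std::size_t front;
    std::size_t back;
};

// seq_length:    length of the full read sequence (seqan3::field::SEQ)
// offset:        read characters before the alignment starts (seqan3::field::OFFSET)
// aligned_query: read characters covered by the alignment, gaps not counted
// The trailing soft clip is whatever remains after offset + aligned_query.
soft_clips compute_soft_clips(std::size_t seq_length, std::size_t offset, std::size_t aligned_query)
{
    if (offset + aligned_query > seq_length)
        throw std::invalid_argument{"alignment extends past the end of the read"};
    return {offset, seq_length - offset - aligned_query};
}

// Wraps an inner CIGAR string (e.g. "5M1I4M") with the soft-clip operations.
std::string with_soft_clips(soft_clips clips, std::string const & inner_cigar)
{
    std::string cigar;
    if (clips.front > 0)
        cigar += std::to_string(clips.front) + 'S';
    cigar += inner_cigar;
    if (clips.back > 0)
        cigar += std::to_string(clips.back) + 'S';
    return cigar;
}
```

For a read of length 20 with offset 3 and an alignment `5M1I4M` (10 read characters), the trailing clip is 20 - 3 - 10 = 7, giving `3S5M1I4M7S`.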
Note: SeqAn currently does not support hard clipping. When reading SAM, hard clipping is discarded, but the resulting alignment/sequence combination is still valid.
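Discarding hard clips keeps the record consistent because hard-clipped bases are not present in SEQ anyway. A minimal sketch of that idea on a CIGAR string, using a hypothetical `drop_hard_clips` helper (not SeqAn3 code):

```cpp
#include <cctype>
#include <string>

// Removes hard-clipping operations ("H") from a CIGAR string and keeps all
// other operations unchanged.
std::string drop_hard_clips(std::string const & cigar)
{
    std::string result;
    std::string count; // digits of the operation currently being read
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count += c;
            continue;
        }
        if (c != 'H') // keep every operation except hard clips
            result += count + c;
        count.clear();
    }
    return result;
}
```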
The format checks are implemented according to the official SAM format specification in order to ensure correct SAM file output.
If a non-recoverable format violation is encountered on reading, or you specify invalid values/combinations when writing, seqan3::format_error is thrown.
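As an illustration of such write-time checks, the sketch below validates a few constraints taken from the SAM specification (FLAG fits in 16 bits, MAPQ in 8 bits, a QUAL other than `*` has the same length as SEQ). `format_error_sketch` is a hypothetical stand-in for seqan3::format_error, and `validate_sketch` does not reflect the library's actual set of checks.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for seqan3::format_error.
struct format_error_sketch : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Throws on a few invalid values/combinations before a record is written.
void validate_sketch(long flag, long mapq, std::string const & seq, std::string const & qual)
{
    if (flag < 0 || flag > 65535)
        throw format_error_sketch{"FLAG must be in [0, 65535]"};
    if (mapq < 0 || mapq > 255)
        throw format_error_sketch{"MAPQ must be in [0, 255]"};
    if (qual != "*" && qual.size() != seq.size())
        throw format_error_sketch{"QUAL and SEQ must have the same length"};
}
```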
Note: All sequence-like fields in SAM (e.g. field::SEQ) are truncated at the first whitespace character (see seqan3::is_space) to ensure a correct format.
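The truncation rule amounts to keeping everything before the first whitespace character. This sketch uses `std::isspace` in the default "C" locale (space, `\t`, `\n`, `\v`, `\f`, `\r`) and assumes that seqan3::is_space matches the same set; `truncate_at_space` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns the prefix of `field` up to (not including) the first whitespace character.
std::string truncate_at_space(std::string const & field)
{
    auto first_space = std::find_if(field.begin(), field.end(), [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    });
    return std::string(field.begin(), first_space);
}
```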
The SAM header is written once at the beginning of the file, before the first record.
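One way to obtain this "header once, before the first record" behaviour is a flag that is checked on every write. The class `sam_writer_sketch` below is hypothetical and only illustrates the pattern; it is not how SeqAn3 implements it.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Emits the header lazily, right before the first record, and never again.
class sam_writer_sketch
{
public:
    sam_writer_sketch(std::ostream & stream, std::string header) :
        stream_{stream}, header_{std::move(header)}
    {}

    void write_record(std::string const & record_line)
    {
        if (!header_written_)
        {
            stream_ << header_; // e.g. "@HD\tVN:1.6\n"
            header_written_ = true;
        }
        stream_ << record_line << '\n';
    }

private:
    std::ostream & stream_;
    std::string header_;
    bool header_written_ = false;
};
```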
Write the given fields to the specified stream.
Template Parameters
stream_type | Output stream; must model seqan3::ostream_concept with char. |
seq_type | Type of the data passed as seq. |
id_type | Type of the data passed as id. |
offset_type | Type of the data passed as offset. |
ref_seq_type | Type of the data passed as ref_seq. |
ref_id_type | Type of the data passed as ref_id. |
ref_offset_type | Type of the data passed as ref_offset. |
align_type | Type of the data passed as align. |
flag_type | Type of the data passed as flag. |
mapq_type | Type of the data passed as mapq. |
qual_type | Type of the data passed as qual. |
mate_type | Type of the data passed as mate. |
tag_dict_type | Type of the data passed as tag_dict. |
e_value_type | Type of the data passed as e_value. |
bit_score_type | Type of the data passed as bit_score. |
Parameters
[in,out] | stream | The output stream to write into. |
[in] | options | File-specific options passed to the format. |
[in] | header_ptr | A pointer to the header object of the file. |
[in] | seq | The data for seqan3::field::SEQ, i.e. the query sequence. |
[in] | qual | The data for seqan3::field::QUAL, e.g. the query quality sequence. |
[in] | id | The data for seqan3::field::ID, e.g. the read id. |
[in] | offset | The data for seqan3::field::OFFSET, i.e. the start position of the alignment in seq. |
[in] | ref_seq | The data for seqan3::field::REF_SEQ, i.e. the reference sequence. |
[in] | ref_id | The data for seqan3::field::REF_ID, e.g. the reference id. |
[in] | ref_offset | The data for seqan3::field::REF_OFFSET, i.e. the start position of the alignment in ref_seq. |
[in] | align | The data for seqan3::field::ALIGNMENT, e.g. the alignment between query and reference. |
[in] | flag | The data for seqan3::field::FLAG, e.g. the SAM mapping flag value. |
[in] | mapq | The data for seqan3::field::MAPQ, e.g. the mapping quality value. |
[in] | mate | The data for seqan3::field::MATE, e.g. the mate information of paired reads. |
[in] | tag_dict | The data for seqan3::field::TAGS, e.g. the optional SAM field tag dictionary. |
[in] | e_value | The data for seqan3::field::E_VALUE, e.g. the e-value of the alignment (BLAST). |
[in] | bit_score | The data for seqan3::field::, e.g. the bit score of the alignment (BLAST). |
The valid file extensions for this format; note that you can modify this value.
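Because file_extensions is a public static `std::vector<std::string>`, extending it is an ordinary `push_back`. The stand-in `format_sam_sketch` below mirrors only that documented shape; its initial value `{"sam"}` and the helper `has_registered_extension` are assumptions for illustration (requires C++17 for the inline static member).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the documented member shape.
struct format_sam_sketch
{
    static inline std::vector<std::string> file_extensions{"sam"}; // assumed default
};

// True if `filename` ends in "." followed by one of the registered extensions.
bool has_registered_extension(std::string const & filename)
{
    for (std::string const & ext : format_sam_sketch::file_extensions)
    {
        std::string const suffix = "." + ext;
        if (filename.size() >= suffix.size() &&
            filename.compare(filename.size() - suffix.size(), suffix.size(), suffix) == 0)
            return true;
    }
    return false;
}
```

With the real class, the analogous change would presumably be a `push_back` on `seqan3::alignment_file_format_sam::file_extensions`.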