SeqAn3
Aminoacid

Contains the amino acid alphabets and functionality for translation from nucleotide.

Classes

class  seqan3::aa20
 The canonical amino acid alphabet.
 
class  seqan3::aa27
 The twenty-seven letter amino acid alphabet.
 
class  seqan3::aminoacid_base< derived_type, size >
 A CRTP-base that refines seqan3::alphabet_base and is used by the amino acids.
 
interface  seqan3::aminoacid_concept
 A concept that indicates whether an alphabet represents amino acids. In addition to the requirements for seqan3::alphabet_concept, seqan3::aminoacid_concept expects conforming alphabets to provide an enum-like interface with all 27 possible amino acids as values (although some may be mapped to others if the alphabet is smaller than 27).
 

Detailed Description

Contains the amino acid alphabets and functionality for translation from nucleotide.

Introduction
Amino acid sequences are an important part of bioinformatic data processing and are used by many applications. While it is possible to represent them in a regular std::string, it makes sense to use specialised data structures in most cases. This sub-module offers the 27-letter amino acid alphabet as well as a reduced version that can be used with regular containers and ranges. The 27-letter amino acid alphabet contains the 20 canonical amino acids, 2 additional proteinogenic amino acids (Pyrrolysine and Selenocysteine) and a termination letter (*). Additionally, 4 wildcard letters are offered, which allow more generic usage, for example in the case of ambiguous amino acids (e.g. J, which means either Isoleucine or Leucine). See also https://en.wikipedia.org/wiki/Amino_acid for more information about the amino acid alphabet.
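The composition described above can be checked with a short, library-independent sketch. All names in it (canonical_letters, extra_proteinogenic, wildcard_letters, termination_letter, alphabet_size, is_canonical) are hypothetical and chosen for illustration; none of them are SeqAn3 identifiers. The four wildcard letters are taken from the one-letter codes listed in the conversion table of this page.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names for illustration only; none of these are SeqAn3 identifiers.
constexpr std::string_view canonical_letters{"ACDEFGHIKLMNPQRSTVWY"}; // 20 canonical amino acids
constexpr std::string_view extra_proteinogenic{"OU"};                 // Pyrrolysine (O), Selenocysteine (U)
constexpr std::string_view wildcard_letters{"BJXZ"};                  // ambiguous or unknown residues
constexpr std::string_view termination_letter{"*"};                   // termination letter

// Total number of letters across the four groups.
constexpr std::size_t alphabet_size = canonical_letters.size()
                                    + extra_proteinogenic.size()
                                    + wildcard_letters.size()
                                    + termination_letter.size();

static_assert(canonical_letters.size() == 20, "expected 20 canonical letters");
static_assert(alphabet_size == 27, "20 + 2 + 4 + 1 letters expected");

// Returns true if c is one of the 20 canonical one-letter codes.
constexpr bool is_canonical(char c)
{
    return canonical_letters.find(c) != std::string_view::npos;
}
```

The static_assert lines make the arithmetic (20 + 2 + 4 + 1 = 27) a compile-time check, so a mistyped letter group would fail to build rather than go unnoticed.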
Conversions
Amino acid name              Three letter code  One letter code  Remapped in seqan3::aa20
Alanine                      Ala                A                A
Arginine                     Arg                R                R
Asparagine                   Asn                N                N
Aspartic acid                Asp                D                D
Cysteine                     Cys                C                C
Tyrosine                     Tyr                Y                Y
Glutamic acid                Glu                E                E
Glutamine                    Gln                Q                Q
Glycine                      Gly                G                G
Histidine                    His                H                H
Isoleucine                   Ile                I                I
Leucine                      Leu                L                L
Lysine                       Lys                K                K
Methionine                   Met                M                M
Phenylalanine                Phe                F                F
Proline                      Pro                P                P
Serine                       Ser                S                S
Threonine                    Thr                T                T
Tryptophan                   Trp                W                W
Valine                       Val                V                V
Selenocysteine               Sec                U                C
Pyrrolysine                  Pyl                O                K
Asparagine or aspartic acid  Asx                B                D
Glutamine or glutamic acid   Glx                Z                E
Leucine or Isoleucine        Xle                J                L
Unknown                      Xaa                X                S
Stop Codon                   N/A                *                W
All amino acid alphabets provide static value members (like an enum) for all amino acids in the form of the one-letter representation. As shown above, alphabets smaller than 27 internally represent multiple amino acids as one.
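The remapping column of the conversion table can be written down as a small stand-alone function. This is a minimal sketch on plain char values, assuming upper-case one-letter codes as input; to_aa20 is a hypothetical helper name and not part of the SeqAn3 interface, and the sketch says nothing about how the library performs the conversion internally.

```cpp
// Hypothetical helper, not a SeqAn3 function: maps a one-letter code of the
// 27-letter alphabet to the letter listed in the "Remapped in seqan3::aa20"
// column of the conversion table.
constexpr char to_aa20(char c)
{
    switch (c)
    {
        case 'U': return 'C'; // Selenocysteine -> Cysteine
        case 'O': return 'K'; // Pyrrolysine -> Lysine
        case 'B': return 'D'; // Asparagine or aspartic acid -> Aspartic acid
        case 'Z': return 'E'; // Glutamine or glutamic acid -> Glutamic acid
        case 'J': return 'L'; // Leucine or Isoleucine -> Leucine
        case 'X': return 'S'; // Unknown -> Serine
        case '*': return 'W'; // Stop codon -> Tryptophan
        default:  return c;   // canonical letters map to themselves; other input is passed through unchanged
    }
}
```

Note that the mapping is lossy: after remapping, U can no longer be distinguished from C, nor J from L, which is one reason to prefer the larger alphabet unless the reduced one is required.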
For most cases it is highly recommended to use seqan3::aa27, as seqan3::aa20 provides no benefit in regard to space consumption (both need 5 bits). Use seqan3::aa20 only when you know you need to interface with other software or formats that only support the canonical set.
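The 5-bit figure follows from the alphabet sizes alone: 4 bits distinguish at most 16 values, so both 20 and 27 letters need a fifth bit. A minimal sketch of that arithmetic, where bits_needed is a hypothetical helper and not a SeqAn3 function:

```cpp
#include <cstddef>

// Hypothetical helper, not a SeqAn3 function: the smallest number of bits
// whose value range (2^bits) can hold alphabet_size distinct letters.
constexpr std::size_t bits_needed(std::size_t alphabet_size)
{
    std::size_t bits = 0;
    while ((std::size_t{1} << bits) < alphabet_size)
        ++bits;
    return bits;
}

static_assert(bits_needed(16) == 4, "16 values fit into 4 bits");
static_assert(bits_needed(20) == 5, "20 letters need 5 bits");
static_assert(bits_needed(27) == 5, "27 letters need 5 bits");
```

This only covers a fixed-width encoding of a single letter; it makes no claim about how a container of letters is laid out in memory.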