A simple FASTA read and write toolbox for small to medium-sized projects. FASTA files are text-based files for storing nucleotide or amino acid sequences. Reading such files is not particularly difficult, yet most off-the-shelf packages come with many unnecessary dependencies.
miniFASTA offers an alternative to this and brings many useful functions without relying on third party packages.
Installation
How to use
miniFASTA offers easy-to-use functions for FASTA handling. The main parts are:
- read()
- write()
- fasta_object()
- toAmino()
- toRevComp()
- valid()
- len() / str() / eq() / iter()
- translate_seq()
- reverse_comp()
Reading FASTA files
read() is a FASTA reader that handles both compressed and uncompressed files.
The following compression formats are supported: zip, tar, tar.gz, gz. If an archive contains multiple files, all of them are read.
This function returns an iterator of fasta_objects. If only the sequences should be returned, set the keyword argument seq=True.
Entries are converted to upper case by default. Set read("path.fasta", upper=False) to disable this conversion.
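To make the reading behaviour concrete, here is a minimal standard-library sketch of a FASTA reader. It is not miniFASTA's implementation: the name read_fasta_sketch is hypothetical, it yields plain (head, body) tuples instead of fasta_objects, and it only covers uncompressed and gz files, not zip or tar archives.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_sketch(path: str, upper: bool = True) -> Iterator[Tuple[str, str]]:
    """Yield (head, body) pairs from a plain or gzip-compressed FASTA file.

    Hypothetical helper for illustration only; not part of miniFASTA.
    """
    # Choose the opener from the file extension; both are used in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    head, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header line closes the previous entry, if there is one.
                if head is not None:
                    yield head, "".join(chunks)
                head, chunks = line, []
            else:
                # Sequence lines are joined into one body, optionally upper-cased.
                chunks.append(line.upper() if upper else line)
    # Emit the last entry once the file is exhausted.
    if head is not None:
        yield head, "".join(chunks)
```

Because the function is a generator, entries are produced one at a time, which mirrors the iterator-style return value described above.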
Writing FASTA files
write() is a basic FASTA writer.
It takes a single fasta_object or a list of fasta_objects and writes them to the given path.
By default, an existing file is overwritten. Set write(fo, "path.fasta", mode="a") to append to the file instead.
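The difference between overwriting and appending can be illustrated with a small standard-library sketch. The function write_fasta_sketch is a hypothetical stand-in that takes (head, body) tuples; it only assumes that the mode argument maps onto Python's usual "w" and "a" file modes, which the source does not state explicitly.

```python
from typing import Iterable, Tuple


def write_fasta_sketch(entries: Iterable[Tuple[str, str]], path: str, mode: str = "w") -> None:
    """Write (head, body) pairs to path.

    Hypothetical helper for illustration only; not part of miniFASTA.
    mode="w" overwrites an existing file, mode="a" appends to it.
    """
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")
    with open(path, mode) as handle:
        for head, body in entries:
            # One header line followed by one sequence line per entry.
            handle.write(f"{head}\n{body}\n")
```

Calling the function twice with mode="w" leaves only the second batch of entries in the file, while a second call with mode="a" keeps both batches.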
fasta_object()
The core component of miniFASTA is the fasta_object(). This object represents a FASTA entry and consists of a head and a body.
fasta_object(…).valid()
Checks whether the body contains invalid characters. The stype of the fasta_object must be set in order to check for illegal characters in its body.
stype is one of:
- ANY : [default] Allows all characters.
- NA : Allows all Nucleic Acid Codes (DNA & RNA).
- DNA : Allows all IUPAC DNA Codes.
- RNA : Allows all IUPAC RNA Codes.
- PROT: Allows all IUPAC amino acid codes.
Optional: allowedChars can be set to overwrite default settings.
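The idea behind the stype check can be sketched as a set comparison between the body and an allowed alphabet. Everything in this block is an assumption made for illustration: the function name is_valid_sketch, the allowed_chars parameter, and above all the exact alphabets, which may differ from the ones miniFASTA uses (for example regarding gaps, stop symbols, or lower-case letters).

```python
from typing import Optional

# Assumed alphabets for illustration; miniFASTA's own sets may differ.
ALPHABETS = {
    "ANY": None,  # None means that every character is accepted
    "DNA": set("ACGTRYSWKMBDHVN"),
    "RNA": set("ACGURYSWKMBDHVN"),
    "PROT": set("ACDEFGHIKLMNPQRSTVWYBZX"),
}
# NA accepts both DNA and RNA codes.
ALPHABETS["NA"] = ALPHABETS["DNA"] | ALPHABETS["RNA"]


def is_valid_sketch(body: str, stype: str = "ANY", allowed_chars: Optional[str] = None) -> bool:
    """Return True if every character of body is allowed for the given stype.

    Hypothetical helper for illustration only; not part of miniFASTA.
    allowed_chars, if given, overrides the alphabet selected by stype.
    """
    allowed = set(allowed_chars) if allowed_chars is not None else ALPHABETS[stype]
    if allowed is None:
        return True
    return set(body) <= allowed
```

With these assumed alphabets, a body containing U fails the DNA check but passes the NA check, and a custom character set can be used to admit symbols such as a gap character.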
fasta_object(…).toAmino(translation_dict)
Translates the body to an amino acid sequence. See translate_seq() for more details.
fasta_object(…).toRevComp(complement_dict)
Converts the body to its reverse complement. See reverse_comp() for more details.
Sequence translation
translate_seq() translates a sequence starting at position 0.
Unless translation_dict is provided, the standard bacterial code is used. A codon that is not found in the dictionary is replaced by a ~. Trailing bases that do not fit into a codon are ignored.
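The translation rules above (start at position 0, unknown codons become ~, trailing bases are dropped) can be sketched in a few lines. The name translate_sketch is hypothetical and this is not miniFASTA's code. The codon table is built from the amino acid assignments of the bacterial genetic code in the common T, C, A, G ordering; marking stop codons with * is an assumption, since the source does not say which symbol is used.

```python
from itertools import product
from typing import Dict, Optional

# Amino acid assignments of the bacterial code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# Stop codons are marked with "*" here, which is an assumption of this sketch.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_sketch(seq: str, table: Optional[Dict[str, str]] = None) -> str:
    """Translate seq codon by codon, starting at position 0.

    Hypothetical helper for illustration only; not part of miniFASTA.
    """
    table = CODON_TABLE if table is None else table
    # Drop trailing bases that do not fill a complete codon.
    usable = len(seq) - len(seq) % 3
    # Codons missing from the table are replaced by "~".
    return "".join(table.get(seq[i:i + 3], "~") for i in range(0, usable, 3))


print(translate_sketch("ATGGCCT"))    # MA   (the trailing T is ignored)
print(translate_sketch("ATGNNNTAA"))  # M~*  (NNN is not in the table)
```

Passing a custom dictionary as table replaces the default code entirely, so any codon missing from that dictionary is also rendered as ~.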
Reverse Complement
reverse_comp() converts a sequence to its reverse complement.
Unless complement_dict is provided, the standard complement is used. If no complement is found for a nucleotide, it remains unchanged.
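A reverse complement with the fallback behaviour described above can be sketched as follows. The name reverse_comp_sketch and the default dictionary are assumptions for illustration; miniFASTA's default complement may cover more symbols than the four DNA bases used here.

```python
from typing import Dict, Optional

# Assumed default for this sketch: the four DNA bases only.
DEFAULT_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_comp_sketch(seq: str, complement: Optional[Dict[str, str]] = None) -> str:
    """Return the reverse complement of seq.

    Hypothetical helper for illustration only; not part of miniFASTA.
    Characters without an entry in the dictionary are kept unchanged.
    """
    complement = DEFAULT_COMPLEMENT if complement is None else complement
    # Walk the sequence backwards and complement each base where possible.
    return "".join(complement.get(base, base) for base in reversed(seq))


print(reverse_comp_sketch("ATGC"))  # GCAT
print(reverse_comp_sketch("AANT"))  # ANTT  (N has no complement and is kept)
```

A custom dictionary makes the same function usable for RNA, for example by mapping A to U and U to A.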