ample.util.sequence_util module

class Sequence(fasta=None, pdb=None, canonicalise=False)[source]

Bases: object

A class to handle a fasta file

add_pdb_data(pdbin)[source]

Add the resseq information from a PDB file to this object when we already have the sequence information from a fasta file, such as from an alignment. We assume that there will be gaps (-) in the sequence and that the letters may be upper or lower case. Currently this only supports adding data for single-chain PDBs.
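The idea of attaching residue numbers to a gapped sequence can be sketched in plain Python. This is only an illustration under stated assumptions, not the method's actual implementation: the function name map_resseq_onto_alignment is hypothetical, and representing a gap position with None is an assumption, since the documentation does not say how gaps are stored.

```python
def map_resseq_onto_alignment(aligned_seq, pdb_seq, pdb_resseq):
    """Hypothetical helper: pair each aligned position with a PDB residue number.

    aligned_seq -- sequence from the fasta file, may contain gaps (-) and mixed case
    pdb_seq     -- ungapped single-chain sequence read from the PDB
    pdb_resseq  -- residue sequence numbers, one per residue in pdb_seq
    """
    if len(pdb_seq) != len(pdb_resseq):
        raise ValueError("need exactly one resseq per PDB residue")
    # Compare the sequences ignoring gaps and case
    ungapped = aligned_seq.replace("-", "").upper()
    if ungapped != pdb_seq.upper():
        raise ValueError("fasta and PDB sequences do not match")
    numbers = iter(pdb_resseq)
    # Assumption: gap positions carry no residue number, so use None
    return [None if aa == "-" else next(numbers) for aa in aligned_seq]
```

For example, the aligned sequence "a-cD-" with PDB residue numbers 5, 6 and 8 would map to [5, None, 6, 8, None].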

canonicalise()[source]

Reformat the fasta file

Needed because Rosetta has problems reading fasta files. For a file to be read, the sequence must contain no spaces, the name must be 4 characters followed by an underscore (ABCD_), everything must be uppercase, and the file must end with a line ending. The line ending has to be in Linux format: Rosetta does not recognise the line ending of a fasta file created on Windows.

Rosetta has a lot of problems with fasta files, so this method was added to deal with them.
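The formatting rules above can be sketched on a plain string. This is a minimal sketch, not the code this method runs: the function name canonicalise_fasta_text and its name argument are hypothetical, it assumes a single-sequence file, and writing the sequence on one line is an assumption the documentation does not state.

```python
def canonicalise_fasta_text(text, name="ABCD_"):
    """Hypothetical helper: reformat single-sequence fasta text per the rules above."""
    if len(name) != 5 or not name.endswith("_"):
        raise ValueError("name must be 4 characters followed by an underscore")
    residues = []
    # splitlines() copes with Unix (\n) and Windows (\r\n) line endings alike
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the original header
        # Remove all whitespace inside the line and force uppercase
        residues.append("".join(line.split()).upper())
    # Unix line endings throughout, including one at the very end
    return ">" + name + "\n" + "".join(residues) + "\n"
```

When writing the result to disk, opening the file with open(path, "w", newline="\n") keeps Python from translating the line endings on Windows.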

fasta_str(pdbname=False)[source]
from_fasta(fasta_file, canonicalise=True, resseq=True)[source]
from_pdb(pdbin)[source]
length(seq_no=0)[source]
mutate_residue(from_aa, res_seq, to_aa, seq_id=0)[source]

Change residue type from_aa at position res_seq to be of type to_aa

Note: res_seq positions start from 1, not 0.

Parameters:
  • from_aa (str) – Single-letter amino acid
  • res_seq (int) – Residue sequence number
  • to_aa (str) – Single-letter amino acid
  • seq_id (int) – The index of the sequence to operate on (counting from zero)
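The 1-based numbering is easy to get wrong, so here is a small sketch of the same operation on a plain string. The function mutate_in_string is hypothetical and only mirrors the documented parameters. It treats res_seq as a plain position in the string, and whether the real method checks from_aa against the existing residue is an assumption.

```python
def mutate_in_string(sequence, from_aa, res_seq, to_aa):
    """Hypothetical helper: replace the residue at 1-based position res_seq."""
    if res_seq < 1 or res_seq > len(sequence):
        raise IndexError("res_seq must be between 1 and len(sequence)")
    index = res_seq - 1  # convert the 1-based position to a 0-based index
    # Assumption: refuse to mutate if the existing residue is not from_aa
    if sequence[index] != from_aa:
        raise ValueError(
            "expected %s at position %d but found %s"
            % (from_aa, res_seq, sequence[index])
        )
    return sequence[:index] + to_aa + sequence[index + 1:]
```

With this sketch, mutate_in_string("ACDE", "C", 2, "G") changes the second residue and gives "AGDE".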
numSequences()[source]
pirStr(seqNo=0)[source]

Return a canonical MAXWIDTH PIR representation of the file as a line-separated string

sequence(seq_no=0)[source]
toPir(input_fasta, output_pir=None)[source]

Take a fasta file and output the corresponding PIR file
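For orientation, the conversion can be sketched on text rather than files. This is a sketch under assumptions, not this function's implementation: fasta_to_pir_text is a hypothetical name, the layout follows the common PIR convention (a ">P1;code" line, a description line, then the sequence terminated by "*"), and the default width of 80 is a placeholder because the value of MAXWIDTH is not given here.

```python
def fasta_to_pir_text(fasta_text, width=80):
    """Hypothetical helper: convert single-sequence fasta text to PIR text."""
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("fasta text must start with a '>' header line")
    header = lines[0][1:].strip()
    # Assumption: use the first word of the fasta header as the PIR code
    code = header.split()[0] if header else "UNKNOWN"
    # PIR sequences are terminated by an asterisk
    sequence = "".join(lines[1:]) + "*"
    out = [">P1;" + code, header]
    # Split the sequence into fixed-width lines
    out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(out) + "\n"
```

The width argument only controls how the sequence lines are wrapped; the header and description lines are left unwrapped.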

write_fasta(fasta_file, pdbname=False)[source]
chain_data(chain)[source]
chain_sequence(chain)[source]
process_fasta(amoptd, canonicalise=False)[source]
sequence(pdbin)[source]
sequence_data(pdbin)[source]