analyses.helpers package

Submodules

analyses.helpers.data_readers module

Module containing tools for reading alignment files in various formats.

Todo

  • FASTA reader?

exception analyses.helpers.data_readers.AlignmentIOError

Bases: Exception

Wrapper class for alignment errors.

analyses.helpers.data_readers.create_sequence_list_from_dict(values_dict)

Creates a list of sequences from a dictionary.

Parameters

values_dict (dict) – A dictionary mapping each taxon name to a list of values for that taxon.

Note

  • The dictionary should have structure:

    {
        "{taxon_name}" : [{values}]
    }
    
Returns

A list of Sequence objects and None for headers.

Raises

AlignmentIOError – If a dictionary value is not a list.
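
As a rough illustration of the expected input, the sketch below mirrors the documented behavior. The `Sequence` and `AlignmentIOError` classes here are simplified stand-ins rather than the package's own, and both the use of `set_cont_values` to store the values and the tuple return shape are assumptions.

```python
class AlignmentIOError(Exception):
    """Stand-in for analyses.helpers.data_readers.AlignmentIOError."""


class Sequence:
    """Simplified stand-in for analyses.helpers.sequence.Sequence."""

    def __init__(self, name="", seq=""):
        self.name = name
        self.seq = seq
        self.cont_values = []

    def set_cont_values(self, values):
        self.cont_values = values


def create_sequence_list_from_dict(values_dict):
    """Hypothetical re-implementation of the documented behavior."""
    sequences = []
    for taxon_name, values in values_dict.items():
        # The documentation states that non-list values raise an error
        if not isinstance(values, list):
            raise AlignmentIOError(
                "Values for {} must be a list".format(taxon_name))
        seq = Sequence(name=taxon_name)
        seq.set_cont_values(values)  # assumption: values are continuous
        sequences.append(seq)
    # The documentation states that None is returned for headers
    return sequences, None


sequences, headers = create_sequence_list_from_dict(
    {"taxon_a": [1.0, 2.5], "taxon_b": [0.3, 4.1]})
```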

analyses.helpers.data_readers.get_character_matrix_from_sequences_list(sequences, var_headers=None)

Converts a list of sequences into a character matrix.

Parameters
  • sequences (list of Sequence) – A list of Sequence objects to be converted.

  • var_headers (list of headers, optional) – If provided, uses these as variable headers for the columns in the matrix.

Returns

A matrix of sequence data.

Return type

Matrix
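
The conversion can be pictured as stacking one row per sequence. The sketch below uses plain Python lists in place of the `Matrix` type, a stand-in `Sequence` class, and a hypothetical helper named `build_character_table`; reading values from `cont_values` and labeling columns by index when `var_headers` is omitted are assumptions.

```python
class Sequence:
    """Simplified stand-in for analyses.helpers.sequence.Sequence."""

    def __init__(self, name="", seq=""):
        self.name = name
        self.seq = seq
        self.cont_values = []

    def set_cont_values(self, values):
        self.cont_values = values


def build_character_table(sequences, var_headers=None):
    """Return (row_headers, column_headers, data) as plain lists."""
    row_headers = [seq.name for seq in sequences]
    data = [list(seq.cont_values) for seq in sequences]
    num_columns = len(data[0]) if data else 0
    if var_headers is None:
        # Assumed fallback: label columns by their index
        column_headers = [str(i) for i in range(num_columns)]
    else:
        column_headers = list(var_headers)
    return row_headers, column_headers, data


seq_a = Sequence(name="taxon_a")
seq_a.set_cont_values([1.0, 2.5])
seq_b = Sequence(name="taxon_b")
seq_b.set_cont_values([0.3, 4.1])
row_headers, column_headers, data = build_character_table(
    [seq_a, seq_b], var_headers=["length", "mass"])
```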

analyses.helpers.data_readers.load_alignment_from_filename(filename)

Attempts to load an alignment from a file path by guessing its schema.

Parameters

filename (str) – The path to the file containing the alignment.

Raises

RuntimeError – Raised when the method needed to load the alignment cannot be determined.

Returns

A tuple containing a list of sequences and the headers.

Return type

tuple
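
One common way to guess a schema is to dispatch on the file extension. The sketch below shows that idea only; the helper name `guess_alignment_schema` and the extension mapping are hypothetical, and the extensions the real function recognizes are not documented here.

```python
import os

# Assumed mapping from extension to schema name (illustrative only)
_SCHEMA_BY_EXTENSION = {
    ".csv": "csv",
    ".json": "json",
    ".phylip": "phylip",
    ".phy": "phylip",
    ".tbl": "table",
}


def guess_alignment_schema(filename):
    """Return a schema name for the file, or raise RuntimeError."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        return _SCHEMA_BY_EXTENSION[extension]
    except KeyError:
        # Mirrors the documented RuntimeError for unrecognized files
        raise RuntimeError(
            "Cannot determine how to load {}".format(filename))


schema = guess_alignment_schema("data/alignment.PHY")
```

A caller would then pass the opened file to the matching reader, such as `read_phylip_alignment_flo` for the `phylip` schema.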

analyses.helpers.data_readers.read_csv_alignment_flo(csv_flo)

Reads a CSV file-like object and returns a list of sequences and headers.

Parameters

csv_flo (file-like) – A file-like object with CSV alignment data.

Returns

A list of Sequence objects and headers.

Raises

AlignmentIOError – If the number of columns is inconsistent across the sequences.
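
The column-consistency check can be sketched with the standard `csv` module. The layout assumed here (a header row whose first cell labels the name column, followed by one row per taxon) is a guess, and `read_csv_rows` is a hypothetical helper that returns plain tuples instead of Sequence objects.

```python
import csv
import io


class AlignmentIOError(Exception):
    """Stand-in for analyses.helpers.data_readers.AlignmentIOError."""


def read_csv_rows(csv_flo):
    """Return ([(name, values), ...], headers) from a CSV file-like object."""
    reader = csv.reader(csv_flo)
    header_row = next(reader)
    headers = header_row[1:]  # assumption: first cell labels the name column
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, values = row[0], [float(value) for value in row[1:]]
        if len(values) != len(headers):
            raise AlignmentIOError(
                "Row {} has {} values, expected {}".format(
                    name, len(values), len(headers)))
        rows.append((name, values))
    return rows, headers


csv_flo = io.StringIO("name,length,mass\ntaxon_a,1.0,2.5\ntaxon_b,0.3,4.1\n")
rows, headers = read_csv_rows(csv_flo)
```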

analyses.helpers.data_readers.read_json_alignment_flo(json_flo)

Reads a JSON file-like object and returns a list of sequences and headers.

Parameters

json_flo (file-like) – A file-like object with JSON alignment data.

Note

  • File should have structure:

    {
        "headers" : [{header_names}],
        "values" : [
            {
                "name" : "{taxon_name}",
                "values" : [{values}]
            }
        ]
    }
    
Returns

A list of Sequence objects and headers.

Raises

AlignmentIOError – If headers are provided but they are not a list.
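
To make the structure above concrete, the sketch below parses a small document of that shape with the standard `json` module. `read_json_rows` is a hypothetical helper that returns plain tuples instead of Sequence objects, and treating a missing `headers` key as `None` is an assumption.

```python
import io
import json


class AlignmentIOError(Exception):
    """Stand-in for analyses.helpers.data_readers.AlignmentIOError."""


def read_json_rows(json_flo):
    """Return ([(name, values), ...], headers) from a JSON file-like object."""
    document = json.load(json_flo)
    headers = document.get("headers")  # assumption: headers are optional
    if headers is not None and not isinstance(headers, list):
        raise AlignmentIOError("Headers must be a list")
    rows = [(entry["name"], entry["values"]) for entry in document["values"]]
    return rows, headers


json_flo = io.StringIO(json.dumps({
    "headers": ["length", "mass"],
    "values": [
        {"name": "taxon_a", "values": [1.0, 2.5]},
        {"name": "taxon_b", "values": [0.3, 4.1]},
    ],
}))
rows, headers = read_json_rows(json_flo)
```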

analyses.helpers.data_readers.read_phylip_alignment_flo(phylip_flo)

Reads a phylip alignment file-like object and returns the sequences.

Parameters

phylip_flo (file-like) – The phylip file-like object.

Note

  • We assume that the phylip files are extended and not strict (in terms of how many characters are allowed for taxon names).

  • The phylip file is in the format:

    numoftaxa numofsites
    seqlabel sequence
    seqlabel sequence

Returns

A list of Sequence objects.

Raises

AlignmentIOError – If there is a problem creating sequences.
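
A minimal sketch of parsing the non-strict layout described in the note, assuming one taxon per line with the label separated from the sequence by whitespace. `read_relaxed_phylip` is a hypothetical helper that returns plain tuples, and the count checks against the header line are an assumption about what "a problem creating sequences" covers.

```python
import io


class AlignmentIOError(Exception):
    """Stand-in for analyses.helpers.data_readers.AlignmentIOError."""


def read_relaxed_phylip(phylip_flo):
    """Return [(label, sequence), ...] from a phylip file-like object."""
    lines = [line.strip() for line in phylip_flo if line.strip()]
    # Header line: number of taxa, then number of sites
    num_taxa, num_sites = (int(token) for token in lines[0].split())
    records = []
    for line in lines[1:]:
        try:
            # Split once on whitespace: label first, sequence after
            label, sequence = line.split(None, 1)
        except ValueError as err:
            raise AlignmentIOError(
                "Missing sequence on line: {}".format(line)) from err
        sequence = sequence.replace(" ", "")
        if len(sequence) != num_sites:
            raise AlignmentIOError(
                "Sequence {} has {} sites, expected {}".format(
                    label, len(sequence), num_sites))
        records.append((label, sequence))
    if len(records) != num_taxa:
        raise AlignmentIOError(
            "Found {} taxa, expected {}".format(len(records), num_taxa))
    return records


records = read_relaxed_phylip(
    io.StringIO("2 4\ntaxon_a ACGT\ntaxon_b AC-T\n"))
```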

analyses.helpers.data_readers.read_table_alignment_flo(table_flo)

Reads a table from a file-like object.

Parameters

table_flo (file-like) – A file-like object containing table data.

Returns

A list of Sequence objects.

Raises

AlignmentIOError – If there is a problem creating sequences.

analyses.helpers.sequence module

Module containing sequence class.

class analyses.helpers.sequence.Sequence(name='', seq='')

Bases: object

Barebones class for sequences.

Sequences can be aligned or unaligned and can use any type of alphabet.

name

A name for this sequence.

Type

str

seq

A sequence string.

Type

str

qualstr

A string of quality characters for the sequence.

Type

str

qualarr

A list of offset code points.

Type

list

cont_values

A list of continuous values for a sequence.

Type

list

get_fasta()

Get a fasta string.

get_fastq()

Get a fastq string.
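
For reference, the two record layouts look roughly like the sketch below. The standalone functions `to_fasta` and `to_fastq` are hypothetical; the exact strings the methods return (for example, trailing newlines or line wrapping) are not documented here and are assumed.

```python
def to_fasta(name, seq):
    """Format a FASTA record: a '>' header line, then the sequence."""
    return ">{}\n{}\n".format(name, seq)


def to_fastq(name, seq, qualstr):
    """Format a FASTQ record: '@' header, sequence, '+', quality string."""
    return "@{}\n{}\n+\n{}\n".format(name, seq, qualstr)


fasta_record = to_fasta("taxon_a", "ACGT")
fastq_record = to_fastq("taxon_a", "ACGT", "IIII")
```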

set_cont_values(values)

Set the continuous values for the sequence.

Parameters

values (list) – A list of values.

set_qualarr(qual)

Set the qualarr attribute.

Parameters

qual (list of int) – A list of code point integers.

Note

  • An offset of 33 is assumed.

Todo

  • Should this reset both or neither?

set_qualstr(qual)

Set the qualstr attribute.

Parameters

qual (str) – A new string to use for qualstr.

Note

  • An offset of 33 is assumed.

Todo

  • Should this reset both or neither?
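
The offset of 33 relates each quality character to an integer through its code point. The sketch below shows that conversion in both directions with hypothetical helper functions; whether `qualarr` stores the values with the offset already subtracted, as done here, is an assumption.

```python
def qualstr_to_qualarr(qualstr, offset=33):
    """Convert quality characters to integers by subtracting the offset."""
    return [ord(char) - offset for char in qualstr]


def qualarr_to_qualstr(qualarr, offset=33):
    """Convert integers back to quality characters by adding the offset."""
    return "".join(chr(value + offset) for value in qualarr)


# '!' has code point 33, '5' has 53, and 'I' has 73
qualarr = qualstr_to_qualarr("!5I")
qualstr = qualarr_to_qualstr(qualarr)
```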

Module contents