Ancestral Distribution¶
Description¶
The ancestral distribution tool (bin/ancestral_distribution.py) uses a novel approach developed by S.A. Smith and B. O’Meara. Given a set of histograms for species, representing occupancy of environmental space in terms of common bins (i.e., a PNO or predicted niche occupancy profile), this approach reconstructs ancestral histograms of occupancy of climate space.
This approach is different from those used previously, based either on (1) summary statistics (mean, median, maximum, 95th percentile, etc.), or (2) sampling statistically from present day environmental space. Instead of sampling environmental space, probabilities of climate occupancy per bin are explicitly reconstructed. Likewise, unlike summary statistic approaches, which result either in a point estimate (mean/median) or a minimum and maximum constraint on ancestral reconstructions (min/max coding), a distribution is explicitly reconstructed here, revealing the potential shape of ancestral climate space. A key advantage of this approach is the ability to reconstruct multimodal ancestral distributions, whereas sampling-based approaches tend to result in normally distributed ancestral reconstructions regardless of extant species distributions.
Input species data must have common bins or results will be meaningless.
Using¶
usage: ancestral_distribution.py [-h] [-l ANNOTATE_LABELS] [-p PLOT_DIRECTORY] [-c OUT_CSV_FILENAME] in_tree_filename {newick,nexml,nexus} data_filename {csv,json,phylip,table} out_tree_filename {newick,nexml,nexus} Generates ancestral distribution estimations based on the environmental distributions at the tips of the tree positional arguments: in_tree_filename Path to the tree file {newick,nexml,nexus} The format of the tree data_filename Path to file with character state data {csv,json,phylip,table} The format of the character data out_tree_filename Path to write the resulting annotated tree {newick,nexml,nexus} The format to use when writing the tree optional arguments: -h, --help show this help message and exit -l ANNOTATE_LABELS, --annotate_labels ANNOTATE_LABELS If provided, annotate the tree labels with this data column -p PLOT_DIRECTORY, --plot_directory PLOT_DIRECTORY If provided, write distribution plots to this directory -c OUT_CSV_FILENAME, --out_csv_filename OUT_CSV_FILENAME If provided, write the output character matrix CSV to this file location
- Notes:
Use the -l option with either a column name or column index to use the reconstructed values for that column as the labels of your output tree.
The -p option will tell the tool to write out plots for the distributions
The -c option will write out the reconstruction matrix as a CSV file for processing elsewhere
Data formats¶
Alignment data can be provided as CSV [pages/format_csv]_, JSON [pages/format_csv]_, Phylip [pages/format_phylip]_, or an alignment table. Tree data can be provided as Newick [pages/format_newick]_, NeXML [pages/format_nexml]_, or Nexus [pages/format_nexus].
CSV¶
For CSV data, the first row can contain headers for the columns in the file. Each row should have a header for the taxon that it represents. An example CSV alignment file looks like
, var_1, var_2, var_3, var_4, var_5, var_6
A, 0.9, 0.2, 0.2, 0.3, 0.4, 0.4
B, 0.01, 0.1, 0.2, 0.3, 0.4, 0.4
C, 0.8, 0.1, 0.2, 0.3, 0.4, 0.4
D, 0.3, 0.1, 0.2, 0.3, 0.4, 0.4
E, 0.001, 0.1, 0.2, 0.3, 0.4, 0.4
F, 0.11, 0.1, 0.2, 0.3, 0.4, 0.4
G, 0.99, 0.2, 0.2, 0.3, 0.4, 0.4
JSON¶
If you want to provide alignment data in JSON format, the file should have a key named “headers” that is an array of headers for each column of data. It should also include a key named “values” that is an array of objects with keys for “name” (taxon name) and “values” (an array of data values). An example JSON alignment file looks like
{
"headers" : [
"var_1",
"var_2",
"var_3",
"var_4",
"var_5",
"var_6"
],
"values" : [
{
"name" : "A",
"values" : [0.9, 0.2, 0.2, 0.3, 0.4, 0.4]
},
{
"name" : "B",
"values" : [0.01, 0.1, 0.2, 0.3, 0.4, 0.4]
},
{
"name" : "C",
"values" : [0.8, 0.1, 0.2, 0.3, 0.4, 0.4]
},
{
"name" : "D",
"values" : [0.3, 0.1, 0.2, 0.3, 0.4, 0.4]
},
{
"name" : "E",
"values" : [0.001, 0.1, 0.2, 0.3, 0.4, 0.4]
},
{
"name" : "F",
"values" : [0.11, 0.1, 0.2, 0.3, 0.4, 0.4]
},
{
"name" : "G",
"values" : [0.99, 0.2, 0.2, 0.3, 0.4, 0.4]
}
]
}
Phylip¶
Phylip data should be formatted as a list of taxa with corresponding values. An example phylip alignment file looks like
7 6
A 0.9 0.2 0.2 0.3 0.4 0.4
B 0.01 0.1 0.2 0.3 0.4 0.4
C 0.8 0.1 0.2 0.3 0.4 0.4
D 0.3 0.1 0.2 0.3 0.4 0.4
E 0.001 0.1 0.2 0.3 0.4 0.4
F 0.11 0.1 0.2 0.3 0.4 0.4
G 0.99 0.2 0.2 0.3 0.4 0.4
Table¶
You can provide your alignment data as a table as well. This format looks like Phylip but does not include metadata for the number of taxa or the number of data values. It looks like
A 0.9 0.2 0.2 0.3 0.4 0.4
B 0.01 0.1 0.2 0.3 0.4 0.4
C 0.8 0.1 0.2 0.3 0.4 0.4
D 0.3 0.1 0.2 0.3 0.4 0.4
E 0.001 0.1 0.2 0.3 0.4 0.4
F 0.11 0.1 0.2 0.3 0.4 0.4
G 0.99 0.2 0.2 0.3 0.4 0.4
Newick¶
You can provide your tree data as a Newick file. You can also request that the resulting tree be formatted as Newick. An example Newick file looks like
(A:2.9999,((B:0.1,C:0.1):0.1,(G:0.2,(D:0.1,(E:0.1,F:0.1):0.1):0.1):0.1):0.1);
NeXML¶
You can provide your tree data as a NeXML file. You can also request that the resulting tree be formatted as NeXML. An example NeXML file looks like
<?xml version="1.0" encoding="ISO-8859-1"?>
<nex:nexml
version="0.9"
xsi:schemaLocation="http://www.nexml.org/2009 ../xsd/nexml.xsd"
xmlns="http://www.nexml.org/2009"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:nex="http://www.nexml.org/2009"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>
<otus id="d0">
<otu id="d1" label="A" />
<otu id="d2" label="B" />
<otu id="d3" label="C" />
</otus>
<trees id="d4" otus="d0">
<tree id="d5" xsi:type="nex:FloatTree">
<node id="d6" />
<node id="d7" otu="d1" />
<node id="d8" />
<node id="d9" otu="d2" />
<node id="d10" otu="d3" />
<rootedge id="d11" target="d6" />
<edge id="d12" source="d6" target="d7" />
<edge id="d13" source="d6" target="d8" />
<edge id="d14" source="d8" target="d9" />
<edge id="d15" source="d8" target="d10" />
</tree>
</trees>
</nex:nexml>
Nexus¶
You can provide your tree data as a Nexus file. You can also request that the resulting tree be formatted as Nexus. An example Nexus file looks like
#NEXUS
BEGIN TAXA;
DIMENSIONS NTAX=7;
TAXLABELS
A
B
C
G
D
E
F
;
END;
BEGIN TREES;
TREE 1 = (A:2.9999,((B:0.1,C:0.1):0.1,(G:0.2,(D:0.1,(E:0.1,F:0.1):0.1):0.1):0.1):0.1);
END;
Executable¶
The ancestral_distribution executable can be found at bin/ancestral_distribution.py
Output¶
The ancestral distribution analysis creates an annotated version of the input tree with the reconstructed values that were computed. Additionally, character data can be written to a CSV file if the -c option is provided. You can also generate plots of the ancestral distributions by providing a directory to the -p option.
References¶
Folk, R. A., Visger, C. J., Soltis, P. S., Soltis, D. E., & Guralnick, R. P. (2018). Geographic range dynamics drove ancient hybridization in a lineage of angiosperms. The American Naturalist, 192(2), 171-187.
Smith, S. A., & Donoghue, M. J. (2010). Combining historical biogeography with niche modeling in the Caprifolium clade of Lonicera (Caprifoliaceae, Dipsacales). Systematic biology, 59(3), 322-341.