This function reads the metadata from a file containing trees in binary format.

read_binary_tree_metadata(file, invalid_trailer = c("scan", "fail", "ignore"))

Arguments

file

A file name.

invalid_trailer

Controls what happens when the tree file has an invalid trailer. If this is set to "scan" (the default), the function will print a warning and then read the whole file, attempting to parse as many trees as possible and storing the addresses of those trees. If this is set to "fail", an error will be generated. If this is set to "ignore", a warning will be printed and the returned object will be missing the TreeAddresses element.

Value

An object of class "BinaryTreeMetadata" with the following components:

GlobalNames

A logical value indicating whether the tree file contains a list of names in the header.

Names

Only present if GlobalNames is TRUE. A vector of mode character containing the names specified in the file header.

GlobalAttributes

A logical value indicating whether the tree file contains a list of attributes in the header.

Attributes

Only present if GlobalAttributes is TRUE. A list of attributes. Each attribute is itself a list of two elements: AttributeName is a character object describing the attribute's name (e.g. "Length"), and IsNumeric describes whether the attribute represents a numeric value (e.g. a branch's length) or not.

TreeAddresses

A vector of mode integer containing the addresses (i.e. byte offsets from the start of the file) of the trees. If invalid_trailer is "ignore" and the file has an invalid trailer, this element will be missing.

Details

This function reads the metadata from the header and trailer of a file containing trees in binary format. This information consists of the addresses of the trees (i.e. the byte offsets at which the data stream describing each tree starts) and of any names or attributes that are stored in the file header.
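Since a tree address is a byte offset from the start of the file, it can be used to position a binary connection directly at a tree's data stream. The following is a minimal sketch using only base R; read_bytes_at is a hypothetical helper, not part of the package, and it does not parse the tree, it only returns raw bytes.

```r
# Hypothetical helper: read `n` raw bytes starting at byte offset `address`,
# where offset 0 is the first byte of the file.
read_bytes_at <- function(path, address, n) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  # Move the read position to the requested offset from the start of the file
  seek(con, where = address, origin = "start")
  readBin(con, what = "raw", n = n)
}
```

With a metadata object, a call such as read_bytes_at(treeFile, meta$TreeAddresses[1], 16) would return the first 16 bytes of the first tree's data stream (16 is an arbitrary length chosen for illustration).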

If there are such names or attributes in the header, it usually means that every tree in the file should have the same names and attributes. However, this is not required by the file format; some (or all) of the trees in the file may have additional/missing taxa, or additional/missing attributes.

If the file's trailer is invalid (e.g. because the file is incomplete), the default behaviour is to read the whole file, attempting to parse as many trees as possible. The trees themselves are discarded, while their addresses are stored. This is desirable when memory is the reason for not reading all the trees in the file at once (i.e. for not using read_binary_trees). If this is not the case, changing the value of invalid_trailer provides alternative ways to deal with this situation: either generating an error, or returning a valid object that is missing the TreeAddresses element.
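One possible way to combine the alternative modes is to try "fail" first and fall back to "ignore" if an error is raised. This is only a sketch: read_metadata_with_fallback is a hypothetical wrapper, and its reader argument exists solely so that the wrapper can be tried with a stand-in function.

```r
# Hypothetical wrapper: strict read first, header-only metadata as a fallback.
# `reader` defaults to read_binary_tree_metadata; it is only looked up when
# no other reader is supplied.
read_metadata_with_fallback <- function(path, reader = read_binary_tree_metadata) {
  tryCatch(
    reader(path, invalid_trailer = "fail"),
    error = function(e) {
      # Any error from the strict read lands here (an invalid trailer is the
      # expected one). Retry without scanning the file; the result will then
      # lack the TreeAddresses element if the trailer is indeed invalid.
      reader(path, invalid_trailer = "ignore")
    }
  )
}
```

The caller can then test is.null(meta$TreeAddresses) to find out whether the addresses are available. Note that an error unrelated to the trailer (e.g. a missing file) will simply be raised again by the second call.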

Due to limitations of R's integer type, this function may have issues with files larger than 2 GB.
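A file can be checked against this limit before its metadata are read. The sketch below assumes that the limitation comes from byte offsets exceeding the largest value of R's 32-bit integer type (.Machine$integer.max, i.e. 2147483647); exceeds_integer_offsets is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: TRUE if the file is larger than the largest R integer,
# so that some byte offsets in it could not be stored as integers.
exceeds_integer_offsets <- function(path) {
  # file.size() returns a double, so large sizes are not themselves truncated;
  # it returns NA if the size cannot be determined (e.g. the file is missing)
  size <- file.size(path)
  if (is.na(size)) stop("Cannot determine the size of file: ", path)
  size > .Machine$integer.max
}
```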

References

https://github.com/arklumpus/TreeNode/blob/master/BinaryTree.md

See also

Examples

# Tree file (replace with your own)
treeFile <- system.file("extdata", "manyTrees.tbi", package="TreeNode")

# Read the binary tree metadata
meta <- read_binary_tree_metadata(treeFile)

# Print a list of the names defined in the file's header
meta$Names
#>  [1] "Taxon1"  "Taxon2"  "Taxon3"  "Taxon5"  "Taxon6"  "Taxon4"  "Taxon8"
#>  [8] "Taxon7"  "Taxon11" "Taxon15" "Taxon18" "Taxon20" "Taxon19" "Taxon16"
#> [15] "Taxon17" "Taxon12" "Taxon13" "Taxon14" "Taxon9"  "Taxon10"

# Print a list of the attributes defined in the file's header
meta$Attributes
#> $AttributeName
#> [1] "Length"   "Name"     "Support"  "TreeName"
#>
#> $IsNumeric
#> [1]  TRUE FALSE  TRUE FALSE
#>
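As a follow-up to the example output, the names of the numeric attributes can be picked out with logical indexing. This sketch assumes the layout shown in the printed output (two parallel vectors, AttributeName and IsNumeric) and uses a mock list copied from that output in place of meta$Attributes, so that it runs without the package.

```r
# Mock of meta$Attributes, copied from the printed output of the example
attrs <- list(
  AttributeName = c("Length", "Name", "Support", "TreeName"),
  IsNumeric = c(TRUE, FALSE, TRUE, FALSE)
)

# Keep only the names of the attributes flagged as numeric
numeric_attrs <- attrs$AttributeName[attrs$IsNumeric]
numeric_attrs
#> [1] "Length"  "Support"
```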