You're reading an old version of this documentation. If you want up-to-date information, please have a look at stable (v1.6.0).
file-tree package
file_tree.file_tree module
Defines the main FileTree object, which is the main point of interaction.
- class file_tree.file_tree.FileTree(templates: Dict[str, Template], placeholders: Dict[str, Any] | Placeholders, return_path=False)
Bases: object
Represents a structured directory.
The methods can be split into four categories:
1. The template interface. Each path (file or directory) is represented by a Template, which defines the filename with any unknown parts (e.g., subject ID) marked by placeholders. Templates are accessed based on their key.
   - get_template(): used to access a template based on its key.
   - template_keys(): used to list all the template keys.
   - add_template(): used to add a new template or overwrite an existing one.
   - add_subtree(): can be used to add all the templates from a different tree to this one.
   - filter_templates(): reduces the filetree to a user-provided list of templates and their parents.
2. The placeholder interface. Placeholders represent the values to be filled into the templates. Each placeholder can be either undefined, have a singular value, or have a sequence of possible values. You can access the placeholders dictionary-like object directly through FileTree.placeholders.
   - update(): returns a new FileTree with updated placeholders or updates the placeholders in the current one.
   - update_glob(): sets the placeholder values based on which files/directories exist on disk.
   - iter_vars(): iterates over all possible values for the selected placeholders.
   - iter(): iterates over all possible values for the placeholders that are part of a given template.
3. Getting the actual filenames by filling the placeholder values into the templates.
   - get(): returns a valid path by filling in all the placeholders in a template. For this to work, all placeholder values should be defined and singular.
   - get_mult(): returns an array of all possible valid paths by filling in the placeholders in a template. Placeholder values can be singular or a sequence of possible values.
   - get_mult_glob(): returns an array with the existing paths on disk. Placeholder values can be singular, a sequence of possible values, or undefined. In the latter case, the possible values for that placeholder are determined by checking the disk.
   - fill(): returns a new FileTree with any singular values filled into the templates and removed from the placeholder dict.
4. Input/output.
   - report(): creates a pretty overview of the filetree.
   - run_app(): opens a terminal-based App to explore the filetree interactively.
   - empty(): creates an empty FileTree with no templates or placeholder values.
   - read(): reads a new FileTree from a file.
   - from_string(): reads a new FileTree from a string.
   - write(): writes a FileTree to a file.
   - to_string(): writes a FileTree to a string.
- add_subtree(sub_tree: FileTree, precursor: str | None | Sequence[str | None] = (None,), parent: str | None = '', fill=None) → None
Updates the templates and the placeholders in place with those in sub_tree.
The top-level directory of the sub-tree will be replaced by the parent (unless set to None). The sub-tree templates will be available with the key “<precursor>/<original_key>”, unless the precursor is None, in which case they will be unchanged (which can easily lead to errors due to naming conflicts).
What happens with the placeholder values of the sub-tree depends on whether the precursor is None or not:
if the precursor is None, any singular values are directly filled into the sub-tree templates. Any placeholders with multiple values will be added to the top-level variable list (an error is raised in case of conflicts).
if the precursor is a string, the templates are updated to look for “<precursor>/<original_placeholder>” and all sub-tree placeholder values are also prepended with this precursor. Any template values with “<precursor>/<key>” will first look for that full key, but if that is undefined they will fall back to “<key>” (see Placeholders).
The net effect of either of these procedures is that the sub-tree placeholder values will be used in that sub-tree, but will not affect templates defined elsewhere in the parent tree. If a placeholder is undefined in a sub-tree, it will be taken from the parent placeholder values (if available).
- Parameters:
sub_tree (FileTree) – tree to be added to the current one
precursor (list(str or None)) – name(s) of the sub-tree. Defaults to just adding the sub-tree to the main tree without precursor.
parent (str) – key of the template used as top-level directory for the sub-tree. Defaults to the top-level directory of the main tree. Can be set to None for an independent tree.
fill (Optional[bool]) – whether any defined placeholders should be filled in before adding the sub-tree. By default this is True if there is no precursor and False otherwise.
- add_template(template_path: str, key: str | Sequence[str] | None = None, parent: str | None = '', overwrite=False) → str
Updates the FileTree with the new template.
- Parameters:
template_path – path name with respect to the parent (or top-level if no parent provided)
key – key(s) to access this template in the future. Defaults to the result of Template.guess_key (i.e., the path basename without the extension).
parent – if defined, template_path will be interpreted as relative to this template. By default the top-level template is used as reference. To create a template unaffiliated with the rest of the tree, set parent to None. Such a template should be an absolute path or relative to the current directory and can be used as parent for other templates.
overwrite – if True, overwrites any existing template rather than raising a ValueError. Defaults to False.
- Returns:
one of the short names under which the template has been stored
- Return type:
str
- copy() → FileTree
Creates a copy of the tree
The dictionaries (templates, placeholders) are copied, but the values within them are not.
- Returns:
new tree object with identical templates, sub-trees and placeholders
- Return type:
FileTree
- classmethod empty(top_level: str | Template = '.', return_path=False) → FileTree
Creates a new empty FileTree containing only a top-level directory
- fill(keep_optionals=True) → FileTree
Fills in singular placeholder values.
- Parameters:
keep_optionals – if True keep optional parameters that have not been set
- Returns:
new tree with singular placeholder values filled into the templates and removed from the placeholder dict
- Return type:
FileTree
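To illustrate what fill() does to a single template, here is a minimal sketch in plain Python (it does not use file-tree). It assumes placeholders are simple {name} fields without format specifications or optional [...] blocks, and treats lists and tuples as multiple values; fill_singular is a hypothetical helper name.

```python
import string
from typing import Any, Dict, Tuple


def fill_singular(template: str, placeholders: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: fill singular placeholder values into ``template``.

    Returns the partially filled template and the placeholders that were not
    filled in (multi-valued or undefined ones). Format specs are ignored.
    """
    singular = {
        k: v for k, v in placeholders.items()
        if v is not None and not isinstance(v, (list, tuple))
    }
    remaining = {k: v for k, v in placeholders.items() if k not in singular}

    pieces = []
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        pieces.append(literal)
        if field is None:
            continue  # trailing literal text without a placeholder
        # Fill singular values; keep everything else as a "{name}" placeholder.
        pieces.append(str(singular[field]) if field in singular else "{" + field + "}")
    return "".join(pieces), remaining


filled, rest = fill_singular("sub-{subject}/ses-{session}/T1w.nii.gz",
                             {"subject": "A", "session": ["1", "2"]})
print(filled)  # sub-A/ses-{session}/T1w.nii.gz
print(rest)    # {'session': ['1', '2']}
```

As with fill(), the singular value disappears from the returned placeholders while the multi-valued one is kept for later.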
- filter_templates(template_names: Collection[str], check=True)
New FileTree containing just the templates in template_names and their parents.
A KeyError will be raised if any of the template names are not in the FileTree (unless check is set to False).
- Parameters:
template_names – names of the templates to keep.
check – if True, check whether all template names are actually part of the FileTree
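Keeping "the templates and their parents" amounts to walking up the parent chain from each requested key. The sketch below assumes the tree is given as a plain mapping from each template key to its parent key (None at the top); that representation and the keep_with_parents name are illustrative, not file-tree internals.

```python
from typing import Collection, Dict, Optional, Set


def keep_with_parents(parents: Dict[str, Optional[str]],
                      names: Collection[str], check: bool = True) -> Set[str]:
    """Hypothetical helper: return ``names`` together with all their ancestors."""
    if check:
        missing = [n for n in names if n not in parents]
        if missing:
            raise KeyError(f"Templates not in tree: {missing}")
    keep: Set[str] = set()
    for name in names:
        # Walk upwards until the top, an unknown key, or an already-kept key.
        current: Optional[str] = name
        while current is not None and current in parents and current not in keep:
            keep.add(current)
            current = parents[current]
    return keep


tree = {"top": None, "subject": "top", "data": "subject", "T1w": "data", "FLAIR": "data"}
print(sorted(keep_with_parents(tree, ["T1w"])))  # ['T1w', 'data', 'subject', 'top']
```

Stopping at an already-kept key is safe because its ancestors were added when it was first reached.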
- classmethod from_string(definition: str, top_level: str | Template = '.', return_path=False, **placeholders) → FileTree
Creates a FileTree based on the given definition
- Parameters:
definition (str) – A FileTree definition describing a structured directory
top_level (str) – top-level directory name. Defaults to current directory. Set to parent template for sub-trees.
- Returns:
tree matching the definition
- Return type:
FileTree
- get(key: str, make_dir=False) → str | Path
Returns template with placeholder values filled in
- Parameters:
key (str) – identifier for the template
make_dir (bool, optional) – If set to True, create the parent directory of the returned path.
- Returns:
- Filled-in template.
Returned as a pathlib.Path object if FileTree.return_path is True. Otherwise a string is returned.
- Return type:
str or Path
- get_mult(key: str | Sequence[str], filter=False, make_dir=False) → DataArray | Dataset
Returns array of paths with all possible values filled in for the placeholders
Singular placeholder values are filled into the template directly. For each placeholder with multiple values a dimension is added to the output array. This dimension will have the name of the placeholder and labels corresponding to the possible values (see http://xarray.pydata.org/en/stable/). The presence of required, undefined placeholders will lead to an error (see get_mult_glob() or update_glob() to set these placeholders based on which files exist on disk).
- Parameters:
key (str, Sequence[str]) – identifier(s) for the template.
filter (bool, optional) – If set to True, will filter out any non-existent files. If the return type is strings, non-existent entries will be empty strings. If the return type is Path objects, non-existent entries will be None. Note that the default behaviour is opposite from get_mult_glob().
make_dir (bool, optional) – If set to True, create the parent directory for each returned path.
- Returns:
- For a single key returns all possible paths in an xarray DataArray.
For multiple keys it returns the combination of them in an xarray Dataset. Each element in the xarray is a pathlib.Path object if FileTree.return_path is True. Otherwise the xarray will contain the paths as strings.
- Return type:
xarray.DataArray, xarray.Dataset
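The way get_mult() adds one dimension per multi-valued placeholder can be mimicked with itertools.product. This sketch uses a plain dict keyed by value tuples instead of an xarray DataArray and assumes lists or tuples mark multiple values; get_mult_sketch is a hypothetical name, not part of file-tree.

```python
import itertools
from typing import Any, Dict, List, Tuple


def get_mult_sketch(template: str, placeholders: Dict[str, Any]
                    ) -> Tuple[List[str], Dict[Tuple[Any, ...], str]]:
    """Hypothetical stand-in for get_mult using plain dicts instead of xarray.

    Returns the dimension names (one per multi-valued placeholder) and a
    mapping from each combination of values to the filled-in path.
    """
    single = {k: v for k, v in placeholders.items() if not isinstance(v, (list, tuple))}
    multi = {k: list(v) for k, v in placeholders.items() if isinstance(v, (list, tuple))}
    dims = list(multi)
    paths = {}
    for combo in itertools.product(*(multi[d] for d in dims)):
        values = {**single, **dict(zip(dims, combo))}
        # str.format raises KeyError if a required placeholder is undefined,
        # loosely mirroring the error get_mult raises in that case.
        paths[combo] = template.format(**values)
    return dims, paths


dims, paths = get_mult_sketch("sub-{subject}/ses-{session}/T1w.nii.gz",
                              {"subject": ["A", "B"], "session": "1"})
print(dims)            # ['subject']
print(paths[("B",)])   # sub-B/ses-1/T1w.nii.gz
```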
- get_mult_glob(key: str | Sequence[str]) → DataArray | Dataset
Returns array of paths with all possible values filled in for the placeholders
Singular placeholder values are filled into the template directly. For each placeholder with multiple values a dimension is added to the output array. This dimension will have the name of the placeholder and labels corresponding to the possible values (see http://xarray.pydata.org/en/stable/). The possible values for undefined placeholders will be determined by which files actually exist on disk.
The same result can be obtained by calling self.update_glob(key).get_mult(key, filter=True). However calling this method is more efficient, because it only has to check the disk for which files exist once.
- Parameters:
key (str, Sequence[str]) – identifier(s) for the template.
- Returns:
- For a single key returns all possible paths in an xarray DataArray.
For multiple keys it returns the combination of them in an xarray Dataset. Each element in the xarray is a pathlib.Path object if FileTree.return_path is True. Otherwise the xarray will contain the paths as strings.
- Return type:
xarray.DataArray, xarray.Dataset
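Determining undefined placeholder values from disk boils down to globbing for candidate files and reading the values back out of the matched paths. A minimal sketch with pathlib and re, assuming simple {name} placeholders that each appear once and never span a "/", and literal text without glob special characters; glob_placeholders is hypothetical and not how file-tree implements this internally.

```python
import re
import string
from pathlib import Path
from typing import Dict, List


def glob_placeholders(base: Path, template: str) -> List[Dict[str, str]]:
    """Hypothetical helper: list the placeholder values of files that exist.

    Each ``{name}`` in ``template`` becomes ``*`` in the glob pattern and a
    named group in the regex used to read the value back from each match.
    """
    glob_parts: List[str] = []
    regex_parts: List[str] = []
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        glob_parts.append(literal)
        regex_parts.append(re.escape(literal))
        if field is not None:
            glob_parts.append("*")
            regex_parts.append(f"(?P<{field}>[^/]+)")
    pattern = re.compile("".join(regex_parts))
    matches = []
    for path in sorted(base.glob("".join(glob_parts))):
        m = pattern.fullmatch(path.relative_to(base).as_posix())
        if m:
            matches.append(m.groupdict())
    return matches
```

For the template "sub-{subject}/data/T1w.nii.gz" this globs "sub-*/data/T1w.nii.gz" once and returns one dictionary per existing file, which is the single disk pass that makes get_mult_glob() cheaper than update_glob() followed by get_mult().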
- get_template(key: str) → Template
Returns the template corresponding to key.
- Raises:
KeyError – if no template with that identifier is available
- Parameters:
key (str) – key identifying the template.
- Returns:
description of pathname with placeholders not filled in
- Return type:
Template
- iter(template: str, check_exists: bool = False) → Generator[FileTree, None, None]
Iterate over trees containing all possible values for template
- Parameters:
template (str) – short name identifier of the template
check_exists (bool) – set to True to only return trees for which the template actually exists
- Returns:
yields trees, where each placeholder in given template only has a single possible value
- Return type:
Generator[FileTree]
- iter_vars(placeholders: Sequence[str]) → Generator[FileTree, None, None]
Iterate over all possible values of the given placeholder names
- Parameters:
placeholders (Sequence[str]) – sequence of placeholder names to iterate over
- Returns:
yields trees, where each placeholder only has a single possible value
- Return type:
Generator[FileTree]
- classmethod read(name: str, top_level: str | Template = '.', return_path=False, **placeholders) → FileTree
Reads a filetree based on the given name
- Parameters:
name (str) –
name of the filetree. Interpreted as:
a filename containing the tree definition if “name” or “name.tree” exist on disk
one of the trees in tree_directories if one of those contains “name” or “name.tree”
one of the trees in the plugin FileTree modules
top_level (str) – top-level directory name. Defaults to current directory. Set to parent template for sub-trees.
placeholders (str->Any) – maps placeholder names to their values
- Returns:
tree matching the definition in the file
- Return type:
FileTree
- report(fill=True, pager=False)
Prints a formatted report of the filetree to the console.
Prints a report of the file-tree to the terminal with:
- a table with placeholders and their values
- a tree of templates with template keys marked in cyan
- Parameters:
fill (bool, optional) – by default any fixed placeholders are filled in before printing the tree (using fill()). Set to False to disable this.
pager (bool, optional) – if set to True, the report will be fed into a pager (recommended if the output is very large)
- run_app()
Open a terminal-based App to explore the filetree interactively
The resulting app runs directly in the terminal, so it should work when ssh’ing to some remote cluster.
There will be two panels:
The left panel will show all the templates in a tree format. Template keys are shown in cyan. For each template, the number of files that exist on disk out of the total number is shown, colour-coded based on completeness (red: no files; yellow: some files; blue: all files). Templates can be selected by hovering over them. Clicking on directories will hide/show their content.
The right panel will show for the selected template the complete template string and a table showing for which combination of placeholders the file is present/absent (rows for absent files are colour-coded red).
- template_keys(only_leaves=False)
Returns the keys of all the templates in the FileTree
For templates with multiple keys, each key is returned.
- Parameters:
only_leaves (bool, optional) – set to True to only return templates that do not have any children
- to_string(indentation=4) → str
Converts FileTree into a valid filetree definition
An identical FileTree can be created by running from_string() on the resulting string.
- Parameters:
indentation (int, optional) – Number of spaces to use for indentation. Defaults to 4.
- property top_level
- update(inplace=False, **placeholders) → FileTree
Updates the placeholder values to be filled into the templates
- Parameters:
inplace (bool) – if True change the placeholders in-place (and return the FileTree itself); by default a new FileTree is returned with the updated values without altering this one.
**placeholders (Dict[str, Any]) – maps placeholder names to their new values (None to mark placeholder as undefined)
- Returns:
Tree with updated placeholders (same tree as the current one if inplace is True)
- Return type:
FileTree
- update_glob(template_key: str | Sequence[str], inplace=False) → FileTree
Updates any undefined placeholders based on which files exist on disk for template
- Parameters:
template_key (str or sequence of str) – key(s) of the template(s) to use
inplace (bool) – if True change the placeholders in-place (and return the FileTree itself); by default a new FileTree is returned with the updated values without altering this one.
- Returns:
Tree with updated placeholders (same tree as the current one if inplace is True)
- Return type:
FileTree
- write(filename, indentation=4)
Writes the FileTree to disk as a text file
The first few lines will contain the placeholders. The remaining lines will contain the actual FileTree with all the templates (including sub-trees). The top-level directory is not stored in the file and hence will need to be provided when reading the tree from the file.
- Parameters:
filename (str or Path) – where to store the file (directory should exist already)
indentation (int, optional) – Number of spaces to use for indentation. Defaults to 4.
- file_tree.file_tree.convert(src_tree: FileTree, target_tree: FileTree | None = None, keys=None, symlink=False, overwrite=False)
Copies or links files defined in keys from the src_tree to the target_tree.
Given two example trees:
source:
    subject = A,B
    sub-{subject}
        data
            T1w.nii.gz
            FLAIR.nii.gz
target:
    subject = A,B
    data
        sub-{subject}
            {subject}-T1w.nii.gz (T1w)
            {subject}-T2w.nii.gz (T2w)
Given pre-existing data matching the source tree:
    .
    ├── sub-A
    │   └── data
    │       ├── FLAIR.nii.gz
    │       └── T1w.nii.gz
    └── sub-B
        └── data
            ├── FLAIR.nii.gz
            └── T1w.nii.gz
We can do the following conversions:
- convert(source, target):
copies all matching keys from source to target. This will only copy the “T1w.nii.gz” files, because they are the only match in the template keys. Note that the data template key also matches between the two trees, but this template is not a leaf, so it is ignored.
- convert(source, target, keys=['T1w', ('FLAIR', 'T2w')]):
copies the “T1w.nii.gz” files from source to target and copies the “FLAIR.nii.gz” in source to “T2w.nii.gz” in target.
- convert(source.update(subject='B'), source.update(subject='C')):
creates a new “sub-C/data” directory and copies all the data from “sub-B/data” into that directory.
- convert(source, keys=[('FLAIR', 'T1w')], overwrite=True):
copies the “FLAIR.nii.gz” into the “T1w.nii.gz” files, overwriting the originals.
Warnings are raised in two cases:
- if a source file is missing
- if a target file already exists and overwrite is False
- Parameters:
src_tree – prepopulated filetree with the source files
target_tree – filetree that will be populated. Defaults to same as src_tree.
keys (collection of str or (str, str), optional) – collection of template keys to transfer from src_tree to target_tree. Defaults to all templates keys shared between src_tree and target_tree.
symlink – if set to True, links the files rather than copying them
overwrite – if set to True overwrite any existing files
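The copy/link step and the two warning cases can be sketched with shutil and pathlib. This assumes the source and target paths have already been resolved into (source, target) pairs, one per placeholder combination; normalise_keys and transfer are hypothetical helpers, not part of file-tree.

```python
import shutil
import warnings
from pathlib import Path
from typing import Iterable, List, Tuple, Union

KeySpec = Union[str, Tuple[str, str]]


def normalise_keys(keys: Iterable[KeySpec]) -> List[Tuple[str, str]]:
    """Turn 'T1w' into ('T1w', 'T1w') and keep ('FLAIR', 'T2w') pairs as-is."""
    return [(k, k) if isinstance(k, str) else (k[0], k[1]) for k in keys]


def transfer(pairs: Iterable[Tuple[Path, Path]], symlink: bool = False,
             overwrite: bool = False) -> int:
    """Hypothetical helper: copy (or symlink) each source file to its target.

    Warns and skips when a source is missing or when a target exists and
    ``overwrite`` is False. Returns the number of files transferred.
    """
    done = 0
    for src, dst in pairs:
        if not src.exists():
            warnings.warn(f"source file {src} is missing; skipping")
            continue
        if dst.exists() or dst.is_symlink():
            if not overwrite:
                warnings.warn(f"target file {dst} already exists; skipping")
                continue
            dst.unlink()  # remove the old file or link before replacing it
        dst.parent.mkdir(parents=True, exist_ok=True)
        if symlink:
            dst.symlink_to(src.resolve())
        else:
            shutil.copyfile(src, dst)
        done += 1
    return done
```

Missing target directories are created on the fly, which matches the sub-C example above where a new directory appears.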
file_tree.parse_tree module
file_tree.template module
- class file_tree.template.MyDataArray(data, coords=None)
Bases: object
Wrapper around xarray.DataArray for internal usage
It tries to delay creating the DataArray object as long as possible (as using them for small arrays is slow…)
- static concat(parts, new_index) → MyDataArray
- map(func) → MyDataArray
- class file_tree.template.OptionalPart(sub_template: TemplateParts)
Bases: Part
- add_precursor(text: str) → OptionalPart
Prepends text to any placeholder names.
- append_placeholders(placeholders, valid=None)
Appends the placeholders in this part to the provided list in order
- contains_optionals(placeholders=None)
Returns True if this part contains the optional placeholders
- fill_single_placeholders(placeholders: Placeholders, ignore_type=False)
Fills in the given placeholders
- class file_tree.template.Part
Bases: object
Individual part of a template
3 subclasses are defined:
- Literal: piece of text
- Required: required placeholder to fill in (between curly brackets)
- OptionalPart: part of text containing optional placeholders (between square brackets)
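How a template string breaks down into these three kinds of parts can be sketched with the two regular expressions listed for TemplateParts (optional_re and requires_re). The split_parts helper is hypothetical; unlike the library, it keeps the text of an optional block as a plain string instead of parsing it into a nested TemplateParts, and it does not handle nested brackets.

```python
import re
from typing import List, Tuple

# Same patterns as TemplateParts.optional_re and TemplateParts.requires_re.
OPTIONAL_RE = re.compile(r"(\[.*?\])")
REQUIRES_RE = re.compile(r"(\{.*?\})")


def split_parts(text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: label each piece of a template as a kind of Part.

    Returns (kind, content) pairs with kind 'Literal', 'Required' or
    'OptionalPart'; brackets are stripped from the content.
    """
    parts: List[Tuple[str, str]] = []
    for chunk in OPTIONAL_RE.split(text):  # capturing group keeps [...] blocks
        if not chunk:
            continue
        if OPTIONAL_RE.fullmatch(chunk):
            parts.append(("OptionalPart", chunk[1:-1]))
            continue
        for piece in REQUIRES_RE.split(chunk):  # capturing group keeps {...}
            if not piece:
                continue
            if REQUIRES_RE.fullmatch(piece):
                parts.append(("Required", piece[1:-1]))
            else:
                parts.append(("Literal", piece))
    return parts


print(split_parts("sub-{subject}[_ses-{session}]_T1w.nii.gz"))
```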
- append_placeholders(placeholders: List[str], valid=None)
Appends the placeholders in this part to the provided list in order
- contains_optionals(placeholders: Set[Part] | None = None)
Returns True if this part contains the optional placeholders
- fill_single_placeholders(placeholders: Placeholders, ignore_type=False) → Sequence[Part]
Fills in the given placeholders
- class file_tree.template.Placeholders(*args, **kwargs)
Bases: MutableMapping
Dictionary-like object containing the placeholder values.
It understands about sub-trees (i.e., if “<sub_tree>/<placeholder>” does not exist it will return “<placeholder>” instead).
- find_key(key: str) → str | None
Finds the actual key containing the value
Will look for:
not None value for the key itself
not None value for any parent (i.e., for key “A/B”, will look for “B” as well)
otherwise will return None
- Parameters:
key (str) – placeholder name
- Returns:
None if no value for the key is available, otherwise the key used to index the value
- Return type:
Optional[str]
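The lookup order above can be written as a short loop over progressively shorter keys. This is a hypothetical re-implementation on a plain dict; for keys with several precursors (e.g., “a/b/c”) it assumes each shorter suffix is tried in turn, which generalises the single-level example given.

```python
from typing import Any, Dict, Optional


def find_key(values: Dict[str, Any], key: str) -> Optional[str]:
    """Hypothetical re-implementation of the lookup order described above.

    Tries ``key`` itself and then each shorter suffix obtained by dropping
    leading precursors ("sub/subject" -> "subject"), returning the first one
    with a value that is not None.
    """
    parts = key.split("/")
    for start in range(len(parts)):
        candidate = "/".join(parts[start:])
        if values.get(candidate) is not None:
            return candidate
    return None


values = {"sub/subject": None, "subject": "A", "sub/session": "2", "session": "1"}
print(find_key(values, "sub/subject"))  # subject
print(find_key(values, "sub/session"))  # sub/session
print(find_key(values, "run"))          # None
```

This is the fallback that lets a sub-tree added with a precursor use its own values where defined and inherit the parent tree's values otherwise.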
- iter_over(keys) → Generator[Placeholders, None, None]
Iterate over all possible values of the given placeholder names
- Parameters:
keys (Sequence[str]) – sequence of placeholder names to iterate over
- Returns:
yields Placeholders object, where each of the listed keys only has a single possible value
- Return type:
Generator[Placeholders]
- link(*keys)
Link the placeholders represented by keys.
When iterating over linked placeholders the i-th tree will contain the i-th element from all linked placeholders, instead of the tree containing all possible combinations of placeholder values.
This can be thought of as using zip for linked placeholders and itertools.product for unlinked ones.
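The zip/product analogy can be made concrete. In this sketch (not the library's code; iter_combinations is hypothetical) one group of linked placeholders advances together with zip, while the remaining placeholders are combined with every linked row through itertools.product. Note that zip silently stops at the shortest sequence; the library may instead require linked placeholders to have equal lengths.

```python
import itertools
from typing import Any, Dict, Iterator, Sequence


def iter_combinations(values: Dict[str, Sequence[Any]],
                      linked: Sequence[str] = ()) -> Iterator[Dict[str, Any]]:
    """Hypothetical sketch: iterate with one group of linked placeholders.

    Linked placeholders advance together (zip); the other placeholders are
    combined with every linked row as a Cartesian product.
    """
    unlinked = [k for k in values if k not in linked]
    linked_rows = list(zip(*(values[k] for k in linked))) if linked else [()]
    for row in linked_rows:
        for combo in itertools.product(*(values[k] for k in unlinked)):
            yield {**dict(zip(linked, row)), **dict(zip(unlinked, combo))}


values = {"subject": ["A", "B"], "session": ["1", "2"], "run": ["x"]}
print(len(list(iter_combinations(values))))  # 4 (2 x 2 x 1 combinations)
for combo in iter_combinations(values, linked=["subject", "session"]):
    print(combo)
```

With subject and session linked, only two combinations remain: A with session 1 and B with session 2.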
- split() → Tuple[Placeholders, Placeholders]
Splits all placeholders into those with a single value and those with multiple values
Placeholders are considered to have multiple values if they are equivalent to 1D-arrays (lists, tuples, 1D ndarray, etc.). Anything else is considered a single value (string, int, float, etc.)
- Parameters:
placeholders (Dict) – all mappings from placeholder names to values
- Returns:
Returns tuples with two dictionaries (first those with single values, then those with the multiple values)
- Return type:
Tuple[Dict, Dict]
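The single-versus-multiple rule can be sketched as below. The exact checks are assumptions: non-string collections.abc.Sequence instances (list, tuple, range) and objects with ndim == 1 (such as 1D NumPy arrays) count as multiple values, and everything else counts as a single value. split_placeholders is a hypothetical name.

```python
from collections.abc import Sequence
from typing import Any, Dict, Tuple


def is_multiple(value: Any) -> bool:
    """Assumed rule: non-string sequences and 1D array-likes are multi-valued."""
    if isinstance(value, (str, bytes)):
        return False
    if isinstance(value, Sequence):
        return True  # list, tuple, range, ...
    return getattr(value, "ndim", None) == 1  # e.g. a 1D NumPy array


def split_placeholders(placeholders: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (single-valued, multi-valued) dictionaries."""
    single = {k: v for k, v in placeholders.items() if not is_multiple(v)}
    multi = {k: v for k, v in placeholders.items() if is_multiple(v)}
    return single, multi


single, multi = split_placeholders({"subject": ["A", "B"], "session": "1", "run": 3, "echo": (1, 2)})
print(single)  # {'session': '1', 'run': 3}
print(multi)   # {'subject': ['A', 'B'], 'echo': (1, 2)}
```

Strings are checked first because they are themselves sequences and would otherwise be treated as multiple values.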
- class file_tree.template.Required(var_name, var_formatting=None)
Bases: Part
- append_placeholders(placeholders, valid=None)
Appends the placeholders in this part to the provided list in order
- fill_single_placeholders(placeholders: Placeholders, ignore_type=False)
Fills in the given placeholders
- class file_tree.template.Template(parent: Template | None, unique_part: str)
Bases: object
- add_precursor(text) → Template
Returns a new Template with any placeholder names in the unique part now preceded by text
Used for adding sub-trees
- all_matches(placeholders: Placeholders)
Returns a sequence of all possible variable values matching existing files on disk
Only variable values matching existing placeholder values are returned (undefined placeholders are unconstrained).
- as_multi_line(other_templates: Dict[str, Template], indentation=4) → str
Generates a string describing this and any child templates
- Parameters:
other_templates (Dict[str, Template]) – templates including all the child templates and itself
indentation (int, optional) – number of spaces to use as indentation. Defaults to 4
- Returns:
multi-line string that can be processed by file_tree.FileTree.read()
- Return type:
str
- property as_path: Path
The full path with no placeholders filled in
- property as_string
- children(templates: Iterable[Template]) → List[Template]
From a sequence of templates find the children
- Returns:
list of children templates
- Return type:
List[Template]
- format_mult(placeholders: Placeholders, check=False, filter=False, matches=None) → DataArray
Replaces placeholders in template with the provided placeholder values
- Parameters:
placeholders (Placeholders) – mapping from placeholder names to single or multiple values
check (bool) – skip check for missing placeholders if set to True
filter (bool) – filter out non-existing files if set to True
- Raises:
KeyError – if any placeholder is missing
- Returns:
- array with possible resolved paths.
If filter is set to True the non-existent paths are replaced by None
- Return type:
xarray.DataArray
- format_single(placeholders: Placeholders, check=True, keep_optionals=False) → str
Formats the template with the placeholders filled in
Only placeholders with a single value are considered.
- Parameters:
placeholders (Placeholders) – values to fill into the placeholder
check (bool) – skip check for missing placeholders if set to True
keep_optionals – if True keep optional parameters that have not been set (will cause the check to fail)
- Raises:
KeyError – if any placeholder is missing
- Returns:
filled in template
- Return type:
str
- get_all_placeholders(placeholders: Placeholders, matches=None) → Placeholders
Fill placeholders with possible values based on what is available on disk
- Parameters:
placeholders (Placeholders) – New values for undefined placeholders in template
- guess_key() → str
Proposes a short name for the template
The proposed short name is created by:
taking the basename (i.e., last component) of the path
removing the first ‘.’ and everything beyond (to remove the extension)
Warning
If there are multiple dots within the path’s basename, this might remove far more than just the extension.
- Returns:
proposed short name for this template (used if user does not provide one)
- Return type:
str
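The two steps translate directly into code. This stand-alone function is hypothetical (it works on a plain string rather than a Template) and also demonstrates the warning, since everything after the first dot is dropped.

```python
import posixpath


def guess_key(template: str) -> str:
    """Hypothetical stand-alone version of the two steps listed above."""
    basename = posixpath.basename(template)  # last path component
    return basename.split(".", 1)[0]         # drop the first '.' and everything after it


print(guess_key("sub-{subject}/data/T1w.nii.gz"))  # T1w
print(guess_key("sub-{subject}/scan.v2.nii.gz"))   # scan (more than just the extension is removed)
```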
- optional_placeholders() → Set[str]
Finds all placeholders that are only within optional blocks (i.e., they do not require a value)
- Returns:
names of optional placeholders
- Return type:
Set[str]
- placeholders(valid=None) → List[str]
Returns a list of the placeholder names
- Returns:
placeholder names in order that they appear in the template
- Return type:
List[str]
- class file_tree.template.TemplateParts(parts: Sequence[Part])
Bases: object
The parts of a larger template
- all_matches() → List[Dict[str, Any]]
Finds all potential matches to existing templates
Returns a list with the possible combination of values for the placeholders.
- extract_placeholders(filename, known_vars=None)
Extracts the placeholder values from the filename
- Parameters:
filename – filename
known_vars – already known placeholders
- Returns:
dictionary from placeholder names to string representations (unused placeholders set to None)
- fill_known(placeholders: Placeholders, ignore_type=False) → MyDataArray
Fill in the known placeholders
Any optional parts, where all placeholders have been filled will be automatically replaced
- fill_single_placeholders(placeholders: Placeholders, ignore_type=False) → TemplateParts
Fills in placeholders with singular values
Assumes that all placeholders are in fact singular
- optional_re = re.compile('(\\[.*?\\])')
- optional_subsets() → Iterator[TemplateParts]
Yields template sub-sets with every combination of optional placeholders
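For a template with n optional blocks there are 2**n such sub-sets. A string-based sketch using the same pattern as optional_re, where each block is either kept (without its brackets) or dropped; optional_subsets_str is a hypothetical stand-in for the TemplateParts method, which works on parsed parts rather than strings.

```python
import itertools
import re
from typing import Iterator

OPTIONAL_RE = re.compile(r"(\[.*?\])")  # same pattern as TemplateParts.optional_re


def optional_subsets_str(template: str) -> Iterator[str]:
    """Hypothetical string version: keep or drop every optional [...] block."""
    pieces = OPTIONAL_RE.split(template)  # capturing group keeps the blocks
    optional_idx = [i for i, p in enumerate(pieces) if OPTIONAL_RE.fullmatch(p)]
    for keep in itertools.product((True, False), repeat=len(optional_idx)):
        out = list(pieces)
        for idx, kept in zip(optional_idx, keep):
            out[idx] = out[idx][1:-1] if kept else ""
        yield "".join(out)


for variant in optional_subsets_str("sub-{subject}[_ses-{session}][_run-{run}].nii"):
    print(variant)
```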
- ordered_placeholders(valid=None) → List[str]
Sequence of all placeholders in order (can contain duplicates)
- static parse(text: str) → TemplateParts
Parses a template string into its constituent parts
- Raises:
ValueError – raised if a parsing error is encountered
- Returns:
object that contains the parts of the template
- Return type:
TemplateParts
- remove_optionals(optionals=None) → TemplateParts
Removes any optionals containing the provided placeholders (default: remove all)
- remove_precursors(placeholders=None)
Replaces keys to those existing in the placeholders
If no placeholders provided all precursors are removed
- requires_re = re.compile('(\\{.*?\\})')
- resolve(placeholders, ignore_type=False) → MyDataArray
Resolves the template given a set of placeholders
- Parameters:
placeholders – mapping of placeholder names to values
ignore_type – if True, ignore the type formatting when filling in placeholders
- Returns:
cleaned string
- file_tree.template.extract_placeholders(template, filename, known_vars=None)
Extracts the placeholder values from the filename
- Parameters:
template – template matching the given filename
filename – filename
known_vars – already known placeholders
- Returns:
dictionary from placeholder names to string representations (unused placeholders set to None)
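A regex-based sketch of the extraction for templates without optional parts. Assumptions: placeholder names are valid Python identifiers (so they can serve as named groups), already-known values are matched literally, a repeated placeholder must take the same value each time, and only the newly extracted values are returned (unlike the function above, unused placeholders are not reported as None). extract_placeholders_sketch is a hypothetical name.

```python
import re
import string
from typing import Dict, List, Optional, Set


def extract_placeholders_sketch(template: str, filename: str,
                                known_vars: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical regex-based sketch for templates without optional parts.

    Known values are matched literally, new placeholders become named groups,
    and a repeated placeholder must match the same text again. Raises
    ValueError when ``filename`` does not match ``template``.
    """
    known_vars = known_vars or {}
    regex: List[str] = []
    seen: Set[str] = set()
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        regex.append(re.escape(literal))
        if field is None:
            continue
        if field in known_vars:
            regex.append(re.escape(str(known_vars[field])))
        elif field in seen:
            regex.append(f"(?P={field})")  # backreference to the first occurrence
        else:
            regex.append(f"(?P<{field}>.+?)")
            seen.add(field)
    match = re.fullmatch("".join(regex), filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match {template!r}")
    return match.groupdict()


print(extract_placeholders_sketch("sub-{subject}/ses-{session}/T1w.nii.gz",
                                  "sub-A/ses-01/T1w.nii.gz"))
```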