
file-tree package

file_tree.file_tree module

Defines the main FileTree object, which will be the main point of interaction

class file_tree.file_tree.FileTree(templates: Dict[str, Template], placeholders: Dict[str, Any] | Placeholders, return_path=False)[source]

Bases: object

Represents a structured directory

The many methods can be split into 4 categories

  1. The template interface. Each path (file or directory) is represented by a Template, which defines the filename with any unknown parts (e.g., subject ID) marked by placeholders. Templates are accessed based on their key.

  2. The placeholder interface. Placeholder values are the values to be filled into the templates. Each placeholder can be undefined, have a singular value, or have a sequence of possible values.

    • You can access the placeholders dictionary-like object directly through FileTree.placeholders

    • update(): returns a new FileTree with updated placeholders or updates the placeholders in the current one.

    • update_glob(): sets the placeholder values based on which files/directories exist on disk.

    • iter_vars(): iterate over all possible values for the selected placeholders.

    • iter(): iterate over all possible values for the placeholders that are part of a given template.

  3. Getting the actual filenames based on filling the placeholder values into the templates.

    • get(): Returns a valid path by filling in all the placeholders in a template. For this to work all placeholder values should be defined and singular.

    • get_mult(): Returns array of all possible valid paths by filling in the placeholders in a template. Placeholder values can be singular or a sequence of possible values.

    • get_mult_glob(): Returns array with existing paths on disk. Placeholder values can be singular, a sequence of possible values, or undefined. In the latter case possible values for that placeholder are determined by checking the disk.

    • fill(): Returns new FileTree with any singular values filled into the templates and removed from the placeholder dict.

  4. Input/output

    • report(): create a pretty overview of the filetree

    • run_app(): opens a terminal-based App to explore the filetree interactively

    • empty(): creates empty FileTree with no templates or placeholder values.

    • read(): reads a new FileTree from a file.

    • from_string(): reads a new FileTree from a string.

    • write(): writes a FileTree to a file.

    • to_string(): writes a FileTree to a string.

add_subtree(sub_tree: FileTree, precursor: str | None | Sequence[str | None] = (None,), parent: str | None = '', fill=None) None[source]

Updates the templates and the placeholders in place with those in sub_tree

The top-level directory of the sub-tree will be replaced by the parent (unless set to None). The sub-tree templates will be available with the key “<precursor>/<original_key>”, unless the precursor is None in which case they will be unchanged (which can easily lead to errors due to naming conflicts).

What happens with the placeholder values of the sub-tree depends on whether the precursor is None or not:

  • if the precursor is None, any singular values are directly filled into the sub-tree templates. Any placeholders with multiple values will be added to the top-level variable list (error is raised in case of conflicts).

  • if the precursor is a string, the templates are updated to look for “<precursor>/<original_placeholder>” and all sub-tree placeholder values are also prepended with this precursor. Any template values with “<precursor>/<key>” will first look for that full key, but if that is undefined they will fall back to “<key>” (see Placeholders).

The net effect of either of these procedures is that the sub-tree placeholder values will be used in that sub-tree, but will not affect templates defined elsewhere in the parent tree. If a placeholder is undefined in a sub-tree, it will be taken from the parent placeholder values (if available).

Parameters:
  • sub_tree (FileTree) – tree to be added to the current one

  • precursor (list(str or None)) – name(s) of the sub-tree. Defaults to just adding the sub-tree to the main tree without precursor

  • parent (str) – key of the template used as top-level directory for the sub tree. Defaults to top-level directory of the main tree. Can be set to None for an independent tree.

  • fill (Optional[bool]) – whether any defined placeholders should be filled in before adding the sub-tree. By default this is True if there is no precursor and false otherwise

add_template(template_path: str, key: str | Sequence[str] | None = None, parent: str | None = '', overwrite=False) str[source]

Updates the FileTree with the new template

Parameters:
  • template_path – path name with respect to the parent (or top-level if no parent provided)

  • key – key(s) to access this template in the future. Defaults to result from Template.guess_key (i.e., the path basename without the extension).

  • parent – if defined, template_path will be interpreted as relative to this template. By default the top-level template is used as reference. To create a template unaffiliated with the rest of the tree, set parent to None. Such a template should be an absolute path or relative to the current directory and can be used as parent for other templates

  • overwrite – if True, overwrites any existing template rather than raising a ValueError. Defaults to False.

Returns:

one of the short names under which the template has been stored

Return type:

str

copy() FileTree[source]

Creates a copy of the tree

The dictionaries (templates, placeholders) are copied, but the values within them are not.

Returns:

new tree object with identical templates, sub-trees and placeholders

Return type:

FileTree

classmethod empty(top_level: str | Template = '.', return_path=False) FileTree[source]

Creates a new empty FileTree containing only a top-level directory

Parameters:

top_level (str, Template, optional) – Top-level directory that other templates will use as a reference. Defaults to current directory.

Returns:

empty FileTree

Return type:

FileTree

fill(keep_optionals=True) FileTree[source]

Fills in singular placeholder values.

Parameters:

keep_optionals – if True keep optional parameters that have not been set

Returns:

new tree with singular placeholder values filled into the templates and removed from the placeholder dict

Return type:

FileTree
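The effect of fill() can be illustrated with plain format strings. This is a minimal sketch, not the library's implementation: `fill_singular` and `_KeepUnknown` are hypothetical names, templates are assumed to be simple `{name}` format strings, and optional blocks are ignored.

```python
class _KeepUnknown(dict):
    """Leave '{name}' in place when no singular value is known for name."""

    def __missing__(self, key):
        return "{" + key + "}"


def fill_singular(templates, placeholders):
    """Fill singular values into the templates and drop them from the mapping.

    Hypothetical stand-in: lists and tuples count as "multiple values",
    None counts as undefined, anything else is treated as singular.
    """
    singular = {
        name: value
        for name, value in placeholders.items()
        if value is not None and not isinstance(value, (list, tuple))
    }
    remaining = {k: v for k, v in placeholders.items() if k not in singular}
    filled = {
        key: text.format_map(_KeepUnknown(singular))
        for key, text in templates.items()
    }
    return filled, remaining


filled, remaining = fill_singular(
    {"image": "sub-{subject}/{modality}.nii.gz"},
    {"subject": ["A", "B"], "modality": "T1w"},
)
print(filled)
print(remaining)
```

After the call, the singular `modality` value is baked into the template string, while the multi-valued `subject` placeholder stays in the mapping.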

filter_templates(template_names: Collection[str], check=True)[source]

New FileTree containing just the templates in template_names and their parents.

A KeyError will be raised if any of the template names are not in the FileTree (unless check is set to False)

Parameters:
  • template_names – names of the templates to keep.

  • check – if True, check whether all template names are actually part of the FileTree

classmethod from_string(definition: str, top_level: str | Template = '.', return_path=False, **placeholders) FileTree[source]

Creates a FileTree based on the given definition

Parameters:
  • definition (str) – A FileTree definition describing a structured directory

  • top_level (str) – top-level directory name. Defaults to current directory. Set to parent template for sub-trees.

Returns:

tree matching the given definition

Return type:

FileTree

get(key: str, make_dir=False) str | Path[source]

Returns template with placeholder values filled in

Parameters:
  • key (str) – identifier for the template

  • make_dir (bool, optional) – If set to True, create the parent directory of the returned path.

Returns:

Filled-in template.

Returned as a pathlib.Path object if FileTree.return_path is True. Otherwise a string is returned.

Return type:

Path

get_mult(key: str | Sequence[str], filter=False, make_dir=False) DataArray | Dataset[source]

Returns array of paths with all possible values filled in for the placeholders

Singular placeholder values are filled into the template directly. For each placeholder with multiple values a dimension is added to the output array. This dimension will have the name of the placeholder and labels corresponding to the possible values (see http://xarray.pydata.org/en/stable/). The presence of required, undefined placeholders will lead to an error (see get_mult_glob() or update_glob() to set these placeholders based on which files exist on disk).

Parameters:
  • key (str, Sequence[str]) – identifier(s) for the template.

  • filter (bool, optional) – If set to True, will filter out any non-existent files. If the return type is strings, non-existent entries will be empty strings. If the return type is Path objects, non-existent entries will be None. Note that the default behaviour is opposite from get_mult_glob().

  • make_dir (bool, optional) – If set to True, create the parent directory for each returned path.

Returns:

For a single key returns all possible paths in an xarray DataArray.

For multiple keys it returns the combination of them in an xarray Dataset. Each element in the xarray is a pathlib.Path object if FileTree.return_path is True. Otherwise the xarray will contain the paths as strings.

Return type:

xarray.DataArray, xarray.Dataset
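How each multi-valued placeholder turns into one dimension of the result can be shown without xarray. The sketch below is an illustration under stated assumptions: `expand_template` is a hypothetical helper, templates are plain format strings, and a dictionary keyed by value combinations stands in for the DataArray.

```python
from itertools import product


def expand_template(template, placeholders):
    """Map every combination of multi-valued placeholders to a filled-in path.

    Hypothetical stand-in for illustration only; the real get_mult() returns
    an xarray object with one named dimension per multi-valued placeholder.
    """
    # Placeholders holding a list or tuple each contribute one "dimension".
    dims = [k for k, v in placeholders.items() if isinstance(v, (list, tuple))]
    single = {k: v for k, v in placeholders.items() if k not in dims}
    paths = {}
    for combo in product(*(placeholders[d] for d in dims)):
        values = dict(single, **dict(zip(dims, combo)))
        paths[combo] = template.format(**values)
    return dims, paths


dims, paths = expand_template(
    "{study}/sub-{subject}/ses-{session}/T1w.nii.gz",
    {"study": "demo", "subject": ["A", "B"], "session": [1, 2, 3]},
)
print(dims)             # dimension names, in placeholder order
print(len(paths))       # one entry per combination of values
print(paths[("B", 2)])
```

Two subjects and three sessions give a 2 x 3 grid of paths; the singular `study` value adds no dimension.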

get_mult_glob(key: str | Sequence[str]) DataArray | Dataset[source]

Returns array of paths with all possible values filled in for the placeholders

Singular placeholder values are filled into the template directly. For each placeholder with multiple values a dimension is added to the output array. This dimension will have the name of the placeholder and labels corresponding to the possible values (see http://xarray.pydata.org/en/stable/). The possible values for undefined placeholders will be determined by which files actually exist on disk.

The same result can be obtained by calling self.update_glob(key).get_mult(key, filter=True). However calling this method is more efficient, because it only has to check the disk for which files exist once.

Parameters:

key (str, Sequence[str]) – identifier(s) for the template.

Returns:

For a single key returns all possible paths in an xarray DataArray.

For multiple keys it returns the combination of them in an xarray Dataset. Each element in the xarray is a pathlib.Path object if FileTree.return_path is True. Otherwise the xarray will contain the paths as strings.

Return type:

xarray.DataArray, xarray.Dataset

get_template(key: str) Template[source]

Returns the template corresponding to key.

Raises:

KeyError – if no template with that identifier is available

Parameters:

key (str) – key identifying the template.

Returns:

description of pathname with placeholders not filled in

Return type:

Template

iter(template: str, check_exists: bool = False) Generator[FileTree, None, None][source]

Iterate over trees containing all possible values for template

Parameters:
  • template (str) – short name identifier of the template

  • check_exists (bool) – set to True to only return trees for which the template actually exists

Returns:

yields trees, where each placeholder in given template only has a single possible value

Return type:

Generator[FileTree]

iter_vars(placeholders: Sequence[str]) Generator[FileTree, None, None][source]

Iterate over the given placeholder names

Parameters:

placeholders (Sequence[str]) – sequence of placeholder names to iterate over

Returns:

yields trees, where each placeholder only has a single possible value

Return type:

Generator[FileTree]

classmethod read(name: str, top_level: str | Template = '.', return_path=False, **placeholders) FileTree[source]

Reads a filetree based on the given name

Parameters:
  • name (str) –

    name of the filetree. Interpreted as:

    • a filename containing the tree definition if “name” or “name.tree” exist on disk

    • one of the trees in tree_directories if one of those contains “name” or “name.tree”

    • one of the trees in the plugin FileTree modules

  • top_level (str) – top-level directory name. Defaults to current directory. Set to parent template for sub-trees.

  • placeholders (str->Any) – maps placeholder names to their values

Returns:

tree matching the definition in the file

Return type:

FileTree

report(fill=True, pager=False)[source]

Prints a formatted report of the filetree to the console.

Prints a report of the file-tree to the terminal with:

  • table with placeholders and their values

  • tree of templates with template keys marked in cyan

Parameters:
  • fill (bool, optional) – by default any fixed placeholders are filled in before printing the tree (using fill()). Set to False to disable this.

  • pager (bool, optional) – if set to True, the report will be fed into a pager (recommended if the output is very large)

run_app()[source]

Open a terminal-based App to explore the filetree interactively

The resulting app runs directly in the terminal, so it should work when ssh’ing to some remote cluster.

There will be two panels:

  • The left panel will show all the templates in a tree format. Template keys are shown in cyan. For each template the number of files that exist on disk out of the total number is shown colour coded based on completeness (red: no files; yellow: some files; blue: all files). Templates can be selected by hovering over them. Clicking on directories will hide/show their content.

  • The right panel will show for the selected template the complete template string and a table showing for which combination of placeholders the file is present/absent (rows for absent files are colour-coded red).

template_keys(only_leaves=False)[source]

Returns the keys of all the templates in the FileTree

Each key will be returned for templates with multiple keys.

Parameters:

only_leaves (bool, optional) – set to True to only return templates that do not have any children

to_string(indentation=4) str[source]

Converts FileTree into a valid filetree definition

An identical FileTree can be created by running from_string() on the resulting string.

Parameters:

indentation (int, optional) – Number of spaces to use for indentation. Defaults to 4.

property top_level
update(inplace=False, **placeholders) FileTree[source]

Updates the placeholder values to be filled into the templates

Parameters:
  • inplace (bool) – if True change the placeholders in-place (and return the FileTree itself); by default a new FileTree is returned with the updated values without altering this one.

  • **placeholders (Dict[str, Any]) – maps placeholder names to their new values (None to mark placeholder as undefined)

Returns:

Tree with updated placeholders (same tree as the current one if inplace is True)

Return type:

FileTree

update_glob(template_key: str | Sequence[str], inplace=False) FileTree[source]

Updates any undefined placeholders based on which files exist on disk for template

Parameters:
  • template_key (str or sequence of str) – key(s) of the template(s) to use

  • inplace (bool) – if True change the placeholders in-place (and return the FileTree itself); by default a new FileTree is returned with the updated values without altering this one.

Returns:

Tree with updated placeholders (same tree as the current one if inplace is True)

Return type:

FileTree
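The idea of deriving placeholder values from the files on disk can be sketched with glob and re. This is an assumption-laden illustration rather than the library's algorithm: `values_on_disk` is a hypothetical helper that only understands plain `{name}` placeholders, each appearing once, and ignores optional blocks and type formatting.

```python
import glob
import os
import re
import tempfile


def values_on_disk(template, top_level):
    """Collect the placeholder values of all paths under top_level matching template."""
    # re.split with one capture group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    names = parts[1::2]
    # Glob pattern: every placeholder becomes a wildcard.
    pattern = re.sub(r"\{\w+\}", "*", template)
    # Regex: every placeholder becomes a named group within one path component.
    regex = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>[^/]+)"
        for index, part in enumerate(parts)
    )
    found = {name: set() for name in names}
    for path in glob.glob(os.path.join(glob.escape(top_level), pattern)):
        relative = os.path.relpath(path, top_level).replace(os.sep, "/")
        match = re.fullmatch(regex, relative)
        if match:
            for name, value in match.groupdict().items():
                found[name].add(value)
    return {name: sorted(values) for name, values in found.items()}


# Build a small throw-away directory tree to search.
with tempfile.TemporaryDirectory() as top:
    for subject in ("A", "B"):
        os.makedirs(os.path.join(top, f"sub-{subject}"))
        open(os.path.join(top, f"sub-{subject}", "T1w.nii.gz"), "w").close()
    found = values_on_disk("sub-{subject}/T1w.nii.gz", top)
print(found)
```

The undefined `subject` placeholder ends up with the values seen on disk, which is the kind of update that update_glob() applies to the tree's placeholders.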

write(filename, indentation=4)[source]

Writes the FileTree to a disk as a text file

The first few lines will contain the placeholders. The remaining lines will contain the actual FileTree with all the templates (including sub-trees). The top-level directory is not stored in the file and hence will need to be provided when reading the tree from the file.

Parameters:
  • filename (str or Path) – where to store the file (directory should exist already)

  • indentation (int, optional) – Number of spaces to use for indentation. Defaults to 4.

file_tree.file_tree.convert(src_tree: FileTree, target_tree: FileTree | None = None, keys=None, symlink=False, overwrite=False)[source]

Copies or links files defined in keys from the src_tree to the target_tree.

Given two example trees

  • source:

    subject = A,B
    
    sub-{subject}
        data
            T1w.nii.gz
            FLAIR.nii.gz
    
  • target:

    subject = A,B
    
    data
        sub-{subject}
            {subject}-T1w.nii.gz (T1w)
            {subject}-T2w.nii.gz (T2w)
    

Given pre-existing data matching the source tree:

.
├── sub-A
│   └── data
│       ├── FLAIR.nii.gz
│       └── T1w.nii.gz
└── sub-B
    └── data
        ├── FLAIR.nii.gz
        └── T1w.nii.gz

We can do the following conversions:

  • convert(source, target):

    copies all matching keys from source to target. This will only copy the “T1w.nii.gz” files, because they are the only match in the template keys. Note that the data template key also matches between the two trees, but this template is not a leaf, so is ignored.

  • convert(source, target, keys=['T1w', ('FLAIR', 'T2w')]):

    copies the “T1w.nii.gz” files from source to target and copies the “FLAIR.nii.gz” in source to “T2w.nii.gz” in target.

  • convert(source.update(subject='B'), source.update(subject='C')):

    creates a new “sub-C/data” directory and copies all the data from “sub-B/data” into that directory.

  • convert(source, keys=[('FLAIR', 'T1w')], overwrite=True):

    copies the “FLAIR.nii.gz” into the “T1w.nii.gz” files overwriting the originals.

Warnings are raised in two cases:

  • if a source file is missing

  • if a target file already exists and overwrite is False

Parameters:
  • src_tree – prepopulated filetree with the source files

  • target_tree – filetree that will be populated. Defaults to same as src_tree.

  • keys (collection of str or (str, str), optional) – collection of template keys to transfer from src_tree to target_tree. Defaults to all templates keys shared between src_tree and target_tree.

  • symlink – if set to true links the files rather than copying them

  • overwrite – if set to True overwrite any existing files
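The second conversion in the list above can be mimicked with the standard library. The sketch below is not the library's code: `convert_sketch` is a hypothetical function, both trees are reduced to dictionaries of format strings, and it is assumed that a file is skipped whenever a warning is raised.

```python
import os
import shutil
import tempfile
import warnings


def convert_sketch(top, source, target, keys, subjects, overwrite=False):
    """Copy files from the source layout to the target layout under top."""
    copied = []
    for key in keys:
        # A bare string means the same key in both trees; a pair maps src -> target.
        src_key, tgt_key = (key, key) if isinstance(key, str) else key
        for subject in subjects:
            src = os.path.join(top, source[src_key].format(subject=subject))
            tgt = os.path.join(top, target[tgt_key].format(subject=subject))
            if not os.path.exists(src):
                warnings.warn(f"missing source file: {src}")
                continue
            if os.path.exists(tgt) and not overwrite:
                warnings.warn(f"target already exists: {tgt}")
                continue
            os.makedirs(os.path.dirname(tgt), exist_ok=True)
            shutil.copyfile(src, tgt)
            copied.append(os.path.relpath(tgt, top).replace(os.sep, "/"))
    return copied


source = {
    "T1w": "sub-{subject}/data/T1w.nii.gz",
    "FLAIR": "sub-{subject}/data/FLAIR.nii.gz",
}
target = {
    "T1w": "data/sub-{subject}/{subject}-T1w.nii.gz",
    "T2w": "data/sub-{subject}/{subject}-T2w.nii.gz",
}

with tempfile.TemporaryDirectory() as top:
    # Create empty files matching the source layout.
    for subject in ("A", "B"):
        for text in source.values():
            path = os.path.join(top, text.format(subject=subject))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
    copied = convert_sketch(
        top, source, target, ["T1w", ("FLAIR", "T2w")], ["A", "B"]
    )
print(copied)
```

Passing a pair such as `("FLAIR", "T2w")` is what lets a file change its key, and therefore its name, on the way to the target layout.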

file_tree.parse_tree module

file_tree.template module

class file_tree.template.Literal(text: str)[source]

Bases: Part

class file_tree.template.MyDataArray(data, coords=None)[source]

Bases: object

Wrapper around xarray.DataArray for internal usage

It tries to delay creating the DataArray object as long as possible (as using them for small arrays is slow…)

static concat(parts, new_index) MyDataArray[source]
map(func) MyDataArray[source]
to_xarray() DataArray[source]
class file_tree.template.OptionalPart(sub_template: TemplateParts)[source]

Bases: Part

add_precursor(text: str) OptionalPart[source]

Prepends any placeholder names by text.

append_placeholders(placeholders, valid=None)[source]

Appends the placeholders in this part to the provided list in order

contains_optionals(placeholders=None)[source]

Returns True if this part contains the optional placeholders

fill_single_placeholders(placeholders: Placeholders, ignore_type=False)[source]

Fills in the given placeholders

for_defined(placeholder_names: Set[str]) List[Part][source]

Returns the template string assuming the placeholders in placeholder_names are defined

Removes any optional parts, whose placeholders are not in placeholder_names.

optional_placeholders()[source]

Returns all placeholders in optional parts

remove_precursors(placeholders=None)[source]
class file_tree.template.Part[source]

Bases: object

Individual part of a template

3 subclasses are defined:

  • Literal: piece of text

  • Required: required placeholder to fill in (between curly brackets)

  • OptionalPart: part of text containing optional placeholders (between square brackets)
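The three kinds of parts can be made concrete with a small tokenizer. This sketch reuses the two regular expressions listed as `optional_re` and `requires_re` on TemplateParts, but everything else is an assumption: `parse_sketch` and `split_required` are hypothetical helpers that return tagged tuples instead of Part objects and ignore any formatting attached to a placeholder.

```python
import re

optional_re = re.compile(r"(\[.*?\])")
requires_re = re.compile(r"(\{.*?\})")


def split_required(text):
    """Split text into ("literal", ...) and ("required", ...) pieces."""
    parts = []
    # With one capture group, re.split alternates: text, match, text, ...
    for index, piece in enumerate(requires_re.split(text)):
        if not piece:
            continue
        if index % 2 == 1:
            parts.append(("required", piece[1:-1]))  # strip the curly brackets
        else:
            parts.append(("literal", piece))
    return parts


def parse_sketch(text):
    """Split a template string into literal, required and optional parts."""
    parts = []
    for index, piece in enumerate(optional_re.split(text)):
        if not piece:
            continue
        if index % 2 == 1:
            # Optional block: parse what sits between the square brackets.
            parts.append(("optional", split_required(piece[1:-1])))
        else:
            parts.extend(split_required(piece))
    return parts


parts = parse_sketch("sub-{subject}[_ses-{session}]_T1w.nii.gz")
for part in parts:
    print(part)
```

The optional block keeps its own nested list of parts, which mirrors the way OptionalPart is constructed from a sub-template.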

add_precursor(text: str) Part[source]

Prepends any placeholder names by text.

append_placeholders(placeholders: List[str], valid=None)[source]

Appends the placeholders in this part to the provided list in order

contains_optionals(placeholders: Set[Part] | None = None)[source]

Returns True if this part contains the optional placeholders

fill_single_placeholders(placeholders: Placeholders, ignore_type=False) Sequence[Part][source]

Fills in the given placeholders

for_defined(placeholder_names: Set[str]) List[Part][source]

Returns the template string assuming the placeholders in placeholder_names are defined

Removes any optional parts, whose placeholders are not in placeholder_names.
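What removing optional parts means for a template string can be sketched on the string level. `for_defined_sketch` is a hypothetical helper, and it assumes that an optional block is kept only when every placeholder inside it is defined; the library works on Part objects rather than on raw strings.

```python
import re


def for_defined_sketch(template, placeholder_names):
    """Drop optional blocks that need placeholders outside placeholder_names."""
    defined = set(placeholder_names)

    def replace(match):
        inner = match.group(1)
        needed = set(re.findall(r"\{(\w+)", inner))
        # Keep the content (without brackets) only if everything is defined.
        return inner if needed <= defined else ""

    return re.sub(r"\[(.*?)\]", replace, template)


template = "sub-{subject}[_ses-{session}]_T1w.nii.gz"
without_session = for_defined_sketch(template, {"subject"})
with_session = for_defined_sketch(template, {"subject", "session"})
print(without_session)
print(with_session)
```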

optional_placeholders() Set[str][source]

Returns all placeholders in optional parts

remove_precursors(placeholders=None)[source]
required_placeholders() Set[str][source]

Returns all required placeholders

class file_tree.template.Placeholders(*args, **kwargs)[source]

Bases: MutableMapping

Dictionary-like object containing the placeholder values.

It understands about sub-trees (i.e., if “<sub_tree>/<placeholder>” does not exist it will return “<placeholder>” instead).

copy()[source]
find_key(key: str) str | None[source]

Finds the actual key containing the value

Will look for:

  • not None value for the key itself

  • not None value for any parent (i.e., for key “A/B”, will look for “B” as well)

  • otherwise will return None

Parameters:

key (str) – placeholder name

Returns:

None if no value for the key is available, otherwise the key used to index the value

Return type:

Optional[str]
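The lookup order described above can be sketched on a plain dictionary. `find_key_sketch` is a hypothetical function; the behaviour for keys nested more than one level deep (dropping the innermost sub-tree name at each step) is an assumption that goes beyond the “A/B” example given here.

```python
def find_key_sketch(values, key):
    """Return the key under which a non-None value is stored, or None."""
    while True:
        if values.get(key) is not None:
            return key
        if "/" not in key:
            return None
        # Drop the innermost sub-tree name: "A/B/name" -> "A/name" -> "name".
        *parents, _, name = key.split("/")
        key = "/".join([*parents, name])


values = {"subject": "01", "fmri/session": "rest", "session": None}
print(find_key_sketch(values, "fmri/subject"))   # falls back to the parent key
print(find_key_sketch(values, "fmri/session"))   # the full key has a value
print(find_key_sketch(values, "dwi/session"))    # nothing defined anywhere
```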

iter_over(keys) Generator[Placeholders, None, None][source]

Iterate over the given placeholder names

Parameters:

keys (Sequence[str]) – sequence of placeholder names to iterate over

Returns:

yields Placeholders object, where each of the listed keys only has a single possible value

Return type:

Generator[Placeholders]

Link the placeholders represented by keys.

When iterating over linked placeholders the i-th tree will contain the i-th element from all linked placeholders, instead of the tree containing all possible combinations of placeholder values.

This can be thought of using zip for linked variables and itertools.product for unlinked ones.
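The zip versus itertools.product analogy can be written out directly. In this sketch `iterate_sketch` is a hypothetical generator working on a plain dictionary of lists; it only illustrates the iteration order and is not how Placeholders stores linked values.

```python
from itertools import product


def iterate_sketch(placeholders, linked=()):
    """Yield one dict of single values per iteration step."""
    linked = list(linked)
    free = [name for name in placeholders if name not in linked]
    # Linked placeholders advance together, like zip.
    rows = list(zip(*(placeholders[name] for name in linked))) if linked else [()]
    for row in rows:
        # Unlinked placeholders are combined in every possible way.
        for combo in product(*(placeholders[name] for name in free)):
            yield {**dict(zip(linked, row)), **dict(zip(free, combo))}


values = {"subject": ["A", "B"], "session": [1, 2], "run": ["x", "y"]}
unlinked = list(iterate_sketch(values))
linked = list(iterate_sketch(values, linked=("subject", "session")))
print(len(unlinked))   # every combination of the three placeholders
print(len(linked))     # subject and session move in lockstep
print(linked[0])
```

With subject and session linked, subject A is only ever paired with session 1 and subject B with session 2, while `run` still takes both of its values for each pair.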

split() Tuple[Placeholders, Placeholders][source]

Splits all placeholders into those with a single value or those with multiple values

Placeholders are considered to have multiple values if they are equivalent to 1D-arrays (lists, tuples, 1D ndarray, etc.). Anything else is considered a single value (string, int, float, etc.)

Parameters:

placeholders (Dict) – all mappings from placeholder names to values

Returns:

tuple with two dictionaries (first those with single values, then those with multiple values)

Return type:

Tuple[Dict, Dict]

Unlink the placeholders represented by keys.

See link() for how linking affects the iteration through placeholders with multiple values.

Raises a ValueError if the placeholders are not actually linked.

class file_tree.template.Required(var_name, var_formatting=None)[source]

Bases: Part

add_precursor(text: str) Required[source]

Prepends any placeholder names by text.

append_placeholders(placeholders, valid=None)[source]

Appends the placeholders in this part to the provided list in order

fill_single_placeholders(placeholders: Placeholders, ignore_type=False)[source]

Fills in the given placeholders

remove_precursors(placeholders=None)[source]
required_placeholders()[source]

Returns all required placeholders

class file_tree.template.Template(parent: Template | None, unique_part: str)[source]

Bases: object

add_precursor(text) Template[source]

Returns a new Template with any placeholder names in the unique part now preceded by text

Used for adding sub-trees

all_matches(placeholders: Placeholders)[source]

Returns a sequence of all possible variable values matching existing files on disk

Only variable values matching existing placeholder values are returned (undefined placeholders are unconstrained).

as_multi_line(other_templates: Dict[str, Template], indentation=4) str[source]

Generates a string describing this and any child templates

Parameters:
  • other_templates (Dict[str, Template]) – templates including all the child templates and itself

  • indentation (int, optional) – number of spaces to use as indentation. Defaults to 4

Returns:

multi-line string that can be processed by file_tree.FileTree.read()

Return type:

str

property as_path: Path

The full path with no placeholders filled in

property as_string
children(templates: Iterable[Template]) List[Template][source]

From a sequence of templates find the children

Returns:

list of children templates

Return type:

List[Template]

format_mult(placeholders: Placeholders, check=False, filter=False, matches=None) DataArray[source]

Replaces placeholders in template with the provided placeholder values

Parameters:
  • placeholders (Placeholders) – mapping from placeholder names to single or multiple values

  • check (bool) – skip check for missing placeholders if set to True

  • filter (bool) – filter out non-existing files if set to True

Raises:

KeyError – if any placeholder is missing

Returns:

array with possible resolved paths.

If filter is set to True the non-existent paths are replaced by None

Return type:

xarray.DataArray

format_single(placeholders: Placeholders, check=True, keep_optionals=False) str[source]

Formats the template with the placeholders filled in

Only placeholders with a single value are considered.

Parameters:
  • placeholders (Placeholders) – values to fill into the placeholder

  • check (bool) – skip check for missing placeholders if set to True

  • keep_optionals – if True keep optional parameters that have not been set (will cause the check to fail)

Raises:

KeyError – if any placeholder is missing

Returns:

filled in template

Return type:

str

get_all_placeholders(placeholders: Placeholders, matches=None) Placeholders[source]

Fill placeholders with possible values based on what is available on disk

Parameters:

placeholders (Placeholders) – New values for undefined placeholders in template

guess_key() str[source]

Proposes a short name for the template

The proposed short name is created by:

  • taking the basename (i.e., last component) of the path

  • removing the first ‘.’ and everything beyond (to remove the extension)

Warning

If there are multiple dots within the path’s basename, this might remove far more than just the extension.

Returns:

proposed short name for this template (used if user does not provide one)

Return type:

str
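The two steps, and the pitfall mentioned in the warning, can be reproduced in a few lines. `guess_key_sketch` is a hypothetical re-implementation of the description above that operates on a plain path string rather than on a Template.

```python
import posixpath


def guess_key_sketch(path):
    """Basename of the path, cut at the first '.'."""
    basename = posixpath.basename(path)
    return basename.split(".", 1)[0]


simple = guess_key_sketch("sub-{subject}/anat/T1w.nii.gz")
dotted = guess_key_sketch("derivatives/sub-01.task-rest.bold.nii.gz")
print(simple)
print(dotted)   # everything after the first dot is gone, not just ".nii.gz"
```

For a basename with several dots it is usually safer to pass an explicit key when adding the template.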

optional_placeholders() Set[str][source]

Finds all placeholders that are only within optional blocks (i.e., they do not require a value)

Returns:

names of optional placeholders

Return type:

Set[str]

placeholders(valid=None) List[str][source]

Returns a list of the placeholder names

Returns:

placeholder names in order that they appear in the template

Return type:

List[str]

required_placeholders() Set[str][source]

Finds all placeholders that are outside of optional blocks (i.e., they do require a value)

Returns:

names of required placeholders

Return type:

Set[str]

rich_line(all_templates)[source]

Produces a line for rendering using rich

class file_tree.template.TemplateParts(parts: Sequence[Part])[source]

Bases: object

The parts of a larger template

all_matches() List[Dict[str, Any]][source]

Finds all potential matches to existing templates

Returns a list with the possible combination of values for the placeholders.

extract_placeholders(filename, known_vars=None)[source]

Extracts the placeholder values from the filename

Parameters:
  • filename – filename

  • known_vars – already known placeholders

Returns:

dictionary from placeholder names to string representations (unused placeholders set to None)

fill_known(placeholders: Placeholders, ignore_type=False) MyDataArray[source]

Fill in the known placeholders

Any optional parts where all placeholders have been filled will be automatically replaced

fill_single_placeholders(placeholders: Placeholders, ignore_type=False) TemplateParts[source]

Fills in placeholders with singular values

Assumes that all placeholders are in fact singular

get_parser()[source]
optional_placeholders() Set[str][source]

Set of optional placeholders

optional_re = re.compile('(\\[.*?\\])')
optional_subsets() Iterator[TemplateParts][source]

Yields template sub-sets with every combination of optional placeholders

ordered_placeholders(valid=None) List[str][source]

Sequence of all placeholders in order (can contain duplicates)

static parse(text: str) TemplateParts[source]

Parses a template string into its constituent parts

Raises:

ValueError – raised if a parsing error is encountered

Returns:

object that contains the parts of the template

Return type:

TemplateParts

remove_optionals(optionals=None) TemplateParts[source]

Removes any optionals containing the provided placeholders (default: remove all)

remove_precursors(placeholders=None)[source]

Replaces keys with those existing in the placeholders

If no placeholders provided all precursors are removed

required_placeholders() Set[str][source]

Set of required placeholders

requires_re = re.compile('(\\{.*?\\})')
resolve(placeholders, ignore_type=False) MyDataArray[source]

Resolves the template given a set of placeholders

Parameters:
  • placeholders – mapping of placeholder names to values

  • ignore_type – if True, ignore the type formatting when filling in placeholders

Returns:

cleaned string

file_tree.template.extract_placeholders(template, filename, known_vars=None)[source]

Extracts the placeholder values from the filename

Parameters:
  • template – template matching the given filename

  • filename – filename

  • known_vars – already known placeholders

Returns:

dictionary from placeholder names to string representations (unused placeholders set to None)

file_tree.template.is_singular(value)[source]

Whether a value is singular or has multiple options.
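The distinction between a singular value and multiple options, as described for Placeholders.split(), can be approximated as follows. Both functions are hypothetical sketches: they treat lists, tuples, sets and ranges as "multiple" and ignore NumPy arrays, which the library also accepts.

```python
def is_singular_sketch(value):
    """True for a single value, False for a sequence of options."""
    # Strings are sequences in Python but count as one value here.
    if isinstance(value, (str, bytes)):
        return True
    return not isinstance(value, (list, tuple, set, range))


def split_sketch(placeholders):
    """Separate single-valued from multi-valued placeholders."""
    single = {k: v for k, v in placeholders.items() if is_singular_sketch(v)}
    multi = {k: v for k, v in placeholders.items() if not is_singular_sketch(v)}
    return single, multi


single, multi = split_sketch(
    {"subject": ["A", "B"], "modality": "T1w", "session": 1, "run": (1, 2)}
)
print(single)
print(multi)
```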