test_database

plantdb.commons.test_database

This module groups tools to download test datasets, pipeline configuration files, and trained CNN models from the ZENODO repository. It aims to simplify the creation of a test database for demonstration or CI purposes.

Examples:

>>> from plantdb.commons.test_database import setup_test_database
>>> # EXAMPLE 1 - Download and extract the 'real_plant' test database to `plantdb/tests/testdata` module directory:
>>> db_path = setup_test_database('real_plant')
INFO     [test_database] File 'real_plant.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'real_plant.zip' MD5 hash value...
INFO     [test_database] The test database is set up under '/home/jonathan/Projects/plantdb/tests/testdata'.
>>> print(db_path)
/home/jonathan/Projects/plantdb/tests/testdata
>>> # EXAMPLE 2 - Download and extract the 'real_plant' and 'virtual_plant' test dataset and configuration pipelines to a temporary folder called 'ROMI_DB':
>>> db_path = setup_test_database(['real_plant', 'virtual_plant'], '/tmp/ROMI_DB', with_configs=True)
INFO     [test_database] File 'real_plant.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'real_plant.zip' MD5 hash value...
INFO     [test_database] File 'virtual_plant.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'virtual_plant.zip' MD5 hash value...
INFO     [test_database] File 'configs.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'configs.zip' MD5 hash value...
INFO     [test_database] The test database is set up under '/tmp/ROMI_DB'.
>>> print(db_path)
/tmp/ROMI_DB

The valid dataset names are:

* 'real_plant': 60 images of a Col-0 Arabidopsis thaliana plant acquired with the Plant Imager;
* 'virtual_plant': 18 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager;
* 'real_plant_analyzed': the real_plant dataset reconstructed using the AnglesAndInternodes task with the config/geom_pipe_real.toml configuration file;
* 'virtual_plant_analyzed': the virtual_plant dataset reconstructed using the AnglesAndInternodes task with the config/geom_pipe_virtual.toml configuration file;
* 'arabidopsis000': 72 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager.

Archive 'configs.zip' contains the configuration files used with the romi_run_task CLI to reconstruct the datasets.

Archive 'models.zip' contains a preconfigured directory structure with the trained CNN weight file Resnet_896_896_epoch50.pt.
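
The log lines in the examples above mention an MD5 check on each archive. The helper that performs this check is not shown on this page, so the following is only a minimal sketch of how such a check could be written with the standard library. The function name `md5_matches` and the chunk size are assumptions for illustration, not part of plantdb.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_matches(archive_path, expected_md5, chunk_size=65536):
    """Return True if the file's MD5 hex digest equals `expected_md5` (hypothetical helper)."""
    md5 = hashlib.md5()
    with open(archive_path, "rb") as f:
        # Read in chunks so that large archives are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest() == expected_md5.lower()


# Usage on a throwaway file that stands in for a downloaded archive:
with tempfile.TemporaryDirectory() as tmp:
    fake_archive = Path(tmp) / "real_plant.zip"
    fake_archive.write_bytes(b"hello")
    ok = md5_matches(fake_archive, hashlib.md5(b"hello").hexdigest())
    bad = md5_matches(fake_archive, "0" * 32)
```

A mismatch would typically mean a truncated or corrupted download, in which case fetching the archive again is the usual remedy.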

get_configs

get_configs(out_path=TEST_DIR, keep_tmp=False, force=False)

Download and extract the pipeline configurations from ZENODO.

Parameters:

Name	Type	Description	Default
`out_path`	`str or Path`	The path where to download the pipeline configurations. Defaults to `TEST_DIR`.	`TEST_DIR`
`keep_tmp`	`bool`	Whether to keep the temporary files. Defaults to `False`.	`False`
`force`	`bool`	Whether to force redownload of archive. Defaults to `False`.	`False`

Returns:

Type	Description
`Path`	The path to the downloaded configs.

Examples:

>>> from plantdb.commons.test_database import get_configs
>>> get_configs()  # download and extract the pipeline configurations to `plantdb/tests/testdata` directory

Source code in plantdb/commons/test_database.py

def get_configs(out_path=TEST_DIR, keep_tmp=False, force=False):
    """Download and extract the pipeline configurations from ZENODO.

    Parameters
    ----------
    out_path : str or pathlib.Path, optional
        The path where to download the pipeline configurations. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``False``.
    force : bool, optional
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the downloaded configs.

    Examples
    --------
    >>> from plantdb.commons.test_database import get_configs
    >>> get_configs()  # download and extract the pipeline configurations to `plantdb/tests/testdata` directory
    """
    out_path = Path(out_path)  # accept a `str` as documented, not only a `Path`
    ds_path = out_path / "configs"
    if ds_path.exists() and not force:
        out_path = ds_path
    else:
        out_path = _get_extract_archive("configs", out_path=out_path, keep_tmp=keep_tmp, force=force)
    return out_path
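
The body above follows a "reuse the local copy unless `force` is set" pattern. The sketch below isolates that pattern with a stand-in download function so it can run offline. The names `get_or_fetch` and `fake_fetch` are hypothetical and only illustrate the control flow; the real download is done by `_get_extract_archive`, whose implementation is not shown here.

```python
import tempfile
from pathlib import Path


def get_or_fetch(name, out_path, fetch, force=False):
    """Return `out_path / name`, calling `fetch` only if it is missing or `force` is set."""
    out_path = Path(out_path)  # accept both str and Path
    target = out_path / name
    if target.exists() and not force:
        return target  # reuse the local copy
    return fetch(name, out_path)


calls = []  # records every simulated download


def fake_fetch(name, out_path):
    """Stand-in for the real download: it only creates the target directory."""
    calls.append(name)
    target = Path(out_path) / name
    target.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    first = get_or_fetch("configs", tmp, fake_fetch)              # fetched
    second = get_or_fetch("configs", tmp, fake_fetch)             # reused, no fetch
    third = get_or_fetch("configs", tmp, fake_fetch, force=True)  # fetched again
```

Only the first and third calls reach `fake_fetch`, which mirrors how `force=True` bypasses an existing local directory.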

get_models_dataset

get_models_dataset(out_path=TEST_DIR, keep_tmp=False, force=False)

Download and extract the trained CNN model from ZENODO.

Parameters:

Name	Type	Description	Default
`out_path`	`str or Path`	The path where to download the trained CNN model. Defaults to `TEST_DIR`.	`TEST_DIR`
`keep_tmp`	`bool`	Whether to keep the temporary files. Defaults to `False`.	`False`
`force`	`bool`	Whether to force redownload of archive. Defaults to `False`.	`False`

Returns:

Type	Description
`Path`	The path to the downloaded trained CNN model.

Examples:

>>> from plantdb.commons.test_database import get_models_dataset
>>> get_models_dataset()  # download and extract the trained CNN models to `plantdb/tests/testdata` directory

Source code in plantdb/commons/test_database.py

def get_models_dataset(out_path=TEST_DIR, keep_tmp=False, force=False):
    """Download and extract the trained CNN model from ZENODO.

    Parameters
    ----------
    out_path : str or pathlib.Path, optional
        The path where to download the trained CNN model. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``False``.
    force : bool, optional
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the downloaded trained CNN model.

    Examples
    --------
    >>> from plantdb.commons.test_database import get_models_dataset
    >>> get_models_dataset()  # download and extract the trained CNN models to `plantdb/tests/testdata` directory
    """
    out_path = Path(out_path)  # accept a `str` as documented, not only a `Path`
    ds_path = out_path / "models"
    if ds_path.exists() and not force:
        out_path = ds_path
    else:
        out_path = _get_extract_archive("models", out_path=out_path, keep_tmp=keep_tmp, force=force)
    return out_path
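
The extraction step itself is delegated to `_get_extract_archive`, which is not listed on this page. As a rough illustration only, the sketch below extracts a zip archive and removes it afterwards unless `keep_tmp` is set. Treating the downloaded archive as "the temporary file" is an assumption about what `keep_tmp` controls, and `extract_archive` is a hypothetical name.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, out_path, keep_tmp=False):
    """Extract `zip_path` into `out_path`; delete the archive unless `keep_tmp` (assumed semantics)."""
    zip_path, out_path = Path(zip_path), Path(out_path)
    out_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_path)
    if not keep_tmp:
        zip_path.unlink()  # drop the archive once its content is extracted
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "models.zip"
    # Build a tiny placeholder archive (not a real weight file):
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("models/weights.txt", "dummy")
    db_dir = extract_archive(archive, tmp / "db")
    extracted = (db_dir / "models" / "weights.txt").read_text()
    archive_left = archive.exists()
```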

get_test_dataset

get_test_dataset(dataset, out_path=TEST_DIR, keep_tmp=False, force=False)

Download and extract a test dataset from ZENODO.

Parameters:

Name	Type	Description	Default
`dataset`	`(real_plant, virtual_plant, real_plant_analyzed, virtual_plant_analyzed, arabidopsis000)`	The name of the dataset to download.	required
`out_path`	`str or Path`	The path where to extract the test dataset archive. Defaults to `TEST_DIR`.	`TEST_DIR`
`keep_tmp`	`bool`	Whether to keep the temporary files. Defaults to `False`.	`False`
`force`	`bool`	Whether to force redownload of archive. Defaults to `False`.	`False`

Returns:

Type	Description
`Path`	The path to the downloaded test dataset.

Examples:

>>> from plantdb.commons.test_database import get_test_dataset
>>> get_test_dataset('real_plant')  # download and extract the 'real_plant' test dataset to `plantdb/tests/testdata` directory

Source code in plantdb/commons/test_database.py

def get_test_dataset(dataset, out_path=TEST_DIR, keep_tmp=False, force=False):
    """Download and extract a test dataset from ZENODO.

    Parameters
    ----------
    dataset : {'real_plant', 'virtual_plant', 'real_plant_analyzed', 'virtual_plant_analyzed', 'arabidopsis000'}
        The name of the dataset to download.
    out_path : str or pathlib.Path, optional
        The path where to extract the test dataset archive. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``False``.
    force : bool, optional
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the downloaded test dataset.

    Examples
    --------
    >>> from plantdb.commons.test_database import get_test_dataset
    >>> get_test_dataset('real_plant')  # download and extract the 'real_plant' test dataset to `plantdb/tests/testdata` directory
    """
    out_path = Path(out_path)  # accept a `str` as documented, not only a `Path`
    ds_path = out_path / dataset
    if ds_path.exists() and not force:
        out_path = ds_path
    else:
        out_path = _get_extract_archive(dataset, out_path=out_path, keep_tmp=keep_tmp, force=force)
    return out_path

setup_empty_database

setup_empty_database(out_path=None)

Sets up an empty ROMI database.

Sets up necessary marker file and ensures the absence of a lock file.

Parameters:

Name	Type	Description	Default
`out_path`	`str or Path`	The directory path where the database should be set up. Defaults to `None`.	`None`

Returns:

Type	Description
`Path`	The directory path where the database was set up.

Examples:

>>> from plantdb.commons.test_database import setup_empty_database
>>> path = setup_empty_database()
>>> print(path)  # initialize a `ROMI_DB` directory in the temporary directory by default
/tmp/ROMI_DB_********
>>> print([p.name for p in path.iterdir()])  # only the 'marker' file is created
['romidb']

Source code in plantdb/commons/test_database.py

def setup_empty_database(out_path=None):
    """Sets up an empty ROMI database.

    Sets up necessary marker file and ensures the absence of a lock file.

    Parameters
    ----------
    out_path : str or Path, optional
        The directory path where the database should be set up.
        Defaults to ``None``.

    Returns
    -------
    pathlib.Path
        The directory path where the database was set up.

    Examples
    --------
    >>> from plantdb.commons.test_database import setup_empty_database
    >>> path = setup_empty_database()
    >>> print(path)  # initialize a `ROMI_DB` directory in the temporary directory by default
    /tmp/ROMI_DB_********
    >>> print([p.name for p in path.iterdir()])  # only the 'marker' file is created
    ['romidb']
    """
    from plantdb.commons.fsdb.core import MARKER_FILE_NAME
    from plantdb.commons.fsdb.core import LOCK_FILE_NAME

    if isinstance(out_path, str):
        out_path = Path(out_path)
    elif out_path is None:
        out_path = _mkdtemp_romidb()
    elif not isinstance(out_path, Path):
        # Explicit type check: unlike `assert`, it is not stripped when running with `python -O`
        logger.critical(f"Invalid path to set up the database: '{out_path}'.")
        logger.critical("Please provide a valid path to set up the database or leave it to None to use a temporary directory.")
        raise TypeError(f"Invalid type for 'out_path': {type(out_path)}.")

    # Make sure the path to the database exists:
    out_path.mkdir(parents=True, exist_ok=True)
    # Make sure the marker file exists:
    marker_path = out_path / MARKER_FILE_NAME
    marker_path.touch(exist_ok=True)
    # Make sure the locking file does NOT exist:
    lock_path = out_path / LOCK_FILE_NAME
    lock_path.unlink(missing_ok=True)

    return out_path
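
To see the marker and lock handling in isolation, here is a minimal standalone sketch that mimics the steps above without importing plantdb. The marker name `'romidb'` is taken from the example output; the lock file name `'lock'` and the function name `init_empty_db` are assumptions, since the actual value of `LOCK_FILE_NAME` is not shown on this page.

```python
import shutil
import tempfile
from pathlib import Path

MARKER = "romidb"  # matches the example output above
LOCK = "lock"      # hypothetical name; the real value is LOCK_FILE_NAME in plantdb


def init_empty_db(out_path=None):
    """Create the directory and marker file, and remove any stale lock file."""
    if out_path is None:
        out_path = tempfile.mkdtemp(prefix="ROMI_DB_")
    out_path = Path(out_path)
    out_path.mkdir(parents=True, exist_ok=True)
    (out_path / MARKER).touch(exist_ok=True)
    (out_path / LOCK).unlink(missing_ok=True)  # requires Python >= 3.8
    return out_path


db_dir = init_empty_db()
(db_dir / LOCK).touch()  # simulate a stale lock left behind by a crashed session
init_empty_db(db_dir)    # running the setup again clears the lock and keeps the marker
content = sorted(p.name for p in db_dir.iterdir())
shutil.rmtree(db_dir)    # clean up the temporary directory
```

Because every step is idempotent, calling the setup on an existing database directory is harmless apart from removing the lock file.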

setup_test_database

setup_test_database(dataset, out_path=TEST_DIR, keep_tmp=True, with_configs=False, with_models=False, force=False)

Download and extract the test database from ZENODO.

Parameters:

Name	Type	Description	Default
`dataset`	`all or str or list`	The dataset name or a list of dataset names to download to the test database. Use "all" to download all defined datasets. See notes below for a list of dataset names and their meanings.	required
`out_path`	`str or Path`	The path where to set up the database. Defaults to `TEST_DIR`.	`TEST_DIR`
`keep_tmp`	`bool`	Whether to keep the temporary files. Defaults to `True`.	`True`
`with_configs`	`bool`	Whether to download the config files. Defaults to `False`.	`False`
`with_models`	`bool`	Whether to download the trained CNN model files. Defaults to `False`.	`False`
`force`	`bool`	Whether to force download of archive. Defaults to `False`.	`False`

Returns:

Type	Description
`Path`	The path to the database.

Notes

The valid dataset names are:

* 'real_plant': 60 images of a Col-0 Arabidopsis thaliana plant acquired with the Plant Imager;
* 'virtual_plant': 18 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager;
* 'real_plant_analyzed': the real_plant dataset reconstructed using the AnglesAndInternodes task with the testcfg/geom_pipe_real.toml configuration file;
* 'virtual_plant_analyzed': the virtual_plant dataset reconstructed using the AnglesAndInternodes task with the config/geom_pipe_virtual.toml configuration file;
* 'arabidopsis000': 72 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager.

Examples:

>>> from plantdb.commons.test_database import setup_test_database
>>> # EXAMPLE 1 - Download and extract the 'real_plant' test database to `plantdb/tests/testdata` module directory:
>>> setup_test_database('real_plant')
PosixPath('/home/jonathan/Projects/plantdb/tests/testdata')
>>> # EXAMPLE 2 - Download and extract the 'real_plant' and 'virtual_plant' test dataset and configuration pipelines to a temporary folder called 'ROMI_DB':
>>> setup_test_database(['real_plant', 'virtual_plant'], None, with_configs=True)
PosixPath('/tmp/ROMI_DB_********')

Source code in plantdb/commons/test_database.py

def setup_test_database(dataset, out_path=TEST_DIR, keep_tmp=True, with_configs=False, with_models=False, force=False):
    """Download and extract the test database from ZENODO.

    Parameters
    ----------
    dataset : "all" or str or list
        The dataset name or a list of dataset names to download to the test database.
        Use "all" to download all defined datasets.
        See notes below for a list of dataset names and their meanings.
    out_path : str or pathlib.Path, optional
        The path where to set up the database. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``True``.
    with_configs : bool, optional
        Whether to download the config files. Defaults to ``False``.
    with_models : bool, optional
        Whether to download the trained CNN model files. Defaults to ``False``.
    force : bool, optional
        Whether to force download of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the database.

    Notes
    -----
    The valid dataset names are:
      * ``'real_plant'``: 60 images of a Col-0 _Arabidopsis thaliana_ plant acquired with the _Plant Imager_;
      * ``'virtual_plant'``: 18 snapshots of a virtual _Arabidopsis thaliana_ plant generated with the _Virtual Plant Imager_;
      * ``'real_plant_analyzed'``: the ``real_plant`` dataset reconstructed using the ``AnglesAndInternodes`` task with the ``testcfg/geom_pipe_real.toml`` configuration file;
      * ``'virtual_plant_analyzed'``: the ``virtual_plant`` dataset reconstructed using the ``AnglesAndInternodes`` task with the ``config/geom_pipe_virtual.toml`` configuration file;
      * ``'arabidopsis000'``: 72 snapshots of a virtual _Arabidopsis thaliana_ plant generated with the _Virtual Plant Imager_;

    Examples
    --------
    >>> from plantdb.commons.test_database import setup_test_database
    >>> # EXAMPLE 1 - Download and extract the 'real_plant' test database to `plantdb/tests/testdata` module directory:
    >>> setup_test_database('real_plant')
    PosixPath('/home/jonathan/Projects/plantdb/tests/testdata')
    >>> # EXAMPLE 2 - Download and extract the 'real_plant' and 'virtual_plant' test dataset and configuration pipelines to a temporary folder called 'ROMI_DB':
    >>> setup_test_database(['real_plant', 'virtual_plant'], None, with_configs=True)
    PosixPath('/tmp/ROMI_DB_********')
    """
    # Initialize an empty ROMI database
    out_path = setup_empty_database(out_path)

    # Get the list of all test dataset if required:
    if isinstance(dataset, str) and dataset.lower() == "all":
        dataset = DATASET
    # Create a dict of keyword arguments to use for download:
    kwargs = {'out_path': out_path, 'keep_tmp': keep_tmp, 'force': force}
    # Download the test datasets:
    datasets = dataset if isinstance(dataset, list) else [dataset]
    for ds in datasets:
        get_test_dataset(ds, **kwargs)
    # Download configs archive if requested:
    if with_configs:
        _ = get_configs(**kwargs)
    if with_models:
        _ = get_models_dataset(**kwargs)

    logger.info(f"The test database is set up under '{out_path}'.")
    return out_path
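
The `dataset` argument accepts `"all"`, a single name, or a list of names. The sketch below shows one way such input could be normalized and validated before any download starts. It is an illustration under stated assumptions: `normalize_datasets` is a hypothetical helper, and the content of `DATASET` is assumed to be the five names listed in the notes above. The source shown here does not perform this validation.

```python
# Assumed content of the module-level DATASET constant (taken from the notes above):
DATASET = [
    "real_plant",
    "virtual_plant",
    "real_plant_analyzed",
    "virtual_plant_analyzed",
    "arabidopsis000",
]


def normalize_datasets(dataset):
    """Turn "all", a name, or an iterable of names into a validated list of names."""
    if isinstance(dataset, str):
        if dataset.lower() == "all":
            return list(DATASET)
        dataset = [dataset]
    names = list(dataset)
    unknown = [name for name in names if name not in DATASET]
    if unknown:
        raise ValueError(f"Unknown dataset name(s): {unknown}")
    return names


everything = normalize_datasets("ALL")
single = normalize_datasets("real_plant")
pair = normalize_datasets(("real_plant", "virtual_plant"))  # tuples work too
```

Validating names up front would fail fast on a typo instead of failing later during the download.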

test_database

test_database(dataset='real_plant_analyzed', out_path=None, **kwargs)

Create and return an FSDB test database.

Parameters:

Name	Type	Description	Default
`dataset`	`str or list[str] or None`	The (list of) test dataset to use, by default 'real_plant_analyzed'. If `None`, only set up an empty database.	`'real_plant_analyzed'`
`out_path`	`str or Path`	The path where to set up the database. Defaults to the temporary directory under 'ROMI_DB', as defined by `TMP_TEST_DIR`.	`None`

Other Parameters:

Name	Type	Description
`keep_tmp`	`bool`	Whether to keep the temporary files. Defaults to `True`, as in `setup_test_database`.
`with_configs`	`bool`	Whether to download the config files. Defaults to `False`.
`with_models`	`bool`	Whether to download the trained CNN model files. Defaults to `False`.
`force`	`bool`	Whether to force redownload of archive. Defaults to `False`.

Returns:

Type	Description
`FSDB`	The FSDB test database.

Examples:

>>> from plantdb.commons.test_database import test_database
>>> db = test_database()
>>> db.connect()
>>> db.list_scans()
['real_plant_analyzed']
>>> db.path()
PosixPath('/tmp/ROMI_DB_********')
>>> db.disconnect()

Source code in plantdb/commons/test_database.py

def test_database(dataset='real_plant_analyzed', out_path=None, **kwargs):
    """Create and return an FSDB test database.

    Parameters
    ----------
    dataset : str or list[str] or None, optional
        The (list of) test dataset to use, by default 'real_plant_analyzed'.
        If ``None``, only set up an empty database.
    out_path : str or pathlib.Path, optional
        The path where to set up the database.
        Defaults to the temporary directory under 'ROMI_DB', as defined by ``TMP_TEST_DIR``.

    Other Parameters
    ----------------
    keep_tmp : bool
        Whether to keep the temporary files. Defaults to ``True``, as in ``setup_test_database``.
    with_configs : bool
        Whether to download the config files. Defaults to ``False``.
    with_models : bool
        Whether to download the trained CNN model files. Defaults to ``False``.
    force : bool
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    plantdb.commons.fsdb.FSDB
        The FSDB test database.

    Examples
    --------
    >>> from plantdb.commons.test_database import test_database
    >>> db = test_database()
    >>> db.connect()
    >>> db.list_scans()
    ['real_plant_analyzed']
    >>> db.path()
    PosixPath('/tmp/ROMI_DB_********')
    >>> db.disconnect()
    """
    from plantdb.commons.fsdb import FSDB
    if dataset is None:
        return FSDB(setup_empty_database(out_path=out_path))
    else:
        return FSDB(setup_test_database(dataset, out_path=out_path, **kwargs))
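
The example above pairs `db.connect()` with `db.disconnect()` by hand. In tests it can be convenient to guarantee the disconnect even when an assertion fails. The sketch below wraps any object exposing `connect()` and `disconnect()` in a context manager. The `connected` helper and the `FakeDB` stand-in are hypothetical and are only there so the sketch runs without plantdb installed.

```python
from contextlib import contextmanager


@contextmanager
def connected(db):
    """Connect `db` on entry and always disconnect it on exit (hypothetical helper)."""
    db.connect()
    try:
        yield db
    finally:
        db.disconnect()


class FakeDB:
    """Tiny stand-in with the same connect/disconnect/list_scans calls as the example above."""

    def __init__(self):
        self.is_connected = False

    def connect(self):
        self.is_connected = True

    def disconnect(self):
        self.is_connected = False

    def list_scans(self):
        return ["real_plant_analyzed"]


fake = FakeDB()
with connected(fake) as db:
    scans = db.list_scans()
    state_inside = db.is_connected
state_after = fake.is_connected
```

With plantdb installed, the same helper could presumably be used as `with connected(test_database()) as db: ...`.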