
test_database

plantdb.commons.test_database

This module groups tools to download test datasets, pipeline configuration files, and trained CNN models from the Zenodo repository. It aims to simplify the creation of a test database for demonstration or CI purposes.

Examples:

>>> from plantdb.commons.test_database import setup_test_database
>>> # EXAMPLE 1 - Download and extract the 'real_plant' test database to `plantdb/tests/testdata` module directory:
>>> db_path = setup_test_database('real_plant')
INFO     [test_database] File 'real_plant.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'real_plant.zip' MD5 hash value...
INFO     [test_database] The test database is set up under '/home/jonathan/Projects/plantdb/tests/testdata'.
>>> print(db_path)
/home/jonathan/Projects/plantdb/tests/testdata
>>> # EXAMPLE 2 - Download and extract the 'real_plant' and 'virtual_plant' test dataset and configuration pipelines to a temporary folder called 'ROMI_DB':
>>> db_path = setup_test_database(['real_plant', 'virtual_plant'], '/tmp/ROMI_DB', with_configs=True)
INFO     [test_database] File 'real_plant.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'real_plant.zip' MD5 hash value...
INFO     [test_database] File 'virtual_plant.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'virtual_plant.zip' MD5 hash value...
INFO     [test_database] File 'configs.zip' exists locally. Skipping download.
INFO     [test_database] Verifying 'configs.zip' MD5 hash value...
INFO     [test_database] The test database is set up under '/tmp/ROMI_DB'.
>>> print(db_path)
/tmp/ROMI_DB

Valid dataset names are:

* 'real_plant': 60 images of a Col-0 Arabidopsis thaliana plant acquired with the Plant Imager;
* 'virtual_plant': 18 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager;
* 'real_plant_analyzed': the real_plant dataset reconstructed using the AnglesAndInternodes task with the config/geom_pipe_real.toml configuration file;
* 'virtual_plant_analyzed': the virtual_plant dataset reconstructed using the AnglesAndInternodes task with the config/geom_pipe_virtual.toml configuration file;
* 'arabidopsis000': 72 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager.

Archive 'configs.zip' contains the configuration files used with the romi_run_task CLI to reconstruct the datasets.

Archive 'models.zip' contains a preconfigured directory structure with the trained CNN weight file Resnet_896_896_epoch50.pt.
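
The log output in the examples above shows that each archive's MD5 hash is verified before it is used. The internal helper that does this is not shown on this page, so the following is only a minimal sketch of such a check using the standard library; the function name `md5_matches` and its parameters are hypothetical and are not part of `plantdb`.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def md5_matches(archive_path, expected_md5, chunk_size=1 << 20):
    """Return True if the file's MD5 digest equals `expected_md5` (hypothetical helper)."""
    digest = hashlib.md5()
    with Path(archive_path).open("rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Compare case-insensitively, since published hashes may use upper case.
    return digest.hexdigest() == expected_md5.lower()


# Usage with a throwaway file; a real call would pass the archive path
# and the hash published alongside the archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
print(md5_matches(tmp.name, hashlib.md5(b"hello").hexdigest()))  # True
os.remove(tmp.name)
```

A mismatch would typically mean a truncated or corrupted download, in which case fetching the archive again (for example with `force=True`) is the usual remedy.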

get_configs

get_configs(out_path=TEST_DIR, keep_tmp=False, force=False)

Download and extract the pipeline configurations from ZENODO.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| out_path | str or Path | The path where to download the pipeline configurations. Defaults to TEST_DIR. | TEST_DIR |
| keep_tmp | bool | Whether to keep the temporary files. Defaults to False. | False |
| force | bool | Whether to force redownload of archive. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| Path | The path to the downloaded configs. |

Examples:

>>> from plantdb.commons.test_database import get_configs
>>> get_configs()  # download and extract the pipeline configurations to `plantdb/tests/testdata` directory
Source code in plantdb/commons/test_database.py
def get_configs(out_path=TEST_DIR, keep_tmp=False, force=False):
    """Download and extract the pipeline configurations from ZENODO.

    Parameters
    ----------
    out_path : str or pathlib.Path, optional
        The path where to download the pipeline configurations. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``False``.
    force : bool, optional
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the downloaded configs.

    Examples
    --------
    >>> from plantdb.commons.test_database import get_configs
    >>> get_configs()  # download and extract the pipeline configurations to `plantdb/tests/testdata` directory
    """
    out_path = Path(out_path)  # accept both str and Path, as documented
    ds_path = out_path / "configs"
    if ds_path.exists() and not force:
        out_path = ds_path
    else:
        out_path = _get_extract_archive("configs", out_path=out_path, keep_tmp=keep_tmp, force=force)
    return out_path
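
The three `get_*` functions share the same pattern: reuse the extracted directory if it already exists, and only download when it is missing or when `force=True`. The sketch below isolates that pattern with a stand-in fetch function. `get_or_fetch` and `fake_fetch` are hypothetical names used for illustration; the real download step is done by the private `_get_extract_archive` helper, which is not reproduced here.

```python
import tempfile
from pathlib import Path


def get_or_fetch(name, out_path, fetch, force=False):
    """Return `out_path / name`, calling `fetch` only when needed (hypothetical helper)."""
    out_path = Path(out_path)  # accept both str and Path
    target = out_path / name
    if target.exists() and not force:
        return target  # already extracted: skip the download
    return fetch(name, out_path)


calls = []  # records every simulated download


def fake_fetch(name, out_path):
    """Stand-in for the real download-and-extract step: only creates the directory."""
    calls.append(name)
    target = out_path / name
    target.mkdir(parents=True, exist_ok=True)
    return target


root = tempfile.mkdtemp()
get_or_fetch("configs", root, fake_fetch)              # first call fetches
get_or_fetch("configs", root, fake_fetch)              # second call reuses the directory
get_or_fetch("configs", root, fake_fetch, force=True)  # force=True fetches again
print(len(calls))  # 2
```

Note that the existence check only looks at the directory, not at its contents, so a partially extracted directory would be reused as-is unless `force=True` is passed.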

get_models_dataset

get_models_dataset(out_path=TEST_DIR, keep_tmp=False, force=False)

Download and extract the trained CNN model from ZENODO.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| out_path | str or Path | The path where to download the trained CNN model. Defaults to TEST_DIR. | TEST_DIR |
| keep_tmp | bool | Whether to keep the temporary files. Defaults to False. | False |
| force | bool | Whether to force redownload of archive. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| Path | The path to the downloaded trained CNN model. |

Examples:

>>> from plantdb.commons.test_database import get_models_dataset
>>> get_models_dataset()  # download and extract the trained CNN models to `plantdb/tests/testdata` directory
Source code in plantdb/commons/test_database.py
def get_models_dataset(out_path=TEST_DIR, keep_tmp=False, force=False):
    """Download and extract the trained CNN model from ZENODO.

    Parameters
    ----------
    out_path : str or pathlib.Path, optional
        The path where to download the trained CNN model. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``False``.
    force : bool, optional
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the downloaded trained CNN model.

    Examples
    --------
    >>> from plantdb.commons.test_database import get_models_dataset
    >>> get_models_dataset()  # download and extract the trained CNN models to `plantdb/tests/testdata` directory
    """
    out_path = Path(out_path)  # accept both str and Path, as documented
    ds_path = out_path / "models"
    if ds_path.exists() and not force:
        out_path = ds_path
    else:
        out_path = _get_extract_archive("models", out_path=out_path, keep_tmp=keep_tmp, force=force)
    return out_path

get_test_dataset

get_test_dataset(dataset, out_path=TEST_DIR, keep_tmp=False, force=False)

Download and extract a test dataset from ZENODO.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset | {'real_plant', 'virtual_plant', 'real_plant_analyzed', 'virtual_plant_analyzed', 'arabidopsis000'} | The name of the dataset to download. | required |
| out_path | str or Path | The path where to extract the test dataset archive. Defaults to TEST_DIR. | TEST_DIR |
| keep_tmp | bool | Whether to keep the temporary files. Defaults to False. | False |
| force | bool | Whether to force redownload of archive. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| Path | The path to the downloaded test dataset. |

Examples:

>>> from plantdb.commons.test_database import get_test_dataset
>>> get_test_dataset('real_plant')  # download and extract the 'real_plant' test dataset to `plantdb/tests/testdata` directory
Source code in plantdb/commons/test_database.py
def get_test_dataset(dataset, out_path=TEST_DIR, keep_tmp=False, force=False):
    """Download and extract a test dataset from ZENODO.

    Parameters
    ----------
    dataset : {'real_plant', 'virtual_plant', 'real_plant_analyzed', 'virtual_plant_analyzed', 'arabidopsis000'}
        The name of the dataset to download.
    out_path : str or pathlib.Path, optional
        The path where to extract the test dataset archive. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``False``.
    force : bool, optional
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the downloaded test dataset.

    Examples
    --------
    >>> from plantdb.commons.test_database import get_test_dataset
    >>> get_test_dataset('real_plant')  # download and extract the 'real_plant' test dataset to `plantdb/tests/testdata` directory
    """
    out_path = Path(out_path)  # accept both str and Path, as documented
    ds_path = out_path / dataset
    if ds_path.exists() and not force:
        out_path = ds_path
    else:
        out_path = _get_extract_archive(dataset, out_path=out_path, keep_tmp=keep_tmp, force=force)
    return out_path

setup_empty_database

setup_empty_database(out_path=None)

Sets up an empty ROMI database.

Sets up necessary marker file and ensures the absence of a lock file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| out_path | str or Path | The directory path where the database should be set up. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Path | The directory path where the database was set up. |

Examples:

>>> from plantdb.commons.test_database import setup_empty_database
>>> path = setup_empty_database()
>>> print(path)  # initialize a `ROMI_DB` directory in the temporary directory by default
/tmp/ROMI_DB_********
>>> print([path.name for path in path.iterdir()])  # only the 'marker' file is created
['romidb']
Source code in plantdb/commons/test_database.py
def setup_empty_database(out_path=None):
    """Sets up an empty ROMI database.

    Sets up necessary marker file and ensures the absence of a lock file.

    Parameters
    ----------
    out_path : str or Path, optional
        The directory path where the database should be set up.
        Defaults to ``None``.

    Returns
    -------
    pathlib.Path
        The directory path where the database was set up.

    Examples
    --------
    >>> from plantdb.commons.test_database import setup_empty_database
    >>> path = setup_empty_database()
    >>> print(path)  # initialize a `ROMI_DB` directory in the temporary directory by default
    /tmp/ROMI_DB_********
    >>> print([path.name for path in path.iterdir()])  # only the 'marker' file is created
    ['romidb']
    """
    from plantdb.commons.fsdb.core import MARKER_FILE_NAME
    from plantdb.commons.fsdb.core import LOCK_FILE_NAME

    if out_path is None:
        out_path = _mkdtemp_romidb()
    elif isinstance(out_path, str):
        out_path = Path(out_path)
    elif not isinstance(out_path, Path):
        # Explicit type check: an `assert` would be skipped when Python runs with `-O`.
        logger.critical(f"Invalid path to set up the database: '{out_path}'.")
        logger.critical("Please provide a valid path to set up the database or leave it to None to use a temporary directory.")
        raise TypeError(f"Invalid type for 'out_path': {type(out_path)}.")

    # Make sure the path to the database exists:
    out_path.mkdir(parents=True, exist_ok=True)
    # Make sure the marker file exists:
    marker_path = out_path / MARKER_FILE_NAME
    marker_path.touch(exist_ok=True)
    # Make sure the lock file does NOT exist:
    lock_path = out_path / LOCK_FILE_NAME
    lock_path.unlink(missing_ok=True)

    return out_path
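
The resulting on-disk layout can be reproduced with the standard library alone. The following is a minimal sketch, not the library's implementation: the marker name `'romidb'` is taken from the example output above, while the lock file name `'lock'` and the `ROMI_DB_` prefix passed to `tempfile.mkdtemp` are assumptions standing in for `LOCK_FILE_NAME` and `_mkdtemp_romidb()`.

```python
import tempfile
from pathlib import Path

MARKER = "romidb"  # marker file name, as shown in the example output above
LOCK = "lock"      # ASSUMPTION: placeholder for the real LOCK_FILE_NAME value


def init_empty_db(out_path=None):
    """Create a database root containing only the marker file (illustrative)."""
    if out_path is None:
        # ASSUMPTION: mimics the '/tmp/ROMI_DB_********' pattern from the example.
        out_path = tempfile.mkdtemp(prefix="ROMI_DB_")
    out_path = Path(out_path)
    out_path.mkdir(parents=True, exist_ok=True)
    (out_path / MARKER).touch(exist_ok=True)   # the marker file must exist
    (out_path / LOCK).unlink(missing_ok=True)  # a stale lock must not exist
    return out_path


db_root = init_empty_db()
(db_root / LOCK).touch()  # simulate a stale lock left by a crashed process
init_empty_db(db_root)    # calling again is safe and clears the lock
print(sorted(p.name for p in db_root.iterdir()))  # ['romidb']
```

Because every step is idempotent (`exist_ok=True`, `missing_ok=True`), the setup can be repeated on an existing database directory without raising.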

setup_test_database

setup_test_database(dataset, out_path=TEST_DIR, keep_tmp=True, with_configs=False, with_models=False, force=False)

Download and extract the test database from ZENODO.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset | "all" or str or list | The dataset name or a list of dataset names to download to the test database. Use "all" to download all defined datasets. See the Notes below for the list of dataset names and their meanings. | required |
| out_path | str or Path | The path where to set up the database. Defaults to TEST_DIR. | TEST_DIR |
| keep_tmp | bool | Whether to keep the temporary files. Defaults to True. | True |
| with_configs | bool | Whether to download the config files. Defaults to False. | False |
| with_models | bool | Whether to download the trained CNN model files. Defaults to False. | False |
| force | bool | Whether to force download of archive. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| Path | The path to the database. |

Notes

Valid dataset names are:

* 'real_plant': 60 images of a Col-0 Arabidopsis thaliana plant acquired with the Plant Imager;
* 'virtual_plant': 18 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager;
* 'real_plant_analyzed': the real_plant dataset reconstructed using the AnglesAndInternodes task with the testcfg/geom_pipe_real.toml configuration file;
* 'virtual_plant_analyzed': the virtual_plant dataset reconstructed using the AnglesAndInternodes task with the config/geom_pipe_virtual.toml configuration file;
* 'arabidopsis000': 72 snapshots of a virtual Arabidopsis thaliana plant generated with the Virtual Plant Imager.

Examples:

>>> from plantdb.commons.test_database import setup_test_database
>>> # EXAMPLE 1 - Download and extract the 'real_plant' test database to `plantdb/tests/testdata` module directory:
>>> setup_test_database('real_plant')
PosixPath('/home/jonathan/Projects/plantdb/tests/testdata')
>>> # EXAMPLE 2 - Download and extract the 'real_plant' and 'virtual_plant' test dataset and configuration pipelines to a temporary folder called 'ROMI_DB':
>>> setup_test_database(['real_plant', 'virtual_plant'], None, with_configs=True)
PosixPath('/tmp/ROMI_DB_********')
Source code in plantdb/commons/test_database.py
def setup_test_database(dataset, out_path=TEST_DIR, keep_tmp=True, with_configs=False, with_models=False, force=False):
    """Download and extract the test database from ZENODO.

    Parameters
    ----------
    dataset : "all" or str or list
        The dataset name or a list of dataset names to download to the test database.
        Use "all" to download all defined datasets.
        See notes below for a list of dataset names and their meanings.
    out_path : str or pathlib.Path, optional
        The path where to set up the database. Defaults to ``TEST_DIR``.
    keep_tmp : bool, optional
        Whether to keep the temporary files. Defaults to ``True``.
    with_configs : bool, optional
        Whether to download the config files. Defaults to ``False``.
    with_models : bool, optional
        Whether to download the trained CNN model files. Defaults to ``False``.
    force : bool, optional
        Whether to force download of archive. Defaults to ``False``.

    Returns
    -------
    pathlib.Path
        The path to the database.

    Notes
    -----
    Valid dataset names are:
      * ``'real_plant'``: 60 images of a Col-0 _Arabidopsis thaliana_ plant acquired with the _Plant Imager_;
      * ``'virtual_plant'``: 18 snapshots of a virtual _Arabidopsis thaliana_ plant generated with the _Virtual Plant Imager_;
      * ``'real_plant_analyzed'``: the ``real_plant`` dataset reconstructed using the ``AnglesAndInternodes`` task with the ``testcfg/geom_pipe_real.toml`` configuration file;
      * ``'virtual_plant_analyzed'``: the ``virtual_plant`` dataset reconstructed using the ``AnglesAndInternodes`` task with the ``config/geom_pipe_virtual.toml`` configuration file;
      * ``'arabidopsis000'``: 72 snapshots of a virtual _Arabidopsis thaliana_ plant generated with the _Virtual Plant Imager_;

    Examples
    --------
    >>> from plantdb.commons.test_database import setup_test_database
    >>> # EXAMPLE 1 - Download and extract the 'real_plant' test database to `plantdb/tests/testdata` module directory:
    >>> setup_test_database('real_plant')
    PosixPath('/home/jonathan/Projects/plantdb/tests/testdata')
    >>> # EXAMPLE 2 - Download and extract the 'real_plant' and 'virtual_plant' test dataset and configuration pipelines to a temporary folder called 'ROMI_DB':
    >>> setup_test_database(['real_plant', 'virtual_plant'], None, with_configs=True)
    PosixPath('/tmp/ROMI_DB_********')
    """
    # Initialize an empty ROMI database
    out_path = setup_empty_database(out_path)

    # Get the list of all test datasets if required:
    if isinstance(dataset, str) and dataset.lower() == "all":
        dataset = DATASET
    # Create a dict of keyword arguments to use for download:
    kwargs = {'out_path': out_path, 'keep_tmp': keep_tmp, 'force': force}
    # Download the test datasets:
    if isinstance(dataset, list):
        for ds in dataset:
            get_test_dataset(ds, **kwargs)
    else:
        get_test_dataset(dataset, **kwargs)
    # Download configs archive if requested:
    if with_configs:
        _ = get_configs(**kwargs)
    if with_models:
        _ = get_models_dataset(**kwargs)

    logger.info(f"The test database is set up under '{out_path}'.")
    return out_path
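
The way the `dataset` argument is interpreted ("all", a single name, or a list of names) can be summarized as a small normalization step. The sketch below is illustrative only: `resolve_datasets` and `KNOWN_DATASETS` are hypothetical names, the list simply copies the names from the Notes above, and the validation of unknown names is an addition that the function shown above does not perform.

```python
# Dataset names as listed in the Notes above; the real module keeps them in `DATASET`.
KNOWN_DATASETS = [
    "real_plant",
    "virtual_plant",
    "real_plant_analyzed",
    "virtual_plant_analyzed",
    "arabidopsis000",
]


def resolve_datasets(dataset):
    """Normalize the `dataset` argument to a list of names (hypothetical helper)."""
    if isinstance(dataset, str):
        # "all" (case-insensitive) expands to every known dataset.
        names = list(KNOWN_DATASETS) if dataset.lower() == "all" else [dataset]
    else:
        names = list(dataset)
    # ASSUMPTION: early validation, added here for illustration only.
    unknown = [n for n in names if n not in KNOWN_DATASETS]
    if unknown:
        raise ValueError(f"Unknown dataset name(s): {unknown}")
    return names


print(resolve_datasets("real_plant"))                     # ['real_plant']
print(len(resolve_datasets("all")))                       # 5
print(resolve_datasets(["real_plant", "virtual_plant"]))  # ['real_plant', 'virtual_plant']
```

Normalizing to a list first means the download loop only has one code path, whatever form the caller used.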

test_database

test_database(dataset='real_plant_analyzed', out_path=None, **kwargs)

Create and return an FSDB test database.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset | str or list[str] or None | The (list of) test dataset to use, by default 'real_plant_analyzed'. If None, only set up an empty database. | 'real_plant_analyzed' |
| out_path | str or Path | The path where to set up the database. Defaults to the temporary directory under 'ROMI_DB', as defined by TMP_TEST_DIR. | None |

Other Parameters:

| Name | Type | Description |
| --- | --- | --- |
| keep_tmp | bool | Whether to keep the temporary files. Defaults to True, the default of setup_test_database. |
| with_configs | bool | Whether to download the config files. Defaults to False. |
| with_models | bool | Whether to download the trained CNN model files. Defaults to False. |
| force | bool | Whether to force redownload of archive. Defaults to False. |

Returns:

| Type | Description |
| --- | --- |
| FSDB | The FSDB test database. |

Examples:

>>> from plantdb.commons.test_database import test_database
>>> db = test_database()
>>> db.connect()
>>> db.list_scans()
['real_plant_analyzed']
>>> db.path()
PosixPath('/tmp/ROMI_DB_********')
>>> db.disconnect()
Source code in plantdb/commons/test_database.py
def test_database(dataset='real_plant_analyzed', out_path=None, **kwargs):
    """Create and return an FSDB test database.

    Parameters
    ----------
    dataset : str or list[str] or None, optional
        The (list of) test dataset to use, by default 'real_plant_analyzed'.
        If ``None``, only set up an empty database.
    out_path : str or pathlib.Path, optional
        The path where to set up the database.
        Defaults to the temporary directory under 'ROMI_DB', as defined by ``TMP_TEST_DIR``.

    Other Parameters
    ----------------
    keep_tmp : bool
        Whether to keep the temporary files. Defaults to ``True``, the default of ``setup_test_database``.
    with_configs : bool
        Whether to download the config files. Defaults to ``False``.
    with_models : bool
        Whether to download the trained CNN model files. Defaults to ``False``.
    force : bool
        Whether to force redownload of archive. Defaults to ``False``.

    Returns
    -------
    plantdb.commons.fsdb.FSDB
        The FSDB test database.

    Examples
    --------
    >>> from plantdb.commons.test_database import test_database
    >>> db = test_database()
    >>> db.connect()
    >>> db.list_scans()
    ['real_plant_analyzed']
    >>> db.path()
    PosixPath('/tmp/ROMI_DB_********')
    >>> db.disconnect()
    """
    from plantdb.commons.fsdb import FSDB
    if dataset is None:
        return FSDB(setup_empty_database(out_path=out_path))
    else:
        return FSDB(setup_test_database(dataset, out_path=out_path, **kwargs))