Experiment Configs

The most comprehensive way to configure an experiment in minerva is to use YAML files. This guide walks through all the possible config options and their structure, and explains how to use a config file for an experiment.

A good example to look at is example_config.yml, which demonstrates how to construct a master config to define an experiment in minerva:
---
#       *       *    __  ________   ____________ _    _____
#   *        *      /  |/  /  _/ | / / ____/ __ \ |  / /   |  *           *
#       *          / /|_/ // //  |/ / __/ / /_/ / | / / /| |     *
#   *       *     / /  / // // /|  / /___/ _, _/| |/ / ___ |               *
#    *           /_/  /_/___/_/ |_/_____/_/ |_| |___/_/  |_|     *   *
#
#                          EXAMPLE MASTER CONFIG FILE
#
# === PATHS ===================================================================
data_root: tests/fixtures/data
results_dir: tests/tmp/results
cache_dir: tests/tmp/cache

# === HYPERPARAMETERS =========================================================
# ---+ Model Specification +---------------------------------------------------
# Name of model. This is no longer used for the model class (see model_params).
model_name: FCN32ResNet18-test

# Type of model. Can be mlp, scene classifier, segmentation, ssl or siamese.
model_type: segmentation

# ---+ Sizing +----------------------------------------------------------------
batch_size: 8                               # Number of samples in each batch.
input_size: [4, 32, 32]   # patch_size plus leading channel dim.
patch_size: '${to_patch_size: ${input_size}}'  # 2D tuple or float.
n_classes: 8                                   # Number of classes in dataset.

# ---+ Experiment Execution +--------------------------------------------------
max_epochs: 4                         # Maximum number of training epochs.
pre_train: false                      # Activate pre-training mode.
fine_tune: false                      # Activate fine-tuning mode.
elim: true                            # Eliminates empty classes from schema.
balance: true                         # Balances dataset classes.
torch_compile: true                   # Wrap model in `torch.compile`.

# ---+ Optimisers +-------------------------------------------------------------
lr: 1.0E-2                            # Learning rate of optimiser.
optim_func: SGD                       # Name of the optimiser function.

# ---+ Model Parameters +------------------------------------------------------
model_params:
    _target_: minerva.models.FCN32ResNet18
    input_size: ${input_size}
    n_classes: ${n_classes}
    # any other params...

# ---+ Optimiser Parameters +--------------------------------------------------
optimiser:
    _target_: torch.optim.${optim_func}
    lr: ${lr}

# ---+ Scheduler Parameters +--------------------------------------------------
scheduler:
    _target_: torch.optim.lr_scheduler.LinearLR
    start_factor: 1.0
    end_factor: 0.5
    total_iters: 5

# ---+ Loss Function Parameters +----------------------------------------------
loss_params:
    _target_: torch.nn.CrossEntropyLoss

# ---+ Dataloader Parameters +-------------------------------------------------
loader_params:
    num_workers: 0
    pin_memory: true

# === MODEL IO & LOGGING ======================================================
# ---+ wandb Logging +---------------------------------------------------------
wandb_log: true              # Activates wandb logging.
project: pytest              # Define the project name for wandb.
wandb_dir: /test/tmp/wandb   # Directory to store wandb logs locally.

# ---+ Collator +--------------------------------------------------------------
collator: torchgeo.datasets.stack_samples

# === TASKS ===================================================================
tasks:
  fit-train:
    _target_: minerva.tasks.StandardEpoch
    train: true
    record_float: true

    imagery_config: '${oc.create:${cfg_load: minerva/inbuilt_cfgs/dataset/NAIP.yaml}}'  # yamllint disable-line rule:line-length
    data_config: '${oc.create:${cfg_load: minerva/inbuilt_cfgs/dataset/Chesapeake7.yaml}}'  # yamllint disable-line rule:line-length

    # ---+ Dataset Parameters +----------------------------------------
    dataset_params:
        sampler:
            _target_: torchgeo.samplers.RandomGeoSampler
            roi: false
            size: ${patch_size}
            length: 32

        image:
            transforms: false
            subdatasets:
                images_1:
                    _target_: minerva.datasets.__testing.TstImgDataset
                    paths: NAIP
                    res: 1.0

                image2:
                    _target_: minerva.datasets.__testing.TstImgDataset
                    paths: NAIP
                    res: 1.0

        mask:
            transforms: false
            _target_: minerva.datasets.__testing.TstMaskDataset
            paths: Chesapeake7
            res: 1.0

  fit-val:
    _target_: minerva.tasks.StandardEpoch
    train: false
    record_float: true

    imagery_config: '${oc.create:${cfg_load: minerva/inbuilt_cfgs/dataset/NAIP.yaml}}'  # yamllint disable-line rule:line-length
    data_config: '${oc.create:${cfg_load: minerva/inbuilt_cfgs/dataset/Chesapeake7.yaml}}'  # yamllint disable-line rule:line-length

    # ---+ Minerva Inbuilt Logging Functions +-------------------------
    task_logger: minerva.logger.tasklog.SupervisedTaskLogger
    model_io: minerva.modelio.supervised_torchgeo_io

    # ---+ Dataset Parameters +----------------------------------------
    dataset_params:
        sampler:
            _target_: torchgeo.samplers.RandomGeoSampler
            roi: false
            size: ${patch_size}
            length: 32

        image:
            transforms: false
            _target_: minerva.datasets.__testing.TstImgDataset
            paths: NAIP
            res: 1.0

        mask:
            transforms: false
            _target_: minerva.datasets.__testing.TstMaskDataset
            paths: Chesapeake7
            res: 1.0

  test-test:
    _target_: minerva.tasks.StandardEpoch
    record_float: true

    imagery_config: '${oc.create:${cfg_load: minerva/inbuilt_cfgs/dataset/NAIP.yaml}}'  # yamllint disable-line rule:line-length
    data_config: '${oc.create:${cfg_load: minerva/inbuilt_cfgs/dataset/Chesapeake7.yaml}}'  # yamllint disable-line rule:line-length

    # ---+ Minerva Inbuilt Logging Functions +-------------------------
    task_logger: minerva.logger.tasklog.SupervisedTaskLogger
    model_io: minerva.modelio.supervised_torchgeo_io

    # ---+ Dataset Parameters +----------------------------------------
    dataset_params:
        sampler:
            _target_: torchgeo.samplers.RandomGeoSampler
            roi: false
            size: ${patch_size}
            length: 32

        image:
            transforms: false
            _target_: minerva.datasets.__testing.TstImgDataset
            paths: NAIP
            res: 1.0

        mask:
            transforms: false
            _target_: minerva.datasets.__testing.TstMaskDataset
            paths: Chesapeake7
            res: 1.0

# === PLOTTING OPTIONS ========================================================
plots:
    History: true   # Plot of the training and validation metrics over epochs.
    CM: true        # Confusion matrix.
    Pred: true      # Pie chart of the distribution of the predicted classes.
    ROC: true       # Receiver Operating Characteristic (ROC) curve for each class.
    micro: true     # Include micro averaging in ROC plot.
    macro: true     # Include macro averaging in ROC plot.
    Mask: true      # Plot predicted masks against ground truth and imagery.

# === MISCELLANEOUS OPTIONS ===================================================
# ---+ Early Stopping +--------------------------------------------------------
stopping:
    patience: 1    # No. of val epochs with increasing loss before stopping.
    verbose: true  # Verbosity of early stopping prints to stdout.

# ---+ Verbosity and Saving +--------------------------------------------------
verbose: true           # Verbosity of Trainer print statements to stdout.
save: true              # Saves created figures to file.
show: false             # Shows created figures in a pop-up window.
p_dist: true            # Shows the distribution of classes to stdout.
plot_last_epoch: true   # Plot the results of the last training and val epochs.

# 'opt' to ask at runtime; 'auto' or true to do so automatically;
# false, null etc. to not.
save_model: true

# ---+ Other +-----------------------------------------------------------------
# 'opt' to ask at runtime; 'auto' or true to do so automatically;
# false, null etc. to not.
run_tensorboard: false
calc_norm: false

Paths

Paths to required directories are defined in the data_root, results_dir and cache_dir keys.

Example keys defining the paths to the directories needed in an experiment.
# === PATHS ===================================================================
data_root: tests/fixtures/data
results_dir: tests/tmp/results
cache_dir: tests/tmp/cache
data_root

Path to the directory where the input data is stored. Can be relative or absolute.

Type:

str

cache_dir

Path to the cache directory, which stores dataset manifests and the latest/best version of a model. Can be relative or absolute.

Type:

str

results_dir

Path to the results directory where the results from all experiments will be stored. Can be relative or absolute.

Type:

str

Hyperparameters

This section of the config file covers the hyperparameters of the model and experiment. The most important of these are now top-level variables in the config. Most are also accessible from the CLI.

Model Specification

These parameters focus on defining the model, such as class, version and type.

# Name of model. No longer used for the model class (see model_params).
model_name: FCN32ResNet18-MkI

# Type of model.
model_type: segmentation
model_name

Name of the model. Used to create the unique exp_name that is generated dynamically for each experiment run.

Type:

str

model_type

Type of model. Can contain these keywords, separated by hyphens (a combined example follows the list):
  • "segmentation"

  • "scene_classifier"

  • "mlp"

  • "ssl"

  • "siamese"

  • "change_detection"

  • "multilabel"

Type:

str

Value:

"scene_classifier"
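
Since the keywords can be combined with hyphens, a composite type might look like the following. This particular combination is hypothetical; check which combinations the models you use actually support.

model_type: siamese-segmentation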

Sizing

These parameters concern the shapes and sizes of the IO to the model.

batch_size: 8             # Number of samples in each batch.
patch_size: [32, 32]      # 2D tuple or float.
input_size: [4, 32, 32]   # patch_size plus leading channel dim.
n_classes: 8              # Number of classes in dataset.
batch_size

Number of samples in each batch.

Type:

int

patch_size

Defines the shape of the patches in the dataset.

Type:

Tuple[int, int]

input_size

The patch_size plus the leading channel dimension.

Type:

Tuple[int, int, int]

n_classes

Number of possible classes in the dataset.

Type:

int
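
As in the master config above, patch_size need not be set by hand; it can instead be derived from input_size with the to_patch_size resolver:

input_size: [4, 32, 32]                        # (C, H, W)
patch_size: '${to_patch_size: ${input_size}}'  # resolves to [32, 32]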

Experiment Execution

These parameters control the execution of the model fitting, such as the number of epochs, the type of job, and class balancing.

max_epochs: 5                         # Maximum number of training epochs.
pre_train: false                      # Activate pre-training mode.
fine_tune: false                      # Activate fine-tuning mode.
elim: true                            # Eliminates empty classes from schema.
balance: true                         # Balances dataset classes.
max_epochs

Maximum number of epochs of training and validation.

Type:

int

Value:

5

pre_train

Defines this as a pre-train experiment. In this case, the backbone of the model will be saved to the cache at the end of training.

Type:

bool

Value:

False

fine_tune

Defines this as a fine-tuning experiment.

Type:

bool

Value:

False

elim

Eliminates classes that have no samples and reorders the class labels so that they run from 0 to n-1, where n is the reduced number of classes. minerva ensures that labels are converted between the old and new schemes seamlessly.

Type:

bool

Value:

False

balance

Activates class balancing. For model_type="scene_classifier" or model_type="mlp", over- and under-sampling will be used. For model_type="segmentation", class weighting will be applied to the loss function.

Type:

bool

Value:

False

Loss and Optimisers

These parameters set the most important aspects of the loss function and optimiser.

loss_func: CrossEntropyLoss           # Name of the loss function to use.
lr: 1.0E-2                            # Learning rate of optimiser.
optim_func: SGD                       # Name of the optimiser function.
loss_func

Name of the loss function to use.

Type:

str

lr

Learning rate of the optimiser.

Type:

float

optim_func

Name of the optimiser function.

Type:

str

Model Parameters

These are the parameters passed to the model class to initialise it.

model_params:
    _target_: minerva.models.FCN32ResNet18
    input_size: ${input_size}
    n_classes: ${n_classes}
    # any other params...

Two common parameters are:

input_size

Shape of the input to the model. Typically in CxHxW format. Should align with the values given for patch_size.

Type:

list

n_classes

Number of possible classes to predict in the output. It is best to pass this via interpolation using ${n_classes}.

Type:

int

Any other parameters that the model expects can be added to the model_params dict, as shown below.
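
For instance, a model taking an extra keyword argument could be configured as follows. The batch_norm argument here is purely hypothetical; use whatever arguments your model class actually accepts.

model_params:
    _target_: minerva.models.FCN32ResNet18
    input_size: ${input_size}
    n_classes: ${n_classes}
    batch_norm: true   # hypothetical extra argument accepted by the model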

Optimiser Parameters

Here’s where to place any additional parameters for the optimiser, alongside the already handled learning rate, lr. The _target_ key gives the import path to the optimiser class, so a non-torch optimiser can be used by specifying its full import path.

optimiser:
    _target_: torch.optim.${optim_func}
    lr: ${lr}
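
For example, a sketch of passing extra arguments through to the optimiser, assuming optim_func is SGD (momentum and weight_decay are standard torch.optim.SGD arguments):

optimiser:
    _target_: torch.optim.${optim_func}
    lr: ${lr}
    momentum: 0.9          # standard torch.optim.SGD argument
    weight_decay: 1.0E-4   # L2 penalty, also a standard SGD argument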

Loss Parameters

Here’s where to specify any additional parameters for the loss function. The _target_ key gives the import path, so a non-torch loss function can be used by specifying its full import path.

loss:
    _target_: torch.nn.${loss_func}
    # any other params...
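
For instance, torch.nn.CrossEntropyLoss accepts an ignore_index argument to exclude a label from the loss calculation:

loss:
    _target_: torch.nn.${loss_func}
    ignore_index: 255   # assuming loss_func is CrossEntropyLoss; label 255 is ignored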

Dataloader Parameters

Finally, this is where to define parameters for the DataLoader. Unlike other parameters, there is no _target_ field, as the class is locked to torch.utils.data.DataLoader.

loader_params:
    num_workers: 1
    pin_memory: true
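
Any other torch.utils.data.DataLoader arguments can also go here, for example:

loader_params:
    num_workers: 4
    pin_memory: true
    persistent_workers: true   # keep worker processes alive between epochs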

Model IO & Logging

These parameters configure how to handle different types of input/output to the model and how to handle logging of the model.

wandb Logging

Here’s where to define how Weights and Biases (wandb) behaves in minerva.

wandb_log: true              # Activates wandb logging.
project: pytest              # Define the project name for wandb.
wandb_dir: /test/tmp/wandb   # Directory to store wandb logs locally.

Minerva Inbuilt Logging Functions

In addition, there are options for defining the task logger, step logger and model IO function using inbuilt minerva functionality:

task_logger: minerva.logger.tasklog.SupervisedTaskLogger
step_logger:
    _target_: minerva.logger.steplog.SupervisedStepLogger
    # any other params...

model_io: minerva.modelio.supervised_torchgeo_io

record_int: true    # Store integer results in memory.
record_float: true  # Store floating point results too. Beware memory overload!
task_logger

Specify the task logger to use. Must be the import path of a task logger class within minerva.logger.tasklog.

Type:

str

step_logger

Specify the step logger to use. Given with a _target_ import path to a step logger class within minerva.logger.steplog, plus any other parameters, as in the example above.

Type:

dict

model_io

Specify the IO function used to handle model IO during fitting. Must be the name of a function within minerva.modelio.

Type:

str

record_int

Store the integer results of each epoch in memory, such as the predictions, ground truth, etc.

Type:

bool

record_float

Store the floating point results of each epoch in memory such as the raw predicted probabilities.

Warning

Could cause a memory overload issue with large datasets or systems with small RAM capacity.

Type:

bool

Collator

The collator is the function that collates samples from the dataset to make a mini-batch. It can be defined using the simple collator param at the global level.

collator: torchgeo.datasets.stack_samples
collator

Dot-based import path to the desired collator.

Type:

str
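
Any importable collation function with a compatible signature could be swapped in. For example, for non-geospatial datasets, torch's default collator might be used (assuming the samples are of types it can handle):

collator: torch.utils.data.default_collate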

Plots Dictionary

To define which plots to make from the results of testing, use the plots sub-dictionary with these keys:

Example plots dictionary.
plots:
    History: true
    CM: false
    Pred: false
    ROC: false
    micro: false
    macro: true
    Mask: false
History

Plot a graph of the model history. By default, this will plot a graph of any metrics with keys containing "train" or "val".

Type:

bool

CM

Plots a confusion matrix.

Type:

bool

Pred

Plots a pie chart of the relative sizes of the classes within the predictions from the model.

Type:

bool

ROC

Plots a Receiver Operating Characteristic (ROC) curve, including Area Under Curve (AUC) scores.

Type:

bool

micro

Only used with ROC=True. ROC plot includes micro-average ROC.

Warning

Adding this plot can be very computationally and memory intensive. Avoid use with large datasets!

Type:

bool

macro

Only used with ROC=True. ROC plot includes macro-average ROC.

Type:

bool

Mask

Plots a comparison of the predicted segmentation masks, the ground truth and the original RGB imagery for a random selection of samples passed to the model.

Type:

bool

Miscellaneous Options

And finally, this section holds various other options.

Early Stopping

Here’s where to define the behaviour of early stopping functionality.

stopping:
    patience: 2    # No. of val epochs with increasing loss before stopping.
    verbose: true  # Verbosity of early stopping prints to stdout.
stopping

Dictionary to hold the parameters defining the early stopping functionality. If no dictionary is given, it is assumed that there will be no early stopping.

Type:

dict

patience

Number of validation epochs with increasing loss from the lowest recorded validation loss before stopping the experiment.

Type:

int

verbose

Verbosity of the early stopping prints to stdout.

Type:

bool

Verbosity and Saving

These parameters dictate the behaviour of the outputs to stdout and saving results.

verbose: true           # Verbosity of Trainer print statements to stdout.
save: true              # Saves created figures to file.
show: false             # Shows created figures in a pop-up window.
p_dist: true            # Shows the distribution of classes to stdout.
plot_last_epoch: true   # Plot the results of the last training and val epochs.

# 'opt' to ask at runtime; 'auto' or true to do so automatically;
# false, null etc. to not.
save_model: true
verbose

Verbosity of Trainer prints to stdout.

Type:

bool

save

Whether to save plots created to file or not.

Type:

bool

Value:

True

show

Whether to show plots created in a window or not.

Warning

Do not use in a terminal-less environment, e.g. SLURM.

Type:

bool

Value:

False

p_dist

Whether to print the distribution of classes within the data to stdout.

Type:

bool

Value:

False

plot_last_epoch

Whether to plot the results from the final training and validation epochs.

Type:

bool

Value:

False

save_model

Whether to save the model at the end of testing. Must be True, False, "auto" or "opt". Setting "auto" or True will automatically save the model to file. "opt" will ask the user whether to at runtime. False will not save the model and will not ask the user at runtime.

Type:

str | bool

Value:

False
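
For example, to be prompted at runtime instead of saving automatically:

save_model: opt   # ask whether to save the model at runtime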

Other

All other options belong in this section.

# 'opt' to ask at runtime; 'auto' or true to do so automatically;
# false, null etc. to not.
run_tensorboard: false
calc_norm: false
run_tensorboard

Whether to run the TensorBoard logs at the end of testing. Must be True, False, "auto" or "opt". Setting "auto" or True will automatically locate and run the logs in a local browser. "opt" will ask the user whether to at runtime. False will not run the logs and will not ask the user at runtime.

Type:

str | bool

Value:

False

calc_norm

Deprecated: Calculates the gradient norms.

Type:

bool

Value:

False