utils
Module to handle all utility functions for training, testing and evaluation of a model.
- IMAGERY_CONFIG
Config defining the properties of the imagery used in the experiment.
- DATA_CONFIG
Config defining the properties of the data used in the experiment.
- BAND_IDS
Band IDs and position in sample image.
- batch_flatten(x: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]) → ndarray[Any, dtype[Any]]
Flattens the supplied array with numpy.flatten().
- calc_norm_euc_dist(a: Tensor, b: Tensor) → Tensor
Calculates the normalised Euclidean distance between two vectors.
- check_dict_key(dictionary: dict[Any, Any], key: Any) → bool
Checks if a key exists in a dictionary and whether its value is None or False.
- check_len(param: Any, comparator: Any) → Any | Sequence[Any]
Checks the length of one object against a comparator object.
- Return type:
Any | Sequence[Any]
- check_optional_import_exist(package: str) → bool
Checks if a package is installed. Useful for optional dependencies.
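A package can be checked without importing it by using the standard library's importlib.util.find_spec. The sketch below assumes that approach. It is not necessarily how utils implements the check.

```python
import importlib.util


def check_optional_import_exist(package: str) -> bool:
    """Return True if a top-level package can be found by the import system."""
    # find_spec locates the package without executing its code.
    return importlib.util.find_spec(package) is not None


print(check_optional_import_exist("json"))  # json ships with Python
```

For a dotted name, find_spec first imports the parent package and raises ModuleNotFoundError if that parent is missing. For that reason, this sketch is meant for top-level names only.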
- check_substrings_in_string(string: str, *substrings, all_true: bool = False) → bool
Checks if either any or all substrings are in the provided string.
- Returns:
If all_true==False, True if any substring is in string. If all_true==True, True only if all substrings are in string. False otherwise.
- Return type:
bool
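The any/all switch described above can be sketched in a few lines. This is an illustrative version, not the library source.

```python
def check_substrings_in_string(string: str, *substrings: str, all_true: bool = False) -> bool:
    # all() demands every substring be present; any() needs just one.
    test = all if all_true else any
    return test(sub in string for sub in substrings)


print(check_substrings_in_string("resnet18-segmentation", "resnet", "unet"))
print(check_substrings_in_string("resnet18-segmentation", "resnet", "unet", all_true=True))
```

When no substrings are given, any() returns False and all() returns True. The real function may handle that edge case differently.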
- check_test_empty(pred: Sequence[int] | ndarray[Any, dtype[int64]], labels: Sequence[int] | ndarray[Any, dtype[int64]], class_labels: dict[int, str] | None = None, p_dist: bool = True) → tuple[ndarray[Any, dtype[int64]], ndarray[Any, dtype[int64]], dict[int, str]]
Checks if any of the classes in the dataset were not present in both the predictions and ground truth labels. Returns corrected and re-ordered predictions, labels and class labels.
- Parameters:
pred (Sequence[int] | ndarray[int]) – List of predicted labels.
labels (Sequence[int] | ndarray[int]) – List of corresponding ground truth labels.
class_labels (dict[int, str]) – Optional; Dictionary mapping class labels to class names.
p_dist (bool) – Optional; Whether to print to screen the distribution of classes within each dataset.
- Returns:
tuple of:
List of predicted labels transformed to new classes.
List of corresponding ground truth labels transformed to new classes.
Dictionary mapping new class labels to class names.
- Return type:
tuple[ndarray[Any, dtype[int64]], ndarray[Any, dtype[int64]], dict[int, str]]
- check_within_bounds(bbox: BoundingBox, bounds: BoundingBox) → BoundingBox
Ensures that a bounding box is within another.
- Parameters:
bbox (BoundingBox) – First bounding box that needs to be within the second.
bounds (BoundingBox) – Second outer bounding box to use as the bounds.
- Returns:
Copy of bbox if it is within bounds, or a new bounding box that has been limited to the dimensions of bounds if those of bbox exceeded them.
- Return type:
BoundingBox
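One way to limit bbox to bounds is to clamp each spatial edge. The sketch below makes three assumptions. It uses a stand-in BoundingBox whose field names are assumed to match torchgeo's. It interprets "limited" as clipping rather than shifting. It leaves the temporal extent untouched.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Stand-in with field names assumed to match torchgeo's BoundingBox."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def check_within_bounds(bbox: BoundingBox, bounds: BoundingBox) -> BoundingBox:
    # Pull each spatial edge back inside bounds; edges already inside are unchanged.
    return BoundingBox(
        minx=max(bbox.minx, bounds.minx),
        maxx=min(bbox.maxx, bounds.maxx),
        miny=max(bbox.miny, bounds.miny),
        maxy=min(bbox.maxy, bounds.maxy),
        # The temporal extent is passed through untouched in this sketch.
        mint=bbox.mint,
        maxt=bbox.maxt,
    )


outer = BoundingBox(0, 10, 0, 10, 0, 1)
print(check_within_bounds(BoundingBox(-2, 5, 3, 12, 0, 1), outer))
```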
- class_dist_transform(class_dist: list[tuple[int, int]], matrix: dict[int, int]) → list[tuple[int, int]]
Transforms the class distribution from an old schema to a new one.
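A plausible reading of this transformation is that each (class, count) pair is relabelled through matrix, and counts are pooled whenever several old classes map to the same new class. The sketch below assumes that behaviour. It also assumes the output is ordered by descending count, which the description above does not specify.

```python
from __future__ import annotations

from collections import Counter


def class_dist_transform(class_dist: list[tuple[int, int]], matrix: dict[int, int]) -> list[tuple[int, int]]:
    new_dist: Counter = Counter()
    for old_class, count in class_dist:
        # Old classes mapped to the same new class have their counts pooled.
        new_dist[matrix[old_class]] += count
    # Return (class, count) pairs ordered by descending count.
    return new_dist.most_common()


print(class_dist_transform([(0, 5), (1, 3), (2, 4)], {0: 0, 1: 1, 2: 1}))
```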
- class_frac(patch: Series) → dict[Any, Any]
Computes the fractional sizes of the classes of the given patch and returns a dict of the results.
- class_transform(label: int, matrix: dict[int, int]) → int
Transforms labels from one schema to another mapped by a supplied dictionary.
- class_weighting(class_dist: list[tuple[int, int]], normalise: bool = False) → dict[int, float]
Constructs weights for each class defined by the distribution provided.
Note
Each class weight is the inverse of the number of samples of that class. This means the weights will generally not sum to unity.
- Returns:
Dictionary mapping class number to its weight.
- Return type:
dict[int, float]
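The inverse-frequency rule in the note above can be sketched as follows. The effect of normalise is an assumption here: the sketch rescales the weights so they sum to one.

```python
from __future__ import annotations


def class_weighting(class_dist: list[tuple[int, int]], normalise: bool = False) -> dict[int, float]:
    # Inverse-frequency weighting: a class with fewer samples gets a larger weight.
    weights = {label: 1.0 / count for label, count in class_dist}
    if normalise:
        # Rescale so the weights sum to one (assumed meaning of ``normalise``).
        total = sum(weights.values())
        weights = {label: weight / total for label, weight in weights.items()}
    return weights


print(class_weighting([(0, 4), (1, 1)]))
print(class_weighting([(0, 4), (1, 1)], normalise=True))
```

The sketch raises ZeroDivisionError for a class with a count of zero. Such classes would normally be removed beforehand, for example with find_empty_classes and eliminate_classes.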
- compile_dataset_paths(data_dir: Path | str, in_paths: list[Path | str] | Path | str) → list[str]
Ensures that a list of paths is returned with the data directory prepended, even if a single string is supplied.
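A minimal pathlib sketch of the "always return a list" behaviour follows. It is illustrative only; the real function may also handle cases such as glob patterns, which are not covered here.

```python
from __future__ import annotations

from pathlib import Path


def compile_dataset_paths(data_dir: Path | str, in_paths: list[Path | str] | Path | str) -> list[str]:
    # Promote a single path to a one-element list so both cases share one code path.
    if not isinstance(in_paths, list):
        in_paths = [in_paths]
    # Prepend the data directory to every entry and return plain strings.
    return [str(Path(data_dir) / path) for path in in_paths]


print(compile_dataset_paths("data", "s2_images"))
print(compile_dataset_paths("data", ["s2_images", "landcover"]))
```

pathlib's / operator discards the left-hand side when the right-hand side is absolute. As a result, absolute entries pass through unchanged in this sketch.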
- compute_roc_curves(probs: ndarray[Any, dtype[float64]], labels: Sequence[int] | ndarray[Any, dtype[int64]], class_labels: list[int], micro: bool = True, macro: bool = True) → tuple[dict[Any, float], dict[Any, float], dict[Any, float]]
Computes the false-positive rate, true-positive rate and AUCs for each class using a one-vs-all approach. The micro and macro averages of each of these variables are also computed.
Adapted from scikit-learn's example at: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
- Parameters:
probs (ndarray[float]) – Array of probabilistic predicted classes from the model, where each sample should have a list of the predicted probability for each class.
labels (list[int]) – List of corresponding ground truth labels.
micro (bool) – Optional; Whether to compute the micro average ROC curves.
macro (bool) – Optional; Whether to compute the macro average ROC curves.
- Returns:
tuple of:
Dictionary of false-positive rates for each class and micro and macro averages.
Dictionary of true-positive rates for each class and micro and macro averages.
Dictionary of AUCs for each class and micro and macro averages.
- Return type:
tuple[dict[Any, float], dict[Any, float], dict[Any, float]]
- datetime_reformat(timestamp: str, fmt1: str, fmt2: str) → str
Takes a str representing a time stamp in one format and returns it reformatted into a second.
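Reformatting between two strftime-style formats can be sketched as a parse followed by a render with the standard datetime module. The example converts to the YY_MM_DD form used elsewhere in this module. It assumes fmt1 and fmt2 are strftime format codes.

```python
from datetime import datetime


def datetime_reformat(timestamp: str, fmt1: str, fmt2: str) -> str:
    # Parse using the source format, then render using the target format.
    return datetime.strptime(timestamp, fmt1).strftime(fmt2)


print(datetime_reformat("2020-06-01", "%Y-%m-%d", "%y_%m_%d"))
```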
- dec2deg(dec_co: Sequence[float] | ndarray[Any, dtype[float64]], axis: str = 'lat') → list[str]
Wrapper for deg_to_dms().
- deg_to_dms(deg: float, axis: str = 'lat') → str
Converts decimal degrees of latitude/longitude to degrees, minutes and seconds.
Credit to Gustavo Gonçalves on Stack Overflow. https://stackoverflow.com/questions/2579535/convert-dd-decimal-degrees-to-dms-degrees-minutes-seconds-in-python
- Returns:
String of the input deg in degrees, minutes and seconds in the form Degreesº Minutes Seconds Hemisphere.
- Return type:
str
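The conversion splits the magnitude into whole degrees, whole minutes and remaining seconds, and takes the hemisphere letter from the sign. The sketch below was written independently and is not a copy of the credited answer or of utils. The exact output formatting is an assumption: º, ' and " separators, with seconds to two decimal places. dec2deg is shown as an element-wise wrapper.

```python
from __future__ import annotations

from typing import Sequence


def deg_to_dms(deg: float, axis: str = "lat") -> str:
    # The sign of the input decides the hemisphere letter.
    hemispheres = {"lat": ("N", "S"), "lon": ("E", "W")}
    hemisphere = hemispheres[axis][0 if deg >= 0 else 1]
    # Round the total seconds once, up front, so 59.999... cannot print as 60.00.
    total_seconds = round(abs(deg) * 3600, 2)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{int(degrees)}º{int(minutes)}'{seconds:.2f}\"{hemisphere}"


def dec2deg(dec_co: Sequence[float], axis: str = "lat") -> list[str]:
    # Apply deg_to_dms to each co-ordinate.
    return [deg_to_dms(co, axis) for co in dec_co]


print(deg_to_dms(51.5))
print(dec2deg([-0.1275], axis="lon"))
```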
- dublicator(cls)
Duplicates the decorated transform object to handle paired samples.
- eliminate_classes(empty_classes: list[int] | tuple[int, ...] | ndarray[Any, dtype[int64]], old_classes: dict[int, str], old_cmap: dict[int, str] | None = None) → tuple[dict[int, str], dict[int, int], dict[int, str] | None]
Eliminates empty classes from the class text label and class colour dictionaries and re-normalises the remaining class labels.
This should ensure that the remaining list of classes is still a linearly spaced list of numbers.
- Returns:
tuple of dictionaries:
Mapping of remaining class labels to class names.
Mapping from old to new classes.
Mapping of remaining class labels to RGB colours.
- Return type:
tuple[dict[int, str], dict[int, int], dict[int, str] | None]
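Re-normalising to a linearly spaced list can be done by sorting the surviving labels and renumbering them 0, 1, ..., n - 1. The sketch below assumes survivors keep their original relative order. It also assumes the old-to-new mapping lists only surviving classes. The real function may differ on both points.

```python
from __future__ import annotations

from typing import Iterable


def eliminate_classes(
    empty_classes: Iterable[int],
    old_classes: dict[int, str],
    old_cmap: dict[int, str] | None = None,
) -> tuple[dict[int, str], dict[int, int], dict[int, str] | None]:
    empty = {int(c) for c in empty_classes}
    # Surviving old labels, kept in ascending order.
    remaining = sorted(label for label in old_classes if label not in empty)
    # Renumber survivors onto a contiguous range 0, 1, ..., n - 1.
    conversion = {old: new for new, old in enumerate(remaining)}
    new_classes = {conversion[old]: old_classes[old] for old in remaining}
    new_cmap = None
    if old_cmap is not None:
        new_cmap = {conversion[old]: old_cmap[old] for old in remaining}
    return new_classes, conversion, new_cmap


print(eliminate_classes([1], {0: "water", 1: "ice", 2: "forest"}))
```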
- extract_class_type(var: Any) → type
Ensures that a class type is returned from a variable, whether it is one already or not.
- Parameters:
var (Any) – Variable to get class type from. May already be a class type.
- Returns:
Class type of var.
- Return type:
type
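Since every class is itself an instance of type, a single isinstance check is enough to tell a class from an instance. A minimal sketch, not the library source:

```python
from typing import Any


def extract_class_type(var: Any) -> type:
    # Classes are instances of ``type``; anything else is an instance of some class.
    return var if isinstance(var, type) else type(var)


print(extract_class_type(dict), extract_class_type({"a": 1}))
```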
- fallback_params(key: str, params_a: dict[str, Any], params_b: dict[str, Any], fallback: Any | None = None) → Any
Search for a value associated with key from
- find_best_of(patch_id: str, manifest: DataFrame, selector: Callable[[DataFrame], list[str]] = threshold_scene_select, **kwargs) → list[str]
Finds the scenes sorted by cloud cover using the selector function supplied.
- Parameters:
patch_id (str) – Unique patch ID.
manifest (DataFrame) – DataFrame outlining cloud cover percentages for all scenes in the patches desired.
selector (Callable[[DataFrame], list[str]]) – Optional; Function to use to select scenes. Must take an appropriately constructed DataFrame.
**kwargs – Kwargs for selector.
- Returns:
List of strings representing dates of the selected scenes in YY_MM_DD format.
- Return type:
list[str]
- find_empty_classes(class_dist: list[tuple[int, int]], class_names: dict[int, str]) → list[int]
Finds which classes defined by config files are not present in the dataset.
- Returns:
List of classes not found in class_dist, which are thus empty or not present in the dataset.
- Return type:
list[int]
- find_geo_similar(bbox: BoundingBox, max_r: int = 256) → BoundingBox
Find an image that is within the geo-spatial distance max_r of the initial image. Based on the work of GeoCLR https://arxiv.org/abs/2108.06421v1.
- Parameters:
bbox (BoundingBox) – Original bounding box.
max_r (int) – Optional; Maximum distance the new bounding box can be from the original. Defaults to 256.
- Returns:
New bounding box translated by a random displacement from the original.
- Return type:
BoundingBox
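A random translation of length at most max_r can be produced by drawing a random direction and a random distance. The following are assumptions and may not match the utils implementation: the stand-in BoundingBox, max_r being in the same units as the box co-ordinates, uniform sampling of the distance, and the extra rng parameter (added here only for reproducibility).

```python
from __future__ import annotations

import math
import random
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Stand-in with field names assumed to match torchgeo's BoundingBox."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def find_geo_similar(bbox: BoundingBox, max_r: int = 256, rng: random.Random | None = None) -> BoundingBox:
    rng = rng if rng is not None else random.Random()
    # Random direction and a random distance no greater than max_r.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.uniform(0.0, max_r)
    dx, dy = r * math.cos(angle), r * math.sin(angle)
    # Shift the whole box; its size and temporal extent are preserved.
    return BoundingBox(bbox.minx + dx, bbox.maxx + dx, bbox.miny + dy, bbox.maxy + dy, bbox.mint, bbox.maxt)


original = BoundingBox(0.0, 100.0, 0.0, 100.0, 0.0, 1.0)
print(find_geo_similar(original, max_r=256, rng=random.Random(42)))
```

Drawing r uniformly puts more samples near the original than a uniform draw over the disc would. Using r = max_r * math.sqrt(rng.random()) instead would spread them evenly by area.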
- find_modes(labels: Iterable[int], plot: bool = False, classes: dict[int, str] | None = None, cmap_dict: dict[int, str] | None = None) → list[tuple[int, int]]
Finds the modal distribution of the classes within the labels provided.
Can plot the results as a pie chart if plot=True.
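Without the plotting options, a (class, count) modal distribution can be sketched with collections.Counter. Ordering by descending count is an assumption about the real function's output.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def find_modes(labels: Iterable[int]) -> list[tuple[int, int]]:
    # most_common() yields (class, count) pairs, highest count first.
    return Counter(labels).most_common()


print(find_modes([2, 0, 2, 1, 2, 0]))
```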
- find_tensor_mode(mask: LongTensor) → LongTensor
Finds the mode value in a LongTensor.
- Parameters:
mask (LongTensor) – Tensor to find modal value in.
- Returns:
A 0D, 1-element tensor containing the modal value.
- Return type:
LongTensor
Added in version 0.22.
- func_by_str(module_path: str, func: str) → Callable[..., Any]
Gets the constructor or callable within a module defined by the names supplied.
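Looking up a callable from a module path and a name is commonly done with importlib.import_module followed by getattr. The sketch assumes that approach.

```python
import importlib
from typing import Any, Callable


def func_by_str(module_path: str, func: str) -> Callable[..., Any]:
    # Import (or fetch from sys.modules) the module, then look the attribute up by name.
    module = importlib.import_module(module_path)
    return getattr(module, func)


sqrt = func_by_str("math", "sqrt")
print(sqrt(16.0))
```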
- get_centre_loc(bounds: BoundingBox) → tuple[float, float]
Gets the centre co-ordinates of the supplied bounding box.
- Parameters:
bounds (BoundingBox) – Bounding box to find the centre co-ordinates of.
- Returns:
tuple of the centre x, y co-ordinates of the bounding box.
- Return type:
tuple[float, float]
- get_cuda_device(device_sig: int | str = 'cuda:0') → device
Finds and returns the CUDA device, if one is available; otherwise, returns the CPU as the device. Assumes there is at most one CUDA device.
- is_notebook() → bool
Check if this code is being executed from a Jupyter Notebook or not.
Adapted from https://gist.github.com/thomasaarholt/e5e2da71ea3ee412616b27d364e3ae82
- Returns:
True if executed by a Jupyter kernel. False if not.
- Return type:
bool
- labels_to_ohe(labels: Sequence[int], n_classes: int) → ndarray[Any, dtype[Any]]
Convert an iterable of indices to one-hot encoded (OHE) labels.
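A compact NumPy idiom for one-hot encoding is to index an identity matrix with the labels. Each label selects one row. The float64 output dtype is a property of this sketch; the real function's dtype is not stated.

```python
from __future__ import annotations

from typing import Sequence

import numpy as np


def labels_to_ohe(labels: Sequence[int], n_classes: int) -> np.ndarray:
    # Row i of the n_classes x n_classes identity matrix is the one-hot vector for class i,
    # so fancy-indexing the identity with the labels picks one row per label.
    return np.eye(n_classes)[np.asarray(labels)]


print(labels_to_ohe([0, 2, 1], n_classes=3))
```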
- lat_lon_to_loc(lat: str | float, lon: str | float) → str
Takes a latitude-longitude co-ordinate and returns a string of the semantic location.
- make_classification_report(pred: Sequence[int] | ndarray[Any, dtype[int64]], labels: Sequence[int] | ndarray[Any, dtype[int64]], class_labels: dict[int, str] | None = None, print_cr: bool = True, p_dist: bool = False) → DataFrame
Generates a DataFrame of the precision, recall, F1 score and support of the supplied predictions and ground truth labels.
Uses scikit-learn's classification_report to calculate the metrics: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
- Parameters:
pred (list[int] | ndarray[int]) – List of predicted labels.
labels (list[int] | ndarray[int]) – List of corresponding ground truth labels.
class_labels (dict[int, str]) – Dictionary mapping class labels to class names.
print_cr (bool) – Optional; Whether to print a copy of the classification report DataFrame put through tabulate.
p_dist (bool) – Optional; Whether to print to screen the distribution of classes within each dataset.
- Returns:
Classification report with the precision, recall, F1 score and support for each class in a DataFrame.
- Return type:
DataFrame
- mask_to_ohe(mask: LongTensor, n_classes: int) → LongTensor
Converts a segmentation mask to one-hot encoding (OHE).
- Parameters:
mask (LongTensor) – Segmentation mask to convert.
n_classes (int) – Optional; Number of classes in total across the dataset. If not provided, the number of classes is inferred from those found in mask.
Note
It is advised that one provides n_classes, as there is a fair chance that not all possible classes are in mask. Inferring the number of classes from those present in mask is therefore likely to result in shape mismatches between masks in a batch.
- Returns:
mask converted to OHE. The one-hot encoding is placed in the leading dimension (CxHxW), where C is the number of classes.
- Return type:
LongTensor
Added in version 0.23.
- mask_transform(array: ndarray[Any, dtype[int64]], matrix: dict[int, int]) → ndarray[Any, dtype[int64]]
- mask_transform(array: LongTensor, matrix: dict[int, int]) → LongTensor
Transforms all labels of an N-dimensional array from one schema to another mapped by a supplied dictionary.
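For the ndarray overload, one approach is boolean-mask assignment into a copy. Each selection is made against the untouched input, so a swap such as {0: 1, 1: 0} does not cascade. The sketch covers NumPy arrays only and assumes labels missing from matrix are left unchanged.

```python
from __future__ import annotations

import numpy as np


def mask_transform(array: np.ndarray, matrix: dict[int, int]) -> np.ndarray:
    out = array.copy()
    for old, new in matrix.items():
        # Select against the original array so chained mappings do not cascade.
        out[array == old] = new
    return out


mask = np.array([[0, 1], [2, 1]])
print(mask_transform(mask, {0: 1, 1: 0}))
```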
- mkexpdir(name: str, results_dir: Path | str = 'results') → None
Makes a new directory below the results directory with name provided. If directory already exists, no action is taken.
- modes_from_manifest(manifest: DataFrame, classes: dict[int, str], plot: bool = False, cmap_dict: dict[int, str] | None = None) → list[tuple[int, int]]
Uses the dataset manifest to calculate the fractional size of the classes.
- Returns:
Modal distribution of classes in the dataset provided.
- Return type:
list[tuple[int, int]]
- pair_collate(func: Callable[[Any], Any]) → Callable[[Any], Any]
Wraps a collator function so that it can handle paired samples.
Warning
NOT compatible with DistributedDataParallel due to its use of pickle. Use stack_sample_pairs() instead as a direct replacement for stack_samples().
- pair_return(cls)
Wrapper for GeoDataset classes to be able to handle pairs of queries and returns.
Warning
NOT compatible with DistributedDataParallel due to its use of pickle. Use PairedGeoDataset directly instead, supplying the dataset to wrap on init.
- Raises:
AttributeError – If an attribute cannot be found in either the Wrapper or the wrapped dataset.
- print_class_dist(class_dist: list[tuple[int, int]], class_labels: dict[int, str] | None = None) → None
Prints the supplied class_dist in a pretty table format using tabulate.
- return_updated_kwargs(func: Callable[..., tuple[Any, ...]]) → Callable[..., tuple[Any, ...]]
Decorator that allows the kwargs supplied to the wrapped function to be returned with updated values.
Assumes that the wrapped function returns a dict in the last position of the tuple of returns, with keys in kwargs that have new values.
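Under the contract described above, a decorator can merge the trailing dict into the caller's kwargs and return the full, updated kwargs in its place. That replacement is an assumption about the real decorator's output. The halve_lr function below is a hypothetical example.

```python
from __future__ import annotations

import functools
from typing import Any, Callable


def return_updated_kwargs(func: Callable[..., tuple[Any, ...]]) -> Callable[..., tuple[Any, ...]]:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> tuple[Any, ...]:
        results = func(*args, **kwargs)
        # The wrapped function's last return value holds the kwargs that changed.
        kwargs.update(results[-1])
        # Swap that dict for the full, updated kwargs.
        return (*results[:-1], kwargs)

    return wrapper


@return_updated_kwargs
def halve_lr(step: int, **kwargs: Any) -> tuple[int, dict[str, Any]]:
    # Hypothetical example: advance the step and halve the learning rate.
    return step + 1, {"lr": kwargs["lr"] / 2}


step, kwargs = halve_lr(0, lr=0.1, momentum=0.9)
print(step, kwargs)
```

Because **kwargs builds a fresh dict on each call, updating it inside the wrapper never mutates a dict the caller passed in.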
- run_tensorboard(exp_name: str, path: str | list[str] | tuple[str, ...] | Path = '', env_name: str = 'env', host_num: str | int = 6006, _testing: bool = False) → int | None
Runs the TensorBoard logs and hosts them on a local webpage.
- Parameters:
exp_name (str) – Unique name of the experiment to run the logs of.
path (str | list[str] | tuple[str, ...] | Path) – Path to the directory holding the log. Can be a string or a list of strings for each sub-directory.
env_name (str) – Name of the conda environment to run TensorBoard in.
host_num (str | int) – Port number on the local host that TensorBoard will be hosted on.
- Raises:
KeyError – If path is None but the default cannot be found in config, return None.
- Returns:
Exit code for testing purposes. None under normal use.
- Return type:
int | None
- set_seeds(seed: int) → None
Set torch, numpy and random seeds for reproducibility.
- Parameters:
seed (int) – Seed number to set all seeds to.
- tg_to_torch(cls, keys: Sequence[str] | None = None)
Ensures the wrapped transform can handle both Tensor and torchgeo-style dict inputs.
Warning
NOT compatible with DistributedDataParallel due to its use of pickle. This functionality is now handled within MinervaCompose.
- threshold_scene_select(df: DataFrame, thres: float = 0.3) → list[str]
Selects all scenes in a patch with a cloud cover less than the threshold provided.
- Returns:
List of strings representing dates of the selected scenes in YY_MM_DD format.
- Return type:
list[str]
- transform_coordinates(x: Sequence[float], y: Sequence[float], src_crs: CRS, new_crs: CRS = WGS84) → tuple[Sequence[float], Sequence[float]]
- transform_coordinates(x: Sequence[float], y: float, src_crs: CRS, new_crs: CRS = WGS84) → tuple[Sequence[float], Sequence[float]]
- transform_coordinates(x: float, y: Sequence[float], src_crs: CRS, new_crs: CRS = WGS84) → tuple[Sequence[float], Sequence[float]]
- transform_coordinates(x: float, y: float, src_crs: CRS, new_crs: CRS = WGS84) → tuple[float, float]
Transforms co-ordinates from one CRS to another.
- tsne_cluster(embeddings: ndarray[Any, dtype[Any]], n_dim: int = 2, lr: str = 'auto', n_iter: int = 1000, verbose: int = 1, perplexity: int = 30) → Any
Trains a t-SNE algorithm on the embeddings passed.
- Parameters:
embeddings (ndarray[Any]) – Embeddings outputted from the model.
n_dim (int, optional) – Number of dimensions to reduce embeddings to. Defaults to 2.
lr (str, optional) – Learning rate. Defaults to 'auto'.
n_iter (int, optional) – Number of iterations. Defaults to 1000.
verbose (int, optional) – Verbosity. Defaults to 1.
perplexity (int, optional) – Relates to the number of nearest neighbours used. Must be less than the length of embeddings.
- Returns:
Embeddings transformed to n_dim dimensions using t-SNE.
- Return type:
Any