Data#

Generic Interfaces#

Dataset#

class monai.data.Dataset(data, transform=None)[source]#

A generic dataset with a length property and an optional callable data transform applied when fetching a data sample. If slicing indices are passed, it returns a PyTorch Subset, for example: data: Subset = dataset[1:4]. For more details, please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset

For example, typical input data can be a list of dictionaries:

[{                            {                            {
     'img': 'image1.nii.gz',      'img': 'image2.nii.gz',      'img': 'image3.nii.gz',
     'seg': 'label1.nii.gz',      'seg': 'label2.nii.gz',      'seg': 'label3.nii.gz',
     'extra': 123                 'extra': 456                 'extra': 789
 },                           },                           }]

__getitem__(index)[source]#: Returns a Subset if index is a slice or Sequence, a data item otherwise.

__init__(data, transform=None)[source]#

Parameters:

data (Sequence) – input data to load and transform to generate dataset for model.
transform (UnionType[Sequence[Callable], Callable, None]) – a callable, sequence of callables or None. If transform is not a Compose instance, it will be wrapped in a Compose instance. Sequences of callables are applied in order and if None is passed, the data is returned as is.
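
The transform handling and slicing behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MONAI's implementation: `apply_in_order` and `TinyDataset` are hypothetical names, and slicing returns a plain list here instead of a PyTorch Subset so that the sketch has no dependencies.

```python
from typing import Any, Callable, Sequence, Union

TransformArg = Union[Sequence[Callable], Callable, None]


def apply_in_order(transform: TransformArg, item: Any) -> Any:
    """Apply a callable, a sequence of callables (in order), or nothing."""
    if transform is None:
        return item  # no transform: the data is returned as is
    if callable(transform):
        return transform(item)
    for t in transform:  # a sequence of callables is applied left to right
        item = t(item)
    return item


class TinyDataset:
    """Hypothetical stand-in showing length, indexing, and slicing."""

    def __init__(self, data: Sequence, transform: TransformArg = None) -> None:
        self.data = data
        self.transform = transform

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # assumption: a list replaces the torch Subset to stay dependency-free
            return [self[i] for i in range(*index.indices(len(self)))]
        return apply_in_order(self.transform, self.data[index])


ds = TinyDataset([1, 2, 3, 4], transform=[lambda x: x + 1, lambda x: x * 10])
first = ds[0]      # (1 + 1) * 10
window = ds[1:3]   # items at positions 1 and 2
```

The order of the sequence matters: swapping the two lambdas would give `1 * 10 + 1` for the first item instead.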

IterableDataset#

class monai.data.IterableDataset(data, transform=None)[source]#

A generic dataset for an iterable data source, with an optional callable data transform applied when fetching a data sample. Inherits from PyTorch IterableDataset: https://pytorch.org/docs/stable/data.html?highlight=iterabledataset#torch.utils.data.IterableDataset. For example, typical input data can be a web data stream that supports multi-process access.

To accelerate the loading process, it supports multi-processing based on PyTorch DataLoader workers: each process executes the transforms on its share of the loaded data. Note that the order of output data may not match the data source in multi-processing mode. Each worker process also holds its own copy of the dataset object, so the data source or the DataLoader must guarantee process safety.
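
One common way to split an iterable source among workers is round-robin assignment by item position. The sketch below assumes that scheme; `iterate_shard`, `worker_id` and `num_workers` are illustrative names, and a real DataLoader would supply the worker information itself.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


def iterate_shard(
    data: Iterable[Any],
    worker_id: int,
    num_workers: int,
    transform: Optional[Callable] = None,
) -> Iterator[Any]:
    """Yield only the items assigned to one worker, transformed on the fly."""
    for i, item in enumerate(data):
        if i % num_workers == worker_id:  # round-robin assignment (assumed scheme)
            yield transform(item) if transform is not None else item


stream = list(range(6))
worker0 = list(iterate_shard(stream, worker_id=0, num_workers=2, transform=lambda x: x * x))
worker1 = list(iterate_shard(stream, worker_id=1, num_workers=2, transform=lambda x: x * x))
```

Each worker sees a disjoint part of the stream, so together they cover every item once. The order in which their outputs are merged depends on scheduling, which is why the output order may differ from the source order.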

__init__(data, transform=None)[source]#

Parameters:

data (Iterable[Any]) – input data source to load and transform to generate dataset for model.
transform (UnionType[Callable, None]) – a callable data transform on input data.

DatasetFunc#

class monai.data.DatasetFunc(data, func, **kwargs)[source]#

Execute a function on the input data and use the output as a new Dataset. It can be used to load or fetch the basic dataset items, such as a list of image and label paths, or chained to execute more complicated logic, such as partition_dataset, resample_datalist, etc. The data arg is passed as the first arg of the callable func. Usage example:

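import monai
import torch
from monai.data import Dataset, DatasetFunc

# load the "validation" entries of a Decathlon-style datalist file
data_list = DatasetFunc(
    data="path to file",
    func=monai.data.load_decathlon_datalist,
    data_list_key="validation",
    base_dir="path to base dir",
)
# partition dataset for every rank; data is passed to func as the first positional arg
data_partition = DatasetFunc(
    data=data_list,
    func=lambda data, **kwargs: monai.data.partition_dataset(data, **kwargs)[torch.distributed.get_rank()],
    num_partitions=torch.distributed.get_world_size(),
)
# transforms is a placeholder for the user's transform chain
dataset = Dataset(data=data_partition, transform=transforms)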

Parameters:

data (Any) – input data for the func to process, will apply to func as the first arg.
func (Callable) – callable function to generate dataset items.
kwargs – other arguments for the func except for the first arg.

reset(data=None, func=None, **kwargs)[source]#

Reset the dataset items with specified func.

Parameters:

data (UnionType[Any, None]) – if not None, execute func on it, default to self.src.
func (UnionType[Callable, None]) – if not None, execute the func with specified kwargs, default to self.func.
kwargs – other arguments for the func except for the first arg.
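
The relationship between data, func and reset() can be sketched without MONAI. `FuncBackedList` and `take_every` are hypothetical names. The rule that reset() reuses the stored kwargs when no new func is given, and uses the passed kwargs otherwise, is an assumption made for this sketch.

```python
from typing import Any, Callable, Optional


class FuncBackedList:
    """Hypothetical stand-in: items are whatever func(data, **kwargs) returns."""

    def __init__(self, data: Any, func: Callable, **kwargs: Any) -> None:
        self.src = data
        self.func = func
        self.kwargs = kwargs
        self.reset()

    def reset(self, data: Optional[Any] = None, func: Optional[Callable] = None, **kwargs: Any) -> None:
        src = self.src if data is None else data  # default to the stored source
        if func is None:
            self.items = self.func(src, **self.kwargs)  # stored func and kwargs
        else:
            self.items = func(src, **kwargs)  # assumption: new func uses the new kwargs

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Any:
        return self.items[index]


def take_every(seq, step=1):
    """Toy func: keep every step-th element of the input."""
    return list(seq)[::step]


evens = FuncBackedList(range(10), take_every, step=2)
initial = list(evens.items)            # func applied to the stored source
evens.reset(data=range(4))             # same func and kwargs, new data
after_new_data = list(evens.items)
evens.reset(func=take_every, step=3)   # new func call on the stored source
after_new_func = list(evens.items)
```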

ShuffleBuffer#

class monai.data.ShuffleBuffer(data, transform=None, buffer_size=512, seed=0, epochs=1)[source]#

Extend the IterableDataset with a buffer and randomly pop items.

Parameters:

data – input data source to load and transform to generate dataset for model.
transform – a callable data transform on input data.
buffer_size (int) – size of the buffer to store items and randomly pop, default to 512.
seed (int) – random seed to initialize the random state of all workers, set seed += 1 in every iter() call, refer to the PyTorch idea: pytorch/pytorch.
epochs (int) – number of epochs to iterate over the dataset, default to 1, -1 means infinite epochs.

Note

Neither monai.data.DataLoader nor torch.utils.data.DataLoader seeds this class (as a subclass of IterableDataset) at run time. The persistent_workers=True flag (and pytorch>1.8) is therefore required for multiple epochs of loading when num_workers>0. For example:

import monai

def run():
    dss = monai.data.ShuffleBuffer([1, 2, 3, 4], buffer_size=30, seed=42)

    dataloader = monai.data.DataLoader(
        dss, batch_size=1, num_workers=2, persistent_workers=True)
    for epoch in range(3):
        for item in dataloader:
            print(f"epoch: {epoch} item: {item}.")

if __name__ == '__main__':
    run()

generate_item()[source]#: Fill a buffer list up to self.size, then generate randomly popped items.

randomize(size)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises: NotImplementedError – When the subclass does not override this method.
Return type: None

randomized_pop(buffer)[source]#: Return the item at a randomized location self._idx in buffer.
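
The buffer-and-random-pop idea can be sketched as a generator. This is an illustration only: it uses Python's `random.Random` where the class itself uses `self.R`, and `shuffle_buffer` is a hypothetical name.

```python
import random
from typing import Any, Iterable, Iterator


def shuffle_buffer(data: Iterable[Any], buffer_size: int, seed: int = 0) -> Iterator[Any]:
    """Yield items in a locally shuffled order using a bounded buffer."""
    rng = random.Random(seed)
    buffer: list = []
    for item in data:
        if len(buffer) >= buffer_size:
            # buffer is full: emit a random element before storing the new one
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]  # swap so pop() is O(1)
            yield buffer.pop()
        buffer.append(item)
    while buffer:  # source exhausted: drain what is left in random order
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


shuffled = list(shuffle_buffer(range(10), buffer_size=4, seed=42))
```

A larger buffer gives a shuffle closer to a full permutation at the cost of memory. With a buffer of size 1 the items come out in their original order.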

CSVIterableDataset#

class monai.data.CSVIterableDataset(src, chunksize=1000, buffer_size=None, col_names=None, col_types=None, col_groups=None, transform=None, shuffle=False, seed=0, kwargs_read_csv=None, **kwargs)[source]#

Iterable dataset to load CSV files and generate dictionary data. It is particularly useful when data come from a stream. Inherits from PyTorch IterableDataset: https://pytorch.org/docs/stable/data.html?highlight=iterabledataset#torch.utils.data.IterableDataset.

It can also help when loading extremely large CSV files that can't be read into memory directly: treat the large CSV file as stream input and call reset() of CSVIterableDataset for every epoch. Note that, as a stream input, the length of the dataset is not available.

To effectively shuffle the data in the big dataset, users can set a big buffer to continuously store the loaded data, then randomly pick data from the buffer for following tasks.

To accelerate the loading process, it supports multi-processing based on PyTorch DataLoader workers: each process executes the transforms on its share of the loaded data. Note: the order of output data may not match the data source in multi-processing mode.

It can load data from multiple CSV files and join the tables using the additional kwargs. It supports loading only specific columns, and it can also group several loaded columns to generate a new column. For example, with col_groups={"meta": ["meta_0", "meta_1", "meta_2"]}, the output can be:

[
    {"image": "./image0.nii", "meta_0": 11, "meta_1": 12, "meta_2": 13, "meta": [11, 12, 13]},
    {"image": "./image1.nii", "meta_0": 21, "meta_1": 22, "meta_2": 23, "meta": [21, 22, 23]},
]
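
The column grouping shown above can be reproduced on plain dictionaries. The helper `add_column_groups` is a hypothetical name for this sketch and works on already-loaded rows rather than on a pandas table.

```python
from typing import Dict, List, Sequence


def add_column_groups(rows: List[dict], col_groups: Dict[str, Sequence[str]]) -> List[dict]:
    """Return new rows with one extra list-valued key per group."""
    out = []
    for row in rows:
        new_row = dict(row)  # keep the original columns untouched
        for name, cols in col_groups.items():
            new_row[name] = [row[c] for c in cols]
        out.append(new_row)
    return out


rows = [
    {"image": "./image0.nii", "meta_0": 11, "meta_1": 12, "meta_2": 13},
    {"image": "./image1.nii", "meta_0": 21, "meta_1": 22, "meta_2": 23},
]
grouped = add_column_groups(rows, {"meta": ["meta_0", "meta_1", "meta_2"]})
```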

Parameters:

src (UnionType[str, Sequence[str], Iterable, Sequence[Iterable]]) – if a CSV filename is provided, it can be a str, URL, path object or file-like object to load. An iterable can also be provided directly for stream input, which skips loading from a filename. If a list of filenames or iterables is provided, the tables are joined.
chunksize (int) – rows of a chunk when loading iterable data from CSV files, default to 1000. more details: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html.
buffer_size (UnionType[int, None]) – size of the buffer to store the loaded chunks, if None, set to 2 x chunksize.
col_names (UnionType[Sequence[str], None]) – names of the expected columns to load. if None, load all the columns.
col_types (UnionType[dict[str, UnionType[dict[str, Any], None]], None]) –
type and default value to convert the loaded columns, if None, use original data. It should be a dictionary in which every item maps to an expected column: the key is the column name and the value is None or a dictionary that defines the default value and data type. The supported keys in the dictionary are: ["type", "default"]. For example:
```
col_types = {
    "subject_id": {"type": str},
    "label": {"type": int, "default": 0},
    "ehr_0": {"type": float, "default": 0.0},
    "ehr_1": {"type": float, "default": 0.0},
    "image": {"type": str, "default": None},
}
```
col_groups (UnionType[dict[str, Sequence[str]], None]) – args to group the loaded columns to generate a new column. It should be a dictionary in which every item maps to a group: the key is the new column name and the value is the names of the columns to combine. For example: col_groups={"ehr": [f"ehr_{i}" for i in range(10)], "meta": ["meta_1", "meta_2"]}
transform (UnionType[Callable, None]) – transform to apply on the loaded items of a dictionary data.
shuffle (bool) – whether to shuffle all the data in the buffer every time a new chunk is loaded.
seed (int) – random seed to initialize the random state for all the workers if shuffle is True, set seed += 1 in every iter() call, refer to the PyTorch idea: pytorch/pytorch.
kwargs_read_csv (UnionType[dict, None]) – dictionary args to pass to pandas read_csv function. Default to {"chunksize": chunksize}.
kwargs – additional arguments for pandas.merge() API to join tables.
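
The effect of a col_types mapping can be sketched on plain dictionaries. `convert_columns` is a hypothetical helper, and missing values are represented by None as an assumption of this sketch (a table loaded with pandas would typically hold NaN instead).

```python
from typing import Dict, List, Optional


def convert_columns(rows: List[dict], col_types: Dict[str, Optional[dict]]) -> List[dict]:
    """Fill missing values with defaults, then cast to the requested type."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, spec in col_types.items():
            if spec is None:
                continue  # None means: keep the loaded value untouched
            value = new_row.get(col)
            if value is None and "default" in spec:
                value = spec["default"]  # missing value: fall back to the default
            if value is not None and "type" in spec:
                value = spec["type"](value)  # cast to the expected type
            new_row[col] = value
        out.append(new_row)
    return out


col_types = {
    "subject_id": {"type": str},
    "label": {"type": int, "default": 0},
    "ehr_0": {"type": float, "default": 0.0},
}
raw = [
    {"subject_id": 7, "label": "1", "ehr_0": None},
    {"subject_id": 8, "label": None, "ehr_0": "2.5"},
]
converted = convert_columns(raw, col_types)
```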

close()[source]#: Close the pandas TextFileReader iterable objects. If the input src is a file path, the TextFileReader was created internally and needs to be closed. If the input src is an iterable object, whether to close it in this function depends on the user's requirements. For more details, please check: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html?#iteration.

reset(src=None)[source]#

Reset the pandas TextFileReader iterable object to read data. For more details, please check: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html?#iteration.

Parameters: src (UnionType[str, Sequence[str], Iterable, Sequence[Iterable], None]) – if not None and a CSV filename is provided, it can be a str, URL, path object or file-like object to load. An iterable can also be provided directly for stream input, which skips loading from a filename. If a list of filenames or iterables is provided, the tables are joined. Defaults to self.src.

PersistentDataset#

class monai.data.PersistentDataset(data, transform, cache_dir, hash_func=<function pickle_hashing>, pickle_module='pickle', pickle_protocol=2, hash_transform=None, reset_ops_id=True)[source]#

Persistent storage of pre-computed values to efficiently manage larger-than-memory dictionary-format data; transforms can operate on specific fields. Results from the non-random transform components are computed when first used, and stored in the cache_dir for rapid retrieval on subsequent uses. If slicing indices are passed, it returns a PyTorch Subset, for example: data: Subset = dataset[1:4]. For more details, please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset

For example, typical input data can be a list of dictionaries:

[{                            {                            {
    'image': 'image1.nii.gz',    'image': 'image2.nii.gz',    'image': 'image3.nii.gz',
    'label': 'label1.nii.gz',    'label': 'label2.nii.gz',    'label': 'label3.nii.gz',
    'extra': 123                 'extra': 456                 'extra': 789
},                           },                           }]

For a composite transform like

[ LoadImaged(keys=['image', 'label']),
Orientationd(keys=['image', 'label'], axcodes='RAS'),
ScaleIntensityRanged(keys=['image'], a_min=-57, a_max=164, b_min=0.0, b_max=1.0, clip=True),
RandCropByPosNegLabeld(keys=['image', 'label'], label_key='label', spatial_size=(96, 96, 96),
                        pos=1, neg=1, num_samples=4, image_key='image', image_threshold=0),
ToTensord(keys=['image', 'label'])]

Upon first use, a filename-based dataset is processed by the deterministic transforms [LoadImaged, Orientationd, ScaleIntensityRanged], and the resulting tensor is written to the cache_dir before the remaining random-dependent transforms [RandCropByPosNegLabeld, ToTensord] are applied for use in the analysis.

Subsequent uses of the dataset read the pre-processed results directly from cache_dir and then apply the random-dependent parts of the transform processing.
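
The first-use and subsequent-use behaviour can be sketched with the standard library. This is not how MONAI stores its cache: the sketch assumes plain pickle files named by an MD5 hash of the input item, and `cached_pipeline`, `fake_load` and `random_pick` are hypothetical names.

```python
import hashlib
import pickle
import random
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def cached_pipeline(item: Any, deterministic: List[Callable], random_part: List[Callable], cache_dir: Path) -> Any:
    """Run deterministic steps once per item (cached on disk), random steps every call."""
    key = hashlib.md5(pickle.dumps(item, protocol=2)).hexdigest()
    cache_file = cache_dir / f"{key}.pkl"
    if cache_file.exists():
        value = pickle.loads(cache_file.read_bytes())  # cache hit: skip deterministic steps
    else:
        value = item
        for t in deterministic:
            value = t(value)
        cache_file.write_bytes(pickle.dumps(value, protocol=2))
    for t in random_part:  # random steps are never cached
        value = t(value)
    return value


calls = {"load": 0}


def fake_load(name: str) -> list:
    calls["load"] += 1  # count how often the expensive step really runs
    return [len(name)] * 4


def random_pick(values: list) -> int:
    return random.choice(values)


with tempfile.TemporaryDirectory() as tmp:
    first = cached_pipeline("image1.nii.gz", [fake_load], [random_pick], Path(tmp))
    second = cached_pipeline("image1.nii.gz", [fake_load], [random_pick], Path(tmp))
```

The second call finds the cache file and never invokes `fake_load`, while `random_pick` runs on both calls.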

During training call set_data() to update input data and recompute cache content.

Note

The input data must be a list of file paths, which are hashed to form the cache keys.

The filenames of the cached files also try to contain the hash of the transforms. In this fashion, PersistentDataset should be robust to changes in transforms. This, however, is not guaranteed, so caution should be used when modifying transforms to avoid unexpected errors. If in doubt, it is advisable to clear the cache directory.

Cached data is expected to be tensors, primitives, or dictionaries keying to these values. Numpy arrays will be converted to tensors, however any other object type returned by transforms will not be loadable since torch.load will be used with weights_only=True to prevent loading of potentially malicious objects. Legacy cache files may not be loadable and may need to be recomputed.

Lazy Resampling: If you make use of the lazy resampling feature of monai.transforms.Compose, please refer to its documentation to familiarize yourself with the interaction between PersistentDataset and lazy resampling.

__init__(data, transform, cache_dir, hash_func=<function pickle_hashing>, pickle_module='pickle', pickle_protocol=2, hash_transform=None, reset_ops_id=True)[source]#

Parameters:

data (Sequence) – input data file paths to load and transform to generate dataset for model. PersistentDataset expects input data to be a list of serializable items and hashes them as cache keys using hash_func.
transform (UnionType[Sequence[Callable], Callable]) – transforms to execute operations on input data.
cache_dir (UnionType[Path, str, None]) – If specified, this is the location for persistent storage of pre-computed transformed data tensors. The cache_dir is computed once, and persists on disk until explicitly removed. Different runs, programs, experiments may share a common cache dir provided that the transforms pre-processing is consistent. If cache_dir doesn’t exist, will automatically create it. If cache_dir is None, there is effectively no caching.
hash_func (Callable[…, bytes]) – a callable to compute hash from data items to be cached. defaults to monai.data.utils.pickle_hashing.
pickle_module (str) – string representing the module used for pickling metadata and objects, defaults to "pickle". Due to the pickle limitation in multi-processing of DataLoader, the pickle module cannot be passed as an arg directly, so a string name is used instead. To use another pickle module at runtime, register it like:
>>> from monai.data import utils
>>> utils.SUPPORTED_PICKLE_MOD["test"] = other_pickle
This arg is used by torch.save. For more details, please check: https://pytorch.org/docs/stable/generated/torch.save.html#torch.save, and monai.data.utils.SUPPORTED_PICKLE_MOD.
pickle_protocol (int) – specifies pickle protocol when saving, with torch.save. Defaults to torch.serialization.DEFAULT_PROTOCOL. For more details, please check: https://pytorch.org/docs/stable/generated/torch.save.html#torch.save.
hash_transform (UnionType[Callable[…, bytes], None]) – a callable to compute hash from the transform information when caching. This may reduce errors due to transforms changing during experiments. Default to None (no hash). Other options are pickle_hashing and json_hashing functions from monai.data.utils.
reset_ops_id (bool) – whether to set TraceKeys.ID to TraceKeys.NONE, defaults to True. When this is enabled, the traced transform instance IDs will be removed from the cached MetaTensors. This is useful for skipping the transform instance checks when inverting applied operations using the cached content and with re-created transform instances.

set_data(data)[source]#: Set the input data and delete all the out-dated cache content.

set_transform_hash(hash_xform_func)[source]#: Get hashable transforms, and then hash them. Hashable transforms are deterministic transforms that inherit from Transform. We stop at the first non-deterministic transform, or first that does not inherit from MONAI’s Transform class.

GDSDataset#

class monai.data.GDSDataset(data, transform, cache_dir, device, hash_func=<function pickle_hashing>, hash_transform=None, reset_ops_id=True, **kwargs)[source]#

An extension of the PersistentDataset that uses a direct memory access (DMA) data path between GPU memory and storage, thus avoiding a bounce buffer through the CPU. This direct path can increase system bandwidth while decreasing latency and utilization load on the CPU and GPU.

A tutorial is available: Project-MONAI/tutorials.

CacheNTransDataset#

class monai.data.CacheNTransDataset(data, transform, cache_n_trans, cache_dir, hash_func=<function pickle_hashing>, pickle_module='pickle', pickle_protocol=2, hash_transform=None, reset_ops_id=True)[source]#

Extension of PersistentDataset that can also cache the result of the first N transforms, regardless of whether they are random or not.

__init__(data, transform, cache_n_trans, cache_dir, hash_func=<function pickle_hashing>, pickle_module='pickle', pickle_protocol=2, hash_transform=None, reset_ops_id=True)[source]#

Parameters:

data (Sequence) – input data file paths to load and transform to generate dataset for model. PersistentDataset expects input data to be a list of serializable items and hashes them as cache keys using hash_func.
transform (UnionType[Sequence[Callable], Callable]) – transforms to execute operations on input data.
cache_n_trans (int) – cache the result of first N transforms.
cache_dir (UnionType[Path, str, None]) – If specified, this is the location for persistent storage of pre-computed transformed data tensors. The cache_dir is computed once, and persists on disk until explicitly removed. Different runs, programs, experiments may share a common cache dir provided that the transforms pre-processing is consistent. If cache_dir doesn’t exist, will automatically create it. If cache_dir is None, there is effectively no caching.
hash_func (Callable[…, bytes]) – a callable to compute hash from data items to be cached. defaults to monai.data.utils.pickle_hashing.
pickle_module (str) – string representing the module used for pickling metadata and objects, defaults to "pickle". Due to the pickle limitation in multi-processing of DataLoader, the pickle module cannot be passed as an arg directly, so a string name is used instead. To use another pickle module at runtime, register it like:
>>> from monai.data import utils
>>> utils.SUPPORTED_PICKLE_MOD["test"] = other_pickle
This arg is used by torch.save. For more details, please check: https://pytorch.org/docs/stable/generated/torch.save.html#torch.save, and monai.data.utils.SUPPORTED_PICKLE_MOD.
pickle_protocol (int) – specifies pickle protocol when saving, with torch.save. Defaults to torch.serialization.DEFAULT_PROTOCOL. For more details, please check: https://pytorch.org/docs/stable/generated/torch.save.html#torch.save.
hash_transform (UnionType[Callable[…, bytes], None]) – a callable to compute hash from the transform information when caching. This may reduce errors due to transforms changing during experiments. Default to None (no hash). Other options are pickle_hashing and json_hashing functions from monai.data.utils.
reset_ops_id (bool) – whether to set TraceKeys.ID to TraceKeys.NONE, defaults to True. When this is enabled, the traced transform instance IDs will be removed from the cached MetaTensors. This is useful for skipping the transform instance checks when inverting applied operations using the cached content and with re-created transform instances.

LMDBDataset#

class monai.data.LMDBDataset(data, transform, cache_dir='cache', hash_func=<function pickle_hashing>, db_name='monai_cache', progress=True, pickle_protocol=2, hash_transform=None, reset_ops_id=True, lmdb_kwargs=None)[source]#

Extension of PersistentDataset using LMDB as the backend.

See also

monai.data.PersistentDataset

Examples

>>> items = [{"data": i} for i in range(5)]
# [{'data': 0}, {'data': 1}, {'data': 2}, {'data': 3}, {'data': 4}]
>>> lmdb_ds = monai.data.LMDBDataset(items, transform=monai.transforms.SimulateDelayd("data", delay_time=1))
>>> print(list(lmdb_ds))  # using the cached results

__init__(data, transform, cache_dir='cache', hash_func=<function pickle_hashing>, db_name='monai_cache', progress=True, pickle_protocol=2, hash_transform=None, reset_ops_id=True, lmdb_kwargs=None)[source]#

Parameters:

data (Sequence) – input data file paths to load and transform to generate dataset for model. LMDBDataset expects input data to be a list of serializable items and hashes them as cache keys using hash_func.
transform (UnionType[Sequence[Callable], Callable]) – transforms to execute operations on input data.
cache_dir (UnionType[Path, str]) – if specified, this is the location for persistent storage of pre-computed transformed data tensors. The cache_dir is computed once, and persists on disk until explicitly removed. Different runs, programs, experiments may share a common cache dir provided that the transforms pre-processing is consistent. If the cache_dir doesn’t exist, will automatically create it. Defaults to “./cache”.
hash_func (Callable[…, bytes]) – a callable to compute hash from data items to be cached. defaults to monai.data.utils.pickle_hashing.
db_name (str) – lmdb database file name. Defaults to “monai_cache”.
progress (bool) – whether to display a progress bar.
pickle_protocol – specifies pickle protocol when saving, with torch.save. Defaults to torch.serialization.DEFAULT_PROTOCOL. For more details, please check: https://pytorch.org/docs/stable/generated/torch.save.html#torch.save.
hash_transform (UnionType[Callable[…, bytes], None]) – a callable to compute hash from the transform information when caching. This may reduce errors due to transforms changing during experiments. Default to None (no hash). Other options are pickle_hashing and json_hashing functions from monai.data.utils.
reset_ops_id (bool) – whether to set TraceKeys.ID to TraceKeys.NONE, defaults to True. When this is enabled, the traced transform instance IDs will be removed from the cached MetaTensors. This is useful for skipping the transform instance checks when inverting applied operations using the cached content and with re-created transform instances.
lmdb_kwargs (UnionType[dict, None]) – additional keyword arguments to the lmdb environment. for more details please visit: https://lmdb.readthedocs.io/en/release/#environment-class

info()[source]#: Returns: dataset info dictionary.

set_data(data)[source]#: Set the input data and delete all the out-dated cache content.

CacheDataset#

class monai.data.CacheDataset(data, transform=None, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, hash_as_key=False, hash_func=<function pickle_hashing>, runtime_cache=False)[source]#

Dataset with cache mechanism that can load data and cache deterministic transforms’ result during training.

By caching the results of non-random preprocessing transforms, it accelerates the training data pipeline. If the requested data is not in the cache, all transforms will run normally (see also monai.data.dataset.Dataset).

Users can set the cache rate or number of items to cache. It is recommended to experiment with different cache_num or cache_rate to identify the best training speed.

The transforms that are meant to be cached must implement the monai.transforms.Transform interface and should not be Randomizable. This dataset caches the outcomes before the first Randomizable Transform within a Compose instance. To improve the caching efficiency, place as many non-random transforms as possible before the randomized ones when composing the chain of transforms. If slicing indices are passed, it returns a PyTorch Subset, for example: data: Subset = dataset[1:4]. For more details, please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset

For example, if the transform is a Compose of:

transforms = Compose([
    LoadImaged(),
    EnsureChannelFirstd(),
    Spacingd(),
    Orientationd(),
    ScaleIntensityRanged(),
    RandCropByPosNegLabeld(),
    ToTensord()
])

When transforms is used in a multi-epoch training pipeline, before the first training epoch this dataset caches the results up to ScaleIntensityRanged, as all the non-random transforms LoadImaged, EnsureChannelFirstd, Spacingd, Orientationd, ScaleIntensityRanged can be cached. During training, the dataset loads the cached results and runs RandCropByPosNegLabeld and ToTensord, as RandCropByPosNegLabeld is a randomized transform and its outcome is not cached.
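
The split at the first randomized transform can be illustrated with a toy chain. The sketch assumes a hypothetical marker class `RandomStep` standing in for Randomizable. `split_at_first_random` and the toy transforms are illustrative names, not MONAI functions.

```python
import random
from typing import Callable, List, Sequence, Tuple


class RandomStep:
    """Hypothetical marker base class for transforms that must not be cached."""


class AddNoise(RandomStep):
    def __call__(self, x: float) -> float:
        return x + random.random()


def split_at_first_random(transforms: Sequence[Callable]) -> Tuple[List[Callable], List[Callable]]:
    """Return (cacheable prefix, remainder starting at the first random step)."""
    for i, t in enumerate(transforms):
        if isinstance(t, RandomStep):
            return list(transforms[:i]), list(transforms[i:])
    return list(transforms), []


def load(x: int) -> int:
    return x * 2


def scale(x: int) -> float:
    return x / 10


def to_float(x: float) -> float:
    return float(x)


prefix, remainder = split_at_first_random([load, scale, AddNoise(), to_float])

# fill an in-memory cache with the output of the deterministic prefix only
cache = []
for item in [10, 20]:
    for t in prefix:
        item = t(item)
    cache.append(item)
```

Note that `to_float` is deterministic but still ends up in the remainder because it follows a random step, which mirrors the position of ToTensord in the chain above.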

During training call set_data() to update input data and recompute cache content, note that it requires persistent_workers=False in the PyTorch DataLoader.

Note

CacheDataset executes non-random transforms and prepares cache content in the main process before the first epoch, then all the subprocesses of DataLoader will read the same cache content in the main process during training. It may take a long time to prepare the cache content, depending on the size of the expected cache data. To debug or verify the program before real training, users can set cache_rate=0.0 or cache_num=0 to temporarily skip caching.

Lazy Resampling: If you make use of the lazy resampling feature of monai.transforms.Compose, please refer to its documentation to familiarize yourself with the interaction between CacheDataset and lazy resampling.

__init__(data, transform=None, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, hash_as_key=False, hash_func=<function pickle_hashing>, runtime_cache=False)[source]#

Parameters:

data (Sequence) – input data to load and transform to generate dataset for model.
transform (UnionType[Sequence[Callable], Callable, None]) – transforms to execute operations on input data.
cache_num (int) – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).
cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).
num_workers (UnionType[int, None]) – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress (bool) – whether to display a progress bar.
copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, defaults to True. If the random transforms don't modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.
as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. it may help improve the performance of following logic.
hash_as_key (bool) – whether to compute hash value of input data as the key to save cache, if key exists, avoid saving duplicated content. it can help save memory when the dataset has duplicated items or augmented dataset.
hash_func (Callable[…, bytes]) – if hash_as_key, a callable to compute hash from data items to be cached. defaults to monai.data.utils.pickle_hashing.
runtime_cache (UnionType[bool, str, list, ListProxy]) –
mode of cache at the runtime. Defaults to False, which prepares the cache content for the entire data during initialization; this can considerably increase the time between the constructor call and the first generated mini-batch. Three options are provided to compute the cache on the fly after the dataset initialization:
1. "threads" or True: use a regular list to store the cache items.
2. "processes": use a ListProxy to store the cache items, it can be shared among processes.
3. A list-like object: a user-provided container to be used to store the cache items.
For thread-based caching (typically for caching cuda tensors), option 1 is recommended. For single process workflows with multiprocessing data loading, option 2 is recommended. For multiprocessing workflows (typically for distributed training), where this class is initialized in subprocesses, option 3 is recommended, and the list-like object should be prepared in the main process and passed to all subprocesses. Not following these recommendations may lead to runtime errors or duplicated cache across processes.
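
The interaction of cache_num and cache_rate can be checked with a few lines of arithmetic. `effective_cache_num` is a hypothetical helper, and truncating data_length x cache_rate with int() is an assumption of this sketch.

```python
import sys


def effective_cache_num(data_length: int, cache_num: int = sys.maxsize, cache_rate: float = 1.0) -> int:
    """Smallest of the three limits named in the parameter descriptions."""
    return min(int(cache_num), int(data_length * cache_rate), data_length)


a = effective_cache_num(100)                                # defaults cache everything
b = effective_cache_num(100, cache_rate=0.25)               # rate is the tighter limit
c = effective_cache_num(100, cache_num=10, cache_rate=0.5)  # count is the tighter limit
d = effective_cache_num(100, cache_rate=0.0)                # skip caching while debugging
```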

set_data(data)[source]#

Set the input data and run deterministic transforms to generate cache content.

Note: should call this func after an entire epoch and must set persistent_workers=False in PyTorch DataLoader, because it needs to create new worker processes based on new generated cache content.

Return type: None

SmartCacheDataset#

class monai.data.SmartCacheDataset(data, transform=None, replace_rate=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_init_workers=1, num_replace_workers=1, progress=True, shuffle=True, seed=0, copy_cache=True, as_contiguous=True, runtime_cache=False)[source]#

Re-implementation of the SmartCache mechanism in NVIDIA Clara-train SDK. At any time, the cache pool only keeps a subset of the whole dataset. In each epoch, only the items in the cache are used for training. This ensures that data needed for training is readily available, keeping GPU resources busy. Note that cached items may still have to go through a non-deterministic transform sequence before being fed to GPU. At the same time, another thread is preparing replacement items by applying the transform sequence to items not in cache. Once one epoch is completed, Smart Cache replaces the same number of items with replacement items.

Smart Cache uses a simple running window algorithm to determine the cache content and replacement items. Let N be the configured number of objects in cache; and R be the number of replacement objects (R = ceil(N * r), where r is the configured replace rate). For more details, please refer to: https://docs.nvidia.com/clara/clara-train-archive/3.1/nvmidl/additional_features/smart_cache.html

If slicing indices are passed, it returns a PyTorch Subset, for example: data: Subset = dataset[1:4]. For more details, please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset

For example, if we have 5 images: [image1, image2, image3, image4, image5], with cache_num=4 and replace_rate=0.25, the actual training images cached and replaced for every epoch are as below:

epoch 1: [image1, image2, image3, image4]
epoch 2: [image2, image3, image4, image5]
epoch 3: [image3, image4, image5, image1]
epoch 4: [image4, image5, image1, image2]
epoch N: [image[N % 5] ...]
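
The running window can be reproduced with a short sketch that computes R = ceil(N * r) and advances a circular window by R items per epoch. `smart_cache_epochs` is a hypothetical helper written for illustration. It models only which items are in the cache, not the background replacement thread.

```python
import math
from typing import List, Sequence


def smart_cache_epochs(items: Sequence, cache_num: int, replace_rate: float, epochs: int) -> List[list]:
    """Cache content per epoch for a circular running window."""
    n_replace = math.ceil(cache_num * replace_rate)  # R = ceil(N * r)
    total = len(items)
    out = []
    start = 0
    for _ in range(epochs):
        out.append([items[(start + i) % total] for i in range(cache_num)])
        start = (start + n_replace) % total  # drop the R oldest items, append R new ones
    return out


images = ["image1", "image2", "image3", "image4", "image5"]
schedule = smart_cache_epochs(images, cache_num=4, replace_rate=0.25, epochs=4)
```

With cache_num=4 and replace_rate=0.25, R is 1, so exactly one image is swapped out after every epoch and the window wraps around once it reaches the end of the list.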

The usage of SmartCacheDataset contains 4 steps:

1. Initialize the SmartCacheDataset object and cache for the first epoch.
2. Call start() to run the replacement thread in the background.
3. Call update_cache() before every epoch to replace training items.
4. Call shutdown() when training ends.

During training, call set_data() to update the input data and recompute the cache content. Note: call shutdown() first to stop the replacement thread, then update the data and call start() to restart.

Note

This replacement will not work in the following cases:
1. The multiprocessing_context of DataLoader is set to spawn.
2. Distributed data parallel is launched with torch.multiprocessing.spawn.
3. Running on Windows (the default multiprocessing method is spawn) with num_workers greater than 0.
4. The persistent_workers of DataLoader is set to True with num_workers greater than 0.

If using MONAI workflows, please add SmartCacheHandler to the handler list of trainer, otherwise, please make sure to call start(), update_cache(), shutdown() during training.
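
When not using MONAI workflows, the call order of start(), update_cache() and shutdown() could be wrapped as in the sketch below. It is only an illustration: `run_training` and `RecordingDataset` are hypothetical names, the stand-in class replaces a real SmartCacheDataset so that the call order can be printed, and the sketch assumes update_cache() is called once an epoch has finished, so that the first epoch uses the initial cache.

```python
def run_training(dataset, num_epochs, train_one_epoch):
    """Drive a SmartCacheDataset-like object through start / update_cache / shutdown."""
    dataset.start()  # launch the background replacement thread
    try:
        for epoch in range(num_epochs):
            train_one_epoch(epoch)
            dataset.update_cache()  # swap in the prepared replacement items
    finally:
        dataset.shutdown()  # stop the background thread even if training fails


class RecordingDataset:
    """Stand-in that only records calls; it is not a MONAI class."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def update_cache(self):
        self.calls.append("update_cache")

    def shutdown(self):
        self.calls.append("shutdown")


demo = RecordingDataset()
run_training(demo, num_epochs=2, train_one_epoch=lambda epoch: demo.calls.append(f"epoch {epoch}"))
print(demo.calls)
```

The try/finally block is a design choice of this sketch: it makes sure the replacement thread is stopped even when an epoch raises an exception.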

Parameters:

data (Sequence) – input data to load and transform to generate dataset for model.
transform (UnionType[Sequence[Callable], Callable, None]) – transforms to execute operations on input data.
replace_rate (float) – percentage of the cached items to be replaced in every epoch (default to 0.1).
cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) will be used.
cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). The minimum of (cache_num, data_length x cache_rate, data_length) will be used.
num_init_workers (UnionType[int, None]) – the number of worker threads to initialize the cache for first epoch. If num_init_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
num_replace_workers (UnionType[int, None]) – the number of worker threads to prepare the replacement cache for every epoch. If num_replace_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress (bool) – whether to display a progress bar when caching for the first epoch.
shuffle (bool) – whether to shuffle the whole data list before preparing the cache content for first epoch. it will not modify the original input data sequence in-place.
seed (int) – random seed if shuffle is True, default to 0.
copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cache content or every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.
as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. it may help improve the performance of following logic.
runtime_cache – Default to False, other options are not implemented yet.
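
The interaction between cache_num and cache_rate described above can be checked with a small calculation. This is a sketch under stated assumptions: `effective_cache_num` is a hypothetical helper, and truncating data_length x cache_rate to an integer is an assumption of the sketch rather than something stated in the parameter descriptions.

```python
import sys


def effective_cache_num(data_length, cache_num=sys.maxsize, cache_rate=1.0):
    """Minimum of (cache_num, data_length x cache_rate, data_length)."""
    # assumption: the fractional product is truncated to an int
    return min(cache_num, int(data_length * cache_rate), data_length)


print(effective_cache_num(100))                                # defaults: cache everything
print(effective_cache_num(100, cache_rate=0.5))                # limited by cache_rate
print(effective_cache_num(100, cache_num=10, cache_rate=0.5))  # limited by cache_num
```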

is_started()[source]#: Check whether the replacement thread is already started.

manage_replacement()[source]#

Background thread for replacement.

Return type:: None

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: None

set_data(data)[source]#

Set the input data and run deterministic transforms to generate cache content.

Note: shutdown() should be called before calling this function.

shutdown()[source]#: Shut down the background thread for replacement.

start()[source]#: Start the background thread to replace training items for every epoch.

update_cache()[source]#: Update the cache items for the current epoch. This function needs to be called before every epoch. If the cache has been shut down before, the _replace_mgr thread needs to be restarted.

ZipDataset#

class monai.data.ZipDataset(datasets, transform=None)[source]#

Zip several PyTorch datasets and output data (with the same index) together in a tuple. If the output of a single dataset is already a tuple, flatten it and extend it to the result. For example: if datasetA returns (img, imgmeta), datasetB returns (seg, segmeta), finally return (img, imgmeta, seg, segmeta). If the datasets don’t have the same length, the minimum length among them is used as the length of ZipDataset. If passing slicing indices, will return a PyTorch Subset, for example: data: Subset = dataset[1:4], for more details, please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset

Examples:

>>> zip_data = ZipDataset([[1, 2, 3], [4, 5]])
>>> print(len(zip_data))
2
>>> for item in zip_data:
...     print(item)
[1, 4]
[2, 5]
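
The flattening and minimum-length rules described for ZipDataset can be illustrated without MONAI. The helpers below, `zip_item` and `zip_length`, are hypothetical names for a simplified model of that behavior; plain lists stand in for the datasets.

```python
def zip_item(datasets, index):
    """Collect item `index` from every dataset, flattening tuple/list outputs."""
    merged = []
    for dataset in datasets:
        item = dataset[index]
        # a dataset that already returns a tuple or list contributes each element
        merged.extend(item if isinstance(item, (tuple, list)) else [item])
    return tuple(merged)


def zip_length(datasets):
    """The zipped length is the length of the shortest dataset."""
    return min(len(dataset) for dataset in datasets)


dataset_a = [("img0", "imgmeta0"), ("img1", "imgmeta1"), ("img2", "imgmeta2")]
dataset_b = [("seg0", "segmeta0"), ("seg1", "segmeta1")]
print(zip_length([dataset_a, dataset_b]))
print(zip_item([dataset_a, dataset_b], 0))
```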

__init__(datasets, transform=None)[source]#

Parameters:

datasets (Sequence) – list of datasets to zip together.
transform (UnionType[Callable, None]) – a callable data transform operates on the zipped item from datasets.

ArrayDataset#

class monai.data.ArrayDataset(img, img_transform=None, seg=None, seg_transform=None, labels=None, label_transform=None)[source]#

Dataset for segmentation and classification tasks based on array format input data and transforms. It ensures the same random seeds in the randomized transforms defined for image, segmentation and label. The transform can be monai.transforms.Compose or any other callable object. For example, if training is based on Nifti format images without metadata, all transforms can be composed:

img_transform = Compose(
    [
        LoadImage(image_only=True),
        EnsureChannelFirst(),
        RandAdjustContrast()
    ]
)
ArrayDataset(img_file_list, img_transform=img_transform)

If training is based on images and the metadata, the array transforms cannot be composed because several transforms receive multiple parameters or return multiple values. Users then need to define their own callable method to parse metadata from LoadImage or set the affine matrix for the Spacing transform:

class TestCompose(Compose):
    def __call__(self, input_):
        img, metadata = self.transforms[0](input_)
        img = self.transforms[1](img)
        img, _, _ = self.transforms[2](img, metadata["affine"])
        return self.transforms[3](img), metadata
img_transform = TestCompose(
    [
        LoadImage(image_only=False),
        EnsureChannelFirst(),
        Spacing(pixdim=(1.5, 1.5, 3.0)),
        RandAdjustContrast()
    ]
)
ArrayDataset(img_file_list, img_transform=img_transform)

Examples:

>>> ds = ArrayDataset([1, 2, 3, 4], lambda x: x + 0.1)
>>> print(ds[0])
1.1

>>> ds = ArrayDataset(img=[1, 2, 3, 4], seg=[5, 6, 7, 8])
>>> print(ds[0])
[1, 5]
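
The statement that the image, segmentation and label transforms share the same random seeds can be pictured with a toy example. This is a sketch of the idea only and does not use MONAI: `RandFlip` and `paired_item` are hypothetical, and seeding both transforms with the same value before each item is an assumption about how such synchronization can be achieved.

```python
import random


class RandFlip:
    """Toy random transform with its own random state (not a MONAI transform)."""

    def __init__(self):
        self.rng = random.Random()

    def set_random_state(self, seed):
        self.rng.seed(seed)

    def __call__(self, row):
        # reverse the sequence with probability 0.5
        return row[::-1] if self.rng.random() < 0.5 else row


def paired_item(img, seg, img_transform, seg_transform, seed):
    """Seed both transforms identically so they make the same random choice."""
    img_transform.set_random_state(seed)
    seg_transform.set_random_state(seed)
    return img_transform(img), seg_transform(seg)


img_t, seg_t = RandFlip(), RandFlip()
for seed in range(5):
    img, seg = paired_item([1, 2, 3], [10, 20, 30], img_t, seg_t, seed)
    print(seed, img, seg)  # either both are reversed or neither is
```

Because both transforms start from the same seed, the image and its segmentation always receive the same spatial change, which keeps them aligned.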

__init__(img, img_transform=None, seg=None, seg_transform=None, labels=None, label_transform=None)[source]#

Initializes the dataset with the filename lists. The transform img_transform is applied to the images and seg_transform to the segmentations.

Parameters:

img (Sequence) – sequence of images.
img_transform (UnionType[Callable, None]) – transform to apply to each element in img.
seg (UnionType[Sequence, None]) – sequence of segmentations.
seg_transform (UnionType[Callable, None]) – transform to apply to each element in seg.
labels (UnionType[Sequence, None]) – sequence of labels.
label_transform (UnionType[Callable, None]) – transform to apply to each element in labels.

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: None

ImageDataset#

class monai.data.ImageDataset(image_files, seg_files=None, labels=None, transform=None, seg_transform=None, label_transform=None, image_only=True, transform_with_metadata=False, dtype=<class 'numpy.float32'>, reader=None, *args, **kwargs)[source]#

Loads image/segmentation pairs of files from the given filename lists. Transformations can be specified for the image and segmentation arrays separately. The difference between this dataset and ArrayDataset is that this dataset can apply a transform chain to images and segs and return both the images and metadata, and there is no need to specify a transform to load images from files. For more information, please see the image_dataset demo in the MONAI tutorial repo, Project-MONAI/tutorials

__init__(image_files, seg_files=None, labels=None, transform=None, seg_transform=None, label_transform=None, image_only=True, transform_with_metadata=False, dtype=<class 'numpy.float32'>, reader=None, *args, **kwargs)[source]#

Initializes the dataset with the image and segmentation filename lists. The transform transform is applied to the images and seg_transform to the segmentations.

Parameters:

image_files (Sequence[str]) – list of image filenames.
seg_files (UnionType[Sequence[str], None]) – if in segmentation task, list of segmentation filenames.
labels (UnionType[Sequence[float], None]) – if in classification task, list of classification labels.
transform (UnionType[Callable, None]) – transform to apply to image arrays.
seg_transform (UnionType[Callable, None]) – transform to apply to segmentation arrays.
label_transform (UnionType[Callable, None]) – transform to apply to the label data.
image_only (bool) – if True return only the image volume, otherwise, return image volume and the metadata.
transform_with_metadata (bool) – if True, the metadata will be passed to the transforms whenever possible.
dtype (Union[dtype, type, str, None]) – if not None convert the loaded image to this data type.
reader (UnionType[ImageReader, str, None]) – register reader to load image file and metadata, if None, will use the default readers. If a string of reader name provided, will construct a reader object with the *args and **kwargs parameters, supported reader name: “NibabelReader”, “PILReader”, “ITKReader”, “NumpyReader”
args – additional parameters for reader if providing a reader name.
kwargs – additional parameters for reader if providing a reader name.

Raises:

ValueError – When seg_files length differs from image_files

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: None

NPZDictItemDataset#

class monai.data.NPZDictItemDataset(npzfile, keys, transform=None, other_keys=())[source]#

Represents a dataset from a loaded NPZ file. The members of the file to load are named in the keys of keys and stored under the keyed name. All loaded arrays must have the same 0-dimension (batch) size. Items are always dicts mapping names to an item extracted from the loaded arrays. If passing slicing indices, will return a PyTorch Subset, for example: data: Subset = dataset[1:4], for more details, please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset

Parameters:

npzfile (UnionType[str, IO]) – Path to .npz file or stream containing .npz file data
keys (dict[str, str]) – Maps keys to load from file to name to store in dataset
transform (UnionType[Callable[…, dict[str, Any]], None]) – Transform to apply to batch dict
other_keys (UnionType[Sequence[str], None]) – secondary data to load from file and store in dict other_keys, not returned by __getitem__

CSVDataset#

class monai.data.CSVDataset(src=None, row_indices=None, col_names=None, col_types=None, col_groups=None, transform=None, kwargs_read_csv=None, **kwargs)[source]#

Dataset to load data from CSV files and generate a list of dictionaries, every dictionary maps to a row of the CSV file, and the keys of dictionary map to the column names of the CSV file.

It can load multiple CSV files and join the tables with additional kwargs arg. It supports loading only specific rows and columns. It can also group several loaded columns to generate a new column, for example, set col_groups={"meta": ["meta_0", "meta_1", "meta_2"]}, output can be:

[
    {"image": "./image0.nii", "meta_0": 11, "meta_1": 12, "meta_2": 13, "meta": [11, 12, 13]},
    {"image": "./image1.nii", "meta_0": 21, "meta_1": 22, "meta_2": 23, "meta": [21, 22, 23]},
]
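
The effect of col_groups on each row can be reproduced with plain dictionaries. This is a minimal sketch, assuming the rows have already been loaded into a list of dictionaries; `group_columns` is a hypothetical helper, not part of MONAI or pandas.

```python
def group_columns(rows, col_groups):
    """Add one list-valued column per group to every row dictionary."""
    grouped = []
    for row in rows:
        new_row = dict(row)  # copy so that the input rows stay untouched
        for group_name, col_names in col_groups.items():
            new_row[group_name] = [row[name] for name in col_names]
        grouped.append(new_row)
    return grouped


rows = [
    {"image": "./image0.nii", "meta_0": 11, "meta_1": 12, "meta_2": 13},
    {"image": "./image1.nii", "meta_0": 21, "meta_1": 22, "meta_2": 23},
]
result = group_columns(rows, {"meta": ["meta_0", "meta_1", "meta_2"]})
print(result[0]["meta"])
print(result[1]["meta"])
```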

Parameters:

src (UnionType[str, Sequence[str], None]) – if the filename of a CSV file is provided, it can be a str, URL, path object or file-like object to load. A pandas DataFrame can also be provided directly, in which case loading from a filename is skipped. If a list of filenames or pandas DataFrames is provided, the tables will be joined.
row_indices (UnionType[Sequence[UnionType[int, str]], None]) – indices of the expected rows to load. It should be a list, every item can be an int number or a range [start, end) for the indices. For example: row_indices=[[0, 100], 200, 201, 202, 300]. If None, load all the rows in the file.
col_names (UnionType[Sequence[str], None]) – names of the expected columns to load. if None, load all the columns.
col_types (UnionType[dict[str, UnionType[dict[str, Any], None]], None]) –
type and default value to convert the loaded columns, if None, use original data. It should be a dictionary, every item maps to an expected column, the key is the column name and the value is None or a dictionary to define the default value and data type. The supported keys in the dictionary are: ["type", "default"]. For example:
```
col_types = {
    "subject_id": {"type": str},
    "label": {"type": int, "default": 0},
    "ehr_0": {"type": float, "default": 0.0},
    "ehr_1": {"type": float, "default": 0.0},
    "image": {"type": str, "default": None},
}
```
col_groups (UnionType[dict[str, Sequence[str]], None]) – args to group the loaded columns to generate a new column, it should be a dictionary, every item maps to a group, the key will be the new column name, the value is the names of columns to combine. for example: col_groups={"ehr": [f"ehr_{i}" for i in range(10)], "meta": ["meta_1", "meta_2"]}
transform (UnionType[Callable, None]) – transform to apply on the loaded items of a dictionary data.
kwargs_read_csv (UnionType[dict, None]) – dictionary args to pass to pandas read_csv function.
kwargs – additional arguments for pandas.merge() API to join tables.
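
The row_indices format mixes single indices with [start, end) ranges. The sketch below shows one way such a list could be expanded into individual row numbers; `expand_row_indices` is a hypothetical helper written for illustration and is not taken from MONAI.

```python
def expand_row_indices(row_indices):
    """Flatten a mix of int indices and [start, end) pairs into a list of ints."""
    rows = []
    for entry in row_indices:
        if isinstance(entry, (list, tuple)):
            if len(entry) != 2:
                raise ValueError("a range must be given as [start, end).")
            start, end = entry
            rows.extend(range(start, end))  # end is exclusive
        else:
            rows.append(entry)
    return rows


selected = expand_row_indices([[0, 100], 200, 201, 202, 300])
print(len(selected), selected[:3], selected[-4:])
```

For the example from the parameter description this selects rows 0 to 99 plus the four single rows, 104 rows in total.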

Patch-based dataset#

GridPatchDataset#

class monai.data.GridPatchDataset(data, patch_iter, transform=None, with_coordinates=True, cache=False, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, hash_func=<function pickle_hashing>)[source]#

Yields patches from data read from an image dataset. Typically used with PatchIter or PatchIterd so that the patches are chosen in a contiguous grid sampling scheme.

import numpy as np

from monai.data import GridPatchDataset, DataLoader, PatchIter
from monai.transforms import RandShiftIntensity

# image-level dataset
images = [np.arange(16, dtype=float).reshape(1, 4, 4),
          np.arange(16, dtype=float).reshape(1, 4, 4)]
# image-level patch generator, "grid sampling"
patch_iter = PatchIter(patch_size=(2, 2), start_pos=(0, 0))
# patch-level intensity shifts
patch_intensity = RandShiftIntensity(offsets=1.0, prob=1.0)

# construct the dataset
ds = GridPatchDataset(data=images,
                      patch_iter=patch_iter,
                      transform=patch_intensity)
# use the grid patch dataset
for item in DataLoader(ds, batch_size=2, num_workers=2):
    print("patch size:", item[0].shape)
    print("coordinates:", item[1])

# >>> patch size: torch.Size([2, 1, 2, 2])
#     coordinates: tensor([[[0, 1], [0, 2], [0, 2]],
#                          [[0, 1], [2, 4], [0, 2]]])

Parameters:

data (UnionType[Iterable, Sequence]) – the data source to read image data from.
patch_iter (Callable) – converts an input image (item from dataset) into an iterable of image patches. patch_iter(dataset[idx]) must yield a tuple: (patches, coordinates). see also: monai.data.PatchIter or monai.data.PatchIterd.
transform (UnionType[Callable, None]) – a callable data transform operates on the patches.
with_coordinates (bool) – whether to yield the coordinates of each patch, default to True.
cache (bool) – whether to use the cache mechanism, default to False. see also: monai.data.CacheDataset.
cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) will be used.
cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). The minimum of (cache_num, data_length x cache_rate, data_length) will be used.
num_workers (UnionType[int, None]) – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress (bool) – whether to display a progress bar.
copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.
as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. it may help improve the performance of following logic.
hash_func (Callable[…, bytes]) – a callable to compute hash from data items to be cached. defaults to monai.data.utils.pickle_hashing.

set_data(data)[source]#

Set the input data and run deterministic transforms to generate cache content.

Note: this function should be called after an entire epoch, and persistent_workers=False must be set in the PyTorch DataLoader, because new worker processes need to be created based on the newly generated cache content.

Return type:: None

PatchDataset#

class monai.data.PatchDataset(data, patch_func, samples_per_image=1, transform=None)[source]#

Yields patches from data read from an image dataset. The patches are generated by a user-specified callable patch_func, and are optionally post-processed by transform. For example, to generate random patch samples from an image dataset:

import numpy as np

from monai.data import PatchDataset, DataLoader
from monai.transforms import RandSpatialCropSamples, RandShiftIntensity

# image dataset
images = [np.arange(16, dtype=float).reshape(1, 4, 4),
          np.arange(16, dtype=float).reshape(1, 4, 4)]
# image patch sampler
n_samples = 5
sampler = RandSpatialCropSamples(roi_size=(3, 3), num_samples=n_samples,
                                 random_center=True, random_size=False)
# patch-level intensity shifts
patch_intensity = RandShiftIntensity(offsets=1.0, prob=1.0)
# construct the patch dataset
ds = PatchDataset(data=images,
                  patch_func=sampler,
                  samples_per_image=n_samples,
                  transform=patch_intensity)

# use the patch dataset, length: len(images) x samples_per_image
print(len(ds))

>>> 10

for item in DataLoader(ds, batch_size=2, shuffle=True, num_workers=2):
    print(item.shape)

>>> torch.Size([2, 1, 3, 3])

__init__(data, patch_func, samples_per_image=1, transform=None)[source]#

Parameters:

data (Sequence) – an image dataset to extract patches from.
patch_func (Callable) – converts an input image (item from dataset) into a sequence of image patches. patch_func(dataset[idx]) must return a sequence of patches (length samples_per_image).
samples_per_image (int) – patch_func should return a sequence of samples_per_image elements.
transform (UnionType[Callable, None]) – transform applied to each patch.
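
Since the dataset length is len(images) x samples_per_image, every flat index corresponds to one image and one patch of that image. The sketch below illustrates such a mapping; `locate_patch` is a hypothetical helper, and the assumption that patches of the same image occupy consecutive indices is made for illustration.

```python
def locate_patch(index, samples_per_image):
    """Map a flat dataset index to (image index, patch index within that image)."""
    return divmod(index, samples_per_image)


num_images, samples_per_image = 2, 5
length = num_images * samples_per_image
print(length)
print([locate_patch(i, samples_per_image) for i in range(length)])
```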

PatchIter#

class monai.data.PatchIter(patch_size, start_pos=(), mode=wrap, **pad_opts)[source]#

Return a patch generator with predefined properties such as patch_size. Typically used with monai.data.GridPatchDataset.

__call__(array)[source]#

Parameters:: array (~NdarrayTensor) – the image to generate patches from.
Return type:: Generator[tuple[~NdarrayTensor, ndarray], None, None]

__init__(patch_size, start_pos=(), mode=wrap, **pad_opts)[source]#

Parameters:

patch_size (Sequence[int]) – size of patches to generate slices for, 0/None selects whole dimension
start_pos (Sequence[int]) – starting position in the array, default is 0 for each dimension
mode (UnionType[str, None]) – available modes: (Numpy) {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"} (PyTorch) {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. If None, no wrapping is performed. Defaults to "wrap". See also: https://numpy.org/doc/stable/reference/generated/numpy.pad.html and https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html (PyTorch >= 1.10 is required for best compatibility).
pad_opts (dict) – other arguments for the np.pad function. note that np.pad treats channel dimension as the first dimension.

Note

The patch_size is the size of the patch to sample from the input arrays. It is assumed that the array's first dimension is the channel dimension, which will be yielded in its entirety, so it should not be specified in patch_size. For example, for an input 3D array with 1 channel of size (1, 20, 20, 20), a regular grid sampling of eight patches (1, 10, 10, 10) would be specified by a patch_size of (10, 10, 10).
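
The eight-patch example in this note can be verified with a short grid computation. This is a simplified sketch, not MONAI's implementation: `grid_patch_slices` is a hypothetical helper, it ignores padding and start_pos, and the order in which it yields patches is not meant to match the order used by PatchIter.

```python
from itertools import product


def grid_patch_slices(spatial_shape, patch_size):
    """Yield one tuple of slices per patch of a regular, non-overlapping grid."""
    # 0 or None in patch_size selects the whole dimension
    sizes = [dim if not p else p for dim, p in zip(spatial_shape, patch_size)]
    starts = [range(0, dim, size) for dim, size in zip(spatial_shape, sizes)]
    for corner in product(*starts):
        yield tuple(slice(s, s + size) for s, size in zip(corner, sizes))


# the channel dimension of a (1, 20, 20, 20) array is not part of patch_size
patches = list(grid_patch_slices(spatial_shape=(20, 20, 20), patch_size=(10, 10, 10)))
print(len(patches))
```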

PatchIterd#

class monai.data.PatchIterd(keys, patch_size, start_pos=(), mode=wrap, **pad_opts)[source]#

Dictionary-based wrapper of monai.data.PatchIter. Returns a patch generator for dictionary data and the coordinates. Typically used with monai.data.GridPatchDataset. All the expected fields specified by keys are assumed to have the same shape.

Parameters:

keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to iterate patches.
patch_size (Sequence[int]) – size of patches to generate slices for, 0/None selects whole dimension
start_pos (Sequence[int]) – starting position in the array, default is 0 for each dimension
mode (UnionType[str, None]) – available modes: (Numpy) {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"} (PyTorch) {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. If None, no wrapping is performed. Defaults to "wrap". See also: https://numpy.org/doc/stable/reference/generated/numpy.pad.html and https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html (PyTorch >= 1.10 is required for best compatibility).
pad_opts – other arguments for the np.pad function. note that np.pad treats channel dimension as the first dimension.

__call__(data)[source]#

Call self as a function.

Return type:: Generator[tuple[Mapping[Hashable, ~NdarrayTensor], ndarray], None, None]

Image reader#

ImageReader#

class monai.data.ImageReader[source]#

An abstract class that defines the APIs to load image files.

Typical usage of an implementation of this class is:

image_reader = MyImageReader()
img_obj = image_reader.read(path_to_image)
img_data, meta_data = image_reader.get_data(img_obj)

The read call converts image filenames into image objects.
The get_data call fetches the image data, as well as metadata.
A reader should implement verify_suffix to check, based on the filename extension, whether the input filename is supported.

abstractmethod get_data(img)[source]#

Extract data array and metadata from loaded image and return them. This function must return two objects, the first is a numpy array of image data, the second is a dictionary of metadata.

Parameters:: img – an image object loaded from an image file or a list of image objects.
Return type:: tuple[ndarray, dict]

abstractmethod read(data, **kwargs)[source]#

Read image data from specified file or files. Note that it returns a data object or a sequence of data objects.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read.
kwargs – additional args for actual read API of 3rd party libs.

Return type:

UnionType[Sequence[Any], Any]

abstractmethod verify_suffix(filename)[source]#

Verify whether the specified filename is supported by the current reader. This method should return True if the reader is able to read the format suggested by the filename.

Parameters:: filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read. if a list of files, verify all the suffixes.
Return type:: bool
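
A suffix check that accepts either a single file name or a list of file names could look like the sketch below. It is an illustration only: `has_supported_suffix` is a hypothetical helper rather than the logic used by the MONAI readers, and the suffix lists passed to it are examples.

```python
import os
from pathlib import PurePath


def has_supported_suffix(filename, suffixes):
    """Return True only if every given file name ends with a supported suffix."""
    names = [filename] if isinstance(filename, (str, os.PathLike)) else list(filename)
    for name in names:
        # join all suffixes so that compound extensions such as ".nii.gz" are seen
        full_suffix = "".join(PurePath(name).suffixes).lower()
        if not any(full_suffix.endswith(f".{s.lower()}") for s in suffixes):
            return False
    return True


print(has_supported_suffix("data/image.nii.gz", ["nii", "nii.gz"]))
print(has_supported_suffix(["a.nii", "b.png"], ["nii", "nii.gz"]))
```

When a list is given, a single unsupported file makes the whole check fail, which matches the "verify all the suffixes" wording above.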

ITKReader#

class monai.data.ITKReader(channel_dim=None, series_name='', reverse_indexing=False, series_meta=False, affine_lps_to_ras=True, **kwargs)[source]#

Load medical images based on ITK library. All the supported image formats can be found at: InsightSoftwareConsortium/ITK. The loaded data array will be in C order, for example, a 3D image NumPy array index order will be CDWH.

Parameters:

channel_dim (UnionType[str, int, None]) –
the channel dimension of the input image, default is None. This is used to set original_channel_dim in the metadata, EnsureChannelFirstD reads this field. If None, original_channel_dim will be either no_channel or -1.
- Nifti file is usually “channel last”, so there is no need to specify this argument.
- PNG file usually has GetNumberOfComponentsPerPixel()==3, so there is no need to specify this argument.
series_name (str) – the name of the DICOM series if there are multiple ones. used when loading DICOM series.
reverse_indexing (bool) – whether to use a reversed spatial indexing convention for the returned data array. If False, the spatial indexing convention is reversed to be compatible with ITK; otherwise, the spatial indexing follows the numpy convention. Default is False. This option does not affect the metadata.
series_meta (bool) – whether to load the metadata of the DICOM series (using the metadata from the first slice). This flag is checked only when loading DICOM series. Default is False.
affine_lps_to_ras (bool) – whether to convert the affine matrix from “LPS” to “RAS”. Defaults to True. Set to True to be consistent with NibabelReader, otherwise the affine matrix remains in the ITK convention.
kwargs – additional args for itk.imread API. more details about available args: InsightSoftwareConsortium/ITK
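
The LPS and RAS conventions differ in the direction of the first two world axes, so converting a 3D affine between them amounts to negating its first two rows. The sketch below shows this with plain nested lists; `lps_to_ras` is a hypothetical helper, and the affine values are made up for the example.

```python
def lps_to_ras(affine):
    """Negate the first two rows of a 4x4 affine (left-multiply by diag(-1, -1, 1, 1))."""
    flip = (-1.0, -1.0, 1.0, 1.0)
    return [[flip[i] * value for value in row] for i, row in enumerate(affine)]


affine_lps = [
    [2.0, 0.0, 0.0, 10.0],
    [0.0, 2.0, 0.0, 20.0],
    [0.0, 0.0, 3.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
affine_ras = lps_to_ras(affine_lps)
print(affine_ras[0])
print(affine_ras[1])
```

Applying the same flip twice returns the original matrix, so the same operation also converts from RAS back to LPS.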

get_data(img)[source]#

Extract data array and metadata from loaded image and return them. This function returns two objects, first is numpy array of image data, second is dict of metadata. It constructs affine, original_affine, and spatial_shape and stores them in meta dict. When loading a list of files, they are stacked together at a new dimension as the first dimension, and the metadata of the first image is used to represent the output metadata.

Parameters:: img – an ITK image object loaded from an image file or a list of ITK image objects.
Return type:: tuple[ndarray, dict]

read(data, **kwargs)[source]#

Read image data from specified file or files, it can read a list of images and stack them together as multi-channel data in get_data(). If passing directory path instead of file path, will treat it as DICOM images series and read. Note that the returned object is ITK image object or list of ITK image objects.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read,
kwargs – additional args for itk.imread API, will override self.kwargs for existing keys. More details about available args: InsightSoftwareConsortium/ITK

verify_suffix(filename)[source]#

Verify whether the specified file or files format is supported by ITK reader.

Parameters:: filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read. if a list of files, verify all the suffixes.
Return type:: bool

NibabelReader#

class monai.data.NibabelReader(channel_dim=None, as_closest_canonical=False, squeeze_non_spatial_dims=False, to_gpu=False, **kwargs)[source]#

Load NIfTI format images based on Nibabel library.

Parameters:

channel_dim (UnionType[str, int, None]) – the channel dimension of the input image, default is None. this is used to set original_channel_dim in the metadata, EnsureChannelFirstD reads this field. if None, original_channel_dim will be either no_channel or -1. most Nifti files are usually “channel last”, no need to specify this argument for them.
as_closest_canonical (bool) – if True, load the image as closest to canonical axis format.
squeeze_non_spatial_dims (bool) – if True, non-spatial singletons will be squeezed, e.g. (256,256,1,3) -> (256,256,3)
to_gpu (bool) – If True, load the image into GPU memory using CuPy and Kvikio. This can accelerate data loading. Default is False. CuPy and Kvikio are required for this option. Note: For compressed NIfTI files, some operations may still be performed on CPU memory, and the acceleration may not be significant. In some cases, it may be slower than loading on CPU.
kwargs – additional args for nibabel.load API. more details about available args: nipy/nibabel

get_data(img)[source]#

Parameters:: img – a Nibabel image object loaded from an image file or a list of Nibabel image objects.
Return type:: tuple[ndarray, dict]

read(data, **kwargs)[source]#

Read image data from specified file or files, it can read a list of images and stack them together as multi-channel data in get_data(). Note that the returned object is Nibabel image object or list of Nibabel image objects.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read.
kwargs – additional args for nibabel.load API, will override self.kwargs for existing keys. More details about available args: nipy/nibabel

verify_suffix(filename)[source]#

Verify whether the specified file or files format is supported by Nibabel reader.

Parameters:: filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read. if a list of files, verify all the suffixes.
Return type:: bool

warmup_kvikio()[source]#: Warm up the Kvikio library to initialize the internal buffers, cuFile, GDS, etc. This can accelerate the data loading process when to_gpu is set to True.

NumpyReader#

class monai.data.NumpyReader(npz_keys=None, channel_dim=None, **kwargs)[source]#

Load NPY or NPZ format data based on Numpy library, they can be arrays or pickled objects. A typical usage is to load the mask data for classification task. It can load part of the npz file with specified npz_keys.

Parameters:

npz_keys (Union[Collection[Hashable], Hashable, None]) – if loading npz file, only load the specified keys, if None, load all the items. stack the loaded items together to construct a new first dimension.
channel_dim (UnionType[str, int, None]) – if not None, explicitly specify the channel dim, otherwise, treat the array as no channel.
kwargs – additional args for numpy.load API except allow_pickle. more details about available args: https://numpy.org/doc/stable/reference/generated/numpy.load.html

get_data(img)[source]#

Parameters:: img – a Numpy array loaded from a file or a list of Numpy arrays.
Return type:: tuple[ndarray, dict]

read(data, **kwargs)[source]#

Read image data from specified file or files, it can read a list of data files and stack them together as multi-channel data in get_data(). Note that the returned object is Numpy array or list of Numpy arrays.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read.
kwargs – additional args for numpy.load API except allow_pickle, will override self.kwargs for existing keys. More details about available args: https://numpy.org/doc/stable/reference/generated/numpy.load.html

verify_suffix(filename)[source]#

Verify whether the specified file or files format is supported by Numpy reader.

Parameters:: filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read. if a list of files, verify all the suffixes.
Return type:: bool

PILReader#

class monai.data.PILReader(converter=None, reverse_indexing=True, **kwargs)[source]#

Load a file or files in common 2D image formats (PNG, JPG and BMP are supported) from the provided path.

Parameters:

converter (UnionType[Callable, None]) – additional function to convert the image data after read(). For example, use converter=lambda image: image.convert("LA") to convert the image format.
reverse_indexing (bool) – whether to swap axis 0 and 1 after loading the array, this is enabled by default, so that output of the reader is consistent with the other readers. Set this option to False to use the PIL backend’s original spatial axes convention.
kwargs – additional args for Image.open API in read(). More details about available args: https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.open

get_data(img)[source]#

Extract data array and metadata from loaded image and return them. This function returns two objects, first is numpy array of image data, second is dict of metadata. It computes spatial_shape and stores it in meta dict. When loading a list of files, they are stacked together at a new dimension as the first dimension, and the metadata of the first image is used to represent the output metadata. Note that by default self.reverse_indexing is set to True, which swaps axis 0 and 1 after loading the array because the spatial axes definition in PIL is different from other common medical packages.

Parameters:: img – a PIL Image object loaded from a file or a list of PIL Image objects.
Return type:: tuple[ndarray, dict]
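On a 2D array, the effect of reverse_indexing is a swap of axes 0 and 1, that is, a transpose. The pure-Python sketch below uses nested lists in place of an image array; `swap_first_two_axes` is a hypothetical name chosen for illustration.

```python
def swap_first_two_axes(rows: list) -> list:
    """Transpose a 2D nested list, i.e. swap axis 0 and axis 1."""
    return [list(col) for col in zip(*rows)]


# A 2 x 3 array (2 rows, 3 columns) becomes 3 x 2 after the swap.
image = [[1, 2, 3],
         [4, 5, 6]]
swapped = swap_first_two_axes(image)
```

Applying the swap twice restores the original layout, which is why setting reverse_indexing to False simply keeps the PIL convention.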

read(data, **kwargs)[source]#

Read image data from the specified file or files. It can read a list of images, which get_data() then stacks together as multi-channel data. Note that the returned object is a PIL image or a list of PIL images.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike, ndarray]) – file name or a list of file names to read.
kwargs – additional args for Image.open API in read(), will override self.kwargs for existing keys. More details about available args: https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.open

verify_suffix(filename)[source]#

Verify whether the specified file or files format is supported by PIL reader.

Parameters:: filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read. if a list of files, verify all the suffixes.
Return type:: bool

NrrdReader#

class monai.data.NrrdReader(channel_dim=None, dtype=<class 'numpy.float32'>, index_order='F', affine_lps_to_ras=True, **kwargs)[source]#

Load NRRD format images based on pynrrd library.

Parameters:

channel_dim (UnionType[str, int, None]) – the channel dimension of the input image, default is None. This is used to set original_channel_dim in the metadata, EnsureChannelFirstD reads this field. If None, original_channel_dim will be either no_channel or 0. NRRD files are usually “channel first”.
dtype (UnionType[dtype, type, str, None]) – dtype of the data array when loading image.
index_order (str) – specify whether the returned data array should be in C-order (‘C’) or Fortran-order (‘F’). NumPy arrays are usually in C-order, but the default in the NRRD header is F.
affine_lps_to_ras (bool) – whether to convert the affine matrix from “LPS” to “RAS”. Defaults to True. Set to True to be consistent with NibabelReader, otherwise the affine matrix is unmodified.
kwargs – additional args for nrrd.read API. More details about available args are in the pynrrd repository (mhe/pynrrd).
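LPS and RAS differ in the direction of the first two world axes, so converting a 4 x 4 affine between them amounts to negating its first two rows. The sketch below shows this with plain nested lists; `lps_to_ras` is a hypothetical helper, and the conversion performed inside the reader may be organised differently.

```python
def lps_to_ras(affine: list) -> list:
    """Negate the first two rows of a 4x4 affine (left-multiply by diag(-1, -1, 1, 1))."""
    signs = (-1.0, -1.0, 1.0, 1.0)
    return [[s * v for v in row] for s, row in zip(signs, affine)]


lps = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 20.0],
    [0.0, 0.0, 2.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
ras = lps_to_ras(lps)
```

The same function converts back from RAS to LPS, since the sign flip is its own inverse.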

get_data(img)[source]#

Extract data array and metadata from loaded image and return them. This function must return two objects, the first is a numpy array of image data, the second is a dictionary of metadata.

Parameters:: img (UnionType[NrrdImage, list[NrrdImage]]) – a NrrdImage loaded from an image file or a list of image objects.
Return type:: tuple[ndarray, dict]

read(data, **kwargs)[source]#

Read image data from specified file or files. Note that it returns a data object or a sequence of data objects.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read.
kwargs – additional args for actual read API of 3rd party libs.

Return type:

UnionType[Sequence[Any], Any]

verify_suffix(filename)[source]#

Verify whether the specified filename is supported by pynrrd reader.

Parameters:: filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read. if a list of files, verify all the suffixes.
Return type:: bool

Image writer#

resolve_writer#

monai.data.resolve_writer(ext_name, error_if_not_found=True)[source]#

Resolves to a tuple of available ImageWriter in SUPPORTED_WRITERS according to the filename extension key ext_name.

Parameters:

ext_name – the filename extension of the image. As an indexing key it will be converted to a lower case string.
error_if_not_found – whether to raise an error if no suitable image writer is found. If True, raise an OptionalImportError; otherwise return an empty tuple. Default is True.

Return type:

Sequence

register_writer#

monai.data.register_writer(ext_name, *im_writers)[source]#

Register ImageWriter classes so that writing a file with the filename extension ext_name can be resolved to a tuple of potentially appropriate ImageWriter classes. Customised writers can be registered as follows:

from monai.data import register_writer
# `MyWriter` must implement `ImageWriter` interface
register_writer("nii", MyWriter)

Parameters:

ext_name – the filename extension of the image. As an indexing key, it will be converted to a lower case string.
im_writers – one or multiple ImageWriter classes with high priority ones first.
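The lookup behaviour described for register_writer and resolve_writer can be modelled with a small dictionary keyed by the lower-cased extension. Everything below is an assumption made for illustration: `_REGISTRY`, `register`, `resolve`, the placeholder writer classes, the use of LookupError, and the choice to put newly registered writers ahead of earlier ones. It is not MONAI's internal code.

```python
_REGISTRY: dict = {}  # hypothetical stand-in for the writer table


class WriterA: ...
class WriterB: ...


def register(ext_name: str, *writers: type) -> None:
    """Register writer classes for an extension, highest priority first."""
    key = str(ext_name).lower()
    # Assumption: newly registered writers take priority over earlier ones.
    _REGISTRY[key] = tuple(writers) + _REGISTRY.get(key, ())


def resolve(ext_name: str, error_if_not_found: bool = True) -> tuple:
    """Return the tuple of writers registered for an extension."""
    found = _REGISTRY.get(str(ext_name).lower(), ())
    if not found and error_if_not_found:
        raise LookupError(f"no writer registered for {ext_name!r}")
    return found


register("nii", WriterA)
register("NII", WriterB)  # same key after lower-casing
```

A caller would typically try the returned writers in order and use the first one that works.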

ImageWriter#

class monai.data.ImageWriter(**kwargs)[source]#

The class is a collection of utilities to write images to disk.

Main aspects to be considered are:

dimensionality of the data array, arrangements of spatial dimensions and channel/time dimensions: see convert_to_channel_last()
metadata of the current affine and output affine, the data array should be converted accordingly: see get_meta_info() and resample_if_needed()
data type handling of the output image (as part of resample_if_needed())

Subclasses of this class should implement the backend-specific functions:

set_data_array() to set the data array (input must be a numpy array or torch tensor); this method sets the backend object’s data part
set_metadata() to set the metadata and output affine; this method sets the metadata including affine handling and image resampling
create_backend_obj() to create the backend-specific data object
write() as the backend-specific writing function

The primary usage of subclasses of ImageWriter is:

writer = MyWriter()  # subclass of ImageWriter
writer.set_data_array(data_array)
writer.set_metadata(meta_dict)
writer.write(filename)

This creates an image writer object based on data_array and meta_dict and writes it to filename.

It supports up to three spatial dimensions (the resampling step supports both 2D and 3D). When saving a data_array with multiple time steps or multiple channels, the time and/or modality axes should be at channel_dim. For example, with channel_dim=0, the shape of 2D eight-class segmentation probabilities to be saved could be (8, 64, 64); in this case data_array will be converted to (64, 64, 1, 8) (the third dimension is reserved as a spatial dimension).
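The shape bookkeeping in this example can be traced with plain tuples. The sketch below is a simplified model for a single integer channel_dim; `channel_last_shape` is a hypothetical helper that only computes shapes and moves no data.

```python
def channel_last_shape(shape: tuple, channel_dim: int = 0, spatial_ndim: int = 3) -> tuple:
    """Move the channel axis last and pad the spatial axes with 1s up to `spatial_ndim`."""
    dims = list(shape)
    channels = dims.pop(channel_dim)          # take the channel axis out
    dims += [1] * (spatial_ndim - len(dims))  # reserve missing spatial axes as size 1
    return tuple(dims) + (channels,)


out_shape = channel_last_shape((8, 64, 64), channel_dim=0)
```

Here the two spatial axes of size 64 are kept, a third spatial axis of size 1 is reserved, and the eight classes end up on the last axis.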

The metadata could optionally have the following keys:

'original_affine': the original affine of the data; it will be the affine of the output object, defaulting to an identity matrix.

'affine': it should specify the current data affine, defaulting to an identity matrix.

'spatial_shape': for data output spatial shape.

When metadata is specified, the saver may resample data from the space defined by “affine” to the space defined by “original_affine”. For more details, please refer to the resample_if_needed method.

__init__(**kwargs)[source]#: The constructor supports adding new instance members. The current member in the base class is self.data_obj, the subclasses can add more members, so that necessary meta information can be stored in the object and shared among the class methods.

classmethod convert_to_channel_last(data, channel_dim=0, squeeze_end_dims=True, spatial_ndim=3, contiguous=False)[source]#

Rearrange the data array axes to make the channel_dim-th dim the last dimension and ensure there are spatial_ndim number of spatial dimensions.

When squeeze_end_dims is True, a postprocessing step will be applied to remove any trailing singleton dimensions.

Parameters:

data (Union[ndarray, Tensor]) – input data to be converted to “channel-last” format.
channel_dim (UnionType[None, int, Sequence[int]]) – specifies the channel axes of the data array to move to the last. None indicates no channel dimension, a new axis will be appended as the channel dimension. a sequence of integers indicates multiple non-spatial dimensions.
squeeze_end_dims (bool) – if True, any trailing singleton dimensions will be removed (after the channel has been moved to the end). So if input is (H,W,D,C) and C==1, then it will be saved as (H,W,D). If D is also 1, it will be saved as (H,W). If False, image will always be saved as (H,W,D,C).
spatial_ndim (UnionType[int, None]) – modify the spatial dims if needed, so that the output has at least this number of spatial dims. If None, the output will have the same number of spatial dimensions as the input.
contiguous (bool) – if True, the output will be contiguous.
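The squeeze_end_dims rule only touches trailing axes of size 1. A shape-only sketch follows, with `squeeze_trailing_ones` as a hypothetical helper name; keeping at least one axis is an assumption of the sketch.

```python
def squeeze_trailing_ones(shape: tuple) -> tuple:
    """Drop size-1 axes from the end of a shape, stopping at the first axis whose size is not 1."""
    dims = list(shape)
    # Assumption: always keep at least one axis.
    while len(dims) > 1 and dims[-1] == 1:
        dims.pop()
    return tuple(dims)
```

For instance, a (H, W, D, C) shape with C == 1 loses its last axis, and loses D as well when D == 1; singleton axes that are followed by a larger axis are left alone.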

classmethod create_backend_obj(data_array, **kwargs)[source]#

Subclass should implement this method to return a backend-specific data representation object. This method is used by cls.write and the input data_array is assumed ‘channel-last’.

Return type:: ndarray

classmethod get_meta_info(metadata=None)[source]#: Extracts relevant meta information from the metadata object (using .get). Optional keys are "spatial_shape", MetaKeys.AFFINE, "original_affine".

classmethod resample_if_needed(data_array, affine=None, target_affine=None, output_spatial_shape=None, mode=bilinear, padding_mode=border, align_corners=False, dtype=<class 'numpy.float64'>)[source]#

Convert the data_array into the coordinate system specified by target_affine, from the current coordinate definition of affine.

If the transform between affine and target_affine could be achieved by simply transposing and flipping data_array, no resampling will happen. Otherwise, this function resamples data_array using the transformation computed from affine and target_affine.

This function assumes the NIfTI dimension notations. Spatially it supports up to three dimensions, that is, H, HW, HWD for 1D, 2D, 3D respectively. When saving multiple time steps or multiple channels, time and/or modality axes should be appended after the first three dimensions. For example, shape of 2D eight-class segmentation probabilities to be saved could be (64, 64, 1, 8). Also, data in shape (64, 64, 8) or (64, 64, 8, 1) will be considered as a single-channel 3D image. The convert_to_channel_last method can be used to convert the data to the format described here.

Note that the shape of the resampled data_array may be subject to some rounding errors. For example, resampling a 20x20-pixel image from pixel size (1.5, 1.5)-mm to (3.0, 3.0)-mm space will return a 10x10-pixel image. However, resampling a 20x20-pixel image from pixel size (2.0, 2.0)-mm to (3.0, 3.0)-mm space will output a 14x14-pixel image, where the image shape is rounded from 13.333x13.333 pixels. In this case output_spatial_shape could be specified so that this function writes image data to a designated shape.
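The two figures quoted above are consistent with rounding the scaled size up to the next whole pixel. The sketch below reproduces only that arithmetic; `resampled_size` is a hypothetical helper, and the rounding rule is an assumption inferred from the example, not a statement of how resample_if_needed computes its output shape.

```python
import math


def resampled_size(n_pixels: int, in_spacing: float, out_spacing: float) -> int:
    """Pixels needed along one axis after changing the pixel spacing (rounded up)."""
    return math.ceil(n_pixels * in_spacing / out_spacing)


exact = resampled_size(20, 1.5, 3.0)    # 20 * 1.5 / 3.0 = 10 exactly
rounded = resampled_size(20, 2.0, 3.0)  # 20 * 2.0 / 3.0 = 13.333..., rounded up
```

When the ratio of spacings does not divide the image size evenly, passing output_spatial_shape removes the ambiguity.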

Parameters:

data_array (Union[ndarray, Tensor]) – input data array to be converted.
affine (Union[ndarray, Tensor, None]) – the current affine of data_array. Defaults to identity
target_affine (Union[ndarray, Tensor, None]) – the designated affine of data_array. The actual output affine might be different from this value due to precision changes.
output_spatial_shape (UnionType[Sequence[int], int, None]) – spatial shape of the output image. This option is used when resampling is needed.
mode (str) – available options are {"bilinear", "nearest", "bicubic"}. This option is used when resampling is needed. Interpolation mode to calculate output values. Defaults to "bilinear". See also: https://pytorch.org/docs/stable/nn.functional.html#grid-sample
padding_mode (str) – available options are {"zeros", "border", "reflection"}. This option is used when resampling is needed. Padding mode for outside grid values. Defaults to "border". See also: https://pytorch.org/docs/stable/nn.functional.html#grid-sample
align_corners (bool) – boolean option of grid_sample to handle the corner convention. See also: https://pytorch.org/docs/stable/nn.functional.html#grid-sample
dtype (Union[dtype, type, str, None]) – data type for resampling computation. Defaults to np.float64 for best precision. If None, use the data type of input data. The output data type of this method is always np.float32.

write(filename, verbose=True, **kwargs)[source]#: subclass should implement this method to call the backend-specific writing APIs.

ITKWriter#

class monai.data.ITKWriter(output_dtype=<class 'numpy.float32'>, affine_lps_to_ras=True, **kwargs)[source]#

Write data and metadata into files on disk using ITK-python.

import numpy as np
from monai.data import ITKWriter

np_data = np.arange(48).reshape(3, 4, 4)

# write as 3d spatial image no channel
writer = ITKWriter(output_dtype=np.float32)
writer.set_data_array(np_data, channel_dim=None)
# optionally set metadata affine
writer.set_metadata({"affine": np.eye(4), "original_affine": -1 * np.eye(4)})
writer.write("test1.nii.gz")

# write as 2d image, channel-first
writer = ITKWriter(output_dtype=np.uint8)
writer.set_data_array(np_data, channel_dim=0)
writer.set_metadata({"spatial_shape": (5, 5)})
writer.write("test1.png")

__init__(output_dtype=<class 'numpy.float32'>, affine_lps_to_ras=True, **kwargs)[source]#

Parameters:

output_dtype (Union[dtype, type, str, None]) – output data type.
affine_lps_to_ras (UnionType[bool, None]) – whether to convert the affine matrix from “LPS” to “RAS”. Defaults to True. Set to True to be consistent with NibabelWriter, otherwise the affine matrix is assumed already in the ITK convention. Set to None to use data_array.meta[MetaKeys.SPACE] to determine the flag.
kwargs – keyword arguments passed to ImageWriter.

The constructor will create self.output_dtype internally. affine and channel_dim are initialized as instance members (default None, 0):

user-specified affine should be set in set_metadata,

user-specified channel_dim should be set in set_data_array.

classmethod create_backend_obj(data_array, channel_dim=0, affine=None, dtype=<class 'numpy.float32'>, affine_lps_to_ras=True, **kwargs)[source]#

Create an ITK object from data_array. This method assumes a ‘channel-last’ data_array.

Parameters:

data_array (Union[ndarray, Tensor]) – input data array.
channel_dim (UnionType[int, None]) – channel dimension of the data array. This is used to create a Vector Image if it is not None.
affine (Union[ndarray, Tensor, None]) – affine matrix of the data array. This is used to compute spacing, direction and origin.
dtype (Union[dtype, type, str, None]) – output data type.
affine_lps_to_ras (UnionType[bool, None]) – whether to convert the affine matrix from “LPS” to “RAS”. Defaults to True. Set to True to be consistent with NibabelWriter, otherwise the affine matrix is assumed already in the ITK convention. Set to None to use data_array.meta[MetaKeys.SPACE] to determine the flag.
kwargs – keyword arguments. Current itk.GetImageFromArray will read ttype from this dictionary.

See also

InsightSoftwareConsortium/ITK

set_data_array(data_array, channel_dim=0, squeeze_end_dims=True, **kwargs)[source]#

Convert data_array into ‘channel-last’ numpy ndarray.

Parameters:

data_array (Union[ndarray, Tensor]) – input data array with the channel dimension specified by channel_dim.
channel_dim (UnionType[int, None]) – channel dimension of the data array. Defaults to 0. None indicates data without any channel dimension.
squeeze_end_dims (bool) – if True, any trailing singleton dimensions will be removed.
kwargs – keyword arguments passed to self.convert_to_channel_last, currently supporting spatial_ndim and contiguous, defaulting to 3 and False respectively.

set_metadata(meta_dict=None, resample=True, **options)[source]#

Resample self.data_obj if needed. This method assumes self.data_obj is a ‘channel-last’ ndarray.

Parameters:

meta_dict (UnionType[Mapping, None]) – a metadata dictionary for affine, original affine and spatial shape information. Optional keys are "spatial_shape", "affine", "original_affine".
resample (bool) – if True, the data will be resampled to the original affine (specified in meta_dict).
options – keyword arguments passed to self.resample_if_needed, currently support mode, padding_mode, align_corners, and dtype, defaulting to bilinear, border, False, and np.float64 respectively.

write(filename, verbose=False, **kwargs)[source]#

Create an ITK object from self.create_backend_obj(self.data_obj, ...) and call itk.imwrite.

Parameters:

filename (Union[str, PathLike]) – filename or PathLike object.
verbose (bool) – if True, log the progress.
kwargs – keyword arguments passed to itk.imwrite, currently support compression and imageio.

See also

InsightSoftwareConsortium/ITK

NibabelWriter#

class monai.data.NibabelWriter(output_dtype=<class 'numpy.float32'>, **kwargs)[source]#

Write data and metadata into files on disk using Nibabel.

import numpy as np
from monai.data import NibabelWriter

np_data = np.arange(48).reshape(3, 4, 4)
writer = NibabelWriter()
writer.set_data_array(np_data, channel_dim=None)
writer.set_metadata({"affine": np.eye(4), "original_affine": np.eye(4)})
writer.write("test1.nii.gz", verbose=True)

__init__(output_dtype=<class 'numpy.float32'>, **kwargs)[source]#

Parameters:

output_dtype (Union[dtype, type, str, None]) – output data type.
kwargs – keyword arguments passed to ImageWriter.

The constructor will create self.output_dtype internally. affine is initialized as an instance member (default None); a user-specified affine should be set in set_metadata.

classmethod create_backend_obj(data_array, affine=None, dtype=None, **kwargs)[source]#

Create a Nifti1Image object from data_array. This method assumes a ‘channel-last’ data_array.

Parameters:

data_array (Union[ndarray, Tensor]) – input data array.
affine (Union[ndarray, Tensor, None]) – affine matrix of the data array.
dtype (Union[dtype, type, str, None]) – output data type.
kwargs – keyword arguments. Current nib.nifti1.Nifti1Image will read header, extra, file_map from this dictionary.

See also

https://nipy.org/nibabel/reference/nibabel.nifti1.html#nibabel.nifti1.Nifti1Image

set_data_array(data_array, channel_dim=0, squeeze_end_dims=True, **kwargs)[source]#

Convert data_array into ‘channel-last’ numpy ndarray.

Parameters:

data_array (Union[ndarray, Tensor]) – input data array with the channel dimension specified by channel_dim.
channel_dim (UnionType[int, None]) – channel dimension of the data array. Defaults to 0. None indicates data without any channel dimension.
squeeze_end_dims (bool) – if True, any trailing singleton dimensions will be removed.
kwargs – keyword arguments passed to self.convert_to_channel_last, currently supporting spatial_ndim, defaulting to 3.

set_metadata(meta_dict, resample=True, **options)[source]#

Resample self.data_obj if needed. This method assumes self.data_obj is a ‘channel-last’ ndarray.

Parameters:

meta_dict (UnionType[Mapping, None]) – a metadata dictionary for affine, original affine and spatial shape information. Optional keys are "spatial_shape", "affine", "original_affine".
resample (bool) – if True, the data will be resampled to the original affine (specified in meta_dict).
options – keyword arguments passed to self.resample_if_needed, currently support mode, padding_mode, align_corners, and dtype, defaulting to bilinear, border, False, and np.float64 respectively.

write(filename, verbose=False, **obj_kwargs)[source]#

Create a Nibabel object from self.create_backend_obj(self.data_obj, ...) and call nib.save.

Parameters:

filename (Union[str, PathLike]) – filename or PathLike object.
verbose (bool) – if True, log the progress.
obj_kwargs – keyword arguments passed to self.create_backend_obj.

See also

https://nipy.org/nibabel/reference/nibabel.nifti1.html#nibabel.nifti1.save

PILWriter#

class monai.data.PILWriter(output_dtype=<class 'numpy.float32'>, channel_dim=0, scale=255, **kwargs)[source]#

Write image data into files on disk using pillow.

It’s based on the Image module in PIL library: https://pillow.readthedocs.io/en/stable/reference/Image.html

import numpy as np
from monai.data import PILWriter

np_data = np.arange(48).reshape(3, 4, 4)
writer = PILWriter(np.uint8)
writer.set_data_array(np_data, channel_dim=0)
writer.write("test1.png", verbose=True)

__init__(output_dtype=<class 'numpy.float32'>, channel_dim=0, scale=255, **kwargs)[source]#

Parameters:

output_dtype (Union[dtype, type, str, None]) – output data type.
channel_dim (UnionType[int, None]) – channel dimension of the data array. Defaults to 0. None indicates data without any channel dimension.
scale (UnionType[int, None]) – {255, 65535} postprocess data by clipping to [0, 1] and scaling to [0, 255] (uint8) or [0, 65535] (uint16). Defaults to 255; set to None to disable scaling.
kwargs – keyword arguments passed to ImageWriter.
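The clip-then-scale step can be written out for a handful of values. This is a pure-Python sketch; `clip_and_scale` is a hypothetical helper, and the final cast to an integer dtype is deliberately left out.

```python
from typing import Optional


def clip_and_scale(values: list, scale: Optional[int] = 255) -> list:
    """Clip each value to [0, 1] and multiply by `scale`; `scale=None` leaves values unchanged."""
    if scale is None:
        return list(values)
    return [min(max(float(v), 0.0), 1.0) * scale for v in values]


scaled = clip_and_scale([-0.5, 0.0, 0.25, 1.0, 2.0], scale=255)
```

Values outside [0, 1] saturate at the ends of the output range, so data that is not already normalised should be rescaled before writing.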

classmethod create_backend_obj(data_array, dtype=None, scale=255, reverse_indexing=True, **kwargs)[source]#

Create a PIL object from data_array.

Parameters:

data_array (Union[ndarray, Tensor]) – input data array.
dtype (Union[dtype, type, str, None]) – output data type.
scale (UnionType[int, None]) – {255, 65535} postprocess data by clipping to [0, 1] and scaling to [0, 255] (uint8) or [0, 65535] (uint16). Defaults to 255; set to None to disable scaling.
reverse_indexing (bool) – if True, the data array’s first two dimensions will be swapped.
kwargs – keyword arguments. Currently PILImage.fromarray will read image_mode from this dictionary, defaults to None.

See also

https://pillow.readthedocs.io/en/stable/reference/Image.html

classmethod get_meta_info(metadata=None)[source]#: Extracts relevant meta information from the metadata object (using .get). Optional keys are "spatial_shape", MetaKeys.AFFINE, "original_affine".

classmethod resample_and_clip(data_array, output_spatial_shape=None, mode=bicubic)[source]#

Resample data_array to output_spatial_shape if needed.

Parameters:

data_array (Union[ndarray, Tensor]) – input data array. This method assumes the ‘channel-last’ format.
output_spatial_shape (UnionType[Sequence[int], None]) – output spatial shape.
mode (str) – interpolation mode, default is InterpolateMode.BICUBIC.

Return type:: ndarray

set_data_array(data_array, channel_dim=0, squeeze_end_dims=True, contiguous=False, **kwargs)[source]#

Convert data_array into ‘channel-last’ numpy ndarray.

Parameters:

data_array (Union[ndarray, Tensor]) – input data array with the channel dimension specified by channel_dim.
channel_dim (UnionType[int, None]) – channel dimension of the data array. Defaults to 0. None indicates data without any channel dimension.
squeeze_end_dims (bool) – if True, any trailing singleton dimensions will be removed.
contiguous (bool) – if True, the data array will be converted to a contiguous array. Default is False.
kwargs – keyword arguments passed to self.convert_to_channel_last, currently supporting spatial_ndim, defaulting to 2.

set_metadata(meta_dict=None, resample=True, **options)[source]#

Resample self.data_obj if needed. This method assumes self.data_obj is a ‘channel-last’ ndarray.

Parameters:

meta_dict (UnionType[Mapping, None]) – a metadata dictionary for affine, original affine and spatial shape information. Optional key is "spatial_shape".
resample (bool) – if True, the data will be resampled to the spatial shape specified in meta_dict.
options – keyword arguments passed to self.resample_if_needed, currently support mode, defaulting to bicubic.

write(filename, verbose=False, **kwargs)[source]#

Create a PIL image object from self.create_backend_obj(self.data_obj, ...) and call save.

Parameters:

filename (Union[str, PathLike]) – filename or PathLike object.
verbose (bool) – if True, log the progress.
kwargs – optional keyword arguments passed to self.create_backend_obj; currently supports reverse_indexing and image_mode, defaulting to True and None respectively.

See also

https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.save

Synthetic#

monai.data.synthetic.create_test_image_2d(height, width, num_objs=12, rad_max=30, rad_min=5, noise_max=0.0, num_seg_classes=5, channel_dim=None, random_state=None)[source]#

Return a noisy 2D image with num_objs circles and a 2D mask image. The maximum and minimum radii of the circles are given as rad_max and rad_min. The mask will have num_seg_classes number of classes for segmentations labeled sequentially from 1, plus a background class represented as 0. If noise_max is greater than 0 then noise will be added to the image taken from the uniform distribution on range [0,noise_max). If channel_dim is None, will create an image without channel dimension, otherwise create an image with channel dimension as first dim or last dim.

Parameters:

height (int) – height of the image. The value should be larger than 2 * rad_max.
width (int) – width of the image. The value should be larger than 2 * rad_max.
num_objs (int) – number of circles to generate. Defaults to 12.
rad_max (int) – maximum circle radius. Defaults to 30.
rad_min (int) – minimum circle radius. Defaults to 5.
noise_max (float) – if greater than 0 then noise will be added to the image taken from the uniform distribution on range [0,noise_max). Defaults to 0.
num_seg_classes (int) – number of classes for segmentations. Defaults to 5.
channel_dim (UnionType[int, None]) – if None, create an image without channel dimension, otherwise create an image with channel dimension as first dim or last dim. Defaults to None.
random_state (UnionType[RandomState, None]) – the random generator to use. Defaults to np.random.

Return type:

tuple[ndarray, ndarray]

Returns:

A tuple of randomised NumPy arrays (the image and its segmentation mask) with spatial shape (height, width)

monai.data.synthetic.create_test_image_3d(height, width, depth, num_objs=12, rad_max=30, rad_min=5, noise_max=0.0, num_seg_classes=5, channel_dim=None, random_state=None)[source]#

Return a noisy 3D image and segmentation.

Parameters:

height (int) – height of the image. The value should be larger than 2 * rad_max.
width (int) – width of the image. The value should be larger than 2 * rad_max.
depth (int) – depth of the image. The value should be larger than 2 * rad_max.
num_objs (int) – number of circles to generate. Defaults to 12.
rad_max (int) – maximum circle radius. Defaults to 30.
rad_min (int) – minimum circle radius. Defaults to 5.
noise_max (float) – if greater than 0 then noise will be added to the image taken from the uniform distribution on range [0,noise_max). Defaults to 0.
num_seg_classes (int) – number of classes for segmentations. Defaults to 5.
channel_dim (UnionType[int, None]) – if None, create an image without channel dimension, otherwise create an image with channel dimension as first dim or last dim. Defaults to None.
random_state (UnionType[RandomState, None]) – the random generator to use. Defaults to np.random.

Return type:

tuple[ndarray, ndarray]

Returns:

A tuple of randomised NumPy arrays (the image and its segmentation) with spatial shape (height, width, depth)

See also

create_test_image_2d()

Output folder layout#

class monai.data.folder_layout.FolderLayout(output_dir, postfix='', extension='', parent=False, makedirs=False, data_root_dir='')[source]#

A utility class to create organized filenames within output_dir. The filename method could be used to create a filename following the folder structure.

Example:

from monai.data import FolderLayout

layout = FolderLayout(
    output_dir="/test_run_1/",
    postfix="seg",
    extension="nii",
    makedirs=False)
layout.filename(subject="Sub-A", idx="00", modality="T1")
# return value: "/test_run_1/Sub-A_seg_00_modality-T1.nii"

The output filename is a string starting with a subject ID, and includes additional information about a customized index and image modality. This utility class doesn’t alter the underlying image data, but provides a convenient way to create filenames.

__init__(output_dir, postfix='', extension='', parent=False, makedirs=False, data_root_dir='')[source]#

Parameters:

output_dir (Union[str, PathLike]) – output directory.
postfix (str) – a postfix string for output file name appended to subject.
extension (str) – output file extension to be appended to the end of an output filename.
parent (bool) – whether to add a level of parent folder to contain each image to the output filename.
makedirs (bool) – whether to create the output parent directories if they do not exist.
data_root_dir (Union[str, PathLike]) – an optional PathLike object to preserve the folder structure of the input subject. Please see monai.data.utils.create_file_basename() for more details.

filename(subject='subject', idx=None, **kwargs)[source]#

Create a filename based on the input subject and idx.

The output filename is formed as:

output_dir/[subject/]subject[_postfix][_idx][_key-value][ext]

Parameters:

subject (Union[str, PathLike]) – subject name, used as the primary id of the output filename. When a PathLike object is provided, the base filename will be used as the subject name, the extension name of subject will be ignored, in favor of extension from this class’s constructor.
idx – additional index name of the image.
kwargs – additional keyword arguments to be used to form the output filename. The key-value pairs will be appended to the output filename as f"_{k}-{v}".

Return type:

Union[str, PathLike]
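The naming pattern can be reproduced with ordinary string handling. The sketch below rebuilds the file name from the FolderLayout example earlier in this section; `format_filename` is a hypothetical helper that ignores data_root_dir and directory creation, so it is a simplified model rather than the library routine.

```python
import posixpath


def format_filename(output_dir, subject, postfix="", idx=None, extension="", parent=False, **kwargs):
    """Build output_dir/[subject/]subject[_postfix][_idx][_key-value][ext]."""
    name = subject
    if postfix:
        name += f"_{postfix}"
    if idx is not None:
        name += f"_{idx}"
    for key, value in kwargs.items():
        name += f"_{key}-{value}"
    if extension:
        # Accept the extension with or without a leading dot.
        name += extension if extension.startswith(".") else f".{extension}"
    folder = posixpath.join(output_dir, subject) if parent else output_dir
    return posixpath.join(folder, name)


path = format_filename("/test_run_1/", "Sub-A", postfix="seg", idx="00", extension="nii", modality="T1")
```

Each optional piece is skipped when it is empty, which is what the square brackets in the pattern indicate.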

class monai.data.folder_layout.FolderLayoutBase[source]#

Abstract base class to define a common interface for FolderLayout and derived classes. Mainly, it defines the filename(**kwargs) -> PathLike function, which must be implemented by the deriving class.

Example:

from pathlib import Path

from monai.data import FolderLayoutBase

class MyFolderLayout(FolderLayoutBase):
    def __init__(
        self,
        basepath: Path,
        extension: str = "",
        makedirs: bool = False
    ):
        self.basepath = basepath
        # normalize the extension so that it is either empty or starts with "."
        if not extension:
            self.extension = ""
        elif extension.startswith("."):
            self.extension = extension
        else:
            self.extension = f".{extension}"
        self.makedirs = makedirs

    def filename(self, patient_no: int, image_name: str, **kwargs) -> Path:
        # one sub-folder per patient; a Path cannot be joined with an int
        sub_path = self.basepath / str(patient_no)
        if not sub_path.exists():
            sub_path.mkdir(parents=True)

        # append each extra key-value pair as "_key-value"
        file = image_name
        for k, v in kwargs.items():
            file += f"_{k}-{v}"

        file += self.extension
        return sub_path / file

abstractmethod filename(**kwargs)[source]#

Create a filename with path based on the input kwargs. Abstract method, implement your own.

Return type:: Union[str, PathLike]

monai.data.folder_layout.default_name_formatter(metadict, saver)[source]#

Returns a kwargs dict for FolderLayout.filename(), according to the input metadata and SaveImage transform.

Return type:: dict

Utilities#

monai.data.utils.affine_to_spacing(affine, r=3, dtype=<class 'float'>, suppress_zeros=True)[source]#

Compute the current spacing from the affine matrix.

Parameters:

affine (~NdarrayTensor) – a d x d affine matrix.
r (int) – indexing based on the spatial rank, spacing is computed from affine[:r, :r].
dtype – data type of the output.
suppress_zeros (bool) – whether to suppress the zeros with ones.

Return type:

~NdarrayTensor

Returns:

an r dimensional vector of spacing.
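A common way to read the spacing out of an affine is to take the Euclidean norm of each column of affine[:r, :r]. The sketch below follows that convention with plain lists; `spacing_from_affine` is a hypothetical helper, and the column-norm convention is stated here as an assumption rather than taken from the text above.

```python
import math


def spacing_from_affine(affine: list, r: int = 3, suppress_zeros: bool = True) -> list:
    """Column norms of the top-left r x r block; zeros are replaced by 1.0 if requested."""
    spacing = []
    for col in range(r):
        norm = math.sqrt(sum(affine[row][col] ** 2 for row in range(r)))
        spacing.append(1.0 if suppress_zeros and norm == 0.0 else norm)
    return spacing


affine = [
    [2.0, 0.0, 0.0, 5.0],
    [0.0, 3.0, 0.0, 6.0],
    [0.0, 0.0, 0.0, 7.0],
    [0.0, 0.0, 0.0, 1.0],
]
```

In this matrix the third column of the 3 x 3 block is all zeros, so the third spacing is reported as 1.0 unless suppress_zeros is turned off. The translation column does not enter the computation.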

monai.data.utils.compute_importance_map(patch_size, mode=constant, sigma_scale=0.125, device='cpu', dtype=torch.float32)[source]#

Get importance map for different weight modes.

Parameters:

patch_size (tuple[int, …]) – Size of the required importance map. This should be either H, W [,D].
mode (UnionType[BlendMode, str]) –
{"constant", "gaussian"} How to blend output of overlapping windows. Defaults to "constant".
- "constant": gives equal weight to all predictions.
- "gaussian": gives less weight to predictions on edges of windows.
sigma_scale (UnionType[Sequence[float], float]) – Sigma_scale to calculate sigma for each dimension (sigma = sigma_scale * dim_size). Used for gaussian mode only.
device (UnionType[device, int, str]) – Device to put importance map on.
dtype (UnionType[dtype, str, None]) – Data type of the output importance map.

Raises:

ValueError – When mode is not one of [“constant”, “gaussian”].

Return type:

Tensor

Returns:

Tensor of size patch_size.

monai.data.utils.compute_shape_offset(spatial_shape, in_affine, out_affine, scale_extent=False)[source]#

Given input and output affine, compute appropriate shapes in the output space based on the input array’s shape. This function also returns the offset to put the shape in a good position with respect to the world coordinate system.

Parameters:

spatial_shape (UnionType[ndarray, Sequence[int]]) – input array’s shape
in_affine (matrix) – 2D affine matrix
out_affine (matrix) – 2D affine matrix
scale_extent (bool) –
whether the scale is computed based on the spacing or the full extent of voxels, for example, for a factor of 0.5 scaling:

option 1, “o” represents a voxel, scaling the distance between voxels:
```
o--o--o
o-----o
```
option 2, each voxel has a physical extent, scaling the full voxel extent:
```
| voxel 1 | voxel 2 | voxel 3 | voxel 4 |
|      voxel 1      |      voxel 2      |
```
Option 1 may reduce the number of locations that require interpolation. Option 2 is more resolution agnostic, that is, resampling coordinates depend on the scaling factor, not on the number of voxels. Default is False, using option 1 to compute the shape and offset.

Return type:

tuple[ndarray, ndarray]

monai.data.utils.convert_tables_to_dicts(dfs, row_indices=None, col_names=None, col_types=None, col_groups=None, **kwargs)[source]#

Utility to join pandas tables, select rows, columns and generate groups. Will return a list of dictionaries, every dictionary maps to a row of data in tables.

Parameters:

dfs – data table in pandas Dataframe format. if providing a list of tables, will join them.
row_indices (UnionType[Sequence[UnionType[int, str]], None]) – indices of the expected rows to load. it should be a list, every item can be an int number or a range [start, end) for the indices. for example: row_indices=[[0, 100], 200, 201, 202, 300]. if None, load all the rows in the file.
col_names (UnionType[Sequence[str], None]) – names of the expected columns to load. if None, load all the columns.
col_types (UnionType[dict[str, UnionType[dict[str, Any], None]], None]) –
type and default value to convert the loaded columns, if None, use original data. it should be a dictionary, every item maps to an expected column, the key is the column name and the value is None or a dictionary to define the default value and data type. the supported keys in dictionary are: [“type”, “default”], and note that the value of default should not be None. for example:
```
col_types = {
    "subject_id": {"type": str},
    "label": {"type": int, "default": 0},
    "ehr_0": {"type": float, "default": 0.0},
    "ehr_1": {"type": float, "default": 0.0},
}
```
col_groups (UnionType[dict[str, Sequence[str]], None]) – args to group the loaded columns to generate a new column, it should be a dictionary, every item maps to a group, the key will be the new column name, the value is the names of columns to combine. for example: col_groups={"ehr": [f"ehr_{i}" for i in range(10)], "meta": ["meta_1", "meta_2"]}
kwargs – additional arguments for pandas.merge() API to join tables.

Return type:

list[dict[str, Any]]
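
The row_indices and col_groups conventions above can be illustrated without pandas. The following is a plain-Python sketch under stated assumptions: the helper names expand_row_indices and group_columns are hypothetical, each row is modeled as a dict, and a group is stored as a list while the original columns are kept.

```python
def expand_row_indices(row_indices):
    """Hypothetical helper: flatten ints and [start, end) ranges into row numbers."""
    rows = []
    for item in row_indices:
        if isinstance(item, (list, tuple)):
            start, end = item
            rows.extend(range(start, end))  # end is exclusive
        else:
            rows.append(item)
    return rows

def group_columns(row, col_groups):
    """Hypothetical helper: add one new key per group, holding the grouped values."""
    grouped = dict(row)
    for new_name, cols in col_groups.items():
        grouped[new_name] = [row[c] for c in cols]
    return grouped

rows = expand_row_indices([[0, 3], 5, 7])
record = group_columns(
    {"subject_id": "s1", "ehr_0": 0.1, "ehr_1": 0.2},
    {"ehr": ["ehr_0", "ehr_1"]},
)
print(rows)           # [0, 1, 2, 5, 7]
print(record["ehr"])  # [0.1, 0.2]
```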

monai.data.utils.correct_nifti_header_if_necessary(img_nii)[source]#

Check nifti object header’s format, update the header if needed. In the updated image pixdim matches the affine.

Parameters:: img_nii – nifti image object

monai.data.utils.create_file_basename(postfix, input_file_name, folder_path, data_root_dir='', separate_folder=True, patch_index=None, makedirs=True)[source]#

Utility function to create the path to the output file based on the input filename (file name extension is not added by this function). When data_root_dir is not specified, the output file name is:

folder_path/input_file_name (no ext.) /input_file_name (no ext.)[_postfix][_patch_index]

otherwise the relative path with respect to data_root_dir will be inserted, for example:

from monai.data import create_file_basename
create_file_basename(
    postfix="seg",
    input_file_name="/foo/bar/test1/image.png",
    folder_path="/output",
    data_root_dir="/foo/bar",
    separate_folder=True,
    makedirs=False)
# output: /output/test1/image/image_seg

Parameters:

postfix (str) – output name’s postfix
input_file_name (Union[str, PathLike]) – path to the input image file.
folder_path (Union[str, PathLike]) – path for the output file
data_root_dir (Union[str, PathLike]) – if not empty, it specifies the beginning parts of the input file’s absolute path. This is used to compute input_file_rel_path, the relative path to the file from data_root_dir to preserve folder structure when saving in case there are files in different folders with the same file names.
separate_folder (bool) – whether to save every file in a separate folder, for example: if input filename is image.nii, postfix is seg and folder_path is output, if True, save as: output/image/image_seg.nii, if False, save as output/image_seg.nii. default to True.
patch_index – if not None, append the patch index to filename.
makedirs (bool) – whether to create the folder if it does not exist.

Return type:

str
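
The naming rule can be sketched with POSIX path handling only. This is an illustrative re-implementation under stated assumptions, not the library's code: the name sketch_file_basename is hypothetical, no folders are created, and paths are assumed to use forward slashes.

```python
import posixpath

def sketch_file_basename(postfix, input_file_name, folder_path,
                         data_root_dir="", separate_folder=True, patch_index=None):
    """Hypothetical sketch of the documented naming rule."""
    filedir, filename = posixpath.split(input_file_name)
    # strip every extension, for example "image.nii.gz" -> "image"
    base, ext = posixpath.splitext(filename)
    while ext:
        base, ext = posixpath.splitext(base)
    # keep the folder structure below data_root_dir, if one is given
    rel_dir = posixpath.relpath(filedir, data_root_dir) if data_root_dir and filedir else ""
    out_dir = posixpath.join(folder_path, rel_dir)
    if separate_folder:
        out_dir = posixpath.join(out_dir, base)
    name = base
    if postfix:
        name += f"_{postfix}"
    if patch_index is not None:
        name += f"_{patch_index}"
    return posixpath.normpath(posixpath.join(out_dir, name))

print(sketch_file_basename("seg", "/foo/bar/test1/image.png", "/output", data_root_dir="/foo/bar"))
# /output/test1/image/image_seg
```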

monai.data.utils.decollate_batch(batch, detach=True, pad=True, fill_value=None)[source]#

De-collate a batch of data (for example, as produced by a DataLoader).

Returns a list of structures with the original tensor’s 0-th dimension sliced into elements using torch.unbind.

Images originally stored as (B,C,H,W,[D]) will be returned as (C,H,W,[D]). Other information, such as metadata, may have been stored in a list (or a list inside nested dictionaries). In this case we return the element of the list corresponding to the batch idx.

Return types aren’t guaranteed to be the same as the original, since numpy arrays will have been converted to torch.Tensor, sequences may be converted to lists of tensors, mappings may be converted into dictionaries.

For example:

batch_data = {
    "image": torch.rand((2,1,10,10)),
    DictPostFix.meta("image"): {"scl_slope": torch.Tensor([0.0, 0.0])}
}
out = decollate_batch(batch_data)
print(len(out))
>>> 2

print(out[0])
>>> {'image': tensor([[[4.3549e-01...43e-01]]]), DictPostFix.meta("image"): {'scl_slope': 0.0}}

batch_data = [torch.rand((2,1,10,10)), torch.rand((2,3,5,5))]
out = decollate_batch(batch_data)
print(out[0])
>>> [tensor([[[4.3549e-01...43e-01]]]), tensor([[[5.3435e-01...45e-01]]])]

batch_data = torch.rand((2,1,10,10))
out = decollate_batch(batch_data)
print(out[0])
>>> tensor([[[4.3549e-01...43e-01]]])

batch_data = {
    "image": [1, 2, 3], "meta": [4, 5],  # undetermined batch size
}
out = decollate_batch(batch_data, pad=True, fill_value=0)
print(out)
>>> [{'image': 1, 'meta': 4}, {'image': 2, 'meta': 5}, {'image': 3, 'meta': 0}]
out = decollate_batch(batch_data, pad=False)
print(out)
>>> [{'image': 1, 'meta': 4}, {'image': 2, 'meta': 5}]

Parameters:

batch – data to be de-collated.
detach (bool) – whether to detach the tensors. Scalar tensors will be detached into number types instead of torch tensors.
pad – when the items in a batch indicate different batch size, whether to pad all the sequences to the longest. If False, the batch size will be the length of the shortest sequence.
fill_value – when pad is True, the fill value to use when padding, defaults to None.
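
The effect of pad and fill_value on a batch of undetermined size can be sketched in plain Python. This covers only a flat dictionary of lists; the helper name decollate_dict_of_lists is hypothetical and tensors are not handled.

```python
from itertools import zip_longest

def decollate_dict_of_lists(batch, pad=True, fill_value=None):
    """Hypothetical helper: split a dict of lists into a list of dicts."""
    keys = list(batch)
    columns = [batch[k] for k in keys]
    if pad:
        # pad every sequence to the longest one
        rows = zip_longest(*columns, fillvalue=fill_value)
    else:
        # stop at the shortest sequence
        rows = zip(*columns)
    return [dict(zip(keys, row)) for row in rows]

batch_data = {"image": [1, 2, 3], "meta": [4, 5]}
print(decollate_dict_of_lists(batch_data, pad=True, fill_value=0))
# [{'image': 1, 'meta': 4}, {'image': 2, 'meta': 5}, {'image': 3, 'meta': 0}]
print(decollate_dict_of_lists(batch_data, pad=False))
# [{'image': 1, 'meta': 4}, {'image': 2, 'meta': 5}]
```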

monai.data.utils.dense_patch_slices(image_size, patch_size, scan_interval, return_slice=True)[source]#

Enumerate all slices defining ND patches of size patch_size from an image_size input image.

Parameters:

image_size (Sequence[int]) – dimensions of image to iterate over
patch_size (Sequence[int]) – size of patches to generate slices
scan_interval (Sequence[int]) – dense patch sampling interval
return_slice (bool) – whether to return a list of slices (or tuples of indices), defaults to True

Return type:

list[tuple[slice, …]]

Returns:

a list of slice objects defining each patch

monai.data.utils.get_extra_metadata_keys()[source]#

Get a list of unnecessary keys for metadata that can be removed.

Return type:: list[str]
Returns:: List of keys to be removed.

monai.data.utils.get_random_patch(dims, patch_size, rand_state=None)[source]#

Returns a tuple of slices to define a random patch in an array of shape dims with size patch_size, or as close to it as possible within the given dimension. It is expected that patch_size is a valid patch for a source of shape dims as returned by get_valid_patch_size.

Parameters:

dims (Sequence[int]) – shape of source array
patch_size (Sequence[int]) – shape of patch size to generate
rand_state (UnionType[RandomState, None]) – a random state object to generate random numbers from

Returns:

a tuple of slice objects defining the patch

Return type:

(tuple of slice)
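
A minimal sketch of the idea, assuming a uniformly random minimum corner is drawn per dimension so that the patch fits inside dims. The name sketch_random_patch is hypothetical, and Python's random.Random stands in for the NumPy RandomState that the real function accepts.

```python
import random

def sketch_random_patch(dims, patch_size, rng=None):
    """Hypothetical helper: slices of a random patch that fits inside dims."""
    rng = rng or random.Random()
    # randint is inclusive on both ends, so the largest start is d - p
    starts = [rng.randint(0, d - p) for d, p in zip(dims, patch_size)]
    return tuple(slice(s, s + p) for s, p in zip(starts, patch_size))

patch = sketch_random_patch((64, 48), (16, 48), rng=random.Random(0))
print(patch)
```

The second dimension always yields slice(0, 48) here because the patch spans the whole axis.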

monai.data.utils.get_valid_patch_size(image_size, patch_size)[source]#

Given an image of dimensions image_size, return a patch size tuple taking the dimension from patch_size if this is not 0/None. Otherwise, or if patch_size is shorter than image_size, the dimension from image_size is taken. This ensures the returned patch size is within the bounds of image_size. If patch_size is a single number this is interpreted as a patch of the same dimensionality of image_size with that size in each dimension.

Return type:: tuple[int, …]
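
The rule as described above can be sketched as follows. This follows the documented behavior, not necessarily the library's exact code, and the name sketch_valid_patch_size is hypothetical.

```python
def sketch_valid_patch_size(image_size, patch_size):
    """Hypothetical helper implementing the documented patch-size rule."""
    ndim = len(image_size)
    if patch_size is None or isinstance(patch_size, int):
        # a single number applies to every dimension
        patch_size = (patch_size,) * ndim
    # a patch_size shorter than image_size falls back to the image dimension
    patch_size = tuple(patch_size) + (None,) * (ndim - len(patch_size))
    # 0 or None selects the whole dimension; never exceed the image size
    return tuple(min(dim, p or dim) for dim, p in zip(image_size, patch_size))

print(sketch_valid_patch_size((64, 48, 32), (32, 0)))  # (32, 48, 32)
print(sketch_valid_patch_size((64, 48, 32), 40))       # (40, 40, 32)
```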

monai.data.utils.is_no_channel(val)[source]#

Returns whether val indicates “no_channel”, for MetaKeys.ORIGINAL_CHANNEL_DIM.

Return type:: bool

monai.data.utils.is_supported_format(filename, suffixes)[source]#

Verify whether the specified file or files format match supported suffixes. If supported suffixes is None, skip the verification and return True.

Parameters:

filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – file name or a list of file names to read. if a list of files, verify all the suffixes.
suffixes (Sequence[str]) – all the supported image suffixes of current reader, must be a list of lower case suffixes.

Return type:

bool
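
A rough sketch of such a suffix check, assuming a file matches when any supported suffix appears in its lower-cased compound extension. The name sketch_is_supported is hypothetical and the matching rule is an assumption for illustration.

```python
import os
from pathlib import PurePath

def sketch_is_supported(filename, suffixes):
    """Hypothetical helper: True if every file name carries a supported suffix."""
    if suffixes is None:
        return True  # skip the verification
    if isinstance(filename, (str, os.PathLike)):
        names = [filename]
    else:
        names = list(filename)
    for name in names:
        # "image.NII.GZ" -> ".nii.gz"
        full_suffix = "".join(PurePath(name).suffixes).lower()
        if not any(f".{s}" in full_suffix for s in suffixes):
            return False
    return True

print(sketch_is_supported("image.nii.gz", ["nii", "nii.gz"]))  # True
print(sketch_is_supported(["a.nii", "b.txt"], ["nii"]))        # False
```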

monai.data.utils.iter_patch(arr, patch_size=0, start_pos=(), overlap=0.0, copy_back=True, mode=wrap, **pad_opts)[source]#

Yield successive patches from arr of size patch_size. The iteration can start from position start_pos in arr but drawing from a padded array extended by the patch_size in each dimension (so these coordinates can be negative to start in the padded region). If copy_back is True the values from each patch are written back to arr.

Parameters:

arr (Union[ndarray, Tensor]) – array to iterate over
patch_size (UnionType[Sequence[int], int]) – size of patches to generate slices for, 0 or None selects whole dimension. For 0 or None, padding and overlap ratio of the corresponding dimension will be 0.
start_pos (Sequence[int]) – starting position in the array, default is 0 for each dimension
overlap (UnionType[Sequence[float], float]) – the amount of overlap of neighboring patches in each dimension (a value between 0.0 and 1.0). If only one float number is given, it will be applied to all dimensions. Defaults to 0.0.
copy_back (bool) – if True data from the yielded patches is copied back to arr once the generator completes
mode (UnionType[str, None]) – available modes: (Numpy) {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"} (PyTorch) {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. If None, no wrapping is performed. Defaults to "wrap". See also: https://numpy.org/doc/stable/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html requires pytorch >= 1.10 for best compatibility.
pad_opts (dict) – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

Yields:

Patches of array data from arr which are views into a padded array which can be modified, if copy_back is True these changes will be reflected in arr once the iteration completes.

Note

coordinate format is:

[1st_dim_start, 1st_dim_end,
2nd_dim_start, 2nd_dim_end, …, Nth_dim_start, Nth_dim_end]

Return type:: Generator[tuple[Union[ndarray, Tensor], ndarray], None, None]

monai.data.utils.iter_patch_position(image_size, patch_size, start_pos=(), overlap=0.0, padded=False)[source]#

Yield successive tuples of upper left corner of patches of size patch_size from an array of dimensions image_size. The iteration starts from position start_pos in the array, or starting at the origin if this isn’t provided. Each patch is chosen in a contiguous grid using a row-major ordering.

Parameters:

image_size (Sequence[int]) – dimensions of array to iterate over
patch_size (UnionType[Sequence[int], int, ndarray]) – size of patches to generate slices for, 0 or None selects whole dimension
start_pos (Sequence[int]) – starting position in the array, default is 0 for each dimension
overlap (UnionType[Sequence[float], float, Sequence[int], int]) – the amount of overlap of neighboring patches in each dimension. Either a float or list of floats between 0.0 and 1.0 to define relative overlap to patch size, or an int or list of ints to define number of pixels for overlap. If only one float/int number is given, it will be applied to all dimensions. Defaults to 0.0.
padded (bool) – if the image is padded so the patches can go beyond the borders. Defaults to False.

Yields:

Tuples of positions defining the upper left corner of each patch
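
Row-major enumeration of patch corners can be sketched with itertools.product, which varies the last axis fastest. This simplified version assumes no overlap, no padding, and strictly positive patch sizes; the name sketch_patch_positions is hypothetical.

```python
from itertools import product

def sketch_patch_positions(image_size, patch_size, start_pos=None):
    """Hypothetical helper: yield the corner of each patch in row-major order."""
    start_pos = start_pos or (0,) * len(image_size)
    # one range of start coordinates per dimension, stepping by the patch size
    ranges = [range(s, dim, p) for s, dim, p in zip(start_pos, image_size, patch_size)]
    yield from product(*ranges)

print(list(sketch_patch_positions((4, 6), (2, 3))))
# [(0, 0), (0, 3), (2, 0), (2, 3)]
```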

monai.data.utils.iter_patch_slices(image_size, patch_size, start_pos=(), overlap=0.0, padded=True)[source]#

Yield successive tuples of slices defining patches of size patch_size from an array of dimensions image_size. The iteration starts from position start_pos in the array, or starting at the origin if this isn’t provided. Each patch is chosen in a contiguous grid using a row-major ordering.

Parameters:

image_size (Sequence[int]) – dimensions of array to iterate over
patch_size (UnionType[Sequence[int], int]) – size of patches to generate slices for, 0 or None selects whole dimension
start_pos (Sequence[int]) – starting position in the array, default is 0 for each dimension
overlap (UnionType[Sequence[float], float]) – the amount of overlap of neighboring patches in each dimension (a value between 0.0 and 1.0). If only one float number is given, it will be applied to all dimensions. Defaults to 0.0.
padded (bool) – if the image is padded so the patches can go beyond the borders. Defaults to True, as shown in the signature.

Yields:

Tuples of slice objects defining each patch

Return type:

Generator[tuple[slice, …], None, None]

monai.data.utils.json_hashing(item)[source]#

Parameters:: item – data item to be hashed

Returns: the corresponding hash key

Return type:: bytes
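
One way to build such a hash key is to serialize the item to JSON with sorted keys and digest the result, so that dictionaries with the same content hash identically regardless of key order. The sketch below is an assumption for illustration: the name sketch_json_hash is hypothetical and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
import json

def sketch_json_hash(item):
    """Hypothetical helper: order-independent hash key of a JSON-serializable item."""
    payload = json.dumps(item, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest().encode("utf-8")

key_a = sketch_json_hash({"image": "image1.nii.gz", "label": "label1.nii.gz"})
key_b = sketch_json_hash({"label": "label1.nii.gz", "image": "image1.nii.gz"})
print(key_a == key_b)  # True
```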

monai.data.utils.list_data_collate(batch)[source]#: Enhancement for the PyTorch DataLoader default collate. If the dataset already returns a list of batch data generated in transforms, all of the data must be merged into one list; after that, the behavior is the same as the default collate.

Note

Use this collate function when applying transforms that can generate batch data.

monai.data.utils.no_collation(x)[source]#: Performs no collation and returns the input unchanged.

monai.data.utils.orientation_ras_lps(affine)[source]#

Convert the affine between the RAS and LPS orientation by flipping the first two spatial dimensions.

Parameters:: affine (~NdarrayTensor) – a 2D affine matrix.
Return type:: ~NdarrayTensor
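
For a 4 x 4 affine, flipping the first two spatial dimensions can be sketched as negating the first two rows, which is equivalent to left-multiplying by diag(-1, -1, 1, 1). That formulation is an assumption for illustration, and the name sketch_ras_lps and the nested-list representation are hypothetical.

```python
def sketch_ras_lps(affine):
    """Hypothetical helper: negate the first two rows of an affine given as nested lists."""
    return [[-v for v in row] if i < 2 else list(row) for i, row in enumerate(affine)]

ras = [
    [2, 0, 0, 10],
    [0, 3, 0, 20],
    [0, 0, 4, 30],
    [0, 0, 0, 1],
]
lps = sketch_ras_lps(ras)
print(lps[0], lps[1])
```

Applying the conversion twice returns the original matrix, so the same function converts in both directions.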

monai.data.utils.pad_list_data_collate(batch, method=symmetric, mode=constant, **kwargs)[source]#

Function version of monai.transforms.croppad.batch.PadListDataCollate.

Same as MONAI’s list_data_collate, except any tensors are centrally padded to match the shape of the biggest tensor in each dimension. This transform is useful if some of the applied transforms generate batch data of different sizes.

This can be used on both list and dictionary data. Note that in the case of dictionary data, this collate function may add the transform information of PadListDataCollate to the list of invertible transforms if the items of the input batch have different spatial shapes, so the static method monai.transforms.croppad.batch.PadListDataCollate.inverse needs to be called before inverting other transforms.

Parameters:

batch (Sequence) – batch of data to pad-collate
method (str) – padding method (see monai.transforms.SpatialPad)
mode (str) – padding mode (see monai.transforms.SpatialPad)
kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

monai.data.utils.partition_dataset(data, ratios=None, num_partitions=None, shuffle=False, seed=0, drop_last=False, even_divisible=False)[source]#

Split the dataset into N partitions. It can support shuffle based on specified random seed. Will return a set of datasets, every dataset contains 1 partition of original dataset. And it can split the dataset based on specified ratios or evenly split into num_partitions. Refer to: https://pytorch.org/docs/stable/distributed.html#module-torch.distributed.launch.

Note

It also can be used to partition dataset for ranks in distributed training. For example, partition dataset before training and use CacheDataset, every rank trains with its own data. It can avoid duplicated caching content in each rank, but will not do global shuffle before every epoch:

data_partition = partition_dataset(
    data=train_files,
    num_partitions=dist.get_world_size(),
    shuffle=True,
    even_divisible=True,
)[dist.get_rank()]

train_ds = SmartCacheDataset(
    data=data_partition,
    transform=train_transforms,
    replace_rate=0.2,
    cache_num=15,
)

Parameters:

data (Sequence) – input dataset to split, expect a list of data.
ratios (UnionType[Sequence[float], None]) – a list of ratio number to split the dataset, like [8, 1, 1].
num_partitions (UnionType[int, None]) – expected number of the partitions to evenly split, only works when ratios not specified.
shuffle (bool) – whether to shuffle the original dataset before splitting.
seed (int) – random seed to shuffle the dataset, only works when shuffle is True.
drop_last (bool) – only works when even_divisible is True and no ratios specified (see the examples below). if True, will drop the tail of the data to make it evenly divisible across partitions. if False, will add extra indices to make the data evenly divisible across partitions.
even_divisible (bool) – if True, guarantee every partition has same length.

Examples:

>>> data = [1, 2, 3, 4, 5]
>>> partition_dataset(data, ratios=[0.6, 0.2, 0.2], shuffle=False)
[[1, 2, 3], [4], [5]]
>>> partition_dataset(data, num_partitions=2, shuffle=False)
[[1, 3, 5], [2, 4]]
>>> partition_dataset(data, num_partitions=2, shuffle=False, even_divisible=True, drop_last=True)
[[1, 3], [2, 4]]
>>> partition_dataset(data, num_partitions=2, shuffle=False, even_divisible=True, drop_last=False)
[[1, 3, 5], [2, 4, 1]]
>>> partition_dataset(data, num_partitions=2, shuffle=False, even_divisible=False, drop_last=False)
[[1, 3, 5], [2, 4]]
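
The even-split examples above are consistent with a strided assignment of items to partitions. The following plain-Python sketch reproduces them under that assumption; it omits ratios and shuffle, and the name sketch_even_partition is hypothetical.

```python
import math

def sketch_even_partition(data, num_partitions, even_divisible=False, drop_last=False):
    """Hypothetical helper: strided split of data into num_partitions lists."""
    n = len(data)
    indices = list(range(n))
    if even_divisible:
        if drop_last:
            # drop the tail so the length divides evenly
            total = (n // num_partitions) * num_partitions
            indices = indices[:total]
        else:
            # reuse leading indices to fill up the last round
            total = math.ceil(n / num_partitions) * num_partitions
            indices += indices[: total - n]
    # partition i takes every num_partitions-th index, starting at i
    return [[data[j] for j in indices[i::num_partitions]] for i in range(num_partitions)]

data = [1, 2, 3, 4, 5]
print(sketch_even_partition(data, 2))                                        # [[1, 3, 5], [2, 4]]
print(sketch_even_partition(data, 2, even_divisible=True, drop_last=True))   # [[1, 3], [2, 4]]
print(sketch_even_partition(data, 2, even_divisible=True, drop_last=False))  # [[1, 3, 5], [2, 4, 1]]
```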

monai.data.utils.partition_dataset_classes(data, classes, ratios=None, num_partitions=None, shuffle=False, seed=0, drop_last=False, even_divisible=False)[source]#

Split the dataset into N partitions based on the given class labels. It ensures the same ratio of classes in every partition. All other behavior is the same as monai.data.partition_dataset.

Parameters:

data (Sequence) – input dataset to split, expect a list of data.
classes (Sequence[int]) – a list of labels to help split the data, the length must match the length of data.
ratios (UnionType[Sequence[float], None]) – a list of ratio number to split the dataset, like [8, 1, 1].
num_partitions (UnionType[int, None]) – expected number of the partitions to evenly split, only works when no ratios.
shuffle (bool) – whether to shuffle the original dataset before splitting.
seed (int) – random seed to shuffle the dataset, only works when shuffle is True.
drop_last (bool) – only works when even_divisible is True and no ratios specified. if True, will drop the tail of the data to make it evenly divisible across partitions. if False, will add extra indices to make the data evenly divisible across partitions.
even_divisible (bool) – if True, guarantee every partition has same length.

Examples:

>>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
>>> classes = [2, 0, 2, 1, 3, 2, 2, 0, 2, 0, 3, 3, 1, 3]
>>> partition_dataset_classes(data, classes, shuffle=False, ratios=[2, 1])
[[2, 8, 4, 1, 3, 6, 5, 11, 12], [10, 13, 7, 9, 14]]

monai.data.utils.pickle_hashing(item, protocol=5)[source]#

Parameters:

item – data item to be hashed
protocol – protocol version used for pickling, defaults to pickle.HIGHEST_PROTOCOL.

Returns: the corresponding hash key

Return type:: bytes

monai.data.utils.rectify_header_sform_qform(img_nii)[source]#

Look at the sform and qform of the nifti object and correct them if they are incompatible with the pixel dimensions.

Adapted from NifTK/NiftyNet

Parameters:: img_nii – nifti image object

monai.data.utils.remove_extra_metadata(meta)[source]#

Remove extra metadata from the dictionary. Operates in-place so nothing is returned.

Parameters:: meta (dict) – dictionary containing metadata to be modified.
Return type:: None
Returns:: None

monai.data.utils.remove_keys(data, keys)[source]#

Remove keys from a dictionary. Operates in-place so nothing is returned.

Parameters:

data (dict) – dictionary to be modified.
keys (list[str]) – keys to be deleted from dictionary.

Return type:

None

Returns:

None

monai.data.utils.reorient_spatial_axes(data_shape, init_affine, target_affine)[source]#

Given the input init_affine, compute the orientation transform between it and target_affine by rearranging/flipping the axes.

Returns the orientation transform and the updated affine (tensor or ndarray depends on the input affine data type). Note that this function requires external module nibabel.orientations.

Return type:: tuple[ndarray, Union[ndarray, Tensor]]

monai.data.utils.resample_datalist(data, factor, random_pick=False, seed=0)[source]#

Utility function to resample the loaded datalist for training, for example: If factor < 1.0, randomly pick part of the datalist and set to Dataset, useful to quickly test the program. If factor > 1.0, repeat the datalist to enhance the Dataset.

Parameters:

data (Sequence) – original datalist to scale.
factor (float) – scale factor for the datalist, for example, factor=4.5, repeat the datalist 4 times and plus 50% of the original datalist.
random_pick (bool) – whether to randomly pick data if scale factor has decimal part.
seed (int) – random seed to randomly pick data.
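
The scaling rule can be sketched as repeating the datalist for the integer part of factor and then appending a leading share for the fractional part. This version does no random picking, the rounding of the fractional share is an assumption, and the name sketch_resample is hypothetical.

```python
import math

def sketch_resample(data, factor):
    """Hypothetical helper: repeat data `factor` times, including a fractional part."""
    frac, whole = math.modf(factor)
    out = []
    for _ in range(int(whole)):
        out.extend(data)
    # fractional part: take the leading share of the datalist
    out.extend(data[: round(len(data) * frac)])
    return out

print(sketch_resample([1, 2, 3, 4], 2.5))  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
print(sketch_resample([1, 2, 3, 4], 0.5))  # [1, 2]
```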

monai.data.utils.select_cross_validation_folds(partitions, folds)[source]#

Select cross validation data based on data partitions and specified fold index. if a list of fold indices is provided, concatenate the partitions of these folds.

Parameters:

partitions (Sequence[Iterable]) – a sequence of datasets, each item is an iterable
folds (UnionType[Sequence[int], int]) – the indices of the partitions to be combined.

Return type:

list

Returns:

A list of combined datasets.

Example:

>>> partitions = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
>>> select_cross_validation_folds(partitions, 2)
[5, 6]
>>> select_cross_validation_folds(partitions, [1, 2])
[3, 4, 5, 6]
>>> select_cross_validation_folds(partitions, [-1, 2])
[9, 10, 5, 6]
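
The selection logic in the example can be sketched in a few lines of plain Python. The name sketch_select_folds is hypothetical, and the second call shows one way the remaining partitions could form the training set for a given validation fold.

```python
def sketch_select_folds(partitions, folds):
    """Hypothetical helper: concatenate the partitions at the given fold indices."""
    if isinstance(folds, int):
        folds = [folds]
    return [item for i in folds for item in partitions[i]]

partitions = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
val_fold = 2
val_data = sketch_select_folds(partitions, val_fold)
train_data = sketch_select_folds(
    partitions, [i for i in range(len(partitions)) if i != val_fold]
)
print(val_data)    # [5, 6]
print(train_data)  # [1, 2, 3, 4, 7, 8, 9, 10]
```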

monai.data.utils.set_rnd(obj, seed)[source]#

Set seed or random state for all randomizable properties of obj.

Parameters:

obj – object to set seed or random state for.
seed (int) – set the random state with an integer seed.

Return type:

int

monai.data.utils.sorted_dict(item, key=None, reverse=False)[source]#: Return a new sorted dictionary from the item.

monai.data.utils.to_affine_nd(r, affine, dtype=<class 'numpy.float64'>)[source]#

Use elements from affine to create a new affine matrix by assigning the rotation/zoom/scaling matrix and the translation vector.

When r is an integer, output is an (r+1)x(r+1) matrix, where the top left kxk elements are copied from affine, the last column of the output affine is copied from affine’s last column. k is determined by min(r, len(affine) - 1).

When r is an affine matrix, the output has the same shape as r, and the top left kxk elements are copied from affine, the last column of the output affine is copied from affine’s last column. k is determined by min(len(r) - 1, len(affine) - 1).

Parameters:

r (int or matrix) – number of spatial dimensions or an output affine to be filled.
affine (matrix) – 2D affine matrix
dtype – data type of the output array.

Raises:

ValueError – When affine dimensions is not 2.
ValueError – When r is nonpositive.

Return type:

~NdarrayTensor

Returns:

an (r+1) x (r+1) matrix (tensor or ndarray depends on the input affine data type)
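
For an integer r, the copying rule described above can be sketched with nested lists. The name sketch_to_affine_nd is hypothetical, the output starts from an identity matrix, and the case where r is itself a matrix is not covered.

```python
def sketch_to_affine_nd(r, affine):
    """Hypothetical helper: embed `affine` into an (r+1) x (r+1) matrix."""
    k = min(r, len(affine) - 1)
    # start from an (r+1) x (r+1) identity matrix
    new = [[1.0 if i == j else 0.0 for j in range(r + 1)] for i in range(r + 1)]
    for i in range(k):
        for j in range(k):
            new[i][j] = affine[i][j]  # top left k x k block
        new[i][r] = affine[i][-1]     # translation from the last column
    return new

affine_2d = [
    [2.0, 0.0, 5.0],
    [0.0, 3.0, 7.0],
    [0.0, 0.0, 1.0],
]
for row in sketch_to_affine_nd(3, affine_2d):
    print(row)
```

Here a 2D affine is promoted to 3D: the 2 x 2 block and the translation are kept, and the new third axis gets unit scaling and zero translation.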

monai.data.utils.worker_init_fn(worker_id)[source]#

Callback function for PyTorch DataLoader worker_init_fn. It can set different random seed for the transforms in different workers.

Return type:: None

monai.data.utils.zoom_affine(affine, scale, diagonal=True)[source]#

Make the column norms of affine the same as scale. If diagonal is False, returns an affine that combines orthogonal rotation and the new scale. This is done by first decomposing affine, then setting the zoom factors to scale, and composing a new affine; the shearing factors are removed. If diagonal is True, returns a diagonal matrix, the scaling factors are set to the diagonal elements. This function always returns an affine with zero translations.

Parameters:

affine (nxn matrix) – a square matrix.
scale (UnionType[ndarray, Sequence[float]]) – new scaling factor along each dimension. if the components of the scale are non-positive values, will use the corresponding components of the original pixdim, which is computed from the affine.
diagonal (bool) – whether to return a diagonal scaling matrix. Defaults to True.

Raises:

ValueError – When affine is not a square matrix.
ValueError – When scale contains a nonpositive scalar.

Returns:

the updated n x n affine.

Partition Dataset#

monai.data.partition_dataset(data, ratios=None, num_partitions=None, shuffle=False, seed=0, drop_last=False, even_divisible=False)[source]#

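Split the dataset into N partitions, optionally shuffling with the specified random seed. This is the same function as monai.data.utils.partition_dataset.

Note

It can also be used to partition the dataset across ranks in distributed training, for example: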

data_partition = partition_dataset(
    data=train_files,
    num_partitions=dist.get_world_size(),
    shuffle=True,
    even_divisible=True,
)[dist.get_rank()]

train_ds = SmartCacheDataset(
    data=data_partition,
    transform=train_transforms,
    replace_rate=0.2,
    cache_num=15,
)

Parameters:

data (Sequence) – input dataset to split, expect a list of data.
ratios (UnionType[Sequence[float], None]) – a list of ratio number to split the dataset, like [8, 1, 1].
num_partitions (UnionType[int, None]) – expected number of the partitions to evenly split, only works when ratios not specified.
shuffle (bool) – whether to shuffle the original dataset before splitting.
seed (int) – random seed to shuffle the dataset, only works when shuffle is True.
drop_last (bool) – only works when even_divisible is True and no ratios specified (see the examples below). if True, will drop the tail of the data to make it evenly divisible across partitions. if False, will add extra indices to make the data evenly divisible across partitions.
even_divisible (bool) – if True, guarantee every partition has same length.

Examples:

>>> data = [1, 2, 3, 4, 5]
>>> partition_dataset(data, ratios=[0.6, 0.2, 0.2], shuffle=False)
[[1, 2, 3], [4], [5]]
>>> partition_dataset(data, num_partitions=2, shuffle=False)
[[1, 3, 5], [2, 4]]
>>> partition_dataset(data, num_partitions=2, shuffle=False, even_divisible=True, drop_last=True)
[[1, 3], [2, 4]]
>>> partition_dataset(data, num_partitions=2, shuffle=False, even_divisible=True, drop_last=False)
[[1, 3, 5], [2, 4, 1]]
>>> partition_dataset(data, num_partitions=2, shuffle=False, even_divisible=False, drop_last=False)
[[1, 3, 5], [2, 4]]

Partition Dataset based on classes#

monai.data.partition_dataset_classes(data, classes, ratios=None, num_partitions=None, shuffle=False, seed=0, drop_last=False, even_divisible=False)[source]#

Split the dataset into N partitions based on the given class labels. It ensures the same ratio of classes in every partition. All other behavior is the same as monai.data.partition_dataset.

Parameters:

data (Sequence) – input dataset to split, expect a list of data.
classes (Sequence[int]) – a list of labels to help split the data, the length must match the length of data.
ratios (UnionType[Sequence[float], None]) – a list of ratio number to split the dataset, like [8, 1, 1].
num_partitions (UnionType[int, None]) – expected number of the partitions to evenly split, only works when no ratios.
shuffle (bool) – whether to shuffle the original dataset before splitting.
seed (int) – random seed to shuffle the dataset, only works when shuffle is True.
drop_last (bool) – only works when even_divisible is True and no ratios specified. if True, will drop the tail of the data to make it evenly divisible across partitions. if False, will add extra indices to make the data evenly divisible across partitions.
even_divisible (bool) – if True, guarantee every partition has same length.

Examples:

>>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
>>> classes = [2, 0, 2, 1, 3, 2, 2, 0, 2, 0, 3, 3, 1, 3]
>>> partition_dataset_classes(data, classes, shuffle=False, ratios=[2, 1])
[[2, 8, 4, 1, 3, 6, 5, 11, 12], [10, 13, 7, 9, 14]]

DistributedSampler#

class monai.data.DistributedSampler(dataset, even_divisible=True, num_replicas=None, rank=None, shuffle=True, **kwargs)[source]#

Enhance PyTorch DistributedSampler to support non-evenly divisible sampling.

Parameters:

dataset (Dataset) – Dataset used for sampling.
even_divisible (bool) – if False, different ranks can have different data length. for example, input data: [1, 2, 3, 4, 5], rank 0: [1, 3, 5], rank 1: [2, 4].
num_replicas (UnionType[int, None]) – number of processes participating in distributed training. by default, world_size is retrieved from the current distributed group.
rank (UnionType[int, None]) – rank of the current process within num_replicas. by default, rank is retrieved from the current distributed group.
shuffle (bool) – if True, sampler will shuffle the indices, default to True.
kwargs – additional arguments for DistributedSampler super class, can be seed and drop_last.

More information about DistributedSampler, please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler.
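
The non-evenly divisible case from the even_divisible description can be sketched as a strided selection per rank. This ignores shuffling and is only an illustration of the documented example; the name sketch_rank_items is hypothetical.

```python
def sketch_rank_items(data, rank, num_replicas):
    """Hypothetical helper: items seen by one rank when even_divisible is False."""
    return data[rank::num_replicas]

data = [1, 2, 3, 4, 5]
print(sketch_rank_items(data, rank=0, num_replicas=2))  # [1, 3, 5]
print(sketch_rank_items(data, rank=1, num_replicas=2))  # [2, 4]
```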

DistributedWeightedRandomSampler#

class monai.data.DistributedWeightedRandomSampler(dataset, weights, num_samples_per_rank=None, generator=None, even_divisible=True, num_replicas=None, rank=None, **kwargs)[source]#

Extend the DistributedSampler to support weighted sampling. Refer to torch.utils.data.WeightedRandomSampler, for more details please check: https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler.

Parameters:

dataset (Dataset) – Dataset used for sampling.
weights (Sequence[float]) – a sequence of weights, not necessarily summing up to one, length should exactly match the full dataset.
num_samples_per_rank (UnionType[int, None]) – number of samples to draw for every rank, sample from the distributed subset of dataset. if None, default to the length of dataset split by DistributedSampler.
generator (UnionType[Generator, None]) – PyTorch Generator used in sampling.
even_divisible (bool) – if False, different ranks can have different data lengths. For example, with input data [1, 2, 3, 4, 5], rank 0 gets [1, 3, 5] and rank 1 gets [2, 4].
num_replicas (UnionType[int, None]) – number of processes participating in distributed training. by default, world_size is retrieved from the current distributed group.
rank (UnionType[int, None]) – rank of the current process within num_replicas. by default, rank is retrieved from the current distributed group.
kwargs – additional arguments for DistributedSampler super class, can be seed and drop_last.
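As a rough illustration of weighted sampling within one rank's subset, the sketch below uses only the standard library. It is not the MONAI implementation: the function name weighted_indices_for_rank, the strided split, and sampling with replacement are assumptions made for this example.

```python
import random


def weighted_indices_for_rank(weights, num_replicas, rank, num_samples=None, seed=0):
    """Draw indices for one rank, weighted by the per-item weights of its subset."""
    # Indices owned by this rank (strided split, no shuffling in this sketch).
    owned = list(range(len(weights)))[rank::num_replicas]
    # Default to one draw per owned item.
    k = len(owned) if num_samples is None else num_samples
    rng = random.Random(seed)
    # Sample with replacement; the weights do not need to sum to one.
    return rng.choices(owned, weights=[weights[i] for i in owned], k=k)


weights = [0.1, 0.1, 5.0, 0.1, 5.0, 0.1]
picked = weighted_indices_for_rank(weights, num_replicas=2, rank=0, num_samples=4)
print(picked)
```

Note that the weights sequence covers the full dataset, while each rank only draws from the indices it owns.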

DatasetSummary#

class monai.data.DatasetSummary(dataset, image_key='image', label_key='label', meta_key=None, meta_key_postfix='meta_dict', num_workers=0, **kwargs)[source]#

This class provides a way to calculate a reasonable output voxel spacing according to the input dataset. The computed values can be used to resample the input in 3D segmentation tasks (for example, as the pixdim parameter of monai.transforms.Spacingd). In addition, it can compute the mean, std, min and max intensities of the input; these statistics are helpful for image normalization (as parameters of monai.transforms.ScaleIntensityRanged and monai.transforms.NormalizeIntensityd).

The calculation follows the algorithm described in: Automated Design of Deep Learning Methods for Biomedical Image Segmentation.

Decathlon Datalist#

monai.data.load_decathlon_datalist(data_list_file_path, is_segmentation=True, data_list_key='training', base_dir=None)[source]#

Load image/label paths of a Decathlon challenge dataset from a JSON file.

JSON file should follow the format of the Medical Segmentation Decathlon datalist.json files, see http://medicaldecathlon.com. The files are structured as follows:

{
    "metadata_key_0": "metadata_value_0",
    "metadata_key_1": "metadata_value_1",
    ...,
    "training": [
        {"image": "path/to/image_1.nii.gz", "label": "path/to/label_1.nii.gz"},
        {"image": "path/to/image_2.nii.gz", "label": "path/to/label_2.nii.gz"},
        ...
    ],
    "test": [
        "path/to/image_3.nii.gz",
        "path/to/image_4.nii.gz",
        ...
    ]
}

The metadata keys are optional for loading the datalist, but include:

some string items: name, description, reference, licence, release, tensorImageSize
two dict items: modality (keyed by channel index), and labels (keyed by label index)
and two integer items: numTraining and numTest, with the number of items.

The training key contains a list of dictionaries, each of which has at least the image and label keys. The image and label are loaded by monai.transforms.LoadImaged(), so both can be either a single file path or a list of file paths, in which case they are loaded as multi-channel images. Each item can also include a fold key for cross-validation purposes. The “test” key contains a list of image paths, without labels. MONAI also supports a “validation” list with the same format as the “training” list.

Parameters:

data_list_file_path (Union[str, PathLike]) – the path to the json file of datalist.
is_segmentation (bool) – whether the datalist is for segmentation task, default is True.
data_list_key (str) – the key of the list of dictionaries to use, default is “training”.
base_dir (Union[str, PathLike, None]) – the base directory of the dataset, if None, use the datalist directory.

Raises:

ValueError – When data_list_file_path does not point to a file.
ValueError – When data_list_key is not specified in the data list file.

Returns a list of data items, each of which is a dict keyed by element names, for example:

[
    {'image': '/workspace/data/chest_19.nii.gz',  'label': '/workspace/labels/chest_19.nii.gz'},
    {'image': '/workspace/data/chest_31.nii.gz',  'label': '/workspace/labels/chest_31.nii.gz'},
]

Return type:: list[dict]
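To make the file layout and path handling described above concrete, here is a minimal standard-library sketch. It only approximates the documented behaviour (key lookup, falling back to the datalist directory, wrapping bare “test” paths in dicts); the function name load_datalist_sketch and the sample file names are hypothetical, and this is not the MONAI implementation.

```python
import json
import os
import tempfile


def load_datalist_sketch(json_path, key="training", base_dir=None):
    """Read one list from a Decathlon-style JSON file and resolve relative paths."""
    with open(json_path) as f:
        content = json.load(f)
    if key not in content:
        raise ValueError(f"data list {key!r} not found in {json_path}")
    # Fall back to the directory that holds the JSON file.
    root = base_dir if base_dir is not None else os.path.dirname(json_path)
    items = []
    for entry in content[key]:
        # "test" entries may be bare path strings rather than dicts.
        entry = {"image": entry} if isinstance(entry, str) else dict(entry)
        resolved = {}
        for name, value in entry.items():
            if name in ("image", "label") and isinstance(value, str):
                value = os.path.normpath(os.path.join(root, value))
            resolved[name] = value
        items.append(resolved)
    return items


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.json")
    with open(path, "w") as f:
        json.dump(
            {
                "training": [{"image": "img/a.nii.gz", "label": "lbl/a.nii.gz", "fold": 0}],
                "test": ["img/b.nii.gz"],
            },
            f,
        )
    train_items = load_datalist_sketch(path)
    test_items = load_datalist_sketch(path, key="test")

print(train_items)
print(test_items)
```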

monai.data.load_decathlon_properties(data_property_file_path, property_keys)[source]#

Extract the properties with the specified keys from the Decathlon JSON file. See under load_decathlon_datalist for the expected keys in the Decathlon challenge.

Parameters:

data_property_file_path (Union[str, PathLike]) – the path to the JSON file of data properties.
property_keys (UnionType[Sequence[str], str]) – expected keys to load from the JSON file, for example, we have these keys in the decathlon challenge: name, description, reference, licence, tensorImageSize, modality, labels, numTraining, numTest, etc.

Return type:

dict

monai.data.check_missing_files(datalist, keys, root_dir=None, allow_missing_keys=False)[source]#

Checks whether some files in the Decathlon datalist are missing. It would be helpful to check missing files before a heavy training run.

Parameters:

datalist (list[dict]) – a list of data items, every item is a dictionary. usually generated by load_decathlon_datalist API.
keys (Union[Collection[Hashable], Hashable]) – expected keys to check in the datalist.
root_dir (Union[str, PathLike, None]) – if not None, provides the root dir for the relative file paths in datalist.
allow_missing_keys (bool) – whether to allow missing keys in the datalist items. If False, raise an exception if a key is missing. Defaults to False.

Returns:

A list of missing filenames.
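The check described above amounts to walking the datalist and testing each referenced path. A minimal sketch follows, assuming a hypothetical helper find_missing_files; it does not implement allow_missing_keys (a missing key simply raises KeyError here) and is not the MONAI implementation.

```python
import os
import tempfile


def find_missing_files(datalist, keys, root_dir=None):
    """Return every referenced file path that does not exist on disk."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    missing = []
    for item in datalist:
        for key in keys:
            value = item[key]
            # An entry may hold one path or a list of paths (multi-channel input).
            paths = value if isinstance(value, (list, tuple)) else [value]
            for p in paths:
                full = os.path.join(root_dir, p) if root_dir is not None else p
                if not os.path.exists(full):
                    missing.append(full)
    return missing


with tempfile.TemporaryDirectory() as root:
    # Create only one of the two files named in the datalist.
    open(os.path.join(root, "a.nii.gz"), "w").close()
    datalist = [{"image": "a.nii.gz"}, {"image": "b.nii.gz"}]
    missing = find_missing_files(datalist, keys="image", root_dir=root)

print(missing)
```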

monai.data.create_cross_validation_datalist(datalist, nfolds, train_folds, val_folds, train_key='training', val_key='validation', filename=None, shuffle=True, seed=0, check_missing=False, keys=None, root_dir=None, allow_missing_keys=False, raise_error=True)[source]#

Utility to create new Decathlon style datalist based on cross validation partition.

Parameters:

datalist (list[dict]) – loaded list of dictionaries for all the items to partition.
nfolds (int) – number of the kfold split.
train_folds (UnionType[Sequence[int], int]) – indices of folds for training part.
val_folds (UnionType[Sequence[int], int]) – indices of folds for validation part.
train_key (str) – the key of train part in the new datalist, defaults to “training”.
val_key (str) – the key of validation part in the new datalist, defaults to “validation”.
filename (UnionType[Path, str, None]) – if not None and ends with “.json”, save the new datalist into JSON file.
shuffle (bool) – whether to shuffle the datalist before partition, defaults to True.
seed (int) – if shuffle is True, set the random seed, defaults to 0.
check_missing (bool) – whether to check that all the files specified by keys exist.
keys (Union[Collection[Hashable], Hashable, None]) – if not None and check_missing is True, the expected keys to check in the datalist.
root_dir (UnionType[str, None]) – if not None, provides the root dir for the relative file paths in datalist.
allow_missing_keys (bool) – if check_missing is True, whether to allow missing keys in the datalist items. If False, raise an exception if a key is missing. Defaults to False.
raise_error (bool) – when missing files are found, raise an exception and stop if True, or print a warning if False.
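The fold-based partition described by these parameters can be sketched in a few lines. This is an illustration under stated assumptions, not the MONAI implementation: the function name cross_validation_split is hypothetical, the strided fold assignment is one possible choice, and the shuffle order produced by random.Random will generally differ from MONAI's.

```python
import random


def cross_validation_split(datalist, nfolds, train_folds, val_folds, shuffle=True, seed=0):
    """Split datalist into nfolds folds and group them into training/validation lists."""
    items = list(datalist)
    if shuffle:
        random.Random(seed).shuffle(items)
    # Strided split: fold i takes items i, i + nfolds, i + 2 * nfolds, ...
    folds = [items[i::nfolds] for i in range(nfolds)]

    def gather(indices):
        # Accept a single fold index as well as a sequence of indices.
        indices = [indices] if isinstance(indices, int) else indices
        return [item for i in indices for item in folds[i]]

    return {"training": gather(train_folds), "validation": gather(val_folds)}


datalist = [{"image": f"img_{i}.nii.gz"} for i in range(10)]
split = cross_validation_split(datalist, nfolds=5, train_folds=[0, 1, 2, 3], val_folds=4)
print(len(split["training"]), len(split["validation"]))
```

The resulting dictionary has the same shape as a Decathlon-style datalist, so it could be written out with json.dump if a file is needed.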

DataLoader#

class monai.data.DataLoader(dataset, num_workers=0, **kwargs)[source]#

Provides an iterable over the given dataset. It inherits the PyTorch DataLoader and adds enhanced collate_fn and worker_init_fn by default.

Although this class could be configured to be the same as torch.utils.data.DataLoader, its default configuration is recommended, mainly for the following extra features:

It handles MONAI randomizable objects with appropriate random state management for deterministic behaviour.

It is aware of samples produced by patch-based transforms (such as monai.transforms.RandSpatialCropSamplesDict) and collates them with enhanced behaviour. See: monai.transforms.Compose.

For more details about torch.utils.data.DataLoader, please see: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader.

For example, to construct a randomized dataset and iterate with the data loader:

import torch

from monai.data import DataLoader
from monai.transforms import Randomizable


class RandomDataset(torch.utils.data.Dataset, Randomizable):
    def __getitem__(self, index):
        return self.R.randint(0, 1000, (1,))

    def __len__(self):
        return 16


dataset = RandomDataset()
dataloader = DataLoader(dataset, batch_size=2, num_workers=4)
for epoch in range(2):
    for i, batch in enumerate(dataloader):
        print(epoch, i, batch.data.numpy().flatten().tolist())

Parameters:

dataset (Dataset) – dataset from which to load the data.
num_workers (int) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
collate_fn – default to monai.data.utils.list_data_collate().
worker_init_fn – default to monai.data.utils.worker_init_fn().
kwargs – other parameters for PyTorch DataLoader.

ThreadBuffer#

class monai.data.ThreadBuffer(src, buffer_size=1, timeout=0.01)[source]#

Iterates over values from self.src in a separate thread but yields them in the current thread. This allows values to be queued up asynchronously. The internal thread will continue running as long as the source has values or until the stop() method is called.

One issue raised by using a thread in this way is that the source object is being iterated over for the lifetime of the thread, so if the thread hasn’t finished, another attempt to iterate over the source will raise an exception or yield unexpected results. To ensure that the thread releases the iteration and that proper cleanup is done, the stop() method must be called, which will join with the thread.

Parameters:

src – Source data iterable
buffer_size (int) – Number of items to buffer from the source
timeout (float) – Time to wait for an item from the buffer, or to wait while the buffer is full when adding items
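The buffering pattern described above (a producer thread filling a bounded queue while the current thread consumes it) can be sketched with the standard library. The generator buffered below is a hypothetical stand-in written for illustration; it is not the ThreadBuffer implementation, and its cleanup happens in a finally block rather than through a stop() method.

```python
import queue
import threading


def buffered(src, buffer_size=1, timeout=0.01):
    """Yield items from src while a background thread fills a bounded queue."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = threading.Event()      # set once the producer has finished
    stopped = threading.Event()   # set when the consumer stops iterating

    def producer():
        try:
            for item in src:
                # Retry while the buffer is full, unless the consumer has stopped.
                while not stopped.is_set():
                    try:
                        buffer.put(item, timeout=timeout)
                        break
                    except queue.Full:
                        continue
                if stopped.is_set():
                    return
        finally:
            done.set()

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()
    try:
        # Keep consuming until the producer has finished and the queue is drained.
        while not (done.is_set() and buffer.empty()):
            try:
                yield buffer.get(timeout=timeout)
            except queue.Empty:
                continue
    finally:
        stopped.set()
        thread.join()


result = list(buffered(range(5), buffer_size=2))
print(result)
```

The timeout plays the same two roles as in the parameter description: it bounds the wait for an item and the wait for free space in a full buffer.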

ThreadDataLoader#

class monai.data.ThreadDataLoader(dataset, buffer_size=1, buffer_timeout=0.01, repeats=1, use_thread_workers=False, **kwargs)[source]#

Subclass of DataLoader using a ThreadBuffer object to implement the __iter__ method asynchronously. This will iterate over data from the loader as expected; however, the data is generated on a separate thread. Use this class where a DataLoader instance is required and not just an iterable object.

The default behaviour with repeats set to 1 is to yield each batch as it is generated; with a higher value, the generated batch is yielded that many times while the underlying dataset asynchronously generates the next. Typically not all relevant information is learned from a batch in a single iteration, so training multiple times on the same batch will still produce good training with minimal short-term overfitting while allowing a slow batch generation process more time to produce a result. This duplication is done by simply yielding the same object many times and not by regenerating the data.

Another typical usage is to accelerate light-weight preprocessing (usually with all the deterministic transforms cached and no IO operations), because it uses a separate thread to execute preprocessing and so avoids unnecessary IPC between multiple workers of DataLoader. As CUDA may not work well with the multi-processing of DataLoader, ThreadDataLoader can also be useful for GPU transforms. For more details: Project-MONAI/tutorials.

Setting use_thread_workers causes workers to be created as threads rather than processes, although everything else in terms of how the class works is unchanged. This allows multiple workers to be used in Windows, for example, or in any other situation where thread semantics are desired. Please note that some MONAI components, such as several datasets and random transforms, are not thread-safe and may not work as expected with thread workers; check all the preprocessing components carefully before enabling use_thread_workers.

See:

Fischetti et al. “Faster SGD training by minibatch persistency.” ArXiv (2018) https://arxiv.org/abs/1806.07353
Choi et al., “Faster Neural Network Training with Data Echoing” ArXiv (2020) https://arxiv.org/abs/1907.05550
Ramezani et al. “GCN meets GPU: Decoupling “When to Sample” from “How to Sample”.” NeurIPS (2020). https://proceedings.neurips.cc/paper/2020/file/d714d2c5a796d5814c565d78dd16188d-Paper.pdf

Parameters:

dataset (Dataset) – input dataset.
buffer_size (int) – number of items to buffer from the data source.
buffer_timeout (float) – time to wait for an item from the buffer, or to wait while the buffer is full when adding items.
repeats (int) – number of times to yield the same batch.
use_thread_workers (bool) – if True and num_workers > 0 the workers are created as threads instead of processes
kwargs – other arguments for DataLoader except for dataset.
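The effect of the repeats parameter can be illustrated without any loader at all. The sketch below, with the hypothetical name echo_batches, yields the same batch object several times, as the description above says; it leaves out the threading and is not the ThreadDataLoader implementation.

```python
def echo_batches(batches, repeats=1):
    """Yield each batch `repeats` times, reusing the same object every time."""
    for batch in batches:
        for _ in range(repeats):
            yield batch


batches = [[0, 1], [2, 3]]
echoed = list(echo_batches(batches, repeats=2))
print(echoed)
```

Because the object is reused and not regenerated, any in-place change made to a batch during one training step is visible in its repeats.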

TestTimeAugmentation#

class monai.data.TestTimeAugmentation(transform, batch_size, num_workers=0, inferrer_fn=<function _identity>, device='cpu', image_key=image, orig_key=label, nearest_interp=True, orig_meta_keys=None, meta_key_postfix='meta_dict', to_tensor=True, output_device='cpu', post_func=<function _identity>, return_full_data=False, progress=True)[source]#

Class for performing test time augmentations. This will pass the same image through the network multiple times.

The user passes transform(s) to be applied to each realization, and provided that at least one of those transforms is random, the network’s output will vary. Provided that inverse transformations exist for all supplied spatial transforms, the inverse can be applied to each realization of the network’s output. Once in the same spatial reference, the results can then be combined and metrics computed.

Test time augmentations are a useful feature for computing network uncertainty, as well as observing the network’s dependency on the applied random transforms.

Reference:: Wang et al., Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks, https://doi.org/10.1016/j.neucom.2019.01.103

Parameters:

transform (InvertibleTransform) – transform (or composed) to be applied to each realization. At least one transform must be of type RandomizableTrait (i.e. Randomizable, RandomizableTransform, or RandomizableTrait). All random transforms must be of type InvertibleTransform.
batch_size (int) – number of realizations to infer at once.
num_workers (int) – how many subprocesses to use for data loading.
inferrer_fn (Callable) – function to use to perform inference.
device (UnionType[str, device]) – device on which to perform inference.
image_key – key used to extract image from input dictionary.
orig_key – the key of the original input data in the dict. The transform information applied to this input data is retrieved and then inverted for the expected data with image_key.
orig_meta_keys (UnionType[str, None]) – the key of the metadata of the original input data, used to get the affine, data_shape, etc. The metadata is a dictionary object which contains: filename, original_shape, etc. If None, will try to construct the metadata key as {orig_key}_{meta_key_postfix}.
meta_key_postfix – use key_{postfix} to fetch the metadata according to the key data, default is meta_dict. The metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field. This arg only works when orig_meta_keys=None.
to_tensor (bool) – whether to convert the inverted data into PyTorch Tensor first, default to True.
output_device (UnionType[str, device]) – if the inverted data is converted to a Tensor, move the inverted results to the target device before post_func, default to “cpu”.
post_func (Callable) – post processing for the inverted data, should be a callable function.
return_full_data (bool) – normally, metrics are returned (mode, mean, std, vvc). Setting this flag to True will return the full data. Dimensions will be same size as when passing a single image through inferrer_fn, with a dimension appended equal in size to num_examples (N), i.e., [N,C,H,W,[D]].
progress (bool) – whether to display a progress bar.

Example

model = UNet(...).to(device)
transform = Compose([RandAffined(keys, ...), ...])
transform.set_random_state(seed=123)  # ensure deterministic evaluation

tt_aug = TestTimeAugmentation(
    transform, batch_size=5, num_workers=0, inferrer_fn=model, device=device
)
mode, mean, std, vvc = tt_aug(test_data)

N-Dim Fourier Transform#

monai.data.fft_utils.fftn_centered(im, spatial_dims, is_complex=True)[source]#

PyTorch-based fft for spatial_dims-dim signals. “centered” means this function automatically takes care of the required ifft and fft shifts. This function calls monai.networks.blocks.fft_utils_t.fftn_centered_t. This is equivalent to doing fft in numpy based on numpy.fft.fftn, numpy.fft.fftshift, and numpy.fft.ifftshift.

Parameters:

im (Union[ndarray, Tensor]) – image that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
spatial_dims (int) – number of spatial dimensions (e.g., is 2 for an image, and is 3 for a volume)
is_complex (bool) – if True, then the last dimension of the input im is expected to be 2 (representing real and imaginary channels)

Return type:

Union[ndarray, Tensor]

Returns:

“out” which is the output kspace (fourier of im)

Example

import torch
from monai.data.fft_utils import fftn_centered

im = torch.ones(1,3,3,2) # the last dim belongs to real/imaginary parts
# output1 and output2 will be identical
output1 = torch.fft.fftn(torch.view_as_complex(torch.fft.ifftshift(im,dim=(-3,-2))), dim=(-2,-1), norm="ortho")
output1 = torch.fft.fftshift( torch.view_as_real(output1), dim=(-3,-2) )

output2 = fftn_centered(im, spatial_dims=2, is_complex=True)

monai.data.fft_utils.ifftn_centered(ksp, spatial_dims, is_complex=True)[source]#

PyTorch-based ifft for spatial_dims-dim signals. “centered” means this function automatically takes care of the required ifft and fft shifts. This function calls monai.networks.blocks.fft_utils_t.ifftn_centered_t. This is equivalent to doing ifft in numpy based on numpy.fft.ifftn, numpy.fft.fftshift, and numpy.fft.ifftshift.

Parameters:

ksp (Union[ndarray, Tensor]) – k-space data that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
spatial_dims (int) – number of spatial dimensions (e.g., is 2 for an image, and is 3 for a volume)
is_complex (bool) – if True, then the last dimension of the input ksp is expected to be 2 (representing real and imaginary channels)

Return type:

Union[ndarray, Tensor]

Returns:

“out” which is the output image (inverse fourier of ksp)

Example

import torch
from monai.data.fft_utils import ifftn_centered

ksp = torch.ones(1,3,3,2) # the last dim belongs to real/imaginary parts
# output1 and output2 will be identical
output1 = torch.fft.ifftn(torch.view_as_complex(torch.fft.ifftshift(ksp,dim=(-3,-2))), dim=(-2,-1), norm="ortho")
output1 = torch.fft.fftshift( torch.view_as_real(output1), dim=(-3,-2) )

output2 = ifftn_centered(ksp, spatial_dims=2, is_complex=True)
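The shift, transform, shift pattern used by both centered functions above can be seen in one dimension with a naive DFT. The sketch below is pure Python and only illustrates the idea with orthonormal scaling; the helper names are written for this example and it does not call or reproduce the MONAI or PyTorch code.

```python
import cmath
import math


def fftshift(x):
    """Rotate a list so the zero-frequency term moves to the centre."""
    k = len(x) // 2
    return list(x[-k:]) + list(x[:-k])


def ifftshift(x):
    """Undo fftshift (the two differ for odd lengths)."""
    k = len(x) // 2
    return list(x[k:]) + list(x[:k])


def dft_ortho(x, inverse=False):
    """Naive O(n^2) DFT with 1/sqrt(n) scaling in both directions."""
    n = len(x)
    sign = 1 if inverse else -1
    return [
        sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n)) / math.sqrt(n)
        for k in range(n)
    ]


def fft_centered(x):
    return fftshift(dft_ortho(ifftshift(x)))


def ifft_centered(ksp):
    return fftshift(dft_ortho(ifftshift(ksp), inverse=True))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
x_back = ifft_centered(fft_centered(x))   # round trip recovers the signal
ones_k = fft_centered([1.0, 1.0, 1.0])    # energy lands on the centre sample
print([round(abs(v), 6) for v in ones_k])
```

With orthonormal scaling the forward and inverse transforms are exact inverses of each other, which is why the round trip needs no extra normalization factor.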

ITK Torch Bridge#

monai.data.itk_torch_bridge.get_itk_image_center(image)[source]#

Calculates the center of the ITK image based on its origin, size, and spacing. This center is equivalent to the implicit image center that MONAI uses.

Parameters:: image – The ITK image.
Returns:: The center of the image as a list of coordinates.

monai.data.itk_torch_bridge.itk_image_to_metatensor(image, channel_dim=None, dtype=<class 'float'>)[source]#

Converts an ITK image to a MetaTensor object.

Parameters:

image – The ITK image to be converted.
channel_dim (UnionType[str, int, None]) – the channel dimension of the input image, default is None. This is used to set original_channel_dim in the metadata, EnsureChannelFirst reads this field. If None, the channel_dim is inferred automatically. If the input array doesn’t have a channel dim, this value should be 'no_channel'.
dtype (Union[dtype, type, str, None, dtype]) – output dtype, defaults to the Python built-in float.

Return type:

MetaTensor

Returns:

A MetaTensor object containing the array data and metadata in ChannelFirst format.

monai.data.itk_torch_bridge.itk_to_monai_affine(image, matrix, translation, center_of_rotation=None, reference_image=None)[source]#

Converts an ITK affine matrix (2x2 for 2D or 3x3 for 3D matrix and translation vector) to a MONAI affine matrix.

Parameters:

image – The ITK image object. This is used to extract the spacing and direction information.
matrix – The 2x2 or 3x3 ITK affine matrix.
translation – The 2-element or 3-element ITK affine translation vector.
center_of_rotation – The center of rotation. If provided, the affine matrix will be adjusted to account for the difference between the center of the image and the center of rotation.
reference_image – The coordinate space that matrix and translation were defined in respect to. If not supplied, the coordinate space of image is used.

Return type:

Tensor

Returns:

A 4x4 MONAI affine matrix.

monai.data.itk_torch_bridge.metatensor_to_itk_image(meta_tensor, channel_dim=0, dtype=<class 'numpy.float32'>, **kwargs)[source]#

Converts a MetaTensor object to an ITK image. Expects the MetaTensor to be in ChannelFirst format.

Parameters:

meta_tensor (MetaTensor) – The MetaTensor to be converted.
channel_dim (UnionType[int, None]) – channel dimension of the data array, defaults to 0 (Channel-first). None indicates no channel dimension. This is used to create a Vector Image if it is not None.
dtype (Union[dtype, type, str, None]) – output data type, defaults to np.float32.
kwargs – additional keyword arguments. Currently itk.GetImageFromArray will get ttype from this dictionary.

Returns:

The ITK image.

See also: ITKWriter.create_backend_obj()

monai.data.itk_torch_bridge.monai_to_itk_affine(image, affine_matrix, center_of_rotation=None)[source]#

Converts a MONAI affine matrix to an ITK affine matrix (2x2 for 2D or 3x3 for 3D matrix and translation vector). See also ‘itk_to_monai_affine’.

Parameters:

image – The ITK image object. This is used to extract the spacing and direction information.
affine_matrix – The 3x3 for 2D or 4x4 for 3D MONAI affine matrix.
center_of_rotation – The center of rotation. If provided, the affine matrix will be adjusted to account for the difference between the center of the image and the center of rotation.

Returns:

The ITK matrix and the translation vector.

monai.data.itk_torch_bridge.monai_to_itk_ddf(image, ddf)[source]#

Converts the dense displacement field from the MONAI space to the ITK space.

Parameters:

image – ITK image of array shape 2D: (H, W) or 3D: (D, H, W).
ddf – NumPy array of shape 2D: (2, H, W) or 3D: (3, D, H, W).

Returns:: ITK image of the corresponding displacement field.
Return type:: displacement_field

Meta Object#

class monai.data.meta_obj.MetaObj[source]#

Abstract base class that stores data as well as any extra metadata.

This allows for subclassing torch.Tensor and np.ndarray through multiple inheritance.

Metadata is stored in the form of a dictionary.

Behavior should be the same as extended class (e.g., torch.Tensor or np.ndarray) aside from the extended meta functionality.

Copying of information:

For c = a + b, auxiliary data (e.g., metadata) will be copied from the first instance of MetaObj if a.is_batch is False. (For batched data, the metadata will be shallow copied for efficiency purposes.)

property applied_operations: list[dict]#

Get the applied operations. Defaults to [].

Return type:: list[dict]

static copy_items(data)[source]#: returns a copy of the data. list and dict are shallow copied for efficiency purposes.

copy_meta_from(input_objs, copy_attr=True, keys=None)[source]#

Copy metadata from a MetaObj or an iterable of MetaObj instances.

Parameters:

input_objs – list of MetaObj to copy data from.
copy_attr – whether to copy each attribute with MetaObj.copy_items. Note that if the attribute is a nested list or dict, only a shallow copy will be done.
keys – the keys of attributes to copy from the input_objs. If None, all keys from the input_objs will be copied.

static flatten_meta_objs(*args)[source]#

Recursively flatten input and yield all instances of MetaObj. This means that for both torch.add(a, b), torch.stack([a, b]) (and their numpy equivalents), we return [a, b] if both a and b are of type MetaObj.

Parameters:: args (Iterable) – Iterables of inputs to be flattened.
Returns:: list of nested MetaObj from input.
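The recursive flattening described above can be sketched with a stand-in class. Here Meta is a hypothetical placeholder for a MetaObj subclass and flatten_meta is written for illustration only; it shows the traversal of nested lists and tuples, not the MONAI implementation.

```python
class Meta:
    """Hypothetical stand-in for a MetaObj subclass."""

    def __init__(self, name):
        self.name = name


def flatten_meta(*args):
    """Recursively yield every Meta instance found in nested lists/tuples."""
    for a in args:
        if isinstance(a, (list, tuple)):
            yield from flatten_meta(*a)
        elif isinstance(a, Meta):
            yield a
        # Anything else (numbers, strings, ...) is skipped.


a, b = Meta("a"), Meta("b")
found_add = list(flatten_meta(a, b))       # arguments shaped like torch.add(a, b)
found_stack = list(flatten_meta([a, b]))   # arguments shaped like torch.stack([a, b])
print([m.name for m in found_add], [m.name for m in found_stack])
```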

static get_default_applied_operations()[source]#

Get the default applied operations.

Return type:: list
Returns:: default applied operations.

static get_default_meta()[source]#

Get the default meta.

Return type:: dict
Returns:: default metadata.

property has_pending_operations: bool#

Determine whether there are pending operations.

Return type:: bool
Returns:: True if there are pending operations; False if not.

property is_batch: bool#

Return whether object is part of batch or not.

Return type:: bool

property meta: dict#

Get the meta. Defaults to {}.

Return type:: dict

property pending_operations: list[dict]#

Get the pending operations. Defaults to [].

Return type:: list[dict]

monai.data.meta_obj.get_track_meta()[source]#

Return the boolean as to whether metadata is tracked. If True, metadata will be associated with its data by using subclasses of MetaObj. If False, then data will be returned with empty metadata.

If set_track_meta is False, then standard data objects will be returned (e.g., torch.Tensor and np.ndarray) as opposed to MONAI’s enhanced objects.

By default, this is True, and most users will want to leave it this way. However, if you are experiencing any problems regarding metadata, and aren’t interested in preserving metadata, then you can disable it.

Return type:: bool

monai.data.meta_obj.set_track_meta(val)[source]#

Boolean to set whether metadata is tracked. If True, metadata will be associated with its data by using subclasses of MetaObj. If False, then data will be returned with empty metadata.

If set_track_meta is False, then standard data objects will be returned (e.g., torch.Tensor and np.ndarray) as opposed to MONAI’s enhanced objects.

Return type:: None

MetaTensor#

class monai.data.MetaTensor(x, affine=None, meta=None, applied_operations=None, *_args, **_kwargs)[source]#

Bases: MetaObj, Tensor

Class that inherits from both torch.Tensor and MetaObj, adding support for metadata.

Metadata is stored in the form of a dictionary. An affine matrix is stored nested within this dictionary, in the form of a torch.Tensor.

Behavior should be the same as torch.Tensor aside from the extended meta functionality.

Copying of information:

For c = a + b, auxiliary data (e.g., metadata) will be copied from the first instance of MetaTensor if a.is_batch is False. (For batched data, the metadata will be shallow copied for efficiency purposes.)

Example

import torch
from monai.data import MetaTensor

t = torch.tensor([1,2,3])
affine = torch.as_tensor([[2,0,0,0],
                          [0,2,0,0],
                          [0,0,2,0],
                          [0,0,0,1]], dtype=torch.float64)
meta = {"some": "info"}
m = MetaTensor(t, affine=affine, meta=meta)
m2 = m + m
assert isinstance(m2, MetaTensor)
assert m2.meta["some"] == "info"
assert torch.all(m2.affine == affine)

Notes

Requires pytorch 1.9 or newer for full compatibility.
With older versions of pytorch (<=1.8), torch.jit.trace(net, im) may not work if im is of type MetaTensor. This can be resolved with torch.jit.trace(net, im.as_tensor()).
For pytorch < 1.8, sharing MetaTensor instances across processes may not be supported.
For pytorch < 1.9, next(iter(meta_tensor)) returns a torch.Tensor. see: pytorch/pytorch#54457
A warning will be raised if in the constructor affine is not None and meta already contains the key affine.
You can query whether the MetaTensor is a batch with the is_batch attribute.
With a batch of data, batch[0] will return the 0th image with the 0th metadata. When the batch dimension is non-singleton, e.g., batch[:, 0], batch[..., -1] and batch[1:3], then all (or a subset in the last example) of the metadata will be returned, and is_batch will return True.
When creating a batch with this class, use monai.data.DataLoader as opposed to torch.utils.data.DataLoader, as this will take care of collating the metadata properly.

H#

Returns a view of a matrix (2-D tensor) conjugated and transposed.

x.H is equivalent to x.transpose(0, 1).conj() for complex matrices and x.transpose(0, 1) for real matrices.

See also

mH: An attribute that also works on batches of matrices.

T#

Returns a view of this tensor with its dimensions reversed.

If n is the number of dimensions in x, x.T is equivalent to x.permute(n-1, n-2, ..., 0).

Warning

The use of Tensor.T() on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider mT to transpose batches of matrices or x.permute(*torch.arange(x.ndim - 1, -1, -1)) to reverse the dimensions of a tensor.

__init__(x, affine=None, meta=None, applied_operations=None, *_args, **_kwargs)[source]#

Parameters:

x – initial array for the MetaTensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
affine (UnionType[Tensor, None]) – optional 4x4 array.
meta (UnionType[dict, None]) – dictionary of metadata.
applied_operations (UnionType[list, None]) – list of previously applied operations on the MetaTensor, the list is typically maintained by monai.transforms.TraceableTransform. See also: monai.transforms.TraceableTransform
_args – additional args (currently not in use in this constructor).
_kwargs – additional kwargs (currently not in use in this constructor).

Note

If a meta dictionary is given, use it. Else, if meta exists in the input tensor x, use it. Else, use the default value. The affine is handled similarly, except that it can come from four places, in order of priority: affine, meta["affine"], x.affine, get_default_affine.
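The priority order for the affine in the note above can be written out as a plain function. This is only a sketch of the lookup order: resolve_affine, its x_affine argument, and the "identity" placeholder default are hypothetical names for this example and not part of the MetaTensor API.

```python
def resolve_affine(affine=None, meta=None, x_affine=None, default="identity"):
    """Pick the first available affine following the documented priority."""
    if affine is not None:          # 1. the explicit affine argument
        return affine
    if meta is not None and "affine" in meta:
        return meta["affine"]       # 2. meta["affine"]
    if x_affine is not None:        # 3. the affine carried by the input x
        return x_affine
    return default                  # 4. the default affine


print(resolve_affine(affine="A", meta={"affine": "M"}, x_affine="X"))
print(resolve_affine(meta={"affine": "M"}, x_affine="X"))
print(resolve_affine(x_affine="X"))
print(resolve_affine())
```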

abs() → Tensor#: See torch.abs()

abs_() → Tensor#: In-place version of abs()

absolute() → Tensor#: Alias for abs()

absolute_() → Tensor#: In-place version of absolute(). Alias for abs_().

acos() → Tensor#: See torch.acos()

acos_() → Tensor#: In-place version of acos()

acosh() → Tensor#: See torch.acosh()

acosh_() → Tensor#: In-place version of acosh()

add(other, *, alpha=1) → Tensor#

Add a scalar or tensor to self tensor. If both alpha and other are specified, each element of other is scaled by alpha before being used.

When other is a tensor, the shape of other must be broadcastable with the shape of the underlying tensor

See torch.add()

add_(other, *, alpha=1) → Tensor#: In-place version of add()

addbmm(batch1, batch2, *, beta=1, alpha=1) → Tensor#: See torch.addbmm()

addbmm_(batch1, batch2, *, beta=1, alpha=1) → Tensor#: In-place version of addbmm()

addcdiv(tensor1, tensor2, *, value=1) → Tensor#: See torch.addcdiv()

addcdiv_(tensor1, tensor2, *, value=1) → Tensor#: In-place version of addcdiv()

addcmul(tensor1, tensor2, *, value=1) → Tensor#: See torch.addcmul()

addcmul_(tensor1, tensor2, *, value=1) → Tensor#: In-place version of addcmul()

addmm(mat1, mat2, *, beta=1, alpha=1) → Tensor#: See torch.addmm()

addmm_(mat1, mat2, *, beta=1, alpha=1) → Tensor#: In-place version of addmm()

addmv(mat, vec, *, beta=1, alpha=1) → Tensor#: See torch.addmv()

addmv_(mat, vec, *, beta=1, alpha=1) → Tensor#: In-place version of addmv()

addr(vec1, vec2, *, beta=1, alpha=1) → Tensor#: See torch.addr()

addr_(vec1, vec2, *, beta=1, alpha=1) → Tensor#: In-place version of addr()

adjoint() → Tensor#: Alias for torch.adjoint()

property affine: Tensor#

Get the affine. Defaults to torch.eye(4, dtype=torch.float64)

Return type:: Tensor

align_as(other) → Tensor#

Permutes the dimensions of the self tensor to match the dimension order in the other tensor, adding size-one dims for any new names.

This operation is useful for explicit broadcasting by names (see examples).

All of the dims of self must be named in order to use this method. The resulting tensor is a view on the original tensor.

All dimension names of self must be present in other.names. other may contain named dimensions that are not in self.names; the output tensor has a size-one dimension for each of those new names.

To align a tensor to a specific order, use align_to().

Examples:

# Example 1: Applying a mask
>>> mask = torch.randint(2, [127, 128], dtype=torch.bool).refine_names('W', 'H')
>>> imgs = torch.randn(32, 128, 127, 3, names=('N', 'H', 'W', 'C'))
>>> imgs.masked_fill_(mask.align_as(imgs), 0)


# Example 2: Applying a per-channel-scale
>>> def scale_channels(input, scale):
...     scale = scale.refine_names('C')
...     return input * scale.align_as(input)

>>> num_channels = 3
>>> scale = torch.randn(num_channels, names=('C',))
>>> imgs = torch.rand(32, 128, 128, num_channels, names=('N', 'H', 'W', 'C'))
>>> more_imgs = torch.rand(32, num_channels, 128, 128, names=('N', 'C', 'H', 'W'))
>>> videos = torch.randn(3, num_channels, 128, 128, 128, names=('N', 'C', 'H', 'W', 'D'))

# scale_channels is agnostic to the dimension order of the input
>>> scale_channels(imgs, scale)
>>> scale_channels(more_imgs, scale)
>>> scale_channels(videos, scale)

Warning

The named tensor API is experimental and subject to change.

align_to(*names)#

Permutes the dimensions of the self tensor to match the order specified in names, adding size-one dims for any new names.

All of the dims of self must be named in order to use this method. The resulting tensor is a view on the original tensor.

All dimension names of self must be present in names. names may contain additional names that are not in self.names; the output tensor has a size-one dimension for each of those new names.

names may contain up to one Ellipsis (...). The Ellipsis is expanded to be equal to all dimension names of self that are not mentioned in names, in the order that they appear in self.

Python 2 does not support Ellipsis but one may use a string literal instead ('...').

Parameters:: names (iterable of str) – The desired dimension ordering of the output tensor. May contain up to one Ellipsis that is expanded to all unmentioned dim names of self.

Examples:

>>> tensor = torch.randn(2, 2, 2, 2, 2, 2)
>>> named_tensor = tensor.refine_names('A', 'B', 'C', 'D', 'E', 'F')

# Move the F and E dims to the front while keeping the rest in order
>>> named_tensor.align_to('F', 'E', ...)

Warning

The named tensor API is experimental and subject to change.

all(dim=None, keepdim=False) → Tensor#: See torch.all()

allclose(other, rtol=1e-05, atol=1e-08, equal_nan=False) → bool#: See torch.allclose()

amax(dim=None, keepdim=False) → Tensor#: See torch.amax()

amin(dim=None, keepdim=False) → Tensor#: See torch.amin()

aminmax(*, dim=None, keepdim=False) -> (Tensor min, Tensor max)#: See torch.aminmax()

angle() → Tensor#: See torch.angle()

any(dim=None, keepdim=False) → Tensor#: See torch.any()

apply_(callable) → Tensor#: Applies the function callable to each element in the tensor, replacing each element with the value returned by callable.

Note

This function only works with CPU tensors and should not be used in code sections that require high performance.
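As a small illustration of the element-wise behaviour, the sketch below doubles each value of a CPU tensor; the lambda is an arbitrary example callable.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])   # apply_ requires a CPU tensor

# The callable receives one Python scalar at a time and returns its replacement.
t.apply_(lambda v: v * 2)

# For performance-sensitive code, a vectorized operation is usually preferable.
u = torch.tensor([1.0, 2.0, 3.0]).mul_(2)
```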

arccos() → Tensor#: See torch.arccos()

arccos_() → Tensor#: In-place version of arccos()

arccosh() → Tensor#: See torch.arccosh()

arccosh_() → Tensor#: In-place version of arccosh()

arcsin() → Tensor#: See torch.arcsin()

arcsin_() → Tensor#: In-place version of arcsin()

arcsinh() → Tensor#: See torch.arcsinh()

arcsinh_() → Tensor#: In-place version of arcsinh()

arctan() → Tensor#: See torch.arctan()

arctan2(other) → Tensor#: See torch.arctan2()

arctan2_(other) → Tensor#: In-place version of arctan2()

arctan_() → Tensor#: In-place version of arctan()

arctanh() → Tensor#: See torch.arctanh()

arctanh_() → Tensor#: In-place version of arctanh()

argmax(dim=None, keepdim=False) → LongTensor#: See torch.argmax()

argmin(dim=None, keepdim=False) → LongTensor#: See torch.argmin()

argsort(dim=-1, descending=False) → LongTensor#: See torch.argsort()

argwhere() → Tensor#: See torch.argwhere()

property array#

Returns a numpy array of self. The array and self share the same underlying storage if self is on the CPU. Changes to self (it’s a subclass of torch.Tensor) will be reflected in the ndarray and vice versa. If self is not on the CPU, the data is first moved to the CPU, and the storage is then not shared.

Getter:: see also: MetaTensor.get_array()
Setter:: see also: MetaTensor.set_array()

as_dict(key, output_type=<class 'torch.Tensor'>, dtype=None)[source]#

Get the object as a dictionary for backwards compatibility. This method does not make a deep copy of the objects.

Parameters:

key (str) – Base key to store main data. The key for the metadata will be determined using PostFix.
output_type – torch.Tensor or np.ndarray for the main data.
dtype – dtype of output data. Converted to correct library type (e.g., np.float32 is converted to torch.float32 if output type is torch.Tensor). If left blank, it remains unchanged.

Return type:

dict

Returns:

A dictionary consisting of three keys: the main data (stored under key), the metadata, and the applied operations.

as_strided(size, stride, storage_offset=None) → Tensor#: See torch.as_strided()

as_strided_(size, stride, storage_offset=None) → Tensor#: In-place version of as_strided()

as_strided_scatter(src, size, stride, storage_offset=None) → Tensor#: See torch.as_strided_scatter()

as_subclass(cls) → Tensor#: Makes a cls instance with the same data pointer as self. Changes in the output mirror changes in self, and the output stays attached to the autograd graph. cls must be a subclass of Tensor.
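The shared data pointer can be checked with a short sketch; TaggedTensor is a hypothetical subclass defined only for this illustration.

```python
import torch

class TaggedTensor(torch.Tensor):
    """Hypothetical subclass used only for this illustration."""

x = torch.zeros(3)
t = x.as_subclass(TaggedTensor)

# Both objects view the same memory, so a write through one shows in the other.
t[0] = 1.0
same_pointer = t.data_ptr() == x.data_ptr()
```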

as_tensor()[source]#

Return the MetaTensor as a torch.Tensor. It is OS dependent as to whether this will be a deep copy or not.

Return type:: Tensor

asin() → Tensor#: See torch.asin()

asin_() → Tensor#: In-place version of asin()

asinh() → Tensor#: See torch.asinh()

asinh_() → Tensor#: In-place version of asinh()

astype(dtype, device=None, *_args, **_kwargs)[source]#

Cast to dtype, sharing data whenever possible.

Parameters:

dtype – dtypes such as np.float32, torch.float, “np.float32”, float.
device – the device if dtype is a torch data type.
_args – additional args (currently unused).
_kwargs – additional kwargs (currently unused).

Returns:

data array instance

atan() → Tensor#: See torch.atan()

atan2(other) → Tensor#: See torch.atan2()

atan2_(other) → Tensor#: In-place version of atan2()

atan_() → Tensor#: In-place version of atan()

atanh() → Tensor#: See torch.atanh()

atanh_() → Tensor#: In-place version of atanh()

backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)#

Computes the gradient of current tensor wrt graph leaves.

The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying a gradient. It should be a tensor of matching type and shape, that represents the gradient of the differentiated function w.r.t. self.

This function accumulates gradients in the leaves - you might need to zero .grad attributes or set them to None before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.

Note

If you run any forward ops, create gradient, and/or call backward in a user-specified CUDA stream context, see Stream semantics of backward passes.

Note

When inputs are provided and a given input is not a leaf, the current implementation will call its grad_fn (though it is not strictly needed to get these gradients). It is an implementation detail on which the user should not rely. See pytorch/pytorch#60521 for more details.

Parameters:

gradient (Tensor, optional) – The gradient of the function being differentiated w.r.t. self. This argument can be omitted if self is a scalar. Defaults to None.
retain_graph (bool, optional) – If False, the graph used to compute the grads will be freed; If True, it will be retained. The default is None, in which case the value is inferred from create_graph (i.e., the graph is retained only when higher-order derivative tracking is requested). Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way.
create_graph (bool, optional) – If True, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to False.
inputs (Sequence[Tensor], optional) – Inputs w.r.t. which the gradient will be accumulated into .grad. All other tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute the tensors. Defaults to None.
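The difference between scalar and non-scalar outputs can be sketched as follows, assuming a small CPU leaf tensor; the function x ** 2 is chosen only because its derivative is easy to verify.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: no `gradient` argument is needed.
loss = (x ** 2).sum()
loss.backward()
scalar_grad = x.grad.clone()          # d(sum x^2)/dx = 2x

# Gradients accumulate in the leaves, so reset before the next pass.
x.grad = None

# Non-scalar output: supply a `gradient` with the same shape as y.
y = x ** 2
y.backward(gradient=torch.ones_like(y))
vector_grad = x.grad.clone()
```

Passing a tensor of ones as gradient gives the same result as summing y first, which is why both passes agree here.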

baddbmm(batch1, batch2, *, beta=1, alpha=1) → Tensor#: See torch.baddbmm()

baddbmm_(batch1, batch2, *, beta=1, alpha=1) → Tensor#: In-place version of baddbmm()

bernoulli(*, generator=None) → Tensor#

Returns a result tensor where each $\texttt{result[i]}$ is independently sampled from $\text{Bernoulli}(\texttt{self[i]})$. self must have floating point dtype, and the result will have the same dtype.

See torch.bernoulli()

bernoulli_(p=0.5, *, generator=None) → Tensor#

Fills each location of self with an independent sample from $\text{Bernoulli}(\texttt{p})$. self can have integral dtype.

p should either be a scalar or tensor containing probabilities to be used for drawing the binary random number.

If it is a tensor, the $\text{i}^{th}$ element of self tensor will be set to a value sampled from $\text{Bernoulli}(\texttt{p\_tensor[i]})$. In this case p must have floating point dtype.

See also bernoulli() and torch.bernoulli()
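A short sketch of the scalar and tensor forms of p, assuming CPU tensors and a seeded generator; the probabilities 0 and 1 are used in the tensor case so that the outcome does not depend on the random stream.

```python
import torch

gen = torch.Generator().manual_seed(0)

# Scalar p: every element is drawn from Bernoulli(0.3). Integral dtypes are allowed.
mask = torch.empty(1000, dtype=torch.int64).bernoulli_(0.3, generator=gen)

# Tensor p: element i is drawn from Bernoulli(p[i]); p must be floating point.
p = torch.tensor([0.0, 1.0, 0.0, 1.0])
draws = torch.empty(4).bernoulli_(p, generator=gen)
```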

bfloat16(memory_format=torch.preserve_format) → Tensor#

self.bfloat16() is equivalent to self.to(torch.bfloat16). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

bincount(weights=None, minlength=0) → Tensor#: See torch.bincount()

bitwise_and() → Tensor#: See torch.bitwise_and()

bitwise_and_() → Tensor#: In-place version of bitwise_and()

bitwise_left_shift(other) → Tensor#: See torch.bitwise_left_shift()

bitwise_left_shift_(other) → Tensor#: In-place version of bitwise_left_shift()

bitwise_not() → Tensor#: See torch.bitwise_not()

bitwise_not_() → Tensor#: In-place version of bitwise_not()

bitwise_or() → Tensor#: See torch.bitwise_or()

bitwise_or_() → Tensor#: In-place version of bitwise_or()

bitwise_right_shift(other) → Tensor#: See torch.bitwise_right_shift()

bitwise_right_shift_(other) → Tensor#: In-place version of bitwise_right_shift()

bitwise_xor() → Tensor#: See torch.bitwise_xor()

bitwise_xor_() → Tensor#: In-place version of bitwise_xor()

bmm(batch2) → Tensor#: See torch.bmm()

bool(memory_format=torch.preserve_format) → Tensor#

self.bool() is equivalent to self.to(torch.bool). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

broadcast_to(shape) → Tensor#: See torch.broadcast_to().

byte(memory_format=torch.preserve_format) → Tensor#

self.byte() is equivalent to self.to(torch.uint8). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

cauchy_(median=0, sigma=1, *, generator=None) → Tensor#: Fills the tensor with numbers drawn from the Cauchy distribution:

\[f(x) = \dfrac{1}{\pi} \dfrac{\sigma}{(x - \text{median})^2 + \sigma^2}\]

Note

Sigma ($\sigma$) is used to denote the scale parameter in Cauchy distribution.

cdouble(memory_format=torch.preserve_format) → Tensor#

self.cdouble() is equivalent to self.to(torch.complex128). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

ceil() → Tensor#: See torch.ceil()

ceil_() → Tensor#: In-place version of ceil()

cfloat(memory_format=torch.preserve_format) → Tensor#

self.cfloat() is equivalent to self.to(torch.complex64). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

chalf(memory_format=torch.preserve_format) → Tensor#

self.chalf() is equivalent to self.to(torch.complex32). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

char(memory_format=torch.preserve_format) → Tensor#

self.char() is equivalent to self.to(torch.int8). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

cholesky(upper=False) → Tensor#: See torch.cholesky()

cholesky_inverse(upper=False) → Tensor#: See torch.cholesky_inverse()

cholesky_solve(input2, upper=False) → Tensor#: See torch.cholesky_solve()

chunk(chunks, dim=0) → List of Tensors#: See torch.chunk()

clamp(min=None, max=None) → Tensor#: See torch.clamp()

clamp_(min=None, max=None) → Tensor#: In-place version of clamp()

clip(min=None, max=None) → Tensor#: Alias for clamp().

clip_(min=None, max=None) → Tensor#: Alias for clamp_().

clone(**kwargs)[source]#

Returns a copy of the MetaTensor instance.

Parameters:: kwargs – additional keyword arguments to torch.clone.

coalesce() → Tensor#

Returns a coalesced copy of self if self is an uncoalesced tensor.

Returns self if self is a coalesced tensor.

Warning

Throws an error if self is not a sparse COO tensor.
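As an illustration, the sketch below builds a sparse COO tensor with a repeated coordinate and coalesces it; coalescing sums the values of duplicate coordinates, as described in the PyTorch sparse tensor documentation.

```python
import torch

# Two entries share the coordinate (0, 1), so the tensor is uncoalesced.
indices = torch.tensor([[0, 0, 1],
                        [1, 1, 2]])
values = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(indices, values, size=(2, 3))

c = s.coalesce()   # duplicates are summed into a single entry
```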

col_indices() → IntTensor#

Returns the tensor containing the column indices of the self tensor when self is a sparse CSR tensor of layout sparse_csr. The col_indices tensor is strictly of shape (self.nnz()) and of type int32 or int64. When using MKL routines such as sparse matrix multiplication, it is necessary to use int32 indexing in order to avoid downcasting and potentially losing information.

Example:

>>> csr = torch.eye(5,5).to_sparse_csr()
>>> csr.col_indices()
tensor([0, 1, 2, 3, 4], dtype=torch.int32)

conj() → Tensor#: See torch.conj()

conj_physical() → Tensor#: See torch.conj_physical()

conj_physical_() → Tensor#: In-place version of conj_physical()

contiguous(memory_format=torch.contiguous_format) → Tensor#

Returns a contiguous in memory tensor containing the same data as self tensor. If self tensor is already in the specified memory format, this function returns the self tensor.

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.contiguous_format.
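The two cases described above can be sketched with a transposed view, assuming the default torch.contiguous_format; data pointers are compared to show whether a copy was made.

```python
import torch

x = torch.arange(6).reshape(2, 3)
xt = x.t()                         # a transposed view, not contiguous

xc = xt.contiguous()               # copies into a new contiguous buffer
same = x.contiguous()              # already contiguous, so no copy is made

copied = xc.data_ptr() != xt.data_ptr()
reused = same.data_ptr() == x.data_ptr()
```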

copy_(src, non_blocking=False) → Tensor#

Copies the elements from src into self tensor and returns self.

The src tensor must be broadcastable with the self tensor. It may be of a different data type or reside on a different device.

Parameters:

src (Tensor) – the source tensor to copy from
non_blocking (bool) – if True and this copy is between CPU and GPU, the copy may occur asynchronously with respect to the host. For other cases, this argument has no effect.
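A minimal sketch of a broadcasting copy across dtypes, assuming CPU tensors; the shapes are arbitrary.

```python
import torch

dst = torch.zeros(2, 3)             # float32
src = torch.tensor([1, 2, 3])       # int64, shape (3,)

# src is broadcast over the rows of dst and converted to dst's dtype.
dst.copy_(src)
```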

copysign(other) → Tensor#: See torch.copysign()

copysign_(other) → Tensor#: In-place version of copysign()

corrcoef() → Tensor#: See torch.corrcoef()

cos() → Tensor#: See torch.cos()

cos_() → Tensor#: In-place version of cos()

cosh() → Tensor#: See torch.cosh()

cosh_() → Tensor#: In-place version of cosh()

count_nonzero(dim=None) → Tensor#: See torch.count_nonzero()

cov(*, correction=1, fweights=None, aweights=None) → Tensor#: See torch.cov()

cpu(memory_format=torch.preserve_format) → Tensor#

Returns a copy of this object in CPU memory.

If this object is already in CPU memory, then no copy is performed and the original object is returned.

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

cross(other, dim=None) → Tensor#: See torch.cross()

crow_indices() → IntTensor#

Returns the tensor containing the compressed row indices of the self tensor when self is a sparse CSR tensor of layout sparse_csr. The crow_indices tensor is strictly of shape (self.size(0) + 1) and of type int32 or int64. When using MKL routines such as sparse matrix multiplication, it is necessary to use int32 indexing in order to avoid downcasting and potentially losing information.

Example:

>>> csr = torch.eye(5,5).to_sparse_csr()
>>> csr.crow_indices()
tensor([0, 1, 2, 3, 4, 5], dtype=torch.int32)

cuda(device=None, non_blocking=False, memory_format=torch.preserve_format) → Tensor#

Returns a copy of this object in CUDA memory.

If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned.

Parameters:

device (torch.device) – The destination GPU device. Defaults to the current CUDA device.
non_blocking (bool) – If True and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default: False.
memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

cummax(dim)#: See torch.cummax()

cummin(dim)#: See torch.cummin()

cumprod(dim, dtype=None) → Tensor#: See torch.cumprod()

cumprod_(dim, dtype=None) → Tensor#: In-place version of cumprod()

cumsum(dim, dtype=None) → Tensor#: See torch.cumsum()

cumsum_(dim, dtype=None) → Tensor#: In-place version of cumsum()

data_ptr() → int#: Returns the address of the first element of self tensor.

deg2rad() → Tensor#: See torch.deg2rad()

deg2rad_() → Tensor#: In-place version of deg2rad()

dense_dim() → int#

Return the number of dense dimensions in a sparse tensor self.

Note

Returns len(self.shape) if self is not a sparse tensor.

See also Tensor.sparse_dim() and hybrid tensors.
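For a hybrid tensor, the split between sparse and dense dimensions might look like the following sketch; the shapes and values are illustrative.

```python
import torch

# A hybrid COO tensor: one sparse dimension indexes rows, and each stored
# value is itself a dense vector of length 3.
indices = torch.tensor([[0, 2]])                 # shape (sparse_dim, nnz)
values = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])         # shape (nnz, 3)
h = torch.sparse_coo_tensor(indices, values, size=(4, 3))

n_sparse = h.sparse_dim()
n_dense = h.dense_dim()
```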

dequantize() → Tensor#: Given a quantized Tensor, dequantize it and return the dequantized float Tensor.

det() → Tensor#: See torch.det()

detach()#

Returns a new Tensor, detached from the current graph.

The result will never require gradient.

This method also affects forward mode AD gradients and the result will never have forward mode AD gradients.

Note

Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks.
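The storage sharing noted above can be observed with a short sketch, assuming a CPU tensor that requires gradients.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2            # part of the autograd graph
d = y.detach()       # same storage, no graph

shares_storage = d.data_ptr() == y.data_ptr()
```

If an independent copy is needed, calling clone() on the detached tensor avoids the in-place aliasing described in the note.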

detach_()#

Detaches the Tensor from the graph that created it, making it a leaf. Views cannot be detached in-place.

This method also affects forward mode AD gradients and the result will never have forward mode AD gradients.

device#: Is the torch.device where this Tensor is.

diag(diagonal=0) → Tensor#: See torch.diag()

diag_embed(offset=0, dim1=-2, dim2=-1) → Tensor#: See torch.diag_embed()

diagflat(offset=0) → Tensor#: See torch.diagflat()

diagonal(offset=0, dim1=0, dim2=1) → Tensor#: See torch.diagonal()

diagonal_scatter(src, offset=0, dim1=0, dim2=1) → Tensor#: See torch.diagonal_scatter()

diff(n=1, dim=-1, prepend=None, append=None) → Tensor#: See torch.diff()

digamma() → Tensor#: See torch.digamma()

digamma_() → Tensor#: In-place version of digamma()

dim() → int#: Returns the number of dimensions of self tensor.

dim_order(*, ambiguity_check=False)#

Returns the uniquely determined tuple of int describing the dim order or physical layout of self.

The dim order represents how dimensions are laid out in memory of dense tensors, starting from the outermost to the innermost dimension.

Note that the dim order may not always be uniquely determined. If ambiguity_check is True, this function raises a RuntimeError when the dim order cannot be uniquely determined. If ambiguity_check is a list of memory formats, it raises a RuntimeError when the tensor cannot be interpreted as exactly one of the given memory formats, or when its dim order cannot be uniquely determined. If ambiguity_check is False, it returns one of the legal dim orders without checking uniqueness. For any other value, it raises a TypeError.

Parameters:: ambiguity_check (bool or List[torch.memory_format]) – The check method for ambiguity of dim order.

Examples:

>>> torch.empty((2, 3, 5, 7)).dim_order()
(0, 1, 2, 3)
>>> torch.empty((2, 3, 5, 7)).transpose(1, 2).dim_order()
(0, 2, 1, 3)
>>> torch.empty((2, 3, 5, 7), memory_format=torch.channels_last).dim_order()
(0, 2, 3, 1)
>>> torch.empty((1, 2, 3, 4)).dim_order()
(0, 1, 2, 3)
>>> try:
...     torch.empty((1, 2, 3, 4)).dim_order(ambiguity_check=True)
... except RuntimeError as e:
...     print(e)
The tensor does not have unique dim order, or cannot map to exact one of the given memory formats.
>>> torch.empty((1, 2, 3, 4)).dim_order(
...     ambiguity_check=[torch.contiguous_format, torch.channels_last]
... )  # It can be mapped to contiguous format
(0, 1, 2, 3)
>>> try:
...     torch.empty((1, 2, 3, 4)).dim_order(ambiguity_check="ILLEGAL")
... except TypeError as e:
...     print(e)
The ambiguity_check argument must be a bool or a list of memory formats.

Warning

The dim_order tensor API is experimental and subject to change.

dist(other, p=2) → Tensor#: See torch.dist()

div(value, *, rounding_mode=None) → Tensor#: See torch.div()

div_(value, *, rounding_mode=None) → Tensor#: In-place version of div()

divide(value, *, rounding_mode=None) → Tensor#: See torch.divide()

divide_(value, *, rounding_mode=None) → Tensor#: In-place version of divide()

dot(other) → Tensor#: See torch.dot()

double(memory_format=torch.preserve_format) → Tensor#

self.double() is equivalent to self.to(torch.float64). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.
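The equivalence with to() can be checked directly; the same pattern applies to the other dtype shortcuts in this listing, such as bool(), byte(), and half().

```python
import torch

x = torch.tensor([1, 2, 3])          # int64 by default
a = x.double()
b = x.to(torch.float64)
```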

dsplit(split_size_or_sections) → List of Tensors#: See torch.dsplit()

element_size() → int#

Returns the size in bytes of an individual element.

Example:

>>> torch.tensor([]).element_size()
4
>>> torch.tensor([], dtype=torch.uint8).element_size()
1

static ensure_torch_and_prune_meta(im, meta, simple_keys=False, pattern=None, sep='.')[source]#

Convert the image to MetaTensor (when meta is not None). If affine is in the meta dictionary, convert that to torch.Tensor, too. Remove any superfluous metadata.

Parameters:

im (~NdarrayTensor) – Input image (np.ndarray or torch.Tensor)
meta (UnionType[dict, None]) – Metadata dictionary. When it’s None, the metadata is not tracked and this method returns a torch.Tensor.
simple_keys (bool) – whether to keep only a simple subset of metadata keys.
pattern (UnionType[str, None]) – combined with sep, a regular expression used to match and prune keys in the metadata (nested dictionary). Defaults to None (no key deletion).
sep (str) – combined with pattern, used to match and delete keys in the metadata (nested dictionary). Defaults to “.”; see also monai.transforms.DeleteItemsd. For example, pattern=".*_code$", sep=" " removes any meta keys that end with "_code".

Returns:

By default, a MetaTensor is returned. However, if get_track_meta() is False or meta=None, a torch.Tensor is returned.

eq(other) → Tensor#: See torch.eq()

eq_(other) → Tensor#: In-place version of eq()

equal(other) → bool#: See torch.equal()

erf() → Tensor#: See torch.erf()

erf_() → Tensor#: In-place version of erf()

erfc() → Tensor#: See torch.erfc()

erfc_() → Tensor#: In-place version of erfc()

erfinv() → Tensor#: See torch.erfinv()

erfinv_() → Tensor#: In-place version of erfinv()

exp() → Tensor#: See torch.exp()

exp2() → Tensor#: See torch.exp2()

exp2_() → Tensor#: In-place version of exp2()

exp_() → Tensor#: In-place version of exp()

expand(*sizes) → Tensor#

Returns a new view of the self tensor with singleton dimensions expanded to a larger size.

Passing -1 as the size for a dimension means not changing the size of that dimension.

A tensor can also be expanded to a larger number of dimensions, and the new ones will be added at the front. For the new dimensions, the size cannot be set to -1.

Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the stride to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory.

Parameters:: *sizes (torch.Size or int...) – the desired expanded size

Warning

More than one element of an expanded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first.

Example:

>>> x = torch.tensor([[1], [2], [3]])
>>> x.size()
torch.Size([3, 1])
>>> x.expand(3, 4)
tensor([[ 1,  1,  1,  1],
        [ 2,  2,  2,  2],
        [ 3,  3,  3,  3]])
>>> x.expand(-1, 4)   # -1 means not changing the size of that dimension
tensor([[ 1,  1,  1,  1],
        [ 2,  2,  2,  2],
        [ 3,  3,  3,  3]])
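The stride-0 behaviour and the advice to clone before writing can be sketched as follows; this is an illustration on a small CPU tensor, not a statement about every memory layout.

```python
import torch

x = torch.tensor([[1], [2], [3]])   # shape (3, 1)
e = x.expand(3, 4)

strides = e.stride()                # the expanded dimension has stride 0
shares_memory = e.data_ptr() == x.data_ptr()

# Clone before writing so that each element gets its own memory location.
w = e.clone()
w[0, 0] = 99
```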

expand_as(other) → Tensor#

Expand this tensor to the same size as other. self.expand_as(other) is equivalent to self.expand(other.size()).

Please see expand() for more information about expand.

Parameters:: other (torch.Tensor) – The result tensor has the same size as other.

expm1() → Tensor#: See torch.expm1()

expm1_() → Tensor#: In-place version of expm1()

exponential_(lambd=1, *, generator=None) → Tensor#: Fills self tensor with elements drawn from the PDF (probability density function):

\[f(x) = \lambda e^{-\lambda x}, x > 0\]

Note

In probability theory, exponential distribution is supported on interval [0, $\infty$) (i.e., $x \geq 0$) implying that zero can be sampled from the exponential distribution. However, torch.Tensor.exponential_() does not sample zero, which means that its actual support is the interval (0, $\infty$).

Note that torch.distributions.exponential.Exponential() is supported on the interval [0, $\infty$) and can sample zero.
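A quick empirical check of the support and the mean, assuming a seeded CPU generator; the rate 2.0 and the sample size are arbitrary choices for the sketch.

```python
import torch

gen = torch.Generator().manual_seed(0)
samples = torch.empty(10_000).exponential_(lambd=2.0, generator=gen)

all_positive = bool((samples > 0).all())
sample_mean = samples.mean().item()   # should be close to 1 / lambd = 0.5
```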

fill_(value) → Tensor#: Fills self tensor with the specified value.

fill_diagonal_(fill_value, wrap=False) → Tensor#

Fill the main diagonal of a tensor that has at least 2 dimensions. When dims>2, all dimensions of input must be of equal length. This function modifies the input tensor in-place, and returns the input tensor.

Parameters:

fill_value (Scalar) – the fill value
wrap (bool) – whether the diagonal is ‘wrapped’ after N columns for tall matrices.

Example:

>>> a = torch.zeros(3, 3)
>>> a.fill_diagonal_(5)
tensor([[5., 0., 0.],
        [0., 5., 0.],
        [0., 0., 5.]])
>>> b = torch.zeros(7, 3)
>>> b.fill_diagonal_(5)
tensor([[5., 0., 0.],
        [0., 5., 0.],
        [0., 0., 5.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
>>> c = torch.zeros(7, 3)
>>> c.fill_diagonal_(5, wrap=True)
tensor([[5., 0., 0.],
        [0., 5., 0.],
        [0., 0., 5.],
        [0., 0., 0.],
        [5., 0., 0.],
        [0., 5., 0.],
        [0., 0., 5.]])

fix() → Tensor#: See torch.fix().

fix_() → Tensor#: In-place version of fix()

flatten(start_dim=0, end_dim=-1) → Tensor#: See torch.flatten()

flip(dims) → Tensor#: See torch.flip()

fliplr() → Tensor#: See torch.fliplr()

flipud() → Tensor#: See torch.flipud()

float(memory_format=torch.preserve_format) → Tensor#

self.float() is equivalent to self.to(torch.float32). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

float_power(exponent) → Tensor#: See torch.float_power()

float_power_(exponent) → Tensor#: In-place version of float_power()

floor() → Tensor#: See torch.floor()

floor_() → Tensor#: In-place version of floor()

floor_divide(value) → Tensor#: See torch.floor_divide()

floor_divide_(value) → Tensor#: In-place version of floor_divide()

fmax(other) → Tensor#: See torch.fmax()

fmin(other) → Tensor#: See torch.fmin()

fmod(divisor) → Tensor#: See torch.fmod()

fmod_(divisor) → Tensor#: In-place version of fmod()

frac() → Tensor#: See torch.frac()

frac_() → Tensor#: In-place version of frac()

frexp() -> (Tensor mantissa, Tensor exponent)#: See torch.frexp()

gather(dim, index) → Tensor#: See torch.gather()

gcd(other) → Tensor#: See torch.gcd()

gcd_(other) → Tensor#: In-place version of gcd()

ge(other) → Tensor#: See torch.ge().

ge_(other) → Tensor#: In-place version of ge().

geometric_(p, *, generator=None) → Tensor#: Fills self tensor with elements drawn from the geometric distribution:

\[P(X=k) = (1 - p)^{k - 1} p, k = 1, 2, ...\]

Note

For torch.Tensor.geometric_(), the $k$-th trial is the first success, so it draws samples in $\{1, 2, \ldots\}$, whereas for torch.distributions.geometric.Geometric(), the $(k+1)$-th trial is the first success, so it draws samples in $\{0, 1, \ldots\}$.
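The offset between the two conventions can be seen empirically in the sketch below, which assumes CPU sampling with fixed seeds; p = 0.25 is an arbitrary choice.

```python
import torch
from torch.distributions.geometric import Geometric

gen = torch.Generator().manual_seed(0)
p = 0.25

# In-place sampler: counts trials up to and including the first success.
trials = torch.empty(10_000).geometric_(p, generator=gen)

# Distribution object: counts failures before the first success.
torch.manual_seed(0)
failures = Geometric(probs=torch.tensor(p)).sample((10_000,))

trials_mean = trials.mean().item()      # expected value 1 / p = 4
failures_mean = failures.mean().item()  # expected value (1 - p) / p = 3
```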

geqrf()#: See torch.geqrf()

ger(vec2) → Tensor#: See torch.ger()

get_array(output_type=<class 'numpy.ndarray'>, dtype=None, device=None, *_args, **_kwargs)[source]#

Returns a new array of type output_type. When the output is a numpy array, it shares the same underlying storage as self, so changes to the self tensor are reflected in the ndarray and vice versa.

Parameters:

output_type – output type, see also: monai.utils.convert_data_type().
dtype – dtype of output data. Converted to correct library type (e.g., np.float32 is converted to torch.float32 if output type is torch.Tensor). If left blank, it remains unchanged.
device – if the output is a torch.Tensor, select device (if None, unchanged).
_args – currently unused parameters.
_kwargs – currently unused parameters.

get_device() -> Device ordinal (Integer)#

For CUDA tensors, this function returns the device ordinal of the GPU on which the tensor resides. For CPU tensors, this function returns -1.

Example:

>>> x = torch.randn(3, 4, 5, device='cuda:0')
>>> x.get_device()
0
>>> x.cpu().get_device()
-1

grad#: This attribute is None by default and becomes a Tensor the first time a call to backward() computes gradients for self. The attribute will then contain the gradients computed and future calls to backward() will accumulate (add) gradients into it.
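The accumulation behaviour can be sketched with two identical backward passes on a small leaf tensor; the factor 3 is arbitrary.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
grad_before = w.grad                  # None until backward() runs

(w * 3).sum().backward()
first = w.grad.clone()                # each element has gradient 3

(w * 3).sum().backward()
second = w.grad.clone()               # the second pass is added to the first
```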

greater(other) → Tensor#: See torch.greater().

greater_(other) → Tensor#: In-place version of greater().

greater_equal(other) → Tensor#: See torch.greater_equal().

greater_equal_(other) → Tensor#: In-place version of greater_equal().

gt(other) → Tensor#: See torch.gt().

gt_(other) → Tensor#: In-place version of gt().

half(memory_format=torch.preserve_format) → Tensor#

self.half() is equivalent to self.to(torch.float16). See to().

Parameters:: memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

hardshrink(lambd=0.5) → Tensor#: See torch.nn.functional.hardshrink()

has_names()#: Is True if any of this tensor’s dimensions are named. Otherwise, is False.

heaviside(values) → Tensor#: See torch.heaviside()

heaviside_(values) → Tensor#: In-place version of heaviside()

histc(bins=100, min=0, max=0) → Tensor#: See torch.histc()

histogram(bins, *, range=None, weight=None, density=False)#: See torch.histogram()

hsplit(split_size_or_sections) → List of Tensors#: See torch.hsplit()

hypot(other) → Tensor#: See torch.hypot()

hypot_(other) → Tensor#: In-place version of hypot()

i0() → Tensor#: See torch.i0()

i0_() → Tensor#: In-place version of i0()

igamma(other) → Tensor#: See torch.igamma()

igamma_(other) → Tensor#: In-place version of igamma()

igammac(other) → Tensor#: See torch.igammac()

igammac_(other) → Tensor#: In-place version of igammac()

imag#

Returns a new tensor containing imaginary values of the self tensor. The returned tensor and self share the same underlying storage.

Warning

imag() is only supported for tensors with complex dtypes.

Example:

>>> x=torch.randn(4, dtype=torch.cfloat)
>>> x
tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)])
>>> x.imag
tensor([ 0.3553, -0.7896, -0.0633, -0.8119])

index_add(dim, index, source, *, alpha=1) → Tensor#: Out-of-place version of torch.Tensor.index_add_().

index_add_(dim, index, source, *, alpha=1) → Tensor#

Accumulate the elements of alpha times source into the self tensor by adding to the indices in the order given in index. For example, if dim == 0, index[i] == j, and alpha=-1, then the ith row of source is subtracted from the jth row of self.

The dimth dimension of source must have the same size as the length of index (which must be a vector), and all other dimensions must match self, or an error will be raised.

For a 3-D tensor the output is given as:

self[index[i], :, :] += alpha * src[i, :, :]  # if dim == 0
self[:, index[i], :] += alpha * src[:, i, :]  # if dim == 1
self[:, :, index[i]] += alpha * src[:, :, i]  # if dim == 2

Note

This operation may behave nondeterministically when given tensors on a CUDA device. See the PyTorch reproducibility notes (notes/randomness) for more information.

Parameters:

dim (int) – dimension along which to index
index (Tensor) – indices of source to select from, should have dtype either torch.int64 or torch.int32
source (Tensor) – the tensor containing values to add

Keyword Arguments:

alpha (Number) – the scalar multiplier for source

Example:

>>> x = torch.ones(5, 3)
>>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float)
>>> index = torch.tensor([0, 4, 2])
>>> x.index_add_(0, index, t)
tensor([[  2.,   3.,   4.],
        [  1.,   1.,   1.],
        [  8.,   9.,  10.],
        [  1.,   1.,   1.],
        [  5.,   6.,   7.]])
>>> x.index_add_(0, index, t, alpha=-1)
tensor([[  1.,   1.,   1.],
        [  1.,   1.,   1.],
        [  1.,   1.,   1.],
        [  1.,   1.,   1.],
        [  1.,   1.,   1.]])

index_copy(dim, index, tensor2) → Tensor#: Out-of-place version of torch.Tensor.index_copy_().

index_copy_(dim, index, tensor) → Tensor#

Copies the elements of tensor into the self tensor by selecting the indices in the order given in index. For example, if dim == 0 and index[i] == j, then the ith row of tensor is copied to the jth row of self.

The dimth dimension of tensor must have the same size as the length of index (which must be a vector), and all other dimensions must match self, or an error will be raised.

Note

If index contains duplicate entries, multiple elements from tensor will be copied to the same index of self. The result is nondeterministic since it depends on which copy occurs last.

Parameters:

dim (int) – dimension along which to index
index (LongTensor) – indices of tensor to select from
tensor (Tensor) – the tensor containing values to copy

Example:

>>> x = torch.zeros(5, 3)
>>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float)
>>> index = torch.tensor([0, 4, 2])
>>> x.index_copy_(0, index, t)
tensor([[ 1.,  2.,  3.],
        [ 0.,  0.,  0.],
        [ 7.,  8.,  9.],
        [ 0.,  0.,  0.],
        [ 4.,  5.,  6.]])

index_fill(dim, index, value) → Tensor#: Out-of-place version of torch.Tensor.index_fill_().

index_fill_(dim, index, value) → Tensor#

Fills the elements of the self tensor with value value by selecting the indices in the order given in index.

Parameters:

dim (int) – dimension along which to index
index (LongTensor) – indices of self tensor to fill in
value (float) – the value to fill with

Example:

>>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float)
>>> index = torch.tensor([0, 2])
>>> x.index_fill_(1, index, -1)
tensor([[-1.,  2., -1.],
        [-1.,  5., -1.],
        [-1.,  8., -1.]])

index_put(indices, values, accumulate=False) → Tensor#: Out-of-place version of index_put_().

index_put_(indices, values, accumulate=False) → Tensor#

Puts values from the tensor values into the tensor self using the indices specified in indices (which is a tuple of Tensors). The expression tensor.index_put_(indices, values) is equivalent to tensor[indices] = values. Returns self.

If accumulate is True, the elements in values are added to self. If accumulate is False, the behavior is undefined if indices contain duplicate elements.

Parameters:

indices (tuple of LongTensor) – tensors used to index into self.
values (Tensor) – tensor of same dtype as self.
accumulate (bool) – whether to accumulate into self
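
The effect of the accumulate flag can be illustrated without tensors. The following is a pure-Python sketch on a flat list; index_put_sketch is a hypothetical helper written for this illustration and is not part of PyTorch. It assumes indices and values have the same length, whereas the tensor method works on tuples of index tensors. With accumulate=False and duplicate indices the documented behavior is undefined; the sketch simply lets the last write win.

```python
def index_put_sketch(target, indices, values, accumulate=False):
    """Mimic the documented semantics of index_put_ on a flat Python list.

    Hypothetical helper for illustration only. It modifies `target` in place
    and returns it, in the same way the in-place tensor method returns self.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    for i, v in zip(indices, values):
        if accumulate:
            target[i] += v  # duplicate indices add up
        else:
            target[i] = v  # duplicates: last write wins here (undefined in PyTorch)
    return target


data = [0.0, 0.0, 0.0, 0.0]
# Index 1 appears twice, so both contributions are added to the same slot.
index_put_sketch(data, [1, 3, 1], [10.0, 20.0, 5.0], accumulate=True)
```

Here data[1] receives 10.0 + 5.0 because index 1 is listed twice, which is the case accumulate=True is meant to handle.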

index_reduce_(dim, index, source, reduce, *, include_self=True) → Tensor#

Accumulate the elements of source into the self tensor by accumulating to the indices in the order given in index using the reduction given by the reduce argument. For example, if dim == 0, index[i] == j, reduce == "prod" and include_self == True, then the ith row of source is multiplied by the jth row of self. If include_self=True, the values in the self tensor are included in the reduction; otherwise, rows in the self tensor that are accumulated to are treated as if they were filled with the reduction identities.

The dimth dimension of source must have the same size as the length of index (which must be a vector), and all other dimensions must match self, or an error will be raised.

For a 3-D tensor with reduce="prod" and include_self=True the output is given as:

self[index[i], :, :] *= src[i, :, :]  # if dim == 0
self[:, index[i], :] *= src[:, i, :]  # if dim == 1
self[:, :, index[i]] *= src[:, :, i]  # if dim == 2

Note

This operation may behave nondeterministically when given tensors on a CUDA device. See the PyTorch reproducibility notes (notes/randomness) for more information.

Note

This function only supports floating point tensors.

Warning

This function is in beta and may change in the near future.

Parameters:

dim (int) – dimension along which to index
index (Tensor) – indices of source to select from, should have dtype either torch.int64 or torch.int32
source (FloatTensor) – the tensor containing values to accumulate
reduce (str) – the reduction operation to apply ("prod", "mean", "amax", "amin")

Keyword Arguments:

include_self (bool) – whether the elements from the self tensor are included in the reduction

Example:

>>> x = torch.empty(5, 3).fill_(2)
>>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=torch.float)
>>> index = torch.tensor([0, 4, 2, 0])
>>> x.index_reduce_(0, index, t, 'prod')
tensor([[20., 44., 72.],
        [ 2.,  2.,  2.],
        [14., 16., 18.],
        [ 2.,  2.,  2.],
        [ 8., 10., 12.]])
>>> x = torch.empty(5, 3).fill_(2)
>>> x.index_reduce_(0, index, t, 'prod', include_self=False)
tensor([[10., 22., 36.],
        [ 2.,  2.,  2.],
        [ 7.,  8.,  9.],
        [ 2.,  2.,  2.],
        [ 4.,  5.,  6.]])

index_select(dim, index) → Tensor#: See torch.index_select()

indices() → Tensor#

Return the indices tensor of a sparse COO tensor.

Warning

Throws an error if self is not a sparse COO tensor.
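
For readers unfamiliar with the COO layout, the following pure-Python sketch shows what an indices structure looks like for a small 2-D matrix: one row per dimension and one column per stored value. to_coo_sketch is a hypothetical helper that works on nested lists, not on tensors, and is only meant to illustrate the layout.

```python
def to_coo_sketch(dense):
    """Return (indices, values) for the non-zero entries of a 2-D nested list.

    `indices` has one row per dimension and one column per stored value,
    which is the layout used by the COO (coordinate) format.
    """
    rows, cols, values = [], [], []
    for r, row in enumerate(dense):
        for c, v in enumerate(row):
            if v != 0:
                rows.append(r)
                cols.append(c)
                values.append(v)
    return [rows, cols], values


# The matrix has two non-zero entries, at (0, 1) and (1, 0).
indices, values = to_coo_sketch([[0, 3], [4, 0]])
```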

Whole slide image reader#

BaseWSIReader#

class monai.data.BaseWSIReader(level=None, mpp=None, mpp_rtol=0.05, mpp_atol=0.0, power=None, power_rtol=0.05, power_atol=0.0, channel_dim=0, dtype=<class 'numpy.uint8'>, device=None, mode='RGB', **kwargs)[source]#

An abstract class that defines APIs to load patches from whole slide image files.

Parameters:

level (UnionType[int, None]) – the whole slide image level at which the patches are extracted.
mpp (UnionType[float, tuple[float, float], None]) – the resolution in micron per pixel at which the patches are extracted.
mpp_rtol (float) – the acceptable relative tolerance for resolution in micron per pixel.
mpp_atol (float) – the acceptable absolute tolerance for resolution in micron per pixel.
power (UnionType[int, None]) – the objective power at which the patches are extracted.
power_rtol (float) – the acceptable relative tolerance for objective power.
power_atol (float) – the acceptable absolute tolerance for objective power.
channel_dim (int) – the desired dimension for color channel.
dtype (Union[dtype, type, str, None, dtype]) – the data type of output image.
device (UnionType[device, str, None]) – target device to put the extracted patch. Note that if device is “cuda”, the output will be converted to a torch tensor and sent to the GPU even if the dtype is numpy.
mode (str) – the output image color mode, e.g., “RGB” or “RGBA”.
kwargs – additional args for the reader
Notes – Only one of resolution parameters, level, mpp, or power, should be provided. If such parameters are provided in get_data method, those will override the values provided here. If none of them are provided here or in get_data, level=0 will be used.

Typical usage of a concrete implementation of this class is:

image_reader = MyWSIReader()
wsi = image_reader.read(filepath, **kwargs)
img_data, meta_data = image_reader.get_data(wsi)

The read call converts an image filename into a whole slide image object.
The get_data call fetches the image data as well as the metadata.

The following methods need to be implemented by any concrete implementation of this class:

read – reads a whole slide image object from a given file.
get_size – returns the size of the whole slide image of a given wsi object at a given level.
get_level_count – returns the number of levels in the whole slide image.
_get_patch – extracts and returns a patch image from the whole slide image.
_get_metadata – extracts and returns metadata for a whole slide image and a specific patch.

get_data(wsi, location=(0, 0), size=None, level=None, mpp=None, power=None, mode=None)[source]#

Verifies inputs, extracts patches from WSI image and generates metadata.

Parameters:

wsi – a whole slide image object loaded from a file or a list of such objects.
location (tuple[int, int]) – (top, left) tuple giving the top left pixel in the level 0 reference frame. Defaults to (0, 0).
size (UnionType[tuple[int, int], None]) – (height, width) tuple giving the patch size at the given level (level). If not provided or None, it is set to the full image size at the given level.
level (UnionType[int, None]) – the whole slide image level at which the patches are extracted.
mpp (UnionType[float, tuple[float, float], None]) – the resolution in micron per pixel at which the patches are extracted.
power (UnionType[int, None]) – the objective power at which the patches are extracted.
dtype – the data type of output image.
mode (UnionType[str, None]) – the output image mode, ‘RGB’ or ‘RGBA’.

Return type:

tuple[ndarray, dict]

Returns:

a tuple, where the first element is an image patch [CxHxW] or a stack of patches, and the second element is a dictionary of metadata.

Notes

Only one of resolution parameters, level, mpp, or power, should be provided. If none of them are provided, it uses the defaults that are set during class instantiation. If none of them are set here or during class instantiation, level=0 will be used.
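
The precedence described in the note above can be sketched as a small function. This is a minimal illustration, not the library's code: resolve_resolution and its dict-based interface are hypothetical, and raising ValueError when more than one parameter is given is an assumption about how "only one should be provided" is enforced.

```python
def resolve_resolution(init_params, call_params):
    """Pick the resolution parameter to use, following the documented precedence.

    Both arguments are dicts with optional keys "level", "mpp" and "power".
    Hypothetical helper; the name and the dict interface are assumptions.
    """
    keys = ("level", "mpp", "power")
    # Values passed to get_data take precedence over constructor values.
    for source in (call_params, init_params):
        given = {k: source[k] for k in keys if source.get(k) is not None}
        if len(given) > 1:
            raise ValueError(f"only one of {keys} should be provided, got {sorted(given)}")
        if given:
            return given
    return {"level": 0}  # documented fallback when nothing is set anywhere


from_call = resolve_resolution({"mpp": 0.5}, {"level": 2})
from_init = resolve_resolution({"mpp": 0.5}, {})
fallback = resolve_resolution({}, {})
```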

abstractmethod get_downsample_ratio(wsi, level)[source]#

Returns the down-sampling ratio of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the downsample ratio is calculated.

Return type:

float

abstractmethod get_file_path(wsi)[source]#

Return the file path for the WSI object

Return type:: str

abstractmethod get_level_count(wsi)[source]#

Returns the number of levels in the whole slide image.

Parameters:: wsi – a whole slide image object loaded from a file.
Return type:: int

abstractmethod get_mpp(wsi, level)[source]#

Returns the micron-per-pixel resolution of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the mpp is calculated.

Return type:

tuple[float, float]

abstractmethod get_power(wsi, level)[source]#

Returns the objective power of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the objective power is calculated.

Return type:

float
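
The down-sampling ratio, the mpp and the objective power of a level are related: down-sampling by a factor r makes each pixel cover r times more microns and divides the effective magnification by r. The sketch below illustrates this, assuming the ratio is defined as the level-0 size divided by the size at the given level. The helper names are hypothetical, and a concrete reader may instead read these values directly from the file metadata.

```python
def mpp_at_level(mpp_level0, downsample_ratio):
    """Scale a micron-per-pixel pair from level 0 to a down-sampled level."""
    return (mpp_level0[0] * downsample_ratio, mpp_level0[1] * downsample_ratio)


def power_at_level(power_level0, downsample_ratio):
    """Scale the objective power from level 0 to a down-sampled level."""
    return power_level0 / downsample_ratio


# A slide scanned at 40x with 0.25 micron per pixel, viewed at a level
# that is down-sampled by a factor of 4 relative to level 0.
mpp = mpp_at_level((0.25, 0.25), 4.0)
power = power_at_level(40.0, 4.0)
```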

abstractmethod get_size(wsi, level)[source]#

Returns the size (height, width) of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the size is calculated.

Return type:

tuple[int, int]

get_valid_level(wsi, level, mpp, power)[source]#

Returns the level associated with the resolution parameters in the whole slide image.

Parameters:

wsi – a whole slide image object loaded from a file.
level (UnionType[int, None]) – the level number.
mpp (UnionType[float, tuple[float, float], None]) – the micron-per-pixel resolution.
power (UnionType[int, None]) – the objective power.

Return type:

int
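
One plausible reading of how a requested mpp is mapped to a level, using the relative and absolute tolerances from the constructor, is sketched below. This is an assumption-laden illustration, not the library's implementation: find_level_for_mpp is a hypothetical helper, it handles a scalar mpp only, and the tolerance test (difference no larger than atol + rtol * target) is borrowed from the usual "is close" convention.

```python
def find_level_for_mpp(target_mpp, mpp_per_level, rtol=0.05, atol=0.0):
    """Return the index of the level whose mpp is closest to target_mpp.

    Raises ValueError when even the closest level lies outside the tolerance
    atol + rtol * |target_mpp|. Hypothetical helper; scalar mpp for brevity.
    """
    closest = min(
        range(len(mpp_per_level)),
        key=lambda i: abs(mpp_per_level[i] - target_mpp),
    )
    if abs(mpp_per_level[closest] - target_mpp) > atol + rtol * abs(target_mpp):
        raise ValueError(f"no level within tolerance of mpp={target_mpp}")
    return closest


levels = [0.25, 0.5, 1.0, 2.0]  # mpp at levels 0..3 of an example pyramid
exact = find_level_for_mpp(0.5, levels)
near = find_level_for_mpp(0.51, levels)  # within 5% of level 1
```

With the default tolerances, a request for mpp=0.7 would raise, because the closest level (0.5) is off by more than 5%.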

verify_suffix(filename)[source]#

Verify whether the format of the specified file or files is supported by the WSI reader.

The list of supported suffixes is read from self.supported_suffixes.

Parameters:: filename (Union[Sequence[Union[str, PathLike]], str, PathLike]) – filename or a list of filenames to read.
Return type:: bool
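
A suffix check of this kind can be sketched with the standard library alone. has_supported_suffix is a hypothetical stand-in for verify_suffix; requiring every file to match, ignoring case, and comparing suffixes without the leading dot are all assumptions made for the illustration.

```python
import os
from pathlib import Path


def has_supported_suffix(filename, supported_suffixes):
    """Return True if every given file name ends with a supported suffix.

    Accepts a single name or a sequence of names. Suffixes are compared
    without the leading dot and case-insensitively (both assumptions).
    """
    names = [filename] if isinstance(filename, (str, os.PathLike)) else list(filename)
    return all(Path(n).suffix.lower().lstrip(".") in supported_suffixes for n in names)


ok = has_supported_suffix("slide.tiff", ["tif", "tiff", "svs"])
mixed = has_supported_suffix(["a.svs", "b.png"], ["tif", "tiff", "svs"])
```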

WSIReader#

class monai.data.WSIReader(backend='cucim', level=None, mpp=None, mpp_rtol=0.05, mpp_atol=0.0, power=None, power_rtol=0.05, power_atol=0.0, channel_dim=0, dtype=<class 'numpy.uint8'>, device=None, mode='RGB', **kwargs)[source]#

Read whole slide images and extract patches using different backend libraries

Parameters:

backend – the name of backend whole slide image reader library, the default is cuCIM.
level (UnionType[int, None]) – the whole slide image level at which the patches are extracted.
mpp (UnionType[float, tuple[float, float], None]) – the resolution in micron per pixel at which the patches are extracted.
mpp_rtol (float) – the acceptable relative tolerance for resolution in micron per pixel.
mpp_atol (float) – the acceptable absolute tolerance for resolution in micron per pixel.
power (UnionType[int, None]) – the objective power at which the patches are extracted.
power_rtol (float) – the acceptable relative tolerance for objective power.
power_atol (float) – the acceptable absolute tolerance for objective power.
channel_dim (int) – the desired dimension for color channel. Default to 0 (channel first).
dtype (Union[dtype, type, str, None, dtype]) – the data type of output image. Defaults to np.uint8.
device (UnionType[device, str, None]) – target device to put the extracted patch. Note that if device is “cuda”, the output will be converted to a torch tensor and sent to the GPU even if the dtype is numpy.
mode (str) – the output image color mode, “RGB” or “RGBA”. Defaults to “RGB”.
num_workers – number of workers for multi-thread image loading (cucim backend only).
kwargs – additional arguments to be passed to the backend library
Notes – Only one of resolution parameters, level, mpp, or power, should be provided. If such parameters are provided in get_data method, those will override the values provided here. If none of them are provided here or in get_data, level=0 will be used.

get_downsample_ratio(wsi, level)[source]#

Returns the down-sampling ratio of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the downsample ratio is calculated.

Return type:

float

get_file_path(wsi)[source]#

Return the file path for the WSI object

Return type:: str

get_level_count(wsi)[source]#

Returns the number of levels in the whole slide image.

Parameters:: wsi – a whole slide image object loaded from a file.
Return type:: int

get_mpp(wsi, level)[source]#

Returns the micron-per-pixel resolution of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the mpp is calculated.

Return type:

tuple[float, float]

get_power(wsi, level)[source]#

Returns the objective power of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the objective power is calculated.

Return type:

float

get_size(wsi, level)[source]#

Returns the size (height, width) of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the size is calculated.

Return type:

tuple[int, int]

read(data, **kwargs)[source]#

Read whole slide image objects from given file or list of files.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike, ndarray]) – file name or a list of file names to read.
kwargs – additional args for the reader module (overrides self.kwargs for existing keys).

Returns:

whole slide image object or list of such objects.

CuCIMWSIReader#

class monai.data.CuCIMWSIReader(num_workers=0, **kwargs)[source]#

Read whole slide images and extract patches using cuCIM library.

Parameters:

level – the whole slide image level at which the patches are extracted.
mpp – the resolution in micron per pixel at which the patches are extracted.
mpp_rtol – the acceptable relative tolerance for resolution in micron per pixel.
mpp_atol – the acceptable absolute tolerance for resolution in micron per pixel.
power – the objective power at which the patches are extracted.
power_rtol – the acceptable relative tolerance for objective power.
power_atol – the acceptable absolute tolerance for objective power.
channel_dim – the desired dimension for color channel. Default to 0 (channel first).
dtype – the data type of output image. Defaults to np.uint8.
device – target device to put the extracted patch. Note that if device is “cuda”, the output will be converted to a torch tensor and sent to the GPU even if the dtype is numpy.
mode – the output image color mode, “RGB” or “RGBA”. Defaults to “RGB”.
num_workers (int) – number of workers for multi-thread image loading.
kwargs – additional args for the cucim.CuImage module (see rapidsai/cucim).
Notes – Only one of resolution parameters, level, mpp, or power, should be provided. If such parameters are provided in get_data method, those will override the values provided here. If none of them are provided here or in get_data, level=0 will be used.

get_downsample_ratio(wsi, level)[source]#

Returns the down-sampling ratio of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the downsample ratio is calculated.

Return type:

float

static get_file_path(wsi)[source]#

Return the file path for the WSI object

Return type:: str

static get_level_count(wsi)[source]#

Returns the number of levels in the whole slide image.

Parameters:: wsi – a whole slide image object loaded from a file.
Return type:: int

get_mpp(wsi, level)[source]#

Returns the micron-per-pixel resolution of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the mpp is calculated.

Return type:

tuple[float, float]

get_power(wsi, level)[source]#

Returns the objective power of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the objective power is calculated.

Return type:

float

get_size(wsi, level)[source]#

Returns the size (height, width) of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the size is calculated.

Return type:

tuple[int, int]

read(data, **kwargs)[source]#

Read whole slide image objects from given file or list of files.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike, ndarray]) – file name or a list of file names to read.
kwargs – additional args that override self.kwargs for existing keys. For more details, see rapidsai/cucim.

Returns:

whole slide image object or list of such objects.

OpenSlideWSIReader#

class monai.data.OpenSlideWSIReader(**kwargs)[source]#

Read whole slide images and extract patches using OpenSlide library.

Parameters:

level – the whole slide image level at which the patches are extracted.
mpp – the resolution in micron per pixel at which the patches are extracted.
mpp_rtol – the acceptable relative tolerance for resolution in micron per pixel.
mpp_atol – the acceptable absolute tolerance for resolution in micron per pixel.
power – the objective power at which the patches are extracted.
power_rtol – the acceptable relative tolerance for objective power.
power_atol – the acceptable absolute tolerance for objective power.
channel_dim – the desired dimension for color channel. Default to 0 (channel first).
dtype – the data type of output image. Defaults to np.uint8.
device – target device to put the extracted patch. Note that if device is “cuda”, the output will be converted to a torch tensor and sent to the GPU even if the dtype is numpy.
mode – the output image color mode, “RGB” or “RGBA”. Defaults to “RGB”.
kwargs – additional args for openslide.OpenSlide module.
Notes – Only one of resolution parameters, level, mpp, or power, should be provided. If such parameters are provided in get_data method, those will override the values provided here. If none of them are provided here or in get_data, level=0 will be used.

get_downsample_ratio(wsi, level)[source]#

Returns the down-sampling ratio of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the downsample ratio is calculated.

Return type:

float

static get_file_path(wsi)[source]#

Return the file path for the WSI object

Return type:: str

static get_level_count(wsi)[source]#

Returns the number of levels in the whole slide image.

Parameters:: wsi – a whole slide image object loaded from a file.
Return type:: int

get_mpp(wsi, level)[source]#

Returns the micron-per-pixel resolution of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the mpp is calculated.

Return type:

tuple[float, float]

get_power(wsi, level)[source]#

Returns the objective power of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the objective power is calculated.

Return type:

float

get_size(wsi, level)[source]#

Returns the size (height, width) of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the size is calculated.

Return type:

tuple[int, int]

read(data, **kwargs)[source]#

Read whole slide image objects from given file or list of files.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike, ndarray]) – file name or a list of file names to read.
kwargs – additional args that override self.kwargs for existing keys.

Returns:

whole slide image object or list of such objects.

TiffFileWSIReader#

class monai.data.TiffFileWSIReader(**kwargs)[source]#

Read whole slide images and extract patches using TiffFile library.

Parameters:

level – the whole slide image level at which the patches are extracted.
mpp – the resolution in micron per pixel at which the patches are extracted.
mpp_rtol – the acceptable relative tolerance for resolution in micron per pixel.
mpp_atol – the acceptable absolute tolerance for resolution in micron per pixel.
channel_dim – the desired dimension for color channel. Default to 0 (channel first).
dtype – the data type of output image. Defaults to np.uint8.
device – target device to put the extracted patch. Note that if device is “cuda”, the output will be converted to a torch tensor and sent to the GPU even if the dtype is numpy.
mode – the output image color mode, “RGB” or “RGBA”. Defaults to “RGB”.
kwargs – additional args for tifffile.TiffFile module.
Notes –
- Objective power cannot be obtained via TiffFile backend.
- Only one of resolution parameters, level or mpp, should be provided.
  If such parameters are provided in get_data method, those will override the values provided here. If none of them are provided here or in get_data, level=0 will be used.

get_downsample_ratio(wsi, level)[source]#

Returns the down-sampling ratio of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the downsample ratio is calculated.

Return type:

float

static get_file_path(wsi)[source]#

Return the file path for the WSI object

Return type:: str

static get_level_count(wsi)[source]#

Returns the number of levels in the whole slide image.

Parameters:: wsi – a whole slide image object loaded from a file.
Return type:: int

get_mpp(wsi, level)[source]#

Returns the micron-per-pixel resolution of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the mpp is calculated.

Return type:

tuple[float, float]

get_power(wsi, level)[source]#

Returns the objective power of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the objective power is calculated.

Return type:

float

get_size(wsi, level)[source]#

Returns the size (height, width) of the whole slide image at a given level.

Parameters:

wsi – a whole slide image object loaded from a file.
level (int) – the level number where the size is calculated.

Return type:

tuple[int, int]

read(data, **kwargs)[source]#

Read whole slide image objects from given file or list of files.

Parameters:

data (Union[Sequence[Union[str, PathLike]], str, PathLike, ndarray]) – file name or a list of file names to read.
kwargs – additional args that override self.kwargs for existing keys.

Returns:

whole slide image object or list of such objects.

Whole slide image datasets#

PatchWSIDataset#

class monai.data.PatchWSIDataset(data, patch_size=None, patch_level=None, transform=None, include_label=True, center_location=True, additional_meta_keys=None, reader='cuCIM', **kwargs)[source]#

This dataset extracts patches from whole slide images (without loading the whole image). It also reads labels for each patch and provides each patch with its associated class labels.

Parameters:

data (Sequence) – the list of input samples including image, location, and label (see the note below for more details).
patch_size (UnionType[int, tuple[int, int], None]) – the size of patch to be extracted from the whole slide image.
patch_level (UnionType[int, None]) – the level at which the patches are to be extracted (defaults to 0).
transform (UnionType[Callable, None]) – transforms to be executed on input data.
include_label (bool) – whether to load and include labels in the output
center_location (bool) – whether the input location information is the position of the center of the patch
additional_meta_keys (UnionType[Sequence[str], None]) – the list of keys for items to be copied to the output metadata from the input data
reader –
the module to be used for loading whole slide imaging. If reader is
- a string, it defines the backend of monai.data.WSIReader. Defaults to cuCIM.
- a class (inherited from BaseWSIReader), it is initialized and set as wsi_reader.
- an instance of a class inherited from BaseWSIReader, it is set as the wsi_reader.
kwargs – additional arguments to pass to WSIReader or provided whole slide reader class

Returns:

a dictionary of loaded image (in MetaTensor format) along with the labels (if requested). {“image”: MetaTensor, “label”: torch.Tensor}

Return type:

dict

Note

The input data has the following form as an example:

[
    {"image": "path/to/image1.tiff", "location": [200, 500], "label": 0},
    {"image": "path/to/image2.tiff", "location": [100, 700], "patch_size": [20, 20], "patch_level": 2, "label": 1}
]
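
When center_location is True, the "location" entry of each sample is the center of the patch rather than its top-left corner. A minimal sketch of the conversion follows; top_left_from_center is a hypothetical helper, and the use of integer division to halve the patch size is an assumption about rounding.

```python
def top_left_from_center(center, patch_size):
    """Convert a patch center to the top-left corner of the patch.

    Both arguments are pairs in the same pixel coordinate frame.
    Hypothetical helper; half the patch size is computed with floor division.
    """
    return tuple(c - s // 2 for c, s in zip(center, patch_size))


# The first sample above, read with a 20 x 20 patch centered on [200, 500].
corner = top_left_from_center((200, 500), (20, 20))
```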

MaskedPatchWSIDataset#

class monai.data.MaskedPatchWSIDataset(data, patch_size=None, patch_level=None, mask_level=7, transform=None, include_label=False, center_location=False, additional_meta_keys=(mask_location, name), reader='cuCIM', **kwargs)[source]#

This dataset extracts patches from whole slide images at the locations where foreground mask at a given level is non-zero.

Parameters:

data (Sequence) – the list of input samples including image, location, and label (see the note below for more details).
patch_size (UnionType[int, tuple[int, int], None]) – the size of patch to be extracted from the whole slide image.
patch_level (UnionType[int, None]) – the level at which the patches are to be extracted (defaults to 0).
mask_level (int) – the resolution level at which the mask is created.
transform (UnionType[Callable, None]) – transforms to be executed on input data.
include_label (bool) – whether to load and include labels in the output
center_location (bool) – whether the input location information is the position of the center of the patch
additional_meta_keys (Sequence[str]) – the list of keys for items to be copied to the output metadata from the input data
reader –
the module to be used for loading whole slide imaging. Defaults to cuCIM. If reader is
- a string, it defines the backend of monai.data.WSIReader.
- a class (inherited from BaseWSIReader), it is initialized and set as wsi_reader,
- an instance of a class inherited from BaseWSIReader, it is set as the wsi_reader.
kwargs – additional arguments to pass to WSIReader or provided whole slide reader class

Note

The input data has the following form as an example:

[
    {"image": "path/to/image1.tiff"},
    {"image": "path/to/image2.tiff", "size": [20, 20], "level": 2}
]
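
The idea of sampling patches only where a low-resolution foreground mask is non-zero can be sketched as follows. This is a simplified illustration rather than the dataset's actual procedure: locations_from_mask is a hypothetical helper, the mask is a nested list, and the sketch maps each mask pixel to the top-left corner of its footprint at level 0 without any centering of the patch on that footprint.

```python
def locations_from_mask(mask, downsample_ratio):
    """Map each non-zero mask pixel (row, col) to a level-0 (top, left) location.

    `downsample_ratio` is the ratio between level 0 and the mask level.
    Hypothetical helper; no patch centering is applied.
    """
    return [
        (int(r * downsample_ratio), int(c * downsample_ratio))
        for r, row in enumerate(mask)
        for c, v in enumerate(row)
        if v
    ]


# A 2 x 2 mask created at a level that is 128 times smaller than level 0
# (for a pyramid that halves at each level, this corresponds to level 7).
foreground = locations_from_mask([[0, 1], [1, 0]], 128)
```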

SlidingPatchWSIDataset#

class monai.data.SlidingPatchWSIDataset(data, patch_size=None, patch_level=None, mask_level=0, overlap=0.0, offset=(0, 0), offset_limits=None, transform=None, include_label=False, center_location=False, additional_meta_keys=(mask_location, mask_size, num_patches), reader='cuCIM', seed=0, **kwargs)[source]#

This dataset extracts patches in sliding-window manner from whole slide images (without loading the whole image). It also reads labels for each patch and provides each patch with its associated class labels.

Parameters:

data (Sequence) – the list of input samples including image, location, and label (see the note below for more details).
patch_size (UnionType[int, tuple[int, int], None]) – the size of patch to be extracted from the whole slide image.
patch_level (UnionType[int, None]) – the level at which the patches are to be extracted (defaults to 0).
mask_level (int) – the resolution level at which the mask/map is created (for ProbMapProducer for instance).
overlap (UnionType[tuple[float, float], float]) – the amount of overlap of neighboring patches in each dimension (a value between 0.0 and 1.0). If only one float number is given, it will be applied to all dimensions. Defaults to 0.0.
offset (UnionType[tuple[int, int], int, str]) – the offset of image to extract patches (the starting position of the upper left patch).
offset_limits (UnionType[tuple[tuple[int, int], tuple[int, int]], tuple[int, int], None]) – if offset is set to “random”, a tuple of integers defining the lower and upper limit of the random offset for all dimensions, or a tuple of tuples that defines the limits for each dimension.
transform (UnionType[Callable, None]) – transforms to be executed on input data.
include_label (bool) – whether to load and include labels in the output
center_location (bool) – whether the input location information is the position of the center of the patch
additional_meta_keys (Sequence[str]) – the list of keys for items to be copied to the output metadata from the input data
reader –
the module to be used for loading whole slide imaging. Defaults to cuCIM. If reader is
- a string, it defines the backend of monai.data.WSIReader.
- a class (inherited from BaseWSIReader), it is initialized and set as wsi_reader,
- an instance of a class inherited from BaseWSIReader, it is set as the wsi_reader.
seed (int) – random seed to randomly generate offsets. Defaults to 0.
kwargs – additional arguments to pass to WSIReader or provided whole slide reader class

Note

The input data has the following form as an example:

[
    {"image": "path/to/image1.tiff"},
    {"image": "path/to/image2.tiff", "patch_size": [20, 20], "patch_level": 2}
]

Unlike MaskedPatchWSIDataset, this dataset does not filter any patches.
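
To make the interaction of patch_size, overlap, and offset concrete, the following pure-Python sketch enumerates upper-left patch positions on a 2D grid. It does not call MONAI: the helper name sliding_patch_locations and the stride rule round(patch_size * (1 - overlap)) are assumptions made for illustration, and the dataset's actual handling of image borders may differ.

```python
from itertools import product


def sliding_patch_locations(image_size, patch_size, overlap=0.0, offset=(0, 0)):
    """Enumerate upper-left corners of patches that fit fully inside the image.

    Hypothetical helper: the stride rule below is an assumption, not MONAI's code.
    """
    if isinstance(overlap, (int, float)):
        # a single number applies to all dimensions
        overlap = (float(overlap),) * len(image_size)
    axes = []
    for size, patch, ov, start in zip(image_size, patch_size, overlap, offset):
        if not 0.0 <= ov < 1.0:
            raise ValueError("overlap must be in [0.0, 1.0)")
        stride = max(1, round(patch * (1.0 - ov)))
        # keep only positions where the whole patch stays inside the image
        axes.append(range(start, size - patch + 1, stride))
    return list(product(*axes))


locations = sliding_patch_locations((8, 8), (4, 4), overlap=0.5)
print(len(locations))  # 9 patches: positions 0, 2, 4 along each axis
```

With overlap=0.0 the same 8x8 image yields four non-overlapping 4x4 patches; a non-zero offset shifts the first patch and can reduce the number of patches that fit.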

Bounding box#

This utility module mainly supports rectangular bounding boxes with a few different parameterizations and methods for converting between them. It provides reliable access to the spatial coordinates of the box vertices in the “canonical ordering”: [xmin, ymin, xmax, ymax] for 2D and [xmin, ymin, zmin, xmax, ymax, zmax] for 3D. We currently define this ordering as monai.data.box_utils.StandardMode, and the rest of the detection pipeline mainly assumes boxes are in StandardMode.

class monai.data.box_utils.BoxMode[source]#

An abstract class of a BoxMode.

A BoxMode is a callable that converts the box mode of boxes, which are Nx4 (2D) or Nx6 (3D) torch tensors or ndarrays. BoxMode has several subclasses that represent different box modes, including

CornerCornerModeTypeA: represents [xmin, ymin, xmax, ymax] for 2D and [xmin, ymin, zmin, xmax, ymax, zmax] for 3D
CornerCornerModeTypeB: represents [xmin, xmax, ymin, ymax] for 2D and [xmin, xmax, ymin, ymax, zmin, zmax] for 3D
CornerCornerModeTypeC: represents [xmin, ymin, xmax, ymax] for 2D and [xmin, ymin, xmax, ymax, zmin, zmax] for 3D
CornerSizeMode: represents [xmin, ymin, xsize, ysize] for 2D and [xmin, ymin, zmin, xsize, ysize, zsize] for 3D
CenterSizeMode: represents [xcenter, ycenter, xsize, ysize] for 2D and [xcenter, ycenter, zcenter, xsize, ysize, zsize] for 3D

We currently define StandardMode = CornerCornerModeTypeA, and monai detection pipelines mainly assume boxes are in StandardMode.
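
The relationship between the modes listed above can be sketched for a single 2D box in plain Python. This is an illustration only and does not use MONAI: the function names to_corners_2d, from_corners_2d, and convert_2d are hypothetical, and the mode strings are used here simply as labels for the formats described above.

```python
def to_corners_2d(box, mode):
    """Return (xmin, ymin, xmax, ymax) for one 2D box given in `mode`."""
    a, b, c, d = box
    if mode == "xyxy":  # [xmin, ymin, xmax, ymax]
        return (a, b, c, d)
    if mode == "xxyy":  # [xmin, xmax, ymin, ymax]
        return (a, c, b, d)
    if mode == "xywh":  # [xmin, ymin, xsize, ysize]
        return (a, b, a + c, b + d)
    if mode == "ccwh":  # [xcenter, ycenter, xsize, ysize]
        return (a - c / 2, b - d / 2, a + c / 2, b + d / 2)
    raise ValueError(f"unsupported mode: {mode}")


def from_corners_2d(corners, mode):
    """Express corners (xmin, ymin, xmax, ymax) in the requested mode."""
    xmin, ymin, xmax, ymax = corners
    if mode == "xyxy":
        return (xmin, ymin, xmax, ymax)
    if mode == "xxyy":
        return (xmin, xmax, ymin, ymax)
    if mode == "xywh":
        return (xmin, ymin, xmax - xmin, ymax - ymin)
    if mode == "ccwh":
        return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)
    raise ValueError(f"unsupported mode: {mode}")


def convert_2d(box, src_mode, dst_mode):
    """Convert via corners; new tuples are built, so the input is not modified."""
    return from_corners_2d(to_corners_2d(box, src_mode), dst_mode)


box = (10, 20, 30, 60)  # given as [xmin, ymin, xmax, ymax]
print(convert_2d(box, "xyxy", "ccwh"))  # (20.0, 40.0, 20, 40)
```

Routing every conversion through the corner representation means each mode only needs two functions (to corners and from corners) rather than one function per pair of modes.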

An implementation of a subclass should:

define the class variable name, a dictionary that maps spatial_dims to BoxModeName.
ensure that boxes_to_corners() and corners_to_boxes() do not modify inputs in place.

abstractmethod boxes_to_corners(boxes)[source]#

Convert the bounding boxes of the current mode to corners.

Parameters:: boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor
Returns:: corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Return type:: tuple

Example

boxes = torch.ones(10,6)
boxmode = StandardMode() # BoxMode is abstract, so a concrete subclass is instantiated here
boxmode.boxes_to_corners(boxes) # will return a 6-element tuple, each element is a 10x1 tensor

abstractmethod corners_to_boxes(corners)[source]#

Convert the given box corners to the bounding boxes of the current mode.

Parameters:: corners (Sequence) – corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Returns:: bounding boxes, Nx4 or Nx6 torch tensor
Return type:: Tensor

Example

corners = (torch.ones(10,1), torch.ones(10,1), torch.ones(10,1), torch.ones(10,1))
boxmode = StandardMode() # BoxMode is abstract, so a concrete subclass is instantiated here
boxmode.corners_to_boxes(corners) # will return a 10x4 tensor

classmethod get_name(spatial_dims)[source]#

Get the mode name for the given spatial dimension using class variable name.

Parameters:: spatial_dims (int) – number of spatial dimensions of the bounding boxes.
Returns:: mode string name
Return type:: str

class monai.data.box_utils.CenterSizeMode[source]#

A subclass of BoxMode.

Also represented as “ccwh” or “cccwhd”, with format of [xcenter, ycenter, xsize, ysize] or [xcenter, ycenter, zcenter, xsize, ysize, zsize].

Example

CenterSizeMode.get_name(spatial_dims=2) # will return "ccwh"
CenterSizeMode.get_name(spatial_dims=3) # will return "cccwhd"

boxes_to_corners(boxes)[source]#

Convert the bounding boxes of the current mode to corners.

Parameters:: boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor
Returns:: corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Return type:: tuple

Example

boxes = torch.ones(10,6)
boxmode = CenterSizeMode()
boxmode.boxes_to_corners(boxes) # will return a 6-element tuple, each element is a 10x1 tensor

corners_to_boxes(corners)[source]#

Convert the given box corners to the bounding boxes of the current mode.

Parameters:: corners (Sequence) – corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Returns:: bounding boxes, Nx4 or Nx6 torch tensor
Return type:: Tensor

Example

corners = (torch.ones(10,1), torch.ones(10,1), torch.ones(10,1), torch.ones(10,1))
boxmode = CenterSizeMode()
boxmode.corners_to_boxes(corners) # will return a 10x4 tensor

class monai.data.box_utils.CornerCornerModeTypeA[source]#

A subclass of BoxMode.

Also represented as “xyxy” or “xyzxyz”, with format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Example

CornerCornerModeTypeA.get_name(spatial_dims=2) # will return "xyxy"
CornerCornerModeTypeA.get_name(spatial_dims=3) # will return "xyzxyz"

boxes_to_corners(boxes)[source]#

Convert the bounding boxes of the current mode to corners.

Parameters:: boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor
Returns:: corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Return type:: tuple

Example

boxes = torch.ones(10,6)
boxmode = CornerCornerModeTypeA()
boxmode.boxes_to_corners(boxes) # will return a 6-element tuple, each element is a 10x1 tensor

corners_to_boxes(corners)[source]#

Convert the given box corners to the bounding boxes of the current mode.

Parameters:: corners (Sequence) – corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Returns:: bounding boxes, Nx4 or Nx6 torch tensor
Return type:: Tensor

Example

corners = (torch.ones(10,1), torch.ones(10,1), torch.ones(10,1), torch.ones(10,1))
boxmode = CornerCornerModeTypeA()
boxmode.corners_to_boxes(corners) # will return a 10x4 tensor

class monai.data.box_utils.CornerCornerModeTypeB[source]#

A subclass of BoxMode.

Also represented as “xxyy” or “xxyyzz”, with format of [xmin, xmax, ymin, ymax] or [xmin, xmax, ymin, ymax, zmin, zmax].

Example

CornerCornerModeTypeB.get_name(spatial_dims=2) # will return "xxyy"
CornerCornerModeTypeB.get_name(spatial_dims=3) # will return "xxyyzz"

boxes_to_corners(boxes)[source]#

Convert the bounding boxes of the current mode to corners.

Parameters:: boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor
Returns:: corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Return type:: tuple

Example

boxes = torch.ones(10,6)
boxmode = CornerCornerModeTypeB()
boxmode.boxes_to_corners(boxes) # will return a 6-element tuple, each element is a 10x1 tensor

corners_to_boxes(corners)[source]#

Convert the given box corners to the bounding boxes of the current mode.

Parameters:: corners (Sequence) – corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Returns:: bounding boxes, Nx4 or Nx6 torch tensor
Return type:: Tensor

Example

corners = (torch.ones(10,1), torch.ones(10,1), torch.ones(10,1), torch.ones(10,1))
boxmode = CornerCornerModeTypeB()
boxmode.corners_to_boxes(corners) # will return a 10x4 tensor

class monai.data.box_utils.CornerCornerModeTypeC[source]#

A subclass of BoxMode.

Also represented as “xyxy” or “xyxyzz”, with format of [xmin, ymin, xmax, ymax] or [xmin, ymin, xmax, ymax, zmin, zmax].

Example

CornerCornerModeTypeC.get_name(spatial_dims=2) # will return "xyxy"
CornerCornerModeTypeC.get_name(spatial_dims=3) # will return "xyxyzz"

boxes_to_corners(boxes)[source]#

Convert the bounding boxes of the current mode to corners.

Parameters:: boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor
Returns:: corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Return type:: tuple

Example

boxes = torch.ones(10,6)
boxmode = CornerCornerModeTypeC()
boxmode.boxes_to_corners(boxes) # will return a 6-element tuple, each element is a 10x1 tensor

corners_to_boxes(corners)[source]#

Convert the given box corners to the bounding boxes of the current mode.

Parameters:: corners (Sequence) – corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Returns:: bounding boxes, Nx4 or Nx6 torch tensor
Return type:: Tensor

Example

corners = (torch.ones(10,1), torch.ones(10,1), torch.ones(10,1), torch.ones(10,1))
boxmode = CornerCornerModeTypeC()
boxmode.corners_to_boxes(corners) # will return a 10x4 tensor

class monai.data.box_utils.CornerSizeMode[source]#

A subclass of BoxMode.

Also represented as “xywh” or “xyzwhd”, with format of [xmin, ymin, xsize, ysize] or [xmin, ymin, zmin, xsize, ysize, zsize].

Example

CornerSizeMode.get_name(spatial_dims=2) # will return "xywh"
CornerSizeMode.get_name(spatial_dims=3) # will return "xyzwhd"

boxes_to_corners(boxes)[source]#

Convert the bounding boxes of the current mode to corners.

Parameters:: boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor
Returns:: corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Return type:: tuple

Example

boxes = torch.ones(10,6)
boxmode = CornerSizeMode()
boxmode.boxes_to_corners(boxes) # will return a 6-element tuple, each element is a 10x1 tensor

corners_to_boxes(corners)[source]#

Convert the given box corners to the bounding boxes of the current mode.

Parameters:: corners (Sequence) – corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor. It represents (xmin, ymin, xmax, ymax) or (xmin, ymin, zmin, xmax, ymax, zmax)
Returns:: bounding boxes, Nx4 or Nx6 torch tensor
Return type:: Tensor

Example

corners = (torch.ones(10,1), torch.ones(10,1), torch.ones(10,1), torch.ones(10,1))
boxmode = CornerSizeMode()
boxmode.corners_to_boxes(corners) # will return a 10x4 tensor

monai.data.box_utils.StandardMode[source]#: alias of CornerCornerModeTypeA

monai.data.box_utils.batched_nms(boxes, scores, labels, nms_thresh, max_proposals=-1, box_overlap_metric=<function box_iou>)[source]#

Performs non-maximum suppression in a batched fashion. Each value in labels corresponds to a category, and NMS is not applied between elements of different categories.

Adapted from MIC-DKFZ/nnDetection

Parameters:

boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
scores (Union[ndarray, Tensor]) – prediction scores of the boxes, sized (N,). This function keeps boxes with higher scores.
labels (Union[ndarray, Tensor]) – indices of the categories for each one of the boxes, sized (N,). Value range is (0, num_classes).
nms_thresh (float) – threshold of NMS. Discards all overlapping boxes with box_overlap > nms_thresh.
max_proposals (int) – maximum number of boxes it keeps. If max_proposals = -1, there is no limit on the number of boxes that are kept.
box_overlap_metric (Callable) – the metric to compute overlap between boxes.

Return type:

Union[ndarray, Tensor]

Returns:

Indexes of boxes that are kept after NMS.
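
The per-category behavior described above can be sketched in plain Python for 2D boxes. This is an illustration under stated assumptions, not MONAI's implementation: the names iou_2d and batched_nms_sketch are hypothetical, boxes are assumed to be in [xmin, ymin, xmax, ymax] order, and max_proposals is omitted for brevity.

```python
def iou_2d(a, b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def batched_nms_sketch(boxes, scores, labels, nms_thresh):
    """Greedy NMS in which a box can only be suppressed by a box of the same label."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        suppressed = any(
            labels[i] == labels[j] and iou_2d(boxes[i], boxes[j]) > nms_thresh
            for j in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (1, 1, 11, 11)]
scores = [0.9, 0.8, 0.7]
labels = [0, 0, 1]
keep = batched_nms_sketch(boxes, scores, labels, nms_thresh=0.5)
print(keep)  # [0, 2]: box 1 is suppressed by box 0, box 2 has a different label
```

Boxes 1 and 2 have identical coordinates, yet only box 1 is discarded, because box 2 belongs to a different category than the higher-scoring box 0.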

monai.data.box_utils.box_area(boxes)[source]#

This function computes the area (2D) or volume (3D) of each box. Half precision is not recommended for this function as it may cause overflow, especially for 3D images.

Parameters:: boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
Return type:: Union[ndarray, Tensor]
Returns:: area (2D) or volume (3D) of boxes, with size of (N,).

Example

boxes = torch.ones(10,6)
# we do computation with torch.float32 to avoid overflow
compute_dtype = torch.float32
area = box_area(boxes=boxes.to(dtype=compute_dtype))  # torch.float32, size of (10,)

monai.data.box_utils.box_centers(boxes)[source]#

Compute center points of boxes

Parameters:: boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
Return type:: Union[ndarray, Tensor]
Returns:: center points with size of (N, spatial_dims)

monai.data.box_utils.box_giou(boxes1, boxes2)[source]#

Compute the generalized intersection over union (GIoU) of two sets of boxes. The two inputs can have different shapes and the function returns an NxM matrix (in contrast to box_pair_giou(), which requires the inputs to have the same shape and returns N values).

Parameters:

boxes1 (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
boxes2 (Union[ndarray, Tensor]) – bounding boxes, Mx4 or Mx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

Return type:

Union[ndarray, Tensor]

Returns:

GIoU, with size of (N,M) and same data type as boxes1

Reference:: https://giou.stanford.edu/GIoU.pdf
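
For reference, GIoU extends IoU by penalizing the empty space in the smallest box C that encloses both inputs: GIoU = IoU - (|C| - |A ∪ B|) / |C|. The plain-Python sketch below evaluates this for 2D boxes with positive area in [xmin, ymin, xmax, ymax] order. The names giou_2d and giou_matrix are hypothetical and the code is not MONAI's implementation.

```python
def giou_2d(a, b):
    """GIoU of two non-degenerate boxes given as [xmin, ymin, xmax, ymax]."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # smallest axis-aligned box enclosing both inputs
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def giou_matrix(boxes1, boxes2):
    """N x M matrix of GIoU values, one row per box in boxes1."""
    return [[giou_2d(a, b) for b in boxes2] for a in boxes1]


boxes1 = [(0, 0, 2, 2)]
boxes2 = [(0, 0, 2, 2), (1, 0, 3, 2), (4, 0, 6, 2)]
matrix = giou_matrix(boxes1, boxes2)
print(matrix)  # identical boxes give 1.0; the disjoint pair gives a negative value
```

Unlike IoU, which is 0 for every pair of disjoint boxes, GIoU becomes more negative as disjoint boxes move further apart, which is why it is useful as an overlap measure when boxes do not intersect.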

monai.data.box_utils.box_iou(boxes1, boxes2)[source]#

Compute the intersection over union (IoU) of two sets of boxes.

Parameters:

boxes1 (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
boxes2 (Union[ndarray, Tensor]) – bounding boxes, Mx4 or Mx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

Return type:

Union[ndarray, Tensor]

Returns:

IoU, with size of (N,M) and same data type as boxes1
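
The (N,M) output shape can be illustrated with a dimension-agnostic sketch in plain Python. It assumes boxes in StandardMode ordering (all minimum coordinates first, then all maximum coordinates); the names box_volume and iou_matrix are hypothetical and the code is not MONAI's implementation.

```python
def box_volume(box):
    """Area (2D) or volume (3D) of a box in [mins..., maxs...] order."""
    dims = len(box) // 2
    volume = 1.0
    for axis in range(dims):
        volume *= max(0.0, box[axis + dims] - box[axis])
    return volume


def iou_matrix(boxes1, boxes2):
    """N x M matrix of IoU values between every box in boxes1 and boxes2."""
    result = []
    for a in boxes1:
        dims = len(a) // 2
        row = []
        for b in boxes2:
            inter = 1.0
            for axis in range(dims):
                # overlap length along this axis, zero if the boxes are disjoint
                inter *= max(0.0, min(a[axis + dims], b[axis + dims]) - max(a[axis], b[axis]))
            union = box_volume(a) + box_volume(b) - inter
            row.append(inter / union if union > 0 else 0.0)
        result.append(row)
    return result


boxes1 = [(0, 0, 0, 2, 2, 2)]
boxes2 = [(0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2), (5, 5, 5, 6, 6, 6)]
iou = iou_matrix(boxes1, boxes2)
print(len(iou), len(iou[0]))  # 1 3
```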

monai.data.box_utils.box_pair_giou(boxes1, boxes2)[source]#

Compute the generalized intersection over union (GIoU) of pairs of boxes. The two inputs should have the same shape and the function returns an (N,) array (in contrast to box_giou(), which does not require the inputs to have the same shape and returns an NxM matrix).

Parameters:

boxes1 (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
boxes2 (Union[ndarray, Tensor]) – bounding boxes, same shape as boxes1. The box mode is assumed to be StandardMode

Return type:

Union[ndarray, Tensor]

Returns:

paired GIoU, with size of (N,) and same data type as boxes1

Reference:: https://giou.stanford.edu/GIoU.pdf

monai.data.box_utils.boxes_center_distance(boxes1, boxes2, euclidean=True)[source]#

Distance of center points between two sets of boxes

Parameters:

boxes1 (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
boxes2 (Union[ndarray, Tensor]) – bounding boxes, Mx4 or Mx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
euclidean (bool) – if True, computes the Euclidean distance; otherwise, uses the L1 distance.

Return type:

tuple[Union[ndarray, Tensor], Union[ndarray, Tensor], Union[ndarray, Tensor]]

Returns:

The pairwise distances for every element in boxes1 and boxes2, with size of (N,M) and same data type as boxes1.
Center points of boxes1, with size of (N,spatial_dims) and same data type as boxes1.
Center points of boxes2, with size of (M,spatial_dims) and same data type as boxes1.

Reference:: MIC-DKFZ/nnDetection
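
The three return values can be sketched in plain Python as follows. The names centers_of and center_distance_sketch are hypothetical, boxes are assumed to be in StandardMode ordering, and the code is an illustration rather than MONAI's implementation.

```python
import math


def centers_of(boxes):
    """Center point of each box given in [mins..., maxs...] order."""
    dims = len(boxes[0]) // 2
    return [tuple((b[axis] + b[axis + dims]) / 2 for axis in range(dims)) for b in boxes]


def center_distance_sketch(boxes1, boxes2, euclidean=True):
    """Return (N x M distances, centers of boxes1, centers of boxes2)."""
    c1, c2 = centers_of(boxes1), centers_of(boxes2)
    if euclidean:
        dist = [[math.dist(p, q) for q in c2] for p in c1]
    else:
        # L1 (Manhattan) distance between center points
        dist = [[sum(abs(x - y) for x, y in zip(p, q)) for q in c2] for p in c1]
    return dist, c1, c2


boxes1 = [(0, 0, 2, 2)]
boxes2 = [(3, 4, 5, 6)]
dist, c1, c2 = center_distance_sketch(boxes1, boxes2)
print(dist)  # [[5.0]]: centers (1, 1) and (4, 5) are 5 apart
l1_dist, _, _ = center_distance_sketch(boxes1, boxes2, euclidean=False)
print(l1_dist)  # [[7.0]]
```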

monai.data.box_utils.centers_in_boxes(centers, boxes, eps=0.01)[source]#

Checks which center points are within boxes

Parameters:

boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
centers (Union[ndarray, Tensor]) – center points, Nx2 or Nx3 torch tensor or ndarray.
eps (float) – minimum distance to border of boxes.

Return type:

Union[ndarray, Tensor]

Returns:

boolean array indicating which center points are within the boxes, sized (N,).

Reference:: MIC-DKFZ/nnDetection
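
The role of eps can be illustrated with a plain-Python sketch in which the i-th center is tested against the i-th box. The name centers_in_boxes_sketch is hypothetical, and the strict comparison against eps is an assumption made for this illustration; MONAI's exact boundary handling may differ.

```python
def centers_in_boxes_sketch(centers, boxes, eps=0.01):
    """For each (center, box) pair, check that the center is more than eps inside the box."""
    result = []
    for center, box in zip(centers, boxes):
        dims = len(center)
        # smallest distance from the center to any face of the box (negative if outside)
        margin = min(
            min(center[axis] - box[axis], box[axis + dims] - center[axis])
            for axis in range(dims)
        )
        result.append(margin > eps)
    return result


box = (0, 0, 10, 10)
centers = [(5, 5), (0.005, 5), (12, 5)]
inside = centers_in_boxes_sketch(centers, [box] * 3)
print(inside)  # [True, False, False]
```

The second point lies inside the box geometrically but closer than eps to its border, so it is rejected.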

monai.data.box_utils.clip_boxes_to_image(boxes, spatial_size, remove_empty=True)[source]#

This function clips the boxes to make sure the bounding boxes are within the image.

Parameters:

boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
spatial_size (Union[Sequence[int], ndarray, Tensor]) – The spatial size of the image where the boxes are attached. len(spatial_size) should be in [2, 3].
remove_empty (bool) – whether to remove the boxes that are actually empty

Return type:

tuple[Union[ndarray, Tensor], Union[ndarray, Tensor]]

Returns:

clipped boxes, boxes[keep], does not share memory with original boxes
keep, which indicates whether each box in boxes is kept when remove_empty=True.

monai.data.box_utils.convert_box_mode(boxes, src_mode=None, dst_mode=None)[source]#

This function converts the boxes in src_mode to the dst_mode.

Parameters:

boxes (Union[ndarray, Tensor]) – source bounding boxes, Nx4 or Nx6 torch tensor or ndarray.
src_mode (UnionType[str, BoxMode, type[BoxMode], None]) – source box mode. If it is not given, this function will assume it is StandardMode(). It follows the same format as mode in get_boxmode().
dst_mode (UnionType[str, BoxMode, type[BoxMode], None]) – target box mode. If it is not given, this function will assume it is StandardMode(). It follows the same format as mode in get_boxmode().

Return type:

Union[ndarray, Tensor]

Returns:

bounding boxes with target mode, with same data type as boxes, does not share memory with boxes

Example

boxes = torch.ones(10,4)
# The following three lines are equivalent
# They convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
convert_box_mode(boxes=boxes, src_mode="xyxy", dst_mode="ccwh")
convert_box_mode(boxes=boxes, src_mode="xyxy", dst_mode=monai.data.box_utils.CenterSizeMode)
convert_box_mode(boxes=boxes, src_mode="xyxy", dst_mode=monai.data.box_utils.CenterSizeMode())

monai.data.box_utils.convert_box_to_standard_mode(boxes, mode=None)[source]#

Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Parameters:

boxes (Union[ndarray, Tensor]) – source bounding boxes, Nx4 or Nx6 torch tensor or ndarray.
mode (UnionType[str, BoxMode, type[BoxMode], None]) – source box mode. If it is not given, this function will assume it is StandardMode(). It follows the same format as mode in get_boxmode().

Return type:

Union[ndarray, Tensor]

Returns:

bounding boxes with standard mode, with same data type as boxes, does not share memory with boxes

Example

boxes = torch.ones(10,6)
# The following two lines are equivalent
# They convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
convert_box_to_standard_mode(boxes=boxes, mode="xxyyzz")
convert_box_mode(boxes=boxes, src_mode="xxyyzz", dst_mode="xyzxyz")

monai.data.box_utils.get_boxmode(mode=None, *args, **kwargs)[source]#

This function returns a BoxMode object given a representation of the box mode.

Parameters:: mode (UnionType[str, BoxMode, type[BoxMode], None]) – a representation of box mode. If it is not given, this func will assume it is StandardMode().

Note

StandardMode = CornerCornerModeTypeA, also represented as “xyxy” for 2D and “xyzxyz” for 3D.

mode can be:

str: choose from BoxModeName, for example,
- “xyxy”: boxes have format [xmin, ymin, xmax, ymax]
- “xyzxyz”: boxes have format [xmin, ymin, zmin, xmax, ymax, zmax]
- “xxyy”: boxes have format [xmin, xmax, ymin, ymax]
- “xxyyzz”: boxes have format [xmin, xmax, ymin, ymax, zmin, zmax]
- “xyxyzz”: boxes have format [xmin, ymin, xmax, ymax, zmin, zmax]
- “xywh”: boxes have format [xmin, ymin, xsize, ysize]
- “xyzwhd”: boxes have format [xmin, ymin, zmin, xsize, ysize, zsize]
- “ccwh”: boxes have format [xcenter, ycenter, xsize, ysize]
- “cccwhd”: boxes have format [xcenter, ycenter, zcenter, xsize, ysize, zsize]
BoxMode class: choose from the subclasses of BoxMode, for example,
- CornerCornerModeTypeA: equivalent to “xyxy” or “xyzxyz”
- CornerCornerModeTypeB: equivalent to “xxyy” or “xxyyzz”
- CornerCornerModeTypeC: equivalent to “xyxy” or “xyxyzz”
- CornerSizeMode: equivalent to “xywh” or “xyzwhd”
- CenterSizeMode: equivalent to “ccwh” or “cccwhd”
BoxMode object: choose from the subclasses of BoxMode, for example,
- CornerCornerModeTypeA(): equivalent to “xyxy” or “xyzxyz”
- CornerCornerModeTypeB(): equivalent to “xxyy” or “xxyyzz”
- CornerCornerModeTypeC(): equivalent to “xyxy” or “xyxyzz”
- CornerSizeMode(): equivalent to “xywh” or “xyzwhd”
- CenterSizeMode(): equivalent to “ccwh” or “cccwhd”
None: will assume mode is StandardMode()

Return type:: BoxMode
Returns:: BoxMode object

Example

mode = "xyzxyz"
get_boxmode(mode) # will return CornerCornerModeTypeA()
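
The four accepted forms of mode (string, class, instance, None) suggest a simple dispatch, sketched below in plain Python. Everything here is hypothetical: ModeA and ModeSize are stand-in classes rather than MONAI's BoxMode subclasses, and resolve_mode only illustrates the resolution pattern, not how get_boxmode is actually implemented.

```python
class ModeA:
    """Stand-in for a corner-corner mode; `name` maps spatial_dims to a mode string."""
    name = {2: "xyxy", 3: "xyzxyz"}


class ModeSize:
    """Stand-in for a corner-size mode."""
    name = {2: "xywh", 3: "xyzwhd"}


SUPPORTED = [ModeA, ModeSize]  # hypothetical registry of known modes
Standard = ModeA               # hypothetical default, playing the role of StandardMode


def resolve_mode(mode=None):
    """Turn a string, class, instance, or None into a mode instance."""
    if mode is None:
        return Standard()
    if isinstance(mode, str):
        for cls in SUPPORTED:
            if mode in cls.name.values():
                return cls()
        raise ValueError(f"unknown mode string: {mode}")
    if isinstance(mode, type):
        return mode()  # a class was passed, so instantiate it
    return mode        # already an instance


print(type(resolve_mode("xyzwhd")).__name__)  # ModeSize
print(type(resolve_mode()).__name__)          # ModeA
```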

monai.data.box_utils.get_spatial_dims(boxes=None, points=None, corners=None, spatial_size=None)[source]#

Get the spatial dimension for the given inputs and check their validity. Missing inputs are allowed, but at least one of the input values should be given. It raises ValueError if the dimensions of multiple inputs do not match each other.

Parameters:

boxes (UnionType[Tensor, ndarray, None]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray
points (UnionType[Tensor, ndarray, None]) – point coordinates, [x, y] or [x, y, z], Nx2 or Nx3 torch tensor or ndarray
corners (UnionType[Sequence, None]) – corners of boxes, 4-element or 6-element tuple, each element is a Nx1 torch tensor or ndarray
spatial_size (UnionType[Sequence[int], Tensor, ndarray, None]) – The spatial size of the image where the boxes are attached. len(spatial_size) should be in [2, 3].

Returns:

spatial_dims, number of spatial dimensions of the bounding boxes.

Return type:

int

Example

boxes = torch.ones(10,6)
get_spatial_dims(boxes, spatial_size=[100,200,200]) # will return 3
get_spatial_dims(boxes, spatial_size=[100,200]) # will raise ValueError
get_spatial_dims(boxes) # will return 3

monai.data.box_utils.is_valid_box_values(boxes)[source]#

This function checks whether the box size is non-negative.

Parameters:: boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
Return type:: bool
Returns:: whether boxes is valid
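
A non-negative box size means that, along every axis, the maximum coordinate is not smaller than the minimum coordinate. The plain-Python sketch below expresses that check for boxes in StandardMode ordering; the name is_valid_box_values_sketch is hypothetical and the code is not MONAI's implementation.

```python
def is_valid_box_values_sketch(boxes):
    """Return True if every box has max >= min along every axis."""
    for box in boxes:
        dims = len(box) // 2
        if any(box[axis + dims] < box[axis] for axis in range(dims)):
            return False
    return True


print(is_valid_box_values_sketch([(0, 0, 2, 2), (1, 1, 1, 5)]))  # True: zero size is allowed
print(is_valid_box_values_sketch([(0, 0, 2, 2), (3, 3, 1, 5)]))  # False: xmax < xmin
```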

monai.data.box_utils.non_max_suppression(boxes, scores, nms_thresh, max_proposals=-1, box_overlap_metric=<function box_iou>)[source]#

Non-maximum suppression (NMS).

Parameters:

boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
scores (Union[ndarray, Tensor]) – prediction scores of the boxes, sized (N,). This function keeps boxes with higher scores.
nms_thresh (float) – threshold of NMS. Discards all overlapping boxes with box_overlap > nms_thresh.
max_proposals (int) – maximum number of boxes it keeps. If max_proposals = -1, there is no limit on the number of boxes that are kept.
box_overlap_metric (Callable) – the metric to compute overlap between boxes.

Return type:

Union[ndarray, Tensor]

Returns:

Indexes of boxes that are kept after NMS.

Example

boxes = torch.ones(10,6)
scores = torch.ones(10)
keep = non_max_suppression(boxes, scores, nms_thresh=0.1)
boxes_after_nms = boxes[keep]
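
The greedy procedure behind NMS can be sketched in plain Python: repeatedly keep the highest-scoring remaining box and discard the remaining boxes whose overlap with it exceeds nms_thresh. The names iou_2d and nms_sketch are hypothetical, boxes are assumed to be 2D in [xmin, ymin, xmax, ymax] order, and IoU stands in for box_overlap_metric; this is an illustration, not MONAI's implementation.

```python
def iou_2d(a, b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, scores, nms_thresh, max_proposals=-1):
    """Return indices of kept boxes, ordered by decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still under consideration
        keep.append(best)
        if max_proposals >= 1 and len(keep) >= max_proposals:
            break
        # discard boxes that overlap the kept box by more than the threshold
        order = [i for i in order if iou_2d(boxes[best], boxes[i]) <= nms_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.6, 0.9, 0.5]
keep = nms_sketch(boxes, scores, nms_thresh=0.5)
print(keep)  # [1, 2]: box 0 overlaps the higher-scoring box 1 and is discarded
```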

monai.data.box_utils.spatial_crop_boxes(boxes, roi_start, roi_end, remove_empty=True)[source]#

This function generates the new boxes when the corresponding image is cropped to the given ROI. When remove_empty=True, it makes sure the bounding boxes are within the new cropped image.

Parameters:

boxes (~NdarrayTensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
roi_start (Union[Sequence[int], ndarray, Tensor]) – voxel coordinates for start of the crop ROI, negative values allowed.
roi_end (Union[Sequence[int], ndarray, Tensor]) – voxel coordinates for end of the crop ROI, negative values allowed.
remove_empty (bool) – whether to remove the boxes that are actually empty

Return type:

tuple[~NdarrayTensor, Union[ndarray, Tensor]]

Returns:

cropped boxes, boxes[keep], does not share memory with original boxes
keep, which indicates whether each box in boxes is kept when remove_empty=True.
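
One way to picture the cropping is to clamp each box to the ROI and then shift it so that coordinates are relative to the ROI start. The plain-Python sketch below does this for boxes in StandardMode ordering. The name spatial_crop_boxes_sketch is hypothetical, and two details are assumptions of this illustration rather than statements about MONAI: the output is expressed in the cropped image's coordinate frame, and a box counts as empty when any side has non-positive length.

```python
def spatial_crop_boxes_sketch(boxes, roi_start, roi_end, remove_empty=True):
    """Clamp boxes to [roi_start, roi_end] and shift them into ROI coordinates."""
    dims = len(roi_start)
    cropped, keep = [], []
    for box in boxes:
        new_box = []  # a fresh list, so the input boxes are not modified
        for offset in (0, dims):  # first the min corner, then the max corner
            for axis in range(dims):
                value = min(max(box[axis + offset], roi_start[axis]), roi_end[axis])
                new_box.append(value - roi_start[axis])
        non_empty = all(new_box[axis + dims] > new_box[axis] for axis in range(dims))
        keep.append(non_empty or not remove_empty)
        cropped.append(new_box)
    return [b for b, k in zip(cropped, keep) if k], keep


boxes = [(2, 2, 2, 8, 8, 8), (20, 20, 20, 30, 30, 30)]
cropped, keep = spatial_crop_boxes_sketch(boxes, roi_start=(4, 4, 4), roi_end=(10, 10, 10))
print(cropped)  # [[0, 0, 0, 4, 4, 4]]
print(keep)     # [True, False]: the second box lies outside the ROI
```

In this sketch, setting roi_start to all zeros and roi_end to the image size reduces the operation to clipping boxes to the image.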

monai.data.box_utils.standardize_empty_box(boxes, spatial_dims)[source]#

When boxes are empty, this function standardizes them to a shape of (0,4) or (0,6).

Parameters:

boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 or empty torch tensor or ndarray
spatial_dims (int) – number of spatial dimensions of the bounding boxes.

Return type:

Union[ndarray, Tensor]

Returns:

bounding boxes with shape (N,4) or (N,6), N can be 0.

Example

boxes = torch.ones(0,)
standardize_empty_box(boxes, 3)

Video datasets#

VideoDataset#

class monai.data.video_dataset.VideoDataset(video_source, transform=None, max_num_frames=None, color_order=RGB, multiprocessing=False, channel_dim=0)[source]#

VideoFileDataset#

class monai.data.video_dataset.VideoFileDataset(*args, **kwargs)[source]#

Video dataset from file.

This class requires that OpenCV be installed.

CameraDataset#

class monai.data.video_dataset.CameraDataset(video_source, transform=None, max_num_frames=None, color_order=RGB, multiprocessing=False, channel_dim=0)[source]#

Video dataset from a capture device (e.g., webcam).

This class requires that OpenCV be installed.

Parameters:

video_source (UnionType[str, int]) – index of capture device. get_num_devices can be used to determine possible devices.
transform (UnionType[Callable, None]) – transform to be applied to each frame.
max_num_frames (UnionType[int, None]) – Max number of frames to iterate across. If None is passed, then the dataset will iterate infinitely.

Raises:

RuntimeError – OpenCV not installed.
