ngff_zarr.compute_omero

Compute OMERO metadata from NgffImage data.

Module Contents

Functions

_validate_quantiles

Validate that quantiles are in valid range and properly ordered.

_validate_color

Validate that a color is a valid 6-digit hexadecimal string.

get_default_colors

Get default colors for channels.

_compute_channel_statistics

Compute min, max, and quantiles for a single channel.

_compute_quantiles_approximate

Compute quantiles using Dask’s approximate percentile algorithm.

_compute_quantiles_dense

Compute quantiles using histogram-based dense sampling.

compute_omero_from_ngff_image

Compute OMERO metadata from an NgffImage.

_select_image_for_omero_stats

Select the best image level for OMERO statistics computation.

compute_omero_from_multiscales

Compute OMERO metadata from a NgffMultiscales object.

Data

API

ngff_zarr.compute_omero.GLASBEY_COLORS: list[str]

['30A2DA', 'FC4F30', 'E5AE38', '6D904F', '8B8B8B', '17BECF', '9467BD', 'D62728', '1F77B4', 'E377C2',…

ngff_zarr.compute_omero._OMERO_STATS_LARGE_IMAGE_THRESHOLD

None

ngff_zarr.compute_omero._OMERO_STATS_MIN_PIXELS

None

ngff_zarr.compute_omero._validate_quantiles(quantiles: tuple[float, float]) -> None

Validate that quantiles are in valid range and properly ordered.

Args: quantiles: Tuple of (low, high) quantile values

Raises: ValueError: If quantiles are invalid
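The checks described above could look roughly like the following. This is a minimal sketch, not the library's code: `validate_quantiles_sketch` is a hypothetical name, and treating the bounds 0 and 1 as inclusive is an assumption.

```python
def validate_quantiles_sketch(quantiles: tuple[float, float]) -> None:
    """Hypothetical stand-in for the private validator (not the library's code)."""
    low, high = quantiles
    # Each value must lie in [0, 1]; inclusive bounds are an assumption.
    for name, value in (("low", low), ("high", high)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} quantile {value!r} is outside [0, 1]")
    # The pair must be strictly ordered so the display window is non-empty.
    if not low < high:
        raise ValueError(f"low quantile {low!r} must be less than high quantile {high!r}")


validate_quantiles_sketch((0.02, 0.98))  # passes silently
```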

ngff_zarr.compute_omero._validate_color(color: str) -> None

Validate that a color is a valid 6-digit hexadecimal string.

Args: color: Hex color string (without # prefix)

Raises: ValueError: If color is invalid
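A 6-digit hexadecimal check of this kind is commonly written with a regular expression. The sketch below is illustrative only: `validate_color_sketch` is a hypothetical name, and accepting lowercase digits is an assumption.

```python
import re

# Exactly six hexadecimal digits, no "#" prefix.
_HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def validate_color_sketch(color: str) -> None:
    """Hypothetical stand-in for the private validator (not the library's code)."""
    if not isinstance(color, str) or _HEX6.fullmatch(color) is None:
        raise ValueError(f"expected a 6-digit hex color without '#', got {color!r}")


validate_color_sketch("FF0000")  # red, passes silently
```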

ngff_zarr.compute_omero.get_default_colors(n_channels: int) -> list[str]

Get default colors for channels.

For a single channel, returns white (FFFFFF). For multiple channels, uses the Glasbey color progression.

Args: n_channels: Number of channels

Returns: List of hex color strings (without # prefix)
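The selection rule can be sketched as follows. This is an illustration under stated assumptions: `default_colors_sketch` and `GLASBEY_PREFIX` are hypothetical names, the list holds only the ten entries of GLASBEY_COLORS visible in this reference (the full list is truncated here), and cycling when more channels than colors are requested is an assumption.

```python
# Only the ten GLASBEY_COLORS entries visible in this reference.
GLASBEY_PREFIX = [
    "30A2DA", "FC4F30", "E5AE38", "6D904F", "8B8B8B",
    "17BECF", "9467BD", "D62728", "1F77B4", "E377C2",
]


def default_colors_sketch(n_channels: int) -> list[str]:
    """Hypothetical re-implementation of the documented rule."""
    if n_channels == 1:
        # A single channel is rendered in white.
        return ["FFFFFF"]
    # Assumption: wrap around if there are more channels than listed colors.
    return [GLASBEY_PREFIX[i % len(GLASBEY_PREFIX)] for i in range(n_channels)]


print(default_colors_sketch(3))
```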

ngff_zarr.compute_omero._compute_channel_statistics(
    data: dask.array.Array,
    quantiles: tuple[float, float],
    dense: bool = False,
) -> tuple[float, float, float, float]

Compute min, max, and quantiles for a single channel.

Uses dask.array operations to efficiently compute statistics without loading all data into memory.

Args:
    data: Dask array for a single channel (can be multi-dimensional).
    quantiles: Tuple of (low, high) quantile values (e.g., (0.02, 0.98)).
    dense: If True, use histogram-based dense sampling, which computes quantiles over the full data and is exact to within bin resolution. If False (default), use Dask’s approximate percentile algorithm, which is faster but may produce less accurate quantiles for large multi-chunk datasets.

Returns: Tuple of (min, max, q_low, q_high) as floats

Note: When dense=False, the quantile computation uses Dask’s approximate percentile algorithm which processes chunks independently and merges results. This is memory-efficient and suitable for visualization window parameters, though results may differ slightly from exact quantiles.

When dense=True, a histogram with fine bins is computed over the full data via da.histogram, then quantiles are derived from the cumulative distribution. This gives exact results within bin resolution and is well-suited for large datasets where approximate percentiles may be inaccurate.

ngff_zarr.compute_omero._compute_quantiles_approximate(
    flat_data: dask.array.Array,
    quantiles: tuple[float, float],
    min_val: float,
    max_val: float,
) -> tuple[float, float, float, float]

Compute quantiles using Dask’s approximate percentile algorithm.

This is fast and memory-efficient but may produce less accurate results for large datasets with many chunks, since it computes per-chunk percentiles and merges them.

Args:
    flat_data: Flattened 1-D dask array.
    quantiles: Tuple of (low, high) quantile values.
    min_val: Pre-computed minimum value.
    max_val: Pre-computed maximum value.

Returns: Tuple of (min, max, q_low, q_high) as floats
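To see why combining per-chunk percentiles can go wrong, the NumPy sketch below compares an exact 2% quantile with a deliberately naive merge (the mean of per-chunk quantiles). The naive mean is used purely for illustration and is not Dask's actual merge procedure; the synthetic data, chunk sizes, and variable names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8-bit data: nine uniform background chunks and one "signal" chunk.
background = [np.full(10_000, 255, dtype=np.uint8) for _ in range(9)]
signal = rng.integers(20, 200, size=10_000, dtype=np.uint8)
chunks = background + [signal]

# Exact 2% quantile computed over all pixels at once.
exact_low = float(np.percentile(np.concatenate(chunks), 2))

# Naive merge: average the per-chunk 2% quantiles (illustration only).
naive_low = float(np.mean([np.percentile(chunk, 2) for chunk in chunks]))

# exact_low falls inside the signal range, while naive_low is pulled
# toward the background value because 9 of 10 chunks are uniform.
print(exact_low, naive_low)
```

Any merge that weights chunk summaries rather than individual pixels faces some version of this problem when most chunks are uniform, which is the situation the dense option is meant to address.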

ngff_zarr.compute_omero._compute_quantiles_dense(
    flat_data: dask.array.Array,
    quantiles: tuple[float, float],
    min_val: float,
    max_val: float,
    dtype: numpy.dtype,
) -> tuple[float, float, float, float]

Compute quantiles using histogram-based dense sampling.

Builds a fine-grained histogram over the full data using da.histogram, then derives exact quantiles from the cumulative distribution function. This processes all data in a parallelized pass through dask’s lazy evaluation and gives exact results within bin resolution.

For integer dtypes, the number of bins equals the number of distinct possible values (capped at 65536). For float dtypes, 65536 bins are used.

Args:
    flat_data: Flattened 1-D dask array.
    quantiles: Tuple of (low, high) quantile values.
    min_val: Pre-computed minimum value.
    max_val: Pre-computed maximum value.
    dtype: Data type of the array (used to choose bin count).

Returns: Tuple of (min, max, q_low, q_high) as floats
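The histogram-and-CDF approach can be sketched with NumPy as below. This is an illustration, not the library's code: `histogram_quantiles_sketch` is a hypothetical name, `np.histogram` stands in for `da.histogram` (which takes the same `bins` and `range` arguments), and reporting the bin center as the quantile is an assumption.

```python
import numpy as np


def histogram_quantiles_sketch(values, quantiles, n_bins=65536):
    """Estimate (min, max, q_low, q_high) from a fine histogram's CDF."""
    values = np.asarray(values).ravel()
    min_val, max_val = float(values.min()), float(values.max())
    if min_val == max_val:
        # Constant data: every statistic collapses to the same value.
        return min_val, max_val, min_val, max_val
    counts, edges = np.histogram(values, bins=n_bins, range=(min_val, max_val))
    # Cumulative fraction of samples at or below each bin.
    cdf = np.cumsum(counts) / counts.sum()
    # First bin whose cumulative fraction reaches each requested quantile.
    idx = np.clip(np.searchsorted(cdf, quantiles, side="left"), 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    q_low, q_high = (float(centers[i]) for i in idx)
    return min_val, max_val, q_low, q_high


data = np.arange(1000, dtype=np.float64)
print(histogram_quantiles_sketch(data, (0.02, 0.98), n_bins=1000))
```

The error of each estimate is bounded by the bin width, which is what "exact within bin resolution" means in practice.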

ngff_zarr.compute_omero.compute_omero_from_ngff_image(
    ngff_image: ngff_zarr.ngff_image.NgffImage,
    quantiles: tuple[float, float] = (0.02, 0.98),
    colors: collections.abc.Sequence[str] | None = None,
    labels: collections.abc.Sequence[str] | None = None,
    dense: bool = False,
) -> ngff_zarr.v04.zarr_metadata.Omero

Compute OMERO metadata from an NgffImage.

This function computes visualization parameters (OMERO metadata) from image data:

  • min/max: The actual data range (exact values)

  • start/end: Display window based on quantiles (default 2% and 98%)

For multi-channel images (with ‘c’ dimension), statistics are computed separately for each channel, resulting in per-channel OMERO windows.

Edge cases:

  • If all values in a channel are NaN, the statistics will be NaN.

  • If a channel has constant values, min/max/start/end will all be the same.

Args:
    ngff_image: The NgffImage to compute metadata for.
    quantiles: Tuple of (low, high) quantile values for the display window. Must be between 0 and 1, with low < high. Default is (0.02, 0.98), i.e. the 2% and 98% quantiles.
    colors: Optional list of hex color strings (without #), one per channel. Each must be a 6-digit hexadecimal string (e.g., "FF0000" for red). If not provided, uses white for a single channel or the Glasbey progression for multiple channels.
    labels: Optional list of label strings, one per channel. If not provided, uses channel_names from the NgffImage if available. If channel_names is also unavailable or has fewer entries than channels, the remaining channels get empty strings. When explicitly provided, must contain at least as many labels as channels (ValueError is raised otherwise).
    dense: If True, use histogram-based dense sampling, which computes quantiles over the full data and is exact to within bin resolution. If False (default), use Dask’s approximate percentile algorithm, which is faster but may produce less accurate quantiles for large multi-chunk datasets.

Returns: Omero metadata with computed window parameters for each channel.

Raises: ValueError: If quantiles are invalid, a color is not a valid 6-digit hex string, or too few colors or labels are explicitly provided.

Note: The behavior differs between explicit labels and channel_names from NgffImage:

  • Explicit labels parameter: Must provide at least as many labels as channels, otherwise ValueError is raised. This ensures intentional labeling is complete.

  • channel_names from NgffImage: Can be shorter than the number of channels, in which case remaining channels get empty string labels. This allows partial metadata from sources like OME-XML where not all channels may have names.
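The two label paths described in this note can be sketched as a small helper. This is a hypothetical illustration (`resolve_labels_sketch` is not part of ngff_zarr), and truncating surplus entries to the channel count is an assumption.

```python
from collections.abc import Sequence
from typing import Optional


def resolve_labels_sketch(
    n_channels: int,
    labels: Optional[Sequence[str]] = None,
    channel_names: Optional[Sequence[str]] = None,
) -> list[str]:
    """Hypothetical label resolution following the documented rules."""
    if labels is not None:
        # Explicit labels must cover every channel.
        if len(labels) < n_channels:
            raise ValueError(
                f"got {len(labels)} labels for {n_channels} channels"
            )
        return list(labels[:n_channels])
    # channel_names may be missing or short; pad with empty strings.
    names = list(channel_names or [])[:n_channels]
    return names + [""] * (n_channels - len(names))


print(resolve_labels_sketch(3, channel_names=["DAPI"]))  # ['DAPI', '', '']
```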

Example:

>>> # Using channel_names from NgffImage (e.g., from OME-TIFF)
>>> image = to_ngff_image(data, dims=["c", "z", "y", "x"])
>>> image.channel_names = ["DAPI", "GFP", "RFP"]
>>> omero = compute_omero_from_ngff_image(image)
>>> omero.channels[0].label  # "DAPI"

>>> # Explicit labels override channel_names
>>> omero = compute_omero_from_ngff_image(
...     image, labels=["Red", "Green", "Blue"]
... )
>>> omero.channels[0].label  # "Red"

ngff_zarr.compute_omero._select_image_for_omero_stats(
    multiscales: NgffMultiscales,
) -> ngff_zarr.ngff_image.NgffImage

Select the best image level for OMERO statistics computation.

For large multi-level images (e.g. whole-slide images), using the full resolution level with Dask’s approximate percentile algorithm gives very inaccurate results because most chunks contain only uniform background pixels. The per-chunk percentile of a background chunk is near the maximum value, causing the merged 2% quantile to be vastly overestimated.

The lowest-resolution level contains the same pixel value distribution (just spatially downsampled), but has far fewer chunks, so the approximate percentile is much more accurate. As a heuristic, any level with more than 1 M spatial pixels in the x/y plane is considered “large” and we step down through the pyramid until we find a level that is small enough.

Args: multiscales: The NgffMultiscales object whose images to search.

Returns: The selected NgffImage level.
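One possible reading of this heuristic, combined with the 256x256 floor mentioned for compute_omero_from_multiscales, is sketched below. Everything here is an assumption for illustration: the function name, the two constants (this reference does not show the values of the private thresholds), and the representation of levels as (y, x) shapes ordered from highest to lowest resolution.

```python
# Assumed values taken from the prose, not from the library's constants.
LARGE_IMAGE_THRESHOLD = 1_000_000  # more x/y pixels than this counts as "large"
MIN_PIXELS = 256 * 256             # smallest level still considered representative


def select_level_sketch(level_shapes: list[tuple[int, int]]) -> int:
    """Return the index of the level to use for statistics.

    level_shapes holds (y, x) sizes ordered from highest to lowest resolution.
    """
    pixels = [y * x for y, x in level_shapes]
    if pixels[0] <= LARGE_IMAGE_THRESHOLD:
        # Small image: the full-resolution level is the most accurate.
        return 0
    # Large image: walk up from the coarsest level and take the first one
    # that still has enough pixels.
    for index in range(len(pixels) - 1, -1, -1):
        if pixels[index] >= MIN_PIXELS:
            return index
    return 0


pyramid = [(40000, 40000), (10000, 10000), (2500, 2500), (625, 625), (156, 156)]
print(select_level_sketch(pyramid))  # 3
```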

ngff_zarr.compute_omero.compute_omero_from_multiscales(
    multiscales: ngff_zarr.multiscales.NgffMultiscales,
    quantiles: tuple[float, float] = (0.02, 0.98),
    colors: collections.abc.Sequence[str] | None = None,
    labels: collections.abc.Sequence[str] | None = None,
    dense: bool = False,
) -> ngff_zarr.v04.zarr_metadata.Omero

Compute OMERO metadata from a NgffMultiscales object.

This is a convenience function that computes OMERO metadata from the multiscales pyramid.

For small images the highest-resolution level is used (most accurate). For large images (e.g. whole-slide images with > 1 M spatial pixels) the lowest-resolution level with at least 256x256 spatial pixels is used instead. Dask’s approximate percentile algorithm is much more accurate on a small, representative image than on a very large one where most chunks contain only background pixels.

Uses memory-efficient computation that processes data in chunks. This makes it safe to use with very large datasets without risk of memory exhaustion.

Args:
    multiscales: The NgffMultiscales object to compute metadata for.
    quantiles: Tuple of (low, high) quantile values for the display window.
    colors: Optional list of hex color strings, one per channel.
    labels: Optional list of label strings, one per channel.
    dense: If True, use histogram-based dense sampling, which computes quantiles over the full data and is exact to within bin resolution. If False (default), use Dask’s approximate percentile algorithm.

Returns: Omero metadata with computed window parameters for each channel.