ngff_zarr.compute_omero
Compute OMERO metadata from NgffImage data.
Module Contents
Functions
_validate_quantiles | Validate that quantiles are in valid range and properly ordered.
_validate_color | Validate that a color is a valid 6-digit hexadecimal string.
get_default_colors | Get default colors for channels.
_compute_channel_statistics | Compute min, max, and quantiles for a single channel.
_compute_quantiles_approximate | Compute quantiles using Dask’s approximate percentile algorithm.
_compute_quantiles_dense | Compute quantiles using histogram-based dense sampling.
compute_omero_from_ngff_image | Compute OMERO metadata from an NgffImage.
_select_image_for_omero_stats | Select the best image level for OMERO statistics computation.
compute_omero_from_multiscales | Compute OMERO metadata from an NgffMultiscales object.
Data
API
- ngff_zarr.compute_omero.GLASBEY_COLORS: list[str]
['30A2DA', 'FC4F30', 'E5AE38', '6D904F', '8B8B8B', '17BECF', '9467BD', 'D62728', '1F77B4', 'E377C2',…
- ngff_zarr.compute_omero._OMERO_STATS_LARGE_IMAGE_THRESHOLD
None
- ngff_zarr.compute_omero._OMERO_STATS_MIN_PIXELS
None
- ngff_zarr.compute_omero._validate_quantiles(quantiles: tuple[float, float]) -> None
Validate that quantiles are in valid range and properly ordered.
Args: quantiles: Tuple of (low, high) quantile values
Raises: ValueError: If quantiles are invalid
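The check described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the function name `validate_quantiles` is hypothetical, and treating 0 and 1 as allowed endpoints is an assumption based on the "between 0 and 1, with low < high" wording used elsewhere on this page.

```python
def validate_quantiles(quantiles: tuple[float, float]) -> None:
    """Hypothetical stand-in for _validate_quantiles (not the library's code)."""
    low, high = quantiles
    # Assumption: both values must lie in the closed interval [0, 1].
    if not (0.0 <= low <= 1.0 and 0.0 <= high <= 1.0):
        raise ValueError(f"Quantiles must be between 0 and 1, got {quantiles}")
    # The low quantile must be strictly below the high quantile.
    if low >= high:
        raise ValueError(f"Low quantile must be less than high, got {quantiles}")


validate_quantiles((0.02, 0.98))  # a valid pair passes silently
```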
- ngff_zarr.compute_omero._validate_color(color: str) -> None
Validate that a color is a valid 6-digit hexadecimal string.
Args: color: Hex color string (without # prefix)
Raises: ValueError: If color is invalid
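A 6-digit hexadecimal check of this kind is commonly written with a regular expression. The sketch below is illustrative only: `validate_color` is a hypothetical name, and accepting lowercase digits is an assumption that the page does not confirm.

```python
import re

# Exactly six hexadecimal digits, with no leading "#".
# Assumption: lowercase a-f is accepted as well as uppercase.
_HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")


def validate_color(color: str) -> None:
    """Hypothetical stand-in for _validate_color (not the library's code)."""
    if not isinstance(color, str) or _HEX_COLOR.fullmatch(color) is None:
        raise ValueError(f"Expected a 6-digit hex color without '#', got {color!r}")


validate_color("FF0000")  # red, passes silently
```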
- ngff_zarr.compute_omero.get_default_colors(n_channels: int) -> list[str]
Get default colors for channels.
For a single channel, returns white (FFFFFF). For multiple channels, uses the Glasbey color progression.
Args: n_channels: Number of channels
Returns: List of hex color strings (without # prefix)
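The single-channel and multi-channel behavior can be illustrated as follows. This is a sketch under stated assumptions: `default_colors` and `GLASBEY_PREFIX` are hypothetical names, the palette holds only the ten entries visible on this page (the real list is longer), and wrapping around when channels outnumber palette entries is an assumption.

```python
# First ten entries of GLASBEY_COLORS as shown on this page; the real list is longer.
GLASBEY_PREFIX = [
    "30A2DA", "FC4F30", "E5AE38", "6D904F", "8B8B8B",
    "17BECF", "9467BD", "D62728", "1F77B4", "E377C2",
]


def default_colors(n_channels: int) -> list[str]:
    """Hypothetical stand-in for get_default_colors (not the library's code)."""
    if n_channels == 1:
        # A single channel is rendered in white.
        return ["FFFFFF"]
    # Assumption: wrap around if there are more channels than palette entries.
    return [GLASBEY_PREFIX[i % len(GLASBEY_PREFIX)] for i in range(n_channels)]


print(default_colors(1))
print(default_colors(3))
```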
- ngff_zarr.compute_omero._compute_channel_statistics(
    data: dask.array.Array,
    quantiles: tuple[float, float],
    dense: bool = False,
  )
Compute min, max, and quantiles for a single channel.
Uses dask.array operations to efficiently compute statistics without loading all data into memory.
Args:
    data: Dask array for a single channel (can be multi-dimensional)
    quantiles: Tuple of (low, high) quantile values (e.g., (0.02, 0.98))
    dense: If True, use histogram-based dense sampling for exact quantile computation over the full data. If False (default), use Dask’s approximate percentile algorithm which is faster but may produce less accurate quantiles for large multi-chunk datasets.
Returns: Tuple of (min, max, q_low, q_high) as floats
Note: When dense=False, the quantile computation uses Dask’s approximate percentile algorithm which processes chunks independently and merges results. This is memory-efficient and suitable for visualization window parameters, though results may differ slightly from exact quantiles.
When dense=True, a histogram with fine bins is computed over the full data via da.histogram, then quantiles are derived from the cumulative distribution. This gives exact results within bin resolution and is well-suited for large datasets where approximate percentiles may be inaccurate.
- ngff_zarr.compute_omero._compute_quantiles_approximate(flat_data, quantiles, min_val, max_val) -> tuple[float, float, float, float]
Compute quantiles using Dask’s approximate percentile algorithm.
This is fast and memory-efficient but may produce less accurate results for large datasets with many chunks, since it computes per-chunk percentiles and merges them.
Args:
    flat_data: Flattened 1-D dask array
    quantiles: Tuple of (low, high) quantile values
    min_val: Pre-computed minimum value
    max_val: Pre-computed maximum value
Returns: Tuple of (min, max, q_low, q_high) as floats
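To see why a per-chunk-then-merge strategy can drift from the exact answer, the following pure-Python sketch compares an exact percentile with a deliberately naive merge (a size-weighted mean of per-chunk percentiles). The naive merge is an illustrative simplification, not Dask's actual merging algorithm, and all names here are hypothetical. It only shows the failure mode when most chunks hold uniform background.

```python
def percentile(values, q):
    """Exact q-th percentile (q in 0-100) with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def naive_merged_percentile(chunks, q):
    """Size-weighted mean of per-chunk percentiles.

    A simplified merge for illustration only; Dask's algorithm differs.
    """
    total = sum(len(c) for c in chunks)
    return sum(percentile(c, q) * len(c) for c in chunks) / total


# Nine background chunks (all 255) and one "tissue" chunk with values 0..99.
chunks = [[255] * 100 for _ in range(9)] + [list(range(100))]
flat = [v for c in chunks for v in c]

exact = percentile(flat, 2)                  # about 19.98
merged = naive_merged_percentile(chunks, 2)  # about 229.7
print(exact, merged)
```

The merged estimate is pulled toward the background value because nine of the ten chunks report 255 as their own 2% percentile.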
- ngff_zarr.compute_omero._compute_quantiles_dense(
    flat_data: dask.array.Array,
    quantiles: tuple[float, float],
    min_val: float,
    max_val: float,
    dtype: numpy.dtype,
  )
Compute quantiles using histogram-based dense sampling.
Builds a fine-grained histogram over the full data using da.histogram, then derives exact quantiles from the cumulative distribution function. This processes all data in a parallelized pass through Dask’s lazy evaluation and gives exact results within bin resolution.
For integer dtypes, the number of bins equals the number of distinct possible values (capped at 65536). For float dtypes, 65536 bins are used.
Args:
    flat_data: Flattened 1-D dask array
    quantiles: Tuple of (low, high) quantile values
    min_val: Pre-computed minimum value
    max_val: Pre-computed maximum value
    dtype: Data type of the array (used to choose bin count)
Returns: Tuple of (min, max, q_low, q_high) as floats
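The histogram-and-cumulative-distribution idea can be sketched in pure Python. This is a stand-in for illustration, not the library's code: it replaces da.histogram with a plain loop, `histogram_quantiles` is a hypothetical name, and reporting the bin centre is an assumption about how a bin is turned into a value.

```python
def histogram_quantiles(values, quantiles, min_val, max_val, n_bins=65536):
    """Hypothetical sketch: quantiles from a histogram's cumulative counts."""
    if max_val == min_val:
        # Constant data: every quantile equals the single value.
        return tuple(float(min_val) for _ in quantiles)
    width = (max_val - min_val) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Map each value to a bin; the maximum falls into the last bin.
        idx = min(int((v - min_val) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    results = []
    for q in quantiles:
        target = q * total
        cumulative = 0
        for idx, count in enumerate(counts):
            cumulative += count
            if cumulative >= target:
                # Assumption: report the bin centre, accurate to one bin width.
                results.append(min_val + (idx + 0.5) * width)
                break
    return tuple(results)


# uint8-like data with one bin per possible value.
q_low, q_high = histogram_quantiles(list(range(256)), (0.02, 0.98), 0, 255, n_bins=256)
print(q_low, q_high)
```

Because every value is counted, the error is bounded by the bin width rather than by how the data happens to be chunked.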
- ngff_zarr.compute_omero.compute_omero_from_ngff_image(
    ngff_image: ngff_zarr.ngff_image.NgffImage,
    quantiles: tuple[float, float] = (0.02, 0.98),
    colors: collections.abc.Sequence[str] | None = None,
    labels: collections.abc.Sequence[str] | None = None,
    dense: bool = False,
  )
Compute OMERO metadata from an NgffImage.
This function computes visualization parameters (OMERO metadata) from image data:
min/max: The actual data range (exact values)
start/end: Display window based on quantiles (default 2% and 98%)
For multi-channel images (with ‘c’ dimension), statistics are computed separately for each channel, resulting in per-channel OMERO windows.
Edge cases:
If all values in a channel are NaN, the statistics will be NaN.
If a channel has constant values, min/max/start/end will all be the same.
Args:
    ngff_image: The NgffImage to compute metadata for
    quantiles: Tuple of (low, high) quantile values for the display window. Must be between 0 and 1, with low < high. Default is (0.02, 0.98) for 2% and 98% quantiles.
    colors: Optional list of hex color strings (without #) for each channel. Must be 6-digit hexadecimal strings (e.g., “FF0000” for red). If not provided, uses white for single channel or Glasbey progression for multi-channel.
    labels: Optional list of label strings for each channel. If not provided, uses channel_names from NgffImage if available. If channel_names is also not available or has fewer entries than channels, uses empty strings for remaining channels. When explicitly provided, must have at least as many labels as channels (ValueError raised if insufficient).
    dense: If True, use histogram-based dense sampling for exact quantile computation over the full data. If False (default), use Dask’s approximate percentile algorithm which is faster but may produce less accurate quantiles for large multi-chunk datasets.
Returns: Omero metadata with computed window parameters for each channel.
Raises: ValueError: If quantiles are invalid, colors have an invalid format, or too few colors or labels are explicitly provided.
Note: The behavior differs between explicit labels and channel_names from NgffImage:
    - Explicit labels parameter: Must provide at least as many labels as channels, otherwise ValueError is raised. This ensures intentional labeling is complete.
    - channel_names from NgffImage: Can be shorter than the number of channels, in which case remaining channels get empty string labels. This allows partial metadata from sources like OME-XML where not all channels may have names.
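The two labeling rules in the note above can be expressed as a small helper. This is a sketch, not the library's code: `resolve_labels` is a hypothetical name, and dropping surplus labels when more are given than there are channels is an assumption.

```python
from collections.abc import Sequence
from typing import Optional


def resolve_labels(
    n_channels: int,
    labels: Optional[Sequence[str]] = None,
    channel_names: Optional[Sequence[str]] = None,
) -> list[str]:
    """Hypothetical sketch of the label rules described in the note."""
    if labels is not None:
        # Explicit labels must cover every channel.
        if len(labels) < n_channels:
            raise ValueError(
                f"Got {len(labels)} labels for {n_channels} channels"
            )
        # Assumption: surplus labels are ignored.
        return list(labels[:n_channels])
    # channel_names may be missing or short; pad with empty strings.
    names = list(channel_names or [])[:n_channels]
    return names + [""] * (n_channels - len(names))


print(resolve_labels(3, channel_names=["DAPI"]))
```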
Example:
>>> from ngff_zarr import to_ngff_image
>>> from ngff_zarr.compute_omero import compute_omero_from_ngff_image
>>> # Using channel_names from NgffImage (e.g., from OME-TIFF)
>>> image = to_ngff_image(data, dims=["c", "z", "y", "x"])
>>> image.channel_names = ["DAPI", "GFP", "RFP"]
>>> omero = compute_omero_from_ngff_image(image)
>>> omero.channels[0].label  # "DAPI"
>>> # Explicit labels override channel_names
>>> omero = compute_omero_from_ngff_image(
...     image, labels=["Red", "Green", "Blue"]
... )
>>> omero.channels[0].label  # "Red"
- ngff_zarr.compute_omero._select_image_for_omero_stats(multiscales: NgffMultiscales)
Select the best image level for OMERO statistics computation.
For large multi-level images (e.g. whole-slide images), using the full resolution level with Dask’s approximate percentile algorithm gives very inaccurate results because most chunks contain only uniform background pixels. The per-chunk percentile of a background chunk is near the maximum value, causing the merged 2% quantile to be vastly overestimated.
The lowest-resolution level contains the same pixel value distribution (just spatially downsampled), but has far fewer chunks, so the approximate percentile is much more accurate. As a heuristic, any level with more than 1 M spatial pixels in the x/y plane is considered “large” and we step down through the pyramid until we find a level that is small enough.
Args: multiscales: The NgffMultiscales object whose images to search.
Returns: The selected NgffImage level.
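One possible reading of this heuristic is sketched below. It is not the library's code: `select_level` is a hypothetical name, the two constants are assumed from the "1 M spatial pixels" and "256x256" figures in the prose (their actual values are not shown on this page), and levels are represented only by their (y, x) shapes, finest first.

```python
# Assumed from the prose; the page does not show the real constant values.
LARGE_IMAGE_THRESHOLD = 1_000_000
MIN_PIXELS = 256 * 256


def select_level(level_shapes: list[tuple[int, int]]) -> int:
    """Return the index of the level to use; shapes are (y, x), finest first."""
    y, x = level_shapes[0]
    if y * x <= LARGE_IMAGE_THRESHOLD:
        # Small image: the full-resolution level is used.
        return 0
    # Large image: walk from coarsest to finest and take the first
    # level that still has enough pixels to be representative.
    for index in range(len(level_shapes) - 1, -1, -1):
        y, x = level_shapes[index]
        if y * x >= MIN_PIXELS:
            return index
    return 0


pyramid = [(40000, 40000), (10000, 10000), (2500, 2500), (625, 625), (156, 156)]
print(select_level(pyramid))  # 3: 625x625 is the coarsest level above 256x256
```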
- ngff_zarr.compute_omero.compute_omero_from_multiscales(
    multiscales: ngff_zarr.multiscales.NgffMultiscales,
    quantiles: tuple[float, float] = (0.02, 0.98),
    colors: collections.abc.Sequence[str] | None = None,
    labels: collections.abc.Sequence[str] | None = None,
    dense: bool = False,
  )
Compute OMERO metadata from an NgffMultiscales object.
This is a convenience function that computes OMERO metadata from the multiscales pyramid.
For small images the highest-resolution level is used (most accurate). For large images (e.g. whole-slide images with > 1 M spatial pixels) the lowest-resolution level with at least 256x256 spatial pixels is used instead. Dask’s approximate percentile algorithm is much more accurate on a small, representative image than on a very large one where most chunks contain only background pixels.
Uses memory-efficient computation that processes data in chunks. This makes it safe to use with very large datasets without risk of memory exhaustion.
Args:
    multiscales: The NgffMultiscales object to compute metadata for
    quantiles: Tuple of (low, high) quantile values for the display window.
    colors: Optional list of hex color strings for each channel.
    labels: Optional list of label strings for each channel.
    dense: If True, use histogram-based dense sampling for exact quantile computation over the full data. If False (default), use Dask’s approximate percentile algorithm.
Returns: Omero metadata with computed window parameters for each channel.