Cookbook

Practical, end-to-end recipes for common Pictologics workflows. Each recipe shows how to combine loading, preprocessing, feature extraction, and result export into reusable scripts.

Share Your Configurations

All custom configurations used in these recipes can be exported to YAML/JSON files for reproducibility. Use pipeline.save_configs("my_configs.yaml") to save and share configurations. See the Configuration & Reproducibility guide for details.

Case 1: Batch radiomics from a folder of NIfTI files (no masks)

Scenario

You have a folder of NIfTI volumes where each file is a separate pre-cropped lesion export. There are no separate mask files — voxels outside the region of interest may be filled with a sentinel value (e.g., -2048 HU).

You want to:

  • Process every *.nii / *.nii.gz file in a folder.
  • Use the whole image as the initial ROI (because no mask is provided).
  • Automatically detect and handle sentinel values in each image.
  • Use sentinel-aware resampling and filtering (masked interpolation / normalized convolution).
  • Resample to 0.5×0.5×0.5 mm.
  • Restrict the ROI to intensities in [-100, 3000] (CT HU example).
  • Remove disjoint parts by keeping the largest connected component.
  • Compute all radiomic feature families for four discretisation settings:
      • FBN with n_bins=8
      • FBN with n_bins=16
      • FBS with bin_width=8
      • FBS with bin_width=16
  • Export a single wide CSV where:
      • Each row is one input NIfTI file.
      • Columns include metadata (e.g., filename) + all computed features.
  • Save pipeline logs to a separate folder.

Important notes

Note

  • Maskless pipeline runs: RadiomicsPipeline.run(...) accepts mask=None, mask="", or an omitted mask argument. In that case, Pictologics generates a full (all-ones) ROI mask internally (whole-image ROI).
  • Empty ROI is an error: If your preprocessing removes all voxels (e.g., a too-tight HU range), the pipeline raises a clear error instead of silently returning empty outputs.
  • Morphology on whole-image ROI: Shape features describe the shape of the ROI mask. With a maskless run, the ROI starts as the full image volume. The resegment step then restricts the ROI to voxels within a valid intensity range, removing sentinel-valued voxels. After resegmentation and connected-component filtering, the resulting morphology features describe the shape of the actual data region (e.g., the lesion) rather than the full image bounding box. This is generally the desired behavior, but verify that the derived ROI matches your scientific intent.
  • Sentinel value handling — two complementary mechanisms: These NIfTI files may contain sentinel values (e.g., -1024, -2048) marking missing or invalid data. Handling them properly requires both of the following:

    1. source_mode="auto" creates a source mask that protects resampling and filtering only. It applies masked interpolation (resampling ignores sentinel neighbors) and normalized convolution (filter kernels exclude sentinel voxels). However, it does not change the ROI — sentinel voxels still remain in the region used for feature extraction.
    2. resegment restricts the ROI to a valid intensity range, effectively excluding sentinel voxels from feature extraction. Without this step, sentinel values would bias intensity statistics, texture matrices, and all other computed features.

    You typically need both: source_mode="auto" to protect preprocessing, and resegment to define the correct ROI for analysis.

  • Automatic sentinel detection: With source_mode="auto", the pipeline scans for common sentinel values (-2048, -1024, -1000, 0, -32768) and emits a UserWarning if one is found, so you can verify which value was detected in each image.

  • Mask generation: When mask is omitted, the pipeline generates a full ROI mask. The resegment step then removes sentinel voxels from the ROI, effectively deriving the correct analysis mask automatically from the non-sentinel voxels.
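
Because the detection is reported through Python's standard warnings machinery, you can record which sentinel was found for each image. A minimal stdlib-only sketch (the warning text and the fake_run stand-in are illustrative, not the library's exact output):

```python
import warnings

def run_recording_warnings(fn):
    """Call fn() and return its result plus any UserWarning messages emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", UserWarning)
        result = fn()
    messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
    return result, messages

def fake_run():
    # Stand-in for pipeline.run(...); emits a warning like the sentinel detector might.
    warnings.warn("Detected sentinel value -2048 in image", UserWarning)
    return "done"

result, msgs = run_recording_warnings(fake_run)
```

Writing the captured messages into your per-case log folder makes it easy to audit which sentinel was detected in each file.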

Performance Tip: Manual Feature Separation

This example demonstrates manual feature separation for maximum efficiency:

  • case1_orig: Extracts morphology and intensity features (discretization-independent) — computed once
  • case1_fbn_* / case1_fbs_*: Extract only texture, histogram, and IVH features (discretization-dependent)

This approach avoids redundant computation entirely. The orig results contain morphology/intensity that apply to all discretization strategies, while each discretization config contains only the features that actually depend on it. See Case 8 for a detailed explanation of this technique.

Note that you don't have to do this manually — the pipeline's automatic deduplication can detect shared preprocessing steps across configurations and avoid recomputing them for you. This is demonstrated in Case 7.
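
A quick back-of-the-envelope count shows the saving for this recipe's four discretisation configs (the family counts below are just the five families used here):

```python
# Five families total; three of them depend on discretisation.
families_all = 5          # morphology, intensity, texture, histogram, ivh
families_discretized = 3  # texture, histogram, ivh
n_configs = 4             # FBN 8/16, FBS 8/16

# Naive approach: every config computes every family.
without_separation = n_configs * families_all
# Manual separation: morphology/intensity once, the rest per config.
with_separation = (families_all - families_discretized) + n_configs * families_discretized
```

With these numbers the naive approach performs 20 family computations versus 14 with separation, and the gap widens as you add discretisation settings.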

Full example script

from pathlib import Path

from pictologics import RadiomicsPipeline
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    input_dir = Path("path/to/nifti_folder")
    output_csv = Path("results.csv")
    log_dir = Path("logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Define common preprocessing steps
    base_steps = [
        # Resample to 0.5mm isotropic. Round intensities to integers (useful for HU).
        {"step": "resample", "params": {"new_spacing": (0.5, 0.5, 0.5), "round_intensities": True}},
        # Exclude voxels outside standard HU range
        {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
        # Keep only the largest connected component of the ROI mask
        {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
    ]

    # Shared feature extraction for discretization-independent features
    extract_orig = {
        "step": "extract_features",
        "params": {
            "families": ["morphology", "intensity"],  # Not affected by discretization
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Shared feature extraction for discretization-dependent features
    extract_discretized = {
        "step": "extract_features",
        "params": {
            "families": ["texture", "histogram", "ivh"],  # Require discretization
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Initialize the pipeline (deduplication enabled by default)
    pipeline = RadiomicsPipeline()

    # Add "orig" config for discretization-independent features (computed once)
    # source_mode="auto" detects sentinel values and protects resampling/filtering
    pipeline.add_config(
        "case1_orig",
        base_steps + [extract_orig],
        source_mode="auto",
    )

    # Add discretization-dependent configs (texture/histogram/ivh only)
    for n_bins in (8, 16):
        pipeline.add_config(
            f"case1_fbn_{n_bins}",
            base_steps + [
                {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
                extract_discretized,
            ],
            source_mode="auto",
        )

    for bin_width in (8, 16):
        pipeline.add_config(
            f"case1_fbs_{bin_width}",
            base_steps + [
                {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
                extract_discretized,
            ],
            source_mode="auto",
        )

    # Prepare for batch processing
    rows = []
    nifti_paths = sorted(input_dir.glob("*.nii*"))
    if not nifti_paths:
        raise ValueError(f"No NIfTI files found in: {input_dir}")

    # Process each NIfTI file
    for path in nifti_paths:
        # Simple suffix stripping using Python 3.9+ removesuffix
        subject_id = path.name.removesuffix(".nii.gz").removesuffix(".nii")
        pipeline.clear_log()

        # Run pipeline (mask omitted -> whole-image ROI)
        # "case1_orig" contains morphology/intensity, others contain texture/histogram/ivh
        results = pipeline.run(
            image=str(path),
            subject_id=subject_id,
            config_names=["case1_orig", "case1_fbn_8", "case1_fbn_16", "case1_fbs_8", "case1_fbs_16"],
        )

        # Format results as flat dictionary and add metadata
        row = format_results(
            results,
            fmt="wide",
            meta={"subject_id": subject_id, "file": str(path)}
        )
        rows.append(row)

        # Save per-case logs
        pipeline.save_log(str(log_dir / f"{subject_id}.json"))

    # Consolidated export of all results
    save_results(rows, output_csv)
    print(f"Wrote {len(rows)} rows to {output_csv}")

if __name__ == "__main__":
    main()

Output format

  • One row per file.
  • Columns include:
      • subject_id
      • file
      • Feature columns prefixed by configuration name (e.g., case1_fbn_8__mean_intensity_Q4LE).
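
Since a wide row is a flat mapping, per-configuration features can be pulled out by column prefix. A small sketch using the naming pattern above (the values are made up):

```python
# One wide row: metadata plus config-prefixed feature columns.
row = {
    "subject_id": "CASE001",
    "file": "CASE001.nii.gz",
    "case1_fbn_8__mean_intensity_Q4LE": 12.5,
    "case1_fbs_16__mean_intensity_Q4LE": 11.9,
}

# Select only the FBN n_bins=8 features by prefix.
fbn8 = {k: v for k, v in row.items() if k.startswith("case1_fbn_8__")}
```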

Alternative: Explicit Sentinel Value

If you know the sentinel value in advance, use source_mode="roi_only" with sentinel_value for deterministic behavior:

pipeline.add_config(
    "case1_orig",
    base_steps + [extract_orig],
    source_mode="roi_only",
    sentinel_value=-2048,
)

Case 2: Batch radiomics from DICOM case folders (Image + Segmentation)

Scenario

You have a folder of cases. Each case is a separate folder and contains two subfolders:

  • Image/: the DICOM image series (CT/MR/etc.), stored at an arbitrary depth.
  • Segmentation/: the DICOM segmentation, also stored at an arbitrary depth.

You want to:

  • For each case, recursively discover the DICOM series folders.
  • Load the image series and segmentation series.
  • Convert the segmentation to a binary ROI mask by keeping all voxels where the segmentation value is > 0 (handled automatically during loading).
  • Resample to 1×1×1 mm.
  • Do not apply intensity filtering/resegmentation, and do not keep the largest connected component.
  • Compute all radiomic feature families for two discretisations:
      • FBS with bin_width=256
      • FBN with n_bins=64
  • Export a long-format CSV (tidy data, one row per feature).
  • Save pipeline logs into a separate folder (one JSON per case).
  • Show a progress bar during batch processing.

Notes

Note

  • Progress bar dependency: This example uses tqdm.
    • If you are running this outside the library repo, install it with pip install tqdm.
    • If you are adding it to your Poetry-managed project, use poetry add tqdm.
  • Segmentation DICOM at arbitrary depth: The helper _find_dicom_series_root(...) looks for the subfolder with the most .dcm files and uses that as the series root.
  • Multiple masks in one SEG: If your segmentation DICOM encodes multiple labels (e.g., values 1..N), the _binarize_segmentation_mask(...) step turns it into a single ROI by keeping all voxels where the value is > 0.
  • Multi-phase DICOM series: If your image series contains multiple phases (e.g., cardiac CT with 10%, 20%... phases), use get_dicom_phases() to discover available phases and dataset_index to select one:
    from pictologics.utilities import get_dicom_phases
    
    phases = get_dicom_phases(str(image_root))
    print(f"Found {len(phases)} phases")
    # Load a specific phase (default is 0)
    image = load_image(str(image_root), recursive=True, dataset_index=0)
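
The behaviour described for _find_dicom_series_root(...) can be sketched in a few lines. This is a hypothetical re-implementation for illustration, not the library's actual helper:

```python
from pathlib import Path

def find_dicom_series_root(root: Path) -> Path:
    """Pick the folder (root itself or any subfolder) containing the most .dcm files."""
    candidates = [root] + [p for p in root.rglob("*") if p.is_dir()]
    return max(candidates, key=lambda d: sum(1 for _ in d.glob("*.dcm")))
```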
    

Full example script

from pathlib import Path

from pictologics import RadiomicsPipeline, load_image, load_and_merge_images
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    cases_dir = Path("path/to/cases_root")
    output_csv = Path("dicom_results_long.csv")
    log_dir = Path("dicom_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Find all case directories
    case_dirs = sorted([p for p in cases_dir.iterdir() if p.is_dir()])
    if not case_dirs:
        raise ValueError(f"No case folders found in: {cases_dir}")

    # Shared feature extraction settings
    extract_all = {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Initialize the pipeline
    pipeline = RadiomicsPipeline()

    # Configuration A: Fixed Bin Size
    pipeline.add_config(
        "case2_fbs_256",
        [
            {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
            {"step": "discretise", "params": {"method": "FBS", "bin_width": 256.0}},
            extract_all,
        ],
    )

    # Configuration B: Fixed Bin Number
    pipeline.add_config(
        "case2_fbn_64",
        [
            {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
            {"step": "discretise", "params": {"method": "FBN", "n_bins": 64}},
            extract_all,
        ],
    )

    # Use tqdm for a progress bar
    from tqdm import tqdm

    rows = []
    for case_dir in tqdm(case_dirs, desc="Radiomics (DICOM cases)", unit="case"):
        subject_id = case_dir.name
        image_root = case_dir / "Image"
        seg_root = case_dir / "Segmentation"

        # load_image with recursive=True finds the best series folder automatically
        image = load_image(str(image_root), recursive=True)

        # Load the segmentation using load_and_merge_images with binarize=True
        # recursive=True ensures we find the DICOM series inside the Segmentation folder
        mask = load_and_merge_images(
            [str(seg_root)], 
            binarize=True, 
            recursive=True
        )

        pipeline.clear_log()

        # Execute extraction for both configurations
        results = pipeline.run(
            image=image,
            mask=mask,
            subject_id=subject_id,
            config_names=["case2_fbs_256", "case2_fbn_64"],
        )

        # Format results and store
        row = format_results(
            results,
            fmt="long",  # Tidy format: [subject_id, config, feature_name, value]
            meta={
                "subject_id": subject_id,
                "image_root": str(image_root),
                "seg_root": str(seg_root),
            },
        )
        rows.append(row)

        # Save per-case logs
        pipeline.save_log(str(log_dir / f"{subject_id}.json"))

    # Final export
    save_results(rows, output_csv)
    print(f"Wrote {len(rows)} rows to {output_csv}")

if __name__ == "__main__":
    main()
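
The long format emitted above is tidy data: one record per (subject, config, feature). A sketch of its shape (the feature name and values here are illustrative):

```python
# Each record carries the identifying columns plus one feature value.
records = [
    {"subject_id": "CASE001", "config": "case2_fbs_256", "feature_name": "mean_intensity", "value": 42.0},
    {"subject_id": "CASE001", "config": "case2_fbn_64", "feature_name": "mean_intensity", "value": 41.7},
]

# Tidy data is easy to slice, e.g. all values for one configuration:
fbn64 = [r["value"] for r in records if r["config"] == "case2_fbn_64"]
```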

Case 3: Batch radiomics from a flat NIfTI folder (multiple masks per image)

Scenario

You have a single large folder containing both images and masks as NIfTI files.

  • Image files end with _IMG (e.g., CASE001_IMG.nii.gz).
  • Mask files end with MASKn where n is a number (e.g., CASE001_MASK1.nii.gz, CASE001_MASK2.nii.gz).
  • There can be multiple segmentation masks per image.

You want to:

  • For each image, automatically find all its corresponding masks.
  • Merge all masks into a single ROI using load_and_merge_images(...).
  • Do not apply any preprocessing (no resampling, no thresholding, no connected components).
  • Compute radiomics for the six standard discretisations (FBN 8/16/32 and FBS 8/16/32).
  • Export a long-format JSON file for easy ingestion into NoSQL databases or web apps.
  • Save logs for each case into a separate folder.
  • Show a progress bar during batch processing.

Tip

Flexible Image Loading

The load_and_merge_images function uses load_image for each path in its input list. This means you can merge:

  • Multiple NIfTI files (as shown here).
  • DICOM series folders.
  • Single DICOM slice files.
  • Or a mix of these formats, provided they share the same spatial geometry.

You can also pass recursive=True (to search subfolders) or dataset_index=N (for 4D files) to control how each mask is loaded.
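
Conceptually, merging binary masks is a voxel-wise union. A minimal numpy sketch (load_and_merge_images additionally validates spatial geometry, which this toy example skips):

```python
import numpy as np

# Two toy binary masks on the same grid.
m1 = np.array([[0, 1], [0, 0]], dtype=np.uint8)
m2 = np.array([[0, 0], [1, 0]], dtype=np.uint8)

# Voxel-wise union: a voxel is in the ROI if any mask includes it.
merged = np.logical_or.reduce([m1, m2]).astype(np.uint8)
```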

Performance Tip: Deduplication

This example runs 6 configurations where only discretisation differs. With deduplication enabled by default, morphology and intensity features are computed once and reused across all configurations, providing significant speedup. To disable this optimization, set deduplicate=False. See Deduplication for details.

Full example script

from pathlib import Path

from pictologics import RadiomicsPipeline, load_and_merge_images, load_image
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    input_dir = Path("path/to/nifti_folder")
    output_file = Path("case3_results_long.json")
    log_dir = Path("case3_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Use tqdm for a progress bar
    from tqdm import tqdm

    # Identify all images (files ending in _IMG)
    image_paths = sorted(input_dir.glob("*_IMG.nii*"))
    if not image_paths:
        raise ValueError(f"No *_IMG NIfTI files found in: {input_dir}")

    # Initialize the pipeline (deduplication is enabled by default)
    # Since all 6 configs share identical preprocessing (only discretisation differs),
    # morphology and intensity features are computed once and reused across all configs
    pipeline = RadiomicsPipeline()

    # Shared feature extraction settings
    extract_all = {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Add 6 target configurations (no preprocessing requested)
    for n_bins in (8, 16, 32):
        pipeline.add_config(
            f"case3_fbn_{n_bins}",
            [
                {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
                extract_all,
            ],
        )

    for bin_width in (8.0, 16.0, 32.0):
        pipeline.add_config(
            f"case3_fbs_{int(bin_width)}",
            [
                {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
                extract_all,
            ],
        )

    target_configs = [
        "case3_fbn_8", "case3_fbn_16", "case3_fbn_32",
        "case3_fbs_8", "case3_fbs_16", "case3_fbs_32",
    ]

    # Process each image and its associated masks
    rows = []
    for img_path in tqdm(image_paths, desc="Radiomics (NIfTI images)", unit="image"):
        # Strip extensions and _IMG suffix to get case ID
        subject_id = img_path.name.removesuffix(".nii.gz").removesuffix(".nii").removesuffix("_IMG")

        # Find all corresponding masks (e.g., CASE001_MASK1.nii.gz, etc.)
        mask_paths = sorted(input_dir.glob(f"{subject_id}_MASK*.nii*"))
        if not mask_paths:
            raise ValueError(f"No masks found for {subject_id}")

        image = load_image(str(img_path))

        # Merge multiple masks into one and ensure binary semantics
        mask = load_and_merge_images(
            [str(p) for p in mask_paths],
            reference_image=image,
            binarize=True
        )

        pipeline.clear_log()

        # Run extraction for all target configurations
        results = pipeline.run(
            image=image,
            mask=mask,
            subject_id=subject_id,
            config_names=target_configs,
        )

        # Format results and store metadata
        row = format_results(
            results,
            fmt="long",
            meta={
                "subject_id": subject_id,
                "image": str(img_path),
                "masks": ";".join(str(p) for p in mask_paths),
            },
        )
        rows.append(row)

        # Save per-case logs
        pipeline.save_log(str(log_dir / f"{subject_id}.json"))

    # Consolidated export (save_results handles list of DataFrames for long format automatically)
    save_results(rows, output_file)
    print(f"Wrote {len(rows)} cases to {output_file}")


if __name__ == "__main__":
    main()

Case 4: Parallel batch radiomics from DICOM cases (merge multiple segmentation folders)

Scenario

You have a folder of cases. Each case is a separate folder and contains:

  • Image/: the DICOM image series (CT/MR/etc.), stored at an arbitrary depth.
  • Segmentation/: multiple subfolders, each containing a segmentation series at an arbitrary depth.

You want to:

  • For each case, recursively discover the DICOM image series folder.
  • Discover all segmentation series folders under Segmentation/ and load them.
  • Convert each segmentation to a binary mask (values > 0 become 1).
  • Merge all binary masks into a single ROI mask per case.
  • Apply all preprocessing steps supported by the pipeline (resample, resegment, outlier filtering, intensity rounding, and largest connected component).
  • Compute radiomics for six discretisations (FBN 8/16/32 and FBS 8/16/32).
  • Run cases in parallel on multiple CPU cores, with a user-controlled n_jobs.
  • Export a single wide JSON file (one object per case) and save per-case logs.

Notes

Note

  • Progress bar dependency: This example uses tqdm.
    • If you are running this outside the library repo, install it with pip install tqdm.
    • If you are adding it to your Poetry-managed project, use poetry add tqdm.
  • Multiprocessing requirement: On Windows/macOS, keep the parallel execution inside if __name__ == "__main__": (as shown) to avoid process-spawn issues.
  • JIT warmup in parallel workers: Pictologics performs a Numba JIT warmup at package import. With ProcessPoolExecutor, each worker is a separate Python process, so warmup happens once per worker process (on its first import of pictologics) and then stays warm for all cases that worker processes. It is not re-run for every case unless you explicitly call warmup_jit() inside your per-case function. You can disable auto-warmup via PICTOLOGICS_DISABLE_WARMUP=1 if you prefer to skip the upfront cost.
  • Preprocessing parameters are dataset-dependent: The resegment range here uses the CT HU example [-100, 3000]. Adjust or remove it for non-CT data.
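
Rather than hard-coding n_jobs, you can derive it from the machine. A simple stdlib heuristic (not a library feature):

```python
import os

n_cases = 12  # illustrative number of case folders

# One worker per core, leaving one core free for the main process,
# and never more workers than there are cases.
n_jobs = min(n_cases, max(1, (os.cpu_count() or 1) - 1))
```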

Performance Tip: Manual Feature Separation

This example demonstrates manual feature separation combined with parallel processing:

  • case4_orig: Extracts morphology and intensity features (discretization-independent) — computed once per case
  • case4_fbn_* / case4_fbs_*: Extract only texture, histogram, and IVH features (discretization-dependent)

This approach ensures discretization-independent features are never recomputed. See Case 8 for a detailed explanation of this technique.

Full example script

from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

from pictologics import RadiomicsPipeline, load_image, load_and_merge_images
from pictologics.results import format_results, save_results


def collect_segmentation_series_roots(seg_root):
    """Collect all segmentation series subfolders."""
    if not seg_root.exists():
        raise ValueError(f"Folder does not exist: {seg_root}")

    subdirs = sorted([p for p in seg_root.iterdir() if p.is_dir()])
    if not subdirs:
        return [seg_root]

    return subdirs

def build_case4_pipeline():
    """Define the pipeline with manual feature separation for efficiency."""
    pipeline = RadiomicsPipeline()

    # Define standard CT preprocessing
    preprocess_steps = [
        {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
        {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
        {"step": "filter_outliers", "params": {"sigma": 3.0}},
        {"step": "round_intensities", "params": {}},
        {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
    ]

    # Discretization-independent features (computed once)
    extract_orig = {
        "step": "extract_features",
        "params": {
            "families": ["morphology", "intensity"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Discretization-dependent features only
    extract_discretized = {
        "step": "extract_features",
        "params": {
            "families": ["texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Add "orig" config for discretization-independent features
    pipeline.add_config("case4_orig", preprocess_steps + [extract_orig])

    # Add discretisation variants (texture/histogram/ivh only)
    for n_bins in (8, 16, 32):
        pipeline.add_config(
            f"case4_fbn_{n_bins}",
            preprocess_steps + [{"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}}, extract_discretized],
        )

    for bin_width in (8.0, 16.0, 32.0):
        pipeline.add_config(
            f"case4_fbs_{int(bin_width)}",
            preprocess_steps + [{"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}}, extract_discretized],
        )

    config_names = ["case4_orig"] + [f"case4_fbn_{b}" for b in (8, 16, 32)] + [f"case4_fbs_{b}" for b in (8, 16, 32)]
    return pipeline, config_names

def process_case(case_dir, log_dir):
    """Worker function for single case processing."""
    case_path = Path(case_dir)
    subject_id = case_path.name
    image_root = case_path / "Image"
    seg_root = case_path / "Segmentation"

    # Load image recursively
    image = load_image(str(image_root), recursive=True)

    # Load and merge all found segmentations
    seg_roots = collect_segmentation_series_roots(seg_root)

    # load_and_merge_images handles multiple paths, geometry checking, and binarization
    try:
        mask = load_and_merge_images(
            [str(p) for p in seg_roots], 
            reference_image=image, 
            binarize=True, 
            recursive=True
        )
    except ValueError as e:
        raise ValueError(f"Failed to load/merge masks for {subject_id}: {e}") from e

    # Setup and run pipeline
    pipeline, config_names = build_case4_pipeline()
    pipeline.clear_log()
    results = pipeline.run(image=image, mask=mask, subject_id=subject_id, config_names=config_names)

    # Save results and log
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    pipeline.save_log(str(Path(log_dir) / f"{subject_id}.json"))

    return format_results(
        results,
        fmt="wide",
        meta={
            "subject_id": subject_id,
            "image_root": str(image_root),
            "seg_roots": ";".join(str(p) for p in seg_roots),
        },
    )

def main():
    # Configure paths
    cases_dir = Path("path/to/cases_root")
    output_file = Path("case4_parallel_results.json")
    log_dir = Path("case4_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Start parallel processing
    n_jobs = 4
    case_dirs = sorted([p for p in cases_dir.iterdir() if p.is_dir()])
    if not case_dirs:
        raise ValueError(f"No case folders found in: {cases_dir}")

    from tqdm import tqdm
    rows = []
    errors = []

    # Map each case to the worker function
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        futures = {
            executor.submit(process_case, str(case_dir), str(log_dir)): case_dir
            for case_dir in case_dirs
        }

        with tqdm(total=len(futures), desc="Radiomics (parallel cases)", unit="case") as pbar:
            for fut in as_completed(futures):
                case_dir = futures[fut]
                try:
                    rows.append(fut.result())
                except Exception as e:
                    errors.append((str(case_dir), repr(e)))
                finally:
                    pbar.update(1)

    if errors:
        msg = "\n".join(f"- {case}: {err}" for case, err in errors)
        raise RuntimeError(f"One or more cases failed:\n{msg}")

    # Final data export
    save_results(rows, output_file)
    print(f"Wrote {len(rows)} cases to {output_file}")

if __name__ == "__main__":
    main()

Case 5: Batch radiomics from DICOM SEG files with multiple segments

Scenario

You have a folder of cases. Each case contains:

  • Image/: the DICOM image series (CT/MR/etc.)
  • segmentation.dcm: a DICOM SEG file with multiple labeled segments (e.g., liver, spleen, kidneys)

You want to:

  • Load the image and inspect the available segments in the SEG file.
  • Process each segment separately to get per-organ radiomics.
  • Apply standard preprocessing and compute features for each segment.
  • Export results with segment labels in the output.

Notes

Note

  • get_segment_info() returns metadata about each segment (number, label, algorithm).
  • load_seg() with combine_segments=False returns a dict mapping segment numbers to Image objects.
  • Alignment: Use reference_image to ensure the SEG mask matches the CT geometry.
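
The per-segment loop in the script pairs each mask with its label by segment number. That lookup can be sketched standalone (placeholder strings stand in for the Image objects load_seg returns):

```python
# Metadata shaped as get_segment_info(...) describes: number + label per segment.
segments = [
    {"segment_number": 1, "segment_label": "Liver"},
    {"segment_number": 2, "segment_label": "Spleen"},
]

# Mapping like load_seg(..., combine_segments=False) returns (values are placeholders).
segment_masks = {1: "liver_mask", 2: "spleen_mask"}

# Resolve each segment number to its human-readable label.
labels = {
    num: next(s["segment_label"] for s in segments if s["segment_number"] == num)
    for num in segment_masks
}
```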

Full example script

from pathlib import Path
from pictologics import load_image, load_seg, RadiomicsPipeline
from pictologics.loaders import get_segment_info
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    cases_dir = Path("path/to/cases_root")
    output_csv = Path("case5_per_segment_results.csv")
    log_dir = Path("case5_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Find all case directories
    case_dirs = sorted([p for p in cases_dir.iterdir() if p.is_dir()])
    if not case_dirs:
        raise ValueError(f"No case folders found in: {cases_dir}")

    # Initialize pipeline with standard config
    pipeline = RadiomicsPipeline()
    pipeline.add_config(
        "case5_fbn_32",
        [
            {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
            {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
            {
                "step": "extract_features",
                "params": {
                    "families": ["intensity", "morphology", "texture", "histogram"],
                    "include_spatial_intensity": False,
                    "include_local_intensity": False,
                },
            },
        ],
    )

    from tqdm import tqdm
    rows = []

    for case_dir in tqdm(case_dirs, desc="Radiomics (per-segment)", unit="case"):
        subject_id = case_dir.name
        image_root = case_dir / "Image"
        seg_file = case_dir / "segmentation.dcm"

        # Load the reference image
        image = load_image(str(image_root), recursive=True)

        # Inspect available segments
        segments = get_segment_info(str(seg_file))
        print(f"\n{subject_id}: Found {len(segments)} segments")
        for seg in segments:
            print(f"  Segment {seg['segment_number']}: {seg['segment_label']}")

        # Load each segment separately, aligned to image geometry
        segment_masks = load_seg(
            str(seg_file),
            combine_segments=False,  # Returns dict {seg_num: Image}
            reference_image=image
        )

        # Process each segment
        for seg_num, mask in segment_masks.items():
            # Find segment label from metadata
            seg_info = next(s for s in segments if s["segment_number"] == seg_num)
            seg_label = seg_info["segment_label"]

            pipeline.clear_log()
            results = pipeline.run(
                image=image,
                mask=mask,
                subject_id=f"{subject_id}_{seg_label}",
                config_names=["case5_fbn_32"],
            )

            row = format_results(
                results,
                fmt="wide",
                meta={
                    "subject_id": subject_id,
                    "segment_number": seg_num,
                    "segment_label": seg_label,
                },
            )
            rows.append(row)

            # Save per-segment log
            pipeline.save_log(str(log_dir / f"{subject_id}_{seg_label}.json"))

    # Export all results
    save_results(rows, output_csv)
    print(f"\nWrote {len(rows)} rows to {output_csv}")


if __name__ == "__main__":
    main()

Output format

  • One row per segment per case.
  • Columns include:
      • subject_id - Case identifier
      • segment_number - Numeric segment ID from SEG file
      • segment_label - Human-readable segment name (e.g., "Liver", "Spleen")
      • Feature columns prefixed by configuration name
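
Because segment metadata travels with every row, downstream filtering is plain dictionary work. A minimal sketch with toy rows (the feature column name here is hypothetical, for illustration only):

```python
# Toy rows mirroring the CSV layout: one row per segment per case.
# The feature column name below is invented for illustration.
rows = [
    {"subject_id": "case01", "segment_number": 1, "segment_label": "Liver",
     "case5_fbn_32__mean_intensity_Q4LE": 42.0},
    {"subject_id": "case01", "segment_number": 2, "segment_label": "Spleen",
     "case5_fbn_32__mean_intensity_Q4LE": 55.0},
    {"subject_id": "case02", "segment_number": 1, "segment_label": "Liver",
     "case5_fbn_32__mean_intensity_Q4LE": 40.5},
]

# Select all liver rows across cases.
liver_rows = [r for r in rows if r["segment_label"] == "Liver"]
print(len(liver_rows))  # 2
```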

Case 6: Filtered radiomics using IBSI 2 filters

Scenario

You want to extract radiomic features from filtered response maps (IBSI 2 paradigm). This is useful for:

  • Capturing texture at multiple scales (LoG with different σ values).
  • Extracting directional texture patterns (Gabor filters).
  • Multi-resolution analysis (wavelet decomposition).

You want to:

  • Apply multiple filters to each image.
  • Extract intensity features from each filtered response map.
  • Compare features across filter types and parameters.

Key concepts

  • Filter step before feature extraction: Apply filter after preprocessing but before extract_features.
  • Intensity features from filtered images: First-order statistics (mean, variance, skewness) capture texture properties at different scales.
  • Morphology is filter-independent: Morphology features depend only on mask geometry, not intensity values. Compute morphology once from the original (unfiltered) image and reuse across all filter configurations.
  • No discretisation for filtered images: IBSI 2 Phase 2 recommends intensity features from continuous filtered values.

Full example script

from pathlib import Path
from pictologics import RadiomicsPipeline
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    image_path = Path("path/to/image.nii.gz")
    mask_path = Path("path/to/mask.nii.gz")
    output_csv = Path("filtered_radiomics.csv")

    # Initialize pipeline
    pipeline = RadiomicsPipeline()

    # IBSI 2 Phase 2 preprocessing (Config B)
    preprocess_steps = [
        {"step": "resample", "params": {
            "new_spacing": (1.0, 1.0, 1.0), 
            "interpolation": "cubic"
        }},
        {"step": "round_intensities", "params": {}},
        {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
    ]

    # --- Configuration: Morphology from original image (computed once) ---
    # Morphology features depend on mask geometry, NOT intensity values,
    # so they are identical regardless of which filter is applied.
    pipeline.add_config(
        "orig",
        preprocess_steps + [
            {"step": "extract_features", "params": {"families": ["morphology"]}},
        ],
    )

    # Feature extraction (intensity only for filtered images)
    extract_intensity = {
        "step": "extract_features",
        "params": {"families": ["intensity"]},
    }

    # Collect filter config names
    filter_configs = []

    # --- Configuration 1: LoG at multiple scales ---
    for sigma in [1.5, 3.0, 5.0]:
        name = f"log_sigma_{sigma}"
        pipeline.add_config(
            name,
            preprocess_steps + [
                {"step": "filter", "params": {
                    "type": "log",
                    "sigma_mm": sigma,
                    "truncate": 4.0,
                }},
                extract_intensity,
            ],
        )
        filter_configs.append(name)

    # --- Configuration 2: Gabor filter ---
    name = "gabor_5mm"
    pipeline.add_config(
        name,
        preprocess_steps + [
            {"step": "filter", "params": {
                "type": "gabor",
                "sigma_mm": 5.0,
                "lambda_mm": 2.0,
                "gamma": 1.5,
                "rotation_invariant": True,
                "pooling": "average",
            }},
            extract_intensity,
        ],
    )
    filter_configs.append(name)

    # --- Configuration 3: Wavelet decomposition ---
    for decomp in ["LLH", "HHL", "HHH"]:
        name = f"wavelet_{decomp}"
        pipeline.add_config(
            name,
            preprocess_steps + [
                {"step": "filter", "params": {
                    "type": "wavelet",
                    "wavelet": "db3",
                    "level": 1,
                    "decomposition": decomp,
                    "rotation_invariant": True,
                    "pooling": "average",
                }},
                extract_intensity,
            ],
        )
        filter_configs.append(name)

    # --- Configuration 4: Laws texture energy ---
    name = "laws_L5E5E5"
    pipeline.add_config(
        name,
        preprocess_steps + [
            {"step": "filter", "params": {
                "type": "laws",
                "kernel": "L5E5E5",
                "rotation_invariant": True,
                "pooling": "max",
                "compute_energy": True,
                "energy_distance": 7,
            }},
            extract_intensity,
        ],
    )
    filter_configs.append(name)

    # Run pipeline - include "orig" for morphology + all filter configs for intensity
    all_configs = ["orig"] + filter_configs
    results = pipeline.run(
        image=str(image_path),
        mask=str(mask_path),
        subject_id="subject_001",
        config_names=all_configs,
    )

    # Format and save
    row = format_results(
        results,
        fmt="wide",
        meta={"subject_id": "subject_001"},
    )
    save_results([row], output_csv)
    print(f"Saved filtered radiomics to {output_csv}")

    # Display key features for each filter
    print("\nMean values from each filter:")
    for config in filter_configs:
        mean_val = results[config].get("mean_intensity_Q4LE", "N/A")
        if isinstance(mean_val, float):
            print(f"  {config}: {mean_val:.4f}")
        else:
            print(f"  {config}: {mean_val}")


if __name__ == "__main__":
    main()

Output format

  • One row per subject with features from each filter as separate columns.
  • Column names use the pattern {filter_config}__mean_intensity_Q4LE (e.g., log_sigma_1.5__mean_intensity_Q4LE).
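
With that naming scheme, pulling one feature across all filters out of a wide row is a one-liner. A toy sketch (values invented):

```python
# Toy wide row using the {filter_config}__{feature} column pattern.
row = {
    "subject_id": "subject_001",
    "log_sigma_1.5__mean_intensity_Q4LE": -3.2,
    "log_sigma_3.0__mean_intensity_Q4LE": -1.1,
    "gabor_5mm__mean_intensity_Q4LE": 0.8,
}

feature = "mean_intensity_Q4LE"
suffix = f"__{feature}"
# Map each filter config to its value for this feature.
by_filter = {k[: -len(suffix)]: v for k, v in row.items() if k.endswith(suffix)}
print(by_filter)
# {'log_sigma_1.5': -3.2, 'log_sigma_3.0': -1.1, 'gabor_5mm': 0.8}
```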

Combining with standard texture features

You can also run both filtered and standard texture configs together:

# Standard texture config (unfiltered)
pipeline.add_config("standard_fbn_32", [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
    {"step": "extract_features", "params": {"families": ["texture"]}},
])

# Run all configs together
all_configs = filter_configs + ["standard_fbn_32"]
results = pipeline.run(image, mask, config_names=all_configs)

Output Options

For details on result formatting (wide vs long, output_type options) and export functions, see Working with Results in the Pipeline & Preprocessing guide.

Case 7: Multi-configuration batch with deduplication

Scenario

You want to run a comprehensive radiomic analysis using multiple discretization strategies (e.g., FBN with 8/16/32 bins and FBS with 8/16/32 bin widths) while maximizing performance. All configurations share the same preprocessing steps.

Deduplication is Enabled by Default

Starting with pictologics v0.3.2, deduplication is enabled by default (deduplicate=True). You don't need to explicitly enable it—just create a pipeline and run multiple configurations to benefit from automatic optimization.

You want to:

  • Apply standard preprocessing (resample, resegment, filter outliers, keep largest component).
  • Run 6 discretization strategies for comprehensive coverage.
  • Optimize performance by avoiding redundant computation of features that don't depend on discretization.
  • Track and report deduplication statistics to understand the performance benefit.

Key concepts

The deduplication system understands which feature families depend on which preprocessing steps:

| Feature Family      | Depends On                                                                      |
|---------------------|---------------------------------------------------------------------------------|
| Morphology          | Resampling, mask operations (resegment, keep_largest_component, filter_outliers) |
| Intensity           | Resampling, intensity preprocessing (resegment, filter_outliers, round_intensities) |
| Texture / Histogram | All of the above plus discretization                                            |
| IVH                 | IVH-specific settings (ivh_use_continuous, ivh_discretisation)                  |

When configs share preprocessing but differ only in discretization:

  • Morphology and intensity features are computed once and reused.
  • Texture, histogram, and (typically) IVH are computed per configuration.

How Results Are Handled

Complete Results for All Configurations

When deduplication reuses features, they are copied into the results dictionary—never empty or missing. Every configuration returns a complete feature set with identical values for reused families.

| What happens       | Details                                                                      |
|--------------------|------------------------------------------------------------------------------|
| First config       | All feature families are computed and stored in cache                        |
| Subsequent configs | Matching families are copied from cache; differing families are computed fresh |
| Final results      | Every config has complete features: no NaN values, no missing columns        |

This means when you concatenate results into a DataFrame for machine learning or statistical analysis, all rows are complete:

import pandas as pd

# Each config has complete results
df = pd.DataFrame([results[cfg] for cfg in config_names])
print(df.isna().sum().sum())  # 0 - no missing values!

This can reduce total computation time significantly—especially for morphology features which involve expensive surface area calculations.

Full example script

from pathlib import Path
import time

from pictologics import RadiomicsPipeline
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    image_path = Path("path/to/image.nii.gz")
    mask_path = Path("path/to/mask.nii.gz")
    output_csv = Path("case7_dedup_results.csv")

    # Define shared preprocessing steps
    preprocess_steps = [
        {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
        {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
        {"step": "filter_outliers", "params": {"sigma": 3.0}},
        {"step": "round_intensities", "params": {}},
        {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
    ]

    # Shared feature extraction settings
    extract_all = {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # ========================================
    # Run WITH deduplication (enabled by default)
    # ========================================
    pipeline_dedup = RadiomicsPipeline()  # deduplicate=True is the default!

    # Add 6 configurations (only discretisation differs)
    for n_bins in (8, 16, 32):
        pipeline_dedup.add_config(
            f"fbn_{n_bins}",
            preprocess_steps + [
                {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
                extract_all,
            ],
        )

    for bin_width in (8.0, 16.0, 32.0):
        pipeline_dedup.add_config(
            f"fbs_{int(bin_width)}",
            preprocess_steps + [
                {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
                extract_all,
            ],
        )

    config_names = ["fbn_8", "fbn_16", "fbn_32", "fbs_8", "fbs_16", "fbs_32"]

    # Run with deduplication
    start = time.perf_counter()
    results = pipeline_dedup.run(
        image=str(image_path),
        mask=str(mask_path),
        subject_id="subject_001",
        config_names=config_names,
    )
    elapsed_dedup = time.perf_counter() - start

    # ========================================
    # Check deduplication statistics
    # ========================================
    stats = pipeline_dedup.deduplication_stats
    print("\n=== Deduplication Statistics ===")
    print(f"Reused feature families: {stats['reused_families']}")
    print(f"Computed feature families: {stats['computed_families']}")
    print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")
    print(f"Time with deduplication: {elapsed_dedup:.2f}s")

    # ========================================
    # Format and save results
    # ========================================
    row = format_results(
        results,
        fmt="wide",
        meta={"subject_id": "subject_001"},
    )
    save_results([row], output_csv)
    print(f"\nSaved results to {output_csv}")

    # Verify all configs have the same morphology features (computed once, reused)
    ref_volume = results["fbn_8"].get("volume_mesh_ml_HTUR", None)
    for config in config_names[1:]:
        vol = results[config].get("volume_mesh_ml_HTUR", None)
        assert vol == ref_volume, f"Volume mismatch for {config}"
    print("✓ Morphology features identical across all configurations (as expected)")


if __name__ == "__main__":
    main()

Expected output

When running 6 configurations with shared preprocessing:

=== Deduplication Statistics ===
Reused feature families: 10
Computed feature families: 20
Cache hit rate: 33.3%
Time with deduplication: 12.34s

Saved results to case7_dedup_results.csv
✓ Morphology features identical across all configurations (as expected)

The cache hit rate reflects that, across 6 configurations and 5 feature families:

  • Morphology (1 family) is computed once and reused 5 times → 5 cache hits
  • Intensity (1 family) is computed once and reused 5 times → 5 cache hits
  • Texture, histogram, and IVH (3 families × 6 configs = 18) are computed fresh for every config

That gives 10 reused out of 30 total family evaluations (20 computed + 10 reused) → a 33.3% hit rate.
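
The bookkeeping behind these statistics can be checked with a few lines of arithmetic (pure Python, independent of the library):

```python
# 6 configs sharing preprocessing; only discretisation differs.
n_configs = 6
reused = 5 + 5                    # morphology + intensity, each reused in 5 later configs
computed = 1 + 1 + 3 * n_configs  # morphology once, intensity once,
                                  # texture/histogram/IVH fresh per config
hit_rate = reused / (reused + computed)
print(reused, computed, f"{hit_rate:.1%}")  # 10 20 33.3%
```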

The actual speedup depends on your data, but morphology calculations (especially surface area) are often the most expensive, so avoiding 5 redundant morphology computations can provide significant time savings.

Configuration options

Deduplication is enabled by default, but you can disable or customize it:

# Default behavior - deduplication enabled (no explicit setting needed)
pipeline = RadiomicsPipeline()

# Disable deduplication (features computed independently for each config)
pipeline_no_dedup = RadiomicsPipeline(deduplicate=False)

# Lock to a specific rules version for reproducibility
pipeline_versioned = RadiomicsPipeline(deduplication_rules="1.0.0")

# Access deduplication configuration
print(f"Deduplication enabled: {pipeline.deduplication_enabled}")
print(f"Rules version: {pipeline.deduplication_rules.version}")

When to disable deduplication

In most cases, you should leave deduplication enabled (the default). However, you might disable it if:

| Scenario           | Reason                                                          |
|--------------------|-----------------------------------------------------------------|
| Debugging          | To verify that features are computed correctly for each config  |
| Memory constraints | Caching results consumes memory during the run                  |
| Single config      | No benefit when running only one configuration                  |

Serialization

Deduplication settings are preserved when exporting/importing pipeline configurations:

# Export includes deduplication settings
pipeline.save_configs("my_pipeline.yaml")

# Import preserves settings
loaded = RadiomicsPipeline.load_configs("my_pipeline.yaml")
print(f"Loaded deduplicate setting: {loaded.deduplication_enabled}")

Learn More

For detailed documentation of the deduplication system, including the underlying classes and how feature family dependencies are determined, see the Deduplication section in the Pipeline guide and the Deduplication API reference.

Case 8: Manual Feature Separation (Advanced)

Case 7 vs Case 8

  • Case 7 uses automatic deduplication (enabled by default) where the pipeline internally optimizes computation and copies results to matching configs
  • Case 8 demonstrates manual feature separation where YOU explicitly control which features are computed in which configs

For most users, automatic deduplication (Case 7) is simpler and sufficient. Manual separation gives you maximum control and transparency.

Scenario

You want complete control over which features are computed for which configurations, avoiding any redundant computation. This is an alternative to automatic deduplication that gives you explicit control over your pipeline structure.

This approach is ideal when:

  • You want clear separation between "baseline" features and "variant" features.
  • You're combining multiple filters with multiple discretization strategies.
  • You need to ensure morphology is computed exactly once across all configurations.

The Core Concept

Different feature families have different dependencies:

| Feature Family          | Depends On                                                            | Independent Of                             |
|-------------------------|-----------------------------------------------------------------------|--------------------------------------------|
| Morphology              | Mask geometry (resample, binarize_mask, keep_largest_component)       | Intensity values, filters, discretization  |
| Intensity               | Intensity preprocessing (resample, resegment, filter_outliers, filter) | Discretization                             |
| Texture, Histogram, IVH | All of the above + discretization                                     |                                            |

This means:

  1. Morphology can be computed once and reused across all configurations (including filtered ones).
  2. Intensity can be computed once per unique preprocessing+filter combination.
  3. Texture/Histogram/IVH must be computed for each discretization strategy.

Example 1: Standard Radiomics with Manual Separation

from pictologics import RadiomicsPipeline

# Common preprocessing
preprocess = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
    {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
]

pipeline = RadiomicsPipeline()

# Config 1: "orig" - morphology + intensity (computed ONCE)
pipeline.add_config("orig", preprocess + [
    {"step": "extract_features", "params": {
        "families": ["morphology", "intensity"],
    }},
])

# Configs 2-7: Discretization-dependent features only
for n_bins in (8, 16, 32):
    pipeline.add_config(f"fbn_{n_bins}", preprocess + [
        {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
        {"step": "extract_features", "params": {
            "families": ["texture", "histogram", "ivh"],  # NO morphology/intensity
        }},
    ])

for bin_width in (8.0, 16.0, 32.0):
    pipeline.add_config(f"fbs_{int(bin_width)}", preprocess + [
        {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
        {"step": "extract_features", "params": {
            "families": ["texture", "histogram", "ivh"],
        }},
    ])

# Run all configs
results = pipeline.run(
    image="path/to/image.nii.gz",
    mask="path/to/mask.nii.gz",
    config_names=["orig", "fbn_8", "fbn_16", "fbn_32", "fbs_8", "fbs_16", "fbs_32"],
)

# Results structure:
# - results["orig"] contains morphology + intensity (applies to all)
# - results["fbn_8"] contains texture/histogram/ivh for FBN with 8 bins
# - etc.
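
One practical consequence of manual separation is that no single config's result dictionary is complete on its own. A small helper (hypothetical, assuming each `results[cfg]` is a flat feature dict, as in the snippets above) can stitch the shared and variant features back into one wide row:

```python
def merge_split_results(results, shared_config, variant_configs):
    """Merge manually separated results into one wide feature row.

    Shared features keep their original names; variant features are
    prefixed with their config name, mirroring the wide CSV layout.
    """
    wide = dict(results[shared_config])
    for cfg in variant_configs:
        for feat, value in results[cfg].items():
            wide[f"{cfg}__{feat}"] = value
    return wide


# Toy data illustrating the shape (feature names invented):
results = {
    "orig": {"volume_ml": 12.5, "mean_intensity": 40.0},
    "fbn_8": {"glcm_contrast": 0.7},
    "fbn_16": {"glcm_contrast": 0.9},
}
row = merge_split_results(results, "orig", ["fbn_8", "fbn_16"])
# row now holds volume_ml, mean_intensity, fbn_8__glcm_contrast,
# and fbn_16__glcm_contrast in a single flat dict.
```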

Example 2: Filtered Radiomics with Manual Separation

When combining filters, remember:

  • Morphology is filter-independent (computed from mask geometry)
  • Intensity is filter-dependent (computed from filtered response map)

from pictologics import RadiomicsPipeline

preprocess = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
]

pipeline = RadiomicsPipeline()

# Config 1: Morphology from original image (computed ONCE, reused for all filters)
pipeline.add_config("orig_morphology", preprocess + [
    {"step": "extract_features", "params": {"families": ["morphology"]}},
])

# Config 2: Intensity from original (unfiltered) image
pipeline.add_config("orig_intensity", preprocess + [
    {"step": "extract_features", "params": {"families": ["intensity"]}},
])

# Configs 3+: Intensity from each filtered response map
for sigma in [1.5, 3.0, 5.0]:
    pipeline.add_config(f"log_{sigma}", preprocess + [
        {"step": "filter", "params": {"type": "log", "sigma_mm": sigma}},
        {"step": "extract_features", "params": {"families": ["intensity"]}},
    ])

pipeline.add_config("gabor", preprocess + [
    {"step": "filter", "params": {
        "type": "gabor", "sigma_mm": 5.0, "lambda_mm": 2.0, "gamma": 1.0,
        "rotation_invariant": True, "pooling": "average",
    }},
    {"step": "extract_features", "params": {"families": ["intensity"]}},
])

# Run all
results = pipeline.run(
    image="path/to/image.nii.gz",
    mask="path/to/mask.nii.gz",
)

# Results structure:
# - results["orig_morphology"] contains morphology (applies to ALL configs)
# - results["orig_intensity"] contains intensity from original image
# - results["log_1.5"] contains intensity from LoG σ=1.5 filtered image
# - etc.

Example 3: Combined Filters + Discretization

The most complex scenario: multiple filters AND multiple discretization strategies.

from pictologics import RadiomicsPipeline

preprocess = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
]

pipeline = RadiomicsPipeline()

# === TIER 1: Morphology (computed ONCE) ===
pipeline.add_config("orig_morphology", preprocess + [
    {"step": "extract_features", "params": {"families": ["morphology"]}},
])

# === TIER 2: Intensity per filter ===
# Original (unfiltered)
pipeline.add_config("orig_intensity", preprocess + [
    {"step": "extract_features", "params": {"families": ["intensity"]}},
])

# LoG filters
for sigma in [1.5, 3.0]:
    pipeline.add_config(f"log_{sigma}_intensity", preprocess + [
        {"step": "filter", "params": {"type": "log", "sigma_mm": sigma}},
        {"step": "extract_features", "params": {"families": ["intensity"]}},
    ])

# === TIER 3: Texture per discretization (original image only) ===
for n_bins in (16, 32):
    pipeline.add_config(f"orig_fbn_{n_bins}", preprocess + [
        {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
        {"step": "extract_features", "params": {"families": ["texture", "histogram"]}},
    ])

# Note: Texture from filtered images would require discretization of the filtered
# response map, which is less common but can be added similarly if needed.
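
If you do need texture from a filtered response map, the step-list pattern above extends naturally: filter first, then discretise the response map, then extract. A sketch built as plain data (untested, with hypothetical parameter values):

```python
# Step list following the same pattern as the configs above.
preprocess = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
]
filtered_texture_steps = preprocess + [
    {"step": "filter", "params": {"type": "log", "sigma_mm": 1.5}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
    {"step": "extract_features", "params": {"families": ["texture", "histogram"]}},
]
# pipeline.add_config("log_1.5_fbn_32", filtered_texture_steps)
```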

Comparison: Manual vs Automatic Deduplication

| Aspect           | Manual Separation               | Automatic Deduplication         |
|------------------|---------------------------------|---------------------------------|
| Setup complexity | More config definitions         | Simpler (one config per variant) |
| Result structure | Features split across configs   | All features in each config     |
| Memory usage     | Lower (no caching)              | Higher (cached results)         |
| Transparency     | Explicit control                | Magic happens internally        |
| Filter handling  | You control morphology placement | Morphology auto-shared          |

When to Use Manual Separation

Recommended For

  • Complex filter + discretization combinations
  • Maximum memory efficiency
  • When you want explicit control over computation
  • Research scenarios requiring clear provenance

Consider Automatic Deduplication Instead

  • Simple discretization-only scenarios (automatic dedup handles this well)
  • When you want all features in each result dictionary
  • Rapid prototyping