Cookbook
Practical, end-to-end recipes for common Pictologics workflows. Each recipe shows how to combine loading, preprocessing, feature extraction, and result export into reusable scripts.
Share Your Configurations
All custom configurations used in these recipes can be exported to YAML/JSON files for reproducibility.
Use pipeline.save_configs("my_configs.yaml") to save and share configurations.
See the Configuration & Reproducibility guide for details.
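For example, a minimal save/load round-trip (the same pair of calls shown in Case 7's Serialization section):

from pictologics import RadiomicsPipeline

pipeline = RadiomicsPipeline()
# ... pipeline.add_config(...) calls ...
pipeline.save_configs("my_configs.yaml")                      # export all registered configs
restored = RadiomicsPipeline.load_configs("my_configs.yaml")  # rebuild the pipeline elsewhere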
Case 1: Batch radiomics from a folder of NIfTI files (no masks)
Scenario
You have a folder of NIfTI volumes where each file is a separate pre-cropped lesion export. There are no separate mask files — voxels outside the region of interest may be filled with a sentinel value (e.g., -2048 HU).
You want to:
- Process every `*.nii` / `*.nii.gz` file in a folder.
- Use the whole image as the initial ROI (because no mask is provided).
- Automatically detect and handle sentinel values in each image.
- Use sentinel-aware resampling and filtering (masked interpolation / normalized convolution).
- Resample to 0.5×0.5×0.5 mm.
- Restrict the ROI to intensities in [-100, 3000] (CT HU example).
- Remove disjoint parts by keeping the largest connected component.
- Compute all radiomic feature families for four discretisation settings:
    - FBN with `n_bins=8`
    - FBN with `n_bins=16`
    - FBS with `bin_width=8`
    - FBS with `bin_width=16`
- Export a single wide CSV where:
    - Each row is one input NIfTI file.
    - Columns include metadata (e.g., filename) + all computed features.
- Save pipeline logs to a separate folder.
Important notes
Note
- Maskless pipeline runs: `RadiomicsPipeline.run(...)` accepts `mask=None`, `mask=""`, or an omitted mask argument. In that case, Pictologics generates a full (all-ones) ROI mask internally (whole-image ROI).
- Empty ROI is an error: If your preprocessing removes all voxels (e.g., a too-tight HU range), the pipeline raises a clear error instead of silently returning empty outputs.
- Morphology on whole-image ROI: Shape features describe the shape of the ROI mask. With a maskless run, the ROI starts as the full image volume. The `resegment` step then restricts the ROI to voxels within a valid intensity range, removing sentinel-valued voxels. After resegmentation and connected-component filtering, the resulting morphology features describe the shape of the actual data region (e.g., the lesion) rather than the full image bounding box. This is generally the desired behavior, but verify that the derived ROI matches your scientific intent.
- Sentinel value handling — two complementary mechanisms: These NIfTI files may contain sentinel values (e.g., -1024, -2048) marking missing or invalid data. Handling them properly requires both of the following:
    - `source_mode="auto"` creates a source mask that protects resampling and filtering only. It applies masked interpolation (resampling ignores sentinel neighbors) and normalized convolution (filter kernels exclude sentinel voxels). However, it does not change the ROI — sentinel voxels still remain in the region used for feature extraction.
    - `resegment` restricts the ROI to a valid intensity range, effectively excluding sentinel voxels from feature extraction. Without this step, sentinel values would bias intensity statistics, texture matrices, and all other computed features.
    You typically need both: `source_mode="auto"` to protect preprocessing, and `resegment` to define the correct ROI for analysis.
- Automatic sentinel detection: With `source_mode="auto"`, the pipeline scans for common sentinel values (-2048, -1024, -1000, 0, -32768) and emits a `UserWarning` if one is found, so you can verify which value was detected in each image.
- Mask generation: When `mask` is omitted, the pipeline generates a full ROI mask. The `resegment` step then removes sentinel voxels from the ROI, effectively deriving the correct analysis mask automatically from the non-sentinel voxels.
Performance Tip: Manual Feature Separation
This example demonstrates manual feature separation for maximum efficiency:
- `case1_orig`: Extracts morphology and intensity features (discretization-independent) — computed once
- `case1_fbn_*` / `case1_fbs_*`: Extract only texture, histogram, and IVH features (discretization-dependent)
This approach avoids redundant computation entirely. The `orig` results contain morphology/intensity features that apply to all discretization strategies, while each discretization config contains only the features that actually depend on it. See Case 8 for a detailed explanation of this technique.
Note that you don't have to do this manually — the pipeline's automatic deduplication can detect shared preprocessing steps across configurations and avoid recomputing them for you. This is demonstrated in Case 7.
Full example script
from pathlib import Path

from pictologics import RadiomicsPipeline
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    input_dir = Path("path/to/nifti_folder")
    output_csv = Path("results.csv")
    log_dir = Path("logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Define common preprocessing steps
    base_steps = [
        # Resample to 0.5 mm isotropic. Round intensities to integers (useful for HU).
        {"step": "resample", "params": {"new_spacing": (0.5, 0.5, 0.5), "round_intensities": True}},
        # Exclude voxels outside the standard HU range
        {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
        # Keep only the largest connected component of the ROI mask
        {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
    ]

    # Shared feature extraction for discretization-independent features
    extract_orig = {
        "step": "extract_features",
        "params": {
            "families": ["morphology", "intensity"],  # Not affected by discretization
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Shared feature extraction for discretization-dependent features
    extract_discretized = {
        "step": "extract_features",
        "params": {
            "families": ["texture", "histogram", "ivh"],  # Require discretization
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Initialize the pipeline (deduplication enabled by default)
    pipeline = RadiomicsPipeline()

    # Add "orig" config for discretization-independent features (computed once).
    # source_mode="auto" detects sentinel values and protects resampling/filtering.
    pipeline.add_config(
        "case1_orig",
        base_steps + [extract_orig],
        source_mode="auto",
    )

    # Add discretization-dependent configs (texture/histogram/ivh only)
    for n_bins in (8, 16):
        pipeline.add_config(
            f"case1_fbn_{n_bins}",
            base_steps + [
                {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
                extract_discretized,
            ],
            source_mode="auto",
        )
    for bin_width in (8, 16):
        pipeline.add_config(
            f"case1_fbs_{bin_width}",
            base_steps + [
                {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
                extract_discretized,
            ],
            source_mode="auto",
        )

    # Prepare for batch processing
    rows = []
    nifti_paths = sorted(input_dir.glob("*.nii*"))
    if not nifti_paths:
        raise ValueError(f"No NIfTI files found in: {input_dir}")

    # Process each NIfTI file
    for path in nifti_paths:
        # Simple suffix stripping using Python 3.9+ removesuffix
        subject_id = path.name.removesuffix(".nii.gz").removesuffix(".nii")

        pipeline.clear_log()

        # Run pipeline (mask omitted -> whole-image ROI).
        # "case1_orig" contains morphology/intensity; the others contain texture/histogram/ivh.
        results = pipeline.run(
            image=str(path),
            subject_id=subject_id,
            config_names=["case1_orig", "case1_fbn_8", "case1_fbn_16", "case1_fbs_8", "case1_fbs_16"],
        )

        # Format results as a flat dictionary and add metadata
        row = format_results(
            results,
            fmt="wide",
            meta={"subject_id": subject_id, "file": str(path)},
        )
        rows.append(row)

        # Save per-case logs
        pipeline.save_log(str(log_dir / f"{subject_id}.json"))

    # Consolidated export of all results
    save_results(rows, output_csv)
    print(f"Wrote {len(rows)} rows to {output_csv}")


if __name__ == "__main__":
    main()
Output format
- One row per file.
- Columns include:
    - `subject_id`
    - `file`
    - Feature columns prefixed by configuration name (e.g., `case1_fbn_8__mean_intensity_Q4LE`).
Alternative: Explicit Sentinel Value
If you know the sentinel value in advance, use `source_mode="roi_only"` with `sentinel_value` for deterministic behavior:
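A minimal sketch, reusing `base_steps` and `extract_orig` from the full example above; the keyword names follow the options described in this recipe:

# Sketch: sentinel value known in advance (e.g., -2048 padding in these exports)
pipeline.add_config(
    "case1_orig",
    base_steps + [extract_orig],
    source_mode="roi_only",   # skip auto-detection; trust the explicit value
    sentinel_value=-2048,     # known sentinel marking out-of-ROI voxels
)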
Case 2: Batch radiomics from DICOM case folders (Image + Segmentation)
Scenario
You have a folder of cases. Each case is a separate folder and contains two subfolders:
- `Image/`: the DICOM image series (CT/MR/etc.), stored at an arbitrary depth.
- `Segmentation/`: the DICOM segmentation, also stored at an arbitrary depth.
You want to:
- For each case, recursively discover the DICOM series folders.
- Load the image series and segmentation series.
- Convert the segmentation to a binary ROI mask by keeping all voxels where the segmentation value is `> 0` (handled automatically during loading).
- Resample to 1×1×1 mm.
- Do not apply intensity filtering/resegmentation, and do not keep the largest connected component.
- Compute all radiomic feature families for two discretisations:
    - FBS with `bin_width=256`
    - FBN with `n_bins=64`
- Export a long-format CSV (tidy data, one row per feature).
- Save pipeline logs into a separate folder (one JSON per case).
- Show a progress bar during batch processing.
Notes
Note
- Progress bar dependency: This example uses `tqdm`.
    - If you are running it outside the library repo, install it with `pip install tqdm`.
    - If you are adding it to your Poetry-managed project, use `poetry add tqdm`.
- Segmentation DICOM at arbitrary depth: The helper `_find_dicom_series_root(...)` looks for the subfolder with the most `.dcm` files and uses that as the series root.
- Multiple masks in one SEG: If your segmentation DICOM encodes multiple labels (e.g., values 1..N), the `_binarize_segmentation_mask(...)` step turns it into a single ROI by keeping all voxels where the value is `> 0`.
- Multi-phase DICOM series: If your image series contains multiple phases (e.g., cardiac CT with 10%, 20%, ... phases), use `get_dicom_phases()` to discover available phases and `dataset_index` to select one:
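A minimal sketch of phase discovery; the import path for `get_dicom_phases` and the structure of its return value are assumptions here:

from pictologics import load_image
from pictologics.loaders import get_dicom_phases  # import path assumed

phases = get_dicom_phases("path/to/case/Image")   # list the phases found in the series
print(phases)

# Load only the desired phase by index
image = load_image("path/to/case/Image", recursive=True, dataset_index=0)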
Full example script
from pathlib import Path

from pictologics import RadiomicsPipeline, load_image, load_and_merge_images
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    cases_dir = Path("path/to/cases_root")
    output_csv = Path("dicom_results_long.csv")
    log_dir = Path("dicom_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Find all case directories
    case_dirs = sorted([p for p in cases_dir.iterdir() if p.is_dir()])
    if not case_dirs:
        raise ValueError(f"No case folders found in: {cases_dir}")

    # Shared feature extraction settings
    extract_all = {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Initialize the pipeline
    pipeline = RadiomicsPipeline()

    # Configuration A: Fixed Bin Size
    pipeline.add_config(
        "case2_fbs_256",
        [
            {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
            {"step": "discretise", "params": {"method": "FBS", "bin_width": 256.0}},
            extract_all,
        ],
    )

    # Configuration B: Fixed Bin Number
    pipeline.add_config(
        "case2_fbn_64",
        [
            {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
            {"step": "discretise", "params": {"method": "FBN", "n_bins": 64}},
            extract_all,
        ],
    )

    # Use tqdm for a progress bar
    from tqdm import tqdm

    rows = []
    for case_dir in tqdm(case_dirs, desc="Radiomics (DICOM cases)", unit="case"):
        subject_id = case_dir.name
        image_root = case_dir / "Image"
        seg_root = case_dir / "Segmentation"

        # load_image with recursive=True finds the best series folder automatically
        image = load_image(str(image_root), recursive=True)

        # Load the segmentation using load_and_merge_images with binarize=True.
        # recursive=True ensures we find the DICOM series inside the Segmentation folder.
        mask = load_and_merge_images(
            [str(seg_root)],
            binarize=True,
            recursive=True,
        )

        pipeline.clear_log()

        # Execute extraction for both configurations
        results = pipeline.run(
            image=image,
            mask=mask,
            subject_id=subject_id,
            config_names=["case2_fbs_256", "case2_fbn_64"],
        )

        # Format results and store
        row = format_results(
            results,
            fmt="long",  # Tidy format: [subject_id, config, feature_name, value]
            meta={
                "subject_id": subject_id,
                "image_root": str(image_root),
                "seg_root": str(seg_root),
            },
        )
        rows.append(row)

        # Save per-case logs
        pipeline.save_log(str(log_dir / f"{subject_id}.json"))

    # Final export
    save_results(rows, output_csv)
    print(f"Wrote {len(rows)} rows to {output_csv}")


if __name__ == "__main__":
    main()
Case 3: Batch radiomics from a flat NIfTI folder (multiple masks per image)
Scenario
You have a single large folder containing both images and masks as NIfTI files.
- Image files end with `_IMG` (e.g., `CASE001_IMG.nii.gz`).
- Mask files end with `_MASKn`, where `n` is a number (e.g., `CASE001_MASK1.nii.gz`, `CASE001_MASK2.nii.gz`).
- There can be multiple segmentation masks per image.
You want to:
- For each image, automatically find all its corresponding masks.
- Merge all masks into a single ROI using `load_and_merge_images(...)`.
- Do not apply any preprocessing (no resampling, no thresholding, no connected components).
- Compute radiomics for the six standard discretisations (FBN 8/16/32 and FBS 8/16/32).
- Export a long-format JSON file for easy ingestion into NoSQL databases or web apps.
- Save logs for each case into a separate folder.
- Show a progress bar during batch processing.
Tip
Flexible Image Loading
The `load_and_merge_images` function uses `load_image` for each path in its input list. This means you can merge:
- Multiple NIfTI files (as shown here).
- DICOM series folders.
- Single DICOM slice files.
- Or a mix of these formats, provided they share the same spatial geometry.
You can also pass `recursive=True` (to search subfolders) or `dataset_index=N` (for 4D files) to control how each mask is loaded.
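For instance, a hedged sketch of a mixed-format merge (placeholder paths; all inputs must share the same spatial geometry):

mask = load_and_merge_images(
    [
        "path/to/roi_a.nii.gz",       # a NIfTI mask
        "path/to/seg_series_folder",  # a DICOM series folder
    ],
    reference_image=image,  # geometry reference (an already-loaded Image)
    binarize=True,
    recursive=True,
)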
Performance Tip: Deduplication
This example runs 6 configurations where only discretisation differs. With deduplication enabled by default, morphology and intensity features are computed once and reused across all configurations, providing a significant speedup. To disable this optimization, set `deduplicate=False`.
See Deduplication for details.
Full example script
from pathlib import Path

from pictologics import RadiomicsPipeline, load_and_merge_images, load_image
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    input_dir = Path("path/to/nifti_folder")
    output_file = Path("case3_results_long.json")
    log_dir = Path("case3_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Use tqdm for a progress bar
    from tqdm import tqdm

    # Identify all images (files ending in _IMG)
    image_paths = sorted(input_dir.glob("*_IMG.nii*"))
    if not image_paths:
        raise ValueError(f"No *_IMG NIfTI files found in: {input_dir}")

    # Initialize the pipeline (deduplication is enabled by default).
    # Since all 6 configs share identical preprocessing (only discretisation differs),
    # morphology and intensity features are computed once and reused across all configs.
    pipeline = RadiomicsPipeline()

    # Shared feature extraction settings
    extract_all = {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Add 6 target configurations (no preprocessing requested)
    for n_bins in (8, 16, 32):
        pipeline.add_config(
            f"case3_fbn_{n_bins}",
            [
                {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
                extract_all,
            ],
        )
    for bin_width in (8.0, 16.0, 32.0):
        pipeline.add_config(
            f"case3_fbs_{int(bin_width)}",
            [
                {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
                extract_all,
            ],
        )

    target_configs = [
        "case3_fbn_8", "case3_fbn_16", "case3_fbn_32",
        "case3_fbs_8", "case3_fbs_16", "case3_fbs_32",
    ]

    # Process each image and its associated masks
    rows = []
    for img_path in tqdm(image_paths, desc="Radiomics (NIfTI images)", unit="image"):
        # Strip extensions and the _IMG suffix to get the case ID
        subject_id = img_path.name.removesuffix(".nii.gz").removesuffix(".nii").removesuffix("_IMG")

        # Find all corresponding masks (e.g., CASE001_MASK1.nii.gz, etc.)
        mask_paths = sorted(input_dir.glob(f"{subject_id}_MASK*.nii*"))
        if not mask_paths:
            raise ValueError(f"No masks found for {subject_id}")

        image = load_image(str(img_path))

        # Merge multiple masks into one and ensure binary semantics
        mask = load_and_merge_images(
            [str(p) for p in mask_paths],
            reference_image=image,
            binarize=True,
        )

        pipeline.clear_log()

        # Run extraction for all target configurations
        results = pipeline.run(
            image=image,
            mask=mask,
            subject_id=subject_id,
            config_names=target_configs,
        )

        # Format results and store metadata
        row = format_results(
            results,
            fmt="long",
            meta={
                "subject_id": subject_id,
                "image": str(img_path),
                "masks": ";".join(str(p) for p in mask_paths),
            },
        )
        rows.append(row)

        # Save per-case logs
        pipeline.save_log(str(log_dir / f"{subject_id}.json"))

    # Consolidated export (save_results handles a list of DataFrames for long format automatically)
    save_results(rows, output_file)
    print(f"Wrote {len(rows)} cases to {output_file}")


if __name__ == "__main__":
    main()
Case 4: Parallel batch radiomics from DICOM cases (merge multiple segmentation folders)
Scenario
You have a folder of cases. Each case is a separate folder and contains:
- `Image/`: the DICOM image series (CT/MR/etc.), stored at an arbitrary depth.
- `Segmentation/`: multiple subfolders, each containing a segmentation series at an arbitrary depth.
You want to:
- For each case, recursively discover the DICOM image series folder.
- Discover all segmentation series folders under `Segmentation/` and load them.
- Convert each segmentation to a binary mask (values `> 0` become 1).
- Merge all binary masks into a single ROI mask per case.
- Apply all preprocessing steps supported by the pipeline (resample, resegment, outlier filtering, intensity rounding, and largest connected component).
- Compute radiomics for six discretisations (FBN 8/16/32 and FBS 8/16/32).
- Run cases in parallel on multiple CPU cores, with a user-controlled `n_jobs`.
- Export a single wide JSON file (one object per case) and save per-case logs.
Notes
Note
- Progress bar dependency: This example uses `tqdm`.
    - If you are running it outside the library repo, install it with `pip install tqdm`.
    - If you are adding it to your Poetry-managed project, use `poetry add tqdm`.
- Multiprocessing requirement: On Windows/macOS, keep the parallel execution inside `if __name__ == "__main__":` (as shown) to avoid process-spawn issues.
- JIT warmup in parallel workers: Pictologics performs a Numba JIT warmup at package import. With `ProcessPoolExecutor`, each worker is a separate Python process, so warmup happens once per worker process (on its first import of `pictologics`) and then stays warm for all cases that worker processes. It is not re-run for every case unless you explicitly call `warmup_jit()` inside your per-case function. You can disable auto-warmup via `PICTOLOGICS_DISABLE_WARMUP=1` if you prefer to skip the upfront cost.
- Preprocessing parameters are dataset-dependent: The `resegment` range here uses the CT HU example `[-100, 3000]`. Adjust or remove it for non-CT data.
Performance Tip: Manual Feature Separation
This example demonstrates manual feature separation combined with parallel processing:
- `case4_orig`: Extracts morphology and intensity features (discretization-independent) — computed once per case
- `case4_fbn_*` / `case4_fbs_*`: Extract only texture, histogram, and IVH features (discretization-dependent)
This approach ensures discretization-independent features are never recomputed. See Case 8 for a detailed explanation of this technique.
Full example script
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

from pictologics import RadiomicsPipeline, load_image, load_and_merge_images
from pictologics.results import format_results, save_results


def collect_segmentation_series_roots(seg_root):
    """Collect all segmentation series subfolders."""
    if not seg_root.exists():
        raise ValueError(f"Folder does not exist: {seg_root}")
    subdirs = sorted([p for p in seg_root.iterdir() if p.is_dir()])
    if not subdirs:
        return [seg_root]
    return subdirs


def build_case4_pipeline():
    """Define the pipeline with manual feature separation for efficiency."""
    pipeline = RadiomicsPipeline()

    # Define standard CT preprocessing
    preprocess_steps = [
        {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
        {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
        {"step": "filter_outliers", "params": {"sigma": 3.0}},
        {"step": "round_intensities", "params": {}},
        {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
    ]

    # Discretization-independent features (computed once)
    extract_orig = {
        "step": "extract_features",
        "params": {
            "families": ["morphology", "intensity"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Discretization-dependent features only
    extract_discretized = {
        "step": "extract_features",
        "params": {
            "families": ["texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # Add "orig" config for discretization-independent features
    pipeline.add_config("case4_orig", preprocess_steps + [extract_orig])

    # Add discretisation variants (texture/histogram/ivh only)
    for n_bins in (8, 16, 32):
        pipeline.add_config(
            f"case4_fbn_{n_bins}",
            preprocess_steps + [{"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}}, extract_discretized],
        )
    for bin_width in (8.0, 16.0, 32.0):
        pipeline.add_config(
            f"case4_fbs_{int(bin_width)}",
            preprocess_steps + [{"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}}, extract_discretized],
        )

    config_names = ["case4_orig"] + [f"case4_fbn_{b}" for b in (8, 16, 32)] + [f"case4_fbs_{b}" for b in (8, 16, 32)]
    return pipeline, config_names


def process_case(case_dir, log_dir):
    """Worker function for single-case processing."""
    case_path = Path(case_dir)
    subject_id = case_path.name
    image_root = case_path / "Image"
    seg_root = case_path / "Segmentation"

    # Load the image recursively
    image = load_image(str(image_root), recursive=True)

    # Load and merge all found segmentations
    seg_roots = collect_segmentation_series_roots(seg_root)

    # load_and_merge_images handles multiple paths, geometry checking, and binarization
    try:
        mask = load_and_merge_images(
            [str(p) for p in seg_roots],
            reference_image=image,
            binarize=True,
            recursive=True,
        )
    except ValueError as e:
        raise ValueError(f"Failed to load/merge masks for {subject_id}: {e}") from e

    # Set up and run the pipeline
    pipeline, config_names = build_case4_pipeline()
    pipeline.clear_log()
    results = pipeline.run(image=image, mask=mask, subject_id=subject_id, config_names=config_names)

    # Save results and log
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    pipeline.save_log(str(Path(log_dir) / f"{subject_id}.json"))

    return format_results(
        results,
        fmt="wide",
        meta={
            "subject_id": subject_id,
            "image_root": str(image_root),
            "seg_roots": ";".join(str(p) for p in seg_roots),
        },
    )


def main():
    # Configure paths
    cases_dir = Path("path/to/cases_root")
    output_file = Path("case4_parallel_results.json")
    log_dir = Path("case4_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Start parallel processing
    n_jobs = 4
    case_dirs = sorted([p for p in cases_dir.iterdir() if p.is_dir()])
    if not case_dirs:
        raise ValueError(f"No case folders found in: {cases_dir}")

    from tqdm import tqdm

    rows = []
    errors = []

    # Map each case to the worker function
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        futures = {
            executor.submit(process_case, str(case_dir), str(log_dir)): case_dir
            for case_dir in case_dirs
        }
        with tqdm(total=len(futures), desc="Radiomics (parallel cases)", unit="case") as pbar:
            for fut in as_completed(futures):
                case_dir = futures[fut]
                try:
                    rows.append(fut.result())
                except Exception as e:
                    errors.append((str(case_dir), repr(e)))
                finally:
                    pbar.update(1)

    if errors:
        msg = "\n".join(f"- {case}: {err}" for case, err in errors)
        raise RuntimeError(f"One or more cases failed:\n{msg}")

    # Final data export
    save_results(rows, output_file)
    print(f"Wrote {len(rows)} cases to {output_file}")


if __name__ == "__main__":
    main()
Case 5: Batch radiomics from DICOM SEG files with multiple segments
Scenario
You have a folder of cases. Each case contains:
- `Image/`: the DICOM image series (CT/MR/etc.)
- `segmentation.dcm`: a DICOM SEG file with multiple labeled segments (e.g., liver, spleen, kidneys)
You want to:
- Load the image and inspect the available segments in the SEG file.
- Process each segment separately to get per-organ radiomics.
- Apply standard preprocessing and compute features for each segment.
- Export results with segment labels in the output.
Notes
Note
- `get_segment_info()` returns metadata about each segment (number, label, algorithm).
- `load_seg()` with `combine_segments=False` returns a dict mapping segment numbers to `Image` objects.
- Alignment: Use `reference_image` to ensure the SEG mask matches the CT geometry.
Full example script
from pathlib import Path

from pictologics import load_image, load_seg, RadiomicsPipeline
from pictologics.loaders import get_segment_info
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    cases_dir = Path("path/to/cases_root")
    output_csv = Path("case5_per_segment_results.csv")
    log_dir = Path("case5_logs")
    log_dir.mkdir(parents=True, exist_ok=True)

    # Find all case directories
    case_dirs = sorted([p for p in cases_dir.iterdir() if p.is_dir()])
    if not case_dirs:
        raise ValueError(f"No case folders found in: {cases_dir}")

    # Initialize the pipeline with a standard config
    pipeline = RadiomicsPipeline()
    pipeline.add_config(
        "case5_fbn_32",
        [
            {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
            {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
            {
                "step": "extract_features",
                "params": {
                    "families": ["intensity", "morphology", "texture", "histogram"],
                    "include_spatial_intensity": False,
                    "include_local_intensity": False,
                },
            },
        ],
    )

    from tqdm import tqdm

    rows = []
    for case_dir in tqdm(case_dirs, desc="Radiomics (per-segment)", unit="case"):
        subject_id = case_dir.name
        image_root = case_dir / "Image"
        seg_file = case_dir / "segmentation.dcm"

        # Load the reference image
        image = load_image(str(image_root), recursive=True)

        # Inspect available segments
        segments = get_segment_info(str(seg_file))
        print(f"\n{subject_id}: Found {len(segments)} segments")
        for seg in segments:
            print(f"  Segment {seg['segment_number']}: {seg['segment_label']}")

        # Load each segment separately, aligned to the image geometry
        segment_masks = load_seg(
            str(seg_file),
            combine_segments=False,  # Returns dict {seg_num: Image}
            reference_image=image,
        )

        # Process each segment
        for seg_num, mask in segment_masks.items():
            # Find the segment label from metadata
            seg_info = next(s for s in segments if s["segment_number"] == seg_num)
            seg_label = seg_info["segment_label"]

            pipeline.clear_log()
            results = pipeline.run(
                image=image,
                mask=mask,
                subject_id=f"{subject_id}_{seg_label}",
                config_names=["case5_fbn_32"],
            )

            row = format_results(
                results,
                fmt="wide",
                meta={
                    "subject_id": subject_id,
                    "segment_number": seg_num,
                    "segment_label": seg_label,
                },
            )
            rows.append(row)

            # Save the per-segment log
            pipeline.save_log(str(log_dir / f"{subject_id}_{seg_label}.json"))

    # Export all results
    save_results(rows, output_csv)
    print(f"\nWrote {len(rows)} rows to {output_csv}")


if __name__ == "__main__":
    main()
Output format
- One row per segment per case.
- Columns include:
    - `subject_id` - Case identifier
    - `segment_number` - Numeric segment ID from the SEG file
    - `segment_label` - Human-readable segment name (e.g., "Liver", "Spleen")
    - Feature columns prefixed by configuration name
Case 6: Filtered radiomics using IBSI 2 filters
Scenario
You want to extract radiomic features from filtered response maps (IBSI 2 paradigm). This is useful for:
- Capturing texture at multiple scales (LoG with different σ values).
- Extracting directional texture patterns (Gabor filters).
- Multi-resolution analysis (wavelet decomposition).
You want to:
- Apply multiple filters to each image.
- Extract intensity features from each filtered response map.
- Compare features across filter types and parameters.
Key concepts
- Filter step before feature extraction: Apply `filter` after preprocessing but before `extract_features`.
- Intensity features from filtered images: First-order statistics (mean, variance, skewness) capture texture properties at different scales.
- Morphology is filter-independent: Morphology features depend only on mask geometry, not intensity values. Compute morphology once from the original (unfiltered) image and reuse across all filter configurations.
- No discretisation for filtered images: IBSI 2 Phase 2 recommends intensity features from continuous filtered values.
Full example script
from pathlib import Path

from pictologics import RadiomicsPipeline
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    image_path = Path("path/to/image.nii.gz")
    mask_path = Path("path/to/mask.nii.gz")
    output_csv = Path("filtered_radiomics.csv")

    # Initialize the pipeline
    pipeline = RadiomicsPipeline()

    # IBSI 2 Phase 2 preprocessing (Config B)
    preprocess_steps = [
        {"step": "resample", "params": {
            "new_spacing": (1.0, 1.0, 1.0),
            "interpolation": "cubic",
        }},
        {"step": "round_intensities", "params": {}},
        {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
    ]

    # --- Configuration: Morphology from the original image (computed once) ---
    # Morphology features depend on mask geometry, NOT intensity values,
    # so they are identical regardless of which filter is applied.
    pipeline.add_config(
        "orig",
        preprocess_steps + [
            {"step": "extract_features", "params": {"families": ["morphology"]}},
        ],
    )

    # Feature extraction (intensity only for filtered images)
    extract_intensity = {
        "step": "extract_features",
        "params": {"families": ["intensity"]},
    }

    # Collect filter config names
    filter_configs = []

    # --- Configuration 1: LoG at multiple scales ---
    for sigma in [1.5, 3.0, 5.0]:
        name = f"log_sigma_{sigma}"
        pipeline.add_config(
            name,
            preprocess_steps + [
                {"step": "filter", "params": {
                    "type": "log",
                    "sigma_mm": sigma,
                    "truncate": 4.0,
                }},
                extract_intensity,
            ],
        )
        filter_configs.append(name)

    # --- Configuration 2: Gabor filter ---
    name = "gabor_5mm"
    pipeline.add_config(
        name,
        preprocess_steps + [
            {"step": "filter", "params": {
                "type": "gabor",
                "sigma_mm": 5.0,
                "lambda_mm": 2.0,
                "gamma": 1.5,
                "rotation_invariant": True,
                "pooling": "average",
            }},
            extract_intensity,
        ],
    )
    filter_configs.append(name)

    # --- Configuration 3: Wavelet decomposition ---
    for decomp in ["LLH", "HHL", "HHH"]:
        name = f"wavelet_{decomp}"
        pipeline.add_config(
            name,
            preprocess_steps + [
                {"step": "filter", "params": {
                    "type": "wavelet",
                    "wavelet": "db3",
                    "level": 1,
                    "decomposition": decomp,
                    "rotation_invariant": True,
                    "pooling": "average",
                }},
                extract_intensity,
            ],
        )
        filter_configs.append(name)

    # --- Configuration 4: Laws texture energy ---
    name = "laws_L5E5E5"
    pipeline.add_config(
        name,
        preprocess_steps + [
            {"step": "filter", "params": {
                "type": "laws",
                "kernel": "L5E5E5",
                "rotation_invariant": True,
                "pooling": "max",
                "compute_energy": True,
                "energy_distance": 7,
            }},
            extract_intensity,
        ],
    )
    filter_configs.append(name)

    # Run the pipeline - include "orig" for morphology + all filter configs for intensity
    all_configs = ["orig"] + filter_configs
    results = pipeline.run(
        image=str(image_path),
        mask=str(mask_path),
        subject_id="subject_001",
        config_names=all_configs,
    )

    # Format and save
    row = format_results(
        results,
        fmt="wide",
        meta={"subject_id": "subject_001"},
    )
    save_results([row], output_csv)
    print(f"Saved filtered radiomics to {output_csv}")

    # Display a key feature for each filter
    print("\nMean values from each filter:")
    for config in filter_configs:
        mean_val = results[config].get("mean_intensity_Q4LE", "N/A")
        print(f"  {config}: {mean_val:.4f}" if isinstance(mean_val, float) else f"  {config}: {mean_val}")


if __name__ == "__main__":
    main()
Output format
- One row per subject with features from each filter as separate columns.
- Column names use the pattern `{filter_config}__mean_intensity_Q4LE` (e.g., `log_sigma_1.5__mean_intensity_Q4LE`).
Combining with standard texture features
You can also run both filtered and standard texture configs together:
# Standard texture config (unfiltered)
pipeline.add_config("standard_fbn_32", [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
    {"step": "extract_features", "params": {"families": ["texture"]}},
])

# Run all configs together
all_configs = filter_configs + ["standard_fbn_32"]
results = pipeline.run(image, mask, config_names=all_configs)
Output Options
For details on result formatting (wide vs. long, `output_type` options) and export functions, see Working with Results in the Pipeline & Preprocessing guide.
Case 7: Multi-configuration batch with deduplication
Scenario
You want to run a comprehensive radiomic analysis using multiple discretization strategies (e.g., FBN with 8/16/32 bins and FBS with 8/16/32 bin widths) while maximizing performance. All configurations share the same preprocessing steps.
Deduplication is Enabled by Default
Starting with pictologics v0.3.2, deduplication is enabled by default (deduplicate=True). You don't need to explicitly enable it—just create a pipeline and run multiple configurations to benefit from automatic optimization.
You want to:
- Apply standard preprocessing (resample, resegment, filter outliers, keep largest component).
- Run 6 discretization strategies for comprehensive coverage.
- Optimize performance by avoiding redundant computation of features that don't depend on discretization.
- Track and report deduplication statistics to understand the performance benefit.
Key concepts
The deduplication system understands which feature families depend on which preprocessing steps:
| Feature Family | Depends On |
|---|---|
| Morphology | Resampling, mask operations (resegment, keep_largest_component, filter_outliers) |
| Intensity | Resampling, intensity preprocessing (resegment, filter_outliers, round_intensities) |
| Texture / Histogram | All of the above plus discretization |
| IVH | Depends on IVH-specific settings (ivh_use_continuous, ivh_discretisation) |
When configs share preprocessing but differ only in discretization:
- Morphology and intensity features are computed once and reused.
- Texture, histogram, and (typically) IVH are computed per configuration.
How Results Are Handled
Complete Results for All Configurations
When deduplication reuses features, they are copied into the results dictionary—never empty or missing. Every configuration returns a complete feature set with identical values for reused families.
| What happens | Details |
|---|---|
| First config | All feature families are computed and stored in cache |
| Subsequent configs | Matching families are copied from cache; differing families are computed fresh |
| Final results | Every config has complete features—no NaN values, no missing columns |
This means when you concatenate results into a DataFrame for machine learning or statistical analysis, all rows are complete:
import pandas as pd
# Each config has complete results
df = pd.DataFrame([results[cfg] for cfg in config_names])
print(df.isna().sum().sum()) # 0 - no missing values!
This can reduce total computation time significantly—especially for morphology features, which involve expensive surface area calculations.
Full example script
from pathlib import Path
import time

from pictologics import RadiomicsPipeline
from pictologics.results import format_results, save_results


def main():
    # Configure paths
    image_path = Path("path/to/image.nii.gz")
    mask_path = Path("path/to/mask.nii.gz")
    output_csv = Path("case7_dedup_results.csv")

    # Define shared preprocessing steps
    preprocess_steps = [
        {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
        {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
        {"step": "filter_outliers", "params": {"sigma": 3.0}},
        {"step": "round_intensities", "params": {}},
        {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
    ]

    # Shared feature extraction settings
    extract_all = {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": False,
            "include_local_intensity": False,
        },
    }

    # ========================================
    # Run WITH deduplication (enabled by default)
    # ========================================
    pipeline_dedup = RadiomicsPipeline()  # deduplicate=True is the default!

    # Add 6 configurations (only discretisation differs)
    for n_bins in (8, 16, 32):
        pipeline_dedup.add_config(
            f"fbn_{n_bins}",
            preprocess_steps + [
                {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
                extract_all,
            ],
        )
    for bin_width in (8.0, 16.0, 32.0):
        pipeline_dedup.add_config(
            f"fbs_{int(bin_width)}",
            preprocess_steps + [
                {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
                extract_all,
            ],
        )

    config_names = ["fbn_8", "fbn_16", "fbn_32", "fbs_8", "fbs_16", "fbs_32"]

    # Run with deduplication
    start = time.perf_counter()
    results = pipeline_dedup.run(
        image=str(image_path),
        mask=str(mask_path),
        subject_id="subject_001",
        config_names=config_names,
    )
    elapsed_dedup = time.perf_counter() - start

    # ========================================
    # Check deduplication statistics
    # ========================================
    stats = pipeline_dedup.deduplication_stats
    print("\n=== Deduplication Statistics ===")
    print(f"Reused feature families: {stats['reused_families']}")
    print(f"Computed feature families: {stats['computed_families']}")
    print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")
    print(f"Time with deduplication: {elapsed_dedup:.2f}s")

    # ========================================
    # Format and save results
    # ========================================
    row = format_results(
        results,
        fmt="wide",
        meta={"subject_id": "subject_001"},
    )
    save_results([row], output_csv)
    print(f"\nSaved results to {output_csv}")

    # Verify all configs have the same morphology features (computed once, reused)
    ref_volume = results["fbn_8"].get("volume_mesh_ml_HTUR", None)
    for config in config_names[1:]:
        vol = results[config].get("volume_mesh_ml_HTUR", None)
        assert vol == ref_volume, f"Volume mismatch for {config}"
    print("✓ Morphology features identical across all configurations (as expected)")


if __name__ == "__main__":
    main()
Expected output
When running 6 configurations with shared preprocessing:
=== Deduplication Statistics ===
Reused feature families: 10
Computed feature families: 20
Cache hit rate: 33.3%
Time with deduplication: 12.34s
Saved results to case7_dedup_results.csv
✓ Morphology features identical across all configurations (as expected)
The cache hit rate reflects that:
- Morphology (1 family) is computed once, then reused 5 times → 5 cache hits
- Intensity (1 family) is computed once, then reused 5 times → 5 cache hits
- Texture, histogram, and IVH (3 families × 6 configs = 18) are computed fresh for every configuration
In total: 10 reused families vs. 20 computed (the 18 discretisation-dependent runs plus the initial morphology and intensity), giving a cache hit rate of 10/30 ≈ 33.3%.
The actual speedup depends on your data, but morphology calculations (especially surface area) are often the most expensive, so avoiding 5 redundant morphology computations can provide significant time savings.
Configuration options
Deduplication is enabled by default, but you can disable or customize it:
# Default behavior - deduplication enabled (no explicit setting needed)
pipeline = RadiomicsPipeline()
# Disable deduplication (features computed independently for each config)
pipeline_no_dedup = RadiomicsPipeline(deduplicate=False)
# Lock to a specific rules version for reproducibility
pipeline_versioned = RadiomicsPipeline(deduplication_rules="1.0.0")
# Access deduplication configuration
print(f"Deduplication enabled: {pipeline.deduplication_enabled}")
print(f"Rules version: {pipeline.deduplication_rules.version}")
When to disable deduplication
In most cases, you should leave deduplication enabled (the default). However, you might disable it if:
| Scenario | Reason |
|---|---|
| Debugging | To verify that features are computed correctly for each config |
| Memory constraints | Caching results consumes memory during the run |
| Single config | No benefit when running only one configuration |
Serialization
Deduplication settings are preserved when exporting/importing pipeline configurations:
# Export includes deduplication settings
pipeline.save_configs("my_pipeline.yaml")
# Import preserves settings
loaded = RadiomicsPipeline.load_configs("my_pipeline.yaml")
print(f"Loaded deduplicate setting: {loaded.deduplication_enabled}")
Learn More
For detailed documentation of the deduplication system, including the underlying classes and how feature family dependencies are determined, see the Deduplication section in the Pipeline guide and the Deduplication API reference.
Case 8: Manual Feature Separation (Advanced)
Case 7 vs Case 8
- Case 7 uses automatic deduplication (enabled by default) where the pipeline internally optimizes computation and copies results to matching configs
- Case 8 demonstrates manual feature separation where YOU explicitly control which features are computed in which configs
For most users, automatic deduplication (Case 7) is simpler and sufficient. Manual separation gives you maximum control and transparency.
Scenario
You want complete control over which features are computed for which configurations, avoiding any redundant computation. This is an alternative to automatic deduplication that gives you explicit control over your pipeline structure.
This approach is ideal when:
- You want clear separation between "baseline" features and "variant" features.
- You're combining multiple filters with multiple discretization strategies.
- You need to ensure morphology is computed exactly once across all configurations.
The Core Concept
Different feature families have different dependencies:
| Feature Family | Depends On | Independent Of |
|---|---|---|
| Morphology | Mask geometry (resample, binarize_mask, keep_largest_component) | Intensity values, filters, discretization |
| Intensity | Intensity preprocessing (resample, resegment, filter_outliers, filter) | Discretization |
| Texture, Histogram, IVH | All of the above + discretization | — |
This means:
- Morphology can be computed once and reused across all configurations (including filtered ones).
- Intensity can be computed once per unique preprocessing+filter combination.
- Texture/Histogram/IVH must be computed for each discretization strategy.
Example 1: Standard Radiomics with Manual Separation
from pictologics import RadiomicsPipeline

# Common preprocessing
preprocess = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
    {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
]

pipeline = RadiomicsPipeline()

# Config 1: "orig" - morphology + intensity (computed ONCE)
pipeline.add_config("orig", preprocess + [
    {"step": "extract_features", "params": {
        "families": ["morphology", "intensity"],
    }},
])

# Configs 2-7: Discretization-dependent features only
for n_bins in (8, 16, 32):
    pipeline.add_config(f"fbn_{n_bins}", preprocess + [
        {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
        {"step": "extract_features", "params": {
            "families": ["texture", "histogram", "ivh"],  # NO morphology/intensity
        }},
    ])
for bin_width in (8.0, 16.0, 32.0):
    pipeline.add_config(f"fbs_{int(bin_width)}", preprocess + [
        {"step": "discretise", "params": {"method": "FBS", "bin_width": bin_width}},
        {"step": "extract_features", "params": {
            "families": ["texture", "histogram", "ivh"],
        }},
    ])

# Run all configs
results = pipeline.run(
    image="path/to/image.nii.gz",
    mask="path/to/mask.nii.gz",
    config_names=["orig", "fbn_8", "fbn_16", "fbn_32", "fbs_8", "fbs_16", "fbs_32"],
)

# Results structure:
# - results["orig"] contains morphology + intensity (applies to all)
# - results["fbn_8"] contains texture/histogram/ivh for FBN with 8 bins
# - etc.
Example 2: Filtered Radiomics with Manual Separation
When combining filters, remember:
- Morphology is filter-independent (computed from mask geometry)
- Intensity is filter-dependent (computed from filtered response map)
from pictologics import RadiomicsPipeline

preprocess = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
]

pipeline = RadiomicsPipeline()

# Config 1: Morphology from the original image (computed ONCE, reused for all filters)
pipeline.add_config("orig_morphology", preprocess + [
    {"step": "extract_features", "params": {"families": ["morphology"]}},
])

# Config 2: Intensity from the original (unfiltered) image
pipeline.add_config("orig_intensity", preprocess + [
    {"step": "extract_features", "params": {"families": ["intensity"]}},
])

# Configs 3+: Intensity from each filtered response map
for sigma in [1.5, 3.0, 5.0]:
    pipeline.add_config(f"log_{sigma}", preprocess + [
        {"step": "filter", "params": {"type": "log", "sigma_mm": sigma}},
        {"step": "extract_features", "params": {"families": ["intensity"]}},
    ])

pipeline.add_config("gabor", preprocess + [
    {"step": "filter", "params": {
        "type": "gabor", "sigma_mm": 5.0, "lambda_mm": 2.0, "gamma": 1.0,
        "rotation_invariant": True, "pooling": "average",
    }},
    {"step": "extract_features", "params": {"families": ["intensity"]}},
])

# Run all
results = pipeline.run(
    image="path/to/image.nii.gz",
    mask="path/to/mask.nii.gz",
)

# Results structure:
# - results["orig_morphology"] contains morphology (applies to ALL configs)
# - results["orig_intensity"] contains intensity from the original image
# - results["log_1.5"] contains intensity from the LoG σ=1.5 filtered image
# - etc.
Example 3: Combined Filters + Discretization
The most complex scenario: multiple filters AND multiple discretization strategies.
from pictologics import RadiomicsPipeline

preprocess = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -100, "range_max": 3000}},
]

pipeline = RadiomicsPipeline()

# === TIER 1: Morphology (computed ONCE) ===
pipeline.add_config("orig_morphology", preprocess + [
    {"step": "extract_features", "params": {"families": ["morphology"]}},
])

# === TIER 2: Intensity per filter ===
# Original (unfiltered)
pipeline.add_config("orig_intensity", preprocess + [
    {"step": "extract_features", "params": {"families": ["intensity"]}},
])

# LoG filters
for sigma in [1.5, 3.0]:
    pipeline.add_config(f"log_{sigma}_intensity", preprocess + [
        {"step": "filter", "params": {"type": "log", "sigma_mm": sigma}},
        {"step": "extract_features", "params": {"families": ["intensity"]}},
    ])

# === TIER 3: Texture per discretization (original image only) ===
for n_bins in (16, 32):
    pipeline.add_config(f"orig_fbn_{n_bins}", preprocess + [
        {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
        {"step": "extract_features", "params": {"families": ["texture", "histogram"]}},
    ])

# Note: Texture from filtered images would require discretization of the filtered
# response map, which is less common but can be added similarly if needed (see the
# sketch below).
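If you do need texture from a filtered map, a hedged sketch of such a config could look like this (whether discretising a filtered response map is appropriate depends on your protocol; FBN is used because filtered intensities are continuous and may be negative):

# Hypothetical tier: texture from a LoG-filtered response map
pipeline.add_config("log_1.5_fbn_32", preprocess + [
    {"step": "filter", "params": {"type": "log", "sigma_mm": 1.5}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
    {"step": "extract_features", "params": {"families": ["texture", "histogram"]}},
])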
Comparison: Manual vs Automatic Deduplication
| Aspect | Manual Separation | Automatic Deduplication |
|---|---|---|
| Setup complexity | More config definitions | Simpler (one config per variant) |
| Result structure | Features split across configs | All features in each config |
| Memory usage | Lower (no caching) | Higher (cached results) |
| Transparency | Explicit control | Magic happens internally |
| Filter handling | You control morphology placement | Morphology auto-shared |
When to Use Manual Separation
Recommended For
- Complex filter + discretization combinations
- Maximum memory efficiency
- When you want explicit control over computation
- Research scenarios requiring clear provenance
Consider Automatic Deduplication Instead
- Simple discretization-only scenarios (automatic dedup handles this well)
- When you want all features in each result dictionary
- Rapid prototyping