Pipeline & Preprocessing
The RadiomicsPipeline is the core engine of Pictologics for executing reproducible, standardized radiomic feature extraction workflows. It manages the entire lifecycle from preprocessing to feature extraction and logging.
Why Use the Pipeline?
- Reproducibility: Define a configuration once and apply it consistently to every image.
- State Management: The pipeline tracks the image and masks (morphological and intensity) through every step.
- Standardisation: Built-in configurations follow IBSI standards.
- Batch Processing: Run multiple configurations (e.g., different binning strategies) on the same image in a single pass.
- Flexibility: Steps execute linearly, so you can arrange them in any order or repeat steps.
Getting Started
from pictologics import RadiomicsPipeline, format_results, save_results
# 1. Initialize the pipeline
pipeline = RadiomicsPipeline()
# 2. Run a predefined configuration
results = pipeline.run(
    image="path/to/image.nii.gz",
    mask="path/to/mask.nii.gz",
    subject_id="Subject_001",
    config_names=["standard_fbn_32"],
)
# 3. Format and save results
row = format_results(results, fmt="wide", meta={"subject_id": "Subject_001"})
save_results([row], "results.csv")
Masks Are Optional
RadiomicsPipeline.run(...) accepts an optional mask argument:
- Pass a mask path or Image object → used as the ROI (standard workflow).
- Omit mask (or pass mask=None / mask="") → Pictologics generates a full (all-ones) ROI mask, treating the entire image as the initial ROI.
Empty ROI is an Error
If preprocessing removes every ROI voxel (e.g., overly strict resegment thresholds), the pipeline raises an EmptyROIMaskError rather than returning empty or partial feature sets.
Morphology with Whole-Image ROI
With a maskless run, morphology features describe the ROI mask after mask-refining steps
(e.g., resegment, keep_largest_component). This is valid computationally, but may not be
scientifically meaningful for all studies.
Predefined Configurations
Pictologics includes 6 standard configurations designed for common radiomics workflows. All share:
- Resampling: 0.5mm × 0.5mm × 0.5mm isotropic spacing
- Feature Families: intensity, morphology, texture, histogram, and IVH
- Performance: Spatial/local intensity disabled by default
| Configuration | Method | Parameters |
|---|---|---|
| standard_fbn_8 | Fixed Bin Number | n_bins=8 |
| standard_fbn_16 | Fixed Bin Number | n_bins=16 |
| standard_fbn_32 | Fixed Bin Number | n_bins=32 |
| standard_fbs_8 | Fixed Bin Size | bin_width=8.0 |
| standard_fbs_16 | Fixed Bin Size | bin_width=16.0 |
| standard_fbs_32 | Fixed Bin Size | bin_width=32.0 |
# Run a single configuration
results = pipeline.run(image, mask, config_names=["standard_fbn_32"])
# Run all 6 standard configurations
all_results = pipeline.run(image, mask, config_names=["all_standard"])
Configuration Management
For detailed documentation on FBN vs FBS guidance, export/import, YAML/JSON formats, schema versioning, and sharing configurations, see the Configuration & Reproducibility guide.
Linear Step Execution
Steps are applied one after another in the exact sequence you define. You can repeat steps, arrange steps in any order, and implement complex multi-stage preprocessing:
# Example: Complex workflow with repeated steps
complex_config = [
    {"step": "resample", "params": {"new_spacing": (2.0, 2.0, 2.0)}},
    {"step": "keep_largest_component", "params": {"apply_to": "morph"}},
    {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
    {"step": "filter_outliers", "params": {"sigma": 3.0}},
    {"step": "round_intensities", "params": {}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
    {"step": "extract_features", "params": {"families": ["texture", "histogram"]}},
]
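To make the effect of step order concrete, here is a minimal sketch using toy stand-ins rather than Pictologics internals. The `STEPS` registry, `run_steps`, and the two toy step functions are hypothetical names introduced only for illustration; the real steps operate on images and masks, not flat lists.

```python
# Toy illustration of linear step execution (not the Pictologics implementation).
# The "state" here is just a list of intensities inside the ROI.

def clip(values, range_min, range_max):
    """Toy 'resegment': keep only values inside [range_min, range_max]."""
    return [v for v in values if range_min <= v <= range_max]

def round_values(values):
    """Toy 'round_intensities': round each value to the nearest integer."""
    return [round(v) for v in values]

# Hypothetical registry mapping step names to functions.
STEPS = {"resegment": clip, "round_intensities": round_values}

def run_steps(values, config):
    """Apply each configured step to the state, strictly in list order."""
    for entry in config:
        values = STEPS[entry["step"]](values, **entry["params"])
    return values

roi = [-1200.0, 399.6, 400.4, 12.2]
clip_first = [
    {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
    {"step": "round_intensities", "params": {}},
]
round_first = list(reversed(clip_first))

print(run_steps(roi, clip_first))   # 400.4 is dropped before rounding
print(run_steps(roi, round_first))  # 400.4 rounds to 400 and survives
```

The same two steps give different ROIs depending on their order, which is why a configuration should be read as an ordered recipe rather than a set of options.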
Intelligent Image Routing
After discretisation, the pipeline maintains both the original (raw) image and the discretised image, ensuring each feature type gets the appropriate input automatically:
| Feature Family | Image Used | Why |
|---|---|---|
| Intensity | Raw image | Statistics require original continuous values |
| Morphology | Raw image | Volume/surface calculations use original geometry |
| Histogram | Discretised | Bin-based statistics require integer bins |
| Texture (GLCM, GLRLM, etc.) | Discretised | Co-occurrence matrices require discrete grey levels |
| IVH | Configurable | Can use raw (continuous) or discretised values |
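The routing in this table can be pictured as a simple lookup. The sketch below is hypothetical: `ROUTING`, `select_image`, and the `state` dictionary are illustrative names, and the IVH rule (discretised unless `ivh_use_continuous` is set) is an assumption about the default rather than documented behaviour.

```python
# Hypothetical sketch of per-family image routing (names are illustrative).
ROUTING = {
    "intensity": "raw",
    "morphology": "raw",
    "histogram": "discretised",
    "texture": "discretised",
}

def select_image(family, state, ivh_use_continuous=False):
    """Return the image a feature family should read from `state`."""
    if family == "ivh":
        # Assumption: IVH reads the discretised image unless told otherwise.
        key = "raw" if ivh_use_continuous else "discretised"
    else:
        key = ROUTING[family]
    if state.get(key) is None:
        raise ValueError(f"{family!r} needs the {key} image, but none is available")
    return state[key]

state = {"raw": [[12.7, 80.1]], "discretised": [[1, 3]]}
print(select_image("intensity", state))  # raw values
print(select_image("texture", state))    # discretised grey levels
```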
Available Preprocessing Steps
1. resample
Resamples the image and mask to a new voxel spacing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| new_spacing | tuple | (required) | Target spacing (x, y, z) in mm |
| interpolation | str | "linear" | Image interpolation: "linear", "cubic", "nearest" |
| mask_interpolation | str | "nearest" | Mask interpolation: "nearest", "linear" |
| mask_threshold | float | 0.5 | Threshold for non-nearest mask interpolation |
| round_intensities | bool | False | Round intensities to nearest integer after resampling |
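As a rough guide to what a spacing change does to image size, the sketch below computes the resampled grid shape and binarises an interpolated mask. Both helpers are hypothetical. The `ceil` rule follows the IBSI recommendation for grid dimensions, and the `>=` comparison for `mask_threshold` is an assumption; Pictologics may differ in either detail.

```python
import math

def resampled_shape(shape, spacing, new_spacing):
    """Grid size after resampling: ceil(n_old * s_old / s_new) per axis."""
    return tuple(
        math.ceil(n * s_old / s_new)
        for n, s_old, s_new in zip(shape, spacing, new_spacing)
    )

def threshold_mask(fractions, mask_threshold=0.5):
    """Binarise a linearly interpolated mask (assumes >= is the comparison)."""
    return [1 if f >= mask_threshold else 0 for f in fractions]

# A 100 x 100 x 50 volume with 1 x 1 x 2 mm voxels, resampled to 0.5 mm isotropic.
print(resampled_shape((100, 100, 50), (1.0, 1.0, 2.0), (0.5, 0.5, 0.5)))

# Partial-volume values produced by linear mask interpolation.
print(threshold_mask([0.0, 0.49, 0.5, 1.0]))
```

Halving the spacing on every axis multiplies the voxel count by eight, which is worth keeping in mind before choosing a very fine target spacing.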
2. resegment
Refines the ROI mask based on intensity thresholds, excluding voxels outside the specified range from feature extraction. This is essential for removing sentinel/NA values (e.g., -1024, -2048) from the ROI.
Memory Usage Alert
If your image has a background that resamples to 0, and 0 is within your resegment range, you must use source_mode="auto".
Otherwise, resegment will include the entire background in the ROI, causing memory exhaustion during texture calculation. source_mode="auto" ensures the background remains excluded.
| Parameter | Type | Default | Description |
|---|---|---|---|
| range_min | float | None | Minimum intensity value |
| range_max | float | None | Maximum intensity value |
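Conceptually, resegmentation intersects the existing ROI with an intensity range. The following is a minimal sketch on flat lists, not the library's implementation; the `resegment` helper is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def resegment(values, mask, range_min=None, range_max=None):
    """Return a new mask that keeps ROI voxels whose intensity lies in the range.

    A bound of None leaves that side open. Bounds are treated as inclusive,
    which is an assumption of this sketch.
    """
    out = []
    for value, inside in zip(values, mask):
        keep = bool(inside)
        if range_min is not None and value < range_min:
            keep = False
        if range_max is not None and value > range_max:
            keep = False
        out.append(1 if keep else 0)
    return out

intensities = [-2048, -1000, 35, 400, 900, 50]
roi_mask = [1, 1, 1, 1, 1, 0]
new_mask = resegment(intensities, roi_mask, range_min=-1000, range_max=400)
print(new_mask)  # the sentinel (-2048) and the 900 voxel leave the ROI
```

Voxels that were outside the ROI to begin with stay outside, even when their intensity falls inside the range.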
3. filter_outliers
Removes voxels from the intensity mask whose values lie more than sigma standard deviations from the ROI mean.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sigma | float | 3.0 | Number of standard deviations |
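A minimal sketch of the idea, assuming the mean and standard deviation are computed over the current ROI and that the population standard deviation is used; the document does not state which estimator Pictologics uses, and the `filter_outliers` function below is a hypothetical stand-in operating on flat lists.

```python
from statistics import fmean, pstdev

def filter_outliers(values, mask, sigma=3.0):
    """Drop ROI voxels further than `sigma` standard deviations from the ROI mean."""
    roi = [v for v, m in zip(values, mask) if m]
    mean, std = fmean(roi), pstdev(roi)  # population std: an assumption
    low, high = mean - sigma * std, mean + sigma * std
    return [1 if m and low <= v <= high else 0 for v, m in zip(values, mask)]

values = [10] * 9 + [100]
mask = [1] * 10
print(filter_outliers(values, mask, sigma=2.0))  # only the 100 voxel is removed
```

Because a single extreme value also inflates the standard deviation, a small ROI can retain outliers at sigma=3.0 that a tighter setting would remove.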
4. keep_largest_component
Restricts the mask to the largest connected component.
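For intuition, the sketch below keeps the largest connected region of a small 2D mask using a breadth-first search. It is illustrative only: the function is a hypothetical stand-in, real masks are 3D, and the 4-connectivity used here is an assumption (the connectivity Pictologics uses is not stated in this guide).

```python
from collections import deque

def keep_largest_component(mask):
    """Keep only the largest 4-connected region of a 2D binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    in_bounds = 0 <= ny < rows and 0 <= nx < cols
                    if in_bounds and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 1
    return out

mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(keep_largest_component(mask))  # the isolated voxel on the right is removed
```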
| Parameter | Type | Default | Description |
|---|---|---|---|
| apply_to | str | "both" | "both", "morph", or "intensity" |
5. round_intensities
Rounds image intensities to the nearest integer. Useful before discretisation if values are close to integers.
No parameters.
6. binarize_mask
Creates a binary mask from a multi-label mask.
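The label-selection forms accepted by this step's mask_values parameter (a single int, a list, or a (min, max) tuple) can be sketched as follows. The `select_labels` helper is hypothetical, and treating the tuple range as inclusive at both ends is an assumption.

```python
def select_labels(labels, mask_values):
    """Binarise a label map by selecting specific labels.

    int   -> that single label
    list  -> any label in the list
    tuple -> (min, max) label range, assumed inclusive
    """
    if isinstance(mask_values, tuple):
        low, high = mask_values
        return [1 if low <= v <= high else 0 for v in labels]
    wanted = set(mask_values) if isinstance(mask_values, list) else {mask_values}
    return [1 if v in wanted else 0 for v in labels]

labels = [0, 1, 2, 3, 4]
print(select_labels(labels, 2))       # single label
print(select_labels(labels, [1, 3]))  # explicit list of labels
print(select_labels(labels, (2, 4)))  # label range
```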
| Parameter | Type | Default | Description |
|---|---|---|---|
| threshold | float | 0.5 | Threshold value for binarization |
| mask_values | int, list, or tuple | None | Specific label(s) to select. Tuple (min, max) selects a range |
| apply_to | str | "both" | "both", "morph", or "intensity" |
7. discretise
Discretises image intensities into bins. Required before texture feature extraction.
| Parameter | Type | Default | Description |
|---|---|---|---|
| method | str | (required) | "FBN" (Fixed Bin Number) or "FBS" (Fixed Bin Size) |
| n_bins | int | None | Number of bins (for FBN) |
| bin_width | float | None | Width of each bin (for FBS) |
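The two methods differ in what they hold constant. The sketch below follows the IBSI definitions of fixed bin number and fixed bin size on a flat list of ROI intensities; the function names are hypothetical, and Pictologics' handling of edge cases may differ.

```python
import math

def discretise_fbn(values, n_bins):
    """Fixed Bin Number: split the ROI's own [min, max] range into n_bins bins."""
    low, high = min(values), max(values)
    if high == low:
        return [1] * len(values)  # degenerate ROI: everything falls in bin 1
    return [
        n_bins if v == high else math.floor(n_bins * (v - low) / (high - low)) + 1
        for v in values
    ]

def discretise_fbs(values, bin_width, min_val):
    """Fixed Bin Size: bins of constant width, counted from min_val."""
    return [math.floor((v - min_val) / bin_width) + 1 for v in values]

print(discretise_fbn([0, 5, 10], n_bins=2))
print(discretise_fbs([-1000, -990, -975], bin_width=25.0, min_val=-1000))
```

With FBN the bin edges depend on each ROI's intensity range, so the same intensity can land in different bins for different subjects. With FBS the edges are fixed in intensity units, so the number of occupied bins varies instead.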
8. filter
Applies an IBSI 2 image filter. See the Image Filtering guide for detailed documentation.
| Parameter | Type | Default | Description |
|---|---|---|---|
| type | str | (required) | "mean", "log", "laws", "gabor", "wavelet", "simoncelli", "riesz" |
| boundary | str | "mirror" | Boundary condition |
Filter-specific parameters:
| Filter | Required Params | Optional Params |
|---|---|---|
| mean | support | boundary |
| log | sigma_mm | truncate, boundary |
| laws | kernel | rotation_invariant, pooling, compute_energy, energy_distance, boundary |
| gabor | sigma_mm, lambda_mm, gamma | rotation_invariant, delta_theta, pooling, boundary |
| wavelet | wavelet, level, decomposition | rotation_invariant, pooling, boundary |
| simoncelli | level | (none) |
| riesz | order | variant, sigma_mm, level |
Automatic Spacing Injection
For filters requiring physical spacing (log, gabor), the pipeline uses the image's voxel spacing automatically.
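Parameters such as sigma_mm are given in millimetres, while a filter kernel is built on the voxel grid, so the spacing is needed to convert between the two. A minimal sketch of that conversion, using a hypothetical helper that is not part of the Pictologics API:

```python
def sigma_in_voxels(sigma_mm, spacing):
    """Convert a physical sigma (mm) into per-axis sigmas in voxel units."""
    return tuple(sigma_mm / s for s in spacing)

# 1.5 mm sigma on an anisotropic grid of 0.5 x 0.5 x 1.5 mm voxels.
print(sigma_in_voxels(1.5, (0.5, 0.5, 1.5)))
```

On an anisotropic grid the same physical sigma spans a different number of voxels along each axis, which is one reason to resample to isotropic spacing before filtering.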
9. extract_features
Calculates radiomic features from the current state.
| Parameter | Type | Default | Description |
|---|---|---|---|
| families | list[str] | (required) | Feature families to extract (see table below) |
| include_spatial_intensity | bool | False | Include Moran's I / Geary's C |
| include_local_intensity | bool | False | Include local intensity peaks |
| ivh_params | dict | None | Parameters for IVH: bin_width, min_val, max_val, etc. |
| ivh_discretisation | dict | None | Temporary discretisation for IVH only |
| ivh_use_continuous | bool | False | Use raw values for IVH |
| texture_matrix_params | dict | None | E.g., {"ngldm_alpha": 1} |
Available feature families:
| Family | Description |
|---|---|
| "intensity" | First-order statistics (Mean, Skewness, etc.) |
| "spatial_intensity" | Moran's I / Geary's C only |
| "local_intensity" | Local/global intensity peak features only |
| "morphology" | Shape and size features (Volume, Sphericity, etc.) |
| "texture" | GLCM, GLRLM, GLSZM, GLDZM, NGTDM, NGLDM |
| "histogram" | Intensity histogram features |
| "ivh" | Intensity-Volume Histogram features |
Working with Results
The format_results() function converts pipeline output into different formats for analysis or export.
Format Options
With fmt="wide", each subject becomes one row with all features as columns. Column names follow the pattern {config}__{feature}.
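The naming pattern is easiest to see on a small example. The sketch below is not `format_results` itself: `to_wide` is a hypothetical helper, the assumption that results map each configuration name to a feature dictionary is mine, and the feature names and values are made up.

```python
def to_wide(results, meta=None):
    """Flatten {config: {feature: value}} into one row keyed by config__feature."""
    row = dict(meta or {})
    for config, features in results.items():
        for feature, value in features.items():
            row[f"{config}__{feature}"] = value
    return row

# Made-up feature names and values, for illustration only.
results = {
    "standard_fbn_32": {"feature_a": 41.2, "feature_b": 0.07},
    "standard_fbs_16": {"feature_a": 41.2, "feature_b": 0.11},
}
row = to_wide(results, meta={"subject_id": "Subject_001"})
print(sorted(row))
```

Prefixing each column with the configuration name keeps features from different configurations distinct when they share a feature name.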
Output Types
| Type | Returns |
|---|---|
| "dict" (default) | Python dictionary (wide) or list of dicts (long) |
| "pandas" | pandas.DataFrame |
| "json" | JSON string |
Batch Processing Pattern
all_rows = []
for file in image_files:
    res = pipeline.run(image=file, ...)
    all_rows.append(format_results(res, fmt="wide", meta={"filename": file.name}))
# Save everything at once
save_results(all_rows, "full_study_results.csv")
Deduplication (Performance Optimization)
When running multiple configurations that share preprocessing steps, the pipeline automatically avoids redundant computation.
Enabled by Default
Deduplication is enabled by default (deduplicate=True). Just run multiple configs to benefit.
How It Works
The system analyzes your configurations and identifies reusable features:
| Feature Family | Depends On | Independent Of |
|---|---|---|
| Morphology | Mask geometry (resample, binarize_mask, keep_largest_component) | Intensity values, filters, discretization |
| Intensity | Intensity preprocessing (resample, resegment, filter_outliers, filter) | Discretization |
| Texture / Histogram | All of the above plus discretization | — |
When configs share preprocessing but differ only in discretization:
- Morphology and intensity are computed once and reused
- Texture and histogram are computed per configuration
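One way to picture this reuse is a per-family cache key built only from the steps that can affect that family. The sketch below is a simplified, hypothetical model: `RELEVANT`, `signature`, and `make_config` are illustrative names, and the real ConfigurationAnalyzer and PreprocessingSignature classes may work differently.

```python
import json

# Steps that influence each family, taken from the dependency table in this section.
MORPH_STEPS = {"resample", "binarize_mask", "keep_largest_component"}
INTENSITY_STEPS = {"resample", "resegment", "filter_outliers", "filter"}
RELEVANT = {
    "morphology": MORPH_STEPS,
    "intensity": INTENSITY_STEPS,
    "texture": MORPH_STEPS | INTENSITY_STEPS | {"discretise"},
}

def signature(config, family):
    """Serialise only the steps that can change this family's output."""
    steps = [s for s in config if s["step"] in RELEVANT[family]]
    return json.dumps(steps, sort_keys=True)

def make_config(n_bins):
    """Two configs built this way differ only in their discretise step."""
    return [
        {"step": "resample", "params": {"new_spacing": [1.0, 1.0, 1.0]}},
        {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
        {"step": "discretise", "params": {"method": "FBN", "n_bins": n_bins}},
    ]

a, b = make_config(16), make_config(32)
print(signature(a, "morphology") == signature(b, "morphology"))  # same key, reusable
print(signature(a, "texture") == signature(b, "texture"))        # differs, recomputed
```

Families whose signatures match across configurations can be computed once and copied, while a family whose signature differs must be recomputed.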
Checking Statistics
stats = pipeline.deduplication_stats
print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")
print(f"Reused: {stats['reused_families']} families")
print(f"Computed: {stats['computed_families']} families")
Results Are Always Complete
When deduplication reuses features, they are deep copied into each configuration's results. Every config returns a complete feature set — no missing values.
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| deduplicate | bool | True | Enable/disable deduplication |
| deduplication_rules | str or DeduplicationRules | "1.0.0" | Rules version for reproducibility |
API Reference
For detailed documentation of ConfigurationAnalyzer, DeduplicationPlan, PreprocessingSignature, and DeduplicationRules, see the Deduplication API reference.
Logging
The pipeline maintains a detailed log of every step executed, including parameters and errors.
# Save log after running
pipeline.save_log("pipeline_execution_log.json")
# Clear log between runs
pipeline.clear_log()
The log file contains:
- Timestamp and subject ID
- Configuration name
- Source mode and sentinel detection status
- List of executed steps with parameters
- Status of each step
Examples
Standard Suite (Fast Baseline)
Run all 6 built-in configurations:
from pictologics import RadiomicsPipeline
pipeline = RadiomicsPipeline()
results = pipeline.run(
    image="path/to/image.nii.gz",
    mask="path/to/mask.nii.gz",
    config_names=["all_standard"],
)
Enable Spatial/Local Intensity Extras
cfg = [
    {"step": "resample", "params": {"new_spacing": (0.5, 0.5, 0.5)}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
    {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": True,  # Moran's I / Geary's C
            "include_local_intensity": True,  # Local intensity peaks
        },
    },
]
pipeline = RadiomicsPipeline().add_config("with_extras", cfg)
results = pipeline.run("image.nii.gz", "mask.nii.gz", config_names=["with_extras"])
IVH with Physical-Unit Mapping
cfg = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "discretise", "params": {"method": "FBS", "bin_width": 25.0, "min_val": -1000}},
    {
        "step": "extract_features",
        "params": {
            "families": ["ivh"],
            "ivh_params": {"bin_width": 25.0, "min_val": -1000, "target_range_max": 400},
        },
    },
]
Custom CT Pipeline
custom_config = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0)}},
    {"step": "resegment", "params": {"range_min": -150, "range_max": 250}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 64}},
    {"step": "extract_features", "params": {
        "families": ["intensity", "morphology", "texture", "histogram", "ivh"]
    }},
]
pipeline = RadiomicsPipeline().add_config("my_custom_ct", custom_config)
results = pipeline.run(image, mask, config_names=["my_custom_ct"])
LoG Filtered Features
log_config = [
    {"step": "resample", "params": {"new_spacing": (1.0, 1.0, 1.0), "interpolation": "cubic"}},
    {"step": "round_intensities", "params": {}},
    {"step": "resegment", "params": {"range_min": -1000, "range_max": 400}},
    {"step": "filter", "params": {"type": "log", "sigma_mm": 1.5, "truncate": 4.0}},
    {"step": "extract_features", "params": {"families": ["intensity", "morphology", "histogram"]}},
]
Manual Step-by-Step Extraction
If you need granular control, call the preprocessing and feature functions directly without the pipeline:
import numpy as np
from pictologics import load_image
from pictologics.preprocessing import (
    resample_image, resegment_mask, filter_outliers,
    discretise_image, apply_mask
)
from pictologics.features.intensity import calculate_intensity_features
from pictologics.features.morphology import calculate_morphology_features
from pictologics.features.texture import calculate_all_texture_features
# Load and preprocess
image = load_image("image.nii.gz")
mask = load_image("mask.nii.gz")
image = resample_image(image, new_spacing=(1.0, 1.0, 1.0))
mask = resample_image(mask, new_spacing=(1.0, 1.0, 1.0), interpolation="nearest")
mask = resegment_mask(image, mask, range_min=-1000, range_max=400)
# Discretise for texture
disc_image = discretise_image(image, method="FBN", n_bins=32, roi_mask=mask)
# Extract features
morph = calculate_morphology_features(mask, image=image, intensity_mask=mask)
intensity = calculate_intensity_features(apply_mask(image, mask))
texture = calculate_all_texture_features(disc_image.array, mask.array, n_bins=32)
all_features = {**morph, **intensity, **texture}
print(f"Extracted {len(all_features)} features")
Use the Pipeline Instead
The RadiomicsPipeline accomplishes the same workflow with automatic image routing, logging,
deduplication, and configuration export. Manual extraction is mainly useful for debugging
or understanding the underlying process.
Performance Tips
- Spatial/local intensity can be extremely slow on large ROIs. Keep them disabled unless needed.
- Texture requires discretisation. Without a discretise step, the pipeline raises an error.
- For large 3D images, consider coarser spacing for exploratory work.
- For CT in Hounsfield Units, FBS (bin_width) is often more interpretable; for MRI/PET, FBN (n_bins) may be preferable.