Configuration & Reproducibility

Pictologics provides a comprehensive configuration management system designed for reproducible radiomics research. This guide covers the standard configurations, configuration file formats, and tools for sharing and managing pipeline configurations.

Overview

The configuration system enables you to:

Use standard configurations – Pre-tested, IBSI-compliant setups for common radiomics workflows
Export and import configurations – Save your pipeline settings to YAML/JSON files for version control
Share configurations – Collaborate by exchanging configuration files with colleagues
Ensure reproducibility – Schema versioning keeps saved configurations loadable across library versions

Using Standard Configurations

Pictologics includes 6 standard configurations optimized for radiomics feature extraction. All standard configurations share these characteristics:

Isotropic resampling to 0.5mm × 0.5mm × 0.5mm
Cubic interpolation for images, nearest-neighbor for masks
Complete feature extraction: intensity, morphology, texture, histogram, and IVH
Performance-optimized: spatial and local intensity features disabled by default

Running a Single Configuration

from pictologics import RadiomicsPipeline

pipeline = RadiomicsPipeline()
results = pipeline.run(image, mask, config_names=["standard_fbn_32"])

Running All Standard Configurations

Process all 6 standard configurations in a single call using the special "all_standard" shorthand:

from pictologics import RadiomicsPipeline

pipeline = RadiomicsPipeline()
all_results = pipeline.run(image, mask, config_names=["all_standard"], subject_id="patient_001")

Running Multiple Specific Configurations

from pictologics import RadiomicsPipeline

pipeline = RadiomicsPipeline()
results = pipeline.run(
    image,
    mask,
    config_names=["standard_fbn_16", "standard_fbn_32", "standard_fbs_16"],
    subject_id="patient_001",
)

Accessing Results

Results are returned as a dictionary mapping configuration names to pandas.Series objects:

# Access features for a specific configuration
features = results["standard_fbn_32"]
print(features["mean_intensity_Q4LE"])  # Access a single feature by name

# Iterate over all configurations
for config_name, series in results.items():
    print(f"{config_name}: {len(series)} features")

# Convert to a pandas DataFrame (one row per configuration)
import pandas as pd
df = pd.DataFrame(results).T

Configuration Specifications

Fixed Bin Number (FBN) Configurations

FBN discretisation divides the intensity range into a fixed number of bins, regardless of the actual intensity values. This approach is useful when you want consistent bin counts across different images.

Configuration	Bins	Use Case
`standard_fbn_8`	8	Low-resolution texture analysis, small ROIs
`standard_fbn_16`	16	Balanced resolution and noise robustness
`standard_fbn_32`	32	High-resolution texture analysis (recommended default)
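
To make the binning rule concrete, here is a minimal pure-Python sketch of fixed-bin-number discretisation following the IBSI definition. The function name `discretise_fbn` is hypothetical and is not part of the Pictologics API; the library's own implementation may handle edge cases differently.

```python
import math


def discretise_fbn(values, n_bins):
    """Map intensities to bin indices 1..n_bins (illustrative, not the Pictologics API)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate ROI with a single intensity: everything falls in bin 1.
        return [1 for _ in values]
    bins = []
    for v in values:
        # Scale into [0, n_bins], shift to 1-based bins, and clamp the maximum to n_bins.
        idx = math.floor(n_bins * (v - lo) / (hi - lo)) + 1
        bins.append(min(idx, n_bins))
    return bins


roi = [-50.0, 0.0, 25.0, 100.0, 150.0]
print(discretise_fbn(roi, n_bins=4))                     # [1, 2, 2, 4, 4]
print(discretise_fbn([v * 10 for v in roi], n_bins=4))   # [1, 2, 2, 4, 4]
```

Rescaling the intensities leaves the bin indices unchanged, which is why FBN gives consistent bin counts across images but discards the absolute intensity scale.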

Full Specification: `standard_fbn_32`

standard_fbn_32:
  description: "Standard FBN-32: 0.5mm isotropic resampling, 32 fixed bins"
  steps:
    - step: resample
      params:
        new_spacing: [0.5, 0.5, 0.5]
        interpolation: cubic
    - step: discretise
      params:
        method: FBN
        n_bins: 32
    - step: extract_features
      params:
        families:
          - intensity
          - morphology
          - texture
          - histogram
          - ivh
        include_spatial_intensity: false
        include_local_intensity: false

Fixed Bin Size (FBS) Configurations

FBS discretisation uses a fixed bin width (in Hounsfield Units for CT), which preserves the physical meaning of intensity values. This is preferred when comparing across studies or when absolute intensity values are clinically meaningful.

Configuration	Bin Width	Use Case
`standard_fbs_8`	8.0 HU	High-resolution intensity preservation
`standard_fbs_16`	16.0 HU	Balanced resolution (recommended for CT)
`standard_fbs_32`	32.0 HU	Noise-robust analysis
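
For comparison with FBN, the following pure-Python sketch shows fixed-bin-size discretisation as defined by IBSI. The function `discretise_fbs` and its `range_min` argument are hypothetical names used for illustration only; how Pictologics chooses the bin origin is not stated in this guide.

```python
import math


def discretise_fbs(values, bin_width, range_min=None):
    """Map intensities to 1-based bins of fixed width (illustrative, not the Pictologics API)."""
    # IBSI anchors the bins at the lower bound of the resegmentation range when one
    # is set; otherwise the ROI minimum is used as the origin.
    origin = min(values) if range_min is None else range_min
    return [math.floor((v - origin) / bin_width) + 1 for v in values]


roi = [-50.0, 0.0, 25.0, 100.0, 150.0]
print(discretise_fbs(roi, bin_width=16.0, range_min=-1000))  # [60, 63, 65, 69, 72]
print(discretise_fbs(roi, bin_width=16.0))                   # [1, 4, 5, 10, 13]
```

With a fixed origin, a given HU value always lands in the same bin regardless of which image it comes from, but the number of occupied bins now depends on the ROI's intensity range.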

Full Specification: `standard_fbs_16`

standard_fbs_16:
  description: "Standard FBS-16: 0.5mm isotropic resampling, 16.0 HU bin width"
  steps:
    - step: resample
      params:
        new_spacing: [0.5, 0.5, 0.5]
        interpolation: cubic
    - step: discretise
      params:
        method: FBS
        bin_width: 16.0
    - step: extract_features
      params:
        families:
          - intensity
          - morphology
          - texture
          - histogram
          - ivh
        include_spatial_intensity: false
        include_local_intensity: false

Choosing the Right Configuration

FBN vs FBS: Decision Guide

Factor	FBN	FBS
Intensity range	Variable (adapts to image)	Fixed (preserves HU meaning)
Cross-study comparison	Less suitable	Preferred
Small ROIs	Better (ensures bin coverage)	May have empty bins
CT imaging	Acceptable	Recommended
MRI imaging	Recommended	Less suitable (no standard units)
IBSI recommendation	Supported	Supported
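
The table can be condensed into a small lookup. This is only a sketch: `suggest_standard_config` is a hypothetical helper, not a Pictologics function, and it encodes just the two recommendations stated in this guide (`standard_fbs_16` for CT, `standard_fbn_32` as the FBN default for MRI).

```python
def suggest_standard_config(modality):
    """Return a standard configuration name for a modality (hypothetical helper)."""
    key = modality.strip().upper()
    if key == "CT":
        # FBS keeps the physical meaning of Hounsfield Units.
        return "standard_fbs_16"
    if key in {"MR", "MRI"}:
        # MRI intensities have no standard units, so a fixed bin number is used.
        return "standard_fbn_32"
    raise ValueError(f"No recommendation in this guide for modality {modality!r}")


print(suggest_standard_config("ct"))   # standard_fbs_16
print(suggest_standard_config("MRI"))  # standard_fbn_32
```

The returned name can be passed in the `config_names` list of `pipeline.run`.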

Performance Considerations

The standard configurations have include_spatial_intensity and include_local_intensity set to false for performance:

Spatial intensity features: Require distance calculations to ROI boundary (computationally expensive)
Local intensity features: Require local neighborhood analysis

To enable these features, create a custom configuration variant:

import copy

from pictologics import RadiomicsPipeline

pipeline = RadiomicsPipeline()

# Copy the standard config before modifying it, in case get_config returns
# a reference to the pipeline's internal definition
config = copy.deepcopy(pipeline.get_config("standard_fbn_32"))

# Find the extract_features step and enable spatial and local intensity features
for step in config:
    if step["step"] == "extract_features":
        step["params"]["include_spatial_intensity"] = True
        step["params"]["include_local_intensity"] = True

# Add as a new configuration
pipeline.add_config("fbn_32_with_spatial", config)

Configuration Files

Pictologics supports YAML and JSON formats for configuration files. YAML is recommended for human readability.

YAML Format Specification

schema_version: "1.0"
exported_at: "2026-01-31T12:00:00.000000"
configs:
  my_custom_config:
    - step: resample
      params:
        new_spacing: [1.0, 1.0, 1.0]
    - step: discretise
      params:
        method: FBN
        n_bins: 32
    - step: extract_features
      params:
        families:
          - intensity
          - morphology
          - texture

JSON Format Specification

{
  "schema_version": "1.0",
  "exported_at": "2026-01-31T12:00:00.000000",
  "configs": {
    "my_custom_config": [
      {
        "step": "resample",
        "params": {"new_spacing": [1.0, 1.0, 1.0]}
      },
      {
        "step": "discretise",
        "params": {"method": "FBN", "n_bins": 32}
      },
      {
        "step": "extract_features",
        "params": {"families": ["intensity", "morphology", "texture"]}
      }
    ]
  }
}

Schema Versioning

Configuration files include a schema_version field so that files written by one release can still be loaded correctly by later releases:

Current version: 1.0
Files without a version are treated as version 1.0
Future versions will include automatic migration when loading older configs
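
The rules above can be sketched in a few lines of plain Python. The helper names here (`read_schema_version`, `version_tuple`, `needs_newer_library`) are hypothetical and do not describe how Pictologics implements version handling; they only illustrate the "missing means 1.0" default and a numeric version comparison.

```python
import json

CURRENT_SCHEMA_VERSION = "1.0"


def read_schema_version(document):
    """Return the document's schema version, treating a missing field as 1.0."""
    return str(document.get("schema_version", CURRENT_SCHEMA_VERSION))


def version_tuple(version):
    """Turn '1.10' into (1, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def needs_newer_library(version, supported=CURRENT_SCHEMA_VERSION):
    """True when a file was written with a schema newer than this reader supports."""
    return version_tuple(version) > version_tuple(supported)


legacy = json.loads('{"configs": {"my_custom_config": []}}')
print(read_schema_version(legacy))   # 1.0
print(needs_newer_library("1.0"))    # False
print(needs_newer_library("1.10"))   # True
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "1.10" would sort before "1.2".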

Exporting Configurations

Save your configurations to share with collaborators or for version control:

from pictologics import RadiomicsPipeline

pipeline = RadiomicsPipeline()

# Add custom configurations
pipeline.add_config("my_study_config", [
    {"step": "resample", "params": {"new_spacing": [1.0, 1.0, 1.0]}},
    {"step": "discretise", "params": {"method": "FBN", "n_bins": 32}},
    {"step": "extract_features", "params": {"families": ["intensity", "morphology", "texture"]}},
])

# Export to YAML (recommended for readability)
pipeline.save_configs("my_configs.yaml")

# Export to JSON
pipeline.save_configs("my_configs.json")

# Export specific configs only
pipeline.save_configs("study_config.yaml", config_names=["my_study_config"])

Importing Configurations

Load configurations from files. The resulting pipeline contains only the configurations defined in the file:

from pictologics import RadiomicsPipeline

# Load from YAML file (only file configs, no standard configs)
pipeline = RadiomicsPipeline.load_configs("my_configs.yaml")

# Load from JSON file
pipeline = RadiomicsPipeline.load_configs("my_configs.json")

# Load with validation (logs warnings for unknown parameters)
pipeline = RadiomicsPipeline.load_configs("my_configs.yaml", validate=True)

# Load file configs AND include standard configs
pipeline = RadiomicsPipeline.load_configs("my_configs.yaml", load_standard=True)

Standard Configurations

By default, load_configs(), from_yaml(), from_json(), and from_dict() create a pipeline with only the provided configurations. Standard configurations (e.g., standard_fbn_32) are not included unless you pass load_standard=True.

If you need both your file configs and standard configs, you have two options:

Pass load_standard=True to any loading method
Load your file, then merge with a standard pipeline:

loaded = RadiomicsPipeline.load_configs("my_configs.yaml")
loaded.merge_configs(RadiomicsPipeline())  # Add standard configs

String-based Export/Import

For integration with databases or web services:

from pictologics import RadiomicsPipeline

pipeline = RadiomicsPipeline()
pipeline.add_config("my_config", [...])

# Export to string
yaml_string = pipeline.to_yaml()
json_string = pipeline.to_json()

# Import from string
pipeline2 = RadiomicsPipeline.from_yaml(yaml_string)
pipeline3 = RadiomicsPipeline.from_json(json_string)

Merging Configurations

Combine configurations from multiple sources:

from pictologics import RadiomicsPipeline

# Load configs from different sources
pipeline1 = RadiomicsPipeline.load_configs("team_a_configs.yaml")
pipeline2 = RadiomicsPipeline.load_configs("team_b_configs.yaml")

# Merge into pipeline1
pipeline1.merge_configs(pipeline2)

# Handle duplicates explicitly
pipeline1.merge_configs(pipeline2, overwrite=True)  # Overwrite existing
pipeline1.merge_configs(pipeline2, overwrite=False)  # Keep existing (default)
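
The duplicate-handling behaviour described above can be modelled on plain dictionaries. This is a sketch under the assumption that configurations are keyed by name; `merge_config_dicts` is a hypothetical function and is not the implementation of `merge_configs`.

```python
def merge_config_dicts(target, source, overwrite=False):
    """Merge source into target in place and return the names that were skipped."""
    skipped = []
    for name, steps in source.items():
        if name in target and not overwrite:
            # Keep the existing definition and report the clash to the caller.
            skipped.append(name)
            continue
        target[name] = steps
    return skipped


team_a = {
    "shared_ct": [{"step": "discretise", "params": {"method": "FBS", "bin_width": 25.0}}],
    "a_only": [],
}
team_b = {
    "shared_ct": [{"step": "discretise", "params": {"method": "FBS", "bin_width": 50.0}}],
    "b_only": [],
}

print(merge_config_dicts(team_a, team_b))                   # ['shared_ct']
print(sorted(team_a))                                       # ['a_only', 'b_only', 'shared_ct']
print(team_a["shared_ct"][0]["params"]["bin_width"])        # 25.0
print(merge_config_dicts(team_a, team_b, overwrite=True))   # []
print(team_a["shared_ct"][0]["params"]["bin_width"])        # 50.0
```

Reporting skipped names makes silent divergence between two teams' definitions of the same configuration name easier to spot.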

Configuration Validation

When loading configurations, enable validation to catch potential issues:

# Validation logs warnings for:
# - Unknown step types
# - Unknown parameters for known steps
# - Missing required parameters

pipeline = RadiomicsPipeline.load_configs("config.yaml", validate=True)
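
As a rough illustration of the three checks listed above, the sketch below validates a list of steps against a small table of known steps. The table `KNOWN_STEPS`, the choice of which parameters count as required, and the function `collect_warnings` are all assumptions made for this example; they are drawn from the step and parameter names used in this guide and do not reproduce the library's validation logic.

```python
# Hypothetical step table built from the parameter names used in this guide.
KNOWN_STEPS = {
    "resample": {"required": {"new_spacing"}, "optional": {"interpolation"}},
    "resegment": {"required": set(), "optional": {"range_min", "range_max"}},
    "keep_largest_component": {"required": set(), "optional": set()},
    "discretise": {"required": {"method"}, "optional": {"n_bins", "bin_width"}},
    "extract_features": {
        "required": set(),
        "optional": {"families", "include_spatial_intensity", "include_local_intensity"},
    },
}


def collect_warnings(steps):
    """Return human-readable warnings for one configuration's list of steps."""
    warnings = []
    for i, step in enumerate(steps):
        name = step.get("step")
        params = step.get("params", {})
        spec = KNOWN_STEPS.get(name)
        if spec is None:
            warnings.append(f"step {i}: unknown step type {name!r}")
            continue
        allowed = spec["required"] | spec["optional"]
        for key in sorted(set(params) - allowed):
            warnings.append(f"step {i} ({name}): unknown parameter {key!r}")
        for key in sorted(spec["required"] - set(params)):
            warnings.append(f"step {i} ({name}): missing required parameter {key!r}")
    return warnings


suspect = [
    {"step": "resample", "params": {"spacing": [1.0, 1.0, 1.0]}},  # misspelled key
    {"step": "smooth", "params": {}},                               # unknown step
]
for message in collect_warnings(suspect):
    print(message)
```

A misspelled parameter such as `spacing` triggers two warnings here (unknown parameter and missing `new_spacing`), which is usually the clearest hint that a key was mistyped.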

Template System (Advanced)

For programmatic access to configuration templates, Pictologics provides a template loading API.

Loading Templates

from pictologics.templates import (
    list_template_files,
    load_template_file,
    get_standard_templates,
    get_all_templates,
    get_template_metadata,
)

# List available template files
files = list_template_files()
# ['standard_configs.yaml']

# Load all standard configurations
standard_configs = get_standard_templates()
# {'standard_fbn_8': [...], 'standard_fbn_16': [...], ...}

# Get metadata from a template file
metadata = get_template_metadata("standard_configs.yaml")
# {'schema_version': '1.0', 'description': '...', 'config_count': 6}

# Load a specific template file
all_configs = load_template_file("standard_configs.yaml")

Creating Custom Template Files

You can create your own template files and load them:

from pictologics import RadiomicsPipeline
import yaml

# Define your organization's standard configs
org_configs = {
    "schema_version": "1.0",
    "description": "Organization standard configurations",
    "configs": {
        "org_standard_ct": [
            {"step": "resample", "params": {"new_spacing": [0.5, 0.5, 0.5]}},
            {"step": "discretise", "params": {"method": "FBS", "bin_width": 25.0}},
            {"step": "extract_features", "params": {"families": ["intensity", "morphology", "texture"]}},
        ],
        "org_standard_pet": [
            {"step": "resample", "params": {"new_spacing": [2.0, 2.0, 2.0]}},
            {"step": "discretise", "params": {"method": "FBN", "n_bins": 64}},
            {"step": "extract_features", "params": {"families": ["intensity", "texture"]}},
        ],
    }
}

# Save to file (safe_dump emits plain YAML; sort_keys=False keeps the key order above)
with open("org_configs.yaml", "w") as f:
    yaml.safe_dump(org_configs, f, sort_keys=False)

# Load into pipeline
pipeline = RadiomicsPipeline.load_configs("org_configs.yaml")

Best Practices for Reproducibility

1. Version Control Your Configurations

Always include configuration files in your version control system:

# Add to git
git add configs/study_configs.yaml
git commit -m "Add radiomics configuration for study XYZ"

2. Document Configuration Choices

Use the description field and inline comments in your YAML files to explain parameter choices:

schema_version: "1.0"
description: |
  Configurations for lung nodule analysis study.
  FBS-25 chosen based on literature recommendation for CT texture analysis.
  0.5mm resampling matches thin-slice CT acquisition protocol.
configs:
  lung_nodule_primary:
    - step: resample
      params:
        new_spacing: [0.5, 0.5, 0.5]  # Match acquisition protocol
    # ... rest of config

3. Export Before Major Changes

Before modifying your pipeline, export the current state:

pipeline.save_configs("configs_backup_2026-01-31.yaml")

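4. Validate External Configurations

Enable validation when loading configurations received from external sources:

# Always validate configs from external sources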
pipeline = RadiomicsPipeline.load_configs("collaborator_config.yaml", validate=True)

5. Include Schema Version in Publications

When publishing radiomics research, report:

Pictologics version
Configuration file (as supplementary material)
Schema version used
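
One way to gather these items programmatically is sketched below using only the standard library. Two things here are assumptions rather than documented behaviour: that the installed distribution is named `pictologics`, and that recording a SHA-256 checksum of the configuration file is useful for your reporting (this guide does not require it). The function names are hypothetical.

```python
import hashlib
from importlib import metadata
from pathlib import Path


def package_version(name):
    """Return the installed version of a distribution, or a placeholder if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def provenance_record(config_path, schema_version="1.0", package="pictologics"):
    """Collect the details worth reporting alongside published results."""
    path = Path(config_path)
    return {
        "package": package,
        "package_version": package_version(package),  # assumes this distribution name
        "schema_version": schema_version,
        "config_file": path.name,
        # Checksum lets readers confirm they hold the exact file that was used.
        "config_sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }
```

Calling `provenance_record("configs/study_configs.yaml")` returns a dictionary that can be written to JSON and placed in the supplementary material next to the configuration file itself.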

End-to-End Example: Multi-Site Radiomics Study

This section provides a complete, real-world workflow demonstrating how to create, save, load, and apply configurations across a multi-site radiomics study. The scenario involves:

Site A (Primary): Creates and validates the study configuration
Site B (Collaborator): Receives and applies the same configuration
Both sites: Process their local cohorts with identical settings

Step 1: Design the Study Configuration (Site A)

Site A designs a custom configuration tailored to their lung nodule CT analysis study:

from pictologics import RadiomicsPipeline

# Initialize pipeline
pipeline = RadiomicsPipeline()

# Define a study-specific configuration
# This config is designed for thin-slice chest CT with lung nodule segmentations
lung_nodule_config = [
    # Step 1: Resample to 0.5mm isotropic (matches thin-slice CT)
    {
        "step": "resample",
        "params": {
            "new_spacing": [0.5, 0.5, 0.5],
            "interpolation": "cubic",  # Cubic spline for smooth interpolation
        }
    },
    # Step 2: Resegment to the [-1000, 400] HU range to exclude voxels outside it
    {
        "step": "resegment",
        "params": {
            "range_min": -1000,  # Air
            "range_max": 400,    # Soft tissue upper limit
        }
    },
    # Step 3: Keep only the largest connected component (remove satellite lesions)
    {
        "step": "keep_largest_component",
        "params": {}
    },
    # Step 4: Discretise using Fixed Bin Size (preserves HU meaning)
    {
        "step": "discretise",
        "params": {
            "method": "FBS",
            "bin_width": 25.0,  # 25 HU bins (common for CT texture)
        }
    },
    # Step 5: Extract all feature families
    {
        "step": "extract_features",
        "params": {
            "families": ["intensity", "morphology", "texture", "histogram", "ivh"],
            "include_spatial_intensity": False,  # Skip for performance
            "include_local_intensity": False,
        }
    },
]

# Add the configuration to the pipeline
pipeline.add_config("lung_nodule_fbs25", lung_nodule_config)

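# Also add a variant with a different bin width for sensitivity analysis.
# A shallow copy is enough here because the discretise step is replaced
# with a new dict below rather than modified in place.
lung_nodule_fbs50 = lung_nodule_config.copy()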
lung_nodule_fbs50[3] = {
    "step": "discretise",
    "params": {"method": "FBS", "bin_width": 50.0}
}
pipeline.add_config("lung_nodule_fbs50", lung_nodule_fbs50)

# Verify the configurations are registered
print("Available configurations:", pipeline.list_configs())

Step 2: Test the Configuration Locally (Site A)

Before sharing, validate the configuration on a sample case:

from pictologics import load_image

# Load a test case
image = load_image("test_data/ct_scan.nii.gz")
mask = load_image("test_data/nodule_segmentation.nii.gz")

# Run the primary configuration
results = pipeline.run(
    image=image,
    mask=mask,
    subject_id="test_case_001",
    config_names=["lung_nodule_fbs25"],
)

# Inspect results
config_features = results["lung_nodule_fbs25"]
print(f"Extracted {len(config_features)} features")
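print("Sample features:")
# Format only when the feature is present; applying ":.2f" to an "N/A" fallback raises ValueError
volume = config_features.get("volume_mesh_ml_HTUR")
mean_hu = config_features.get("mean_intensity_Q4LE")
print(f"  - Volume: {volume:.2f} mm³" if volume is not None else "  - Volume: N/A")
print(f"  - Mean intensity: {mean_hu:.2f} HU" if mean_hu is not None else "  - Mean intensity: N/A")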

# Save execution log for audit
pipeline.save_log("logs/test_run_001.json")

Step 3: Export the Configuration (Site A)

Save the configuration to a file that can be shared with collaborators:

# Export to YAML (human-readable, good for version control)
pipeline.save_configs(
    "configs/lung_nodule_study_v1.yaml",
    config_names=["lung_nodule_fbs25", "lung_nodule_fbs50"]
)

print("Configuration exported successfully!")
print("Share 'configs/lung_nodule_study_v1.yaml' with Site B")

The exported YAML file will look like this:

schema_version: "1.0"
exported_at: "2026-01-31T10:30:00.000000"
configs:
  lung_nodule_fbs25:
    - step: resample
      params:
        new_spacing: [0.5, 0.5, 0.5]
        interpolation: cubic
    - step: resegment
      params:
        range_min: -1000
        range_max: 400
    - step: keep_largest_component
      params: {}
    - step: discretise
      params:
        method: FBS
        bin_width: 25.0
    - step: extract_features
      params:
        families: [intensity, morphology, texture, histogram, ivh]
        include_spatial_intensity: false
        include_local_intensity: false
  lung_nodule_fbs50:
    # ... similar structure with bin_width: 50.0

Step 4: Load and Apply Configuration (Site B)

Site B receives the configuration file and processes their cohort:

from pathlib import Path
from pictologics import RadiomicsPipeline, load_image
from pictologics.results import format_results, save_results

# Load the shared configuration (with validation)
pipeline = RadiomicsPipeline.load_configs(
    "configs/lung_nodule_study_v1.yaml",
    validate=True  # Logs warnings for any issues
)

# Verify loaded configurations
print("Loaded configurations:", pipeline.list_configs())

# Define the local data directory
data_dir = Path("site_b_data/")
output_dir = Path("site_b_results/")
output_dir.mkdir(exist_ok=True)

# Process all cases
all_results = []
for case_folder in sorted(data_dir.glob("patient_*")):
    patient_id = case_folder.name

    # Load image and mask
    image = load_image(case_folder / "ct.nii.gz")
    mask = load_image(case_folder / "nodule.nii.gz")

    # Run the primary configuration
    results = pipeline.run(
        image=image,
        mask=mask,
        subject_id=patient_id,
        config_names=["lung_nodule_fbs25"],
    )

    # Format for CSV export
    row = format_results(
        results,
        fmt="wide",
        meta={"subject_id": patient_id},
    )
    all_results.append(row)

    config_features = results["lung_nodule_fbs25"]
    print(f"Processed {patient_id}: {len(config_features)} features")

# Save all results to CSV
save_results(all_results, output_dir / "site_b_features.csv")
print(f"Results saved to {output_dir / 'site_b_features.csv'}")

Step 5: Combine Results from Both Sites

After both sites complete processing, merge the results:

import pandas as pd

# Load results from both sites
site_a_df = pd.read_csv("site_a_results/site_a_features.csv")
site_b_df = pd.read_csv("site_b_results/site_b_features.csv")

# Add site identifier
site_a_df["site"] = "A"
site_b_df["site"] = "B"

# Combine into a single dataset
combined_df = pd.concat([site_a_df, site_b_df], ignore_index=True)

# Verify consistent feature extraction
print(f"Total patients: {len(combined_df)}")
print(f"Features per patient: {len(combined_df.columns) - 2}")  # Exclude id and site
print(f"Site A: {len(site_a_df)} patients")
print(f"Site B: {len(site_b_df)} patients")

# Save combined dataset
combined_df.to_csv("combined_study_features.csv", index=False)
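
Because `pd.concat` fills columns missing from one frame with NaN instead of raising an error, it can help to compare the two sites' column sets before combining them. The helper below is a hypothetical, library-independent sketch of such a check; the column names in the example are illustrative.

```python
def compare_feature_columns(columns_a, columns_b):
    """Return (only_in_a, only_in_b) as sorted lists of column names."""
    set_a, set_b = set(columns_a), set(columns_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


only_a, only_b = compare_feature_columns(
    ["subject_id", "mean_intensity_Q4LE", "volume_mesh_ml_HTUR"],
    ["subject_id", "mean_intensity_Q4LE"],
)
print(only_a)  # ['volume_mesh_ml_HTUR']
print(only_b)  # []
```

With the DataFrames from this step, the same check would be `compare_feature_columns(site_a_df.columns, site_b_df.columns)`; two empty lists indicate that both sites produced the same set of columns.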

Step 6: Archive Configuration with Study Data

For long-term reproducibility, archive the configuration alongside results:

from datetime import datetime
from pathlib import Path
import shutil

# Create study archive
archive_dir = Path(f"study_archive_{datetime.now().strftime('%Y%m%d')}")
archive_dir.mkdir(exist_ok=True)

# Copy configuration
shutil.copy("configs/lung_nodule_study_v1.yaml", archive_dir / "configuration.yaml")

# Copy results
shutil.copy("combined_study_features.csv", archive_dir / "features.csv")

# Export pipeline logs
pipeline.save_log(archive_dir / "processing_log.json")

# Create a README with study metadata
readme = f"""
# Lung Nodule Radiomics Study
Date: {datetime.now().isoformat()}
Pictologics Version: 1.0.0
Configuration Schema: 1.0

## Configurations Used
- lung_nodule_fbs25: Primary analysis (25 HU bin width)
- lung_nodule_fbs50: Sensitivity analysis (50 HU bin width)

## Sites
- Site A: {len(site_a_df)} patients
- Site B: {len(site_b_df)} patients

## Files
- configuration.yaml: Pipeline configuration (shareable)
- features.csv: Extracted radiomic features
- processing_log.json: Execution audit log
"""

with open(archive_dir / "README.md", "w") as f:
    f.write(readme)

print(f"Study archive created: {archive_dir}/")

Key Takeaways

This workflow demonstrates several important practices:

Practice	Benefit
Custom configurations	Tailored to specific imaging protocol and clinical question
YAML export	Human-readable, version-controllable, shareable
Validation on load	Catches configuration issues early
Consistent processing	Both sites use identical preprocessing and extraction
Audit logging	Complete record of processing steps
Archival	Long-term reproducibility for publications

Reproducibility Achieved

By sharing the YAML configuration file, both sites process their data with identical settings, which rules out preprocessing and extraction settings as a source of differences in the extracted features.

Quick Reference

Task	Method
Run standard config	`pipeline.run(image, mask, config_names=["standard_fbn_32"])`
Run all standard configs	`pipeline.run(image, mask, config_names=["all_standard"])`
List available configs	`pipeline.list_configs()`
Get config details	`pipeline.get_config("config_name")`
Add custom config	`pipeline.add_config("name", steps)`
Remove config	`pipeline.remove_config("name")`
Export to YAML	`pipeline.save_configs("file.yaml")`
Export to JSON	`pipeline.save_configs("file.json")`
Import from file	`RadiomicsPipeline.load_configs("file.yaml")`
Import with standard configs	`RadiomicsPipeline.load_configs("file.yaml", load_standard=True)`
Merge configs	`pipeline.merge_configs(other_pipeline)`
Export to string	`pipeline.to_yaml()` / `pipeline.to_json()`
Import from string	`RadiomicsPipeline.from_yaml(s)` / `RadiomicsPipeline.from_json(s)`