Background

Interpreting spine MRIs, especially for lumbar spinal stenosis (LSS), is time-consuming, requires specialist expertise, and suffers from substantial variability between radiologists. The core problem was to reduce this variability with an AI system that delivers standardized, reproducible analyses without altering existing clinical workflows.

What We Tried (and Why)

Our initial strategy was a structured AI pipeline with three stages: segmentation, binary classification, and severity assessment. The segmentation stage used a U-Net to delineate anatomical structures, establishing an anatomical reference for the rest of the pipeline. The binary classification stage used a multi-label classifier to detect stenosis in specific regions, and the severity assessment stage applied a third model, fed with both the segmentation masks and the original axial images, to grade any detected stenosis. This staged design kept the approach systematic and let the AI outputs slot into radiologists’ existing workflows.
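
To make the flow concrete, here is a minimal sketch of the three-stage pipeline in PyTorch. The module wiring, tensor shapes, and activations are illustrative assumptions for this example, not our production code.

```python
# Minimal sketch of the three-stage pipeline (segmentation -> detection -> severity).
# Module wiring, tensor shapes, and activations are illustrative assumptions.
import torch
import torch.nn as nn


class LssPipeline(nn.Module):
    def __init__(self, segmenter: nn.Module, detector: nn.Module, grader: nn.Module):
        super().__init__()
        self.segmenter = segmenter   # U-Net: axial slice -> anatomical masks
        self.detector = detector     # multi-label classifier: stenosis presence per region
        self.grader = grader         # severity model: image + masks -> severity grades

    @torch.no_grad()
    def forward(self, axial: torch.Tensor) -> dict:
        # axial: (B, 1, H, W) axial MRI slice
        masks = torch.sigmoid(self.segmenter(axial))       # (B, n_structures, H, W)
        presence = torch.sigmoid(self.detector(axial))     # (B, n_regions), one probability per region
        # The severity model sees both the original image and the predicted masks.
        severity_logits = self.grader(torch.cat([axial, masks], dim=1))
        severity = torch.softmax(severity_logits, dim=-1)  # (B, n_grades)
        return {"masks": masks, "presence": presence, "severity": severity}
```

Conditioning the severity model on the predicted masks, rather than on the raw image alone, is what lets the final stage focus its assessment on the anatomy identified upstream.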

What Broke or Didn’t Work

Despite the comprehensive approach, one challenge was balancing speed against transparency. Grad-CAM visualizations, while crucial for explainability, increased inference time significantly, which put workflow efficiency at risk during routine reads.

📌 Takeaway: Balancing transparency and efficiency is critical in clinical AI deployment.
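
To make that latency cost concrete, below is a generic hook-based Grad-CAM sketch for a CNN classifier; it is not our exact implementation, and the model, target layer, and class index are assumed inputs. The extra backward pass and per-slice heatmap upsampling are where the additional inference time comes from.

```python
# Generic hook-based Grad-CAM sketch. It illustrates why explanations add latency:
# each explained prediction needs gradient tracking and an extra backward pass.
import torch
import torch.nn.functional as F


def grad_cam(model, image, target_layer, class_idx):
    feats, grads = {}, {}

    def fwd_hook(_, __, output):
        feats["value"] = output

    def bwd_hook(_, grad_in, grad_out):
        grads["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        model.zero_grad()
        logits = model(image)                      # forward pass (needed even without XAI)
        logits[:, class_idx].sum().backward()      # extra backward pass (XAI-only cost)
        weights = grads["value"].mean(dim=(2, 3), keepdim=True)  # pool gradients per channel
        cam = F.relu((weights * feats["value"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
        return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)  # normalize to [0, 1]
    finally:
        h1.remove()
        h2.remove()
```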

The Shift We Made

To resolve this, we introduced dual-mode operation: a “quick read” mode with minimal visual output for fast decisions, and a full explainable AI (XAI) mode for complex or uncertain cases. This adjustment let us maintain the key performance metrics and client trust while accommodating different clinical priorities.
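
A simplified sketch of how such routing can work is shown below, assuming the classifier exposes an uncertainty score; the threshold, field names, and clinician flag are illustrative placeholders rather than our production logic.

```python
# Sketch of dual-mode routing: run the fast path by default and fall back to the
# full explainable (XAI) path for uncertain or clinician-flagged cases.
# The threshold and case fields are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class CaseResult:
    presence: float              # max stenosis probability across regions
    uncertainty: float           # e.g. 1 - max softmax confidence, or an ensemble spread
    flagged_by_clinician: bool = False


def choose_mode(result: CaseResult, uncertainty_threshold: float = 0.25) -> str:
    """Return 'quick_read' or 'full_xai' for a given case."""
    if result.flagged_by_clinician:
        return "full_xai"        # clinician explicitly asked for explanations
    if result.uncertainty > uncertainty_threshold:
        return "full_xai"        # ambiguous case: pay the latency for Grad-CAM heatmaps
    return "quick_read"          # confident case: skip heatmaps, return scores only
```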

What Worked (and What Still Doesn’t)

Our methods achieved strong AUROC performance on central stenosis classification, highly accurate measurements, and a marked clinician preference for our tools over baseline systems. Visual explanations played a pivotal role in building trust and increasing daily usage. However, latency during high-volume workflows still needs optimization.
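
For reference, per-region AUROC for a multi-label stenosis detector can be computed as in the sketch below; the arrays here are random placeholders, not our evaluation data.

```python
# Sketch of per-region AUROC evaluation for a multi-label stenosis detector.
# Labels and scores are random placeholders, not real evaluation results.
import numpy as np
from sklearn.metrics import roc_auc_score

regions = ["central", "left_foraminal", "right_foraminal"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, len(regions)))   # ground-truth presence per region
y_score = rng.random((200, len(regions)))               # predicted probabilities per region

for i, name in enumerate(regions):
    print(f"{name}: AUROC = {roc_auc_score(y_true[:, i], y_score[:, i]):.3f}")
```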

Tradeoffs and Strategic Decisions

  • Option A: full XAI mode for all cases. Transparency at the cost of speed; ensures maximum clinician trust.
  • Option B: dual-mode with a quick-read option. Balances speed and explainability; gives clinicians flexibility.

Open Questions We’re Still Exploring

  • How much explainability is necessary in clinical AI?
  • When does the need for visual trust outweigh the need for speed?
  • How do we adequately measure and optimize trust within AI tools?

If You’re Solving Something Similar…

We welcome collaboration with others tackling similar issues in medical AI. There’s much to explore in balancing AI explainability with practical clinical utility.

Contact: hello@cadabra.studio
More at: https://cadabra.studio

At Cadabra Studio, we believe software delivery can be reframed from the ground up, so that every decision, tool, and interaction is guided by contextual intelligence.

👉 Read more in our extended article on Medium.
👉 Explore applied insights on Cadabra Insights (Notion).