Resilience Overview
Resilience in NPipeline refers to the ability of your data pipelines to detect, handle, and recover from failures without complete system breakdown. This section provides a comprehensive guide to building robust, fault-tolerant pipelines.
⚡ Quick Start: Node Restart
If you want to enable node restarts, start here: Node Restart Quick Start Checklist
Node restart requires three mandatory configuration steps. Missing any one causes silent failures. The quickstart guide is the single canonical source of truth for configuring all three prerequisites correctly.
Why Resilience Matters
In production environments, pipelines inevitably encounter failures from various sources:
- Transient infrastructure issues: Network timeouts, database connection failures
- Data quality problems: Invalid formats, missing values, unexpected data types
- Resource constraints: Memory pressure, CPU saturation, I/O bottlenecks
- External service dependencies: API rate limits, service outages, authentication failures
Without proper resilience mechanisms, these failures can cascade through your pipeline, causing data loss, system instability, and costly manual intervention.
Resilience Strategy Comparison
| Strategy | Best For | Memory Requirements | Complexity | Key Benefits |
|---|---|---|---|---|
| Simple Retry | Transient failures (network timeouts, temporary service issues) | Low | Low | Quick recovery from temporary issues |
| Node Restart | Persistent node failures, resource exhaustion | Medium (requires materialization) | Medium | Complete recovery from node-level failures |
| Circuit Breaker | Protecting against cascading failures, external service dependencies | Low | Medium | Prevents system overload during outages |
| Dead-Letter Queues | Handling problematic items that can't be processed | Low | High | Preserves problematic data for manual review |
| Combined Approach | Production systems with multiple failure types | High | High | Comprehensive protection against all failure types |
Choosing the Right Strategy
- For simple pipelines with basic needs: Start with Simple Retry
- For streaming data processing: Use Node Restart with materialization
- For external service dependencies: Add Circuit Breaker to prevent cascade failures
- For critical data pipelines: Implement Dead-Letter Queues to preserve failed items
- For production systems: Combine multiple strategies for comprehensive protection
Core Resilience Components
NPipeline's resilience framework is built around several interconnected components:
| Component | Role | Critical Dependency |
|---|---|---|
| ResilientExecutionStrategy | Wrapper that enables recovery capabilities for nodes | Prerequisite for all resilience features |
| Materialization & Buffering | Buffers input items to enable replay during restarts | Required for PipelineErrorDecision.RestartNode |
| Error Handling | Determines how to respond to different types of failures | Provides decision logic for recovery actions |
| Retry Options | Configures retry limits and materialization caps | Controls resilience behavior boundaries |
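To make these relationships concrete, here is a minimal sketch of where each component plugs in. PipelineRetryOptions, ResilientExecutionStrategy, and the option names come from the table above; the node variable and the wrapping call are hypothetical illustrations, not NPipeline's actual API.

```csharp
// Retry Options: controls resilience behavior boundaries
var retryOptions = new PipelineRetryOptions(
    MaxItemRetries: 3,           // per-item retry budget for Error Handling
    MaxNodeRestartAttempts: 2,   // enables node-level restarts
    MaxMaterializedItems: 1000); // Materialization & Buffering: replay window

// ResilientExecutionStrategy: the prerequisite wrapper — without it,
// recovery decisions are ignored. Hypothetical wrapping call:
// node.ExecutionStrategy = new ResilientExecutionStrategy(innerStrategy);
```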
⚠️ Critical Prerequisites for Node Restart (RestartNode)
If you intend to use PipelineErrorDecision.RestartNode to recover from failures, read the Node Restart Quick Start Checklist first.
You must configure all three of the following mandatory prerequisites. The quickstart guide provides detailed step-by-step instructions for each requirement.
💡 Pro Tip: The NPipeline build-time analyzer (NP9002) detects incomplete resilience configurations at compile-time, preventing these silent failures. See Build-Time Resilience Analyzer for details.
Mandatory Requirements Summary
- Requirement 1: ResilientExecutionStrategy
  - The node must be wrapped with ResilientExecutionStrategy
  - Without this: restart decisions are ignored; the node cannot recover
  - See detailed instructions: Node Restart Quick Start Checklist
- Requirement 2: MaxNodeRestartAttempts Configuration
  - Set MaxNodeRestartAttempts > 0 in PipelineRetryOptions
  - This enables the restart capability
  - See detailed instructions: Node Restart Quick Start Checklist
- Requirement 3: MaxMaterializedItems Configuration
  - Set MaxMaterializedItems > 0 in PipelineRetryOptions (for streaming inputs)
  - This enables the input stream to be buffered/materialized for replay
  - Critical: Without this, even if RestartNode is requested, the pipeline falls back to FailPipeline
  - See detailed instructions: Node Restart Quick Start Checklist
What Happens If You Miss These
| Missing Component | What Goes Wrong | Observable Behavior |
|---|---|---|
| ResilientExecutionStrategy | Restart capability disabled | Error handler decisions are ignored; pipeline always fails |
| MaxMaterializedItems | Input stream not buffered | RestartNode falls back to FailPipeline; entire pipeline halts unexpectedly |
| Error Handler RestartNode | Restart never triggered | All errors result in pipeline failure, even recoverable ones |
Example of Silent Failure:
```csharp
// ❌ WRONG: Missing materialization
var options = new PipelineRetryOptions(
    MaxItemRetries: 3,
    MaxNodeRestartAttempts: 2,
    MaxMaterializedItems: null // ← This is the problem!
);

// The developer expects RestartNode to work, but...
// When an error occurs and the handler returns RestartNode:
// → The pipeline sees MaxMaterializedItems is not set
// → It falls back to FailPipeline
// → The entire pipeline halts (unexpected failure!)
```
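For contrast, here is a corrected version of the same options. The buffer size of 1000 is an illustrative value; choose it based on your stream's replay window and memory budget.

```csharp
// ✅ CORRECT: Materialization enabled, so RestartNode can replay input
var options = new PipelineRetryOptions(
    MaxItemRetries: 3,
    MaxNodeRestartAttempts: 2,
    MaxMaterializedItems: 1000 // ← buffers up to 1000 items for replay
);

// The node must also be wrapped with ResilientExecutionStrategy, and the
// error handler must return PipelineErrorDecision.RestartNode.
```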
For complete configuration examples and detailed explanations, see the Node Restart Quick Start Checklist.
The Dependency Chain
Understanding the dependency relationships between resilience components is crucial for proper configuration:
Figure: The dependency chain showing how resilience components must be configured in the correct sequence.
Critical Dependency Rules
- ResilientExecutionStrategy is mandatory: All resilience features require this strategy to be applied to a node
- Materialization enables restarts: PipelineErrorDecision.RestartNode only works if the input stream is materialized via MaxMaterializedItems
- Buffer size matters: The MaxMaterializedItems value determines how many items can be replayed during a restart
- Streaming inputs need materialization: Only streaming inputs require explicit materialization; already-buffered inputs work automatically
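These rules can be sketched as an error handler. The delegate shape shown here is a hypothetical illustration; only the PipelineErrorDecision values come from this page.

```csharp
// Hypothetical handler signature — only the PipelineErrorDecision values
// are taken from this page.
PipelineErrorDecision OnNodeError(Exception error, int restartAttempt)
{
    // RestartNode only helps if the input is materialized
    // (MaxMaterializedItems > 0) and the node is wrapped in
    // ResilientExecutionStrategy; otherwise it degrades to FailPipeline.
    if (error is TimeoutException && restartAttempt < 2)
        return PipelineErrorDecision.RestartNode; // replay buffered input

    return PipelineErrorDecision.FailPipeline;    // unrecoverable: stop
}
```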
Decision Flow for Choosing Resilience Strategies
Use this flow diagram to determine the appropriate resilience configuration for your use case:
Key Scenarios
Scenario 1: Simple Retry Logic
For handling transient failures without node restarts:
- Apply ResilientExecutionStrategy
- Configure NodeErrorDecision.Retry or NodeErrorDecision.Skip
- No materialization required
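A minimal sketch of item-level handling for this scenario; the delegate shape is a hypothetical illustration, and only the NodeErrorDecision values appear on this page.

```csharp
// Hypothetical item-level handler — only the NodeErrorDecision values
// come from this page.
NodeErrorDecision OnItemError(Exception error)
{
    if (error is TimeoutException)
        return NodeErrorDecision.Retry; // transient: retry the item

    return NodeErrorDecision.Skip;      // bad data: skip and continue
}
```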
Scenario 2: Node Restart Capability
For recovering from node-level failures:
- Apply ResilientExecutionStrategy
- Configure PipelineErrorDecision.RestartNode
- Set MaxMaterializedItems to enable replay (for streaming inputs)
- See detailed configuration: Node Restart Quick Start Checklist
Scenario 3: Memory-Constrained Environment
For systems with limited memory:
- Apply ResilientExecutionStrategy
- Set MaxMaterializedItems to a conservative value
- Monitor for buffer overflow exceptions
- Consider alternative recovery strategies
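A sketch of a memory-conscious configuration; the specific numbers are illustrative, not recommendations. A small replay buffer caps memory use at the cost of restart coverage.

```csharp
// Illustrative values for a memory-constrained deployment: a small
// replay buffer limits memory pressure but shortens the replay window.
var options = new PipelineRetryOptions(
    MaxItemRetries: 2,
    MaxNodeRestartAttempts: 1,
    MaxMaterializedItems: 100 // conservative; watch for buffer overflow
);
```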
Next Steps
- Node Restart Quick Start Checklist: Complete step-by-step guide for configuring node restarts
- Retry Delay Quickstart: Get started quickly with common retry patterns and recommended configurations
- Resilient Execution Strategy: Learn how to wrap execution strategies with error handling
- Materialization and Buffering: Understand how buffering enables replay functionality
- Dependency Chains: Explore the critical prerequisite relationships in detail
- Configuration Guide: Get practical implementation guidance with code examples
- Circuit Breaker Advanced Configuration: Learn when to tune circuit breaker memory cleanup and how defaults behave
- Troubleshooting: Diagnose and resolve common resilience issues