Data Cleansing Nodes

Cleansing nodes normalize and clean data properties. They perform in-place transformations without changing the object's type or structure.

String Cleansing

Clean and normalize text data:

builder.AddStringCleansing<User>(x => x.Email)
    .Trim()
    .ToLower();

Available Operations

| Operation | Purpose | Example |
|---|---|---|
| Trim() | Remove leading/trailing whitespace | " hello " → "hello" |
| TrimStart() | Remove leading whitespace | " hello" → "hello" |
| TrimEnd() | Remove trailing whitespace | "hello " → "hello" |
| CollapseWhitespace() | Collapse multiple spaces | "hello   world" → "hello world" |
| RemoveWhitespace() | Remove all whitespace | "hello world" → "helloworld" |
| ToLower() | Convert to lowercase | "Hello" → "hello" |
| ToUpper() | Convert to uppercase | "hello" → "HELLO" |
| ToTitleCase() | Convert to title case | "hello world" → "Hello World" |
| ToCamelCase() | Convert to camelCase | "hello_world" → "helloWorld" |
| ToPascalCase() | Convert to PascalCase | "hello_world" → "HelloWorld" |
| ToKebabCase() | Convert to kebab-case | "helloWorld" → "hello-world" |
| RemoveSpecialCharacters() | Remove non-alphanumeric | "hello@world!" → "helloworld" |
| RemoveDigits() | Remove numeric characters | "hello123" → "hello" |
| RemoveNonAscii() | Remove non-ASCII characters | "café" → "caf" |
| Truncate(length) | Truncate to max length | "hello world" → "hello" (length 5) |
| EnsurePrefix(prefix) | Add prefix if missing | "world" → "hello world" (prefix "hello ") |
| EnsureSuffix(suffix) | Add suffix if missing | "hello" → "hello world" (suffix " world") |
| Replace(old, new) | Replace substring | "hello" → "hallo" (replace "e" with "a") |
| StripDiacritics() | Remove accent marks | "café" → "cafe" |
| DefaultIfNullOrWhitespace(default) | Use default for empty | "" → "N/A" |
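
The table shows RemoveNonAscii() and StripDiacritics() giving different results for "café". The following is a minimal sketch of that difference using only standard .NET calls. The class and method bodies are illustrative assumptions, not NPipeline's actual implementation, and the input is assumed to use the precomposed "é" (U+00E9).

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper class; it only mirrors the behaviour listed in the table.
public static class StringCleansingSketch
{
    // Decompose accented letters (NFD), drop the combining marks,
    // then recompose (NFC). The base letter survives.
    public static string StripDiacritics(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char ch in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Drop every character above U+007F. An accented letter stored as a
    // single precomposed character disappears entirely.
    public static string RemoveNonAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            if (ch <= 0x7F)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString();
    }
}
```

If the accented text should stay readable, stripping diacritics before removing non-ASCII characters keeps the base letters; the reverse order loses them.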

Examples

// Email normalization
builder.AddStringCleansing<User>(x => x.Email)
    .Trim()
    .ToLower()
    .DefaultIfNullOrWhitespace("no-email@example.com");

// Name normalization
builder.AddStringCleansing<Person>(x => x.FirstName)
    .Trim()
    .ToTitleCase();

// Username cleanup
builder.AddStringCleansing<Account>(x => x.Username)
    .Trim()
    .ToLower()
    .RemoveSpecialCharacters();

// Text sanitization
builder.AddStringCleansing<Document>(x => x.Title)
    .Trim()
    .RemoveNonAscii()
    .Truncate(100);

Numeric Cleansing

Clean and normalize numeric data:

builder.AddNumericCleansing<Order>(x => x.Discount)
    .Clamp(0, 100)
    .Round(2);

Available Operations

| Operation | Types | Example |
|---|---|---|
| Round(digits) | double, decimal | 3.14159 → 3.14 |
| Floor() | double | 3.9 → 3.0 |
| Ceiling() | double | 3.1 → 4.0 |
| Clamp(min, max) | all numeric | 150 → 100 (clamped to max) |
| Clamp(nullable, min, max) | nullable numeric | clamps while preserving null |
| Min(minValue) | int, double, decimal | convenience method: Clamp(minValue, max) |
| Max(maxValue) | int, double, decimal | convenience method: Clamp(min, maxValue) |
| AbsoluteValue() | double, decimal | -5.5 → 5.5 |
| Scale(factor) | decimal | 10m × 2.5m → 25m |
| DefaultIfNull(default) | all nullable | null → 0 |
| ToZeroIfNegative() | double, decimal | -5.5 → 0 |

Note: The type-specific overload is selected from the type of the property being cleansed. For example, Round() works with both double and decimal properties through method overloading.
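
The table does not say which midpoint rule Round(digits) applies. As a hedged illustration of why that can matter, the sketch below uses plain Math.Round and Math.Clamp from .NET, where the default midpoint rule is MidpointRounding.ToEven. The class and method names are hypothetical, and nothing here is taken from NPipeline's source.

```csharp
using System;

// Hypothetical helpers built on standard .NET calls only.
public static class RoundingSketch
{
    // Banker's rounding: a value exactly halfway goes to the even neighbour.
    public static decimal RoundToEven(decimal value, int digits)
        => Math.Round(value, digits, MidpointRounding.ToEven);

    // "Schoolbook" rounding: a value exactly halfway moves away from zero.
    public static decimal RoundAwayFromZero(decimal value, int digits)
        => Math.Round(value, digits, MidpointRounding.AwayFromZero);

    // Mirrors a Clamp(min, max) followed by Round(digits) chain.
    public static decimal ClampThenRound(decimal value, decimal min, decimal max, int digits)
        => Math.Round(Math.Clamp(value, min, max), digits);
}
```

With these helpers, 0.125m rounded to two digits gives 0.12m under ToEven and 0.13m under AwayFromZero, so it may be worth checking which rule a pipeline relies on before rounding monetary values.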

Examples

// Price normalization
builder.AddNumericCleansing<Product>(x => x.Price)
    .Clamp(0, decimal.MaxValue)
    .Round(2);

// Discount clamping
builder.AddNumericCleansing<Order>(x => x.Discount)
    .Clamp(0, 100);

// Percentage cleanup
builder.AddNumericCleansing<Survey>(x => x.CompletionRate)
    .Clamp(0, 100)
    .Round(1);

// Age normalization
builder.AddNumericCleansing<Person>(x => x.Age)
    .Clamp(0, 150);

// Absolute value conversion
builder.AddNumericCleansing<Measurement>(x => x.Value)
    .AbsoluteValue();

// Negative value handling
builder.AddNumericCleansing<Transaction>(x => x.Amount)
    .ToZeroIfNegative();

Numeric Constraints with Min/Max

Use Min/Max helper methods for single-bound constraints:

// Ensure age is at least 0 (cleaner than Clamp(0, int.MaxValue))
builder.AddNumericCleansing<Person>(x => x.Age)
    .Min(0);

// Ensure quantity doesn't exceed 10000
builder.AddNumericCleansing<Order>(x => x.Quantity)
    .Max(10000);

// Ensure price is at least 0.01
builder.AddNumericCleansing<Product>(x => x.Price)
    .Min(0.01m);

// Clamp only the upper bound (discount can't exceed 100%)
builder.AddNumericCleansing<Order>(x => x.DiscountPercent)
    .Max(100);

// Nullable numeric with minimum constraint
builder.AddNumericCleansing<Item>(x => x.OptionalStock)
    .Min(0); // null values pass through, non-null values enforced >= 0
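
One detail that is easy to trip over: a lower bound such as Min(0) raises values, so a hand-written equivalent uses Math.Max, and an upper bound uses Math.Min. The sketch below shows that semantics together with the null pass-through described in the comment above. The AtLeast and AtMost names are hypothetical stand-ins chosen to avoid confusion with the library's Min and Max.

```csharp
using System;

// Hypothetical stand-ins for single-bound constraints.
public static class BoundSketch
{
    // Lower bound: anything below minValue is raised to minValue.
    public static int AtLeast(int value, int minValue) => Math.Max(value, minValue);

    // Upper bound: anything above maxValue is lowered to maxValue.
    public static int AtMost(int value, int maxValue) => Math.Min(value, maxValue);

    // Nullable variant: null passes through, non-null values are bounded.
    public static int? AtLeast(int? value, int minValue)
        => value.HasValue ? Math.Max(value.Value, minValue) : (int?)null;
}
```

For example, AtLeast(-3, 0) yields 0, AtMost(25000, 10000) yields 10000, and AtLeast((int?)null, 0) stays null.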

DateTime Cleansing

Clean and normalize date/time data:

builder.AddDateTimeCleansing<Event>(x => x.StartTime)
    .SpecifyKind(DateTimeKind.Utc)
    .ToUtc();

Available Operations

| Operation | Purpose | Supports Nullable |
|---|---|---|
| SpecifyKind(kind) | Set DateTimeKind | DateTime only |
| ToUtc() | Convert to UTC | Both |
| ToLocal() | Convert to local time | Both |
| StripTime() | Remove time component | Both |
| Truncate(precision) | Truncate to precision | Both |
| RoundToMinute() | Round to nearest minute | DateTime, DateTime? |
| RoundToHour() | Round to nearest hour | DateTime, DateTime? |
| RoundToDay() | Round to nearest day | DateTime, DateTime? |
| Clamp(min, max) | Constrain to range | DateTime, DateTime? |
| DefaultIfMinValue(default) | Replace MinValue | DateTime only |
| DefaultIfMaxValue(default) | Replace MaxValue | DateTime only |
| DefaultIfNull(default) | Replace null values | DateTime? only |
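
SpecifyKind(kind) and ToUtc() are easy to confuse. In the .NET base library, DateTime.SpecifyKind only relabels a value, while ToUniversalTime shifts the clock time when the value is local or unspecified. The sketch below illustrates that distinction with standard calls; it assumes the cleansing operations behave like their .NET counterparts, which the table above does not confirm. Class and method names are hypothetical.

```csharp
using System;

// Hypothetical helpers wrapping standard DateTime calls.
public static class DateTimeKindSketch
{
    // Relabel only: the ticks (clock reading) stay the same, Kind becomes Utc.
    public static DateTime StampAsUtc(DateTime value)
        => DateTime.SpecifyKind(value, DateTimeKind.Utc);

    // Convert: Local and Unspecified values are shifted by the machine's
    // UTC offset; values already marked Utc are returned unchanged.
    public static DateTime ConvertToUtc(DateTime value)
        => value.ToUniversalTime();
}
```

Under that assumption, stamping a value as UTC first makes a following conversion a no-op, which suits timestamps that are already known to be UTC but arrive with an unspecified kind.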

Examples

// Timestamp normalization
builder.AddDateTimeCleansing<Transaction>(x => x.Timestamp)
    .SpecifyKind(DateTimeKind.Utc)
    .ToUtc();

// Event time cleanup with rounding
builder.AddDateTimeCleansing<Event>(x => x.StartTime)
    .ToUtc()
    .RoundToMinute();

// Date normalization (remove time)
builder.AddDateTimeCleansing<Document>(x => x.CreatedDate)
    .StripTime();

// Default handling for edge cases
builder.AddDateTimeCleansing<Record>(x => x.DateField)
    .DefaultIfMinValue(DateTime.UtcNow)
    .DefaultIfMaxValue(DateTime.UtcNow);

DateTime Rounding and Clamping

Round times and constrain to ranges:

// Round timestamps to nearest minute for metrics
builder.AddDateTimeCleansing<Metric>(x => x.RecordedAt)
    .RoundToMinute();

// Round optional timestamps
builder.AddDateTimeCleansing<Event>(x => x.OptionalEndTime)
    .RoundToMinute(); // null values pass through unchanged

// Round to nearest hour for reports
builder.AddDateTimeCleansing<Report>(x => x.GeneratedAt)
    .RoundToHour();

// Clamp dates to valid range
// (the bounds are computed once, when this call runs, not per item)
builder.AddDateTimeCleansing<Contract>(x => x.StartDate)
    .Clamp(
        DateTime.Now.AddYears(-1),
        DateTime.Now.AddYears(10),
        "Start date must be between 1 year ago and 10 years from now");

// Clamp optional dates
builder.AddDateTimeCleansing<Reservation>(x => x.OptionalCheckoutDate)
    .Clamp(DateTime.Now, DateTime.Now.AddYears(2));
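
To make "round to nearest" concrete, here is a minimal sketch of tick-based rounding with a null pass-through. It assumes that a value exactly halfway between two boundaries rounds up, which the documentation above does not specify. The class and method names are hypothetical and this is not NPipeline's implementation.

```csharp
using System;

// Hypothetical helpers; midpoint values are assumed to round up.
public static class DateTimeRoundingSketch
{
    // Round to the nearest multiple of interval, keeping the original Kind.
    // Values within half an interval of DateTime.MaxValue would overflow.
    public static DateTime RoundTo(DateTime value, TimeSpan interval)
    {
        if (interval <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(interval));
        }
        long half = interval.Ticks / 2;
        long rounded = (value.Ticks + half) / interval.Ticks * interval.Ticks;
        return new DateTime(rounded, value.Kind);
    }

    // Nullable variant: null passes through unchanged.
    public static DateTime? RoundTo(DateTime? value, TimeSpan interval)
        => value.HasValue ? RoundTo(value.Value, interval) : (DateTime?)null;

    // Dropping the time component leaves midnight of the same day.
    public static DateTime StripTime(DateTime value) => value.Date;
}
```

With a one-minute interval, 10:30:29 becomes 10:30:00 and 10:30:30 becomes 10:31:00. Passing TimeSpan.FromHours(1) or TimeSpan.FromDays(1) gives the hour and day variants.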

Collection Cleansing

Clean and normalize collection properties:

builder.AddCollectionCleansing<Document>(x => x.Tags)
    .RemoveNulls()
    .RemoveDuplicates()
    .Sort();

Available Operations

| Operation | Purpose | Example |
|---|---|---|
| RemoveNulls() | Remove null entries | [1, null, 3] → [1, 3] |
| RemoveDuplicates() | Remove duplicates | [1, 2, 1, 3] → [1, 2, 3] |
| RemoveEmpty() | Remove empty strings | ["a", "", "b"] → ["a", "b"] |
| RemoveWhitespace() | Remove whitespace strings | ["a", " ", "b"] → ["a", "b"] |
| Sort() | Sort ascending | [3, 1, 2] → [1, 2, 3] |
| Reverse() | Reverse order | [1, 2, 3] → [3, 2, 1] |
| Take(count) | Take first N items | [1, 2, 3, 4, 5] → [1, 2, 3] (count 3) |
| Skip(count) | Skip first N items | [1, 2, 3, 4, 5] → [4, 5] (count 3) |
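
For readers who think in LINQ, the operations in the table map roughly onto standard query operators. The sketch below is one possible equivalent of a RemoveNulls, RemoveEmpty, RemoveDuplicates, Sort chain. It is an illustration with hypothetical names, and it assumes ordinal string comparison, which the table does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical LINQ equivalent of a tag-cleanup chain.
public static class CollectionCleansingSketch
{
    public static List<string> CleanTags(IEnumerable<string> tags)
    {
        return tags
            .Where(t => !string.IsNullOrEmpty(t))       // RemoveNulls + RemoveEmpty
            .Distinct(StringComparer.Ordinal)            // RemoveDuplicates
            .OrderBy(t => t, StringComparer.Ordinal)     // Sort ascending
            .ToList();
    }

    // Skip(count) followed by Take(count) selects a window of items.
    public static List<int> Window(IEnumerable<int> items, int skip, int take)
        => items.Skip(skip).Take(take).ToList();
}
```

Sorting after deduplication keeps the result deterministic regardless of the input order.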

Examples

// Tag cleanup
builder.AddCollectionCleansing<Article>(x => x.Tags)
    .RemoveNulls()
    .RemoveEmpty()
    .RemoveDuplicates()
    .Sort();

// Category deduplication
builder.AddCollectionCleansing<Product>(x => x.Categories)
    .RemoveNulls()
    .RemoveDuplicates()
    .Sort();

// Email list cleaning
builder.AddCollectionCleansing<MailingList>(x => x.Emails)
    .RemoveNulls()
    .RemoveEmpty()
    .RemoveDuplicates();

Chaining Operations

Operations can be chained fluently:

// Multiple operations on the same property
// (RemoveSpecialCharacters() is not used on emails: it would strip '@' and '.')
builder.AddStringCleansing<Account>(x => x.Username)
    .Trim()
    .ToLower()
    .RemoveSpecialCharacters()
    .DefaultIfNullOrWhitespace("unknown");

// Multiple properties
builder.AddStringCleansing<Person>(x => x.FirstName)
    .Trim()
    .ToTitleCase();

builder.AddStringCleansing<Person>(x => x.LastName)
    .Trim()
    .ToTitleCase();

builder.AddStringCleansing<Person>(x => x.Email)
    .Trim()
    .ToLower();
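
When several operations are chained, their order can change the result. The sketch below models a chain as a list of functions applied left to right, under the assumption that cleansing operations run in the order they are chained. All names are hypothetical and the code does not use NPipeline.

```csharp
using System;
using System.Linq;

// Hypothetical model of a chain: each step receives the previous step's output.
public static class ChainOrderSketch
{
    public static string Apply(string input, params Func<string, string>[] steps)
        => steps.Aggregate(input, (current, step) => step(current));

    // Keep at most max characters.
    public static string Truncate(string s, int max)
        => s.Length <= max ? s : s.Substring(0, max);
}
```

Trimming "  hello world" and then truncating to 5 characters gives "hello". Truncating first gives "  hel", which trims down to "hel". Putting Trim() before length-sensitive operations is therefore usually the safer order.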

Thread Safety

All cleansing nodes are stateless and thread-safe. They can be safely shared across parallel pipelines.

Performance

Cleansing nodes are optimized for performance:

  • Property access uses compiled expressions (not reflection)
  • String operations use StringBuilder to minimize allocations
  • Numeric operations use native types (no boxing)
  • Collection operations are evaluated lazily where possible
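
As an illustration of the first point, a property selector such as x => x.Email can be compiled once into a getter and a setter delegate, so that per-item access avoids reflection. The sketch below shows the general technique with System.Linq.Expressions. PropertyAccessor and SampleUser are hypothetical names, and NPipeline's internals may differ.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical accessor built once per property, then reused for every item.
public sealed class PropertyAccessor<T, TProp> where T : class
{
    public Func<T, TProp> Get { get; }
    public Action<T, TProp> Set { get; }

    public PropertyAccessor(Expression<Func<T, TProp>> selector)
    {
        if (selector.Body is not MemberExpression member)
        {
            throw new ArgumentException(
                "Selector must be a simple member access such as x => x.Email.",
                nameof(selector));
        }

        // Getter: compile the selector itself.
        Get = selector.Compile();

        // Setter: build (target, value) => target.Member = value and compile it.
        // Expression.Assign throws if the member is not writable.
        ParameterExpression target = selector.Parameters[0];
        ParameterExpression value = Expression.Parameter(typeof(TProp), "value");
        Set = Expression
            .Lambda<Action<T, TProp>>(Expression.Assign(member, value), target, value)
            .Compile();
    }
}

// Hypothetical model type used only to exercise the accessor.
public sealed class SampleUser
{
    public string Email { get; set; } = "";
}
```

A cleansing step could then read, transform, and write back in place, for example accessor.Set(user, accessor.Get(user).Trim().ToLowerInvariant()), paying the compilation cost only once when the accessor is created.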

Error Handling

Cleansing nodes integrate with NPipeline's error handling. In the snippets below, cleanseHandle refers to the handle of the cleansing node being configured:

// Custom error handler for cleansing failures
builder.WithErrorHandler(cleanseHandle, new CleansingErrorHandler());

// Continue processing even if cleansing fails
builder.WithErrorDecision(cleanseHandle, NodeErrorDecision.Skip);