Distributed LINQ for the
petabyte era

Write idiomatic C# LINQ that executes everywhere, from your local dev machine to petabyte-scale Apache Spark and Snowflake clusters. Let IntelliSense and the compiler do the work.

> dotnet add package DataLinq.NET
// 1. CHOOSE YOUR TARGET (Zero logic changes required)
// await using var context = Spark.Connect("yarn");               // Apache Spark
await using var context = Snowflake.Connect("enterprise-account"); // Snowflake

// 2. STREAM THE DATA (O(1) Memory Footprint)
// var data = Read.Csv<Order>("local_dump.csv");                  // Local dev
var data = context.Read.Table<Order>("sales.orders");             // Cluster execution

// 3. COMPILE-TIME SAFE PIPELINE
await data
    .Where(o => o.Amount > 1000) // <-- Pushed down to native SQL optimizer
    .Cases(
        o => o.Amount > 50000, 
        o => o.IsInternational
    )
    .SelectCase(
        vip => EnrichVip(vip),     // <-- Auto-deployed as a cluster UDF!
        intl => EnrichIntl(intl),
        std => std                 // <-- Supra pattern catch-all handled natively
    )
    .AllCases()
    .WriteTable("analytics.processed_orders"); // <-- Zero data hits local RAM

How DataLinq.NET will make your life easier

01. O(1) Memory Footprint

Process billion-row CSVs without exhausting RAM. DataLinq's streaming, row-by-row SUPRA architecture materializes one record at a time, so memory usage stays flat regardless of file size.
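As a minimal sketch of what flat-memory streaming looks like in practice, assuming the `Read.Csv<T>` entry point shown above yields an `IAsyncEnumerable<T>` (the `Order` record and file name are placeholders):

```csharp
// Hypothetical Order record matching the CSV columns.
public record Order(string Id, decimal Amount, bool IsInternational);

// Rows are materialized one at a time; no row is retained after it
// is consumed, so RAM usage stays flat however large the file is.
await foreach (var order in Read.Csv<Order>("billion_rows.csv"))
{
    if (order.Amount > 1000)
        Console.WriteLine(order.Id);
}
```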

02. Server-Side C# UDFs

Call your own C# methods inside `.Where()` or `.Select()`. We automatically package and deploy your code as server-side functions. No manual infrastructure required.
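For illustration, a pipeline that calls an ordinary C# method inside `.Select()`; under the model described above, DataLinq would package `NormalizeCurrency` and register it as a server-side UDF (the method body and table names are invented for this sketch):

```csharp
// An ordinary static C# method; nothing UDF-specific about it.
static decimal NormalizeCurrency(decimal amount, bool isInternational)
    => isInternational ? amount * 1.08m : amount;

await using var context = Snowflake.Connect("enterprise-account");

await context.Read.Table<Order>("sales.orders")
    .Select(o => NormalizeCurrency(o.Amount, o.IsInternational)) // deployed as a cluster UDF
    .WriteTable("analytics.normalized_amounts");
```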

03. Type Safety & IntelliSense

No more string-based Python scripts crashing your pipeline four hours into a run. If it compiles, it runs. Strong typing catches whole classes of data integration errors before your job ever reaches the cluster.

04. Distributed State Sync

Run distributed `ForEach` loops across your server clusters. Our Delta Reflection Protocol automatically synchronizes captured instance variables and counters back to your local C# application.
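A sketch of the idea, assuming `ForEach` executes the lambda on cluster workers and that the advertised synchronization is what carries the captured counter back to the driver (the threshold and table name are placeholders):

```csharp
var flagged = 0; // local counter; per the description above, the Delta
                 // Reflection Protocol syncs it back after the cluster run

await context.Read.Table<Order>("sales.orders")
    .ForEach(o =>
    {
        // Runs on cluster workers, not locally.
        if (o.Amount > 50000) flagged++;
    });

Console.WriteLine($"Flagged {flagged} high-value orders.");
```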

05. Zero-Allocation Engine

Our custom `ObjectMaterializer` materializes rows 4x faster than standard reflection. Built entirely without external dependencies, so the framework stays lean and self-contained.

06. EF Core Synergy & Integration

Because EF Core exposes query results as `IAsyncEnumerable<T>`, developers can use `Merge` primitives to natively join live SQL Server database streams with their Snowflake datasets in memory.
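Illustratively, joining an EF Core stream with a Snowflake table; only the `Merge` primitive itself is named above, so the `AppDbContext`, the key selectors, and the exact `Merge` signature here are all assumptions:

```csharp
await using var db = new AppDbContext();                 // EF Core (SQL Server)
await using var context = Snowflake.Connect("enterprise-account");

// EF Core exposes the query as IAsyncEnumerable<Customer>.
var customers = db.Customers.AsAsyncEnumerable();
var orders = context.Read.Table<Order>("sales.orders");

// Hypothetical Merge shape: join two async streams on a key.
var joined = orders.Merge(customers,
    o => o.CustomerId,
    c => c.Id,
    (o, c) => new { c.Name, o.Amount });
```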

07. Build-Time Diagnostics

Catch Big Data performance bottlenecks in your IDE, before you even press F5. Roslyn analyzers warn you instantly if a custom method prevents predicate pushdown.
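For example, a predicate that calls an opaque local method cannot be translated to native SQL, so the filter would run client-side instead of being pushed down; the analyzers described above would flag this in the IDE (the diagnostic ID `DL0001` and the `IsSuspicious` helper are invented for illustration):

```csharp
// Translatable: compiles down to a native SQL WHERE clause.
data.Where(o => o.Amount > 1000);

// Not translatable: IsSuspicious is opaque to the SQL generator,
// so the whole table would stream to the client-side filter.
// Hypothetical analyzer warning: DL0001 "predicate prevents pushdown".
data.Where(o => IsSuspicious(o));
```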

08. Zero Data Exfiltration

Your data never leaves your secure cloud cluster; only the compiled C# expression tree is sent over the wire. Designed to fit HIPAA, GDPR, and enterprise security requirements.

09. FinOps & Cloud Cost Savings

Generating highly optimized, push-down SQL via LINQ saves thousands in cluster compute costs compared to unoptimized Python scripts that accidentally pull entire datasets into memory.

Stop paying the Python Tax

Dynamically typed Python scripts fail at runtime, hours into a cluster job. DataLinq brings compile-time safety to Big Data.

python_script.py (Runtime Error)

# Typo? You'll find out in 45 minutes when the cluster crashes.
df.filter(pl.col("ammount") > 1000)  # ✕ DataError

DataLinq.cs (Compile Time)

// Typo? Won't compile. Fails instantly in your IDE.
.Where(o => o.Amount > 1000)  // ✓ Safe