Blog Posts

Andrew A Lamb

2025-09-10 [Apache DataFusion Blog] Dynamic Filters: Passing Information Between Operators During Execution for 25x Faster Queries Blog

2025-08-15 [Apache DataFusion Blog] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet Blog

2025-07-14 [Apache DataFusion Blog] Embedding User-Defined Indexes in Apache Parquet Files Blog

2025-07-22 [The New Stack] Why Startups Are Betting Everything on Apache DataFusion Blog

2025-04-19 [Apache DataFusion Blog] User Defined Window Functions in DataFusion Blog

2025-04-10 [Apache DataFusion Blog] tpchgen-rs World’s fastest open source TPC-H data generator, written in Rust Blog (recording on YouTube)

2025-03-31 [InfluxData Blog] Optimizing SQL (and DataFrames) in DataFusion: Part 2 Blog

2025-03-31 [InfluxData Blog] Optimizing SQL (and DataFrames) in DataFusion: Part 1 Blog

2025-01-08 [Apache DataFusion Blog] Using Ordering for Better Plans in Apache DataFusion Blog

2025-01-08 [InfluxData Blog] 2025: The Year of 1,000 DataFusion-Based Systems Blog

2025-01-06 [InfluxData Blog] Apache DataFusion Meetup: Chicago December 2024 Recap Blog

2024-11-18 [Apache DataFusion Blog] Apache DataFusion is now the fastest single node engine for querying Apache Parquet files Blog

2024-09-03 [InfluxData Blog] Using StringView / German Style Strings to Make Queries Faster: Part 2 - String Operations Blog

2024-09-03 [InfluxData Blog] Using StringView / German Style Strings to Make Queries Faster: Part 1 - Reading Parquet Blog

2024-03-18 [InfluxData Blog] Making Most Recent Value Queries Hundreds of Times Faster Blog

2023-10-25 [InfluxData Blog] Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0 Blog

2023-08-01 [InfluxData Blog] Aggregating Millions of Groups Fast in Apache Arrow DataFusion Blog (cross post on Apache Arrow Blog)

2022-12-07 [InfluxData Blog] Querying Parquet with Millisecond Latency Blog (cross post on Apache Arrow Blog)

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 2 Blog

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 1 Blog

2022-10-27 [ODBMS.org] On InfluxData’s New Storage Engine. Q&A with Andrew Lamb Blog

2022-10-17 [Apache Arrow Blog] Arrow and Parquet Part 3: Arbitrary Nesting with Lists of Structs and Structs of Lists Blog

2022-10-08 [Apache Arrow Blog] Arrow and Parquet Part 2: Nested and Hierarchical Data using Structs and Lists Blog

2022-10-05 [Apache Arrow Blog] Arrow and Parquet Part 1: Primitive Types and Nullability Blog

2022-01-14 [InfluxData Blog] Rust Object Store Donation Blog

2022-01-14 [The New Stack] Using Rustlang’s Async Tokio Runtime for CPU-Bound Tasks Blog