Blog Posts

Andrew A Lamb

2025-08-15 [Apache DataFusion Blog] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet

2025-07-14 [Apache DataFusion Blog] Embedding User-Defined Indexes in Apache Parquet Files

2025-07-22 [The New Stack] Why Startups Are Betting Everything on Apache DataFusion

2025-04-19 [Apache DataFusion Blog] User Defined Window Functions in DataFusion

2025-04-10 [Apache DataFusion Blog] tpchgen-rs World’s fastest open source TPC-H data generator, written in Rust (recording on YouTube)

2025-03-31 [InfluxData Blog] Optimizing SQL (and DataFrames) in DataFusion: Part 2

2025-03-31 [InfluxData Blog] Optimizing SQL (and DataFrames) in DataFusion: Part 1

2025-01-08 [Apache DataFusion Blog] Using Ordering for Better Plans in Apache DataFusion

2025-01-08 [InfluxData Blog] 2025: The Year of 1,000 DataFusion-Based Systems

2025-01-06 [InfluxData Blog] Apache DataFusion Meetup: Chicago December 2024 Recap

2024-11-18 [Apache DataFusion Blog] Apache DataFusion is now the fastest single node engine for querying Apache Parquet files

2024-09-03 [InfluxData Blog] Using StringView / German Style Strings to Make Queries Faster: Part 2 - String Operations

2024-09-03 [InfluxData Blog] Using StringView / German Style Strings to Make Queries Faster: Part 1 - Reading Parquet

2024-03-18 [InfluxData Blog] Making Most Recent Value Queries Hundreds of Times Faster

2023-10-25 [InfluxData Blog] Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0

2023-08-01 [InfluxData Blog] Aggregating Millions of Groups Fast in Apache Arrow DataFusion (cross post on Apache Arrow Blog)

2022-12-07 [InfluxData Blog] Querying Parquet with Millisecond Latency (cross post on Apache Arrow Blog)

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 2

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 1

2022-10-27 [ODBMS.org] On InfluxData’s New Storage Engine. Q&A with Andrew Lamb

2022-10-17 [Apache Arrow Blog] Arrow and Parquet Part 3: Arbitrary Nesting with Lists of Structs and Structs of Lists

2022-10-08 [Apache Arrow Blog] Arrow and Parquet Part 2: Nested and Hierarchical Data using Structs and Lists

2022-10-05 [Apache Arrow Blog] Arrow and Parquet Part 1: Primitive Types and Nullability

2022-01-14 [InfluxData Blog] Rust Object Store Donation

2022-01-14 [The New Stack] Using Rustlang’s Async Tokio Runtime for CPU-Bound Tasks