Andrew A Lamb

Staff Engineer at InfluxData | Member Apache Software Foundation | Apache Arrow PMC

LinkedIn | GitHub
Last Update: April, 2024

I am a software engineer with experience in environments ranging from two developers in a VC's office to large multinational corporations and distributed open source projects (I love small companies). Technically, I focus on systems (e.g. databases) and platform engineering, and have been both an architect and a manager/VP.

I currently work in Rust on InfluxDB, focused on query processing and the Apache Arrow ecosystem. I am honored to serve on the Apache Arrow PMC (2023 chair) and actively contribute to the Apache Arrow DataFusion query engine and the Apache Arrow Rust implementation.

Selected Technical Writing

(More Conference Papers below)

2024-06-19 [SIGMOD 2024] Apache Arrow DataFusion: a Fast, Embeddable, Modular Analytic Query Engine

2024-03-18 [InfluxData Blog] Making Most Recent Value Queries Hundreds of Times Faster

2023-10-25 [InfluxData Blog] Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0

2023-08-01 [InfluxData Blog] Aggregating Millions of Groups Fast in Apache Arrow DataFusion (cross post on arrow.apache.org/blog)

2022-12-07 [InfluxData Blog] Querying Parquet with Millisecond Latency (cross post on arrow.apache.org/blog)

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 2

2022-11-07 [Apache Arrow Blog] Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 1

2022-10-27 [ODBMS.org] On InfluxData's New Storage Engine. Q&A with Andrew Lamb

2022-10-17 [Apache Arrow Blog] Arrow and Parquet Part 3: Arbitrary Nesting with Lists of Structs and Structs of Lists

2022-10-08 [Apache Arrow Blog] Arrow and Parquet Part 2: Nested and Hierarchical Data using Structs and Lists

2022-10-05 [Apache Arrow Blog] Arrow and Parquet Part 1: Primitive Types and Nullability

2022-01-14 [InfluxData Blog] Rust Object Store Donation

2022-01-14 Using Rustlang's Async Tokio Runtime for CPU-Bound Tasks.

...

2012-08-27 [VLDB 2012] The Vertica Analytic Database: C-Store 7 Years Later

Technical Talks and Presentations

2024-03-27 DataCouncil 2024: Building InfluxDB 3.0 with Apache Arrow, DataFusion, Flight and Parquet. slides

2024-03-27 Apache Arrow DataFusion Meetup: Introduction, Agenda, Remarks. slides, recording

2023-09-27 MIT Database Group: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides

2023-06-02 [Dutch Seminar on Database System Design]: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. slides, recording

2023-05-09 [ODSC East 2023]: Introduction to Apache Arrow and Apache Parquet, using Python and pyarrow. slides

2023-04-05 The Apache Arrow DataFusion Architecture Part 3: Physical Plan and Execution. slides, recording

2023-04-04 The Apache Arrow DataFusion Architecture Part 2: Logical Plans and Expressions. slides, recording

2023-03-31 The Apache Arrow DataFusion Architecture Part 1: Query Engines. slides, recording

2023-02-15 [Invited Talk at Optum Labs]: Building a new time series database "from scratch" Using Apache Arrow, Parquet, DataFusion and Rust. slides

2022-06-27 [Databricks Data+AI Summit]: DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine. slides, recording

2022-05-23 [The Data Thread 2022]: Apache Arrow and DataFusion: Changing the Game for Implementing Database Systems. slides, recording

2022-04-06 [EM.S20, MIT Sloan School of Management, Guest Speaker]: Managing Software Dependencies and the Supply Chain. slides

2021-10-13 [InfluxData Tech Talk]: Query Processing in InfluxDB IOx. slides, recording

2021-04-20 [USC CSE-132 Database Systems Implementation, Guest Speaker]: Apache Arrow and its impact on the database industry. slides, recording

2021-03 [InfluxData Tech Talk]: Query Engine Design and the Rust-Based DataFusion in Apache Arrow. slides, slides (slideshare), recording

2020-12-09 [InfluxData Tech Talk]: A Rusty Introduction to Apache Arrow and how it applies to a TimeSeries Database. slides, recording

Journal / Conference Papers

2024-06-19 Apache Arrow DataFusion: a Fast, Embeddable, Modular Analytic Query Engine. Andrew Lamb, Yijie Shen, Daniël Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Chao Sun, and Liang-Chi Hsieh. 2024 International Conference on Management of Data (SIGMOD 2024), June 9-15, 2024, Santiago, Chile

2014-03-31 The Vertica Query Optimizer: The Case for Specialized Query Optimizers. Nga Tran, Andrew Lamb, L. Shrinivas, Sreenath Bodagala and Jaimin Dave, IEEE International Conference on Data Engineering (ICDE - 2014)

2012-08-27 The Vertica Analytic Database: C-Store 7 Years Later. Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, Chuck Bear. 38th International Conference on Very Large Data Bases, Proceedings of the VLDB Endowment, Vol. 5, No. 12

2003-06-08 Linear analysis and optimization of stream programs. Andrew A. Lamb, William Thies and Saman Amarasinghe. ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI)

2002-08-05 A stream compiler for communication-exposed architectures. Michael I. Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Andrew A. Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman Amarasinghe. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

Really Old Content

Old Blog
Six Hertz, Six Bytes
Pre-github projects
Class List
School Projects