Insights

Should You Ditch Spark for DuckDb or Polars? (Benchmark)

"(...) I think the whole narrative that you should consider replacing your Spark workloads with DuckDB or Polars if your data is small is all hype (...)"

Databricks x Snowflake

The real competition between the two... talent

Getting Your Catalog in Order

“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton

Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet (YouTube)

In this podcast, Aravind Srinivas, CEO of Perplexity, discusses the future of AI and search technologies. He shares insights into how Perplexity aims to revolutionize the way humans find answers on the Internet.

Delta Lake vs Apache Iceberg. The Lake House Squabble

"(...) I’m sure Hudi might want to interject itself, but we all know that the two clear contenders are Delta Lake and Apache Iceberg (...)"

How to data model correctly: Kimball vs One Big Table (by Zach Wilson)

"(...) I followed this philosophy when I was working at Airbnb on pricing and availability. We moved all the pricing data into a deduped listing-level table instead of an exploded-out listing-night level table and we saw intense gains in efficiency across the warehouse! (...)"

The Curse of Conway and the Data Space

"(...) It’s time for data and analytics engineers to identify as software engineers and regularly apply the practices of the wider software engineering discipline to their own sub-discipline. (...)"

Notebooks X IDEs (LinkedIn)

"(...) Despite what people may think, I use notebooks too. I can't deny that it is the easiest way to prototype. On #Databricks, sometimes the only way (some functionality related to feature engineering and delta tables just does not work in VS Code). On the other hand, as an MLOps practitioner, I am against using notebooks outside of the prototype phase and see many challenges when transitioning from a notebook to production-ready code. (...)"

Blue/Green pipelines in a medallion architecture

"(...) Ever wondered what Blue/Green pipelines look like in a medallion architecture? (...)"

The Rise of The Notebook Engineer

"(...) 99% of Engineers and Data Folk who regularly use Notebooks as part of their development and production lifecycles … abuse, overuse, and do so at their own peril and the peril of their Data Platforms at large … and suffer the grave consequences as such. (...)"

Best Practices for Unit Testing PySpark (YouTube)

Unit tests help you reduce production bugs and make your codebase easy to refactor. You will learn how to create PySpark unit tests that run locally and in CI via GitHub Actions.

Databricks x Snowflake (Part 1)

"(...) If Snowflake doesn't shift their marketing and sales focus from Databricks back to Snowflake, they will become a marginalized, niche offering"
