Insights
The Lost Art and Science of Data Modeling
"(...) Then all that goodness mysteriously faded away without a whimper as the hype of NoSQL, Cloud, and Microservices occupied the whole stage. During this time the engineering team quietly co-opted the ownership of clean data design, and frankly, most of them didn’t know what they were doing (...)"
How to data model correctly: Kimball vs One Big Table (by Zach Wilson)
"(...) One big table data modeling sounds like a joke in some regards. The name reminds me of the “god controller” in full-stack development. Why would we have a table that has everything in it? Is that really the best abstraction that we can come up with? (...)"
How to data model correctly: Kimball vs One Big Table (by Zach Wilson)
"(...) I followed this philosophy when I was working at Airbnb on pricing and availability. We moved all the pricing data into a deduped listing-level table instead of an exploded-out listing-night level table and we saw intense gains in efficiency across the warehouse! (...)"
Notebooks X IDEs (LinkedIn)
"(...) Despite what people may think, I use notebooks too. I can't deny that it is the easiest way to prototype. On hashtag#Databricks, sometimes the only way (some functionality related to feature engineering and delta tables just does not work in VS code). On the other hand, as an MLOps practitioner, I am against using notebooks outside of the prototype phase and see many challenges when transitioning from a notebook to production-ready code. (...)"
Test, test, and then test again.
"(...) No tool, framework, or process can overcome an engineering culture that treats testing as an afterthought. Fixing this takes time, but small steps—asking for time to test, planning for testing in project roadmaps, and holding each other accountable—can shift the balance toward quality. (...)"
Testing and Development for Databricks Environment and Code
"Every once in a great while, the question comes up: “How do I test my Databricks codebase?” It’s a fair question, and if you’re new to testing your code, it can seem a little overwhelming on the surface. However, I assure you the opposite is the case. (...)"
Why Python Always Breaks. Long Live Python.
"(...) Python's actually a great language, dare I say the greatest? It's not the best overall (if there even is such a thing), and in many aspects, it will lose to its alternatives, but at the same time, it is also a terrific first choice for assorted problems.
If you want to make the most of it, though, you need to put in the time to understand it and grow in your skills. What ultimately makes or breaks most projects isn't the choice of language, but the developers responsible for its creation. (...)"
Unity Catalog Architecture Patterns
"(...) In practice, effective scope design hinges on clear ownership. Scopes must be defined with accountable owners who are empowered to manage and govern the assets within their domain. Without ownership, scopes quickly become ineffective and unsustainable. (...)"