Creating a better data culture

Apr 11, 2021

Krishna Puttaswamy, an engineer on Uber’s data experimentation team, wrote a great piece on creating a better data culture from first principles.

Uber’s Journey Toward Better Data Culture From First Principles

Some of my favorites:

“Data as code: Data should be treated as code. Creation, deprecation, and critical changes to data artifacts should go through the design review process with appropriate written documents where consumers’ views are taken into account. Schema changes have mandatory reviewers who sign off before changes are landed. Schema reuse/extension is preferred to creating new schemas. Data artifacts have tests associated with them and are continuously tested. These are practices we normally apply to service APIs, and we should extend that same rigor to thinking about data.
Data is owned: Data is code and all code must be owned. Each data artifact should have a clear owner, a clear purpose, and should be deprecated when its utility is over.
Accelerate data productivity: Data tools must be designed to optimize collaboration between producers and consumers, with mandatory owners, documentation, and reviewers when necessary. Data tools must integrate with other related tools well by passing necessary metadata seamlessly. Data tools should meet the same developer grade as services, offering the ability to write and run tests before landing changes, to test changes in a staging environment before rolling to production, and integrating well with the existing monitoring/alerting ecosystem.”

This should definitely be an iterative process. Choose one principle and try to implement it. Once successful, rinse and repeat. Do read the post as there is a ton of gold there!
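
If "data as code" is the principle you pick first, one small way to start is a check that runs whenever a schema change is proposed. The sketch below is my own illustration, not something from Uber's post: the `Schema` class, the `breaking_changes` function, and the table and team names are all hypothetical. It flags a missing owner, removed columns, and changed column types, while letting pure extensions through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical description of a data artifact (not a real library type)."""
    name: str
    owner: str      # "data is owned": every artifact needs a clear owner
    columns: dict   # column name -> type name, e.g. {"trip_id": "string"}


def breaking_changes(old: Schema, new: Schema) -> list:
    """Return the reasons why `new` would break consumers of `old`."""
    problems = []
    if not new.owner:
        problems.append("schema has no owner")
    for column, col_type in old.columns.items():
        if column not in new.columns:
            problems.append(f"column removed: {column}")
        elif new.columns[column] != col_type:
            problems.append(
                f"type changed: {column} {col_type} -> {new.columns[column]}"
            )
    return problems


# Made-up example schemas for illustration only.
old = Schema("trips", "rides-data-team", {"trip_id": "string", "fare": "double"})
extended = Schema(
    "trips", "rides-data-team",
    {"trip_id": "string", "fare": "double", "city": "string"},
)
broken = Schema("trips", "", {"trip_id": "string", "fare": "string"})

print(breaking_changes(old, extended))  # adding a column is an extension
print(breaking_changes(old, broken))    # lists the ownership and type problems
```

A check like this could run in CI next to the mandatory review, so reviewers only need to discuss changes that are actually breaking.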

Twitter threads I enjoyed:

Matt Lerner @matthlerner

After 17 years, we finally “cracked” a $100M churn problem at PayPal. Zero fancy tech. Just a spreadsheet, some simple SQL, and a physicist named Ben. 👇🏼

James Densmore @jamesdensmore

Two Slack messages that create anxiety for data teams: - "Quick question" - "The numbers on this dashboard don't look right" I've been playing those in my head whenever I'm trying to convince myself to invest more time in data discovery, validation, etc. It's worth the effort.

Podcasts:

Put your whole team on the same page with Atlan, a Data Engineering Podcast episode with Atlan’s co-founder, Prukalpa. I may be biased, but it's an incredible story!
Decentralizing Data: From Data Mesh to Data Monolith, Barry O’Reilly's podcast with Zhamak Dehghani

Links Roundup:

Building Powerful Data Teams: On Investing in Junior Talent
Data Domains and Team Topologies from Yet Another Data Blog
Scaling Data Culture is a Marathon, Not a Sprint from Fivetran
The Algorithms That Make Instacart Roll
Why Every Data Team Needs a Money Tree

Thanks again to all new subscribers! The reception has been amazing. Please do share with fellow data lovers!

Modern Data Stack
