- Don't put data science notebooks into production - ThoughtWorks provide an interesting summary of the pain points of productionizing data science notebooks. I particularly like the focus on solving the people problems to empower teams to deliver.
- Turning Metadata Into Insights with Databook - A comprehensive overview from Uber of how they leverage metadata in their Databook. I am hoping that they open source Dragon, their schema generation tool.
- Your data tests failed! Now what? - An outline of how you might go about operationalising data tests in your data pipelines.
- Meet whale! 🐳 The stupidly simple data discovery tool. - Data discovery is a non-trivial problem and this new open source project offers a simple way to get started. I am looking forward to having a play.
- Materialize under the Hood - Materialize is a SQL streaming database. This post provides a high-level explanation of how it makes that happen. I am looking forward to going deeper and learning more about timely dataflow and differential dataflow.
5 Interesting Data Engineering Blog Posts I've Recently Read
I found all these worthwhile. They cover the breadth of Data Engineering. Let me know if you have anything that you think I would enjoy reading.
