1. Don't put data science notebooks into production - ThoughtWorks provide an interesting summary of the pain points of productionizing data science notebooks. I particularly like the focus on solving the people problems to empower teams to deliver.
  2. Turning Metadata Into Insights with Databook - A comprehensive overview from Uber on how they leverage metadata in their Databook. I am hoping that they open source Dragon, their schema generation tool.
  3. Your data tests failed! Now what? - An outline of how you might go about operationalising data tests in your data pipelines.
  4. Meet whale! 🐳 The stupidly simple data discovery tool. - Data discovery is a non-trivial problem and this new open source project offers a simple way to get started. I am looking forward to having a play.
  5. Materialize under the Hood - Materialize is a SQL streaming database. This post provides a high-level explanation of how it makes that happen. I am looking forward to going deeper and learning more about timely dataflow and differential dataflow.