As startups grow, they often struggle with growing their data team. Talent shortages and the overall difficulty of hiring data talent have forced many startups to get creative with their data team topologies.
Situations vary widely, but there are a few best practices for building a data team. To start:
Invest heavily in your onboarding documentation
Onboarding data talent is usually relatively slow because it carries a lot of cognitive load: new hires have to absorb your business context AND the extraneous load of your environment (e.g., your data systems and the glue binding them together).
Documentation is out of date the moment it’s written, and the returns on investing in documentation usually diminish quickly. That said, onboarding docs are one area where you really can’t afford to cut corners, particularly when you’re building a data team.
Defend against variance without value
Don’t let your engineers bikeshed data schemas/formats & absolutely don’t let those formats proliferate willy-nilly across the organization.
CSVs are fine for playing around, but in general, formats outside of a short approved list (Parquet, Avro, Arrow, JSON Schema) create variance without value and should be minimized.
Notable exception: database APIs tend to make variance less of a problem at the online data persistence layer (MySQL vs Cassandra etc), so it’s usually best to just let engineers pick whatever DB is the right tool for their application’s job.
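One lightweight way to hold the line on formats is an automated check (in CI, say) that flags data files outside your approved list. Here’s a minimal sketch of that idea. The function name, the allowlist, and the sample paths are all hypothetical, and it only looks at file extensions, so treat it as a starting point rather than a real enforcement tool.

```python
from pathlib import PurePath

# Hypothetical allowlist: swap in whatever formats your team standardizes on.
# JSON validated against a JSON Schema needs a content check, not just an
# extension check, so it is left out of this sketch.
APPROVED_EXTENSIONS = {".parquet", ".avro", ".arrow"}


def unapproved_files(paths):
    """Return the paths whose file extension is not on the approved list."""
    return [
        path
        for path in paths
        if PurePath(path).suffix.lower() not in APPROVED_EXTENSIONS
    ]


if __name__ == "__main__":
    # Made-up paths, purely for illustration.
    sample_paths = [
        "warehouse/orders.parquet",
        "exports/customers.csv",
        "events/clicks.AVRO",
        "finance/q3_forecast.xlsx",
    ]
    for path in unapproved_files(sample_paths):
        print(f"Unapproved data format: {path}")
```

A check like this won’t stop a determined engineer, but it turns a new format into a deliberate decision instead of an accident.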
Best Practices for Building a Data Team: Part Two
We’ll revisit more Best Practices for Building a Data Team next week in part two of this series. We’re going to talk about how to use strategies & visions to keep your data projects from getting derailed. And we’ll talk about a handful of the books that every data professional should read. Stay tuned!