The top data scientists share some fundamental traits that set them apart from the crowd. Highly effective data analysis isn’t learned overnight, but it can be learned faster. Data scientists who are starting their journey benefit from focusing heavily on the technical aspects: programming, queries, data cleansing, and so on.
However, as data scientists grow, they need to focus more on design decisions and communication with management. This multiplies the impact of a more experienced data scientist’s knowledge. In Part 2, we will look at three more habits of data analysis for effectively incorporating, communicating, and investing in data analysis geared toward an engineering team.
Value direction over definition
The cost of data collection is often the major hurdle standing in the way of a definitive answer to a business or engineering question. You can almost always get a partial answer that’s better than what you have now.
Even if you don’t have the instrumentation in place to definitively answer whether a specific component is the problem, you can find a cheap way to reduce the uncertainty by eliminating a few components. Maybe you can stitch together a few different sources of data and give some very rough tallies to get things going in the right direction.
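As a rough illustration of stitching sources together, the sketch below sums per-component problem counts from two partial sources and drops the components that account for only a small share of what was observed. The source names, the counts, and the 10% cutoff are all made-up assumptions for the example; the point is only that a crude tally can narrow the search.

```python
from collections import Counter


def rough_component_tally(*sources):
    """Sum problem counts per component across several partial sources."""
    tally = Counter()
    for source in sources:
        tally.update(source)  # adds counts; missing components count as 0
    return tally


def likely_suspects(tally, min_share=0.10):
    """Keep components whose share of all observed problems is >= min_share.

    The 10% default is an arbitrary cutoff chosen for illustration.
    """
    total = sum(tally.values())
    if total == 0:
        return []
    return [name for name, count in tally.most_common()
            if count / total >= min_share]


# Hypothetical, incomplete sources: neither one covers every component.
support_tickets = {"auth": 2, "checkout": 14, "search": 1}
server_logs = {"auth": 3, "checkout": 40, "search": 2, "profile": 18}

tally = rough_component_tally(support_tickets, server_logs)
suspects = likely_suspects(tally)
print(suspects)
```

With these invented numbers, `auth` and `search` fall below the cutoff, so attention can shift to the remaining components first. The tally is not proof of anything; it only reduces the uncertainty enough to pick a direction.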
Getting yourself or your team moving in the right direction is more important than getting that super accurate, definitive answer.
Value how software works over thoughts on how software works
The beauty of product data analysis is seeing the footprints of actual users using your software product. Sometimes you’ll get a really nice set of footprints. More likely than not, you’ll get partial impressions, making your investigation all the more difficult. Regardless, telemetry and log footprints are a reflection of reality.
Architectural knowledge is a great asset. However, the telemetry and logs represent hard evidence of what’s actually going on rather than what we believe is going on. As a product data scientist, you have a unique view of the software. You see the software as it actually is.
This is powerful because not only do you have evidence of how the software actually works, you can also scale that insight to a broad set of users. You can make claims like “77% of our users go down this code path, which contradicts the design.” Believe in the footprints left behind by your users, but always double-check.
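A claim about the share of users on a code path could be backed by a calculation along these lines. This is a minimal sketch: the event format (pairs of user ID and code path) and the path names are assumptions made for the example, not a real telemetry schema.

```python
def share_of_users_on_path(events, path):
    """Fraction of distinct users with at least one event on `path`.

    `events` is an iterable of (user_id, code_path) pairs.
    """
    events = list(events)
    all_users = {user for user, _ in events}
    users_on_path = {user for user, seen_path in events if seen_path == path}
    if not all_users:
        return 0.0
    return len(users_on_path) / len(all_users)


# Hypothetical telemetry: u1 appears on both paths but is counted once.
events = [
    ("u1", "legacy_export"),
    ("u1", "new_export"),
    ("u2", "legacy_export"),
    ("u3", "new_export"),
    ("u4", "legacy_export"),
]

share = share_of_users_on_path(events, "legacy_export")
print(f"{share:.0%} of users hit legacy_export")
```

Counting distinct users rather than raw events is one form of the double-check: a handful of very active users can otherwise make a rarely used path look popular.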
Value Center, Unusual Features, Spread, and Shape (CUSS) over trust
Data never comes clean. Data analysts rarely trust that all the data is there and in the right format. They always apply Kerns’s CUSS acronym from Introduction to Probability and Statistics Using R to understand the data’s Center, Unusual features, Spread, and Shape.
- Center – Where is the general tendency of the data?
- Unusual features – Are there missing data points? Outliers? Clustering?
- Spread – What is the variability of the data?
- Shape – If you plot the data, what is the shape of the data?
Knowing how the data is generated and the CUSS of the data allows you to draw better-reasoned insights and investments.
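A first pass at CUSS can be sketched with Python’s standard `statistics` module. The choices below are assumptions made for illustration, not part of the acronym itself: missing values are represented as `None`, outliers are flagged with the common 1.5 × IQR rule, and shape is reduced to a crude mean-versus-median comparison. A plot remains the better tool for judging shape and clustering.

```python
import statistics


def cuss_summary(values):
    """Rough Center / Unusual features / Spread / Shape summary.

    Needs at least two non-missing values.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    mean = statistics.mean(present)
    median = statistics.median(present)

    # Quartiles; the middle cut point is not needed here.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    # Crude skew hint only: compares the mean with the median.
    if mean > median:
        shape = "right-skewed"
    elif mean < median:
        shape = "left-skewed"
    else:
        shape = "roughly symmetric"

    return {
        "center": {"mean": mean, "median": median},
        "unusual": {"missing": missing, "outliers": outliers},
        "spread": {"stdev": statistics.stdev(present), "iqr": iqr},
        "shape": shape,
    }


# Hypothetical request latencies in milliseconds, with one gap and one spike.
latencies_ms = [120, 130, 125, None, 118, 122, 127, 900, 121]
summary = cuss_summary(latencies_ms)
print(summary)
```

In this invented sample, the single 900 ms reading pulls the mean well above the median, which is exactly the kind of detail that gets lost when the data is trusted without a look.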
Data experts provide the largest impact for both themselves and their companies when they go beyond their technical abilities. The value they bring to the table is their experience: it can help guide younger developers toward better design decisions and help managers decide which projects will have the best ROI. In turn, this magnifies the impact of their involvement on the team.