Data Science Success and Failures
Although data science has been in practice for a while, the rates of failure of big data and AI projects remain troublingly high. And although data science is highly regarded in the business world, it makes little contribution to the bottom lines of most businesses. Therefore, it is essential to have clear criteria and metrics for data science success.
Data scientists tend to define the success of their job by metrics that are mostly about data, algorithms, and technology:
- How accurate is the algorithm?
- How can they collect and label data?
- What techniques and methods should they use? Machine Learning vs Deep Learning?
Although these are very important factors, the job is not done if the algorithm is 99.9% accurate but does not change the behavior of business operations!
It seems that data scientists have forgotten that their job is to contribute actual business value by understanding the business problem and then solving it with the right tool: reducing compliance risks, decreasing customer churn, increasing revenue, improving customer experience, and so on. These are the real business outcomes that we want an organization’s data scientists to deliver.
Business Problem is Not Only a Problem Statement.
Most of the time, when we talk about a business problem, in the minds of data scientists it becomes a simple problem statement. Then very soon, data science practitioners focus on the solution.
For example, calculating health premium pricing sounds like a very interesting data science project. And this is mostly how it will be approached:
- Where to collect data related to policies and individuals?
- How to do the feature engineering?
- What techniques to use to check whether the data is biased, and how to label it?
- What algorithms to use for the best performance?
- What platform and tech to use to train a model?
- Test and then deploy the model
And done, we have an algorithm that does the pricing! Most projects follow more or less the process above. This is commonly known as a data-driven approach: building a decision model that predicts, classifies, offers, recommends, and so on. So, how do we ensure data science success?
The better approach is to forget, for now, about what data we can collect, what algorithms we should use, and so on. Start with the business question, i.e. the business decision.
In our example, the question is: how much should an insurer charge an individual for an insurance policy? This is the top-level decision, e.g. Calculate Health Premium in the model below.
- Decompose this decision, the “premium price”, into many smaller sub-decisions, e.g. all the blue rectangles in the model.
- Understand their dependencies and paths of influence
- Understand the characteristics and requirements of each individual decision in the whole hierarchy
- Identify which analytics techniques (prescriptive, predictive, diagnostic, etc.) are applicable to each decision
- Recognize that each individual decision has its own decision cycle
- Prepare, model, and implement each individual decision’s cycle according to its characteristics, i.e. probabilistic vs. deterministic
Now we have a decision graph that incorporates all techniques, from predictive (i.e. ML, DL, etc.) to prescriptive (e.g. rules, procedures and processes, and even optimization models), to make the Calculate Health Premium decision.
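To make the idea of a decision graph a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sub-decision names (`risk_score`, `base_premium`, `risk_loading`, `health_premium`), the `Decision` class, the `evaluate` helper, and all the numbers are hypothetical and do not come from any real pricing model. The "predictive" node is a hand-written stand-in for where a trained ML model would be called.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Tuple


@dataclass
class Decision:
    """One node of the decision graph (hypothetical structure)."""
    name: str
    kind: str  # "predictive" (probabilistic) or "prescriptive" (deterministic)
    logic: Callable[[Dict[str, int]], int]
    depends_on: List[str] = field(default_factory=list)  # names of sub-decisions


def evaluate(
    decisions: List[Decision], inputs: Dict[str, int]
) -> Tuple[Dict[str, int], List[Tuple[str, str, int]]]:
    """Run every decision after the sub-decisions it depends on."""
    by_name = {d.name: d for d in decisions}
    graph = {d.name: set(d.depends_on) for d in decisions}
    context = dict(inputs)  # raw inputs plus the results of decisions made so far
    trace = []              # records how the final decision was reached
    for name in TopologicalSorter(graph).static_order():
        decision = by_name[name]
        context[name] = decision.logic(context)
        trace.append((name, decision.kind, context[name]))
    return context, trace


# Illustrative sub-decisions; all rules and numbers are made up.
DECISIONS = [
    # Stand-in for a predictive model (a real project might call a trained model here).
    Decision("risk_score", "predictive",
             lambda c: c["age"] // 10 + (3 if c["smoker"] else 0)),
    # Deterministic business rule.
    Decision("base_premium", "prescriptive",
             lambda c: 100 if c["age"] < 50 else 150),
    # Rule that depends on the predictive sub-decision.
    Decision("risk_loading", "prescriptive",
             lambda c: 10 * c["risk_score"], ["risk_score"]),
    # Top-level decision that combines the sub-decisions.
    Decision("health_premium", "prescriptive",
             lambda c: c["base_premium"] + c["risk_loading"],
             ["base_premium", "risk_loading"]),
]

if __name__ == "__main__":
    context, trace = evaluate(DECISIONS, {"age": 40, "smoker": 1})
    for name, kind, value in trace:
        print(f"{name} ({kind}): {value}")
```

Because each sub-decision is a separate node, it can be tested and replaced on its own, and the trace shows which sub-decisions contributed to the final premium.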
Ensure the Success of Data Science Projects Using a Decision-Centric Approach
As you can see, the job is not done by building one or two predictive models. Data science success comes from understanding the problem deeply, decomposing it into relevant decisions, implementing those individual decisions, and incorporating them all in a unified decision graph that orchestrates their execution and calculation.
The Decision-Centric Approach reduces the risk of project failure because it:
- Narrows down the exact data you need to train your predictive model
- Reduces the complexity of feature engineering and of identifying biases and data quality issues
- Gives full visibility into how the final decisions are made
- Provides a clear explanation and interpretation of why the final decision turned out the way it did
- Increases the reusability and scalability of the decisions
- Reduces the risk of choosing the wrong technology stack, as each individual decision can be implemented and tested separately
This increases the chance of success of data science projects, because you think from the beginning about all aspects of the final decision rather than building just the predictive part (i.e. ML, DL…) and stopping there!
Last updated November 22nd, 2021 at 02:00 pm, Published June 17th, 2021 at 02:00 pm