In last week’s blog, we highlighted that creating a data-driven environment through mutual understanding of each other’s field of activity is an important catalyst for reaping the benefits of the data held by your organisation or institution.
Such a culture of understanding and cooperation between the domain experts or business managers and the data scientists is truly the foundation of a data-driven organisation. On this foundation, the next layers can be built: setting up the structures to draw insights from the data as and when the business or domain experts need them.
Often, the relevance and importance of creating such structures may initially be somewhat underestimated by the business or domain stakeholders. As a result, data science projects may not produce insights as fluidly as expected, leaving the business people and domain experts who are excited about the potential benefits of data science somewhat puzzled.
The data scientists, for their part, would without a doubt be willing to give their colleagues the best and most efficient service, but they face the ever-necessary tasks of data extraction, data cleansing and data preparation. And, as we all know, such tasks may consume quite a bit of the data scientist’s time and energy. On top of that, ‘garbage in, garbage out’ really is not an option.
Often this series of tasks is left to the data scientist. And indeed, the data scientist will need to advise and guide their colleagues through the steps required to make the data ready for the analytical process.
However, through a clever combination of business and domain experts, the IT function, and the data scientist, three major steps that will largely facilitate the data-driven ambitions can often be achieved:
- Ensuring adequate capture of the data at the source
- Optimising the data storage and ensuring ease of data extraction
- Encapsulating repetitive data cleansing and data preparation steps as part of the data extraction flow
Again, cooperation between the different functions is what will make all the difference. They need not all be involved in each part. For example, the business or domain experts need to be heavily involved in the first step, but not necessarily in the third step (which is more of an IT and data science effort). At the start, going through these steps will mean an initial investment of resources, but over time the benefits will be substantial.
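To make the third step a little more tangible, here is a minimal sketch in Python of what "cleansing as part of the extraction flow" could look like. Everything in it is an assumption made for illustration: the column names (`customer_id`, `country`, `amount`), the three cleansing functions and the sample export are hypothetical, and a real flow would be shaped by your own systems and data.

```python
import csv
import io

# Hypothetical cleansing steps; column names and rules are illustrative assumptions.
def strip_whitespace(row):
    """Remove stray spaces around every value."""
    return {key: value.strip() for key, value in row.items()}

def normalise_country(row):
    """Map a few spelling variants to one code; upper-case anything else."""
    aliases = {"usa": "US", "u.s.": "US", "united states": "US"}
    country = row["country"]
    return {**row, "country": aliases.get(country.lower(), country.upper())}

def parse_amount(row):
    """Turn '1,250.50' into 1250.5 and an empty value into None."""
    raw = row["amount"].replace(",", "")
    return {**row, "amount": float(raw) if raw else None}

# The agreed, repeatable cleansing steps, applied in this order.
CLEANSING_STEPS = [strip_whitespace, normalise_country, parse_amount]

def extract_clean(source, steps=CLEANSING_STEPS):
    """Read raw CSV rows and apply every cleansing step before handing them over."""
    rows = []
    for row in csv.DictReader(source):
        for step in steps:
            row = step(row)
        rows.append(row)
    return rows

# A made-up raw export standing in for the source system.
raw_export = io.StringIO(
    "customer_id,country,amount\n"
    " 1001 , usa ,\"1,250.50\"\n"
    "1002,United States,\n"
)
clean_rows = extract_clean(raw_export)
```

The point of the design is that the cleansing rules live in one agreed list rather than in each analyst's ad hoc scripts, so every extraction delivers data in the same shape and a new rule only has to be added once.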
Consider the case of a credit card organisation in the US. More than a decade ago, they already relied heavily on prospect scoring to drive their operational sales and marketing ROI. All the necessary data was collected and stored in a structured and organised manner. At the start, data gaps causing sub-optimal insights had been identified and, to the degree possible, closed. Next, through collaboration between the IT department and the data scientists, the data extraction and data cleansing were highly automated.
As a result, as soon as a marketing campaign was over, the data would be extracted from the system and provided to the data science team in a clean and uniform manner. This in turn allowed the data science team to proceed immediately with developing the next prospect scoring model, fully in tune with the high frequency of the marketing campaigns.
By no means was their data perfect, and there was certainly room for improvement. But as an organisation they continuously managed to extract whatever insight was hidden in the data, limiting the loss of time that would otherwise be caused by repetitive manual data extraction, cleansing and preparation. The business impact of this process was substantial.
If you have any questions or remarks, please let us know. You can leave your message at Contact – Quantforce