THE GROWING IMPORTANCE OF THE USE OF DATA SCIENCE [3-3]

In last week’s blog, we discussed how a well-organised approach to data collection, storage and extraction benefits the journey towards becoming a data-driven company.

 

Once the data has been made available to the data scientist, the next step is to extract relevant insights from it and to communicate these findings in a non-technical manner to business partners or domain experts. As is so often the case, visualisations of predictive or explanatory data patterns are most useful when interacting with colleagues: one graph will often be more powerful than a thousand words.

 

Luckily, the software packages data scientists commonly use, such as R, Python and SAS, offer a wide variety of functions, code and add-on tools to help them wade through the data and start extracting relevant insights.

 

Still, even with these tools, the data scientist may need considerable time to work his or her magic on the data. In nearly all projects, data scientists face the following challenges:

 

  • Progress can be slowed by repetitive, time-consuming coding and visualisation tasks. Graphs are very powerful, both for the data scientist and for the business manager or domain expert, but simple-yet-insightful graphs are not always easy to produce and may require considerable data pre-processing.

 

  • Loss of data or records: in nearly all datasets, missing data will be present in one way or another. As our computing tools do not handle missing data well, either the affected records need to be removed or data imputation needs to be performed. Theories and tools exist for missing data handling and imputation, but the final choice still resides with the individual data scientist, and at times it can be a difficult one to make.

 

  • Powerful algorithms producing suboptimal results: over time, the AI and ML algorithms a data scientist can leverage have become ever more powerful. In turn, however, these techniques are known to amplify the undesirable effects of ‘garbage in, garbage out’, which puts extra importance on feature engineering and transformation. Widely used tree-based algorithms such as XGBoost and Random Forests split each feature into ranges, which amounts to a form of quantiling, or ‘binning’. This can be seen as a built-in feature transformation, and these algorithms have consequently produced some very good predictive or explanatory performance. This ‘binning’ is not really new: scoring professionals have known and leveraged the technique for many years, but its use has so far been largely limited to their industry.

 

  • Low level of interaction with the business user: the purpose of nearly all data science projects is to support the business or the domain experts (such as medical doctors). During the data science process, communication and exchange of points of view between the data scientist and his or her colleagues are therefore of utmost importance. Visualising the data and the analysis outcome in an easy-to-understand manner, and spending time reviewing them together, will benefit all involved. However, as mentioned above, generating insightful visualisations takes time, often so much that little is left for the interaction itself, to the detriment of the project’s outcome.
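
To make the missing-data choice from the list above more concrete, here is a minimal Python sketch of the two options: removing incomplete records, or imputing a value. The toy records, the column names and the choice of the median as the imputed value are our own assumptions for illustration; they are not a recommendation for any particular dataset.

```python
from statistics import median

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 45, "income": None},
    {"age": 29, "income": 61000},
]


def drop_incomplete(rows):
    """Option 1: keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_median(rows):
    """Option 2: replace each missing value with the median of its column."""
    columns = list(rows[0].keys())
    # Column medians are computed from the observed (non-missing) values only.
    medians = {
        c: median(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else medians[c]) for c in columns}
        for r in rows
    ]


complete_rows = drop_incomplete(records)  # fewer rows, no invented values
imputed_rows = impute_median(records)     # all rows kept, some values estimated
```

The trade-off is visible even in this tiny example: the first option loses half of the records, while the second keeps every record but introduces values that were never observed.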

 
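The ‘binning’ mentioned in the list above can also be sketched in a few lines. The example below is a simplified illustration of quantile binning using only the Python standard library; the feature values, the choice of four bins and the helper name `to_bin` are assumptions of ours, and this is not how XGBoost or Random Forests implement it internally.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical feature values, including one extreme value (250).
incomes = [18, 22, 25, 31, 38, 44, 52, 67, 90, 250]

# Three cut points that split the data into four roughly equal-sized groups.
cut_points = quantiles(incomes, n=4)


def to_bin(value, cuts):
    """Return the 0-based index of the bin that `value` falls into."""
    return bisect_right(cuts, value)


# Each raw value is replaced by the index of its bin.
binned = [to_bin(v, cut_points) for v in incomes]
```

After binning, the extreme value 250 simply lands in the top bin, so it no longer dominates the scale of the feature. This is one reason binning is often seen as a robust feature transformation.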

If you want to react, add comments or ask us some questions, please leave a message here: Contact – Quantforce


Contact

Hellingweg 86 B, 2583 WH Den Haag

+31 (0)7 0322 6268

+31 (0)6 1089 2849

Quantforce Software B.V.

Quantforce is registered at the Chamber of Commerce in the Netherlands with number 68164432

VAT number: 857328372B01