Why does the Data Sandbox matter? Databases and datasets are increasingly abundant. The main challenge is how to use data to obtain useful and reliable insights. Selecting datasets and statistical approaches is a matter of choice, and the more solid and transparent that choice is, the better the resulting analysis will be. There are two problems that the Data Sandbox aims to address:
The first is the normalisation of data or, in layman’s terms, the way to avoid comparing apples and oranges. We use geography for ‘normalisation’: in an era of fluctuating identities, geography is the most certain anchor. In the first phase, we started by comparing countries; later on, the system can move deeper into communities, cities, and other entities. A sketch of what such normalisation can look like follows below.
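To illustrate the idea, here is a minimal sketch of one common form of geographic normalisation: dividing raw country totals by population to get per-capita rates. The per-capita approach and all figures below are illustrative assumptions for this example, not the Sandbox’s actual method or data.

```python
# Illustrative sketch: normalising raw country totals by population
# (per-capita rates) so that large and small countries can be compared.
# The figures are invented for demonstration only.

raw_cases = {"Country A": 1_200_000, "Country B": 95_000}
population = {"Country A": 60_000_000, "Country B": 4_800_000}

# Cases per 100,000 inhabitants: a common geographic normalisation.
per_100k = {
    country: raw_cases[country] / population[country] * 100_000
    for country in raw_cases
}

for country, rate in per_100k.items():
    print(f"{country}: {rate:,.0f} cases per 100,000")

# Raw totals suggest Country A is far worse off; the normalised rates
# show both countries at a similar level (roughly 2,000 per 100,000).
```

The point is exactly the apples-and-oranges one: without an anchor such as geography and population, raw totals mislead more than they inform.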
The second is the data modelling challenge, which came to the fore in the controversy around the recent WHO report on the number of deaths during the pandemic. The WHO did good work, but it presented fuzzy, probabilistic concepts within a framework of certainty (‘data tells us’), which triggered a reaction from quite a few countries. To address this problem, we have been shifting from ‘data tells us’ to ‘data may explain’ (from forced certainty to scientific probability).
The pilot version of the Data Sandbox uses a wide range of statistical methods to show how different data modelling can produce different results and insights. By giving people a chance to model data themselves, we can ‘save’ the relevance of data and evidence, which is increasingly threatened by ‘data religion’ (the use of data with ‘religious’ certainty instead of scientific probability).
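As a simple illustration of how modelling choices shape conclusions, the sketch below summarises one and the same (invented) dataset with two different statistics, and reports an uncertainty interval rather than a single ‘certain’ figure. The methods shown are generic examples, not the Sandbox’s actual toolset.

```python
# Illustrative sketch: the same data, two modelling choices, two stories.
# The dataset is invented; the point is that method selection matters.
import random
import statistics

random.seed(42)
# A skewed sample: most values are small, a few are very large.
data = [random.expovariate(1 / 50) for _ in range(200)]

mean = statistics.mean(data)      # pulled upward by the long tail
median = statistics.median(data)  # robust to the tail

# A rough bootstrap interval: ‘data may explain’ a range of values,
# rather than ‘data tells us’ one certain number.
boot_means = sorted(
    statistics.mean(random.choices(data, k=len(data))) for _ in range(1000)
)
low, high = boot_means[25], boot_means[974]  # approx. 95% interval

print(f"mean:   {mean:.1f}")
print(f"median: {median:.1f}")
print(f"mean with 95% bootstrap interval: [{low:.1f}, {high:.1f}]")
```

Two defensible summaries of the same numbers diverge noticeably, and the interval makes the uncertainty explicit: this is the shift from forced certainty to scientific probability that the Sandbox is designed to let people experience for themselves.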