Why shouldn't #data teams let the business query raw data, and why do they insist on preparing the data first?
My colleague Bo Lemmers shares her insights on the Xebia blog. The takeaways:
• Misinterpreted results. Do users know which units each column uses? Where would they find out? What if they assume, and get it wrong?
• Missing (or duplicated) records. Without a data-cleaning step, the business gets frustrated handling these issues manually!
• Slow results. Querying raw, unprocessed data means risking scans through billions of rows and columns that are irrelevant to your analysis!
• High costs. Raw data is not optimized for querying, and unoptimized data costs more money to query!
• Duplicate work. Every team builds its own slightly different transformation, wasting hours on work that could have been centralized.
• Misaligned decision-making. If everyone is left to define key business metrics, such as monthly active users, on their own, there won't be consensus around these metrics, which leads to misaligned decisions.
Read the full article:
https://xebia.com/blog/questions-were-tired-of-hearing-why-cant-i-just-query-raw-data/
