Describe the most common configuration of data repositories in real-world corporate environments. Cover concepts such as operational systems (OLTP), the Data Warehouse (DW), Data Marts, analytical and statistical systems (OLAP), etc. Try to draw a conceptual picture of how all these elements work together and how the flow of data and information is processed to extract useful knowledge from raw data.
It’s one thing to collect and store data; it’s another to accurately decipher what the data is saying.
“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.”
Aaron Levenstein, Business Professor at Baruch College
Successful organizations continue to derive business value from their data. One of the first steps towards a successful big data strategy is choosing the underlying technology of how data will be stored, searched, analyzed, and reported on.
Data repository
A data repository, also known as a data library or data archive, is a large database infrastructure that collects, manages, and stores data sets for data analysis, sharing, and reporting.
Examples are:
- data warehouse
- data lake
- data marts
We start from a source of data (e.g. the operational systems of a business) that streams its data into a repository (e.g. a Data Lake).
OLTP (On-line Transaction Processing) systems are the operational systems that provide the source data to a data repository such as a Data Lake.

Data lakes are a great way to store huge amounts of data and drive business insights, but they often have limited governance and weak traceability, lineage, and data quality. Many lakes have turned into swamps, that is, disorganized stores where data is hard to find and hard to trust.
With operations such as ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform), we can build and store the data in a Data Warehouse, a more organized data repository.
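The ETL step can be illustrated with a minimal sketch. It assumes a handful of hypothetical JSON order records standing in for the raw data in the lake, and an in-memory SQLite database standing in for the warehouse; the names RAW_EVENTS, orders, and run_etl are invented for this example and do not come from any specific product.

```python
import json
import sqlite3

# Hypothetical raw records, as they might sit untouched in a data lake.
RAW_EVENTS = [
    '{"order_id": 1, "customer": " Alice ", "amount": "19.90", "country": "it"}',
    '{"order_id": 2, "customer": "Bob", "amount": "5.00", "country": "IT"}',
    '{"order_id": 3, "customer": "Carol", "amount": "not-a-number", "country": "fr"}',
]


def extract(lines):
    """Extract: parse each raw JSON line into a dict."""
    return [json.loads(line) for line in lines]


def transform(records):
    """Transform: normalize values and drop records that fail validation."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append(
            (rec["order_id"], rec["customer"].strip(), amount, rec["country"].upper())
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a table with a fixed schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, lines=RAW_EVENTS):
    """Run the three steps in order and return the number of rows loaded."""
    load(conn, transform(extract(lines)))
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling run_etl(sqlite3.connect(":memory:")) loads the two valid orders and skips the malformed one. In an ELT variant the raw records would be loaded first and transformed inside the warehouse afterwards.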

A Data Warehouse stores data in a structured form with a predefined schema, which helps to organize the data and to use it for strategic decisions.
From the Data Warehouse we can create another kind of repository that is focused on a specific task or business area: Data Marts.
Data Marts in turn feed the systems that analyze the data, e.g. OLAP (On-line Analytical Processing) systems.
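As a rough sketch of this last part of the flow, the example below derives a data mart from a warehouse table and runs an OLAP-style roll-up on it. It assumes an in-memory SQLite database; the sales table, the retail_mart table, and the sample figures are all hypothetical.

```python
import sqlite3


def build_warehouse():
    """Create a tiny stand-in warehouse with one fact table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "region TEXT, product TEXT, year INTEGER, department TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
        [
            ("North", "Laptop", 2023, "retail", 1000.0),
            ("North", "Phone", 2023, "retail", 500.0),
            ("South", "Laptop", 2023, "retail", 700.0),
            ("South", "Laptop", 2023, "wholesale", 9000.0),
        ],
    )
    return conn


def build_retail_mart(conn):
    """Data mart: a subset of the warehouse focused on one department."""
    conn.execute(
        "CREATE TABLE retail_mart AS "
        "SELECT region, product, year, revenue FROM sales "
        "WHERE department = 'retail'"
    )


def revenue_by_region(conn):
    """OLAP-style roll-up: total revenue aggregated along the region dimension."""
    return conn.execute(
        "SELECT region, SUM(revenue) FROM retail_mart "
        "GROUP BY region ORDER BY region"
    ).fetchall()
```

After build_retail_mart(conn), the analyst queries only the smaller retail table, and revenue_by_region(conn) returns one total per region without touching the wholesale rows.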
Conclusions
Data lakes offer the flexibility of storing raw data, including all its metadata, and a schema can be applied when the data is extracted for analysis (schema-on-read). Databases and Data Warehouses require ETL processes in which the raw data is transformed into a predetermined structure before it is stored, also known as schema-on-write.
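The difference between the two approaches can be sketched as follows. This is only an illustration under simple assumptions: the raw records are hypothetical JSON strings, a "schema" on the lake side is just a list of field names chosen at read time, and an in-memory SQLite table plays the role of the warehouse.

```python
import json
import sqlite3

# Hypothetical raw records; the second one lacks the "extra" field.
RAW = [
    '{"id": 1, "name": "Alice", "extra": {"vip": true}}',
    '{"id": 2, "name": "Bob"}',
]


def read_with_schema(lines, fields):
    """Schema-on-read: raw records stay as they are; the caller picks
    the fields it needs at query time, and missing fields become None."""
    return [tuple(json.loads(line).get(f) for f in fields) for line in lines]


def write_with_schema(conn, lines):
    """Schema-on-write: the table structure is fixed up front, so every
    record is reshaped to fit it before being stored."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    for line in lines:
        rec = json.loads(line)
        # Fields outside the schema (here "extra") are dropped at write time.
        conn.execute("INSERT INTO customers VALUES (?, ?)", (rec["id"], rec["name"]))
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

With schema-on-read, a later analysis can still ask for the extra field because nothing was discarded; with schema-on-write, only id and name survive, but every stored row is guaranteed to match the table definition.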
Data warehouses typically deal with large data sets, but data analysis requires easy-to-find and readily available data. That’s why smart companies use data marts.
Data marts are one key to efficiently transforming information into insights.
Even with the improved flexibility and efficiency that data marts offer, big data (and big business) is still becoming too big for many on-premises solutions. As data warehouses and data lakes move to the cloud, so too do data marts.


Sources:
https://www.mackenziecorp.com/much-data-not-enough-data-lets-start/
https://www.confluent.io/learn/database-data-lake-data-warehouse-compared/
https://www.talend.com/resources/what-is-data-mart/

