Working with a data lake differs fundamentally from working with a database. We have translated a short article on how a data lake is structured. It is useful for those who do not have a lot of experience with relational databases.
The storage and compute servers operate separately, which is the key difference between a data lake and a database.
In traditional databases (and in the earliest Hadoop-based data lakes), storage is tightly coupled with the compute servers: either the storage is built into the server, or the server is directly attached to the storage.
In today's cloud data lake architecture, storage is independent of the compute platform. Data is kept in cloud object storage, usually in an open format such as Parquet. Computation runs on stateless servers that can be turned on and off as needed.
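As a rough illustration of this separation, the sketch below uses a temporary local directory as a stand-in for an object storage bucket and JSON Lines files in place of Parquet, so that it runs with the standard library alone. The names `write_object` and `total_amount` are hypothetical and do not belong to any real storage API.

```python
import json
import tempfile
from pathlib import Path


def write_object(bucket, key, rows):
    """Store rows as one JSON Lines object (stand-in for a Parquet file)."""
    path = bucket / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(json.dumps(row) for row in rows))


def total_amount(bucket, prefix):
    """Stateless compute: everything it needs is read from storage on each call."""
    total = 0.0
    for obj in sorted((bucket / prefix).glob("*.jsonl")):
        for line in obj.read_text().splitlines():
            total += json.loads(line)["amount"]
    return total


# Assumption: a temp directory plays the role of the bucket.
bucket = Path(tempfile.mkdtemp())
write_object(bucket, "orders/2024-01.jsonl",
             [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}])
write_object(bucket, "orders/2024-02.jsonl", [{"id": 3, "amount": 4.5}])

# The "server" here is just a function call: it starts, computes, and stops,
# while the data stays where it is.
result = total_amount(bucket, "orders")
print(result)
```

Because `total_amount` keeps nothing between calls, any number of such workers could be started or stopped without touching the stored data.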
The advantages of this approach:
In a database, data is taken from source systems, transformed, and loaded into a table, after which the raw source data is no longer used. In a data lake, raw data is kept indefinitely and treated as a valuable asset.
Business users, however, generally cannot work with raw data, so it is processed to improve its quality and make it structured and usable. The processed data is then stored for use by analysts and business users.
Business users only see processed data and therefore value it much more than the raw data from which it was obtained. But the actual value of data lakes lies in the raw data and how you work with it. In a sense, the processed data is like a materialized view that can be refreshed at any time.
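The materialized-view analogy can be sketched in a few lines. This is a minimal illustration under assumed data: the records, the cleaning rules, and the function name `build_view` are all hypothetical.

```python
# Raw records as they arrived from the source system, kept untouched.
RAW_EVENTS = [
    {"user": " Alice ", "amount": "10.5"},
    {"user": "bob", "amount": "n/a"},  # malformed amount, skipped in processing
    {"user": "Bob", "amount": "4"},
]


def build_view(raw):
    """Derive a cleaned, structured table (total amount per user) from raw records."""
    totals = {}
    for rec in raw:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop records whose amount cannot be parsed
        user = rec["user"].strip().lower()
        totals[user] = totals.get(user, 0.0) + amount
    return totals


view = build_view(RAW_EVENTS)

# A new raw record lands; "refreshing the view" is simply rebuilding it.
RAW_EVENTS.append({"user": "alice", "amount": "1.5"})
refreshed = build_view(RAW_EVENTS)
print(view, refreshed)
```

The processed table is disposable here: it can be thrown away and rebuilt from the raw records whenever they change.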
Main advantages:
Information requirements change frequently, and later it may be necessary to analyze data that was not loaded initially. With a database, raw data that was not saved is lost for good.
Data lakes work differently: if you decide today that certain data does not need to be loaded into the processing system, nothing terrible happens, because you can add it later. All data is securely stored in the data lake, and the processed datasets can be rebuilt from the raw data at any time.
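A small sketch of adding a field after the fact, assuming the raw records were kept as JSON lines. The field names and the helper `process` are hypothetical.

```python
import json

# Raw records stay in the lake with every field they arrived with.
RAW_LINES = [
    '{"order_id": 1, "amount": 20, "country": "DE", "device": "mobile"}',
    '{"order_id": 2, "amount": 35, "country": "FR", "device": "desktop"}',
]


def process(raw_lines, fields):
    """Project each raw record onto the fields the analysts currently need."""
    result = []
    for line in raw_lines:
        record = json.loads(line)
        result.append({name: record[name] for name in fields})
    return result


# First version of the processed table: "device" was considered unnecessary.
v1 = process(RAW_LINES, ["order_id", "amount"])

# Requirements change: rebuild from the same raw data with the extra field.
v2 = process(RAW_LINES, ["order_id", "amount", "device"])
print(v1, v2)
```

Had only `v1` been stored, as in the database scenario above, the `device` values could not have been recovered.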
Main advantages:
Data lakes do not replace databases; each tool has its strengths and weaknesses. It makes no more sense to use a data lake for OLTP than to use a database to store unstructured data. I hope my article helped you understand the differences between the two systems.