Comparison of various HDFS file formats
I’ve been working with big data since 2017. Because I come from a data warehousing (DWH) background, it was easier for me to understand what’s what and to build an analogy between DWH and big data frameworks. However, the various file formats used in HDFS always caught me off guard.
In the DWH world, I never considered how files are stored in the database; the database manages that. A DBA might know how it’s done at the backend, but as a DB developer I never had to worry about it.