Apache Hive File Formats Explained
This article outlines the file formats that Apache Hive supports on the Hadoop Distributed File System (HDFS). After reading it, you will have a better grasp of the file formats available in Hive, as well as how and when to use them. Apache Hive can read and write several file formats that are commonly used in Apache Hadoop. Hive tables can be stored as text files, sequence files, RCFile, Avro, ORC, and Parquet, among others.
Hive Text File Format Example
Create table textfile_table
(column_specs)
stored as textfile;
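As a concrete illustration of the template above, here is a minimal sketch of a delimited text table. The table name page_views_text and its columns are hypothetical, and the comma delimiter is an assumption about the input data.

```sql
-- Hypothetical table: the name and columns are illustrative only.
CREATE TABLE page_views_text (
  user_id   BIGINT,
  page_url  STRING,
  view_date STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','   -- assumes comma-separated input files
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```

If the ROW FORMAT clause is omitted, Hive falls back to its default field delimiter, the Ctrl-A character (\001), so it is usually worth stating the delimiter explicitly.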
Hive Sequence File Format Example
Create table sequencefile_table
(column_specs)
stored as sequencefile;
Hive RC File Format Example
Create table RCfile_table
(column_specs)
stored as rcfile;
Hive AVRO File Format Example
Create table avro_table
(column_specs)
stored as avro;
Hive ORC File Format Example
Create table orc_table
(column_specs)
stored as orc;
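The ORC template can be filled in along the following lines. This is only a sketch: the table name page_views_orc and its columns are hypothetical, and choosing Snappy as the codec is an assumption made for illustration.

```sql
-- Hypothetical table: the name and columns are illustrative only.
CREATE TABLE page_views_orc (
  user_id   BIGINT,
  page_url  STRING,
  view_date STRING
)
STORED AS ORC
-- orc.compress accepts NONE, ZLIB, or SNAPPY; ZLIB is used when unset.
TBLPROPERTIES ("orc.compress" = "SNAPPY");
```

Snappy typically trades a somewhat larger file for cheaper decompression, which can help when CPU time matters more than disk space.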
Hive Parquet File Format Example
Create table parquet_table
(column_specs)
stored as parquet;
The table above summarizes the pros and cons of the different file formats.
The RC and ORC formats generally outperform the text and sequence file formats for analytical queries. Between RC and ORC, ORC is usually preferable because it retrieves data faster and takes up less storage space. On the other hand, ORC adds some CPU overhead, because the data has to be decompressed when it is read.
a) Use the TEXTFILE format if your data is plain text delimited by a known separator, such as commas or tabs.
b) Use the SEQUENCEFILE format if your data arrives as many small files that are smaller than the HDFS block size, since a sequence file can pack them into larger files.
c) Use the RCFILE format to perform analytics on your data while also storing it efficiently.
d) Use the ORC format to store your data in an optimized layout that saves storage space and improves query performance.
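In practice these choices are often combined: raw delimited data lands in a text table and is then rewritten into ORC for querying. The sketch below shows one way to do that. The table names, the columns, and the HDFS path /tmp/sales.tsv are all hypothetical.

```sql
-- Staging table for raw tab-separated files (hypothetical names and columns).
CREATE TABLE sales_staging (
  order_id BIGINT,
  product  STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- LOAD DATA only moves the file into the table directory; it does not convert it.
LOAD DATA INPATH '/tmp/sales.tsv' INTO TABLE sales_staging;  -- hypothetical path

-- CREATE TABLE ... AS SELECT rewrites the rows in the target format.
CREATE TABLE sales_orc
STORED AS ORC
AS SELECT * FROM sales_staging;
```

Because LOAD DATA does not transform files, the conversion has to go through a query such as the CREATE TABLE ... AS SELECT above or an INSERT ... SELECT into an existing table.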
Hive also supports a number of compression codecs, including Gzip, Bzip2, LZO, and Snappy. Data can also be partitioned, with each partition stored in its own directory.
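Compression and partitioning can be declared together when a table is created. The following is a minimal sketch, assuming a hypothetical events_parquet table partitioned by a date string and compressed with Snappy. The names and the sample row are illustrative only.

```sql
-- Hypothetical table: the name, columns, and partition key are illustrative only.
CREATE TABLE events_parquet (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression" = "SNAPPY");

-- Each partition value gets its own subdirectory under the table directory,
-- here event_date=2024-01-01.
INSERT INTO TABLE events_parquet PARTITION (event_date = '2024-01-01')
VALUES (1, 'example payload');
```

Queries that filter on event_date only need to read the matching directories, which is the main benefit of partitioning.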