TechAE Blogs - Explore now for new leading-edge technologies

TechAE Blogs - a global platform designed to promote the latest technologies like artificial intelligence, big data analytics, and blockchain.

Hive File Formats

Apache Hive File Formats Explained

This article outlines the file formats that Apache Hive supports for data stored in the Hadoop Distributed File System (HDFS). After reading this blog, you'll have a better grasp of the file formats available in Hive and of how and when to use each one. Hive can read and write many of the file formats commonly used in Apache Hadoop, and Hive tables can be created and stored as TextFile, SequenceFile, RCFile, Avro, ORC, Parquet, and others.

File Formats

Hive Text File Format Example

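-- Example columns (hypothetical); replace with your own schema
CREATE TABLE textfile_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;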

Hive Sequence File Format Example

-- Example columns (hypothetical); replace with your own schema
CREATE TABLE sequencefile_table (id INT, name STRING)
STORED AS SEQUENCEFILE;

Hive RC File Format Example

-- Example columns (hypothetical); replace with your own schema
CREATE TABLE RCfile_table (id INT, name STRING)
STORED AS RCFILE;

Hive Avro File Format Example

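-- Example columns (hypothetical); replace with your own schema
-- STORED AS AVRO requires Hive 0.14 or later
CREATE TABLE avro_table (id INT, name STRING)
STORED AS AVRO;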

Hive ORC File Format Example

-- Example columns (hypothetical); replace with your own schema
CREATE TABLE orc_table (id INT, name STRING)
STORED AS ORC;

Hive Parquet File Format Example

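-- Example columns (hypothetical); replace with your own schema
-- STORED AS PARQUET requires Hive 0.13 or later
CREATE TABLE parquet_table (id INT, name STRING)
STORED AS PARQUET;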

Pros and Cons of File Formats

The table above summarizes the pros and cons of each file format.

For analytical queries, the columnar RCFile and ORC formats generally outperform the row-oriented TextFile and SequenceFile formats. Between RCFile and ORC, ORC is usually the better choice because it typically reads data faster and uses less storage. The trade-off is CPU cost: because ORC data is compressed, extra CPU time is spent decompressing it when it is read.
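One way to check the storage difference on your own data is to copy the same rows into an RCFile table and an ORC table and compare their sizes. This is a sketch that assumes an existing table named textfile_table; rc_copy and orc_copy are hypothetical names, and the sizes you see will depend on your data and compression settings.

-- Copy the same rows into an RCFile table and an ORC table (CREATE TABLE AS SELECT)
CREATE TABLE rc_copy STORED AS RCFILE AS SELECT * FROM textfile_table;
CREATE TABLE orc_copy STORED AS ORC AS SELECT * FROM textfile_table;

-- Refresh table statistics, then compare totalSize under "Table Parameters"
ANALYZE TABLE rc_copy COMPUTE STATISTICS;
ANALYZE TABLE orc_copy COMPUTE STATISTICS;
DESCRIBE FORMATTED rc_copy;
DESCRIBE FORMATTED orc_copy;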

a) Use the TEXTFILE format if your data is plain text delimited by a character such as a comma or tab.
b) Use the SEQUENCEFILE format if your data arrives as many files smaller than the HDFS block size, since a SequenceFile can pack them into larger files.
c) Use RCFILE to store data efficiently in a columnar layout while still supporting analytical queries.
d) Use the ORC format to store data compactly, saving storage space and improving query performance.
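A common way to combine options (a) and (d) is to load delimited text into a TEXTFILE staging table and then rewrite it as ORC. The sketch below assumes that textfile_table and orc_table already exist with the same columns and that the file's delimiter matches the staging table's row format; the HDFS path is hypothetical.

-- Move a delimited file from HDFS into the text staging table (hypothetical path)
LOAD DATA INPATH '/tmp/example/data.csv' INTO TABLE textfile_table;

-- Rewrite the rows in ORC format; columns must match in order and type
INSERT OVERWRITE TABLE orc_table SELECT * FROM textfile_table;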

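Hive also supports several compression codecs, including Gzip, Bzip2, LZO, and Snappy. Data can also be partitioned: each partition is stored in its own subdirectory of the table's directory, so queries that filter on the partition column can skip the data they do not need.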
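As a sketch of both ideas, the statements below create an ORC table that uses Snappy compression and is partitioned by a date column, then enable Gzip output compression for queries that write text-based tables. The table names sales_orc and staging_sales, their columns, and the partition value are hypothetical; orc.compress, hive.exec.compress.output, and mapreduce.output.fileoutputformat.compress.codec are standard Hive and Hadoop settings, but check their defaults and behavior for your Hive version and execution engine.

-- ORC table compressed with Snappy; each dt value gets its own subdirectory
CREATE TABLE sales_orc (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Rows land in a subdirectory named dt=2023-01-01 under the table's location
INSERT INTO TABLE sales_orc PARTITION (dt='2023-01-01')
SELECT id, amount FROM staging_sales;

-- For TEXTFILE or SEQUENCEFILE outputs, compress query results with Gzip
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;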
