How To Integrate Tez Over Hive
Having covered what Tez is in the previous post, we now set up Apache Tez for Hive so that our queries run faster and more efficiently.
PREREQUISITES:
STEP 1: Installing Tez
$ wget https://dlcdn.apache.org/tez/0.9.2/apache-tez-0.9.2-bin.tar.gz
# Download the Tez 0.9.2 binary release
$ tar xzf apache-tez-0.9.2-bin.tar.gz
# Extract the archive
$ cd apache-tez-0.9.2-bin/conf
# Change into the conf folder of the extracted directory
STEP 2: Create the tez-site.xml file
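Next, create a new tez-site.xml file in the conf folder. At a minimum it must set the tez.lib.uris property, which points to the HDFS directory that the Tez jars are copied into in Step 8.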
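A minimal sketch of such a file, written from the conf folder with a shell here-document. The tez.lib.uris property name comes from this guide; the two HDFS paths are assumptions based on the /user/tez upload in Step 8 (the lib subfolder is listed separately on the assumption that directories are not searched recursively), so adjust them to wherever your Tez jars actually live.

```shell
# Sketch: write a minimal tez-site.xml into the current directory.
# The quoted 'EOF' keeps the shell from expanding ${fs.defaultFS};
# Hadoop resolves that reference itself when it loads the file.
# The /user/tez paths are assumptions taken from Step 8 of this guide.
cat > tez-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/user/tez,${fs.defaultFS}/user/tez/lib</value>
  </property>
</configuration>
EOF
```

If you upload the jars to a different HDFS directory, change the value to match before continuing.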
STEP 3: Modifying .bashrc file
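Add the following Tez settings to your ~/.bashrc file. The TEZ_HOME value below is a relative path, so replace it with the full path to the directory where you extracted Tez.
export TEZ_HOME="apache-tez-0.9.2-bin"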
export TEZ_CONF_DIR="$TEZ_HOME/conf"
export TEZ_JARS="$TEZ_HOME"
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
export HIVE_AUX_JARS_PATH="$TEZ_JARS"
else
export HIVE_AUX_JARS_PATH="$HIVE_AUX_JARS_PATH:$TEZ_JARS"
fi
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
# Add the lines above to the .bashrc file
$ source ~/.bashrc
# Apply the changes to the current shell
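A quick sanity check after sourcing the file can catch a wrong or relative TEZ_HOME early. The sketch below uses a hypothetical helper named check_tez_env, which is not part of Tez or Hadoop; it only tests that the two directories named by the variables exist.

```shell
# Hypothetical helper (not part of Tez): report whether the Tez variables
# set in .bashrc point at directories that actually exist.
check_tez_env() {
  rc=0
  for dir in "$TEZ_HOME" "$TEZ_CONF_DIR"; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_tez_env from any directory should list both paths as ok; a missing entry usually means TEZ_HOME is unset or not an absolute path.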
STEP 4: Modifying POM.xml
In the pom.xml at the root of the Tez source tree, change the Hadoop version to the one your cluster runs, here <hadoop.version>3.3.1</hadoop.version>
STEP 5: Modifying mapred-site.xml
Change the value of mapreduce.framework.name from yarn to yarn-tez, then restart the Hadoop cluster so the new setting takes effect.
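To confirm the edit before restarting, the value can be read back from the file. This is a rough sketch using a hypothetical helper named show_property (not a Hadoop tool); it assumes the <value> line directly follows the <name> line, as in the property layout used in this guide.

```shell
# Hypothetical helper (not a Hadoop tool): print the value of one property
# from a Hadoop-style XML config file.
# $1 = property name, $2 = path to the XML file.
# Assumes <value> is on the same line as <name> or on the line right after it.
show_property() {
  grep -A 1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, calling show_property with mapreduce.framework.name and the path to your mapred-site.xml should print yarn-tez once the change is saved. The same helper works for hive-site.xml in Step 9.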
STEP 6: Build binary tar ball
$ mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
# Build the binary tarball, skipping tests and Javadoc generation
When the build succeeds, move tez-0.9.2-full.tar.gz into the Tez directory and extract it there.
STEP 7: Copying Jar files in HDFS
$ hadoop fs -mkdir -p /apps/apache-tez-0.9.2/
# Create the target directory on HDFS (the same path the put command below writes to)
$ export TEZ_HOME='apache-tez-0.9.2-bin'
$ hadoop fs -put $TEZ_HOME/* /apps/apache-tez-0.9.2/
# Copy the Tez jar files into that HDFS directory
STEP 8: Copying the relevant files
$ hdfs dfs -mkdir -p /user/tez
# Create the /user/tez directory (and /user, if it does not exist yet)
$ hdfs dfs -chmod g+w /user/tez
# Give the group write access to it
$ cd $TEZ_HOME
# Moving to the tez home directory
$ hdfs dfs -put * /user/tez
$ hadoop fs -put $HIVE_HOME/lib/hive-exec-3.1.2.jar /user/tez
# Copy hive-exec-3.1.2.jar from $HIVE_HOME/lib into the HDFS directory specified by the tez.lib.uris property in tez-site.xml
STEP 9: Integrating Tez on Hive
To run queries on the Tez engine, either set hive.execution.engine=tez at the start of each Hive session or change the value permanently in hive-site.xml.
hive> set hive.execution.engine=tez;
# Set the Hive execution engine to Tez for the current session
To make the change permanent, add or update this property in hive-site.xml:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
Conclusion:
With these steps, Tez is set up on the Hadoop cluster and Hive can use it as its execution engine.