How To Install Apache Kylin

If you are looking to improve your big data processing, extreme OLAP engines offer an easy path to success. Try Apache Kylin now!

What is Apache Kylin?

Apache Kylin is an open-source distributed analytical data warehouse for big data. It was designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop while supporting extremely large datasets. At the time of writing, the latest release is Kylin 4.0.1.

Latest Features of Kylin 4.0:

It uses Parquet as Storage rather than HBase as in Apache Kylin 3.x.
It uses a new Spark build engine instead of MapReduce.
It uses Spark SQL as a query engine instead of Hive/JDBC.

Why Is Apache Kylin Better?

MOLAP Cube Precalculation
Cloud Friendly due to the inclusion of Parquet
Cubing duration and cube size
Interactive SQL Query Interface
Query performance
High concurrency
Seamless integration with BI tools
Real-time OLAP (soon)

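Step 1: Installation

Make sure the following components are installed before setting up Kylin. This guide uses these versions:

Hadoop: 3.3.1
Hive: 3.1.2
Spark: 3.1.1
MySQL: 8.0.29
ZooKeeper: 3.6.3
JDK: 1.8
OS: Ubuntu 20.04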

Start the Hadoop cluster and the other services (Hive, MySQL and ZooKeeper) before you continue.
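
Before starting anything, it can help to confirm that the required tools are on your PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper, and the start-up commands in the comments are the usual ones for these services but depend on how each one was installed.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Typical start-up sequence (assumed; adjust to your installation):
#   "$HADOOP_HOME/sbin/start-dfs.sh"
#   "$HADOOP_HOME/sbin/start-yarn.sh"
#   zkServer.sh start
#   sudo systemctl start mysql
#   hive --service metastore &

# Report anything that still needs to be installed or added to PATH.
missing_tools hadoop hdfs hive spark-submit mysql zkServer.sh java
```

An empty result means every listed tool was found.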

Step 2: Kylin setup

Download the Apache Kylin 4.0.1 binary package from the Apache Kylin download site. For example:

wget https://dlcdn.apache.org/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-bin-spark3.tar.gz
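
It is worth checking the download before extracting it. Apache projects normally publish a .sha256 file next to each release, and older releases are usually moved to archive.apache.org if the link above stops resolving. The following is a minimal sketch: verify_sha256 is a hypothetical helper, sha256sum is assumed to be installed (it is on Ubuntu), and the digest must be copied from the official checksum file.

```shell
# Hypothetical helper: return 0 when the SHA-256 digest of file $1
# equals the expected hex digest $2.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Example (replace the placeholder with the digest from the official .sha256 file):
#   verify_sha256 apache-kylin-4.0.1-bin-spark3.tar.gz "<published digest>" \
#       && echo "checksum OK"
```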

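Extract the tarball and point the $KYLIN_HOME environment variable at the extracted folder. The export command below only lasts for the current shell session, so add the same line to your .bashrc file to make it permanent.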

tar -zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
cd apache-kylin-4.0.1-bin-spark3
export KYLIN_HOME="$(pwd)"
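
To keep KYLIN_HOME set in new terminals, the export line has to end up in .bashrc. One possible way to do that without adding duplicates is sketched here. persist_kylin_home is a hypothetical helper, and the default rc file path is an assumption.

```shell
# Hypothetical helper: append "export KYLIN_HOME=<dir>" to an rc file
# unless that exact line is already present.
persist_kylin_home() {
    kylin_dir=$1
    rc_file=${2:-$HOME/.bashrc}
    line="export KYLIN_HOME=\"$kylin_dir\""
    touch "$rc_file"
    grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example: run from inside the extracted Kylin folder.
#   persist_kylin_home "$(pwd)"
```

Running it twice leaves a single line in the file, so it is safe to repeat.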

Step 3: Configure MySQL metastore

Kylin 4.0 uses MySQL as its metadata storage. Add the following configuration to $KYLIN_HOME/conf/kylin.properties:

kylin.metadata.url=kylin_metadata@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://localhost/metastore,username=hiveuser,password=hivepassword
kylin.env.zookeeper-connect-string=localhost:2181
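
The metadata URL packs several settings into one line: the metadata table name before "@jdbc", then the driver class, the JDBC URL, and the credentials. The sketch below assembles it from its parts so each piece is easier to see. kylin_metadata_url is a hypothetical helper. Note that MySQL Connector/J 8 names its driver com.mysql.cj.jdbc.Driver and treats the older class name as deprecated, so you may want to pass that as the optional sixth argument.

```shell
# Hypothetical helper: build the kylin.metadata.url value.
# Arguments: table host database user password [driver class]
kylin_metadata_url() {
    driver=${6:-com.mysql.jdbc.Driver}
    printf '%s@jdbc,driverClassName=%s,url=jdbc:mysql://%s/%s,username=%s,password=%s\n' \
        "$1" "$driver" "$2" "$3" "$4" "$5"
}

# Reproduces the line used in this article.
echo "kylin.metadata.url=$(kylin_metadata_url kylin_metadata localhost metastore hiveuser hivepassword)"
```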

Also put the MySQL JDBC connector jar into $KYLIN_HOME/ext/. If the directory does not exist, create it. You can download the connector from the MySQL website.
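
Creating the directory and copying the jar can be done in one step, roughly as follows. install_connector is a hypothetical helper, and the jar file name in the example is a placeholder for whichever connector version you downloaded.

```shell
# Hypothetical helper: copy a connector jar into <kylin home>/ext,
# creating the directory first if it does not exist.
install_connector() {
    jar_path=$1
    kylin_home=${2:-$KYLIN_HOME}
    mkdir -p "$kylin_home/ext" && cp "$jar_path" "$kylin_home/ext/"
}

# Example (placeholder file name):
#   install_connector ~/Downloads/mysql-connector-java-<version>.jar
```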

Two more jar files need to be placed in $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib

Step 4: Start Kylin

Run $KYLIN_HOME/bin/kylin.sh start to start Kylin. The console output looks like this:

Retrieving hadoop conf dir...
KYLIN_HOME is set to /home/hdoop/apache-kylin-4.0.1-bin-spark3
......
A new Kylin instance is started by hdoop. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/apache-kylin-4.0.1-bin-spark3/logs/kylin.log
Web UI is at http://localhost:7070/kylin

Once the command completes, the web UI is available at http://localhost:7070/kylin. The initial username and password are ADMIN and KYLIN.
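
You can also confirm from the command line that Kylin is accepting logins. The sketch below builds the HTTP Basic header for the default account. basic_auth_header is a hypothetical helper, and the curl call in the comment assumes Kylin's REST authentication endpoint is reachable on the default port.

```shell
# Hypothetical helper: build an HTTP Basic Authorization header
# for the given user name and password.
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# Example: ask a running Kylin instance to authenticate the default account.
#   curl -s -X POST -H "$(basic_auth_header ADMIN KYLIN)" \
#        http://localhost:7070/kylin/api/user/authentication
```

Change the default password after the first login if the instance is reachable by anyone else.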

Stop Kylin

Run the $KYLIN_HOME/bin/kylin.sh stop script to stop Kylin. The console output is as follows:

Stopping Kylin: 14818
Stopping in progress. Will check after 2 secs again...
Kylin with pid 14818 has been stopped.

HDFS Storage

Kylin generates files on HDFS. The default root directory is “/kylin/”, and the metadata table name of the Kylin cluster is used as the second-level directory name. The default table name is “kylin_metadata”, and it can be customized in conf/kylin.properties.
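
To see which second-level directory your instance uses, you can read the table name back out of kylin.properties. This is a small sketch: metadata_table_name is a hypothetical helper, and the hdfs command in the comment assumes the root directory was left at its default.

```shell
# Hypothetical helper: print the metadata table name, that is, the part of
# kylin.metadata.url before "@". If the key appears more than once,
# the last occurrence wins.
metadata_table_name() {
    sed -n 's/^kylin\.metadata\.url=\([^@]*\)@.*/\1/p' "$1" | tail -n 1
}

# Example: list Kylin's files on HDFS, assuming the default /kylin root.
#   hdfs dfs -ls "/kylin/$(metadata_table_name "$KYLIN_HOME/conf/kylin.properties")"
```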

Enable Dashboard

In conf/kylin.properties, add the following configuration:

kylin.web.dashboard-enabled=true
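
If you prefer to script the change, a property can be set without leaving duplicate entries behind. set_property below is a hypothetical helper, not part of Kylin. It drops every line containing "key=" (including commented-out ones) and appends the new value. Kylin most likely needs a restart before it picks up the change.

```shell
# Hypothetical helper: set key=value in a properties file, replacing
# any existing line that contains "key=".
set_property() {
    file=$1; key=$2; value=$3
    touch "$file"
    tmp_file=$(mktemp)
    # Keep every line that does not mention this key, then append the new value.
    grep -vF -- "$key=" "$file" > "$tmp_file" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
    # Write back with cat so the original file keeps its permissions.
    cat "$tmp_file" > "$file"
    rm -f "$tmp_file"
}

# Example:
#   set_property "$KYLIN_HOME/conf/kylin.properties" kylin.web.dashboard-enabled true
```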

Conclusion:

To conclude, Apache Kylin is an advanced open-source data warehouse with many features to explore. In this article, I have explained how to install it. In my next article, I will explain how to create models and cubes.

If you found this article useful, feel free to read more articles on this topic.

Cheers!