How To Install Apache Kylin
Looking to improve your big data processing? Extreme OLAP engines offer an easy path to success. Try Apache Kylin now!
What is Apache Kylin:
Apache Kylin is an open-source, distributed analytical data warehouse for big data. It was designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop while supporting extremely large datasets. At the time of writing, the latest major version is Kylin 4.0 (release 4.0.1).
Latest Features of Kylin 4.0:
- It uses Parquet for storage instead of HBase, which Apache Kylin 3.x used.
- It uses a new Spark build engine instead of MapReduce.
- It uses Spark SQL as a query engine instead of Hive/JDBC.
Why Is Apache Kylin a Better Choice?
- MOLAP Cube Precalculation
- Cloud-friendly, thanks to Parquet storage
- Cubing duration and cube size
- Interactive SQL Query Interface
- Query performance
- High concurrency
- Seamless integration with BI tools
- Real-time OLAP (soon)
Step 1: Installation
Before installing Kylin, start the Hadoop cluster (HDFS and YARN) along with the other services Kylin depends on, such as Hive, ZooKeeper, and MySQL.
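As a quick pre-flight check, the sketch below verifies that the command-line tools for those services are on the PATH before moving on. The function name check_prereqs and the list of commands in the example call are assumptions for illustration; adjust them to whatever your cluster actually runs.

```shell
# Hypothetical helper: report any required command that is not on PATH.
# On a stock Apache Hadoop install, HDFS and YARN are usually started
# beforehand with start-dfs.sh and start-yarn.sh.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  # Returns 0 when every command was found, 1 otherwise.
  return "$missing"
}

# Example call (the command list is an assumption):
# check_prereqs hadoop hdfs yarn hive
```

A non-zero return value means at least one tool was not found, so it is worth fixing the environment before continuing with the Kylin setup.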
Step 2: Kylin setup
Download the Apache Kylin 4.0.1 binary package from the Apache Kylin download site. For example, the following command can be used:
wget https://dlcdn.apache.org/kylin/apache-kylin-4.0.1/apache-kylin-4.0.1-bin-spark3.tar.gz
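Extract the tarball and set the $KYLIN_HOME environment variable to the Kylin folder. The export command below applies only to the current shell session; to make it permanent, add an export line with the full path to your .bashrc file.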
tar -zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
cd apache-kylin-4.0.1-bin-spark3
export KYLIN_HOME=`pwd`
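To keep $KYLIN_HOME set across sessions, the export line can be appended to .bashrc. The sketch below does this only if the exact line is not already present, so running it twice does not create duplicates. The function name persist_kylin_home and its arguments are hypothetical, not part of Kylin.

```shell
# Hypothetical helper: append "export KYLIN_HOME=<dir>" to an rc file
# unless that exact line is already there.
persist_kylin_home() {
  rc_file="$1"     # e.g. "$HOME/.bashrc"
  kylin_dir="$2"   # absolute path to the extracted Kylin folder
  line="export KYLIN_HOME=$kylin_dir"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF "$line" "$rc_file" 2>/dev/null || echo "$line" >> "$rc_file"
}

# Example call, run from inside the extracted folder:
# persist_kylin_home "$HOME/.bashrc" "$(pwd)"
```

Open a new shell, or run source ~/.bashrc, for the change to take effect.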
Step 3: Configure MySQL metastore
Kylin 4.0 uses MySQL as its metadata storage. Add the following configuration to conf/kylin.properties:
kylin.metadata.url=kylin_metadata@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://localhost/metastore,username=hiveuser,password=hivepassword
kylin.env.zookeeper-connect-string=localhost:2181
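The kylin.metadata.url value packs several settings into one line: the metadata table name before @jdbc, followed by comma-separated key=value pairs. As a rough sketch, the helper below assembles that line from its parts, which makes it easier to see which piece to change for a different host, database, or account. The function name build_metadata_url and its argument order are assumptions for illustration.

```shell
# Hypothetical helper: print a kylin.metadata.url line for a MySQL metastore.
# Arguments: metadata table name, MySQL host, database, username, password.
build_metadata_url() {
  table="$1"; host="$2"; db="$3"; user="$4"; pass="$5"
  printf 'kylin.metadata.url=%s@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://%s/%s,username=%s,password=%s\n' \
    "$table" "$host" "$db" "$user" "$pass"
}

# Example call reproducing the line shown above:
# build_metadata_url kylin_metadata localhost metastore hiveuser hivepassword
```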
Also place the MySQL JDBC connector jar in $KYLIN_HOME/ext/. If the directory does not exist, create it. The connector (MySQL Connector/J) can be downloaded from the MySQL website.
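The copy step can be sketched as follows. The function name install_jdbc_jar is hypothetical, and the jar file name in the example call is a placeholder for whichever connector version you downloaded.

```shell
# Hypothetical helper: copy a JDBC connector jar into $KYLIN_HOME/ext,
# creating the directory first if it does not exist.
install_jdbc_jar() {
  jar_path="$1"
  # The shell reports an error here if KYLIN_HOME is unset or empty.
  ext_dir="${KYLIN_HOME:?KYLIN_HOME is not set}/ext"
  mkdir -p "$ext_dir" && cp "$jar_path" "$ext_dir/"
}

# Example call (the jar file name is a placeholder):
# install_jdbc_jar ./mysql-connector-java-<version>.jar
```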
Two more jar files need to be placed in $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib
Step 4: Start Kylin
Run the $KYLIN_HOME/bin/kylin.sh start script to start Kylin. The console output is as follows:
Retrieving hadoop conf dir...
KYLIN_HOME is set to /home/hdoop/apache-kylin-4.0.1-bin-spark3
......
A new Kylin instance is started by hdoop. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/apache-kylin-4.0.1-bin-spark3/logs/kylin.log
Web UI is at http://localhost:7070/kylin
Once Kylin has started, the Web UI is available at the address shown above. The initial username and password are ADMIN/KYLIN.
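Kylin can take a little while to come up, so in a script it may be useful to poll the Web UI address until it responds. The sketch below uses curl for this. The function name wait_for_kylin, the retry count, and the delay are assumptions, and it treats any successful HTTP response as "up" without checking anything Kylin-specific.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts, delay in seconds between attempts.
wait_for_kylin() {
  url="$1"; attempts="$2"; delay="$3"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -f: fail on HTTP errors (4xx/5xx), -L: follow redirects
    if curl -s -f -L -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example call: try for about a minute before giving up.
# wait_for_kylin http://localhost:7070/kylin 12 5 || echo "Kylin did not respond"
```

If the check keeps failing, the log file named in the startup output is the first place to look.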
Stop Kylin
Run the $KYLIN_HOME/bin/kylin.sh stop script to stop Kylin. The console output is as follows:
Stopping Kylin: 14818
Stopping in progress. Will check after 2 secs again...
Kylin with pid 14818 has been stopped.
HDFS Storage
Kylin generates files on HDFS. The default root directory is "/kylin/". Below it, the metadata table name of the Kylin cluster is used as the second-level directory name; the default is "kylin_metadata" (both can be customized in conf/kylin.properties).
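To see where a given installation will write, the working directory can be derived from kylin.properties. The sketch below assumes the root comes from the kylin.env.hdfs-working-dir property (falling back to /kylin) and the second level from the part of kylin.metadata.url before the @ sign (falling back to kylin_metadata). The function name kylin_hdfs_dir is hypothetical.

```shell
# Hypothetical helper: print the HDFS directory Kylin is expected to use,
# based on the properties file passed as the first argument.
kylin_hdfs_dir() {
  props="$1"
  # Take the last uncommented definition of each property, if any.
  root=$(sed -n 's/^kylin\.env\.hdfs-working-dir=//p' "$props" | tail -n 1)
  url=$(sed -n 's/^kylin\.metadata\.url=//p' "$props" | tail -n 1)
  root="${root:-/kylin}"            # assumed default root
  table="${url%%@*}"                # text before the first '@'
  table="${table:-kylin_metadata}"  # default metadata table name
  printf '%s/%s\n' "${root%/}" "$table"
}

# Example: list the directory on HDFS (requires a running cluster).
# hdfs dfs -ls "$(kylin_hdfs_dir "$KYLIN_HOME/conf/kylin.properties")"
```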
Enable Dashboard
In conf/kylin.properties, add the following configuration:
kylin.web.dashboard-enabled=true
Conclusion:
To conclude, Apache Kylin is an advanced open-source data warehouse with many features to explore. This article covered its installation; in my next article, I will explain how to create models and cubes.
If you found this article useful, feel free to read more articles on this topic.
Cheers!