Using Apache Superset, a Powerful and Free Data Analysis Tool

chauhoangminhnguyen - Jul 7 - Dev Community

Introduction

Among data analysis tools, Apache Superset is open-source software that is widely regarded as a strong choice for deploying reports at scale, efficiently and free of charge. In this article, I will guide you through installing and configuring Superset and connecting it to a data source.

This application was initiated by Maxime Beauchemin (the creator of Apache Airflow) as a hackathon project when he was working at Airbnb, and it joined the Apache Incubator program in 2017.

Essentially, Superset's features are quite similar to those of other data analysis software, including:

  • Creating and managing dashboards
  • Supporting multiple database types: SQLite, PostgreSQL, MySQL, etc.
  • Supporting direct SQL querying through a built-in SQL editor

[Image: Apache Superset]

Installation and Configuration

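Here, I will guide you through installing Superset using the following Docker command, where {outside port} is the port exposed on your machine and {inside port} is the port Superset listens on inside the container (8088 by default):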

docker run -d -p {outside port}:{inside port} --name {container name} apache/superset

Example:

docker run -d -p 8080:8088 --name superset apache/superset
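Depending on the image version you pull, the container may refuse to start unless you provide your own SUPERSET_SECRET_KEY, because recent releases reject the built-in default key. Below is a minimal sketch of one way to handle that. The `generate_secret_key` helper is my own illustration (it is not part of Superset), and the script only prints the final command so that you can review it before running it.

```shell
# Hypothetical helper (not part of Superset): build a random 64-character
# hex string from 32 bytes of /dev/urandom.
generate_secret_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

SUPERSET_SECRET_KEY="$(generate_secret_key)"

# Print the command for review; remove the leading "echo" to really run it.
echo docker run -d -p 8080:8088 \
  -e "SUPERSET_SECRET_KEY=$SUPERSET_SECRET_KEY" \
  --name superset apache/superset
```

If you go this route, store the key somewhere safe and reuse it when you recreate the container, since Superset uses it to sign sessions and encrypt saved credentials.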

Once the Superset container is running, run the following command inside it to create an admin account:

docker exec -it superset superset fab create-admin --username {username} --firstname {firstname} --lastname {lastname} --email {email} --password {password}

Example:

docker exec -it superset superset fab create-admin --username admin --firstname Superset --lastname Admin --email admin@superset.com --password admin

Next, run the following command to load the bundled example datasets and dashboards:

docker exec -it superset superset load_examples

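Finally, initialize Superset's default roles and permissions (the web server itself is already running in the container started earlier):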

docker exec -it superset superset init
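If you repeat this setup often, the steps can be collected into one small script. The sketch below is only an illustration: the `run` and `setup_superset` functions and the DRY_RUN variable are names I made up, and it assumes the container from the example above. It also adds a `superset db upgrade` step, which the project's own Docker instructions run to migrate the metadata database, and it drops `-it` because a script usually has no interactive terminal attached.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_superset CONTAINER USERNAME PASSWORD (hypothetical helper).
# Each step runs only if the previous one succeeded.
setup_superset() {
  container="$1"; username="$2"; password="$3"
  run docker exec "$container" superset fab create-admin \
    --username "$username" --firstname Superset --lastname Admin \
    --email admin@superset.com --password "$password" &&
  run docker exec "$container" superset db upgrade &&
  run docker exec "$container" superset load_examples &&
  run docker exec "$container" superset init
}

# Preview the commands without touching Docker.
DRY_RUN=1 setup_superset superset admin admin
```

Calling `setup_superset superset admin admin` without DRY_RUN=1 runs the four commands for real against the container named superset.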

After that, open http://localhost:8080 and log in with the admin account created above. The home page will show the example data loaded in the previous step.
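The web server may need a little time before it answers requests, so the page can fail to load on the first try. As a rough sketch, you could poll Superset's health endpoint until it responds. This assumes `curl` is installed and that your Superset version serves `/health`; the `wait_for_superset` function is a hypothetical helper of mine.

```shell
# wait_for_superset URL [ATTEMPTS] [DELAY_SECONDS] (hypothetical helper)
# Returns 0 as soon as the URL answers with a success status,
# or 1 if it never does within the given number of attempts.
wait_for_superset() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fs --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example usage (uncomment once the container is running):
# wait_for_superset http://localhost:8080/health && echo "Superset is up"
```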

[Image: Main page]

Connecting Data Sources

To analyze data, you first need to create a connection to a database (such as PostgreSQL or MySQL). The process is simple and similar to how typical database client tools work. Here, I will show how to connect to PostgreSQL. If you are not familiar with Postgres, you can refer to this article on installing PostgreSQL and using its basic features.

First, access the page to create a new database connection.

[Image: Connect a database]

Next, enter the SQLAlchemy URI with the following structure:

postgresql://{username}:{password}@{host}:{port}/{database}  
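One detail that is easy to miss: characters such as `@`, `:`, or `/` in the username or password must be percent-encoded, otherwise the URI is parsed incorrectly. The sketch below shows one way to assemble the string in plain shell. The `urlencode` and `pg_uri` helpers, the host name, and the credentials are all made up for illustration, and the encoder assumes ASCII input.

```shell
# urlencode STRING: percent-encode everything except unreserved characters.
# Hypothetical helper; assumes ASCII input. Runs in a subshell so the
# locale change and temporary variables do not leak into the caller.
urlencode() (
  LC_ALL=C
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything after the first character
    c="${s%"$rest"}"     # the first character itself
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s' "$out"
)

# pg_uri USERNAME PASSWORD HOST PORT DATABASE (hypothetical helper)
pg_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}

pg_uri analyst 'p@ss:word/1' db.example.internal 5432 sales
# prints postgresql://analyst:p%40ss%3Aword%2F1@db.example.internal:5432/sales
```

Also keep in mind that Superset runs inside a container here, so `localhost` in the URI refers to the container itself. If PostgreSQL runs on your own machine, you will likely need the host's address instead (for example `host.docker.internal` on Docker Desktop).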

After the connection succeeds, you can use the features that Apache Superset supports, such as creating dashboards, creating charts (with many chart types and customization options), querying data, saving queries, and viewing query history.

[Image: Creating Charts based on Datasets]

[Image: SQL Query]

Conclusion

Apache Superset provides relatively comprehensive tools to support data analysis and visualization. It can embed query results into other applications, connect to various data sources, and, importantly, it is open-source and completely free.

Although it may not be comparable to powerful paid tools like Tableau or Power BI in some aspects, overall, Superset is a very worthwhile tool because it meets most data analysis and reporting needs.

What do you think? Leave a comment below!
