How to Connect Apache Superset with Apache SparkSQL

Learn to Connect Apache Superset (open-source modern data exploration and visualization platform) with Apache SparkSQL (Apache Spark :Unified engine for large-scale data analytics)

What is Apache Spark?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

Batch/streaming data

Unify the processing of your data in batches and real-time streaming, using your preferred language: Python, SQL, Scala, Java or R

SQL analytics

Execute fast, distributed ANSI SQL queries for dashboarding and ad-hoc reporting. Runs faster than most data warehouses.

Data science at scale

Perform Exploratory Data Analysis (EDA) on petabyte-scale data without having to resort to downsampling

Machine learning

Train machine learning algorithms on a laptop and use the same code to scale to fault-tolerant clusters of thousands of machines.

Apache Spark Website

Apache Spark Github Repository

What is Apache Superset?

Apache Superset™ is an open-source modern data exploration and visualization platform

Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

Powerful yet easy to use

Superset makes it easy to explore your data, using either our simple no-code viz builder or state-of-the-art SQL IDE.

Integrates with modern databases

Superset can connect to any SQL-based databases including modern cloud-native databases and engines at petabyte scale.

Modern architecture

Superset is lightweight and highly scalable, leveraging the power of your existing data infrastructure without requiring yet another ingestion layer.

Rich visualizations and dashboards

Superset ships with 40+ pre-installed visualization types. Our plug-in architecture makes it easy to build custom visualizations.

Apache Superset Website

Apache Superset Github Repository

How to Connect Apache Superset with Apache SparkSQL

Before starting connection, Kindly make sure to install database connection driver as mentioned on Apache Superset Website.

Step 01: Click on Database Connections

Step 02: Click on add database.

Step 03: First click on supported databases and then choose Apache SparkSQL

Step 04: Enter database connection url as follows and click on test connection

hive://localhost:10000/default

Step 05: Once Database Connection is ok then click on connect.

Step 06: Now database is connected, let’s go to SQL Lab and check if Connection is working fine. Click on SQL Lab

Step 07: From database list, Click on Apache SparkSQL

Step 08: Now you can see default schema.