Apache Superset + Apache Drill: Query Anything, Part 01 (Getting Started + JSON File Example)

Learn to query and visualize almost anything: CSV, text, Excel, JSON, TSV, Avro, videos, images, Parquet, NoSQL and SQL databases, etc.

https://drill.apache.org/

https://github.com/apache/drill

Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage

Agility

Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.)

Flexibility

Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data

Familiarity

Leverage your existing SQL skillsets and BI tools including Tableau, Qlikview, MicroStrategy, Spotfire, Excel and more

Query any non-relational datastore (well, almost…)

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.

Drill’s datastore-aware optimizer automatically restructures a query plan to leverage the datastore’s internal processing capabilities. In addition, Drill supports data locality, so it’s a good idea to co-locate Drill and the datastore on the same nodes.

https://github.com/apache/superset

https://superset.apache.org/

Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

Powerful yet easy to use

Superset makes it easy to explore your data, using either its simple no-code viz builder or its state-of-the-art SQL IDE.

Integrates with modern databases

Superset can connect to any SQL-based database, including modern cloud-native databases and engines at petabyte scale.

Modern architecture

Superset is lightweight and highly scalable, leveraging the power of your existing data infrastructure without requiring yet another ingestion layer.

Rich visualizations and dashboards

Superset ships with 40+ pre-installed visualization types. Its plug-in architecture makes it easy to build custom visualizations.

Let's get started with installation:

Step 01: Go to the official Apache Drill website mentioned above and click Download.

Step 02: Click the direct file download link.

Step 03: Extract the downloaded archive, go to the Apache Drill root directory, and then into the bin folder.

Step 04: Start Apache Drill in embedded mode by running the command below:

./drill-embedded

Step 05: Open the Apache Drill web UI at localhost:8047
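If you prefer to confirm from a script that Drill is up, the same port also serves Drill's REST API. The following is a minimal sketch, assuming Drill is running in embedded mode on localhost:8047 without authentication. The helper names `build_query_request` and `run_query` are hypothetical and chosen only for this illustration.

```python
import json
import urllib.request

# Assumption: embedded Drill serves its REST API on the same port as the web UI.
DRILL_BASE_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_BASE_URL):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_BASE_URL, timeout=30):
    """Send the query to a running Drill instance and return the decoded JSON reply."""
    request = build_query_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (needs a running Drill instance, so it is left commented out):
# result = run_query("SELECT * FROM sys.version")
# print(result["rows"])
```

If the call fails with a connection error, Drill is most likely not running yet or is listening on a different port.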

Here is a quick YouTube video for visual reference.

Step 06: The next step is to connect Apache Superset. If you do not have Apache Superset installed, you can install it by following the articles below.

Step 07: Before you can connect Apache Superset to Apache Drill, you need to install the database driver in the same Python environment that runs Superset. You can install it with the command below.

pip install sqlalchemy-drill

Step 08: Open Apache Superset and click the add database icon.

Step 09: Click Supported Databases and choose Apache Drill.

Step 10: Enter the SQLAlchemy URL given below:

drill+sadrill://localhost:8047/dfs?use_ssl=False
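The URL has four parts: the `drill+sadrill` dialect and driver, the host and port of the Drill REST endpoint, the default storage plugin (`dfs` here), and the `use_ssl` flag. As a rough sketch, the URL can be assembled and tried outside Superset as shown below. The function names `build_drill_url` and `make_engine` are hypothetical, and `make_engine` assumes the `sqlalchemy-drill` package from Step 07 is installed.

```python
from urllib.parse import urlencode


def build_drill_url(host="localhost", port=8047, storage_plugin="dfs", use_ssl=False):
    """Assemble a SQLAlchemy URL for the sadrill driver from its parts."""
    query = urlencode({"use_ssl": str(use_ssl)})
    return f"drill+sadrill://{host}:{port}/{storage_plugin}?{query}"


def make_engine(url):
    """Create a SQLAlchemy engine for the URL (requires sqlalchemy-drill)."""
    # Imported lazily so the URL helper above works without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(url)


print(build_drill_url())
# prints: drill+sadrill://localhost:8047/dfs?use_ssl=False
```

Building the URL from named parts makes it harder to mistype the driver name or the `use_ssl` parameter when pasting it into Superset.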

Step 11: Click Test Connection to check that everything is OK.

Step 12: Click Connect so that Apache Drill is connected to Apache Superset.

Step 13: Click SQL Lab.

Step 14: Run the SQL query below in SQL Lab to read the sample JSON file.

SELECT * FROM cp.`employee.json`
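Note that the file name is wrapped in backticks on both sides, not single quotes: Drill uses backticks to quote identifiers such as file paths, and `cp` is the classpath storage plugin that holds the bundled sample file. The small sketch below shows one way to build such queries without quoting mistakes. The helper `build_select` is hypothetical, and the optional `LIMIT` clause is an addition for quick previews, not part of the original query.

```python
def build_select(storage_plugin, file_name, limit=None):
    """Build a SELECT * query against a file, quoting the file name with backticks."""
    if "`" in file_name:
        # Keep the sketch simple: refuse names that would break the quoting.
        raise ValueError("file name must not contain a backtick")
    sql = f"SELECT * FROM {storage_plugin}.`{file_name}`"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


print(build_select("cp", "employee.json"))
print(build_select("cp", "employee.json", limit=10))
```

The first call reproduces the query from Step 14, and the second adds `LIMIT 10` to it.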

Step 15: You can now see the results. Choose Create Chart to build a visualization that fits your requirements.

Step 16: Before you can save the chart to a dashboard, you need to create the dashboard. To do that, click Dashboards.

Step 17: Click the add dashboard button.

Step 18: Give your dashboard a name.

Step 19: Click Save to save the dashboard.

Step 20: Go back to your chart screen.

Step 21: Click Save to save the chart.

Step 22: Enter the chart name, the dataset name, and the dashboard where you want to save the chart, then click Save or Save and go to dashboard.

Step 23: Click Edit dashboard to resize the chart layout.

Step 24: Drag and drop the chart to fit the layout you want, then click Save.

Step 25: We are now done with Part 01. The next part shows how to query Excel and import the result as a dataset (a virtual table in Apache Superset).

Here is a quick YouTube video of this part for visual reference.