Apache Superset+Apache Drill:Query Anything-Part -04 (Apache Druid Example)

Learn to Query and Visualize anything: CSV/Text/Excel/JSON/TSV/Avro/videos/Images/Parquet/NoSQL/SQL Databases etc

https://drill.apache.org/

Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage

Agility

Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.)

Flexibility

Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data

Familiarity

Leverage your existing SQL skillsets and BI tools including Tableau, Qlikview, MicroStrategy, Spotfire, Excel and more

Query any non-relational datastore (well, almost…)

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.

Drill’s datastore-aware optimizer automatically restructures a query plan to leverage the datastore’s internal processing capabilities. In addition, Drill supports data locality, so it’s a good idea to co-locate Drill and the datastore on the same nodes.

https://github.com/apache/superset

https://superset.apache.org/

Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

Powerful yet easy to use

Superset makes it easy to explore your data, using either our simple no-code viz builder or state-of-the-art SQL IDE.

Integrates with modern databases

Superset can connect to any SQL-based databases including modern cloud-native databases and engines at petabyte scale.

Modern architecture

Superset is lightweight and highly scalable, leveraging the power of your existing data infrastructure without requiring yet another ingestion layer.

Rich visualizations and dashboards

Superset ships with 40+ pre-installed visualization types. Our plug-in architecture makes it easy to build custom visualizations.

https://druid.apache.org/

Overview

Sub-second queries at any scale

Execute OLAP queries in milliseconds on high-cardinality and high-dimensional data sets with billions to trillions of rows without pre-defining or caching queries in advance.

High concurrency at the lowest cost

Build real-time analytics applications that supports 100s to 100,000s queries per second at consistent performance with a highly efficient architecture that uses less infrastructure than other databases.

Real-time and historical insights

Unlock streaming data potential through Druid’s native integration with Apache Kafka and Amazon Kinesis as it supports query-on-arrival at millions of events per second, low latency ingestion, and guaranteed consistency.® Druid

A high performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load.

Key Druid Features

Interactive Query Engine

Druid utilizes scatter/gather for high speed queries with data preloaded into memory or local storage to avoid data movement and network latency.

Tiering & QoS

Configurable tiering with quality of service enables the ideal price-performance for mixed workloads, guarantees priority, and avoids resource contention.

Optimized Data Format

Ingested data is automatically columnarized, time-indexed, dictionary-encoded, bitmap-indexed, and type-aware compressed.

Elastic Architecture

Loosely coupled components for ingestion, queries, and orchestration combined with a deep storage layer enable easy & quick scale-up & scale-out.

True Stream Ingestion

A connector-free integration with streaming platforms enables query-on-arrival, high scalability, low latency, and guaranteed consistency.

Non-stop Reliability

Automatic data services including continuous backup, automated recovery, and multi-node replication ensure high availability and durability.

Schema Auto-Discovery

Druid can automatically detect, define, and update column names and data types upon ingestion, providing the ease of schemaless and the performance of strongly typed schemas.

Flexible Joins Support

Druid supports join operations during data ingestion and at query-time execution, with the fastest query performance when tables are pre-joined during ingestion.

SQL Support

Developers and analysts can easily use the familiar SQL API for end-to-end data operations across ingestion, transformation, and querying.

Lets get started:

Step 01: Make sure that Apache druid up and running with loaded dataset.