Overview

Oushu Database is a new-generation data warehouse powered by Apache HAWQ. Its architecture combines the advantages of MPP and Hadoop. Oushu Database is highly scalable and has a fast execution engine that supports interactive queries over petabyte-scale data with full ANSI SQL support. It also supports descriptive analysis and advanced machine learning, and integrates smoothly with widely used BI tools. Because it is compatible with Oracle, GPDB and PostgreSQL, Oushu Database is a good candidate for replacing traditional data warehouses and other SQL-on-Hadoop engines. For cloud environments, Oushu Database supports Kubernetes platforms, so that enterprises can migrate seamlessly to current cloud computing platforms. HAWQ has been widely deployed in many industries, including finance, telecommunications, manufacturing, healthcare and the Internet.

Enhancements to Apache HAWQ

  • Brand new executor: 5 to 10 times faster than traditional MPP data warehouses and Hadoop SQL engines
  • Native support for PaaS/CaaS cloud platforms
    • The world’s first MPP++ analytic database that can run on native container cloud platforms
    • Supports third-party Kubernetes clusters
  • C++ pluggable external storage
    • A replacement for the Java-based PXF. It is several times faster and requires no additional PXF components to be installed or deployed, which greatly simplifies installation, operation and maintenance
    • Natively supports CSV/TEXT formats
    • Can be used to share data among clusters, such as a data warehouse and a data mart
    • Can be used for high-speed data importing and exporting
    • Provides high-speed backup and recovery
    • Supports pluggable file systems, such as S3 and Ceph
    • Supports pluggable file formats, such as ORC and Parquet
  • Supports ORC/TEXT/CSV as internal table formats, and supports ORC as an external format via the C++ pluggable storage interface
  • Supports multi-character delimiters for the CSV and TEXT file formats
  • Critical bug fixes
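
To make the multi-character delimiter feature concrete, here is a minimal Python sketch of what it means to split a TEXT-format row on a delimiter longer than one character. It is only an illustration of the idea, not Oushu Database's parser: the function name `split_fields`, the delimiter `|@|` and the `\N` null marker are assumptions chosen for the example. Plain string splitting is used because Python's standard `csv` module accepts only single-character delimiters.

```python
def split_fields(line, delimiter="|@|", null_marker="\\N"):
    """Split one TEXT-format row on a (possibly multi-character) delimiter.

    Hypothetical helper for illustration only; the delimiter and the
    null marker are assumed values, not Oushu Database defaults.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    # Drop the trailing line terminator, then split on the full delimiter.
    fields = line.rstrip("\r\n").split(delimiter)
    # Map the null marker to None so missing values are explicit.
    return [None if field == null_marker else field for field in fields]


row = split_fields("1|@|alice|@|\\N|@|42\n")
print(row)
```

A delimiter such as `|@|` is useful when the data itself may contain common single-character separators like commas or pipes.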

Main features

  • Fast new executor: 5 to 10 times faster than traditional MPP data warehouses and Hadoop SQL engines
  • On-premises or cloud deployment: supports Amazon and AliCloud, as well as popular container cloud platforms such as Kubernetes
  • Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extensions
  • Extremely high performance: many times faster than traditional data warehouses and other Hadoop SQL engines
  • World-class parallel optimizer
  • Full transaction capability and consistency guarantee: ACID
  • Dynamic data flow engine using a high-speed UDP-based interconnect
  • Elastic execution engine based on on-demand virtual segments and data locality
  • Multi-level partitioning, with list- and range-partitioned tables
  • Multiple compression methods: snappy, gzip, lz4, RLE
  • User-defined functions in multiple languages: Python, Perl, Java, C/C++, R
  • Advanced machine learning and data mining functionality through MADlib
  • Dynamic node expansion in seconds
  • Advanced three-level resource management: integrates with YARN and hierarchical resource queues
  • Easy access to all HDFS data and external system data (for example, HBase)
  • Hadoop native: from storage (HDFS) and resource management (YARN) to deployment (Ambari)
  • Authentication and granular authorization: Kerberos, SSL and role-based access
  • Advanced C/C++ access library to HDFS and YARN: libhdfs3 & libYARN
  • Supports most third-party tools, such as Tableau and SAS
  • Standard connectivity: JDBC/ODBC
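
As an illustration of the range-partitioning feature listed above, the following Python sketch builds a `CREATE TABLE` statement in the Greenplum/HAWQ style. It is a sketch under stated assumptions: the helper `range_partition_ddl`, the table name and the column names are hypothetical, and the exact clause syntax should be checked against the Oushu Database documentation before use. The code only produces a string; it does not connect to a database.

```python
def range_partition_ddl(table, columns, dist_col, part_col, start, end, every):
    """Build a range-partitioned CREATE TABLE statement as a string.

    Hypothetical helper: the clause layout follows the Greenplum/HAWQ
    style and is an assumption, not verified Oushu Database syntax.
    Do not pass untrusted input; values are interpolated directly.
    """
    col_defs = ", ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return (
        f"CREATE TABLE {table} ({col_defs})\n"
        f"DISTRIBUTED BY ({dist_col})\n"
        f"PARTITION BY RANGE ({part_col})\n"
        f"(START (date '{start}') INCLUSIVE\n"
        f" END (date '{end}') EXCLUSIVE\n"
        f" EVERY (INTERVAL '{every}'));"
    )


# Example: one partition per month for a hypothetical sales table.
ddl = range_partition_ddl(
    table="sales",
    columns=[("id", "int"), ("sale_date", "date"), ("amount", "decimal(10,2)")],
    dist_col="id",
    part_col="sale_date",
    start="2023-01-01",
    end="2024-01-01",
    every="1 month",
)
print(ddl)
```

The resulting statement could then be run through any PostgreSQL-compatible client or a JDBC/ODBC connection.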