Cloudberry Middleware

sadeem edited this page Aug 19, 2019 · 4 revisions

Software Architecture

Neo

This module is the entry point for the client. It sets up the web server that wraps the middleware logic. The server is implemented with the Play Framework.

  • conf

    This folder contains the following configuration files:
    • routes : defines the HTTP entry points.
    • logback.xml : defines the log level.
    • application.conf : defines the application configuration used for development.
    • production.conf : the same as application.conf, but used on the production machines.
  • app

    The server-related Scala code.
    • controllers

      It defines the web server application. The HTTP entry-point functions declared in routes are implemented here.
    • views

      It contains the *.scala.html template files that Play renders.
    • actor

      [To be changed] Currently, it contains a NeoActor that translates the JSON request from the web page into a Cloudberry JSON request. It mainly exists to simplify the front-end JS logic, since the author was more comfortable with a strongly typed language. The translation could be moved to the web page so that the JS sends the Cloudberry request directly.
    • db

      It connects to AsterixDB and checks whether the berry.meta dataset exists. The dataset is created if it is not found, and the metadata is loaded into memory once.
  • public

    The frontend resource folder. Please read the frontend documentation.
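
The startup behaviour described for the db package (check for berry.meta, create it if it is missing, load it into memory once) could be sketched roughly as follows. This is a minimal sketch, not the actual Cloudberry code: MetaStore, InMemoryMetaStore, and MetaCache are hypothetical names, and the in-memory store only stands in for a real AsterixDB connection so that the example runs on its own.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the AsterixDB connection; not the real Cloudberry API.
trait MetaStore {
  def datasetExists(name: String): Boolean
  def createDataset(name: String): Unit
  def loadAll(name: String): Seq[String]
}

// In-memory fake so the sketch runs without a database.
class InMemoryMetaStore extends MetaStore {
  private val data = mutable.Map.empty[String, Vector[String]]
  var loadCount = 0 // counts how often the metadata was read

  def datasetExists(name: String): Boolean = data.contains(name)
  def createDataset(name: String): Unit = data.update(name, Vector.empty)
  def loadAll(name: String): Seq[String] = {
    loadCount += 1
    data(name)
  }
}

// Creates the metadata dataset if needed and loads it exactly once.
class MetaCache(store: MetaStore, metaName: String = "berry.meta") {
  // A lazy val is evaluated on first access and cached afterwards.
  lazy val records: Seq[String] = {
    if (!store.datasetExists(metaName)) store.createDataset(metaName)
    store.loadAll(metaName)
  }
}
```

Reading `records` twice on the same MetaCache touches the store only once, which is the "loaded once into memory" property described above.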

Zion

Zion contains the kernel of the middleware work. It is composed of the following general components:

  • Request Parser

    Parses the incoming request and forwards it to the Query Planner.
  • Query Planner

    Responsible for rewriting queries based on the available view information. If there is an appropriate view, the original query is split into multiple queries that are sent to different datasets. After all results come back, the Query Planner merges them and returns the merged result to the client.

    Not every query can find an appropriate view, especially right after the system has started. In this case, instead of waiting for the entire query result, Cloudberry can stream a series of partial results at a steady pace. It splits the query into a series of mini-queries. The selectivity of each mini-query is adapted based on the observed query performance so that each mini-query finishes within a short time limit.

    (Figure: Query Slices)
  • Data Manager

    This component deals with all data-related modules, including the Metadata manager, View manager, and Datastore manager.

    The Datastore and Metadata managers mainly map data formats and register the data. (Figure: Data Store)

    To speed up queries, views are updated periodically and, when available, used to answer queries. (Figure: View Manager)

  • Database Adapter

    Cloudberry maintains an adapter for each database to handle its query language and connection.
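
The mini-query adaptation described for the Query Planner could look roughly like the sketch below. It is only an illustration under stated assumptions, not Cloudberry's actual algorithm: it assumes the query is sliced along a numeric time range, that slices are processed from newest to oldest, and that the next slice width is scaled proportionally to how far the previous mini-query was from the target time. The names AdaptiveSlicing, nextWidth, slices, and runSlice are hypothetical. A proportional rule like this steers toward the time limit but does not by itself guarantee it.

```scala
object AdaptiveSlicing {

  /** Scales the previous slice width by targetMs / elapsedMs, never going below minWidth. */
  def nextWidth(prevWidth: Long, elapsedMs: Long, targetMs: Long, minWidth: Long = 1L): Long = {
    val safeElapsed = math.max(elapsedMs, 1L) // avoid division by zero
    val scaled = (prevWidth.toDouble * targetMs / safeElapsed).toLong
    math.max(minWidth, scaled)
  }

  /**
   * Walks the range [start, end) from newest to oldest.
   * runSlice(from, to) is a hypothetical callback that executes one mini-query
   * and returns its elapsed time in milliseconds.
   * Returns the slices in the order they were executed.
   */
  def slices(start: Long, end: Long, initialWidth: Long, targetMs: Long)(
      runSlice: (Long, Long) => Long): Vector[(Long, Long)] = {
    require(initialWidth > 0, "initialWidth must be positive")
    require(targetMs > 0, "targetMs must be positive")

    var hi = end
    var width = initialWidth
    val executed = Vector.newBuilder[(Long, Long)]

    while (hi > start) {
      val lo = math.max(start, hi - width)
      val elapsed = runSlice(lo, hi) // a partial result would be streamed to the client here
      executed += ((lo, hi))
      width = nextWidth(hi - lo, elapsed, targetMs)
      hi = lo
    }
    executed.result()
  }
}
```

For example, if the first slice finishes in a tenth of the target time, the next slice covers roughly ten times as much of the range; if a slice overshoots the target, the next one shrinks accordingly.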

Below are the detailed packages of Zion in the code base:

  • actor

    It contains several types of actors that handle all the workflows.
    • BerryClient

      Each web connection creates one BerryClient, which uses the JSONParser to produce a group of AQL queries for the DataSetAgent to execute.
    • DataSetAgent

      Each AsterixDB dataset is connected to exactly one DataSetAgent, which runs AQL queries and updates on that dataset. This uniqueness guarantees read and update consistency.
    • DataStoreManager

      One global actor that syncs with the berry.meta AsterixDB dataset, which stores the view descriptions and relations.
  • common

    Configuration file (Asterix URL, timeouts)
  • model

    The object model code.
    • datastore

      It defines the interface of the datastore-related query model.
    • impl

      • AQLGenerator: It translates a Query, via its generate function, into the corresponding AQL query with some syntax validation. It uses the AQLFuncVisitor utility to handle functions such as relation functions (e.g., contains, in) and aggregation functions (e.g., count, min, max). Reference 3
      • JSONParser: It parses a given JSON record into a Query. In addition, because a Query is part of the DataSetInfo, which must be serialized to and deserialized from AsterixDB, we also implement the write interface, which converts a Query back to a JSONRecord. Reference 1, 2
      • DataSetInfo: The dataset metadata, which contains the Schema, the CreateQuery (if the dataset is a view), and statistical information (e.g., creation time, update time, cardinality). We implemented the corresponding JSON formatter to serialize/deserialize a DataSetInfo to/from a JSONRecord so that it can be stored in AsterixDB. Reference 1, 2
    • schema

      Schema model: Query, Schema, and function types. Reference 2
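
To make the role of the AQLGenerator more concrete, the sketch below shows how a small query model might be turned into an AQL string. It is a heavily simplified illustration, not the real implementation: the Filter, Contains, In, and Query types and the SimpleAQLGenerator object are hypothetical, only the contains and in relation functions and a count aggregation are covered, the only validation is a non-empty dataset name, and string values are not escaped.

```scala
// Hypothetical, heavily simplified query model; the real Cloudberry schema types are richer.
sealed trait Filter
final case class Contains(field: String, value: String) extends Filter
final case class In(field: String, values: Seq[Long]) extends Filter

final case class Query(dataset: String, filters: Seq[Filter], countOnly: Boolean = false)

object SimpleAQLGenerator {

  // Translates one relation function into an AQL predicate over the variable $t.
  private def filterToAQL(f: Filter): String = f match {
    case Contains(field, value) =>
      s"""contains($$t.$field, "$value")"""
    case In(field, values) =>
      values.map(v => s"$$t.$field = $v").mkString("(", " or ", ")")
  }

  /** Builds a FLWOR-style AQL query string; wraps it in count(...) when countOnly is set. */
  def generate(q: Query): String = {
    require(q.dataset.nonEmpty, "dataset name must not be empty") // minimal validation only
    val where =
      if (q.filters.isEmpty) ""
      else q.filters.map(filterToAQL).mkString(" where ", " and ", "")
    val body = s"for $$t in dataset ${q.dataset}$where return $$t"
    if (q.countOnly) s"count($body)" else body
  }
}
```

Keeping one translation case per function type mirrors the visitor idea described for AQLFuncVisitor: supporting a new function means adding one more case rather than changing the query assembly.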