DataHub

DataHub is an open-source metadata platform and data catalog for the modern data stack, built to enable end-to-end data discovery, data observability, and data governance. It supports a range of data sources, including PostgreSQL.

Because YugabyteDB's YSQL API is wire-compatible with PostgreSQL, DataHub can connect to YugabyteDB as a data source using the PostgreSQL plugin.

Setup

You can run the Docker Compose quickstart example provided in the DataHub GitHub repository against YugabyteDB with the following changes:

  • Replace the MySQL Docker image with the YugabyteDB image.
  • Specify the entrypoint command for the YugabyteDB Docker container.
  • Change the database port from 5432 (the PostgreSQL default) to 5433 (the YSQL default).
  • Change the username and password to yugabyte.
  • Change the driver to org.postgresql.Driver.
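The connection settings in this list all derive from one host, port, database, and credential set. As a minimal sketch of how they fit together, the following prints the five EBEAN_DATASOURCE_* values for a given endpoint. The helper name ebean_env is hypothetical and is not part of DataHub or YugabyteDB.

```shell
# ebean_env HOST PORT DB USER PASSWORD
# Hypothetical helper: prints the EBEAN_DATASOURCE_* settings for one endpoint.
ebean_env() {
  local host="$1" port="$2" db="$3" user="$4" password="$5"
  printf 'EBEAN_DATASOURCE_DRIVER=%s\n' 'org.postgresql.Driver'
  # The HOST setting carries both the hostname and the port.
  printf 'EBEAN_DATASOURCE_HOST=%s:%s\n' "$host" "$port"
  printf 'EBEAN_DATASOURCE_PASSWORD=%s\n' "$password"
  # The JDBC URL uses the PostgreSQL scheme because YSQL is wire-compatible.
  printf 'EBEAN_DATASOURCE_URL=jdbc:postgresql://%s:%s/%s\n' "$host" "$port" "$db"
  printf 'EBEAN_DATASOURCE_USERNAME=%s\n' "$user"
}

# Values used throughout this page: service name yugabyte, YSQL port 5433.
ebean_env yugabyte 5433 yugabyte yugabyte yugabyte
```

Changing the host or port in one place and regenerating the values keeps EBEAN_DATASOURCE_HOST and EBEAN_DATASOURCE_URL consistent with each other.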

Make changes in the following files:

  • In docker/quickstart/docker-compose-without-neo4j.quickstart.yml, change the following:

    • Change the EBEAN_DATASOURCE configuration [lines 80-84 and 126-130] as follows:

      EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
      EBEAN_DATASOURCE_HOST=yugabyte:5433
      EBEAN_DATASOURCE_PASSWORD=yugabyte
      EBEAN_DATASOURCE_URL=jdbc:postgresql://yugabyte:5433/yugabyte
      EBEAN_DATASOURCE_USERNAME=yugabyte
      
    • Change mysql-setup to postgres-setup [line 123].

    • Replace the mysql and mysql-setup containers [lines 197-231] with the yugabyte and postgres-setup containers as follows:

      yugabyte:
        container_name: yugabyte
        hostname: yugabyte
        image: yugabytedb/yugabyte:latest
        command: /bin/bash /home/yugabyte/docker-entrypoint-initdb.d/yb-init.sh
        environment:
          POSTGRES_USER: ${POSTGRES_USER:-yugabyte}
          POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-yugabyte}
        ports:
        - '5433:5433'
        volumes:
        - ./yb-setup/:/home/yugabyte/docker-entrypoint-initdb.d/
        healthcheck:
          test: bin/ysqlsh -h `hostname -i` -U yugabyte -tAc 'select 1' -d yugabyte
          interval: 10s
          timeout: 5s
          retries: 20
      postgres-setup:
        container_name: postgres-setup
        depends_on:
          yugabyte:
            condition: service_healthy
        environment:
        - POSTGRES_HOST=yugabyte
        - POSTGRES_PORT=5433
        - POSTGRES_USERNAME=yugabyte
        - POSTGRES_PASSWORD=yugabyte
        - DATAHUB_DB_NAME=yugabyte
        hostname: yugabyte-setup
        image: ${DATAHUB_POSTGRES_SETUP_IMAGE:-acryldata/datahub-postgres-setup}:${DATAHUB_VERSION:-head}
      
  • Create a directory named yb-setup in docker/quickstart/, and in it create a script file named yb-init.sh (docker/quickstart/yb-setup/yb-init.sh) with the following content. The script runs when the container starts: it launches the YugabyteDB cluster and then runs init.sql against it.

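    # Start a single-node YugabyteDB cluster.
    bin/yugabyted start

    # Give the node a few seconds to begin accepting YSQL connections.
    sleep 5

    # Run the initialization SQL mounted from yb-setup/.
    bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
    # Keep the container's main process alive.
    tail -f /dev/null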
    
  • Copy the file docker/postgres/init.sql to docker/quickstart/yb-setup/.
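The last two steps (creating yb-setup with yb-init.sh, and copying init.sql) can be scripted. The following is a minimal sketch, assuming it is given the root of a DataHub checkout; the function name scaffold_yb_setup is hypothetical, and the script it writes contains the same commands as yb-init.sh above.

```shell
# scaffold_yb_setup REPO_ROOT
# Hypothetical helper: creates docker/quickstart/yb-setup under REPO_ROOT,
# writes yb-init.sh into it, and copies docker/postgres/init.sql alongside.
scaffold_yb_setup() {
  local repo_root="$1"
  local setup_dir="$repo_root/docker/quickstart/yb-setup"

  mkdir -p "$setup_dir" || return 1

  # The quoted 'EOF' keeps the backticks literal so they run inside the
  # container, not on the machine generating the file.
  cat > "$setup_dir/yb-init.sh" <<'EOF' || return 1
bin/yugabyted start

sleep 5

bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
tail -f /dev/null
EOF

  cp "$repo_root/docker/postgres/init.sql" "$setup_dir/" || return 1
}
```

Call it with the path to your local clone as the only argument, for example scaffold_yb_setup "$PWD" from the repository root.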

Run the example

Run the example from the docker/quickstart/ directory using the following command:

docker compose -f docker-compose-without-neo4j.quickstart.yml up -d
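If some containers fail to start, a missed edit in the compose file is one possible cause. The following is a rough sketch of a check for that, not an exhaustive validation: the helper name check_compose is hypothetical, and the patterns it looks for are assumptions based on the settings listed in the Setup section.

```shell
# check_compose FILE
# Hypothetical helper: flags signs that an edit from the Setup section was missed.
# Returns 0 if nothing suspicious is found, 1 otherwise.
check_compose() {
  local file="$1" status=0 want

  # A MySQL JDBC URL means a datasource block still points at MySQL.
  if grep -q 'jdbc:mysql' "$file"; then
    echo "leftover MySQL JDBC URL in $file"
    status=1
  fi

  # Settings that should be present after the edits.
  for want in \
    'EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver' \
    'EBEAN_DATASOURCE_URL=jdbc:postgresql://yugabyte:5433/yugabyte'
  do
    if ! grep -qF "$want" "$file"; then
      echo "missing setting: $want"
      status=1
    fi
  done

  return "$status"
}
```

For example, check_compose docker-compose-without-neo4j.quickstart.yml prints one line per problem it finds and returns a non-zero status if there are any.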

After all the containers are running, you can ingest demo data by running ./datahub/docker/ingestion/ingestion.sh, or open http://localhost:9002 (username: datahub, password: datahub) to access the UI.
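The containers can take a while to become ready, so a script that continues with ingestion may want to wait for the UI first. The following is a minimal sketch of a generic retry loop; the function name wait_until and the attempt and delay values are illustrative assumptions.

```shell
# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: reruns CMD until it succeeds or ATTEMPTS is exhausted,
# sleeping DELAY seconds between tries. Returns 0 on success, 1 on giving up.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt is coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_until 30 2 curl -fsS -o /dev/null http://localhost:9002 polls the UI address up to 30 times, 2 seconds apart, and succeeds once the server returns a successful HTTP response.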