Retail analytics sample database

Install the PostgreSQL-compatible Retail Analytics dataset on the YugabyteDB distributed SQL database.

You can install and use the Retail Analytics sample database using:

A local installation of YugabyteDB. To install YugabyteDB, refer to Quick Start.
Using cloud shell or a client shell to connect to a cluster in YugabyteDB Aeon. Refer to Connect to clusters in YugabyteDB Aeon. To get started with YugabyteDB Aeon, refer to Quick Start.

In either case, you use the YugabyteDB SQL shell (ysqlsh) CLI to interact with YugabyteDB using YSQL.

About the Retail Analytics database

The Retail Analytics dataset includes sample data in the following tables:

Products: Product information
Users: Customers who have bought products
Orders: Orders made by customers
Reviews: Product reviews

Install the Retail Analytics sample database

The Retail Analytics SQL scripts reside in the share folder of your YugabyteDB or client shell installation. They can also be found in the sample directory of the YugabyteDB GitHub repository. The following files will be used for this exercise:

schema.sql — Creates the tables and constraints
orders.sql — Loads product orders
products.sql — Loads products
reviews.sql — Loads product reviews
users.sql — Loads customer information

Follow the steps here to install the Retail Analytics sample database.

Open the YSQL shell

If you are using a local installation of YugabyteDB, run the ysqlsh command from the yugabyte root directory.

$ ./bin/ysqlsh

If you are connecting to YugabyteDB Aeon, open the ysqlsh cloud shell, or run the YSQL connection string for your cluster.

Create a database

You can do this as follows:

yugabyte=# CREATE DATABASE yb_demo;

yugabyte=# GRANT ALL ON DATABASE yb_demo to yugabyte;

yugabyte=# \c yb_demo;

Load data

First create the four tables necessary to store the data.

yb_demo=# \i share/schema.sql;

Now load the data into the tables.

\i share/products.sql;
\i share/users.sql;
\i share/orders.sql;
\i share/reviews.sql;

Explore the Retail Analytics database

Display the schema of the products table as follows:

yb_demo=# \d products

                                        Table "public.products"
   Column   |            Type             | Collation | Nullable |               Default
------------+-----------------------------+-----------+----------+--------------------------------------
 id         | bigint                      |           | not null | nextval('products_id_seq'::regclass)
 created_at | timestamp without time zone |           |          |
 category   | text                        |           |          |
 ean        | text                        |           |          |
 price      | double precision            |           |          |
 quantity   | integer                     |           |          | 5000
 rating     | double precision            |           |          |
 title      | text                        |           |          |
 vendor     | text                        |           |          |
Indexes:
    "products_pkey" PRIMARY KEY, lsm (id HASH)

Simple queries

To see how many products there are in this table, run the following query.

yb_demo=# SELECT count(*) FROM products;

 count
-------
   200
(1 row)

The following query selects the id, title, category, price, and rating columns for the first five products.

yb_demo=# SELECT id, title, category, price, rating
          FROM products
          LIMIT 5;

 id  |           title            | category |      price       | rating
-----+----------------------------+----------+------------------+--------
  22 | Enormous Marble Shoes      | Gizmo    | 21.4245199604423 |    4.2
  38 | Lightweight Leather Gloves | Gadget   | 44.0462485589292 |    3.8
 162 | Gorgeous Copper Knife      | Gadget   | 22.3785988001101 |    3.3
 174 | Rustic Iron Keyboard       | Gadget   | 74.4095392945406 |    4.4
  46 | Rustic Linen Keyboard      | Gadget   | 78.6996782532274 |      4
(5 rows)

To view the next 3 products, add an OFFSET 5 clause to start from the fifth product.

yb_demo=# SELECT id, title, category, price, rating
          FROM products
          LIMIT 3 OFFSET 5;

 id  |           title           | category  |      price       | rating
-----+---------------------------+-----------+------------------+--------
 152 | Enormous Aluminum Clock   | Widget    | 32.5971248660044 |    3.6
   3 | Synergistic Granite Chair | Doohickey | 35.3887448815391 |      4
 197 | Aerodynamic Concrete Lamp | Gizmo     | 46.7640712447334 |    4.6
(3 rows)

The JOIN clause

Use a JOIN clause to combine rows from two or more tables, based on a related column between them.

The following JOIN query selects the total column from the orders table, and for each of these orders, fetches the id, name, and email from the users table of the corresponding users that placed those orders. The related column between the two tables is the user's id.

yb_demo=# SELECT users.id, users.name, users.email, orders.id, orders.total
          FROM orders INNER JOIN users ON orders.user_id=users.id
          LIMIT 10;

  id  |        name         |             email             |  id   |      total
------+---------------------+-------------------------------+-------+------------------
  616 | Rex Thiel           | rex-thiel@gmail.com           |  4443 | 101.414602060277
 2289 | Alanis Kovacek      | alanis.kovacek@yahoo.com      | 17195 | 71.8499366564206
   37 | Jaleel Collins      | jaleel.collins@gmail.com      |   212 | 38.8821451022809
 2164 | Cordia Farrell      | cordia.farrell@gmail.com      | 16223 | 37.7489430287531
 1528 | Donny Murazik       | murazik-donny@hotmail.com     | 11546 | 52.3082273751586
 1389 | Henriette O'Connell | connell-o-henriette@yahoo.com | 10551 | 69.3117644687696
 2408 | Blake Jast          | jast.blake@hotmail.com        | 18149 | 150.788925887077
 1201 | Kaycee Keebler      | kaycee-keebler@gmail.com      |  8937 | 48.3440955866708
 1421 | Cornell Cartwright  | cornell-cartwright@gmail.com  | 10772 | 191.867670306882
  523 | Deonte Hoeger       | hoeger.deonte@hotmail.com     |  3710 | 71.4010754169826
(10 rows)

Distributed transactions

To track quantities accurately, each product being ordered in some quantity by a user has to decrement the corresponding product inventory quantity. These operations should be performed inside a transaction.

Imagine the user with id 1 wants to order 10 units of the product with id 2.

Before running the transaction, you can verify the quantity of product 2 in stock by running the following query:

yb_demo=# SELECT id, category, price, quantity FROM products WHERE id=2;

SELECT id, category, price, quantity FROM products WHERE id=2;
 id | category  |      price       | quantity
----+-----------+------------------+----------
  2 | Doohickey | 70.0798961307176 |     5000
(1 row)

To place the order, run the following transaction:

yb_demo=# BEGIN TRANSACTION;

/* First insert a new order into the orders table. */
INSERT INTO orders
  (id, created_at, user_id, product_id, discount, quantity, subtotal, tax, total)
VALUES (
  (SELECT max(id)+1 FROM orders)                 /* id */,
  now()                                          /* created_at */,
  1                                              /* user_id */,
  2                                              /* product_id */,
  0                                              /* discount */,
  10                                             /* quantity */,
  (10 * (SELECT price FROM products WHERE id=2)) /* subtotal */,
  0                                              /* tax */,
  (10 * (SELECT price FROM products WHERE id=2)) /* total */
) RETURNING id;

/* Next decrement the total quantity from the products table. */
UPDATE products SET quantity = quantity - 10 WHERE id = 2;

COMMIT;

Verify that the order got inserted by running the following command:

yb_demo=# select * from orders where id = (select max(id) from orders);

  id   |         created_at         | user_id | product_id | discount | quantity |     subtotal     | tax |      total
-------+----------------------------+---------+------------+----------+----------+------------------+-----+------------------
 18761 | 2020-01-30 09:24:29.784078 |       1 |          2 |        0 |       10 | 700.798961307176 |   0 | 700.798961307176
(1 row)

To verify that total quantity of product id 2 in the inventory has been updated, run the following query:

yb_demo=# SELECT id, category, price, quantity FROM products WHERE id=2;

 id | category  |      price       | quantity
----+-----------+------------------+----------
  2 | Doohickey | 70.0798961307176 |     4990
(1 row)

Built-in functions

YSQL supports a rich set of built-in functions.

To find out how users are signing up for the site, list the unique set of source channels present in the database using the DISTINCT function, as follows:

yb_demo=# SELECT DISTINCT(source) FROM users;

source
-----------
 Facebook
 Twitter
 Organic
 Affiliate
 Google
(5 rows)

Use the MIN, MAX, and AVG functions to show prices of products in the store, as follows:

yb_demo=# SELECT MIN(price), MAX(price), AVG(price) FROM products;

min               |       max        |       avg
------------------+------------------+------------------
 15.6919436739704 | 98.8193368436819 | 55.7463996679207
(1 row)

Aggregations

Use the GROUP BY clause to aggregate data.

To determine the most effective channel for user sign ups, run the following command:

yb_demo=# SELECT source, count(*) AS num_user_signups
          FROM users
          GROUP BY source
          ORDER BY num_user_signups DESC;

source     | num_user_signups
-----------+------------------
 Facebook  |              512
 Affiliate |              506
 Google    |              503
 Twitter   |              495
 Organic   |              484
(5 rows)

Discover the most effective channel for product sales by revenue by running the following command:

yb_demo=# SELECT source, ROUND(SUM(orders.total)) AS total_sales
          FROM users LEFT JOIN orders ON users.id=orders.user_id
          GROUP BY source
          ORDER BY total_sales DESC;

  source   | total_sales
-----------+-------------
 Facebook  |      333454
 Google    |      325184
 Twitter   |      320150
 Organic   |      319637
 Affiliate |      297605
(5 rows)

Views

To answer questions such as what percentage of the total sales is from the Facebook channel, you can create a view.

yb_demo=# CREATE VIEW channel AS
            (SELECT source, ROUND(SUM(orders.total)) AS total_sales
             FROM users LEFT JOIN orders ON users.id=orders.user_id
             GROUP BY source
             ORDER BY total_sales DESC);

Now that the view is created, you can see it in the list of relations.

yb_demo=# \d

               List of relations
 Schema |      Name       |   Type   |  Owner
--------+-----------------+----------+----------
 public | channel         | view     | yugabyte
 public | orders          | table    | yugabyte
 public | orders_id_seq   | sequence | yugabyte
 public | products        | table    | yugabyte
 public | products_id_seq | sequence | yugabyte
 public | reviews         | table    | yugabyte
 public | reviews_id_seq  | sequence | yugabyte
 public | users           | table    | yugabyte
 public | users_id_seq    | sequence | yugabyte
(9 rows)

yb_demo=# SELECT source,
            total_sales * 100.0 / (SELECT SUM(total_sales) FROM channel) AS percent_sales
          FROM channel
          WHERE source='Facebook';

  source  |  percent_sales
----------+------------------
 Facebook | 20.8927150492159
(1 row)