
Create and populate table t4

Attention

This page documents an earlier version. Go to the latest (v2.3) version.

Make sure that you read the section The data sets used by the code examples before running the script to create table "t4". In particular, it's essential that you have installed the pgcrypto and tablefunc extensions.
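As a minimal sketch of that prerequisite, the two extensions could be installed in the current database from ysqlsh as shown here. This assumes that the connected role has the privileges needed to create extensions. The final query is only an illustrative way to confirm what is installed.

```sql
-- Sketch: install the extensions that the "t4" scripts depend on.
-- pgcrypto brings gen_random_uuid(); tablefunc brings normal_rand().
create extension if not exists pgcrypto;
create extension if not exists tablefunc;

-- Optional check: list the extensions installed in the current database.
select extname, extversion
from pg_extension
order by extname;
```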

The table "t4" is used for comparing these window functions: percent_rank(), cume_dist(), and ntile(). See the section Analyzing a normal distribution with percent_rank(), cume_dist(), and ntile().

The table is populated using a procedure that is parameterized with the number of rows to generate. You will typically choose a large number such as 100,000. The procedure uses the normal_rand() function, which is provided by the tablefunc extension, to generate the specified number of values by pseudorandomly picking them from an ideal normal distribution. The normal_rand() function is parameterized by the number of values to create, and by the mean and the standard deviation of the distribution from which to pick the values.

In principle, values picked from such a distribution range between minus and plus infinity. But of course any finite sample, like 100,000 values, will lie between finite limits. It's sufficient for the purposes of the demonstrations that will use this data to scale the values so that the minimum is 0.0 and the maximum is 100.0. Doing this has the consequence that the mean will be about 50.0 and the standard deviation will be about 10.0, no matter what values for these are provided as the actual arguments to normal_rand(). The scaled values will lie on a bell-shaped curve, just as is typical for a large enough sample of examination results.
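The effect of the scaling can be explored without creating any table. The following standalone query is a sketch of the same min-max scaling idea, not the procedure used on this page. It assumes that the tablefunc extension is installed, and the names "samples", "limits", and "scaled" are chosen here purely for illustration.

```sql
-- Sketch: scale 100,000 normal_rand() values to the range 0..100
-- and summarize the result. The CTE names are illustrative only.
with
  samples as (
    select normal_rand(100000, 0.0, 50.0) as r),
  limits as (
    select min(r) as min_val, max(r) as max_val
    from samples),
  scaled as (
    select ((s.r - l.min_val)*100.0)/(l.max_val - l.min_val) as u
    from samples as s cross join limits as l)
select
  round(min(u)::numeric,    2) as min_u,
  round(max(u)::numeric,    2) as max_u,
  round(avg(u)::numeric,    2) as mean_u,
  round(stddev(u)::numeric, 2) as stddev_u
from scaled;
```

The values for "min_u" and "max_u" should come out as 0.00 and 100.00. The values for "mean_u" and "stddev_u" vary from run to run. Changing the second and third arguments of normal_rand() is a simple way to see how little they affect the scaled result.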

The demonstrations use two "score" columns: one holds double precision values and the other holds int values produced by rounding the double precision values with the round() function. The demonstrations rely on the fact that the double precision scores have no duplicates. This is enforced by creating a unique index on the "dp_score" column. It's just possible that normal_rand() will create some duplicate values. However, this is so rare that it was never seen while the script was run many times during the development of the demonstrations that use this table. If CREATE INDEX does fail because of this, just rerun the script that populates the table and creates the index. The demonstrations also rely on the fact that the int scores will have very many duplicates. This is inevitable when there are only 101 available integers in the scaled range and there are 100,000 rows.

A large value like 100,000 gives the best compromise between the time to populate the table and the effectiveness of the demonstration. These are typical times for 100,000 rows (using YB-2.1.8) on a single-node cluster on a laptop computer:

  • under about 3 seconds to populate the table
  • under about 3 seconds to create the index

This ysqlsh script creates the table "t4" and creates the procedure to populate the table. Save it as t4_1.sql.

-- Suppress the NOTICE that is raised
-- when the to-be-dropped table doesn't yet exist.
set client_min_messages = warning;
drop table if exists t4 cascade;

create table t4(
  k uuid default gen_random_uuid() primary key,
  dp_score double precision not null,
  int_score int not null);

-- Use normal_rand() to insert the specified number of rows into "t4".
create or replace procedure generate_scores(no_of_rows in int)
  language plpgsql
as $body$
declare
  normal_rand_mean   constant double precision :=  0.0;
  normal_rand_stddev constant double precision := 50.0;

  agg     double precision[]  not null := '{0}';
  min_val double precision    not null := 0;
  max_val double precision    not null := 0;
  scale   double precision    not null := 0;

  zero     constant double precision :=    0;
  hundred  constant double precision := 100;
begin
  with v as (
    select normal_rand(no_of_rows, normal_rand_mean, normal_rand_stddev) as r)
  select array_agg(r)
  into strict agg
  from v;

  with v as (
    select unnest(agg) as u)
  select
    min(u), max(u)
    into strict min_val, max_val
  from v;

  -- Scale the values to the range (zero..hundred)
  -- Score of zero means no-show.
  scale := max_val - min_val;
  insert into t4(dp_score, int_score)
  with v as (
    select (((unnest(agg) - min_val)*hundred)/scale) as u)
  select
    greatest(least(u, hundred), zero), -- protect against rounding errors
    round(u)
  from v;
end;
$body$;

This script executes the procedure and then creates a unique index on the "dp_score" column. Save it as t4_2.sql.

-- You can run this script time and again. It will always finish silently.

set client_min_messages = warning;
drop index if exists t4_dp_score_unq;
truncate table t4;

\timing on
call generate_scores(no_of_rows => 100000);
create unique index t4_dp_score_unq on t4(dp_score);
\timing off

The section Analyzing a normal distribution with percent_rank(), cume_dist() and ntile() shows how to plot a histogram (using plain text in ysqlsh). This vividly demonstrates the bell-shaped distribution.
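Once t4_2.sql has finished, a quick sanity check along the following lines can confirm that the table has the properties that the demonstrations rely on. This is a sketch that assumes the table "t4" has been created and populated by the two scripts on this page. The column aliases are illustrative.

```sql
-- Sketch: check the row count, the uniqueness of "dp_score",
-- the number of distinct "int_score" values, and the scaled range.
select
  count(*)                  as total_rows,
  count(distinct dp_score)  as distinct_dp_scores,
  count(distinct int_score) as distinct_int_scores,
  min(dp_score)             as min_dp_score,
  max(dp_score)             as max_dp_score
from t4;
```

Because of the unique index, "distinct_dp_scores" should equal "total_rows". The value of "distinct_int_scores" cannot exceed 101, the number of integers in the range 0 through 100.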

Copyright © 2017-2020 Yugabyte, Inc. All rights reserved.