🌱 Practical SQL for Data Analysis

Source

My first time seeing CTEs.

It’s a common mistake to use round instead of ceil or floor to generate a range of integers. Using round may produce inconsistent distribution

This makes the edges less likely to come up.

To sample a table, use the TABLESAMPLE keyword in the FROM clause, and provide the sampling method along with it’s arguments. For example, sampling 10% of the table using the SYSTEM sampling method:

db=# WITH sample AS (
    SELECT *
    FROM users TABLESAMPLE SYSTEM(10)
)
SELECT count(*) FROM sample;

 count
───────
 95400
(1 row)

Time: 13.690 ms

To generate descriptive statistics in SQL, you can use the following query:

WITH s AS (
    SELECT * FROM (VALUES (1), (2), (3)) AS t(n)
)
SELECT
    count(*),
    avg(n),
    stddev(n),
    min(n),
    percentile_cont(array[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY n),
    max(n)
FROM
    s;

count β”‚   avg  β”‚   stddev  β”‚ min β”‚ percentile_cont β”‚ max
──────┼────────┼───────────┼─────┼─────────────────┼─────
    3 β”‚ 2.0000 β”‚ 1.0000000 β”‚   1 β”‚ {1.5,2,2.5}     β”‚   3

describe a categorical series

WITH s AS (SELECT unnest(array['a', 'a', 'b', 'c']) AS v)
SELECT
    count(*),
    count(DISTINCT V) AS unique,
    mode() WITHIN GROUP (ORDER BY V) AS top
FROM
    s;

 count β”‚ unique β”‚ top
───────┼────────┼─────
     4 β”‚      3 β”‚ a
WITH emails AS (
    SELECT 'ME@hakibenita.com' AS email
)
SELECT * FROM emails;

       email
───────────────────
 ME@hakibenita.com
Made by Brandon . If you find this project useful you can donate.