π± Practical SQL for Data Analysis
My first time seeing CTEs.
Itβs a common mistake to use round instead of ceil or floor to generate a range of integers. Using round may produce inconsistent distribution
This makes the edges less likely to come up.
To sample a table, use the TABLESAMPLE keyword in the FROM clause, and provide the sampling method along with itβs arguments. For example, sampling 10% of the table using the SYSTEM sampling method:
db=# WITH sample AS (
SELECT *
FROM users TABLESAMPLE SYSTEM(10)
)
SELECT count(*) FROM sample;
count
βββββββ
95400
(1 row)
Time: 13.690 ms
To generate descriptive statistics in SQL, you can use the following query:
WITH s AS (
SELECT * FROM (VALUES (1), (2), (3)) AS t(n)
)
SELECT
count(*),
avg(n),
stddev(n),
min(n),
percentile_cont(array[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY n),
max(n)
FROM
s;
count β avg β stddev β min β percentile_cont β max
βββββββΌβββββββββΌββββββββββββΌββββββΌββββββββββββββββββΌβββββ
3 β 2.0000 β 1.0000000 β 1 β {1.5,2,2.5} β 3
describe a categorical series
WITH s AS (SELECT unnest(array['a', 'a', 'b', 'c']) AS v)
SELECT
count(*),
count(DISTINCT V) AS unique,
mode() WITHIN GROUP (ORDER BY V) AS top
FROM
s;
count β unique β top
ββββββββΌβββββββββΌβββββ
4 β 3 β a
WITH emails AS (
SELECT 'ME@hakibenita.com' AS email
)
SELECT * FROM emails;
email
βββββββββββββββββββ
ME@hakibenita.com