🌱 MySQL Tutorial for Beginners

Master level course

Very happy with the format of this course

doesn’t assume you know anything
gets you working in the code immediately with exercises
is broken into small and digestible
covers wide range of topics (even advanced ones)
gives answers to why a topic is useful or when it would be used

These are equivalent

SELECT
  *
FROM
  customers
WHERE
  state = "va"
  OR state = "ga"
  OR state = "fl"

SELECT
  *
FROM
  customers
WHERE
  state in("va", "ga", "fl")

% = any character zero or more times _ = any single character

returns 3 results offset by 6 (skipping the first 6)

SELECT *
FROM customers
LIMIT 6, 3

Order matters

select
from
where
order
limit

SELECT
  *
FROM
  orders o
  JOIN customers c ON o.customer_id = c.customer_id;

is identical to

-- Implicit Join Syntax --
SELECT
  *
FROM
  orders o,
  customers c
WHERE
  o.customer_id = c.customer_id;

if you forget the where ↑ you will get a cross join (which could be bad) explicit syntax may be better because it forces you to define the join

JOIN is INNER JOIN by default

LEFT JOIN and RIGHT JOIN are both outer joins

LEFT JOIN will return all records from the left side wether condition is true or not (base table)

RIGHT JOIN will return all records from the right side wether condition is true or not (table being appended to the base table)

SELECT
  o.order_id,
  c.first_name
FROM
  orders o
  JOIN customers c ON o.customer_id = c.customer_id

is the same as

SELECT
  o.order_id,
  c.first_name
FROM
  orders o
  JOIN customers c USING (customer_id)

using only works if column names are identical

Cross joins combine every record of the first table with every record of the second table

SELECT
  c.first_name as customer,
  p.`name` as product
FROM
  customers c
  CROSS JOIN products p
  order by c.first_name

same as

SELECT
  c.first_name as customer,
  p.`name` as product
FROM
  customers c, products p
  order by c.first_name

UNION can combine multiple queries

where can reference any column having can only reference columns in the select clause

where filters data before group by having filters data after group by

says that when using aggregate function (sum()) you should group by all the columns in your select clause

you cannot use aliases in group by clauses

-- customers who have ordered lettuce
-- using subquery

SELECT
  customer_id,
  first_name,
  last_name
FROM
  customers
WHERE
  customer_id in( SELECT DISTINCT
      customer_id FROM order_items oi
      JOIN orders
      USING (order_id)
    WHERE
      product_id = 3);

-- using joins

SELECT DISTINCT
  customer_id,
  first_name,
  last_name
FROM
  customers
  LEFT JOIN orders USING (customer_id)
  LEFT JOIN order_items USING (order_id)
WHERE
  product_id = 3;

= ANY is == IN

correlated subquery is referencing the outer query from a sub query

Exists can bar used as a subquery to increase performance since it doesn’t return data

cannot use an alias in an expression but you can select it as a subquery

when using a subquery in FROM you must give it an alias it may be easier to use views instead

show full processlist;

kill a process from the processlist

kill {id}

Numeric functions

rand()
ceiling()
floor()
round() (can pass second param to tell how many significant digits to keep)

String functions

select left('kindergarten', 4)

length()
upper()
lower()
trim()
ltrim()
rtrim()
left()
right()
substring()
locate() not case sensitive will return index or 0 if it doesn’t exist
replace()
concat()

Dates

select now(), curdate(), curtime()

year()
month()
day()
hour()
minute()
second()
dayname()
monthname()

unit from date/time

select extract(day from now())

Format dates

date format string

select DATE_FORMAT(now(), '%M %D, %Y')

TIME_FORMAT()

Calculate Dates

select date_add(now(), interval 1 year)

days between 2 dates (doesn’t do time)

SELECT DATEDIFF(now(), '2019-01-01')

time since midnight

SELECT TIME_TO_SEC('09:00')

difference between times

SELECT TIME_TO_SEC('09:00') - TIME_TO_SEC("09:02")

if null

SELECT
  order_id,
  ifnull(shipper_id, 'Not assigned')
FROM
  orders

coalesce to provide options for callbacks, will return first non null value

SELECT
  order_id,
  coalesce(shipper_id, comments, 'Not assigned') as shipper
FROM
  orders

When using an aggregate function you must group by

Views

views can be saved as .sql files and version controlled

updatable views cannot have

you can update or delete from views

You can also insert but you will need all columns for underlying tables

DISTINCT
aggregate functions (min, max, sum)
group by
having
union

When updating views with check option will prevent rows from being removed from the view

Procedures

These are called functions in TablePlus

store and organize sql faster execution data security

You have to change the default ; or the select statement will run Don’t forget to set it back at the end

DELIMITER $$

create PROCEDURE get_clients()

begin
  select * from clients;
end$$

DELIMITER ;

Call get_clients();

You can add parameters to procedures.

All parameters are required

CREATE PROCEDURE `get_invoices_by_client`(
	client_id INT
)
BEGIN
	select *
	from invoices i
	where i.client_id = client_id;
END

Output params for procedure

setup

CREATE PROCEDURE `get_unpaid_invoices_for_client`(
  client_id INT,
  OUT invoices_count INT,
  OUT invoices_total DECIMAL(9,2)
  )
BEGIN
  select
    count(*), sum(invoice_total)
    into invoices_count, invoices_total
  from invoices i
  where i.client_id = client_id
  and payment_total = 0;
end

run

set @invoices_count = 0;
set @invoices_total = 0;
call get_unpaid_invoices_for_client(3, @invoices_count, @invoices_total);
select @invoices_count, @invoices_total;

Functions

can only return a single value

procedures can return multiple params

CREATE FUNCTION `get_risk_factor_for_client`(
  client_id INT
) RETURNS int
    READS SQL DATA
BEGIN
  DECLARE risk_factor DECIMAL(9,2) DEFAULT 0;
  DECLARE invoices_total DECIMAL(9,2);
  DECLARE invoices_count INT;

  select count(*), sum(invoice_total)
  into invoices_count, invoices_total
  from invoices i
  WHERE i.client_id = client_id;

  set risk_factor = invoices_total / invoices_count * 5;

  RETURN risk_factor;
END

SELECT
  client_id,
  name,
  get_risk_factor_for_client(client_id)
FROM
  clients

Triggers

Run before or after and INSERT UPDATE or DELETE

see the ones that are there

SHOW TRIGGERS

DROP TRIGGER IF EXISTS {name of trigger}

Triggers can be a good way to add history or audibility to data

Events

scheduler

show events

Transactions

start transaction;
...
commit;

start transaction;
...
rollback;

Concurrency

locks prevent overwiting each other

dirty reads → reading uncommitted data there are isolation levels to prevent this non-repeating reads repeatable reads phantom reads

more concurrency vs performance

read uncommitted
read committed
repeatable reads
serializable

set transaction isolation level read committed;
commit;

deadlocks → when transactions cannot complete because the transactions have locks that each other need to wait for

be consistent in order of statements in transactions and keep things small to reduce chances.

enums

not recommended

can be very expensive when changing in the future

not reusable

should use a lookup table to store the values

dates

timestamps can only go to year 2038

date time can go higher

blob

store binary (images, pdf, etc)

not best use for databases

modeling

draw.io is an online tool for diagrams

relationship

foreign keys have to be unique across your database

in relational dbs you only have one to many and one to one relationships. You can resolve many to many needs using and relationship table or link table.

Normalization

1-7 normal forms must be applied in order most cases you just need to do 1-3

each cell should have a single value and cannot have repeated columns
every table should describe one entity, and every table in that column should describe that entity
a column in a table should not be derived from other columns

Focus on removing redundancy.

indexes

Speed up reads Slow down writes

Use EXPLAIN in front of a query to see details

you may need to tune indexes of string columns by limiting the number of characters to store in the index

you want to maximize unique values in your index while balancing the number of characters stored in the index

There is also a full text index to help search large bodies of text. You can then use Match() against() to search for terms and see relevancy scores

composite indexes may be necessary to get true speed from complex where statements, they can be more space efficient than single column indexes

4-6 columns is “normal”

general rules for how to Order of columns in indexes

put the most frequently used columns first
put the columns with a higher cardinality (number of records) first
take your queries into account

Sometimes union can speed up query and keep indexes from being skipped.

Isolate your columns to make sure indexes are used

for a table with column named points

where points + 10 > 100

is slower than

where points > 90

given an index on points column the first will not use an index

filesort is expensive

show status like 'last_query_cost'

always check for existing indexes before making more

duplicate
redundant
unused

sql education