🌱 MySQL Tutorial for Beginners
Very happy with the format of this course
- doesn’t assume you know anything
- gets you working in the code immediately with exercises
- is broken into small and digestible sections
- covers a wide range of topics (even advanced ones)
- explains why a topic is useful and when it would be used
These are equivalent
SELECT
*
FROM
customers
WHERE
state = "va"
OR state = "ga"
OR state = "fl"
SELECT
*
FROM
customers
WHERE
state in("va", "ga", "fl")
LIKE wildcards:
% = any characters (zero or more)
_ = any single character
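A quick LIKE sketch on the customers table (the last_name column also appears in the examples further down):
SELECT *
FROM customers
WHERE last_name LIKE 'b%';      -- last names starting with b

SELECT *
FROM customers
WHERE last_name LIKE '_____y';  -- exactly six characters, ending in y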
returns 3 results offset by 6 (skipping the first 6)
SELECT *
FROM customers
LIMIT 6, 3
Order of clauses matters (see the sketch below):
- select
- from
- where
- order by
- limit
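A minimal sketch of that order (state is in the examples above; the points column is an assumption here):
SELECT first_name, points
FROM customers
WHERE state = 'va'
ORDER BY points DESC
LIMIT 3;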
SELECT
*
FROM
orders o
JOIN customers c ON o.customer_id = c.customer_id;
is identical to
-- Implicit Join Syntax --
SELECT
*
FROM
orders o,
customers c
WHERE
o.customer_id = c.customer_id;
if you forget the where ↑ you will get a cross join (which could be bad). Explicit syntax may be better because it forces you to define the join condition.
JOIN is INNER JOIN by default
LEFT JOIN and RIGHT JOIN are both outer joins
LEFT JOIN will return all records from the left side (the base table) whether the condition is true or not
RIGHT JOIN will return all records from the right side (the table being appended to the base table) whether the condition is true or not
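A quick sketch with the same tables; customers with no orders still show up, with NULL for order_id:
SELECT
    c.customer_id,
    c.first_name,
    o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;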
SELECT
o.order_id,
c.first_name
FROM
orders o
JOIN customers c ON o.customer_id = c.customer_id
is the same as
SELECT
o.order_id,
c.first_name
FROM
orders o
JOIN customers c USING (customer_id)
USING only works if the column names are identical in both tables
Cross joins combine every record of the first table with every record of the second table
SELECT
c.first_name as customer,
p.`name` as product
FROM
customers c
CROSS JOIN products p
order by c.first_name
same as
SELECT
c.first_name as customer,
p.`name` as product
FROM
customers c, products p
order by c.first_name
UNION can combine the results of multiple queries into a single result set
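A small sketch (the order_date column and the cutoff date are assumptions):
SELECT order_id, 'Recent' AS label
FROM orders
WHERE order_date >= '2019-01-01'
UNION
SELECT order_id, 'Older' AS label
FROM orders
WHERE order_date < '2019-01-01';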
where can reference any column; having can only reference columns in the select clause
where filters data before group by; having filters data after group by
when using an aggregate function (e.g. sum()) you should group by all the non-aggregated columns in your select clause (example below)
you cannot use aliases in group by clauses
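A sketch tying these together, using the invoices table from the procedures section below (the invoice_date column is an assumption):
SELECT
    client_id,
    SUM(invoice_total) AS total_invoiced
FROM invoices
WHERE invoice_date >= '2019-01-01'   -- WHERE filters rows before grouping
GROUP BY client_id
HAVING SUM(invoice_total) > 500;     -- HAVING filters groups after grouping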
-- customers who have ordered lettuce
-- using subquery
SELECT
customer_id,
first_name,
last_name
FROM
customers
WHERE
customer_id IN (
        SELECT DISTINCT customer_id
        FROM order_items oi
        JOIN orders USING (order_id)
        WHERE product_id = 3);
-- using joins
SELECT DISTINCT
customer_id,
first_name,
last_name
FROM
customers
LEFT JOIN orders USING (customer_id)
LEFT JOIN order_items USING (order_id)
WHERE
product_id = 3;
= ANY is equivalent to IN
A correlated subquery references the outer query from within the subquery.
EXISTS can be used with a subquery to increase performance since it doesn't return data, it only checks whether a matching row exists.
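A sketch combining both ideas: a correlated EXISTS subquery that finds clients with at least one invoice (clients and invoices appear later in these notes):
SELECT *
FROM clients c
WHERE EXISTS (
    SELECT 1
    FROM invoices i
    WHERE i.client_id = c.client_id);  -- references the outer query's row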
You cannot use a column alias inside another expression in the same SELECT, but you can select it via a subquery (or a derived table, as below).
When using a subquery in FROM you must give it an alias. It may be easier to use views instead.
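A minimal sketch of a derived table with its required alias, reusing the invoices columns from the procedures section:
SELECT *
FROM (
    SELECT
        client_id,
        invoice_total - payment_total AS balance
    FROM invoices
) AS invoice_balances    -- the alias is required
WHERE balance > 0;       -- and the alias can be referenced out here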
show full processlist;
kill a process from the processlist
kill {id}
Numeric functions
rand()
ceiling()
floor()
round()
(can pass a second param to tell how many decimal places to keep)
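Quick sketch of the rounding param:
SELECT ROUND(5.7345, 2);          -- 5.73
SELECT FLOOR(5.7), CEILING(5.2);  -- 5, 6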
String functions
select left('kindergarten', 4)
length()
upper()
lower()
trim()
ltrim()
rtrim()
left()
right()
substring()
locate()
not case sensitive; will return the index, or 0 if the substring doesn't exist
replace()
concat()
Dates
select now(), curdate(), curtime()
year()
month()
day()
hour()
minute()
second()
dayname()
monthname()
extract a unit from a date/time
select extract(day from now())
Format dates
date format string
select DATE_FORMAT(now(), '%M %D, %Y')
TIME_FORMAT()
Calculate Dates
select date_add(now(), interval 1 year)
days between 2 dates (ignores the time portion)
SELECT DATEDIFF(now(), '2019-01-01')
seconds since midnight
SELECT TIME_TO_SEC('09:00')
difference between times
SELECT TIME_TO_SEC('09:00') - TIME_TO_SEC('09:02')
if null
SELECT
order_id,
ifnull(shipper_id, 'Not assigned')
FROM
orders
coalesce provides a list of fallbacks and returns the first non-null value
SELECT
order_id,
coalesce(shipper_id, comments, 'Not assigned') as shipper
FROM
orders
When using an aggregate function you must group by the remaining (non-aggregated) columns
Views
views can be saved as .sql files and version controlled
updatable views cannot have:
- DISTINCT
- aggregate functions (min, max, sum)
- group by
- having
- union
You can update or delete from updatable views. You can also insert, but you will need all columns for the underlying tables.
WITH CHECK OPTION prevents updates or deletes through the view that would make rows disappear from the view
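A minimal sketch, assuming the invoice_id column plus the invoice_total / payment_total columns used in the procedures section:
CREATE OR REPLACE VIEW invoices_with_balance AS
SELECT
    invoice_id,
    client_id,
    invoice_total - payment_total AS balance
FROM invoices
WHERE invoice_total - payment_total > 0
WITH CHECK OPTION;   -- updates through the view that would zero out the balance are now rejected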
Procedures
These are called functions in TablePlus
benefits: store and organize SQL, faster execution, data security
You have to change the default delimiter (;) or the select statement inside the body will end the CREATE statement early.
Don't forget to set it back at the end.
DELIMITER $$
create PROCEDURE get_clients()
begin
select * from clients;
end$$
DELIMITER ;
Call get_clients();
You can add parameters to procedures.
All parameters are required
CREATE PROCEDURE `get_invoices_by_client`(
client_id INT
)
BEGIN
select *
from invoices i
where i.client_id = client_id;
END
Output params for procedure
setup
CREATE PROCEDURE `get_unpaid_invoices_for_client`(
client_id INT,
OUT invoices_count INT,
OUT invoices_total DECIMAL(9,2)
)
BEGIN
select
count(*), sum(invoice_total)
into invoices_count, invoices_total
from invoices i
where i.client_id = client_id
and payment_total = 0;
end
run
set @invoices_count = 0;
set @invoices_total = 0;
call get_unpaid_invoices_for_client(3, @invoices_count, @invoices_total);
select @invoices_count, @invoices_total;
Functions
can only return a single value
procedures can return multiple values via OUT params
CREATE FUNCTION `get_risk_factor_for_client`(
client_id INT
) RETURNS int
READS SQL DATA
BEGIN
DECLARE risk_factor DECIMAL(9,2) DEFAULT 0;
DECLARE invoices_total DECIMAL(9,2);
DECLARE invoices_count INT;
select count(*), sum(invoice_total)
into invoices_count, invoices_total
from invoices i
WHERE i.client_id = client_id;
set risk_factor = invoices_total / invoices_count * 5;
RETURN risk_factor;
END
SELECT
client_id,
name,
get_risk_factor_for_client(client_id)
FROM
clients
Triggers
Run before or after an INSERT, UPDATE, or DELETE
see the ones that are there
SHOW TRIGGERS
DROP TRIGGER IF EXISTS {name of trigger}
Triggers can be a good way to add history or auditability to data
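A sketch of an AFTER INSERT trigger that keeps invoices.payment_total in sync (the payments table and its columns are assumptions):
DELIMITER $$
CREATE TRIGGER payments_after_insert
    AFTER INSERT ON payments
    FOR EACH ROW
BEGIN
    UPDATE invoices
    SET payment_total = payment_total + NEW.amount
    WHERE invoice_id = NEW.invoice_id;
END$$
DELIMITER ;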
Events
SQL statements run on a schedule by the event scheduler (the event_scheduler variable must be ON)
show events
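A sketch of a scheduled cleanup event (the audit_log table and created_at column are hypothetical):
DELIMITER $$
CREATE EVENT yearly_delete_stale_audit_rows
ON SCHEDULE EVERY 1 YEAR
DO BEGIN
    DELETE FROM audit_log
    WHERE created_at < NOW() - INTERVAL 1 YEAR;
END$$
DELIMITER ;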
Transactions
start transaction;
...
commit;
start transaction;
...
rollback;
Concurrency
locks prevent transactions from overwriting each other's changes
concurrency problems: dirty reads (reading uncommitted data), non-repeatable reads, phantom reads. There are isolation levels to prevent them.
higher isolation levels mean fewer problems but less concurrency/performance:
- read uncommitted
- read committed
- repeatable read (the default)
- serializable
set transaction isolation level read committed;
commit;
deadlocks → when transactions cannot complete because each holds a lock the other is waiting for
be consistent in the order of statements across transactions and keep transactions small to reduce the chances
enums
not recommended
can be very expensive when changing in the future
not reusable
should use a lookup table to store the values
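A sketch of the lookup-table alternative (table and column names here are hypothetical):
-- instead of: size ENUM('small', 'medium', 'large')
CREATE TABLE sizes (
    size_id TINYINT PRIMARY KEY,
    name    VARCHAR(50) NOT NULL
);

CREATE TABLE shirts (
    shirt_id INT PRIMARY KEY,
    size_id  TINYINT,
    FOREIGN KEY (size_id) REFERENCES sizes (size_id)
);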
dates
TIMESTAMP can only go up to the year 2038
DATETIME can store a wider range of dates
blob
stores binary data (images, PDFs, etc.)
storing files in the database is usually not the best approach
modeling
draw.io is an online tool for diagrams
relationship
foreign key constraint names have to be unique across your database
in relational dbs you only have one-to-many and one-to-one relationships. You can resolve many-to-many needs using a relationship table or link table.
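A sketch of a link table (the students/courses example is hypothetical):
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL
);

CREATE TABLE courses (
    course_id INT PRIMARY KEY,
    title     VARCHAR(50) NOT NULL
);

-- the link table turns one many-to-many into two one-to-many relationships
CREATE TABLE enrollments (
    student_id INT,
    course_id  INT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students (student_id),
    FOREIGN KEY (course_id)  REFERENCES courses (course_id)
);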
Normalization
1-7 normal forms must be applied in order; in most cases you just need to do 1-3:
- 1NF: each cell should have a single value and there should be no repeated columns
- 2NF: every table should describe one entity, and every column in that table should describe that entity
- 3NF: a column in a table should not be derived from other columns
Focus on removing redundancy.
indexes
Indexes speed up reads but slow down writes
Use EXPLAIN in front of a query to see execution details
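A minimal sketch against the customers table (the points column is an assumption):
EXPLAIN SELECT customer_id
FROM customers
WHERE points > 1000;
-- check the type, possible_keys, key, and rows columns of the output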
You may need to tune indexes on string columns by limiting the number of characters stored in the index (a prefix index).
You want to maximize the unique values covered while keeping the number of stored characters small.
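A sketch of picking a prefix length (assuming a last_name index on customers):
-- see how many unique values each prefix length captures
SELECT
    COUNT(DISTINCT LEFT(last_name, 1)),
    COUNT(DISTINCT LEFT(last_name, 5)),
    COUNT(DISTINCT LEFT(last_name, 10))
FROM customers;

-- then create the prefix index with the shortest "good enough" length
CREATE INDEX idx_lastname ON customers (last_name(5));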
There is also a full-text index to help search large bodies of text. You can then use MATCH() AGAINST() to search for terms and see relevancy scores.
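A sketch on a hypothetical posts table with title and body columns:
CREATE FULLTEXT INDEX idx_title_body ON posts (title, body);

SELECT
    *,
    MATCH(title, body) AGAINST('react redux') AS relevancy
FROM posts
WHERE MATCH(title, body) AGAINST('react redux');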
Composite indexes may be necessary to get real speed from complex where clauses, and they can be more space efficient than several single-column indexes. 4-6 columns is "normal" (see the sketch after the list below).
general rules for ordering the columns in an index:
- put the most frequently used columns first
- put the columns with higher cardinality (more unique values) first
- take your actual queries into account
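A sketch of a composite index on customers (state is used above; points is an assumption):
CREATE INDEX idx_state_points ON customers (state, points);

-- the index can satisfy a filter on the first column, then the second
SELECT customer_id
FROM customers
WHERE state = 'ca' AND points > 1000;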
Sometimes a union can speed up a query and keep indexes from being skipped.
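A sketch of the OR-to-UNION rewrite, assuming separate indexes on state and points:
-- the OR version may skip both indexes and scan the table
SELECT customer_id FROM customers WHERE state = 'ca' OR points > 1000;

-- each branch of the UNION can use its own index
SELECT customer_id FROM customers WHERE state = 'ca'
UNION
SELECT customer_id FROM customers WHERE points > 1000;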
Isolate your columns to make sure indexes are used
for a table with column named points
where points + 10 > 100
is slower than
where points > 90
given an index on points column the first will not use an index
filesort is expensive
show status like 'last_query_cost'
always check for existing indexes before making more; watch out for indexes that are:
- duplicate
- redundant
- unused