🌱 Brandon Foltz Statistics Channel
~2018-09-14
https://www.youtube.com/watch?v=8X2xfwBP4uo&list=PLIeGtxpvyG-KqNeLQVhw8yv9MI5dd0ky_
Population Segment
Impossible/impractical to measure the entire population.
2018-09-20
https://www.youtube.com/watch?v=vrWYw8d2830&list=PLIeGtxpvyG-K82r1fgL-DO1xKg133PXKh
categorical data - labels for exclusive categories
quantitative data - numeric value of a measure
frequency - count
relative frequency - %
frequency distribution - frequency and relative frequency
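A minimal sketch of a frequency distribution, to make the count vs. % distinction concrete. The `colors` sample and the `frequency_distribution` helper are made up for illustration; they are not from the video.

```python
from collections import Counter

# Hypothetical categorical sample (not from the video).
colors = ["red", "blue", "red", "green", "blue", "red"]

def frequency_distribution(values):
    """Return {category: (frequency, relative_frequency)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

dist = frequency_distribution(colors)
for cat, (freq, rel) in dist.items():
    # frequency is a count, relative frequency is a share of the total
    print(f"{cat:<6} {freq:>3} {rel:>6.1%}")
```

The relative frequencies always add up to 1 (100%).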
DO NOT USE - pie and 3D charts; they are deceptive
https://www.youtube.com/watch?v=zC3GaPBJ4c4&list=PLIeGtxpvyG-K82r1fgL-DO1xKg133PXKh&index=2
histogram - bar chart with no gaps between the bars
- bins
- resolution
- distribution
try to find a sweet spot of bins/buckets/groups to show the histogram's distribution
type of histogram shapes
- left skew - more data on right (lighter in image histogram)
- right skew - more data on left (darker in image histogram)
- symmetric - kinda bell curve
- bimodal - bridge (2 humps)
- uniform - same everywhere
- no pattern
^ adjusting resolution (# of bins) can show different shapes.
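A rough sketch of how the number of bins changes what the histogram shows. The data and the `bin_counts` helper are hypothetical, and it assumes equal-width bins and data that is not all one value.

```python
def bin_counts(data, n_bins):
    """Count values per equal-width bin between min(data) and max(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for x in data:
        # the maximum value would land one past the end, so clamp it
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical right-skewed sample (most data on the left, one far value).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

for n_bins in (2, 4):
    print(n_bins, bin_counts(data, n_bins))
```

With 2 bins the shape is just "most of it is low"; with 4 bins the hump and the long right tail start to show.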
https://www.youtube.com/watch?v=6L20ofJXZ7g&index=3&list=PLIeGtxpvyG-K82r1fgL-DO1xKg133PXKh
Stem & Leaf displays
shows
- rank order
- shape of distribution
- modal qualities - things that show frequency
data
50, 61, 66, 73, 82, 82, 90, 103, 108, 115
stem & leaf
5 | 0
6 | 1 6
7 | 3
8 | 2 2
9 | 0
10 | 3 8
11 | 5
data doesn’t have to be in order but the stem and leaf does
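A small sketch that rebuilds the display above from unordered data, assuming the stem is the tens part and the leaf is the ones digit. `stem_and_leaf` is a made-up helper name.

```python
from collections import defaultdict

# Same values as the example above, deliberately shuffled.
data = [82, 50, 115, 61, 103, 66, 90, 73, 108, 82]

def stem_and_leaf(values):
    """Map each stem (value // 10) to its ordered leaves (value % 10)."""
    rows = defaultdict(list)
    for v in sorted(values):  # sorting gives the rank order
        rows[v // 10].append(v % 10)
    return dict(rows)

rows = stem_and_leaf(data)
for stem in sorted(rows):
    leaves = " ".join(str(leaf) for leaf in rows[stem])
    print(f"{stem:>2} | {leaves}")
```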
Stretched stem and leaf
break up bins (not base 10); can repeat stem values
leaf is ALWAYS 1 digit; for 2384 it would be 23 | 8 (the 4 is lost)
Cross Tabulation
https://www.youtube.com/watch?v=e6pT-GyT0hk&index=4&list=PLIeGtxpvyG-K82r1fgL-DO1xKg133PXKh
compare two variables
usually done in pivot tables
standard deviation
- variability from average
cross tabs are basis for Chi-square and ANOVA (advanced stats)
2018-09-21
https://www.youtube.com/watch?v=bUu5HIHIrRw&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw
z-score
spread ~ variance ~ standard deviation (related ways of describing how wide the distribution is, not identical values)
2018-09-26
https://www.youtube.com/watch?v=bUu5HIHIrRw&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw
z-score - distance from the mean, measured in standard deviations
it ignores the actual unit
mean has a z-score === 0
z = (data point - mean) / standard deviation
score | mean | score - mean | z score |
----- | ---- | -----------: | ------: |
85 | 85 | 0 | 0 |
95 | 85 | 10 | 1.27 |
75 | 85 | -10 | -1.27 |
80 | 85 | -5 | -0.63 |
90 | 85 | 5 | 0.63 |
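A quick sketch of the z-score formula applied to the scores in the table. The notes don't record which standard deviation the video used, so this assumes the sample standard deviation of these five scores; the results may differ from the table in the second decimal.

```python
import statistics

scores = [85, 95, 75, 80, 90]  # the scores from the table

def z_scores(values):
    """z = (data point - mean) / standard deviation, for every value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (divides by n - 1)
    return [(x - mean) / sd for x in values]

zs = z_scores(scores)
for x, z in zip(scores, zs):
    print(f"{x:>3} {z:>6.2f}")
```

The score equal to the mean gets z = 0, and scores equally far above and below the mean get z-scores of equal size and opposite sign.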
2018-09-27
https://www.youtube.com/watch?v=JIIXQaMXBVM&index=3&t=0s&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw
Variance and Standard deviation
mean = (x bar) = x̄
variance = (sigma squared) = σ^2
standard deviation = (sigma) = σ
n = total data points
σ = sqrt( sum((x - x̄)^2) / (n - 1) )
coefficient of variation
relative measure of variability
coefficient of variation = ( σ / x̄ ) * 100
helps compare data with different means
units don’t matter (comparing quality of the dataset not the measurements)
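A minimal sketch of the variance and coefficient of variation formulas above, using the n - 1 version. The height data is hypothetical; it is the same measurements in two units to show that the coefficient of variation doesn't care about units.

```python
import math

def sample_variance(values):
    """sum((x - mean)^2) / (n - 1)"""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def coefficient_of_variation(values):
    """(standard deviation / mean) * 100"""
    mean = sum(values) / len(values)
    return math.sqrt(sample_variance(values)) / mean * 100

# Hypothetical data: the same heights in centimeters and in meters.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(f"variance (cm): {sample_variance(heights_cm)}")
print(f"CV cm: {cv_cm:.2f}%  CV m: {cv_m:.2f}%")
```

The variance changes with the unit, but both coefficients of variation come out the same.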
2018-10-01
https://www.youtube.com/watch?v=L5pgVbj3WwI&index=4&t=0s&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw
watch him use excel stuff
2018-10-03
https://www.youtube.com/watch?v=5tLEDdDl5tw&index=5&t=0s&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw
Normal distribution
look at data graphically first
look for:
- excess skew (lopsided)
- kurtosis (fat tails)
- bi-modal (two humps)
- have a non-standard distribution
he phrases this as “excess probability”
Graphical Tools
- histograms
- stem and leaf plots
- box plots (box and whisker plots)
- P-P Plots
- Q-Q Plots
https://www.youtube.com/watch?v=xGbpuFNR1ME&index=7&t=0s&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw
Bivariate Relationships - Covariance
2 variables
covariance - how the variables change together
positive - both go up or down together (+ slope)
negative - inverse relationship, one goes up as the other goes down (- slope)
covariance is direction only nothing about strength (that is correlation)
sample covariance
population covariance
https://www.youtube.com/watch?v=locZabK4Als&index=7&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw
Covariance Matrix
R, SPSS, SAS - tools for summary data
Matrix / Scatter Plots
plots each variable against each other variable
diagonal of covariance matrix is the variance of a variable with itself
excel uses the population covariance formula (n instead of n-1); fix by multiplying by n/(n-1)
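A sketch of sample vs. population covariance and the n/(n-1) conversion mentioned above. The `xs`/`ys` data and the `covariance` helper are made up for illustration.

```python
def covariance(xs, ys, sample=True):
    """Sample covariance divides by n - 1, population covariance by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sample_cov = covariance(xs, ys)
population_cov = covariance(xs, ys, sample=False)
converted = population_cov * n / (n - 1)  # population -> sample

print(sample_cov, population_cov, converted)
# Diagonal of a covariance matrix: a variable with itself is its variance.
print(covariance(xs, xs))
```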
https://www.youtube.com/watch?v=4EXNedimDMs&list=PLIeGtxpvyG-JMH5fGDWhtniyET88Mexcw&index=8
Correlation
direction and strength of the linear relationship between 2 variables
correlation between -1 and +1 agnostic of units of data
correlation is standardized and covariance is not
correlation is only a linear relationship
correlation is NOT causation
strength is not statistical significance
r == Pearson correlation coefficient
r = covariance(x, y) / (standard_deviation(x) * standard_deviation(y))
Rule of thumb:
- if |r| >= 2/sqrt(n) then a relationship exists
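A minimal sketch of the Pearson formula and the rule of thumb above. The data is hypothetical, and `pearson_r` / `passes_rule_of_thumb` are made-up helper names; sample (n - 1) formulas are assumed throughout.

```python
import math

def pearson_r(xs, ys):
    """covariance(x, y) / (sd(x) * sd(y)), using sample (n - 1) formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

def passes_rule_of_thumb(r, n):
    """Rule of thumb from the notes: |r| >= 2 / sqrt(n)."""
    return abs(r) >= 2 / math.sqrt(n)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}, threshold = {2 / math.sqrt(len(xs)):.4f}")
print("relationship exists:", passes_rule_of_thumb(r, len(xs)))
```

With only 5 points the threshold is high (about 0.89), so even a fairly strong-looking r does not clear it.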
2018-10-10
https://www.youtube.com/watch?v=qaFNhwNBY3k&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA
finite math
permutations vs combinations
probability is the count of outcomes in which some event happens out of the total number of possible outcomes
we must be able to count in more complex ways to find certain probabilities
permutations - ordered list : number of different ways a certain number of objects can be arranged in order from a larger number of objects
n objects can be ordered in lists of size r: P(n,r)
combinations - groups (order doesn’t matter) : the number of diff ways that a certain number of objects as a group can be selected from a larger number of objects
n choose r: C(n,r)
10 horse race - top 3
Top 3 in any order C(10,3) = 120 possibilities
Top 3 in order P(10,3) = 720 possibilities
25 players in a 9 player lineup
C(25,9) = 2,042,975
P(25,9) = 741,354,768,000 (about 741 billion)
C(30,6) = 593,775
P(30,6) = 427,518,000
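The counts above can be checked with Python's standard library, assuming Python 3.8+ where `math.comb` and `math.perm` are available.

```python
import math

# (n, r) pairs from the examples above
examples = [(10, 3), (25, 9), (30, 6)]

for n, r in examples:
    c = math.comb(n, r)  # groups, order doesn't matter
    p = math.perm(n, r)  # ordered lists
    print(f"C({n},{r}) = {c:,}   P({n},{r}) = {p:,}")
    # every group of size r can be ordered in r! ways
    assert p == c * math.factorial(r)
```

P(n,r) is always C(n,r) times r!, since each group can be arranged in r! different orders.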
2018-10-13 10.22.25
Combinations
Group order doesn’t matter
2018-10-15 08.11.39
https://www.youtube.com/watch?v=T1CjOkEb1ew&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=3
Permutations
2018-10-16 09.24.56
https://www.youtube.com/watch?v=1oEQWo28w6U&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=4
Combinations - Marble problem
marbles in a bag
- 4 red
- 3 blue
- 4 green
- 3 yellow
14 - total
How many sets of 4 are possible?
C(14,4) = 1001
How many sets of 4 are possible with 4 diff colors?
C(3,1) - blue
C(4,1) - red
C(4,1) - green
C(3,1) - yellow
3 x 4 x 4 x 3 = 144
How many sets of 4 are possible for at least 2 red?
2 red, 2 not red
C(4,2) red x C(10,2) !red = 6 x 45 = 270
or 3 red, 1 not red
C(4,3) red x C(10,1) !red = 4 x 10 = 40
or 4 red
C(4,4) red = 1
Combine above possibilities
270 + 40 + 1 = 311
How many sets of 4 are possible with no red but at least 1 green?
1 green and 3 other (blue or yellow)
C(4,1) green x C(6,3) blue or yellow = 4 x 20 = 80
2 green and 2 other (blue or yellow)
C(4,2) green x C(6,2) blue or yellow = 6 x 15 = 90
3 green and 1 other (blue or yellow)
C(4,3) green x C(6,1) blue or yellow = 4 x 6 = 24
4 green
C(4,4) green = 1
80 + 90 + 24 + 1 = 195
Reverse method
All possible without red or green (only blue or yellow) is C(6,4) = 15
All possible without red is C(10,4) = 210
Total minus what we cannot have: 210 - 15 = 195
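A brute-force check of the marble answers above: list every possible set of 4 and count the ones that match each question. Marbles are tracked by index so that same-colored marbles stay distinct; the variable names are made up for this sketch.

```python
from itertools import combinations

bag = ["red"] * 4 + ["blue"] * 3 + ["green"] * 4 + ["yellow"] * 3

# Every set of 4 distinct marbles, translated to its list of colors.
hands = [[bag[i] for i in picks] for picks in combinations(range(len(bag)), 4)]

total = len(hands)
four_colors = sum(1 for h in hands if len(set(h)) == 4)
at_least_2_red = sum(1 for h in hands if h.count("red") >= 2)
no_red_some_green = sum(1 for h in hands if "red" not in h and "green" in h)

print(total, four_colors, at_least_2_red, no_red_some_green)
```

Enumeration is slow for big problems, but for 14 marbles it is a cheap way to confirm the C(n,r) arithmetic.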
2018-10-22 07.46.26
https://www.youtube.com/watch?v=VS1GQrt7s0I&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=5
Dogs of the Dow
How many different 5-stock portfolios are possible?
10 choose 5 || C(10,5) = 252
How many diff stock portfolios contain GE & PG but not have INTC nor KFT?
PG & GE are fixed, INTC & KFT are out, so pick 3 from the remaining 6: 1 x C(6,3) = 1 x 20 = 20
How many diff portfolios contain at least 4 stocks with yields >= 3.5%
exactly 4: C(6,4) x C(4,1) = 15 x 4 = 60
all 5: C(6,5) = 6
60 + 6 = 66
If we randomly choose 5 stocks, what is the chance our portfolio has at least 4 stocks with yields >= 3.5%?
66/252 = 26.2%
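A short sketch of the portfolio counts with `math.comb`. It assumes, as the C(6,4) and C(4,1) terms above imply, that 6 of the 10 stocks yield >= 3.5% and the other 4 do not.

```python
from math import comb

total_portfolios = comb(10, 5)

# GE and PG are fixed, INTC and KFT are excluded: choose 3 of the other 6.
with_ge_pg = comb(6, 3)

# At least 4 high-yield stocks = exactly 4 (plus 1 low-yield) or all 5.
# Assumption: 6 high-yield stocks and 4 low-yield stocks.
at_least_4_high = comb(6, 4) * comb(4, 1) + comb(6, 5)

probability = at_least_4_high / total_portfolios
print(total_portfolios, with_ge_pg, at_least_4_high, f"{probability:.1%}")
```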
2018-10-25
https://www.youtube.com/watch?v=NgLsfdEH4Go&index=6&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA
Combinations - Nearly Normal
n | r | expression | combinations |
---|---|---|---|
2 | 0 | C(2,0) | 1 |
2 | 1 | C(2,1) | 2 |
2 | 2 | C(2,2) | 1 |
as n goes up (2,5,10,20) the histogram of C(n,r) across all group sizes r looks like a normal curve
histogram -> discrete random variables
normal curve -> continuous random variables
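A rough text-histogram sketch of the idea above: list C(n,r) for every r and watch the bell shape appear as n grows. `combination_row` is a made-up helper, and the bars are scaled to 40 characters just for display.

```python
from math import comb

def combination_row(n):
    """C(n, r) for every group size r from 0 to n."""
    return [comb(n, r) for r in range(n + 1)]

for n in (2, 5, 10):
    row = combination_row(n)
    print(f"n = {n}")
    for r, count in enumerate(row):
        bar = "#" * round(40 * count / max(row))  # scaled for display only
        print(f"  r={r:>2} {count:>4} {bar}")
```

Each row is symmetric and peaks in the middle, and the counts in a row add up to 2^n.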
2018-10-31
https://www.youtube.com/watch?v=nxnZE1ZeqXw&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=7
combinations under the curve
2018-11-05
https://www.youtube.com/watch?v=pcedp8eB4UQ&index=8&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA
Sets
Elements : individual things inside a set
N = Set N = {1,2,3,4,5,6,7,8,9}
4 ∈ N 4 is in set N
110 ∉ N 110 is not in set N
E⊆N E is a subset of N including equivalent set (subset of its twin)
E={2,4,6,8}
Proper subset E⊂N
E can be any subset of N, but E !== N
Empty set Z = {} (also written ∅)
empty set is a subset of any set
There are Infinite sets
2018-11-06
https://www.youtube.com/watch?v=V1J-4YFK7FQ&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=9
Venn Diagrams
Unions
"or" is inclusive
A “union” B = A∪B
Disjoint - no shared elements; the intersection is the empty set, which is still a valid set
Universal set is all elements we are interested in; problem specific
There can be subsets in the universal set.
Complement (A prime; A’) is the elements in the universal set but not in A
∩ = intersect A∩B (A intersect B)
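The set notation above maps fairly directly onto Python's built-in `set`. `N` and `E` are the sets from the notes; `A`, `B`, and the choice of `N` as the universal set are assumptions for this sketch.

```python
N = {1, 2, 3, 4, 5, 6, 7, 8, 9}
E = {2, 4, 6, 8}

print(4 in N)        # 4 ∈ N
print(110 not in N)  # 110 ∉ N
print(E <= N)        # E ⊆ N (subset, equal sets allowed)
print(E < N)         # E ⊂ N (proper subset, E != N)
print(set() <= N)    # the empty set is a subset of any set

# Hypothetical sets, with N assumed to be the universal set.
U = N
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B          # A ∪ B, "or" is inclusive
intersection = A & B   # A ∩ B
complement_a = U - A   # A', everything in U that is not in A
print(union, intersection, complement_a)
print(A.isdisjoint({7, 8}))  # disjoint: no shared elements
```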
2018-11-07
https://docs.google.com/spreadsheets/d/1zt0siGTvHhN7jK6_fj404lOwPekNsxXmBLyXDue6UpM/edit#gid=0
venn diagram region method
Ur (U sup r) Sf (S sup r)
complement === not that region, including empty
2018-11-08
https://www.youtube.com/watch?v=bycNlM4KnLQ&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=11
cardinality of a union
number of elements in a set
prevent double counting
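A small sketch of how double counting is avoided, assuming the usual rule that the size of a union is the two sizes added together minus the overlap. The sets `A` and `B` are hypothetical.

```python
def union_cardinality(a, b):
    """|A ∪ B| = |A| + |B| - |A ∩ B| (the overlap is only counted once)."""
    return len(a) + len(b) - len(a & b)

# Hypothetical sets that share two elements.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(len(A) + len(B))          # naive sum double counts 3 and 4
print(union_cardinality(A, B))  # matches len(A | B)
```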
2018-11-10
[Finite Math: Venn Diagram Practice Problems - YouTube](https://www.youtube.com/watch?v=22tiyb7Kemk&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=12)
2018-11-14
https://www.youtube.com/watch?v=DkHWKAy47X0&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=13
Joint and Marginal Probabilities
2018-11-23
https://www.youtube.com/watch?v=4tyzC-MkzOY&list=PLIeGtxpvyG-I9m4otjYGCQL_1m0Edm0LA&index=14
2018-11-27
https://www.youtube.com/watch?v=rifK8BtHaYI&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX
random variables == R.V.
can be discrete or continuous
2018-11-28
https://www.youtube.com/watch?v=fGKd6ZtuTzM&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=2
discrete random variables
can be an infinite sequence (integers 1 to ∞)
whole units only no fractions of parts
2018-11-29
https://www.youtube.com/watch?v=XGM51WaHzNw&index=3&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX
discrete random variable probabilities
uniform probability distribution
probability is between 0 and 1
sum of all probabilities will add to 1
compound probabilities (add P(x) together)
2018-12-08
https://www.youtube.com/watch?v=u5svtXcKXEk&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=4
expected value = average or mean of a random variable
2018-12-11
https://www.youtube.com/watch?v=lahWlKj9p7U&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=5
discrete random variable variance
variance == how spread out from the mean
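A minimal sketch of expected value and variance for a discrete random variable, using a fair six-sided die as a hypothetical uniform distribution (the die is not from the video). `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical uniform distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(pmf):
    """Mean of the random variable: sum of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Spread around the mean: sum of (x - mean)^2 * P(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

print(sum(die.values()))    # all probabilities add to 1
print(die[1] + die[2])      # compound probability: P(1 or 2)
print(expected_value(die))  # 7/2
print(variance(die))        # 35/12
```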
2018-12-13
https://www.youtube.com/watch?v=ConmIDAzRqI&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=6
sales performance
quality control
2018-12-17
https://www.youtube.com/watch?v=nxKBfcwEulg&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=7
Mean and Standard deviation
2018-12-19
https://www.youtube.com/watch?v=FwIguCYyu7o&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=8
Sales Quota Performance
2018-12-20
https://www.youtube.com/watch?v=edf75Y0fIjA&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=9
Binomial Distributions
2018-12-27
https://www.youtube.com/watch?v=sNP1w4z8HWQ&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=10
binomial distribution
95% threshold (a result above it is unlikely to be due to random chance)
2018-12-28
Poisson distributions, queuing theory
2018-12-31
https://www.youtube.com/watch?v=833ciyKyKNw&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=12
2019-01-02
https://www.youtube.com/watch?v=cKf4DPT0Xno&list=PLIeGtxpvyG-LWd2IOW1wveszJXy_aHytX&index=13