Calculus 1 - Chapter 3: Numerical measures
Chapter 3
NUMERICAL MEASURES
MBA Nguyen Tien Dung
School of Economics and Management
Website: https://sites.google.com/site/nguyentiendungbkhn
Email: dung.nguyentien3@hust.edu.vn
Main Contents
3.1 MEASURES OF LOCATION
3.2 MEASURES OF VARIABILITY
3.3 MEASURES OF DISTRIBUTION SHAPE,
RELATIVE LOCATION, AND DETECTION OF
OUTLIERS
© Nguyễn Tiến Dũng Applied Statistics for Business 2
3.1 MEASURES OF LOCATION
●Mean
●Median
●Mode
●Percentiles
●Quartiles
Mean
●A population: say, a data set of the ages of students in 5 classes.
We denote:
● X: the random variable of age
● X1, X2, ..., XN: the population values
● N: population size (say, N = 200)
●A random sample taken from the population
● x1, x2, ..., xn: the sample values
● n: sample size (say, n = 30)
● The sample mean is an unbiased point estimator of the population mean
● Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
● Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
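As a minimal sketch of the two mean formulas, the Python below computes a population mean and a sample mean with the same helper. The ages are invented for illustration, and the "sample" is simply every third value rather than a true random draw; neither comes from the slides.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical ages, invented for illustration only.
population_ages = [18, 19, 19, 20, 20, 21, 21, 22, 23, 27]

# Every third value stands in for a random sample (an assumption for brevity).
sample_ages = population_ages[::3]

mu = mean(population_ages)   # population mean
x_bar = mean(sample_ages)    # sample mean, used as a point estimate of mu

print(mu, x_bar)
```

The sample mean generally differs from the population mean; unbiasedness only says that the two agree on average over repeated random samples.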
Median
● The median is the value in the middle when the
data are arranged in ascending order (smallest
value to largest value).
●A set of observations: x1, x2, ..., xn
●Arrange the data in ascending order and write the ordered values as x(1), x(2), ..., x(n).
●The median Me sits at position (n+1)/2 in the ordered data
● If n = 2k+1 (odd), then Me = x(k+1)
● If n = 2k (even), then Me = 0.5(x(k) + x(k+1))
●Sample 1: 1 3 5 8 10, n = 5, k = 2, k+1 = 3, so Me = x(3) = 5
●Sample 2: 1 3 5 8 9 10, n = 6, (n+1)/2 = 3.5, so Me = 0.5(x(3) + x(4)) = 0.5(5 + 8) = 6.5
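The odd/even rule for the median can be sketched in a few lines of Python. The function name `median` is chosen here for illustration; the two samples are the ones from the slide.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    k = n // 2
    if n % 2 == 1:
        # n = 2k + 1: the median is the (k+1)-th ordered value (0-based index k).
        return ordered[k]
    # n = 2k: average the k-th and (k+1)-th ordered values.
    return 0.5 * (ordered[k - 1] + ordered[k])


print(median([1, 3, 5, 8, 10]))      # Sample 1
print(median([1, 3, 5, 8, 9, 10]))   # Sample 2
```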
Mode
●The mode is the value that occurs with
greatest frequency.
●1 1 2 2 3 4 4 4 5 5 6 6: mode = 4
●1 2 2 3 4 4 4 5 5 6 6 6: modes = 4 and 6 (multiple modes)
●1 1 2 2 3 3 4 4 5 5 6 6: no mode (every value occurs equally often)
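A small sketch of the three cases above, using `collections.Counter` from the standard library. Returning an empty list when all values are equally frequent follows the slide's "no mode" convention; other references treat that case differently, so take it as an assumption.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when every value is equally frequent."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    top = max(counts.values())
    # Slide convention: if several distinct values all tie, report no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6]))
print(modes([1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]))
print(modes([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]))
```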
Percentile (Textbook)
● Anderson 2014: The pth percentile is a value such that at least p
percent of the observations are less than or equal to this value
and at least (100 - p) percent of the observations are greater than
or equal to this value.
Percentile (Excel)
●Position of the kth percentile in the ordered data:
● pk = k(n-1)/100 + 1
●Value of the kth percentile:
● if pk is an integer, the value is x(pk)
● if pk is not an integer, interpolate linearly between the two neighbouring ordered values
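The position rule and the interpolation step can be sketched as follows. The function name `percentile_excel` is hypothetical, and the interpolation is assumed to be linear between the two neighbouring ordered values, which is how the position formula above is usually applied.

```python
import math


def percentile_excel(values, k):
    """k-th percentile using the position rule pk = k(n-1)/100 + 1 (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    pos = k * (n - 1) / 100 + 1      # 1-based position in the ordered data
    lower = math.floor(pos)          # ordered value just at or below the position
    frac = pos - lower               # how far the position lies past that value
    if frac == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + frac * (above - below)


data = [1, 3, 5, 8, 9, 10]
print(percentile_excel(data, 25), percentile_excel(data, 50))
```

For the six-value sample, the 50th percentile has position 3.5, so the result lies halfway between the third and fourth ordered values, which agrees with the median rule.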
Quartiles
● Q1: the first quartile = the 25th percentile
● Q2: the second quartile = the 50th percentile = Median
● Q3: the third quartile = the 75th percentile
Quartiles (Excel & MegaStat)
●Q1: The first quartile
● Position: q1 = [1*(n-1)/4] +1
● Value: Q1 = x(q1)
●Q2
● Position: q2 = [2*(n-1)/4] +1
● Value: Q2 = x(q2) = Median
●Q3
● Position: q3 = [3*(n-1)/4] +1
● Value: Q3 = x(q3)
●Recommendation: use the Excel & MegaStat procedure
3.2 MEASURES OF VARIABILITY
●Range
● Interquartile Range
●Variance
●Standard Deviation
●Coefficient of Variation
Different Variances
● Range = Max - Min
● Interquartile Range = Q3 – Q1
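● Population variance $\sigma^2$ and population standard deviation $\sigma$:
$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$, $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$
● Sample variance $s^2$ and sample standard deviation $s$:
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$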
Calculating the Mean and Std. Deviation
●Sample Data
●Sample Variance = 256 / 4 = 64
●Sample Std. Deviation = sqrt(64) = 8
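The 256 / 4 = 64 calculation can be reproduced with a short sketch. The slide's data table did not survive extraction, so the five values below are an assumption: they were picked because their squared deviations from the mean sum to 256 with n = 5, which matches the figures above.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


# Assumed data (not recovered from the slide): mean 44, squared deviations sum to 256.
data = [46, 54, 42, 46, 32]

s2 = sample_variance(data)
s = math.sqrt(s2)
print(s2, s)
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance; on the same five values the population formula gives 256 / 5.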
Sample Variance & Standard Deviation
Coefficient of Variation
● A measure of how large the standard deviation is
relative to the mean, expressed as a percentage.
● Population: $CV = \frac{\sigma}{\mu} \times 100\%$
● Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
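A minimal sketch of the sample coefficient of variation, using `statistics.mean` and `statistics.stdev` from the Python standard library. The data values are illustrative assumptions, not taken from the slides.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    x_bar = statistics.mean(values)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / x_bar * 100


# Illustrative values (assumed): mean 44, sample standard deviation 8.
data = [46, 54, 42, 46, 32]
cv = coefficient_of_variation(data)
print(f"CV = {cv:.1f}%")
```

Because the CV is unit-free, it can be used to compare the spread of variables measured on different scales.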
Patterns of Skewness
Skewness and Kurtosis
z-Scores
●Suppose we have a sample of n observations, with the values denoted by x1, x2, ..., xn.
●The z-score is often called the standardized value:
$z_i = \frac{x_i - \bar{x}}{s}$
●The z-score, $z_i$, can be interpreted as the number of standard deviations $x_i$ is from the mean $\bar{x}$.
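The z-score definition can be sketched as below. The helper name `z_scores` and the data are assumptions for illustration; the sample standard deviation (n - 1 divisor) is used, in line with the sample formulas in this chapter.

```python
import statistics


def z_scores(values):
    """Standardize each value: (x_i - sample mean) / sample standard deviation."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in values]


# Illustrative values (assumed): mean 44, sample standard deviation 8.
data = [46, 54, 42, 46, 32]
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))
```

A negative z-score means the observation lies below the mean; here the value 32 is 1.5 standard deviations below it.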
Chebyshev’s Theorem
●At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is any value greater than 1.
● Implications:
● At least 0.75, or 75%, of the data
values must be within z = 2 standard
deviations of the mean.
● At least 0.89, or 89%, of the data
values must be within z = 3 standard
deviations of the mean.
● At least 0.94, or 94%, of the data
values must be within z = 4 standard
deviations of the mean.
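The three implications follow directly from evaluating 1 - 1/z² at z = 2, 3 and 4, as this short sketch shows (the function name is chosen here for illustration).

```python
def chebyshev_lower_bound(z):
    """Minimum fraction of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem applies only for z greater than 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    print(z, round(chebyshev_lower_bound(z), 4))
```

The bound holds for any distribution shape, which is why it is weaker than the percentages quoted for bell-shaped data.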
(Pafnuty Chebyshev, 1821 - 1894)
Chebyshev’s Inequality Theorem
Empirical Rule
For data with a bell-shaped (approximately normal) distribution:
●About 68% of observations are within 1 std. dev. of the mean.
●About 95% of observations are within 2 std. dev. of the mean.
●Nearly 100% (about 99.7%) of observations are within 3 std. dev. of the mean.
Detection of Outliers
● Outliers: some data points may have unusually large or unusually small values. These extreme values are called outliers.
● Lower limit = Q1 - 1.5 × IQR
● Upper limit = Q3 + 1.5 × IQR
● If x(i) < Lower limit: a low outlier
● If x(i) > Upper limit: a high outlier
● If x(i) < Q1 - 3 × IQR or x(i) > Q3 + 3 × IQR: extreme values
● For example: 1 2 3 4 10
● Sources of outliers:
● Data recording errors: should be corrected
● An inappropriate observation: should be removed
● Correctly recorded but unusual values: should be retained, but noted
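The limits can be checked on the example 1 2 3 4 10 with a short sketch. It assumes the Excel-style quartile positions from Section 3.1, which `statistics.quantiles(..., method="inclusive")` (Python 3.8 or later) reproduces; the function name and the returned dictionary layout are illustrative choices.

```python
import statistics


def classify_outliers(values):
    """Flag values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR, and beyond 3*IQR as extreme."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "limits": (lower, upper),
        "low": [x for x in values if x < lower],
        "high": [x for x in values if x > upper],
        "extreme": [x for x in values if x < q1 - 3 * iqr or x > q3 + 3 * iqr],
    }


result = classify_outliers([1, 2, 3, 4, 10])
print(result)
```

With Q1 = 2 and Q3 = 4 the IQR is 2, so the limits are -1 and 7. The value 10 is a high outlier, but it sits exactly on Q3 + 3 × IQR = 10 and is therefore not counted as extreme under the strict inequality.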
3.4 EXPLORATORY DATA ANALYSIS
● Five number summary
1. Smallest value (Min)
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Largest value (Max)
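A five-number summary can be sketched with the standard library alone. The quartiles here use `statistics.quantiles` with `method="inclusive"` (Python 3.8 or later), which follows the position rule k(n-1)/4 + 1 given for Excel earlier in the chapter; other quartile conventions may give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (Min, Q1, Median, Q3, Max) using inclusive (Excel-style) quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


print(five_number_summary([1, 2, 3, 4, 10]))
print(five_number_summary([1, 3, 5, 8, 9, 10]))
```

These five values are exactly what a boxplot draws: the box spans Q1 to Q3, with a line at the median.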
Boxplot (Box-and-whisker plot)
3.5 MEASURES OF ASSOCIATION BETWEEN
TWO VARIABLES
●Covariance: A descriptive
measure of the linear association
between two variables.
● Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$
● Population covariance: $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$
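A minimal sketch of both covariance formulas. The paired data are illustrative assumptions and are not the example from the slides, whose table was not preserved.

```python
def covariance(xs, ys, sample=True):
    """Average cross-product of deviations; divides by n - 1 for a sample, N for a population."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of observations")
    n = len(xs)
    if n < 2:
        raise ValueError("covariance needs at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1) if sample else cross / n


# Illustrative paired data (assumed): x_bar = 3, y_bar = 51.
xs = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
ys = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

s_xy = covariance(xs, ys)                       # sample covariance
sigma_xy = covariance(xs, ys, sample=False)     # population covariance
print(s_xy, sigma_xy)
```

The sign of the covariance shows the direction of the linear association, but its size depends on the units of x and y.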
Example
●Question: Is there any correlation /
relationship between x and y ?
Drawing a scatter diagram
Interpretation of Sample Covariance
● sXY is positive: a positive linear relationship
● sXY is about 0: no apparent relationship
● sXY is negative: a negative linear relationship
Correlation Coefficient
●Pearson Product Moment Correlation
Coefficient for Sample Data
●Pearson Product Moment Correlation
Coefficient for Population Data
● rxy > 0: a positive linear relationship
● rxy < 0: a negative linear relationship
●Absolute value of rxy: from 0 to 1
● The higher the absolute value, the tighter the linear relationship
●Excel Application:
● CORREL() function
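The slide formulas for the correlation coefficient were not preserved, so the sketch below assumes the standard definition of the Pearson sample coefficient, r_xy = s_xy / (s_x s_y), where s_xy is the sample covariance and s_x, s_y are the sample standard deviations. The data are illustrative, not the slide's example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    # The (n - 1) divisors of s_xy, s_x and s_y cancel, so raw sums suffice.
    return sxy / math.sqrt(sxx * syy)


# Illustrative paired data (assumed).
xs = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
ys = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

r = pearson_r(xs, ys)
print(round(r, 4))
```

Unlike the covariance, r is unit-free, so a value close to +1 or -1 can be read directly as a tight linear relationship.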
3.6 THE WEIGHTED MEAN AND WORKING
WITH GROUPED DATA
●A simple mean
●A weighted mean
●Calculate: GPA (Grade Point Average)
● Marks of the courses: x1, x2, ..., xn
● Credits of the courses: w1, w2, ..., wn
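The slide formulas for the two means were lost in extraction; the sketch below assumes the usual definitions, a simple mean Σx_i / n and a weighted mean Σw_i x_i / Σw_i with course credits as weights. The marks and credits are invented for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical marks and credits for three courses.
marks = [8, 7, 9]
credits = [3, 2, 4]

simple = sum(marks) / len(marks)       # every course counts equally
gpa = weighted_mean(marks, credits)    # courses count in proportion to credits
print(simple, round(gpa, 2))
```

The weighted mean is higher than the simple mean here because the best mark belongs to the course with the most credits.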
Grouped Data
●Sample mean for grouped data
Sample Variance
Population Mean and Variance for Grouped Data
●Population mean
●Population variance
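The grouped-data formulas on these slides did not survive extraction. The sketch below assumes the common textbook approach: each class is represented by its midpoint M_i with frequency f_i, the mean is Σf_i M_i divided by the total count, and the variance is Σf_i (M_i - mean)² divided by n - 1 for a sample or N for a population. The classes and frequencies are hypothetical.

```python
def grouped_mean(midpoints, freqs):
    """Approximate mean from class midpoints weighted by class frequencies."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * m for m, f in zip(midpoints, freqs)) / n


def grouped_variance(midpoints, freqs, sample=True):
    """Approximate variance from grouped data; n - 1 divisor for a sample, N otherwise."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = grouped_mean(midpoints, freqs)
    total = sum(f * (m - centre) ** 2 for m, f in zip(midpoints, freqs))
    return total / (n - 1) if sample else total / n


# Hypothetical classes 10-14, 15-19, 20-24 with their midpoints and frequencies.
midpoints = [12, 17, 22]
freqs = [4, 8, 8]

print(grouped_mean(midpoints, freqs))
print(grouped_variance(midpoints, freqs))
print(grouped_variance(midpoints, freqs, sample=False))
```

Results from grouped data are approximations, since every observation in a class is treated as if it equalled the class midpoint.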
Exercises for Homework
Section Exercises
3.1 1, 5, 6, 11, 16 – 7, 10 (Excel)
3.2 25, 26 – 27, 32 (Excel)
3.3 37, 41 – 44, 45 (Excel)
3.4 48, 49, 51 – 52, 53 (Excel)
3.5 55, 58 – 57, 59 (Excel)
3.6 -
Supplementary 63, 68 (Excel)