Calculus 1 - Chapter 3: Numerical measures

Chapter 3 NUMERICAL MEASURES
MBA Nguyen Tien Dung
School of Economics and Management
Website: https://sites.google.com/site/nguyentiendungbkhn
Email: dung.nguyentien3@hust.edu.vn
© Nguyễn Tiến Dũng, Applied Statistics for Business

Main Contents
3.1 MEASURES OF LOCATION
3.2 MEASURES OF VARIABILITY
3.3 MEASURES OF DISTRIBUTION SHAPE, RELATIVE LOCATION, AND DETECTION OF OUTLIERS

3.1 MEASURES OF LOCATION
● Mean
● Median
● Mode
● Percentiles
● Quartiles

Mean
● A population, say, a data set of the ages of students in 5 classes. We denote:
  ● X: the random variable "age"
  ● X1, X2, ..., XN: the population values
  ● N: population size (say, N = 200)
● A random sample taken from the population:
  ● x1, x2, ..., xn: the sample values
  ● n: sample size (say, n = 30)
● The sample mean is an unbiased point estimator of the population mean.

Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Median
● The median is the value in the middle when the data are arranged in ascending order (smallest value to largest value).
● Given a set of observations x1, x2, ..., xn, arrange the data in ascending order and let x(i) denote the i-th smallest value.
● Me = x((n+1)/2), the value at position (n+1)/2:
  ● If n = 2k+1 (odd), then Me = x(k+1)
  ● If n = 2k (even), then Me = 0.5(x(k) + x(k+1))
● Sample 1: 1 3 5 8 10 → n = 5, k = 2, k+1 = 3 → Me = x(3) = 5
● Sample 2: 1 3 5 8 9 10 → (n+1)/2 = 3.5 → Me = 0.5(x(3) + x(4)) = 0.5(5 + 8) = 6.5

Mode
● The mode is the value that occurs with the greatest frequency.
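A minimal Python sketch of the three measures of location defined so far (mean, median, mode), assuming the data are plain lists of numbers. The function names `mean`, `median` and `modes` are illustrative, not taken from the slides; the data are the two median samples above and the three data sets used in the Mode examples.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Median using the slide's rule on the sorted data (1-based x(i)):
    n = 2k + 1 -> x(k+1); n = 2k -> 0.5 * (x(k) + x(k+1))."""
    xs = sorted(values)
    n = len(xs)
    k = n // 2
    if n % 2 == 1:
        return xs[k]                      # 0-based index k is x(k+1)
    return 0.5 * (xs[k - 1] + xs[k])      # average of x(k) and x(k+1)


def modes(values):
    """Every value that occurs with the greatest frequency.

    Assumed convention (matching the third Mode example): if all values
    occur equally often, there is no mode and an empty list is returned.
    """
    counts = Counter(values)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


sample1 = [1, 3, 5, 8, 10]
sample2 = [1, 3, 5, 8, 9, 10]
print(mean(sample1), median(sample1), median(sample2))  # 5.4 5 6.5
print(modes([1, 1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6]))      # [4]
print(modes([1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]))      # [4, 6]
print(modes([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]))      # []
```

Python's standard `statistics` module also offers `mean`, `median` and `multimode`; note that `multimode` returns every value when all values tie, whereas the slides treat that case as having no mode.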
● 1 1 2 2 3 4 4 4 5 5 6 6 → Mode = 4
● 1 2 2 3 4 4 4 5 5 6 6 6 → Mode = 4 and 6 (multiple modes)
● 1 1 2 2 3 3 4 4 5 5 6 6 → no mode (every value occurs equally often)

Percentile (Textbook)
● Anderson (2014): The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value.

Percentile (Excel)
● Position of the kth percentile in the ordered data: pk = k(n - 1)/100 + 1
● Value of the kth percentile:
  ● if pk is an integer → x(pk)
  ● if pk is not an integer, interpolate linearly between the neighbouring ordered values: x(⌊pk⌋) + (pk - ⌊pk⌋)[x(⌊pk⌋+1) - x(⌊pk⌋)]

Quartiles
● Q1: the first quartile = the 25th percentile
● Q2: the second quartile = the 50th percentile = median
● Q3: the third quartile = the 75th percentile

Quartiles (Excel & MegaStat)
● Q1, the first quartile: position q1 = 1(n - 1)/4 + 1; value Q1 = x(q1)
● Q2: position q2 = 2(n - 1)/4 + 1; value Q2 = x(q2) = median
● Q3: position q3 = 3(n - 1)/4 + 1; value Q3 = x(q3)
● Non-integer positions are interpolated as for percentiles.
● Recommendation: use the Excel & MegaStat procedure.

3.2 MEASURES OF VARIABILITY
● Range
● Interquartile Range
● Variance
● Standard Deviation
● Coefficient of Variation

[Figure: Different Variances]

● Range = Max - Min
● Interquartile range: IQR = Q3 - Q1
● Population variance σ² and population standard deviation σ:
$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$
● Sample variance s² and sample standard deviation s:
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$

Calculating the Mean and Std. Deviation
● Sample data: [table not reproduced]
● Sample variance: s² = 256 / 4 = 64 (sum of squared deviations = 256, n - 1 = 4)
● Sample standard deviation: s = sqrt(64) = 8

[Figure: Sample Variance & Standard Deviation]

Coefficient of Variation
● A measure of how large the standard deviation is relative to the mean, expressed as a percentage:
$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{or} \quad CV = \frac{s}{\bar{x}} \times 100\%$

[Figures: Patterns of Skewness; Skewness and Kurtosis]

z-Scores
● Suppose we have a sample of n observations, with the values denoted by x1, x2, ..., xn.
● The z-score is often called the standardized value.
● The z-score zi can be interpreted as the number of standard deviations xi is from the mean x̄:
$z_i = \frac{x_i - \bar{x}}{s}$

Chebyshev's Theorem
● At least (1 - 1/z²) of the data values must be within z standard deviations of the mean, where z is any value greater than 1.
● Implications:
  ● At least 0.75, or 75%, of the data values must be within z = 2 standard deviations of the mean.
  ● At least 0.89, or 89%, of the data values must be within z = 3 standard deviations of the mean.
  ● At least 0.94, or 94%, of the data values must be within z = 4 standard deviations of the mean.

[Figure: Chebyshev (1821 - 1894); Chebyshev's Inequality Theorem]

Empirical Rule
For data with a bell-shaped (approximately normal) distribution:
● About 68% of observations are within 1 standard deviation of the mean.
● About 95% of observations are within 2 standard deviations of the mean.
● Nearly 100% of observations are within 3 standard deviations of the mean.

Detection of Outliers
● Outliers: some data points may have unusually large or unusually small values. These extreme values are called outliers.
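Before turning to the outlier limits, here is a hedged Python sketch of the Excel percentile rule, the sample and population variance, the coefficient of variation, z-scores and Chebyshev's bound as defined above. The data set is made up for illustration, and all function names are hypothetical.

```python
import math


def percentile_excel(values, k):
    """k-th percentile (0 <= k <= 100) using the slide's Excel rule.

    Position p_k = k * (n - 1) / 100 + 1 in the sorted data (1-based);
    a fractional position is resolved by linear interpolation between
    the two neighbouring ordered values.
    """
    xs = sorted(values)
    pos = k * (len(xs) - 1) / 100 + 1
    lower = math.floor(pos)               # 1-based position of x(lower)
    frac = pos - lower
    if frac == 0:
        return xs[lower - 1]              # integer position: x(p_k)
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or N (population)."""
    m = sum(values) / len(values)
    ss = sum((x - m) ** 2 for x in values)
    return ss / (len(values) - 1 if sample else len(values))


def coefficient_of_variation(values):
    """Sample standard deviation relative to the sample mean, in percent."""
    return math.sqrt(variance(values)) / (sum(values) / len(values)) * 100


def z_scores(values):
    """Standardized values z_i = (x_i - x_bar) / s."""
    m = sum(values) / len(values)
    s = math.sqrt(variance(values))
    return [(x - m) / s for x in values]


def chebyshev_bound(z):
    """Minimum share of values within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem needs z > 1")
    return 1 - 1 / z ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, n = 8
q1, q2, q3 = (percentile_excel(data, k) for k in (25, 50, 75))
print("Range:", max(data) - min(data), "IQR:", q3 - q1)
print("s^2:", round(variance(data), 4),
      "CV %:", round(coefficient_of_variation(data), 2))
share_within_2 = sum(abs(z) <= 2 for z in z_scores(data)) / len(data)
print(share_within_2, ">=", chebyshev_bound(2))
```

The position rule k(n - 1)/100 + 1 with linear interpolation is the one behind Excel's PERCENTILE.INC, so results should agree with that function; other textbook definitions can give slightly different percentiles on small samples.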
● Lower limit = Q1 - 1.5·IQR
● Upper limit = Q3 + 1.5·IQR
● If x(i) < Lower limit → a low outlier
● If x(i) > Upper limit → a high outlier
● If x(i) ≥ Q3 + 3·IQR → an extreme value
● For example: 1 2 3 4 10 → Q1 = 2, Q3 = 4, IQR = 2, Lower limit = -1, Upper limit = 7, so 10 is a high outlier.
● Sources of outliers:
  ● Data-recording errors → should be corrected
  ● An inappropriate observation (e.g., one that does not belong to the population being studied) → should be removed
  ● Correctly recorded but unusual values → should be retained, but flagged for attention

3.4 EXPLORATORY DATA ANALYSIS
● Five-number summary:
1. Smallest value (Min)
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Largest value (Max)

[Figure: Boxplot (box-and-whisker plot)]

3.5 MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES
● Covariance: a descriptive measure of the linear association between two variables.
  ● Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
  ● Population covariance: $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$

Example
● Question: Is there a correlation (relationship) between x and y?
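A minimal Python sketch, assuming the Excel/MegaStat quartile positions from 3.1, of the five-number summary, the 1.5·IQR outlier limits and the sample and population covariance. It reuses the slide's outlier example 1 2 3 4 10; the x and y values are hypothetical, because the data of the covariance Example are not included in this text. Function names are illustrative.

```python
import math


def quartile_excel(values, q):
    """Quartile q (1, 2 or 3) at 1-based position q * (n - 1) / 4 + 1,
    with linear interpolation for fractional positions."""
    xs = sorted(values)
    pos = q * (len(xs) - 1) / 4 + 1
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return xs[lower - 1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) using the Excel/MegaStat quartiles."""
    xs = sorted(values)
    return (xs[0], quartile_excel(xs, 1), quartile_excel(xs, 2),
            quartile_excel(xs, 3), xs[-1])


def outlier_fences(values):
    """Return (lower limit, upper limit, low outliers, high outliers)
    using the 1.5 * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    low = [x for x in values if x < lower]
    high = [x for x in values if x > upper]
    return lower, upper, low, high


def covariance(x, y, sample=True):
    """Sample covariance (divide by n - 1) or population covariance (divide by N)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / (n - 1 if sample else n)


print(five_number_summary([1, 2, 3, 4, 10]))  # (1, 2, 3, 4, 10)
print(outlier_fences([1, 2, 3, 4, 10]))       # (-1.0, 7.0, [], [10])

x = [1, 2, 3, 4, 5]   # hypothetical data, not the slide's example
y = [2, 4, 5, 4, 5]
print(covariance(x, y), covariance(x, y, sample=False))  # 1.5 1.2
```

With these made-up x and y values the covariance is positive, which points to a positive linear relationship.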
[Figure: Drawing a scatter diagram]

Interpretation of Sample Covariance
● s_xy is positive → a positive linear relationship
● s_xy is about 0 → no apparent linear relationship
● s_xy is negative → a negative linear relationship

Correlation Coefficient
● Pearson product moment correlation coefficient for sample data:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
● Pearson product moment correlation coefficient for population data:
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
where s_x, s_y (σ_x, σ_y) are the sample (population) standard deviations of x and y.

● r_xy > 0: a positive linear relationship
● r_xy < 0: a negative linear relationship
● r_xy ranges from -1 to +1, so its absolute value lies between 0 and 1.
● The higher the absolute value, the tighter (closer) the linear relationship.
● Excel application: the CORREL() function

3.6 THE WEIGHTED MEAN AND WORKING WITH GROUPED DATA
● A simple mean
● A weighted mean: $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
● Calculate: GPA (Grade Point Average)
  ● Marks of the courses: x1, x2, ..., xn
  ● Credits of the courses: w1, w2, ..., wn

Grouped Data
(M_i: midpoint of class i; f_i: frequency of class i)
● Sample mean for grouped data: $\bar{x} = \frac{\sum_i f_i M_i}{n}$, where $n = \sum_i f_i$
● Sample variance: $s^2 = \frac{\sum_i f_i (M_i - \bar{x})^2}{n - 1}$

Population Mean and Variance for Grouped Data
● Population mean: $\mu = \frac{\sum_i f_i M_i}{N}$, where $N = \sum_i f_i$
● Population variance: $\sigma^2 = \frac{\sum_i f_i (M_i - \mu)^2}{N}$

Exercises for Homework
● 3.1: 1, 5, 6, 11, 16; Excel: 7, 10
● 3.2: 25, 26; Excel: 27, 32
● 3.3: 37, 41; Excel: 44, 45
● 3.4: 48, 49, 51; Excel: 52, 53
● 3.5: 55, 58; Excel: 57, 59
● 3.6: none; Supplementary: 63, 68 (Excel)
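To round off 3.5 and 3.6, a hedged Python sketch of the sample correlation coefficient, the weighted mean (e.g., a GPA) and the grouped-data mean and variance, where M_i denotes a class midpoint and f_i its frequency. All inputs are hypothetical and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Sample correlation r_xy = s_xy / (s_x * s_y).

    The (n - 1) divisors of s_xy, s_x and s_y cancel, so raw sums are used.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of f_i * M_i divided by sum of f_i."""
    return sum(f * m for m, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_variance(midpoints, freqs, sample=True):
    """Grouped-data variance with divisor n - 1 (sample) or N (population)."""
    n = sum(freqs)
    center = grouped_mean(midpoints, freqs)
    ss = sum(f * (m - center) ** 2 for m, f in zip(midpoints, freqs))
    return ss / (n - 1 if sample else n)


# Hypothetical inputs for illustration only
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
print(weighted_mean([3.0, 4.0, 2.5], [3, 2, 4]))              # 3.0
print(grouped_mean([5, 15, 25], [2, 5, 3]),
      grouped_variance([5, 15, 25], [2, 5, 3]),
      grouped_variance([5, 15, 25], [2, 5, 3], sample=False))
```

For the same x and y, Excel's CORREL() should return the same r_xy value.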
