Variability-Bertelsmann Udacity Data Science

Jida Asare
3 min read · Jun 20, 2018


Q1: The first quartile is the point below which 25% of the data falls and above which 75% falls.

Q2: The second quartile (the median) is the point below which 50% of the data falls and above which 50% falls.

Q3: The third quartile is the point below which 75% of the data falls and above which 25% falls.

Interquartile Range (IQR): This is the difference between Q3 and Q1, that is, IQR = Q3 - Q1. It covers the middle 50% of the data.
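
As a rough illustration of the quartile and IQR definitions above, here is a minimal Python sketch using the standard library's statistics.quantiles. The data values and the helper name quartiles_and_iqr are made up for this example, and the "inclusive" method is only one of several quartile conventions, so other tools may give slightly different numbers on small data sets.

```python
import statistics

def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR) for a list of numbers (hypothetical helper)."""
    # Assumption: the "inclusive" convention; other conventions can differ
    # slightly for small data sets.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up example values
q1, q2, q3, iqr = quartiles_and_iqr(data)
print(q1, q2, q3, iqr)  # 3.0 5.0 7.0 4.0
```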

Outliers are extreme data values that lie far from the rest of the data.

Rule for determining outliers: a value x is an outlier if

x < Q1 - 1.5(IQR) or x > Q3 + 1.5(IQR)

Therefore, when determining outliers, values that lie between these two calculated limits are not outliers, whereas values beyond them are.
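
The outlier rule above could be sketched in Python roughly as follows. The data set and the function name find_outliers are invented for illustration, and the quartiles again assume the standard library's "inclusive" convention.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return (outliers, lower_limit, upper_limit) using the 1.5 * IQR rule."""
    # Assumption: quartiles computed with the "inclusive" convention.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Values strictly beyond either limit are flagged as outliers.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return outliers, lower_limit, upper_limit

data = [10, 12, 13, 14, 15, 16, 18, 19, 95]  # made-up example values
outliers, lower_limit, upper_limit = find_outliers(data)
print(lower_limit, upper_limit)  # 5.5 25.5
print(outliers)                  # [95]
```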

Box plots, or box-and-whisker plots, are another way of visualizing data. The box spans Q1 to Q3 with a line at the median (Q2), the whiskers extend to the most extreme values that are not outliers, and outliers are drawn as individual dots.

From a box plot alone, you cannot tell whether a distribution is normal, bimodal or uniform, which is a disadvantage of the box plot.

Image of a sample box plot. Source: https://datavizcatalogue.com/methods/box_plot.html

Variance: The mean of the squared deviations from the mean.

Sample variance. Source: https://study.com/academy/lesson/population-sample-variance-definition-formula-examples.html

Standard Deviation: This is the square root of the variance, that is, the square root of the mean of the squared deviations.
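
To make the two definitions above concrete, here is a small Python sketch that computes them step by step. It treats the data as a whole population (dividing by n), and both the data values and the function name population_variance are made up for the example.

```python
import math

def population_variance(values):
    """Mean of the squared deviations from the mean (dividing by n)."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values, mean is 5
variance = population_variance(data)
std_dev = math.sqrt(variance)  # standard deviation is the square root
print(variance, std_dev)  # 4.0 2.0
```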

For a normal distribution:

About 68% of the data falls within 1 standard deviation of the mean. This means that for about 68% of the values x:

mean - standard deviation < x < mean + standard deviation

About 95% of the data falls within 2 standard deviations of the mean. This means that for about 95% of the values x:

mean - (2 * standard deviation) < x < mean + (2 * standard deviation)
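
One way to check these percentages is with a quick simulation. The sketch below draws random values from a normal distribution and counts how many land within 1 and 2 standard deviations of the mean. The mean of 50, standard deviation of 10, sample size and seed are arbitrary choices for illustration, and the function name share_within is hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
# Assumption: a normal distribution with mean 50 and standard deviation 10.
sample = [random.gauss(50, 10) for _ in range(20_000)]

mean = statistics.fmean(sample)
std_dev = statistics.pstdev(sample)

def share_within(values, center, spread, k):
    """Fraction of values lying within k spreads of the center."""
    low, high = center - k * spread, center + k * spread
    return sum(low < v < high for v in values) / len(values)

within_1 = share_within(sample, mean, std_dev, 1)
within_2 = share_within(sample, mean, std_dev, 2)
print(round(within_1, 2), round(within_2, 2))  # expected to be close to 0.68 and 0.95
```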

Dividing by the sample size n tends to underestimate the population standard deviation, because a sample usually shows less spread than the population it was drawn from. To make the sample's standard deviation a better estimate of the population's, Bessel's correction is used: 1 is subtracted from the sample size, so the sum of squared deviations is divided by n - 1 instead of n. The corrected formula, known as the Sample Standard Deviation, is:

Formula for standard deviation. Source: https://www.albert.io/blog/standard-deviation-ap-statistics-crash-course-review/
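
As a sketch of what Bessel's correction changes in practice, the Python snippet below computes the standard deviation both ways. The data values and the bessel parameter name are made up for this example. The result is also compared with the standard library's statistics.stdev, which applies the same n - 1 divisor.

```python
import math
import statistics

def standard_deviation(values, bessel=True):
    """Standard deviation, dividing by n - 1 when bessel is True, else by n."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if bessel else n
    return math.sqrt(sum_of_squares / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
uncorrected = standard_deviation(data, bessel=False)  # divides by n
corrected = standard_deviation(data)                  # divides by n - 1
print(uncorrected)              # 2.0
print(corrected > uncorrected)  # True: the correction enlarges the estimate
print(math.isclose(corrected, statistics.stdev(data)))  # True
```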

Formula for determining the number of standard deviations a value x is from the mean u:

(x - u) / standard deviation

Standardizing any value on the x-axis gives the z-score. The z-score is the number of standard deviations any value is from the mean.

A negative z-score means that the original value (x) is less than the mean (u), whereas a positive z-score means that the original value (x) is greater than the mean (u).

The mean of the standardized values (z-scores) is zero, whereas their standard deviation is one. This is what it means for a distribution to be standardized.

To standardize values, subtract the mean, which shifts the mean to zero, and divide by the standard deviation, which makes the standard deviation one. When the original data is normally distributed, the result is the standard normal distribution, with mean zero and standard deviation one.

Therefore any data set can be rewritten in terms of the number of standard deviations from the mean.
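
The standardization steps described above can be sketched in a few lines of Python. The data values and the function name z_scores are made up for illustration, and the sketch uses the population standard deviation (dividing by n).

```python
import statistics

def z_scores(values):
    """Convert each value to its number of standard deviations from the mean."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divides by n).
    std_dev = statistics.pstdev(values)
    return [(x - mean) / std_dev for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values: mean 5, standard deviation 2
z = z_scores(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# After standardizing, the mean is 0 and the standard deviation is 1.
print(statistics.fmean(z), statistics.pstdev(z))  # 0.0 1.0
```

Values below the mean of 5 get negative z-scores and values above it get positive ones, matching the sign rule described earlier.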
