In statistics, the **mean** of a dataset is the average value. It's useful to know because it gives us an idea of where the "center" of the dataset is located. It is calculated using the simple formula:

**mean** = (sum of observations) / (number of observations)

For example, suppose we have the following dataset:

[1, 4, 5, 6, 7]

The mean of the dataset is (1+4+5+6+7) / (5) = **4.6**
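As a minimal sketch, the same calculation can be written in Python. The variable names `data` and `mean` are chosen here for illustration only.

```python
# Mean = (sum of observations) / (number of observations)
data = [1, 4, 5, 6, 7]

mean = sum(data) / len(data)
print(mean)  # 4.6
```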

But while the mean is useful and easy to calculate, it does have one drawback: **It can be affected by outliers**. In particular, the smaller the dataset, the more an outlier can affect the mean.

To illustrate this, consider the following classic example:

Ten men are sitting in a bar. The mean income of the ten men is $50,000. Suddenly one man walks out and Bill Gates walks in. Now the mean income of the ten men in the bar is $40 million.

This example shows how a single outlier (Bill Gates) can drastically affect the mean.
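The arithmetic behind the bar example can be sketched as follows. The example only states the two means, so this sketch assumes, purely for illustration, that each of the nine remaining men earns exactly $50,000; the newcomer's income is then whatever value makes the new mean come out to $40 million. It is not a claim about anyone's real income.

```python
n_people = 10
typical_income = 50_000    # assumption: each remaining man earns exactly the stated mean
target_mean = 40_000_000   # mean after the swap, as stated in the example

# Total income required for the new mean, minus what the nine remaining men earn.
newcomer_income = target_mean * n_people - typical_income * (n_people - 1)

incomes_after = [typical_income] * (n_people - 1) + [newcomer_income]
mean_after = sum(incomes_after) / len(incomes_after)

print(newcomer_income)  # 399550000
print(mean_after)       # 40000000.0
```

Under that assumption, one value of roughly $400 million is enough to move the mean of ten people from $50,000 to $40 million, even though nine of the ten incomes are unchanged.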

**Small & Large Outliers**

An outlier can affect the mean by being unusually small or unusually large. In the previous example, Bill Gates had an unusually large income, which caused the mean to be misleading.

However, an unusually small value can also affect the mean. To illustrate this, consider the following example:

Ten students take an exam and receive the following scores:

[0, 88, 90, 92, 94, 95, 95, 96, 97, 99]

The mean score is **84.6**.

However, if we remove the "0" score from the dataset, then the mean score becomes **94**.

The one unusually low score of a single student drags the mean down for the entire dataset.
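A short Python sketch reproduces both figures. The helper name `mean_of` is introduced here for illustration.

```python
def mean_of(values):
    """Arithmetic mean: sum of observations divided by their count."""
    return sum(values) / len(values)

scores = [0, 88, 90, 92, 94, 95, 95, 96, 97, 99]
scores_without_zero = [s for s in scores if s != 0]

mean_all = mean_of(scores)
mean_without_zero = mean_of(scores_without_zero)

print(mean_all)           # 84.6
print(mean_without_zero)  # 94.0
```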

**Sample Size & Outliers**

The smaller the sample size of the dataset, the more potential an outlier has to affect the mean.

For example, suppose we have a dataset of 100 exam scores where all of the students scored 90 or higher except for one student who scored a zero:

<**0**, 90, 90, 92, 94, 95, 95, 96, 97, 99, 94, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99, 93, 90, 90, 92, 94, 95, 95, 96, 97, 99>

The mean turns out to be **93.18**. If we removed the "0" from the dataset, the mean would be **94.12**. This is a relatively small difference. It shows that even an extreme outlier has only a small effect if the dataset is large enough.
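The sketch below rebuilds the 100 scores and compares how far a single zero moves the mean in the 10-score and 100-score datasets. It relies on the observation that the listing is made of ten groups, each a leading value (0, then 94, then 93 eight times) followed by the same nine scores; the names `base`, `leading`, and `shift` are illustrative.

```python
def mean_of(values):
    return sum(values) / len(values)

def shift(values):
    """How much the mean rises when the zero score is dropped."""
    without_zero = [v for v in values if v != 0]
    return mean_of(without_zero) - mean_of(values)

# Ten-score dataset from the earlier example.
small = [0, 88, 90, 92, 94, 95, 95, 96, 97, 99]

# 100-score dataset: each group of ten is a leading value plus nine recurring scores.
base = [90, 90, 92, 94, 95, 95, 96, 97, 99]
leading = [0, 94] + [93] * 8
large = [x for lead in leading for x in [lead] + base]

print(len(large))                 # 100
print(round(mean_of(large), 2))   # 93.18
print(round(shift(small), 2))     # 9.4
print(round(shift(large), 2))     # 0.94
```

The same zero score moves the mean by about 9.4 points among 10 scores but by less than 1 point among 100.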

**How to Handle Outliers**

If you're worried that an outlier is present in your dataset, you have a few options:

**Make sure the outlier is not the result of a data entry error.** Sometimes a person simply enters the wrong data value when recording data. If an outlier is present, first verify that the value was entered correctly and that it wasn't an error.

**Remove the outlier.** If the value is a true outlier, you may choose to remove it if it will have a significant impact on your overall analysis. Just make sure to mention in your final report or analysis that you removed an outlier.
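The article does not prescribe a way to identify candidate outliers. One common convention, used here only as an assumption, is to flag values more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies it to the exam scores and reports what was removed; `flag_outliers` and its parameter `k` are hypothetical names, and quartile values depend on the interpolation method (this uses the default of Python's `statistics.quantiles`).

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

scores = [0, 88, 90, 92, 94, 95, 95, 96, 97, 99]
flagged = flag_outliers(scores)
kept = [s for s in scores if s not in flagged]

print(flagged)  # [0]
# State the removal explicitly so it can go into the final report.
print(f"Removed {len(flagged)} value(s) {flagged}; mean of remaining scores: {sum(kept) / len(kept)}")
```

A flagged value is only a candidate: it still needs to be checked for data entry errors before deciding whether removal is justified.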

**Use the Median**

Another way to find the "center" of a dataset is to use **the median**, which is found by arranging all of the individual values in a dataset from smallest to largest and finding the middle value. If the dataset has an even number of values, the median is the average of the two middle values.

Because of the way it is calculated, the median is less affected by outliers, and it does a better job of capturing the central location of a distribution when outliers are present.
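To see this on the exam scores from earlier, the following sketch compares the mean and the median with and without the zero score, using Python's standard `statistics` module.

```python
from statistics import median

scores = [0, 88, 90, 92, 94, 95, 95, 96, 97, 99]
without_zero = [s for s in scores if s != 0]

mean_with = sum(scores) / len(scores)
mean_without = sum(without_zero) / len(without_zero)

median_with = median(scores)           # even count: average of the two middle values
median_without = median(without_zero)  # odd count: the single middle value

print(mean_with, mean_without)      # 84.6 94.0
print(median_with, median_without)  # 94.5 95
```

Dropping the zero moves the mean by 9.4 points but moves the median by only 0.5.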

For example, consider the following chart that shows the square footage of houses in a particular neighborhood:

The mean is heavily influenced by a couple of extremely large houses, while the median is not. Thus, the median does a better job than the mean of capturing the "typical" square footage of a house in this neighborhood.

**Further Reading:**

**Measures of Central Tendency – Mean, Median, and Mode**
**Dixon's Q Test for Detecting Outliers**
**Outlier Calculator**