# Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset. A discomfiting possibility is that these data come from a different population than the one intended to be studied.

1,010
questions

**0**

votes

**0**answers

10 views

### Whether to cap the dependent variable while treating the outliers?

So I am trying to run a linear regression model in R where the objective is to identify what's driving credit card spend, both primary and secondary. I have a dataset with 10000 obs
I ...

**1**

vote

**1**answer

36 views

### Electrical Consumption Outlier Detection

Suppose you have several years of monthly consumption (kWh) data for 500,000 electrical meters and your job is to look for outlier behavior of various types. How would you approach modeling the meters ...

**8**

votes

**2**answers

1k views

### Why doesn't the `cooks.distance()` function detect an obvious outlier?

I have the following plot:
I want to detect outliers so that I can delete them. I apply the following code to detect and delete them:
...

**0**

votes

**0**answers

51 views

### Detecting outliers and influential cases in R (plm)

When performing a multiple linear regression, checking for outliers and influential observations is considered important. Since I am performing a panel analysis with the package 'plm' and have not ...

**0**

votes

**0**answers

29 views

### How can I include extremely large outliers in analytics?

Like most of us stuck at home, I'm taking time to get back up to speed with machine learning with some pet projects and one of my projects includes trying to use machine learning to predict missing ...

**1**

vote

**1**answer

15 views

### Consequences not fulfilling normality assumption when looking for outliers

Taking a large (n >> 10 000) data set where the population is clearly not normal and detecting/testing for outliers using mean +/- 3 standard deviations.
Multiple colleagues of mine use this ...
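
To make the concern concrete, here is a minimal sketch of what the mean +/- 3 SD rule does on clearly non-normal data. The lognormal sample, the seed and the helper name `three_sigma_flags` are assumptions for illustration only, not the asker's data or code.

```python
import random
import statistics

def three_sigma_flags(values):
    """Return the values lying outside mean +/- 3 sample standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - 3 * sd, mean + 3 * sd
    return [v for v in values if v < lower or v > upper]

rng = random.Random(0)
# Hypothetical right-skewed sample standing in for the non-normal population.
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(20000)]

flagged = three_sigma_flags(skewed)
share = len(flagged) / len(skewed)          # fraction of points flagged
sample_mean = statistics.fmean(skewed)
low_side = [v for v in flagged if v < sample_mean]  # flags in the left tail

print(f"flagged share: {share:.4f} (a normal population would give about 0.0027)")
print(f"flags below the mean: {len(low_side)}")
```

Under these assumptions the rule flags far more than the roughly 0.27% expected for normal data, and all of the flags sit in the right tail, because the lower limit falls below zero where a lognormal sample has no values at all.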

**3**

votes

**1**answer

87 views

### In R, how to detect possible outliers in right skewed data assuming Poisson distribution?

I am attempting to identify possible outliers in data which is skewed to the right and I assume it is Poisson distributed. I am a novice in all things statistics, and the following may be utterly ...

**0**

votes

**0**answers

25 views

### Expected Value of Outliers

Suppose there is a server of a website that sometimes breaks down. Over 802 days it went down 18 times. Most of the time the breakdown lasts for an entire day.
We know that the summary stats for the ...

**0**

votes

**0**answers

20 views

### weird outlier in a cox regression model

I'm using normal deviate residuals to identify outliers, and I'm confused because my plot seems to suggest that there is an unreasonably large number of outliers...? Has anyone seen something like this?
...

**0**

votes

**1**answer

17 views

### Can I use anomaly detection models for outlier and novelty detection?

Several books that I have read do not distinguish the several models that exist for anomaly and outlier detection.
After I read about these models, I have chosen to detect anomalous events on ...

**0**

votes

**0**answers

26 views

### How can I use statistics to find variables that cause outliers in data?

I have outliers in my dataset with high values. I am trying to figure out which variables have the most influence on those outliers using statistical approaches.
I first thought I can use deviation ...

**0**

votes

**1**answer

40 views

### Removing outliers using the get_outlier function (R studio: repeated measures ANOVA)?

I am trying to remove outliers from my dataset:
...

**4**

votes

**0**answers

65 views

### Is this method for comparing whether two distributions have similar outliers studied in the literature?

I am working on a project where I am trying to compare outliers from two different distributions. I came up with a natural seeming measurement, and I want to find out whether there's a name for it or ...

**5**

votes

**2**answers

107 views

### Detect abrupt change in time series

I am trying to detect abrupt change (the "bump") in my data. My end goal is to fit a decline curve that describes the overall trend of a gas well's production rate over time. When fitting my curve, I ...

**0**

votes

**0**answers

25 views

### Robust regression is not helping?

I have run a multiple linear regression, using a data set with several outliers.
M <- lm(data = data, y ~ x1 + x2 + x3 + x4)
As a result, I have obtained qqplot ...

**0**

votes

**0**answers

35 views

### How to decide which technique to use to treat outliers?

In my mind, there are multiple ways to treat dataset outliers
-> Delete data
-> Transforming using log or Bin
-> using mean median
-> Test separately
...

**1**

vote

**0**answers

18 views

### Outlier identification — 3s control limits or 1.5IQR

I would like to implement an outlier flag (extreme user identification) on our data system. We have a daily database that shows each user's usage (in minutes). A simple statistical control would be ...

**3**

votes

**2**answers

68 views

### outlier detection after clustering

I am quite new to data analysis and Machine Learning, that's why I am asking for help for a problem I am facing.
It's an outliers detection problem.
I have a quite big amount of data that I need to ...

**1**

vote

**1**answer

24 views

### Method to determine outliers with a skewed dataset [duplicate]

How can we find outliers in a dataset with a (highly) skewed distribution? With a normal distribution, it is well documented to use 2 x Standard Deviation or the upper boundary of the box plot (1.5 x ...

**0**

votes

**0**answers

12 views

### Log Transforming My TS Data for a First Difference Regression

I'm currently working with a time series of monthly yields where $Yield = \frac{Expense}{Balance}$. I am trying to understand the change in yield given a change in the market rate. My regression is
$Y = \...

**0**

votes

**1**answer

20 views

### Big outlier in dependent variable

I have my data from the official statistics office of my country and I have already rechecked it multiple times. I have a big outlier skewing all my glm (Poisson) models to the extreme (like 5 times the ...

**0**

votes

**0**answers

23 views

### How to determine if an observation is significantly different from a distribution

I have a set of thousands of machine learning results like in the figure below. Each result represents the performance of a model on the same test set. All the results, in particular with this metric ...

**1**

vote

**2**answers

38 views

### Is there a convention for how to handle data points that fall on the fences when determining outliers?

If I have a data set where a given measurement is equal to Q1 - 1.5(IQR), is there a convention for how to handle this? Should this be considered an outlier?
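
As a sketch of why the answer depends on the comparison you pick: the usual boxplot rule is often read as flagging only points strictly beyond the fences, but nothing stops an implementation from using a non-strict comparison. The data, the function name `tukey_outliers` and the `flag_on_fence` switch below are hypothetical, and the quartiles use the "inclusive" interpolation from Python's `statistics` module, which is only one of several quartile definitions.

```python
import statistics

def tukey_outliers(values, k=1.5, flag_on_fence=False):
    """Flag points outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR.

    flag_on_fence=False uses strict inequalities (a point equal to a fence
    is kept); flag_on_fence=True also flags points lying exactly on a fence.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    if flag_on_fence:
        return [v for v in values if v <= lower or v >= upper]
    return [v for v in values if v < lower or v > upper]

# Hypothetical data built so that Q1 = 11, Q3 = 15 and the lower fence is 5.
data = [5, 10, 11, 12, 13, 14, 15, 16, 17]

strict = tukey_outliers(data)                         # point on the fence is kept
non_strict = tukey_outliers(data, flag_on_fence=True) # point on the fence is flagged
print(strict, non_strict)
```

Whichever comparison you choose, it is worth stating it explicitly, since a different quartile definition can move the fence enough to change the result anyway.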

**0**

votes

**1**answer

54 views

### which metrics are suitable for density-based clustering validation?

I'm working on a project where I use several clustering methods, mainly density based ones such as hdbscan, optics... I'm looking for a metric to evaluate clustering results that takes into account ...

**0**

votes

**0**answers

10 views

### Are data outliers a data cleaning error in the machine learning process?

I found a lot of methods and examples for handling data cleaning errors, but should we remove outliers from our dataset during machine learning pre-processing, as part of data cleaning?
Because sometimes in linear ...

**2**

votes

**0**answers

24 views

### Determining outliers

I have a data set containing 12 concentrations with 8 absorbance readings each. From this I did linear regression of three consecutive concentrations and their absorbances and got 10 slopes. I am ...

**1**

vote

**0**answers

19 views

### rolling removing outliers: include or not include

In the paper "Realized kernels in practice: trades and quotes" by O. E. Barndorff-Nielsen et al., cf.
http://onlinelibrary.wiley.com/doi/full/10.1111/j.1368-423X.2008.00275.x
in the section dedicated to ...

**0**

votes

**0**answers

20 views

### Standardize or Normalize, and dealing with outliers that are affecting diversity of rest of data

Appreciate any help with this, as I'm not really sure what I'm doing!
For my work I've been producing an 'index' that measures entities against 16 different indicators. My plan has been for each ...

**1**

vote

**0**answers

16 views

### Mean Absolute Deviation and data preprocessing

Assume we have data points $x_{1}, \dots, x_{n}, x_{n+1}$. Next, based on the Mean Absolute Deviation (MAD), we aim to decide whether the last point $x_{n+1}$ is an outlier or not.
First, let us compute the MAD:
...
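
The question's own formula is cut off above, so the following is only one common reading, not the asker's method: take the mean absolute deviation around the mean of the first $n$ points and flag $x_{n+1}$ when it lies more than $k$ such deviations from that mean. The threshold `k=3.0`, the function names and the sample values are all assumptions.

```python
import statistics

def mean_abs_deviation(values):
    """Mean absolute deviation around the arithmetic mean."""
    center = statistics.fmean(values)
    return statistics.fmean([abs(v - center) for v in values])

def is_outlier(history, new_point, k=3.0):
    """Flag new_point if it is more than k mean absolute deviations from
    the mean of history (history excludes new_point). k is an assumed cutoff."""
    center = statistics.fmean(history)
    mad = mean_abs_deviation(history)
    return abs(new_point - center) > k * mad

# Hypothetical x_1..x_n and two candidate values for x_{n+1}.
history = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0]
far_flag = is_outlier(history, 25.0)
near_flag = is_outlier(history, 11.0)
print(far_flag, near_flag)
```

Note that the candidate point is deliberately left out of both the mean and the deviation here; including it would inflate both and make the test less sensitive.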

**1**

vote

**0**answers

137 views

### 95% confidence ellipse using Hotelling T-squared in a score plot (R)

I am trying to draw a 95% confidence ellipse using Hotelling T-squared in a score plot of two principal components from a PCA. I have checked that:
http://stackoverflow.com/questions/42637860/pca-...

**2**

votes

**0**answers

53 views

### Resources for learning the time series stuff they don't (or didn't) teach you

I at one point, a long time ago, had two years of graduate econometrics focusing on time series, plus more on micro cross-section techniques. I haven't made much use of the time-series stuff for a ...

**2**

votes

**0**answers

11 views

### How Do I Detect Outliers From Clustered Data Points?

I have a data set on a single variable (say, x). Below is the point plot of the data. From the plot it can be seen that the data points form a few clusters around certain values, say 4, [1.5,2] and 0. Can we say that ...

**1**

vote

**1**answer

29 views

### Correlation being muddied by outliers

I have a study in which I find a decent correlation: on a quadratic prediction plot between a binary outcome and a continuous x. However, there are a few observations that have numbers that are not ...

**0**

votes

**1**answer

17 views

### Discordance between various methods of multivariate outliers detection

Here is a small "toy example" dataset, with 15 individuals described by 6 variables (this is R language):
...

**0**

votes

**0**answers

39 views

### For outliers treatment, clipping, winsorizing or removing?

I came across three different techniques for treating outliers: winsorization, clipping and removing.
Winsorizing: Consider the data set consisting of:
{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, ...
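
A minimal sketch of how the three treatments differ, under assumed definitions: winsorizing clamps to order statistics of the data itself, clipping clamps to fixed bounds chosen in advance, and removing (trimming) drops the extreme values. Because the data set above is truncated, the sample below is hypothetical, as are the function names, the fraction `p=0.1` and the clipping bounds.

```python
def winsorize(values, p=0.1):
    """Clamp values to the k-th smallest and k-th largest order statistics,
    where k = floor(n * p). The number of points is unchanged."""
    s = sorted(values)
    k = int(len(s) * p)
    lo, hi = s[k], s[len(s) - k - 1]
    return [min(max(v, lo), hi) for v in values]

def clip(values, lower, upper):
    """Clamp values to fixed bounds that do not depend on the data."""
    return [min(max(v, lower), upper) for v in values]

def trim(values, p=0.1):
    """Drop the k smallest and k largest values (returned in sorted order)."""
    s = sorted(values)
    k = int(len(s) * p)
    return s[k:len(s) - k]

# Hypothetical sample with one extreme value.
data = [12, 15, 11, 14, 13, 16, 10, 17, 18, 250]

winsorized = winsorize(data)    # 10 becomes 11, 250 becomes 18
clipped = clip(data, 0, 100)    # only 250 changes, to the fixed bound 100
trimmed = trim(data)            # 10 and 250 are dropped
print(winsorized, clipped, trimmed, sep="\n")
```

The practical difference is that winsorizing and clipping keep the sample size while changing values, whereas removing keeps the remaining values intact but shrinks the sample.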

**0**

votes

**0**answers

11 views

### Interpreting Rlof results

I'm following a tutorial on R. Could someone help me interpret the resulting confusion matrix from Rlof?
...

**0**

votes

**1**answer

37 views

### Outlier detection using the difference between two z-scores

Long story short: Can you use the difference in z-score of two variables as an outlier detector?
I have this data set which had poor quality data. Lots of measurement/human error and probably also ...

**1**

vote

**1**answer

53 views

### Fix wrong data coming from a sensor

I have data coming from a sensor that I store in a time series.
When I graph them, I obtain:
These data are supposed to be "continuous", like temperatures, not going up and down so fast.
After ...

**1**

vote

**0**answers

14 views

### Asymmetric robust regression

What are the methods for robust regression with asymmetric distribution of outliers?
I am specifically interested in equivalents of Huber and Tukey M-estimators. However, asymmetric heavy-tailed ...

**1**

vote

**0**answers

28 views

### Dealing with outliers when Inter Quartile Range is 0

I am working with Classification Machine Learning problems and have come across a problem where I have 0 IQR for my data. No matter what technique I use, ...

**0**

votes

**0**answers

14 views

### Detecting the presence of outliers, while ignoring pure noise

I would like to detect peaks in a time series, but all too often a bunch of noise gets picked up as well, and fools most algorithms I throw at it. Often, to get it to work, I need to tweak the ...

**0**

votes

**0**answers

12 views

### Outlier detection with EM

I am interested in using expectation maximization for outlier detection. In the literature this is usually done assuming that the data of interest are normally distributed while the outliers are ...

**0**

votes

**1**answer

15 views

### When to use RANSAC?

Does it make sense to use RANSAC-type algorithms (RANSAC, MSAC, MLESAC, etc.) for small data sets (20-30 points)?
On the one hand, all the points need to be accounted for and this can be done with ...

**1**

vote

**1**answer

12 views

### Invertibility of covariance matrix when the number of training examples is less than the number of features

I was trying to study an outlier detection algorithm and realized that if we use a multivariate Gaussian distribution to model the data, then the invertibility of the covariance matrix ($\Sigma$) is ...

**6**

votes

**2**answers

181 views

### ANOVA: life after rejecting the null hypothesis

I have multiple groups of data (20+ groups) and test the null hypothesis that they have the same mean. How do I proceed after the null hypothesis is rejected? What is the standard method for selecting ...

**1**

vote

**0**answers

17 views

### Determine outliers for robust Mahalanobis distance

I want to apply a robust Mahalanobis distance and found an implementation in scikit-learn: http://scikit-learn.org/stable/auto_examples/covariance/plot_mahalanobis_distances.html
but there is the number of ...

**1**

vote

**0**answers

14 views

### Detecting multivariate outliers in multiple regression

I am running a multiple regression and wish to screen for outliers. I have the variables Y (outcome), and predictors X1, X2, X3.
For univariate outliers, I can check each variable for z-scored ...

**1**

vote

**1**answer

16 views

### How can a neural net recognize it is out of its training domain?

In a recent Kaggle competition with a huge overfitting potential, the winning team first searched for features on the training data and, after the feature engineering, used a Kolmogorov-Smirnov test for ...

**1**

vote

**0**answers

59 views

### Robust common mean inference

In an ANOVA-like setting I have several groups of variables that I expect to have the same mean, $\mu$. Quite often some of the groups would have shifted means, $\mu + \Delta$, due to effects beyond ...

**2**

votes

**0**answers

42 views

### What happens when we model outliers as dummies in a VAR system?

I am wondering what is going on "under the hood", or intuitively what the implications are, of modelling outliers as dummies in a VAR model.
To make this question more clear I will provide an example.
...