Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset. A discomfiting possibility is that these data come from a different population than the one intended to be studied.

0
votes
0answers
10 views

Whether to cap the dependent variable while treating the outliers?

So I am trying to run a linear regression model in R where the objective is to identify what's driving the credit card spends including both primary and secondary. I have a dataset with 10000 obs I ...
1
vote
1answer
36 views

Electrical Consumption Outlier Detection

Suppose you have several years of monthly consumption (kWh) data for 500,000 electrical meters and your job is to look for outlier behavior of various types. How would you approach modeling the meters ...
8
votes
2answers
1k views

Why doesn't the `cooks.distance()` function detect an obvious outlier?

I have the following plot: I want to detect outliers in order to delete them. I apply the following code to detect and delete them: ...
0
votes
0answers
51 views

Detecting outliers and influential cases in R (plm)

When performing a multiple linear regression, checking for outliers and influential observations is considered important. Since I am performing a panel analysis with the package 'plm' and have not ...
0
votes
0answers
29 views

How can I Include extremely large outliers in analytics?

Like most of us stuck at home, I'm taking time to get back up to speed with machine learning with some pet projects and one of my projects includes trying to use machine learning to predict missing ...
1
vote
1answer
15 views

Consequences of not fulfilling the normality assumption when looking for outliers

Taking a large (n >> 10 000) data set where the population is clearly not normal and detecting/testing for outliers using mean +/- 3 standard deviations. Multiple colleagues of mine use this ...
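A rough sketch of the rule mentioned in this excerpt, applied to skewed data. The sample here is an assumption made purely for illustration (deterministic quantiles of a unit exponential distribution, standing in for a right-skewed population), and `three_sigma_flags` is a hypothetical helper name, not something from the question.

```python
import math
import statistics

def three_sigma_flags(values):
    """Return the values lying outside mean +/- 3 sample standard deviations."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mu) > 3 * sd]

# Assumed illustrative data: quantiles of a unit exponential distribution,
# used as a deterministic stand-in for a right-skewed sample.
n = 10_000
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

flagged = three_sigma_flags(skewed)
share = len(flagged) / n
print(f"flagged share: {share:.4f}")
```

Under normality roughly 0.27% of points would fall outside the limits, split evenly between the tails. On this skewed sample the flagged share is noticeably larger and every flagged point sits in the right tail, so the rule ends up flagging ordinary tail values rather than points that are unusual for this distribution.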
3
votes
1answer
87 views

In R, how to detect possible outliers in right skewed data assuming Poisson distribution?

I am attempting to identify possible outliers in data which is skewed to the right and I assume it is Poisson distributed. I am a novice in all things statistics, and the following may be utterly ...
0
votes
0answers
25 views

Expected Value of Outliers

Suppose there is a server of a website that sometimes breaks down. Over 802 days it went down 18 times. Most of the time the breakdown lasts for an entire day. We know that the summary stats for the ...
0
votes
0answers
20 views

Weird outlier in a Cox regression model

I'm using normal deviate residuals to identify outliers, and I'm confused that my plot seems to suggest that there is an unreasonably large number of outliers...? Has anyone seen something like this? ...
0
votes
1answer
17 views

Can I use anomaly detection models for outlier and novelty detection?

Several books that I have read do not distinguish the several models that exist for anomaly and outlier detection. After I read about these models, I have chosen to detect anomalous events on ...
0
votes
0answers
26 views

How can I use statistics to find variables that cause outliers in data?

I have outliers in my dataset with high values. I am trying to figure out which variables actually have the most influence on those outliers using statistical approaches. I first thought I can use deviation ...
0
votes
1answer
40 views

Removing outliers using the get_outlier function (R studio: repeated measures ANOVA)?

I am trying to remove outliers from my dataset: ...
4
votes
0answers
65 views

Is this method for comparing whether two distributions have similar outliers studied in the literature?

I am working on a project where I am trying to compare outliers from two different distributions. I came up with a natural seeming measurement, and I want to find out whether there's a name for it or ...
5
votes
2answers
107 views

Detect abrupt change in time series

I am trying to detect abrupt change (the "bump") in my data. My end goal is to fit a decline curve that describes the overall trend of a gas well's production rate over time. When fitting my curve, I ...
0
votes
0answers
25 views

Robust regression is not helping?

I have run multiple linear regression, using data set with several outliers. M <- lm(data = data, y ~ x1 + x2 + x3 + x4) As a result, I have obtained qqplot ...
0
votes
0answers
35 views

How to decide which technique to use to treat outliers?

In my mind, there are multiple ways to treat dataset outliers -> Delete data -> Transforming using log or Bin -> using mean median -> Test separately ...
1
vote
0answers
18 views

Outlier identification — 3s control limits or 1.5IQR

I would like to implement outlier flag (extreme user) identification on our data system. We have a daily database that shows each user's usage (in minutes). A simple statistical control would be ...
3
votes
2answers
68 views

outlier detection after clustering

I am quite new to data analysis and Machine Learning, that's why I am asking for help for a problem I am facing. It's an outliers detection problem. I have a quite big amount of data that I need to ...
1
vote
1answer
24 views

Method to determine outliers with a skewed dataset [duplicate]

How can we find outliers in a dataset with a (highly) skewed distribution? With a normal distribution, it is well documented to use 2 x Standard Deviation or the upper boundary of the box plot (1.5 x ...
0
votes
0answers
12 views

Log Transforming My TS Data for a First Difference Regression

I'm currently working with a ts of monthly yields where $Yield = \frac{Expense}{Balance}$. I am trying to understand the change in yield given a change in the market rate. My regression is $Y = \...
0
votes
1answer
20 views

Big outlier in dependent variable

I have my data from the official statistics office of my country and I rechecked multiple times already. I have a big outlier skewing all my glm (poisson) models to the extreme (like 5 times the ...
0
votes
0answers
23 views

How to determine if an observation is significantly different from a distribution

I have a set of thousands of machine learning results like in the figure below. Each result represents the performance of a model on the same test set. All the results, in particular with this metric ...
1
vote
2answers
38 views

Is there a convention for how to handle data points that fall on the fences when determining outliers?

If I have a data set where a given measurement is equal to Q1 - 1.5(IQR), is there a convention for how to handle this? Should this be considered an outlier?
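A minimal sketch of the boundary case raised in this question. It does not settle the convention. The helper names (`tukey_fences`, `flag_outliers`), the `strict` switch, the quartile method, and the sample data are all assumptions made for illustration; different software computes quartiles differently, which can move the fences.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # 'inclusive' is one of several quartile definitions; others shift the fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5, strict=True):
    """Flag points beyond the fences; strict=False also flags points on a fence."""
    lower, upper = tukey_fences(values, k)
    if strict:
        return [v for v in values if v < lower or v > upper]
    return [v for v in values if v <= lower or v >= upper]

# Assumed sample: Q1 = 4, Q3 = 6, IQR = 2, so the fences are 1 and 9.
data = [1, 4, 5, 6, 10]
print(tukey_fences(data))
print(flag_outliers(data, strict=True))
print(flag_outliers(data, strict=False))
```

With strict inequalities the value sitting exactly on the lower fence is not flagged, while the non-strict variant flags it. Making the choice an explicit parameter at least keeps it documented and consistent across analyses.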
0
votes
1answer
54 views

which metrics are suitable for density-based clustering validation?

I'm working on a project where I use several clustering methods, mainly density based ones such as hdbscan, optics... I'm looking for a metric to evaluate clustering results that takes into account ...
0
votes
0answers
10 views

Are data outliers a data cleaning error in the machine learning process?

I found a lot of ways and examples for data cleaning errors, but should we remove dataset outliers from our dataset in machine learning pre-process like data cleaning? Because sometimes in linear ...
2
votes
0answers
24 views

Determining outliers

I have a data set containing 12 concentrations with 8 absorbance readings each. From this I did linear regression of three consecutive concentrations and their absorbances and got 10 slopes. I am ...
1
vote
0answers
19 views

rolling removing outliers: include or not include

In the paper "Realized kernels in practice: trades and quotes" by O. E. Barndorff-Nielsen et al., cf. http://onlinelibrary.wiley.com/doi/full/10.1111/j.1368-423X.2008.00275.x in the section dedicated to ...
0
votes
0answers
20 views

Standardize or Normalize, and dealing with outliers that are affecting diversity of rest of data

Appreciate any help with this, as I'm not really sure what I'm doing! For my work I've been producing an 'index' that measures entities against 16 different indicators. My plan has been for each ...
1
vote
0answers
16 views

Mean Absolute Deviation and data preprocessing

Assume we have data points $x_{1}, \dots, x_{n}, x_{n+1}$. Next, based on Mean Absolute Deviation (MAD) we aim to decide if the last point $x_{n+1}$ is outlier or not. First, let us compute the MAD: ...
1
vote
0answers
137 views

95% confidence ellipse using Hotelling T-squared in a score plot (R)

I am trying to draw a 95% confidence ellipse using Hotelling T-squared in a score plot of two principal components from a PCA. I have checked that: http://stackoverflow.com/questions/42637860/pca-...
2
votes
0answers
53 views

Resources for learning the time series stuff they don’t (or didn’t) teach you

I at one point, a long time ago, had two years of graduate econometrics focusing on time series, plus more on micro cross-section techniques. I haven’t made much use of the time-series stuff for a ...
2
votes
0answers
11 views

How Do I Detect Outliers From Clustered Data Points?

I have a data set on a single variable (say, x). Below is the point plot of the data. From the plot it is seen that the data points form a few clusters around values, say 4, [1.5,2] and 0. Can we say that ...
1
vote
1answer
29 views

Correlation being muddied by outliers

I have a study in which I find a decent correlation: on a quadratic prediction plot between a binary outcome and a continuous x. However, there are a few observations that have numbers that are not ...
0
votes
1answer
17 views

Discordance between various methods of multivariate outliers detection

Here is a small "toy example" dataset, with 15 individuals described by 6 variables (this is R language): ...
0
votes
0answers
39 views

For outliers treatment, clipping, winsorizing or removing?

I came across three different techniques for treating outliers winsorization, clipping and removing: Winsorizing: Consider the data set consisting of: {92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, ...
0
votes
0answers
11 views

Interpreting Rlof results

I'm following a tutorial on R. Could someone help me interpret the confusion matrix from the Rlof results? ...
0
votes
1answer
37 views

Outlier detection using the difference between two z-scores

Long story short: Can you use the difference in z-scores of two variables as an outlier detector? I have this data set which had poor quality data. Lots of measurement/human error and probably also ...
1
vote
1answer
53 views

Fix wrong data coming from a sensor

I have data coming from a sensor that I store in a time series. When I graph them, I obtain: These data are supposed to be "continuous", like temperatures, not going up and down so fast. After ...
1
vote
0answers
14 views

Asymmetric robust regression

What are the methods for robust regression with asymmetric distribution of outliers? I am specifically interested in equivalents of Huber and Tukey M-estimators. However, asymmetric heavy-tailed ...
1
vote
0answers
28 views

Dealing with outliers when Inter Quartile Range is 0

I am working with Classification Machine Learning problems and have come across a problem where I have 0 IQR for my data. No matter what technique I use, ...
0
votes
0answers
14 views

Detecting the presence of outliers, while ignoring pure noise

I would like to detect peaks in a time series, but all too often a bunch of noise gets picked up as well, and fools most algorithms I throw at it. Often, to get it to work, I need to tweak the ...
0
votes
0answers
12 views

Outlier detection with EM

I am interested in using expectation maximization for outlier detection. In the literature this is usually done assuming that the data of interest are normally distributed while the outliers are ...
0
votes
1answer
15 views

When to use RANSAC?

Does it make sense to use RANSAC-type algorithms (RANSAC, MSAC, MLESAC, etc.) for small data sets (20-30 points)? On the one hand, all the points need to be accounted for and this can be done with ...
1
vote
1answer
12 views

Invertibility of the covariance matrix when the number of training examples is less than the number of features

I was trying to study an outlier detection algorithm and realized that in case we use a multivariate Gaussian distribution to model data then the invertibility of the covariance matrix ($\Sigma$) is ...
6
votes
2answers
181 views

ANOVA: life after rejecting the null hypothesis

I have multiple groups of data (20+ groups) and test the null hypothesis that they have the same mean. How do I proceed after the null hypothesis is rejected? What is the standard method for selecting ...
1
vote
0answers
17 views

Determine outliers for robust Mahalanobis distance

I want to apply a robust Mahalanobis distance and found an implementation in scikit: http://scikit-learn.org/stable/auto_examples/covariance/plot_mahalanobis_distances.html but there is the number of ...
1
vote
0answers
14 views

Detecting multivariate outliers in multiple regression

I am running a multiple regression and wish to screen for outliers. I have the variables Y (outcome), and predictors X1, X2, X3. For univariate outliers, I can check each variable for z-scored ...
1
vote
1answer
16 views

How can a neural net recognize it is out of its training domain?

In a recent kaggle competition with a huge overfitting potential the winning team first searched for features on the training data and after the feature engineering used a Kolmogorov-Smirnov Test for ...
1
vote
0answers
59 views

Robust common mean inference

In an ANOVA-like setting I have several groups of variables that I expect to have the same mean, $\mu$. Quite often some of the groups would have shifted means, $\mu + \Delta$, due to effects beyond ...
2
votes
0answers
42 views

What happens when we model outliers as dummies in a VAR-system?

I am wondering what is going on "under the hood", or intuitively what the implication is, of modelling outliers as dummies in a VAR-model. To make this question clearer I will provide an example. ...
