# Questions tagged [data-transformation]

Mathematical re-expression, often nonlinear, of data values. Data are often transformed either to meet the assumptions of a statistical model or to make the results of an analysis more interpretable.

1,842 questions
6 views

### Is there a guide to decide which transformation to choose for different scenarios/types of data and distribution?

1) How do I decide which transformation or scaling to use before passing our data into a machine learning model? Can someone please guide me on which transformation to use in different situations? There ...
5 views

### What is the best way to convert a graded scale (A to G) to a numeric scale to be used in a composite index?

I'm creating a composite index and one of my indicators ranks countries in terms of grades (A, B, C, D, E, F, G). The grades come from a purely qualitative (but thorough) analysis which does not ...
12 views

### Should I impute the missing values of timeseries data?

I have the following task - predicting the next 12 hours of PM10 particles based on historical data of previous 24 hours of PM10, O3 (ozone), CO (carbon monoxide), and others (not included) using RNN'...
7 views

### Transforming mean-absolutes to mean differences for meta-analysis

For a meta-analysis project, I have been tasked to submit data in Review Manager software. However, often I come across papers that report on average pain or ...
12 views

### Will change in standard deviation impact covariance?

If we increase the standard deviation of one variable, does it affect the covariance of two variables? For example, given two variables A & B, the covariance of A & B is 100, and the ...
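The relationship asked about here can be checked numerically. The sketch below is illustrative only: the data, the helper functions `covariance` and `correlation`, and the factor `c` are assumptions, not part of the question. It covers the case where the standard deviation changes because the variable is rescaled by a constant, in which case the covariance is multiplied by that constant while the correlation stays the same for a positive factor.

```python
import random

def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation built from the covariance above."""
    return covariance(x, y) / (covariance(x, x) ** 0.5 * covariance(y, y) ** 0.5)

random.seed(0)
a = [random.gauss(0, 1) for _ in range(500)]       # made-up variable A
b = [0.5 * ai + random.gauss(0, 1) for ai in a]    # made-up variable B, related to A

c = 3.0                                            # hypothetical scale factor
a_scaled = [c * ai for ai in a]                    # sd of A is multiplied by c

cov_ab = covariance(a, b)
cov_scaled = covariance(a_scaled, b)               # equals c * cov_ab up to rounding
corr_ab = correlation(a, b)
corr_scaled = correlation(a_scaled, b)             # unchanged by the rescaling
```

If the standard deviation changes for another reason, such as added independent noise, the covariance need not change at all, so the answer depends on how the spread is altered.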
20 views

### Multiple Regression Analysis Beginner

Background: I am using an instrument that measures two physical properties, X1~Temperature and X2~Velocity. When gathering the data to make the curve, a set of predetermined concentrations are chosen ...
13 views

### Multiple Regression Calibration Curve

Background: I am using an instrument that measures two physical properties, X1~Temperature and X2~Velocity. When gathering the data to make the curve, a set of predetermined concentrations are chosen ...
28 views

### Where does the Box-Cox Transformation actually come from?

I'm trying to figure out where the actual Box-Cox transformation comes from. I've looked at the original paper, and some of its references, but for the most part, it seems that they just drop the ...
18 views

### About scaling of data in political science

Sometimes we will see a survey about social and political opinions, where the author tries to combine the polling results, fit them to a curve, and draw some conclusions. Let's say ...
41 views

### How to prove a multivariate r.v. does not follow the nonparanormal distribution?

Background: You may find the definition of the nonparanormal distribution in the 2nd paragraph on p. 2296 of this paper. In short, $(X_1, \ldots, X_p)$ is nonparanormal if there exists a set of ...
5 views

### Reason for transformation of b variable in Boston Housing dataset

In the Boston Housing dataset (see http://www.rdocumentation.org/packages/mlbench/versions/2.1-1/topics/BostonHousing for details), one of the variables is $b = 1000(B - 0.63)^2$ where $B$ is the ...
4 views

### Pivot table where I have two time-series mixed [closed]

I have a data frame where I have two codes a,b that are represented in time-series like this ...
16 views

### Regression: Is it bad practice to use log difference as approximation for % difference when changes are large?

I'm running a vector autoregression model with quarterly IPOs as one of the variables. Since the number of IPOs isn't stationary, I took the log first difference to make it stationary. However, I ...
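To make the concern concrete, here is a small sketch comparing the exact proportional change with the log difference. The function names and the example values are hypothetical, chosen only to show how the gap grows with the size of the change.

```python
import math

def pct_change(old, new):
    """Exact proportional change, e.g. 0.5 means +50%."""
    return (new - old) / old

def log_diff(old, new):
    """Log first difference, log(new) - log(old)."""
    return math.log(new) - math.log(old)

# Hypothetical quarter-to-quarter counts: a small, a moderate, and a large change.
for old, new in [(100, 102), (100, 150), (100, 300)]:
    exact = pct_change(old, new)
    approx = log_diff(old, new)
    print(f"{old} -> {new}: exact={exact:.4f}, log diff={approx:.4f}, gap={exact - approx:.4f}")
```

For the 2% change the two measures agree to about three decimal places, while for a tripling the log difference (about 1.10) is far from the exact change (2.00). A log difference can always be converted back exactly with `exp(d) - 1`.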
29 views

### How can I include extremely large outliers in analytics?

Like most of us stuck at home, I'm taking time to get back up to speed with machine learning with some pet projects and one of my projects includes trying to use machine learning to predict missing ...
9 views

### Regression interpretation after transformation of independent and dependent variable [duplicate]

How do I interpret the regression output (coefficients), when I have transformed one of the independent variables (lg10) and have transformed the dependent variable (sqrt) as well?
21 views

### Transformation and linear regression

I'm running a multivariate regression to analyze the relationship between two variables, adjusted by other remarkable variables (based on previous data). My hypothesis is that their relationship is ...
32 views

### Do I use the mean vector from my training set to center my testing set when dimension reducing for classification?

Please let me know if this is the right place to ask this (or if any of my tags are wrong) or if I need to write this any differently. Do I use the mean vector from my training set to center my ...
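As a minimal sketch of one common convention (not necessarily the asker's setup), the mean vector is estimated on the training set only and then reused to center the test set, so that no information from the test data leaks into the preprocessing. The helper names and the toy matrices are assumptions for illustration.

```python
def column_means(rows):
    """Mean of each column in a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def center(rows, means):
    """Subtract the given mean vector from every row."""
    return [[value - m for value, m in zip(row, means)] for row in rows]

# Toy data: three training rows and one test row, two features each.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
X_test = [[2.5, 11.0]]

train_means = column_means(X_train)        # estimated on the training set only
X_train_c = center(X_train, train_means)   # training columns now have mean zero
X_test_c = center(X_test, train_means)     # test set reuses the training means
```

Any projection learned afterwards (for example, principal components fitted on `X_train_c`) would then be applied to `X_test_c` in the same way.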
19 views

### What to do when a value in the testing set is bigger than the max value used to min-max normalize the training set when building a histogram classifier

Please let me know what to do when a value in the testing set is bigger than the max value used to min-max normalize the training set when building a histogram classifier. Do I go back and change ...
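The situation can be reproduced with a few numbers. In this sketch the values are made up, and clipping to the training range is shown as one possible option under that assumption rather than a recommended answer; whether it suits a histogram classifier depends on how the bins are defined.

```python
def min_max_scale(values, lo, hi):
    """Scale values using a fixed minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]

def clip(values, lower=0.0, upper=1.0):
    """Force every value into the interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

train = [2.0, 4.0, 6.0, 10.0]
test = [3.0, 12.0]                       # 12.0 exceeds the training maximum

lo, hi = min(train), max(train)          # fitted on the training set only
scaled_test = min_max_scale(test, lo, hi)    # [0.125, 1.25]: second value falls outside [0, 1]
clipped_test = clip(scaled_test)             # [0.125, 1.0]: out-of-range value lands in the top bin
```

Refitting the minimum and maximum on the test data would change the scaling the classifier was trained with, which is why the training-set range is kept fixed here.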
20 views

10 views

### How to use target encoding: expanding mean on the test set

The expanding mean is a way to prevent overfitting when performing target encoding. But what I do not understand is how to use ...
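For readers unfamiliar with the technique, a minimal sketch follows. The function name, the toy data, and the handling of the test set are assumptions: encoding test rows with the per-category means from the full training set, and falling back to the global training mean for unseen categories, is one common convention and not necessarily the answer the question received.

```python
from collections import defaultdict

def expanding_mean_encode(cats, targets, prior):
    """Encode each row with the mean target of EARLIER rows in the same category.

    Rows with no earlier occurrence of their category get `prior`.
    Also returns the per-category means over all rows, for later use.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for cat, y in zip(cats, targets):
        encoded.append(sums[cat] / counts[cat] if counts[cat] else prior)
        sums[cat] += y          # update only after encoding, so a row never sees its own target
        counts[cat] += 1
    category_means = {cat: sums[cat] / counts[cat] for cat in counts}
    return encoded, category_means

train_cats = ["a", "b", "a", "a", "b"]
train_y = [1, 0, 0, 1, 1]
prior = sum(train_y) / len(train_y)      # global training mean, 0.6

train_enc, category_means = expanding_mean_encode(train_cats, train_y, prior)
# train_enc is [0.6, 0.6, 1.0, 0.5, 0.0]

# Test rows have no targets, so they are mapped to the full training means (assumed convention).
test_cats = ["a", "b", "c"]
test_enc = [category_means.get(cat, prior) for cat in test_cats]
```

The result depends on row order, so in practice the training rows are usually shuffled, or ordered by time, before the expanding mean is computed.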
25 views

### Right skewed distribution of a continuous variable with outliers: replace outliers with mode or median of that column?

When I replace my outliers with the median value of that column/feature, my mode for that column/feature also changes. Is that correct?
23 views

### Mean and variance preserving skewness 'spread'

This is essentially a request for references in case what I am describing is studied somewhere, to avoid trying to come up with the machinery myself. Heuristically, what I want to do is take some ...
29 views

### R - transpose dataframe from existing data frame and convert it to time-series [closed]

I'm beginning with R and I would like to transpose the following data frame into another dataframe with the column names being the company names and the vector values for each column (company names) ...
28 views

### How to adjust/normalize/standardize mean? [closed]

I am making a reviews/ratings section for a website, with ratings that range from 0-5 stars. I am not confident that the users of this system will all have the same idea of what these stars mean, so I'...
25 views

### Should I use log transformed pharmacokinetic data or use GLM gamma regression with log link?

I was taught that when we deal with data of multiplicative nature, following the log-normal distribution, like in pharmacokinetic analyses, we should log the data first to enable classic parametric ...
46 views

### How to reduce kurtosis of data

I'm trying to reduce the kurtosis of my dataset and make it approximately Gaussian, with a common-sense uni-modal shape. The raw data looks like this: I first tried ...
19 views

### I need to normalize this distribution, but cannot identify it

I have this distribution that I need to normalize for comparison between sub-populations. I thought it might be lognormal, but the kurtosis of the log product is still very high. How do I go about ...