Logarithmic Transformation for Beginners

Unit-free interpretation of association and other benefits

In statistical and machine learning models, variables are often transformed to their natural logarithms. This has a number of benefits, including

unit-free interpretation of the relationship;
variance stabilization;
mitigating the effect of outliers;
linearization; and
closeness to normality.

In this post, I explain the above points in detail with an application. The data and R code are available from here.

1. Change and slope of a function

Consider, for simplicity, Y = 1 + 2X, where Y is the response variable and X is the input variable. We are often interested in how much Y changes in response to a change in X. Let Δ denote the change operator. That is,

ΔY = Y1 - Y0: change of Y from Y0 to Y1; and

ΔX = X1 - X0: change of X from X0 to X1.

Suppose, with our example (Y = 1 + 2X), X changes from 1 to 3. Then, in response, Y changes from 3 to 7. That is, ΔY = 4 and ΔX = 2.

The slope (or derivative) measures how much Y changes in response to one-unit change of X. It is defined as

β ≡ ΔY/ΔX,

and β = 2 in our example. A slope coefficient that we encounter in a linear regression or a machine learning model has the same interpretation. The slope is a standardized measure, but it is unit-dependent. That is, interpreting a slope coefficient requires careful consideration of the units of Y and X.
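The unit dependence of the slope can be checked with a few lines of Python (a minimal sketch, separate from the R code that accompanies this post). The metres-to-centimetres rescaling is a hypothetical illustration; the post does not assign units to X.

```python
def slope(x0, x1, f):
    """Return the slope ΔY/ΔX of f between x0 and x1."""
    return (f(x1) - f(x0)) / (x1 - x0)

def y_of_x(x):
    # The example from the text: Y = 1 + 2X
    return 1 + 2 * x

# X changes from 1 to 3, so ΔY = 4 and ΔX = 2
beta = slope(1, 3, y_of_x)

# Hypothetical unit change: the same X, now recorded in centimetres
# instead of metres (x_cm = 100 * x). The relationship is unchanged.
def y_of_x_cm(x_cm):
    return 1 + 2 * (x_cm / 100)

beta_cm = slope(100, 300, y_of_x_cm)

print("slope with X in original units:", beta)
print("slope with X rescaled by 100:  ", beta_cm)
```

The relationship between Y and X is identical in both cases, yet the slope shrinks by a factor of 100 after the rescaling, which is why a slope cannot be read without knowing the units.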

2. Change and slope of the logarithmic function

Consider the function Y = log(X), where log() denotes the natural logarithm.

As plotted above, the function provides a monotonic transformation of X into a smaller scale, applicable for X > 0.

The function has a special property: its slope at any point X is 1/X. That is,

dY/dX = 1/X.

This means that, for a small change in X, the change in Y is approximately ΔX/X, which is the proportional change in X.

As an example, suppose X has increased from 2000 to 2010 (0.5% increase).

As the above table shows, this means

Δlog(X) = log(2010) - log(2000) = 7.606 - 7.601 = 0.005,

which is approximately equal to (X1 - X0)/X0 = (2010 - 2000)/2000 = 0.005.

That is, 100Δlog(X) ≈ 100ΔX/X, which measures the % change of X at a given point of X. The approximation is accurate when the change is small.

In general, for any variable Z, 100Δlog(Z) ≈ 100ΔZ/Z, which measures the % change of Z at a given point of Z.
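The following Python sketch reproduces the 2000 to 2010 example and adds a second, larger change (2000 to 3000, my own choice for illustration) to show where the approximation starts to break down.

```python
import math

def log_change(x0, x1):
    """Δlog(X) = log(X1) - log(X0), using the natural logarithm."""
    return math.log(x1) - math.log(x0)

def prop_change(x0, x1):
    """Proportional change ΔX/X = (X1 - X0)/X0."""
    return (x1 - x0) / x0

# The example from the text: a 0.5% increase
small_log = log_change(2000, 2010)
small_prop = prop_change(2000, 2010)

# A much larger change, chosen here only for illustration: a 50% increase
large_log = log_change(2000, 3000)
large_prop = prop_change(2000, 3000)

print(f"2000 -> 2010: 100*dlog(X) = {100 * small_log:.3f}, 100*dX/X = {100 * small_prop:.3f}")
print(f"2000 -> 3000: 100*dlog(X) = {100 * large_log:.3f}, 100*dX/X = {100 * large_prop:.3f}")
```

For the small change the two measures agree to about two decimal places of a percentage point; for the 50% increase the log difference is only about 40.5, so the "log difference equals percentage change" reading should be reserved for small changes.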

3. log transformation in a linear equation

As a result of the above-mentioned property of the logarithmic function, the log-transformed regressions can be used for a unit-free interpretation of a relationship, as the following table shows:

Table 1

Case 1: neither Y nor X is transformed to natural logarithm. In this case, the slope coefficient β measures how much Y changes in response to a one-unit change in X. That is, its interpretation depends on the units of Y and X.
Case 2: both Y and X are transformed to natural logarithm. The slope coefficient in this case measures the percentage change in Y in response to a 1% change in X. This measure is called the elasticity of Y with respect to X, a unit-free measure of association widely used in economics.
Case 3: only X is transformed to natural logarithm. In this case, the slope coefficient is interpreted as a change of β/100 units in Y in response to a 1% change in X.
Case 4: only Y is transformed to natural logarithm. In this case, the slope coefficient is interpreted as a 100β% change in Y in response to a one-unit change in X.

Case 2 is useful when both Y and X are continuous variables in different units. Case 3 may be useful when Y takes a negative value or when Y is already expressed in percentage. Case 4 may be used when X is an indicator variable or a discrete variable. Hence, which case to take is up to the researcher, depending on the context of the research.
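To see why the Case 2 slope is unit-free, the sketch below fits a simple least-squares line to synthetic data, once in levels and once in logs, before and after rescaling X. Everything here is an assumption made for illustration: the data are simulated with an elasticity of 0.8, the rescaling factor of 1000 is arbitrary, and `ols_slope` is a hand-written helper, not a library function.

```python
import math
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (single regressor)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(0)

# Synthetic data with a true elasticity of 0.8: Y = 5 * X^0.8 * noise
xs = [random.uniform(50, 500) for _ in range(500)]
ys = [5 * x ** 0.8 * math.exp(random.gauss(0, 0.1)) for x in xs]

# Hypothetical unit change: the same X recorded in units 1000 times smaller
xs_rescaled = [1000 * x for x in xs]

# Case 1 (levels): the slope depends on the units of X
level_slope = ols_slope(xs, ys)
level_slope_rescaled = ols_slope(xs_rescaled, ys)

# Case 2 (log-log): the slope is the elasticity and ignores the units
log_ys = [math.log(y) for y in ys]
loglog_slope = ols_slope([math.log(x) for x in xs], log_ys)
loglog_slope_rescaled = ols_slope([math.log(x) for x in xs_rescaled], log_ys)

print("levels: ", level_slope, level_slope_rescaled)
print("log-log:", loglog_slope, loglog_slope_rescaled)
```

Rescaling X divides the levels slope by 1000 but leaves the log-log slope unchanged, because log(1000X) = log(1000) + log(X) only shifts the intercept.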

4. Other benefits of logarithmic transformation

The scale-down effect of the transformation can bring other benefits, which can deliver a more accurate or reliable estimation of the relationship.

When Y and X are in large numbers, the variability of estimation can be excessive. The log transformation monotonically transforms the data into a smaller scale, with a much smaller variability, which in turn can reduce the variability of estimation.
In this process, the effect or influence of outliers can be substantially mitigated.
As a result, the intrinsic relationship can be better revealed with improved linearity than otherwise.
The transformed data can be closer to a normal distribution.
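The scale-down effect can be illustrated with simulated data. The sketch below draws a right-skewed, price-like sample (a log-normal distribution with parameters I picked arbitrarily; it is not the Chicago data) and compares two simple summaries before and after taking logs: the coefficient of variation, and how far the largest observation sits from the median.

```python
import math
import random
import statistics

random.seed(1)

# Synthetic right-skewed sample; the parameters 12 and 0.8 are arbitrary
raw = [random.lognormvariate(12, 0.8) for _ in range(5000)]
logged = [math.log(v) for v in raw]

def coef_of_variation(values):
    """Standard deviation relative to the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

def max_to_median(values):
    """How many times larger the most extreme value is than the median."""
    return max(values) / statistics.median(values)

raw_cv, log_cv = coef_of_variation(raw), coef_of_variation(logged)
raw_ratio, log_ratio = max_to_median(raw), max_to_median(logged)

print(f"coefficient of variation: raw = {raw_cv:.3f}, log = {log_cv:.3f}")
print(f"max / median:             raw = {raw_ratio:.1f}, log = {log_ratio:.2f}")
```

On the raw scale the largest value is many times the median, so it can dominate a least-squares fit; on the log scale it is only modestly above the median, which is the sense in which the influence of outliers is mitigated.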

5. Application

I have selected a data set for Chicago house price from Kaggle, which can be accessed from here. The variables include

Price: price of house
Bedroom: number of bedrooms
Space: size of house (in square feet)
Lot: width of a lot
Tax: amount of annual tax
Bathroom: number of bathrooms
Garage: number of garages
Condition: condition of house (1 if good, 0 otherwise)

Note that the units of Price, Lot and Tax variables are not provided in the data source.

Figure 1 below presents Q-Q plots of the Price variable and log(Price).

Figure 1

The blue straight line is the reference line where the sample quantiles exactly match those of a normal distribution, and the blue band indicates a 95% confidence band for the sample quantiles. If a variable follows a normal distribution, its sample quantiles should lie close to the reference line. Deviations from the reference line are statistically negligible at the 5% level of significance if they lie within the 95% confidence band.

As is clear from Figure 1, the Price variable shows a degree of departure from normality, with a number of sample quantiles outside the 95% confidence band. However, log(Price) has nearly all of its sample quantiles within this band, indicating that the variable becomes closer to normality as a result of the logarithmic transformation.
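A numerical counterpart to the Q-Q plot is the correlation between the sorted sample and the matching normal quantiles: the closer to 1, the closer the points lie to a straight line. The sketch below is an illustration on simulated log-normal "prices", not on the Chicago data, and it does not reproduce the confidence band in Figure 1. The helper `qq_correlation` and the plotting positions (i - 0.5)/n are my own choices.

```python
import math
import random
from statistics import NormalDist, mean

def qq_correlation(values):
    """Correlation between sorted sample values and standard normal quantiles."""
    n = len(values)
    sample_q = sorted(values)
    # Plotting positions (i - 0.5)/n: one common convention among several
    normal_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    ms, mn = mean(sample_q), mean(normal_q)
    cov = sum((s - ms) * (q - mn) for s, q in zip(sample_q, normal_q))
    var_s = sum((s - ms) ** 2 for s in sample_q)
    var_n = sum((q - mn) ** 2 for q in normal_q)
    return cov / math.sqrt(var_s * var_n)

random.seed(2)

# Simulated right-skewed prices; the parameters 4 and 0.6 are arbitrary
prices = [random.lognormvariate(4, 0.6) for _ in range(500)]
log_prices = [math.log(p) for p in prices]

qq_raw = qq_correlation(prices)
qq_log = qq_correlation(log_prices)

print(f"Q-Q correlation, raw prices: {qq_raw:.4f}")
print(f"Q-Q correlation, log prices: {qq_log:.4f}")
```

A higher correlation after the transformation matches what the Q-Q plots show visually: the logged variable tracks the normal reference line more closely.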

Figure 2

Figure 2 above shows the scatter plots of Price against Tax; and log(Price) against log(Tax). With the former, one may argue that the relationship is non-linear, with the presence of several outliers. With the log transformation, the effect of these outliers looks substantially diminished, and the relationship may well be considered to be linear.

Now I run the regression of Price against all other variables as explanatory variables.

Model 1: all variables are included as they are; and
Model 2: all continuous variables (Price, Space, Lot, Tax) are transformed to natural logarithm, while other (discrete) variables are included as they are.

The regression results are tabulated in Table 2 below:

Table 2

Both models show sufficiently large R² values of more than 0.70. However, the two values are not comparable because the dependent variables are in different scales.
Model 2 has all coefficients statistically significant at the 5% level. In contrast, Model 1 has two coefficients (those of Tax and Condition) that are statistically insignificant at a conventional level of significance, although the associated variables are economically important.
In Model 1, the coefficient of Tax is small and statistically insignificant; but that of log(Tax) in Model 2 is large and statistically significant. This may be closely related to the observation made in the scatter plots in Figure 2, in relation to linearization by logarithmic transformation.

To interpret the Space coefficient,

from Model 1, a house with 100-square-foot extra space is expected to have a higher price by 1.3 units (other factors being held constant): see Case 1 in Table 1; and
from Model 2, a house with 10% larger space is expected to have a higher price by 1.63% (other factors being held constant): see Case 2 in Table 1.

To interpret the Bathroom coefficient,

from Model 1, a house with an extra bathroom is expected to have a higher price by 7.251 units (other factors being held constant);
from Model 2, a house with an extra bathroom is expected to have a higher price by 13.3% (=100 × 0.133): see Case 4 in Table 1.
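One caveat on the Case 4 reading: 100β is an approximation that works best when β is small. When only Y is logged, the exact percentage change in Y for a one-unit change in X is 100 × (exp(β) - 1). The Python sketch below compares the two for the Bathroom coefficient of 0.133 quoted above; the function names are my own.

```python
import math

def approx_pct_change(beta):
    """Case 4 rule of thumb: 100 * beta percent."""
    return 100 * beta

def exact_pct_change(beta):
    """Exact percentage change in Y when log(Y) rises by beta."""
    return 100 * (math.exp(beta) - 1)

# Coefficient of Bathroom in Model 2, as quoted in the text
bathroom_beta = 0.133

approx = approx_pct_change(bathroom_beta)
exact = exact_pct_change(bathroom_beta)

print(f"approximate: {approx:.1f}%")
print(f"exact:       {exact:.1f}%")
```

The gap is about one percentage point here (roughly 14.2% against 13.3%), and it widens as the coefficient grows, so for large coefficients the exact formula is the safer one to report.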

Other coefficients can also be interpreted in a similar manner. With logarithmic transformation, the researchers can have unit-free interpretation of association, which is a lot easier to understand and interpret.

To conclude, a logarithmic transformation is useful in statistical modelling and machine learning methods. It is strongly recommended when the researcher wishes to have a unit-free interpretation of association. It is also useful when the units of data are unknown or difficult to compare. In addition, it can bring a range of benefits for more accurate and reliable estimation of the model and its parameters.

Jae H. Kim