
Bootstrap Tests for Beginners

Non-parametric tests for beginners: Part 2



In Part 1 of this series, I presented simple rank and sign tests as an introduction to non-parametric tests. As mentioned there, the bootstrap is also a popular non-parametric method for statistical inference, based on re-sampling of observed data. It has gained wide popularity (especially in academia) since Bradley Efron introduced it in 1979. Efron and Tibshirani (1994) provide an introductory and comprehensive survey of the bootstrap method. Its applications are extensive across the statistical sciences, with the above book attracting more than 50,000 Google Scholar citations to date.


In this post, I present the bootstrap method for beginners in an intuitive way, with simple examples and R code.


Introduction

As mentioned in Part 1, the key elements of hypothesis testing include

  1. The null and alternative hypotheses (H0 and H1)

  2. Test statistic

  3. Sampling distribution of the test statistic under H0

  4. Decision rule (p-value or critical value, at a given level of significance)

In generating the sampling distribution of a test statistic,

  • the parametric tests (such as the t-test or F-test) assume that the population follows a normal distribution. If the population is non-normal, then a normal distribution is used as an approximation to the sampling distribution, by virtue of the central limit theorem (called the asymptotic normal approximation);

  • the rank and sign tests use the ranks and signs of the data points to generate the exact sampling distribution, as discussed in Part 1;

  • the bootstrap generates or approximates the sampling distribution of a statistic by resampling the observed data (with replacement), mimicking the way samples are drawn randomly and repeatedly from the population.

  • As with the rank and sign tests, the bootstrap requires neither normality of the population nor the asymptotic normal approximation based on the central limit theorem.

  • In its basic form, the bootstrap requires pure random sampling from a population with fixed mean and variance (without normality), although there are bootstrap methods applicable to dependent or heteroskedastic data.

In this post, the basic bootstrap method for data generated randomly from a population is presented with examples. Bootstrap methods for more general data structures are briefly described, along with R resources, in a separate section.


Toy Examples for the Bootstrap

Example 1: X = (1, 2, 3)

Suppose a researcher observes a data set X = (1, 2, 3) with a sample mean of 2 and a sample standard deviation (s) of 1. Assuming a normal population, the sampling distribution of the sample mean (Xbar) under H0: μ = 2 is

Xbar ~ N(μ, s²/n) = N(2, 1/3),

where s = 1, n = 3, and μ is the population mean. That is, under the normal approximation, the sample mean follows a normal distribution with mean 2 and variance 1/3.

The bootstrap resamples the observed data X = (1, 2, 3) with replacement, giving equal probability of 1/3 to each of its members. Table 1 below presents all 27 possible outcomes of these resamples (or pseudo-data) X* = (X1*, X2*, X3*), with the mean value of each outcome.


Table 1: Sampling with Replacement from X (Image Created by the Author)

The mean of these 27 outcomes is 2 and their variance is 2/9 ≈ 0.22. The distribution of the sample means from these X*'s represents the exact bootstrap distribution, which is plotted in Figure 1 below:


Figure 1: Exact bootstrap distribution and its density estimate (Image Created by the Author)
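
For a data set this small, the exact bootstrap distribution can be computed directly by enumeration. Below is a minimal R sketch of this calculation, which reproduces the numbers behind Table 1 and Figure 1:

# Exact bootstrap distribution for X = (1, 2, 3)
x = c(1, 2, 3)
resamples = expand.grid(x, x, x)   # all 3^3 = 27 equally likely resamples
boot.means = rowMeans(resamples)   # sample mean of each resample
mean(boot.means)                   # 2
mean((boot.means - 2)^2)           # 2/9, the exact bootstrap variance
plot(density(boot.means), main = "Exact bootstrap distribution")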

Example 2 repeats the same exercise for a data set that is asymmetric around its mean. From these two examples, we can state the following points:

  • Example 1 is the case where the data set X is exactly symmetric around its mean. The bootstrap sampling distribution of the sample mean is also symmetric and is well approximated by a normal distribution.

  • Example 2 is the case where the data set X is asymmetric around its mean, which is well reflected in the shape of the bootstrap sampling distribution. However, the normal distribution is unable to reflect this asymmetry.

  • Given that the population distribution is unknown in these examples, it is difficult to assess whether the bootstrap distribution is a better representation of the true sampling distribution of the sample mean.

  • However, we observe that the bootstrap has the ability to reflect possible asymmetry in the population distribution, which the asymptotic normal approximation is unable to capture.

Note that the bootstrap is able to capture many non-normal properties of a population, such as asymmetry, fat tails, and bi-modality, which cannot be captured by a normal approximation.


Many academic studies that compare the bootstrap with the asymptotic normal approximation provide strong evidence that the bootstrap in general performs better at capturing the features of the true sampling distribution, especially when the sample size is small. They report that, as the sample size increases, the two methods show similar properties. This means that the bootstrap should in general be preferred when the sample size is small.


Bootstrapping for X = (X1, …, Xn)

The above toy examples present the case where n = 3, where we are able to obtain the exact bootstrap distribution from all 27 possible resamples. Since the number of all possible resamples is nⁿ, calculating the exact bootstrap distribution in this way quickly becomes computationally prohibitive as n grows. However, this is not necessary, because a Monte Carlo simulation can provide a fairly accurate approximation to the exact bootstrap distribution.


Suppose the data X are obtained randomly from a population with fixed mean and variance, and suppose the statistic of interest, such as the sample mean or t-statistic, is denoted T(X). Then,

  1. We obtain X* = (X₁*, …, Xₙ*) by resampling with replacement from X, giving equal probability to each member of X.

  2. Since we cannot do this for all possible nⁿ resamples, we repeat the above step B times, where B is sufficiently large (such as 1,000, 5,000, or 10,000). By doing this, we have B different sets of X*, written {X*(i)}, i = 1, …, B.

  3. From each X*(i), the statistic of interest is calculated. We then have {T(X*,i)} (i = 1, …, B), where T(X*,i) is T(X*) calculated from X*(i).

The bootstrap distribution {T(X*,i)} is used as an approximation to the exact bootstrap distribution, as well as to the unknown sampling distribution of T.


As an example, I have generated X = (X1, …, X20) from

  • the F-distribution with 2 and 10 degrees of freedom [F(2,10)],

  • chi-squared with 3 degrees of freedom [chisq(3)],

  • Student-t with 3 degrees of freedom [t(3)], and

  • log-normal distribution with mean 0 and variance 1 (lognormal).

Figure 3 below plots the density estimates of {T(X*,i)} (i = 1, …, B), where T is the sample mean and B = 10000, in comparison with the densities of normal distributions whose mean and variance match those of X. The bootstrap distributions can be quite different from the normal distribution, especially when the underlying population departs substantially from normality.


Figure 3: Bootstrap Distribution (red) vs. Normal Distribution (black) (Image Created by the Author)

The R code for the above Monte Carlo simulations and plots is given below:


n = 20                 # sample size
set.seed(1234)
pop = "lognorm"        # population type
if (pop=="F(2,10)")  x = rf(n, df1=2, df2=10)
if (pop=="chisq(3)") x = rchisq(n, df=3)
if (pop=="t(3)")     x = rt(n, df=3)
if (pop=="lognorm")  x = rlnorm(n)

# Bootstrapping the sample mean
B = 10000              # number of bootstrap iterations
stat = numeric(B)      # storage for the bootstrap statistics {T(X*,i)}
for (i in 1:B) {
  xboot = sample(x, size=n, replace=TRUE)   # resample with replacement
  stat[i] = mean(xboot)                     # statistic of interest
}

# Plots: bootstrap density (red) vs. normal approximation (black)
plot(density(stat), col="red", lwd=2, main=pop, xlab="")
m = mean(x); s = sd(x)/sqrt(n)
curve(dnorm(x, mean=m, sd=s), add=TRUE)
rug(stat)

The bootstrap test and analysis are conducted based on the red curves above, which represent {T(X*,i)}, instead of the normal distributions in black.

  • Inferential statistics such as confidence intervals or p-values are obtained from {T(X*,i)}, in the same way as they are from a normal distribution.

  • The bootstrap distribution can also reveal more detailed information about the properties of the population, such as asymmetry, fat tails, bi-modality, and the presence of outliers.

Suppose T(X) is the sample mean as above.


The bootstrap confidence interval for the population mean can be obtained by taking appropriate percentiles of {T(X*,i)}. For example, let T(X*;θ) be the θth percentile of {T(X*,i)}. Then the 95% bootstrap confidence interval is obtained as the interval [T(X*;2.5), T(X*;97.5)].
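
In R, this is a one-line calculation. Continuing from the simulation code above, where the vector stat holds {T(X*,i)}:

# 95% percentile bootstrap confidence interval for the mean
quantile(stat, probs = c(0.025, 0.975))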


Suppose T(X) is the t-test statistic for H0: μ = 0 against H1: μ > 0. Then the bootstrap p-value is calculated as the proportion of {T(X*,i)} greater than the T(X) value from the original sample. That is, the p-value is calculated analogously to the normal-distribution case, depending on the structure of H1.
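
As an illustration, below is a minimal sketch of this calculation, reusing x, n, and B from the code above. One standard recipe, assumed here, centres the resampled t-statistics at the sample mean so that H0 holds in the bootstrap world:

# Bootstrap p-value for H0: mu = 0 against H1: mu > 0
t.obs = (mean(x) - 0)/(sd(x)/sqrt(n))   # t-statistic from the original sample
t.boot = numeric(B)
for (i in 1:B) {
  xb = sample(x, size=n, replace=TRUE)
  # centre at mean(x) so that the null holds under resampling
  t.boot[i] = (mean(xb) - mean(x))/(sd(xb)/sqrt(n))
}
mean(t.boot >= t.obs)                   # bootstrap p-value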


Table 2: Bootstrap vs. Normal 95% Confidence Intervals (Image Created by the Author)

Table 2 above presents the bootstrap confidence intervals in comparison with the asymptotic normal confidence intervals, both at the 95% level. The two alternatives provide similar intervals when the population distribution is t(3) or chisq(3), but they can be quite different when the population follows the F(2,10) or lognormal distributions.
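
For comparison, the asymptotic normal 95% interval can be computed from the sample mean and its standard error; a minimal sketch:

# Asymptotic normal 95% confidence interval for the mean
m = mean(x); se = sd(x)/sqrt(n)
c(m - 1.96*se, m + 1.96*se)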


Bootstrapping the t-test in R

The bootstrap method can be applied to one-sample and two-sample t-tests. In this case, the test statistic of interest T(X) is the t-test statistic, and its bootstrap distribution can be obtained as above. In R, the package "MKinfer" provides functions for these bootstrap tests.


Let us consider X and Y in the example used in Part 1:

x = c(-0.63, 0.18, -0.84, 1.60, 0.33, -0.82, 0.49, 0.74, 0.58, -0.31,
      1.51, 0.39, -0.62, -2.21, 1.12, -0.04, -0.02, 0.94, 0.82, 0.59)

y = c(1.14, 0.54, 0.01, -0.02, 1.26, -0.29, 0.43, 0.82, 1.90, 1.51,
      1.83, 2.01, 1.37, 2.54, 3.55, 3.99, 5.28, 5.41, 3.69, 2.85)

# Load the MKinfer package (install once with install.packages("MKinfer"))
library(MKinfer)
# One-sample test for X with H0: mu = 0
boot.t.test(x, mu=0)
# One-sample test for Y with H0: mu = 1
boot.t.test(y, mu=1)
# Two-sample test for X and Y with H0: mu(x) - mu(y) = -1
boot.t.test(x, y, mu=-1)

The results are summarized in the table below (all tests assume a two-tailed H1):


Table 3: 95% Confidence Intervals and p-values (Image Created by the Author)

  • To test for μ(X) = 0, the sample mean of X is 0.19 and the t-statistic is 0.93. The bootstrap and asymptotic confidence intervals and p-values provide similar inferential outcomes of failure to reject H0, but the bootstrap confidence interval is tighter.

  • To test for μ(Y) = 1, the sample mean of Y is 1.99 and the t-statistic is 2.63. The bootstrap and asymptotic confidence intervals and p-values provide similar inferential outcomes of rejecting H0 at the 5% significance level, but the bootstrap confidence interval is tighter with a lower p-value.

  • To test for H0: μ(X) − μ(Y) = −1, the mean difference between X and Y is -1.80 and the t-statistic is -1.87. The bootstrap and asymptotic confidence intervals and p-values provide similar inferential outcomes of rejecting H0 at the 10% significance level.

Bootstrap methods for more general data structures

As mentioned above, bootstrap methods have also been developed for linear regression models, time series forecasting, and data with more general structures. Several important extensions are summarized below:

  • For the linear regression model, the bootstrap can be conducted by resampling the residuals or by resampling the cases: see the "car" package in R and the sketch after this list.

  • The bootstrap can be applied to time series forecasting based on autoregressive models: see the "BootPR" package in R.

  • For time series data with an unknown structure of serial dependence, the stationary bootstrap (or the moving block bootstrap) may be used. This involves resampling blocks of time series observations. The R package "tseries" provides a function for this method.

  • For data with heteroskedasticity of unknown form, the wild bootstrap can be used (see the R package "fANCOVA"). It resamples the data by scaling each observation with a random variable of zero mean and unit variance, so that the heteroskedastic structure is effectively replicated.
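
As an example of the first extension above, here is a minimal sketch of case and residual resampling for a linear regression using the Boot() function of the "car" package. Here dat, with columns y and x, is a hypothetical placeholder for your own data frame:

library(car)
# dat is a hypothetical data frame with response y and regressor x
fit = lm(y ~ x, data = dat)
boot.case  = Boot(fit, R = 1999, method = "case")      # resample (y, x) cases
boot.resid = Boot(fit, R = 1999, method = "residual")  # resample residuals
confint(boot.case)   # bootstrap confidence intervals for the coefficients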

Conclusion

This post has reviewed the bootstrap method, a non-parametric approach in which repeated resampling of the observed data is used to calculate or approximate the sampling distribution of a statistic. Although this post covers only bootstrap confidence intervals and p-values for tests of the population mean, the applications of the bootstrap are extensive, ranging from regression analysis to time series data with unknown dependence structures.


Many academic studies have reported theoretical or computational results showing that the bootstrap test often outperforms the asymptotic normal approximation, especially when the sample size is small or moderate.


Hence, in small samples, researchers in statistics and machine learning are strongly recommended to use the bootstrap as a useful alternative to conventional statistical inference based on the asymptotic normal approximation.

