Robust Estimation of Average Treatment Effects with Observational Studies

Li XIAO; Peichao YU

doi:10.1051/wujns/2024292117

Open Access

Issue		Wuhan Univ. J. Nat. Sci. Volume 29, Number 2, April 2024


Page(s)		117 - 124
Published online		14 May 2024

Wuhan University Journal of Natural Sciences, 2024, Vol.29 No.2, 117-124

Mathematics

CLC number: O212.1


Li XIAO¹ and Peichao YU²^†

¹ Department of Physical Education, Guilin University of Aerospace Technology, Guilin 541004, Guangxi, China
² School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, Hubei, China

^† Corresponding author.

Received: 25 August 2023

Abstract

Estimating treatment effects has long been a central topic in empirical research. Heterogeneity in the distribution of covariates between the treated and control groups makes this estimation challenging. Propensity score methods have been widely used to adjust for such heterogeneity in observational studies. However, the propensity score is usually unknown and needs to be estimated. In this article, we propose a generalized single-index model to estimate the propensity score and use the propensity score residuals to reduce the estimation bias. The finite-sample performance of the proposed method is evaluated through simulation studies. We use the proposed method to evaluate the policy of "Sunshine Running" and find that the physical test scores of college students participating in "Sunshine Running" can be improved by 3.72 points.

Key words: treatment effect / propensity score / generalized single-index model / partial linear model

Cite this article: XIAO Li, YU Peichao. Robust Estimation of Average Treatment Effects with Observational Studies[J]. Wuhan Univ J of Nat Sci, 2024, 29(2): 117-124.

Biography: XIAO Li, male, Master, Associate professor, research direction: school sports, sports statistics.

Foundation item: Supported by 2020 Guilin University of Aerospace Technology Teaching Group Construction Project (2020JXTD19), 2021 Guangxi Philosophy and Social Science Research Project (21FTY012), and 2022 Guangxi Higher Education Undergraduate Teaching Reform Project (2022JGA358)

© Wuhan University 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

Many empirical issues in social sciences depend on evaluating treatment or policy effects. The estimation of causal effects is a central topic in statistics and econometrics. Half of the 2021 Nobel Prize in Economics was awarded to Angrist and Imbens to recognize their contributions to the causal inference methodology^[1,2]. Because the average treatment effect is commonly used to study causal effects^[3], we adopt it as the causal estimand in this article.

Randomized Controlled Trials (RCTs), in which the treatment assignment is randomized, have been considered the "gold standard" for evaluating the average treatment effect. Under randomization, the average treatment effect can be obtained by comparing the responses of the treatment and control groups. However, it is often infeasible to conduct RCTs in practice for ethical, moral, technological, or cost reasons. For example, it is unethical to randomize pregnant women to smoke in order to study the effect of smoking on neonatal weight, and we should not randomly expose people to air pollution to evaluate the impact of PM2.5 on health.

With the rapid development of science and technology, observational studies, in which the subjects are observed and recorded in their natural state, are widely used to study treatment effects. Compared with RCTs, observational studies can provide richer information about markets, humans, and their behavioral characteristics. However, the treatment assignment in observational studies is far from random. Due to the lack of randomization, systematic differences in the distribution of baseline covariates between the treated and control groups usually exist. If these differences are not appropriately adjusted for, the estimator of treatment effects will be seriously biased. The propensity score (PS), proposed by Rosenbaum and Rubin^[4] and defined as the conditional probability of receiving a particular treatment given the baseline covariates, has been widely used to adjust for these differences. Subjects with identical PS have the same distribution of the baseline covariates regardless of which group they come from.

Four propensity score-based methods are commonly used to adjust for heterogeneity in observational studies: propensity score matching, propensity score stratification, propensity score inverse probability weighting, and propensity score covariate adjustment^[5-9]. Propensity score covariate adjustment and partial least squares can also be used to estimate treatment effects in survival analysis^[10,11]. Unfortunately, the propensity score is usually unknown and needs to be estimated in observational studies. In many practical applications, the PS is estimated by the Logistic or Probit model. However, misspecification of the propensity score model can introduce serious bias into the estimator of treatment effects. Lee^[12] proposed a simple least squares estimator based on propensity score residuals to reduce the bias of the estimator of treatment effects, in which the propensity score was estimated by the Probit model. Inspired by Lee's method^[12], we study treatment effects with propensity score residuals and propose a more robust approach that estimates the propensity score with a generalized single-index model.

The rest of the paper is organized as follows. We introduce the statistical inference procedures in Section 1. In Section 2, we evaluate the finite-sample performance of the proposed method through simulation studies. Section 3 applies the proposed method to evaluate the policy of "Sunshine Running" in the university. Conclusions are shown in Section 4.

1 Estimating Procedures of Average Treatment Effect

We adopt the counterfactual framework to study the average treatment effect^[13,14]. In this framework, each individual has two potential outcomes, one under treatment and one under control. The fundamental problem of causal inference is that only one of the two potential outcomes can be observed for the same individual^[15]. The average treatment effect is defined as the expectation of the difference between the two potential outcomes.

To simplify the expression, we adopt the following notation: the binary variable $D$ denotes the treatment assignment ($D = 1$ for treatment, $D = 0$ for control), and $X$ represents a $p$-vector of baseline covariates; $Y(1)$ and $Y(0)$ are the potential outcomes under the active treatment ($D = 1$) and control ($D = 0$), respectively. Hence, the observed outcome is $Y = D Y(1) + (1 - D) Y(0)$. The observed data $\{(Y_{i}, D_{i}, X_{i}), i = 1, \dots, n\}$ are assumed to be independent copies of $(Y, D, X)$. In this paper, we consider the following average treatment effect (ATE) to study the causal effect:

$τ = E[Y(1)] - E[Y(0)]$ (1)

The conditional average treatment effect (CATE) is:

$τ(X) = E[Y(1) - Y(0) | X] = E[Y(1) | X] - E[Y(0) | X]$ (2)

According to the properties of the conditional expectation, we have:

$τ = E[τ(X)]$ (3)
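As a small numerical illustration of Eqs. (1)-(3), the sketch below uses a hypothetical population with two covariate strata. The stratum probabilities and conditional means are made-up numbers, not values from this study; the point is only that averaging the CATE over the distribution of $X$ reproduces the ATE.

```python
# Hypothetical two-stratum population (all numbers are illustrative assumptions).
strata = {
    # x: (P(X = x), E[Y(1) | X = x], E[Y(0) | X = x])
    "low": (0.6, 5.0, 4.0),
    "high": (0.4, 9.0, 6.0),
}


def cate(x):
    """Conditional average treatment effect tau(x), Eq. (2)."""
    _, mean_y1, mean_y0 = strata[x]
    return mean_y1 - mean_y0


def ate_from_cate():
    """tau = E[tau(X)], Eq. (3): weight each tau(x) by P(X = x)."""
    return sum(p * cate(x) for x, (p, _, _) in strata.items())


def ate_direct():
    """tau = E[Y(1)] - E[Y(0)], Eq. (1)."""
    mean_y1 = sum(p * m1 for p, m1, _ in strata.values())
    mean_y0 = sum(p * m0 for p, _, m0 in strata.values())
    return mean_y1 - mean_y0
```

Both routes give the same value up to floating-point rounding, which is the content of Eq. (3).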

1.1 Identifiability of Average Treatment Effect

In order to identify the average treatment effect, we have to make the following assumptions.

Assumption 1:

Stable Unit Treatment Value Assumption: The subject's potential outcomes are unrelated to the treatment assignment of other subjects.

Assumption 2:

Ignorability: The treatment assignment and potential outcomes are independent given the baseline covariates, $(Y(1), Y(0)) ⊥ D | X$.

Assumption 3:

Overlap: $0 < c_{1} \leq P(D = 1 | X) \leq c_{2} < 1$, where $c_{1}$ and $c_{2}$ are constants.

Remarks: The above three assumptions are commonly used in the causal inference literature. Assumption 2 means that all the covariates that affect both the outcome and treatment assignment are measured; Assumption 3 ensures overlap exists in the distributions of covariates in treatment and control groups.
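Assumption 3 concerns the true propensity score and cannot be verified from data, but a rough diagnostic on estimated scores is common practice. The sketch below is one such check; the function name `check_overlap` and the thresholds 0.05 and 0.95 are illustrative assumptions, not choices made in this paper.

```python
def check_overlap(propensity_scores, c1=0.05, c2=0.95):
    """Return (ok, lowest, highest) for a list of estimated propensity scores.

    c1 and c2 are illustrative thresholds, not values from the paper.
    """
    lowest = min(propensity_scores)
    highest = max(propensity_scores)
    ok = c1 <= lowest and highest <= c2
    return ok, lowest, highest
```

Estimated scores very close to 0 or 1 would suggest that some covariate regions contain almost only treated or almost only control subjects.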

1.2 Models

Two types of models are usually involved in the study of treatment effects: the outcome model, which links the response with the treatment assignment and baseline covariates, and the treatment assignment model, which defines the conditional probability of receiving a particular treatment given the baseline covariates. We model the outcome through the following partial linear model:

$Y_{i} = α D_{i} + g(X_{i}) + ε_{i}, i = 1, \dots, n$ (4)

where the parameter $α$ denotes the average treatment effect, $g(X)$ denotes an unknown smooth function, and $ε$ is the random error with $E[ε | D, X] = 0$. We consider the following generalized single-index model for the propensity score:

$P(D = 1 | X) = \frac{\exp(h(γ_{0}^{T} X))}{1 + \exp(h(γ_{0}^{T} X))}$ (5)

where $h(\cdot)$ is an unknown smooth function and $γ_{0}$ denotes the unknown parameter. Furthermore, in order to identify model (5), $γ_{0}$ needs to satisfy $‖γ_{0}‖ = 1$ with $γ_{0,1} \geq 0$. The proposed model (5) includes the commonly used Logistic and Probit models as special cases: a linear $h$ gives the Logistic model, and $h(u) = \log\{Φ(cu) / (1 - Φ(cu))\}$ with a scale constant $c > 0$ gives the Probit model, where $Φ$ is the standard normal distribution function.
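To make model (5) and its identification constraint concrete, the sketch below evaluates the propensity score for a user-supplied function $h$ and checks the Logistic special case. The helper names (`normalize`, `propensity`, `logistic_propensity`) and the coefficient vector used in the usage note are hypothetical and chosen only for illustration.

```python
import math


def normalize(gamma):
    """Rescale gamma to unit Euclidean norm with a non-negative first entry."""
    norm = math.sqrt(sum(g * g for g in gamma))
    sign = -1.0 if gamma[0] < 0 else 1.0
    return [sign * g / norm for g in gamma]


def propensity(x, gamma, h):
    """Model (5): exp(h(u)) / (1 + exp(h(u))) with index u = gamma^T x."""
    index = sum(g * xi for g, xi in zip(gamma, x))
    return 1.0 / (1.0 + math.exp(-h(index)))


def logistic_propensity(x, beta):
    """Ordinary Logistic model with linear predictor beta^T x, for comparison."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-linear))
```

For example, a Logistic model with the hypothetical coefficients `beta = [1.6, 0.0, -1.2]` has norm 2, so it corresponds to `gamma = normalize(beta) = [0.8, 0.0, -0.6]` and the linear function `h = lambda u: 2.0 * u`; the scale of the coefficients is absorbed into $h$, which is why $γ_{0}$ can be restricted to unit norm.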

1.3 Estimation Method of Average Treatment Effect

Inspired by Lee's method^[12], we use the simple least squares estimator based on propensity score residuals to estimate the parameter $α$. Let $π(X)$ and $\hat{π}(X)$ denote the true and estimated propensity scores, and let $\hat{γ}_{n}$ and $\hat{α}_{n}$ be the estimators of $γ$ and $α$, respectively. First, we estimate the propensity score by maximum likelihood estimation (MLE). Since the function $h(\cdot)$ in model (5) is unknown, we use the sieve method to approximate it. Let $H_{k}(x) = (h_{0}(x), h_{1}(x), \dots, h_{k-1}(x))^{T}$ denote the $k$-dimensional orthogonal basis and $C_{k} = (c_{0}, c_{1}, \dots, c_{k-1})^{T}$ the $k$-dimensional coefficient vector (a detailed description of the construction of the orthogonal basis and its properties can be found in Chen^[16] and Dong et al^[17]). For model (5), we use the following log-likelihood function:

$L_{n}(γ, C_{k}) = \frac{1}{n} \sum_{i=1}^{n} \{D_{i} (H_{k}(γ^{T} X_{i})^{T} C_{k}) - \log(1 + \exp(H_{k}(γ^{T} X_{i})^{T} C_{k}))\}$ (6)
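The objective in Eq. (6) can be evaluated as in the sketch below. As a simplifying assumption, the power basis $(1, u, \dots, u^{k-1})$ stands in for the orthogonal basis used in the paper, and the function names are hypothetical. The optimization over $(γ, C_{k})$ under the unit-norm constraint is not shown.

```python
import math


def basis(u, k):
    """H_k(u) with the power basis (1, u, ..., u^(k-1)).

    This is an assumed stand-in for the orthogonal basis used in the paper.
    """
    return [u ** j for j in range(k)]


def log_likelihood(gamma, coef, X, D):
    """Average log-likelihood L_n(gamma, C_k) of Eq. (6)."""
    total = 0.0
    for x, d in zip(X, D):
        index = sum(g * xi for g, xi in zip(gamma, x))
        eta = sum(c * b for c, b in zip(coef, basis(index, len(coef))))
        # log(1 + exp(eta)), written in a numerically stable form
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += d * eta - softplus
    return total / len(D)
```

With all coefficients equal to zero, every subject has fitted probability 1/2 and the function returns $-\log 2$, which is a convenient sanity check.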

Then $(\hat{γ}_{n}, \hat{C}_{k})$, the estimator of $(γ, C_{k})$, can be obtained by:

$(\hat{γ}_{n}, \hat{C}_{k}) = \arg\min_{γ, C_{k}} [-L_{n}(γ, C_{k}) + λ (‖γ‖_{2}^{2} - 1)]$ (7)

where the positive integer $k$ is a tuning parameter (the number of basis functions), $λ$ is a Lagrange multiplier, and $‖γ‖_{2}^{2}$ denotes the squared Euclidean norm of $γ$. Once $(\hat{γ}_{n}, \hat{C}_{k})$ is obtained, the propensity score can be estimated as:

$\hat{π}(X) = \frac{\exp(H_{k}(\hat{γ}_{n}^{T} X)^{T} \hat{C}_{k})}{1 + \exp(H_{k}(\hat{γ}_{n}^{T} X)^{T} \hat{C}_{k})}$ (8)

According to the simple least squares estimator based on the propensity score residuals, we can obtain the estimator of the average treatment effect:

$\hat{α}_{n} = \frac{\sum_{i=1}^{n} (D_{i} - \hat{π}(X_{i})) (Y_{i} - \sum_{j=0}^{q} (\hat{γ}_{n}^{T} X_{i})^{j})}{\sum_{i=1}^{n} (D_{i} - \hat{π}(X_{i}))^{2}}$ (9)

where $q$ denotes the polynomial order, which is usually taken as 2 in practical applications. Furthermore, by arguments similar to those of Lee^[12], we establish the asymptotic properties of $\hat{α}_{n}$, presented in Theorem 1.
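The ratio in Eq. (9) is straightforward to compute once the fitted propensity scores and fitted index values are available. The sketch below follows Eq. (9) exactly as printed, with unit coefficients on the powers of the index; the function and argument names are hypothetical, and the inputs `ps_hat` and `index_hat` are assumed to come from a first-stage fit that is not shown.

```python
def ps_residual_estimate(Y, D, ps_hat, index_hat, q=2):
    """Least squares estimate of alpha from propensity score residuals, Eq. (9).

    ps_hat[i] is the fitted propensity score and index_hat[i] is the fitted
    index gamma_hat^T X_i; both are assumed to be supplied by the caller.
    """
    numerator = 0.0
    denominator = 0.0
    for y, d, p, u in zip(Y, D, ps_hat, index_hat):
        residual = d - p  # D_i - pi_hat(X_i)
        adjusted = y - sum(u ** j for j in range(q + 1))  # Y_i - sum_j u^j
        numerator += residual * adjusted
        denominator += residual ** 2
    return numerator / denominator
```

If the adjusted outcome equals $α$ times the propensity score residual for every subject, the function returns $α$ exactly, which matches the definition of $V$ in Eq. (12) with $V = 0$.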

Theorem 1

$\sqrt{n} (\hat{α}_{n} - α)$ converges in distribution to a zero-mean normal distribution with variance $E^{-1}[(D - π(X))^{2}] U E^{-1}[(D - π(X))^{2}]$, where $U$ is defined by (38) in the proof of Theorem 1.
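For scalar $α$, the sandwich variance in Theorem 1 reduces to $U / E[(D - π(X))^{2}]^{2}$, which yields a Wald-type confidence interval. The sketch below assumes that plug-in estimates of $E[(D - π(X))^{2}]$ and $U$ are already available; the function name and arguments are hypothetical. The simulations and data analysis in this paper use the bootstrap for the variance instead.

```python
import math


def wald_interval(alpha_hat, second_moment, u_hat, n, z=1.96):
    """Approximate 95% interval for scalar alpha based on Theorem 1.

    second_moment estimates E[(D - pi(X))^2]; u_hat estimates U in Eq. (38).
    """
    variance = u_hat / (second_moment ** 2)  # sandwich form E^{-1} U E^{-1}
    se = math.sqrt(variance / n)
    return alpha_hat - z * se, alpha_hat + z * se
```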

To simplify notation, we set

$π(X) = \frac{\exp(H_{k}(γ^{T} X)^{T} C_{k})}{1 + \exp(H_{k}(γ^{T} X)^{T} C_{k})}$ (10)

$Δ = D - π(X)$ (11)

$V = Y - \sum_{j=0}^{q} (γ^{T} X)^{j} - α Δ$ (12)

and their estimates are:

$\hat{π}(X_{i}) = \frac{\exp(H_{k}(\hat{γ}_{n}^{T} X_{i})^{T} \hat{C}_{k})}{1 + \exp(H_{k}(\hat{γ}_{n}^{T} X_{i})^{T} \hat{C}_{k})}$ (13)

$\hat{Δ}_{i} = D_{i} - \hat{π}(X_{i})$ (14)

$\hat{V}_{i} = Y_{i} - \sum_{j=0}^{q} (\hat{γ}_{n}^{T} X_{i})^{j} - \hat{α}_{n} \hat{Δ}_{i}$ (15)

In order to establish the asymptotic properties of the proposed estimator $\hat{α}_{n}$, we need the following lemmas.

Lemma 1

Accounting for first-stage errors.

The moment condition

$E[\{Y - \sum_{j=0}^{q} (γ^{T} X)^{j} - α (D - \frac{\exp(H_{k}(γ^{T} X)^{T} C_{k})}{1 + \exp(H_{k}(γ^{T} X)^{T} C_{k})})\} \cdot \{D - \frac{\exp(H_{k}(γ^{T} X)^{T} C_{k})}{1 + \exp(H_{k}(γ^{T} X)^{T} C_{k})}\}] = 0$ (16)

for $\hat{α}_{n}$, with $γ$ and $C_{k}$ replaced by $g$ and $c$, is

$E[\{Y - \sum_{j=0}^{q} (g^{T} X)^{j} - α (D - \frac{\exp(H_{k}(g^{T} X)^{T} c)}{1 + \exp(H_{k}(g^{T} X)^{T} c)})\} \cdot \{D - \frac{\exp(H_{k}(g^{T} X)^{T} c)}{1 + \exp(H_{k}(g^{T} X)^{T} c)}\}] = 0$ (17)

The effect of $\hat{γ}_{n} - γ$ and $\hat{C}_{k} - C_{k}$ on the asymptotic distribution of $\hat{α}_{n} - α$ would be zero if the derivatives of the left-hand side of (17) were zero at $g = γ$ and $c = C_{k}$. The derivative with respect to $g$ at $g = γ$ and $c = C_{k}$ is

$0 - E[V \frac{\exp(H_{k}(γ^{T} X)^{T} C_{k})}{(1 + \exp(H_{k}(γ^{T} X)^{T} C_{k}))^{2}} \dot{H}_{k}(γ^{T} X)^{T} C_{k} X^{T}]$ (18)

where $\dot{H}_{k}(\cdot)$ denotes the derivative of $H_{k}(\cdot)$. The first term is zero because $E[(D - π(X)) | X] = 0$. We denote

$q_{αγ} = - E[V \frac{\exp(H_{k}(γ^{T} X)^{T} C_{k})}{(1 + \exp(H_{k}(γ^{T} X)^{T} C_{k}))^{2}} \dot{H}_{k}(γ^{T} X)^{T} C_{k} X^{T}]$ (19)

and $q_{αγ}$ can be estimated consistently by:

$\hat{q}_{αγ} = - \frac{1}{n} \sum_{i=1}^{n} \hat{V}_{i} \frac{\exp(H_{k}(\hat{γ}_{n}^{T} X_{i})^{T} \hat{C}_{k})}{(1 + \exp(H_{k}(\hat{γ}_{n}^{T} X_{i})^{T} \hat{C}_{k}))^{2}} \dot{H}_{k}(\hat{γ}_{n}^{T} X_{i})^{T} \hat{C}_{k} X_{i}^{T}$ (20)

The derivative with respect to $c$ at $g = γ$ and $c = C_{k}$ is

$0 - E[V \frac{\exp(H_{k}(γ^{T} X)^{T} C_{k})}{(1 + \exp(H_{k}(γ^{T} X)^{T} C_{k}))^{2}} H_{k}(γ^{T} X)^{T}]$ (21)

The first term is zero because $E[(D - π(X)) | X] = 0$. We denote

$q_{αC_{k}} = - E[V \frac{\exp(H_{k}(γ^{T} X)^{T} C_{k})}{(1 + \exp(H_{k}(γ^{T} X)^{T} C_{k}))^{2}} H_{k}(γ^{T} X)^{T}]$ (22)

and $q_{αC_{k}}$ can be estimated consistently by:

$\hat{q}_{αC_{k}} = - \frac{1}{n} \sum_{i=1}^{n} \hat{V}_{i} \frac{\exp(H_{k}(\hat{γ}_{n}^{T} X_{i})^{T} \hat{C}_{k})}{(1 + \exp(H_{k}(\hat{γ}_{n}^{T} X_{i})^{T} \hat{C}_{k}))^{2}} H_{k}(\hat{γ}_{n}^{T} X_{i})^{T}$ (23)

Lemma 2

The asymptotic properties of $\hat{γ}_{n}$ and $\hat{C}_{k}$.

Denote the log-likelihood function for model (5) by

$F_{n}(γ, C_{k}) = \sum_{i=1}^{n} \{D_{i} (H_{k}(γ^{T} X_{i})^{T} C_{k}) - \log(1 + \exp(H_{k}(γ^{T} X_{i})^{T} C_{k}))\}$ (24)

Then the score function with respect to $γ$ is

$s_{γ i} = \frac{\partial F(γ, C_{k})}{\partial γ} = D_{i} \dot{H}_{k}(γ^{T} X_{i})^{T} C_{k} X_{i}^{T} - π(X_{i}) \dot{H}_{k}(γ^{T} X_{i})^{T} C_{k} X_{i}^{T}$ (25)

By the definition of the influence function, the influence function with respect to $γ$ is

$η_{γ i} = E^{-1}[s_{γ} s_{γ}^{T}] s_{γ i}$ (26)

and then

$\sqrt{n} (\hat{γ}_{n} - γ) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} η_{γ i} + o_{p}(1)$ (27)

Similarly, the score function with respect to $C_{k}$ is

$s_{C_{k} i} = \frac{\partial F(γ, C_{k})}{\partial C_{k}} = D_{i} H_{k}(γ^{T} X_{i})^{T} - π(X_{i}) H_{k}(γ^{T} X_{i})^{T}$ (28)

By the definition of the influence function, the influence function with respect to $C_{k}$ is

$η_{C_{k} i} = E^{-1}[s_{C_{k}} s_{C_{k}}^{T}] s_{C_{k} i}$ (29)

and then

$\sqrt{n} (\hat{C}_{k} - C_{k}) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} η_{C_{k} i} + o_{p}(1)$ (30)

Proof of Theorem 1

Starting from the first-order condition

$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \{D_{i} - \hat{π}(X_{i})\} \{Y_{i} - \sum_{j=0}^{q} (\hat{γ}_{n}^{T} X_{i})^{j} - \hat{α}_{n} (D_{i} - \hat{π}(X_{i}))\} = 0$ (31)

we apply the mean value theorem to $\hat{α}_{n}$ around $α$ to get

$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} [\{D_{i} - \hat{π}(X_{i})\} \{Y_{i} - \sum_{j=0}^{q} (\hat{γ}_{n}^{T} X_{i})^{j} - α (D_{i} - \hat{π}(X_{i}))\} - \{D_{i} - \hat{π}(X_{i})\}^{2} \{\hat{α}_{n} - α\}] = 0$ (32)

So, the term $\sqrt{n} (\hat{α}_{n} - α)$ can be written as

$\sqrt{n} (\hat{α}_{n} - α) = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (D_{i} - \hat{π}(X_{i})) (Y_{i} - \sum_{j=0}^{q} (\hat{γ}_{n}^{T} X_{i})^{j} - α (D_{i} - \hat{π}(X_{i})))}{\frac{1}{n} \sum_{i=1}^{n} (D_{i} - \hat{π}(X_{i}))^{2}}$ (33)

Because the numerator of (33) also depends on $\hat{γ}_{n}$ and $\hat{C}_{k}$, we go one step further and apply the mean value theorem to $\hat{γ}_{n}$ and $\hat{C}_{k}$. Expanding the numerator of (33) in $(\hat{γ}_{n}, \hat{C}_{k})$ around $(γ, C_{k})$ gives

$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \hat{Δ}_{i} \hat{V}_{i} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Δ_{i} V_{i} + \hat{q}_{αγ} \sqrt{n} (\hat{γ}_{n} - γ) + \hat{q}_{αC_{k}} \sqrt{n} (\hat{C}_{k} - C_{k})$ (34)

Invoking the uniform law of large numbers (LLN), the denominator of (33), $\hat{q}_{αγ}$ and $\hat{q}_{αC_{k}}$ can be replaced with $E[(D - π(X))^{2}]$, $q_{αγ}$ and $q_{αC_{k}}$, respectively. Here $q_{αγ}$, $\hat{q}_{αγ}$, $q_{αC_{k}}$ and $\hat{q}_{αC_{k}}$ are defined by (19), (20), (22) and (23) in Lemma 1.

Then we have

$\sqrt{n} (\hat{α}_{n} - α) = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} Δ_{i} V_{i} + q_{αγ} \sqrt{n} (\hat{γ}_{n} - γ) + q_{αC_{k}} \sqrt{n} (\hat{C}_{k} - C_{k})}{E[(D - π(X))^{2}]} + o_{p}(1)$ (35)

By (27) and (30) in Lemma 2, we have

$\sqrt{n} (\hat{α}_{n} - α) = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \{Δ_{i} V_{i} + q_{αγ} η_{γ i} + q_{αC_{k}} η_{C_{k} i}\}}{E[(D - π(X))^{2}]} + o_{p}(1)$ (36)

Hence,

$\sqrt{n} (\hat{α}_{n} - α) \overset{d}{\to} N(0, E^{-1}[(D - π(X))^{2}] U E^{-1}[(D - π(X))^{2}])$ (37)

where

$U = E[\{Δ V + q_{αγ} η_{γ} + q_{αC_{k}} η_{C_{k}}\} \{Δ V + q_{αγ} η_{γ} + q_{αC_{k}} η_{C_{k}}\}^{T}]$ (38)

The proof is completed.

2 Simulation Studies

In this section, extensive simulation studies are conducted to evaluate the finite-sample performance of the proposed method. We consider different scenarios to simulate the observational studies in reality.

2.1 Simulation 1: Idealized Scenario

The outcome is generated by the following linear regression model:

$Y = α D + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} + ε$ (39)

where $D$ denotes the treatment assignment; the covariates $X_{1}$, $X_{2}$ and $X_{3}$ are mutually independent; $X_{1}$ and $X_{2}$ follow the standard normal distribution; $X_{3}$ follows the Bernoulli distribution with success probability 0.4; the error term $ε$ follows the standard normal distribution; $(β_{1}, β_{2}, β_{3}) = (0.5, 0, 1)$; and $α$ is set to 0 or 0.5. The propensity score is generated by the Logistic model:

$P(D = 1 | X_{1}, X_{2}, X_{3}) = \frac{\exp(γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3})}{1 + \exp(γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3})}$ (40)

where $(γ_{1}, γ_{2}, γ_{3}) = (0.8, 0, -0.6)$, so that the proportion of subjects in the treatment group is about 45%. We compare the proposed estimator $\hat{α}_{P}$ with two estimators: $\hat{α}_{R}$, the estimator based on covariate adjustment with the propensity score^[18,19], and $\hat{α}_{L}$, the estimator based on propensity score residuals^[12].
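The data-generating process of Eqs. (39)-(40) can be sketched as follows, using the parameter values stated above. The function name and the use of Python's `random` module are our own choices for illustration; the original simulation code is not available in the text.

```python
import math
import random


def simulate_scenario1(n, alpha, seed=0):
    """Draw n observations (Y, D, X) from Eqs. (39)-(40)."""
    rng = random.Random(seed)
    beta = (0.5, 0.0, 1.0)
    gamma = (0.8, 0.0, -0.6)
    data = []
    for _ in range(n):
        x3 = 1.0 if rng.random() < 0.4 else 0.0  # Bernoulli(0.4)
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), x3)
        linear = sum(g * xi for g, xi in zip(gamma, x))
        ps = 1.0 / (1.0 + math.exp(-linear))  # Logistic propensity, Eq. (40)
        d = 1 if rng.random() < ps else 0
        y = alpha * d + sum(b * xi for b, xi in zip(beta, x)) + rng.gauss(0.0, 1.0)
        data.append((y, d, x))
    return data
```

With a large sample, the share of treated subjects produced by this sketch should be close to the 45% stated above.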

We generate 500 simulated data sets with the total sample size $n$ being 150 or 300. The sample mean, sample standard deviation and sample root mean square error of the 500 estimates are given in the columns "Mean", "SD" and "RMSE", respectively. Furthermore, the column "ESD" shows the mean of the estimated standard deviations, and "CP" gives the empirical coverage rate of the nominal 95% confidence interval based on the estimated standard deviation. The bootstrap method is adopted to obtain the estimated variance, with 100 bootstrap replications. The simulation results are summarized in Table 1.
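The summary columns described above can be computed from the Monte Carlo output as in the following sketch. The function name is hypothetical, and two details are assumptions on our part: the sample standard deviation uses the $m - 1$ divisor, and the confidence interval is the normal-approximation interval, estimate $\pm 1.96 \times$ ESD.

```python
import math


def summarize(estimates, esds, truth, z=1.96):
    """Mean, SD, RMSE, ESD and CP over Monte Carlo replications."""
    m = len(estimates)
    mean = sum(estimates) / m
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (m - 1))
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / m)
    esd = sum(esds) / m
    # A replication "covers" when the truth lies within estimate +/- z * ESD.
    covered = sum(1 for e, s in zip(estimates, esds) if abs(e - truth) <= z * s)
    return {"Mean": mean, "SD": sd, "RMSE": rmse, "ESD": esd, "CP": covered / m}
```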

From Table 1, we see that the three estimators $\hat{α}_{R}$, $\hat{α}_{L}$ and $\hat{α}_{P}$ are all approximately unbiased. The estimated standard errors agree well with the sampling standard errors, and the coverage probabilities of the 95% confidence intervals are close to the nominal level for all three estimators.

Table 1

Simulation result for the idealized scenario

2.2 Simulation 2: Single-Index Model Scenario

The outcome is generated by the following partial single-index model:

$Y = α D + \exp(β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3}) + ε$ (41)

The treatment assignment model is:

$P(D = 1 | X_{1}, X_{2}, X_{3}) = \frac{\exp((γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3})^{2} + 4 (γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3}))}{1 + \exp((γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3})^{2} + 4 (γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3}))}$ (42)

The parameter settings are the same as in Simulation 1. The simulation results are summarized in Table 2.
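Equation (42) is an instance of model (5) with $h(u) = u^{2} + 4u$, and the vector $(0.8, 0, -0.6)$ from Simulation 1 already has unit Euclidean norm and a non-negative first component, so it satisfies the identification constraint. The sketch below spells this out; the function names are hypothetical.

```python
import math

GAMMA = (0.8, 0.0, -0.6)  # values from Simulation 1; 0.8^2 + 0.6^2 = 1


def h(u):
    """Index transformation implied by Eq. (42)."""
    return u ** 2 + 4.0 * u


def propensity_scenario2(x):
    """P(D = 1 | X = x) under Eq. (42)."""
    index = sum(g * xi for g, xi in zip(GAMMA, x))
    return 1.0 / (1.0 + math.exp(-h(index)))
```

A Logistic or Probit fit that is linear in the covariates cannot represent the quadratic term in $h$, which is the kind of misspecification this scenario is designed to probe.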

From Table 2, the three estimators $\hat{α}_{R}$, $\hat{α}_{L}$ and $\hat{α}_{P}$ are all approximately unbiased. When the sample size is 150, the proposed estimator $\hat{α}_{P}$ has the smallest sample standard deviation and sample root mean square error. The coverage proportion of the proposed estimator $\hat{α}_{P}$ is noticeably closer to the nominal 95% than those of the estimators $\hat{α}_{R}$ and $\hat{α}_{L}$. When the sample size is increased to 300, all indicators show that the proposed estimator $\hat{α}_{P}$ is the most effective.

Table 2

Simulation result for the single-index model scenario

2.3 Simulation 3: More Complex Scenario

The outcome is generated by the following model:

$Y = α D + (β_{1} X_{1} + β_{1} X_{2} + β_{1} X_{3})^{2} + ε$ (43)

The treatment assignment model is:

$P(D = 1 | X_{1}, X_{2}, X_{3}) = \frac{\exp(γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3} + X_{1}^{2})}{1 + \exp(γ_{1} X_{1} + γ_{2} X_{2} + γ_{3} X_{3} + X_{1}^{2})}$ (44)

The parameter settings are the same as in Simulation 1. The simulation results are summarized in Table 3.

From Table 3, when the treatment assignment model is no longer a single-index model, the three estimators $\hat{α}_{R}$, $\hat{α}_{L}$ and $\hat{α}_{P}$ are all biased. However, the biases of the estimators $\hat{α}_{R}$ and $\hat{α}_{L}$ are relatively large, while the bias of the proposed estimator $\hat{α}_{P}$ is relatively small. The average of the estimated standard deviations of $\hat{α}_{P}$ is close to the sample standard deviation. The coverage proportion of the proposed estimator $\hat{α}_{P}$ is much closer to the nominal level than those of the estimators $\hat{α}_{R}$ and $\hat{α}_{L}$. Therefore, across the three simulated scenarios, the proposed estimator $\hat{α}_{P}$ performs best among the three estimators.

Table 3

Simulation result for a more complex scenario

3 Real Data Analysis

Nowadays, due to unhealthy work and rest habits, many college students have poor physical fitness. In order to continuously improve the overall level of physical health of college students, relevant government documents put forward mandatory requirements for the physical health level of college students; many universities also organize "Sunshine Running" to improve the physical health of students^[20,21]. In this section, we apply the proposed method to evaluate the effectiveness of the "Sunshine Running" policy offered by the university in improving college students' physical test scores. The college where we collected the data implemented the "Sunshine Running" policy among the freshmen admitted in 2018 but did not implement the policy in other grades. We collected physical test data in 2018 from the 2017 cohort at this university as the control group, and in 2019 from the 2018 cohort as the treatment group. In order to make the collected observations independent of each other, we randomly selected one male and one female student in each major of the university and recorded their physical test scores, gender, age, and Body Mass Index (BMI, weight divided by height squared). We collected 82 students in the treatment group and 69 students in the control group, for a total of 151 students.

To simplify the expression, we use the following notation: Y denotes the physical test score, D represents the treatment assignment, Sex denotes gender (male=1, female=2), Age denotes age, and BMI denotes Body Mass Index. We consider the following outcome model:

$Y = α D + g(\mathrm{Sex}, \mathrm{Age}, \mathrm{BMI}) + ε$ (45)

where the error term $ε$ follows the standard normal distribution, and $g(\cdot, \cdot, \cdot)$ is an unknown smooth function.

We consider the following generalized single-index model for the treatment assignment:

$P(D = 1 | \mathrm{Sex}, \mathrm{Age}, \mathrm{BMI}) = \frac{\exp(h(γ_{1} \mathrm{Sex} + γ_{2} \mathrm{Age} + γ_{3} \mathrm{BMI}))}{1 + \exp(h(γ_{1} \mathrm{Sex} + γ_{2} \mathrm{Age} + γ_{3} \mathrm{BMI}))}$ (46)

where $h(\cdot)$ is an unknown smooth function, and $γ_{1}, γ_{2}, γ_{3}$ are unknown parameters.

As in the simulation studies, we consider the three estimators $\hat{α}_{R}$, $\hat{α}_{L}$ and $\hat{α}_{P}$ to study the effect of the "Sunshine Running" policy. We use the bootstrap to derive the variance of the three estimators, with 100 bootstrap replications. The estimates, standard deviations, and p-values of $\hat{α}_{R}$, $\hat{α}_{L}$ and $\hat{α}_{P}$ are summarized in Table 4.
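A nonparametric bootstrap with 100 replications, as used for the standard deviations above, can be sketched as follows. The function name, the seed argument, and the generic `estimator` callback are hypothetical; any of the three estimators could be passed in as a function of the resampled data.

```python
import math
import random


def bootstrap_se(data, estimator, n_boot=100, seed=0):
    """Nonparametric bootstrap standard error of estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n observations with replacement and re-estimate.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    mean = sum(replicates) / n_boot
    return math.sqrt(sum((r - mean) ** 2 for r in replicates) / (n_boot - 1))
```

Here `data` would be a list with one record per student, and `estimator` would rerun the full procedure, including the propensity score fit, on each resample so that first-stage uncertainty is reflected in the standard error.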

From Table 4, we can conclude that all the estimators confirm that the "Sunshine Running" policy has a significant impact on improving college students' physical test scores. Judging by the standard deviations of the three estimators, the proposed estimator $\hat{α}_{P}$ is the most efficient, and the estimator $\hat{α}_{R}$ is the least efficient.

Table 4

Results for the "Sunshine Running" policy

4 Conclusion

Estimating treatment effects or evaluating policy effects is an essential issue in empirical science and provides a basis for quantitative decision-making. With the development of science and technology, a large amount of high-quality observational data has been collected, and how to use such information-rich observational data to estimate treatment effects is an active topic in current research. The propensity score has been widely used to adjust for heterogeneity in observational studies. However, misspecification of the propensity score model can cause serious bias. In this paper, we proposed a generalized single-index model to estimate the propensity score and used the propensity score residuals to reduce the estimation bias. In the simulation studies and the real data analysis, the proposed method is more effective than the competing methods. In future work, we will consider more robust methods, such as deep learning, to estimate the treatment effect.

References

Imbens G W, Angrist J D. Identification and estimation of local average treatment effects [J]. Econometrica, 1994, 62(2): 467-475. [CrossRef] [MathSciNet] [Google Scholar]
Angrist J D, Imbens G W, Rubin D B. Identification of causal effects using instrumental variables [J]. Journal of the American Statistical Association, 1996, 91(434): 444-455. [CrossRef] [Google Scholar]
Imbens G W. Nonparametric estimation of average treatment effects under exogeneity: A review [J]. Review of Economics and Statistics, 2004, 86(1): 4-29. [CrossRef] [Google Scholar]
Rosenbaum P R, Rubin D B. The central role of the propensity score in observational studies for causal effects [J]. Biometrika, 1983, 70(1): 41-55. [CrossRef] [Google Scholar]
Austin P C. An introduction to propensity score methods for reducing the effects of confounding in observational studies [J]. Multivariate Behavioral Research, 2011, 46(3): 399-424. [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
Austin P C, Schuster T. The performance of different propensity score methods for estimating absolute effects of treatments on survival outcomes: A simulation study [J]. Statistical Methods in Medical Research, 2016, 25(5): 2214-2237. [Google Scholar]
Herman M, Robins J M. Causal Inference: What if [M]. Boca Raton: Chapman & Hall/CRC, 2020. [Google Scholar]
Imbens G W, Rubin D B. Causal Inference in Statistics, Social, and Biomedical Sciences [M]. Cambridge: Cambridge University Press, 2015.
Lunceford J K, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study [J]. Statistics in Medicine, 2004, 23(19): 2937-2960.
Cao Y X, Zhang C X, Yu C J. Estimating survival treatment effects with covariate adjustment using propensity score [J]. Acta Mathematica Sinica, English Series, 2022, 38(11): 2057-2068.
Cao Y X, Yu J C. Partial least squares method for treatment effect in observational studies with censored outcomes [J]. Wuhan University Journal of Natural Sciences, 2018, 23(6): 487-492.
Lee M J. Simple least squares estimator for treatment effects using propensity score residuals [J]. Biometrika, 2018, 105(1): 149-164.
Rubin D B. Estimating causal effects of treatments in randomized and nonrandomized studies [J]. Journal of Educational Psychology, 1974, 66(5): 688-701.
Rubin D B. Inference and missing data [J]. Biometrika, 1976, 63(3): 581-592.
Holland P W. Statistics and causal inference [J]. Journal of the American Statistical Association, 1986, 81(396): 945-960.
Chen X. Large sample sieve estimation of semi-nonparametric models [J]. Handbook of Econometrics, 2007, 6: 5549-5632.
Dong C H, Gao J T, Peng B. Series estimation for single-index models under constraints [J]. Australian & New Zealand Journal of Statistics, 2019, 61(3): 299-335.
Vansteelandt S, Daniel R M. On regression adjustment for the propensity score [J]. Statistics in Medicine, 2014, 33(23): 4053-4072.
Zou B M, Zou F, Shuster J J, et al. On variance estimate for covariate adjustment by propensity score analysis [J]. Statistics in Medicine, 2016, 35(20): 3537-3548.
Ministry of Education. Opinions on deepening the teaching reform of undergraduate education and comprehensively improving the quality of talent cultivation [EB/OL]. [2019-10-08]. http://www.moe.gov.cn/srcsite/A08/s7056/201910/t20191011_402759.html.
The General Office of the CPC Central Committee, the State Council. Opinions on comprehensively strengthening and improving school sports work in the new era [EB/OL]. [2020-10-15]. http://www.moe.gov.cn/jyb_xxgk/moe_1777/moe_1778/202010/t20201015_494794.html.

All Tables

Table 1 Simulation result for the idealized scenario

Table 2 Simulation result for the single-index model scenario

Table 3 Simulation result for a more complex scenario

Table 4 Results for the "Sunshine Running" policy
