Issue 
Wuhan Univ. J. Nat. Sci.
Volume 29, Number 2, April 2024



Page(s)  117  124  
DOI  https://doi.org/10.1051/wujns/2024292117  
Published online  14 May 2024 
Mathematics
CLC number: O212.1
Robust Estimation of Average Treatment Effects with Observational Studies
^{1}
Department of Physical Education, Guilin University of Aerospace Technology, Guilin 541004, Guangxi, China
^{2}
School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, Hubei, China
^{†} Corresponding author. Email: yupeichaode123@163.com
Received:
25
August
2023
Estimating treatment effects has always been one of the hot issues in empirical research. It brings great challenges to estimating treatment effects because heterogeneity exists in the distribution of covariates between treated and controlled groups. Propensity score methods have been widely used to adjust for heterogeneity in observational studies. However, the propensity score is usually unknown and needs to be estimated. In this article, we propose a generalized singleindex model to estimate the propensity score and use the propensity score residuals to reduce the estimation bias. The finitesample performance of the proposed method is evaluated through simulation studies. We use the proposed method to evaluate the policy of "Sunshine Running" and find that the physical test scores of college students participating in the "Sunshine Running" can be improved by 3.72 points.
Key words: treatment effect / propensity score / generalized singleindex model / partial linear model
Cite this article: XIAO Li, YU Peichao. Robust Estimation of Average Treatment Effects with Observational Studies[J]. Wuhan Univ J of Nat Sci, 2024, 29(2): 117124.
Biography: XIAO Li, male, Master, Associate professor, research direction: school sports, sports statistics. Email: 16543871@qq.com
Fundation item: Supported by 2020 Guilin University of Aerospace Technology Teaching Group Construction Project (2020JXTD19), 2021 Guangxi Philosophy and Social Science Research Project (21FTY012), and 2022 Guangxi Higher Education Undergraduate Teaching Reform Project (2022JGA358)
© Wuhan University 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
0 Introduction
Many empirical issues in social sciences depend on evaluating treatment or policy effects. The estimation of causal effects is a hot issue in statistics and econometrics. Half of the 2021 Nobel Prize in Economics was awarded to Angrist and Imbens to recognize their contributions to the causal inference methodology^{[1,2]}. Because the average treatment effect is commonly used to study causal effects in Ref. [3], we consider the average treatment effect to study causal effects in this article.
Randomized Controlled Trails (RCTs) have been considered as the "golden" method to evaluate the average treatment effect, where the treatment assignment is randomized. Hence, the average treatment effect can be obtained by comparing the difference in responses between treatment and control groups. However, it is often infeasible to conduct RCTs due to the ethics, morality, technology, or cost in practice. For example, it is unethical to randomize pregnant women to smoke for studying its effects on neonatal weight, and we should not randomly expose people to air pollution to evaluate the impact of PM2.5 on health.
With the rapid development of science and technology, observational studies are widely used to study treatment effects in which the subjects are observed and recorded in the natural state. Compared with RCTs, observational studies can provide richer information about markets, humans, and their behavioral characteristics. However, the treatment assignment in observational studies is far from random. Due to the lack of randomization, systematic differences in the distribution of baseline covariates between treated and controlled groups usually exist. If these differences cannot be appropriately adjusted, the estimator of treatment effects will be seriously biased. The propensity score (PS), proposed by Rosenbaum and Rubin^{[4]}, has been widely used to adjust for previously mentioned differences, which is defined as the conditional probability of receiving a particular treatment given the baseline covariates. Subjects with identical PS will have the same distribution of the baseline covariates regardless of which groups they come from.
There are usually four propensity scorebased methods to adjust for heterogeneity in observational studies, including propensity score matching, propensity score stratification, propensity score inverse probability weighting, and propensity score covariate adjustment^{[59]}. Propensity score covariate adjustment and partial least squares can estimate treatment effects in survival analysis^{[10,11]}. Unfortunately, the propensity score is usually unknown and needs to be evaluated in observational studies. In many practical applications, PS is estimated by the Logistic or Probit models. However, the misspecification of the propensity score model will bring serious bias to the estimator of treatment effects. Lee^{[12]} proposed a simple least squares estimator based on propensity score residuals to reduce the bias of the estimator of treatment effects, in which the propensity score was estimated by the Probit model. Inspired by Lee's method^{[12]}, we will study treatment effects with propensity score residuals and propose a more robust approach based on the generalized singleindex model to estimate the propensity score.
The rest of the paper is organized as follows. We introduce the statistical inference procedures in Section 1. In Section 2, we evaluate the finitesample performance of the proposed method through simulation studies. Section 3 applies the proposed method to evaluate the policy of "Sunshine Running" in the university. Conclusions are shown in Section 4.
1 Estimating Procedures of Average Treatment Effect
We adopted the counterfactual framework to study the average treatment effect^{[13,14]}. In the counterfactual framework, two potential outcomes exist under the treatment and control, respectively. The foundation problem of causal inference is that only one of two potential outcomes can be observed for the same individual^{[15]}. The average treatment effect is defined as the expectation of the difference between two potential outcomes.
To simplify the expression, we adopt the following notation: the binary variable D denotes the treatment assignment ( for treatment, for control) and let X represent a pvector of the baseline covariates;, are potential outcomes under the active treatment and control , respectively. Hence, the observed outcome is . The observed data are assumed to be the independent copies of . In this paper, we consider the following average treatment effect (ATE) to study the causal effect:
The conditional average treatment effect (CATE) is:
According to the properties of the conditional expectation, we can have the following:
1.1 Identifiability of Average Treatment Effect
In order to identify the average treatment effect, we have to make the following assumptions.
Stable Unit Treatment Value Assumption: The subject's potential outcomes are unrelated to the treatment assignment of other subjects.
Ignorable: The treatment assignment and potential outcomes are independent given the baseline covariates, .
Remarks: The above three assumptions are commonly used in the causal inference literature. Assumption 2 means that all the covariates that affect both the outcome and treatment assignment are measured; Assumption 3 ensures overlap exists in the distributions of covariates in treatment and control groups.
1.2 Models
There usually exist two types of models in the study of treatment effects: the outcome model, which links the response with the treatment assignment and baseline covariates, and the treatment assignment model, which defines the conditional probability of receiving a particular treatment given the baseline covariates. We generate the outcome through the following partial linear model:
where the parameter denotes the average treatment effect, denotes the unknown smoothed function, and is the random error with . We consider the following generalized singleindex model for the propensity score:
where is an unknown smoothed function and denotes the unknown parameter. Furthermore, in order to identify model (5), needs to satisfy with . The proposed model (5) includes the commonly used Logistic and Probit models as special cases.
1.3 Estimation Method of Average Treatment Effect
Inspired by Lee's method^{[12]}, we use the simple least squares estimator based on propensity score residuals to estimate the parameter . Let and denote the true and estimated propensity score and parameters , are estimators of , , respectively. Firstly, we estimate the propensity score by the maximum likelihood estimate (MLE). Since the function in model (5) is unknown, we use the sieve method to approximate the unknown function . Define to denote the kdimensional orthogonal basis denotes kdimensional coefficients (A detailed description of the construction of orthogonal basis and the related properties can be found in Chen^{[16]}, Dong et al^{[17]}). For model (5), we can use the following maximum likelihood function:
Then, , the estimator of , can be obtained by:
where the positive integer value is a regulation parameter, is a Lagrange multiplier, and denotes the square of the Euclidean norm of . Once obtained , the propensity score can be estimated as:
According to the simple least squares estimator based on the propensity score residuals, we can obtain the estimator of the average treatment effect:
where q denotes the polynomial order, which is usually taken as 2 in practical applications. Furthermore, by similar arguments as Lee^{[12]}, we establish the asymptotic properties of , presented in Theorem 1.
converges in distribution to a zeromean normal distribution with variance , where is defined by (38) in the proof of Theorem 1.
To simply notations, we set
and its estimations are:
In order to establish the asymptotic properties of the proposed estimators , we need the following lemmas.
The moment conditions:
for , with and replaced by and , is
The effect of and on the asymptotic distribution of would be zero if the derivatives of the lefthand side were zero at and . The derivatives with respect to at and are
where denotes the derivative of . The first term is zero because , we denote:
and can be estimated consistently by:
The derivatives with respect to at and are
The first term is zero because , we denote
and can be estimated consistently by:
We denote the loglikelihood function for model (5) is
then we have the score function with respect to
By the definition of the influence function, we have the influence function with respect to ,
then
Similarly, we also have the score function with respect to ,
By the definition of the influence function, we have the influence function with respect to ,
then
then apply the mean value theorem to around to get
So, the term can be written as
Due to and , we should go one step further applying the mean value theorem to and as follows. Expand the numerator of (33) to around to get
Invoking the uniform LLN (Law of Large Numbers), the denominator of (33), and can be replaced with , and , respectively. , , and are defined by (19), (20), (22) and (23) in Lemma 1.
Then we have,
By (27) and (30) in Lemma 2, we have
Hence,
where
The proof is completed.
2 Simulation Studies
In this section, extensive simulation studies are conducted to evaluate the finitesample performance of the proposed method. We consider different scenarios to simulate the observational studies in reality.
2.1 Simulation 1: Idealized Scenario
The outcome is generated by the following linear regression model:
where,denotes the treatment assignment, the covariates , and are independent of each other, and follow the standard normal distribution, follows Bernoulli distribution with successful probability of 0.4, the error term follows the standard normal distribution, and is set to 0 or 0.5. The propensity score is generated by the Logistic model:
where , then the proportion of subjects in the treatment group is about 45%. We compare the proposed estimator with two estimators:, the estimator based on the covariate adjustment with propensity score^{[18,19]}; , and the estimator based on propensity score residuals^{[12]}.
We generate 500 simulated data sets with the total sample size being 150 or 300. The sample mean, sample standard deviation and sample root mean square error of 500 estimators are given in the columns "Mean", "SD" and "RMSE", respectively. Furthermore, the column "ESD" shows a mean of the estimated standard deviation, and "CP" gives the nominal 95% confidence interval coverage rate using the estimated standard deviation. The bootstrap method is adopted to obtain the estimated variance, and the number of bootstraps is 100. The simulation results are summarized in Table 1.
From Table 1, we obtain that the three estimators , and are all approximately unbiased. The corresponding estimated standard error agrees well with the sampling standard error, and the coverage probability of a 95% confidence interval is around the nominal level of the three estimators.
Simulation result for the idealized scenario
2.2 Simulation 2: SingleIndex Model Scenario
The outcome is generated by the following partial singleindex model:
The treatment assignment model is:
The parameter settings are the same as in Simulation 1. The simulation results are summarized in Table 2.
From Table 2, the three estimators , and are all approximately unbiased. When the sample size is 150, we can know that the proposed estimator has the smallest sample standard deviation and sample root mean square error. The coverage proportions of the proposed estimator is apparently closer to the nominal 95% than that of the estimators and . When the sample size is increased to 300, all indicators show that the proposed estimator is the most effective.
Simulation result for the singleindex model scenario
2.3 Simulation 3: More Complex Scenario
The outcome is generated by the following model:
The treatment assignment model is:
The parameter settings are the same as in Simulation 1. The simulation results are summarized in Table 3.
From Table 3, when the treatment assignment model is no longer the singleindex model, the three estimators , and are all biased. However, the biases of the estimators and are relatively large, while the bias of the proposed estimator is relatively small. The average of the estimated standard deviation of is close to the sample standard deviation. The coverage proportions of the proposed estimator is much more relative to the nominal level than the estimators and . Therefore, from the simulation results of three different scenarios, the performance of our proposed estimator is optimal.
Simulation result for a more complex scenario
3 Real Data Analysis
Nowadays, due to unhealthy work and rest habits, many college students have the problem of poor physical quality. In order to continuously improve the overall level of physical health of college students, relevant government documents put forward mandatory requirements for the physical health level of college students; many universities also organize "Sunshine Running" to improve the physical health of students^{[20,21]}. In this section, we apply the proposed method to evaluate the effectiveness of the "Sunshine Running" policy offered by the university to improve college students' physical test scores. The college where we collected the data implemented the "Sunshine Running" policy among the freshmen admitted in 2018 but did not implement the policy in other grades. We collected physical test data from the class of grad 2017 at this university in 2018 as the control group and from the class of grad 2018 in 2019 as the treatment group. In order to make the collected data independent of each other, we randomly selected one male and one female in each major of the university to record their physical test scores, gender, age, and Body Mass Index (BMI, weight divided by height squared). We collected 82 students in the treatment group and 69 students in the control group, for a total of 151 students.
To simplify the expression, we use the following notation: Y denotes the physical test score, D represents the treatment assignment, Sex denotes gender (male=1, female=2), Age denotes age, and BMI denotes Body Mass Index. We consider the following outcome model:
where the error term follows the standard normal distribution, is an unknown smoothed function.
We consider the following generalized singleindex model to generate the treatment assignment:
where is an unknown smoothed function, are unknown parameters.
Likewise, we considered three estimators , and . We considered them in the simulation studies to study the effect of the "Sunshine Running" policy. We use the bootstrap to derive the variance of the three estimators, and the number of bootstraps is 100. The estimators, the standard deviations, and pvalue of , and are summarized in Table 4.
From Table 4, we can conclude that all the estimators confirm that the "Sunshine Running" policy has a significant impact on improving college students' physical test scores. From the standard deviations of the three estimators, the proposed estimator is the most effective, and the estimator is the worst.
Results for the "Sunshine Running" policy
4 Conclusion
Estimating treatment effects or evaluating policy effects is an essential issue in empirical science, which can provide us with the basis for quantitative decisionmaking. With the development of science and technology, a large number of highquality observational data have been collected, and how to use highquality and informationrich observational data to estimate treatment effects is one of the hot issues in current research. The propensity score has been widely used to adjust for heterogeneity in observational studies. However, the misspecification of the propensity score will cause serious bias. In this paper, we proposed the generalized singleindex model to estimate the propensity score and used the propensity score residuals to reduce the estimation bias. From the results of simulation studies and real data analysis, our proposed method is more effective than other competitive methods. Next, we will consider more robust methods to estimate the treatment effect, such as deep learning, in the future work.
References
 Imbens G W, Angrist J D. Identification and estimation of local average treatment effects [J]. Econometrica, 1994, 62(2): 467475. [Google Scholar]
 Angrist J D, Imbens G W, Rubin D B. Identification of causal effects using instrumental variables [J]. Journal of the American Statistical Association, 1996, 91(434): 444455. [Google Scholar]
 Imbens G W. Nonparametric estimation of average treatment effects under exogeneity: A review [J]. Review of Economics and Statistics, 2004, 86(1): 429. [Google Scholar]
 Rosenbaum P R, Rubin D B. The central role of the propensity score in observational studies for causal effects [J]. Biometrika, 1983, 70(1): 4155. [Google Scholar]
 Austin P C. An introduction to propensity score methods for reducing the effects of confounding in observational studies [J]. Multivariate Behavioral Research, 2011, 46(3): 399424. [Google Scholar]
 Austin P C, Schuster T. The performance of different propensity score methods for estimating absolute effects of treatments on survival outcomes: A simulation study [J]. Statistical Methods in Medical Research, 2016, 25(5): 22142237. [Google Scholar]
 Herman M, Robins J M. Causal Inference: What if [M]. Boca Raton: Chapman & Hall/CRC, 2020. [Google Scholar]
 Imbens G W, Rubin D B. Causal Inference in Statistics, Social, and Biomedical Sciences [M]. Cambridge: Cambridge University Press, 2015. [Google Scholar]
 Lunceford J K, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study [J]. Statistics in Medicine, 2004, 23(19): 29372960. [Google Scholar]
 Cao Y X, Zhang C X, Yu C J. Estimating survival treatment effects with covariate adjustment using propensity score[J]. Acta Mathematica Sinica, English Series, 2022, 38(11): 20572068. [Google Scholar]
 Cao Y X,Yu J C. Partial least squares method for treatment effect in observational studies with censored outcomes[J].Wuhan University Journal of Natural Sciences, 2018, 23(6): 487492. [Google Scholar]
 Lee M J. Simple least squares estimator for treatment effects using propensity score residuals [J]. Biometrika, 2018, 105(1): 149164. [Google Scholar]
 Rubin D B. Estimating causal effects of treatments in randomized and nonrandomized studies [J]. Journal of Educational Psychology, 1974, 66(5): 688. [Google Scholar]
 Rubin D B. Inference and missing data [J]. Biometrika, 1976, 63(3): 581592. [Google Scholar]
 Holland P W. Statistics and causal inference [J]. Journal of the American Statistical Association, 1986, 81(396): 945960. [Google Scholar]
 Chen X. Large sample sieve estimation of seminonparametric models [J]. Handbook of Econometrics, 2007, 6: 55495632. [Google Scholar]
 Dong C H, Gao J T, Peng B. Series estimation for single‐index models under constraints [J]. Australian & New Zealand Journal of Statistics, 2019, 61(3): 299335. [Google Scholar]
 Vansteelandt S, Daniel R M. On regression adjustment for the propensity score [J]. Statistics in Medicine, 2014, 33(23): 40534072. [Google Scholar]
 Zou B M, Zou F, Shuster J J, et al. On variance estimate for covariate adjustment by propensity score analysis [J]. Statistics in Medicine, 2016, 35(20): 35373548. [Google Scholar]
 Ministry of Education. Opinions on deepening the teaching reform of undergraduate education and comprehensively improving the quality of talent cultivation [EB/OL]. [20191008]. http://www.moe.gov.cn/srcsite/A08/s7056/201910/t20191011_402759.html. [Google Scholar]
 The General Office of the CPC Central Committee, the State Council. Opinions on comprehensively strengthening and improving school sports work in the new era [EB/OL]. [20201015]. http://www.moe.gov.cn/jyb_xxgk/moe_1777/moe_1778/202010/t20201015_494794.html. [Google Scholar]
All Tables
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.