Title
Do Methodological Birds of a Feather Flock Together?
Authors
Abstract
Quasi-experimental methods have proliferated over the last two decades, as researchers develop causal inference tools for settings in which randomization is infeasible. Two popular such methods, difference-in-differences (DID) and comparative interrupted time series (CITS), compare observations before and after an intervention in a treated group to an untreated comparison group observed over the same period. Both methods rely on strong, untestable counterfactual assumptions. Despite their similarities, the methodological literature on CITS lacks the mathematical formality of DID. In this paper, we use the potential outcomes framework to formalize two versions of CITS - a general version described by Bloom (2005) and a linear version often used in health services research. We then compare these to two corresponding DID formulations - one with time fixed effects and one with time fixed effects and group trends. We also re-analyze three previously published studies using these methods. We demonstrate that the most general versions of CITS and DID impute the same counterfactuals and estimate the same treatment effects. The only difference between these two designs is the language used to describe them and their popularity in distinct disciplines. We also show that these designs diverge when one constrains them using linearity (CITS) or parallel trends (DID). We recommend defaulting to the more flexible versions and provide advice to practitioners on choosing between the more constrained versions by considering the data-generating mechanism. We also recommend greater attention to specifying the outcome model and counterfactuals in papers, allowing for transparent evaluation of the plausibility of causal assumptions.
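The abstract's central claim, that the general CITS and DID designs impute the same counterfactual and estimate the same treatment effect, can be illustrated in the simplest 2×2 setting. The sketch below (not from the paper; the simulated data, effect size, and variable names are illustrative assumptions) computes the difference-in-differences of group means directly and then recovers the identical number as the interaction coefficient of a saturated regression with group and time fixed effects:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 panel: two groups (treated/comparison) observed
# pre/post an intervention; true treatment effect tau = 3.0 (assumed).
n = 500
tau = 3.0
cells = []
for g in (0, 1):          # 0 = comparison group, 1 = treated group
    for t in (0, 1):      # 0 = pre-period, 1 = post-period
        mu = 1.0 + 2.0 * g + 1.5 * t + tau * g * t
        y = mu + rng.normal(0.0, 1.0, n)
        cells.append((g, t, y))

# DID of cell means: impute the treated group's counterfactual post-period
# change using the comparison group's observed change.
means = {(g, t): y.mean() for g, t, y in cells}
did = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])

# Equivalent saturated regression: y ~ 1 + group + time + group*time.
# The interaction coefficient is the DID estimate.
G = np.concatenate([np.full(n, g, dtype=float) for g, _, _ in cells])
T = np.concatenate([np.full(n, t, dtype=float) for _, t, _ in cells])
Y = np.concatenate([y for _, _, y in cells])
X = np.column_stack([np.ones_like(Y), G, T, G * T])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]

print(round(did, 4), round(beta[3], 4))
```

In this saturated 2×2 case the two computations agree to floating-point precision, which is the abstract's point: the general designs differ only in description, and they diverge only once one imposes the linearity (CITS) or parallel-trends-with-trends (DID) restrictions over longer pre-periods.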