| Title: | Bayesian Synthetic Control |
|---|---|
| Description: | Implements the Bayesian Synthetic Control method for causal inference in comparative case studies. This package provides tools for estimating treatment effects in settings with a single treated unit and multiple control units, allowing for uncertainty quantification and flexible modeling of time-varying effects. The methodology is based on the paper by Vives and Martinez (2022) <doi:10.48550/arXiv.2206.01779>. |
| Authors: | Ignacio Martinez [aut, cre] (ORCID: <https://orcid.org/0000-0002-3721-8172>), Jaume [aut] |
| Maintainer: | Ignacio Martinez <[email protected]> |
| License: | Apache License 2.0 |
| Version: | 1.0 |
| Built: | 2026-05-12 09:05:46 UTC |
| Source: | https://github.com/google/bsynth |
Provides tools for causal inference using a Bayesian synthetic control method.
Authors:
Jaume [email protected]
References:
Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org
Useful links:
Report bugs at https://github.com/google/bsynth/issues
Helper function that returns the posterior draws of a parameter from a Stan fit object as a long dataset.
.get_par_long(fit, par)
| Argument | Description |
|---|---|
| fit | Stan object with the fitted model. |
| par | Name of the parameter to reshape into long format. |
A tibble containing the posterior draws of the parameter in long format.
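A minimal base-R sketch of the reshaping step, assuming the draws of par have already been pulled out of the fit (for example with rstan::extract()) as a matrix with one row per posterior draw and one column per element of the parameter. The helper draws_to_long() is hypothetical and returns a data.frame rather than a tibble; it is not the package's implementation.

```r
# Hypothetical helper (not part of bsynth): reshape a matrix of posterior
# draws (rows = draws, columns = elements of a vector parameter) into a
# long data frame with one row per (draw, element) pair.
draws_to_long <- function(draws) {
  stopifnot(is.matrix(draws))
  data.frame(
    draw  = rep(seq_len(nrow(draws)), times = ncol(draws)),
    index = rep(seq_len(ncol(draws)), each = nrow(draws)),
    # as.vector() reads the matrix column by column, which matches
    # the rep() patterns used for `draw` and `index` above
    value = as.vector(draws)
  )
}

set.seed(1)
draws <- matrix(rnorm(6), nrow = 3, ncol = 2)  # 3 draws of a length-2 parameter
long <- draws_to_long(draws)
long
```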
This internal helper function combines posterior draws of the synthetic outcome with the observed outcomes, computes credible intervals for the synthetic outcome and the treatment effect, and returns a single data frame suitable for plotting the results.
.get_plot_df(y_synth_draws, pre_data, post_data, time, outcome, ci = 0.75)
| Argument | Description |
|---|---|
| y_synth_draws | A data frame containing draws from the Stan fit object. |
| pre_data | A data frame with data before the intervention. |
| post_data | A data frame with data after the intervention. |
| time | The name of the time period variable (as a string). |
| outcome | The name of the outcome variable (as a string). |
| ci | The width of the credible interval (default: 0.75). |
A data frame containing:
time: The time period.
outcome: The observed outcome.
y_synth: The posterior mean of the synthetic outcome.
LB: The lower bound of the credible interval for the synthetic outcome.
UB: The upper bound of the credible interval for the synthetic outcome.
tau: The difference between the observed and synthetic outcomes.
tau_LB: The lower bound of the credible interval for tau.
tau_UB: The upper bound of the credible interval for tau.
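The summary columns above can be illustrated with a base-R sketch. It assumes the synthetic draws are arranged as a matrix (rows are posterior draws, columns are time periods) and uses equal-tailed intervals with probability (1 - ci) / 2 in each tail; the function name summarize_draws() and the equal-tailed choice are assumptions, not necessarily what .get_plot_df() does internally.

```r
# Hypothetical sketch: posterior means and equal-tailed credible intervals
# for the synthetic outcome and for tau = observed - synthetic.
summarize_draws <- function(y_synth_draws, outcome, time, ci = 0.75) {
  # y_synth_draws: matrix with one row per draw and one column per time period
  # outcome: observed outcome, one value per time period
  probs <- c((1 - ci) / 2, (1 + ci) / 2)
  # Per-draw treatment effect: observed outcome minus each synthetic draw
  tau_draws <- sweep(-y_synth_draws, 2, outcome, FUN = "+")
  q_synth <- apply(y_synth_draws, 2, quantile, probs = probs)
  q_tau <- apply(tau_draws, 2, quantile, probs = probs)
  data.frame(
    time = time,
    outcome = outcome,
    y_synth = colMeans(y_synth_draws),
    LB = q_synth[1, ], UB = q_synth[2, ],
    tau = colMeans(tau_draws),
    tau_LB = q_tau[1, ], tau_UB = q_tau[2, ]
  )
}

set.seed(42)
draws <- matrix(rnorm(4000, mean = 10), nrow = 1000, ncol = 4)
res <- summarize_draws(draws, outcome = c(10, 10, 12, 13), time = 2001:2004)
res
```

Because the observed outcome is fixed, tau_LB here equals outcome minus UB, so for a single treated unit the two intervals carry the same information.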
This internal helper function processes data for multiple treated units, computing synthetic outcomes, credible intervals, and treatment effects, and combines this information into a data frame suitable for plotting the results.
.get_plot_df2(y_synth_draws, data, treated_ids, id, time, outcome, ci = 0.75)
| Argument | Description |
|---|---|
| y_synth_draws | A data frame containing synthetic outcome draws for each treated unit and time period. |
| data | A data frame with the original data, including outcomes for treated units. |
| treated_ids | A vector of identifiers for the treated units. |
| id | The name of the variable in data that identifies the units (as a string). |
| time | The name of the time period variable (as a string). |
| outcome | The name of the outcome variable (as a string). |
| ci | The width of the credible interval (default: 0.75). |
A data frame containing:
time: The time period.
id: The unit identifier (including "Average" for the average treatment effect).
outcome: The observed outcome (for treated units).
y_synth: The mean synthetic outcome (for treated units and the average).
LB: The lower bound of the credible interval for the synthetic outcome.
UB: The upper bound of the credible interval for the synthetic outcome.
tau: The treatment effect (difference between observed and synthetic outcomes).
tau_LB: The lower bound of the credible interval for the treatment effect.
tau_UB: The upper bound of the credible interval for the treatment effect.
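For the "Average" rows, one reasonable approach is to average over treated units within each posterior draw and only then compute the interval. The sketch below assumes unit-level effect draws are stored as a [draw, unit, time] array; the function name average_tau() is hypothetical and the code is not taken from the package.

```r
# Hypothetical sketch: posterior of the average treatment effect across
# treated units, computed draw by draw before summarizing.
average_tau <- function(tau_draws, ci = 0.75) {
  # tau_draws: array with dims [draw, unit, time] of unit-level effects
  avg <- apply(tau_draws, c(1, 3), mean)  # mean over units -> [draw, time]
  probs <- c((1 - ci) / 2, (1 + ci) / 2)
  q <- apply(avg, 2, quantile, probs = probs)
  data.frame(
    time = seq_len(dim(tau_draws)[3]),
    id = "Average",
    tau = colMeans(avg),
    tau_LB = q[1, ], tau_UB = q[2, ]
  )
}

set.seed(3)
tau_draws <- array(rnorm(500 * 3 * 4), dim = c(500, 3, 4))
avg_res <- average_tau(tau_draws)
avg_res
```

Averaging within draws keeps the posterior dependence between units; averaging the unit-level interval bounds instead would not, in general, yield a valid credible interval for the average.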
This internal helper function extracts synthetic draws from a Stan fit object, combines them with observed outcome data, and returns a tidy data frame suitable for further analysis or plotting. This function is specifically designed for scenarios with a single treated unit.
.get_synth_draws(fit, pre_data, post_data, time, outcome)
| Argument | Description |
|---|---|
| fit | A Stan fit object containing the model results. |
| pre_data | A data frame with outcome data before the intervention. |
| post_data | A data frame with outcome data after the intervention. |
| time | The name of the time period variable (as a string). |
| outcome | The name of the outcome variable (as a string). |
A data frame containing:
draw: The index of the synthetic draw.
time: The time period.
y_synth: The synthetic outcome for the given draw and time period.
outcome: The observed outcome for the given time period.
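Attaching the observed outcomes to the draws is essentially a merge on the time variable. A base-R sketch with illustrative toy data; the column names year and y, and all numbers, are made up for the example.

```r
# Hypothetical sketch: join long-format synthetic draws to observed outcomes.
pre_data  <- data.frame(year = 2001:2002, y = c(5.0, 5.5))
post_data <- data.frame(year = 2003, y = 7.0)
observed <- rbind(pre_data, post_data)

draws_long <- data.frame(
  draw = rep(1:2, times = 3),
  year = rep(2001:2003, each = 2),
  y_synth = c(4.9, 5.1, 5.4, 5.6, 6.0, 6.2)
)

# Column names are passed as strings, mirroring the `time`/`outcome` arguments
time <- "year"
outcome <- "y"
obs <- setNames(observed[c(time, outcome)], c(time, "outcome"))
synth_draws <- merge(draws_long, obs, by = time)
synth_draws
```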
This internal helper function extracts synthetic draws from a Stan fit object generated by a predictor match model. It combines these draws with observed outcome data and returns a tidy data frame suitable for analysis or plotting. It expects the variable definitions used by the predictor match model.
.get_synth_draws_predictor_match(fit, pre_data, post_data, time, outcome)
| Argument | Description |
|---|---|
| fit | A Stan fit object containing the model results. |
| pre_data | A data frame with outcome data before the intervention. |
| post_data | A data frame with outcome data after the intervention. |
| time | The name of the time period variable (as a string). |
| outcome | The name of the outcome variable (as a string). |
A data frame containing:
draw: The index of the synthetic draw.
time: The time period.
y_synth: The synthetic outcome for the given draw and time period.
outcome: The observed outcome for the given time period.
This internal helper function extracts synthetic draws from a Stan fit object where the draws are stored in a 3D array. It handles multiple treated units and combines the draws with observed outcome data, returning a tidy data frame suitable for analysis or plotting.
.get_synth_draws3d(fit, data, id, treated_ids, time, outcome, intervention)
| Argument | Description |
|---|---|
| fit | A Stan fit object containing the model results. |
| data | A data frame with the input data, including outcome, time, and unit identifier. |
| id | The name of the variable in data that identifies the units (as a string). |
| treated_ids | A vector of identifiers for the treated units. |
| time | The name of the time period variable (as a string). |
| outcome | The name of the outcome variable (as a string). |
| intervention | The name of the variable in data that indicates the treatment status (as a string). |
A data frame containing:
draw: The index of the synthetic draw.
id: The identifier of the treated unit.
time: The time period.
y_hat: The synthetic outcome for the given draw, unit, and time period.
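Flattening a 3D array of draws into this long layout can be done with base R's as.data.frame.table(). The sketch assumes the draws have been extracted as an array with dimensions draw x treated unit x time; the dimnames and values are illustrative only.

```r
# Hypothetical sketch: flatten a [draw, unit, time] array of synthetic outcomes.
y_hat <- array(
  seq_len(2 * 2 * 3), dim = c(2, 2, 3),
  dimnames = list(draw = 1:2, id = c("A", "B"), time = 2001:2003)
)

# One row per (draw, id, time) combination; the array values become `y_hat`
long <- as.data.frame.table(y_hat, responseName = "y_hat",
                            stringsAsFactors = FALSE)
long$draw <- as.integer(long$draw)  # dimnames come back as character
long$time <- as.integer(long$time)
head(long)
```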
This internal helper function transforms data from a long format, where each row represents an observation for a specific unit and time, to a wide format, where each row represents a time period and each column represents a unit's outcome. It specifically focuses on separating treated and untreated units.
.makeWide(data, id, time, outcome, treatment)
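| Argument | Description |
|---|---|
| data | A data frame containing the input data. |
| id | The name of the variable in data that identifies the units (as a string). |
| time | The name of the time period variable (as a string). |
| outcome | The name of the outcome variable (as a string). |
| treatment | The name of the variable in data that identifies the treatment status (as a string). |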
A data frame in wide format, where each row corresponds to a time period, and columns include the time variable, the treatment indicator, and the outcome values for each treated unit and all untreated units.
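The long-to-wide step can be sketched with base R's reshape(). This toy version (unit names and values are illustrative) produces one row per time period and one outcome column per unit; unlike .makeWide(), it does not add a treatment indicator column.

```r
# Hypothetical sketch of a long-to-wide reshape with base R.
long <- data.frame(
  year = rep(2001:2003, times = 3),
  unit = rep(c("treated", "c1", "c2"), each = 3),
  y = c(1, 2, 5, 1, 2, 3, 2, 3, 4)
)

# One row per year; columns y_treated, y_c1, y_c2
wide <- reshape(long, idvar = "year", timevar = "unit", v.names = "y",
                direction = "wide", sep = "_")
wide
```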
This internal helper function creates a plot to visualize the estimated treatment effect over time. It allows for faceting by a specified variable and optional subsetting of units to include in the plot.
.plot_tau(data, x, y, ymin, ymax, xintercept, facet, id, subset = NULL)
| Argument | Description |
|---|---|
| data | A data frame containing the data to be plotted. |
| x | The name of the x-axis variable, typically the time period (as a string). |
| y | The name of the y-axis variable, typically the treatment effect (as a string). |
| ymin | The name of the variable containing the lower bound of the credible interval (as a string). |
| ymax | The name of the variable containing the upper bound of the credible interval (as a string). |
| xintercept | The time point of the intervention, marked with a vertical dashed line. |
| facet | (Optional) The name of the variable to facet the plot by (as a string). |
| id | The name of the variable identifying the units (as a string). |
| subset | (Optional) A vector specifying a subset of units to include in the plot. If NULL, all units are included. |
A ggplot object displaying the treatment effect plot.
A Bayesian factor model object holds the raw data and draws from the posterior distribution. It is implemented as an R6 class.
Code and theory are based on Pinkney (2021).
Public methods:
initialize() initializes the variables and model parameters
fit() fits the Stan model and returns a fit object
updateWidth updates the width of the credible interval
placeboPlot generates a counterfactual placebo plot
effectPlot returns a plot of the treatment effect over time
summarizeLift returns descriptive statistics of the lift estimate
biasDraws returns a plot of the relative bias in a latent factor model (LFM)
liftDraws returns a plot of the posterior lift distribution
liftBias returns a plot of the relative bias given a lift offset
Data structure:
vizdraws object with the relative bias with offset.
timeTiles: ggplot2 object that shows when the intervention happened.
plotData: tibble with the observed outcome and the counterfactual data.
interventionTime: returns the intervention time period.
synthetic: ggplot2 object that shows the observed and counterfactual outcomes over time.
new()
Create a new bayesianFactor object.
bayesianFactor$new(data, time, id, treated, outcome, ci_width = 0.75, covariates)
| Argument | Description |
|---|---|
| data | Long data.frame object with fields outcome, time, id, and treatment indicator. |
| time | Name of the variable in the data frame that identifies the time period (e.g. year, month, week etc.). |
| id | Name of the variable in the data frame that identifies the units (e.g. country, region etc.). |
| treated | Name of the variable in the data frame that contains the treatment assignment of the intervention. |
| outcome | Name of the outcome variable. |
| ci_width | Credible interval's width. This number is in the (0,1) interval. |
| covariates | Data frame with a column for id and the remaining columns containing the covariates. Defaults to NULL if no covariates should be included in the model. |
Parameters are described in the data structure section of the R6 class documentation above.
A new bayesianFactor object.
fit()
Fit Stan model.
bayesianFactor$fit(L = 8, ...)
| Argument | Description |
|---|---|
| L | Number of factors. |
| ... | Other arguments passed to rstan::sampling(). |
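The L argument sets the number of latent factors. As a rough illustration of what L controls, the sketch below simulates from a generic latent factor model, Y = F Lambda' + noise, with L = 2 factors. It does not reproduce the priors or structure of the package's Stan model, which are not documented here; all names and sizes are illustrative.

```r
# Hypothetical sketch of a generic latent factor model with L factors:
# Y[t, j] = sum_l F[t, l] * Lambda[j, l] + noise
set.seed(123)
n_time <- 20
n_units <- 6
L <- 2
F_mat  <- matrix(rnorm(n_time * L), n_time, L)    # time-varying factors
Lambda <- matrix(rnorm(n_units * L), n_units, L)  # unit-specific loadings
noise  <- matrix(rnorm(n_time * n_units, sd = 0.1), n_time, n_units)
Y <- F_mat %*% t(Lambda) + noise                  # time x unit outcome panel
dim(Y)
```

Without the noise term, the outcome panel has rank L, which is why a small number of factors can summarize many control units.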
updateWidth()
Update the width of the credible interval.
bayesianFactor$updateWidth(ci_width = 0.75)
| Argument | Description |
|---|---|
| ci_width | New width for the credible interval. This number should be in the (0,1) interval. |
summarizeLift()
summarizeLift returns descriptive statistics of the lift estimate.
bayesianFactor$summarizeLift()
effectPlot()
effectPlot returns a ggplot2 object that shows the effect of the intervention over time.
bayesianFactor$effectPlot()
liftDraws()
Plots lift.
bayesianFactor$liftDraws(from, to, ...)
| Argument | Description |
|---|---|
| from | First period to consider when calculating lift. If infinite, set to the time of the intervention. |
| to | Last period to consider when calculating lift. If infinite, set to the last period. |
| ... | Other arguments passed to vizdraws::vizdraws(). |
vizdraws object with the posterior distribution of the lift.
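To make the from/to handling concrete, here is a base-R sketch of per-draw lift over a window. It assumes lift is defined as the total effect divided by the total counterfactual outcome in the window; that definition, the function name lift_draws(), and its arguments are assumptions and not necessarily what the package computes.

```r
# Hypothetical sketch: per-draw lift over [from, to].
# ASSUMPTION: lift = (sum observed - sum synthetic) / sum synthetic.
lift_draws <- function(y_synth_draws, outcome, time, intervention,
                       from = -Inf, to = Inf) {
  # Mirror the documented defaults: infinite bounds fall back to the
  # intervention time and the last observed period
  if (is.infinite(from)) from <- intervention
  if (is.infinite(to)) to <- max(time)
  keep <- time >= from & time <= to
  synth_total <- rowSums(y_synth_draws[, keep, drop = FALSE])  # one per draw
  (sum(outcome[keep]) - synth_total) / synth_total
}

y_synth_draws <- matrix(10, nrow = 5, ncol = 4)  # 5 draws, 4 periods
lift <- lift_draws(y_synth_draws, outcome = c(10, 10, 11, 11),
                   time = 1:4, intervention = 3)
lift
```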
liftBias()
Plot bias magnitude in terms of lift for the period (firstT, lastT).
bayesianFactor$liftBias(firstT, lastT, offset, ...)
| Argument | Description |
|---|---|
| firstT | Start of the time period to compute relative bias over. Must be after the intervention. |
| lastT | End of the time period to compute relative bias over. Must be after the intervention. |
| offset | Target lift (%). |
| ... | Other arguments passed to vizdraws::vizdraws(). |
biasDraws()
Plots relative upper bias / tau for a time period (firstT, lastT).
bayesianFactor$biasDraws(small_bias = 0.3, firstT, lastT)
| Argument | Description |
|---|---|
| small_bias | Threshold value for considering the bias "small". |
| firstT, lastT | Time periods to compute relative bias over; they must be after the intervention. |
vizdraws object with the posterior distribution of relative bias. Bias is scaled by the time periods.
clone()
The objects of this class are cloneable with this method.
bayesianFactor$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
A Bayesian synthetic control object holds the raw data and draws from the posterior distribution. It is implemented as an R6 class.
Public methods:
initialize() initializes the variables and model parameters
fit() fits the Stan model and returns a fit object
updateWidth updates the width of the credible interval
placeboPlot generates a counterfactual placebo plot
effectPlot returns a plot of the treatment effect over time
summarizeLift returns descriptive statistics of the lift estimate
biasDraws returns a plot of the relative bias in a latent factor model (LFM)
liftDraws returns a plot of the posterior lift distribution
liftBias returns a plot of the relative bias given a lift offset
Data structure:
vizdraws object with the relative bias with offset.
timeTiles: ggplot2 object that shows when the intervention happened.
plotData: returns a tibble with the observed outcome and the counterfactual data.
interventionTime: returns the time period (e.g., year) in which the treatment occurred.
synthetic: returns a ggplot2 object that shows the observed and counterfactual outcomes over time.
checks: returns MCMC checks.
lift: draws from the posterior distribution of the lift.
new()
Create a new bayesianSynth object.
bayesianSynth$new(data, time, id, treated, outcome, ci_width = 0.75, gp = FALSE, covariates = NULL, predictor_match = FALSE, predictor_match_covariates0 = NULL, predictor_match_covariates1 = NULL, vs = NULL)
| Argument | Description |
|---|---|
| data | Long data.frame object with fields outcome, time, id, and treatment indicator. |
| time | Name of the variable in the data frame that identifies the time period (e.g. year, month, week etc.). |
| id | Name of the variable in the data frame that identifies the units (e.g. country, region etc.). |
| treated | Name of the variable in the data frame that contains the treatment assignment of the intervention. |
| outcome | Name of the outcome variable. |
| ci_width | Credible interval's width. This number is in the (0,1) interval. |
| gp | Logical that indicates whether or not to include a Gaussian Process as part of the model. |
| covariates | data.frame with time-dependent covariates for each unit and time period. Defaults to NULL if no covariates should be included in the model. |
| predictor_match | Logical that indicates whether or not to run the matching version of the Bayesian Synthetic Control. This option cannot be used with gp, covariates, or multiple treated units. |
| predictor_match_covariates0 | data.frame with a time-independent covariate on each row and columns indicating the control unit names (dim k x J+1). |
| predictor_match_covariates1 | Vector with time-independent covariates for the treated unit (dim k x 1). |
| vs | Vector of weights for the importance of the predictors used in creating the synthetic control. Defaults to equal weight for all predictors. |
A new bayesianSynth object.
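The constructor expects a long panel with one row per unit and time period. The sketch below builds such a data frame in base R; the column names (year, region, treated, sales) and simulated values are illustrative. The bsynth call is left commented out because it needs the package and Stan installed; whether column names are passed quoted or unquoted is an assumption here and should be checked against the package documentation.

```r
# Hypothetical example data in the long format described above.
set.seed(2022)
units <- c("treated_unit", paste0("control_", 1:4))
years <- 2000:2009
panel <- expand.grid(year = years, region = units, stringsAsFactors = FALSE)
# Treatment indicator: 1 only for the treated unit from 2006 onward
panel$treated <- as.integer(panel$region == "treated_unit" & panel$year >= 2006)
panel$sales <- 100 + rnorm(nrow(panel), sd = 5)

# ASSUMED usage (requires bsynth + rstan; quoting convention not verified):
# synth <- bsynth::bayesianSynth$new(data = panel, time = year, id = region,
#                                    treated = treated, outcome = sales,
#                                    ci_width = 0.9)
# synth$fit()  # extra arguments are forwarded to rstan::sampling()
head(panel)
```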
fit()
Fit Stan model.
bayesianSynth$fit(...)
| Argument | Description |
|---|---|
| ... | Other arguments passed to rstan::sampling(). |
updateWidth()
Update the width of the credible interval.
bayesianSynth$updateWidth(ci_width = 0.75)
| Argument | Description |
|---|---|
| ci_width | New width for the credible interval. This number should be in the (0,1) interval. |
summarizeLift()
Returns descriptive statistics of the lift estimate.
bayesianSynth$summarizeLift()
effectPlot()
Returns a ggplot2 object that shows the effect of the intervention over time.
bayesianSynth$effectPlot(facet = TRUE, subset = NULL)
| Argument | Description |
|---|---|
| facet | Boolean that is TRUE if the plot should be divided by unit. |
| subset | Set of units to use in the effect plot. |
placeboPlot()
Plot placebo intervention.
bayesianSynth$placeboPlot(periods, ...)
| Argument | Description |
|---|---|
| periods | Positive number of periods for the placebo intervention. |
| ... | Other arguments passed to rstan::sampling(). |
ggplot2 object for placebo treatment effect.
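A placebo-in-time check is commonly built by discarding the post-intervention periods and pretending the intervention started `periods` periods before the real one; any estimated "effect" there should be close to zero. The sketch below prepares data that way under that assumption; it has not been confirmed that placeboPlot() follows exactly this procedure, and make_placebo_data() is a hypothetical name.

```r
# Hypothetical sketch: build placebo-in-time data from a long panel.
make_placebo_data <- function(data, time, id, treated, periods) {
  is_treated <- data[[treated]] == 1
  treated_ids <- unique(data[[id]][is_treated])
  t0 <- min(data[[time]][is_treated])   # actual first treated period
  pre <- data[data[[time]] < t0, ]      # drop all post-intervention rows
  pre_times <- sort(unique(pre[[time]]))
  stopifnot(periods > 0, periods < length(pre_times))
  # Fake start so that exactly `periods` distinct pre-periods are "treated"
  fake_t0 <- pre_times[length(pre_times) - periods + 1]
  pre[[treated]] <- as.integer(pre[[id]] %in% treated_ids &
                                 pre[[time]] >= fake_t0)
  pre
}

df <- expand.grid(year = 2000:2009, unit = c("A", "B"),
                  stringsAsFactors = FALSE)
df$treated <- as.integer(df$unit == "A" & df$year >= 2006)
placebo <- make_placebo_data(df, time = "year", id = "unit",
                             treated = "treated", periods = 2)
placebo[placebo$treated == 1, ]
```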
biasDraws()
Plots relative upper bias / tau for a time period (firstT, lastT).
bayesianSynth$biasDraws(small_bias = 0.3, firstT, lastT)
| Argument | Description |
|---|---|
| small_bias | Threshold value for considering the bias "small". |
| firstT | Start of the time period to compute relative bias over. Must be after the intervention. |
| lastT | End of the time period to compute relative bias over. Must be after the intervention. |
vizdraws object with the posterior distribution of relative bias. Bias is scaled by the time periods.
liftDraws()
Plots lift.
bayesianSynth$liftDraws(from, to, ...)
| Argument | Description |
|---|---|
| from | First period to consider when calculating lift. If infinite, set to the time of the intervention. |
| to | Last period to consider when calculating lift. If infinite, set to the last period. |
| ... | Other arguments passed to vizdraws::vizdraws(). |
vizdraws object with the posterior distribution of the lift.
liftBias()
Plot bias magnitude in terms of lift for the period (firstT, lastT): pre_MADs / y0 relative to lift thresholds.
bayesianSynth$liftBias(firstT, lastT, offset, ...)
| Argument | Description |
|---|---|
| firstT | Start of the time period to compute relative bias over. Must be after the intervention. |
| lastT | End of the time period to compute relative bias over. Must be after the intervention. |
| offset | Target lift (%). |
| ... | Other arguments passed to vizdraws::vizdraws(). |
weightDraws()
Plot implicit weight distribution across draws.
bayesianSynth$weightDraws()
ggplot object with weight distribution per unit.
weightCorr()
Plots correlations between weights across draws.
bayesianSynth$weightCorr()
ggplot heatmap object with correlations.
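The quantity behind the heatmap is a unit-by-unit correlation matrix of the weights computed across posterior draws. The sketch assumes the weights lie on the simplex (non-negative, summing to one) and, instead of taking them from a fitted model, simulates them from a flat Dirichlet by normalizing gamma draws; all names are illustrative.

```r
# Hypothetical sketch: correlation of synthetic-control weights across draws.
set.seed(7)
n_draws <- 1000
J <- 4
g <- matrix(rgamma(n_draws * J, shape = 1), n_draws, J)
w <- g / rowSums(g)  # each row is a Dirichlet(1, ..., 1) draw on the simplex
colnames(w) <- paste0("control_", 1:J)
w_cor <- cor(w)      # J x J correlation matrix between unit weights
round(w_cor, 2)
```

Because the weights sum to one in every draw, they are mechanically negatively correlated on average, which is worth keeping in mind when reading the heatmap.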
clone()
The objects of this class are cloneable with this method.
bayesianSynth$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
This function creates a time tiles plot visualizing when and which units are affected by an intervention. Each tile represents a unit at a specific time point, with the color indicating the treatment status.
time_tiles(data, time, id, status)
| Argument | Description |
|---|---|
| data | A data frame containing the input data. |
| time | The name of the time period variable (as a string). |
| id | The name of the unit identifier variable (as a string). |
| status | The name of the variable that identifies the treatment status (as a string). |
A ggplot object displaying the time tiles plot.