Title: | Bayesian Synthetic Control |
---|---|
Description: | Implements the Bayesian Synthetic Control method for causal inference in comparative case studies. This package provides tools for estimating treatment effects in settings with a single treated unit and multiple control units, allowing for uncertainty quantification and flexible modeling of time-varying effects. The methodology is based on the paper by Vives and Martinez (2022) <doi:10.48550/arXiv.2206.01779>. |
Authors: | Ignacio Martinez [aut, cre], Jaume [aut] |
Maintainer: | Ignacio Martinez <[email protected]> |
License: | Apache License 2.0 |
Version: | 1.0 |
Built: | 2024-11-02 05:39:40 UTC |
Source: | https://github.com/google/bsynth |
Provides causal inference with a Bayesian synthetic control method.
Maintainer: Ignacio Martinez [email protected] (ORCID)
Authors:
Jaume [email protected]
Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org
Useful links:
Report bugs at https://github.com/google/bsynth/issues
Helper function to get the long dataset of draws given a stan fit object.
.get_par_long(fit, par)
fit |
Stan object with the fitted model. |
par |
Variable to build the long table for. |
A tibble containing the parameter estimates in long format.
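As a rough illustration of what "long format" means here, the sketch below reshapes a matrix of draws into one row per draw and parameter element. It uses base R only; the simulated `draws` matrix and the column names `draw`, `index`, and `value` are assumptions for illustration, not the helper's actual output.

```r
# Hypothetical stand-in for posterior draws: 4 draws of a 3-element parameter.
# A real Stan fit would supply this matrix; here it is simulated.
set.seed(1)
draws <- matrix(rnorm(12), nrow = 4, ncol = 3,
                dimnames = list(NULL, paste0("y_synth[", 1:3, "]")))

# One row per (draw, element) pair.
long <- data.frame(
  draw  = rep(seq_len(nrow(draws)), times = ncol(draws)),
  index = rep(seq_len(ncol(draws)), each = nrow(draws)),
  value = as.vector(draws)  # column-major order, matching the rep() calls above
)
head(long)
```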
This function processes data frames containing synthetic and observed outcomes, calculates confidence intervals for the synthetic outcomes, and returns a combined data frame suitable for plotting the results.
.get_plot_df(y_synth_draws, pre_data, post_data, time, outcome, ci = 0.75)
y_synth_draws |
A data frame containing draws from the Stan fit object. |
pre_data |
A data frame with data before the intervention. |
post_data |
A data frame with data after the intervention. |
time |
The name of the time period variable (as a string). |
outcome |
The name of the outcome variable (as a string). |
ci |
The width of the credible interval (default: 0.75). |
A data frame containing:
time
: The time period.
outcome
: The observed outcome.
y_synth
: The mean synthetic outcome.
LB
: The lower bound of the confidence interval for the synthetic outcome.
UB
: The upper bound of the confidence interval for the synthetic outcome.
tau
: The difference between the observed and synthetic outcomes.
tau_LB
: The lower bound of the confidence interval for tau.
tau_UB
: The upper bound of the confidence interval for tau.
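The columns above can be sketched from a matrix of draws in base R. This is a minimal sketch that assumes equal-tailed intervals, so that `ci = 0.75` corresponds to the 12.5% and 87.5% quantiles; the package may compute its bounds differently, and the simulated inputs are purely illustrative.

```r
set.seed(42)
ci <- 0.75
# Simulated stand-in for posterior draws: 1000 draws x 5 time periods.
y_synth_draws <- matrix(rnorm(5000, mean = 10, sd = 2), nrow = 1000, ncol = 5)
observed <- c(10.5, 9.8, 10.1, 12.0, 12.4)

# Assumption: equal-tailed interval, cutting (1 - ci) / 2 from each tail.
probs <- c((1 - ci) / 2, 1 - (1 - ci) / 2)
bounds <- apply(y_synth_draws, 2, quantile, probs = probs)  # 2 x 5 matrix

# tau per draw is observed minus synthetic; summarize it the same way.
tau_draws <- sweep(-y_synth_draws, 2, observed, "+")
tau_bounds <- apply(tau_draws, 2, quantile, probs = probs)

plot_df <- data.frame(
  time    = 1:5,
  outcome = observed,
  y_synth = colMeans(y_synth_draws),
  LB      = bounds[1, ],
  UB      = bounds[2, ],
  tau     = colMeans(tau_draws),
  tau_LB  = tau_bounds[1, ],
  tau_UB  = tau_bounds[2, ]
)
```

A wider `ci` moves `LB` and `UB` further apart without changing `y_synth` or `tau`.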
This function processes data for multiple treated units, calculating synthetic outcomes, confidence intervals, and treatment effects. It combines this information into a data frame suitable for plotting the results.
.get_plot_df2(y_synth_draws, data, treated_ids, id, time, outcome, ci = 0.75)
y_synth_draws |
A data frame containing synthetic outcome draws for each treated unit and time period. |
data |
A data frame with the original data, including outcomes for treated units. |
treated_ids |
A vector of identifiers for the treated units. |
id |
The name of the variable in data that identifies the units (as a string). |
time |
The name of the time period variable (as a string). |
outcome |
The name of the outcome variable (as a string). |
ci |
The width of the credible interval (default: 0.75). |
A data frame containing:
time
: The time period.
id
: The unit identifier (including "Average" for the average treatment effect).
outcome
: The observed outcome (for treated units).
y_synth
: The mean synthetic outcome (for treated units and the average).
LB
: The lower bound of the confidence interval for the synthetic outcome.
UB
: The upper bound of the confidence interval for the synthetic outcome.
tau
: The treatment effect (difference between observed and synthetic outcomes).
tau_LB
: The lower bound of the confidence interval for the treatment effect.
tau_UB
: The upper bound of the confidence interval for the treatment effect.
This internal helper function extracts synthetic draws from a Stan fit object, combines them with observed outcome data, and returns a tidy data frame suitable for further analysis or plotting. This function is specifically designed for scenarios with a single treated unit.
.get_synth_draws(fit, pre_data, post_data, time, outcome)
fit |
A Stan fit object containing the model results. |
pre_data |
A data frame with outcome data before the intervention. |
post_data |
A data frame with outcome data after the intervention. |
time |
The name of the time period variable (as a string). |
outcome |
The name of the outcome variable (as a string). |
A data frame containing:
draw
: The index of the synthetic draw.
time
: The time period.
y_synth
: The synthetic outcome for the given draw and time period.
outcome
: The observed outcome for the given time period.
This internal helper function extracts synthetic draws from a Stan fit object generated by a predictor match model. It combines these draws with observed outcome data and returns a tidy data frame suitable for analysis or plotting. It specifically works with variable definitions from the predictor match model.
.get_synth_draws_predictor_match(fit, pre_data, post_data, time, outcome)
fit |
A Stan fit object containing the model results. |
pre_data |
A data frame with outcome data before the intervention. |
post_data |
A data frame with outcome data after the intervention. |
time |
The name of the time period variable (as a string). |
outcome |
The name of the outcome variable (as a string). |
A data frame containing:
draw
: The index of the synthetic draw.
time
: The time period.
y_synth
: The synthetic outcome for the given draw and time period.
outcome
: The observed outcome for the given time period.
This internal helper function extracts synthetic draws from a Stan fit object where the draws are stored in a 3D array. It handles multiple treated units and combines the draws with observed outcome data, returning a tidy data frame suitable for analysis or plotting.
.get_synth_draws3d(fit, data, id, treated_ids, time, outcome, intervention)
fit |
A Stan fit object containing the model results. |
data |
A data frame with the input data, including outcome, time, and unit identifier. |
id |
The name of the variable in data that identifies the units (as a string). |
treated_ids |
A vector of identifiers for the treated units. |
time |
The name of the time period variable (as a string). |
outcome |
The name of the outcome variable (as a string). |
intervention |
The name of the variable in |
A data frame containing:
draw
: The index of the synthetic draw.
id
: The identifier of the treated unit.
time
: The time period.
y_hat
: The synthetic outcome for the given draw, unit, and time period.
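Flattening a 3D array of draws into a tidy data frame can be sketched in base R as follows. The array layout (draw x time x unit) and the simulated values are assumptions; the real layout depends on how the Stan model declares the parameter.

```r
# Hypothetical 3D array of draws: draw x time x unit (layout assumed).
set.seed(7)
arr <- array(rnorm(2 * 3 * 2), dim = c(2, 3, 2),
             dimnames = list(draw = 1:2, time = 2001:2003, id = c("A", "B")))

# as.table() followed by as.data.frame() yields one row per array cell,
# with one column per dimension plus a value column.
long <- as.data.frame(as.table(arr), responseName = "y_hat",
                      stringsAsFactors = FALSE)
long$draw <- as.integer(long$draw)
long$time <- as.integer(long$time)
head(long)
```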
This internal helper function transforms data from a long format, where each row represents an observation for a specific unit and time, to a wide format, where each row represents a time period and each column represents a unit's outcome. It specifically focuses on separating treated and untreated units.
.makeWide(data, id, time, outcome, treatment)
data |
A data frame containing the input data. |
id |
The name of the variable in data that identifies the units (as a string). |
time |
The name of the time period variable (as a string). |
outcome |
The name of the outcome variable (as a string). |
treatment |
The name of the variable in |
A data frame in wide format, where each row corresponds to a time period, and columns include the time variable, the treatment indicator, and the outcome values for each treated unit and all untreated units.
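The long-to-wide transformation can be illustrated with base R's `reshape()`. This is only a sketch on toy data; the column names (`id`, `time`, `outcome`, `treated`) and the resulting `outcome.<unit>` naming are assumptions and need not match what the helper produces.

```r
long <- data.frame(
  id      = rep(c("A", "B", "C"), each = 3),
  time    = rep(1:3, times = 3),
  outcome = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  treated = c(0, 0, 1, 0, 0, 0, 0, 0, 0)  # unit A is treated at time 3
)

# Units that are ever treated; all others act as controls.
treated_ids <- unique(long$id[long$treated == 1])

# One row per time period, one outcome column per unit.
wide <- reshape(long[, c("id", "time", "outcome")],
                idvar = "time", timevar = "id", direction = "wide")

# Treatment indicator by period: 1 if any unit is treated in that period.
by_time <- tapply(long$treated, long$time, max)
wide$treated <- as.integer(by_time[as.character(wide$time)])
wide
```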
This internal helper function creates a plot to visualize the estimated treatment effect over time. It allows for faceting by a specified variable and optional subsetting of units to include in the plot.
.plot_tau(data, x, y, ymin, ymax, xintercept, facet, id, subset = NULL)
data |
A data frame containing the data to be plotted. |
x |
The name of the x-axis variable (typically the time period) (as a string). |
y |
The name of the y-axis variable (typically the treatment effect) (as a string). |
ymin |
The name of the variable containing the lower bound of the confidence interval (as a string). |
ymax |
The name of the variable containing the upper bound of the confidence interval (as a string). |
xintercept |
The time point of the intervention to be marked with a vertical dashed line. |
facet |
(Optional) The name of the variable to facet the plot by (as a string). |
id |
The name of the variable identifying the units (as a string). |
subset |
(Optional) A vector specifying a subset of units to include in the plot. If NULL, all units are included. |
A ggplot object displaying the treatment effect plot.
A Bayesian Factor Model has raw data and draws from the posterior distribution. This is represented by an R6 Class.
Code and theory based on Pinkney 2021.
public methods:
initialize()
initializes the variables and model parameters
fit()
fits the stan model and returns a fit object
updateWidth
updates the width of the credible interval
placeboPlot
generates a counterfactual placebo plot
effectPlot
returns a plot of the treatment effect over time
summarizeLift
returns descriptive statistics of the lift estimate
biasDraws
returns a plot of the relative bias in a LFM
liftDraws
returns a plot of the posterior lift distribution
liftBias
returns a plot of the relative bias given a lift offset
vizdraws object with the relative bias with offset.
timeTiles
ggplot2 object that shows when the intervention happened.
plotData
tibble with the observed outcome and the counterfactual data.
interventionTime
returns the intervention time period.
synthetic
ggplot2 object that shows the observed and counterfactual outcomes over time.
new()
Create a new bayesianFactor object.
bayesianFactor$new( data, time, id, treated, outcome, ci_width = 0.75, covariates )
data
Long data.frame object with fields outcome, time, id, and treatment indicator.
time
Name of the variable in the data frame that identifies the time period (e.g. year, month, week etc).
id
Name of the variable in the data frame that identifies the units (e.g. country, region etc).
treated
Name of the variable in the data frame that contains the treatment assignment of the intervention.
outcome
Name of the outcome variable.
ci_width
Credible interval's width. This number is in the (0,1) interval.
covariates
Data frame with a column for id; the other columns hold the covariates. Defaults to NULL if no covariates should be included in the model.
Parameters are described in the data structure section of the R6 class documentation at the top of the file.
A new bayesianFactor object.
fit()
Fit Stan model.
bayesianFactor$fit(L = 8, ...)
L
Number of factors.
...
other arguments passed to rstan::sampling().
updateWidth()
Update the width of the credible interval.
bayesianFactor$updateWidth(ci_width = 0.75)
ci_width
New width for the credible interval. This number should be in the (0,1) interval.
summarizeLift()
summarizeLift returns descriptive statistics of the lift estimate.
bayesianFactor$summarizeLift()
effectPlot()
effectPlot returns a ggplot2 object that shows the effect of the intervention over time.
bayesianFactor$effectPlot()
liftDraws()
Plots lift.
bayesianFactor$liftDraws(from, to, ...)
from
First period to consider when calculating lift. If infinite, set to the time of the intervention.
to
Last period to consider when calculating lift. If infinite, set to the last period.
...
other arguments passed to vizdraws::vizdraws().
vizdraws object with the posterior distribution of the lift.
liftBias()
Plot bias magnitude in terms of lift for period (firstT, lastT)
bayesianFactor$liftBias(firstT, lastT, offset, ...)
firstT
Start of the time period to compute relative bias over. Must be after the intervention.
lastT
End of the time period to compute relative bias over. Must be after the intervention.
offset
Target lift %.
...
other arguments passed to vizdraws::vizdraws().
biasDraws()
Plots relative upper bias / tau for a time period (firstT, lastT).
bayesianFactor$biasDraws(small_bias = 0.3, firstT, lastT)
small_bias
Threshold value for considering the bias "small".
firstT, lastT
Time periods to compute relative bias over; they must be after the intervention.
vizdraws object with the posterior distribution of relative bias. Bias is scaled by the time periods.
clone()
The objects of this class are cloneable with this method.
bayesianFactor$clone(deep = FALSE)
deep
Whether to make a deep clone.
A Bayesian Synthetic Control has raw data and draws from the posterior distribution. This is represented by an R6 Class.
public methods:
initialize()
initializes the variables and model parameters
fit()
fits the stan model and returns a fit object
updateWidth
updates the width of the credible interval
placeboPlot
generates a counterfactual placebo plot
effectPlot
returns a plot of the treatment effect over time
summarizeLift
returns descriptive statistics of the lift estimate
biasDraws
returns a plot of the relative bias in a LFM
liftDraws
returns a plot of the posterior lift distribution
liftBias
returns a plot of the relative bias given a lift offset
Data structure:
vizdraws object with the relative bias with offset.
timeTiles
ggplot2 object that shows when the intervention happened.
plotData
returns tibble with the observed outcome and the counterfactual data.
interventionTime
returns intervention time period (e.g., year) in which the treatment occurred.
synthetic
returns ggplot2 object that shows the observed and counterfactual outcomes over time.
checks
returns MCMC checks.
lift
draws from the posterior distribution of the lift.
new()
Create a new bayesianSynth object.
bayesianSynth$new( data, time, id, treated, outcome, ci_width = 0.75, gp = FALSE, covariates = NULL, predictor_match = FALSE, predictor_match_covariates0 = NULL, predictor_match_covariates1 = NULL, vs = NULL )
data
Long data.frame object with fields outcome, time, id, and treatment indicator.
time
Name of the variable in the data frame that identifies the time period (e.g. year, month, week etc).
id
Name of the variable in the data frame that identifies the units (e.g. country, region etc).
treated
Name of the variable in the data frame that contains the treatment assignment of the intervention.
outcome
Name of the outcome variable.
ci_width
Credible interval's width. This number is in the (0,1) interval.
gp
Logical that indicates whether or not to include a Gaussian Process as part of the model.
covariates
Data frame with time-dependent covariates for each unit and time field. Defaults to NULL if no covariates should be included in the model.
predictor_match
Logical that indicates whether or not to run the matching version of the Bayesian Synthetic Control. This option cannot be used with gp, covariates, or multiple treated units.
predictor_match_covariates0
data.frame with time independent covariates on each row and column indicating the control unit names (dim k x J+1).
predictor_match_covariates1
Vector with time independent covariates for the treated unit (dim k x 1).
vs
Vector of weights for the importance of the predictors used in creating the synthetic control. Defaults to equal weight for all predictors.
A new bayesianSynth object.
fit()
Fit Stan model.
bayesianSynth$fit(...)
...
other arguments passed to rstan::sampling().
updateWidth()
Update the width of the credible interval.
bayesianSynth$updateWidth(ci_width = 0.75)
ci_width
New width for the credible interval. This number should be in the (0,1) interval.
summarizeLift()
returns descriptive statistics of the lift estimate.
bayesianSynth$summarizeLift()
effectPlot()
Returns a ggplot2 object that shows the effect of the intervention over time.
bayesianSynth$effectPlot(facet = TRUE, subset = NULL)
facet
Boolean that is TRUE if we want to divide the plot for each unit.
subset
Set of units to use in the effect plot.
placeboPlot()
Plot placebo intervention.
bayesianSynth$placeboPlot(periods, ...)
periods
Positive number of periods for the placebo intervention.
...
other arguments passed to rstan::sampling().
ggplot2 object for placebo treatment effect.
biasDraws()
Plots relative upper bias / tau for a time period (firstT, lastT).
bayesianSynth$biasDraws(small_bias = 0.3, firstT, lastT)
small_bias
Threshold value for considering the bias "small".
firstT
Start of the time period to compute relative bias over. Must be after the intervention.
lastT
End of the time period to compute relative bias over. Must be after the intervention.
vizdraws object with the posterior distribution of relative bias. Bias is scaled by the time periods.
liftDraws()
Plots lift.
bayesianSynth$liftDraws(from, to, ...)
from
First period to consider when calculating lift. If infinite, set to the time of the intervention.
to
Last period to consider when calculating lift. If infinite, set to the last period.
...
other arguments passed to vizdraws::vizdraws().
vizdraws object with the posterior distribution of the lift.
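To make the `from` and `to` handling concrete, here is a minimal base R sketch of a per-draw lift calculation. The lift definition used (cumulative observed minus counterfactual, divided by cumulative counterfactual) is an assumption, as are the hypothetical `lift_draws()` helper and the simulated data; the package's own computation may differ.

```r
set.seed(3)
times <- 1:8
intervention_time <- 5
observed <- c(10, 11, 10, 12, 15, 16, 15, 17)
# Simulated counterfactual draws: 500 draws x 8 periods (stand-in for a fit).
y_synth_draws <- matrix(rnorm(500 * 8, mean = 11, sd = 1), nrow = 500)

# Hypothetical helper, not part of the package.
lift_draws <- function(from = -Inf, to = Inf) {
  # Infinite endpoints fall back to the intervention time and the last period.
  if (is.infinite(from)) from <- intervention_time
  if (is.infinite(to)) to <- max(times)
  keep <- times >= from & times <= to
  # Assumed definition: cumulative effect relative to the counterfactual.
  obs_total <- sum(observed[keep])
  synth_total <- rowSums(y_synth_draws[, keep, drop = FALSE])
  (obs_total - synth_total) / synth_total
}

lift <- lift_draws()
quantile(lift, c(0.125, 0.5, 0.875))
```

Each element of `lift` corresponds to one posterior draw, so the vector can be summarized or plotted as a distribution.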
liftBias()
Plot bias magnitude in terms of lift for the period (firstT, lastT): pre_MADs / y0 relative to lift thresholds.
bayesianSynth$liftBias(firstT, lastT, offset, ...)
firstT
Start of the time period to compute relative bias over. Must be after the intervention.
lastT
End of the time period to compute relative bias over. Must be after the intervention.
offset
Target lift %.
...
other arguments passed to vizdraws::vizdraws().
weightDraws()
Plot implicit weight distribution across draws.
bayesianSynth$weightDraws()
ggplot object with weight distribution per unit.
weightCorr()
Plots correlations between weights across draws.
bayesianSynth$weightCorr()
ggplot heatmap object with correlations.
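The quantity behind such a heatmap can be sketched as a correlation matrix computed across draws. The example below simulates weights that sum to one in each draw, which is an assumption about the model's weights made for illustration; the unit names are hypothetical.

```r
set.seed(11)
n_draws <- 200
units <- c("A", "B", "C")
# Simulated weights on the simplex: normalize positive draws row by row.
raw <- matrix(rexp(n_draws * length(units)), nrow = n_draws,
              dimnames = list(NULL, units))
weights <- raw / rowSums(raw)

# Correlation of each pair of units' weights across draws.
weight_corr <- cor(weights)
round(weight_corr, 2)
```

With weights constrained to sum to one, off-diagonal correlations tend to be negative, since more weight on one unit leaves less for the others.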
clone()
The objects of this class are cloneable with this method.
bayesianSynth$clone(deep = FALSE)
deep
Whether to make a deep clone.
This function creates a time tiles plot visualizing when and which units are affected by an intervention. Each tile represents a unit at a specific time point, with the color indicating the treatment status.
time_tiles(data, time, id, status)
data |
A data frame containing the input data. |
time |
The name of the time period variable (as a string). |
id |
The name of the unit identifier variable (as a string). |
status |
The name of the variable that identifies the treatment status (as a string). |
A ggplot object displaying the time tiles plot.