Title: | Impact Measurement Toolkit |
---|---|
Description: | A toolkit for causal inference in experimental and observational studies. Implements various simple Bayesian models including linear, negative binomial, and logistic regression for impact estimation. Provides functionality for randomization and checking baseline equivalence in experimental designs. The package aims to simplify the process of impact measurement for researchers and analysts across different fields. Examples and detailed usage instructions are available at <https://book.martinez.fyi>. |
Authors: | Ignacio Martinez [aut, cre] |
Maintainer: | Ignacio Martinez <[email protected]> |
License: | Apache License (>= 2.0) |
Version: | 1.1.0 |
Built: | 2024-11-07 02:42:06 UTC |
Source: | https://github.com/google/imt |
This function takes a data frame, identifies columns that are not specified in the exclusion list, and combines them into a new column called 'group'. The original columns used for the combination are then removed. Finally, it returns a data frame with only the 'group', 'variables', 'std_diff', and 'balance' columns.
.combineColumns(df)
df |
A data frame containing the columns to be combined. |
A data frame with the combined 'group' column and specified columns.
This function takes a data frame and adds a new column named "treated" with randomly assigned TRUE/FALSE values. Randomization can be done either on the entire data frame or stratified by specified columns. The probability of being assigned to the treatment group can be specified, with a default of 0.5.
.randomize_internal(data, group_by = NULL, seed = NULL, pr_treated = 0.5)
data |
The input data frame. |
group_by |
(Optional) A character vector of column names to stratify the randomization. If provided, the randomization will be done within each group defined by the specified columns. |
seed |
(Optional) An integer to set the random seed for reproducibility. |
pr_treated |
(Optional) The probability of a row being assigned to the treatment group (TRUE). Default is 0.5. |
A new data frame with the added "treated" column.
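The stratified assignment described above can be sketched in base R. This is an illustration only: the name randomize_sketch is hypothetical, and treating a fixed share of rows within each stratum (rather than drawing an independent coin flip per row) is an assumption, not a statement about the package's internal code.

```r
# Hypothetical sketch of stratified random assignment (not the imt internals).
randomize_sketch <- function(data, group_by = NULL, seed = NULL, pr_treated = 0.5) {
  if (!is.null(seed)) set.seed(seed)
  # One stratum id per row; a single stratum when group_by is NULL.
  strata <- if (is.null(group_by)) {
    rep(1L, nrow(data))
  } else {
    as.integer(interaction(data[group_by], drop = TRUE))
  }
  treated <- logical(nrow(data))
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # Assumption: treat a fixed share of each stratum.
    n_treated <- round(length(idx) * pr_treated)
    # sample.int() avoids the sample(x) surprise when idx has length 1.
    treated[idx[sample.int(length(idx), n_treated)]] <- TRUE
  }
  data$treated <- treated
  data
}

df <- data.frame(id = 1:100, site = rep(c("a", "b"), each = 50))
assigned <- randomize_sketch(df, group_by = "site", seed = 42)
table(assigned$site, assigned$treated)
```

Fixing the share within each stratum keeps the groups balanced by construction; an independent draw per row would instead balance only in expectation.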
Create a Baseline Balance Plot.
balancePlot(data)
data |
Tibble produced with checkBaseline(). |
ggplot2 baseline balance plot.
library(imt)
set.seed(123) # for reproducibility
N <- 1000
fake_data <- tibble::tibble(
  x1 = rnorm(N),
  x2 = rnorm(N),
  t = rbinom(N, 1, 0.5)
)
baseline <- checkBaseline(
  data = fake_data,
  variables = c("x1", "x2"),
  treatment = "t"
)
balancePlot(data = baseline)
Computes the BASIE (BAyeSian Interpretation of Estimates) posterior distribution
Implementation of BASIE
A list containing the mean and standard deviation of the BASIE posterior distribution.
mean
Mean of the BASIE posterior distribution.
sd
Standard deviation of the BASIE posterior distribution.
prior
Prior distribution.
new()
Estimate the BASIE posterior distribution.
basie$new(priorMean, priorSD, likMean, likSD)
priorMean
Prior distribution mean.
priorSD
Prior distribution standard deviation.
likMean
Likelihood mean (point estimate).
likSD
Likelihood standard deviation (standard error of the point estimate).
An object of class Basie.
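As a rough illustration of how a prior and a study estimate combine, the sketch below uses the standard normal-normal conjugate update, in which the posterior mean is a precision-weighted average of the prior mean and the point estimate. The function name basie_posterior_sketch is hypothetical, and the assumption that both prior and likelihood are normal is ours; this is not presented as the class's actual implementation.

```r
# Hypothetical sketch: normal prior combined with a normal likelihood.
basie_posterior_sketch <- function(priorMean, priorSD, likMean, likSD) {
  prior_precision <- 1 / priorSD^2
  lik_precision <- 1 / likSD^2
  post_var <- 1 / (prior_precision + lik_precision)
  # Precision-weighted average of the prior mean and the point estimate.
  post_mean <- post_var * (prior_precision * priorMean + lik_precision * likMean)
  list(mean = post_mean, sd = sqrt(post_var))
}

post <- basie_posterior_sketch(
  priorMean = 0, priorSD = 0.03,
  likMean = 0.071, likSD = 0.074
)
# Probability that the effect exceeds zero under the normal posterior.
p_positive <- 1 - stats::pnorm(0, mean = post$mean, sd = post$sd)
```

Because the prior here is tighter than the standard error, the posterior mean is pulled most of the way from the point estimate toward the prior mean.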
vizdraws()
Plot the prior and BASIE posterior distributions.
See vizdraws::vizdraws() for more details.
basie$vizdraws(draws = 50000L, ...)
draws
Number of draws for the posterior.
...
Other arguments passed to vizdraws::vizdraws().
probability()
Estimates the probability that the effect of the intervention is greater than x.
basie$probability(x)
x
Threshold of interest.
plotProbabilities()
Plot the probability that the effect of the intervention is greater than a given threshold.
basie$plotProbabilities(from = -0.08, to = 0.08, length.out = 100, ...)
from
The starting value for the x-axis.
to
The ending value for the x-axis.
length.out
The number of points to generate for the x-axis.
...
Other arguments passed to ggplot2::ggplot().
clone()
The objects of this class are cloneable with this method.
basie$clone(deep = FALSE)
deep
Whether to make a deep clone.
priorMean <- 0
priorSD <- 0.03
likMean <- 0.071
likSD <- 0.074
basie_estimate <- basie$new(
  priorMean = priorMean,
  priorSD = priorSD,
  likMean = likMean,
  likSD = likSD
)
txt <- glue::glue(
  "This example assumes that the prior distribution mean is ",
  "{scales::percent(priorMean)}, and its standard deviation ",
  "is {scales::percent(priorSD)}. ",
  "Furthermore, we imagine a study that found ",
  "a point estimate for the effect of {scales::percent(likMean)} ",
  "with a standard error of {scales::percent(likSD)}. ",
  "Finally, we assume that different decisions would be made ",
  "if lift is negative, positive but less than 5%, or greater than 5%."
)
cat(stringr::str_wrap(txt), "\n")
basie_estimate$plotProbabilities()
Bayesian Linear Model Factory
version
imt package version used to fit model
eta_draws
Posterior draws for the treatment effect
mcmChecks
MCMC diagnostics
credible_interval
Credible interval for the treatment effect
new()
Get the package version
Get the posterior draws for eta
Get the MCMC diagnostics
Get the credible interval
Create a new Bayesian Linear Model object.
blm$new( data, y, x, treatment, eta_mean, eta_sd, generate_fake_data = 0, seed = 1982, ... )
data
Data frame to be used
y
Name of the outcome variable in the data frame
x
Vector of names of all covariates in the data frame
treatment
Name of the treatment indicator variable in the data frame
eta_mean
Prior mean for the treatment effect estimation
eta_sd
Prior standard deviation for the treatment effect estimation
generate_fake_data
Flag for generating fake data
seed
Seed for Stan fitting
...
Additional arguments for Stan
invisible
ppcDensOverlay()
This method compares the empirical distribution of the data 'y' to the distributions of simulated/replicated data 'yrep' from the posterior predictive distribution. This is done by creating a density overlay plot using the bayesplot::ppc_dens_overlay function.
blm$ppcDensOverlay(n = 50)
n
Number of posterior draws to use for the overlay
ggplot2 visualization
tracePlot()
Plot MCMC trace for the eta and sigma parameters.
blm$tracePlot(...)
...
Additional arguments for Stan
A ggplot object.
posteriorProb()
Calculate posterior probability of effect exceeding a threshold
This function calculates the posterior probability of the effect being larger or smaller than a specified threshold.
blm$posteriorProb(threshold = 0, gt = TRUE)
threshold
A numeric value specifying the threshold.
gt
A logical value indicating whether to calculate the probability of the effect being greater than (TRUE) or less than (FALSE) the threshold.
This function uses the private$..eta_draws internal variable to obtain draws from the posterior distribution of the effect size. Based on the specified arguments, the function calculates the proportion of draws exceeding/falling below the threshold and returns a formatted statement describing the estimated probability.
Calculate point estimate of the effect
This R6 method calculates the point estimate of the effect size based on the posterior draws of the eta parameter.
A character string summarizing the estimated probability.
pointEstimate()
blm$pointEstimate(median = TRUE)
median
Logical value. If TRUE (default), the median of the eta draws is returned. If FALSE, the mean is returned.
This method uses the private$..eta_draws internal variable which contains MCMC draws of the eta parameter representing the effect size. Based on the specified median argument, the method calculates and returns either the median or the mean of the draws.
Calculates credible interval for the effect of the intervention
This R6 method calculates and returns a formatted statement summarizing the credible interval of a specified width for the effect of the intervention.
A numeric value representing the point estimate.
credibleInterval()
blm$credibleInterval(width = 0.75, round = 0)
width
Numeric value between 0 and 1 representing the desired width of the credible interval (e.g., 0.95 for a 95% credible interval).
round
Integer value indicating the number of decimal places to round the lower and upper bounds of the credible interval.
This method uses the private$..eta_draws internal variable containing MCMC draws of the eta parameter representing the effect size. It calculates the credible interval, stores it internally, and returns a formatted statement summarizing the findings.
Calculate and format probability statement based on prior
This method calculates the probability that the effect is greater than or less than a given threshold based on the prior distribution of the effect. The results are formatted into a statement suitable for reporting.
A character string with the following information:
The probability associated with the specified width
The lower and upper bounds of the credible interval, rounded to the specified number of decimal places
priorProb()
blm$priorProb(threshold = 0, gt = TRUE)
threshold
Numerical threshold for comparison.
gt
Logical indicating whether to calculate probability greater than or less than the threshold.
A character string containing the formatted probability statement.
Calculates the prior for the effect of the intervention based on the hyperpriors.
This method computes and formats a statement about the probability interval of the effect based on the prior distribution.
priorInterval()
blm$priorInterval(width = 0.75, round = 0)
width
Desired probability width of the interval (default: 0.75).
round
Number of decimal places for rounding the bounds (default: 0).
This method calculates the lower and upper bounds of the interval based on the specified probability width and the prior distribution of the effect. It then formats the results into a clear and informative statement.
Note that the probability is checked to be within valid range (0-1) with consideration of machine precision using .Machine$double.eps.
A character string containing the formatted probability interval statement.
vizdraws()
Plots impact's prior and posterior distributions.
For more details see vizdraws::vizdraws().
blm$vizdraws(...)
...
other arguments passed to vizdraws.
An interactive plot of the prior and posterior distributions.
clone()
The objects of this class are cloneable with this method.
blm$clone(deep = FALSE)
deep
Whether to make a deep clone.
This function estimates the probability that a vector of posterior draws, represented by the parameter eta, falls within a specified range. It provides flexibility to use either prior distributions or posterior draws, and to specify one-sided or two-sided probability calculations.
calcProb(x, a = NULL, b = NULL, prior = FALSE, group_name = "group average")
x |
A numeric vector containing either posterior draws (default) or prior samples of the treatment effect parameter (eta). |
a |
(Optional) The lower bound of the range (as a proportion, not percentage). |
b |
(Optional) The upper bound of the range (as a proportion, not percentage). |
prior |
A logical value indicating whether to use prior samples (TRUE) or posterior draws (FALSE, the default). |
group_name |
A string describing the group for which the probability is being calculated (default: "group average"). |
This function checks the following cases:
If both a and b are NULL, it returns an empty string.
If b is less than or equal to a, it throws an error.
The calculated probability and range are presented in a human-readable string using the glue package for formatting.
A formatted string stating the calculated probability and the specified range. The probability is the proportion of samples (either prior or posterior) that fall within the defined range.
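The core computation, the share of draws that fall inside a range, can be sketched in a few lines of base R. The function name prob_in_range_sketch is hypothetical; unlike calcProb, this sketch returns the bare proportion (or NA when no bound is given) rather than a formatted string, and the use of strict inequalities at the bounds is an assumption.

```r
# Hypothetical sketch: proportion of draws inside (a, b).
prob_in_range_sketch <- function(x, a = NULL, b = NULL) {
  # Nothing to compute when neither bound is supplied.
  if (is.null(a) && is.null(b)) return(NA_real_)
  if (!is.null(a) && !is.null(b) && b <= a) {
    stop("`b` must be greater than `a`.")
  }
  in_range <- rep(TRUE, length(x))
  if (!is.null(a)) in_range <- in_range & x > a # one-sided lower bound
  if (!is.null(b)) in_range <- in_range & x < b # one-sided upper bound
  mean(in_range)
}

draws <- c(-0.02, 0.01, 0.03, 0.05, 0.08)
p_between <- prob_in_range_sketch(draws, a = 0, b = 0.06) # two-sided
p_above <- prob_in_range_sketch(draws, a = 0)             # one-sided
```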
This function calculates the difference-in-differences (DID) estimate of the average treatment effect, along with its standard error. It assumes a simple DID design with two groups (control and treatment) and two time periods (pre and post).
calculateDIDEffect( mean_pre_control, sd_pre_control, n_pre_control, mean_post_control, sd_post_control, n_post_control, mean_pre_treat, sd_pre_treat, n_pre_treat, mean_post_treat, sd_post_treat, n_post_treat )
mean_pre_control |
Mean outcome for the control group in the pre-treatment period. |
sd_pre_control |
Standard deviation of the outcome for the control group in the pre-treatment period. |
n_pre_control |
Sample size for the control group in the pre-treatment period. |
mean_post_control |
Mean outcome for the control group in the post-treatment period. |
sd_post_control |
Standard deviation of the outcome for the control group in the post-treatment period. |
n_post_control |
Sample size for the control group in the post-treatment period. |
mean_pre_treat |
Mean outcome for the treatment group in the pre-treatment period. |
sd_pre_treat |
Standard deviation of the outcome for the treatment group in the pre-treatment period. |
n_pre_treat |
Sample size for the treatment group in the pre-treatment period. |
mean_post_treat |
Mean outcome for the treatment group in the post-treatment period. |
sd_post_treat |
Standard deviation of the outcome for the treatment group in the post-treatment period. |
n_post_treat |
Sample size for the treatment group in the post-treatment period. |
A list containing the following components:
did_estimate |
The DID estimate of the average treatment effect. |
se_did |
The standard error of the DID estimate. |
calculateDIDEffect(
  mean_pre_control = 10, sd_pre_control = 2, n_pre_control = 50,
  mean_post_control = 12, sd_post_control = 2.5, n_post_control = 50,
  mean_pre_treat = 11, sd_pre_treat = 2.1, n_pre_treat = 50,
  mean_post_treat = 15, sd_post_treat = 2.6, n_post_treat = 50
)
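For reference, the textbook two-group, two-period calculation can be written out directly. The sketch below assumes the four cells are independent samples, so the variances of the four means simply add; the function name did_sketch is hypothetical, and whether the package computes the standard error in exactly this way is not stated here.

```r
# Hypothetical sketch of a simple DID estimate from summary statistics.
did_sketch <- function(mean_pre_control, sd_pre_control, n_pre_control,
                       mean_post_control, sd_post_control, n_post_control,
                       mean_pre_treat, sd_pre_treat, n_pre_treat,
                       mean_post_treat, sd_post_treat, n_post_treat) {
  # Change in the treatment group minus change in the control group.
  did_estimate <- (mean_post_treat - mean_pre_treat) -
    (mean_post_control - mean_pre_control)
  # Assumption: the four cell means are independent, so variances add.
  se_did <- sqrt(
    sd_pre_control^2 / n_pre_control +
      sd_post_control^2 / n_post_control +
      sd_pre_treat^2 / n_pre_treat +
      sd_post_treat^2 / n_post_treat
  )
  list(did_estimate = did_estimate, se_did = se_did)
}

res <- did_sketch(
  mean_pre_control = 10, sd_pre_control = 2, n_pre_control = 50,
  mean_post_control = 12, sd_post_control = 2.5, n_post_control = 50,
  mean_pre_treat = 11, sd_pre_treat = 2.1, n_pre_treat = 50,
  mean_post_treat = 15, sd_post_treat = 2.6, n_post_treat = 50
)
```

With the inputs above, the treatment group improves by 4 and the control group by 2, so the sketch's estimate is 2.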
This function calculates effect sizes for each variable in a data frame, comparing treatment and control groups. It handles continuous, binary, and categorical variables using appropriate effect size measures.
calculateEffectSizes(data, treatment_column, to_check = NULL)
data |
A data frame containing the variables and treatment/control indicator. |
treatment_column |
The name of the column (as a string) in the data frame that indicates whether an observation is in the treatment or control group. This column must be a factor with exactly two levels. |
to_check |
(optional) A character vector specifying the names of the variables for which effect sizes should be calculated. If NULL (default), all variables (except the treatment column) are processed. |
- For continuous variables, Hedges' g effect size is calculated.
- For binary variables, Cox's Proportional Hazards Index (Cox's C) is calculated.
- For categorical variables, the variable is converted into multiple indicator (dummy) variables, and the average Cox's C across these indicators is reported.
A data frame with two columns:
* Variable: The name of each variable in the original data frame.
* EffectSize: The calculated effect size for each variable.
Check Baseline Equivalency.
checkBaseline(data, variables, treatment)
data |
dataframe with the pre-intervention variables, and the treatment indicator. |
variables |
vector with the names of the pre-intervention variables. |
treatment |
name of the treatment indicator. |
A tibble with the standardized difference of each pre-intervention variable. It includes three columns: variables, the variable name; std_diff, the standardized difference for that variable as a number; and balance, the standardized difference for that variable as a factor. For more details about this methodology, see https://ies.ed.gov/ncee/wwc/Docs/OnlineTraining/wwc_training_m3.pdf.
This function performs a series of data cleaning and preprocessing steps to ensure the data is suitable for analysis. This includes:
Missing data handling
Variable type checks
Collinearity and zero-variance feature removal
cleanData(data, y, treatment, x = NULL, binary = FALSE)
data |
A data.frame containing the data to be cleaned. |
y |
Name of the dependent variable (character). |
treatment |
Name of the treatment variable (character, should be logical). |
x |
Names of the covariates to include in the model (character vector, optional). |
binary |
Should the dependent variable be treated as binary? Default is FALSE |
A list containing the cleaned dataset and relevant metadata:
N: The number of observations after cleaning.
K: The number of covariates after cleaning.
X: The cleaned covariate matrix.
treat_vec: Treatment vector as integers (1 for TRUE, 0 for FALSE).
Y: The dependent variable vector.
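The three cleaning steps can be sketched in base R as follows. Everything here is an assumption made for illustration: the name clean_data_sketch is hypothetical, x is required rather than optional, covariates are taken to be numeric, and a pivoted QR decomposition against an intercept column is used as one possible way to drop constant and collinear covariates. The package may use a different approach.

```r
# Hypothetical sketch of the cleaning steps (not the imt implementation).
clean_data_sketch <- function(data, y, treatment, x) {
  cols <- c(y, treatment, x)
  # Missing data handling: keep rows that are complete on the columns in use.
  data <- data[stats::complete.cases(data[cols]), cols, drop = FALSE]
  # Variable type checks.
  stopifnot(is.logical(data[[treatment]]), is.numeric(data[[y]]))
  X <- as.matrix(data[x])
  # Pivoted QR with an intercept column: constant columns are collinear
  # with the intercept, so one pass removes both kinds of redundancy.
  qr_fit <- qr(cbind(1, X))
  kept <- qr_fit$pivot[seq_len(qr_fit$rank)]
  kept <- sort(setdiff(kept, 1L)) - 1L # drop the intercept, re-index into X
  X <- X[, kept, drop = FALSE]
  list(
    N = nrow(X), K = ncol(X), X = X,
    treat_vec = as.integer(data[[treatment]]),
    Y = data[[y]]
  )
}

x1 <- c(0.5, 1.2, -0.3, 2.1, 0.0, 1.7, -1.1, 0.9, 0.4, 1.5)
df <- data.frame(
  y = c(1.0, 2.5, NA, 3.1, 0.7, 2.2, 1.9, 3.3, 0.4, 2.8),
  t = rep(c(TRUE, FALSE), 5),
  x1 = x1,
  x2 = 2 * x1, # perfectly collinear with x1
  x3 = 5       # zero variance
)
cleaned <- clean_data_sketch(df, y = "y", treatment = "t", x = c("x1", "x2", "x3"))
```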
This function takes a dataframe as input and returns a tibble summarizing the number of missing values (NA) in each column and the number of rows with at least one missing value.
countMissing(df)
df |
A dataframe to analyze. |
A tibble with the following columns:
NA_<column_name>: Number of NAs in each original column
missing_rows: Number of rows with at least one NA across all columns
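A base-R sketch of the same summary is shown below. The name count_missing_sketch is hypothetical, and it returns a one-row data frame rather than a tibble to avoid extra dependencies.

```r
# Hypothetical sketch: per-column NA counts plus rows with any NA.
count_missing_sketch <- function(df) {
  na_counts <- colSums(is.na(df))
  names(na_counts) <- paste0("NA_", names(df))
  out <- data.frame(as.list(na_counts), check.names = FALSE)
  # Rows with at least one NA across all columns.
  out$missing_rows <- sum(!stats::complete.cases(df))
  out
}

example_df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))
missing_summary <- count_missing_sketch(example_df)
```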
Calculates Cox's C, a standardized effect size measure for comparing hazard rates between two groups in survival analysis.
coxsIndex(p_t, p_c, n_t, n_c)
p_t |
Numeric value representing the proportion of events (e.g., failures, deaths) in the treatment group. |
p_c |
Numeric value representing the proportion of events in the control group. |
n_t |
Numeric value representing the sample size of the treatment group. |
n_c |
Numeric value representing the sample size of the control group. |
Cox's C is a useful effect size for survival analysis when hazard ratios are not constant over time. It's calculated based on the log odds ratio of events and includes a small sample size correction. The value 1.65 is used to approximate a conversion to a Cohen's d-like scale.
The calculated Cox's C effect size.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-202.
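The description above (log odds ratio, small-sample correction, division by 1.65) can be sketched as follows. The function name cox_index_sketch is hypothetical, and the specific correction factor 1 - 3 / (4N - 9), with N the combined sample size, is an assumption borrowed from common effect-size conventions rather than something confirmed by this page.

```r
# Hypothetical sketch of a Cox-style index for two proportions.
cox_index_sketch <- function(p_t, p_c, n_t, n_c) {
  # Assumed small-sample correction factor.
  omega <- 1 - 3 / (4 * (n_t + n_c) - 9)
  # Difference in log odds between treatment and control.
  log_odds_ratio <- stats::qlogis(p_t) - stats::qlogis(p_c)
  omega * log_odds_ratio / 1.65
}

cox_same <- cox_index_sketch(p_t = 0.4, p_c = 0.4, n_t = 100, n_c = 100)
cox_higher <- cox_index_sketch(p_t = 0.5, p_c = 0.4, n_t = 100, n_c = 100)
```

Note that proportions of exactly 0 or 1 give infinite log odds, so real inputs would need a guard for those cases.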
Converts a dataframe into a named list to provide data to a Stan model
createData( data, y, treatment, x = NULL, eta_mean = 0, eta_sd = 1, run_estimation = 1 )
data |
The data frame to be used |
y |
The name of the outcome variable in the data frame |
treatment |
The name of the treatment indicator variable in the data frame |
x |
A vector of names of all covariates to be used in the data frame |
eta_mean |
The prior mean to be used for estimating the treatment effect |
eta_sd |
The prior standard deviation to be used for estimating the treatment effect |
run_estimation |
Whether to run the estimation, or merely draw data from the priors |
Returns stan_data, a named list providing the data for the Stan model.
This function calculates a credible interval of the specified width from a vector of MCMC draws.
credibleInterval(draws, width)
draws |
A numeric vector containing MCMC draws. |
width |
A numeric value between 0 and 1 specifying the desired width of the credible interval. |
The function calculates the lower and upper bounds of the credible interval using the quantile function based on the specified width.
A named list containing three elements:
width: The specified width of the credible interval.
lower_bound: The lower bound of the credible interval.
upper_bound: The upper bound of the credible interval.
# Generate example draws
draws <- rnorm(1000)
# Calculate 95% credible interval
credibleInterval(draws, width = 0.95)
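A quantile-based interval of this kind can be sketched without the package. The name credible_interval_sketch is hypothetical, and the choice of an equal-tailed interval (the same probability cut from each tail) is an assumption consistent with, but not confirmed by, the description above.

```r
# Hypothetical sketch: equal-tailed interval from a vector of draws.
credible_interval_sketch <- function(draws, width) {
  stopifnot(is.numeric(draws), width > 0, width < 1)
  tail_prob <- (1 - width) / 2
  bounds <- stats::quantile(
    draws,
    probs = c(tail_prob, 1 - tail_prob),
    names = FALSE
  )
  list(width = width, lower_bound = bounds[1], upper_bound = bounds[2])
}

grid_draws <- seq(0, 1, length.out = 101)
ci <- credible_interval_sketch(grid_draws, width = 0.9)
```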
Fits Stan model.
fit(stan_data, model = "blm", ...)
stan_data |
A named list providing the data for the Stan model. |
model |
A character string specifying the model type. Must be either "blm" (Bayesian linear model) or "bnb" (Bayesian negative binomial model). Defaults to "blm". |
... |
Additional options to be passed through to |
The complete StanFit object from the fitted Stan model.
Extracts parameter from Stan model.
getStanParameter(fit, par)
fit |
A Stan fit model. |
par |
Name of the parameter you want to extract. |
A long tibble with draws for the parameter.
Calculates Hedges' g, a standardized effect size for comparing means. This version includes a small-sample correction factor (omega) and uses the pooled standard deviation.
hedgesG(n_t, n_c, y_t, y_c, s_t, s_c)
n_t |
Numeric value representing the sample size of the treatment group. |
n_c |
Numeric value representing the sample size of the control group. |
y_t |
Numeric value representing the mean of the treatment group. |
y_c |
Numeric value representing the mean of the control group. |
s_t |
Numeric value representing the standard deviation of the treatment group. |
s_c |
Numeric value representing the standard deviation of the control group. |
Hedges' g is a variation of Cohen's d that adjusts for small-sample bias. It is calculated as the difference in means divided by the pooled standard deviation, then multiplied by a correction factor.
The calculated Hedges' g effect size.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107-128.
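The calculation described above (difference in means, pooled standard deviation, correction factor) can be sketched directly. The function name hedges_g_sketch is hypothetical, and the correction factor 1 - 3 / (4N - 9) is the commonly used approximation, assumed here rather than confirmed as the package's choice.

```r
# Hypothetical sketch of Hedges' g from summary statistics.
hedges_g_sketch <- function(n_t, n_c, y_t, y_c, s_t, s_c) {
  # Pooled standard deviation across the two groups.
  pooled_sd <- sqrt(
    ((n_t - 1) * s_t^2 + (n_c - 1) * s_c^2) / (n_t + n_c - 2)
  )
  # Assumed small-sample correction factor (omega).
  omega <- 1 - 3 / (4 * (n_t + n_c) - 9)
  omega * (y_t - y_c) / pooled_sd
}

g <- hedges_g_sketch(n_t = 50, n_c = 50, y_t = 11, y_c = 10, s_t = 2, s_c = 2)
```

The correction shrinks the uncorrected standardized difference slightly toward zero, and the shrinkage vanishes as the combined sample size grows.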
A class for creating and managing Bayesian Hurdle Log-Normal Models
version
imt package version used to fit model
ATE_draws
Posterior draws for the Average Treatment Effect on the positive outcome
tau_prob_zero_draws
Posterior draws for the change in probability of zero outcome due to treatment
mcmChecks
MCMC diagnostics
credible_interval
Credible interval for the treatment effect
prior_ATE
Prior distribution for ATE
prior_tau_prob_zero
Prior distribution for tau_prob_zero
predict_list
List of predictions
new()
Get the package version
Get the posterior draws for ATE
Get the posterior draws for change in zero probability
Get the MCMC diagnostics
Get the credible interval
Get the prior for ATE
Get the prior for change in zero probability
Get the list of predictions
Create a new Bayesian Hurdle Log-Normal Model object.
hurdleLogNormal$new( data, y, x, treatment, mean_alpha_logit = -3, sd_alpha_logit = 2, mean_beta_logit = NULL, sd_beta_logit = NULL, tau_mean_logit = 0, tau_sd_logit = 0.5, mean_tau = 0, sigma_tau = 0.035, seed = 1982, fit = TRUE, ... )
data
Data frame to be used
y
Name of the outcome variable in the data frame
x
Vector of names of all covariates in the data frame
treatment
Name of the treatment indicator variable in the data frame
mean_alpha_logit
Prior mean for alpha in the logit part (default: -3)
sd_alpha_logit
Prior standard deviation for alpha in the logit part (default: 2)
mean_beta_logit
Prior mean for beta in the logit part (default: 0 vector)
sd_beta_logit
Prior standard deviation for beta in the logit part (default: 0.5 vector)
tau_mean_logit
Prior mean for the treatment effect in the logit part (default: 0)
tau_sd_logit
Prior standard deviation for the treatment effect in the logit part (default: 0.5)
mean_tau
Prior mean for the treatment effect in the log-normal part (default: 0)
sigma_tau
Prior standard deviation for the treatment effect in the log-normal part (default: 0.035)
seed
Seed for Stan fitting
fit
Flag for fitting the data to the model or not
...
Additional arguments for Stan
invisible
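To make the two parts of a hurdle log-normal model concrete, the sketch below simulates data from one: a logit part decides whether an outcome is exactly zero, and a log-normal part generates the positive outcomes. The function name, the argument names mu and sigma, and the convention that the logit part models the probability of a zero are all assumptions for illustration; the Stan model behind this class may be parameterized differently and also includes covariates.

```r
# Hypothetical sketch: simulate outcomes from a hurdle log-normal process.
simulate_hurdle_lognormal_sketch <- function(treated, alpha_logit = -3,
                                             tau_logit = 0, mu = 1,
                                             tau = 0, sigma = 0.5) {
  n <- length(treated)
  # Logit part (assumed): probability that the outcome is exactly zero.
  p_zero <- stats::plogis(alpha_logit + tau_logit * treated)
  is_zero <- stats::runif(n) < p_zero
  # Log-normal part: positive outcomes, shifted on the log scale by tau.
  positive <- stats::rlnorm(n, meanlog = mu + tau * treated, sdlog = sigma)
  ifelse(is_zero, 0, positive)
}

set.seed(1982)
treated <- rep(c(0, 1), times = 500)
y_sim <- simulate_hurdle_lognormal_sketch(
  treated,
  alpha_logit = 0, tau_logit = -0.5, tau = 0.035
)
```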
tracePlot()
Plot MCMC trace for the ATE and tau_prob_zero parameters
hurdleLogNormal$tracePlot(...)
...
Additional arguments for Stan
A ggplot object
calcProb()
Calculates the posterior probability of an effect being greater than, less than, or within a range defined by thresholds.
hurdleLogNormal$calcProb(effect_type, a = 0, b = NULL, prior = FALSE)
effect_type
The type of effect to calculate probability for ("ATE" or "tau_prob_zero")
a
Optional. Lower bound for the threshold.
b
Optional. Upper bound for the threshold.
prior
Logical. If TRUE, calculates probabilities based on the prior distribution. If FALSE (default), uses the posterior distribution.
A character string summarizing the estimated probability
pointEstimate()
Calculate point estimate of the effect
hurdleLogNormal$pointEstimate(effect_type, median = TRUE)
effect_type
The type of effect to calculate the point estimate for ("ATE" or "tau_prob_zero")
median
Logical value. If TRUE (default), the median of the draws is returned. If FALSE, the mean is returned.
A numeric value representing the point estimate.
credibleInterval()
Calculates credible interval for the effect of the intervention
hurdleLogNormal$credibleInterval(effect_type, width = 0.95, round = 2)
effect_type
The type of effect to calculate the credible interval for ("ATE" or "tau_prob_zero")
width
Numeric value between 0 and 1 representing the desired width of the credible interval (e.g., 0.95 for a 95% credible interval).
round
Integer value indicating the number of decimal places to round the lower and upper bounds of the credible interval.
A character string with the following information:
The probability associated with the specified width
The lower and upper bounds of the credible interval, rounded to the specified number of decimal places
plotPrior()
Plot prior distributions for ATE and tau_prob_zero
hurdleLogNormal$plotPrior(bins = 2000, xlim_ate = NULL, xlim_tau = NULL)
bins
Number of bins for the histograms
xlim_ate
Optional. Limits for the x-axis of the ATE histogram
xlim_tau
Optional. Limits for the x-axis of the tau_prob_zero histogram
A list containing two ggplot objects: one for ATE and one for tau_prob_zero
posteriorPredictiveCheck()
Performs Posterior Predictive Checks and generates a density overlay plot
hurdleLogNormal$posteriorPredictiveCheck(n = 50, xlim = NULL)
n
Number of posterior predictive samples to use in the plot
xlim
Optional. Limits for the x-axis of the plot
A ggplot object representing the density overlay plot
clone()
The objects of this class are cloneable with this method.
hurdleLogNormal$clone(deep = FALSE)
deep
Whether to make a deep clone.
A class for creating and managing Bayesian Logit Models
version
imt package version used to fit model
tau_draws
Posterior draws for the treatment effect
mcmChecks
MCMC diagnostics
credible_interval
Credible interval for the treatment effect
prior_eta
Prior distribution for eta
prior_tau
Prior distribution for tau
prior_mean_y
Prior distribution for mean y
eta_draws
Posterior draws for eta
predict_list
List of predictions
new()
Get the package version
Get the posterior draws for tau
Get the MCMC diagnostics
Get the credible interval
Get the prior for eta
Get the prior for tau
Get the prior for mean y
Get the posterior draws for eta
Get the list of predictions
Create a new Bayesian Logit Model object.
logit$new( data, y, x, treatment, mean_alpha, sd_alpha, mean_beta, sd_beta, tau_mean, tau_sd, seed = 1982, fit = TRUE, ... )
data
Data frame to be used
y
Name of the outcome variable in the data frame
x
Vector of names of all covariates in the data frame
treatment
Name of the treatment indicator variable in the data frame
mean_alpha
Prior mean for alpha
sd_alpha
Prior standard deviation for alpha
mean_beta
Prior mean for beta
sd_beta
Prior standard deviation for beta
tau_mean
Prior mean for the treatment effect estimation
tau_sd
Prior standard deviation for the treatment effect estimation
seed
Seed for Stan fitting
fit
Flag for fitting the data to the model or not
...
Additional arguments for Stan
invisible
tracePlot()
Plot MCMC trace for the eta and sigma parameters.
logit$tracePlot(...)
...
Additional arguments for Stan
A ggplot object.
calcProb()
Calculates the posterior probability of an effect being greater than, less than, or within a range defined by thresholds.
logit$calcProb(a = 0, b = NULL, prior = FALSE)
a
Optional. Lower bound for the threshold.
b
Optional. Upper bound for the threshold.
prior
Logical. If TRUE, calculates probabilities based on the prior distribution. If FALSE (default), uses the posterior distribution.
x
A numeric vector containing draws from the posterior
A character string summarizing the estimated probability
Calculate point estimate of the effect
This R6 method calculates the point estimate of the effect size based on the posterior draws of the eta parameter.
pointEstimate()
logit$pointEstimate(median = TRUE)
median
Logical value. If TRUE (default), the median of the eta draws is returned. If FALSE, the mean is returned.
This method uses the private$..eta_draws internal variable which contains MCMC draws of the eta parameter representing the effect size. Based on the specified median argument, the method calculates and returns either the median or the mean of the draws.
Calculates credible interval for the effect of the intervention
This R6 method calculates and returns a formatted statement summarizing the credible interval of a specified width for the effect of the intervention.
A numeric value representing the point estimate.
credibleInterval()
logit$credibleInterval(width = 0.75, round = 0)
width
Numeric value between 0 and 1 representing the desired width of the credible interval (e.g., 0.95 for a 95% credible interval).
round
Integer value indicating the number of decimal places to round the lower and upper bounds of the credible interval.
This method uses the private$..eta_draws internal variable containing MCMC draws of the eta parameter representing the effect size. It calculates the credible interval, stores it internally, and returns a formatted statement summarizing the findings.
A character string with the following information:
The probability associated with the specified width
The lower and upper bounds of the credible interval, rounded to the specified number of decimal places
vizdraws()
Plots impact's prior and posterior distributions.
logit$vizdraws(tau = FALSE, ...)
tau
Logical. If TRUE, plot tau instead of eta
...
other arguments passed to vizdraws.
An interactive plot of the prior and posterior distributions.
lollipop()
Plots lollipop chart for the prior and posterior of the impact being greater or less than a threshold.
For more details see vizdraws::lollipops().
logit$lollipop(threshold = 0, ...)
threshold
Cutoff used to calculate the probability. Defaults to zero percentage points.
...
other arguments passed to vizdraws.
A lollipop chart with the prior and posterior probability of the impact being above or below a threshold.
plotPrior()
Plots draws from the prior distribution of the outcome, tau, and impact in percentage points.
logit$plotPrior()
predict()
Predict new data
logit$predict(new_data, name = NULL, M = NULL, ...)
new_data
Data frame to be predicted
name
Group name of the prediction
M
Number of posterior draws to sample from
...
Additional arguments
invisible(self)
getPred()
Get posterior predictive draws
logit$getPred(name = NULL, ...)
name
Group name of the prediction
...
Additional arguments (not used)
Matrix of posterior predictive draws
predSummary()
Get point estimate, credible interval and prob summary of predictive draws
logit$predSummary( name = NULL, subgroup = NULL, median = TRUE, width = 0.75, round = 0, a = NULL, b = NULL, ... )
name
Optional. Group name of the prediction
subgroup
Optional. A boolean vector to get summary on the conditional group average
median
Optional. Logical value for using median or mean
width
Optional. Numeric value for credible interval width
round
Optional. Integer value for rounding
a
Optional. Lower bound threshold
b
Optional. Upper bound threshold
...
Additional arguments
A character string with summary information
predCompare()
Compare the average of the posterior draws of two groups
logit$predCompare( name1, name2, subgroup1 = NULL, subgroup2 = NULL, median = TRUE, width = 0.75, round = 0, a = NULL, b = NULL, ... )
name1
Group name of the first prediction to be compared
name2
Group name of the second prediction to be compared
subgroup1
Optional. A boolean vector for the first group
subgroup2
Optional. A boolean vector for the second group
median
Optional. Logical value for using median or mean
width
Optional. Numeric value for credible interval width
round
Optional. Integer value for rounding
a
Optional. Lower bound threshold
b
Optional. Upper bound threshold
...
Additional arguments
A character string with comparison summary
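A comparison of this kind can be sketched in base R. The sketch below is an assumption about how such a comparison might work, not the package's code: it assumes each prediction is stored as a matrix with one row per posterior draw and one column per observation, and the names pred1, pred2, and group_average are hypothetical.

```r
# Hypothetical predictive draws: rows are posterior draws, columns are units.
set.seed(5)
n_draws <- 1000
pred1 <- matrix(rbinom(n_draws * 50, 1, 0.6), nrow = n_draws)
pred2 <- matrix(rbinom(n_draws * 50, 1, 0.5), nrow = n_draws)

# Average outcome per draw, optionally restricted to a logical subgroup.
group_average <- function(pred, subgroup = NULL) {
  if (!is.null(subgroup)) pred <- pred[, subgroup, drop = FALSE]
  rowMeans(pred)
}

# Compare a subgroup of the first prediction with all of the second.
subgroup1 <- rep(c(TRUE, FALSE), 25)
diff_draws <- group_average(pred1, subgroup1) - group_average(pred2)

# Median and equal-tailed 75% interval of the difference, in percentage points.
summary_pp <- 100 * c(
  point = median(diff_draws),
  lower = unname(quantile(diff_draws, 0.125)),
  upper = unname(quantile(diff_draws, 0.875))
)
print(round(summary_pp, 1))
```

Working draw by draw keeps the uncertainty of both groups in the difference, which is why the averages are taken within each draw before subtracting.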
clone()
The objects of this class are cloneable with this method.
logit$clone(deep = FALSE)
deep
Whether to make a deep clone.
This function takes prior / posterior draws of the Bayesian logit model, applies the inverse logit (logistic) transformation to obtain probabilities, and then generates random samples from binomial distributions.
logitRng(alpha, tau, beta, treat, X, N)
alpha |
Numeric. A draw of alpha param from the logit model |
tau |
Numeric. A draw of tau param from the logit model |
beta |
Vector. A draw of beta params from the logit model |
treat |
A 0 / 1 vector of treatment indicator |
X |
Data to be predicted |
N |
Numeric. Size of the sample, should depend on size of the data. |
A vector of N samples of predictive draws
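The steps described for logitRng can be sketched in base R. This is a sketch under stated assumptions rather than the package's code: it assumes the linear predictor is alpha + tau * treat + X %*% beta and that each outcome is a single Bernoulli trial. The function name logit_rng_sketch and the simulated inputs are hypothetical.

```r
# Assumed form of the linear predictor: alpha + tau * treat + X %*% beta.
logit_rng_sketch <- function(alpha, tau, beta, treat, X, N) {
  X <- as.matrix(X)
  stopifnot(nrow(X) == N, length(treat) == N, ncol(X) == length(beta))
  eta <- alpha + tau * treat + as.vector(X %*% beta)
  p <- plogis(eta)               # inverse logit: maps eta to (0, 1)
  rbinom(N, size = 1, prob = p)  # one binomial (Bernoulli) draw per row
}

# Hypothetical inputs: a single draw of each parameter and simulated data.
set.seed(3)
N <- 200
X <- matrix(rnorm(N * 2), ncol = 2)
treat <- rep(c(0, 1), each = N / 2)
y_rep <- logit_rng_sketch(
  alpha = -0.5, tau = 1, beta = c(0.3, -0.2),
  treat = treat, X = X, N = N
)
print(table(y_rep, treat))
```

Repeating the call once per parameter draw would give one predictive data set per draw.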
Checks convergence, mixing, effective sample size, and divergent transitions
$new(fit, pars)
Runs diagnostics on the supplied stanfit object, restricted to parameters identified by the character vector pars.
Tests include:
Share of specified parameters with an Rhat less than 1.1. If any have an Rhat > 1.1, everything_looks_fine is set to FALSE.
Share of specified parameters with an n_eff at least 0.1% of the total number of posterior draws. If any have n_eff < 0.001 * N, everything_looks_fine is set to FALSE.
Share of specified parameters with an n_eff of at least 100. If any have n_eff < 100, everything_looks_fine is set to FALSE.
Number of divergent transitions during posterior sampling. If there are any whatsoever, everything_looks_fine is set to FALSE.
Share of posterior iterations where the sampler reached the maximum treedepth. If more than 25% do, everything_looks_fine is set to FALSE.
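The pass/fail rules listed above can be sketched as a small base R function. This is an illustration only: it takes already-extracted diagnostic values as plain inputs instead of a stanfit object, and the function name mcmc_checks_sketch, its arguments, and the returned element names are hypothetical.

```r
# Inputs are hypothetical summaries that would normally come from a Stan fit.
mcmc_checks_sketch <- function(rhat, n_eff, n_draws,
                               n_divergent, share_max_treedepth) {
  diagnostics <- list(
    share_rhat_ok = mean(rhat < 1.1),
    share_neff_ratio_ok = mean(n_eff >= 0.001 * n_draws),
    share_neff_100_ok = mean(n_eff >= 100),
    n_divergent = n_divergent,
    share_max_treedepth = share_max_treedepth
  )
  # Each rule mirrors one test in the list above.
  passed <- c(
    rhat = all(rhat <= 1.1),
    neff_ratio = all(n_eff >= 0.001 * n_draws),
    neff_100 = all(n_eff >= 100),
    divergent = n_divergent == 0,
    treedepth = share_max_treedepth <= 0.25
  )
  list(
    everything_looks_fine = all(passed),
    diagnostics = diagnostics,
    failed = names(passed)[!passed]
  )
}

good <- mcmc_checks_sketch(
  rhat = c(1.00, 1.01), n_eff = c(1500, 2200), n_draws = 4000,
  n_divergent = 0, share_max_treedepth = 0
)
bad <- mcmc_checks_sketch(
  rhat = c(1.00, 1.30), n_eff = c(1500, 2200), n_draws = 4000,
  n_divergent = 5, share_max_treedepth = 0
)
print(good$everything_looks_fine)
print(bad$failed)
```

A single failed rule is enough to flip everything_looks_fine to FALSE, so the failed element is the place to look first.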
everything_looks_fine
logical indicating whether all MCMC tests passed.
diagnostics
list of the outcome of each MCMC test
warnings
list of the warning messages from failed MCMC tests
new()
Initialize a new mcmcChecks object and run diagnostics
mcmcChecks$new(fit, pars)
fit
A stanfit object to check
pars
A character vector of parameter names to check
clone()
The objects of this class are cloneable with this method.
mcmcChecks$clone(deep = FALSE)
deep
Whether to make a deep clone.
Create a Meta-Analysis Object Using Data From Previous Studies
A meta analysis has raw data and draws from the lift's posterior distribution. This is represented by an R6 Class.
PosteriorATE
Draws from the posterior distribution of the average treatment effect.
checks
MCMC diagnostics
CredibleInterval
Lower and upper bounds of the credible interval
PointEstimate
Point estimate of the average treatment effect
fitted
Stan fit object
new()
Create a new meta analysis object.
metaAnalysis$new( data, point_estimates, standard_errors, id, mean_mu = 0, sd_mu = 0.05, ci_width = 0.75, X = NULL, run_estimation = 1, ... )
data
Data frame with point estimates and standard errors from previous studies.
point_estimates
Name of the variable in the data frame that contains the point estimates.
standard_errors
Name of the variable in the data frame that contains the standard errors of the point estimates.
id
Name of the variable in the data frame that contains the id of the studies.
mean_mu
Prior mean for the true lift in the population.
sd_mu
Prior standard deviation for the true lift in the population.
ci_width
Credible interval's width.
X
Covariates matrix.
run_estimation
Integer flag to control whether estimation is run (1) or not (0).
...
other arguments passed to rstan::sampling()
A new meta_analysis object.
PlotRawData()
Plots the raw data.
metaAnalysis$PlotRawData()
A plot with point estimates and 95% confidence intervals.
PlotLift()
Plots lift's prior and posterior distributions.
For more details see vizdraws::vizdraws().
metaAnalysis$PlotLift(...)
...
other arguments passed to vizdraws.
An interactive plot of the prior and posterior distributions.
UpdateCI()
Update the width of the credible interval.
metaAnalysis$UpdateCI(ci_width)
ci_width
New width for the credible interval. This number must be in the (0, 1) interval.
probability()
Calculates the probability that the lift is between a and b.
metaAnalysis$probability(a = -Inf, b = Inf, percent = TRUE)
a
Lower bound. By default -Inf.
b
Upper bound. By default Inf.
percent
A logical that indicates that a and b should be converted to percentage.
A string with the probability.
findings()
Calculates the point estimate and credible interval for the meta analysis.
metaAnalysis$findings(percent = TRUE)
percent
A logical that indicates that the point estimate should be converted to percent.
A string with the findings
clone()
The objects of this class are cloneable with this method.
metaAnalysis$clone(deep = FALSE)
deep
Whether to make a deep clone.
Bayesian Negative Binomial Model Factory
version
Package version used to fit the model
mcmChecks
MCMC diagnostics
credible_interval
Credible interval for the treatment effect
tau_draws
Posterior draws for the treatment effect
new()
Create a new Bayesian Negative Binomial Model object.
negativeBinomial$new( data, y, x, treatment, tau_mean, tau_sd, run_estimation = 1, seed = 1982, ... )
data
Data frame to be used
y
Name of the outcome variable in the data frame
x
Vector of names of all covariates in the data frame
treatment
Name of the treatment indicator variable in the data frame
tau_mean
Prior mean for the treatment effect estimation
tau_sd
Prior standard deviation for the treatment effect estimation
run_estimation
Integer flag to control whether estimation is run (1) or not (0)
seed
Seed for Stan fitting
...
Additional arguments for Stan
invisible
tracePlot()
Plot MCMC trace for the eta and sigma parameters.
negativeBinomial$tracePlot(...)
...
Additional arguments for Stan
A ggplot object.
posteriorProb()
Calculate posterior probability of effect exceeding a threshold
This function calculates the posterior probability of the effect being larger or smaller than a specified threshold.
negativeBinomial$posteriorProb(threshold = 0, gt = TRUE)
threshold
A numeric value specifying the threshold.
gt
A logical value indicating whether to calculate the probability of the effect being greater than (TRUE) or less than (FALSE) the threshold.
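The probability calculation for posteriorProb() amounts to a proportion of draws, which can be sketched in base R. The simulated draws and the helper name posterior_prob are hypothetical, and the printed wording is not the package's formatted statement.

```r
# Hypothetical stand-in for posterior draws of the effect size.
set.seed(2)
tau_draws <- rnorm(4000, mean = 0.5, sd = 1)

# Share of draws above (gt = TRUE) or below (gt = FALSE) the threshold.
posterior_prob <- function(draws, threshold = 0, gt = TRUE) {
  if (gt) mean(draws > threshold) else mean(draws < threshold)
}

p_gt <- posterior_prob(tau_draws, threshold = 0, gt = TRUE)
p_lt <- posterior_prob(tau_draws, threshold = 0, gt = FALSE)
cat(sprintf("Pr(effect > 0) = %.2f; Pr(effect < 0) = %.2f\n", p_gt, p_lt))
```

Because the draws are continuous, the two probabilities sum to one for any threshold.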
This function uses the private$..tau_draws internal variable to obtain draws from the posterior distribution of the effect size. Based on the specified arguments, the function calculates the proportion of draws exceeding/falling below the threshold and returns a formatted statement describing the estimated probability.
A character string summarizing the estimated probability.
pointEstimate()
Calculate point estimate of the effect
This R6 method calculates the point estimate of the effect size based on the posterior draws of the tau parameter.
negativeBinomial$pointEstimate(median = TRUE)
median
Logical value. If TRUE (default), the median of the tau draws is returned. If FALSE, the mean is returned.
This method uses the private$..tau_draws internal variable which contains MCMC draws of the tau parameter representing the effect size. Based on the specified median argument, the method calculates and returns either the median or the mean of the draws.
A numeric value representing the point estimate.
credibleInterval()
Calculates credible interval for the effect of the intervention
This R6 method calculates and returns a formatted statement summarizing the credible interval of a specified width for the effect of the intervention.
negativeBinomial$credibleInterval(width = 0.75, round = 0)
width
Numeric value between 0 and 1 representing the desired width of the credible interval (e.g., 0.95 for a 95% credible interval).
round
Integer value indicating the number of decimal places to round the lower and upper bounds of the credible interval.
This method uses the private$..tau_draws internal variable containing MCMC draws of the tau parameter representing the effect size. It calculates the credible interval, stores it internally, and returns a formatted statement summarizing the findings.
A character string with the following information:
The probability associated with the specified width
The lower and upper bounds of the credible interval, rounded to the specified number of decimal places
vizdraws()
Plots impact's prior and posterior distributions.
For more details see vizdraws::vizdraws().
negativeBinomial$vizdraws(...)
...
other arguments passed to vizdraws.
An interactive plot of the prior and posterior distributions.
clone()
The objects of this class are cloneable with this method.
negativeBinomial$clone(deep = FALSE)
deep
Whether to make a deep clone.
This function computes a point estimate from a numeric vector, returning either the median or the mean as a percentage.
pointEstimate(x, median = TRUE)
x |
A numeric vector containing the data from which to calculate the point estimate. |
median |
A logical value indicating whether to use the median (default: TRUE) or the mean (FALSE). |
This function provides a simple way to obtain either the median or mean of a numeric vector as a percentage. The choice between these two measures of central tendency can be controlled by the median argument.
A numeric value representing the chosen point estimate (median or mean) of the input vector x, multiplied by 100 to express it as a percentage.
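The behavior described above is simple enough to sketch in base R. The name point_estimate_sketch is hypothetical and the code is an illustration of the description, not the package's source.

```r
# Median (default) or mean of x, multiplied by 100.
point_estimate_sketch <- function(x, median = TRUE) {
  stopifnot(is.numeric(x), is.logical(median), length(median) == 1)
  center <- if (median) stats::median(x) else mean(x)
  center * 100
}

x <- c(0.01, 0.02, 0.03, 0.10)
est_median <- point_estimate_sketch(x)                  # median of x is 0.025
est_mean <- point_estimate_sketch(x, median = FALSE)    # mean of x is 0.04
print(c(median = est_median, mean = est_mean))
```

With skewed draws, as in this small example, the median and the mean can differ noticeably, which is why the choice is exposed as an argument.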
This function repeatedly randomizes treatment assignment (using .randomizer) until baseline equivalency is achieved across the specified variables, as measured by the checkBaseline function from the imt package. It can optionally stratify the randomization by specified groups.
randomize( data, variables, standard = "Not Concerned", seed = NULL, max_attempts = 100, pr_treated = 0.5, group_by = NULL )
data |
The input data frame containing pre-intervention variables. |
variables |
A vector of the names of the pre-intervention variables to check for baseline equivalency. |
standard |
The desired level of baseline equivalence. Must be one of "Not Concerned", "Concerned", or "Very Concerned". Default is "Not Concerned". |
seed |
(Optional) An integer to set the random seed for reproducibility of the initial randomization attempt. Subsequent attempts will use new random seeds. |
max_attempts |
The maximum number of randomization attempts to make before stopping and returning an error. |
pr_treated |
(Optional) The probability of a row being assigned to the treatment group (TRUE). Default is 0.5. |
group_by |
(Optional) A character vector of column names to stratify the randomization. If provided, the randomization will be done within each group defined by the specified columns. |
A new data frame with the added "treated" column, if baseline equivalency is achieved within the specified number of attempts. Otherwise, an error is thrown.
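The retry loop described above can be sketched in base R. This is a simplified illustration with several assumptions: balance is judged by an absolute standardized mean difference against a hypothetical cutoff max_std_diff (not the package's "Not Concerned" / "Concerned" / "Very Concerned" levels), later attempts simply continue the same random number stream, and stratification is omitted. The function name rerandomize_sketch is hypothetical.

```r
rerandomize_sketch <- function(data, variables, max_std_diff = 0.1,
                               seed = NULL, max_attempts = 100,
                               pr_treated = 0.5) {
  if (!is.null(seed)) set.seed(seed)
  for (attempt in seq_len(max_attempts)) {
    # Candidate assignment: each row is treated with probability pr_treated.
    treated <- runif(nrow(data)) < pr_treated
    # Absolute standardized mean difference for each baseline variable.
    std_diff <- vapply(variables, function(v) {
      x <- data[[v]]
      pooled_sd <- sqrt((var(x[treated]) + var(x[!treated])) / 2)
      abs(mean(x[treated]) - mean(x[!treated])) / pooled_sd
    }, numeric(1))
    # Accept the first assignment that is balanced on every variable.
    if (isTRUE(all(std_diff <= max_std_diff))) {
      data$treated <- treated
      attr(data, "attempts") <- attempt
      return(data)
    }
  }
  stop("Baseline equivalence not achieved in ", max_attempts, " attempts.")
}

# Hypothetical pre-intervention data.
set.seed(4)
df <- data.frame(age = rnorm(500, 40, 10), income = rnorm(500, 50, 15))
out <- rerandomize_sketch(df, c("age", "income"), seed = 123)
print(table(out$treated))
```

A stricter cutoff or more variables makes acceptable assignments rarer, so max_attempts may need to grow accordingly.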
This class provides methods to randomly assign treatments to a dataset while ensuring baseline covariate balance. It can handle both simple and stratified randomization.
version
The version of the imt package used for randomization.
data
The data frame with the assigned treatment.
seed
The random seed used for reproducibility.
balance_summary
A summary (or list of summaries) of the balance assessment after randomization.
balance_plot
A plot (or list of plots) of the balance assessment after randomization.
new()
Initialize a new Randomizer object.
randomizer$new( data, variables, standard = "Not Concerned", seed = NULL, max_attempts = 100, group_by = NULL )
data
The input data frame.
variables
A vector of covariate names to check for balance.
standard
The desired level of baseline equivalence. Must be one of "Not Concerned", "Concerned", or "Very Concerned". Default is "Not Concerned".
seed
(Optional) An integer to set the random seed.
max_attempts
(Optional) Maximum number of randomization attempts.
group_by
(Optional) A character vector of column names to stratify randomization.
A new randomizer object.
clone()
The objects of this class are cloneable with this method.
randomizer$clone(deep = FALSE)
deep
Whether to make a deep clone.
This function checks if a given vector is a logical vector (TRUE/FALSE) and whether its length matches the number of rows in a specified matrix. It is designed to validate subgroup vectors used for subsetting data.
validate_logical_vector(subgroup, N, name = NULL)
subgroup |
A logical vector representing the subgroup to be validated. |
N |
Length the subgroup should have. |
name |
(Optional) A string indicating the name of the group. |
This function performs two key validations:
Checks if the subgroup vector is logical.
Checks if the length of the subgroup vector matches N.
The original subgroup vector if it passes all validation checks.
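The two validations can be sketched in base R as follows. The function name validate_logical_vector_sketch and the error messages are hypothetical; only the two checks and the return value come from the description above.

```r
validate_logical_vector_sketch <- function(subgroup, N, name = NULL) {
  # Hypothetical label used only to make the error messages readable.
  label <- if (is.null(name)) "subgroup" else paste0("subgroup for '", name, "'")
  if (!is.logical(subgroup)) {
    stop(label, " must be a logical vector.")
  }
  if (length(subgroup) != N) {
    stop(label, " must have length ", N, ", not ", length(subgroup), ".")
  }
  subgroup
}

ok <- validate_logical_vector_sketch(c(TRUE, FALSE, TRUE), N = 3)

# Both failure modes raise an error; tryCatch captures them for inspection.
not_logical <- tryCatch(
  validate_logical_vector_sketch(c(1, 0, 1), N = 3),
  error = function(e) conditionMessage(e)
)
wrong_length <- tryCatch(
  validate_logical_vector_sketch(c(TRUE, FALSE), N = 3, name = "treated"),
  error = function(e) conditionMessage(e)
)
print(not_logical)
print(wrong_length)
```

Returning the input unchanged lets the check be used inline, for example when subsetting a matrix of draws by the validated subgroup.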