--- title: "Ensuring Model Feasibility" author: "Michael Robbins and Steven Davenport" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Ensuring Model Feasibility} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Using microSynth is easy when the models used to calculate weights are feasible. But as more variables are used for matching, especially when data is scarce or variables are sparse, the risk of an infeasible model increases. Below is a quick guide to how to troubleshoot model feasiblity issues. ## Causes of model infeasibility Model infeasibility becomes increasingly likely when: * There are few control observations * More variables used for matching * Matching variables are sparse (e.g., mostly zero) * Treatment units have extreme values for matching variables * Permutation weights are calculated in addition to main weights * Jackknife weights are calculated in addition to main weights ## Responses to an infeasible model As there are multiple causes of model infeasibility, there is an equally broad range of responses. ### Specification of matching variables If a model is found to be infeasible, the problem may trace back to matching variable specification. We recommend the following diagnostic steps: * Review the frequency of matching variables (e.g., with `hist()` or `table()`) to check for sparseness. Sparse variables are difficult to match on without large sample sizes. * Compare the distribution of variable values in treatment units to the un-treated units. * Attempt to reduce the number of matching variables, move variables from exact matches (`match.out`/`match.covar`) to best-possible matches (`match.out.min`/`match.covar.min`), or aggregate time-variant variables before matching. When attempts to match on a sparse variable cause model infeasibility, there are several solutions: * Do not attempt an exact match. If the variable is time-invariant, move it from `match.covar` to `match.covar.min`; if the variable is time-variant, move it from `match.out` to `match.out.min`. * If the variable is time-variant, aggregate the variable over multiple time periods before matching. If just one or several variables that appear to be sparse or for which the treatment contains values that are rare in the un-treated units, then the user can issue instructions for each of those variables to be aggregated over different time periods. (Those time periods do not have to be at regular intervals, for instance if the sparseness only occurs at certain points in the pre-intervention data.) Exercise 4 from the \link{Introduction} provides an example of this. If the user would like to aggregate all time-variant variables over the same regular time periods, then it is somewhat simple to pass `match.out` or `match.out.min` a vector of variable names, and specify the aggregation periods using `period`. ### Parameters for calculating weights If varying the specification of matching variables is not satisfactory, the user can set the parameters microSynth() uses for the calculation of weights. * `max.mse` may be raised. This relaxes the constraint governing matches for variables passed to `match.out` and `match.covar`. * Advanced users may wish to alter `maxit`, `cal.epsilon`, `calfun`, and `bounds`, which correspond to parameters from the `survey::calibrate()` and govern the calculation of weights. ### Calling on (computationally-intensive) back-up models By default microSynth() attempts to calculate weights using simple methods. But because these are not always sufficient to produce a feasible model, two arguments, `check.feas` and `use.backup`, specify how microsynth should find and use less restrictive backup models. The two arguments do not interact and can be set independently. `check.feas = TRUE` will search for a single model that yields satisfactory constraints for all purposes: estimating main weights, permutation weights, and jackknife residuals. The same model will be used for all purposes. `use.backup = TRUE` will calculate the main weights without checking for feasibility, but if weights appear to be poor (i.e., they do not satisfy the max.mse condition), then weights will be re-calculated using another model. This way, different backup models may be used for different purposes (i.e., for estimating main weights, permutation weights, and jackknife residuals).