Create a Model from a formula and dataframe.
Parameters:
formula : str or generic Formula object
data : array-like
re_formula : string
vc_formula : dict-like
subset : array-like
args : extra arguments
kwargs : extra keyword arguments
Returns:
model : Model instance
Notes
data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation.
If the variance component is intended to produce random intercepts for disjoint subsets of a group, specified by string labels or a categorical data value, always use ‘0 +’ in the formula so that no overall intercept is included.
If the variance components specify random slopes and you do not also want a random group-level intercept in the model, then use ‘0 +’ in the formula to exclude the intercept.
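The effect of ‘0 +’ can be checked by looking at the design matrix columns a formula produces. The sketch below assumes patsy as the formula engine and uses a small made-up data frame; the column names `classroom` and `pretest` are illustrative only.

```python
import pandas as pd
import patsy

# Hypothetical toy data: three classrooms and a pretest score.
df = pd.DataFrame({
    "classroom": ["a", "a", "b", "b", "c", "c"],
    "pretest": [10.0, 12.0, 9.0, 11.0, 14.0, 13.0],
})

# With '0 +', each classroom gets its own indicator column and
# no intercept column is added.
no_intercept_cols = patsy.dmatrix(
    "0 + C(classroom)", df
).design_info.column_names

# Without '0 +', an Intercept column is added and one classroom
# becomes the reference level.
with_intercept_cols = patsy.dmatrix(
    "C(classroom)", df
).design_info.column_names

# For a random slope, '0 + pretest' keeps only the pretest column.
slope_cols = patsy.dmatrix("0 + pretest", df).design_info.column_names

print(no_intercept_cols)
print(with_intercept_cols)
print(slope_cols)
```

The first and third column lists contain no intercept, which is the form the notes above recommend for variance component formulas.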
The variance components formulas are processed separately for each group. If a variable is categorical, the results will not be affected by whether the group labels are distinct or re-used over the top-level groups.
This method currently does not correctly handle missing values, so missing values should be explicitly dropped from the DataFrame before calling this method.
Examples
Suppose we have an educational data set with students nested in classrooms nested in schools. The students take a test, and we want to relate the test scores to the students’ ages, while accounting for the effects of classrooms and schools. The school will be the top-level group, and the classroom is a nested group that is specified as a variance component. Note that the schools may have different numbers of classrooms, and the classroom labels may (but need not) be different across the schools.
>>> vc = {'classroom': '0 + C(classroom)'}
>>> MixedLM.from_formula('test_score ~ age', vc_formula=vc,
...                      re_formula='1', groups='school', data=data)
Now suppose we also have a previous test score called ‘pretest’. If we want the relationship between pretest scores and the current test to vary by classroom, we can specify a random slope for the pretest score:
>>> vc = {'classroom': '0 + C(classroom)', 'pretest': '0 + pretest'}
>>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc,
...                      re_formula='1', groups='school', data=data)
The following model is almost equivalent to the previous one, but here the classroom random intercept and pretest slope may be correlated.
>>> vc = {'classroom': '0 + C(classroom)'}
>>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc,
...                      re_formula='1 + pretest', groups='school',
...                      data=data)