diffxpy.api.fit.model

diffxpy.api.fit.model(data: Union[anndata._core.anndata.AnnData, anndata._core.raw.Raw, numpy.ndarray, scipy.sparse.csr.csr_matrix, batchglm.models.base.input.InputDataBase], formula_loc: Union[None, str] = None, formula_scale: Union[None, str] = '~1', as_numeric: Union[List[str], Tuple[str], str] = (), init_a: Union[numpy.ndarray, str] = 'AUTO', init_b: Union[numpy.ndarray, str] = 'AUTO', gene_names: Optional[Union[numpy.ndarray, list]] = None, sample_description: Union[None, pandas.core.frame.DataFrame] = None, dmat_loc: Optional[patsy.design_info.DesignMatrix] = None, dmat_scale: Optional[patsy.design_info.DesignMatrix] = None, constraints_loc: Union[None, List[str], Tuple[str, str], dict, numpy.ndarray] = None, constraints_scale: Union[None, List[str], Tuple[str, str], dict, numpy.ndarray] = None, noise_model: str = 'nb', size_factors: Optional[Union[numpy.ndarray, pandas.core.series.Series, str]] = None, batch_size: Optional[int] = None, training_strategy: Union[str, List[Dict[str, object]], Callable] = 'AUTO', quick_scale: bool = False, dtype='float64', **kwargs)

Fit model via maximum likelihood for each gene.
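A minimal usage sketch (the toy count matrix, the “condition” column and the gene names are illustrative assumptions, not part of the API):

    import numpy as np
    import pandas as pd
    import diffxpy.api as de

    # Toy data: 6 observations (cells) x 100 features (genes).
    counts = np.random.negative_binomial(n=5, p=0.5, size=(6, 100))
    sample_description = pd.DataFrame({"condition": ["A", "A", "A", "B", "B", "B"]})

    estim = de.fit.model(
        data=counts,
        formula_loc="~1 + condition",   # location model
        formula_scale="~1",             # scale model (default)
        sample_description=sample_description,
        gene_names=["gene_%i" % i for i in range(100)],
        noise_model="nb",
    )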

Parameters
  • data – Input data matrix (observations x features) or (cells x genes).

  • formula_loc – Model formula for the location and scale parameter models. If not specified, formula will be used instead.

  • formula_scale – Model formula for the scale parameter model. If not specified, formula will be used instead.

  • as_numeric – Which columns of sample_description to treat as numeric rather than as categorical. This yields columns in the design matrix which do not correspond to one-hot encoded discrete factors. This is useful for covariates such as number of genes, time, pseudotime or spatial coordinates, for example.

  • init_a

    (Optional) Low-level initial values for a. Can be:

    • str:
      • “auto”: automatically choose best initialization

      • “standard”: initialize intercept with observed mean

      • “closed_form”: try to initialize with closed form

    • np.ndarray: direct initialization of ‘a’

  • init_b

    (Optional) Low-level initial values for b. Can be:

    • str:
      • “auto”: automatically choose best initialization

      • “standard”: initialize with zeros

      • “closed_form”: try to initialize with closed form

    • np.ndarray: direct initialization of ‘b’

  • gene_names – optional list/array of gene names which will be used if data does not implicitly store these

  • sample_description – optional pandas.DataFrame containing sample annotations

  • dmat_loc – Pre-built location model design matrix. This overrides formula_loc and the sample description information given in data or sample_description.

  • dmat_scale – Pre-built scale model design matrix. This overrides formula_scale and the sample description information given in data or sample_description.

  • constraints_loc

    Constraints for location model. Can be one of the following:

    • np.ndarray:

      Array with constraints in rows and model parameters in columns. Each constraint contains non-zero entries for the set of parameters that has to sum to zero. This constraint is enforced by binding one parameter to the negative sum of the other parameters, effectively representing that parameter as a function of the other parameters. This dependent parameter is indicated by a -1 in this array, the independent parameters of that constraint (which may be dependent in an earlier constraint) are indicated by a 1. You should only use this option together with a pre-built design matrix for the location model, dmat_loc, for example via de.utils.setup_constrained().

    • dict:

      Every element of the dictionary corresponds to one set of equality constraints. Each set has to be an entry of the form {…, x: y, …} where x is the factor to be constrained and y is a factor by which levels of x are grouped and then constrained. Set y=“1” to constrain all levels of x to sum to one, a single equality constraint.

      E.g.: {“batch”: “condition”} – batch levels within each condition are constrained to sum to zero. This is applicable if repeats of an experiment within each condition are independent, so that the set-up ~1+condition+batch is perfectly confounded. A sketch of this dict form is given after the parameter list.

      Can only group by non-constrained effects right now; use constraint_matrix_from_string for other cases.

    • list of strings or tuple of strings:

      String encoded equality constraints.

      E.g. [“batch1 + batch2 + batch3 = 0”]

    • None:

      No constraints are used, this is equivalent to using an identity matrix as a constraint matrix.

  • constraints_scale

    Constraints for scale model. Can be one of the following:

    • np.ndarray:

      Array with constraints in rows and model parameters in columns. Each constraint contains non-zero entries for the set of parameters that has to sum to zero. This constraint is enforced by binding one parameter to the negative sum of the other parameters, effectively representing that parameter as a function of the other parameters. This dependent parameter is indicated by a -1 in this array, the independent parameters of that constraint (which may be dependent in an earlier constraint) are indicated by a 1. You should only use this option together with a pre-built design matrix for the scale model, dmat_scale, for example via de.utils.setup_constrained().

    • dict:

      Every element of the dictionary corresponds to one set of equality constraints. Each set has to be an entry of the form {…, x: y, …} where x is the factor to be constrained and y is a factor by which levels of x are grouped and then constrained. Set y=“1” to constrain all levels of x to sum to one, a single equality constraint.

      E.g.: {“batch”: “condition”} – batch levels within each condition are constrained to sum to zero. This is applicable if repeats of an experiment within each condition are independent, so that the set-up ~1+condition+batch is perfectly confounded.

      Can only group by non-constrained effects right now; use constraint_matrix_from_string for other cases.

    • list of strings or tuple of strings:

      String encoded equality constraints.

      E.g. [“batch1 + batch2 + batch3 = 0”]

    • None:

      No constraints are used, this is equivalent to using an identity matrix as a constraint matrix.

  • size_factors – 1D array of transformed library size factors for each cell, in the same order as in data, or a string naming the column of the sample description that contains the size factors.

  • noise_model

    str, noise model to use for the fit. Possible options:

    • ’nb’: default

  • batch_size – The batch size to use for the estimator.

  • training_strategy

    {str, function, list} training strategy to use. Can be:

    • str: will use Estimator.TrainingStrategy[training_strategy] to train

    • function: can be used to implement a custom training routine; it will be called as training_strategy(estimator) (see the sketch after this parameter list).

    • list of keyword dicts containing method arguments: Will call Estimator.train() once with each dict of method arguments.

  • quick_scale

    Depending on the optimizer, scale will be fitted faster and possibly less accurately.

    Useful in scenarios where fitting the exact scale is not absolutely necessary.

  • dtype

    Allows specifying the precision which should be used to fit data.

    Should be “float32” for single precision or “float64” for double precision.

  • kwargs – [Debugging] Additional arguments will be passed to the _fit method.
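As referenced under constraints_loc, a hedged sketch of the dict form of equality constraints; the “condition” and “batch” columns of the sample description are illustrative assumptions:

    import numpy as np
    import pandas as pd
    import diffxpy.api as de

    counts = np.random.negative_binomial(n=5, p=0.5, size=(6, 100))

    # Sample annotation with batches nested in conditions (illustrative):
    sample_description = pd.DataFrame({
        "condition": ["A", "A", "A", "B", "B", "B"],
        "batch": ["b1", "b2", "b3", "b4", "b5", "b6"],
    })

    # Batch levels are constrained to sum to zero within each condition,
    # so the otherwise confounded set-up ~1+condition+batch is identifiable.
    estim = de.fit.model(
        data=counts,
        formula_loc="~1 + condition + batch",
        sample_description=sample_description,
        constraints_loc={"batch": "condition"},
    )

    # Equivalent string-encoded form (level names are illustrative):
    # constraints_loc=["b1 + b2 + b3 = 0", "b4 + b5 + b6 = 0"]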
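Similarly, as referenced under training_strategy, a sketch of passing a custom training routine; the keyword arguments accepted by the estimator's train() method depend on the backend and are therefore not shown:

    def my_training_strategy(estimator):
        # diffxpy calls this as training_strategy(estimator);
        # custom fitting logic can be placed here.
        estimator.train()

    estim = de.fit.model(
        data=counts,
        formula_loc="~1 + condition",
        sample_description=sample_description,
        training_strategy=my_training_strategy,
    )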

Returns

An estimator instance that contains all estimation relevant attributes and the model in estim.model. The attributes of the model depend on the noise model and the covariates used. We provide documentation for the model class in the model section of the documentation.
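For example, the fitted model can be inspected as follows; attribute names vary with the noise model, so only generic introspection is shown:

    model = estim.model
    print(type(model))
    # list the public attributes of the fitted model
    print([attr for attr in dir(model) if not attr.startswith("_")])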