diffxpy.api.test.wald

diffxpy.api.test.wald(data: Union[anndata._core.anndata.AnnData, anndata._core.raw.Raw, numpy.ndarray, scipy.sparse.csr.csr_matrix, batchglm.models.base.input.InputDataBase], factor_loc_totest: Optional[Union[str, List[str]]] = None, coef_to_test: Optional[Union[str, List[str]]] = None, formula_loc: Union[None, str] = None, formula_scale: Union[None, str] = '~1', as_numeric: Union[List[str], Tuple[str], str] = (), init_a: Union[numpy.ndarray, str] = 'AUTO', init_b: Union[numpy.ndarray, str] = 'AUTO', gene_names: Optional[Union[numpy.ndarray, list]] = None, sample_description: Union[None, pandas.core.frame.DataFrame] = None, dmat_loc: Optional[patsy.design_info.DesignMatrix] = None, dmat_scale: Optional[patsy.design_info.DesignMatrix] = None, constraints_loc: Union[None, List[str], Tuple[str, str], dict, numpy.ndarray] = None, constraints_scale: Union[None, List[str], Tuple[str, str], dict, numpy.ndarray] = None, noise_model: str = 'nb', size_factors: Optional[Union[numpy.ndarray, pandas.core.series.Series, str]] = None, batch_size: Union[None, int, Tuple[int, int]] = None, backend: str = 'numpy', train_args: dict = {}, training_strategy: Union[str, List[Dict[str, object]], Callable] = 'AUTO', quick_scale: bool = False, dtype='float64', **kwargs)

Perform Wald test for differential expression for each gene.

Parameters
  • data – Input data matrix (observations x features) or (cells x genes).

  • factor_loc_totest – str or list of strings. Factors of formula_loc to test with the Wald test, e.g. "condition", or ["batch", "condition"] if formula_loc is "~ 1 + batch + condition". A minimal call is sketched below.

  • coef_to_test – If the factor given by factor_loc_totest has more than two groups, this parameter allows you to specify the group which should be tested. Alternatively, if factor_loc_totest is not given, this list sets the exact coefficients which are to be tested.
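
    For example, a minimal sketch of the most common call (adata and sample_df are illustrative names for an AnnData object and a pandas.DataFrame of sample annotations):

      import diffxpy.api as de

      # Test the "condition" factor while controlling for "batch"
      test = de.test.wald(
          data=adata,
          formula_loc="~ 1 + batch + condition",
          factor_loc_totest="condition",
          sample_description=sample_df,
      )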

  • formula_loc – model formula for the location parameter model.

  • formula_scale – model formula for the scale parameter model.

  • as_numeric – Which columns of sample_description to treat as numeric rather than categorical. These yield columns in the design matrix which do not correspond to one-hot encoded discrete factors. This makes sense for covariates such as the number of genes, time, pseudotime or spatial coordinates, as sketched below.
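
    A sketch of testing a continuous covariate (the "pseudotime" column name is illustrative):

      # Treat pseudotime as a numeric covariate rather than a categorical factor
      test = de.test.wald(
          data=adata,
          formula_loc="~ 1 + pseudotime",
          factor_loc_totest="pseudotime",
          as_numeric=["pseudotime"],
          sample_description=sample_df,
      )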

  • init_a

    (Optional) Low-level initial values for a. Can be:

    • str:
      • "auto": automatically choose best initialization

      • "standard": initialize intercept with observed mean

      • "closed_form": try to initialize with closed form

    • np.ndarray: direct initialization of 'a'

  • init_b

    (Optional) Low-level initial values for b. Can be:

    • str:
      • "auto": automatically choose best initialization

      • "standard": initialize with zeros

      • "closed_form": try to initialize with closed form

    • np.ndarray: direct initialization of 'b'
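
    Both are usually left at the default "AUTO"; a sketch of overriding them explicitly:

      # Force closed-form initialization of the location model (a) and
      # zero initialization of the scale model (b)
      test = de.test.wald(
          data=adata,
          formula_loc="~ 1 + condition",
          factor_loc_totest="condition",
          init_a="closed_form",
          init_b="standard",
          sample_description=sample_df,
      )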

  • gene_names – optional list or array of gene names, used if data does not implicitly store these

  • sample_description – optional pandas.DataFrame containing sample annotations

  • dmat_loc – Pre-built location model design matrix. This overrides formula_loc and the sample description information given in data or sample_description.

  • dmat_scale – Pre-built scale model design matrix. This overrides formula_scale and the sample description information given in data or sample_description. See the sketch below.
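
    A sketch of supplying pre-built patsy design matrices, assuming the coefficient to test is then addressed via coef_to_test by its design-matrix column name; the exact name ("condition[T.treated]" here) depends on your factor levels:

      import patsy

      # Design matrices built explicitly; these override formula_loc / formula_scale
      dmat_loc = patsy.dmatrix("~ 1 + condition", sample_df)
      dmat_scale = patsy.dmatrix("~ 1", sample_df)
      test = de.test.wald(
          data=adata,
          dmat_loc=dmat_loc,
          dmat_scale=dmat_scale,
          coef_to_test="condition[T.treated]",
          gene_names=adata.var_names,
      )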

  • constraints_loc

    Constraints for location model. Can be one of the following:

    • np.ndarray:

      Array with constraints in rows and model parameters in columns. Each constraint contains non-zero entries for the set of parameters that has to sum to zero. This constraint is enforced by binding one parameter to the negative sum of the other parameters, effectively representing that parameter as a function of the other parameters. This dependent parameter is indicated by a -1 in this array, the independent parameters of that constraint (which may be dependent in an earlier constraint) are indicated by a 1. You should only use this option together with a pre-built design matrix for the location model, dmat_loc, for example via de.utils.setup_constrained().

    • dict:

      Every element of the dictionary corresponds to one set of equality constraints. Each set has to be an entry of the form {…, x: y, …} where x is the factor to be constrained and y is a factor by which the levels of x are grouped and then constrained. Set y="1" to constrain all levels of x to sum to zero, a single equality constraint.

      E.g. {"batch": "condition"}: batch levels within each condition are constrained to sum to zero. This is applicable if the repeats of an experiment within each condition are independent, so that the set-up ~ 1 + condition + batch is perfectly confounded. See the sketch after constraints_scale below.

      Can only group by non-constrained effects right now; use constraint_matrix_from_string for other cases.

    • list of strings or tuple of strings:

      String encoded equality constraints.

      E.g. ["batch1 + batch2 + batch3 = 0"]

    • None:

      No constraints are used; this is equivalent to using an identity matrix as a constraint matrix.

  • constraints_scale

    Constraints for scale model. Can be one of the following:

    • np.ndarray:

      Array with constraints in rows and model parameters in columns. Each constraint contains non-zero entries for the set of parameters that has to sum to zero. This constraint is enforced by binding one parameter to the negative sum of the other parameters, effectively representing that parameter as a function of the other parameters. This dependent parameter is indicated by a -1 in this array, the independent parameters of that constraint (which may be dependent in an earlier constraint) are indicated by a 1. You should only use this option together with a pre-built design matrix for the scale model, dmat_scale, for example via de.utils.setup_constrained().

    • dict:

      Every element of the dictionary corresponds to one set of equality constraints. Each set has to be an entry of the form {…, x: y, …} where x is the factor to be constrained and y is a factor by which the levels of x are grouped and then constrained. Set y="1" to constrain all levels of x to sum to zero, a single equality constraint.

      E.g. {"batch": "condition"}: batch levels within each condition are constrained to sum to zero. This is applicable if the repeats of an experiment within each condition are independent, so that the set-up ~ 1 + condition + batch is perfectly confounded. See the sketch below.

      Can only group by non-constrained effects right now; use constraint_matrix_from_string for other cases.

    • list of strings or tuple of strings:

      String encoded equality constraints.

      E.g. ["batch1 + batch2 + batch3 = 0"]

    • None:

      No constraints are used; this is equivalent to using an identity matrix as a constraint matrix.
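
    A sketch of the dict form for a batch-nested-in-condition design (the column names are illustrative):

      # Batch levels are constrained to sum to zero within each condition,
      # making the otherwise confounded set-up ~ 1 + condition + batch identifiable
      test = de.test.wald(
          data=adata,
          formula_loc="~ 1 + condition + batch",
          factor_loc_totest="condition",
          constraints_loc={"batch": "condition"},
          sample_description=sample_df,
      )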

  • size_factors – 1D array of transformed library size factors, one per cell, in the same order as in data; or the string name of a column in the sample description that contains the size factors. See the sketch below.
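
    Both forms as a sketch (the "size_factors" column name is illustrative):

      import numpy as np

      # One transformed size factor per cell, in the same order as the rows of data;
      # size_factors="size_factors" would instead read the column from sample_description
      test = de.test.wald(
          data=adata,
          formula_loc="~ 1 + condition",
          factor_loc_totest="condition",
          size_factors=np.asarray(sample_df["size_factors"]),
          sample_description=sample_df,
      )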

  • noise_model

    str, noise model to use in the model-based test. Possible options:

    • 'nb': negative binomial (default)

  • batch_size

    Argument controlling the memory load of the fitting procedure. For backends that allow chunking of operations, this parameter controls the size of the batch / chunk.

    • If backend is "tf1" or "tf2": number of observations per batch

    • If backend is "numpy": tuple of (number of observations per chunk, number of genes per chunk); see the sketch after backend below

  • backend

    Which linear algebra library to choose. This impacts the available noise models and optimizers / training strategies. Available are:

    • "numpy": numpy

    • "tf1": tensorflow 1.*, >= 1.13

    • "tf2": tensorflow 2.*
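
    The shape of batch_size follows the backend; a sketch for the numpy backend:

      # The numpy backend chunks over both observations and genes
      test = de.test.wald(
          data=adata,
          formula_loc="~ 1 + condition",
          factor_loc_totest="condition",
          backend="numpy",
          batch_size=(1000, 500),  # (observations per chunk, genes per chunk)
          sample_description=sample_df,
      )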

  • training_strategy

    {str, function, list} training strategy to use. Can be:

    • str: will use Estimator.TrainingStrategy[training_strategy] to train

    • function: a custom training function; it will be called as training_strategy(estimator).

    • list of keyword dicts containing method arguments: Will call Estimator.train() once with each dict of method arguments.
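
    A sketch of the callable form; the keyword arguments accepted by Estimator.train() are backend-specific, so only a bare call is assumed here:

      # The callable receives the estimator object once fitting is set up
      def my_strategy(estimator):
          estimator.train()  # accepted keyword arguments depend on the backend

      test = de.test.wald(
          data=adata,
          formula_loc="~ 1 + condition",
          factor_loc_totest="condition",
          training_strategy=my_strategy,
          sample_description=sample_df,
      )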

  • quick_scale

    Depending on the optimizer, the scale parameter will be fitted faster, possibly at the cost of accuracy.

    Useful in scenarios where fitting the exact scale is not absolutely necessary.

  • dtype

    Allows specifying the floating-point precision which should be used to fit the data.

    Should be “float32” for single precision or “float64” for double precision.

  • kwargs – [Debugging] Additional arguments will be passed to the _fit method.
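
Example

  Putting it together, a minimal end-to-end sketch (adata with a "condition" annotation in adata.obs is an illustrative input):

    import diffxpy.api as de

    test = de.test.wald(
        data=adata,
        formula_loc="~ 1 + condition",
        factor_loc_totest="condition",
        sample_description=adata.obs,
        noise_model="nb",
        backend="numpy",
        dtype="float64",
    )
    # Per-gene results as a pandas.DataFrame, including p-values (pval),
    # multiple-testing corrected q-values (qval) and log2 fold changes
    results = test.summary()
    print(results.sort_values("qval").head())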