diffxpy.api.test.two_sample

diffxpy.api.test.two_sample(data: Union[anndata._core.anndata.AnnData, anndata._core.raw.Raw, numpy.ndarray, scipy.sparse.csr.csr_matrix, batchglm.models.base.input.InputDataBase], grouping: Union[str, numpy.ndarray, list], as_numeric: Union[List[str], Tuple[str], str] = (), test: str = 't-test', gene_names: Union[numpy.ndarray, list] = None, sample_description: pandas.core.frame.DataFrame = None, noise_model: str = None, size_factors: numpy.ndarray = None, batch_size: Union[None, int, Tuple[int, int]] = None, backend: str = 'numpy', train_args: dict = {}, training_strategy: Union[str, List[Dict[str, object]], Callable] = 'AUTO', is_sig_zerovar: bool = True, quick_scale: bool = None, dtype='float64', **kwargs) → diffxpy.testing.det._DifferentialExpressionTestSingle

Perform a differential expression test between two groups for each gene.

This function wraps the selected statistical test for the scenario of a two-sample comparison. All tests offered in this wrapper test for a difference in the mean parameter between the two samples. The exact tests are as follows (assuming the group labels are saved in a column named “group”):

  • “lrt” - (log-likelihood ratio test):

    Requires the fitting of 2 generalized linear models (full and reduced). The models are automatically assembled as follows; use the de.test.lrt() function if you would like to perform a different test.

    • full model location parameter: ~ 1 + group
    • full model scale parameter: ~ 1 + group
    • reduced model location parameter: ~ 1
    • reduced model scale parameter: ~ 1 + group
  • “wald” - Wald test:

    Requires the fitting of 1 generalized linear model. Tests the group coefficient of the location parameter model against 0.

    • model location parameter: ~ 1 + group
    • model scale parameter: ~ 1 + group

  • “t-test” - Welch’s t-test:

    Doesn’t require fitting of generalized linear models. Welch’s t-test between both observation groups.

  • “rank” - Wilcoxon rank sum (Mann-Whitney U) test:

    Doesn’t require fitting of generalized linear models. Wilcoxon rank sum (Mann-Whitney U) test between both observation groups.
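As an illustration of what the “t-test” option computes for a single gene, the following is a minimal pure-Python sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom. The helper name welch_t and the sample values are hypothetical and not part of diffxpy; the library's own implementation may differ in details and additionally reports p-values.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration only; not a diffxpy function.
    Assumes each group has at least two observations and that at least
    one group has non-zero variance.
    """
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)  # unbiased sample variances
    se2 = vx / nx + vy / ny            # squared standard error of the mean difference
    t = (mean(x) - mean(y)) / se2 ** 0.5
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df


# Made-up expression values of one gene in the two groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 4), round(dof, 4))
```

Unlike Student's t-test, the two group variances are not pooled, which is why the degrees of freedom are generally non-integer.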

Parameters:
  • data – Array-like or anndata.AnnData object containing observations. Input data matrix (observations x features) or (cells x genes).
  • grouping

    str, array

    • column in data.obs/sample_description which contains the split of observations into the two groups.
    • array of length num_observations containing group labels
  • as_numeric – Which columns of sample_description to treat as numeric rather than categorical. This yields columns in the design matrix which do not correspond to one-hot encoded discrete factors. This makes sense for number of genes, time, pseudotime or space, for example.
  • test

    str, statistical test to use. Possible options:

    • ’wald’
    • ’lrt’
    • ’t-test’: default
    • ’rank’
  • gene_names – optional list/array of gene names which will be used if data does not implicitly store these
  • sample_description – optional pandas.DataFrame containing sample annotations
  • size_factors – 1D array of transformed library size factors for each cell in the same order as in data
  • noise_model

    str, noise model to use in model-based tests. Possible options:

    • ’nb’: default
  • batch_size

    Argument controlling the memory load of the fitting procedure. For backends that allow chunking of operations, this parameter controls the size of the batch / chunk.

    • If backend is “tf1” or “tf2”: number of observations per batch
    • If backend is “numpy”: Tuple of (number of observations per chunk, number of genes per chunk)
  • backend

    Which linear algebra library to choose. This impacts the available noise models and optimizers / training strategies. Available are:

    • ”numpy” numpy
    • ”tf1” tensorflow1.* >= 1.13
    • ”tf2” tensorflow2.*
  • training_strategy

    {str, function, list} training strategy to use. Can be:

    • str: will use Estimator.TrainingStrategy[training_strategy] to train
    • function: Can be used to implement a custom training function, which will be called as training_strategy(estimator).
    • list of keyword dicts containing method arguments: Will call Estimator.train() once with each dict of method arguments.
  • is_sig_zerovar – Whether to assign p-value of 0 to a gene which has zero variance in both groups but not the same mean. If False, the p-value is set to np.nan.
  • quick_scale

    Depending on the optimizer, scale will be fitted faster and possibly less accurately.

    Useful in scenarios where fitting the exact scale is not absolutely necessary.

  • dtype

    Allows specifying the precision which should be used to fit data.

    Should be “float32” for single precision or “float64” for double precision.

  • kwargs – [Debugging] Additional arguments will be passed to the _fit method.
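To make the effect of as_numeric more concrete, the sketch below contrasts the two ways a covariate column can enter a design matrix: as one-hot (treatment-coded) indicator columns when it is treated as categorical, or as a single column of values when it is treated as numeric. The function encode_column is a hypothetical helper written for this illustration; it is not how diffxpy builds its design matrices internally.

```python
def encode_column(values, as_numeric=False):
    """Encode one covariate as design-matrix columns (excluding the intercept).

    Hypothetical helper for illustration only; not a diffxpy function.
    """
    if as_numeric:
        # Numeric covariate: a single column holding the values themselves.
        return [[float(v)] for v in values]
    # Categorical covariate: one indicator column per non-reference level.
    # The first (sorted) level is the reference, absorbed by the intercept.
    levels = sorted(set(values))
    return [[1.0 if v == level else 0.0 for level in levels[1:]] for v in values]


# Made-up time points for four observations.
time = [0, 1, 2, 2]
categorical = encode_column(time)
numeric = encode_column(time, as_numeric=True)
print(categorical)
print(numeric)
```

With three distinct time points, the categorical encoding needs two coefficients, whereas the numeric encoding fits a single slope.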