diffxpy.api.test.versus_rest

diffxpy.api.test.versus_rest(data: Union[anndata._core.anndata.AnnData, anndata._core.raw.Raw, numpy.ndarray, scipy.sparse.csr.csr_matrix, batchglm.models.base.input.InputDataBase], grouping: Union[str, numpy.ndarray, list], as_numeric: Union[List[str], Tuple[str], str] = (), test: str = 'wald', gene_names: Optional[Union[numpy.ndarray, list]] = None, sample_description: Optional[pandas.core.frame.DataFrame] = None, noise_model: Optional[str] = None, size_factors: Optional[numpy.ndarray] = None, batch_size: Union[None, int, Tuple[int, int]] = None, backend: str = 'numpy', train_args: dict = {}, training_strategy: Union[str, List[Dict[str, object]], Callable] = 'AUTO', is_sig_zerovar: bool = True, quick_scale: Optional[bool] = None, dtype='float64', pval_correction: str = 'global', keep_full_test_objs: bool = False, **kwargs)

Perform differential expression tests for each gene, comparing each group against the rest of the data set.

This function wraps the selected statistical test for the scenario of a two-sample comparison. All tests offered in this wrapper test for the difference of the mean parameter of both samples. We note that the much more efficient default method is coefficient based and only requires one model fit.

The exact tests are as follows (assuming the group labels are saved in a column named “group”). Each test is executed on the entire data set, with the labels modified so that the target group is one group and all remaining groups are merged into a second, reference group:

  • “lrt” - log-likelihood ratio test:

    Requires the fitting of 2 generalized linear models (full and reduced).

    • full model location parameter: ~ 1 + group

    • full model scale parameter: ~ 1 + group

    • reduced model location parameter: ~ 1

    • reduced model scale parameter: ~ 1 + group

  • “wald” - Wald test:

    Requires the fitting of 1 generalized linear model.

    • model location parameter: ~ 1 + group

    • model scale parameter: ~ 1 + group

    Test the group coefficient of the location parameter model against 0.

  • “t-test” - Welch’s t-test:

    Doesn’t require fitting of generalized linear models. Welch’s t-test between both observation groups.

  • “rank” - Wilcoxon rank sum (Mann-Whitney U) test:

    Doesn’t require fitting of generalized linear models. Wilcoxon rank sum (Mann-Whitney U) test between both observation groups.
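The relabeling described above, combined with the “t-test” option, can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `one_vs_rest_labels` and `welch_t` and the toy data are assumptions made for this example, and only the test statistic and the Welch-Satterthwaite degrees of freedom are computed (no p-value).

```python
import math
from statistics import mean, variance


def one_vs_rest_labels(labels, target):
    """Collapse multi-group labels into target (1) versus rest (0)."""
    return [1 if lab == target else 0 for lab in labels]


def welch_t(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Toy expression values of one gene across six cells (hypothetical data).
expression = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]
groups = ["a", "a", "a", "b", "c", "c"]

results = {}
for target in sorted(set(groups)):
    binary = one_vs_rest_labels(groups, target)
    in_group = [v for v, b in zip(expression, binary) if b == 1]
    rest = [v for v, b in zip(expression, binary) if b == 0]
    if len(in_group) < 2 or len(rest) < 2:
        continue  # a sample variance needs at least two observations
    results[target] = welch_t(in_group, rest)
```

Every iteration uses all observations; only the labels change, which mirrors the "target group versus merged reference group" setup described above.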

Parameters
  • data – Array-like or anndata.AnnData object containing observations. Input data matrix (observations x features) or (cells x genes).

  • grouping

    str, array

    • column in data.obs/sample_description which contains the assignment of observations to groups.

    • array of length num_observations containing group labels

  • as_numeric – Which columns of sample_description to treat as numeric and not as categorical. This yields columns in the design matrix which do not correspond to one-hot encoded discrete factors. This makes sense, for example, for number of genes, time, pseudotime or space.

  • test

    str, statistical test to use. Possible options (see function description):

    • 'wald'

    • 'lrt'

    • 't-test'

    • 'rank'

  • gene_names – optional list/array of gene names which will be used if data does not implicitly store these

  • sample_description – optional pandas.DataFrame containing sample annotations

  • pval_correction

    Choose between global and test-wise correction. Can be:

    • “global”: correct all p-values in one operation

    • “by_test”: correct the p-values of each test individually
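The difference between the two pval_correction modes can be illustrated with a small pure-Python sketch. The source does not state which multiple-testing procedure is applied, so the Benjamini-Hochberg step-up procedure used here is an assumption, as are the helper name `benjamini_hochberg` and the toy p-values.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values: one list per group-versus-rest test,
# one entry per gene.
pvals_by_test = {"a": [0.01, 0.04], "b": [0.03, 0.20]}

# "by_test": each test's p-values are corrected on their own.
by_test = {k: benjamini_hochberg(v) for k, v in pvals_by_test.items()}

# "global": all p-values are pooled, corrected once, then split back up.
keys = list(pvals_by_test)
pooled = [p for k in keys for p in pvals_by_test[k]]
pooled_adj = iter(benjamini_hochberg(pooled))
global_ = {k: [next(pooled_adj) for _ in pvals_by_test[k]] for k in keys}
```

In this toy case the smallest p-value is adjusted over four hypotheses in the pooled variant but over only two in the per-test variant, so the two modes generally give different corrected values.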

  • size_factors – 1D array of transformed library size factors for each cell in the same order as in data

  • noise_model

    str, noise model to use in model-based tests. Possible options:

    • ’nb’: default

  • batch_size

    Argument controlling the memory load of the fitting procedure. For backends that allow chunking of operations, this parameter controls the size of the batch / chunk.

    • If backend is “tf1” or “tf2”: number of observations per batch

    • If backend is “numpy”: Tuple of (number of observations per chunk, number of genes per chunk)
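To make the tuple form of batch_size concrete, the sketch below tiles an observations x genes matrix into chunks of at most (observations per chunk, genes per chunk). It only illustrates the meaning of the two numbers; the helper `iter_chunks` is hypothetical and says nothing about how the backend actually schedules its work.

```python
def iter_chunks(n_obs, n_genes, batch_size):
    """Yield (obs_slice, gene_slice) pairs covering an n_obs x n_genes matrix."""
    obs_per_chunk, genes_per_chunk = batch_size
    for o in range(0, n_obs, obs_per_chunk):
        for g in range(0, n_genes, genes_per_chunk):
            yield (
                slice(o, min(o + obs_per_chunk, n_obs)),
                slice(g, min(g + genes_per_chunk, n_genes)),
            )


# A 5 x 3 matrix split with batch_size=(2, 2): 3 observation blocks
# times 2 gene blocks.
chunks = list(iter_chunks(n_obs=5, n_genes=3, batch_size=(2, 2)))
```

Smaller chunk sizes lower the peak memory needed per step at the cost of more steps.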

  • backend

    Which linear algebra library to choose. This impacts the available noise models and optimizers / training strategies. Available are:

    • “numpy”: numpy

    • “tf1”: tensorflow1.* >= 1.13

    • “tf2”: tensorflow2.*

  • training_strategy

    {str, function, list} training strategy to use. Can be:

    • str: will use Estimator.TrainingStrategy[training_strategy] to train

    • function: Can be used to implement a custom training function; it will be called as training_strategy(estimator).

    • list of keyword dicts containing method arguments: Will call Estimator.train() once with each dict of method arguments.
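The three accepted forms of training_strategy can be pictured as a small dispatch. Everything in this sketch is hypothetical: `ToyEstimator` stands in for the real estimator class, its strategy names are invented, and the assumption that a named strategy resolves to a list of keyword dicts is not confirmed by the text above.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not the library's class."""

    # Assumption: a named strategy maps to a list of keyword dicts for train().
    TrainingStrategy = {
        "DEFAULT": [{"max_steps": 10}],
        "QUICK": [{"max_steps": 2}],
    }

    def __init__(self):
        self.calls = []

    def train(self, **kwargs):
        # Record the arguments instead of fitting anything.
        self.calls.append(kwargs)


def run_training(estimator, training_strategy):
    """Dispatch on the three forms: function, str, or list of keyword dicts."""
    if callable(training_strategy):
        training_strategy(estimator)
        return
    if isinstance(training_strategy, str):
        training_strategy = estimator.TrainingStrategy[training_strategy]
    for kwargs in training_strategy:
        estimator.train(**kwargs)


est = ToyEstimator()
run_training(est, [{"max_steps": 1}, {"max_steps": 3}])  # two train() calls
```

Passing a function gives full control over the training loop, whereas the list form only varies the arguments of successive train() calls.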

  • quick_scale

    Depending on the optimizer, scale will be fitted faster and possibly less accurately.

    Useful in scenarios where fitting the exact scale is not absolutely necessary.

  • dtype

    Allows specifying the precision which should be used to fit data.

    Should be “float32” for single precision or “float64” for double precision.

  • is_sig_zerovar – Whether to assign a p-value of 0 to a gene which has zero variance in both groups but not the same mean. If False, the p-value is set to np.nan.

  • kwargs – [Debugging] Additional arguments will be passed to the _fit method.
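The rule stated for is_sig_zerovar can be written down as a short sketch. The function name `zero_variance_pvalue` is hypothetical, and returning None for cases the rule does not cover is a choice made for this illustration only; the text above does not say how those cases are handled.

```python
import math
from statistics import mean, pvariance


def zero_variance_pvalue(x0, x1, is_sig_zerovar=True):
    """Return the special-case p-value, or None if the rule does not apply."""
    if pvariance(x0) != 0 or pvariance(x1) != 0:
        return None  # at least one group varies: the regular test applies
    if mean(x0) == mean(x1):
        return None  # identical constants: not covered by the rule above
    # Zero variance in both groups, but different means.
    return 0.0 if is_sig_zerovar else math.nan


p_sig = zero_variance_pvalue([1, 1, 1], [3, 3, 3])
p_nan = zero_variance_pvalue([1, 1, 1], [3, 3, 3], is_sig_zerovar=False)
```

With the default setting such genes are reported as significant; with is_sig_zerovar=False they carry a missing value and can be filtered out downstream.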