pypairs.pairs.sandbag

pypairs.pairs.sandbag(data, annotation=None, gene_names=None, sample_names=None, fraction=0.65, filter_genes=None, filter_samples=None)

Calculate ‘marker pairs’ from a gene count matrix (cells x genes).

A pair of genes (g1, g2) is considered a marker for a category if its expression ordering changes from g1 > g2 in that category to g1 < g2 in all other categories, for at least a given fraction of the cells in the category (see the fraction parameter).
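The criterion above can be illustrated with a small pure-Python sketch. It is not the pypairs implementation: the helper name `marker_pairs_sketch`, the toy values and the choice to apply the `fraction` threshold in every category (both where g1 > g2 and where g1 < g2) are assumptions made for illustration.

```python
from itertools import permutations


def marker_pairs_sketch(matrix, labels, gene_names, fraction=0.65):
    """Toy restatement of the marker-pair criterion (hypothetical helper)."""
    categories = sorted(set(labels))
    # Group the rows (cells) of the matrix by their category label.
    rows = {c: [r for r, lab in zip(matrix, labels) if lab == c] for c in categories}

    def share(cells, i, j):
        # Share of cells in which gene i is strictly higher than gene j.
        return sum(1 for cell in cells if cell[i] > cell[j]) / len(cells)

    result = {c: [] for c in categories}
    for i, j in permutations(range(len(gene_names)), 2):
        for c in categories:
            # Assumption: the threshold applies within each category separately.
            up_here = share(rows[c], i, j) >= fraction
            down_elsewhere = all(
                share(rows[o], j, i) >= fraction for o in categories if o != c
            )
            if up_here and down_elsewhere:
                result[c].append((gene_names[i], gene_names[j]))
    return result


# Rows are cells, columns are genes (hypothetical toy values).
matrix = [
    [9, 1, 5],  # cell_1, category G1
    [8, 2, 5],  # cell_2, category G1
    [1, 9, 5],  # cell_3, category S
    [2, 7, 5],  # cell_4, category S
]
labels = ["G1", "G1", "S", "S"]
genes = ["geneA", "geneB", "geneC"]
pairs_by_category = marker_pairs_sketch(matrix, labels, genes)
```

In this toy data geneA is higher than geneB in both G1 cells and lower in both S cells, so (geneA, geneB) ends up as a marker pair for G1 and the reversed pair as a marker for S.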

data can be of type AnnData, DataFrame or ndarray and should contain the raw or normalized gene counts of shape n_obs * n_vars. Rows correspond to cells and columns to genes.

  • If data is an AnnData object, the category for each sample should be in data.obs['category'], gene names in data.var_names and sample names in data.obs_names.
  • If data is a DataFrame object, gene names can be in df.columns or passed via gene_names, and sample names can be in df.index or passed via sample_names. The category for each sample must be passed via annotation.
    • annotation must be of the form {'category1': ['sample_1', 'sample_2', …], …}. The list of samples used for indexing can contain integers, strings, or a boolean mask of len(sample_names).
  • If data is an ndarray, all information must be passed via the annotation, gene_names and sample_names parameters.
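As a rough illustration of the annotation format described above, the following sketch builds the mapping from a list of per-sample labels, once with sample names and once with boolean masks. The sample names, labels and variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-sample labels, in the same order as the rows of the matrix.
sample_names = ["cell_1", "cell_2", "cell_3", "cell_4"]
labels = ["G1", "S", "G1", "G2M"]

# Form 1: category -> list of sample names.
annotation_by_name = defaultdict(list)
for name, label in zip(sample_names, labels):
    annotation_by_name[label].append(name)
annotation_by_name = dict(annotation_by_name)

# Form 2: category -> boolean mask with one entry per sample.
annotation_by_mask = {
    category: [label == category for label in labels]
    for category in annotation_by_name
}
```

Either mapping could then be passed as annotation, together with sample_names and gene_names when data is a plain ndarray.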

Marker pairs are returned as a mapping from category to a list of 2-tuples of genes: {'category': [(Gene_1, Gene_2), …], …}

Parameters:
data : Union[AnnData, DataFrame, ndarray, Collection[Collection[float]]]

The (annotated) data matrix of shape n_obs * n_vars. Rows correspond to cells and columns to genes.

annotation : Optional[Mapping[str, Collection[Union[str, int, bool]]]]

Mapping from category to samples. If data is not AnnData, this is required. The list of samples can contain indices, names, or a boolean mask.

gene_names : Optional[Collection[str]]

Names for genes; must have the same length as n_vars. Required if data is an ndarray. For a DataFrame, gene names can instead be taken from df.columns.

sample_names : Optional[Collection[str]]

Names for samples; must have the same length as n_obs. Required if data is an ndarray. For a DataFrame, sample names can instead be taken from df.index.

fraction : float

Fraction of cells per category where marker criteria must be satisfied. Default: 0.65

filter_genes : Optional[Collection[Union[str, int, bool]]]

A list of genes to keep. If not None, all genes not in this list are removed. The list can contain indices, names, or a boolean mask.

filter_samples : Optional[Collection[Union[str, int, bool]]]

A list of samples to keep. If not None, all samples not in this list are removed. The list can contain indices, names, or a boolean mask.
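The three accepted forms of a filter list (names, integer indices, boolean mask) describe the same kind of selection. The sketch below shows one way they could be reduced to integer indices. The helper `resolve_selection` is hypothetical and is not part of pypairs; how the library handles this internally is not documented here.

```python
def resolve_selection(selection, names):
    """Hypothetical helper: turn names, indices or a boolean mask into indices."""
    selection = list(selection)
    # Check for booleans first, because bool is a subclass of int in Python.
    if selection and all(isinstance(s, bool) for s in selection):
        if len(selection) != len(names):
            raise ValueError("a boolean mask must have one entry per name")
        return [i for i, keep in enumerate(selection) if keep]
    if all(isinstance(s, int) for s in selection):
        return selection
    # Otherwise treat the entries as names and look up their positions.
    position = {name: i for i, name in enumerate(names)}
    return [position[s] for s in selection]


gene_names = ["geneA", "geneB", "geneC", "geneD"]
by_name = resolve_selection(["geneA", "geneC"], gene_names)
by_index = resolve_selection([0, 2], gene_names)
by_mask = resolve_selection([True, False, True, False], gene_names)
```

All three calls select geneA and geneC, which is the sense in which the forms are interchangeable for filter_genes and filter_samples.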

Return type:

Mapping[str, Collection[Tuple[str, str]]]

Returns:

marker_pairs_dict – A dict mapping from str to a list of 2-tuples, where the key is the category and the list contains the marker pairs: {'Category_1': [(Gene_1, Gene_2), …], …}.
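Because the return value is an ordinary mapping, it can be inspected with plain Python. The sketch below uses a hand-written dictionary in the documented shape; the category and gene names are hypothetical and not real output of sandbag.

```python
# Hypothetical result in the documented shape {category: [(gene_1, gene_2), ...]}.
marker_pairs = {
    "G1": [("geneA", "geneB"), ("geneC", "geneB")],
    "S": [("geneB", "geneA")],
    "G2M": [],
}

# Number of marker pairs found per category.
pairs_per_category = {
    category: len(pairs) for category, pairs in marker_pairs.items()
}

# All genes that take part in at least one marker pair of a category.
genes_per_category = {
    category: sorted({gene for pair in pairs for gene in pair})
    for category, pairs in marker_pairs.items()
}
```

A category with an empty list, like G2M here, would simply have no pair that met the criterion at the chosen fraction.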

Examples

To generate marker pairs with a fraction other than the default (0.65), based on the bundled Oscope dataset [Leng15], run:

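from pypairs import pairs, datasets

# Load the bundled Oscope dataset [Leng15] as annotated data.
adata = datasets.leng15()
# Compute marker pairs with a fraction of 0.5 instead of the default 0.65.
marker_pairs = pairs.sandbag(adata, fraction=0.5)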