pypairs.pairs.sandbag

pypairs.pairs.sandbag(data, annotation=None, gene_names=None, sample_names=None, fraction=0.65, filter_genes=None, filter_samples=None)
Calculate ‘marker pairs’ from a gene count matrix (cells x genes).
A pair of genes (g1, g2) is considered a marker for a category if its expression changes from g1 > g2 in one category to g1 < g2 in all other categories, for at least the given fraction of cells in this category.
data can be of type AnnData, DataFrame or ndarray and should contain the raw or normalized gene counts of shape n_obs * n_vars. Rows correspond to cells and columns to genes.
- If data is an AnnData object, the category for each sample should be in data.obs['category'], gene names in data.var_names and sample names in data.obs_names.
- If data is a DataFrame object, gene names can be in df.columns or passed via gene_names, and sample names can be in df.index or passed via sample_names. The category for each sample must be passed via annotation. annotation must be of the form {'category1': ['sample_1', 'sample_2', …], …}. Each list of samples can contain integer indices or sample names (str), or be a boolean mask of len(sample_names).
- If data is an ndarray, all information must be passed via the annotation, gene_names and sample_names parameters.
Marker pairs are returned as a mapping from each category to a list of 2-tuples of genes: {'category': [(Gene_1, Gene_2), …], …}
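The criterion described above can be illustrated with a small pure-Python sketch. This is not the pypairs implementation; it is one reading of the definition, assuming that g1 > g2 must hold in at least fraction of the cells of the category and g1 < g2 in at least fraction of the cells of every other category. The function name toy_marker_pairs and the toy matrix are made up for illustration.

```python
from itertools import permutations


def toy_marker_pairs(matrix, gene_names, annotation, fraction=0.65):
    """Toy illustration of the marker-pair criterion (not pypairs' own code).

    matrix: list of rows (cells), each a list of expression values (genes).
    annotation: {category: [row indices of the cells in that category]}.
    """
    def share(rows, i, j):
        # Fraction of the given cells in which gene i is expressed above gene j.
        return sum(matrix[r][i] > matrix[r][j] for r in rows) / len(rows)

    result = {category: [] for category in annotation}
    for category, rows in annotation.items():
        other_groups = [
            group for name, group in annotation.items() if name != category
        ]
        for i, j in permutations(range(len(gene_names)), 2):
            up_here = share(rows, i, j) >= fraction
            down_elsewhere = all(
                share(group, j, i) >= fraction for group in other_groups
            )
            if up_here and down_elsewhere:
                result[category].append((gene_names[i], gene_names[j]))
    return result


# Hypothetical 4 cells x 3 genes matrix; rows are cells, columns are genes.
counts = [
    [9, 1, 5],  # cell 0, category "G1"
    [8, 2, 5],  # cell 1, category "G1"
    [1, 9, 5],  # cell 2, category "S"
    [2, 8, 5],  # cell 3, category "S"
]
pairs_by_category = toy_marker_pairs(
    counts, ["A", "B", "C"], {"G1": [0, 1], "S": [2, 3]}
)
print(pairs_by_category)
```

The result has the same shape as the mapping described above: one list of (gene, gene) tuples per category.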
Parameters:
- data : Union[AnnData, DataFrame, ndarray, Collection[Collection[float]]]
  The (annotated) data matrix of shape n_obs * n_vars. Rows correspond to cells and columns to genes.
- annotation : Optional[Mapping[str, Collection[Union[str, int, bool]]]]
  Mapping from category to samples. If data is not AnnData, this is required. Each list of samples can be given as indices, names or a logical mask.
- gene_names : Optional[Collection[str]]
  Names for genes; must have length n_vars. Required if the gene names are not already contained in data.
- sample_names : Optional[Collection[str]]
  Names for samples; must have length n_obs. Required if the sample names are not already contained in data.
- fraction : float
  Fraction of cells per category for which the marker criterion must be satisfied. Default: 0.65
- filter_genes : Optional[Collection[Union[str, int, bool]]]
  A list of genes to keep. If not None, all genes not in this list are removed. The list can be given as indices, names or a logical mask.
- filter_samples : Optional[Collection[Union[str, int, bool]]]
  A list of samples to keep. If not None, all samples not in this list are removed. The list can be given as indices, names or a logical mask.
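The three selector forms accepted by annotation, filter_genes and filter_samples (names, integer indices, boolean mask) describe the same selection in different ways. The sketch below shows how they relate; to_names is a hypothetical helper written for this illustration and is not part of pypairs, which does its own parsing.

```python
def to_names(selector, all_names):
    """Resolve a selector (names, integer indices or boolean mask) to names.

    Hypothetical helper for illustration only.
    """
    selector = list(selector)
    # bool is a subclass of int in Python, so the mask case is checked first.
    if selector and all(isinstance(s, bool) for s in selector):
        if len(selector) != len(all_names):
            raise ValueError("a boolean mask must have len(all_names) entries")
        return [name for name, keep in zip(all_names, selector) if keep]
    if all(isinstance(s, int) for s in selector):
        # Integer positions into the list of names.
        return [all_names[s] for s in selector]
    # Otherwise the entries are treated as names.
    return [str(s) for s in selector]


sample_names = ["cell_1", "cell_2", "cell_3", "cell_4"]
by_name = to_names(["cell_1", "cell_3"], sample_names)
by_index = to_names([0, 2], sample_names)
by_mask = to_names([True, False, True, False], sample_names)
```

All three calls select the same two samples, cell_1 and cell_3.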
Return type: Mapping[str, Collection[Tuple[str, str]]]
Returns: marker_pairs_dict – A dict mapping each category (str) to a list of 2-tuples containing the marker pairs of that category: {'Category_1': [(Gene_1, Gene_2), …], …}.
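Because the return value is an ordinary mapping of lists of tuples, it can be inspected with plain Python. A minimal sketch, using a made-up result in the documented shape (the category and gene names are placeholders, not real output):

```python
# Hypothetical result in the documented shape {category: [(gene_1, gene_2), ...]}.
marker_pairs = {
    "G1": [("GeneA", "GeneB"), ("GeneC", "GeneB")],
    "S": [("GeneB", "GeneA")],
}

# Number of marker pairs found per category.
pair_counts = {category: len(pairs) for category, pairs in marker_pairs.items()}

# All genes that take part in at least one marker pair.
genes_used = sorted(
    {gene for pairs in marker_pairs.values() for pair in pairs for gene in pair}
)

print(pair_counts)
print(genes_used)
```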
Examples
To generate marker pairs for a different fraction than the default (0.65) based on the bundled oscope dataset [Leng15], run:

    from pypairs import pairs, datasets

    adata = datasets.leng15()
    marker_pairs = pairs.sandbag(adata, fraction=0.5)