pypairs.pairs.sandbag

pypairs.pairs.sandbag(data, annotation=None, gene_names=None, sample_names=None, fraction=0.65, filter_genes=None, filter_samples=None)

Calculate 'marker pairs' from a gene count matrix (cells x genes).
A pair of genes (g1, g2) is considered a marker for a category if its expression changes from g1 > g2 in that category to g1 < g2 in all other categories, for at least a fraction of cells in this category.

data can be of type AnnData, DataFrame or ndarray and should contain the raw or normalized gene counts of shape n_obs * n_vars. Rows correspond to cells and columns to genes.

- If data is an AnnData object, the category for each sample should be in data.obs['category'], gene names in data.var_names and sample names in data.obs_names.
- If data is a DataFrame object, gene names can be in df.columns or passed via gene_names, and sample names can be in df.index or passed via sample_names. The category for each sample must be passed via annotation, which must be of the form {'category1': ['sample_1', 'sample_2', …], …}. The list of samples can contain integer indices, str names or a boolean mask of length len(sample_names).
- If data is an ndarray, all information must be passed via the annotation, gene_names and sample_names parameters.
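The annotation mapping described above may mix sample names, integer indices and boolean masks. The following is a minimal sketch of how such a mapping could be resolved into one category label per sample. `annotation_to_labels` and `_resolve` are hypothetical helpers, not part of pypairs, and treating a sample listed under two different categories as an error is an assumption of this sketch.

```python
from typing import List, Mapping, Optional, Sequence, Union

Selector = Union[str, int, bool]


def _resolve(selection: Sequence[Selector], sample_names: Sequence[str]) -> List[int]:
    """Turn names, integer indices or a boolean mask into integer positions."""
    items = list(selection)
    # A list made only of bools is read as a mask over all samples.
    if items and all(isinstance(s, bool) for s in items):
        if len(items) != len(sample_names):
            raise ValueError("boolean mask must have len(sample_names) entries")
        return [i for i, flag in enumerate(items) if flag]
    position = {name: i for i, name in enumerate(sample_names)}
    indices = []
    for s in items:
        if isinstance(s, str):
            indices.append(position[s])  # raises KeyError for unknown names
        elif isinstance(s, int) and not isinstance(s, bool) and 0 <= s < len(sample_names):
            indices.append(s)
        else:
            raise ValueError(f"invalid sample selector {s!r}")
    return indices


def annotation_to_labels(
    annotation: Mapping[str, Sequence[Selector]], sample_names: Sequence[str]
) -> List[Optional[str]]:
    """Hypothetical helper: one category label per sample (None if unannotated)."""
    labels: List[Optional[str]] = [None] * len(sample_names)
    for category, selection in annotation.items():
        for i in _resolve(selection, sample_names):
            if labels[i] is not None and labels[i] != category:
                # Assumption of this sketch: each sample belongs to one category.
                raise ValueError(f"{sample_names[i]!r} is annotated twice")
            labels[i] = category
    return labels


samples = ["s1", "s2", "s3", "s4"]
annotation = {"G1": ["s1", "s2"], "S": [2], "G2M": [False, False, False, True]}
print(annotation_to_labels(annotation, samples))  # ['G1', 'G1', 'S', 'G2M']
```

Because `bool` is a subclass of `int` in Python, the sketch checks for an all-boolean list first and rejects stray booleans mixed into an index list.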
Marker pairs are returned as a mapping from category to a list of 2-tuples of genes: {'category': [(Gene_1, Gene_2), …], …}.
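To make the marker criterion concrete, here is a brute-force sketch in plain Python. It is not the pypairs implementation. It assumes every cell already carries exactly one category label, considers ordered gene pairs, and assumes that the fraction threshold is applied separately to each category, including every "other" category in which g1 < g2 must hold (the description above only states the threshold for the marker's own category). `sketch_marker_pairs` is a hypothetical name.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def sketch_marker_pairs(
    x: Sequence[Sequence[float]],
    labels: Sequence[str],
    gene_names: Sequence[str],
    fraction: float = 0.65,
) -> Dict[str, List[Tuple[str, str]]]:
    """Brute-force illustration of the marker-pair criterion (cells x genes input)."""
    categories = sorted(set(labels))
    cells = {c: [i for i, lab in enumerate(labels) if lab == c] for c in categories}
    # Assumption: the same fraction threshold is applied within every category.
    needed = {c: fraction * len(cells[c]) for c in categories}

    def count(c: str, g1: int, g2: int, greater: bool) -> int:
        # Strict comparisons: ties (g1 == g2) count for neither direction.
        if greater:
            return sum(1 for i in cells[c] if x[i][g1] > x[i][g2])
        return sum(1 for i in cells[c] if x[i][g1] < x[i][g2])

    markers: Dict[str, List[Tuple[str, str]]] = {c: [] for c in categories}
    for g1, g2 in permutations(range(len(gene_names)), 2):
        for c in categories:
            if count(c, g1, g2, greater=True) < needed[c]:
                continue  # g1 > g2 is not frequent enough in category c
            # g1 < g2 must hold often enough in every other category.
            if all(count(o, g1, g2, greater=False) >= needed[o] for o in categories if o != c):
                markers[c].append((gene_names[g1], gene_names[g2]))
    return markers


x = [
    [5.0, 1.0, 3.0],  # category X
    [6.0, 2.0, 3.0],  # category X
    [4.0, 1.0, 3.0],  # category X
    [1.0, 5.0, 3.0],  # category Y
    [2.0, 6.0, 3.0],  # category Y
    [1.0, 4.0, 3.0],  # category Y
]
labels = ["X", "X", "X", "Y", "Y", "Y"]
print(sketch_marker_pairs(x, labels, ["A", "B", "C"]))
# {'X': [('A', 'B'), ('A', 'C'), ('C', 'B')], 'Y': [('B', 'A'), ('B', 'C'), ('C', 'A')]}
```

With a fraction above 0.5, this sketch can assign a given ordered pair to at most one category, because g1 > g2 and g1 < g2 cannot both hold in more than half of the same category's cells.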
Parameters:
- data : Union[AnnData, DataFrame, ndarray, Collection[Collection[float]]]
  The (annotated) data matrix of shape n_obs * n_vars. Rows correspond to cells and columns to genes.
- annotation : Optional[Mapping[str, Collection[Union[str, int, bool]]]]
  Mapping from category to samples. Required if data is not an AnnData object. Samples can be given as indices, names or a boolean mask.
- gene_names : Optional[Collection[str]]
  Names for genes; must have length n_vars. Required if data is not an AnnData object, unless data is a DataFrame whose columns hold the gene names.
- sample_names : Optional[Collection[str]]
  Names for samples; must have length n_obs. Required if data is not an AnnData object, unless data is a DataFrame whose index holds the sample names.
- fraction : float
  Fraction of cells per category in which the marker criterion must be satisfied. Default: 0.65
- filter_genes : Optional[Collection[Union[str, int, bool]]]
  A list of genes to keep. If not None, all genes not in this list will be removed. The list can contain indices, names or a boolean mask.
- filter_samples : Optional[Collection[Union[str, int, bool]]]
  A list of samples to keep. If not None, all samples not in this list will be removed. The list can contain indices, names or a boolean mask.
Return type: Mapping[str, Collection[Tuple[str, str]]]

Returns: marker_pairs_dict – A dict mapping from str to a list of 2-tuples, where the key is the category and the list contains the marker pairs: {'Category_1': [(Gene_1, Gene_2), …], …}.
Examples

To generate marker pairs for a fraction other than the default (0.65), based on the bundled oscope dataset [Leng15], run:

```python
from pypairs import pairs, datasets

adata = datasets.leng15()
marker_pairs = pairs.sandbag(adata, fraction=0.5)
```