pypairs.pairs.cyclone

pypairs.pairs.cyclone(data, marker_pairs=None, gene_names=None, sample_names=None, iterations=1000, min_iter=100, min_pairs=50)

Score samples for each category based on marker pairs.

data can be of type AnnData, DataFrame or ndarray and should contain the raw or normalized gene counts of shape n_obs * n_vars. Rows correspond to cells and columns to genes.

  • If a AnnData object is passed, the category scores and the final prediction will be added to data.obs with key pypairs_{category}_score and pypairs_max_class.
    • If marker pairs contain only the cell cycle categories G1, S and G2M an additional column pypairs_cc_prediction will be added. Where category S is assigned to samples where G1 and G2M score are below 0.5, as described in [Scialdone15].

marker_pairs, i.e. output from sandbag(), must be a mapping from category to list of 2-tuple Genes: {‘category’: [(Gene_1,Gene_2), …], …}.

  • If no marker_pairs are passed the default are used from default_marker() based on [Leng15] (marker pairs for cell cycle prediction).
Parameters:
data : Union[AnnData, DataFrame, ndarray, Collection[Collection[float]]]

The (annotated) data matrix of shape n_obs * n_vars. Rows correspond to cells and columns to genes.

marker_pairs : Optional[Mapping[str, Collection[Tuple[str, str]]]]

A dict mapping from str to a list of 2-tuple, where the key is the category and the list contains the marker pairs: {‘Category_1’: [(Gene_1, Gene_2), …], …}. If not provided default marker pairs are used

gene_names : Optional[Collection[str]]

Names for genes, must be same length as n_vars.

sample_names : Optional[Collection[str]]

Names for samples, must be same length as n_obs.

iterations : Optional[int]

An integer specifying the number of iterations for random sampling to obtain a cycle score. Default: 1000

min_iter : Optional[int]

An integer specifying the minimum number of iterations for score estimation. Default: 100

min_pairs : Optional[int]

An integer specifying the minimum number of pairs for cycle estimation. Default: 50

Return type:

DataFrame

Returns:

  • A DataFrame with samples as index and categories as columns with scores for each category for each
  • sample and a additional column with the name of the max scoring category for each sample.
    • If marker pairs contain only the cell cycle categories G1, S and G2M an additional column pypairs_cc_prediction will be added. Where category S is assigned to samples where G1 and G2M score are below 0.5, as described in [Scialdone15].

Examples

To predict the cell cycle phase of the unsorted cell from the [Leng15] dataset run:

import pypairs import pairs, datasets

adata = datasets.leng15('unsorted')
marker_pairs = datasets.default_cc_marker()
scores = pairs.cyclone(adata, marker_pairs)
print(scores)