Steamboat API

The Steamboat API consists of four modules:

The dataset module provides the dataset class for loading and preprocessing data.
The model module provides the model class for training and prediction.
The tools module provides tools for post-processing, visualization, clustering, segmentation, etc.
The utils module provides basic utility functions used across modules; end users usually do not need them.

Please refer to the respective sections for detailed information.

steamboat.dataset module

class steamboat.dataset.SteamboatDataset(data_list, sparse_graph)

Bases: Dataset

Steamboat Dataset class

Parameters:

data_list – a list of dictionaries containing ‘X’ and ‘adj’ keys
sparse_graph – Whether to use an adjacency list (True) or an adjacency matrix (False)

to(device): Send everything to a device. Always copy (even if it’s on the device already).

steamboat.dataset.make_dataset(adatas: list[AnnData], sparse_graph=True, mask_var: str = None, obsm_key=None, regional_obs: str | list[str] = None) → SteamboatDataset

Create a PyTorch Dataset from a list of AnnData objects. The input should be a list of AnnData objects containing raw or normalized counts.

Parameters:

adatas – A list of SCANPY AnnData
sparse_graph – Use an adjacency list, defaults to True
mask_var – Column in var used to select variables. Default: var.highly_variable if available, otherwise no filtering. Specify False to use all genes.

Returns:

A SteamboatDataset (a torch Dataset) including all data
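The default feature selection described for mask_var can be summarized as follows. To run without AnnData, this sketch models adata.var as a plain dictionary of columns; select_features is a hypothetical helper, not part of Steamboat.

```python
def select_features(var, mask_var=None):
    """Return the indices of variables kept under the documented mask_var rule.

    var: dict mapping a column name to a list of per-gene values (stands in for adata.var).
    mask_var: None for the default, False to keep all genes, or a column name.
    """
    n_vars = len(next(iter(var.values()))) if var else 0
    if mask_var is False:
        return list(range(n_vars))  # use all genes
    if mask_var is None:
        if "highly_variable" not in var:
            return list(range(n_vars))  # no filtering
        mask_var = "highly_variable"
    # Keep genes whose entry in the chosen column is truthy.
    return [i for i, keep in enumerate(var[mask_var]) if keep]


var = {"highly_variable": [True, False, True], "my_genes": [False, True, True]}
print(select_features(var))              # [0, 2]
print(select_features(var, "my_genes"))  # [1, 2]
print(select_features(var, False))       # [0, 1, 2]
```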

steamboat.dataset.prep_adatas(adatas: list[AnnData], n_neighs: int = 8, norm=True, log1p=True, scale=False, renorm=False) → list[AnnData]

Preprocess a list of AnnData objects

Parameters:

adatas – A list of SCANPY AnnData
n_neighs – number of neighbors for kNN spatial graph, defaults to 8
norm – Whether or not to normalize the data, defaults to True
log1p – Whether or not to log-transform the data, defaults to True

Returns:

A list of preprocessed SCANPY AnnData
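n_neighs sets k in a k-nearest-neighbor spatial graph. A minimal brute-force construction is sketched below, assuming Euclidean distance on 2-D coordinates and (cell, neighbor) pairs as output; prep_adatas may build and store the graph differently, and knn_adjacency_list is a hypothetical name.

```python
import math


def knn_adjacency_list(coords, n_neighs=8):
    """Return (cell, neighbor) pairs linking each cell to its n_neighs nearest cells.

    coords: list of (x, y) spatial coordinates, one per cell.
    """
    edges = []
    for i, (xi, yi) in enumerate(coords):
        # Distances from cell i to every other cell; ties are broken by index.
        dists = sorted((math.hypot(xi - xj, yi - yj), j)
                       for j, (xj, yj) in enumerate(coords) if j != i)
        edges.extend((i, j) for _, j in dists[:n_neighs])
    return edges


cells = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(knn_adjacency_list(cells, n_neighs=1))  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

The brute-force search is quadratic in the number of cells; for real tissue sections a spatial index such as a k-d tree is the usual choice. A kNN graph is also directed in general: cell j being among cell i's nearest neighbors does not imply the reverse.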

steamboat.model module

class steamboat.model.BilinearAttention(d_in: int, n_heads: int, n_scales: int = 2, d_out: int = None)

Bases: Module

Bilinear attention layer

Parameters:

d_in – number of input features
n_heads – number of heads
n_scales – number of scales (default 2, i.e., ego and local; 3 will add global)
d_out – number of output features, defaults to None (meaning d_out = d_in)

forward(adj_list, x, masked_x=None, regional_adj_lists=None, regional_xs=None, get_details=False, explained_variance_mask=None, chosen_head=None)

Forward pass

Parameters:

adj_list – adjacency list for spatial graph
x – input data
masked_x – masked input data, defaults to None (i.e., using x)
regional_adj_lists – list of adjacency lists for the bipartite cell-region graphs, defaults to None
regional_xs – list of mean expression of regions, defaults to None
get_details – whether to return details, defaults to False

Returns:

reconstructed gene expression

score_interactive(q_emb, k_emb, adj_list, activation=None)

Score interactive factors. Attention to other cells/environment.

Parameters:

q_emb – query scores
k_emb – key scores
adj_list – adjacency list

Returns:

interactive scores for short or long range interaction
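Conceptually, an interactive score pairs a cell's query with the keys of its spatial neighbors. The sketch below is a generic illustration of that idea, not Steamboat's implementation: it assumes q_emb and k_emb are n_cells × n_heads score matrices, that adj_list holds (cell, neighbor) pairs, and that edge scores are summed per cell. The function name is hypothetical.

```python
def score_interactive_sketch(q_emb, k_emb, adj_list, activation=None):
    """Per-cell, per-head interaction scores summed over spatial neighbors.

    q_emb, k_emb: per-cell score vectors (n_cells x n_heads, lists of lists).
    adj_list: (cell, neighbor) pairs.
    activation: optional function applied to each edge score.
    """
    n_cells, n_heads = len(q_emb), len(q_emb[0])
    out = [[0.0] * n_heads for _ in range(n_cells)]
    for i, j in adj_list:
        for h in range(n_heads):
            s = q_emb[i][h] * k_emb[j][h]  # query of cell i times key of neighbor j
            out[i][h] += activation(s) if activation else s
    return out


q = [[1.0, 2.0], [0.5, 0.0]]
k = [[2.0, 1.0], [3.0, 4.0]]
print(score_interactive_sketch(q, k, [(0, 1), (1, 0)]))  # [[3.0, 8.0], [1.0, 0.0]]
```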

score_intrinsic(q_emb, k_emb, activation=None)

Score intrinsic factors. No attention to other cells/environment.

Parameters:

q_emb – query scores
k_emb – key scores
activation – activation function

Returns:

ego scores
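The intrinsic (ego) score uses no neighbors: each cell's query is paired with its own key. Assuming q_emb and k_emb are n_cells × n_heads score matrices, a generic sketch (not the library code; the function name is hypothetical) is an element-wise product followed by an optional activation.

```python
def score_intrinsic_sketch(q_emb, k_emb, activation=None):
    """Ego scores: each cell's query times its own key, head by head."""
    scores = [[q * k for q, k in zip(q_row, k_row)]
              for q_row, k_row in zip(q_emb, k_emb)]
    if activation is not None:
        scores = [[activation(s) for s in row] for row in scores]
    return scores


relu = lambda s: max(s, 0.0)
print(score_intrinsic_sketch([[1.0, -2.0]], [[3.0, 1.0]]))        # [[3.0, -2.0]]
print(score_intrinsic_sketch([[1.0, -2.0]], [[3.0, 1.0]], relu))  # [[3.0, 0.0]]
```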

class steamboat.model.NonNegBias(d)

Bases: Module

Non-negative bias layer (i.e., add a non-negative vector to the output)

Parameters: d – number of input/output features

property bias

Transform bias to be non-negative

Returns: non-negative bias

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class steamboat.model.NonNegLinear(d_in, d_out, bias)

Bases: Module

Non-negative linear layer

Parameters:

d_in – number of input features
d_out – number of output features
bias – not implemented; must be False

Raises:

NotImplementedError – when bias is True

forward(x)


property weight

Transform the weight matrix to be non-negative

Returns: transformed weight matrix
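The weight property maps an unconstrained parameter to a non-negative matrix. The documentation does not say which transform NonNegLinear uses; the sketch below assumes softplus, one common choice (exponentiation or taking the absolute value are alternatives), and omits the bias term, which is not implemented. Both function names are hypothetical.

```python
import math


def softplus(w):
    """Numerically stable log(1 + exp(w)): maps any real number to a positive one."""
    return max(w, 0.0) + math.log1p(math.exp(-abs(w)))


def nonneg_linear(x, raw_weight):
    """y = softplus(W) @ x, so every effective weight is non-negative.

    x: input vector of length d_in.
    raw_weight: d_out x d_in unconstrained parameters (list of lists).
    """
    return [sum(softplus(w) * xi for w, xi in zip(row, x)) for row in raw_weight]


y = nonneg_linear([1.0, 2.0], [[0.0, -3.0], [5.0, 1.0]])
# With non-negative inputs, every output is non-negative as well.
assert all(v >= 0.0 for v in y)
```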

class steamboat.model.NonNegScale(d)

Bases: Module

Non-negative scale layer (i.e., multiply the output by a non-negative vector)

Parameters: d – number of input/output features

forward(x)


property scale

Transform scale to be non-negative

Returns: non-negative scale

class steamboat.model.Steamboat(features: list[str] | int, n_heads: int, n_scales: int = 2)

Bases: Module

Steamboat model

Parameters:

features – feature names (usually adata.var_names or a column in adata.var for gene symbols)
n_heads – number of heads
n_scales – number of scales (default 2, i.e., ego and local; 3 will add global)

fit(dataset: SteamboatDataset, entry_masking_rate: float = 0.1, feature_masking_rate: float = 0.1, device: str = 'cuda', *, opt=None, opt_args=None, loss_fun=None, max_epoch: int = 100, stop_eps: float = 0.0001, stop_tol: int = 10, log_dir: str = 'log/', report_per: int = 10)

Fit the model to a dataset

Parameters:

dataset – Dataset to be trained on
entry_masking_rate – Rate of masking random entries, defaults to 0.1
feature_masking_rate – Rate of masking a full feature (can overlap with entry masking), defaults to 0.1
device – Device to be used (“cpu” or “cuda”), defaults to ‘cuda’
opt – Optimizer for fitting
opt_args – Arguments for optimizer (e.g., {‘lr’: 0.01})
loss_fun – Loss function. Default is MSE (nn.MSELoss). You may use MAE (nn.L1Loss), Huber (nn.HuberLoss), SmoothL1 (nn.SmoothL1Loss), or a customized loss function.
max_epoch – Maximum number of epochs, defaults to 100
stop_eps – Stopping criterion: minimum change (see also stop_tol), defaults to 0.0001
stop_tol – Stopping criterion: number of epochs that don’t meet stop_eps before stopping, defaults to 10
log_dir – Directory to save logs, defaults to ‘log/’
report_per – Report every this many epochs, defaults to 10. Use 0 to report only before termination, or a negative number to never report.

Returns: self
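stop_eps and stop_tol together define an early-stopping rule. One plausible reading is sketched below, assuming stop_eps is an absolute improvement threshold measured against the previous epoch's loss (the documentation does not say whether the change is absolute or relative). run_with_early_stopping is hypothetical and replays a list of losses instead of training.

```python
def run_with_early_stopping(losses, max_epoch=100, stop_eps=1e-4, stop_tol=10):
    """Return the number of epochs run before the stopping rule fires.

    losses: per-epoch training losses (a stand-in for actual training).
    An epoch "fails" when the loss improves on the previous epoch by less
    than stop_eps; training stops after stop_tol consecutive failing epochs.
    """
    prev = float("inf")
    n_fail = 0
    for epoch, loss in enumerate(losses[:max_epoch], start=1):
        if prev - loss < stop_eps:
            n_fail += 1
            if n_fail >= stop_tol:
                return epoch
        else:
            n_fail = 0  # a sufficient improvement resets the counter
        prev = loss
    return min(len(losses), max_epoch)


losses = [1.0, 0.5, 0.25] + [0.25] * 10
print(run_with_early_stopping(losses, stop_tol=3))  # 6
```

A plateau therefore has to persist for stop_tol epochs in a row before training ends, so a single noisy epoch does not stop it prematurely.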

forward(adj_list, x, masked_x, regional_adj_lists, regional_xs, get_details=False, explained_variance_mask=None, chosen_head=None)


get_bias() → array

get_ego_transform() → array

Get gene attention matrix

Parameters: separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.
Returns: Gene attention vectors

get_global_transform() → array

Get gene attention matrix

Parameters: separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.
Returns: Gene attention matrix

get_local_transform() → array

Get gene attention matrix

Parameters: separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.
Returns: Gene attention vectors

get_top_features(top_k=5)

masking(x: Tensor, xs, entry_masking_rate: float, feature_masking_rate: float)

Mask the dataset

Parameters:

x – input data
entry_masking_rate – rate of masking random entries
feature_masking_rate – rate of masking full features (can overlap with entry masking)

Returns:

masked data
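Entry masking and feature masking can be combined as in the sketch below. It assumes masked values are set to 0 and that the two masks are drawn independently, which is why they can overlap; mask_entries is a hypothetical helper, and the model may apply masking differently.

```python
import random


def mask_entries(x, entry_masking_rate, feature_masking_rate, seed=0):
    """Zero out random entries and random whole features (columns) of x.

    x: cells x features matrix as a list of lists.
    """
    rng = random.Random(seed)
    n_features = len(x[0])
    # Feature masking: pick whole columns to hide.
    masked_features = {f for f in range(n_features) if rng.random() < feature_masking_rate}
    # Entry masking: hide individual values, independently of the column mask.
    return [[0.0 if (f in masked_features or rng.random() < entry_masking_rate) else v
             for f, v in enumerate(row)]
            for row in x]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
masked = mask_entries(x, entry_masking_rate=0.1, feature_masking_rate=0.1)
```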

score_cells(x)

score_global(x, x_bar=None)

score_local(x, adj_matrix)

transform(x, adj_matrix, get_details=True, explained_variance_mask=None)

steamboat.utils module

steamboat.utils.set_random_seed(seed: int) → None

Reset the random seed for NumPy and PyTorch

Parameters: seed – Random seed
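Seeding all relevant generators typically looks like the following. This sketch may differ in detail from steamboat.utils.set_random_seed: it additionally seeds Python's built-in random module and skips NumPy or PyTorch if either is not installed. The function name is hypothetical.

```python
import random


def set_random_seed_sketch(seed: int) -> None:
    """Seed Python's, NumPy's and PyTorch's generators when those libraries are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the generators on all devices
    except ImportError:
        pass
```

Calling it twice with the same seed makes subsequent random draws repeat, which is what makes training runs reproducible.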

steamboat.tools module

steamboat.tools.calc_adjacency_freq(adatas, sample_key: str, cell_type_key: str, pseudocount: float = 20.0)

Calculate a baseline interaction matrix from adjacency frequencies

Parameters:

adatas – all adatas
sample_key – obs key for sample names
cell_type_key – obs key for cell types

Returns:

adjacency frequency matrices (one per sample) in a dictionary

steamboat.tools.calc_geneset_auroc(metagenes, genesets)

Gene set enrichment analysis by AUROC

Parameters:

metagenes – metagenes from calc_var(model)
genesets – genesets in a dictionary

Returns:

AUROC in a DataFrame
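A gene set's AUROC for a metagene is the probability that a randomly chosen gene in the set has a higher metagene weight than a randomly chosen gene outside it. The pairwise sketch below implements that standard definition (ties count as one half); calc_geneset_auroc may compute it differently, and the dictionary inputs and the function name are assumptions.

```python
def geneset_auroc(weights, geneset):
    """AUROC of a metagene's gene weights at separating genes in geneset from the rest.

    weights: dict gene -> metagene weight; geneset: iterable of gene names.
    """
    members = set(geneset)
    pos = [w for g, w in weights.items() if g in members]
    neg = [w for g, w in weights.items() if g not in members]
    if not pos or not neg:
        raise ValueError("gene set must contain some, but not all, genes")
    # Mann-Whitney formulation: count wins of in-set genes, ties as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


w = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.1}
print(geneset_auroc(w, {"A", "C"}))  # 0.625
```

An AUROC of 0.5 means the metagene does not rank the gene set above other genes; values near 1 indicate strong enrichment.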

steamboat.tools.calc_geneset_auroc_order(sig_df, by='q')

Order the metagenes by AUROC

Parameters:

sig_df – Analysis results
by – by which metagene, defaults to ‘q’

Returns:

ordering of the metagenes

steamboat.tools.calc_head_weights(adatas, model: Steamboat)

Calculate weights of heads and scales within each head

Parameters:

adatas – all adatas
model – the trained Steamboat model

Returns:

weights

steamboat.tools.calc_head_weights_quantile(adatas, model: Steamboat, quantile: float = 0.9)

Calculate weights of heads and scales within each head

Parameters:

adatas – all adatas
model – the trained Steamboat model

Returns:

weights

steamboat.tools.calc_interaction(adatas, model: Steamboat, sample_key: str, cell_type_key: str, pseudocount: float = 20.0)

Calculate interaction matrix

Parameters:

adatas – all adatas
model – Steamboat model
sample_key – obs key for sample names
cell_type_key – obs key for cell types
pseudocount – pseudocount in denominator when averaging scores in cell type pairs, defaults to 20.

Returns:

interaction matrices (one per sample) in a dictionary
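The description of pseudocount implies averages of the form (sum of scores) / (number of cell pairs + pseudocount) within each cell-type pair. A minimal sketch of that averaging follows, assuming per-edge scores are already available; how calc_interaction derives those scores from the model is not reproduced, and average_by_type_pair is hypothetical.

```python
from collections import defaultdict


def average_by_type_pair(edge_scores, cell_types, pseudocount=20.0):
    """Average edge scores per (type of cell i, type of cell j) pair.

    edge_scores: dict (i, j) -> score for each spatial edge.
    cell_types: list giving the cell type of each cell.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (i, j), s in edge_scores.items():
        key = (cell_types[i], cell_types[j])
        totals[key] += s
        counts[key] += 1
    # The pseudocount enlarges the denominator, shrinking rarely seen pairs toward 0.
    return {key: totals[key] / (counts[key] + pseudocount) for key in totals}


cell_types = ["T", "B", "T"]
edge_scores = {(0, 1): 2.0, (2, 1): 4.0, (1, 0): 1.0}
print(average_by_type_pair(edge_scores, cell_types, pseudocount=0.0))
# {('T', 'B'): 3.0, ('B', 'T'): 1.0}
```

With pseudocount=0 this is a plain mean; with the default of 20, a pair observed on only a few edges is pulled strongly toward zero, so rare adjacencies do not produce large interaction values.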

steamboat.tools.calc_obs(adatas: list[AnnData], dataset: SteamboatDataset, model: Steamboat, device='cuda', get_recon: bool = False)

Calculate and store the embeddings and attention scores in the AnnData objects

Parameters:

adatas – List of AnnData objects to store the embeddings and attention scores
dataset – SteamboatDataset object to be processed
model – Steamboat model
device – Device to run the model, defaults to ‘cuda’
get_recon – Whether to store the reconstructed data, defaults to False

steamboat.tools.calc_v_weights(model: Steamboat, normalize: bool = True)

Calculate the weights of the reconstruction (w_v) metagenes

Parameters:

model – Steamboat model
normalize – whether to normalize the weights to sum to 1, defaults to True

Returns:

weights

steamboat.tools.calc_var(model: Steamboat)

Write metagenes into a DataFrame

Parameters: model – Steamboat model
Returns: DataFrame of metagenes

steamboat.tools.contribution_by_scale(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')

Calculate explained variance for each scale

Parameters:

model – Steamboat model
dataset – SteamboatDataset object to be processed
adatas – list of AnnData objects corresponding to the dataset
device – Device to run the model, defaults to ‘cuda’

Returns:

explained variance scores in a dictionary

steamboat.tools.contribution_by_scale_and_head(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')

Calculate explained variance for each scale and head

Parameters:

model – Steamboat model
dataset – SteamboatDataset object to be processed
adatas – list of AnnData objects corresponding to the dataset
device – Device to run the model, defaults to ‘cuda’

Returns:

explained variance scores in a dictionary

steamboat.tools.explained_variance_by_scale(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')

Calculate explained variance for each scale

Parameters:

model – Steamboat model
dataset – SteamboatDataset object to be processed
adatas – list of AnnData objects corresponding to the dataset
device – Device to run the model, defaults to ‘cuda’

Returns:

explained variance scores in a dictionary
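Explained variance is conventionally 1 - SS_res / SS_tot, where SS_res is the squared reconstruction error and SS_tot the squared deviation from the mean. The sketch below computes it for a full reconstruction using per-gene means; which variant Steamboat uses, and how it isolates the contribution of each scale, is not documented here, so treat this as an assumption. The function name is hypothetical.

```python
def explained_variance(x, x_hat):
    """1 - SS_res / SS_tot over all entries of a cells x genes matrix.

    SS_tot is measured around each gene's mean expression.
    """
    n_cells, n_genes = len(x), len(x[0])
    means = [sum(row[g] for row in x) / n_cells for g in range(n_genes)]
    ss_res = sum((v - v_hat) ** 2
                 for row, row_hat in zip(x, x_hat)
                 for v, v_hat in zip(row, row_hat))
    ss_tot = sum((row[g] - means[g]) ** 2 for row in x for g in range(n_genes))
    return 1.0 - ss_res / ss_tot


x = [[1.0, 2.0], [3.0, 4.0]]
print(explained_variance(x, [[1.0, 2.0], [3.0, 3.0]]))  # 0.75
```

A perfect reconstruction scores 1.0, and predicting each gene's mean scores 0.0.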

steamboat.tools.find_top_celltypes(adata, obs_key: str, top: int = 3, return_raw: bool = True)

Find top contributing cell types for each metagene by mean scores

Parameters:

adata – AnnData object with cell type information
obs_key – obs key for cell types
top – top cell types per metagene to find, defaults to 3

Returns:

list of lists of top cell types per metagene
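Finding the top cell types for each metagene by mean score can be sketched as follows. The input layout (a plain cells × metagenes score matrix plus one label per cell, rather than an AnnData object) and the helper name top_celltypes are hypothetical.

```python
from collections import defaultdict


def top_celltypes(scores, cell_types, top=3):
    """For each metagene (column of scores), list the cell types with the highest mean score.

    scores: cells x metagenes matrix (list of lists); cell_types: one label per cell.
    """
    result = []
    for m in range(len(scores[0])):
        sums, counts = defaultdict(float), defaultdict(int)
        for row, ct in zip(scores, cell_types):
            sums[ct] += row[m]
            counts[ct] += 1
        means = {ct: sums[ct] / counts[ct] for ct in sums}
        result.append(sorted(means, key=means.get, reverse=True)[:top])
    return result


scores = [[1.0, 0.0], [3.0, 0.0], [0.0, 5.0], [0.5, 1.0]]
labels = ["T", "T", "B", "M"]
print(top_celltypes(scores, labels, top=2))  # [['T', 'M'], ['B', 'M']]
```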

steamboat.tools.find_top_lrs(var: DataFrame, lr: DataFrame, top: int = 3, return_raw: bool = True)

Find top contributing ligand-receptor pairs for each metagene

Parameters:

var – var from calc_var(model)
top – top genes per metagene to find, defaults to 3

Returns:

list of lists of top genes per metagene

steamboat.tools.gather_obs(adata: AnnData, adatas: list[AnnData])

Gather obs/obsm/uns from a list of AnnData objects to a single AnnData object

Parameters:

adata – AnnData object to store the gathered obs/obsm/uns
adatas – List of AnnData objects to be gathered

steamboat.tools.leiden(adata: AnnData, resolution: float = 1.0, *, obsp='steamboat_emb_connectivities', key_added='steamboat_clusters', leiden_kwargs: dict = None)

A thin wrapper around scanpy.tl.leiden for clustering cells into cell types (for spatial domain segmentation, use segment).

Parameters:

adata – AnnData object to be processed
resolution – resolution for Leiden clustering, defaults to 1.
obsp – obsp key to be used, defaults to ‘steamboat_emb_connectivities’
key_added – obs key to be added for resulting clusters, defaults to ‘steamboat_clusters’
leiden_kwargs – Other parameters for scanpy.tl.leiden if desired, defaults to None

Returns:

Whatever scanpy.tl.leiden returns

steamboat.tools.neighbors(adata: AnnData, use_rep: str = 'attn', key_added: str = 'steamboat_emb', metric='cosine', neighbors_kwargs: dict = None)

A thin wrapper for scanpy.pp.neighbors for Steamboat functionalities

Parameters:

adata – AnnData object to be processed
use_rep – embedding to be used, defaults to ‘attn’
key_added – key in obsp to store the resulting similarity graph, defaults to ‘steamboat_emb’
metric – metric for similarity graph, defaults to ‘cosine’
neighbors_kwargs – Other parameters for scanpy.pp.neighbors if desired, defaults to None

Returns:

Whatever scanpy.pp.neighbors returns

steamboat.tools.plot_all_transforms(model: Steamboat, top: int = 3, head_order=None, figsize: str | tuple[float, float] = 'auto', chosen_features: List[str] = None)

Plot all metagenes for scale == 3

Parameters:

model – Steamboat model
top – top genes per metagene to plot, defaults to 3
head_order – order of heads in a list, which can be some or all the heads, defaults to None
figsize – (width, height), defaults to ‘auto’
chosen_features – selected features to plot, defaults to None

steamboat.tools.plot_all_transforms2(model, top: int = 3, reorder: bool = False, figsize: str | tuple[float, float] = 'auto', vmin: float = 0.0, vmax: float = 1.0, xticklabels: tuple[str, str, str] = ('environment', 'center cell', 'reconstruction'))

Plot all metagenes

Parameters:

model – Steamboat model
top – Number of top genes per metagene to plot, defaults to 3
reorder – Reorder the genes by metagene, or keep the original ordering, defaults to False
figsize – Size of the figure, defaults to ‘auto’
vmin – minimum value in the color bar, defaults to 0.
vmax – maximum value in the color bar, defaults to 1.

steamboat.tools.plot_cell_type_enrichment(all_adata, adatas, score_dim, label_key, select_labels=None, figsize=(0.75, 4))

steamboat.tools.plot_geneset_auroc(sig_df, order, figsize=(8, 5))

Plot gene set enrichment by AUROC

Parameters:

sig_df – Analysis results
order – order of heads
figsize – (width, height), defaults to (8, 5)

Returns:

fig, ax

steamboat.tools.plot_head_weights(head_weights, multiplier: float = 100, order=None, figsize=(7, 0.8), heatmap_kwargs=None, save: str = None)

Plot head weights calculated by calc_head_weights

Parameters:

head_weights – head weights calculated by calc_head_weights
multiplier – 100 for percent, 1000 for per mille, etc., defaults to 100
order – ordering of heads, defaults to None
figsize – (width, height), defaults to (7, 0.8)
heatmap_kwargs – additional arguments for heatmap plotting, defaults to None
save – save to file, defaults to None

steamboat.tools.plot_vq(model: Steamboat, chosen_features: List[str], figsize=(3, 3))

Plot the reconstruction metagenes (w_v) only

Parameters:

model – Steamboat model
chosen_features – chosen genes to plot
figsize – (width, height), defaults to (3, 3)

Returns:

fig, ax

steamboat.tools.plot_wq(model: Steamboat, chosen_features: List[str], figsize=(3, 3))

Plot the reconstruction metagenes (w_q) only

Parameters:

model – Steamboat model
chosen_features – chosen genes to plot
figsize – (width, height), defaults to (3, 3)

Returns:

fig, ax

steamboat.tools.rank(x, axis=1)

Rank numbers along an axis

Parameters:

x – numpy array of numbers
axis – axis along which to rank, defaults to 1

Returns:

ranks
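The double-argsort idiom is a common way to rank values along an axis with NumPy. Whether steamboat.tools.rank returns 0- or 1-based, ascending or descending ranks is not documented, so this sketch (0-based, ascending, ties broken by position) is illustrative only; rank_sketch is a hypothetical name.

```python
import numpy as np


def rank_sketch(x, axis=1):
    """0-based ascending ranks of x along axis (ties broken by position)."""
    # argsort gives the order of the values; argsort of that order gives each value's rank.
    order = np.argsort(x, axis=axis, kind="stable")
    return np.argsort(order, axis=axis, kind="stable")


x = np.array([[3, 1, 2], [10, 30, 20]])
print(rank_sketch(x).tolist())          # [[2, 0, 1], [0, 2, 1]]
print(rank_sketch(x, axis=0).tolist())  # [[0, 0, 0], [1, 1, 1]]
```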

steamboat.tools.read_lrdb(species: str = 'human') → DataFrame

Read ligand-receptor database

Parameters: species – Species, either ‘human’ or ‘mouse’, defaults to ‘human’
Returns: DataFrame of ligand-receptor pairs

steamboat.tools.score_lrs(adata, model, lrdb, species='human', gene_names='index')

Calculate ligand-receptor scores based on Steamboat model

Parameters:

adata – AnnData object containing the data
model – Steamboat model
lrdb – Ligand-receptor database
species – Species, either ‘human’ or ‘mouse’, defaults to ‘human’
gene_names – Gene names to use, either ‘index’ or a column name in adata.var

Returns:

List of DataFrames containing ligand-receptor scores for each head

steamboat.tools.segment(adata: AnnData, resolution: float = 1.0, *, key_added: str = 'steamboat_spatial_domain', key_added_pairwise: str = 'pairwise', key_added_similarity: str = 'similarity', key_added_combined: str = 'combined', n_prop: int = 3, spatial_graph_threshold: float = 0.0, leiden_kwargs: dict = None)

Spatial domain segmentation using Steamboat embeddings and graphs

Parameters:

adata – AnnData object to be processed
resolution – resolution for Leiden clustering, defaults to 1.
key_added – obs key for segmentation result, defaults to ‘steamboat_spatial_domain’
key_added_pairwise – obsp key for pairwise cell-cell attention graph, defaults to ‘pairwise’
key_added_similarity – obsp key for per-cell attention k-NN similarity graph, defaults to ‘similarity’
key_added_combined – obsp key for combined pairwise and similarity graphs, defaults to ‘combined’
n_prop – power (number of propagation steps) for the pairwise graph, defaults to 3
spatial_graph_threshold – threshold to include/exclude an edge; a larger value makes the program run faster but potentially less accurately, defaults to 0.0
leiden_kwargs – Other parameters for scanpy.tl.leiden if desired, defaults to None

Returns:

Whatever scanpy.tl.leiden returns
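n_prop and spatial_graph_threshold act on the pairwise attention graph. A sketch of the general idea follows, assuming a dense weighted adjacency matrix, that edges with weight at or below the threshold are dropped, and that propagation means raising the matrix to the n_prop-th power. segment stores its graphs in adata.obsp and combines the result with the similarity graph before Leiden clustering, which is not reproduced here; propagate is a hypothetical name.

```python
def propagate(adj, n_prop=3, threshold=0.0):
    """Drop edges with weight <= threshold, then raise the adjacency matrix to n_prop.

    adj: square matrix (list of lists) of pairwise attention weights.
    Entry (i, j) of the result sums the products of edge weights over all
    walks of length n_prop from i to j.
    """
    n = len(adj)
    a = [[w if w > threshold else 0.0 for w in row] for row in adj]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _ in range(n_prop):
        result = [[sum(result[i][k] * a[k][j] for k in range(n)) for j in range(n)]
                  for i in range(n)]
    return result


path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
print(propagate(path, n_prop=2)[0][2])  # 1.0: cells 0 and 2 connect via cell 1
```

Higher powers connect cells that are several hops apart, which tends to yield smoother domains; dropping weak edges first keeps the matrix sparser, in line with the documented trade-off between speed and accuracy.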
