Steamboat API

Steamboat API includes four modules.

  • dataset module provides the dataset class for loading and preprocessing the data.

  • model module provides the model class for training and predicting the data.

  • tools module provides tools for post-processing, visualization, clustering, segmentation, etc.

  • utils module provides basic utility functions used in multiple modules, usually not needed by the end user.

Please refer to the respective sections below for detailed information.

steamboat.dataset module

class steamboat.dataset.SteamboatDataset(data_list, sparse_graph)

Bases: Dataset

Steamboat Dataset class

Parameters:
  • data_list – a list of dictionaries containing ‘X’ and ‘adj’ keys

  • sparse_graph – whether to use an adjacency list (True) or an adjacency matrix (False)
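The two graph representations that sparse_graph chooses between can be illustrated in plain Python. This is only a sketch: the exact layout that SteamboatDataset stores is not specified here, and matrix_to_adj_list is a hypothetical helper, not part of the Steamboat API.

```python
def matrix_to_adj_list(adj_matrix):
    """Convert a dense 0/1 adjacency matrix into two parallel index lists.

    Returns (sources, targets): edge k goes from sources[k] to targets[k].
    """
    sources, targets = [], []
    for i, row in enumerate(adj_matrix):
        for j, connected in enumerate(row):
            if connected:
                sources.append(i)
                targets.append(j)
    return sources, targets


# A 4-cell toy graph: cell 0 neighbors cells 1 and 2, and so on.
adj_matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
sources, targets = matrix_to_adj_list(adj_matrix)
print(list(zip(sources, targets)))
```

The list form stores one entry per edge, so for a kNN graph its size grows with the number of cells times the number of neighbors rather than with the square of the number of cells.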

to(device)

Send everything to a device. Always copy (even if it’s on the device already).

steamboat.dataset.make_dataset(adatas: list[AnnData], sparse_graph=True, mask_var: str = None, obsm_key=None, regional_obs: str | list[str] = None) → SteamboatDataset

Create a PyTorch Dataset from a list of AnnData objects. Each AnnData in the input list should contain raw counts or normalized counts.

Parameters:
  • adatas – A list of SCANPY AnnData

  • sparse_graph – Use adjacency list.

  • mask_var – Column in var to select variables. Default: var.highly_variable if available, otherwise no filtering. Specify False to use all genes.

Returns:

A torch.Dataset including all data.

steamboat.dataset.prep_adatas(adatas: list[AnnData], n_neighs: int = 8, norm=True, log1p=True, scale=False, renorm=False) → list[AnnData]

Preprocess a list of AnnData objects

Parameters:
  • adatas – A list of SCANPY AnnData

  • n_neighs – number of neighbors for kNN spatial graph, defaults to 8

  • norm – Whether or not to normalize the data, defaults to True

  • log1p – Whether or not to log-transform the data, defaults to True

Returns:

A list of preprocessed SCANPY AnnData
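To make the n_neighs parameter concrete, here is a brute-force sketch of a kNN spatial graph. It assumes Euclidean distance on 2D coordinates; how prep_adatas actually builds its graph is not described here, and knn_graph is a hypothetical helper.

```python
import math


def knn_graph(coords, n_neighs=8):
    """Return, for each point, the indices of its n_neighs nearest other points."""
    neighbors = []
    for i, p in enumerate(coords):
        # Distance from point i to every other point, paired with that point's index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(coords) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:n_neighs]])
    return neighbors


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(knn_graph(coords, n_neighs=2))  # → [[1, 2], [0, 2], [0, 1], [2, 1]]
```

Note that a kNN graph is generally not symmetric: in the toy data, the isolated cell 3 lists cells 2 and 1 as neighbors, but neither lists cell 3.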

steamboat.model module

class steamboat.model.BilinearAttention(d_in: int, n_heads: int, n_scales: int = 2, d_out: int = None)

Bases: Module

Bilinear attention layer

Parameters:
  • d_in – number of input features

  • n_heads – number of heads

  • n_scales – number of scales (default 2, i.e., ego and local; 3 will add global)

  • d_out – number of output features, defaults to None (meaning d_out = d_in)

forward(adj_list, x, masked_x=None, regional_adj_lists=None, regional_xs=None, get_details=False, explained_variance_mask=None, chosen_head=None)

Forward pass

Parameters:
  • adj_list – adjacency list for spatial graph

  • x – input data

  • masked_x – masked input data, defaults to None (i.e., using x)

  • regional_adj_lists – list of adjacency lists, one per bipartite cell-region graph, defaults to None

  • regional_xs – list of mean expression of regions, defaults to None

  • get_details – whether to return details, defaults to False

Returns:

reconstructed gene expression

score_interactive(q_emb, k_emb, adj_list, activation=None)

Score interactive factors. Attention to other cells/environment.

Parameters:
  • q_emb – query scores

  • k_emb – key scores

  • adj_list – adjacency list

  • activation – activation function

Returns:

interactive scores for short or long range interaction

score_intrinsic(q_emb, k_emb, activation=None)

Score intrinsic factors. No attention to other cells/environment.

Parameters:
  • q_emb – query scores

  • k_emb – key scores

  • activation – activation function

Returns:

ego scores
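The difference between intrinsic and interactive scoring can be sketched as follows. This is an illustration under a strong assumption, namely that each cell has one scalar query score and one scalar key score per head and that a score is their product. The computation BilinearAttention actually performs is not described here, and both function names are hypothetical.

```python
def score_intrinsic_sketch(q_emb, k_emb):
    """Ego score per cell and head: the cell's query score times its own key score."""
    return [
        [q * k for q, k in zip(q_row, k_row)]
        for q_row, k_row in zip(q_emb, k_emb)
    ]


def score_interactive_sketch(q_emb, k_emb, edges):
    """Per-edge, per-head score: query of the center cell times key of its neighbor."""
    return {
        (i, j): [q * k for q, k in zip(q_emb[i], k_emb[j])]
        for i, j in edges
    }


# Two cells, two heads (rows are cells, columns are heads).
q_emb = [[1.0, 0.5], [2.0, 0.0]]
k_emb = [[0.5, 2.0], [1.0, 3.0]]
edges = [(0, 1), (1, 0)]  # (center cell, neighbor cell)

print(score_intrinsic_sketch(q_emb, k_emb))
print(score_interactive_sketch(q_emb, k_emb, edges))
```

The intrinsic score never looks past the cell itself, while the interactive score is defined only along edges of the graph passed in as adj_list.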

class steamboat.model.NonNegBias(d)

Bases: Module

Non-negative bias layer (i.e., add a non-negative vector to the output)

Parameters:

d – number of input/output features

property bias

Transform bias to be non-negative

Returns:

non-negative bias
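A non-negative parameter is commonly obtained by storing an unconstrained value and passing it through a function whose output cannot be negative. The sketch below uses softplus as an assumed example; which transform NonNegBias, NonNegLinear, and NonNegScale actually apply is not stated here.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); the result is never negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


raw = [-3.0, 0.0, 2.5]             # unconstrained parameter values
bias = [softplus(v) for v in raw]  # non-negative values actually applied
x = [1.0, 1.0, 1.0]
print([xi + bi for xi, bi in zip(x, bias)])
```

Because the raw values are unconstrained, a gradient-based optimizer can update them freely while the exposed bias stays non-negative.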

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class steamboat.model.NonNegLinear(d_in, d_out, bias)

Bases: Module

Non-negative linear layer

Parameters:
  • d_in – number of input features

  • d_out – number of output features

  • bias – unimplemented

Raises:

NotImplementedError – when bias is True

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property weight

transform weight matrix to be non-negative

Returns:

transformed weight matrix

class steamboat.model.NonNegScale(d)

Bases: Module

Non-negative scale layer (i.e., scale the output by a non-negative vector)

Parameters:

d – number of input/output features

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property scale

Transform scale to be non-negative

Returns:

non-negative scale

class steamboat.model.Steamboat(features: list[str] | int, n_heads: int, n_scales: int = 2)

Bases: Module

Steamboat model

Parameters:
  • features – feature names (usually adata.var_names or a column in adata.var for gene symbols)

  • n_heads – number of heads

  • n_scales – number of scales (default 2, i.e., ego and local; 3 will add global)

fit(dataset: SteamboatDataset, entry_masking_rate: float = 0.1, feature_masking_rate: float = 0.1, device: str = 'cuda', *, opt=None, opt_args=None, loss_fun=None, max_epoch: int = 100, stop_eps: float = 0.0001, stop_tol: int = 10, log_dir: str = 'log/', report_per: int = 10)

Train the model on a dataset

Parameters:
  • dataset – Dataset to be trained on

  • entry_masking_rate – Rate of masking random entries, defaults to 0.1

  • feature_masking_rate – Rate of masking a full feature (can overlap with entry masking), defaults to 0.1

  • device – Device to be used (“cpu” or “cuda”)

  • local_entropy_penalty – entropy penalty to make the local attention more diverse

  • opt – Optimizer for fitting

  • opt_args – Arguments for optimizer (e.g., {‘lr’: 0.01})

  • loss_fun – Loss function. Default is MSE (nn.MSELoss). You may use MAE (nn.L1Loss), Huber (nn.HuberLoss), SmoothL1 (nn.SmoothL1Loss), or a customized loss function.

  • max_epoch – Maximum number of epochs

  • stop_eps – Stopping criterion: minimum change (see also stop_tol)

  • stop_tol – Stopping criterion: number of epochs that don’t meet stop_eps before stopping

  • log_dir – Directory to save logs

  • report_per – Report every this many epochs. 0 to only report before termination; a negative number to never report.

Returns:

self
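The interplay of stop_eps and stop_tol can be sketched as a small early-stopping rule. Two details are assumptions here rather than documented behavior: that "change" is measured against the best loss seen so far, and that the stop_tol failing epochs must be consecutive. epochs_until_stop is a hypothetical helper.

```python
def epochs_until_stop(losses, stop_eps=1e-4, stop_tol=10):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch "fails" when it improves on the best loss so far by less than
    stop_eps; training stops after stop_tol consecutive failing epochs.
    """
    best = float("inf")
    failures = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss >= stop_eps:
            best = loss
            failures = 0
        else:
            failures += 1
            if failures >= stop_tol:
                return epoch
    return None


losses = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3]
print(epochs_until_stop(losses, stop_eps=1e-4, stop_tol=3))  # → 6
```

With the default stop_tol of 10 the same loss curve would not stop, because the improvement at the seventh epoch resets the failure count.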

forward(adj_list, x, masked_x, regional_adj_lists, regional_xs, get_details=False, explained_variance_mask=None, chosen_head=None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_bias() → array

get_ego_transform() → array

Get gene attention matrix

Parameters:

separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.

Returns:

Gene attention vectors

get_global_transform() → array

Get gene attention matrix

Parameters:

separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.

Returns:

Gene attention matrix

get_local_transform() → array

Get gene attention matrix

Parameters:

separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.

Returns:

Gene attention vectors

get_top_features(top_k=5)

masking(x: Tensor, xs, entry_masking_rate: float, feature_masking_rate: float)

Mask the dataset

Parameters:
  • x – input data

  • entry_masking_rate – rate of masking random entries

  • feature_masking_rate – rate of masking a full feature (can overlap with entry masking)

Returns:

masked data
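Entry-wise and feature-wise masking can be sketched in plain Python as below. The sketch assumes that masked values are set to zero and ignores the xs argument; neither detail is confirmed by this reference, and mask_matrix is a hypothetical helper.

```python
import random


def mask_matrix(x, entry_masking_rate, feature_masking_rate, seed=0):
    """Return a masked copy of x (a list of rows); masked values become 0.0."""
    rng = random.Random(seed)
    n_features = len(x[0])
    # Feature-wise masking: each column is dropped as a whole.
    dropped = [rng.random() < feature_masking_rate for _ in range(n_features)]
    masked = []
    for row in x:
        new_row = []
        for j, value in enumerate(row):
            # Entry-wise masking is drawn independently, so it can
            # overlap with an already dropped column.
            drop_entry = rng.random() < entry_masking_rate
            new_row.append(0.0 if (dropped[j] or drop_entry) else value)
        masked.append(new_row)
    return masked


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(mask_matrix(x, entry_masking_rate=0.1, feature_masking_rate=0.1))
```

The input is left untouched, so the unmasked values remain available as the reconstruction target.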

score_cells(x)
score_global(x, x_bar=None)
score_local(x, adj_matrix)
transform(x, adj_matrix, get_details=True, explained_variance_mask=None)
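Putting the dataset and model modules together, a training run might look like the sketch below. It uses only the signatures listed in this reference. train_steamboat is a hypothetical wrapper, and the choice of mask_var=False (so that the model's features match adata.var_names) is an assumption about how the pieces fit, not documented guidance.

```python
def train_steamboat(adatas, n_heads, device="cpu", max_epoch=100, seed=0):
    """Preprocess a list of AnnData objects and fit a Steamboat model on them.

    Returns (model, dataset, adatas) so the results can be passed on to the
    functions in steamboat.tools.
    """
    # Imported lazily so this sketch can be loaded without Steamboat installed.
    from steamboat.dataset import make_dataset, prep_adatas
    from steamboat.model import Steamboat
    from steamboat.utils import set_random_seed

    set_random_seed(seed)

    # Build the kNN spatial graph, then normalize and log-transform.
    adatas = prep_adatas(adatas, n_neighs=8, norm=True, log1p=True)

    # Assumption: mask_var=False keeps all genes, so the feature names
    # passed to the model below line up with adata.var_names.
    dataset = make_dataset(adatas, sparse_graph=True, mask_var=False)

    model = Steamboat(list(adatas[0].var_names), n_heads=n_heads, n_scales=2)
    model.fit(
        dataset,
        entry_masking_rate=0.1,
        feature_masking_rate=0.1,
        device=device,
        max_epoch=max_epoch,
    )
    return model, dataset, adatas
```

n_heads has no default in the Steamboat signature, so the wrapper requires it as well rather than guessing a value.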

steamboat.utils module

steamboat.utils.set_random_seed(seed: int) → None

Reset the random seed for NumPy and PyTorch

Parameters:

seed – Random seed

steamboat.tools module

steamboat.tools.calc_adjacency_freq(adatas, sample_key: str, cell_type_key: str, pseudocount: float = 20.0)

Calculate baseline interaction matrix determined by adjacency frequency

Parameters:
  • adatas – all adatas

  • sample_key – obs key for sample names

  • cell_type_key – obs key for cell types

Returns:

adjacency frequency matrices (one per sample) in a dictionary

steamboat.tools.calc_geneset_auroc(metagenes, genesets)

Gene set enrichment analysis by AUROC

Parameters:
  • metagenes – metagenes from calc_var(model)

  • genesets – genesets in a dictionary

Returns:

AUROC in a DataFrame
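AUROC-based enrichment asks how well a metagene's gene weights rank the members of a gene set above all other genes. The sketch below computes this with the rank-sum (Mann-Whitney U) identity on toy data; it is not the implementation behind calc_geneset_auroc, and auroc is a hypothetical helper.

```python
def auroc(scores, positives):
    """AUROC of scores (dict: gene -> weight) for membership in positives.

    Uses the rank-sum (Mann-Whitney U) identity; tied scores share
    their average rank.
    """
    genes = sorted(scores, key=scores.get)
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        while j + 1 < len(genes) and scores[genes[j + 1]] == scores[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for g in genes[i:j + 1]:
            ranks[g] = avg_rank
        i = j + 1
    pos = [g for g in genes if g in positives]
    n_pos, n_neg = len(pos), len(genes) - len(pos)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one gene inside and one outside the gene set")
    u = sum(ranks[g] for g in pos) - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy metagene weights, made up for illustration.
metagene = {"CD3E": 0.9, "CD4": 0.7, "ACTB": 0.2, "GAPDH": 0.1}
print(auroc(metagene, {"CD3E", "CD4"}))  # → 1.0
```

A value of 1.0 means every gene in the set outranks every gene outside it, 0.5 corresponds to a random ordering, and 0.0 means the set sits entirely at the bottom.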

steamboat.tools.calc_geneset_auroc_order(sig_df, by='q')

Order the metagenes by AUROC

Parameters:
  • sig_df – Analysis results

  • by – by which metagene, defaults to ‘q’

Returns:

ordering of the metagenes

steamboat.tools.calc_head_weights(adatas, model: Steamboat)

Calculate weights of heads and scales within each head

Parameters:
  • adatas – all adatas

  • model – the trained Steamboat model

Returns:

weights

steamboat.tools.calc_head_weights_quantile(adatas, model: Steamboat, quantile: float = 0.9)

Calculate weights of heads and scales within each head

Parameters:
  • adatas – all adatas

  • model – the trained Steamboat model

Returns:

weights

steamboat.tools.calc_interaction(adatas, model: Steamboat, sample_key: str, cell_type_key: str, pseudocount: float = 20.0)

Calculate interaction matrix

Parameters:
  • adatas – all adatas

  • model – Steamboat model

  • sample_key – obs key for sample names

  • cell_type_key – obs key for cell types

  • pseudocount – pseudocount in denominator when averaging scores in cell type pairs, defaults to 20.

Returns:

interaction matrices (one per sample) in a dictionary
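The role of the pseudocount can be sketched as follows: scores are summed per cell type pair and divided by the number of contributing edges plus the pseudocount. The input layout (a dictionary of per-edge scores) and the helper name average_by_type_pair are assumptions made for illustration only.

```python
from collections import defaultdict


def average_by_type_pair(edge_scores, cell_types, pseudocount=20.0):
    """Average edge scores per (center type, neighbor type) with a pseudocount.

    edge_scores: dict mapping (i, j) cell-index pairs to a score.
    cell_types:  list giving the cell type of each cell index.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (i, j), score in edge_scores.items():
        pair = (cell_types[i], cell_types[j])
        totals[pair] += score
        counts[pair] += 1
    # The pseudocount inflates the denominator, which shrinks the
    # averages of rarely observed pairs toward zero.
    return {pair: totals[pair] / (counts[pair] + pseudocount) for pair in totals}


cell_types = ["T", "B", "T"]
edge_scores = {(0, 1): 2.0, (2, 1): 4.0, (1, 0): 3.0}
print(average_by_type_pair(edge_scores, cell_types, pseudocount=1.0))
```

With a large pseudocount, a cell type pair seen on only a handful of edges cannot dominate the matrix through a few high scores.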

steamboat.tools.calc_obs(adatas: list[AnnData], dataset: SteamboatDataset, model: Steamboat, device='cuda', get_recon: bool = False)

Calculate and store the embeddings and attention scores in the AnnData objects

Parameters:
  • adatas – List of AnnData objects to store the embeddings and attention scores

  • dataset – SteamboatDataset object to be processed

  • model – Steamboat model

  • device – Device to run the model, defaults to ‘cuda’

  • get_recon – Whether to store the reconstructed data, defaults to False

steamboat.tools.calc_v_weights(model: Steamboat, normalize: bool = True)

Calculate weight of reconstruction (w_v) metagene

Parameters:
  • model – Steamboat model

  • normalize – whether to normalize the weights so that they sum to 1, defaults to True

Returns:

weights

steamboat.tools.calc_var(model: Steamboat)

Write metagenes into a DataFrame

Parameters:

model – Steamboat model

Returns:

DataFrame of metagenes

steamboat.tools.contribution_by_scale(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')

Calculate explained variance for each scale

Parameters:
  • model – Steamboat model

  • dataset – SteamboatDataset object to be processed

  • adatas – list of AnnData objects corresponding to the dataset

  • device – Device to run the model, defaults to ‘cuda’

Returns:

explained variance scores in a dictionary

steamboat.tools.contribution_by_scale_and_head(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')

Calculate explained variance for each scale and head

Parameters:
  • model – Steamboat model

  • dataset – SteamboatDataset object to be processed

  • adatas – list of AnnData objects corresponding to the dataset

  • device – Device to run the model, defaults to ‘cuda’

Returns:

explained variance scores in a dictionary

steamboat.tools.explained_variance_by_scale(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')

Calculate explained variance for each scale

Parameters:
  • model – Steamboat model

  • dataset – SteamboatDataset object to be processed

  • adatas – list of AnnData objects corresponding to the dataset

  • device – Device to run the model, defaults to ‘cuda’

Returns:

explained variance scores in a dictionary

steamboat.tools.find_top_celltypes(adata, obs_key: str, top: int = 3, return_raw: bool = True)

Find top contributing cell types for each metagene by mean scores

Parameters:
  • adata – AnnData object with cell type information

  • obs_key – obs key for cell types

  • top – top cell types per metagene to find, defaults to 3

Returns:

list of lists of top cell types per metagene

steamboat.tools.find_top_lrs(var: DataFrame, lr: DataFrame, top: int = 3, return_raw: bool = True)

Find top contributing ligand-receptor pairs for each metagene

Parameters:
  • var – var from calc_var(model)

  • top – top genes per metagene to find, defaults to 3

Returns:

list of lists of top genes per metagene

steamboat.tools.gather_obs(adata: AnnData, adatas: list[AnnData])

Gather obs/obsm/uns from a list of AnnData objects to a single AnnData object

Parameters:
  • adata – AnnData object to store the gathered obs/obsm/uns

  • adatas – List of AnnData objects to be gathered

steamboat.tools.leiden(adata: AnnData, resolution: float = 1.0, *, obsp='steamboat_emb_connectivities', key_added='steamboat_clusters', leiden_kwargs: dict = None)

A thin wrapper for scanpy.tl.leiden to cluster for cell types (for spatial domain segmentation, use segment).

Parameters:
  • adata – AnnData object to be processed

  • resolution – resolution for Leiden clustering, defaults to 1.

  • obsp – obsp key to be used, defaults to ‘steamboat_emb_connectivities’

  • key_added – obs key to be added for resulting clusters, defaults to ‘steamboat_clusters’

  • leiden_kwargs – Other parameters for scanpy.tl.leiden if desired, defaults to None

Returns:

hands over what scanpy.tl.leiden returns

steamboat.tools.neighbors(adata: AnnData, use_rep: str = 'attn', key_added: str = 'steamboat_emb', metric='cosine', neighbors_kwargs: dict = None)

A thin wrapper for scanpy.pp.neighbors for Steamboat functionalities

Parameters:
  • adata – AnnData object to be processed

  • use_rep – embedding to be used, defaults to ‘attn’

  • key_added – key in obsp to store the resulting similarity graph, defaults to ‘steamboat_emb’

  • metric – metric for similarity graph, defaults to ‘cosine’

  • neighbors_kwargs – Other parameters for scanpy.pp.neighbors if desired, defaults to None

Returns:

hands over what scanpy.pp.neighbors returns
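The defaults of neighbors and leiden are designed to chain: neighbors writes its graph under key_added='steamboat_emb', and leiden reads obsp='steamboat_emb_connectivities'. A possible per-sample clustering workflow is sketched below using only signatures from this reference. cluster_cells is a hypothetical wrapper, and it assumes that calc_obs stores the 'attn' embedding that use_rep defaults to.

```python
def cluster_cells(adatas, dataset, model, device="cpu", resolution=1.0):
    """Store Steamboat scores in each AnnData, then cluster cells per sample."""
    # Imported lazily so this sketch loads even where Steamboat is not installed.
    from steamboat import tools

    # Writes embeddings and attention scores into the AnnData objects.
    tools.calc_obs(adatas, dataset, model, device=device)

    for adata in adatas:
        # Assumption: calc_obs stored the 'attn' embedding used here.
        tools.neighbors(adata, use_rep="attn", key_added="steamboat_emb")
        # Reads obsp['steamboat_emb_connectivities'] by default and
        # writes cluster labels to obs['steamboat_clusters'].
        tools.leiden(adata, resolution=resolution)
    return adatas
```

For spatial domains rather than cell types, the same graph can instead be passed on to segment, as the leiden description notes.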

steamboat.tools.plot_all_transforms(model: Steamboat, top: int = 3, head_order=None, figsize: str | tuple[float, float] = 'auto', chosen_features: List[str] = None)

Plot all metagenes for scale == 3

Parameters:
  • model – Steamboat model

  • top – top genes per metagene to plot, defaults to 3

  • head_order – order of heads in a list, which can be some or all the heads, defaults to None

  • figsize – (width, height), defaults to ‘auto’

  • chosen_features – selected features to plot, defaults to None

steamboat.tools.plot_all_transforms2(model, top: int = 3, reorder: bool = False, figsize: str | tuple[float, float] = 'auto', vmin: float = 0.0, vmax: float = 1.0, xticklabels: tuple[str, str, str] = ('environment', 'center cell', 'reconstruction'))

Plot all metagenes

Parameters:
  • model – Steamboat model

  • top – Number of top genes per metagene to plot, defaults to 3

  • reorder – Reorder the genes by metagene, or keep the original ordering, defaults to False

  • figsize – Size of the figure, defaults to ‘auto’

  • vmin – minimum value in the color bar, defaults to 0.

  • vmax – maximum value in the color bar, defaults to 1.

steamboat.tools.plot_cell_type_enrichment(all_adata, adatas, score_dim, label_key, select_labels=None, figsize=(0.75, 4))

steamboat.tools.plot_geneset_auroc(sig_df, order, figsize=(8, 5))

Plot gene set enrichment by AUROC

Parameters:
  • sig_df – Analysis results

  • order – order of heads

  • figsize – (width, height), defaults to (8, 5)

Returns:

fig, ax

steamboat.tools.plot_head_weights(head_weights, multiplier: float = 100, order=None, figsize=(7, 0.8), heatmap_kwargs=None, save: str = None)

Plot head weights calculated by calc_head_weights

Parameters:
  • head_weights – head weights calculated by calc_head_weights

  • multiplier – 100 for percentage, 1000 for per mille, etc., defaults to 100

  • order – ordering of heads, defaults to None

  • figsize – (width, height), defaults to (7, 0.8)

  • heatmap_kwargs – additional arguments for heatmap plotting, defaults to None

  • save – save to file, defaults to None

steamboat.tools.plot_vq(model: Steamboat, chosen_features: List[str], figsize=(3, 3))

Plot the reconstruction metagenes (w_v) only

Parameters:
  • model – Steamboat model

  • chosen_features – chosen genes to plot

  • figsize – (width, height), defaults to (3, 3)

Returns:

fig, ax

steamboat.tools.plot_wq(model: Steamboat, chosen_features: List[str], figsize=(3, 3))

Plot the reconstruction metagenes (w_q) only

Parameters:
  • model – Steamboat model

  • chosen_features – chosen genes to plot

  • figsize – (width, height), defaults to (3, 3)

Returns:

fig, ax

steamboat.tools.rank(x, axis=1)

Rank number

Parameters:
  • x – numpy array of numbers

  • axis – perform over which axis, defaults to 1

Returns:

ranks

steamboat.tools.read_lrdb(species: str = 'human') → DataFrame

Read ligand-receptor database

Parameters:

species – Species, either ‘human’ or ‘mouse’, defaults to ‘human’

Returns:

DataFrame of ligand-receptor pairs

steamboat.tools.score_lrs(adata, model, lrdb, species='human', gene_names='index')

Calculate ligand-receptor scores based on Steamboat model

Parameters:
  • adata – AnnData object containing the data

  • model – Steamboat model

  • lrdb – Ligand-receptor database

  • species – Species, either ‘human’ or ‘mouse’, defaults to ‘human’

  • gene_names – Gene names to use, either ‘index’ or a column name in adata.var

Returns:

List of DataFrames containing ligand-receptor scores for each head

steamboat.tools.segment(adata: AnnData, resolution: float = 1.0, *, key_added: str = 'steamboat_spatial_domain', key_added_pairwise: str = 'pairwise', key_added_similarity: str = 'similarity', key_added_combined: str = 'combined', n_prop: int = 3, spatial_graph_threshold: float = 0.0, leiden_kwargs: dict = None)

Spatial domain segmentation using Steamboat embeddings and graphs

Parameters:
  • adata – AnnData object to be processed

  • resolution – resolution for Leiden clustering, defaults to 1.

  • key_added – obs key for segmentation result, defaults to ‘steamboat_spatial_domain’

  • key_added_pairwise – obsp key for pairwise cell-cell attention graph, defaults to ‘pairwise’

  • key_added_similarity – obsp key for per-cell attention k-NN similarity graph, defaults to ‘similarity’

  • key_added_combined – obsp key for combined pairwise and similarity graphs, defaults to ‘combined’

  • n_prop – power (numbers of propagation) for the pairwise graph, defaults to 3

  • spatial_graph_threshold – threshold to include/exclude an edge, a larger number will make the program run faster but potentially less accurate, defaults to 0.0

  • leiden_kwargs – Other parameters for scanpy.tl.leiden if desired, defaults to None

Returns:

hands over what scanpy.tl.leiden returns
