Steamboat API
Steamboat API includes four modules.
dataset module provides the dataset class for loading and preprocessing the data.
model module provides the model class for training and predicting the data.
tools module provides tools for post-processing, visualization, clustering, segmentation, etc.
utils module provides basic utility functions used in multiple modules, usually not needed by the end user.
Please refer to respective sections for detailed information.
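As a rough orientation, the four modules might be chained as in the sketch below. It only uses the signatures listed in this reference. The function name run_steamboat_sketch, the value n_heads=10, and the choice of mask_var=False (so that the feature list matches all genes) are illustrative assumptions, and whether additional steps are needed between calls is not stated here.

```python
def run_steamboat_sketch(adatas, n_heads=10, device="cpu", max_epoch=100):
    """Hypothetical end-to-end workflow; not taken from the Steamboat docs."""
    # Imports are local so this sketch can be defined without Steamboat installed.
    from steamboat.dataset import prep_adatas, make_dataset
    from steamboat.model import Steamboat
    from steamboat.tools import calc_obs

    # Preprocess each AnnData and build the kNN spatial graph.
    adatas = prep_adatas(adatas, n_neighs=8)
    # mask_var=False keeps all genes, so var_names can serve as the feature list.
    dataset = make_dataset(adatas, sparse_graph=True, mask_var=False)
    features = list(adatas[0].var_names)

    # n_heads=10 is an arbitrary illustrative value.
    model = Steamboat(features, n_heads=n_heads, n_scales=2)
    model.fit(dataset, device=device, max_epoch=max_epoch)

    # Store embeddings and attention scores back in the AnnData objects.
    calc_obs(adatas, dataset, model, device=device)
    return model, adatas
```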
steamboat.dataset module
- class steamboat.dataset.SteamboatDataset(data_list, sparse_graph)
Bases: Dataset
Steamboat Dataset class
- Parameters:
data_list – a list of dictionaries containing ‘X’ and ‘adj’ keys
sparse_graph – Whether to use adjacency list or adjacency matrix
- to(device)
Send everything to a device. Always copy (even if it’s on the device already).
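To make the difference between an adjacency matrix and an adjacency list concrete, here is a minimal sketch in plain Python. The helper matrix_to_adjacency_list and the two-row (sources, targets) layout are assumptions for illustration; the exact layout Steamboat expects under the 'adj' key is not specified in this reference.

```python
def matrix_to_adjacency_list(adj_matrix):
    """Convert a dense 0/1 adjacency matrix into [sources, targets] index lists."""
    sources, targets = [], []
    for i, row in enumerate(adj_matrix):
        for j, connected in enumerate(row):
            if connected:
                sources.append(i)
                targets.append(j)
    return [sources, targets]


# A three-cell chain: cell 0 - cell 1 - cell 2.
adj_matrix = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
adj_list = matrix_to_adjacency_list(adj_matrix)

# One entry of a hypothetical data_list: expression under 'X', graph under 'adj'.
sample = {"X": [[1.0, 0.0], [0.5, 2.0], [0.0, 3.0]], "adj": adj_list}
```

The list form stores only the existing edges, which is why it is usually preferred for sparse spatial graphs.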
- steamboat.dataset.make_dataset(adatas: list[AnnData], sparse_graph=True, mask_var: str = None, obsm_key=None, regional_obs: str | list[str] = None) SteamboatDataset
Create a PyTorch Dataset from a list of AnnData objects. The input data should be a list of AnnData that contains 1. raw counts or normalized counts
- Parameters:
adatas – A list of SCANPY AnnData
sparse_graph – Use adjacency list.
mask_var – Column in var to select variables. Default: var.highly_variable if available, otherwise no filtering. Specify False to use all genes.
- Returns:
A torch.Dataset including all data.
- steamboat.dataset.prep_adatas(adatas: list[AnnData], n_neighs: int = 8, norm=True, log1p=True, scale=False, renorm=False) list[AnnData]
Preprocess a list of AnnData objects
- Parameters:
adatas – A list of SCANPY AnnData
n_neighs – number of neighbors for kNN spatial graph, defaults to 8
norm, log1p – whether to normalize and log-transform the data, respectively; both default to True
- Returns:
A list of preprocessed SCANPY AnnData
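The n_neighs parameter controls how many spatial neighbors each cell receives. The sketch below builds such a kNN graph from 2D coordinates with the standard library only; knn_graph is a hypothetical helper, and the use of Euclidean distance is an assumption, not a description of how prep_adatas is implemented.

```python
import math


def knn_graph(coords, n_neighs=8):
    """Return, for each point, the indices of its n_neighs nearest other points."""
    neighbors = []
    for i, p in enumerate(coords):
        # Sort all other points by Euclidean distance (ties broken by index).
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )
        neighbors.append([j for _, j in dists[:n_neighs]])
    return neighbors


# Three cells close together and one far away.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
graph = knn_graph(coords, n_neighs=2)
```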
steamboat.model module
- class steamboat.model.BilinearAttention(d_in: int, n_heads: int, n_scales: int = 2, d_out: int = None)
Bases: Module
Bilinear attention layer
- Parameters:
d_in – number of input features
n_heads – number of heads
n_scales – number of scales (default 2, i.e., ego and local; 3 will add global)
d_out – number of output features, defaults to None (meaning d_out = d_in)
- forward(adj_list, x, masked_x=None, regional_adj_lists=None, regional_xs=None, get_details=False, explained_variance_mask=None, chosen_head=None)
Forward pass
- Parameters:
adj_list – adjacency list for spatial graph
x – input data
masked_x – masked input data, defaults to None (i.e., using x)
regional_adj_lists – list of adjacency list for bipartite graph of cells - regions, defaults to None
regional_xs – list of mean expression of regions, defaults to None
get_details – whether to return details, defaults to False
- Returns:
reconstructed gene expression
- score_interactive(q_emb, k_emb, adj_list, activation=None)
Score interactive factors. Attention to other cells/environment.
- Parameters:
q_emb – query scores
k_emb – key scores
adj_list – adjacency list
- Returns:
interactive scores for short or long range interaction
- score_intrinsic(q_emb, k_emb, activation=None)
Score intrinsic factors. No attention to other cells/environment.
- Parameters:
q_emb – query scores
k_emb – key scores
activation – activation function
- Returns:
ego scores
- class steamboat.model.NonNegBias(d)
Bases: Module
Non-negative bias layer (i.e., add a non-negative vector to the output)
- Parameters:
d – number of input/output features
- property bias
Transform bias to be non-negative
- Returns:
non-negative bias
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
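The idea of a non-negative bias can be sketched as an unconstrained raw parameter passed through a non-negative transform each time it is read. The class NonNegBiasSketch and the choice of softplus are assumptions for illustration only; this reference does not state which transform Steamboat actually applies.

```python
import math


def softplus(v):
    """Numerically stable log(1 + exp(v)); always positive."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


class NonNegBiasSketch:
    """Hypothetical stand-in: stores raw values, exposes a non-negative bias."""

    def __init__(self, raw):
        self.raw = list(raw)  # unconstrained values an optimizer could update

    @property
    def bias(self):
        # The transform is applied on access, so the bias can never go negative.
        return [softplus(v) for v in self.raw]

    def __call__(self, x):
        return [xi + bi for xi, bi in zip(x, self.bias)]


layer = NonNegBiasSketch([-3.0, 0.0, 2.0])
shifted = layer([1.0, 1.0, 1.0])
```

Keeping the raw parameter unconstrained lets a standard optimizer update it freely while the exposed property stays non-negative.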
- class steamboat.model.NonNegLinear(d_in, d_out, bias)
Bases: Module
Non-negative linear layer
- Parameters:
d_in – number of input features
d_out – number of output features
bias – unimplemented
- Raises:
NotImplementedError – when bias is True
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- property weight
transform weight matrix to be non-negative
- Returns:
transformed weight matrix
- class steamboat.model.NonNegScale(d)
Bases: Module
Non-negative scale layer (i.e., scale the output by a non-negative vector)
- Parameters:
d – number of input/output features
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- property scale
Transform scale to be non-negative
- Returns:
non-negative scale
- class steamboat.model.Steamboat(features: list[str] | int, n_heads: int, n_scales: int = 2)
Bases: Module
Steamboat model
- Parameters:
features – feature names (usually adata.var_names or a column in adata.var for gene symbols)
n_heads – number of heads
n_scales – number of scales (default 2, i.e., ego and local; 3 will add global)
- fit(dataset: SteamboatDataset, entry_masking_rate: float = 0.1, feature_masking_rate: float = 0.1, device: str = 'cuda', *, opt=None, opt_args=None, loss_fun=None, max_epoch: int = 100, stop_eps: float = 0.0001, stop_tol: int = 10, log_dir: str = 'log/', report_per: int = 10)
Fit the model to a dataset
- Parameters:
dataset – Dataset to be trained on
entry_masking_rate – Rate of masking random entries, default 0.1
feature_masking_rate – Rate of masking a full feature (can overlap with entry masking), default 0.1
device – Device to be used (“cpu” or “cuda”)
local_entropy_penalty – entropy penalty to make the local attention more diverse
opt – Optimizer for fitting
opt_args – Arguments for optimizer (e.g., {‘lr’: 0.01})
loss_fun – Loss function: Default is MSE (nn.MSELoss). You may use MAE nn.L1Loss, Huber nn.HuberLoss, SmoothL1 nn.SmoothL1Loss, or a customized loss function.
max_epoch – maximum number of epochs
stop_eps – Stopping criterion: minimum change (see also stop_tol)
stop_tol – Stopping criterion: number of epochs that don’t meet stop_eps before stopping
log_dir – Directory to save logs
report_per – Report once every this many epochs. 0 to only report before termination. Negative number to never report.
- Returns:
self
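The interplay of stop_eps and stop_tol can be read as a patience-style early-stopping rule. The sketch below is one plausible interpretation and not the library's code: it assumes "change" means improvement over the best loss so far, and the helper name epochs_until_stop is hypothetical.

```python
def epochs_until_stop(losses, stop_eps=1e-4, stop_tol=10):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    stalled = 0  # consecutive epochs without an improvement larger than stop_eps
    for epoch, loss in enumerate(losses):
        if best - loss > stop_eps:
            best = loss
            stalled = 0
        else:
            stalled += 1
            if stalled >= stop_tol:
                return epoch
    # Ran out of epochs (the max_epoch case) without triggering the criterion.
    return len(losses) - 1


# Loss plateaus at 0.4 for several epochs before improving again.
loss_curve = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3]
stopped_at = epochs_until_stop(loss_curve, stop_eps=1e-4, stop_tol=3)
ran_to_end = epochs_until_stop(loss_curve, stop_eps=1e-4, stop_tol=10)
```

With a small stop_tol the plateau ends training early; with the larger tolerance the run continues to the final epoch.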
- forward(adj_list, x, masked_x, regional_adj_lists, regional_xs, get_details=False, explained_variance_mask=None, chosen_head=None)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- get_bias() array
- get_ego_transform() array
Get gene attention matrix
- Parameters:
separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.
- Returns:
Gene attention vectors
- get_global_transform() array
Get gene attention matrix
- Parameters:
separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.
- Returns:
Gene attention matrix
- get_local_transform() array
Get gene attention matrix
- Parameters:
separate_q_k – If True, return Q and K. Otherwise, return Q.T @ K.
- Returns:
Gene attention vectors
- get_top_features(top_k=5)
- masking(x: Tensor, xs, entry_masking_rate: float, feature_masking_rate: float)
Masking the dataset
- Parameters:
x – input data
mask_rate – masking rate
masking_method – full matrix or feature-wise masking
- Returns:
masked data
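Entry masking and feature masking can be illustrated on a small matrix. This is a sketch under stated assumptions: masked values are set to zero, feature masking drops whole columns, and the helper mask_matrix with its seed argument is hypothetical. How Steamboat represents masked values internally is not documented here.

```python
import random


def mask_matrix(x, entry_masking_rate, feature_masking_rate, seed=0):
    """Zero out random entries and random whole columns of a list-of-lists matrix."""
    rng = random.Random(seed)
    n_features = len(x[0])
    # Feature masking: each column is dropped with probability feature_masking_rate.
    dropped = {j for j in range(n_features) if rng.random() < feature_masking_rate}
    masked = []
    for row in x:
        # Entry masking: each remaining entry is zeroed independently.
        masked.append([
            0.0 if (j in dropped or rng.random() < entry_masking_rate) else v
            for j, v in enumerate(row)
        ])
    return masked, dropped


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
unmasked, _ = mask_matrix(x, entry_masking_rate=0.0, feature_masking_rate=0.0)
all_masked, dropped_cols = mask_matrix(x, entry_masking_rate=0.0, feature_masking_rate=1.0)
```

Because the two kinds of masking are drawn independently, an entry can be hit by both, which matches the note that feature masking "can overlap with entry masking".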
- score_cells(x)
- score_global(x, x_bar=None)
- score_local(x, adj_matrix)
- transform(x, adj_matrix, get_details=True, explained_variance_mask=None)
steamboat.utils module
- steamboat.utils.set_random_seed(seed: int) None
Reset seed for Numpy and PyTorch
- Parameters:
seed – Random seed
steamboat.tools module
- steamboat.tools.calc_adjacency_freq(adatas, sample_key: str, cell_type_key: str, pseudocount: float = 20.0)
Calculate baseline interaction matrix determined by adjacency frequency
- Parameters:
adatas – all adatas
sample_key – obs key for sample names
cell_type_key – obs key for cell types
- Returns:
adjacency frequency matrices (one per sample) in a dictionary
- steamboat.tools.calc_geneset_auroc(metagenes, genesets)
Gene set enrichment analysis by AUROC
- Parameters:
metagenes – metagenes from calc_var(model)
genesets – genesets in a dictionary
- Returns:
AUROC in a DataFrame
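For readers unfamiliar with AUROC-based enrichment, the quantity can be computed as the probability that a randomly chosen gene inside the gene set scores higher than one outside it. The sketch below shows that definition in plain Python; the function auroc and its inputs are illustrative assumptions and do not reproduce the inputs or outputs of calc_geneset_auroc.

```python
def auroc(scores, in_set):
    """AUROC of scores for separating gene-set members from non-members."""
    pos = [s for s, member in zip(scores, in_set) if member]
    neg = [s for s, member in zip(scores, in_set) if not member]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical metagene weights for four genes; the first two belong to the set.
weights = [0.9, 0.8, 0.1, 0.2]
membership = [True, True, False, False]
enriched = auroc(weights, membership)
depleted = auroc(weights, [False, False, True, True])
```

A value near 1 means the gene set is concentrated among the highest-weighted genes, 0.5 means no enrichment, and a value near 0 means depletion.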
- steamboat.tools.calc_geneset_auroc_order(sig_df, by='q')
Order the metagenes by AUROC
- Parameters:
sig_df – Analysis results
by – by which metagene, defaults to ‘q’
- Returns:
ordering of the metagenes
- steamboat.tools.calc_head_weights(adatas, model: Steamboat)
Calculate weights of heads and scales within each head
- Parameters:
adatas – all adatas
model – the trained Steamboat model
- Returns:
weights
- steamboat.tools.calc_head_weights_quantile(adatas, model: Steamboat, quantile: float = 0.9)
Calculate weights of heads and scales within each head
- Parameters:
adatas – all adatas
model – the trained Steamboat model
- Returns:
weights
- steamboat.tools.calc_interaction(adatas, model: Steamboat, sample_key: str, cell_type_key: str, pseudocount: float = 20.0)
Calculate interaction matrix
- Parameters:
adatas – all adatas
model – Steamboat model
sample_key – obs key for sample names
cell_type_key – obs key for cell types
pseudocount – pseudocount in denominator when averaging scores in cell type pairs, defaults to 20.
- Returns:
interaction matrices (one per sample) in a dictionary
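The role of a pseudocount in the denominator is to pull averages toward zero when only a few cell pairs support them. The sketch below shows that effect in isolation; shrunken_mean is a hypothetical helper, and the formula actually used by calc_interaction may differ in detail.

```python
def shrunken_mean(scores, pseudocount=20.0):
    """Average scores with a pseudocount added to the denominator."""
    return sum(scores) / (len(scores) + pseudocount)


# A cell type pair observed only twice versus one observed 2000 times,
# both with an attention score of 1.0 on every edge.
rare_pair = shrunken_mean([1.0] * 2)
common_pair = shrunken_mean([1.0] * 2000)
```

The rarely observed pair is strongly damped while the frequently observed pair stays close to its plain mean, which keeps sparse cell type pairs from dominating the interaction matrix.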
- steamboat.tools.calc_obs(adatas: list[AnnData], dataset: SteamboatDataset, model: Steamboat, device='cuda', get_recon: bool = False)
Calculate and store the embeddings and attention scores in the AnnData objects
- Parameters:
adatas – List of AnnData objects to store the embeddings and attention scores
dataset – SteamboatDataset object to be processed
model – Steamboat model
device – Device to run the model, defaults to ‘cuda’
get_recon – Whether to store the reconstructed data, defaults to False
- steamboat.tools.calc_v_weights(model: Steamboat, normalize: bool = True)
Calculate weight of reconstruction (w_v) metagene
- Parameters:
model – Steamboat model
normalize – whether normalize the sum to 1, defaults to True
- Returns:
weights
- steamboat.tools.calc_var(model: Steamboat)
Write metagenes into a DataFrame
- Parameters:
model – Steamboat model
- Returns:
DataFrame of metagenes
- steamboat.tools.contribution_by_scale(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')
Calculate explained variance for each scale
- Parameters:
model – Steamboat model
dataset – SteamboatDataset object to be processed
adatas – list of AnnData objects corresponding to the dataset
device – Device to run the model, defaults to ‘cuda’
- Returns:
explained variance scores in a dictionary
- steamboat.tools.contribution_by_scale_and_head(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')
Calculate explained variance for each scale and head
- Parameters:
model – Steamboat model
dataset – SteamboatDataset object to be processed
adatas – list of AnnData objects corresponding to the dataset
device – Device to run the model, defaults to ‘cuda’
- Returns:
explained variance scores in a dictionary
- steamboat.tools.explained_variance_by_scale(model: Steamboat, dataset: SteamboatDataset, adatas: list[AnnData], device='cuda')
Calculate explained variance for each scale
- Parameters:
model – Steamboat model
dataset – SteamboatDataset object to be processed
adatas – list of AnnData objects corresponding to the dataset
device – Device to run the model, defaults to ‘cuda’
- Returns:
explained variance scores in a dictionary
- steamboat.tools.find_top_celltypes(adata, obs_key: str, top: int = 3, return_raw: bool = True)
Find top contributing cell types for each metagene by mean scores
- Parameters:
adata – AnnData object with cell type information
obs_key – obs key for cell types
top – top cell types per metagene to find, defaults to 3
- Returns:
list of lists of top cell types per metagene
- steamboat.tools.find_top_lrs(var: DataFrame, lr: DataFrame, top: int = 3, return_raw: bool = True)
Find top contributing ligand-receptor pairs for each metagene
- Parameters:
var – var from calc_var(model)
top – top genes per metagene to find, defaults to 3
- Returns:
list of lists of top genes per metagene
- steamboat.tools.gather_obs(adata: AnnData, adatas: list[AnnData])
Gather obs/obsm/uns from a list of AnnData objects to a single AnnData object
- Parameters:
adata – AnnData object to store the gathered obs/obsm/uns
adatas – List of AnnData objects to be gathered
- steamboat.tools.leiden(adata: AnnData, resolution: float = 1.0, *, obsp='steamboat_emb_connectivities', key_added='steamboat_clusters', leiden_kwargs: dict = None)
A thin wrapper for scanpy.tl.leiden to cluster for cell types (for spatial domain segmentation, use segment).
- Parameters:
adata – AnnData object to be processed
resolution – resolution for Leiden clustering, defaults to 1.
obsp – obsp key to be used, defaults to ‘steamboat_emb_connectivities’
key_added – obs key to be added for resulting clusters, defaults to ‘steamboat_clusters’
leiden_kwargs – Other parameters for scanpy.tl.leiden if desired, defaults to None
- Returns:
hands over what scanpy.tl.leiden returns
- steamboat.tools.neighbors(adata: AnnData, use_rep: str = 'attn', key_added: str = 'steamboat_emb', metric='cosine', neighbors_kwargs: dict = None)
A thin wrapper for scanpy.pp.neighbors for Steamboat functionalities
- Parameters:
adata – AnnData object to be processed
use_rep – embedding to be used, defaults to ‘attn’
key_added – key in obsp to store the resulting similarity graph, defaults to ‘steamboat_emb’
metric – metric for similarity graph, defaults to ‘cosine’
neighbors_kwargs – Other parameters for scanpy.pp.neighbors if desired, defaults to None
- Returns:
hands over what scanpy.pp.neighbors returns
- steamboat.tools.plot_all_transforms(model: Steamboat, top: int = 3, head_order=None, figsize: str | tuple[float, float] = 'auto', chosen_features: List[str] = None)
Plot all metagenes for scale == 3
- Parameters:
model – Steamboat model
top – top genes per metagene to plot, defaults to 3
head_order – order of heads in a list, which can be some or all the heads, defaults to None
figsize – (width, height), defaults to ‘auto’
chosen_features – selected features to plot, defaults to None
- steamboat.tools.plot_all_transforms2(model, top: int = 3, reorder: bool = False, figsize: str | tuple[float, float] = 'auto', vmin: float = 0.0, vmax: float = 1.0, xticklabels: tuple[str, str, str] = ('environment', 'center cell', 'reconstruction'))
Plot all metagenes
- Parameters:
model – Steamboat model
top – Number of top genes per metagene to plot, defaults to 3
reorder – Reorder the genes by metagene, or keep the original ordering, defaults to False
figsize – Size of the figure, defaults to ‘auto’
vmin – minimum value in the color bar, defaults to 0.
vmax – maximum value in the color bar, defaults to 1.
- steamboat.tools.plot_cell_type_enrichment(all_adata, adatas, score_dim, label_key, select_labels=None, figsize=(0.75, 4))
- steamboat.tools.plot_geneset_auroc(sig_df, order, figsize=(8, 5))
Plot gene set enrichment by AUROC
- Parameters:
sig_df – Analysis results
order – order of heads
figsize – (width, height), defaults to (8, 5)
- Returns:
fig, ax
- steamboat.tools.plot_head_weights(head_weights, multiplier: float = 100, order=None, figsize=(7, 0.8), heatmap_kwargs=None, save: str = None)
Plot head weights calculated by calc_head_weights
- Parameters:
head_weights – head weights calculated by calc_head_weights
multiplier – 100 for percentage, 1000 for per mille, etc., defaults to 100
order – ordering of heads, defaults to None
figsize – (width, height), defaults to (7, 0.8)
heatmap_kwargs – additional arguments for heatmap plotting, defaults to None
save – save to file, defaults to None
- steamboat.tools.plot_vq(model: Steamboat, chosen_features: List[str], figsize=(3, 3))
Plot the reconstruction metagenes (w_q) only
- Parameters:
model – Steamboat model
chosen_features – chosen genes to plot
figsize – (width, height), defaults to (3, 3)
- Returns:
fig, ax
- steamboat.tools.plot_wq(model: Steamboat, chosen_features: List[str], figsize=(3, 3))
Plot the reconstruction metagenes (w_q) only
- Parameters:
model – Steamboat model
chosen_features – chosen genes to plot
figsize – (width, height), defaults to (3, 3)
- Returns:
fig, ax
- steamboat.tools.rank(x, axis=1)
Rank number
- Parameters:
x – numpy array of numbers
axis – perform over which axis, defaults to 1
- Returns:
ranks
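Ranking along axis=1 means each row is ranked independently. The sketch below shows this with nested lists; rank_rows is a hypothetical helper, and the zero-based ranks and the handling of ties (broken by position) are assumptions, since this reference does not specify them.

```python
def rank_rows(x):
    """Rank the values within each row; the smallest value gets rank 0."""
    ranked = []
    for row in x:
        # Indices of the row sorted by value (stable, so ties keep their order).
        order = sorted(range(len(row)), key=lambda j: row[j])
        ranks = [0] * len(row)
        for r, j in enumerate(order):
            ranks[j] = r
        ranked.append(ranks)
    return ranked


row_ranks = rank_rows([[0.3, 0.1, 0.2], [5, 7, 6]])
```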
- steamboat.tools.read_lrdb(species: str = 'human') DataFrame
Read ligand-receptor database
- Parameters:
species – Species, either ‘human’ or ‘mouse’, defaults to ‘human’
- Returns:
DataFrame of ligand-receptor pairs
- steamboat.tools.score_lrs(adata, model, lrdb, species='human', gene_names='index')
Calculate ligand-receptor scores based on Steamboat model
- Parameters:
adata – AnnData object containing the data
model – Steamboat model
lrdb – Ligand-receptor database
species – Species, either ‘human’ or ‘mouse’, defaults to ‘human’
gene_names – Gene names to use, either ‘index’ or a column name in adata.var
- Returns:
List of DataFrames containing ligand-receptor scores for each head
- steamboat.tools.segment(adata: AnnData, resolution: float = 1.0, *, key_added: str = 'steamboat_spatial_domain', key_added_pairwise: str = 'pairwise', key_added_similarity: str = 'similarity', key_added_combined: str = 'combined', n_prop: int = 3, spatial_graph_threshold: float = 0.0, leiden_kwargs: dict = None)
Spatial domain segmentation using Steamboat embeddings and graphs
- Parameters:
adata – AnnData object to be processed
resolution – resolution for Leiden clustering, defaults to 1.
key_added – obs key for segmentation result, defaults to ‘steamboat_spatial_domain’
key_added_pairwise – obsp key for pairwise cell-cell attention graph, defaults to ‘pairwise’
key_added_similarity – obsp key for per-cell attention k-NN similarity graph, defaults to ‘similarity’
key_added_combined – obsp key for combined pairwise and similarity graphs, defaults to ‘combined’
n_prop – power (numbers of propagation) for the pairwise graph, defaults to 3
spatial_graph_threshold – threshold to include/exclude an edge, a larger number will make the program run faster but potentially less accurate, defaults to 0.0
leiden_kwargs – Other parameters for scanpy.tl.leiden if desired, defaults to None
- Returns:
hands over what scanpy.tl.leiden returns
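The n_prop parameter is described as the power of the pairwise graph. Read as a matrix power, raising an adjacency matrix to the n-th power connects cells that are up to n steps apart. The sketch below illustrates that reading on a four-cell chain; matmul and propagate are hypothetical helpers, and any weighting or normalization that segment applies is not covered.

```python
def matmul(a, b):
    """Multiply two square list-of-lists matrices."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def propagate(adj, n_prop=3):
    """Return adj raised to the n_prop-th power (n_prop >= 1)."""
    result = adj
    for _ in range(n_prop - 1):
        result = matmul(result, adj)
    return result


# Chain of four cells: 0 - 1 - 2 - 3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
two_steps = propagate(chain, n_prop=2)
three_steps = propagate(chain, n_prop=3)
```

Cells 0 and 3 are not connected directly or after two steps, but the third power links them, which shows how a larger n_prop spreads pairwise attention over a wider neighborhood.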