Spatial domains labeling
novae.label_domains(adata, obs_key=None, tissue='unknown', species=None, n_genes=15, cell_type_key=None, pathways=None, spatial_context=None, provider='openai', model='gpt-4.1', api_key=None, max_tokens=1024, seed=None, return_prompt=False)
While the model.assign_domains function provides domain IDs, this function gives biologically meaningful labels (names) to those Novae spatial domain IDs.
Internally, it prompts an LLM with descriptive information about each domain: differentially expressed genes (DEGs), domain sizes, and, optionally, pathway enrichment scores and cell-type proportions.
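To make this kind of input concrete, here is a minimal sketch that summarizes each domain by its size and its top marker genes, starting from a cells-by-genes expression table and a per-cell domain assignment. The helper summarize_domains is hypothetical, and ranking genes by a one-vs-rest difference of mean expression is a simplified stand-in for a proper DEG test. It is not Novae's implementation.

```python
import pandas as pd


def summarize_domains(expr: pd.DataFrame, domains: pd.Series, n_genes: int = 15) -> dict:
    """Hypothetical helper: size and top marker genes for each domain.

    expr is a cells-by-genes table; domains holds one domain ID per cell,
    in the same row order as expr.
    """
    summary = {}
    for domain in sorted(domains.unique()):
        in_domain = (domains == domain).to_numpy()
        # One-vs-rest score: mean expression inside the domain minus elsewhere
        diff = expr.loc[in_domain].mean(axis=0) - expr.loc[~in_domain].mean(axis=0)
        top_genes = diff.sort_values(ascending=False).index[:n_genes].tolist()
        summary[domain] = {"n_cells": int(in_domain.sum()), "marker_genes": top_genes}
    return summary
```

In practice, a dedicated test such as a Wilcoxon rank-sum comparison would give more robust markers than a raw mean difference; the sketch only shows the shape of the information (size plus n_genes markers per domain).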
API key
An API key is required to use this function. You can either pass it directly as the api_key argument, or set it as an environment variable (OPENAI_API_KEY for OpenAI, ANTHROPIC_API_KEY for Anthropic).
If you just want to generate the prompt without making an API call, set return_prompt=True and no API key will be required. You can then copy/paste the generated messages and output_schema into your preferred LLM playground.
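The key lookup described above can be sketched as follows. The helper resolve_api_key, the lowercase provider strings used as dictionary keys, and the rule that an explicit argument takes precedence over the environment are all assumptions made for illustration, not Novae's documented behavior.

```python
import os
from typing import Optional

# Assumed provider-to-variable mapping, based on the variable names listed above
ENV_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_api_key(
    provider: str, api_key: Optional[str] = None, return_prompt: bool = False
) -> Optional[str]:
    """Hypothetical helper: pick the API key to use, or None if only the prompt is built."""
    if return_prompt:
        return None  # no API call is made, so no key is needed
    if api_key is not None:
        return api_key  # assumption: an explicit argument wins over the environment
    if provider not in ENV_VARS:
        raise ValueError(f"Unknown provider: {provider!r}")
    key = os.environ.get(ENV_VARS[provider])
    if key is None:
        raise ValueError(f"No API key found: pass api_key or set {ENV_VARS[provider]}")
    return key
```

With the real function, a call such as novae.label_domains(adata, return_prompt=True) corresponds to the first branch: the prompt is built and no key is looked up.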
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | An | required |
| obs_key | str \| None | Key in | None |
| tissue | str | Tissue name (for example, | 'unknown' |
| species | str \| None | Species name (for example, | None |
| n_genes | int | Number of marker genes per domain passed to the LLM prompt. | 15 |
| cell_type_key | str \| None | Optional key in | None |
| pathways | dict[str, list[str]] \| str \| None | Either a dictionary of pathways (keys are pathway names, values are lists of gene names), or a path to a GSEA JSON file. When provided, pathway enrichment scores per domain are added to the LLM input. | None |
| spatial_context | str \| None | Optional extra biological or spatial context to include in the prompt. | None |
| provider | str | LLM provider to use. Either | 'openai' |
| model | str | OpenAI model name used for labeling. | 'gpt-4.1' |
| api_key | str \| None | OpenAI API key. If | None |
| max_tokens | int | Maximum number of tokens the model is allowed to generate for the labeling response (only for anthropic). | 1024 |
| seed | int \| None | Optional random seed passed to the labeling utility. | None |
| return_prompt | bool | If | False |
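The pathways argument accepts a dictionary mapping pathway names to gene lists. The sketch below shows that dictionary form together with one simple way per-domain pathway scores could be computed: the mean expression of each pathway's measured genes per cell, averaged over the cells of each domain. The helper pathway_scores and this scoring scheme are assumptions for illustration; Novae's own enrichment scoring may differ, and the pathway names and gene lists are illustrative only.

```python
import pandas as pd

# Dictionary form of the pathways argument: pathway name -> list of gene names
# (illustrative gene sets, not curated pathways)
pathways = {
    "Hypoxia": ["VEGFA", "SLC2A1"],
    "Proliferation": ["MKI67", "TOP2A"],
}


def pathway_scores(expr: pd.DataFrame, domains: pd.Series, pathways: dict) -> pd.DataFrame:
    """Hypothetical helper: one row per domain, one column per pathway.

    Genes missing from expr are ignored; a pathway with no measured genes is skipped.
    """
    scores = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in expr.columns]
        if present:
            # Per-cell score: mean over the pathway genes that were measured
            scores[name] = expr[present].mean(axis=1)
    per_cell = pd.DataFrame(scores, index=expr.index)
    # Average the per-cell scores within each domain
    return per_cell.groupby(domains.to_numpy()).mean()
```

Averaging raw expression is the simplest choice; scores that subtract a background gene set are a common refinement when pathway sizes differ.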
Returns:
| Type | Description |
|---|---|
| DataFrame \| dict[str, dict[str, Any]] | A |