Using ChatMDV

Overview

ChatMDV is an optional extension to MDV that allows users to interact with their data via natural language commands. Powered by large language models (LLMs), it bridges the gap between intuitive human queries and MDV’s charting interface, automatically generating views, filters, and visualizations from plain language instructions.

ChatMDV enhances accessibility by:

Eliminating the need for manual chart configuration
Reducing repetitive workflows
Helping users explore datasets using intuitive language
Making advanced analysis accessible to non-programmers
Accelerating hypothesis generation

Accessing ChatMDV

Launch an MDV project via the web interface or your local deployment.
Locate the chat icon in the top-right panel (labelled Chat) and click to open the ChatMDV window.
Enter natural language queries like:
- “Show a UMAP scatter plot colored by cluster.”
- “Filter for cells expressing high levels of gene X and display a violin plot.”
ChatMDV interprets your request, runs the required operations, and updates the view.

Setup Requirements

OpenAI API Key: ChatMDV requires an active OpenAI API key.
- Without it, the ChatMDV icon will not be visible.
Obtain a key from the OpenAI Platform and follow the setup instructions in the Installation manual.
Be aware that API usage may incur costs depending on your plan.
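
Before launching MDV, it can help to confirm that a key is actually visible to the server process. The sketch below assumes the key is supplied through an environment variable named OPENAI_API_KEY, which is the convention used by the OpenAI Python client; the Installation manual remains the authority on how your MDV deployment expects the key to be provided. The function name is hypothetical and is not part of MDV.

```python
import os
from typing import Mapping

# Assumption: the key is read from this environment variable (the OpenAI
# Python client's convention). Check the MDV Installation manual for the
# mechanism your deployment actually uses.
KEY_VARIABLE = "OPENAI_API_KEY"


def openai_key_is_set(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if a non-empty key is present in the given environment."""
    return bool(env.get(KEY_VARIABLE, "").strip())


if __name__ == "__main__":
    if openai_key_is_set():
        print("Key found in the environment.")
    else:
        print("No key found: the ChatMDV icon will not be visible.")
```

This only checks that a value is present; it does not verify that the key is valid or that the account has available credit.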

Common Commands & Examples

Command Example	Action Triggered
“Scatter plot of UMAP colored by treatment”	Adds a scatter plot of the UMAP dimensions with points colored by treatment group
“Create a heatmap of genes A, B and C”	Generates a heatmap showing the expression of genes A, B and C across a categorical variable
“Filter dataset to only include B cells, then plot a violin plot of gene Y”	Filters to B cells and generates the violin plot for gene Y

Best Practices

Be specific: state the chart type, the metadata fields, and whether the dataset should be filtered first.
Use exact column or metadata names as they appear in your dataset.
For complex workflows, break requests into smaller steps.
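
Because exact names matter, it can be worth checking the names you intend to type against the dataset's real columns before writing a request. The following is a minimal sketch using only the Python standard library; the helper name and the example column names are made up for illustration, and in practice the column list would come from your own dataset (for an AnnData object, for instance, from `adata.obs.columns`).

```python
from difflib import get_close_matches
from typing import Dict, Iterable, List


def suggest_exact_names(wanted: Iterable[str], columns: List[str]) -> Dict[str, List[str]]:
    """Map each name that is not an exact column match to its closest columns."""
    suggestions = {}
    for name in wanted:
        if name not in columns:
            # Fuzzy lookup: returns up to three similar column names.
            suggestions[name] = get_close_matches(name, columns, n=3, cutoff=0.6)
    return suggestions


# Made-up column names, for illustration only.
columns = ["cell_type", "treatment", "leiden"]
print(suggest_exact_names(["Treatment", "leiden"], columns))
```

Names that already match exactly are left out of the result, so an empty dictionary means every name can be used in a request as written.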

Limitations

Large datasets + analysis requests:
ChatMDV is primarily designed to generate visualisations, but you may ask it to run bioinformatics analysis first (e.g. differential expression using scanpy commands).
- On large datasets, these computations can exceed the memory and CPU limits of the default server container, or the memory of the Docker container if you are running a local instance.
- If you plan to request both heavy analysis + visualisation through ChatMDV, you may need to allocate a larger container to avoid failures or timeouts.
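
To judge whether a larger container might be needed, a back-of-the-envelope estimate of the expression matrix size can be useful. The sketch below assumes a dense matrix with 4 bytes per value (float32); the function name is hypothetical. Sparse matrices are usually much smaller, while analysis steps often create temporary copies, so treat the result as a rough guide rather than a limit to configure.

```python
def dense_matrix_gib(n_cells: int, n_genes: int, bytes_per_value: int = 4) -> float:
    """Approximate size in GiB of a dense n_cells x n_genes matrix."""
    return n_cells * n_genes * bytes_per_value / 2**30


# Example: 100,000 cells x 20,000 genes stored as float32.
print(f"{dense_matrix_gib(100_000, 20_000):.2f} GiB")  # prints "7.45 GiB"
```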
Scanpy data loading and analysis through ChatMDV:
By default, ChatMDV loads AnnData objects using backed='r'. This is to limit the computational resources required for loading the data if further bioinformatics analysis is not required.
- This is fine for visualisation workflows, which is ChatMDV’s primary purpose. The data can be efficiently accessed from disk for generating plots without consuming large amounts of memory.
- However, if you instruct ChatMDV to perform bioinformatics analysis first (e.g. differential expression, clustering, dimensionality reduction, trajectory inference), you should explicitly say “Do not use backed='r'”.
  - backed='r' keeps data on disk and prevents in-memory operations required for these analyses (e.g. filtering, subsetting, matrix computations).
  - Standard (in-memory) loading is required for these workflows to run successfully.
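
The difference between the two loading modes can be sketched as follows. This is an illustration of the rule described above, not ChatMDV's actual implementation: the keyword list and both function names are hypothetical, and ChatMDV decides how to load data through the code the LLM generates, which is why the explicit instruction in your request matters. The only library call used is `anndata.read_h5ad`, whose `backed` parameter accepts `'r'` for on-disk access or `None` for standard in-memory loading.

```python
from typing import Optional

# Hypothetical keyword list, for illustration only.
ANALYSIS_KEYWORDS = (
    "differential expression",
    "clustering",
    "dimensionality reduction",
    "trajectory inference",
)


def choose_backed_mode(request: str) -> Optional[str]:
    """Return 'r' for visualisation-only requests, None when analysis is requested."""
    text = request.lower()
    if any(keyword in text for keyword in ANALYSIS_KEYWORDS):
        return None  # load fully into memory so in-memory operations work
    return "r"  # keep data on disk; sufficient for generating plots


def load_adata(path: str, request: str):
    """Load an .h5ad file in the mode suited to the request."""
    import anndata as ad  # imported lazily so choose_backed_mode has no dependencies

    return ad.read_h5ad(path, backed=choose_backed_mode(request))
```

For example, `choose_backed_mode("Show a UMAP scatter plot colored by cluster")` selects the on-disk mode, while a request that mentions differential expression selects in-memory loading.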

When to Use

Use ChatMDV when:

You’re exploring your data and want rapid prototyping.
You prefer a language-first interface and need quick visual outputs.
You need to produce the same kind of chart for many different fields and want to automate the repetition.

Use manual configuration when:

You require precise control over chart layout, formatting, or advanced customization.
