Using ChatMDV
Overview
ChatMDV is an optional extension to MDV that allows users to interact with their data via natural language commands. Powered by large language models (LLMs), it bridges the gap between intuitive human queries and MDV’s charting interface, automatically generating views, filters, and visualizations from plain language instructions.
ChatMDV makes MDV more accessible and efficient by:
- Eliminating the need for manual chart configuration
- Reducing repetitive workflows
- Helping users explore datasets using intuitive language
- Making advanced analysis accessible to non-programmers
- Accelerating hypothesis generation
Accessing ChatMDV
- Open an MDV project through the web interface or your local deployment.
- Click the chat icon (labelled Chat) in the top-right panel to open the ChatMDV window.
- Enter natural language queries, for example:
  - “Show a UMAP scatter plot colored by cluster.”
  - “Filter for cells expressing high levels of gene X and display a violin plot.”
- ChatMDV interprets your request, runs the required operations, and updates the view.
Setup Requirements
- OpenAI API key: ChatMDV requires an active OpenAI API key.
  - Without it, the ChatMDV icon will not be visible.
- Obtain a key from the OpenAI Platform and follow the setup instructions in the Installation manual.
- Be aware that API usage may incur costs depending on your OpenAI plan.
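The Installation manual is the authoritative reference for how MDV is given the key. As an assumption only: if your deployment reads it from the `OPENAI_API_KEY` environment variable (the default used by the official OpenAI Python SDK), a quick check before starting MDV could look like the sketch below. The function `openai_key_status` is hypothetical and not part of MDV.

```python
import os
from typing import Mapping, Optional


def openai_key_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether an OpenAI API key appears to be configured.

    Hypothetical helper. Assumes the key is supplied via the
    OPENAI_API_KEY environment variable; check the MDV Installation
    manual for how your deployment actually reads it.
    """
    if env is None:
        env = os.environ
    # Treat an unset or whitespace-only value as missing
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "missing: ChatMDV will not be available (no Chat icon)"
    return "present"


if __name__ == "__main__":
    print("OPENAI_API_KEY:", openai_key_status())
```

This only checks that a value is set; it does not validate the key against OpenAI or check your usage limits.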
Common Commands & Examples
| Command example | Action triggered |
|---|---|
| “Scatter plot of UMAP colored by treatment” | Adds a scatter plot of the UMAP dimensions with points colored by treatment group |
| “Create a heatmap of genes A, B and C” | Generates a heatmap showing the expression of genes A, B and C across a categorical variable |
| “Filter dataset to only include B cells, then plot a violin plot of gene Y” | Filters to B cells and generates the violin plot for gene Y |
Best Practices
- Be specific: include the chart type, the metadata fields to use, and any filters to apply to the dataset.
- Use exact column or metadata names as they appear in your dataset.
- For complex workflows, break requests into smaller steps.
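The helper below is a hypothetical sketch, not part of MDV or ChatMDV. It composes an explicit request (chart type, fields, optional filter) and checks that every field name exactly matches a name in your dataset, suggesting near matches with Python's standard `difflib` module. You would still paste the resulting text into the Chat window; the list of available names (for example `adata.obs.columns` if you work with AnnData) is an input you supply.

```python
from difflib import get_close_matches
from typing import Iterable, Optional, Sequence


def _join_names(names: Sequence[str]) -> str:
    # ["A"] -> "A"; ["A", "B", "C"] -> "A, B and C"
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def build_request(
    chart: str,
    fields: Sequence[str],
    available: Iterable[str],
    filter_text: Optional[str] = None,
) -> str:
    """Compose an explicit ChatMDV request (hypothetical helper).

    Raises ValueError if a field is not an exact name from `available`,
    listing the closest existing names as suggestions.
    """
    if not fields:
        raise ValueError("at least one field is required")
    columns = list(available)
    for name in fields:
        if name not in columns:
            hints = get_close_matches(name, columns, n=3)
            raise ValueError(f"unknown field {name!r}; closest names: {hints}")
    request = f"create a {chart} of {_join_names(fields)}"
    if filter_text:
        request = f"filter the dataset to {filter_text}, then {request}"
    # Capitalise the first letter and end with a full stop
    return request[0].upper() + request[1:] + "."


if __name__ == "__main__":
    columns = ["cell_type", "treatment", "gene_Y"]
    print(build_request("violin plot", ["gene_Y"], columns, "only include B cells"))
```

The same helper can produce one request per field when you need identical charts for several fields, e.g. `[build_request("violin plot", [g], columns) for g in genes]`, which you can then submit one at a time.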
Limitations
- Large datasets and analysis requests:
  - ChatMDV is primarily designed to generate visualisations, but you can also ask it to run bioinformatics analysis first (e.g. differential expression using scanpy).
  - On large datasets, these computations can exceed the memory and CPU limits of the default server container, or the memory available to the Docker container if you are running a local instance.
  - If you plan to request both heavy analysis and visualisation through ChatMDV, you may need to allocate a larger container to avoid failures or timeouts.
- Scanpy data loading and analysis through ChatMDV:
  - By default, ChatMDV loads AnnData objects using `backed='r'`. This limits the computational resources needed to load the data when no further bioinformatics analysis is required.
  - This is fine for visualisation workflows, which are ChatMDV’s primary purpose: plots can be generated by reading the data efficiently from disk without consuming large amounts of memory.
  - However, if you instruct ChatMDV to perform bioinformatics analysis first (e.g. differential expression, clustering, dimensionality reduction, trajectory inference), you should explicitly say “Do not use `backed='r'`”. `backed='r'` keeps the data on disk in read-only mode, which prevents the in-memory operations these analyses require (e.g. filtering, subsetting, matrix computations).
  - Standard (in-memory) loading is required for these workflows to run successfully.
When to Use
Use ChatMDV when:
- You’re exploring your data and want rapid prototyping.
- You prefer a language-first interface and need quick visual outputs.
- You need to automate producing the same type of graph for several different fields.
Use manual configuration when:
- You require precise control over chart layout, formatting, or advanced customization.