templates

Attributes

`prompt_data`
`packages_functions`

Functions

get_createproject_prompt_RAG(→ str)

Constructs a RAG prompt to guide LLM code generation for creating MDV plots.

Module Contents

templates.prompt_data = Multiline-String

"""
Your task is to:
1. Identify the type of data the user needs (e.g., categorical, numerical, etc.) by inspecting the DataFrames provided.
2. Use only the two DataFrames provided:
   - df1: cells (data_frame_obs)
   - df2: genes (data_frame_var)
3. Column selection logic:
   - For non-gene queries: select columns from df1 only. Inspect df1, using df1.columns
   - For gene-related queries (e.g., expression of a gene, comparison of genes, highest expressing genes):
       a. Use ONLY gene names from df2["name"] — do NOT use gene IDs or any other columns (e.g., df2["gene_ids"]).
       b. If a specific gene is mentioned by the user, check if it exists in df2["name"].
           - If it does not exist, assume the user provided a gene name and that df2["name"] may contain gene IDs instead.
                - Attempt to match the user-provided gene name to the corresponding gene ID using any available mapping logic (e.g., a lookup function or mapping dictionary).
                - If a corresponding gene ID is found in df2["name"], return that value.
           - If it exists, return it.
           - If no match is found, ignore the requested gene and instead select one or more gene names from df2["name"].
       c. If no gene is mentioned, select one or more gene names from df2["name"].
       d. Only use values from df2["name"] — do NOT use any other columns from df2.
4. Always return the list of required columns as a quoted comma-separated string, like:
   - "col1", "col2"
   - Or for gene-related: "col", "gene_name"   (make sure "col" is from df1)
5. For gene-related queries:
   - Return both df1 columns and the selected gene name (from df2["name"]).
   - Only return the name as a string (e.g., "gene_name")—do not wrap it.
6. NEVER create new DataFrames or modify existing ones.
7. Ensure that the selected columns match the visualization requirements:
    - Abundance Box plot: Requires three categorical columns.
      - If only one categorical variable is available, return it three times.
      - If two are available, return one of them twice.
    - Box plot: Requires only one categorical column and one numerical column.
    - Density Scatter plot: Requires two numerical columns and one categorical column.
    - Dot plot: Requires only one categorical column and any number of numerical columns.
    - Heatmap: Requires only one categorical column and any number of numerical columns.
    - Histogram: Requires one numerical column.
    - Multiline chart: Requires one numerical column and one categorical column.
    - Pie Chart: Requires one categorical column.
    - Row Chart: Requires one categorical column.
    - Row summary box: Requires any column(s).
    - Sankey plot: Requires two categorical columns.
      - If only one categorical variable is available, return it twice.
    - Scatter plot (2D): Requires two numerical columns and one any column for color.
    - Scatter plot (3D): Requires three numerical columns and one any column for color.
    - Selection dialog plot: Requires any column.
    - Stacked row chart: Requires two categorical columns.
      - If only one categorical variable is available, return it twice.
    - Table Plot: Requires any column(s).
    - Text box: Requires no columns, just text.
    - Violin plot: Requires only one categorical column and one numerical column.
    - Wordcloud: Requires one categorical column.
8. Important: Clearly separate the selected columns with quotes and commas.
9. The column names are case sensitive therefore return them as they are defined in the dataframe.
10. Output format:
   - First line: The word "fields" followed by a quoted, comma-separated list of column names.
   - Second line: The word "charts" followed by a quoted, comma-separated list of suitable chart types for the selected columns.
11. NEVER explain your reasoning.
"""
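
The output format in rule 10 is meant to be machine-readable, so a caller will usually need to turn the two-line reply back into Python lists. The following is a minimal sketch of such a parser, not part of the `templates` module: the function name `parse_llm_selection` and the sample reply are hypothetical, and it assumes the model followed the format exactly.

```python
import csv


def parse_llm_selection(response: str) -> dict:
    """Parse a two-line 'fields' / 'charts' reply into lists of names.

    Hypothetical helper: assumes each line starts with the keyword and is
    followed by a quoted, comma-separated list, as rule 10 requests.
    """
    result = {"fields": [], "charts": []}
    for line in response.strip().splitlines():
        line = line.strip()
        for key in result:
            if line.lower().startswith(key):
                # Drop the keyword and any separator that follows it.
                rest = line[len(key):].lstrip(" :")
                # csv.reader copes with quoted values that contain commas.
                row = next(csv.reader([rest], skipinitialspace=True), [])
                result[key] = [item.strip() for item in row if item.strip()]
    return result


# Hypothetical reply in the requested format.
reply = 'fields "cell_type", "n_genes"\ncharts "Box plot", "Violin plot"'
selection = parse_llm_selection(reply)
print(selection)
```

Because column names are case sensitive (rule 9), the sketch lowercases only the keyword check and returns the names unchanged.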

templates.packages_functions = Multiline-String

"""import os
import pandas as pd
import scanpy as sc
from mdvtools.mdvproject import MDVProject
from mdvtools.conversions import convert_scanpy_to_mdv
from mdvtools.charts.density_scatter_plot import DensityScatterPlot
from mdvtools.charts.heatmap_plot import HeatmapPlot
from mdvtools.charts.histogram_plot import HistogramPlot
from mdvtools.charts.dot_plot import DotPlot
from mdvtools.charts.box_plot import BoxPlot
from mdvtools.charts.scatter_plot_3D import ScatterPlot3D
from mdvtools.charts.row_chart import RowChart
from mdvtools.charts.scatter_plot import ScatterPlot
from mdvtools.charts.abundance_box_plot import AbundanceBoxPlot
from mdvtools.charts.stacked_row_plot import StackedRowChart
from mdvtools.charts.ring_chart import RingChart
from mdvtools.charts.pie_chart import PieChart
from mdvtools.charts.violin_plot import ViolinPlot
from mdvtools.charts.multi_line_plot import MultiLinePlot
from mdvtools.charts.table_plot import TablePlot
from mdvtools.charts.wordcloud_plot import WordcloudPlot
from mdvtools.charts.text_box_plot import TextBox
from mdvtools.charts.row_summary_box_plot import RowSummaryBox
from mdvtools.charts.selection_dialog_plot import SelectionDialogPlot
from mdvtools.charts.sankey_plot import SankeyPlot

import json
import numpy as np
import sys
"""
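
An import preamble like this is typically prepended to generated code so that the resulting script is self-contained. How the module actually uses `packages_functions` is not shown here, so the following is only a sketch under that assumption: `preamble` is a shortened stand-in for the real string, and `generated_body` and `imported_modules` are hypothetical names. It uses `ast.parse`, which checks syntax and lists imports without executing anything, so `mdvtools` need not be installed to run it.

```python
import ast

# Shortened stand-in for templates.packages_functions (assumption: the real
# value is the full import block documented above).
preamble = """import os
import pandas as pd
from mdvtools.mdvproject import MDVProject
from mdvtools.charts.box_plot import BoxPlot
"""

# Hypothetical LLM-generated code that relies on the preamble's imports.
generated_body = 'project_path = os.path.join("projects", "demo")\n'

script = preamble + "\n" + generated_body


def imported_modules(source: str) -> list:
    """Return the module names imported by `source`, without running it."""
    modules = []
    # ast.parse raises SyntaxError if the combined script is malformed.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return sorted(set(modules))


modules = imported_modules(script)
print(modules)
```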

templates.get_createproject_prompt_RAG(project: mdvtools.mdvproject.MDVProject, path_to_data: str, datasource_name: str, final_answer: str, question: str) → str

Constructs a RAG prompt to guide LLM code generation for creating MDV plots. Handles both standard and gene-related queries.
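
For gene-related queries, rule 3b of `prompt_data` describes a lookup order that the LLM is asked to follow: use the gene if it appears in `df2["name"]`, otherwise try to map it to an ID that does, otherwise fall back to a gene that is present. The sketch below restates that order in plain Python purely as an illustration. It is not how `get_createproject_prompt_RAG` is implemented; `resolve_gene`, `name_to_id`, and the sample identifiers are hypothetical, and picking the first available name is just one way to "select one or more gene names".

```python
def resolve_gene(requested, names, name_to_id=None):
    """Illustrate the lookup order of rule 3b (hypothetical helper).

    requested  -- gene mentioned by the user, or None if none was mentioned
    names      -- values of df2["name"], passed here as a plain list
    name_to_id -- optional mapping from gene symbols to IDs (assumed to exist)
    """
    name_to_id = name_to_id or {}
    available = list(names)
    if requested is not None:
        # Direct hit: the requested gene is already a value of df2["name"].
        if requested in available:
            return requested
        # df2["name"] may hold IDs instead of symbols: try the mapping.
        mapped = name_to_id.get(requested)
        if mapped in available:
            return mapped
    # No gene mentioned, or no match found: fall back to an available name.
    return available[0] if available else None


# Hypothetical data: df2["name"] holds IDs rather than gene symbols.
names = ["ENSG01", "ENSG02"]
mapping = {"CD3E": "ENSG02"}

print(resolve_gene("CD3E", names, mapping))  # mapped to an ID in names
print(resolve_gene("ENSG01", names))         # direct match
print(resolve_gene("XYZ", names))            # no match, falls back
```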