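templates
Attributes
- prompt_data
- packages_functions
Functions
- get_createproject_prompt_RAG(project, path_to_data, datasource_name, final_answer, question): Constructs a RAG prompt to guide LLM code generation for creating MDV plots.
Module Contents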
- templates.prompt_data = Multiline-String
"""
Your task is to:
1. Identify the type of data the user needs (e.g., categorical, numerical, etc.) by inspecting the DataFrames provided.
2. Use only the two DataFrames provided:
   - df1: cells (data_frame_obs)
   - df2: genes (data_frame_var)
3. Column selection logic:
   - For non-gene queries: select columns from df1 only. Inspect df1, using df1.columns
   - For gene-related queries (e.g., expression of a gene, comparison of genes, highest expressing genes):
     a. Use ONLY gene names from df2["name"] — do NOT use gene IDs or any other columns (e.g., df2["gene_ids"]).
     b. If a specific gene is mentioned by the user, check if it exists in df2["name"].
        - If it does not exist, assume the user provided a gene name and that df2["name"] may contain gene IDs instead.
        - Attempt to match the user-provided gene name to the corresponding gene ID using any available mapping logic (e.g., a lookup function or mapping dictionary).
        - If a corresponding gene ID is found in df2["name"], return that value.
        - If it exists, return it.
        - If no match is found, ignore the requested gene and instead select one or more gene names from df2["name"].
     c. If no gene is mentioned, select one or more gene names from df2["name"].
     d. Only use values from df2["name"] — do NOT use any other columns from df2.
4. Always return the list of required columns as a quoted comma-separated string, like:
   - "col1", "col2"
   - Or for gene-related: "col", "gene_name" (make sure "col" is from df1)
5. For gene-related queries:
   - Return both df1 columns and the selected gene name (from df2["name"]).
   - Only return the name as a string (e.g., "gene_name")—do not wrap it.
6. NEVER create new DataFrames or modify existing ones.
7. Ensure that the selected columns match the visualization requirements:
   - Abundance Box plot: Requires three categorical columns.
     - If only one categorical variable is available, return it three times.
     - If two are available, return one of them twice.
   - Box plot: Requires only one categorical column and one numerical column.
   - Density Scatter plot: Requires two numerical columns and one categorical column.
   - Dot plot: Requires only one categorical column and any number of numerical columns.
   - Heatmap: Requires only one categorical column and any number of numerical columns.
   - Histogram: Requires one numerical column.
   - Multiline chart: Requires one numerical column and one categorical column.
   - Pie Chart: Requires one categorical column.
   - Row Chart: Requires one categorical column.
   - Row summary box: Requires any column(s).
   - Sankey plot: Requires two categorical columns.
     - If only one categorical variable is available, return it twice.
   - Scatter plot (2D): Requires two numerical columns and one any column for color.
   - Scatter plot (3D): Requires three numerical columns and one any column for color.
   - Selection dialog plot: Requires any column.
   - Stacked row chart: Requires two categorical columns.
     - If only one categorical variable is available, return it twice.
   - Table Plot: Requires any column(s).
   - Text box: Requires no columns, just text.
   - Violin plot: Requires only one categorical column and one numerical column.
   - Wordcloud: Requires one categorical column.
8. Important: Clearly separate the selected columns with quotes and commas.
9. The column names are case sensitive therefore return them as they are defined in the dataframe.
10. Output format:
    - First line: The word "fields" following by quoted, comma-separated list of column names.
    - Second line: The word "charts" following by quoted, comma-separated list of suitable chart types for the selected columns.
11. NEVER explain your reasoning.
"""
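prompt_data asks the model for a two-line reply: a "fields" line and a "charts" line, each a quoted, comma-separated list. The function below is a minimal sketch of how such a reply could be split into Python lists. It is not part of the templates module; the name parse_fields_and_charts and the tolerance for an optional colon after the label are assumptions. Column names are returned unchanged because the prompt treats them as case sensitive.

```python
from __future__ import annotations

import csv
import io


def parse_fields_and_charts(response: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split a 'fields'/'charts' reply into two lists.

    Assumes a reply shaped like:
        fields "col1", "col2"
        charts "Box plot", "Violin plot"
    """
    result: dict[str, list[str]] = {}
    for line in response.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # The first token is the label; an optional trailing colon is tolerated (assumption).
        label, _, rest = line.partition(" ")
        # csv copes with quoted values and skips the space after each comma.
        reader = csv.reader(io.StringIO(rest), skipinitialspace=True)
        values = next(reader, [])
        # Values keep their original case because column names are case sensitive.
        result[label.rstrip(":").lower()] = [v.strip() for v in values if v.strip()]
    return result.get("fields", []), result.get("charts", [])
```

Using the csv module instead of a plain split(",") keeps a quoted value intact even if it contains a comma.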
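Step 3 of prompt_data describes how a requested gene should be resolved against df2["name"]. The sketch below expresses that lookup order in plain Python so the rules can be checked outside the LLM. The function name resolve_gene, the name_to_id mapping argument, and the choice of falling back to the first n_fallback values of df2["name"] are all assumptions; the prompt itself only says to "select one or more gene names".

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def resolve_gene(
    requested: str | None,
    gene_names: Iterable[str],
    name_to_id: Mapping[str, str] | None = None,
    n_fallback: int = 1,
) -> list[str]:
    """Hypothetical helper: return values from df2["name"] for a requested gene."""
    names = list(gene_names)  # e.g. df2["name"]; original order is kept
    available = set(names)

    if requested is not None:
        # Exact, case-sensitive match against df2["name"].
        if requested in available:
            return [requested]
        # df2["name"] may hold gene IDs instead: try a name -> ID lookup (assumed mapping).
        if name_to_id is not None:
            gene_id = name_to_id.get(requested)
            if gene_id is not None and gene_id in available:
                return [gene_id]
    # No gene requested, or no match: fall back to the first values of df2["name"].
    return names[:n_fallback]
```

Because gene_names only needs to be iterable, a pandas Series such as df2["name"] can be passed directly. Every value the function returns comes from df2["name"], matching rule 3d.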
- templates.packages_functions = Multiline-String
"""
import os
import pandas as pd
import scanpy as sc
from mdvtools.mdvproject import MDVProject
from mdvtools.conversions import convert_scanpy_to_mdv
from mdvtools.charts.density_scatter_plot import DensityScatterPlot
from mdvtools.charts.heatmap_plot import HeatmapPlot
from mdvtools.charts.histogram_plot import HistogramPlot
from mdvtools.charts.dot_plot import DotPlot
from mdvtools.charts.box_plot import BoxPlot
from mdvtools.charts.scatter_plot_3D import ScatterPlot3D
from mdvtools.charts.row_chart import RowChart
from mdvtools.charts.scatter_plot import ScatterPlot
from mdvtools.charts.abundance_box_plot import AbundanceBoxPlot
from mdvtools.charts.stacked_row_plot import StackedRowChart
from mdvtools.charts.ring_chart import RingChart
from mdvtools.charts.pie_chart import PieChart
from mdvtools.charts.violin_plot import ViolinPlot
from mdvtools.charts.multi_line_plot import MultiLinePlot
from mdvtools.charts.table_plot import TablePlot
from mdvtools.charts.wordcloud_plot import WordcloudPlot
from mdvtools.charts.text_box_plot import TextBox
from mdvtools.charts.row_summary_box_plot import RowSummaryBox
from mdvtools.charts.selection_dialog_plot import SelectionDialogPlot
from mdvtools.charts.sankey_plot import SankeyPlot
import json
import numpy as np
import sys
"""
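packages_functions is a block of import statements stored as a string. One way to inspect it without executing it (and therefore without needing scanpy or mdvtools installed) is to parse it with the standard-library ast module. The helper name imported_names is hypothetical; the sketch maps each name the imports would bind to the module path it comes from.

```python
from __future__ import annotations

import ast


def imported_names(source: str) -> dict[str, str]:
    """Hypothetical helper: map each name bound by an import to its origin."""
    bound: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                bound[alias.asname or alias.name.split(".")[0]] = alias.name
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have a level > 0 and may have no module name.
            module = "." * node.level + (node.module or "")
            for alias in node.names:
                bound[alias.asname or alias.name] = f"{module}.{alias.name}"
    return bound
```

For example, imported_names(templates.packages_functions)["BoxPlot"] should give "mdvtools.charts.box_plot.BoxPlot", which makes it easy to check which chart classes generated code can use.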
- templates.get_createproject_prompt_RAG(project: mdvtools.mdvproject.MDVProject, path_to_data: str, datasource_name: str, final_answer: str, question: str) -> str
Constructs a RAG prompt to guide LLM code generation for creating MDV plots. Handles both standard and gene-related queries.
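The page documents only the signature and purpose of get_createproject_prompt_RAG, not the prompt it produces. The sketch below is a hypothetical illustration of the general idea: fill a template with the import preamble, the data location, the datasource name, the column/chart selection, and the user question. Everything here is an assumption: the template text, the function name build_rag_prompt_sketch, treating final_answer as the "fields"/"charts" reply, and using an is_gene_query flag to model the standard vs gene-related branch. The real function also takes an MDVProject, which this sketch omits.

```python
from __future__ import annotations

from string import Template

# Hypothetical template for illustration; it is NOT the prompt text the module builds.
_PROMPT_SKETCH = Template(
    """You are writing Python code that creates MDV charts.
Start the script with these imports:
$packages

Data file: $path_to_data
Datasource: $datasource_name

Columns and chart types selected for this request:
$final_answer
$gene_note
User question: $question
"""
)


def build_rag_prompt_sketch(
    packages: str,
    path_to_data: str,
    datasource_name: str,
    final_answer: str,
    question: str,
    is_gene_query: bool = False,
) -> str:
    """Fill the hypothetical template with plain strings."""
    # Assumption: gene-related requests get an extra hint that some fields are gene names.
    gene_note = (
        'Some selected fields are gene names taken from df2["name"].\n'
        if is_gene_query
        else ""
    )
    return _PROMPT_SKETCH.substitute(
        packages=packages.strip(),
        path_to_data=path_to_data,
        datasource_name=datasource_name,
        final_answer=final_answer.strip(),
        gene_note=gene_note,
        question=question.strip(),
    )
```

string.Template is used so that braces in the inserted code or data never clash with the placeholder syntax, which can happen with str.format.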