mdvtools.conversions

Functions

convert_scanpy_to_mdv(→ mdvtools.mdvproject.MDVProject)

Convert a Scanpy AnnData object to MDV (Multi-Dimensional Viewer) format.

convert_mudata_to_mdv(folder, mudata_object[, ...])

get_matrix(matrix[, main_names, mod_names])

convert_vcf_to_df(→ pandas.DataFrame)

compute_vcf_end(→ pandas.DataFrame)

Compute the end position of the variant determined from 'POS', 'REF' and 'ALT'.

convert_vcf_to_mdv(→ mdvtools.mdvproject.MDVProject)

Converts a VCF file to an MDV project.

create_regulamentary_project_from_pipeline(output, ...)

Creates a regulatory project from pipeline outputs.

create_regulamentary_project(output, table, bigwigs, beds)

Creates a regulatory project visualization from input data sources.

_create_dt_heatmap(project, matrix[, ds, ifields])

_add_dims(table, dims, max_dims[, stub])

Module Contents

mdvtools.conversions.convert_scanpy_to_mdv(folder: str, scanpy_object: scanpy.AnnData, max_dims: int = 3, delete_existing: bool = False, label: str = '', chunk_data: bool = False, add_layer_data=True, gene_identifier_column=None) → mdvtools.mdvproject.MDVProject

Convert a Scanpy AnnData object to MDV (Multi-Dimensional Viewer) format.

This function transforms single-cell RNA sequencing data from AnnData format into the MDV project structure, handling both cells and genes as separate datasources with their associated dimensionality reductions and metadata.

Parameters:
  • folder (str) – Path to the target MDV project folder

  • scanpy_object (AnnData) – The AnnData object containing the single-cell data

  • max_dims (int, optional) – Maximum number of dimensions to include from dimensionality reductions. Defaults to 3.

  • delete_existing (bool, optional) – Whether to delete existing project data. If False, merges with existing data. Defaults to False.

  • label (str, optional) – Prefix to add to datasource names and metadata columns when merging with existing data. Defaults to “”.

  • chunk_data (bool, optional) – If True, dense matrices are transposed and flattened in chunks, which saves memory but takes longer. Defaults to False.

  • add_layer_data (bool, optional) – If True, the layer data (log values etc.) is added; otherwise only the X matrix is used. Defaults to True.

  • gene_identifier_column (str, optional) – The gene column that the user will use to identify a gene. If not specified (the default), a column ‘name’ is added, created from the index (which is usually the unique gene name).

Returns:

The configured MDV project object with the converted data

Return type:

MDVProject

Notes

Data Structure Creation:

  • Creates two main datasources: ‘{label}cells’ and ‘{label}genes’

  • Preserves all cell metadata from scanpy_object.obs

  • Preserves all gene metadata from scanpy_object.var

  • Transfers dimension reductions from obsm/varm matrices

  • Links cells and genes through expression data

  • Adds gene expression scores as a subgroup

View Handling:

  • If delete_existing=True:
    • Creates new default view with empty initial charts

    • Sets project as editable

  • If delete_existing=False:
    • Preserves existing views

    • Updates views with new datasources

    • Maintains panel widths and other view settings

    • Adds new datasources to each view’s initialCharts

Dimension Reduction:

  • Processes dimensionality reductions up to max_dims

  • Supports standard formats (e.g., PCA, UMAP, t-SNE)

  • Column names in the format: {reduction_name}_{dimension_number}
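The column naming scheme for dimension reductions can be sketched in plain Python. The helper `reduction_column_names` below is hypothetical and not part of mdvtools. It assumes that dimension numbers start at 1 and that the reduction name is used unchanged; this reference confirms neither detail.

```python
def reduction_column_names(reductions, max_dims=3):
    """Map each reduction name to the column names it would contribute.

    `reductions` maps a reduction name to its number of available dimensions.
    Assumption: numbering starts at 1 (not confirmed by the reference).
    """
    columns = {}
    for name, n_dims in reductions.items():
        kept = min(n_dims, max_dims)  # never keep more than max_dims
        columns[name] = [f"{name}_{i}" for i in range(1, kept + 1)]
    return columns


# A 50-dimensional PCA is truncated to max_dims; a 2D UMAP is kept whole.
names = reduction_column_names({"pca": 50, "umap": 2}, max_dims=3)
print(names)
```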

Raises:
  • ValueError – If the provided AnnData object is invalid or missing required components

  • IOError – If there are issues with file operations in the target folder

  • Exception – For other unexpected errors during conversion
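A minimal usage sketch, assuming scanpy and mdvtools are installed. The folder names, the `batch2_` label and the two wrapper functions are illustrative only; the keyword arguments are taken from the signature above. The first wrapper creates a fresh project, the second merges another dataset into an existing one.

```python
def build_pbmc_project(folder="pbmc3k_mdv"):
    """Create a fresh MDV project from scanpy's processed PBMC 3k dataset."""
    # Local imports keep this sketch importable without scanpy or mdvtools.
    import scanpy as sc
    from mdvtools.conversions import convert_scanpy_to_mdv

    adata = sc.datasets.pbmc3k_processed()  # downloads the data on first use
    return convert_scanpy_to_mdv(
        folder,
        adata,
        max_dims=2,            # keep only the first two dimensions per reduction
        delete_existing=True,  # start from an empty project
    )


def merge_second_dataset(folder, adata, label="batch2_"):
    """Merge another AnnData object into an existing project folder."""
    from mdvtools.conversions import convert_scanpy_to_mdv

    # With delete_existing=False the label prefixes the new datasource names,
    # for example 'batch2_cells' and 'batch2_genes'.
    return convert_scanpy_to_mdv(folder, adata, delete_existing=False, label=label)
```

Both wrappers return the MDVProject produced by the conversion.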

mdvtools.conversions.convert_mudata_to_mdv(folder, mudata_object, max_dims=3, delete_existing=False, chunk_data=False)

mdvtools.conversions.get_matrix(matrix, main_names=[], mod_names=[])

mdvtools.conversions.convert_vcf_to_df(vcf_filename: str) → pandas.DataFrame

mdvtools.conversions.compute_vcf_end(df: pandas.DataFrame) → pandas.DataFrame

Compute the end position of the variant determined from ‘POS’, ‘REF’ and ‘ALT’.

This is added as a column ‘END’ in the given DataFrame.
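This reference does not state the exact rule used to derive ‘END’. The sketch below illustrates one plausible rule on plain dictionaries rather than a DataFrame. It assumes END = POS + span - 1, where span is the length of REF or of the longest ALT allele, whichever is larger. Both the rule and the function name `sketch_vcf_end` are assumptions; the mdvtools implementation may differ.

```python
def sketch_vcf_end(records):
    """Return copies of `records` with an added 'END' key.

    Assumed rule (not taken from mdvtools): END = POS + span - 1, where
    span is the longest of REF and the comma-separated ALT alleles.
    """
    result = []
    for rec in records:
        alt_lengths = [len(allele) for allele in rec["ALT"].split(",")]
        span = max(len(rec["REF"]), *alt_lengths)
        result.append({**rec, "END": rec["POS"] + span - 1})
    return result


variants = [
    {"POS": 100, "REF": "A", "ALT": "G"},       # single-nucleotide variant
    {"POS": 200, "REF": "ACGT", "ALT": "A"},    # deletion
    {"POS": 300, "REF": "T", "ALT": "TAA,TC"},  # multi-allelic insertion
]
with_end = sketch_vcf_end(variants)
print([rec["END"] for rec in with_end])
```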

mdvtools.conversions.convert_vcf_to_mdv(folder: str, vcf_filename: str) → mdvtools.mdvproject.MDVProject

Converts a VCF file to an MDV project. The VCF file must be tab-delimited, with the header lines starting with “##” and column names in the line starting with “#CHROM”.

An ‘END’ column is derived, which is the end position of the variant determined from ‘POS’, ‘REF’ and ‘ALT’.
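The expected file layout can be illustrated with a small standard-library parser. This is a sketch of the format described above, not the mdvtools implementation. The function `read_vcf_rows` is hypothetical, and stripping the leading “#” from the first column name is a choice made here.

```python
import io


def read_vcf_rows(handle):
    """Parse a tab-delimited VCF stream into (columns, rows)."""
    columns = None
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and "##" header lines
        if line.startswith("#CHROM"):
            # Column names live on this line; drop the leading '#'.
            columns = line.lstrip("#").split("\t")
            continue
        if columns is None:
            raise ValueError("data line found before the #CHROM line")
        rows.append(dict(zip(columns, line.split("\t"))))
    if columns is None:
        raise ValueError("no #CHROM line found")
    return columns, rows


SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "chr1\t100\t.\tA\tG\n"
    "chr2\t200\trs1\tACGT\tA\n"
)
columns, rows = read_vcf_rows(io.StringIO(SAMPLE_VCF))
print(columns, len(rows))
```

All values are kept as strings here; a real loader would convert ‘POS’ to an integer before deriving ‘END’.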

mdvtools.conversions.create_regulamentary_project_from_pipeline(output, config, results_folder, atac_bw=None, peaks='merge', genome='hg38', openchrom='DNase')

Creates a regulatory project from pipeline outputs.

Parameters:
  • output (str) – Path to the directory which will house the MDV Project

  • config (str) – Path to the YAML configuration file.

  • results_folder (str) – Base path to the results directory.

  • atac_bw (str, optional) – Path to ATAC-seq bigWig file. Defaults to None.

  • peaks (str, optional) – Name of the peaks subdirectory. Defaults to “merge”.

  • genome (str, optional) – Genome assembly version to use. Defaults to “hg38”.

  • openchrom (str, optional) – Name of the open chromatin mark. Defaults to “DNase”.

Returns:

An MDVProject
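A minimal call sketch, assuming mdvtools is installed. The wrapper `build_from_pipeline` and the paths `regulatory_mdv`, `config.yaml` and `results` are placeholders, not files shipped with the library. The optional arguments are written out with their documented defaults.

```python
def build_from_pipeline(output="regulatory_mdv", config="config.yaml",
                        results_folder="results"):
    """Build an MDV project from pipeline outputs (all paths are placeholders)."""
    # Local import keeps this sketch importable without mdvtools installed.
    from mdvtools.conversions import create_regulamentary_project_from_pipeline

    return create_regulamentary_project_from_pipeline(
        output,
        config,
        results_folder,
        atac_bw=None,       # no ATAC-seq bigWig supplied
        peaks="merge",      # name of the peaks subdirectory
        genome="hg38",
        openchrom="DNase",
    )
```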

mdvtools.conversions.create_regulamentary_project(output: str, table, bigwigs, beds, matrix=None, openchrom='DNase', marks=None, mark_colors=None, genome='hg38')

Creates a regulatory project visualization from input data sources.

This function constructs a project using signal and peak files for various histone marks and chromatin accessibility, adds them as data sources and tracks, and configures a genome browser and visualization views.

Parameters:
  • output (str) – Output directory or file for the project.

  • table (str) – Path to the CSV table containing regulatory element data.

  • bigwigs (dict) – Dictionary mapping mark names to bigWig file paths or URLs.

  • beds (dict) – Dictionary mapping mark names to BED file paths.

  • matrix (dict or None, optional) – Matrix and order file information for heatmaps, or None.

  • openchrom (str, optional) – Name for open chromatin mark. Defaults to “DNase”.

  • marks (list of str, optional) – List of marks to process. Defaults to [“ATAC”, “H3K4me1”, “H3K4me3”, “H3K27ac”, “CTCF”].

  • mark_colors (list of str, optional) – List of colors for the marks. Defaults to a preset palette.

  • genome (str, optional) – Genome assembly to use. Defaults to “hg38”.

Returns:

The project object constructed with the given data and views.

Return type:

MDVProject
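The `bigwigs` and `beds` arguments are dictionaries keyed by mark name. The sketch below builds them for the default marks from a hypothetical directory layout (`<base_dir>/<mark>.bw` and `<base_dir>/<mark>.bed`) and passes them on. The helper names and the layout are assumptions, as is the idea that the dictionary keys should match the entries in `marks`.

```python
from pathlib import PurePosixPath

# Default marks as listed in the parameter description above.
DEFAULT_MARKS = ("ATAC", "H3K4me1", "H3K4me3", "H3K27ac", "CTCF")


def build_track_dicts(base_dir, marks=DEFAULT_MARKS):
    """Return (bigwigs, beds) dictionaries for a hypothetical file layout."""
    base = PurePosixPath(base_dir)
    bigwigs = {mark: str(base / f"{mark}.bw") for mark in marks}
    beds = {mark: str(base / f"{mark}.bed") for mark in marks}
    return bigwigs, beds


def build_regulatory_project(output, table, base_dir):
    """Create the project; requires mdvtools and the referenced files."""
    from mdvtools.conversions import create_regulamentary_project

    bigwigs, beds = build_track_dicts(base_dir)
    return create_regulamentary_project(output, table, bigwigs, beds, genome="hg38")


bigwigs, beds = build_track_dicts("data/tracks")
print(bigwigs["ATAC"], beds["CTCF"])
```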

mdvtools.conversions._create_dt_heatmap(project, matrix, ds='elements', ifields=None)

mdvtools.conversions._add_dims(table, dims, max_dims, stub='')