mdvtools.conversions

Functions

`convert_scanpy_to_mdv`(→ mdvtools.mdvproject.MDVProject)	Convert a Scanpy AnnData object to MDV (Multi-Dimensional Viewer) format.
`convert_mudata_to_mdv`(folder, mudata_object[, ...])
`get_matrix`(matrix[, main_names, mod_names])
`convert_vcf_to_df`(→ pandas.DataFrame)
`compute_vcf_end`(→ pandas.DataFrame)	Compute the end position of the variant determined from 'POS', 'REF' and 'ALT'.
`convert_vcf_to_mdv`(→ mdvtools.mdvproject.MDVProject)	Converts a VCF file to an MDV project.
`create_regulamentary_project_from_pipeline`(output, ...)	Creates a regulatory project from pipeline outputs.
`create_regulamentary_project`(output, table, bigwigs, beds)	Creates a regulatory project visualization from input data sources.
`_create_dt_heatmap`(project, matrix[, ds, ifields])
`_add_dims`(table, dims, max_dims[, stub])

Module Contents

mdvtools.conversions.convert_scanpy_to_mdv(folder: str, scanpy_object: scanpy.AnnData, max_dims: int = 3, delete_existing: bool = False, label: str = '', chunk_data: bool = False, add_layer_data=True, gene_identifier_column=None) → mdvtools.mdvproject.MDVProject

Convert a Scanpy AnnData object to MDV (Multi-Dimensional Viewer) format.

This function transforms single-cell RNA sequencing data from AnnData format into the MDV project structure, handling both cells and genes as separate datasources with their associated dimensionality reductions and metadata.

Parameters:

folder (str) – Path to the target MDV project folder
scanpy_object (AnnData) – The AnnData object containing the single-cell data
max_dims (int, optional) – Maximum number of dimensions to include from dimensionality reductions. Defaults to 3.
delete_existing (bool, optional) – Whether to delete existing project data. If False, merges with existing data. Defaults to False.
label (str, optional) – Prefix to add to datasource names and metadata columns when merging with existing data. Defaults to “”.
chunk_data (bool, optional) – If True, dense matrices are transposed and flattened in chunks, which saves memory but takes longer. Defaults to False.
add_layer_data (bool, optional) – If True, the layer data (log values etc.) is added as well; if False, only the X matrix is used. Defaults to True.
gene_identifier_column (str, optional) – The gene column that users will use to identify a gene. If not specified, a column ‘name’ is added, created from the index (which is usually the unique gene name). Defaults to None.

Returns:

The configured MDV project object with the converted data

Return type:

MDVProject

Notes

Data Structure Creation:
- Creates two main datasources: ‘{label}cells’ and ‘{label}genes’
- Preserves all cell metadata from scanpy_object.obs
- Preserves all gene metadata from scanpy_object.var
- Transfers dimension reductions from obsm/varm matrices
- Links cells and genes through expression data
- Adds gene expression scores as a subgroup

View Handling:

If delete_existing=True:
- Creates new default view with empty initial charts
- Sets project as editable

If delete_existing=False:
- Preserves existing views
- Updates views with new datasources
- Maintains panel widths and other view settings
- Adds new datasources to each view’s initialCharts

Dimension Reduction:
- Processes dimensionality reductions up to max_dims
- Supports standard formats (e.g., PCA, UMAP, t-SNE)
- Column names in the format: {reduction_name}_{dimension_number}
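The column naming described in the dimension reduction note can be illustrated with a minimal sketch. The helper `reduction_columns` is hypothetical and not part of mdvtools. It assumes that dimension numbers start at 1 and that the reduction name is used unchanged; the library may differ on both points.

```python
def reduction_columns(name, n_components, max_dims=3):
    """Return illustrative column names for the first max_dims components."""
    # Assumption: numbering starts at 1 and the name is used as given.
    kept = min(n_components, max_dims)
    return [f"{name}_{i}" for i in range(1, kept + 1)]


# A 50-component PCA is cut down to max_dims columns; a 2-D UMAP keeps both.
print(reduction_columns("pca", 50))
print(reduction_columns("umap", 2))
```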

Raises:

ValueError – If the provided AnnData object is invalid or missing required components
IOError – If there are issues with file operations in the target folder
Exception – For other unexpected errors during conversion
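A typical call might look like the following sketch, which uses only the parameters documented above. The wrapper `scanpy_file_to_mdv` and its arguments are hypothetical names chosen for illustration. The imports sit inside the function so that the snippet can be loaded without scanpy or mdvtools installed.

```python
def scanpy_file_to_mdv(h5ad_path, folder):
    """Load an .h5ad file and convert it to an MDV project (illustrative)."""
    # Local imports: both packages are only needed when the function runs.
    import scanpy as sc
    from mdvtools.conversions import convert_scanpy_to_mdv

    adata = sc.read_h5ad(h5ad_path)
    return convert_scanpy_to_mdv(
        folder,
        adata,
        max_dims=3,            # keep at most three dimensions per reduction
        delete_existing=True,  # replace any existing project data instead of merging
    )
```

To merge a second dataset into the same folder, the documented parameters suggest passing delete_existing=False together with a label such as 'batch2_' so that the new datasources and columns are prefixed.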

mdvtools.conversions.convert_mudata_to_mdv(folder, mudata_object, max_dims=3, delete_existing=False, chunk_data=False)

mdvtools.conversions.get_matrix(matrix, main_names=[], mod_names=[])

mdvtools.conversions.convert_vcf_to_df(vcf_filename: str) → pandas.DataFrame

mdvtools.conversions.compute_vcf_end(df: pandas.DataFrame) → pandas.DataFrame

Compute the end position of the variant determined from ‘POS’, ‘REF’ and ‘ALT’.

This is added as a column ‘END’ in the given DataFrame.
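The exact formula is not stated here, so the following is only a sketch of one plausible convention: the variant is assumed to span the longest of REF and the comma-separated ALT alleles, counted inclusively from POS. The helper `variant_end` is hypothetical, and the library's own computation may differ.

```python
def variant_end(pos, ref, alt):
    """Illustrative end position of a variant, inclusive of POS."""
    # Assumption: the span is the longest of REF and any ALT allele.
    longest = max(len(ref), *(len(allele) for allele in alt.split(",")))
    return pos + longest - 1


rows = [
    {"POS": 100, "REF": "A", "ALT": "G"},        # single-base substitution
    {"POS": 200, "REF": "ATG", "ALT": "A"},      # deletion
    {"POS": 300, "REF": "C", "ALT": "CTT,CT"},   # multi-allelic insertion
]
for row in rows:
    # Mirror the documented behaviour of adding an 'END' field to each record.
    row["END"] = variant_end(row["POS"], row["REF"], row["ALT"])
```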

mdvtools.conversions.convert_vcf_to_mdv(folder: str, vcf_filename: str) → mdvtools.mdvproject.MDVProject

Converts a VCF file to an MDV project. The VCF file must be tab-delimited, with the header lines starting with “##” and column names in the line starting with “#CHROM”.

An ‘END’ column is derived, which is the end position of the variant determined from ‘POS’, ‘REF’ and ‘ALT’.
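The expected file layout can be illustrated with a small standard-library parser. This is a sketch of the format described above, not the library's implementation. The function `read_vcf_records` is hypothetical, and stripping the leading “#” from the “#CHROM” column name is an assumption.

```python
def read_vcf_records(text):
    """Split VCF text into '##' header lines and a list of row dicts."""
    meta, columns, rows = [], None, []
    for line in text.splitlines():
        if line.startswith("##"):
            meta.append(line)                       # metadata header line
        elif line.startswith("#CHROM"):
            # Assumption: drop the leading '#' so the column is named 'CHROM'.
            columns = line.lstrip("#").split("\t")
        elif line and columns is not None:
            rows.append(dict(zip(columns, line.split("\t"))))
    return meta, rows


SAMPLE = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "chr1\t100\t.\tA\tG\n"
)
meta, rows = read_vcf_records(SAMPLE)
```

Values are left as strings in this sketch; a real loader would convert ‘POS’ to an integer before deriving ‘END’.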

mdvtools.conversions.create_regulamentary_project_from_pipeline(output, config, results_folder, atac_bw=None, peaks='merge', genome='hg38', openchrom='DNase')

Creates a regulatory project from pipeline outputs.

Parameters:

output (str) – Path to the directory which will house the MDV Project
config (str) – Path to the YAML configuration file.
results_folder (str) – Base path to the results directory.
atac_bw (str, optional) – Path to ATAC-seq bigWig file. Defaults to None.
peaks (str, optional) – Name of the peaks subdirectory. Defaults to “merge”.
genome (str, optional) – Genome assembly version to use. Defaults to “hg38”.
openchrom (str, optional) – Name of the open chromatin mark. Defaults to “DNase”.

Returns:

An MDVProject

mdvtools.conversions.create_regulamentary_project(output: str, table, bigwigs, beds, matrix=None, openchrom='DNase', marks=None, mark_colors=None, genome='hg38')

Creates a regulatory project visualization from input data sources.

This method constructs a project using signal and peak files for various histone marks and chromatin accessibility, adds them as data sources and tracks, and configures a genome browser and visualization views.

Parameters:

output (str) – Output directory or file for the project.
table (str) – Path to the CSV table containing regulatory element data.
bigwigs (dict) – Dictionary mapping mark names to bigWig file paths or URLs.
beds (dict) – Dictionary mapping mark names to BED file paths.
matrix (dict or None, optional) – Matrix and order file information for heatmaps, or None.
openchrom (str, optional) – Name for open chromatin mark. Defaults to “DNase”.
marks (list of str, optional) – List of marks to process. Defaults to [“ATAC”, “H3K4me1”, “H3K4me3”, “H3K27ac”, “CTCF”].
mark_colors (list of str, optional) – List of colors for the marks. Defaults to a preset palette.
genome (str, optional) – Genome assembly to use. Defaults to “hg38”.

Returns:

The project object constructed with the given data and views.

Return type:

MDVProject
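The bigwigs and beds arguments are dictionaries keyed by mark name, which the sketch below builds from a list of marks. The helpers `mark_files` and `build_regulatory_project`, and the one-file-per-mark directory layout, are hypothetical. Whether every mark needs an entry in both dictionaries is not stated here and is assumed for simplicity.

```python
def mark_files(directory, marks, suffix):
    """Map each mark name to a path, assuming one file per mark (hypothetical layout)."""
    return {mark: f"{directory}/{mark}{suffix}" for mark in marks}


def build_regulatory_project(output, table, signal_dir, peak_dir, marks):
    """Call create_regulamentary_project with dictionaries built per mark."""
    # Local import: mdvtools is only needed when the function runs.
    from mdvtools.conversions import create_regulamentary_project

    return create_regulamentary_project(
        output,
        table,                                   # path to the CSV of regulatory elements
        mark_files(signal_dir, marks, ".bw"),    # bigwigs: mark name -> bigWig path
        mark_files(peak_dir, marks, ".bed"),     # beds: mark name -> BED path
        marks=marks,
        genome="hg38",
    )
```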

mdvtools.conversions._create_dt_heatmap(project, matrix, ds='elements', ifields=None)

mdvtools.conversions._add_dims(table, dims, max_dims, stub='')