mdvtools.conversions
Functions
convert_scanpy_to_mdv | Convert a Scanpy AnnData object to MDV (Multi-Dimensional Viewer) format.
convert_mudata_to_mdv |
compute_vcf_end | Compute the end position of the variant determined from 'POS', 'REF' and 'ALT'.
convert_vcf_to_mdv | Converts a VCF file to an MDV project.
create_regulamentary_project_from_pipeline | Creates a regulamentary project from pipeline outputs.
create_regulamentary_project | Creates a regulatory project visualization from input data sources.
Module Contents
- mdvtools.conversions.convert_scanpy_to_mdv(folder: str, scanpy_object: scanpy.AnnData, max_dims: int = 3, delete_existing: bool = False, label: str = '', chunk_data: bool = False, add_layer_data=True, gene_identifier_column=None) -> mdvtools.mdvproject.MDVProject
Convert a Scanpy AnnData object to MDV (Multi-Dimensional Viewer) format.
This function transforms single-cell RNA sequencing data from AnnData format into the MDV project structure, handling both cells and genes as separate datasources with their associated dimensionality reductions and metadata.
- Parameters:
folder (str) – Path to the target MDV project folder
scanpy_object (AnnData) – The AnnData object containing the single-cell data
max_dims (int, optional) – Maximum number of dimensions to include from dimensionality reductions. Defaults to 3.
delete_existing (bool, optional) – Whether to delete existing project data. If False, merges with existing data. Defaults to False.
label (str, optional) – Prefix to add to datasource names and metadata columns when merging with existing data. Defaults to “”.
chunk_data (bool, optional) – For dense matrices, transposing and flattening will be performed in chunks. Saves memory but takes longer. Default is False.
add_layer_data (bool, optional) – If True (default), the layer data (log values etc.) is added; otherwise only the X object is used.
gene_identifier_column (str, optional) – The gene column that the user will use to identify a gene. If not specified (default), a column ‘name’ is added, created from the index (which is usually the unique gene name).
- Returns:
The configured MDV project object with the converted data
- Return type:
mdvtools.mdvproject.MDVProject
Notes
Data Structure Creation:
- Creates two main datasources: ‘{label}cells’ and ‘{label}genes’
- Preserves all cell metadata from scanpy_object.obs
- Preserves all gene metadata from scanpy_object.var
- Transfers dimension reductions from obsm/varm matrices
- Links cells and genes through expression data
- Adds gene expression scores as a subgroup
View Handling:
- If delete_existing=True:
  - Creates new default view with empty initial charts
  - Sets project as editable
- If delete_existing=False:
  - Preserves existing views
  - Updates views with new datasources
  - Maintains panel widths and other view settings
  - Adds new datasources to each view’s initialCharts
Dimension Reduction:
- Processes dimensionality reductions up to max_dims
- Supports standard formats (e.g., PCA, UMAP, t-SNE)
- Column names in the format: {reduction_name}_{dimension_number}
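The naming rules in the Notes above can be illustrated with a minimal, standalone sketch. The helper names `datasource_names` and `reduction_columns` are hypothetical and not part of mdvtools, and the 1-based dimension numbering is an assumption; the documentation only gives the pattern {reduction_name}_{dimension_number}.

```python
def datasource_names(label=""):
    """Return the two datasource names created for a given label prefix."""
    return f"{label}cells", f"{label}genes"


def reduction_columns(reduction_name, n_components, max_dims=3):
    """Return column names for one reduction, capped at max_dims.

    Assumption: dimensions are numbered from 1.
    """
    kept = min(n_components, max_dims)
    return [f"{reduction_name}_{i}" for i in range(1, kept + 1)]


cells_name, genes_name = datasource_names("batch2_")
pca_columns = reduction_columns("pca", n_components=50, max_dims=3)
umap_columns = reduction_columns("umap", n_components=2, max_dims=3)
```

Under these assumptions, a 50-component PCA contributes only its first three columns, while a 2-component UMAP contributes both.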
- Raises:
ValueError – If the provided AnnData object is invalid or missing required components
IOError – If there are issues with file operations in the target folder
Exception – For other unexpected errors during conversion
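As a usage sketch, the following shows one way the delete_existing and label parameters might be combined to place two datasets in a single project. The wrapper `convert_two_datasets` and the label value "second_" are hypothetical; only the call signature comes from the documentation above, and the function needs mdvtools installed to run.

```python
def convert_two_datasets(folder, adata_first, adata_second):
    """Convert two AnnData objects into one MDV project folder.

    Hypothetical wrapper: the import is done lazily so that this
    sketch can be defined without mdvtools being importable.
    """
    from mdvtools.conversions import convert_scanpy_to_mdv

    # First call: start from a clean project, discarding existing data.
    convert_scanpy_to_mdv(
        folder, adata_first, max_dims=2, delete_existing=True
    )
    # Second call: merge into the existing project; the label prefixes
    # the new datasource names and metadata columns.
    return convert_scanpy_to_mdv(
        folder,
        adata_second,
        max_dims=2,
        delete_existing=False,
        label="second_",
    )
```

With this arrangement the second dataset would be expected to appear as ‘second_cells’ and ‘second_genes’ alongside the original ‘cells’ and ‘genes’ datasources.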
- mdvtools.conversions.convert_mudata_to_mdv(folder, mudata_object, max_dims=3, delete_existing=False, chunk_data=False)
- mdvtools.conversions.compute_vcf_end(df: pandas.DataFrame) -> pandas.DataFrame
Compute the end position of the variant determined from ‘POS’, ‘REF’ and ‘ALT’.
This is added as a column ‘END’ in the given DataFrame.
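The exact rule used to derive ‘END’ is not stated here. The sketch below shows one plausible rule, purely as an assumption: the end is the inclusive position POS + span - 1, where span is the length of the longest allele found in ‘REF’ or ‘ALT’ (with comma-separated alternates split first). It uses plain dictionaries rather than a DataFrame, and the helper names are hypothetical.

```python
def longest_allele(field):
    """Length of the longest allele in a comma-separated REF or ALT field."""
    return max(len(allele) for allele in str(field).split(","))


def add_end(records):
    """Add an 'END' key to each record holding 'POS', 'REF' and 'ALT'.

    Assumed rule (not confirmed by the library documentation):
    END = POS + longest allele length - 1.
    """
    for record in records:
        span = max(longest_allele(record["REF"]), longest_allele(record["ALT"]))
        record["END"] = int(record["POS"]) + span - 1
    return records


variants = add_end([
    {"POS": 100, "REF": "A", "ALT": "T"},          # single-base substitution
    {"POS": 200, "REF": "ACG", "ALT": "A"},        # deletion
    {"POS": 300, "REF": "A", "ALT": "ATTT,AT"},    # multi-allelic insertion
])
```

The library function itself operates on a pandas DataFrame and may use a different convention, so its output should be checked against real data before relying on this rule.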
- mdvtools.conversions.convert_vcf_to_mdv(folder: str, vcf_filename: str) -> mdvtools.mdvproject.MDVProject
Converts a VCF file to an MDV project. The VCF file must be tab-delimited, with the header lines starting with “##” and column names in the line starting with “#CHROM”.
An ‘END’ column is derived, which is the end position of the variant determined from ‘POS’, ‘REF’ and ‘ALT’.
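The expected file layout can be made concrete with a small standalone reader. This is a sketch of the format described above, not the library's own parser; `read_vcf_table` is a hypothetical helper and the sample data is made up.

```python
import io


def read_vcf_table(handle):
    """Return (columns, rows) from a tab-delimited VCF text stream."""
    columns = None
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and "##" meta-information lines
        if line.startswith("#CHROM"):
            # The column names live on the "#CHROM" line.
            columns = line.lstrip("#").split("\t")
            continue
        if columns is None:
            raise ValueError("data line found before the #CHROM header")
        rows.append(dict(zip(columns, line.split("\t"))))
    if columns is None:
        raise ValueError("no #CHROM header line found")
    return columns, rows


sample = (
    "##fileformat=VCFv4.2\n"
    "##source=demo\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "chr1\t100\t.\tA\tT\n"
    "chr2\t200\trs1\tACG\tA\n"
)
columns, rows = read_vcf_table(io.StringIO(sample))
```

A file that lacks the “#CHROM” line raises a ValueError in this sketch; how the library reports the same problem is not documented here.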
- mdvtools.conversions.create_regulamentary_project_from_pipeline(output, config, results_folder, atac_bw=None, peaks='merge', genome='hg38', openchrom='DNase')
Creates a regulamentary project from pipeline outputs.
- Parameters:
output (str) – Path to the directory which will house the MDV Project
config (str) – Path to the YAML configuration file.
results_folder (str) – Base path to the results directory.
atac_bw (str, optional) – Path to ATAC-seq bigWig file. Defaults to None.
peaks (str, optional) – Name of the peaks subdirectory. Defaults to “merge”.
genome (str, optional) – Genome assembly version to use. Defaults to “hg38”.
openchrom (str, optional) – Name of the open chromatin mark. Defaults to “DNase”.
- Returns:
An MDVProject
- mdvtools.conversions.create_regulamentary_project(output: str, table, bigwigs, beds, matrix=None, openchrom='DNase', marks=None, mark_colors=None, genome='hg38')
Creates a regulatory project visualization from input data sources.
This method constructs a project using signal and peak files for various histone marks and chromatin accessibility, adds them as data sources and tracks, and configures a genome browser and visualization views.
- Parameters:
output (str) – Output directory or file for the project.
table (str) – Path to the CSV table containing regulatory element data.
bigwigs (dict) – Dictionary mapping mark names to bigWig file paths or URLs.
beds (dict) – Dictionary mapping mark names to BED file paths.
matrix (dict or None, optional) – Matrix and order file information for heatmaps, or None.
openchrom (str, optional) – Name for open chromatin mark. Defaults to “DNase”.
marks (list of str, optional) – List of marks to process. Defaults to [“ATAC”, “H3K4me1”, “H3K4me3”, “H3K27ac”, “CTCF”].
mark_colors (list of str, optional) – List of colors for the marks. Defaults to a preset palette.
genome (str, optional) – Genome assembly to use. Defaults to “hg38”.
- Returns:
The project object constructed with the given data and views.
- Return type:
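The bigwigs and beds arguments are dictionaries keyed by mark name, which the following sketch builds from a directory of track files. The file naming scheme (`<mark>.bw`, `<mark>.bed`), the helper `build_track_maps` and the wrapper `make_regulatory_project` are all hypothetical. How openchrom interacts with the marks list is not stated above, so it is left at its default; the wrapper needs mdvtools installed to run.

```python
from pathlib import Path

# Default marks as listed in the parameter description above.
DEFAULT_MARKS = ["ATAC", "H3K4me1", "H3K4me3", "H3K27ac", "CTCF"]


def build_track_maps(track_dir, marks):
    """Map each mark to assumed '<mark>.bw' and '<mark>.bed' file paths."""
    track_dir = Path(track_dir)
    bigwigs = {mark: str(track_dir / f"{mark}.bw") for mark in marks}
    beds = {mark: str(track_dir / f"{mark}.bed") for mark in marks}
    return bigwigs, beds


def make_regulatory_project(output, table, track_dir, marks=None):
    """Hypothetical wrapper around create_regulamentary_project."""
    # Imported lazily so the sketch can be defined without mdvtools.
    from mdvtools.conversions import create_regulamentary_project

    marks = list(marks) if marks is not None else list(DEFAULT_MARKS)
    bigwigs, beds = build_track_maps(track_dir, marks)
    return create_regulamentary_project(
        output, table, bigwigs, beds, marks=marks, genome="hg38"
    )


bigwigs, beds = build_track_maps("tracks", ["ATAC", "CTCF"])
```

Keeping the two dictionaries keyed by the same mark names makes it easy to check that every signal track has a matching peak file before the project is built.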