mdvtools.tests.mock_anndata

Mock AnnData generation module for testing and stress testing.

This module provides utilities to create realistic AnnData objects with various configurations, data types, and edge cases for comprehensive testing of the MDV conversion pipeline.

Classes

MockAnnDataFactory

Factory class for creating mock AnnData objects with various configurations.

Functions

`suppress_anndata_warnings`()	Context manager to suppress expected AnnData warnings.
`chunked_log1p_normalization`(sparse_matrix[, chunk_size])	Perform log1p normalization in chunks to avoid dense matrices.
`chunked_zscore_normalization`(sparse_matrix[, chunk_size])	Perform z-score normalization in chunks to avoid dense matrices.
`estimate_memory_usage`(n_cells, n_genes[, sparse])	Estimate memory usage for a dataset.
`create_minimal_anndata`(→ scanpy.AnnData)	Create minimal AnnData object for testing (backward compatibility).
`create_realistic_anndata`(→ scanpy.AnnData)	Create realistic AnnData object with typical single-cell data features.
`create_large_anndata`(→ scanpy.AnnData)	Create large AnnData object for stress testing.
`create_memory_efficient_large_anndata`(→ scanpy.AnnData)	Create memory-efficient large AnnData object for stress testing.
`create_massive_anndata`(→ scanpy.AnnData)	Create massive AnnData object for extreme stress testing.
`create_extreme_anndata`(→ scanpy.AnnData)	Create extreme AnnData object for ultimate stress testing.
`create_fast_large_anndata`(→ scanpy.AnnData)	Create large AnnData object using fast generation mode.
`create_skeleton_anndata`(→ scanpy.AnnData)	Create large AnnData object with skeleton matrix (structure only).
`create_edge_case_anndata`(→ scanpy.AnnData)	Create AnnData object with various edge cases and problematic data.
`get_anndata_summary`(→ Dict[str, Any])	Get a summary of AnnData object properties for testing.
`validate_anndata`(→ bool)	Validate that AnnData object has expected structure.

Module Contents

mdvtools.tests.mock_anndata.suppress_anndata_warnings()

Context manager to suppress expected AnnData warnings.
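The reference does not say which warnings count as "expected". A minimal sketch of such a context manager built on the standard `warnings.catch_warnings`, assuming the noise is `UserWarning` and `FutureWarning` (a guess, not the module's actual filter list); the name `suppress_expected_warnings` is hypothetical:

```python
import warnings
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def suppress_expected_warnings() -> Iterator[None]:
    # Assumption: the "expected" noise is UserWarning and FutureWarning
    with warnings.catch_warnings():
        # catch_warnings() saves the current filters and restores them on exit
        warnings.simplefilter("ignore", category=UserWarning)
        warnings.simplefilter("ignore", category=FutureWarning)
        yield


with suppress_expected_warnings():
    warnings.warn("noisy but expected", UserWarning)  # silenced
```

Because `catch_warnings()` restores the previous filter state on exit, warnings raised after the `with` block are reported normally.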

mdvtools.tests.mock_anndata.chunked_log1p_normalization(sparse_matrix, chunk_size=1000)

Perform log1p normalization in chunks to avoid dense matrices.
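Since log1p(0) = 0, log1p can be applied to the stored non-zero values alone, leaving the sparsity pattern unchanged. A sketch of a row-chunked variant built on that observation; `chunked_log1p_sketch` is a hypothetical name, and the real function may differ in dtype handling or in whether it works in place:

```python
import numpy as np
import scipy.sparse as sp


def chunked_log1p_sketch(matrix, chunk_size: int = 1000) -> sp.csr_matrix:
    """Apply log1p to a sparse matrix one block of rows at a time."""
    csr = sp.csr_matrix(matrix, dtype=np.float64)
    if csr.shape[0] == 0:
        return csr
    chunks = []
    for start in range(0, csr.shape[0], chunk_size):
        chunk = csr[start:start + chunk_size].copy()
        # log1p(0) == 0, so only the stored non-zero values need transforming
        chunk.data = np.log1p(chunk.data)
        chunks.append(chunk)
    return sp.vstack(chunks, format="csr")
```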

mdvtools.tests.mock_anndata.chunked_zscore_normalization(sparse_matrix, chunk_size=1000)

Perform z-score normalization in chunks to avoid dense matrices.
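Z-scoring is harder than log1p because subtracting the per-gene mean turns zeros into non-zeros, so the full result is inherently dense. One plausible approach, sketched here as an assumption rather than a description of the real function, is to compute per-gene mean and standard deviation directly on the sparse matrix (from E[x] and E[x²]) and then densify and standardize one row chunk at a time. The generator `chunked_zscore_sketch` is hypothetical:

```python
from typing import Iterator

import numpy as np
import scipy.sparse as sp


def chunked_zscore_sketch(matrix, chunk_size: int = 1000) -> Iterator[np.ndarray]:
    """Yield z-scored dense row blocks; only one block is dense at a time."""
    csr = sp.csr_matrix(matrix, dtype=np.float64)
    # Per-gene statistics computed on the sparse matrix: E[x] and E[x**2]
    mean = np.asarray(csr.mean(axis=0)).ravel()
    mean_sq = np.asarray(csr.multiply(csr).mean(axis=0)).ravel()
    std = np.sqrt(np.maximum(mean_sq - mean**2, 0.0))  # population std (ddof=0)
    std[std == 0] = 1.0  # constant genes: avoid division by zero
    for start in range(0, csr.shape[0], chunk_size):
        block = csr[start:start + chunk_size].toarray()
        yield (block - mean) / std
```

A caller can hand each yielded block to a writer or downstream consumer, so peak memory stays near one dense `chunk_size × n_genes` block plus the per-gene statistics.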

mdvtools.tests.mock_anndata.estimate_memory_usage(n_cells, n_genes, sparse=True)

Estimate memory usage for a dataset.

Parameters:

n_cells – Number of cells
n_genes – Number of genes
sparse – Whether the matrix is sparse

Returns:

Estimated memory usage in MB
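The signature takes no density argument and does not state the value or index sizes it assumes, so the arithmetic below is only a back-of-the-envelope sketch. It assumes float32 values, int32 indices, CSR storage, MiB units, and a hypothetical `density` parameter, and it counts the expression matrix only (no obs/var metadata or layers):

```python
def estimate_memory_mb_sketch(n_cells: int, n_genes: int, sparse: bool = True,
                              density: float = 0.1, value_bytes: int = 4,
                              index_bytes: int = 4) -> float:
    """Approximate expression-matrix size in MiB (1024**2 bytes).

    Assumptions: float32 values, int32 indices, CSR layout when sparse,
    and a hypothetical `density` parameter the documented function lacks.
    """
    if not sparse:
        total_bytes = n_cells * n_genes * value_bytes
    else:
        nnz = round(n_cells * n_genes * density)
        # Each stored value needs its data entry plus a column index;
        # CSR also keeps n_cells + 1 row pointers.
        total_bytes = nnz * (value_bytes + index_bytes) + (n_cells + 1) * index_bytes
    return total_bytes / 1024**2
```

With the `create_extreme_dataset` defaults (1,000,000 cells, 5,000 genes, density 0.001), this sketch counts 5,000,000 stored values and about 42 MiB in sparse form, versus about 19,073 MiB (roughly 18.6 GiB) for a dense float32 matrix.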

class mdvtools.tests.mock_anndata.MockAnnDataFactory(random_seed: int | None = None)

Factory class for creating mock AnnData objects with various configurations.

create_minimal(n_cells: int = 10, n_genes: int = 5, add_missing: bool = False) → scanpy.AnnData

Create a minimal AnnData object for basic testing.

create_realistic(n_cells: int = 1000, n_genes: int = 2000, add_missing: bool = True) → scanpy.AnnData

Create a realistic AnnData object with typical single-cell data features.

create_large(n_cells: int = 10000, n_genes: int = 5000, add_missing: bool = True, density: float = 0.1) → scanpy.AnnData

Create a large AnnData object for stress testing.

create_memory_efficient_large(n_cells: int = 10000, n_genes: int = 5000, add_missing: bool = True, density: float = 0.1) → scanpy.AnnData

Create a large AnnData object optimized for memory efficiency.

This method creates large datasets without dense layers to avoid excessive memory consumption during stress testing.

create_massive_dataset(n_cells: int = 100000, n_genes: int = 10000, add_missing: bool = True, density: float = 0.1, chunk_size: int = 10000, mode: str = 'realistic') → scanpy.AnnData

Create a massive dataset (100k+ cells) for extreme stress testing.

This method uses chunked operations and memory-efficient approaches to handle datasets that would otherwise cause memory issues.

Parameters:

n_cells – Number of cells
n_genes – Number of genes
add_missing – Whether to add missing values
density – Density of non-zero elements
chunk_size – Size of chunks for matrix generation
mode – Generation mode: ‘realistic’, ‘fast’, or ‘skeleton’
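The chunked strategy can be illustrated with SciPy: generate one block of rows at a time, each with distinct (row, column) positions, and stack the blocks at the end. This is a sketch of the general technique, assuming every chunk gets the same density; `chunked_random_csr_sketch` is a hypothetical name, not the factory's internal code:

```python
import numpy as np
import scipy.sparse as sp


def chunked_random_csr_sketch(n_cells, n_genes, density, chunk_size, seed=0):
    """Build a random CSR matrix block by block, then stack the blocks."""
    rng = np.random.default_rng(seed)
    blocks = []
    for start in range(0, n_cells, chunk_size):
        rows = min(chunk_size, n_cells - start)
        nnz = round(rows * n_genes * density)
        # Sample distinct flat positions inside this block only
        flat = rng.choice(rows * n_genes, size=nnz, replace=False)
        r, c = np.divmod(flat, n_genes)
        values = rng.random(nnz, dtype=np.float32)
        blocks.append(sp.csr_matrix((values, (r, c)), shape=(rows, n_genes)))
    if not blocks:
        return sp.csr_matrix((0, n_genes), dtype=np.float32)
    return sp.vstack(blocks, format="csr")
```

Peak temporary memory is bounded by one block's coordinate arrays rather than the whole matrix's.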

create_extreme_dataset(n_cells: int = 1000000, n_genes: int = 5000, density: float = 0.001, chunk_size: int = 50000, mode: str = 'fast') → scanpy.AnnData

Create an extreme dataset (1M+ cells) for ultimate stress testing.

This method is optimized for generating very large datasets efficiently. Use ‘fast’ or ‘skeleton’ mode for best performance.

Parameters:

n_cells – Number of cells (default: 1M)
n_genes – Number of genes (default: 5K)
density – Density of non-zero elements (default: 0.1%)
chunk_size – Size of chunks for matrix generation
mode – Generation mode: ‘fast’ or ‘skeleton’ recommended for large datasets

create_edge_cases() → scanpy.AnnData

Create an AnnData object with various edge cases and problematic data.

create_with_specific_features(cell_types: List[str] | None = None, conditions: List[str] | None = None, gene_types: List[str] | None = None, n_cells: int = 100, n_genes: int = 200, **kwargs) → scanpy.AnnData

Create AnnData with specific categorical features.

_create_anndata(n_cells: int, n_genes: int, add_missing: bool = False, add_dim_reductions: bool = False, add_layers: bool = False, add_uns: bool = False, sparse_matrix: bool = False, density: float = 0.1, chunk_size: int = 10000, mode: str = 'realistic', cell_types: List[str] | None = None, conditions: List[str] | None = None, gene_types: List[str] | None = None, use_chunked_layers: bool = False, minimal_metadata: bool = False) → scanpy.AnnData

Internal method to create AnnData with specified features.

_create_obs_data(n_cells: int, cell_types: List[str], conditions: List[str], add_missing: bool, minimal_metadata: bool = False) → pandas.DataFrame

Create cell metadata DataFrame.

_create_var_data(n_genes: int, gene_types: List[str], add_missing: bool, minimal_metadata: bool = False) → pandas.DataFrame

Create gene metadata DataFrame.
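A sketch of what mock cell and gene metadata can look like in pandas. The column names (`cell_type`, `condition`, `n_counts`, `gene_type`, `mean_expression`), the 5% missing rate, and the index naming are assumptions for illustration; the factory's actual columns are not listed in this reference. Both functions are hypothetical:

```python
import numpy as np
import pandas as pd


def make_obs_sketch(n_cells, cell_types, conditions, add_missing, seed=0):
    rng = np.random.default_rng(seed)
    obs = pd.DataFrame(
        {
            "cell_type": pd.Categorical(rng.choice(cell_types, size=n_cells)),
            "condition": pd.Categorical(rng.choice(conditions, size=n_cells)),
            "n_counts": rng.poisson(2000, size=n_cells).astype(float),
        },
        index=[f"cell_{i}" for i in range(n_cells)],
    )
    if add_missing:
        # Blank out roughly 5% of rows in a numeric and a categorical column
        mask = rng.random(n_cells) < 0.05
        obs.loc[mask, "n_counts"] = np.nan
        obs.loc[mask, "cell_type"] = np.nan
    return obs


def make_var_sketch(n_genes, gene_types, seed=0):
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "gene_type": pd.Categorical(rng.choice(gene_types, size=n_genes)),
            "mean_expression": rng.gamma(2.0, 1.0, size=n_genes),
        },
        index=[f"gene_{i}" for i in range(n_genes)],
    )
```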

_create_expression_matrix(n_cells: int, n_genes: int, sparse: bool = False, density: float = 0.1, chunk_size: int = 10000, mode: str = 'realistic') → numpy.ndarray | scipy.sparse.spmatrix

Create expression matrix with realistic single-cell data patterns.

Parameters:

n_cells – Number of cells
n_genes – Number of genes
sparse – Whether to create a sparse matrix
density – Density of non-zero elements (0.0 to 1.0). Default 0.1 (10% non-zero)
chunk_size – Size of chunks for large matrix generation
mode – Generation mode: ‘realistic’ (unique indices), ‘fast’ (may have duplicates), or ‘skeleton’ (structure only, no values)

_create_single_sparse_matrix(n_cells: int, n_genes: int, density: float, mode: str) → scipy.sparse.spmatrix

Create a single sparse matrix using the optimized approach.

_create_chunked_sparse_matrix(n_cells: int, n_genes: int, density: float, chunk_size: int, mode: str) → scipy.sparse.spmatrix

Create large sparse matrices using chunked generation.

_create_incremental_csr_matrix(n_cells: int, n_genes: int, density: float, chunk_size: int, mode: str) → scipy.sparse.spmatrix

Create large sparse matrices using incremental CSR construction to save memory.
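Incremental CSR construction can mean filling the three flat CSR arrays (`data`, `indices`, `indptr`) chunk by chunk instead of building coordinate triplets or stacking sub-matrices. A sketch, assuming a fixed number of non-zeros per row; it picks distinct columns per row by taking the positions of the smallest random keys, so each chunk temporarily allocates a `chunk_size × n_genes` key array. `incremental_csr_sketch` is hypothetical:

```python
import numpy as np
import scipy.sparse as sp


def incremental_csr_sketch(n_cells, n_genes, density, chunk_size, seed=0):
    rng = np.random.default_rng(seed)
    if n_cells == 0:
        return sp.csr_matrix((0, n_genes), dtype=np.float32)
    # Assumption: every row gets the same number of non-zeros
    per_row = min(n_genes, max(1, round(n_genes * density)))
    indptr = np.zeros(n_cells + 1, dtype=np.int64)
    data_parts, index_parts = [], []
    for start in range(0, n_cells, chunk_size):
        rows = min(chunk_size, n_cells - start)
        # Distinct columns per row: positions of the per_row smallest random keys
        keys = rng.random((rows, n_genes))
        cols = np.sort(np.argpartition(keys, per_row - 1, axis=1)[:, :per_row], axis=1)
        index_parts.append(cols.ravel())
        data_parts.append(rng.random(rows * per_row, dtype=np.float32))
        # Row pointers continue from where the previous chunk ended
        indptr[start + 1:start + rows + 1] = indptr[start] + per_row * np.arange(1, rows + 1)
    return sp.csr_matrix(
        (np.concatenate(data_parts), np.concatenate(index_parts), indptr),
        shape=(n_cells, n_genes),
    )
```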

_generate_unique_chunk_indices(chunk_cells: int, n_genes: int, nnz: int, row_offset: int) → tuple

Generate unique indices for a chunk.

_generate_fast_chunk_indices(chunk_cells: int, n_genes: int, nnz: int, row_offset: int) → tuple

Generate indices quickly (may have duplicates).
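The difference between the two index generators can be sketched as sampling without replacement over flat positions (no repeated (row, column) pairs) versus drawing rows and columns independently (cheaper, but pairs can collide). Here `row_offset` is assumed to shift chunk-local rows into global row numbers. Both functions are hypothetical stand-ins, not the private methods themselves:

```python
import numpy as np


def unique_chunk_indices_sketch(chunk_cells, n_genes, nnz, row_offset, rng):
    # Sample nnz distinct flat positions, so no (row, col) pair can repeat
    flat = rng.choice(chunk_cells * n_genes, size=nnz, replace=False)
    rows, cols = np.divmod(flat, n_genes)
    return rows + row_offset, cols  # shift rows into global coordinates


def fast_chunk_indices_sketch(chunk_cells, n_genes, nnz, row_offset, rng):
    # Draw rows and columns independently: cheap, but pairs may collide
    rows = rng.integers(0, chunk_cells, size=nnz)
    cols = rng.integers(0, n_genes, size=nnz)
    return rows + row_offset, cols


def count_unique_pairs(rows, cols):
    return len(set(zip(rows.tolist(), cols.tolist())))
```

When colliding indices are passed to a SciPy COO matrix and converted with `tocsr()`, duplicate entries are summed, so the stored non-zero count can come out below the requested `nnz`.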

_create_skeleton_matrix(n_cells: int, n_genes: int, nnz: int) → scipy.sparse.spmatrix

Create a skeleton matrix with structure but no meaningful values.
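A skeleton matrix only needs a valid sparsity pattern. A sketch, assuming a constant placeholder value of 1.0 (the actual placeholder is not documented) and independent row and column draws whose duplicates are collapsed by `tocsr()`; `skeleton_matrix_sketch` is hypothetical:

```python
import numpy as np
import scipy.sparse as sp


def skeleton_matrix_sketch(n_cells, n_genes, nnz, seed=0):
    """Random sparsity pattern with a constant placeholder value."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, n_cells, size=nnz)
    cols = rng.integers(0, n_genes, size=nnz)
    data = np.ones(nnz, dtype=np.float32)
    m = sp.coo_matrix((data, (rows, cols)), shape=(n_cells, n_genes)).tocsr()
    # tocsr() summed any duplicate pairs; reset every entry to the placeholder
    m.data[:] = 1.0
    return m
```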

_create_moderately_sparse_matrix(n_cells: int, n_genes: int, nnz: int, mode: str) → scipy.sparse.spmatrix

Create moderately sparse matrices efficiently with guaranteed unique indices.

_create_dense_sparse_matrix(n_cells: int, n_genes: int, nnz: int, mode: str) → scipy.sparse.spmatrix

Create comparatively dense sparse matrices using a simple approach (duplicates are less likely).

_create_very_sparse_matrix(n_cells: int, n_genes: int, nnz: int, mode: str) → Any

Create very sparse matrices efficiently using COO format.

_generate_realistic_expression_values(n_values: int) → numpy.ndarray

Generate realistic single-cell expression values.

Single-cell data typically follows a negative binomial distribution with many zeros and a long tail of high expression values.
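A common way to draw negative binomial counts with a chosen mean μ and dispersion α is NumPy's `Generator.negative_binomial(n, p)` with n = 1/α and p = n/(n + μ), which gives mean μ and variance μ + αμ². The sketch below uses that parameterization; the function name and the default values are illustrative assumptions, not the module's:

```python
import numpy as np


def nb_expression_values_sketch(n_values, mean=2.0, dispersion=0.5, seed=0):
    """Overdispersed counts: mean `mean`, variance mean + dispersion * mean**2."""
    rng = np.random.default_rng(seed)
    n = 1.0 / dispersion
    p = n / (n + mean)  # NumPy's NB(n, p) has mean n * (1 - p) / p
    return rng.negative_binomial(n, p, size=n_values).astype(np.float32)
```

With these defaults, P(0) = pⁿ = 0.5² = 0.25 and the variance is 4, twice the mean, which produces the long right tail described above.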

_add_dimension_reductions(adata: scanpy.AnnData)

Add dimensionality reductions to the AnnData object.
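For mock data, embeddings only need plausible shapes and some visible structure. A sketch that places one Gaussian blob per cell label, assuming the scanpy-style `obsm` keys `X_pca` and `X_umap` (which keys the real method writes is not stated here). `mock_embeddings_sketch` is hypothetical and computes no actual PCA or UMAP:

```python
import numpy as np


def mock_embeddings_sketch(labels, n_pcs=20, seed=0):
    """Random embeddings with one cluster per label, keyed like adata.obsm."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    uniq, codes = np.unique(labels, return_inverse=True)
    embeddings = {}
    for key, dim in (("X_pca", n_pcs), ("X_umap", 2)):
        # One random centre per label, plus unit Gaussian noise per cell
        centres = rng.normal(scale=10.0, size=(len(uniq), dim))
        noise = rng.standard_normal((len(labels), dim))
        embeddings[key] = (centres[codes] + noise).astype(np.float32)
    return embeddings
```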

_add_layers(adata: scanpy.AnnData, use_chunked_layers: bool)

Add expression layers to the AnnData object.

_add_unstructured_data(adata: scanpy.AnnData)

Add unstructured data to the AnnData object.

_create_edge_case_anndata() → scanpy.AnnData

Create AnnData with various edge cases and problematic data.
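The reference does not enumerate the edge cases. A sketch of cell metadata that commonly stresses converters: an all-NaN column, a single-category column, infinities, empty and non-ASCII strings, a column name containing a slash, and mixed Python types. The selection and the name `edge_case_obs_sketch` are assumptions:

```python
import numpy as np
import pandas as pd


def edge_case_obs_sketch():
    """Six cells of deliberately awkward metadata (hypothetical selection)."""
    return pd.DataFrame(
        {
            "all_missing": np.full(6, np.nan),                # every value NaN
            "constant": ["same"] * 6,                         # single category
            "with_inf": [1.0, np.inf, -np.inf, 0.0, np.nan, 2.5],
            "unicode / spaces": ["α", "β γ", "", "naïve", "名前", "x"],
            "mixed_types": [1, "two", 3.0, None, True, "6"],  # object dtype
        },
        index=[f"cell_{i}" for i in range(6)],
    )
```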

mdvtools.tests.mock_anndata.create_minimal_anndata(n_cells: int = 10, n_genes: int = 5, add_missing: bool = False) → scanpy.AnnData

Create minimal AnnData object for testing (backward compatibility).

This function maintains exact backward compatibility with the original implementation to ensure existing tests continue to pass.

mdvtools.tests.mock_anndata.create_realistic_anndata(n_cells: int = 1000, n_genes: int = 2000, add_missing: bool = True) → scanpy.AnnData

Create realistic AnnData object with typical single-cell data features.

mdvtools.tests.mock_anndata.create_large_anndata(n_cells: int = 10000, n_genes: int = 5000, add_missing: bool = True, density: float = 0.1) → scanpy.AnnData

Create large AnnData object for stress testing.

mdvtools.tests.mock_anndata.create_memory_efficient_large_anndata(n_cells: int = 10000, n_genes: int = 5000, add_missing: bool = True, density: float = 0.1) → scanpy.AnnData

Create memory-efficient large AnnData object for stress testing.

This function creates large datasets without dense layers to avoid excessive memory consumption during stress testing.

mdvtools.tests.mock_anndata.create_massive_anndata(n_cells: int = 100000, n_genes: int = 10000, add_missing: bool = True, density: float = 0.1, chunk_size: int = 10000, mode: str = 'realistic') → scanpy.AnnData

Create massive AnnData object for extreme stress testing.

This function creates datasets with 100k+ cells using chunked operations to handle memory efficiently. Suitable for testing with real-world scale data.

mdvtools.tests.mock_anndata.create_extreme_anndata(n_cells: int = 1000000, n_genes: int = 5000, density: float = 0.001, chunk_size: int = 50000, mode: str = 'fast') → scanpy.AnnData

Create extreme AnnData object for ultimate stress testing.

This function creates datasets with 1M+ cells using optimized chunked operations. Use ‘fast’ or ‘skeleton’ mode for best performance with large datasets.

mdvtools.tests.mock_anndata.create_fast_large_anndata(n_cells: int = 100000, n_genes: int = 5000, density: float = 0.1, chunk_size: int = 10000) → scanpy.AnnData

Create large AnnData object using fast generation mode.

This function prioritizes speed over exactness: the generated matrix may contain duplicate indices. Suitable for stress testing where speed matters more than data quality.

mdvtools.tests.mock_anndata.create_skeleton_anndata(n_cells: int = 100000, n_genes: int = 5000, density: float = 0.1, chunk_size: int = 10000) → scanpy.AnnData

Create large AnnData object with skeleton matrix (structure only).

This function creates a matrix with the correct structure but placeholder values. Fastest option for testing pipeline structure without realistic data.

mdvtools.tests.mock_anndata.create_edge_case_anndata() → scanpy.AnnData

Create AnnData object with various edge cases and problematic data.

mdvtools.tests.mock_anndata.get_anndata_summary(adata: scanpy.AnnData) → Dict[str, Any]

Get a summary of AnnData object properties for testing.

mdvtools.tests.mock_anndata.validate_anndata(adata: scanpy.AnnData) → bool

Validate that AnnData object has expected structure.
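A sketch of what a summary and a structural check might look like, written against a plain matrix plus pandas `obs`/`var` frames so it runs without AnnData. The summary fields and the validation rules (matrix shape matches metadata lengths, unique indices) are assumptions, not the documented behavior of `get_anndata_summary` or `validate_anndata`; both function names below are hypothetical:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def summarize_sketch(X, obs: pd.DataFrame, var: pd.DataFrame) -> dict:
    n_cells, n_genes = X.shape
    # For sparse input, nnz counts stored entries (including explicit zeros)
    nnz = X.nnz if sp.issparse(X) else int(np.count_nonzero(X))
    return {
        "shape": (n_cells, n_genes),
        "sparse": sp.issparse(X),
        "density": nnz / (n_cells * n_genes) if n_cells and n_genes else 0.0,
        "obs_columns": list(obs.columns),
        "var_columns": list(var.columns),
        "obs_missing": int(obs.isna().sum().sum()),
    }


def validate_sketch(X, obs: pd.DataFrame, var: pd.DataFrame) -> bool:
    # Assumed rules: dimensions agree and cell/gene names are unique
    return bool(
        X.shape == (len(obs), len(var))
        and obs.index.is_unique
        and var.index.is_unique
    )
```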