scale function(_get_mean_var) updated for dense array, speedup upto ~4.65x #3099

ashish615 · 2024-06-05T09:26:31Z

Hi,
We are submitting a PR to speed up the `_get_mean_var` function.

Version	Time (s)
Original	18.49
Updated	3.97
Speedup	4.66× (18.49 / 3.97)

Experiment setup: AWS r7i.24xlarge

import os
import time
import warnings

import numpy as np
import scanpy as sc
import wget  # third-party "wget" package from PyPI

warnings.filterwarnings('ignore', 'Expected ')
warnings.simplefilter('ignore')
input_file = "./1M_brain_cells_10X.sparse.h5ad"

if not os.path.exists(input_file):
    print('Downloading input file...')
    wget.download('https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/1M_brain_cells_10X.sparse.h5ad', input_file)


# marker genes
MITO_GENE_PREFIX = "mt-" # Prefix for mitochondrial genes to regress out
markers = ["Stmn2", "Hes1", "Olig1"] # Marker genes for visualization

# filtering cells
min_genes_per_cell = 200 # Filter out cells with fewer genes than this expressed
max_genes_per_cell = 6000 # Filter out cells with more genes than this expressed

# filtering genes
min_cells_per_gene = 1 # Filter out genes expressed in fewer cells than this
n_top_genes = 4000 # Number of highly variable genes to retain

# PCA
n_components = 50 # Number of principal components to compute

# t-SNE
tsne_n_pcs = 20 # Number of principal components to use for t-SNE

# k-means
k = 35 # Number of clusters for k-means

# Gene ranking

ranking_n_top_genes = 50 # Number of differential genes to compute for each cluster

# Number of parallel jobs
sc.settings.n_jobs = os.cpu_count()

start=time.time()
tr=time.time()
adata = sc.read(input_file)
adata.var_names_make_unique()
print(adata.shape)
print("Total read time : %s" % (time.time()-tr))



tr=time.time()
# To reduce the number of cells:
USE_FIRST_N_CELLS = 1300000
adata = adata[0:USE_FIRST_N_CELLS]
print(adata.shape)

sc.pp.filter_cells(adata, min_genes=min_genes_per_cell)
sc.pp.filter_cells(adata, max_genes=max_genes_per_cell)
sc.pp.filter_genes(adata, min_cells=min_cells_per_gene)
sc.pp.normalize_total(adata, target_sum=1e4)
print("Total filter and normalize time : %s" % (time.time()-tr))


tr=time.time()
sc.pp.log1p(adata)
print("Total log time : %s" % (time.time()-tr))


# Select highly variable genes
sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, flavor = "cell_ranger")

# Retain marker gene expression
for marker in markers:
    adata.obs[marker + "_raw"] = adata.X[:, adata.var.index == marker].toarray().ravel()

# Filter matrix to only variable genes
adata = adata[:, adata.var.highly_variable]

ts=time.time()
# Regress out confounding factors (number of counts, mitochondrial gene expression)
mito_genes = adata.var_names.str.startswith(MITO_GENE_PREFIX)
# .sum(axis=1) on a sparse matrix returns an (n_cells, 1) matrix; flatten it to 1D
n_counts = np.asarray(adata.X.sum(axis=1)).ravel()
adata.obs['percent_mito'] = np.asarray(adata[:, mito_genes].X.sum(axis=1)).ravel() / n_counts
adata.obs['n_counts'] = n_counts


sc.pp.regress_out(adata, ['n_counts', 'percent_mito'])
print("Total regress out time : %s" % (time.time()-ts))

#scale

ts=time.time()
sc.pp.scale(adata)
print("Total scale time : %s" % (time.time()-ts))

To time the kernel itself, add a timer around the `_get_mean_var` call in `scanpy/scanpy/preprocessing/_scale.py` (line 167 at commit 706d4ef):

mean, var = _get_mean_var(X)
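A minimal way to time only the mean/variance step, sketched with a hypothetical `timed` context manager built on `time.perf_counter`. The NumPy call inside the block is a stand-in for `_get_mean_var(X)`, which is private scanpy API and is deliberately not imported here.

```python
import time
from contextlib import contextmanager

import numpy as np


@contextmanager
def timed(label, results):
    """Hypothetical helper: store the wall-clock seconds of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
X = np.random.default_rng(0).random((2_000, 500))

with timed("mean_var", timings):
    # Stand-in for `mean, var = _get_mean_var(X)` in scale(); NumPy is used
    # so the snippet runs without touching scanpy's private API.
    mean, var = X.mean(axis=0), X.var(axis=0, ddof=1)

print(f"mean/var time: {timings['mean_var']:.4f} s")
```

Using `perf_counter` rather than `time.time()` gives a monotonic, high-resolution clock, which matters for sub-second kernels.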

We could also add a `_get_mean_var_std` function that returns the standard deviation as well, so that the `scale` function would not need to compute it separately (lines 168-169).
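A rough sketch of what the proposed `_get_mean_var_std` could look like for a dense 2D array. This is not the PR's code. Replacing zero standard deviations with 1 is an assumption made here so that the subsequent division is always safe.

```python
import numpy as np


def _get_mean_var_std(X, axis=0):
    """Sketch of the proposed helper: mean, variance (ddof=1) and std in one call.

    Assumption for illustration: zero standard deviations are replaced by 1 so a
    later (X - mean) / std never divides by zero. Expects a 2D dense array.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=axis)
    var = X.var(axis=axis, ddof=1)  # R convention: divide by n - 1
    std = np.sqrt(var)
    std[std == 0] = 1.0  # constant features would otherwise divide by zero
    return mean, var, std


X = np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
mean, var, std = _get_mean_var_std(X)
scaled = (X - mean) / std  # first column becomes [-1, 0, 1], second stays 0
```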

codecov · 2024-06-05T09:51:36Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.31%. Comparing base (896e249) to head (7a1a62e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3099      +/-   ##
==========================================
- Coverage   76.31%   76.31%   -0.01%     
==========================================
  Files         109      109              
  Lines       12513    12516       +3     
==========================================
+ Hits         9549     9551       +2     
- Misses       2964     2965       +1

Files	Coverage Δ
src/scanpy/preprocessing/_utils.py	`95.12% <100.00%> (-2.25%)`	⬇️

scverse-benchmark · 2024-06-17T12:00:23Z

Benchmark changes

Change	Before [`ad657ed`]	After [`e7a4662`]	Ratio	Benchmark (Parameter)
+	259M	310M	1.2	preprocessing_log.FastSuite.peakmem_mean_var('pbmc68k_reduced')
+	1.16±0.04ms	1.97±0.5ms	1.69	preprocessing_log.FastSuite.time_mean_var('pbmc68k_reduced')
+	255M	315M	1.23	preprocessing_log.peakmem_highly_variable_genes('pbmc68k_reduced')
-	373M	322M	0.86	preprocessing_log.peakmem_pca('pbmc68k_reduced')
-	1.03G	779M	0.76	preprocessing_log.peakmem_scale('pbmc3k')
-	729±5ms	517±5ms	0.71	preprocessing_log.time_scale('pbmc3k')

Comparison: https://github.com/scverse/scanpy/compare/ad657edfb52e9957b9a93b3a16fc8a87852f3f09..e7a466265b08f6973a5cf3fecfc27879104c02f4
Last changed: Tue, 18 Jun 2024 18:39:49 +0000

More details: https://github.com/scverse/scanpy/pull/3099/checks?check_run_id=26384736173

Intron7 · 2024-06-20T14:26:25Z

I have some small improvements I'd like to add next week to improve precision for larger matrices.
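The comment doesn't say which improvements are planned. As a general illustration only, and not Intron7's actual change: once running sums become large, the accumulator dtype matters. This is one reason the dense kernel quoted further down allocates its `sums`/`sums_squared` buffers as `float64`. The example adds the same float32 values sequentially with each accumulator type.

```python
import numpy as np

# 10 million copies of 0.1 stored as float32 (a common single-cell dtype)
values = np.full(10_000_000, 0.1, dtype=np.float32)

# cumsum adds strictly left to right, like a naive accumulation loop
total_f32 = np.cumsum(values, dtype=np.float32)[-1]  # every partial sum rounded to float32
total_f64 = np.cumsum(values, dtype=np.float64)[-1]  # partial sums kept in float64

print(total_f32)  # drifts far from 1e6
print(total_f64)  # stays close to 1e6
```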

src/scanpy/preprocessing/_utils.py

Co-authored-by: Severin Dicks <[email protected]>


Intron7 · 2024-06-26T10:26:16Z

@ashish615 After doing some benchmarking myself, I found that your solution for axis=1 underperforms compared to axis=0 on larger arrays. I think that is because of the memory access pattern you chose. I rewrote the function with that in mind. I'll open another PR against your branch, because for some reason you don't allow maintainers to make changes to your PR.
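The axis=0 scheme in the numba kernel quoted further down works as follows. Each thread accumulates a private row of partial sums over a strided subset of the rows, and those partials are reduced at the end. For a C-ordered `X`, the kernel's inner loop over columns reads each row contiguously. Below is an illustrative plain-NumPy version for checking the arithmetic. The function name is hypothetical, and the "threads" run one after another rather than in parallel.

```python
import numpy as np


def mean_var_axis0_strided(X, n_threads=4):
    """Column means and ddof=1 variances via per-worker partial sums.

    Worker i owns row i of the (n_threads, n_cols) buffers and accumulates
    rows i, i + n_threads, i + 2 * n_threads, ... of X. Each worker writes only
    its own buffer row, so no synchronization is needed; the partials are
    reduced at the end. Workers run sequentially in this sketch.
    """
    n_rows, n_cols = X.shape
    sums = np.zeros((n_threads, n_cols), dtype=np.float64)
    sums_squared = np.zeros((n_threads, n_cols), dtype=np.float64)
    for i in range(n_threads):  # numba.prange(n_threads) in the real kernel
        rows = X[i::n_threads].astype(np.float64)  # strided subset of rows
        sums[i] = rows.sum(axis=0)
        sums_squared[i] = (rows * rows).sum(axis=0)
    total = sums.sum(axis=0)
    mean = total / n_rows
    var = (sums_squared.sum(axis=0) - total * total / n_rows) / (n_rows - 1)
    return mean, var


X = np.random.default_rng(0).random((101, 7)).astype(np.float32)
for n_threads in (1, 3, 8):  # n_threads=1 degenerates to one sequential pass
    mean, var = mean_var_axis0_strided(X, n_threads)
    assert np.allclose(mean, X.mean(axis=0, dtype=np.float64))
    assert np.allclose(var, X.var(axis=0, ddof=1, dtype=np.float64))
```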

Intron7

Please merge IntelLabs#2

Scale mean variance

docs/release-notes/1.10.2.md

Intron7

This fixes the issues

src/scanpy/preprocessing/_utils.py

remove casting to match previous behavior Co-authored-by: Severin Dicks <[email protected]>

Intron7

Looks good to me

flying-sheep

I don’t see the claimed speedup in the benchmarks. What’s missing?

Also, is `numba.get_num_threads()` safe? For example, I think `_get_mean_var` is also called on each dask chunk. Will `numba.get_num_threads()` return a reasonable number in that case?

Otherwise nice! I’m not a huge fan of how unpythonic numba code looks, but I don’t think anything can be done about that.

flying-sheep · 2024-06-27T12:47:12Z

src/scanpy/preprocessing/_utils.py

+        # enforce R convention (unbiased estimator) for variance
+        var *= X.shape[axis] / (X.shape[axis] - 1)


Before your change, this line ran unconditionally; now it only runs in the `not isinstance(X, np.ndarray)` case. Is that intentional? If so, please mention it in `_compute_mean_var`’s docstring.
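For context, the flagged line converts a population variance (divide by n) into the unbiased sample variance (divide by n - 1). The dense kernel already divides by `n - 1` itself, so applying the factor again would double-correct. That is presumably why the line moved into the non-ndarray branch. A small NumPy check of the identity:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.shape[0]

biased = x.var()                  # divides by n (NumPy default, ddof=0): 4.0
corrected = biased * n / (n - 1)  # the line under review
unbiased = x.var(ddof=1)          # divides by n - 1, like R's var(): 32 / 7

assert np.isclose(corrected, unbiased)
```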

flying-sheep · 2024-06-27T12:48:09Z

src/scanpy/preprocessing/_utils.py

+
+
+@numba.njit(cache=True, parallel=True)
+def _compute_mean_var(


We already have _get_mean_var. Maybe rename this to _get_mean_var_ndarray or _get_mean_var_dense?

I think we can rename the kernel

flying-sheep · 2024-06-27T12:48:16Z

src/scanpy/preprocessing/_utils.py

+
+@numba.njit(cache=True, parallel=True)
+def _compute_mean_var(
+    X: _SupportedArray, axis: Literal[0, 1] = 0, n_threads=1


Suggested change:

-    X: _SupportedArray, axis: Literal[0, 1] = 0, n_threads=1
+    X: _SupportedArray, axis: Literal[0, 1] = 0, n_threads: int = 1

flying-sheep · 2024-06-27T12:50:58Z

src/scanpy/preprocessing/_utils.py

+    if axis == 0:
+        axis_i = 1
+        sums = np.zeros((n_threads, X.shape[axis_i]), dtype=np.float64)
+        sums_squared = np.zeros((n_threads, X.shape[axis_i]), dtype=np.float64)
+        mean = np.zeros(X.shape[axis_i], dtype=np.float64)
+        var = np.zeros(X.shape[axis_i], dtype=np.float64)
+        n = X.shape[axis]
+        for i in numba.prange(n_threads):
+            for r in range(i, n, n_threads):
+                for c in range(X.shape[axis_i]):
+                    value = X[r, c]
+                    sums[i, c] += value
+                    sums_squared[i, c] += value * value
+        for c in numba.prange(X.shape[axis_i]):
+            sum_ = sums[:, c].sum()
+            mean[c] = sum_ / n
+            var[c] = (sums_squared[:, c].sum() - sum_ * sum_ / n) / (n - 1)
+    else:
+        axis_i = 0
+        mean = np.zeros(X.shape[axis_i], dtype=np.float64)
+        var = np.zeros(X.shape[axis_i], dtype=np.float64)
+        for r in numba.prange(X.shape[0]):
+            for c in range(X.shape[1]):
+                value = X[r, c]
+                mean[r] += value
+                var[r] += value * value
+        for c in numba.prange(X.shape[0]):
+            mean[c] = mean[c] / X.shape[1]
+            var[c] = (var[c] - X.shape[1] * mean[c] ** 2) / (X.shape[1] - 1)


Please don’t duplicate identical lines.

Suggested change (replaces the block above):

+    axis_i = 1 - axis
+    mean = np.zeros(X.shape[axis_i], dtype=np.float64)
+    var = np.zeros(X.shape[axis_i], dtype=np.float64)
+    if axis == 0:
+        sums = np.zeros((n_threads, X.shape[axis_i]), dtype=np.float64)
+        sums_squared = np.zeros((n_threads, X.shape[axis_i]), dtype=np.float64)
+        n = X.shape[axis]
+        for i in numba.prange(n_threads):
+            for r in range(i, n, n_threads):
+                for c in range(X.shape[axis_i]):
+                    value = X[r, c]
+                    sums[i, c] += value
+                    sums_squared[i, c] += value * value
+        for c in numba.prange(X.shape[axis_i]):
+            sum_ = sums[:, c].sum()
+            mean[c] = sum_ / n
+            var[c] = (sums_squared[:, c].sum() - sum_ * sum_ / n) / (n - 1)
+    else:
+        for r in numba.prange(X.shape[0]):
+            for c in range(X.shape[1]):
+                value = X[r, c]
+                mean[r] += value
+                var[r] += value * value
+        for c in numba.prange(X.shape[0]):
+            mean[c] = mean[c] / X.shape[1]
+            var[c] = (var[c] - X.shape[1] * mean[c] ** 2) / (X.shape[1] - 1)

I think we can slim this down a bit. The two different loops need to stay separate, though.
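Whichever layout is kept, both branches should agree with NumPy. A vectorized reference helper (the name is hypothetical) is a cheap way to test them. It uses the same sum / sum-of-squares formulas as the kernel. Note that once `mean` has been divided by `n`, the sum of squares must be reduced by `n * mean**2`, not `mean**2`, to match `np.var(..., ddof=1)`.

```python
import numpy as np


def reference_mean_var(X, axis=0):
    """Hypothetical test helper: single-pass mean and ddof=1 variance.

    Vectorized version of the kernel's sum / sum-of-squares formulas, so both
    the axis=0 and axis=1 branches can be checked against NumPy.
    """
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[axis]
    sums = X.sum(axis=axis)
    sums_squared = (X * X).sum(axis=axis)
    mean = sums / n
    # sum(x^2) - n * mean^2 equals sum(x^2) - sum^2 / n; the factor n is required
    var = (sums_squared - n * mean**2) / (n - 1)
    return mean, var


X = np.random.default_rng(1).random((50, 13))
for axis in (0, 1):
    mean, var = reference_mean_var(X, axis)
    assert np.allclose(mean, X.mean(axis=axis))
    assert np.allclose(var, X.var(axis=axis, ddof=1))
```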

Intron7 · 2024-06-27T13:06:08Z

The function should also work with a single thread. `numba.get_num_threads()` is fine; it works well with the sparse arrays. But I have no experience with it inside dask.

ilan-gold · 2024-06-28T09:01:01Z

src/scanpy/preprocessing/_utils.py

+
+@numba.njit(cache=True, parallel=True)
+def _compute_mean_var(
+    X: _SupportedArray, axis: Literal[0, 1] = 0, n_threads=1


I don't think _SupportedArray is the right type annotation here. This doesn't run directly on dask.Array, unless I am misunderstanding something.

ashish615 added 4 commits June 5, 2024 08:40

_scale function updated, now speedup ~4.5x

65e7951

Merge branch 'scverse:main' into scale-mean-variance

6d1f278

print statement removed

37ef6ac

Merge branch 'main' into scale-mean-variance

34da6fb

ashish615 changed the title ~~scale function updated for dense array, speedup upto ~4.65x~~ scale function(_get_mean_var) updated for dense array, speedup upto ~4.65x Jun 5, 2024

Zethson requested a review from Intron7 June 13, 2024 13:16

Zethson added this to the 1.10.2 milestone Jun 16, 2024

flying-sheep added the benchmark label Jun 17, 2024

ashish615 added 2 commits June 18, 2024 16:48

Merge branch 'main' into scale-mean-variance

f080c7f

_get_mean_var updated

e7a4662

ilan-gold modified the milestones: 1.10.2, 1.10.3 Jun 25, 2024

Intron7 reviewed Jun 26, 2024

src/scanpy/preprocessing/_utils.py (outdated)

Intron7 reviewed Jun 26, 2024

src/scanpy/preprocessing/_utils.py (outdated)

Apply suggestions from code review for commented line

b6b6139

Co-authored-by: Severin Dicks <[email protected]>

Intron7 reviewed Jun 26, 2024

src/scanpy/preprocessing/_utils.py (outdated)

Intron7 reviewed Jun 26, 2024

src/scanpy/preprocessing/_utils.py (outdated)

Apply suggestions from code review

06c0968

Co-authored-by: Severin Dicks <[email protected]>

Intron7 added 2 commits June 26, 2024 12:54

updates Intelkernel to work for axis= 1

d244924

adds supportedarray

53d40dd

Intron7 self-requested a review June 26, 2024 11:09

Intron7 requested changes Jun 26, 2024

Merge pull request #2 from scverse/scale-mean-variance

63404f2

Scale mean variance

Intron7 requested changes Jun 26, 2024

docs/release-notes/1.10.2.md (outdated)

ashish615 added 2 commits June 26, 2024 12:01

Merge branch 'main' into scale-mean-variance

9254a97

release notes updated for 1.10.3

357bfbc

Intron7 self-requested a review June 26, 2024 14:02

Intron7 requested changes Jun 26, 2024

src/scanpy/preprocessing/_utils.py (outdated)

src/scanpy/preprocessing/_utils.py (outdated)

ilan-gold reviewed Jun 26, 2024

src/scanpy/preprocessing/_utils.py (outdated)

Apply suggestions from code review

7a1a62e

remove casting to match previous behavior Co-authored-by: Severin Dicks <[email protected]>

Intron7 approved these changes Jun 26, 2024

Intron7 requested review from ilan-gold and flying-sheep June 26, 2024 23:19

flying-sheep requested changes Jun 27, 2024

ilan-gold reviewed Jun 28, 2024

ilan-gold assigned Intron7 Jul 2, 2024
