Memory allocation of bytes failed #377

Open
nleroy917 opened this issue Feb 13, 2025 · 0 comments

Hello!

I'm running into a strange memory issue. I'm running snap.pp.scrublet on a large list of adatas, and once I get about 66% of the way through, I start seeing the following errors/warnings:

memory allocation of 1179648 bytes failed
memory allocation of 32768 bytes failed

They happen sporadically, but don't panic or kill the process. The exact code I'm running is:

import os
from glob import glob

import snapatac2 as snap
import pandas as pd
import plotly.io as pio
import numpy as np
from tqdm import tqdm

N_JOBS = 10

# check version of snap
print(snap.__version__)

output_dir = os.path.expandvars("$BRICKYARD/results_analysis/scatlas2/h5ads/")

os.makedirs(output_dir, exist_ok=True)

CELLRANGER_OUTS = os.path.expandvars("$BRICKYARD/results_analysis/scatlas2/fragments/")

# get a list of the fragment files in the data directory
fragment_files = glob(f"{CELLRANGER_OUTS}/*_fragments.tsv.gz")

print(f"Found {len(fragment_files)} fragment files")

outputs = []
for fl in fragment_files:
    name = fl.split("/")[-1].split(".tsv.gz")[0]
    # outputs.append(f"{output_dir}/{name}.h5ad")
    outputs.append(
        os.path.join(output_dir, f"{name}.h5ad")
    )

# def main():
# import the data (process it and save to h5ad files)
adatas = snap.pp.import_fragments(
    fragment_files,
    file=outputs,
    chrom_sizes=snap.genome.hg38,
    min_num_fragments=1000,
    n_jobs=N_JOBS,
    # sorted_by_barcode=False,
    # tempdir=tmp_dir,
)

# checkpoint
# adatas = [
#     snap.read(f) for f in outputs
# ]

# build a 5 kb tile (cell-by-bin) matrix and select informative features
snap.pp.add_tile_matrix(adatas, bin_size=5000, n_jobs=N_JOBS)
snap.pp.select_features(adatas, n_jobs=N_JOBS)

# run Scrublet doublet detection across all samples
snap.pp.scrublet(adatas, n_jobs=N_JOBS)
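
For reference, here is a minimal way to spot-check resident memory while this runs (assuming psutil is installed; it isn't part of the pipeline above):

# minimal memory spot-check (assumes psutil; illustrative, not part of my script)
import psutil
rss_gb = psutil.Process().memory_info().rss / 1e9
print(f"current RSS: {rss_gb:.1f} GB")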

All in all, it's about 150 fragment files, so it's a lot of data. My resources are fairly large, though:

  • 40 cores
  • 256G RAM

I'm curious whether you have any insight. The process continues to run, but it feels like I shouldn't ignore this. What do you think? Thank you!
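
For what it's worth, a workaround I'm considering (untested, and just my assumption about how the backed AnnData files release memory) is to run scrublet one sample at a time from the checkpointed files instead of passing the whole list:

# untested sketch: process one backed AnnData at a time so each file's
# buffers can be freed before the next sample is opened
for path in outputs:
    adata = snap.read(path)   # re-open the checkpointed h5ad
    snap.pp.scrublet(adata)   # doublet detection on this sample only
    adata.close()             # close the file to release its memory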
