
AssertionError in prune2df #132

Closed
Matthias3033 opened this issue Feb 9, 2020 · 10 comments
@Matthias3033

Hi,

I get the following error message when I use the function prune2df:

AssertionError Traceback (most recent call last)
<ipython-input-...> in <module>
3 # Calculate a list of enriched motifs and the corresponding target genes for all modules.
4 with ProgressBar():
----> 5 df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME_HS)
6
7 # Create regulons from this table of enriched motifs.

~/miniconda3/lib/python3.7/site-packages/pyscenic/prune.py in prune2df(rnkdbs, modules, motif_annotations_fname, rank_threshold, auc_threshold, nes_threshold, motif_similarity_fdr, orthologuous_identity_threshold, weighted_recovery, client_or_address, num_workers, module_chunksize, filter_for_annotation)
349 return _distributed_calc(rnkdbs, modules, motif_annotations_fname, transformation_func, aggregation_func,
350 motif_similarity_fdr, orthologuous_identity_threshold, client_or_address,
--> 351 num_workers, module_chunksize)
352
353

~/miniconda3/lib/python3.7/site-packages/pyscenic/prune.py in _distributed_calc(rnkdbs, modules, motif_annotations_fname, transform_func, aggregate_func, motif_similarity_fdr, orthologuous_identity_threshold, client_or_address, num_workers, module_chunksize)
298 if client_or_address == "dask_multiprocessing":
299 # ... via multiprocessing.
--> 300 return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
301 else:
302 # ... via dask.distributed framework.

~/miniconda3/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
154 dask.base.compute
155 """
--> 156 (result,) = compute(self, traverse=False, **kwargs)
157 return result
158

~/miniconda3/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
395 keys = [x.dask_keys() for x in collections]
396 postcomputes = [x.dask_postcompute() for x in collections]
--> 397 results = schedule(dsk, keys, **kwargs)
398 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
399

~/miniconda3/lib/python3.7/site-packages/dask/multiprocessing.py in get(dsk, keys, num_workers, func_loads, func_dumps, optimize_graph, **kwargs)
190 get_id=_process_get_id, dumps=dumps, loads=loads,
191 pack_exception=pack_exception,
--> 192 raise_exception=reraise, **kwargs)
193 finally:
194 if cleanup:

~/miniconda3/lib/python3.7/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
499 _execute_task(task, data) # Re-execute locally
500 else:
--> 501 raise_exception(exc, tb)
502 res, worker_id = loads(res_info)
503 state['cache'][key] = res

~/miniconda3/lib/python3.7/site-packages/dask/compatibility.py in reraise(exc, tb)
110 if exc.traceback is not tb:
111 raise exc.with_traceback(tb)
--> 112 raise exc
113
114 else:

~/miniconda3/lib/python3.7/site-packages/dask/local.py in execute_task()
270 try:
271 task, data = loads(task_info)
--> 272 result = _execute_task(task, data)
273 id = get_id()
274 result = dumps((result, id))

~/miniconda3/lib/python3.7/site-packages/dask/local.py in _execute_task()
250 elif istask(arg):
251 func, args = arg[0], arg[1:]
--> 252 args2 = [_execute_task(a, cache) for a in args]
253 return func(*args2)
254 elif not ishashable(arg):

~/miniconda3/lib/python3.7/site-packages/dask/local.py in <listcomp>()
250 elif istask(arg):
251 func, args = arg[0], arg[1:]
--> 252 args2 = [_execute_task(a, cache) for a in args]
253 return func(*args2)
254 elif not ishashable(arg):

~/miniconda3/lib/python3.7/site-packages/dask/local.py in _execute_task()
251 func, args = arg[0], arg[1:]
252 args2 = [_execute_task(a, cache) for a in args]
--> 253 return func(*args2)
254 elif not ishashable(arg):
255 return arg

~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in modules2df()
229 #TODO: Remove this restriction.
230 return pd.concat([module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func)
--> 231 for module in modules])
232
233

~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in <listcomp>()
229 #TODO: Remove this restriction.
230 return pd.concat([module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func)
--> 231 for module in modules])
232
233

~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in module2df()
183 try:
184 df_annotated_features, rccs, rankings, genes, avg2stdrcc = module2features_func(db, module, motif_annotations,
--> 185 weighted_recovery=weighted_recovery)
186 except MemoryError:
186     LOGGER.error("Unable to process \"{}\" on database \"{}\" because ran out of memory. Stacktrace:".format(module.name, db.name))

~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in module2features_auc1st_impl()
127 # Calculate recovery curves, AUC and NES values.
128 # For fast unweighted implementation set weights to None.
--> 129 aucs = calc_aucs(df, db.total_genes, weights, auc_threshold)
130 ness = (aucs - aucs.mean()) / aucs.std()
131

~/miniconda3/lib/python3.7/site-packages/pyscenic/recovery.py in aucs()
282 # for calculating the maximum AUC.
283 maxauc = float((rank_cutoff+1) * y_max)
--> 284 assert maxauc > 0
285 return auc2d(rankings, weights, rank_cutoff, maxauc)

AssertionError:

As ranking databases I use the Homo sapiens ones. I do not get this error when using the Mus musculus databases for another data set. The error mentioned in issue #85 is not the problem here. Does anyone have an idea how to fix this error?
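For context, the failing assertion reduces to the check below (a minimal sketch reconstructed from the traceback; it assumes, as in the pyscenic source, that y_max is the sum of the per-gene recovery weights). If none of a module's genes are found in the ranking database, the weights are empty, y_max is 0, and the assertion fires:

import numpy as np

# Sketch of the check that fails in pyscenic.recovery.aucs().
weights = np.array([])                     # no module genes matched the database
y_max = weights.sum()                      # 0.0 when there is no overlap
rank_cutoff = 1500                         # illustrative value only
maxauc = float((rank_cutoff + 1) * y_max)  # 0.0
assert maxauc > 0                          # raises AssertionError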

@cflerin
Contributor

cflerin commented Feb 10, 2020

Hi @Matthias3033,

Can you list the databases you are using here? From the error, it sounds like no genes in the database overlap with your data.
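A quick way to check this (a minimal sketch; it assumes your expression matrix is a pandas DataFrame named ex_matrix with genes as columns, and uses the genes property of the ranking databases):

# Count how many genes from the expression matrix appear in each ranking database.
for db in dbs:
    overlap = set(ex_matrix.columns) & set(db.genes)
    print('{}: {} of {} genes found'.format(db.name, len(overlap), len(ex_matrix.columns)))

If the overlap is (close to) zero, the gene identifiers probably don't match, e.g. Ensembl IDs in the data versus HGNC symbols in the hg19 databases.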

@Matthias3033
Author

Hi @cflerin,

These are the databases that I use:
FeatherRankingDatabase(name="hg19-tss-centered-10kb-10species.mc9nr"),
FeatherRankingDatabase(name="hg19-tss-centered-10kb-7species.mc9nr"),
FeatherRankingDatabase(name="hg19-tss-centered-5kb-10species.mc9nr"),
FeatherRankingDatabase(name="hg19-500bp-upstream-7species.mc9nr"),
FeatherRankingDatabase(name="hg19-tss-centered-5kb-7species.mc9nr"),
FeatherRankingDatabase(name="hg19-500bp-upstream-10species.mc9nr")

@cflerin
Contributor

cflerin commented Feb 10, 2020

The databases look fine (there's no need to use the 7-species databases when also using the 10-species ones, but it won't cause issues). Are you also using the correct motif annotations file (for human)? How many genes are in your expression matrix? And how many modules do you have?

@Matthias3033
Author

I am using the correct motif file. The number of genes is 17,098. How do I get the number of modules? (len(modules) gives 4996.)

@cflerin
Contributor

cflerin commented Feb 10, 2020

Just noticed:

186 except MemoryError:
187 LOGGER.error("Unable to process \"{}\" on database \"{}\" because ran out of memory.

which seems self-explanatory. You could try taking out the three 7-species databases and see if it works with the remaining databases.
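For example (a sketch, assuming dbs is the list of FeatherRankingDatabase objects shown earlier):

# Keep only the 10-species databases.
dbs = [db for db in dbs if '7species' not in db.name]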

@Matthias3033
Author

Matthias3033 commented Feb 10, 2020

Same error. I also tried it with only one 7-species database and still got the same error.

@cflerin
Contributor

cflerin commented Feb 11, 2020

How much memory do you have available on your machine? You could try reducing the number of processes that pyscenic is using...

@Matthias3033
Author

Matthias3033 commented Feb 11, 2020

How can I reduce the number of processes?

@bramvds
Contributor

bramvds commented Feb 11, 2020

Via the CLI you have the parameter --num_workers N, where N specifies the number of cores to use. A similar parameter is available when using the API from a Jupyter notebook.

For the prune2df function (cisTarget step) the parameter name is num_workers. For grnboost, I kindly refer you to the arboreto package documentation: https://github.com/tmoerman/arboreto . Briefly, you need to use a construct like this:

from pyscenic.prune import _prepare_client
from arboreto.algo import grnboost2

# Create a local dask client with a fixed number of workers.
client, shutdown_callback = _prepare_client('local_host', num_workers=12)
# Run GRNBoost2 against that client.
network = grnboost2(expression_data=ex_mtx, tf_names=tf_names, verbose=True, client_or_address=client)
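For the notebook workflow in this issue, the same parameter can be passed directly to prune2df (its signature in the traceback above includes num_workers), for example:

with ProgressBar():
    # Restrict the dask multiprocessing scheduler to 4 workers to lower peak memory use.
    df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME_HS, num_workers=4)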

@Matthias3033
Author

Matthias3033 commented Feb 11, 2020

How much memory do you have available on your machine? You could try reducing the number of processes that pyscenic is using...

I have 120 GB of RAM available, so memory should normally not be a problem.

cflerin added a commit that referenced this issue on Jun 3, 2020:
- Previously such modules would cause an error; now these modules are skipped.
- Related to #158, #177, #132, #85

cflerin added a commit that referenced this issue on Jul 17, 2020:
- Previously such modules would cause an error; now these modules are skipped.
- Related to #158, #177, #132, #85
cflerin mentioned this issue on Jul 17, 2020 (merged).