Merge branch 'dev' into main
jkobject committed Apr 3, 2024
2 parents ef0ae57 + d954f5e commit c662c9c
Showing 27 changed files with 20,318 additions and 710 deletions.
42 changes: 42 additions & 0 deletions README.md
@@ -55,6 +55,48 @@ then run the notebooks with the poetry installed environment

## Usage

```python
# initialize a local lamin database
# !lamin init --storage ~/scdataloader --schema bionty

import lamindb as ln

from scdataloader import utils
from scdataloader.preprocess import LaminPreprocessor, additional_postprocess, additional_preprocess

# preprocess datasets
DESCRIPTION='preprocessed by scDataLoader'

cx_dataset = ln.Collection.using(instance="laminlabs/cellxgene").filter(name="cellxgene-census", version='2023-12-15').one()
cx_dataset, len(cx_dataset.artifacts.all())


do_preprocess = LaminPreprocessor(additional_postprocess=additional_postprocess, additional_preprocess=additional_preprocess, skip_validate=True, subset_hvg=0)

preprocessed_dataset = do_preprocess(cx_dataset, name=DESCRIPTION, description=DESCRIPTION, start_at=6, version="2")

# create dataloaders
from scdataloader import DataModule
import tqdm

datamodule = DataModule(
    collection_name="preprocessed dataset",
    organisms=["NCBITaxon:9606"],  # organism that we will work on
    how="most expr",  # for the collator (only the most expressed genes will be selected)
    max_len=1000,  # only the 1000 most expressed
    batch_size=64,
    num_workers=1,
    validation_split=0.1,
    test_split=0,
)

for i in tqdm.tqdm(datamodule.train_dataloader()):
    # pass  # or do pass
    print(i)
    break

# with lightning:
# Trainer().fit(model, datamodule=datamodule)

```
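
The `how="most expr"` and `max_len` options above suggest that the collator keeps only the most expressed genes of each cell. A minimal NumPy sketch of that idea follows; it is an illustration under that assumption, not scDataLoader's actual implementation, and `top_expressed` is a hypothetical helper name.

```python
import numpy as np


def top_expressed(counts: np.ndarray, max_len: int):
    """Return (gene_indices, values) of the max_len most expressed genes per cell.

    Hypothetical helper for illustration only; counts has shape (cells, genes).
    """
    k = min(max_len, counts.shape[1])
    # Sorting the negated counts ascending yields descending expression order.
    order = np.argsort(-counts, axis=1, kind="stable")[:, :k]
    values = np.take_along_axis(counts, order, axis=1)
    return order, values


# Toy matrix: 2 cells x 4 genes
counts = np.array([[0, 5, 2, 9], [3, 0, 7, 1]])
idx, vals = top_expressed(counts, max_len=2)
```

Here `idx` holds, for each cell, the column indices of its two most expressed genes, and `vals` holds the matching counts.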

See the notebooks in [docs](https://jkobject.github.io/scDataLoader/):

1. [load a dataset](https://jkobject.github.io/scDataLoader/notebooks/01_load_dataset.html)
24 changes: 24 additions & 0 deletions additional.py
@@ -0,0 +1,24 @@
import scanpy as sc
import pandas as pd

# https://cells.ucsc.edu/hoc/blood/exprMatrix.tsv.gz
# https://cells.ucsc.edu/hoc/blood/meta.tsv


# curl -O https://cf.10xgenomics.com/samples/cell-exp/1.3.0/1M_neurons/1M_neurons_filtered_gene_bc_matrices_h5.h5
# curl -O https://cf.10xgenomics.com/samples/cell-exp/1.3.0/1M_neurons/1M_neurons_reanalyze.csv


# https://ftp.ncbi.nlm.nih.gov/geo/series/GSE247nnn/GSE247719/suppl/GSE247719%5F20240213%5FPanSci%5Fall%5Fcells%5Fadata.h5ad.gz

ad = sc.read_mtx("matrix.mtx.gz")
meta = pd.read_csv("meta.tsv", sep="\t")
ad.var = meta

# 95G /home/ml4ig1/.cache/lamindb/cellxgene-data-public/cell-census/2023-12-15/h5ads/
# 40G /home/ml4ig1/scprint/cell-census/2023-07-25/h5ads/
# 197G /home/ml4ig1/scprint/.lamindb/
# /home/ml4ig1/Documents code/scGPT/mytests/attn_scores_l11.pkl 8G


# /home/ml4ig1/Documents code/scPRINT
38 changes: 0 additions & 38 deletions config/config.py

This file was deleted.

Empty file removed config/config.yml
Empty file.
7 changes: 7 additions & 0 deletions docs/collator.md
@@ -0,0 +1,7 @@
# Documentation for `Collator`

::: scdataloader.collator.Collator
    handler: python

::: scdataloader.collator.AnnDataCollator
    handler: python
4 changes: 0 additions & 4 deletions docs/dataloader.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/datamodule.md
@@ -0,0 +1,4 @@
# Documentation for `DataModule`

::: scdataloader.datamodule.DataModule
    handler: python
3 changes: 3 additions & 0 deletions docs/dataset.md
@@ -1,4 +1,7 @@
# Documentation for `Dataset`

::: scdataloader.data.Dataset
    handler: python

::: scdataloader.data.SimpleAnnDataset
    handler: python
9 changes: 9 additions & 0 deletions docs/preprocess.md
@@ -1,4 +1,13 @@
# Documentation for `Preprocessor`

::: scdataloader.preprocess.Preprocessor
    handler: python

::: scdataloader.preprocess.LaminPreprocessor
    handler: python

::: scdataloader.preprocess.additional_preprocess
    handler: python

::: scdataloader.preprocess.additional_postprocess
    handler: python
21 changes: 10 additions & 11 deletions mkdocs.yml
@@ -1,20 +1,21 @@
site_name: scdataloader
theme:
theme:
name: readthedocs
# analytics:
# gtag: G-ABC123
# gtag: G-ABC123
site_url: https://www.jkobject.com/scdataloader/
nav:
- Home: index.md
- Example notebooks:
- download and preprocess: notebooks/1_download_and_preprocess.ipynb
- use the dataloader: notebooks/2_create_dataloader.ipynb
- download and preprocess: notebooks/1_download_and_preprocess.ipynb
- use the dataloader: notebooks/2_create_dataloader.ipynb
- documentation:
- dataset: dataset.md
- preprocess: preprocess.md
- utils: utils.md
- dataloader: dataloader.md
plugins:
- dataset: dataset.md
- preprocess: preprocess.md
- utils: utils.md
- datamodule: datamodule.md
- collator: collator.md
plugins:
- search
- mkdocstrings:
handlers:
@@ -33,5 +34,3 @@ plugins:
- mkdocs-jupyter:
include_source: True
include_requirejs: true

