Prepro rework #154

Merged
merged 209 commits into from
Sep 29, 2023
Commits
67e9682
fixed backend bug that led to errors when a project contains no docu…
floschne Jun 22, 2023
fb73b1f
fixed backend bug that wrongly excluded password serialization from p…
floschne Jun 22, 2023
fa6c600
black formatting
floschne Jun 22, 2023
f079f95
minor changes and formatting
floschne May 16, 2023
4aa47f5
implemented crud methods for preprocessing jobs in redis service
floschne May 16, 2023
a537eec
fixed optional bug for project id parameter to get all crawler jobs i…
floschne May 16, 2023
280287b
fixed typo in var name for crawler job in redis service
floschne May 16, 2023
b1d81dd
fixed copy paste typo in crawler job
floschne May 16, 2023
7549be4
implemented endpoints for prepro jobs
floschne Jun 22, 2023
7a221e1
added exception handler for files that are already in the backend repo
floschne Jun 23, 2023
15e113e
fixed import order in backend api
floschne Jun 23, 2023
12b0a4f
changed payload dict to list in dto
floschne Jun 23, 2023
8fefb77
creating and returning a preprocessing job when importing documents t…
floschne Jun 23, 2023
2f31f6c
implemented first sketch of prepro service
floschne Jun 23, 2023
7053af7
using prepro service in project endpoint
floschne Jun 23, 2023
641ad52
added project id to prepro job payload and formatted code with black
floschne Jun 23, 2023
63203d4
added status to prepro payload and added init status
floschne Jun 23, 2023
7573d14
implemented method to get an url for a file in the repo
floschne Jun 27, 2023
d2fc913
removed unnecessary exposed ports and tidied up the .env files
floschne Jul 4, 2023
7edb828
fixed prepro endpoints
floschne Jul 5, 2023
e6e1fe6
implemented method to update single payloads in a PPJ
floschne Jul 5, 2023
b3fa075
excluding none values when updating PPJs
floschne Jul 5, 2023
a5e8252
isort and black
floschne Aug 7, 2023
b098694
renamed prepro payload file path to file name
floschne Aug 8, 2023
1230a31
minor commit to improve log
floschne Aug 10, 2023
982582a
flushing redis when resetting data in startup
floschne Aug 10, 2023
4fbe35b
renamed file path to file name in prepro pipeline functions
floschne Aug 10, 2023
5e83902
fixed filename type to str
floschne Aug 10, 2023
78f5b71
removed old code to import uploaded documents
floschne Aug 10, 2023
01ca04b
updated the interface of prepro methods to accept prepro job payloads
floschne Aug 10, 2023
ba73b58
added multiprocess as backend dep and updated transformers dep to 4.31.0
floschne Aug 17, 2023
9779154
updated backend deps
floschne Aug 17, 2023
7b60e12
isort and black
floschne Aug 17, 2023
df334d5
renamed PreprocessingJobStatus to Status
floschne Aug 17, 2023
e3d61c2
added current step, status, and error message to prepro job payload
floschne Aug 17, 2023
5c9f8c0
added method to create and update dto for PPJ status updates
floschne Aug 17, 2023
20fab16
minor fix when updating PPJ Payloads
floschne Aug 17, 2023
fc3a2a0
implemented pipeline cargo model
floschne Aug 17, 2023
3a117f6
implemented pipeline step model
floschne Aug 17, 2023
03f3e17
implemented preprocessing pipeline
floschne Aug 17, 2023
29aa083
updated prepro service to work with prepro pipelines
floschne Aug 17, 2023
9dc9af7
added more info to the preprojob item in frontend
floschne Sep 27, 2023
5485151
added spacy model download scripts to heavy jobs worker entrypoint sc…
floschne Aug 17, 2023
aecb2fb
updated spacy models in config files
floschne Aug 17, 2023
bcf1535
added lru cache decorators to methods that load simsearch models
floschne Aug 17, 2023
ac42b19
preprodoc base
floschne Aug 17, 2023
e0147ce
models for text pipeline
floschne Aug 17, 2023
7586471
implemented steps for text pipeline
floschne Aug 17, 2023
aa2093a
implemented build text pipeline method
floschne Aug 17, 2023
6470c3e
improved frontend husky precommit hook to only run on frontend changes
floschne Aug 17, 2023
a9bb8d8
models for audio pipeline
floschne Aug 17, 2023
d751412
steps for audio pipeline
floschne Aug 17, 2023
635a790
implemented build audio pipeline method
floschne Aug 17, 2023
5da9697
models for video pipeline
floschne Aug 17, 2023
962c085
steps for video pipeline
floschne Aug 17, 2023
cba85f8
implemented build video pipeline method
floschne Aug 17, 2023
cfdf54e
models for image pipeline
floschne Aug 17, 2023
2256e75
steps for image pipeline
floschne Aug 17, 2023
4db73b0
implemented build image pipeline method
floschne Aug 17, 2023
77d269c
introduced common status for all background jobs
floschne Aug 18, 2023
c6b30f0
fixed bug when generating thumbnail for video
floschne Aug 22, 2023
845612a
storing word level transcriptions as metadata for audio sdocs
floschne Aug 22, 2023
16411f7
typos and minor fixes
floschne Aug 22, 2023
54e76f0
fixed bug to receive linked transcription sdocs for video and audio docs
floschne Aug 23, 2023
1da224f
using backgroundjob base class and status
floschne Aug 23, 2023
b3fc52c
parent sdoc id is now optional for sdoc link create dtos
floschne Aug 23, 2023
47b0c2e
removed unnecessary preprocessing files
floschne Aug 23, 2023
12654ac
fixed import in proj endpoints
floschne Aug 23, 2023
c45b6b7
added doctype member to prepro pipeline class
floschne Aug 23, 2023
cf69377
removed sdoc id from prepro docs
floschne Aug 23, 2023
acdb7d3
improved and added steps to text pipeline
floschne Aug 23, 2023
2f89cd3
improved and added steps to image pipeline
floschne Aug 23, 2023
12422ba
improved and added steps to audio pipeline
floschne Aug 23, 2023
6e7188e
improved and added steps to video pipeline
floschne Aug 23, 2023
c7d9eb3
added missing pipeline step files
floschne Aug 23, 2023
b30b535
moved celery code into own directory and renamed heavy jobs worker to…
floschne Aug 23, 2023
3e8991f
removed old docprepro code
floschne Aug 23, 2023
92c964b
updated generated backend api code in frontend
floschne Aug 23, 2023
0f5e997
reset default theme in frontend
floschne Aug 23, 2023
9219174
updated frontend code to use background job status
floschne Aug 23, 2023
29a2680
renamed crawler hook polling method
floschne Aug 23, 2023
2c98071
extended project background tasks to work with prepro jobs and impro…
floschne Aug 23, 2023
ff3d508
isort and black formatting
floschne Aug 23, 2023
57cf71a
formatting
floschne Aug 23, 2023
8b8e9b0
implemented common viewer for audio and video and improved automatic …
floschne Aug 23, 2023
8c76156
fixed missing import in crawler
floschne Aug 23, 2023
e128f4a
black and isort formatting for tools
floschne Aug 23, 2023
5bb6db6
improved crawler and prepro bg job view in project settings
floschne Aug 23, 2023
3c8b52a
removed noqa comments in main
floschne Aug 23, 2023
08e7369
renamed prepro start method
floschne Aug 23, 2023
26f6d22
improved prepro service and implemented prepro for crawler jobs
floschne Aug 23, 2023
a16878e
fixed extract content for html docs
floschne Aug 23, 2023
6097355
fixed import in simsearch
floschne Sep 4, 2023
b91acff
updated frontend code to comply with new backend api
floschne Sep 4, 2023
49d26c2
added support for celery debugging
floschne Sep 5, 2023
64d233e
fixed bug in text pipeline
floschne Sep 5, 2023
ae94820
removed text, audio, video, and image celery workers
floschne Sep 5, 2023
b1411df
improved prepro pipeline performance
floschne Sep 5, 2023
0a0b9e3
added support for zip files in prepro pipeline
floschne Sep 5, 2023
b533bfc
removed optional return types from redis service by using exceptions …
floschne Sep 6, 2023
dbf2ce1
improved prepro pipeline performance
floschne Sep 6, 2023
fdcdf81
removed optional from finished sdoc parameter in the api
floschne Sep 6, 2023
63054d5
added missing remove from es
floschne Sep 6, 2023
4a1d6a5
removed old and unused sdoc status
floschne Sep 6, 2023
d90b40e
implemented preprocessing step to remove unfinished sdocs to make the …
floschne Sep 6, 2023
82bc291
fixed import typo
floschne Sep 6, 2023
9e4d9a6
fixed bug when updating ppj status in prepro pipeline
floschne Sep 6, 2023
ed30767
fixed docker-compose bugs
floschne Sep 12, 2023
86d4546
reset default theme in frontend
floschne Aug 23, 2023
d8896f5
Ray Whisper MWP
Alienmaster Jun 27, 2023
e1d2ae8
Dockerfile for mwp
Alienmaster Jun 29, 2023
6bd3a72
Multiple Model in one
Alienmaster Jul 4, 2023
06ab284
Multimodel Application
Alienmaster Jul 4, 2023
3da0022
Integration into project docker compose, refactor whisper
Alienmaster Aug 10, 2023
29746e5
initial code formatting for ray code
floschne Sep 12, 2023
066a7b9
restructured config files
floschne Sep 12, 2023
10e60d1
removed some prints
floschne Sep 12, 2023
4eb2aa8
restructured and refactored ray code
floschne Sep 12, 2023
887b365
restructured and refactored ray code: Part 2 -- probably final structure
floschne Sep 13, 2023
3370d21
fixed ray spacy and added ray specific config
floschne Sep 13, 2023
e71afa7
renamed ray model worker spec gen script
floschne Sep 13, 2023
9918d01
fixed ray dbert and whisper input bug
floschne Sep 13, 2023
b54142c
fixed ray dbert output
floschne Sep 13, 2023
d0ce88c
fixed prod docker compose
floschne Sep 13, 2023
3e5e4bd
first sketch of ray model service
floschne Sep 13, 2023
a3ab863
fixed spacy model output in ray
floschne Sep 13, 2023
4ba7adc
improved entrypoint script for ray
floschne Sep 13, 2023
23068d1
added ray config in configs
floschne Sep 13, 2023
e13bb2e
improved ray model service
floschne Sep 14, 2023
db37c40
added and restructured env vars for deployment
floschne Sep 14, 2023
aa48927
init ray model service in startup
floschne Sep 14, 2023
296130f
replaced spacy pipeline in preprocessing with ray spacy pipeline and …
floschne Sep 14, 2023
8766705
updated whisper dto, model and loading in ray worker
floschne Sep 14, 2023
550d811
renamed config var in spacy ray model worker
floschne Sep 14, 2023
0722239
replaced whisper in pipeline steps with ray modelservice calls
floschne Sep 14, 2023
7228872
commenting out OMP and MKL thread limiting env var... experimental
floschne Sep 14, 2023
6c1be44
added ray env vars to docker-compose file
floschne Sep 14, 2023
9b4a879
model eval and torch no grad for whisper
floschne Sep 14, 2023
ce13394
implemented detr object detection in ray worker
floschne Sep 14, 2023
9b5eeb2
updated ray worker requirements
floschne Sep 14, 2023
9258cb2
implemented detr object detection in ray model service
floschne Sep 14, 2023
00cbd88
replaced detr object detect in pipeline steps with ray modelservice c…
floschne Sep 14, 2023
e001eec
ray worker spec with detr
floschne Sep 14, 2023
86e79b8
added a parameter to ignore app when generating the ray spec
floschne Sep 20, 2023
4581e00
implemented vit2gpt img captioning in ray worker
floschne Sep 20, 2023
fab75f1
implemented vit2gpt img captioning in ray model service
floschne Sep 20, 2023
c851aed
using ray model service to generate caption in the prepro pipeline
floschne Sep 20, 2023
c97b078
added ray model spec file to gitignore
floschne Sep 20, 2023
9f44d2c
removed ray worker spec from git
floschne Sep 20, 2023
f28056e
removed dbert model from ray model worker
floschne Sep 20, 2023
1b12922
function to parse and pass the ray deployment config from the config …
floschne Sep 21, 2023
75881a9
implemented clip text and image embedding in ray model worker
floschne Sep 25, 2023
b2dd9f0
implemented clip text and image embedding in ray model service
floschne Sep 25, 2023
826450d
using ray model service to compute image and text embeddings in the p…
floschne Sep 25, 2023
9afefad
implemented simsearch as bg job in bg job worker
floschne Sep 25, 2023
73f3c07
removed old simsearch config from backend config files
floschne Sep 25, 2023
2d0945a
minor improvements in search and faiss service
floschne Sep 25, 2023
2a19baa
removed simsearch worker from docker compose
floschne Sep 25, 2023
239ad46
removed old simsearch docprepro
floschne Sep 25, 2023
f55084a
added eslint-config-react-app to frontend deps to solve pre-commit c…
floschne Sep 25, 2023
ce66207
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 25, 2023
5188383
skipping eslint in precommit ci
floschne Sep 25, 2023
60c611c
added pre-commit ci badge to readme
floschne Sep 25, 2023
1d97678
added aborted state to bg job status
floschne Sep 25, 2023
a770b41
removed old App.tsx
floschne Sep 25, 2023
4218441
implemented prepro job aborting
floschne Sep 25, 2023
302c7a1
implemented endpoint for prepro job aborting
floschne Sep 25, 2023
e66c5db
method to abort preprojobs in frontend
floschne Sep 25, 2023
54df0e2
updated api to work with new endpoints
floschne Sep 25, 2023
e87e43e
ui to abort a preprojob in frontend
floschne Sep 25, 2023
fb672f2
fixed requirements file location and updated deps for backend docker…
floschne Sep 27, 2023
f596a21
implemented new simsearch service based on weaviate
floschne Sep 27, 2023
38236db
removed husky from frontend
floschne Sep 27, 2023
4e80855
added weaviate service to docker compose
floschne Sep 27, 2023
85b1f5d
replaced faiss index service with simsearch service
floschne Sep 27, 2023
7d16223
minor improvements to elasticsearch service for more stable concurrency
floschne Sep 27, 2023
9c4c263
flushing weaviate when resetting data in startup
floschne Sep 27, 2023
c980a22
improved readability of chain function in parallel exec of prepro pip…
floschne Sep 27, 2023
e5f19a9
moved a lot of stuff into run_step of prepro pipeline so that it is a…
floschne Sep 27, 2023
f41d988
minor improvements to the preprojob list button in the frontend
floschne Sep 27, 2023
6d840cf
added more info to the preprojob item in frontend
floschne Sep 27, 2023
9736466
removed old docprepro code
floschne Sep 27, 2023
b713208
implemented ORM models for preprojob
floschne Sep 28, 2023
7910d0f
implemented CRUDs for ORM models for preprojob
floschne Sep 28, 2023
884cfd2
implemented and updated DTOs for preprojob
floschne Sep 28, 2023
78d0ae3
added prepro ORMs to SQLService
floschne Sep 28, 2023
4a5cf67
updated endpoint to use sqls service for prepro
floschne Sep 28, 2023
b273e9c
replaced redis preprojob code with sql
floschne Sep 28, 2023
08b70f6
pool size parameters for sqlservice
floschne Sep 28, 2023
0d022e9
re-added missing ffmpeg dep to bg jobs worker
floschne Sep 28, 2023
b691e28
added prepro pipeline num worker parameters in config
floschne Sep 28, 2023
6a2299e
improved entrypoint script for bg jobs worker and fixed issues for lo…
floschne Sep 28, 2023
52d7ce6
updated package json lock
floschne Sep 28, 2023
66c785e
updated frontend code to comply with new backend api
floschne Sep 28, 2023
61a36fe
removed unnecessary call to receive sdoc id from prepro jobs
floschne Sep 28, 2023
b4c03d9
added missing preprojob payload model in frontend
floschne Sep 28, 2023
7666bc8
read methods in crud ppj and ppj payload
floschne Sep 28, 2023
b8918b9
new prepro status route
floschne Sep 28, 2023
6225908
updated frontend code to comply with new backend api
floschne Sep 28, 2023
bf77173
added missing weaviate env vars and volume
floschne Sep 28, 2023
0fb9bdb
added cupy to ray deps
floschne Sep 28, 2023
2fc2d69
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 28, 2023
ad5827f
ray test config file
floschne Sep 29, 2023
ede8059
updated github test actions
floschne Sep 29, 2023
502e4c4
added free diskspace action to gh test actions
floschne Sep 29, 2023
2c884b1
updated github test actions
floschne Sep 29, 2023
fc6117d
updated github test actions
floschne Sep 29, 2023
523d9cb
updated github test actions
floschne Sep 29, 2023
8ef72d4
added req file for tools
floschne Sep 29, 2023
23 changes: 18 additions & 5 deletions .github/workflows/backend_e2e_tests.yml
@@ -42,6 +42,21 @@ jobs:
run-end2end-tests:
runs-on: ubuntu-latest
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
# this might remove tools that are actually needed,
# if set to "true" but frees about 6 GB
tool-cache: false

# all of these default to true, but feel free to set to
# "false" if necessary for your workflow
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: true
swap-storage: false
- name: Set Swap Space to 10GB
uses: pierotofy/set-swap-space@master
with:
@@ -54,12 +69,9 @@ jobs:
chmod -R a+rwx backend_repo/ models_cache/ spacy_models/ tika/
python monkey_patch_docker_compose_for_backend_tests.py
export GID=$(id -g)
-export CELERY_TEXT_WORKER_CONCURRENCY=1
-export CELERY_IMAGE_WORKER_CONCURRENCY=1
-export CELERY_SIMSEARCH_WORKER_CONCURRENCY=1
-export CELERY_ARCHIVE_WORKER_CONCURRENCY=1
export API_PRODUCTION_WORKERS=0
-docker compose -f compose-test.yml up -d --quiet-pull
+export RAY_CONFIG="./config_test_no_gpu.yaml"
+docker compose -f compose-test.yml up -d
echo Waiting for containers to start...
sleep 240
cd ..
@@ -68,6 +80,7 @@ jobs:
TESTDATA_PASSWORD: ${{ secrets.TESTDATA_PASSWORD }}
run: |
cd tools/importer
pip install -r requirements.txt
wget -q http://ltdata1.informatik.uni-hamburg.de/dwts/totalitarismo.zip
unzip -q -P "$TESTDATA_PASSWORD" totalitarismo.zip
python dwts_importer.py --input_dir images --backend_url http://localhost:13120/ --project_name incel --tag_name totalitarisimo
20 changes: 16 additions & 4 deletions .github/workflows/backend_unit_tests.yml
@@ -49,6 +49,21 @@ jobs:
run-unit-tests:
runs-on: ubuntu-latest
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
# this might remove tools that are actually needed,
# if set to "true" but frees about 6 GB
tool-cache: false

# all of these default to true, but feel free to set to
# "false" if necessary for your workflow
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: true
swap-storage: false
- name: Set Swap Space to 10GB
uses: pierotofy/set-swap-space@master
with:
@@ -61,11 +76,8 @@ jobs:
chmod -R a+rwx backend_repo/ models_cache/ spacy_models/ tika/
python monkey_patch_docker_compose_for_backend_tests.py
export GID=$(id -g)
-export CELERY_TEXT_WORKER_CONCURRENCY=1
-export CELERY_IMAGE_WORKER_CONCURRENCY=1
-export CELERY_SIMSEARCH_WORKER_CONCURRENCY=1
-export CELERY_ARCHIVE_WORKER_CONCURRENCY=1
export API_PRODUCTION_WORKERS=0
+export RAY_CONFIG="./config_test_no_gpu.yaml"
docker compose -f compose-test.yml up -d --quiet-pull
echo Waiting for containers to start...
sleep 240
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -1,3 +1,5 @@
ci:
skip: [eslint]
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
# D-WISE Tool Suite

[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/uhh-lt/dwts/mwp_v1.svg)](https://results.pre-commit.ci/latest/github/uhh-lt/dwts/mwp_v1)

This is the repository for the D-WISE Tool Suite (DWTS) - an outcome of
the [D-WISE Project](https://www.dwise.uni-hamburg.de/)

32 changes: 18 additions & 14 deletions backend/.env
@@ -8,12 +8,9 @@ INSTALL_JUPYTER=true
API_PORT=5500
API_PRODUCTION_MODE=0
API_PRODUCTION_WORKERS=10
CELERY_TEXT_WORKER_CONCURRENCY=10
CELERY_IMAGE_WORKER_CONCURRENCY=1
CELERY_AUDIO_WORKER_CONCURRENCY=1
CELERY_VIDEO_WORKER_CONCURRENCY=1
CELERY_SIMSEARCH_WORKER_CONCURRENCY=1
CELERY_HEAVY_JOBS_WORKER_CONCURRENCY=1
CELERY_BACKGROUND_JOBS_WORKER_CONCURRENCY=1
CELERY_DEBUG_MODE=0

REDIS_HOST=redis
REDIS_PORT=6379
@@ -36,6 +33,10 @@ ES_HOST=elasticsearch
ES_PORT=9200
ES_MIN_HEALTH=50

RAY_HOST=ray
RAY_PORT=8000
RAY_PROTOCOL=http

API_EXPOSED=13120
POSTGRES_EXPOSED=13121
RABBIT1_EXPOSED=13123
@@ -44,16 +45,19 @@ RABBIT3_EXPOSED=13125
RABBIT_EXPOSED=13126
REDIS_EXPOSED=13127

JUPYTER_TEXT_EXPOSED=13128
JUPYTER_IMAGE_EXPOSED=13129
JUPYTER_SIMSEARCH_EXPOSED=13136
JUPYTER_HEAVY_JOBS_EXPOSED=13135
JUPYTER_API_EXPOSED=13130
KIBANA_EXPOSED=13128
ELASTICSEARCH_EXPOSED=13129
ELASTICSEARCH1_EXPOSED=13130
CONTENT_SERVER_EXPOSED=13131

RAY_API_EXPOSED=13132
RAY_DASHBOARD_EXPOSED=13133

KIBANA_EXPOSED=13131
ELASTICSEARCH_EXPOSED=13132
ELASTICSEARCH1_EXPOSED=13133
CONTENT_SERVER_EXPOSED=13134
JUPYTER_TEXT_EXPOSED=13134
JUPYTER_IMAGE_EXPOSED=13135
JUPYTER_API_EXPOSED=13136
JUPYTER_SIMSEARCH_EXPOSED=13137
JUPYTER_BACKGROUND_JOBS_EXPOSED=13138

# MAIL SERVICE
MAIL_ENABLED=True
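
The `RAY_HOST`, `RAY_PORT`, and `RAY_PROTOCOL` entries added to `backend/.env` in this diff suggest that the backend reaches the Ray model worker over HTTP. How the backend actually consumes them is not visible in this excerpt, so the following is only a minimal sketch of how such variables could be combined into a base URL. The helper name `ray_base_url` is hypothetical, and the fallback values simply mirror the `.env` entries above.

```python
import os


def ray_base_url(env=None):
    """Build a base URL from RAY_PROTOCOL, RAY_HOST and RAY_PORT.

    Hypothetical helper for illustration only; it is not part of the PR.
    The fallbacks mirror the values added to backend/.env.
    """
    env = os.environ if env is None else env
    protocol = env.get("RAY_PROTOCOL", "http")
    host = env.get("RAY_HOST", "ray")
    port = int(env.get("RAY_PORT", "8000"))  # int() rejects non-numeric ports early
    return f"{protocol}://{host}:{port}"


# Values as they appear in backend/.env
print(ray_base_url({"RAY_PROTOCOL": "http", "RAY_HOST": "ray", "RAY_PORT": "8000"}))
```

Passing the environment as a plain mapping keeps the helper easy to exercise without touching the process environment.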
31 changes: 31 additions & 0 deletions backend/.env.dev
@@ -0,0 +1,31 @@
DWISE_BACKEND_CONFIG=configs/default_localhost_dev.yaml
LOG_LEVEL=debug
JWT_TTL=10080
JWT_SECRET=f5b73acd6d6776350bce963bbcd2853fb5de00a4a4a081280ce1123b4a69aea9

API_PORT=33120
API_PRODUCTION_MODE=0
API_PRODUCTION_WORKERS=10

CELERY_DEBUG_MODE=1

REDIS_HOST=localhost
REDIS_PORT=13124
REDIS_PASSWORD=dwts123

RABBITMQ_HOST=localhost
RABBITMQ_PORT=13123
RABBITMQ_USER=dwtsuser
RABBITMQ_PASSWORD=dwts123

POSTGRES_HOST=localhost
POSTGRES_PORT=13122
POSTGRES_DB=dwts
POSTGRES_USER=dwtsuser
POSTGRES_PASSWORD=dwts123

FLOWER_BASIC_AUTH=dwtsuser:dwts123

ES_HOST=localhost
ES_PORT=13125
ES_MIN_HEALTH=50
1 change: 1 addition & 0 deletions backend/.gitignore
@@ -1,3 +1,4 @@
src/app/preprocessing/ray_model_worker/spec.yaml
src/dev_notebooks
backend_repo
sample_data
1 change: 1 addition & 0 deletions backend/Dockerfile
@@ -28,6 +28,7 @@ RUN wget -q https://micro.mamba.pm/api/micromamba/linux-64/${MICROMAMBA_VERSION}
# create the 'dwts' python environment with all dependencies
ENV MAMBA_ROOT_PREFIX=/opt
COPY environment.yml .
COPY requirements.txt /tmp/requirements.txt
RUN micromamba create -f environment.yml -q -y &&\
micromamba clean -a -f -q -y &&\
find /opt/ -follow -type f -name '*.a' -delete &&\
16 changes: 4 additions & 12 deletions backend/environment.yml
@@ -1,20 +1,14 @@
name: dwts
channels:
- defaults
- huggingface
- pytorch
- fastai
- conda-forge
- defaults
dependencies:
- python=3.10
- pytorch::pytorch=1.12
- conda-forge::cudatoolkit=11.6
- sentence-transformers=2.2
- huggingface::transformers=4.21
- conda-forge::pip=23.2.1
- pydantic=1.8
- spacy=3.4
- cupy=11.2
- fastapi=0.85
- srsly=2.4.8
- tqdm=4.66.1
- sqlalchemy=1.4
- psycopg2-binary=2.9
- redis-py=4.3
@@ -32,9 +26,7 @@ dependencies:
- frozendict=2.3
- email_validator=1.3
- sqlalchemy-utils=0.38
- timm=0.6
- python-multipart=0.0.5
- spacy-transformers=1.1
- ftfy=6.1
- beautifulsoup4=4.11.1
- pytest=7.2.0
4 changes: 2 additions & 2 deletions backend/requirements.txt
@@ -1,6 +1,6 @@
-faiss-gpu==1.7.2
+weaviate-client==3.24.1
fastapi-mail==1.2.5
-git+https://github.com/linto-ai/whisper-timestamped.git@d767f4fc3b401c78c20d55515b382838ca3c86aa
+multiprocess==0.70.15
Scrapy==2.10.0
scrapy-playwright==0.0.31
cssselect==1.2.0
2 changes: 1 addition & 1 deletion backend/src/api/dependencies.py
@@ -63,7 +63,7 @@ async def get_current_user(
email: str = payload.get("sub")
if email is None:
raise credentials_exception
-except (JWTError, ValidationError) as e:
+except (JWTError, ValidationError):
raise credentials_exception

user = crud_user.read_by_email(db=db, email=email)
2 changes: 1 addition & 1 deletion backend/src/api/endpoints/crawler.py
@@ -1,8 +1,8 @@
from typing import List

+from app.celery.background_jobs import prepare_and_start_crawling_job_async
from app.core.data.crawler.crawler_service import CrawlerService
from app.core.data.dto.crawler_job import CrawlerJobParameters, CrawlerJobRead
-from app.docprepro.heavy_jobs import prepare_and_start_crawling_job_async
from fastapi import APIRouter

router = APIRouter(prefix="/crawler")
2 changes: 1 addition & 1 deletion backend/src/api/endpoints/export.py
@@ -1,6 +1,6 @@
+from app.celery.background_jobs import prepare_and_start_export_job_async
from app.core.data.dto.export_job import ExportJobParameters, ExportJobRead
from app.core.data.export.export_service import ExportService
-from app.docprepro.heavy_jobs import prepare_and_start_export_job_async
from fastapi import APIRouter

router = APIRouter(prefix="/export")
2 changes: 0 additions & 2 deletions backend/src/api/endpoints/feedback.py
@@ -78,8 +78,6 @@ async def reply_to(
) -> str:
# todo: load_feedback should raise exception, if it does not exist!
feedback: Optional[FeedbackRead] = RedisService().load_feedback(key=feedback_id)
-if feedback is None:
-return f"Feedback with id {feedback_id} not found."

user = crud_user.read(db=db, id=feedback.user_id)
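
With the `None` check removed from `reply_to`, the endpoint relies on the lookup itself failing loudly when the feedback does not exist, which is what the todo comment in this hunk asks for. The sketch below illustrates that load-or-raise pattern with a hypothetical in-memory store. It does not reproduce the project's `RedisService`; the class names, the exception, and the `Feedback` fields are all assumptions made for illustration.

```python
from dataclasses import dataclass


class FeedbackNotFoundError(KeyError):
    """Hypothetical exception raised when no feedback exists for a key."""


@dataclass
class Feedback:
    # Hypothetical stand-in for the project's FeedbackRead DTO
    id: str
    user_id: int
    content: str


class InMemoryFeedbackStore:
    """Hypothetical stand-in for a Redis-backed store."""

    def __init__(self):
        self._items = {}

    def store_feedback(self, feedback):
        self._items[feedback.id] = feedback
        return feedback

    def load_feedback(self, key):
        # Raise instead of returning None, so callers need no Optional handling
        try:
            return self._items[key]
        except KeyError:
            raise FeedbackNotFoundError(f"Feedback with id {key} not found.") from None


store = InMemoryFeedbackStore()
store.store_feedback(Feedback(id="fb-1", user_id=42, content="great tool"))
print(store.load_feedback("fb-1").user_id)
```

In an API layer, an exception of this kind would typically be translated into an error response by an exception handler, so the endpoint body can assume the object exists.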
