CU-86956du3q revisit regression #470

Merged
merged 137 commits into from
Aug 28, 2024
Commits
8af5c72
CU-86956du3q: Move to placeholder-based replacement
mart-r Jul 26, 2024
3437ed0
CU-86956du3q: Update regression tests to a more reasonable state.
mart-r Jul 26, 2024
f455bb7
CU-86956du3q: Further fixes for regression checking:
mart-r Jul 26, 2024
f71a206
CU-86956du3q: Add documentation for new clases and methods
mart-r Jul 26, 2024
cc02104
CU-86956du3q: Rename enum constant (SPAN_OVERLAP -> PARTIAL_OVERLAP)
mart-r Jul 26, 2024
52f6344
CU-86956du3q: Add matching for partially overlapping children
mart-r Jul 26, 2024
a2837c8
CU-86956du3q: Add tests for partially overlapping children
mart-r Jul 26, 2024
343d0b1
CU-86956du3q: Update regression checking to generate multiple sub-cas…
mart-r Jul 30, 2024
fd6690b
CU-86956du3q: Update some tests for new format
mart-r Jul 30, 2024
87183cd
CU-86956du3q: Remove old / unused / irrelevant tests and test-code
mart-r Jul 30, 2024
f7f7ff3
CU-86956du3q: Some renaming (filter -> placeholders)
mart-r Jul 30, 2024
90e1227
CU-86956du3q: Add some additional fail safes for option set
mart-r Jul 30, 2024
5910629
CU-86956du3q: Fix option set for only 1 placeholder
mart-r Jul 30, 2024
f1b4799
CU-86956du3q: Fix targeting
mart-r Jul 30, 2024
9c53cc6
CU-86956du3q: Add tests for targeting
mart-r Jul 30, 2024
beb0444
CU-86956du3q: Remove MCT export conversion (at least for now)
mart-r Jul 30, 2024
1d75257
CU-86956du3q: Remove MCT export conversion tests (at least for now)
mart-r Jul 30, 2024
d3a565e
CU-86956du3q: Remove suite editing (at least for now)
mart-r Jul 30, 2024
6e304aa
CU-86956du3q: Remove category separation (at least for now)
mart-r Jul 30, 2024
6b870e6
CU-86956du3q: Remove unused regression utils (at least for now)
mart-r Jul 30, 2024
95d92a3
CU-86956du3q: Remove serialisation tests (at least for now)
mart-r Jul 30, 2024
6504bac
CU-86956du3q: Improve quality of default regression test set
mart-r Jul 30, 2024
a7bb30c
CU-86956du3q: Improve exceptions in targeting
mart-r Jul 30, 2024
0eb8bdf
CU-86956du3q: Fix docstring issue regarding exceptions
mart-r Jul 30, 2024
f661129
CU-86956du3q: Update test with correct exceptions
mart-r Jul 30, 2024
d7e59a3
CU-86956du3q: Add utils for partial substitutions and corresponding t…
mart-r Jul 30, 2024
b8ce24c
CU-86956du3q: Allow multiple of the same placeholder in a phrase.
mart-r Jul 30, 2024
0d45b1f
CU-86956du3q: Add relevant tests for multi-placeholder checking
mart-r Jul 30, 2024
667c009
CU-86956du3q: Allow changing of multiple pre-processing placeholders
mart-r Jul 30, 2024
98f7245
CU-86956du3q: Fix 1-placeholder sub-case yielding
mart-r Jul 30, 2024
c0a7747
Merge branch 'master' into CU-86956du3q-revisit-regression2
mart-r Aug 7, 2024
bcc49ba
CU-86956du3q: Remove debug output
mart-r Aug 7, 2024
ba483ae
CU-86956du3q: Replace separator (~) with whitespace when checking
mart-r Aug 7, 2024
3a776a2
CU-86956du3q: Add utility method to limit string length for output
mart-r Aug 7, 2024
2238e1d
CU-86956du3q: Improve string length limiting method
mart-r Aug 7, 2024
b43de39
CU-86956du3q: Add a few tests for string length limiting method
mart-r Aug 7, 2024
747f8e5
CU-86956du3q: Add an ANYTHING strictness (mostly for example disbaling)
mart-r Aug 7, 2024
a8e709b
CU-86956du3q: Add storage of examples (of a certain strictness) as we…
mart-r Aug 7, 2024
10f4f8a
CU-86956du3q: Fix type (missing ending bracket) in report output
mart-r Aug 7, 2024
bb005fd
CU-86956du3q: Fix examples header appearing for every example
mart-r Aug 7, 2024
5970cd6
CU-86956du3q: Print the same phrase fewer times for examples
mart-r Aug 7, 2024
5b9a1f2
CU-86956du3q: Update fake CDB with (default) config
mart-r Aug 7, 2024
edf2e3d
CU-86956du3q: Add finding to examples and output
mart-r Aug 7, 2024
251e720
CU-86956du3q: Add config to another fake CDB during test time
mart-r Aug 7, 2024
e0a3c5c
CU-86956du3q: Allow strictness to propagate to parts when looking at …
mart-r Aug 7, 2024
8528e74
Merge branch 'master' into CU-86956du3q-revisit-regression2
mart-r Aug 8, 2024
81a5cb3
CU-86956du3q: Add placeholder to examples output
mart-r Aug 8, 2024
66565a4
CU-86956du3q: Refactor report output generation slightly
mart-r Aug 8, 2024
4d41ca5
CU-86956du3q: Show all non-identical examples
mart-r Aug 8, 2024
5eee717
CU-86956du3q: Update example checking with strictness requirement (in…
mart-r Aug 8, 2024
b79e5f0
CU-86956du3q: Simplify targeting somewhat (remove unnecessary method)
mart-r Aug 8, 2024
894fb0f
CU-86956du3q: Allow changing of ouptut phrase max length
mart-r Aug 8, 2024
cf9dd22
CU-86956du3q: Fix doc string for changed method
mart-r Aug 8, 2024
c3be60f
CU-86956du3q: Small whitespace fix
mart-r Aug 8, 2024
c0f99cf
CU-86956du3q: Fix total-included checking iteration
mart-r Aug 8, 2024
4d0bc6e
CU-86956du3q: Add strictness and max phrase length to CLI
mart-r Aug 8, 2024
df12eba
CU-86956du3q: Add examople strictness to CLI
mart-r Aug 8, 2024
2e7465d
CU-86956du3q: Fix default value for strictness in CLI
mart-r Aug 8, 2024
e40ec4b
CU-86956du3q: Update to use number of sub-cases for tqdm/progress bar
mart-r Aug 8, 2024
172f101
CU-86956du3q: Remove option to set the total for progress bar (the au…
mart-r Aug 8, 2024
a7670e3
CU-86956du3q: Simplify the progress bar by combining all cases
mart-r Aug 8, 2024
7f429d7
CU-86956du3q: Split subcase iteration
mart-r Aug 8, 2024
5bc1ee4
CU-86956du3q: Rename regression checker to regression suite
mart-r Aug 8, 2024
17fb479
CU-86956du3q: Streamline typing and the like by using intermediate da…
mart-r Aug 8, 2024
7ea65a5
CU-86956du3q: Remove redundant method
mart-r Aug 8, 2024
8860309
CU-86956du3q: Remove redundant method and acommpanying test
mart-r Aug 8, 2024
7629ac4
CU-86956du3q: Remove redundant class
mart-r Aug 8, 2024
ca600b4
CU-86956du3q: Add another intermediate data class
mart-r Aug 8, 2024
54ceca4
CU-86956du3q: Remove completed TODO notes and redundant method
mart-r Aug 8, 2024
1dc8eaf
CU-86956du3q: Add documentation to new methods and clases. Simplify e…
mart-r Aug 8, 2024
c210ef2
CU-86956du3q: Small update for how default test suite is handled for CLI
mart-r Aug 9, 2024
e090704
CU-86956du3q: Small to report output format
mart-r Aug 9, 2024
05b490c
CU-86956du3q: Add easier to read exception when unable to load a plac…
mart-r Aug 9, 2024
a086f37
CU-86956du3q: Update percentages output to avoid as many decimal places
mart-r Aug 9, 2024
6606cd2
CU-86956du3q: Use preferred name for run-to-run consistency
mart-r Aug 9, 2024
cd243aa
CU-86956du3q: Update test time fake CDBs
mart-r Aug 9, 2024
846a408
CU-86956du3q: Update default regression tests with new extensive (yet…
mart-r Aug 9, 2024
4fd8088
CU-86956du3q: Add initial README for regression stuff
mart-r Aug 9, 2024
72ec064
CU-86956du3q: Add option to for failing with having found another con…
mart-r Aug 14, 2024
e4f203e
CU-86956du3q: Add tests for parent and grandparent finding; fix tests…
mart-r Aug 14, 2024
5d9b08e
CU-86956du3q: Add preferred name to wrong CUI found
mart-r Aug 14, 2024
7683a88
CU-86956du3q: Fix tests for new form of determine cui description; ad…
mart-r Aug 14, 2024
741675e
CU-86956du3q: Fix determining partial matches for grandchildren and b…
mart-r Aug 14, 2024
d7bcd06
CU-86956du3q: Add test for partial matches of grandchildren
mart-r Aug 14, 2024
94b16ab
Fixing bug for metacat
shubham-s-agarwal Aug 8, 2024
7116ac7
CU-86956duhb: Add method to backport a model pack from 1.12 to previo…
mart-r Aug 12, 2024
09ec3d4
CU-8694cd9t2: Allow merging config into model pack config before init…
mart-r Aug 12, 2024
6fb68c2
CU-8694fwyje: Update all configs with pre-load parts documented (#473)
mart-r Aug 12, 2024
fc4ee7f
CU-86956du3q: Add converter from MCT export
mart-r Aug 14, 2024
680ad64
CU-86956du3q: Add documentation to MCT export converter
mart-r Aug 14, 2024
cad3cb2
CU-86956du3q: Add option to create a regression suite from an MCT export
mart-r Aug 14, 2024
2aa6370
CU-86956du3q: Add option to create a regression suite from an MCT exp…
mart-r Aug 14, 2024
b7e8c3c
CU-86956du3q: Add a small note for converter placeholder
mart-r Aug 14, 2024
37ebb51
CU-86956du3q: Add tests for MedCATtrainer export converter
mart-r Aug 14, 2024
c165bb5
CU-86956du3q: Add tests for regression suite generation based on MCT …
mart-r Aug 14, 2024
a894ed1
CU-86956du3q: Simplify regression case creation tests somewhat
mart-r Aug 14, 2024
eb0e26a
CU-86956du3q: Add option to create a regression suite YAML from MCT e…
mart-r Aug 14, 2024
9187752
CU-86956du3q: Add option to stop at MCT export conversion
mart-r Aug 14, 2024
c29dc33
CU-86956du3q: Make use of only-prefnames option
mart-r Aug 14, 2024
f416dd1
CU-86956du3q: Fix loading of only-prefnames option from yaml
mart-r Aug 14, 2024
97326a2
CU-86956du3q: Add comment for only using preferred names to the defau…
mart-r Aug 14, 2024
e5da37b
CU-86956du3q: Fix tests broken due to pref-name only change
mart-r Aug 14, 2024
4f30c68
CU-86956du3q: Add utility method to set runtime doc strings for enum …
mart-r Aug 14, 2024
6a88d97
CU-86956du3q: Add tests for runtime doc string addition
mart-r Aug 14, 2024
2491bef
CU-86956du3q: Add more tests for runtime doc string addition (to make…
mart-r Aug 14, 2024
5d3fad9
CU-86956du3q: Make Finding enum has runtime doc strings
mart-r Aug 14, 2024
7bf7f59
CU-86956du3q: Add CLI option to show the various descriptions of the …
mart-r Aug 14, 2024
a51f8b0
CU-86956du3q: Update dict and json methods for some results for JSON …
mart-r Aug 19, 2024
a7ebafd
CU-86956du3q: Add a few json serialisation tests
mart-r Aug 19, 2024
4e7a106
CU-86956du3q: Add json serialisation example strictness to CLI
mart-r Aug 19, 2024
e5db542
CU-86956du3q: Add a few more json serialisation tests
mart-r Aug 19, 2024
46b66cf
CU-86956du3q: Add usage of regression suite name from the name of the…
mart-r Aug 19, 2024
cfc0209
CU-86956du3q: Fix tests by adding the regression suite name where app…
mart-r Aug 19, 2024
11211a5
CU-86956du3q: Avoid examples in ResultDescriptor
mart-r Aug 19, 2024
ea97a63
CU-86956du3q: Make sure strictness propagates accross all parts of a …
mart-r Aug 20, 2024
b582366
CU-86956du3q: Update tests: Use correct reporting for generating fake…
mart-r Aug 20, 2024
90be850
CU-86956du3q: Fix small test issue
mart-r Aug 20, 2024
2390a0f
CU-86956du3q: Update tests for manual success/fail for results
mart-r Aug 20, 2024
4c3cc3e
CU-86956du3q: Separate calculation section of report finding
mart-r Aug 20, 2024
4000280
CU-86956du3q: Add a few more tests for report/results
mart-r Aug 20, 2024
f70386a
CU-86956du3q: Add option to force a non-0 exit status upon any regres…
mart-r Aug 21, 2024
682950e
CU-86956du3q: Add files for regression model creation and checking
mart-r Aug 21, 2024
b71d1ef
CU-86956du3q: Add new part to main workflow to create and regression …
mart-r Aug 21, 2024
eec8fec
CU-86956du3q: Update a mistyped comment
mart-r Aug 21, 2024
3f10fcd
CU-86956du3q: Make regression run at STRICTEST strictness at GHA work…
mart-r Aug 21, 2024
9fe402f
CU-86956du3q: Fix strictness matrix for anything-typed strictness
mart-r Aug 21, 2024
0ddd4a7
CU-86956du3q: Add strictness matrix information to --describe-only
mart-r Aug 21, 2024
1515548
CU-86956du3q: Add python version to created model pack for test time
mart-r Aug 21, 2024
ba35759
CU-86956du3q: Use the python version of creat model pack during test …
mart-r Aug 21, 2024
4bf3089
CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iter…
mart-r Aug 21, 2024
750e9b8
Revert "CU-86956du3q: [TEMP] Remove tests from main workflow (for fas…
mart-r Aug 21, 2024
b23aba0
CU-86956du3q: Make full model path the last line of the output upon c…
mart-r Aug 21, 2024
32ebe7f
CU-86956du3q: Move regression workflow logic to a separate bash script
mart-r Aug 21, 2024
101f2e4
CU-86956du3q: Update comments in regression bash script
mart-r Aug 21, 2024
068c2bf
CU-8694pz44d: Fix model cleanup during regression
mart-r Aug 21, 2024
b5a5b6a
CU-86956du3q: Fix typos in utils
mart-r Aug 28, 2024
6e23316
CU-86956du3q: Fix a bunch of various typos in doc strings and comments
mart-r Aug 28, 2024
3 changes: 2 additions & 1 deletion .github/workflows/main.yml
@@ -40,7 +40,8 @@ jobs:
second_half_nl=$(echo "$all_files" | tail -n +$(($midpoint + 1)))
timeout 25m python -m unittest ${first_half_nl[@]}
timeout 25m python -m unittest ${second_half_nl[@]}

- name: Regression
run: source tests/resources/regression/run_regression.sh
- name: Get the latest release version
id: get_latest_release
uses: actions/github-script@v6
215 changes: 139 additions & 76 deletions configs/default_regression_tests.yml
@@ -1,79 +1,142 @@
# # Example of some test cases
# # They will try to cover as many possible use cases as possible
# # The idea is that the CUI corresponding to the name is expected to be
# # obtained by MedCAT
# # Only the 'filters' under 'targeting' and the 'phrases' under
# # the test case are the two required sections, the rest is optional
#
# test-case-name-1: # name of this test case
# targeting: # info regarding targets of this test case
# strategy: "ALL" # the strategy for dealing with the filters below
# # so "ALL" means the targets need to match all the below filters
# # and "ANY" means that the targets need to match at least one of the filters
# # if only one type of target it specified, this is irrelevant
# # the default value is "ALL" if not specified
# prefname-only: False # set to True if only prefered names should be checked (defaults to False)
# targfiltersets: # the filters for this specific test case
# # there has to be one type of target, but multiple can be specified
# # if multiple types are target, the strategy defined above is taken into affect
# # each type can specify one or multiple values
# # this example shows has one values
# # the next example (below) will have multiple values
# type_id: "0123" # type_id or type_ids
# cui: "01230" # the target CUI (or list of CUIS)
# name: "name0" # the target names
# # all specified names need to exist within the CDB
# phrases: "The quick brown %s jumped over the lazy cat" # the phrases to go through
# # for each phrases, '%s' is replaced
# # by each name that is to be tested
# test-case-name-2: # name of this test case
# targeting:
# filters:
# type_id: # multiple target type IDs
# - "123"
# - "223"
# cui: # multiple target CUI
# - "1234"
# - "2234"
# name: # multiple names
# - "name1"
# - "name2"
# cui_and_children: # an example with CUI and children
# cui: '111' # the CUI (or CUIs)
# depth: 2 # and the depth of children
# phrases:
# - "The %s was measured"
# - "The %s was not measured"
#
# # The following example was (rather arbitrarily) created and should work for
# # the included SNOMED models
test-case-1:
targeting:
strategy: "ALL"
filters:
type_id: "2680757"
phrases:
- "The %s was measured"
# this is an example test case
# it is based on SNOMED-CT
test-case-1: # The (somewhat) arbitrary name of the test case
targeting: # the description of the replacement targets in the phrase(s)
placeholders: # the placeholders to replace in the phrase(s)
# Note that only 1 concept will be tested for at one time.
# So if the phrase(s) has/have more than 1 placeholder, the
# rest of them will be substituted in without care for whether
# or how accurately the model is able to recognise them.
# For the concepts that are not under test at a given time
# the "first" name is used (because the implementation has
# names in a set, there is possibility for run-to-run variance
# because of different names being used).
#
# There are 2 modes for the placeholders:
# 1. any-combination: false
# In this mode, only the concepts in the same position
# in the various lists are used in conjunction with one another.
# Though this also means that it is expected that all of the
# placeholders have the same number of CUIs to use.
# Assuming each of the N placeholders defines M replacement
# cuis, this approach produces M*N cases.
# 2. any-combination: true
# In this mode, any combination of the replacement CUIs is
# allowed. This means that quite a few different combinations
# will be generated and used. It also means that different
# placeholders can have different number of concepts suitable
# for them.
# Assuming each of the N placeholders defines M replacement
# cuis, this approach produces N * M^N (where `^` is power)
# cases. But for a more complicated set up (i.e where different
# placeholders have a different number of swappable CUIs)
# this calculation is not as straightforward.
#
# NOTE: The above description does not take into account different
# number of names associated with different concepts. For each
# of the "primary" concepts, each possible name is attempted.
- placeholder: '[DISORDER]' # the placeholder that will be substituted in the phrase(s)
cuis: ['4473006', # Intracerebral hemorrhage
'85189001', # Acute appendicitis
'186738001', # vestibular neuritis
'186738001', # vestibular neuritis
]
- placeholder: '[FINDING1]'
cuis: ['162300006', # unilateral headache
'21522001', # abdominal pain
'103298005', # severe vertigo
'103298005', # severe vertigo
]
prefname-only: false # this is an optional keyword for each placeholder
# if set to true, only the preferred name will be used for
# this concept. Otherwise, all names will be used as
# different sub-cases
- placeholder: '[FINDING2]'
cuis: ['409668002', # photophobia
'422587007', # nausea
'422587007', # nausea
'422587007', # nausea
]
- placeholder: '[FINDING3]'
cuis: ['2228002', # scintillating scotoma
'386661006', # fever
'81756001', # horizontal nystagmus
'81756001', # horizontal nystagmus
]
- placeholder: '[NEGFINDING]'
cuis: ['386661006', # fever
'62315008', # diarrhea
'15188001', # hearing loss
'60862001', # tinnitus
]
any-combination: false # if set to false, same length of CUIs is expected
# for each placeholder and only a combination is used
phrases: # The list of phrases
- >
Description: [DISORDER]

CC: [FINDING1] on presentation; then developed [FINDING3]

HX: On the day of presentation, this 32 y/o RHM suddenly developed [FINDING1] and [FINDING2].
Four hours later he experienced sudden [FINDING3] lasting two hours.
There were no other associated symptoms except for the [FINDING1] and [FINDING2].
He denied [NEGFINDING].
test-case-2:
targeting:
filters:
type_id: "9090192"
phrases:
- "Patient presented with %s"
- "No %s was present"
test-case-3:
targeting:
filters:
type_id: "67667581"
phrases:
- "The patient has been diagnosed with %s"
- "There are no signs of %s"
test-case-4:
targeting:
strategy: "ALL"
filters:
cui_and_children:
cui: "364075005" # 'heart rate'
depth: 4 # and children 4 deep
placeholders:
- placeholder: '[FINDING1]'
cuis: ['49727002', # cough
'29857009', # chest pain
'21522001', # abdominal pain
'57676002', # joint pain
'25064002', # headache
'271807003', # fever
'162397003', # hematuria (blood in urine)
'271757001', # fatigue
'386661006', # weight loss
'62315008', # dysuria (painful urination)
]
- placeholder: '[FINDING2]'
cuis: ['267036007', # shortness of breath
'68962001', # palpitations
'422587007', # nausea
'182888003', # swelling
'404640003', # dizziness
'422400008', # sore throat
'267036007', # shortness of breath
'267064002', # night sweats
'162607003', # back pain
'267102003', # urinary frequency
]
- placeholder: '[DISORDER]'
cuis: ['195967001', # asthma
'194828000', # angina pectoris
'25374005', # gastroenteritis
'69896004', # rheumatoid arthritis
'37796009', # migraine
'186747009', # influenza
'106063007', # urinary tract infection
'444814009', # chronic fatigue syndrome
'95281007', # tuberculosis
'431855005', # cystitis
]
any-combination: false
phrases:
- "The patient's %s was 82 bps"
- >
The patient presents with [FINDING1] and [FINDING2]. These findings are suggestive of [DISORDER].
Further diagnostic evaluation and investigations are required to confirm the diagnosis.
- >
The patient reports [FINDING1] and has also been experiencing [FINDING2]. These symptoms are consistent with a clinical presentation of [DISORDER].
Further assessment and diagnostic tests are required to establish the underlying cause.
- >
Upon evaluation, the patient exhibits [FINDING1] along with [FINDING2]. This combination of findings raises suspicion for [DISORDER].
Comprehensive diagnostic workup is advised to confirm the diagnosis and plan appropriate management.
- >
During the consultation, the patient described [FINDING1] and noted a recent history of [FINDING2]. These clinical features are suggestive of [DISORDER].
Further investigation is necessary to verify the diagnosis and rule out other potential causes.
- >
The patient's symptoms include [FINDING1] and [FINDING2], which are commonly associated with [DISORDER].
It is recommended that additional diagnostic procedures be performed to confirm this working diagnosis.
- >
The clinical presentation of [FINDING1] and [FINDING2] is indicative of [DISORDER].
To ensure accurate diagnosis, further clinical evaluation and diagnostic tests are required.
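The two placeholder modes described in the comments of the new YAML can be pictured with a small enumeration. This is a minimal sketch under stated assumptions, not the code from this PR: `iter_combinations`, `iter_sub_cases` and `count_sub_cases` are hypothetical helper names, the CUIs are copied from test-case-1 only as sample data, and the sketch counts CUIs only (it ignores the per-name expansion mentioned in the NOTE).

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Sample data only: placeholder -> replacement CUIs (N = 3 placeholders, M = 2 CUIs each)
PLACEHOLDERS: Dict[str, List[str]] = {
    "[DISORDER]": ["4473006", "85189001"],
    "[FINDING1]": ["162300006", "21522001"],
    "[FINDING2]": ["409668002", "422587007"],
}


def iter_combinations(placeholders: Dict[str, List[str]],
                      any_combination: bool) -> Iterator[Dict[str, str]]:
    """Yield one CUI assignment per placeholder (hypothetical helper)."""
    names = list(placeholders)
    cui_lists = [placeholders[name] for name in names]
    if any_combination:
        # Every CUI of one placeholder may be paired with every CUI of the others
        combos = product(*cui_lists)
    else:
        # Position-wise pairing: all lists must have the same length
        if len({len(cuis) for cuis in cui_lists}) > 1:
            raise ValueError("All placeholders need the same number of CUIs")
        combos = zip(*cui_lists)
    for combo in combos:
        yield dict(zip(names, combo))


def iter_sub_cases(placeholders: Dict[str, List[str]],
                   any_combination: bool) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (placeholder under test, full assignment); one concept is tested at a time."""
    for combo in iter_combinations(placeholders, any_combination):
        for target in combo:
            yield target, combo


def count_sub_cases(placeholders: Dict[str, List[str]], any_combination: bool) -> int:
    """Count the sub-cases produced by the enumeration above."""
    return sum(1 for _ in iter_sub_cases(placeholders, any_combination))


print(count_sub_cases(PLACEHOLDERS, any_combination=False))
print(count_sub_cases(PLACEHOLDERS, any_combination=True))
```

With this enumeration, N placeholders with M CUIs each give M*N sub-cases in the position-wise mode and N * M^N when any combination is allowed, which is why the second mode grows quickly as placeholders are added.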
7 changes: 6 additions & 1 deletion medcat/cat.py
@@ -356,6 +356,7 @@ def load_model_pack(cls,
zip_path: str,
meta_cat_config_dict: Optional[Dict] = None,
ner_config_dict: Optional[Dict] = None,
medcat_config_dict: Optional[Dict] = None,
load_meta_models: bool = True,
load_addl_ner: bool = True,
load_rel_models: bool = True) -> "CAT":
@@ -373,6 +374,10 @@ def load_model_pack(cls,
A config dict that will overwrite existing configs in transformers ner.
e.g. ner_config_dict = {'general': {'chunking_overlap_window': 6}}.
Defaults to None.
medcat_config_dict (Optional[Dict]):
A config dict that will overwrite existing configs in the main medcat config
before pipe initialisation. This can be useful if wanting to change something
that only takes effect at init time (e.g. the spaCy model). Defaults to None.
load_meta_models (bool):
Whether to load MetaCAT models if present (Default value True).
load_addl_ner (bool):
@@ -395,7 +400,7 @@ def load_model_pack(cls,

# load config
config_path = os.path.join(model_pack_path, "config.json")
cdb.load_config(config_path)
cdb.load_config(config_path, medcat_config_dict)

# TODO load addl_ner

14 changes: 13 additions & 1 deletion medcat/cdb.py
@@ -515,7 +515,17 @@ async def save_async(self, path: str) -> None:
}
await f.write(dill.dumps(to_save))

def load_config(self, config_path: str) -> None:
def load_config(self, config_path: str, config_dict: Optional[Dict] = None) -> None:
"""Load the config from disk.

Args:
config_path (str): The path to the config file.
config_dict (Optional[Dict]): A config to merge with.

Raises:
ValueError: If a config was not found in CDB nor as a separate json.
Or if a config was found both in CDB as well as a separate json.
"""
if not os.path.exists(config_path):
if not self._config_from_file:
# if there's no config defined anywhere
@@ -544,6 +554,8 @@ def load_config(self, config_path: str) -> None:
# new config, potentially new weighted_average_function to read
self._init_waf_from_config()
# mark config read from file
if config_dict:
self.config.merge_config(config_dict)
self._config_from_file = True

@classmethod
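The effect of passing `config_dict` to `load_config` (and `medcat_config_dict` to `load_model_pack`) can be pictured as overlaying a few overrides on the stored config before the pipe is built. The sketch below uses plain dicts and a hypothetical `merge_config_dict` helper; it is not MedCAT's `merge_config`, whose handling of its pydantic-based config objects may differ, and the keys and values are illustrative only.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config_dict(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` applied recursively (hypothetical helper)."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so untouched sibling keys survive
            merged[key] = merge_config_dict(merged[key], value)
        else:
            # Leaf value (or new section): the override wins
            merged[key] = deepcopy(value)
    return merged


# Illustrative stand-in for the config stored in a model pack
pack_config = {
    "general": {"spacy_model": "en_core_web_md", "log_level": 20},
    "preprocessing": {"max_document_length": 1000000},
}
# Only the init-time setting that should change needs to be supplied
overrides = {"general": {"spacy_model": "en_core_web_sm"}}

effective = merge_config_dict(pack_config, overrides)
print(effective["general"])
```

Merging at load time matters for settings that are only read when the pipe is created: changing them on the config afterwards would have no effect until the pipe is rebuilt.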
11 changes: 9 additions & 2 deletions medcat/config.py
@@ -350,6 +350,9 @@ class General(MixingConfig, BaseModel):
spacy_disabled_components: list = ['ner', 'parser', 'vectors', 'textcat',
'entity_linker', 'sentencizer', 'entity_ruler', 'merge_noun_chunks',
'merge_entities', 'merge_subtokens']
"""The list of spacy components that will be disabled.

NB! For these changes to take effect, the pipe would need to be recreated."""
checkpoint: CheckPoint = CheckPoint()
usage_monitor = UsageMonitor()
"""Checkpointing config"""
@@ -412,9 +415,13 @@ class Preprocessing(MixingConfig, BaseModel):
min_len_normalize: int = 5
"""Nothing below this length will ever be normalized (input tokens or concept names), normalized means lemmatized in this case"""
stopwords: Optional[set] = None
"""If None the default set of stowords from spacy will be used. This must be a Set."""
"""If None the default set of stopwords from spacy will be used. This must be a Set.

NB! For these changes to take effect, the pipe would need to be recreated."""
max_document_length: int = 1000000
"""Documents longer than this will be trimmed"""
"""Documents longer than this will be trimmed.

NB! For these changes to take effect, the pipe would need to be recreated."""

class Config:
extra = Extra.allow