CU-86956du3q revisit regression (#470)

* CU-86956du3q: Move to placeholder-based replacement
* CU-86956du3q: Update regression tests to a more reasonable state.
  Make sure to compare the correct annotation, not just hoping for any CUI annotated to match the one we are looking for.
  Output the specifics of the type of match that was found:
  - Identical
  - Bigger / smaller span
  - Random overlap
  - Parents / grandparents, or children

  Add strictness options to summary (success / failure).
* CU-86956du3q: Further fixes for regression checking:
  Remove 'Failure reason' and 'Failure descriptor' - now using Finding instead.
  Remove simplified success/failure metrics wherever relevant.
  Fix tests that relied on old logic and fix test-time replacement/cui location.
* CU-86956du3q: Add documentation for new classes and methods
* CU-86956du3q: Rename enum constant (SPAN_OVERLAP -> PARTIAL_OVERLAP)
* CU-86956du3q: Add matching for partially overlapping children
* CU-86956du3q: Add tests for partially overlapping children
* CU-86956du3q: Update regression checking to generate multiple sub-cases for multiple placeholders
* CU-86956du3q: Update some tests for new format
* CU-86956du3q: Remove old / unused / irrelevant tests and test-code
* CU-86956du3q: Some renaming (filter -> placeholders)
* CU-86956du3q: Add some additional fail safes for option set
* CU-86956du3q: Fix option set for only 1 placeholder
* CU-86956du3q: Fix targeting
* CU-86956du3q: Add tests for targeting
* CU-86956du3q: Remove MCT export conversion (at least for now)
* CU-86956du3q: Remove MCT export conversion tests (at least for now)
* CU-86956du3q: Remove suite editing (at least for now)
* CU-86956du3q: Remove category separation (at least for now)
* CU-86956du3q: Remove unused regression utils (at least for now)
* CU-86956du3q: Remove serialisation tests (at least for now)
* CU-86956du3q: Improve quality of default regression test set
* CU-86956du3q: Improve exceptions in targeting
* CU-86956du3q: Fix docstring issue regarding exceptions
* CU-86956du3q: Update test with correct exceptions
* CU-86956du3q: Add utils for partial substitutions and corresponding tests
* CU-86956du3q: Allow multiple of the same placeholder in a phrase. And more specifically, treat each one as their own sub-case
* CU-86956du3q: Add relevant tests for multi-placeholder checking
* CU-86956du3q: Allow changing of multiple pre-processing placeholders
* CU-86956du3q: Fix 1-placeholder sub-case yielding
* CU-86956du3q: Remove debug output
* CU-86956du3q: Replace separator (~) with whitespace when checking
* CU-86956du3q: Add utility method to limit string length for output
* CU-86956du3q: Improve string length limiting method
* CU-86956du3q: Add a few tests for string length limiting method
* CU-86956du3q: Add an ANYTHING strictness (mostly for example disabling)
* CU-86956du3q: Add storage of examples (of a certain strictness) as well as relevant output
* CU-86956du3q: Fix typo (missing ending bracket) in report output
* CU-86956du3q: Fix examples header appearing for every example
* CU-86956du3q: Print the same phrase fewer times for examples
* CU-86956du3q: Update fake CDB with (default) config
* CU-86956du3q: Add finding to examples and output
* CU-86956du3q: Add config to another fake CDB during test time
* CU-86956du3q: Allow strictness to propagate to parts when looking at examples
* CU-86956du3q: Add placeholder to examples output
* CU-86956du3q: Refactor report output generation slightly
* CU-86956du3q: Show all non-identical examples
* CU-86956du3q: Update example checking with strictness requirement (instead of simple boolean)
* CU-86956du3q: Simplify targeting somewhat (remove unnecessary method)
* CU-86956du3q: Allow changing of output phrase max length
* CU-86956du3q: Fix doc string for changed method
* CU-86956du3q: Small whitespace fix
* CU-86956du3q: Fix total-included checking iteration
* CU-86956du3q: Add strictness and max phrase length to CLI
* CU-86956du3q: Add example strictness to CLI
* CU-86956du3q: Fix default value for strictness in CLI
* CU-86956du3q: Update to use number of sub-cases for tqdm/progress bar
* CU-86956du3q: Remove option to set the total for progress bar (the automated one works fine now)
* CU-86956du3q: Simplify the progress bar by combining all cases
* CU-86956du3q: Split subcase iteration
* CU-86956du3q: Rename regression checker to regression suite
* CU-86956du3q: Streamline typing and the like by using intermediate data classes
* CU-86956du3q: Remove redundant method
* CU-86956du3q: Remove redundant method and accompanying test
* CU-86956du3q: Remove redundant class
* CU-86956du3q: Add another intermediate data class
* CU-86956du3q: Remove completed TODO notes and redundant method
* CU-86956du3q: Add documentation to new methods and classes. Simplify example keeping.
* CU-86956du3q: Small update for how default test suite is handled for CLI
* CU-86956du3q: Small to report output format
* CU-86956du3q: Add easier to read exception when unable to load a placeholder
* CU-86956du3q: Update percentages output to avoid as many decimal places
* CU-86956du3q: Use preferred name for run-to-run consistency
* CU-86956du3q: Update test time fake CDBs
* CU-86956du3q: Update default regression tests with new extensive (yet simple) test case
* CU-86956du3q: Add initial README for regression stuff
* CU-86956du3q: Add option for failing with having found another concept.
  Added other incorrect cui that was found (if applicable).
  Fixed issue with finding grandparents.
* CU-86956du3q: Add tests for parent and grandparent finding; fix tests for new changes (with optionally found alternative CUI)
* CU-86956du3q: Add preferred name to wrong CUI found
* CU-86956du3q: Fix tests for new form of determine cui description; add test for exact span grandchild
* CU-86956du3q: Fix determining partial matches for grandchildren and beyond
* CU-86956du3q: Add test for partial matches of grandchildren
* Fixing bug for metacat
  Fix issues with compute_class_weights JSON serialization and enforce fc2 usage when fc3 is enabled
* Resolved an issue where compute_class_weights returns a NumPy array, causing an error when saving the configuration as JSON (since JSON does not support NumPy arrays). The fix ensures compatibility by converting the NumPy array to a JSON-serializable format.
* Added a safeguard in the model_architecture_config for meta_cat_config. The current architecture assumes fc3 is only used when fc2 is enabled. If fc2 is set to False and fc3 is True, the model would fail due to a mismatch in hidden layer sizes. The fix automatically enables fc2 if fc3 is set to True, preventing potential errors.
* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version (#465)
* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version
* CU-86956duhb: Fix some doc string issues
* CU-86956duhb: Add deprecation decorator to old config-fix
* CU-86956duhb: Mark backporting method as deprecated and to be removed in 1.14
* CU-8694cd9t2: Allow merging config into model pack config before init (#462)
* CU-8694cd9t2: Allow merging config into model pack config before init
* CU-8694fwyje: Update all configs with pre-load parts documented (#473)
* CU-86956du3q: Add converter from MCT export
* CU-86956du3q: Add documentation to MCT export converter
* CU-86956du3q: Add option to create a regression suite from an MCT export
* CU-86956du3q: Add option to create a regression suite from an MCT export to CLI
* CU-86956du3q: Add a small note for converter placeholder
* CU-86956du3q: Add tests for MedCATtrainer export converter
* CU-86956du3q: Add tests for regression suite generation based on MCT export
* CU-86956du3q: Simplify regression case creation tests somewhat
* CU-86956du3q: Add option to create a regression suite YAML from MCT export
* CU-86956du3q: Add option to stop at MCT export conversion
* CU-86956du3q: Make use of only-prefnames option
* CU-86956du3q: Fix loading of only-prefnames option from yaml
* CU-86956du3q: Add comment for only using preferred names to the default regression suite yaml
* CU-86956du3q: Fix tests broken due to pref-name only change
* CU-86956du3q: Add utility method to set runtime doc strings for enum constants
* CU-86956du3q: Add tests for runtime doc string addition
* CU-86956du3q: Add more tests for runtime doc string addition (to make sure it fails without the change)
* CU-86956du3q: Make Finding enum have runtime doc strings
* CU-86956du3q: Add CLI option to show the various descriptions of the finding types (--only-describe)
* CU-86956du3q: Update dict and json methods for some results for JSON serialisation
* CU-86956du3q: Add a few json serialisation tests
* CU-86956du3q: Add json serialisation example strictness to CLI
* CU-86956du3q: Add a few more json serialisation tests
* CU-86956du3q: Add usage of regression suite name from the name of the file being read
* CU-86956du3q: Fix tests by adding the regression suite name where applicable
* CU-86956du3q: Avoid examples in ResultDescriptor
* CU-86956du3q: Make sure strictness propagates across all parts of a multi-result descriptor
* CU-86956du3q: Update tests: Use correct reporting for generating fake reports
* CU-86956du3q: Fix small test issue
* CU-86956du3q: Update tests for manual success/fail for results
* CU-86956du3q: Separate calculation section of report finding
* CU-86956du3q: Add a few more tests for report/results
* CU-86956du3q: Add option to force a non-0 exit status upon any regression test failure
* CU-86956du3q: Add files for regression model creation and checking
* CU-86956du3q: Add new part to main workflow to create and regression check a simple model pack
* CU-86956du3q: Update a mistyped comment
* CU-86956du3q: Make regression run at STRICTEST strictness at GHA workflow time
* CU-86956du3q: Fix strictness matrix for anything-typed strictness
* CU-86956du3q: Add strictness matrix information to --describe-only
* CU-86956du3q: Add python version to created model pack for test time
* CU-86956du3q: Use the python version of created model pack during test time to avoid conflicts with other python versions running in parallel
* CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking
* Revert "CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking"
  This reverts commit 4bf3089.
* CU-86956du3q: Make full model path the last line of the output upon creation model for regression
* CU-86956du3q: Move regression workflow logic to a separate bash script
* CU-86956du3q: Update comments in regression bash script
* CU-8694pz44d: Fix model cleanup during regression
* CU-86956du3q: Fix typos in utils
* CU-86956du3q: Fix a bunch of various typos in doc strings and comments

---------

Co-authored-by: shubham-s-agarwal <66172189+shubham-s-agarwal@users.noreply.github.com>
CogStack · Aug 28, 2024 · 7862182
1 parent 209c5e4
commit 7862182
Showing 29 changed files with 2,826 additions and 3,147 deletions.
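The commit message describes reporting the kind of match that was found (identical, bigger or smaller span, partial overlap). A minimal sketch of how such a span comparison could work is given below. The constant names mirror those printed in the report output, but the `classify_span` helper and the exact meaning given to each constant are assumptions made for illustration, not the code in `medcat.utils.regression`. The sketch also ignores the CUI side entirely (parents, children, other concepts).

```python
from enum import Enum, auto


class Finding(Enum):
    # Names as seen in the report output; semantics here are assumed.
    IDENTICAL = auto()
    BIGGER_SPAN_LEFT = auto()
    BIGGER_SPAN_RIGHT = auto()
    BIGGER_SPAN_BOTH = auto()
    SMALLER_SPAN = auto()
    PARTIAL_OVERLAP = auto()
    FAIL = auto()


def classify_span(exp_start, exp_end, found_start, found_end):
    """Compare a found [start, end) span with the expected one (hypothetical helper)."""
    if (found_start, found_end) == (exp_start, exp_end):
        return Finding.IDENTICAL
    if found_end <= exp_start or found_start >= exp_end:
        return Finding.FAIL  # the spans do not overlap at all
    covers_left = found_start <= exp_start
    covers_right = found_end >= exp_end
    if covers_left and covers_right:
        # The found span contains the expected one and extends past it.
        if found_start < exp_start and found_end > exp_end:
            return Finding.BIGGER_SPAN_BOTH
        if found_start < exp_start:
            return Finding.BIGGER_SPAN_LEFT
        return Finding.BIGGER_SPAN_RIGHT
    if found_start >= exp_start and found_end <= exp_end:
        return Finding.SMALLER_SPAN  # found span lies inside the expected one
    return Finding.PARTIAL_OVERLAP
```

With an expected span of `(10, 20)`, a found span of `(5, 20)` would be classified as `BIGGER_SPAN_LEFT` and `(5, 15)` as `PARTIAL_OVERLAP` under these assumed definitions.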
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -40,7 +40,8 @@ jobs:
           second_half_nl=$(echo "$all_files" | tail -n +$(($midpoint + 1)))
           timeout 25m python -m unittest ${first_half_nl[@]}
           timeout 25m python -m unittest ${second_half_nl[@]}
-
+      - name: Regression
+        run: source tests/resources/regression/run_regression.sh
       - name: Get the latest release version
         id: get_latest_release
         uses: actions/github-script@v6

diff --git a/configs/default_regression_tests.yml b/configs/default_regression_tests.yml
@@ -1,79 +1,142 @@
-# # Example of some test cases
-# # They will try to cover as many possible use cases as possible
-# # The idea is that the CUI corresponding to the name is expected to be
-# # obtained by MedCAT
-# # Only the 'filters' under 'targeting' and the 'phrases' under
-# # the test case are the two required sections, the rest is optional 
-#
-# test-case-name-1: # name of this test case
-#   targeting: # info regarding targets of this test case
-#     strategy: "ALL" # the strategy for dealing with the filters below
-#                     # so "ALL" means the targets need to match all the below filters
-#                     # and "ANY" means that the targets need to match at least one of the filters
-#                     # if only one type of target it specified, this is irrelevant
-#                     # the default value is "ALL" if not specified
-#     prefname-only: False # set to True if only prefered names should be checked (defaults to False)
-#     targfiltersets: # the filters for this specific test case
-#                     # there has to be one type of target, but multiple can be specified
-#                     # if multiple types are target, the strategy defined above is taken into affect
-#                     # each type can specify one or multiple values
-#                     #  this example shows has one values 
-#                     #  the next example (below) will have multiple values
-#       type_id: "0123" # type_id or type_ids
-#       cui: "01230" # the target CUI (or list of CUIS)
-#       name: "name0" # the target names
-#                      # all specified names need to exist within the CDB
-#   phrases: "The quick brown %s jumped over the lazy cat" # the phrases to go through
-#                                                          # for each phrases, '%s' is replaced
-#                                                          # by each name that is to be tested
-# test-case-name-2: # name of this test case
-#   targeting:
-#     filters:
-#       type_id: # multiple target type IDs
-#       - "123"
-#       - "223"
-#       cui: # multiple target CUI
-#       - "1234"
-#       - "2234"
-#       name: # multiple names
-#       - "name1"
-#       - "name2"
-#       cui_and_children: # an example with CUI and children
-#         cui: '111' # the CUI (or CUIs)
-#         depth: 2   # and the depth of children
-#   phrases:
-#   - "The %s was measured"
-#   - "The %s was not measured"
-#
-# # The following example was (rather arbitrarily) created and should work for
-# # the included SNOMED models
-test-case-1:
-  targeting:
-    strategy: "ALL"
-    filters:
-      type_id: "2680757"
-  phrases:
-  - "The %s was measured"
+# this is an example test case
+# it is based on SNOMED-CT
+test-case-1:  # The (somewhat) arbitrary name of the test case
+  targeting:  # the description of the replacement targets in the phrase(s)
+    placeholders:  # the placeholders to replace in the phrase(s)
+                   # Note that only 1 concept will be tested for at one time.
+                   # So if the phrase(s) has/have more than 1 placeholder, the
+                   # rest of them will be substituted in without care for whether
+                   # or how accurately the model is able to recognise them.
+                   # For the concepts that are not under test at a given time
+                   # the "first" name is used (because the implementation has
+                   # names in a set, there is possibility for run-to-run variance
+                   # because of different names being used).
+                   #
+                   # There are 2 modes for the placeholders:
+                   # 1. any-combination: false
+                   #   In this mode, only the concepts in the same position
+                   #   in the various lists are used in conjunction with one another.
+                   #   Though this also means that it is expected that all of the
+                   #   placeholders have the same number of CUIs to use.
+                   #   Assuming each of the N placeholders defines M replacement
+                   #   cuis, this approach produces M*N cases.
+                   # 2. any-combination: true
+                   #   In this mode, any combination of the replacement CUIs is
+                   #   allowed. This means that quite a few different combinations
+                   #   will be generated and used. It also means that different
+                   #   placeholders can have a different number of concepts suitable
+                   #   for them.
+                   #   Assuming each of the N placeholders defines M replacement
+                   #   cuis, this approach produces N * M^N (where `^` is power)
+                   #   cases. But for a more complicated set up (i.e. where different
+                   #   placeholders have a different number of swappable CUIs)
+                   #   this calculation is not as straightforward.
+                   #
+                   # NOTE: The above description does not take into account different
+                   #       number of names associated with different concepts. For each
+                   #       of the "primary" concepts, each possible name is attempted.
+      - placeholder: '[DISORDER]'  # the placeholder that will be substituted in the phrase(s)
+        cuis: ['4473006',  # Intracerebral hemorrhage
+               '85189001',  # Acute appendicitis
+               '186738001',  # vestibular neuritis
+               '186738001',  # vestibular neuritis
+              ]
+      - placeholder: '[FINDING1]'
+        cuis: ['162300006',  # unilateral headache
+               '21522001',  # abdominal pain
+               '103298005',  # severe vertigo
+               '103298005',  # severe vertigo
+              ]
+        prefname-only: false  # this is an optional keyword for each placeholder
+                              # if set to true, only the preferred name will be used for
+                              # this concept. Otherwise, all names will be used as
+                              # different sub-cases
+      - placeholder: '[FINDING2]'
+        cuis: ['409668002',  # photophobia
+               '422587007',  # nausea
+               '422587007',  # nausea
+               '422587007',  # nausea
+              ]
+      - placeholder: '[FINDING3]'
+        cuis: ['2228002',  # scintillating scotoma
+               '386661006',  # fever
+               '81756001',  # horizontal nystagmus
+               '81756001',  # horizontal nystagmus
+              ]
+      - placeholder: '[NEGFINDING]'
+        cuis: ['386661006',  # fever
+               '62315008',  # diarrhea
+               '15188001',  # hearing loss
+               '60862001',  # tinnitus
+              ]
+    any-combination: false  # if set to false, same length of CUIs is expected
+                            # for each placeholder and only same-position combinations are used
+  phrases:  # The list of phrases
+  - >
+      Description: [DISORDER]
+
+      CC: [FINDING1] on presentation; then developed [FINDING3]
+
+      HX: On the day of presentation, this 32 y/o RHM suddenly developed [FINDING1] and [FINDING2].
+      Four hours later he experienced sudden [FINDING3] lasting two hours.
+      There were no other associated symptoms except for the [FINDING1] and [FINDING2].
+      He denied [NEGFINDING].
 test-case-2:
   targeting:
-    filters:
-      type_id: "9090192"
-  phrases:
-  - "Patient presented with %s"
-  - "No %s was present"
-test-case-3:
-  targeting:
-    filters:
-      type_id: "67667581"
-  phrases:
-  - "The patient has been diagnosed with %s"
-  - "There are no signs of %s"
-test-case-4:
-  targeting:
-    strategy: "ALL"
-    filters:
-      cui_and_children:
-       cui: "364075005" # 'heart rate'
-       depth: 4         # and children 4 deep
+    placeholders:
+      - placeholder: '[FINDING1]'
+        cuis: ['49727002',  # cough
+               '29857009',  # chest pain
+               '21522001',  # abdominal pain
+               '57676002',  # joint pain
+               '25064002',  # headache
+               '271807003',  # fever
+               '162397003',  # hematuria (blood in urine)
+               '271757001',  # fatigue
+               '386661006',  # weight loss
+               '62315008',  # dysuria (painful urination)
+              ]
+      - placeholder: '[FINDING2]'
+        cuis: ['267036007',  # shortness of breath
+               '68962001',  # palpitations
+               '422587007',  # nausea
+               '182888003',  # swelling
+               '404640003',  # dizziness
+               '422400008',  # sore throat
+               '267036007',  # shortness of breath
+               '267064002',  # night sweats
+               '162607003',  # back pain
+               '267102003',  # urinary frequency
+              ]
+      - placeholder: '[DISORDER]'
+        cuis: ['195967001',  # asthma
+               '194828000',  # angina pectoris
+               '25374005',  # gastroenteritis
+               '69896004',  # rheumatoid arthritis
+               '37796009',  # migraine
+               '186747009',  # influenza
+               '106063007',  # urinary tract infection
+               '444814009',  # chronic fatigue syndrome
+               '95281007',  # tuberculosis
+               '431855005',  # cystitis
+        ]
+    any-combination: false
   phrases:
-  - "The patient's %s was 82 bps"
+  - >
+      The patient presents with [FINDING1] and [FINDING2]. These findings are suggestive of [DISORDER].
+      Further diagnostic evaluation and investigations are required to confirm the diagnosis.
+  - >
+      The patient reports [FINDING1] and has also been experiencing [FINDING2]. These symptoms are consistent with a clinical presentation of [DISORDER].
+      Further assessment and diagnostic tests are required to establish the underlying cause.
+  - >
+      Upon evaluation, the patient exhibits [FINDING1] along with [FINDING2]. This combination of findings raises suspicion for [DISORDER].
+      Comprehensive diagnostic workup is advised to confirm the diagnosis and plan appropriate management.
+  - >
+      During the consultation, the patient described [FINDING1] and noted a recent history of [FINDING2]. These clinical features are suggestive of [DISORDER].
+      Further investigation is necessary to verify the diagnosis and rule out other potential causes.
+  - >
+      The patient's symptoms include [FINDING1] and [FINDING2], which are commonly associated with [DISORDER].
+      It is recommended that additional diagnostic procedures be performed to confirm this working diagnosis.
+  - >
+      The clinical presentation of [FINDING1] and [FINDING2] is indicative of [DISORDER].
+      To ensure accurate diagnosis, further clinical evaluation and diagnostic tests are required.
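The two `any-combination` modes described in the YAML comments can be sketched with `zip` (same-position pairing) and `itertools.product` (every combination). The function names and the dictionary layout below are hypothetical and only illustrate the counting; this is not the suite's actual implementation. As the NOTE in the config says, the real number of sub-cases is larger because every name of the concept under test is tried.

```python
from itertools import product


def iter_combinations(cuis_by_placeholder, any_combination):
    """Yield one {placeholder: cui} mapping per combination (hypothetical helper)."""
    placeholders = list(cuis_by_placeholder)
    cui_lists = [cuis_by_placeholder[p] for p in placeholders]
    if any_combination:
        # Every CUI of one placeholder with every CUI of the others.
        combos = product(*cui_lists)
    else:
        # Only CUIs in the same list position are used together.
        if len({len(cuis) for cuis in cui_lists}) > 1:
            raise ValueError("all placeholders need the same number of CUIs")
        combos = zip(*cui_lists)
    for combo in combos:
        yield dict(zip(placeholders, combo))


def count_subcases(cuis_by_placeholder, any_combination):
    """Each combination is checked once per placeholder under test."""
    n_combos = sum(1 for _ in iter_combinations(cuis_by_placeholder, any_combination))
    return len(cuis_by_placeholder) * n_combos


# First two CUIs of each placeholder from test-case-2 (N = 3, M = 2).
example = {
    "[FINDING1]": ["49727002", "29857009"],
    "[FINDING2]": ["267036007", "68962001"],
    "[DISORDER]": ["195967001", "194828000"],
}
print(count_subcases(example, any_combination=False))  # 3 * 2 = 6
print(count_subcases(example, any_combination=True))   # 3 * 2**3 = 24
```

With N placeholders of M CUIs each, the same-position mode gives M combinations and the cross product gives M^N, which is why the second mode grows much faster.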
diff --git a/medcat/utils/regression/README.md b/medcat/utils/regression/README.md
@@ -0,0 +1,111 @@
+# Regression with MedCAT
+
+We often end up creating new models when a new version of an ontology (e.g. SNOMED-CT) comes out.
+However, it is not always clear whether the new model is comparable to the old one.
+To solve this, we've developed a regression suite system.
+
+The idea is that we can define a small set of patient records with different placeholders for different findings or disorders, or anything in the ontology, really.
+And we can then specify the concepts we think should fit in this patient record.
+
+An example patient record with placeholders (the simple one from the default regression suite):
+```
+The patient presents with [FINDING1] and [FINDING2]. These findings are suggestive of [DISORDER].
+Further diagnostic evaluation and investigations are required to confirm the diagnosis.
+```
+As we can see, there are three different placeholders in here: `[FINDING1]`, `[FINDING2]`, and `[DISORDER]`.
+Each can be replaced with a specific name of a specific concept.
+For instance, we've specified the following:
+ - `[FINDING1]` -> '49727002' (cough)
+ - `[FINDING2]` -> '267036007' (shortness of breath)
+ - `[DISORDER]` -> '195967001' (asthma)
+
+So with these swapped into the original patient record we get:
+```
+The patient presents with cough and shortness of breath. These findings are suggestive of asthma.
+Further diagnostic evaluation and investigations are required to confirm the diagnosis.
+```
+
+# Using regression suite
+
+The easiest way to use the regression suite is through the built-in command-line entry point:
+```
+python -m medcat.utils.regression.regression_checker <model pack name> [regression suite YAML]
+```
+While you need to specify a model pack, you do not need to specify a regression suite since the default one can be used instead.
+
+This will first read the regression suite from the YAML, then load the model pack, and finally run the regression suite.
+
+<details><summary>The output can look like this</summary>
+Output on the 2024-06 SNOMED-CT model on the first case in the default regression suite.
+
+```
+$ python -m medcat.utils.regression.regression_checker models/Snomed2024-06-gstt-trained_ae5b08e0fb5310b2.zip
+Loading RegressionChecker from yaml: configs/default_regression_tests.yml
+Loading model pack from file: models/Snomed2024-06-gstt-trained_ae5b08e0fb5310b2.zip
+Checking the current status
+100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:10<00:00,  1.96it/s]
+A total of 1 parts were kept track of within the group "ALL".
+And a total of 756 (sub)cases were checked.
+At the strictness level of Strictness.NORMAL (allowing ['FOUND_ANY_CHILD', 'BIGGER_SPAN_LEFT', 'SMALLER_SPAN', 'PARTIAL_OVERLAP', 'BIGGER_SPAN_BOTH', 'BIGGER_SPAN_RIGHT', 'FOUND_CHILD_PARTIAL', 'IDENTICAL']):
+The number of total successful (sub) cases: 737 (97.49%)
+The number of total failing (sub) cases   : 19 ( 2.51%)
+IDENTICAL               :       730 (96.56%)
+SMALLER_SPAN            :         2 ( 0.26%)
+FOUND_ANY_CHILD         :         5 ( 0.66%)
+FAIL                    :        19 ( 2.51%)
+	Tested 'test-case-1' for a total of 756 cases:
+		IDENTICAL               :       730 (96.56%)
+		SMALLER_SPAN            :         2 ( 0.26%)
+		FOUND_ANY_CHILD         :         5 ( 0.66%)
+		FAIL                    :        19 ( 2.51%)
+		Examples at Strictness.STRICTEST strictness
+		With phrase: 'Description: Acute appendicitis\nCC: abdo [277 chars] d Nausea. He denied Diarrhea.\n'
+			FOUND_ANY_CHILD for placeholder [FINDING1] with CUI '21522001' and name 'abdominal colic'
+		With phrase: 'Description: Acute appendicitis\nCC: [FIN [273 chars] d Nausea. He denied Diarrhea.\n'
+			SMALLER_SPAN for placeholder [FINDING1] with CUI '21522001' and name 'abdomen colic'
+		With phrase: 'Description: Acute appendicitis\nCC: abdo [273 chars] d Nausea. He denied Diarrhea.\n'
+			SMALLER_SPAN for placeholder [FINDING1] with CUI '21522001' and name 'abdomen colic'
+		With phrase: 'Description: Acute appendicitis\nCC: abdo [293 chars] d Nausea. He denied Diarrhea.\n'
+			FOUND_ANY_CHILD for placeholder [FINDING1] with CUI '21522001' and name 'abdominal colic finding'
+		With phrase: 'Description: Acute appendicitis\nCC: [FIN [271 chars] d Nausea. He denied Diarrhea.\n'
+			FAIL for placeholder [FINDING1] with CUI '21522001' and name 'abdomen pain'
+		With phrase: 'Description: Acute appendicitis\nCC: [FIN [271 chars] d Nausea. He denied Diarrhea.\n'
+			FAIL for placeholder [FINDING1] with CUI '21522001' and name 'colicky pain'
+		With phrase: 'Description: Acute appendicitis\nCC: coli [271 chars] d Nausea. He denied Diarrhea.\n'
+			FAIL for placeholder [FINDING1] with CUI '21522001' and name 'colicky pain'
+		With phrase: 'Description: Acute appendicitis\nCC: coli [271 chars] d Nausea. He denied Diarrhea.\n'
+			FAIL for placeholder [FINDING1] with CUI '21522001' and name 'colicky pain'
+		With phrase: 'Description: Acute appendicitis\nCC: Abdo [291 chars] d Nausea. He denied Diarrhea.\n'
+			FAIL for placeholder [FINDING3] with CUI '386661006' and name 'hyperthermia'
+		With phrase: 'Description: Acute appendicitis\nCC: Abdo [295 chars] d Nausea. He denied Diarrhea.\n'
+			FAIL for placeholder [FINDING3] with CUI '386661006' and name 'high temperature'
+		With phrase: 'Description: Acute appendicitis\nCC: Abdo [295 chars] d Nausea. He denied Diarrhea.\n'
+			FAIL for placeholder [FINDING3] with CUI '386661006' and name 'high temperature'
+		With phrase: 'Description: Migraine with aura\nCC: Unil [340 chars] obia. He denied [NEGFINDING].\n'
+			FAIL for placeholder [NEGFINDING] with CUI '386661006' and name 'hyperthermia'
+			FAIL for placeholder [NEGFINDING] with CUI '386661006' and name 'high temperature'
+		With phrase: 'Description: Acute appendicitis\nCC: Abdo [283 chars] usea. He denied [NEGFINDING].\n'
+			FAIL for placeholder [NEGFINDING] with CUI '62315008' and name 'loose stools'
+			FAIL for placeholder [NEGFINDING] with CUI '62315008' and name 'watery stool'
+			FAIL for placeholder [NEGFINDING] with CUI '62315008' and name 'loose bowel movement'
+			FOUND_ANY_CHILD for placeholder [NEGFINDING] with CUI '62315008' and name 'diarrhea symptom'
+			FAIL for placeholder [NEGFINDING] with CUI '62315008' and name 'loose bowel motion'
+			FAIL for placeholder [NEGFINDING] with CUI '62315008' and name 'loose bowel motions'
+			FAIL for placeholder [NEGFINDING] with CUI '62315008' and name 'loose stool'
+			FOUND_ANY_CHILD for placeholder [NEGFINDING] with CUI '62315008' and name 'diarrhea symptoms'
+			FOUND_ANY_CHILD for placeholder [NEGFINDING] with CUI '62315008' and name 'diarrhea symptom finding'
+			FAIL for placeholder [NEGFINDING] with CUI '62315008' and name 'watery stools'
+		With phrase: 'Description: Epidemic vertigo\nCC: Severe [311 chars] usea. He denied [NEGFINDING].\n'
+			FAIL for placeholder [NEGFINDING] with CUI '15188001' and name 'decreased hearing'
+			FAIL for placeholder [NEGFINDING] with CUI '15188001' and name 'decreased hearing finding'
+			FAIL for placeholder [NEGFINDING] with CUI '60862001' and name 'ringing in ear'
+```
+
+</details>
+
+## The regression suite format
+
+The format is documented in comments in the default suite (`configs/default_regression_tests.yml`).
+Refer to that file for now.
+
+
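The placeholder substitution walked through in the README can be sketched as follows. The `substitute` helper is hypothetical: it fills every placeholder with a chosen name and returns the character span where the name under test ended up, which is the position a regression check would then compare against the model's annotation. It assumes the first occurrence of the target placeholder is the one under test, which is a simplification of the multi-occurrence handling mentioned in the commit message.

```python
def substitute(phrase, names, target):
    """Fill all placeholders; return (text, start, end) for the target name.

    `names` maps each placeholder to the name to insert (hypothetical layout).
    """
    text = phrase
    # Fill in the placeholders that are not under test first, so that the
    # offset of the target is measured in the final text.
    for placeholder, name in names.items():
        if placeholder != target:
            text = text.replace(placeholder, name)
    start = text.index(target)  # first occurrence is treated as the one under test
    text = text.replace(target, names[target])
    return text, start, start + len(names[target])


phrase = ("The patient presents with [FINDING1] and [FINDING2]. "
          "These findings are suggestive of [DISORDER].")
names = {
    "[FINDING1]": "cough",
    "[FINDING2]": "shortness of breath",
    "[DISORDER]": "asthma",
}
text, start, end = substitute(phrase, names, target="[FINDING2]")
print(text)
print(text[start:end])
```

Replacing the non-target placeholders first keeps the bookkeeping simple: the target's offset is read from a text in which everything before it already has its final length.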