Refactor/python docstrings for sd estimator #176

Dioprz · 2024-09-16T23:56:11Z

Description

Implementation of #169 for SDEstimator.

Formatting changes were made by Black.

Review process

A quick glance at the new docstrings would be great.

Estimator-specific doctests

make docker-build
make docker-run
pytest --doctest-modules -n auto -vv cryptographic_estimators/SDEstimator/

Cumulative test (with all the already migrated docstrings)

make docker-pytest

Pre-approval checklist

The code builds clean without any errors or warnings
I've added/updated tests
I've added/updated documentation if needed

Applied changes discussed at #169 to SDEstimator.

…cstrings_for_sd_estimator

Dioprz

The Sonarcloud test failing is quite... Dumb.

Basically, it complains because Black has improved the formatting of some files by splitting their content into multiple lines. This means that if we previously had one line of unreadable code duplicated, and it's now split into three lines of readable code, it complains because there are "too many duplicated lines of code". This feels incorrect, in my opinion.

Anyway, I can offer you two options:

Ignore the Sonarcloud warning in this PR and future docstring-migration-related PRs, knowing that some of those checks could be sensible and reasonable, and address them later.
Ignore the Sonarcloud warning in this PR, and disable Black for the incoming PRs so Sonarcloud doesn't complain on the next ones. (Very discouraged from my side, as we are losing a great opportunity to improve the code quality by making it standard on formatting. However, it can also help in case you feel tihs PR too noisy).

Both options include ignoring the Sonarcloud warning for this PR, as I don't have a quick way to undo the formatting changes, so basically it would imply making all the docstring changes from scratch.

Here I'm pointing out some examples of the changes introduced by Black.

cryptographic_estimators/SDEstimator/SDAlgorithms/may_ozerov.py

…cstrings_for_sd_estimator

Dioprz · 2024-09-18T01:03:26Z

@Javierverbel Thank you so much for fixing the Sonarcloud warning 💯 .

FloydZ

only small comments, mainly about the citations and if the html doc generation still works

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py

FloydZ · 2024-09-18T13:32:19Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py

@@ -169,49 +159,41 @@ def get_optimal_parameters_dict(self):
        return a

    def __repr__(self):
-        """
-        """
+        """ """


just remove it

FloydZ · 2024-09-18T13:34:03Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py


            +--------------------------+-------------------+-------------------+
            | <-----+ n - k - l +----->|<--+ (k + l)/2 +-->|<--+ (k + l)/2 +-->|
            |           w - 2p         |        p          |        p          |
            +--------------------------+-------------------+-------------------+

-        INPUT:
+        Args:
+            problem (SDProblem): SDProblem object including all necessary parameters.


is this still a valid way to describe the arguments for the automatic html docs generation?

if I understood correctly there is an automatic html doc generation extension for this format (this napoleon sphincs extension) @Dioprz ?

It's correct. Napoleon will produce the typical reST docstring as an intermediary step for a neat display.

FloydZ · 2024-09-18T13:39:05Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py


-        INPUT:
+        The algorithm was introduced in [BJMM12] as an extension of [MMT11].


FloydZ · 2024-09-18T13:42:52Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm_dw.py

+            p_range (list, optional): Interval in which the parameter p is searched (default: [0, 25], helps speeding up computation).
+            memory_access (int, optional): Specifies the memory access cost model (default: 0, choices: 0 - constant, 1 - logarithmic, 2 - square-root, 3 - cube-root or deploy custom function which takes as input the logarithm of the total memory usage).
+
+        References:


ok here we change the "citation type" from [MMT] to this. Is there a reason for this? Because I think this way leads to
1 ) alot of copy paste
2) errors because of that

I liked the first central way of collecting citations, but maybe there are reasons against it.

I have no clue why that could happen. I mean, the information to create that reference section doesn't even exist in the previous docstring.

AI was making some really weird things here.

FloydZ · 2024-09-18T13:43:32Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm_dw.py

+            memory_access (int, optional): Specifies the memory access cost model (default: 0, choices: 0 - constant, 1 - logarithmic, 2 - square-root, 3 - cube-root or deploy custom function which takes as input the logarithm of the total memory usage).
+
+        References:
+            .. [EB22] Enguehard, Élise, and Anja Becker. "Quantum attacks on McEliece and Niederreiter cryptosystems." Designs, Codes and Cryptography 90.1 (2022): 1-23.


Btw this should be [Esser, Bellini]

I have completely removed the reference section, as it didn't even exist in the original docstring.

cryptographic_estimators/SDEstimator/SDAlgorithms/dumer.py

FloydZ · 2024-09-18T13:47:48Z

cryptographic_estimators/SDEstimator/SDAlgorithms/stern.py

@@ -28,24 +36,22 @@
 class Stern(SDAlgorithm):
    def __init__(self, problem: SDProblem, **kwargs):
        """
-        Construct an instance of Stern's estimator [Ste88]_, [BLP08]_.
+        Construct an instance of Stern's estimator.


citation completely deleted.

This is really bad.

Thanks for the catch.

FloydZ · 2024-09-18T13:52:01Z

cryptographic_estimators/SDEstimator/sd_helper.py

-
-        sage: from cryptographic_estimators.SDEstimator import _gaussian_elimination_complexity
-        sage: _gaussian_elimination_complexity(n=100,k=20,r=1) # random
+    References:


citation issue

Which is the error, sorry? The changed format (year position, pages instead of pp., etc)?

@FloydZ could you please add the comment to the actual problematic line in the future, like this i always have to go to the file to check it 😇

what floyd means is that citations have been removed. The citations [Bar07]_ [BLP08]_ should be mentioned. But the docstring was already inconsistent before

I am always marking the problematic position, but admittedly, the message in this case is not particularly good. I think the problem is, that if someone applies changes to this file after I commented, the line numbers are misaligned, which leads to this misaligned references.
Is is possible to mark multiple lines in such a comment? That would make it a lot easier.

@FloydZ Yes, it's possible: click the first line of the block, then press shift, and click the end line of the block.

FloydZ · 2024-09-18T13:54:08Z

cryptographic_estimators/SDEstimator/sd_problem.py


    def to_bitcomplexity_time(self, basic_operations: float):
        """
-        Returns the bit-complexity corresponding to basic_operations field additions
+        Calculates the bit-complexity corresponding to the number of field additions.


I think adding:

, which are the basic_operations for SDAlgorithms

should be helpful. As we often refer to basic_operations in the help files

Dioprz · 2024-09-19T00:25:03Z

only small comments, mainly about the citations and if the html doc generation still works

Thank you for the careful review, @FloydZ.

Some things that are worth explaining:

Sage uses reStructuredText-based docstrings, while Google-style docstrings don't (at least by default). This implies that:
- The current HTML docs generation process will likely be incompatible with the new docstring style out of the box. I believe we will need, for example, the Sphinx Napoleon extension to render our new docstring style.
- Because I wasn't expecting retrocompatibility, and the main goal right now is to have a pure-python executable library, I wasn't taking too much care to preserve the reST format.
- This is relatively easy to fix later. Due to the citation format, we only need a regex expression to look for places where the format [XXXX] is present and modify them as needed.
Regarding the more serious errors like removing citations or creating new reference sections with incorrect information, these were the result of a very experimental prompt and workflow process on my side. Because I concentrated so much on not creating or modifying values for the Examples: or Tests: sections, I completely forgot to refine the process for the other sections of the docstrings.
Once I sent this PR, I immediately improved the prompt used as well as my workflow process with the learnings made. Now I can inspect and review more carefully what is being generated by AI, so these errors shouldn't be present in subsequent PRs.

With that said, I left a 👍🏻 in every comment I applied, and a 👀 in those that requires feedback or consideration based on this explanation. Resolve those that you think are good now, and ping me to address any others that are missing please.

cryptographic_estimators/SDEstimator/SDAlgorithms/stern.py

Javierverbel · 2024-09-20T05:18:00Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py

-        """
-        computes the expected runtime and memory consumption for the depth 3 version
-        """
+        """Computes the expected runtime and memory consumption for the depth 3 version."""


Should we alway allow new line before and after the each """?
So """Computes the expected runtime and memory consumption for the depth 3 version.""" becomes

""" Computes the expected runtime and memory consumption for the depth 3 version. """

I assumed that this is part of the google styling guide. If not I also prefer the newlines. But I would also make it dependent on if there is any format requirement by the doc generation by sphincs

Yes, that's the reason. A typical google-styled docstring looks like this:

def func(arg1, arg2): """Summary line. Extended description of function. Args: arg1 (int): Description of arg1 arg2 (str): Description of arg2 Returns: bool: Description of return value """ return True

If only the summary line is provided, we should have something like:

def func(arg1, arg2): """Summary line. """ return True

But I think it can be useful for future changes (if any), to provide the new line at the end like this:

def func(arg1, arg2): """Summary line. """ return True

So the extendend description of the function can be added without friction.

I would discourage to add a leading new line as you suggest @Javierverbel; because then future editors must care to fix the docstring from this:

def func(arg1, arg2): """ Summary line. """ return True

To this:

def func(arg1, arg2): """Summary line. Long description """ return True

Do you agree with my proposal? @Javierverbel @Memphisd?

Javierverbel · 2024-09-20T05:19:37Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm_dw.py

-        """
-        Construct an instance of BJMM's estimator using *d*isjoint *w*eight distributions combined with
-         MitM-nearest neighbor search. [EB22]_, [MMT11]_, [BJMM12]_.
+        """Construct an instance of BJMM's estimator using disjoint weight distributions combined with MitM-nearest neighbor search. [EB22]_, [MMT11]_, [BJMM12]_.


Here the underscores were not removed. @Dioprz we have to be careful with this cases when adding the _ back to those for which it was removed

Javierverbel · 2024-09-20T05:20:35Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm_dw.py

-        Computes the expected runtime and memory consumption for a given parameter set.
-        """
-
+        """Computes the expected runtime and memory consumption for a given parameter set."""


New line before and after """

Javierverbel · 2024-09-20T05:23:24Z

cryptographic_estimators/SDEstimator/SDWorkfactorModels/bjmm.py

+    def __init__(
+        self, par_names: list, problem: SDProblem, iterations: int, accuracy: int
+    ):
+        """Optimization model for workfactor computation of BJMM algorithm in depth 3."""


New line before and after """

Javierverbel · 2024-09-20T05:23:47Z

cryptographic_estimators/SDEstimator/SDWorkfactorModels/prange.py

-        """
-        Optimization model for workfactor computation of Prange's algorithm
-        """
+        """Optimization model for workfactor computation of Prange's algorithm."""


New line before and after """

Javierverbel · 2024-09-20T05:24:05Z

cryptographic_estimators/SDEstimator/SDWorkfactorModels/workfactor_helper.py

-    computes the asymptotic runtime of the Nearest Neighbour Algorithm by
-    May-Ozerov [MO15]_
-    """
+    """Computes the asymptotic runtime of the Nearest Neighbour Algorithm by May-Ozerov [MO15]."""


New line before and after """ and _

Memphisd

Overall looks good. Just the AI changes of the docstrings sometimes lead to misunderstandings. We should be careful with reformulations of the explanations of the functions.

Memphisd · 2024-09-20T05:15:54Z

cryptographic_estimators/SDEstimator/SDAlgorithms/ball_collision.py

-            sage: from cryptographic_estimators.SDEstimator import SDProblem
-            sage: A = BallCollision(SDProblem(n=100,k=50,w=10))
-            sage: A.l()
+        Examples:


how does this napoleon extension for sphincs work? standard html doc generation via sphincs required capital spelling of EXAMPLES and TESTS and a double colon followed by a blank line? Maybe we should investigate this first to avoid later changes?

Napoleon creates and intermediary step to create the appropriate docstring expected by Sphinx.

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py

Memphisd · 2024-09-20T05:19:14Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py


-        expected weight distribution::
+        The expected weight distribution is:


In line with the above it should be "Expected weight distribution:"

Memphisd · 2024-09-20T05:20:09Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py

-            | <-----+ n - k - l +----->|<--+ (k + l)/2 +-->|<--+ (k + l)/2 +-->|
-            |           w - 2p         |        p          |        p          |
-            +--------------------------+-------------------+-------------------+
+        +---------------------------+-------------------+-------------------+


the indentation should be kept (in line with the above at least) and for sphincs this ensured being displayed in a codeblock without spaces being removed

Memphisd · 2024-09-20T05:22:22Z

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py

@@ -64,7 +68,7 @@ def __init__(self, problem: SDProblem, **kwargs):

    def initialize_parameter_ranges(self):
        """
-        initialize the parameters p, l, p1 and for d=3 p2
+        Initialize the parameters p, l, p1, and for d=3, p2.


this docstring was wrong before already, it should say "Initialize parameter range for d"

Memphisd · 2024-09-20T05:34:52Z

cryptographic_estimators/SDEstimator/SDAlgorithms/stern.py

-            | <----+ n - k - l +----> |<-- l -->|<--+ k/2 +-->|<--+ k/2 +-->|
-            |          w - 2p         |    0    |      p      |      p      |
-            +-------------------------+---------+-------------+-------------+
+        +-------------------------+---------+-------------+-------------+


Memphisd · 2024-09-20T05:36:48Z

cryptographic_estimators/SDEstimator/SDWorkfactorModels/bjmm.py

+        """Compute the time to construct each list.
+
+        The time to construct each list is the maximum of the size of the input
+        list and twice the size of the input list minus a constant.


Should be
The time to construct the lists is taken as the maximum of the involved lists sizes, i.e., the input and output lists.

Memphisd · 2024-09-20T05:39:30Z

cryptographic_estimators/SDEstimator/SDWorkfactorModels/workfactor_helper.py

-    via the sum of two length-$vector_length$ weight-($target_weight$/2+$weight_to_cancel$) vectors
+    """Computes the asymptotic number of representations of a length-`vector_length` weight-`target_weight` vector.
+
+    This is done via the sum of two length-`vector_length` weight-(`target_weight`/2 + `weight_to_cancel`) vectors.


the meaning has changed. Please update to

Computes the asymptotic number of representations of a length-vector_length weight-target_weight vector constructed as sum of two length-vector_length weight-(target_weight/2 + weight_to_cancel) vectors.

Memphisd · 2024-09-20T05:41:01Z

cryptographic_estimators/SDEstimator/sd_helper.py

-
-        sage: from cryptographic_estimators.SDEstimator import _gaussian_elimination_complexity
-        sage: _gaussian_elimination_complexity(n=100,k=20,r=1) # random
+    References:


what floyd means is that citations have been removed. The citations [Bar07]_ [BLP08]_ should be mentioned. But the docstring was already inconsistent before

Memphisd · 2024-09-20T05:42:56Z

cryptographic_estimators/SDEstimator/sd_helper.py

-        sage: from cryptographic_estimators.SDEstimator import _list_merge_async_complexity
-        sage: _list_merge_async_complexity(L1=2**16,L2=2**14,l=16,hmap=1) # random
-    """
+    """Compute the complexity of merging two lists exact.


Compute the complexity of merging two different sized lists on l bits.

Dioprz · 2024-09-20T08:05:29Z

Alright. Digging more in the Napoleon extension for Sphinx, I found that:

It's a preparser able to translate from Google-styled docstrings to typical reST python docstrings.
For this reason, we still have all the power of reST where needed.
With that said, I strongly discourage the usage of reST elements where avoidable. Basically because it's usage requires to learn about how to express a mathematical expression, a table, or whatever else.
Napoleon provides a good number of sections and functionalities for most use-cases (Supported sections, docstring types and some configuration options). So let's just apply the KISS principle here.
I completely agree with making the exception for hyperlink references with the trailing _. I will apply the changes.
@FloydZ I was using backticks for monospaced font rendering. But looking at the Sphinx documentation on the topic, single backtick usage is context-dependant. What do you think we should do with that then? (EDIT: Precisely what I'm asking for is: what we should do with references to, for example: function arguments references, short math formulas, etc?)

Nice to hear your thoughts too: @Memphisd @Javierverbel

EDIT: I'm not able to generate the docs to make tests, because some MAYO-related errors arise

Dioprz · 2024-09-20T10:21:29Z

All the actionable comments were applied in 96c10fe. My last question in the previous comment shouldn't be a blocker in case we want to merge this now; but I'm also open to new comments.

Please don't review #178 and #179 until I have implemented the same changes as here. I will ping you all when those are done.

sonarcloud · 2024-09-20T10:45:37Z

Quality Gate passed

Issues
25 New issues
0 Accepted issues

Measures
0 Security Hotspots
68.0% Coverage on New Code
0.1% Duplication on New Code

See analysis details on SonarCloud

Javierverbel · 2024-09-20T11:09:11Z

I looks good to me @Dioprz . The let's MAYO-related in the next PR docstring update, Maybe in #178

refactor: sage docstrings migration to python SDE

87e2d2e

Applied changes discussed at #169 to SDEstimator.

Dioprz requested review from Memphisd, FloydZ and Javierverbel September 16, 2024 23:56

Dioprz added the priority high label Sep 16, 2024

Merge remote-tracking branch 'origin/develop' into refactor/python_do…

a8d3e13

…cstrings_for_sd_estimator

Dioprz commented Sep 17, 2024

View reviewed changes

cryptographic_estimators/SDEstimator/SDAlgorithms/may_ozerov.py Outdated Show resolved Hide resolved

cryptographic_estimators/SDEstimator/SDAlgorithms/may_ozerov.py Show resolved Hide resolved

cryptographic_estimators/SDEstimator/SDAlgorithms/may_ozerov.py Show resolved Hide resolved

Dioprz and others added 6 commits September 17, 2024 01:53

refactor: apply Black with 120 line-length

a00fae4

Merge remote-tracking branch 'origin/develop' into refactor/python_do…

2bfa0d2

…cstrings_for_sd_estimator

Fix: SonarClould duplication code

e7dbde8

Fix: SonarClould duplication code

b669f61

Fix: SonarClould duplication code

b8741e5

Fix: Reliability issues from SonarCloud

8217714

FloydZ reviewed Sep 18, 2024

View reviewed changes

fix: poorly migrated docstrings

e4a8448

Merge branch 'develop' into refactor/python_docstrings_for_sd_estimator

689d0e4

Javierverbel reviewed Sep 20, 2024

View reviewed changes

cryptographic_estimators/SDEstimator/SDAlgorithms/stern.py Show resolved Hide resolved

Javierverbel reviewed Sep 20, 2024

View reviewed changes

Javierverbel requested changes Sep 20, 2024

View reviewed changes

Memphisd requested changes Sep 20, 2024

View reviewed changes

fix: docstring style consistency and references

96c10fe

Dioprz requested review from Memphisd, Javierverbel and FloydZ September 20, 2024 10:16

Javierverbel approved these changes Sep 20, 2024

View reviewed changes


		INPUT:
		The algorithm was introduced in [BJMM12] as an extension of [MMT11].


		expected weight distribution::
		The expected weight distribution is:

Refactor/python docstrings for sd estimator #176

Are you sure you want to change the base?

Refactor/python docstrings for sd estimator #176

Conversation

Dioprz commented Sep 16, 2024 • edited Loading

Description

Review process

Pre-approval checklist

Dioprz left a comment

Choose a reason for hiding this comment

Dioprz commented Sep 18, 2024

FloydZ left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dioprz Sep 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dioprz Sep 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dioprz commented Sep 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dioprz Sep 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Memphisd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dioprz commented Sep 20, 2024 • edited Loading

Dioprz commented Sep 20, 2024

sonarcloud bot commented Sep 20, 2024

Quality Gate passed

Javierverbel commented Sep 20, 2024

Dioprz commented Sep 16, 2024 •

edited

Loading

Dioprz Sep 18, 2024 •

edited

Loading

Dioprz Sep 20, 2024 •

edited

Loading

Dioprz commented Sep 19, 2024 •

edited

Loading

Dioprz Sep 20, 2024 •

edited

Loading

Dioprz commented Sep 20, 2024 •

edited

Loading