Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/python docstrings for sd estimator #176

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

Dioprz
Copy link
Collaborator

@Dioprz Dioprz commented Sep 16, 2024

Description

Implementation of #169 for SDEstimator.

Formatting changes were made by Black.

Review process

  • A quick glance at the new docstrings would be great.

Estimator-specific doctests

make docker-build
make docker-run
pytest --doctest-modules -n auto -vv cryptographic_estimators/SDEstimator/

Cumulative test (with all the already migrated docstrings)

make docker-pytest

Pre-approval checklist

  • The code builds clean without any errors or warnings
  • I've added/updated tests
  • I've added/updated documentation if needed

Applied changes discussed at
#169
to SDEstimator.
Copy link
Collaborator Author

@Dioprz Dioprz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Sonarcloud test failing is quite... Dumb.

Basically, it complains because Black has improved the formatting of some files by splitting their content into multiple lines. This means that if we previously had one line of unreadable code duplicated, and it's now split into three lines of readable code, it complains because there are "too many duplicated lines of code". This feels incorrect, in my opinion.

Anyway, I can offer you two options:

  1. Ignore the Sonarcloud warning in this PR and future docstring-migration-related PRs, knowing that some of those checks could be sensible and reasonable, and address them later.

  2. Ignore the Sonarcloud warning in this PR, and disable Black for the incoming PRs so Sonarcloud doesn't complain on the next ones. (Very discouraged from my side, as we are losing a great opportunity to improve the code quality by making it standard on formatting. However, it can also help in case you feel tihs PR too noisy).

Both options include ignoring the Sonarcloud warning for this PR, as I don't have a quick way to undo the formatting changes, so basically it would imply making all the docstring changes from scratch.

Here I'm pointing out some examples of the changes introduced by Black.

@Dioprz
Copy link
Collaborator Author

Dioprz commented Sep 18, 2024

@Javierverbel Thank you so much for fixing the Sonarcloud warning 💯 .

Copy link
Collaborator

@FloydZ FloydZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only small comments, mainly about the citations and if the html doc generation still works

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py Outdated Show resolved Hide resolved
@@ -169,49 +159,41 @@ def get_optimal_parameters_dict(self):
return a

def __repr__(self):
"""
"""
""" """
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just remove it

+--------------------------+-------------------+-------------------+
| <-----+ n - k - l +----->|<--+ (k + l)/2 +-->|<--+ (k + l)/2 +-->|
| w - 2p | p | p |
+--------------------------+-------------------+-------------------+
INPUT:
Args:
problem (SDProblem): SDProblem object including all necessary parameters.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still a valid way to describe the arguments for the automatic html docs generation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understood correctly there is an automatic html doc generation extension for this format (this napoleon sphincs extension) @Dioprz ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct. Napoleon will produce the typical reST docstring as an intermediary step for a neat display.

INPUT:
The algorithm was introduced in [BJMM12] as an extension of [MMT11].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_

p_range (list, optional): Interval in which the parameter p is searched (default: [0, 25], helps speeding up computation).
memory_access (int, optional): Specifies the memory access cost model (default: 0, choices: 0 - constant, 1 - logarithmic, 2 - square-root, 3 - cube-root or deploy custom function which takes as input the logarithm of the total memory usage).
References:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok here we change the "citation type" from [MMT] to this. Is there a reason for this? Because I think this way leads to
1 ) alot of copy paste
2) errors because of that

I liked the first central way of collecting citations, but maybe there are reasons against it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue why that could happen. I mean, the information to create that reference section doesn't even exist in the previous docstring.

AI was making some really weird things here.

memory_access (int, optional): Specifies the memory access cost model (default: 0, choices: 0 - constant, 1 - logarithmic, 2 - square-root, 3 - cube-root or deploy custom function which takes as input the logarithm of the total memory usage).
References:
.. [EB22] Enguehard, Élise, and Anja Becker. "Quantum attacks on McEliece and Niederreiter cryptosystems." Designs, Codes and Cryptography 90.1 (2022): 1-23.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw this should be [Esser, Bellini]

Copy link
Collaborator Author

@Dioprz Dioprz Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have completely removed the reference section, as it didn't even exist in the original docstring.

@@ -28,24 +36,22 @@
class Stern(SDAlgorithm):
def __init__(self, problem: SDProblem, **kwargs):
"""
Construct an instance of Stern's estimator [Ste88]_, [BLP08]_.
Construct an instance of Stern's estimator.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

citation completely deleted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really bad.

Thanks for the catch.

sage: from cryptographic_estimators.SDEstimator import _gaussian_elimination_complexity
sage: _gaussian_elimination_complexity(n=100,k=20,r=1) # random
References:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

citation issue

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is the error, sorry? The changed format (year position, pages instead of pp., etc)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FloydZ could you please add the comment to the actual problematic line in the future, like this i always have to go to the file to check it 😇

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what floyd means is that citations have been removed. The citations [Bar07]_ [BLP08]_ should be mentioned. But the docstring was already inconsistent before

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am always marking the problematic position, but admittedly, the message in this case is not particularly good. I think the problem is, that if someone applies changes to this file after I commented, the line numbers are misaligned, which leads to this misaligned references.
Is is possible to mark multiple lines in such a comment? That would make it a lot easier.

Copy link
Collaborator Author

@Dioprz Dioprz Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FloydZ Yes, it's possible: click the first line of the block, then press shift, and click the end line of the block.


def to_bitcomplexity_time(self, basic_operations: float):
"""
Returns the bit-complexity corresponding to basic_operations field additions
Calculates the bit-complexity corresponding to the number of field additions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding:

, which are the basic_operations for SDAlgorithms

should be helpful. As we often refer to basic_operations in the help files

@Dioprz
Copy link
Collaborator Author

Dioprz commented Sep 19, 2024

only small comments, mainly about the citations and if the html doc generation still works

Thank you for the careful review, @FloydZ.

Some things that are worth explaining:

  • Sage uses reStructuredText-based docstrings, while Google-style docstrings don't (at least by default). This implies that:

    • The current HTML docs generation process will likely be incompatible with the new docstring style out of the box. I believe we will need, for example, the Sphinx Napoleon extension to render our new docstring style.
    • Because I wasn't expecting retrocompatibility, and the main goal right now is to have a pure-python executable library, I wasn't taking too much care to preserve the reST format.
    • This is relatively easy to fix later. Due to the citation format, we only need a regex expression to look for places where the format [XXXX] is present and modify them as needed.
  • Regarding the more serious errors like removing citations or creating new reference sections with incorrect information, these were the result of a very experimental prompt and workflow process on my side. Because I concentrated so much on not creating or modifying values for the Examples: or Tests: sections, I completely forgot to refine the process for the other sections of the docstrings.

  • Once I sent this PR, I immediately improved the prompt used as well as my workflow process with the learnings made. Now I can inspect and review more carefully what is being generated by AI, so these errors shouldn't be present in subsequent PRs.

With that said, I left a 👍🏻 in every comment I applied, and a 👀 in those that requires feedback or consideration based on this explanation. Resolve those that you think are good now, and ping me to address any others that are missing please.

"""
computes the expected runtime and memory consumption for the depth 3 version
"""
"""Computes the expected runtime and memory consumption for the depth 3 version."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we alway allow new line before and after the each """?
So """Computes the expected runtime and memory consumption for the depth 3 version.""" becomes

"""
Computes the expected runtime and memory consumption for the depth 3 version.
"""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that this is part of the google styling guide. If not I also prefer the newlines. But I would also make it dependent on if there is any format requirement by the doc generation by sphincs

Copy link
Collaborator Author

@Dioprz Dioprz Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the reason. A typical google-styled docstring looks like this:

def func(arg1, arg2):
    """Summary line.

    Extended description of function.

    Args:
        arg1 (int): Description of arg1
        arg2 (str): Description of arg2

    Returns:
        bool: Description of return value

    """
    return True

If only the summary line is provided, we should have something like:

def func(arg1, arg2):
    """Summary line. """
    return True

But I think it can be useful for future changes (if any), to provide the new line at the end like this:

def func(arg1, arg2):
    """Summary line. 
    """
    return True

So the extendend description of the function can be added without friction.

I would discourage to add a leading new line as you suggest @Javierverbel; because then future editors must care to fix the docstring from this:

def func(arg1, arg2):
    """
    Summary line. 
    """
    return True

To this:

def func(arg1, arg2):
    """Summary line. 

    Long description
    """
    return True

Do you agree with my proposal? @Javierverbel @Memphisd?

"""
Construct an instance of BJMM's estimator using *d*isjoint *w*eight distributions combined with
MitM-nearest neighbor search. [EB22]_, [MMT11]_, [BJMM12]_.
"""Construct an instance of BJMM's estimator using disjoint weight distributions combined with MitM-nearest neighbor search. [EB22]_, [MMT11]_, [BJMM12]_.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the underscores were not removed. @Dioprz we have to be careful with this cases when adding the _ back to those for which it was removed

Computes the expected runtime and memory consumption for a given parameter set.
"""

"""Computes the expected runtime and memory consumption for a given parameter set."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line before and after """

def __init__(
self, par_names: list, problem: SDProblem, iterations: int, accuracy: int
):
"""Optimization model for workfactor computation of BJMM algorithm in depth 3."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line before and after """

"""
Optimization model for workfactor computation of Prange's algorithm
"""
"""Optimization model for workfactor computation of Prange's algorithm."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line before and after """

computes the asymptotic runtime of the Nearest Neighbour Algorithm by
May-Ozerov [MO15]_
"""
"""Computes the asymptotic runtime of the Nearest Neighbour Algorithm by May-Ozerov [MO15]."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line before and after """ and _

Copy link
Collaborator

@Memphisd Memphisd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Just the AI changes of the docstrings sometimes lead to misunderstandings. We should be careful with reformulations of the explanations of the functions.

sage: from cryptographic_estimators.SDEstimator import SDProblem
sage: A = BallCollision(SDProblem(n=100,k=50,w=10))
sage: A.l()
Examples:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this napoleon extension for sphincs work? standard html doc generation via sphincs required capital spelling of EXAMPLES and TESTS and a double colon followed by a blank line? Maybe we should investigate this first to avoid later changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Napoleon creates and intermediary step to create the appropriate docstring expected by Sphinx.

cryptographic_estimators/SDEstimator/SDAlgorithms/bjmm.py Outdated Show resolved Hide resolved
expected weight distribution::
The expected weight distribution is:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line with the above it should be "Expected weight distribution:"

| <-----+ n - k - l +----->|<--+ (k + l)/2 +-->|<--+ (k + l)/2 +-->|
| w - 2p | p | p |
+--------------------------+-------------------+-------------------+
+---------------------------+-------------------+-------------------+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the indentation should be kept (in line with the above at least) and for sphincs this ensured being displayed in a codeblock without spaces being removed

@@ -64,7 +68,7 @@ def __init__(self, problem: SDProblem, **kwargs):

def initialize_parameter_ranges(self):
"""
initialize the parameters p, l, p1 and for d=3 p2
Initialize the parameters p, l, p1, and for d=3, p2.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this docstring was wrong before already, it should say "Initialize parameter range for d"

| <----+ n - k - l +----> |<-- l -->|<--+ k/2 +-->|<--+ k/2 +-->|
| w - 2p | 0 | p | p |
+-------------------------+---------+-------------+-------------+
+-------------------------+---------+-------------+-------------+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

"""Compute the time to construct each list.
The time to construct each list is the maximum of the size of the input
list and twice the size of the input list minus a constant.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be
The time to construct the lists is taken as the maximum of the involved lists sizes, i.e., the input and output lists.

via the sum of two length-$vector_length$ weight-($target_weight$/2+$weight_to_cancel$) vectors
"""Computes the asymptotic number of representations of a length-`vector_length` weight-`target_weight` vector.
This is done via the sum of two length-`vector_length` weight-(`target_weight`/2 + `weight_to_cancel`) vectors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the meaning has changed. Please update to

Computes the asymptotic number of representations of a length-vector_length weight-target_weight vector constructed as sum of two length-vector_length weight-(target_weight/2 + weight_to_cancel) vectors.

sage: from cryptographic_estimators.SDEstimator import _gaussian_elimination_complexity
sage: _gaussian_elimination_complexity(n=100,k=20,r=1) # random
References:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what floyd means is that citations have been removed. The citations [Bar07]_ [BLP08]_ should be mentioned. But the docstring was already inconsistent before

sage: from cryptographic_estimators.SDEstimator import _list_merge_async_complexity
sage: _list_merge_async_complexity(L1=2**16,L2=2**14,l=16,hmap=1) # random
"""
"""Compute the complexity of merging two lists exact.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compute the complexity of merging two different sized lists on l bits.

@Dioprz
Copy link
Collaborator Author

Dioprz commented Sep 20, 2024

Alright. Digging more in the Napoleon extension for Sphinx, I found that:

  1. It's a preparser able to translate from Google-styled docstrings to typical reST python docstrings.
  2. For this reason, we still have all the power of reST where needed.
  3. With that said, I strongly discourage the usage of reST elements where avoidable. Basically because it's usage requires to learn about how to express a mathematical expression, a table, or whatever else.
  4. Napoleon provides a good number of sections and functionalities for most use-cases (Supported sections, docstring types and some configuration options). So let's just apply the KISS principle here.
  5. I completely agree with making the exception for hyperlink references with the trailing _. I will apply the changes.
  6. @FloydZ I was using backticks for monospaced font rendering. But looking at the Sphinx documentation on the topic, single backtick usage is context-dependant. What do you think we should do with that then? (EDIT: Precisely what I'm asking for is: what we should do with references to, for example: function arguments references, short math formulas, etc?)

Nice to hear your thoughts too: @Memphisd @Javierverbel

EDIT: I'm not able to generate the docs to make tests, because some MAYO-related errors arise

@Dioprz
Copy link
Collaborator Author

Dioprz commented Sep 20, 2024

All the actionable comments were applied in 96c10fe. My last question in the previous comment shouldn't be a blocker in case we want to merge this now; but I'm also open to new comments.

Please don't review #178 and #179 until I have implemented the same changes as here. I will ping you all when those are done.

Copy link

sonarcloud bot commented Sep 20, 2024

@Javierverbel
Copy link
Collaborator

I looks good to me @Dioprz . The let's MAYO-related in the next PR docstring update, Maybe in #178

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants