-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #1080 : Fixing Parsing code for documentation retrieval in Python 3.13. #1082
Conversation
replaced old parsing code to retrieve criterion section of NeuralNet documentation from (\n\s+)(criterion .*\n)(\s.+){1,99} to (\n\s+)(criterion .*\n)(\s.+|.){1,99}. This ensures proper parsing in both 3.13 and previous python versions.
replaced old parsing code to retrieve criterion section of NeuralNet documentation from "(\n\s+)(criterion .*\n)(\s.+){1,99}" to "(\n\s+)(criterion .*\n)(\s.+|.){1,99}". This ensures proper parsing in both 3.13 and previous python versions.
…was previously cutting out the criterion section so even with the proper regexp for 3.13 and below the pattern was not matching. Now works fine.
Thanks for the PR. I checked the docstring based on your branch with Python 3.13 and found that some paragraphs are missing. This is what I get:
Is it the same for you? What I would expect to see is:
Do you see the same results? Apart from that, we could try to fix the inconsistent indentation for Python 3.13 using |
Thanks for catching that, this was stemming from the fact that in Python 3.13 there is one Fixing the argument back to "\n" solved that. However, there was the I then proceeded to use Once again I tested imports from both Python 3.13 and Python 3.12 and both works with correct documentation retrieval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. I verified locally that the docstrings are now identical for Python 3.12 and 3.13.
Regarding the dedent/indent roundtrip, as it may not be quite obvious when reading the code, could you please add a comment on why that is done?
Furthermore, I noticed that the functions are using 5 spaces in the body, could you please revert that to 4 spaces?
…extwrap.dedent in documentation retrieval for NeuralNetClassifier, NeuralNetBinaryClassifier, NeuralNetRegressor
…ious commit is wrong about the use of textwrap.dedent, it is indeed necessary for correct functionality in both Python 3.13 and 3.12. Reverted to version with textwrap.dedent in documentation retrieval.
Okay so I corrected functions body. About the roundtrip, I orginally forgot why I had done that after reading your comment, went on testing in Pyhthon 3.13 found that it was not necessary in the version and committed. That was a mistake thus the followig sections explains why we actually need this roundtrip and thus why I have done 2 commits (the last one reverted the removal of When using only a simple
So to ensure correct indentation in both versions, performing a dedentation operation before applying indentation ensures proper indentation :
This is the rationale behind the use of the roundtrip. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the further explanation. To summarize: Since the change in Python 3.13 is basically that the docstring is automatically dedented, it makes sense to do dedent + indent to normalize the indentation for <3.13 and >= 3.13. Let's add this as a comment for future readers.
skorch/classifier.py
Outdated
start, end = pattern.search(doc).span() | ||
doc = doc[:start] + neural_net_clf_additional_text + doc[end:] | ||
doc = doc + neural_net_clf_additional_attribute | ||
doc = doc + textwrap.indent(neural_net_clf_additional_attribute, indentation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the doc is indented again but for the other classes, this step is missing. I think it can be removed here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it cannot, here's why.
As it is currently defined, neural_net_clf_additional_attribute
is not indented whereas for other additional texts for NeuralNetRegressor and NeuralNetBinaryClassifier objects, the indentation is done when defining the text to add.
You would expect 3.13 automatic dedentation to also affect other objects additional texts but it does not. Thus I suppose that the automatic dedentation we identified for doc.split
might stem from how the .split
method generates the text and how it is then processed by python 3.13's new default string management ?
Here are some outputs :
3.13 with indentation of neural_net_clf_additional_attribute
:
_optimizers : list of str
List of names of all optimizers. This list is collected dynamically when
the net is initialized. Typically, there is no reason for a user to modify
this list.
classes_ : array, shape (n_classes, )
A list of class labels known to the classifier.
3.13 without indentation of neural_net_clf_additional_attribute
:
_optimizers : list of str
List of names of all optimizers. This list is collected dynamically when
the net is initialized. Typically, there is no reason for a user to modify
this list.
classes_ : array, shape (n_classes, )
A list of class labels known to the classifier.
Same for Python 3.12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for explaining. The real issue is than how we define the neural_net_clf_additional_attribute
constant:
Lines 38 to 41 in 547d3ff
neural_net_clf_additional_attribute = """classes_ : array, shape (n_classes, ) | |
A list of class labels known to the classifier. | |
""" |
In contrast to the other doc snippets, this one lacks the new lines and indentation at the beginning. So how about we add those and then remove the textwrap.indent
call here? That way, all three classes handle the __doc__
in a consistent manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed, it has now been corrected.
Should we consider the discussion resolved ?
…of textwrap.indent as a consequence of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for fixing the docstring issue with Python 3.13 and ensuring consistency between the versions. I tested it locally and the docstrings are identical.
Fixes parsing code so that documentation from NeuralNet is properly processed when adapting the documentation for NeuralNetClassifier, NeuralNetBinaryClassifier and NeuralNetRegressor classes in Python 3.13 and below.
See Issue #1080 for more details.