Issue #1080: Fixing parsing code for documentation retrieval in Python 3.13. #1082

Merged 8 commits on Jan 18, 2025
17 changes: 8 additions & 9 deletions skorch/classifier.py
@@ -1,6 +1,7 @@
"""NeuralNet subclasses for classification tasks."""

import re
import textwrap

import numpy as np
from sklearn.base import ClassifierMixin
@@ -40,16 +41,15 @@

"""


def get_neural_net_clf_doc(doc):
doc = neural_net_clf_doc_start + " " + doc.split("\n ", 4)[-1]
pattern = re.compile(r'(\n\s+)(criterion .*\n)(\s.+){1,99}')
indentation = " "
doc = neural_net_clf_doc_start + " " + textwrap.indent(textwrap.dedent(doc.split("\n", 5)[-1]), indentation)
pattern = re.compile(r'(\n\s+)(criterion .*\n)(\s.+|.){1,99}')
start, end = pattern.search(doc).span()
doc = doc[:start] + neural_net_clf_additional_text + doc[end:]
doc = doc + neural_net_clf_additional_attribute
doc = doc + textwrap.indent(neural_net_clf_additional_attribute, indentation)
Collaborator
Here the doc is indented again, but for the other classes this step is missing. I think it can be removed here, right?

Contributor Author
@raphaelrubrice Jan 11, 2025
Actually it cannot, here's why.

As it is currently defined, neural_net_clf_additional_attribute is not indented, whereas for the other additional texts (those for the NeuralNetRegressor and NeuralNetBinaryClassifier objects) the indentation is applied when the text to add is defined.
You would expect Python 3.13's automatic dedentation to also affect the other objects' additional texts, but it does not. So I suppose the automatic dedentation we identified for doc.split might stem from how the .split method generates the text and how that text is then processed by Python 3.13's new default string handling?

Here are some outputs:

3.13 with indentation of neural_net_clf_additional_attribute:

    _optimizers : list of str
      List of names of all optimizers. This list is collected dynamically when
      the net is initialized. Typically, there is no reason for a user to modify
      this list.

    classes_ : array, shape (n_classes, )
          A list of class labels known to the classifier.

3.13 without indentation of neural_net_clf_additional_attribute:

    _optimizers : list of str
      List of names of all optimizers. This list is collected dynamically when
      the net is initialized. Typically, there is no reason for a user to modify
      this list.

classes_ : array, shape (n_classes, )
      A list of class labels known to the classifier.

Same for Python 3.12.
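For reference, the interpreter behavior at play is CPython 3.13's compile-time stripping of common leading whitespace from docstrings; here is a minimal sketch of it (the Demo class is illustrative):

# Minimal sketch: the same source yields differently indented __doc__
# strings, because CPython 3.13 strips the common leading whitespace
# from docstrings at compile time.
class Demo:
    """Summary line.

    Indented body line.
    """

print(repr(Demo.__doc__))
# Python 3.12: 'Summary line.\n\n    Indented body line.\n    '
# Python 3.13: 'Summary line.\n\nIndented body line.\n'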

Collaborator
I see, thanks for explaining. The real issue is then how we define the neural_net_clf_additional_attribute constant:

neural_net_clf_additional_attribute = """classes_ : array, shape (n_classes, )
      A list of class labels known to the classifier.
"""

In contrast to the other doc snippets, this one lacks the newlines and indentation at the beginning. So how about we add those and then remove the textwrap.indent call here? That way, all three classes handle the __doc__ in a consistent manner.
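A sketch of how that could look (the exact leading whitespace below is an assumption, chosen to mirror the other snippets):

# Hypothetical shape of the constant, with the leading newlines and
# indentation baked in like the other doc snippets:
neural_net_clf_additional_attribute = """

    classes_ : array, shape (n_classes, )
      A list of class labels known to the classifier.
"""

# ...which would let get_neural_net_clf_doc append it directly:
# doc = doc + neural_net_clf_additional_attribute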

Contributor Author
@raphaelrubrice Jan 14, 2025

Yes indeed, it has now been corrected.

Should we consider the discussion resolved?

     return doc

-
 # pylint: disable=missing-docstring
 class NeuralNetClassifier(ClassifierMixin, NeuralNet):
     __doc__ = get_neural_net_clf_doc(NeuralNet.__doc__)
@@ -249,15 +249,14 @@ def predict(self, X):
       Probabilities above this threshold is classified as 1. ``threshold``
       is used by ``predict`` and ``predict_proba`` for classification."""

-
 def get_neural_net_binary_clf_doc(doc):
-    doc = neural_net_binary_clf_doc_start + " " + doc.split("\n ", 4)[-1]
-    pattern = re.compile(r'(\n\s+)(criterion .*\n)(\s.+){1,99}')
+    indentation = "    "
+    doc = neural_net_binary_clf_doc_start + " " + textwrap.indent(textwrap.dedent(doc.split("\n", 5)[-1]), indentation)
+    pattern = re.compile(r'(\n\s+)(criterion .*\n)(\s.+|.){1,99}')
     start, end = pattern.search(doc).span()
     doc = doc[:start] + neural_net_binary_clf_criterion_text + doc[end:]
     return doc

-
 class NeuralNetBinaryClassifier(ClassifierMixin, NeuralNet):
     # pylint: disable=missing-docstring
     __doc__ = get_neural_net_binary_clf_doc(NeuralNet.__doc__)
8 changes: 4 additions & 4 deletions skorch/regressor.py
@@ -1,6 +1,7 @@
"""NeuralNet subclasses for regression tasks."""

import re
import textwrap

from sklearn.base import RegressorMixin
import torch
@@ -23,15 +24,14 @@
     criterion : torch criterion (class, default=torch.nn.MSELoss)
       Mean squared error loss."""

-
 def get_neural_net_reg_doc(doc):
-    doc = neural_net_reg_doc_start + " " + doc.split("\n ", 4)[-1]
-    pattern = re.compile(r'(\n\s+)(criterion .*\n)(\s.+){1,99}')
+    indentation = "    "
+    doc = neural_net_reg_doc_start + " " + textwrap.indent(textwrap.dedent(doc.split("\n", 5)[-1]), indentation)
+    pattern = re.compile(r'(\n\s+)(criterion .*\n)(\s.+|.){1,99}')
     start, end = pattern.search(doc).span()
     doc = doc[:start] + neural_net_reg_criterion_text + doc[end:]
     return doc

-
 # pylint: disable=missing-docstring
 class NeuralNetRegressor(RegressorMixin, NeuralNet):
     __doc__ = get_neural_net_reg_doc(NeuralNet.__doc__)
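Putting the pieces together, here is a self-contained sketch of the parsing approach shared by all three helpers; the docstring and replacement text below are illustrative stand-ins, not skorch's actual constants:

import re
import textwrap

# Illustrative stand-in for a NeuralNet-style docstring body.
raw = """Parameters
----------
criterion : torch criterion (class, default=torch.nn.NLLLoss)
  Negative log likelihood loss.

optimizer : torch optim (class, default=torch.optim.SGD)
  The uninitialized optimizer.
"""

# Dedent first so the result is identical whether or not the interpreter
# already stripped the docstring's common indentation (Python 3.13),
# then re-indent uniformly.
indentation = "    "
doc = textwrap.indent(textwrap.dedent(raw), indentation)

# The widened repetition (\s.+|.) also tolerates continuation lines that
# no longer start with whitespace after dedenting.
pattern = re.compile(r'(\n\s+)(criterion .*\n)(\s.+|.){1,99}')
start, end = pattern.search(doc).span()

# Hypothetical replacement block, playing the role of the
# neural_net_*_criterion_text constants.
replacement = ("\n    criterion : torch criterion (class)"
               "\n      A task-specific default criterion.")
print(doc[:start] + replacement + doc[end:])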