Fix for print_docstring()'s docstring.find(quote) Type error #502
I need to investigate and think about this some. In something like:
Is the binary string a docstring or an unused string expression analogous to:
Of course, uncompyle6 should not be throwing an error, whichever way it interprets this.
Indeed! I blindly assumed that it was, as I was in the print_docstring function.
@gdesmar If you are up for research, I would appreciate enlightenment on when something is a real docstring and when it is some kind of string expression across the various Python versions. Consider this program:

"""Module docstring"""
class A:
    b"""Got \xe7\xfe Bytes?"""
    print(__doc__)
def a_func():
    """function docstring?"""
    print(__doc__)
print(__doc__)
a_func()
A()

When run in Python 3.12, it outputs:
If you remove the "b" in the class string, it turns into a docstring, and when we run it we get:
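The distinction between a docstring and an unused string expression can also be checked without running the class body. The sketch below assumes Python 3 and uses the standard ast module; the helper name class_docstring and the two source snippets are illustrative only and are not part of uncompyle6.

```python
import ast

# Two tiny modules: one class with a text docstring, one with a bytes literal.
TEXT_SRC = 'class A:\n    """Got no Bytes?"""\n'
BYTES_SRC = 'class A:\n    b"""Got \\xe7\\xfe Bytes?"""\n'


def class_docstring(source):
    """Return the docstring that ast reports for the first class in source."""
    tree = ast.parse(source)
    # ast.get_docstring only accepts a leading *str* constant as a docstring.
    return ast.get_docstring(tree.body[0])


text_doc = class_docstring(TEXT_SRC)    # the text literal is a docstring
bytes_doc = class_docstring(BYTES_SRC)  # the bytes literal is not
print(repr(text_doc), repr(bytes_doc))
```

Under Python 3 the bytes literal is reported as having no docstring, which matches reading it as an unused expression.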
Considering the following code (an extension of what you provided):

"""Module docstring"""
class A:
    b"""Got \xe7\xfe Bytes?"""
    print(__doc__)
    def class_func(self):
        b"""Got \xe7\xfe Bytes?"""
        print(__doc__)
class B:
    """Got no Bytes?"""
    print(__doc__)
    def class_func(self):
        """Got no Bytes?"""
        print(__doc__)
def single_func():
    """single docstring?"""
    print(__doc__)
def single_byte_func():
    b"""Got \xe7\xfe Bytes?"""
    print(__doc__)
print("\nStart of program:")
print("Module:")
print(__doc__)
print("\nSingle function:")
print(single_func.__doc__)
single_func()
print("\nSingle Byte function:")
print(single_byte_func.__doc__)
single_byte_func()
print("\nClass A:")
print(A.__doc__)
print(A.class_func.__doc__)
a = A()
print(a.class_func.__doc__)
a.class_func()
print("\nClass B:")
print(B.__doc__)
print(B.class_func.__doc__)
b = B()
print(b.class_func.__doc__)
b.class_func()

The behaviour is different based on the python version used.
This confirms the logic that the docstring cannot be a byte string in those versions. It also raises an interesting discussion on which docstring the variable __doc__ refers to.

I've also run it with python 2.7.18:
So it looks like python 2.7 supports byte docstrings. That first line, piped into hexdump, would show the expected values:

This difference also explains my initial statement of "python 3.8 is dropping the [byte] docstring", when it may simply be treating it as an unused string expression and optimizing it out of the bytecode. Compiling the script and running uncompyle6 on the result may confirm this for python 3.8 (edited to remove some empty lines and uninteresting prints at the end). It interestingly shows how the

"""Module docstring"""
class A:
    print(__doc__)
    def class_func(self):
        print(__doc__)
class B:
    __doc__ = "Got no Bytes?"
    print(__doc__)
    def class_func(self):
        """Got no Bytes?"""
        print(__doc__)
def single_func():
    """single docstring?"""
    print(__doc__)
def single_byte_func():
    print(__doc__)

For python 2.7, it showed another "problem", as it errored out. I fixed it by making the function n_docstring() in n_action.py call the print_docstring function, as it was identical.

"""Module docstring"""
class A:
    r"""Got \xe7\xfe Bytes?"""
    print __doc__
    def class_func(self):
        r"""Got \xe7\xfe Bytes?"""
        print __doc__
class B:
    """Got no Bytes?"""
    print __doc__
    def class_func(self):
        """Got no Bytes?"""
        print __doc__
def single_func():
    """single docstring?"""
    print __doc__
def single_byte_func():
    r"""Got \xe7\xfe Bytes?"""
    print __doc__

On another note, if someone absolutely wants a byte docstring in python 3.8, they can change their code to the following 😢 :

class A:
    __doc__ = b"""Got \xe7\xfe Bytes?"""
    print(__doc__)

Which gives this, considering I didn't change the docstring for the function in class A:
I'll try to keep investigating, but I need to catch up with other things first. I hope this gives a bit more context. I could execute the test with every minor release (2.x, 3.x) to see if they differ from the two examples here. Hopefully they don't, and every minor version within a major version handles it the same way.
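The explicit __doc__ workaround mentioned above can be illustrated with a short sketch, assuming Python 3. The class mirrors the one in the comment; the bytes literal left in class_func is still only an unused expression.

```python
class A:
    # Explicit assignment: the class docstring really is a bytes object.
    __doc__ = b"""Got \xe7\xfe Bytes?"""

    def class_func(self):
        # Still only an unused bytes expression in Python 3.
        b"""Got \xe7\xfe Bytes?"""


class_doc = A.__doc__
method_doc = A.class_func.__doc__
print(repr(class_doc), repr(method_doc))
```

Only the class picks up the bytes value; the method still has no docstring because nothing was assigned to it.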
Many thanks for the great investigative work and fix. To make sure this great work is not lost or forgotten, would you turn your Python code into a self-checking test, by turning the print statements into assert statements? Then this would go into
I converted the script to use asserts instead of print statements. I tried to figure out how to add the tests to the repository, but would appreciate confirmations: I'm not completely confident, but I ran the
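A self-checking version along these lines might look like the sketch below. It is not the test that was added to the repository; the function name check_docstrings and the version switch PY2 are assumptions made for illustration.

```python
import sys

PY2 = sys.version_info[0] == 2


class A:
    b"""Got \xe7\xfe Bytes?"""

    def class_func(self):
        b"""Got \xe7\xfe Bytes?"""


class B:
    """Got no Bytes?"""

    def class_func(self):
        """Got no Bytes?"""


def check_docstrings():
    # Text docstrings are kept on every version.
    assert B.__doc__ == "Got no Bytes?"
    assert B.class_func.__doc__ == "Got no Bytes?"
    if PY2:
        # Python 2: b"..." is an ordinary str, so it counts as a docstring.
        assert A.__doc__ == b"Got \xe7\xfe Bytes?"
        assert A.class_func.__doc__ == b"Got \xe7\xfe Bytes?"
    else:
        # Python 3: a bytes literal is only an unused expression.
        assert A.__doc__ is None
        assert A.class_func.__doc__ is None
    return True


check_docstrings()
```

Because the expectations are asserted rather than printed, the same file can be compiled, decompiled, and rerun to confirm that the behaviour is preserved.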
Thanks!
The number prefix allows you to specify roughly where in the order of testing this test should be run. Simpler tests have lower numbers and more complicated tests have higher numbers. When creating a new version, this lets me find simple tests to run first and then add the more complicated ones later.
Looking at it now. Thanks again for doing this.
Description
The print_docstring function assumes the docstring is going to be a string, but it could be a bytestring.
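The kind of failure named in the title can be reproduced in isolation. This is a minimal sketch, assuming the decompiler runs under Python 3 and receives the docstring as a bytes object; the variable names are illustrative and not taken from uncompyle6.

```python
docstring = b"Got \xe7\xfe Bytes?"  # docstring recovered as bytes
quote = '"""'                       # text quote the printer searches for

try:
    docstring.find(quote)           # bytes.find() rejects a str argument
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```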
How to reproduce
Five files can be found in brokendocstring.zip:
brokenbytedocstring.py: Docstring with actual bytes in the docstring
brokenbytedocstring.pyc: Previous file compiled with python 2.7
brokendocstring.py: Docstring with backslash escaping for bytes in docstring
brokendocstring.pyc: Previous file compiled with python 2.7
brokendocstring.cpython-38.pyc: Previous file compiled with python 3.8
The python 3.8 file is given for completeness, but from my investigation, it drops the docstring at compilation time, so not a problem.
The brokenbytedocstring files are examples that are technically not executable with a call like python brokenbytedocstring.py, but after starting a python2.7 interpreter, we can invoke the function using import:
Output Given
Using the python 2.7 bytecode files from the zip file, the following error should be reproducible:
Solution
Simply converting the docstring to a string when it is a byte string lets the rest of the code run successfully.
I decided to use backslashreplace so that we could still see which bytes were in the original docstring, in case it becomes important for analysis.
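The conversion described above can be sketched as follows. The helper name docstring_to_text and the choice of UTF-8 are assumptions made for illustration; the actual change in the pull request may differ.

```python
def docstring_to_text(docstring, encoding="utf-8"):
    """Return docstring as text, keeping undecodable bytes visible."""
    if isinstance(docstring, bytes):
        # backslashreplace turns each undecodable byte into a \xNN escape
        # instead of raising UnicodeDecodeError or silently dropping it.
        return docstring.decode(encoding, errors="backslashreplace")
    return docstring


recovered = docstring_to_text(b"Got \xe7\xfe Bytes?")
unchanged = docstring_to_text("Got no Bytes?")
print(recovered)
```

The escaped form keeps the original byte values readable, at the cost of the recovered text no longer being byte-for-byte identical to the compiled docstring.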
Potential problem with the solution
The recovered docstrings in the examples differ from the ones that were compiled, both in the string contents and in the prefix (b"" vs r""). I decided to offer a solution that would keep code change to a minimum while still giving a result that is very close to reality. The alternative would be to use two different functions to handle byte docstrings and normal docstrings, or to convert the docstring from string to bytes and do everything in bytes.
Tests
I tried to run the tests with make check and they succeeded. Just in case, I tried to make the print_docstring function return True as its first line, and none of the tests broke. I am therefore not certain whether there are no tests for the docstring, or whether I did something wrong when testing.
Real life example
5d9fe2735d4399d98e6e6a792b1feb26d6f2d9a5d77944ecacb4b4837e5e5fca (Photo.scr, exe)
-> 9cacd1265b06c6853d7a034686d38689dedfdb77a3021604f766f31876a26785 (ftpcrack.pyc)