Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

size() does not work in Python 2 in a KaitaiStream backed by a 'file' object #72

Closed
generalmimon opened this issue Nov 11, 2022 · 4 comments
Labels

Comments

@generalmimon
Copy link
Member

As @armijnhemel suggested in #61 (comment), in Python 3 the io.tell() is redundant:

# Seek to the end of the File object
io.seek(0, SEEK_END)
# Remember position, which is equal to the full length
full_size = io.tell()

and can be simplified to this (because io.IOBase.seek returns the new absolute position already):

# Seek to the end of the stream and remember the full length
full_size = io.seek(0, SEEK_END)

I applied this suggestion in 255f5b7. This has been released in 0.10.


However, I was reading through some old discussions and came across this comment by @arekbulski:

Unfortunately I also found out that on Python 2, seek method doesnt return current offset but None. That precludes that optimisation. Eh.

I tried this in my Python 2.7 installation and indeed - if you open a file using the built-in open() function, you get a 'file' object, and if you use it to initialize the KaitaiStream, the size() returns None:

Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:25:05) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from kaitaistruct import KaitaiStream
>>> with open('C:\\temp\\kaitai_struct\\tests\\src\\position_to_end.bin', 'rb') as f:
...     print(type(f))
...     _io = KaitaiStream(f)
...     print(_io.size())
...
<type 'file'>
None

This is documented in Python 2 docs (https://docs.python.org/2/library/stdtypes.html#file.seek):

file.seek(offset[, whence])
Set the file’s current position, like stdio’s fseek(). (...) There is no return value.

On the other hand, the typical from_file helper works because it uses io.open() (as recommended in https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data since it's consistent from Python 2 to 3) instead of open():

Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:25:05) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from position_to_end import PositionToEnd
>>> with PositionToEnd.from_file('C:\\temp\\kaitai_struct\\tests\\src\\position_to_end.bin') as r:
...     print(type(r._io._io))
...     print(r._io.size())
...
<type '_io.BufferedReader'>
21

I'm not sure why none of the tests caught this - yes, they typically use the from_file helper, but I remember that the read_bytes method was raising errors in almost all tests (IIRC) in the CI when I tried calling _io.seekable() unconditionally:

# in Python 2, there is a common error ['file' object has no
# attribute 'seekable'], so we need to make sure that seekable() exists
and callable(getattr(self._io, 'seekable', None))
and self._io.seekable()

and that error clearly mentioned the 'file' object.

@armijnhemel
Copy link

A non-technical solution to this would be to end support for Python 2 (it was EOLed nearly three years ago).

@generalmimon
Copy link
Member Author

@armijnhemel:

A non-technical solution to this would be to end support for Python 2 (it was EOLed nearly three years ago).

Yes, that is a valid solution. Frankly, I don't know how much (if any) interest there is among Python users of Kaitai Struct to keep support for Python 2.

From https://pypistats.org/packages/kaitaistruct, the scarce units of downloads on Python 2 look more like accidental downloads, far from any serious use. It's possible that the effort we put into keeping Python 2 from extinction isn't appreciated by anyone and doesn't have any real reason.

@KOLANICH
Copy link
Contributor

None of my own projects have ever supported python 2, since when started using it, 3 was already enough to cover all my needs (and in rare cases of python 2-only code 2to3 worked mostly fine (soketimes small fixes were required) ). Native modules though an issue. For example I have ported metakit4 cext from 2 to 3 manually (by replacement old dropped API functions with new ones), but now the ported version has memory-safety issues and sometimes crashes the interpreter. I don't have a deep enough understanding of cpython and the changes in it that have caused that to fix that.

+1 for dropping 2, but also +1 for keeping its support, if there was be a person who would fix the broken compat when it happens. I mean that maintaining compat to python 2 in newly-introduced code (under this I mean it should not be a war of changes, so intentionally dropping compat is not allowed, but introducing new code written the way not taking into account the case of python 2 is allowed) should not be a req for accepting PRs, but also that PRs fixing the compat are welcome too.

@generalmimon
Copy link
Member Author

generalmimon commented Feb 17, 2023

I'm not sure why none of the tests caught this - yes, they typically use the from_file helper, but I remember that the read_bytes method was raising errors in almost all tests (IIRC) in the CI when I tried calling _io.seekable() unconditionally

I see that my memory is a bit deceptive (it wasn't "in almost all tests") - I've just checked out the CI logs and the only test that was failing due to this was ParamsDef (test_out/python/ci.json):

  "ParamsDef": {
    "status": "failed",
    "elapsed": 0.001,
    "failure": {
      "file_name": null,
      "line_num": null,
      "message": "'file' object has no attribute 'seekable'",
      "trace": "Traceback (most recent call last):\n  File \"/home/travis/build/kaitai-io/ci_targets/tests/spec/python/test_params_def.py\", line 9, in test_params_def\n    r = ParamsDef(5, True, io, None, None)\n  File \"/home/travis/build/kaitai-io/ci_targets/tests/compiled/python/params_def.py\", line 17, in __init__\n    self._read()\n  File \"/home/travis/build/kaitai-io/ci_targets/tests/compiled/python/params_def.py\", line 20, in _read\n    self.buf = (self._io.read_bytes(self.len)).decode(u\"UTF-8\")\n  File \"/home/travis/build/kaitai-io/ci_targets/runtime/python/kaitaistruct.py\", line 303, in read_bytes\n    if self._io.seekable():\nAttributeError: 'file' object has no attribute 'seekable'"
    }
  },

And it was exactly because the ParamsDef test class itself calls the open() function (test_params_def.py:6-14):

class TestParamsDef(unittest.TestCase):
    def test_params_def(self):
        io = KaitaiStream(open("src/term_strz.bin", "rb"))
        r = ParamsDef(5, True, io, None, None)

        self.assertEqual(r.buf, "foo|b")
        self.assertEqual(r.trailer, 0x61)

        io.close()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants