bug: type error when accessing length() without schema information #10651

Open
1 task done
choucavalier opened this issue Jan 2, 2025 · 7 comments
Labels
bug Incorrect behavior inside of ibis

Comments

@choucavalier

What happened?

The following example is self-explanatory.

By default, it's assumed that t.some_col is of type Column, while it could be of type ArrayColumn (or other more specific types for that matter).

This results in typing issues.

import ibis
import pandas as pd


d = pd.DataFrame(
  {
    "some_col": [
      [1, 2, 3],
      [4, 5],
    ]
  }
)
t = ibis.memtable(d)

t.some_col.length()  # type checker error: Cannot access attribute "length" for class "Column"

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

DuckDB

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@choucavalier choucavalier added the bug Incorrect behavior inside of ibis label Jan 2, 2025
@cpcloud
Member

cpcloud commented Jan 2, 2025

I can't reproduce this on main:

In [4]: import ibis
   ...: import pandas as pd
   ...:
   ...:
   ...: d = pd.DataFrame(
   ...:   {
   ...:     "some_col": [
   ...:       [1, 2, 3],
   ...:       [4, 5],
   ...:     ]
   ...:   }
   ...: )
   ...: t = ibis.memtable(d)

In [5]: t.some_col.length()
Out[5]:
r0 := InMemoryTable
  data:
    PandasDataFrameProxy:
          some_col
      0  [1, 2, 3]
      1     [4, 5]

ArrayLength(some_col): ArrayLength(r0.some_col)

@choucavalier
Author

The code works. It's the static type checking that outputs an error while it should not.

@cpcloud
Member

cpcloud commented Jan 2, 2025

Schema information generally can't be made static in Python.

A type checker would need access to information from a running database.
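A minimal sketch of why that is, using toy classes rather than ibis's real ones (`Column`, `ArrayColumn`, and `Table` below are hypothetical stand-ins): the concrete column class is picked from the schema, which only exists as a runtime value, so the annotation on `__getattr__` can promise no more than the base class.

```python
class Column:
    """Toy stand-in for a generic column expression (not ibis's real class)."""

    def __init__(self, name: str) -> None:
        self.name = name


class ArrayColumn(Column):
    """Toy stand-in for an array column; only this subclass has length()."""

    def length(self) -> str:
        return f"ArrayLength({self.name})"


class Table:
    """Toy table whose schema is only known once an instance exists."""

    def __init__(self, schema: dict[str, str]) -> None:
        self._schema = schema

    def __getattr__(self, name: str) -> Column:
        # The annotation can only promise the base class: the concrete
        # subclass is chosen from the schema, which is a runtime value.
        try:
            dtype = self._schema[name]
        except KeyError:
            raise AttributeError(name) from None
        return ArrayColumn(name) if dtype.startswith("array") else Column(name)


t = Table({"some_col": "array<int64>", "other": "int64"})
col = t.some_col
print(type(col).__name__)
print(col.length())  # works at runtime; a checker only sees "Column" here
```

In this toy model the call succeeds at runtime, yet a checker reading the `-> Column` annotation has no way to know that `some_col` is an array.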

@choucavalier
Author

choucavalier commented Jan 2, 2025

Of course, but in this case I think the DX would be much better if, instead of assuming that the column can't be a list, we assumed it could be and kept the type checker from erroring.

Right now I have to do:

from typing import cast
from ibis.expr.types import ArrayColumn

t.select(t).filter(
  cast(ArrayColumn, t.my_list_column).length() > ibis.literal(0)
)

(Type checking also requires me to write ibis.literal(...) for any constant value.)

More and more projects use type checking, and not having it in ibis will prevent all of them from having a good DX and reduce ibis' adoption.
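One way to make the cast workaround less noisy in user code is a small narrowing helper. This is only a sketch with toy classes: `as_array` and `lookup` are hypothetical names, and `Column` / `ArrayColumn` here are stand-ins, not the ibis classes.

```python
from typing import cast


class Column:
    """Toy stand-in for a generic column (hypothetical, not ibis's class)."""


class ArrayColumn(Column):
    """Toy stand-in for an array column."""

    def __init__(self, values: list[list[int]]) -> None:
        self.values = values

    def length(self) -> list[int]:
        return [len(v) for v in self.values]


def as_array(col: Column) -> ArrayColumn:
    """Hypothetical helper: narrow a Column to ArrayColumn with a runtime check."""
    if not isinstance(col, ArrayColumn):
        raise TypeError(f"expected an array column, got {type(col).__name__}")
    return col


def lookup() -> Column:
    # Statically typed as the base class, like attribute access on a table.
    return ArrayColumn([[1, 2, 3], [4, 5]])


col = lookup()
print(cast(ArrayColumn, col).length())  # cast: no runtime check at all
print(as_array(col).length())           # helper: same static type, but checked
```

Unlike a bare `cast`, the helper fails early with a clear message when the column is not actually an array.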

@cpcloud
Member

cpcloud commented Jan 2, 2025

It sounds like you effectively want every type-specific method to exist on Column. Is that correct?

@choucavalier
Author

That could be a solution. Maybe there's another one that doesn't involve modifying the Column definition, but instead provides some type hints that help static type checkers understand something like: "By default this could actually be an ArrayColumn or a StructColumn (or others), so let's assume that these collection-specific methods might exist. We don't have access to the schema of the table, so let's trust that this guy knows what he's doing when querying his own data. If he tries to access the method length on a column that's not actually an ArrayColumn, we'll throw an error at runtime. But let's not ask him to write type-casting code everywhere or disable the type checker on every line that uses ibis."
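One possible shape for such a hint, sketched with toy classes (this is an assumption about how it could be done, not something ibis provides): a `__getattr__` declared only under `TYPE_CHECKING`. Checkers such as mypy and pyright treat a class-level `__getattr__` as permission to access arbitrary attributes, while the runtime class is left untouched.

```python
from typing import TYPE_CHECKING, Any


class Column:
    """Toy stand-in for a generic column (hypothetical, not ibis's class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    if TYPE_CHECKING:
        # Seen only by static checkers: any unknown attribute is typed as Any,
        # so col.length() is accepted without a cast. Nothing changes at runtime.
        def __getattr__(self, attr: str) -> Any: ...


class ArrayColumn(Column):
    def length(self) -> str:
        return f"ArrayLength({self.name})"


def get_column(name: str, is_array: bool) -> Column:
    # Declared return type is the base class, as with table attribute access.
    return ArrayColumn(name) if is_array else Column(name)


print(get_column("some_col", is_array=True).length())

try:
    get_column("other", is_array=False).length()
except AttributeError as exc:
    # The mistake is still caught, but only at runtime.
    print("runtime error:", exc)
```

The trade-off is that misspelled method names would also pass the checker, since every unknown attribute becomes `Any`.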

@cpcloud
Member

cpcloud commented Jan 2, 2025

In that case, a giant ir.StringColumn | ir.IntegerColumn | ... with every single column subclass as the return value of Table.__getattr__ would probably suffice for this specific use case.

If this pattern shows up in a bunch of places maybe we can make a type alias to simplify our lives.
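A rough sketch of what such an alias could look like, using three toy subclasses in place of the full list (`AnyColumn`, `_BY_DTYPE`, and the classes below are hypothetical and only mirror the names in `ibis.expr.types`):

```python
from typing import Union


class Column:
    """Toy base class (hypothetical; the real classes live in ibis.expr.types)."""

    def __init__(self, name: str) -> None:
        self.name = name


class StringColumn(Column):
    def upper(self) -> str:
        return f"Uppercase({self.name})"


class IntegerColumn(Column):
    pass


class ArrayColumn(Column):
    def length(self) -> str:
        return f"ArrayLength({self.name})"


# Hypothetical alias: one name for "any concrete column subclass".
AnyColumn = Union[StringColumn, IntegerColumn, ArrayColumn]

_BY_DTYPE = {"string": StringColumn, "int64": IntegerColumn, "array": ArrayColumn}


class Table:
    def __init__(self, schema: dict[str, str]) -> None:
        self._schema = schema

    def __getattr__(self, name: str) -> AnyColumn:
        try:
            return _BY_DTYPE[self._schema[name]](name)
        except KeyError:
            raise AttributeError(name) from None


t = Table({"some_col": "array", "label": "string"})
col = t.some_col
if isinstance(col, ArrayColumn):  # narrows the union for the checker
    print(col.length())
```

One thing worth checking before adopting this: mypy and pyright generally accept an attribute on a union only when every member defines it, so call sites may still need an `isinstance` narrowing like the one above.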

Projects
Status: backlog
Development

No branches or pull requests

2 participants