Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of FROM subquery name can break aggregate functions #35

Open
ezekg opened this issue Nov 5, 2024 · 2 comments
Open

Lack of FROM subquery name can break aggregate functions #35

ezekg opened this issue Nov 5, 2024 · 2 comments

Comments

@ezekg
Copy link

ezekg commented Nov 5, 2024

Not sure if this is a bug in Rails or this gem, but since a subquery name is not provided to from here (even though it's technically available on the Arel::Nodes::TableAlias instance), Active Record can't determine its table name. This results in AR using an unqualified column name, which can cause ambiguous column errors for aggregate functions like maximum(:updated_at) if there's a join after the union, resulting in an ActiveRecord::StatementInvalid error getting raised.

Ideally, AR should be a bit smarter here, since right now it tries to cast the Arel::Nodes::TableAlias instance to a string. I'd assume they'd argue Arel is a private API and that this isn't a bug, so I opened the issue here first.

Here's a quick script reproducing the ActiveRecord::StatementInvalid error:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "sqlite3"

  gem "rails"
  gem "active_record_union"
end

require "rails"
require "active_record"
require "active_record_union"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :accounts, force: true do |t|
    t.timestamps
  end

  create_table :users, force: true do |t|
    t.references :account
    t.string :role
    t.timestamps
  end
end

class Account < ActiveRecord::Base; end
class User < ActiveRecord::Base
  belongs_to :account

  scope :foos, -> { where(role: 'foo') }
  scope :bars, -> { where(role: 'bar') }
end

require "minitest/autorun"
require "rack/test"

class AssociationReferencesTest < Minitest::Test
  def test_union_aggregate_without_join
    account = Account.create!
    user_1  = User.create!(account:, role: 'foo')
    user_2  = User.create!(account:, role: 'bar')
    user_3  = User.create!(account:, role: 'bar')

    union = User.foos.union(
      User.bars,
    )

    assert_not_raised ActiveRecord::StatementInvalid do
      assert_equal union.maximum(:updated_at), user_3.updated_at
    end
  end

  def test_union_aggregate_with_join
    account = Account.create!
    user_1  = User.create!(account:, role: 'foo')
    user_2  = User.create!(account:, role: 'bar')
    user_3  = User.create!(account:, role: 'bar')

    union = User.foos.union(
      User.bars,
    )

    assert_not_raised ActiveRecord::StatementInvalid do
      assert_equal union.joins(:account).maximum(:updated_at), user_3.updated_at
    end
  end

  private

  def assert_not_raised(exception_class, failure_message = nil)
    yield
  rescue => e
    if e.is_a?(exception_class)
      flunk(failure_message || "An exception was not expected but was raised: #{e}")
    else
      raise e
    end
  end
end

Unfortunately, in addition to one-off aggregations, this also breaks stale? and fresh_when caching/etag controller methods, since they use maximum(:updated_at) to calculate the cache's last modified timestamp.

The only workaround I could come up with is explicitly qualifying the column name:

union.maximum(:"#{union.table_name}.updated_at")

Do you think this is worth fixing?

@ezekg
Copy link
Author

ezekg commented Nov 12, 2024

To make matters more complex, you cannot qualify the column if the relation is already loaded:

users = User.foos.union(User.bars)
users.load

# raises ActiveRecord::UnmodifiableRelation:
users.maximum('users.updated_at')

# raises ActiveRecord::StatementInvalid:
users.maximum(:updated_at)

# workaround:
users.collect(&:updated_at).max

I'll open up a PR soon.

@ezekg
Copy link
Author

ezekg commented Nov 15, 2024

This also effects the relation's ids method:

users = User.foos.union(User.bars).joins(:account)

# raises ActiveRecord::StatementInvalid:
users.ids

First test passes, second fails:

def test_union_ids_without_join
  account = Account.create!
  user_1  = User.create!(account:, role: 'foo')
  user_2  = User.create!(account:, role: 'bar')
  user_3  = User.create!(account:, role: 'bar')

  union = User.foos.union(
    User.bars,
  )

  assert_not_raised ActiveRecord::StatementInvalid do
    assert_equal union.ids, [user_1.id, user_2.id, user_3.id]
  end
end

def test_union_ids_with_join
  account = Account.create!
  user_1  = User.create!(account:, role: 'foo')
  user_2  = User.create!(account:, role: 'bar')
  user_3  = User.create!(account:, role: 'bar')

  union = User.foos.union(
    User.bars,
  )

  assert_not_raised ActiveRecord::StatementInvalid do
    assert_equal union.joins(:account).ids, [user_1.id, user_2.id, user_3.id]
  end
end

Workaround:

union.pluck(:"#{union.table_name}.id")

ezekg added a commit to keygen-sh/keygen-api that referenced this issue Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant