Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized hash-like data structure for collections #269

Open
neilconway opened this issue Mar 31, 2012 · 2 comments
Open

Optimized hash-like data structure for collections #269

neilconway opened this issue Mar 31, 2012 · 2 comments

Comments

@neilconway
Copy link
Member

We currently use a Ruby Hash to map keys to tuples in Bud collections. This works fine, but there is room to optimize Bud collections by using a custom hash-like type:

  1. We currently store [key_cols] => tuple mappings. This is redundant, since the key column values can be extracted from the tuple when needed -- you'd just want custom hash and equality functions that would consult the collection schema to find the columns to hash on.
  2. Extracting the key columns on every insertion could then also be avoided.
  3. The current code extracts the key columns, checks for a matching entry in the hash table, and only then does an insertion. If the underlying storage provided an "upsert"-like method, we could avoid the need to probe the storage twice.

In the grand scheme of things, this isn't a priority but might be fun.

@sriram-srinivasan
Copy link
Member

Custom hash and equality functions on the tuple don't work because the tuple doesn't know its keycols. It cannot know its keycols because we pass a tuple around by ref.

Custom hash and equality functions in the collection don't work well; I experimented with a set data structure written in Ruby with schema-specific hash and equals support, but it doesn't match the built-in Ruby hash for performance.

The fastest method I have seen so far (staying in pure Ruby) is to wrap the tuple with an object with schema-specific hash and equals, but which doesn't extract the fields. This is faster than our current method of creating an array: on jruby it is twice is fast, but marginal (10%) on ruby 1.8.7 and 1.9.3

Given,
table :foo, [:k1, :k2] => [:v]

generate a key class as follows

class Foo_Key 
      def initialize(tup) 
          @tup = tup
      end
      def hash
           @hash = @tup.k1.hash * 31 + @tup.k2.hash // good to cache the hashcode.
      end
      def ==(other)
           other.class <= Foo_Key and @tup.k1 = other.tup.k1 and @tup.k2 == other.tup.k2
      end
end

@neilconway
Copy link
Member Author

By "custom hash and equality functions", I was assuming we'd be able to control the hash/equality function APIs, so it would be easy to arrange to pass a reference to the hash table/collection when invoking the function. That's what I did in C4.

Right -- to beat the performance of the builtin Hash, we'd probably want a C extension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants