Clone table #2803

Open
wojons opened this issue Aug 4, 2014 · 9 comments

@wojons
Contributor

wojons commented Aug 4, 2014

In the UI and CLI it would be useful to be able to clone an existing table. Table cloning should allow for a few different things.

  • Rename the table, i.e. give the clone its own name rather than just "clone"
  • Select a different logical database for the clone
  • Clone indexes
  • Clone data
  • Select the primary node of the clone
@danielmewes
Member

Maybe we could have a tableDescribe() command that provides an object of some sort that can be used to re-create the table including all settings and secondary indexes?
This is closely related to the question of copying a secondary index (see #2797), though some additional work is required for cloning the whole table.
I think this will also become easier when we have a ReQL interface for configuring a table's metadata.

I imagine that cloning the actual (non-meta) data would be a separate step.

@coffeemug coffeemug added this to the backlog milestone Aug 12, 2014
@wojons
Contributor Author

wojons commented Dec 17, 2014

I was also thinking that if this is done, it should be something like copy-on-write: the cloned table gets a new file whose blocks point to the matching blocks in the old file. Whenever a block is touched by either the parent table or the cloned table, the new entry (if the clone made the update) or the old entry (if the parent made the update) is written to the clone's file. That way copying the table doesn't add a crazy amount of load. The big use case for this is analytics and data transformation.
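
A toy, in-memory sketch of the copy-on-write idea described above, assuming tables are just arrays of blocks. This is not how RethinkDB stores data; `CowTable`, `CowClone` and `overlay` are hypothetical names used only for illustration.

```javascript
// Toy model: a "table" is an array of blocks, and a clone stores only the
// blocks that differ from its parent (the overlay).
class CowTable {
  constructor(blocks) {
    this.blocks = blocks.slice();
    this.clones = [];
  }

  // Creating a clone copies no data; it only registers an empty overlay.
  clone() {
    const c = new CowClone(this);
    this.clones.push(c);
    return c;
  }

  read(i) {
    return this.blocks[i];
  }

  // Parent write: preserve the old block for every clone that has not
  // diverged at this position yet, then overwrite the parent's block.
  write(i, value) {
    for (const c of this.clones) {
      if (!c.overlay.has(i)) c.overlay.set(i, this.blocks[i]);
    }
    this.blocks[i] = value;
  }
}

class CowClone {
  constructor(parent) {
    this.parent = parent;
    this.overlay = new Map(); // block index -> block owned by the clone
  }

  // Read the clone's own copy if it has one, else fall through to the parent.
  read(i) {
    return this.overlay.has(i) ? this.overlay.get(i) : this.parent.blocks[i];
  }

  // Clone write: the new block goes into the overlay; the parent is untouched.
  write(i, value) {
    this.overlay.set(i, value);
  }
}
```

In this model the clone is available immediately, and the extra storage grows only with the number of blocks that change on either side.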

@danielmewes
Member

Maybe we could implement the cloning by backfilling from an existing table into a second, new table.

@wojons
Contributor Author

wojons commented Dec 17, 2014

I mean, that works, but think about how long it could take. If the table has a billion records, a backfill has to copy all of them, whereas with copy-on-write you only do that work as items change. It also saves disk space, and if you're running an analytics job that transforms the data, you don't want to wait for the backfill to finish before you can start transforming.

@danielmewes
Member

@wojons: If the file system supports it we should certainly consider using that. Implementing our own copy-on-write, especially across two different files, would add a lot of additional complexity to our codebase.

@wojons
Contributor Author

wojons commented Dec 17, 2014

@danielmewes I know you can do it with one hand tied behind your back in less than an hour. Well, if we had triggers this could be possible :)

@internalfx

It's not COW, but I wrote a command line tool to help with cloning databases. It uses some techniques suggested by @danielmewes on a similar Stack Overflow question.

internalfx/thinker

@danielmewes
Member

@internalfx That's a very neat tool. Thanks for sharing!
A minor improvement: To handle tables that don't use the default primary key field name id, you can retrieve the primary key of the source table through r.db(sourceDb).table(table).info()('primary_key') and pass it into the tableCreate call through the primaryKey opt arg.
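
A minimal sketch of that suggestion, assuming the RethinkDB JavaScript driver is passed in as `r` together with an open connection `conn`, and that `run` is called without a callback so it returns a promise. The function name `cloneTableStructure` and its parameters are hypothetical; this is not the code used in thinker.

```javascript
// Sketch only: `r` is the RethinkDB JavaScript driver and `conn` an open
// connection. Both are passed in, so the function has no imports of its own.
async function cloneTableStructure(r, conn, sourceDb, targetDb, table) {
  // Look up the primary key field of the source table (it may not be "id").
  const primaryKey = await r
    .db(sourceDb)
    .table(table)
    .info()('primary_key')
    .run(conn);

  // Create the target table with the same primary key field.
  await r.db(targetDb).tableCreate(table, { primaryKey }).run(conn);

  return primaryKey;
}
```

This covers only the primary key; copying the documents and any secondary indexes would be separate steps.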

@internalfx

@danielmewes

Issue Filed. #3
