
RDS PostgreSQL dump not ordered correctly preventing restore. #208

Open
davidbruce opened this issue Jul 29, 2022 · 9 comments

Comments

@davidbruce

When I create a dump from a database over the network, the statements in the dump are not ordered correctly. Because of this I am unable to restore the dump, which prevents me from using Replibyte.

Idea of what the dump.sql from Replibyte looks like:

- INSERTS statements ...
- CREATE TABLE statements ...
- CREATE VIEW statements ...
- INSERT statements ...

Even the insert statements themselves are not ordered properly, so I cannot simply move the DDL to the top of the file.

This isn't an issue with a database running in a local Docker instance.

@evoxmusic
Contributor

@davidbruce , can you show me your Replibyte config .yaml, please? Can you also confirm that you are using the latest version of Replibyte?

@gugacavalieri

@davidbruce, @evoxmusic, I am having the same problem when dumping some tables that are over 100MB.

I think this could be related to the order in which the dumped chunks are being restored.

To keep parameters more flexible and as a workaround I added a PR to make it possible to set the buffer size: #234.

When restoring a 160MB table with buffer_size = 100MB, I got an inverted order when restoring the dump:

2.dump was restored before 1.dump, and the resulting file started with a bunch of insert statements with high id numbers.

When restoring the same table using buffer_size = 200MB, only one chunk (1.dump) was generated and the resulting file has all the correct info in the correct order.
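For reference, the chunk counts above follow from a simple ceiling division of table size by buffer size. A minimal sketch (`chunk_count` is a hypothetical helper for illustration, not a Replibyte function):

```rust
// Number of dump chunks produced for a given table size and buffer size,
// assuming a new chunk is cut each time the buffer fills (ceiling division).
fn chunk_count(table_size_mb: u64, buffer_size_mb: u64) -> u64 {
    (table_size_mb + buffer_size_mb - 1) / buffer_size_mb
}

fn main() {
    // 160 MB table, 100 MB buffer -> two chunks (1.dump and 2.dump),
    // so restore order matters.
    assert_eq!(chunk_count(160, 100), 2);
    // 160 MB table, 200 MB buffer -> a single chunk, so ordering can't break.
    assert_eq!(chunk_count(160, 200), 1);
}
```

This matches the observation that the problem only shows up once more than one chunk exists.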

I still don't know 100% how this chunk order is being messed up when restoring, but if you can share a little more info on that I can try to help out with a solution 😃

@evoxmusic
Contributor

evoxmusic commented Nov 6, 2022

From what you describe, I feel that the issue could come from this function.

objects.sort_by(|a, b| a.key.cmp(&b.key));
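One thing to watch with that line: `a.key.cmp(&b.key)` is a plain string comparison, so if chunk keys use bare numeric prefixes (`1.dump`, `2.dump`, …, `10.dump`), lexicographic order diverges from numeric order once there are more than nine chunks. A standalone illustration (not Replibyte code):

```rust
fn main() {
    let mut keys = vec!["10.dump", "2.dump", "1.dump"];
    // Plain string comparison, as in `a.key.cmp(&b.key)`.
    keys.sort();
    // Lexicographically, "10.dump" sorts before "2.dump".
    assert_eq!(keys, ["1.dump", "10.dump", "2.dump"]);
}
```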

@gugacavalieri can you show me your replibyte config.yaml and can you also show me the content of your metadata.json at the root of your backup bucket if you are using S3? I suspect an issue there.

@gugacavalieri

@evoxmusic , looks like it is only happening when I run with local_disk storage. Here is my configuration:

source:
  connection_uri: $SOURCE_DATABASE_URL
  transformers: []

  only_tables: # optional - dumps only specified tables.
    - database: staging
      table: added_stickers

datastore:
  local_disk:
    dir: ./storage

destination:
  connection_uri: $DESTINATION_DATABASE_URL

When running with buffer_size = 100 it creates the following metadata and chunks:

{"v":"0.10.0","dumps":[{"directory_name":"dump-1667763130600","size":10603029,"created_at":1667763242581,"compressed":true,"encrypted":false}]}

[screenshot of the generated dump chunk files]

When restoring to local file (restore local -i mysql -v latest -o > output.sql), the first line of the file shows:

INSERT INTO `added_stickers` (`id`, ...) VALUES (511158, ...);

So the ID count starts at 511158. And if I scroll to line 904,501 / 2,428,792, I can see an insert with the last ID, followed by the end of the dump file and then the beginning of the MySQL dump file:

INSERT INTO `added_stickers` (`id`, ...) VALUES (813513, ...);

/*!40000 ALTER TABLE `added_stickers` ENABLE KEYS */;
UNLOCK TABLES;
SET @@SESSION.SQL_LOG_BIN = @MYSQLDUMP_TEMP_LOG_BIN;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2022-11-06 12:32:44
-- MySQL dump 10.13  Distrib 8.0.31, for macos12.6 (arm64)
--
-- Host: xxx    Database: staging
-- ------------------------------------------------------
-- Server version	5.7.38-log
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;

If I set buffer_size = 200 the problem does not happen. Also, I tested S3 with buffer_size = 100 and the chunks were restored in the correct order, so I guess this is a problem with the local_disk storage.

@gugacavalieri

Probably this read_dir function is returning the files in the wrong order:

let entries = read_dir(format!("{}/{}", self.dir, dump.directory_name))?;

@evoxmusic
Contributor

Yes, a sort is necessary. Otherwise, it cannot work. We can't rely on read_dir(..), whose ordering depends on the filesystem.

@gugacavalieri

Would something like this do the trick?

let dump_directory_name = format!("{}/{}", self.dir, dump.directory_name);
let mut sorted_entries: Vec<_> = read_dir(dump_directory_name)
    .unwrap()
    .map(|r| r.unwrap())
    .collect();
sorted_entries.sort_by_key(|dir| dir.path());

for entry in sorted_entries {
    // ...
}

@evoxmusic
Contributor

I think it should be ok, we can write a test function. I will just make sure that the .unwrap() does not lead to an unexpected panic.
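On the `.unwrap()` concern, a sketch of an unwrap-free variant (assumptions: chunk files live directly in the dump directory, and `sorted_dump_paths` is a hypothetical helper, not the actual Replibyte function):

```rust
use std::fs::read_dir;
use std::io;
use std::path::PathBuf;

// Collect directory entries, propagating I/O errors with `?` instead of
// panicking, then sort by path so restore order no longer depends on the
// filesystem's read_dir ordering.
fn sorted_dump_paths(dir: &str) -> io::Result<Vec<PathBuf>> {
    let mut paths: Vec<PathBuf> = read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    paths.sort();
    Ok(paths)
}
```

Collecting an iterator of `Result`s into `io::Result<Vec<_>>` short-circuits on the first error, which avoids the panic while keeping the sort.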

@gugacavalieri

> I think it should be ok, we can write a test function. I will just make sure that the .unwrap() does not lead to an unexpected panic.

Sounds good! You probably have better Rust skills for this one :)
