
RDS PostgreSQL dump not ordered correctly preventing restore. #208

Open
davidbruce opened this issue Jul 29, 2022 · 9 comments

Comments

@davidbruce

When I create a dump from a database over the network, the statements in the dump are not ordered correctly. Because of this I am unable to restore the dump, which prevents me from using Replibyte.

Idea of what the dump.sql from Replibyte looks like:

- INSERTS statements ...
- CREATE TABLE statements ...
- CREATE VIEW statements ...
- INSERT statements ...

Even the insert statements themselves are not ordered properly, so I cannot simply move the DDL to the top of the file.

This isn't an issue with a database running in a local Docker instance.

@evoxmusic
Contributor

@davidbruce , can you show me your Replibyte config .yaml, please? Can you also confirm that you are using the latest version of Replibyte?

@gugacavalieri

@davidbruce, @evoxmusic, I am having the same problem when dumping some tables that are over 100MB.

I think this could be related to the order in which the dumped chunks are being restored.

To keep parameters more flexible and as a workaround I added a PR to make it possible to set the buffer size: #234.

When restoring a 160MB table with buffer_size = 100MB, I got an inverted order when restoring the dump:

2.dump was restored before 1.dump, and the resulting file started with a bunch of insert statements with high id numbers.

When restoring the same table using buffer_size = 200MB, only one chunk (1.dump) was generated and the resulting file has all the correct info in the correct order.
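For reference, the chunk counts above follow from a simple ceiling division of table size by buffer size. A minimal sketch (`chunk_count` is a hypothetical helper for illustration, not a Replibyte function):

```rust
// Number of dump chunks produced for a given table size and buffer size,
// assuming a new chunk is cut each time the buffer fills (ceiling division).
fn chunk_count(table_size_mb: u64, buffer_size_mb: u64) -> u64 {
    (table_size_mb + buffer_size_mb - 1) / buffer_size_mb
}

fn main() {
    // 160 MB table, 100 MB buffer -> two chunks (1.dump and 2.dump),
    // so restore order matters.
    assert_eq!(chunk_count(160, 100), 2);
    // 160 MB table, 200 MB buffer -> a single chunk, so ordering can't break.
    assert_eq!(chunk_count(160, 200), 1);
}
```

This matches the observation that the problem only shows up once more than one chunk exists.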

I still don't know 100% how this chunk order is being messed up when restoring, but if you can share a little more info on that I can try to help out with a solution 😃

@evoxmusic
Contributor

evoxmusic commented Nov 6, 2022

From what you describe, I feel that the issue could come from this function.

objects.sort_by(|a, b| a.key.cmp(&b.key));
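One thing to watch with that line: `a.key.cmp(&b.key)` is a plain string comparison, so if chunk keys use bare numeric prefixes (`1.dump`, `2.dump`, …, `10.dump`), lexicographic order diverges from numeric order once there are more than nine chunks. A standalone illustration (not Replibyte code):

```rust
fn main() {
    let mut keys = vec!["10.dump", "2.dump", "1.dump"];
    // Plain string comparison, as in `a.key.cmp(&b.key)`.
    keys.sort();
    // Lexicographically, "10.dump" sorts before "2.dump".
    assert_eq!(keys, ["1.dump", "10.dump", "2.dump"]);
}
```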

@gugacavalieri can you show me your replibyte config.yaml and can you also show me the content of your metadata.json at the root of your backup bucket if you are using S3? I suspect an issue there.

@gugacavalieri

@evoxmusic , looks like it is only happening when I run with local_disk storage. Here is my configuration:

source:
  connection_uri: $SOURCE_DATABASE_URL
  transformers: []

  only_tables: # optional - dumps only specified tables.
    - database: staging
      table: added_stickers

datastore:
  local_disk:
    dir: ./storage

destination:
  connection_uri: $DESTINATION_DATABASE_URL

When running with buffer_size = 100 it creates the following metadata and chunks:

{"v":"0.10.0","dumps":[{"directory_name":"dump-1667763130600","size":10603029,"created_at":1667763242581,"compressed":true,"encrypted":false}]}

[screenshot of the generated dump chunk files]

When restoring to local file (restore local -i mysql -v latest -o > output.sql), the first line of the file shows:

INSERT INTO `added_stickers` (`id`, ...) VALUES (511158, ...);

So the ID count starts at 511158. And if I scroll to line 904,501 / 2,428,792, I can see an insert with the last ID, followed by the end of the dump file and then the beginning of the MySQL dump file:

INSERT INTO `added_stickers` (`id`, ...) VALUES (813513, ...);

/*!40000 ALTER TABLE `added_stickers` ENABLE KEYS */;
UNLOCK TABLES;
SET @@SESSION.SQL_LOG_BIN = @MYSQLDUMP_TEMP_LOG_BIN;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2022-11-06 12:32:44
-- MySQL dump 10.13  Distrib 8.0.31, for macos12.6 (arm64)
--
-- Host: xxx    Database: staging
-- ------------------------------------------------------
-- Server version	5.7.38-log
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;

If I set buffer_size = 200 the problem does not happen. Also, I tested S3 with buffer_size = 100 and the chunks were restored in the correct order, so I guess this is a problem with the local_disk storage.

@gugacavalieri

Probably this read_dir function is returning the files in the wrong order:

let entries = read_dir(format!("{}/{}", self.dir, dump.directory_name))?;

@evoxmusic
Contributor

Yes, a sort is necessary. Otherwise, it cannot work. We can't rely on read_dir(..), whose ordering depends on the filesystem.

@gugacavalieri

Would something like this do the trick?

let dump_directory_name = format!("{}/{}", self.dir, dump.directory_name);
let mut sorted_entries: Vec<_> = read_dir(dump_directory_name)
    .unwrap()
    .map(|r| r.unwrap())
    .collect();
sorted_entries.sort_by_key(|dir| dir.path());

for entry in sorted_entries {
    // ...
}

@evoxmusic
Contributor

I think it should be ok, we can write a test function. I will just make sure that the .unwrap() does not lead to an unexpected panic.
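On the `.unwrap()` concern, a sketch of an unwrap-free variant (assumptions: chunk files live directly in the dump directory, and `sorted_dump_paths` is a hypothetical helper, not the actual Replibyte function):

```rust
use std::fs::read_dir;
use std::io;
use std::path::PathBuf;

// Collect directory entries, propagating I/O errors with `?` instead of
// panicking, then sort by path so restore order no longer depends on the
// filesystem's read_dir ordering.
fn sorted_dump_paths(dir: &str) -> io::Result<Vec<PathBuf>> {
    let mut paths: Vec<PathBuf> = read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    paths.sort();
    Ok(paths)
}
```

Collecting an iterator of `Result`s into `io::Result<Vec<_>>` short-circuits on the first error, which avoids the panic while keeping the sort.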

@gugacavalieri

> I think it should be ok, we can write a test function. I will just make sure that the .unwrap() does not lead to an unexpected panic.

Sounds good! You probably have better Rust skills for this one :)
