Data Partitioner

A project to handle table partitioning so that data can be processed faster in a Postgres environment. It can also be used for data archival if the Post Run step below is followed.

Prerequisites

  1. Python environment (preferably 3.7)
  2. Virtual environment with packages installed from requirements.txt
  3. Host machine with access to the Postgres instance endpoint
  4. The instance's storage should be checked and set to increase dynamically, to at most twice the current size depending on the run cycle (see the sketch below)

This setup works best when the Postgres instance is running as an RDS instance in AWS.
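As an illustration of point 4 above, here is a minimal sketch of capping RDS storage autoscaling at twice the current allocation via boto3. The instance identifier is a hypothetical placeholder, and boto3 is assumed to be available (it may not be in requirements.txt).

```python
import boto3  # assumed to be available; only relevant for the RDS case

INSTANCE_ID = "my-postgres-instance"  # hypothetical identifier

rds = boto3.client("rds")

# Read the current allocated storage (in GiB) for the instance
instance = rds.describe_db_instances(DBInstanceIdentifier=INSTANCE_ID)["DBInstances"][0]
current_gb = instance["AllocatedStorage"]

# Allow storage autoscaling up to twice the current size
rds.modify_db_instance(
    DBInstanceIdentifier=INSTANCE_ID,
    MaxAllocatedStorage=current_gb * 2,
    ApplyImmediately=True,
)
```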

Process

Check out the How To? file to understand the process.
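For orientation only, the sketch below shows one common shape such a process can take in Postgres: creating a child table that inherits from the master, constrained to one range of the data. All table and column names are hypothetical, and the actual scheme used by the script is described in the How To? file.

```python
import psycopg2  # assumed to be in requirements.txt

# Hypothetical names: "master" is the partitioned table,
# "created_at" is the column the data is ranged on.
DDL = """
CREATE TABLE IF NOT EXISTS master_child_2024_q1 (
    CHECK (created_at >= DATE '2024-01-01' AND created_at < DATE '2024-04-01')
) INHERITS (master);
"""

conn = psycopg2.connect("host=... dbname=... user=... password=...")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(DDL)
conn.close()
```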

Post Run

Once the process is completed and all tests are done, take a manual snapshot of the RDS instance (this serves as a historical point-in-time recovery) and drop the past child tables in the current master.
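A minimal sketch of this step, assuming boto3 for the snapshot and psycopg2 for the drop; the snapshot and instance identifiers and the child table name are hypothetical placeholders:

```python
import boto3     # assumed to be available
import psycopg2  # assumed to be in requirements.txt

rds = boto3.client("rds")

# Take the manual snapshot and wait for it to become available
# before dropping anything, so the archived data stays recoverable.
rds.create_db_snapshot(
    DBSnapshotIdentifier="data-partitioner-post-run",  # hypothetical name
    DBInstanceIdentifier="my-postgres-instance",       # hypothetical name
)
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier="data-partitioner-post-run"
)

# Drop the past child from the current master
conn = psycopg2.connect("host=... dbname=... user=... password=...")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS master_child_2023_q4;")  # hypothetical child
conn.close()
```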

Test Performed

A test was performed on the RDS instance using a "test table" as the reference table: 50 columns, 225 GB in size (an extra 250 GB was needed for new data inserts).
The process took about 10-14 hours to complete.
The master switch was run separately while the script was still running and rapidly inserting data.
The master switch took at most 1 minute to complete.

Concerns

After the switch was performed, there was some data difference between the old master and the new master.
This can be solved by setting the sequence ID of the new master to max + 1 of the old master, and then inserting the differing records separately, with the primary key column explicitly listed in the insert statements.
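A minimal sketch of that fix, assuming psycopg2 and an integer primary key named id backed by a sequence; all table, column, and sequence names are hypothetical:

```python
import psycopg2  # assumed to be in requirements.txt

conn = psycopg2.connect("host=... dbname=... user=... password=...")  # placeholder DSN
with conn, conn.cursor() as cur:
    # Move the new master's sequence to max + 1 of the old master
    cur.execute("SELECT max(id) FROM old_master;")
    max_id = cur.fetchone()[0]
    cur.execute("SELECT setval('new_master_id_seq', %s);", (max_id + 1,))

    # Insert the diff records with the primary key column listed explicitly,
    # so the original ids are kept instead of drawing fresh sequence values.
    cur.execute("""
        INSERT INTO new_master (id, col_a, col_b)
        SELECT id, col_a, col_b
        FROM old_master o
        WHERE NOT EXISTS (SELECT 1 FROM new_master n WHERE n.id = o.id);
    """)
conn.close()
```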