Recommended way for use in production? #6

Open
nicostombros opened this issue Feb 5, 2024 · 10 comments

@nicostombros

When deploying a project that includes local-geocode to production, are there any specific considerations to take into account? So far, I am struggling to change the Geocode initialisation in production. I don't mind adding to my build time, so I am wondering what sort of workflow is best here.

Might it be:

...
-> Python dependencies installed, including local-geocode
-> Some way to run the local-geocode initialisation
-> Dump the output of the load to S3/other storage DB or keep this on the server?
-> Continue build and deploy

Thank you

@mar-muel
Owner

mar-muel commented Feb 5, 2024

So just to clarify things:

  • When you pip install local-geocode, it ships with two pickle files built with the default configuration. You can check their size using
      du -sh geocode/data
      30M    geocode/data
  • Then at runtime, you need to run gc.load() once at the beginning of your program. This simply loads the pickle files under geocode/data.

If you want to change the default parameters, you need to recompute the pickle files. This means that when you deploy with a non-standard configuration, you need to manage the pickle files yourself.

One limitation of the current codebase is that there is no way to configure the data directory. What would be useful in your case would be to initialize like so:

gc = geocode.Geocode(data_dir=<location of your pickle files>)
...
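
A minimal sketch of how such a data_dir override could resolve the pickle location, assuming the default remains the packaged geocode/data folder. The names resolve_data_dir and DEFAULT_DATA_DIR are hypothetical and only for illustration; they are not part of local-geocode's current API:

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical default; in the library this would be the packaged geocode/data folder.
DEFAULT_DATA_DIR = Path("geocode") / "data"


def resolve_data_dir(data_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return the directory to read pickle files from.

    Falls back to DEFAULT_DATA_DIR when no override is given, and fails
    early if an explicit override does not point at an existing directory.
    """
    if data_dir is None:
        return DEFAULT_DATA_DIR
    path = Path(data_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"data_dir does not exist: {path}")
    return path
```

Failing early on a missing directory is a design choice for this sketch: on a VM it surfaces a wrong path at startup rather than at the first load.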

@nicostombros
Author

Okay, can you point me to where the initial pip install retrieves the pickle files? If you're on board with it, I'd like to find some way to override the default where possible.

@mar-muel
Owner

mar-muel commented Feb 5, 2024

The pickle files are packaged and part of the library when you pip install it. You can find the location of the pickle files e.g. with

gc = geocode.Geocode()
gc.data_dir
# /home/martin/miniconda/lib/python3.9/site-packages/geocode/data

import os
os.listdir(gc.data_dir)
# ['geonames_6b64aaafc53116f.pkl', 'geonames_keyword_processor_6b64aaafc53116f.pkl']
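
To check which pickle files are present and how large they are from Python (roughly what du -sh reports per file), a small helper along these lines could be used. The function name pickle_sizes is hypothetical and it only relies on the standard library; the directory you pass in would be the one reported by gc.data_dir above:

```python
from pathlib import Path


def pickle_sizes(data_dir):
    """Map each .pkl file name in data_dir to its size in bytes."""
    return {
        p.name: p.stat().st_size
        for p in sorted(Path(data_dir).glob("*.pkl"))
    }
```

For example, pickle_sizes(gc.data_dir) would return one entry per pickle file found in that folder.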

@nicostombros
Author

Sorry, maybe I misunderstood. Is there a way to override the default parameters so that installing local-geocode just pickles the data it downloads for the given parameters?

My thinking is that you'd have 1-2 minutes for the pip install phase of local-geocode, and then running the python main.py script with new prepare args would add another 1-2 minutes of refetching the dataset and repickling.

@mar-muel
Owner

mar-muel commented Feb 5, 2024

Not sure if I understand, but for now, if you are using Docker, I would recommend computing the new pickle files in the build process and then overwriting the existing ones in the site-packages folder. The correct pickle files would then be "baked" into your Docker image, and at runtime you just load them as usual.
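
The overwrite step could be sketched as a small build-time script like the one below. This is only an illustration under assumptions: bake_pickles is a hypothetical helper, src_dir is wherever your build step wrote the recomputed pickle files, and dst_dir would be the site-packages data folder (the path reported by gc.data_dir earlier in this thread):

```python
import shutil
from pathlib import Path


def bake_pickles(src_dir, dst_dir):
    """Copy every .pkl file from src_dir into dst_dir.

    Files in dst_dir with the same name are replaced; anything else in
    dst_dir is left untouched. Returns the names of the copied files.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).glob("*.pkl")):
        shutil.copy2(src, dst / src.name)
        copied.append(src.name)
    return copied
```

Running this once during the image build (or VM provisioning) means the runtime only has to load the files, as described above.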

@nicostombros
Author

nicostombros commented Feb 9, 2024

Thanks @mar-muel. I'm personally using a VM, so that data_dir override parameter would likely be useful. May I have permissions to create a PR?

@mar-muel
Owner

mar-muel commented Feb 9, 2024

@nicostombros Yes, that would be great

@nicostombros
Author

@mar-muel Think I need access rights to make the push?

@nicostombros
Author

Thank you, created a PR now
