Example for serving Intake Catalogs #852

Open
pgierz opened this issue Jan 21, 2025 · 4 comments

@pgierz

pgierz commented Jan 21, 2025

Hi there,

Looking through issues (#88) and pull requests (#122) only got me part of the way, so I thought I'd ask for some help as a new issue.

I'm trying to set up a tiled server that will allow users to upload completed intake catalogs via the tiled client. On the producing side, I have a simulation pipeline for an Earth System Model that produces intake catalogs of its output as one of its steps. The main building block on the Python side is a giant dictionary, config. Essentially, part of the model clean-up is:

import json
import tiled.client

def create_intake_catalog(config):
    config["intake"] = {}
    # Implementation details not shown here, too specific.
    # The result is an intake_esm.core.esm_datastore.
    my_cat = _create_catalog(config)
    config["intake"]["catalog"] = my_cat
    return config

def serialize_catalog(config):
    my_cat = config["intake"]["catalog"]
    my_cat.serialize("example_catalog.json")  # Dumps JSON to disk, useful for users on the machine
    with open("example_catalog.json", "r") as cat_file:
        my_cat_json = json.load(cat_file)
    config["intake"]["catalog_json"] = my_cat_json
    return config

def upload_catalog(config):
    my_cat_json = config["intake"]["catalog_json"]
    tiled_server_url = "http://localhost:8000"  # Replaced later with some central place
    client = tiled.client.from_uri(tiled_server_url)
    # Imaginary authentication...
    client.login()

    # This next line is not real, here I need some help!
    client.write_catalog(my_cat_json, metadata={"name": "my sim"})
    return config

I think as a consumer it is then really straightforward:

import intake
cat = intake.open_catalog("http://localhost:8000")
cat.my_sim  # Gives you the catalog we had produced earlier

I hope I expressed what I am trying to do clearly enough. Could someone help me figure out how to serve such catalogs and have users actively upload them?

Thanks!
Paul

@pgierz
Author

pgierz commented Jan 21, 2025

Here is also a minimal reproducible example with the same kinds of objects I would eventually like to use.

Assuming requirements of:

intake_esm
tiled[all]

import tempfile

import intake
import intake_esm

from tiled.client import Context, from_context
from tiled.examples.generated_minimal import tree
from tiled.server.app import build_app

# Create dummy catalog
url = intake_esm.tutorial.get_url("google_cmip6")
cat = intake.open_esm_datastore(url)
# Serialize the catalog as JSON into a fresh temporary directory
catalog_dir = tempfile.mkdtemp()
cat.serialize(name="example_catalog", directory=catalog_dir)

# Create dummy client
app = build_app(tree)
context = Context.from_app(app)
client = from_context(context)

client.write(cat)  # <-- This is the part I don't know how to write down

@danielballan
Member

Hello @pgierz! I'm happy to see someone pushing on this, and happy to help if I can.

I would suggest starting a tiled server in a separate process, from the CLI:

tiled serve catalog --temp --api-key secret

(If you don't specify an API key, a random token will be generated, just as jupyter notebook does. But for dev it is convenient to set a stable, memorable API key. Of course, don't accept external traffic on a dev server.)

Then, I think something like this:

x = tiled_client.create_container(key="x")
for key, data in your_intake_catalog.items():
    x.write_array(data, key=key)

Or use x.write_dataframe or any x.write_* as appropriate.
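
Spelled out a little more, that could look like the sketch below. It assumes the dev server started with the command above is reachable at http://localhost:8000 with the API key secret, and that each catalog entry has already been loaded as an array. The helper names connect and upload_arrays are hypothetical, chosen only for illustration; they are not part of tiled.

```python
def connect(uri="http://localhost:8000", api_key="secret"):
    """Connect to the dev server (URI and key are assumptions matching the CLI call above)."""
    # Imported lazily so that upload_arrays can be tried without tiled installed.
    from tiled.client import from_uri

    return from_uri(uri, api_key=api_key)


def upload_arrays(client, container_key, arrays):
    """Create a container node and write each array into it under its own key."""
    container = client.create_container(key=container_key)
    for key, data in arrays.items():
        container.write_array(data, key=key)
    return container
```

With the server running, usage would be along the lines of upload_arrays(connect(), "x", {"a": some_array}). Note that this sends the array data itself to the server.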

@pgierz
Author

pgierz commented Jan 22, 2025

Thanks, I'll try that out and report back. One question though: wouldn't that upload the data itself to the tiled server? Maybe that wasn't clear from my question: here I only want to share the catalogs themselves via the server. I can imagine cases where the data would be too large to host that way, or where the server runs on infrastructure that isn't attached to the same storage.
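
To make "only the catalog" concrete, here is a minimal sketch using just the standard library. It assumes the serialized JSON follows the ESM catalog layout (keys such as id, description, esmcat_version, assets, and either catalog_file or an inline catalog_dict list of rows). The function names catalog_summary and load_catalog_summary are hypothetical. The point is that the descriptor holds paths to the assets rather than the data, so it stays small and JSON-serializable.

```python
import json


def catalog_summary(esmcat):
    """Return the lightweight, descriptive part of an ESM catalog JSON dict."""
    rows = esmcat.get("catalog_dict") or []  # inline rows, if the catalog embeds them
    assets = esmcat.get("assets") or {}
    return {
        "id": esmcat.get("id"),
        "description": esmcat.get("description"),
        "esmcat_version": esmcat.get("esmcat_version"),
        "catalog_file": esmcat.get("catalog_file"),  # external table of rows, if any
        "n_assets": len(rows),
        "path_column": assets.get("column_name"),  # column holding the data paths
    }


def load_catalog_summary(path):
    """Read a serialized catalog from disk and summarize it."""
    with open(path) as f:
        return catalog_summary(json.load(f))
```

Whether a dict like this is best attached as metadata on a tiled node, or stored some other way, is exactly the open question here.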

@pgierz
Author

pgierz commented Jan 27, 2025

Hi @danielballan,

I think I've experimented enough now to formulate some more concrete questions:

  1. Is the supported data storage limited to arrays and dataframes? Could I, for example, store a plain text, YAML, or JSON dataset?

  2. When writing, can I pass in a path to the stored data that should be served, rather than the data itself (assuming the server running tiled has access to the same storage)?
