Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jupyter Notebooks to illustrate the inference-contract family. #31

Conversation

prakashngit
Copy link
Contributor

The Notebooks' structure, installation, execution patterns are same as those used by the Noteboks provided for the Exchange contract family. Please note that the inference guardian service must be started prior to executing the notebooks. Please see inference-contract/README.md for instructions.

Some of the additional features offered by this PR taking advantage of recent commits to the contracts repo:

  1. It is possible to use the inference contract jupyter notebook via docker (using features from PR 23)
  2. When using docker, the inference guardian will run in a separate container (using features from PR 27)
  3. The notebooks are constructed using jupytext python modules (using features from PR 29)

Copy link

@g2flyer g2flyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a few hurdles, i managed to build, run & execute all notebooks. A few comments though ..

@@ -150,6 +150,8 @@ try ${PDO_HOME}/contracts/inference/scripts/ss_start.sh -c -o ${PDO_HOME}/logs -
--config-dir ${PDO_HOME}/etc/contracts \
--identity guardian_sservice

sleep 3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this sleep is to make sure that storage service is running before guardian is started? How robust is that 3 seconds and in case for whatever reason the sleep happened to be too short, how easy would it be to diagnose the issue? Couldn't we do some similar synchronization we do as when we wait for CCF or alike?

Copy link
Contributor

@cmickeyb cmickeyb Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in ccf we wait for a file to be created. in pdo we curl a url found in the configuration file. i don't think either approach would work here. there is no guarantee that the db file (the obvious example) was removed between runs and the configuration file has several urls in it that make the grep approach "brittle" at best. maybe a better way for this service AND for the ones in pdo is to explicitly write a URL into a file (like what we do with the pid) and then use that to test for a service listening at the interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@g2flyer curious about your thoughts on hyperledger-labs/private-data-objects#476. This would be a clean way to address your concerns as well.

inference-contract/docs/notebooks/documents/overview.md Outdated Show resolved Hide resolved
--config-dir ${PDO_HOME}/etc/contracts \
--identity guardian_sservice

sleep 3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for sleep in script_test.sh

# token_metadata : additional information about the token
# count : the number of tokens to mint for the asset

# Note that the notebook assumes that there is a key file for the identity of the form: `${keys}/${identity}_private.pem`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that User1 to User10 exist but more could be created via /documents/key_manager.ipynb notebook if

Copy link
Contributor

@cmickeyb cmickeyb Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an improvement to this coming (see key-manager.py in #32 ). I don't think we should rely on the user keys... they should be removed from the PDO client install by default (fine for tests, but not for a real client).

# Create a token notebook for each of the minted tokens.

# %%
# %%skip True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know these (undocumented!) skips are inherited but i still found them quite annoying ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it is documented. exchange/pdo/contracts/jupyter.py (this is CODE so the question is whether code documents it). and also in docs/notebooks/README.md

# ```
#

# %% tags=["parameters"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as with token-issuer, duplication of (slightly re-ordered) fields in two code blocks, one of them auto-generated is highly confusing. Seems to me relying just on the auto-generated one should be completely sufficient ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't do it. the parameters field is necessary for papermill to inject the parameters into the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've verified that the parameters block must have a superset of the variables that are inherited.

inference-contract/docs/notebooks/templates/token.py Outdated Show resolved Hide resolved
inference-contract/docs/notebooks/templates/token.py Outdated Show resolved Hide resolved
Copy link
Contributor

@cmickeyb cmickeyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several things I would like to see addressed.

# %% tags=["parameters"]
identity = input('Identity of the token issuer: ')
token_class = input('Name of the class of tokens to issue: ')
service_host = input('Service host [localhost]: ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest you pick up the actual defaults... service_host = input('...') or 'localhost'

token_metadata = 'created by {}'.format(token_owner)
count = 1
service_host = 'localhost'
instance_identifier = ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no description for instance_identifyer; More interestingly, though, the "magic" second "parameter" code block (see below) actually redefines this variable anyway? So drop it here?

this is a property of the papermill templating modules. these are "placeholders".

# service host by running the following:

# %%
# %%skip True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is skip default True here? Seems to contract text from above? In any case above we should say when to skip or not and that one correspondingly has to change the code block? Given that we have creation conditional on existence of toml file and so re-eval block is safe, why even bother with %%skip at alll?

this only need to be run once per "session". note it will be changing completely soon.


context = pc_jupyter.ml_jupyter.initialize_token_context(state, bindings, context_file, token_path, **context_bindings)
pc_jupyter.pbuilder.Context.SaveContextFile(state, context_file, prefix=token_path)
print('context initialized')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this particular message is not an error and the context comes back successfully. that being said... we are trusting to exceptions (the python way) to be thrown when something break. more or less.

Comment on lines +54 to +62
# ### Initialize the PDO Environment
#
# Initialize the PDO environment. This assumes that a functional PDO configuration
# is in place and that the PDO virtual environment has been activated. In particular,
# ensure that the groups file and eservice database have been configured correctly.
This can be done most easily by running the following in a shell:
#
# `$PDO_HOME/bin/pdo-create-service-groups.psh --service_host <service_host>`
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because you might have come through this factory first. and... #32 will change the way this is handled completely. suggest we not push for changes at this time. this code is one of the prime motivations for #32.

state, token_context, image=image_name, search_path=[path])

# %% [markdown]
# ### New User Performs Operations on the Asset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty much a TEST not part of the factory or template. This file is supposed to be a (primitive) interface to the token, not a test file.

--config-dir ${PDO_HOME}/etc/contracts \
--identity guardian_sservice

sleep 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is necessary only if we do not use docker for the guardian (to be started before starting the jupyter server). Question is do we even need to support such a mode ? My suggestion is that we always expect the guardian to be deployed via the docker container. Please comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, have removed this file completely. As part of documentation, have given commands using ss_start and gs_start as to how to deploy guardian directly, if needed for bare-metal.

…t family use-case.

The Notebooks' structure, installation, execution patterns are same as those used by the
Noteboks provided for the Exchange contract family. Please note that the inference guardian service
must be started prior to executing the notebooks. Please see inference-contract/README.md for instructions.

Some of the additional features offered by this PR taking advantage of recent commits to the contracts repo:

1. It is possible to use the inference contract jupyter notebook via docker (using features from PR 23)
2. When using docker, the inference guardian will run in a separate container (using features from PR 27)
3. The notebooks are constructed using jupytext python modules (using features from PR 29)

Signed-off-by: Prakash Narayana Moorthy <[email protected]>
@prakashngit prakashngit force-pushed the Prakash.Jupyter_Notebooks_inference_contract_family branch from 3e1fcc3 to 676abb5 Compare April 12, 2024 20:09
@g2flyer g2flyer self-requested a review April 13, 2024 03:46
Copy link

@g2flyer g2flyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes seem to address the easy addressable concerns and others seem currently general limitations (and matter of taste, e.g., on the skip ;-)

Couldn't test it though as my ICX server has some boot problems (while trying to get DCAP working on it) and the other test machines i have cannot run the code: After a bit of annoying debugging, turns out that the tensorflow libraries we use do not work if you don't have AVX (as for these gemini lake) and the error message is not visible in docker lock. Two suggestions:

  • make before running anything involving tensorflow we might to do something like $(lscpu | grep -q avx) || die "AVX not avilable"
  • more generally for docker, might be useful to redirect error messages not only in a file but also to /dev/tty? That said, i found now also a found a reasonably convenient way to investigate failures after docker-compose stopped (you will though have to run docker-compose without --abort-on-container-exit as currently used in Makefile!): https://stackoverflow.com/questions/65953634/how-do-i-inspect-the-stopped-docker-container-files

Copy link
Contributor

@cmickeyb cmickeyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple cosmetic changes in the additional commits. I need to test this with the changes to the papermill parameters.

service_host = 'localhost'
instance_identifier = ''
# rest of the parameters such as token_owner, token_class, service_host, and instance_identifier will get auto filled from factory via papermill
# note that papermill places the parameters cell after first paramters cell in this notebook

Copy link
Contributor

@cmickeyb cmickeyb Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you verified that this works? papermill generally requires that the template contain complete list of parameters. it isn't pretty, but if you're setting through the templating mechanism it has to be there.

Verified that papermill generates error/warning messages for missing variables. Just add the parameters you expect with reasonable defaults. You aren't testing for missing variables anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, tested, paperill does not need repeating all parameters. per papermill documentation, papermill populates the instance notebook with parameters after the first cell in the target notebook that has parameters. (that's why we currently see new parameters appearing after the default set). there is no need for the default set to have anything that papaermill will add. Yes, agreed, warnings will be generated while running the factory notebook.

Copy link
Contributor

@cmickeyb cmickeyb Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it actually doesn't work. an error is generated. lets just add the variables that are necessary. to be clear... you can think about the papermill parameters as being the parameters specified to a script. its the "argparse" for notebooks. while @g2flyer doesn't like the duplication, it is a standard way of extending notebooks. your parameter block in the template should contain EVERY variable that you currently (or may reasonably in the future) specify when creating the contracts. for example, you don't currently have the ability to specify the metadata. in the future, that should absolutely be a parameter that could be specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, added back all parameters that are being parametrized by papermill.

Copy link
Contributor

@cmickeyb cmickeyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See message about missing papermill parameters.

Copy link
Member

@marcelamelara marcelamelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly have nitpicky formatting comments, though there are a couple of places that adding links/references to other documentation would be really helpful.

@@ -51,9 +51,10 @@ The Digital Assets contract family implements a basic digital asset for bitmap i
### Inference


The Inference contract family provides contracts for creating a confidentiality preserving policy-wrapper around the usage of a machine learning (ML) model. At its core, the implementation uses a token contract to specify and enforce policies to be followed while using the ML model for inferecing operations.
The Inference contract family provides contracts for creating a confidentiality
preserving policy-wrapper around the usage of a machine learning (ML) model.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a policy-wrapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, have clarified about both policy-wrapper and token object : "The policy-wrapper specifies and checks the
policies to be followed while using the ML model for inferencing
operations, and is implemented as a PDO contract that we refer to as the Token Object."

The Inference contract family provides contracts for creating a confidentiality preserving policy-wrapper around the usage of a machine learning (ML) model. At its core, the implementation uses a token contract to specify and enforce policies to be followed while using the ML model for inferecing operations.
The Inference contract family provides contracts for creating a confidentiality
preserving policy-wrapper around the usage of a machine learning (ML) model.
At its core, the implementation uses a token contract to specify and enforce
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't assume the reader knows what a "token contract" is. It would be helpful to add a link to "token contract" to documentation where the reader can find more information on the underlying implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, please see comment above for fix.

The Inference contract family provides contracts for creating a confidentiality
preserving policy-wrapper around the usage of a machine learning (ML) model.
At its core, the implementation uses a token contract to specify and enforce
policies to be followed while using the ML model for inferecing operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
policies to be followed while using the ML model for inferecing operations.
policies to be followed while using the ML model for inferencing operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done.

Comment on lines 123 to 128
even though the Jupyter server is deployed on bare-metal. Please consult the
`https://github.com/hyperledger-labs/private-data-objects/tree/main` repo for deploypment
options for pdo ledger and services. Regarding deployment of the model guardian,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit cleaner.

Suggested change
even though the Jupyter server is deployed on bare-metal. Please consult the
`https://github.com/hyperledger-labs/private-data-objects/tree/main` repo for deploypment
options for pdo ledger and services. Regarding deployment of the model guardian,
even though the Jupyter server is deployed on bare-metal. Please consult the [PDO repo](https://github.com/hyperledger-labs/private-data-objects/tree/main) for deployment
options for pdo ledger and services. Regarding deployment of the model guardian,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done

We provide [Jupyter Notebooks](./docs/notebooks/README.md) that can be executed
in an interactive manner to illustrate the functionality of the inference
contract family. We support Jupyter notebook deployment via both docker containers
as well as bare-metal. For docker, please invoke the `make test_jupyter` command
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Para break.

Suggested change
as well as bare-metal. For docker, please invoke the `make test_jupyter` command
as well as bare-metal.
For docker, please invoke the `make test_jupyter` command

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done

Object and the Asset Guardian Service.

The asset guardian service here is implemented as a web service
containing a frontend and backend portions.The backend of the guardian
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
containing a frontend and backend portions.The backend of the guardian
containing a frontend and backend portions. The backend of the guardian

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done

contract family. We support Jupyter notebook deployment via both docker containers
as well as bare-metal. For docker, please invoke the `make test_jupyter` command
from within the docker folder contained within this repo. This command deploys the
pdo ledger, pdo services, guardian frontend as well as the openvino containers in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure consistent capitalization here an in other docs.

Suggested change
pdo ledger, pdo services, guardian frontend as well as the openvino containers in
pdo ledger, pdo services, guardian frontend as well as the openVINO containers in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done

Comment on lines 39 to 48
functions: 1. Processes the capability package generated by the token
object and generates/invokes the API for the backend. The capability
package may be thought of the asset-usage approval generated by the
token object smart contract. 2. Implements any input preprocessing and
output postprocessing (collectively referred to as model scoring) as
determined necessary by the original asset owner.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will ensure these are rendered as numbered bullets.

Suggested change
functions: 1. Processes the capability package generated by the token
object and generates/invokes the API for the backend. The capability
package may be thought of the asset-usage approval generated by the
token object smart contract. 2. Implements any input preprocessing and
output postprocessing (collectively referred to as model scoring) as
determined necessary by the original asset owner.
functions:
1. Processes the capability package generated by the token
object and generates/invokes the API for the backend. The capability
package may be thought of the asset-usage approval generated by the
token object smart contract.
2. Implements any input preprocessing and
output postprocessing (collectively referred to as model scoring) as
determined necessary by the original asset owner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done

script, and thus ensuring tight binding between the token object
contract and the model usage.

We note that while the Token Object smart contract is executed within
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, it may be good to add a link to "token object smart contract" here for reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, hope the clarification of token object addresses this.;


We note that while the Token Object smart contract is executed within
Intel SGX enclaves (via PDO), the asset guardian service in this
impelementation is not protected by TEEs. A prospective asset-user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
impelementation is not protected by TEEs. A prospective asset-user
implementation is not protected by TEEs. A prospective asset-user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done.

Signed-off-by: Prakash Narayana Moorthy <[email protected]>
@prakashngit prakashngit force-pushed the Prakash.Jupyter_Notebooks_inference_contract_family branch from d72723f to 5e367ba Compare April 16, 2024 22:24
@cmickeyb cmickeyb merged commit a27b504 into hyperledger-labs:main Apr 17, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants