Plugin documentation #229

95 changes: 95 additions & 0 deletions docs/topic/nbgitpuller-downloder-plugins.md
# nbgitpuller - downloader plugin documentation

nbgitpuller uses [pluggy](https://pluggy.readthedocs.io/en/stable/) as a framework
to load any installed nbgitpuller-downloader plugins. At the time of writing, three downloader plugins
are available:
- [nbgitpuller-downloader-googledrive](https://github.com/jupyterhub/nbgitpuller-downloader-googledrive)
- [nbgitpuller-downloader-dropbox](https://github.com/jupyterhub/nbgitpuller-downloader-dropbox)
- [nbgitpuller-downloader-generic-web](https://github.com/jupyterhub/nbgitpuller-downloader-generic-web)


A plugin needs several pieces in place to work correctly:
1. The `setup.cfg` (or `setup.py`) file must define an entry point in the `nbgitpuller` group.
For example:

```ini
[options.entry_points]
nbgitpuller = dropbox=nbgitpuller_downloader_dropbox.dropbox_downloader
```

2. The module referenced by the entry point (in the example above, `dropbox_downloader`) must implement
the function `handle_files(query_line_args)`, decorated with `@hookimpl`.
3. Consequently, the plugin must import the decorator:
   - `from nbgitpuller.hookspecs import hookimpl`
4. Your plugin's implementation of `handle_files` must return
two pieces of information:
   - the name of the folder the archive is in after decompression
   - the path to the local git repo that mimics a remote origin repo
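Here is a toy, self-contained sketch that shows how these pieces fit together, built directly on pluggy. It requires the `pluggy` package. Several parts are hypothetical stand-ins, not nbgitpuller's actual definitions:

- the hook specification
- the `firstresult=True` choice
- the `ToyDownloader` class
- the two dictionary keys

In a real plugin you would import `hookimpl` from `nbgitpuller.hookspecs` and decorate a module-level function instead.

```python
import pluggy

# Hypothetical markers; a real plugin imports hookimpl from nbgitpuller.hookspecs
hookspec = pluggy.HookspecMarker("nbgitpuller")
hookimpl = pluggy.HookimplMarker("nbgitpuller")


class DownloaderSpec:
    """Hypothetical hook specification mirroring handle_files(query_line_args)."""

    @hookspec(firstresult=True)  # return the first non-None result, not a list
    def handle_files(self, query_line_args):
        """Return the unpacked folder name and the local origin repo path."""


class ToyDownloader:
    """A toy plugin; real plugins are modules with a decorated function."""

    @hookimpl
    def handle_files(self, query_line_args):
        # Hypothetical return shape carrying the two required pieces of information
        return {
            "output_dir": "my-archive",
            "origin_repo_path": "/tmp/my-archive-origin",
        }


pm = pluggy.PluginManager("nbgitpuller")
pm.add_hookspecs(DownloaderSpec)
pm.register(ToyDownloader())
# A host application would load installed plugins with:
# pm.load_setuptools_entrypoints("nbgitpuller")

result = pm.hook.handle_files(query_line_args={"repo": "https://example.com/a.zip"})
print(result)
```

pluggy only accepts keyword arguments when a hook is called. That is why the call passes `query_line_args=` explicitly.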

nbgitpuller provides a function called `handle_files_helper`, in `plugin_helper.py`, that downloads the archive
and returns the correct information when given a URL, the extension of the file to decompress (zip or tar),
and the progress function (described later). You are also welcome to implement the functionality of
`handle_files_helper` in your plug-in. This can be useful for use cases not covered by the currently available
plugins, such as needing to authenticate against the web server or service where your archive is kept. Either
way, it is worth studying `handle_files_helper` in nbgitpuller to get a sense of how it is implemented.
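If you implement the download yourself, for example to authenticate, a minimal standard-library sketch might look like the following. The bearer-token header is an assumption: the authentication scheme depends entirely on the service hosting your archive. `build_archive_request` and `download_archive` are hypothetical names, not nbgitpuller functions.

```python
import shutil
import urllib.request
from typing import Optional


def build_archive_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the archive, optionally adding a bearer token."""
    headers = {"User-Agent": "nbgitpuller-downloader-example"}
    if token:
        # Hypothetical auth scheme; use whatever your service actually expects
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def download_archive(url: str, dest_path: str, token: Optional[str] = None) -> str:
    """Stream the archive at `url` into `dest_path` and return that path."""
    request = build_archive_request(url, token)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

`urlopen` is blocking. If you call it from a coroutine, wrapping it in `asyncio.to_thread` keeps the event loop free for progress updates.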

For the remaining steps, I refer to the [nbgitpuller-downloader-dropbox](https://github.com/jupyterhub/nbgitpuller-downloader-dropbox) plugin, whose `handle_files` function is:

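```python
import asyncio

from nbgitpuller.hookspecs import hookimpl
from nbgitpuller.plugin_helper import handle_files_helper


@hookimpl
def handle_files(query_line_args):
    # Dropbox share links use dl=0; dl=1 requests a direct download
    query_line_args["repo"] = query_line_args["repo"].replace("dl=0", "dl=1")
    # determine_file_extension is a helper not shown in this excerpt
    ext = determine_file_extension(query_line_args["repo"])
    query_line_args["extension"] = ext
    loop = asyncio.get_event_loop()
    # Run the download and the progress loop concurrently
    tasks = handle_files_helper(query_line_args), query_line_args["progress_func"]()
    result_handle, _ = loop.run_until_complete(asyncio.gather(*tasks))
    return result_handle
```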

The following points describe what `handle_files` does in this case, in particular the setup it performs
before calling the `handle_files_helper` function:

1) The `query_line_args` parameter contains all the query-string arguments you include in the nbgitpuller link. This means you
can put keyword arguments into your nbgitpuller links and access them in the `handle_files`
function.
For example, you might set up a link like this:
`http://[your hub]/hub/user-redirect/git-pull?repo=[link to your archive]&keyword1=value1&keyword2=value2&provider=dropbox&urlpath=tree%2F%2F`
In your `handle_files` function, you could then read your custom arguments like this:

```python
query_line_args["keyword1"]
query_line_args["keyword2"]
```
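nbgitpuller assembles `query_line_args` for you, and it also adds entries such as the progress function. The sketch below illustrates only how the link's query string maps onto dictionary keys, using the standard library's URL parsing. The hub host and the Dropbox URL are made up. The `repo` value must be URL-encoded so that its own `?` and `&` characters do not split the outer query string.

```python
from urllib.parse import parse_qsl, urlparse

# Made-up link; the repo value is percent-encoded inside the outer query string
link = (
    "http://hub.example.org/hub/user-redirect/git-pull"
    "?repo=https%3A%2F%2Fwww.dropbox.com%2Fs%2Fabc%2Fdemo.zip%3Fdl%3D0"
    "&keyword1=value1&keyword2=value2&provider=dropbox&urlpath=tree%2F%2F"
)

# parse_qsl decodes each key/value pair; dict() keeps the last value per key
query_line_args = dict(parse_qsl(urlparse(link).query))
print(query_line_args["keyword1"])  # value1
print(query_line_args["repo"])  # https://www.dropbox.com/s/abc/demo.zip?dl=0
```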
2) The `query_line_args` parameter also includes the progress function, which monitors the `download_q`
for messages. Messages in the `download_q` are written to the UI so users can see the progress and
steps being taken to download their archives. In the example above, `query_line_args`, and with it the
progress function and the queue, is passed into `handle_files_helper`, and the progress function is also
run alongside it. Both values are accessed like this:
```python
query_line_args["progress_func"]
query_line_args["download_q"]
```
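The queue and the progress function are nbgitpuller internals. The following is therefore only a sketch of the producer/consumer pattern, under these assumptions:

- `download_q` is treated as an `asyncio.Queue`.
- `None` serves as a hypothetical end-of-messages sentinel.
- `fake_download` and `progress_loop` are illustrative stand-ins.

The download coroutine puts progress messages on the queue while the progress loop consumes them. `asyncio.gather` runs both concurrently.

```python
import asyncio


async def fake_download(download_q: asyncio.Queue) -> str:
    """Stand-in for the download helper: reports each step, then returns a folder name."""
    for step in ("Downloading archive", "Decompressing archive", "Creating local repo"):
        await download_q.put(step)
        await asyncio.sleep(0)  # yield control so the progress loop can run
    await download_q.put(None)  # hypothetical sentinel: no more messages
    return "my-archive"


async def progress_loop(download_q: asyncio.Queue, shown: list) -> None:
    """Consume messages until the sentinel; a real UI would stream them to the user."""
    while True:
        msg = await download_q.get()
        if msg is None:
            break
        shown.append(msg)
        print(msg)


async def main():
    download_q = asyncio.Queue()
    shown = []
    # gather returns results in argument order: (download result, None)
    result, _ = await asyncio.gather(
        fake_download(download_q), progress_loop(download_q, shown)
    )
    return result, shown


result, shown = asyncio.run(main())
```

This uses `asyncio.run()`, the usual entry point for a standalone script. The plugin code above instead obtains a loop and calls `run_until_complete`.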
3) The first line of the `handle_files` function for the Dropbox downloader is specific to Dropbox. The URL to a file
in Dropbox contains one URL query parameter (`dl=0`). This parameter tells Dropbox whether to download the
file or display it in its browser-based preview. To download the file, this parameter
needs to be changed to `dl=1`.
4) The next line determines the file extension (zip, tar.gz, etc.).
The extension is added to the `query_line_args` map and passed to `handle_files_helper` to
tell the application which utility to use to decompress the archive: unzip or tar -xzf.
5) Since we don't want the user to wait without feedback while the download finishes, the download
is non-blocking: it runs asynchronously with the asyncio package, concurrently with the progress loop.
Here are the steps:
   - get the event loop
   - set up two tasks:
     - a call to `handle_files_helper` with our arguments
     - the progress loop (`query_line_args["progress_func"]()`)
   - run both tasks in the event loop with `loop.run_until_complete(asyncio.gather(*tasks))`
6) The function returns two pieces of information to nbgitpuller:
   - the directory name of the decompressed archive
   - the `local_origin_repo` path
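The `replace("dl=0", "dl=1")` call in the Dropbox plugin rewrites any matching substring anywhere in the URL. A slightly more targeted sketch edits only the `dl` query parameter. `force_dropbox_download` is a hypothetical helper name. `guess_extension` is a hypothetical stand-in for `determine_file_extension`, whose actual implementation is not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def force_dropbox_download(url: str) -> str:
    """Set dl=1 on a Dropbox link, keeping other query parameters (repeated keys collapse)."""
    parts = urlparse(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["dl"] = "1"
    return urlunparse(parts._replace(query=urlencode(params)))


def guess_extension(url: str) -> str:
    """Hypothetical extension check based on the URL path (ignores the query string)."""
    path = urlparse(url).path.lower()
    for ext in (".tar.gz", ".tgz", ".zip"):
        if path.endswith(ext):
            return ext.lstrip(".")
    raise ValueError(f"unsupported archive type: {path}")
```

Checking `.tar.gz` before shorter suffixes matters because a path ending in `.tar.gz` would otherwise be misclassified if a bare `.gz` rule were added later.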

The details of what happens in `handle_files_helper` can be found by studying the function. In short, the archive is downloaded
and decompressed, and the result is set up on the local file system as a git repository that acts as the remote (i.e., the `local_origin_repo` path).
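The internals of `handle_files_helper` are not reproduced here. The following is a rough sketch of only the decompression step. It assumes the archive holds a single top-level folder, and `unpack_and_find_folder` is a hypothetical name. It recovers the folder name that `handle_files` must report. The follow-on step of turning that folder into a local git repository is deliberately omitted.

```python
import os
import shutil
import tempfile
import zipfile


def unpack_and_find_folder(archive_path: str, dest_dir: str) -> str:
    """Unpack a .zip/.tar.gz archive into dest_dir and return its top-level folder name."""
    # shutil.unpack_archive chooses the zip or tar unpacker from the file extension
    shutil.unpack_archive(archive_path, dest_dir)
    # Ignore hidden entries and the __MACOSX folder some zip tools add
    entries = [
        e for e in os.listdir(dest_dir)
        if not e.startswith(".") and e != "__MACOSX"
    ]
    if len(entries) != 1 or not os.path.isdir(os.path.join(dest_dir, entries[0])):
        raise ValueError("expected exactly one top-level folder in the archive")
    return entries[0]


# Demo: build a small zip archive, then unpack it
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("my-archive/notebook.ipynb", "{}")
    folder = unpack_and_find_folder(archive, os.path.join(tmp, "unpacked"))
    print(folder)  # my-archive
```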