Migrate iiif.archive.org/iiif off of Labs APIs #66

Closed
mekarpeles opened this issue Apr 11, 2024 · 2 comments

mekarpeles commented Apr 11, 2024

Requests to https://iiif.archive.org/iiif currently still go through https://api.archivelab.org/iiif. The Labs APIs are likely to be retired sometime this year, so we should move the dependent code out of api.archivelab.org/iiif and into the iiif service itself.

Here we can see where and how iiif.archive.org calls api.archivelab.org/iiif to generate a searchable JSON list of the items that can be accessed in IIIF format:

https://github.com/internetarchive/iiif/blob/main/iiify/resolver.py#L32-L38

The corresponding code on api.archivelab.org/iiif is the Catalog class:
https://github.com/ArchiveLabs/api.archivelab.org/blob/master/server/views/apis/v1/iiif.py#L21-L34

def get(self, page=1, limit=1000):
    q = request.args.get('q', '')
    query = "(mediatype:(texts) OR mediatype:(image))" + \
            ((" AND %s" % q) if q else "")
    fields = request.args.get('fields', '')
    sorts = request.args.get('sorts', '')
    cursor = request.args.get('cursor', '')
    version = 'v2'
    limit = 1000
    return items(page=page, limit=limit, fields=fields, sorts=sorts,
                 query=query, cursor=cursor, version=version)

This in turn calls items:
https://github.com/ArchiveLabs/api.archivelab.org/blob/master/server/api/archive.py#L303-L314

def items(iid=None, query="", page=1, limit=100, fields="", sorts="",
          cursor=None, version=''):
    # aaron's idea: Weekly dump of ID of all identifiers (gzip)
    # elastic search query w/ paging
    if iid:
        return item(iid)
    # 'all:1' also works
    q = "NOT identifier:..*" + (" AND (%s)" % query if query else "")
    if version == 'v2':
        return scrape(query=q, fields=fields, sorts=sorts, count=limit,
                      cursor=cursor)
    return search(q, page=page, limit=limit)
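
To make the two layers of query wrapping concrete, here is a minimal sketch that mirrors only the string building from Catalog.get and items above. The helper names catalog_query and items_query, and the sample value collection:example, are made up for illustration and do not exist in either repo.

```python
def catalog_query(q=""):
    """Mirror of the query string built in Catalog.get (hypothetical helper)."""
    base = "(mediatype:(texts) OR mediatype:(image))"
    return base + ((" AND %s" % q) if q else "")


def items_query(query=""):
    """Mirror of the wrapping applied in items() (hypothetical helper)."""
    return "NOT identifier:..*" + (" AND (%s)" % query if query else "")


# What ends up being sent to the search backend for ?q=collection:example
final = items_query(catalog_query("collection:example"))
print(final)
# NOT identifier:..* AND ((mediatype:(texts) OR mediatype:(image)) AND collection:example)
```

Whatever replaces the Labs call needs to send this same final string if results are to stay identical.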

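items then calls one of item, scrape, or search. Because Catalog.get passes version='v2' and no iid, the iiif catalog path always ends up in scrape: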

def item(iid):
    try:
        return requests.get('%s/metadata/%s' % (API_BASEURL, iid)).json()
    except ValueError as v:
        return v

def scrape(query, fields="", sorts="", count=100, cursor="", security=True):
    """
    params:
        query: the query (using the same Lucene-like syntax supported by Internet Archive Advanced Search)
        fields: Metadata fields to return, comma delimited
        sorts: Fields to sort on, comma delimited (if identifier is specified, it must be last)
        count: Number of results to return (minimum of 100)
        cursor: A cursor, if any (otherwise, search starts at the beginning)
    """
    if not query:
        raise ValueError("GET 'query' parameters required")

    if int(count) > 1000 and security:
        raise MaxLimitException("Limit may not exceed 1000.")

    #sorts = sorts or 'date+asc,createdate'
    fields = fields or 'identifier,title'

    params = {
        'q': query
    }
    if sorts:
        params['sorts'] = sorts
    if fields:
        params['fields'] = fields
    if count:
        params['count'] = count
    if cursor:
        params['cursor'] = cursor

    r = requests.get(SCRAPE_API, params=params)
    return r.json()

def search(query, page=1, limit=100, security=True, sort=None, fields=None):
    if not query:
        raise ValueError("GET query parameters 'q' required")

    if int(limit) > 1000 and security:
        raise MaxLimitException("Limit may not exceed 1000.")

    sort = sort or 'sort%5B%5D=date+asc&sort%5B%5D=createdate'
    fields = fields or 'identifier,title'
    return requests.get(
        ADVANCED_SEARCH + sort,
        params={'q': query,
                'rows': limit,
                'page': page,
                'fl[]': fields,
                'output': 'json',
            }).json()
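
Since scrape takes and passes along a cursor, a caller that wants the whole catalog has to loop. The sketch below shows one way to do that, assuming the scrape response is a JSON object with an items list and a cursor key that is absent on the last page (my understanding of the scrape API, not something shown in the code above). iter_scrape and the injected fetch callable are hypothetical names, and the demo uses an in-memory fake rather than a real request.

```python
def iter_scrape(fetch, query, fields="identifier,title", count=1000):
    """Yield items page by page, following the cursor until none is returned.

    `fetch` is a hypothetical callable: it takes a params dict and returns
    the decoded JSON response (assumed shape: {"items": [...], "cursor": "..."}).
    """
    params = {"q": query, "fields": fields, "count": count}
    while True:
        page = fetch(dict(params))  # copy so the callee cannot mutate our state
        for item in page.get("items", []):
            yield item
        cursor = page.get("cursor")
        if not cursor:
            break  # no cursor means this was the last page
        params["cursor"] = cursor


# In-memory stand-in for the scrape endpoint, keyed by the cursor it receives.
pages = {
    None: {"items": [{"identifier": "a"}, {"identifier": "b"}], "cursor": "c1"},
    "c1": {"items": [{"identifier": "c"}]},
}


def fake_fetch(params):
    return pages[params.get("cursor")]


ids = [item["identifier"] for item in iter_scrape(fake_fetch, "all:1")]
print(ids)  # ['a', 'b', 'c']
```
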
@mekarpeles

TL;DR --

Change https://github.com/internetarchive/iiif/blob/main/iiify/resolver.py#L32-L38 so that, instead of calling the external api.archivelab.org API, it runs the equivalent code (https://github.com/ArchiveLabs/api.archivelab.org/blob/master/server/views/apis/v1/iiif.py#L21-L34) directly in this repo.
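
A rough sketch of what the in-repo replacement could look like, collapsing Catalog.get, items and scrape into one step. This is not the change that was merged. catalog_params and fetch_catalog are hypothetical names, the SCRAPE_API value is an assumption (the Labs code references the constant without showing its value), and urllib is used only to keep the sketch standard-library-only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; the Labs code only shows the constant name SCRAPE_API.
SCRAPE_API = "https://archive.org/services/search/v1/scrape"


def catalog_params(q="", fields="", sorts="", cursor="", limit=1000):
    """Build the scrape parameters the Labs Catalog -> items -> scrape chain would send."""
    media = "(mediatype:(texts) OR mediatype:(image))"
    inner = media + ((" AND %s" % q) if q else "")
    params = {
        "q": "NOT identifier:..* AND (%s)" % inner,
        "fields": fields or "identifier,title",  # same default as scrape()
        "count": limit,
    }
    # Optional parameters are only sent when set, as in scrape().
    if sorts:
        params["sorts"] = sorts
    if cursor:
        params["cursor"] = cursor
    return params


def fetch_catalog(**kwargs):
    """Call the scrape endpoint directly, with no hop through api.archivelab.org."""
    url = SCRAPE_API + "?" + urlencode(catalog_params(**kwargs))
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Keeping the parameter building separate from the HTTP call means the query logic can be unit tested without network access.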

@mekarpeles

Closed by #67
