Skip to content

Latest commit

 

History

History
246 lines (162 loc) · 9.49 KB

README.md

File metadata and controls

246 lines (162 loc) · 9.49 KB

Alt-Text

A PyPi package used for finding, generating, and setting alt-text for images in HTML files.

Developed as a Computer Science Senior Design Project at Stevens Institute of Technology in collaboration with the Free Ebook Foundation.

Learn more about the developers.

Getting Started

Installation

You can find the PyPi package here. To install the package via, you can execute the following in a terminal for your respective system...

Windows
py -m pip install alt-text

Unix/MacOS
python3 -m pip install alt-text

Developer Dependencies

All developer dependencies can be found here. You will only need to install these individually when working directly with the source code.

Engine Dependencies

As of the moment, the image analyzation tools that Alt-Text uses are not fully bundled with the package itself. Hence, depending on the type of engines you are using (for Description Generation and/or Character Recognition), you will need to install various applications/get API keys for the respective functionalities.

Description Engines

Description Engines are used to generate descriptions of an image. If you are to use one of these, you will need to fulfill that specific Engine's dependencies before use.

ReplicateAPI

ReplicateAPI Engine uses the Replicate API, hence you will need to get an API key via Logging in with Github on the Replicate website.

GoogleVertexAPI

GoogleVertexAPI Engine uses the Vertex AI API, hence you will need to get access from the Google API Marketplace. Additionally, Alt-Text uses Service Account Keys to get authenticated with Google Cloud, hence you will need to Create a Service Account Key with permission for the Vertex AI API and have its according JSON.

BlipLocal

The BlipLocal Engine uses a modified version of the cobanov/image-captioning repository, which allows for the use of Blip locally via a CLI. To get started, you must download this fork of the repository and download/install the BLIP-Large checkpoint as described in the README.

OCR Engines

Optical Character Recognition Engines are used to find text within images. If you are to use one of these, you will need to fulfill that specific Engine's dependencies before use.

Tesseract

The Tesseract Engine uses Tesseract, hence you will need to install the Tesseract OCR.

Language Engines

Language Engines are used to generate a alt-text given an image description (from the Description Engine), characters found in an image (from the OCR Engine), and context within the Ebook. If you are to use one of these, you will need to fulfill that specific Engine's dependencies before use.

OpenAI API

The OpenAI API Engine gives access to Open AI's GPT Models via their API. To use this, you will need an API Key with access to the appropriate tier (more info on their pricing page).

PrivateGPT

The PrivateGPT Engine gives allows for easy integration with an instance of PrivateGPT. To use this, you'll need a running instance of a PrivateGPT API Server.

Quickstart & Usage

Setup

Standard Setup

The standard setup assumes that you have access to a Description Engine and Language Engine (the OCR Engine being optional).

from alttext.alttext import AltTextHTML

alt = AltTextHTML(
    ReplicateAPI("REPLICATE_KEY"),
    # Tesseract(),
    OpenAIAPI("OPENAI_KEY", "gpt-3.5-turbo"),
)

Legacy Setup

This setup assumes that you have access to a Description Engine (the OCR Engine and Language Engine being optional).

from alttext.alttext import AltTextHTML

alt = AltTextHTML(
    ReplicateAPI("REPLICATE_KEY"),
    # Tesseract(),
    # OpenAIAPI("OPENAI_KEY", "gpt-3.5-turbo"),
    options = {"version": 1}
)

Options

Below are the default options for the AltTextHTML class. You can change these by passing a dict into the options parameter during instantiation. When passing options, you only need the options you'd like to change from the default values in the dict.

DEFOPTIONS = {
    "withContext": True,
    "withHash": True,
    "multiThreaded": True,
    "version": 2,
}

Basic Usage

Loading an Ebook

# from a file
alt.parseFile("/path/to/ebook.html")

# or from a string
alt.parse("<HTML>...</HTML>")

Getting Images

# getting all images
imgs : list[bs4.element.Tag] = alt.getAllImgs()

# getting all images with no alt attribute or where alt = ""
imgs_noalt : list[bs4.element.Tag] = alt.getNoAltImgs()

# get a specific image by src
img : bs4.element.Tag = alt.getImg("path_as_in_html/image.png")

Generating Alt-Text

# generate alt-text for a single image by src
alt_text : str = alt.genAltText("path_as_in_html/image.png")

# generate an association from an image tag
# example_association = {
#   "src" : "path_as_in_html/image.png"
#   "alt" : "generated alt text"
#   "hash" : 1234
# }
association : dict = alt.genAssociation(img : bs4.element.Tag)

# generate a list of associations given a list of image tags
associations : list[dict] = alt.genAltAssociations(imgs : list[bs4.element.Tag])

Setting Alt-Text

# setting alt-text for a single image by src
new_img_tag : bs4.element.Tag = alt.setAlt("path_as_in_html/image.png", "new alt")

# setting alt-text for multiple images given a list of associations
new_img_tags : list[bs4.element.Tag] = alt.setAlts(associations : list[dict])

Exporting Current HTML Status

# getting current html as string
html : str = alt.export()

# exporting to a file
path : str = alt.exportToFile("path/to/new_html.html")

Our Mission

The Alt-Text project is developed for the Free Ebook Foundation as a Senior Design Project at Stevens Institute of Technology.

As Ebooks become a more prominant way to consume written materials, it only becomes more important for them to be accessible to all people. Alternative text (aka alt-text) in Ebooks are used as a way for people to understand images in Ebooks if they are unable to use images as intended (e.g. a visual impaired person using a screen reader to read an Ebook).

While this feature exists, it is still not fully utilized and many Ebooks lack alt-text in some, or even all their images. To illustrate this, the Gutenberg Project, the creator of the Ebook and now a distributor of Public Domain Ebooks, have over 70,000 Ebooks in their collection and of those, there are about 470,000 images without alt-text (not including images with insufficient alt-text).

The Alt-Text project's goal is to use the power of various AI technologies, such as machine vision and large language models, to craft a solution capable of assisting in the creation of alt-text for Ebooks, closing the accessibility gap and improving collections, such as the Gutenberg Project.

Contact Information

The emails and relevant information of those involved in the Alt-Text project can be found below.

The Deveolpers

The Client

Advisor

APIs, Tools, & Libraries Used

Alt-Text is developed using an assortment of tools...

Development Tools

Alt-Text is developed using...

APIs and Supplementary Tools

Packaging/Distribution Tools

Alt-Text is distributed using...