The repository for the Vision-guided Navigation Assistance for the Visually Impaired project at the Shared Reality Lab.
Keywords: React Native, Nginx, Gunicorn, Python, YOLOv5, PyTorch, Docker, Linux.
This application helps visually impaired people reach objects of interest by analyzing images captured by the phone camera and providing audio navigation on mobile phones. We expect the application to run on multiple mobile platforms (e.g. Android and iOS), and the analysis to be carried out either locally or in the cloud.
To achieve this goal, we decompose it into several scenarios, for example navigation to doorways. In each scenario, the application performs object detection, distance measurement, result rendering, and audio feedback.
This is the scenario we are currently working on. Ideally, the application gives audio guidance that informs the user of the location of nearby doorways. However, due to the absence of a dedicated doorway dataset, we focus on doors and handles specifically.
This section briefly presents some key points of the whole system, including the basic architecture, frameworks, and workflows.
As the demand for application capabilities rises, it is hard and time-consuming for developers to port source code to different platforms. To address this, we use React Native, a cross-platform framework, to develop the client app. The app starts the phone's built-in camera and captures pictures that are sent to the server for analysis. After retrieving the analysis result from the server, the client app gives feedback to the user.
Figure 1: App Client View (v0.0.1).
Figure 2: App Client View (v0.0.2).
The Flask server is responsible for receiving requests from clients, processing them, and returning responses. Nginx listens for incoming requests and proxies them to Gunicorn, which runs the Python application. We use a customized YOLOv5 (You Only Look Once v5) [1] model to detect and locate doors in the image.
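As a rough sketch of how the server side fits together, the snippet below loads a custom YOLOv5 weight file with torch.hub and exposes a single Flask endpoint. The endpoint name /detect, the form field name image, and the weight file best.pt are illustrative assumptions, not necessarily what this repository uses.

```python
import io

import torch
from flask import Flask, Response, request
from PIL import Image

app = Flask(__name__)

# Load the custom-trained YOLOv5 weights once at startup via torch.hub.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

@app.route("/detect", methods=["POST"])
def detect():
    # The client uploads the camera capture as multipart form data.
    image = Image.open(io.BytesIO(request.files["image"].read()))
    results = model(image)
    # One row per detection: bounding box, confidence, class id and name.
    payload = results.pandas().xyxy[0].to_json(orient="records")
    return Response(payload, mimetype="application/json")

if __name__ == "__main__":
    # Development server only; in deployment Gunicorn runs this app behind Nginx.
    app.run(host="0.0.0.0", port=5000)
```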
The most important task is creating a robust object detection workflow. After comparing a variety of deep learning computer vision approaches, we chose YOLOv5 because it is highly customizable and has a strong capability for detecting multiple objects.
The DoorDetect dataset [2] serves training and testing purposes. The training set includes 1092 randomly picked images and their labels; the remaining 121 images and labels are used for testing. A YOLOv5m model with a 1280-pixel input size is trained on the DoorDetect dataset; the following figure shows the training and validation results.
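For reference, a minimal sketch of the random 1092/121 split described above could look like the following. The directory layout, the .jpg extension, and the assumption that each image has a same-named YOLO label file are illustrative and may differ from the actual preprocessing used here.

```python
import random
import shutil
from pathlib import Path

random.seed(0)  # fix the seed so the random split is reproducible

# Assumed layout: images/*.jpg with a same-named .txt YOLO label in labels/.
src_images = Path("DoorDetect-Dataset/images")
src_labels = Path("DoorDetect-Dataset/labels")

images = sorted(src_images.glob("*.jpg"))
random.shuffle(images)
splits = {"train": images[:1092], "test": images[1092:]}

for split, files in splits.items():
    img_dir = Path("data") / split / "images"
    lbl_dir = Path("data") / split / "labels"
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in files:
        label_name = img.with_suffix(".txt").name
        shutil.copy(img, img_dir / img.name)
        shutil.copy(src_labels / label_name, lbl_dir / label_name)
```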
The server-side application supports Docker and Docker Compose. The base image is the PyTorch image with CUDA runtime [3]; the specific tag that we use is 1.11.0-cuda11.3-cudnn8-runtime. GPU access from the Docker container requires Docker Compose; the initial configuration points to the GPU with index 0 on the device.
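A sketch of how the GPU with index 0 can be exposed to the container through Docker Compose is shown below. The service name and build context are placeholders; only the device reservation mirrors the configuration described above.

```yaml
services:
  server:
    build: .   # image built FROM pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime [3]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]   # point the container at the GPU with index 0
              capabilities: [gpu]
```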
[1] Ultralytics. You Only Look Once v5 (YOLOv5). Available at: https://github.com/ultralytics/yolov5.
[2] MiguelARD. DoorDetect Dataset. Available at: https://github.com/MiguelARD/DoorDetect-Dataset.
[3] PyTorch. PyTorch Docker Image. Available at: https://hub.docker.com/r/pytorch/pytorch.