Update forked code #24

Open · wants to merge 55 commits into base: main
Commits (55) · changes from all commits
519a351
Update README.md
jinyoungkim927 Jan 13, 2024
b1311da
Update README.md
jinyoungkim927 Jan 13, 2024
a5b81d2
Update README.md
jinyoungkim927 Jan 13, 2024
dcc8ce3
GPT-4
oliviaylee Jan 13, 2024
f76d00d
Update README.md
jinyoungkim927 Jan 13, 2024
7d9c25f
GPT-4
oliviaylee Jan 13, 2024
b580886
Flutter with threading
jinyoungkim927 Jan 13, 2024
e7ceca2
object counting cloud vision API
oliviaylee Jan 13, 2024
580ea5e
New logic for Vision API + GPT-4
oliviaylee Jan 13, 2024
8df8238
Basic GUI website
jinyoungkim927 Jan 13, 2024
f810521
Fixes to fluttering mechanism
jinyoungkim927 Jan 13, 2024
ab743f7
Fix google cloud dependency
jinyoungkim927 Jan 13, 2024
410f48d
Web UI debugging #1
jinyoungkim927 Jan 13, 2024
9f04d38
Web UI debugging #2
jinyoungkim927 Jan 13, 2024
494d2d1
old web ui deleted
Jan 13, 2024
c28db83
Automatic stop fluttering
Jan 13, 2024
4f5aaf0
Hook up to Vision OpenAI model
jinyoungkim927 Jan 13, 2024
db5344e
vision model edits
oliviaylee Jan 13, 2024
2f7429d
basic skeleton complex commands
oliviaylee Jan 13, 2024
81c80b4
Merged changes
jinyoungkim927 Jan 13, 2024
4708f0e
Image debugging #2
jinyoungkim927 Jan 13, 2024
c6ac406
Include self
jinyoungkim927 Jan 13, 2024
5cad17a
Image recognition error #4
jinyoungkim927 Jan 13, 2024
f0a05f9
Image recognition error #5
jinyoungkim927 Jan 13, 2024
8696fb1
Image recognition error #6
jinyoungkim927 Jan 13, 2024
89982eb
Image recognition error #7
jinyoungkim927 Jan 13, 2024
54bbfa7
Image recognition error #8
jinyoungkim927 Jan 13, 2024
d87caa0
count function
oliviaylee Jan 13, 2024
43e59f0
count function
oliviaylee Jan 13, 2024
d48c1e0
added-images
Jan 13, 2024
12fad05
Try adjusting settings #1
jinyoungkim927 Jan 13, 2024
31bedbc
settings change 2
jinyoungkim927 Jan 13, 2024
c2659fc
vision model, search in progress
oliviaylee Jan 13, 2024
69f44ff
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
a99e68b
count done, search in progress
oliviaylee Jan 13, 2024
3aff568
correct photo taking
jinyoungkim927 Jan 13, 2024
a8dd48d
correct photo taking #2
jinyoungkim927 Jan 13, 2024
1cfb77f
correct photo taking #3
jinyoungkim927 Jan 13, 2024
3410146
correct photo taking #3
jinyoungkim927 Jan 13, 2024
a51e243
search done
oliviaylee Jan 13, 2024
da84d3c
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
bdff6c2
added new functions to prompt txt
oliviaylee Jan 13, 2024
e727c4c
correct photo taking #4
jinyoungkim927 Jan 13, 2024
15e8133
Edited airsim_wrapper.py
jinyoungkim927 Jan 13, 2024
bfd6d5a
App backend
kaien-yang Jan 13, 2024
677ce28
Webapp frontend
kaien-yang Jan 13, 2024
620bd79
Fix bugs with liv's methods
jinyoungkim927 Jan 13, 2024
b8952b9
Multiple drones trial
jinyoungkim927 Jan 13, 2024
1928a0a
Update index.html
kaien-yang Jan 13, 2024
a16c8cd
fixed count, vision model work in progress
oliviaylee Jan 13, 2024
7aa6e11
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
aa46720
Update app.py
kaien-yang Jan 13, 2024
b6e7fce
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
b6656eb
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
Jan 13, 2024
0332745
Webapp and stuff
kaien-yang Jan 14, 2024
102 changes: 23 additions & 79 deletions README.md
@@ -1,89 +1,33 @@
# PromptCraft-Robotics
# Plan for Building

The PromptCraft-Robotics repository serves as a community for people to test and share interesting prompting examples for large language models (LLMs) within the robotics domain. We also provide a sample [robotics simulator](https://github.com/microsoft/PromptCraft-Robotics/tree/main/chatgpt_airsim) (built on Microsoft AirSim) with ChatGPT integration for users to get started.

We currently focus on OpenAI's [ChatGPT](https://openai.com/blog/chatgpt/), but we also welcome examples from other LLMs (for example open-sourced models or others with API access such as [GPT-3](https://openai.com/api/) and Codex).
### 1. Build GUI
- Side-by-side web interface + Windows GUI
- Hook this up to Whisper for voice-to-text commands (see the sketch below)
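
A minimal sketch of the Whisper hookup, assuming the pre-1.0 `openai` Python client (matching the `openai.ChatCompletion` usage in `airsim_wrapper.py`); the function name and audio path are illustrative.

```python
import openai

def voice_to_command(audio_path):
    """Transcribe a recorded voice clip into a text command for the drone."""
    with open(audio_path, "rb") as audio_file:
        # Whisper speech-to-text via the OpenAI API (openai<1.0 interface)
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]
```
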

Users can contribute to this repository by submitting interesting prompt examples to the [Discussions](https://github.com/microsoft/PromptCraft-Robotics/discussions) section of this repository. A prompt can be submitted within different robotics categories such as [Manipulation](https://github.com/microsoft/PromptCraft-Robotics/discussions/categories/llm-manipulation), [Home Robotics](https://github.com/microsoft/PromptCraft-Robotics/discussions/categories/llm-home-robots), [Physical Reasoning](https://github.com/microsoft/PromptCraft-Robotics/discussions/categories/llm-physical-reasoning), among many others.
Once submitted, the prompt will be reviewed by the community (upvote your favorites!) and added to the repository by a team of admins if it is deemed interesting and useful.
We encourage users to submit prompts that are interesting, fun, or useful. We also encourage users to submit prompts that are not necessarily "correct" or "optimal" but are interesting nonetheless.
### 2. Try different GPTs
- Run different GPT versions (e.g., GPT-4) rather than only gpt-3.5-turbo
- GPT-4 is solid, but for testing we will use GPT-3.5 (see the sketch below)
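
A minimal sketch of swapping the chat model, assuming the same pre-1.0 `openai.ChatCompletion` interface used in `airsim_wrapper.py`; the helper name is illustrative.

```python
import openai

def query_language_model(prompt, model="gpt-3.5-turbo"):
    # Pass model="gpt-4" to run the same prompt against GPT-4
    completion = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return completion.choices[0].message.content
```
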

We encourage prompt submissions formatted as markdown, so that they can be easily transferred to the main repository. Please specify which LLM you used, and if possible provide other visuals of the model in action such as videos and pictures.
### 3. Try vision models
- Take the photo the drone captures and feed it to a vision-capable model (e.g., GPT-4 with vision) to answer instruction-based questions (see the sketch below)
- If we have time, switch this out for a fancier model like LLaVA
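
A minimal sketch of asking a vision-capable model about a drone photo, mirroring the commented-out GPT-4 Vision call further down in `airsim_wrapper.py`; the image path, question, and model name (`gpt-4-vision-preview`) are assumptions.

```python
import base64
import openai

def ask_about_photo(image_path, question):
    with open(image_path, "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")
    # Multimodal chat request: text question + base64-encoded drone photo
    completion = openai.ChatCompletion.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64_image}"}},
            ],
        }],
        max_tokens=300,
    )
    return completion.choices[0].message.content
```
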

## Paper, videos and citations
### 4. Implement 'flutter'
- So the drones aren't sitting ducks for targets
- The drone oscillates randomly around its position
- STRETCH: a version where, if it is a swarm, it moves in a confusing swarm pattern

Blog post: <a href="https://aka.ms/ChatGPT-Robotics" target="_blank">aka.ms/ChatGPT-Robotics</a>
### 5. Multiple drones/objects
- Multiple drones swarming to one object at once (see the multi-drone sketch below)
- Detect multiple objects
- Not just turbines and people (try other objects)
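
A minimal sketch of commanding several drones, assuming the AirSim `settings.json` defines vehicles named "Drone1" and "Drone2"; the `vehicle_name` argument is part of the standard AirSim Python API, but the target position is illustrative.

```python
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()
drones = ["Drone1", "Drone2"]

for name in drones:
    client.enableApiControl(True, vehicle_name=name)
    client.armDisarm(True, vehicle_name=name)

# Take off concurrently, then wait for both
futures = [client.takeoffAsync(vehicle_name=name) for name in drones]
for f in futures:
    f.join()

# Send both drones toward the same target object position
x, y, z = 10, 0, -5
for name in drones:
    client.moveToPositionAsync(x, y, z, velocity=5, vehicle_name=name)
```
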

Paper: <a href="https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf" target="_blank">ChatGPT for Robotics: Design Principles and Model Abilities
### 6. New environment
- Switch out the environment

Video: <a href="https://youtu.be/NYd0QcZcS6Q" target="_blank">https://youtu.be/NYd0QcZcS6Q</a>
## Interesting Tasks:
- Counting tasks
- Following objects

If you use this repository in your research, please cite the following paper:

```
@techreport{vemprala2023chatgpt,
author = {Vemprala, Sai and Bonatti, Rogerio and Bucker, Arthur and Kapoor, Ashish},
title = {ChatGPT for Robotics: Design Principles and Model Abilities},
institution = {Microsoft},
year = {2023},
month = {February},
url = {https://www.microsoft.com/en-us/research/publication/chatgpt-for-robotics-design-principles-and-model-abilities/},
number = {MSR-TR-2023-8},
}
```

## ChatGPT Prompting Guides & Examples

The list below contains links to the different robotics categories and their corresponding prompt examples. We welcome contributions to this repository to add more robotics categories and examples. Please submit prompt examples to the [Discussions](https://github.com/microsoft/PromptCraft-Robotics/discussions) page, or submit a pull request with your category and examples.

* Embodied agent
* [ChatGPT - Habitat, closed loop object navigation 1](examples/embodied_agents/visual_language_navigation_1.md)
* [ChatGPT - Habitat, closed loop object navigation 2](examples/embodied_agents/visual_language_navigation_2.md)
* [ChatGPT - AirSim, object navigation using RGBD](examples/embodied_agents/airsim_objectnavigation.md)
* Aerial robotics
* [ChatGPT - Real robot: Tello deployment](examples/aerial_robotics/tello_example.md) | [Video Link](https://youtu.be/i5wZJFb4dyA)
* [ChatGPT - AirSim turbine Inspection](examples/aerial_robotics/airsim_turbine_inspection.md) | [Video Link](https://youtu.be/38lA3U2J43w)
* [ChatGPT - AirSim solar panel Inspection](examples/aerial_robotics/airsim_solarpanel_inspection.md)
* [ChatGPT - AirSim obstacle avoidance](examples/aerial_robotics/airsim_obstacleavoidance.md) | [Video Link](https://youtu.be/Vn6NapLlHPE)
* Manipulation
* [ChatGPT - Real robot: Picking, stacking, and building the MSFT logo](examples/manipulation/pick_stack_msft_logo.md) | [Video Link](https://youtu.be/wLOChUtdqoA)
* [ChatGPT - Manipulation tasks](examples/manipulation/manipulation_zeroshot.md)
* Spatial-temporal reasoning
* [ChatGPT - Visual servoing with basketball](examples/spatial_temporal_reasoning/visual_servoing_basketball.md)


## ChatGPT + Robotics Simulator

We provice a sample [AirSim](https://github.com/microsoft/AirSim) environment for users to test their ChatGPT prompts. The environment is a binary containing a sample inspection environment with assets such as wind turbines, electric towers, solar panels etc. The environment comes with a drone and interfaces with ChatGPT such that users can easily send commands in natural language. [[Simulator Link]](chatgpt_airsim/README.md)

We welcome contributions to this repository to add more robotics simulators and environments. Please submit a pull request with your simulator and environment.

## Related resources

Beyond the prompt examples here, we leave useful and related links to the use of large language models below:

* [Read about the OpenAI APIs](https://openai.com/api/)
* [Azure OpenAI service](https://azure.microsoft.com/en-us/products/cognitive-services/openai-service)
* [OPT language model](https://huggingface.co/docs/transformers/model_doc/opt)

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [[email protected]](mailto:[email protected]) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.
225 changes: 223 additions & 2 deletions chatgpt_airsim/airsim_wrapper.py
@@ -1,6 +1,17 @@
import airsim
import base64
import json
import math
import os
import random
import threading
import time

import airsim
import cv2
import numpy as np
import openai
import requests
from google.cloud import vision

objects_dict = {
"turbine1": "BP_Wind_Turbines_C_1",
@@ -20,6 +31,8 @@ def __init__(self):
self.client.confirmConnection()
self.client.enableApiControl(True)
self.client.armDisarm(True)
self.stop_thread = False
self.flutter_thread = None

def takeoff(self):
self.client.takeoffAsync().join()
@@ -44,7 +57,15 @@ def fly_path(self, points):
airsim_points.append(airsim.Vector3r(point[0], point[1], -point[2]))
else:
airsim_points.append(airsim.Vector3r(point[0], point[1], point[2]))
self.client.moveOnPathAsync(airsim_points, 5, 120, airsim.DrivetrainType.ForwardOnly, airsim.YawMode(False, 0), 20, 1).join()
self.client.moveOnPathAsync(
airsim_points,
5,
120,
airsim.DrivetrainType.ForwardOnly,
airsim.YawMode(False, 0),
20,
1,
).join()

def set_yaw(self, yaw):
self.client.rotateToYawAsync(yaw, 5).join()
@@ -60,4 +81,204 @@ def get_position(self, object_name):
while len(object_names_ue) == 0:
object_names_ue = self.client.simListSceneObjects(query_string)
pose = self.client.simGetObjectPose(object_names_ue[0])
if object_name == "crowd":
return [pose.position.x_val+2, pose.position.y_val, pose.position.z_val]
return [pose.position.x_val, pose.position.y_val, pose.position.z_val]

@staticmethod
def is_within_boundary(start_pos, current_pos, limit_radius):
"""Check if the drone is within the spherical boundary"""
distance = math.sqrt(
(current_pos.x_val - start_pos.x_val) ** 2
+ (current_pos.y_val - start_pos.y_val) ** 2
+ (current_pos.z_val - start_pos.z_val) ** 2
)
return distance <= limit_radius

def flutter(self, speed=5, change_interval=1, limit_radius=10):
"""Simulate Brownian motion /fluttering with the drone"""
# Takeoff and get initial position
self.client.takeoffAsync().join()
start_position = self.client.simGetVehiclePose().position

while not self.stop_thread:
# Propose a random direction
pitch = random.uniform(-1, 1) # Forward/backward
roll = random.uniform(-1, 1) # Left/right
yaw = random.uniform(-1, 1) # Rotate

# Move the drone in the proposed direction
self.client.moveByRollPitchYawrateThrottleAsync(
roll, pitch, yaw, 0.5, change_interval
).join()

# Get the current position
current_position = self.client.simGetVehiclePose().position

# Check if the drone is within the boundary
if not self.is_within_boundary(
start_position, current_position, limit_radius
):
# If outside the boundary, adjust to a new random direction
self.client.moveToPositionAsync(
start_position.x_val,
start_position.y_val,
start_position.z_val,
speed,
).join()

# Wait for the next change
time.sleep(change_interval)

def start_fluttering(self, speed=5, change_interval=1, limit_radius=10):
self.stop_thread = False
self.flutter_thread = threading.Thread(
target=self.flutter, args=(speed, change_interval, limit_radius)
)
self.flutter_thread.start()

def stop_fluttering(self):
self.stop_thread = True
if self.flutter_thread is not None:
self.flutter_thread.join()

def generate_circular_path(self, center, radius, height, segments=12):
path = []
for i in range(segments):
angle = 2 * math.pi * i / segments
x = center[0] + radius * math.cos(angle)
y = center[1] + radius * math.sin(angle)
z = height
# Append each waypoint as a single (x, y, z) tuple; list.append takes one argument
path.append((x, y, z))
return path

def take_photo(self, filename="image.png"):
responses = self.client.simGetImages(
[airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)]
)
response = responses[0]

# get numpy array
img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)  # np.fromstring is deprecated; frombuffer reads the same bytes

# reshape array to 3 channel image array H X W X 3
img_rgb = img1d.reshape(response.height, response.width, 3)

# # original image is flipped vertically
# img_rgb = np.flipud(img_rgb)

# write to png
filename = os.path.normpath(filename + ".png")
cv2.imwrite(filename, img_rgb)

# encode image to base64 string
with open(filename, "rb") as image_file:
base64_image = base64.b64encode(image_file.read()).decode("utf-8")

return base64_image

def analyze_with_vision_model(self, image_data):
# Google Vision API: https://cloud.google.com/vision/docs/object-localizer
#path = "path to image"
client = vision.ImageAnnotatorClient()

#with open(path, "rb") as image_file:
content = base64.b64decode(image_data)
image = vision.Image(content=content)

objects = client.object_localization(image=image).localized_object_annotations

return objects

# Load API key from config.json
# with open("config.json") as f:
# data = json.load(f)
# api_key = data["OPENAI_API_KEY"]

# if isinstance(image_data, str):
# image_data = image_data.encode()

# # Convert image data to base64
# base64_image = base64.b64encode(image_data).decode("utf-8")

# headers = {
# "Content-Type": "application/json",
# "Authorization": f"Bearer {api_key}",
# }

# payload = {
# "model": "gpt-4-vision-preview",
# "messages": [
# {
# "role": "user",
# "content": [
# {"type": "text", "text": "How many people are in this image?"},
# {
# "type": "image_url",
# "image_url": {
# "url": f"data:image/jpeg;base64,{base64_image}"
# },
# },
# ],
# }
# ],
# "max_tokens": 2000,
# }

# response = requests.post(
# "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
# )

# # Return the response
# return response.json()

def query_language_model(self, prompt):
with open("config.json", "r") as f:
config = json.load(f)
openai.api_key = config["OPENAI_API_KEY"]
chat_history = [
{
"role": "user",
"content": prompt,
}
]
completion = openai.ChatCompletion.create(
model="gpt-3.5-turbo", messages=chat_history, temperature=0
)
return completion.choices[0].message.content

# Complex commands
def count(self, object_name):
image_data = self.take_photo()
vision_outputs = self.analyze_with_vision_model(image_data)
# Naive: converts vision model json output to string, append to count prompt
prompt = "\n\n Based on this json output, count the number of instances of " + object_name + " in the scene. Return a single number"
response = self.query_language_model(str(vision_outputs) + prompt)
print(response)
return response


def search(self, object_name, radius):
# code motion
self.fly_to(self.get_position(object_name))
# fly in a circle
circular_path = self.generate_circular_path(
self.get_position(object_name)[:2],
radius,
self.get_position(object_name)[2],
)
vision_outputs = ""
for point in circular_path:
self.fly_to(point)
image_data = self.take_photo(str(point))
vision_output = self.analyze_with_vision_model(image_data)
vision_outputs += str(vision_output)
prompt = "\n Based on these json outputs, is " + object_name + " present in the scene? Return TRUE or FALSE."
return self.query_language_model(str(vision_outputs) + prompt)

def get_latitude_longitude(self, object_name):
self.fly_to(self.get_position(object_name))
return (
self.get_position(object_name)[0],
self.get_position(object_name)[1],
)
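
A minimal usage sketch of the new wrapper methods, assuming the class is named `AirSimWrapper` (as in the upstream repo), that `config.json` holds the OpenAI key, and that Google Cloud Vision credentials are configured; the object names are illustrative.

```python
from airsim_wrapper import AirSimWrapper

aw = AirSimWrapper()
aw.takeoff()

# Evasive oscillation runs on a background thread until stopped
aw.start_fluttering(speed=5, change_interval=1, limit_radius=10)
aw.stop_fluttering()

# Vision + LLM "complex commands" added in this PR
num_people = aw.count("person")           # count instances of an object in the current view
found = aw.search("turbine1", radius=10)  # circle an object and ask whether it is visible
aw.land()
```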