Update forked code #24

Open · wants to merge 55 commits into base: main
Commits (55) · changes from all commits
519a351
Update README.md
jinyoungkim927 Jan 13, 2024
b1311da
Update README.md
jinyoungkim927 Jan 13, 2024
a5b81d2
Update README.md
jinyoungkim927 Jan 13, 2024
dcc8ce3
GPT-4
oliviaylee Jan 13, 2024
f76d00d
Update README.md
jinyoungkim927 Jan 13, 2024
7d9c25f
GPT-4
oliviaylee Jan 13, 2024
b580886
Flutter with threading
jinyoungkim927 Jan 13, 2024
e7ceca2
object counting cloud vision API
oliviaylee Jan 13, 2024
580ea5e
New logic for Vision API + GPT-4
oliviaylee Jan 13, 2024
8df8238
Basic GUI website
jinyoungkim927 Jan 13, 2024
f810521
Fixes to fluttering mechanism
jinyoungkim927 Jan 13, 2024
ab743f7
Fix google cloud dependency
jinyoungkim927 Jan 13, 2024
410f48d
Web UI debugging #1
jinyoungkim927 Jan 13, 2024
9f04d38
Web UI debugging #2
jinyoungkim927 Jan 13, 2024
494d2d1
old web ui deleted
Jan 13, 2024
c28db83
Automatic stop fluttering
Jan 13, 2024
4f5aaf0
Hook up to Vision OpenAI model
jinyoungkim927 Jan 13, 2024
db5344e
vision model edits
oliviaylee Jan 13, 2024
2f7429d
basic skeleton complex commands
oliviaylee Jan 13, 2024
81c80b4
Merged changes
jinyoungkim927 Jan 13, 2024
4708f0e
Image debugging #2
jinyoungkim927 Jan 13, 2024
c6ac406
Include self
jinyoungkim927 Jan 13, 2024
5cad17a
Image recognition error #4
jinyoungkim927 Jan 13, 2024
f0a05f9
Image recognition error #5
jinyoungkim927 Jan 13, 2024
8696fb1
Image recognition error #6
jinyoungkim927 Jan 13, 2024
89982eb
Image recognition error #7
jinyoungkim927 Jan 13, 2024
54bbfa7
Image recognition error #8
jinyoungkim927 Jan 13, 2024
d87caa0
count function
oliviaylee Jan 13, 2024
43e59f0
count function
oliviaylee Jan 13, 2024
d48c1e0
added-images
Jan 13, 2024
12fad05
Try adjusting settings #1
jinyoungkim927 Jan 13, 2024
31bedbc
settings change 2
jinyoungkim927 Jan 13, 2024
c2659fc
vision model, search in progress
oliviaylee Jan 13, 2024
69f44ff
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
a99e68b
count done, search in progress
oliviaylee Jan 13, 2024
3aff568
correct photo taking
jinyoungkim927 Jan 13, 2024
a8dd48d
correct photo taking #2
jinyoungkim927 Jan 13, 2024
1cfb77f
correct photo taking #3
jinyoungkim927 Jan 13, 2024
3410146
correct photo taking #3
jinyoungkim927 Jan 13, 2024
a51e243
search done
oliviaylee Jan 13, 2024
da84d3c
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
bdff6c2
added new functions to prompt txt
oliviaylee Jan 13, 2024
e727c4c
correct photo taking #4
jinyoungkim927 Jan 13, 2024
15e8133
Edited airsim_wrapper.py
jinyoungkim927 Jan 13, 2024
bfd6d5a
App backend
kaien-yang Jan 13, 2024
677ce28
Webapp frontend
kaien-yang Jan 13, 2024
620bd79
Fix bugs with liv's methods
jinyoungkim927 Jan 13, 2024
b8952b9
Multiple drones trial
jinyoungkim927 Jan 13, 2024
1928a0a
Update index.html
kaien-yang Jan 13, 2024
a16c8cd
fixed count, vision model work in progress
oliviaylee Jan 13, 2024
7aa6e11
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
aa46720
Update app.py
kaien-yang Jan 13, 2024
b6e7fce
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
oliviaylee Jan 13, 2024
b6656eb
Merge branch 'main' of https://github.com/jinyoungkim927/PromptCraft-…
Jan 13, 2024
0332745
Webapp and stuff
kaien-yang Jan 14, 2024
102 changes: 23 additions & 79 deletions README.md
@@ -1,89 +1,33 @@
# PromptCraft-Robotics
# Plan for Building

The PromptCraft-Robotics repository serves as a community for people to test and share interesting prompting examples for large language models (LLMs) within the robotics domain. We also provide a sample [robotics simulator](https://github.com/microsoft/PromptCraft-Robotics/tree/main/chatgpt_airsim) (built on Microsoft AirSim) with ChatGPT integration for users to get started.

We currently focus on OpenAI's [ChatGPT](https://openai.com/blog/chatgpt/), but we also welcome examples from other LLMs (for example open-sourced models or others with API access such as [GPT-3](https://openai.com/api/) and Codex).
### 1. Build GUI
- Side-by-side web interface + Windows GUI
- Hook this up to Whisper for voice-to-text commands (see the sketch below)
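
A minimal sketch of the Whisper hookup, assuming the pre-1.0 `openai` Python client (matching the `openai.ChatCompletion` usage in `airsim_wrapper.py`); the function name and audio path are illustrative.

```python
import openai

def voice_to_command(audio_path):
    """Transcribe a recorded voice clip into a text command for the drone."""
    with open(audio_path, "rb") as audio_file:
        # Whisper speech-to-text via the OpenAI API (openai<1.0 interface)
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]
```
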

Users can contribute to this repository by submitting interesting prompt examples to the [Discussions](https://github.com/microsoft/PromptCraft-Robotics/discussions) section of this repository. A prompt can be submitted within different robotics categories such as [Manipulation](https://github.com/microsoft/PromptCraft-Robotics/discussions/categories/llm-manipulation), [Home Robotics](https://github.com/microsoft/PromptCraft-Robotics/discussions/categories/llm-home-robots), [Physical Reasoning](https://github.com/microsoft/PromptCraft-Robotics/discussions/categories/llm-physical-reasoning), among many others.
Once submitted, the prompt will be reviewed by the community (upvote your favorites!) and added to the repository by a team of admins if it is deemed interesting and useful.
We encourage users to submit prompts that are interesting, fun, or useful. We also encourage users to submit prompts that are not necessarily "correct" or "optimal" but are interesting nonetheless.
### 2. Try different GPTs
- Run different GPT versions (e.g., GPT-4) rather than only gpt-3.5-turbo
- GPT-4 is solid, but for testing we will use GPT-3.5 (see the sketch below)
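
A minimal sketch of swapping the chat model, assuming the same pre-1.0 `openai.ChatCompletion` interface used in `airsim_wrapper.py`; the helper name is illustrative.

```python
import openai

def query_language_model(prompt, model="gpt-3.5-turbo"):
    # Pass model="gpt-4" to run the same prompt against GPT-4
    completion = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return completion.choices[0].message.content
```
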

We encourage prompt submissions formatted as markdown, so that they can be easily transferred to the main repository. Please specify which LLM you used, and if possible provide other visuals of the model in action such as videos and pictures.
### 3. Try vision models
- Take the photo the drone captures and feed it to a vision-capable model (e.g., GPT-4 with vision) to answer instruction-based questions (see the sketch below)
- If we have time, switch this out for a fancier model like LLaVA
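
A minimal sketch of asking a vision-capable model about a drone photo, mirroring the commented-out GPT-4 Vision call further down in `airsim_wrapper.py`; the image path, question, and model name (`gpt-4-vision-preview`) are assumptions.

```python
import base64
import openai

def ask_about_photo(image_path, question):
    with open(image_path, "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")
    # Multimodal chat request: text question + base64-encoded drone photo
    completion = openai.ChatCompletion.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64_image}"}},
            ],
        }],
        max_tokens=300,
    )
    return completion.choices[0].message.content
```
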

## Paper, videos and citations
### 4. Implement 'flutter'
- So the drones aren't sitting ducks for targets
- The drone oscillates randomly around its position
- STRETCH: a version where, if it is a swarm, it moves in a confusing swarm pattern

Blog post: <a href="https://aka.ms/ChatGPT-Robotics" target="_blank">aka.ms/ChatGPT-Robotics</a>
### 5. Multiple drones/objects
- Multiple drones swarming to one object at once (see the multi-drone sketch below)
- Detect multiple objects
- Not just turbines and people (try other objects)
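
A minimal sketch of commanding several drones, assuming the AirSim `settings.json` defines vehicles named "Drone1" and "Drone2"; the `vehicle_name` argument is part of the standard AirSim Python API, but the target position is illustrative.

```python
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()
drones = ["Drone1", "Drone2"]

for name in drones:
    client.enableApiControl(True, vehicle_name=name)
    client.armDisarm(True, vehicle_name=name)

# Take off concurrently, then wait for both
futures = [client.takeoffAsync(vehicle_name=name) for name in drones]
for f in futures:
    f.join()

# Send both drones toward the same target object position
x, y, z = 10, 0, -5
for name in drones:
    client.moveToPositionAsync(x, y, z, velocity=5, vehicle_name=name)
```
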

Paper: <a href="https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf" target="_blank">ChatGPT for Robotics: Design Principles and Model Abilities
### 6. New environment
- Switch out the environment

Video: <a href="https://youtu.be/NYd0QcZcS6Q" target="_blank">https://youtu.be/NYd0QcZcS6Q</a>
## Interesting Tasks:
- Counting tasks
- Following objects

If you use this repository in your research, please cite the following paper:

```
@techreport{vemprala2023chatgpt,
author = {Vemprala, Sai and Bonatti, Rogerio and Bucker, Arthur and Kapoor, Ashish},
title = {ChatGPT for Robotics: Design Principles and Model Abilities},
institution = {Microsoft},
year = {2023},
month = {February},
url = {https://www.microsoft.com/en-us/research/publication/chatgpt-for-robotics-design-principles-and-model-abilities/},
number = {MSR-TR-2023-8},
}
```

## ChatGPT Prompting Guides & Examples

The list below contains links to the different robotics categories and their corresponding prompt examples. We welcome contributions to this repository to add more robotics categories and examples. Please submit prompt examples to the [Discussions](https://github.com/microsoft/PromptCraft-Robotics/discussions) page, or submit a pull request with your category and examples.

* Embodied agent
* [ChatGPT - Habitat, closed loop object navigation 1](examples/embodied_agents/visual_language_navigation_1.md)
* [ChatGPT - Habitat, closed loop object navigation 2](examples/embodied_agents/visual_language_navigation_2.md)
* [ChatGPT - AirSim, object navigation using RGBD](examples/embodied_agents/airsim_objectnavigation.md)
* Aerial robotics
* [ChatGPT - Real robot: Tello deployment](examples/aerial_robotics/tello_example.md) | [Video Link](https://youtu.be/i5wZJFb4dyA)
* [ChatGPT - AirSim turbine Inspection](examples/aerial_robotics/airsim_turbine_inspection.md) | [Video Link](https://youtu.be/38lA3U2J43w)
* [ChatGPT - AirSim solar panel Inspection](examples/aerial_robotics/airsim_solarpanel_inspection.md)
* [ChatGPT - AirSim obstacle avoidance](examples/aerial_robotics/airsim_obstacleavoidance.md) | [Video Link](https://youtu.be/Vn6NapLlHPE)
* Manipulation
* [ChatGPT - Real robot: Picking, stacking, and building the MSFT logo](examples/manipulation/pick_stack_msft_logo.md) | [Video Link](https://youtu.be/wLOChUtdqoA)
* [ChatGPT - Manipulation tasks](examples/manipulation/manipulation_zeroshot.md)
* Spatial-temporal reasoning
* [ChatGPT - Visual servoing with basketball](examples/spatial_temporal_reasoning/visual_servoing_basketball.md)


## ChatGPT + Robotics Simulator

We provice a sample [AirSim](https://github.com/microsoft/AirSim) environment for users to test their ChatGPT prompts. The environment is a binary containing a sample inspection environment with assets such as wind turbines, electric towers, solar panels etc. The environment comes with a drone and interfaces with ChatGPT such that users can easily send commands in natural language. [[Simulator Link]](chatgpt_airsim/README.md)

We welcome contributions to this repository to add more robotics simulators and environments. Please submit a pull request with your simulator and environment.

## Related resources

Beyond the prompt examples here, we leave useful and related links to the use of large language models below:

* [Read about the OpenAI APIs](https://openai.com/api/)
* [Azure OpenAI service](https://azure.microsoft.com/en-us/products/cognitive-services/openai-service)
* [OPT language model](https://huggingface.co/docs/transformers/model_doc/opt)

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [[email protected]](mailto:[email protected]) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.
225 changes: 223 additions & 2 deletions chatgpt_airsim/airsim_wrapper.py
@@ -1,6 +1,17 @@
import airsim
import base64
import json
import math
import os
import random
import threading
import time

import airsim
import cv2
import numpy as np
import openai
import requests
from google.cloud import vision

objects_dict = {
"turbine1": "BP_Wind_Turbines_C_1",
@@ -20,6 +31,8 @@ def __init__(self):
self.client.confirmConnection()
self.client.enableApiControl(True)
self.client.armDisarm(True)
self.stop_thread = False
self.flutter_thread = None

def takeoff(self):
self.client.takeoffAsync().join()
@@ -44,7 +57,15 @@ def fly_path(self, points):
airsim_points.append(airsim.Vector3r(point[0], point[1], -point[2]))
else:
airsim_points.append(airsim.Vector3r(point[0], point[1], point[2]))
self.client.moveOnPathAsync(airsim_points, 5, 120, airsim.DrivetrainType.ForwardOnly, airsim.YawMode(False, 0), 20, 1).join()
self.client.moveOnPathAsync(
airsim_points,
5,
120,
airsim.DrivetrainType.ForwardOnly,
airsim.YawMode(False, 0),
20,
1,
).join()

def set_yaw(self, yaw):
self.client.rotateToYawAsync(yaw, 5).join()
@@ -60,4 +81,204 @@ def get_position(self, object_name):
while len(object_names_ue) == 0:
object_names_ue = self.client.simListSceneObjects(query_string)
pose = self.client.simGetObjectPose(object_names_ue[0])
if object_name == "crowd":
return [pose.position.x_val+2, pose.position.y_val, pose.position.z_val]
return [pose.position.x_val, pose.position.y_val, pose.position.z_val]

@staticmethod
def is_within_boundary(start_pos, current_pos, limit_radius):
"""Check if the drone is within the spherical boundary"""
distance = math.sqrt(
(current_pos.x_val - start_pos.x_val) ** 2
+ (current_pos.y_val - start_pos.y_val) ** 2
+ (current_pos.z_val - start_pos.z_val) ** 2
)
return distance <= limit_radius

def flutter(self, speed=5, change_interval=1, limit_radius=10):
"""Simulate Brownian motion /fluttering with the drone"""
# Takeoff and get initial position
self.client.takeoffAsync().join()
start_position = self.client.simGetVehiclePose().position

while not self.stop_thread:
# Propose a random direction
pitch = random.uniform(-1, 1) # Forward/backward
roll = random.uniform(-1, 1) # Left/right
yaw = random.uniform(-1, 1) # Rotate

# Move the drone in the proposed direction
self.client.moveByRollPitchYawrateThrottleAsync(
roll, pitch, yaw, 0.5, change_interval
).join()

# Get the current position
current_position = self.client.simGetVehiclePose().position

# Check if the drone is within the boundary
if not self.is_within_boundary(
start_position, current_position, limit_radius
):
# If outside the boundary, adjust to a new random direction
self.client.moveToPositionAsync(
start_position.x_val,
start_position.y_val,
start_position.z_val,
speed,
).join()

# Wait for the next change
time.sleep(change_interval)

def start_fluttering(self, speed=5, change_interval=1, limit_radius=10):
self.stop_thread = False
self.flutter_thread = threading.Thread(
target=self.flutter, args=(speed, change_interval, limit_radius)
)
self.flutter_thread.start()

def stop_fluttering(self):
self.stop_thread = True
if self.flutter_thread is not None:
self.flutter_thread.join()

def generate_circular_path(self, center, radius, height, segments=12):
path = []
for i in range(segments):
angle = 2 * math.pi * i / segments
x = center[0] + radius * math.cos(angle)
y = center[1] + radius * math.sin(angle)
z = height
# Append each waypoint as a single (x, y, z) tuple; list.append takes one argument
path.append((x, y, z))
return path

def take_photo(self, filename="image.png"):
responses = self.client.simGetImages(
[airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)]
)
response = responses[0]

# get numpy array
img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)  # np.fromstring is deprecated; frombuffer reads the same bytes

# reshape array to 3 channel image array H X W X 3
img_rgb = img1d.reshape(response.height, response.width, 3)

# # original image is flipped vertically
# img_rgb = np.flipud(img_rgb)

# write to png
filename = os.path.normpath(filename + ".png")
cv2.imwrite(filename, img_rgb)

# encode image to base64 string
with open(filename, "rb") as image_file:
base64_image = base64.b64encode(image_file.read()).decode("utf-8")

return base64_image

def analyze_with_vision_model(self, image_data):
# Google Vision API: https://cloud.google.com/vision/docs/object-localizer
#path = "path to image"
client = vision.ImageAnnotatorClient()

#with open(path, "rb") as image_file:
content = base64.b64decode(image_data)
image = vision.Image(content=content)

objects = client.object_localization(image=image).localized_object_annotations

return objects

# Load API key from config.json
# with open("config.json") as f:
# data = json.load(f)
# api_key = data["OPENAI_API_KEY"]

# if isinstance(image_data, str):
# image_data = image_data.encode()

# # Convert image data to base64
# base64_image = base64.b64encode(image_data).decode("utf-8")

# headers = {
# "Content-Type": "application/json",
# "Authorization": f"Bearer {api_key}",
# }

# payload = {
# "model": "gpt-4-vision-preview",
# "messages": [
# {
# "role": "user",
# "content": [
# {"type": "text", "text": "How many people are in this image?"},
# {
# "type": "image_url",
# "image_url": {
# "url": f"data:image/jpeg;base64,{base64_image}"
# },
# },
# ],
# }
# ],
# "max_tokens": 2000,
# }

# response = requests.post(
# "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
# )

# # Return the response
# return response.json()

def query_language_model(self, prompt):
with open("config.json", "r") as f:
config = json.load(f)
openai.api_key = config["OPENAI_API_KEY"]
chat_history = [
{
"role": "user",
"content": prompt,
}
]
completion = openai.ChatCompletion.create(
model="gpt-3.5-turbo", messages=chat_history, temperature=0
)
return completion.choices[0].message.content

# Complex commands
def count(self, object_name):
image_data = self.take_photo()
vision_outputs = self.analyze_with_vision_model(image_data)
# Naive: converts vision model json output to string, append to count prompt
prompt = "\n\n Based on this json output, count the number of instances of " + object_name + " in the scene. Return a single number"
response = self.query_language_model(str(vision_outputs) + prompt)
print(response)
return response


def search(self, object_name, radius):
# code motion
self.fly_to(self.get_position(object_name))
# fly in a circle
circular_path = self.generate_circular_path(
self.get_position(object_name)[:2],
radius,
self.get_position(object_name)[2],
)
vision_outputs = ""
for point in circular_path:
self.fly_to(point)
image_data = self.take_photo(str(point))
vision_output = self.analyze_with_vision_model(image_data)
vision_outputs += str(vision_output)
prompt = "\n Based on these json outputs, is " + object_name + " present in the scene? Return TRUE or FALSE."
return self.query_language_model(str(vision_outputs) + prompt)

def get_latitude_longitude(self, object_name):
self.fly_to(self.get_position(object_name))
return (
self.get_position(object_name)[0],
self.get_position(object_name)[1],
)
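
A minimal usage sketch of the new wrapper methods, assuming the class is named `AirSimWrapper` (as in the upstream repo), that `config.json` holds the OpenAI key, and that Google Cloud Vision credentials are configured; the object names are illustrative.

```python
from airsim_wrapper import AirSimWrapper

aw = AirSimWrapper()
aw.takeoff()

# Evasive oscillation runs on a background thread until stopped
aw.start_fluttering(speed=5, change_interval=1, limit_radius=10)
aw.stop_fluttering()

# Vision + LLM "complex commands" added in this PR
num_people = aw.count("person")           # count instances of an object in the current view
found = aw.search("turbine1", radius=10)  # circle an object and ask whether it is visible
aw.land()
```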