This repository contains the code supporting the Florence 2 base model for use with Autodistill.
Florence 2, introduced in the paper Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks, is a multimodal vision model.
You can use Florence 2 to generate object detection annotations for use in training smaller object detection models with Autodistill.
Read the full Autodistill documentation.
Read the Florence 2 Autodistill documentation.
To use Florence 2 with Autodistill, you need to install the following dependency:
pip3 install autodistill-florence-2
from autodistill_florence_2 import Florence2
from autodistill.detection import CaptionOntology
from PIL import Image
import supervision as sv

# define an ontology to map class names to our Florence 2 prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = Florence2(
    ontology=CaptionOntology(
        {
            "person": "person",
            "a forklift": "forklift"
        }
    )
)

# run inference on a single image
image = Image.open("image.jpeg")
detections = base_model.predict("image.jpeg")

# draw the predicted bounding boxes on a copy of the image
bounding_box_annotator = sv.BoundingBoxAnnotator()
annotated_frame = bounding_box_annotator.annotate(
    scene=image.copy(),
    detections=detections
)
sv.plot_image(image=annotated_frame, size=(16, 16))
# label a dataset
base_model.label("./context_images", extension=".jpeg")
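The {caption: class} ontology format described in the comments above can be illustrated with a small plain-Python sketch. This is not Autodistill's implementation: the helper names `get_prompts`, `get_classes`, and `class_id_for_prompt` are hypothetical and exist only to show how a caption (the prompt sent to the base model) relates to the class label saved in the annotations. It also assumes that class IDs follow the order in which classes first appear in the dictionary.

```python
from typing import Dict, List

# Illustrative stand-in for the {caption: class} ontology dictionary.
# These helpers are hypothetical and are not part of Autodistill.
ontology: Dict[str, str] = {
    "person": "person",
    "a forklift": "forklift",
}


def get_prompts(ontology: Dict[str, str]) -> List[str]:
    """Return the captions that would be sent to the base model."""
    return list(ontology.keys())


def get_classes(ontology: Dict[str, str]) -> List[str]:
    """Return the class labels, deduplicated, in first-seen order."""
    return list(dict.fromkeys(ontology.values()))


def class_id_for_prompt(ontology: Dict[str, str], prompt: str) -> int:
    """Return the index of the class that a caption maps to (assumed ordering)."""
    return get_classes(ontology).index(ontology[prompt])


print(get_prompts(ontology))
print(get_classes(ontology))
print(class_id_for_prompt(ontology, "a forklift"))
```

Keeping the caption and the class separate lets you phrase the prompt descriptively ("a forklift") while saving a short, stable label ("forklift") in the dataset.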
This project is licensed under an MIT license. See the Florence 2 license for more information about the Florence 2 model license.
We love your input! Please see the core Autodistill contributing guide to get started. Thank you 🙏 to all our contributors!