Foundation models have become invaluable in advancing the medical field. Despite their promise, how to strategically deploy LLMs so that they are effective in complex medical tasks remains an open question. Our novel framework, Medical Decision-making Agents (MDAgents), aims to address this gap by automatically assigning an effective collaboration structure for LLMs. The assigned solo or group collaboration structure is tailored to the complexity of the medical task at hand, emulating real-world medical decision-making processes. We evaluate our framework and baseline methods with state-of-the-art LLMs across a suite of challenging medical benchmarks: MedQA, MedMCQA, PubMedQA, DDXPlus, PMC-VQA, Path-VQA, and MedVidQA, achieving the best performance in 5 out of 7 benchmarks that require an understanding of multi-modal medical reasoning. Ablation studies reveal that MDAgents excels in adapting the number of collaborating agents to optimize efficiency and accuracy, showcasing its robustness in diverse scenarios. We also explore the dynamics of group consensus, offering insights into how collaborative agents could behave in complex clinical team dynamics.
Create a new virtual environment, e.g. with conda:
~$ conda create -n mdagents "python>=3.9"
Activate the environment:
~$ conda activate mdagents
Install the required packages:
~$ pip install -r requirements.txt
- MedQA: https://github.com/jind11/MedQA?tab=readme-ov-file
- MedMCQA: https://github.com/medmcqa/medmcqa
- PubMedQA: https://github.com/pubmedqa/pubmedqa
- DDXPlus: https://github.com/mila-iqia/ddxplus
- PMC-VQA: https://github.com/xiaoman-zhang/PMC-VQA
- Path-VQA: https://github.com/UCSD-AI4H/PathVQA
- MedVidQA: https://github.com/deepaknlp/MedVidQACL
~$ python3 main.py --model {gpt-3.5, gpt-4, gpt-4v, gemini-pro, gemini-pro-vision} --dataset {medqa, medmcqa, pubmedqa, ddxplus, pmc-vqa, path-vqa, medvidqa}
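In the command above, the braces list the accepted values; pass exactly one value per flag (for example `--model gpt-4 --dataset medqa`). The actual argument handling in `main.py` is not shown here, so the following is only a minimal sketch of how such a command line could be validated with `argparse`. The flag names and choices are copied from the usage line; the helper names `build_parser` and `parse_args` are hypothetical.

```python
import argparse

# Accepted values, copied from the usage line above.
MODELS = ["gpt-3.5", "gpt-4", "gpt-4v", "gemini-pro", "gemini-pro-vision"]
DATASETS = ["medqa", "medmcqa", "pubmedqa", "ddxplus",
            "pmc-vqa", "path-vqa", "medvidqa"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (an assumption,
    not the repository's actual code)."""
    parser = argparse.ArgumentParser(description="Run a model on a benchmark.")
    parser.add_argument("--model", choices=MODELS, required=True,
                        help="LLM backend to query")
    parser.add_argument("--dataset", choices=DATASETS, required=True,
                        help="benchmark to evaluate on")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:]); argparse exits on invalid values."""
    return build_parser().parse_args(argv)


# Example: equivalent to `python3 main.py --model gpt-4 --dataset medqa`.
args = parse_args(["--model", "gpt-4", "--dataset", "medqa"])
print(args.model, args.dataset)
```

With `choices` set, an unsupported model or dataset name is rejected before any API call is made, which is one reason to validate the flags up front.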
- add baseline models
- add eval.py
- add more benchmarks