This project is a bottom-up survey of transformer-based multimodal models, covering the model, agent, and system layers.
A Bottom-Up Survey of Transformer-Based Multimodal Models [Paper]
Note: In this report, multi-agent frameworks are considered systems; in practice, however, they are often referred to as agents as well.