MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

Khare, Yash; Priyakumar, U Deva; Bagal, Viraj; Jawahar, C.V.; Devi, Adithi; Mathew, Minesh

dc.contributor.author	Khare, Yash
dc.contributor.author	Bagal, Viraj
dc.contributor.author	Mathew, Minesh
dc.contributor.author	Devi, Adithi
dc.contributor.author	Priyakumar, U Deva
dc.contributor.author	Jawahar, C.V.
dc.coverage.spatial	Nice, France	en_US
dc.date.accessioned	2022-06-21T05:17:00Z
dc.date.available	2022-06-21T05:17:00Z
dc.date.issued	2021-05
dc.identifier.citation	2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI).	en_US
dc.identifier.uri	https://ieeexplore.ieee.org/document/9434063/authors	en_US
dc.identifier.uri	http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7139
dc.description.abstract	Images in the medical domain are fundamentally different from general-domain images. Consequently, it is infeasible to directly employ general-domain Visual Question Answering (VQA) models for the medical domain. Additionally, medical image annotation is a costly and time-consuming process. To overcome these limitations, we propose a solution inspired by self-supervised pretraining of Transformer-style architectures for NLP, Vision, and Language tasks. Our method involves learning richer medical image and text semantic representations using Masked Vision-Language Modeling as the pretext task on a large medical image + caption dataset. The proposed solution achieves new state-of-the-art performance on two VQA datasets for radiology images, VQA-Med 2019 and VQA-RAD, outperforming even the ensemble models of previous best solutions. Moreover, our solution provides attention maps that aid model interpretability.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.subject	Chemistry	en_US
dc.subject	2021	en_US
dc.title	MMBERT: Multimodal BERT Pretraining for Improved Medical VQA	en_US
dc.type	Conference Papers	en_US
dc.contributor.department	Dept. of Chemistry	en_US
dc.identifier.doi	https://doi.org/10.1109/ISBI48211.2021.9434063	en_US
dc.publication.originofpublisher	Foreign	en_US