PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

Part 1: Background

Abstract

Automatic medical report generation (MRG) is of great research value as it has the potential to relieve radiologists from the heavy burden of report writing. Despite recent advancements, accurate MRG remains challenging due to the need for precise clinical understanding and disease identification. Moreover, the imbalanced distribution of diseases makes the challenge even more pronounced, as rare diseases are underrepresented in training data, making their diagnosis unreliable.

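The abstract first explains why report generation matters, then names two problems:
  1. Rare diseases may be underrepresented, which is a consequence of the sample distribution in the data.
  2. The generated description must be accurate.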

To address these challenges, we propose diagnosis-driven prompts for medical report generation (PromptMRG), a novel framework that aims to improve the diagnostic accuracy of MRG with the guidance of diagnosis-aware prompts. Specifically, PromptMRG is based on an encoder-decoder architecture with an extra disease classification branch. When generating reports, the diagnostic results from the classification branch are converted into token prompts to explicitly guide the generation process. To further improve the diagnostic accuracy, we design cross-modal feature enhancement, which retrieves similar reports from the database to assist the diagnosis of a query image by leveraging the knowledge from a pre-trained CLIP. Moreover, the disease imbalance issue is addressed by applying an adaptive logit-adjusted loss to the classification branch based on the individual learning status of each disease, which overcomes the barrier of the text decoder's inability to manipulate disease distributions.

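The improvements are:
  1. Use diagnosis-aware prompts: the categories that can be classified are turned into special tokens and added to the vocabulary.
  2. Use a pre-trained CLIP to retrieve reports from the database that are similar to the query, in order to assist the diagnosis.
  3. Apply an adaptive logit-adjusted loss to the classification branch, based on the individual learning status of each disease. This addresses the disease imbalance and overcomes the text decoder's inability to manipulate disease distributions.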
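Improvement 1 can be sketched as follows. This is only an illustration under stated assumptions: the three-disease label set, the `[NAME_POS]` / `[NAME_NEG]` token format, the `[BOS]` marker, and the 0.5 threshold are all hypothetical and are not taken from the paper. Each classifier logit is thresholded, mapped to one special token, and the tokens are placed in front of the report tokens fed to the decoder.

```python
import math

# Hypothetical label set; the real system uses its own disease list.
DISEASES = ["opacity", "pneumonia", "edema"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_prompt(logits, threshold=0.5):
    """Turn one logit per disease into one special prompt token per disease."""
    tokens = []
    for name, logit in zip(DISEASES, logits):
        state = "POS" if sigmoid(logit) >= threshold else "NEG"
        # Token format is an assumption made for this sketch.
        tokens.append(f"[{name.upper()}_{state}]")
    return tokens


def build_decoder_input(prompt_tokens, report_tokens):
    """Prepend the diagnosis prompt so the decoder can attend to it."""
    return prompt_tokens + ["[BOS]"] + report_tokens


prompt = logits_to_prompt([2.0, -1.0, 0.0])
decoder_input = build_decoder_input(prompt, ["the", "lungs", "are", "clear"])
```

For the special tokens to be usable, they would also have to be added to the tokenizer vocabulary, as the note in item 1 says.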

Experiments on two MRG benchmarks show the effectiveness of the proposed method, where it obtains state-of-the-art clinical efficacy performance on both datasets.

Introduction

Automated analysis of medical images involves a wide range of tasks, such as anomaly detection (Cai et al. 2022), disease classification (Luo et al. 2022, 2020), lesion detection (Luo et al. 2021), landmark detection (Jin, Che, and Chen 2023), etc. Among them, medical report generation (MRG) is the task of generating a free-text description of a medical image that provides a comprehensive summary of the image's content. Due to its potential to relieve the heavy workload of radiologists, many works have been proposed for MRG in recent years.
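In Figure 1(a) of the paper, the first prediction is bound to score well on BLEU, ROUGE, and similar metrics because its wording is very close to the reference sentence, yet its content is completely wrong. The second prediction is written in a different style from the gold standard, but its content is correct.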

However, it is challenging to generate an accurate medical report as it demands a comprehensive understanding of the given image, especially the ability to identify clinical findings. For example, Figure 1(a) shows two sample predictions of a chest X-ray alongside the ground truth (GT). While the wording of the first prediction is highly similar to the GT, its diagnosis regarding opacity and pneumonia is incorrect. In contrast, the second prediction is preferred as it successfully identifies opacity and pneumonia, albeit with different wording.
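The gap between wording similarity and diagnostic correctness can be illustrated with a toy overlap score. The three sentences below are invented for this sketch and are not the ones in Figure 1(a), and unigram precision is used only as a simple stand-in for metrics such as BLEU or ROUGE.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate words that also appear in the reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped word matches
    total = sum(cand.values())
    return overlap / total if total else 0.0


# Hypothetical sentences, written only for this illustration.
gt = "there is focal opacity in the right lower lobe concerning for pneumonia"
pred1 = "there is no focal opacity in the right lower lobe to suggest pneumonia"
pred2 = "right basilar consolidation likely represents pneumonia"

score1 = unigram_precision(pred1, gt)  # close wording, opposite diagnosis
score2 = unigram_precision(pred2, gt)  # different wording, same diagnosis
```

Here `score1` is higher than `score2` even though only the second sentence agrees with the reference diagnosis, which is the failure mode the paragraph describes.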

To obtain an MRG system with satisfactory performance, various methods have been proposed. For example, knowledge graphs are an effective technique to enhance feature learning and diagnostic ability by injecting domain knowledge into the model (Zhang et al. 2020; Liu et al. 2021a).

Knowledge: (methods grounded in medical knowledge)

Zhang, Y.; Wang, X.; Xu, Z.; Yu, Q.; Yuille, A.; and Xu, D. 2020. When radiology report generation meets knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 12910–12917.

Liu, F.; Wu, X.; Ge, S.; Fan, W.; and Zou, Y. 2021a. Exploring and distilling posterior and prior knowledge for radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13753–13762.

Multi-task: (methods that train on several tasks at once to optimize and strengthen the model)
Multi-task learning has also been widely used for obtaining better feature representations, where extra auxiliary tasks are simultaneously conducted (Jing, Xie, and Xing 2018; Wang et al. 2022; Yan and Pei 2022).

Jing, B.; Xie, P.; and Xing, E. 2018. On the Automatic Generation of Medical Imaging Reports. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2577–2586.

Wang, Z.; Liu, L.; Wang, L.; and Zhou, L. 2023. METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11558–11567.

Despite the success of these methods, state-of-the-art (SOTA) methods still lack the ability to generate diagnostically correct reports. As evidenced by our observation shown in Figure 1(b), a vanilla disease classification model significantly outperforms most SOTA MRG methods in terms of the F1 score of clinical efficacy (CE). In MRG, CE serves as a metric for assessing the diagnostic accuracy of generated reports. Thus, the figure indicates that existing MRG methods have not fully leveraged the diagnostic information in medical images, which is an obstacle to the application of MRG. Additionally, the biased distribution of diseases leads to imbalanced CE performance (see Figure 1(c)). Yet this issue has not been addressed in prior works, which further reduces the clinical value of current MRG models, as their diagnoses of rare diseases are unreliable.
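As a rough illustration of how a CE-style F1 score can be computed, the sketch below assumes that disease labels have already been extracted from both the ground-truth and the generated reports as binary vectors (one entry per disease). The micro-averaging choice, the function name `micro_prf`, and the toy label vectors are assumptions of this sketch, not details confirmed by the text.

```python
def micro_prf(gt_labels, pred_labels):
    """Micro-averaged precision, recall and F1 over all reports and diseases."""
    tp = fp = fn = 0
    for gt_row, pred_row in zip(gt_labels, pred_labels):
        for g, p in zip(gt_row, pred_row):
            if p and g:
                tp += 1  # disease correctly reported
            elif p and not g:
                fp += 1  # disease reported but absent in the ground truth
            elif g and not p:
                fn += 1  # disease present in the ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for two reports and three diseases (invented values).
gt_labels = [[1, 0, 1], [0, 1, 0]]
pred_labels = [[1, 0, 0], [0, 1, 1]]
precision, recall, f1 = micro_prf(gt_labels, pred_labels)
```

Because the score depends only on the extracted labels, a report with unusual wording but correct findings is not penalized, unlike with n-gram metrics.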

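Report generation suffers from the problem described above: the output can read like plausible nonsense, with the right language style but the wrong content. Although models are weak at report generation, they perform well on classification. The authors' idea is therefore to rely on the mature classification task, whose results are more reliable, and to feed the classification results into the language model as special tokens so that it generates better reports.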

Inspired by the above observations, we propose PromptMRG, an MRG framework with diagnosis-driven prompts (DDP), aiming to improve the CE performance of MRG with the guidance of diagnostic results. Specifically, based on the encoder-decoder architecture, PromptMRG is also equipped with a disease classification branch.

To further improve the diagnostic accuracy, we design cross-modal feature enhancement (CFE), which retrieves similar reports from the database to assist the diagnosis of a query image by leveraging a pre-trained CLIP model.
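The retrieval step can be sketched as a nearest-neighbour search in a shared embedding space. In this sketch the embeddings are small hand-written vectors standing in for what pre-trained CLIP image and text encoders would produce; the function names and the choice of cosine similarity with top-k selection are assumptions, and how the retrieved reports are fused into the classification branch is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query_emb, report_embs, k=2):
    """Return indices of the k database reports most similar to the query image."""
    order = sorted(
        range(len(report_embs)),
        key=lambda i: cosine(query_emb, report_embs[i]),
        reverse=True,
    )
    return order[:k]


# Toy stand-ins for an image embedding and three report embeddings.
query_emb = [1.0, 0.0]
report_embs = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
top_indices = retrieve_top_k(query_emb, report_embs, k=2)
```

The retrieved reports (here identified by `top_indices`) would then supply extra text-side evidence when diagnosing the query image.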

Moreover, the disease imbalance issue is also explicitly addressed via self-adaptive disease-balanced learning (SDL), which adaptively adjusts the optimization objectives of different diseases based on their learning status.
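The general idea of a logit-adjusted loss can be sketched as below. This section does not give the SDL formula, so the sketch makes two loud assumptions: the per-disease offsets are simply passed in, and the helper `prior_offsets` derives them from positive-label frequency in the style of standard logit adjustment, not from the learning status that PromptMRG uses.

```python
import math


def logit_adjusted_bce(logits, targets, offsets):
    """Mean binary cross-entropy with a per-disease offset added to each logit."""
    total = 0.0
    for z, y, b in zip(logits, targets, offsets):
        p = 1.0 / (1.0 + math.exp(-(z + b)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)


def prior_offsets(pos_freqs, tau=1.0):
    """Hypothetical frequency-based offsets: negative for rare diseases."""
    return [tau * math.log(f / (1.0 - f)) for f in pos_freqs]


# One rare disease (5% positive) and one balanced disease (50% positive).
offsets = prior_offsets([0.05, 0.5])
plain = logit_adjusted_bce([0.0, 0.0], [1, 1], [0.0, 0.0])
adjusted = logit_adjusted_bce([0.0, 0.0], [1, 1], offsets)
```

With a negative offset, a positive sample of the rare disease incurs a larger loss at the same raw logit, so training pushes that disease's logits further. An adaptive variant would recompute the offsets during training instead of fixing them from label frequency.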

Experiments on two MRG benchmarks show the effectiveness of the proposed method, where it obtains SOTA CE performance on both datasets. We summarize contributions as follows.

Conclusion:
  1. We propose a new MRG framework that utilizes a disease classification branch to guide the report generation process via token prompts, enabling the model to produce diagnostically correct reports. We demonstrate its superiority via two benchmarks, where it obtains SOTA CE performance on both datasets.
  2. A feature enhancement module is designed to improve the disease classification performance by leveraging the multi-modal knowledge from a pre-trained foundation model for similar records retrieval.
  3. Self-adaptive disease-balanced learning is proposed to address the imbalanced learning among diseases by applying an adaptive logit-adjusted loss to the classification branch, which overcomes the barrier of the text decoder's inability to manipulate disease distributions.

Related Works (this related-work section is very thorough and worth coming back to)

Medical Report Generation
Most MRG models adopted the encoder-decoder architecture from image captioning (Xu et al. 2015; Lu et al. 2017; Ji et al. 2021) due to the similarity of the two tasks. However, MRG is more challenging than captioning, as reports are much longer than captions while clinical abnormalities are more difficult to identify than natural objects. Therefore, various methods have been proposed to tackle the above challenges. Chen et al. (2020) and Yang et al. (2023) proposed extra memory modules to record past similar patterns for providing informative content during the decoding process, such that the generation performance could be improved. The proposed CFE in this paper also retrieves similar records as extra information, but in contrast, it utilizes this information to enhance the disease classification branch rather than the generation process.

Knowledge graphs have been widely used to incorporate domain knowledge to assist report generation. For example, Zhang et al. (2020) and Liu et al. (2021a) proposed to combine a pre-constructed graph to denote the relationship between diseases and organs via graph neural networks, which allows for dedicated feature learning of the abnormalities. Later, Li et al. (2023) developed a method to dynamically update the graph by injecting new knowledge on the fly. Huang, Zhang, and Zhang (2023) designed an injected knowledge distiller to fuse the knowledge from a symptom graph into the final decoding stage, which shares a similar spirit to our DDP. Nevertheless, DDP explicitly tackles the CE issue via a different guidance mechanism (i.e., prompts), and shows much stronger performance in CE.

Multi-task learning is another common technique to facilitate the representation learning of MRG. Among the auxiliary tasks, disease classification is the most popular one as it helps the model learn discriminative features (Jing, Xie, and Xing 2018; Wang et al. 2022; Yan and Pei 2022). Similarly, weakly supervised contrastive learning was introduced by Yan et al. (2021) as an auxiliary task to learn a semantically meaningful space. Additionally, image-text matching was explored (Wang et al. 2022, 2021; Yan and Pei 2022) to learn aligned image-text representations in a fine-grained manner. Although this work also uses disease classification, we highlight the key difference as follows. Previous methods often treat classification as a parallel task and expect it to benefit report generation in an implicit way through learning discriminative features. In contrast, we make use of the diagnostic results from the classification via prompts to explicitly guide the generation process. RGRG (Tanida et al. 2023) is the work most related to ours; it leverages an object detector as region guidance for sentence-wise generation. However, their decoder only attends to the regional visual features, as most previous works do, while ours attends to both visual features and prompts, where the prompts enable the decoder to explicitly leverage the diagnostic information for generating clinically correct reports.

Prompt as Guidance
Prompting is originally a technique from natural language processing for improving the generalization of language models (Liu et al. 2023). Instead of training various tasks in supervised learning individually, prompting enables language models to unify and adapt to a wide range of tasks by modifying inputs into textual templates. Later, some works (Li and Liang 2021; Lester, Al-Rfou, and Constant 2021; Liu et al. 2021b) adopted this technique for efficient fine-tuning, where prompts act as trainable task-specific vectors. Due to its effectiveness and simplicity, prompt tuning was further introduced to vision (Jia et al. 2022) and vision-language models (Radford et al. 2021; Zhou et al. 2022; Tsimpoukelli et al. 2021; Alayrac et al. 2022). More recently, there are works treating prompts as guidance for improving the performance of specific tasks. For example, Qin et al. (2023) developed an automatic generation method of medical prompts to improve the knowledge transferability of pre-trained vision-language models to medical object detection. Ge et al. (2022) proposed to embed domain information into prompts for unsupervised domain adaptation. In this paper, we convert diagnostic results into prompts to guide report generation. To the best of our knowledge,

04-10 13:06