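We find that misalignment between the prompt and the generated image stems mainly from semantic leakage in the cross-attention and self-attention layers. Bounded Attention addresses this by letting each subject "be yourself": it prioritizes each subject's individuality and minimizes the influence of the other subjects in the image.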
Omer Dahary¹, Or Patashnik¹,², Kfir Aberman², Daniel Cohen-Or¹,²
Be Yourself (omer11a.github.io)
This study examines why text-to-image generation models struggle to capture complex input prompts and proposes a new solution. Although these models can generate diverse, high-quality images, they often fail to capture the intended semantics of prompts that contain multiple subjects. Recent work has introduced many layout-to-image extensions to improve user control, but these methods still tend to produce semantically inaccurate images, especially when the subjects are semantically or visually similar.
Background: Text-to-image generation models have opened up new possibilities for producing diverse, high-quality images, but their limited semantic control remains a concern. This study analyzes the causes of these limitations and proposes a corresponding solution.
Methodology: By analyzing existing text-to-image generation models, we find that the main problem is semantic leakage between subjects during the denoising process. To address it, we introduce Bounded Attention, a method that bounds the information flow during sampling, which prevents harmful leakage between subjects and encourages each subject's individuality.
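To make the idea of bounding the information flow concrete, here is a minimal NumPy sketch of masked attention in which a token assigned to one subject cannot attend to tokens of another subject. It is an illustration under stated assumptions, not the authors' implementation: the function names `subject_mask` and `bounded_attention`, the per-token subject labels, and the choice to leave background tokens unrestricted are all hypothetical.

```python
import numpy as np

def subject_mask(subject_ids):
    """Build a boolean (N, N) mask; entry [q, k] is True if query q may attend to key k.

    subject_ids: length-N sequence of ints, 0 for background, 1..S for subjects.
    Assumption (not from the source): subject queries may see their own subject
    and the background, and background queries are left unrestricted.
    """
    ids = np.asarray(subject_ids)
    same_subject = ids[:, None] == ids[None, :]
    key_is_background = (ids == 0)[None, :]
    query_is_background = (ids == 0)[:, None]
    return same_subject | key_is_background | query_is_background

def bounded_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed query/key pairs masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Masked pairs get -inf so they receive exactly zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Every token may attend to itself, so each row has a finite maximum.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy example: one background token, two tokens of subject 1, two of subject 2.
ids = [0, 1, 1, 2, 2]
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
out, weights = bounded_attention(q, k, v, subject_mask(ids))
```

In this toy setup, the attention weights from subject 1's tokens to subject 2's tokens (and vice versa) are exactly zero, so neither subject's features can be mixed into the other's output.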
Results: Experiments show that Bounded Attention can generate multiple subjects that match the given prompt and layout, improving the model's semantic accuracy. We also find that semantic leakage can occur in the self-attention layers, even between semantically dissimilar subjects.
Discussion: Bounded Attention operates in two modes, guidance and denoising. It improves the robustness and quality of image generation while preserving semantic accuracy.
Conclusion: This study proposes a new semantic control method for text-to-image generation models. By bounding the attention flow, it effectively addresses semantic leakage and improves the quality and accuracy of the generated images.
Keywords: Text-to-Image Generation Models, Semantic Control, Attention Mechanism, Semantic Leakage, Image Generation Quality
Link: https://arxiv.org/pdf/2402.15391.pdf