Current AI systems have shown impressive results in the automatic synthesis of realistic images from text descriptions. In particular, Generative Adversarial Networks (GANs) are widely used for text-to-image generation: the generator synthesizes realistic images from noise and sentence vectors, while the discriminator estimates the probability that a synthetic image is real. In this paper, in order to generate images from Arabic text, we fuse DF-GAN, a simple and efficient text-to-image generation framework, with the AraBERT architecture. To achieve this, we first create new datasets suited to the Arabic text-to-image generation task by translating the text descriptions of the original datasets from English to Arabic with the DeepL translator. Secondly, we leverage the power of AraBERT, which is trained on billions of Arabic words, to produce strong sentence embeddings, and we reduce the dimension of each embedding to match the input shape expected by DF-GAN. Thirdly, we inject the reduced sentence embedding into the UPBlocks of DF-GAN and train the proposed architecture on two challenging datasets. Following previous work, we use CUB and Oxford-102 Flowers as the original datasets, and we evaluate our framework with FID and IS. Our framework is the first to achieve substantial success in generating high-resolution, realistic, text-matching images conditioned on Arabic text.
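As an illustration of the embedding step described above, here is a minimal sketch, assuming the public AraBERT v2 checkpoint (aubmindlab/bert-base-arabertv2): the caption is encoded with AraBERT, mean-pooled into a sentence vector, and projected down with a linear layer. The 256-dimensional target and the mean-pooling strategy are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Sketch: AraBERT sentence embedding reduced to a DF-GAN-sized
# conditioning vector. The 256-d target and mean pooling are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2")
arabert = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv2")
projection = torch.nn.Linear(arabert.config.hidden_size, 256)  # 768 -> 256 (assumed)

def sentence_embedding(text: str) -> torch.Tensor:
    """Encode an Arabic caption into a fixed-size vector for the generator."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = arabert(**inputs).last_hidden_state  # (1, seq_len, 768)
    pooled = hidden.mean(dim=1)                       # mean pooling over tokens
    return projection(pooled)                         # (1, 256), fed to the UPBlocks
```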
Text-to-image (T2I) generation, which involves synthesizing an image from a textual description, has emerged as a popular research topic in computer vision. Meanwhile, transformer-based models such as BERT, GPT-2, and T5 have demonstrated promising results in various natural language processing tasks, including text generation and translation. However, their application to T2I generation remains largely unexplored. A comparative study of BERT, GPT-2, and T5 in T2I generation is therefore of significant value: it sheds light on the strengths and weaknesses of each model and helps identify the most suitable approach for the task. In this paper, we propose three architectures to conduct such a comparative study. We fine-tune these models to produce text vectors and inject the textual information into the DF-GAN generator through affine transformations. We then evaluate the quality and diversity of the generated images, as well as each model's ability to reflect the individual words of a description. Our experiments on the challenging CUB and Oxford-102 Flowers datasets demonstrate that T5 exhibits promising potential for T2I generation: it can generate visually appealing and semantically coherent images from textual descriptions.
• GPT-2, BERT, and T5 all succeed in synthesizing images from textual descriptions, with varying degrees of success.
• BERT allows the model to capture the relationships between words in the input text and generate a more coherent image.
• GPT-2 allows the model to generate plausible images, but their quality is often lower than that of the other models.
• T5 makes the model more flexible; it is the most powerful of the transformer-based models for text-to-image generation.
• The quality of the generated images is still lower than that of other generative models.
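The affine transformation mentioned in the abstract above is the core conditioning mechanism of DF-GAN-style generators: small MLPs predict channel-wise scale and shift parameters from the sentence vector and apply them to the image feature maps. The sketch below illustrates the idea; the layer widths and the class name `Affine` are illustrative, not the exact DF-GAN implementation.

```python
# Sketch of affine text conditioning in a DF-GAN-style generator.
import torch
import torch.nn as nn

class Affine(nn.Module):
    def __init__(self, sent_dim: int, channels: int):
        super().__init__()
        # Two MLPs predict per-channel scale (gamma) and shift (beta)
        # from the sentence embedding; widths are assumptions.
        self.gamma = nn.Sequential(nn.Linear(sent_dim, channels), nn.ReLU(),
                                   nn.Linear(channels, channels))
        self.beta = nn.Sequential(nn.Linear(sent_dim, channels), nn.ReLU(),
                                  nn.Linear(channels, channels))

    def forward(self, feat: torch.Tensor, sent: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; sent: (B, sent_dim) text embedding
        g = self.gamma(sent).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        b = self.beta(sent).unsqueeze(-1).unsqueeze(-1)
        return g * feat + b                               # text-conditioned modulation

# Usage: modulate a 512-channel feature map with a 256-d sentence vector.
affine = Affine(sent_dim=256, channels=512)
out = affine(torch.randn(4, 512, 16, 16), torch.randn(4, 256))
```

Because the same sentence vector can drive such a block at every generator stage, swapping BERT, GPT-2, or T5 embeddings in and out only requires matching `sent_dim`.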
In the fields of Artificial Intelligence (AI) and medicine, chest X-ray images are crucial for diagnosing various diseases. However, training AI models presents challenges, particularly with limited data or cases involving significant pathology or minor anomalies. To address the constraint of limited data, data augmentation has emerged as a popular technique in medical imaging. One promising approach to augmenting chest X-ray images is text-to-image generation, which transforms textual disease descriptions into synthetic images. This technique can rectify class imbalance and enhance the accuracy and reliability of AI models used in medical imaging applications. This study introduces a text-to-image generation architecture based on DF-GAN to augment chest X-ray images and assesses the impact of the augmented data on the performance of two AI models, VGG16 and ResNet50, in a classification task. The experiments are conducted on two challenging datasets, Chest X-rays from Indiana University and NIH Chest X-rays. The findings reveal that integrating text-to-image generated data improves sensitivity by 2.1%, specificity by 1.9%, and AUC by 1.4%, while also mitigating overfitting during training on both datasets. These results underscore the potential of text-to-image generation for bolstering the accuracy and robustness of AI models in medical imaging tasks.
• Proposed text-to-image generation using DF-GAN to augment chest X-ray images.
• Demonstrated the impact of augmented data on VGG16 and ResNet50 in classifying X-rays.
• Evaluated the technique on challenging datasets: Chest X-rays (IU) and NIH Chest X-rays.
• Augmented data improved sensitivity by 2.1% and specificity by 1.9% across datasets.
• Text-to-image generation mitigated overfitting and enhanced accuracy in medical imaging.
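To make the augmentation protocol concrete, here is a minimal sketch, assuming the generated X-rays are saved to disk alongside the real ones and that the task is binary classification (normal vs. pathology). The directory names, hyperparameters, and single-pass training loop are illustrative placeholders, not the study's actual setup.

```python
# Sketch: mix real and DF-GAN-generated X-rays into one training set
# and fine-tune a ResNet50 classifier on the combined data.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)),
                          transforms.Grayscale(num_output_channels=3),
                          transforms.ToTensor()])
real = datasets.ImageFolder("data/real_xrays", transform=tfm)        # assumed layout
synth = datasets.ImageFolder("data/generated_xrays", transform=tfm)  # T2I outputs
loader = DataLoader(ConcatDataset([real, synth]), batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)  # normal vs. pathology (assumed)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass shown; train for several epochs in practice
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```

Sensitivity, specificity, and AUC would then be computed on a held-out set of real images only, so the synthetic data influences training but not evaluation.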