Data availability and quality are crucial for the development of semantic segmentation techniques. However, creating high-quality fire scene datasets in a safe and efficient manner remains an unsolved challenge. To fill this gap, we introduce FireDM, the first method to generate unlimited fire segmentation datasets at virtually no cost. FireDM combines the strengths of pre-trained diffusion models (Stable Diffusion XL 1.0 and Stable Diffusion 2.1) with text-guided diffusion driven by ChatGPT4-Fire to generate multi-scale, detail-rich fire images. The innovative fire-decoder module in FireDM then efficiently converts the cross-attention and multi-scale feature maps obtained during diffusion into accurate segmentation masks. This process requires only about 100 images and their corresponding segmentation masks for training. In our experiments, we trained segmentation algorithms on the large-scale dataset generated by FireDM and on all publicly available fire segmentation datasets, respectively, and found that the algorithms trained on the former outperformed those trained on the latter by at least 5% in IoU, accuracy, F1-score and AP. This demonstrates the capability of FireDM to expand a limited fire segmentation dataset. Additionally, the datasets generated by FireDM offer multiple image resolutions that can match the input sizes of different segmentation algorithms, significantly reducing the information loss caused by resizing images (e.g., cropping and scaling). Finally, we have created the world's first high-quality fire segmentation dataset benchmark using FireDM. The complete code and dataset of FireDM are publicly available at https://github.com/ZhengHongtao2001/FireDM.
•Low-cost fire diffusion model FireDM for endless generation of segmentation datasets.
•An LLM generates textual descriptions to prompt generation of multi-scenario fire datasets.
•Dual diffusion architecture (SD 2.1 and SDXL 1.0) generates multi-resolution datasets.
•The perception mask module Fire-Decoder accurately generates flame masks.
•The first benchmark of fire segmentation datasets was created using FireDM.
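The FireDM abstract describes converting cross-attention and multi-scale feature maps from the diffusion process into segmentation masks. As a rough illustration of that idea (a hypothetical sketch, not the paper's actual Fire-Decoder), the function below upsamples attention maps of different resolutions to a common grid, averages them, and thresholds the result:

```python
import numpy as np

def attention_to_mask(attn_maps, out_size=64, threshold=0.5):
    """Toy sketch: fuse multi-scale cross-attention maps into a binary mask.

    attn_maps: list of square 2D arrays at different resolutions, values in [0, 1].
    Each map is upsampled to out_size x out_size by nearest-neighbour repetition,
    the upsampled maps are averaged, and the average is thresholded.
    """
    fused = np.zeros((out_size, out_size))
    for m in attn_maps:
        scale = out_size // m.shape[0]
        up = np.kron(m, np.ones((scale, scale)))  # nearest-neighbour upsample
        fused += up
    fused /= len(attn_maps)
    return (fused > threshold).astype(np.uint8)
```

A learned decoder would replace the fixed average and threshold with trainable layers; the sketch only shows the fuse-then-binarize structure.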
Text-to-image generative models can produce diverse, high-quality images of concepts from a text prompt, and have demonstrated excellent ability in image generation, image translation, etc. In this work, we study the problem of synthesizing instantiations of a user's own concepts in a never-ending manner, i.e., "create your world", where new concepts from the user are quickly learned with a few examples. To achieve this goal, we propose a Lifelong text-to-image Diffusion Model (L²DM), which intends to overcome knowledge "catastrophic forgetting" for past encountered concepts, and semantic "catastrophic neglecting" for one or more concepts in the text prompt. With respect to knowledge "catastrophic forgetting", our L²DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module, which respectively safeguard the knowledge of prior concepts and of each past personalized concept. When generating images from a user text prompt, the solution to semantic "catastrophic neglecting" is that a concept attention artist module alleviates semantic neglecting from the concept aspect, and an orthogonal attention module reduces semantic binding from the attribute aspect. In the end, our model can generate more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics, compared with related state-of-the-art models. The code will be released at https://wenqiliang.github.io/ .
Rooftop photovoltaic (PV) segmentation based on remote sensing images is widely applied in solar potential assessment and prediction. Still, such methods often suffer from limited PV datasets, poor robustness, and weak generalizability. General generative AI, which eliminates the need for pre-training, is emerging as a way to improve the sample diversity and the robustness and generalizability of segmentation algorithms. This paper designs a PV sample generation method based on a generative model, which leverages a text-guided stable diffusion inpainting model to augment the PV dataset and generate massive multi-background rooftop PV panel samples. The real and generated samples are mixed in different proportions to form new training sets for ablation experiments. Results show that a small amount of real data mixed with generated data can reach high relative IoU and precision values. In small-sample learning, the generated data achieves effects similar to real data during segmentation, even better than training without generated data. The experiments demonstrate that the generated datasets outperform traditionally augmented data and that manually written text prompts perform more accurately than ChatGPT-generated ones. This study highlights the efficiency and robustness of generated datasets in PV segmentation tasks and moves beyond the constraints of remote sensing data acquisition and limited data diversity. Further, it would facilitate large-scale assessments of urban PV potential for urban planners and policymakers using an efficient and low-cost method.
•This study leverages a text-guided stable diffusion inpainting model to augment the PV dataset and improve data diversity.
•It shows that a small amount of real data mixed with generated data can reach high relative precision and robustness.
•General generative AI-based data augmentation reaches higher accuracy than conventional data augmentation.
•Manual text prompts perform better than the ChatGPT-generated versions.
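The ablation setup above mixes real and generated samples in different proportions to form new training sets. A minimal sketch of such mixing, with hypothetical sample identifiers standing in for images (the function name and signature are illustrative, not from the paper):

```python
import random

def mix_training_set(real, generated, real_fraction, total, seed=0):
    """Toy sketch of the ablation setup: draw a training set of `total`
    samples containing a given fraction of real images, the rest generated.
    `real` and `generated` are lists of sample identifiers."""
    rng = random.Random(seed)  # fixed seed so an ablation run is repeatable
    n_real = round(total * real_fraction)
    n_gen = total - n_real
    return rng.sample(real, n_real) + rng.sample(generated, n_gen)
```

Sweeping `real_fraction` from 0 to 1 reproduces the kind of real/generated ratio ablation the abstract describes.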
•The proposed scheme is the first work on ambiguity attacks against SDM watermarking.
•The implementation of the scheme requires only a small cost.
•Our method does not need the original training data.
•The proposed method is effective against various SDM watermarking schemes.
In recent years, text-to-image diffusion models have achieved excellent performance, and among them stable diffusion models (SDMs) have become some of the most widely used. Scholars have proposed many model watermarking techniques to protect the copyright of text-to-image diffusion models. To measure the security and potential risks of existing text-to-image diffusion model watermarking techniques, this paper proposes, for the first time, an ambiguity attack against text-to-image diffusion model watermarking. Specifically, taking SDMs as an example, we exploit the reversibility of model watermarking and combine the ideas of adversarial examples and discrete prompt optimization to re-embed a forged watermark into watermarked SDMs, thereby confounding the watermark that carries the copyright information. Extensive experiments show that our ambiguity attack is effective and can make the original watermark lose its uniqueness without changing the watermarked text-to-image diffusion models.
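The attack combines adversarial examples with discrete prompt optimization. The toy below illustrates only the discrete-search ingredient: a greedy token-swap loop that maximizes a black-box score, here a stand-in for a forged-watermark detector's response. This is an illustrative sketch under those assumptions, not the paper's attack:

```python
import random

def greedy_prompt_search(score_fn, vocab, length=4, iters=50, seed=0):
    """Toy discrete prompt optimization: start from a random token sequence
    and greedily accept single-token swaps that increase a black-box score."""
    rng = random.Random(seed)
    prompt = [rng.choice(vocab) for _ in range(length)]
    best = score_fn(prompt)
    for _ in range(iters):
        i = rng.randrange(length)
        cand = prompt.copy()
        cand[i] = rng.choice(vocab)  # propose one token swap
        s = score_fn(cand)
        if s > best:                 # keep only improving swaps
            prompt, best = cand, s
    return prompt, best
```

A real attack would score prompts by running the watermarked model and a detector; gradient-based relaxations of this search are also common.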
Casual users nowadays can create almost arbitrary image content by providing textual prompts to generative machine-learning models. These models rapidly improve image quality with each new generation, providing means to create photos, paintings in different styles, and even videos. One feature of such models is the ability to take an image as input and adjust its content according to a prompt. Visual obfuscation of content can be achieved for static images and videos by slightly changing persons, text, and other objects. This technique can be applied in eye-tracking experiments for post-hoc dissemination of analysis results and visualizations. In this work, we discuss how the technique could serve to anonymize stimuli (e.g., for double-blind reviews or to remove product placements) and protect the privacy of people visible in the stimuli. We further investigate how this anonymization process influences visual saliency and the depiction of stimuli in visualization techniques. Our results show that slight image transformations do not drastically change the saliency of a scene but obfuscate objects and faces while keeping important image structures for context.
•Investigation of parameters on different stimuli.
•Influence of altered image content on a saliency model.
•Influence of altered image content on visualization techniques.
•A discussion of important aspects for future applications.
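One common way to quantify the claim that slight image transformations "do not drastically change the saliency of a scene" is to correlate saliency maps computed before and after obfuscation. A minimal sketch, assuming the two maps are given as flat lists of equal length (the paper's actual metric is not specified here):

```python
import math

def saliency_correlation(a, b):
    """Pearson correlation between two flattened saliency maps: values near 1
    mean the edit barely changed predicted saliency, near 0 or below mean
    the attention pattern shifted substantially."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```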
•Integrated multi-source satellite data and stochastic modeling for Lower Yellow River planform evolution.
•Forecasted long-term flood-period channel adjustments to reservoir operations using AI methods.
•Quantified spatially differentiated channel migration dynamics, informing adaptive management.
•Developed a generalizable methodological framework for studying human-nature fluvial dynamics.
•Identified priority locations for channel training works based on morphological evolution predictions.
This study investigates the stable diffusion-evolution tendencies of the Lower Yellow River (LYR) corridor during floods in the 21st century following the Xiaolangdi Reservoir's operation. While previous studies have assessed short-term, cross-sectional channel adjustments in the LYR, our understanding remains limited regarding the long-term, centennial-scale planform morphological responses to the reservoir's operation, as field data are limited and multi-spatiotemporal information is difficult to obtain. We integrate multi-source satellite data to extract typical river morphology information. A stochastic channel-forming relationship model based on hydraulic geometry theory links water–sediment inputs with channel adjustments. The data-model integration effectively captures the dynamics of key channel morphology variables. Long-term forecasts of flood-period flows and incoming sediment loads are generated using a Gradient-Boosted Long Short-Term Memory (LSTM) machine learning approach. The channel centerline evolution is numerically simulated by incorporating the forecasted inputs into a physically based meander migration model and representatively analyzed through clustering and gradient descent regression after Monte Carlo simulation. The spatially differentiated analysis and quantification of channel evolution dynamics fill existing knowledge gaps by providing important insights for developing adaptive channel maintenance and corridor management strategies. The integrated methodologies advance understanding of river channel dynamics under complex human-nature interactions in large plains watersheds.
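The stochastic channel-forming relationship rests on hydraulic geometry theory, in which channel width scales as a power of discharge, and the paper propagates forecast uncertainty with Monte Carlo simulation. A toy sketch of both steps follows; the power-law coefficients and the lognormal discharge distribution are illustrative assumptions, not values fitted to LYR data:

```python
import random
import statistics

def channel_width(q, a=4.0, b=0.5):
    """Hydraulic-geometry relation: width = a * Q**b (coefficients illustrative)."""
    return a * q ** b

def monte_carlo_widths(mean_log_q, sd_log_q, n=1000, seed=0):
    """Propagate discharge uncertainty through the width relation by sampling
    Q from a lognormal distribution, as a stand-in for forecast spread."""
    rng = random.Random(seed)
    return [channel_width(rng.lognormvariate(mean_log_q, sd_log_q))
            for _ in range(n)]
```

Summaries such as the median and quantiles of the simulated widths then characterize the range of plausible channel adjustments.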
•Using Stable Diffusion artificial images to augment real datasets boosts detection, even when used alone.
•Implementing a structured workflow enables Stable Diffusion to generate high-quality artificial weed images.
•The augmentation of weed datasets with artificial images enhances weed detection CNNs' performance.
•The one-stage object detectors YOLOv8 and RetinaNet were trained on real and artificially generated weed images.
Weeds challenge crops by competing for resources and spreading diseases, impacting crop yield and quality. Effective weed detection can enhance herbicide application, thus reducing environmental and health risks. A major challenge in Site-Specific Weed Management (SSWM) is developing a reliable weed identification system, especially given the diversity of weeds and the similarity between certain weeds and crops during early growth stages. Image-based deep learning (DL) methods have become vital for weed classification. However, accurate weed classification and detection using DL techniques face the bottleneck of requiring large labeled datasets. Furthermore, labeling such extensive data is a time-consuming and tedious task that also requires weed science experts. This research's central focus is to present a novel approach to weed detection using convolutional neural network (CNN) detectors, specifically YOLOv8l and RetinaNet, augmented with Stable Diffusion data, i.e., artificial weed images. Stable Diffusion enhanced the training data, increasing the detectors' adaptability. The study targeted specific weeds (Solanum nigrum L.; Portulaca oleracea L.; Setaria verticillata L.) found in tomato crops, using a limited number of real images (30 samples) to produce artificial training images for the CNNs. All validation and test sets comprise real weed images. Results showed high performance when using only artificial images in terms of Mean Average Precision (mAP). In isolated conditions (0.91 mAP), i.e., only one weed species per image, an average performance gain of about 3% in all tests is obtained. When adding the artificial images to the real ones (mixed dataset), a mAP of 0.99 is obtained. In contrast, results using only artificial images reached 0.81 mAP when detecting more than a single weed species. However, when training the CNNs with a mixed dataset, a 6%-9% performance gain was achieved in all cases.
A mAP of up to 0.93 was achieved in the most challenging conditions, where weed species could overlap. The results indicate that the proposed approach outperformed existing methods, such as Generative Adversarial Networks (GANs), in terms of mAP. Furthermore, the YOLOv8l model distinctly emerged as the most favorable option for real-time detection systems considering Frame Detection Speed (FDS). Specifically, the YOLOv8l model registered an FDS of 10.2 ms, considerably faster than the 21.2 ms exhibited by the RetinaNet model. Additionally, the method is versatile and applicable to various crops and weed species, thereby enhancing automated weed management systems. This research illustrates that Stable Diffusion can efficiently expand small image sets, significantly reducing field imaging. The study offers valuable insights for future SSWM efforts utilizing artificially generated images for weed detection and classification.
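The mAP figures above rest on an Intersection-over-Union overlap criterion between predicted and ground-truth boxes. For reference, a minimal IoU computation for axis-aligned boxes given as (x1, y1, x2, y2) tuples (the representation is an assumption for illustration):

```python
def box_iou(a, b):
    """Intersection-over-Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

A detection typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5, and mAP averages precision over recall levels and classes under that rule.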
Machine-vision-based defect detection for large metal stampings is a fundamental requirement for improving product quality and inspection speed. However, the performance of machine vision in detecting defects is limited by the large variety of stamped products, significant dimensional differences, and long data-acquisition cycles. To overcome these problems, metal stamping defect detection has been achieved using an improved YOLOv5 model, which features a slim neck and a multi-headed self-attention mechanism for detecting wrinkles, holes, and cracks in metal stampings. An image-processing module was used to reduce the impact of metal reflections and improve image quality. A stable diffusion improvement was used to augment the dataset, overcoming the small dataset size and enhancing generalisation capability. In addition, a BotNet structure with a self-attention mechanism was introduced into the YOLOv5 backbone to improve image feature extraction. We then optimised the prediction head structure of YOLOv5 to improve detection speed and accuracy. Ablation experiments were performed to analyse and verify the effectiveness of each module. Their results show that with our proposed stable diffusion data enhancement method, the YOLO-Bot-VOV algorithm reached 98.2% mAP on metal stamped part defect detection, and the parameters were reduced by 0.432 million.
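The image-processing module that reduces metal reflections is not specified in detail above. As one illustrative possibility (an assumption, not the paper's method), specular highlights can be clipped at a cap and the remaining intensity range rescaled:

```python
def suppress_highlights(pixels, cap=200):
    """Hypothetical reflection-suppression step: clip specular highlights
    above `cap`, then rescale the clipped values back to the full 0-255 range.
    `pixels` is a flat list of grayscale intensities."""
    clipped = [min(p, cap) for p in pixels]
    lo, hi = min(clipped), max(clipped)
    if hi == lo:
        return [0 for _ in clipped]  # flat image: nothing to rescale
    return [round(255 * (p - lo) / (hi - lo)) for p in clipped]
```

Real pipelines more often use techniques such as polarization filtering or adaptive histogram equalization; this sketch only conveys the clip-and-rescale idea.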