JuDifformer: Multimodal fusion model with transformer and diffusion for jujube disease detection

This paper proposes a deep learning model based on multimodal data fusion for detecting jujube tree diseases in desert environments. Because of the complex lighting and environmental conditions in desert areas, existing disease detection methods face significant limitations in feature extraction and accuracy. By fusing image and sensor data, this study designs a feature extraction mechanism that combines transformer and diffusion modules to capture disease features precisely. Experimental results demonstrate that the proposed model outperforms mainstream object detection models and state-of-the-art methods across multiple metrics. Specifically, the model achieves an accuracy of 0.90, precision of 0.93, recall of 0.89, and an mAP of 0.91, all significantly higher than those of the baseline models. Compared to DETR, YOLOv10, EfficientDet, and others, the proposed method not only converges faster but also achieves superior final performance, and it likewise exhibits better detection performance and robustness than recent methods. These results highlight the practical value of the proposed model for disease detection under challenging environmental conditions, demonstrating its capability to handle low-light and high-dust scenarios effectively while maintaining high detection accuracy and robustness. The findings confirm the model's potential to improve disease monitoring efficiency in large-scale agricultural applications. Future work will further optimize the model's real-time performance and lightweight design to adapt it to more real-world scenarios.
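To make the image-plus-sensor fusion concrete, the toy sketch below weights each modality's feature vector by a softmax over its mean activation and concatenates the results. This is a minimal illustration only: the function names and the mean-activation weighting are assumptions for the example, not the paper's actual transformer-diffusion fusion mechanism.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(img_feat, sensor_feat):
    """Toy late fusion of an image feature vector and a sensor feature
    vector (illustrative stand-in for the transformer-diffusion fusion):
    each modality is scaled by a softmax weight derived from its mean
    activation, then the weighted vectors are concatenated."""
    means = [sum(img_feat) / len(img_feat),
             sum(sensor_feat) / len(sensor_feat)]
    w_img, w_sen = softmax(means)
    return [w_img * v for v in img_feat] + [w_sen * v for v in sensor_feat]


# Example: a 4-dim image feature and a 2-dim sensor feature
# (e.g. temperature, humidity) yield a 6-dim fused vector.
fused = fuse_features([0.2, 1.1, 0.7, 0.4], [0.9, 0.3])
```

In the real model, the modality weights would be learned (e.g. via attention) rather than derived from mean activations, but the data flow, i.e. per-modality encoding followed by weighted combination, is the same.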