MACHINE VISION MODULE FOR OBJECT DETECTION IN IMAGES AND VIDEO STREAMS

Viktorija Smolij; Natan Smolij

doi:10.31548/itees.2025.02.009

Authors

Smolij Viktorija, National University of Life and Environmental Sciences of Ukraine
Smolij Natan, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

DOI:

https://doi.org/10.31548/itees.2025.02.009

Keywords:

Object Detection, Image Segmentation, Detection Model, Model Pre-Training, Augmentation Procedures, Fine-Tuning, Binary Object Masks

Abstract

This work addresses the development and application of intelligent computer vision methods for the automated detection and segmentation of protective equipment in images and video streams. The aim is to identify objects of a specified type in images captured by a video camera and to develop a detection model that identifies and localises these objects reliably under varying lighting conditions, scales and perspectives. The article examines data preparation, in particular the use of augmentation to make the sample more representative, and compares the parameters, performance and results of the SAM and YOLO models. A specialised dataset of protective equipment items was created for the image segmentation task. During model pre-training, data augmentation (mirroring, rotation, scaling and brightness adjustment) was applied to this dataset, which significantly increased the diversity of the training examples. The experimental results show that increasing the volume and diversity of the dataset improves data balance and the generalisation ability of the computer vision models. They also confirm the feasibility and effectiveness of the proposed approach, namely training separate models for image segmentation and classification, in the context of automated image processing. Prospects for further research include expanding the dataset, optimising the computational complexity of the models, and investigating their real-time application in video analytics.
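The four augmentations named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, images are modelled as small nested lists of grayscale values, rotation is limited to 90 degrees, and scaling is an integer nearest-neighbour upscale, all chosen only to keep the example self-contained. The point it shows is that geometric transforms have to be applied to the image and its binary object mask together, while a brightness change leaves the mask untouched.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image, pixel values 0..255 (assumed layout)
Mask = List[List[int]]   # binary object mask, values 0 or 1


def mirror(grid: List[List[int]]) -> List[List[int]]:
    """Flip horizontally (left-right)."""
    return [row[::-1] for row in grid]


def rotate90(grid: List[List[int]]) -> List[List[int]]:
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]


def scale(grid: List[List[int]], factor: int) -> List[List[int]]:
    """Upscale by an integer factor with nearest-neighbour sampling."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def adjust_brightness(image: Image, gain: float) -> Image:
    """Multiply pixel values by gain and clip the result to 0..255."""
    return [[max(0, min(255, round(p * gain))) for p in row] for row in image]


def augment(image: Image, mask: Mask) -> List[Tuple[Image, Mask]]:
    """Return the original pair plus one variant per augmentation."""
    return [
        (image, mask),
        (mirror(image), mirror(mask)),        # geometric: transform both
        (rotate90(image), rotate90(mask)),    # geometric: transform both
        (scale(image, 2), scale(mask, 2)),    # geometric: transform both
        (adjust_brightness(image, 1.5), mask),  # photometric: mask unchanged
    ]
```

Calling `augment(image, mask)` on one annotated sample yields five image and mask pairs. In practice, a detection framework's built-in augmentation settings would normally replace hand-written transforms like these.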

Author Biographies

Smolij Viktorija, National University of Life and Environmental Sciences of Ukraine

Doctor of Technical Sciences, Professor, Department of Information Systems and Technologies,
National University of Life and Environmental Sciences of Ukraine
Smolij Natan, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Postgraduate Student of the specialty "Information Systems and Technologies",
National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”
