A SYSTEM FOR COMPREHENSIVE ANALYSIS OF COMMUNICATIVE BEHAVIOR

Authors

  • Rudensky Roman, National University of Life and Environmental Sciences of Ukraine
  • Kravchenko Volodymyr, National University of Life and Environmental Sciences of Ukraine

Keywords:

Speaker Diarization, Automatic Speech Recognition, Artificial Intelligence, Public Discussions, Analysis of Communicative Behavior

Abstract

The relevance of this study is driven by the increasing volume of online meetings and public discussions in digital formats, creating a demand for automated tools to analyze group communication. Traditional manual coding and transcription methods are highly labor-intensive and subjective, limiting large-scale research on communication patterns. The aim of this research is to develop and validate a comprehensive system for automated analysis of communicative behavior that integrates modern speaker diarization technologies, automatic speech recognition, and statistical analysis to provide a detailed picture of group dynamics in public discussions.

Methods. The system is implemented using a microservice architecture with Python 3.10+, FastAPI, and React. Speaker diarization is performed using the pyannote.audio algorithm, which combines convolutional encoders with pre-trained WavLM models. Automatic speech recognition is carried out using transformer architectures (Whisper, AssemblyAI, Conformer). Communicative behavior analysis includes calculation of activity statistics, network analysis of interactions, and assessment of communication style.

Results. The developed system successfully integrates speaker diarization with 0.5-second precision, automatic transcription, and multidimensional analysis of communication patterns. The modular architecture ensures flexibility for adaptation to various application domains. The system generates detailed timestamps of participant activity, visualizes speaking time distribution, and provides comprehensive analytics to improve decision-making processes.

Prospects. Further development of the system includes integration of multimodal analysis considering non-verbal communication, improvement of robustness in noisy conditions, domain adaptation for specific sectors, and implementation of real-time analysis of live discussions. The system opens new opportunities for studying group dynamics in corporate, educational, and governmental sectors.
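The activity-statistics and interaction-network steps described in Methods can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `segments` tuples stand in for diarization output (such as the `(start, end, speaker)` turns produced by pyannote.audio), and the function names are hypothetical.

```python
from collections import defaultdict

def speaking_time_stats(segments):
    """Aggregate per-speaker talk time, turn counts, and speaking share
    from (start_s, end_s, speaker) diarization segments."""
    totals = defaultdict(float)
    turns = defaultdict(int)
    for start, end, speaker in segments:
        totals[speaker] += end - start
        turns[speaker] += 1
    grand_total = sum(totals.values()) or 1.0
    return {
        spk: {
            "talk_time_s": round(totals[spk], 1),
            "turns": turns[spk],
            "share": round(totals[spk] / grand_total, 3),
        }
        for spk in totals
    }

def interaction_edges(segments):
    """Count speaker transitions (A followed by B) as directed edges
    of a simple interaction network."""
    edges = defaultdict(int)
    ordered = sorted(segments)
    for (_, _, a), (_, _, b) in zip(ordered, ordered[1:]):
        if a != b:
            edges[(a, b)] += 1
    return dict(edges)

# Hypothetical diarization output at 0.5 s precision
segments = [(0.0, 12.5, "SPK_00"), (12.5, 20.0, "SPK_01"), (20.0, 27.5, "SPK_00")]
stats = speaking_time_stats(segments)
edges = interaction_edges(segments)
```

From such per-speaker shares the system's speaking-time visualization follows directly, and the edge counts can feed a graph library for the network analysis of interactions.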

References

1. Bredin, H., Yin, R., Coria, J. M., Gelly, G., Korshunov, P., Lavechin, M., Fustes, D., Titeux, H., Bouaziz, W., & Gill, M.-P. (2020). pyannote.audio: Neural building blocks for speaker diarization. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7124–7128). IEEE. https://doi.org/10.48550/arXiv.1911.01255.

2. Bredin, H. (2023). pyannote.audio 2.1 speaker diarization pipeline: Principle, benchmark, and recipe. In Proc. Interspeech 2023. International Speech Communication Association. https://doi.org/10.21437/Interspeech.2023-1294.

3. Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 28492–28518). PMLR. https://proceedings.mlr.press/v202/radford23a.html.

4. Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., & Pang, R. (2020). Conformer: Convolution-augmented Transformer for speech recognition. In Proceedings of Interspeech 2020 (pp. 5036–5040). ISCA. https://doi.org/10.21437/Interspeech.2020-3015.

5. Ao, J., Wang, R., Zhou, L., Wang, C., Ren, S., Wu, Y., Liu, S., Ko, T., Li, Q., Zhang, Y., Wei, Z., Qian, Y., Li, J., & Wei, F. (2022). SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 5723–5738). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.393.

6. Qian, K., Zhang, Y., Chang, S., Yang, X., & Hasegawa-Johnson, M. (2019). AutoVC: Zero-shot voice style transfer with only autoencoder loss. In Proceedings of the 36th International Conference on Machine Learning (pp. 5210–5219). PMLR. https://doi.org/10.48550/arXiv.1905.05879.

Published

2026-02-02
