A Review of Cross-Platform Document File Reader Using Speech Synthesis

  • Ogenyi Fabian Chukwudi
  • Val Hyginus Udoka Eze
  • Ugwu Chinyere
Keywords: Cross-Platform, Document and File, Reader, Speech and Synthesis, Text

Abstract

Document files store documents on storage devices, primarily for use on computers. Software is used to view these files and display their text content legibly. However, programs are also needed that transform electronic files into versions usable by people with specific disabilities. This paper reviewed fifteen published articles in the field of document file reading. The review showed that researchers have made various attempts to develop software capable of converting text-based document files to an audio format. Text can now be readily converted into natural-sounding speech across many platforms using different software. The systematic review also found that AI approaches such as the GPT-3.5 and GPT-4 Turbo Large Language Model (LLM) technologies performed best, because they not only produce a voice similar to a human's but also translate between different languages. In conclusion, cross-platform document file reader (text-to-speech) synthesis has improved user experiences in a variety of applications such as language learning, audiobooks and virtual assistants.

Author Biographies

Ogenyi Fabian Chukwudi

Department of Publication and Extension, Kampala International University, Uganda.

Val Hyginus Udoka Eze

Department of Publication and Extension, Kampala International University, Uganda.

Ugwu Chinyere

Department of Publication and Extension, Kampala International University, Uganda.

This is an open access article, licensed under CC-BY-SA

Published
2023-12-27
How to Cite
[1]
O. F. Chukwudi, V. H. U. Eze, and U. Chinyere, “A Review of Cross-Platform Document File Reader Using Speech Synthesis”, International Journal of Artificial Intelligence, vol. 10, no. 2, pp. 103-110, Dec. 2023.
Section
Articles
