Background
The integration of Text-to-Speech (TTS) and virtual reality (VR) technologies in K-12 education is an emerging trend. However, little is known about how students perceive these technologies and whether they effectively facilitate learning.

Objectives
This study aims to investigate the perception and effectiveness of TTS voices and VR agents in a K-12 classroom setting, with a focus on information recall.

Methods
Using a recent TTS architecture, we developed four different synthetic voices based on 5, 10, 15 and 20 h of training materials. Two experiments were conducted involving students in a K-12 setting. The first experiment examined students' evaluations of TTS voices with varying hours of training material and the impact on information recall. The second experiment assessed the effect of pairing TTS voices with a VR agent on students' perception and recall performance.

Results and Conclusions
Human voices received superior quality ratings over TTS voices within the classroom context. The integration of a VR agent was found to enhance the perception of TTS voices, aligning with existing literature on the positive impact of virtual agents on speech synthesis. However, this incorporation did not translate to improved recall, suggesting that students' focus may have been compromised by the VR agent's novelty and its design limitations.

What is currently known about this topic?
Human-like agents in multimedia learning applications have a positive effect on learning outcomes.
For adult users, it is acceptable that virtual agents communicate with synthetic speech.
Little is known about the impact of speech synthesis on comprehension and learning in primary school children.
The perception of realistic virtual agents in primary school applications remains unknown.

What does this paper add?
VR agents enhance the perceived quality of human and synthetic voices in educational settings.
The presence of VR agents diminishes the perceived differences between human and TTS voices.
While VR enhances voice perception, it may also distract from content, reducing recall.
The lack of full-body movement in static VR agents could reduce their educational effectiveness.

Implications for practice and/or policy
The presence of VR agents does not guarantee an improvement in learning performance.
Enhancements in VR agent design, like full-body expressiveness, could improve learning outcomes.
While VR might make synthetic voices feel more natural, its educational benefits are still uncertain.
Ongoing research is needed to optimize VR technologies for enhanced learning experiences.