1 Introduction

The traditional classroom setting has been the predominant mode of education delivery for many years. However, this approach has limitations as teachers often struggle to provide personalized learning experiences that cater to each student’s unique needs. This can result in disengagement and demotivation among students, particularly when they do not receive immediate support to overcome learning challenges (Akyuz, 2020). In the twenty-first century, the education landscape is rapidly evolving, and educators are exploring new and innovative ways to engage students.

1.1 Current landscape

Robot-assisted teaching has become increasingly popular in recent years, with many schools and universities adopting this technology to enhance the learning experience. Robots can be used in various educational settings, from traditional classrooms to online learning environments. Example applications include language learning (Wu et al., 2015), STEM education (Ahmad et al., 2020), and special education (Amanatiadis et al., 2017). Additionally, robots can provide personalized learning experiences, assist with grading and assessment, and promote student engagement and motivation.

1.2 Advantages

One of the main advantages of using robots in education is their ability to provide personalized learning experiences (Alam, 2021). Robots can adapt to students’ needs and learning styles, providing tailored instruction and feedback (Kanda et al., 2004). Robots can also provide a sense of companionship and emotional support to students (Belpaeme et al., 2018), which can be especially beneficial for those who have special needs or struggle with social interactions. Finally, robots can provide a fun and engaging learning experience that promotes students’ motivation and interest in learning (Hong et al., 2016).

1.3 Challenges

However, existing research has highlighted the insufficient social and affective skills of current robots (Cooper et al., 2020). While these robots can perform various tasks and provide emotional feedback by expressing happiness, satisfaction, or disappointment, they lack empathy (Bourguet et al., 2020). Studies have shown that the socially supportive behaviour of humanoid robots can contribute more to students’ learning engagement than knowledge transfer (Saerbeck et al., 2010), and further improve students’ learning ability, intrinsic motivation, task motivation, and attention through nonverbal elements (Donnermann et al., 2021).

Extensive research efforts have been devoted to the effectiveness of humanoid robot-assisted learning in fostering student interaction (Yang et al., 2022). However, most of these studies focused on knowledge transfer, such as programming (Kozma, 2000) or leadership development (Morgan et al., 2019). The design of robot-assisted learning can also affect the way students interact with the robots (Zipke, 2017). Yet the design features that are critical for interaction, such as mobility, gesture and animacy, have not been fully investigated. This study aims to fill this gap by designing an app for a robot with humanoid elements to engage primary school students in learning Traditional Chinese. Specifically, the study serves as a probe to explore humanoid robot learning in both functional and affective dimensions by addressing the following research questions:

  • RQ1: To what extent can a humanoid robot impact students’ learning engagement and motivation?

  • RQ2: How does a humanoid robot affect students’ learning proficiency?

To answer the two research questions, we ran a five-day training programme with 64 students, comprising a control group learning in the traditional classroom (N=33) and an experimental group learning with humanoid robots (N=31). The study utilized a triangulation process, including questionnaires, observations and language proficiency tests, to ensure the rigour and validity of the findings. The results showed that the experimental group significantly improved their behavioural engagement, emotional engagement, cognitive engagement and intrinsic motivation. In contrast, the control group demonstrated only a small improvement in cognitive engagement.

The paper is organized as follows. Section 2 reviews related work on humanoid robots and student engagement. Section 3 discusses self-determination theory. Section 4 introduces the language learning system, including the humanoid robot, pedagogy, game user interfaces (UIs), system, and interaction and motivation design. Section 5 explains the experiment, including (1) participants, procedures, and regulations, and (2) data analysis and key implications. Section 6 discusses the three engagements, intrinsic motivation, HRI and the limitations of this study. Section 7 outlines future directions.

2 Related works

In recent years, the integration of robot-empowered learning has shown potential for enhancing student engagement and motivation. Social robots in primary school classrooms have been found to maintain students’ attention (Kennedy et al., 2016). Additionally, robot tutors have been shown to improve learning outcomes and increase student engagement (Belpaeme et al., 2018). By demonstrating empathetic behaviour and offering emotional support, robots enhance students’ engagement (Kory-Westlund & Breazeal, 2019). Robot-empowered learning can contribute to equitable education, ensuring that all students can benefit from the innovative potential of humanoid robots in the classroom. In the following, we discuss related work on robot-empowered learning, especially the impact of humanoid design features on students’ engagement, motivation, and learning proficiency.

2.1 Robot-empowered learning

Robot-empowered learning integrates robotic technologies into education to enhance teaching and learning (Eguchi, 2016). It leverages artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) to facilitate knowledge acquisition and promote student engagement (Chen et al., 2023). Robot-empowered learning aims to create interactive and personalized learning experiences that cater to diverse student needs.

The applications of robot-empowered learning in education are vast and varied. In STEM education, robots can be used as hands-on learning tools to teach coding, programming, and problem-solving skills, engaging students in complex concepts and encouraging critical thinking (Sen et al., 2021). Additionally, robots can be employed as peer tutors or teaching assistants, providing support, feedback and guidance to students based on their learning needs and progress (Alam, 2022). Furthermore, robot-empowered learning can be particularly beneficial for students with special education needs, as robots can be designed to offer tailored support and therapeutic interventions, fostering an accessible learning environment (Papakostas et al., 2021). Robot-empowered learning has the potential to revolutionize education by enhancing teaching methodologies, promoting student engagement, and fostering equitable and inclusive learning experiences.

In language learning, robots can serve as conversation partners to help students practice speaking and listening skills in a supportive and non-judgmental environment (W. Huang et al., 2022). As shown in Table 1, robots are suitable for language instruction, offering repeatability, flexibility, and interaction. Chang et al. (2010) discussed the use of robots to facilitate the teaching of a second language (i.e., English) in different scenarios, such as storytelling, oral reading, cheerleader, action command, and question-and-answer modes. They found that students actively participated in the learning activities because they wanted to see the robot dance for them. The robot also encouraged students to practice language skills naturally. Furthermore, lower-achieving students benefited from the robot’s anthropomorphic features.

Table 1 Existing robot language learning tools. The acronyms of L1, L2 and Lan are the first language, second language and language, respectively

Numerous researchers have studied the impact of robot-empowered learning on language acquisition and motivation. Tanaka and Matsuzoe (2012) designed an English word-learning robot for young Japanese children. They found that children who taught the robot learned words better than those without robots. Hsiao et al. (2015) developed a robot, iRobiQ, with multimedia content, to encourage kindergarteners to read, speak, and answer questions in their mother tongue, Mandarin. They found that children in the robot condition improved more than the tablet-assisted children. Wang et al. (2013) designed a robot as a learning companion for Taiwanese children learning to speak English. Children learning with the robots had higher motivation and engagement than those without robots. However, the majority of language learning tools focused on second-language learning, i.e., English, rather than the mother tongue. Robot-empowered learning for language acquisition (i.e., vocabulary, short sentences and sentence expansion) in Traditional Chinese remains under-examined.

2.2 Humanoid design features

Humanoid design features are essential to learning motivation and engagement, as they play a crucial role in creating a meaningful and immersive learning experience. Humanoid robots are designed to interact and communicate with people by replicating human-to-human interactions (den Berghe et al., 2019) and utilizing common behaviours, i.e., design features (Bartneck & Forlizzi, 2004), to make them more relatable and engaging for learners. Unlike animated characters, which offer learners only limited interaction through computer screens, robots are tangible machines that exist in the same physical space as humans, giving them greater potential in educational settings (Leyzberg et al., 2012). Incorporating humanoid design features, such as gesture, mobility, and animacy, can enhance language acquisition and engagement during human-robot interactions (HRI) (Kennedy et al., 2016); these features are discussed below.

Gesture refers to the expression made by humanoid robots to convey a message or emotion, such as hand movements (Skinner & Belmont, 1993). Gesture is essential in human cognition and helps engage and motivate students (Liu et al., 2017). Incorporating gestures in robot-assisted learning can facilitate effective learning and engagement (C.-M. Huang & Mutlu, 2013). Specific gestures, such as thumbs up, clapping, nodding, smiling, and eye contact, can emphasize positive reinforcement and engagement (Maloney et al., 2020).

Vogt et al. (2019) argued that the robot’s use of gestures did not result in increased learning outcomes. However, Van Dijk et al. (2013) found that learners who interacted with robots that used gestures had better learning outcomes than those who did not. In addition, de Wit et al. (2020) found that engagement between children and robots was higher when gestures were present. One explanation is that more bodily movement can lead the robot to be perceived as more friendly and human-like (Asselborn et al., 2017), resulting in a higher level of engagement as children enjoy the interaction more.

Mobility refers to the ability of humanoid robots to move around (Jung et al., 2018), including physical movement as well as the ability to perform actions and respond to stimuli in real time. The mobility of humanoid robots can attract students’ attention (Page et al., 2021) for several reasons, including novelty, interactivity, personalization, and social presence. A movable humanoid robot can increase students’ engagement and motivation by capturing their attention and stimulating curiosity (Chin et al., 2014). The robot’s interactivity creates an immersive learning experience that encourages students to participate and learn (Kukulska-Hulme & Shield, 2008). Personalized feedback and support in real time help students become more involved and motivated in the learning process (Leite et al., 2014). The robot’s social presence and companionship increase students’ engagement and motivation by creating a feeling of connection and closeness (So & Brush, 2008).

Animacy refers to the degree to which a machine or robot exhibits human-like qualities, such as facial expressions, sound, and responsiveness (Coeckelbergh, 2022). Simple animations or sound effects can help students learn effectively through different forms of animacy, such as social, task, and emotional animacy (Johnson et al., 2000). This work utilized emotional animacy, which refers to the ability of a humanoid robot to express emotions (Chiang et al., 2022), such as happiness, sadness, and confusion. A humanoid robot with emotional animacy can make learning more engaging and relatable for students (Kim et al., 2019). When the robot can express emotions, it may create a more empathetic and supportive learning environment (Alves-Oliveira et al., 2019).

Kennedy et al. (2015) compared the effects of different robot embodiments on children’s learning experiences. Physically present robots are more effective in promoting collaboration and maintaining attention compared to virtual agents or on-screen avatars. However, Bartneck et al. (2009) argued that the behaviour of a robot was more important than its embodiment as robots play an important role in the learners’ perception of animacy, such as the combination of facial expression and human-like physical features. DiSalvo et al. (2002) also pointed out that facial and physical features can render the robot into a friendly interaction partner.

Alemi et al. (2014) examined the effect of Robot-Assisted Language Learning (RALL) on vocabulary learning, in which the robot-assisted group had positive learning outcomes. Their robots employed different interactive elements (i.e., motion, vision, and audio) to reinforce students’ vocabulary knowledge. de Wit et al. (2018) evaluated gestures in robot-assisted learning and found that gestures benefited retention and engagement. Gordon et al. (2016) utilized an automatic facial expression analysis system to assess children’s valence and engagement. They employed a child-tablet-robot scenario in which the learning content was separated from the robot: students learned with tablets, and the robot provided verbal feedback with gestures. The study revealed that children responded positively to personalized robot-assisted learning, but the learning outcomes did not improve. Separating the learning content from the robot may result in a less immersive and engaging learning experience for students, and relying solely on verbal feedback with gestures from the robot may not be as effective as having the robot directly involved in the learning process. Humanoid design features are therefore crucial to students’ learning experience. However, the use of humanoid robots that combine several humanoid design features to improve students’ learning engagement and motivation is under-researched.

This work aims to bridge the gap by designing a novel app that utilizes a robot with humanoid design features. The app is specifically designed to promote primary school students’ engagement and motivation in studying Traditional Chinese. First, we develop a built-in app that employs a humanoid robot with different combinations of design features, including gestures, mobility and animacy, to stimulate learning engagement and motivation. Second, our system incorporates automatic speech recognition (ASR) technology, including speech-to-text (STT) and text-to-speech (TTS) functions, to provide students with instant feedback during the learning process. Third, our system supports mother-tongue learning, i.e., Cantonese and Traditional Chinese, making it suitable for local use.

To answer the RQs, we conducted a pilot test. The results showed that the experimental group, who learnt with the humanoid robots, improved significantly in behavioural (+13.24%), emotional (+13.14%), and cognitive engagement (+21.56%), and in intrinsic motivation (+12.07%). Additionally, the experimental group showed a statistically significant improvement in the language proficiency test (+9.88%, p < .001) over five days of learning.

3 Self-determination theory

Self-determination theory (SDT) is a psychological framework for promoting intrinsic motivation and learning engagement (i.e., behavioural, emotional and cognitive) by supporting autonomy, competence, and relatedness (i.e., the three supports) (Ryan & Deci, 2020). Robots can promote engagement in language learning by providing personalized feedback, interactive activities, and social interactions. Several studies have shown that robots can enhance learners’ motivation, engagement (Ekström & Pareto, 2022), and language learning outcomes (Bahari, 2023). In the rest of this section, we briefly discuss the three supports, the three engagements, and intrinsic motivation.

3.1 Three supports

Three supports include autonomy, competence, and relatedness. Autonomy support refers to learners’ perceptions of how technology-assisted learning environments facilitate their autonomy in the learning process (Chiu, 2021). Competence support refers to learners’ perceptions of how technology-assisted learning environments facilitate their competence in using digital tools and resources (Falloon, 2020). Relatedness support refers to learners’ perceptions of how technology-assisted learning environments facilitate their sense of belonging and social connection (Shea & Bidjerano, 2009).

3.2 Behavioural engagement

Behavioural engagement refers to learners’ active participation and persistence in learning activities (Skinner & Belmont, 1993). A robot’s behaviour can be programmed to be consistent with social norms and the rules of the language so as to enhance children’s language and communication skills (Neumann, 2020).

3.3 Emotional engagement

Emotional engagement refers to learners’ affective responses and emotional investment in learning tasks (Phung, 2017). Robots can enhance emotional engagement in language learning by providing personalized and adaptive feedback, creating a supportive learning environment, and using affective computing to detect learners’ emotional states (Aslan et al., 2022).

3.4 Cognitive engagement

Cognitive engagement refers to learners’ active processing and deep learning of information (Nasir et al., 2022). A physical robot can enhance students’ cognitive engagement, as its physical presence may imbue the robot with perceived authority (Leyzberg et al., 2012).

3.5 Intrinsic motivation

Intrinsic motivation refers to the drive to explore, manipulate or probe the environment, fostering curiosity and engagement in new activities (Oudeyer & Kaplan, 2007). Recently, there has been increasing interest in developing social robots for educational purposes, and more specifically for children’s language learning (Vogt et al., 2017). Social robots have been found to stimulate children’s intrinsic motivation. Van Minkelen et al. (2020) found that the robot can improve learners’ performance in terms of the strength and duration of task engagement.

SDT has been used in other studies, e.g., Chiu (2022). Different from this work, Chiu (2022) explored the application of SDT to measure the relationship between perceived need satisfaction and student engagement in the online learning environment. Chiu suggested three empirical implications, including (1) digital support strategies can fulfil three needs in online learning, (2) fulfilling these needs is likely to enhance the four dimensions of student engagement in online learning, (3) perceived relatedness is the primary predictor of behavioural, emotional, and agentic engagement, while perceived competence is the most important predictor of cognitive engagement. Perceived autonomy is a significant factor in all dimensions of student engagement in online learning. The differences are shown in Table 2.

Table 2 Comparison between Chiu’s study and this work

4 The language learning system

In this work, we designed an educational application for a humanoid robot, Kebbi Air S (Nuwa, 2022). The learning system and game UIs were co-designed with a professional specializing in Traditional Chinese teaching. We also invited five students (excluded from the experiment) to give feedback on the game UIs, the number of questions per set, and the robot’s functions.

4.1 Humanoid robots

Kebbi has one head, two movable hands, and four wheels. As shown in Fig. 1, Kebbi has seven motors to control seven parts of its body, including a neck, two shoulders, two elbows and two fists. With these motors, Kebbi can turn its head left and right and move it up and down. Different human-like hand movements can also be designed, such as shaking hands, making fists, cheering, agreeing, or disagreeing. With two swivel wheels and two auxiliary wheels, Kebbi can “dance” by rolling its wheels. These functions enable more humanoid interactive design elements to increase student engagement.

Fig. 1 Kebbi has seven motors to control seven parts of its body, including a neck (1), two shoulders (2), two elbows (3) and two fists (4)

4.2 Pedagogy design

The pedagogy was co-designed with an education professional and reviewed by an experienced teacher. This study’s application consists of four training sets covering vocabulary, short sentences, and sentence expansion (Fig. 2), discussed in detail as follows.

Fig. 2 The game samples

Game 1: Vocabulary

When learning a language, students usually start with vocabulary, which is often considered a language’s foundation (Fung et al., 2023). Insufficient vocabulary knowledge causes difficulties in language use (Laufer, 1986). Furthermore, vocabulary acquisition is necessary for basic communication (Andersen, 1983). However, effective teaching of Chinese vocabulary is challenging. In the vocabulary-oriented game, a vocabulary item pops up on the screen, as shown in Fig. 2(a). Students have a maximum of three chances to read out the item. Whether or not students read it correctly, the robot leads them to read it again. The purpose is to consolidate students’ memory of the vocabulary.

Game 2: Short sentence

Vocabulary is essential to language development, but it needs to be integrated with other aspects (Gu, 1994). Short-sentence training helps consolidate the foundation of language learning (Gu & Johnson, 1996). This work therefore includes a series of short-sentence learning modules. In the short-sentence game, the screen shows an image with context, and the robot asks questions about the image, as shown in Fig. 2(b). Students are asked to answer a short question related to the image, as shown in Fig. 3. For example,

  (1) Robot: “呢個係咩嚟?” (What is this?); Student: “係蘋果。” (This is an apple.)

  (2) Robot: “呢個蘋果係咩顏色?” (What is the colour of this apple?); Student: “蘋果係紅色。” (The apple is red.)

Fig. 3 The whole interaction process

Game 3: Sentence expansion

Some young students need to improve their sentence-expansion ability, that is, the ability to build on a sentence’s key points by adding further information before and after them (Willis, 1981). For example, a student may construct a short sentence in daily communication, such as “我去茶樓。” (I went to a dim-sum restaurant.) (Fig. 2(d)). The listener may want to know more, such as who went with the student, when they visited the restaurant, and what they did there. So, the sentence can be expanded as “我今日同屋企人去茶樓食燒賣。” (I went to the restaurant with my family to eat Siu Mai today.). Therefore, the humanoid robot is designed to guide students step by step through sentence expansion by asking when, why, how, and so on. For example, (1) Robot: “你今日去邊呀?” (Where did you go today?); Student: “我今日去茶樓。” (I went to a dim-sum restaurant today.). (2) Robot: “你同邊個去茶樓呢?” (Who went with you?); Student: “我今日同屋企人去茶樓。” (I went to the restaurant with my family today.). (3) Robot: “你今日同屋企人去茶樓做咩?” (What did you do in the restaurant?); Student: “我今日同屋企人去茶樓食燒賣。” (I went to the restaurant with my family to eat Siu Mai today.).

4.3 Game interface design

The game UI design is based on a standardized specification to ensure consistency and usability across different games and learning contexts, as shown in Fig. 4. The font, font size, font colour, background colour, and button design were iterated based on the feedback from five students. This approach allowed us to incorporate user feedback into the design process and create a more user-friendly and engaging UI. We also aimed to keep the colours consistent and high in contrast to enhance readability and accessibility for all users.

Fig. 4 The standardized specification. Area A is for the main content, while Area B is for the answer display and sound recording animation

As shown in Fig. 4, all games adopt the standardized grid to display different learning elements, such as vocabulary, short sentences, and sentence expansion. The large home button icon is in the upper left corner, while the recording button is in the lower middle. When students are ready to read their answers, they press the recording button, which triggers a recording animation to remind them that they are being recorded, as illustrated in Fig. 5. To foster students’ autonomy in the learning process, our application design also includes progress tracking (Fig. 6(a)) and a performance report (Fig. 6(b) and 6(c)).

Fig. 5 The recording animation, which reminds students that they are being recorded

Fig. 6 The progress bar and performance report

4.4 System design

The language learning system was developed using Android Studio, incorporating automatic speech recognition (ASR) technology, which includes STT and TTS functions. The system can facilitate the conversion of spoken language into text and supports Traditional Chinese (Cantonese) transcription and translation. The pre-programmed system compares student input with transcription and translation, enabling students to receive instant feedback and improve their language skills.
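The comparison between a student's recognized speech and the expected answer can be sketched as follows. This is a minimal illustration in Python, not the authors' Android implementation: the function names `normalize` and `is_correct` are hypothetical, and the matching rule (exact match after removing punctuation and whitespace) is an assumption, since the paper does not state how strictly the transcription is compared.

```python
import unicodedata


def normalize(text: str) -> str:
    """Return the text with punctuation and whitespace removed.

    NFKC normalization folds full-width punctuation into its ASCII form
    before the Unicode category check filters it out.
    """
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in text
        # Categories starting with "P" are punctuation, "Z" are separators.
        if not unicodedata.category(ch).startswith(("P", "Z"))
        and not ch.isspace()
    )


def is_correct(recognized: str, expected: str) -> bool:
    """Assumed rule: the answer counts as correct on an exact normalized match."""
    return normalize(recognized) == normalize(expected)
```

For example, under this assumed rule a transcript of “係蘋果” would match the expected answer “係蘋果。”, whereas “係橙” would not. A real system might instead tolerate small recognition errors, which this sketch does not attempt.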

Fig. 3 illustrates the four-step process for students to learn with the robot. In Step 1, the screen displays the learning task and prompts students to complete it. In Step 2, the robot provides instruction and instant feedback, guiding students to input their answers. Step 3 involves teaching, where the robot corrects students when they answer incorrectly. Finally, Step 4 is memory consolidation, where the robot recaps the answer and asks students to read it once more to strengthen their memory.

The learning system prototype consists of four training sets, each containing 13 games: three vocabulary items, five short sentences, and five sentence expansions. Each game carries equal weight, and students can make a maximum of three attempts to record their answers for each game. The recording duration is limited to 10 seconds, but students can press the stop button to finish recording before the time limit.
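The set structure and attempt limit described above can be illustrated with a short Python sketch. The names (`GAMES_PER_SET`, `play_game`, `set_score`) are hypothetical, the transcripts are assumed to have already been produced by the speech recognizer, and the scoring rule (fraction of games answered correctly within three attempts) is an assumption that only reflects the statement that each game carries equal weight.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # maximum recordings allowed per game
# Composition of one training set as described in the text (3 + 5 + 5 = 13).
GAMES_PER_SET = {"vocabulary": 3, "short_sentence": 5, "sentence_expansion": 5}


@dataclass
class GameResult:
    correct: bool
    attempts_used: int


def play_game(expected: str, transcripts: list[str]) -> GameResult:
    """Check recognized transcripts in order, stopping at the first match
    or once MAX_ATTEMPTS transcripts have been considered."""
    used = 0
    for transcript in transcripts[:MAX_ATTEMPTS]:
        used += 1
        if transcript.strip() == expected:
            return GameResult(correct=True, attempts_used=used)
    return GameResult(correct=False, attempts_used=used)


def set_score(results: list[GameResult]) -> float:
    """Assumed scoring: every game has equal weight, so the score is the
    fraction of games answered correctly."""
    return sum(r.correct for r in results) / len(results)
```

A student who answers “係蘋果” on the second recording would be recorded as correct with two attempts used; a fourth recording is never considered.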

4.5 Interaction and motivation design

In the humanoid robot training, three humanoid interaction elements are implemented to trigger students’ learning motivation and engagement. When students answer a question correctly, the robot praises them: “Well done! You are smart!” with a positive facial expression, a shaking head and a happy gesture, as shown in Fig. 7(a). If students answer incorrectly, the robot encourages them by saying: “I believe you can do it! Let’s try again!” with an encouraging facial expression, a turning head and an empathy gesture (Fig. 7(b)). If students do not interact with the robot, it motivates them by asking: “Hey! Are you thinking of the answer? Let’s try!” with a supportive facial expression, a nodding head and a thinking gesture (Fig. 7(c)). After the training is finished, the robot shows its appreciation of students’ interaction with an excited face, a moving head and a powerful gesture accompanied by a lively song, as shown in Fig. 7(d).

Fig. 7 The facial expression of the humanoid robot

The functions and designs are as follows:

  1. Gesture

    • Our design: With a simple command, Kebbi can act like a human and interact with participants using gestures such as “add oil” and “hurray”, as shown in Fig. 7.

  2. Mobility

    • Function of Kebbi: Kebbi has four wheels, which allow it to move forwards and backwards and to turn around, as shown in Fig. 7. Kebbi’s speed can be controlled by programming commands, allowing it to walk or dance like a human and interact with participants with a simple command.

    • Our design: The robot is programmed to perform specific actions along with songs, gestures and facial expressions.

  3. Animacy

    • Our design: Kebbi supports animation displays. When developing the educational app, image frames were assembled into GIFs to produce short animations, as shown in Fig. 8. The app includes 125 combinations (Table 3) of gestures, facial expressions, and motivation sentences for reacting to the correct, incorrect and no-interaction conditions; these are shown to participants at random. There are 10 songs to praise the participants after they finish each training set. The large number of combinations is intended to give participants a sense of freshness and curiosity.

      Fig. 8 The samples of facial expression in three conditions, including correct, incorrect and no interaction

      Table 3 Combinations of gestures, facial expressions, and motivation sentences (Appendix A.2)
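One way to arrive at 125 combinations is to pair five gestures, five facial expressions, and five motivation sentences (5 × 5 × 5 = 125). The Python sketch below illustrates that reading only; the five-per-element split is an assumption, the labels are placeholders rather than the entries of Table 3, and `pick_feedback` is a hypothetical helper name.

```python
import itertools
import random

# Placeholder labels; the actual entries are listed in Table 3 (Appendix A.2).
# Five options per element is an assumption chosen so that 5 * 5 * 5 = 125.
GESTURES = [f"gesture_{i}" for i in range(1, 6)]
FACES = [f"face_{i}" for i in range(1, 6)]
SENTENCES = [f"sentence_{i}" for i in range(1, 6)]

# Every (gesture, facial expression, motivation sentence) triple.
COMBINATIONS = list(itertools.product(GESTURES, FACES, SENTENCES))


def pick_feedback(rng: random.Random) -> tuple[str, str, str]:
    """Draw one combination uniformly at random."""
    return rng.choice(COMBINATIONS)
```

Passing a seeded `random.Random` instance keeps the selection reproducible during testing, while an unseeded one gives the varied feedback described above.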

5 Experiment

This section explains the experiments designed to investigate the effectiveness of the humanoid robot and answer the research questions.

5.1 Participants, procedures, and regulations

For a fair comparison, school teachers formed two groups of students with similar backgrounds: a control group (students who learn with teachers) and an experimental group (students who learn with humanoid robots).

5.1.1 Triangulation process

Our work followed a triangulation process, utilizing questionnaires (Appendix A.1), observation (i.e., students’ interactions with robots and informal conversations, Appendix A.3), and language proficiency tests (Section 5.1), to measure students’ engagement. For the observation, two raters, one with a major in psychology and the other in inclusive education, independently analyzed the feedback received from the students. To determine the consistency of their ratings, Cohen’s kappa was used, and the inter-rater reliability was found to be substantial (κ = .808, p < .01).
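For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The following Python sketch computes it from two lists of category labels. The function name and the example labels are illustrative and are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item list.")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1.")
    return (observed - expected) / (1 - expected)


# Illustrative labels only: the raters agree on 9 of 10 items.
a = ["pos"] * 5 + ["neg"] * 5
b = ["pos"] * 4 + ["neg"] * 6
kappa = cohen_kappa(a, b)  # observed 0.9, chance 0.5, so kappa is about 0.8
```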

5.1.2 Questionnaires

The pre- and post-questionnaires consisted of four variables categorized into need satisfaction and student engagement. Each variable was assessed using a 5-point Likert scale. To ensure the clarity of the questionnaire, an experienced teacher reviewed the items. The questionnaire is presented in Appendix A.1.

5.1.3 The control group

Thirty-three students (15 females and 18 males) aged 6 to 11 years (M = 9.06 years, SD = 1.47) were recruited into the control group from a local primary school in Hong Kong. They completed the pre- and post-questionnaires on the first and fifth days, respectively. The control-group participants learned in the classroom as usual. Their teachers were registered Chinese teachers with a few years of teaching experience. The learning content was prepared and designed by the teachers, and the exercises were at the same level as the robot-based content.

5.1.4 The experimental group

Thirty-one students (14 females and 17 males) aged 6 to 10 years (M = 8.81 years, SD = 0.94) from the same primary school took part in the experimental group, as shown in Fig. 9. They completed the pre- and post-questionnaires on the first and fifth days, respectively, and played the humanoid robot training game for five days, i.e., one set per day. From Day 1 to Day 4, students played Set 1 (pre-test) to Set 4, respectively. The difficulty of the four training sets increased incrementally. On the fifth day, students completed Set 1 again (post-test) to test their learning efficacy, i.e., the language proficiency test. Each set took about 20 minutes to finish. Students were divided into groups of six for the training sessions in a classroom. Each student was allocated a table, and neighbouring tables were separated by two metres to reduce training interference, such as arousal or social comparison.

Fig. 9 The participants were learning with the humanoid robot in a classroom setting

The inclusion criteria for students to participate in this study were: (1) studying in grade 1 to grade 5; (2) being able to read traditional Chinese characters and speak Cantonese; and (3) having no other medical or physical disabilities that might interfere with interacting with the robot or reading aloud. In addition, all students had experience in using digital tools such as tablets. Before the experiment, informed consent was obtained from the students’ parents. Participation was entirely voluntary. The University Institutional Review Board (IRB) approved the experimental protocol. No remuneration was provided to the participants.

5.2 Data analysis

The data analysis addresses the two research questions, which concern (1) students’ learning engagement and motivation, and (2) learning proficiency.

5.2.1 RQ1: To what extent can a humanoid robot impact students’ learning engagement and motivation?

RQ1 evaluates the extent to which humanoid features help students become more engaged and motivated in learning. As shown in Fig. 10, the analysis of covariance (ANCOVA) revealed that learning with robots significantly improved students’ behavioural, emotional, and cognitive engagement and their intrinsic motivation compared with traditional learning.

Fig. 10 Overview of analysis of covariance (ANCOVA) for three engagements
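To make the ANCOVA comparison concrete, the sketch below computes a two-group, one-covariate ANCOVA in plain Python. It rests on assumptions that the paper does not state: the pre-test score is taken as the single covariate, both groups share a common regression slope, and the function name `ancova_two_groups` is hypothetical. The scores are made-up illustrative values, not study data.

```python
def _mean(values):
    return sum(values) / len(values)

def _sums(x, y):
    """Return (Sxx, Sxy, Syy): centred sums of squares and cross-products."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy

def ancova_two_groups(pre_a, post_a, pre_b, post_b):
    """One-covariate ANCOVA for two groups, assuming a common slope.

    Returns (adjusted mean difference A minus B, F statistic, error df).
    """
    # Full model (group + covariate): pool the within-group sums.
    wa, wb = _sums(pre_a, post_a), _sums(pre_b, post_b)
    sxx_w, sxy_w, syy_w = (wa[i] + wb[i] for i in range(3))
    slope = sxy_w / sxx_w
    sse_full = syy_w - sxy_w ** 2 / sxx_w
    # Reduced model (covariate only): ignore group membership.
    sxx_t, sxy_t, syy_t = _sums(pre_a + pre_b, post_a + post_b)
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t
    # Three parameters are estimated: two intercepts and one slope.
    df_error = len(pre_a) + len(pre_b) - 3
    f_stat = (sse_reduced - sse_full) / (sse_full / df_error)
    # Post-test difference after adjusting for pre-test differences.
    adjusted_diff = (_mean(post_a) - _mean(post_b)) - slope * (
        _mean(pre_a) - _mean(pre_b)
    )
    return adjusted_diff, f_stat, df_error

# Hypothetical questionnaire totals (NOT data from the study).
robot_pre, robot_post = [10, 12, 11, 13, 12], [12, 14, 12, 15, 13]
class_pre, class_post = [11, 10, 12, 11, 10], [11, 10, 11, 12, 10]
diff, f_stat, df_error = ancova_two_groups(
    robot_pre, robot_post, class_pre, class_post
)
print(f"adjusted difference = {diff:.2f}, F(1, {df_error}) = {f_stat:.2f}")
```

The F statistic would be compared against an F distribution with 1 and `df_error` degrees of freedom to obtain a p-value; a statistics package is the more practical choice for a full analysis.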

Behavioural engagement

The experimental group exhibited a statistically significant improvement, p < .01 (pre-test: M = 11.71; post-test: M = 13.26). However, the control group slightly decreased, p = .60 (pre-test: M = 10.91; post-test: M = 10.61).

Emotional engagement

The experimental group exhibited a statistically significant improvement, p < .01 (pre-test: M = 11.26; post-test: M = 12.74). However, the control group showed no statistically significant change, p = .72 (pre-test: M = 10.97; post-test: M = 12.58).

Cognitive engagement

The experimental group exhibited a statistically significant improvement, p < .001 (pre-test: M = 10.65; post-test: M = 12.94). However, the control group slightly increased, p = .96 (pre-test: M = 10.85; post-test: M = 12.88).

Intrinsic motivation

The experimental group exhibited a statistically significant improvement, p < .05 (pre-test: M = 11.48; post-test: M = 12.87). However, the control group slightly decreased, p = .51 (pre-test: M =10.45; post-test: M = 10.06).

From the descriptive statistics shown in Table 4, we can observe that, before the five-day learning period, the three engagement dimensions and intrinsic motivation of the two groups were similar. After the five days, the control group dropped in behavioural engagement and intrinsic motivation, showed no change in emotional engagement, and slightly improved in cognitive engagement. In contrast, the experimental group improved markedly in all three engagement dimensions and in intrinsic motivation. Additionally, the scores of the experimental group were less dispersed. These preliminary findings imply that robot-empowered learning can motivate students to engage behaviourally, emotionally, and cognitively in technology-based learning.

Table 4 Descriptive statistics of the questionnaire. *, ** and *** denote p <.05, p < .01 and p < .001, respectively

5.2.2 RQ2: How does a humanoid robot affect students’ learning proficiency?

RQ2 examines whether the humanoid robot enhances learning proficiency in the experimental group. Paired-samples t-tests were therefore used to compare the average pre- and post-test performance. Table 5 shows that the experimental group improved significantly (p < .05) and also presents descriptive statistics of the training performance in the pre- and post-training sets. The assumptions for conducting the paired t-tests were met. As shown in Fig. 5, the improvement was statistically significant after the learning sessions (pre-test: M = 0.81, SD = 0.15; post-test: M = 0.89, SD = 0.13; p < .001). Over the five-day training, the humanoid robot thus significantly enhanced learning efficacy in the experimental group. This is consistent with self-determination theory (SDT): when psychological needs are appropriately addressed by pedagogical design, students are motivated to engage in learning. In the following, we describe some key implications.

Table 5 Overview of paired-samples t-tests for the experimental group
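The pre/post comparison for RQ2 can be illustrated with a minimal sketch of a paired t statistic in plain Python. The function name `paired_t` is hypothetical and the scores are invented proportions of correct answers, not the study’s data; computing the p-value would additionally need the t distribution (for example from a statistics package), which is omitted here.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom for post minus pre."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post pairs")
    # The test operates on the within-student differences.
    diffs = [after - before for before, after in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    standard_error = sd / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1

# Hypothetical Set 1 scores for six students (NOT data from the study).
pre_scores = [0.70, 0.85, 0.80, 0.65, 0.90, 0.75]
post_scores = [0.80, 0.90, 0.85, 0.80, 0.95, 0.80]
t_stat, df = paired_t(pre_scores, post_scores)
print(f"t({df}) = {t_stat:.2f}")
```

Because each student serves as their own control, the test uses the standard deviation of the differences rather than the separate pre- and post-test standard deviations.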

5.3 Key implications

The experimental results indicated that participants liked the robot’s moving hands. With mobility, gestures, and facial expressions, Kebbi changed from a piece of cold plastic into a warm companion, and students were more engaged in learning. Their non-verbal behaviour, such as eye gaze and smiles, showed that they were interacting with Kebbi. When participants answered a question incorrectly, Kebbi showed a “hugging” gesture with a wronged facial expression and moved towards the participant, as shown in Fig. 11(b). Kebbi thus acted as a close companion that empathized with the participants’ learning difficulties and comforted them.

Fig. 11 Students interact with the robot (touching the robot’s hands)

5.3.1 RQ1: To what extent can a humanoid robot impact students’ learning engagement and motivation?

In the traditional classroom setting, some students may find it challenging to respond to questions. This could be due to feelings of embarrassment if they give incorrect answers in front of their peers, despite the encouragement provided by the teacher. As a result, these students may choose not to participate due to low self-esteem. However, when studying with a robot as a study partner, they may be more willing to answer questions since there is no fear of being judged by others for giving incorrect responses.

We received positive feedback from students who expressed their happiness with Kebbi and asked for it to return the following day. For example, students repeatedly asked us, “Will you come tomorrow? I want to learn with this robot. It is very cute!” There is also anecdotal evidence that the robot can assist the learning process. According to the class teachers, one student disliked Chinese and would lose his temper when asked to complete a few lines of Chinese writing. However, after interacting with Kebbi, the student showed a remarkable improvement in his engagement and attitude towards learning. Furthermore, we observed students’ interactions with the robots during the training sessions. Students were very excited to learn their robots’ names because they felt a sense of belonging with their robots. These observations are in line with the findings for RQ1.

The humanoid features can help students become more engaged through progressive interaction, positive learning manners, and active learning. Students in the experimental group improved by 13.24% in behavioural engagement, 13.14% in emotional engagement, 21.56% in cognitive engagement, and 12.07% in intrinsic motivation. In contrast, students in the control group showed only slight improvements, or even decreases, in the three engagement dimensions and intrinsic motivation. Also, almost all students enjoyed interacting with the robots, such as touching the robots’ hands, heads, and cheeks and even hugging the robots, as shown in Fig. 11. For example, a student received happy-face feedback from the robot as a form of praise after answering a question (Fig. 12). This positive feedback encouraged the student to interact more with the robot, progressing from touching its hand to taking the robot’s hand and touching it to his cheek.

Fig. 12 A student interacted with Kebbi
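The percentage gains reported for the experimental group follow from the pre- and post-test means given in Section 5.2.1. A short sketch of that arithmetic is shown below; the function name `percent_change` is hypothetical. Because the published means are rounded to two decimals, values recomputed this way may differ slightly from the reported percentages.

```python
def percent_change(pre_mean, post_mean):
    """Relative change from pre-test to post-test, in percent."""
    return (post_mean - pre_mean) / pre_mean * 100

# Experimental-group means (pre-test, post-test) as reported in Section 5.2.1.
reported_means = {
    "behavioural engagement": (11.71, 13.26),
    "emotional engagement": (11.26, 12.74),
    "cognitive engagement": (10.65, 12.94),
    "intrinsic motivation": (11.48, 12.87),
}

for measure, (pre_mean, post_mean) in reported_means.items():
    print(f"{measure}: {percent_change(pre_mean, post_mean):+.2f}%")
```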

5.3.2 RQ2: How does a humanoid robot affect students’ learning proficiency?

Students were actively engaged in the learning tasks with the robots. As shown in Fig. 5, the experimental group increased its training score from 0.81 to 0.89, equivalent to a 9.88% improvement over five days of learning. Students felt cared for by the robots and learned actively with them, which is consistent with the findings for RQ2. For example, student C said, “You see, the robot can sing and dance! Very cute!”

6 Discussion

6.1 Emotional engagement

Emotion and empathy in Human-Robot Interaction (HRI) are crucial for fostering student engagement and motivation (Mejbri et al., 2022). Empathic robots can better understand and respond to students’ emotional states, leading to more effective learning experiences (Paiva et al., 2017). In our study, the humanoid robot’s ability to display emotions and empathic behaviours contributed to the students’ emotional engagement and intrinsic motivation. As shown in Table 4, the emotional engagement of the experimental group showed a statistically significant improvement (+13.14%) and was less dispersed (pre-test: SD = 2.16; post-test: SD = 2.05) after the five-day training.

6.1.1 Gesture

As mentioned above, a robot-assisted learning environment can enhance students’ perceived digital relatedness by fostering a sense of belonging and social connection. In our study, the humanoid robot provided social support through non-verbal communication, including gestures and facial expressions, which contributed to students’ overall engagement and motivation.

6.1.2 Mobility

A robot’s mobility can enhance students’ engagement by providing unique interactions. Our research utilized a highly mobile robot with movable hands, head, and wheels, allowing us to program specific movements to accompany songs, gestures, and animations. These movements help create a supportive and relaxed learning environment in which students can become more emotionally engaged with the robot. Students also learned actively from the robot. For example, a student laughed happily and said, “It can nod and shake the head.”

6.1.3 Animacy

Animacy in robots can stimulate students’ curiosity, as animate robots can create a sense of wonder, leading to increased engagement (Schulz et al., 2019). In our work, animations shown in response to students’ answers, such as crying and smiling, created a more interactive and engaging learning environment. For example, students saw the robot’s blinking eyes and asked, “If I talk to him, will he reply to me?” Once they started the training, students were very excited when the robot replied to them: “You see, you see, he talked to me!”

6.2 Behavioural engagement

Conventional learning methods are often passive, leading to disengagement. In contrast, robot-assisted learning provides a vibrant and dynamic learning environment with personalized feedback and support (Tanaka & Matsuzoe, 2012). In our work, students in the control group slightly dropped in behavioural engagement. There are two possible reasons why this may have occurred.

First, traditional classroom instruction may be less engaging than humanoid robot instruction. A teacher told us that the robot was cute, smart and entertaining, whereas teachers could not sing or dance for students when they answered questions correctly and could only praise students verbally or by giving out stickers. Learning with humanoid robots can therefore be more interactive than learning with human tutors. Consistent with this, students in the experimental group showed a 13% improvement over the five-day training, while those in the control group dropped slightly. Second, studying in a traditional classroom can sometimes feel stressful. Factors such as teaching quality, learning environment, student-teacher relationships, instructional methods, curriculum, and student characteristics can affect engagement levels. A student told us that he disliked attending Chinese classes because the teacher was too strict. The student even refused to enter the classroom because of a Chinese quiz.

The humanoid robot can support a variety of engagement and learning styles to establish a more enjoyable and supportive learning environment. For instance, robots can provide learners with feedback and encouragement, participate in discourse with them, and customize their training based on their needs and preferences. By establishing a more positive and supportive learning environment, this form of engagement can encourage more effective and efficient learning (Kanda et al., 2004). In our work, the robot provided positive responses, as depicted in Fig. 7(a), 7(b) and 7(c), and students were more eager to join the training sessions. In addition, after students completed the game, the robot would sing and dance to encourage them. Numerous students exclaimed that the robot was adorable and that they loved it.

6.3 Cognitive engagement

Conventional training methods may make students feel tired. In contrast, students can control their learning pace and level with this robot, which can enhance their cognitive engagement. For example, a student told us, “I like this robot, especially the challenging content. When I am at home, my family speaks English. My sister can only teach me basic Chinese. So, I can use this robot to learn more complex Chinese content.”

Reeve (2009) argued that autonomous motivation is critical to promoting cognitive engagement, and our study supports this claim. Specifically, our analysis found a significant improvement in cognitive engagement among primary school students who interacted with the educational robots (+21.56%, p < .001), but not among those who did not (+0.28%, p = .96). We also observed that the students followed the robot in reading the correct answer aloud, which demonstrated the participants’ ability to learn by themselves with the robots.

6.4 Intrinsic motivation

A robot that uses expressive gestures may be more engaging to participants because it can convey emotion and thereby help build rapport with them. As shown in Fig. 7(a), the robot displayed rotating “admired” eyes, nodded, and showed cheering gestures each time the children answered correctly. This observation aligned with the questionnaire results: the participants in the experimental group scored 12.87 points for intrinsic motivation in the post-test, which is 1.39 points higher than in the pre-test (M = 11.48).

6.5 Human-robot interaction

HRI has emerged as a crucial area of research as robots become increasingly incorporated into various aspects of daily life. According to the questionnaire results for emotional engagement, the experimental group improved more than the control group. Our findings reinforce a prior study by Powers et al. (2007): a robot’s ability to exhibit emotions and empathy can considerably increase its effectiveness in providing social assistance. The participants in the experimental group were engaged and happy, and they found learning with the robot delightful. For example, after learning with the robot, students followed the robot in reading each word aloud once.

6.6 Limitations

Despite the effectiveness of humanoid robots in enhancing student engagement, the educational application lacks customization. Implementing generative AI could enhance personalization. Physical distancing measures and a pipeline method were used to minimize arousal levels. Short-term studies provide valuable insights into the benefits of educational robots.

Lack of customization

While the humanoid robots used in this study were effective in enhancing student engagement, their educational application has not been fully customized to meet the individual learning needs of each student, which may limit their effectiveness to some extent. For instance, students may want more than an instant reply based on speech-to-text (STT) technology; they may want to converse freely with the robots. As such, it may be beneficial to implement generative AI to enhance the personalized learning experience for students.

Classroom setting

Due to space limitations, multiple students had to learn in the same classroom simultaneously, which can trigger arousal in some individuals. However, we implemented physical distancing measures in the classroom, with all participants seated two meters apart, to limit the impact of social comparison on arousal levels. Furthermore, we used a pipeline method for our training sessions, in which each participant’s start and end times varied. As a result, we expect the impact on arousal levels to have been minimal.

Short-term study

The study was conducted over a short period only, and a long-term study may better reveal the long-term effects of robot-assisted learning. However, short-term studies can still provide valuable insights into the potential benefits of educational robots. They may also be less burdensome for students and teachers, as they do not require a significant time commitment.

7 Conclusion and future work

This study demonstrated that robot-assisted interactive learning promoted students’ self-learning ability and encouraged them to learn independently. After five days of training, the students in the experimental group showed a significant improvement in learning efficacy. This short-term study has also revealed some design opportunities in humanoid robot-assisted learning. For example, students want to feel a sense of belonging; giving a robot a specific name can make it seem more human and improve engagement. Also, students prefer interactive learning over traditional learning, and robots can provide students with more interactive communication or personalized dialogue instead of conversations that solely reflect the learning content. If robots can teach in a more personalized way, students will be more engaged and feel a sense of connection. In future, the application could be deployed on a larger scale and studied over a longer term in other languages, such as English and Mandarin.

In addition, the humanoid robot can be used to support learners with special needs or those who face learning challenges, such as children with autism. In the future, we will invite more diverse groups of students with varied learning conditions, such as students with dyslexia or autism and non-Chinese speaking (NCS) students, to participate in the experiments.