Introduction

Emotion generation is an important research problem in emotional dialogue systems, where the goal is to enable models to produce natural, realistic, and emotionally rich expressions. However, focusing solely on the emotions themselves may not adequately capture the diversity and complexity of human emotions. As a result, a growing number of studies have begun to incorporate personality factors into emotion generation models to more fully simulate and express human emotional experiences.

Research has shown that personality plays an important role in an individual's emotional experience [1,2,3], and Zhang et al. [4] have explored this relationship in depth. Different personality traits can influence the way an individual feels, expresses, and regulates emotions. For example, an extroverted and cheerful person may be more likely to express positive emotions, whereas an introverted and nervous person may be more inclined to experience negative emotions. Thus, incorporating personality factors into models of emotion generation makes it possible to model and predict an individual's emotional responses more accurately and to provide more personalized and adaptive emotion generation [5].

Previous research has explored various approaches to integrating personality and emotion within generation models. Zhang et al. [6] explored how to add personalized and emotionally rich features to conversational systems, employing an unsupervised learning-based approach that correlates user-provided personal information and conversation history with personality traits to enable more personalized and emotionally rich conversational interactions. Wen et al. [7] proposed to improve the accuracy of future emotion prediction for texts by combining Big-Five personality traits with the VAD emotional space. Chen et al. [8] also constructed a Chinese dataset annotated with personality and emotion for use in an AI dialogue system. However, current research efforts that incorporate personality tend to focus on the accuracy of emotion prediction and neglect anthropomorphic, personality-consistent emotion generation, which may degrade the experience of human-computer emotional interaction in dialogue systems.

This paper aims to provide anthropomorphic emotion generation for dialogue systems, thus creating natural and harmonious HCI. To address this problem, we have to consider how to generate anthropomorphic emotions at future moments from limited dialogue history. Therefore, inspired by real human emotional processes and work in psychology, we generalize three main factors influencing emotion generation from the perspective of time span: a long-term factor (personality), a medium-term factor (sentiment), and a short-term factor (emotion). On this basis, we propose a deep neural network method for emotion generation, called the Personality-Enhanced Emotion Generation Model (PEEGM). PEEGM first assigns personality traits to the dialogue agent, then uses emotion perception to analyze the input corpus, and the obtained emotion is sent to the emotional state inference unit to realize emotion generation. Experiments are carried out on the PELD dataset to verify the effectiveness of the proposed method. At the same time, our model has advantages in matching degree and appropriateness compared with the baseline models. In summary, our contributions are as follows:

  • A new task is defined: emotion generation task of dialogue systems.

  • Three main factors in modeling the emotional state of a dialogue system are summarized from a temporal perspective: personality, sentiment, and emotion.

  • A personality-enhanced emotion generation model (PEEGM) is proposed for the emotion generation task.

  • Experiments show that PEEGM can properly and reasonably achieve emotional state generation in dialogue.

Related Work

In this section, we first review work related to personality, sentiment, and emotion. Next, we review important theoretical models of personality. Finally, we review emotion modeling that integrates personality traits in conversation.

Personality, Sentiment, and Emotion

Personality, sentiment, and emotion are interconnected but distinct aspects of human psychology. Funder [9] states that personality refers to consistent patterns of thoughts, feelings, and behaviors that remain stable across situations and over time. These stable patterns persist across various contexts and situations, providing a unique and consistent framework for understanding an individual's psychological makeup. Sentiment refers to an individual's attitude towards the satisfaction of objective things or events [10, 11]. It is often expressed as a tendency towards certain polarities, such as positive or negative. Sentiment is subjective and highlights an individual's personal experiences over a period of time. It is characterized by relative stability and feedback, meaning that one's sentiment can influence and be influenced by external factors and interactions. In contrast, emotions are complex psychological and physiological responses to specific events or stimuli [12]. Emotions are often intense but relatively short-lived experiences that can fluctuate rapidly in response to external or internal triggers. One of the most popular models in the field is the Six Basic Emotions Model proposed by Ekman et al. [13].

Theoretical Models of Personality

In this subsection, we discuss several personality models and their application to emotion generation. The Big Five model, also known as the Five-Factor Model (FFM) [14], is a widely used personality model that includes five dimensions: extraversion, agreeableness, conscientiousness, emotional stability, and openness. These personality traits are related to emotional states and can be used in computational methods of emotion generation. The FFM personality traits and their descriptions are shown in Table 1.

Table 1 The BigFive personality traits and description

MBTI is a personality model based on psychological type theory that categorizes personality into 16 types; Myers and McCaulley [15] developed the original MBTI. One critique of the MBTI is that it is less reliable than other tests, which may account for many disparities in findings [9]. It remains of some value for emotion generation, but more in-depth research is needed to determine its applicability and effectiveness.

These personality models provide important theoretical foundations and computational grounding for emotion generation, which can be personalized by integrating personality traits and emotional states. Because the Big-Five evaluates personality comprehensively and quantifies it with numerical values, which fits the processing paradigm of computers well, it has been widely used in research on NLP and personality/emotion computation.

Integrating Personality Emotion Modeling in Dialogue Systems

Emotion modeling that incorporates personality in a dialogue system is intended to enable the system to interact with the user in a more personalized and emotionally rich manner. Egges et al. [2] presented a generic model for describing and updating the parameters related to emotional behavior and also explored how existing appraisal theories can be integrated into the framework. Egges et al. [3] proposed a model based on personality and emotional states to change a robot's head behavior when pronouncing words and determined the intensity of the head response based on different personality traits, exploring the mechanisms by which emotional personality affects machine behavior. Ball et al. [16] obtained the transformation relationship between personality and emotion in the PAD emotion space through statistical analysis and fitting. Breese et al. [17] accordingly considered the correction of personality to emotion in their constructed emotion model. Johns et al. [18] also endowed interacting robots with predefined personalities to influence emotional state transfer. Zhu et al. [19] proposed a contrastive-learning and generation-based model for zero-shot personality attribute extraction to facilitate HCI research under personality. Wen et al. [7] constructed a dataset with personality and emotion annotations and designed an emotion prediction model that traverses the conversation to predict emotions at future moments.

By incorporating personality into emotion modeling, dialogue systems can better understand the emotional state of the user and are able to respond and interact in a more personalized and emotionally rich manner [20]. However, there are still challenges in emotion modeling, such as data scarcity, emotion ambiguity, and emotion transfer. In this paper, we focus on how to achieve plausible emotion generation with personality using limited rounds of conversation data.

Fig. 1 The illustration of PEEGM, which consists of a transformer encoder, an emotion perception module, and an emotional state inference unit

Methods

In a dialogue system, incorporating appropriate emotional expressions can greatly enhance user experience and engagement. In terms of duration, the human emotional process is influenced by long-term, medium-term, and short-term factors. Inspired by psychology, this paper identifies these three factors as personality, sentiment, and emotion.

  • Related work [16, 21] shows that personality and emotional expression are strongly correlated: different personalities differ in how they express emotions when dealing with things, and this difference is stable over the long term. Therefore, we view personality as a long-term factor.

  • Sentiment is part of one's cognitive attitude, and its polarity (positive, negative, or neutral) largely influences the emotional state at future moments, but this influence is relatively volatile. Therefore, we treat sentiment as a medium-term factor in dialogue.

  • Emotion is short-lived: there is a different emotional state at each moment in the dialogue, and these states are temporally related, i.e., emotions at future moments are affected by emotional states at previous moments. Therefore, we view emotion as a short-term factor. The method design of this paper revolves around these three points.

In this section, we will present the PEEGM for the emotion generation task in dialogue systems, along with the details of its implementation.

Task Formulation

A typical dialogue process involves multiple interactors. However, in this paper, we focus on dialogues involving two interactors and do not consider scenarios with multiple interactors.

The objective of the emotion generation task in dialogue systems is to effectively model the emotional states of dialogue agents, thereby enhancing their emotional capabilities. Given the dialogue history \(D=\{U_1, R_1, U_2, R_2,..., U_T\}\) up to time T and the specified personality traits \(P_S\), we aim to generate the emotional state \(e^R_T\) for \(P_S\) at the next time step. We formulate this task mathematically as Eq. 1.

$$\begin{aligned} \begin{aligned}&p(e^R_1,e^R_2,...,e^R_T)\\ {}&=p(e^R_1|U_1,P_S)p(e^R_2|U_1,R_1,U_2,P_S)...p(e^R_T|U_1,R_1,...,U_T,P_S)\\&=\prod _{i=1}^{T}p(e^R_i|U/R_{i\le T},P_S) \end{aligned} \end{aligned}$$
(1)

The emotional state \(e^R_i \in \{Surprise, Joy, None, Fear, Sadness, Anger\}\), where i represents the subscript index and T denotes the total number of dialogue turns.

In contrast to prior research, this paper specifically addresses two aspects: (1) modeling the emotional states of dialogue agents themselves and (2) generating emotions while taking into account the influence of personality traits. Consequently, the primary challenge lies in effectively leveraging the limited emotional information present in dialogues to generate emotional states that are both reasonable and appropriate, aligning with specific personalities.
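As an illustration of the task interface only, the mapping from a dialogue history and Big-Five traits \(P_S\) to an emotional state can be typed as below. This is a hypothetical sketch, not the authors' code: `DialogueTurn`, `emotion_generation_task`, and the placeholder body (which always returns "None") are our own names.

```python
from dataclasses import dataclass
from typing import List

# The six emotional states defined in the task formulation.
EMOTIONS = ["Surprise", "Joy", "None", "Fear", "Sadness", "Anger"]

@dataclass
class DialogueTurn:
    utterance: str   # U_i (user turn) or R_i (agent turn)
    speaker: str     # "user" or "agent"

def emotion_generation_task(history: List[DialogueTurn],
                            personality: List[float]) -> str:
    """Placeholder for a model mapping (D, P_S) -> e^R_T.

    `personality` is the Big-Five trait vector P_S; a real model would
    condition on both inputs.  Here we simply return a default label.
    """
    return "None"

state = emotion_generation_task(
    [DialogueTurn("Hi there!", "user")], [0.6, 0.5, 0.4, 0.7, 0.3])
```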

Personality Enhanced Emotion Generation Model

Referring to the human emotional process, an individual’s emotional state at a given time is influenced by three key factors: (1) long-term and stable personality traits, (2) medium-term and volatile sentiment, and (3) short-term and rapidly changing emotional states. Taking these factors into consideration, we have developed the Personality Enhanced Emotion Generation Model (PEEGM) that integrates them within an LSTM framework. The model architecture is depicted in Fig. 1.

The workflow of PEEGM is as follows: at time T, the emotional state \(e^R_T\) of the dialogue agent is calculated from the user's current external input and the agent's internal input from time \(T-1\). For the user state input at time i, the dialogue text \(U_i\) is encoded using a transformer encoder, followed by emotion perception to discern the user's emotional state. Subsequently, the emotional state inference unit (ESIU) processes the user's emotional state. Similar to the user state input, the dialogue agent's input comprises the specified personality traits \(P_S\) and the content \(R_i\) of the dialogue text. Here, \(x^R_{T-1,1}\) represents the first word of the response at time \(T-1\), and the remaining terms follow the same convention.

Emotional State Inference Unit

The process of dialogue interaction can be viewed as a time series. Drawing inspiration from text emotion prediction and empathic response generation, we have developed an emotional state inference unit based on the LSTM time series model. The primary objective is to leverage limited dialogue information and personality traits to predict and generate future emotional states.

To enhance the performance of the LSTM model, we have made significant improvements to the input gate in our Emotional State Inference Unit (ESIU). These enhancements cater to the input of the long-term, medium-term, and short-term factors that influence emotional states. Additionally, we have introduced an emotional forgetting mechanism to control how much of the previous moment's state is retained. Furthermore, we have designed an emotion regulation mechanism to control the impact of personality and emotions on the current emotional state. Finally, to enable more accurate emotion generation inference, we have implemented an emotion feedback mechanism on the output gate. The unit structure is illustrated in Fig. 2.

Fig. 2 The structure of ESIU, which includes emotion forgetting mechanism and emotion regulation mechanism

\(h_{T-1}\) represents the hidden state of the unit at the previous time, reflecting the short-term emotional information of the agent. \(s_{T-1}\) denotes the emotional state of the cell at the previous time, capturing the medium-term sentiment information of the agent. \(c_{T-1}\) signifies the unit state of the cell, representing the long-term personality information. \(e^U_T\) corresponds to the user's emotional state at time T. \(P_S\) stands for the agent's personality traits. \(e^R_T\) represents the agent's emotional state at time T, which is calculated from the state passed on by the unit from the previous moment and the input at the current moment.

Specifically, we define the improved input gate as Eq. 2:

$$\begin{aligned} i_T=\sigma (W_i*[h_{T-1},P_S,e^U_T]+b_i) \end{aligned}$$
(2)

The output gate is calculated by Eq. 3:

$$\begin{aligned} o_T=\sigma (W_o*[h_{T-1},P_S,e^U_T]+b_o) \end{aligned}$$
(3)

In Eqs. 2 and 3, \(\sigma\) is the sigmoid activation function, \(W_*\) denotes parameter weights, and \(b_*\) denotes parameter biases.
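A minimal NumPy sketch of the modified gates follows; the dimensions are illustrative assumptions, and the same function serves Eq. 2 and Eq. 3 with different weight matrices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate(W, b, h_prev, P_S, e_user):
    # Eqs. 2-3: concatenate [h_{T-1}, P_S, e^U_T], then an affine map
    # followed by the sigmoid activation.
    z = np.concatenate([h_prev, P_S, e_user])
    return sigmoid(W @ z + b)

rng = np.random.default_rng(0)
hidden, traits, emotions = 8, 5, 6   # dims: hidden state, Big-Five, 6 emotions
W_i = rng.normal(size=(hidden, hidden + traits + emotions))
b_i = np.zeros(hidden)
i_T = gate(W_i, b_i, rng.normal(size=hidden), rng.normal(size=traits),
           rng.normal(size=emotions))
```

Each gate output lies in (0, 1) element-wise, acting as a soft mask over the state it scales.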

Emotion Forgetting Mechanism

The generation of emotional states is influenced by both personality and the emotional state at the previous time. Therefore, it is crucial to assess the effect of the information transmitted from these two sources on the current emotional state. However, the native forget gate of the LSTM falls short of processing both \(c_{T-1}\) and \(s_{T-1}\) simultaneously. To address this limitation, we design a forgetting mechanism (FM) that replaces the forget gate of the LSTM. This mechanism enables reasonable control over medium- and long-term emotional memory, ensuring a more comprehensive integration of information from \(c_{T-1}\) and \(s_{T-1}\).

Fig. 3 The structure of emotion forgetting mechanism

The structure of the forgetting mechanism is shown in Fig. 3, and it is calculated by Eqs. 4 to 7:

$$\begin{aligned} FM_{out}=FM([I_1,I_2])=\alpha *I_1 + \beta *\gamma \end{aligned}$$
(4)
$$\begin{aligned} \alpha =\sigma (W_\alpha *[I_1,I_2]+b_\alpha ) \end{aligned}$$
(5)
$$\begin{aligned} \beta =\sigma (W_\beta *[I_1,I_2]+b_\beta ) \end{aligned}$$
(6)
$$\begin{aligned} \gamma =tanh(W_\gamma *[I_1,I_2]+b_\gamma ) \end{aligned}$$
(7)

where \(W_*\) again denotes parameter weights and \(b_*\) parameter biases.
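Eqs. 4 to 7 can be sketched numerically as below. This is a minimal NumPy reading of the mechanism, with illustrative dimensions, rather than the released implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forgetting_mechanism(I1, I2, W_a, b_a, W_b, b_b, W_g, b_g):
    # Eqs. 5-7: two sigmoid gates (alpha, beta) and a tanh candidate (gamma),
    # all computed from the concatenation [I1, I2].
    z = np.concatenate([I1, I2])
    alpha = sigmoid(W_a @ z + b_a)   # how much of I1 to retain
    beta = sigmoid(W_b @ z + b_b)    # how strongly to admit the candidate
    gamma = np.tanh(W_g @ z + b_g)   # candidate mixing both memories
    # Eq. 4: FM_out = alpha * I1 + beta * gamma
    return alpha * I1 + beta * gamma

rng = np.random.default_rng(1)
d = 4
I1, I2 = rng.normal(size=d), rng.normal(size=d)
W_a, W_b, W_g = (rng.normal(size=(d, 2 * d)) for _ in range(3))
zeros = np.zeros(d)
fm_out = forgetting_mechanism(I1, I2, W_a, zeros, W_b, zeros, W_g, zeros)
```

Unlike a single LSTM forget gate, the output depends on both memory streams, since the gates and the candidate are all conditioned on \([I_1, I_2]\).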

Emotion Regulation Mechanism

The interplay between personality and emotion affects the generation of emotional states. Controlling the extent to which these factors influence emotional state generation poses a challenge. To address this, we devised an emotion regulation mechanism (RM) that governs the influence of medium-term and long-term emotional memory on the updating of the current emotion.

The structure of the emotion regulation mechanism is depicted in Fig. 4, and its computation is given by Eqs. 8 and 9.

$$\begin{aligned} RM_{out}=RM([I_1,I_2])=tanh((1-g)*I_1+g*I_2) \end{aligned}$$
(8)

where g is defined as:

$$\begin{aligned} g=\sigma ([tanh(W_1*I_1+b_1),tanh(W_2*I_2+b_2)]) \end{aligned}$$
(9)

As mentioned previously, \(W_*\) represents the weight of the parameter, while \(b_*\) corresponds to the bias value of the parameters. Additionally, \(I_1\) denotes the first input parameter, and \(I_2\) signifies the second input parameter.

Fig. 4 The structure of emotion regulation mechanism
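Eqs. 8 and 9 can be sketched in NumPy as below. Note one assumption: Eq. 9 applies the sigmoid to a concatenation of the two transformed inputs, so the projection `W_g` is a hypothetical addition of ours that makes the gate g match the dimension of \(I_1\)/\(I_2\); the paper leaves this step implicit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regulation_mechanism(I1, I2, W1, b1, W2, b2, W_g):
    # Eq. 9: gate g computed from tanh-transformed copies of both inputs;
    # W_g is a hypothetical projection back to the dimension of I1/I2.
    z = np.concatenate([np.tanh(W1 @ I1 + b1), np.tanh(W2 @ I2 + b2)])
    g = sigmoid(W_g @ z)
    # Eq. 8: a convex-style blend of the two memories, squashed by tanh.
    return np.tanh((1.0 - g) * I1 + g * I2)

rng = np.random.default_rng(0)
d = 4
I1, I2 = rng.normal(size=d), rng.normal(size=d)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_g = rng.normal(size=(d, 2 * d))
rm_out = regulation_mechanism(I1, I2, W1, np.zeros(d), W2, np.zeros(d), W_g)
```

The gate g interpolates between the two memories, so the mechanism can lean on either the long-term or the medium-term stream per dimension.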

To summarize, the emotional state \(e^R_T\) and parameter update of the dialogue agent generated at time T can be calculated using Eqs. 10 to 16, as follows:

$$\begin{aligned} M_T=tanh(W_m*[h_{T-1},P_S,e^U_T]+b_m) \end{aligned}$$
(10)
$$\begin{aligned} N_T=FM_s*s_{T-1}+i_T*M_T \end{aligned}$$
(11)
$$\begin{aligned} c_T=RM_c*c_{T-1}+i_T*M_T \end{aligned}$$
(12)
$$\begin{aligned} R_T=RM([c_T,N_T]) \end{aligned}$$
(13)
$$\begin{aligned} e^R_T=h_T=o_T*R_T \end{aligned}$$
(14)
$$\begin{aligned} r_T=\sigma (W_r*h_T+b_r) \end{aligned}$$
(15)
$$\begin{aligned} s_T=r_T*N_T \end{aligned}$$
(16)
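Combining the gates with the FM and RM updates, Eqs. 10 to 16 describe one full ESIU step. The NumPy sketch below is an assumption-laden reading of that recurrence, not the released implementation: in particular, `FM_s` and `RM_c` are modelled as sigmoid gates over the previous sentiment and personality memories, and the RM blend in Eq. 13 uses a hypothetical projection `W_g`, since the paper leaves the exact inputs of these gates implicit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def esiu_step(h_prev, s_prev, c_prev, P_S, e_user, params):
    """One ESIU update following Eqs. 2-3 and 10-16 (simplified sketch)."""
    (W_i, b_i, W_o, b_o, W_m, b_m,
     W_f, b_f, W_c, b_c, W_r, b_r, W_g) = params
    z = np.concatenate([h_prev, P_S, e_user])
    i_T = sigmoid(W_i @ z + b_i)                  # Eq. 2
    o_T = sigmoid(W_o @ z + b_o)                  # Eq. 3
    M_T = np.tanh(W_m @ z + b_m)                  # Eq. 10
    mem = np.concatenate([s_prev, c_prev])
    FM_s = sigmoid(W_f @ mem + b_f)               # forgetting gate (assumed form)
    RM_c = sigmoid(W_c @ mem + b_c)               # regulation gate (assumed form)
    N_T = FM_s * s_prev + i_T * M_T               # Eq. 11
    c_T = RM_c * c_prev + i_T * M_T               # Eq. 12
    g = sigmoid(W_g @ np.concatenate([c_T, N_T]))
    R_T = np.tanh((1.0 - g) * c_T + g * N_T)      # Eq. 13 (RM blend)
    h_T = o_T * R_T                               # Eq. 14: e^R_T is read from h_T
    r_T = sigmoid(W_r @ h_T + b_r)                # Eq. 15
    s_T = r_T * N_T                               # Eq. 16
    return h_T, s_T, c_T

H = 6  # illustrative hidden size; P_S has 5 dims, e_user has 6
rng = np.random.default_rng(0)
def W(r, c):
    return 0.1 * rng.normal(size=(r, c))
params = (W(H, H + 11), np.zeros(H), W(H, H + 11), np.zeros(H),
          W(H, H + 11), np.zeros(H), W(H, 2 * H), np.zeros(H),
          W(H, 2 * H), np.zeros(H), W(H, H), np.zeros(H), W(H, 2 * H))
h_T, s_T, c_T = esiu_step(np.zeros(H), np.zeros(H), np.zeros(H),
                          rng.normal(size=5), rng.normal(size=6), params)
```

Iterating this step over the turns of a dialogue threads the short-, medium-, and long-term memories (\(h\), \(s\), \(c\)) through time.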

Training

During PEEGM training, the dialogue history, the personality traits of the dialogue agent, and the sentiment and emotion sequences of the multi-turn dialogue are input, and the model outputs the generated emotional state. The objective function of ESIU is shown in Eq. 17, and the model minimizes the cross-entropy loss (Eqs. 18 and 19) to optimize generation accuracy.

$$\begin{aligned} O(\theta )=\sum _{m=1}^{T}(P(e^R_m|U^{U/R}_{1:m};P_S;S^{U/R}_{1:m};e^{U/R}_{1:m})) \end{aligned}$$
(17)
$$\begin{aligned} L(\theta )=-\sum _{c=1}^{6}y_{ic}log(p_{ic}) \end{aligned}$$
(18)
$$\begin{aligned} y_{ic}=\left\{ \begin{aligned}&1, if \quad c=e_i \\&0, else \end{aligned} \right. \end{aligned}$$
(19)

In the above equation, \(y_{ic}\) represents an indicator variable. It takes the value of 1 if the emotional state generated in response sentence i matches the true emotional state, and 0 otherwise. The variable \(p_{ic}\) denotes the decision probability associated with the emotion state c for reply sentence i. Here, c serves as the index of emotional states, while \(e_i\) represents the true emotion state index of the reply sentence i.
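Since \(y_{ic}\) is a one-hot indicator, Eq. 18 reduces per sample to the negative log-probability of the true class. A minimal numerical check with hypothetical probabilities over the six emotion classes:

```python
import numpy as np

def emotion_ce_loss(probs, true_idx):
    # Eqs. 18-19: y_ic is 1 only for the true emotion class, so the sum
    # over the 6 classes collapses to -log p of that class.
    return -np.log(probs[true_idx])

probs = np.array([0.1, 0.6, 0.1, 0.1, 0.05, 0.05])  # hypothetical p_ic
loss = emotion_ce_loss(probs, 1)                    # true class: index 1
```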

Experiments

Dataset

In our research on emotion generation in dialogue systems, we focused on utilizing the PELD (Personality Emotion Line Dataset), which has been annotated with Big-Five personality traits. PELD is an English dataset consisting of 6510 dialogue tuples, carefully crafted by PolyU from the script of the popular TV show Friends. Each dialogue tuple comprises 1.5 turns and involves 6 main characters, whose personality traits are labeled according to the Big-Five scheme.

To ensure proper evaluation, we divided PELD into three sets: a training set (Trn), a validation set (Val), and a test set (Tes), maintaining an 8:1:1 ratio, respectively. For more specific details, please refer to Table 2 for the comprehensive statistics of these sets.
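The 8:1:1 split can be sketched as follows; the shuffling seed and the rounding of split sizes are assumptions of this sketch, not details reported in the paper.

```python
import random

def split_811(samples, seed=42):
    # Shuffle indices reproducibly, then cut 80% / 10% / 10%.
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n = len(samples)
    n_trn, n_val = int(0.8 * n), int(0.1 * n)
    trn = [samples[i] for i in idx[:n_trn]]
    val = [samples[i] for i in idx[n_trn:n_trn + n_val]]
    tes = [samples[i] for i in idx[n_trn + n_val:]]
    return trn, val, tes

trn, val, tes = split_811(list(range(6510)))  # 6510 dialogue tuples in PELD
```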

The results in the literature [7] show that the effect is not obvious on the original PELD dataset; therefore, in this paper, the PELD dataset is re-edited so that the number of conversation rounds is expanded to 2.5. We refer to the resulting dataset as R-PELD, and subsequent experiments are carried out on it.

Table 2 PELD dataset segmentation and emotions distribution

Baselines

To enhance the credibility of this study, we conducted a thorough investigation and found limited existing research on this specific task. Therefore, we selected a set of well-established time series models and text emotion prediction methods that share similarities with our task as baselines for comparison. This approach enables us to rigorously evaluate the performance and effectiveness of our proposed approach in relation to these established models.

  • LSTM is a popular model for time series analysis, particularly for handling long-range dependencies and addressing the vanishing gradient problem.

  • BiLSTM (BLSTM) is an improved version of LSTM that incorporates bidirectional context information, resulting in stronger modeling capabilities.

  • BiLSTM+ATT (BL+ATT) combines BiLSTM with the attention mechanism, which can effectively extract important information from context. It performs well in several natural language processing tasks.

  • GRU is a simplified and more efficient variant of LSTM that finds extensive applications in natural language processing (NLP).

  • Transformer (TRANS) is a sequence model based on the attention mechanism. Due to its strong sequence modeling abilities, it is widely used across various domains.

  • IDS-EC is an interactive double-state emotional cell used for text emotion prediction. It is an improvement on the LSTM model.

  • PET is a text emotion prediction model which takes personality traits into account. It models and predicts both personality and emotions in the VAD (valence-arousal-dominance) space.

Evaluation Metrics

To evaluate the performance of PEEGM on emotion generation for dialogue systems, we used a combination of automatic and manual evaluation. Although some aspects of emotion generation can be assessed by automated metrics, the subjective nature of emotions means that manual evaluation is also required to ensure a comprehensive assessment of PEEGM's effectiveness.

Automatic Evaluation

In the context of evaluating emotion generation, precision (P), recall (R), and F1-score (F1) are used as metrics to assess the performance of the classifier.

P measures the proportion of correctly generated positive samples out of all samples that the model generates as positive. It indicates how accurate the model is when generating emotions.

R calculates the ratio of correctly generated positive samples to the total number of actual positive samples. It captures the model's ability to recover all the emotions present in the data.

F1 is a combined measure that takes into account both precision and recall. It is calculated as the harmonic mean of these two metrics. The F1 score provides an overall assessment of the model’s performance, balancing both precision and recall.

All three metrics take values between 0 and 1, with higher values indicating better performance. Together, they evaluate how well the model generates emotions correctly while minimizing false positives and capturing as many true emotions as possible.
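Per-class P, R, and F1 can be computed directly from true/false positives and false negatives. A small self-contained sketch with hypothetical predictions over the emotion labels:

```python
def prf1(pred, gold, label):
    # Count true positives, false positives, false negatives for one class.
    tp = sum(p == label and g == label for p, g in zip(pred, gold))
    fp = sum(p == label and g != label for p, g in zip(pred, gold))
    fn = sum(p != label and g == label for p, g in zip(pred, gold))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0   # harmonic mean of P and R
    return p, r, f1

pred = ["Joy", "Anger", "Joy", "None"]   # hypothetical model outputs
gold = ["Joy", "Joy", "Joy", "None"]     # hypothetical ground truth
p, r, f1 = prf1(pred, gold, "Joy")
```

In practice these per-class scores are averaged over the six emotion classes (e.g. macro-averaging) to obtain the table-level numbers.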

Manual Evaluation

Emotional state generation is subjective, and there are no reliable automatic metrics that fully capture it. To this end, this paper conducts manual evaluation along two dimensions: matching degree and appropriateness. Specifically, 5 reviewers were selected from the research group to score the samples generated by the models along these two dimensions.

Matching Score (M-S)

is used to evaluate the matching degree of the generated emotional state with the current personality: very matching: 2, matching: 1, not matching: 0;

Appropriateness Score (A-S)

is used to evaluate whether the generated emotional state is appropriate: very appropriate: 2, appropriate: 1, inappropriate: 0. The final result is normalized to the mean.

Table 3 Results of PEEGM in emotion-level experiment
Table 4 Results of PEEGM in sentiment-level experiment
Table 5 Results of manual evaluation

Implementation Details

We implemented all models using PyTorch [22] and trained them on an NVIDIA RTX 2060 SUPER GPU. The hyperparameters of PEEGM are set as follows: embedding dim = 64, hidden dim = 256, dropout = 0.2, number of layers = 1, batch size = 128, max sequence length = 64. We choose AdamW as the optimizer with lr = 0.0001. Each model was trained for 50 epochs.
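For reference, the reported hyperparameters can be collected in a single configuration dictionary (a convenience sketch for reproducibility, not the authors' code; the key names are our own):

```python
# Hyperparameters as reported for PEEGM, gathered in one place.
CONFIG = dict(
    embedding_dim=64,
    hidden_dim=256,
    dropout=0.2,
    num_layers=1,
    batch_size=128,
    max_seq_len=64,
    optimizer="AdamW",
    lr=1e-4,
    epochs=50,
)
```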

Discussion

Comparative Experiments

We conducted comparative experiments on emotion generation using the PEEGM and multiple baseline models on the PELD dataset. The results of each method were evaluated based on sentiment-level and emotion-level metrics including precision (P), recall (R), and F1 score (F1).

Overall, our findings, as shown in Tables 3 and 4, indicate that the IDS-EC model outperforms the LSTM, BLSTM, BL+ATT, GRU, and TRANS models, which do not include personality information, in terms of P, R, and F1. This demonstrates the ability of the interactive double-state model to capture the nuances of emotion transmission. Additionally, both the PET and PEEGM models, which incorporate personality information, outperform the IDS-EC model overall, suggesting that personality information can enhance emotion generation. Notably, the PEEGM model outperforms PET at both levels, with average improvements of 1.14%, 3.98%, and 0.42% at the emotion level and 5.0%, 1.73%, and 2.1% at the sentiment level, respectively. This improvement can be attributed to PEEGM's consideration of three influential factors, personality, sentiment, and emotion, throughout multiple rounds of dialogue. In contrast, PET does not take all of these factors into account, highlighting the effectiveness of PEEGM in the emotion generation task.

To further analyze the differences across emotion categories, we examine the performance of the models on each category, as presented in Table 3. In most categories, PEEGM achieves the best results. However, PET shows slight improvements of 0.4% and 0.2% in R for “surprise” and “anger,” respectively, and a 0.1% increase in F1 for “surprise.” This could be attributed to PET benefiting from RoBERTa’s strong modeling ability, thereby performing better in emotional states with a larger proportion of training samples. Regarding sentiment-level evaluation, Table 4 demonstrates that PEEGM outperforms other baseline models overall. However, the R-value for positive emotion is slightly lower than that of PET. Again, this improvement can be attributed to RoBERTa’s modeling ability. Despite the lower performance in some emotional states, PEEGM still surpasses the random guessing probability of 0.17. Overall, PEEGM outperforms other baseline models, confirming its superiority in emotion generation.

Furthermore, we observed that data imbalance has a significant impact on the results. The model performs notably better in classes with more training examples, such as “none,” “joy,” and “anger.” Moreover, the results for “neutral” are better than those for “positive” and “negative,” aligning with the distribution of emotional states in R-PELD.

Table 6 Results of t-test based significance test for PEEGM vs. baselines
Table 7 Ablation study results of PEEGM in emotion-level
Table 8 Ablation study result of PEEGM in sentiment-level

Due to the subjective nature of emotions, manual subjective evaluation becomes essential. We employed 100 sets of multi-round dialogue samples from the test set to generate emotional states. Subsequently, five reviewers were invited to provide subjective manual scoring, and the evaluation results were normalized and presented in Table 5.

Regarding the A-S metric, both the PET and PEEGM models significantly outperformed baseline models that lacked personality information, highlighting the facilitation of appropriate emotional state generation through personality information. Notably, the PEEGM model exhibited a remarkable 1.2% improvement compared to the PET model, showcasing its exceptional ability in generating suitable emotional states.

In terms of the M-S metric, the PET model attained the highest performance, surpassing the PEEGM model by 0.2%. This demonstrates that the PET model, based on the VAD space, excels in modeling personality information, especially when data availability is limited.

In summary, the PEEGM model ensures both appropriate emotional state generation and a certain degree of personality matching. It outperforms other baseline models, making it the top-performing model in the emotion generation task.

To further analyze the differences between the experimental results, we performed a significance analysis based on the t-test for both automatic and manual evaluation against the different baselines; the results are shown in Table 6. From Table 6, the test value of p > 0.05 between PEEGM and the other best-performing methods shows that the difference is not statistically significant, i.e., the actual improvement is marginal, and we will look for more effective improvements in future work. However, combined with Tables 3, 4, and 5, PEEGM still holds a certain overall advantage, which supports the validity of the method in this paper.

Ablation Study

Table 9 We extracted several sets of multi-party dialogue cases from Friends for the case study. Due to space limitations, we only tested three characters: Chandler, Joey, and Monica. Others will be explored in detail in future studies

To gain further insights into the influence of different modules on the model’s performance, we conducted ablation experiments. The outcomes presented in Tables 7 and 8 shed light on the impact of removing specific modules from the model.

Firstly, when we removed the guidance provided by the FM module, we observed a slight increase in the emotion-level performance of the w/o FM model compared to PEEGM, with an average increase of 0.67%, 0.6%, and 0.05%. Additionally, the sentiment-level performance exhibited mean changes of 0.2%, 0.09%, and \(-\)3.1%. These results suggest that the FM module might overlook certain significant emotional information, potentially limiting the model's ability to fully capture and express emotions.

Furthermore, we explored the impact of removing personality traits from the w/o P model. The findings demonstrate a significant decrease in emotion-level performance, with an average decline of 1.89%, 2.18%, and 3.73%. Similarly, sentiment-level analysis showed an average decrease of 4.28%, 3.6%, and 2.52%. These results highlight the crucial role played by the agent’s personality traits in guiding the generation of its emotional state. By neglecting these traits, the model’s ability to convey emotions effectively is noticeably compromised.

Lastly, we investigated the effect of removing the RM module from the model. The results obtained from the w/o RM model indicate an average decrease of 3.3%, 1.62%, and 1.54% in emotion-level analysis, with a corresponding average decrease of 1.16%, 1.77%, and 2.34% in sentiment-level analysis. These findings underline the significance of long-term and medium-term emotional memory in updating the agent’s emotional state. The RM module plays a crucial role in the retention of emotional experiences, enabling the model to generate a more anthropomorphic emotional state. Its absence impairs the model’s ability to understand and respond to emotions anthropomorphically.

In conclusion, the ablation experiments provide valuable insights into the contributions of different modules to the overall performance of PEEGM. The results demonstrate that the FM module might overlook important emotional information, the agent’s personality traits significantly influence its emotional expression, and the RM module is essential for accurate emotional understanding. These findings emphasize the necessity of incorporating these modules to enhance the model’s ability to exhibit anthropomorphic emotional responses.

Case Study

To illustrate the model’s ability to generate emotions, this study presents a series of case demonstrations using the R-PELD dataset and the PEEGM. The emotional state responses under different personality settings in various dialogue scenarios are displayed in Table 9.

The case demonstrations align with the prescribed personality traits to some extent, as per the definition of the Big-Five personality factors. This indicates the effectiveness and validity of the model. However, certain cases, such as “Joey” displaying “sadness” and “Chandler” exhibiting “anger,” did not generate the expected emotions. We attribute this to two potential reasons: (1) limited knowledge learning due to uneven availability of data resources and (2) insensitivity of the model structure towards certain sentiments. Future enhancements will focus on addressing these aspects.

Conclusion and Future Work

Existing data-driven methods cannot fundamentally ensure the consistency, rationality, and appropriateness of the emotions expressed by a dialogue system, which reduces the user's willingness to interact. To solve this problem, inspired by psychology, this paper proposes an emotion generation task for dialogue systems and designs a personality-enhanced emotion generation model (PEEGM). In addition, to accomplish and verify this work, we carried out experiments on the R-PELD dataset. The experimental results show that PEEGM can actively generate appropriate emotional states based on specific personality traits in dialogue scenarios. They also confirm that personality, sentiment, and emotion affect the generation of emotional states on different time scales.

In conclusion, this paper contributes to the field of conversation systems by introducing a novel framework that combines personality modeling with emotion generation. The integration of personality enhances the authenticity and effectiveness of emotional responses, leading to more immersive and empathetic conversations. This work opens up new possibilities for developing intelligent dialogue systems that better understand and respond to users’ emotional needs and individuality.

Fine-grained emotional expression helps improve user interaction. In future work, we will incorporate emotional intensity for more nuanced emotion generation. In addition, combining prior knowledge is also one of the directions for our future exploration.