1 Introduction

Multimedia data, with its diverse forms of expression and distinctive aesthetic appeal, is widely used in fields such as business communication [1, 4, 17]. In the era of big data, manual assessment of the aesthetic appeal of multimedia data is not only inefficient but also costly, creating an urgent need for automated aesthetic evaluation methods [3, 5, 18, 25]. However, traditional aesthetic evaluation methods, particularly those focused on image aesthetics, typically consider only the overall perceptual experience of humans while neglecting individual preferences and subjective characteristics. The aesthetic quality of an image is largely subjective: different individuals may assign different scores to the same image depending on their personal preferences or emotions. Consequently, generic image aesthetic assessment approaches tend to overlook subjective factors such as individual feelings and personality, and therefore lack personalization. Furthermore, design experts have long recognized the close relationship between image aesthetics and brand image design, so combining image aesthetic evaluation with brand image design is a natural way to further strengthen brand image shaping. A deep learning-based automated approach that accounts for individual differences and subjective factors is therefore needed to evaluate the aesthetic appeal of multimedia data more comprehensively and accurately, and integrating such evaluation with brand image design promises better results.

Although deep learning models have achieved success in various fields [8, 10, 12], this success has come at the cost of increasingly large models that require substantial computational and data resources. In practical computer vision applications, however, especially edge applications such as mobile devices, resource constraints must be taken into account: limited computing power, strict real-time requirements, and insufficient data. Resource-constrained deep learning theories, methods, and applications therefore deserve adequate attention. Deep learning can be made more efficient for computer vision in several ways, such as reducing the required dataset size, the memory footprint, or the training and inference time. Most existing research focuses on improving the efficiency of deep learning methods in general; however, the resources available on real-world mobile devices vary significantly, and an approach that is effective under one resource budget may be entirely ineffective under another.

Therefore, this paper proposes Emo-AEN, which combines brand image design with attention mechanisms for image aesthetic evaluation and fully considers the subjective features of brand image design. The key focus of Emo-AEN lies in exploring the intrinsic relationship between brand image design and image aesthetics. Attention mechanisms provide a promising solution: they filter a small set of important information out of a large amount of data, enabling deep learning algorithms to analyze and concentrate on this crucial information. The self-attention mechanism, a variant of the attention mechanism, better captures the internal correlations of data or features and reduces reliance on external information. Inspired by this, we design a fusion feature extraction algorithm based on brand image and use self-attention to merge image aesthetics and design information; a model lightweighting strategy is further applied to ensure the efficiency and applicability of the algorithm, particularly in resource-constrained mobile device environments.

The main contributions of the study can be listed as follows:

  • By extracting the fused information of image aesthetics and brand image, we propose Emo-AEN, a brand image-based fusion approach for image aesthetic evaluation.

  • We introduce self-attention mechanisms to capture the intrinsic information in the fusion of image aesthetics and design elements.

  • Experimental results demonstrate the competitiveness of our method compared to existing approaches. Additionally, a comprehensive discussion on the relationship between brand image design and image aesthetics is provided.

The remainder of this paper is organized as follows. Section 2 reviews related work, and Section 3 elaborates on the framework and details of the proposed method. Section 4 presents the experimental results and compares them with baseline methods. Finally, Section 5 concludes the paper.

2 Related work

2.1 Image aesthetic assessment

Early image aesthetic assessment algorithms often relied on carefully selected features based on expert knowledge. Yan et al. [2] assessed image aesthetics by leveraging photographic expertise to construct high-level features such as color, contrast, exposure, and simplicity. Luo et al. [11] created a large-scale dataset for aesthetic visual analysis (AVA) comprising over 250,000 aesthetically tagged images. While these methods achieved success in image aesthetic assessment, they suffer from certain limitations: on the one hand, designing handcrafted features requires expertise in photography; on the other hand, simple features alone may not fully capture the nuances of image aesthetics, which limits the generalizability of these approaches. Subsequently, the progress of image aesthetic assessment has been propelled by the advent of Convolutional Neural Networks (CNNs). Kong et al. [13] proposed a ranking network that incorporates adaptive image attributes and content information for aesthetic regression ranking. Jin et al. [14] introduced a novel CNN model, ILGNet, which connects Inception modules with shallow and global features to classify images by aesthetic quality. Compared with manual feature design, deep learning methods are more straightforward and yield a more comprehensive representation of images, greatly improving the accuracy of image aesthetic assessment.

2.2 Image emotion

Research on image emotion classification holds great potential and research value. During the initial stages, the classification of image emotions heavily relied on manually crafted features.

Although manual feature extraction and classification techniques have advantages when only a few samples are available, they become increasingly limited as the amount of data grows. With the growing availability of Internet data and increases in computing power, deep learning has gradually become the mainstream technology for image classification. Compared with expert-designed handcrafted features, deep learning methods can automatically extract effective features.

2.3 Attention mechanism

The concept of the attention mechanism was originally introduced in the domain of visual image processing. Subsequently, similar extensions based on the attention mechanism have been adopted in various natural language processing tasks [16].

The attention mechanism allows a computer vision system to concentrate quickly and effectively on key areas, mirroring the selective focus of the human visual system. Its fundamental principle is to identify pertinent information while suppressing irrelevant information, and the outcome is commonly visualized as a probability map or represented as a probability feature vector. Attention mechanisms can be broadly categorized into three types: the channel attention model, the spatial attention model, and the mixed channel-spatial attention model. The attention mechanism can be used to weight information and to promote the end-to-end fusion of different information, such as multi-modal and multi-source information fusion.

Fig. 1 Framework of the proposed method

3 Proposed method

We propose Emo-AEN, a method based on the internal fusion of emotion and aesthetic features. The framework consists of five parts: the backbone network module, the internal fusion module, the self-attention module, the prediction module, and the pruned model. First, aesthetic and emotional features are extracted from images by the image aesthetics and emotion backbone networks. The features extracted by the two backbones are then fused internally, and the fused features are fed into the self-attention mechanism to mine the inherent relationship between them. Next, the features are passed to the prediction network to complete the aesthetic prediction task. Finally, the model undergoes fine-grained pruning to reduce redundancy.

3.1 Backbone network

The architecture of the network proposed in this paper is shown in Fig. 1, which relies on ResNet as the backbone network to extract features.

ResNet [20] has been widely used in various tasks, including image classification, object detection, semantic segmentation, and others. Owing to its excellent feature extraction performance, the pre-trained ResNet50 is used in the proposed method to separately extract the emotional and aesthetic features of an image. ResNet50 is composed of four basic modules, named layer_1, layer_2, layer_3, and layer_4. Each basic module is a stack of residual blocks, known as Bottlenecks, each of which consists of three convolutions (\(1\times 1\), \(3\times 3\), and \(1\times 1\)) and a skip connection.
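As an illustration only (not the authors' released implementation), the following PyTorch sketch shows how such a stage-wise ResNet50 backbone could be built with torchvision (version 0.13 or later assumed for the weights API); the tensor shapes in the comments correspond to a \(224\times 224\) input.

```python
# Minimal sketch: a pretrained ResNet50 backbone returning per-stage features.
import torch
import torchvision.models as models


class ResNet50Backbone(torch.nn.Module):
    """Returns the feature maps of layer_1-layer_4 for later fusion."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.stem = torch.nn.Sequential(resnet.conv1, resnet.bn1,
                                        resnet.relu, resnet.maxpool)
        self.layer1, self.layer2 = resnet.layer1, resnet.layer2
        self.layer3, self.layer4 = resnet.layer3, resnet.layer4

    def forward(self, x):
        x = self.stem(x)
        f1 = self.layer1(x)   # (B,  256, 56, 56) for a 224x224 input
        f2 = self.layer2(f1)  # (B,  512, 28, 28)
        f3 = self.layer3(f2)  # (B, 1024, 14, 14)
        f4 = self.layer4(f3)  # (B, 2048,  7,  7)
        return f1, f2, f3, f4


# Two independent backbones, one for aesthetics and one for emotion.
aesthetic_net, emotion_net = ResNet50Backbone(), ResNet50Backbone()
```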

3.2 Internal fusion

To explore the correlation between emotion and image aesthetics, we adopt an internal fusion mechanism. The images are first passed through the image aesthetics and emotion backbone networks, and the features within each basic module are standardized to a uniform size. The features acquired at each layer are then fused internally by point-wise multiplication, which can be mathematically represented as:

$$\begin{aligned} a\cdot b=\sum _{i=1}^{n}a_{i}b_{i}=a_{1}b_{1}+a_{2}b_{2}+\cdots +a_{n}b_{n}, \end{aligned}$$
(1)

where the vectors a and b are:

$$\begin{aligned} a=[a_{1},a_{2},\ldots ,a_{n}], \end{aligned}$$
(2)
$$\begin{aligned} b=[b_{1},b_{2},\ldots ,b_{n}]. \end{aligned}$$
(3)
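For concreteness, the sketch below illustrates one plausible reading of this fusion step: the emotion feature map is resized to the spatial size of the aesthetic feature map and the two are combined by element-wise (Hadamard) multiplication, which applies Eq. (1) position by position. The tensor shapes and the bilinear resizing are assumptions made for illustration, not details fixed by the paper.

```python
# Illustrative sketch of the internal fusion step for feature maps of shape (B, C, H, W).
import torch
import torch.nn.functional as F


def internal_fusion(aes_feat: torch.Tensor, emo_feat: torch.Tensor) -> torch.Tensor:
    # Match spatial sizes before fusing (assumed bilinear resizing).
    if emo_feat.shape[-2:] != aes_feat.shape[-2:]:
        emo_feat = F.interpolate(emo_feat, size=aes_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
    return aes_feat * emo_feat  # element-wise product, one reading of Eq. (1)


aes = torch.randn(2, 256, 56, 56)   # e.g. layer_1 aesthetic features
emo = torch.randn(2, 256, 56, 56)   # e.g. layer_1 emotion features
fused = internal_fusion(aes, emo)   # (2, 256, 56, 56)
```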

3.3 Self-attention mechanism

Self-attention mechanisms are an important variant of attention mechanisms in deep learning. Their purpose is to decrease reliance on external information and make maximal use of the information inherent in the features themselves. Originally used in natural language processing to handle inputs consisting of multiple vectors of varying sizes with potential relationships between them, the self-attention mechanism has also found success in computer vision tasks such as object detection and semantic segmentation.

The Transformer is one of the most prominent examples of the self-attention mechanism. Its architecture is composed of two main components, an encoder and a decoder. Each encoder block contains a multi-head attention sublayer and a feedforward neural network sublayer, and the full encoder stacks N such blocks. During self-attention, the original feature map is projected into three separate branches: Q (Query), K (Key), and V (Value). First, the correlation weight matrix between Q and K is computed. Second, the weight matrix is normalized with a softmax operation. Finally, the weights are applied to V, which effectively incorporates global context into the modeling process. Self-attention can be regarded as a refinement of the attention mechanism rather than a departure from it, since it reduces the dependence on external information and is more effective at extracting internal correlations within data or features. The calculation is as follows:

$$\begin{aligned} Attention(Q,K,V) = softmax\left( \frac{QK^{T}}{\sqrt{d_k}}\right) V, \end{aligned}$$
(4)

where \(d_k\) represents the key’s dimension. The attention mechanism then focuses on different representation subspaces at different locations to establish a multi-head attention model. This model comprises multiple self-attention blocks and is computed as follows:

$$\begin{aligned} MultiHead(Q,K,V) = Concat(head_1,\cdots ,head_h)W^{o}, \end{aligned}$$
(5)

where

$$\begin{aligned} head_i=Attention(QW_{i}^{Q},KW_{i}^{K},VW_{i}^{V}), \end{aligned}$$
(6)

In Eq. 6, the weight matrices \(W_{i}^{Q}\in R^{d_{model}\times d_{k}}\), \(W_{i}^{K}\in R^{d_{model}\times d_{k}}\), \(W_{i}^{V}\in R^{d_{model}\times d_{v}}\), and \(W^{o}\in R^{hd_{v}\times d_{model}}\) are trainable parameters. Here, \(d_{v}\) denotes the dimension of the value, h is the number of parallel attention heads, and \(d_{model}\) is the dimension of the Q/K/V vectors in the multi-head attention model. As this study uses only the Transformer encoder, the decoder is not discussed further.
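The following is a didactic re-implementation of Eqs. (4)-(6) in PyTorch; the dimensions (\(d_{model}=512\), \(h=8\)) and the flattened \(7\times 7\) token layout are illustrative assumptions, and an optimized equivalent is available as torch.nn.MultiheadAttention.

```python
# Didactic multi-head self-attention matching Eqs. (4)-(6).
import math
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)  # stacks all W_i^Q
        self.w_k = nn.Linear(d_model, d_model)  # stacks all W_i^K
        self.w_v = nn.Linear(d_model, d_model)  # stacks all W_i^V
        self.w_o = nn.Linear(d_model, d_model)  # W^o

    def forward(self, x):                        # x: (B, N, d_model)
        B, N, _ = x.shape
        q, k, v = (proj(x).view(B, N, self.h, self.d_k).transpose(1, 2)
                   for proj in (self.w_q, self.w_k, self.w_v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)  # Eq. (4)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)  # Concat(head_1, ..., head_h)
        return self.w_o(out)                                 # Eq. (5)


tokens = torch.randn(2, 49, 512)      # e.g. a flattened 7x7 fused feature map
y = MultiHeadSelfAttention()(tokens)  # (2, 49, 512)
```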

3.4 Aesthetic prediction

Following the feature fusion process, the intrinsic connection between emotion and image aesthetics is explored through the self-attention mechanism. Subsequently, the features are transferred to the aesthetic prediction module, which comprises a global average pooling layer and a fully connected layer. Ultimately, the aesthetic score is determined.
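A minimal sketch of such a prediction head is shown below; the input channel count (2048, matching ResNet50's last stage) and the output dimension are assumptions made for illustration.

```python
# Sketch of the aesthetic prediction head: global average pooling + fully connected layer.
import torch
import torch.nn as nn


class AestheticHead(nn.Module):
    def __init__(self, in_channels: int = 2048, out_dim: int = 1):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.fc = nn.Linear(in_channels, out_dim)  # out_dim=1 for a score,
                                                   # or 10 for a score distribution

    def forward(self, feat):                       # feat: (B, C, H, W)
        return self.fc(self.pool(feat).flatten(1))


score = AestheticHead()(torch.randn(2, 2048, 7, 7))  # (2, 1)
```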

3.5 Model pruning

The obtained baseline model undergoes fine-grained pruning and retraining. Specifically, after fusing emotional and image aesthetic features, a complex feature representation is acquired. Pruning techniques are applied to simplify the feature extraction network by removing neurons that contribute minimally to the final prediction, thereby reducing the number of parameters and computational load.

The gradient of the loss function with respect to each neuron is first calculated to estimate its importance: the baseline model is run in a forward pass, and the gradient of the loss with respect to each neuron is then computed. A larger gradient indicates that the neuron has a greater influence on the model's predictions. Based on these importance scores, a threshold is chosen to determine which neurons to retain and which to prune. Threshold selection is a crucial step because it directly controls the extent of pruning and the resulting model performance; in this study, a percentile-based method is used to select an appropriate threshold. A lower threshold retains more neurons, while a higher threshold prunes more of them. Pruned neurons no longer affect the model's output.
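The sketch below illustrates this gradient-based, percentile-thresholded pruning in PyTorch. It operates at the weight level for simplicity, whereas the paper describes pruning at the neuron level; the pruning ratio and the use of gradient magnitude as the importance score are likewise our assumptions rather than the paper's exact recipe.

```python
# Sketch of gradient-based importance scoring with a percentile threshold.
import numpy as np
import torch


def prune_by_gradient(model: torch.nn.Module, loss: torch.Tensor, prune_ratio: float = 0.3):
    """Zero out the weights whose gradient magnitude falls below a percentile."""
    loss.backward()  # gradients from one forward pass of the baseline model
    masks = {}
    for name, param in model.named_parameters():
        if param.grad is None or param.dim() < 2:   # skip biases / BN parameters
            continue
        importance = param.grad.abs()               # larger gradient -> larger influence
        threshold = np.percentile(importance.detach().cpu().numpy(),
                                  prune_ratio * 100)  # percentile-based cut-off
        mask = (importance > threshold).float()
        param.data.mul_(mask)                       # pruned weights no longer affect the output
        masks[name] = mask                          # reused to keep them at zero during retraining
    return masks
```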

After pruning, the number of parameters in the model is reduced. To maintain the structural integrity of the model, parameter trimming and reconstruction can be performed. Specifically, after removing neurons, it is necessary to adjust the inputs and outputs of other neurons in the model accordingly. This entails reconnecting or adjusting connection weights to ensure that the flow of inputs and outputs in the model remains unaffected. Subsequently, the model’s structure is reorganized based on the layout of pruned neurons, including removing unnecessary layers, adjusting connections between layers, and even adding new layers to reconnect the pruned parts. Through parameter trimming and reconstruction, the structure of the model aligns with the pruned parameters while preserving the overall architecture of the original model. This ensures that the functionality of the model remains intact and can be used for inference or training tasks.

Table 1 Performance comparison with existing image aesthetic assessment methods on the AVA dataset

4 Experiments

In this section, we present a brief overview of several indicators used to measure the prediction performance during the experiment. Next, to evaluate the effectiveness of the proposed method, we compare it with existing methods using a relevant dataset. Additionally, we perform ablation experiments to assess the effectiveness of each component of the method, thereby demonstrating their utility.

Fig. 2 Some results of score prediction

4.1 Databases and indicators

We perform the experiments on the AVA dataset, a benchmark for image aesthetic assessment whose images are collected from the DPChallenge website. The dataset comprises over 255,000 images covering 963 challenging topics. Each image is assigned an aesthetic score label based on the average score given by the voters, and the number of voters and their individual scores can be used to derive the aesthetic score distribution. For the binary classification task, images with an average score above five are labeled "1", representing high aesthetic value, while images with an average score below five are labeled "0", representing low aesthetic value. Because some images are corrupted, a total of 252,423 images were used in the experiments. For a fair comparison with other models, the AVA dataset is partitioned as in prior studies, i.e., 80% of the dataset is allocated for training and the remaining 20% is reserved for testing.
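To make the labeling rule concrete, the sketch below (with made-up vote counts) shows how a per-image score histogram yields the three targets used in the experiments: the normalized distribution, the mean score, and the binary label.

```python
# Illustrative sketch of turning an AVA-style vote histogram into labels.
import numpy as np

votes = np.array([0, 1, 5, 20, 43, 75, 50, 12, 3, 1])  # counts for scores 1..10 (made-up)
scores = np.arange(1, 11)

distribution = votes / votes.sum()          # target for distribution prediction
mean_score = float(scores @ distribution)   # target for score regression
binary_label = int(mean_score > 5)          # 1 = high aesthetic value, 0 = low

print(mean_score, binary_label)
```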

Furthermore, emotional datasets were also utilized to ensure the fairness and feasibility of the experiment.

The Flickr and Instagram (FI) dataset contains approximately 23,000 emotion-labeled images covering eight emotions: anger, awe, disgust, amusement, excitement, fear, contentment, and sadness. The Emotion6 dataset consists of about 1,980 images collected by searching emotion keywords and synonyms on Flickr [21]; it covers six basic emotions and uses emotion distribution probabilities instead of a single emotion label. The Abstract dataset [22] comprises 279 abstract paintings that do not depict specific objects; around 230 participants categorized these paintings into eight emotional categories, and each painting was rated approximately 14 times. The Artphoto dataset draws on three sources: the International Affective Picture System (IAPS), a compilation of art photographs from photo sharing websites, and a collection of peer-rated abstract paintings. It comprises a total of 806 pictures covering eight discrete emotion categories: amusement, anger, awe, contentment, disgust, excitement, fear, and sadness [23].

To facilitate a fair comparison with existing work, this section uses several indicators to evaluate the performance of different methods: accuracy (ACC), Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC), and Earth Mover's Distance (EMD). ACC is the ratio of correctly predicted samples among all evaluated samples; a higher ACC indicates better classification performance. PLCC and SROCC measure the correlation between the subjective and predicted scores. EMD measures the model's ability to predict the aesthetic score distribution.
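A small sketch of these indicators on toy data is given below; PLCC and SROCC come directly from scipy, and the EMD shown is the cumulative-distribution form commonly used for aesthetic score distributions (assumed here with r = 1).

```python
# Sketch of the four evaluation indicators on toy predictions.
import numpy as np
from scipy import stats

y_true = np.array([5.6, 4.2, 6.8, 3.9, 5.1])   # subjective mean scores (made-up)
y_pred = np.array([5.4, 4.5, 6.5, 4.3, 5.0])   # predicted scores (made-up)

acc = np.mean((y_pred > 5) == (y_true > 5))    # binary classification accuracy
plcc = stats.pearsonr(y_true, y_pred)[0]       # linear correlation
srocc = stats.spearmanr(y_true, y_pred)[0]     # rank-order correlation


def emd(p, q):
    """Earth Mover's Distance between two score distributions (r = 1)."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).mean()
```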

Regarding network settings, this paper uses ResNet50 pretrained on ImageNet as the backbone network. ResNet50 consists of 50 layers and includes structures such as shortcuts, batch normalization, and pooling. Initializing training from a model already able to extract features from natural images yields better initial performance on the task. The AVA and FI datasets serve as the aesthetic and emotional datasets, respectively. Each image is first resized to \(224 \times 224 \times 3\) and then fed into the network. During training, the initial learning rate of the backbone network is set to 1e-4, and that of the other modules to 1e-5; the learning rate is reduced to 0.1 times its value every five epochs. The Adam optimizer is used throughout with a weight decay of 5e-4, and the mean squared error (MSE) loss is employed. The experiments are implemented in Python, and the deep learning computations are accelerated with an NVIDIA TITAN XP graphics card.
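The sketch below mirrors these optimization settings in PyTorch; the backbone and head objects are placeholders standing in for the actual Emo-AEN modules, and the number of epochs is illustrative.

```python
# Sketch of the reported optimization settings (placeholder modules).
import torch

backbone = torch.nn.Linear(8, 8)   # placeholder for the pretrained ResNet50 backbones
head = torch.nn.Linear(8, 1)       # placeholder for the fusion / prediction modules

optimizer = torch.optim.Adam(
    [{"params": backbone.parameters(), "lr": 1e-4},   # backbone learning rate
     {"params": head.parameters(), "lr": 1e-5}],      # other modules
    weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
criterion = torch.nn.MSELoss()                        # regression loss on the scores

for epoch in range(20):
    # for images, targets in train_loader:
    #     loss = criterion(model(images), targets); loss.backward(); optimizer.step()
    scheduler.step()  # reduces every learning rate to 0.1x after each 5 epochs
```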

4.2 Overall performance

There are three main tasks in image aesthetic assessment: binary classification, score regression, and distribution prediction. For binary classification, we assign "0" and "1" as image labels and use ACC to evaluate classification performance. For score regression, we use the average score as the image label and assess the consistency and monotonicity between subjective and predicted scores with SROCC and PLCC. To assess the model's capability in predicting distributions, we use the EMD value.

Table 1 presents a comparison of the prediction performance of the method outlined in this section with other advanced image aesthetic assessment methods on the AVA dataset. The table showcases the best results attained, which are indicated in bold.

Overall, the proposed method demonstrates strong competitive performance, achieving the best SROCC, PLCC, and EMD values. Although the MSE of our method is marginally inferior to that of HLA-GCN, its PLCC, SROCC, and EMD results are clearly better, indicating that the proposed method has stronger score regression and distribution prediction capabilities than HLA-GCN. Furthermore, compared with the handcrafted-feature baseline on AVA, all deep learning methods show a significant improvement in performance, which further confirms the strong learning and image representation capabilities of neural networks. In the score regression task, our method achieves the highest SROCC and PLCC values, and in the distribution prediction task it achieves the lowest EMD value, highlighting its strong performance in this area.

To assess the performance of our method in the score prediction task, we present several score prediction results in Fig. 2. In the caption of each image in Fig. 2, the upper number is the ground truth and the lower number is the predicted score. The network predictions are clearly close to the ground truth, highlighting the effectiveness of our method.

Table 2 The performance of cross database testing
Fig. 3 Matrix of the two-sample t-test

The performance of cross-dataset testing is presented in Table 2. The configurations compared are "W/O emotion", "Emotion6", "Abstract", "Artphoto", and the proposed method. The evaluated metrics are accuracy (ACC), Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC), mean squared error (MSE), and Earth Mover's Distance (EMD). The table clearly shows that the proposed method surpasses the other configurations on all metrics, achieving the highest ACC, SROCC, and PLCC and the lowest MSE and EMD.

Fig. 4 Matrix of the two-sample Wilcoxon rank-sum test

4.3 Cross database experiment

In addition, this section includes cross-dataset experiments conducted on other emotional datasets; Table 2 displays the results. Each test dataset is partitioned with 80% allocated for training and 20% reserved for testing. The ACC column reports the cross-dataset results obtained with the weights of the original training parameters, whereas the retraining ACC reports the results obtained after retraining the network on the alternative emotional datasets. Using the original network weights for cross-dataset evaluation leads to a notable drop in accuracy, particularly on the Abstract and Artphoto datasets; this drop stems from differences in the emotion categories and sizes of the datasets. After retraining with the relevant emotion datasets, however, accuracy improves significantly, with the Emotion6 dataset showing a 2% increase. The remaining drop on the Abstract dataset is attributed to its small size and the large difference in image types compared with the other emotional datasets, which highlights the sensitivity of deep networks to dataset size.

4.4 Statistical significance analysis

In addition, we conducted a statistical significance analysis on the SROCC using both the two-sample t-test and the two-sample Wilcoxon rank sum test. The results are shown in Figs. 3 and 4, where “1”, “-1”, and “0” denote the comparison of model performance. The comparison is conducted horizontally, with “1” signifying that the row model surpasses the column model, “0” indicating equal performance between the two models, and “-1” denoting inferior performance of the row model compared to the column model. The methods considered for comparison are those related to image aesthetic distribution prediction. All algorithms were tested on the AVA aesthetic dataset at a 95% confidence level. The comparison results indicate that the methods proposed in this section exhibit strong competitive advantages.
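The sketch below shows how one entry of such a comparison matrix could be computed with scipy; the per-model SROCC samples are made-up placeholders.

```python
# Sketch of a pairwise significance comparison on SROCC values at the 95% level.
import numpy as np
from scipy import stats

srocc_a = np.random.normal(0.74, 0.01, 30)   # per-split SROCC of model A (made-up)
srocc_b = np.random.normal(0.71, 0.01, 30)   # per-split SROCC of model B (made-up)


def compare(a, b, alpha=0.05, test=stats.ttest_ind):
    """Return 1 if a is significantly better, -1 if significantly worse, 0 otherwise."""
    _, p = test(a, b)
    if p >= alpha:
        return 0
    return 1 if a.mean() > b.mean() else -1


print(compare(srocc_a, srocc_b))                       # t-test entry of the matrix
print(compare(srocc_a, srocc_b, test=stats.ranksums))  # Wilcoxon rank-sum entry
```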

In conclusion, the proposed method demonstrates superior performance in score prediction tasks compared to existing methods. The cross dataset experiments further confirm the effectiveness of the proposed method, especially after retraining with relevant emotion datasets. The statistical significance analysis shows that the proposed method has a strong competitive advantage in image aesthetic distribution prediction.

5 Conclusion

This paper proposes Emo-AEN, an image aesthetic evaluation framework based on the internal fusion of brand image and aesthetic design. Inspired by the close connection between image design and aesthetic evaluation in design studies, Emo-AEN combines brand image design with image aesthetics, allowing the network to capture both the aesthetic and detailed design features of images. To exploit the relationship between image aesthetics and brand image, the study uses self-attention mechanisms to uncover the intrinsic connection between image aesthetics and brand design, and redundant, insignificant weights within the network are removed with model pruning techniques. Through these lightweight strategies, the integrated feature extraction algorithm not only processes image aesthetics and design information efficiently but also has a wider application scope, including operation on devices with limited computational power and storage resources. The algorithm is therefore well suited not only to high-performance computing environments but also to a variety of mobile and edge computing devices, enabling intelligent brand image optimization services for a broader range of users. Experimental results demonstrate that the proposed method performs well compared with existing approaches, and the ablation experiments confirm the vital roles of the self-attention mechanism and the fusion module in the network. Finally, the study shows that different images can convey distinct visual content, highlighting the complementary and integrated relationship between image aesthetics and brand image design.