From data to decision: distilling decision intelligence from user-generated content

Tjaša Redek (Department of Economics, School of Economics and Business, University of Ljubljana, Ljubljana, Slovenia)

Uroš Godnov (Department of Information Sciences and Technologies, Faculty of Mathematics Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia)

Kybernetes

ISSN: 0368-492X

Article publication date: 13 March 2024

Abstract

Purpose

The Internet has changed consumer decision-making and influenced business behaviour. User-generated product information is abundant and readily available. This paper argues that user-generated content can be efficiently utilised for business intelligence using data science and develops an approach to demonstrate the methods and benefits of the different techniques.

Design/methodology/approach

Using Python Selenium, Beautiful Soup and various text mining approaches in R to access, retrieve and analyse user-generated content, we argue that (1) companies can extract information about the product attributes that matter most to consumers and (2) user-generated reviews enable the use of text mining results in combination with other demographic and statistical information (e.g. ratings) as an efficient input for competitive analysis.

Findings

The paper shows that combining different types of data (textual and numerical data) and applying and combining different methods can provide organisations with important business information and improve business performance.

Research limitations/implications

The paper shows that combining different types of data (textual and numerical data) and applying and combining different methods can provide organisations with important business information and improve business performance.

Originality/value

The study makes several contributions to the marketing and management literature, mainly by illustrating the methodological advantages of text mining and accompanying statistical analysis, the different types of distilled information and their use in decision-making.

Keywords

Citation

Redek, T. and Godnov, U. (2024), "From data to decision: distilling decision intelligence from user-generated content", Kybernetes, Vol. 53 No. 13, pp. 1-23. https://doi.org/10.1108/K-08-2023-1447

Publisher: Emerald Publishing Limited

License

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

“Today, consumers are digital explorers …” (Torben, 2013), who make extensive use of online content when making purchasing decisions. The role of electronic word of mouth (eWOM) is increasing (Ismagilova et al., 2020). Consumers co-shape corporate image and impact peers’ decisions (Zhang et al., 2020). Consequently, companies and their competitiveness in the B2C market are becoming increasingly dependent on consumers' (subjective) opinions and product reviews. However, if they are used as strategic input in companies' decision-making (Chapman, 2023; Kozyrkov, 2019), companies can use them to explore consumers' perceptions and needs as well as the comparative advantages of products and companies from the consumers' perspective.

Whilst there are numerous examples in the literature of how user-generated content can be used, a comprehensive approach to transforming user-generated data into decision knowledge through data science is scarce. This paper uses user-generated reviews of two product types (hoovers and coffee machines) from two established household appliance manufacturers to explore what decision and competitive intelligence can be derived from user-generated content and associated data. The paper examines how the combination of text-mining methods with conventional statistical methods can be used to support decision-making processes in companies. Specific techniques, such as natural language processing (NLP), latent Dirichlet allocation (LDA) and term frequency-inverse document frequency (TF-IDF) analysis, are applied to dissect and interpret the textual data. In addition to text mining, statistical methods are incorporated to validate the findings and quantify relationships. In particular, it shows which methods can be used to explore (1) what consumers are specifically looking for in a product (e.g. functionality or design), (2) what the specific strengths and weaknesses of a particular product or different products are. Combining these text-mining results with the statistical analysis of other data available in user-generated content, such as numerical or star ratings and product prices, can also be used efficiently to identify (3) competitive information at the company level (within a product group) or, more importantly, (4) at the industry level to assess the competitive strength of a company.

The paper makes a methodological contribution to the literature and extends knowledge on the use of data science to generate data intelligence and support strategic management and competitive analysis, mainly by understanding consumer decision-making and the relevant implications for product development and marketing. It extends the existing literature (Harika et al., 2022; Ishikiriyama et al., 2015; Ko and Gillani, 2020; Köseoglu et al., 2020; Lee et al., 2022; Subrahmanya et al., 2022) mainly by providing a comprehensive methodological approach.

In the remainder of this paper, the theoretical background is first explained, followed by a discussion of the methodology used, a presentation of the research questions and the data. Then the results are presented, followed by a discussion and a conclusion.

2. Theoretical background

2.1 Rise of the “digital consumer” and influential reviews

The digital world has empowered consumers, who are now both users and influencers. They rely on the Internet to make informed decisions, evaluate options and share personal experiences through text reviews and complementary data such as testimonials, numerical ratings and images (Mariani et al., 2022; Saura et al., 2023). This digital world is used to select, evaluate, compare and make a decision (Torben, 2013). Subsequent purchases lead to first-hand experiences that are then shared online, building a product’s reputation and influencing potential buyers' decisions. Most importantly, reviews influence the majority of buyers in their purchasing decisions (Chen et al., 2022). Consumers rely on text reviews because star ratings alone are inadequate: text reviews provide deeper insights and additional detail that star ratings may not capture (De Langhe et al., 2016). However, not all ratings carry the same weight. Negative reviews are particularly sought after as they provide valuable insight into a product’s drawbacks and increase brand reliability by outlining worst-case scenarios and acknowledging that shortcomings exist (Mudambi and Schuff, 2010; Trustpilot, 2021). By studying reviews and comparing them with prices, consumers can make informed decisions, optimising their utility and obtaining value for money (Kang et al., 2022), adhering to the principle of “rational homo-economicus” (Anderson and Narus, 1998).

2.2 User-generated content, decision science and corporate performance

The growing influence of reviews goes beyond consumer behaviour and significantly affects the reputation and performance of businesses (Ansary and Nik Hashim, 2018; Rodell et al., 2020). Consumer reviews have a direct impact on purchasing decisions, thus influencing company performance and, in turn, brand reputation (Chen et al., 2022; Fernandes et al., 2022). As a result, companies are recognising the importance of analysing online data, including social media and reviews, as part of their business intelligence efforts. The collection and analysis of Big Data, especially from Internet sources, has become a critical tool for competitive differentiation, market positioning and strategic intelligence (Chen et al., 2022; Gémar and Jiménez-Quintero, 2015; Ranjan and Foropon, 2021).

Companies can use these tools to examine their competitive landscape, assess their market position and gain insights into the correlation between sales and consumer sentiment. The dynamic potential of social media can be harnessed to refine consumer-generated images and strengthen corporate presence (He et al., 2013). Furthermore, reviews play an integral role in profiling consumers and dissecting consumer interactions, providing critical input for refining strategies (Chau and Xu, 2012; Darko et al., 2023; Fernandes et al., 2022; Park et al., 2012).

This paper focusses on the contribution of decision-making intelligence accessible to firms through consumer responses. In particular, we will focus on three aspects: Consumer insights, pricing strategy information and firm (comparative) performance.

Consumer-driven product insights: User-generated content enables an assessment of consumers' perceptions of product qualities and drawbacks. These insights help identify desired attributes and contribute to product and business performance (Bahtar and Muda, 2016). Quality is a key determinant of market success, brand strength and competitiveness (OECD, 2013). Despite possible bias, reviews exert significant influence and often surpass personal experience when it comes to making consumer decisions (Chen et al., 2022; Zhao et al., 2012). Whilst these reviews may suffer from reporting and purchasing bias and may even be fake, they still influence purchasing decisions (Chakraborty, 2019; Wu and Qiu, 2023), making them valuable decision-making tools.
Strategic pricing and price sensitivity: User-generated content can also help companies develop effective pricing strategies, which is especially important for products where consumers are increasingly price-sensitive or have a lot of choice (Deloitte, 2015). Reviews influence consumers' perceptions of value and often overshadow brand value, influencing pricing decisions (Chakraborty, 2019; Chen et al., 2022; Mudambi and Schuff, 2010). In this context, user-generated content may inform companies about perceived value and about indications of price sensitivity amongst consumers and their reactions to price changes (e.g. mentions of affordability, cost-effectiveness), and it may help them prepare comparative analyses and segment consumers.
Driving business performance: User-generated content has a tangible impact on business performance and sales. Research shows that electronic word of mouth (eWOM) has a strong impact on sales, especially for certain product categories and review platforms. Using user-generated content, such as positive reviews, for promotional purposes can increase sales and improve business performance.

3. Methods

3.1 Research objectives

User-generated content is a valuable source of decision-making intelligence for businesses in the digital age. It provides insights into consumer perceptions, influences business performance and helps companies formulate successful pricing strategies. Data-driven decision-making can give companies a competitive advantage and enable them to succeed in the ever-evolving digital marketplace. This paper answers the following research questions to achieve the research objectives summarised in Table 1:

What kind of consumer-driven product insights can be gained from user-generated content and how? We use star ratings, sentiment analysis and content analysis to investigate how consumers perceive the product, how reviews can be used to gain insights into the perceived qualities and drawbacks of their own products and services, and to identify desired attributes through consumer reviews. The paper also explores how the information can be used to comparatively assess the competitiveness of products.
Secondly, the paper explores how user-generated content can help companies with their pricing strategies by providing information about consumers' price-quality relationship and price sensitivity and by providing decision support for companies to make pricing decisions. This question is answered by combining data from star ratings, text ratings and prices.

Finally, data should provide decision intelligence. The paper therefore explores how data can be used for comparative analysis (of different companies), thereby influencing business performance and sales through the use of user-generated content.

3.2 Empirical approach

The paper relies on an extensive user-generated database, which combines textual and numerical data. The data were accessed, retrieved and analysed using Python and R. The data collection and data analysis tools are described below.

The study employs Python Selenium and Beautiful Soup to extract business intelligence from online data sources. Selenium automates web browsers to access and retrieve user-generated content, including dynamic elements loaded through JavaScript, ensuring a comprehensive data collection process. Once the data are retrieved, Beautiful Soup is used to parse the pages and extract specific product attributes and metadata from the text. This twofold approach enables us to gather and analyse both structured and unstructured data effectively, supporting our competitive analysis by providing a rich dataset that combines user comments with objective product information.

To analyse the data, various approaches from the field of data science, mainly text mining and NLP, with a focus on sentiment analysis and content analysis methods (Table 1) were combined with standard statistical methods.

Sentiment analysis examines the text “between the lines”. In general, sentiment analysis relies on lexicons that assign a specific “sentimental” value to each word. Early sentiment approaches were rather rudimentary, assigning each word one of three values: positive (1), neutral (0) or negative (−1) (Hu and Liu, 2004). More recent methods, also used in this analysis, allow for a more complex approach with a scale of, for example, (−5) to (5) or a conditional analysis where a word is analysed in the context of the text and the same word can have different sentiment values depending on the surrounding words (see Nielsen (2011) for AFINN sentiment, Mohammad (2015) for NRC sentiment and Liu (2012) for LIU sentiment). The study relies on a lexicon-based document-level approach. In addition, based on the NRC lexicon (Mohammad, 2015), we also identify the predominant emotion (e.g. confidence, joy, etc.) in the review. The polarity of the text is most often calculated using the SentiWordNet lexicon (Amarouche et al., 2015).

In our study, sentiment analysis is carried out using a refined lexicon-based approach. This approach involves scoring words for sentiment and adjusting these scores with the help of valence shifters and amplifiers. Valence shifters, such as negations and modality of language, play a significant role in altering the emotional value of phrases. Amplifiers and down-toners are also taken into account to accurately measure the degree of sentiment. By considering these nuanced linguistic features, the calculation of emotional polarity results in a composite score, which provides a robust analysis of consumer sentiment. To complement sentiment analysis, content analysis is employed to quantify the presence and co-relationships of words in a structured manner. This analysis involves examining word frequency and association patterns to uncover dominant themes and indicators of consumer satisfaction.
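The scoring logic described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the tiny lexicon, the shifter lists, the multipliers and the two-word look-back window are all hypothetical values chosen for illustration only.

```python
import re

# Hypothetical mini-lexicon on an AFINN-like integer scale (assumption, not a real lexicon).
LEXICON = {"great": 3, "good": 2, "easy": 1, "poor": -2, "broken": -2, "junk": -3}
NEGATORS = {"not", "never", "no"}                           # valence shifters that flip polarity
AMPLIFIERS = {"very": 1.5, "really": 1.5, "extremely": 2.0}  # hypothetical multipliers
DOWNTONERS = {"somewhat": 0.5, "slightly": 0.5}              # hypothetical multipliers


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def review_sentiment(text, window=2):
    """Return (total sentiment, sentiment per word) for one review.

    Each lexicon word is adjusted by negators, amplifiers and down-toners
    found among the `window` tokens that precede it.
    """
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = float(LEXICON[tok])
        for prev in tokens[max(0, i - window):i]:
            if prev in NEGATORS:
                score = -score
            elif prev in AMPLIFIERS:
                score *= AMPLIFIERS[prev]
            elif prev in DOWNTONERS:
                score *= DOWNTONERS[prev]
        total += score
    per_word = total / len(tokens) if tokens else 0.0
    return total, per_word
```

Dividing the total by the number of tokens gives a sentiment-per-word figure, which makes short and long reviews comparable.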

In the second part of the analysis, we rely on content analysis, which aims to identify the main themes of the text. First, a simple but very effective keyword search method is used, based on the corpus approach in the tm package (e.g. Verma and Gupta, 2004; Feinerer et al., 2023). Second, topic modelling is used to “find and track word clusters (topics)” in large (unstructured) text corpora (Posner et al., 2012), using a probabilistic iterative procedure to identify major topics (Blei, 2012). The LDA approach with Gibbs sampling was performed in R (package tm) (Chen, 2011; Grün and Hornik, 2011). Although content analysis is less popular due to a mix of factors (greater complexity, lack of user-friendly software), it is gaining recognition as a standard tool for marketing, management, finance and tourism analysis (Vitouladiti, 2014; Stepchenkova et al., 2008).
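To make the “probabilistic iterative procedure” more concrete, the following is a minimal pure-Python sketch of LDA with collapsed Gibbs sampling. It is not the R code used in the study; the toy documents, the hyperparameters alpha and beta, the iteration count and the function name are assumptions for illustration.

```python
import random

# Hypothetical, already tokenised toy reviews (assumption, not data from the study).
DOCS = [
    ["hose", "attach", "light", "hose"],
    ["clean", "floor", "great", "clean"],
    ["suction", "good", "work", "hose"],
    ["clean", "use", "work", "floor"],
]


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA; returns (top words per topic, doc-topic counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (document, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Resample each token's topic conditional on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    top_words = []
    for t in range(n_topics):
        ranked = sorted(range(n_words), key=lambda v: topic_word[t][v], reverse=True)
        top_words.append([vocab[v] for v in ranked[:3]])
    return top_words, doc_topic
```

Calling `lda_gibbs(DOCS)` returns the three most frequent words per topic, which is the kind of output a topic table summarises. With a corpus this small the topics are unstable, so the sketch only shows the mechanics.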

3.3 Data

The sample comprised a total of 4,311 reviews obtained from online stores [1] that provide text reviews as well as a numerical rating (or star-review system). Two major global producers of electrical appliances were chosen, and reviews of a total of 37 of their products in the same product categories (washing machines, refrigerators, ovens, etc.) were collected (see Table A1 for a sample description by product). The selection of these specific products was based on their market significance, as evidenced by their consistent consumer demand and substantial number of reviews. These reviews provide valuable insights into customer satisfaction and product performance. The selection process was also aligned with the objective of our study, which aimed to extract actionable business intelligence by analysing both textual and numerical data derived from consumer reviews.

A typical review comprised 86 words. The mean price of the evaluated product was 337 euros. The overall “star rating” (1–5 stars) was 3.87 [2]. An average review received 3.8 “helpful votes,” meaning that 3.8 readers found the information in the review useful/helpful. The sentiment was on average positive, both the average sentiment of the entire review as well as the sentiment per word in the review [3] (Table 2).

4. Results

4.1 Consumer-driven product insights

4.1.1 Consumer satisfaction

Consumer satisfaction is assessed using numerical (star) evaluations and text evaluations, where satisfaction (with specific product traits) can be summarised using sentiment analysis. Figure 1 summarises the star ratings and sentiment scores, assessed using different estimation methods, for four key product categories: coffee-makers, dishwashers, mixers and vacuum cleaners.

The results show that the average numerical satisfaction was lowest for dishwashers and relatively similar for other products. The sentiment results lead to two important conclusions. First, the sentiment evaluation (as “felt” from between the lines in the text) and the star/numerical evaluation can differ significantly, and second, the sentiment calculation is sensitive to the method used, where we will assume that the more modern, advanced methods are superior to the earlier ones. [4]

Nonetheless, despite the importance of star ratings and the variability of sentiment scores across methods, which is an obvious shortfall of the sentiment calculation, companies should pay attention to sentiment in the text as well. Reviews with the same numerical rating can convey a completely different sentiment about the product, thereby affecting the consumer. Table 3 depicts two reviews, both with a 5-star numerical rating; however, the second review conveys much more delight and enthusiasm and expresses sheer conviction that the product really is great. The first review, on the other hand, despite its 5-star rating, is even somewhat critical. The weak relationship between star rating and sentiment is also clear from the correlation analysis (Table A1).

The lack of a very strong relationship between the star rating and sentiment is also seen in the correlation coefficients (Figure 2). Whilst these are positive and significant, sentiment is not very strongly related to the star rating, which supports the aforementioned proposition and confirms the importance of companies studying the reviews themselves.
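The kind of correlation coefficient discussed here can be computed as in the following minimal sketch. The function name and the star and sentiment values are hypothetical and are not taken from the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical per-review values: star rating and sentiment per word.
STARS = [5, 5, 4, 1, 2, 3]
SENTIMENT = [0.10, 0.45, 0.20, -0.30, 0.05, 0.15]

r = pearson(STARS, SENTIMENT)
print(f"star rating vs sentiment: r = {r:.3f}")
```

A coefficient that is positive but well below 1 would correspond to the situation described in the text: stars and text sentiment move together, yet one is not a substitute for the other.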

It is also important to note that there is a clear negative relationship between the star rating and the number of words. If we look at the rating per word and its relationship with length (to eliminate the effect of the aggregate total rating), it is also negative and statistically significant. It is possible that dissatisfied customers take their time and are more inclined to make long arguments. On the other hand, it is common to see a textual review with a rating of “1” that is very short but negative, summarised by short and impactful titles such as “Stay away” or “Junk”. Longer reviews, on the other hand, are rated as more helpful. Reviews with at least 10 helpful votes had an average of 238 words, whilst those with nine or fewer had 150 words. Indeed, the most positive responses to the question “Was this review helpful?” were assigned overall to the “5-star” rating, which received a total of over 8,000 helpful votes for 2,342 reviews. This was followed by just over three thousand helpful votes for 665 “1-star” ratings. The high total number of helpful votes for 1-star and 5-star reviews might indicate that these are also the reviews that are read more often (and thus receive “helpful” votes). In short, longer reviews, which are likely to be more negative (lower sentiment), are likely to be read as they contain more information in one piece, and negative reviews are often seen as helpful. Moreover, consumers like it when negative aspects are highlighted, which is also confirmed by the very weak, negative and highly significant relationship between sentiment per word and usefulness (see Table A1 for details). Table 2 shows that for more expensive products, which are considered higher involvement products (in our case those with above average price), higher price and numerical rating are weakly (−0.249) but significantly negatively correlated. The negative relationship indicates the potentially high expectations of consumers who are able or choose to pay a higher price but also expect (more) quality in return.
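The two summaries used in this paragraph (average review length by helpfulness and total helpful votes by star rating) can be sketched as follows. The records, field names and function names are hypothetical; only the threshold of 10 helpful votes comes from the text.

```python
from statistics import mean

# Hypothetical review records (assumption, not data from the study).
REVIEWS = [
    {"words": 240, "helpful": 12, "stars": 5},
    {"words": 300, "helpful": 15, "stars": 1},
    {"words": 60, "helpful": 0, "stars": 4},
    {"words": 90, "helpful": 3, "stars": 5},
    {"words": 30, "helpful": 1, "stars": 1},
]


def mean_length_by_helpfulness(reviews, threshold=10):
    """Mean word count of reviews with at least `threshold` helpful votes vs the rest."""
    high = [r["words"] for r in reviews if r["helpful"] >= threshold]
    low = [r["words"] for r in reviews if r["helpful"] < threshold]
    return (mean(high) if high else None, mean(low) if low else None)


def helpful_votes_by_stars(reviews):
    """Total number of helpful votes received by reviews of each star rating."""
    totals = {}
    for r in reviews:
        totals[r["stars"]] = totals.get(r["stars"], 0) + r["helpful"]
    return totals
```

Because totals depend on how many reviews each star level has, dividing by the review count per star level would be a natural follow-up when comparing ratings.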

4.1.2 Product qualities, pricing and product comparison

The vacuum cleaner example is used to study how user-generated data can help distil the most important characteristics, qualities and comparative assessments of products. In total, 12 vacuum cleaners were in the sample; two were produced by firm 1 and the rest by firm 2. The price of the vacuum cleaners ranged from 82 to 499 euros (Table 3 summarises main data). The highest star rating was assigned on average to the vacuum cleaner of producer P2: the average star of 4.77 was assigned to a vacuum cleaner costing 500 euros (Table 4). The second best was a cheap vacuum cleaner made by producer P1. Consumers generally do not assess the products based only on star evaluation [5] but are primarily interested in the content (Anderson, 2014).

When consumers buy certain products or prepare product reviews, various aspects can be evaluated. Since reviews are written by individuals, it can be assumed that elements relevant to these individuals appear in the text. Interestingly, the content analysis shows that users primarily evaluate functionality: at the manufacturer level for hoovers, the most common words were either nouns describing the product (hoover, hose, canister, power, suction, etc.), words describing use or handling (easy, use) or generally positive adjectives (like, great, good, easy, etc.) (Table 5). This result could be interpreted to mean that these consumers (review writers) put more emphasis on functionality than, for example, design (e.g. lack of words like design, shape, beautiful, etc.). It could also mean that consumers write about these aspects in their reviews because they expect this information to be valued more by the readers of the reviews (than, for example, the evaluation of the design).

The fact that functionality might be more important than design is also revealed by LDA topic modelling (Table 6). The distinction between the two topics is mild, but in both cases one topic deals more with technical aspects (hose, attach, light; suction, good, work) and the other more with cleaning itself (clean, use, work; clean, floor, great). This does not imply that other features are irrelevant. But the fact that functional elements are discussed more often than, for example, design when a purchased item is reviewed could imply that consumers also focus on functionality and performance when choosing. Design, packaging and commercials are less relevant [6].
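A word-frequency count of this kind can be sketched in a few lines of Python. The study used the R package tm, so this is only a stand-in; the stop-word list, the example reviews and the function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "this"}

# Hypothetical review texts (assumption, not data from the study).
SAMPLE_REVIEWS = [
    "Great suction and easy to use",
    "The hose is easy to attach",
    "Suction is great",
]


def top_terms(reviews, n=5):
    """Return the n most frequent non-stop-word terms across all reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


print(top_terms(SAMPLE_REVIEWS, n=3))
```

Checking the resulting list for the absence of terms such as design or shape is one simple way to make the functionality-versus-design comparison described above.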

4.2 Pricing strategies

User-generated content can be efficiently used as decision-making intelligence if it is also combined with other data on the company's own and other companies' products. A utility-maximising consumer will seek to maximise satisfaction or pay as little as possible per unit of satisfaction. In the case of consumer appliances, when choosing between two comparable products (e.g. those costing between 275 and 325 in Table 7), one would be prone to choose the one with the lower price and the better rating. In the case of vacuum cleaners, the best product to choose in case of a utility-maximising consumer would be P2_16, where on average, one pays 118.8 euros per unit of satisfaction (ranging from 69.8 to 185.2). Whilst consumers do not in fact go to such lengths, a comparative assessment of one’s product in relation to other, competing products should serve as an important input for decision intelligence and competitive analysis.

User-generated content can also provide information about consumers' price-quality relationship and price sensitivity, and thereby support companies in making pricing decisions. This is done by combining data from star ratings, text ratings and prices. In addition, companies can also search for words highlighting affordability, cost-effectiveness and perceived value for money. For example “(…) Every product I chose had mixed reviews so I went with the prettiest, biggest and most affordable. (…)” or “He primarily took in “product name deleted” because they lasted forever and only required a small investment to refurbish them.”

Companies have to pay special attention not only to the price sensitivity of more budget-oriented consumers, but also to the more expensive, higher involvement products. The results in Table 4 showed that a higher price and numerical rating are negatively correlated, weakly (−0.249) but significantly. The negative relationship hints at the potentially high expectations of consumers who are able or decide to pay a higher price but expect quality in return. Companies should pay special attention to findings from these reviews. Table 8 further investigates this relationship and confirms that for more expensive products, which count as higher involvement products (those with above average price in our case), the relationship between the price and the sentiment is stronger and also statistically significant.
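The “euros per unit of satisfaction” measure can be read as price divided by average star rating. The sketch below applies that reading to hypothetical products; the product identifiers, prices, ratings and function names are assumptions, not values from Table 7.

```python
# Hypothetical products (assumption, not data from the study).
PRODUCTS = [
    {"id": "A", "price": 300.0, "stars": 4.0},
    {"id": "B", "price": 280.0, "stars": 3.5},
    {"id": "C", "price": 320.0, "stars": 4.0},
    {"id": "D", "price": 120.0, "stars": 4.0},
]


def price_per_star(product):
    """Euros paid per unit of satisfaction, with satisfaction measured by star rating."""
    return product["price"] / product["stars"]


def best_value(products, low, high):
    """Product with the lowest price per star among those priced within [low, high]."""
    band = [p for p in products if low <= p["price"] <= high]
    return min(band, key=price_per_star) if band else None


best = best_value(PRODUCTS, 275, 325)
print(best["id"], price_per_star(best))
```

Restricting the comparison to a price band keeps it to comparable products, in line with the example in the text.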
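A keyword search for price-related wording can be sketched as follows. The term list and function name are hypothetical; the first two example texts are shortened from the reviews quoted above, and the third is invented.

```python
import re

# Hypothetical list of price-related terms; a real list would be tuned to the corpus.
PRICE_TERMS = (
    "affordable", "affordability", "cheap", "expensive",
    "value for money", "worth", "investment", "price",
)


def price_mentions(reviews, terms=PRICE_TERMS):
    """Return (review index, sorted matched terms) for reviews mentioning any term."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for idx, text in enumerate(reviews):
        found = sorted({m.group(1).lower() for m in pattern.finditer(text)})
        if found:
            hits.append((idx, found))
    return hits


EXAMPLES = [
    "I went with the prettiest, biggest and most affordable.",
    "They lasted forever and only required a small investment to refurbish them.",
    "Great suction.",
]
print(price_mentions(EXAMPLES))
```

The share of reviews with at least one hit, tracked per product or over time, would be one simple indicator of how often price is on reviewers' minds.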

4.3 Corporate performance, competitive analysis and decision intelligence

To assess their relative performance, companies can study numerical ratings of their and competitive products, calculate sentiments using different methods, study the content of reviews as well as calculate their price-performance ratios (if performance is measured using satisfaction).

Companies can also investigate the prevailing emotions in their own and their competitors’ product reviews using the NRC method (Figure 3), which provides further insight into the relative performance of firms. For example, the emotional structure is somewhat more positive for producer P1, the difference being most evident in the prevalence of “trust”. On average, producer P1 is more “trusted,” which could be interpreted as “more” competitive. Also, for producer P1, the negative emotions (disgust, anger, fear) are slightly less common. In combination with the sentiment, the text evaluations suggest that producer P1 is better.

Whilst a comparison of products by price was already depicted, the relative performance of firms can also be calculated (Figure 4). Specifically, the ratio between the star rating, the sentiment per product, and the price was averaged by producer, since the purpose is primarily to present the methodological options. The average rating was divided by the price, obtaining the value of satisfaction “bought” by one euro. Here, the results show that, systematically, producer 1 is better (more competitive, providing more satisfaction for money) than producer 2. Any real analysis would also consider the product type; nonetheless, even company-level data could carry relevant decision intelligence, in particular where companies focus on the same market. Since the purchasing decision is increasingly dependent on the opinions of peers, companies should carefully address buyers' comments and try to minimise those problems over which they have influence [7].
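An emotion profile per producer can be sketched by counting emotion-tagged words and converting the counts to shares. The word-to-emotion mapping below is a hypothetical stand-in and does not reproduce actual NRC lexicon entries; the reviews and function name are also invented.

```python
import re
from collections import Counter

# Hypothetical word-to-emotion mapping (assumption, not actual NRC entries).
EMOTION_LEXICON = {
    "reliable": ("trust",),
    "recommend": ("trust",),
    "love": ("joy",),
    "broken": ("anger", "disgust"),
    "afraid": ("fear",),
}

# Hypothetical reviews grouped by producer.
REVIEWS_BY_PRODUCER = {
    "P1": ["Reliable machine, I recommend it", "Love it"],
    "P2": ["Arrived broken", "Reliable so far"],
}


def emotion_shares(reviews):
    """Share of each emotion among all emotion-tagged words in the reviews."""
    counts = Counter()
    for text in reviews:
        for tok in re.findall(r"[a-z]+", text.lower()):
            counts.update(EMOTION_LEXICON.get(tok, ()))
    total = sum(counts.values())
    return {emotion: c / total for emotion, c in counts.items()} if total else {}


for producer, reviews in REVIEWS_BY_PRODUCER.items():
    print(producer, emotion_shares(reviews))
```

Using shares instead of raw counts keeps producers with different numbers of reviews comparable.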
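The satisfaction “bought” by one euro can be sketched as the star rating divided by the price, averaged by producer. This follows one reading of the description above (ratio per product first, then the average); the product records and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical products (assumption, not data from the study).
CATALOGUE = [
    {"producer": "P1", "price": 100.0, "stars": 4.0},
    {"producer": "P1", "price": 150.0, "stars": 4.5},
    {"producer": "P2", "price": 480.0, "stars": 4.8},
    {"producer": "P2", "price": 200.0, "stars": 4.0},
]


def satisfaction_per_euro(products):
    """Average stars-per-euro ratio for each producer."""
    ratios = defaultdict(list)
    for p in products:
        ratios[p["producer"]].append(p["stars"] / p["price"])
    return {producer: mean(values) for producer, values in ratios.items()}


print(satisfaction_per_euro(CATALOGUE))
```

The same function could be applied to sentiment scores instead of stars by swapping the numerator, and grouping by product type as well as producer would address the caveat raised in the text.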

5. Discussion

5.1 Discussion of results

The results of text mining on consumer reviews of electrical appliances addressed several research aspects: consumer-driven product insights, price sensitivity and strategic pricing, and the use of information to boost corporate performance (Table 9).

5.1.1 Consumer-driven product insights

First, whilst companies can obtain a first impression of the perceived quality of their goods from the widely used star-rating system, the results show that user-generated content might not be well summarised by the stars; it is important to study the text as well, primarily its sentiment. The “star” itself does not summarise product quality sufficiently (De Langhe et al., 2016). The results of this study confirm the discrepancy between the star rating and the text. Based on a detailed reading of the reviews, the correlation is often more obvious in reviews with more emotional involvement (e.g. star rating 5 and highly positive, emotional text). However, serious reviewers often provide objective reviews with relatively neutral text, primarily due to a very extensive and detailed description of the product, combined with a choice of words without evident emotional involvement. These reviews provide a list of observations, but rarely use more subjective, very emotional expressions. Such detailed, often longer, unemotional reviews might nonetheless have an even stronger impact on buyers, in particular due to their length. As stressed, readers find longer reviews more useful (reference deleted to maintain the integrity of review process). The premise of the more “objective” and factual approach is also supported by the fact that the overall sentiment in product evaluation (such as this electrical appliances example) is on average much lower than in tourism, even 2–3 times lower (reference deleted to maintain the integrity of review process). Textual reviews also reveal which product characteristics consumers value or search for most in a product (e.g. attributive theory, Lancaster, 1966). This is especially important for appliances, especially the more durable ones (e.g. vacuum cleaners, etc.) (You et al., 2015). In their evaluations, consumers often write about product performance, primarily about those aspects that did not meet their expectations. Besides functionality and useable life (Koenigsberg et al., 2011), consumers might also care about design, but the role of design and other (non-functional) features might not be as important as that of functional features (see, e.g. Blijlevens et al., 2009).

The findings in this study of consumer appliances are similar also to those in other fields, e.g. tourism (references removed). De Langhe et al. (2016) claim that when deciding about the product, the consumers do not rely on the price as the signal or the number of ratings, but rather on the average rating. This study extends these results by stressing the lack of a strong relationship between star rating and sentiment in the text. Companies must acknowledge that when a customer is deciding between comparable products, text reviews are important and are gaining importance due to the increasing use of e-sources and e-shopping. Two reviews with the same star rating can give a totally different “feel” about the product quality.

The negative relationship between the price and the sentiment implies that more expensive products have “more critical” evaluations. The rather critical approach could reflect high expectations, which were not fully met by the product. This can be a result of more demanding consumers, who can afford and are willing to pay the premium for the more expensive product, but have on average higher quality demands (Silverstein and Fiske, 2003). On the other hand, this result can also indicate that the more expensive products are usually examples of high(er)-involvement products as well, and the literature confirms the importance of emotions in high-involvement products (e.g. housing, cars and other higher-priced products, including appliances) (Kokemuller, 2016; Koklic and Vida, 2009).

User-generated content also has broader implications for pricing decisions, since it can provide companies with more detailed information about the perceived value of their products and about price sensitivity amongst consumers (e.g. key-words such as affordability, cost-effectiveness, etc.). It can also help companies segment consumers, study reactions to price changes in relation to demand and customer perceptions at the same time, understand the impact of promotional pricing on consumer behaviour and adjust strategies accordingly, and prepare in-depth comparative studies of relevant products of other companies.

Companies should also pay attention to specific reviews. Overall, longer reviews are more appreciated, which is clear from the positive and significant correlation between the number of words and the count of helpfulness votes (0.259; Table A1) and is in line with theory claiming that information is sought (Deloitte, 2015). The literature also suggests that negative reviews are more appreciated due to information about what could go wrong (Lackermair et al., 2013). Our results suggest that the extremes and the average are most interesting: the overall number of “review was helpful” votes was highest for reviews with the highest rating, followed by those with the worst and the average rating. Further investigation, ideally an independent consumer survey, would be needed to understand this.

The results of the text-mining (key-words and LDA) show that consumers predominantly evaluate the technical aspects of the product and its performance or functionality. Although producers often try to attract consumers with commercials, design or other minor changes, WRAP (2014) claims that up to a third of washing machines and fridges, and a quarter of all vacuum cleaners, failed to fulfil consumer expectations about quality and life-span. Given the importance of the basic elements (reliability, quality and durability), firms should focus on investing in enhancing these elements in their products. Reviews should be studied from different available web pages, separately by product. Producers should carefully consider the longer reviews and the best, average and poorest reviews, since these are the reviews that consumers most often read and mark as being useful.

The star evaluations and sentiment results can also be efficiently used in comparing consumers’ satisfaction with different products at the firm level or comparing products of two competitive brands. In addition, we showed that analysis of prevailing emotions could be efficiently exploited to assess trust, satisfaction, joy, etc. of the consumers when using one product or when comparing competitive brands or companies.

From the competitiveness perspective, price in comparison to satisfaction is extremely important. The literature states that consumers trust more expensive products more and that they would more easily buy a branded (usually more expensive) product (WRAP, 2014), because the price is expected to be a signal of quality and so is the brand (Hwang et al., 2006). However, the results of the text-mining show that the price is negatively correlated with the sentiment and star rating. The negative relationship is more evident, stronger and more significant for the products with above-average price in the sample (correlation with rating of −0.393, compared with −0.077 for below-average prices; Table 8), whilst the relationship for cheaper products is very weak. The results are in line with the literature that investigates consumer involvement: the more involved a consumer is (in this case due to a high price), the higher his expectations are and the more sentimentally he accepts the product and its flaws. Also, with increasing income and purchasing power (leading to buying pricier products), expectations and demands regarding product quality increase (Economist Intelligence Unit, 2010, for services in Asia; Oracle, 2012, for computers).

On the other hand, consumer rationality implies that “value for money” is desired or that one must maximise utility given the budget constraint. Consequently, if a consumer reads two reviews of otherwise (in his opinion) comparable products, the “feel” about the sentiment of the text in relation to the price could be a deal-breaker. The relationship is also an indication of producers’ competitiveness.

Companies can benefit significantly from utilising data from social media and other platforms in several other ways, which were not studied here. (1) Big data analytics of customer behaviour provides detailed customer insights and allows personalisation. This helps in understanding preferences, sentiments and trends, as well as in generating personalised recommendations and targeted advertising, improving customer engagement and satisfaction. (2) Companies can improve brand monitoring and reputation management by assessing public sentiment, identifying potential issues and proactively managing their brand reputation by addressing concerns in real time. (3) Social media analytics aids in measuring the effectiveness of marketing campaigns, since companies can track conversion rates, analyse customer interactions and adjust marketing strategies in real time for better results. (4) Companies can collect social media feedback on existing products or services to identify areas for improvement and obtain insights into consumer needs and preferences, which informs the development of new products and services and stimulates product development and innovation. (5) Business processes can be improved through supply chain optimisation and cost reduction, for example by anticipating changes in demand. (6) Predictive analytics for business forecasting is improved by big data, including social media data, which enable predictive modelling and better forecasting of business trends, demand patterns and potential challenges, allowing for proactive decision-making. Last, fraud detection and security may also be addressed: big data analytics helps identify patterns of fraudulent activities on social media, and companies use advanced analytics to detect anomalies, protect customer data and enhance cybersecurity.

5.2 Contributions to the literature

The paper extends the existing literature in several ways. First, it extends the literature on the consumer decision-making process by providing additional insights about two stages of both the standard purchasing process and the extended one (Torben, 2013). These are the stage of gathering information and the extension of the process, where the consumer becomes a more active influencer than before. By studying the content of the reviews, the paper also extends the study of consumer behaviour and of the role and content of eWOM. The text-mining identifies the key features that consumers evaluate and thereby also identifies the aspects that are most important to them as end-users. Next, the paper illustrates the benefits of the use of text-mining approaches in strategic management, primarily in the competitive intelligence and marketing literature. This paper also illustrates the informational outcome of different text-mining methods in decision-making. It is, to the best of our knowledge, the first such case in the management and marketing literature about consumer products. Next, the paper extends the existing studies of reviews by showing that the link between star evaluations and satisfaction (sentiment) as read between the lines is poor and that it is important to read the reviews. The paper also adds to the discussions about high vs low involvement products in the marketing literature (e.g. Kokemuller, 2016).

5.3 Limitations and challenges to future research

The limitations of this research offer several challenges for future research. The limitations and consequently challenges for future research can be divided into two types: (1) research/content based and (2) methodological.

First, the research was limited to household appliances of only two producers. Extending the number of producers and/or focussing on a specific product category could provide valuable insights for competition analysis. Second, obtaining more detailed data about the consumers would allow a unique study of consumer behaviour and purchasing motivation. Third, by extending the data to different countries, cultural aspects related to purchasing behaviour could be studied. Fourth, a detailed product-level analysis extended to various product categories would allow consumer behaviour to be examined across categories. Fifth, by also obtaining corporate data, the link between corporate performance and product evaluation could be studied. Sixth, in relation to the aforementioned competitiveness study, future research would allow a deeper understanding of the role of the “new” consumer behaviour, with increasing online research, for corporate performance. Finally, with the development of the methodology, the quality of the text-mining output will further improve. For example, sentiment is generally larger for longer reviews. Whilst this is partially controlled for using more novel methods (e.g. using valence shifters and machine learning algorithms), advances in NLP should provide significant steps forward in the future. The presented research results are based on the methodologies available at the moment. However, due to the vast improvements in methodology, primarily machine learning and artificial intelligence (AI), some of the existing methods may become obsolete, which could also impact these findings.

In the future, research could investigate the merging of text mining with other data sources like transactional data or customer service records to provide a more complete understanding of consumer behaviour. Finally, we suggest that future studies should explore the cross-cultural relevance of our techniques, taking into account how cultural backgrounds impact online consumer conversations and product evaluations. In addition, due to possible fraudulent activities in generating reviews, it will be increasingly important to identify fake reviews, which is also a significant methodological challenge. With increased use of more advanced AI, this should be possible.

From the methodological perspective, the potential for future research also stems from the fast advances in analytical techniques or data science. Here, the potential can be expected in the development of more complex algorithms that can accurately analyse the nuances of user-generated content. More advanced algorithms for detailed analysis of vast, unstructured datasets are expected to be a feature of future AI breakthroughs, with a special emphasis on user-generated material. These developments in NLP and comprehension will improve AI’s capacity to comprehend sentiment, context and minute linguistic nuances with greater accuracy. Recent studies already underscore the progressive role of sentiment analysis, text mining and clustering techniques in refining recommender systems and decision-making strategies within e-commerce platforms (Kauffmann et al., 2019), which could also be further improved with methodology development. Large language models like GPT-4 should develop better contextual awareness, particularly when handling lengthy conversations or complicated textual data, which could further improve the methodology. Domain-specific adaptations will receive a lot of attention, which will increase the models' applicability in specialist disciplines like engineering, law and medicine. Reducing bias in AI will be a key area of focus, with the goal of creating algorithms that are more equitable and morally sound. It is also conceivable that AI will be more closely integrated with other cutting-edge technologies in the future, such as blockchain.

6. Conclusion

The economy is becoming increasingly dependent on the Internet, and the digital economy is reshaping consumers as well as companies. Whilst the changes offer abundant opportunities to companies and consumers, any major change also challenges the established set of relations. The new economy increasingly depends on knowledge and the use of information. Consequently, a new frontier for competition is emerging. Whilst price and quality will remain important, it is increasingly the “perceived” or portrayed image of the product, as seen by (objective or not) users, that will determine the future of the company. Companies must thus always keep in touch with the data about their products and exploit the distilled information to their own advantage.

In conclusion, the paper illustrates how user-generated content can be used to (1) obtain consumer-driven insights into product characteristics, including (comparative) quality, emotional involvement and consumer preferences, (2) study price sensitivity and the perceived value of products to support pricing decisions and (3) prepare comparative analysis. User-generated content can be efficiently used to boost corporate performance by improving the understanding of customer preferences and needs and by informing strategic investment decisions, innovations and marketing strategies. In addition, the insights can be efficiently used in comparative analysis as well as in brand monitoring, reputation management, targeted advertising and business forecasting, among others. In this context, it is important to utilise different data sources, consider their reliability and utilise several different statistical/analytical tools that go beyond just text mining.

Figures

Figure 1

Basic descriptive statistics by product category (star-rating average, average NRC sentiment, average AFINN sentiment and average Liu sentiment)

Figure 2

Correlation between star-ratings, sentiments, sentiments per word, price and no of words*

Figure 3

Prevailing emotion in the reviews of producers P1 and P2, presented as % of all reviews with a specific emotion for P1 and P2

Figure 4

Satisfaction per euro paid for producers P1 and P2*

Table 1

Research objectives and methods

Broader field	Method		Output
Consumer-driven product insights	Content analysis (key-words, topic mining); sentiment and emotions analysis; combination of text-mining and standard statistical analysis	Numerical evaluation of reviews, using sentiment analysis; determination of the prevailing emotion in the text; linking sentiment to the star rating, price and usefulness of review; determination of most common terms for each of the analysed topics; identification of most discussed aspects; determination of key topics in the reviews related to a producer	Product characteristics (most important characteristics, evaluation of product and its characteristics, comparative/competitive characteristics with other products); consumer demography
Price sensitivity and strategic pricing	Sentiment analysis, standard statistical methods		Price-quality ratio (perceived quality of product per unit of price); comparative analysis of “most important” traits by product price categories
Boosting corporate performance	Content analysis (key-words, topic mining); sentiment and emotions analysis; combination of text-mining and standard statistical analysis		Product (characteristics, quality, price) analysis and improvement; comparative product analysis (characteristics, quality, price); competitive analysis

Source(s): Authors’ own

Table 2

Sample description and overall descriptive statistics

	No. of reviews in the analysis	Minimum	Maximum	Mean	Std deviation
General information about the reviews and products
Rating	4,311	1	5	3.87	1.503
Number of words	4,311	1	1,235	86.61	113.723
Average product price ($)	4,311	22.99	3281.57	337.19	563.34
Helpful votes (number)	4,311	0	787	3.79	21.431
Sentiment of the entire review
BING	4,311	−20	28	2.28	3.341
NRC	4,311	−8	23	1.70	2.837
AFINN	4,311	−24	50	4.67	6.056
Liu	4,311	−31	46	2.75	4.322
Average sentiment per word in the review
BING_N	4,311	−0.50	1.00	0.0801	0.15279
AFINN_N	4,311	−1.00	4.00	0.1742	0.38063
NRC_N	4,311	−0.50	1.00	0.0417	0.09976
LIU_N	4,311	−0.50	1.00	0.0843	0.15342

Note(s): BING, NRC, AFINN, LIU refer to sentiment values relying on a specific methodology (e.g. AFINN methodology). BING_N is sentiment value per word and similarly holds for other sentiment calculation methods, with _N denoting calculation per word

Source(s): Authors’ own
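The distinction between the sentiment of the entire review and the sentiment per word (the _N variables) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' code; the lexicon below is a toy dictionary with made-up scores, not the actual AFINN, BING, NRC or Liu lexicon, and the function names are hypothetical.

```python
import re

# Toy lexicon with made-up scores (assumption); NOT a real sentiment lexicon.
TOY_LEXICON = {"great": 3, "love": 3, "good": 2, "problem": -2, "broken": -3}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def review_sentiment(text, lexicon=TOY_LEXICON):
    """Return (total sentiment, sentiment per word) for one review."""
    words = tokenize(text)
    # Words missing from the lexicon contribute zero to the total.
    total = sum(lexicon.get(word, 0) for word in words)
    per_word = total / len(words) if words else 0.0
    return total, per_word


short_review = "Great vacuum, love it"
# A longer review that repeats the praise and adds one complaint.
long_review = "Great vacuum, love it. " * 3 + "The only problem is the battery"

short_total, short_per_word = review_sentiment(short_review)
long_total, long_per_word = review_sentiment(long_review)
print(short_total, round(short_per_word, 3))
print(long_total, round(long_per_word, 3))
```

In this toy case the longer review has the larger total sentiment but the smaller sentiment per word, which is why both measures are reported side by side.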

Table 3

Text reviews with same numerical rating score but different sentiment*

	Numerical rating	AFINN sentiment
This vacuum works surprisingly well on carpets as well as bare floors. I’ve been using a * for my pergo floors without any problems, but the * is a lot more portable. The only problem is that the battery life isn’t the longest, but it is long enough for most homes with bare floors. The power button on the stick does get in the way a lot, I found myself accidentally turning the vacuum on or off just from handling the vacuum unit. The filter (…)	5	−6
I cannot recommend this little machine enough. It is one of THE BEST things I have EVER purchased!!! If you are addicted to *, run, not walk, to buy this machine. ! It’s a dream machine! We toiled over buying another appliance. (…) I was surprised that the one I was looking at (this one) in addition to the , had the most positive reviews of all the machines out there. People were raving about this thing (…) Now I know why people are raving positively about this one. It perfect * every single time. (…) But overall, I LOVE IT! (…) I’m buying one for my in-laws!	5	38

Note(s): * (…) The reviews were shortened, but great attention was placed on keeping the feel of the text. Stars (***) replacing product names. Reviews were not edited, language was kept as written by reviewer

Table 4

The example of vacuum cleaners: average values (sorted by star rating)

Product code	Star rating	No. of reviews	BING	NRC	AFINN	Liu	Star_P	BING_P	AFINN_P	NRC_P	LIU_P	No. of words in review	Price
P2_15	4.77	62	3.35	2.40	6.98	3.84	0.00955	0.00671	0.01397	0.00481	0.00768	123.05	500
P1_5	4.40	40	2.25	1.20	3.88	2.93	0.05366	0.02744	0.04726	0.01463	0.03567	66.38	82
P2_7	4.23	411	3.46	1.86	5.86	4.23	0.03522	0.02883	0.04885	0.01551	0.03524	102.25	120
P2_16	4.12	221	2.14	1.62	4.30	2.38	0.01374	0.00712	0.01434	0.00541	0.00792	57.18	300
P2_8	3.98	192	2.58	1.56	4.65	3.14	0.01705	0.01105	0.01990	0.00668	0.01344	70.65	234
P2_6	3.98	199	1.57	0.82	3.27	1.80	0.01458	0.00576	0.01197	0.00302	0.00661	40.31	273
P2_12	3.90	41	1.93	1.73	4.49	2.27	0.00781	0.00385	0.00898	0.00346	0.00454	72.68	500
P2_9	3.61	466	2.39	1.29	4.33	2.75	0.02125	0.01409	0.02546	0.00761	0.01616	68.02	170
P2_17	3.55	275	1.75	1.46	4.33	2.28	0.01182	0.00584	0.01444	0.00486	0.00760	93.05	300
P2_14	3.39	189	1.15	1.43	2.62	1.61	0.00678	0.00231	0.00524	0.00287	0.00322	79.63	500
P2_18	2.73	52	0.54	0.62	2.77	0.67	0.02117	0.00417	0.02147	0.00477	0.00522	121.38	129
P1_4	2.53	19	1.74	2.68	5.68	2.05	0.00722	0.00496	0.01624	0.00767	0.00586	169.37	350
P1	4.06	1,395	2.58	2.14	5.49	3.24	0.03661	0.01991	0.04020	0.01445	0.02405	109.5	363
P2	3.78	2,916	2.13	1.48	4.28	2.51	0.02372	0.01455	0.02862	0.00963	0.01679	75.7	325

Note(s): P (in column Product code) denotes a code, which replaces product name. Star_P denotes the “number of stars per euro paid”, whilst BING_P and the other columns ending in _P denote units of sentiment per euro paid

Source(s): Authors’ own

Table 5

Most common terms in all reviews of producers P1 and P2* and their vacuum cleaners

P1 vacuum cleaner			P2 vacuum cleaner
Word	Freq. of word	% of all words	Word	Freq. of word	% of all words
vacuum	66	1.12	vacuum	2,348	1.39
***	44	0.75	great	824	0.49
good	26	0.44	use	816	0.48
great	26	0.44	***	660	0.39
use	23	0.39	easy	638	0.38
hose	22	0.37	well	598	0.35
get	21	0.36	like	547	0.32
like	21	0.36	suction	540	0.32
suction	20	0.34	good	528	0.31
canister	18	0.31	floors	499	0.30
power	18	0.31	get	460	0.27
light	17	0.29	battery	452	0.27
batteries	16	0.27	love	449	0.27
cord	16	0.27	clean	445	0.26
vac	16	0.27	hair	406	0.24

Note(s): *brand name hidden

Source(s): Authors’ own
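A term-frequency table of this kind can be produced with a few lines of code. The sketch below is a Python illustration rather than the authors' R workflow; the stop-word list and the three reviews are hypothetical, and it assumes that the percentage is taken over the words remaining after stop-word removal, which the source does not specify.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption).
STOP_WORDS = {"the", "a", "is", "it", "and", "this", "i", "for", "on"}


def term_frequencies(reviews, top_n=3):
    """Return (word, frequency, % of all kept words) for the top_n terms."""
    tokens = []
    for review in reviews:
        words = re.findall(r"[a-z]+", review.lower())
        tokens.extend(word for word in words if word not in STOP_WORDS)
    counts = Counter(tokens)
    total = len(tokens)
    return [
        (word, freq, round(100 * freq / total, 2))
        for word, freq in counts.most_common(top_n)
    ]


# Made-up reviews used only to exercise the function.
toy_reviews = [
    "This vacuum is great",
    "Great suction and the vacuum is easy to use",
    "I love this vacuum",
]

top_terms = term_frequencies(toy_reviews)
for word, freq, share in top_terms:
    print(word, freq, share)
```

Running the same function separately on the reviews of each producer would give two frequency lists that can be compared side by side.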

Table 6

LDA results: identification of main topics in the text*

Main topics discussed in vacuum cleaner reviews
P1 vacuums		P2 vacuums
T1	T2	T1	T2
vacuum	***	vacuum	vacuum
get	use	use	one
hose	work	clean	***
attach	like	floor	just
can	battery	great	suction
good	tool	easy	good
light	clean	well	work

Note(s): *** denotes a product name, P denotes producers 1 and 2, T1 and T2 denote topics 1 and 2

Source(s): Authors’ own

Table 7

Cost of a unit of satisfaction (as measured by star ratings and different sentiment methods)

	Price paid per unit of satisfaction, measured by different methods
Product code	Star rating	BING	NRC	AFINN	Liu	Average price per unit of satisfaction	Price
P2_15	104.8	149.3	208.3	71.6	130.2	132.9	500
P1_5	18.6	36.4	68.3	21.1	28.0	34.5	82
P2_7	28.4	34.7	64.5	20.5	28.4	35.3	120
P2_16	72.8	140.2	185.2	69.8	126.1	118.8	300
P2_8	58.8	90.7	150.0	50.3	74.5	84.9	234
P2_6	68.6	173.9	332.9	83.5	151.7	162.1	273
P2_12	128.2	259.1	289.0	111.4	220.3	201.6	500
P2_9	47.1	71.1	131.8	39.3	61.8	70.2	170
P2_17	84.5	171.4	205.5	69.3	131.6	132.5	300
P2_14	147.5	434.8	349.7	190.8	310.6	286.7	500
P2_18	47.3	238.9	208.1	46.6	192.5	146.7	129
P1_4	138.3	201.1	130.6	61.6	170.7	140.5	350
P1	89.4	140.7	169.6	66.1	112.0	115.6	363
P2	86.0	152.6	219.6	75.9	129.5	132.7	325

Note(s): P (in column Product code) denotes a code, which replaces product name

Source(s): Authors’ own
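The values in Table 7 appear to be the product price divided by the respective average score from Table 4. This reading is inferred from the reported numbers rather than stated in the source, so the Python sketch below should be taken as an assumption; it reproduces the P2_15 row to rounding. The function names are hypothetical.

```python
# Average scores and price for product P2_15, copied from Table 4.
P2_15_SCORES = {
    "Star rating": 4.77,
    "BING": 3.35,
    "NRC": 2.40,
    "AFINN": 6.98,
    "Liu": 3.84,
}
P2_15_PRICE = 500


def price_per_unit(scores, price):
    """Price divided by each score (assumed reading of Table 7).

    Only meaningful for positive scores.
    """
    return {name: price / score for name, score in scores.items()}


def average_price_per_unit(per_unit):
    """Unweighted mean across the satisfaction measures."""
    return sum(per_unit.values()) / len(per_unit)


per_unit = price_per_unit(P2_15_SCORES, P2_15_PRICE)
rounded = {name: round(value, 1) for name, value in per_unit.items()}
average = average_price_per_unit(per_unit)

for name, value in rounded.items():
    print(f"{name}: {value}")
print(f"Average: {average:.2f}")
```

The inverse ratio, score divided by price, gives the per-euro measures (Star_P, BING_P, etc.) reported in Table 4.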

Table 8

Correlation coefficient: for high-involvement products (price measure) the link between price and sentiment is stronger and more negative

		Low price (below average)	High price (above average)
Rating	Pearson Correlation	−0.077^**	−0.393^**
Rating	Sig. (2-tailed)	0.000	0.000
BING	Pearson Correlation	−0.044^**	−0.208^**
BING	Sig. (2-tailed)	0.010	0.000
NRC	Pearson Correlation	−0.023	−0.141^**
NRC	Sig. (2-tailed)	0.180	0.000
AFINN	Pearson Correlation	−0.030	−0.195^**
AFINN	Sig. (2-tailed)	0.079	0.000
Liu	Pearson Correlation	−0.030	−0.200^**
Liu	Sig. (2-tailed)	0.081	0.000
	N	3,408	884

Note(s): **. Correlation is significant at the 0.01 level (2-tailed)

*. Correlation is significant at the 0.05 level (2-tailed)

Source(s): Authors’ own
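The split-sample comparison can be sketched as follows: reviews are divided at the mean price and the Pearson correlation between price and rating is computed within each group. This is a Python illustration with hypothetical (price, star rating) pairs, not the study data, and the treatment of observations exactly at the mean is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_price_group(reviews):
    """Split (price, rating) pairs at the mean price; return r per group."""
    mean_price = sum(price for price, _ in reviews) / len(reviews)
    low = [(p, r) for p, r in reviews if p < mean_price]
    # Assumption: prices equal to the mean are counted as "high".
    high = [(p, r) for p, r in reviews if p >= mean_price]
    return {
        "low": pearson([p for p, _ in low], [r for _, r in low]),
        "high": pearson([p for p, _ in high], [r for _, r in high]),
    }


# Hypothetical (price, star rating) pairs, for illustration only.
toy_reviews = [
    (50, 5), (80, 4), (120, 5), (150, 4),
    (400, 5), (500, 4), (600, 3), (700, 2),
]

result = correlation_by_price_group(toy_reviews)
print({group: round(r, 3) for group, r in result.items()})
```

The same function could be applied to any of the sentiment measures in place of the star rating; significance testing, as reported in Table 8, is not covered by this sketch.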

Table 9

Main results and implications

Research aspect	Method	Result	Implication
Consumer-driven product insights	Content analysis (key-words, topic mining); sentiment and emotions analysis; combination of text-mining and standard statistical analysis	Weak positive correlation between star rating and sentiment in the text; distinction between high and lower involvement products (more expensive); negative relationship between the price and numerical rating; helpful reviews are the longer ones and those with more extreme ratings	Text reviews need to be studied as well, rather than relying only on the star rating. This result is especially important when a consumer is making a choice between different (similarly star rated) products, because in this case the likelihood of reading the reviews increases. If price is taken as an indication of involvement level, higher involvement evaluations are more sentimental and more negative. The relationship is much weaker than in the case of tourism. This indicates that on average consumers are much less involved in appliance purchases. To decide, people consider the extremes, average and longer reviews. The consumer on average reads up to 10 reviews. It is more likely that they will read the longer ones and those with either a high or low star rating
Price sensitivity and strategic pricing	Sentiment analysis, standard statistical methods	Price does not relate directly to quality (as evaluated using different methods); reviews for more expensive products are more critical	Emotional analysis complements the sentiment analysis well and is especially relevant in comparative analysis at both producer and product level. It can help identify brand trust or provide additional information (add explanation to otherwise numerical sentiment) about the quality of the product. Due to the increasing reliance of consumer decision-making on online reviews, the relationship between the sentiment in the text and the price is an indication of producer and product competitiveness
Boosting corporate performance	Content analysis (key-words, topic mining); sentiment and emotions analysis; combination of text-mining and standard statistical analysis	Consumers primarily evaluate functionality of the product; the most common words are either nouns describing the main features of the product, use or handling, or (primarily) positive adjectives; two topics dominate: technical aspect and functionality	Evaluations are beneficial to producers, since they point to the product characteristics consumers care most about. Firms should focus on enhancing functionality and should also stress these qualities in commercials, since consumers primarily care about those

Source(s): Authors’ own

Table A1

Correlation analysis

		Rating	BING_N	AFINN_N	NRC_N	LIU_N	BING	NRC	AFINN	Liu	No of words	Price	Count of helpfulness votes
Rating	Pearson Cor	1	0.337^**	0.288^**	0.235^**	0.346^**	0.412^**	0.204^**	0.395^**	0.363^**	−0.074^**	−0.249^**	−0.027
Rating	Sig. (2-tailed)		0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.071
BING_N	Pearson Cor	0.337^**	1	0.843^**	0.581^**	0.979^**	0.149^**	−0.023	0.085^**	0.083^**	−0.268^**	−0.120^**	−0.071^**
BING_N	Sig. (2-tailed)	0.000		0.000	0.000	0.000	0.000	0.138	0.000	0.000	0.000	0.000	0.000
AFINN_N	Pearson Cor	0.288^**	0.843^**	1	0.603^**	0.832^**	0.045^**	−0.039^*	0.107^**	0.005	−0.241^**	−0.095^**	−0.062^**
AFINN_N	Sig. (2-tailed)	0.000	0.000		0.000	0.000	0.003	0.011	0.000	0.763	0.000	0.000	0.000
NRC_N	Pearson Cor	0.235^**	0.581^**	0.603^**	1	0.563^**	0.094^**	0.196^**	0.097^**	0.053^**	−0.169^**	−0.068^**	−0.043^**
NRC_N	Sig. (2-tailed)	0.000	0.000	0.000		0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.005
LIU_N	Pearson Cor	0.346^**	0.979^**	0.832^**	0.563^**	1	0.163^**	−0.013	0.099^**	0.119^**	−0.261^**	−0.120^**	−0.071^**
LIU_N	Sig. (2-tailed)	0.000	0.000	0.000	0.000		0.000	0.404	0.000	0.000	0.000	0.000	0.000
BING	Pearson Cor	0.412^**	0.149^**	0.045^**	0.094^**	0.163^**	1	0.646^**	0.812^**	0.934^**	0.353^**	−0.172^**	0.078^**
BING	Sig. (2-tailed)	0.000	0.000	0.003	0.000	0.000		0.000	0.000	0.000	0.000	0.000	0.000
NRC	Pearson Cor	0.204^**	−0.023	−0.039^*	0.196^**	−0.013	0.646^**	1	0.639^**	0.650^**	0.502^**	−0.064^**	0.117^**
NRC	Sig. (2-tailed)	0.000	0.138	0.011	0.000	0.404	0.000		0.000	0.000	0.000	0.000	0.000
AFINN	Pearson Cor	0.395^**	0.085^**	0.107^**	0.097^**	0.099^**	0.812^**	0.639^**	1	0.787^**	0.431^**	−0.138^**	0.094^**
AFINN	Sig. (2-tailed)	0.000	0.000	0.000	0.000	0.000	0.000	0.000		0.000	0.000	0.000	0.000
Liu	Pearson Cor	0.363^**	0.083^**	0.005	0.053^**	0.119^**	0.934^**	0.650^**	0.787^**	1	0.475^**	−0.153^**	0.106^**
Liu	Sig. (2-tailed)	0.000	0.000	0.763	0.000	0.000	0.000	0.000	0.000		0.000	0.000	0.000
N_words	Pearson Cor	−0.074^**	−0.268^**	−0.241^**	−0.169^**	−0.261^**	0.353^**	0.502^**	0.431^**	0.475^**	1	0.108^**	0.259^**
N_words	Sig. (2-tailed)	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000		0.000	0.000
Price	Pearson Cor	−0.249^**	−0.120^**	−0.095^**	−0.068^**	−0.120^**	−0.172^**	−0.064^**	−0.138^**	−0.153^**	0.108^**	1	0.047^**
	Sig. (2-tailed)	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000		0.002
	N	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311
Count of helpfulness votes	Pearson Cor	−0.027	−0.071^**	−0.062^**	−0.043^**	−0.071^**	0.078^**	0.117^**	0.094^**	0.106^**	0.259^**	0.047^**	1
	Sig. (2-tailed)	0.071	0.000	0.000	0.005	0.000	0.000	0.000	0.000	0.000	0.000	0.002
	N	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311	4,311

Note(s): **. Correlation is significant at the 0.01 level (2-tailed)

*. Correlation is significant at the 0.05 level (2-tailed)

BING, NRC, AFINN, LIU refer to sentiment values relying on a specific methodology (e.g. AFINN methodology). BING_N is sentiment value per word, similarly holds for other sentiment calculation methods, with _N denoting calculation of sentiment per word

Source(s): Authors’ own

Notes

1.

URL links available upon request.

2.

We will refer to this 1 to 5 rating also as numerical rating due to its very “tangible” nature.

3.

Since sentiment analysis usually sums up the sentiment of all words in the text, the absolute value of the review’s sentiment increases with the review’s word count. To control for this, sentiment per word was also calculated.

4.

But even between these, differences exist due to reliance on different lexicons.

5.

But it can be used as a method to exclude certain products by setting criteria such as “product must have at least 3 stars”, which is also available in “product search engines”.

6.

Of course, this cannot be generalized to all products, especially less-durable items or items of two brands of similar quality (also in terms of reviews), where design would become more relevant in the actual decision-making process of the consumer. This is not to say that individuals do not value a number of other aspects in a product, such as size, colour, etc. (see, e.g. Essoussi and Merunka, 2007; Westland and Meong, 2015).

7.

In some cases, the reviews also address the delivery, transport packaging, online seller, and related elements, which are not directly related to the product itself.

Ethics statement: The authors have no conflict of interest to report

Appendix

References

Amarouche, K., Benbrahim, H. and Kassou, I. (2015), “Product opinion mining for competitive intelligence”, Procedia Computer Science, Vol. 73, pp. 358-365, doi: 10.1016/j.procs.2015.12.004.

Anderson, M. (2014), “88% of consumers trust online reviews as much as personal recommendations”, Search Engine Land, available at: http://searchengineland.com/88-consumers-trust-online-reviews-much-personal-recommendations-195803 (accessed 7 April 2016).

Anderson, J. and Narus, J. (1998), “Business marketing: understand what customers value”, Harvard Business Review, Vol. 76 No. 6, pp. 53-55, 58-65.

Ansary, A. and Nik Hashim, N.M.H. (2018), “Brand image and equity: the mediating role of brand equity drivers and moderating effects of product type and word of mouth”, Review of Managerial Science, Vol. 12 No. 4, pp. 969-1002, doi: 10.1007/s11846-017-0235-2.

Bahtar, A.Z. and Muda, M. (2016), “The impact of user – generated content (UGC) on product reviews towards online purchasing – a conceptual framework”, Procedia Economics and Finance, Vol. 37, pp. 337-342, doi: 10.1016/S2212-5671(16)30134-4.

Blei, D.M. (2012), “Probabilistic topic models”, Communications of the ACM, Vol. 55 No. 4, pp. 77-84, doi: 10.1145/2133806.2133826.

Blijlevens, J., Creusen, M. and Schoormans, J. (2009), “How consumers perceive product appearance: the identification of three product appearance attributes”, International Journal of Design, Vol. 3 No. 3, pp. 27-35.

Chakraborty, U. (2019), “The impact of source credible online reviews on purchase intention: the mediating roles of brand equity dimensions”, Journal of Research in Interactive Marketing, Vol. 13 No. 2, pp. 142-161, doi: 10.1108/JRIM-06-2018-0080.

Chapman, M. (2023), “Is decision science quietly becoming the new data science?”, Medium, 3 August, available at: https://towardsdatascience.com/is-decision-science-quietly-becoming-the-new-data-science-5616a12fa9e8 (accessed 7 August 2023).

Chau, M. and Xu, J. (2012), “Business intelligence in blogs: understanding consumer interactions and communities”, MIS Q, Vol. 36 No. 4, pp. 1189-1216, doi: 10.2307/41703504.

Chen, E. (2011), “Introduction to latent dirichlet allocation”, Edwin Chen, available at: http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/ (accessed 20 October 2014).

Chen, T., Samaranayake, P., Cen, X., Qi, M. and Lan, Y.-C. (2022), “The impact of online reviews on consumers’ purchasing decisions: evidence from an eye-tracking study”, Frontiers in Psychology, Vol. 13, 865702, doi: 10.3389/fpsyg.2022.865702.

Darko, A.P., Liang, D., Xu, Z., Agbodah, K. and Obiora, S. (2023), “A novel multi-attribute decision-making for ranking mobile payment services using online consumer reviews”, Expert Systems with Applications, Vol. 213, 119262, doi: 10.1016/j.eswa.2022.119262.

De Langhe, B., Fernbach, P.M. and Lihtenstein, D.R. (2016), “Navigating by the stars: investigating the actual and perceived validity of online user ratings”, Journal of Consumer Research, Vol. 42 No. 6, pp. 817-833, doi: 10.1093/jcr/ucv047.

Deloitte (2015), The Deloitte Consumer Review Digital Predictions 2015, Deloitte, available at: https://www2.deloitte.com/content/dam/Deloitte/tr/Documents/consumer-business/consumer-review-digital-predictions-2015.pdf

Economist Intelligence Unit (2010), “Greater expectations: keeping pace with customer service demands in Asia pacific”, CFO Innovation ASIA, available at: http://www.cfoinnovation.com/white-paper/1816/greater-expectations-keeping-pace-customer-service-demands-asia-pacific (accessed 12 November 2015).

Essoussi, L.H. and Merunka, D. (2007), “Consumers’ product evaluations in emerging markets: does country of design, country of manufacture, or brand image matter?”, International Marketing Review, Vol. 24 No. 4, pp. 409-426, doi: 10.1108/02651330710760991.

Feinerer, I., Hornik, K., Software, A. and Ghostscript, I.G. (2023), tm: Text Mining Package (0.7-11) [Computer software], available at: https://cran.r-project.org/web/packages/tm/index.html

Fernandes, S., Panda, R., Venkatesh, V.G., Swar, B.N. and Shi, Y. (2022), “Measuring the impact of online reviews on consumer purchase decisions – a scale development study”, Journal of Retailing and Consumer Services, Vol. 68, 103066, doi: 10.1016/j.jretconser.2022.103066.

Gémar, G. and Jiménez-Quintero, J.A. (2015), “Text mining social media for competitive analysis”, Tourism and Management Studies, Vol. 11 No. 1, pp. 84-90.

Grün, B. and Hornik, K. (2011), “Topicmodels : an R package for fitting topic models”, Journal of Statistical Software, Vol. 40, p. 13, doi: 10.18637/jss.v040.i13.

Harika, S., Priyanka, M. and Vidyullatha, P. (2022), “Exploring text mining techniques for business intelligence”, 2022 7th International Conference on Communication and Electronics Systems (ICCES), presented at the 2022 7th International Conference on Communication and Electronics Systems (ICCES), pp. 920-925, doi: 10.1109/ICCES54183.2022.9835940.

He, W., Zha, S. and Li, L. (2013), “Social media competitive analysis and text mining: a case study in the pizza industry”, International Journal of Information Management, Vol. 33 No. 3, pp. 464-472, doi: 10.1016/j.ijinfomgt.2013.01.001.

Hu, M. and Liu, B. (2004), “Mining and summarizing customer reviews”, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, ACM, pp. 168-177, doi: 10.1145/1014052.1014073.

Hwang, Y., Roe, B. and Teisl, M.F. (2006), “Does price signal quality? Strategic implications of price as a signal of quality for the case of genetically modified food”, International Food and Agribusiness Management Review, Vol. 09 No. 01, pp. 93-116.

Ishikiriyama, C.S., Miro, D. and Gomes, C.F.S. (2015), “Text Mining Business Intelligence: a small sample of what words can say”, Procedia Computer Science, Vol. 55, pp. 261-267, doi: 10.1016/j.procs.2015.07.044.

Ismagilova, E., Slade, E.L., Rana, N.P. and Dwivedi, Y.K. (2020), “The effect of electronic word of mouth communications on intention to buy: a meta-analysis”, Information Systems Frontiers, Vol. 22 No. 5, pp. 1203-1226, doi: 10.1007/s10796-019-09924-y.

Kang, M., Sun, B., Liang, T. and Mao, H.-Y. (2022), “A study on the influence of online reviews of new products on consumers’ purchase decisions: an empirical study on JD.com”, Frontiers in Psychology, Vol. 13, 1032304, doi: 10.3389/fpsyg.2022.983060.

Kauffmann, E., Peral, J., Gil, D., Ferrández, A., Sellers, R. and Mora, H. (2019), “Managing marketing decision-making with sentiment analysis: an evaluation of the main product features using text data mining”, sustainability, Vol. 11 No. 15, p. 4235, doi: 10.3390/su11154235.

Ko, A. and Gillani, S. (2020), “A research review and taxonomy development for decision support and business analytics using semantic text mining”, International Journal of Information Technology and Decision Making, Vol. 19 No. 01, pp. 97-126, doi: 10.1142/S0219622019300076.

Koenigsberg, O., Kohli, R. and Montoya, R. (2011), “The design of durable goods”, Marketing Science, Vol. 30 No. 1, pp. 111-122, doi: 10.1287/mksc.1100.0592.

Kokemuller, N. (2016), “What are high involvement purchases?”, available at: http://classroom.synonym.com/high-involvement-purchases-10584.html (accessed 3 May 2016).

Koklic, M.K. and Vida, I. (2009), “A strategic household purchase: consumer house buying behavior”, Managing Global Transitions, Vol. 7 No. 1, pp. 75-96.

Köseoglu, M.A., Mehraliyev, F., Altin, M. and Okumus, F. (2020), “Competitor intelligence and analysis (CIA) model and online reviews: integrating big data text mining with network analysis for strategic analysis”, Tourism Review, Vol. 76 No. 3, pp. 529-552, doi: 10.1108/TR-10-2019-0406.

Kozyrkov, C. (2019), “What is decision intelligence?”, Medium, available at: https://towardsdatascience.com/introduction-to-decision-intelligence-5d147ddab767 (accessed 7 August 2023).

Lackermair, G., Kailer, D. and Kanmaz, K. (2013), “Importance of online product reviews from a consumer’s perspective”, Advances in Economics and Business, Vol. 1 No. 1, pp. 1-5, doi: 10.13189/aeb.2013.010101.

Lancaster, K. (1966), “A new approach to consumer theory”, Journal of Political Economy, Vol. 74 No. 2, pp. 132-157, doi: 10.1086/259131.

Lee, C.S., Cheang, P.Y.S. and Moslehpour, M. (2022), “Predictive analytics in business analytics: decision tree”, Advances in Decision Sciences, Vol. 26 No. 1, pp. 1-29, doi: 10.47654/v26y2022i1p1-29.

Liu, B. (2012), Sentiment Analysis and Opinion Mining, Morgan & Claypool, San Rafael, CA.

Mariani, M.M., Perez-Vega, R. and Wirtz, J. (2022), “AI in marketing, consumer research and psychology: a systematic literature review and research agenda”, Psychology and Marketing, Vol. 39 No. 4, pp. 755-776, doi: 10.1002/mar.21619.

Mohammad, S. (2015), “NRC word-emotion association lexicon”, Saif M. Mohammad, available at: http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm (accessed 16 October 2015).

Mudambi, S.M. and Schuff, D. (2010), “Research note: what makes a helpful online review? A study of customer reviews on Amazon.com”, MIS Quarterly, Vol. 34 No. 1, pp. 185-200, doi: 10.2307/20721420.

Nielsen, F.Å. (2011), “A new ANEW: evaluation of a word list for sentiment analysis in microblogs”, Proceedings of the ESWC2011 Workshop on “Making Sense of Microposts”: Big Things Come in Small Packages, pp. 93-98.

OECD (2013), The Role and Measurement of Quality in Competition Analysis, Organization for Economic Development and Cooperation, Paris.

Oracle (2012), “Seven power lessons on customer experience leaders”, Oracle, available at: http://www.oracle.com/us/corporate/acquisitions/rightnow/seven-power-lessons-wp-1502937.pdf (accessed 7 April 2016).

Park, S.-H., Huh, S.-Y., Oh, W. and Han, S.P. (2012), “A social network-based inference model for validating customer profile data”, MIS Quarterly, Vol. 36 No. 4, pp. 1217-1237, doi: 10.2307/41703505.

Posner, M., Wallace, A. and Borovsky, Z. (2012), “Very basic strategies for interpreting results from the Topic Modeling Tool”, Miriam Posner’s Blog, available at: http://miriamposner.com/blog/very-basic-strategies-for-interpreting-results-from-the-topic-modeling-tool/ (accessed 13 July 2014).

Ranjan, J. and Foropon, C. (2021), “Big data analytics in building the competitive intelligence of organizations”, International Journal of Information Management, Vol. 56, 102231, doi: 10.1016/j.ijinfomgt.2020.102231.

Rodell, J.B., Sabey, T.B. and Rogers, K.M. (2020), “‘Tapping’ into goodwill: enhancing corporate reputation through customer volunteering”, Academy of Management Journal, Vol. 63 No. 6, pp. 1714-1738, doi: 10.5465/amj.2018.0354.

Saura, J.R., Palacios-Marqués, D. and Ribeiro-Soriano, D. (2023), “Digital marketing in SMEs via data-driven strategies: reviewing the current state of research”, Journal of Small Business Management, Vol. 61 No. 3, pp. 1278-1313, doi: 10.1080/00472778.2021.1955127.

Silverstein, M.J. and Fiske, N. (2003), “Luxury for the masses”, Harvard Business Review, Vol. 1 No. 4, pp. 48-121, April, available at: https://hbr.org/2003/04/luxury-for-the-masses (accessed 19 June 2017).

Stepchenkova, S., Kirilenko, A.P. and Morrison, A.M. (2008), “Facilitating content analysis in tourism research”, Journal of Travel Research, Vol. 47 No. 4, pp. 454-469, doi: 10.1177/0047287508326509.

Subrahmanya, S.V.G., Shetty, D.K., Patil, V., Hameed, B.M.Z., Paul, R., Smriti, K., Naik, N. and Somani, B.K. (2022), “The role of data science in healthcare advancements: applications, benefits, and future prospects”, Irish Journal of Medical Science (1971 -), Vol. 191 No. 4, pp. 1473-1483, doi: 10.1007/s11845-021-02730-z.

Torben, R. (2013), “The consumer decision making process has shifted”, Torben Rick, 3 November, available at: http://www.torbenrick.eu/blog/business-improvement/infographic-the-consumer-decision-making-process-has-shifted/ (accessed 7 April 2016).

Trustpilot (2021), “Why do people read reviews? What our research revealed – Trustpilot Business Blog”, Trustpilot, available at: https://business.trustpilot.com/reviews/learn-from-customers/why-do-people-read-reviews-what-our-research-revealed (accessed 7 August 2023).

Verma, D.P.S. and Gupta, S.S. (2004), “Does higher price signal better quality?”, Vikalpa, Vol. 29 No. 2, pp. 67-78, doi: 10.1177/0256090920040206.

Vitouladiti, O. (2014), “Content analysis as a research tool for marketing, management and development strategies in tourism”, The Economies of Balkan and Eastern Europe Countries in the Changed World (EBEEC 2013), Vol. 9, pp. 278-287, doi: 10.1016/S2212-5671(14)00029-X.

Westland, S. and Meong, J.S. (2015), “The relationship between consumer colour preferences and product-colour choices”, Journal of the International Colour Association, No. 14, pp. 47-56.

WRAP (2014), “Switched on to value”, WRAP, available at: http://www.wrap.org.uk/sites/files/wrap/Switched%20on%20to%20Value%2012%202014.pdf (accessed 7 April 2016).

Wu, R. and Qiu, C. (2023), “When Karma strikes back: a model of seller manipulation of consumer reviews in an online marketplace”, Journal of Business Research, Vol. 155, 113316, doi: 10.1016/j.jbusres.2022.113316.

You, Y., Vadakkepatt, G.G. and Joshi, A.M. (2015), “A meta-analysis of electronic word-of-mouth elasticity”, Journal of Marketing, Vol. 79 No. 2, pp. 19-39, doi: 10.1509/jm.14.0169.

Zhang, J., Zheng, W. and Wang, S. (2020), “The study of the effect of online review on purchase behavior: comparing the two research methods”, International Journal of Crowd Science, Vol. 4 No. 1, pp. 73-86, doi: 10.1108/IJCS-10-2019-0027.

Zhao, Y., Yang, S., Narayan, V. and Zhao, Y. (2012), “Modeling consumer learning from online product reviews”, Marketing Science, Vol. 32 No. 1, pp. 153-169, doi: 10.1287/mksc.1120.0755.

Acknowledgements

The results were prepared within the following projects: J5-4575, P5-0128 and V5-2264, co-funded by the Slovenian Research and Innovation Agency.

Corresponding author

Uroš Godnov can be contacted at: uros.godnov@famnit.upr.si
