Introduction

Analyzing consumers’ online purchasing habits can provide many advantages for commercial entities, marketing professionals, and consumers. Classifiers based on decision trees can be applied to predict market trends, specifically for determining when to buy or sell. Decision trees are a transparent and efficient option for machine learning because they split the data on attribute values at each node to arrive at a decision (Samarth 2023; Vaca et al. 2020). Other researchers proposed a system to enhance the shopping experience at fashion retail outlets: it groups together various items and customer profiles encountered online and in physical stores, and it leverages association rule mining to foresee the shopping patterns of new customers (Bellini et al. 2023; Fan et al. 2023).

Formal concept analysis (FCA) (Carpineto and Romano 2004; Ganter and Wille 1999; Poelmans et al. 2013) is a method of data analysis based on lattice theory that has great potential for studying how people behave online. This method offers visualization capabilities to explore the relationships in object-attribute tables given, for example, by customer transactions and their characteristics. Combining it with well-known methods like decision trees and association rule mining has made such studies attractive to a broader audience.

The main objective of this paper is to show what is currently known about this topic and how data analysis tools are used to understand customer behavior and buying patterns online. In this paper, we explore online customer behavior by combining decision trees, GUHA (General Unary Hypotheses Automaton) association rules, and formal concept analysis. We present the findings from these methods together with practical recommendations. The paper is organized as follows. In the “Literature review” section, we review recent studies on analyzing customer behavior with the mentioned data analysis methods. In the “Methodology and dataset description” section, we describe the data, their source and quality, and the methods used, including decision trees, GUHA association rules, and concept lattices from FCA. Our findings are presented in the “Results” section. The paper ends with a discussion, recommendations, and notes on its limitations.

Literature review

This section reviews studies on analyzing consumer behavior using decision trees, association rules, and FCA.

Related studies on decision trees

Decision trees are versatile predictive models widely used in artificial intelligence and machine learning to extract hidden patterns from data. They use a logical, rule-based approach to represent and classify events, aiding prediction. Decision trees are particularly effective for categorizing large data sets (Charbuty and Abdulazeez 2021). The studies below detail the fundamentals of decision trees and their applications and evaluate specific algorithms, data sets, and results. Yawata et al. (2022) improved decision tree accuracy by adding multi-dimensional boundaries. However, the method’s performance on larger, more complex data sets remains untested, and it has yet to be compared with other advanced methods (Yawata et al. 2022).

Decision trees can also be helpful in remote sensing for distinguishing crop types and patterns, demonstrating their versatility with varied data sets (Tariq et al. 2022). Wang et al. (2022) empirically tested a model linking online trust with consumer e-commerce engagement, suggesting that trust and perceived value significantly influence usage intentions. The study also found that women are more likely than men to use e-commerce for online shopping, providing insights for businesses to enhance their e-commerce strategies (Wang et al. 2022). Charandabi and Ghanadiof (2022) highlight trust’s critical role in e-commerce, addressing security and privacy concerns. Despite its importance, defining trust and its effects remains challenging. Their study introduces a Fuzzy Data Envelopment Analysis model to evaluate trust and resilience in online and offline shopping, offering a new perspective on customer behavior in digital markets and suggesting ways to increase customer satisfaction and loyalty (Charandabi and Ghanadiof 2022).

Lastly, decision trees can enhance customer satisfaction by identifying key purchasing factors, allowing businesses to tailor the customer experience and potentially increase sales and loyalty (Samarth 2023). Ganar and Hosein (2022) crafted a composite model to forecast whether a bank’s customer is likely to switch to using online banking services. The authors compare two supervised learning algorithms: Decision Tree and Extreme Gradient Boosting (XGBoost). The purpose is to predict clients’ behavior when transitioning to an online banking platform. The results indicated that combining the K-Modes clustering algorithm and the XGBoost classification model yielded the best test accuracy, a remarkable 96.1% (Ganar and Hosein 2022).

Wen (2023) analyzed customer churn in banks using logistic regression and decision tree models, with the latter proving more precise and unbiased. Key predictors of churn included age, salary, and product usage, with certain customer profiles showing a higher tendency to leave. They recommend that banks refine their systems to retain customers at high risk of churning (Wen 2023). Similarly, Luo (2023) employed a decision tree model to forecast churn in telecom firms, finding that customers with long-term contracts are less prone to churn. Factors such as age, marital status, and financial reliance were linked to increased churn. Luo’s model also offered specific strategies to reduce customer turnover (Luo 2023).

The study by Tundo and Mahardika (2023) introduces a method for forecasting roof tile production in Kebumen by combining a Tsukamoto Fuzzy Inference System with a C4.5 decision tree. This research tackles the issue of production management for tile manufacturers, aiming to align output with customer demand and optimize profits. Traditional systems often result in either excess or insufficient production. The authors propose a computerized forecasting tool that employs Tsukamoto fuzzy logic, which generates rules from the C4.5 decision tree without expert input. The C4.5 algorithm forms these rules by learning from data, capturing conditions that frequently arise. The study’s findings indicate that the model’s forecasts were close to the actual production numbers, with a 29.34% error rate and 70.66% accuracy, as per the Average Forecasting Error Rate. The model proved to be very effective, fulfilling all customer orders. Its predictions were used to determine production volumes or were integrated with inventory levels to meet customer needs fully (Tundo and Mahardika 2023).

Related studies on association rules

Analyzing customer behavior using association rules involves data mining techniques to uncover complex patterns in customer purchases. These patterns provide insights into how different products or services are related from the customer’s point of view. Such information is crucial for companies to fine-tune marketing strategies, decide on product placement, and improve inventory management. Fan et al. (2023) presented a new method to identify association rules for unsafe behavior in various industrial settings using comprehensive data. They applied the Apriori algorithm to explore patterns of hazardous behavior in subway construction, finding significant rules that link the type of work, construction stage, work hours, and unsafe practices (Fan et al. 2023). The authors could have expanded their research to understand why certain stages or times are prone to unsafe actions. Comparing their method with others in representing and analyzing unsafe behavior data could have validated their approach further.

The following study by Husein et al. (2022) discussed the challenges of predicting customer segments in e-commerce using traditional models. They proposed a clustering method to categorize customer data, using the Apriori algorithm to identify common purchasing patterns. Their work demonstrates the effectiveness of the K-Means Clustering technique combined with the RFM model in recognizing patterns of product purchases (Husein et al. 2022). The authors clearly outline the problem, their solution, and the evaluation of their method, contributing to the current understanding of customer segmentation.

The study by Zhou et al. (2020) delves into the enhancement of warehouse order-picking efficiency through the application of big data. The research involved the collection and analysis of data to improve storage organization through the implementation of clustering and association algorithms for item categorization. The refined approach to the conventional ABC storage method resulted in improved accuracy in goods classification. The study concludes that the application of intelligent data analysis can optimize warehouse operations, leading to cost savings without the need for additional storage space. Thus, the study provides valuable insights into the potential of big data to enhance warehouse management and make it more efficient (Zhou et al. 2020).

Xiao and Piao (2022) explored the onset of the big data era and how mobile edge computing’s transformative potential has led to a surge in consumer data production. They highlighted the critical need to analyze this vast data for valuable insights. The study focused on understanding customer groups’ consumption patterns to develop effective marketing strategies, particularly in the telecom industry. It pointed out the growing use of association rules for analyzing customer behavior due to the complex data and algorithms involved. The paper noted that while association rules and mobile computing are potent for identifying patterns in large data sets, they have limits, such as the inability to infer causality (Xiao and Piao 2022).

Stuti et al. (2022) presented a detailed method to extract insights from customer purchasing data to improve product recommendations. Their two-step approach identified product correlations from user transactions and then formulated utility-based association rules to map purchasing trends. These rules were integrated into a recommender system to suggest new products. The study also evaluated the accuracy of these rules against those derived from Frequent Item Set Mining and Improved Utility-Based Mining on an e-commerce platform (Stuti et al. 2022).

Roy (2016) explored the retail industry in India, mainly focusing on the food and grocery segment. The study aimed to assess the association between shopping trip patterns and consumer basket size and identify factors influencing a customer’s choice of store format for purchasing grocery items. The study discusses how internal factors like emotions and external factors like store location influence consumer behavior. Demographic factors, convenience, and merchandise assortment influence grocery store choice behavior. The study found that respondents who purchase items less frequently (fortnightly or monthly) tend to shop at hypermarkets or departmental stores to take advantage of offers and discounts. In contrast, those who shop daily prefer convenience or Kirana stores for the credit facilities and personal relationships they offer. The article concludes that both the purchase pattern and the volume significantly influence the choice of store format for grocery shopping in Kolkata (Roy 2016).

Related studies on FCA

Formal concept analysis is a mathematical approach used for data analysis and knowledge representation, widely utilized in fields like customer segmentation. It is a computational method grounded in the theory of partial orders, which excels at uncovering the structural properties of binary data. FCA is particularly adept at identifying how different subsets within a dataset are interrelated. The method is highly regarded for its applications in various areas, especially bioinformatics, where it is crucial for analyzing and making sense of complex data patterns.

The research by Roscoe et al. (2022) highlights FCA’s role in revealing the complex interplay and dependencies found in gene expression data, thereby contributing significantly to bioinformatics. In gene expression, which sheds light on how genes are activated under different conditions, FCA is an essential tool for a methodical analysis of such data, aiding in disease prediction and understanding vital biological processes (Roscoe et al. 2022).

Customer segmentation involves categorizing a company’s customers into groups based on common characteristics like age, gender, interests, and purchasing patterns. Using the FCA method to investigate customer behavior and their classification represents a new approach to this issue that offers fresh perspectives and methodologies. Rungruang et al. (2024) introduce a novel customer segmentation technique that blends the RFM model with FCA. Their method bridges the gap between complex data science techniques and their practical business applications. This approach segments customers and reveals the underlying patterns in customer data, offering insights that traditional clustering methods do not. They tested their model against standard clustering techniques using a well-known dataset and found it provides marketers with clear, actionable insights for crafting marketing strategies. The paper delves into the RFM model, which categorizes customers based on purchase recency, frequency, and spending, and describes how to score customers using this model. In summary, the research offers a new clustering algorithm that leverages RFM and FCA for practical, real-world business segmentation (Rungruang et al. 2024).

Kwon et al. (2023) delve into mobile app usage patterns, examining them through the lens of media combination. They unveil a new technique called Weighted Formal Concept Analysis (WFCA) to discern standard sets of apps, known as repertoires, and standout apps, or ’killer apps,’ focusing on demographic differences. Their findings highlight that communication and lifestyle apps are universally popular among all demographics. The study employs FCA to map out these app repertoires and to explore the concept of mobile app repertoires more deeply. The authors point out that FCA offers a systematic way to decode the intricate web of app usage patterns across various demographic groups. By integrating weights into FCA, the WFCA method enhances this exploration, providing a detailed picture of user preferences and behaviors (Kwon et al. 2023).

Yang et al. (2019) developed new, eco-friendly financial products for China’s rapidly growing economy. They used a particular method called a fuzzy concept lattice, which organizes data without duplication and emphasizes essential features. This makes the method less complex and improves the evaluation of data. They adapted this method to work better with different data types, created a new algorithm, and showed how it works with green finance data. Their research led to a practical blueprint for creating green financial products (Yang et al. 2019).

Methodology and dataset description

Before beginning any analysis, describing the input data thoroughly is crucial. This initial step provides essential context and insight into the data under scrutiny, enabling the analyst to spot any potential issues or biases and to confirm that the data are appropriate for the chosen analytical methods (Schröer et al. 2021). Moreover, detailing the input data helps peers who may examine the analysis to grasp the origins and characteristics of the data utilized. A clear description of the input data is vital for maintaining the analysis’s validity and precision. The CRISP-DM framework, widely recognized in the industry for data mining, is often used to analyze and interpret consumer trends (Plotnikova et al. 2021).

Preparing data is a crucial and often the lengthiest phase in data mining, essential for obtaining pertinent and insightful outcomes. This preparation involves cleansing, labeling, and structuring unprocessed data, rendering it apt for machine learning and analysis. It also encompasses exploring and charting the data to unearth patterns. Remarkably, data preparation might consume up to 80% of project time, underscoring the need for specialized tools to streamline this phase (Saltz and Krasteva 2022). In our scenario, data preparation commenced with acquiring the data from a company, which served as the primary research site for our applications. Given the extensive dataset with numerous parameters, grasping the data’s essence was a pivotal initial step in our research. The initial data encompassed databases detailing:

  • orders from the pertinent reporting period,

  • customer profiles,

  • product details,

  • goods procurement information.

At the outset, we focused on comprehending the data’s variables and structure. We can categorize the data processing journey into distinct stages:

  • grasping and initial processing of data,

  • verification and validation of data,

  • visualization of data.

In data mining, the CRISP-DM (Cross-Industry Standard Process for Data Mining) is a leading framework used to uncover customer behavior patterns. It includes the following phases (Schröer et al. 2021):

  • Business understanding During this phase, we aimed to grasp the business objectives and requirements of the project. We engaged in frequent virtual meetings with relevant departments, using accurate organizational data to guide the project’s direction and outcomes. However, this critical step is often underestimated in the CRISP-DM process, leading to vague goals or repetitive results. Acknowledging the significance of this phase is essential for successful research.

  • Data understanding A crucial step in the CRISP-DM process is understanding the data (Plotnikova et al. 2021). This phase is closely linked to the first stage, as the data’s nature often sets the research’s scope and possible findings. Our work involved getting to know the datasets, which included detailed customer, order, and product information from 10 different databases. Understanding the relationships and dynamics between these databases was vital for progressing our research. Like the first phase, our in-depth discussions with company department representatives focused on these databases. A solid understanding of the data is necessary to explore customer behavior thoroughly, and skipping this step could undermine the integrity and accuracy of our results. Central to understanding the data was identifying the accuracy of the linkage identifier across each database. Before meeting with company officials, we tried to link the databases using key identifiers like order ID, customer ID, email, entity ID, and product ID. We aimed to create a unified database with all relevant variables for our study. Initially, we used the entity ID as a common identifier, but discussions with company stakeholders revealed that the entity ID varied across databases, making it unreliable for our purposes. Recognizing the importance of accurate data understanding, we combined the databases using multiple identifiers to ensure precise variable alignment in the central database for the following CRISP-DM stages.

  • Data preparation This stage focused on primary data operations such as cleaning, reducing, transforming, integrating, and creating new variables. After thoroughly understanding the data, we reviewed it analytically using descriptive statistics to explore variable distributions, attribute descriptions, and the overall data structure. This detailed examination helped us move smoothly to the next CRISP-DM phase: modeling. With these insights, we improved our data preparation methods, which enhanced the quality of our modeling results. Additionally, during this phase we created new variables such as time of day, day of the week, fiscal quarter, and delivery method (a sketch of such derivations follows this list).

  • Modeling During this stage, we utilized various models to uncover patterns and connections in consumer behavior. This involved choosing the most appropriate model and then assessing its performance.
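To make the derivation of such variables concrete, the following minimal pandas sketch shows how time attributes of an order can be computed from an order timestamp. The column names, the example timestamps, and the morning cut-off are our assumptions for illustration, not the company's actual schema.

```python
import pandas as pd

# Hypothetical excerpt of a raw orders export (column names assumed).
orders = pd.DataFrame({
    "order_ID": [36, 47, 128],
    "created_at": pd.to_datetime(
        ["2020-11-03 08:15", "2021-02-14 19:40", "2019-10-28 12:05"]
    ),
})

# Derived time attributes analogous to those described above.
orders["hour"] = orders["created_at"].dt.hour
orders["day_name"] = orders["created_at"].dt.day_name()
orders["quarter_of_year"] = orders["created_at"].dt.quarter
orders["Q4_period"] = orders["quarter_of_year"].map(
    lambda q: "Q4_yes" if q == 4 else "Q4_no"
)
orders["morning"] = orders["hour"].between(6, 11)  # assumed morning cut-off

print(orders)
```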

Data description and source quality

GymBeam, a fitness e-commerce platform selling nutritional supplements, provided and maintains our dataset, enabling academic researchers to explore crucial aspects of e-shop consumer behavior in Eastern Europe. The anonymized dataset contains 9123 orders (transactions) of sports nutrition, healthy foods, fitness clothing, and accessories. The data span 8 years, from 2014 to 2021.

Our dataset contains 19 categorical attributes with no missing values. In particular, the order_ID attribute is the primary key and uniquely identifies each order. The dataset contains 70 product categories (attribute category_ID), with between 1 and 1,911 orders per category and an average of 130.3 orders. The gross profit category includes three possible classes: A (56.2%), B (23.7%), and C (20.1%). The shipping_service attribute has two categories: delivery and pickup. Time attributes of each order, e.g., day name, hour, and week, are also included in the dataset.

Attribute               No. of categories    Attribute          No. of categories
order_ID                9123                 day_of_month       31
gross_profit_category   3                    month              12
gender                  2                    year               8
country                 8                    hour               24
shipping_service        2                    day_name           7
bestseller              2                    day_part           6
long_term_unavailable   2                    quarter_of_year    4
product_ID              70                   week               53
Q4_period               2                    day_of_year        365
                                             morning            2
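Assuming a flat export of the dataset with the attribute names listed above (the file name is hypothetical), the category counts in this table could be reproduced with a short pandas sketch:

```python
import pandas as pd

# Hypothetical flat export of the 19 categorical attributes.
df = pd.read_csv("gymbeam_orders.csv")  # file name assumed

# Number of distinct categories per attribute, as in the table above.
print(df.nunique().sort_values(ascending=False))

# Shares of the gross profit classes (A/B/C).
print(df["gross_profit_category"].value_counts(normalize=True))
```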

Looking at the relevance of the input data, the company takes great care in how its data are collected. As mentioned several times in the text, the input database contains parameters related to the company’s orders and customers. These raw data were processed during the CRISP-DM process, specifically in the data preparation part. For example,

  • Unification of order status Orders progress through various stages: pending, prepared, shipped, in transit, delivered, and returned, among others. An order is deemed complete when its final status is “delivered.” Focusing only on orders marked as “delivered” and excluding every other order status allows us to accurately identify transactions that reached customers. This approach removes bias from our analysis that could arise from including orders that were never received or were returned.

  • Consolidation of the gender attribute The company uses two approaches to determine the customer’s gender. The first covers the situation where a registered customer places the order; in this case, the customer fills in the gender parameter themselves. When a non-registered user places an order, the company uses an algorithm to predict the customer’s gender with a prediction accuracy between 80% and 95%. However, for a specific group of orders, this value was missing or the prediction accuracy did not reach the required threshold. Such orders were excluded from the final database due to concerns about data bias.

Based on the information on the origin of the data received from the company itself and the procedures we have carried out related to CRISP-DM, we can consider our input data as relevant and ready to enter the modeling phase.

In the following part of this section, we present the selected basic notions of three methods (decision trees, GUHA method, and formal concept analysis), which we applied to our dataset.

Decision trees

In this subsection, we present the classification method of decision trees, which we apply to the classification of orders by quarters of the year or day parts. In classification tasks, the values of input categorical attributes are called categories (or classes). The values of the target attribute are usually called labels.

The classification method of decision trees is based on the recursive division of the training set into two or more parts (Quinlan 1992; Breiman 1996). A mathematical tree structure is created by selecting one input attribute for splitting in each iteration. The splitting attribute is selected using entropy, the Gini index, correlation coefficients, statistical tests, or other criteria. The aim is to increase the homogeneity of the subsets of objects with respect to the values of the target attribute. Each leaf node of the tree (corresponding to a conjunction of logical conditions on the input attributes) is labeled by the most frequent category of the target attribute in the subset of objects. The tree’s construction is completed if only objects of the same target attribute category (class) are present in the tree’s leaves or if a stopping criterion is fulfilled (e.g., the minimal number of objects in a leaf node or the depth of the tree).

From a mathematical point of view, we can consider a set A of input categorical attributes and a target categorical attribute \(b\notin A\). As an illustrative example from our data, we can take the set of input attributes

$$\begin{aligned} A=\{\text{gender, shipping\_service, bestseller}\},\end{aligned}$$

and a target attribute

$$\begin{aligned} b=\text{Q4\_period}.\end{aligned}$$

A set \(L_a\) for each \(a\in A\) expresses the categories of the input attribute, and a set \(L_b\) for b represents the categories of the target attribute. In our example, we have that

$$\begin{aligned} L_{\text{gender}} & = \{\text{woman,man}\}, \\ L_{\text{shipping\_service}} & =\{\text{delivery,pickup}\}, \\ L_{\text{bestseller}} & = \{\text{yes,no}\}, \\ L_{\text{Q4\_period}} & =\{\text{Q4\_yes,Q4\_no}\}. \end{aligned}$$

In general, the input for decision tree construction can be expressed as a training set D such that it holds

$$\begin{aligned} D \subseteq { L_{a_1}\times \ldots \times L_{a_n}} \times L_b \end{aligned}$$

for a set \(A=\{a_1,\ldots ,a_n\}\) of input attributes and a target attribute b.

Given a training set, we describe a general method for constructing a decision tree in Algorithm 1.

Algorithm 1 \(\text{Decision\_tree\_construction}(D, \text{minProportion})\)
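Since the pseudocode of Algorithm 1 is given only as a figure, the following Python sketch reconstructs its logic from the description above. The exact stopping rule, the tie-breaking, and the mapping to the algorithm's line numbers are our assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of the class distribution in a subset of objects."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def avg_entropy(rows, attr, target):
    """Weighted average entropy after splitting rows by the categories of attr."""
    return sum(
        len(sub) / len(rows) * entropy([r[target] for r in sub])
        for v in {r[attr] for r in rows}
        for sub in [[r for r in rows if r[attr] == v]]
    )

def build_tree(rows, attrs, target, min_proportion):
    """rows: list of dicts; attrs: input attribute names; target: label name."""
    labels = [r[target] for r in rows]
    majority, count = Counter(labels).most_common(1)[0]
    # Line 1: stop when the subset is homogeneous enough (inequality assumed).
    if count / len(rows) >= min_proportion or not attrs:
        return {"label": majority}
    # Line 4: evaluate each input attribute by the splitting criterion.
    best = min(attrs, key=lambda a: avg_entropy(rows, a, target))
    # Lines 9-11: split by the best attribute and recurse on each subset.
    return {
        "split": best,
        "children": {
            v: build_tree(
                [r for r in rows if r[best] == v],
                [a for a in attrs if a != best],
                target,
                min_proportion,
            )
            for v in {r[best] for r in rows}
        },
    }
```

A call such as `build_tree(D, ["gender", "shipping_service", "bestseller"], "Q4_period", 0.8)` then follows the recursion traced in the running example below.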

We will explain Algorithm 1 in more detail to help readers understand its application in our research. In the following example, consider the training set D and the value 0.8 of the minProportion input parameter of Algorithm 1.

D            Gender   Shipping_service   Bestseller   Q4_period
Order #36    Man      Pickup             No           Q4_no
Order #47    Woman    Delivery           No           Q4_no
Order #128   Man      Delivery           Yes          Q4_no
Order #151   Woman    Delivery           Yes          Q4_yes
Order #160   Man      Delivery           Yes          Q4_yes
Order #199   Woman    Pickup             Yes          Q4_yes
Order #401   Man      Pickup             Yes          Q4_yes

We have seven orders (transactions) in our example of the training set D. In the first line of Algorithm 1, we need to check the proportion of orders in the Q4 period. Let \(y\in L_b\), D be the training set, and \(D_y\) be the set of all objects from D with category y. The mapping

$$\begin{aligned} t_{y} = \frac{|D_y|}{|D|} = \frac{|\{(x_1, \ldots , x_n, y)\in D : (x_1, \ldots , x_n) \in L_{a_1} \times \ldots \times L_{a_n}\}|}{|D|} \end{aligned}$$

corresponds to the proportion of objects from D with target attribute category y. In our example, we have that

$$\begin{aligned} & |D| =7,\\ & |D_{\text{Q4\_yes}}|=|\{\text{order} \#151, \text{order} \#160, \text{order} \#199, \text{order} \#401\}|=4,\\ & |D_{\text{Q4\_no}}|=|\{\text{order} \#36, \text{order} \#47, \text{order} \#128\}|=3. \end{aligned}$$

Thus, we have

$$\begin{aligned} t_{\text {Q4\_yes}} = \dfrac{4}{7}=0.57, \qquad \qquad t_{\text {Q4\_no}} = \dfrac{3}{7}=0.43. \end{aligned}$$

Since 57% of orders are in Q4 period and 43% are not, the termination condition is not fulfilled (\(\text{minProportion}=0.8>0.57\)), and we continue with line 4 of Algorithm 1.

In the fourth line of Algorithm 1, we compute the quality of each input attribute by splitting criterion (e.g., entropy or Gini index). These measures describe the homogeneity in the subset of objects regarding the target attribute. A smaller entropy or Gini index value corresponds to higher homogeneity in the given subset of objects.

First, we will formally describe the evaluation of the quality of input attributes by entropy or Gini index for a training set D. For a selected input attribute \(a\in A\) and its category \(z\in L_a\), we can define a subset \(D^{a,z}\) of a training set D such that it includes all objects of category z of attribute a. For example,

$$\begin{aligned} D^{\text{bestseller,yes}} =\{\text{order} \#128, \text{order} \#151, \text{order} \#160, \text{order} \#199, \text{order} \#401\} \end{aligned}$$

and

$$\begin{aligned} D^{\text{bestseller,no}} =\{\text{order} \#36, \text{order} \#47\}. \end{aligned}$$

Analogously, we can define a mapping \(t_y^{a,z}\) for the proportion of objects from \(D^{a,z}\) with the target attribute category y.

Let \(y_i\in L_b\) for all \(i\in \{1,\dots ,|L_b| \}\). A tuple

$$\begin{aligned} {\mathcal {T}}^{D^{a,z}} =\Bigl (t_{y_1}^{a,z},\dots , t_{y_{|L_b|}}^{a,z}\Bigr ) \end{aligned}$$

corresponds to a discrete probability distribution for a subset \(D^{a,z}\) of a training set D, whereby

$$\begin{aligned} \sum _{y\in L_b} t_y^{a,z}=1. \end{aligned}$$

In our example, we have that \(L_b=\{\text {Q4\_yes}, \text {Q4\_no}\}\) and

$$\begin{aligned} t_{\text {Q4\_yes}}^{\text {bestseller,yes}}=\dfrac{4}{5}, \qquad \qquad t_{\text {Q4\_no}}^{\text {bestseller,yes}}=\dfrac{1}{5}. \end{aligned}$$

Thus,

$$\begin{aligned} {\mathcal {T}}^{D^{\text {bestseller,yes}}} =\Bigl (\dfrac{4}{5}, \dfrac{1}{5}\Bigr ). \end{aligned}$$

Now, for a given \({\mathcal {T}}^{D^{a,z}}\), the entropy is defined by the function

$$\begin{aligned} \text{entropy}({\mathcal {T}}^{D^{a,z}}) = - \sum _{y\in L_b: t_y^{a,z}\ne 0} t_y^{a,z} \text{log}_2 t_y^{a,z}, \end{aligned}$$

and the Gini index is defined by the function

$$\begin{aligned} \text{gini}({\mathcal {T}}^{D^{a,z}}) = 1 - \sum _{y\in L_b} (t_y^{a,z})^2. \end{aligned}$$

We will explain the computation of entropy and the overall quality of attribute bestseller in our running example. We have that

$$\begin{aligned}\text{entropy}\Bigl ({\mathcal {T}}^{D^{\text {bestseller,yes}}}\Bigr ) & = -\Biggl (\dfrac{4}{5}\text{log}_2\dfrac{4}{5}+\dfrac{1}{5}\text{log}_2\dfrac{1}{5} \Biggr )=0.722, \\ \text{entropy}\Bigl ({\mathcal {T}}^{D^{\text {bestseller,no}}}\Bigr ) &= -\Biggl (\dfrac{2}{2}\text{log}_2\dfrac{2}{2} \Biggr )=0. \end{aligned}$$

To evaluate the overall quality of the bestseller attribute, the weighted average of the previous two values of entropy will be computed by

$$\begin{aligned} \text {entropy\_average}(\text {bestseller})=\dfrac{5}{ 7}\cdot 0.722+\dfrac{2}{7}\cdot 0=0.516 \end{aligned}$$

since five of the seven orders are bestsellers and two are not.

Analogously, we can evaluate other input attributes in our running example. Thus, we have that

$$\begin{aligned} \text {entropy\_average}(\text {gender})&=\dfrac{4}{7}\cdot 1+\dfrac{3}{7}\cdot 0.918=0.965, \\ \text {entropy\_average}(\text {shipping\_service})&=\dfrac{4}{7}\cdot 1+\dfrac{3}{7}\cdot 0.918=0.965. \end{aligned}$$

The bestseller attribute obtained the smallest value of the weighted average of entropies (i.e., the highest homogeneity in the subsets of objects). Thus, we have \(a^{\ast}=\text {bestseller}\) in lines 7 and 8 of Algorithm 1. Moreover, in lines 9–11, we split the root of our decision tree into two nodes by the bestseller attribute. We recursively call the method for decision tree construction on these two subsets of orders (i.e., bestsellers and non-bestsellers).
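These weighted average entropies can be verified with a few lines of Python; the running example is hard-coded below for brevity.

```python
from collections import Counter
from math import log2

# The seven orders of the training set D: (gender, shipping_service, bestseller, Q4_period).
D = [
    ("man",   "pickup",   "no",  "Q4_no"),   # order #36
    ("woman", "delivery", "no",  "Q4_no"),   # order #47
    ("man",   "delivery", "yes", "Q4_no"),   # order #128
    ("woman", "delivery", "yes", "Q4_yes"),  # order #151
    ("man",   "delivery", "yes", "Q4_yes"),  # order #160
    ("woman", "pickup",   "yes", "Q4_yes"),  # order #199
    ("man",   "pickup",   "yes", "Q4_yes"),  # order #401
]
COLUMNS = {"gender": 0, "shipping_service": 1, "bestseller": 2}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def entropy_average(attr):
    i = COLUMNS[attr]
    return sum(
        len(sub) / len(D) * entropy(sub)
        for v in {row[i] for row in D}
        for sub in [[row[3] for row in D if row[i] == v]]
    )

for a in COLUMNS:
    print(a, round(entropy_average(a), 3))
# gender 0.965, shipping_service 0.965, bestseller 0.516
```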

The decision tree in our running example (Fig. 1) contains seven nodes, whereby the most important input attributes (in three iterations of the algorithm) are the bestseller, the gender, and the shipping service, respectively.

Fig. 1

Decision tree in our running example [Node 1 represents the root of the decision tree with all orders (transactions) from the training set D. Node 2 and Node 3 represent the internal nodes of the decision tree after the first and the second recursive call of Algorithm 1. Terminal nodes 1–4 represent the leaves of the decision tree after satisfying the stopping criteria. Each node highlights the mode of the target attribute and the absolute (or relative) frequencies of the categories (classes). The edges include the values of the splitting attribute]

Decision trees may have issues with overfitting, bias toward attributes with many categories, or sensitivity to small changes in the data. We address these limitations and discuss potential solutions in the discussion and limitations part of our paper.

In our paper, we apply the construction of a decision tree (Algorithm 1) for the classification of 9123 orders having the set of input attributes \(A=\{\text{gross\_profit\_category,gender,country},\text{shipping\_service},\text{bestseller}, \text{long\_term\_unavailable}\},\) and the target attribute \(b=\text{Q4\_period}\) such that

$$\begin{aligned} L_{\text {gross\_profit\_category}}&=\{\text{A,B,C}\}, \\ L_{\text{gender}}&=\{\text{woman,man}\}, \\ L_{\text{country}}&=\{\text{BG,CZ,DE,HR,HU,RO,SK,UA}\}, \\ L_{\text{shipping\_service}}&=\{\text{delivery,pickup}\}, \\ L_{\text{bestseller}}&=\{\text{yes,no}\}, \\ L_{\text{long\_term\_unavailable}}&=\{\text{yes,no}\}, \\ L_{\text{Q4\_period}}&=\{\text{Q4\_yes,Q4\_no}\}. \end{aligned}$$
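In our experiments, the trees were built in the Minitab tool (see the Results section). For readers who wish to reproduce a comparable classification with open-source software, a hedged scikit-learn sketch might look as follows; the file name, the one-hot encoding, and the hyperparameters are our assumptions, not the paper's exact setup.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("gymbeam_orders.csv")  # hypothetical file with the 9123 orders

inputs = ["gross_profit_category", "gender", "country",
          "shipping_service", "bestseller", "long_term_unavailable"]
X = pd.get_dummies(df[inputs])   # one-hot encoding of the categorical inputs
y = df["Q4_period"]

# Entropy splitting criterion, as in the paper; leaf size is an assumed stopping rule.
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=50)
print(cross_val_score(tree, X, y, cv=5).mean())  # 5-fold cross-validation accuracy
```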

GUHA association rules

GUHA (an acronym for General Unary Hypotheses Automaton) is a method for generating association rules between attributes in data. GUHA methods have been successfully applied in the analysis of patient examinations (e.g., for attributes of cholesterol, weight, height, triglycerides, and others), the analysis of forest damage (e.g., the effects of sulfur or chlorine on needle loss of spruce), prediction of highway travel time, discovery of mutagens, determination of suitable markers for the prediction of bleeding in patients with chronic lymphoblastic leukemia, and the analysis of data about epileptic patients (Hájek et al. 2010).

In comparison with the Apriori algorithm (Agrawal et al. 1993), the GUHA method provides a much more general set of association rules. In particular, one can generate association rules based on different types of associations between the attributes and different logical formulas. The GUHA method is based on so-called interest measures or quantifiers (for example, p-implication), which we describe in the following.

Consider a set of attributes A and a value \(0<p \le 1\). Then, a p-implication over A is an expression

$$\begin{aligned} Z \Rightarrow _p W, \end{aligned}$$

whereby Z is an antecedent, W is a consequent, and \(Z,W\subseteq A.\) The p-implication is true if at least \(100 \cdot p\%\) of the objects satisfying Z fulfill W as well.

             Gender   Shipping_service   Bestseller   Period
Order #36    Man      Pickup             No           Q4_no
Order #47    Woman    Delivery           No           Q4_no
Order #128   Man      Delivery           Yes          Q4_no
Order #151   Woman    Delivery           Yes          Q4_yes
Order #160   Man      Delivery           Yes          Q4_yes
Order #199   Woman    Pickup             Yes          Q4_yes

In our example, a 1-implication can be illustrated by the expression \(\{\text {woman, bestseller\_yes}\}\Rightarrow _{1}\{\text {Q4\_yes}\}\), which means that women’s orders of bestsellers were placed in the fourth quarter of the year. Moreover, \(p=1\) means the rule holds for 100% of the orders satisfying the antecedent.

Now, we describe how to determine whether p-implication is true in the dataset. Let D be a training set. Thus, for two binary attributes \(a_1, a_2\in A\), we can construct the contingency table, whereby

  • r is the number of objects of a training set D satisfying both \(a_1\) and \(a_2\),

  • s is the number of objects of a training set D satisfying both \(a_1\) and \(\lnot a_2\),

  • t is the number of objects of a training set D satisfying both \(\lnot a_1\) and \(a_2\),

  • u is the number of objects of a training set D satisfying both \(\lnot a_1\) and \(\lnot a_2.\)

D               \(a_2\)   \(\lnot a_2\)
\(a_1\)         r         s
\(\lnot a_1\)   t         u

In our running example, we can take \(a_1=\text {bestseller}\) and \(a_2=\text {Q4\_period}\). Thus, we obtain

                 Q4_yes   Q4_no
bestseller_yes   3        1
bestseller_no    0        2

In general, p-implication

$$\begin{aligned} \{a_1\} \Rightarrow _p \{a_2\} \end{aligned}$$

is true in D if

$$\begin{aligned} \dfrac{r}{r+s} \ge p. \end{aligned}$$

In our running example, we have that 0.75-implication

$$\begin{aligned} \{\text {bestseller\_yes}\} \Rightarrow _{0.75} \{\text {Q4\_yes}\} \end{aligned}$$

is true in D since

$$\begin{aligned} \dfrac{3}{3+1} \ge 0.75. \end{aligned}$$

However, 1-implication

$$\begin{aligned} \{\text {bestseller\_yes}\} \Rightarrow _{1} \{\text {Q4\_yes}\} \end{aligned}$$

is not true in D since

$$\begin{aligned} \dfrac{3}{3+1} \not \ge 1. \end{aligned}$$
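The truth of a p-implication thus reduces to a single inequality over the contingency counts; a minimal Python check of the two examples above might look as follows.

```python
def p_implication_holds(r: int, s: int, p: float) -> bool:
    """{a1} =>_p {a2} is true iff r / (r + s) >= p."""
    return r / (r + s) >= p

# Contingency counts of {bestseller_yes} => {Q4_yes} in the running example.
r, s = 3, 1
print(p_implication_holds(r, s, 0.75))  # True:  3/4 >= 0.75
print(p_implication_holds(r, s, 1.0))   # False: 3/4 <  1
```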

Generally, p-implications can be seen as association rules in the sense of Agrawal et al. (1993) with confidence p. To explain how GUHA association rules differ from other association rule mining techniques and to understand their applicability and potential advantages, we present additional types of rules that can be generated by the GUHA method. In particular, we can use GUHA to generate the so-called double p-implication over a set of attributes A. The double p-implication over A is a formula

$$\begin{aligned} Z \Leftrightarrow _p W, \end{aligned}$$

that is true if at least \(100\cdot p\%\) of objects (e.g., customers) satisfying W or Z fulfill W and Z simultaneously. Double p-implication

$$\begin{aligned} \{a_1\} \Leftrightarrow _p \{a_2\} \end{aligned}$$

is true in D if

$$\begin{aligned} \dfrac{r}{r+s+t} \ge p. \end{aligned}$$

In our running example, we have that double 0.75-implication

$$\begin{aligned} \{\text {bestseller\_yes}\} \Leftrightarrow _{0.75} \{\text {Q4\_yes}\} \end{aligned}$$

is true in D since

$$\begin{aligned} \dfrac{3}{3+1+0} \ge 0.75. \end{aligned}$$

However, double 1-implication

$$\begin{aligned} \{\text {bestseller\_yes}\} \Leftrightarrow _{1} \{\text {Q4\_yes}\} \end{aligned}$$

is not true in D since

$$\begin{aligned} \dfrac{3}{3+1+0} \not \ge 1. \end{aligned}$$

The third possibility is constructing p-equivalence over a set of attributes A. In particular, p-equivalence over A is a formula

$$\begin{aligned} Z\equiv _p W, \end{aligned}$$

that is true if at least \(100 \cdot p\%\) of all objects (e.g., customers) obtain the same truth degree for Z and W. Thus, p-equivalence

$$\begin{aligned} \{a_1\} \equiv _p \{a_2\} \end{aligned}$$

is true in D if

$$\begin{aligned} \dfrac{r+u}{r+s+t+u} \ge p. \end{aligned}$$

In our running example, we have that 0.75-equivalence

$$\begin{aligned} \{\text {bestseller\_yes}\} \equiv _{0.75} \{\text {Q4\_yes}\} \end{aligned}$$

is true in D since

$$\begin{aligned} \dfrac{3+2}{3+1+0+2} \ge 0.75. \end{aligned}$$

However, 1-equivalence

$$\begin{aligned} \{\text {bestseller\_yes}\} \equiv _{1} \{\text {Q4\_yes}\} \end{aligned}$$

is not true in D since

$$\begin{aligned} \dfrac{3+2}{3+1+0+2} \not \ge 1. \end{aligned}$$
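The checks for double p-implication and p-equivalence differ from the previous one only in the inequality used, as this small Python sketch of the examples above shows.

```python
def double_p_implication_holds(r: int, s: int, t: int, p: float) -> bool:
    """{a1} <=>_p {a2} is true iff r / (r + s + t) >= p."""
    return r / (r + s + t) >= p

def p_equivalence_holds(r: int, s: int, t: int, u: int, p: float) -> bool:
    """{a1} ==_p {a2} is true iff (r + u) / (r + s + t + u) >= p."""
    return (r + u) / (r + s + t + u) >= p

# Contingency counts of the running example.
r, s, t, u = 3, 1, 0, 2
print(double_p_implication_holds(r, s, t, 0.75))  # True:  3/4 >= 0.75
print(double_p_implication_holds(r, s, t, 1.0))   # False
print(p_equivalence_holds(r, s, t, u, 0.75))      # True:  5/6 >= 0.75
print(p_equivalence_holds(r, s, t, u, 1.0))       # False
```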

The other GUHA measures include, for example, average dependence, simple deviation, Fisher quantifier, or above-average dependence (Hájek et al. 2010).

We explained the process of evaluating p-implications for two binary attributes. Nevertheless, it can be generalized to more than two attributes. In this case, r is the number of objects of a training set D satisfying all attributes from the antecedent and all attributes from the consequent. The same holds for computing s, t, and u in the case of more than two attributes. Moreover, we can generate the contingency tables for categorical attributes by transforming them into binary attributes.

In our running example, we can take the bestseller, shipping service, and Q4 period attributes. Thus, we obtain

                                     Q4_yes   Q4_no
bestseller_yes \(\wedge\) delivery   2        1

In our running example, we have that 0.6-implication

$$\begin{aligned} \{\text {bestseller\_yes, delivery}\} \Rightarrow _{0.6} \{\text {Q4\_yes}\} \end{aligned}$$

is true in D since

$$\begin{aligned} \dfrac{2}{2+1} \ge 0.6. \end{aligned}$$

However, 0.75-implication

$$\begin{aligned} \{\text {bestseller\_yes, delivery}\} \Rightarrow _{0.75} \{\text {Q4\_yes}\} \end{aligned}$$

is not true in D since

$$\begin{aligned} \dfrac{2}{2+1} \not \ge 0.75. \end{aligned}$$

In addition, the ratio of the number of objects in D satisfying the attributes from both antecedent and consequent to the number of all objects in D is called the support of the rule. In our example, the support of the 0.6-implication

$$\begin{aligned} \{\text {bestseller\_yes, delivery}\} \Rightarrow _{0.6} \{\text {Q4\_yes}\} \end{aligned}$$

is 0.33 since we have two orders satisfying this rule out of all six orders.
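Putting the pieces together, the confidence and support of a multi-attribute rule can be computed directly from the six-order example table. The following Python sketch (a toy implementation for illustration, not the GUHA software itself) reproduces the values above.

```python
# The six orders of the GUHA running example.
ORDERS = [
    {"gender": "man",   "shipping_service": "pickup",   "bestseller": "no",  "period": "Q4_no"},   # #36
    {"gender": "woman", "shipping_service": "delivery", "bestseller": "no",  "period": "Q4_no"},   # #47
    {"gender": "man",   "shipping_service": "delivery", "bestseller": "yes", "period": "Q4_no"},   # #128
    {"gender": "woman", "shipping_service": "delivery", "bestseller": "yes", "period": "Q4_yes"},  # #151
    {"gender": "man",   "shipping_service": "delivery", "bestseller": "yes", "period": "Q4_yes"},  # #160
    {"gender": "woman", "shipping_service": "pickup",   "bestseller": "yes", "period": "Q4_yes"},  # #199
]

def satisfies(order, conditions):
    """conditions: mapping attribute -> required category."""
    return all(order[a] == v for a, v in conditions.items())

def confidence_and_support(antecedent, consequent, data):
    r = sum(satisfies(o, antecedent) and satisfies(o, consequent) for o in data)
    s = sum(satisfies(o, antecedent) and not satisfies(o, consequent) for o in data)
    return r / (r + s), r / len(data)

conf, supp = confidence_and_support(
    {"bestseller": "yes", "shipping_service": "delivery"}, {"period": "Q4_yes"}, ORDERS
)
print(round(conf, 2), conf >= 0.6)  # 0.67 True (the 0.6-implication holds)
print(round(supp, 2))               # 0.33 (two of six orders satisfy the rule)
```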

In this paper, we will generate GUHA association rules with four categorical attributes of shipping_service, gender, gross_profit_category, and category_ID in the antecedent and three categorical attributes of day_part, quarter_of_year, and day_name in the consequent.

Formal concept analysis

Regarding data analysis methods, FCA is a biclustering method based on simultaneously clustering the rows and columns of a data table. It provides powerful computational and visualization techniques for solving various issues of supervised and unsupervised learning tasks (Ganter and Wille 1999). The connections of FCA with machine learning methods and their potential applications in text mining, biology, medicine, or bioinformatics are reviewed by Carpineto and Romano (2004) and Poelmans et al. (2013).

We will describe the application of FCA with an example given by a small subset of our data. As the input of this method, we consider six fitness store orders described by five binary attributes (woman, man, delivery, pickup, and bestseller). Each order (or its attribute) can be represented by one row (or column) of the object-attribute table.

             Woman   Man   Delivery   Pickup   Bestseller
Order #11            ×     ×                   ×
Order #23            ×     ×
Order #39            ×               ×
Order #55            ×               ×
Order #135   ×                       ×
Order #156   ×                       ×

Note that such a cross table is generally rectangular: the rows represent objects and the columns correspond to attributes.

We can generally describe the cross object-attribute table by defining a notion of the so-called formal context.

Definition 1

Consider the non-empty sets B, A, and a crisp binary relation \(I\subseteq B\times A\). A triple \(\langle B,A,I \rangle\) is called a formal context. The elements of the sets B, A are called objects and attributes, respectively. A relation I is called an incidence relation.

In our example, we have that

$$\begin{aligned} B=\{\text{order} \#11,\text{order} \#23,\text{order} \#39,\text{order} \#55,\text{order} \#135,\text{order} \#156\} \end{aligned}$$

and

$$\begin{aligned} A=\{\text{woman,man,delivery,pickup,bestseller} \}. \end{aligned}$$

Moreover, it holds that

$$\begin{aligned} \langle \text{order} \#11,\text{woman}\rangle&\notin I,\\ \langle \text{order} \#11,\text{pickup}\rangle&\notin I, \end{aligned}$$

but

$$\begin{aligned} \langle \text{order} \#11,\text{man}\rangle&\in I,\\ \langle \text{order} \#11,\text{delivery}\rangle&\in I,\\ \langle \text{order} \#11,\text{bestseller}\rangle&\in I. \end{aligned}$$

Regarding the formal context of fitness shop orders, we aim to discover and visualize meaningful biclusters in our data. For example, we may be interested only in the maximal subset of orders with the delivery method and bestsellers. For such reasoning, we will use two natural and symmetric mappings that assign the common attributes to a given subset of objects and vice versa.

Definition 2

Let \(\langle B,A,I \rangle\) be a formal context. Consider the subsets \(X \in \text{P}(B)\), \(Y \in \text{P}(A)\) of power sets of objects and attributes, respectively. Two mappings \(\nearrow :\text{P}(B)\rightarrow \text{P}(A)\) and \(\swarrow :\text{P}(A)\rightarrow \text{P}(B)\) such that

$$\begin{aligned} \nearrow (X)&=X^\nearrow =\{y\in A:(\forall x \in X)\, \langle x,y \rangle \in I\}, \\ \swarrow (Y)&=Y^\swarrow =\{x\in B:(\forall y \in Y)\, \langle x,y \rangle \in I\} \end{aligned}$$

are called concept-forming operators of a formal context \(\langle B,A,I \rangle\).

In our example, it holds

$$\begin{aligned} \nearrow (\{\text{order} \#23\} )=\{ \text{man,delivery}\}, \end{aligned}$$

and

$$\begin{aligned} \swarrow (\{\text{man,delivery}\} )=\{ \text{order} \#11,\text{order} \#23 \}. \end{aligned}$$

The mapping \(\nearrow\) assigns the common attributes of selected fitness shop orders (e.g., man and delivery method to the order #23). Symmetrically, the mapping \(\swarrow\) assigns the subset of all fitness shop orders to the selected attributes (e.g., man and delivery method).

The previous definition of concept-forming operators provides a way to construct the so-called formal concepts (which can be seen as the maximal biclusters from the clustering point of view).

Definition 3

Consider a formal context \(\langle B,A,I \rangle\) and the concept-forming operators \(\nearrow\), \(\swarrow\). Consider the subsets of objects and attributes \(X \in \text{P}(B)\), \(Y \in \text{P}(A)\). A pair \(\langle X,Y\rangle\) such that it holds \(\nearrow (X)=Y\), \(\swarrow (Y)=X\) is called a formal concept of \(\langle B,A,I \rangle\).

In our running example, we can take

$$\begin{aligned} X=\{ \text{order} \#11,\text{order} \#23 \} \end{aligned}$$

and

$$\begin{aligned} Y=\{ \text{man,delivery} \}. \end{aligned}$$

We have that

$$\begin{aligned} \nearrow (X)=\nearrow (\{\text{order} \#11,\text{order} \#23\} )=\{ \text{man,delivery} \}=Y \end{aligned}$$

and

$$\begin{aligned} \swarrow (Y)=\swarrow (\{\text{man,delivery}\} )=\{ \text{order} \#11,\text{order} \#23 \}=X. \end{aligned}$$

Thus, a pair

$$\begin{aligned} \langle \{ \text{order} \#11,\text{order} \#23\}, \{\text{man,delivery}\}\rangle \end{aligned}$$

forms a formal concept.

We can imagine a formal concept as a bi-cluster that gathers a closed subset of objects and a closed subset of attributes. Another example of a formal concept can include the maximal subset of women who ordered with the pickup method.

In general, the set of all formal concepts (biclusters) can be expressed by a set

$$\begin{aligned} \{\langle X,Y\rangle \in \text{P}(B)\times \text{P}(A): \nearrow (X)=Y , \swarrow (Y)=X \}. \end{aligned}$$

In our running example, we obtain the following eight formal concepts:

$$\begin{aligned} & \langle \emptyset , \{\text{woman,man,delivery,pickup,bestseller}\}\rangle , \\ & \langle \{ \text{order} \#11\}, \{\text{man,delivery,bestseller} \}\rangle , \\ & \langle \{ \text{order} \#11,\text{order} \#23\}, \{\text{man,delivery}\}\rangle , \\ & \langle \{ \text{order} \#39,\text{order} \#55\}, \{\text{man,pickup}\}\rangle , \\ & \langle \{ \text{order} \#135,\text{order} \#156\}, \{\text{woman,pickup}\}\rangle , \\ & \langle \{ \text{order} \#39,\text{order} \#55,\text{order} \#135,\text{order} \#156\}, \{ \text{pickup}\}\rangle , \\ & \langle \{ \text{order} \#11,\text{order} \#23,\text{order} \#39,\text{order} \#55\}, \{ \text{man}\}\rangle , \\ & \langle \{ \text{order} \#11,\text{order} \#23,\text{order} \#39,\text{order} \#55,\text{order} \#135,\text{order} \#156\}, \emptyset \rangle . \end{aligned}$$

The last formal concept describes that all six orders have no common attribute. Symmetrically, there is no order with all attributes simultaneously (see the first formal concept). However, we can see other meaningful closed groups of orders with common attributes.
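These eight concepts can be reproduced by a brute-force Python sketch that applies the concept-forming operators of Definition 2 to every subset of attributes; the Galois-connection property \(\swarrow \circ \nearrow \circ \swarrow = \swarrow\) guarantees that each resulting pair is a formal concept. (A sketch for the small example only; real FCA tools use more efficient algorithms.)

```python
from itertools import chain, combinations

# Formal context of the six fitness store orders (the cross table above).
CONTEXT = {
    "order#11":  {"man", "delivery", "bestseller"},
    "order#23":  {"man", "delivery"},
    "order#39":  {"man", "pickup"},
    "order#55":  {"man", "pickup"},
    "order#135": {"woman", "pickup"},
    "order#156": {"woman", "pickup"},
}
ATTRIBUTES = frozenset({"woman", "man", "delivery", "pickup", "bestseller"})

def up(objects):
    """Common attributes of a set of objects (the mapping ↗ from Definition 2)."""
    attrs = set(ATTRIBUTES)
    for o in objects:
        attrs &= CONTEXT[o]
    return frozenset(attrs)

def down(attrs):
    """All objects possessing every attribute in attrs (the mapping ↙)."""
    return frozenset(o for o, present in CONTEXT.items() if attrs <= present)

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

# Every closure of an attribute subset yields a formal concept; the set
# comprehension removes duplicates, leaving the eight concepts listed above.
concepts = {(down(frozenset(y)), up(down(frozenset(y)))) for y in powerset(ATTRIBUTES)}
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "|", sorted(intent))
```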

Fig. 2

Concept lattice in our running example [Each node corresponds to the formal concept that includes the subset of objects (orders) and the subset of attributes. The concept lattice includes eight formal concepts, which are partially ordered. The edges represent the relationships between two formal concepts based on subsets of objects/attributes]

Moreover, the set of all formal concepts of a formal context \(\langle B,A,I \rangle\) can be ordered by a partial ordering relation. The partially ordered set of all formal concepts is called the concept lattice. This provides a way to construct the hierarchy of formal concepts and visualize the relationships between them. The concept lattice in our running example is shown in Fig. 2.

In addition, we include an isomorphic visualization of the concept lattice with so-called reduced labeling of the eight formal concepts from our running example (Fig. 3). For better readability, only the first appearance of each object from the bottom (by a white label) and the first appearance of each attribute from the top (by a shaded label) are highlighted in the concept lattice with reduced labeling.

Fig. 3

Concept lattice with reduced labeling in our running example [The concept lattice includes eight formal concepts. Only the first appearance of each object from the bottom (by white label) and the first appearance of each attribute from the top (by shaded label) are highlighted. The attributes (objects) representing the selected formal concept can be obtained by collecting the shaded (white) labels leading to the top (to the bottom) of the concept lattice]

A concept lattice with reduced labeling illustrates the relationships between the subsets of orders regarding their common attributes. Consider the white label with order #11 in the previous diagram. It represents the formal concept

$$\begin{aligned} \langle \{ \text {order} \# {11}\}, \{\text{man,delivery,bestseller} \}\rangle . \end{aligned}$$

If we select the white label with order #39, it represents the formal concept

$$\begin{aligned}\langle \{\text{order} \#{39}, \text{order} \#{55}\}, \{\text{man,pickup}\}\rangle.\end{aligned}$$

Thus, the attributes representing the selected order can be obtained by collecting the shaded labels leading to the top of the concept lattice.

If we consider the shaded label with the attribute man, it represents the formal concept

$$\begin{aligned} \langle \{ \text{order} \#{11},\text{order} \#{23},\text{order} \#{39},\text{order} \#{55}\}, \{\text{man}\}\rangle. \end{aligned}$$

Thus, if we select the shaded label of the attribute, the objects representing the selected attribute can be obtained by collecting the white labels leading to the bottom of the concept lattice.

Moreover, the relationships between subsets of orders are ordered by a partial (not necessarily linear) ordering relation. For example, consider order #11 with delivery and order #39 with pickup. They have the common attribute man, but we cannot compare them linearly regarding the delivery method. Thus, they both have a distance of one from the bottom element of the concept lattice with reduced labeling.

We will present the other concept lattices from our dataset in the following section devoted to our results. Note that our full dataset can be used as the input of this method since the gender, country, delivery method, and other attributes can be transformed into binary attributes. Moreover, FCA can provide another computational and visualization technique compared to decision trees and GUHA association rules.

In our paper, we applied FCA to two particular products of the fitness shop. The concept lattice of the first product is constructed with respect to women’s orders. The second concept lattice corresponds to the gender, day parts, and delivery method.

Results

This research has focused on analyzing customer behavior related to the classification of orders by time aspects (decision trees), the dependencies between attributes (GUHA method), and the biclusters of customers given by product ID (FCA). We present the results of this research in this section.

Results of decision trees

In the first part, we present the results obtained by the method of decision tree construction (Algorithm 1) implemented in the Minitab tool. We used the entropy criterion for selecting the splitting attribute at each tree level. In particular, we obtained the decision tree illustrated in Fig. 4 for \(A=\{ \text{gross\_profit\_category,gender},\) \(\text{country},\text{delivery\_method,global\_bestseller,long\_term\_unavailable}\}\), and a binary target attribute \(b=\text{Q4\_period}\).

Fig. 4

Decision tree for classification of 9123 orders by Q4 period [Country is the input attribute with the best splitting quality (based on entropy). The decision tree terminates in five terminal nodes, which determine the paths from the root of the decision tree regarding at least 62% of orders in Q4 (or Q1–Q3) classes]

Five-fold cross-validation was performed to assess the generalizability and reliability of the decision tree model. The accuracy of the decision tree was 77.6%. We found that 29.3% of customers ordered their products in the fourth quarter of the year (the root of the decision tree in Fig. 4). However, the proportion is only 16.1% if we select the customers from Bulgaria, Czech Republic, Germany, Croatia, and Ukraine (left part of the second level of nodes of the decision tree in Fig. 4). Moreover, if the gross profit category of the product equals A, the proportion of orders in the fourth quarter is 8.6% (left part of the third level of nodes of the decision tree in Fig. 4). On the contrary, customers from Slovakia, Hungary, and Romania ordered 30% of their products in the fourth quarter of the year (right part of the second level of nodes of the decision tree in Fig. 4). If they selected the pickup method of delivery, the proportion is 34.3% (right part of the third level of nodes of the decision tree in Fig. 4).

We present four levels of nodes of our decision tree in Fig. 4 (including its root). However, additional levels were generated as well. In the fifth level, we found that for customers from Bulgaria, Czech Republic, Germany, Croatia, and Ukraine, with the gross profit category B and long-term unavailable products, the orders were not booked in the fourth quarter (this level is not shown in Fig. 4 due to lack of space and the readability of the decision tree in graphical form).

The structure of the decision tree of the fourth quarter of the year implies that the most important attributes are country, gross profit category, and delivery method. Thus, the potential marketing strategies can include heterogeneous conditions for each country based on its representatives. The observed patterns can be applied to a larger population or other regions to find additional hidden patterns in data.

We present the second decision tree with the same set of input attributes and a binary target attribute \(b=\text{morning}\) in Figs. 5 and 6. Five-fold cross-validation was performed to assess the generalizability and reliability of the decision tree model. The accuracy of the decision tree was 71.2%. We found that 27% of customers ordered their products in the morning (first level of the decision tree in Figs. 5 and 6). The proportion of morning orders is 30.4% for women (second level of the decision tree in Figs. 5 and 6). However, if we consider women from Croatia and Hungary with the pickup method of delivery, we have 12.4% of orders in the morning.

Fig. 5

Left part of decision tree for classification of 9123 orders by morning time [Gender is the input attribute with the best splitting quality (based on entropy). The left part of the decision tree determines the paths with women’s orders regarding at least 67% of orders in the morning or not in the morning class]

Fig. 6

Right part of decision tree for classification of 9123 orders by morning time [Gender is the input attribute with the best splitting quality (based on entropy). The right part of the decision tree determines the paths with men’s orders regarding at least 67% of orders in the morning or not in the morning class]

The structure of the decision tree of the morning time implies that the most important attributes are gender and country (the first two levels of the decision tree). Thus, the potential marketing strategies can include heterogeneous conditions for men and women based on these findings. The observed patterns can be applied to a larger population or other regions to find additional hidden patterns in data.

Results of GUHA method

In Table 1, we present selected GUHA association rules for 4 categorical attributes of delivery_method, gender, gross_profit_category, and category_ID in the antecedent and for 3 categorical attributes of day_part, quarter_of_year, and day_name in the consequent. In particular, we obtained 250 association rules for 0.7-implications. Moreover, we selected the non-redundant GUHA association rules in Table 1. Our generated rules provide meaningful insights into customer preferences, patterns, and tendencies. For example, the second GUHA association rule can be interpreted as follows: for a woman with gross profit category A and product ID1923, the order was booked on a Saturday morning in the third quarter with 100% confidence.

Table 1 GUHA association rules for online customer behavior

Association rules typically include measures of confidence and support. Confidence indicates the conditional probability of the consequent given the antecedent, while support represents the frequency or prevalence of the rule in the dataset. The confidence and support of GUHA association rules are specified by the values of the last two columns in Table 1. Higher confidence values indicate stronger associations, while higher support values suggest more frequent occurrences.
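In standard notation, for a rule \(A \Rightarrow B\) over \(n\) transactions \(t\), these two measures are defined as

\[
\mathrm{supp}(A \Rightarrow B) = \frac{\left|\{\, t : A \cup B \subseteq t \,\}\right|}{n},
\qquad
\mathrm{conf}(A \Rightarrow B) = \frac{\mathrm{supp}(A \Rightarrow B)}{\mathrm{supp}(A)},
\]

so a 0.7-implication corresponds, roughly, to a rule whose confidence is at least 0.7 (subject to a minimal support, or base, threshold).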

The rules in Table 1 were selected to avoid repetition and improve the association analysis’s clarity and interpretability. The obtained association rules can be generalized to broader populations or contexts. In particular, the results show the typical period of day or week when men or women order particular products. The identified associations can be translated into actionable insights or strategies to improve marketing, sales, or customer service. The business can leverage the rules to enhance customer experiences, optimize product offerings, or tailor promotional campaigns based on the periods obtained by association rules.
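GUHA analyses are typically run in dedicated tools such as LISp-Miner; as a rough, publicly available approximation, classical Apriori rules with a 0.7 confidence threshold can be extracted as sketched below. The file name orders.csv and the exact column names are assumptions for illustration.

```python
# Rough approximation of 0.7-implications via classical Apriori rules.
# This only illustrates the antecedent/consequent extraction; it is not
# the GUHA procedure itself. File and column names are assumptions.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

orders = pd.read_csv("orders.csv")
cols = ["delivery_method", "gender", "gross_profit_category",
        "category_ID", "day_part", "quarter_of_year", "day_name"]
binary_orders = pd.get_dummies(orders[cols].astype(str))  # one item per value

frequent = apriori(binary_orders, min_support=0.01, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```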

Results of FCA

Based on FCA, we generated a formal context of 9123 objects (corresponding to the orders) and 28 binary attributes. We transformed the original categorical attributes from our dataset into two binary attributes for gender, eight for country, two for delivery method, three for gross profit category, seven for day name, and six for day part.
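This binarization ("scaling") step can be sketched with one-hot encoding; the file name and column names below are illustrative assumptions.

```python
# Sketch of the scaling step: each categorical attribute is expanded into
# one binary column per value, yielding the 28-attribute formal context.
import pandas as pd

orders = pd.read_csv("orders.csv")  # hypothetical source
context = pd.get_dummies(
    orders[["gender", "country", "delivery_method",
            "gross_profit_category", "day_name", "day_part"]]
)
print(context.shape)  # (9123, 28): 2 + 8 + 2 + 3 + 7 + 6 binary attributes
```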

In Fig. 7, we present the concept lattice with highlighted attributes for the product ID1923 ordered by women from Slovakia and the Czech Republic with gross profit categories A and C. The concept lattice was drawn with the ConExp tool. It includes nine nodes, which represent nine formal concepts. The shaded labels of the attributes \(\text{ID1923, women}\) linked to a node at the top of the concept lattice represent the objects we selected for our analysis. Shaded attribute labels closer to the top element of the concept lattice correspond to attributes that occur more often in the objects. Dependencies between attributes can be read from the paths in the concept lattice. For example, the Friday and Monday lunch orders are from Slovakia and category A (left part of the concept lattice in Fig. 7), while the Sunday orders from the Czech Republic and category C were booked in the morning (right part of the concept lattice in Fig. 7). Paths connecting attributes and objects indicate the presence of specific combinations or patterns; for example, in Fig. 7, the path from women to Friday lunch orders indicates that women are more likely to place orders on Fridays during lunchtime.

Fig. 7
figure 7

Concept lattice with reduced labeling of the product with ID1923 ordered by women [The concept lattice includes eight formal concepts. The shaded label highlights the first appearance of each attribute from the top of the concept lattice. The attributes representing a selected formal concept can be obtained by collecting the shaded labels on the paths leading to the top of the concept lattice. The size of the black and blue nodes at distance one from the bottom of the concept lattice expresses the proportion of the orders in the particular node]. (Color figure online)

We present the concept lattice with highlighted objects and attributes for the product ID1912 in Fig. 8. Each object (i.e., white label in the concept lattice) is described by the attributes (shaded labels) collected on all paths leading from the selected order node to the top of the concept lattice. The white labels (i.e., consumer orders) correspond to groups of orders that are similar with respect to the selected attributes. Each order and each attribute is shown in the concept lattice only once (so-called reduced labeling) for better readability. Three sets of orders at distance one from the bottom of the concept lattice (i.e., \(\{9078, 8932, 8931, 8897, 8896, 8895, 8217, 8063, 7896\}\), \(\{8237\}\), and \(\{8804, 7881, 7880\}\)) represent three main groups of order representatives, such that no other order has a superset of their attributes. For example, the group of orders \(\{8804, 7881, 7880\}\) consists of orders from men with the classical delivery method placed in the evening, since these are the attributes collected on all paths leading from the selected group to the top of the concept lattice. The path from man to morning indicates that men are most likely to order this product in the morning. Analyzing these identified groups can be valuable for targeted marketing, personalized recommendations, or understanding customer preferences.

Fig. 8
figure 8

Concept lattice with reduced labeling of the product ID1912 [The concept lattice includes 12 formal concepts. The shaded label highlights the first appearance of each attribute from the top of the concept lattice. The attributes representing a selected formal concept can be obtained by collecting the shaded labels on the paths leading to the top of the concept lattice. The size of the black nodes expresses the proportion of the orders in the particular node. Black-and-white nodes carry no shaded attribute labels]

In addition, the orders covered by a particular attribute can be obtained by collecting the white labels on the paths leading from the selected attribute node down to the bottom of the concept lattice. For example, the attribute afternoon is fulfilled by the orders reachable downward from its node, i.e., \(\{7384, 8187, 8186, 8237\}\). The thickness of the edges represents the proportion of objects in the specific part of the concept lattice. Reduced labeling, highlighting, and shading enhance the readability of the concept lattice and make it easier to interpret.
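The derivation ("prime") operators underlying these readings can be sketched in a few lines of code. The toy context below loosely echoes the groups discussed for Fig. 8, but its membership is purely illustrative.

```python
# Toy sketch of the FCA derivation (prime) operators on a tiny context.
# A pair (A, B) is a formal concept iff A' == B and B' == A.
# Object IDs and attribute sets are illustrative, not the paper's data.
context = {
    "8804": {"man", "classical", "evening"},
    "7881": {"man", "classical", "evening"},
    "8237": {"woman", "pickup", "afternoon"},
}

def objects_prime(objs):
    """Attributes shared by all objects in objs (all attributes if objs is empty)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set().union(*context.values())

def attrs_prime(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

A = attrs_prime({"man"})  # extent generated by the attribute 'man'
B = objects_prime(A)      # its intent (the closure of {'man'})
print(sorted(A), sorted(B))
# ['7881', '8804'] ['classical', 'evening', 'man'] -> (A, B) is a formal concept
```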

Discussion and limitations

Our current research is subject to a few limitations that need to be taken into account when applying the results.

Decision trees

Decision trees are recognized as a prominent technique in machine learning, primarily owing to their inherent simplicity, transparency, and immediate efficacy. However, certain inherent constraints limit their widespread adoption:

The propensity for overfitting

Particularly in deeper trees, there is a pronounced tendency toward overfitting, a scenario in which the model inadvertently assimilates the noise in the training data, undermining its ability to generalize to unobserved data. Such concerns have been corroborated in numerous scholarly investigations; for example, Garcia Leiva et al. (2019) underlined this persistent challenge and proposed an innovative method as a potential countermeasure (Garcia Leiva et al. 2019).
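One standard remedy (distinct from the method of Garcia Leiva et al.) is cost-complexity pruning; the following scikit-learn sketch on synthetic data shows how increasing the pruning strength trades tree depth for generalization.

```python
# Sketch: cost-complexity pruning as one common guard against overfitting.
# Synthetic data only; not a reproduction of the paper's experiments.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Larger ccp_alpha prunes more aggressively; choose it by validation score.
for alpha in (0.0, 0.002, 0.01):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(alpha, tree.get_depth(), round(tree.score(X_te, y_te), 3))
```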

Extraction of attribute importance

Directly extracting attribute importance from decision trees can involve several drawbacks, especially in fields like customer satisfaction surveys (de Oña et al. 2016).

Predefined structure

Decision trees may struggle with learning the relations between labels without a predefined structure, which can lead to challenges in multi-label classification (Lotf and Rastegari 2020).

Evolutionary design

The evolutionary design of decision trees requires specialized approaches and recognized specializations, which can complicate the design process (Podgorelec et al. 2013).

GUHA method

The GUHA method is appropriate for finding dependencies between attributes, whereby the attributes are divided into two groups (antecedent and consequent). The strength of each rule can be evaluated numerically using several interest measures.

Scalability

An important issue in data analysis with the GUHA method is the high number of generated p-implications. Most of the generated implications are merely consequences of already found patterns or of domain knowledge, so the simplification of rules and the reduction of redundant association rules are worth exploring in this area. The selection of appropriate methods and parameters for this reduction can be challenging.
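One very simple redundancy filter can be sketched as follows: a rule is dropped when another rule with the same consequent has a strictly smaller antecedent and confidence at least as high. The rule format and the example rules are illustrative.

```python
# Sketch of a naive redundancy filter for association rules.
# Rules are (antecedent_set, consequent, confidence); values are illustrative.
rules = [
    ({"woman", "catA"}, "morning", 0.82),
    ({"woman", "catA", "pickup"}, "morning", 0.80),  # redundant w.r.t. the rule above
    ({"man"}, "evening", 0.75),
]

def non_redundant(rules):
    keep = []
    for ant, cons, conf in rules:
        dominated = any(
            a2 < ant and c2 == cons and conf2 >= conf  # a2 < ant: proper subset
            for a2, c2, conf2 in rules
        )
        if not dominated:
            keep.append((ant, cons, conf))
    return keep

print(non_redundant(rules))  # the second rule is filtered out
```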

Interpretability

Interpretability can be seen as the degree to which people can understand the cause of a decision. The high interpretability of GUHA methods has been demonstrated in various applications, from the healthcare domain and medical data analysis to environmental issues and travel time predictions.

Potential challenges

Real-world data often contain outliers or noise, so GUHA association rules can discover irrelevant implications; GUHA methods should therefore be robust to noisy data. Moreover, missing data can lead to the extraction of incomplete implications and biased results.

FCA method

FCA is based on lattice theory and provides the construction of concept lattices with hierarchical dependencies between objects and attributes. It can be used to explore specific products with respect to the characteristics of customers or the time aspects of orders. Its advantage is the possibility of graphically illustrating a concept lattice.

Scalability

FCA is an appropriate method for datasets with one large dimension (objects or attributes) and a small or medium second dimension. However, several fruitful methods and approximations (e.g., attribute exploration or pruning algorithms) can be used to reduce the computational complexity in the case of big datasets.

Interpretability

FCA provides a great visual tool for data interpretation. The precise formal definitions of input, operators, and output and the hierarchical structure of concept lattice contribute to its interpretability, making it a powerful technique for understanding heterogeneous data relationships and patterns in an intuitive way.

Potential challenges

One potential challenge in FCA is the construction of a formal context, i.e., the selection of proper attributes and objects. If data are extracted from a database and the attributes are not systematically transformed, the results can be affected. However, several novel approaches and methods have been proposed to mitigate this challenge. Moreover, the area of fuzzy approaches in FCA has been thoroughly explored by several research groups worldwide. Each of these approaches has its advantages, allows working with different mathematical structures, and can be applied to different tasks and applications.

Association rules

In the data mining literature, association rules are a widely recognized approach for discerning interrelations among database variables. Nevertheless, every methodology has its inherent drawbacks. The following encapsulates the prevalent limitations of association rules:

Volume of rules

Association rule algorithms can generate many rules, making it challenging to identify which ones are significant or beneficial (Agrawal et al. 1993).

Lack of causality

Association rules merely elucidate correlations and do not infer causation: the regular co-occurrence of two items does not necessarily imply a causal relationship between them (Tan et al. 2006).

Threshold dependency

The results heavily depend on the chosen threshold values for support and confidence. Selecting these parameters involves some subjectivity and can significantly influence the identified rules (Ghafari and Tjortjis 2019).

Static nature

Association rules are predominantly derived from static datasets. Consequently, they may adapt poorly to evolving data or to shifts in trends over time (Wang et al. 1997).

Lack of context

These rules sometimes overlook the context of the transactions, which can lead to misleading associations that do not align with the true meaning of the data (Meruva and Bondu 2021).

Scalability issues

As the size of the dataset expands, the computational difficulty in generating association rules can become overwhelming. This issue is particularly acute without optimized algorithms or sufficient computational resources.

Limitations of the research

Our study faces a challenge in accurately identifying the gender of each customer. The company employed two methods: a strict determination of gender for unambiguous data and a predictive model for ambiguous cases. Although the model is generally reliable, with 85–95% accuracy in 99% of instances, we cannot fully confirm its precision; nevertheless, we accepted the model’s predictions as conclusive for our research. Another issue is the inconsistency of currencies across purchases, as the company operates in various European markets, not all of which use the euro. For example, Hungary, the Czech Republic, and Romania use different currencies, necessitating the conversion of their orders to euros at the current exchange rate. This process, while necessary, introduces minor inaccuracies due to exchange rate fluctuations over time, particularly noticeable from 2014 to 2019. To mitigate this, the company should adopt a real-time conversion system that converts non-euro transactions to their euro equivalent at the prevailing rate, which would largely correct this limitation.

Recommendations

Analyzing consumers’ online purchasing habits can provide many advantages for commercial entities, marketing professionals, and, indeed, the consumers themselves. The following elucidates the prospective benefits derived from such understanding. As an example of potential implications for practice, consider a fictional mid-sized e-commerce company that wants to increase its market share and customer loyalty. The company has a diverse product range and a broad customer base but faces stiff competition from larger online retailers.

Personalized shopping experiences

A comprehensive analysis of a consumer’s digital purchasing patterns enables enterprises to tailor shopping experiences to individual clients, suggesting products that align with their preferences.

Example of application

Objective To implement personalized marketing strategies that enhance customer engagement, increase conversion rates, and boost sales.

Data collection Collect data on customer demographics, past purchase history, browsing patterns, and engagement with previous marketing campaigns.

Data analysis Use the mentioned machine learning algorithms to segment customers into distinct groups based on their behavior and preferences (a sketch of one possible segmentation step follows this example).

Implementation Develop personalized email marketing campaigns that recommend products based on past purchases and browsing behavior.

Measurement Track key performance indicators (KPIs) such as open rates, click-through rates (CTRs), conversion rates, and average order value.
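As a hedged illustration of the data analysis step above, the following sketch clusters customers on simple behavioral features with k-means; the file name customers.csv and the feature names are assumptions.

```python
# Sketch: customer segmentation with k-means on aggregated behavioral
# features. File name, feature names, and cluster count are illustrative.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customers.csv")  # hypothetical aggregated data
features = customers[["orders_per_year", "avg_order_value",
                      "days_since_last_order"]]

X = StandardScaler().fit_transform(features)  # put features on one scale
customers["segment"] = KMeans(n_clusters=4, n_init=10,
                              random_state=0).fit_predict(X)
print(customers.groupby("segment")[features.columns].mean())
```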

Improved marketing efficiency

Studying consumer behavior offers crucial insights to marketers, guiding them in allocating resources wisely, targeting specific customer groups accurately, and focusing on promoting particular products.

Example of application

Objective Optimize and streamline budget spending for marketing campaigns and improve performance metrics.

Data collection Based on the time stamp of the order, create new time variables using BI tools to help specify the time aspect of customer behavior more accurately.

Data analysis Using a combination of decision trees, GUHA association rules, and especially FCA, create a time profile of individual products within customer segments.

Implementation The company can enhance its marketing strategies by targeting ads more effectively at potential customers. The company can schedule ads strategically by understanding which products specific customer groups are inclined to purchase at certain times through customer behavior analysis. This precise timing for ad displays can significantly improve the cost-efficiency of their marketing spend.

Measurement To measure the cost-efficiency of a campaign, the company should track several key metrics that help evaluate its performance: cost per click (CPC), CTR, conversion rate, and return on ad spend (ROAS).

Predictive analytics

Predictive analytics is an area of sophisticated analytics that leverages past data, statistical algorithms, and machine learning methods to forecast future events. It aims not only to understand past events but also to deliver a well-informed estimate of future occurrences.

Example of application

  • Objective To utilize predictive analytics to forecast demand, optimize inventory levels, and personalize the shopping experience.

  • Data collection

    • gather historical sales data, including seasonal variations and sales trends,

    • track customer purchase patterns, including frequency, quantity, and types of products purchased,

    • monitor external factors such as local events, holidays, and weather patterns that may influence shopping behavior

  • Data analysis

    • use time-series analysis to understand sales trends and predict future demand for different product categories (a sketch of this step follows this list),

    • apply machine learning models to identify patterns in customer purchase behavior and predict future buying trends,

  • Implementation Create personalized shopping experiences by predicting which products customers are likely to need based on their purchase history and predictive models.

  • Measurement

    • monitor KPIs such as stock turnover rates, sell-through rates, markdown percentages, and customer satisfaction scores,

    • continuously refine predictive models with new data and feedback to improve accuracy,

    • use A/B testing to compare outcomes with and without predictive analytics to measure the impact.
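As a hedged illustration of the time-series step flagged in the list above, the sketch below fits a Holt-Winters model to a monthly sales series; the file monthly_sales.csv and its columns are assumptions.

```python
# Sketch: Holt-Winters exponential smoothing for monthly demand forecasting.
# File and column names are illustrative; any seasonal series would do.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

sales = pd.read_csv("monthly_sales.csv", index_col="month",
                    parse_dates=True)["units"]
sales = sales.asfreq("MS")  # regular month-start frequency for the model

model = ExponentialSmoothing(sales, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
print(model.forecast(3))  # demand estimate for the next three months
```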

Dynamic pricing strategies

Dynamic pricing is a strategy whereby businesses set flexible prices for products or services based on current market demand (Neubert 2022). It is widely used in various industries, from hospitality to retail, and is particularly prevalent in online marketplaces.

Example of application

  • Objective Implement a real-time dynamic pricing strategy that adapts to market demand, competitor prices, and consumer buying trends.

  • Data collection Implement user behavior analytics tools to track how customers interact with the e-shop.

  • Data analysis

    • gather historical sales data to understand how price changes affect demand,

    • monitor competitor pricing for similar products in real time,

    • track inventory levels to understand supply constraints,

    • analyze customer data for price sensitivity.

  • Implementation Use predictive analytics to forecast demand for different products at various price points (a simple pricing sketch follows this list).

  • Measurement

    • monitor sales volume, profit margins, and price competitiveness KPIs;

    • continuously refine the pricing algorithm based on sales performance and market conditions.
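As a hedged illustration of the implementation step flagged above, the sketch below picks the revenue-maximizing price on a grid under an assumed linear demand curve; the coefficients are illustrative, standing in for a demand model fitted elsewhere.

```python
# Sketch: grid search for the revenue-maximizing price under an assumed
# linear demand curve demand = a - b * price. Coefficients are illustrative.
import numpy as np

a, b = 400.0, 8.0                      # assumed fitted demand parameters
prices = np.linspace(5.0, 45.0, 200)   # candidate price grid (EUR)
demand = np.clip(a - b * prices, 0, None)
revenue = prices * demand

best = prices[np.argmax(revenue)]
print(f"revenue-maximizing price on the grid: {best:.2f} EUR")  # ~a/(2b) = 25
```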

Conclusion

In this paper, we presented an analysis of real e-shop consumer behavior data from Eastern Europe, comprising 9123 orders of sports nutrition, healthy foods, fitness clothing, and accessories from 2014 to 2021. For the analysis, we applied decision trees, GUHA association rules, and Formal concept analysis. We found that each of these methods provides distinct possibilities for gaining deeper insight into the behavior of online e-shop consumers.

In particular,

  • Decision trees can help to classify online customer orders with respect to a selected target attribute in the data, e.g., a time aspect or selected categories of customers.

  • The GUHA method is appropriate for finding dependencies between attributes of online customer behavior, whereby the attributes are divided into two groups (antecedent and consequent).

  • The FCA method can help to explore and visualize the specific properties of products with respect to the characteristics of customers or the time aspects of orders.

Analyzing consumers’ online purchasing habits can provide many advantages for commercial entities, marketing professionals, and consumers. Regarding the potential implications for practice, we discussed examples of applications: personalized shopping experiences, improved marketing efficiency, predictive analytics, and dynamic pricing strategies.

In our future work, we aim to explore other machine learning methods of supervised learning (classification and regression) and unsupervised learning (clustering, attribute implications, or association rules) to analyze online consumer behavior.