1 Introduction

The term cloud-native describes a concept of developing and operating applications in a way that aims to maximally exploit the benefits of modern cloud computing environments [1]. The type of applications we refer to in this context are web applications implemented by organizations specifically to serve their customers or to satisfy internal needs via either internal networks or the internet. This covers mainly, but not exclusively, so-called enterprise applications as discussed by Cerny et al. [2], who rely on a description by Fowler [3]. In this work, the term application refers to such web applications. We differentiate them from other software described as being cloud-native, like the software listed in the Cloud Native Computing Foundation (CNCF) Landscape. Such software may enable the development of cloud-native applications or be used within them, but it is not itself a cloud-native application in the sense of this work.

The implementation of such cloud-native applications covers a broad range of aspects: the choice of cloud service offerings and technologies, the functional decomposition of business logic over components, setting up the communication between components, or managing the deployment and upgrade process of applications [4, 5]. When applied correctly, cloud-native concepts are associated with many benefits, such as increased performance and scalability, cost effectiveness, reliability, or improved maintainability [6,7,8]. Therefore, when aiming to implement an application in a cloud-native way or evaluating how an existing application could benefit from cloud-native concepts, the question arises of how this can be done in an informed and structured way. Because of the thematic breadth of cloud-native and the multiple, potentially conflicting, desirable benefits, this presents a challenge.

To take up this challenge from a conceptual perspective, the essential question is how certain architectural characteristics (those associated with cloud-native concepts) impact the observable behavior of an application. As architectural characteristics we consider design-time properties of an application that result from the decisions made by its developers. This covers all implementation aspects, such as technological options or software design. As observable behavior we consider runtime properties of an application that are measurable while the application is running and being operated. Runtime properties therefore indicate how well an application fulfills its intended functionalities, which is in general also summarized as the quality of an application. The relationships between architectural characteristics and observable behavior, however, can be complex, and different architectural characteristics can have interfering effects. But if these relationships can be described in a structured and comprehensive way, they represent the knowledge necessary to evaluate application architectures according to quality aspects. This knowledge would thus satisfy the conceptual perspective.

But to actually support developers and architects in designing and implementing applications in a cloud-native way, the practical perspective also needs to be considered. From a practical perspective, the additional question is how this knowledge can be integrated into suitable tooling and made useful there.

Based on these considerations, we see a hierarchical quality model for software architectures of cloud-native applications as a suitable approach. It can both describe the relationships between architectural characteristics and quality aspects and be integrated into practical tooling for quality evaluations.

This hierarchical structure is one of the factors that differentiates our approach from other existing work. Apel et al. [9], for example, present an approach with a similar goal, but link metrics directly to quality aspects. In addition, they focus on microservices instead of the broader cloud-native scope. Also focusing on microservices, Soldani et al. [10] present an approach to evaluate and improve microservices architectures, but orient their approach towards best practices instead of quality aspects. Another differentiating factor is the consideration of multiple quality aspects in combination. Camilli et al. [11], likewise focusing on microservices, present an approach to specifically evaluate the quality aspects performance and reliability. They thus link runtime metrics directly to quality aspects, but focus less on the architectural characteristics that lead to the obtained results. Although we can differentiate our work from the mentioned approaches, showing its novel contribution, they are related. Therefore, we aim to also integrate their results and recommendations into our approach.

For a solid foundation on which to build a hierarchical quality model, we rely on the structure described by the Quamoco meta-model [12]. In this meta-model, quality aspects differentiate distinct dimensions of quality and are structured in a hierarchy together with product factors. Product factors represent characteristics of a software product. The hierarchy is built through impact relations which describe how the presence of product factors affects quality aspects. Such a quality model is therefore suitable for conceptually describing, in a structured way, how architectural characteristics of an application impact its quality aspects. Product factors can then be linked to entities, that is, distinguishable components of a software architecture, to enable their measurement. Thus, a practical evaluation of a software architecture according to the relationships of a quality model becomes possible.

Our overall aim therefore is to formulate a quality model for software architectures of cloud-native applications. This essentially means defining all the different elements required for a complete quality model: quality aspects, product factors, impact relations, entities, measures, and evaluations.

In this work we focus on the formulation and validation of all mentioned elements except evaluations. Thus, we present our approach from an integrated, holistic perspective. This includes content from previous work: on the one hand, our initial formulation of quality aspects and product factors [8]; on the other hand, our work on validating this initially formulated quality model [1], of which this work represents an enhanced version. Validation in this context means the assurance that the relations described in the quality model reflect reality as closely as possible. Furthermore, we expand on both previous works by further defining the elements of the quality model and outlining how software architectures can be suitably represented.

The guiding research questions which summarize these aims and considerations are:

RQ1: How are characteristics of cloud-native applications related to quality aspects of distinct dimensions?

RQ2: How can a quality model for software architectures of cloud-native applications be formulated and validated?

To answer these research questions, we start with foundations of quality models in Sect. 2, based on which we present our methodology for this work in Sect. 3. The results of applying this methodology, especially the current state of the quality model, are presented in Sect. 4 and further discussed in Sect. 5. The relation to other work is described in Sect. 6, before a final conclusion in Sect. 7.

2 Foundations

In this section, relevant foundations are presented to more clearly specify the type of quality model we consider (hierarchical). We also explain how quality models can be formulated and validated and how these activities are related to each other.

2.1 Hierarchical quality models

The history of quality models is tightly connected to the history of software engineering itself [13]. Their core idea is to provide a conceptual foundation for linking measurable characteristics of a software product with quality aspects. If the quality of software can be measured and assessed, then it is also possible to show how software quality can be improved by changing the characteristics of the software. In earlier days, individual metrics were used to measure, for example, the complexity of source code [13]. But it has become apparent that the relations between characteristics of software and its quality are more complex. Therefore, quality models such as those of McCall [14] or Boehm [15] were structured as hierarchical quality models [13]. In these, higher-level quality aspects can be differentiated into quality sub-aspects for which it is in turn easier to formulate relations with characteristics of software. Hierarchical quality models are therefore essentially representable through graphs: the higher-level quality aspects are independent nodes in such a graph, and the effects that the other elements have on these quality aspects are the edges. Through this, a hierarchical quality model is able to capture the complexity of how software characteristics influence quality, and it is also possible to consider several aspects in relation to each other.
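To illustrate this graph view, the following minimal sketch (our own simplified example; the factor and aspect names are hypothetical and not taken from any cited model) represents a hierarchical quality model as a directed graph and collects all quality aspects transitively affected by a software characteristic:

```python
# Minimal sketch: a hierarchical quality model as a directed graph.
# Edges point from an element to the element it has an effect on.
impacts = {
    "Service replication": ["Fault tolerance"],  # characteristic -> sub-aspect
    "Fault tolerance": ["Reliability"],          # sub-aspect -> top-level aspect
    "Low coupling": ["Modifiability"],
    "Modifiability": ["Maintainability"],
}

def affected_aspects(element: str) -> set[str]:
    """Collect all elements transitively affected by the given element."""
    reached, stack = set(), [element]
    while stack:
        for target in impacts.get(stack.pop(), []):
            if target not in reached:
                reached.add(target)
                stack.append(target)
    return reached

print(affected_aspects("Service replication"))  # {'Fault tolerance', 'Reliability'}
```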

Work on software quality models has been consolidated in standards, with the ISO 25000 series [16] being the latest. Because it considers software quality on a rather abstract level, it has since been used as a basis for formulating new quality models [13], and it can be used to compare and differentiate quality models. A main differentiator for quality models is the scope that they cover. The scope of a quality model is defined by the number and heterogeneity of its elements. Heterogeneity can be considered regarding the number of different quality aspects included, often referring to the number of different high-level quality aspects from the ISO 25000 standard [16]. Heterogeneity can also be observed for the aspects covered by the other elements of the quality model, also referred to as the dimensions [17, 18]. Different dimensions can, for example, be static characteristics of an application at different layers (such as the infrastructure, the application logic, or the supported business functionality), runtime behavior, development process characteristics, or organizational aspects.

Since the overview paper on quality models by Ferenc et al. [13], further quality models have been proposed and in turn been reviewed by researchers. It is thus worthwhile to analyze such reviews for insights on important aspects of quality models. Reviews exist for quality models in general [17, 19, 20], but also for quality models in specific areas, for example web services [21]. One finding is that, of the high-level quality aspects, mostly maintainability, reliability, and performance efficiency have been the focus of quality models [19]. Another finding by Galli et al. [20] is that although many quality models have been proposed in the literature, their relevance, also regarding practical use in industry, differs significantly. The quality models with the highest relevance, as reported by Galli et al. [20], are the ISO 25000 standard [16] with its predecessor ISO 9126, the SQALE model [22], and the Quamoco model [12]. The respective reasons, however, are different. The popularity of the ISO 25000 standard [16] stems from its simplicity and the possibility to adapt it to more specific contexts [20]. The popularity of SQALE [22] is based on its widespread tooling support, e.g. within Sonar, which facilitates its integration into the development process [20]. And the importance of Quamoco [12] lies in the extensive research it is based on and the formulation of a comprehensive meta-model for hierarchical quality models that new quality models can build on.

The importance of effective tooling support for the adoption of quality models, as in the case of SQALE, is also stated by Yan et al. [19]. Related to the aspect of tooling, an additional success factor for the adoption of quality models is the completeness of the definitions of its elements. To analyze this, Nistala et al. [17] differentiate between different phases in which quality models can be used (planning, realization, documentation, and assessment) and then map the types of elements of quality models to the phases in which they are used. Through this, they identify significant differences in the definition completeness of quality model elements. Furthermore, they observe a gap regarding the mapping of certain software properties and processes to the quality aspects in focus. These are, however, important for the practical application of a quality model. That definitions of quality model elements are not always complete is also stated for web service quality models by Oriol et al. [21].

Major aspects are therefore the elements of a quality model themselves and suitable tooling support. Considering the elements of a quality model, the mentioned Quamoco meta-model [12] provides, from our perspective, the most comprehensive foundation to build on. The main elements of the Quamoco meta-model are shown in Fig. 1 and explained in the following.

Fig. 1 Elements of the Quamoco meta-model [12]

The core elements of the Quamoco meta-model are factors, which are differentiated into quality aspects and product factors. Quality aspects represent the rather abstract quality characteristics, for example from the ISO 25000 standard. Product factors capture characteristics of a software product which should be measurable for a specific software system based on measures. Through the impact relationship, it is possible to define a hierarchy of factors where quality aspects form the top-level elements and product factors are structured below them. The impact relationship can be defined coarsely, for example as being positive or negative, describing how a factor impacts another factor if it is present in a software system. More detailed statements about how certain product factors impact quality aspects, also using mathematical approaches, can be made using evaluations. Evaluations can be considered rules that configure the exact way in which the quality of a software system is evaluated using measures, product factors, and quality aspects. Evaluations can also be set in relation to each other, for example to enable aggregations of evaluations. Finally, to connect the quality model with the actual software under consideration, entities are used that represent the different components of a software system. The extent to which product factors are present in a certain software system is characterized through the specific set of entities which describes this software system. Based on how these entities are structured and which properties they have, product factors can be measured, in the end enabling a quality evaluation of the software system.
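To make these element types more tangible, the following sketch outlines one way they could be represented as data structures. It is our own illustrative interpretation, not an implementation taken from [12], and all instance names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Entity:
    """A distinguishable component of the software system under consideration."""
    name: str
    properties: dict = field(default_factory=dict)

@dataclass
class Factor:
    """Common base for quality aspects and product factors."""
    name: str
    description: str = ""

@dataclass
class Measure:
    """Quantifies a product factor based on the entities it is linked to."""
    name: str
    calculate: Callable[[list], float]

@dataclass
class Impact:
    """Describes how one factor affects another if it is present."""
    source: Factor
    target: Factor
    direction: str   # coarse statement, e.g. "positive" or "negative"
    rationale: str   # justification for the stated impact

# Hypothetical instances:
modifiability = Factor("Modifiability", "a quality aspect from ISO 25000")
service_independence = Factor("Service independence", "a product factor")
impact = Impact(service_independence, modifiability, "positive",
                rationale="independent services can be changed in isolation")
```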

In the rest of this article we use the terms introduced by the Quamoco model to describe our approach for formulating and validating a quality model for cloud-native applications.

2.2 Quality model formulation

With Formulation we refer to the process of creating a new quality model. In essence, this means that elements corresponding to the types of the Quamoco meta-model [12] need to be formulated and set in relation to each other. There are several ways to do this, and some constraints need to be considered.

First, the scope of the quality model should be decided on: for example, what kind of software is considered, at which layer it is considered, and which quality aspects are of interest. Only if the scope is defined is it possible to decide which elements to include in the quality model and which not. The quality aspects and product factors should be defined first, because they represent the core of the quality model. Entities, measures, and evaluations should be defined in a second step, suitable to the factors. In the end, the completeness of the quality model needs to be ensured so that all factors are defined in a clear and understandable way, measures are available to quantify factors, and evaluations enable the quality assessment of a system. Considering the structure of the quality model, AL-Badareen et al. [23] have investigated rules and constraints to consider when formulating quality models. One exemplary rule is that there should be only one path from an element to a certain top-level quality aspect [23] in order to keep evaluations regarding this quality aspect unambiguous. The most important aspect, however, for a quality model to be useful and of good quality itself, is that the conditions and relations it describes reflect reality as closely as possible. Although an exact reflection of reality is impossible, because a model per definition has to abstract from the real world, it should be ensured that software quality relations are modeled realistically. This is also stated by Wagner et al., who emphasize the importance of a rationale [12] for each stated impact relation between factors of the quality model. The respective rationale included in a quality model is a direct result of the approach used for formulating it. Different approaches have been listed by Moody [24], including for example: theory-based (deductive), experience-based (codification), observation-based (inductive), or consensus-based (social). Based on the available possibilities, a suitable choice needs to be made on the approach used for formulating a quality model. In any case, the chosen approach should be documented and explained so that it can be reproduced and evaluated by others.
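The single-path rule mentioned above lends itself to an automated check. The following sketch is our own illustration of such a check, assuming the simple edge-list representation introduced in Sect. 2.1; the rule itself stems from [23]:

```python
# Sketch: check the single-path rule from AL-Badareen et al. [23] on a
# quality model given as an edge list (element -> impacted elements).
impacts = {
    "Factor X": ["Analyzability", "Testability"],  # two sub-aspects ...
    "Analyzability": ["Maintainability"],
    "Testability": ["Maintainability"],            # ... both under Maintainability
}

def count_paths(source: str, top_level_aspect: str) -> int:
    """Count the distinct directed paths from source to the top-level aspect."""
    if source == top_level_aspect:
        return 1
    return sum(count_paths(target, top_level_aspect)
               for target in impacts.get(source, []))

# "Factor X" reaches Maintainability via two paths and thus violates the rule:
print(count_paths("Factor X", "Maintainability"))  # 2
```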

2.3 Quality model validation

As stated in the previous section, a quality model necessarily abstracts from reality, but should nevertheless reflect reality as closely as possible to allow meaningful assessments of software. Ensuring this reflection of actually observable quality impacts is therefore an important property of a useful quality model. We refer to this assurance as validation [19, 25] in the sense that the theory underlying the formulation of quality model elements is validated using suitable approaches. Several aspects of a quality model can be validated, and validations can in turn be done in several ways. In previous work [25] we have investigated validation approaches and scopes for quality models in more detail. An important aspect that emerged is that not all types of validation are possible during all phases of the creation process of a quality model. Especially in an early phase, when only quality aspects, product factors, and their respective impact relations are formulated, a complete validation of the quality model based on an assessment of software is difficult because of the missing measures and evaluations. It is nevertheless possible to validate these formulated elements, but a suitable approach needs to be chosen. To decide which validation approach is suitable, it is necessary to check whether an approach can accomplish the goal of a successful validation. That goal is the confirmation of relations and statements defined during the initial formulation of a quality model based on a different type of rationale than initially used. For example, if a quality model has been formulated based on the experience of experts, it could be validated using an experimental approach relying on measures of an actual software system. Or, if a quality model has been formulated based on theory, e.g. relying on literature, it could be validated through an empirical survey (consensus-based), which is also the scenario considered in this work. Approaches that do not require the complete formulation of all elements of a quality model are, for example, consensus-based empirical approaches such as interviews or surveys. These can be used to validate factors by asking for the clarity of their descriptions or their applicability to software systems. Or they can be used to validate the formulated impact relationships between factors by asking for the type and strength of impact certain factors have on other factors. Although using a somewhat different categorization of validation approaches, Yan et al. [19] found this to be the most used validation approach, which they call expert opinion.

An iterative approach of formulation, validation, and refinement through new formulations can therefore be taken, and validation can be applied repeatedly and from different perspectives. As a general rule, the more validations from different perspectives are performed, the more confidence exists that the quality model is appropriate and usable. Validations during the creation process should thus also be complemented by validations once all elements of a quality model are defined. This explicitly includes experimental approaches during which quantitative measures are taken to evaluate the factors for a specific software system and compared with additional measures, for example taken at runtime. Validation in this regard has also been named the “representation condition” by Galli et al. [20], who define it as the assertion that “properties of real-world entities measured are mapped in numeric representations in such a way that the numeric representations are equivalent to the reality” [20].

3 Methodology

Building on the presented foundations regarding quality models, in this section we explain our methodology for formulating and validating a quality model for software architectures of cloud-native applications. This methodology integrates the previous works [8] and especially [1], of which this work represents an extended version. In addition, we have developed the quality model further by repeating and thereby updating the literature search for suitable measures from [8]. Furthermore, we provide a more detailed view on the refinement of the quality model done after the validation survey presented in [1]. To provide a better overview, we have summarized our methodology in Fig. 2 and explain its steps in the following.

Fig. 2 The overall methodology of our approach

Step A: First, we have formulated the quality aspects and product factors relevant to cloud-native applications together with their connecting impact relationships. To do so, we relied on literature, namely the ISO 25000 standard [16], existing definitions for the term cloud-native, and practitioner books on the topic. This has been done and presented in previous work [8].

Step B: In addition to the factors of the quality model, we have also performed a literature search to identify suitable measures that can be used to evaluate the formulated factors. This has also been done in the mentioned previous work [8]. We found measures, especially for the factors grouped under the maintainability quality aspect, but for many factors suitable measures still need to be defined.

Step C: Because we wanted to validate the elements formulated up until this point before proceeding further with the work on the quality model, we performed a survey to validate the existing elements. The focus was specifically on the impact relationships between product factors and quality aspects. The survey and its results have been presented in [1], and it led to a refinement of the quality model elements. This refinement is described in Sect. 3.1.

Repeat Step B: To further develop the quality model in this work, we have repeated and revised the literature search for measures to identify additional measures from newly published work. How we repeated and revised this search is described in Sect. 3.2.

Step D: In parallel to the validation of the existing elements, we prepared the actual evaluation of software architectures by formulating suitable entities to which factors can be linked. In addition, we reviewed modeling options for describing software architectures of cloud-native applications. This review and initial tooling support have been presented in [26].

Step E: (Planned in future work) Based on the found measures and the development of tooling support, we plan to validate the quality model from additional perspectives. Specifically, we plan an experimental approach with software architectures of applications that serve as use cases.

3.1 Validation survey (Step C)

Because the initial quality model was formulated based on literature [8] and only to the extent of quality aspects and product factors, we chose to use an empirical survey-based approach for the validation of the formulated elements. A detailed description of how the survey was prepared and conducted can be found in [1]. In this work, we only repeat the main aspects and focus on how the survey results were used to refine the quality model.

As survey participants we targeted IT professionals who have practical experience with implementing and deploying web applications on cloud infrastructures. We contacted potential participants based on relevant publications or talks they had given, via social media groups and communities with a fitting topic, as well as through personal contacts. The basic approach of the survey was to build pairs of product factors (i.e., architectural characteristics of a cloud-native application) and quality aspects (i.e., the sub-characteristics from the ISO 25000 standard [16] such as Availability, Fault Tolerance, or Modifiability). In total, 45 product factors and 24 quality aspects were considered. For each product factor, the survey participants were then asked to freely rate potential impacts on any of the quality aspects. The impacts stated in the initial quality model were intentionally not shown to the participants to avoid bias in their answers. Impacts could be rated on a 5-point scale of:

  • factor has positive impact (++) on quality aspect

  • factor has slightly positive impact (+) on quality aspect

  • factor has no impact (0) on quality aspect

  • factor has slightly negative impact (−) on quality aspect

  • factor has negative impact (−−) on quality aspect

Thus, the survey resulted in a set of impact ratings for each pair of a product factor and a quality aspect. For each set we calculated a mean value, using the numeric values +2, +1, 0, −1, and −2 for the respective impact ratings, and a p-value as an indicator for the significance of a result. The p-value was calculated using the exact multinomial test of goodness-of-fit [27] (a computational sketch of this test is given at the end of this subsection). We chose this test because it is suitable when there are multiple values of one nominal variable (the different types of impact) and the sample size is small. For the test we assumed equal probabilities for each impact rating, except for the rating of 0 (no impact). Because “no impact” was selected as a default if a participant did not explicitly state any impact, we assumed it to be twice as likely as the other rating options. Based on the mean values and the p-values, we refined the impact relationships of the quality model from two points of view: First, we iterated over the impact relations already stated in the initially formulated quality model and checked whether they could be confirmed by the survey results or had to be rejected. Second, guided by the significance indicator, we iterated over the impact relations not yet considered, but found through the survey, and decided whether or not to include them as new impact relations in the model. Orthogonal to these two points of view, we also considered each product factor individually with its rated impacts on quality aspects to identify factors which might need to be refined as such. In fact, a few factors showed to be ambiguous, because either significant impacts on several different quality aspects were found or conflicting types of impact (that means positive and negative) were found for the same quality aspect. In these cases, we revised the factors as such and included the found impacts as new impact relations in the quality model. Revision could also mean splitting a factor into new separate factors which cover distinguishable aspects of the factor that showed ambiguous results. All refinements were based on our interpretations as authors, but in each case we also reconsidered the initially used literature to substantiate our decisions. To summarize, the following refinement actions were applied to the quality model based on the validation survey results:

  • Removal of an initially stated impact relation (17 times)

  • Addition of a newly found impact relation (16 times)

  • Reformulation of a product factor and selection of suitable impact relations (2 times)

  • Splitting a product factor into two or more new product factors which cover distinguishable sub-aspects of the original product factor (1 time)

Which refinements regarding specific impacts and factors were applied is also listed online.
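To illustrate the statistical procedure described above, the following sketch implements an exact multinomial goodness-of-fit test with the stated expected probabilities (one sixth for each impact rating, two sixths for “no impact”). It is our own minimal illustration: the observed counts are hypothetical, and [27] remains the authoritative description of the test:

```python
from math import factorial, prod
from fractions import Fraction

def multinomial_pmf(counts, probs):
    """Probability of an outcome vector under a multinomial distribution."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(Fraction(p) ** c for p, c in zip(probs, counts))

def compositions(n, k):
    """All vectors of k non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def exact_multinomial_test(observed, probs):
    """p-value: total probability of all outcomes at most as likely as observed."""
    p_obs = multinomial_pmf(observed, probs)
    n, k = sum(observed), len(observed)
    return float(sum(multinomial_pmf(c, probs)
                     for c in compositions(n, k)
                     if multinomial_pmf(c, probs) <= p_obs))

# Ratings in the order (++, +, 0, -, --); "no impact" assumed twice as likely:
expected = [Fraction(1, 6), Fraction(1, 6), Fraction(2, 6),
            Fraction(1, 6), Fraction(1, 6)]
observed = [5, 3, 1, 0, 0]  # hypothetical counts from 9 participants
ratings = [+2, +1, 0, -1, -2]
mean = sum(r * c for r, c in zip(ratings, observed)) / sum(observed)
print(mean, exact_multinomial_test(observed, expected))
```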

3.2 Repeated literature search (Repeat Step B)

As mentioned, the overall goal of our work is a formulated and validated quality model for software architectures of cloud-native applications. Measures, in this regard, are one type of model element needed to make factors measurable. Once suitable measures are formulated, additional types of validation can be applied, including quantitative experiments. Because we identified a lack of suitable measures in previous work [8] and some time has passed since the initial literature search, we decided to repeat and revise this literature search. Through this, we aim to include literature published in the meantime and to identify potentially missed literature. In a second search, done on 2023-07-13 with exactly the same search terms as in the initial search, we included literature that had been published since the first search. However, based on a closer review of the search terms and results, we realized that papers may have been missed due to the usage of either singular or plural forms in the literature. For example, the initial search terms included the term “microservices”, but not “microservice”. Through this, the paper by Zdun et al. [28] was not in our results, although relevant metrics are presented in it. Thus, we revised the search terms and performed a third search on 2024-01-16. The revised search strings include one focusing on microservices architectures as an architectural style commonly associated with cloud-native applications:

[Search string focusing on microservices architectures; shown as figure c in the original article]

A second search string focuses on cloud applications, especially their quality:

[Search string focusing on cloud applications and their quality; shown as figure d in the original article]

These search strings were adapted to search the ACM Digital Library, IEEE Xplore, and Springer Link.

The steps and numbers of papers considered from these searches are combined in Fig. 3. The number left of the + operator describes the results from the second search and the number right of it describes the results from the third search. The overall process is also described in more detail online.

As relevant results, we considered only literature in which:

  • a perspective is taken corresponding to that of application developers (in contrast to the perspective of cloud providers),

  • software architectures are considered on a level suitable to the quality model (in contrast to lower levels such as internal component design or higher levels such as the end-user perspective),

  • measures are presented that can be evaluated at design time (in contrast to runtime specific measures),

  • calculations for measures are specified clearly so that they can be adopted.

Fig. 3 Repeated and revised literature search results

As can be seen in Fig. 3, we could thereby extend the set of literature describing suitable measures. We found 23 additional works presenting measures suitable for the factors of the quality model, so that in total 84 works can be considered for the quality model. It was encouraging that measures could also be found for factors for which no measures had previously been found in the literature. We included these additional measures in the quality model, which therefore represents an improved basis for quantitative validations relying on such measures. The successful identification of additional measures also shows the possibility of using an iterative approach for the formulation of a quality model, in the sense that such a literature search can be done repeatedly to continuously update and refine a quality model.

The results of applying this methodology are presented in the following Sect. 4. It has to be noted that this approach is one possible option for the development of a quality model, which we found suitable in the context of our goal. Other approaches for the formulation and validation of a quality model are also possible. Our description of the foundations for the formulation and validation of quality models in Sect. 2, as well as the documentation of our methodology, aim to make different approaches comparable.

4 The cloud-native quality model

In this section we present the current state of the quality model. To this end, the different types of elements are explained, and exemplary factors from the model are then described in more detail. Using these examples, we also show how the formulation and validation steps led to their current state.

4.1 Elements of the quality model

The quality model for software architectures of cloud-native applications is based on the Quamoco meta-model [12]; in its current state, quality aspects, product factors, impacts, entities, and partly measures are defined. This means that completing the set of measures to make product factors measurable, as well as defining the evaluations that enable meaningful analyses, is still required for quantitative evaluations of software architectures. In contrast, a qualitative evaluation based on a manual assessment of a software architecture regarding the product factors of the quality model is already possible. We therefore present the quality model in its current form in this section and focus specifically on its different product factors. We describe how they should be understood, which rationale was used for their formulation, and how a potential evaluation may use them. The quality model is shown visually in Fig. 4. It includes product factors in white boxes and quality aspects in gray boxes. The connecting arrows represent the impact relations, marked as either positive or negative. It is important to state that the focus of the quality model is on the design time of an application. Thus, the included product factors are intended to be evaluable at design time based on an architectural representation of an application. Additional aspects, such as characteristics of the deployment process, are also important within the topic of cloud-native, but are out of the scope of this quality model.

Fig. 4 The quality model for software architectures of cloud-native applications

However, a decision needed to be made about what a suitable architectural representation of an application would be. While, in general, applying the quality model directly to source code and deployment descriptions would be imaginable, it would make the implementation increasingly complex, because a multitude of technologies would then have to be supported. We therefore chose to rely on a model of a software architecture and formulated potential entities of such a model in [8]. Additionally, to rely on existing approaches, we reviewed currently available modeling options [26] for software architectures of cloud-native applications, as described for Step D in Sect. 3. The result is the set of entities shown in Fig. 5.

Fig. 5 Entities proposed for the architectural description of cloud-native applications

These entities, together with further properties describing them in more detail, enable the modeling of software architectures of cloud-native applications so that these architectures can be evaluated. The formulation of the specific properties added to each type of entity strongly depends on the respective measures associated with the product factors. These properties, together with the component structure, provide the data basis for measuring. The definition of properties is therefore intentionally kept flexible so that they can be adapted as required by the used measures.
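To give an impression of such an architectural model, the following sketch (our own simplified rendering; the entity and property names are hypothetical and do not claim to reproduce Fig. 5 exactly) shows how components, endpoints, links, and infrastructure could be captured with flexible properties:

```python
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    name: str
    properties: dict = field(default_factory=dict)  # e.g. {"idempotent": True}

@dataclass
class Component:
    name: str
    endpoints: list[Endpoint] = field(default_factory=list)
    properties: dict = field(default_factory=dict)

@dataclass
class Link:
    """A connection from a component to an endpoint of another component."""
    source: Component
    target: Endpoint
    properties: dict = field(default_factory=dict)  # e.g. {"retries": 3}

@dataclass
class Infrastructure:
    name: str
    properties: dict = field(default_factory=dict)  # e.g. {"provisioning": "automated"}

# Hypothetical architecture fragment:
orders_api = Endpoint("POST /orders", {"idempotent": False})
order_service = Component("order-service", endpoints=[orders_api])
gateway = Component("api-gateway")
call = Link(gateway, orders_api, {"retries": 0, "timeout_ms": 500})
cluster = Infrastructure("k8s-cluster", {"provisioning": "automated"})
```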

4.2 Exemplary product factors

Based on the general description of the elements in Sect. 4.1, we now present three exemplary factors in more detail. These factors are also highlighted in blue in Fig. 4. We specifically describe how they were formulated and validated, and how they can be evaluated based on a cloud-native architecture, considering the required entities and measures:

Automated infrastructure provisioning This product factor is described as “Infrastructure provisioning should be automated based on component requirements which are either stated explicitly or inferred from the component which should be deployed. The infrastructure and tools used should require only minimal manual effort. Ideally it should be combined with continuous delivery processes so that no further interaction is needed for a component deployment.” and is considered to have positive impacts on both Modifiability and Installability. The initial formulation is, for example, based on literature by Reznik et al. [29], who write that “Any manual work that is required in between the changes committed by the developer and the delivery to production will significantly reduce the speed of delivery” [29, Pattern: Automated Infrastructure (Chapter 10)] and therefore argue that through automated provisioning of required infrastructure, modifications are simplified and can be done more quickly. In the originally formulated quality model this aspect was subsumed in a product factor called Automated infrastructure, but the results from the validation survey [1] showed an ambiguity for this factor, because several impacts on different quality aspects were reported. This can be seen in Table 1, where selected results from the validation survey are listed. On the basis of the survey results, we therefore refined the factor and split it into two separate factors, Automated infrastructure provisioning and Automated infrastructure maintenance, which impact the respective quality aspects. It has to be noted that we kept the positive impact of Automated infrastructure provisioning on Modifiability, although the survey results did not clearly confirm it. The reason is that we used the survey results as a basis for refining the quality model and not as definitive results, because the sample was not representative and participants were also not able to provide a reasoning for their ratings. To evaluate the product factor Automated infrastructure provisioning, the infrastructure entities need to be considered, and it has to be evaluated whether they can be provisioned automatically, that means with little or no manual actions. This could, for example, be the case when the infrastructure is managed by a cloud provider and can be provisioned once needed by a component. Another case would be the usage of Infrastructure as Code (IaC) approaches, for which an additional product factor Use infrastructure as code exists. A suitable measure could then, for example, be the ratio of infrastructure entities which can be provisioned automatically as a basic indicator (a sketch of this measure follows after Table 1).

Table 1 Exemplary results from the validation survey [1]
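A minimal sketch of the ratio measure mentioned for Automated infrastructure provisioning, assuming infrastructure entities carry a boolean property that indicates automated provisioning (the entity and property names are our own illustration):

```python
# Sketch: ratio of infrastructure entities that can be provisioned
# automatically, as a basic indicator for Automated infrastructure
# provisioning. Entities are plain dicts here to keep the example small.
infrastructure = [
    {"name": "k8s-cluster", "automated_provisioning": True},
    {"name": "managed-db", "automated_provisioning": True},
    {"name": "legacy-vm", "automated_provisioning": False},
]

def automated_provisioning_ratio(entities: list[dict]) -> float:
    automated = sum(1 for e in entities if e.get("automated_provisioning"))
    return automated / len(entities) if entities else 0.0

print(automated_provisioning_ratio(infrastructure))  # 0.666...
```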

Autonomous Fault Handling This factor is described as “In cloud-native applications services should expect faults at different levels and either handle them or minimize their impact by relying on the capabilities of cloud environments.”. It is a mediating factor for the product factors Invocation timeouts, Retries for safe invocations, and Circuit breaked communication. Circuit breaked communication has been mentioned regularly in the literature [30, Chapter 10.1], [31, Use Circuit Breakers for Nontransient Failures (Chapter 6)], [32, 3.2.3] and can be used “to prevent components from doing operations that will likely fail and are not transient” [31] so that an alternative action can be used until the affected component is healthy again. Invocation timeouts [33, Chapter 3], [32, 3.2.3] and Retries for safe invocations [30, 9.1], [31, Chapter 6], [33, Chapter 3] are also based on literature. In the validation survey, however, we only considered Retries for safe invocations and Circuit breaked communication, because we assumed Invocation timeouts to be less specific to cloud-native applications. For both, Table 1 shows that their positive impact on Fault Tolerance could be confirmed, although we had to mark the results as only potentially valid due to a low number of responses. To evaluate these product factors, the link entities need to be analyzed to check whether these mechanisms are used; in addition, for the product factor Retries for safe invocations, the endpoint to which a link is connected needs to be analyzed to check whether it is safe to invoke it multiple times. Suitable measures have already been proposed in the literature, such as Number of Links with retry logic or Number of Links with Complex Failover by Apel et al. [9].
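The following sketch illustrates how such link-based measures could be computed over an architectural model. It is our own simplified illustration in the spirit of, but not identical to, the measures by Apel et al. [9]; all property names are hypothetical:

```python
# Sketch: count links with retry logic, and flag retries on endpoints that
# are not safe to invoke multiple times (non-idempotent).
links = [
    {"source": "gateway", "target": "POST /orders",
     "retries": 2, "target_idempotent": False},
    {"source": "gateway", "target": "GET /orders",
     "retries": 3, "target_idempotent": True},
]

links_with_retry = [l for l in links if l["retries"] > 0]
unsafe_retries = [l for l in links_with_retry if not l["target_idempotent"]]

print(len(links_with_retry))                   # 2 links implement retries
print([l["target"] for l in unsafe_retries])   # ['POST /orders'] needs review
```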

Consistent centralized logging This factor is described as “Logging functionality should be concentrated in a centralized component which combines and stores logs from the components of a system. The logs should also be consistent regarding their format and level of granularity so that a correlation and analyzability of logs is facilitated.”. It has been formulated based on literature and covers aspects such as that logs are stored in a central component [31, Use a Unified Logging System (Chapter 6)], that logs are collected in a consistent format [31, Common and Structured Logging Format (Chapter 6)], or that functionalities for log aggregation and analysis are provided [30, Chapter 11.1], [32, Applying the Log aggregation pattern (11.3.2)]. The validation survey confirmed the positive impact on Analyzability, although the result had to be marked as only potentially valid, because the p-value was above the chosen significance level of 0.1. Furthermore, a positive impact on Accountability was added to the quality model based on the survey results. To evaluate this factor, the backing service entity is relevant, since a potential logging service would be modeled as such; it can then be checked which components provide logs to this backing service, for example by sending logs to a certain endpoint. As a suitable measure, the Ratio of Components or Infrastructure nodes that export logs to a central service by Ntentos et al. [34] could be used.

4.3 Support through measures

The support of product factors through measures is a key aspect for enabling quantitative evaluations of software architectures. As described in Sect. 3.2, we have repeated the initial literature search for suitable measures and could identify additional measures for product factors of the quality model. This also included product factors for which we previously had not found any metrics, such as Guarded Ingress, for which Ntentos et al. [34] proposed a metric called Ingress Traffic Control utilization metric (ING), which we slightly adapted to our model as Ratio of endpoints whose ingress is guarded. This adaptation was necessary because of the differing approaches for representing software architectures: Ntentos et al. [34] have developed their own modeling approach, while for our model we have proposed the entities shown in Fig. 5.

As a general observation regarding the architectural measures proposed in the literature, we can state that a main focus has been on maintainability aspects, especially regarding coupling and cohesion [35, 36]. More recently, however, measures in other areas such as security have also been proposed [34].

Regarding measure support for the product factors of the quality model, there are two aspects which need to be addressed. On the one hand, there are product factors for which no or only a few suitable measures have been proposed in the literature yet. For these, it is necessary to propose and validate new measures. On the other hand, there are product factors for which numerous measures have been proposed, some being very similar in terms of what they measure. For these, it is necessary to make a selection of meaningful measures that reflect the product factors in a suitable and understandable way. It has to be noted that so far only measures that have already been presented in the literature have been included in the quality model. As a next step, further measures need to be formulated and validated in a structured way.

5 Discussion

In this section we want to reflect on the current state of the quality model, provide answers to the initially posed research questions, and outline future work regarding the formulation and validation of the quality model.

An interesting aspect that came up during our work on the quality model is how product factors and design patterns are related. Many of the practitioner books [30, 32, 33, 37] present patterns to achieve cloud-nativeness, and also in the scientific literature the evaluation of the quality of an architecture is often based on patterns [36, 38,39,40,41]. From our point of view, product factors and patterns are highly interrelated, and in some cases a product factor directly maps to a pattern (for example Circuit breaked communication [30]). Ideally, however, patterns are reflected by certain formations of entities and their respective properties within an architecture. When an architecture is evaluated, the fact that a certain pattern was used should be measurable through corresponding architectural measures which measure a related product factor. The quality aspect that this product factor impacts should then be the same as the one targeted by the original pattern, leading essentially to the same outcome: if a certain pattern was used, it should have an impact on a certain quality aspect.

Another interesting aspect to discuss is that a majority of the impact relations in the quality model describe positive impacts on quality aspects. This is to a large extent due to our chosen methodology, which considered practitioner books on how to implement cloud-native applications for the initial formulation of the quality model. This literature highlights the potential benefits of using cloud-native concepts. Through the validation survey we also aimed at potential negative impacts, which could indeed be identified, but only to some extent, for example the negative impact of Consistently mediated communication on Time behaviour. Because there are, however, typically trade-offs between quality aspects [42], this should in the end also be visible in the quality model. Further work on the quality model, especially considering use cases, should therefore also put a focus on potential negative impacts of application characteristics.

While keeping these aspects in mind, we are nevertheless confident that our quality model shown in Fig. 4 adequately describes how characteristics of cloud-native applications impact multiple quality aspects, thereby providing an answer to RQ1. Because of its reliance on the Quamoco meta-model [12], the quality model is formulated on a sound theoretical foundation. It provides the basis for further development towards quality evaluations of software architectures. The methodology that we used for formulating the quality model shows, on the one hand, how the model was developed in this specific case. On the other hand, it also shows what needs to be considered more generally when developing a quality model and which approaches can be used. The presentation and demonstrated application of our methodology therefore represent the answer to RQ2 of how a quality model can be formulated and validated. We want to make clear that it is one of several ways of formulating and validating a quality model and that different approaches can be taken, also in an iterative way. In fact, the more a quality model is validated from different perspectives and based on different types of rationale, the more confidence can be placed in its applicability.

Thus, a major aspect for future work is to use additional validation approaches for the quality model, namely an experimental approach as already mentioned for Step E in Sect. 3. To facilitate this, corresponding tooling support is crucial. Specifically, the tooling support should enable the intuitive modeling of software architectures using the entities shown in Fig. 5, the specification of measures based on modeled architectures, and their automated calculation so that they can be used for evaluations of the quality model. A prototype focusing on the modeling functionality has been presented in [26] and is the basis for the further implementation of tooling support. One aspect to discuss in this regard is how architectural models are created. Generally, it is possible to either manually model software architectures based on an understanding of the software or to automatically generate models based on artifacts such as source code or deployment descriptors. While it would significantly facilitate the usage of tooling if models could be generated automatically, the problem is the heterogeneity of technologies available in the context of cloud-native applications, to which tooling would in turn need to be adapted. Ntentos et al. [34] have taken this approach of generating models automatically from IaC artifacts, but had to start with one specific technology to then continuously include further technologies. We decided to preliminarily provide only manual modeling of software architectures, because we wanted to focus on the relations between architectural characteristics and quality aspects in the quality model. Our aim for the tooling support is therefore to nevertheless make the manual modeling as intuitive as possible. In addition, we based the modeling approach on TOSCA [43] as a standard. This provides the foundation for transforming deployment descriptors of various technologies into a TOSCA template, if adapters are available, and therefore improves the compatibility of our tooling.

6 Related work

Table 2 Comparison of our approach to other work based on the taxonomy by Aleti et al. [44]

In a broader context, our work can be seen as part of research on software architecture optimization methods, which have been reviewed by Aleti et al. [44]. Using the taxonomy presented by Aleti et al. [44], our work can be characterized so that it can be better related to and compared with other work. We have done this in Table 2, which shows the different categories of the taxonomy by Aleti et al. [44] and a selection of related work to which our approach is compared. The common aim of these approaches is to evaluate the quality of an architecture and, based on this evaluation, suggest potential options for quality improvements. Differences, however, become apparent when considering the categorizations in more detail.

The scope is set by the problem category, which characterizes the problem tackled by an approach. It can be differentiated based on the domain, the phase of the software development lifecycle it is used in, and the covered quality aspects. Regarding quality aspects, Aleti et al. differentiate between quality attributes, which are to be optimized, and constraints, which are specified upfront and limit the potential solutions. Furthermore, the dimensionality expresses whether multiple quality dimensions are considered simultaneously, potentially conflicting with each other, or whether there is a single objective. In contrast to other approaches, when using a hierarchical quality model, typically multiple quality attributes are considered, and a quality evaluation and potential optimization aims to weigh multiple objectives against each other. This is a commonality between our approach and the approaches in Table 2. On which basis the quality attributes are defined, however, differs. While our approach is based on the quality aspects of the ISO 25000 standard [16] combined with cloud-native characteristics, Mayr et al. [45] focus on requirements derived from expert knowledge, Achilleos et al. [46] formulate Service Level Objectives (SLOs), for example regarding the response time of a system, and Soldani et al. [10] rely on best practices for microservices-based systems obtained from the literature. With quality models, constraints are formulated implicitly within the model, while Achilleos et al. [46] allow for a custom specification of constraints. For the approach of Soldani et al. [10], constraints may come from external business-specific considerations that might justify violations of certain best practices. Regarding the solution characteristics, it can be seen that the other approaches are similarly based on models for the quality evaluation, but the types of models used for the architecture representation differ significantly: they range from directly using source code [45] to deployment-focused descriptions such as CAMEL [46] or μTOSCA [10]. The degrees of freedom describe which aspects of an architecture are considered changeable in order to improve the quality of a system. These degrees are highly related to the used representation of a system: while, for example, Achilleos et al. [46] have a narrower focus on deployment configurations to also enable runtime adaptations, our scope is broader, considering both how components are structured and interact with each other and the deployment perspective. Regarding the strategy used to enable optimizations, solutions for multi-objective problems are often approximate, and problem-specific options need to be evaluated. The approach of Mayr et al. [45] can be categorized as more exact, because its focus on requirements allows for an optimization in terms of either fulfilling requirements or not. And regarding constraint handling, different approaches are used, related to how constraints are expressed. For example, in the case of Achilleos et al. [46], deployment options that violate specified constraints are not considered at all, while in our approach constraints occur implicitly within the model and potential violations might therefore be acceptable if other quality aspects are considered more important. Finally, a comparison is possible regarding how an approach as such is validated and whether an optimization done through the approach has been validated. In our case, a validation has been performed through the survey, and an actual optimization based on the quality model is planned for future work. For the other compared approaches, several different types of validations have been done, although applying an approach to an actual use case is a common procedure.

With this comparison, we aim to set our work in a broader context and relate it to other work. Although only a selection of different works was used, other work can also be compared to ours based on the taxonomy by Aleti et al. [44]. But from this comparison it already becomes clear that despite the common goal of evaluating and improving architectural aspects of applications, there is a large variety of problems and corresponding solutions to consider. Especially for approaches that cover a similar domain as ours, there is significant potential for synergies in terms of integrating knowledge from these approaches into our quality model. A closely related domain is that of microservices, for which approaches have been presented that evaluate the architecture of a microservices-based system. The work by Soldani et al. [10] included in our comparison belongs to this category, but also works focusing on security aspects of microservices [28, 34, 47]. Furthermore, in the context of microservices, the analysis of anti-patterns [48, 49] or patterns [41, 42] together with an application to microservices architectures has been investigated. Another closely related domain is that of cloud deployments, for which the work by Achilleos et al. [46] included in the comparison represents an example. Additional related work again applies a focus on patterns used in the context of cloud deployments [40, 50, 51] and how these can be used to detect problems or compare different deployment options. As mentioned, many of these related works provide aspects that can be integrated into our approach. But they differ regarding their specific domain, formulation of quality attributes, architectural representations, and degrees of freedom. To summarize, our work thus represents a novel approach which can be differentiated from other existing work specifically based on:

  • the considered domain: because we cover both how components of cloud applications interact and how they are deployed

  • the consideration of multiple quality aspects in a hierarchy: Because we aim at the capability to also express trade-offs between quality aspects

  • the abstraction from specific technology: Because we take into account the technological heterogeneity of cloud applications and therefore abstract from it

7 Conclusion

With cloud-native being a recent and noteworthy topic, being able to assess how applications can benefit from cloud-native concepts and technologies is an advantageous capability for software developers and architects. Due to the breadth of its scope and the technological heterogeneity involved, however, this represents a challenge. With the quality model for software architectures of cloud-native applications presented in this work, we provide an approach for tackling this challenge. From a theoretical perspective, the quality model aims to provide a structured understanding of how cloud-native characteristics impact different quality aspects. This is in line with the possibility of using the quality model for a qualitative evaluation of software architectures. For a quantitative evaluation, and therefore a more practical perspective, a further enhancement of the quality model is needed, for which we have built the foundation and which we plan to continue in future work. Furthermore, our work represents a methodological contribution on how quality models can be formulated and validated. Through the structured and comprehensive description of our approach for formulating and validating the quality model, we aim to make approaches for developing quality models more comparable and therefore more comprehensible.