From assistant systems to assistant platforms

Assistant platforms like Alexa and Siri have received strong attention in the last decade. The same applies to the more recent generative artificial intelligence (GAI) platforms Bard, ChatGPT, or Claude. While they are assistant systems, they feature traits that render them a specific form of digital platforms. This Fundamental elaborates on this notion and on the characteristics of assistant platforms. To begin with, it is valuable to understand the role of assistants in the economy and how they are supported by information technology (IT).

Assistants and assistant systems

Assistants have a long tradition of supporting people in fulfilling their tasks. By delegating tasks to assistants, individuals can reduce their workload, streamline processes, and access necessary domain knowledge and resources. Assistants also improve the efficiency of decision-making, from preparatory activities to operational execution, control, and readjustment. Assisting clients has also been the business model of mediators or intermediaries, such as travel agents, financial advisors, and brokers. Since intermediating functions are mostly information-based, IT (especially digital IT) has a strong impact on these business models. Early examples of IT-based assistants date back to the 1980s, with program error locating assistant systems (Korel, 1988) and later on with tourist assistant systems (Yang et al., 1999), teaching assistant systems (Lesta & Yacef, 2002), meeting assistant systems (Tur et al., 2010), and hotel assistant systems (López et al., 2013). More recent examples are diverse forms of chatbots, personal voice assistants such as Amazon’s Alexa, Microsoft’s Cortana (to be succeeded by Copilot), or Apple’s Siri (Yoffie et al., 2018), and text-oriented assistants such as ChatGPT from OpenAI (van Dis et al., 2023), Claude from Anthropic (2022), or Bard from Google (Nguyen & Kim, 2022).

Assistants and intermediaries have become known as important concepts for enhancing customer orientation in value chains and business models. In particular, they link the problem space on the customer side with the solution space on the provider side. Contrary to providers, customers are typically not (or less) familiar with the specifics of a certain domain (e.g., the components of financial assets for a financial investment decision) and require a "translation" from their general requirements to the product-specific terminologies and procedures (Alt, 2016). While intermediaries like asset managers, travel agents, marketplaces, and search engines have proven to be successful business models for this facilitating function, it has been challenging to develop sustainable business models for assistant systems. Despite the technological enthusiasm about digital assistants, these business models are still missing, as illustrated by the large losses and layoffs in Amazon’s Alexa unit in 2022 (Nguyen & Kim, 2022; Peters, 2022).

Assistant systems and assistant platforms

An important step in structuring assistant systems is to acknowledge their three main dimensions. The first dimension is based on the application domain and leads to the distinction between domain-specific and general-purpose assistant systems. As sociotechnical information systems, assistant systems simulate the role of human assistants, combining technology, users, and tasks (Maedche et al., 2019). They capture the requirements of their principals and procure services, knowledge, and resources to make them available to their clients. Many assistant systems have been predominantly domain-specific, focusing on a particular area or industry (Knote et al., 2019). They provide specialized support and services within their application domain (e.g., programming, tourism, teaching) by offering in-depth knowledge in this field. By contrast, general-purpose assistant systems are designed to perform a wide range of tasks across various domains, to handle diverse requests, and to assist users in multiple everyday activities (Knote et al., 2019). Despite these diverse roles, current research fails to adequately address the comprehensive scope and complexity that arise when assistant systems are combined with a variety of services from other domains and platforms.

The second dimension distinguishes assistant systems along their interaction mode with users (Rzepka & Berger, 2018), namely between voice- and text-oriented assistant systems. In their early days, assistant systems relied on text-oriented communication owing to resource limitations and technology restrictions (e.g., Korel, 1988). As technology advanced, assistant systems began incorporating voice-oriented interfaces, enabling more natural and intuitive interaction between users and the system (e.g., Oviatt et al., 2000). Such advanced systems can use both modes interchangeably through technologies like speech-to-text and text-to-speech conversion. For example, OpenAI’s text-oriented ChatGPT assistant can also work as a voice assistant using these methods (Nedelcu, 2023; Open AI, 2023b). Similarly, Alexa has recently been enhanced with GAI capabilities (Gurman & Day, 2023) and Microsoft Copilot with GPT-4 (Warren, 2023). Therefore, the interaction mode of these assistants is not clearly determined but depends on the specific implementation and on accessible modules, highlighting their versatility in meeting diverse user needs. Although rich knowledge has emerged on user interaction, current research has not adequately explored the implications of this dual-mode communication capability in assistant systems and the ways in which the flexibility of interaction modes shapes user experience and task efficiency.

The third dimension that differentiates assistant systems is topology and leads to the distinction of single-actor and multi-actor assistant systems (Tatnall, 2005). On the one hand, an assistant system may be seen as a service that mediates between a user demand and a provider’s offering. The assistant system is then a single-actor system. On the other hand, assistant systems may pursue a platform approach and serve to access services and devices from various providers. Examples of such multi-actor assistant systems are voice-oriented assistant systems like Amazon Alexa, where developers may create their own skills (Amazon Web Services, 2023; Amazon, 2023), as well as text-oriented assistant systems like ChatGPT, where plugins allow the use of third-party services (Open AI, 2023a). In addition to accessing and combining services from various actors, they also combine diverse platforms. For example, Alexa can address music- and video-streaming as well as marketplace platforms. Despite the intricate topology of assistant systems and their potential to connect users with various service providers, current research has not sufficiently delved into the structural implications of these platform-oriented systems, leaving a significant gap in our understanding of their full capabilities and functionalities.

Structure and approach

This Fundamentals article addresses three gaps in the current understanding of assistant systems: the multifunctionality and comprehensive scope of general-purpose assistant systems, the dual-mode communication capability that has remained largely underexplored, and the complex topology of platform-oriented assistant systems, which shall be referred to as assistant platforms. By emphasizing the platform nature, the notion recognizes that implementations of these systems in practice show the interconnectedness of various platforms and services. Therefore, the Fundamental structures the novel term of assistant platforms and highlights its unique characteristics, such as a multi-platform architecture, a declarative interface, and a platform ecosystem. These characteristics also set the stage for future research topics, thereby contributing to the evolving discourse on assistant platforms. The Fundamental concludes with a summary and an outlook, which points at the key role of GAI technologies. Methodologically, the Fundamental is based on prior research (Schmidt et al., 2021, 2022, 2023, 2024) and feedback from various reviews and presentations at academic conferences, in particular the AI-based assistants minitrack at the Hawaii International Conference on System Sciences (HICSS).

Assistant platform characteristics

Building upon the above-mentioned shortcomings, this chapter aims to deepen the understanding of assistant platforms by embedding them within the existing discourse on platforms. It analyzes innovation, transaction, digital, and AI-based platforms and suggests that assistant platforms are a novel combination of these existing views.

Innovation platforms

Following current research, the first main class of platforms refers to innovation platforms. On the one hand, they facilitate the development of new, complementary products and services built primarily by third-party companies (Cusumano et al., 2019). On the other hand, they feature a core functionality supplemented with complementary modules. The concept of modularity is associated with many advantages: it allows components or modules of a system to be developed, tested, and evolved independently. Each module may be updated or modified independently of others (Baldwin & Clark, 2000), and end users may add or remove modules according to their specific needs (Campagnolo & Camuffo, 2010). A system’s efficiency and flexibility typically benefit from this, since improvements or changes can be made to one module without disrupting the entire system (Gawer, 2014). By decomposing a system into interchangeable modules, developers can focus on innovating within their specific area of expertise. Such modular approaches have been shown to foster more creative and unique solutions that might not have been possible within a monolithic system (Baldwin & Woodard, 2009). These ideas have also been applied to digital platforms to enable user interactions and product variety (Dai, 2023).

Based on these properties, assistant systems such as Alexa and ChatGPT qualify as innovation platforms. The skills of Alexa or the plugins of ChatGPT may be conceived as modules that provide specific functionalities. Different developers may develop them independently, but they may be integrated (primarily in the sense of loose coupling) into the platform via standardized interfaces (Baldwin & Woodard, 2009; Dai, 2023). These interfaces define the interactions between the assistant platform and the skills or plugins and act as boundary resources (Ghazawneh & Henfridsson, 2013), ensuring the interoperability of these modules despite different functionalities and development environments (Baldwin & Clark, 2000; Campagnolo & Camuffo, 2010). Assistant platforms often integrate services from third parties, such as music streaming, smart home devices, or weather services. For example, the German Alexa Skill Store alone includes 10,329 different skills belonging to 43 different categories. These categories cover a wide array of third-party modules, from business and finance to weather reports. Several categories of skills integrate other platforms, such as hotel and car sharing platforms (see Schmidt et al., 2022).

Transaction platforms

The second main class of platforms comprises transaction platforms. By empowering participants to exchange goods, services, or information (Cennamo, 2021), they make resources available that were previously unavailable for transactions (McAfee & Brynjolfsson, 2017) and reduce transaction costs (Parker & Van Alstyne, 2014). Originating from economic research on multi-sided markets (Rochet & Tirole, 2003; Jacobides et al., 2024), transaction platforms are more sociotechnical in nature than the rather technology-oriented innovation platforms. They include marketplaces (e.g., Amazon Marketplace, eBay, Opodo) as well as social networking platforms (e.g., WeChat, Instagram), with the concept dating back to the early electronic marketplaces of the 1980s (Abdelkafi et al., 2019).

Assistants like Alexa as well as ChatGPT already enable transactions, and similar features have been announced for Bard. For instance, ChatGPT plugins such as the KAYAK plugin support the booking of travel-related services, and Alexa allows users to make purchases from Amazon’s marketplace and numerous other transaction platforms, such as music and video streaming platforms (Schmidt et al., 2021, 2022). Users may ask Alexa to order a product, and the assistant will search for it in the Amazon marketplace, offering options based on the user’s previous purchases and preferences. In addition, Alexa skills can offer in-skill purchases for premium content, special features, and subscriptions (Schmidt et al., 2021), and Alexa can interact with various third-party services to facilitate transactions, such as ordering a ride from Uber or Lyft, ordering food delivery from Grubhub, and playing music from Spotify or Amazon Music (Schmidt et al., 2021). Other skills allow users to check bank balances, pay bills, or make investments, thus enabling financial transactions through voice commands.

Digital platforms

Digitalization is a separate and independent perspective that intersects with innovation and transaction platforms (Gawer, 2022). If the distinction between analog and digital IT is made, it would be more precise to denote them as digital innovation and digital transaction platforms (Bonina et al., 2021; Hein et al., 2020). Due to the advantages of digitalization, the remainder of this Fundamental focuses on these digital representations. On digital innovation platforms, the core technology and the modules are digital, which allows for a more efficient creation and integration of modules as well as remote and real-time access via programmatic interfaces known as application programming interfaces (APIs) (Piccoli et al., 2022). These digital interfaces enable seamless interactions and flexibility, fueling global collaborations and innovations. On digital transaction platforms, digitalized supply and demand information permits infinite replication and efficient digital searchability, which significantly lowers transaction costs, expands reach, and increases market efficiency (Bakos, 1991). In sum, digitalization enhances innovation as well as transactional efficiency and paves the way for hybrid platforms that combine traits of both types.

Digitalization profoundly impacts assistant platforms, such as Amazon Alexa and ChatGPT. Possible effects of digitalization are fourfold: First, the exponential growth of digital data and advancements in data processing technologies have enabled models to learn from large datasets and improve their understanding of human language, empowering assistant platforms to provide more accurate and personalized assistance to users (Gregory et al., 2022). Second, cloud computing provides a scalable and flexible infrastructure, allowing assistant platforms to manage computational resources and to handle millions of users simultaneously (Cusumano, 2019). This has led to the rapid deployment of and global access to assistant platforms, while developers can build and scale their applications more easily. Third, the increased connectivity and proliferation of Internet of things (IoT) devices have integrated assistant platforms into a wide range of products and services. For instance, Alexa can control smart home devices via natural language, whereas ChatGPT can enhance websites with textual content in various languages (Schmidt et al., 2021). Fourth, the widespread availability of smartphones, tablets, and other connected devices grants users access to assistant platforms anytime and anywhere. This has driven the demand for assistant platforms and the development of new features to better serve mobile users (Basole & Karla, 2011).

AI platforms

Artificial intelligence (AI) has amplified the impact of digitalization on platforms and created a new dimension of functionality and user interaction. Research indicates that AI and digital platforms share multiple relationships: platforms serve as data sources for AI, AI supports platform processes, and AI forms part of platform stacks, as found in many cloud stacks (Alt, 2021). In particular, the advent of AI, coupled with the growing availability of data, has given rise to data network effects. These effects reveal a direct positive correlation between a platform’s AI capability and the value perceived by its users (Gregory et al., 2021, 2022). Moreover, the combination of AI and data enhances the proportion of value that platform owners can extract from users (Clough & Wu, 2022).

The most well-known general-purpose assistant systems leverage AI to render interactions more fluid, intuitive, and personalized (Rzepka, 2019). A key element is their declarative interface, which serves as the primary mode of user interaction on these platforms. AI in these systems is designed to accurately interpret user inputs, even when these inputs are expressed in natural conversational language (Rzepka & Berger, 2018). Natural language understanding (NLU) enables the assistant to decipher the literal words spoken or typed by a user and to infer the intent behind them. Likewise, natural language generation (NLG) empowers assistants to generate human-like text, facilitating responses to users that feel natural and are easy to understand.
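To make the interplay of NLU and NLG more concrete, the following minimal Python sketch maps an utterance to an intent with slots and produces a textual reply. The intent names, regular expressions, and confidence values are illustrative assumptions and do not reflect the internals of any specific assistant platform.

```python
# Minimal illustrative sketch of intent recognition behind a declarative interface.
import re
from dataclasses import dataclass

@dataclass
class Intent:
    name: str          # e.g., "PlayMusic" (hypothetical intent name)
    slots: dict        # extracted parameters, e.g., {"genre": "jazz"}
    confidence: float  # how certain the NLU component is (0..1)

PATTERNS = {
    "PlayMusic":   re.compile(r"play (some )?(?P<genre>\w+)( music)?"),
    "SetReminder": re.compile(r"remind me to (?P<task>.+) at (?P<time>[\w:]+)"),
}

def understand(utterance: str) -> Intent:
    """Map a natural-language utterance to an intent with slots (NLU)."""
    text = utterance.lower().strip()
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            return Intent(name, match.groupdict(), confidence=0.9)
    return Intent("Fallback", {"utterance": text}, confidence=0.2)

def respond(intent: Intent) -> str:
    """Generate a human-readable reply (a simple stand-in for NLG)."""
    if intent.name == "PlayMusic":
        return f"Playing {intent.slots.get('genre', 'your')} music."
    if intent.name == "SetReminder":
        return f"Reminder set for {intent.slots['time']}: {intent.slots['task']}."
    return "Sorry, I did not understand that."

print(respond(understand("Play some jazz music")))  # -> "Playing jazz music."
```

In production systems, the rule-based matching shown here is replaced by statistical or neural NLU models, but the structural flow from utterance to intent, slots, and generated response remains the same.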

AI in assistant platforms also aids in the contextual understanding of a user’s request, ensuring the delivery of relevant and accurate responses (Ren et al., 2021). For instance, if a user instructs an assistant to “play some music,” the AI algorithm can include the user’s past listening history, the time of day, and other factors to select an appropriate song or playlist. Moreover, AI enables assistant platforms to learn from each interaction and to tailor their responses based on user preferences, habits, and behavior patterns (Rong et al., 2021). Advanced AI algorithms can anticipate a user’s needs based on past behavior, providing proactive assistance (Lim et al., 2022). For example, an AI assistant could suggest leaving early for an appointment if traffic congestion is expected based on current data. In essence, applying AI in these platforms renders the results more intuitive and personalized, thereby enhancing the utility of the declarative interface for the user experience.

Defining assistant platforms

In summary, assistant platforms may be conceived as multifaceted systems that complement and combine different existing research streams. They constitute a distinct category of platforms incorporating elements of innovation, transaction, digital, and AI-based platforms (see Table 1). On the one hand, they facilitate innovation via the development of modular functionalities (skills or plugins) by third-party developers. On the other hand, they also enable transactions, such as purchases on retail or travel marketplaces. Furthermore, assistant platforms are inherently digital, leveraging the benefits of digital technologies, such as cloud computing, IoT connectivity, and advanced data processing. Finally, they utilize AI to enhance user interaction, to understand user intent, and to offer personalized and proactive assistance. In terms of their evolution, voice-oriented platforms have focused on providing simple control of modular resources such as IoT devices and streaming services, while text-oriented platforms allow users to execute more complex queries and to perform sophisticated text transformations. Recent developments indicate a convergence of voice- and text-oriented platforms, which allows for even higher levels of interaction complexity. In view of these characteristics, assistant platforms also qualify as multi-platforms that integrate and combine services from various other platforms (Schmidt et al., 2024).

Table 1 Elements to define assistant platforms

Conceptualization of assistant platforms

Following the definition of assistant platforms, an architecture model distinguishes three main elements of assistant platforms. It recognizes assistant platforms as multi-platforms (i.e., collections of services from various platforms) that offer access via a declarative interface and are embedded in a platform ecosystem (see Fig. 1). This approach aligns with established practices in the information systems discipline, which regards architectures as conceptual models. These models delineate the various elements of a specific system and elucidate how these elements interact and operate cohesively (Zachman, 1999).

Fig. 1 Assistant platform architecture

Multi-platform architecture

As mentioned above, the modules in assistant platforms refer to peripheral components called “skills” (Amazon, 2023) on Alexa and “plugins” (Open AI, 2023a) on the ChatGPT platform. For example, gateway modules like the KAYAK plugin on ChatGPT grant access to integrated platforms, in this case linking seamlessly to the KAYAK transaction platform. The modules define interfaces as boundary resources, fostering loose module coupling and complementarities without requiring knowledge of the internal realization of individual modules (Jacobides et al., 2018). Platforms typically manage modules in a metadata repository that ensures compatibility with search and matchmaking, thereby enabling seamless processes among multiple actors. Important metadata in these repositories pertain to the categorization of modules, user reviews and ratings, the commissioning of modules, and information about the module providers.
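As an illustration of such a metadata repository, the following sketch shows what a single module record could contain. The field names and the example entry are assumptions for illustration; actual skill or plugin stores use their own proprietary schemas.

```python
# Illustrative sketch of a metadata record in a module repository (hypothetical schema).
from dataclasses import dataclass, field

@dataclass
class ModuleMetadata:
    module_id: str               # unique identifier of the skill/plugin
    name: str                    # display name shown in the store
    category: str                # e.g., "Travel", "Smart Home", "Finance"
    provider: str                # third-party developer or gateway platform
    interface_version: str       # boundary resource the module implements
    rating: float = 0.0          # aggregated user rating
    reviews: int = 0             # number of user reviews
    monetization: str = "free"   # e.g., "free", "in-skill purchase", "commission"
    integrated_platforms: list = field(default_factory=list)  # gateway targets

example_module = ModuleMetadata(
    module_id="travel.search.example",
    name="Travel Search (example)",
    category="Travel",
    provider="Example Travel Inc.",
    interface_version="1.2",
    integrated_platforms=["example-transaction-platform"],
)
```

Records of this kind support the search, matchmaking, and commissioning functions mentioned above, since categories, ratings, and provider information can be queried independently of the modules’ internal implementation.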

The concept of a multi-platform architecture (Schmidt et al., 2024) is linked to the innovation platform aspect of assistant platforms. As innovation platforms, assistant platforms utilize a modular approach, in which independent components or modules are developed, tested, and evolved separately. This makes it possible to extend their functionality without needing to build every feature in-house and allows end users to add or remove modules based on their specific needs. The adopted modular approach follows the design of established architectures that conceive components as context-independent software units (Broy et al., 1998; Ciupke & Schmidt, 1996; Lau & Wang, 2007). They have proven to be valuable for integrating heterogeneous resources and for facilitating combinatorial innovation. Compared to innovation platforms, multi-platforms emphasize the existence of gateway modules for integrating other platforms via standardized interfaces (e.g., an API). Thus, platforms integrated via gateway modules can be accessed as genuine services on the platform.
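A minimal sketch of such a gateway module is given below: the assistant platform only sees the standardized module interface (the boundary resource), while the external platform’s API stays hidden behind it. The interface, class, and endpoint names are illustrative assumptions, not an existing platform API.

```python
# Sketch of a gateway module behind a standardized module interface.
from abc import ABC, abstractmethod
import requests  # assumed available for the HTTP call to the external platform


class Module(ABC):
    """Standardized interface every module (skill/plugin) implements."""
    @abstractmethod
    def handle(self, intent: str, slots: dict) -> str: ...


class TravelGatewayModule(Module):
    """Gateway module linking the assistant platform to an external travel platform."""
    API_URL = "https://api.example-travel.com/v1/search"  # hypothetical endpoint

    def handle(self, intent: str, slots: dict) -> str:
        if intent != "SearchFlights":
            return "This module only handles flight searches."
        # The external platform is accessed via its API; the assistant platform
        # never needs to know how the travel platform works internally.
        response = requests.get(self.API_URL, params=slots, timeout=10)
        offers = response.json().get("offers", [])
        return f"Found {len(offers)} offers for {slots.get('destination')}."
```

Because all modules satisfy the same interface, the integrated travel platform appears to users as just another service on the assistant platform, which is the defining trait of the multi-platform architecture.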

Declarative interface

Current user interfaces are based on unambiguous user inputs. For example, a command line follows a precise syntax but rejects imprecise inputs. In the context of assistant platforms, a declarative interface allows users to interact with a platform using spoken or written natural language. AI technologies, such as NLU and NLG, offer the functionality to interpret user input (“utterances”), to infer user intent, and to generate human-like responses. Until the rise of GAI technologies (Banh & Strobel, 2023), interactions via the declarative interface were limited to rather simple tasks, which caused many users (in the Alexa case an estimated 15-25% of users) to discontinue using the assistant platform (Anand, 2021). Infusing GAI into these classical assistant platforms has become an important path towards mastering more complex interactions (Gurman & Day, 2023; Open AI, 2023b). AI also fosters the assistant platform’s understanding of the context of a user’s request, enables it to learn from each interaction, and anticipates user needs based on past behavior. Automation on assistant platforms may be reactive or proactive, reflecting user-made rules and automated rules derived from AI learning. On the one hand, reactive automation occurs when users explicitly instruct an assistant platform to perform certain tasks or behaviors. For instance, a user may set a rule for the assistant to turn on the lights at a specific time each day. On the other hand, proactive automation is driven by AI functionalities that learn from user behavior and adapt accordingly. Here, the assistant platform might detect patterns in a user’s actions, such as watching a movie at a particular time each day, and proactively adjust room lighting to create a “movie scene” ambiance without the user’s explicit instruction. Recommendations of new services and combinations of services may be created in this way. The interconnectedness of reactive and proactive automation on assistant platforms underlines their versatility and adaptability.
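The contrast between reactive and proactive automation can be sketched as follows. The rule format and the simple pattern-detection heuristic are hypothetical simplifications of what learning-based platforms do in practice.

```python
# Sketch contrasting reactive rules (user-made) and proactive rules (learned).
from collections import Counter

# Reactive automation: a rule the user states explicitly.
reactive_rules = [
    {"trigger": {"time": "07:00"}, "action": "turn_on_lights"},
]

# Proactive automation: a rule derived from observed behavior.
def derive_proactive_rule(event_log, min_occurrences=5):
    """Suggest a rule if the same (time, activity) pair recurs often enough."""
    counts = Counter((e["time"], e["activity"]) for e in event_log)
    for (time, activity), n in counts.items():
        if activity == "start_movie" and n >= min_occurrences:
            return {"trigger": {"time": time, "context": "start_movie"},
                    "action": "set_movie_scene_lighting"}
    return None

log = [{"time": "20:30", "activity": "start_movie"}] * 6
print(derive_proactive_rule(log))
# -> {'trigger': {'time': '20:30', 'context': 'start_movie'},
#     'action': 'set_movie_scene_lighting'}
```

Both kinds of rules end up in the same automation engine; the difference lies only in whether the user or the learning component authored them.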

Platform ecosystem

A platform ecosystem allows multiple participants (producers and consumers) to connect, interact, and create and exchange value (Tiwana, 2014; Hein et al., 2020). It is characterized by externalities such as network effects, where an increase in the number of participants on one side of the platform enhances the platform’s value for participants on the other side. In the context of assistant platforms, the ecosystem includes several key actors, each playing a vital role in the overall operation and value generation of the platform:

  • Platform owners manage assistant platforms, e.g., Google (Google Assistant/Bard), Amazon (Alexa), Apple (Siri), and OpenAI (ChatGPT). They are responsible for maintaining the infrastructure, ensuring security, setting rules for participation, and fostering a healthy ecosystem.

  • Third-party developers create and provide modules to extend the functionality of the assistant platform. These functional modules can range from simple tasks, such as setting reminders, to complex tasks, such as controlling smart home devices and generating coherent text.

  • End users are individuals or businesses that use the assistant platform for various tasks. They benefit from the enhanced functionalities that third-party developers provide and contribute to the platform by generating data, providing feedback, and enhancing network effects.

  • Partners include device manufacturers, service providers, and other businesses that integrate their products or services into an assistant platform. They enhance the platform’s utility by allowing users to access and control their devices or services via the assistant platform.

Data plays a pivotal role in the platform ecosystem of assistant platforms, as these ecosystems are not the result of a central plan. They are emergent systems since they rely on a multitude of independent user decisions. Therefore, it is important to collect and analyze data reflecting user decisions in order to capture the structure of the platform ecosystem of assistant platforms in as much detail as possible and to understand how value is created and distributed among the various actors. This data generates knowledge on the competitive dynamics, potential bottlenecks, and opportunities for innovation within the ecosystem. It may be seen as a key resource for managing activities in assistant platforms.

Capabilities of assistant platforms

Based on the generic architecture model of assistant platforms presented above, their functionality shall be described via four main capabilities. Capabilities refer to the technical features or functions that a system, technology, or platform can perform (Teece et al., 1997). These are inherent in the design and function of the technology itself. Integration pertains to incorporating external resources, composition to assembling diverse modules for a cohesive user experience, prediction to enhancing user experience through anticipatory mechanisms based on machine learning (ML) and AI algorithms, and generativity to providing coherent responses for meaningful user interactions.

Integration

Integration is an assistant platform’s ability to connect with and incorporate resources, such as systems, devices, and services. Internal resources are delivered by the platform provider (e.g., Google Calendar on Google Assistant) and are often more closely aligned with the assistant platform. By contrast, external resources are offered by third-party service vendors (e.g., weather reports), sensor device vendors (e.g., temperature and humidity sensors, cameras, and motion detectors), or actuator device vendors (e.g., LED lights). From a technological perspective, device integration relies on standard protocols like Zigbee for smart home automation and the Matter standard (CSA, 2022). These protocols, development kits, and API definitions enhance devices such as wearables, vehicles, and smart home devices with voice functionalities. For instance, Alexa offers several ways to integrate services and devices, including a smart home skill API for linking third-party cloud services with the Alexa cloud and an Alexa Connect Kit for adding Alexa to third-party hardware (Amazon, 2023).
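The following generic sketch illustrates how a recognized voice intent could be translated into a protocol-level device command. It deliberately does not reproduce the actual Alexa Smart Home Skill API or the Matter/Zigbee wire formats; the registry layout, intent name, and device address are illustrative assumptions.

```python
# Generic sketch: mapping a recognized intent onto a device protocol command.
DEVICE_REGISTRY = {
    "living room light": {"protocol": "zigbee", "address": "0x00124b0001dd"},
}

def build_device_command(intent: str, slots: dict) -> dict:
    """Translate a recognized intent into a protocol-level command."""
    device = DEVICE_REGISTRY.get(slots.get("device", ""))
    if device is None or intent != "TurnOn":
        raise ValueError("Unknown device or unsupported intent")
    return {
        "protocol": device["protocol"],
        "address": device["address"],
        "command": "ON",
    }

print(build_device_command("TurnOn", {"device": "living room light"}))
# -> {'protocol': 'zigbee', 'address': '0x00124b0001dd', 'command': 'ON'}
```

In real deployments, this translation step is typically performed in the vendor’s cloud service or on a hub device, which is exactly where the standardized protocols and development kits mentioned above come into play.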

From a conceptual perspective, the integration of resources on assistant platforms follows a black box approach, which makes module functionality accessible via a standardized interface, thereby hiding internal complexity (Baldwin & Woodard, 2009; Garud & Kumaraswamy, 1995). An example from healthcare illustrates how integration extends an assistant platform’s functionality and value: Consider a hospital where different systems are used for patient records, appointment scheduling, and medical imaging. Traditionally, these systems operate independently, leading to potential inefficiencies or errors owing to fragmented information (e.g., Fürstenau et al., 2019). These disparate systems can be unified with the integration capability of assistant platforms. A physician could use a digital assistant to quickly retrieve a patient’s complete medical history from the records system, schedule a follow-up appointment using the scheduling system, and even pull up relevant medical images from the imaging system. All these tasks could be completed seamlessly through a single voice or text command, without the physician manually logging into and navigating multiple systems.
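The hospital scenario can be condensed into a short facade sketch in which one declarative request fans out to otherwise separate systems. All system clients and method names are hypothetical stand-ins under the black box approach described above.

```python
# Hedged sketch of black-box integration: one request, three hidden systems.
class RecordsStub:
    def get_history(self, patient_id): return [f"visit history of {patient_id}"]

class SchedulingStub:
    def book(self, patient_id, date): return {"patient": patient_id, "date": date}

class ImagingStub:
    def latest(self, patient_id): return [f"latest scan of {patient_id}"]

class AssistantFacade:
    """Unifies record, scheduling, and imaging systems behind one interface."""
    def __init__(self, records, scheduling, imaging):
        self.records, self.scheduling, self.imaging = records, scheduling, imaging

    def prepare_follow_up(self, patient_id: str, date: str) -> dict:
        """One command replaces manual logins into three separate systems."""
        return {
            "history": self.records.get_history(patient_id),
            "appointment": self.scheduling.book(patient_id, date),
            "images": self.imaging.latest(patient_id),
        }

facade = AssistantFacade(RecordsStub(), SchedulingStub(), ImagingStub())
print(facade.prepare_follow_up("patient-4711", "2024-07-01"))
```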

Composition

The capability to compose and manage multiple services, devices, and information sources encapsulated as modules through a declarative interface is a key feature of assistant platforms, providing users with a highly convenient and personalized experience. Composition assembles various modules into a cohesive user experience that is rich, adaptable, and responsive to a wide range of scenarios. Each module encapsulates specific services or devices and has a clearly defined interface that allows modules from different vendors with diverse functionalities to interact seamlessly. This modular structure enables independent evolution: a new version can replace a module, provided it supports the interface, eliminating the need for vendors to synchronize their developments. In this context, multiple modules may be integrated into module configurations, which consist of sequences of actions triggered and manipulated by various events, such as user behaviors or environmental changes.

An example of the composition capability is the coupling of various smart home devices with an assistant platform like Alexa. Imagine a user with several smart devices: a thermostat, lights, and a security system. Each device has its own interface and controls, which makes it difficult to manage all devices collectively. With the composition capability, Alexa allows users to create routines. For example, a “morning routine” could adjust the thermostat to a comfortable temperature, switch on the lights in certain rooms, and disarm the security system. This routine could be triggered by a voice command, a specific time, or the user’s smartphone. The composition capability also extends to non-hardware services. For example, a user could set up a routine asking Alexa about their “day’s schedule” that prompts the assistant to read calendar events, share weather forecasts, or play their favorite morning news podcast.
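Such a routine can be expressed as a module configuration: one trigger plus an ordered sequence of module actions, as in the sketch below. The module names, commands, and the configuration format are hypothetical and not taken from the actual Alexa routine format.

```python
# Illustrative sketch of a routine as a module configuration.
morning_routine = {
    "name": "morning routine",
    "trigger": {"type": "time", "value": "06:45"},  # could also be a voice command
    "actions": [
        {"module": "thermostat", "command": "set_temperature", "args": {"celsius": 21}},
        {"module": "lights",     "command": "turn_on",         "args": {"rooms": ["kitchen", "bath"]}},
        {"module": "security",   "command": "disarm",          "args": {}},
        {"module": "calendar",   "command": "read_today",      "args": {}},
        {"module": "weather",    "command": "read_forecast",   "args": {"location": "home"}},
    ],
}

def run_routine(routine, modules):
    """Dispatch each action to the module encapsulating the device or service."""
    for step in routine["actions"]:
        modules[step["module"]](step["command"], **step["args"])

# Stub modules that simply print what they would do.
modules = {name: (lambda n: (lambda command, **args: print(n, command, args)))(name)
           for name in ["thermostat", "lights", "security", "calendar", "weather"]}
run_routine(morning_routine, modules)
```

Because each step only addresses a module through its interface, any of the underlying devices or services can be replaced by another vendor’s module without touching the routine itself.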

Prediction

Prediction involves the ability of assistant platforms to anticipate user needs and preferences thereby enhancing user experience (Agrawal et al., 2022). This capability leverages historical data and contextual information and often applies ML and AI algorithms to provide accurate and relevant predictions. For instance, an assistant platform might anticipate a user’s need for weather updates based on their morning routine or suggest restaurants based on past dining preferences. Text- and voice-oriented assistant platforms allow users to interact using colloquial language. This feature reduces interaction thresholds and allows for more location-independent access, thereby improving the platform’s ability to predict user needs based on a wide range of contextual cues.

An example of the prediction capability can be seen in the use of AI-powered assistant platforms for personal productivity. Imagine a user with a daily routine of catching a train to work at 8:00 AM. Over time, the assistant learns this pattern and predicts that the user will need information about train schedules around this time. Without being prompted, the assistant could provide predictive notifications. For instance, if there is a delay in the user’s daily train connection, the system may proactively alert the user, suggesting that they leave home earlier or take an alternative route. The assistant could also be integrated into the user’s email and calendar. As soon as it detects an upcoming flight in the user’s email, it could proactively provide weather updates for the destination city or reminders to check in for a booked flight.
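A much-simplified sketch of this commuting example is given below: a recurring departure time is learned from past events, and a proactive alert is produced when a delay is reported. The data layout and the alerting rule are assumptions for illustration; real platforms would use richer context and statistical or neural models.

```python
# Simplified sketch of the prediction capability for the commuting example.
from collections import Counter
from datetime import datetime, timedelta

def usual_departure(history):
    """Return the most frequently observed departure time."""
    time_str, _ = Counter(e["departure"] for e in history).most_common(1)[0]
    return datetime.strptime(time_str, "%H:%M")

def proactive_alert(history, delay_minutes):
    departure = usual_departure(history)
    leave_by = departure - timedelta(minutes=delay_minutes)
    return (f"Your usual {departure:%H:%M} train is delayed by {delay_minutes} min; "
            f"consider leaving by {leave_by:%H:%M} or taking an alternative route.")

history = [{"departure": "08:00"}] * 20 + [{"departure": "08:15"}] * 2
print(proactive_alert(history, delay_minutes=10))
```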

Generativity

Generativity refers to the ability of assistant platforms to create contextually relevant and coherent responses or actions based on user input and predictions (Henfridsson & Bygstad, 2013). The capability to generate diverse outcomes is particularly important in the context of NLG, which is critical for enabling meaningful user interactions. The two primary types of assistant platforms feature distinct generative capabilities: Voice-oriented platforms have a limited capacity for generativity and rely heavily on a range of actors, network effects, and complementarities between modules. Text-oriented assistant platforms combine high generativity with limited recombinability, since they can only recombine text-oriented resources. Despite this limitation, the ecosystem of text-oriented platforms is rapidly evolving, with emerging network effects and supermodular (i.e., reinforcing) complementarities (Jacobides et al., 2018) between developers using the platform’s API and their users.

Generativity allows assistant platforms to create new knowledge or content. An example is OpenAI’s GPT-4, a language model that powers various assistant platforms such as ChatGPT or Microsoft’s Copilot. Given a prompt, ChatGPT can generate various outputs, ranging from writing an email or an essay to creating poetry, writing software code, or even generating a short story. Imagine users who are writing a science fiction novel and are stuck with the plot. They might ask an assistant platform powered by GPT-4 for help, providing some context and asking for suggestions on how the story could proceed. Using GPT-4, the assistant platform generates several plot ideas that users can consider. Note that the generated content is not pre-stored or pre-defined but is created on-the-fly based on the context provided by the user and the model’s training.
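As a hedged illustration of how such generativity can be invoked programmatically, the sketch below uses the OpenAI Python SDK (v1-style client). The model name, prompt, and helper function are placeholders rather than part of ChatGPT itself, and a configured API key is assumed.

```python
# Sketch of the generativity capability via a large language model API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_plot_ideas(context: str, n_ideas: int = 3) -> str:
    """Ask a GPT-4-class model for plot continuations based on user context."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; any capable chat model would do
        messages=[
            {"role": "system", "content": "You are a creative writing assistant."},
            {"role": "user", "content": f"My science-fiction plot so far: {context}\n"
                                        f"Suggest {n_ideas} ways the story could continue."},
        ],
    )
    return response.choices[0].message.content

print(suggest_plot_ideas("A linguist decodes signals from a dormant probe on Europa."))
```

The point of the sketch is that the output is generated on-the-fly from the provided context and the model’s training rather than retrieved from a predefined catalog, which is precisely what distinguishes generativity from the other three capabilities.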

Research agenda

Research on digital platforms has a long tradition, especially when referring to earlier research on electronic markets and networks (e.g., Clemons et al., 1993; Malone et al., 1987). Recently, several research agendas have been presented for digital platforms (e.g., de Reuver et al., 2018) and ecosystems (Heinz et al., 2022). These research agendas were developed based on broad literature reviews and partially also apply to assistant platforms. While considering these more general topics, the following research agenda is more focused on assistant platforms and first discusses directions along the four capabilities. In addition, the identification of affordances as well as the analysis of economic and social impacts are considered as themes for future research.

Capabilities

Each of the four capabilities elaborated above sheds light on several research questions related to improving existing methods, user experience, and the functionality of assistant platforms. In most existing assistant platforms, composition and configuration are still manual procedures, and several directions may be identified to create methodological support as well as to increase automation:

  • Module configuration mining: This concept extends beyond preconfigured scenes or home automation. It pertains to identifying and suggesting potential module configurations based on prior user actions. The automatic analysis of module configurations, representing a bottom-up approach, could lay the groundwork for a module and skill composition recommendation system (see the sketch after this list). The research question here is how to effectively mine module usage to generate valuable recommendations for adding, replacing, or removing modules and for identifying favorable configurations.

  • Lightweight module configuration: Keeping the threshold of configuration mechanisms as low as possible is an important factor for the usefulness of a platform. The research question here is how to leverage evolving no-code or low-code programming techniques, such as graphical design editors, to automate procedures and render module configurations more accessible to users with limited technological skills.

  • Design of complementary modules: Currently, module development is primarily based on templates and interactive forms for ad hoc module development. However, if modules are complementary, a method to guide the development of these complementary modules would be beneficial. The research question in this context is how to develop effective methods or guidelines to assist in creating complementary modules.

  • Typification of modules: The proposed definition of assistant platforms is suitable for comparing assistant platforms in terms of their orientation, such as innovation and transaction characteristics. Refining this typification to provide a more nuanced understanding of different assistant platforms could be another interesting area for further research.
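Relating to the module configuration mining direction above, a basic bottom-up approach could count which modules co-occur within user sessions and recommend frequent combinations. The log format and support threshold in the following sketch are assumptions, not an existing platform API.

```python
# Illustrative sketch of module configuration mining via co-occurrence counting.
from collections import Counter
from itertools import combinations

def mine_configurations(session_logs, min_support=3):
    """Return module pairs that co-occur in at least `min_support` sessions."""
    pair_counts = Counter()
    for session in session_logs:
        for pair in combinations(sorted(set(session)), 2):
            pair_counts[pair] += 1
    return [pair for pair, count in pair_counts.items() if count >= min_support]

logs = [
    ["weather", "calendar", "news"],
    ["weather", "calendar"],
    ["weather", "calendar", "lights"],
    ["lights", "music"],
]
print(mine_configurations(logs, min_support=3))
# -> [('calendar', 'weather')]
```

More elaborate variants could apply association rule or sequence mining to suggest entire routines rather than pairs, which is exactly the recommendation system envisaged in the research direction above.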

The second set of research questions pertains to the integration capability, which strongly determines the efficiency of extending the assistant platform’s ecosystem. Two areas of future research can be identified here:

  • Automated integration: The integration capability recognizes standardization and interoperability as key factors for assistant platforms. As mentioned above, templates are a widespread means of integration, with which device providers map their data and functional interfaces. Even more manual effort is needed for less standardized external resources. This leads to the research question of how methods may be designed for a (partially) automated integration of diverse resources on assistant platforms.

  • Model-driven mapping of modules: The creation of modules for device and service configurations currently requires manual coding, which is not scalable and lacks checks for completeness and consistency. Therefore, developing a method for the automated mapping of device and service models into module dialogs seems an important research question within the integration capability.

The declarative interface of assistant platforms has rendered the formulation of user intentions more flexible and gives rise to three research themes within the prediction capability:

  • Explainability: In the context of assistant platforms, explainability extends beyond the typical explainability associated with AI. While AI explainability focuses on understanding how an AI model attains a decision (Meske et al., 2022), the challenge on assistant platforms is more complex: AI assistants interact with humans in a more direct and conversational way than other types of AI applications. They often need to explain their actions or decisions in human-readable terms and within the flow of a conversation. Many AI assistants handle a variety of input types (voice, text, visual, etc.) and generate multimodal outputs. GAI technologies in particular are known for incorrect outputs (e.g., due to misinformation and hallucinations). Providing explanations that span these different modalities and secure the output quality could be a research question in this regard.

  • Domain knowledge: Understanding the application context of interactions is important for correct interpretations and valuable user support. Similar to human experts, who should be aware of their knowledge limits, assistant platforms should recognize the extent and limitations of their domain knowledge. This enables them to apply their knowledge judiciously, to adapt to different scenarios effectively, and to reduce hallucination effects. Therefore, future research should consider enhancing the self-awareness of AI assistant platforms to better understand and navigate their domain knowledge boundaries.

  • Translation of non-functional requests: Assistant platforms often encounter non-functional requests from users, like “What is a good investment option?”. These requests need to be translated into practical terms within the platform’s solution space, such as risk classes or volatility. The ability to effectively interpret and respond to such requests is a key aspect of assistant platforms and warrants further research.

The generativity capability is particularly relevant in the realm of NLG, which enables meaningful user interactions. While some progress has been made, potential avenues for further research and development are the following:

  • Enhanced generativity: Voice-oriented platforms offer seamless integration of resources, but their ability to generate diverse outcomes is limited. The research question here is how the generativity of voice-oriented platforms may be enhanced without compromising their usability and their ability for seamless integration. This requires a detailed exploration of how these platforms’ generativity can be boosted while maintaining their inherent ease of use and integration capabilities.

  • Generativity in text-oriented platforms: At present, text-oriented assistant platforms relying on GAI show high generativity but limited recombinability. This leads to the research question how the recombinability of text-oriented platforms may be improved to further enhance their generativity.

  • Tailoring generativity to user needs: The generative responses of an assistant platform should be contextually relevant and tailored to user needs. The associated research question calls for leveraging ML and AI to better understand user behavior and preferences and thus improve the tailoring of generative responses.

  • Generativity and user trust: As assistant platforms become more generative, they also feature increased levels of complexity, which may potentially affect user trust. The research question relates to how user trust can be maintained and enhanced as the generativity of assistant platforms increases.

  • Generativity and platform evolution: As platforms evolve, so do their generative capabilities. The corresponding research question focuses on how the evolution of generative capabilities in assistant platforms may be understood and improved in line with users’ changing needs and expectations.

Affordances

A promising approach to obtaining a deeper understanding of capabilities is affordance theory (Michell, 2013). The concept of affordances pertains to the actions a user can execute with a given technology, contingent on its capabilities and the user’s individual skills and objectives (Hartson, 2003). Affordances encapsulate more than just the technology’s inherent abilities and also consider what users perceive they can accomplish with the technology and how they utilize it in real-world scenarios. Thus, affordances emphasize the user’s perspective and the interaction between the user and the technology. Compared to capabilities, affordances may be conceived as more abstract means, while capabilities are more directly concerned with outcomes. As noted by Thapa and Zheng (2019, p. 54), “affordances theory is very useful in guiding the design of technologies, whereas the CA [capability approach] is often used as an evaluative framework.” Research on affordances for assistant platforms follows prior research, which revealed the top five affordances for voice assistants: controlling household devices (89%), playing music or radio (84%), initiating calls (77%), receiving traffic updates (47%), and conducting internet research (41%) (Bitkom, 2022). Other authors (e.g., Moussawi, 2018) have highlighted sensory affordances of personal intelligent agents, such as hands-free use and emotional engagement; cognitive affordances like personalization; functional affordances, including quick assistance; and physical affordances as key to user experience. Finally, affordances of chatbots have been shown to alleviate the chasm between social and traditional enterprise systems (Stoeckli et al., 2020). Similarly, the casual language in assistant platforms could bridge the gap between everyday speech and solution-oriented services, revolutionizing human-computer interactions. Although significant progress has been made in understanding and designing affordances in general, several research opportunities appear in the context of assistant platforms:

  • User perception affordances: Since the architecture of assistant platforms and their technological capabilities are advancing rapidly, understanding how users perceive these capabilities and their potential affordances is important. It leads to the question how architectural designs can bridge the gap between the technology’s capabilities and user perceptions of the assistant platform’s abilities.

  • Sensory affordances: The architecture of assistant platforms can significantly influence sensory affordances and user experience. The research question could center on the design and improvement of the architecture to enhance sensory affordances, thereby increasing user engagement and satisfaction.

  • Personalization affordances: Affordances may be personalized based on the architecture of assistant platforms, considering user preferences and usage patterns. How the architecture of assistant platforms might be tailored to personalize affordances and to improve user experiences effectively could prove a promising research topic here.

  • Physical affordances: Physical affordances, closely linked with the architecture of a system, provide indications on how the system might be used. In this regard, it would be interesting to investigate how the accessibility and usability of assistant platforms, particularly for users with physical disabilities, could be architecturally improved (e.g., with computer vision solutions like Microsoft's Seeing AI).

  • Architectural affordances: Multi-platforms shed light on a multitude of affordances, which emerge from conceiving multiple platforms as a new, loosely defined whole. Research could target how the interaction of different affordances could be architecturally understood and leveraged for the best user experience. In addition, the network effects among platforms could be scrutinized.

  • Declarative interfaces and affordances: Declarative interfaces, a key architectural feature of assistant platforms, allow users to express their needs freely. Here, research could target how these interfaces could be architecturally designed to provide the most valuable and usable affordances.

  • Multimodal interaction affordances: The architecture of modern assistant platforms often supports multimodal interactions, which enhances user experience. Thus, it would be interesting to learn how the architecture of assistant platforms could be enhanced to further support multimodal interactions.

  • Affordances of non-verbal cues and emotion recognition: Future developments in AI and NLP may enable assistant platforms to recognize non-verbal cues and emotions in user communication. How architectural designs could incorporate the recognition of non-verbal cues and emotions to facilitate more empathetic interactions points to a future research question on this topic.

  • Technological affordances: Finally, as augmented reality (AR) and virtual reality (VR) technologies continue to mature, their integration into the architecture of assistant platforms could create more interactive and engaging experiences. Further research seems necessary on how architectural designs can effectively integrate AR and VR technologies into assistant platforms to enhance user experience.

Economic and social impact

This third part of the research agenda examines the broader implications of assistant platforms on society and the economy. The research questions here revolve around the platforms’ role as gatekeepers, their impact on business models and market structures, the ethical considerations surrounding AI and data protection, as well as the need for interdisciplinary research to address these challenges.

  • Assistant platforms as gatekeepers: Assistant platforms can serve as gatekeepers, influencing value chains as outlined in the European Commission’s Digital Markets Act (Digital Markets Act, 2022; Clemons et al., 2022). They can favor specific products and services, as seen with Amazon’s promotion of Amazon Basics products. This leads to the question of how the gatekeeper role of assistant platforms impacts the dynamics and value creation within the ecosystem.

  • Economic value: Despite the impressive technological advances in the field of assistant platforms, their direct contribution to economic value has been limited. On the one hand, there is ample research on the positive impact of assistant platforms on productivity (Brynjolfsson et al., 2023). On the other hand, platform owners like Amazon have experienced losses as well as layoffs within their assistant division (see above), and assistant platforms have only rarely been used as sales channels (Anand, 2021). Therefore, the question remains how the significant investments in assistant platforms (e.g., large language models, AI algorithms, interfaces) could be recouped and monetized. For example, business models could be based on directly charging service providers and/or users as well as on indirect revenues through increased sales on the connected platforms.

  • Impact on market structures: The characteristics of assistant platforms can significantly affect market and industry structures. A possible research question in this regard is how assistant platforms reshape market structures, which new business models they facilitate, and which potential socio-economic implications they might have. This also includes the dark sides of platforms and ecosystems, such as market failures (Jacobides et al., 2024).

  • Trust and risks: The duality of platforms also becomes visible when large collections of data are present. With the collection of context-enriched data, assistant platforms offer personalized interaction but pose potential risks. This leads to the research question how assistant platforms ensure the ethical use of AI and how they protect user data effectively.

  • Interdisciplinary research: Considering the potential for assistant platforms to be used for illicit purposes, further research should consider areas beyond the information systems discipline, including law and sociology (Clemons et al., 2022). This relates to the research question how interdisciplinary research could address the diverse challenges posed by assistant platforms.

  • Specific perspectives: In addition to interdisciplinary research, specific perspectives like the service-dominant logic (Lusch & Nambisan, 2015) and ecosystem intelligence (Schmidt et al., 2022) could offer new lenses to analyze phenomena on assistant platforms. The research questions here include how these perspectives can guide the design and management of assistant platforms to improve value creation for all actors in the assistant platform ecosystem.

Conclusion

This Fundamental has recognized assistant platforms as a new platform phenomenon, which features a unique amalgamation of innovation, transaction, digital and AI platform characteristics. The complexity is encapsulated in the terms “multi-actor” and “multi-platform,” signifying their structure as a collection of multiple actors and interconnected platforms that work together to deliver a seamless and potentially valuable user experience. Due to their composite nature, these platforms are highly versatile and adaptable, capable of integrating a diverse array of services, thereby transforming the way users and developers engage with digital resources.

One of the most distinguishing features of these platforms is their declarative interface, which allows users to express their needs and intentions freely and naturally, in contrast to more traditional interfaces that require explicit commands or selections. GAI technologies in particular have strongly influenced this shift towards richer and more complex interactions. This has not only enhanced the accessibility and user-friendliness of assistant platforms but has also fueled their adoption and usage. By effectively bridging the gap between users’ everyday language and the solution-oriented world of service providers, the declarative interface has fundamentally altered the nature of human-computer interaction. While this is also a feature of assistant systems in general, the declarative interface’s combination with the multi-platform architecture constitutes the concept of assistant platforms. Understanding the dynamics within ecosystems and the interactions between different actors is key to managing activities in assistant platforms and to leveraging their capabilities for all stakeholders involved (and not only the platform owners). The recent GAI developments have not only strengthened the input side of the declarative interfaces (e.g., by enabling file uploads) but have also created substantially enriched outputs in the form of entire texts. The future trajectory of assistant platforms is anticipated to witness a convergence of text-oriented and voice-oriented development paths, further enhancing their functionality and user experience. In view of the many open questions related to the business model of assistant platforms, it will be interesting to see how these may be monetized in the future.

Given the profound economic and societal implications of assistant platforms, there is a strong need for research to scrutinize these impacts and to contain potential risks at an early stage. This could occur via alternative theoretical perspectives, which are interdisciplinary in nature (e.g., computer science, economics, sociology, law, psychology) and yield insights that may be helpful for platform participants, owners, as well as regulators. As advancements in platform and AI technologies continue to unfold, the role of assistant platforms in supporting organizations and individuals is expected to grow, underscoring their transformative potential for the future. In light of these developments, assistant platforms enrich the field of digital platforms, and the research agenda formulated in this Fundamental sheds light on a variety of directions that could be valuable in guiding future investigations in this rapidly evolving field.