Review — Open access

Physics language and language use in physics—What do we know and how AI might enhance language-related research and instruction

Published 29 January 2024 © 2024 The Author(s). Published on behalf of the European Physical Society by IOP Publishing Ltd
Citation: Peter Wulff 2024 Eur. J. Phys. 45 023001. DOI: 10.1088/1361-6404/ad0f9c


Abstract

Language is an important resource for physicists and learners of physics to construe physical phenomena and processes, and to communicate ideas. Moreover, any physics-related instructional setting is inherently language-bound, and physics literacy is fundamentally related to comprehending and producing both physics-specific and general language. Consequently, characterizing physics language and understanding language use in physics are important goals for research on physics learning and instructional design. Qualitative physics education research offers a variety of insights into the characteristics of language and language use in physics, such as the differences between everyday language and scientific language, or the metaphors used to convey concepts. However, qualitative language analysis fails to capture distributional (i.e. quantitative) aspects of language use and is resource-intensive to apply in practice. Integrating quantitative and qualitative language analysis in physics education research might be enhanced by recently advanced artificial intelligence-based technologies such as large language models, as these models were found to be capable of systematically processing and analysing language data. Large language models offer new potentials for some language-related tasks in physics education research and instruction, yet they are constrained in various ways. In this scoping review, we seek to demonstrate the multifaceted nature of language and language use in physics and to answer the question of what potentials and limitations artificial intelligence-based methods such as large language models have for physics education research and instruction on language and language use.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Motivation

Natural language is an important representational means in scientific and educational contexts to communicate ideas and construct meaning [1–4]. Language is intricately linked to observing nature, quantifying observations, and synthesizing results into theories [5]. Hence, comprehending and producing language has been linked to scientific literacy in a fundamental sense [6–9]. The command of physics language then becomes a resource for reasoning, conceptual understanding, and learning [10, 11], as spoken and written language are modes within disciplinary discourses that represent the ways of knowing in a discipline [12]. The tangled history of physics and its levels of abstraction contribute to the fact that the language used in physics is highly specialized relative to everyday language; hence, the process of becoming a proficient user of scientific and physics language is complex [13–16].

Qualitative research on language and language use in physics has illuminated specific features of physics language and patterns of language use in formal and informal physics learning environments [3, 17]. Language-related artifacts such as classroom discourse and linguistic corpora can provide evidence on complex, language-bound processes. For example, analysis of physics language and language use in physics contexts reveals much about concepts, conceptual structure, and ways of knowing in physics [2, 18], and can provide evidence for cognitive and motivational processes in physics learning environments [13, 19]. However, qualitative, content-analytical research on language and language use is limited in its capability to identify relationships in large language corpora, which are becoming increasingly available with advances in digital technologies (sensors, data storage, ...) [20, 21]. Moreover, products of this research such as coding manuals are difficult to apply at scale to improve instructional design and learning processes.

Computational techniques and methods in the field of artificial intelligence (AI) offer new potentials both to analyze complex language data and to facilitate implementation in practice to promote adaptive, individualized learning [22]. In particular, natural language processing (NLP), alongside algorithmic modelling such as machine learning (ML) [23–26], has in recent years provided promising means to advance the systematic analysis of physics language and language use in physics [27]. The last few years have seen an unprecedented rise of so-called large language models (LLMs) that combine NLP and ML [26], also in physics education research (PER) [28–33]. Among others, LLMs were found to successfully solve open-ended and closed-form physics problems [33, 34]. At the same time, LLMs are riddled with challenges such as bias, false knowledge claims, and non-transparent decision-making [35–37]. While ML, NLP, and LLMs might offer new possibilities for language-related research and instruction in PER, risks and limitations have to be critically discussed. Utilizing ML, NLP, and LLMs in PER (and physics more generally) is only in its infancy, and it is important to reflect upon potentials and challenges in reference to what is already known from research on language in physics and learning in physics more generally.

In PER, a rich body of qualitative research on the specific hurdles in physics language and language use is already established. We can expect that AI methods will expand the opportunities for language analysis and, accordingly, language-sensitive instructional design. It is yet unclear in what ways established knowledge on physics language (use) and the novel potentials of ML, NLP, and LLMs are compatible with each other, and how results from one thread can inform the other. This scoping review will therefore outline established knowledge from qualitative research on language in physics and findings from applying ML, NLP, and LLMs in PER, and derive perspectives on how research on language (use) and its implementation might be enhanced in the future. The article is organized as follows:

  • (i)  
    We first set the grounds by establishing an understanding of what language can be taken to be, and review important factors influencing language use in physics. The purpose of this section is to emphasize the complexity, dynamics, and contextual nature of language and language use.
  • (ii)  
    We then review studies in PER that examined the characteristics of physics language and language use in physics. This section highlights that physics education researchers have developed a refined understanding of the intricacies of physics language and language use in physics; however, research and instruction would benefit from systematic, scalable, and principled approaches to language analysis in order to further texture our understanding.
  • (iii)  
    In the following section, we evaluate in what ways NLP, ML, and LLMs have been applied in PER in relation to language and how they facilitate language-sensitive instructional design in physics. We show that many physics education researchers are engaged in AI-based analyses, and that these methods oftentimes rely on the established body of knowledge from qualitative research on language.
  • (iv)  
    Finally, we outline future paths and offer concluding remarks for language analysis and language-sensitive instructional design in PER.

2. Language and language use

The social semiotic framework [38], and research on multimodality [4], recognize that meaning making is a function of semiotic resources (modes of representing knowledge) such as gesture, gaze, and language in social contexts. This has been a productive lens in PER to investigate physics-related learning processes that identify, among others, the importance of embodiment and gesture [39–41]. Spoken and written language have also been recognized as an important semiotic resource to convey knowledge and construct meaning [42]. We begin by elaborating on key affordances that language provides as a semiotic resource: communication (section 2.1) and meaning construction (section 2.2).

2.1. Communication

Language is considered an evolutionary adaptation, carrying out important functions like communicating 'who did what to whom' [43, p. 81]. Language is recognized as an outstanding feature of the evolution of humans [44]. Accordingly, it has been characterized as a 'main evolutionary contribution of humans' [45, p. 611] and 'a means by which humans collectively establish and maintain coherence with one another and the grander world' [46, p. 32]. Humans developed languages to transmit 'unlimited [..] information among individuals' [45, p. 611], and to represent and make sense of their experiences, coordinate action, and think together [3, 47–49]. Language developed (and probably can only develop) in social interaction [50, 51].

Communicative purposes require that languages develop specific means to convey information. To transmit information on high-dimensional, real-world phenomena efficiently and effectively, languages are patterned and exhibit universal laws that characterize language use. Communication in language is a process where mutual understanding is rewarded, and communication has to be optimized with regard to time, information content, and other commodities [52]. The principle of least effort in communication states that 'once a word has been used, it takes less effort to use it again for similar meanings than to come up with a different word. On the other hand, people want language to be unambiguous, which they can accomplish by using different words for similar but nonidentical meanings' [53, p. 271]. Utterances such as words or phrases should 'maximize the amount of information while minimizing the cost of sending that information' [53, p. 271]. The minimization of communicative effort and this simultaneous optimization give rise to regularities in languages [54–59]. Similarly, rules and universal features of languages can be determined [45]. Any known language can be described by a few simple super-rules, such as subject-verb-object ordering for English (and many other languages) [43, 51]. With such super-rules, a child can generate novel sentences according to environmental affordances, and thus communicate with the world.
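One well-known regularity of this kind is Zipf's rank-frequency law, which predicts that a word's frequency is roughly inversely proportional to its frequency rank. The following minimal Python sketch (the toy sentence is our own illustration, not data from the reviewed studies) shows how such a rank-frequency table can be computed:

```python
from collections import Counter

# Toy illustration (assumption: invented sentence, not corpus data from the
# article). Zipf's law predicts frequency f(r) ~ C / r for frequency rank r,
# so f * r should be roughly constant in large corpora.
text = (
    "the force on the particle equals the mass of the particle "
    "times the acceleration of the particle"
)
counts = Counter(text.lower().split())
ranked = counts.most_common()  # [(word, frequency), ...] by descending frequency

for rank, (word, freq) in enumerate(ranked, start=1):
    print(f"{rank:>2}  {word:<13} f={freq}  f*r={freq * rank}")
```

A single sentence only illustrates the computation; the Zipfian pattern itself emerges clearly only in corpora of many thousands of tokens.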

2.2. Meaning construction

Communication through language requires both sender and receiver to (re-)construct the meaning of what was communicated, whether in a situation where spoken language is utilized or in some form of written communication [40]. Meaning construction is inherently constrained by the affordances of both spoken and written language, and situated in a specific context defined by communicative norms [60] and rhetorical goals [47]. In a more collective dimension (subsection 2.2.1), language is linked to disciplines and ways of knowing which define respective communities. As such, a command of the respective languages as a mode of representation of ways of knowing [12] is required to make sense of the discipline, and a shared experiential basis facilitates meaning construction. In a more individual dimension (subsection 2.2.2), language, constrained by specific human affordances, enables humans to make sense of the disciplinary ways of knowing and functions as a resource for understanding and reasoning.

2.2.1. Collective dimension

Language has been recognized as a disciplinary mode of representation that enables acquisition of the ways of knowing in a discipline, i.e. 'the coherent system of concepts, ideas, theories, etc. that have been created to account for observed and theoretical phenomena' [12, p. 27]. Postman and Weingartner [61, p. 103] pointed out that '[a] discipline is a way of knowing, and whatever is known is inseparable from its symbols (mostly words) in which the knowing is codified'. Any discipline develops a system of semiotic resources consisting of (arbitrary) words, images, symbols, and actions [1]. Social semiotics as the 'study of the development and reproduction of specialized systems of meaning making in particular sections of society' [12, p. 95] engages in the question of how the specialized ways of knowing in disciplines can be accessed. Spoken and written language are major representational modes of disciplines that enable access to the respective ways of knowing. Languages evolved specific means to enable meaning making, supported by the situation (pragmatics) and the structure (syntax). Universal features of how meaning is created syntactically (in a natural language) are composition, hierarchy, and recursion [62].

Language functions as an interface between collective and individual knowledge, and enables knowledge construction more generally. It serves as an externalized semiotic resource for distributed cognitive processes, where 'external semiotic systems such as language [...] interact with internal cognitive resources to support understanding and reasoning' [11, p. 72]. For example, it was posited that complex scientific models cannot entirely be represented internally as mental models, but rather need some externalized representations such as language [63]. Language functions as a resource for enabling reasoning and even 'designing [new] representations' [64].

However, meaning construction in communication situations is also tentative and dynamic: 'the meaning of a sign is not fixed, but rather can be thought of as a flexible resource for meaning making' [40, p. 18]. The philosopher Wittgenstein coined the concept of 'language-games' [65] 1 for communication situations. The concept of language-games highlights, among others, that (i) words have no static meaning, but a context-dependent meaning; (ii) there are countless different language-games with their own rules and purposes (e.g. putting questions as assertions); (iii) there is not one truth or meaning; and (iv) language-games can change over time, e.g. vanish or evolve. Language-games underscore the inherent indeterminacy within communication situations and the specific contextual conditions that determine the choice of games played. Getting to understand and engage in different language-games is an important aspect of learning.

2.2.2. Individual dimension

In communication situations, the individual (the interpreter) needs to make sense (construct meaning) of the linguistic utterances in order to be able to respond to them in conventionalized ways. It is important to remember that language and linguistic utterances are only indirect approximations of cognitive processes: language and cognitive processes are assumed to have a bi-directional influence on each other [13, 43, 66, 67]. The study of cognitive semantics then posits that meaning construction for the individual is dependent on embodied cognition, encyclopedic knowledge, and contextualization:

Embodied cognition recognizes that meaning is grounded in physical experience. Language and language use are shaped by the perceptual impressions and physical experiences that humans make in their environments, which determine how things are referred to and what metaphors become established in a language [68]. Psycholinguists have discovered that perceptual correlates in a child's early cognitive development are reflected in the semantics of their individual language and of language in general [68]. For example, the particulars of the human body afford three dimensions of symmetry. Geometrical constructions (objects are always in reference to other objects) arise with respect to reference points, lines, and planes. English spatial adjectives require the notions of 'point of reference' and direction. The biological endowment also causes humans to describe time with respect to one-dimensional spatial terms, given that time is modelled as a one-dimensional continuous variable. Time is also asymmetrical and directed and is therefore described with spatially-derived words. Everyday language (and language in general) is thus rich in metaphors relating a concrete (source) domain to an abstract (target) domain [66, 69–71].

Encyclopedic knowledge is the idea that ancillary knowledge creates meaning [72]. Individuals access the collective ways of knowing in a discipline and meaning construction is dynamically dependent on the knowledge of learners, rather than mere encyclopedic definitions [72]. It is important to note that meaning often 'emerges more from what is absent, tacit, literalized, and forgotten than from what is present, explicit, figurative, and conscious' [46, p. 38]. Individuals embrace common sense, world knowledge, and disciplinary ways of knowing that are seldom explicitly laid out in any utterance that a person makes, but rather have to be inferred from the communicative context.

Contextualization refers to the fact that meaning is constructed dynamically in reference to other linguistic entities and in social, communicative situations [73]. Denotational semantics posits that meaning, as a theory of reference, arises from recognizing the set of objects and relations in the real world that a linguistic utterance refers to. Distributional semantics, on the other hand, suggests that meaning arises from the linguistic context in which a word or sentence is uttered [26], as the meaning of words and sentences depends on the context they appear in [74, 75]. Moreover, language affords meaning construction through situatedness (pragmatics), syntax (grammar), and semantics [76]. Reconstructing the meaning of an utterance (language comprehension) requires the reader, among other things, to parse the syntax correctly and to disambiguate word senses in context [77]. Meaning construction through language is thus an inherently statistical process (among competing interpretations, the most likely one is chosen), and empirical evidence supports the dependence of understanding on the distributional properties of the words that are used and the particular sentence construction [77].
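The distributional idea can be made concrete with a minimal sketch (the three-sentence corpus is our own invention, not data from the cited studies): each word is represented by its co-occurrence counts, and words that occur in similar contexts receive similar vectors.

```python
import math
from collections import defaultdict

# Toy illustration (assumption: invented corpus). Distributional semantics
# represents a word by the contexts it occurs in; here we count co-occurrences
# within a one-word window on either side.
corpus = [
    "the electron carries negative charge",
    "the proton carries positive charge",
    "students read the physics textbook",
]

vectors = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in (i - 1, i + 1):  # immediate left and right neighbours
            if 0 <= j < len(words):
                vectors[w][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if dot else 0.0

# "electron" and "proton" share contexts ("the ... carries") and so come out
# far more similar than "electron" and "textbook", which share none.
print(cosine(vectors["electron"], vectors["proton"]))    # close to 1
print(cosine(vectors["electron"], vectors["textbook"]))  # 0.0
```

Modern LLMs generalize this very idea, learning dense contextual vectors from vast corpora instead of raw co-occurrence counts.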

Besides these important determinants for individual meaning construction, it is important to acknowledge inherent constraints for natural languages to represent information and knowledge. While natural language is acknowledged as a high-bandwidth medium for information exchange within the communication spectrum (particularly when compared to low-bandwidth, implicit forms of communication like pheromone trails [78]), natural language lacks the capability to represent functional relationships, spatial constellations, or mathematical models efficiently, which limits the usability of language for physics in important ways. The importance of formal languages 2 such as mathematics or programming languages as a disciplinary mode of representing and sharing information for physics has been widely recognized [80–82]. Natural languages are more powerful in terms of expressibility than formal languages such as programming languages [45]; however, they are also more ambiguous.

3. Physics language and language use in physics

Command of natural language is considered essential for literacy in physics, and constitutive of scientific practices and knowledge [83]. Moreover, it is constitutive of physics as a discipline in general [84]: 'Nothing resembling what we know as western science would be possible without text' [8, p. 224]. 'Text' is generally taken to be any instance of language in any medium that makes sense to someone who knows the language [47]. Any merely functional role for language in scientific literacy falls short of the more fundamental role that language plays in constituting scientific literacy [8]. Among others, natural language in physics serves the purposes of receiving and sharing information, and of generating and validating knowledge [5, 83]. Physics has developed over time a highly specialized (rigorous, coherent, and dynamical [15, 85]) language that fulfills theoretical and practical purposes [15].

Learners in a domain such as physics have to become able to comprehend and even produce this specific language [83]. In fact, a large part of the perceived difficulty and resulting alienation in early physics education results from language-related issues [3]. The challenges associated with the language of physics pertain to the particular syntax (grammar) used in presenting physical concepts and to the semantics, which is intricately connected to the structure of physics knowledge and the types of reasoning processes involved (e.g. model-based reasoning [86]). Physics has been described as a well-structured and semantically rich domain, referring to the amount of knowledge that is required for problem solving in physics [87]. Physics language and language use in physics are also noticeably different from everyday language use. The differences primarily pertain to the specific writing style employed by physicists (and scientists in general) when addressing natural phenomena, a style that typically avoids personal involvement [1]. Language and language use vary depending on the communication situation. Different communication situations have been studied, which we review in subsection 3.1.

3.1. Studies on physics language use in different communication situations

3.1.1. Studying physicists' writing

The development of scientific English has been attributed to Newton's treatise Opticks, and from then onwards science and physics developed an increasingly advanced system of communication through text [88]. Contemporary physics texts (such as journal articles or textbooks) are characterized by their use of passive constructions, an impersonal writing style, subclause constructions, and, more generally, the incorporation of nominalized discourse, all of which serve specific rhetorical purposes such as foregrounding and thematic focus [17]. For example, using nouns for processes (nominalization) serves to package complex phenomena into a single semiotic entity, with the rhetorical function of backgrounding known information and foregrounding new information [3, 17]. Verbs then oftentimes express relationships between nominalized processes [17]. This results in a dense language that knowledgeable readers will parse and produce more easily than novices.
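As a crude illustration of how such stylistic features could be quantified (a heuristic of our own, not a validated instrument from the cited studies), one can approximate nominalization density by counting words with common derivational suffixes:

```python
import re

# Heuristic sketch (assumption): words ending in common derivational suffixes
# are counted as likely nominalizations. Real analyses would use part-of-speech
# tagging and morphological analysis instead of this crude suffix match.
SUFFIXES = ("tion", "sion", "ment", "ance", "ence", "ity")

def nominalization_ratio(text: str) -> float:
    """Fraction of words that look like derived nominalizations."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [w for w in words if w.endswith(SUFFIXES) and len(w) > 6]
    return len(hits) / len(words) if words else 0.0

dense = "The acceleration of the object results from the application of a net force."
plain = "The ball sped up because someone pushed it hard."

print(nominalization_ratio(dense))  # higher: two nominalizations in 13 words
print(nominalization_ratio(plain))  # 0.0: no suffix matches
```

Even this rough proxy separates a nominalized physics-style sentence from an everyday paraphrase, hinting at how scalable, quantitative measures could complement the qualitative findings reviewed here.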

3.1.2. Studying students' writing in physics (and science) classrooms

Analysis of students' writing in physics classrooms supports these findings. A study revealed that experienced writers, in contrast to novices, achieve higher levels of abstraction through the use of complex nominal groups, subordinate/nested clause structures, and grammatical metaphor [3]. Linguistically, learners in scientific domains transition from a knowledge-telling writing style characterized by a simple lexicon and active voice, to more complex clause structures and relationships, increased use of modals, and passive voice (a knowledge-transformation style) [3, 89–91]. On the level of genre and writing style, science writing is different because feelings are hardly ever expressed and modal verbs (to indicate how probable an event is) are used often [3]. Physicists use much technical language involving grammatical metaphor, which only evolves late in human cognitive development [3]. These studies support the claim that students have to learn the relevant language-games in physics communication situations, and that learning to write, and writing to learn, in physics necessitates guidance for navigating these intricate discourses.

3.1.3. Listening to physicists in the lab

Besides formal, written physics language, sociologists have emphasized that spoken discourse in physics labs is quite different [92]. Physicists' use of language depends on the social context and communication situation: While the construal of personal involvement in the production of scientific knowledge was found to be minimal in today's public and formal discourse 3 , in everyday informal or lab interactions scientists more often refer to themselves as agents when speaking about scientific discovery [94]. There is a divide between a more objective stance and one that is more psychologically involved and narrative. The interviewed physicists also used a hybrid approach, which included the use of personal pronouns and predicates of motion or change, such as 'When you go up in phase space.' These discourses were termed 'physicist-centered' (physicist as thematic agent) versus 'physics-centered' accounts [94]. However, physics entities (e.g. the system under study) were also construed as experiencing agents through predicates of sentience and understanding [94], a strategy by which physicists reposition themselves in an imagined realm of physical events [94]. Condensed matter physicists, explaining a phase transition diagram, said: 'When I come down I am in the domain state,' thereby fostering empathy with the abstract entity [94]. Personal, agentic involvement was then found to be expressed as an amalgam, where the subject is identified with the personalized system under study. The thematic agent can be either the physicists or the physical entities or processes [11].

Another phenomenon has been termed the 'vernacular of the laboratory' [95], referring to the fact that scientists with a shared knowledge base will use increasingly abbreviated and, eventually, (to novices) incomprehensible language. This resonates with observations that much (assumptions, knowledge) remains unstated in linguistic communication [96]. For example, when posing physics problems in mechanics, we often assume that we are on Earth (without explicitly stating it), given that our students supposedly share this background knowledge and make the same assumption. Moreover, this finding also emphasizes the importance of prior knowledge for communication situations. If an interlocutor believes that the other party shares her or his knowledge, this will likely result in a more abbreviated conversation where assumptions and common knowledge remain unstated.

3.1.4. Studying verbal interactions in physics classes

Physics (and science) classrooms are also characterized by a specific style of language use. Researchers found science teachers to be in a difficult position with respect to language use, always negotiating autonomy in students' exploration of scientific facts against convergence on agreed-upon laws and principles [97]. This has been referred to as a controlling discourse, marked by the Stimulus-Response-Evaluation/Initiation-Response-Evaluation discourse pattern [97, 98]. Moreover, physics language in classrooms is full of difficult, rarely used terms with very specific and context-dependent meanings [99, 100], eventually approximating the discourse in physics. For example, terms such as 'adiabatic' or 'isochoric' are not used by most students for the better part of their lives except in physics (and chemistry) classrooms. The meaning of these terms cannot simply be derived from similar words or from context. Rather, a specific set of assumptions, constraints for applicability, and referential physical systems go along with these terms.

Students also learn to use terms and phrases in language without deep understanding. Physics education researchers noted that learning environments in physics such as physics classes provide plenty of opportunity to pick up discourse fragments, such as 'curl of the electric field' or 'tensors' [12, 101]. Students are often able to describe the formula without a deep understanding. Deep understanding requires a 'constellation of modes' [12, p. 33] extending beyond spoken and written language due to their inherent limitations in conveying information. This phenomenon has been referred to as 'learning slogans' [101] or 'discourse imitation' [12], which captures the important observation that learning and deep understanding of disciplinary knowledge and ideas require a continuous orchestration of adequate disciplinary modes (graphs, language, diagrams, formulas, ...).

3.1.5. Understanding historical roots of terms in physics

Given its fairly long history and the need to abstract from concrete phenomena, physics language inherits plenty of terms whose meanings have even changed throughout time. For example, physicists use the concept 'phase space' of a physical system to refer to the possible configurations this system can take with respect to generalized variables [102]. 'Phase space' does not represent space as most learners will assume: it is a very abstract concept of space, bearing no resemblance to the familiar three-dimensional Euclidean space. Furthermore, 'phase space' does not depict the 'phase' of the physical system. Likewise, Newton envisioned the centripetal force as a central force (with a physical center of force). Today, however, in inertial systems, centripetal force refers to the force that points toward the center of the (local) curvature. Oftentimes, terms in physics language reflect the historical roots and processes of the concepts [14]. At face value, this (historical) language is utterly difficult and unfamiliar to students. In fact, many terms in physics have a long history of change in meaning and usage (for the term 'force': [103]). The science fiction writer Poul Anderson sought to write a description of the atom without Latin- or Greek-derived terms (atom, particle, nucleus, charge), which is hardly understandable: 'At first it was thought that the uncleft was a hard thing that could be split no further hence the name. Now we know it is made up of lesser motes. There is a heavy kernel with a forward bernstonish lading, and around it one or more light motes with backward ladings' [5, p. 673]. Only much prior knowledge and experience will enable learners to attach meaningful associations to these terms.

Over time, tentative interpretations of metaphors for physics terms evolve into conventionalized and ultimately unquestioned meanings [14]. It is particularly difficult for students to appreciate the specific meaning of these terms, because they are mostly unaware of the history of the terms. Depending on the overarching world-view and theoretical foundations, these terms may allude to distinct underlying concepts. Nevertheless, within instructional settings, only a particular, simplified interpretation is employed, which aligns with the meaning eventually established by the scientific community (conventionalized ways-of-knowing). Discussing the variegated history of physics-specific terms in class would be a valuable strategy to raise students' awareness of the somewhat arbitrary nature of semiotic signs such as words. However, this is seldom explicitly discussed in physics classes.

3.2. Ontology in physics language

Language is a resource for making sense of phenomena and processes, and for reasoning in physics. As such, an important function of language in physics and beyond is to enable the description and modelling of physical experiences, phenomena, systems, and processes in nature [13, 104]. Languages encode a specific ontology that influences how people think about physical entities, systems, and processes [105]. This is also called implicit or intuitive ontology. Intuitive ontology refers to one's conception of the basic categories of existence [106]. Ontology in physics comprises matter, processes, and physical states [13]. For example, physicists might state that 'alpha particles (matter) leak through (process) a potential barrier (matter)' [13]. The physical state is denoted by the fact that the alpha particles are contained in the nucleus [13]. Physical concepts are considered to be attributed by learners to one ontological category, and may change ontological category in phases of conceptual change [107].

Confusion or misattribution of ontological categories was argued to constitute a major hurdle in developing an appropriate understanding of physical concepts [108], such as when 'abstract physics concepts [..] tend to be attributed with properties or behaviors of material substances' [109, p. 1]. While substance-based reasoning is certainly productive for the energy concept, for other concepts such as force, light, heat, or electricity (e.g. electrical current) it can create conceptual confusion [109]. '[M]any scientific concepts are 'constraint-based interactions' [..], which is a subcategory of processes in which a defined system behaves according to the principled interaction of two or more constraints' [107, p. 9]. Electrical current, in this conceptualization, is a constraint-based interaction. Contrary to events, it does not involve a specific beginning and ending. Moreover, even though it involves charged particles as matter, it does not fall into the ontological category matter [108]. Constraint-based interaction is a subcategory of process, and it is determined by a known set of constraints [108]. Physics concepts that experts classify as constraint-based interactions are then often incorrectly classified as material substances by novices [110]. However, it has also been noted that ontological attribution should not be thought of as static, but rather as complex and dynamical [107], and that experts' ontologies exhibit more local coherence.

Even false ontological commitments can lead to productive reasoning. The French physicist Sadi Carnot, in his derivation of the ideal heat engine, utilized the caloric metaphor (heat as an elastic fluid) to determine the maximum amount of motive force [111]. Whereas some experts characterize heat as disordered/random energy, others refer to it as energy in transit; it cannot be energy per se, however, because it is not a state function [13]. It was then argued that the caloric metaphor is outdated and should no longer be used in teaching physics and in physics language. Some suggested entirely abandoning heat as a noun in the physics vocabulary to stress that it is a process function [112]. Similarly, 'heat transfer' and 'heat capacity' invoke a substance ontology, and should be used cautiously. Others have even advocated for the abandonment of the verb 'to heat' [113].

Of great relevance in physical systems are so-called relational processes, expressed preferentially through the verb 'to be.' Identifying and attributional relational processes are then distinguished [105]. Physical states are oftentimes represented through identifying relational processes where a grammatical circumstance of location replaces the second identifier, such as 'the electron is in the ground state.' In fact, these grammatical structures closely resemble language on mental states ('I am in love', [105]). Ontologically, however, the verb 'to be' is also problematic, because it identifies individuals with abstractions [43]. This is pervasive in physics language. The electron is a fermion/wave/particle/... [114]. All sorts of ontological confusions may arise. After all, waves are mathematical abstractions that exhibit certain properties (e.g. the spreading of excitations of physical variables across time and space), and are better thought of as events rather than objects [10]. However, some students readily imagine there 'being' waxing and waning levels of the physical quantity quite literally in space and time, or expect sound waves to propel dust particles around the room [10].

It is also important to recognize the constraints that language puts on descriptions of physical systems and processes. It is recognized that '[n]atural language is very limited in its ability to describe continuous variation, shape, and movement in space' [1]. Language can only qualitatively describe relationships among variables; for more complex functional relationships, researchers resort to mathematical equations, pictures, graphs, tables, etc. Arguably, much of scientific reasoning happens through mental modelling and carrying out thought experiments on internal models [115]. Language, then, works in conjunction with the other modalities to make sense of models. However, internal models are not represented as natural language [43].

In sum, ontologies in language provide the necessary resources to represent physical processes and phenomena through language. Given the everyday use of terms ('energy consumption'), ontological confusion hinders students in learning the physical meaning of concepts; however, utilizing different ontologies for concepts can also function as a productive means for reasoning processes. Being aware of ontologies and ontological confusion enables researchers and students alike to gain a deep understanding of physics concepts as represented in language.

3.3. Conceptual metaphors in physics language

Metaphors and analogies in physics function as productive modes of reasoning about physical systems and are closely linked to ontologies [100]. One of the major productive resources in physics language are conceptual metaphors, which are oftentimes rooted in analogies. In fact, metaphors enable us to communicate about abstract physical entities, systems, and processes. Conceptual metaphors involve mappings from concrete knowledge structures 'incorporating notions of object, space, movement, and force' (concrete, conceptual source domain) to 'abstract concepts such as time, cause, depression, and love for which no (or little) experience-based schematic structure can be described' (abstract, conceptual target domains) [11, p. 79]. Physical systems in mechanics might often resemble concrete sensory experiences (such as a falling apple that can be directly observed). However, to represent systems in electricity, magnetism, optics, acoustics, radioactivity, quantum physics, atomic physics, particle physics, etc, system descriptions are largely metaphorical and inaccessible to direct sensory experiences. 4 'Concrete domains are those conceptual schemata that derive directly from the sensorimotor experience of interacting with the physical and social worlds, whereas abstract domains are those schemata for which no such experiences can be identified' [11, p. 79]. Conceptual metaphors are a productive means to make abstract domains understandable through language. The sentence 'a net force causes an object to accelerate' [100] uses the conceptual metaphor force is an agent; however, it gives no account of the idea (and even hinders understanding) of forces being interactional. Conceptual metaphors are widely employed in physics and the natural sciences; they oftentimes reflect 'embodied experience' and are used to draw inferences about abstract domains [100].

In thermodynamics, researchers identified the Location Event Conceptual Metaphor. Within this framework, they differentiate concepts such as (a) 'Thermodynamic States Are Locations,' (b) 'Thermodynamic States Are Movements,' and (c) 'Spontaneous Change is Directed Movement.' In (a), the static location of states in phase space can be referred to. In (b), the changes in a state, going from point A to point B in phase space, are referred to, and in (c) the natural directionality of thermodynamic processes is encapsulated [11]. Conceptual metaphors in quantum mechanics relate, among others, to the potential well. Talking about the potential 'well' evokes associations and assumptions that make reasoning easier [100]. However, this might also introduce oversimplification and confusion. 'Swackhamer [..], building on the work of Lakoff and Johnson on conceptual metaphor [..], analyzed the energy as "stuff" conceptual metaphor and identified three statements derived from the metaphor which guide the development of energy concepts and an understanding of energy conservation with students: (1) As an attribute, energy is viewed as a possession that can be "stored" or "contained" in a "container," namely, a physical system. (2) Energy can "flow" or be "transferred" from one container to another and so can cause changes. (3) Energy maintains its identity after being transferred.' [104, p. 2] In fact, the 'stuff' conceptual metaphor for energy is productive in nature [104]. Energy is a concept that refers to something quantitative (coming in quantities) that can be balanced. This is beneficial, because one can use metaphors to indicate that an object has a certain amount of energy (as opposed to force), which is easier to understand [89].

Conceptual metaphors are closely linked to grammatical metaphors and ontologies in systemic functional grammar. Science and physics education researchers have utilized systemic functional grammar to analyze language-based meaning construction in physics learning [13]. Systemic functional grammar recognizes the specific functions of language such as interpreting experiences, communicating ideas, and meaning making. As such, language is recognized to always have specific rhetorical functions that are important to consider in order to construct meaning from utterances. Grammatical metaphors seek to make abstract concepts understandable. In systemic functional grammar, grammatical participants (nouns, noun groups), grammatical processes (verbs), and circumstances (adverbial or prepositional phrases) are differentiated. Grammatical participants are then mapped to ontological categories [105]. Moreover, physical models consist of (1) models of objects, (2) models of interactions between objects, (3) models of systems of objects, and (4) models of processes [105]. Then, (5) physical properties and (6) state functions are physical variables used to describe the system or the objects in it. These physical concepts are then mapped to an ontology tree, which is called a lexical ontology; every physical model described in language has an ontology, and this ontology is encoded in the grammar of the sentence [105, p. 5]. If the lexical ontology does not match the grammatical ontology, a grammatical metaphor is present. Consider the sentence 'heat [medium] flows [process] from the environment to the gas' [105, p. 5]. The meaning of the sentence (lexical ontology) here refers to the energy that is transferred into the system. Grammatically, however, heat functions as the participant, i.e. as the concrete manifestation of matter that is flowing (grammatical ontology). Grammatical and lexical ontology differ in this sentence, which, on this basis, can be identified as metaphorical [105].
'Conceptual metaphors often give abstract concepts an existence as concrete objects or things' [100, p. 4] and allow us to understand them, quantify them, identify them as causes, etc. However, there is the risk that conceptual metaphors are taken at face value and not explicitly recognized as metaphors in everyday language [69, 105], which might hinder genuine understanding of the involved processes, systems, and models. For example, when learners hear or say 'gravity pulls' and encounter textbook definitions where forces are introduced as agencies/influences, they are less likely to develop a physically accurate, interactional conception of forces [116].

(Conceptual) metaphors are rich resources in language to capture physical phenomena and processes; however, learners of physics need to become acquainted with the specific conceptual metaphors used in this discipline. As with the other features of physics language and language use in physics, this process will likely benefit from explicit guidance rather than expecting students to learn it merely by participating in physics-related communication situations.

3.4. Context-dependency of word meaning

Besides the underlying, metaphorical nature of physics language, the meanings of words themselves and the contexts in which words appear are important considerations when trying to understand physics language (use). Words can have different meanings in physics depending on the context in which they are used: 'What is perhaps most confounding [in physics language use], we even use words which we define precisely yet their meanings change with context, or even with speaker' [5, p. 672]. Context itself has a certain granularity as well, local and global, and external and internal [72]. Meaning in reference to context can depend on topic (e.g. mechanics) or discipline (e.g. chemistry). For example, work in the context of the work-energy theorem has a different meaning than in the second law of thermodynamics [5, 117]. Arons (referenced in [117]) suggests using 'pseudowork' for the former, to both show the connection between the two and make clear that they are distinct. After all, seven types of work can be differentiated [117]. It is thus important to signal the topic in which 'work' is used (often gleaned from the context). Moreover, meaning might also differ depending on the discipline. A physicist and a chemist might attribute different meanings to the term molecule, given the different traditions in their disciplines for defining the word [5].

Without context it is impossible to specify the meaning and predication of a word. Moreover, language is always relational, and fixed definitions can only be approximations of the particular meaning of a word in context, which resonates with Wittgenstein's concept of language-games. Thomas Kuhn stresses this aspect in his seminal work 'The structure of scientific revolutions' as follows: '[The] 'definition' of an element is no more than paraphrase of a traditional chemical concept. Like 'time,' 'energy,' 'force,' or 'particle,' the concept of an element is the sort of textbook ingredient that is often not invented or discovered at all. [...] The scientific concepts to which they [the definitions] point gain full significance only when related, within a text or other systematic presentation, to other scientific concepts, to manipulative procedures, and to paradigm applications.' [118, p. 142]. After all, dictionaries are poor sources for definitions, given the context-dependence of word usage [116]. Not only do the semantics matter, but also the syntax, i.e. the locations in which words are used.

Among the most deceiving contexts in which meaning differs is between physics language and everyday language: 5 '[M]any of the words we define precisely carry the same meaning but less precisely in their everyday usage. Other words carry different, often contradictory meanings' [5, p. 672]. Common sense terms in physics can be deceptive; for example, none of the terms mixture, suspension, or solution in ordinary language has the same meaning in scientific language [88]. Moreover, everyday language even uses physics concepts in contexts that invoke false associations. For example, heat is often used in everyday contexts (at times also in physics textbooks) in ways that invoke the caloric (substance-based) metaphor, as in: 'Winters are milder near the coast because the ocean holds a lot of heat' [112, p. 107].

Word meaning is not static, but rather depends dynamically on the context (and communication situation). This enables learners to productively use language and create their own meanings; however, it also causes interference of physics language with everyday language. In one form or another, it is thus important to raise such issues in learning settings. Creating the illusion that the meaning of terms in physics is static can create problems when students use physics language outside of classrooms.

3.5. Experiential basis of physics language

Physics education researchers also posited that experiences in the physical world interact with the ways humans construe and reason about physical processes and phenomena. Experiences are considered a resource for developing phenomenological primitives (abstractions of common events), i.e. frames or schemata, in which physical phenomena are interpreted [101]. Characteristic of this knowledge system are: 'causal schematization in terms of agents, patients, and interventions ('causal syntax'); a tendency to focus on static characterizations of dynamic events, including the global form of trajectories; and a relatively rich phenomenology of balancing and equilibrium' [101, p. 105]. For example, 'people develop on the basis of their everyday experience remarkably well-articulated naive theories of motion' [119, p. 301]. However, they would not derive the laws of motion as laid out in physics theory, because these were refined over centuries for special purposes (e.g. to predict planetary motion). Phenomenological primitives are models for core intuitions and are considered elemental cognitive structures that serve as resources to guide the interpretation of experiences [10, 101]. These findings suggest that language-sensitive instruction should include opportunities for students to lay out their experiences and understanding (in their own words) of physical processes and phenomena.

4. Language-sensitive instructional design in physics

Overall, language provides a powerful resource to describe and model physical phenomena and processes. Instructional design in physics has to leverage language as a resource, thus enabling students to comprehend and produce this important disciplinary mode of representation. The reviewed research indicates that physics language and language use in physics learning settings are riddled with challenges such as metaphorical use of language, unstated assumptions, ambiguity of word sense, or narrow dialogic interaction in classrooms. Language-sensitive instructional design in physics has to account for these hurdles in language and language use. Given that language is intricately interwoven with any learning process, and given the complexities of language use, no one-size-fits-all strategies likely exist for language-sensitive instructional design. Being aware of these challenges and addressing them explicitly in instruction is an important first step [120]. As a first diagnostic, Fulmer et al [83] developed a questionnaire to test science teachers' conceptualization of the functions of language.

Language-sensitive instructional design in physics can be implemented on different dimensions. It can, for example, relate contents to everyday language and experiences of students, it can affect the level of language that is used in learning materials, or it can provide students strategies that enhance students' use of language as a discovery tool.

4.1. Relating contents to everyday language

Suggestions for language-sensitive instructional design come from more general language research and from discipline-based educational research in science and physics classrooms. General educational research on language acquisition suggests that the more abstract the concepts and phenomena are (i.e. away from students' everyday experiences), the more explicit scaffolding is necessary [121]. Moreover, everyday language (and other non-disciplinary semiotic resources, see [39, 122]) and experiences should play a role in physics classrooms, as this is less alienating to students and lowers the barrier to discussing physics phenomena with the language they are already familiar with [123]. Addressing students' experiences was also found to be an important contributor to building cumulativity in classroom communication, helping break authoritative patterns of teacher-student interactions [124]. Utilizing what students are already able to do (i.e. produce everyday language) as a resource for physics learning can thus enhance classroom communication.

4.2. Adapting language in learning materials

Increasing clarity and elaboration also raises reading comprehension for science texts [125]. Evidence from intervention studies suggests that language-sensitive learning materials can help students in their development of conceptual understanding [126]. For example, the Science Writing Heuristic engages students in immersive argumentative writing (producing language), and evidently supports argumentative skills, metacognition, and other important outcomes [126–128]. Science writing is a means to prepare students for future careers in science, since writing, as a means of communication, is an important part of the scientific discovery process [1, 129].

4.3. Navigating different disciplinary modes

Language is but one disciplinary mode of representing ways of knowing in physics. Learning and deep understanding of physics concepts rather hinges on critical coordination of different disciplinary modes [12]. It is therefore important to utilize language in conjunction with other modes such as mathematics, observations, etc in order to enable students to understand how these different modes in conjunction enable understanding, and how certain modes afford only certain insights and inferences.

4.4. Providing writing strategies to students

Metacognitive prompts, e.g. on text composition, can positively influence distal variables such as academic outcomes. Language-sensitive instructional design in physics thus involves posing questions and providing strategies that enable students to revise their positions and understanding, rather than relying on short responses [83]. Moreover, prompting students to explicitly write down how they would qualitatively solve a physics problem was found to be an effective strategy to train their physics problem-solving capabilities [130]. This then allows instructors to better diagnose students' understanding of the problem situation and provide more specific guidance. Making conceptual and strategic knowledge explicit through language is in itself a valuable resource [131].

In physics learning environments, students should get plenty of space to comprehend physics language and produce it by themselves, optimally with feedback and guidance from instructors [1]. Empirical research suggests, however, that too few opportunities are established for students to comprehend and especially produce language [132, 133]. Moreover, assessing language products requires instructors to invest much time to individually guide students [134].

5. How to enhance language-related research and language-sensitive instruction in physics?

Qualitative research on language and language use in physics has outlined important facets of physics language, such as conceptual metaphors, or syntactical and grammatical affordances for talking about physical phenomena and processes. This research also established differences between spoken and written physics language. Spoken language in labs and classrooms omits information (such as assumptions), and is rather constrained through authoritative patterns of teacher-student interaction. However, our overall understanding of language and language use in physics is rather fragmentary and would likely benefit from systematic tools that could be employed in research to analyze language on a larger scale, e.g. across place, time, and assessment format. Yet, systematic studies are resource-intensive and at times even impossible given the size of the available data. The reviewed studies often restricted themselves to small samples (e.g. a specific laboratory or classrooms), to specific terms used in physics (e.g. heat or potential well), or to particular language phenomena (e.g. conceptual metaphor). Furthermore, the reviewed intervention studies show how language processes in physics can be scaffolded. Nevertheless, they only provide rather general guidance on designing classroom instruction. These guidelines must then be implemented by individual instructors who may interpret the findings in varying ways (highlighting a somewhat common challenge in implementing curriculum materials).

Both research on language use in physics and language-sensitive instructional design would benefit from methods that could reliably and systematically analyze large corpora of language data at low cost, either to explore hypotheses or to provide explicit models and test hypotheses. Furthermore, instructional design would benefit from automated and adaptive scaffolding for language processes. With advances in AI research, both systematic and scalable analysis of language (use) in research and automated assessment with adaptive guidance for language-based responses have become possible in many domains. Most notably, advances in AI-based language technologies enabled conversational AI technologies like ChatGPT, which reached record subscriber numbers only a month after its inception [135] and potentially transforms the ways humans interact with the Internet and AI technologies in a fundamental way. ChatGPT and other language-based AI technologies have also been utilized in PER for research and instruction.

6. AI methods for analyzing physics language and language use in physics

Language data has always played an important role in PER. Much progress resulted from employing hermeneutic and content-analytical approaches (e.g. protocol analysis), where written responses might be sorted, categorized, or classified according to normative criteria or in a more data-centered, inductive manner [87, 101]. Much emphasis in these studies is on the construction of appropriate prompts (e.g. problems, questions) that elicit relevant reasoning processes which can be documented via language (think-aloud, cognitive interviews, constructed responses) [101]. Inductive analysis of protocols requires substantive expertise and knowledge in PER, which was difficult to put into computers by way of rule-based AI approaches [136]. Knowledge engineering approaches sought to encode the available human knowledge into computers; however, these systems were found to be too brittle to be of use for research and most instructional purposes [136]. Yet, computer analyses became increasingly important for analyzing language data, and even rule-based approaches found important applications in PER, such as intelligent tutoring systems with well-crafted underlying knowledge modules [27, 137].

Challenges with hermeneutical, content-analytical approaches to the analysis of language result from different sources such as the complexity of language, the problem of scalability, and the problem of implementability. For one, language use depends on rhetorical goals, prior knowledge, and experiences in physics (see sections 2.2, 3.4). Accounting for all influencing factors that determine language comprehension and production requires systematic tools for analysis that should be inherently statistical in nature, given that language comprehension and production have been found to be stochastic processes [77]. Moreover, scalability requires either human effort (which is both costly and difficult to control) or outsourcing to computers (which are difficult to instruct). Finally, sharing resources such as coding manuals across research contexts is difficult. Sociologists argued that 'the classifying demanded by a coding operation is so delicate that its validity is perhaps too tentative for others to build on' [138, p. 177], which relates to the delicate nature of language-games (see subsection 2.2). It would be desirable to have tools for language analysis that are systematic, analytical, and scalable. It will be argued that computational resources in the field of AI are promising means to advance language analysis in physics and PER in terms of capturing complexity, ensuring scalability, and enabling implementability.

In the realm of AI, which encompasses the study of comprehending and constructing rational agents [139], a versatile technique is ML. ML can be considered the workhorse of AI research, and it inverts classical programming: instead of providing a computer with explicit instructions to process input, the instructions are learned from data [25, 140]. ML enables computers to inductively learn from (large amounts of) data, which incited a new paradigm in science called data-intensive scientific discovery [141]. ML has different goals, which are related to different learning types. Prominent learning types are supervised ML, where the goal is to develop a model that is able to predict existing patterns in unseen data, and unsupervised ML, where novel patterns can be explored in complex data sets [142, 143]. NLP is the systematic processing of language data with the help of computers (oftentimes ML), to achieve goals such as representing language in an efficient, computer-readable way [26, 79], or translating, summarizing, and classifying documents [26]. In AI research, the advantage of ML and NLP over knowledge engineering-based approaches was termed a 'bitter lesson' [144], given that it seems that human expertise can be outcompeted with large data sets and generic computation. In any case, in times of large language repositories, ML in combination with NLP offers valuable resources to analyze and model language data.
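The inversion of classical programming that ML entails can be illustrated with a minimal, hypothetical sketch (the data, threshold, and use of scikit-learn are illustrative assumptions, not drawn from the reviewed studies): a hand-written rule is contrasted with the same rule induced from labelled examples.

```python
from sklearn.tree import DecisionTreeClassifier

# Classical programming: the rule is written by hand (illustrative threshold).
def classify_by_rule(x):
    return 1 if x >= 5.0 else 0

# Machine learning: the rule is induced from labelled examples.
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [classify_by_rule(v[0]) for v in X]  # labels from the (here known) rule

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# The learned model recovers a comparable decision threshold from the data.
print(model.predict([[2.5], [7.5]]))
```

In supervised ML for language data, the single numeric feature is replaced by word-based features, but the principle of learning the input-output mapping from labelled examples is the same.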

6.1. Unsupervised ML applications

Given the prevalence of unstructured, i.e. unlabelled, language data [142], physics education researchers employed tools for data-driven discovery such as NLP with the help of unsupervised ML. Unsupervised ML in PER has been used to extract distinct themes from students' interview transcripts on explaining the seasons [145]. To validate the findings, specific indicators were established to determine the meaningfulness of the automatically extracted themes for human analysts. Validation criteria included whether the themes '(a) were interpretable in terms of the theory, (b) captured knowledge at an appropriate grain size, (c) captured combinations of elements, and (d) captured the dynamics of individual interviews' [145, p. 633]. It was found that many of the extracted topics matched theoretical expectations well (from knowledge in pieces theory and common sense physics reasoning). In another study that sought to group unstructured data, an unsupervised ML model named Latent Dirichlet Allocation (LDA) was used to extract themes that were discussed throughout the Physics Education Research Conference [146]. In this research, NLP methods helped to preprocess the articles in a meaningful way. For example, the authors combined NLP methods to reduce word forms and detect language patterns that account for larger patterns of words. Different means of validating the model were used, such as coherence and perplexity of the topics, face validity, and comparison with other qualitative methods. After validating the topics, emerging themes in PER or overall important topics could be singled out from a large, unstructured language data set.

Unsupervised ML has also been utilized in intelligent tutoring systems in physics. Both symbolic and statistical NLP techniques are used to intelligently parse input, interpret it, and generate responses in tutorial dialogues. A well-known implementation is AutoTutor, which is based on a strategy named expectation and misconception-tailored dialogue [147]. Besides the requirement for researchers to specify the prompts and the expected answers and misconceptions (non-normative ideas), AutoTutor utilizes statistical NLP, namely latent semantic analysis. Latent semantic analysis is a means of reducing the dimensionality of the language data. As such, it allows researchers to calculate the similarity (i.e. distance in a high-dimensional space) of documents. Latent semantic analysis was then used to match expectations and misconceptions in students' responses, hence allowing the tutoring system to track students' current level of understanding within the solution space [137]. Of course, specification of expected answers and misconceptions was necessary in the form of curriculum scripts. It is important to note that in these unsupervised approaches, established theory in PER and the educational sciences formed the background against which the validity of the findings was evaluated. If the algorithms picked up on novel patterns, these might not have been recognized by the researchers.
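The dimensionality reduction and similarity matching behind latent semantic analysis can be sketched as follows (a minimal illustration using tf-idf and truncated SVD in scikit-learn; the expectation and responses are invented, and this is not AutoTutor's actual pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# A hypothetical expectation and two student responses (illustrative only).
texts = [
    "the net force on the body is zero so it moves at constant velocity",  # expectation
    "since no net force acts the object keeps a constant velocity",        # response A
    "the ball falls because gravity pulls it down",                        # response B
]

tfidf = TfidfVectorizer().fit_transform(texts)

# Project the tf-idf vectors into a low-dimensional latent semantic space.
lsa_vectors = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Cosine similarity of the expectation to each response; a threshold on this
# score could decide whether the expectation is 'covered' by a response.
sims = cosine_similarity(lsa_vectors[:1], lsa_vectors[1:])
print(sims)
```

In a tutoring system, such scores would be compared against curriculum-script expectations and misconceptions to track a student's position in the solution space.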

In sum, unsupervised ML approaches can be utilized to explore patterns in complex language data. This can help researchers in PER to browse through data repositories of unprecedented size and extract information from them. For instance, the disciplinary structure, encompassing the topics addressed over time, could be analyzed using existing physics journals as a foundation. Moreover, unsupervised ML approaches could facilitate adaptive guidance in tutoring systems and thus foster physics learning processes.

6.2. Supervised ML applications

Some language data is structured, i.e. annotated, coded, or labelled. Researchers often spend a substantial part of their work developing and validating coding rubrics and coding data. For such data, supervised ML methods can be used. A wide variety of supervised NLP-based software tools exist that can help in analyzing constructed responses in PER. For example, the program LightSIDE was used to extract feature lists from text data [148]. The seemingly simplest text feature, namely the words that occur in a response, was selected to predict category membership. The ML model was validated through comparison with human raters, with human-machine agreement serving as a proxy for the performance of the ML algorithm. Mean response length was 1–2 sentences, which is typical for constructed responses to a simple prompt. The authors analyzed responses to questions on Newton's first law, e.g. the ball-track question, which asks what happens to the movement of a ball if a certain track is extended. The ML model could then automatically determine whether students answered that the movement would slow down, speed up, and so forth. In another question concerning crash test dummies, the ML model successfully discerned whether a student's response was purely descriptive or included an explanation related to forces (as the question prompted students to explain). The model could thus be used to request a rewrite if students did not explain the phenomenon. Furthermore, the ML model performed reasonably well in identifying distinct groupings based on various strategies for removing a stuck coin from a graduated cylinder. The authors found self-validation (testing on the training data) to be a viable first approximation; however, as expected, cross-validation results, which test the ML model on unseen data, were consistently lower than self-validation results. Also in line with expectations, more examples for a category in the training data resulted in improved classification accuracy for that category. The authors conclude that 'larger data sets being more desirable suggests that collaborations that could combine smaller response sets may present a significant advantage in further exploring ML in physics education' [148, p. 13].
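A hedged sketch of this supervised set-up, with word-occurrence features predicting a rubric category and self-validation contrasted with cross-validation (the labelled responses are invented; the cited study used the program LightSIDE rather than scikit-learn):

```python
# Word-occurrence features predicting a coding-rubric category, with
# self-validation (scoring on the training data) compared against
# cross-validation on held-out folds.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

responses = [
    "the ball slows down because of friction",
    "friction makes it slow down",
    "the ball speeds up on the longer track",
    "it keeps speeding up",
    "the ball slows down gradually",
    "the ball speeds up over time",
]
labels = ["slow", "slow", "fast", "fast", "slow", "fast"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

# Self-validation: score on the data the model was trained on.
self_acc = model.fit(responses, labels).score(responses, labels)

# Cross-validation: score on held-out folds (here 3-fold).
cv_acc = cross_val_score(model, responses, labels, cv=3).mean()
```

With realistic data sets the cross-validated accuracy is the figure to report, which is the pattern the cited study observed.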

Supervised ML was also used to predict students' scores on the Physics Measurement Questionnaire from their constructed responses. NLP was used to preprocess the responses (vectorization) in order to feed them into ML models (logistic regression, random forest, k-nearest neighbors). All ML models performed about as well as human raters. Moreover, the most predictive words aligned well with the words expected under the pointlike and setlike paradigms [149]. In a similar manner, supervised ML has been used to analyze middle school students' explanations of thermodynamics phenomena [150]. The authors then provided guidance to the students based on an automatically generated (ML-based) score. One motivation for this approach was the misconception that metals 'conduct,' 'absorb,' 'trap,' or 'hold' cold better than other objects [150]. The authors used the NLP program c-rater to automatically identify distinct concepts in students' explanations of a thermodynamic problem ('A metal spoon, a wooden spoon, and a plastic spoon are placed in hot water. After 15 s, which spoon will feel the hottest and why?'), such as 'The metal spoon feels like a different temperature than its actual temperature,' or 'Metal conducts or heats up or transfers heat (NO comparisons OR rate)'. Overall, supervised ML is often used to classify constructed responses into pre-defined categories automatically, which raises the scalability of analyses of language products in PER.
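Agreement with human raters in such studies is commonly quantified with chance-corrected measures such as Cohen's kappa; a minimal sketch with invented label sequences (the choice of measure here is an assumption for illustration, not taken from [149]):

```python
# Human-machine agreement beyond raw accuracy: Cohen's kappa corrects
# for the agreement that would arise by chance from the label marginals.
from sklearn.metrics import accuracy_score, cohen_kappa_score

human   = ["pointlike", "setlike", "setlike", "pointlike", "setlike", "setlike"]
machine = ["pointlike", "setlike", "setlike", "setlike",  "setlike", "setlike"]

acc = accuracy_score(human, machine)      # raw agreement: 5/6
kappa = cohen_kappa_score(human, machine) # chance-corrected: 4/7

# Kappa is lower than raw accuracy whenever part of the agreement
# could arise by chance, which is why it is the stricter criterion.
```

Reporting kappa alongside accuracy guards against overstating performance on imbalanced coding categories.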

Supervised ML approaches in PER can help researchers to automate their research and instruction. For example, validated coding rubrics could be fed into ML models to train automated classifiers that could then be used either for diagnostic purposes in large scale studies or in physics learning environments to classify students' responses and provide them feedback.

6.3. Utilizing LLMs

Capturing the statistics (e.g. recurring patterns and themes) of natural language and predicting next words in a sequence can be defined as the task of language modelling. Language modelling tools became very sophisticated with advances in AI research [26]. In particular, artificial neural networks excelled in language modelling [26, 151–153]. An important goal in these approaches is to represent words as continuous vectors, where each word is assigned a vector based on the contexts it appears in, following the distributional hypothesis that words appearing in similar contexts have similar meaning [75, 79]. Contextualized embeddings were trained with specific language model architectures that became established as LLMs. LLMs are typically complex neural network architectures with the overall objective of predicting next words in a sequence [154]. As such, trained LLMs can be used to generate a response to a prompt (input), so-called generative LLMs, or they are used as an intermediate means to represent language data, where these representations are fed into other ML models to classify text. A popular example of a generative LLM is GPT (generative pre-trained transformer). These LLMs can be used to generate new text given some input and prompts, as in the example of ChatGPT. Other LLMs include BERT (bidirectional encoder representations from transformers) [153], which can be used to retrieve contextualized embedding vectors for an input.
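The distributional hypothesis can be illustrated with a toy co-occurrence model: words are represented by counts of the contexts they appear in, and words from similar contexts receive similar vectors. The counts below are invented; real LLM embeddings are learned, contextualized, and high-dimensional:

```python
# Toy distributional semantics: hand-made context-count vectors over the
# context words (orbit, nucleus, charge, field), compared by cosine
# similarity. Real embeddings are learned from corpora, not counted.
import numpy as np

vectors = {
    "electron": np.array([3.0, 4.0, 5.0, 2.0]),
    "proton":   np.array([1.0, 5.0, 4.0, 2.0]),
    "pendulum": np.array([4.0, 0.0, 0.0, 1.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# 'electron' and 'proton' share contexts (nucleus, charge), so their
# vectors end up closer than 'electron' and 'pendulum'.
sim_ep = cosine(vectors["electron"], vectors["proton"])
sim_epend = cosine(vectors["electron"], vectors["pendulum"])
```

The same cosine-similarity geometry underlies the contextualized embedding spaces used by LLMs, only in hundreds or thousands of dimensions.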

Training LLMs has become pivotal to the success of contemporary NLP systems. LLMs are trained on large repositories of language data, such as the Common Crawl of the Internet (410B tokens), WebText2 (19B tokens), Wikipedia (3B tokens), and probably other, undeclared sources (proprietary information of the respective private companies). It was found that LLMs such as GPT or BERT trained on these large repositories could produce authentic and oftentimes correct language utterances given some prompt. However, early versions of LLMs had to be fine-tuned to particular tasks in order to perform well on them. These problems were solved with increases in training steps (compute time) and model size, and also with prompting techniques [34, 155]. LLMs such as GPT-3 became what was called few-shot or zero-shot learners [155]: they were capable of completing tasks with only a few, or no, examples demonstrating how to solve such tasks. For example, chain-of-thought prompting was used to enable LLMs to perform simple reasoning tasks [156]. In chain-of-thought prompting, the LLM is commonly provided with input-output pairs (questions and answers). Rather than instantly producing the answer, the model is prompted to produce intermediate reasoning steps. Google Research applied chain-of-thought prompting to PaLM (pathways language model) and reached state-of-the-art performance (58%) on GSM8K, a data set of 8.5K high-quality, linguistically diverse grade school mathematics word problems [157]. The famous conversational AI ChatGPT (based on its predecessor, the generative pre-trained transformer GPT-3.5) from OpenAI was trained both with supervised learning and other forms of learning (e.g. learning from human feedback). ChatGPT scores well in MBA exams, medical exams, and verbal-linguistic IQ tests [158]. However, in advanced mathematics, results are more mixed [158].
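A chain-of-thought prompt can be sketched as a worked question-answer pair with explicit intermediate steps preceding the target question; the wording below is illustrative and not taken from [156]:

```python
# Sketch of a chain-of-thought prompt: one exemplar that spells out the
# intermediate reasoning, followed by the target question, which the
# model is then expected to answer step by step as well.
example = (
    "Q: A cart travels 6 m in 2 s at constant speed. What is its speed?\n"
    "A: The cart covers 6 m in 2 s. Speed is distance divided by time, "
    "so 6 m / 2 s = 3 m/s. The answer is 3 m/s.\n"
)
question = (
    "Q: A ball falls freely for 2 s. How fast is it then (g = 10 m/s^2)?\n"
    "A:"
)

prompt = example + "\n" + question
print(prompt)
```

Because the exemplar demonstrates the reasoning format, the model's continuation tends to reproduce intermediate steps before stating the final answer.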

LLMs have become promising tools for language analysis in PER. By merely processing large amounts of language data, LLMs performed (at least) above chance in some reasoning tasks about physical common sense [159]. Novel training methods such as prompting allowed LLMs to increase performance in tasks of quantitative as well as common sense reasoning in domains such as physics and mathematics [34]. When further fine-tuned on mathematics and physics data, such models surpassed the performance of other LLMs in solving quantitative reasoning problems (including derivations of the solution) with unprecedented accuracy, surpassing the human average in national mathematics exams [34]. An acknowledged issue with LLMs is memorization, i.e. the ability to solve problems by merely recalling samples from the training data (characterizing LLMs as 'glorified lookup tables' or 'stochastic parrots' [160]) rather than through genuine synthesis based on understanding the underlying concepts. The authors addressed this concern, to a certain extent, by demonstrating that the problems presented in the test data were not present in the training data, and that adjustments to prompts, whether in wording or numerical content, correlated with variations in solution performance.

LLMs can also be used to enhance unsupervised ML approaches, e.g. to inform topic detection [161]. For this purpose, the ability of LLMs to transform an input sentence into a contextualized, dense embedding within a high-dimensional vector space was employed. These embeddings can then be forwarded as features into another ML model that performs clustering. In this case, an LLM was used to encode sentences in pre-service physics teachers' written responses (here: reflections). Afterwards, dimensionality reduction and clustering algorithms were used to extract topics in the reflections. Similar validity indicators as in [145] were utilized, and it was found that pre-service physics teachers' reflections comprised both rather general and more physics-specific topics, which is consistent with research on pre-service teachers' noticing [162, 163]. Again, substantive theory in PER and educational science was required to validate the extracted topics and findings. As with unsupervised ML, LLMs can also enhance supervised ML through efficient data transformations. Wulff et al [164] used ML algorithms to classify pre-service physics teachers' written reflections into categories provided by a reflection-supporting model. Performance and generalizability could be substantially boosted by using LLMs to transform the input data [161]. Moreover, the LLM-based model could undergo additional fine-tuning for various contexts, allowing for the classification of written reflections from non-physics students [165]. This offers new possibilities in PER to share models and thus scale research.
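The embedding-reduction-clustering pipeline can be sketched as follows; TF-IDF vectors stand in for the LLM sentence embeddings, and PCA plus k-means for the dimensionality reduction and clustering algorithms actually used in the cited work, so this is a structural sketch only:

```python
# Pipeline sketch of embedding-based topic detection: encode texts as
# vectors, reduce dimensionality, then cluster. The reflections are
# invented; real work would use LLM sentence embeddings here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

reflections = [
    "I noticed the students struggled with the concept of voltage",
    "voltage and current were confused by many students",
    "classroom management took most of my attention",
    "classroom management was difficult during the experiment",
]

# Step 1: encode texts as vectors (stand-in for LLM embeddings).
X = TfidfVectorizer(stop_words="english").fit_transform(reflections).toarray()

# Step 2: reduce dimensionality before clustering.
X2 = PCA(n_components=2, random_state=0).fit_transform(X)

# Step 3: cluster the reduced vectors into candidate topics.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)
```

The resulting clusters would then be labelled and validated by human analysts against substantive theory, as described above.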

Tschisgale et al [166] showed how unsupervised and supervised ML and NLP can work in tandem with human experts to discover patterns in physics students' written problem solutions. The authors applied what is called 'Computational Grounded Theory': in a first step, unsupervised ML on the basis of LLMs was used to discover patterns in the problem solutions. The patterns were then cross-validated with human experts. Finally, a classifier was trained to automatically detect the identified patterns in unseen data; the resulting validated ML model could be utilized in teaching and learning settings across research sites.

Recently, many physics education researchers investigated the performance of the generative AI model GPT (especially versions 3.5 and 4.0, as implemented in the conversational AI ChatGPT) for solving physics problems. Applications are versatile: as a generative AI that continues input text, GPT can parse and process all kinds of open-ended and closed physics problems (as of October 2023, image-processing capabilities are limited). West [33] found that ChatGPT was capable of conceptual physics problem solving as tested by the widely used Force Concept Inventory (FCI) [167]. In fact, ChatGPT dramatically improved its performance from version 3.5 to 4.0 with a gain g = 86.7%, 'which is more than twice the average effect of a pedagogically designed, interactive physics course' [33, p. 2]. This progress led to an almost perfect, expert-like solution probability (note that graphical representations had to be transcribed for GPT to parse them). However, occasionally very basic, non-expert-like errors occurred [28, 33]. A deep, Socratic conversation with ChatGPT on the question 'A teddy bear is thrown into the air. What is its acceleration in the highest point?' [29, p. 1] also produced knowledge statements that were unreliable, despite the linguistically advanced manner in which they were presented [29]. ChatGPT was furthermore found to excel in short-form physics essays, i.e. to perform on par with students [168]. Instruction with ChatGPT could improve students' perceptions of ChatGPT, and it is expected that ChatGPT can play an important role in instruction [30, 33]. For example, Küchemann et al [31] showed that ChatGPT could assist physics students with task generation, and Kieser et al [32] demonstrated that ChatGPT could be used as a tool for educational data augmentation: ChatGPT successfully simulated students with certain pre-conceptions (e.g. impetus thinking) related to forces. This offers novel opportunities to pilot test instruments in physics, augment data, probe instructional materials, and even generate instructional materials.

In sum, LLMs offer the potential to improve accuracy for unsupervised and supervised ML approaches in PER. They help represent language data in a more nuanced way (contextualized embeddings) and capture more complex (non-linear) relationships that can be harnessed for automated classification. In addition, generative LLMs offer avenues for language production when prompted, which has been demonstrated to be beneficial in both research and teaching within PER.

7. Synthesis: enhancing instruction and research on language and language use in physics through AI

Upon reviewing existing studies on language within PER, we identified several key insights: (1) physics language provides a rich and nuanced toolkit for explaining physics concepts and processes; (2) there is a significant difference between physics language and everyday language, with the latter being crucial for engaging learners in physics educational settings; (3) the comprehension of physics language is influenced by individuals' knowledge and past experiences; (4) the understanding and use of language in physics are influenced by context; and (5) effective instructional design in this area should address multiple aspects, including personalized content, explicit and metacognitive guidance, as well as the use of different disciplinary modes of representation in reference to each other. This established knowledge, we argue, can be expanded and advanced through ML, NLP, and LLMs. ML, NLP, and LLMs have already enabled physics education researchers to detect topics in their unstructured, language-related data and helped them to automate the coding of constructed response items in order to provide specific and automated guidance for students. Recently, generative LLMs solved open-ended and closed-form conceptual physics tasks and problems that tested for physics-related reasoning. LLMs were also utilized to enhance pattern detection, automate annotation and coding, provide exemplary responses for concept inventories, and create synthetic data sets. All these applications come with specific challenges. We discuss potentials and challenges related to instruction and research in the following.

7.1. Language-related instruction in physics

Language-related research and emerging AI technologies have played and will play an important role in physics instruction. Given the analytical power and automation potential of ML, NLP, and LLMs, these technologies can enhance adaptive instruction and spare the valuable time of instructors, who can be (partly) freed from repetitive tasks such as essay scoring. Given the importance of language as a semiotic resource and the scarcity of opportunities for students to produce language in physics classrooms, many language-involved activities and applications are conceivable, such as interviews with simulated historical figures or adaptive problem solving. Moreover, with the help of ML, NLP, and LLMs, teacher dashboards become possible that can track language use in classrooms. This might open up classroom communication by identifying and breaking cycles of authoritative communication or similar strategies that narrow students' opportunities to actively engage in physics learning environments. It is then the responsibility of researchers and instructors to design these technologies in ways that enhance students' learning processes, rather than eliminating essential language-related learning opportunities.

Physics education researchers have also explored caveats of these applications, such as false confidence in factually wrong answers by LLMs (also known as hallucinations), which always has to be critically monitored, especially when LLMs are implemented in educational settings with students. It was discovered that LLMs acquired biases similar to those of humans, reflecting the fact that their training data encompass diverse sources from the Internet [35]. These issues pose serious challenges to educational applications, as students might be confronted with unproductive problem solving strategies, or worse, with biases and stereotypes related to their social identities. After all, language data on the Internet (i.e. training data for LLMs) is full of harmful language. However, considering bias will also involve trade-offs, as we know from educational research that teachers exhibit certain biases as well. For example, studies reported gender biases in physics classrooms, such as teachers' implicit, gendered theories on giftedness [169]. If students interact with LLMs and do not disclose their gender, this might reduce bias in guidance. Also, average human tutors were found to struggle to adequately judge a student's understanding and adapt their guidance [170]. Hence, even if automated tutoring systems achieve accuracies of only 70%, they may outperform inexperienced human tutors not only in terms of accuracy but also in terms of time and cost [148].

7.2. Language-related research in PER

Researchers have laid out in what ways general ML can nowadays assist in, and automate, different phases of scientific discovery [171]. When utilizing ML, NLP, and LLMs in PER, we found that researchers used various means to ensure model validity. Mostly, substantive domain expertise was necessary to interpret and validate the output of ML models. As such, the established body of knowledge in PER on language and language use in physics and beyond is crucial to direct the development of these AI technologies. As of now, ML and NLP rather enable researchers to outsource repetitive tasks and to provide adaptive, automated guidance in their learning environments. For example, automated, ML-based transcription of verbal interactions in classrooms can yield new insights into interactional patterns and help refine our understanding of classroom communication, as well as of language use in informal (learning) environments. These technologies could also be applied to navigate existing data repositories in order to reconstruct the evolution of meanings associated with physics terminology over time. LLMs could even be trained to predict 'optimal' physics terminology to ease communication and understanding, and to lower interference with everyday language. LLMs might even discover entirely new and different concepts in the first place. ML and NLP have already been used in PER to suggest curriculum sequences [172]. However, standards for ML model validity in PER should be established in order to ensure robust theory development.

A fundamental question relates to the actual capabilities 6 that LLMs acquire from merely browsing through language data. While some researchers posit that LLMs such as GPT-4 show 'sparks of artificial general intelligence' 7 , and challenge our understanding of learning and cognition [173], others remain more sceptical and question the claim that LLMs are anywhere near artificial general intelligence, or even an efficient architecture for processing language in the first place [174, 175]. Be that as it may, cognitive semantics, social semiotics theory, and multimodality theory suggest that physical embodiment in the world and exploratively engaging in language-games based on various semiotic resources (not only language) are crucial for developing a deep understanding of concepts as used in physics. Based on these premises, it is contested that LLMs acquire a (deep) understanding of physics concepts. Browning and LeCun [176] recognize the limitations of language and conclude: 'these systems [LLMs] are doomed to a shallow understanding that will never approximate the full-bodied thinking we see in humans.' Rather, LLMs can be considered performant interpolation machines, i.e. they excel at generating language similar to what they have seen during training. Given that LLMs trained on the Common Crawl (a dump of the Internet) likely saw some form of concept inventories during training, this puts their high performance on these concept inventories into perspective. While some researchers demonstrated that LLMs could generalize beyond physics problems seen in the training set [28, 177], the reach of this transfer is often narrow, given that only names and objects are changed. Solving such transformed problems does not necessarily indicate deep physics understanding, and researchers should be particularly careful not to attribute processes such as reasoning or argumentation capabilities to these models, i.e. not to anthropomorphize them.

A purely language-based learning approach is hardly capable of providing LLMs with representations of perceptual features such as smell or colors [178], reinforcing the concern that, without physical grounding (i.e. embodiment), LLMs cannot learn adequate representations of words and concepts [178, 179]. Quite interestingly, some colors can to some extent be meaningfully represented internally by LLMs [180], and meaningful internal representations of board positions in the board game Othello could be acquired from merely predicting moves in the game [181]. Yet, physics understanding is contingent on semiotic resources other than language, such as visualization and mathematics. Given the limitations of language itself, it is reasonable to assume that LLMs in their current form do not possess anything resembling a physics-related understanding of the world, i.e. internal representations that transcend mere linguistic associations and encapsulate physics. While it has been documented that LLMs demonstrate improved performance with increased training steps, larger training data sets, and more extensive parameter configurations [182, 183], the inherent constraints of language and the absence of embodiment likely hinder LLMs from developing a deep, expert-level understanding of physics. In Wittgenstein's terms: what these LLMs most likely learned is to play different language-games without any reflection on the structure of communication and semiotic processes. Once language processing is combined with the processing of perceptual sensor information and with opportunities for AI to engage with the physical world (robotics and multimodal models), new possibilities emerge as to what these systems can learn in terms of a physics-related internal representation of the world. Current LLMs are oftentimes black boxes whose model decisions are not well understood [184]. To what extent such enhanced, multimodal foundation models [185] form ontologies and conceptual metaphors similar to those present in physics language is an open question. AI researchers have begun to integrate robotics and concept formation (e.g. schema networks) to derive knowledge (and, eventually, language) from merely observing and interacting with the physical world (real or simulated), with some successes in generalizing and in understanding the causality of events [186].

Ethical and ecological concerns, along with the need for transparency, pose specific challenges for the utilization of LLMs in language-related research in physics [187]. Biased model decisions are of particular concern when LLMs are employed in educational settings or for high-stakes testing [187]. It is important for researchers to ensure that their models do not advantage certain groups, especially with respect to gender or race [166, 188]. LLMs are seldom open sourced 8 and involve private contracts to use private data sets for training. At times, even private information might be revealed through these models with specific prompting. Researchers recommend open sourcing LLMs in order to better understand model decisions. Moreover, research in the explainable AI tradition should seek to illuminate why LLMs generate certain outputs. Finally, it is important to also consider the CO2 footprint of LLMs [189]. Training and inference emit large amounts of CO2 (since energy systems are not entirely renewable in most countries) and require substantial financial investments in cloud computing infrastructure [190]. These costs need to be balanced against the pedagogical benefits of instructing students more adaptively and in an individualized manner.

8. Conclusion

It is claimed that most physics lessons, especially in early schooling, are in part language lessons [83, 129]. An essential objective is to offer students ample opportunities for comprehending and producing physics language, thus facilitating their proficiency and literacy in this discipline. In order to offer students the most advantageous circumstances for understanding and producing physics language, further research is indispensable for gaining a deeper understanding of language usage in physics classrooms and for designing language-sensitive instructional materials and guidance. ML, NLP, and LLMs can further enhance this research and instructional approach in the aforementioned ways, among others. The purpose of utilizing ML, NLP, and LLMs is then to enhance researchers' and instructors' capabilities and allow them to focus on what is most important: productively guiding the physics learning process of their students.

Acknowledgment

This research was supported by the Klaus-Tschira-Stiftung (Grant number: 00.001.2023).

Data availability statement

As this is a scoping review study, data availability does not apply here.

Footnotes

  • 1  

    Credit for introducing language-games in this context goes to one of our reviewers, and ChatGPT (based on GPT4) for elaborating on the connotation of language-games for Wittgenstein (access conversation here: https://chat.openai.com/share/78d1bcdd-a304-4779-af86-5aaf7380bcb4, Oct 2023).

  • 2  

    We use this term rather loosely here to distinguish it from natural languages, which have less restrictions; for linguistic details see [45, 79].

  • 3  

    Though Faraday, in his time, wrote in a very personal style [93].

  • 4  

    For a discussion of transducing one semiotic system to another, see [39].

  • 5  

    Even worse, this was identified to be a three-language problem, comprising everyday, academic, and scientific language [16].

  • 6  

    When we speak of LLMs' capabilities, we always refer to the performance on some tasks without implying that these models actually learned human-like skills.

  • 7  

    Artificial General Intelligence contrasts with narrow AI in that it denotes broader capabilities of reasoning, planning, etc beyond well-defined tasks [173, p. 1].

  • 8  

    Quite ironically, the company OpenAI was created to provide the public with open-sourced AI products, but now keeps its source code protected. The company Meta propagates openness and, accordingly, open sourced its LLM Llama.
