1 Introduction

As well as actionality,Footnote 1 which seems to be a common feature of all languages (cf. Breu, 1980: 115; Lehmann, 1992), Slavic languages have verbal aspect as a grammatical category. The two interact in a rather complicated way.Footnote 2 On the morphological level, a given verb has an inherent perfective (PFV) or imperfective (IPFV) aspect, which is obligatorily expressed by all finite and non-finite forms of that verb (cf. Janda, 2007a: 608). For example, on the basis of its actional properties, the lexical meaning ‘to voluntarily let someone have something free of charge’ is morphologically coded as PFV, namely as dati. In contrast, the lexical meaning ‘prepare food on the stove or on the fire in a pot with boiled water’ is, according to its actional properties, coded as IPFV on the morphological level, namely as kuhati. Nevertheless, in principle every lexical meaning can be expressed with both PFV and IPFV verbs. The examples in (1) illustrate how this is done in Croatian.Footnote 3

(1a) dati           davati
     give.pfv.inf   give.ipfv.inf
     ‘to voluntarily let someone have something free of charge’

(1b) kuhati         skuhati
     cook.ipfv.inf  cook.pfv.inf
     ‘prepare food on the stove or on the fire in a pot with boiled water’

Prototypically, this possibility of expressing the same lexical meaning with verbs of opposite aspectual values is provided by aspectual (grammatical) derivation (Lehmann, 2009: 2, 7). However, in addition to verbs that have an inherent PFV or IPFV aspect, all Slavic languages have verbs with an underspecified aspectual value. Put simply, these verbs lack the morphological distinction that most verbs have, i.e. different forms for the PFV and the IPFV aspectual value (cf. Janda, 2007a: 636–637). This can be seen in the Croatian verb analizirati ‘to analyze’, as presented in (2).

(2)  analizirati
     analyze.(i)pfv.inf
     ‘to analyze’

In the aspectological tradition these verbs are usually called biaspectual verbs (BVs).Footnote 4 It is assumed that on the sentence level they have the potential to express both aspectual values, PFV and IPFV, without any further aspectual affixation (cf. Stevanović, 1952: 304; Babić, 2002: 554).Footnote 5,Footnote 6 Further to this, scholars (e.g. Isačenko, 1960: 143–144; Avilova, 1968: 66; Galton, 1976: 294; Čertkova, 1996: 100–109; Zaliznjak & Šmelëv, 2000: 10; Silić & Pranjković, 2007: 49) have repeatedly argued that the resolution of (I)PFV aspectual value occurs with the help of context. They have asserted that on the sentence level only one of the two opposing aspectual values is realized. Since the hallmark of BVs is the absence of morphologically distinct PFV and IPFV forms, it is usually other verbs, adverbials, the verbal category of tense, conjunctions, and to some extent the combination of clauses in coordination and subordination that serve as cues for determining the realized aspectual value. To illustrate, in (3) the temporal adverbial satima ‘for hours’ signals that the BV analizirati ‘to analyze’ is being used in the progressive function and has IPFV aspectual value.

(3)  To   smo    satima analizirali…
     that be.1pl hours  analyze.ipfv.ptcp.pl.m
     ‘We were analyzing that for hours.’ (hrWaC)

However, some scholars (e.g. Veselý, 2010: 121) argue that the intended aspectual value of such verbs can rarely be signaled unambiguously. In other words, there are many cases in which both aspectual values, PFV and IPFV, can be attributed to a single instance, as in (4).

(4)  Kako ću      ovo  s    guštom   analizirati,       mmmmmm.
     how  fut.1sg this with pleasure analyze.(i)pfv.inf mhmmmm
     ‘With what pleasure I will analyze/will be analyzing this.’ (hrWaC)

It is not easy to determine the intended aspectual value of the BV analizirati ‘to analyze’ in (4), since in Croatian the future tense, like some other tenses, allows the use of both aspectual values. In such a case, in the absence of contextual or discourse signals, both aspectual values can be attributed to a single instance. Therefore, the instance in (4) can be interpreted as either concrete-factual or progressive, in other words as either PFV or IPFV.

Cases in which the intended aspectual value remains vague are exactly where things get interesting. In such cases, native speakers in principle have three options. First, they can ignore the vague aspectual value and leave the utterance as it is, as illustrated in (4). Second, they can provide an additional contextual signal, as in (5), where the noun phrase cijelu situaciju ‘the whole situation’ syntactically indicates perfectivity.Footnote 7,Footnote 8

(5)  […] krizni stožer       koji  će      analizirati     cijelu situaciju.
         crisis headquarters which fut.3sg analyze.pfv.inf whole  situation
     ‘[…] crisis headquarters, who will analyze the whole situation.’ (hrWaC)

The third possibility is to form a new, aspectually defined verb with the same meaning, as in (6), where the prefix pro- serves as a morphological signal of perfectivity.

(6)  Sjest        ćemo    i   proanalizirati  situaciju.
     sit.pfv.inf  fut.1pl and analyze.pfv.inf situation
     ‘We will sit down and analyze the situation.’ (hrWaC)

This study aims to examine the third option empirically. Section 2 summarizes the state of the art in biaspectuality research with a focus on aspectual affixation of BVs. Section 3 presents the research questions, while Section 4 explains the data collection process. Section 5 describes the results in detail, and is followed by the final Section 6, which draws conclusions from the main results and offers a suggestion for future research.

2 Biaspectual verbs: state of the art

2.1 The most common topics of biaspectual verbs and their affixation

Biaspectuality is a lexically limited phenomenon. However, besides a small number of BVs of Slavic origin (mostly inherited from Proto-Slavic), Slavic languages continuously acquire BVs through borrowing. Therefore, given the relative share of such verbs as well as their general semantic, morphological and syntactic properties, biaspectuality certainly plays an important part in the riddle that is Slavic aspect (cf. Mučnik, 1966: 65, 73). Nevertheless, grammar books and other descriptions of Slavic aspect have devoted relatively little space to the phenomenon (cf. Jászay, 1999: 169).

A review of the rather scarce and scattered information in aspectological literatureFootnote 9 shows that the following issues are addressed in the majority of scholarly works on BVs and biaspectuality in Slavic languages:

  • classification of verbs as biaspectual and discrepancies between different dictionaries,

  • diagnosing biaspectuality (detecting and defining BVs),

  • number of BVs in a given Slavic variety,

  • BV prefixation and suffixation, relative frequencies and functions of the processes and affixes involved, as well as pleonastic and other stylistic characteristics of aspectually marked derivatives formed from base BVs,

  • perseverance of biaspectuality following the emergence of aspectually defined derivatives,

  • differences in degrees to which verbs are biaspectual,

  • biaspectual or aspectless status of such verbs (whether they convey aspectual value at all).

As may be seen, the literature addresses various topics regarding BVs. Nevertheless, the main focus is on aspectual affixation, and more precisely on inventorying the prefixes and suffixes used to derive new (aspectually defined) verbs from BVs.

2.2 Predictions on factors contributing to aspectual affixation of biaspectual verbs

As mentioned in Sect. 1, if a BV is not communicatively transparent with respect to the intended aspectual value, native speakers can resort to aspectual affixation. This allows them to derive an overtly marked PFV or IPFV verb from a base BV and thereby resolve aspectual vagueness (for Russian see Avilova, 1968: 66; for Croatian see Silić & Pranjković, 2007: 49; for Czech see Veselý, 2010: 121; for OCS see Kamphuis, 208–212); for an illustration see example (6). Nonetheless, it seems that only a limited number of BVs can undergo aspectual affixation. According to the literature (e.g. Mučnik, 1966: 65), fewer than a third of all BVs in Russian form new, overtly aspectually marked derivatives. Moreover, prefixation seems to be the more common derivational method. Although, as shown in the previous subsection, works on BVs in Slavic focus mainly on aspectual affixation, only some authors touch on the factors that contribute to this process. A closer review of the literature brings morphological, semantic and other factors, such as sociolinguistic ones, into the picture.

Morphological factors that block or foster affixation of BVs appear mainly in the literature on Russian aspect. Some scholars (e.g. Mučnik, 1966: 65–66) assume that biaspectual borrowings would fit more easily into the Russian aspectual system if they were formed only with the suffix -ova- (e.g. arendovat’ ‘to rent’, atakovat’ ‘to attack’). In that vein, Avilova (1967: 85; 1968: 69–71) claims that from the end of the 18th century to the 1830s, Russian biaspectual borrowings with the suffix -ova- (e.g. arendovat’ ‘to rent’, raportovat’ ‘to report’) were more prone to prefixation than those containing the suffix -irova- (e.g. bal’zamirovat’ ‘to embalm’, meblirovat’ ‘to furnish’). Building on these lines of thought, one could speculate that in Croatian, BVs of Slavic origin and biaspectual borrowings might differ with respect to prefixation. These two types of Croatian BVs do in fact differ strongly in morphological structure. While the majority of biaspectual borrowings have the suffix -ira- (e.g. analizirati ‘to analyze’, fotografirati ‘to photograph/to take photos’), BVs of Slavic origin are formed with various suffixes (e.g. vidjeti ‘to see’, savjetovati ‘to counsel’, čestitati ‘to congratulate’, noćiti ‘to spend the night’).

Recent literature on Russian aspect further identifies the presence (or absence) of synchronically visible prefixes as a morphological factor in the prefixation of BVs. Piperski (2018: 117–118) claims that Russian BVs that have a synchronically visible prefix (e.g. ispol’zovat’ ‘to use’) are very unlikely to be prefixed. Similar observations were made by the Croatian linguist Babić (1978: 74; 2002: 537), who claimed that the prefixation of already prefixed verbs is very rare. In Croatian, many BVs have either a diachronically or a synchronically distinguishable prefix (e.g. doručkovati ‘to have breakfast’, objedovati ‘to have lunch’, poštovati ‘to respect’, razumjeti ‘to understand’, savjetovati ‘to counsel’, uzrokovati ‘to cause’). Therefore, there is good reason to follow Piperski’s (2018) claim and to assume that in Croatian, prefixation of base BVs that have a synchronically and/or diachronically distinguishable prefix will be less probable than prefixation of BVs with no such prefix.

As already mentioned, scholars working on Russian aspect have recognized that biaspectual borrowings with different suffixes are not equally prone to prefixation. Morphological constraints on aspectual affixation of BVs have also been observed for South Slavic languages. Reportedly, -ira- BVs in Croatian and Serbian (e.g. markirati ‘to mark’, konstatirati ‘to state’) do not undergo suffixation (cf. Magner, 1963: 628). It has been argued that suffixation of BVs with this suffix is blocked because -ira- is actually an imperfectivizing suffix of native verbs (cf. Magner, 1963: 628).Footnote 10 The same restriction on the suffixation of -ira- BVs has also been noted for Slovenian (cf. Plotnikova, 1971: 35). Nevertheless, corpora of contemporary Croatian suggest that some -ira- BVs (e.g. instalirati ‘to install’, organizirati ‘to organize’) do have suffixed derivatives (e.g. instaliravati, organiziravati). One plausible interpretation is to treat these suffixed derivatives as morphological signs of unstable biaspectuality. Accordingly, if these BVs are unstable enough to form suffixed derivatives, there is good reason to assume that they also form prefixed derivatives.

A few scholars (e.g. Avilova, 1968: 66–68; Šeljakin, 1983: 149) recognize the semantics of base BVs as a factor contributing to prefixation of BVs in Russian. Avilova (1968: 67–68) believes that biaspectual lexical bases like arendovat’ ‘to rent’ and atakovat’ ‘to attack’ that are less polysemous, i.e. have fewer meanings, go hand in hand with prefixation.Footnote 11 Apparently, there is a greater probability that such perfective derivatives will not differ in their lexical meaning from their base BVs. Therefore, such perfective derivatives are ideal candidates for what is called “true aspectual pairs” in traditional Slavic aspectology.Footnote 12 In view of this it may be suspected that polysemy also plays an important role in the prefixation of BVs in Croatian.

Although, as far as I am aware, there have been no comprehensive sociolinguistic studies of BVs in any Slavic language, some sociolinguistic factors regarding derivatives of BVs do appear in the literature. A speaker’s age appears to play a crucial role in the acceptance of derivatives of BVs. Young and middle-aged native speakers of Russian, Serbian and probably Croatian are more likely to accept derivatives that, until recently, had been rejected by the norm (cf. Jászay, 1999: 174; Lazić, 1976: 58). Prefixal derivatives of biaspectual verbs are particularly visible in uncodified colloquial Russian (Potechina, 2007: 115). In relation to this, it is worth noting that in the middle of the last century the Russian linguist Isačenko (1960: 145) observed that some BVs did not form aspectually marked suffixed derivatives because of the conservative nature of the functional registers in which they were used. Moreover, some Serbian and Croatian linguists (e.g. Car, 1934; Stevanović, 1952; Kovačević, 2011; Hudeček et al., 2011: 54) label derivatives of BVs as pleonasms and exclude them from the norm. With this in mind, comparing different corpora of contemporary Croatian could be a good starting point for detecting differences between registers. More precisely, corpora of standard Croatian might be more conservative and reflect the norm, while web corpora such as hrWaC and its user-generated subcorpus Forum might show broader derivation and use of derivatives of BVs.

3 Research questions (hypotheses)

As already stated in Sect. 1, the aim of this paper is to give an empirical account of BV prefixation in Croatian. Section 2, which reviews previous studies on BVs in Slavic languages, showed that not all BVs form new aspectually defined derivatives to the same extent. Therefore, the goal of this study is to determine empirically which factors affect the prefixation of BVs. Based on the theoretical findings and predictions summarized in the previous section, the following five research questions were formulated:

RQ1: Do base BVs of Slavic origin and biaspectual borrowings behave differently with respect to prefixation (cf. Mučnik, 1966: 65–66; Avilova, 1967: 85; Avilova, 1968: 69–71)?

RQ2: Are base BVs with a synchronically and/or diachronically distinguishable prefix less prone to prefixation than BVs that do not begin with such a prefix (cf. Piperski, 2018: 117–118)?

RQ3: Are base BVs with attested suffixed derivatives more prone to prefixation than BVs without such derivatives?

RQ4: Does the number of meanings of a base biaspectual lemma influence its prefixation (cf. Avilova, 1968: 67–68)?

RQ5: Are prefixed derivatives of base BVs equally present in different corpora of the Croatian language, i.e. corpora reflecting standard and colloquial language use (cf. Isačenko, 1960: 145; Jászay, 1999: 174; Lazić, 1976: 58; Hudeček et al., 2011)?

These research questions were operationalized as the following null hypotheses:

H0.1: Base BVs of Slavic origin and biaspectual borrowings do not differ significantly with respect to prefixation.

H0.2: Base BVs with a synchronically and/or diachronically distinguishable prefix and base BVs without such a prefix do not differ significantly with respect to prefixation.

H0.3: Base BVs with suffixed derivatives attested in corpora of the Croatian language and base BVs without such derivatives do not differ significantly with respect to prefixation.

H0.4: Base BVs with different numbers of meanings do not differ significantly with respect to prefixation.

H0.5: The same base BVs do not differ significantly with respect to prefixation when data from different corpora of the Croatian language are compared.

4 Data extraction and methodology

4.1 Study design and operationalization of variables

For the purpose of this study, a fully crossed factorial design allowing the examination of the relationship between the dependent variable (prefixation of BV) and the five independent variables was developed. The information on the variables, i.e. their class, type, levels and coding, is summarized in Table 1 below, and a concise description of their operationalization follows beneath it.

Table 1 List of variables

The study design comprised one dependent and five independent variables, reflecting the research questions presented in the previous section. The dependent variable was the prefixation of the BV, with two levels: 0 (no prefixed derivative attested) and 1 (prefixed derivative attested in corpora of Croatian).

The first independent variable was the origin of the base biaspectual lemma, with two levels: slav (e.g. poštovati ‘to respect’, častiti ‘to invite/to respect’) and irati (e.g. karakterizirati ‘to characterize’, analizirati ‘to analyze’). As may be seen in the examples in brackets, the code slav corresponds to BVs of Slavic origin and the code irati corresponds to biaspectual borrowings with the -ira- suffix.

The second independent variable was the presence of a synchronic and/or diachronic prefix within the base BV, which had two levels: 0 (synchronic and/or diachronic prefix not present in the base BV) and 1 (synchronic and/or diachronic prefix present in the base BV).

The third independent variable was the existence of a suffixed derivative of the base BV, with two levels: 0 (suffixed derivative of the base BV not attested) and 1 (suffixed derivative of the base BV attested in corpora of Croatian). To check if suffixed derivatives of BVs are attested in Croatian, the CNC, RepositoryFootnote 13 and hrWaC were queried and the results obtained from these corpora were merged.
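The merging step for Suf can be illustrated with a short Python sketch. It is not the procedure used in the study, which queried the corpora directly; it assumes that each corpus has already been reduced to a set of attested lemmas, and that the suffixed derivative of an -ira- verb is formed by replacing final -ti with -vati, a pattern inferred only from the pairs instalirati/instaliravati and organizirati/organiziravati cited in Sect. 2.2. The toy corpora are invented, and derivatives of BVs of Slavic origin are not covered.

```python
# Hypothetical sketch of coding Suf (0/1) by merging results from several corpora.
# The lemma sets are invented toy data, not content of the CNC, Repository or hrWaC.

def suffixed_candidate(base: str) -> str:
    """Assumed pattern for -ira- verbs, e.g. organizirati -> organiziravati."""
    if not base.endswith("irati"):
        raise ValueError("this sketch only covers -ira- verbs")
    return base[:-2] + "vati"  # drop final "ti", append "vati"

def code_suf(base: str, corpora: dict[str, set[str]]) -> int:
    """Return 1 if the suffixed derivative is attested in at least one corpus.

    Merging the per-corpus results amounts to a logical OR across corpora.
    """
    candidate = suffixed_candidate(base)
    return int(any(candidate in lemmas for lemmas in corpora.values()))

toy_corpora = {
    "CNC": {"organizirati", "analizirati"},
    "Repository": {"instalirati"},
    "hrWaC": {"organiziravati", "instaliravati", "analizirati"},
}

for verb in ["organizirati", "instalirati", "analizirati"]:
    print(verb, code_suf(verb, toy_corpora))
```

With these toy sets, organizirati and instalirati are coded 1 and analizirati is coded 0, because a derivative attested in any single corpus is enough.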

The fourth independent variable was the number of meanings of the base biaspectual lemma. This variable is arguably the most problematic and certainly the most difficult to operationalize in a sensible way. As will be shown, this measure can never be entirely reliable, no matter how it is operationalized. In this study it was operationalized as a numerical variable as follows. The number of meanings was extracted from two dictionaries (for more information see Sect. 4.3). Since the number of meanings ascribed to lemmas varied between these dictionaries, all the meanings listed in Matasović & Jojić (2002) were first copied to an Excel spreadsheet. Next, they were compared with and supplemented by the meanings from Jojić et al. (2015). Finally, each biaspectual lemma was assigned a number corresponding to the total number of its meanings (1, 2, 3, …). Note that meaning variants were also counted as separate meanings.Footnote 14 Alternatively, data on the number of meanings could have been extracted from corpora. Although Croatian corpora do not contain semantic annotation, data on the meanings of each lemma could have been extracted indirectly. One option would have been to analyze the meanings of the lemmas in a vast number of sentences, i.e. different contexts, and to annotate them manually. Possible automatic alternatives include word sense disambiguation, word sense induction or potentially vector semantics.Footnote 15 However, extracting information on the number of meanings with these automatic methods, i.e. operationalizing the variable in question, would be too time-consuming and would require a separate study. Therefore, given that there was no guarantee that word sense disambiguation, word sense induction or vector semantics would actually yield a substantially better measure, consulting dictionaries seemed the most reasonable option after weighing costs against benefits.Footnote 16
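The counting procedure for Sem (take all senses from the first dictionary, add senses from the second dictionary not already present, then count) can be sketched in Python as follows. This is an illustrative sketch only: the two dictionaries below are hypothetical stand-ins for Matasović & Jojić (2002) and Jojić et al. (2015), the lemmas and sense labels are invented placeholders, and the decision that two dictionary entries describe the same meaning, which the sketch reduces to identical labels, was in practice a manual judgment.

```python
# Hypothetical sketch of the Sem coding; lemma names and sense labels are invented.

def count_meanings(first: list[str], second: list[str]) -> int:
    """Count senses: all senses of the first dictionary plus new ones from the second."""
    merged = list(first)          # start from the first dictionary
    for sense in second:          # supplement with the second dictionary
        if sense not in merged:   # identical labels = same meaning (manual in practice)
            merged.append(sense)
    return len(merged)            # each sense or meaning variant counts once

dict_2002 = {"verb_x": ["x1", "x2"], "verb_y": ["y1"]}
dict_2015 = {"verb_x": ["x2", "x3"], "verb_y": ["y1"]}

sem = {lemma: count_meanings(dict_2002.get(lemma, []), dict_2015.get(lemma, []))
       for lemma in sorted(dict_2002.keys() | dict_2015.keys())}
print(sem)  # {'verb_x': 3, 'verb_y': 1}
```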

The first four independent variables were introduced as between-items factors (as one base BV cannot belong to different types, i.e. it can either have a Slavic origin or be a biaspectual borrowing; likewise it can either have or not have a suffixed derivative, etc.).

Finally, corpus was introduced as the fifth independent variable, with four levels (CNC, Repository, hrWaC and Forum). This variable was introduced as a within-item factor. To establish whether prefixation of BVs varies between different corpora of contemporary Croatian language, it was necessary to allow comparison of prefixation scores of the same item, i.e. BV, in four different corpora of Croatian.
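The combination of four between-items factors with the within-item factor corpus implies a long data layout with one row per base BV and corpus, in which the between-items values repeat across the four corpus rows of each BV. The Python sketch below illustrates this layout under stated assumptions: the column names follow the R model formula given in Sect. 5, while the two items (verb_a, verb_b) and their values are invented placeholders, not data from the study.

```python
from dataclasses import dataclass
from itertools import product

CORPORA = ("CNC", "Repository", "hrWaC", "Forum")

@dataclass(frozen=True)
class BaseBV:
    """Between-items properties, fixed for a given base BV."""
    simplex: str   # base BV; grouping factor for the random intercept
    origin: str    # "slav" or "irati" (Type)
    orig_pf: int   # 1 = synchronic and/or diachronic prefix present (OrigPf)
    suf: int       # 1 = suffixed derivative attested (Suf)
    sem: int       # number of meanings (Sem)

def long_format(items, prefixed):
    """Return one row per base BV x corpus; Corpus is the within-item factor.

    `prefixed` maps (simplex, corpus) to Pf; pairs not listed are coded 0.
    """
    return [
        {"Simplex": bv.simplex, "Type": bv.origin, "OrigPf": bv.orig_pf,
         "Suf": bv.suf, "Sem": bv.sem, "Corpus": corpus,
         "Pf": prefixed.get((bv.simplex, corpus), 0)}
        for bv, corpus in product(items, CORPORA)
    ]

# Invented placeholder items and prefixation values, for illustration only.
items = [BaseBV("verb_a", "irati", 0, 1, 2), BaseBV("verb_b", "slav", 1, 0, 4)]
prefixed = {("verb_a", "hrWaC"): 1, ("verb_a", "Forum"): 1}

rows = long_format(items, prefixed)
print(len(rows))  # 2 items x 4 corpora = 8 rows
```

Because each base BV contributes four non-independent observations, it is natural for the base BV to enter the model as a random-intercept grouping factor, as (1|Simplex) does in the formula in Sect. 5.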

In order to find out which factors affect the prefixation of BVs in Croatian, corpus linguistic methods and data extraction from dictionaries were employed. The following subsections present item selection (the sample of BVs) as well as the data sources in more detail.

4.2 Item selection: sample of biaspectual verbs for the purpose of the study

The list included in the doctoral dissertation Dvovidni glagoli u hrvatskome i slovenskome jeziku [Biaspectual verbs in Croatian and Slovenian] (Smailagić, 2011) suggests that there are more than a thousand BVs in Croatian. However, Smailagić (2011) does not give an exact number of BVs in Croatian. Furthermore, the list itself nicely corroborates what the aspectological literature has already reported as a general problem: dictionaries provide contradictory information on the (bi)aspectual values of some lemmas (for Russian see Maslov, 1963: 96–97; Čertkova & Čang, 1998: 24–25; Jászay, 1999: 169; Janda, 2007b: 14; for Czech see Kopečný, 1962: 42; Chromý, 2014: 89; for Bulgarian see Ivančev, 1971: 175).

Although the literature reviewed does not provide information about the exact number of BVs in Croatian, it definitely indicates that there are several distinct groups of BVs in terms of morphological configuration and origin:

  • BVs of Slavic origin (e.g. ručati ‘to have lunch’, vidjeti ‘to see’, poštovati ‘to respect’, častiti ‘to invite/to respect’). These BVs constitute a closed class and belong to various conjugation types.

  • Biaspectual borrowings with the -ira- suffix (e.g. karakterizirati ‘to characterize’, garantirati ‘to guarantee’). This open class is by far the largest group of BVs in Croatian.

  • Non-standard variants of biaspectual borrowings with the -isa- or the -ova- suffixes (e.g. karakterisati ‘to characterize’, garantovati ‘to guarantee’). This is a closed class of BVs. The corresponding standard variants have the -ira- suffix: compare the examples in the second and third groups. Some of these non-standard variants can still be found in corpora of contemporary standard Croatian due to the presence of older texts, primarily those written in the Yugoslav period.

  • Regionally restricted biaspectual borrowings with the -isa- suffix (e.g. kurtalisati se ‘to get rid of’, uvarisati ‘to score’). Unlike BVs from the previous group, this closed class of regionally restricted BVs does not have -ira- counterparts in standard Croatian.

  • Biaspectual borrowings with various suffixes, many of which are regionally restricted or used exclusively in the colloquial register (e.g. apšisati ‘to fade/to lose color’, barketati ‘to prank’, duplati ‘to double’, hasniti ‘to be useful/to earn’, linčovati ‘to lynch’, lobati ‘to lob’). Although also an open class, this group of BVs is considerably smaller than the second group.

Since RQ1 “Do base BVs of Slavic origin and biaspectual borrowings behave differently with respect to prefixation?” was formulated with reference to Mučnik’s (1966: 65–66) and Avilova’s (1967: 85; 1968: 69–71) assumptions that the morphological structure of BVs influences their prefixation, only verbs from the first two groups were included in this study. BVs from the third and fourth groups were excluded because of their generally low frequency in corpora of contemporary Croatian and because it would be difficult to find enough representative items, i.e. base BVs, attested in all the corpora used in this study. Biaspectual borrowings from the last group were excluded because, like BVs of Slavic origin, they are formed with various suffixes, and because many of them are regionally restricted and many are used mainly in the colloquial register.

This study does not aspire to compile a genuine, novel list of all BVs in Croatian with the help of corpus linguistic methods. This would require a separate study. Instead, it is limited to analyzing samples of verbs that have been recognized as biaspectual by previous scholars.Footnote 17 Before any further data collection, two subsamples of BVs were drawn from the list in Smailagić (2011). The first group contained 37 BVs of Slavic origin. The second group contained 200 biaspectual borrowings. Only lemmas labelled as biaspectual in the two largest dictionaries of contemporary Croatian, Jojić et al. (2015) and Matasović & Jojić (2002), were included in the subsamples.Footnote 18

First, BVs of Slavic origin were taken from the list of BVs in Smailagić (2011) to form the first subsample. Next, as stated above, the biaspectuality of these BVs was additionally checked in the two largest dictionaries of contemporary Croatian. This step was employed to eliminate BVs with debatable biaspectual status from the sample. The check led to the elimination of a total of 20 BVs from the list.Footnote 19 However, during the extraction process it turned out that Smailagić’s (2011) list lacks some BVs of Slavic origin, such as daniti ‘to spend the day/to dawn’, noćiti ‘to spend the night’, poštovati ‘to respect’, zavjetovati se ‘to make a vow’. Therefore, the subsample was supplemented with BVs of Slavic origin found in grammar books and other relevant works.Footnote 20 The problem of representativeness was not a particular consideration in the compilation of this subsample. It comprises 37 BVs of Slavic origin whose biaspectual status was consistently recognized in the dictionaries and grammar books reviewed, see Table 5. Although this might give the impression that the subsample is almost identical to the entire population of BVs of this type,Footnote 21 it is still possible that some Croatian BVs of this type have been overlooked. No list of BVs, including those of Slavic origin, can ever be complete or entirely uncontroversial.

The second subsample (200 biaspectual borrowings, see Table 6)Footnote 22 was formed as a stratified random sample,Footnote 23 as follows. Biaspectual -ira- verbs were selected from the list in Smailagić (2011) in proportion to the number of BVs listed under each letter of the alphabet. Before potential candidates were included in the subsample, their biaspectuality was additionally checked in the two dictionaries of contemporary Croatian, as described above. Further checks ensured that there were enough data for all independent variables of interest.Footnote 24 The data gathered for this study may nevertheless contain some error. However, the large number of BVs and the stratification of the sample should minimize any resulting distortion.
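Proportional stratified sampling by initial letter can be sketched in Python as follows. The sketch makes two assumptions that the text does not specify: fractional quotas are rounded with the largest-remainder method so that the allocations sum exactly to the target size, and strata are defined by the first character only, so Croatian digraph letters such as lj, nj and dž are not separated. The toy population consists of -ira- verbs mentioned in this paper, not Smailagić’s (2011) list, and the dictionary checks described above are omitted.

```python
import math
import random
from collections import defaultdict

def proportional_allocation(strata_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split n draws across strata in proportion to their sizes.

    Fractional quotas are rounded with the largest-remainder method (an
    assumption of this sketch) so that the allocations sum exactly to n.
    """
    total = sum(strata_sizes.values())
    quotas = {k: n * size / total for k, size in strata_sizes.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    shortfall = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:shortfall]:   # largest remainders get the leftover draws
        alloc[k] += 1
    return alloc

def stratified_sample(verbs: list[str], n: int, seed: int = 0) -> list[str]:
    """Draw n verbs, stratified by initial letter (first character only)."""
    strata: dict[str, list[str]] = defaultdict(list)
    for verb in verbs:
        strata[verb[0]].append(verb)
    alloc = proportional_allocation({k: len(v) for k, v in strata.items()}, n)
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    sample: list[str] = []
    for letter in sorted(strata):
        sample.extend(rng.sample(strata[letter], alloc[letter]))
    return sample

# Toy population: -ira- verbs mentioned in this paper, not Smailagić's (2011) list.
verbs = ["akceptirati", "alarmirati", "analizirati", "deportirati",
         "ekranizirati", "ekshumirati", "garantirati", "instalirati",
         "karakterizirati", "markirati", "mumificirati", "organizirati",
         "pasterizirati", "sistematizirati"]

print(proportional_allocation({"a": 50, "b": 30, "c": 20}, 10))  # {'a': 5, 'b': 3, 'c': 2}
print(stratified_sample(verbs, 7))
```

Largest-remainder rounding is only one option; any rounding rule that preserves the target size and approximately preserves the letter proportions would serve the same purpose.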

4.3 Corpora of contemporary Croatian and other data sources

In the second step, to answer RQ5 “Are prefixed derivatives of base BVs equally present in different corpora of the Croatian language, i.e. corpora reflecting standard and colloquial language use?” empirically, data on the prefixation (dependent variable) of the 237 biaspectual items mentioned in the previous subsection were extracted from the three corpora and one subcorpus of contemporary Croatian that form the levels of the independent variable corpus in the study design. Moreover, to answer RQ3 “Are base BVs with attested suffixed derivatives more prone to prefixation than BVs without such derivatives?”, data on the suffixation of the 237 BVs in the sample were also extracted from these corpora. Table 2 below gives basic information about the corpora used in this study.

Table 2 Croatian corpora used in the study as data sources

Two publicly available and electronically stored corpora of standard Croatian, the CNC (Tadić, 1998, 2002) and the Repository (Ćavar & Brozović Rončević, 2012; Brozović Rončević et al., 2018), were used as data sources in this study. Although both corpora represent language strongly influenced by normative prescription, each of them represents at best only part of standard Croatian. Although both corpora are known for their rigorous selection of texts, which cover written language from various functional domains and genres, they were compiled at different institutions by different experts with (partly) different views of what standard Croatian is and what it should be like. For example, in contrast to the CNC, the Repository also contains literature translated by outstanding Croatian translators. Moreover, whereas the CNC features texts from the 1990s onwards, the Repository contains texts dating from the second half of the 19th century to the beginning of the 21st century.Footnote 25 In addition to the two corpora of standard language, the hrWaC and its subcorpus Forum (Ljubešić & Klubička, 2014) were used for data collection. While the Forum subcorpus is composed exclusively of user-generated, non-edited content (without external proofreading), the hrWaC contains both standard Croatian (proofread language material) and colloquial Croatian (i.e. non-proofread texts).Footnote 26,Footnote 27 All the corpora used have been automatically lemmatized and morphosyntactically annotated.Footnote 28 They are also available via the NoSketchEngine interface, which allowed relatively fast data collection.
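The text does not describe the queries used in NoSketchEngine, so the following Python sketch only illustrates one way in which Pf could be coded per corpus once each corpus has been reduced to a set of attested lemmas. The prefix list is an illustrative, incomplete subset of Croatian verbal prefixes rather than the inventory used in the study, and the toy corpora are invented.

```python
# Hypothetical sketch: code Pf per corpus from sets of attested lemmas.
# The prefix list is an illustrative subset, not the study's inventory.
PREFIXES = ("do", "iz", "na", "o", "od", "po", "pre", "pri", "pro", "raz", "s", "u", "za")

def prefixed_candidates(base: str, lemmas: set[str]) -> list[str]:
    """Return attested lemmas of the form prefix + base (candidates only)."""
    return sorted(p + base for p in PREFIXES if p + base in lemmas)

def code_pf(base: str, corpora: dict[str, set[str]]) -> dict[str, int]:
    """Pf per corpus: 1 if at least one candidate is attested there, else 0."""
    return {name: int(bool(prefixed_candidates(base, lemmas)))
            for name, lemmas in corpora.items()}

toy_corpora = {
    "CNC": {"analizirati"},
    "Repository": {"analizirati"},
    "hrWaC": {"analizirati", "proanalizirati"},
    "Forum": {"proanalizirati", "zavidjeti"},
}

print(code_pf("analizirati", toy_corpora))
print(prefixed_candidates("vidjeti", toy_corpora["Forum"]))
```

String matching yields candidates, not verified derivatives: zavidjeti ‘to envy’ matches za- + vidjeti but is a distinct lexeme rather than an aspectual derivative of vidjeti ‘to see’, so such hits would still require manual inspection.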

As already mentioned, not all data could be extracted from the corpora. The first obstacle was the limited annotation of the corpora. To fill the resulting gaps in the data, Matasović & Jojić (2002) and Jojić et al. (2015)Footnote 29 were used as additional data sources. These dictionaries were used to double-check the aspectual value of verbs from both samples, the origin of the base BVs (RQ1 “Do base BVs of Slavic origin and biaspectual borrowings behave differently with respect to prefixation?”),Footnote 30 the existence of synchronic and/or diachronic prefixes within the base BV (RQ2 “Are base BVs with a synchronically and/or diachronically distinguishable prefix less prone to prefixation than BVs that do not begin with such a prefix?”),Footnote 31 and the number of meanings ascribed to the base biaspectual lemmas (RQ4 “Does the number of meanings of a base biaspectual lemma influence its prefixation?”).

5 Results and discussion

The independent variables outlined in Sect. 4.1 should help to shed light on the factors that enable or block prefixation of BVs in Croatian. To test whether the prefixation of the 237 analyzed Croatian BVs is influenced by the origin of the base biaspectual lemma, the presence of a synchronic and/or diachronic prefix within the base BV, the existence of a suffixed derivative of the base BV, the number of meanings of the base biaspectual lemma, and corpus, a generalized linear mixed-effects regression model was designed.Footnote 32 The analysis was carried out in R (R Core Team, 2020) using the lme4 package (Bates et al., 2015). In R formula notation, the model is specified as follows:

$$\mathrm{Pf} \sim \mathrm{Type} + \mathrm{OrigPf} + \mathrm{Suf} + \mathrm{Corpus} + \mathrm{Sem} + (1|\mathrm{Simplex}) $$
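Written out as a model equation, the formula can be sketched as follows. This assumes that Pf is a binary response (a prefixed derivative attested or not) and that the model was fitted with lme4's glmer() using the binomial family and its default logit link; neither the family nor the coding of Pf is stated in this section. Here j indexes the base biaspectual lemmas (Simplex) and i the observations for lemma j:

$$\operatorname{logit}\Pr(\mathrm{Pf}_{ij}=1) = \beta_0 + \beta_{\mathrm{Type}}\,\mathrm{Type}_{ij} + \beta_{\mathrm{OrigPf}}\,\mathrm{OrigPf}_{ij} + \beta_{\mathrm{Suf}}\,\mathrm{Suf}_{ij} + \sum_{k=2}^{4}\beta_{\mathrm{Corpus},k}\,\mathrm{Corpus}_{ijk} + \beta_{\mathrm{Sem}}\,\mathrm{Sem}_{ij} + u_j, \qquad u_j \sim \mathcal{N}\!\left(0,\ \sigma^2_{\mathrm{Simplex}}\right)$$

Under treatment (dummy) coding, R's default for factors, the four-level factor Corpus contributes three coefficients (k = 2, 3, 4, each relative to a reference corpus), while the two-level factors Type, OrigPf and Suf contribute one coefficient each. Whether Sem enters as a numeric predictor or as a factor is likewise an assumption. The random intercept u_j allows each base lemma its own baseline propensity for prefixation.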

Table 3 presents the results of the statistical analysis.Footnote 33

Table 3 Factors contributing to prefixation of BVs in Croatian

The results in Table 3 reveal that all five independent variables have a statistically significant influence on prefixation of BVs in Croatian. The most significant factors were the presence of a diachronic and/or synchronic prefix within a base BV (OrigPf) and the corpus (Corpus), followed by the number of meanings of a base biaspectual lemma (Sem). The two factors that were the least statistically significant for prefixation of BVs in Croatian were the origin of the base biaspectual lemma (Type) and the existence of a suffixed derivative of the base BV (Suf).

Table 4 presents the results of a post hoc test, which shows how exactly the individual levels of the independent variables influence prefixation.Footnote 34

Table 4 Results of post hoc test (factor levels contributing to prefixation of BVs in Croatian)

These results are also visualized in Fig. 1 to facilitate the discussion that follows.

Fig. 1 Contribution of individual factor levels to prefixation of BVs in Croatian
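Assuming a logit-link model, comparisons between factor levels such as those in Table 4 are differences of log-odds, which are usually read as odds ratios. The following is a minimal Python sketch of that reading, not the post hoc procedure actually used (see Footnote 34). The function names are hypothetical, and the per-corpus log-odds are invented placeholders that only mirror the direction reported for Corpus in this section. They are not estimates from Table 4. The multiple-comparison adjustment that a post hoc test normally applies is omitted.

```python
from itertools import combinations
from math import exp


def pairwise_odds_ratios(level_log_odds):
    """Return {(a, b): odds ratio of level a vs. level b} for every pair of levels.

    A contrast on the logit scale is a difference of log-odds; exponentiating
    it yields an odds ratio. A ratio above 1 means that level a is associated
    with higher odds of prefixation than level b.
    """
    return {
        (a, b): exp(level_log_odds[a] - level_log_odds[b])
        for a, b in combinations(level_log_odds, 2)
    }


def inverse_logit(eta):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + exp(-eta))


# HYPOTHETICAL log-odds of prefixation per corpus (placeholders, NOT Table 4).
# In a mixed model, probabilities derived from them refer to a "typical"
# lemma, i.e. with the random intercept u_j set to zero.
corpus_log_odds = {"hrWaC": 0.4, "Forum": 0.2, "Repository": -1.0, "CNC": -1.2}

for (a, b), ratio in pairwise_odds_ratios(corpus_log_odds).items():
    print(f"{a} vs {b}: odds ratio = {ratio:.2f}")
for level, eta in corpus_log_odds.items():
    print(f"{level}: predicted probability = {inverse_logit(eta):.2f}")
```

With four corpora there are six pairwise comparisons. An odds ratio close to 1 would indicate no difference between two levels.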

Some Russian aspectologists (e.g. Mučnik, 1966: 65–66; Avilova, 1967: 85; Avilova, 1968: 69–71) noticed that biaspectual borrowings with certain morphological properties, such as the suffix -ova- (e.g. rekomendovat’ ‘to recommend’, patentovat’ ‘to protect by patent’), are more prone to prefixation. Following this line of thought, this study compared the prefixation rates of biaspectual borrowings (with the suffix -ira-) and of BVs of Slavic origin (with various suffixes). As may be seen in Tables 3–4 and in Fig. 1, the origin of the base biaspectual lemma has a statistically significant effect on prefixation. BVs of Slavic origin (e.g. ručati ‘to have lunch’, savjetovati ‘to counsel’, večerati ‘to dine’, vezati ‘to bind’) are more likely to be prefixed than biaspectual borrowings (e.g. akceptirati ‘to accept’, alarmirati ‘to alarm’, deportirati ‘to deport’, ekranizirati ‘to film’, ekshumirati ‘to exhume’, mumificirati ‘to mummify’, pasterizirati ‘to pasteurize’, sistematizirati ‘to systematize’). In other words, the results obtained with the generalized linear mixed model suggest that the null-hypothesis H0.1 “Base BVs of Slavic origin and biaspectual borrowings do not differ significantly with respect to prefixation” should be rejected. Instead, for the time being an alternative hypothesis should be accepted: Croatian BVs of Slavic origin and biaspectual borrowings differ significantly with respect to prefixation.

The Russian scholar Piperski (2018: 117–118) raised the question of prefixation of BVs with a synchronically visible prefix (e.g. ispol’zovat’ ‘to use’). He stated that such BVs are very unlikely to be prefixed, but offered no empirical data to support this claim. Building on his ideas, this study applied the generalized linear mixed model to test whether Croatian BVs with and without a synchronically and/or diachronically visible prefix (e.g. doručkovati ‘to have breakfast’, objedovati ‘to have lunch’, savjetovati ‘to counsel’, dezinficirati ‘to disinfect’, reproducirati ‘to reproduce’ vs. ručati ‘to have lunch’, večerati ‘to dine’, grupirati ‘to group’, kastrirati ‘to castrate’, marinirati ‘to marinate’) differ significantly with respect to prefixation. The results of the statistical analysis unambiguously indicate that the null-hypothesis H0.2 “Base BVs with a synchronically and/or diachronically distinguishable prefix and base BVs without such a prefix do not differ significantly with respect to prefixation” should be rejected (see Table 3). Therefore, for the time being the following alternative hypothesis will be accepted: BVs that have a distinguishable synchronic and/or diachronic prefix and BVs that do not have such a prefix differ significantly with respect to prefixation. As the results presented in Tables 3–4 and in Fig. 1 reveal, having a synchronic and/or diachronic prefix has a negative impact on BV prefixation (which is consistent with Babić, 1978: 74; Babić, 2002: 537; Piperski, 2018: 117–118).

Theoretical aspectological literature points out that some BVs are less stable than others. It has also been claimed that some BVs have not only prefixed but also suffixed derivatives. However, the two processes have never been explicitly linked in an empirical study. Therefore, this study tested whether Croatian BVs for which suffixed derivatives were attested in the corpora of Croatian (e.g. ručati ‘to have lunch’, večerati ‘to dine’, vezati ‘to bind’, instalirati ‘to install’, organizirati ‘to organize’, parkirati ‘to park’) are more susceptible to prefixation. The results presented in Tables 3–4 and in Fig. 1 strongly suggest that they are: BVs for which suffixed derivatives were attested are more prone to prefixation. In other words, the generalized linear mixed model indicates that the null-hypothesis H0.3 “Base BVs with suffixed derivatives attested in corpora of the Croatian language and base BVs without such derivatives do not differ significantly with respect to prefixation” should be rejected (see Table 3). Instead, for the time being an alternative hypothesis will be accepted: there are significant differences in the prefixation of BVs for which suffixed derivatives were attested in the corpora of Croatian and of BVs for which no such derivatives were attested.

The Russian aspectologist Avilova (1968: 67–68) put forward the hypothesis that prefixation of BVs in Russian is influenced by the polysemy of the base BV. She argued that BVs with fewer meanings (e.g. arendovat’ ‘to rent’, atakovat’ ‘to attack’) should be more prone to prefixation. The results obtained with the generalized linear mixed model and presented in Table 3 demonstrate that the null-hypothesis H0.4 “Base BVs with different numbers of meanings do not differ significantly with respect to prefixation” should be rejected. Moreover, the empirical results for Croatian BVs suggest quite the opposite of what Avilova (1968: 67–68) assumed for Russian BVs: prefixation of BVs with more meanings (e.g. častiti ‘to invite/to respect’, vezati ‘to bind’, cementirati ‘to cement’, generirati ‘to generate’, maskirati ‘to mask’) is more prevalent than prefixation of BVs with fewer meanings (e.g. čestitati ‘to congratulate’, opetovati ‘to do something repeatedly’, asfaltirati ‘to asphalt’, lektorirati ‘to proofread’, sistematizirati ‘to systematize’). For the time being, the following alternative hypothesis will therefore be accepted: base BVs with different numbers of meanings do differ significantly with respect to prefixation. As already discussed in Sect. 4.1, the independent variable of the number of meanings of the base biaspectual lemma was difficult to operationalize in a completely reliable way. Nevertheless, the results revealed a very interesting fact: more polysemous BVs seem to be more prone to prefixation. One plausible explanation is that prefixation serves as a disambiguation technique. For instance, the BV vezati ‘to bind’ has 11 meanings (cf. Matasović & Jojić, 2002: 1420; Jojić et al., 2015: 1675) and is attested with 24 different prefixes or prefix combinations. Some derivatives (e.g. odvezati ‘to untie’, podvezati ‘to tie up/to lift’, razvezati ‘to untie’) are clearly lexical (specialized perfectives in the terms of Janda, 2007a: 609). Others seem to form aspectual pairs with the base verb for certain meanings (natural perfectives in the terms of Janda, 2007a: 609). Several examples of the latter follow. In its meaning ‘to impose a legal or contractual obligation on’, the biaspectual lemma vezati has the PFV derivative obvezati as its natural perfective. For the meaning ‘to fix together and enclose (the pages of a book) in a cover’, the natural perfective of the same lemma is the PFV derivative uvezati. The PFV derivatives zavezati and svezati serve as natural perfectives for the meanings ‘to wrap something tightly’, ‘to restrain someone or something by tying’ and ‘to fasten with a knot’. The PFV derivative povezati (se) is the natural perfective for the meaning ‘to establish a relationship or link with someone based on shared feelings, interests, or experiences’.
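To make the disambiguation argument concrete, the correspondences between individual meanings of vezati and their natural perfectives can be encoded as a lookup table. This is an illustrative Python sketch only. It covers just the six meanings discussed here, not all 11 dictionary meanings, and the table layout and the function name meanings_selected_by are hypothetical.

```python
# Natural perfectives of the biaspectual lemma vezati, keyed by meaning.
# Only the six meanings discussed in the text are included (of 11 in total).
NATURAL_PERFECTIVES = {
    "to impose a legal or contractual obligation on": ("obvezati",),
    "to fix together and enclose (the pages of a book) in a cover": ("uvezati",),
    "to wrap something tightly": ("zavezati", "svezati"),
    "to restrain someone or something by tying": ("zavezati", "svezati"),
    "to fasten with a knot": ("zavezati", "svezati"),
    "to establish a relationship or link with someone based on shared "
    "feelings, interests, or experiences": ("povezati (se)",),
}

# Lexical (specialized) perfectives: they add lexical meaning instead of
# pairing with one particular meaning of vezati.
SPECIALIZED_PERFECTIVES = {
    "odvezati": "to untie",
    "podvezati": "to tie up/to lift",
    "razvezati": "to untie",
}


def meanings_selected_by(derivative):
    """Return the meanings of vezati for which `derivative` is a natural perfective."""
    return [m for m, pfvs in NATURAL_PERFECTIVES.items() if derivative in pfvs]


print(meanings_selected_by("uvezati"))
print(len(meanings_selected_by("zavezati")))  # 3
```

Querying the table in reverse shows how each prefixed form picks out only a subset of the lemma's meanings. This is the sense in which prefixation may disambiguate a polysemous BV.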

Some scholars link a fair number of sociolinguistic factors to the usage of BVs and their derivatives (e.g. some derivatives are labelled colloquial; prefixed derivatives are more acceptable to younger speakers, etc.). In this respect, this study assumed that the different corpora of Croatian could reflect the importance of some of the (socio)linguistic factors mentioned in the literature. I conjectured that corpora compiled from texts written in standard Croatian would contain fewer prefixed derivatives of BVs than corpora that contain colloquial texts and texts that have not been proofread and corrected to meet the norm. As the results obtained with the generalized linear mixed model, presented in Tables 3–4 and in Fig. 1, clearly demonstrate, the null-hypothesis H0.5 “The same base BVs do not differ significantly with respect to prefixation when data from different corpora of the Croatian language are compared” should be rejected. Instead, an alternative hypothesis will be accepted for the time being: the corpora of Croatian (i.e. the texts from which they are compiled) do influence the prefixation of BVs. In other words, prefixation of BVs is more frequent in corpora that contain colloquial and unproofread texts than in corpora compiled from texts written in the standard Croatian variety. For instance, BVs such as specijalizirati ‘to specialize’, rezervirati ‘to make a reservation’, šokirati ‘to shock’, reproducirati ‘to reproduce’, negirati ‘to deny’ and operirati ‘to operate’ have perfective derivatives in the hrWaC and Forum, i.e. corpora that contain colloquial and unproofread texts. By contrast, their PFV derivatives are not attested in the Croatian Language Repository or the Croatian National Corpus, i.e. corpora compiled from texts written in the standard Croatian variety. This can be clearly observed in Fig. 1 by comparing the prefixation rates in the hrWaC corpus and its subcorpus Forum on the one hand with those in the Croatian Language Repository and the Croatian National Corpus on the other.
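The corpus contrast can be illustrated with the six BVs just cited. The Python sketch below encodes only what is stated for these lemmas: a PFV derivative is attested in the hrWaC and Forum, and none is attested in the Repository or the CNC. It is not the study's data set, and all variable and function names are hypothetical. The long format, with one observation per lemma and corpus, is presumably the kind of layout that a by-lemma random intercept such as (1|Simplex) requires, although the exact coding of Pf in the study is an assumption here.

```python
CORPORA = ("hrWaC", "Forum", "Repository", "CNC")
# Corpora that include colloquial and/or unproofread texts.
COLLOQUIAL = {"hrWaC", "Forum"}

LEMMAS = (
    "specijalizirati", "rezervirati", "šokirati",
    "reproducirati", "negirati", "operirati",
)

# One observation per (lemma, corpus) pair: 1 if a PFV prefixed derivative
# is attested in that corpus, 0 otherwise. For these six lemmas the text
# reports attestations in the hrWaC and Forum only.
attested = {
    (lemma, corpus): int(corpus in COLLOQUIAL)
    for lemma in LEMMAS
    for corpus in CORPORA
}


def prefixation_rate(data, corpus):
    """Share of lemmas with an attested PFV prefixed derivative in `corpus`."""
    values = [v for (_, c), v in data.items() if c == corpus]
    return sum(values) / len(values)


for corpus in CORPORA:
    print(f"{corpus}: {prefixation_rate(attested, corpus):.2f}")
```

For this subset the raw rates are 1.00 in the hrWaC and Forum and 0.00 in the Repository and the CNC. The model, by contrast, estimates the corpus effect across all 237 BVs while controlling for the other factors.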

6 Conclusions and further perspectives

This study addressed five research questions concerning prefixation of BVs in Croatian. In terms of the methodology applied, it is the first survey of its kind on BVs, not only in Croatian but in Slavic aspectology in general. In total, five factors affecting the prefixation of BVs in Croatian were identified.

As this paper demonstrates, prefixation of BVs is far from random. The empirical study of BVs on the morphological level has revealed clear regularities in the process of prefixation, which is influenced by a range of factors. Some of them can be attributed to the lexical level, such as the number of meanings of a biaspectual lemma (RQ4 “Does the number of meanings of a base biaspectual lemma influence its prefixation?”). Others are related to the morphological level, such as the presence of a synchronic and/or diachronic prefix within a base BV (RQ2 “Are base BVs with a synchronically and/or diachronically distinguishable prefix less prone to prefixation than BVs that do not begin with such a prefix?”) and the existence of a suffixed derivative of a base BV (RQ3 “Are base BVs with attested suffixed derivatives more prone to prefixation than BVs without such derivatives?”). In this study, the impact of the origin of the base BV on its prefixation was also linked to its morphological structure: the studied sample of 237 Croatian BVs contained 37 BVs of Slavic origin formed with various suffixes and 200 biaspectual borrowings with the suffix -ira- (RQ1 “Do base BVs of Slavic origin and biaspectual borrowings behave differently with respect to prefixation?”). Moreover, as the different rates of prefixed derivatives in the four examined corpora of Croatian indicate, prefixation of BVs in Croatian is affected by sociolinguistic factors as well (RQ5 “Are prefixed derivatives of base BVs equally present in different corpora of the Croatian language, i.e. corpora reflecting standard and colloquial language use?”).

The post hoc test helped to detect how exactly the individual levels of each independent variable influence prefixation of BVs in Croatian. We now know that BVs of Slavic origin are more likely to be prefixed than biaspectual borrowings (RQ1). Further, a synchronic and/or diachronic prefix within a base BV has a negative impact on the prefixation of such verbs (RQ2). In contrast, BVs from which suffixed derivatives have been formed are more likely to be prefixed (RQ3). As for RQ4, the model indicates that more polysemous BVs seem to be more prone to prefixation. Lastly, the post hoc test revealed that prefixation of BVs is more frequent in corpora with colloquial texts (RQ5).

Finally, it should be noted that the same factors, or some of them, could also be relevant for the prefixation of imperfective verbs in Croatian, but this has yet to be demonstrated empirically. It would be worthwhile to examine whether these factors affect the prefixation of BVs and of imperfective verbs differently.