1 Introduction

In the 1960s and 1970s, there was significant interest in studying the urbanization of Indigenous populations (Cooke & Bélanger, 2006). The movements or migrations of Indigenous people toward cities were a key part of the urbanization process. These movements toward cities were perceived to be motivated primarily by a desire to escape isolation and the difficult economic conditions of Indian reserves, and thus were seen to be an indication of various needs not being fulfilled in or outside of Indigenous communities (Falconer, 1985; Trovato et al., 1994).

Subsequent research has provided a more nuanced portrait of the phenomenon. Demographic studies of the population living on Indian reserves show that this population is growing, and that there are more people entering Indian reserves than leaving them (Amorevieta-Gentil et al., 2015). Between 2006 and 2016, the population living on reserve increased by 42,500 people.Footnote 1 The main factor for this growth is the high fertility rate of the Indigenous population, which has remained above the generation replacement level (Morency et al., 2018). However, internal migration is also considered to be a contributor to the population growth on Indian reserves, particularly for Registered Indians (Clatworthy & Norris, 2007, 2014; Cooke & Penney, 2019). Estimates based on the retrospective census information have shown steady, small positive net migration for Indian reserves since the 1970s (Amorevieta-Gentil et al., 2015; Clatworthy & Norris, 2007, 2014; Cooke & Penney, 2019; Norris et al., 2004; Siggner, 1977). In the five quinquennial periods beginning with 1986 to 1991 until 2006 to 2011, the population of Registered Indians on Indian reserves increased by between 9000 and 15,000 people through internal migration (Clatworthy & Norris, 2007, 2014; Cooke & Penney, 2019; Norris et al., 2004). Clatworthy & Norris (2014) found that the people who migrated to Indian reserves came primarily from rural areas, but also from urban areas. Some studies suggest that a substantial portion of the migration into and out of Indian reserves is circular in nature,Footnote 2 i.e., that the migration is part of repetitive movements between two or more regions (Cooke, 1999). Furthermore, according to prior research, migration among Indigenous populations is influenced by several factors and triggered by multiple motivations (Cooke, 1999; Trovato et al., 1994).

Estimates of migration flows between Indian reserves and off-reserve areas in the existing literature have been derived using the retrospective information collected in the long-form census on the place of residence 5 years prior.Footnote 3 This is not only because the census was the only source of information available, but also because its large sample size makes it possible to measure a relatively rare event among relatively small populations. However, the recent availability of linked data from consecutive censuses provides new possibilities to study migration by comparing the place of residence of individuals enumerated in two consecutive censuses. Moreover, data linkage provides characteristics and place of residence (including Indian reserves) of individuals at two points in time, making it possible to examine both the characteristics associated with migration and the changes induced by migration.
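The comparison of places of residence at two census dates can be sketched as follows. This is a minimal illustration only: the record layout, the on-reserve flags, and the function name `migration_status` are assumptions made for the example and do not describe the actual linked files. The sketch also ignores moves between two different reserves and moves that are reversed within the period.

```python
from collections import Counter

def migration_status(on_reserve_start: bool, on_reserve_end: bool) -> str:
    """Classify a linked person by comparing on-reserve status at two census dates."""
    if on_reserve_start and not on_reserve_end:
        return "out-migrant"
    if not on_reserve_start and on_reserve_end:
        return "in-migrant"
    return "stayer on reserve" if on_reserve_start else "stayer off reserve"

# Hypothetical linked records:
# (person id, on reserve at start census, on reserve at end census)
linked = [
    ("p1", True, True),
    ("p2", True, False),
    ("p3", False, True),
    ("p4", False, False),
    ("p5", True, False),
]

counts = Counter(migration_status(start, end) for _, start, end in linked)

# Net migration for reserves: people who arrived minus people who left.
net_migration = counts["in-migrant"] - counts["out-migrant"]
print(dict(counts), net_migration)
```

Unlike the retrospective approach, nothing in this comparison depends on what a respondent recalls about a residence 5 years earlier; both places of residence are observed at the time of enumeration.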

The main goal of this paper is to study the potential uses of census data linkage in analyzing the migration flows into and out of Indian reserves, and to compare these results with the traditional retrospective approach based on a single census (long-form questionnaire).

This paper has five main sections. In the following section, we provide a description of the benefits and limitations of using census data to analyze geographic mobility (particularly at low levels of geography), and the potential of using data linkage to fill some significant data gaps. Next, the data sources and methods used to link two consecutive censuses and calculate the weights to compensate for non-linked records are explained. The strengths and limitations of using this new data source for migration analysis are then discussed, followed by an overview of the various census data linkages that were used in this paper to calculate alternative estimates of migration flows between Indian reserves and off-reserve areas. These estimates are then compared with those obtained from a single census (long-form questionnaire). The new estimates obtained from data linkage indicate that Indian reserves lost more people than they gained through internal migration between 2006 and 2016, contrary to what each individual census shows. Still, despite the striking change of sign in net migration, Indian reserves have continued growing in the period due to birth rates, and they have not experienced anything close to an exodus. The fifth section gives an overview of how data linkage was used to compare the place of residence information in the 2016 Census long-form questionnaire with that from the 2011 Census long-form questionnaire (National Household Survey [NHS]). According to the results of this analysis, various factors related to the way the information in the census long-form questionnaire is reported and processedFootnote 4 may affect the accuracy of the information on migrant status and place of residence five years prior to the census.

2 Using the Census to Analyze Geographic Mobility

Because of the richness of its content, its large sample size, and its fine geographic detail, the census long-form questionnaire is a preferred source for analyzing geographic mobility. However, the census also has certain limitations when it comes to analyzing migration into and out of Indian reserves, and these limitations deserve attention. They are described here, and a distinction is made between those that are intrinsic to the census as a data collection tool, and those that are inherent to the use of retrospective information, such as the possibility of recall and response errors, proxy responses, and issues with geocoding an individual’s reported place of residence 5 years prior to the census. While the first limitation affects the quality of the migration estimates regardless of the method used (the retrospective approach or the comparison of linked censuses), the second limitation can be avoided by using census data linkage.Footnote 5

2.1 Limitations Inherent to the Census

Estimates of migration flows into and out of Indian reserves can be affected by how the census reaches its target population. Some individuals may not be enumerated (undercoverage), while others may be counted more than once (overcoverage). Compared with the rest of the population, highly mobile individuals and young adults are more likely to be missed. This could be an issue for the Indigenous population in particular, as it is generally younger and more mobile than the Canadian population as a whole (Clatworthy & Norris, 2014; Dion & Coulombe, 2008). The combined effect of undercoverage and overcoverage is referred to as census net undercoverage (since undercoverage tends to be greater). Net undercoverage is positive in most regions of Canada and is generally higher on than off Indian reserves (Clatworthy & Norris, 2007, 2014; Norris & Clatworthy, 2011).

Some individuals living on Indian reserves may also not be enumerated because census data were either not collected on their reserve or were incomplete. For example, there were 14 reserves (referred to as incompletely enumerated reserves) in the 2016 Census and 31 in the 2011 CensusFootnote 6 for which data were not available (Statistics Canada, 2017b). Any individual who lived in an incompletely enumerated reserve in the start year, the end year, or both years of a linkage cannot be linked, and is therefore excluded from the analysis. In a linkage of the 2011 and 2016 censuses, this represents about 40,000 individuals from 35 reserves (Statistics Canada, 2020), or 10% of the total population living on Indian reserves.

Other types of non-sampling errors that can affect the quality of estimates from census data are misunderstanding of the questions, incorrect data capture, or non-response (Statistics Canada, 2018). Non-response, in particular, is hard to measure for specific populations, but could be relatively more prevalent among Indigenous populations for reasons such as apprehensions about government data collection, use of the data for government control, or barriers related to language or literacy and numeracy issues (Wright et al., 2020). A higher level of non-response increases the risk of non-response bias, which occurs when non-respondents differ from respondents (Statistics Canada, 2018).

The quality of migration flow estimates also depends on whether enumerated individuals are registered in the right place. For example, individuals enumerated at the wrong residence during the census may mistakenly be considered migrants. While the census data are generally thought to be of good quality in this regard, it is more challenging to accurately collect the usual place of residence for some populations, such as individuals living in more than one place (e.g., students living in an apartment during the school year and with their parents the rest of the time, or workers living in another area for work).

Lastly, the census long-form questionnaire is not distributed to individuals living in institutions. This could be non-negligible for Indigenous populations, as they tend to be overrepresented among the institutional population (Clatworthy & Norris, 2007, 2014; Norris & Clatworthy, 2011).

2.2 Limitations Inherent to the Use of Retrospective Information

Collecting information on individuals’ place of residence 1 or 5 years prior to the census date relies on their recollection of past events, which may be poor, especially after 5 years. Because the reference date is not linked to any specific event that could provide a reference point for respondents, it could be more challenging for them to remember this information. Memory problems can affect the quality of the migration flow size estimates (e.g., a migration could be forgotten) and of the migrants’ characteristics (e.g., a place of residence error or a forgotten postal code could affect the quality of the geocoding). Accurately remembering information about one’s place of residence 5 years prior to the census can be more challenging for highly mobile individuals (e.g., circular migrants).

The vagueness of the concept of municipality used in the census question on the place of residence 1 or 5 years earlier may also affect the quality of responses. Statistically speaking, municipality refers to the boundaries of the census subdivision (CSD), and an Indian reserve has a distinct CSD name and code.Footnote 7 However, there can be discrepancies between statistical rules and individuals’ perception of their reality. For instance, some respondents provide a name of municipality (CSD) of origin that refers to the closest census metropolitan area (CMA) or census agglomeration (CA) instead of the name of their Indian reserve.Footnote 8

The quality of responses can also be affected by the fact that household members’ responses are often given in an indirect interview (proxy responses).Footnote 9 It can be difficult for a person to know or remember the place of residence of each member of their household 5 years earlier, especially if they are not immediate family. This is more likely to be the case for larger households (which are more prevalent on Indian reserves). Furthermore, the effects of proxy responses are potentially greater on reserves, since data are collected through interviews with enumerators who ask the questions to the person or people in the household when they visit, compared with online questionnaires, for which the person most equipped to respond for all family members will generally do so.

Once collected, the information goes through an intensive data processing step. Geocoding, a process in which collected information is matched to a CSD within the Standard Geographical Classification,Footnote 10 is key for information on the place of residence 1 or 5 years prior to the census. This is a highly sophisticated process that continuously improves from census to census. However, there are circumstances that may affect its precision, for example, when the place of residence 1 or 5 years earlier is not a unique name, which is the case for a considerable number of Indian reserves. A number of reserves have names that are similar to those of neighboring off-reserve CSDs (e.g., Chilliwack 1 [on reserve] and Chilliwack [off reserve]), which can create confusion in the geocoding process, especially if both CSDs share the same postal code. The complex spelling of the names of certain Indian reserves, which makes them more prone to errors, can also lead to geocoding errors. Postal codes help improve the geographic coding process, but less so in rural areas, as one postal code can often cover more than one CSD. Postal code information is also often unreported, most likely because respondents have trouble remembering it. Geocoding-related aspects will be addressed in greater detail in Section 5 and Appendix 2.

Overall, at large geographic scales, these obstacles are likely minor and do not significantly bias the results. Migratory flows—as measured by census data—are generally in line with the measurements from other sources, such as income tax records. However, for lower levels of geography (e.g., CSD level), these obstacles could be more problematic. Unfortunately, a number of factors can make comparisons with other data sources harder, including sources that lack variables, different methods of operationalizing migration, differences in the target populations, and different reference periods. Census data linkage provides new opportunities for validation and analysis.

3 Description of Data Files and Method

Linking files is a complex task. The two most challenging steps are linking records, which involves linking as many records as possible while avoiding false positives (i.e., links between two records that belong to different individuals), and accurately weighting the file so that the linked individuals are representative of the whole population of interest (including individuals who could not be linked). Below is a brief description of the data files used in this study and the methods used for linkage and weighting.

3.1 Description of Linked Census Data Files

Every 5 years, Statistics Canada conducts the Census of Population, which is a key source of information on the Canadian population. All Canadians are asked to complete the census short-form questionnaire. A sample of the population is also asked to answer the census long-form questionnaire, which is more elaborate in terms of content. The two questionnaires are mandatory, except in 2011, when the census long-form questionnaire was replaced by the 2011 NHS, which—although it was similar in content to the previous (2006) and subsequent (2016) census long-form questionnaires—was voluntary. The sampling scheme for the long-form questionnaire has also varied over the years. The proportion of sampled households was 33% in the NHS and 25% in the 2016 Census. However, for all northern communities and Indian reserves, no sampling was used. In other words, all households were asked to complete the census long-form questionnaire.

Data linkage makes it possible to combine the data collected through these questionnaires from two consecutive censuses. Each combination has its advantages. For example, linkage between two consecutive census long-form questionnaires contains a large number of variables at two points in time, allowing for comprehensive multivariate analyses. Conversely, linkage between two consecutive census short-form questionnaires contains a much larger number of records. Finally, a linkage between a census short-form questionnaire and a census long-form questionnaire provides a mix of benefits: a relatively large number of records with a large number of variables (at one point in time). In general, the objectives of a researcher will dictate which type of linkage to use.

Several different linkages built from the 2006, 2011, and 2016 census short-form questionnaires (S2006, S2011, and S2016, respectively), and the 2011 NHS and the 2016 Census long-form questionnaires (L2011 and L2016, respectively) were used for this study.Footnote 11 The goal was to assess the consistency of the results and make sure that they are not artifacts of the linking methodology and the weighting process (discussed in the next sections). Likewise, the linkage of S2006/S2011 was made to assess consistency over time.Footnote 12

In all the files considered, the geographical limits used were those defined based on the 2016 Census. The five different linkagesFootnote 13 used in this study are:

  • L2011/L2016: linkage of the 2011 NHS and the 2016 Census long-form questionnaires (L2011 and L2016)

  • S2011/S2016: linkage of the 2011 and 2016 census short-form questionnaires (S2011 and S2016)

  • L2011/S2016: linkage of the 2011 Census long-form questionnaire (L2011) and the 2016 Census short-form questionnaire (S2016)

  • S2011/L2016: linkage of the 2011 Census short-form questionnaire (S2011) and the 2016 Census long-form questionnaire (L2016)

  • S2006/S2011: linkage of the 2006 and 2011 census short-form questionnaires (S2006 and S2011).

For clarity, retrospective information in a census file will be referred to with “_R” in the file name. For example, the place of residence in L2016_R is the place of residence 5 years earlier collected retrospectively in L2016.

3.2 Linkage Method

The linkage of individuals from two consecutive censuses was performed using a multistage method. This involved using personal information (given name, family name, sex, and birth date) and household information (geography [province or territory and census division], postal code, telephone number, and household composition). Not all of this information is used at each stage. For instance, some stages use the geographic information, while at other stages, geography is not considered at all in order to maximize the linkage of migrants. Only single links were accepted to avoid linking records that do not belong to the same individual.Footnote 14 Because of weak linkage rates for the population living in institutions and in collective dwellings (for linkages using the census short-form questionnaire), these populations were excluded from the linkage process. Therefore, this study covers only the population living in private households at both points in time.Footnote 15
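A multistage linkage that accepts only single links could be sketched along the following lines. This is a hedged illustration, not the procedure actually used: the exact match keys, their order, and the exact-match (deterministic) comparison are assumptions, and the field names are hypothetical. The sketch shows two ideas from the text: later stages drop geographic information so that migrants can still be linked, and any key that points to more than one candidate on either side is rejected.

```python
from collections import defaultdict

# Hypothetical match keys, from strictest to loosest. The last pass ignores
# geography (here, the postal code) so that people who moved can be linked.
PASSES = [
    ("given", "family", "sex", "birth", "postal"),
    ("given", "family", "sex", "birth"),
]

def index_by_key(records, fields):
    """Group record ids by the tuple of values found in `fields`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in fields)].append(rec["id"])
    return groups

def link(start, end):
    """Return {start_id: end_id}, keeping only one-to-one matches at each pass."""
    links = {}
    linked_end = set()
    for fields in PASSES:
        # Only records not linked at an earlier pass remain candidates.
        start_index = index_by_key(
            [r for r in start if r["id"] not in links], fields)
        end_index = index_by_key(
            [r for r in end if r["id"] not in linked_end], fields)
        for key, start_ids in start_index.items():
            end_ids = end_index.get(key, [])
            # Accept single links only; ambiguous keys are left unlinked.
            if len(start_ids) == 1 and len(end_ids) == 1:
                links[start_ids[0]] = end_ids[0]
                linked_end.add(end_ids[0])
    return links

def person(pid, given, family, sex, birth, postal):
    return {"id": pid, "given": given, "family": family,
            "sex": sex, "birth": birth, "postal": postal}

start_file = [
    person("a1", "ann", "lee", "F", "1980-01-01", "A1A"),
    person("a2", "bob", "roy", "M", "1975-05-05", "B2B"),
    person("a3", "sam", "fox", "M", "1990-02-02", "C3C"),
    person("a4", "sam", "fox", "M", "1990-02-02", "D4D"),
]
end_file = [
    person("b1", "ann", "lee", "F", "1980-01-01", "A1A"),  # did not move
    person("b2", "bob", "roy", "M", "1975-05-05", "Z9Z"),  # moved
    person("b3", "sam", "fox", "M", "1990-02-02", "E5E"),  # two candidates
]

links = link(start_file, end_file)
print(links)
```

In this toy example the non-mover is linked at the first pass, the mover at the second pass, and the record with two possible counterparts stays unlinked. This mirrors the trade-off described above between maximizing links and avoiding false positives.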

Linkage rates are lower for the population living on reserve than for the entire Canadian population. For example, using the population of the start-year file (2011) as the denominator, and observing the whole population intended to be linked, the linkage rates for the population living on reserve are 42% in S2011/S2016 and 46% in L2011/S2016, in comparison to 80% and 83%, respectively, for the population living off reserve (please see Appendix 1 for important considerations on the calculation and interpretation of linkage rates).

A number of reasons can explain the differences in linkage rates observed between subgroups of the population, but incorrect or missing information could be an important factor. First, the collection method using enumerators—the predominant method used in Indigenous communities—could introduce errors in the names collected because of misspellings. Moreover, the impact of this situation could be made worse by names with a complex spelling.Footnote 16 The information collected (e.g., date of birth) can also be less accurate when collected through indirect interviews. The use of enumerators on reserves can increase the chances for indirect interviews if the questions are asked only to the people present in the household at the time of their visit. Proxy responses may also be more frequent in large households, which is a situation that is more common on reserves.Footnote 17

The relatively weak linkage rates for the population living on reserve are a source of concern because biases could occur if unlinked individuals have characteristics different from those of linked individuals (selection bias). In the context of this study, there is no particular reason to believe that the quality of the matches is lower among migrants than among non-migrants because the control methods—i.e., methods used to assess the quality of each linked record—are the same in both cases. Moreover, the proportion of reserve in-migrants and out-migrants is similar among the linked and unlinked populations, which likely means that there is no bias for this specific characteristic. Lastly, linkage rates were similarly low for people living on reserve in 2011 and in 2016, and should be therefore similar for both reserve in-migrants and out-migrants. While these facts are reassuring, they do not imply a total absence of bias. The choice of an appropriate weighting strategy (described next) can minimize potential biases induced by the linkage process.

3.3 Weighting Method

Linked files are subsets of the source files they stem from.Footnote 18 For example, the S2011/S2016 file is a subset of the 2011 Census file because not everyone in the 2011 Census population was linked to the 2016 Census file. People who were enumerated in the 2011 Census but were not linked to the 2016 Census file can be thought of as non-respondents of a survey. Therefore, to infer results from the S2011/S2016 file for the entire 2011 Census population, a special set of weights must be computed to account for these non-respondents (i.e., non-linked records).

The weights must also take into account the fact that some people in the 2011 Census population could not be linked to the 2016 Census because they had either died or left the country before the 2016 Census.Footnote 19 These people can be considered “out-of-scope” cases in a longitudinal survey sample. As a result, a special adjustment to the weights is needed to take these cases into account. The weights for the S2011/S2016 file make it possible to make inferences for the population that was alive and living in the country in both 2011 and 2016.Footnote 20 This is also true for other linked files used in this study. Various methods were used to identify the key variables and interaction terms to be used to create homogeneous adjustment classes depending on whether the start-year file is a long-form census or a short-form census. Tests were conducted to ensure that the results were not overly sensitive to the choice of variables.

In the S2011/S2016 file, all individuals have an initial weight of 1, since the S2011 file covers the entire population and every record is self-representative. This weight of 1 must first be adjusted to account for individuals in the 2011 Census population that were not linked to S2016. This process is somewhat similar to the calibration of weights for non-response in a survey. This adjustment is done among groups of people with similar characteristics (called homogeneous groups), made by combining several variables relevant to this study and available in S2011, including geography (province or territory, census division, and an on-reserve or off-reserve indicator), age, sex, marital status, mother tongue (including Aboriginal and Inuit languages), language spoken at home (including Aboriginal and Inuit languages), household size, and type of census family. Within each homogeneous group, the weights of linked records with characteristics XYZ are inflated so that they also represent the non-linked records with the same characteristics XYZ. After this step, the sum of the weights for the linked records in the S2011/S2016 file is equal to the 2011 Census population size.Footnote 21
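The inflation of weights within homogeneous groups can be sketched as a simple ratio adjustment. The code below is an illustration under stated assumptions: the group labels, the record layout, and the function name `adjust_for_nonlinkage` are hypothetical, and the groups used in the study combine many more variables than the single on-reserve or off-reserve label used here.

```python
from collections import defaultdict

def adjust_for_nonlinkage(records):
    """Inflate the weights of linked records so each group keeps its full total.

    Each record is a dict with keys "group" (homogeneous group label),
    "weight" (initial weight) and "linked" (bool).
    Returns the adjusted weights of the linked records, in input order.
    """
    total = defaultdict(float)         # weighted size of each group
    linked_total = defaultdict(float)  # weighted size of its linked part
    for rec in records:
        total[rec["group"]] += rec["weight"]
        if rec["linked"]:
            linked_total[rec["group"]] += rec["weight"]

    adjusted = []
    for rec in records:
        if rec["linked"]:
            factor = total[rec["group"]] / linked_total[rec["group"]]
            adjusted.append(rec["weight"] * factor)
    return adjusted

# Hypothetical start-year file: 5 people on reserve (2 linked) and
# 5 people off reserve (4 linked), all with an initial weight of 1.
records = (
    [{"group": "on", "weight": 1.0, "linked": i < 2} for i in range(5)]
    + [{"group": "off", "weight": 1.0, "linked": i < 4} for i in range(5)]
)

adjusted_weights = adjust_for_nonlinkage(records)
print(adjusted_weights, sum(adjusted_weights))
```

Here each linked on-reserve record ends up representing 2.5 people and each linked off-reserve record 1.25 people, and the adjusted weights sum to the 10 people in the start-year file. A group with no linked record at all would lose its population in this sketch, which is one reason such groups would need to be merged with similar ones in practice.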

The next step was to adjust the weights of the linked records, which now represented the entire 2011 Census population, to account for the fact that not all of them could be linked to S2016 because of death or emigration, i.e., some people in the 2011 Census population were no longer alive or living in Canada in 2016. To make this adjustment, estimates of the number of people in the 2016 Census population who were alive and in the country in 2011 were needed (i.e., the population that could potentially be linked), then the weights of the linked records were calibrated to these new totals. Estimates of the population that could potentially be linked between 2011 and 2016 were obtained using information collected in L2016_R. With this information, anyone in the 2016 Census population who was not born or did not live in Canada 5 years prior to the census (i.e., people who could not be linked) was removed, resulting in a weighted estimate of the population for both 2011 and 2016.Footnote 22 These new totals were calibrated by province or territory of residence in 2016, place of residence on- and off-reserve, age, and sex. This made it possible for the final weights to be representative of the population that was alive and living in the country in both 2011 and 2016.Footnote 23 The same strategy was used to calculate the weights for the S2006/S2011 file.Footnote 24 The weights of the different linkage files are described in Table 1.
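The calibration to the population that could potentially be linked can be illustrated with a basic post-stratification step. This is a sketch under assumptions: the calibration method actually used may be more elaborate, the cells here are reduced to an on-reserve or off-reserve label, and the control totals are made-up numbers standing in for the estimates derived from L2016_R.

```python
def calibrate(weights, cells, control_totals):
    """Scale weights so that each cell's weighted sum equals its control total."""
    current = {}
    for weight, cell in zip(weights, cells):
        current[cell] = current.get(cell, 0.0) + weight
    return [
        weight * control_totals[cell] / current[cell]
        for weight, cell in zip(weights, cells)
    ]

# Hypothetical linked records after the adjustment for non-linked records:
# they currently represent 5 people on reserve and 5 people off reserve.
weights = [2.5, 2.5, 1.25, 1.25, 1.25, 1.25]
cells = ["on", "on", "off", "off", "off", "off"]

# Hypothetical control totals: people present at both dates, i.e. after
# removing deaths and emigration over the period.
control_totals = {"on": 4.5, "off": 4.0}

final_weights = calibrate(weights, cells, control_totals)
print(final_weights, sum(final_weights))
```

After this step the weights sum to the control totals (8.5 in the example) rather than to the start-year population, which is what allows inference about the population that was alive and living in the country at both dates.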

Table 1 Description of weights in the S2011/S2016, L2011/L2016, L2011/S2016, and S2006/S2011 files

The strategy used to compute the weights for the L2011/L2016 file differs slightly from the strategy used for S2011/S2016 because it must account for the fact that not everyone was selected to respond to the NHS in 2011 and the long-form questionnaire in 2016. Therefore, the initial weights were computed as the product of the 2011 NHS final weight and the 2016 Census long-form questionnaire final weight to reflect the inverse of the probability that an individual would be selected to complete both. The next steps (i.e., making adjustments for non-linked records and for deaths and emigration) are similar to those used for the S2011/S2016 file, except that, in the L2011/L2016 file, a much broader range of characteristics could be considered when adjusting for non-linked individuals. As a result, more complex methods (logistic regression and clustering techniques) were used to generate homogeneous response groups.
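The initial weight described above can be written compactly. The notation is introduced here for illustration only and is not taken from the study: $w_i^{\mathrm{NHS}}$ and $w_i^{\mathrm{L2016}}$ denote the final weights of person $i$ in each file, and $\pi_i^{2011}$ and $\pi_i^{2016}$ the corresponding probabilities of being selected. The approximation assumes that each final weight is close to the inverse of its selection probability and that the two selections are independent.

```latex
w_i^{(0)} \;=\; w_i^{\mathrm{NHS}} \times w_i^{\mathrm{L2016}}
\;\approx\; \frac{1}{\pi_i^{2011}\,\pi_i^{2016}}
```

As a rough order of magnitude, with the sampling fractions cited earlier (about 33% and 25%), weights of about 3 and 4 would give an initial weight of about 12 for a person sampled in both years, whereas a person living in a community where all households received the long-form questionnaire at both dates would have an initial weight much closer to 1, before any adjustment.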

The distribution of the adjusted weights for the L2011/L2016 file (Table 1) shows wide variations by place of residence in 2011 and 2016, primarily because of the different collection strategies used in Indigenous communities. This has important implications, as an incorrect link between two records can have a major impact on the quality of the results if its weight is high.

The strategy used to weight the L2011/S2016 is similar to what was described above, but is not presented here to avoid redundancy.Footnote 25

3.4 Limitations of Linked Files

A large number of complex procedures must be carried out before using linked data to analyze migration flows. This is particularly true for Indigenous populations and populations living on Indian reserves because of inherent difficulties, such as relatively low response and linkage rates, as well as the use of different collection strategies. In addition to the limitations inherent to census data (described in Section 2.1), linked census files have their own limitations. However, while linked census files could be used to evaluate the quality of the retrospective information on the place of residence 5 years prior to the collection date (Section 5), no data file could be used to highlight errors in the places where individuals were enumerated, wrong or missed record linkages, or remaining selection biases that could not be compensated for through the reweighting process, despite the wide range of variables used.Footnote 26 Therefore, despite all of the precautions taken, users should keep these considerations in mind when using estimates from linked census files.

Lastly, estimates produced from linked census files are subject to sources of uncertainty that are impossible to measure. Unfortunately, it was not possible to provide variance for estimates computed using linked census files (nor for estimates computed using retrospective census information). Because migration is a fairly rare event, counts are often relatively small and must be interpreted with caution.

4 Main Results

This section provides the estimates of migration flows into and out of Indian reserves computed using census data linkages for the quinquennial periods of 2006 to 2011 and 2011 to 2016. The range of sources makes it possible to verify the robustness of these estimates, as each file has its own strengths and limitations (described in Section 3). Listed first are the totals for Canada and each province, for large geographical areas following a rural–urban gradient, and then by age group. These estimates are compared with those obtained using retrospective information from the census. At the end of this section, an additional comparison with an alternative data source is presented.

4.1 Comparison of Results from Linked Files with Those Obtained from a Single Census

According to the retrospective information collected by the 2016 Census long-form questionnaire (L2016_R), between 2011 and 2016 Indian reserves in Canada had a net gain of about 10,600 people through internal migration (Table 2), which contributed to a 3.4% increase of their population during that period.Footnote 27 In contrast, estimates from the S2011/S2016, L2011/L2016, and L2011/S2016 files show net losses of 18,900, 20,400, and 18,700 people, respectively, which contributed to a decrease of between 5.4 and 5.9% of their population. The differences found were mainly in the number of out-migrants, as estimates of out-migrants from the linked files were more than double those from the 2016 Census. Estimates of in-migrants were fairly consistent across all sources.Footnote 28 Similar results were obtained for the 2006-to-2011 period by comparing estimates from L2011 with those from S2006/S2011 (Table 3). According to the L2011_R file, at the national level, Indian reserves had a net migration gain of 11,700 people, compared with a net loss of about 14,000 people in the S2006/S2011 file.
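The arithmetic behind these comparisons is simple, and a short sketch shows how a similar number of in-migrants combined with a doubled number of out-migrants is enough to reverse the sign of net migration. The counts and the base population below are hypothetical round numbers chosen for illustration; they are not the values in Table 2, and the choice of base population for the percentage is an assumption.

```python
def net_migration(in_migrants: float, out_migrants: float) -> float:
    """Net migration for reserves: arrivals minus departures."""
    return in_migrants - out_migrants

def net_migration_pct(in_migrants: float, out_migrants: float,
                      base_population: float) -> float:
    """Net migration expressed as a percentage of a base population."""
    return 100.0 * net_migration(in_migrants, out_migrants) / base_population

# Hypothetical values only (not from Table 2).
base_population = 300_000
retrospective = {"in": 30_000, "out": 20_000}
linked = {"in": 30_000, "out": 50_000}   # same inflow, more than double the outflow

for name, flows in (("retrospective", retrospective), ("linked", linked)):
    net = net_migration(flows["in"], flows["out"])
    pct = net_migration_pct(flows["in"], flows["out"], base_population)
    print(f"{name}: net = {net:+,}, {pct:+.1f}% of base population")
```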

Table 2 Estimates of Indian reserve in-migrants, out-migrants, and net migration between 2011 and 2016, total population and Registered Indian population, for Canada and the provinces, by data source
Table 3 Estimates of Indian reserves’ in-migrants, out-migrants and net migration between 2006 and 2011, total population, for Canada and the provinces, by data source

The results for the population of Registered Indians were similar (see Table 2). According to the 2016 Census (L2016_R), in the quinquennial period from 2011 to 2016, Indian reserves gained 6500 people through internal migration (which contributed to an increase of 2.4% of the Registered Indian population on reserve during this 5-year period) compared with net losses of 21,400 and 25,100 in L2011/L2016 using respectively the Registered Indian status information in 2016 and 2011, and a loss of 24,000 people in L2011/S2016 using the Registered Indian status information in 2011 (which contributed to a decrease of between 6.9 and 7.8% of the Registered Indian population). While it seems like it should not matter whether the information about Registered Indian status in 2011 or 2016 is used, the population may—in fact—differ slightly for a few reasons. For example, some people may have registered between 2011 and 2016, in part because of legislative changes to the Indian Act,Footnote 29 and some others may have given different responses in 2011 and 2016 regarding their Registered Indian status because of response mobility (Statistics Canada, 2019).

The trends described above at the national level were also observed at the provincial level.Footnote 30 Linked files show negative net migration on Indian reserves in all provinces except British Columbia in 2011/2016, whereas the non-linked L2011 and L2016 files (using retrospective responses) almost always show net positive migration.

Figure 1 shows the estimated net migration for Indian reserves in relation to other types of regions along an urban–rural continuum: urban CMA, urban non-CMA, and rural.Footnote 31 All estimates computed from linked files were negative, suggesting that Indian reserves lost people through internal migration to all three types of regions, for both the population as a whole and Registered Indians. The results show that Indian reserves’ largest net losses were to rural areas or urban CMAs. For the Registered Indian population in particular, the largest net losses were to urban CMAs. In contrast, L2016_R data show positive net migration everywhere.

Fig. 1 Five-year net migration flows for the total population and the Registered Indian population, by data source, Canada, 2011 to 2016. Notes: CMA stands for census metropolitan area. Results exclude the population living in the territories or on an incompletely enumerated Indian reserve in 2011 or 2016. Sources: L2016_R: retrospective information from the 2016 Census; L2011_R: retrospective information from the 2011 National Household Survey; L2011/L2016: Data linkage between the 2011 National Household Survey and the 2016 Census long-form questionnaire; S2011/S2016: Data linkage between the 2011 Census short-form questionnaire and the 2016 Census short-form questionnaire

Figure 2 shows Indian reserves’ 5-year out-migration rates by age group for the total population and the Registered Indian population in Canada in 2011. While the different data sources show similar patterns by age group, out-migration rates estimated from linked files were much higher than those from the L2016_R file at all ages. Conversely, rates of in-migration on Indian reserves were much more similar in both level and age profile (Figs. 2 and 3) from one source to another.

Fig. 2 Indian reserves’ 5-year out-migration rates (per thousand), 2011 to 2016, by age group and data source. a Total population. b Registered Indian population. Sources: L2016_R: Retrospective information from the 2016 Census long-form questionnaire; L2011/L2016: Data linkage between the 2011 National Household Survey and the 2016 Census long-form questionnaire; S2011/S2016: Data linkage between the 2011 Census short-form questionnaire and the 2016 Census short-form questionnaire; L2011/S2016: Data linkage between the 2011 National Household Survey and the 2016 Census short-form questionnaire

Fig. 3 Indian reserves’ 5-year in-migration rates (per thousand), 2011 to 2016, by age group and data source. a Total population. b Registered Indian population. Sources: L2016_R: Retrospective information from the 2016 Census long-form questionnaire; L2011/L2016: Data linkage between the 2011 National Household Survey and the 2016 Census long-form questionnaire; S2011/S2016: Data linkage between the 2011 Census short-form questionnaire and the 2016 Census short-form questionnaire; L2011/S2016: Data linkage between the 2011 National Household Survey and the 2016 Census short-form questionnaire

Figure 4 shows Indian reserves’ net migration rates by age group, computed from all sources. Net migration estimated using linked files was negative for all age groups except for the population aged between 50 and 75. In contrast, net migration estimated using L2016_R was positive or close to 0 for all ages.

Fig. 4 Indian reserves’ 5-year net migration rates (per thousand), 2011 to 2016, by age group and data source. a Total population. b Registered Indian population. Sources: L2016_R: Retrospective information from the 2016 Census long-form questionnaire; L2011/L2016: Data linkage between the 2011 National Household Survey and the 2016 Census long-form questionnaire; S2011/S2016: Data linkage between the 2011 Census short-form questionnaire and the 2016 Census short-form questionnaire; L2011/S2016: Data linkage between the 2011 National Household Survey and the 2016 Census short-form questionnaire

4.2 Comparison with Net Migration Obtained from a Residual Approach Using Multiple Censuses

As mentioned in the Introduction, the population living on Indian reserves increased by 42,500 people between 2006 and 2016 according to the retrospective information collected in each census. This can be decomposed roughly by estimating the contribution of the various components (fertility, mortality, and migration) that influenced this population growth. First, it can be assumed that the contribution of international migration to population growth on Indian reserves is negligible. Second, the number of births between two censuses can be estimated by counting the population younger than age 5 in the last census. It was estimated that between 2006 and 2016, there were 72,800 births on Indian reserves. By applying death rates to the population at the beginning of an intercensal period, the number of deaths on Indian reserves was estimated at around 17,200 over the same period. As a result, natural growth (i.e., the number of births minus the number of deaths) amounted to + 55,600, which is equivalent to 131% of the observed growth between 2006 and 2016.

If these figures are assumed to be accurate and net internal migration on Indian reserves is estimated as a residual (i.e., the number required to balance total growth at 42,500), then an estimate of − 13,100 would be obtained, which is higher than the sum of the estimates from S2006/S2011 and S2011/S2016 (− 32,900), but much lower than those from L2011_R and L2016_R (+ 22,300). Given that natural growth exceeds the estimated total growth over the study period, net migration on Indian reserves could not be positive.
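The arithmetic of this residual approach can be sketched as follows. This is only an illustration built from the figures quoted in the text; the variable and function names are ours, and the sketch assumes, as stated above, that international migration is negligible.

```python
# Figures quoted in the text for Indian reserves in Canada, 2006 to 2016.
total_growth = 42_500  # observed population growth
births = 72_800        # estimated from the population younger than age 5
deaths = 17_200        # estimated by applying death rates


def residual_net_migration(total_growth: int, births: int, deaths: int) -> int:
    """Net internal migration as the residual of the balancing equation.

    Assumes international migration is negligible, so that
    total growth = (births - deaths) + net internal migration.
    """
    natural_growth = births - deaths
    return total_growth - natural_growth


natural_growth = births - deaths
residual = residual_net_migration(total_growth, births, deaths)
share = round(100 * natural_growth / total_growth)  # natural growth as % of total growth

# Sums of the two 5-year national estimates quoted earlier in the text.
linked_short_form = -14_000 + -18_900  # S2006/S2011 + S2011/S2016
retrospective = 11_700 + 10_600        # L2011_R + L2016_R

print(natural_growth, residual, share)
print(linked_short_form < residual < retrospective)
```

The residual falls between the two sets of estimates, and it is negative because natural growth alone exceeds the observed total growth.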

Once again, it is important to interpret these results with caution, as there is considerable uncertainty with regard to census estimates of population counts on Indian reserves, particularly because of net undercoverage. This uncertainty affects not only the quality of the migration estimates obtained from linked files, but also attempts, like the one made here, to reconcile estimates of migration with observed population growth.

5 Analysis of Discrepancies Related to Migrant Status or Previous Place of Residence Between Retrospective Responses and Linked Census Information

In this section, the S2011/L2016 file was used to compare the retrospective information about an individual’s place of residence in 2011 from the L2016_R file with their place of residence in 2011, as listed in the S2011 file.Footnote 32 In theory, this information should be identical, and discrepancies between the two files could explain why the two sources provide different estimates of migration flows when the place of residence is located on an Indian reserve in one file but not in the other. Table 4 shows a summary of the discrepancies found among linked individuals in the S2011/L2016 file. These discrepancies are of two main types: those related to self-declared migrant status (migrants are those who reported living in a different municipality five years earlier in L2016_R), and those related to migrants for whom the place of residence in 2011 does not match the retrospective information captured in 2016. Table 4 also shows the effect of each type of discrepancy on estimates of global net migration on Indian reserves, assuming that the information from the census linked files is correct.

Table 4 Summary of the discrepancies observed in migrant status or place of residence (on or off reserve) between the retrospective source and the linked source in the S2011/L2016 file
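As a rough illustration of the comparison logic, the sketch below classifies linked records into the two types of discrepancy and computes net migration on reserves under each source. All field and function names are hypothetical and do not correspond to variables in the census files; the counts are unweighted, whereas the estimates in this study rely on weighted data, and the three records are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkedRecord:
    # Hypothetical fields, for illustration only.
    self_declared_migrant: bool            # retrospective answer given in 2016
    same_residence_linked: bool            # same place of residence in both linked records
    on_reserve_2011_linked: bool           # residence as enumerated in 2011
    on_reserve_2011_retro: Optional[bool]  # residence 5 years earlier as reported in 2016
    on_reserve_2016: bool                  # residence as enumerated in 2016


def classify(rec: LinkedRecord) -> str:
    """Assign a record to one of the discrepancy types described in the text."""
    if not rec.self_declared_migrant and not rec.same_residence_linked:
        return "status: non-migrant in retrospective source, moved in linked source"
    if rec.self_declared_migrant and rec.same_residence_linked:
        return "status: migrant in retrospective source, did not move in linked source"
    if rec.self_declared_migrant and rec.on_reserve_2011_retro != rec.on_reserve_2011_linked:
        return "origin: on-reserve/off-reserve mismatch"
    return "consistent"


def net_migration(records: list[LinkedRecord], use_linked: bool) -> int:
    """Unweighted net migration on reserves (in-migrants minus out-migrants)."""
    net = 0
    for rec in records:
        if use_linked:
            origin = rec.on_reserve_2011_linked
        elif rec.self_declared_migrant:
            origin = bool(rec.on_reserve_2011_retro)
        else:
            origin = rec.on_reserve_2016  # self-declared non-migrants did not move
        net += int(rec.on_reserve_2016) - int(origin)
    return net


# Three invented records: two discrepant out-migrants and one consistent in-migrant.
records = [
    LinkedRecord(False, False, True, None, False),
    LinkedRecord(True, False, True, False, False),
    LinkedRecord(True, False, False, False, True),
]

for rec in records:
    print(classify(rec))
print(net_migration(records, use_linked=True), net_migration(records, use_linked=False))
```

In this toy example, the two discrepant records are out-migrants only in the linked source, so net migration changes sign depending on which source supplies the 2011 place of residence.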

5.1 Discrepancies Related to Migrant Status

There are two cases of migrant status-related discrepancies. The first involves individuals who declared in 2016 that they were living in the same place 5 years earlier, so they were considered non-migrants in L2016, but who were enumerated at different places of residence in S2011 and L2016 (i.e., they migrated into or out of an Indian reserve). Overall, discrepancies of this nature accounted for a difference of 7600 in the net migration on Indian reserves in S2011/L2016 when migration was measured using the linked information on place of residence in 2011 and 2016, instead of the retrospective information contained in L2016_R. The second case is the opposite of the first. It involves self-declared migrants who were enumerated at the same place of residence in S2011 and L2016. Of interest in this study among these cases are the records where retrospective information showed a migration into or out of an Indian reserve. Records exhibiting this type of discrepancy contributed to a difference of 3500 individuals in the net migration on Indian reserves between the linked and retrospective sources.

What causes these differences? There are reasons to believe that the retrospective information is at fault most of the time. One reason is that it may be difficult for respondents to remember whether they were living in the same place 5 years earlier, on a specific date that bears no particular significance to them, and this is especially true for highly mobile individuals. Some very mobile individuals, such as circular migrants, may believe that they have more than one place of residence (although they should choose one usual place of residence as per census guidelines). Another reason is that responses indicating that the respondent was living in the same place 5 years earlier help alleviate response burden, since no retrospective place of residence information needs to be added.

There is also another factor at play in the case of self-declared migrants with two identical addresses in the linked files. An examination of the coding of the retrospective information on the place of residence 5 years earlier showed that there was a bias toward coding these places as “not a reserve.” This could explain why there is a larger number of inconsistent records for migrants onto a reserve (off-reserve to on-reserve) than for migrants out of a reserve (on-reserve to off-reserve). The specific topic of geocoding is addressed in Appendix 2.

5.2 Same Migrant Status, But Discrepancies in On-Reserve and Off-Reserve Residence in 2011

The second type of discrepancy involves individuals who are self-declared migrants and have two distinct places of residence in the linked file, but whose place of residence in S2011 does not match their place of residence 5 years earlier as collected in L2016. This type of discrepancy occurs in four different forms:

  1. Migration onto a reserve:

     a. Origin is on an Indian reserve as per S2011, but off a reserve as per L2016_R (400 cases)

     b. Origin is off an Indian reserve as per S2011, but on reserve as per L2016_R (2,300 cases)

  2. Migration off reserve:

     a. Origin is on an Indian reserve as per S2011, but off a reserve as per L2016_R (16,200 cases)

     b. Origin is off an Indian reserve as per S2011, but on reserve as per L2016_R (4,500 cases).

Overall, these four versions of the second type of discrepancy account for a difference of − 13,600 individuals in the net migration of Indian reserves when the place of residence obtained from S2011 is used instead of the one obtained from L2016_R.

The above discrepancies may occur for many reasons, but it is likely that a large proportion of the inconsistencies originates from L2016_R. Contributing factors include imprecise or erroneous retrospective information, possibly because respondents have trouble remembering it, and the challenges associated with coding the retrospective information to a precise location. As mentioned above, there seems to be a bias in the geocoding where more locations are wrongly coded as off reserve (if S2011 is assumed to be accurate) than on reserve. Therefore, there are more “erroneous” migrations from off-reserve locations than from on-reserve locations (see Appendix 2). This is consistent with the results shown here.

6 Conclusion

For several decades, the retrospective questions on the place of residence one year and 5 years prior to the census have been the preferred source for measuring migration flows between Indian reserves and off-reserve areas in Canada. From one census to the next, the same observation has emerged: slightly more people entered Indian reserves than left. Consistency over time and the absence of data allowing for alternative estimates have led these results to be widely used and accepted, in particular by departments, policy-makers and the scientific community.

During that period, Statistics Canada was aware of limitations related to these retrospective census questions, and that these limitations have a greater impact for smaller geographic areas.Footnote 33 For this reason, the agency put forward various initiatives during the last two censuses to improve the coding and processing operations related to these variables. However, without census data linkage, it is very difficult to assess how and to what extent these limitations could affect the quality of the estimates.

The fact that two consecutive censuses are now linked made it possible—for the first time—to obtain alternative estimates of migration flows between Indian reserves and off-reserve areas without some of the limitations inherent to traditional censuses. These new estimates reveal a somewhat different portrait for the periods from 2006 to 2011 and from 2011 to 2016: there were more out-migrants from than in-migrants to Indian reserves, leading to negative overall net migration on Indian reserves. Furthermore, comparisons of data from the two sources shed light on some specific limitations associated with the collection of retrospective information about an individual’s prior place of residence, which led to an underestimation of the number of people leaving Indian reserves.

However, it is important to be cautious when interpreting these results. Although the change of sign in net migration is striking, the size of outward migration flows from Indian reserves does not in any way suggest an exodus, and Indian reserves have experienced continuous population growth in recent decades because of high birth rates. In addition, these new estimates carry a degree of uncertainty. Several factors can affect the precision of the estimates, such as the potential linkage of two records that do not represent the same individual, the exclusion of the population living on incompletely enumerated Indian reserves, the low linkage rates on Indian reserves, and the limitations in the weighting processes. The population living in institutions (which includes many Indigenous people) also created a particular challenge, as it had low linkage rates and was excluded from the analysis as a result. However, as previously mentioned, this population could not be fully excluded from the weighting process.

That said, the results of net migration on reserves obtained from census data linkages may be more consistent with the population growth observed on reserves in the past, as demonstrated by the result of the decomposition of population growth according to the various demographic components, which shows that net migration on reserves could not be positive between 2006 and 2016.

Lastly, census linkages have served in this study as an alternative data source for evaluating the quality of the retrospective information related to the place of residence. This evaluation showed some limitations of this information such as a bias in the geocoding process favoring attribution to off-reserve locations and an underestimation of individuals having migrated. While there is no indication that these issues are specific to some population groups, these results could serve as a starting point for further investigation of the quality of retrospective census information, particularly in regard to other types of geographic areas. They could also provide an opportunity to integrate new data sources, such as data linkages and administrative data, into the coding and processing of census migration data, in particular for smaller areas and Indigenous communities. This would make it possible to further improve the quality of Canadian census data.