Abstract
A major goal of survey sampling is finite population inference. In recent years, large-scale survey programs have encountered many practical challenges, including higher data collection costs, rising non-response rates, increasing demand for disaggregated statistics, and the need for timely estimates. Data integration is an emerging field of research that addresses these challenges by combining data from multiple surveys, making it possible to develop a framework that efficiently combines information from several surveys to obtain more precise estimates of population parameters. In many surveys, the parameters of interest are spatial in nature: the relationship between the study variable and the covariates varies across locations in the study area, a situation referred to as spatial non-stationarity. Hence, there is a need for a sampling methodology that can handle spatial non-stationarity efficiently and integrate spatially referenced data to obtain more detailed information. In this study, a Geographically Weighted Spatially Integrated (GWSI) estimator of the finite population total was developed by integrating data from two independent surveys using spatial information. The statistical properties of the proposed estimator were then evaluated empirically through a spatial simulation study in which three different spatial populations with high spatial autocorrelation were generated. The proposed spatially integrated estimator performed better than the usual design-based estimator under all three populations. Furthermore, a Spatial Proportionate Bootstrap (SPB) method was developed for variance estimation of the proposed spatially integrated estimator.
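The two ingredients named in the abstract, a design-based estimator of a total and a locally weighted regression that captures spatial non-stationarity, can be illustrated with a minimal Python sketch. This is not the GWSI estimator itself, whose form is not given here. The function names `ht_total` and `gw_coefficients`, the Gaussian kernel, the bandwidth value, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def ht_total(y, pi):
    """Horvitz-Thompson (design-based) estimate of a population total.

    y  : observed values for the sampled units
    pi : first-order inclusion probabilities of those units
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return float(np.sum(y / pi))


def gw_coefficients(coords, x, y, target, bandwidth):
    """Local weighted least-squares fit of y = b0 + b1 * x at `target`.

    Units closer to `target` receive larger weights through a Gaussian
    kernel (an assumed choice; other kernels are possible).
    """
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    X = np.column_stack([np.ones(len(x)), x])
    XtW = X.T * w  # scales each observation's column by its kernel weight
    return np.linalg.solve(XtW @ X, XtW @ y)


# Hypothetical spatial sample: the slope of y on x drifts from west to east,
# which is one simple form of spatial non-stationarity.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0.0, 1.0, size=(n, 2))
x = rng.normal(size=n)
slope = 1.0 + 2.0 * coords[:, 0]
y = 5.0 + slope * x + rng.normal(scale=0.1, size=n)

beta_west = gw_coefficients(coords, x, y, target=(0.1, 0.5), bandwidth=0.15)
beta_east = gw_coefficients(coords, x, y, target=(0.9, 0.5), bandwidth=0.15)
print("local slope (west):", beta_west[1])
print("local slope (east):", beta_east[1])
```

A single global regression would report one slope for the whole area, whereas the local fits recover a smaller slope in the west than in the east. A spatially aware estimator can exploit exactly this kind of variation.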
Data availability
Data sharing is not applicable.
Acknowledgements
The authors would like to thank the anonymous referees and the Editor for their constructive comments and suggestions, which led to significant improvement of the manuscript. The first author would like to express his heartfelt gratitude to the ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India, and the Graduate School, ICAR-Indian Agricultural Research Institute, New Delhi, India, for providing lab facilities and overall support to conduct the research work during his Ph.D. programme.
Ethics declarations
Conflict of interest
The authors declare no potential conflict of interest relevant to this article.
About this article
Cite this article
Paul, N.C., Rai, A., Ahmad, T. et al. Spatially integrated estimator of finite population total by integrating data from two independent surveys using spatial information. J. Korean Stat. Soc. 53, 222–247 (2024). https://doi.org/10.1007/s42952-023-00244-1
DOI: https://doi.org/10.1007/s42952-023-00244-1