
Machine Learning Driven Developments in Behavioral Annotation: A Recent Historical Review

Published in: International Journal of Social Robotics

Abstract

Annotation tools play a critical role in generating the datasets that fuel machine learning applications. With the advent of Foundation Models, particularly those based on Transformer architectures and large language models, the capacity for training on comprehensive, multimodal datasets has been substantially enhanced. This not only facilitates robust generalization across diverse data categories and knowledge domains but also introduces a novel form of annotation—prompt engineering—for qualitative model fine-tuning. These advances create new avenues for machine intelligence to more precisely identify, forecast, and replicate human behavior, addressing historical limitations that contribute to algorithmic inequities. Nevertheless, the volume and intricacy of the data required to train multimodal models pose significant engineering challenges, particularly with regard to bias, and no consensus has yet emerged on how to conduct this annotation work in a manner that is ethically responsible, secure, and efficient. This historical literature review traces advancements in these technologies from 2018 onward, underscores significant contributions, and identifies knowledge gaps and avenues for future research pertinent to the development of Transformer-based multimodal Foundation Models. An initial survey of over 724 articles yielded 156 studies that met the criteria for historical analysis; these were further narrowed to 46 key papers spanning 2018–2022. Across six figures, the review traces the evolution of best practices and examines the shifting research landscape of machine-assisted behavioral annotation, focusing on critical issues such as bias.
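The abstract's framing of prompt engineering as a new form of annotation can be illustrated with a minimal sketch. Everything below is hypothetical and not drawn from the reviewed papers: the label set, the few-shot template, and the `annotate` stand-in (a trivial keyword heuristic) are illustrative only; a real pipeline would send the assembled prompt to a large language model instead.

```python
# Hypothetical sketch: behavioral annotation posed as a few-shot
# labeling prompt for a language model. Labels and examples are
# invented for illustration.

LABELS = ["greeting", "request", "refusal"]

FEW_SHOT = [
    ("Hello there, how are you?", "greeting"),
    ("Could you pass me the salt?", "request"),
    ("No, I won't do that.", "refusal"),
]

def build_prompt(utterance: str) -> str:
    """Assemble a few-shot annotation prompt from labeled examples."""
    parts = [f"Label each utterance as one of: {', '.join(LABELS)}."]
    for text, label in FEW_SHOT:
        parts.append(f"Utterance: {text}\nLabel: {label}")
    parts.append(f"Utterance: {utterance}\nLabel:")
    return "\n\n".join(parts)

def annotate(utterance: str) -> str:
    """Stand-in for a model call: a keyword heuristic, not a real LLM."""
    u = utterance.lower()
    if any(w in u for w in ("hello", "hi ", "hey")):
        return "greeting"
    if any(w in u for w in ("could you", "please", "pass")):
        return "request"
    return "refusal"

prompt = build_prompt("Please close the door.")
print(annotate("Please close the door."))  # request
```

The design point is that the annotation schema lives in the prompt text rather than in model weights, which is why the abstract treats prompt engineering as an annotation activity in its own right.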



Acknowledgements

The authors wish to extend their gratitude to Alexander Kruel and Karoly Zsolnai-Fehér for producing various timely machine learning news bulletins on the evolving state-of-the-art in machine learning. The authors also wish to thank A. Safronov for editing assistance.

Funding

This research received no external funding.

Author information

Corresponding author

Correspondence to Eleanor Watson.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Informed Consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Watson, E., Viana, T. & Zhang, S. Machine Learning Driven Developments in Behavioral Annotation: A Recent Historical Review. Int J of Soc Robotics (2024). https://doi.org/10.1007/s12369-024-01117-1

