research-article

Zeph & Iris map the internet: A resilient reinforcement learning approach to distributed IP route tracing

Published: 01 March 2022

Abstract

We describe a new system for distributed IP-level tracing of the routes that packets take through the IPv4 internet. Our Zeph algorithm coordinates route tracing efforts across agents at multiple vantage points, assigning to each agent a number of /24 destination prefixes in proportion to its probing budget, with the prefixes chosen according to a reinforcement learning heuristic that aims to maximize the number of multipath links discovered. Zeph runs on top of Iris, our fault-tolerant system for orchestrating internet measurements across distributed agents of heterogeneous probing capacities. Iris is built around third-party free and open-source software and modern containerization technology, thereby presenting a new model for assembling a resilient and maintainable internet measurement architecture. We show that carefully choosing which destinations to probe from which vantage point matters for optimizing topology discovery, and that a system can learn from previous measurements which assignment will maximize overall discovery. After 10 cycles of probing, Zeph, deployed on 5 Iris agents, is capable of discovering 2.4M nodes and 10M links in a 6-hour cycle. This is at least 2 times more nodes and 5 times more links than other production systems for the same number of prefixes probed.
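
As a rough illustration of the assignment step described in the abstract, the following Python sketch gives each agent a share of /24 prefixes proportional to its probing budget and then picks that many prefixes with an epsilon-greedy rule. The epsilon-greedy rule is a stand-in chosen for illustration: the abstract says only that a reinforcement learning heuristic is used and does not specify it. The function names (`quotas`, `select_prefixes`, `assign`), the `rewards` table keyed by `(agent, prefix)`, and the `epsilon` and `total_slots` parameters are likewise hypothetical.

```python
import random


def quotas(budgets, total_slots):
    """Split total_slots across agents in proportion to their probing budgets.

    Uses floor division, so a few slots may stay unassigned when the
    proportions are not exact.
    """
    total_budget = sum(budgets.values())
    return {agent: total_slots * b // total_budget for agent, b in budgets.items()}


def select_prefixes(agent, quota, prefixes, rewards, epsilon, rng):
    """Pick `quota` prefixes for one agent (hypothetical epsilon-greedy rule).

    rewards[(agent, prefix)] is assumed to hold the number of links this agent
    discovered for this prefix in earlier cycles; missing entries count as 0.
    """
    n_explore = int(quota * epsilon)
    n_exploit = quota - n_explore
    # Exploit: prefixes that paid off most for this agent in the past.
    ranked = sorted(prefixes, key=lambda p: rewards.get((agent, p), 0), reverse=True)
    exploit = ranked[:n_exploit]
    # Explore: a random draw from the prefixes not already chosen.
    rest = ranked[n_exploit:]
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return exploit + explore


def assign(budgets, prefixes, rewards, total_slots, epsilon=0.2, seed=0):
    """Build one probing plan: agent -> list of prefixes to trace this cycle."""
    rng = random.Random(seed)
    q = quotas(budgets, total_slots)
    return {a: select_prefixes(a, q[a], prefixes, rewards, epsilon, rng) for a in budgets}


if __name__ == "__main__":
    budgets = {"agent-a": 1000, "agent-b": 3000}  # probes per second (made up)
    prefixes = [f"10.0.{i}.0/24" for i in range(8)]
    rewards = {("agent-a", "10.0.3.0/24"): 50}    # links seen in a past cycle (made up)
    for agent, chosen in assign(budgets, prefixes, rewards, total_slots=8).items():
        print(agent, chosen)
```

In this sketch each agent selects its prefixes independently, so the same prefix can be assigned to more than one agent. Whether the actual system allows such overlap is not stated in the abstract.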


    • Published in

      ACM SIGCOMM Computer Communication Review, Volume 52, Issue 1
      January 2022
      44 pages
      ISSN: 0146-4833
      DOI: 10.1145/3523230

      Copyright © 2022. Copyright is held by the owner/author(s).

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
