Nahida: In-Band Distributed Tracing with eBPF
arXiv - CS - Operating Systems, Pub Date: 2023-11-15, DOI: arxiv-2311.09032
Wanqi Yang, Pengfei Chen, Kai Liu, Huxing Zhang

Microservices are commonly used in modern cloud-native applications to achieve agility. However, the complex service dependencies in large-scale microservice systems can cause anomalies to propagate, making fault troubleshooting challenging. To address this issue, distributed tracing systems have been proposed to trace complete request execution paths, enabling developers to troubleshoot anomalous services. However, existing distributed tracing systems have limitations such as invasive instrumentation, trace loss, and inaccurate trace correlation. To overcome these limitations, we propose Nahida, a new tracing system based on eBPF (extended Berkeley Packet Filter) that tracks complete requests in the kernel without intrusion, regardless of programming language or implementation. Our evaluation results show that Nahida can track over 92% of requests with stable accuracy, even under highly concurrent user requests, whereas state-of-the-art non-invasive approaches cannot track any of them. Importantly, Nahida can track requests served by multi-threaded applications, which none of the existing invasive tracing systems that instrument tracing code into libraries can handle. Moreover, the overhead introduced by Nahida is negligible: it increases service latency by only 1.55% to 2.1%. Overall, Nahida provides an effective, non-invasive solution for distributed tracing.

Updated: 2023-11-16