AppClassNet: a commercial-grade dataset for application identification research: ACM SIGCOMM Computer Communication Review: Vol 52, No 3
ACM SIGCOMM Computer Communication Review ( IF 2.8 ) Pub Date : 2022-09-06 , DOI: https://dl.acm.org/doi/10.1145/3561954.3561958
Chao Wang, Alessandro Finamore, Lixuan Yang, Kevin Fauvel, Dario Rossi

The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with an abundance of data and computing power. Large companies can take advantage of a deluge of data that is typically withheld from the research community due to privacy or business sensitivity concerns, and this is particularly true for networking data. Therefore, the lack of high-quality data is often recognized as one of the main factors currently preventing networking research from fully leveraging the potential of AI methodologies.

Following numerous requests from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and the number of classes, and reaches a scale similar to that of the popular ImageNet dataset commonly used in the computer vision literature. To avoid leaking user- and business-sensitive information, we anonymized the dataset accordingly, while empirically showing that it still represents a relevant benchmark for algorithmic research. In this paper, we describe the public dataset and our anonymization process. We hope that AppClassNet can be instrumental for other researchers in addressing more complex commercial-grade problems in the broad field of traffic classification and management.




Updated: 2022-09-07