当前位置: X-MOL 学术arXiv.cs.LG › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
Empowering Data Mesh with Federated Learning
arXiv - CS - Machine Learning Pub Date : 2024-03-26 , DOI: arxiv-2403.17878
Haoyuan Li, Salman Toor

The evolution of data architecture has seen the rise of data lakes, aiming to solve the bottlenecks of data management and promote intelligent decision-making. However, this centralized architecture is limited by the proliferation of data sources and the growing demand for timely analysis and processing. A new data paradigm, Data Mesh, is proposed to overcome these challenges. Data Mesh treats domains as a first-class concern by distributing the data ownership from the central team to each data domain, while keeping the federated governance to monitor domains and their data products. Many multi-million dollar organizations like Paypal, Netflix, and Zalando have already transformed their data analysis pipelines based on this new architecture. In this decentralized architecture where data is locally preserved by each domain team, traditional centralized machine learning is incapable of conducting effective analysis across multiple domains, especially for security-sensitive organizations. To this end, we introduce a pioneering approach that incorporates Federated Learning into Data Mesh. To the best of our knowledge, this is the first open-source applied work that represents a critical advancement toward the integration of federated learning methods into the Data Mesh paradigm, underscoring the promising prospects for privacy-preserving and decentralized data analysis strategies within Data Mesh architecture.

中文翻译:

通过联邦学习增强数据网格能力

数据架构的演进催生了数据湖的兴起,旨在解决数据管理的瓶颈,促进智能决策。然而,这种集中式架构受到数据源激增以及及时分析和处理不断增长的需求的限制。提出了一种新的数据范式——数据网格来克服这些挑战。数据网格通过将数据所有权从中央团队分配到每个数据域,同时保持联合治理来监控域及其数据产品,将域视为首要关注点。 Paypal、Netflix 和 Zalando 等许多价值数百万美元的组织已经基于这种新架构转变了他们的数据分析管道。在这种数据由每个领域团队在本地保存的去中心化架构中,传统的集中式机器学习无法跨多个领域进行有效的分析,尤其是对于安全敏感的组织。为此,我们引入了一种将联邦学习融入数据网格的开创性方法。据我们所知,这是第一个开源应用工作,代表着将联邦学习方法集成到数据网格范式中的关键进步,强调了数据网格内隐私保护和去中心化数据分析策略的广阔前景建筑学。
更新日期:2024-03-27
down
wechat
bug