Paper Title

Is Non-IID Data a Threat in Federated Online Learning to Rank?

Authors

Wang, Shuyi; Zuccon, Guido

Abstract

In this perspective paper we study the effect of non-independent and identically distributed (non-IID) data on federated online learning to rank (FOLTR) and chart directions for future work in this new and largely unexplored research area of Information Retrieval. In the FOLTR process, clients participate in a federation to jointly create an effective ranker from the implicit click signal originating in each client, without the need to share data (documents, queries, clicks). A well-known factor that affects the performance of federated learning systems, and that poses serious challenges to these approaches, is that there may be some type of bias in the way data is distributed across clients. While FOLTR systems are in their own right a type of federated learning system, the presence and effect of non-IID data in FOLTR has not been studied. To this end, we first enumerate possible data distribution settings that may showcase data bias across clients and thus give rise to the non-IID problem. Then, we study the impact of each setting on the performance of the current state-of-the-art FOLTR approach, Federated Pairwise Differentiable Gradient Descent (FPDGD), and we highlight which data distributions may pose a problem for FOLTR methods. We also explore how common approaches proposed in the federated learning literature address non-IID issues in FOLTR. This allows us to unveil new research gaps that, we argue, future research in FOLTR should consider. This is an important contribution to the current state of the FOLTR field because, for FOLTR systems to be deployed, the factors affecting their performance, including the impact of non-IID data, need to be thoroughly understood.
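To make the IID versus non-IID distinction concrete, here is a minimal sketch of how a simulated federation might split a pool of queries across clients. The abstract does not list the specific settings the paper enumerates, so this assumes one generic form of non-IID data, label skew over a hypothetical per-query "intent" label, purely for illustration. The function names (`iid_partition`, `label_skew_partition`, `label_distribution`) and the intent labels are hypothetical and are not taken from the paper or from any FPDGD implementation.

```python
import random
from collections import Counter
from operator import itemgetter


def iid_partition(items, num_clients, seed=0):
    """Shuffle items, then deal them round-robin so every client sees a similar mix."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::num_clients] for i in range(num_clients)]


def label_skew_partition(items, num_clients, label_of):
    """Hypothetical non-IID split: group items by label, then hand whole label groups to clients.

    Labels are assigned to clients round-robin, so when there are as many labels
    as clients, each client only ever sees a single label (an extreme skew).
    """
    by_label = {}
    for item in items:
        by_label.setdefault(label_of(item), []).append(item)
    clients = [[] for _ in range(num_clients)]
    for idx, label in enumerate(sorted(by_label)):
        clients[idx % num_clients].extend(by_label[label])
    return clients


def label_distribution(client_items, label_of):
    """Fraction of a client's items carrying each label."""
    counts = Counter(label_of(item) for item in client_items)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical data: 400 queries, each tagged with one of 4 intent labels.
    queries = [(qid, qid % 4) for qid in range(400)]
    label_of = itemgetter(1)

    for name, parts in [
        ("IID", iid_partition(queries, num_clients=4)),
        ("label skew", label_skew_partition(queries, num_clients=4, label_of=label_of)),
    ]:
        print(name)
        for cid, part in enumerate(parts):
            print(f"  client {cid}: {label_distribution(part, label_of)}")
```

Under the IID split each client's label fractions hover around 0.25, while under the skewed split each client sees only one label (for example, client 0 gets `{0: 1.0}`). A federated ranker trained on the second split aggregates updates from clients whose local data disagree, which is the kind of situation the abstract flags as potentially problematic for FOLTR.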