Global or Local Adaptation? Client-Sampled Federated Meta-Learning for Personalized IoT Intrusion Detection

Haorui Yan, Xi Lin, Member, IEEE, Shenghong Li, Senior Member, IEEE, Hao Peng, and Bo Zhang

Abstract: With the growing number of Internet of Things (IoT) devices, cyber threats to IoT systems have increased. Federated learning (FL) has been applied to anomaly-based network intrusion detection systems (NIDS) to detect malicious traffic in IoT devices and counter these threats. However, current FL-based NIDS mainly focus on global model performance and do not improve personalized performance on local data. To address this issue, we propose a novel personalized federated meta-learning intrusion detection approach (PerFLID), which allows multiple participants to personalize their local detection models for local adaptation. PerFLID shifts the goal of the personalized detection task to training a local model suited to each client's specific data, rather than a global model. To meet the real-time requirements of NIDS, PerFLID further refines the client selection strategy by clustering local gradient similarities to find the nodes that contribute the most to the global model in each global round. PerFLID can thus select the nodes that accelerate model convergence, and we theoretically analyze the improvement in convergence speed of this strategy over the personalized federated learning algorithm. We experimentally compare PerFLID with six existing FL-NIDS approaches on three real network traffic datasets. PerFLID outperforms all baselines in local adaptation detection accuracy, improving on the state-of-the-art scheme by 10.11%, and accelerates convergence under various parameter combinations.

Index Terms: Internet of Things security, personalized traffic intrusion detection, federated meta-learning, client selection aggregation.

I. INTRODUCTION

The Internet of Things (IoT) is a network system that connects physical devices and machines to the Internet using information sensors and communication technologies. This enables IoT devices to communicate and exchange data, providing services for scenarios such as smart homes, industrial automation, smart healthcare, and smart transportation [1], [2], [3], [4].

However, serious security issues arise when IoT devices are compromised by malicious intruders and participate in data interaction and processing. Intruders can penetrate the network layer through which devices and external networks exchange data, and threaten IoT devices that lack security defenses. Since 2016, the well-known Mirai botnet has initiated numerous large-scale distributed denial-of-service (DDoS) attacks against IoT devices [5]. The BlueBorne vulnerability, identified in 2017, affects nearly all Bluetooth-enabled devices and poses a significant threat to IoT devices [6].

Because many IoT devices have limited processing power and storage capacity, sophisticated security systems are frequently infeasible. As a result, intrusion detection techniques have been employed to mitigate unknown attacks in the network [7]. However, traffic-based intrusion detection techniques have several limitations, including insufficient data in public traffic datasets and difficulty in detecting IoT attacks such as DDoS, exploits, reconnaissance, and worms [8]. Therefore, relying solely on this approach to secure IoT devices is unreliable. Centralized detection systems are unsuitable for IoT architectures with a large number of distributed components, making it difficult to detect single points of failure in IoT. Additionally, detection techniques must minimize latency to meet the frequent data processing demands of IoT devices while ensuring robust data privacy. As IoT traffic continues to increase, traditional intrusion detection systems face significant challenges in providing fast detection due to their high computational overhead [9], [10]. Recent studies review conventional detection methods and emphasize that rapid detection has become one of the primary challenges in the field of network intrusion detection [11].

According to current research, federated learning (FL) based intrusion detection systems (IDS) could address some of these issues thanks to their improved adaptability and scalability [12]. FL schemes allow users to jointly train detection models without sharing their private data, addressing the problem of insufficient public data in IoT scenarios while protecting data privacy [13]. This contrasts with non-federated machine learning methods, which do not offer this level of privacy protection and may require a larger amount of data to be effective [14], [15].

556-6021 (C) 2024 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission.

Fig. 1. A framework for traffic intrusion detection of IoT devices based on PFL.

Although FL can be used to detect various types of cyber threats to the IoT, some problems remain unresolved. Li et al. [16] proposed an FL framework to build an IoT intrusion detection model collaboratively. Chatterjee and Hanawal [17] proposed an FL-based intrusion detection system that uses a hybrid federated averaging (FedAvg) and noise-resistant integration framework to deal with labeling noise. Liu et al. [18] used blockchain technology to store and share FL models, ensuring the security of the aggregated detection models. These studies improved the structure of the detection model by utilizing existing FL algorithms and presented the application of FL to IDS in the context of the IoT. However, these studies assume that client data are identically distributed, and their goal is to train an optimal global model. Models trained under this assumption lack local data adaptation. In practice, the data are non-independent and identically distributed (Non-IID) among clients, because IoT devices are produced by different manufacturers and configured with different protocols, so traffic data vary across devices and network scenarios. As a result, the global model obtained from training cannot adapt to these data. Personalized federated learning (PFL) addresses slow convergence and poor performance on Non-IID data as well as the lack of model personalization for local tasks or datasets [19]. One way to achieve personalization is meta-learning, which optimizes the global model to enable fast convergence of user-local training by treating the training phase of FL as the training phase of meta-learning, and the personalization phase of the FL model as the testing phase of meta-learning [20].

In response to this, we propose a novel framework for intrusion detection in IoT traffic, termed Personalized Federated Learning Intrusion Detection (PerFLID). To simplify the description, we replace the client in FL with an IoT edge node. PerFLID enables the training of local models on edge nodes with different traffic data distributions in the IoT. In contrast to global models, the locally trained models exhibit enhanced personalization, making them optimal for intrusion detection. Specifically, we sample and extract features from the traffic information of each edge node, treating intrusion detection as a feature classification task with each edge node’s detection as a subtask. The central server sends malicious traffic detection models to the edge nodes, which are then trained collaboratively using a federated meta-learning approach. During the training process of each subtask, the edge nodes execute the model-agnostic meta-learning (MAML) algorithm locally to update parameters while retaining local feature information. With this approach, the global model converges with only a few rounds of gradient descent at the edge nodes, resulting in higher accuracy for personalized intrusion detection. To enable fast convergence during model training, we design a node selection strategy based on gradient feature clustering. The strategy clusters the edge nodes based on their gradient features in each communication round, calculates the probability of each node being selected based on the feature distribution, and samples the optimal subset of nodes participating in the aggregation. The representative gradient features of the edge nodes are preserved during the aggregation process at the central server, and each training round is run only once at the server, thus not introducing additional communication overhead. The main contributions of this paper are as follows:

  • We propose a Personalized Federated Learning Intrusion Detection Framework, enabling personalized intrusion detection for IoT devices facing attacks with unknown traffic types. This method leverages meta-learning to develop personalized models, effectively adapting to local data characteristics.
  • We design a node selection strategy that selects nodes based on the clustering results of local gradient features. This strategy is theoretically proven to accelerate the convergence speed during model training, thereby enhancing the overall efficiency of our framework.
  • We demonstrate that our proposed personalized traffic intrusion detection framework has a convergence upper bound. Furthermore, we conduct extensive simulation experiments with three neural network models on three NIDS datasets to verify that the proposed framework achieves higher detection accuracy with local adaptation and faster training convergence.

The rest of this paper is organized as follows. In Section II, we review the related work on FL and traffic intrusion detection and summarize the PFL approach. In Section III, we illustrate the underlying architecture with implementation details. In Section IV, we introduce our PerFLID workflow. In Section V, we perform a theoretical convergence analysis of PerFLID. In Section VI, we present and discuss our evaluation results. Finally, we present the conclusion in Section VII.

II. RELATED WORK

A. Federated Learning Based Traffic Intrusion Detection

Nguyen et al. proposed DIoT [15], a self-learning distributed system for detecting IoT devices infected with Mirai malware. This intrusion detection system uses FL methods for the first time, enabling distributed learning of models across multiple clients. However, this method only targets Mirai malware attacks and lacks a complete implementation of the intrusion detection FL framework. Later, several researchers applied different FL models to solve the IoT intrusion detection problem. Mothukuri et al. [12] proposed an anomaly detection method with FL, which protects user data privacy by training anomaly detection machine learning models on IoT devices without transmitting user data to a centralized server. They used long short-term memory (LSTM) and gated recurrent unit (GRU) neural network models to train ML models on Modbus network datasets. The results show the lowest error rate in predicting attacks and fewer false alarms than centralized machine learning. Aouedi et al. [21] implemented a semi-supervised learning scheme for intrusion detection in FL by training a local autoencoder (AE) to learn the features of intrusion data. Zhao et al. [22] proposed a network anomaly detection method based on FL and transfer learning to address the issue of scarce training data for network anomaly detection. The model achieved a success rate of 97.23% in detecting vulnerability attacks on the UNSW-NB15 dataset. To quickly detect APT attacks with privacy protection, Hu et al. [23] proposed a traffic detection method based on coalitional meta-learning. The method achieves detection accuracies of 86.67% and 67.6% on the CIC-IDS2017 and DAPT2020 datasets, respectively. Ding et al. [24] proposed a scalable NIDS for large-scale IoT networks. This system leverages a meta-learning framework to optimize the parallelism of GNN-based NIDS and introduces a coalition formation strategy to enhance the accuracy and reduce the communication overhead of the NIDS.

B. Personalized Federated Learning in IoT

Tan et al. [19] proposed to tackle two challenges in federated learning: slow convergence and subpar performance when dealing with highly heterogeneous (Non-IID) data, as well as the absence of personalized models for local tasks or datasets. Luca et al. [25] introduced the use of data augmentation to address the issue of out-of-domain generalization in federated learning. They demonstrated that suitable data augmentation could mitigate Non-IID effects in FL, thereby enhancing FL generalization and convergence from a causal perspective. Fraboni et al. [26] proposed a jointly learned client selection strategy based on cluster selection. Compared to multinomial distribution (MD) sampling, this strategy is unbiased and reduces the variance of random client aggregation. Besides data-based approaches, personalized federated learning can be implemented with a variety of model-based approaches, one of which is meta-learning. Meta-learning aims to enable models to rapidly learn new tasks based on existing knowledge. The MAML algorithm [27] possesses robust generalization capabilities and can be applied in various gradient-descent-based settings, such as supervised learning and reinforcement learning, hence its designation as a model-agnostic meta-learning algorithm. Fallah et al. [28] proposed Per-FedAvg, which incorporates the MAML formulation into FedAvg. They demonstrated that meta-learning can enhance the training of shared initial global models. Participants involved in the training, as well as new clients, only need to execute a few steps of gradient descent on local data. This enables adaptation to the local dataset and ultimately yields a more personalized model.

In our paper, we address the lack of local data adaptation in existing federated learning intrusion detection work by proposing a locally personalized scheme based on federated meta-learning. We provide detailed theoretical proof to support our approach. Additionally, to expedite the deployment of federated meta-learning, we have optimized the client aggregation strategy. This optimization accelerates the deployment of meta-learning without incurring additional resource overhead, and we also provide theoretical proof for this improvement.

III. PRELIMINARIES

A. Federated Meta Learning

First, let us briefly review the formulation of MAML. The core idea of MAML is to learn an effective initial set of model parameters, enabling the model to adapt quickly and perform well on new tasks with only a few gradient updates. In the context of federated learning, if we assume that each participating user takes the initial point and updates it using one step of gradient descent based on its own loss function, the federated learning global optimization objective $F(w)$ is defined as follows:

$$\min_{w \in \mathbb{R}^d} F(w) := \frac{1}{N} \sum_{i=1}^{N} f_i\big(w - \alpha \nabla f_i(w)\big). \tag{1}$$

Here, the local dataset $D_i$ is generated by post-processing the traffic files sampled from IoT devices and is then used to train the local model parameters $w_i$. The individual loss function is defined as $f_i(w)$, and the parameters $w_i$ are updated via gradient descent with respect to $f_i$. The step size $\alpha \ge 0$ represents the learning rate used by the node for training the model.

The advantage of this formulation is that both existing and new users can adopt it by using the solution of the new problem as an initial point and slightly adjusting it based on their own data. This means that users can take the initialized result and update it with their dataset $D_i$ by performing only one or a few gradient descent steps. Users define their meta-functions locally as follows:

$$F_i(w) := f_i\big(w - \alpha \nabla f_i(w)\big). \tag{2}$$
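Equations (1) and (2) can be illustrated with a minimal sketch. It assumes a hypothetical quadratic local loss $f_i(w) = \frac{1}{2}\lVert w - c_i \rVert^2$, chosen only because its gradient is easy to write by hand; the function names, the `centers` values, and the step size are illustrative assumptions and not the paper's detection loss or settings.

```python
# Illustrative only: a toy quadratic loss stands in for a node's detection loss.

def loss(w, center):
    """f_i(w) = 0.5 * ||w - center||^2 (hypothetical local loss)."""
    return 0.5 * sum((wj - cj) ** 2 for wj, cj in zip(w, center))

def grad(w, center):
    """Gradient of the toy loss: w - center."""
    return [wj - cj for wj, cj in zip(w, center)]

def adapt(w, center, alpha):
    """One local gradient step: w - alpha * grad f_i(w)."""
    return [wj - alpha * gj for wj, gj in zip(w, grad(w, center))]

def meta_loss(w, center, alpha):
    """F_i(w) = f_i(w - alpha * grad f_i(w)), as in Equation (2)."""
    return loss(adapt(w, center, alpha), center)

def global_objective(w, centers, alpha):
    """F(w) = (1/N) * sum_i F_i(w), as in Equation (1)."""
    return sum(meta_loss(w, c, alpha) for c in centers) / len(centers)

centers = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]  # one toy optimum per node
w0 = [0.0, 0.0]
print(global_objective(w0, centers, alpha=0.1))
```

For this particular toy loss, one adaptation step shrinks the distance to the node's optimum by a factor of $(1 - \alpha)$, so $F_i(w) = (1 - \alpha)^2 f_i(w)$; real detection losses do not have such a closed form.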

After the server completes global model aggregation, the updated model is dispatched to the nodes to start a new round of meta-learning training. Through multiple rounds of communication, the initial model $F(w)$ is attained. The model update algorithm is discussed in detail in Section IV.

B. Data Heterogeneity

We define the traffic dataset $D=\{x,y\}$ as the global dataset. Here, $x$ represents the features extracted from network traffic that includes benign traffic as well as the latest common network attacks (such as Fuzzer, Backdoor, Exploits, DDoS, etc.), including attributes such as timestamps, source and destination IP addresses, source and destination ports, and protocol type. The label $y$ corresponds to $x$, representing the type of network attack. Consider $N$ IoT devices, labeled as $C_1, C_2, \ldots, C_N$, acting as nodes. Each node has its own local dataset, denoted as $D_i=\{x_i,y_i\}$. Similarly, the label $y_i$ of the local data from edge IoT devices corresponds to the type of network attack occurring within the local network of the device. We denote the local dataset size and the global dataset size by $|D_i|$ and $|D|$, respectively. Here, $|D|=\sum_{i}^{N}|D_i|$, i.e., the total number of global samples equals the sum of local samples. The distribution of traffic data from real IoT devices is diverse, and privacy concerns make it challenging to construct meaningful federated traffic datasets. Following the suggestion by Li et al. [29], dividing the real dataset into a distributed Non-IID dataset helps balance the local data across nodes while facilitating effective FL experiments. In this section, we discuss how to partition the traffic dataset. Initially, we assume that traffic data on any device is associated with one label. We define the local data distribution $P(x_i,y_i)=P(x_i \mid y_i)P(y_i)$ or $P(x_i,y_i)=P(y_i \mid x_i)P(x_i)$. Drawing from Non-IID classification [30], we propose three methods for partitioning federated traffic data:

1) Label Imbalance: In most IoT devices, the types of attacks originating from device vulnerabilities are typically limited, resulting in different label distributions for each node's dataset, i.e., the distribution $P(y_i)$ varies across nodes. To quantify this, we use the matrix $X \in \mathbb{R}^{K \times N}$ as the label distribution matrix, where $K$ represents the number of attack categories and $N$ represents the number of nodes. Each row vector $x_k \in \mathbb{R}^{N}$ represents the probability distribution of category $k$ across different nodes, with each dimension of the vector indicating the proportion of samples from category $k$ assigned to the corresponding node. To define a random variable for each category and allow the balance of the distribution to be adjusted, we use the Dirichlet distribution, which is suitable for multivariate random variables and has been widely discussed in studies of data heterogeneity in FL [31]. Therefore, we sample from $P_K \sim DP(\alpha, G_0)$ and select a random vector as the probability distribution vector $x_k \in \mathbb{R}^{N}$ for the labels. Here, $G_0$ represents the base distribution for a single node, and the imbalance can be adjusted through the parameter $\alpha$, where smaller values of $\alpha$ result in greater imbalance.

2) Feature Imbalance: Abnormal activities in IoT traffic data differ significantly from normal system activities. Even within data sampled from the same device, abnormal and normal traffic data possess different traits. We denote these feature imbalances as imbalances in the distribution $P(x_i)$.

3) Quantity Imbalance: Differences in the communication capabilities of IoT devices lead to variations in the scale of their traffic data, resulting in different sizes $|D_i|$ for each node's local dataset. Similar to label imbalance, we utilize the Dirichlet distribution to allocate varying quantities of labeled samples to each node. The random vector $x_n \in \mathbb{R}^{N}$ represents the distribution of sample quantities for each category across nodes, with each dimension of the vector indicating the proportion of samples assigned to the corresponding node. The vector $x_n$ is sampled from the distribution $P_n \sim DP(\alpha, G_0)$. The product $x_n \times |D| = \{|D_i|\}_{i=1}^{N}$ corresponds to the local dataset size for each label at a node.

This paper simultaneously employs the three aforementioned partitioning methods to divide the traffic dataset, thereby creating local datasets for experiments and achieving a Non-IID data distribution for IoT devices.
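The Dirichlet-based split described above can be sketched as follows. This is a simplified illustration under stated assumptions: it draws each row $x_k$ from a finite symmetric Dirichlet distribution (built from normalized Gamma draws) rather than from the Dirichlet process with base distribution $G_0$ named in the text, and the names `dirichlet` and `partition_by_label` as well as the toy labels are hypothetical.

```python
import random

def dirichlet(alpha, n, rng):
    """Sample an n-dimensional symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def partition_by_label(labels, num_nodes, alpha, seed=0):
    """Assign sample indices to nodes; each class k is split using a row x_k ~ Dirichlet(alpha)."""
    rng = random.Random(seed)
    nodes = [[] for _ in range(num_nodes)]
    for k in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == k]
        rng.shuffle(idx)
        x_k = dirichlet(alpha, num_nodes, rng)  # share of class k given to each node
        start, cum = 0, 0.0
        for node, p in enumerate(x_k):
            cum += p
            # The last node takes the remainder so that no sample is dropped.
            end = len(idx) if node == num_nodes - 1 else round(cum * len(idx))
            nodes[node].extend(idx[start:end])
            start = end
    return nodes

labels = [i % 4 for i in range(400)]  # toy labels: 4 attack classes, 100 samples each
parts = partition_by_label(labels, num_nodes=5, alpha=0.5)
print([len(p) for p in parts])  # local dataset sizes differ across nodes
```

Because each class is split with its own random proportions, the resulting local datasets differ both in label mix and in size, so this sketch covers label and quantity imbalance together. Feature imbalance is not modeled here.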

IV. THE PERFLID WORKFLOW

A. Architecture Design of Personalized Federated Meta-Learning Intrusion Detection

In the IoT scenario, the PFL framework for traffic intrusion detection is shown in Figure 1 and consists of three layers: the node layer, which includes IoT edge devices, security gateways, and firewalls; the data transmission layer, which mainly consists of communication base stations; and the central server layer, which is usually deployed on a cloud server. The cloud server is responsible for training and maintaining the global model $M_g$. After the training is completed, each device in the node layer performs several rounds of local updates based on $M_g$ to obtain the local personalized model $PM_l = \{pm_l^0, \ldots, pm_l^N\}$. The specific implementation scheme of PerFLID is shown in Figure 2. The final trained local personalized model $PM_l$ can be integrated into a traffic intrusion detection tool to deploy effective traffic detection locally. We divide the process from data collection to model training into the following five steps:

1) To identify unknown network attacks, IoT end devices use the open-source tool CICFlowMeter for traffic characterization. The tool samples traffic flowing through the gateway firewall, extracts feature information from packets stored in Packet Capture (PCAP) files, and builds a local dataset in Comma Separated Values (CSV) format.

2) The central server uses the initialization parameter $w_0$ to train the initial model, which is subsequently transmitted to the selected edge nodes via the IoT. The local detection model $M_l$ of each edge node is kept the same as that of the central server. The task of this model is to learn traffic features and perform multi-class prediction. Because the traffic feature data collected by the edge devices exhibit uneven label, feature, and volume distributions, we use $F_i(w)$ in Equation (2) as the loss function on the edge devices for local data adaptation. This facilitates training the initial model on local traffic features and updating the model parameters $w_t$.

3) Once an edge node has updated its model parameters through meta-learning, it uploads the new model parameters $w_t$ to the central server via the base station. Subsequently, the central server performs a weighted aggregation, using the size of each node's data as the weight, resulting in new parameters $w_{t+1}$ for $M_g$.

4) The central server sends the aggregated parameter $w_{t+1}$ to the selected edge nodes. The probability of each node being selected, determined by the gradient parameters of the local model $M_l$ uploaded to the server, is calculated as shown in Algorithm 1. Then the edge node performs further meta-learning training and continues the iterative process described earlier.

Fig. 2. Overview of the PerFLID framework.

    5) After the last round of model aggregation, the global model $M_g$ is distributed to all edge nodes. Each edge node then trains this model for one or several rounds on its local dataset and finally obtains its personalized model $PM_l^i$.

    5) 在最后一轮模型聚合之后,全局模型 $M_g$ 被分发到所有边缘节点。随后,每个边缘节点基于该模型使用本地数据集进行一轮或多轮训练,最终得到个性化模型 $PM_l^i$。

    B. Meta-Learning Training in Edge Node

    B. 边缘节点的元学习训练

    Each device can ultimately train its personalized model from the initial model to solve the objective function in Equation (2), while enabling the server to discern the differences between the edge node models. Algorithm 1 outlines the meta-learning training process. The first step computes the gradient $\nabla F_i(w)$ of each edge device's local loss function:

    每个设备最终都可以基于初始模型训练其个性化模型,以求解公式(2)中的目标函数,并使服务器能够识别不同边缘节点模型之间的差异。算法1概述了元学习训练过程。第一步是计算每个边缘设备的局部损失函数的梯度 $\nabla F_i(w)$:

    $$\nabla F_i(w) = \left(I - \alpha \nabla^2 f_i(w)\right) \nabla f_i\left(w - \alpha \nabla f_i(w)\right). \tag{3}$$

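    Computing the exact gradient $\nabla f_i(w)$ in every round is usually computationally expensive. We therefore draw a batch of data $\mathcal{D}_i$ and use the unbiased estimate $\tilde{\nabla} f_i(w, \mathcal{D}_i)$:

    通常,每一轮计算梯度 $\nabla f_i(w)$ 都需要较高的计算成本。因此,我们采用一批数据 $\mathcal{D}_i$ 并使用无偏估计 $\tilde{\nabla} f_i(w, \mathcal{D}_i)$:

    $$\tilde{\nabla} f_i(w, \mathcal{D}_i) := \frac{1}{|\mathcal{D}_i|} \sum_{(x,y) \in \mathcal{D}_i} \nabla l_i(w; x, y). \tag{4}$$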

    Here, $l_i(w; x, y)$ is the loss function of the model trained on the edge device. In the second step, during the $k$-th round of communication, the server first selects the edge devices that receive the current global model $w^k$ according to the node selection strategy. Each edge device $i \in S_k$ then executes $\tau$ steps of stochastic gradient descent with respect to $F_i$. These local updates generate a local sequence $\{w_{i,t}^{k+1}\}_{t=0}^{\tau}$, where $w_{i,0}^{k+1} = w_i^{k+1}$, $w_{i,\tau}^{k+1} = w_i^{k+2}$, and for $1 \le t \le \tau$:

    在这个过程中,$l_i(w; x, y)$ 表示边缘设备训练模型的损失函数。在第二步,在第 $k$ 轮通信期间,服务器首先根据节点选择策略为当前全局模型 $w^k$ 选择边缘设备。然后,每个边缘设备 $i \in S_k$ 针对 $F_i$ 执行 $\tau$ 步随机梯度下降。这些局部更新生成一个局部序列 $\{w_{i,t}^{k+1}\}_{t=0}^{\tau}$,其中 $w_{i,0}^{k+1} = w_i^{k+1}$,$w_{i,\tau}^{k+1} = w_i^{k+2}$,并且对于 $1 \le t \le \tau$ 有:

    $$w_{i,t}^{k+1} = w_{i,t-1}^{k+1} - \beta \tilde{\nabla} F_i\left(w_{i,t-1}^{k+1}\right), \tag{5}$$

    Here, $S_k$ denotes the set of nodes participating in the iteration, $\beta$ is the learning rate of the local updates, and the superscript $k+1$ indicates training that uses the meta-model parameters of round $k$. $\tilde{\nabla} F_i(w_{i,t-1}^{k+1})$ denotes the estimate of $\nabla F_i(w_{i,t-1}^{k+1})$ given in Equation (7). Summing both sides of Equation (5) over $1 \le t \le \tau$, we obtain:

    $S_k$ 表示参与迭代的节点集合,$\beta$ 表示局部更新的学习率,上标 $k+1$ 表示使用第 $k$ 轮的元模型参数进行训练。$\tilde{\nabla} F_i(w_{i,t-1}^{k+1})$ 表示方程 (7) 中给出的 $\nabla F_i(w_{i,t-1}^{k+1})$ 的估计值。然后,对 $1 \le t \le \tau$ 将方程 (5) 的两边求和,我们得到:

    $$w_i^{k+2} = w_i^{k+1} - \beta \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k+1}\right). \tag{6}$$

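    In the final step, independent batches $\mathcal{D}_{i,t}$, $\mathcal{D}'_{i,t}$, and $\mathcal{D}''_{i,t}$ are used to compute the stochastic gradient $\tilde{\nabla} F_i(w_{i,t-1}^{k+1})$ for all local iterations, as follows:

    在最后一步中,使用独立批次 $\mathcal{D}_{i,t}$、$\mathcal{D}'_{i,t}$ 和 $\mathcal{D}''_{i,t}$ 来计算所有局部迭代的随机梯度 $\tilde{\nabla} F_i(w_{i,t-1}^{k+1})$,如下所示:

    $$\tilde{\nabla} F_i\left(w_{i,t-1}^{k+1}\right) := \left(I - \alpha \tilde{\nabla}^2 f_i\left(w_{i,t-1}^{k+1}, \mathcal{D}''_{i,t}\right)\right) \cdot \tilde{\nabla} f_i\left(w_{i,t-1}^{k+1} - \alpha \tilde{\nabla} f_i\left(w_{i,t-1}^{k+1}, \mathcal{D}_{i,t}\right), \mathcal{D}'_{i,t}\right). \tag{7}$$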

    Once the local update $w_{i,\tau}^{k+1}$ is complete, the edge node uploads it to the server. The server then computes the global model $w^{k+1}$ using the aggregation weight $p_i^k$ of each node:

    一旦本地更新 $w_{i,\tau}^{k+1}$ 完成,边缘节点将其上传到服务器。随后,服务器根据每个节点的聚合权重 $p_i^k$ 计算全局模型 $w^{k+1}$:

    $$w^{k+1} = \sum_{i \in S_k} p_i^k \, w_{i,\tau}^{k+1}. \tag{8}$$

    Finally, the server distributes the global model $w^{k+1}$ to the selected set of nodes $S_k$ and starts a new round of training. After a certain number of training rounds, each node locally updates its model $M_l$, ultimately obtaining a personalized model $PM_l$. Compared to $M_l$, the personalized model $PM_l$ derived from our method adapts better to local data. We validate the effectiveness of our method theoretically in Section V and experimentally in Section VI.

    最后,服务器将全局模型 $w^{k+1}$ 分发到选定的节点集 $S_k$ 并启动新一轮训练。经过一定数量的训练轮次后,每个节点在本地更新其模型 $M_l$,最终获得个性化模型 $PM_l$。与 $M_l$ 相比,我们的方法得到的个性化模型 $PM_l$ 能更好地适应本地数据。我们将分别在第五节和第六节从理论和实验上验证我们方法的有效性。
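The meta-gradient of Equation (3) and the local update of Equation (5) can be illustrated on a toy problem. The sketch below is not the PerFLID implementation: it assumes a hypothetical quadratic loss $f_i(w) = \frac{1}{2}\sum_j a_j (w_j - c_j)^2$, whose gradient and diagonal Hessian are known in closed form, and it uses exact gradients in place of the batch estimates of Equation (7). All function names are illustrative.

```python
# Toy loss (an assumption for illustration): f_i(w) = 0.5 * sum_j a_j * (w_j - c_j)^2.
# Its gradient is a_j * (w_j - c_j) and its Hessian is diag(a).

def loss_f(w, a, c):
    return 0.5 * sum(aj * (wj - cj) ** 2 for wj, aj, cj in zip(w, a, c))

def grad_f(w, a, c):
    return [aj * (wj - cj) for wj, aj, cj in zip(w, a, c)]

def meta_loss(w, a, c, alpha):
    """F_i(w) = f_i(w - alpha * grad f_i(w)): the loss after one adaptation step."""
    g = grad_f(w, a, c)
    adapted = [wj - alpha * gj for wj, gj in zip(w, g)]
    return loss_f(adapted, a, c)

def meta_grad(w, a, c, alpha):
    """Equation (3): (I - alpha * Hessian) applied to grad f_i at the adapted point.
    The Hessian is diagonal here, so the matrix product is elementwise."""
    g = grad_f(w, a, c)
    adapted = [wj - alpha * gj for wj, gj in zip(w, g)]
    g_adapted = grad_f(adapted, a, c)
    return [(1.0 - alpha * aj) * gj for aj, gj in zip(a, g_adapted)]

def local_meta_updates(w, a, c, alpha, beta, tau):
    """Equation (5) applied tau times, with exact rather than batch gradients."""
    for _ in range(tau):
        g = meta_grad(w, a, c, alpha)
        w = [wj - beta * gj for wj, gj in zip(w, g)]
    return w

def numeric_meta_grad(w, a, c, alpha, eps=1e-6):
    """Central finite differences of F_i, used only to sanity-check meta_grad."""
    out = []
    for j in range(len(w)):
        hi, lo = list(w), list(w)
        hi[j] += eps
        lo[j] -= eps
        out.append((meta_loss(hi, a, c, alpha) - meta_loss(lo, a, c, alpha)) / (2 * eps))
    return out
```

Comparing `meta_grad` with `numeric_meta_grad` confirms that the closed form matches the derivative of the adapted loss, and running `local_meta_updates` with a small $\beta$ lowers `meta_loss` on this toy problem.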

    C. Node Selection Based on Gradient Similarity Clustering

    C. 基于梯度相似度聚类的节点选择

    In FL, the global model is obtained by aggregating all nodes, that is, by computing a weighted sum of the local model parameters $w_i^t$ that the nodes return to the server:

    在联邦学习(FL)中,全局模型是通过聚合所有节点并对节点返回给服务器的局部模型参数 $w_i^t$ 进行加权求和得到的:

    $$w^t = \sum_{i \in N} p_i \, w_i^t. \tag{9}$$

    Here, $p_i = |\mathcal{D}_i| / |\mathcal{D}|$ denotes the aggregation weight of node $i$, which is the ratio of the number of local samples to the number of global samples. Since the FedAvg algorithm does not involve all nodes in each round of aggregation but randomly selects $m = rN$ nodes according to the ratio $r$, the resulting weighted global model is:

    这里,$p_i = |\mathcal{D}_i| / |\mathcal{D}|$ 表示节点 $i$ 的聚合权重,即局部样本数量与全局样本数量的比例。由于联邦平均(FedAvg)算法在每一轮聚合中并不涉及所有节点,而是根据比例 $r$ 随机选择 $m = rN$ 个节点,因此最终带权重的全局模型为:

    $$w^t = \sum_{i \in S_k} p_i \, w_i^t + \sum_{i \notin S_k} p_i \, w_i^{t-1}. \tag{10}$$
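As a minimal sketch of Equations (9) and (10), the snippet below computes the sample-size weights $p_i$ and aggregates parameter vectors under full and partial participation. Models are represented as plain Python lists, and the function names are illustrative assumptions rather than part of PerFLID.

```python
def aggregation_weights(sample_counts):
    """p_i = |D_i| / |D| for every node."""
    total = sum(sample_counts)
    return [count / total for count in sample_counts]

def aggregate_full(models, weights):
    """Equation (9): weighted sum of the parameters of all nodes."""
    dim = len(models[0])
    return [sum(p * model[j] for p, model in zip(weights, models)) for j in range(dim)]

def aggregate_partial(new_models, old_models, weights, selected):
    """Equation (10): selected nodes contribute their fresh parameters,
    all other nodes contribute the parameters from the previous round."""
    mixed = [
        new_models[i] if i in selected else old_models[i]
        for i in range(len(weights))
    ]
    return aggregate_full(mixed, weights)
```

For example, with sample counts `[100, 300, 600]` the weights are 0.1, 0.3, and 0.6, and a node that is not in `selected` still enters the sum with its previous parameters.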

    Algorithm 1 PerFLID Workflow

    算法1: PerFLID工作流程


    In Equation (10), the set of nodes $S_k$ is randomly selected by the server. However, this random selection strategy poses potential security risks in real-world federated intrusion detection scenarios. If a malicious node exists, it may be randomly selected to join the set $S_k$ and participate in the aggregation process through backdoor attacks. Once a node is subjected to an untargeted backdoor attack, the task accuracy of the global model is compromised [32]. Under such influence, the global model may ignore the features of some key nodes, worsening the convergence of the global model. Consequently, the generalization performance of the final trained initial model is affected, hindering the training of user-personalized models.

    在公式 (10) 中,节点集 $S_k$ 由服务器随机选择。然而,这种随机选择策略在现实世界的联邦入侵检测场景中存在潜在的安全风险。如果存在恶意节点,它可能会被随机选中加入集合 $S_k$,并通过后门攻击参与聚合过程。一旦某个节点遭受无针对性的后门攻击,全局模型的任务准确性就会受到影响 [32] 。在这种影响下,全局模型可能会忽略一些关键节点的特征,从而恶化全局模型的收敛效果。因此,最终训练的初始模型的泛化性能将受到影响,阻碍用户个性化模型的训练过程。

    Several algorithms have been proposed to address backdoor attacks on federated learning models, including distance-based defense methods [33] and defense methods based on Singular Value Decomposition (SVD) and clustering [34]. Our proposed aggregation method is similar to [34]; however, our paper places greater emphasis on designing a new node selection strategy for the rapid aggregation of local personalized models. This strategy must be unbiased: the expected aggregate over the selected nodes must equal the global aggregate obtained by selecting all nodes, as shown in the following equation:

    为解决联邦学习模型中的后门攻击问题,人们已经提出了多种算法,包括基于距离的防御方法[33]以及基于奇异值分解(SVD)和聚类的防御方法[34]。我们提出的聚合方法与文献[34]类似,不过,本文更侧重于设计一种新的节点选择策略,用于快速聚合本地个性化模型。该策略必须是无偏的,并且要确保每个被选节点的总期望值等于选择所有节点所得到的全局聚合值,如以下公式所示:

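    $$\mathbb{E}_{S_k}\left[w^t\right] = \mathbb{E}_{S_k}\Big[\sum_{j \in S_k} p_j \, w_j^t\Big] := \sum_{i \in N} p_i \, w_i^t. \tag{11}$$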

    In Algorithm 1, the server retains the model parameters of the edge nodes from the previous round of model communication. We assume that the model parameters are represented as vectors. Many studies have explored node clustering in client selection, using clustering algorithms such as DBSCAN [35], hierarchical clustering [26], and K-means [36]. Because local models are large and contain high-dimensional features, some works have proposed their own distance metrics [33]. In our approach, we measure the similarity of node models with cosine similarity, convert it to an angle with arccos, and then group the node models by hierarchical clustering. The server measures the similarity between each pair of client model parameters using the cosine similarity, which is calculated as follows:

    在算法1中,服务器保留上一轮模型通信中边缘节点的模型参数。我们假设模型参数以向量形式表示。许多研究在客户端选择中探索了节点聚类,采用了各种聚类算法,如DBSCAN [35]、层次聚类[26]和K-means [36]。由于局部模型规模较大且模型中存在高维特征,一些研究提出了自己的距离度量方法[33]。在我们的方法中,我们使用余弦相似度来衡量节点模型的相似度,然后使用反余弦(Arccos)和层次聚类对节点模型进行分类。然后,服务器使用余弦相似度来衡量每对客户端模型参数之间的相似度,其计算方法如下:

    $$\operatorname{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \cdot \|B\|}. \tag{12}$$

    where $A \cdot B$ is the dot product of vectors $A$ and $B$, and $\|A\|$ and $\|B\|$ are the norms of vectors $A$ and $B$, respectively. Next, the cosine similarity is converted to an angle using the inverse cosine function (arccos), which better represents the differences between the vectors:

    其中 $A \cdot B$ 是向量 $A$ 和 $B$ 的点积,$\|A\|$ 和 $\|B\|$ 分别是向量 $A$ 和 $B$ 的模。接下来,使用反余弦函数(arccos)将余弦相似度转换为角度,以便更好地表示向量之间的差异,其计算方法为:

    $$\theta = \arccos\big(\operatorname{cosine\_similarity}(A, B)\big). \tag{13}$$

    where $\theta$ is the angle between the vectors $A$ and $B$. The angle between the gradients of each pair of nodes is then computed to obtain the angle matrix $\Theta$. Finally, a clustering algorithm (such as K-means or hierarchical clustering) is applied to the angle matrix $\Theta$ to obtain $m$ clusters.

    其中 $\theta$ 是向量 $A$ 和 $B$ 之间的夹角。然后计算每对节点梯度之间的夹角,得到夹角矩阵 $\Theta$。最后,选择一种聚类算法(如 K-均值聚类、层次聚类等)对夹角矩阵 $\Theta$ 进行聚类,得到 $m$ 个类别。
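The pipeline of Equations (12) and (13) followed by clustering can be sketched in pure Python. This is one possible reading, not the authors' code: it treats the angle matrix $\Theta$ as a pairwise distance matrix and applies a simple average-linkage agglomerative clustering until $m$ clusters remain. The linkage choice and all function names are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Equation (12): dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angle_matrix(vectors):
    """Equation (13) for every pair of vectors, giving the matrix Theta."""
    n = len(vectors)
    theta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Clamp to [-1, 1] to guard against floating-point overshoot.
            cos = max(-1.0, min(1.0, cosine_similarity(vectors[i], vectors[j])))
            theta[i][j] = theta[j][i] = math.acos(cos)
    return theta

def cluster_by_angle(theta, m):
    """Average-linkage agglomerative clustering on Theta until m clusters remain."""
    clusters = [[i] for i in range(len(theta))]
    while len(clusters) > m:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = sum(theta[i][j] for i in clusters[a] for j in clusters[b])
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]
```

With real model updates, the input vectors would be the flattened gradients or parameters uploaded by the nodes, and a library implementation of hierarchical clustering would normally replace the quadratic loop used here.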

    Note that the number of clusters should match the number of nodes sampled by the server to ensure unbiased selection. The probability of each node being selected in each draw is then calculated from the number of samples in the cluster, generating $m$ independent probability distributions $\{R_t^k\}_{t=1}^{m}$.

    注意,聚类的数量应与服务器采样的节点数量相匹配,以确保选择的无偏性。随后,根据聚类中的样本数量计算每次选择时每个节点被选中的概率,生成 $m$ 个独立的概率分布 $\{R_t^k\}_{t=1}^{m}$。

    Although our gradient clustering-based node selection method is not specifically designed for backdoor attack scenarios, it can still screen out malicious nodes by leveraging the gradient differences between nodes. This capability enables it to be combined with other secure aggregation methods to defend against backdoor attacks in each round of federated learning, thereby circumventing security risks during the training of personalized NIDS models.

    尽管我们基于梯度聚类的节点选择方法并非专门为后门攻击场景设计,但它仍可以利用节点之间的梯度差异筛选出恶意节点。这种能力使其能够与其他安全聚合方法相结合,在每一轮联邦学习中抵御后门攻击,从而规避个性化网络入侵检测系统(NIDS)模型训练过程中的安全风险。

    We prove in Section V that our algorithm satisfies the assumption of unbiased selection. Although Algorithm 1 must be executed in every round of communication, it incurs no additional communication overhead, and our experimental results in Section VI also show faster convergence during training.

    我们将在第五节证明我们的算法满足无偏选择的假设。尽管算法1需要在每一轮通信中执行,但它不会产生额外的通信开销,并且我们在第六节的实验结果也表明训练期间的收敛速度更快。

    V. Convergence Analysis of PerFLID

    五、PerFLID的收敛性分析

    In this section, we provide a theoretical analysis of how PerFLID improves the convergence speed of federated learning. This includes the model assumptions used to analyze the convergence of federated meta-learning in Section V-A, as well as an analysis of how meta-learning with randomly participating nodes in aggregation affects convergence, as discussed in Section V-B. Furthermore, we present a theoretical examination of the strategy for selecting sets for updates based on node gradient clustering in each training round. We demonstrate that this strategy leads to a more effective lower bound for aggregation selection, as shown in Section V-C and Section V-D.

    在本节中,我们对PerFLID如何提高联邦学习的收敛速度进行理论分析。这包括在第五节A中用于分析联邦元学习收敛性的模型假设,以及在第五节B中讨论的聚合中随机参与节点的元学习如何影响收敛的分析。此外,我们对每一轮训练中基于节点梯度聚类选择更新集合的策略进行理论检验。我们证明了该策略能为聚合选择带来更有效的下界,如第五节C和第五节D所示。

    A. Modeling Assumptions

    A. 建模假设

    We concentrate on non-convex settings and characterize the communication between the server and nodes in two consecutive rounds to ensure that the expected decrease in the global loss function satisfies $\mathbb{E}[F(w^{k+1})] - \mathbb{E}[F(w^k)] \le \varepsilon$. For the theoretical analysis of PerFLID, several standard assumptions are typically made on federated meta-learning (FML) models (see, e.g., [28], [37], [38]). We use the following assumptions in our analysis:

    我们专注于非凸设置,并刻画了服务器和节点在连续两轮中的通信情况,以确保全局损失函数的预期下降满足 $\mathbb{E}[F(w^{k+1})] - \mathbb{E}[F(w^k)] \le \varepsilon$。对于PerFLID的理论分析,通常会对联邦元学习(FML)模型做出几个标准假设(例如,参见[28]、[37]、[38])。我们将在分析中使用以下假设:

    Assumption 1: The gradient of each function $f_i$ is bounded by a nonnegative constant $U_i$, i.e., $\|\nabla f_i(w)\| \le U_i$.

    假设1:函数 $f_i$ 的梯度由一个非负常数 $U_i$ 界定,即 $\|\nabla f_i(w)\| \le U_i$。

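    Assumption 2: Function $f_i$ is twice continuously differentiable and $L_i$-smooth for every node $i \in \{1, \dots, N\}$, i.e., $\|\nabla f_i(w) - \nabla f_i(u)\| \le L_i \|w - u\|$ for all $w, u \in \mathbb{R}^d$. The gradient Lipschitz assumption also implies that $f_i$ satisfies the following condition for all $w, u \in \mathbb{R}^d$:

    假设2:函数 $f_i$ 二次连续可微,并且对于每个节点 $i \in \{1, \dots, N\}$ 是 $L_i$-光滑的,即对于所有 $w, u \in \mathbb{R}^d$ 有 $\|\nabla f_i(w) - \nabla f_i(u)\| \le L_i \|w - u\|$。梯度利普希茨(Lipschitz)假设还意味着,对于所有 $w, u \in \mathbb{R}^d$,$f_i$ 满足以下条件:

    $$f_i(w) - f_i(u) - \langle \nabla f_i(u), w - u \rangle \le \frac{L_i}{2} \|w - u\|^2. \tag{14}$$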

    As discussed in Section IV-B, the update rules in Algorithm 1 involve the second derivatives of all nodes' loss functions. Therefore, it is necessary to impose a regularity (Lipschitz continuity) assumption on the Hessian matrices of all $f_i$.

    如第四节B中所讨论的,算法1中的更新规则涉及所有节点损失函数的二阶导数。因此,有必要对所有 $f_i$ 的海森(Hessian)矩阵施加正则性(利普希茨连续性)假设。

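    Assumption 3: For every node $i \in \{1, \dots, N\}$, the Hessian of function $f_i$ is $\rho_i$-Lipschitz continuous, i.e.,

    假设3:对于每个节点 $i \in \{1, \dots, N\}$,函数 $f_i$ 的海森矩阵是 $\rho_i$-利普希茨连续的,即:

    $$\|\nabla^2 f_i(w) - \nabla^2 f_i(u)\| \le \rho_i \|w - u\|, \quad \forall w, u \in \mathbb{R}^d. \tag{15}$$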

    A significant disparity in loss between the nodes and the server can hinder model convergence in each communication round. Therefore, we make the following similarity assumption about their gradients.

    节点与服务器之间的损失存在显著差异,这可能会阻碍每一轮通信中的模型收敛。因此,我们对它们的梯度做出以下相似性假设。

    Assumption 4: The gradient of the local function $F_i(w)$ is at most $B$-locally dissimilar from that of $F(w)$ for each node $i$, i.e., $\|\nabla F_i(w)\| \le B_i \|\nabla F(w)\|$. Taking the expectation over the nodes on both sides gives $\mathbb{E}_i \|\nabla F_i(w)\|^2 \le \|\nabla F(w)\|^2 B^2$.

    假设4:对于每个节点 $i$,局部函数 $F_i(w)$ 的梯度与 $F(w)$ 的梯度至多存在 $B$-局部差异,即 $\|\nabla F_i(w)\| \le B_i \|\nabla F(w)\|$。对两边关于节点取期望可得 $\mathbb{E}_i \|\nabla F_i(w)\|^2 \le \|\nabla F(w)\|^2 B^2$。

    In the remainder of this paper, we define $\rho := \max_i \rho_i$ and $L := \max_i L_i$, which can be regarded as the bounds in the corresponding assumptions.

    在本文的其余部分,我们定义 $\rho := \max_i \rho_i$ 和 $L := \max_i L_i$,它们可以被视为相应假设中的界。

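    Assumption 5: The variance of the difference between the local and global model parameters is bounded for all nodes $i$ at any local update time $t$, i.e.,

    假设 5:在任何局部更新时间 $t$,所有节点 $i$ 的局部和全局模型参数之间差异的方差是有界的,即

    $$\mathbb{E}\Big[\frac{1}{n} \sum_{i=1}^{n} \left\|w_{i,t}^{k} - w_{t}^{k}\right\|^2\Big] \le \beta^2 \sigma_w^2 \quad \text{for any } 0 \le t \le \tau. \tag{16}$$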

    The discrepancies in gradients and Hessians between the nodes and the server during each training round are influenced by the participating training datasets $\mathcal{D}_i$, $\mathcal{D}'_i$, and $\mathcal{D}''_i$. We make the following assumption about these differences.

    在每个训练轮次中,节点与服务器之间的梯度和海森矩阵的差异受参与训练的数据集 $\mathcal{D}_i$、$\mathcal{D}'_i$ 和 $\mathcal{D}''_i$ 的影响。我们对梯度和海森矩阵的差异做出如下假设。

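    Assumption 6: The gradient estimate $\tilde{\nabla} F_i(w)$ in (7), which is computed using $\mathcal{D}_i$, $\mathcal{D}'_i$, and $\mathcal{D}''_i$, is close enough to the true value, i.e.,

    假设 6:在 (7) 中使用 $\mathcal{D}_i$、$\mathcal{D}'_i$ 和 $\mathcal{D}''_i$ 计算得到的梯度估计值 $\tilde{\nabla} F_i(w)$ 足够接近真实值,即

    $$\begin{aligned}
    \left\|\mathbb{E}\big[\tilde{\nabla} F_i(w) - \nabla F_i(w)\big]\right\| &\le \sigma_{\varphi_D}^2 := \varphi(\mathcal{D}_i), \\
    \mathbb{E}\big[\|\tilde{\nabla} F_i(w) - \nabla F_i(w)\|^2\big] &\le \sigma_{\phi_D}^2 := \phi(\mathcal{D}_i, \mathcal{D}'_i, \mathcal{D}''_i).
    \end{aligned} \tag{17}$$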

    B. Meta-Learning With Random Node Selection

    B. 随机节点选择的元学习

    As discussed in Section IV-B, when there are numerous nodes, the standard federated meta-learning algorithm randomly selects $m$ nodes as the set $S_k$ to perform local training in each communication round. How much the nodes improve the global model in each round of updates depends on the distribution of each node's data. Therefore, random selection cannot consistently pick the nodes that contribute optimally to global updates. To prove this point, we first establish the intermediate result that, under Assumptions 1 to 3, the local meta-function $F_i(w)$ defined by Equation (2) and the average of the nodes' meta-functions $F(w) = \frac{1}{N} \sum_{i=1}^{N} F_i(w)$ are both smooth.

    如第四节B部分所述,当存在大量节点时,标准联邦元学习算法会在每一轮通信中随机选择 $m$ 个节点作为集合 $S_k$ 来进行局部训练。每一轮更新中节点对全局模型的增强效果会受到每个节点数据分布的影响。因此,随机选择无法始终挑选出对全局更新贡献最优的节点。为了证明这一点,我们首先需要建立一个中间结果,即在假设1至3的条件下,由公式(2)定义的局部元函数 $F_i(w)$ 和各节点元函数的平均值 $F(w) = \frac{1}{N} \sum_{i=1}^{N} F_i(w)$ 都是光滑的。

    Lemma 1: If Assumptions 1, 2, and 3 hold, then $F_i$ in Equation (2) with $\alpha \in (0, 1/L]$ is smooth with parameter $L_F := 4L + \alpha\beta U$. Hence, $F(w)$ is also smooth with parameter $L_F$.

    引理1:若假设1、假设2和假设3成立,则当 $\alpha \in (0, 1/L]$ 时,方程(2)中的 $F_i$ 是光滑的,其光滑参数为 $L_F := 4L + \alpha\beta U$。因此,$F(w)$ 也是光滑的,其光滑参数同为 $L_F$。

    Second, we use the $B$-local dissimilarity in Assumption 4 to bound the difference between the local loss $F_i(w)$ and the global loss $F(w)$.

    其次,我们利用假设4中的 $B$-局部差异性来推导局部损失 $F_i(w)$ 和全局损失 $F(w)$ 之间的差异。

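    Lemma 2: Suppose the $B$-local dissimilarity in Assumption 4 holds with $B \le 1 + \sigma_B^2$. Then the bounded-variance property holds for the true gradients, rather than for the estimates considered in Assumption 6, i.e., $\mathbb{E}_i \|\nabla F_i(w) - \nabla F(w)\|^2 \le \sigma_B^2$.

    引理2:假设假设4中的 $B$-局部差异性成立,且 $B \le 1 + \sigma_B^2$。那么,有界方差性质对真实梯度(而非假设6中所考虑的估计值)成立,即 $\mathbb{E}_i \|\nabla F_i(w) - \nabla F(w)\|^2 \le \sigma_B^2$。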

    Using Lemmas 1 and 2, we analyze the expected reduction of the objective when executing one step of Algorithm 1 with randomly selected nodes.

    利用引理1和引理2,我们分析在随机选择节点执行算法1的一步时,目标函数的期望减少量。

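    Theorem 1: Consider the objective function $F$ defined in Equation (1) for the case that $\alpha \in (0, 1/L]$. Suppose that the conditions in Assumptions 1 to 4 and 6 are satisfied, and recall the definition of $L_F$ from Lemma 1. Consider running Algorithm 1 with the randomly selected set $S_k$ for $K$ rounds, with $\tau$ local updates in each round and with $\beta \le 1/(10 \tau L_F)$. The expected decrease in the global loss function satisfies

    定理1:考虑在 $\alpha \in (0, 1/L]$ 的情况下,由方程(1)定义的目标函数 $F$。假设假设1至4以及假设6中的条件均满足,并回顾引理1中 $L_F$ 的定义。考虑使用随机选择的集合 $S_k$ 运行算法1,进行 $K$ 轮,每轮进行 $\tau$ 次局部更新,且 $\beta \le 1/(10 \tau L_F)$。全局损失函数的期望下降量满足

    $$\begin{aligned}
    F(w^{k+1}) \le{} & F(w^k) - \frac{3\tau\beta}{10} \|\nabla F(w^k)\|^2 \\
    & + \tau\beta \left(2 L_F \beta \sigma_{\phi_D}^2 + \sigma_{\varphi_D}^2 + 2 (2 L_F \beta + 1) \sigma_B^2\right) \\
    & + \tau\beta \left(2 (\tau - 1) L^2 \beta^2 \sigma_w^2 (L_F \beta + 1)\right).
    \end{aligned} \tag{18}$$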

    Proof: We define the average of the randomly selected nodes in the $k$-th round of updates as $w^k = \frac{1}{m} \sum_{i \in S_k} w_i^k$. By Lemma 1, $F(w)$ is smooth; therefore, as in Assumption 2, $F(w)$ also satisfies the gradient Lipschitz condition, and we have

    证明:我们将第 $k$ 轮更新中随机选择的节点的平均值定义为 $w^k = \frac{1}{m} \sum_{i \in S_k} w_i^k$。根据引理 1,我们知道 $F(w)$ 是光滑的,因此与假设 2 类似,$F(w)$ 也满足梯度利普希茨(Gradient Lipschitz)条件,于是我们有

    $$F(w^{k+1}) \le F(w^k) + \left\langle \nabla F(w^k), w^{k+1} - w^k \right\rangle + \frac{L_F}{2} \left\|w^{k+1} - w^k\right\|^2. \tag{19}$$

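    Recall from equality (6) that

    回顾等式 (6)

    $$w_i^{k+1} - w^k = -\beta \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right). \tag{20}$$

    By Assumption 5, averaging both sides of Equation (20) over the nodes selected to participate in the aggregation, we obtain

    根据假设 5,对于被选中参与聚合的节点,分别对等式 (20) 的两边求平均,我们得到

    $$w^{k+1} - w^k = -\frac{\beta}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right), \tag{21a}$$

    $$\left\|w^{k+1} - w^k\right\|^2 = \Big\|\frac{\beta}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right)\Big\|^2. \tag{21b}$$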

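    Substituting (21a) and (21b) into (19) and taking the expectation of both sides gives

    将 (21a)、(21b) 代入 (19) 并对两边取期望

    $$\begin{aligned}
    \mathbb{E}[F(w^{k+1})] \le{} & \mathbb{E}[F(w^k)] - \beta \, \mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \\
    & + \frac{L_F}{2} \beta^2 \, \mathbb{E}\Big[\Big\|\frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right)\Big\|^2\Big].
    \end{aligned} \tag{22}$$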

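    Next, note that

    接下来,注意到

    $$\frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) = A + B + C + \frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \nabla F_i\left(w_{t-1}^{k}\right). \tag{23}$$

    We bound the terms in the above formula separately in the Supplementary Material; by Equations (58), (60), (62), and (63), we have

    我们将在补充材料中分别对上述公式中的各项进行界定,因此,根据方程(58)、(60)、(62)和(63),我们有

    $$\mathbb{E}\Big[\Big\|\frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right)\Big\|^2\Big] \le 4\tau \, \mathbb{E}\big[\|\nabla F(w_{t-1}^{k})\|^2\big] + 4\tau \sigma_{\phi_D}^2 + 8\tau \sigma_B^2 + 4\tau(\tau - 1) \sigma_w^2 L^2 \beta^2. \tag{24}$$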

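    Now, we lower bound the term

    现在,我们对该项进行下界界定

    $$\mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big]. \tag{25}$$

    We bound the terms in the above formula separately in the Supplementary Material. By (54), we have

    我们将在补充材料中分别对上述公式中的各项进行界定,根据(54),我们有

    $$\begin{aligned}
    & \mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \\
    &\quad = \mathbb{E}\Big[\Big\langle \nabla F(w^k), A + B + C + \frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \nabla F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \\
    &\quad \ge \mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \nabla F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] - \left\|\mathbb{E}\left[\langle \nabla F(w^k), A \rangle\right]\right\| - \left\|\mathbb{E}\left[\langle \nabla F(w^k), B + C \rangle\right]\right\|.
    \end{aligned} \tag{26}$$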

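    Plugging Equations (68) to (70) into Equation (26) gives

    将方程(68)至(70)代入方程(26)可得

    $$\mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \ge \frac{\tau}{2} \|\nabla F(w^k)\|^2 - \tau \sigma_{\varphi_D}^2 - 2\tau \sigma_B^2 - 2\tau(\tau - 1) L^2 \beta^2 \sigma_w^2. \tag{27}$$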

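    Because the nodes that participate in aggregation are selected at random, we have $\mathbb{E}_{S_k}[F(w^k)] = F(w^k)$. Substituting Equations (24) and (27) into Equation (22) gives

    由于随机选择节点参与聚合,我们有 $\mathbb{E}_{S_k}[F(w^k)] = F(w^k)$。将方程(24)和方程(27)代入方程(22)可得

    $$\begin{aligned}
    F(w^{k+1}) \le{} & F(w^k) - \tau\beta \left(\frac{1}{2} - 2 L_F \beta\right) \|\nabla F(w^k)\|^2 \\
    & + \tau\beta \left(2 L_F \beta \sigma_{\phi_D}^2 + \sigma_{\varphi_D}^2\right) + 2\tau\beta \sigma_B^2 (2 L_F \beta + 1) \\
    & + 2\tau(\tau - 1) L^2 \beta^3 \sigma_w^2 (L_F \beta + 1) \\
    \le{} & F(w^k) - \frac{3\tau\beta}{10} \|\nabla F(w^k)\|^2 \\
    & + \tau\beta \left(2 L_F \beta \sigma_{\phi_D}^2 + \sigma_{\varphi_D}^2 + 2 \sigma_B^2 (2 L_F \beta + 1)\right) \\
    & + \tau\beta \left(2 (\tau - 1) L^2 \beta^2 \sigma_w^2 (L_F \beta + 1)\right).
    \end{aligned} \tag{28}$$

    The last inequality is obtained using $\beta \le 1/(10 \tau L_F)$, which gives the desired result.

    最后一个不等式是利用 $\beta \le 1/(10 \tau L_F)$ 得到的,这为我们得到了所需的结果。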

    Theorem 1 shows a dependency on the parameters $L$, $B$, $\beta$, and $\tau$ of the FML model. Note that $\sigma_{\varphi_D}^2$ and $\sigma_{\phi_D}^2$ are not constants: as expressed in Assumption 6, we can make them arbitrarily small by choosing the batch sizes $D$, $D'$, and $D''$ large enough. Once again, we focus on special cases to examine the tightness of our results. Let $\sigma_{\varphi_D}^2 = \sigma_{\phi_D}^2 = 0$ and $\tau = 1$. In this case, the algorithm reduces to stochastic gradient descent, where the only source of stochasticity is the gradient batches. The second term on the right-hand side of Equation (18) reduces to $O(\beta^2 L B^2)$. This is the classic result for stochastic gradient descent on nonconvex functions, and we recover the lower bounds of [39].

    定理1展示了FML模型对参数 $L$、$B$、$\beta$、$\tau$ 的依赖性。请注意,$\sigma_{\varphi_D}^2$ 和 $\sigma_{\phi_D}^2$ 不是常数,并且如假设6所述,我们可以通过选择足够大的批量大小 $D$、$D'$ 和 $D''$ 使其任意小。我们再次关注特殊情况,以检验我们结果的紧性。设 $\sigma_{\varphi_D}^2 = \sigma_{\phi_D}^2 = 0$,且 $\tau = 1$。在这种情况下,算法简化为随机梯度下降,其中随机性的唯一来源是梯度批次。方程(18)右侧的第二项简化为 $O(\beta^2 L B^2)$。这是非凸函数随机梯度下降的经典结果,并且我们恢复了[39]中的下界。

    C. Clustering-Based Node Selection

    C. 基于聚类的节点选择

    In Algorithm 1, the nodes that contribute most to the global model in each communication round are the most likely to be selected, which addresses the gradient differences between nodes. The contribution of each node to the global model is represented by hierarchically clustering the gradients uploaded by the nodes and generating the independent distributions $\{R_t^k\}_{t=1}^{m}$. Compared with randomly selecting nodes, selection based on the clustering results can speed up the convergence of the model. This point is supported by the theoretical convergence proof for Multinomial Distribution (MD) selection in [40]. First, we give Proposition V.1.

    在算法1中,每一轮通信中对全局模型贡献最大的节点最有可能被选中,以解决节点间的梯度差异问题。每个节点对全局模型的贡献通过对节点上传的梯度进行层次聚类并生成独立分布 $\{R_t^k\}_{t=1}^{m}$ 来表示。与随机选择节点相比,基于聚类结果进行选择可以加快模型的收敛速度。这一点得到了[40]中关于多项分布(Multinomial Distribution,MD)选择的收敛性理论证明的支持。首先,我们给出命题V.1。

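    Proposition 1: If the probabilities $r_{k,i}^{t}$ of clustered selection from the distributions $\{R_t^k\}_{t=1}^{m}$ satisfy Equation (29), the selection is unbiased; that is, the following property is a sufficient condition for unbiased selection:

    命题1:若从分布 $\{R_t^k\}_{t=1}^{m}$ 进行聚类选择的概率 $r_{k,i}^{t}$ 满足方程(29),则选择是无偏的;即以下性质是无偏选择的充分条件:

    $$\begin{aligned}
    & \forall t \in \{1, \dots, m\}, \quad \sum_{i=1}^{n} r_{k,i}^{t} = 1, \\
    & \forall i \in \{1, \dots, n\}, \quad \sum_{t=1}^{m} r_{k,i}^{t} = m \frac{|\mathcal{D}_i|}{|\mathcal{D}|}.
    \end{aligned} \tag{29}$$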

    Proof: By Equation (29), when selecting one node from one of the $m$ distributions $R_t^k$, we have:

    证明:根据方程(29),当从 $m$ 个分布 $R_t^k$ 中的一个分布中选择一个节点时,我们有:

    $$\mathbb{E}_{R_t^k}\Big[\sum_{j \in R_t^k} p_j(R_t^k) \, w_j^k\Big] = \sum_{i=1}^{n} r_{k,i}^{t} \, w_i^k.$$

    By the linearity of expectation, the expected new global model is the average of the weighted models obtained for each distribution in $\{R_t^k\}_{t=1}^{m}$, as in Equation (30):

    由于期望的线性性质,根据方程(30),新的全局期望模型是为 $\{R_t^k\}_{t=1}^{m}$ 中每个分布获得的加权模型的平均值。

    $$\mathbb{E}_{S_k}\left[w^k\right] = \sum_{t=1}^{m} \frac{1}{m} \sum_{i=1}^{n} r_{k,i}^{t} \, w_i^k = \sum_{i=1}^{n} p_i \, w_i^k, \tag{30}$$

    which shows that the selection is unbiased. ▫

    这表明该选择是无偏的。 ▫
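The two conditions in Equation (29) can be checked numerically on a small example. The sketch below builds $m$ distributions with a simple greedy filling scheme based only on sample counts, so it illustrates the unbiasedness condition rather than the gradient-similarity clustering of Algorithm 1. The filling scheme and the function names are assumptions chosen for illustration; exact fractions are used so that the sums can be compared without rounding error.

```python
from fractions import Fraction

def build_distributions(sample_counts, m):
    """Return r[t][i] for t in 0..m-1: each row sums to 1 and the column of
    node i sums to m * |D_i| / |D|, as required for unbiased selection."""
    total = sum(sample_counts)
    n = len(sample_counts)
    r = [[Fraction(0)] * n for _ in range(m)]
    t, room = 0, total                 # current distribution and its free capacity
    for i, count in enumerate(sample_counts):
        mass = m * count               # node i must receive total mass m * |D_i|
        while mass > 0:
            take = min(mass, room)
            r[t][i] += Fraction(take, total)
            mass -= take
            room -= take
            if room == 0 and t < m - 1:
                t, room = t + 1, total # move on to the next distribution
    return r

def expected_model(r, models):
    """Average over the m distributions of the expected selected model."""
    m = len(r)
    return sum(Fraction(1, m) * sum(row[i] * models[i] for i in range(len(models)))
               for row in r)
```

For sample counts `[10, 20, 30, 40]` and `m = 2`, the first distribution places mass on nodes 0 to 2 and the second on nodes 2 and 3, and `expected_model` equals the full weighted average $\sum_i p_i w_i$.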

    Our clustered selection, as described in Proposition V.1, satisfies Lemma V.3. Additionally, the MD selection method has similar bounds, indicating that the FedAvg aggregation algorithm exhibits identical asymptotic behavior under our proposed clustered selection and under MD selection.

    我们证明了如命题V.1所述的聚类选择方法满足引理V.3。此外,MD选择方法具有相似的边界,这表明在使用我们提出的聚类和MD选择时,采用FedAvg聚合算法表现出相同的渐近行为。

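    Lemma 3: Suppose $C_1, C_2, \dots, C_m$ are the indices of the sampled nodes. We have

    引理3:假设 $C_1, C_2, \dots, C_m$ 是采样节点的索引。我们有

    $$\begin{aligned}
    & \mathbb{E}_{S_k}\Big[\frac{1}{m} \sum_{j=1}^{m} w_{C_j}\Big] = \sum_{i=1}^{n} \frac{|\mathcal{D}_i|}{|\mathcal{D}|} w_i, \\
    & \mathbb{E}\Big[\frac{1}{m} \sum_{i \in S_k} \left\|w_i^k - w^k\right\|^2\Big] \le \sum_{i=1}^{n} p_i \left\|w_i^k - w^k\right\|^2.
    \end{aligned} \tag{31}$$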

    We will analyze the convergence of node-sampled federated meta-learning in the next subsection.

    我们将在下一小节分析节点采样联邦元学习的收敛性。

    D. Meta-Learning With Clustering-Based Node Selection

    D. 基于聚类节点选择的元学习

    In Theorem V.1, we proved an upper bound on the convergence of meta-learning with random node selection under Assumptions 1 to 3. In Proposition 1, we showed that node selection based on cluster sampling has the same bound on FL convergence as MD-based selection. We now combine the two conclusions to prove, in Theorem V.2, an upper bound on the convergence of meta-learning with cluster-based node selection:

    在定理V.1中,我们证明了满足假设1至3的随机节点选择的元学习收敛性的上界。在命题1中,我们表明基于聚类抽样的节点选择在联邦学习(FL)收敛性上与基于多项分布(MD)的选择具有相同的界。现在,我们结合这两个结论,在定理V.2中证明采用基于聚类的节点选择策略的元学习的收敛性上界:

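    Theorem 2: Consider the objective function $F$ defined in (1) for the case that $\alpha \in (0, 1/L]$. Suppose that the conditions in Assumptions 1 to 5 are satisfied, and recall the definition of $L_F$ from Lemma 1. Consider running Algorithm 1 with its node selection strategy to obtain the selected set $S_k$ for $K$ rounds, with $\tau$ local updates in each round and with $\beta \le 1/(10 \tau L_F)$. We have the following expected decrease in the global objective

    定理2:考虑在 $\alpha \in (0, 1/L]$ 的情况下,由(1)式定义的目标函数 $F$。假设满足假设1至5中的条件,并回顾引理1中 $L_F$ 的定义。考虑运行算法1及其节点选择策略,在 $K$ 轮中得到选定集合 $S_k$,每轮进行 $\tau$ 次局部更新,且 $\beta \le 1/(10 \tau L_F)$。我们得到全局目标函数的如下期望下降量

    $$\begin{aligned}
    F(w^{k+1}) \le{} & F(w^k) - \frac{3\tau\beta}{10} \|\nabla F(w^k)\|^2 \\
    & + \tau\beta \left(2 L_F \beta \sigma_{\phi_D}^2 + \sigma_{\varphi_D}^2 + 2p (2 L_F \beta + 1) \sigma_B^2\right) \\
    & + \tau\beta p \left(2 (\tau - 1) L^2 \beta^2 \sigma_w^2 (L_F \beta + 1)\right).
    \end{aligned} \tag{32}$$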

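    Proof: We now prove the convergence of Algorithm 1, that is, FML with our node selection strategy. As before, we define the average of the selected nodes in the $k$-th round of updates as $w^k = \frac{1}{m} \sum_{i \in S_k'} w_i^k$. To distinguish it from the set of randomly chosen nodes, we denote by $S_k'$ the set of nodes chosen according to Algorithm 1 and by $\mathbb{E}_{S_k'}[F(w^k)]$ the expectation of the global loss in the subsequent proofs. By Lemma 1 and Equation (22), we have

    证明:我们现在证明算法1(即采用我们的节点选择策略的FML)的收敛性。类似地,我们将第 $k$ 轮更新中所选节点的平均值定义为 $w^k = \frac{1}{m} \sum_{i \in S_k'} w_i^k$。为了将其与随机选择的节点集区分开来,在后续证明中,我们用 $S_k'$ 表示根据算法1所选节点的集合,用 $\mathbb{E}_{S_k'}[F(w^k)]$ 表示全局损失的期望。根据引理1和方程(22),我们有

    $$\begin{aligned}
    \mathbb{E}_{S_k'}[F(w^{k+1})] \le{} & \mathbb{E}_{S_k'}[F(w^k)] - \beta \, \mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \\
    & + \frac{L_F}{2} \beta^2 \, \mathbb{E}\Big[\Big\|\frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right)\Big\|^2\Big].
    \end{aligned} \tag{33}$$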

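    We prove Theorem V.2 along the same lines as Theorem V.1. Note that

    我们将按照证明定理V.1的相同思路来证明定理V.2。注意

    $$\frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) = A' + B' + C' + \frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \nabla F_i\left(w_{t-1}^{k}\right). \tag{34}$$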

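    Based on our assumptions and Lemma 2, the upper bounds of $A'$ and $B'$ in the above equation remain unchanged. Next, we show the impact of the selection strategy on $C'$.

    基于我们的假设和引理2,上述方程中 $A'$ 和 $B'$ 的上界保持不变。接下来,我们将论证选择策略对 $C'$ 的影响。

    $$\begin{aligned}
    \|C'\|^2 &\le \sum_{t=1}^{\tau} \Big\|\frac{1}{m} \sum_{i \in S_k'} \left(\nabla F_i\left(w_{i,t-1}^{k}\right) - \nabla F_i\left(w_{t-1}^{k}\right)\right)\Big\|^2 \\
    &\le \frac{L^2}{m} \sum_{t=1}^{\tau} \sum_{i \in S_k'} \left\|w_{i,t-1}^{k} - w_{t-1}^{k}\right\|^2 \\
    &= L^2 \sum_{t=1}^{\tau} \frac{1}{m} \sum_{i \in S_k'} \left\|w_{i,t-1}^{k} - w_{t-1}^{k}\right\|^2.
    \end{aligned} \tag{35}$$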

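    Using Lemma 3 and taking the expectation of both sides of Equation (35), we bound $C'$:

    根据引理3中的结论,对等式(35)两边取数学期望,从而对 $C'$ 进行界定:

    $$\begin{aligned}
    \mathbb{E}\|C'\|^2 &\le L^2 \, \mathbb{E}\Big[\sum_{t=1}^{\tau} \mathbb{E}\Big[\frac{1}{m} \sum_{i \in S_k'} \left\|w_{i,t-1}^{k} - w_{t-1}^{k}\right\|^2\Big]\Big] \\
    &= L^2 \sum_{t=1}^{\tau} \mathbb{E}\Big[\frac{1}{m} \sum_{i \in S_k'} \left\|w_{i,t-1}^{k} - w_{t-1}^{k}\right\|^2\Big] \\
    &\le L^2 \sum_{t=1}^{\tau} \sum_{i=1}^{n} p_i \left\|w_{i,t-1}^{k} - w_{t-1}^{k}\right\|^2 \\
    &\le p L^2 \beta^2 \tau (\tau - 1) \sigma_w^2.
    \end{aligned} \tag{36}$$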

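    Then, we bound the last term of Equation (34):

    然后,我们对等式(34)的最后一项进行界定,可得

    $$\mathbb{E}\Big[\Big\|\frac{1}{m} \sum_{i \in S_k'} \nabla F_i\left(w_{t-1}^{k}\right)\Big\|^2\Big] \le \mathbb{E}\big[\|\nabla F(w_{t-1}^{k})\|^2\big] + p \sigma_B^2. \tag{37}$$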

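    Note that, by Equation (64), we have

    注意,根据方程(64),我们有

    $$\Big\|\frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right)\Big\|^2 \le 4\|A'\|^2 + 4\|B'\|^2 + 4\|C'\|^2 + 4 \Big\|\frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \nabla F_i\left(w_{t-1}^{k}\right)\Big\|^2, \tag{38}$$

    and thus, by Equations (58), (60), (62), and (63), we have

    因此,根据方程(58)、(60)、(62)和(63),我们有

    $$\mathbb{E}\Big[\Big\|\frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right)\Big\|^2\Big] \le 4\tau \, \mathbb{E}\big[\|\nabla F(w^k)\|^2\big] + 4\tau \sigma_{\phi_D}^2 + 4\tau (1 + p) \sigma_B^2 + 4 L^2 \beta^2 \tau (\tau - 1) \sigma_w^2. \tag{39}$$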

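    Next, we lower bound the term

    接下来,我们对该项进行下界估计

    $$\mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big]. \tag{40}$$

    By (34), we have

    根据(34),我们有

    $$\begin{aligned}
    \mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \ge{} & \mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \nabla F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \\
    & - \mathbb{E}\left[\langle \nabla F(w^k), A' \rangle\right] - \mathbb{E}\left[\langle \nabla F(w^k), B' + C' \rangle\right].
    \end{aligned} \tag{41}$$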

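    We bound the terms of the above inequality separately. First, note that

    我们分别对上述不等式的各项进行界定。首先注意到

    $$\begin{aligned}
    \mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \nabla F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] &= \tau \, \mathbb{E}\Big[\Big\langle \nabla F(w^k), \mathbb{E}\Big[\frac{1}{m} \sum_{i \in S_k'} \nabla F_i\left(w_i^k\right)\Big] \Big\rangle\Big] \\
    &= \tau \, \mathbb{E}\big[\|\nabla F(w^k)\|^2\big].
    \end{aligned} \tag{42}$$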

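    Second, we have

    其次,我们有

    $$\mathbb{E}\left[\langle \nabla F(w^k), A' \rangle\right] = \mathbb{E}\left[\langle \nabla F(w^k), \mathbb{E}[A'] \rangle\right] \le \frac{\tau}{4} \mathbb{E}\big[\|\nabla F(w^k)\|^2\big] + \tau \sigma_{\varphi_D}^2, \tag{43}$$

    where the last inequality follows from Assumption 6. Last, we use the fact that

    其中最后一个不等式由假设6得出。最后,我们利用以下事实

    $$\mathbb{E}\left[\langle \nabla F(w^k), B' + C' \rangle\right] \le \frac{\tau}{4} \mathbb{E}\big[\|\nabla F(w^k)\|^2\big] + 2 \mathbb{E}\|B'\|^2 + 2 \mathbb{E}\|C'\|^2. \tag{44}$$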

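    Plugging Equations (42) to (44) into Equation (41) gives

    将方程(42)至(44)代入方程(41)可得

    $$\mathbb{E}\Big[\Big\langle \nabla F(w^k), \frac{1}{m} \sum_{i \in S_k'} \sum_{t=1}^{\tau} \tilde{\nabla} F_i\left(w_{i,t-1}^{k}\right) \Big\rangle\Big] \ge \frac{\tau}{2} \|\nabla F(w^k)\|^2 - \tau \sigma_{\varphi_D}^2 - 2\tau \sigma_B^2 - 2p L^2 \beta^2 \tau (\tau - 1) \sigma_w^2. \tag{45}$$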

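    Because the node selection is unbiased, we have $\mathbb{E}_{S_k'}[F(w^k)] = F(w^k)$. Substituting Equations (39) and (45) into Equation (33) gives

    由于节点选择是无偏的,我们有 $\mathbb{E}_{S_k'}[F(w^k)] = F(w^k)$。将方程(39)和方程(45)代入方程(33)可得

    $$\begin{aligned}
    \mathbb{E}_{S_k'}[F(w^{k+1})] \le{} & F(w^k) - \tau\beta \left(\frac{1}{2} - 2 L_F \beta\right) \|\nabla F(w^k)\|^2 \\
    & + \tau\beta \left(2 L_F \beta \sigma_{\phi_D}^2 + \sigma_{\varphi_D}^2\right) + 2\tau\beta \sigma_B^2 (2 L_F \beta + 1) \\
    & + 2\tau(\tau - 1) L^2 \beta^3 \sigma_w^2 (L_F \beta + 1) \\
    \le{} & F(w^k) - \frac{3\tau\beta}{10} \|\nabla F(w^k)\|^2 \\
    & + \tau\beta \left(2 L_F \beta \sigma_{\phi_D}^2 + \sigma_{\varphi_D}^2 + 2p (2 L_F \beta + 1) \sigma_B^2\right) \\
    & + \tau\beta p \left(2 (\tau - 1) L^2 \beta^2 \sigma_w^2 (L_F \beta + 1)\right).
    \end{aligned} \tag{46}$$

    The last inequality is obtained using $\beta \le 1/(10 \tau L_F)$, which gives the desired result.

    最后一个不等式是利用 $\beta \le 1/(10 \tau L_F)$ 得到的,这就得到了我们想要的结果。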

    Theorem V.2 differs from Theorem V.1 in that the upper bound on convergence in each round decreases once the client selection strategy is added. The bound now depends on the maximum probability of being selected, $p$. In general, $p \le 1/m < 1$, which results in a lower upper bound on convergence and thus speeds up the convergence of the model during training. Let $\sigma_{\varphi_D}^2 = \sigma_{\phi_D}^2 = 0$ and $\tau = 1$. In this case, the second term on the right-hand side of Equation (32) reduces to $O(p \beta^2 L B^2)$, which is similar to the complexity in Theorem V.1 but with the additional factor $p$.

    定理V.2与定理V.1的不同之处在于,它得出的结论是,随着客户端选择策略的加入,每一轮的收敛上界都会降低。这个上界现在取决于被选中的最大概率 $p$。一般来说,$p \le 1/m < 1$,这会导致收敛上界更低,从而加快训练期间模型的收敛速度。设 $\sigma_{\varphi_D}^2 = \sigma_{\phi_D}^2 = 0$,且 $\tau = 1$。在这种情况下,方程(32)右侧的第二项简化为 $O(p \beta^2 L B^2)$,这与定理V.1中的复杂度类似,但多了一个因子 $p$。

    VI. EVALUATION

    六、评估

    A. Experimental Settings and Datasets

    A. 实验设置与数据集

    1) Environmental Setup: In this study, both our global and local models use three different deep neural networks for the three different datasets.¹ Note that the model structure was adjusted to fit each dataset while being kept uniform across the compared methods. The model and the PFL framework were implemented using PyTorch 1.13.1. Experiments were conducted on the Ubuntu 20.04.3 LTS platform with an NVIDIA GeForce RTX 2080TI GPU.

    1) 环境设置:在本研究中,我们的全局模型和局部模型均针对三个不同的数据集采用了三种不同的深度神经网络。¹ 请注意,模型结构已针对各数据集进行调整,同时在不同比较方法之间保持一致。模型和PFL框架使用PyTorch 1.13.1实现。实验在搭载NVIDIA GeForce RTX 2080TI GPU的Ubuntu 20.04.3 LTS平台上进行。

    2) Dataset Description and Preprocessing: In this paper, we conducted experiments using three widely used IoT traffic intrusion detection datasets: the UNSW-NB15 dataset [41], the CIC-IDS2017 dataset [42], and the BoT-IoT dataset [43]. The UNSW-NB15 dataset was generated by the UNSW Cyber Range Laboratory, combining real-world benign traffic with modern, synthetic attack behaviors. The CIC-IDS2017 dataset includes both benign traffic and various common attack types, providing raw traffic data as well as network traffic analysis results, which include timestamps, source and destination IPs, ports, protocols, and attack labels. The BoT-IoT dataset contains over 72 million records, covering a wide range of attacks including DDoS, DoS, OS scanning, service scanning, keystroke logging, and data exfiltration. These datasets have been widely used to evaluate intrusion detection systems, and their generation methods are well documented in the respective publications, ensuring a diverse and representative set of attack scenarios and network conditions. Our preprocessing of these three datasets involves three steps: (1) conducting dimensionality reduction (PCA) on the three character-type feature columns and applying one-hot encoding to ensure uniqueness; (2) separating the merged character variables and other features from the label class; (3) filling in missing data and performing reselection and feature normalization according to the Non-IID settings.

    2) 数据集描述与预处理:在本文中,我们使用三个广泛使用的物联网流量入侵检测数据集进行了实验:UNSW – NB15数据集 [41]、CIC – IDS2017数据集 [42] 和BoT – IoT数据集 [43]。UNSW – NB15数据集由新南威尔士大学网络靶场实验室(UNSW Cyber Range Laboratory)生成,它将现实世界的良性流量与现代合成攻击行为相结合。CIC – IDS2017数据集包含良性流量和各种常见攻击类型,提供原始流量数据以及网络流量分析结果,其中包括时间戳、源IP和目的IP、端口、协议和攻击标签。BoT – IoT数据集包含超过7200万条记录,涵盖了广泛的攻击类型,包括分布式拒绝服务(DDoS)、拒绝服务(DoS)、操作系统扫描、服务扫描、击键记录和数据泄露。这些数据集已被广泛用于评估入侵检测系统,并且它们的生成方法在各自的出版物中有详细记录,确保了攻击场景和网络条件的多样性和代表性。我们对这三个数据集的预处理包括三个步骤:(1)对字符类型的三列特征变量进行降维(主成分分析,PCA),并应用独热编码以确保唯一性;(2)将合并后的字符变量和其他特征与标签类别分离;(3)根据非独立同分布(Non – IID)设置填充缺失数据,并进行重新选择和特征归一化操作。

    3) Baselines: We selected four state-of-the-art federated intrusion detection models as baselines: DeepFed [16], FSLAD [21], Fed-ANIDS [44], and SSFL [45]. These methods represent significant advances in federated learning for intrusion detection over the past four years. Unlike these models, our method, PerFLID, is designed to adapt to local data distributions, and it introduces a novel node selection mechanism to accelerate federated learning.

    4) Dataset Partitioning: We use the Non-IID data partitioning strategy described in Section III-B to segment the three datasets and create a local traffic dataset for each node. Specifically, we allocate 80% of each dataset to training nodes and reserve the remaining 20% for testing nodes. For both training and testing nodes, the local data is imbalanced in attack type distribution, data characteristics, and quantity, in line with the heterogeneity of real IoT data. Taking the UNSW-NB15 dataset as an example, the dataset includes nine types of attacks, which form nine labels. We divide these labels among a specified number of nodes based on a Dirichlet distribution, so that each label's probability for a node is a sample from the Dirichlet distribution. We then calculate the number of attack types each node contains based on this probability (excluding attack types with fewer than 5 samples). Finally, we ensure that each node has at least one attack type but no more than five. When the number of training nodes is 100, the major traffic types (including normal traffic) and their sample counts at one node are as follows: Normal 637 (63.89%), Backdoor 16 (1.61%), DoS 163 (16.35%), Exploits 86 (8.63%), Reconnaissance 88 (8.83%), Worms 7 (0.70%). For each round of experiments, training and test data are randomly selected according to the above ratio; each experiment is repeated ten times, and the average is reported as the experimental result.
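The partitioning procedure can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a symmetric Dirichlet prior, a fixed per-node sample budget (`samples_per_node`), and that when a node would receive more than five attack types, only the five most frequent are kept. Normal traffic is left out of the sketch, and all function and parameter names are hypothetical.

```python
import random


def dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def partition_labels(num_nodes, attack_types, samples_per_node, alpha,
                     min_samples=5, max_types=5, seed=0):
    """Return, for each node, a dict mapping attack type to sample count."""
    rng = random.Random(seed)
    nodes = []
    for _ in range(num_nodes):
        probs = dirichlet(alpha, len(attack_types), rng)
        counts = {t: int(p * samples_per_node)
                  for t, p in zip(attack_types, probs)}
        # Exclude attack types with fewer than min_samples samples.
        kept = {t: c for t, c in counts.items() if c >= min_samples}
        # Keep at most max_types attack types (assumed rule: the most frequent).
        top = sorted(kept, key=kept.get, reverse=True)[:max_types]
        kept = {t: kept[t] for t in top}
        # Guarantee at least one attack type per node.
        if not kept:
            likeliest = max(zip(probs, attack_types))[1]
            kept = {likeliest: min_samples}
        nodes.append(kept)
    return nodes


# The nine UNSW-NB15 attack categories; alpha = 0.1 gives strong heterogeneity.
ATTACKS = ["Analysis", "Backdoor", "DoS", "Exploits", "Fuzzers",
           "Generic", "Reconnaissance", "Shellcode", "Worms"]
nodes = partition_labels(num_nodes=100, attack_types=ATTACKS,
                         samples_per_node=1000, alpha=0.1)
```

A small α concentrates each node's probability mass on one or two attack types, while a large α spreads it almost evenly, which is how α controls heterogeneity.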

    5) Evaluation Metrics: Given that each edge node in our PerFLID model may encounter n (where n3) unknown types of attacks, we evaluate the performance of our model using accuracy, recall, and macro-F1 score. These metrics are calculated as follows:

    Accuracy: the proportion of correctly classified traffic samples (both normal and abnormal traffic) among all samples.

    $$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i+TN_i}{D_i}. \tag{47}$$

    Recall: the true positive rate, i.e., the proportion of abnormal traffic samples correctly identified as abnormal among all abnormal samples.

    $$\text{Recall} = \frac{TP}{TP+FN}. \tag{48}$$

    Macro F1-score: because the number of attack samples differs greatly across traffic types at edge nodes, we also calculate the Macro F1-score:

    $$\text{Macro F1-score} = \frac{1}{K}\sum_{i=1}^{K}\frac{2\cdot \text{Precision}_i\cdot \text{Recall}_i}{\text{Precision}_i+\text{Recall}_i}, \tag{49}$$

    where K is the total number of classes. The Macro F1-score measures the performance of a multi-class classification model by averaging the F1-scores of each class.
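The metrics in (47)-(49) can be computed from label lists as in the sketch below. The helper names and the toy labels are illustrative assumptions; the per-class precision and recall follow the standard one-vs-rest definitions.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """One-vs-rest precision and recall for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores, as in Eq. (49)."""
    f1_scores = []
    for label in sorted(set(y_true)):
        p, r = precision_recall(y_true, y_pred, label)
        f1_scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical predictions on six flows at one edge node.
y_true = ["normal", "normal", "dos", "dos", "worm", "worm"]
y_pred = ["normal", "dos", "dos", "dos", "worm", "normal"]
acc = accuracy(y_true, y_pred)
dos_precision, dos_recall = precision_recall(y_true, y_pred, "dos")
f1 = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the macro average, a rare attack type such as Worms influences the score as much as the dominant normal class, which is why the metric suits imbalanced edge-node data.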

    B. The Performance of Personalization in PerFLID

    To test how the models behave under different federated data distribution parameters, we allocate training data according to the three imbalances discussed in Section III-B. The allocation uses label-based Dirichlet sampling, in which traffic data labels are distributed according to the parameter α, which controls the level of data heterogeneity. For all three datasets, we use the same values of α: 0.1, 0.5, 1.0, 5, and 10. The data of the test nodes follows the distribution with the same parameters. By training our model under different parameters, we evaluate its accuracy when tested on various edge nodes. The results are shown in Figure 3, from which we evaluate the personalization performance in terms of accuracy:

    1) Accuracy: Personalized traffic intrusion detection models are expected to achieve high accuracy in local detection tasks [19]. We measure the degree of personalization by the average improvement in the accuracy with which an edge node's personalized model classifies unknown types of traffic, comparing the model before and after training. According to Figure 3a, where α = 0.1 represents the highest data heterogeneity, PerFLID achieved the largest personalization improvement, 34.5%, compared with 30.1% for Fed-ANIDS and 18.7% for DeepFed. Table I presents the detailed experimental results, comparing the precision, recall, and macro-F1 scores obtained by PerFLID, FSLAD, Fed-ANIDS, SSFL, and DeepFed under federated training for various personalization needs. Notably, at α = 0.1, our method achieves precision, recall, and macro-F1 scores of 79.6%, 74.5%, and 75.5%, respectively, on the UNSW-NB15 dataset; 82.5%, 77.4%, and 78.7% on the CIC-IDS2017 dataset; and 90.5%, 89.9%, and 90.3% on the BoT-IoT dataset.
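The personalization measure described in this paragraph can be written compactly. The notation below is an assumption introduced for illustration: N is the number of tested edge nodes, and the two accuracy terms are node i's accuracy on unknown traffic types after and before personalized training.

```latex
\Delta_{\mathrm{pers}}
  = \frac{1}{N}\sum_{i=1}^{N}
    \left(\mathrm{Acc}_i^{\mathrm{after}} - \mathrm{Acc}_i^{\mathrm{before}}\right)

% Hypothetical worked example with N = 2 nodes:
% node 1 improves from 0.50 to 0.80, node 2 from 0.40 to 0.80, so
\Delta_{\mathrm{pers}} = \frac{(0.80 - 0.50) + (0.80 - 0.40)}{2} = 0.35
```

The numbers in the worked example are made up to show the arithmetic and are not taken from the reported results.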

    Fig. 3. Comparison of model accuracy in detecting unknown attacks under varied data distribution variance parameters α on the UNSW-NB15, CIC-IDS2017, and BoT-IoT datasets with localized personalized updates. (Since multiple nodes were tested, the shaded regions of the graph represent confidence intervals for the accuracy; the same applies to Figure 4.)

    Several observations can be made from Figure 3. First, for low personalization needs (α = 10), all federated learning models achieved 90% classification accuracy, as shown in Figure 3d. However, only PerFLID performed well under strong personalization requirements (α = 0.1), achieving 79.6% accuracy in classifying unknown attacks with the CNN network. Second, the accuracy on a new test node before personalized training ranges from 45% to 55%. This low starting point arises because the global model is a suboptimal initial model for the edge node, which personalized training corrects. As the demand for personalization decreases, the global model approaches the optimal solution, and the test accuracy after personalized training rises from 79.6% to 90.3%. Finally, the results in Table I show that the CNN, LSTM, and CNN-BiLSTM networks all achieve local personalization improvements on edge devices. Although the LSTM network yields higher test accuracy than the CNN (an improvement of 1.4%-2.9%), the relative performance of the methods within the same network remains consistent.

    C. The Performance of Global Model Test

    After investigating PerFLID's personalization following several rounds of local learning, we conducted additional experiments to determine whether PerFLID's global model can handle complex global scenarios during the training phase. Using the same three datasets, we tested the accuracy of the global model and compared it with the latest meta-learning methods [46]; the results are shown in Table II.

    D. The Verification of Node Selection Strategies

    This experiment evaluates the effectiveness of the key component of PerFLID, i.e., it demonstrates that a node selection strategy based on cluster selection accelerates the convergence of the model. We trained on a dataset with 120 nodes, with the data distribution parameter α = 0.1 and all other parameters identical to those in Section VI-A. To validate the generality of our node selection strategy, we compare three approaches to intrusion detection: federated meta-learning [23], federated learning [45], and PerFLID. The experiments evaluate the local adaptation and runtime of the trained models. The results are shown in Table III, where the average running time of the algorithms shows that our approach runs faster while achieving similar detection accuracy. Our model thus converges faster than the other two methods, requiring only 4-10 epochs of personalized training to converge on each node with a different data distribution. This observation is consistent with our theoretical analysis and indicates efficient convergence without additional overhead. The last two columns of Table III present the results of applying our node selection strategy to PFTD, which is based on federated meta-learning, and to SSFL, which is based on federated learning. Our node selection strategy significantly accelerates the training of each model, while also enhancing global accuracy and improving adaptation to local data.
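To make the idea of cluster-based node selection concrete, the sketch below groups clients whose local gradients point in similar directions and picks one client per group. It is a simplified stand-in, not PerFLID's algorithm: the greedy threshold clustering, the cosine-similarity threshold of 0.9, and the rule of choosing the client with the largest gradient norm in each cluster are all assumptions made for illustration.

```python
import math


def norm(vec):
    """Euclidean norm of a gradient vector."""
    return math.sqrt(sum(x * x for x in vec))


def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def cluster_by_similarity(gradients, threshold=0.9):
    """Greedy clustering: a client joins the first cluster whose first member
    (its representative) has a sufficiently similar gradient."""
    clusters = []
    for client_id, grad in gradients.items():
        for members in clusters:
            if cosine(grad, gradients[members[0]]) >= threshold:
                members.append(client_id)
                break
        else:
            clusters.append([client_id])
    return clusters


def select_clients(gradients, threshold=0.9):
    """Pick one client per cluster (assumed rule: largest gradient norm)."""
    clusters = cluster_by_similarity(gradients, threshold)
    return [max(members, key=lambda cid: norm(gradients[cid]))
            for members in clusters]


# Hypothetical 2-D local gradients reported by five clients.
gradients = {
    "a": [1.0, 0.0],
    "b": [2.0, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 3.0],
    "e": [-1.0, 0.0],
}
clusters = cluster_by_similarity(gradients)
selected = select_clients(gradients)
```

Selecting one representative per cluster keeps every gradient direction represented in the aggregation while avoiding redundant updates from near-identical clients, which is the intuition behind faster convergence.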

    TABLE I

    EXPERIMENTAL RESULTS AND COMPARATIVE EXPERIMENTAL RESULTS


    TABLE II

    COMPARISON ACCURACY RESULTS OF GLOBAL MODEL ON THREE DATASETS


    E. The Additional Participant Scenario Test

    Because federated learning parameters affect training, we also examined the case in which additional nodes take part in training: we compared the impact of different numbers of edge nodes on model training accuracy with the node data heterogeneity parameter α = 0.5. The experimental results are shown in Figure 4. We also increased the learning rate and ran the tests after enough rounds of personalized training. Two conclusions can be drawn:

    Fig. 4. The impact of the number of additional participant nodes C in the federated training test on the UNSW-NB15 dataset.

    1) Impact of Additional Participant Nodes: The average accuracy of the personalized model increases with the number of edge nodes, because with more edge nodes the global model can access a greater number and variety of sample features. However, this improvement is not unlimited and may even become counterproductive. When the total number of nodes reaches 500, the accuracy of our model plateaus, while the accuracy of PFTD drops by 5.42%. We believe that model accuracy peaks once the edge nodes cover a sufficient number of samples of all types. Further increasing the number of edge nodes only increases storage overhead and model aggregation delay at the central server. Moreover, without a node selection strategy, increasing the number of edge nodes may result in certain types of samples not being selected for aggregation, thereby reducing model accuracy.

    TABLE III

    ACCURACY FOR THE EXPERIMENTS OF THE GLOBAL AND LOCAL SCENARIO


    2) Local Personalization: For the globally learned model, running many epochs (more than 20) of local personalization brings no benefit. Our proposed model converges quickly during personalized training on edge devices. Additional rounds of learning do not significantly improve model accuracy and may lead to overfitting, consuming local device resources unnecessarily.
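One common way to avoid wasted personalization epochs is early stopping, sketched below. This is an illustrative pattern rather than the procedure used in the paper: the `train_step` and `evaluate` callbacks, the patience of 3 epochs, the minimum improvement of 0.001, and the simulated accuracy curve are all hypothetical.

```python
def personalize(train_step, evaluate, max_epochs=20, patience=3, min_delta=1e-3):
    """Run local epochs until accuracy stops improving or max_epochs is hit.

    train_step() performs one local epoch; evaluate() returns local accuracy.
    Returns (best accuracy seen, number of epochs actually run).
    """
    best_acc = evaluate()  # accuracy of the global model before personalization
    best_epoch = 0
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        epochs_run = epoch
        acc = evaluate()
        if acc > best_acc + min_delta:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # no meaningful gain for `patience` consecutive epochs
    return best_acc, epochs_run


# Simulated accuracy curve standing in for a real local model (made-up values).
curve = [0.50, 0.62, 0.71, 0.76, 0.78, 0.785, 0.7851, 0.7849, 0.785, 0.785, 0.785]
state = {"epoch": 0}


def train_step():
    state["epoch"] += 1


def evaluate():
    return curve[min(state["epoch"], len(curve) - 1)]


best_acc, epochs_run = personalize(train_step, evaluate)
```

In this simulated run, training stops after 8 epochs because the last three brought no gain above the threshold, so the device does not spend the full 20-epoch budget.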

    VII. CONCLUSION AND FUTURE WORK

    We proposed PerFLID, a novel personalized federated learning framework for detecting traffic intrusion in IoT nodes. Using meta-learning, the central model learns data characteristics from edge devices to create adaptive models, enabling personalized intrusion detection against unknown traffic attacks. To accelerate deployment, we designed a node selection strategy based on local gradient feature clustering and proved theoretically that it speeds up convergence and improves efficiency. We derived the framework's convergence upper bound and conducted extensive simulations with various neural network models on three NIDS datasets. The results show that the personalized model adapts to local data and converges faster in training. The experiments confirm that our personalized federated learning intrusion detection framework achieves state-of-the-art performance in federated machine learning scenarios, meets the personalized needs of edge devices, and maintains strong local attack detection in distributed environments.

    Currently, PerFLID primarily addresses conventional IoT threats; detecting advanced persistent threats (APTs) in IoT environments remains a challenge because of their time-sensitive and covert nature. Future research will focus on enhancing our federated learning framework to address APT characteristics by developing new detection methods, incorporating time series analysis techniques, and improving real-time detection capabilities to ensure timely responses.

    REFERENCES


    [1] B. L. R. Stojkoska and K. V. Trivodaliev, "A review of Internet of Things for smart home: Challenges and solutions," J. Cleaner Prod., vol. 140, pp. 1454-1464, Jan. 2017.

    [2] H. Boyes, B. Hallaq, J. Cunningham, and T. Watson, "The industrial Internet of Things (IIoT): An analysis framework," Comput. Ind., vol. 101, pp. 1-12, Oct. 2018.

    [3] M. M. Islam, A. Rahaman, and M. R. Islam, "Development of smart healthcare monitoring system in IoT environment," Social Netw. Comput. Sci., vol. 1, no. 3, pp. 1-11, May 2020.

    [4] M. Humayun, N. Jhanjhi, B. Hamid, and G. Ahmed, "Emerging smart logistics and transportation using IoT and blockchain," IEEE Internet Things Mag., vol. 3, no. 2, pp. 58-62, Jun. 2020.

    [5] M. Antonakakis et al., "Understanding the Mirai botnet," in Proc. 26th USENIX Secur. Symp., 2017, pp. 1093-1110.

    [6] B. Seri and A. Livne. (2019). Exploiting Blueborne in Linux-Based IoT Devices. [Online]. Available: https://media.armis.com/pdfs/wp-exploiting-blueborne-in-linuxbased-iot-devices-en.pdf

    [7] N. Moustafa, B. Turnbull, and K.-K. R. Choo, "An ensemble intrusion detection technique based on proposed statistical flow features for protecting network traffic of Internet of Things," IEEE Internet Things J., vol. 6, no. 3, pp. 4815-4830, Jun. 2018.

    [8] M. F. Umer, M. Sher, and Y. Bi, "Flow-based intrusion detection: Techniques and challenges," Comput. Secur., vol. 70, pp. 238-254, Sep. 2017.

    [9] E. Aydin and S. Bahtiyar, "OCIDS: An online CNN-based network intrusion detection system for DDoS attacks with IoT botnets," in Proc. 14th Int. Conf. Secur. Inf. Netw. (SIN), Dec. 2021, pp. 1-8.

    [10] S. Seth, G. Singh, and K. Kaur Chahal, "A novel time efficient learning-based approach for smart intrusion detection system," J. Big Data, vol. 8, no. 1, p. 111, Dec. 2021.

    [11] C. Zhang, D. Jia, L. Wang, W. Wang, F. Liu, and A. Yang, "Comparative research on network intrusion detection methods based on machine learning," Comput. Secur., vol. 121, Oct. 2022, Art. no. 102861.

    [12] V. Mothukuri, P. Khare, R. M. Parizi, S. Pouriyeh, A. Dehghantanha, and G. Srivastava, "Federated-learning-based anomaly detection for IoT security attacks," IEEE Internet Things J., vol. 9, no. 4, pp. 2545-2554, Feb. 2021.

    [13] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, "Communication-efficient learning of deep networks from decentralized data," Proc. Artif. Intell. Statist., vol. 3, pp. 1273-1282, May 2017.

    [14] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, "Network intrusion detection for IoT security based on learning techniques," IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2671-2701, 3rd Quart., 2019.

    [15] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, and A.-R. Sadeghi, "DIoT: A federated self-learning anomaly detection system for IoT," in Proc. IEEE 39th Int. Conf. Distrib. Comput. Syst. (ICDCS), May 2019, pp. 756-767.

    [16] B. Li, Y. Wu, J. Song, R. Lu, T. Li, and L. Zhao, "DeepFed: Federated deep learning for intrusion detection in industrial Cyber-Physical systems," IEEE Trans. Ind. Informat., vol. 17, no. 8, pp. 5615-5624, Aug. 2021.

    [17] S. Chatterjee and M. K. Hanawal, "Federated learning for intrusion detection in IoT security: A hybrid ensemble approach," Int. J. Internet Things Cyber-Assurance, vol. 1, no. 1, p. 1, 2023.

    [18] H. Liu et al., "Blockchain and federated learning for collaborative intrusion detection in vehicular edge computing," IEEE Trans. Veh. Technol., vol. 70, no. 6, pp. 6073-6084, Jun. 2021.

    [19] A. Z. Tan, H. Yu, L. Cui, and Q. Yang, "Towards personalized federated learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 3, no. 1, pp. 1-17, Apr. 2022.

    [20] V. Kulkarni, M. Kulkarni, and A. Pant, "Survey of personalization techniques for federated learning," in Proc. 4th World Conf. Smart Trends Syst., Secur. Sustain. (WorldS4), 2020, pp. 794-797.

    [21] O. Aouedi, K. Piamrat, G. Muller, and K. Singh, "Federated semisupervised learning for attack detection in industrial Internet of Things," IEEE Trans. Ind. Informat., vol. 19, no. 1, pp. 286-295, Jan. 2023.

    [22] R. Zhao, Y. Yin, Y. Shi, and Z. Xue, "Intelligent intrusion detection based on federated learning aided long short-term memory," Phys. Commun., vol. 42, Oct. 2020, Art. no. 101157.

    [23] Y. Hu, J. Wu, G. Li, J. Li, and J. Cheng, "Privacy-preserving few-shot traffic detection against advanced persistent threats via federated meta learning," IEEE Trans. Netw. Sci. Eng., vol. 11, no. 3, pp. 2549-2560, May 2024.

    [24] H. Ding, L. Chen, S. Li, Y. Bai, P. Zhou, and Z. Qu, "Divide, conquer, and coalesce: Meta parallel graph neural network for IoT intrusion detection at scale," in Proc. ACM Web Conf., May 2024, pp. 1656-1667.

    [25] A. Back de Luca, G. Zhang, X. Chen, and Y. Yu, "Mitigating data heterogeneity in federated learning with data augmentation," 2022, arXiv:2206.09979.

    [26] Y. Fraboni, R. Vidal, L. Kameni, and M. Lorenzi, "Clustered sampling: Low-variance and improved representativity for clients selection in federated learning," in Proc. Int. Conf. Mach. Learn., 2021, pp. 3407-3416.

    [27] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in Proc. 34th Int. Conf. Mach. Learn., vol. 70, Aug. 2017, pp. 1126-1135.

    [28] A. Fallah, A. Mokhtari, and A. Ozdaglar, "Personalized federated learning: A meta-learning approach," 2020, arXiv:2002.07948.

    [29] Q. Li, Y. Diao, Q. Chen, and B. He, "Federated learning on non-IID data silos: An experimental study," in Proc. IEEE 38th Int. Conf. Data Eng. (ICDE), May 2022, pp. 965-978.

    [30] S. Moon and W. H. Lee, "Privacy-preserving federated learning in Healthcare," in Proc. Int. Conf. Electron., Inf., Commun. (ICEIC), 2023, pp. 1-4.

    [31]     T.-M. Harry Hsu,     H. Qi, and     M. Brown, "Measuring the effects of non-identical data distribution for federated visual classification," 2019, arXiv:1909.06335.

    徐·T – M·哈里(T.-M. Harry Hsu)、齐·H(H. Qi)和布朗·M(M. Brown),《测量非相同数据分布对联邦视觉分类的影响》,2019年,预印本编号:arXiv:1909.06335。

    [32]     X. Gong,     Y. Chen,     Q. Wang, and     W. Kong, "Backdoor attacks and defenses in federated learning: State-of-the-art, taxonomy, and future directions," IEEE Wireless Commun., vol. 30, no. 2, pp. 114-121, Apr. 2023.

    龚·X(X. Gong)、陈·Y(Y. Chen)、王·Q(Q. Wang)和孔·W(W. Kong),《联邦学习中的后门攻击与防御:最新进展、分类和未来方向》,《IEEE无线通信》(IEEE Wireless Commun.),第30卷,第2期,第114 – 121页,2023年4月。

    [33]     S. Huang,     Y. Li,     C. Chen,     L. Shi, and     Y. Gao, "Multi-metrics adaptively identifies backdoors in federated learning," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2023, pp. 4629-4639.

    黄·S(S. Huang)、李·Y(Y. Li)、陈·C(C. Chen)、施·L(L. Shi)和高·Y(Y. Gao),《多指标自适应识别联邦学习中的后门》,收录于《IEEE/CVF国际计算机视觉会议论文集》(Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV)),2023年10月,第4629 – 4639页。

    [34]     Y. Wang,     D. Zhai, and     Y. Xia, "SCFL: Mitigating backdoor attacks in federated learning based on SVD and clustering," Comput. & Secur., vol. 133, Aug. 2023, Art. no. 103414.

    王(Wang)、翟(Zhai)和夏(Xia),“SCFL:基于奇异值分解(SVD)和聚类缓解联邦学习中的后门攻击”,《计算机与安全》(Comput. & Secur.),第133卷,2023年8月,文章编号103414。

    [35]     J. Wolfrath,     N. Sreekumar,     D. Kumar,     Y. Wang, and     A. Chandra, "HACCS: Heterogeneity-aware clustered client selection for accelerated federated learning," in Proc. IEEE Int. Parallel Distrib. Process. Symp. (IPDPS), May 2022, pp. 985-995.

    沃尔弗拉思(Wolfrath)、斯里库马尔(Sreekumar)、库马尔(Kumar)、王(Wang)和钱德拉(Chandra),“HACCS:用于加速联邦学习的异构感知聚类客户端选择”,收录于《电气与电子工程师协会国际并行与分布式处理研讨会论文集》(Proc. IEEE Int. Parallel Distrib. Process. Symp. (IPDPS)),2022年5月,第985 – 995页。

    [36]     K. Muhammad et al., "FedFast: Going beyond average for faster training of federated recommender systems," in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2020, pp. 1234-1242.

    穆罕默德(Muhammad)等人,“FedFast:超越平均以实现联邦推荐系统的更快训练”,收录于《第26届美国计算机协会知识发现与数据挖掘国际会议论文集》(Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min.),2020年,第1234 – 1242页。

    [37]     H.     T. Nguyen,     V. Sehwag,     S. Hosseinalipour,     C.     G. Brinton,     M. Chiang, and     H. Vincent Poor, "Fast-convergent federated learning," IEEE     J. Sel. Areas Commun., vol. 39, no. 1, pp. 201-218, Jan. 2021.

        H.     T. 阮(Nguyen)、V. 塞瓦格(Sehwag)、S. 侯赛因阿里普尔(Hosseinalipour)、C.     G. 布林顿(Brinton)、M. 蒋(Chiang)和     H. 文森特·普尔(Vincent Poor),“快速收敛的联邦学习”,《IEEE 特选通信领域期刊》(IEEE     J. Sel. Areas Commun.),第 39 卷,第 1 期,第 201 – 218 页,2021 年 1 月。

    [38]     T. Li,     A.     K. Sahu,     M. Zaheer,     M. Sanjabi,     A. Talwalkar, and     V. Smith, "Federated optimization in heterogeneous networks," Proc. Mach. Learn. Syst., vol. 2, pp. 429-450, May 2020.

        T. 李(Li)、A.     K. 萨胡(Sahu)、M. 扎希尔(Zaheer)、M. 桑贾比(Sanjabi)、A. 塔尔瓦尔卡(Talwalkar)和     V. 史密斯(Smith),“异构网络中的联邦优化”,《机器学习系统会议论文集》(Proc. Mach. Learn. Syst.),第 2 卷,第 429 – 450 页,2020 年 5 月。

    [39]     Y. Arjevani,     Y. Carmon,     J.     C. Duchi,     D.     J. Foster,     N. Srebro, and     B. Woodworth, "Lower bounds for non-convex stochastic optimization," Math. Program., vol. 199, nos. 1-2, pp. 165-214, May 2023.

        Y. 阿杰瓦尼(Arjevani)、Y. 卡蒙(Carmon)、J.     C. 杜奇(Duchi)、D.     J. 福斯特(Foster)、N. 斯雷布罗(Srebro)和     B. 伍德沃思(Woodworth),“非凸随机优化的下界”,《数学规划》(Math. Program.),第 199 卷,第 1 – 2 期,第 165 – 214 页,2023 年 5 月。

    [40]     J. Wang,     Q. Liu,     H. Liang,     G. Joshi, and     V.     H. Poor, "Tackling the objective inconsistency problem in heterogeneous federated optimization," in Proc. NIPS, 2020, pp. 7611-7623.

    [41] N. Moustafa and J. Slay, "UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proc. Mil. Commun. Inf. Syst. Conf. (MilCIS), Nov. 2015, pp. 1-6.

    [42] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proc. ICISSP, vol. 1, May 2018, pp. 108-116.

    [43] N. Koroniotis, N. Moustafa, E. Sitnikova, and B. Turnbull, "Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset," Future Gener. Comput. Syst., vol. 100, pp. 779-796, Nov. 2019.

    [44] M. J. Idrissi et al., "Fed-ANIDS: Federated learning for anomaly-based network intrusion detection systems," Expert Syst. Appl., vol. 234, Dec. 2023, Art. no. 121000.

    [45] R. Zhao, Y. Wang, Z. Xue, T. Ohtsuki, B. Adebisi, and G. Gui, "Semisupervised federated-learning-based intrusion detection method for Internet of Things," IEEE Internet Things J., vol. 10, no. 10, pp. 8645-8657, May 2023.

    [46] C. Xu, J. Shen, and X. Du, "A method of few-shot network intrusion detection based on meta-learning framework," IEEE Trans. Inf. Forensics Security, vol. 15, pp. 3540-3552, 2020.


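     [wc1] The transition here is problematic. The text should first state that current IoT architectures contain a large number of distributed components, so detection through a single-point model deployment is unrealistic and does not match practice.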

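     [wc2] The preceding text stresses that fast detection is the main challenge in network intrusion detection. This paragraph on the benefits of federated learning should therefore add one more point: federated learning can be deployed on multiple edge devices for collaborative detection without sending data back to a central server, so it executes faster.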

    1.  [wc3] Personalized federated learning
    2. Improved optimization of the aggregation-stage strategy

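     [wc4] Federated meta-learning is defined as using the data of all clients to obtain an initial model; each client then runs a few gradient-descent steps locally, starting from that initial model, to obtain its final model. The paper essentially places an existing federated meta-learning approach in an intrusion detection setting. Its highlight is the node selection strategy: the cosine angle is computed for every pair of nodes, hierarchical clustering or k-means clustering partitions the nodes into m clusters, a selection probability is computed for each node within its cluster, and, according to this selection probability distribution, the m nodes with the highest probability in each cluster are chosen as the training nodes for the next update of the global model.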
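
     The node selection strategy summarized in [wc4] can be sketched roughly as follows. This is a minimal illustration in plain Python, not the paper's implementation. Several details are assumptions made only for the sketch: average-linkage agglomerative clustering stands in for "hierarchical clustering", the in-cluster selection probability is taken to be proportional to each client's gradient norm (the comment does not say how it is computed), and the function names and the `per_cluster` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two client gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero gradient is treated as unrelated to every other client
    return dot / (norm_a * norm_b)


def cluster_clients(grads: List[Vector], m: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering on cosine distance, merged down to m clusters."""
    n = len(grads)
    dist = [[1.0 - cosine_similarity(grads[i], grads[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > m:
        best = None  # (average distance, index a, index b) of the closest pair of clusters
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def selection_probabilities(grads: List[Vector], cluster: List[int]) -> Dict[int, float]:
    """ASSUMPTION: probability proportional to the gradient L2 norm within the cluster."""
    norms = {i: math.sqrt(sum(x * x for x in grads[i])) for i in cluster}
    total = sum(norms.values())
    if total == 0.0:
        return {i: 1.0 / len(cluster) for i in cluster}
    return {i: v / total for i, v in norms.items()}


def select_clients(grads: List[Vector], m: int, per_cluster: int = 1) -> List[int]:
    """Pick the `per_cluster` most probable clients from each of the m clusters."""
    selected: List[int] = []
    for cluster in cluster_clients(grads, m):
        probs = selection_probabilities(grads, cluster)
        ranked = sorted(cluster, key=lambda i: (-probs[i], i))  # ties broken by client id
        selected.extend(ranked[:per_cluster])
    return sorted(selected)


if __name__ == "__main__":
    # Six hypothetical clients whose gradients point in three rough directions.
    example_grads = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0],
                     [0.1, 3.0], [-1.0, -1.0], [-2.0, -2.1]]
    print(cluster_clients(example_grads, 3))
    print(select_clients(example_grads, 3))
```

     Clustering by gradient direction groups clients whose updates are largely redundant, so sampling a few clients per cluster keeps each round small while still covering the distinct update directions. A real system would substitute the paper's own probability definition for the norm-based one assumed here.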

    Author: paperwork666
