Federated Learning. Yang Liu

Читать онлайн.
Название Federated Learning
Автор произведения Yang Liu
Жанр Компьютеры: прочее
Серия Synthesis Lectures on Artificial Intelligence and Machine Learning
Издательство Компьютеры: прочее
Год выпуска 0
isbn 9781681737188



Скачать книгу

conducted via a mask encrypted by Paillier’s scheme and the intermediate data computed by each party. The encrypted masked intermediate results are exchanged between the two parties in the secure gradient descent algorithm. Finally, the encrypted gradient is sent to a coordinator for decryption and model update.

      CryptoNets [Gilad-Bachrach et al., 2016] is an HE-based methodology announced by Microsoft Research that allows secure evaluation (inference) of encrypted queries over already trained neural networks on cloud servers: queries from the clients can be classified securely by the trained neural network model on a cloud server without inferring any information about the query or the result. The CryptoDL [Hesamifard et al., 2017] framework is a leveled HE-based approach for secure neural network inference. In CryptoDL, several activation functions are approximated using low-degree polynomials and mean-pooling is used as a replacement for max-pooling. The GAZELLE [Juvekar et al., 2018] framework is proposed as a scalable and low-latency system for secure neural network inference. In GAZELLE, to conduct secure nonlinear function evaluation in neural network inference, HE and traditional secure two-party computation techniques (such as GC) are combined in an intricate way. The packed additive homomorphic encryption (PAHE) embraced in GAZELLE allows single instruction multiple data (SIMD) arithmetic homomorphic operations over encrypted data.

      FedMF [Chai et al., 2019] uses Paillier’s HE for secure federated matrix factorization assuming honest-but-curious server and honest clients. Secure federated transfer learning is studied via Paillier’s HE scheme in Liu et al. [2019], where the semi-honest third party is into the discard by mixing HE with additive secret sharing in decryption process.

      DP was originally developed to facilitate secure analysis over sensitive data. With the rise of ML, DP has become an active research field again in the ML community. This is motivated by the fact that many exciting results from DP can be applied to PPML [Dwork et al., 2016, 2006]. The key idea of DP is to confuse the adversaries when they are trying to query individual information from the database so that adversaries cannot distinguish individual-level sensitivity from the query result.

       Definition

      DP is a privacy definition initially proposed by Dwork et al. [2006], developed in the context of statistical disclosure control. It provides an information-theoretic security guarantee that the outcome of a function to be insensitive to any particular record in the dataset. Therefore, DP can be used to resist the membership inference attack. The (∊, δ)-differential privacy is defined as follows.

      Definition 2.2 (∊, δ)-differential privacy. A randomized mechanism M preserves (∊, δ)-differential privacy if given any two datasets D and D′ differing by only one record, and for all SRange (M),

Image

      where ∊ is the privacy budget and δ is the failure probability.

      The quantity Image is called the privacy loss, with ln denoting natural logarithm operation. When δ = 0, the stronger notion of ∊-differential privacy is achieved.

      DP has utility-privacy trade-offs as it introduces noise to data. Jayaraman and Evans [2019] found out that current mechanisms for differential privacy for ML rarely offer acceptable utility-privacy trade-offs: settings that provide limited accuracy loss provide little effective privacy, and settings that provide strong privacy result in large accuracy loss.

       Categorization of DP Schemes

      Typically, there are mainly two ways to achieve DP by adding noise to the data. One is the addition of noise according to the sensitivity of a function [Dwork et al., 2006]. The other is choosing noise according to an exponential distribution among discrete values [McSherry and Talwar, 2007].

      The sensitivity of a real-valued function expresses the maximum possible change in its value due to the addition or removal of a single sample.

      Definition 2.3 Sensitivity. For two datasets D and D′ differing by only one record, and a function M : D → Rd over an arbitrary domain, the sensitivity of M is the maximum change in the output of M over all possible inputs:

Image

      where ‖·‖ is a norm of the vector. The l1-sensitivity or the l2-sensitivity is defined when the l1-norm or l2-norm is applied, respectively.

      We denote the Laplace distribution with parameter b as Lap(b). Lap(b) has a probability density function Image. Given a function M with sensitivity ΔM, the addition of noise drawn from a calibrated Laplace distribution Lap(ΔM/∊) maintains ∊-differential privacy [Dwork et al., 2006].

      Theorem 2.4 Given a function M : D → Rd over an arbitrary domain D, for any input X, the function:

Image

      provides ∊-differential privacy. The ∊-differential privacy can also be achieved by adding independently generated Laplace noise from distribution Lap(ΔM/∊) to each of the d output terms.

      The addition of Gaussian or binomial noise, scaled to the l2-sensitivity of the function, sometimes yields better accuracy, while only ensuring the weaker (∊, δ)-differential privacy [Dwork et al., 2006, Dwork and Nissim, 2004].

      The exponential mechanism [McSherry and Talwar, 2007] is another way to obtain DP. The exponential mechanism is given a quality function q that scores outcomes of a calculation, where higher scores are better. For a given database and ∊ parameter, the quality function induces a probability distribution over the output domain, from which the exponential mechanism samples the outcome. This probability distribution favors high-scoring outcomes, while ensuring ∊-differential privacy.

      Definition 2.5 Let q : (Dn × R) → R be a quality function, which given a dataset d ∈ Dn, assigns a score to each outcome r ∈ R. For any two datasets D and D′ differing by only one record, let Image. Let M be a mechanism for choosing an outcome r ∈ R given a dataset instance dDn. Then, the mechanism M, defined as

Image

      provides ∊-differential privacy.

      The DP algorithms can be categorized according to how and where the perturbation is applied.

      1. Input perturbation: The noise is added to the training data.

      2. Objective perturbation: The noise is added to the objective function of the learning algorithms.

      3. Algorithm perturbation: The noise is added to the intermediate values such as gradients in iterative algorithms.

      4. Output perturbation: The noise is added to the output parameters after training.

      DP still exposes the statistics of a party, which are sensitive in some cases, such as financial data, medical data and other commercial and health applications. Readers who are interested in DP and willing to learn more about it can refer to the tutorial given by Dwork and Roth [2014].

       Application in PPML

      In federated learning, to enable model training on distributed datasets held by multiple parties, local differential privacy (LDP) can be used. With local