
CSE5313 Coding and information theory for data science (Lecture 23)

Coded Computing

Motivation

Some facts:

  • Moore's law is saturating.
    • Improving CPU performance is hard.
  • Modern datasets are growing remarkably large.
    • E.g., TikTok, YouTube.
  • Learning tasks are computationally heavy.
    • E.g., training neural networks.

Solution: Distributed Computing for Scalability

  • Offloading computation tasks to multiple computation nodes.
  • Gather and accumulate computation results.
  • E.g., Apache Hadoop, Apache Spark, MapReduce.

General Framework

  • The system involves 1 master node and P worker nodes.
  • The master has a dataset D and wants to compute f(D), where f is some function.
  • The master partitions D = (D_1, ..., D_P) and sends D_i to node i.
  • Every node i computes g(D_i), where g is some function.
  • Finally, the master collects g(D_1), ..., g(D_P) and computes f(D) = h(g(D_1), ..., g(D_P)), where h is some function.
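The framework above can be sketched in a few lines. This is a minimal illustration, assuming the simplest instance where f is a sum, g computes a partial sum, and h adds the partial results; the function names mirror the notation in the notes and are not from any particular framework.

```python
def partition(D, P):
    """Split the dataset D into P roughly equal parts D_1, ..., D_P."""
    k, r = divmod(len(D), P)
    parts, start = [], 0
    for i in range(P):
        end = start + k + (1 if i < r else 0)
        parts.append(D[start:end])
        start = end
    return parts

def g(D_i):
    """Computation performed by worker node i (here: a partial sum)."""
    return sum(D_i)

def h(partial_results):
    """Aggregation performed by the master."""
    return sum(partial_results)

D = list(range(100))
f_D = h(g(D_i) for D_i in partition(D, P=4))
assert f_D == sum(D)  # matches the centralized computation
```

In a real system each g(D_i) would run on a separate machine; here the loop stands in for the network round-trip.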

Challenges

Stragglers

  • Nodes that are significantly slower than the others.

Adversaries

  • Nodes that return erroneous results.
    • Computation/communication errors.
    • Adversarial attacks.

Privacy

  • Nodes may be curious about the dataset.

Resemblance to communication channel

Suppose f, g = id, and let D = (D_1, ..., D_P) ∈ F^P be a message.

  • D_i is a field element.
  • F could be R, C, or a finite field F_q.

Observation: This is a distributed storage system.

  • An erasure - a node that does not respond.
  • An error - a node that returns erroneous results.

Solution:

  • Add redundancy to the message
  • Error-correcting codes.

Coded Distributed Computing

  • The master partitions D and encodes it before sending it to the P workers.
  • Workers perform computations on the coded data D̃ and generate coded results g(D̃).
  • The master decodes the coded results and obtains f(D) = h(g(D̃)).

Outline

Matrix-Vector Multiplication

  • MDS codes.
  • Short-Dot codes.

Matrix-Matrix Multiplication

  • Polynomial codes.
  • MatDot codes.

Polynomial Evaluation

  • Lagrange codes.
  • Application to Blockchain.

Trivial solution - replication

Why no straggler tolerance?

Goal: compute y = A·x, where A ∈ F^{M×N} and x ∈ F^N.

  • We employ an individual worker node i to compute y_i = (a_{i1}, ..., a_{iN})·(x_1, ..., x_N)^T.

Replicate the computation?

  • Let r + 1 nodes compute every y_i.

We need P = rM + M worker nodes to tolerate r erasures or ⌊r/2⌋ adversaries.

Use of MDS codes

Let 2 | M and P = 3.

Let A_1, A_2 be submatrices of A such that A = [A_1^T | A_2^T]^T.

  • Worker node 1 computes A_1·x.
  • Worker node 2 computes A_2·x.
  • Worker node 3 computes (A_1 + A_2)·x.

Observation: the results can be obtained from any two worker nodes.
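A small numerical sketch of this P = 3 scheme over the reals (the sizes and the random seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 5
A = rng.integers(0, 10, size=(M, N)).astype(float)
x = rng.integers(0, 10, size=N).astype(float)

A1, A2 = A[: M // 2], A[M // 2 :]           # A = [A1; A2], requires 2 | M
r1, r2, r3 = A1 @ x, A2 @ x, (A1 + A2) @ x  # results of workers 1, 2, 3

y = A @ x
# Worker 2 straggles: recover A2·x = r3 - r1 from workers 1 and 3.
assert np.allclose(np.concatenate([r1, r3 - r1]), y)
# Worker 1 straggles: recover A1·x = r3 - r2 from workers 2 and 3.
assert np.allclose(np.concatenate([r3 - r2, r2]), y)
```

Any single straggler can thus be tolerated with only one extra worker, instead of the M extra workers replication would require.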

Let G ∈ F^{M×P} be the generator matrix of a (P, M) MDS code.

The master node computes F = G^T·A ∈ F^{P×N}.

Every worker node i computes F_i·x.

  • F_i = (G^T A)_i is the i-th row of G^T A.

Notice that F·x = G^T A·x = G^T·y is a codeword encoding y.

Node i computes an entry of this codeword.

1 response = 1 entry of the codeword.

The master does not need all workers to respond in order to obtain y.

  • The MDS property allows decoding from any M of the y_i's.
  • This scheme tolerates P − M erasures, and the recovery threshold is K = M.
  • We need P = r + M worker nodes to tolerate r stragglers or ⌊r/2⌋ adversaries.
    • With replication, we would need P = rM + M worker nodes.
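The MDS scheme can be sketched numerically. This sketch works over the reals and assumes a Vandermonde matrix as the generator G (any M of its P columns are linearly independent, which gives the MDS property); the sizes and evaluation points are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, P = 3, 6, 5                   # tolerates P - M = 2 stragglers
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)

alphas = np.arange(1, P + 1)
G = np.vander(alphas, M, increasing=True).T   # G in F^{M x P}
F = G.T @ A                                   # row F_i is sent to worker i

responses = {i: F[i] @ x for i in range(P)}   # worker i computes F_i·x

# Any M responses determine y = A·x: since F·x = G^T·y,
# solve (G^T)_X · y = (F·x)_X for a surviving index set X.
survivors = [0, 2, 4]                         # e.g. workers 2 and 4 straggle
y_hat = np.linalg.solve(
    G.T[survivors], np.array([responses[i] for i in survivors])
)
assert np.allclose(y_hat, A @ x)
```

Over a finite field the same construction works with a Reed-Solomon generator matrix and field arithmetic in place of `np.linalg.solve`.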

Potential improvements for MDS codes

  • The matrix A is usually a (trained) model, and x is the data (feature vector).
  • x is transmitted frequently, while the rows of A (or G^T A) are communicated in advance.
  • Every worker needs to receive the entire x and compute the dot product.
  • Communication-heavy.
  • Can we design a scheme in which every node receives only a part of x?

Short-Dot codes

Link to paper

We want to create a matrix F ∈ F^{P×N} from A such that:

  • Every node computes F_i·x.
  • Every K rows linearly span the row space of A.
  • Each row of F contains at most s non-zero entries.

In the MDS method, F = G^T·A.

  • The recovery threshold is K = M.
  • Every worker node needs to receive s = N symbols (the entire x).

No free lunch

Can we trade the recovery threshold K for a smaller s?

  • Every worker node receives fewer than N symbols.
  • The master will need more than M responses to recover the computation result.

Construction of Short-Dot codes

Choose a super-regular matrix B ∈ F^{P×K}, where P is the number of worker nodes.

  • A matrix is super-regular if every square submatrix is invertible.
  • Lagrange/Cauchy matrices are super-regular (next lecture).

Create the matrix Ã by stacking some Z ∈ F^{(K−M)×N} below the matrix A.

Let F = B·Ã ∈ F^{P×N}.

Short-Dot: create a matrix F ∈ F^{P×N} such that:

  • Every K rows linearly span the row space of A.
  • Each row of F contains at most s = ((P − K + M)/P)·N non-zero entries (sparse).

Recovery of Short-Dot codes

Claim: Every K rows of F linearly span the row space of A.

Proof

Since B is super-regular, every K×K submatrix of B is invertible (this is the MDS property).

Hence, every row of A can be represented as a linear combination of any K rows of F.

That is, for every X ⊆ [P] with |X| = K, we have Ã = (B^X)^{-1}·F^X, where B^X and F^X denote the rows indexed by X.
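The recovery claim can be checked numerically. This sketch assumes a Cauchy matrix over the reals as the super-regular B (every square submatrix of a Cauchy matrix is invertible) and an arbitrary Z, since sparsity is not needed to verify recovery:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
M, N, P, K = 2, 4, 5, 3
A = rng.standard_normal((M, N))
Z = rng.standard_normal((K - M, N))       # arbitrary here; the real
A_tilde = np.vstack([A, Z])               # construction picks Z for sparsity

a, b = np.arange(P), np.arange(P, P + K)  # disjoint evaluation points
B = 1.0 / (a[:, None] - b[None, :])       # P x K Cauchy matrix
F = B @ A_tilde

# Any K rows of F recover A_tilde (hence A): A_tilde = (B^X)^{-1} F^X.
for X in combinations(range(P), K):
    X = list(X)
    recovered = np.linalg.solve(B[X], F[X])
    assert np.allclose(recovered[:M], A)
```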

What about the sparsity of FF?

  • Want each row of FF to be sparse.

Sparsity of Short-Dot codes

Build a P×P square matrix in which each row and each column contains P − K + M non-zero entries.

Concatenate N/P such matrices to obtain the sparsity pattern of F.

[Missing slides 18]

We now investigate what B should look like to construct such a matrix F.

  • Recall that each column of F must contain K − M zeros.
    • They are indexed by a set U ⊆ [P], where |U| = K − M.
    • Let B^U ∈ F^{(K−M)×K} be the submatrix of B containing the rows indexed by U.
  • Since F = B·Ã, it follows that F_j = B·Ã_j, where F_j and Ã_j are the j-th columns of F and Ã.
  • Next, we have B^U·Ã_j = 0_{(K−M)×1}.
  • Split B^U = [B^U_{[1,M]}, B^U_{[M+1,K]}] and Ã_j = [A_j^T, z_j^T]^T.
  • B^U·Ã_j = B^U_{[1,M]}·A_j + B^U_{[M+1,K]}·z_j = 0_{(K−M)×1}.
  • z_j = −(B^U_{[M+1,K]})^{-1}·B^U_{[1,M]}·A_j.
  • Note that B^U_{[M+1,K]} ∈ F^{(K−M)×(K−M)} is invertible.
    • Since B is super-regular.
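A small numerical sketch of this derivation, assuming a Cauchy matrix over the reals for B and a single zero-index set U shared by all columns (the actual construction cycles U across columns to spread the zeros evenly):

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, P, K = 2, 4, 6, 4
A = rng.standard_normal((M, N))

a, b = np.arange(P), np.arange(P, P + K)
B = 1.0 / (a[:, None] - b[None, :])        # P x K Cauchy (super-regular)

U = [0, 1]                                  # |U| = K - M rows forced to zero
B_U = B[U]                                  # (K-M) x K
B_left, B_right = B_U[:, :M], B_U[:, M:]    # split at column M

# z_j = -(B^U_[M+1,K])^{-1} B^U_[1,M] A_j, computed for all columns at once.
Z = -np.linalg.solve(B_right, B_left @ A)
F = B @ np.vstack([A, Z])

assert np.allclose(F[U], 0)                 # the rows indexed by U vanish
```

The solve is well-defined because B_right is a square submatrix of a super-regular matrix and hence invertible, exactly as the derivation requires.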
