CSE5313 Coding and information theory for data science (Lecture 23)
Coded Computing
Motivation
Some facts:
- Moore's law is saturating.
- Improving CPU performance is hard.
- Modern datasets are growing remarkably large.
- E.g., TikTok, YouTube.
- Learning tasks are computationally heavy.
- E.g., training neural networks.
Solution: Distributed Computing for Scalability
- Offloading computation tasks to multiple computation nodes.
- Gather and accumulate computation results.
- E.g., Apache Hadoop, Apache Spark, MapReduce.
General Framework
- The system involves 1 master node and N worker nodes.
- The master has a dataset X and wants f(X), where f is some function.
- The master partitions X into X_1, ..., X_k, and sends X_i to node i.
- Every node i computes g(X_i), where g is some function.
- Finally, the master collects the results and computes f(X) = h(g(X_1), ..., g(X_k)), where h is some function.
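As a toy sketch of this framework (a hypothetical example; here f is a sum of squares, g computes the partial sums of squares, and h adds them up, following the notation above):

```python
# A minimal sketch of the master/worker framework (hypothetical example).
# f(X) = sum of squares of X, computed as h(g(X_1), g(X_2)).

def g(part):                        # worker-side computation
    return sum(v * v for v in part)

def h(results):                     # master-side aggregation
    return sum(results)

X = list(range(10))                 # the master's dataset
parts = [X[:5], X[5:]]              # master partitions X and sends X_i to node i
results = [g(p) for p in parts]     # every node i computes g(X_i)
f_X = h(results)                    # master computes f(X) = h(g(X_1), ..., g(X_k))

assert f_X == sum(v * v for v in X)
```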
Challenges
Stragglers
- Nodes that are significantly slower than the others.
Adversaries
- Nodes that return erroneous results.
- Computation/communication errors.
- Adversarial attacks.
Privacy
- Nodes may be curious about the dataset.
Resemblance to communication channel
Suppose each part X_i is a single symbol x_i, and let x = (x_1, ..., x_k) be a message.
- Each x_i is a field element.
- The field could be a finite field 𝔽_q or ℝ, ℂ.
Observation: This is a distributed storage system.
- An erasure = a node that does not respond.
- An error = a node that returns erroneous results.
Solution:
- Add redundancy to the message
- Error-correcting codes.
Coded Distributed Computing
- The master partitions X and encodes it before sending to workers.
- Workers perform computations on coded data and generate coded results.
- The master decodes the coded results and obtains f(X).
Outline
Matrix-Vector Multiplication
- MDS codes.
- Short-Dot codes.
Matrix-Matrix Multiplication
- Polynomial codes.
- MatDot codes.
Polynomial Evaluation
- Lagrange codes.
- Application to Blockchain.
Trivial solution - replication
Why no straggler tolerance?
- We employ an individual worker node to compute each g(X_i); if that node fails, its part of the result is lost.
Replicate the computation?
- Let r nodes compute every g(X_i).
We need k(s+1) worker nodes to tolerate s erasures and k(2a+1) worker nodes to tolerate a adversaries.
Use of MDS codes
Let A ∈ 𝔽^{m×n} and x ∈ 𝔽^n; the goal is to compute Ax.
Let A_1, A_2 be submatrices of A such that A = [A_1; A_2] (stacked vertically).
- Worker node 1 computes A_1 x.
- Worker node 2 computes A_2 x.
- Worker node 3 computes (A_1 + A_2) x.
Observation: the result Ax can be obtained from any two of the three worker nodes.
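The three-worker example can be checked numerically (a sketch over the reals with made-up values; worker 2 is assumed to straggle):

```python
import numpy as np

# A is split into A1 (top half) and A2 (bottom half); workers return
# A1 x, A2 x, and (A1 + A2) x. Any two responses recover Ax.
A = np.arange(16.0).reshape(4, 4)
x = np.array([1.0, 2.0, 3.0, 4.0])
A1, A2 = A[:2], A[2:]

r1, r2, r3 = A1 @ x, A2 @ x, (A1 + A2) @ x   # the three workers' results

# Suppose worker 2 straggles: recover A2 x = r3 - r1, then stack.
Ax_recovered = np.concatenate([r1, r3 - r1])
assert np.allclose(Ax_recovered, A @ x)
```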
Partition A into k submatrices A_1, ..., A_k, and let G ∈ 𝔽^{N×k} be the generator matrix of an [N, k] MDS code.
The master node computes Ã_i = Σ_j G_{ij} A_j for every node i.
Every worker node i computes Ã_i x.
- G_i = (G_{i1}, ..., G_{ik}) is the i-th row of G.
Notice that (Ã_1 x, ..., Ã_N x) is the codeword of the message (A_1 x, ..., A_k x).
Node i computes an entry in this codeword.
- Response of node i = the i-th entry of the codeword.
The master does not need all N workers to respond to obtain Ax.
- The MDS property allows decoding from any k of the Ã_i x's.
- This scheme tolerates N − k erasures, and the recovery threshold is K = k.
- We need k + s worker nodes to tolerate s stragglers, or k + 2a to tolerate a adversaries.
- With replication, we need k(s+1) (resp. k(2a+1)) worker nodes.
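The general MDS scheme can be sketched over the reals using a Vandermonde matrix as the generator (any k of its N rows are invertible since the evaluation points are distinct); all values below are illustrative, and decoding is a linear solve:

```python
import numpy as np

# MDS-coded matrix-vector multiplication (real-valued sketch).
k, N = 3, 5
A = np.random.default_rng(0).standard_normal((6, 4))
x = np.random.default_rng(1).standard_normal(4)
blocks = np.split(A, k)                        # A_1, ..., A_k

G = np.vander(np.arange(1.0, N + 1), k, increasing=True)  # N x k, any k rows invertible
coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(N)]  # Ã_i

responses = {i: coded[i] @ x for i in range(N)}  # worker i computes Ã_i x
survivors = [0, 2, 4]                            # any k responders suffice
Y = np.stack([responses[i] for i in survivors])
decoded = np.linalg.solve(G[survivors], Y)       # recover (A_1 x, ..., A_k x)
assert np.allclose(decoded.reshape(-1), A @ x)
```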
Potential improvements for MDS codes
- The matrix A is usually a (trained) model, and x is the data (feature vector).
- x is transmitted frequently, while the rows of A (or Ã) are communicated in advance.
- Every worker needs to receive the entire x and compute the dot product.
- Communication-heavy.
- Can we design a scheme that allows every node to receive only a part of x?
Short-Dot codes
We want to create a matrix F ∈ 𝔽^{N×n} from A ∈ 𝔽^{m×n} such that:
- Every node i computes the dot product F_i · x, where F_i is the i-th row of F.
- Every K rows of F linearly span the row space of A.
- Each row of F contains at most s non-zero entries.
In the MDS method, F = GA for an MDS generator matrix G ∈ 𝔽^{N×m}.
- The recovery threshold is K = m.
- Every worker node needs to receive n symbols (the entire x), i.e., s = n.
No free lunch
Can we trade the recovery threshold K for a smaller s?
- Every worker node receives fewer than n symbols.
- In return, the master will need more than m responses to recover the computation result (K > m).
Construction of Short-Dot codes
Choose a super-regular matrix B ∈ 𝔽^{N×K}, where N is the number of worker nodes.
- A matrix is super-regular if every square submatrix of it is invertible.
- The Lagrange/Cauchy matrix is super-regular (next lecture).
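As a quick sanity check (over the reals, with arbitrarily chosen parameters a_i, b_j), every square submatrix of a small Cauchy matrix B[i][j] = 1/(a_i − b_j) is indeed invertible:

```python
import numpy as np
from itertools import combinations

# Verify super-regularity of a 4x3 Cauchy matrix by brute force:
# every square submatrix must have a non-zero determinant.
a = np.array([1.0, 2.0, 3.0, 4.0])        # row parameters
b = np.array([5.0, 6.0, 7.0])             # column parameters (disjoint from a)
B = 1.0 / (a[:, None] - b[None, :])       # Cauchy matrix B[i, j] = 1/(a_i - b_j)

for size in range(1, 4):
    for rows in combinations(range(4), size):
        for cols in combinations(range(3), size):
            sub = B[np.ix_(rows, cols)]
            assert abs(np.linalg.det(sub)) > 1e-12
```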
Create a matrix Â ∈ 𝔽^{K×n} by stacking some Z ∈ 𝔽^{(K−m)×n} below the matrix A.
Let F = BÂ ∈ 𝔽^{N×n}.
Short-Dot: create the matrix Z such that:
- Every K rows of F linearly span the row space of A.
- Each row of F contains at most s ≈ (n/N)(N − K + m) non-zero entries (sparse).
Recovery of Short-Dot codes
Claim: Every K rows of F linearly span the row space of Â.
Proof
Since B is super-regular, it is also MDS, i.e., every K × K submatrix of B is invertible.
Hence, every row of Â can be represented as a linear combination of any K rows of F.
That is, for every S ⊆ [N] with |S| = K, we have Â = (B^S)^{-1} F^S, where B^S and F^S contain the rows indexed by S.
- In particular, the rows of A (the first m rows of Â) are spanned as well.
What about the sparsity of F?
- Want each row of F to be sparse.
Sparsity of Short-Dot codes
Build a square N × N zero-pattern matrix in which each row and each column contains exactly K − m zeros (i.e., N − K + m non-zero entries), e.g., by cyclic shifts.
Concatenate ⌈n/N⌉ such matrices horizontally to obtain the zero pattern of F ∈ 𝔽^{N×n}.
[Missing slides 18]
We now investigate what Z should look like to construct such a matrix F.
- Recall that each column of F must contain K − m zeros.
- They are indexed by a set W ⊆ [N], where |W| = K − m.
- Let B^W ∈ 𝔽^{(K−m)×K} be the submatrix of B containing the rows indexed by W.
- Since F = BÂ, it follows that F_j = BÂ_j, where F_j and Â_j are the j-th columns of F and Â.
- Next, we require B^W Â_j = 0_{(K−m)×1}.
- Split B^W = [B^W_{[1,m]}, B^W_{[m+1,K]}] and Â_j = [A_j; z_j], where A_j and z_j are the j-th columns of A and Z.
- B^W Â_j = B^W_{[1,m]} A_j + B^W_{[m+1,K]} z_j = 0_{(K−m)×1}.
- Hence z_j = −(B^W_{[m+1,K]})^{-1} B^W_{[1,m]} A_j.
- Note that B^W_{[m+1,K]} ∈ 𝔽^{(K−m)×(K−m)} is invertible.
  - Since B is super-regular.
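The zero-forcing step for Z can be checked numerically (a real-valued sketch; the particular Cauchy matrix B, the dimensions, and the cyclic choice of the zero sets W are illustrative assumptions):

```python
import numpy as np

# For each column j, pick a set W of K - m rows of F to zero out and solve
# z_j = -(B^W_{[m+1,K]})^{-1} B^W_{[1,m]} A_j; then F = B [A; Z] vanishes on W.
rng = np.random.default_rng(0)
N, K, m, n = 6, 4, 2, 6
a = np.arange(1.0, N + 1)
b = np.arange(N + 1.0, N + 1 + K)
B = 1.0 / (a[:, None] - b[None, :])            # super-regular Cauchy matrix, N x K

A = rng.standard_normal((m, n))
Z = np.zeros((K - m, n))
zero_sets = [[(j + t) % N for t in range(K - m)] for j in range(n)]  # cyclic W's
for j, W in enumerate(zero_sets):
    BW = B[W]                                   # (K-m) x K submatrix B^W
    Z[:, j] = -np.linalg.solve(BW[:, m:], BW[:, :m] @ A[:, j])

F = B @ np.vstack([A, Z])                       # N x n
for j, W in enumerate(zero_sets):
    assert np.allclose(F[W, j], 0.0)            # forced zeros in each column
```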