arXiv stat.ML
Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization
arXiv:2501.07526v2 Announce Type: replace-cross Abstract: Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where communication is more expensi