arXiv stat.ML · Papers

Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization

arXiv:2501.07526v2 · Announce Type: replace-cross

Abstract: Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where communication is more expensi…
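The abstract's starting point is that distributed SGD communicates once per iteration. Below is a minimal sketch of that baseline. It assumes a least-squares objective f(x) = (1/2m)||Ax - b||^2, with the rows of A split in 1D across simulated processors. It does not implement the paper's 2D parallel scheme. The function `simulated_parallel_sgd` and its parameters are hypothetical. A plain Python `sum` stands in for a sum-allreduce such as `MPI_Allreduce`.

```python
import numpy as np

def simulated_parallel_sgd(A, b, num_procs=4, lr=0.1, iters=200, batch=8, seed=0):
    """Minibatch SGD for least squares with a 1D row partition (illustrative only).

    Each iteration, every simulated processor computes a gradient on a local
    minibatch of its rows; the local gradients are then summed, which stands in
    for one allreduce. That allreduce is the per-iteration communication step.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_blocks = np.array_split(np.arange(m), num_procs)  # rows owned by each "processor"
    x = np.zeros(n)
    allreduces = 0
    for _ in range(iters):
        local_grads, sampled = [], 0
        for rows in row_blocks:
            idx = rng.choice(rows, size=min(batch, len(rows)), replace=False)
            residual = A[idx] @ x - b[idx]
            local_grads.append(A[idx].T @ residual)  # unnormalized local gradient
            sampled += len(idx)
        g = sum(local_grads)  # stand-in for a sum-allreduce across processors
        allreduces += 1       # one communication round per SGD iteration
        x -= lr * g / sampled
    return x, allreduces
```

Usage: on a consistent system (b = A @ x_true), the iterate approaches `x_true`. The number of allreduces equals the number of iterations, and that per-iteration cost is what communication-avoiding variants try to reduce.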