Distributed Deep Learning
HAL: Computer System for Scalable Deep Learning
V. Kindratenko, D. Mu, Y. Zhan, J. Maloney, S. Hashemi, B. Rabe, K. Xu, R. Campbell, J. Peng, and W. Gropp.
Distributed training on HAL with PyTorch and NVIDIA Apex
ImageNet benchmark experiments for performance analysis
Member of the NCSA HAL cluster admin team (now NCSA CAII)
NCSA HAL cluster tutorial series on distributed deep learning