Table of Contents
1 Overall architecture
2 Execution steps
2.1 Building the computation graph
2.2 Deferred execution
2.3 Single-device execution
2.4 Multi-device execution
3 Parallel computation
3.1 Data parallel training
3.2 ……