Glow Roadmap

This page tracks the ongoing development of Glow. It documents the goals for upcoming development iterations, the status of some high-level tasks, and relevant information that can help people join the ongoing efforts.

Top-Level Tasks

Load additional quantized neural networks

Quantization is the process of converting a neural network that uses 32-bit floating-point arithmetic into one that uses 8-bit integer arithmetic. Glow can quantize an existing floating-point network using Profile Guided Quantization and then run the quantized model for inference. Glow has also started to support loading quantized Caffe2/ONNX models directly. The goal of this top-level task is to extend the loader to additional quantized Caffe2 operators (https://github.com/pytorch/pytorch/tree/master/caffe2/quantization/server) and ONNX operators.

Contact person: @beicy Issue: Support directly loading a quantized model
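
For intuition, here is a minimal sketch of the affine (scale/offset) quantization scheme that maps float32 values into int8 given an observed tensor range. The helper names and rounding choices are illustrative only, not Glow's actual implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Affine quantization parameters for one tensor: real ~= scale * (q - offset).
struct QuantParams {
  float scale;
  int32_t offset;
};

// Derive scale/offset from the observed floating-point range [min, max] so
// that min maps to -128 and max maps to 127. (Hypothetical helper; assumes
// max > min. A real implementation handles more corner cases.)
QuantParams chooseParams(float min, float max) {
  float scale = (max - min) / 255.0f;
  int32_t offset = static_cast<int32_t>(std::round(-128.0f - min / scale));
  return {scale, offset};
}

int8_t quantize(float x, const QuantParams &qp) {
  int32_t q = static_cast<int32_t>(std::round(x / qp.scale)) + qp.offset;
  return static_cast<int8_t>(std::min(127, std::max(-128, q)));
}

float dequantize(int8_t q, const QuantParams &qp) {
  return qp.scale * static_cast<float>(q - qp.offset);
}
```

Profile Guided Quantization, roughly speaking, records the observed range of each tensor during an instrumented floating-point run and then derives per-tensor parameters along these lines when converting the graph.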

Asynchronous Model Execution

Glow is designed as a compiler and execution engine for neural network hardware accelerators. The current implementation of the execution engine is very basic and exposes a simple single-device synchronous run method. The goal of this top-level task is to rewrite the execution engine and implement an asynchronous execution mechanism that can be extended to run code on multiple accelerators concurrently. The execution engine will need to manage the state of multiple devices, queue incoming requests, and keep track of buffers on both the host and the devices.

Contact person: @gcatron Issue: Glow Runtime
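
To make the intended execution pattern concrete, below is a toy sketch of a per-device request queue with completion callbacks. The class and type names are hypothetical and do not correspond to the actual Glow runtime API.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Completion is reported asynchronously through a callback on a runtime thread.
using ResultCallback = std::function<void(uint64_t runId, bool success)>;

struct Request {
  uint64_t runId;
  std::string networkName;
  ResultCallback callback;
};

// One queue per device: callers enqueue requests and return immediately; a
// worker thread drains the queue and signals completion via the callback.
class DeviceQueue {
public:
  DeviceQueue() : worker_([this] { run(); }) {}
  ~DeviceQueue() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  void enqueue(Request req) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push(std::move(req));
    }
    cv_.notify_one();
  }

private:
  void run() {
    for (;;) {
      Request req;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (done_ && queue_.empty())
          return;
        req = std::move(queue_.front());
        queue_.pop();
      }
      // A real engine would copy inputs to the device, launch the compiled
      // function, and copy outputs back before signaling completion.
      req.callback(req.runId, /*success=*/true);
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Request> queue_;
  bool done_ = false;
  std::thread worker_;  // started last, after the other members are ready
};
```

A simple synchronous run method, like the one the execution engine exposes today, can then be layered on top by enqueuing a single request and blocking until its callback fires.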

ONNXIFI integration

Glow integrates with PyTorch through the ONNXIFI interface, which offloads the compute graph from PyTorch onto Glow. This top-level task tracks the work to fully implement the ONNXIFI specification and to qualify the compiler using the ONNXIFI test suite.

Contact person: @rdzhabarov Issue: ONNXIFI support

Improved support for Training accelerators

Glow is designed to support both inference and training accelerators. The initial bring-up effort focused on supporting inference accelerators. In the next development iterations we'll focus on improving and productizing support for training accelerators. This work will include adding more gradient operators, improving gradient-check test coverage, and enabling asynchronous communication patterns in the runtime.

Contact person: @rdzhabarov Issue:
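
As an illustration of the gradient-check technique mentioned above, the sketch below compares an analytic gradient against a central finite-difference estimate. The `loss`/`grad` signatures are hypothetical stand-ins for a compiled forward and backward pass, not Glow's actual test code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

// Returns true if the analytic gradient agrees with a numerical estimate of
// d(loss)/d(param_i) obtained by central finite differences.
bool checkGradients(const std::function<float(const std::vector<float> &)> &loss,
                    const std::function<std::vector<float>(const std::vector<float> &)> &grad,
                    std::vector<float> params, float eps = 1e-3f,
                    float tolerance = 1e-2f) {
  std::vector<float> analytic = grad(params);
  for (size_t i = 0; i < params.size(); ++i) {
    float saved = params[i];
    // Perturb one parameter in each direction and evaluate the loss.
    params[i] = saved + eps;
    float lossPlus = loss(params);
    params[i] = saved - eps;
    float lossMinus = loss(params);
    params[i] = saved;
    float numeric = (lossPlus - lossMinus) / (2 * eps);
    // Compare using a relative error to tolerate different gradient scales.
    float denom = std::max(std::fabs(numeric) + std::fabs(analytic[i]), 1e-6f);
    if (std::fabs(numeric - analytic[i]) / denom > tolerance) {
      std::printf("gradient mismatch at %zu: numeric=%f analytic=%f\n", i,
                  numeric, analytic[i]);
      return false;
    }
  }
  return true;
}
```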