Tensor Operator Set Architecture (TOSA) provides a set of whole-tensor operations commonly employed by Deep Neural Networks. The intent is to enable a variety of implementations running on a diverse range of processors, with the results at the TOSA level consistent across those implementations. Applications or frameworks which target TOSA can therefore be deployed on a wide range of different processors, such as CPUs or GPUs, with defined accuracy and compatibility constraints. Most operators from the common ML frameworks (TensorFlow, PyTorch, etc.) should be expressible in TOSA. It is expected that there will be tools to lower from the ML frameworks into TOSA.
The principal goals of TOSA are:
- A minimal and stable set of tensor-level operators to which machine learning framework operators can be reduced.
- Full support for both quantized integer and floating-point content.
- Precise functional description of the behavior of every operator, including its numerical behavior with respect to precision, saturation, scaling, etc. as required by quantized data types.
- Agnostic to any single high-level framework, compiler backend stack or particular target.
- The detailed functional and numerical description enables precise code construction for a diverse range of targets – SIMD CPUs, GPUs and custom hardware like NPUs/TPUs.
The TOSA Specification itself is written as AsciiDoc mark-up and developed in its raw mark-up form, managed through a git repository here: https://git.mlplatform.org/tosa/specification.git/. The specification is developed and versioned much like software is. While the mark-up is legible and can be read fairly easily in its raw form, it’s probably better to build or “render” the mark-up into a PDF document, or similar. To do this, please follow the instructions in the README.md in the root of the specification repository. For convenience, you can view a PDF version of v0.20.0 here:
In addition to the TOSA specification document, there is also a reference model written as C code which implements the exact behaviour set out in the specification. This reference implementation is intended to be the “golden reference” against which other TOSA implementations can be compared, and to be incorporated into regression test suites and the like. The reference model consumes a FlatBuffers serialization of the network subgraph generated by the TOSA Serialization Library, along with input tensors for placeholder nodes in NumPy format. By default, the model validates and evaluates the network subgraph, and writes out the resulting output tensors in NumPy format. Alongside the TOSA Reference Model there is also a comprehensive set of tools for generating unit tests which exercise the behaviour of the TOSA operators. The tests are created as TOSA operations serialized in the flatbuffer format with inputs created in files using the NumPy format, as used by the reference model. The git repo containing the Reference Model and test generator is here: https://git.mlplatform.org/tosa/reference_model.git. For details on how to build and use the TOSA Reference Model, review the README.md in the root of that repo.
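As a sketch of that data interchange: the tensors exchanged with the reference model are plain NumPy `.npy` files. The file name below is a hypothetical example, and the exact command-line flags for invoking the model are documented in the repo's README; this only illustrates the NumPy side of the round trip.

```python
import numpy as np

# Write a placeholder input tensor in the .npy format the reference
# model reads (the file name here is a hypothetical example).
input_tensor = np.arange(12, dtype=np.float32).reshape(3, 4)
np.save("placeholder_0.npy", input_tensor)

# After the reference model has been run on the serialized subgraph
# (see the repo's README for the invocation), its output tensors can
# be read back the same way:
result = np.load("placeholder_0.npy")
print(result.shape, result.dtype)
```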
TOSA is intended to be a very open technology, developed collaboratively with the community. As such, contributions to the Specification as well as the Reference Model, test generator, etc. are strongly encouraged and very welcome. The process for contributing to the Reference Model and related code is slightly different to the process for contributing to the TOSA Specification, owing mainly to the differences between software and documentation.
TOSA defines a set of primitive operators to which higher-level operators can be lowered in a consistent way. To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed. The following principles govern how decisions are made about the addition of new operators within TOSA:
- An operator shall be a primitive operation or building block that cannot be broken down into simpler whole tensor operations. If the operator can be broken down, then we should look at the component operators.
- An operator shall be usable as a component out of which more complex operations can be constructed. Single-use operators have a high architectural cost, and a more reusable version should be considered instead.
- Precision should be appropriate for the input and output data types. Precision higher than that needed to calculate the result leads to extra implementation cost.
- Numerical definition of common sub-operations should be consistent between operators (for example: value scaling). Consistent sub-operation definition reduces the operator implementation cost.
- The valid input and output ranges for all operands shall be specified. Ranges are required to make consistent (numerically agreeing) implementations possible.
- Integer operators shall be implementable in a bit-exact form with good efficiency on CPU, GPU and hardware targets. This reduces implementation cost and gives consistent inference results.
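To illustrate the consistent-sub-operation and bit-exact principles above: quantized TOSA operators share a common integer scaling step (multiply by an integer multiplier, then arithmetic-shift right with rounding), so every implementation produces the same bits. The sketch below is a simplified, single-rounding version of that pattern; the specification itself defines the exact scaling functions, including a double-rounding variant and range checks.

```python
def rescale(value: int, multiplier: int, shift: int) -> int:
    """Scale `value` by (multiplier / 2**shift) with round-half-up.

    A simplified sketch of the rounding-right-shift pattern used for
    quantized value scaling; integer-only, so it is bit-exact across
    CPU, GPU and hardware targets.
    """
    round_const = 1 << (shift - 1)   # adds 0.5 ulp before truncation
    return ((value * multiplier) + round_const) >> shift

# Scaling by multiplier 2**30 with shift 30 is the identity:
assert rescale(100, 1 << 30, 30) == 100
assert rescale(-100, 1 << 30, 30) == -100
```

Because every operator that needs scaling reuses this one definition, implementations only have to get the rounding behaviour right once.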
Note: This is currently just a proposal; the exact process is still being worked on. In order for us to be able to accept contributions to the TOSA specification, it is necessary for the contributor to agree to a Contribution Agreement (which is currently in the process of being drafted). This will most likely be presented as a click-through licence a developer can choose to accept, with their acceptance recorded against their MLPlatform.org account. Once the Contribution Agreement has been accepted, contributing to the specification will follow the same process as other projects hosted on MLPlatform.org - see https://www.mlplatform.org/contributing/. Patches are made against the specification mark-up and reviewed on https://review.mlplatform.org/
Contributing to the Reference Model, unit test generator and related source code largely follows the same process as other projects hosted on MLPlatform.org - see https://www.mlplatform.org/contributing/ (though TOSA Reference Model is licensed using the Apache-2 licence).