11. Concluding remarks
Encoding of 8-, 16-, 32- and 64-bit integers and of 32- and 64-bit floating-point numbers has been implemented in general-purpose processors for decades; fixed-point formats are used mainly in digital signal processors.
Over the past ten years or so, new formats have been introduced, driven by two considerations:
deep neural networks, particularly for inference, can benefit from reduced-precision formats, which shrink the silicon area and power dissipation of arithmetic operators without any significant loss of accuracy. Some of these formats, such as the 16-bit floats FP16 and BF16, are implemented in the instruction sets of general-purpose processors. The TF32 format is implemented in the tensor cores of recent Nvidia GPUs;
many applications use specialized processors such as neural processors (Google TPU,...
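The precision trade-off behind these reduced formats can be made concrete with a small sketch (not from the article). BF16 keeps the top 16 bits of the IEEE-754 binary32 encoding: 1 sign bit, 8 exponent bits, but only 7 mantissa bits, so it covers the same dynamic range as FP32 with roughly 2 to 3 significant decimal digits. The conversion below uses simple truncation for illustration; real hardware typically rounds to nearest even.

```python
import struct

def to_bf16_bits(x: float) -> int:
    """Truncate an FP32 value to its top 16 bits (a BF16 encoding sketch)."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16

def from_bf16_bits(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to FP32 by zero-padding the mantissa."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# 1.5 is exactly representable in BF16: the round trip is lossless.
print(from_bf16_bits(to_bf16_bits(1.5)))        # 1.5

# 1.001 is not: with only 7 mantissa bits, the increment of 1e-3 is lost.
print(from_bf16_bits(to_bf16_bits(1.0 + 1e-3)))  # 1.0
```

This illustrates why such formats suit inference well: the exponent range (and hence the representable magnitudes) matches FP32, while the discarded mantissa bits cost accuracy that many trained networks tolerate.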