Second-gen Google Cloud TPUs take machine learning to the next level
Google has announced the launch of its new Google Cloud tensor processing units (TPUs), which it says bring unparalleled computational performance to the training of machine learning (ML) models.
At its annual Google I/O developer conference this year, Google unveiled the next generation of its TPUs, which can be used to both build and train ML models with remarkable speed. They will initially be available on Google Compute Engine and will be fully integrated with Google Cloud's storage, networking, and data analytics technologies.
One Google Cloud TPU module contains four 45-teraflop chips that collectively deliver 180 teraflops of floating-point performance, which is crucial for training neural networks. With scalability in mind, Google has included a custom high-speed network in each TPU, allowing 64 units to be assembled into a 'TPU Pod': an ML supercomputer with 11.5 petaflops of computational power.
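The quoted figures can be cross-checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes the per-chip number is a peak rating that adds up linearly across chips and modules.

```python
# Figures quoted in the article; names are illustrative, not a Google API.
CHIPS_PER_MODULE = 4       # chips in one Cloud TPU module
TERAFLOPS_PER_CHIP = 45    # quoted per-chip performance
MODULES_PER_POD = 64       # modules assembled into one TPU Pod


def module_teraflops() -> int:
    """Combined performance of one module, assuming chips add up linearly."""
    return CHIPS_PER_MODULE * TERAFLOPS_PER_CHIP


def pod_petaflops() -> float:
    """Combined performance of one pod, converted from teraflops to petaflops."""
    return module_teraflops() * MODULES_PER_POD / 1000


print(module_teraflops())  # 180
print(pod_petaflops())     # 11.52
```

The pod total works out to 11.52 petaflops, which matches the 11.5 petaflops figure after rounding.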
ML, an approach to artificial intelligence that enables machines to learn for themselves from provided data, has witnessed rapid progress in recent times. The process involves first training the underlying ML model and then running the model once it’s trained (inference). While Google's first-generation TPUs were designed only to carry out inference quickly, the newer variant is also geared towards accelerating the training of ML models.
Training an ML model to a high level of accuracy usually takes days or weeks of computation, even on the most advanced GPUs and CPUs currently available. The new Google Cloud TPUs are intended to cut that time substantially.
"One of our new large-scale translation models used to take a full day to train on 32 of the best commercially available GPUs—now it trains to the same accuracy in an afternoon using just one-eighth of a TPU pod," Google stated in a post.
Google deployed its first-generation TPUs for internal purposes only, using them in products such as Search, Translate, speech recognition, and Google Photos. The new TPUs, however, will be made available on Google Cloud Platform, which will enable users to build their ML models on Intel CPUs or Nvidia GPUs and then use Google's Cloud TPUs for the final processing.
To aid this process, Google has announced that it will offer ML researchers free access to a cluster of 1,000 Cloud TPUs through the TensorFlow Research Cloud program. TensorFlow, an open-source ML system designed by Google for conducting research in ML and deep neural networks, can be used to program the Cloud TPUs.