We are making on-device AI ubiquitous
We envision a world where devices, machines, automobiles, and things are much more intelligent, simplifying and enriching our daily lives. They will be able to perceive, reason, and take intuitive actions based on awareness of the situation, improving just about any experience and solving problems that, until now, we've left either to the user or to more conventional algorithms.
Artificial intelligence (AI) is the technology driving this revolution. You may have heard this vision before, or you may think that AI is really about big data and the cloud. Yet Qualcomm's solutions already have the power, thermal, and processing efficiency to run powerful AI algorithms on the device itself, which brings several advantages.
AI is a pervasive trend that is rapidly accelerating thanks to vast amounts of data and progress in both algorithms and the processing capacity of modern devices. New technology can seem to appear out of nowhere, but oftentimes researchers and engineers have been toiling over it for many years before the timing is right and key progress is made.
At Qualcomm, we have a culture of innovation. We take pride in researching and developing fundamental technologies that will change the world at scale. AI is no different. We started fundamental research more than 10 years ago, and our current products now support many AI use cases on both smartphones and automobiles, such as computer vision, natural language processing, and malware detection. We are also researching broader topics, such as AI for wireless connectivity, power management, and photography.
We have a long history of investing in machine learning. In 2007, we started exploring spiking neuron approaches to machine learning for computer vision and motion control applications. We then expanded the scope of the research to look not just at biologically inspired approaches but also at artificial neural networks, primarily deep learning, which is a sub-category of machine learning. Time and time again, we saw deep learning-based networks demonstrate state-of-the-art results in pattern-matching tasks. A notable example was in 2012, when AlexNet won the prestigious ImageNet Challenge using deep learning techniques rather than traditional hand-crafted computer vision. We've also had our own success at the ImageNet Challenge using deep learning techniques, placing as a top-3 performer in challenges for object localization, object detection, and scene classification.
We have also expanded our own research, and collaborated with the external AI community, in other promising areas and applications of machine learning, like recurrent neural networks, object tracking, natural language processing, and handwriting recognition. In September 2014, we opened Qualcomm Research Netherlands in Amsterdam, a hotbed for machine learning research, and we have continued to work closely with Ph.D. students pursuing forward-thinking ideas through our Qualcomm Innovation Fellowship program. In September 2015, we established a joint research lab with the University of Amsterdam (QUVA) focused on advancing the state of the art in machine learning techniques for mobile computer vision. We further deepened our ties to Amsterdam's AI scene by acquiring Scyfer, a leading AI company there that applies a wide range of machine learning approaches to real-world problems. Max Welling, a Scyfer founder, is a renowned professor at the University of Amsterdam who has focused on machine learning, computational statistics, and fundamental AI research. The Scyfer team will join Qualcomm Research's machine learning group.
To make our vision of intelligent devices possible, we also knew that the machine learning-based solutions would need to run on the device — whether a smartphone, car, robot, drone, machine, or other thing. Running the AI algorithms — also known as inference — on the device versus in the cloud has various benefits such as immediate response, enhanced reliability, increased privacy, and efficient use of network bandwidth.
The cloud, of course, remains very important and will complement on-device processing. Today, the cloud is where big data is pooled and where training happens, producing many of the AI inference algorithms that then run on the device. However, inference that runs entirely in the cloud is problematic for real-time applications that are latency-sensitive and mission-critical, like autonomous driving. Such applications can afford neither the round-trip time nor the risk of critical functions failing under variable wireless coverage. Further, on-device inference is inherently more private.
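As a rough illustration of the round-trip problem, here is some back-of-the-envelope arithmetic with assumed, round numbers rather than measured figures: how far does a vehicle travel while waiting for a cloud round trip versus an on-device result?

```python
# Back-of-the-envelope latency budget for a latency-critical task.
# All numbers are illustrative assumptions, not measurements.
speed_kmh = 100.0                # assumed vehicle speed
speed_mps = speed_kmh / 3.6      # ~27.8 meters per second

cloud_round_trip_s = 0.100       # assumed 100 ms network round trip
on_device_latency_s = 0.020      # assumed 20 ms local inference

print(f"cloud:     {speed_mps * cloud_round_trip_s:.1f} m traveled")   # ~2.8 m
print(f"on-device: {speed_mps * on_device_latency_s:.1f} m traveled")  # ~0.6 m
```

Several meters of blind travel per decision, over a link that may degrade at any moment, is a margin that a safety-critical function cannot accept.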
We are not limiting ourselves to only running inference on the device. We are also researching training AI on the device for targeted use cases such as gesture recognition, continuous authentication, personalized user interfaces, and precise mapping for autonomous driving, working in synergy with the cloud. In fact, we have a unique ability to explore future architectures that benefit from both high-speed connectivity and high-performance local processing, resulting in the best overall system performance.
For over a decade, Qualcomm has been focused on efficient processing of diverse compute workloads within the power, thermal, and size constraints of mobile devices. Qualcomm Snapdragon Mobile Platforms have been the SoCs of choice for the highest-performance mobile devices, and AI workloads present another challenge in this regard. By running each machine learning task on the most appropriate compute engine already in our SoC, such as the CPU, GPU, or DSP, we offer a highly efficient solution. A key example is the Qualcomm Hexagon DSP, which was originally designed for other vector math-intensive workloads but is being further enhanced to address AI workloads. In fact, the Hexagon DSP with Qualcomm Hexagon Vector eXtensions on Snapdragon 835 has been shown to offer a 25X improvement in energy efficiency and an 8X improvement in performance when running the same workload (the GoogleNet Inception network) compared with the Qualcomm Kryo CPU.
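Those two ratios together say something useful about power draw as well. Since energy per inference is power divided by throughput, an 8X throughput gain combined with a 25X energy-efficiency gain implies the DSP draws roughly a third of the CPU's power on this workload. A quick sanity check (the ratios come from the comparison above; the derivation is our own):

```python
# Reading the published Snapdragon 835 comparison (Inception on the
# Hexagon DSP vs. the Kryo CPU). Derivation assumes
# energy per inference = power / throughput.
perf_ratio = 8.0      # DSP throughput relative to CPU
energy_ratio = 25.0   # CPU energy per inference relative to DSP

# energy_ratio = (power_cpu / power_dsp) * perf_ratio
power_ratio = perf_ratio / energy_ratio   # power_dsp / power_cpu
print(f"DSP power draw relative to CPU: {power_ratio:.2f}x")  # 0.32x
```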
This diversity in architecture is essential; no single type of engine suits all workloads. We will continue to evolve our existing engines for machine learning workloads to maintain our lead in performance per watt, and, leveraging our research into emerging neural networks, we are well positioned to extend our heterogeneous computing capabilities to future AI workloads. In fact, we envisioned dedicated hardware for running AI efficiently back in 2012.
Making heterogeneous computing easy for developers is hard; it is not enough just to have great hardware. To bridge that gap, we have introduced the Snapdragon Neural Processing Engine (NPE) Software Development Kit (SDK). It features an accelerated runtime for on-device execution of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which are well suited to tasks like image recognition and natural language processing, respectively, on the appropriate Snapdragon engines, such as the Kryo CPU, Qualcomm Adreno GPU, and Hexagon DSP. A single developer API provides access to each of these engines, so developers can move their AI tasks from one to another seamlessly.
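To make the "single API, many engines" idea concrete, here is a minimal sketch of the pattern in Python. The class and function names are illustrative stand-ins of our own, not the NPE SDK's actual interfaces (on device, the SDK is exposed through native and Java APIs): state a preference for the most efficient engine and fall back gracefully, so the same application code runs across Snapdragon tiers.

```python
# A minimal sketch of the "single API, many engines" pattern.
# These names are illustrative stand-ins, NOT the NPE SDK's actual
# classes; they only demonstrate the dispatch-and-fallback idea.

class NeuralNetwork:
    """Stand-in for a network handle bound to one compute engine."""
    def __init__(self, model_path, engine):
        self.model_path = model_path
        self.engine = engine  # which Snapdragon engine runs inference

    def execute(self, inputs):
        # A real runtime would dispatch the graph to the chosen engine.
        print(f"running {self.model_path} on {self.engine}")


# Preference order: most power-efficient engine first.
PREFERRED_ENGINES = ("DSP", "GPU", "CPU")

def build_network(model_path, available_engines):
    """Bind the model to the best engine this device supports."""
    for engine in PREFERRED_ENGINES:
        if engine in available_engines:
            return NeuralNetwork(model_path, engine)
    raise RuntimeError("no supported compute engine available")

# On a device without a suitable DSP, the same code falls back to the GPU.
net = build_network("model.dlc", available_engines={"GPU", "CPU"})
net.execute(inputs=None)  # -> running model.dlc on GPU
```

The fallback order is the design point: the application states its preference once, and the runtime, not the app, absorbs the differences between device tiers.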
The Neural Processing Engine also supports common deep learning model frameworks, such as Caffe/Caffe2 and TensorFlow. The SDK is a lightweight, flexible platform designed to deliver optimal performance and power consumption by leveraging Snapdragon technology, and it is designed to enable developers and OEMs in a broad range of industries, from health care to security, to run their own proprietary neural network models on portable devices. As an example, at this year's F8 conference, Facebook and Qualcomm Technologies announced a collaboration to optimize Caffe2, Facebook's open source deep learning framework, for the NPE.
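As a sketch of the developer-side workflow for a TensorFlow model, here is a toy TensorFlow 1.x graph of our own being frozen into a self-contained file; the offline conversion into the NPE's on-device format is left as a comment, since converter tool names and flags vary by SDK version.

```python
# Define and freeze a tiny CNN with TensorFlow 1.x so it can be handed
# to the NPE SDK's offline model converter. The network itself is a toy
# chosen for brevity, not a model shipped with the SDK.
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # A single 224x224 RGB input, the shape used by many ImageNet CNNs.
    images = tf.placeholder(tf.float32, [1, 224, 224, 3], name="input")
    conv = tf.layers.conv2d(images, filters=16, kernel_size=3,
                            activation=tf.nn.relu)
    pool = tf.layers.max_pooling2d(conv, pool_size=2, strides=2)
    logits = tf.layers.dense(tf.layers.flatten(pool), units=10)
    probs = tf.nn.softmax(logits, name="output")

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    # Freeze variables into constants so the graph is self-contained.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, graph.as_graph_def(), ["output"])
    tf.train.write_graph(frozen, ".", "tiny_cnn.pb", as_text=False)

# tiny_cnn.pb would then be converted offline with the SDK's TensorFlow
# converter tool into the runtime's on-device format.
```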
We are in the early days of the machine learning journey, and deep learning is just one of many machine learning technologies with the potential to transform computing.
Authored by Matt Grob