Aditya Khosla, an Indian-origin scientist, and several colleagues have developed new artificial intelligence software that can turn any smartphone into an eye-tracking device.
Eye-tracking technology, which determines where people are directing their gaze in a visual scene, has been widely used in psychological experiments and marketing research, but because the required hardware is pricey, it has not found its way into consumer applications. In addition to making existing applications of eye-tracking technology more accessible, the system developed by researchers at the Massachusetts Institute of Technology (MIT) and the University of Georgia may enable new computer interfaces or help detect signs of incipient neurological disease or mental illness.
“Since few people have the external devices, there is no big incentive to develop applications for them. Since there are no applications, there’s no incentive for people to buy the devices. We thought we should break this circle and try to make an eye tracker that works on a single mobile device, using just your front-facing camera,” said Khosla, who is an MIT graduate student.
The researchers built their eye tracker using machine learning, a technique in which computers learn to perform tasks by looking for patterns in large sets of training examples. Their training set includes examples of gaze patterns from 1,500 mobile-device users, Khosla said. Previously, the largest data sets used to train experimental eye-tracking systems had topped out at about 50 users.
The researchers report an initial round of experiments that used training data drawn from 800 mobile-device users.
On that basis, they were able to get the system’s margin of error down to 1.5 centimetres, a twofold improvement over previous experimental systems. They later acquired data on another 700 people, and the additional training data has reduced the margin of error to about a centimetre.
To get a sense of how larger training sets might improve performance, the researchers trained and retrained their system using different-sized subsets of their data. Those experiments suggest that about 10,000 training examples should be enough to lower the margin of error to a half-centimetre, which Khosla estimates will be good enough to make the system commercially viable.
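The subset experiments described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the toy data generator, the one-parameter model, and the power-law form used for extrapolation are all assumptions made here, and every function name is hypothetical.

```python
import math
import random


def learning_curve(examples, holdout, sizes, fit, error, seed=0):
    """Train on a random subset of each size and record the held-out error."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        subset = rng.sample(examples, n)
        curve.append((n, error(fit(subset), holdout)))
    return curve


def fit_power_law(curve):
    """Least-squares fit of error = a * n**(-b), done in log-log space.

    The power-law shape is an assumption of this sketch.
    """
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(e) for _, e in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


def examples_needed(a, b, target_error):
    """Invert error = a * n**(-b) to estimate n for a target error."""
    return (a / target_error) ** (1.0 / b)


# Toy stand-ins (hypothetical): one feature x, gaze position 10 * x cm.
def make_examples(count, rng, noise=1.0):
    return [(x, 10 * x + rng.gauss(0, noise))
            for x in (rng.random() for _ in range(count))]


def fit_slope(subset):
    """Least-squares slope through the origin."""
    return sum(x * y for x, y in subset) / sum(x * x for x, _ in subset)


def mean_abs_error(slope, holdout):
    return sum(abs(slope * x - y) for x, y in holdout) / len(holdout)


if __name__ == "__main__":
    rng = random.Random(42)
    train = make_examples(2000, rng)
    test = make_examples(500, rng, noise=0.0)  # noiseless targets for scoring
    for n, err in learning_curve(train, test, [50, 200, 800, 2000],
                                 fit_slope, mean_abs_error):
        print(f"{n:5d} examples -> mean error {err:.3f} cm")
```

A curve measured this way is noisy, so any extrapolation from it, such as feeding the fitted `a` and `b` into `examples_needed`, is a rough estimate rather than a guarantee.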
To collect their training examples, the researchers developed a simple application for smartphone devices. The application flashes a small dot somewhere on the device’s screen, attracting the user’s attention, then briefly replaces it with either an “R” or an “L,” instructing the user to tap either the right or left side of the screen. Correctly executing the tap ensures that the user has actually shifted his or her gaze to the intended location. During this process, the device camera continuously captures images of the user’s face. The data set contains, on average, 1,600 images for each user.
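The logic of one collection trial can be sketched roughly as follows. This is a simplified model written for illustration, not the researchers' application: the `Trial` class, the normalised screen coordinates, and the choice to discard frames from trials with an incorrect tap are all assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trial:
    dot_xy: tuple  # dot position in normalised screen coordinates (0..1)
    letter: str    # "R" or "L", shown briefly in place of the dot
    frames: list = field(default_factory=list)  # camera images captured


def new_trial(rng):
    """Pick a random dot position and a random tap instruction."""
    return Trial(dot_xy=(rng.random(), rng.random()), letter=rng.choice("RL"))


def tap_is_correct(trial, tap_x):
    """A tap counts as right-side if it lands in the right half of the screen."""
    side = "R" if tap_x >= 0.5 else "L"
    return side == trial.letter


def labelled_examples(trial, tap_x):
    """Pair each captured frame with the dot position as its gaze label.

    Assumption: frames from a trial with a wrong tap are dropped, since the
    tap is the only evidence that the user looked at the dot.
    """
    if not tap_is_correct(trial, tap_x):
        return []
    return [(frame, trial.dot_xy) for frame in trial.frames]


if __name__ == "__main__":
    rng = random.Random(0)
    trial = new_trial(rng)
    trial.frames = ["frame_0", "frame_1"]  # placeholders for camera images
    tap_x = 0.9 if trial.letter == "R" else 0.1
    print(labelled_examples(trial, tap_x))
```

The point of the design is that the label (the dot position) costs nothing to produce, while the tap check filters out moments when the user was probably not looking at the dot.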