Breakthrough in “Silent Speech” Tech: MIPT Researchers Close to Prototype

Researchers at the Moscow Institute of Physics and Technology (MIPT) have developed a benchmark, a standardized test for comparing computer programs. They used it to compare more than a dozen popular neural networks and classical algorithms at recognizing the gestures of unfamiliar people from electromyography (EMG) signals. The work offers a better understanding of the “body’s voice” for the remote control of equipment, teleoperated robots, and augmented and virtual reality. Naked Science reported the development, citing a paper in the proceedings of the 28th International Conference on Digital Signal Processing and its Applications (DSPA 2026).

Electromyography, specialists explain, is the recording of the electrical activity of muscles as they contract, using sensors and electrodes. For instance, when a person moves their fingers, the forearm muscles generate a signal; sensors pick it up, and a program translates it into commands for a specific device. Such technologies make it possible to control prosthetics, drones, and virtual and augmented reality systems.
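The pipeline described above (record the signal, turn it into numbers a program can compare, map those numbers to a gesture) can be sketched in a few lines. This is a minimal illustration, not the MIPT authors' method: it assumes multi-channel EMG samples stored as plain Python lists, uses root-mean-square (RMS) amplitude per window, a common classical EMG feature, and replaces the neural networks discussed in the article with a simple nearest-centroid classifier. The function names `rms_features`, `fit_centroids` and `predict` are hypothetical.

```python
import math

# Minimal sketch (assumption): RMS features plus a nearest-centroid classifier,
# standing in for the neural networks evaluated in the study.

def rms_features(signal, window):
    """Split a multi-channel recording into non-overlapping windows and return
    the root-mean-square amplitude of each channel in each window.

    signal: list of samples; each sample is a list with one value per channel.
    """
    n_channels = len(signal[0])
    features = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        features.append([
            math.sqrt(sum(sample[c] ** 2 for sample in chunk) / window)
            for c in range(n_channels)
        ])
    return features


def fit_centroids(feature_vectors, labels):
    """Average the feature vectors of each gesture into a single centroid."""
    sums, counts = {}, {}
    for vec, label in zip(feature_vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the gesture whose centroid is nearest (Euclidean distance) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Made-up two-channel data: "fist" is strong on channel 0, "open" on channel 1.
centroids = fit_centroids([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]],
                          ["fist", "fist", "open"])
print(predict(centroids, [0.8, 0.3]))  # prints "fist"
```

In a real system, a recognized label such as "fist" would then be mapped to a device command; that step is left out here.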

The method's main advantage is that the sensors sit directly on the body. Unlike cameras, lidars, eye trackers, and similar technologies, EMG devices work in darkness and are not blocked by clothing or other obstructions. They also offer high resolution, capturing even the slightest muscle movements.

According to the study's authors, neural network algorithms currently play a key role in recognition, turning muscle activity into clear commands. However, the new benchmark shows that today's neural networks struggle to recognize gestures from different people accurately without prior calibration. The limitation stems from each person's unique biological characteristics.

In the experiment, the researchers tested more than a dozen neural network architectures. The models were trained on data from one group of people and then evaluated on subjects they had never seen. None of them reached acceptable recognition accuracy: even the best-performing models achieved only about 35%.
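The key point of this protocol is that the test subjects never appear in the training data. A minimal sketch of such a cross-subject split, often called leave-one-subject-out evaluation, is shown below. It assumes each record is a hypothetical `(subject_id, features, label)` tuple and does not reproduce the benchmark's actual code or data.

```python
# Minimal sketch (assumption): a cross-subject split in which the people used
# for testing never appear in the training data. Records are hypothetical
# (subject_id, features, label) tuples.

def split_by_subject(records, test_subjects):
    """Put every record of the given test subjects in the test set."""
    held_out = set(test_subjects)
    train = [r for r in records if r[0] not in held_out]
    test = [r for r in records if r[0] in held_out]
    return train, test


def leave_one_subject_out(records):
    """Yield (subject, train, test) folds, holding out one person at a time."""
    for subject in sorted({r[0] for r in records}):
        train, test = split_by_subject(records, [subject])
        yield subject, train, test


def accuracy(predictions, labels):
    """Fraction of predicted gestures that match the true gestures."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


records = [
    ("s1", [1.0, 0.1], "fist"),
    ("s1", [0.1, 1.0], "open"),
    ("s2", [0.9, 0.2], "fist"),
    ("s3", [0.2, 0.8], "open"),
]
for subject, train, test in leave_one_subject_out(records):
    # No test subject may leak into the training data.
    assert not {r[0] for r in train} & {r[0] for r in test}
    print(subject, len(train), len(test))
```

In a full evaluation, a model would be trained on each `train` fold and scored with `accuracy` on the matching `test` fold; the article does not say whether the benchmark uses exactly this scheme.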

The main reason is that the signal for the same gesture varies from person to person more than the models can generalize across. For example, in identical tasks, women's forearm muscle activity is 1.3–2.8 times higher than men's, and fatigue lowers recognition accuracy by an average of 7%. Body temperature also affects the signal's spectrum.

The authors conclude that the path to universal EMG devices, which would be in high demand on the mass market, lies in combining pre-trained foundation models with mini-calibration algorithms. Such devices would have an exceptionally broad range of applications. Notably, research is under way on “silent speech” devices, which would record the activity of the articulatory muscles of the face and neck and convert it into spoken words.
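The article does not say how the proposed mini-calibration works. One simple possibility, sketched below purely as an assumption, is to record a few gestures from a new user, rescale each channel by that user's average amplitude (which cancels a constant gain such as the 1.3–2.8× difference between people mentioned above), and then nudge a pre-trained model toward the user's own samples. Here a dictionary of gesture centroids stands in for a foundation model; the function names and the `weight` parameter are hypothetical.

```python
# Minimal sketch (assumption): "mini-calibration" as per-user channel scaling
# plus adapting a pre-trained nearest-centroid model. The centroid dictionary
# stands in for a foundation model; all names here are hypothetical.

def channel_scale(calibration_features):
    """Mean amplitude of each channel over a user's short calibration recording."""
    n = len(calibration_features)
    return [sum(col) / n for col in zip(*calibration_features)]


def normalize(vec, scale):
    """Divide each channel by the user's scale. A constant per-channel gain
    cancels out if every user's calibration covers the same gestures."""
    return [v / s for v, s in zip(vec, scale)]


def adapt_centroids(centroids, user_features, user_labels, weight=0.5):
    """Move each pre-trained centroid part of the way (by `weight`) toward the
    mean of the new user's labelled calibration samples for that gesture."""
    adapted = {}
    for label, centroid in centroids.items():
        own = [f for f, lab in zip(user_features, user_labels) if lab == label]
        if not own:
            adapted[label] = list(centroid)  # no calibration data for this gesture
            continue
        mean = [sum(col) / len(own) for col in zip(*own)]
        adapted[label] = [(1 - weight) * c + weight * m
                          for c, m in zip(centroid, mean)]
    return adapted


# A user whose signals are 2.8 times stronger on every channel ends up with the
# same normalized features as the reference user (up to rounding).
reference = [[1.0, 0.2], [0.2, 1.0]]
stronger = [[2.8 * v for v in vec] for vec in reference]
print(normalize(reference[0], channel_scale(reference)))
print(normalize(stronger[0], channel_scale(stronger)))
```

A uniform gain is the easiest kind of person-to-person difference to correct; effects such as fatigue or temperature-driven changes in the spectrum would not be removed by this scaling alone.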