Person-Independent sEMG Gesture Recognition Using LSTM Networks for Human-Computer Interaction


Hand gestures feel natural to perform, which makes them well suited as an interface for Human-Computer Interaction. Detecting them with high accuracy in real time, however, is a challenging task. This paper presents an approach based on the Long Short-Term Memory Neural Network architecture that evaluates Surface Electromyography signals to determine the gesture performed. The approach itself is not new, and it has limited accuracy on people whose data was not part of the training set. This research therefore evaluates a strategy in which the Neural Network’s existing knowledge is adapted to a new person using just a few samples from that person and very little training. This strategy yields accurate results in a form that is usable in a Human-Computer Interface.


Surface Electromyography (sEMG) is a convenient way to gather Electromyography (EMG) data for Human-Computer Interaction (HCI). But because sEMG data is nonuniform and noisy, interpreting it remains a challenging task. Neural Networks (NNs) can be used to detect hand gestures from it. Since the gesture data varies from session to session as well as from person to person, the model has to either generalize very well or be able to specialize in a short period of time. Training a conventional NN on this kind of data requires huge datasets and long training times.

Hyperparameter tuning refines the model by automatically optimizing the parameters that define its architecture. The final base model architecture uses two Long Short-Term Memory (LSTM) cells and a fully connected layer.
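To make the shape of such a network concrete, the following is a minimal forward-pass sketch of two stacked LSTM cells followed by a fully connected layer with a softmax output. It is an illustration, not the model from the paper: the channel count, hidden size, number of gestures, gate ordering, and random weight initialization are all assumptions, and the class names `LSTMCell` and `GestureNet` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class LSTMCell:
    """Standard LSTM cell; gate order (input, forget, candidate, output) is an assumption."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One matrix for all four gates, applied to the concatenation [x_t, h_{t-1}].
        self.W = rand_matrix(4 * n_hidden, n_in + n_hidden, rng)
        self.b = [0.0] * (4 * n_hidden)

    def step(self, x, h, c):
        z = [a + b for a, b in zip(matvec(self.W, x + h), self.b)]
        n = self.n_hidden
        i = [sigmoid(v) for v in z[0:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:4 * n]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


class GestureNet:
    """Two stacked LSTM cells plus one fully connected output layer (sizes are assumed)."""

    def __init__(self, n_channels=8, n_hidden=16, n_gestures=5, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.cells = [LSTMCell(n_channels, n_hidden, rng),
                      LSTMCell(n_hidden, n_hidden, rng)]
        self.W_out = rand_matrix(n_gestures, n_hidden, rng)
        self.b_out = [0.0] * n_gestures

    def forward(self, window):
        """window: list of time steps, each a list of per-channel sEMG values."""
        states = [([0.0] * self.n_hidden, [0.0] * self.n_hidden) for _ in self.cells]
        for sample in window:
            x = list(sample)
            for k, cell in enumerate(self.cells):
                h, c = cell.step(x, *states[k])
                states[k] = (h, c)
                x = h  # output of the first cell feeds the second
        # Classify from the last hidden state of the top LSTM cell.
        logits = [a + b for a, b in zip(matvec(self.W_out, states[-1][0]), self.b_out)]
        return softmax(logits)
```

Calling `GestureNet().forward(window)` on a window of multi-channel samples returns one probability per gesture class. With untrained random weights these probabilities carry no meaning; the sketch only shows how data flows through the layers.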

Datasets with different numbers of subjects were used to determine the learned model’s generalization and specialization accuracy. NNs trained on data from many subjects generalized better.

Transfer Learning (TL) can be used to adapt a pre-trained model to a different session or a new person with very little additional data and training time. It is implemented by adding a few fully connected layers on top of the base architecture. The resulting network is then retrained on a small amount of data from the new subject while the weights of the already trained base layers are locked. The added layers learn how to translate the new data into the data expected by the generalized model.
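The idea of locking the base weights while training only the added layers can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: `FrozenBase` is a hypothetical stand-in for the pre-trained network (a single fixed tanh layer instead of LSTM cells), a single trainable softmax layer stands in for the added fully connected layers, and the learning rate, epoch count, and toy samples are invented for the example.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class FrozenBase:
    """Hypothetical stand-in for the pre-trained base network; its weights are never updated."""

    def __init__(self, n_in, n_out, rng):
        self.W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

    def forward(self, x):
        return [math.tanh(v) for v in matvec(self.W, x)]


class TransferHead:
    """Trainable fully connected layer placed on the output of the frozen base."""

    def __init__(self, n_in, n_classes):
        self.W = [[0.0] * n_in for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def forward(self, feats):
        return softmax([a + b for a, b in zip(matvec(self.W, feats), self.b)])

    def sgd_step(self, feats, label, lr):
        """One gradient step on the cross-entropy loss; returns the loss before the step."""
        probs = self.forward(feats)
        for k in range(len(self.b)):
            grad = probs[k] - (1.0 if k == label else 0.0)
            self.b[k] -= lr * grad
            for j, f in enumerate(feats):
                self.W[k][j] -= lr * grad * f
        return -math.log(probs[label])


def fine_tune(base, head, samples, epochs=100, lr=0.1):
    """Train only the head on a few labeled samples from the new subject."""
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            feats = base.forward(x)  # the base is only evaluated, never updated
            total += head.sgd_step(feats, y, lr)
        losses.append(total / len(samples))
    return losses


if __name__ == "__main__":
    rng = random.Random(0)
    base = FrozenBase(n_in=4, n_out=6, rng=rng)
    head = TransferHead(n_in=6, n_classes=2)
    # Invented toy data: four samples, two gesture classes.
    samples = [([1.0, 0.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0, 0.0], 1),
               ([1.0, 0.1, 0.0, 0.0], 0), ([0.1, 1.0, 0.0, 0.0], 1)]
    losses = fine_tune(base, head, samples)
    print("first epoch loss:", losses[0], "last epoch loss:", losses[-1])
```

Because gradients are computed only for the head, training cost scales with the size of the added layers rather than with the whole network, which is what makes adaptation from a few samples practical. In a deep learning framework the same effect is usually achieved by marking the base layers as non-trainable.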

A real-time HCI scenario requires short training and inference periods. Since the TL setup requires acquiring data and training the transfer layers, which takes about 15 minutes, this approach is meant for long-term use of such an application. Inference takes about 0.2 seconds, which is short enough to perform the prediction in real time.
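One way to check an inference time against a real-time requirement is to classify a sliding window over the incoming signal and time each prediction. The sketch below assumes a hypothetical `predict` callable, an invented window size and hop length, and uses the reported 0.2 seconds as an illustrative latency budget; none of these details are taken from the paper's implementation.

```python
import time
from collections import deque


class StreamingClassifier:
    """Buffers incoming samples and classifies the latest window every `hop` samples."""

    def __init__(self, predict, window_size=40, hop=10, budget_s=0.2):
        self.predict = predict            # hypothetical callable: window -> label
        self.buffer = deque(maxlen=window_size)
        self.hop = hop
        self.budget_s = budget_s          # assumed latency budget in seconds
        self._since_last = 0

    def push(self, sample):
        """Add one multi-channel sample.

        Returns (label, latency_s, within_budget) when a prediction is made,
        otherwise None.
        """
        self.buffer.append(sample)
        self._since_last += 1
        if len(self.buffer) < self.buffer.maxlen or self._since_last < self.hop:
            return None
        self._since_last = 0
        start = time.perf_counter()
        label = self.predict(list(self.buffer))
        latency = time.perf_counter() - start
        return label, latency, latency <= self.budget_s
```

A caller would feed each new sample to `push` and act only on non-`None` results. If `within_budget` is frequently false, predictions fall behind the signal, and a larger hop or a smaller model would be needed.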

The resulting NN was able to detect gestures with high accuracy. These results show that the approach is promising for use in HCI.
