
A team of scientists from the National Research Nuclear University “MEPhI,” the Kurchatov Institute National Research Centre, and Voronezh State University has developed a method that teaches computers to determine an author's gender from their written text with up to 80 percent accuracy, RIA Novosti reports. The research was supported by a grant from the Russian Science Foundation.
Numerous recent studies confirm that written language always carries traces of its author's characteristics, such as gender, psychological traits, and level of education. Speech analysis is therefore a valuable psychodiagnostic tool, used both by HR departments at large companies and by security service specialists.
Speech patterns can also reveal illnesses such as dementia or depression, as well as suicidal tendencies. The need to identify the characteristics of text authors is growing with the spread of online communication, as companies want to know exactly which demographic groups are interested in their products and services.
Researchers in this field, including linguists, psychologists, and IT specialists, build mathematical models that infer various personality parameters from large sets of numerical values derived from features of the text.
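As a rough illustration of what numerical values derived from text features can look like, the Python sketch below maps a text to a handful of stylometric measurements: average word length, type-token ratio (distinct words divided by total words), punctuation marks per word, and the relative frequencies of a few function words. The feature set and the `FUNCTION_WORDS` list are hypothetical choices made for this example; the article does not say which features the team used.

```python
import re
from collections import Counter

# Hypothetical function-word list for illustration only; real studies use
# much larger, language-specific lists.
FUNCTION_WORDS = ("i", "me", "my", "we", "the", "a", "and", "but", "not", "so")


def text_features(text: str) -> dict[str, float]:
    """Map a text to a few numerical stylometric features (illustrative sketch)."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    if n_words == 0:
        # Empty input: return all-zero features so the vector shape stays fixed.
        return {"avg_word_len": 0.0, "type_token_ratio": 0.0, "punct_per_word": 0.0,
                **{f"fw_{w}": 0.0 for w in FUNCTION_WORDS}}
    counts = Counter(words)
    punct = sum(1 for ch in text if ch in ".,!?;:")
    features = {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(counts) / n_words,  # vocabulary richness
        "punct_per_word": punct / n_words,
    }
    # Relative frequency of each function word in the text.
    for w in FUNCTION_WORDS:
        features[f"fw_{w}"] = counts[w] / n_words
    return features


if __name__ == "__main__":
    print(text_features("I love the sea. I do!"))
```

For the sample sentence, the tokenizer finds six words, five of them distinct, so `type_token_ratio` is 5/6. A model would then learn how such vectors relate to author attributes.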
The team evaluated how well various machine learning algorithms, including neural networks, perform at text analysis.
In the study, they compared the accuracy of gender identification using two kinds of models: traditional machine learning algorithms and deep learning neural networks. At first, the authors explain, they only tested texts whose authors were not trying to conceal their gender.
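For comparison, a traditional machine learning baseline for author profiling can be as simple as a multinomial naive Bayes classifier over word counts. The sketch below shows one such classic method, not necessarily one the team tested; the training sentences and the labels `group_a` and `group_b` are toy placeholders standing in for author groups, not data from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior of the class
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][w]
                # Smoothed log likelihood of the word given the class.
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy training data; the labels are placeholders, not real author groups.
    texts = ["the match was great", "great goal in the match",
             "lovely dinner with friends", "dinner and friends tonight"]
    labels = ["group_a", "group_a", "group_b", "group_b"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("what a match"))
    print(model.predict("friends for dinner"))
```

Add-one smoothing keeps a word that never appeared in one class from zeroing out that class's score, and summing logarithms instead of multiplying probabilities avoids numeric underflow on long texts.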
They then made the task harder. On texts taken from a dating website, the neural network detected deception with 100 percent accuracy, even when authors deliberately signed their profiles with a name typical of the opposite gender.
The study showed that methods based on convolutional neural networks and deep learning are the most effective for determining an author's gender. The team is now moving on to a new task: recognizing an author's age.
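Convolutional networks for text commonly represent each word as a vector, slide filters over windows of consecutive words, and keep the strongest response of each filter (max-over-time pooling) before a final classification layer. The plain-Python sketch below shows only that forward pass, with hand-picked toy embeddings and weights; the article does not describe the team's actual architecture, embeddings, or training, and a real model would learn all of these parameters.

```python
import math


def conv1d_max_pool(embeddings, filters):
    """Slide each filter over windows of consecutive word vectors, apply ReLU, keep the max.

    embeddings: list of n word vectors, each of dimension d.
    filters: list of filters; each filter is a list of k weight vectors of dimension d.
    Returns one pooled feature per filter (0.0 if the text is shorter than the filter).
    """
    pooled = []
    for f in filters:
        k = len(f)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(embeddings) - k + 1):
            # Dot product of the filter with the window of k word vectors.
            activation = sum(
                w * x
                for weights, vec in zip(f, embeddings[start:start + k])
                for w, x in zip(weights, vec)
            )
            best = max(best, activation)  # ReLU followed by max pooling
        pooled.append(best)
    return pooled


def classify(pooled, weights, bias):
    """Logistic output layer: probability of one of two classes."""
    z = sum(p * w for p, w in zip(pooled, weights)) + bias
    return 1 / (1 + math.exp(-z))


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for a four-word text.
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    # One filter spanning two words, one spanning a single word.
    filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, -1.0]]]
    pooled = conv1d_max_pool(emb, filters)
    print(pooled, classify(pooled, [1.0, 1.0], -2.0))
```

Filters of different widths act like learned detectors for short word patterns, and max pooling makes each detector's output independent of where in the text the pattern occurs.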