
Google has officially added speech generation to the API of its Gemini AI platform, the company's press service reports. Developers can now convert the model's text responses into natural-sounding audio through a single service, without external libraries.
The new feature supports many languages and accents. Voice attributes such as style, pitch, pace, and expressiveness can be tailored to specific use cases. For example, an energetic voice might suit a navigation assistant, while a calmer one fits an educational app.
“Audio generation is handled through standard REST calls to the Gemini API. The developer sends the text and the desired voice parameters, and the service returns the finished audio file,” Google’s technical documentation states.
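As a rough illustration of the request/response flow described above, here is a minimal Python sketch using only the standard library. The endpoint, the model name `gemini-2.5-flash-preview-tts`, the voice name `Kore`, the JSON field names, and the assumption that audio comes back as base64-encoded 24 kHz 16-bit mono PCM (rather than a ready-made file) reflect our reading of Google's public Gemini API documentation and should be checked against its current version. The helpers `build_tts_request` and `pcm_to_wav` are hypothetical names for this example, and the sketch makes no network call.

```python
import base64
import io
import json
import wave
from typing import Optional

# Assumed endpoint and model name; verify against Google's current Gemini API docs.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
           "gemini-2.5-flash-preview-tts:generateContent")


def build_tts_request(text: str, voice_name: str = "Kore",
                      style: Optional[str] = None) -> dict:
    """Build a JSON body asking the model to respond with audio instead of text."""
    # Assumption: speaking style ("calmly", "cheerfully", ...) is given as a
    # natural-language instruction inside the prompt itself.
    prompt = f"Say {style}: {text}" if style else text
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }


def pcm_to_wav(b64_pcm: str, sample_rate: int = 24000) -> bytes:
    """Wrap base64-encoded 16-bit mono PCM (assumed response format) in a WAV container."""
    pcm = base64.b64decode(b64_pcm)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    body = build_tts_request("Turn left in 200 meters.", style="cheerfully")
    print(json.dumps(body, indent=2))
    # Sending: HTTP POST of this body to API_URL with an "x-goog-api-key" header.
    # Assumed response location of the audio:
    #   candidates[0].content.parts[0].inlineData.data  (base64 PCM)
```

The request builder and the WAV conversion are kept separate so the body can be inspected or logged before any call is made, and so the audio handling can be swapped out if the service's output format differs from what this sketch assumes.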
This simplifies integration, since developers no longer need to configure speech engines by hand. Google notes that the synthesis aims to sound natural, but pronunciation errors may occur with highly specialized vocabulary. For such cases, the API provides tools for adjusting phonetics.
Earlier, “Zhukovsky.Life” reported that Google had denied plans to introduce advertising into the Gemini chatbot. Daniel Taylor, Google’s Vice President of Global Advertising, called the media reports inaccurate and based on uninformed sources, and stated that there are currently no plans to place ads in the app.