Channel: Active questions tagged python - Stack Overflow

How to save a stream object in Azure text to speech without speaking the text using Python


I want to convert a book to audio and save the file, so naturally I don't want my computer speaking the book out loud while the conversion happens. Looking at the Azure documentation, I frankly don't see a way to get a stream object without speaking the text first. I've already got the code set up so that I can save the file, but I can't save it unless that audio plays first. I want to convert text to a stream object without having to listen to my computer utter it. I realize a very inelegant workaround is to simply mute my computer, but suppose the conversion takes an hour and I need to take a phone call during it.

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription=subscription_key,
                                       region=service_region)
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
speech_config.speech_synthesis_voice_name = 'ar-EG-SalmaNeural'
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

I don't want to run the following line, because it utters the audio:

result = speech_synthesizer.speak_text_async("I'm excited to try text to speech").get()

But I have to run it in order to get a result for the following steps:

stream = AudioDataStream(result)
stream.save_to_wav_file(path)


I've tried looking at all the methods on the speech_synthesizer object, but all of them involve speaking the text. They are listed here:

class SpeechSynthesizer(builtins.object)
 |  SpeechSynthesizer(speech_config: azure.cognitiveservices.speech.SpeechConfig, audio_config: Optional[azure.cognitiveservices.speech.audio.AudioOutputConfig] = <azure.cognitiveservices.speech.audio.AudioOutputConfig object at 0x137ffc790>, auto_detect_source_language_config: azure.cognitiveservices.speech.languageconfig.AutoDetectSourceLanguageConfig = None)
 |
 |  A speech synthesizer.
 |
 |  :param speech_config: The config for the speech synthesizer
 |  :param audio_config: The config for the audio output.
 |      This parameter is optional.
 |      If it is not provided, the default speaker device will be used for audio output.
 |      If it is None, the output audio will be dropped.
 |      None can be used for scenarios like performance test.
 |  :param auto_detect_source_language_config: The auto detection source language config
 |
 |  Methods defined here:
 |
 |  __del__(self)
 |
 |  __init__(self, speech_config, audio_config=<azure.cognitiveservices.speech.audio.AudioOutputConfig object at 0x137ffc790>, auto_detect_source_language_config=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  get_voices_async(self, locale: str = '') -> azure.cognitiveservices.speech.ResultFuture
 |      Get the available voices, asynchronously.
 |      :param locale: Specify the locale of voices, in BCP-47 format; or leave it empty to get all available voices.
 |      :returns: A task representing the asynchronous operation that gets the voices.
 |
 |  speak_ssml(self, ssml: str) -> azure.cognitiveservices.speech.SpeechSynthesisResult
 |      Performs synthesis on ssml in a blocking (synchronous) mode.
 |      :returns: A SpeechSynthesisResult.
 |
 |  speak_ssml_async(self, ssml: str) -> azure.cognitiveservices.speech.ResultFuture
 |      Performs synthesis on ssml in a non-blocking (asynchronous) mode.
 |      :returns: A future with SpeechSynthesisResult.
 |
 |  speak_text(self, text: str) -> azure.cognitiveservices.speech.SpeechSynthesisResult
 |      Performs synthesis on plain text in a blocking (synchronous) mode.
 |      :returns: A SpeechSynthesisResult.
 |
 |  speak_text_async(self, text: str) -> azure.cognitiveservices.speech.ResultFuture
 |      Performs synthesis on plain text in a non-blocking (asynchronous) mode.
 |      :returns: A future with SpeechSynthesisResult.
 |
 |  start_speaking_ssml(self, ssml: str) -> azure.cognitiveservices.speech.SpeechSynthesisResult
 |      Starts synthesis on ssml in a blocking (synchronous) mode.
 |      :returns: A SpeechSynthesisResult.
 |
 |  start_speaking_ssml_async(self, ssml: str) -> azure.cognitiveservices.speech.ResultFuture
 |      Starts synthesis on ssml in a non-blocking (asynchronous) mode.
 |      :returns: A future with SpeechSynthesisResult.
 |
 |  start_speaking_text(self, text: str) -> azure.cognitiveservices.speech.SpeechSynthesisResult
 |      Starts synthesis on plain text in a blocking (synchronous) mode.
 |      :returns: A SpeechSynthesisResult.
 |
 |  start_speaking_text_async(self, text: str) -> azure.cognitiveservices.speech.ResultFuture
 |      Starts synthesis on plain text in a non-blocking (asynchronous) mode.
 |      :returns: A future with SpeechSynthesisResult.
 |
 |  stop_speaking(self) -> None
 |      Synchronously terminates ongoing synthesis operation.
 |      This method will stop playback and clear unread data in PullAudioOutputStream.
 |
 |  stop_speaking_async(self) -> azure.cognitiveservices.speech.ResultFuture
 |      Asynchronously terminates ongoing synthesis operation.
 |      This method will stop playback and clear unread data in PullAudioOutputStream.
 |      :returns: A future that is fulfilled once synthesis has been stopped.
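One thing I notice in that docstring: for audio_config, "If it is None, the output audio will be dropped." If I'm reading that right, the whole flow I want would look something like the sketch below. This is untested on my end, and the import is deferred inside the function only so the snippet can be read and loaded without the SDK installed:

```python
def synthesize_to_wav(text: str, path: str, key: str, region: str) -> None:
    """Synthesize `text` straight to a WAV file without playing it aloud."""
    # Deferred import so this sketch can be loaded without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = 'ar-EG-SalmaNeural'
    # audio_config=None: per the docstring, "the output audio will be
    # dropped", i.e. nothing should come out of the speakers.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                              audio_config=None)
    result = synthesizer.speak_text_async(text).get()
    # The result should still carry the synthesized audio; save it to disk.
    stream = speechsdk.AudioDataStream(result)
    stream.save_to_wav_file(path)
```

If that docstring means what I think it means, this would synthesize silently and save in one go.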
UPDATE

Someone recommended using a synthesize_speech_to_stream_async method, but their code resulted in errors and I haven't heard back from them. Still, I think they might be on to something.

Their code was:

speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=service_region)
speech_config.speech_synthesis_voice_name = 'ar-EG-SalmaNeural'
stream = speechsdk.AudioDataStream(format=speechsdk.AudioStreamFormat(
    pcm_data_format=speechsdk.PcmDataFormat.Pcm16Bit,
    sample_rate_hertz=16000, channel_count=1))
result = speechsdk.SpeechSynthesizer(speech_config=speech_config).synthesize_speech_to_stream_async(
    "I'm excited to try text to speech", stream).get()
stream.save_to_wav_file(path)

This generated an error at the line:

stream = speechsdk.AudioDataStream(
    format=speechsdk.AudioStreamFormat(
        pcm_data_format=speechsdk.PcmDataFormat.Pcm16Bit,
        sample_rate_hertz=16000, channel_count=1))

and the error message recommended:

stream = speechsdk.AudioDataStream(
    format=speechsdk.AudioStreamWaveFormat(
        pcm_data_format=speechsdk.PcmDataFormat.Pcm16Bit,
        sample_rate_hertz=16000, channel_count=1))

But that generated:

AttributeError: module 'azure.cognitiveservices.speech' has no attribute 'PcmDataFormat'
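As a possible workaround, I'm also considering skipping AudioDataStream and the stream-format classes entirely. As far as I can tell, the SDK's default synthesis output format is RIFF 16 kHz 16-bit mono PCM, which would mean result.audio_data is already a complete WAV file, header included, and a plain binary write is all that's needed. That format detail is an assumption on my part:

```python
def save_audio_bytes(audio_data: bytes, path: str) -> None:
    """Write synthesized audio bytes straight to disk as a .wav file.

    Assumes the SDK's default output format (RIFF 16 kHz 16-bit mono PCM),
    where the bytes already include the RIFF/WAV header, so no
    AudioDataStream or PcmDataFormat machinery would be needed.
    """
    with open(path, "wb") as f:
        f.write(audio_data)
```

So after `result = synthesizer.speak_text_async(text).get()`, it would just be `save_audio_bytes(result.audio_data, path)`.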
