Integrating Google Cloud Text-to-Speech in Jetpack Compose

Chapter 1: Introduction to Google Cloud TTS

Grab your coffee ☕️ and dive into implementing Google Cloud Text-to-Speech (TTS) in Jetpack Compose. This guide will walk you through the setup and coding process.

Set Up Your Environment

Before we start coding, we need to set up a Google Cloud project and activate the Google Text-to-Speech API.

  1. Create a Google Cloud project.
  2. Enable the Google Text-to-Speech API.
  3. Click on "Create Credentials" and fill in the required fields.
  4. This will walk you through creating the service account your application needs to use TTS. Complete the fields as required.

Next, navigate to the "Credentials" section and select the service account you just created. Click on the "Keys" tab and generate a key for the service account. Choose JSON as the key type and click "Create." We will use this JSON file in our application.

Open your Android Studio project and create a new Android Resource Directory under res, setting the resource type to "raw." Drag your key file into this directory; the file name becomes the resource ID, so a file named credentials.json is referenced in code as R.raw.credentials.

Dependencies Required

Next, you’ll need to add the necessary dependencies to your project. Open app/build.gradle.kts and update the dependencies block as follows:

dependencies {
    implementation("com.google.cloud:google-cloud-texttospeech:2.19.0")
    implementation("com.google.auth:google-auth-library-oauth2-http:1.16.0")
    implementation("io.grpc:grpc-okhttp:1.55.1")
}

Additionally, you must exclude a few META-INF resources inside the android block; several of these libraries ship the same metadata files, which otherwise causes duplicate-file build errors:

android {
    // ...
    packaging {
        resources {
            excludes += "/META-INF/{AL2.0,LGPL2.1}"
            excludes += "META-INF/INDEX.LIST"
            excludes += "META-INF/DEPENDENCIES"
        }
    }
}

Creating the TextToSpeech Class

Now, let’s create a TextToSpeech class. Its constructor will take a Context parameter to access the raw folder containing the key.

class TextToSpeech(private val context: Context) {}
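
For reference, these are roughly the imports the finished class will need (based on the dependencies added above; exact packages may vary slightly with the library version, and your IDE will suggest them as you write the code below):

import android.content.Context
import com.google.api.gax.core.FixedCredentialsProvider
import com.google.api.gax.rpc.ApiException
import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.texttospeech.v1.AudioConfig
import com.google.cloud.texttospeech.v1.AudioEncoding
import com.google.cloud.texttospeech.v1.SynthesisInput
import com.google.cloud.texttospeech.v1.TextToSpeechClient
import com.google.cloud.texttospeech.v1.TextToSpeechSettings
import com.google.cloud.texttospeech.v1.VoiceSelectionParams
import io.grpc.StatusRuntimeException
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import java.io.ByteArrayOutputStream
import java.io.InputStream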

Next, we will create the synthesize function, which accepts the text that needs to be converted into speech.

suspend fun synthesize(text: String): ByteArray? = withContext(Dispatchers.IO) {
    try {
        // Code for synthesis
    } catch (e: ApiException) {
        println("API error: ${e.message}")
        throw e
    } catch (e: StatusRuntimeException) {
        println("Error: ${e.message}")
        throw e
    }
}

To start, we need to load the credentials from the key file stored in res/raw.

try {
    val stream: InputStream = context.resources.openRawResource(R.raw.credentials)
    val credentials: GoogleCredentials = GoogleCredentials.fromStream(stream)
    // ...
}

Next, let's configure the settings and create a client.

try {
    val settingsBuilder: TextToSpeechSettings.Builder = TextToSpeechSettings.newBuilder()
    val settings = settingsBuilder
        .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
        .build()
    val client = TextToSpeechClient.create(settings)
    // ...
}

Configuring Voice and Audio Encoding

You can specify the desired voice and audio encoding type as follows:

try {
    val voiceBuilder = VoiceSelectionParams.newBuilder()
        .setName("en-US-Studio-M")
        .setLanguageCode("en-US")
    val audioConfig = AudioConfig.newBuilder()
        .setAudioEncoding(AudioEncoding.MP3)
        .build()
    // ...
}
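
If you are unsure which voice names are available, the same client can list the voices for a language code. Here is a small sketch; it assumes you already have the client instance created above:

val voices = client.listVoices("en-US")
for (voice in voices.voicesList) {
    // Prints entries such as "en-US-Studio-M (MALE)"
    println("${voice.name} (${voice.ssmlGender})")
}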

To handle large text inputs, we’ll create a method called splitText to divide the text into manageable segments.

private val maxTextLength = 500 // Max length for each synthesis input

private fun splitText(text: String): List<String> {
    val inputTexts = mutableListOf<String>()
    var startIndex = 0
    var endIndex = maxTextLength
    while (startIndex < text.length) {
        if (endIndex >= text.length) {
            endIndex = text.length
        } else {
            // Move the cut point back to the nearest whitespace so words are not split
            while (endIndex > startIndex && !text[endIndex].isWhitespace()) {
                endIndex--
            }
        }
        val inputText = text.substring(startIndex, endIndex)
        inputTexts.add(inputText.trim())
        startIndex = endIndex + 1
        endIndex = startIndex + maxTextLength
    }
    return inputTexts
}
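
As a quick illustration of what splitText produces, consider the hypothetical snippet below (splitText is private, so imagine calling it from inside the class):

// Hypothetical example: a long string of repeated sentences
val longText = "The quick brown fox jumps over the lazy dog. ".repeat(30) // ~1,350 characters
val chunks = splitText(longText)
// Each chunk is at most 500 characters and ends on a word boundary,
// so chunks.size is 3 here and no word is cut in half.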

Continuing with the Synthesize Function

Let’s now integrate the splitText function into the synthesize method to process the text segments.

try {
    val inputTexts = splitText(text)
    val audioResults = mutableListOf<ByteArray>()
    for (inputText in inputTexts) {
        val input: SynthesisInput = SynthesisInput.newBuilder()
            .setText(inputText)
            .build()
        val response = client.synthesizeSpeech(input, voiceBuilder.build(), audioConfig)
        audioResults.add(response.audioContent.toByteArray())
    }
    // ...
}

Finally, we’ll combine the audio segments into a single byte array and return it.

try {
    val byteArrayOutputStream = ByteArrayOutputStream()
    for (audioResult in audioResults) {
        byteArrayOutputStream.write(audioResult)
    }
    return@withContext byteArrayOutputStream.toByteArray()
}
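
Putting the fragments together, the full synthesize function looks roughly like this. It is a sketch assembled from the pieces above; the only addition is closing the client once all requests are done:

suspend fun synthesize(text: String): ByteArray? = withContext(Dispatchers.IO) {
    try {
        // Load the service account key from res/raw
        val stream: InputStream = context.resources.openRawResource(R.raw.credentials)
        val credentials: GoogleCredentials = GoogleCredentials.fromStream(stream)

        // Build the client with the credentials
        val settings = TextToSpeechSettings.newBuilder()
            .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
            .build()
        val client = TextToSpeechClient.create(settings)

        // Voice and audio encoding
        val voiceBuilder = VoiceSelectionParams.newBuilder()
            .setName("en-US-Studio-M")
            .setLanguageCode("en-US")
        val audioConfig = AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3)
            .build()

        // Synthesize each chunk and collect the MP3 bytes
        val audioResults = mutableListOf<ByteArray>()
        for (inputText in splitText(text)) {
            val input = SynthesisInput.newBuilder()
                .setText(inputText)
                .build()
            val response = client.synthesizeSpeech(input, voiceBuilder.build(), audioConfig)
            audioResults.add(response.audioContent.toByteArray())
        }
        client.close() // release the gRPC channel once all requests are done

        // Concatenate the segments into one byte array
        val byteArrayOutputStream = ByteArrayOutputStream()
        for (audioResult in audioResults) {
            byteArrayOutputStream.write(audioResult)
        }
        return@withContext byteArrayOutputStream.toByteArray()
    } catch (e: ApiException) {
        println("API error: ${e.message}")
        throw e
    } catch (e: StatusRuntimeException) {
        println("Error: ${e.message}")
        throw e
    }
}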

Creating the User Interface

Now, let’s create a composable function to set up the user interface. This function will also manage the playback of the audio.

@Composable
fun MyScreen() {
    val context = LocalContext.current
    var mediaPlayer by remember {
        mutableStateOf<MediaPlayer?>(null)
    }
    val coroutineScope = rememberCoroutineScope()
    Row(
        modifier = Modifier.fillMaxSize(),
        verticalAlignment = Alignment.CenterVertically,
        horizontalArrangement = Arrangement.Center
    ) {
        Text(text = "Hey, my name is Daniel!")
        IconButton(
            onClick = {
                coroutineScope.launch {
                    val audioTask = async {
                        TextToSpeech(context = context)
                            .synthesize("Hey, my name is Daniel!")
                    }
                    val audio = audioTask.await()
                    // Code to save audio and play
                }
            }
        ) {
            Icon(
                imageVector = Icons.Rounded.VolumeUp,
                contentDescription = "Volume icon"
            )
        }
    }
}

In the onClick function, we launch a coroutine to synthesize the text.

Saving and Playing the Audio

Once we have the audio, we will write it to a file that the MediaPlayer can use.

coroutineScope.launch {
    // synthesize() is declared as ByteArray?, so bail out early if it returned null
    val bytes = audio ?: return@launch
    val outputFile = File(context.getExternalFilesDir(null), "output.mp3")
    FileOutputStream(outputFile).use { outputStream ->
        outputStream.write(bytes)
    }
    mediaPlayer = MediaPlayer.create(
        context,
        Uri.fromFile(outputFile)
    )
    mediaPlayer?.start()
}
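
One thing the snippet above does not handle is releasing the player. MediaPlayer holds native resources, so a reasonable addition (a sketch, assuming the mediaPlayer state from MyScreen) is to release it when the composable leaves the composition:

// Inside MyScreen: release the player when the composable leaves the composition
DisposableEffect(Unit) {
    onDispose {
        mediaPlayer?.release()
        mediaPlayer = null
    }
}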

Conclusion

I hope this guide has aided your development journey. Stay connected for more updates and subscribe to my newsletter for the latest content. Thank you for reading!

For more insights on Android Development, check out my YouTube channel where I share valuable tutorials. If you find my work helpful, consider buying me a coffee! ☕️

Chapter 2: Useful Resources

This video provides a comprehensive guide on using Speech to Text with Jetpack Compose.

In this video, learn how to build a Text To Speech app using Jetpack Compose.
