Gladia API using asyncio and aiohttp with Python

Whisper
Transcription
asyncio
aiohttp
Author

Francisco Mussari

Published

March 19, 2024

Introduction

Discovering Gladia

Gladia offers a “Speech-to-Text” service powered by its Whisper-Zero ASR technology. You can try it for free with a generous 10 hours of audio transcription per month.

Usage and Documentation

While Gladia’s documentation and API reference offer Python examples for both pre-recorded and live (streaming) scenarios, a crucial element remains unaddressed: integration with Python’s asyncio.

Asyncio with Gladia

This post shows how to use asyncio with the Gladia API so that an application can run several I/O-bound tasks concurrently.

We’ll walk through the transcription process, which involves several I/O-bound tasks:

  1. Uploading audio files
  2. Initiating the transcription job
  3. Awaiting completion of the transcription
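All three steps spend most of their time waiting on the network, which is where asyncio helps. The sketch below is not Gladia code: it uses `asyncio.sleep()` as a stand-in for network I/O, and `fake_io` and the delay values are hypothetical, chosen only to show how the waits overlap.

```python
import asyncio
from time import perf_counter


async def fake_io(name: str, delay: float) -> str:
    """Stand-in for a network call; asyncio.sleep() yields control while waiting."""
    await asyncio.sleep(delay)
    return name


async def run_sequentially(delays: dict) -> float:
    """Await each task one after another, so the waits add up."""
    start = perf_counter()
    for name, delay in delays.items():
        await fake_io(name, delay)
    return perf_counter() - start


async def run_concurrently(delays: dict) -> float:
    """Start all tasks at once, so the waits overlap."""
    start = perf_counter()
    await asyncio.gather(*(fake_io(n, d) for n, d in delays.items()))
    return perf_counter() - start


# Hypothetical delays in seconds, not measured Gladia timings
delays = {"a.mp3": 0.2, "b.webm": 0.3, "c.webm": 0.1}
sequential_time = asyncio.run(run_sequentially(delays))
concurrent_time = asyncio.run(run_concurrently(delays))
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With these values the concurrent run should take roughly as long as the longest delay, while the sequential run takes roughly the sum of all delays.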

Configuring your account

You can go to the Getting Started section in the documentation to configure your account and get an API key.

Resources

Asyncio with Gladia: A Step-by-Step Guide

Import Libraries

import asyncio
import aiohttp
import requests
import os
import sys

# These two lines allow asyncio to be used in Jupyter Notebooks
import nest_asyncio
nest_asyncio.apply()

from pathlib import Path
from typing import List, Optional
from time import perf_counter

Get API Key

if "google.colab" in sys.modules:
    # If running in Colab
    from google.colab import userdata
    x_gladia_key = userdata.get('GLADIA_API_KEY')
else:
    from dotenv import load_dotenv
    load_dotenv("creds/.env")
    x_gladia_key = os.environ.get('GLADIA_API_KEY')
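`os.environ.get()` returns `None` when the variable is missing, and that only surfaces later as a failed request. One possible guard is sketched below. `require_api_key` is a hypothetical helper, not part of the original notebook, and the demo variable name and value are placeholders.

```python
import os


def require_api_key(env_var: str = "GLADIA_API_KEY") -> str:
    """Return the key stored in `env_var`, or raise if it is missing or empty."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; add it to your environment or .env file."
        )
    return key


# Demonstration with a placeholder variable and value (not a real key)
os.environ["GLADIA_API_KEY_DEMO"] = "placeholder-key"
print(require_api_key("GLADIA_API_KEY_DEMO"))
```

Failing early like this makes a missing credential obvious before any upload is attempted.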

Context manager to measure time

# https://stackoverflow.com/a/69156219

class catchtime:
    def __enter__(self):
        self.start = perf_counter()
        return self

    def __exit__(self, type, value, traceback):
        self.seconds = round(perf_counter() - self.start, 2)
        m, s = divmod(self.seconds, 60)
        self.m, self.s = int(m), round(s, 1)
        self.readout = f'Time: {self.seconds:.3f} seconds'
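A short usage sketch for the context manager follows. The class is repeated in trimmed form (only `seconds` is kept) so the snippet runs on its own, and `time.sleep()` stands in for the work being measured.

```python
import time
from time import perf_counter


class catchtime:
    """Trimmed copy of the timing context manager, kept so this snippet runs alone."""

    def __enter__(self):
        self.start = perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.seconds = round(perf_counter() - self.start, 2)


with catchtime() as t:
    time.sleep(0.1)  # stand-in for the work being measured

# `t.seconds` is only set once the `with` block has exited
print(f"Elapsed: {t.seconds} seconds")
```

Because the elapsed time is computed in `__exit__`, it should be read after the `with` block, which is how the rest of this post uses it.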

Python Async Functions

async def process_response(
    response: aiohttp.client_reqrep.ClientResponse
) -> Optional[dict]:
    """Process aiohttp requests
    """
    if response.status not in (200, 201):
        print(f"- Request failed with status: {response.status}")
        # On failure the body may not be valid JSON, so read it as plain text
        error_body = await response.text()
        print(f"Response body: {error_body}")
        print('- End of work')
        return None
    else:
        print("- Request successful")
        return await response.json()


async def async_make_request(
    session: aiohttp.client.ClientSession, 
    url: str, headers: dict, 
    method: str = "GET", 
    data: aiohttp.formdata.FormData = None, 
    json: dict = None
) -> Optional[dict]:
    """Send aiohttp requests
    """
    if method == "POST":
        async with session.post(
            url, headers=headers, data=data, json=json
        ) as response:
            return await process_response(response)
    else:
        async with session.get(url, headers=headers) as response:
            return await process_response(response)


async def a_upload_file(
    session: aiohttp.client.ClientSession, 
    file_path: Path
) -> dict:
    """Upload audio file to Gladia
    """

    with catchtime() as t:
        file_name = str(file_path.with_suffix(''))
        content_type = f"audio/{file_path.suffix[1:]}"

        with open(file_path, "rb") as f:
            data = aiohttp.FormData()
            data.add_field("audio", f, filename=file_name, content_type=content_type)

            print("- Uploading file to Gladia...")
            json_response = await async_make_request(
                session, "https://api.gladia.io/v2/upload/",
                headers=headers,  method="POST", data=data
            )

    print(f'Upload Time: {t.seconds} seconds for `{file_path.name}`')

    return json_response


async def a_create_transcription_job(
    session: aiohttp.client.ClientSession,
    audio_url: str,
    diarization: bool = False,
    enable_code_switching: bool = False,
    custom_metadata: Optional[dict] = None,
    **kwargs
) -> dict:
    """Initiate the transcription job
    """

    json_data = {
        "audio_url": audio_url,
        "diarization": diarization,
        "enable_code_switching": enable_code_switching,
        "custom_metadata": custom_metadata
    }
    # Merge any extra options passed as keyword arguments into the request body
    json_data.update(kwargs)

    print("- Sending transcription request to Gladia API...")

    with catchtime() as t:
        json_response = await async_make_request(
            session, "https://api.gladia.io/v2/transcription/",
            headers=headers,  method="POST", json=json_data
        )

    print(f'Create Transcription Job: {t.seconds} seconds for `{audio_url}`')

    return json_response


async def a_wait_until_job_done(
    session: aiohttp.client.ClientSession,
    transcription_job: dict
):
    """Wait until the transcription job is done
    """

    result_url = transcription_job.get("result_url")
    id = transcription_job["id"]

    while True:
        poll_response = await async_make_request(
            session, url=result_url, headers=headers
        )

        if poll_response is None:
            # The request itself failed; process_response already logged it
            break

        status = poll_response.get("status")
        if status == "done":
            print(f"- Transcription done. - id: ...{id[-5:]}")
            break
        elif status == "error":
            print(f"- Transcription failed. id: ...{id[-5:]}")
            print(poll_response)
            break  # a failed job never reaches "done", so stop polling
        else:
            print(f"Transcription status: {status} - id: ...{id[-5:]}")
    
        await asyncio.sleep(4)

    return poll_response
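A polling loop like this one has no upper bound on how long it waits. The sketch below shows one way to add a timeout with `asyncio.wait_for()`. It is a generic illustration and does not call Gladia: `poll_until_finished` and `fake_fetch_status` are hypothetical names, and the status source is simulated.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_finished(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 4.0,
    timeout: float = 600.0,
) -> str:
    """Poll `fetch_status` until it reports "done" or "error", or time runs out."""

    async def _poll() -> str:
        while True:
            status = await fetch_status()
            if status in ("done", "error"):
                return status
            await asyncio.sleep(interval)

    # asyncio.wait_for raises asyncio.TimeoutError if _poll() takes too long
    return await asyncio.wait_for(_poll(), timeout=timeout)


# Hypothetical status source that mimics queued -> processing -> done
_statuses = iter(["queued", "processing", "processing", "done"])


async def fake_fetch_status() -> str:
    return next(_statuses)


final_status = asyncio.run(
    poll_until_finished(fake_fetch_status, interval=0.01, timeout=5)
)
print(final_status)
```

The interval and timeout defaults here are arbitrary. Suitable values depend on the length of the audio and on how quickly the service responds.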

Headers and example files

headers = {
    "accept": "application/json",
    "x-gladia-key": x_gladia_key,
}

files_path = Path("./data")
files_to_upload = [f for f in files_path.iterdir()]
files_to_upload
[PosixPath('data/Introducción Master Class.webm'),
 PosixPath('data/Introducing_ Better Offline.mp3'),
 PosixPath('data/You need to classify documents before trying to extract data.webm')]

asyncio.gather vs asyncio.as_completed

As we saw, transcribing audio involves the following steps:

  - Upload the audio files
  - Initiate the transcription job
  - Await completion of the transcription

asyncio.gather

This function orchestrates tasks with asyncio.gather():

async def async_function_orchestrator(func: 'function', tasks_param: list):
    """Gather results from an async function and a list of parameters
    """
    async with aiohttp.ClientSession() as session:

        tasks = [
            func(session, p) for p in tasks_param
        ]

        results = await asyncio.gather(*tasks)
        return results
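Two properties of `asyncio.gather()` are worth keeping in mind: results come back in the order the awaitables were passed in, not in the order they finished, and `return_exceptions=True` turns failures into returned values. The sketch below illustrates both with a simulated upload. `fake_upload` and its delays are hypothetical.

```python
import asyncio


async def fake_upload(name: str, delay: float) -> str:
    """Hypothetical upload: sleeps for `delay` seconds, fails for empty names."""
    await asyncio.sleep(delay)
    if not name:
        raise ValueError("missing file name")
    return f"uploaded:{name}"


async def main() -> list:
    # The slowest task is listed first, yet results keep the input order.
    # return_exceptions=True returns the exception object in place of a result
    # instead of raising it at the `await`.
    return await asyncio.gather(
        fake_upload("slow.webm", 0.2),
        fake_upload("", 0.05),
        fake_upload("fast.mp3", 0.01),
        return_exceptions=True,
    )


results = asyncio.run(main())
for result in results:
    print(repr(result))
```

Without `return_exceptions=True`, the first exception propagates to the caller of `gather()` while the remaining awaitables keep running, so one failed file would hide the results of the others.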

Uploading files asynchronously

The first step in the transcription journey involves uploading audio files to Gladia. With asyncio.gather() we can start several upload tasks concurrently, so the total time is close to that of the slowest upload rather than the sum of all of them:

with catchtime() as t:
    upload_results = asyncio.run(
        async_function_orchestrator(func=a_upload_file, tasks_param=files_to_upload)
    )

print(f'Total Time: {t.seconds} seconds')
- Uploading file to Gladia...
- Uploading file to Gladia...
- Uploading file to Gladia...
- Request successful
Upload Time: 4.77 seconds for `Introducción Master Class.webm`
- Request successful
Upload Time: 6.14 seconds for `You need to classify documents before trying to extract data.webm`
- Request successful
Upload Time: 6.86 seconds for `Introducing_ Better Offline.mp3`
Total Time: 7.03 seconds
for file in upload_results:
    print(f"audio_url: {file['audio_url']}")
    print(f"filename: {file['audio_metadata']['filename']}")
    print(f"id: {file['audio_metadata']['id']}")
    print()
audio_url: https://api.gladia.io/file/5d7d3d23-2de3-4c78-93c8-9010c6d7b6a7
filename: data%2FIntroducci%C3%B3n%20Master%20Class
id: 5d7d3d23-2de3-4c78-93c8-9010c6d7b6a7

audio_url: https://api.gladia.io/file/dee3b9c4-90d4-4b15-8a94-fbd66f11d6e2
filename: data%2FIntroducing_%20Better%20Offline
id: dee3b9c4-90d4-4b15-8a94-fbd66f11d6e2

audio_url: https://api.gladia.io/file/1cf50050-a7c4-4a5b-b99c-f6face16e942
filename: data%2FYou%20need%20to%20classify%20documents%20before%20trying%20to%20extract%20data
id: 1cf50050-a7c4-4a5b-b99c-f6face16e942

Asynchronously requesting transcriptions

Once the files are uploaded, the next step is to request transcriptions. As with the uploads, asyncio.gather() lets us send the transcription requests for all uploaded files concurrently, so we move through our workload without unnecessary delays between requests:

audio_urls = [result["audio_url"] for result in upload_results]

with catchtime() as t:
    transcription_job_results = asyncio.run(
        async_function_orchestrator(a_create_transcription_job, audio_urls)
    )

print(f'Total Time: {t.seconds} seconds')
- Sending transcription request to Gladia API...
- Sending transcription request to Gladia API...
- Sending transcription request to Gladia API...
- Request successful
Create Transcription Job: 1.2 seconds for `https://api.gladia.io/file/dee3b9c4-90d4-4b15-8a94-fbd66f11d6e2`
- Request successful
Create Transcription Job: 1.24 seconds for `https://api.gladia.io/file/1cf50050-a7c4-4a5b-b99c-f6face16e942`
- Request successful
Create Transcription Job: 1.25 seconds for `https://api.gladia.io/file/5d7d3d23-2de3-4c78-93c8-9010c6d7b6a7`
Total Time: 1.25 seconds
transcription_job_results
[{'id': '517ca2e0-7830-4803-a66c-4cb2cb259fd5',
  'result_url': 'https://api.gladia.io/v2/transcription/517ca2e0-7830-4803-a66c-4cb2cb259fd5'},
 {'id': '8a247329-c586-4685-914d-06e3e204f581',
  'result_url': 'https://api.gladia.io/v2/transcription/8a247329-c586-4685-914d-06e3e204f581'},
 {'id': '9c7d8255-a7c3-465e-a0f5-01569ba49f4c',
  'result_url': 'https://api.gladia.io/v2/transcription/9c7d8255-a7c3-465e-a0f5-01569ba49f4c'}]

Wait for the transcriptions to be ready

As with the upload and transcription request steps, we wait for all transcriptions concurrently:

with catchtime() as t:
    transcription_results = asyncio.run(
        async_function_orchestrator(a_wait_until_job_done, transcription_job_results)
    )

print(f'Total Time: {t.seconds} seconds')
- Request successful
Transcription status: queued - id: ...59fd5
- Request successful
Transcription status: queued - id: ...49f4c
- Request successful
Transcription status: queued - id: ...4f581
- Request successful
Transcription status: queued - id: ...59fd5
- Request successful
Transcription status: processing - id: ...49f4c
- Request successful
Transcription status: queued - id: ...4f581
- Request successful
Transcription status: processing - id: ...59fd5
- Request successful
Transcription status: processing - id: ...49f4c
- Request successful
Transcription status: processing - id: ...4f581
- Request successful
Transcription status: processing - id: ...59fd5
- Request successful
Transcription status: processing - id: ...4f581
- Request successful
Transcription status: processing - id: ...49f4c
- Request successful
Transcription status: processing - id: ...59fd5
- Request successful
Transcription status: processing - id: ...4f581
- Request successful
- Transcription done. - id: ...49f4c
- Request successful
- Transcription done. - id: ...59fd5
- Request successful
Transcription status: processing - id: ...4f581
- Request successful
- Transcription done. - id: ...4f581
Total Time: 28.89 seconds
for transcription in transcription_results:
    print(transcription["id"])
    print(transcription["file"]["filename"])
    print(transcription["result"]["transcription"]["languages"])
    print(transcription["result"]["transcription"]["full_transcript"][:250])
    print("...")
    print(transcription["result"]["transcription"]["full_transcript"][-250:])
    print()
517ca2e0-7830-4803-a66c-4cb2cb259fd5
data%2FIntroducci%C3%B3n%20Master%20Class
['es']
Música ¿Necesitas tutorías en tus tareas escolares? ¿Asesorías en proyectos académicos y empresariales? Aquí está la solución. Ingresa desde tu PC a www.masterclass.com.ec o descarga la aplicación desde tu móvil masterclass-ec. Después, selecciona la
...
 tutorías recibidas, recibe una gratis. Recuerda que nuestra plataforma es inclusiva. Si necesitas que la tutoría vaya acompañada de un intérprete de lengua de señas ecuatoriana, escoge la opción Intérprete. Masterclass. El conocimiento a tu alcance.

8a247329-c586-4685-914d-06e3e204f581
data%2FIntroducing_%20Better%20Offline
['en']
Hi, I'm Ed Zitron, host of the Better Offline podcast on the Cool Zone Media Network. I've been both a tech writer and a tech executive for the last 15 years, and I've seen this industry grow from a bunch of dorks building things in their garage into
...
no bullshit, just a crystal clear window into a world that quietly finds new and innovative ways to make billionaires rich. Listen to Better Offline on the iHeartRadio app, Apple Podcasts, or wherever else you get your podcasts. Thanks for listening.

9c7d8255-a7c3-465e-a0f5-01569ba49f4c
data%2FYou%20need%20to%20classify%20documents%20before%20trying%20to%20extract%20data
['en']
Today I've been talking to a bunch of people on doing document extraction. And in particular, I think a lot of people who are coming into this world with that much machine learning experience kind of think that AGI is here and they think that Jupyter
...
y valuable. You might have to be in a world where you pay humans to do this relabeling. Because we have before, if you're wrong in your pre-work, it's very easy to not lose all that effort. And you can just rebuild a lot of these indices very easily.

asyncio.as_completed

Finally, instead of waiting for each step to finish for every file, we can adopt a different strategy and start processing each file as soon as it is uploaded. Using asyncio.as_completed() then allows us to handle each result as soon as its transcription finishes.
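To see why chaining the steps per file can help, here is a small simulation that contrasts the two strategies. It does not call Gladia: the file names and the (upload, transcription) delays are made up, and `asyncio.sleep()` stands in for the network waits.

```python
import asyncio
from time import perf_counter

# Hypothetical (upload_seconds, transcription_seconds) per file
FILES = {"a.webm": (0.3, 0.05), "b.mp3": (0.05, 0.3)}


async def upload(name: str) -> str:
    await asyncio.sleep(FILES[name][0])
    return name


async def transcribe(name: str) -> str:
    await asyncio.sleep(FILES[name][1])
    return f"transcript:{name}"


async def stage_by_stage() -> float:
    """Every upload must finish before any transcription starts."""
    start = perf_counter()
    uploaded = await asyncio.gather(*(upload(n) for n in FILES))
    await asyncio.gather(*(transcribe(n) for n in uploaded))
    return perf_counter() - start


async def upload_then_transcribe(name: str) -> str:
    return await transcribe(await upload(name))


async def per_file_pipeline() -> float:
    """Each file moves to transcription as soon as its own upload ends."""
    start = perf_counter()
    await asyncio.gather(*(upload_then_transcribe(n) for n in FILES))
    return perf_counter() - start


staged_time = asyncio.run(stage_by_stage())
pipelined_time = asyncio.run(per_file_pipeline())
print(f"stage by stage: {staged_time:.2f}s, per file: {pipelined_time:.2f}s")
```

With these made-up delays the per-file version finishes sooner, because no file has to wait for the slowest upload before its transcription starts. Real timings depend on the service, so the gain is not guaranteed.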

async def a_upload_and_process(
    session: aiohttp.client.ClientSession, 
    file_path: Path
) -> dict:
    """Upload and process the file
    """
    # Upload the file
    uploaded = await a_upload_file(session, file_path)
    
    audio_url = uploaded["audio_url"]
    
    # Start the transcription
    transcription_job_result = await a_create_transcription_job(session, audio_url)
    
    # Wait for the transcription to complete
    transcription_result = await a_wait_until_job_done(session, transcription_job_result)
    
    return transcription_result


async def async_tasks_orchestrator(files_to_upload: List[Path]) -> None:
    """Process transcriptions as they complete
    """
    
    async with aiohttp.ClientSession() as session:
        
        transcription_tasks = [
            a_upload_and_process(session, file) for file in files_to_upload
        ]
        
        for transcription_task in asyncio.as_completed(transcription_tasks):
            
            transcription = await transcription_task
            process_transcription(transcription)
            #yield transcription


def process_transcription(transcription: dict) -> None:
    print(f"<<<<<Transcription with id: {transcription['id']} Done>>>>>")
    print(transcription["file"]["filename"])
    print(transcription["result"]["transcription"]["languages"])
    print(transcription["result"]["transcription"]["full_transcript"][:250])
    print("...")
    print(transcription["result"]["transcription"]["full_transcript"][-250:])
    print("<<<<</Transcription Done>>>>>")
with catchtime() as t:
    # async_tasks_orchestrator returns None; results are handled inside it
    asyncio.run(
        async_tasks_orchestrator(files_to_upload)
    )

print(f'Total Time: {t.seconds} seconds')
- Uploading file to Gladia...
- Uploading file to Gladia...
- Uploading file to Gladia...
- Request successful
Upload Time: 3.56 seconds for `Introducción Master Class.webm`
- Sending transcription request to Gladia API...
- Request successful
Create Transcription Job: 0.35 seconds for `https://api.gladia.io/file/799bac2b-8217-49a1-a67c-c53966fb9b60`
- Request successful
Transcription status: queued - id: ...1fba0
- Request successful
Upload Time: 4.28 seconds for `Introducing_ Better Offline.mp3`
- Sending transcription request to Gladia API...
- Request successful
Upload Time: 4.4 seconds for `You need to classify documents before trying to extract data.webm`
- Sending transcription request to Gladia API...
- Request successful
Create Transcription Job: 0.47 seconds for `https://api.gladia.io/file/dd94ecad-ec21-40e7-97fd-ecd063afd686`
- Request successful
Create Transcription Job: 0.43 seconds for `https://api.gladia.io/file/47ae9c16-665c-46f8-8379-0ae38f81eb01`
- Request successful
Transcription status: queued - id: ...c17fb
- Request successful
Transcription status: queued - id: ...89830
- Request successful
Transcription status: processing - id: ...1fba0
- Request successful
Transcription status: queued - id: ...c17fb
- Request successful
Transcription status: queued - id: ...89830
- Request successful
Transcription status: processing - id: ...1fba0
- Request successful
Transcription status: queued - id: ...c17fb
- Request successful
Transcription status: queued - id: ...89830
- Request successful
- Transcription done. - id: ...1fba0
<<<<<Transcription with id: 145b8844-c360-406f-8bb9-2de82661fba0 Done>>>>>
data%2FIntroducci%C3%B3n%20Master%20Class
['es']
Música ¿Necesitas tutorías en tus tareas escolares? ¿Asesorías en proyectos académicos y empresariales? Aquí está la solución. Ingresa desde tu PC a www.masterclass.com.ec o descarga la aplicación desde tu móvil masterclass-ec. Después, selecciona la
...
 tutorías recibidas, recibe una gratis. Recuerda que nuestra plataforma es inclusiva. Si necesitas que la tutoría vaya acompañada de un intérprete de lengua de señas ecuatoriana, escoge la opción Intérprete. Masterclass. El conocimiento a tu alcance.
<<<<</Transcription Done>>>>>
- Request successful
Transcription status: queued - id: ...c17fb
- Request successful
Transcription status: processing - id: ...89830
- Request successful
Transcription status: queued - id: ...89830
- Request successful
Transcription status: processing - id: ...c17fb
- Request successful
Transcription status: processing - id: ...89830
- Request successful
Transcription status: processing - id: ...c17fb
- Request successful
Transcription status: processing - id: ...89830
- Request successful
Transcription status: processing - id: ...c17fb
- Request successful
Transcription status: processing - id: ...89830
- Request successful
Transcription status: processing - id: ...c17fb
- Request successful
Transcription status: processing - id: ...89830
- Request successful
- Transcription done. - id: ...c17fb
<<<<<Transcription with id: 8f880055-b56f-4297-8390-5d7668ec17fb Done>>>>>
data%2FIntroducing_%20Better%20Offline
['en']
Hi, I'm Ed Zitron, host of the Better Offline podcast on the Cool Zone Media Network. I've been both a tech writer and a tech executive for the last 15 years, and I've seen this industry grow from a bunch of dorks building things in their garage into
...
no bullshit, just a crystal clear window into a world that quietly finds new and innovative ways to make billionaires rich. Listen to Better Offline on the iHeartRadio app, Apple Podcasts, or wherever else you get your podcasts. Thanks for listening.
<<<<</Transcription Done>>>>>
- Request successful
- Transcription done. - id: ...89830
<<<<<Transcription with id: 9b13aff2-aa31-4d90-8e9d-dfab9ff89830 Done>>>>>
data%2FYou%20need%20to%20classify%20documents%20before%20trying%20to%20extract%20data
['en']
Today I've been talking to a bunch of people on doing document extraction. And in particular, I think a lot of people who are coming into this world with that much machine learning experience kind of think that AGI is here and they think that Jupyter
...
y valuable. You might have to be in a world where you pay humans to do this relabeling. Because we have before, if you're wrong in your pre-work, it's very easy to not lose all that effort. And you can just rebuild a lot of these indices very easily.
<<<<</Transcription Done>>>>>
Total Time: 44.65 seconds
Note

Processing times depend on Gladia’s response time. In this example we cannot directly compare asyncio.gather() with asyncio.as_completed() without taking into account the time Gladia takes to complete each transcription.
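To isolate the difference between the two functions from the variability of the service, the comparison can be run against simulated jobs with fixed durations. This is a sketch under that assumption: `fake_job` and the values in `DELAYS` are hypothetical.

```python
import asyncio
from time import perf_counter

DELAYS = [0.3, 0.1, 0.2]  # hypothetical job durations in seconds


async def fake_job(delay: float) -> float:
    await asyncio.sleep(delay)
    return delay


async def first_result_with_gather() -> tuple:
    start = perf_counter()
    results = await asyncio.gather(*(fake_job(d) for d in DELAYS))
    # Results are only available once every job has finished
    return perf_counter() - start, results


async def first_result_with_as_completed() -> tuple:
    start = perf_counter()
    first_seen = None
    order = []
    for next_done in asyncio.as_completed([fake_job(d) for d in DELAYS]):
        order.append(await next_done)
        if first_seen is None:
            # Time at which the first finished job could be processed
            first_seen = perf_counter() - start
    return first_seen, order


gather_first, gather_results = asyncio.run(first_result_with_gather())
completed_first, completed_order = asyncio.run(first_result_with_as_completed())
print(f"gather: first result after {gather_first:.2f}s, order {gather_results}")
print(f"as_completed: first result after {completed_first:.2f}s, order {completed_order}")
```

Both approaches finish all jobs in about the same total time. The difference is when the first result becomes available and in which order results arrive: input order for `gather()`, completion order for `as_completed()`.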

Conclusion

Integrating Gladia’s transcription service with Python’s asyncio is an effective way to handle audio processing tasks. Whether we use asyncio.gather() to run uploads, requests and waits concurrently, or asyncio.as_completed() to process each file as soon as its transcription is ready, we avoid waiting on one file at a time, which improves the speed and responsiveness of the process.