Speech-to-Text for Video and Audio Assets

Unlock Your Audio and Video Content with AI Transcription

Available starting from version 10.5

Tired of manually transcribing interviews, meetings, or video content? The AI Transcription feature automatically turns speech in your audio and video files into searchable text and a short summary.

How It Works: Powered by OpenAI Whisper

At the heart of this feature is OpenAI Whisper, a speech recognition service from OpenAI. Before you start, there are two things to know:

Configure Your API Key: Before your first use, you need to enter your personal OpenAI API key in the application’s settings. This allows the software to communicate securely with the Whisper service.
- How to set it up: A detailed configuration guide can be found here.
Understanding the Cost: OpenAI bills the transcription service on a pay-as-you-go basis. The cost depends on the duration of the audio being transcribed, including the audio track of video files.
- Pricing details: You can familiarize yourself with the current rates here.
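Because billing follows audio length, a rough estimate is simply duration multiplied by a per-minute rate. The sketch below is illustrative only: the function name is hypothetical and the rate in the example is a placeholder, not a quoted price, so substitute the current rate from the pricing details.

```python
def estimate_transcription_cost(duration_seconds: float, rate_per_minute: float) -> float:
    """Estimate the cost of transcribing one file billed by audio length.

    rate_per_minute is an assumption supplied by the caller; look up the
    current rate before relying on the result.
    """
    if duration_seconds < 0 or rate_per_minute < 0:
        raise ValueError("duration and rate must be non-negative")
    minutes = duration_seconds / 60
    return round(minutes * rate_per_minute, 4)


# Example: a 40-minute file at a placeholder rate of 0.01 per minute.
PLACEHOLDER_RATE = 0.01  # hypothetical value, not an actual price
print(estimate_transcription_cost(40 * 60, PLACEHOLDER_RATE))
```

Summing this estimate over a batch of files before sending them gives a quick upper-level view of what a large import might cost.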

How to Transcribe Your Files

Sending files for transcription is a straightforward process.

Supported Formats: You can send both audio (e.g., MP3, WAV) and video files (e.g., MP4, MOV) for transcription.
A Note on Video Files: For the best experience, we recommend enabling web proxy generation for videos.
- Without proxies: The system will still process the original video file and create a transcription. However, you will not be able to use text-based positioning (clicking on text to jump to a point in the video) or view the video in the web client.
- With proxies: For full functionality, including text-based positioning and video playback in your browser, please ensure video web support is enabled. Learn more about proxies here.
Initiating the Process: The process is identical in both the web client and the desktop application. Simply select the files and choose the “Auto-tag with AI” option, just like you would when processing images.
- Important Note: The tag options you select do not affect the transcription result. The AI will always perform two tasks: a full speech-to-text conversion and the generation of a short summary.

What to Expect: Processing Time & Requirements

Processing Speed: As a benchmark, transcribing a 40-minute file typically takes about 5 minutes to complete.
System Warning: Please be aware that before the file is sent to OpenAI, it undergoes a preparation phase that runs on your computer’s CPU. This means a powerful processor will help speed up this initial step.
- Check your specs: Review the system requirements here to ensure optimal performance.
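For rough planning, the benchmark above can be extrapolated to other file lengths. This sketch assumes processing time scales linearly with media length, which the benchmark alone does not guarantee; the CPU preparation phase and network conditions will shift real numbers. The names are hypothetical.

```python
# Benchmark stated in this article: a 40-minute file takes about 5 minutes.
BENCHMARK_MEDIA_MINUTES = 40.0
BENCHMARK_PROCESSING_MINUTES = 5.0


def estimate_processing_minutes(media_minutes: float) -> float:
    """Linearly extrapolate processing time from the 40-to-5-minute benchmark."""
    if media_minutes < 0:
        raise ValueError("media length must be non-negative")
    return media_minutes * BENCHMARK_PROCESSING_MINUTES / BENCHMARK_MEDIA_MINUTES


# A 90-minute recording under the linear assumption.
print(estimate_processing_minutes(90))  # 11.25
```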

Exploring the Results

Once processing is complete, your transcribed text is ready to explore.

In the Web Client:
You get the full, powerful experience:

Quick Search: Instantly search for any word or phrase within the transcript.
Interactive Transcript: View the full text, broken into easy-to-read segments.
Click-to-Navigate: Click on any sentence in the transcript to jump directly to that moment in the video or audio file.
AI Summary: A concise summary of the content is automatically saved to the AI Description tag.
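The search and click-to-navigate behavior can be pictured as a list of text segments, each carrying a start time. The sketch below is a conceptual model only, not the application's implementation: the Segment type, its field names, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the media (assumed field)
    end: float    # seconds from the beginning of the media (assumed field)
    text: str


def search_segments(segments: list[Segment], query: str) -> list[Segment]:
    """Return the segments whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [s for s in segments if needle in s.text.casefold()]


def seek_position(segment: Segment) -> float:
    """Playback position to jump to when a segment is clicked."""
    return segment.start


# Made-up sample transcript.
transcript = [
    Segment(0.0, 12.5, "Welcome everyone to the quarterly review."),
    Segment(12.5, 27.0, "First, let us look at the budget."),
    Segment(27.0, 41.0, "Then we will discuss hiring plans."),
]

for hit in search_segments(transcript, "Budget"):
    print(seek_position(hit), hit.text)
```

Keeping a start time on every segment is what lets a text match double as a playback position, which is why the click-to-navigate feature depends on the media being playable in the browser.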

In the Desktop Application:
You have access to the key information:

Quick Search: Search the transcript text.
AI Summary: View the generated summary in the AI Description tag.
