chris/sprachText

A Python-based application that records audio and converts it into text using OpenAI's Whisper model. The app is designed for offline use, requiring internet only during the initial setup to download the model. It supports automatic splitting of long recordings and features an interactive web interface built with Gradio. Perfect for converting voice memos or audio notes into text with ease.

Christian Rute dba250b798 add support for dynamic speech length		2025-01-16 17:00:39 +01:00
app.py	add support for dynamic speech length	2025-01-16 17:00:39 +01:00
LICENSE	add LICENSE	2025-01-06 18:45:10 +01:00
README.md	add README.md	2025-01-06 18:41:40 +01:00
requirements.txt	add requirements.txt	2025-01-06 18:36:01 +01:00

README.md

🎙️ Voice-to-Text App with Whisper

This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 Python, 🎵 sounddevice, and 🤗 Gradio, this app is designed for local use, requiring internet access only during the initial setup.

🌟 Features

🎤 Record Audio: Click a button to start and stop recording your voice.
✂️ Automatic Splitting: Handles long audio files by splitting them into smaller chunks for transcription.
📝 Speech-to-Text: Transcribes your voice into text using the Whisper model.
🔒 Offline Capability: After setup, the app works entirely offline.

🚀 Getting Started

Prerequisites

Python 3.8+

Install the required Python libraries:

pip install torch transformers sounddevice pydub gradio
pip install --upgrade transformers "datasets[audio]" accelerate

FFmpeg (for audio processing):
- Download and install FFmpeg from the official FFmpeg site, or install it via Chocolatey.
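
To confirm that FFmpeg is reachable before launching the app, a quick check like the following may help. It is a minimal sketch using only the Python standard library; the helper name `ffmpeg_available` is hypothetical and is not taken from app.py.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found:", shutil.which("ffmpeg"))
    else:
        # Assumption: pydub relies on FFmpeg for formats other than plain WAV.
        print("FFmpeg not found on PATH; audio processing may fail.")
```

If the check fails right after installing FFmpeg, opening a new terminal so the updated PATH is picked up is usually worth trying first.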

Troubleshooting: If the app has problems using the GPU, first uninstall the current PyTorch packages:

pip uninstall torch torchvision torchaudio

Then find the torch build that matches your GPU and CUDA version and install it, for example:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
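
After reinstalling, it can be useful to check whether PyTorch actually sees the GPU. The snippet below is a small sketch of such a check; the function name `describe_torch_device` is hypothetical, and the check only reports what PyTorch detects without changing how the app selects its device.

```python
def describe_torch_device() -> str:
    """Summarize whether PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return (
            f"CUDA available: {name} "
            f"(torch {torch.__version__}, CUDA {torch.version.cuda})"
        )
    return f"CUDA not available; torch {torch.__version__} would run on CPU"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports that CUDA is not available on a machine with an NVIDIA GPU, the installed wheel is probably a CPU-only build or targets a different CUDA version.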

📦 Installation

Clone this repository:

git clone https://github.com/your-username/voice-to-text-app.git
cd voice-to-text-app

Run the app:
```
python app.py
```
Open the local URL that Gradio prints in the terminal in your browser to access the web app.

🛠️ How It Works

Recording: Click the Start Recording button to record your voice. Click Stop Recording when you're done.
Transcription: Click the Transcribe button to convert your audio to text.
Automatic Handling: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
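
The splitting step can be pictured as computing fixed-length windows over the recording. The following is a minimal sketch under the assumption of plain 30-second chunks with no overlap; the actual logic in app.py may differ (for example, it may adapt to speech length), and the names `CHUNK_MS` and `chunk_bounds` are hypothetical.

```python
from typing import List, Tuple

CHUNK_MS = 30_000  # 30 seconds, matching the limit described above


def chunk_bounds(total_ms: int, chunk_ms: int = CHUNK_MS) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond offsets covering [0, total_ms)."""
    if total_ms < 0 or chunk_ms <= 0:
        raise ValueError("total_ms must be >= 0 and chunk_ms must be > 0")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


# A 70-second recording yields two full chunks and one 10-second remainder.
print(chunk_bounds(70_000))  # [(0, 30000), (30000, 60000), (60000, 70000)]
```

With Pydub, each pair could be applied as a slice such as `audio[start:end]`, since `AudioSegment` objects are indexed in milliseconds. Each slice would then be transcribed separately and the resulting texts joined in order.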

🌐 Internet Usage

Setup: An internet connection is needed only to install the dependencies and, on the first run, to download the Whisper model.
Offline Mode: Once the model is downloaded, the app works entirely offline, so audio is processed locally and stays private. If the model cache is deleted, the model is downloaded again on the next run.
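
To make sure no network requests are attempted once the model is cached, the Hugging Face libraries can be told to run offline through environment variables. This is a sketch of one way to do that, not something app.py is known to do; `force_offline` is a hypothetical helper, while `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are the variables read by `huggingface_hub` and `transformers`.

```python
import os


def force_offline() -> None:
    """Ask the Hugging Face libraries not to make network requests."""
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


# Call this before importing transformers so the settings take effect.
force_offline()
```

With these set, loading a model that is missing from the local cache raises an error instead of silently downloading it, which makes an accidentally deleted cache easy to notice.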

🎉 Example Use Case

Record a 20-second audio note: "Take out the trash at 6 PM."
Stop recording.
Transcribe the audio to see: "Take out the trash at 6 PM."

🧰 Built With

🤗 Transformers: Whisper model for speech-to-text.
🎵 SoundDevice: Audio recording.
✂️ Pydub: Audio splitting for long files.
🌐 Gradio: Interactive web interface.

🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this project.

🛡️ License

This project is licensed under the MIT License.

🗂️ File Structure

.
├── app.py             # Main application script
├── LICENSE            # MIT License text
├── requirements.txt   # List of dependencies
└── README.md          # This file