A Python-based application that records audio and converts it into text using OpenAI's Whisper model. The app is designed for offline use, requiring internet only during the initial setup to download the model. It supports automatic splitting of long recordings and features an interactive web interface built with Gradio. Perfect for converting voice memos or audio notes into text with ease.
🎙️ Voice-to-Text App with Whisper
This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 Python, 🎵 sounddevice, and 🤗 Gradio, this app is designed for local use, requiring internet access only during the initial setup.
🌟 Features
- 🎤 Record Audio: Click a button to start and stop recording your voice.
- ✂️ Automatic Splitting: Handles long audio files by splitting them into smaller chunks for transcription.
- 📝 Speech-to-Text: Transcribes your voice into text using the Whisper model.
- 🔒 Offline Capability: After setup, the app works entirely offline.
🚀 Getting Started
Prerequisites
- Python 3.8+
- Install the required Python libraries:
pip install torch transformers sounddevice pydub gradio
pip install --upgrade transformers datasets[audio] accelerate
- FFmpeg (for audio processing):
  - Download and install FFmpeg from the official FFmpeg site or via Chocolatey.
- Possible Bugs: If the app has problems using the GPU, uninstall the current torch packages:
pip uninstall torch torchvision torchaudio
Then look up the torch build that matches your GPU and CUDA version and install it, for example for CUDA 12.4:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
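To confirm whether the installed torch build can actually see the GPU, before or after reinstalling, a small check along these lines may help. This is a minimal sketch; `describe_torch_device` is a hypothetical helper name and is not part of app.py.

```python
import importlib.util


def describe_torch_device() -> str:
    """Return a short description of the device PyTorch would use.

    Hypothetical helper for troubleshooting; not part of app.py.
    """
    # Avoid an ImportError if torch is missing (for example right after uninstalling it).
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if torch.cuda.is_available():
        # Index 0 is the first CUDA device visible to this process.
        name = torch.cuda.get_device_name(0)
        return f"cuda ({name}), torch {torch.__version__}"
    return f"cpu only, torch {torch.__version__}"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports "cpu only" on a machine with an NVIDIA GPU, the installed wheel was probably built without CUDA support, which is the case the reinstall steps above address.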
📦 Installation
- Clone this repository:
git clone https://github.com/your-username/voice-to-text-app.git
cd voice-to-text-app
- Run the app:
python app.py
- Open the provided link in your browser to access the web app.
🛠️ How It Works
- Recording: Click the Start Recording button to record your voice. Click Stop Recording when you're done.
- Transcription: Click the Transcribe button to convert your audio to text.
- Automatic Handling: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
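The splitting step can be sketched as plain arithmetic on the recording length. This is an illustration only, assuming fixed 30-second chunks with no overlap; `chunk_bounds` and `CHUNK_MS` are hypothetical names, and app.py may split differently.

```python
from typing import List, Tuple

CHUNK_MS = 30_000  # 30 seconds in milliseconds (assumed chunk size)


def chunk_bounds(duration_ms: int, chunk_ms: int = CHUNK_MS) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in milliseconds covering the whole recording."""
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    # The last chunk is shortened so it never runs past the end of the audio.
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


if __name__ == "__main__":
    # A 70-second recording yields two full chunks and one 10-second remainder.
    print(chunk_bounds(70_000))
```

Pydub's `AudioSegment` objects are sliced by milliseconds, so each `(start, end)` pair maps directly onto `audio[start:end]`.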
🌐 Internet Usage
- Setup: The app needs an internet connection only to install the dependencies and, on the first run, to download the Whisper model.
- Offline Mode: Once the model is downloaded, the app works entirely offline, ensuring privacy and local processing. This holds as long as the model cache is not deleted; if it is, the model is downloaded again on the next run.
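To check whether a model is already cached, and to make sure nothing is fetched from the network, something like the following might be used. It is a sketch that assumes the default Hugging Face Hub cache layout (`models--<org>--<name>` folders) and the `HF_HUB_OFFLINE`, `HF_HUB_CACHE`, `HF_HOME`, and `XDG_CACHE_HOME` environment variables; the function names are hypothetical and not taken from app.py.

```python
import os
from pathlib import Path
from typing import List, MutableMapping, Optional


def hf_cache_dir(env: Optional[MutableMapping[str, str]] = None) -> Path:
    """Resolve the Hugging Face Hub cache directory from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg).expanduser() if xdg else Path.home() / ".cache"
    return base / "huggingface" / "hub"


def cached_models(cache_dir: Path) -> List[str]:
    """List model ids found in the cache, e.g. 'org/name'."""
    if not cache_dir.is_dir():
        return []
    prefix = "models--"
    return sorted(
        entry.name[len(prefix):].replace("--", "/")
        for entry in cache_dir.iterdir()
        if entry.is_dir() and entry.name.startswith(prefix)
    )


def enable_offline_mode(env: Optional[MutableMapping[str, str]] = None) -> None:
    """Ask the Hugging Face libraries not to touch the network.

    Call this before importing transformers so the setting is picked up.
    """
    env = os.environ if env is None else env
    env["HF_HUB_OFFLINE"] = "1"


if __name__ == "__main__":
    print("Cache directory:", hf_cache_dir())
    print("Cached models:", cached_models(hf_cache_dir()))
```

If the list is empty, the next run will need an internet connection to download the model again.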
🎉 Example Use Case
- Record a 20-second audio note: "Take out the trash at 6 PM."
- Stop recording.
- Transcribe the audio to see:
"Take out the trash at 6 PM."
🧰 Built With
- 🤗 Transformers: Whisper model for speech-to-text.
- 🎵 SoundDevice: Audio recording.
- ✂️ Pydub: Audio splitting for long files.
- 🌐 Gradio: Interactive web interface.
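As a rough illustration of the Transformers part of this stack, the chunk-by-chunk transcription could look like the sketch below. The model id `openai/whisper-small` is an assumption (this README does not say which Whisper checkpoint app.py loads), and `load_transcriber` and `transcribe_chunks` are hypothetical helper names.

```python
from typing import Callable, Dict, Iterable


def load_transcriber(model_id: str = "openai/whisper-small"):
    """Create a speech-recognition pipeline.

    The default model id is an assumption; app.py may load a different checkpoint.
    """
    # Imported lazily so the rest of this module works without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe_chunks(
    chunk_paths: Iterable[str],
    transcriber: Callable[[str], Dict[str, str]],
) -> str:
    """Transcribe each audio chunk in order and join the results into one text."""
    parts = []
    for path in chunk_paths:
        # The pipeline returns a dict whose "text" key holds the transcription.
        text = transcriber(path)["text"].strip()
        if text:
            parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical file names; they stand in for chunks exported by the splitting step.
    asr = load_transcriber()
    print(transcribe_chunks(["chunk_0.wav", "chunk_1.wav"], asr))
```

Passing the transcriber in as an argument keeps the joining logic easy to try out with a stand-in function, without loading the model.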
🤝 Contributing
Contributions are welcome! Feel free to submit issues or pull requests to improve this project.
🛡️ License
This project is licensed under the MIT License.
🗂️ File Structure
.
├── app.py # Main application script
├── requirements.txt # List of dependencies
└── README.md # This file