🎙️ Voice-to-Text App with Whisper
This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 Python, 🎵 sounddevice, and 🤗 Gradio, this app is designed for local use, requiring internet access only during the initial setup.
🌟 Features
- 🎤 Record Audio: Click a button to start and stop recording your voice.
- ✂️ Automatic Splitting: Handles long audio files by splitting them into smaller chunks for transcription.
- 📝 Speech-to-Text: Transcribes your voice into text using the Whisper model.
- 🔒 Offline Capability: After setup, the app works entirely offline.
🚀 Getting Started
Prerequisites
- Python 3.8+
- Install the required Python libraries:
  pip install torch transformers sounddevice pydub gradio
  pip install --upgrade transformers datasets[audio] accelerate
- FFmpeg (for audio processing):
  - Download and install FFmpeg from the FFmpeg Official Site or via Chocolatey.
- Possible Bugs: If the GPU is not used or GPU-related errors occur, uninstall the current PyTorch packages:
  pip uninstall torch torchvision torchaudio
  Then look up the PyTorch build that matches your GPU and CUDA version and install it, for example for CUDA 12.4:
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
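Before reinstalling PyTorch, it can help to check what the current install actually sees. The following is a minimal sketch, not part of app.py; the function name `describe_torch_device` is made up for this example, and it only relies on standard PyTorch attributes.

```python
def describe_torch_device() -> str:
    """Return a one-line summary of the PyTorch build and the GPU it can see."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this build was compiled against
        name = torch.cuda.get_device_name(0)
        return f"torch {torch.__version__}, CUDA {torch.version.cuda}, GPU: {name}"
    # For CPU-only builds torch.version.cuda is None
    return f"torch {torch.__version__}, CUDA build: {torch.version.cuda}, no GPU visible"


if __name__ == "__main__":
    print(describe_torch_device())
```

If the summary reports no visible GPU and a CUDA build of `None`, a CPU-only wheel is probably installed, and reinstalling from the matching CUDA index URL is the likely fix.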
📦 Installation
- Clone this repository:
  git clone https://github.com/your-username/voice-to-text-app.git
  cd voice-to-text-app
- Run the app:
  python app.py
- Open the provided link in your browser to access the web app.
🛠️ How It Works
- Recording: Click the Start Recording button to record your voice. Click Stop Recording when you're done.
- Transcription: Click the Transcribe button to convert your audio to text.
- Automatic Handling: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
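The splitting step can be sketched as follows. This is an illustration under assumptions, not the code in app.py: the helper names `chunk_bounds` and `join_transcripts` are hypothetical, and the sketch only computes the millisecond windows and joins the per-chunk text, leaving the actual audio slicing and Whisper call out.

```python
from typing import List, Tuple

CHUNK_MS = 30_000  # 30 seconds, the threshold mentioned above


def chunk_bounds(duration_ms: int, chunk_ms: int = CHUNK_MS) -> List[Tuple[int, int]]:
    """Split [0, duration_ms) into consecutive (start, end) windows of at most chunk_ms."""
    if duration_ms < 0 or chunk_ms <= 0:
        raise ValueError("duration_ms must be >= 0 and chunk_ms must be > 0")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def join_transcripts(parts: List[str]) -> str:
    """Join per-chunk transcripts into one string, skipping empty pieces."""
    return " ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    # A 70-second recording yields two full chunks and a 10-second remainder.
    print(chunk_bounds(70_000))  # [(0, 30000), (30000, 60000), (60000, 70000)]
```

With Pydub, an `AudioSegment` reports its length in milliseconds via `len(audio)` and can be sliced as `audio[start:end]`, so these windows map directly onto audio chunks. Cutting at fixed positions can split a word in half; whether the app handles that case is not stated here.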
🌐 Internet Usage
- Setup: An internet connection is needed to install the dependencies and, on the first run, to download the Whisper model.
- Offline Mode: Once the model is downloaded, the app works entirely offline, ensuring privacy and local processing. This holds as long as the local model cache is not deleted; otherwise the model is downloaded again on the next run.
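To check whether the model is already cached, or to make sure no download is attempted, something like the sketch below may help. It assumes the model is fetched through the Hugging Face hub cache; the function names are hypothetical, the repo id `openai/whisper-small` is only a placeholder since this README does not name the checkpoint, and the cache-path resolution is simplified.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_root() -> Path:
    """Resolve the Hugging Face hub cache directory from the environment (simplified)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def is_model_cached(repo_id: str, cache_root: Optional[Path] = None) -> bool:
    """Check whether a folder for repo_id exists in the hub cache."""
    root = cache_root if cache_root is not None else hub_cache_root()
    # The hub cache stores each model under "models--<org>--<name>"
    return (root / ("models--" + repo_id.replace("/", "--"))).is_dir()


def force_offline() -> None:
    """Ask the Hugging Face libraries not to use the network.

    Call this before importing transformers or huggingface_hub.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"


if __name__ == "__main__":
    # Placeholder repo id; replace with the checkpoint the app actually loads.
    print(is_model_cached("openai/whisper-small"))
```

With `HF_HUB_OFFLINE=1` set, loading a model that is missing from the cache fails with an error instead of silently downloading it, which makes a deleted cache easy to notice.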
🎉 Example Use Case
- Record a 20-second audio note: "Take out the trash at 6 PM."
- Stop recording.
- Transcribe the audio to see:
"Take out the trash at 6 PM."
🧰 Built With
- 🤗 Transformers: Whisper model for speech-to-text.
- 🎵 SoundDevice: Audio recording.
- ✂️ Pydub: Audio splitting for long files.
- 🌐 Gradio: Interactive web interface.
🤝 Contributing
Contributions are welcome! Feel free to submit issues or pull requests to improve this project.
🛡️ License
This project is licensed under the MIT License.
🗂️ File Structure
.
├── app.py # Main application script
├── requirements.txt # List of dependencies
└── README.md # This file