🎙️ Voice-to-Text App with Whisper

This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 Python, 🎵 sounddevice, and 🤗 Gradio, this app is designed for local use, requiring internet access only during the initial setup.

🌟 Features

🎤 Record Audio: Click a button to start and stop recording your voice.
✂️ Automatic Splitting: Handles long audio files by splitting them into smaller chunks for transcription.
📝 Speech-to-Text: Transcribes your voice into text using the Whisper model.
🔒 Offline Capability: After setup, the app works entirely offline.

🚀 Getting Started

Prerequisites

Python 3.8+

Install the required Python libraries:

pip install torch transformers sounddevice pydub gradio
pip install --upgrade transformers "datasets[audio]" accelerate

FFmpeg (for audio processing):
- Download and install FFmpeg from the official FFmpeg site, or install it with Chocolatey on Windows.

Troubleshooting: If the app fails to use your GPU, first uninstall the current PyTorch packages:

pip uninstall torch torchvision torchaudio

Then find the PyTorch build that matches your GPU's CUDA version and install it. For example, for CUDA 12.4:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
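
To check whether a reinstall is needed, or whether it worked, it can help to ask PyTorch directly if it sees the GPU. The following is a minimal sketch; `pick_device` and `torch_build_info` are hypothetical helper names for illustration and are not taken from app.py.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can use a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def torch_build_info() -> str:
    """Describe the installed PyTorch build in one line."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    # torch.version.cuda is None for CPU-only builds.
    return f"torch {torch.__version__}, built for CUDA {torch.version.cuda}"


print(pick_device())
print(torch_build_info())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed package is likely a CPU-only build, and the reinstall steps above apply.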

📦 Installation

Clone this repository:

git clone https://github.com/your-username/voice-to-text-app.git
cd voice-to-text-app

Run the app:
```
python app.py
```
Open the provided link in your browser to access the web app.

🛠️ How It Works

Recording: Click the Start Recording button to record your voice. Click Stop Recording when you're done.
Transcription: Click the Transcribe button to convert your audio to text.
Automatic Handling: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
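
The splitting and transcription steps above can be sketched as follows. This is an illustration under stated assumptions rather than the code in app.py: `chunk_bounds` and `transcribe_chunks` are hypothetical helpers, and the recognizer is passed in as any callable that returns a dict with a `"text"` key, which is the shape a Transformers speech-recognition pipeline returns.

```python
from typing import Callable, Iterable, List, Tuple

CHUNK_MS = 30_000  # 30-second chunks, matching the limit described above


def chunk_bounds(duration_ms: int, chunk_ms: int = CHUNK_MS) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond offsets that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def transcribe_chunks(chunks: Iterable, asr: Callable) -> str:
    """Run the recognizer on each chunk in order and join the resulting text."""
    texts = [asr(chunk)["text"].strip() for chunk in chunks]
    return " ".join(t for t in texts if t)


# Possible wiring with the real libraries (assumed, not executed here).
# The checkpoint name "openai/whisper-small" is an assumption; this README
# does not say which Whisper checkpoint the app loads.
#
#   from pydub import AudioSegment
#   from transformers import pipeline
#
#   audio = AudioSegment.from_file("recording.wav")
#   asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
#   paths = []
#   for i, (start, end) in enumerate(chunk_bounds(len(audio))):
#       path = f"chunk_{i}.wav"
#       audio[start:end].export(path, format="wav")  # pydub slices in ms
#       paths.append(path)
#   text = transcribe_chunks(paths, asr)
```

With these helpers, a 75-second recording yields three chunks (30 s, 30 s, 15 s), while a 20-second note stays in a single chunk.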

🌐 Internet Usage

Setup: An internet connection is needed only to install the dependencies and, on the first run, to download the Whisper model.
Offline Mode: Once the model is downloaded, the app works entirely offline, so all processing stays local and private. If the model cache is deleted, the model is downloaded again on the next run.
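
To make sure no network requests are attempted after the first run, the Hugging Face libraries can be told to use only the local cache. The sketch below is an assumption about how this could be done, not a description of what app.py does; `enable_offline_mode` and `default_cache_dir` are hypothetical helper names.

```python
import os


def enable_offline_mode() -> None:
    """Tell the Hugging Face libraries to use only locally cached files."""
    # These variables are read when the libraries are imported, so call this
    # before importing transformers.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def default_cache_dir() -> str:
    """Best-guess location of the model cache.

    Honors HF_HOME if set; ignores other overrides such as HF_HUB_CACHE
    and XDG_CACHE_HOME, so treat the result as a hint only.
    """
    fallback = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    hf_home = os.environ.get("HF_HOME", fallback)
    return os.path.join(hf_home, "hub")


enable_offline_mode()
print(default_cache_dir())
```

With offline mode enabled, loading a model that is not in the cache fails with an error instead of starting a download, which makes a deleted cache easy to notice.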

🎉 Example Use Case

Record a 20-second audio note: "Take out the trash at 6 PM."
Stop recording.
Transcribe the audio to see: "Take out the trash at 6 PM."
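
The record and stop steps in this example could be implemented along the following lines. This is a sketch, not the code in app.py: the `Recorder` class and `save_wav` helper are hypothetical, and the choice of a raw 16-bit mono stream at 16 kHz (the sample rate Whisper models work with) is an assumption.

```python
import wave

SAMPLE_RATE = 16_000  # assumed: 16 kHz mono, the rate Whisper models work with
CHANNELS = 1
SAMPLE_WIDTH = 2      # bytes per sample for 16-bit PCM


class Recorder:
    """Collect raw audio between start() and stop()."""

    def __init__(self) -> None:
        self._frames = []
        self._stream = None

    def _callback(self, indata, frames, time, status) -> None:
        # Called by sounddevice for every block of captured audio.
        if status:
            print(status)
        self._frames.append(bytes(indata))

    def start(self) -> None:
        import sounddevice as sd  # imported lazily; needs an input device

        self._frames.clear()
        self._stream = sd.RawInputStream(
            samplerate=SAMPLE_RATE,
            channels=CHANNELS,
            dtype="int16",
            callback=self._callback,
        )
        self._stream.start()

    def stop(self) -> bytes:
        """Stop recording and return everything captured as PCM bytes."""
        if self._stream is not None:
            self._stream.stop()
            self._stream.close()
            self._stream = None
        return b"".join(self._frames)


def save_wav(pcm: bytes, path: str) -> None:
    """Write 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)
```

In a UI, the Start Recording button would call `start()`, and the Stop Recording button would call `stop()` followed by `save_wav(...)`, leaving a WAV file ready for transcription.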

🧰 Built With

🤗 Transformers: Whisper model for speech-to-text.
🎵 SoundDevice: Audio recording.
✂️ Pydub: Audio splitting for long files.
🌐 Gradio: Interactive web interface.

🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this project.

🛡️ License

This project is licensed under the MIT License.

🗂️ File Structure

.
├── app.py             # Main application script
├── requirements.txt   # List of dependencies
└── README.md          # This file
