# 🎙️ Voice-to-Text App with Whisper

This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 `Python`, 🎵 `sounddevice`, and 🤗 `Gradio`, this app is designed for local use, requiring internet access **only during the initial setup**.

---

## 🌟 Features

- 🎤 **Record Audio**: Click a button to start and stop recording your voice.
- ✂️ **Automatic Splitting**: Handles long audio files by splitting them into smaller chunks for transcription.
- 📝 **Speech-to-Text**: Transcribes your voice into text using the Whisper model.
- 🔒 **Offline Capability**: After setup, the app works entirely offline.

---

## 🚀 Getting Started

### Prerequisites

1. **Python 3.8+**
2. Install the required Python libraries:
   ```bash
   pip install torch transformers sounddevice pydub gradio
   pip install --upgrade transformers "datasets[audio]" accelerate
   ```
3. **FFmpeg** (for audio processing):
   - Download and install FFmpeg from the [FFmpeg Official Site](https://ffmpeg.org/download.html) or via [Chocolatey](https://chocolatey.org/).

4. **Possible Bugs:** If PyTorch does not use your GPU, uninstall the current build:
   ```bash
   pip uninstall torch torchvision torchaudio
   ```
   Then find the PyTorch build that matches your GPU and CUDA version and install it, for example for CUDA 12.4:
   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
   ```

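
Before launching the app, it can help to confirm that the prerequisites above are actually in place. The following is a minimal sketch, not part of the project itself: the function name `check_environment` is hypothetical, and it only checks that an `ffmpeg` executable is on the `PATH`, that `torch` is importable, and that PyTorch reports a usable CUDA device. If `cuda` comes back as `False` on a machine with an NVIDIA GPU, the reinstall steps in item 4 are the likely fix.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether FFmpeg and a CUDA-enabled PyTorch build are available."""
    report = {
        # pydub calls the ffmpeg executable for format conversion,
        # so it has to be discoverable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    if report["torch"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'not available'}")
```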
---

### 📦 Installation

1. Clone this repository:
   ```bash
   git clone https://github.com/your-username/voice-to-text-app.git
   cd voice-to-text-app
   ```

2. Run the app:
   ```bash
   python app.py
   ```

3. Open the link shown in the terminal in your browser to access the web app.

---

## 🛠️ How It Works

1. **Recording**: Click the **Start Recording** button to record your voice. Click **Stop Recording** when you're done.
2. **Transcription**: Click the **Transcribe** button to convert your audio to text.
3. **Automatic Handling**: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.

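
The chunking in step 3 can be illustrated with a short sketch. This is not the code from `app.py` (which, according to this README, uses Pydub for splitting); it is a plain-Python illustration under assumptions: the helper name `split_into_chunks` is hypothetical, the audio is treated as a flat sequence of samples, and 16 kHz is an assumed sample rate. Each chunk would then be transcribed on its own and the resulting texts joined in order. In the example, 70 seconds of audio splits into chunks of 30, 30, and 10 seconds.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds=30):
    """Split a sequence of audio samples into chunks of at most chunk_seconds."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    # Number of samples per chunk; at least 1 so the range step is never zero.
    chunk_size = max(1, int(sample_rate * chunk_seconds))
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


# 70 seconds of silence at an assumed 16 kHz sample rate.
audio = [0.0] * (70 * 16_000)
chunks = split_into_chunks(audio, sample_rate=16_000)
print([len(c) / 16_000 for c in chunks])  # [30.0, 30.0, 10.0]
```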
---

## 🌐 Internet Usage

- **Setup**: An internet connection is needed **only during setup**: to install the dependencies and, on the first run, to download the Whisper model.
- **Offline Mode**: Once the model is downloaded, the app works entirely offline, ensuring privacy and local processing. This holds as long as the model cache is not deleted; otherwise the model is downloaded again on the next run.

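
To make the offline behavior explicit rather than implicit, the Hugging Face libraries can be told to use only the local cache. The sketch below is an assumption about how this could be done, not a description of what `app.py` does: it sets the `HF_HUB_OFFLINE` environment variable, which the Hugging Face Hub client documents as its offline switch, and the helper name `enable_offline_mode` is hypothetical. With the variable set, a missing model in the cache should produce an error instead of a download attempt.

```python
import os


def enable_offline_mode():
    """Ask the Hugging Face libraries to use only the local model cache."""
    os.environ["HF_HUB_OFFLINE"] = "1"


# Call this before importing transformers, so the setting is already in
# place when the library reads its configuration.
enable_offline_mode()
print(os.environ["HF_HUB_OFFLINE"])  # 1
```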
---

## 🎉 Example Use Case

1. Record a 20-second audio note: "Take out the trash at 6 PM."
2. Stop recording.
3. Transcribe the audio to see: `"Take out the trash at 6 PM."`

---

## 🧰 Built With

- 🤗 [Transformers](https://huggingface.co/docs/transformers): Whisper model for speech-to-text.
- 🎵 [SoundDevice](https://python-sounddevice.readthedocs.io/): Audio recording.
- ✂️ [Pydub](https://github.com/jiaaro/pydub): Audio splitting for long files.
- 🌐 [Gradio](https://gradio.app/): Interactive web interface.

---

## 🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this project.

---

## 🛡️ License

This project is licensed under the MIT License.

---

## 🗂️ File Structure

```
.
├── app.py              # Main application script
├── requirements.txt    # List of dependencies
└── README.md           # This file
```

---

Have fun 🤗

---