# 🎙️ Voice-to-Text App with Whisper

This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 `Python`, 🎵 `sounddevice`, and 🤗 `Gradio`, this app is designed for local use, requiring internet access **only during the initial setup**.

---

## 🌟 Features

- 🎤 **Record Audio**: Click a button to start and stop recording your voice.
- ✂️ **Automatic Splitting**: Handles long audio files by splitting them into smaller chunks for transcription.
- 📝 **Speech-to-Text**: Transcribes your voice into text using the Whisper model.
- 🔒 **Offline Capability**: After setup, the app works entirely offline.

---

## 🚀 Getting Started

### Prerequisites

1. **Python 3.8+**
2. Install the required Python libraries:

   ```bash
   pip install torch transformers sounddevice pydub gradio
   pip install --upgrade transformers "datasets[audio]" accelerate
   ```

3. **FFmpeg** (for audio processing):
   - Download and install FFmpeg from the [FFmpeg Official Site](https://ffmpeg.org/download.html) or via [Chocolatey](https://chocolatey.org/).
4. **Possible Bugs:** If you run into problems using the GPU, uninstall the existing PyTorch packages:

   ```bash
   pip uninstall torch torchvision torchaudio
   ```

   Then find the PyTorch build that matches your GPU and CUDA version and install it. For example, for CUDA 12.4:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
   ```

---

### 📦 Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/your-username/voice-to-text-app.git
   cd voice-to-text-app
   ```

2. Run the app:

   ```bash
   python app.py
   ```

3. Open the provided link in your browser to access the web app.

---

## 🛠️ How It Works

1. **Recording**: Click the **Start Recording** button to record your voice. Click **Stop Recording** when you're done.
2. **Transcription**: Click the **Transcribe** button to convert your audio to text.
3. **Automatic Handling**: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
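
The chunking idea in step 3 can be sketched roughly as follows. This is not the code from `app.py` (which uses Pydub for splitting), only a minimal, dependency-free illustration. The function names `split_into_chunks` and `transcribe_long`, and the `transcribe_chunk` callable, are hypothetical and assumed here for the example.

```python
from typing import Callable, List, Sequence


def split_into_chunks(samples: Sequence[float], sample_rate: int,
                      chunk_seconds: float = 30.0) -> List[Sequence[float]]:
    """Split mono audio samples into consecutive chunks of at most chunk_seconds."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = int(sample_rate * chunk_seconds)  # samples per chunk
    if chunk_len == 0:
        raise ValueError("chunk_seconds is too small for this sample_rate")
    # The last chunk may be shorter than chunk_len.
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def transcribe_long(samples: Sequence[float], sample_rate: int,
                    transcribe_chunk: Callable[[Sequence[float]], str],
                    chunk_seconds: float = 30.0) -> str:
    """Transcribe each chunk with the given callable and join the results."""
    chunks = split_into_chunks(samples, sample_rate, chunk_seconds)
    texts = [transcribe_chunk(chunk).strip() for chunk in chunks]
    # Skip empty results (e.g. silent chunks) when joining.
    return " ".join(text for text in texts if text)
```

Here `transcribe_chunk` stands in for whatever function runs Whisper on a single chunk. A 70-second recording at 16 kHz would be split into chunks of 30, 30, and 10 seconds, and the three transcripts joined with spaces. Splitting at fixed boundaries can cut a word in half, so a real implementation may prefer to split on silence or overlap the chunks slightly.
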

---

## 🌐 Internet Usage

- **Setup**: The app requires an internet connection **only during the first run** to download the Whisper model and dependencies.
- **Offline Mode**: Once the model is downloaded, the app works entirely offline, ensuring privacy and local processing. This holds as long as the model cache is not deleted; if it is, the model will be downloaded again on the next run.

---

## 🎉 Example Use Case

1. Record a 20-second audio note: "Take out the trash at 6 PM."
2. Stop recording.
3. Transcribe the audio to see: `"Take out the trash at 6 PM."`

---

## 🧰 Built With

- 🤗 [Transformers](https://huggingface.co/docs/transformers): Whisper model for speech-to-text.
- 🎵 [SoundDevice](https://python-sounddevice.readthedocs.io/): Audio recording.
- ✂️ [Pydub](https://github.com/jiaaro/pydub): Audio splitting for long files.
- 🌐 [Gradio](https://gradio.app/): Interactive web interface.

---

## 🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this project.

---

## 🛡️ License

This project is licensed under the MIT License.

---

## 🗂️ File Structure

```
.
├── app.py             # Main application script
├── requirements.txt   # List of dependencies
└── README.md          # This file
```

---

Have fun 🤗

---