add README.md

2025-01-06 18:41:40 +01:00 · 97800e2b09
commit 97800e2b09
parent 4e0582f7ff
1 changed files with 111 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,111 @@
+# 🎙️ Voice-to-Text App with Whisper
+
+This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 `Python`, 🎵 `sounddevice`, and 🤗 `Gradio`, this app is designed for local use, requiring internet access **only during the initial setup**.
+
+---
+
+## 🌟 Features
+
+- 🎤 **Record Audio**: Click a button to start and stop recording your voice.
+- ✂️ **Automatic Splitting**: Handles long audio files by splitting them into smaller chunks for transcription.
+- 📝 **Speech-to-Text**: Transcribes your voice into text using the Whisper model.
+- 🔒 **Offline Capability**: After setup, the app works entirely offline.
+
+---
+
+## 🚀 Getting Started
+
+### Prerequisites
+1. **Python 3.8+**
+2. Install the required Python libraries:
+   ```bash
+   pip install torch transformers sounddevice pydub gradio
+   pip install --upgrade transformers "datasets[audio]" accelerate
+   ```
+3. **FFmpeg** (for audio processing):
+   - Download and install FFmpeg from the [FFmpeg official site](https://ffmpeg.org/download.html) or, on Windows, via [Chocolatey](https://chocolatey.org/).
+
+4. **Troubleshooting:** If the app has problems using your GPU, first uninstall the current PyTorch packages:
+    ```bash
+    pip uninstall torch torchvision torchaudio
+    ```
+    Then find the PyTorch build that matches your GPU and CUDA version and install it, for example for CUDA 12.4:
+    ```bash
+    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
+    ```
+---
+
+### 📦 Installation
+
+1. Clone this repository:
+   ```bash
+   git clone https://github.com/your-username/voice-to-text-app.git
+   cd voice-to-text-app
+   ```
+
+2. Run the app:
+   ```bash
+   python app.py
+   ```
+
+3. Open the link printed in the terminal in your browser to access the web app.
+
+---
+
+## 🛠️ How It Works
+
+1. **Recording**: Click the **Start Recording** button to record your voice. Click **Stop Recording** when you're done.
+2. **Transcription**: Click the **Transcribe** button to convert your audio to text.
+3. **Automatic Handling**: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
+
+---
+
+## 🌐 Internet Usage
+
+- **Setup**: The app requires an internet connection **only** to install the dependencies and, on the first run, to download the Whisper model.
+- **Offline Mode**: Once the model is downloaded, the app works entirely offline, so all audio is processed locally and stays private. If the model cache is deleted, the model is downloaded again on the next run.
+
+---
+
+## 🎉 Example Use Case
+
+1. Record a 20-second audio note: "Take out the trash at 6 PM."
+2. Stop recording.
+3. Transcribe the audio to see: `"Take out the trash at 6 PM."`
+
+---
+
+## 🧰 Built With
+
+- 🤗 [Transformers](https://huggingface.co/docs/transformers): Whisper model for speech-to-text.
+- 🎵 [SoundDevice](https://python-sounddevice.readthedocs.io/): Audio recording.
+- ✂️ [Pydub](https://github.com/jiaaro/pydub): Audio splitting for long files.
+- 🌐 [Gradio](https://gradio.app/): Interactive web interface.
+
+---
+
+## 🤝 Contributing
+
+Contributions are welcome! Feel free to submit issues or pull requests to improve this project.
+
+---
+
+## 🛡️ License
+
+This project is licensed under the MIT License.
+
+---
+
+## 🗂️ File Structure
+
+```
+.
+├── app.py             # Main application script
+├── requirements.txt   # List of dependencies
+└── README.md          # This file
+```
+
+---
+
+Have fun 🤗
+---