From 97800e2b092cf970b5aca9b15b2fd08112a62fdf Mon Sep 17 00:00:00 2001
From: Christian Rute
Date: Mon, 6 Jan 2025 18:41:40 +0100
Subject: [PATCH] add README.md

---
 README.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..81453d1
--- /dev/null
+++ b/README.md
@@ -0,0 +1,111 @@
+# 🎙️ Voice-to-Text App with Whisper
+
+This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 `Python`, 🎵 `sounddevice`, and 🤗 `Gradio`, this app is designed for local use, requiring internet access **only during the initial setup**.
+
+---
+
+## 🌟 Features
+
+- 🎤 **Record Audio**: Click a button to start and stop recording your voice.
+- ✂️ **Automatic Splitting**: Handles long audio files by splitting them into smaller chunks for transcription.
+- 📝 **Speech-to-Text**: Transcribes your voice into text using the Whisper model.
+- 🔒 **Offline Capability**: After setup, the app works entirely offline.
+
+---
+
+## 🚀 Getting Started
+
+### Prerequisites
+1. **Python 3.8+**
+2. Install the required Python libraries:
+   ```bash
+   pip install torch transformers sounddevice pydub gradio
+   pip install --upgrade transformers datasets[audio] accelerate
+   ```
+3. **FFmpeg** (for audio processing):
+   - Download and install FFmpeg from the [FFmpeg Official Site](https://ffmpeg.org/download.html) or via [Chocolatey](https://chocolatey.org/).
+
+4. **Possible Bugs:** If the app has problems using the GPU, uninstall the current torch packages:
+   ```bash
+   pip uninstall torch torchvision torchaudio
+   ```
+   Then look up the torch build that matches your GPU and CUDA version and install it, for example:
+   ```bash
+   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
+   ```
+---
+
+### 📦 Installation
+
+1. Clone this repository:
+   ```bash
+   git clone https://github.com/your-username/voice-to-text-app.git
+   cd voice-to-text-app
+   ```
+
+2. Run the app:
+   ```bash
+   python app.py
+   ```
+
+3. Open the provided link in your browser to access the web app.
+
+---
+
+## 🛠️ How It Works
+
+1. **Recording**: Click the **Start Recording** button to record your voice. Click **Stop Recording** when you're done.
+2. **Transcription**: Click the **Transcribe** button to convert your audio to text.
+3. **Automatic Handling**: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
+
+---
+
+## 🌐 Internet Usage
+
+- **Setup**: The app requires an internet connection **only during the first run** to download the Whisper model and dependencies.
+- **Offline Mode**: Once the model is downloaded, the app works entirely offline, ensuring privacy and local processing. This holds as long as the model cache is not deleted; otherwise the model is downloaded again on the next run.
+
+---
+
+## 🎉 Example Use Case
+
+1. Record a 20-second audio note: "Take out the trash at 6 PM."
+2. Stop recording.
+3. Transcribe the audio to see: `"Take out the trash at 6 PM."`
+
+---
+
+## 🧰 Built With
+
+- 🤗 [Transformers](https://huggingface.co/docs/transformers): Whisper model for speech-to-text.
+- 🎵 [SoundDevice](https://python-sounddevice.readthedocs.io/): Audio recording.
+- ✂️ [Pydub](https://github.com/jiaaro/pydub): Audio splitting for long files.
+- 🌐 [Gradio](https://gradio.app/): Interactive web interface.
+
+---
+
+## 🤝 Contributing
+
+Contributions are welcome! Feel free to submit issues or pull requests to improve this project.
+
+---
+
+## 🛡️ License
+
+This project is licensed under the MIT License.
+
+---
+
+## 🗂️ File Structure
+
+```
+.
+├── app.py              # Main application script
+├── requirements.txt    # List of dependencies
+└── README.md           # This file
+```
+
+---
+
+Have fun 🤗
+---
\ No newline at end of file