sprachText/README.md
2025-01-06 18:41:40 +01:00

🎙️ Voice-to-Text App with Whisper

This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 Python, 🎵 sounddevice, and 🤗 Gradio, this app is designed for local use, requiring internet access only during the initial setup.


🌟 Features

  • 🎤 Record Audio: Click a button to start and stop recording your voice.
  • ✂️ Automatic Splitting: Handles long audio files by splitting them into smaller chunks for transcription.
  • 📝 Speech-to-Text: Transcribes your voice into text using the Whisper model.
  • 🔒 Offline Capability: After setup, the app works entirely offline.

🚀 Getting Started

Prerequisites

  1. Python 3.8+

  2. Install the required Python libraries:

    pip install torch transformers sounddevice pydub gradio
    pip install --upgrade transformers "datasets[audio]" accelerate
    
  3. FFmpeg (for audio processing). pydub calls the ffmpeg executable, so it must be installed and available on your PATH.

  4. Troubleshooting: If PyTorch does not detect or use your GPU, first uninstall the installed PyTorch packages:

    pip uninstall torch torchvision torchaudio  
    

    Then find the PyTorch build that matches your GPU and CUDA version and install it. For example, for CUDA 12.4:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 
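
The prerequisites above can be verified with a small script before the first run. This is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it only checks the Python version, whether ffmpeg is on the PATH, and whether PyTorch reports a usable CUDA device.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict:
    """Return a small report about the prerequisites listed in this README.

    check_environment is a hypothetical helper, not part of app.py.
    """
    report = {
        "python_ok": sys.version_info >= (3, 8),            # README asks for Python 3.8+
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the script also runs without PyTorch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is reported as "no CUDA" instead of crashing.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If cuda_available is False on a machine with an NVIDIA GPU, the installed PyTorch build probably does not match the local CUDA setup, which is the case the reinstall steps in item 4 address.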
    

📦 Installation

  1. Clone this repository:

    git clone https://github.com/your-username/voice-to-text-app.git
    cd voice-to-text-app
    
  2. Run the app:

    python app.py
    
  3. Open the provided link in your browser to access the web app.


🛠️ How It Works

  1. Recording: Click the Start Recording button to record your voice. Click Stop Recording when you're done.
  2. Transcription: Click the Transcribe button to convert your audio to text.
  3. Automatic Handling: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
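
The chunked transcription in step 3 can be sketched roughly as follows. This is an illustration under assumptions, not the code in app.py: the names split_samples and transcribe_long are hypothetical, the audio is treated as a plain sequence of samples, and the actual Whisper call is passed in as a callable so the splitting logic stays independent of any library.

```python
from typing import Callable, Iterator, Sequence

CHUNK_SECONDS = 30  # chunk length stated in this README


def split_samples(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: int = CHUNK_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices holding at most chunk_seconds of audio."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    step = sample_rate * chunk_seconds  # number of samples per chunk
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


def transcribe_long(
    samples: Sequence[float],
    sample_rate: int,
    transcribe_chunk: Callable[[Sequence[float]], str],
) -> str:
    """Transcribe each chunk separately and join the non-empty results."""
    texts = (
        transcribe_chunk(chunk).strip()
        for chunk in split_samples(samples, sample_rate)
    )
    return " ".join(text for text in texts if text)


if __name__ == "__main__":
    # 65 seconds of silence at 16 kHz, with a stand-in "transcriber"
    # that just reports the chunk length in seconds.
    audio = [0.0] * (16000 * 65)
    print(transcribe_long(audio, 16000, lambda c: f"{len(c) // 16000}s"))
```

Cutting at fixed 30-second boundaries can split a word in half. Whether the app overlaps chunks or cuts at silences to avoid this is not stated here, so the sketch keeps the simplest possible split.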

🌐 Internet Usage

  • Setup: An internet connection is needed only to install the dependencies and, on the first run, to download the Whisper model.
  • Offline Mode: Once the model is downloaded, the app works entirely offline, so all audio is processed locally. If the model cache is deleted, the model is downloaded again on the next run.
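
To check whether the model is already cached, and to keep the Hugging Face libraries from contacting the network, something like the following could be used. This is a sketch under assumptions: it presumes the default Hugging Face Hub cache layout (a hub folder under ~/.cache/huggingface, overridable with HF_HOME or HF_HUB_CACHE), it ignores other overrides such as XDG_CACHE_HOME, and the model ID openai/whisper-small is only a placeholder because this README does not say which Whisper checkpoint the app loads.

```python
import os
from pathlib import Path

# Placeholder: the README does not name the Whisper checkpoint used by app.py.
MODEL_ID = "openai/whisper-small"


def hub_cache_dir() -> Path:
    """Resolve the Hugging Face Hub cache directory (simplified)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def model_is_cached(model_id: str = MODEL_ID) -> bool:
    """Check for the cache folder the Hub creates for a model repository."""
    folder = "models--" + model_id.replace("/", "--")
    return (hub_cache_dir() / folder).is_dir()


def enable_offline_mode() -> None:
    """Ask the Hugging Face libraries not to make network requests.

    Call this before importing transformers, since the setting is read
    when the libraries are imported.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"


if __name__ == "__main__":
    if model_is_cached():
        enable_offline_mode()
        print("Model found in cache, offline mode enabled.")
    else:
        print("Model not cached yet, the first run needs internet access.")
```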

🎉 Example Use Case

  1. Record a 20-second audio note: "Take out the trash at 6 PM."
  2. Stop recording.
  3. Transcribe the audio to see: "Take out the trash at 6 PM."

🧰 Built With

  • 🤗 Transformers: Whisper model for speech-to-text.
  • 🎵 SoundDevice: Audio recording.
  • ✂️ Pydub: Audio splitting for long files.
  • 🌐 Gradio: Interactive web interface.

🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this project.


🛡️ License

This project is licensed under the MIT License.


🗂️ File Structure

.
├── app.py             # Main application script
├── requirements.txt   # List of dependencies
└── README.md          # This file

Have fun 🤗