A Python-based application that records audio and converts it into text using OpenAI's Whisper model. The app is designed for offline use, requiring internet only during the initial setup to download the model. It supports automatic splitting of long recordings and features an interactive web interface built with Gradio. Perfect for converting voice memos or audio notes into text with ease.

🎙️ Voice-to-Text App with Whisper

This is a simple, user-friendly application that records your voice and converts it into text using OpenAI's Whisper model. Built with 💻 Python, 🎵 sounddevice, and 🤗 Gradio, this app is designed for local use, requiring internet access only during the initial setup.


🌟 Features

  • 🎤 Record Audio: Click a button to start and stop recording your voice.
  • ✂️ Automatic Splitting: Handles long audio files by splitting them into smaller chunks for transcription.
  • 📝 Speech-to-Text: Transcribes your voice into text using the Whisper model.
  • 🔒 Offline Capability: After setup, the app works entirely offline.

🚀 Getting Started

Prerequisites

  1. Python 3.8+

  2. Install the required Python libraries:

    pip install torch transformers sounddevice pydub gradio
    pip install --upgrade transformers "datasets[audio]" accelerate
    
  3. FFmpeg (for audio processing), installed and available on your PATH.

  4. Troubleshooting: If PyTorch does not detect or use your GPU, uninstall the current build first:

    pip uninstall torch torchvision torchaudio  
    

    Then find the PyTorch build that matches your GPU and CUDA version and install it. For CUDA 12.4, for example:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 
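
Before launching the app, it can help to confirm that the prerequisites above are actually in place. The following is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper that looks for the packages from the pip commands, for an `ffmpeg` binary on the PATH, and, if PyTorch is installed, asks it whether a CUDA device is visible.

```python
import importlib.util
import shutil


def check_environment():
    """Report which prerequisites are available on this machine."""
    report = {}
    # Packages named in the pip commands above.
    for name in ("torch", "transformers", "sounddevice", "pydub", "gradio"):
        report[name] = importlib.util.find_spec(name) is not None
    # pydub calls the FFmpeg binary, so it has to be discoverable on PATH.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    # Only ask PyTorch about CUDA if it is installed and imports cleanly.
    report["cuda"] = False
    if report["torch"]:
        try:
            import torch
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:12s} {'ok' if ok else 'not found'}")
```

If `cuda` is reported as not found on a machine with an NVIDIA GPU, reinstalling PyTorch with the matching CUDA build as described in step 4 is the likely fix.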
    

📦 Installation

  1. Clone this repository:

    git clone https://github.com/your-username/voice-to-text-app.git
    cd voice-to-text-app
    
  2. Run the app:

    python app.py
    
  3. Open the provided link in your browser to access the web app.


🛠️ How It Works

  1. Recording: Click the Start Recording button to record your voice. Click Stop Recording when you're done.
  2. Transcription: Click the Transcribe button to convert your audio to text.
  3. Automatic Handling: The app automatically splits audio longer than 30 seconds and transcribes it in chunks.
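
The 30-second splitting in step 3 can be sketched as follows. This is an illustration under assumptions, not the code in `app.py`: `chunk_bounds` and `split_audio` are hypothetical helper names, and the sketch cuts at fixed intervals, whereas the app may choose its cut points differently. It relies only on the fact that a pydub `AudioSegment` reports its length in milliseconds and is sliced in milliseconds.

```python
def chunk_bounds(duration_ms, chunk_ms=30_000):
    """Return (start, end) millisecond pairs that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def split_audio(audio, chunk_ms=30_000):
    """Slice a pydub AudioSegment (or any sequence indexed in ms) into chunks."""
    return [audio[start:end] for start, end in chunk_bounds(len(audio), chunk_ms)]
```

With pydub installed, `split_audio(AudioSegment.from_wav("note.wav"))` would return a list of segments of at most 30 seconds each, which can then be transcribed one by one and the texts joined. The file name here is only a placeholder.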

🌐 Internet Usage

  • Setup: An internet connection is needed only to install the Python dependencies and, on the first run, to download the Whisper model.
  • Offline Mode: Once the model is downloaded, the app works entirely offline, so audio is processed locally and stays on your machine. If the model cache is deleted, the model is downloaded again on the next run.
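
To make the offline behavior explicit, the Hugging Face libraries can be told not to touch the network, and the local cache can be inspected for the model. The sketch below is an assumption about how one might do this, not something the app is known to do. The environment variables `HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`, `HF_HUB_CACHE`, and `HF_HOME` are real Hugging Face settings; the helper names and the repo id `openai/whisper-small` are illustrative, since this README does not say which Whisper checkpoint the app loads.

```python
import os
from pathlib import Path


def enable_offline_mode():
    """Tell the Hugging Face libraries not to make network requests.

    Call this before importing transformers so the setting takes effect.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def hub_cache_dir():
    """Resolve the Hugging Face hub cache directory from the environment."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    # Default location: $XDG_CACHE_HOME/huggingface or ~/.cache/huggingface.
    cache_root = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    hf_home = os.environ.get("HF_HOME", str(Path(cache_root) / "huggingface"))
    return Path(hf_home) / "hub"


def is_model_cached(repo_id):
    """Check whether a model repo (e.g. 'openai/whisper-small') is in the cache."""
    folder = "models--" + repo_id.replace("/", "--")
    return (hub_cache_dir() / folder).is_dir()
```

If `is_model_cached(...)` returns `False` for the checkpoint the app uses, the next run will need an internet connection to download it again.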

🎉 Example Use Case

  1. Record a 20-second audio note: "Take out the trash at 6 PM."
  2. Stop recording.
  3. Transcribe the audio to see: "Take out the trash at 6 PM."

🧰 Built With

  • 🤗 Transformers: Whisper model for speech-to-text.
  • 🎵 SoundDevice: Audio recording.
  • ✂️ Pydub: Audio splitting for long files.
  • 🌐 Gradio: Interactive web interface.
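
As an illustration of how SoundDevice can back the Start Recording and Stop Recording buttons, here is a minimal sketch. It is not taken from `app.py`: the `Recorder` class and its method names are hypothetical, and the 16 kHz mono, 16-bit format is an assumption chosen because Whisper works on 16 kHz audio. It uses `sounddevice.RawInputStream`, whose callback receives each audio block as a raw buffer, and the standard-library `wave` module to save the result.

```python
import wave


class Recorder:
    """Collect 16-bit mono audio between start() and stop()."""

    def __init__(self, samplerate=16_000):
        self.samplerate = samplerate
        self._chunks = []
        self._stream = None

    def _on_audio(self, indata, frames, time, status):
        # Called by sounddevice for every block; copy it, the buffer is reused.
        self._chunks.append(bytes(indata))

    def start(self):
        import sounddevice as sd  # imported lazily so the class loads without it
        self._chunks = []
        self._stream = sd.RawInputStream(
            samplerate=self.samplerate, channels=1, dtype="int16",
            callback=self._on_audio,
        )
        self._stream.start()

    def stop(self):
        """Stop the stream (if one is open) and return the raw PCM bytes."""
        if self._stream is not None:
            self._stream.stop()
            self._stream.close()
            self._stream = None
        return b"".join(self._chunks)

    def save_wav(self, path):
        """Write the collected audio to a WAV file at the given path."""
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # int16 = 2 bytes per sample
            wav.setframerate(self.samplerate)
            wav.writeframes(b"".join(self._chunks))
```

In a Gradio interface, `start()` and `stop()` would be wired to the two buttons, and the saved WAV file would be handed to the splitting and transcription steps.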

🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this project.


🛡️ License

This project is licensed under the MIT License.


🗂️ File Structure

.
├── app.py             # Main application script
├── LICENSE            # MIT License
├── requirements.txt   # List of dependencies
└── README.md          # This file

Have fun 🤗