
For Hurixdigital (UK)
The Text2Video class generates videos from text prompts using several technologies: the Azure OpenAI service for image generation, PIL for image processing, gTTS for text-to-speech conversion, and moviepy for video creation. Here is a detailed technical description:
The Text2Video class is designed to transform text prompts into videos with corresponding images and audio. The class is initialized with an Azure OpenAI client, which it uses to generate images from the provided text prompts. The get_image method sends a request to Azure OpenAI's DALL-E model to generate a comic-style image that incorporates the prompt text. The download_img_from_url method downloads the generated image from its URL to a local directory. For audio generation, the text_to_audio method uses Google's gTTS (Google Text-to-Speech) library to convert each text prompt into an audio file.
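The asset-generation steps above can be sketched as standalone functions. This is a minimal illustration, not the original code: the `dall-e-3` deployment name, the comic-style prompt wording, and the `unique_path` helper are all assumptions, and `client` is assumed to be an `openai.AzureOpenAI` instance.

```python
import os
import uuid


def unique_path(directory: str, extension: str) -> str:
    # Build a collision-free local filename, as the class does for its assets.
    return os.path.join(directory, f"{uuid.uuid4().hex}{extension}")


def get_image(client, prompt: str) -> str:
    """Ask an Azure OpenAI DALL-E deployment for a comic-style image that
    includes the prompt text, and return the temporary URL of the result.
    The deployment name "dall-e-3" is an assumption."""
    result = client.images.generate(
        model="dall-e-3",
        prompt=f"A comic-style illustration that includes the text: {prompt}",
        n=1,
    )
    return result.data[0].url


def download_img_from_url(url: str, directory: str) -> str:
    """Download the generated image from its URL into a local directory."""
    import requests  # imported lazily; requires the requests package

    path = unique_path(directory, ".png")
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)
    return path


def text_to_audio(prompt: str, directory: str) -> str:
    """Convert the prompt to speech with gTTS and save it as an MP3."""
    from gtts import gTTS  # imported lazily; requires the gTTS package

    path = unique_path(directory, ".mp3")
    gTTS(text=prompt, lang="en").save(path)
    return path
```

The unique filenames are what let the class store many generated assets in one directory without conflicts, as described above.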
The get_images_and_audio method orchestrates the generation of images and audio for a list of text prompts, storing them locally under unique filenames to avoid conflicts. The create_video_from_images_and_audio method then combines these images and audio files into a video: it verifies that each image has a corresponding audio file, uses moviepy to build a video clip for each image synchronized with its audio, and concatenates the clips into a final video saved at the specified output path.
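The video-assembly step might look something like the following. This is a hedged sketch rather than the original implementation; it assumes moviepy 1.x (the `moviepy.editor` module and the `set_duration`/`set_audio` method names changed in moviepy 2.x) and that the image and audio paths arrive as parallel lists.

```python
def create_video_from_images_and_audio(image_paths, audio_paths, output_path):
    """Pair each image with its audio track and concatenate the clips
    into one video file. Assumes moviepy 1.x."""
    # The class verifies that every image has a corresponding audio file.
    if len(image_paths) != len(audio_paths):
        raise ValueError("each image needs a corresponding audio file")

    # Imported lazily; requires the moviepy package (1.x API).
    from moviepy.editor import (AudioFileClip, ImageClip,
                                concatenate_videoclips)

    clips = []
    for img, aud in zip(image_paths, audio_paths):
        audio = AudioFileClip(aud)
        # Show each image for exactly as long as its narration lasts.
        clip = ImageClip(img).set_duration(audio.duration).set_audio(audio)
        clips.append(clip)

    final = concatenate_videoclips(clips, method="compose")
    final.write_videofile(output_path, fps=24)
    return output_path
```

Checking the image/audio pairing up front, before any clip is built, keeps a missing asset from surfacing as an obscure moviepy error halfway through rendering.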
Finally, the generate_video method manages the entire process: it takes a list of text prompts, generates the required images and audio files, and then creates and saves the final video. The class also includes a gradio_interface method that sets up a web interface using Gradio, letting users enter text prompts, generate a video, and play it back directly in the web app.
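A minimal sketch of the Gradio wiring is shown below, assuming `generate_video` takes a newline-separated string of prompts and returns the path of the finished video file; the component labels and layout are illustrative assumptions, not taken from the original code.

```python
def gradio_interface(generate_video):
    """Wrap a generate_video callable in a simple Gradio web interface.
    `generate_video` is assumed to accept a string of prompts and
    return the path of the finished video file."""
    import gradio as gr  # imported lazily; requires the gradio package

    demo = gr.Interface(
        fn=generate_video,
        inputs=gr.Textbox(lines=5, label="Text prompts (one per line)"),
        outputs=gr.Video(label="Generated video"),
        title="Text2Video",
    )
    return demo  # call demo.launch() to serve the web app
```

Returning the Interface object (rather than launching it inside the method) keeps the class testable and lets the caller decide how and where to serve the app.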
