Add "Speech to Text" to Video Analysis menu
This has lower priority than finishing the implementation of the bounding boxes and should be addressed afterwards.
Since they are preparing the deployment of a container for Whisper (speech to text) at wizai, we should prepare the UI for it. When you select this feature, you should be asked to specify the language of your video, because Whisper only identifies the language automatically from the first 30 seconds of audio. If there is no speech in those 30 seconds, or the speech there is in a different language from the rest of the video, the transcription might fail or be of poor quality.
Here are the supported languages with their respective error rates:
We could offer every language up to an error rate of 21 (Hungarian) in a drop-down, but I would put English and German at the top, because those are likely the most common languages among our users.
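A minimal Python sketch of how the drop-down entries could be built, assuming the error-rate table referenced above is available as a mapping from language name to error rate. The function name `build_language_options`, treating 21 as an inclusive cut-off, and sorting the remaining entries alphabetically are assumptions for illustration, not a decided design.

```python
from typing import Mapping, Sequence


def build_language_options(
    error_rates: Mapping[str, float],
    max_error_rate: float = 21.0,
    pinned: Sequence[str] = ("English", "German"),
) -> list[str]:
    """Hypothetical helper: return drop-down entries, pinned languages first."""
    # Keep only languages at or below the cut-off (assumed inclusive, 21 = Hungarian).
    eligible = {lang for lang, rate in error_rates.items() if rate <= max_error_rate}
    # Pinned languages go first, in the given order, if they pass the cut-off.
    head = [lang for lang in pinned if lang in eligible]
    # All other eligible languages follow in alphabetical order (assumed UX choice).
    tail = sorted(lang for lang in eligible if lang not in head)
    return head + tail
```

Alphabetical ordering for the rest of the list is only one option; sorting by error rate would instead surface the best-supported languages first, at the cost of making a specific language harder to find.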