Content

Edit: Fixed the link that was missing license.txt; thanks, Anthony, for figuring it out.

Hi all, I took a few slower days after New Year's Eve to try to relax my mind a little. I've still been working, but less, and with more limited access to the internet.

Now I'm going full steam again. To start, I'm releasing a new GUI. This one is a lot simpler than the Stable Diffusion GUI, so it will not consume much of my time; in fact, this first version already has most of the features the final build will have.

Link:

https://drive.google.com/file/d/1gRXwPVtw9jL1J7cqUV9ruHHKlzrQzH-D/view?usp=share_link

Mirror:

https://grisk.itch.io/whisper-gui



Whisper is an AI by OpenAI. From its description:
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

https://openai.com/blog/whisper/ 

https://github.com/openai/whisper

In other words, it can generate subtitles for video and audio in multiple languages, and it can also translate those subtitles into English if you like.
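For the curious, the .srt format itself is simple. Below is a minimal Python sketch (not the GUI's actual code) that turns segments shaped like those returned by the `openai-whisper` package's `transcribe()` (a list of dicts with `start` and `end` in seconds plus `text`) into SRT text. The helper names `srt_timestamp` and `segments_to_srt` are hypothetical.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text: a 1-based index, a time range, the text, then a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Whisper segment text usually starts with a space, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

With `openai-whisper` installed, the input could come from `whisper.load_model("small").transcribe("audio.mp3")["segments"]`; passing `task="translate"` to `transcribe()` gives English subtitles instead.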

Here are two examples I made:

https://streamable.com/bzjkcp
https://streamable.com/o17pts 


There are multiple models to choose from; the GUI will download the one you select if you don't have it already.

If you have more than 10 GB of VRAM, you will always want to use Large-V2.

If not, use the largest model you can. If your input is in English, use the ".en" version (available for tiny, base, small and medium; there is no large.en).
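The rule of thumb above can be written as a small function. This is a hypothetical helper, not part of the GUI; the VRAM figures are the approximate requirements listed in the openai/whisper README (about 10 GB for large, 5 GB for medium, 2 GB for small, 1 GB for base and tiny), so treat them as rough guidance.

```python
# Approximate VRAM needs in GB, per the openai/whisper README (assumed figures).
# Ordered largest first, so the first entry that fits is the biggest usable model.
MODEL_VRAM_GB = [
    ("large-v2", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]

# Only these sizes ship an English-only ".en" variant.
HAS_EN_VARIANT = {"tiny", "base", "small", "medium"}


def pick_whisper_model(vram_gb: float, english_input: bool) -> str:
    """Return the largest Whisper model name that fits in vram_gb (hypothetical helper)."""
    for name, needed in MODEL_VRAM_GB:
        if vram_gb >= needed:
            # large-v2 has no .en version, so it is used for English too.
            if english_input and name in HAS_EN_VARIANT:
                return name + ".en"
            return name
    # Even "tiny" may not fit; fall back to it anyway (e.g. to run on CPU).
    return "tiny.en" if english_input else "tiny"
```

For example, a 6 GB card with English audio gets `medium.en`, while a 12 GB card gets `large-v2` regardless of language.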

Tomorrow I will release this version on itch.io, where it will also be easier to download from their servers.

I plan to always keep this GUI free on itch.io, with the latest update.
That's because I think Whisper can really help people with hearing disabilities, and people who need to learn from material made in a foreign country (once translation into other languages, not just English, is working in it).

About Stable Diffusion:

The next version should be ready this week, with a few new options and bug fixes. Now I will start answering some comments that have been waiting for a reply for too long; sorry about that.

 

Comments

cool1

Thanks a lot. I tried the Whisper GUI with an .mp3 file and it seems to have worked okay from what I've seen (I haven't checked every word, but it seems right). That should help a lot in the future. Would it be possible to make Whisper output a video with the subtitles overlaid (or is there already software that can do that if you give it the .srt)? E.g. either overlaid onto the source video (maybe with a black background around the text sections if that option was selected), or encoded as a video with just the subtitles (which could also work with an audio file as the source), maybe with an option for a black background around the subtitled sections and green everywhere else so it could easily be keyed, or an alpha channel where the background would be.

Edit: HandBrake can add .srt subtitles to a video, so I could use that for it.

Edit 2: Though with HandBrake there's no simple way to change the font/font size of burnt-in subtitles as far as I know. So if Whisper GUI had an option to burn in subtitles at a particular size/font, that would be good.

Edit 3: Though there's no option in the HandBrake interface, someone has said that if the .srt file has HTML code that specifies the font face and size, then that could be used. But I don't know if that font code would need to be in every subtitle line. Maybe Whisper GUI could have an option to specify a font and font size and put that into the .srt file (and if apps like HandBrake need it in each subtitle line to show that font/size, Whisper GUI could add it to each line).

... For the Stable Diffusion GUI: what are your thoughts on how the recent lawsuit against Stability AI, Midjourney and DeviantArt might affect things? Would that likely only affect future models Stability AI releases and their licensing? Do you think it could affect existing models and whether they could still be used?
One YouTuber said it should clarify things legally, so it might help with future AIs (e.g. text-to-image ones). Though it might be that the lawsuit doesn't go ahead; maybe we'd need to wait for any results. Maybe if there were a way to train a model entirely from scratch on content we own the rights to (if that's possible on normal PCs), that might help rights-wise.

DAINAPP

Hardcoding the subtitles could be an option in the future, sure. For now, if you want, you can do it with ffmpeg:

ffmpeg -i "owl.mp4" -vf subtitles=owl.srt out_owl.mp4

About the lawsuit: this will be the year of lawsuits and property law. They might censor a lot of stuff, but that will create a lot of problems down the line. I think countries that start banning AI stuff will fall behind other countries, but we will have to wait and see.
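If you'd rather script that ffmpeg command, or want the font control asked about in the earlier comment, here is a minimal Python sketch. It assumes ffmpeg was built with libass (which the `subtitles` filter requires) and uses that filter's `force_style` option, which takes ASS style fields such as `FontName` and `FontSize`. The helper name `build_burn_in_command` is hypothetical, and the sketch does not escape file names that contain colons or quotes.

```python
from typing import List, Optional


def build_burn_in_command(
    video: str,
    srt: str,
    output: str,
    font_name: Optional[str] = None,
    font_size: Optional[int] = None,
) -> List[str]:
    """Build an ffmpeg argument list that burns `srt` into `video` (hypothetical helper)."""
    vf = f"subtitles={srt}"
    styles = []
    if font_name:
        styles.append(f"FontName={font_name}")
    if font_size:
        styles.append(f"FontSize={font_size}")
    if styles:
        # The single quotes stop ffmpeg's filtergraph parser from treating
        # the comma between style fields as a filter separator.
        vf += ":force_style='" + ",".join(styles) + "'"
    return ["ffmpeg", "-i", video, "-vf", vf, output]
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs it without a shell, so no extra shell quoting is needed.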

Alien Anthony

That's what the 18 TB storage drive is for. That and my entire Steam library.