AI Arts Midterm (Research + Progress Towards Final) GIF GENERATOR – Eszter Vigh

Original Goal:

Project Inspo
My Ideal Cup of Coffee (too bad I don’t drink coffee and this project consumed most of my nights the past two weeks…)

Build a project that takes audio input and matches that audio input to the closest equivalent quotation from a movie, yielding the video clip as output.

Original Plan: Use WordtoVec to complete this with SoundClassification. 

Completed Tasks:

  • Acquired the entire film 
  • Acquired the subtitles
  • Realized Word2Vec didn’t work
  • Found an example of something that COULD work (if I can get the API to actually work for me too)
  • Started playing with Sound Classification
  • Spliced the entire movie by sentences… (ready for output!)

Discoveries:

WordtoVec doesn’t work with phrases. The sole functionality is mapping individual words to other words. And even then, it does such a horrific job that barely as of the movie plot is distinguishable by searching these related terms. Half the associated words just are not usable after training the model on the script from Detective Pikachu.  

I’ll talk through the examples and how I was able to train the model on the subtitle script.

Word 2 Vectorized
How To Guide on GitHub

In order to train the model, I had to get the script into a usable format, because straight up the first time I did it, I left all the time stamps and line numbers in the subtitle document, so when it trained, the entire response list was just numbers that didn’t actually correlate to the words I typed into the sample in a meaningful way. 

TEST 1: SUBTITLES INTO TRAINER WITH TIME STAMPS + LINE NUMBERS

Results: 

Failure
Pure Failure on My Part
First Test
This is what not removing numbers looks like…

I took the numbers that were outputted and compared it to the subtitles… then, since the subtitles don’t clarify which character said what, I went into the movie… which honestly… yielded mixed results. Leading me to conclude that there was no actual meaningful correlation between this output and the script itself. 

Line 480
Line 480 Actual Pikachu!
Line 675
Line 675 Actual Pikachu!
Line 157
Line 157 Pikachu is mentioned in this scene, but doesn’t say anything and isn’t present.
Line 364
Line 364 Pikachu is in the Scene
Line 248
Line 248 We as an audience don’t know that’s Pikachu yet… this blob shape could be any intruder… as Pikachu isn’t speaking
Line 574
Line 574 Once again no Mention of Pikachu or direct connection to the character
Line 306
Line 306 Pikachu escaping evil Pokemon
Line 556
Line 556 Evil Guy in Film, No Direct Correlation to Pikachu
Line 721
Line 721 … Pikachu is technically Interrogating so this is semi related, but the scene makes it unclear who is asking the questions
Line 530
Line 530 That’s Pikachu on the Screen!! That’s definitely related…

Conclusion:

What a Twist
I love when things don’t work the way I want them to

So is this data usable? Not really… it’s usually connecting words that yield that high correlation… so I decided to go back to the drawing board with testing. Which is really unfortunate considering how long this initial test took start to finish. 

TEST 2: SUBTITLES INTO TRAINER WITHOUT TIME STAMPS + LINE NUMBERS

Results: 

I went back and edited out all the time stamps and line numbers… and the results were still mixed in terms of giving me relevant words. 

Testing Word 2 Vec
Testing Word 2 Vec
horrible result
HORRIBLE RESULT FOR PIKACHU
Stuff?
Stuff is not DESCRIPTIVE OR SPECIFIC ENOUGH
IS
IS ??? definitely not enough
Clues has SOME Correlation .. though it’s not really a clear indicator of plot for those not familiar with the film
semi-helpful
Minor Correlation Considering Plot
bad cross compare
Bad Cross Comparison Result
ok results
About Half the Results are Meaningful
word2vec bad compare
Poor Output
comparison_1
Bad Comparison

Is this usable information? Not even remotely. Most of the outputs are poor and it is next to impossible to draw any sort of meaningful connection that is not just a plain connector word. My conclusion here was that Word2Vec was not the best option for my project. It simply couldn’t even get meaningful word connections… and it did not possess the capability to analyze sentences as sentences. 

Conclusion:

ANGER
This was me… waiting to have permission to move on from Word2Vec.

I asked Aven if I could quit Word2Vec… he said as long as I documented all my failures and tests… it would be fine… so that’s exactly what I did before starting completely over again! It was… needless to say incredibly frustrating to realize this very easily trainable model didn’t work for my project. 

TEST 3: SENTENCE ANALYZER

script_input
Inputting the entire script
sentence issues
THIS DOESN’T WORK!

So… this is was probably the saddest test I have seen. Seeing as I had nothing to work with… I went back to Medium to see if I could gain some sort of new insights into ml5… I stumbled upon this article about an ITP student’s project and thought I could go about my project in a similar way. He also references Word2Vec… but then explains that there is this other TensorFlow model that works for paragraphs and sentences! That’s exactly what I need! But as you can tell from above… it didn’t work out for me… after I had finally inputted the entire script. Hindsight is 20/20, I probably should have tested the base code BEFORE I inputted and retyped the script… it’s fine you know, you live and you learn. 

This is the link to the GitHub Repo which is supposed to guide you just as the link from the ml5 website to the Word2Vec example, unfortunately there seems to be an issue even in their code, even after I downloaded everything necessary to run it (supposedly). 

Conclusion:

2nd Failure
This was a harder failure to swallow

Well… sometimes Holy Grail answers are too good to be true. I still haven’t given up on this one… I think maybe with enough work and office hours I can get the issue sorted, but this is definitely a potential solution to my issue… especially considering the ITP project, I definitely think this is the right way to go. 

TEST 4: SOUND CLASSIFIER

This was one of the parts of my project, meant to be sort of a last step, making it the sole interaction. This is really something I only plan on implementing if I get the rest of the project functional! But I thought since I was experimenting as much as I was, it was time to explore this other element as well. 

So what did I do? 

sound 2
Experimenting with the 18 word set 1/3
sound
Experimenting with the 18 word set 2/3
sound 1
Experimenting with the 18 word set 3/3

This is nice. I think however, it’s incredibly difficult to get it trained based on your voice, considering HOW MANY LINES ARE IN A SINGLE FILM… it seems rather inefficient to do this. So I was thinking… is there a better method… or a method to help me train this Sound Classifier. And… I think I found something useful.

advantages
ADVANTAGES
process
PROCESS

Conclusion:

Solutions!

I am not a computer science student… I don’t think I have the capabilities to build something from scratch… so I have really looked into this. And, should the main functionality of my project work with the Sentence Encoder… this could work to make the interaction smoother. 

Progress Report:

I know what I need to use, it’s now just a matter of learning how to use things that are beyond the scope of the class. Or at the very least, beyond my current capabilities. This is a good start and I hope to have a better actually working project for the final. For now the experimentation has led to a clearer path. 

Social Impact + Further Development: 

In the case of this project, I see it developing into a sort of piece that serves as a social commentary on what we remember, what we really enjoy about media.  I think there are so many different ways in which we consume media. We, as consumers all understand and really value different elements, characters, etc out of movies. There is something to be said about our favorite characters and lines, and why we remember them. Is your favorite character your favorite because it reminds you of someone you care about? Do you hate another character and not like their quotes because they play up on stereotypes you don’t believe in? 

There is much to be said about who we like and why we like them. For the sake of my midterm, I think it’s best to say this project needs to go. I think in terms of developing on this social impact piece further, I want to maybe look at the psychological aspect of advertising. Are good movies foreshadowed by certain colorings and text? Is there basically an algorithm or recipe to pick out a good movie versus a mad movie? Are we as consumers based against low budget or movies created/acted in by minority actors? 

I see that as the direction in which I can make this “Fun” gif generator a more serious work. 

Leave a Reply