Training the model for this project was difficult and involved several failed attempts to download the dataset. The ultimate solution was to download the two files manually and then unzip both by hand.
I have thoroughly documented my failed attempts in screenshots.
As you can see, despite my best efforts and several attempts at different times, there was no way to successfully download the data for this assignment without hacking around it (in the computer science sense; special thanks to Kevin!).
Now, when it came to the image I wanted to use, I selected Monet's Water Lilies. I thought it would work well since it has strong stylistic elements. This, too, failed: I couldn't actually get an output. I am going to keep working on this issue next week to see if I can get an actual result.
This is my actual transfer failure! ^
I was hopeful, but honestly, I can’t restart this again… after spending an entire week just trying to get the model downloaded.
You can see the manual unzipping!
These are all samples of the manual code I used to try to make this happen! ^
This alone took half the day! ^
Finally some success! ^
My constant checking to make sure this didn't quit! ^
These were my attempts using Aven's method! ^
I think there is potential to make it work. I am going to retry until I get some results. When I have those results I will add them here.
I rode the metro with two ladies speaking sign language. I barely know enough Chinese to function, let alone Chinese Sign Language. So I did some research!
Turns out, just like China has many spoken Chinese dialects, it also has more than one sign language. Chinese Sign Language is split into Northern and Southern varieties. Northern Chinese Sign Language is based on American Sign Language, and Southern Chinese Sign Language is based on French Sign Language.
Since we live in Shanghai, in this case I am looking at Southern Chinese Sign Language. I trained 20 different hand gestures, essentially building on the Rock, Paper, Scissors KNN example.
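For reference, the core of that setup looks roughly like this. This is a minimal sketch based on the ml5 featureExtractor + KNNClassifier pattern that the Rock, Paper, Scissors example uses; the sign labels and the button wiring here are placeholders, not my exact code.

```javascript
let video;
let featureExtractor;
let knn;

// Placeholder labels: in my sketch there are 20 signs, one "add example" button per sign
const labels = ['hello', 'thanks', 'please'];

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(320, 240);
  video.hide();

  // MobileNet turns each video frame into a feature vector,
  // and the KNN classifier compares new frames against the saved examples
  featureExtractor = ml5.featureExtractor('MobileNet', () => console.log('model ready'));
  knn = ml5.KNNClassifier();

  labels.forEach((label) => {
    const btn = createButton(`add ${label}`);
    btn.mousePressed(() => knn.addExample(featureExtractor.infer(video), label));
  });

  createButton('start classifying').mousePressed(classifyGesture);
}

function draw() {
  image(video, 0, 0, width, height);
}

function classifyGesture() {
  knn.classify(featureExtractor.infer(video), (err, result) => {
    if (err) return console.error(err);
    console.log(result.label, result.confidencesByLabel);
    classifyGesture(); // classify again so recognition runs continuously
  });
}
```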
It kind of works? It works for one iteration and then stops. I think this is because I have too many buttons, but I cannot be sure! It's okay! It is progress. At least the recognition is very accurate.
It worked with the three buttons. I didn't change any of the code other than adding additional buttons. Because I can't get the confidence output to work, it is harder to get the additional interface elements working. I am hoping to get these issues sorted out as soon as possible.
Once those issues are sorted I will come back to this post and update it with the final product. This issue with the code seems to be common; I want to see why it happens, as adding additional buttons seems rather straightforward.
UPDATE
So, upon meeting with Moon, I worked through the issue and discovered that the broken HTML was due to a single spelling error in the confidence detection for one of the signs. I fixed it and retrained the entire data set, with about… 100+ examples per sign. Some signs work standing up, others work sitting down.
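To give a sense of what kind of bug that was: the KNN result reports a confidence per label, and a single misspelled label string makes that lookup come back undefined, which then quietly breaks whatever HTML element depends on it. The labels and element id below are hypothetical, just to illustrate the failure mode.

```javascript
// Inside the classify callback
function gotResults(err, result) {
  if (err) return console.error(err);

  // The keys of result.confidencesByLabel must match the strings used in
  // knn.addExample() exactly. My bug was the equivalent of training a sign
  // as 'thanks' but looking it up as 'thansk': the confidence came back
  // undefined and the element displaying it broke.
  const confidence = result.confidencesByLabel['thanks']; // not 'thansk'
  select('#confidence').html(nf(confidence * 100, 0, 1) + '%');
}
```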
I worked on some simple CSS to make the interaction clear, which also makes it less cluttered. The emojis are meant to be hints, and the detection is meant to show people that their actions, their hand gestures, have meaning.
I was going to add a guessing element… to have a randomized emoji appear, but I thought the learning part of this should happen with the video I included above. This is more just to practice different signs.
That is something I would further improve if I were to make this into an actual project beyond just a homework.
So growing up, a lot of the women in my family on my father’s side got diagnosed with cancer. My aunt nearly died of cervical cancer when I was eight. My cousin was in and out of remission for stomach and liver cancer from the time I was about seven till I was fifteen when she ultimately passed away. The four year anniversary of my cousin Judit’s death was September 28th, so I had been thinking of her a lot prior to the midterm. When it came to creating something for the class, I knew I wanted to do something that honored her and sort of helped me work through what I was feeling.
Sure, there is risk in doing something deeply personal. But it also helps you understand the level of interaction and helps you make it tailored to your target audience.
Background:
So, ideally I want to use ml5 to find the edge of a person's head and then draw flowers that generate along that edge. Realistically, I think it would make sense to also generate some vines around the edge, just to make the image look less pixelated there. So the outline of the person is vines, and at the end of each vine there is a flower that is made out of dots and mimics henna dot designs.
Please excuse my drawing ability. But essentially this is what I want the final to look like in some capacity. I’ll show how close to this I got in the midterm and also explain the critiques and what I hope to accomplish for the final.
The Process:
Using BodyPix or UNET find the edge
Count the pixel area of the body
Average the pixels to find the midpoint, so that the vines are all relative to the midpoint.
Draw all of the flowers and the vines in p5
Place the p5 elements in the ml5 sketch
Have the vine draw first
Add the flowers to the endpoint of the vine
Celebrate because this seems feasible in one week!
Step 1: BODYPIX OR UNET
BodyPix– “Bodypix is an open-source machine learning model which allows for person and body-part segmentation in the browser with TensorFlow.js. In computer vision, image segmentation refers to the technique of grouping pixels in an image into semantic areas typically to locate objects and boundaries. The BodyPix model is trained to do this for a person and twenty-four body parts (parts such as the left hand, front right lower leg, or back torso). In other words, BodyPix can classify the pixels of an image into two categories: 1) pixels that represent a person and 2) pixels that represent background. It can further classify pixels representing a person into any one of twenty-four body parts.” (ml5js.org)
UNET– “The U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg, Germany.[1] The network is based on the fully convolutional network [2] and its architecture was modified and extended to work with fewer training images and to yield more precise segmentations. “(ml5js.org)
So… I was going off of these definitions (that's why I put them above), and it made more sense to use BodyPix since that's a) what I used for my last coding homework, and b) it seems better at finding the overall area (I'm not looking for individual parts, which seems to be what UNET is tailored to do). Moon also agreed that BodyPix was better. At first I thought it might be worth doing both in order to make the output outline of the person less pixelated, but if I am covering the edge up with a vine later anyway, do I really need UNET? I decided I didn't and moved forward with BodyPix.
Step 2+3: BODYPIX CALCULATIONS
What does it mean to segment? Segmenting means you are looking for the body outline itself, not the individual arm, face, etc.
It looked great… up until it was scaled, when I got these big ugly square edges. At the time I thought, eh, not a big deal, since I'm covering this with a vine anyway. But the image had to be bigger, so I had to do what I did.
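For steps 2 and 3, the idea is just to walk the segmentation mask, count how many pixels belong to the person, and average their coordinates to get a midpoint. The sketch below outlines that loop; I am assuming the BodyPix result exposes a person mask whose pixels can be read with loadPixels() (the exact property names differ between ml5 versions, so treat this as an outline rather than my exact code).

```javascript
// segmentation is the result passed to the bodyPix.segment() callback.
// Assumption: segmentation.personMask is a p5 image in which non-person
// pixels are fully transparent (alpha === 0).
function bodyStats(segmentation) {
  const mask = segmentation.personMask;
  mask.loadPixels();

  let count = 0;
  let sumX = 0;
  let sumY = 0;

  for (let y = 0; y < mask.height; y++) {
    for (let x = 0; x < mask.width; x++) {
      const alpha = mask.pixels[(y * mask.width + x) * 4 + 3];
      if (alpha > 0) {
        count++;
        sumX += x;
        sumY += y;
      }
    }
  }

  // count is the pixel area of the body (step 2);
  // (midX, midY) is the midpoint the vines are drawn relative to (step 3)
  return {
    count,
    midX: count > 0 ? sumX / count : mask.width / 2,
    midY: count > 0 ? sumY / count : mask.height / 2,
  };
}
```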
Step 4: DRAWING FLOWERS + VINES
My favorite part of Calculus is not doing Calculus. But of course, flowers are circular, so it was time to revisit the Unit Circle. Below I'll link all of the raw p5 code so you can see that the variation really comes from adjusting the angles at which the sketch starts and ends.
The key difference between the vine and the flowers is the use of noise in the vine code. This means the vine is different nearly every time you run the sketch, which gives it a more organic look that isn't nearly as neat and arced as the flowers are.
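Stripped down to the two ideas, that looks roughly like this: unit-circle math places the flower dots at increasing angles, and Perlin noise steers the vine so it wanders differently every run. The numbers and colors here are placeholders, not the values from my actual sketch.

```javascript
// A dotted flower around (cx, cy): dots placed with unit-circle math.
// startAngle and endAngle are what I vary to get the different flower shapes.
function drawFlower(pg, cx, cy, petals, radius, startAngle, endAngle) {
  pg.noStroke();
  pg.fill(120, 40, 80);
  for (let a = startAngle; a < endAngle; a += 0.2) {
    // the radius is modulated by a sine wave to give the petal bumps
    const r = radius * (0.6 + 0.4 * sin(petals * a));
    pg.circle(cx + r * cos(a), cy + r * sin(a), 4);
  }
}

// The vine: a noisy path that advances one step per call, so it grows
// over time and never looks exactly the same twice.
let vine = { x: 0, y: 0, t: 0 };

function startVine(x, y) {
  vine = { x, y, t: random(1000) };
}

function growVine(pg, stepSize) {
  const angle = map(noise(vine.t), 0, 1, -PI, PI);
  vine.x += stepSize * cos(angle);
  vine.y += stepSize * sin(angle);
  pg.stroke(40, 90, 50);
  pg.strokeWeight(3);
  pg.point(vine.x, vine.y);
  vine.t += 0.01;
}
```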
As I was admiring the vine, I realized it would be interesting to have it become sort of a necklace. I remembered that one of the issues my cousin was facing towards the end of her life was shortness of breath, and I thought maybe placing the vine there would be symbolic. This meant I wasn't going to actually draw the vines around the body.
Now, a note on how these graphics are drawn. This uses the createGraphics/PGraphics approach that exists in both p5 and Processing. The point of doing this is that the pixelated body outline sits on the bottom-most layer of the sketch, and I draw another layer on top of it with the vine and the flowers. This method also makes it possible to clear the sketch, a piece of functionality I will explain in a little bit. The only thing that is still very odd about the whole PGraphics setup is that its canvas has to be large, in this case actually larger than the video output itself. (This is another reason why I had to scale up the video and make the background pixelated.)
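A minimal version of that layering is below. The sizes are made up; the point is simply that the overlay is its own graphics buffer drawn on top of the video, and clearing it never touches the layer underneath.

```javascript
let video;
let overlay; // the vine + flower layer, separate from the body-segmentation layer

function setup() {
  createCanvas(960, 720);
  video = createCapture(VIDEO);
  video.hide();
  // createGraphics gives an off-screen buffer we can draw into and erase
  // without touching whatever is drawn underneath it
  overlay = createGraphics(width, height);
}

function draw() {
  image(video, 0, 0, width, height); // bottom layer (in my sketch, the pixelated BodyPix output)
  // ...vines and flowers get drawn into overlay here...
  image(overlay, 0, 0);              // top layer
}

function resetDrawing() {
  overlay.clear(); // wipes only the vines and flowers, not the video
}
```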
STEP 5+6+7: PLACING ELEMENTS INTO THE ML5 SKETCH
So, it became apparent that using plain frameCount was going to be a problem. The flowers and vines couldn't be drawn forever, because that would eventually just yield a circle, so I was using frameCount to control how many times the sketch would run. The issue was that because the vine and the flowers were using frameCount too, the sketch would stop before the flower reached maturity. Giving the drawing its own counter, separate from frameCount, seemed to fix the issue almost completely.
The vine was in a function completely independent of the flower-drawing function, which took the ending point of the vine as an input.
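One way to set that up (a sketch of the idea, not my exact code) is a growth counter that only advances while drawing is allowed, so the stop condition and the flower's maturity no longer both hang off frameCount. This reuses the overlay, growVine(), drawFlower(), and vine variables from the sketches above.

```javascript
let growth = 0;           // how far along the vine/flower drawing is
const MAX_GROWTH = 400;   // arbitrary cap: the point at which the drawing is "mature"

function draw() {
  // ...BodyPix layer drawn here...

  if (growth < MAX_GROWTH) {
    growVine(overlay, 2); // the vine advances one step per frame
    if (growth > MAX_GROWTH * 0.75) {
      // the flower opens up from the vine's current endpoint
      drawFlower(overlay, vine.x, vine.y, 6, 30, 0, TWO_PI * (growth / MAX_GROWTH));
    }
    growth++;
  }

  image(overlay, 0, 0);
}
```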
STEP 8 + 9: BETTER USER EXPERIENCE WITH PAGE CLEARING + VOICE CONTROL
So I was thinking about how to clear the sketch as painlessly as possible. To me it made sense to clear it when there isn't enough of a person in the camera frame. That's exactly what I did with pixel counting, setting a reasonable threshold below which the vine and the flowers are cleared; an example of this can be seen above.
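The clearing logic itself is tiny: if the person-pixel count from the segmentation drops below a threshold, wipe the overlay and reset the growth counter. The threshold value here is made up, and bodyStats() is the counting helper sketched earlier.

```javascript
const PRESENCE_THRESHOLD = 5000; // minimum person pixels to count as "someone is there"

function gotSegmentation(segmentation) {
  const stats = bodyStats(segmentation); // pixel count + midpoint, as sketched above

  if (stats.count < PRESENCE_THRESHOLD) {
    overlay.clear(); // not enough of a person in frame: erase the vines and flowers
    growth = 0;      // let the drawing start over for the next person
  }
}
```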
When it came to growing the vine, it made sense, given that the vine is positioned around the user's neck, to have speech, specifically volume, control the growth of the vine. It's almost like the idea that the more you struggle, the worse it gets. This is also something cancer patients could realistically use, because it doesn't require significant effort the way movement would.
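In p5, that volume control is just the microphone level mapped onto the vine's step size: the louder the voice or breath, the faster the vine grows. A minimal sketch using the p5.sound library, again reusing growVine() and the overlay buffer from above; the mapping range is a placeholder.

```javascript
let mic;
let overlay;

function setup() {
  createCanvas(960, 720);
  overlay = createGraphics(width, height);
  mic = new p5.AudioIn(); // requires the p5.sound library
  mic.start();
}

function draw() {
  background(0);
  // getLevel() returns the current input volume, roughly between 0 and 1
  const level = mic.getLevel();
  // louder input -> bigger step -> the vine grows faster
  const step = map(level, 0, 0.3, 0, 4, true);
  if (step > 0.1) {
    growVine(overlay, step);
  }
  image(overlay, 0, 0);
}
```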
STEP 10: ACTUALLY CELEBRATING (BUT CRITIQUING MY WORK)
So what’s wrong with my project? Many things. I think I want to rework the vines to work around the outline and then have the background and the filling of the body be the same color.
The other issue is that I didn't like the sizing of the graphics, but if I changed the size of the dots or of the sketch it would have taken the quality down several notches, and quality was a clear design aspect I wanted to maintain. It just felt very… elementary, and not as delicate as henna is.
So if I work through the issues with resizing then I think the vine issue is resolvable as well, but that goes back into fixing all of the frame count issues for the vine all over again.
The other critique I got related to the idea that my project was not… "fun" enough. I really don't want it to turn into a Snapchat filter, or something as superficial as a photo like that. I really want to stay focused on this idea of positive growth coming out of decay; I see this as a more serious piece. Then again, I'm a really serious person. Maybe I'll make the colors a little more fun, but I still want the detail and the beauty, the delicacy almost, to remain as I keep working on this project.
Build a project that takes audio input and matches that audio input to the closest equivalent quotation from a movie, yielding the video clip as output.
Original Plan: Use Word2Vec to complete this, together with sound classification.
Completed Tasks:
Acquired the entire film
Acquired the subtitles
Realized Word2Vec didn’t work
Found an example of something that COULD work (if I can get the API to actually work for me too)
Started playing with Sound Classification
Spliced the entire movie by sentences… (ready for output!)
Discoveries:
Word2Vec doesn't work with phrases. Its sole functionality is mapping individual words to other words. And even then, it does such a horrific job that barely any of the movie plot is distinguishable by searching these related terms. Half of the associated words are simply not usable after training the model on the script of Detective Pikachu.
I’ll talk through the examples and how I was able to train the model on the subtitle script.
In order to train the model, I had to get the script into a usable format. The first time around, I left all of the time stamps and line numbers in the subtitle document, so when it trained, the entire response list was just numbers that didn't correlate to the words I typed into the sample in any meaningful way.
TEST 1: SUBTITLES INTO TRAINER WITH TIME STAMPS + LINE NUMBERS
Results:
I took the numbers that were output and compared them to the subtitles. Then, since the subtitles don't clarify which character said what, I went into the movie itself, which honestly yielded mixed results, leading me to conclude that there was no meaningful correlation between this output and the script itself.
Conclusion:
So is this data usable? Not really; it's usually connector words that yield the high correlations, so I decided to go back to the drawing board with my testing. Which is really unfortunate considering how long this initial test took from start to finish.
TEST 2: SUBTITLES INTO TRAINER WITHOUT TIME STAMPS + LINE NUMBERS
Results:
I went back and edited out all the time stamps and line numbers… and the results were still mixed in terms of giving me relevant words.
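The editing here boils down to stripping the cue numbers and timestamp lines out of the .srt file so that only dialogue is left for training. Something along these lines would do it (a rough Node-style sketch; the file names are placeholders).

```javascript
// Clean an .srt subtitle file down to plain dialogue lines,
// so the model only ever sees words, not cue numbers or timestamps.
const fs = require('fs');

const raw = fs.readFileSync('subtitles.srt', 'utf8');

const dialogue = raw
  .split(/\r?\n/)
  .filter((line) => line.trim() !== '')               // drop blank separator lines
  .filter((line) => !/^\d+$/.test(line.trim()))       // drop cue numbers like "42"
  .filter((line) => !/-->/.test(line))                // drop "00:01:02,000 --> 00:01:05,000"
  .map((line) => line.replace(/<[^>]+>/g, '').trim()) // drop formatting tags like <i>
  .join('\n');

fs.writeFileSync('script_clean.txt', dialogue);
```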
Is this usable information? Not even remotely. Most of the outputs are poor, and it is next to impossible to draw any meaningful connection that is not just a plain connector word. My conclusion here was that Word2Vec was not the best option for my project: it couldn't even produce meaningful word connections, and it doesn't have the capability to analyze sentences as sentences.
Conclusion:
I asked Aven if I could quit Word2Vec… he said as long as I documented all my failures and tests… it would be fine… so that’s exactly what I did before starting completely over again! It was… needless to say incredibly frustrating to realize this very easily trainable model didn’t work for my project.
TEST 3: SENTENCE ANALYZER
So… this was probably the saddest test of them all. Seeing as I had nothing to work with, I went back to Medium to see if I could gain some sort of new insight into ml5. I stumbled upon an article about an ITP student's project and thought I could go about my project in a similar way. He also references Word2Vec, but then explains that there is another TensorFlow model that works for paragraphs and sentences! That's exactly what I need! But as you can tell from above, it didn't work out for me, even after I had finally input the entire script. Hindsight is 20/20: I probably should have tested the base code BEFORE I input and retyped the script. It's fine, you know, you live and you learn.
This is the link to the GitHub repo, which is supposed to guide you just as the link from the ml5 website does for the Word2Vec example. Unfortunately, there seems to be an issue even in their code, even after I downloaded everything (supposedly) necessary to run it.
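If the sentence-level model in question is the Universal Sentence Encoder (which is my reading of the ITP example, though I'm not certain), the basic matching step would look something like the sketch below: embed every script line once, embed the incoming query, and return the line with the highest cosine similarity. This is an outline under that assumption, not a fixed version of the repo above.

```javascript
// Match a query against script lines with the Universal Sentence Encoder.
require('@tensorflow/tfjs'); // registers a TensorFlow.js backend (CPU in plain Node)
const use = require('@tensorflow-models/universal-sentence-encoder');

async function findClosestLine(query, scriptLines) {
  const model = await use.load();

  // Embed the query and every script line (512-dimensional vectors)
  const embeddings = await model.embed([query, ...scriptLines]);
  const vectors = await embeddings.array();
  const queryVec = vectors[0];

  let best = { index: -1, score: -Infinity };
  for (let i = 1; i < vectors.length; i++) {
    const score = cosineSimilarity(queryVec, vectors[i]);
    if (score > best.score) best = { index: i - 1, score };
  }
  return { line: scriptLines[best.index], score: best.score };
}

function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
```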
Conclusion:
Well… sometimes Holy Grail answers are too good to be true. I still haven’t given up on this one… I think maybe with enough work and office hours I can get the issue sorted, but this is definitely a potential solution to my issue… especially considering the ITP project, I definitely think this is the right way to go.
TEST 4: SOUND CLASSIFIER
This was one of the parts of my project, meant to be sort of a last step, making it the sole interaction. This is really something I only plan on implementing if I get the rest of the project functional! But I thought since I was experimenting as much as I was, it was time to explore this other element as well.
So what did I do?
This is nice. However, I think it's incredibly difficult to train it on your voice, considering HOW MANY LINES ARE IN A SINGLE FILM; doing it that way seems rather inefficient. So I was thinking: is there a better method, or a method to help me train this sound classifier? And… I think I found something useful.
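For context, the built-in ml5 sound classifier I was playing with is (as far as I can tell from the eighteen set words it knows) the pre-trained SpeechCommands18w model, which only recognizes eighteen fixed words such as the digits, "up", "down", "yes", and "no". A minimal sketch of how it is wired up:

```javascript
let classifier;
let resultText = 'listening...';

function preload() {
  // SpeechCommands18w is pre-trained on eighteen fixed words; it cannot be
  // taught movie lines, which is exactly the limitation I keep running into.
  const options = { probabilityThreshold: 0.7 };
  classifier = ml5.soundClassifier('SpeechCommands18w', options);
}

function setup() {
  createCanvas(320, 240);
  classifier.classify(gotResult); // starts listening to the microphone
}

function draw() {
  background(0);
  fill(255);
  textAlign(CENTER, CENTER);
  text(resultText, width / 2, height / 2);
}

function gotResult(error, results) {
  if (error) return console.error(error);
  // results[0] is the most confident label for the latest chunk of audio
  resultText = `${results[0].label} (${nf(results[0].confidence, 0, 2)})`;
}
```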
Conclusion:
I am not a computer science student… I don’t think I have the capabilities to build something from scratch… so I have really looked into this. And, should the main functionality of my project work with the Sentence Encoder… this could work to make the interaction smoother.
Progress Report:
I know what I need to use; it's now just a matter of learning how to use things that are beyond the scope of the class, or at the very least beyond my current capabilities. This is a good start, and I hope to have a better, actually working project for the final. For now, the experimentation has led to a clearer path.
Social Impact + Further Development:
In the case of this project, I see it developing into a sort of piece that serves as a social commentary on what we remember, what we really enjoy about media. I think there are so many different ways in which we consume media. We, as consumers all understand and really value different elements, characters, etc out of movies. There is something to be said about our favorite characters and lines, and why we remember them. Is your favorite character your favorite because it reminds you of someone you care about? Do you hate another character and not like their quotes because they play up on stereotypes you don’t believe in?
There is much to be said about who we like and why we like them. For the sake of my midterm, I think it's best to say this project still has further to go. In terms of developing this social impact piece, I want to maybe look at the psychological aspect of advertising. Are good movies foreshadowed by certain colorings and text? Is there basically an algorithm or recipe to pick out a good movie versus a bad movie? Are we as consumers biased against low-budget movies, or movies created by and acted in by minority actors?
I see that as the direction in which I can make this “Fun” gif generator a more serious work.
Word2Vec doesn't work with phrases. Its sole functionality is mapping individual words to other words. And even then, it does such a horrific job that barely any of the movie plot is distinguishable by searching these related terms. Half of the associated words are simply not usable after training the model on the script of Detective Pikachu.
I don't think I can use this with multiple movies? Can you imagine how difficult it would be to pair a movie with the phrase? At first I was thinking about giving each movie an ID and manually adding it to every word, something like DP:are (the word "are" in Detective Pikachu), but I just don't think that's scalable.
SOUND CLASSIFIER
This was one of the parts of my project, meant to be sort of a last step, making it the sole interaction. This is really something I only plan on implementing if I get the rest of the project functional! But I thought since I was experimenting as much as I was, it was time to explore this other element as well.
This failed. I don't think there is any other way to explain it: there is a serious issue with using this specific ml5 model with phrases. The confidence level of the sound classifier with just the eighteen set phrases is too low to be usable in my project. Done, end of story. The classifier can't even identify the set words IT SHOULD KNOW!
Progress Report:
I know what I need to use; it's now just a matter of learning how to use things that are beyond the scope of the class, or at the very least beyond my current capabilities. This is a good start, and I hope to have a better, actually working project for the final. For now, the experimentation has led to a clearer path.
Original Plan:
Build a project that takes audio input and matches that audio input to the closest equivalent quotation from a movie, yielding the video clip as output.
Use Word2Vec to complete this, together with sound classification.
Now it looks like I will have to re-evaluate. There is a really nice ITP example, Let's Read a Story, that I want to use to guide me. Maybe there is a way to splice scripts together? Granted, that was never an idea I was interested in pursuing. I don't know if it is too late to change my topic.
(Please see Midterm post for samples of failed tests)