Style Transfer Training: Cyborg Faces

Inference on Pre-Trained Models:

To start off, I found a basic image of a street in Hong Kong, and played around with pre-trained models in ml5.js. 

Input: 

Style:

Output:

As you can see from the output, the style transfer is quite impressive; the wave texture is captured from the style image and transferred onto the input image, while the overall structure (the cars, signs, and roadway) is preserved. Personally, I am still not sure how this pre-trained model takes the tidal curve texture (which appears only on specific waves in the style image, not throughout) and applies it so evenly across the input image.

Training My Model:

First off, I followed the instructions in the PowerPoint and downloaded the dataset accordingly. The process wasn't too bad; it took only around two hours or so:

However, after acquiring the data set, I ran into a few errors during the training step:

It turns out there were some issues with my directory and image paths, so I had to spend some time fixing them.

Idea:

I wanted to experiment with style transfer using faces as the input image, since human faces are one of the most prominent, recognizable features to us. Interestingly, I came across this article, which touched upon the strange phenomenon whereby humans tend to find 'faces' in inanimate objects, especially cars. Personally, I've always thought that the fronts of cars have a certain personality to them (i.e., some cars look happy, some look sleepy, etc.). The article discusses a study published in the Proceedings of the National Academy of Sciences, in which researchers had auto experts look at the fronts of cars and found that the same area of the brain involved in facial recognition, the fusiform face area, was activated. I found this especially interesting: car faces were essentially triggering the same responses in the brain as human faces. Therefore, for my style transfer project, I wanted to use human faces as the input image and a car face as the style image. I thought it would be fascinating to see how the model would combine the perceived style of the car face with the human face, and what the results would be. Would the output just be a photo where the machinelike style of the car is textured across the human? Or would the output face be more like a cyborg, where certain qualities of the car are fixed into the face, making it seem part human, part machine?

Results:

[Input face + car style image = output, for each of the three test runs]

Judging from these results, I would say that my first run, with the male test subject, turned out the best; I think it may be because the prominent circles on the car translated over to the man's pupils, giving him the 'cyborg'-like feel I had hoped for. Interestingly, the model managed to position the headlights of each car onto the eyes of the human faces, which I thought was quite fitting. The model also took the gridline patterns from the fronts of the cars and spread them across the backgrounds of the output images (most noticeable with the male subject and the third female subject). Ultimately, my output images were quite striking in my opinion; it really did seem as if they were 'cyborgs'. It would also be interesting to see human faces styled with other face-like objects.

Sources:

https://www.smithsonianmag.com/smart-news/for-experts-cars-really-do-have-faces-57005307/

Midterm Proposal

Before I took this course, if someone were to mention 'Machine Learning' to me, I would automatically think of computers learning how to interpret speech and 'talking' with their human counterparts. For some reason, chatbots and computer-generated speech have always fascinated me, and I knew that this was a route I wanted to take for the midterm project.

Initially, I wanted to create a simple chatbot similar to ELIZA, an early model for natural language processing. ELIZA was created by Joseph Weizenbaum, a professor at MIT who wanted to build a chatbot to show the 'superficiality of communication between man and machine'. Weizenbaum thought that because machines could not experience the feelings and emotions attributed to language, conversation between a human and a computer was essentially meaningless. ELIZA was designed to simulate an electronic therapist, with users talking to the program as a form of mental treatment. Ironically, Weizenbaum discovered that although users were aware that ELIZA was a program, they projected their own human-like emotions onto the software, creating the illusion that ELIZA actually did possess a degree of understanding and human intelligence. In fact, several users reported very positive experiences in their interactions with ELIZA, stating that it differed very little from talking with an actual human therapist.

To me, this is especially interesting, because being able to converse with something is a good marker that it possesses some degree of intelligence, especially if the conversation is 'reactive', in the sense that the replies are not scripted. However, right before I was about to start on this ELIZA-style chatbot, my psychology professor introduced an interesting phenomenon called Wernicke's Aphasia, which affects patients who have suffered damage to the area of the brain related to language processing. As a result, they are unable to produce meaningful speech, while retaining proper grammar and syntax. I thought this was the perfect opportunity to combine the two ideas into one: create a chatbot that simulates the natural language ability of a human being, but imbue it with symptoms similar to those of Wernicke's Aphasia patients.

The ultimate goal of this project is to experiment with the meaning behind computer generated text. What if the model can ‘learn’ fluent speech, but produce individual words that lack meaning? Would users still want to talk with this bot? Would they project their own interpretations of what the bot is trying to say? Or would they get bored after a few minutes of interaction, and quit the program? 

In order to tackle this midterm project, I had originally planned to use word2vec, but realized that a more efficient and accurate approach would be to use the spaCy library together with a bidirectional model instead. I would feed it a large dataset (very likely a novel) so that it could build up a working vocabulary, and then have it generate responses on the fly, according to the user input.

Sources:

https://www.eclecticenergies.com/ego/eliza (ELIZA chatbot)

Acharya, Aninda B. "Wernicke Aphasia." StatPearls [Internet], U.S. National Library of Medicine, 16 Jan. 2019, www.ncbi.nlm.nih.gov/books/NBK441951.

Midterm Reflection: Imprinting

Inspiration

As stated in my midterm proposal, my project mainly focuses on the phenomenon of imprinting, a behavior prominently found in birds during their first stages of life. The aim was to have the robot mimic the basic tendencies of these infant animals, namely, following a moving subject and adjusting its direction accordingly.

Imprinting was first recorded in domestic chickens, and has since been used by mankind for centuries in order to control animals for agricultural and other daily uses. In rural China, farmers imprinted ducklings to special ‘sticks’, which they would then use to lead the ducklings out to the rice paddies to control the snail population. 

It seems that imprinting is pre-programmed into these animals through natural selection, mainly because in the first moments of an animal's life, it is vital that it follows a nearby moving organism to ensure its safety. There is a critical period for imprinting, however, which typically spans the first 36 hours after the infant is born. Once the animal is imprinted on an object or organism, the association is said to be 'fixed', meaning that it will be nearly impossible for the baby to unlearn it, even in adulthood. In this case, the phenomenon is known as 'damage imprinting'.

Here is a video of imprinting in action:

From this video, you can see that these ducklings will continue to follow the cameraman as he moves around in various directions, even moving their heads to look at him as he loops around the room. 

Despite this being a rather simple behavior, I feel that it's something we all experience as living creatures; our pet dogs run to us when they see us appear, and small children recognize and follow trusted family members or friends. To have a robot follow a human around means that it is registering the individual as something, regardless of whether it knows what that something is, and there is something quite lifelike in that behavior.

Implementation

My first step was to have the robot detect objects in front of it and move towards them accordingly. This was easy: with the ultrasound sensor, the bot could "see" objects and adjust its motor speed. However, the turning behavior of the robot posed a huge issue, as the kittenbot's turns were very delayed and robotic. Therefore, I had to find another way to have the bot detect objects and turn towards them.
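In rough Python-style pseudocode, that first behavior was just one loop; read_distance_cm() and set_motors() below are hypothetical placeholders for the board's real ultrasound and motor calls, and the thresholds are made up for illustration:

```python
# Sketch of the single-sensor attraction behavior (not the exact code I ran).
# read_distance_cm() and set_motors() are hypothetical placeholders for the
# board's real ultrasound and motor calls; the thresholds are assumptions.

FOLLOW_RANGE_CM = 40   # an object counts as "seen" inside this distance
TOO_CLOSE_CM = 10      # stop before bumping into the object

def read_distance_cm():
    """Placeholder: return the ultrasound distance reading, in centimeters."""
    raise NotImplementedError

def set_motors(left, right):
    """Placeholder: set the two wheel motor speeds (0 = stop)."""
    raise NotImplementedError

while True:
    distance = read_distance_cm()
    if TOO_CLOSE_CM < distance < FOLLOW_RANGE_CM:
        set_motors(60, 60)   # something is ahead: drive straight toward it
    else:
        set_motors(0, 0)     # nothing in range (or too close): stay put
```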

Attraction behavior using one ultrasound sensor:

After thinking about the problem for quite a bit of time, I had a breakthrough moment while looking through images of insects, specifically spiders (I was trying to find other ways of sensing because the ultrasound was being finicky). Most spiders have eight eyes positioned across their 'head' to increase their field of view. I wondered if I could simply mount two ultrasound sensors, one on each side of the bot, and connect each sensor to one motor. That way, if one sensor detects an object and the other does not, I can have the motor controlled by the non-detecting sensor move, propelling the bot to turn towards the object in front of the sensor that did detect it. In other words, the process looked something like this (a code sketch follows the diagram):

                                                                  

(turning right: object in front of sensor 2)

sensor 1 (no object)       ->  motor 1 (move)
sensor 2 (detects object)  ->  motor 2 (don't move)

(turning left: object in front of sensor 1)

sensor 1 (detects object)  ->  motor 1 (don't move)
sensor 2 (no object)       ->  motor 2 (move)

(moving straight forward: object in front of both sensors)

sensor 1 (detects object)  ->  motor 1 (move)
sensor 2 (detects object)  ->  motor 2 (move)
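Translated into the same kind of rough Python-style sketch (object_in_range() and drive_motor() are again hypothetical placeholders for the real sensor and motor calls, and the speed value is arbitrary):

```python
# Sketch of the two-sensor, two-motor turning logic diagrammed above.
# Sensor 1 / motor 1 sit on one side of the bot, sensor 2 / motor 2 on the
# other; object_in_range() and drive_motor() are hypothetical placeholders.

SPEED = 60  # arbitrary wheel speed

def object_in_range(sensor_id):
    """Placeholder: True if that side's ultrasound detects an object in range."""
    raise NotImplementedError

def drive_motor(motor_id, speed):
    """Placeholder: set one wheel motor's speed (0 = stop)."""
    raise NotImplementedError

while True:
    s1 = object_in_range(1)
    s2 = object_in_range(2)

    if s1 and s2:              # object straight ahead: both motors move
        drive_motor(1, SPEED); drive_motor(2, SPEED)
    elif s2:                   # object on sensor 2's side: only motor 1 moves, turning toward it
        drive_motor(1, SPEED); drive_motor(2, 0)
    elif s1:                   # object on sensor 1's side: only motor 2 moves, turning toward it
        drive_motor(1, 0); drive_motor(2, SPEED)
    else:                      # nothing detected (the case the diagram omits): stop
        drive_motor(1, 0); drive_motor(2, 0)
```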

First Prototype:

As you can see, I removed the original kittenBot ultrasound sensor and mounted two Arduino ultrasound sensors, one on each side of the bot.

Here is a test run of each of the bot's individual 'motor-to-ultrasound' connections:

From this video, we can see that each motor is indeed connected to its respective ultrasound sensor, which is the basis for achieving smooth, quick turns. However, I realized that my setup of taping the ultrasounds to the bot was not very stable, as they would occasionally jitter. Therefore, I added a plastic plate to the front of the bot and securely taped the ultrasound sensors to it:

              

After everything was securely fastened to the bot and both the motors and sensors were working properly, it was time to test one bot to see if it would 'imprint' on objects and follow them in a quick, reactive motion.

I was quite happy with the results, so I simply duplicated the steps to create a second robot.

Judging from the behavior of the first robot, I expected the second robot to be able to follow the first bot with no problems. The only issue I could imagine would be the speed of the two robots, where the ‘leader’ bot would either be too slow or too fast for the ‘follower’ bot. 

Here is the result of trying to recreate the duckling squad imprinting effect:

Interestingly, the bots managed to create the 'conga line' effect quite well; the only issue I saw was that the follower bot would sometimes lag behind the leader and lose detection once the first bot moved outside the distance threshold.

Reflection

I think the phenomenon of imprinting is very intriguing in that it is a pre-programmed behavior in living organisms. In a way, it is an example of robotic behavior within non-robotic organisms (following a designated target from start to finish). However, having something react in real time to sudden changes in direction or movement, and even follow a specific organism, is also a good marker of lifelike behavior. It would be interesting to program my 'leader' bot to exhibit more randomized behavior, and have the 'follower' bot mimic those actions as well.

Sources

“My Life as a Turkey.” PBS, Public Broadcasting Service, 21 Oct. 2014, www.pbs.org/wnet/nature/my-life-as-a-turkey-whos-your-mama-the-science-of-imprinting/7367/.

Brink, T. L. Psychology: A Student Friendly Approach. "Unit 12: Developmental Psychology," 2008, p. 268.

Midterm Reflection: Wernicke’s Bot

Inspiration

Even before my proposal, I was heavily drawn towards working with text-based machine learning. Chatbots have always fascinated me, especially models that are able to 'trick' the user into thinking they are having a conversation with a real person. I wanted the bot to be able to take in inputs from the user and output responses, creating the 'back and forth' interaction of a simple conversation.

The same week the midterm proposal was introduced, my psychology professor briefly talked about a strange syndrome called Wernicke's Aphasia, which is essentially "characterized by superficially fluent, grammatical speech but an inability to use or understand more than the most basic nouns and verbs." The aphasia can appear when the patient suffers damage to the temporal lobe of the brain, and it renders the patient either unable to understand speech or unable to produce meaningful speech, while retaining proper grammar and syntax.

Here is an example of a patient with Wernicke’s Aphasia attempting to answer questions posed by a clinician:

Personally, I found it especially interesting that the human brain can experience errors in language comprehension similar to those of computers. In a weird way, the effects of this aphasia were the inspiration for my project; I wanted the chatbot to simulate a patient with Wernicke's. This is a sharp contrast to normal chatbots, where understanding is shared between the two parties: here, only the computer can 'understand' what the user is saying, while the user cannot deduce any meaning from the chatbot's replies. People may wonder what the point is of talking with something that spews 'nonsense', but I think the interaction can depict the struggles of an individual experiencing this aphasia in real life, where he or she is essentially isolated from the social world.

Retrieving Data and Building the Model

In order to begin creating this chatbot, I needed powerful tools for language processing, and after some research I discovered the spaCy library, which integrates well with TensorFlow and prepares text specifically for deep learning. I also found tutorials on 'bidirectional LSTMs', which are essentially extensions of traditional LSTMs that improve model performance on sequence classification problems. The main use of the LSTM here is to predict the next word for a given sentence (the user input).

In order to train the model, I needed a thorough dataset with common English phrases and a variety of words to work with. I started off training the model on a copy of "Alice in Wonderland", but that novel was just too small a dataset, so I ended up using "War and Peace" by Leo Tolstoy, which runs around 1,585 pages.

I acquired a copy of the novel from Project Gutenberg and split it into separate txt files. Because 1,585 pages was extremely large and I was on a strict time schedule, I ended up only feeding the model around 500 pages.

Using spaCy, I formed one list of words containing the entire novel, which I then used to create the vocabulary set for my model. The vocab set acts essentially as a dictionary, where each word is stored without duplicates, and assigned an index. 
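Roughly, this step looked something like the sketch below; the filename, the token filter, and the small English pipeline are assumptions for illustration rather than my exact code:

```python
import spacy

# Tokenize the novel with spaCy and build an indexed vocabulary.
# "war_and_peace.txt" is an assumed filename for the Gutenberg text.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

with open("war_and_peace.txt", encoding="utf-8") as f:
    text = f.read()

nlp.max_length = len(text) + 1   # allow spaCy to process a document this large
doc = nlp(text)

# One flat, lowercased list of word tokens for the whole novel.
words = [tok.text.lower() for tok in doc if tok.is_alpha]

# The "dictionary": every unique word gets an index, plus a reverse lookup.
vocab = sorted(set(words))
word_to_idx = {w: i for i, w in enumerate(vocab)}
idx_to_word = {i: w for w, i in word_to_idx.items()}
```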

Next, I needed to create the training sets for my model. To do this, I had to split the data into two parts: one containing sequences of words from my original word list, and the other containing the next word for each sequence. The end goal is for the model to be able to predict the 'next word' of a given sequence of words.
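Continuing from the word list above, the split looks something like this; I didn't note down the exact window length I used, so 10 here is just an assumed value:

```python
SEQUENCE_LEN = 10  # assumed context window length

sequences, next_words = [], []
for i in range(len(words) - SEQUENCE_LEN):
    sequences.append(words[i:i + SEQUENCE_LEN])   # the input sequence of words
    next_words.append(words[i + SEQUENCE_LEN])    # the word the model should predict
```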

Lastly, I needed to convert these words into numbers so that they could be understood by the LSTM, and then build the model using Keras; to do so, I had to transform my word sequences into boolean matrices (this took a long, long time, with the help of several tutorials).
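A minimal sketch of that vectorization and model, continuing from the sequences and vocabulary above; the layer size is an assumption, while the batch size and epoch count match the training run described below:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

# One-hot (boolean) encoding of each word sequence and its next word.
X = np.zeros((len(sequences), SEQUENCE_LEN, len(vocab)), dtype=bool)
y = np.zeros((len(sequences), len(vocab)), dtype=bool)
for i, seq in enumerate(sequences):
    for t, word in enumerate(seq):
        X[i, t, word_to_idx[word]] = True
    y[i, word_to_idx[next_words[i]]] = True

# A small bidirectional LSTM that outputs a probability for every vocab word.
model = Sequential([
    Bidirectional(LSTM(128), input_shape=(SEQUENCE_LEN, len(vocab))),
    Dense(len(vocab), activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

model.fit(X, y, batch_size=32, epochs=50)
```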

Training the Model

Since I spent most of the time learning how to implement spaCy and how to create a machine learning model, I had less time left to train the model, which was a big downside. Therefore, I couldn't afford to feed it all ~1,500 pages of the novel, and fed it only around 500 pages. The result is as follows, with a batch size of 32 and 50 epochs:

The first 14 epochs saw a steady decline in loss and an increase in accuracy, ending with a loss of 2.2 and an accuracy of 0.47. Not too bad, as the inaccuracy of the predicted words can actually help simulate the lack of meaning in the bot's output.

Testing The Bot

I figured that I needed a way to adjust how the model 'predicts' the next possible word from my seed sentence (the user input). Using sample(), I ended up picking random words from my vocab set, with the twist that the probability of a word being picked depends on how likely it is to be the next word in the sentence (as determined by the trained LSTM). After a bit more tweaking of the bot's predictability numbers, I was ready to 'talk' with the chatbot. I asked the bot a few simple questions, similar to those a clinician would ask a real patient with Wernicke's. The result:

As you can see, the outputs are indeed nonsense sentences, but I felt that the lack of meaning was much too severe. Therefore, I had to adjust the predictability numbers quite a bit to find a good balance between nonsense and comprehension, as well as fix some issues with periods and commas.
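The 'predictability number' is essentially what a sampling temperature controls. A rough reconstruction of the sampling and reply step, continuing from the model and vocabulary sketched above (the function names, temperature value, and reply length are assumptions), might look like:

```python
import numpy as np

def sample(preds, temperature=1.0):
    """Pick a word index at random, weighted by the model's predicted
    probabilities; lower temperature makes the choice more predictable."""
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds + 1e-8) / temperature
    probs = np.exp(preds) / np.sum(np.exp(preds))
    return int(np.random.choice(len(probs), p=probs))

def reply(seed_words, n_words=20, temperature=0.8):
    """Generate a reply one word at a time from the user's (tokenized) input."""
    generated = list(seed_words)
    for _ in range(n_words):
        # Encode the most recent SEQUENCE_LEN words exactly the way the
        # training data was encoded.
        x = np.zeros((1, SEQUENCE_LEN, len(vocab)), dtype=bool)
        for t, word in enumerate(generated[-SEQUENCE_LEN:]):
            x[0, t, word_to_idx[word]] = True
        preds = model.predict(x, verbose=0)[0]
        generated.append(idx_to_word[sample(preds, temperature)])
    return " ".join(generated[len(seed_words):])
```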

The second generation ended up having words that related more closely to the original question; it seemed as if the robot was trying to answer the question in the first half of the sentence but lost all meaning in the second half, which was precisely the effect I wanted. To better differentiate between the user and the chatbot, I used the Python 'colored' package to change the chatbot's outputs to red.

Even though I could not understand the chatbot at all, there was something engaging about asking it questions and receiving a response back, especially knowing that the bot was able to 'read' my inputs. Sometimes the chatbot's response would be eerily close to a meaningful reply, and other times it was completely incomprehensible. I ended up talking with the bot for at least half an hour, which surprised me, because I did not think I would be so engaged in what is essentially a one-way conversation with a computer.

Issues

My biggest issue with the process was that I didn't have enough time to feed the model a larger, more robust dataset. Also, because "War and Peace" was written quite a while ago, the language my bot learned is a bit 'outdated', meaning that if I were to ask it "Hey, what's up?", it would not be able to reply, because those words are not in its vocabulary set. Instead, it would throw a KeyError. Another issue is that I had to manually tweak the syntax a few times (such as adding commas or periods), because grammar usage is actually very complex to learn.
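One simple patch for the KeyError (not something I implemented for this version) would be to drop any out-of-vocabulary words before encoding the user's input, continuing from the spaCy pipeline and vocabulary sketched earlier:

```python
def encode_seed(user_text):
    """Tokenize the user's input and keep only words the model has seen,
    silently dropping anything outside the vocabulary instead of crashing."""
    doc = nlp(user_text)
    return [tok.text.lower() for tok in doc
            if tok.text.lower() in word_to_idx]
```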

Where I Would Like to Take This

I think there are a lot of possibilities to expand upon this project, because language and meaning is such a complex, layered topic. You could argue that the bot has its own way of creating 'meaning', in the sense that the model I trained predicts words that it finds meaningful in relation to the phrases I ask it. Another thought is that rather than seeing my original dataset of "War and Peace" as a drawback, I could use it to my advantage, for example by feeding another bot a different novel and seeing how the language and syntax change. I could have two bots trained on two different novels talk with each other, or even have multiple bots converse in a group. I could also train the bot on my own chat history from Facebook or WeChat and have it emulate my style of speech. Interestingly, Wernicke's patients have also been tested using pictures, where they are given certain photos and asked to describe their content:

As you can see from the photo, there is more than one type of aphasia, which means I could also create a bot with Broca's Aphasia (a partial loss of the ability to produce language, but with meaning intact) and have it interact with the Wernicke's Aphasia chatbot. Or I could feed the bot an image and have it describe that image, which I think would also produce interesting results.

Ultimately, I am still new to the field of machine learning, but I aim to explore more into language processing and interactions between the user and the machine. Hopefully, I will be able to utilize more deep learning techniques to better train my model for the final project. 

Midterm Proposal: Imprinting

For my midterm project, I plan on expanding upon the curiosity aspect of the vehicle I created for the previous Braitenberg Vehicle lab. Originally, I had the bot move forward if the ultrasound sensor detected an object within a certain distance, and move away if the object came too close. However, one aspect I wanted to improve upon was that the bot only moved in a straight line, which was quite limiting in terms of interactivity with the user.

In terms of the midterm, I plan to have the bot follow an object if it appears within a specific threshold and turn accordingly, so that it feels more 'real' by actually moving towards and turning with the object. If it does not detect anything nearby, I will have the bot's servo head sweep left and right, so that it seems like it is 'searching' for something (a rough sketch of this behavior follows below).
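A rough sketch of that planned behavior; all the helper functions are hypothetical placeholders for the real servo, ultrasound, and motor calls, and the angles and thresholds are made up:

```python
# Planned "search, then follow" behavior; every helper here is a
# hypothetical placeholder for the real servo/ultrasound/motor calls.

FOLLOW_RANGE_CM = 40

def read_distance_cm():
    raise NotImplementedError   # placeholder: ultrasound reading in cm

def set_motors(left, right):
    raise NotImplementedError   # placeholder: drive the wheel motors

def set_head_angle(degrees):
    raise NotImplementedError   # placeholder: point the servo head

head_angle, sweep_step = 90, 10

while True:
    if read_distance_cm() < FOLLOW_RANGE_CM:
        set_motors(60, 60)               # something nearby: follow it
    else:
        set_motors(0, 0)                 # nothing detected: stop and "search"
        head_angle += sweep_step
        if head_angle >= 150 or head_angle <= 30:
            sweep_step = -sweep_step     # reverse the sweep at the edges
        set_head_angle(head_angle)
```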

The animal behavior that I want to mimic is imprinting, where certain animals form attachments the moment they are born, usually to the first organism they see. This is especially noticeable with ducklings: