Week 10: Style Transfer (Cassie)

For this week’s assignment, I decided to train the style transfer model with one of Jackson Pollock’s paintings:

The reason I chose to use this painting, besides the fact that I like Jackson Pollock, is that when I was considering using style transfer for my midterm project, Professor Aven mentioned that images that have bright colors and very defined shapes would work the best. While this piece doesn’t really have very defined shapes, the colors are still pretty different from each other.

After the model was trained, I put it into the styletransfer style.js code from Professor Aven’s GitHub to test the output through the webcam. This was the result:

The shapes generated were interesting, kind of like a honeycomb. The colors somewhat matched the source image, but it also seems like some new slightly different colors were generated. If I saw this image without knowing how it was made, I wouldn’t think that it had anything to do with Jackson Pollock, though.

Now…what to do with this? I was really inspired by Roman Lipski’s Artificial Muse in how he incorporates his own paintings and combines them with his algorithm so that the role of artist is split equally between human and machine. This whole style transfer process also reminded me a lot of when I was first learning how to draw and paint: my art teacher would always give us some references that we would just straight up copy to try and improve our own skills. Combining these two ideas, what would it look like if I tried to paint my own Jackson Pollock painting, and then show that painting to the Pollock-trained style transfer? What would the combination of a human replicated Pollock painting and a machine replicated Pollock painting style look like?

I first attempted (key word: attempted) to paint the Pollock painting on a small canvas:

I then held the painting up to the webcam with the trained model, which created this output:

The colors are a bit duller, and the strokes are smoother. However, the whole thing is kind of blurry and there is this faint bumpy grid pattern over the whole image. I kind of like these effects because they would be difficult to achieve with paint on canvas – they very much digitize the style.

Overall, this was an interesting experiment and I think this concept is something I would potentially want to further explore for the final project.

Midterm Writing Assignment – Ziying Wang (Jamie)

For the midterm project, I developed an interactive experimental art project: Dancing with a stranger.

Background:

Dancing with a stranger is an interactive experimental art project that requires two users to participate. The idea is that the limb movements of users A and B are detected: user A controls figure A’s arms and figure B’s legs, while user B controls figure A’s legs and figure B’s arms. The result is presented on screen as two glowing figures dancing against a dark, starry night background. Ideally, the webcam can also detect the speed of the users’ foot movements and switch through a set of songs that match different speeds.

The following photoshopped image illustrates the project. The white dots are used here to demonstrate the joints that will be detected on the users, and they will not appear in the final result.

The yellow figure’s arms and the pink figure’s legs are the movements of one user; the yellow figure’s legs and the pink figure’s arms are the movements of the other user.

Motivation:

The idea for this project was inspired by Sam Smith and Normani’s song “Dancing with a Stranger”. When I listen to this song, it gives me a picture of two people who, even though they are not familiar with each other, are bonded by the music and so create a tacit, mutual understanding. In most two-player games and interactive designs, each player takes full control of his or her character. I decided to pursue a different way: what if a person can only control half of a character, and only by cooperating with another person can they create a beautiful dance together? That’s how this idea came to mind.

Methodology

To create Dancing With a Stranger, I need the PoseNet model to detect two people and record their coordinates simultaneously but separately. The model records the left and right shoulders, elbows, wrists, hips, knees and ankles. After storing all the coordinates in variables, I draw a bezier curve through every three points to simulate the limbs of the two users. I also use the nose coordinates of the two users to represent the face positions. Once the body structure has been formed, I trace the figure’s body movement, so that when the figure moves on the screen, colored trails follow the movement. The last step was to detect the speed of the two users’ movements and switch between the fast and the slow song according to their average speed.
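The limb swap described above can be sketched as a small pure function. This is only an illustration, not the project's actual code: it assumes each pose is a plain object keyed by PoseNet part names with `{ x, y }` values, and the names `ARM_PARTS`, `LEG_PARTS`, `pickParts` and `mixFigures` are hypothetical. It also ignores the step of re-anchoring limbs to the nose position.

```javascript
// Part names follow PoseNet's keypoint labels; grouping them into
// "arms" and "legs" is an assumption made for this sketch.
const ARM_PARTS = ["leftShoulder", "leftElbow", "leftWrist",
                   "rightShoulder", "rightElbow", "rightWrist"];
const LEG_PARTS = ["leftHip", "leftKnee", "leftAnkle",
                   "rightHip", "rightKnee", "rightAnkle"];

// Copy the listed parts out of a pose of the form { partName: { x, y }, ... }.
function pickParts(pose, parts) {
  const picked = {};
  for (const part of parts) {
    picked[part] = pose[part];
  }
  return picked;
}

// Figure A gets user A's arms and user B's legs;
// figure B gets user B's arms and user A's legs.
function mixFigures(poseA, poseB) {
  return {
    figureA: { ...pickParts(poseA, ARM_PARTS), ...pickParts(poseB, LEG_PARTS) },
    figureB: { ...pickParts(poseB, ARM_PARTS), ...pickParts(poseA, LEG_PARTS) },
  };
}
```

Keeping the mixing separate from the drawing means the bezier code only ever sees one complete set of joints per figure.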

Experiments

This is the video of a single person demonstration (sound on):

This is a screenshot of the two people model:

I started by accessing two sets of body coordinates in the PoseNet model. By calling console.log(poses), I can see all the data, stored as objects inside different arrays within the poses array. PoseNet distinguishes subject by subject, which means that it groups a full set of body-part coordinates in the array for subject 1, another set for subject 2, and so on. However, it is not completely accurate when body parts of different subjects overlap. This is a huge obstacle for me, since my primary goal was to put one person’s arms on another person’s legs: unless the body parts are distinguished clearly, the effect can’t be achieved, and a person’s arms end up on his or her own legs. After I placed the bezier coordinates separately and attached them to the nose position, the model would theoretically work as I imagined. I then started a single-person demo to work on the time-lapse effect.

Originally, I thought about building arrays that would store the previous 100 coordinates of each body part and displaying the 100 beziers at the same time, each at a different opacity, but as I started to work on it, I discovered that the dataset is huge and confusing. I then considered changing the background opacity to create the fading effect, but somehow it didn’t work on the canvas: the areas covered by the trail became dark gray and remained on the canvas. I therefore decided to take the model off the canvas and build it directly in p5.js. With background(0,20), the opacity change worked in p5: the moving figure leaves a trace behind it that fades away over time.
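To get a feel for how quickly a trail fades with background(0,20), here is a rough model of the blending arithmetic. It assumes ideal alpha blending, where each frame keeps a fraction (1 - alpha/255) of the previous brightness; `trailBrightness` and `drawFigures` are hypothetical names. Real canvases round to 8-bit values, so a faint residue can remain.

```javascript
// Fraction of a stroke's original brightness left after `frames` draws,
// when every frame paints black over the canvas with the given alpha (0-255).
// Idealized: ignores 8-bit rounding of pixel values.
function trailBrightness(alpha, frames) {
  return Math.pow(1 - alpha / 255, frames);
}

// Usage inside a p5.js sketch (not runnable outside p5):
// function draw() {
//   background(0, 20);  // translucent black instead of a full clear
//   drawFigures();      // hypothetical helper that draws the beziers
// }
```

Under this model, a stroke drawn with alpha 20 drops below a tenth of its brightness after about 30 frames, so a smaller alpha gives longer trails.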

I then started to work on the nose speed. I stored the coordinates of the nose from 100 loops earlier and measured the distance between those and the current coordinates. This may not seem very accurate, since a person can move back and forth and return to the previous coordinates very quickly, but since the interval is fixed at 100 loops, the possibility of this is very low. To refine this system, I can shorten the interval and adjust the constant. To apply this to the two-person model, I only need to calculate the average of the two distances and build the conditions on that speed (defined by distance). When the average speed is above a certain number, movingFast() is executed, and otherwise the slow case runs. However, it didn’t perform well at first, because every time movingFast() ran, the song started to play, so it was constantly restarting. I then revised it to check whether the song is already playing; if it is, the program skips the play call.
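The speed measurement above can be sketched as follows. This is an illustration under assumptions rather than the original code: `NoseTracker` and `chooseSong` are hypothetical names, the threshold is a free parameter, and "speed" is simply the distance between the current nose position and the one from `interval` frames earlier.

```javascript
// Keeps the last `interval` nose positions and reports how far the nose
// has moved compared with the position `interval` frames ago.
class NoseTracker {
  constructor(interval = 100) {
    this.interval = interval;
    this.history = [];
  }

  // Record the current nose position. Returns the distance moved over the
  // interval, or null until enough frames have been collected.
  update(x, y) {
    this.history.push({ x, y });
    if (this.history.length <= this.interval) return null;
    const old = this.history.shift();
    return Math.hypot(x - old.x, y - old.y);
  }
}

// Pick a song from the average of the two users' distances.
function chooseSong(distanceA, distanceB, threshold) {
  const average = (distanceA + distanceB) / 2;
  return average > threshold ? "fast" : "slow";
}
```

One tracker per user would be updated once per draw loop, and `chooseSong` called only when both trackers return a number.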

I then applied all of these techniques to the two-person model: by calculating the average speed of the two noses, the program switches between the fast and the slow song. Both figures are drawn in yellow and pink, indicating which body parts belong to which user. When there are not exactly two people in front of the webcam, it displays a loading GIF.

The loading page:

However, because the two users’ body parts are not classified clearly, the two-person model can’t perform as well as I imagined and displays the wrong color when it misrecognizes a part. It also fails to locate the hips’ y position accurately. I suppose the project would perform better if a better camera were used for detection instead of the laptop’s webcam.

Social Impact & Artistic Influence

For Dancing With a Stranger, my goal is to bring people closer by having them interact with each other through dancing to music. For me, music and dancing are the things that break down boundaries. By combining technology (PoseNet and p5) and art, I created a new approach to entertainment. The users (currently two) are closely connected and constantly influence each other in this project, and together they control the music: the song choice depends on the users’ average moving speed. We live in a society where our lives are closely connected with technology, and many previous designs have aimed at separating humans from their technologies in order to strengthen human-to-human bonds. However, I don’t think that is the right attitude toward the booming technologies of our era; a better approach is to strengthen the bonds among people with the help of technology. With p5, I managed to create an unrealistic visual effect of body movements. With PoseNet, I get to present what happens in the physical world on the screen and mirror the movement in a non-human form. The users can therefore enjoy the process of being themselves, but not actually themselves, on a virtual platform.

Further Development

For further development, I’d like to transform my project from one based on a single PC into a platform that allows multiple users to use their own devices for PoseNet detection and mirror their images onto one public platform. The users get to see their own images on a communal screen and dance together on that communal platform while using personal devices. Preferably, the color of each user’s figure changes after the model collects and compares the speeds of all users, then assigns different colors from the fastest to the slowest figure. I’ll also add more music choices to the model and implement different ways to decide on the music. If possible, I’d also switch between different backgrounds instead of the all-black one, and improve the figures displayed on the screen by adding more p5 effects to them.

Midterm Writing Assignment – Lishan Qin

Overview

For the midterm project, I developed an interactive two-player battle game with a “Ninja” theme that allows players to use their body movements to control the actions of their characters and make certain gestures to trigger the characters’ moves and skills to battle against each other. This game is called “Battle Like a Ninjia”.

Background

My idea for this form of interaction, in which players’ physical body movements trigger the characters in the game to release different skills, is to a large extent inspired by the cartoon “Naruto”. In Naruto, most of the ninjas need to perform a series of hand gestures before they launch their powerful ninjutsu skills. However, in most existing Naruto battle games, players launch a character’s skills simply by pushing different buttons on a joystick. Thus, in my project, I want to put more emphasis on all the body preparations these characters do in the animation before they release their skills, by having the players strike different poses to trigger different moves of the character in the game. Google’s Pixel 4, which features hand-gesture interaction, also inspired me.

Motivation

I’ve always found that in most games today, the physical interaction between players and games is limited. Even though more games like “Beat Saber” and “Just Dance” are coming out with the development of VR and physical computing technology, the number of video games that give people a feeling of physical involvement is still limited. Thus, I think it will be fun to explore more ways for games and players to interact by getting rid of keyboards and joysticks and having the players use their bodies to control a battle game.

Methodology

In order to track the movement of the players’ bodies and use it as input to trigger the game, I used the PoseNet model to get the coordinates of each part of the player’s body. I first constructed the conditions that each body part’s coordinates need to meet to trigger the actions of the characters. I started by documenting the coordinates of certain body parts when a specific gesture is posed. I then set a range for each coordinate and coded it so that when all of these body parts’ coordinates are in range, a move of the character on the screen is triggered. By doing so, I “trained” the program to recognize the player’s gesture by comparing the coordinates of the player’s body parts with the pre-coded coordinates needed to trigger the game. For instance, in the first picture below, when the player puts her hands together and makes a specific hand sign like the one Naruto makes in the animation before he releases a skill, the Naruto figure in the game does the same thing and releases the skill. However, what the program recognizes is actually not the player’s hand gesture but the correct coordinates of the player’s wrists and elbows. When the Y coordinates of the player’s left and right wrists and elbows are the same, the program recognizes that and gives an output.
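The range check described above might look something like the sketch below. It is an assumption-laden illustration, not the game's code: poses are taken to be plain objects keyed by PoseNet part names with `{ x, y }` values, the function names are hypothetical, and "the same" Y coordinate is interpreted as "within a pixel tolerance", since exact equality rarely happens with webcam input.

```javascript
// True when every listed coordinate of the pose lies inside its allowed
// [min, max] range, e.g. { leftWrist: { y: [200, 260] } }.
function matchesGesture(pose, ranges) {
  return Object.entries(ranges).every(([part, axes]) =>
    Object.entries(axes).every(([axis, [min, max]]) => {
      const value = pose[part][axis];
      return value >= min && value <= max;
    })
  );
}

// The hand-sign case: both wrists and both elbows at roughly the same height.
// The tolerance (in pixels) is a made-up default for this sketch.
function isHandSign(pose, tolerance = 20) {
  const ys = [pose.leftWrist.y, pose.rightWrist.y,
              pose.leftElbow.y, pose.rightElbow.y];
  return Math.max(...ys) - Math.min(...ys) <= tolerance;
}
```

Describing each gesture as a table of ranges keeps the recognition logic in one place, so adding a new move only means adding a new table.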

Experiments

Originally, I wanted to use a hand-tracking model to train a hand gesture recognition model that could recognize hand gestures and alter the moves of the character in the game accordingly. However, I later found that PoseNet could achieve the goal just as well, and even better, so I ended up using only PoseNet. Even though it’s sometimes less stable than I hoped, it makes it possible to use more diverse body movements as input.

During the process of building this game, I encountered many difficulties. I tried using the coordinates of the ankles to make the game react to the players’ foot movements. However, due to the position of the webcam, it’s very difficult for the webcam to see the players’ ankles. The player would need to stand very far from the screen, which prevents them from seeing the game. Even when the model got the coordinates of the ankles, the numbers were still very inaccurate. The PoseNet model also proved to be not very good at telling the left wrist from the right wrist. At first I wanted the model to recognize when the player’s right hand was held high and then make the character go right. However, I found that when there is only one hand on the screen, the model is not able to tell right from left, so I had to program it to make the character go right when the player’s right wrist is higher than the left wrist.
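The wrist comparison can be sketched like this. The source only states the "right wrist higher means go right" rule; the mirrored "left" case, the `margin` dead zone and the name `moveDirection` are assumptions added for illustration. Note that image Y coordinates grow downward, so "higher" means a smaller y value.

```javascript
// Decide the character's direction by comparing wrist heights.
// pose is assumed to look like { leftWrist: { y }, rightWrist: { y } }.
function moveDirection(pose, margin = 30) {
  // Positive when the right wrist is higher on screen (smaller y).
  const diff = pose.leftWrist.y - pose.rightWrist.y;
  if (diff > margin) return "right";
  if (diff < -margin) return "left";
  return "none"; // wrists roughly level: do not move
}
```

The margin keeps the character from jittering when both wrists sit at about the same height.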

Social Impact 

This project is not only an entertaining game but also a new approach to applying machine learning technology in the design of interactive games. I hope this project can not only bring joy to the players but also show that the interaction between games and players is not limited to keyboards or joysticks. By using the PoseNet model, I hope this project lets people see the great potential that machine learning can bring to game design in terms of diversifying the interaction between players and games, and also raises their interest in learning more about machine learning through a fun and interactive game. Even though most games today still focus on joysticks, mice, or keyboards, which is not necessarily a bad thing, I hope that in the future, with the help of machine learning, more and more innovative ways to interact with games will become possible. I hope people can find inspiration in my project.

Further Development

If given more time, I will first improve the interface of the game. It was brought to my attention during user testing that many players forgot the gestures they need to make to trigger the character’s skills, so I might need to include an instruction page on the web. In addition, I will try to make more moves available in response to the player’s gestures to make this game more fun. I was also advised to create more characters that players can choose from. So perhaps in the final version of this game, I will apply a style transfer model and ask it to generate different characters and battle scenes to diversify the players’ choices.

Midterm Documentation — Crystal Liu

Background

My inspiration is a project called Teachable Machine. Its model can be trained in real time. The user makes a set of similar poses as the input for each class, with a maximum of three classes. Each pose corresponds to an image or GIF. After the dataset is set up, whenever the user makes one of the three poses, the corresponding result appears.

For me the core idea is excellent, but the form of output is a little limited. There are also some projects about motion and music or other sound.

So I want to add audio as output. The sounds of different musical instruments are also really artistic, and people are familiar with them. Thus, my final idea is to let users trigger the sound of an instrument by miming playing it.

Motivation

My expected midterm project is an interactive virtual instrument. First, the trained model identifies the differences in how musical instruments are played. Once it has a result, it plays the sound of the corresponding instrument. A picture of the instrument also appears on the screen around the user.

For example, if the user pretended to be playing guitar, the model would recognize the instrument as a guitar and automatically play the sound of a guitar. An image of a guitar would then appear on the screen. The expected result is that it looks like the user is really playing the guitar on the screen.

Methodology

To achieve this, I need technology that can locate each part of my body, identify different poses, and classify them automatically and immediately. Based on what we covered previously, I decided to use PoseNet for the locating part.

I plan to put a button on the camera canvas so that users don’t need to click the mouse to input information and the interaction is more natural. To achieve this, I need to set a range for the coordinates of my hand. When I lift my hand into the range, the model starts receiving the input image, and 3 seconds later it automatically stops. The next time the user makes a similar pose, the model gives the corresponding output. KNN is a traditional classification algorithm, so it can be used to let the model classify the poses in a short time and achieve real-time training.

Experiment

Virtual button

The first step in my project is to replace the buttons that users press with the mouse with virtual buttons triggered by a human body part, for example, the wrist. To do this, I searched for a button image and found the following GIF to represent the virtual button. To avoid accidental touches, I put the buttons at the top of the screen, as in the following picture.

I used PoseNet to get the coordinates of the user’s left and right wrists and then set a range for each virtual button. If the user’s wrist approaches a button, the button changes into a GIF showing one of the instruments (guitar, drum and violin). These GIFs act as feedback to let users know they have successfully triggered the button.
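A virtual button boils down to a rectangle hit test on the wrist position. The sketch below assumes a wrist object of the form `{ x, y, confidence }` and a button described by `{ x, y, w, h }` in canvas pixels; `isButtonTriggered` is a hypothetical name, and the 0.5 default mirrors the score threshold mentioned under Problems.

```javascript
// True when the wrist is detected confidently enough and lies inside the
// button's rectangle. Coordinates are canvas pixels.
function isButtonTriggered(wrist, button, minScore = 0.5) {
  if (wrist.confidence < minScore) return false; // ignore shaky detections
  return wrist.x >= button.x && wrist.x <= button.x + button.w &&
         wrist.y >= button.y && wrist.y <= button.y + button.h;
}
```

For example, with a guitar button at `{ x: 40, y: 10, w: 80, h: 80 }`, a wrist at (60, 50) with confidence 0.9 triggers it, while the same position with confidence 0.3 does not. Leaving gaps between the button rectangles reduces the chance of one wrist hitting two buttons at once.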

After that, the model should automatically record the video as the dataset. In the original example, each time the user presses a button, five examples are added to class A. For my virtual button, recording should start once the user triggers the button. However, I need a delay to give users time to put their hands down and prepare to play a musical instrument, because the model shouldn’t include images of users lowering their hands in the dataset. So I set a 3-second delay for the users. But collecting examples is discontinuous if I keep raising my hand and dropping it.
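One way to express the delay-then-record behaviour is a small timer object that the draw loop consults every frame. This is a sketch under assumptions, not the original code: `createRecorder` is a hypothetical helper, times are in milliseconds (for example from p5's millis()), and the 3-second recording window is taken from the description in the Methodology section.

```javascript
// Returns an object that tracks one trigger -> wait -> record cycle.
function createRecorder(delayMs = 3000, recordMs = 3000) {
  let triggeredAt = null;
  return {
    // Call when the virtual button is hit; ignored while a cycle is running.
    trigger(now) {
      if (triggeredAt === null) triggeredAt = now;
    },
    // Call every frame; true only during the recording window.
    shouldRecord(now) {
      if (triggeredAt === null) return false;
      const elapsed = now - triggeredAt;
      if (elapsed >= delayMs + recordMs) {
        triggeredAt = null; // cycle finished, wait for the next trigger
        return false;
      }
      return elapsed >= delayMs;
    },
  };
}
```

Because repeated triggers are ignored during a cycle, the user can lower the hand without restarting the countdown.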

Sound

The second step is to add audio as the output. At first, I wrote that if the classification result is A, the song plays (song.play();). But the result was that the song was started a thousand times in one second, so I could only hear noise, not the sound of a guitar. I asked Cindy and Tristan for help, and they suggested the following method: if the result is A and the song is not playing right now, play the song. Finally, it worked. There was only one sound at a time.
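The fix suggested above amounts to a guard around play(). A minimal sketch, assuming `song` is a p5.sound SoundFile (which provides isPlaying() and play()); the wrapper name `playOnce` is made up for illustration.

```javascript
// Start the song only if it is not already playing.
// Returns true when playback was actually started.
function playOnce(song) {
  if (song.isPlaying()) return false;
  song.play();
  return true;
}
```

Calling `playOnce(guitarSong)` on every classification result is then safe, because repeated calls while the sound is running do nothing.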

UI

The third step is to beautify the UI of my project. First is the title: Virtual Instrument. I made a rectangle as the border and added an image to decorate it. It took some time to shrink the border to a smaller size. I also added a shadow to the words and added 🎧🎼 to emphasize music.

Then I added some GIFs that show the connection between body movement and music. They are beside the camera canvas.

At last I added the border to the result part:

Problems

The problems I found in the experiment are as follows:

  1. The process of recording and collecting examples is discontinuous and often gets stuck. The advantage is that the user can tell whether the collection has ended by seeing whether the picture is smooth or stuck. The stuck image may also have something to do with my computer.
  2. Sometimes the user might touch two buttons at the same time, and it is hard for me to avoid this situation in code. So I just changed the range of each button to widen the gap between them.
  3. I set up a button to start predicting, but it was hard for the model to catch the coordinates of the left wrist, so sometimes it took the user a long time to start predicting. I therefore lowered the score threshold from 0.8 to 0.5 to improve this.
  4. Once the user pressed the start button, the drum sound would play even though the user didn’t do anything. This confused me. Maybe it is because KNN has no result for input that doesn’t belong to any class: the model can only pick the most likely class for the input and play the corresponding sound.
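One possible workaround for problem 4, offered only as a sketch: train an extra "idle" class on frames where the user does nothing, and ignore predictions that are either idle or not confident enough. The function name, the "idle" label and the 0.8 cutoff are all assumptions; the input is assumed to be an object mapping each label to a confidence between 0 and 1, similar to the per-label confidences a KNN classifier reports.

```javascript
// Pick the instrument to play, or null when nothing should sound.
function pickInstrument(confidencesByLabel, minConfidence = 0.8, idleLabel = "idle") {
  let best = null;
  for (const [label, confidence] of Object.entries(confidencesByLabel)) {
    if (best === null || confidence > best.confidence) {
      best = { label, confidence };
    }
  }
  if (best === null) return null;                  // no classes at all
  if (best.label === idleLabel) return null;       // user is doing nothing
  if (best.confidence < minConfidence) return null; // too uncertain
  return best.label;
}
```

A confidence cutoff alone may not be enough, since a still frame can sit close to the examples of one class; the explicit idle class gives such frames somewhere to go.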

Therefore, the next step is to solve these problems and enrich the output. For example, I can add more kinds of musical instruments, and the melody can change according to the speed of the body movement.

Social impact

My goal is to create a project that lets people interact with AI in an artistic and natural way. They can make the sound of a musical instrument without having a real physical one. It is also a way to combine advanced technology and everyday art, and it provides an interesting and easy way to help people learn about and experience artificial intelligence in their daily lives. In short, it provides a new way of interaction between humans and computers. I think this kind of interaction could also be used to display a project in a museum, as a medium that bridges the viewers and the project.

Further development

The next step is to solve the problems that came up in the experiment and enrich the output. I also need to fully utilize the advantage of real-time training in my final project. My idea is to give users more opportunities to show their thoughts and creativity by letting them decide which kind of input triggers which kind of output. I also want to add a style transfer model to enrich the visual output, so that the style of the canvas changes as the mood of the melody changes. For example, if the user chooses a romantic style of melody, the canvas could turn pink and pink bubbles could appear on it. But the most essential problem is how to let users create their own ways of expression through real-time training. How to make the interaction smoother is also important for the final project. In addition, I want to use sound classification to play the role of the virtual button. On the one hand, users can create their own sound commands to control the model. On the other hand, this can avoid the problems that came up in my previous experiment. But I am worried that the sound classification model may not work accurately enough, so that the result falls short of my expectations.

Week 10 Assignment: Train & inference style transfer

1) Train your own style model with the knowledge covered in class and 2) develop a simple project (could be an exploration of ideas for your final) utilizing ml5.js style transfer, and document your work on the class blog.

  • Train your own style model and document your successful attempts or temporary failures. Please include your source style image and some sample pairs of source and styled images.
  • Develop a simple project utilizing the style transfer model you obtained.
  • Post it on the IMA blog before Friday Midnight, 15th with tag: aiarts10