MLNI – Final Project Concept – Dr. Manhattan (Lishan)

Overview

For the final project, I'm planning to build an educational game that tests users' biology knowledge of the human body by asking them to reassemble their own body parts. It's called "Dr. Manhattan".

Inspiration

The project was largely inspired by the superhero character "Dr. Manhattan" from "Watchmen". Jonathan Osterman, aka Dr. Manhattan, was a researcher at a physics lab. One day an experiment went wrong and the lab machine tore his body into pieces. However, in the following months, a series of strange events happened in the lab. It turned out that Dr. Manhattan's consciousness had survived the accident and was progressively re-forming his body: first as a disembodied nervous system including the brain and eyes, then as a circulatory system, then as a partially muscled skeleton, until finally he managed to rebuild himself as a whole person. So, in my project the user becomes the consciousness of Dr. Manhattan that survived the accident and has to rebuild his body from its parts.

How it works

At the beginning of the game, the player can see all the body organs in their correct positions on their body on the screen. The player can then press a key to trigger an explosion that tears apart the body shown on the screen and scatters the body parts. I will train a style transfer model and apply it here to make the image look less disturbing and cooler. The player will then use their hands to retrieve the scattered body parts and organs and assemble them correctly to rebuild their body. I will use PoseNet to track the position of their body and "consciousness", and to calculate the correct position where each organ should be placed.
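
Below is a minimal sketch of how the PoseNet part could work, assuming ml5.js inside a p5.js sketch. The organ ("heart"), the offset from the shoulders, and the 30-pixel snap radius are placeholder values for illustration, not final design decisions.

```javascript
// Sketch of the PoseNet-based organ placement check (ml5.js + p5.js assumed).
let video, poseNet, pose;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));
  poseNet.on('pose', (results) => {
    if (results.length > 0) pose = results[0].pose;
  });
}

// Look up a named keypoint (e.g. 'leftShoulder') in the current pose.
function keypoint(part) {
  return pose.keypoints.find((k) => k.part === part).position;
}

function draw() {
  image(video, 0, 0, width, height);
  if (!pose) return;

  // Target position for the "heart": roughly between the shoulders.
  const ls = keypoint('leftShoulder');
  const rs = keypoint('rightShoulder');
  const heartTarget = createVector((ls.x + rs.x) / 2, (ls.y + rs.y) / 2 + 40);

  // The player "carries" an organ with their right wrist; when it is close
  // enough to the target, the organ snaps into place.
  const wrist = keypoint('rightWrist');
  if (dist(wrist.x, wrist.y, heartTarget.x, heartTarget.y) < 30) {
    text('Heart placed!', 20, 30);
  }
  ellipse(heartTarget.x, heartTarget.y, 20, 20);
}
```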

Machine Learning techniques used:

  • PoseNet: to track the player's position and calculate the correct positions the organs need to be placed in
  • Style Transfer: to make the exploded-body image look less disturbing and cooler

Week 10 Assignment: Train & inference style transfer — Lishan Qin

For this week’s assignment, I trained a style transfer model with the painting below.

The biggest difficulty I met when training this model was that the internet was extremely unstable, so I failed again and again when downloading the model for training. I tried at least 10 times and finally managed to download it at 2 a.m.… Other than that, the procedure was smooth, and with Aven's help I finally have a general understanding of what those commands do. The output of the trained model is as follows.

(The style transfer can be triggered by a loud sound.)

It was only after I saw the output that I realized I had probably chosen the wrong picture to train on. Since the image is black and white, so is the output, which makes it hard for me to identify similar patterns. Originally I wanted the output image to have a line-drawing pattern similar to the input's. However, I think such detailed imitation requires more training. I should have chosen an image with an obvious color pattern that would be easier to observe in the output image… Still, I guess the pattern of black, white, and gray lines shown in the output is somewhat noticeable, even though it's not as obvious as I hoped.

Overall, it was a very interesting experiment. It helped me a lot in understanding how the style transfer process works and let me get hands-on experience training a model. I also tried using different signals to trigger the style change on the web page, such as p5's audio input: the style of the webcam image changes when the mic reaches a certain volume. I also hope I can apply this process of training a style transfer model in my final project. The style transfer model could be used to generate different characters or battle scenes with the same style and theme for my battle game.
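
Here is a minimal sketch of the sound-triggered version, assuming the ml5.js style transfer API and p5.sound. The model folder name 'models/my-style' and the 0.2 volume threshold are placeholders.

```javascript
// Sketch: switch to a trained ml5 style transfer model when the mic gets loud.
let video, style, mic, resultImg;
let styled = false;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(320, 240);
  video.hide();
  mic = new p5.AudioIn();
  mic.start();
  resultImg = createImg('', 'styled frame');
  resultImg.hide();
  style = ml5.styleTransfer('models/my-style', video, () => console.log('style model ready'));
}

function transferLoop() {
  style.transfer((err, result) => {
    if (!err) resultImg.attribute('src', result.src);
    if (styled) transferLoop();          // keep stylizing frames while the effect is on
  });
}

function draw() {
  // A loud sound switches the stylized view on.
  if (!styled && mic.getLevel() > 0.2) {
    styled = true;
    transferLoop();
  }
  image(styled ? resultImg : video, 0, 0, width, height);
}
```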

Midterm Writing assignment —— Lishan Qin

Overview

For the midterm project, I developed an interactive two-player combat game with a "Ninja" theme that allows players to use their body movements to control the characters and make certain gestures to trigger the characters' moves and skills to battle against each other. The game is called "Battle Like a Ninja".

Background

My idea for this form of interaction between the players and the game, which uses the players' physical body movements to trigger the characters' skills, is to a large extent inspired by the cartoon "Naruto". In Naruto, most ninjas need to perform a series of hand gestures before they launch their powerful ninjutsu skills. However, in most of the existing Naruto battle games today, players launch a character's skills simply by pushing different buttons on a joystick. Thus, in my project, I want to put more emphasis on all the body preparations these characters do in the animation before they release their skills, by having the players pose different body gestures to trigger different moves of the character in the game. Google's Pixel 4, which features hand-gesture interaction, also inspired me.

Motivation

I've always found that in most games today, the physical interaction between players and the game is limited. Even though, with the development of VR and physical computing technology, more games like "Beat Saber" and "Just Dance" are coming out, the number of video games that can give people a feeling of physical involvement is still small. Thus, I think it will be fun to explore more possibilities for diversifying the interaction between the game and the players by getting rid of keyboards and joysticks and having the players use their bodies to control a battle game.

Methodology

In order to track the movement of the players' bodies and use it as input to the game, I utilized the PoseNet model to get the coordinates of each part of the player's body. I first constructed the conditions that each body part's coordinates need to meet to trigger the actions of the characters. I started by documenting the coordinates of certain body parts when a specific gesture is posed. I then set a range for those coordinates and coded the program so that when these body parts' coordinates all fall within the range, a move of the character on the screen is triggered. By doing so, I "trained" the program to recognize the player's body gesture by comparing the coordinates of the player's body parts with the pre-coded coordinates needed to trigger the game. For instance, in the first picture below, when the player puts her hands together and makes a specific hand sign like the one Naruto does in the animation before he releases a skill, the Naruto figure in the game does the same thing and releases the skill. However, what the program actually recognizes is not the player's hand gesture, but the relative coordinates of the player's wrists and elbows: when the y-coordinates of both the left and right wrists and elbows are roughly the same, the program recognizes the sign and gives an output.
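
A minimal sketch of that gesture check, assuming ml5's PoseNet keypoint format; the 30-pixel tolerance and the releaseSkill() function are illustrative placeholders, not the exact values used.

```javascript
// Return the y-coordinate of a named keypoint in a PoseNet pose object.
function keypointY(pose, part) {
  return pose.keypoints.find((k) => k.part === part).position.y;
}

// The hand sign counts as detected when both wrists and both elbows
// sit within a narrow horizontal band of each other.
function handSignDetected(pose) {
  const ys = ['leftWrist', 'rightWrist', 'leftElbow', 'rightElbow']
    .map((part) => keypointY(pose, part));
  return Math.max(...ys) - Math.min(...ys) < 30;
}

// Inside poseNet.on('pose', ...):
//   if (poses.length > 0 && handSignDetected(poses[0].pose)) releaseSkill();
```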

Experiments

Originally, I wanted to use a hand-tracking model to train a hand-gesture recognition model that could recognize hand gestures and alter the moves of the character in the game accordingly. However, I later found that PoseNet fulfilled my goal just fine, and even better, so I ended up using only PoseNet. Even though it's sometimes less stable than I'd hoped, it makes using diverse body movements as input possible.

During the process of building this game, I encountered many difficulties. I tried using the coordinates of the ankles to make the game react to the players' feet movements. However, due to the position of the webcam, it's very difficult for the camera to see the players' ankles: the player would need to stand very far from the screen, which prevents them from seeing the game, and even when the model did get the ankle coordinates, the numbers were still very inaccurate. The PoseNet model also proved to be not very good at telling the right wrist from the left wrist. At first I wanted the model to recognize when the player's right hand was held high and then make the character go right. However, I found that when there is only one hand on the screen the model cannot tell right from left, so I had to program it to recognize that when the player's right wrist is higher than their left wrist, the character should go right (sketched below).
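
A small sketch of that workaround, using the same PoseNet keypoint lookup as above; the 20-pixel margin is an assumed value added to avoid jitter.

```javascript
// Compare the two wrists against each other instead of trusting left/right labels.
// Smaller y means higher on screen.
function moveDirection(pose) {
  const right = pose.keypoints.find((k) => k.part === 'rightWrist').position;
  const left = pose.keypoints.find((k) => k.part === 'leftWrist').position;
  if (right.y < left.y - 20) return 'right';   // right wrist held clearly higher
  if (left.y < right.y - 20) return 'left';
  return 'stay';
}
```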

Social Impact 

This project is not only an entertainment game, but also a new approach to applying machine learning to interactive game design. I hope it can not only bring joy to players, but also show that the interaction between a game and its players is not limited to keyboards or joysticks. By using the PoseNet model in my project, I hope to let people see the great potential machine learning can bring to game design in terms of diversifying the interaction between players and games, and to raise their interest in learning more about machine learning through a fun and interactive game. Even though most games today still focus on joysticks, mice, or keyboards, which is not necessarily a bad thing, I hope that in the future, with the help of machine learning, more and more innovative ways to interact with games will become possible. I hope people can find inspiration in my project.

Further Development

If given more time, I will first improve the interface of the game, since it was brought to my attention during the user test that many players often forgot the gestures they needed to make to trigger the character's skills; I might need to include an instruction page on the web page. In addition, I will try to make more moves available in response to the players' gestures to make the game more fun. I was also advised to create more characters for players to choose from. So perhaps in the final version of this game, I will apply a style transfer model and ask it to generate different characters and battle scenes to diversify the players' choices.

Assignment: Innovative Interface with KNN —— Lishan Qin

Overview

For this week's assignment, I utilized a KNN model to develop an interactive musical program. Users can trigger songs from various musicals with different body gestures. For instance, when they cover half of their face, the music changes to "The Phantom of the Opera"; when they pose their hands like a cat's claws, the music changes to "Memory" from Cats; and when they wear anything green, the song changes to "Somewhere Over the Rainbow" from The Wizard of Oz. The model I used also allows users to train their own dataset, which improves the accuracy of the output.
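
A minimal sketch of this setup, assuming ml5's featureExtractor and KNNClassifier plus p5.sound; the labels, key bindings, and audio file paths are placeholders for the actual excerpts.

```javascript
// KNN gesture classifier that switches between musical excerpts.
let video, featureExtractor, knn;
const songs = {};

function preload() {
  songs.phantom = loadSound('assets/phantom.mp3');   // placeholder paths
  songs.cats = loadSound('assets/memory.mp3');
  songs.oz = loadSound('assets/rainbow.mp3');
}

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.hide();
  featureExtractor = ml5.featureExtractor('MobileNet', () => console.log('MobileNet ready'));
  knn = ml5.KNNClassifier();
}

// Press 1/2/3 to add training examples for the current gesture, 'c' to start classifying.
function keyPressed() {
  const features = featureExtractor.infer(video);
  if (key === '1') knn.addExample(features, 'phantom');
  if (key === '2') knn.addExample(features, 'cats');
  if (key === '3') knn.addExample(features, 'oz');
  if (key === 'c') classify();
}

function classify() {
  knn.classify(featureExtractor.infer(video), (err, result) => {
    if (err) return console.error(err);
    playOnly(result.label);
    classify();                                      // keep classifying continuously
  });
}

// Play the song for the detected label and stop the others.
function playOnly(label) {
  for (const name in songs) {
    if (name === label && !songs[name].isPlaying()) songs[name].play();
    if (name !== label) songs[name].stop();
  }
}

function draw() {
  image(video, 0, 0, width, height);
}
```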

Demo

Technical Problem

At first I wanted to use both the users' speech and movement as input to trigger different outputs; however, I had some difficulties combining the two models. Still, I think it would be even cooler if users could interact with the musicals with both their singing and their dancing.

Midterm Project Documentation —— Lishan Qin

Overview 

For the midterm project, I utilized PoseNet and p5.js to develop an interactive dance game that requires players to move their whole body according to the signs that show up on screen, making their body dance to the rhythm of the music while imitating the moves of the Pikachu on screen. I named this project "Dance Dance Pikachu".

Process of Development 

Different signs have different meanings in the game: red means left wrist, blue means right wrist, and a solid circle requires the player to clap their hands. In order for players to dance to the rhythm of the music, the signs shown on the screen must always follow the beat of the music. To accomplish this, I used new p5.AudioIn() to get the volume of the music playing, and used this value as the condition for the signs to show up at different times but always on the beat: only when the volume is higher than a certain threshold will a sign show up. By doing so, the dancer can always dance to the rhythm of the music if he/she follows the signs correctly. I also used the frameCount value when arranging the coordinates at which the signs show up. For instance, in the first 10 seconds of the song, during which frameCount is always under 650, the signs show up at different coordinates than they do when frameCount reaches 700. By varying the coordinates at which the signs appear, I'm able to diversify the dance moves in the game.
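
A minimal sketch of that sign-timing logic, assuming p5.sound; the 0.3 volume threshold and the sign coordinates are illustrative values, not the ones used in the actual game.

```javascript
// Signs only appear when the music's volume crosses a threshold,
// and their position depends on frameCount (i.e. where we are in the song).
let mic;

function setup() {
  createCanvas(640, 480);
  mic = new p5.AudioIn();
  mic.start();
}

function draw() {
  background(0);
  const level = mic.getLevel();          // rough stand-in for the beat
  if (level > 0.3) {
    if (frameCount < 650) {
      drawSign(100, 200, 'leftWrist');   // choreography for the first part of the song
    } else {
      drawSign(500, 150, 'rightWrist');  // later choreography
    }
  }
}

function drawSign(x, y, part) {
  fill(part === 'leftWrist' ? 'red' : 'blue');
  ellipse(x, y, 40, 40);
}
```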

I used PoseNet to get the coordinates of the players' wrists and built a score-counting system: the more the player's movements match the signs, the higher his/her score will be. I also rely on the AudioIn() value to check whether the player claps at the moments he/she is supposed to. The player also gets a "Perfect" signal when he/she hits a sign correctly. I also tried my best to make the moves of the dancer match the moves of the Pikachu on the screen. I did this by tuning the conditions (the frameCount values) and the positions at which the signs show up so that they match the posture of the Pikachu dancing in the middle of the screen. By doing so, the player is able to dance like the dancing Pikachu.
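
A small sketch of the scoring check, using the same PoseNet keypoint format as before; the 40-pixel hit radius, the +10 score increment, and the clap threshold are assumed values.

```javascript
// Compare the tracked wrist with the active sign's position and award points.
let score = 0;

function checkHit(pose, sign) {          // sign = { x, y, part: 'leftWrist' | 'rightWrist' }
  const wrist = pose.keypoints.find((k) => k.part === sign.part).position;
  if (dist(wrist.x, wrist.y, sign.x, sign.y) < 40) {
    score += 10;
    return 'Perfect';
  }
  return null;
}

// Clap detection reuses the mic: a loud spike while a solid sign is active counts as a clap.
function clapDetected(level) {
  return level > 0.4;
}
```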

(dancer: Ziying Wang)

Technical issue & Reflection

As was pointed out by the guest critics during my project presentation, the audioIn() mic value sometimes throws off the timing of the signs showing up. I should indeed try to catch the beats of the music with another value to make these signs more stable and better matched to the music. The other main technical issues I met when developing this project were all due to the instability of PoseNet. The coordinates it provides are sometimes erratic, which makes it difficult for the game to return output in a timely manner. Sometimes, even if the player's hands are in the correct position, the game still shows that the player missed the sign, because the model fails to recognize the wrist (sometimes due to the player's quick movement). In addition, the model takes a variable amount of time to load. I added a countdown session before the game because I don't want the game to start before the model has even loaded. Originally, I wanted to use uNet to show the player's figure behind the Pikachu image, so that players could see whether their movements match the signs. However, when I used both PoseNet and uNet, the models took even longer to load, the game got stuck more often, and the sound that comes with uNet also interfered with the AudioIn() value the game needs to work properly.

If given more time, I will try to solve the problems mentioned above. I still think it would be more fun and easier for players if they could see their figure behind the Pikachu image and know where their hands are on the screen, while uNet would keep a sense of mystery by not showing the players their true identity. In addition, I will try to make the interface look better and diversify the dancing moves so that more dancing postures are available.