The goal for this week was to train the model using the given code. I ran the training program following the given instructions, and I had to repeat the dataset download a couple of times since it kept failing. The qstat command proved very useful for checking whether my training job was still running, since the training took a couple of tries. Like others, I was unable to finish training because the Intel AI Cloud caps job runtime at 24 hours. So, I modified the code to skip training and only run the checkpoint conversion to a format that ml5.js supports; alternatively, you could reduce the number of epochs to shorten training time. I was able to successfully get the model, which appears to be just a set of weights compiled by the program. I trained two models, but both are incomplete and would likely need more training to produce better results.
Style Image:
Input Images:
Output Images:
I find this form of machine learning technology really cool, since it combines art and computer science. This would not have been possible just a few years ago. I have noticed that Google Photos sometimes gives me stylized suggestions for photos, and I suspect a variation of this model is used in their implementation of style transfer. I noticed artifacts and strange patterns in the output, so the model might require further training or modifications. Either way, I enjoyed the unique style it generated, and I look forward to seeing more machine-generated art in the future.
For my midterm project, I wanted to make a complete game using the machine learning models discussed in class. I used PoseNet and handtrack.js as control inputs for my games. These models were a fun way to experiment with new input methods; I think there are many real-world possibilities that could be paired with them. I originally wanted to recreate Flappy Bird in 2D or 3D, but I changed my mind and tried making different games that were better suited to the PoseNet model I was using. I used the three.js library recommended by Professor Moon to make the Fruit Ninja game more realistic and interactive. I chose these games because they have simple controls and are recognizable to people of all ages.
My inspiration for this project was an article on gesture-based games using hand tracking. I love Flappy Bird, so I built it and tested my concept, but I found out pretty early that this would be impractical: Flappy Bird is very difficult to play even with touch controls alone. The user would get a concussion bobbing at the rate required to keep the bird in the air. Chrome's dinosaur game seemed like a better alternative.
To make a simple game, you can use a game loop that draws and updates the position of each object every frame. For this game, I used p5.js to draw rectangles and the video output, as well as to render images and text. I stored a reference to each obstacle in an array so they could all be updated as the game progressed. Every 100 frames, the game would spawn a new obstacle and dispose of older ones that had moved past the edge of the screen. To detect when a player is jumping in front of the camera, we can estimate the vertical velocity between two tracked points with (prevY – currentY) / 2. To make the dinosaur jump, I added the jump's acceleration to the velocity, and added the velocity to the y coordinate every frame. To check for collisions with cacti, I used rectangular hitboxes.
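The jump physics and hitbox check can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not my actual sketch; the gravity, jump velocity, and ground values are made up for the example, and y grows downward as in p5.js:

```javascript
// Illustrative constants, not the tuned values from the real game.
const GRAVITY = 1.5;        // added to velocity each frame, pulls the dino down
const JUMP_VELOCITY = -20;  // upward kick applied when a jump is detected
const GROUND_Y = 300;       // y coordinate of the ground line

function stepDino(dino) {
  // velocity accumulates gravity, position accumulates velocity
  dino.vy += GRAVITY;
  dino.y += dino.vy;
  if (dino.y > GROUND_Y) {  // landed: clamp to the ground and stop falling
    dino.y = GROUND_Y;
    dino.vy = 0;
  }
  return dino;
}

// Simple axis-aligned rectangle overlap test for the cactus hitboxes.
function hitsCactus(dino, cactus) {
  return dino.x < cactus.x + cactus.w &&
         dino.x + dino.w > cactus.x &&
         dino.y < cactus.y + cactus.h &&
         dino.y + dino.h > cactus.y;
}
```

In the real sketch, `stepDino` would run inside p5.js's `draw()` loop and `JUMP_VELOCITY` would be applied when the nose's vertical velocity crosses a threshold.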
I experimented with Fruit Ninja before my presentation, but I was unable to get it working despite hours of research. The game is built from three layers: a transparent p5.js layer, the game canvas using three.js, and a background/flash layer. Later, I went back and modified my code to use raycasting, thanks to Professor Moon. This worked wonders, and I was able to map a 2D screen point into the 3D scene without too much difficulty. I did have to convert the point to a Vector2 in normalized device coordinates, where the x component is (screenX / window.innerWidth) * 2 – 1.
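The screen-to-NDC conversion can be written as a small pure function. Width and height are passed in instead of reading `window.innerWidth` directly, so the sketch works outside a browser; the function name is my own:

```javascript
// Convert a screen-space point (pixels, origin top-left) to the
// normalized device coordinates (-1..1) that three.js raycasting expects.
function screenToNDC(screenX, screenY, width, height) {
  return {
    x: (screenX / width) * 2 - 1,
    // screen y grows downward but NDC y grows upward, so the sign flips
    y: -(screenY / height) * 2 + 1,
  };
}
```

The resulting pair is what gets wrapped in a `THREE.Vector2` and handed to `raycaster.setFromCamera(coords, camera)`.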
However, the more challenging part was the physics, since three.js does not have a built-in physics engine. I reused what I learned from the dinosaur jump to calculate each object's position. When the raycaster detected a hit, the game would remove that fruit from the scene and launch another one. Fruits are randomly generated each time this function is called, and if the object is a bomb and the user hits it, the game ends.
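A hand-rolled projectile update covers what a physics engine would otherwise do here. The launch speeds, gravity, and bomb probability below are illustrative values I picked for the sketch, not the game's real tuning:

```javascript
const FRUIT_GRAVITY = -0.02;  // per-frame pull back toward the ground

// Spawn a fruit below the visible area with a random sideways drift.
function launchFruit() {
  return {
    x: Math.random() * 4 - 2,        // random horizontal start in [-2, 2)
    y: -3,                           // start below the camera's view
    vx: Math.random() * 0.04 - 0.02, // slight sideways drift
    vy: 0.12,                        // upward launch speed
    isBomb: Math.random() < 0.2,     // roughly 1 in 5 spawns is a bomb
  };
}

// Advance one frame: gravity changes velocity, velocity changes position.
function stepFruit(f) {
  f.vy += FRUIT_GRAVITY;
  f.x += f.vx;
  f.y += f.vy;
  return f;
}
```

Each frame the game steps every live fruit, and a raycaster hit on a fruit calls `launchFruit` again to keep the screen populated.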
Challenges
I faced several challenges while making these two games. I was unable to get more than one object on screen at a time with a single Collada loader; I used a constructor to create a new Fruit() object each time, but only one would update at a time. I should also implement some kind of input smoothing, since PoseNet's keypoints are unreliable and jump all over the place. As for the dinosaur game, the hitboxes can be somewhat inaccurate and need to be adjusted based on each obstacle's size.
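One candidate for that smoothing is an exponential moving average over each keypoint coordinate. This is a sketch of the idea, not code from the project; `ALPHA` is a tuning knob I chose for illustration (lower values smooth more but lag more):

```javascript
const ALPHA = 0.3;  // weight given to the newest raw sample

// Returns a stateful smoother: feed it raw PoseNet coordinates,
// get back an exponentially weighted moving average.
function makeSmoother() {
  let last = null;
  return function smooth(raw) {
    last = last === null ? raw : ALPHA * raw + (1 - ALPHA) * last;
    return last;
  };
}
```

In the game you would create one smoother per tracked coordinate (e.g. the nose's x and y) and pass every PoseNet reading through it before moving anything on screen.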
Conclusion
If I could make changes, I would use handtrack.js instead of PoseNet for hand tracking, since it is more specialized, more efficient, and more accurate for my use case. These games also consume a lot of computing resources and need to be optimized.
I learned a lot about how three.js works and hope to use it in future projects. Game design is quite interesting, and I now really respect game developers for their work. Even with a high-level library like three.js, it takes a lot of work to make even a simple game. PoseNet has many applications, and I think we will see a lot more of it in the future.
For my midterm project, I want to make a more complete game using the ml5.js PoseNet model. I really enjoy experimenting with new input methods, and I think there are many possibilities that can be paired with this model alone. Using PoseNet's facial keypoints, I hope to create a more natural and intuitive way to interact with a game. I want to recreate Flappy Bird in 2D or 3D; instead of tapping the screen, the user will avoid obstacles by bobbing up and down. If I have enough time, I will use the three.js library recommended by Professor Moon to make the game first-person or add multiplayer capability. I chose this game because it has simple controls, uses few computer resources, and is easy to recreate in JavaScript. I believe there is potential for games to include more natural input as machine learning gets better.
For this week's assignment, the goal was to train the model for the highest accuracy. Since the code was already given to us, I just tweaked the three variables batch_size, num_classes, and epochs to get the best result. From my understanding, batch_size controls the number of training examples processed in one forward/backward pass, and one epoch is a full iteration through the dataset. I'm not sure if I was supposed to change num_classes, but I experimented with it anyway.
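These two parameters relate through a simple back-of-the-envelope calculation: the number of gradient updates a run performs is the number of batches per epoch times the number of epochs. A sketch, using a hypothetical dataset size of 50,000 examples (the function and figures are mine, not from the assignment code):

```javascript
// Rough count of optimizer steps for a training run.
function gradientUpdates(numExamples, batchSize, epochs) {
  const stepsPerEpoch = Math.ceil(numExamples / batchSize); // batches per epoch
  return stepsPerEpoch * epochs;
}
```

So halving the batch size roughly doubles the number of updates per epoch, which is one reason batch size and epoch count trade off against each other in both accuracy and wall-clock time.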
Process
I modified the code so that I could call a single function, test_model, to run the models overnight. I ran it for about 6 hours and was able to train 16 different models with various parameters. I used control variables, modifying only one variable at a time, to understand the effect of each on accuracy. However, this approach has limits, because accuracy is not a simple linear function of each hyperparameter and the variables interact. I wrote the batch size, epochs, and number of classes, along with the accuracy and loss, to a text file for later analysis. In hindsight, I wish I had also logged how long each model took to train.
Below are the results I obtained, as a table and a line graph. Some interesting relationships seem to emerge.
As the number of epochs goes up, the accuracy increases, but only up to a point; past 16 epochs, my guess is that accuracy will drop or stay relatively constant. As the batch size increases, there is a weak trend of decreasing accuracy. Accuracy also goes down as the number of classes increases.
Conclusion
The best result I got was with a batch size of 10, 10 classes, and 10 epochs, reaching an accuracy of 73%, similar to the given example. I think this is pretty reasonable considering the training time isn't too bad. I also tried running the model with the best value from each sweep: a batch size of 32, 10 classes, and 16 epochs. This worked well with an accuracy of 72%, which wasn't all that surprising. I hope that as I learn more machine learning concepts I will better understand how this all works. Both under-fitting and overfitting are apparent in the results I obtained. I am very excited to learn how to use the Intel AI Cloud service to run my code, because training the model locally takes too long. I hope to revisit this in the future and see if I can do a better job of training the model.
Ml5.js has many interesting functions and models to choose from. I was very interested in PoseNet when I learned about it in class, because it is quite accurate and relatively easy to use. Many of the other models had performance issues and required more programming experience. PoseNet can label and classify different parts of the body along with their locations on the screen. One of my favorite classics growing up was Pong, which often included a single-player mode against an AI. One of the most interesting implementations I have seen is an augmented-reality rock-climbing wall. Although I cannot try it in person, I decided to make a machine learning version that can be played at home. For my version of Pong, I use the Y position of my nose to control the paddle.
Process
While brainstorming, I was stuck between a lot of ideas that were either too difficult to build or not very interesting. I am very new to machine learning and don't yet understand how a lot of it works. First, I thought about using facial recognition to like and dislike photos on Tinder through a JavaScript API, but this would take too long. Another idea was to use the YOLO model and an LSTM text generator to give people dialogue based on their facial expressions. In the end, I decided to pair machine learning with a game to make it more modern.
I took the PoseNet example from class and modified it to capture only the nose keypoint. I watched a YouTube tutorial to learn how Pong works and modified the code to overlay the game on top of the video output. The nose's Y position is sent to the player paddle class, which moves the paddle to the corresponding position on screen. Finally, I created a computer paddle class so the computer controls its own paddle, like a rudimentary AI. I had to tune the difficulty so that the user has a chance to win every once in a while.
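The two paddle behaviors can be sketched as plain functions: the player's paddle snaps to the nose's Y position (clamped to the canvas), while the computer paddle chases the ball at a capped per-frame speed, with that cap acting as a single difficulty knob. Function names and values here are illustrative, not the project's actual code:

```javascript
// Player paddle: follow the nose's y, clamped so it stays on screen
// even when the nose drifts off-camera.
function updatePlayerPaddle(paddle, noseY, canvasHeight) {
  paddle.y = Math.min(Math.max(noseY, 0), canvasHeight - paddle.h);
  return paddle;
}

// Computer paddle: move toward the ball's center, but never more than
// aiSpeed pixels per frame. A small aiSpeed is beatable; a large one isn't.
function updateAiPaddle(paddle, ballY, aiSpeed) {
  const target = ballY - paddle.h / 2;
  const delta = target - paddle.y;
  paddle.y += Math.min(Math.max(delta, -aiSpeed), aiSpeed);
  return paddle;
}
```

Capping the AI's speed rather than adding random misses is the simplest way to tune difficulty, though it makes the computer's movement look mechanical compared to a human player.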
Demo
Possible Improvements
In the future, I want to add multiplayer support so that you can play with your friends; the only problem is that it may be difficult to tell whose nose belongs to which paddle. I intend to add sound effects and music before class on Monday. I also had to tweak the difficulty of the AI because it does not behave like a human: a single variable controls whether it is easy or impossible to beat. Nose tracking works surprisingly well, but it sometimes glitches when the user moves off screen; this could be fixed with a few maxY conditionals clamping the paddle position. All in all, I think the project turned out pretty well. I want to make some minor refinements, but I am happy with the overall concept.