MLNI – Final Project (Billy Zou)

Adventure of Sound

This project is open source. The source code can be found at https://github.com/DaKoala/adventure-of-sound. You can also try the project on your own machine by following the guide in the repository's readme file.

This project is an interactive sound game. The player controls the size of a box in the center of the screen. Obstacles approach from the front, each with a hole of a different shape, and the goal is to make the box fit the hole. The better the fit, the higher the score. There are multiple stages, each with a target score; failing to meet the target score or bumping into an obstacle ends the game.

Architecture

Most p5 projects follow the pattern of one HTML file, one CSS file, and one JS file. To keep my code well organized, I used a bundler, Parcel. This let me separate my code into different files during development while still generating a single JS file in the end.

I also used TypeScript, a superset of JavaScript. It only takes effect while I am writing code: TypeScript provides features that JavaScript lacks, such as static type checking, which is the feature I love the most. The final output is always vanilla JavaScript.
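For example (a minimal illustration, not code from the project), a wrong argument type is rejected at compile time, and the emitted JavaScript contains no trace of the annotations:

// TypeScript checks this signature before the code ever runs.
function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// clamp('10', 0, 100); // compile-time error: '10' is not a number
// The compiled output is plain JavaScript with the types erased.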

Inspiration

The game is inspired by the TV program Hole In The Wall, in which competitors must pass through walls by striking different postures.

The basic idea of the game is the same as the TV program: there are holes in the wall, and the player must shape the box to pass through them.

Sound Processing

Implementation: https://github.com/DaKoala/adventure-of-sound/blob/master/src/sound.ts

Sound is the only way users interact with the box: the width of the box is tied to the volume, and the height to the pitch. I used the p5 sound library, specifically the AudioIn.getVolume method, to get the volume. For pitch detection I used ml5's Pitch Detection. It is worth pointing out that the pitch detection function is asynchronous, so I execute it recursively and store the latest value on the instance.
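A minimal sketch of that recursive pattern, assuming the standard ml5 pitchDetection API (the model path and class shape are illustrative, not the project's exact code):

// ml5 ships no official TypeScript typings, so declare the bits we use.
declare const ml5: any;

class SoundInput {
  pitch = 0; // latest detected frequency, stored on the instance

  start(audioContext: AudioContext, stream: MediaStream) {
    const detector = ml5.pitchDetection(
      './model/', // path to the CREPE model files (illustrative)
      audioContext,
      stream,
      () => this.listen(detector), // start polling once the model loads
    );
  }

  private listen(detector: any) {
    detector.getPitch((err: Error | null, frequency: number | null) => {
      if (frequency !== null) {
        this.pitch = frequency; // keep the last valid reading
      }
      this.listen(detector); // getPitch is async, so poll recursively
    });
  }
}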

Drawing Objects 

Box: https://github.com/DaKoala/adventure-of-sound/blob/master/src/SoundBox.ts

Obstacle: https://github.com/DaKoala/adventure-of-sound/blob/master/src/Obstacle.ts

Track: https://github.com/DaKoala/adventure-of-sound/blob/master/src/Track.ts

This game is 3D, which was a challenge for me because I rarely develop 3D projects.

The first challenge I faced was changing the perspective, which I implemented by calling p5's camera function. It took me some time to figure out what the different parameters do; the values I finally settled on came from testing several times until the view felt comfortable.
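For reference, p5's camera takes an eye position, a look-at point, and an up vector; the numbers below are placeholders rather than the project's actual values:

// camera(eyeX, eyeY, eyeZ, centerX, centerY, centerZ, upX, upY, upZ)
// Place the eye above and behind the scene, looking at the origin.
camera(0, -200, 600, 0, 0, 0, 0, 1, 0);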

Drawing objects was also challenging. Drawing in a 3D environment differs from drawing in 2D: what I did was use the translate function to move the origin and then draw each object there.

To reduce repeated code, I designed a helper function called tempTranslate, short for Translate Temporarily.

const tempTranslate = (func: () => void, x: number, y: number, z: number = 0) => {
  translate(x, y, z); // move the origin to the drawing position
  func();             // draw relative to the new origin
  translate(-x, -y, -z); // move the origin back (z defaults to 0 so -z is never NaN)
};

This function is useful for drawing 3D shapes without permanently shifting the origin.
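For example, a box can be drawn at an arbitrary position and the origin restored afterwards:

// Draw a 50x50x50 box at (100, 0, -300) without affecting later draws.
tempTranslate(() => box(50, 50, 50), 100, 0, -300);

(p5's built-in push() and pop() offer a similar save-and-restore mechanism.)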

Game Design

Gameplay is an important part of a game. For a game with logic as simple as this, it should not take beginners much time to figure out the interaction. The incentive that attracts players is another aspect that needs to be taken into consideration.

To help users understand the interaction, the box has a minimum size, so it stays visible even when there is no sound. However, this creates a problem: players could score with no sound input at all.

I got some helpful feedback during the presentation and made some modifications afterwards: when the box is too small, no score is added.
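A sketch of that rule, with illustrative names and a made-up minimum size:

const MIN_BOX_SIZE = 20; // the silent baseline size that keeps the box visible

// Award points only when the box has grown beyond its silent baseline.
function awardScore(width: number, height: number, fitScore: number): number {
  if (width <= MIN_BOX_SIZE && height <= MIN_BOX_SIZE) {
    return 0; // no sound input, no score
  }
  return fitScore;
}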

Difficulties

I think the largest difficulty in developing this project was handling 3D objects. I am a newbie in this field and still learning, which is why the shapes in the game look a little awkward.

I am still not familiar with some concepts in 3D development, such as lighting and cameras. These are meaningful things to learn, and this project was an introduction to them for me.

Words in the End

This course is the last IMA course I will take in college (apart from the Capstone). I really enjoyed the happiness brought by IMA courses and the wonderful IMA instructors.

MLNI – Sound Password (Billy Zou)

Sound Password

This is a mini project with a real-time trained “machine learning model”: users train the model with sound. A user records a sound clip as a password in order to save some secret information, and can unlock the information by reproducing the same sound.

Implementation

In this assignment, I used the FFT algorithm provided by the p5 sound library. After FFT processing, the sound at a given frame is represented by a number array of length 1024, with each entry a value ranging from 0 to 255. To store the sound data over a period of time, I used a 2D array.
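In p5 this looks roughly like the following sketch (not the exact project code):

// Each FFT frame is an array of 1024 amplitude values in [0, 255].
const mic = new p5.AudioIn();
const fft = new p5.FFT();
const recording: number[][] = []; // 2D array: one spectrum per frame

function setup() {
  mic.start();
  fft.setInput(mic);
}

function draw() {
  // p5 calls draw once per frame; append the current spectrum.
  recording.push(fft.analyze());
}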

The KNN-style comparison used here measures the difference between two number arrays: iterate through both arrays, compute the absolute difference between the numbers at the same index, sum them, and divide by 1024. The resulting value is the difference between the two sound waves at one time point.

This technique then extends to whole clips: accumulate the per-frame differences and take the average across all time points. The final value still ranges from 0 to 255. After several tests I found the value quite low for matching clips, so I chose 10 as the threshold.
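Putting the two steps together, the whole comparison is an average of absolute differences; a sketch, assuming both clips store spectra of equal length:

const THRESHOLD = 10; // chosen empirically; matching clips score well below 255

// Average absolute difference between two spectra of length 1024.
function frameDiff(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) {
    sum += Math.abs(a[i] - b[i]);
  }
  return sum / a.length;
}

// Average the per-frame differences over the whole clip.
function clipDiff(clipA: number[][], clipB: number[][]): number {
  const frames = Math.min(clipA.length, clipB.length);
  let sum = 0;
  for (let i = 0; i < frames; i += 1) {
    sum += frameDiff(clipA[i], clipB[i]);
  }
  return sum / frames;
}

// The password matches when the overall difference is small enough.
const matches = (a: number[][], b: number[][]) => clipDiff(a, b) < THRESHOLD;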

Difficulties

After finishing the assignment, I was still unable to synchronize the two sound clips well. A user may successfully reproduce the same sound as the recorded password, yet not start at exactly the same point. That offset causes obvious inaccuracy in the final difference.

I think the algorithm I used in this project is quite weak. To make the algorithm robust, I believe some mathematical techniques need to be involved.

MLNI – Midterm Project (Billy Zou)

Living Image

GitHub Link: https://github.com/DaKoala/living-image

This is an interactive webpage that allows users to interact with images using their faces.

Usage

There are plenty of movements for interacting with the image:

  • Move face forward/backward: Change the scale of the image.
  • Move face left/right: Change the brightness of the image.
  • Tilt head: Rotate the image.
  • Turn head left/right: Switch image.
  • Nod: Switch image style.

Inspiration

Before the midterm, I found Style Transfer really cool, so I decided to build my midterm project around it. However, the project would lack interaction if I only used Style Transfer, so I integrated Style Transfer with PoseNet to get both sufficient interaction and cool visual effects.

Typically, people interact with images on websites using a mouse and keyboard. With the help of machine learning, we can make images reactive: in this project, images seem to “see” our faces and change their state accordingly.

Development

Code Split

To organize my code well, I used TypeScript + Parcel for this project.

TypeScript is a superset of JavaScript, so it has every feature JavaScript has. In addition, it has static type checking, which catches a large number of errors while editing, before the code even runs.

Parcel is the tool that bundles my code. I split the code into different modules during development, but in the end everything is bundled into one HTML file, one CSS file, one JS file, and some static asset files.

[Figure: structure of my source code]

[Figure: structure of my distribution code]

Face

In the project, I used a singleton class Face to detect user interaction.

Basically, it retrieves data from PoseNet every frame, updates the positions of the different facial parts, and computes the state of the face.

I used a callback-style programming pattern to trigger functions when users nod or turn their heads.

To prevent the image from jittering, I set some thresholds, so the image only updates when the difference between two consecutive frames exceeds a certain value.
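A minimal sketch of that idea (the threshold value and field names are illustrative):

const MOVE_THRESHOLD = 5; // pixels; ignore smaller frame-to-frame changes

class Face {
  private lastNoseX = 0;

  // Called every frame with a fresh PoseNet keypoint position.
  update(noseX: number) {
    if (Math.abs(noseX - this.lastNoseX) < MOVE_THRESHOLD) {
      return; // treat tiny differences as noise, keep the image still
    }
    this.lastNoseX = noseX;
    // ...update the image based on the new position...
  }
}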

Update Image

I used Style Transfer to add special effects to the image. The other properties of the image, such as size and direction, are all implemented with CSS.
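Scale, rotation, and brightness, for instance, map directly onto standard CSS properties (an illustrative snippet, not the project's exact code):

// Apply face-driven values to the image element purely through CSS.
function styleImage(
  img: HTMLImageElement,
  scale: number,
  angleDeg: number,
  brightness: number,
) {
  img.style.transform = `scale(${scale}) rotate(${angleDeg}deg)`;
  img.style.filter = `brightness(${brightness})`;
}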

Difficulties

At the early stage of this project, I spent a lot of time trying to make Style Transfer work. The biggest problem was that Style Transfer never got the correct path to my model resources.

Finally, I solved this problem by reading the source code of ml5:

https://github.com/ml5js/ml5-library/blob/development/src/utils/checkpointLoader.js

When we use Style Transfer, we usually pass a relative path to the model as the first parameter. From the source code I learned that during execution, ml5 sends an HTTP request to fetch the model. That is why we must use a live server to host the webpage in class.
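The call that triggers the request looks roughly like this (the path and element id are illustrative, based on the usual ml5 Style Transfer API):

declare const ml5: any; // ml5 ships no official TypeScript typings

// The first argument is a path to the model directory; ml5 fetches its
// files over HTTP, which is why the page must be served by a live server.
const style = ml5.styleTransfer('models/wave', () => {
  const img = document.getElementById('photo') as HTMLImageElement;
  style.transfer(img, (err: Error | null, result: { src: string }) => {
    if (!err) img.src = result.src; // swap in the stylized image
  });
});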

The earlier failure happened because the development server treated the path of the resource file as a router path instead of a request for a static file hosted by the server. To solve the problem, I added a .model extension to all model files so the server would recognize them correctly.

Future Improvement

This project proves the feasibility of making websites interactive by equipping them with machine learning libraries.

In the future, I want to develop some fully functional applications with the same methodology I used in this project.