MLNI: Final Project Kevin Xu

For my final project I wanted to continue advancing the work I did for the midterm, but go a step further in smoothing out the process and creating a better visual presentation of the material. I had ambitiously wanted to build my own neural network to recognize handwriting, but between other assignments I failed to give myself enough time to do so, so I fell back on resemble.js, as I had in the midterm, to compare the canvas against an image. With help from Professor Moon, I realized that I could refer to the p5 canvas as an HTML element, which drastically smoothed out the project since I no longer had to worry about the conflict between saveCanvas() and the live server (which normally reloads whenever the canvas is saved to the machine, preventing anything from running afterwards).

I still ran into other problems with resemble.js that I did not expect. One that held up my progress for several hours was in recording the comparison data returned by resemble.js. Since I was trying to make a video game in which you need to copy the given glyphs in the same order as presented, I wanted to store the ID of each glyph in an array; when the player repeated the glyphs, their input would be stored in a separate array and compared against the original. To do this, I put the resemble.js call inside a for loop, but an array specifically limited to 24 values would end up with a length of 25, with data only being written into the 25th item. I tried many ways to fix this, including wrapping an empty function in a setTimeout() to delay each iteration, along with other attempts to rename things and reorder the function calls. After spending about three hours on this single problem, I found that the solution was simply changing (var i = 0; i < 24; i++) to (let i = 0; i < 24; i++).
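To illustrate the fix, here is a simplified sketch rather than my actual game code; canvasData, libraryPaths, and results are placeholder names, while onComplete and misMatchPercentage are the parts of resemble.js's API I was relying on.

// assuming cnv is the renderer returned by createCanvas() in setup();
// cnv.elt is the underlying html canvas element, so no saveCanvas() call is needed
let canvasData = cnv.elt.toDataURL("image/png");

// libraryPaths is a placeholder for the 24 library image paths
let results = new Array(24);

for (let i = 0; i < 24; i++) {      // with var instead of let, every callback saw i === 24
  resemble(libraryPaths[i])
    .compareTo(canvasData)
    .onComplete(function (data) {
      // the callback fires after the loop has already finished;
      // let gives each iteration its own i, so the right slot gets filled
      results[i] = data.misMatchPercentage;
    });
}

With var, all of the callbacks shared a single i that had already been incremented to 24, which is exactly why only the 25th slot of the array was ever filled.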

Another problem in the same area of code was in calling the images themselves. In my last project I had preloaded the images and was able to refer to them in resemble.js's comparison call. This time I kept getting an error asking me to import the images as blobs, and even after making sure that the data being passed in was specifically the canvas's image data, the problem turned out to be with how I referenced the image library rather than with the canvas image.

In that part of the code, I had originally referred to the image library with imgs[test3] instead of the full directory reference ("js/Images/LibraryImg1/Symbol" + (test3 + 1) + ".png").
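In other words, resemble.js needed a file path (or a data URL) rather than the preloaded p5.Image object, something along these lines (canvasData and showResult are placeholder names):

// imgs[test3] is a p5.Image object, which resemble.js cannot read directly
let path = "js/Images/LibraryImg1/Symbol" + (test3 + 1) + ".png";
resemble(path).compareTo(canvasData).onComplete(showResult);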

It was little problems like these that hindered me the most. In hindsight, resemble.js was not only a rather impractical way of attempting to recognize handwriting, but was also filled with inconsistencies like these.

In the areas of code not pertaining to resemble.js itself, things went much more smoothly. Since I wanted the visual presentation and interactivity of the project to improve, I spent a fair amount of time trying to develop the game aspect of the project.

Part of the inspiration for the game actually came from "osu!", a rhythm game where you need to hit circles as they pop up on the screen. There is a mod that makes each note flash for only a fraction of a second instead of staying on screen until it passes, so you have to remember where it was. This brought me to the idea of creating a game where you need to rely on this kind of extreme short-term memory in order to pass. About two months ago I had also gotten around to playing God of War 4, which includes several puzzles involving Nordic runes. I liked the idea of basing the images on those symbols, as completely random, made-up glyphs may have been hard to follow, and regular letters or Chinese characters might have been too easy and boring.

I originally tried supporting 24 different runes, but while manually training the model (appending more and more images to simulate different "handwritings" of each glyph) I quickly realized that resemble was not suited for heavy-duty work. Even with the base library of 24 images, comparing each one to the canvas took around 30-40 seconds for a single rune, and the starting level would already have 3. To cut down this time I reduced the number to 16, but the idea of making the model more accurate fell through, since every added picture cost significantly more time.

I also wanted the game to be able to go on forever like an arcade game, with the objective simply being to retain the highest score. To raise the difficulty over time, I made the runes flash faster with each level, resetting every 5 levels along with the addition of an extra rune (3 runes for levels 1-4, 4 runes for levels 5-9, etc.). I made sure to code everything specifically so that it kept scaling with these numbers.
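A stripped-down sketch of that scaling arithmetic is below; the exact flash-speed numbers are made up for illustration, and only the structure of resetting every 5 levels while adding a rune matches the game.

// hypothetical helper; in the game this logic lived in the "Pre-game" screen
function preGameSettings(level) {
  let runeCount = 3 + floor(level / 5);   // 3 runes for levels 1-4, 4 for levels 5-9, ...
  let stageStep = level % 5;              // position within the current 5-level block
  let flashFrames = 30 - stageStep * 4;   // fewer frames per flash = faster; resets each block
  return { runeCount: runeCount, flashFrames: flashFrames };
}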

This was best exemplified in the "Pre-game" gamescreen, where I did all the calculations for how many runes to show, tracked what level you were on, and set the speed at which the runes were shown. Overall, my disappointment mostly lay in my choice of using resemble.js instead of putting in the time to truly develop a neural network capable of more accurately understanding the user's input. While I certainly had an unfortunately timed semester, with midterms dragging out right to the start of finals, I could and should have gotten an earlier start on this project in order to build a strong basis and make sure that the little problems I faced could be avoided.

MLNI – Midterm – Kevin Xu

As I was coming up with the project idea, I wanted to make something that I could continuously build on even past the midterm, and that stayed true to "machine learning" as a core part of the project. I recalled one of the videos we watched early in the semester in which a man spoke a made-up language and subtitles were generated and animated according to his speech pattern. Unfortunately I could not locate the video, but it inspired my idea of matching gestures to the English alphabet. After starting the project I quickly realized that this would be hard to accomplish, due to the physical limitations of the body in creating shapes that resemble letters. I would have preferred to use Leap Motion or the Kinect to trace the fingers, which would have offered more flexibility in representing the characters, but I figured it was a better idea to stick with poseNet, as I hadn't worked with either Leap Motion or the Kinect for over a year.

The full arc of my project was to include tracing the arms, saving the result as an image, comparing that image to a library of English characters, and detecting which letter most closely matched the gesture. This would become the basis for visuals similar to the video that inspired me: the letter would appear and be animated according to either sound or movement. In the end, however, I did not manage to add this section.

For image comparison I found a library called resemble.js.

Unfortunately, I ran into quite a few problems while using it. The only demo incorporated jQuery, so I needed to spend a decent amount of time distinguishing jQuery's syntax from the library's. I also had problems referencing the library as an external script, so I ended up copying and pasting it directly into the main sketch.js, which worked out.

I wanted to create specific parameters controlling when and how p5/poseNet detected the arms, and dictating when it was time to save the canvas for comparison. Knowing that poseNet has a tendency to "stutter," I wanted to make sure that it only ran when specifically triggered, and ended with another trigger. I used the confidence scores that poseNet provides to dictate this: detection only started once the wrist, elbow, and shoulder points stayed relatively stable with a high confidence level for a certain number of frames (using two sets of variables tracking the last frame's positions and the current frame's), and reset when the confidence level of the hands dropped below a certain point (signifying the hands dropping out of frame). Due to the unreliability of the confidence levels fed by poseNet, I had to keep the threshold very low (0.1 out of 1) just so that it could track the arms more reliably, at the cost of this "off" trigger, so I decided to remove it until I could find a better way to trigger the reset.
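To give a rough idea of what the stability check looked like, here is a simplified sketch using only one keypoint; in the actual project I tracked the wrists, elbows, and shoulders, and the names and thresholds here are placeholders rather than my original code.

// assuming `pose` is the latest poseNet result (e.g. poses[0].pose from ml5)
let prevWrist = null;      // last frame's position
let stableFrames = 0;      // how long the point has held still
const CONF_MIN = 0.1;      // kept very low because the confidence scores were unreliable
const FRAMES_NEEDED = 30;  // roughly one second of stillness before drawing starts

function checkStartTrigger(pose) {
  let wrist = pose.keypoints.find(k => k.part === "leftWrist");
  if (wrist && wrist.score > CONF_MIN && prevWrist &&
      dist(wrist.position.x, wrist.position.y, prevWrist.x, prevWrist.y) < 10) {
    stableFrames++;        // still close to where it was last frame
  } else {
    stableFrames = 0;      // moved too much or confidence dropped
  }
  if (wrist) {
    prevWrist = { x: wrist.position.x, y: wrist.position.y };
  }
  return stableFrames >= FRAMES_NEEDED;
}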

At this point I realized that I could not count on poseNet tracking reliably enough to distinguish a whole set of 26 different characters from each other, especially with the limitations of how we can move our arms. Instead, I replaced this library with 3 basic images that were generally achievable through poseNet.

Knowing that PNGs automatically resize depending on the amount of empty space, I made sure to keep everything in that file format, so that no matter what part of the canvas your body took up with poseNet, the empty sides would be cropped out and the image sizes would match better.


The peak of my hardship came after I managed to figure out the individual steps of the project and began to integrate them with each other. While p5 can save the canvas, I needed it to be able to replace an existing image with the same name. I came to realize that this was only possible using an html5 canvas; p5 would just keep saving with a sequence number appended (e.g. CanvasImg(1).png). I had largely forgotten how to utilize the workflow between html and javascript, so out of time constraints I decided to keep using p5, but this meant that I would need to manually delete the saved canvas image each time I wanted to make a new comparison. Another problem was that in order to register the saved images, the Atom live server had to reload, which would restart the pose detection. Luckily, loadImage() has two extra parameters: a success callback function and a failure callback function. I turned the two segments of the project into functions and ran them through the single call that loads the saved canvas image.
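The pattern looked roughly like this (a simplified sketch; runComparison() and startPoseDetection() are stand-ins for my actual functions):

// success and failure callbacks decide which half of the project runs
loadImage(
  "CanvasImg.png",
  function (img) {
    // a saved canvas image exists, so compare it against the library
    runComparison(img);
  },
  function () {
    // no saved image yet, so keep detecting poses
    startPoseDetection();
  }
);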

In order to run the function that calculates the resemblance of the two images, you would need to reload the page. I never figured out why that is.

I ended up hard-coding the calls for the three images in the library, but I had plans to use for loops to run through folders categorizing the different shapes, then append saved canvas images to their corresponding categories, allowing the library to expand and thereby "train" the model. While I may have had a much easier time not downloading and uploading images, I wanted to keep it this way as a strong base for further development, since I wanted the library to expand permanently rather than resetting to the base library each time I reran the code.
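To sketch out what I had in mind (hypothetical folder names and variables; this is a plan rather than code that exists in the project):

// assumed structure: js/Images/CategoryA/1.png, js/Images/CategoryA/2.png, ...
let categories = ["CategoryA", "CategoryB", "CategoryC"];
let counts = [5, 5, 5];                      // how many images each category currently holds
let best = { category: null, mismatch: 101 };

for (let c = 0; c < categories.length; c++) {
  for (let i = 1; i <= counts[c]; i++) {
    let path = "js/Images/" + categories[c] + "/" + i + ".png";
    resemble(path)
      .compareTo(canvasData)                 // data URL of the saved canvas
      .onComplete(function (data) {
        // keep whichever library image matches the gesture most closely
        if (data.misMatchPercentage < best.mismatch) {
          best = { category: categories[c], mismatch: data.misMatchPercentage };
        }
      });
  }
}
// the saved canvas would then be appended to the winning category's folder,
// so the library keeps growing ("training") between runs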

Project File:

https://drive.google.com/open?id=1oGikrMPInQqwWK3dTkAtO7HYQRr_yHuQ

Week 3 – Generative Art (Kevin Xu)

For this assignment I wanted to go for calligraphy-style visuals. I didn't want it to simply be a "paint tool" that draws wherever your mouse is. Instead, I made it so that when you click down, it marks the mouse X/Y position, and then marks the X/Y position of where you release. I used these two points to set the general direction and velocity of the object it spawns, as well as the speed at which it changes color. I originally had the balls travel in straight lines, but it seemed very unnatural and not very pleasant, so I added a random offset to the change in X/Y positions to make the balls travel a bit more fluidly. For controls, I mapped W to clear the objects, Q to clear the background to black, and E to clear the background to white. As for controlling which direction a ball goes, click and drag in the opposite direction of where you want it to travel.

https://editor.p5js.org/khx201/full/lTUgbUxk1

Source Code:

let ball = [];

// start/end of each mouse drag; used to set a new ball's position and velocity
let startX = 0;
let startY = 0;
let endX = 0;
let endY = 0;

function setup() {
  createCanvas(1920, 1080);
  background(0);
  // seed one ball at the centre so something is on screen before the first drag
  startX = endX = width / 2;
  startY = endY = height / 2;
  ball.push(new Circle());
}

function draw() {
  for (let i = 0; i < ball.length; i++) {
    ball[i].move();
    ball[i].display();
  }
}

// clear screen: Q = black background, E = white background, W = remove all balls
let keyb = 81; // Q
let keyw = 69; // E
let keyc = 87; // W

function keyPressed() {
  if (keyCode == keyb) {
    background(0);
    return false;
  }
  if (keyCode == keyw) {
    background(255);
    return false;
  }
  if (keyCode == keyc) {
    ball = [];
  }
}
// end clear screen

// mouse pressed/released: record the drag that spawns a new ball
function mousePressed() {
  startX = mouseX;
  startY = mouseY;
}

function mouseReleased() {
  endX = mouseX;
  endY = mouseY;
  ball.push(new Circle());
}

// Ball class
class Circle {
  constructor() {
    this.x = endX;
    this.y = endY;
    this.color = 0;
    this.diameter = 35;
    // velocity comes from the drag: press point minus release point,
    // so the ball travels opposite to the drag direction
    this.speedX = startX - endX;
    this.speedY = startY - endY;
    this.colneg = 1;
  }

  move() {
    // base speed mapped from the drag length, plus random jitter for a more fluid path
    this.x += constrain(map(this.speedX, -200, 200, -5, 5), -4, 4) + random(-11, 11);
    this.y += constrain(map(this.speedY, -200, 200, -5, 5), -4, 4) + random(-11, 11);
    // grayscale value drifts back and forth between black and white as the ball moves
    let colchange = constrain(abs((this.x + this.y) / 2), 1, 3);
    if (this.color >= 250) {
      this.colneg = -1;
    }
    if (this.color <= 0) {
      this.colneg = 1;
    }
    this.color += colchange * this.colneg;
    fill(this.color);
  }

  display() {
    noStroke();
    ellipse(this.x, this.y, this.diameter, this.diameter);
  }
}

Week 2 – Kevin Xu & Billy Zou

When looking up projects to study, I discovered an interesting report on algorithms used to detect gender in computer vision. As someone who is particularly interested in working with computer vision, I suggested we start from there, but Billy brought up a good point: although interesting, such a function has fairly limited uses. He suggested instead that we study NMT, or Neural Machine Translation. NMT is what most digital keyboards, translation services, etc. now use to predict text. It is considered a significant improvement over its predecessor, Statistical Machine Translation, as it uses far less memory and predicts individual words based on context, intent, and the relations between words, as opposed to the statistical usage of words in phrases, which requires a lot of subcomponents to work together. As we discussed NMT further, I started to see a lot of uses for this system, both practical and artistic. A project idea I came up with was a robot that cuts you off as you try to talk and attempts to finish your sentences for you. Because NMT tends to predict words based on its perceived intent and the context of only a single sentence, chances are it would be very off-base with its prediction, though legible and grammatically correct. I've had many conversations with friends where it seems like we are on the same page but are actually in entirely different head-spaces, so when they finish my sentences I'm taken slightly aback by what they think I'm trying to say. That feeling is something I would try to replicate with this project idea.

Week 1 — Kevin Xu

As someone who looks at the practical advantages of technology more than the artistic opportunities, after reviewing the provided readings and videos about computer vision and zero UI, I immediately thought of products such as the Amazon Echo and Google Nest, particularly regarding security. In many TV crime shows, the idea of "crime response vs. crime prevention" is a large motif, usually with some character on the brink of going around protocol or breaking rules to preemptively stop a crime they know someone is about to commit but cannot prove. Camera security has been around for decades, but the need to manually monitor the footage has been a major limitation of that form of security. Even with cameras on, it is hard to prevent crime, even if the footage helps in recovering from it. With computer vision and AI, however, products such as the Google Nest Cam can automatically distinguish humans, animals, and other elements of the environment and alert the owner when something suspicious is going on. In their product introduction video, they even showcase scenarios where mailmen, wild animals, and burglars show up. The camera notifies the owner's phone and lets them speak through an attached loudspeaker to communicate directly with the visitor, friendly or otherwise. Although this is more of an individual product, the ability to distinguish different entities can be applied at a much larger scale, allowing things such as companies automatically determining when employees clock in and out of work, or governments building databases of their citizens. As computer vision becomes more and more refined, security will, without a doubt, adapt to become more effective.

https://youtu.be/ciuIYGr5bfQ