Week 06 Assignment: Document Midterm Concept – Eszter Vigh

My idea is to use a speech-to-text converter to train a model to recognize my sentences (regardless of tone, etc.). I would most likely ask my friends to read some random sentences too, both to build out the data set and to see how well the model recognizes different words across accents, tones, and so on.
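As a starting point for the speech-to-text piece, here is a minimal sketch assuming the browser's built-in Web Speech API (Chrome supports it behind a prefix). This is just the off-the-shelf recognizer I would test against first, not a custom-trained model:

// Minimal sketch: capture one spoken phrase as text via the Web Speech API.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognizer = new SpeechRecognition();
recognizer.lang = "en-US";

recognizer.onresult = (event) => {
  // take the top transcription candidate for the first utterance
  const transcript = event.results[0][0].transcript;
  console.log("heard:", transcript);
};

recognizer.onerror = (event) => console.log("recognition error:", event.error);

recognizer.start(); // prompts for microphone access, then listens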

Then I would collect a lot of movies/TV shows (I think just in English, to make it easier on myself) and have the voice input bring up clips of those specific quotes. So imagine if you said “How you doin’?” and a clip of Joey from Friends came up and delivered the iconic line back to you. This requires text matching across the two sets of text.
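The transcript will rarely match a stored quote word for word, so the matcher has to tolerate near misses. A minimal sketch of that fuzzy matching, assuming the quotes simply live in an array (quotes and findQuote are my illustrative names; a real version would index a whole subtitle database):

// Score each stored quote by how many of its words appear in the query.
const quotes = [
  { line: "how you doin", clip: "friends-joey.mp4" },
  { line: "according to all known laws of aviation", clip: "bee-movie.mp4" },
];

function tokenize(s) {
  return s.toLowerCase().replace(/[^a-z0-9 ]/g, "").split(/\s+/).filter(Boolean);
}

function findQuote(query) {
  const queryWords = new Set(tokenize(query));
  let best = null;
  let bestScore = 0;
  for (const q of quotes) {
    const words = tokenize(q.line);
    // fraction of the quote's words that show up in the query
    const score = words.filter((w) => queryWords.has(w)).length / words.length;
    if (score > bestScore) {
      bestScore = score;
      best = q;
    }
  }
  return best;
}

console.log(findQuote("um how are you doin")); // -> the Friends clip

Scoring by shared words is crude; an edit-distance measure would also handle word order and small typos, but even this sketch survives being “a couple of words off.”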

The whole idea came about when I thought about my terrible habit of turning GIFs of my favorite movies into WeChat stickers. But sometimes I can only remember the line. I’m so bad with names and faces that I’ll just type in the quote and hope it’s iconic enough for the scene to come up, but if I’m a couple of words off, I’m completely stuck.

This project is personal to me because I love movies. I have seen so many that at times I get into arguments with my friends about the details (usually about when something happened on the Marvel timeline). Imagine having this tool right there to help prove to all your friends that you are right!

The main challenge I see with this project is getting hold of the videos. Netflix doesn’t allow the downloading of full films, but there are ways around this. The scripts of most films are available online… (most notably the Bee Movie… because that gem… I mean, who wouldn’t want the entire script). For TV shows, it may be more difficult.

In terms of references: when I met with Aven, we talked about similar projects, including a sample that takes a random squiggle and matches it to landforms within the trained data set. Imagine that happening with audio, matched to a video.

Same logic, just a slightly different implementation. 

Week 5: Interactive Portraiture – Eszter Vigh

For Week 5 I was inspired by the style of the in-class code and decided to cycle through the symbols on my keyboard, also experimenting with Chinese characters to see whether they work as part of the symbol set.

BodyPix

Using BodyPix is really interesting since you have to think about all the body parts as segments, as opposed to x,y coordinates. In my case I made a right-hand wave yield one result and a left-hand wave yield another. There was also a base state with the symbols.

[image: base state]

The base state is the same as in the class example.

An interesting error to note: when loading heavy files inside the draw function, atom-live-server will actually crash! It’s kind of terrifying because the camera keeps working live, but the console stays blank no matter how much you console.log.

Final Product:

[image: final product]

What this is… four separate character sets are used to represent the lightness and darkness of the image. The detection targets are the two sides of the face and the left and right hands. It preserves the theme and feeling of the original sample while also showing off BodyPix’s abilities.
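Stripped of the BodyPix part detection, the core mechanic is mapping each grid cell’s brightness onto a ramp of characters. A minimal p5.js sketch of just that part, under my own assumptions (charSet and gridSize are illustrative names, and a single character set stands in for my four):

// Brightness-to-character portrait, with BodyPix part detection omitted.
let cam;
const charSet = "@#*:. "; // darkest to lightest
const gridSize = 16;      // larger cells keep CJK characters legible

function setup() {
  createCanvas(640, 480);
  cam = createCapture(VIDEO);
  cam.size(640, 480);
  cam.hide();
  textAlign(CENTER, CENTER);
  textSize(gridSize);
}

function draw() {
  background(0);
  cam.loadPixels();
  fill(255);
  for (let y = 0; y < cam.height; y += gridSize) {
    for (let x = 0; x < cam.width; x += gridSize) {
      const i = 4 * (y * cam.width + x);
      // average the RGB channels as a rough brightness value
      const bright = (cam.pixels[i] + cam.pixels[i + 1] + cam.pixels[i + 2]) / 3;
      // map brightness onto the character ramp
      const c = charSet.charAt(floor(map(bright, 0, 255, 0, charSet.length - 1)));
      text(c, x + gridSize / 2, y + gridSize / 2);
    }
  }
}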

Interesting things I learned: 

  • Putting the function outside of draw makes the actual identification of images faster; however, the drawing lags behind because the image is not updated frame by frame.
  • Putting the functions in draw makes the updating of changed information slower, but the drawing of the image is faster, which is what the user sees.
  • I also made the grid size larger to make the text clearer (especially the Chinese characters).

Week 5 – Train CIFAR-10 CNN – Eszter Vigh


Machine Specs

  • Shuts down all the time for no apparent reason… 
  • Quickly running out of space
  • Won’t open Atom half the time
  • Atom Live Server won’t always work 
  • Let’s see how training goes!

Optimization

  • I learned optimization is a little complicated for me to tackle right now with no machine learning background, so I did some background research on alternatives to the optimizer we use, Root Mean Square Propagation (RMSProp). (A small code sketch of these options follows the list.)
    • So first of all, what is RMSProp?
      • It’s a gradient-based optimizer.
      • It uses a moving average of squared gradients to normalize the gradient itself.
      • Source
    • So what are the alternatives?
      • AdaGrad (Adaptive Gradient Algorithm)
        • It increases the learning rate for sparser parameters and decreases it for less sparse ones.
        • Source
      • Adam (not my best friend… but the optimizer)
        • It combines RMSProp-style adaptive learning rates with momentum, so you get ideas from both RMSProp and AdaGrad in one. Wow!
        • Source
      • SGD (the competitor to Adam)
        • SGD computes each update on a small random subset of the data examples. It can match the performance of regular gradient descent when the learning rate is low.
        • Source
      • ICLR 2019
        • A proposed combination of SGD and Adam.
        • Source
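And here is that sketch. It is not from our class materials: it is a hedged TensorFlow.js example (the library underneath ml5.js) showing that each alternative is a one-line swap. The learning rate is illustrative, not a tuned value.

// The optimizers above, as exposed by TensorFlow.js.
import * as tf from "@tensorflow/tfjs";

const learningRate = 0.001; // illustrative, not tuned

const rmsprop = tf.train.rmsprop(learningRate); // moving average of squared gradients
const adagrad = tf.train.adagrad(learningRate); // per-parameter rates (sparsity-friendly)
const adam = tf.train.adam(learningRate);       // adaptive rates + momentum
const sgd = tf.train.sgd(learningRate);         // plain stochastic gradient descent

// Swapping them is a one-line change when compiling a model:
// model.compile({ optimizer: adam, loss: "categoricalCrossentropy", metrics: ["accuracy"] });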


Test Epoch Test 1/3 || Test Accuracy: 0.1853

  • Time Spent: 3 minutes
  • Code completely the same, just changing the epochs to 1.
  • Findings: It’s worth running the model longer than just one epoch. (So yes, running just one epoch, while convenient, sucks in terms of accuracy… 18% is horrific.)
  • Thoughts: I wish I were patient enough to sit through more than one epoch.


Test Numb_Class Test 1/2 || Test Result: ERROR

  • Time Spent: 5 minutes
  • Code changed to 10 epochs (for the sake of my sanity; more on the 100-epoch test later), and Numb_Class was changed to 5.
  • Findings: An index error! The message notes that index 6 is out of bounds for size 5… which makes sense: CIFAR-10’s labels run 0–9, so the first label of 6 the one-hot encoder sees falls outside an array sized for 5 classes. (See the sketch below for where Numb_Class enters the model.)
  • Thoughts: I guess this means 1 won’t work either!
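Since the class sample itself isn’t reproduced here, this is a hedged TensorFlow.js stand-in showing where the three knobs I keep changing (epochs, batch size, and Numb_Class, written numClasses below) actually enter the model. The tiny architecture is illustrative, not the real CIFAR-10 CNN:

// Hedged TensorFlow.js stand-in for the CIFAR-10 training setup.
import * as tf from "@tensorflow/tfjs";

const numClasses = 10;  // CIFAR-10 labels run 0-9, so 5 or 1 breaks one-hot encoding
const batchSize = 2048; // halved to 1024, then dropped to 10 in my batch tests
const epochs = 1;       // 1, 10, and 100 in my epoch tests

const model = tf.sequential();
model.add(tf.layers.conv2d({ inputShape: [32, 32, 3], filters: 32, kernelSize: 3, activation: "relu" }));
model.add(tf.layers.maxPooling2d({ poolSize: 2 }));
model.add(tf.layers.flatten());
// the final layer is sized by numClasses -- exactly where the index error comes from
model.add(tf.layers.dense({ units: numClasses, activation: "softmax" }));

model.compile({ optimizer: tf.train.rmsprop(0.001), loss: "categoricalCrossentropy", metrics: ["accuracy"] });

// xTrain / yTrain stand in for the CIFAR-10 tensors (yTrain one-hot, numClasses wide):
// await model.fit(xTrain, yTrain, { epochs, batchSize });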


Test Epoch Test 2/3 || Test Accuracy: 0.4245

  • Time Spent: 15–20 minutes (somewhere between the two batch tests)
  • Code completely the same except for the epochs being changed to 10.
  • Findings: I was hoping for a more dramatic increase, into the 80% accuracy range, just because of our class activity. If anything, this showed me that I was going to have to test more and commit a good chunk of time (at least an hour) to testing.
  • Thoughts: It’s funny, because at the time I thought 100 epochs would take around an hour… just wait… because it didn’t. In fact, it took such an excruciating amount of time that I almost lost faith in this homework assignment.


Test Batch Test 1/2 || Test Accuracy: 0.4185

  • Time Spent: 20 minutes
  • Changed epochs to ten (for the sake of my sanity) and cut the batch size in half (1024).
  • Findings: The accuracy wasn’t cut in half. Sure, it went down compared to the 10 epochs with the larger batch, but only by about 1 percentage point, which isn’t much considering the batch size dropped by over 1000.
  • Thoughts: I wonder what would happen with more batches. At what point does batch size stop mattering? (I’m thinking about this like water-soluble vitamins such as Vitamin C… you cannot store the excess, so the extra Vitamin C just straight up doesn’t do you any good… is that what oversized batches do too?)


Test Batch Test 2/2 || Test Accuracy: 0.718

  • Time Spent: 15 minutes
  • Changed epochs to ten (for the sake of my sanity) and cut the batch size down to 10.
  • Findings: The accuracy was significantly higher, around 72%, the highest I have seen at this point. (A batch size of 10 also means far more weight updates per epoch, which probably explains the jump.)
  • Thoughts: So at first the data doesn’t seem to make sense. I decrease the batch size and the accuracy goes down; then I bring it way down, by another thousand, and the accuracy goes up? That would mean the accuracy curve isn’t linear, and it must dip somewhere before rising again. (That’s really complicated math I don’t care for… but I’m thinking something like this; image from Wikipedia:)

[image: accuracy curve sketch, from Wikipedia]


Test Numb_Class Test 2/2 || Test Result: ERROR

  • Time Spent: 5 minutes
  • Code changed to 10 epochs (for the sake of my sanity; more on the 100-epoch test later), and Numb_Class was changed to 1.
  • Findings: An index error! The message notes that index 6 is out of bounds for size 1…
  • Thoughts: Well, my previous guess was right! Yay me!


Test Epoch Test 3/3 || Test Accuracy: 0.637

  • Time Spent: 7 hours 25 minutes
  • Code unchanged from the sample (so, the full 100 epochs).
  • Findings: Running it all school day doesn’t improve the results that dramatically.
  • Thoughts: This took far longer than I thought, and I am really tired. I didn’t even change the code! The goal was a baseline… and it took me all day. Sure, I knew machine learning took time, but ALL DAY? It wasn’t worth it: the accuracy is still only 64% (if you round generously).

Week 4: Buttons-N-PoseNet – Eszter Vigh

This was so hard to load! I ended up having issues loading the background music. Originally I was going to have the goat/dinosaur sound interrupt the nice music, but for whatever reason the larger sound file was never found; I always got an Error 404 message. I still included it in the zip file, but honestly, I settled for the sounds playing with the sprites.

I thought challenging myself to create sprites would up the difficulty, since you can no longer use a simple radius as the measurement for interactions.

The sounds get buggy at times, but everything works most of the time. I also started playing around with the background as another form of interaction, so the background squares appear and fade based on your nose location.

This is where my project is linked.

This is my script.js:

console.log("ml5 version", ml5.version);

let cam;
let poseNet;
let poses = [];
let noseX = -100; // start off-canvas so dist() never sees undefined
let noseY = -100;
let song, moo;    // declared up top so the sounds aren't implicit globals
let dinosaurs = [];
let dino;
let goats = [];
let goat;

function preload() {
  // load images
  dino = loadImage("dino.png");
  goat = loadImage("goat.png");
  // load sounds here too, so they are ready before draw() starts
  soundFormats("mp3", "m4a");
  song = loadSound("dino.mp3");
  moo = loadSound("goat.m4a");
  // soundtrack = loadSound("bendsound-buddy.mp3");
}

function setup() {
  createCanvas(640, 480);
  background("lavender");

  cam = createCapture(VIDEO);
  cam.size(640, 480);
  cam.hide();

  // PoseNet
  poseNet = ml5.poseNet(cam, modelReady);
  poseNet.on("pose", function(results) {
    poses = results;
  });
}

function modelReady() {
  console.log("Model Loaded!!!!");
}

function draw() {
  // image(cam, 0, 0); // vidObj, x, y, (w), (h)
  background("lavender");
  // soundtrack.play();

  // background squares grow/fade with the nose position
  let r1 = map(noseX, 0, width, 0, height);
  let r2 = height - r1;

  // class Square {
  //   constructor(x, y) {
  //     this.x = x;
  //     this.y = y;
  //     this.width = random(20, 100);
  //     this.height = random(10, 150);
  //     this.r = random(255);
  //     this.g = random(255);
  //     this.b = random(255);
  //   }
  // }

  fill(66, 206, 244, r1);
  rect(0 + r1 / 2, 0, r1, r1);

  fill(244, 220, 65, r2);
  rect(200 - r2 / 2, 0, r2, r2);

  fill(44, 22, 65, r1);
  rect(400 - r1 / 2, 300, r1, r1);

  fill(44, 220, 165, r2);
  rect(600 - r2 / 2, 100, r2, r2);

  fill(144, 220, 255, r1);
  rect(300 - r1 / 2, 100, r1, r1);

  fill(144, 0, 65, r2);
  rect(0 - r2 / 2, 300, r2, r2);

  fill(144, 220, 65, r1);
  rect(500 - r1 / 2, 0, r1, r1);

  fill(244, 22, 0, r1);
  rect(500 - r1 / 2, 400, r1, r1);

  fill(244, 0, 65, r2);
  rect(300 - r2 / 2, 0, r2, r2);

  // spawn a dinosaur on ~10% of frames
  if (random(1) < 0.10) {
    dinosaurs.push(new Dinosaur(width / 2, height));
  }

  // update & display
  for (let i = 0; i < dinosaurs.length; i++) {
    let p = dinosaurs[i];
    p.move();
    // p.fall();
    p.updateLifespan();
    p.checkEdges();
    p.checkInteraction(noseX, noseY);
    p.display();
  }

  // remove dinosaurs if done!
  for (let i = dinosaurs.length - 1; i >= 0; i--) {
    if (dinosaurs[i].isDone) {
      dinosaurs.splice(i, 1);
    }
  }

  // limit the number of dinosaurs
  while (dinosaurs.length > 7) {
    dinosaurs.splice(0, 1); // (index, howMany)
  }

  // show the number of dinosaurs
  fill(255);
  text(dinosaurs.length, 10, 20);

  // spawn a goat on ~10% of frames
  if (random(1) < 0.10) {
    goats.push(new Goat(width / 2, height));
  }

  // update & display
  for (let i = 0; i < goats.length; i++) {
    let w = goats[i];
    w.move();
    // w.fall();
    w.updateLifespan();
    w.checkEdges();
    w.checkInteraction(noseX, noseY);
    w.display();
  }

  // remove goats if done!
  for (let i = goats.length - 1; i >= 0; i--) {
    if (goats[i].isDone) {
      goats.splice(i, 1);
    }
  }

  // limit the number of goats
  while (goats.length > 5) {
    goats.splice(0, 1); // (index, howMany)
  }

  // show the number of goats
  fill(0);
  text(goats.length, 10, 200);

  // track the nose from the latest PoseNet results
  for (let i = 0; i < poses.length; i++) {
    let pose = poses[i].pose;
    for (let k = 0; k < pose.keypoints.length; k++) {
      let p = pose.keypoints[k];
      if (p.part == "nose") {
        noseX = p.position.x;
        noseY = p.position.y;
        noStroke();
        fill(0, 0, 255);
        ellipse(noseX, noseY, 10, 10);
      }
      // text(p.part, x + 15, y + 0);
    }
  }
}

class Dinosaur {
  constructor(x, y) {
    this.x = x;
    this.y = y;
    this.xSpd = random(-2, 2);
    this.ySpd = random(-5, -3);
    this.rad = dino.width; // hit radius taken from the sprite size
    this.isDone = false;
    this.lifespan = 10.0; // full life
    this.lifeReduction = random(0.001, 0.010); // fade rate per frame
  }

  updateLifespan() {
    if (this.lifespan < 0.0) {
      this.lifespan = 0.0;
      this.isDone = true;
    } else {
      this.lifespan -= this.lifeReduction;
    }
  }

  // fall() {
  //   this.ySpd += 0.03;
  // }

  checkInteraction(x, y) {
    let distance = dist(this.x, this.y, x, y); // (x1, y1, x2, y2)
    if (distance < this.rad) {
      // nose is in the area: roar, then retire this sprite
      song.play();
      this.isDone = true;
    } else {
      // out of the area
      // note: this runs every frame for every out-of-range dinosaur,
      // which can cut playback short -- one source of the sound bugs
      song.stop();
    }
  }

  move() {
    this.x += this.xSpd; // this.x = this.x + this.xSpd;
    this.y += this.ySpd;
  }

  checkEdges() {
    if (this.x < 0 || this.x > width) {
      this.isDone = true;
    }
    if (this.y < 0 || this.y > height) {
      this.isDone = true;
    }
  }

  display() {
    push();
    noStroke();
    image(dino, this.x, this.y);
    pop();
  }
}

class Goat {
  constructor(x, y) {
    this.x = x;
    this.y = y;
    this.xSpd = random(-2, 2);
    this.ySpd = random(-5, -3);
    this.rad = goat.height; // hit radius taken from the sprite size
    this.isDone = false;
    this.lifespan = 10.0; // full life
    this.lifeReduction = random(0.001, 0.010); // fade rate per frame
  }

  updateLifespan() {
    if (this.lifespan < 0.0) {
      this.lifespan = 0.0;
      this.isDone = true;
    } else {
      this.lifespan -= this.lifeReduction;
    }
  }

  fall() {
    this.ySpd += 0.03;
  }

  checkInteraction(x, y) {
    let distance = dist(this.x, this.y, x, y); // (x1, y1, x2, y2)
    if (distance < this.rad) {
      // nose is in the area: bleat, then retire this sprite
      moo.play();
      this.isDone = true;
    } else {
      // out of the area (same caveat as the dinosaur's song.stop())
      moo.stop();
    }
  }

  move() {
    this.x += this.xSpd;
    this.y += this.ySpd;
  }

  checkEdges() {
    if (this.x < 0 || this.x > width) {
      this.isDone = true;
    }
    if (this.y < 0 || this.y > height) {
      this.isDone = true;
    }
  }

  display() {
    push();
    noStroke();
    image(goat, this.x, this.y);
    pop();
  }
}

Week 4: The Neural Network and the Brain/Neuron Stuff – Eszter Vigh


Neuroscience and AI have always been very separate in my mind. The motivation behind calling it a “neural network” is to create the illusion, or at least a stronger association with the idea, that the computer has “a brain.” The easiest way to explain the complexities of computers, and to answer the general population’s questions of why and how, is to use magic blanket phrases like “oh, it works like a brain, making connections between things and processing complicated information.”

The distinction is in how the brain receives and processes information. Input comes in from the sensory organs, which take a physical stimulus (like light or heat) and create electrochemical signals in a “language” the brain understands. (It is essentially data processing.) Then these signals are put through a filter, because humans are surrounded by constant stimuli. It’s like being able to pick out and follow your mom’s voice in a massive crowd at the grocery store, or like seeing individual fruit on a bush as opposed to the image just blurring together.

Once the information is processed and filtered, that same filter decides how important the information is. Say you are in an emergency situation and you are processing information… the color of the firefighter’s shoes is probably not important enough to be the primary focus. The information then gets stored based on importance (think short-term memory, long-term memory, etc.), and links are formed based on relevance to other information already stored.

Let’s compare a neural network to this. Consider the idea that there is a big training set of data… that information gets input, to some degree, in a way the computer understands, kind of like how physical stimuli are translated into electrochemical signals. I think the closest comparison would be the coding of the network, since code is written in a programming language the computer understands, and that language does not flow like normal conversation.

This data can then get filtered. Maybe one way it does that is in the form of arrays: the information is organized and can have connections hard-coded into it. (Think of a word being tied to a definition; there is a way to do that within an array or map, for example.)
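A tiny illustration of what I mean by a connection being hard-coded, using a plain JavaScript Map (the entries are just examples I made up):

// A word tied to its definition: the "connection" is just a stored lookup.
const lexicon = new Map([
  ["neuron", "a cell that carries electrochemical signals"],
  ["stimulus", "a physical input, like light or heat"],
]);

console.log(lexicon.get("neuron")); // retrieving the hard-coded association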

The clear difference between this kind of neural network and the brain is the prioritizing of information. This difference is so massive that it overrides all the similarities stated above. The data is not put in a hierarchical form; it is there, categorized, but there is no situational input to show which data is more important. Think back to that emergency example: the detail of the firefighter’s shoes would probably be forgotten, but a computer would still store that data. The computer doesn’t “forget” like a brain does.

Citations:

http://www.teach-nology.com/teachers/methods/info_processing/