Week 05: Train CIFAR-10 CNN – Katie

Introduction

For this week’s assignment, we are training a CIFAR-10 CNN. Before doing anything involving actual training, I first wanted to understand the CIFAR-10 dataset, because I’ve never worked with it before. I read that the dataset contains 60,000 images in total (50,000 for training and 10,000 for testing), each 32×32 pixels. These are broken down into 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks, with 6,000 images per class.

[Image: CIFAR-10 dataset classes. Source: https://medium.com/@jannik.zuern/training-a-cifar-10-classifier-in-the-cloud-using-tensorflow-and-google-colab-f3a5fbdfe24d]
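To double-check those numbers myself, here is a minimal sketch using the standard Keras loader (an assumption on my part about the tooling) that downloads the dataset and prints its dimensions:

```python
# Minimal sketch: load CIFAR-10 with the standard Keras loader and
# confirm the dataset sizes and image dimensions described above.
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)  # (50000, 32, 32, 3): 50,000 training images, 32x32 pixels, 3 color channels
print(x_test.shape)   # (10000, 32, 32, 3): 10,000 test images
print(y_train.min(), y_train.max())  # labels run 0 through 9, i.e. 10 classes
```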

Machine Specs

[Image: machine specs]

Tests

I ran three tests to explore the relationship between epochs and batch size and their effect on accuracy. Test 1 is high epochs, high batch size; Test 2 is low epochs, high batch size; Test 3 is low epochs, low batch size.
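For reference, the setup I was varying looked roughly like this. It is not the exact script from class, just a minimal Keras sketch where epochs and batch_size are the two knobs changed between tests:

```python
# Minimal Keras sketch (not the exact in-class script): a small CNN on CIFAR-10.
# `epochs` and `batch_size` are the two hyperparameters varied across the three tests.
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0              # scale pixel values to [0, 1]
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Test 1: epochs=100, batch_size=2048 / Test 2: epochs=10, batch_size=2048 / Test 3: epochs=10, batch_size=256
model.fit(x_train, y_train, epochs=10, batch_size=256, validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))                          # [loss, accuracy]
```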

Test 1 (epochs: 100, batch size: 2048, 20 hours)

The first test I did (not very sensibly) was to just run the program as-is. It was really painful, but I wanted to see it through and find out what it could do. I knew it was going to take forever to complete, so I tried running it while I was asleep. I forgot that my computer also goes to sleep very quickly when I’m not using it, so I had to keep coming back to wake it up; as a result, this test took me 20 hours. It resulted in an accuracy of 0.6326. I was pretty surprised at this, considering the testing we did in class reached much higher accuracy, much faster.

[Image: test 1 results]

Test 2 (epochs: 10, batch size: 2048, 25 minutes)

Thinking that the high number of epochs was what made the first test take so long (even if it had run uninterrupted), I assumed that lowering it to 10 would go very fast. It definitely cut down on the total amount of time, but it seemed to be processing at about the same rate per epoch as the first test. The final result was an accuracy of 0.4093.

[Image: test 2 results]

Test 3 (epochs: 10, batch size: 256, 20 minutes)

Finally, I decreased the batch size as well, testing low epochs with a low batch size. It surprised me that, despite using a much smaller batch size than in test 2, it took about the same amount of time. Even more surprising was that it reached an accuracy of 0.5453, meaning the lower batch size produced a more accurate result.

[Image: test 3 results]

Conclusion

I’m still a bit confused about the different aspects that go into CNN training, and especially how their relationships with each other affect the outcomes. But at the very least I learned how much time it takes to train a CNN on the CIFAR-10 dataset. I’m still not totally sure why it took so much more time for a less accurate result than in class, especially since the Fashion-MNIST dataset has almost the same number of images as CIFAR-10. On quick evaluation, I see that each image in Fashion-MNIST is only 28×28, whereas CIFAR-10 is 32×32. I wonder if that has something to do with it, but that difference still seems pretty small for such a big gap in the results.
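One factor I didn’t account for above: if I remember right, Fashion-MNIST images are grayscale (one channel) while CIFAR-10 images are color (three channels), so the gap in input size per image is bigger than the 28×28 vs. 32×32 comparison alone suggests:

```python
# Rough arithmetic on input values per image
# (assuming grayscale Fashion-MNIST and RGB CIFAR-10).
fashion_mnist_size = 28 * 28 * 1   # 784 values per image
cifar10_size = 32 * 32 * 3         # 3,072 values per image
print(cifar10_size / fashion_mnist_size)  # roughly 3.9x more input data per image
```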

Week 04: ANN vs. Biological Neural Networks – Katie

I think that there is no doubt that artificial neural networks are, at the very least, inspired by the brain/neuron structure. Even without reading any McCulloch and Pitts, I think it makes sense if we assume that 1) we cannot create anything that is completely new, but we can build off of previous discoveries and knowledge, and 2) developers over the years did not have any other established system of learning on which to base their work, aside from biological neural networks (human or otherwise). But, as a lot of classmates have pointed out, the key word is “inspired”; this does not imply that the structures are exactly the same, and it does not imply that the processes of learning are the same either.

The structures of these neural networks are similar but not identical. After receiving an input, the processing in biological and artificial structures (in cell bodies and hidden layers, respectively) varies; this is in part because most artificial neural networks are built to accomplish a single task, whereas biological ones can learn entirely new tasks. In the same vein, the ways AI learns and the ways humans learn are also different. Actually, while we can categorize different ways of AI learning, is it possible to do this for human learning? Is it supervised or unsupervised? While I was researching, I also came across the terms inductive, deductive, and transductive learning, but I still don’t know nearly enough about any of these, or even enough about neuroscience, to speculate on how they tie into human learning. One thing that’s clear, though, is that humans learn constantly and continuously; even if AI learned through a similar method, I don’t know if it would be possible for it to function with the same degree of complexity as a human, and I don’t think that outcome would be very predictable or controllable.

Ultimately, I think that artificial neural networks are definitely inspired by biological neural networks. However, the networks themselves, as well as the way AI and humans learn, can be vastly different. I think this conversation is interesting since other questions inevitably come out of it—will we ever be able to construct an ANN so that AI can learn like a human? Is this even the goal?

Week 03 Assignment: Trying Save/Load Model with Image Classification

This week’s assignment turned out to be a bit less successful than I hoped. I wanted to use it as an opportunity to build off of an earlier project I did using image classification. For that project, I loosely trained the model with images of the alphabet in American Sign Language (ASL). *Note that I’m not very knowledgeable about ASL and even worse at signing (I had to reference an alphabet guide), but here’s a video of that earlier project for reference:

Despite the fact that I only trained it with 24 of the 26 letters (J and Z require motion, while image classification training requires still images), it was incredibly time-consuming to retrain it every time I opened the project. When I was originally working on that project, I didn’t know that a save/load model had recently been developed, so I figured I should try to implement it here to make things more efficient.
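The general idea is just to serialize the trained model to disk once and reload it later, instead of retraining in every session. As a conceptual sketch, here it is in Keras/Python; my actual project uses ml5.js and its save/load feature, not this code:

```python
# Conceptual save/load sketch in Keras (my actual project uses ml5.js, not this):
# train once, save the model to disk, then reload it later instead of retraining.
import numpy as np
from tensorflow.keras import layers, models

# Stand-in for a real training run, on a tiny toy dataset.
model = models.Sequential([
    layers.Dense(4, activation="relu", input_shape=(2,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(np.random.rand(100, 2), np.random.randint(0, 2, 100), epochs=5, verbose=0)

model.save("trained_model.h5")                   # one-time save after training

# Later session: load the saved model instead of retraining.
reloaded = models.load_model("trained_model.h5")
print(reloaded.predict(np.random.rand(1, 2)))
```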

I referenced Daniel Shiffman’s video on the Save/Load model, and the first part does seem to be working; after training, the model.json and model.weights.bin files download and open to look like the demonstration in the video. It’s only when I try to load them back into the program that it stops working. On localhost, it stops at “loading model.” My terminal shows this:

[Image: terminal output]

I think there’s probably a relatively simple explanation for this, or something I’m overlooking, so I plan to keep working on it until it runs. If I could get it working, the save/load model would be extremely helpful for developing a larger project, especially if I wanted to move forward with something similar to the image classification model.

Code:

Training & saving model code: https://github.com/katkrobock/aiarts/tree/master/train_saveModel

Loading model code: https://github.com/katkrobock/aiarts/tree/master/loadModel

Reference:

Daniel Shiffman / The Coding Train video: https://www.youtube.com/watch?v=eU7gIy3xV30

Week 02 Assignment: imageClassifier() Testing – Katie

For this week’s assignment, I chose to look closer at the Image Classifier; this uses neural networks to “recognize the content of images” and to classify those images. It also works in real time, and the example I tried specifically uses a webcam input.

You can tell a bit about its development based on what you see on the user side. I noticed that it wasn’t too bad at assessing images if they were visually straightforward—I held up a water bottle against a mostly plain background, and it caught on quickly. But when I would turn it on its side or upside down, the classifier had a harder time trying to identify it.
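The ml5.js example runs in the browser, but as a rough sketch of what a classifier like this does for each frame, here is a Python/Keras equivalent using a MobileNet pretrained on ImageNet. This is an assumption and a simplification on my part; it is not ml5’s actual implementation, and the filename is just a stand-in.

```python
# Hedged sketch: classify one image with a MobileNetV2 pretrained on ImageNet,
# roughly what the webcam classifier does per frame (not ml5's actual code).
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = MobileNetV2(weights="imagenet")                            # 1,000 ImageNet classes

img = image.load_img("water_bottle.jpg", target_size=(224, 224))   # stand-in filename; model expects 224x224
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
    print(label, score)                                            # top-3 labels with confidence scores
```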

I looked more into the training of this model and learned that it was trained on the ImageNet (image-net.org) database. ImageNet has around 14 million images divided into different synsets, which are labelled and monitored by humans.

[Image: the ImageNet database (image-net.org)]

I started to think more about how that training really translates to its function, and what the computer is actually ‘seeing’—if it’s only recognizing one angle of a given object, does it only learn in groups of pixels? Even if that’s the case, is it possible for it to understand those groups if they were rotated? I’m not sure if these questions have obvious answers, but I’m excited to hopefully understand better over the next few months.

Week 01 Case Study: AI Facial Profiling, Levels of Paranoia — Katie

Presentation: https://docs.google.com/presentation/d/18gT9DN-SiBhwvFuDLMEAzr__pwge8fzzJnmgTDHqEi8/edit?usp=sharing

The project I chose for this case study is Marta Revuelta’s AI Facial Profiling, Levels of Paranoia. This project is a performance art/supervised machine learning piece that sorts participants into two categories: likely high ability to handle firearms and likely low ability to handle firearms. Revuelta’s project was influenced by recent developments in machine learning technology toward facial profiling—Faception, an Israeli company that uses algorithms to determine potential behaviors of its human subjects (white collar offenders, terrorists, pedophiles), and a paper by Jiao Tong University scientists Wu Xiaolin and Zhang Xi on the ability to detect criminal behavior only through a person’s face.

Revuelta’s project uses a convolutional neural network. Through supervised learning, the network is trained with two datasets, resulting in its ability to attribute one of two labels to an image. On their website, Revuelta notes that “the first set includes 7535 images and was created by automating the extraction of the faces of more than 9000 Youtube videos on which people appear to be handling or shooting firearms, while the second set of 7540 images of faces comes from a random set of people posing for selfies.”

[Image: AI Facial Profiling, Levels of Paranoia]

At the exhibition, a participant stands as someone (a performer?) holds a camera to their face; the act looks disturbingly similar to someone holding a gun to another person. There is a very blatant power dynamic exposed in this alone. The camera takes a picture of the participant’s face, and through the algorithm the image (and thus the participant) is determined to be potentially dangerous or not. A printed photo is then sent down a conveyor belt and stamped in red or black ink, “HIGH” or “LOW.” The photos are sorted into their respective piles in glass cases, for all other passersby to see.

[Image: AI Facial Profiling, Levels of Paranoia]

I think this project is really significant; as someone who formerly studied media theory, I find surveillance a really interesting topic. Surveillance by humans is already something that can contribute to inequality, since humans have their own biases and preconceptions that lead to discrimination. Surveillance by artificial intelligence is newer, but it carries all the same downsides: bias in AI is also unavoidable, since its programmers and developers are human and have their own biases, and the data that models are trained with may also exhibit bias. I think that to create a model with such a starkly contrasting binary between low/high (or safe/dangerous, or good/evil) is already a statement in itself, but accompanying it with the performance made it much more impactful. The powerlessness the participant had in being photographed, being defined by a single stamp, and then being exposed with that definition as their identity all encompassed the idea of dehumanization through AI surveillance technology. Given my own background, I may have already had my biases on the topic, but Revuelta’s project certainly reinforced my concerns.

Sources:

https://revuelta.ch/ai-facial-profiling

https://www.creativeapplications.net/arduino-2/ai-facial-profiling-levels-of-paranoia-exploring-the-effects-of-algorithmic-classification/