Week 7: Midterm Documentation (EB)

GitHub: https://github.com/luunamjil/AI-ARTS-midterm

For the midterm, I decided to create an interactive sound visualization experiment using posenet. I downloaded a library called "Simple Tones" that contains many different sounds of various pitches. The user chooses which sound to play by moving their left wrist along the x-axis. This project was inspired by programs such as Reason and FL Studio, as I like to create music in my spare time.

I originally planned to create a framework for WebVR in A-Frame using posenet, but the process turned out to be too difficult and beyond my current coding abilities. The idea itself is relatively doable compared to my initial proposal, but I still needed more time to understand how A-Frame works and the specific code that goes into its 3D environment.

Methodology

I used the professor's week 3 posenet example 1 as the basis for my project. It already contained code that lets the user paint circles with their nose. I wanted to incorporate music into the project, so I looked online and came across an open-source library of simple sounds called "Simple Tones".

I wanted the position of my hand in the posenet framework to play sounds, so I decided that the x-coordinate of my left wrist would determine the pitch.

// `partname`, `score`, and `x` describe the posenet keypoint being processed;
// `square`, `playSound`, and `randomNumber` come from the libraries already loaded.
if (partname == "leftWrist") {
  if (score > 0.8) {
    // Pitch follows the wrist's x-coordinate (scaled up by 3).
    playSound(square, x * 3, 0.5);
    // Draw a randomly placed circle whose size also follows the x-coordinate.
    let randomX = Math.floor(randomNumber(0, windowWidth));
    let randomY = Math.floor(randomNumber(0, windowHeight));
    console.log('x' + randomX);
    console.log('y' + randomY);
    graphic.noStroke();
    graphic.fill(180, 120, 10);
    graphic.ellipse(randomX, randomY, x / 7, x / 7);
  }
}

The playSound() call and its arguments come from the Simple Tones library. Because the raw x-coordinate alone may not reach high enough values for certain pitches and sounds, I multiply it by 3. The left side produces high pitches, while the right side produces low pitches.
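The same mapping can be made more explicit with p5's map() function. This is only a minimal sketch: wristToFrequency() is a hypothetical helper, and the playSound(waveform, frequency, duration) signature is assumed from the snippet above rather than taken from the Simple Tones documentation.

// Hypothetical refactor of the pitch mapping: instead of multiplying x by 3,
// map the wrist's x-coordinate onto an explicit frequency range in hertz.
function wristToFrequency(x) {
  // Tune the range to taste (and flip it if the video feed is mirrored).
  return map(x, 0, windowWidth, 200, 2000);
}

// Usage inside the keypoint loop:
// playSound(square, wristToFrequency(x), 0.5);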

I ran it by itself and it seemed to work perfectly.

After some experimentation, I also wanted some visual feedback to represent what is being heard, so I made the graphic.ellipse() size follow the x-coordinate of the left wrist: the higher the pitch (the further left on the axis), the bigger the circle.

The end result is something like this. The color and sounds that it produces give off the impression of an old movie. 

Experience and difficulties

I really wanted to add a fading effect on the circles, but for some reason the sketch would always crash when I wrote a "for" loop for it. I looked into different ways of producing the fading effect but wasn't able to include it in the code.
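One common way to fake a fade without any per-circle "for" loop is to repaint the canvas with a translucent rectangle every frame, so older circles gradually wash out. This is only a sketch and assumes the circles are drawn directly onto the main canvas rather than the separate graphic buffer used above.

// Minimal p5.js fading-trail sketch: each frame, a low-alpha rectangle is drawn
// over the whole canvas, so circles from previous frames fade instead of piling up.
function draw() {
  noStroke();
  fill(0, 0, 0, 20);          // low alpha = slow fade
  rect(0, 0, width, height);  // wash over the previous frame

  // ...posenet drawing code goes here, e.g.:
  // fill(180, 120, 10);
  // ellipse(randomX, randomY, x / 7, x / 7);
}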

I would also like to work on the visual appearance of the UI. It is quite basic and could use further refinement, but this is currently as far as my coding skills go.

The idea and concept seemed very doable at first, but it required a lot more skill than I expected. Still, I enjoyed the process, especially the breakthrough moment when I could hear the sounds reacting to my movement.

Overall, I have now learned how to use the position of a body part to drive something. Going forward, I still want to build the WebVR project, and this experience should help with both the understanding and the implementation.

Social Impact:

In the process of working on my midterm, I worked on two different projects. The first was pairing WebVR with posenet to develop a way to control a VR experience without the usually required equipment. The second was the one I presented in class, a theremin-inspired posenet project. Although I only managed to complete one posenet project, I believe both have a lot of potential for social impact.

First, the WebVR project. The initial idea was to make VR more inclusive by letting people who cannot afford the equipment experience VR. The HTC Vive and other well-known headsets all cost over 3000 RMB. By allowing posenet to be used inside WebVR, anyone with an internet connection could experience VR. Obviously, the experience would not be exactly the same, but it should come close enough.

Secondly, the theremin-inspired project. I found out about the instrument a while back and thought to myself, "What an interesting instrument." While the social impact of this project isn't as serious as the previous one, I can see people using it to get a feel for, or an understanding of, the instrument. The theremin differs from traditional instruments in that it is more approachable for children, or anyone for that matter: it is easy to create sounds with a theremin, even though mastering it has a very steep learning curve. With a project like this, people of any background can experience music and sound without buying the instrument.

Future Development:

For the first project, I can see it developing into an add-on that works with any WebVR project. For that to become real, one needs an extensive understanding of the A-Frame framework; with that understanding, one could build the tools needed to integrate an external machine learning model. The pose estimation also needs to become more accurate so that as many interactions as possible can be supported.
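As a rough illustration of what such an add-on could look like, the sketch below registers an A-Frame component that reads the latest posenet pose (via ml5) every frame and yaws the camera from the nose position. Everything here is an assumption for illustration: the component name, the rotation mapping, the global latestPose variable, and the existing video element are not from the actual project.

// Hypothetical A-Frame component: each frame, read the most recent posenet pose
// (stored by an ml5 callback) and yaw the camera based on the nose's x position.
let latestPose = null;

// ml5 posenet setup; `video` is assumed to be an existing <video> element.
const poseNet = ml5.poseNet(video, () => console.log('posenet ready'));
poseNet.on('pose', (results) => {
  if (results.length > 0) latestPose = results[0].pose;
});

AFRAME.registerComponent('pose-look', {
  tick: function () {
    if (!latestPose) return;
    const nose = latestPose.keypoints[0]; // keypoint 0 is the nose in posenet
    // Map the nose's x position (0..video width) to a yaw of roughly -45..45 degrees.
    const yawDeg = (nose.position.x / video.videoWidth - 0.5) * 90;
    this.el.object3D.rotation.y = -yawDeg * Math.PI / 180;
  }
});

The component would be attached to the camera entity, for example <a-entity camera look-controls="enabled: false" pose-look></a-entity>, so that the default mouse look does not fight with the pose-driven rotation.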

For the second project, I can see music classes using it to explain the concepts of frequency and velocity to younger children or to beginners in music production, since it offers a visual and interactive experience. In the future, velocity and volume could be mapped to each point along the x- and y-axes to make the sounds more quantifiable for the user, and the available sound types could be listed in a sidebar for the user to pick from; a sketch of such a two-axis mapping follows below.
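Here is a minimal sketch of that two-axis mapping. It assumes p5.sound's oscillator instead of the Simple Tones playSound() helper, because the volume behaviour of playSound() isn't documented here; the helper name updateSoundFromWrist() is hypothetical.

// Hypothetical two-axis mapping with p5.sound: wrist x controls pitch,
// wrist y controls volume, using one continuously running oscillator.
let osc;

function setup() {
  createCanvas(640, 480);
  osc = new p5.Oscillator();
  osc.setType('square');
  osc.start();
  osc.amp(0); // silent until a wrist is detected
}

// Call this from the posenet keypoint loop with the left wrist's coordinates.
function updateSoundFromWrist(x, y) {
  const freq = map(x, 0, width, 200, 2000); // flip the range if the feed is mirrored
  const vol = map(y, 0, height, 1, 0);      // higher on screen = louder
  osc.freq(freq, 0.1);                      // 0.1 s ramp to avoid clicks
  osc.amp(vol, 0.1);
}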

Week 06 Assignment: Midterm Project Concept (Erdembileg)

Overview:

For this midterm project, I plan on creating a meme generator. The user interacts with the machine by providing a quote or caption, and the machine then figures out the best photo for the given text.

Background & Inspiration:

Meme culture continues to be a major part of life for the new generation of netizens. What started out as a joke is now a worldwide phenomenon, with every group of people expressing its sense of humor through captions and photos.

I found out about meme culture back in high school when it was still developing within the online forums. What amazed me was the versatility of memes in general. One picture could have multiple meanings depending on the context given by the caption. It was purely up to the creativity of the user.

Motivation:

Motivation for this project comes from the fact that everyone relates to memes and everyone has their own interpretation of memes. But I pondered a question: How can machine learning and AI help in the creation process of the meme? 

It seems to me that the majority of people in meme culture are purely consumers: most have never made a meme in their lives, but they find joy in consuming community-made memes. People tend to interpret a meme from what the creator has already given them, a caption and a picture, and rarely stop to think that a different picture could deliver the same or even greater comedic value. I think machine learning can be used to test this idea.

Another motivation is to test how well machine learning and AI can supply humor or context. Humor is an inherently human function: we see something as humorous when we can imagine the context, so our brains and imagination account for a large portion of our perceived humor. I want to explore just how significant our imagination is in this process by letting the machine pick one portion of the context, the photo or image.

How to build and potential problems:

It is possible to create this machine if we can teach it to recognize the words most commonly associated with each meme image. The process would be to assign words to the photos and then compare them against the caption given by the user to see which photo fits best; a rough sketch of this matching step is below.
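This is only a hypothetical illustration of the matching step, with made-up template names and tags; a real version would need a much richer vocabulary or an embedding model.

// Hypothetical keyword-matching step: each meme template carries a list of tags,
// and the template sharing the most words with the caption wins.
const templates = [
  { image: 'distracted.jpg', tags: ['distracted', 'new', 'temptation', 'choice'] },
  { image: 'success-kid.jpg', tags: ['win', 'success', 'finally', 'victory'] },
];

function pickTemplate(caption) {
  const words = caption.toLowerCase().split(/\W+/);
  let best = templates[0];
  let bestScore = -1;
  for (const t of templates) {
    const score = t.tags.filter((tag) => words.includes(tag)).length;
    if (score > bestScore) {
      bestScore = score;
      best = t;
    }
  }
  return best.image;
}

// pickTemplate('I finally got the win') -> 'success-kid.jpg'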

One potential problem I can foresee is misinterpreting the content of the quote. In the earlier weeks of class, we played with an ml5 model that detected the sentiment of a sentence; it wasn't able to detect sarcasm and therefore couldn't correctly interpret such sentences. This model might run into the same issue.
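For reference, the class exercise looked roughly like the sketch below. It assumes the ml5 'movieReviews' sentiment model we used in class; the example sentence and its interpretation are mine.

// Sketch of the ml5 sentiment check from class: the score runs from 0 (negative)
// to 1 (positive), which says nothing about sarcasm.
const sentiment = ml5.sentiment('movieReviews', () => {
  const result = sentiment.predict('Oh great, another Monday.');
  console.log(result.score); // likely scores as positive despite the sarcasm
});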

Ideal Project:

Ideally, I would like the project to reliably match appropriate images to the given quotes.

Week 05 Assignment: Train CIFAR-10 CNN (Erdembileg)

Background: For this week's assignment, we were introduced to epochs and other inner workings of training. We were assigned to tweak different variables and observe their effects on the end results, such as loss and accuracy. I first tweaked the number of epochs (the number of passes over the training data), then the batch size, and later the pool size to test performance.
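The assignment notebook itself is written in Keras, but to keep the code examples here in one language, the same three knobs can be sketched in TensorFlow.js. The layer sizes, variable names, and the pre-loaded trainXs/trainYs tensors below are illustrative assumptions, not the actual course notebook.

// Illustrative TensorFlow.js sketch of the three knobs varied in this assignment:
// pool size (in the model) plus epochs and batch size (in fit()).
async function trainCifarSketch(trainXs, trainYs) {
  const model = tf.sequential();
  model.add(tf.layers.conv2d({
    inputShape: [32, 32, 3], // CIFAR-10 image shape
    filters: 32,
    kernelSize: 3,
    activation: 'relu',
  }));
  model.add(tf.layers.maxPooling2d({ poolSize: [2, 2] })); // pool size knob
  model.add(tf.layers.flatten());
  model.add(tf.layers.dense({ units: 10, activation: 'softmax' }));

  model.compile({
    optimizer: 'adam',
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });

  // Epoch and batch size knobs live here.
  return model.fit(trainXs, trainYs, {
    epochs: 5,
    batchSize: 32,
    validationSplit: 0.1,
  });
}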

Machine Specs:

Variations in epochs: I first reduced the number of epochs from 100 to 10 to see results faster.

10 epochs

It took about 25 minutes to complete the whole run. We can immediately see that 10 epochs result in an accuracy of only 0.4077 and a loss of 1.6772. We can't conclude much from a single experiment, so let's refer to the next screenshot of the 8-epoch run.

8 epochs

This time it took 20 minutes to complete the whole run. The test ended with a loss of 1.7213 and an accuracy of 0.401. The loss increased quite a bit over a difference of just 2 epochs, while the accuracy changed by only about 0.006.

5 epochs

Let's check the 5-epoch run. At 12 minutes, it was noticeably faster than the previous two tests, but the accuracy is substantially lower at 0.3636, with a loss of 1.8298.

It seems that the benefit of additional epochs follows a curve of diminishing returns: accuracy improves substantially up to a point, after which each additional epoch yields smaller gains while requiring more and more processing.

Batch Size Experiment:

Looking through sites like Quora, I came across posts describing how different batch sizes can affect the accuracy of the model. Based on those answers, a batch size of 64 seemed like a good place to start, so I ran tests with batch sizes of 64, 128, and 256 at 5 epochs.

Batch 64

The test with 5 epochs and a batch size of 64 took about 12 minutes, and we can immediately see a huge difference in accuracy compared to the earlier runs at a batch size of 2048: accuracy jumps to 0.5594 with a loss of 1.2511.

I wasn't expecting such a large improvement from a smaller batch size.

Batch 128

This time I doubled the batch size from 64 to 128, which made the run slightly faster at 11 minutes and 30 seconds. However, accuracy starts to drop and loss starts to rise: accuracy is 0.5223 and loss is 1.3298.

It would seem that, at 5 epochs, the larger the batch size, the less effective the training becomes.

Batch 256

Test loss was 1.4259 and accuracy 0.4896, with a run time of roughly 10 minutes.

Once again we see that increasing the batch size does not help when the number of epochs is kept at 5.

After running three tests with varying batch sizes, I realize that increasing the batch size past 64 is not a good idea if I want high accuracy and low loss. I'm now interested in finding out whether dropping the batch size below 64 will improve the results, so I decided to run a test with a batch size of 32.

Batch 32

I was surprised that this test produced a much better result: 0.6237 accuracy and a loss of 1.08, with a run time of 13 minutes.

What I now understand from the two types of tests is that the gains from additional epochs stagnate after a certain point: results improve significantly up to that point, and every epoch after it produces a smaller improvement than the one before.

Batch size also plays a large role in the accuracy and loss of the test. A batch size of 32 to 64 seems solid for this task, and I expect the results would have been even better with more epochs.

It would seem that there is a sweet spot in the combination of these two hyperparameters when training a model for the best results.

The Neural Network (NN) and the Biological Neural Network (BNN) – Erdembileg Chin-Erdene

According to what we have studied so far, AI has advanced to the point where many tasks are performed far more efficiently by machines than by us. With this come ground-breaking innovations in various fields of expertise, streamlining more and more tasks. But to understand why we increasingly choose AI over humans, we must understand which aspects of the artificial and the biological networks are the same and which differ.

Similarities:

ANNs go back as early as the 1950s, an example being the perceptron. The idea behind the perceptron was to emulate what we thought the BNN does in our brains. In a biological neural network, dendrites receive signals from other cell bodies, the cell body (with its nucleus) processes them, and axons send the result on to other neurons. The strength of the links between neurons can vary, which leads to the storage of information (i.e. memory) and the creation of new neural connections. Exactly what happens during that processing stage is still not fully understood, but the ANN tries its best to mimic the general workings: it also receives inputs, processes them in hidden layers, and sends out results. Unlike the biological neural network, however, the ANN is a mathematical model with adjustable parameters and functions that compute whatever we require of it.
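To make the perceptron analogy concrete, here is a minimal sketch with made-up weights; it is an illustration of the idea rather than code from any library.

// Minimal perceptron: weighted inputs (the "dendrites") are summed and passed
// through a step activation ("does the neuron fire?").
function perceptron(inputs, weights, bias) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum > 0 ? 1 : 0; // step activation
}

// Example: with these weights and bias, the perceptron behaves like a logical AND.
console.log(perceptron([1, 1], [0.6, 0.6], -1)); // 1
console.log(perceptron([1, 0], [0.6, 0.6], -1)); // 0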

Differences:

We have around 86 billion neurons in our brains forming over 100 trillion synapses (links), while an ANN consists of far fewer units. In terms of speed of operation, the performance of neurons in a human brain can vary widely depending on factors such as age, gender, and how much sleep the person got the night before; it is much easier for an ANN to stay consistent in its calculations. However, it is important to understand that this consistency comes from specialization: an ANN designed to play chess cannot play checkers unless it is designed to.

Conclusion:

I have no doubt that AI and ANNs will eventually be made adaptable to almost any circumstance, just like in sci-fi films. The most important question that comes to mind is how the job market will shift in the AI revolution to come. AI-driven trucks are already being tested and dispatched, and medical facilities increasingly use AI that can spot cancer cells faster and more accurately than the human eye. Even home caretakers are slowly being supplemented by systems such as smart homes with Amazon's Alexa and Siri. From my understanding, AI can only replace us at what we already know, using information we have already uncovered. In other words, I believe the job market will shift toward a greater demand for human-to-human interaction more than anything else.

Sources: 

https://www.quora.com/What-is-the-differences-between-artificial-neural-network-computer-science-and-biological-neural-network

https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7

Week 01: Magenta Studios


Slides: https://drive.google.com/open?id=1fXMxelSqzy15b-pwVr40K46TJRxdlGAo48KnkDRgK34

https://magenta.tensorflow.org/studio

Audio: "DRUMIFY 0" output and the "Document 2" chord progression

Magenta is an open-source machine learning project created with the purpose of making art in visual and audio form. Through it, the team has built various programs that use algorithms and deep learning to create art pieces.

Magenta Studio is a set of music creation tools that helps artists further their process of making music. It can be used as a plug-in for Ableton Live (a music production program) or on its own as a standalone application.

I decided to download the standalone version and test it, and the results were amazing. I uploaded a chord progression in the form of a ".midi" file into the tool called Drumify, which creates drum loops based on the chords and melodies you feed it. The result was very interesting: it managed to create a drum loop that almost felt like a live performance.
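Magenta Studio itself is a graphical tool, so there is no code in this workflow, but a loosely related drum-generation step can be sketched with the Magenta.js library. This is an assumption-heavy illustration: the checkpoint URL and the seed pattern are placeholders, and this is not necessarily the same model Drumify uses internally.

// Sketch with Magenta.js: continue a simple kick-drum seed into a longer drum
// pattern. The checkpoint URL below is a placeholder assumption.
import * as mm from '@magenta/music';

const model = new mm.MusicRNN(
  'https://storage.googleapis.com/magentadata/js/checkpoints/music_rnn/drum_kit_rnn'
);

const seed = {
  notes: [
    { pitch: 36, quantizedStartStep: 0, quantizedEndStep: 1, isDrum: true },
    { pitch: 36, quantizedStartStep: 4, quantizedEndStep: 5, isDrum: true },
  ],
  quantizationInfo: { stepsPerQuarter: 4 },
  totalQuantizedSteps: 8,
};

model.initialize()
  .then(() => model.continueSequence(seed, 16, 1.1)) // 16 steps, temperature 1.1
  .then((drums) => console.log(drums.notes));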

Although the results were amazing, the creation of something like this raises the question of what counts as authentic music in the modern age. If we can create music that is indistinguishable from live music with the help of AI and machine learning, what will happen to the music industry?