iML Week 14: Final Project (Magic Violin) – Thomas

Introduction

For my final project, I wanted to explore audio generation with Magenta. My original idea was to use NASA data to generate sounds from measurements of the universe, but I came up with a better idea shortly after. Having played the violin for ten years, I know it is a difficult instrument: it requires accurate finger placement and complex bow technique. I wanted to create an interface that lets people make music without prior musical experience. My inspiration also came from Google's Piano Genie project, which allows anyone to improvise on the piano.

Process

The goal of this project was to use notes played on the violin to produce a sequence of notes on the computer. Below are the steps I needed to complete to bring the project to life.

I began by experimenting with a variety of pitch detection algorithms, including the McLeod pitch method, YIN (and YIN-FFT), Probabilistic YIN, and Probabilistic MPM. I ultimately decided to use the machine learning approach included in the ml5.js library. The ml5.js pitch detection model uses CREPE, a deep convolutional neural network that translates the audio signal into a pitch estimate. Below is a diagram of the layers and dimensions of the model, taken from the paper.
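As a rough sketch (following the ml5.js pitch detection example rather than my exact code, with the CREPE model URL taken from the ml5 examples), the detection loop looks something like this:

```javascript
// Rough sketch based on the ml5.js pitch detection example (CREPE model).
// The model URL follows the ml5 examples; paths may differ in practice.
let audioContext, mic, pitch;

function setup() {
  noCanvas();
  audioContext = getAudioContext();        // provided by p5.sound
  mic = new p5.AudioIn();
  mic.start(startPitch);
}

function startPitch() {
  pitch = ml5.pitchDetection(
    'https://cdn.jsdelivr.net/gh/ml5js/ml5-data-and-models/models/pitch-detection/crepe/',
    audioContext,
    mic.stream,
    () => getPitch()                       // start polling once the model loads
  );
}

function getPitch() {
  pitch.getPitch((err, frequency) => {
    if (frequency) {
      console.log('Detected pitch (Hz):', frequency);
    }
    getPitch();                            // keep polling
  });
}
```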

After running the pitch detection and checking that the RMS level is greater than 0.05, we call a function in Piano Genie that asks the model for a prediction. The prediction can be triggered either by drawing the bow across a violin string or by typing 1-8 on the keyboard. I created a mapping based on the string that is played: G calls buttons 1 and 4 on the model, D calls 2 and 5, A calls 3 and 6, and E calls 4 and 7. The two notes in each pair are usually spaced about an octave apart. Most of the time the generated notes are harmonious, but occasionally they sound awful. Below is an explanation of the Piano Genie model from its website.
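The gating and mapping described above could be sketched roughly like this, assuming the PianoGenie class from @magenta/music (whose next() takes a 0-indexed button); nearestString and playNote are hypothetical helpers, not functions from my project:

```javascript
// Sketch of the string-to-button mapping described above.
// Assumes `genie` is an initialized PianoGenie instance from @magenta/music.
const STRING_TO_BUTTONS = {
  G: [1, 4],
  D: [2, 5],
  A: [3, 6],
  E: [4, 7],
};

function onPitchDetected(frequency, rmsLevel, genie) {
  if (rmsLevel <= 0.05) return;               // ignore silence and bow noise
  const string = nearestString(frequency);    // hypothetical helper: maps Hz to G/D/A/E
  for (const button of STRING_TO_BUTTONS[string]) {
    const key = genie.next(button - 1);       // PianoGenie buttons are 0-indexed;
                                              // returns a piano key index to turn into a note
    playNote(key);                            // hypothetical helper: plays it via the soundfont
  }
}
```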

Training

I trained the model on the following works from classicalarchives.com, which include violin and piano parts:

Sonata No.1 for Solo Violin in G-, BWV1001 
Partita No.1 for Solo Violin in B-, BWV1002 
Sonata No.2 for Solo Violin in A-, BWV1003 
Partita No.2 for Solo Violin in D-, BWV1004 
Sonata No.3 for Solo Violin in C, BWV1005 
Partita No.3 for Solo Violin in E, BWV1006 
Violin Sonatas and Other Violin Works, BWV1014-1026 
Violin Sonata in G, BWV1019a (alternate movements of BWV1019) 
Violin Sonata in G-, BWV1020 (doubtful, perhaps by C.P.E. Bach) 
Violin Suite in A, BWV1025 (after S.L. Weiss)

Below is a sample of a Bach Sonata:

I ran the included model training code, which can be found here. I attempted to run the training script on the Intel AI DevCloud, but the magenta library requires libasound2-dev and libjack-dev, which cannot be installed because apt-get is blocked on the server. I scraped the MIDI files from the Classical Archives website and converted them into a NoteSequence format that TensorFlow can read. I then evaluated the model and converted it to a TensorFlow.js model. During the conversion I ran into some dependency trouble: the script wanted a TensorFlow 2.0 nightly build that wasn't available for macOS, so I had to create a new Python 3.6 environment and install the dependencies manually.

Challenges

Along the way, I ran into a couple of issues that I was mostly able to resolve or work around. First, I had an issue with AudioContext in Chrome. Ever since the autoplay policy changes introduced a few years ago, microphone input and audio output are restricted as a countermeasure against obnoxious video advertisements. Generally this is a good thing, but in my case the microphone would not work about 50% of the time in Chrome, even when audioContext.resume() was called. This could be because p5.js or ml5.js has not been updated to support these changes, or it could be my own fault. Ultimately, I switched to Firefox, which has a more permissive policy, and that fixed the issue.
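The standard workaround, sketched below rather than taken from my project, is to resume the AudioContext from a user gesture such as a click or key press:

```javascript
// Typical autoplay-policy workaround: resume the AudioContext from a user gesture.
function resumeAudio() {
  const ctx = getAudioContext();            // p5.sound's shared AudioContext
  if (ctx.state !== 'running') {
    ctx.resume().then(() => console.log('AudioContext resumed'));
  }
}

// Trigger it on the first click or key press anywhere on the page.
document.addEventListener('click', resumeAudio, { once: true });
document.addEventListener('keydown', resumeAudio, { once: true });
```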

Another issue was that ml5.js and Magenta conflicted with each other when run together. I could not figure out why this was occurring; I assume it was because they use the same TensorFlow.js backend, which may have caused issues with the graphics. Rather than fixing the error, my only real option was to silence it.

Results

Live Demo: https://thomastai.com/magicviolin

Source Code: https://github.com/thomastai1666/IML-Final

I was generally pretty happy with the results. The model is not very good at generating rhythm, but it does a good job of generating Bach-style chords. The pitch detection model also needs some modifications to pick up notes more accurately. Much of the work was already done by the Piano Genie team who created the model; I only adapted it to work with the violin. The violin is rarely used in machine learning experiments because its notes are difficult to capture, whereas the piano has MIDI support that lets it work almost universally. I hope that as machine learning grows, more instruments will be supported.

Sources

Magenta Piano Genie – https://magenta.tensorflow.org/pianogenie

Piano Genie Source Code – https://glitch.com/edit/#!/piano-genie

Violin Soundfont – https://github.com/gleitz/midi-js-soundfonts

Pitch Detection – https://ml5js.org/docs/PitchDetection

Bach Dataset – https://www.classicalarchives.com/midi.html

P5.js – https://p5js.org/examples/sound-mic-input.html

ML5.js – https://ml5js.org/docs/PitchDetection

Webaudio – https://webaudiodemos.appspot.com/input/index.html

Week 14 – Final Project – Virtual Dressing Room – Jenny

Introduction 

This final project is a continuation of my midterm project; it simplifies the FashionGAN approach into the more practical BodyPix and StyleTransfer models. BodyPix is responsible for body segmentation, and StyleTransfer is responsible for adding different patterns or colors to the input clothing images. In real life, the same piece of clothing often comes in different colors and texture patterns, so it is really useful to be able to generate different styles of the same clothes with StyleTransfer.

Goal

Continuing from my midterm project, I decided I do not really need to stick with DeepFashion, since the image synthesis part does not quite fit my project idea. What I really want to create is something some e-commerce websites already have: a virtual online dressing room with a computer-generated 3D model. When the user selects different clothes, the model automatically puts them on. The conceptual webpage could look like this.


Demo

Here is a quick demo of the webpage I created. When I press different keys on my keyboard, different clothes are mapped onto the upper body of the model, and a different text description is displayed on the left-hand side of the image.


My Web Layout


Workflow & Difficulties

The techniques I used are basically these two models: StyleTransfer is based on ml5.js, and BodyPix is based on a p5.js wrapper. (Moon helped me create the p5.js version of BodyPix, and I used that one instead of the original TensorFlow.js version.)
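For reference, a minimal segmentation call using the TensorFlow.js body-pix package linked under Source & Code (not the p5.js wrapper I actually used) looks roughly like this:

```javascript
import '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

// Rough sketch using the @tensorflow-models/body-pix 2.x API.
async function segment(imageElement) {
  const net = await bodyPix.load();                 // load the pretrained model
  const segmentation = await net.segmentPerson(imageElement, {
    internalResolution: 'medium',
    segmentationThreshold: 0.7,
  });
  // segmentation.data is a Uint8Array of 0/1 values, one per pixel,
  // marking which pixels belong to the person.
  return segmentation;
}
```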

First, I worked on the two workstreams separately and combined them into two linked webpages. But then something weird happened: every time I fed the segmentation I got from BodyPix into StyleTransfer, my webpage crashed for some unknown reason. I am not quite sure about the exact cause, but both my newly trained StyleTransfer model and the existing StyleTransfer models failed, even though all of these models worked perfectly fine with other input images.


One of the newly trained models: a white T-shirt (input & output images)



I then adjusted my work plan. If I really want a better visual result, the best I can do for now is to map the clothes images directly onto the body segmentation I get from BodyPix. So I cropped the segmentation into a rectangular region and mapped the clothes images onto that region. Some of the results fit the model really well while others do not, for two reasons.

The first reason is that I used a rectangular region rather than the BodyPix outline to place the clothes images, which introduces some error. The second reason is that I also used the rectangular shape of the clothes images rather than their outlines, so I cannot perfectly match every contour of the clothes to the model; there are always some gaps, especially around the shoulders. Below are the four clothes images I used to map onto the upper body of the model.

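A rough p5.js sketch of this bounding-box mapping might look like the following; segmentation and clothesImg are placeholder names for the BodyPix mask and one of the clothing images:

```javascript
// Rough sketch of the bounding-box mapping described above.
// `segmentation` is a BodyPix result (0/1 mask per pixel),
// `clothesImg` is a clothing image; both names are placeholders.
function drawClothes(segmentation, clothesImg) {
  let minX = segmentation.width, minY = segmentation.height;
  let maxX = 0, maxY = 0;

  // Find the rectangular region covered by the person mask.
  for (let y = 0; y < segmentation.height; y++) {
    for (let x = 0; x < segmentation.width; x++) {
      if (segmentation.data[y * segmentation.width + x] === 1) {
        minX = min(minX, x);
        maxX = max(maxX, x);
        minY = min(minY, y);
        maxY = max(maxY, y);
      }
    }
  }

  // Use roughly the upper half of the body box for the torso,
  // then stretch the clothing image over it.
  const torsoH = (maxY - minY) * 0.5;
  image(clothesImg, minX, minY, maxX - minX, torsoH);
}
```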

Future Improvement

1) Outline detection

In the future, the accuracy of the outline detection has a lot of room to improve. One simple method could be to use BodyPix to get a segmented outline of the clothing image as well. But there are still lots of problems when matching the clothes images to the model. A machine learning model similar to, but more powerful than, Pix2Pix may be really helpful, since Pix2Pix basically detects the outline of an image. I also noticed that if I choose a model image that does not show the front of the body, BodyPix has relatively low accuracy, and the matching process becomes tricky since you need to adjust not only the proportion but also the rotation of the clothes image. That would be hard to realize in two dimensions, and I wonder whether there is any 3D machine learning model to refer to.

2) Video capture

I think the ideal interaction would be to let users stand in front of the camera and take an image of themselves. Then they could select from the computer the different kinds of clothes they would like to try on virtually.
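A minimal p5.js sketch of that capture step (variable names are placeholders) could look like this:

```javascript
// Minimal sketch of the capture step: show the webcam feed and
// freeze a snapshot when the user presses the spacebar.
let video;
let snapshot = null;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();                      // draw the feed onto the canvas ourselves
}

function draw() {
  image(snapshot || video, 0, 0, width, height);
}

function keyPressed() {
  if (key === ' ') {
    snapshot = video.get();          // the frozen frame would then go to BodyPix
  }
}
```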

Source & Code

https://github.com/tensorflow/tfjs-models/tree/master/body-pix

https://www.dropbox.com/s/1vhf7qjawa5wv75/iML_final.zip?dl=0

iML Final Project – “Copycat” – Alison Aspen Frank

Link to final code

Link to final presentation

Inspiration

Since the midterm project, I have been interested in how computers process human language, along with the creative and practical applications of this. I have found many art projects that utilize machine learning to process language, as well as many articles on how machine learning is being used for language translation (references are included at the end of this documentation). As I had already worked with ml5.js's Word2Vec model, I wanted to work with text generation instead. That being said, my biggest goal in this project was to successfully train a model on my own.

Original Plan

My original plan was to train a text generation model myself and run inference on it in JavaScript. I first tested Keras' example text generation model, but found that the results were nonsensical. Looking back, this could also have been due to my relatively small dataset.

After this, I looked into many different models, but chose to go with ml5.js, as the model would automatically be converted to JavaScript. Since I was still very unfamiliar with training models on Intel's DevCloud, I spent about two weeks trying to get the model to train successfully. The first errors I received were in my bash script, and they occurred because I did not correctly reference my data directory. Once this was solved, there was another error with a .pkl file created during the training phase. To debug this, I got help from Aven: we reorganized my directories and modified the Python training script. Even then, I was still receiving the same error (pictured below). Eventually, I found that the .pkl file was created under a different name than the one referenced in the script. Once I changed every instance of the file name in the training script, I was finally able to train my model.

PKL File Error:

pkl file error

However, once the model was trained, I found that it could not be loaded for inference in JavaScript. Even though the model was saved in the correct folder, whenever I ran my JavaScript I would get an "unexpected token < in JSON" error. I had Aven look at this error as well, and instead of loading the model from JavaScript, we tried running it with Python, which also gave us an error. Then Aven and I did some research to see if ml5 had any pre-trained models that could be used. Once I got access to the pre-trained models, I found that they returned the same error, so after conversing with Aven, we concluded that there was an error in the backend of the ml5 code. Unfortunately, we only discovered this on Saturday, leaving me with two days to put together something else for the final.

Backup Plan

With the shortage of time in mind, I chose to work with ml5's Word2Vec model once again, as I knew it would function properly and believed it could give me an outcome similar to what I had originally pictured when planning my project.

My new idea was to use Word2Vec to take each word of a user's input, find its closest neighboring word, and output the new words. The effect is similar to text generation, but I would say it is more akin to something I would call "machine poetry." Overall, I am currently satisfied with the outcome, considering I had to put it together in a day and a half; the user interface design is not exactly where I would like it to be. All other aspects aside, I accomplished my original goal: I trained a model (even though it could not be used for inference), and I did something with text generation.
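A rough sketch of this word-by-word substitution, assuming the ml5.js Word2Vec API and one of its example vector files (the file path is a placeholder):

```javascript
// Rough sketch of the word-by-word substitution using ml5.js Word2Vec.
// The vector file path is a placeholder taken from the ml5 examples.
const word2vec = ml5.word2vec('data/wordvecs10000.json', () => {
  generate('the quick brown fox');
});

function generate(input) {
  const words = input.toLowerCase().split(/\s+/);
  const output = [];
  let remaining = words.length;

  words.forEach((word, i) => {
    word2vec.nearest(word, (err, results) => {
      // results is a list of { word, distance }; take the closest neighbor.
      output[i] = results && results.length ? results[0].word : word;
      if (--remaining === 0) {
        console.log(output.join(' '));       // the "machine poetry" line
      }
    });
  });
}
```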

Conclusion

Though I was not able to carry out my original plan for the final project, I learned many useful things along the way. Through the stages of this project, I learned how to use and customize bash scripts and how to set up datasets for training, and I gained more familiarity with Python (a language I have only three months' worth of experience with). I also found that my project can be used to demonstrate how machines process our language and the relations within it. My end result may appear basic, but it does not fully show all the work I did along the way. That being said, this class has fostered my interest in machine learning, and I am eager to learn more.

Photo showing the working project

Interesting Projects & Articles:

Sunspring: AI-Written Screenplay

IML | Week14 Final Project – Quoey Wu

TrackLog Generator

Introduction

As I developed my final project, I made some adjustments to my original proposal. I had planned to make a simple way for people to keep daily logs, so they could achieve pretty visual effects with less effort. But later I decided to narrow the concept and target a more specific group: people who want to record the places they have been. Thus, TrackLog Generator can be a useful tool for them to create artful and unique visuals of their tracks. For the machine learning part, I mainly used Style Transfer and CycleGAN and trained them separately.


Final Project – Jarred van de Voort – What Would _ See?

Introduction

In an effort to stay true to the title of the class and create something interactive with techniques covered in class, I explored style transfer. The project started off as an extension of my emotion classifier / style transfer midterm, but I decided to remove the emotion classifier due to performance and instead focused on building an interactive experience around the style transfer. Throughout the class we studied artists and painters, so I thought it would be interesting to explore how some of those artists might have perceived the world around us. The name of the concept is also partially inspired by Aven's GitHub repo named "what_would_monet_see". The idea is to perform a style transfer on an input image and use the output as the backdrop of the canvas. A sketch is placed over it to represent what an artist might have started with before painting.

Concept

For the user interface, I wanted to provide several options for artist styles, as well as a way to select different images. This required quite a bit of logic to enable users to select a new image, reset any drawings, and select new styles. An example of the UI layout is shown below.

When the user first loads the page, they see several options for painting styles and an input image.

The user must select a painting style before starting to paint. Once a style is selected, a sketch of the image is projected and the user can begin to use their mouse to paint.

The user is also able to cycle through and load different painting styles derived from trained neural style transfer models.

Finally, the user is also able to select different input images to test painting styles. I also included some examples of paintings from several artists to explore how other artists may have painted a similar subject. 
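A rough p5.js sketch of the core reveal-painting interaction, assuming the styled output and the sketch overlay are already available as images (file paths and variable names are placeholders, not my project's actual code):

```javascript
// Rough sketch of the reveal-painting interaction: the style-transferred
// image sits underneath, and dragging the mouse "paints" it through the
// sketch overlay. `styledImg` and `sketchImg` are placeholder names.
let styledImg, sketchImg;

function preload() {
  styledImg = loadImage('output/styled.jpg');   // style transfer result
  sketchImg = loadImage('assets/sketch.jpg');   // sketch of the input image
}

function setup() {
  createCanvas(styledImg.width, styledImg.height);
  image(sketchImg, 0, 0, width, height);        // start from the blank sketch
}

function mouseDragged() {
  const r = 20;                                 // brush radius
  // Copy a small patch of the styled image onto the canvas under the cursor.
  copy(styledImg, mouseX - r, mouseY - r, 2 * r, 2 * r,
       mouseX - r, mouseY - r, 2 * r, 2 * r);
}
```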

Here’s how Francis Picabia may have painted The Starry Night

or how Picasso might have painted The Great Wave

Challenges & Considerations

Since neural style transfer is a relatively recent technique, there are few pretrained models available. I used the neural style training process we established in class to train several models. I also used a technique to dynamically load models in order to keep the computational footprint small while exploring the page. The biggest challenge for displaying and inferencing images is the limitation of client resources: client-side neural style transfer is limited to roughly 300,000 pixels (around 500×600), so a limit on the height and width of images is enforced to prevent overload.
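Sketched below is one way the lazy loading and pixel cap could be combined, assuming the ml5.js StyleTransfer API and placeholder model paths (a sketch of the idea, not my exact implementation):

```javascript
// Sketch of lazy model loading plus the client-side pixel cap described above.
// Model folder paths are placeholders for converted fast style transfer checkpoints.
const MAX_PIXELS = 300000;
const loadedModels = {};                          // cache so each style loads only once

function getStyle(name, onReady) {
  if (loadedModels[name]) return onReady(loadedModels[name]);
  const style = ml5.styleTransfer(`models/${name}`, () => {
    loadedModels[name] = style;
    onReady(style);
  });
}

// Downscale an image element onto a canvas so width * height stays under the cap.
function fitToLimit(imgEl) {
  const scale = Math.min(1, Math.sqrt(MAX_PIXELS / (imgEl.width * imgEl.height)));
  const canvas = document.createElement('canvas');
  canvas.width = Math.floor(imgEl.width * scale);
  canvas.height = Math.floor(imgEl.height * scale);
  canvas.getContext('2d').drawImage(imgEl, 0, 0, canvas.width, canvas.height);
  return canvas;
}

function stylize(name, imgEl, done) {
  getStyle(name, style => {
    style.transfer(fitToLimit(imgEl), (err, result) => done(err, result)); // result.src is a data URL
  });
}
```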

Future Work

In the future, I’d like to implement several features. The first is the ability for users to upload their own images and see how artists may have painted them; to do this, a binarization algorithm would need to be developed to create the sketch effect. I’d also like to add a way to train new painting styles. While training could take hours, implementing some kind of queue for it would enable far more painting styles. The last feature would be another layer of interactivity using MobileNet, so that users can paint with their hands instead of a cursor. I imagine something like this would work well as an art installation at a museum.