2MUCH // Interactive Machine Learning final by Konrad Krawczyk

initial exploration + new idea

The initial idea was to explore the precious 1 TB of user data that survived the MySpace server purge. The dataset, listed publicly on the Internet Archive, consists largely of music tracks uploaded by amateur and professional musicians on the network. The data is not annotated with tags other than author and track name; all that’s left are the raw mp3 files. This is why at first I felt apprehensive about simply using the mp3s to generate music: the samples are so diverse and vast that even the most comprehensive model would likely not return anything meaningful.

Another idea popped into my mind a few days later. I wanted to make an app that enables users to autotune their voices into pre-specified, pop-sounding, catchy melodies. I realised this would be a sizeable undertaking; my goal, however, was to build at least a minimal use case for an app like this, with both a back end and a front end working.

data analysis and extraction

After having looked at the MySpace data, I was somewhat daunted by its ambiguity and scale. I decided it would not be feasible to simply train a GAN on the mp3 data. Instead, I looked for a more focused dataset. I found a useful tool for crawling the web for MIDI files (import.io), which enabled me to bulk download MIDI files for over 400 of the most famous pop songs from the freemidi.org database. After analysing these files, however, it turned out that all of them contained multiple tracks (up to 20) and used different instruments, including atonal beats. What I wanted instead was a dataset of pure melodies, from which I could generate my own. I still have the freemidi.org data on my computer, however.
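To see what the crawled files actually contained, a quick inspection script was enough. Below is a minimal sketch of that kind of check, assuming the pretty_midi library and a hypothetical folder of downloaded files:

import glob
import pretty_midi

# Count tracks and list instruments in each crawled MIDI file
# ("freemidi/" is a hypothetical folder name for the downloaded files).
for path in glob.glob("freemidi/*.mid"):
    midi = pretty_midi.PrettyMIDI(path)
    instruments = [
        "drums" if inst.is_drum
        else pretty_midi.program_to_instrument_name(inst.program)
        for inst in midi.instruments
    ]
    print(path, len(midi.instruments), "tracks:", ", ".join(instruments))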

Therefore, I eventually decided to merge the two ideas and extract MIDI melodies from mp3 audio files in the MySpace database. I accomplished this for around 500 files using Melodia. This GitHub package helped me significantly in accomplishing the task: https://github.com/justinsalamon/audio_to_midi_melodia Initially I had problems installing all the necessary packages, as there seemed to be ongoing technical difficulties with my Colfax cloud account. Eventually, however, I got the sketch to work after manually adding the necessary plugins. In the future I would be more than happy to extract more melodies and build a more comprehensive database, but right now I cannot, because I am unable to submit background-running qsub tasks.
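For reference, this is roughly how a batch extraction can be run. The sketch below is an assumption-heavy reconstruction rather than my exact script: it calls the repo’s command-line script the way its README describes (input file, output file, bpm), and the folder names and the fixed 120 bpm value are placeholders rather than the settings I actually used.

import glob
import os
import subprocess

# Convert each MySpace mp3 into a monophonic melody MIDI file
# using the audio_to_midi_melodia script (Melodia vamp plugin required).
for mp3 in glob.glob("myspace_mp3/*.mp3"):   # hypothetical input folder
    out = os.path.join("melodies", os.path.basename(mp3).replace(".mp3", ".mid"))
    subprocess.run(
        ["python", "audio_to_midi_melodia.py", mp3, out, "120"],  # 120 bpm is a placeholder
        check=False,
    )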

training the model

After having collected the training data, I went on to search for implementations of generative music algorithms. Most of them seemed to use MIDI data to generate tonal melodies. The one that particularly interested me, due to its relatively simple and understandable implementation, was about classical piano music generation. In its original implementation, it used data from Pokemon soundtracks to train a TensorFlow-based long short-term memory (LSTM) recurrent neural network. The technical details of LSTMs are relatively difficult to grasp, but they seem to let the network recognise larger, time-dependent patterns in musical progressions, which is why they are the go-to tool for music generation. I trained the LSTM and changed some parameters, most notably the number of epochs (from 200 down to 5), simply to get a usable model faster.
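The sketch below is a compressed, hedged version of that kind of Keras LSTM setup, not my exact training script: parsed notes are mapped to integers, cut into fixed-length sequences, and fed to a two-layer LSTM. The sequence length and layer sizes here are illustrative.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.utils import to_categorical

SEQ_LEN = 50  # illustrative sequence length

def build_training_data(notes):
    # notes: list of note names parsed from the extracted MIDI melodies
    pitches = sorted(set(notes))
    note_to_int = {p: i for i, p in enumerate(pitches)}
    xs, ys = [], []
    for i in range(len(notes) - SEQ_LEN):
        xs.append([note_to_int[n] for n in notes[i:i + SEQ_LEN]])
        ys.append(note_to_int[notes[i + SEQ_LEN]])
    x = np.reshape(xs, (len(xs), SEQ_LEN, 1)) / float(len(pitches))
    y = to_categorical(ys, num_classes=len(pitches))
    return x, y, pitches

def build_model(n_pitches):
    model = Sequential([
        LSTM(256, input_shape=(SEQ_LEN, 1), return_sequences=True),
        Dropout(0.3),
        LSTM(256),
        Dense(n_pitches, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
    return model

# x, y, pitches = build_training_data(all_notes)
# model = build_model(len(pitches))
# model.fit(x, y, epochs=5, batch_size=64)   # 5 epochs instead of the original 200
# model.save("model.h5")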

After obtaining the model file, I included it in a new GitHub repo for my Flask-based back end. I used code from the aforementioned GitHub repo to generate new samples. The original script generated 500 new notes; I brought this down to 10 so the wait time on the web would not be awfully long. The initial results varied in some good and some bad ways. What seems like a bug is that the notes are sometimes repetitive: one time I got B#-3 ten times in a row. Still, this is MIDI output that I could use with an external API.
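Generation works by seeding the network with a sequence and repeatedly predicting the next note. Below is a minimal sketch of the 10-note version wrapped in a Flask route; the endpoint name, the pitches.json file and the random seeding are my assumptions, not necessarily how the original repo does it.

import json
import numpy as np
from flask import Flask, jsonify
from tensorflow.keras.models import load_model

SEQ_LEN = 50  # must match the sequence length used at training time

app = Flask(__name__)
model = load_model("model.h5")
with open("pitches.json") as f:   # hypothetical file with the pitch list saved at training time
    pitches = json.load(f)
int_to_note = dict(enumerate(pitches))

@app.route("/generate")           # hypothetical endpoint name
def generate():
    # seed the network with a random sequence of pitch indices
    seed = list(np.random.randint(0, len(pitches), size=SEQ_LEN))
    output = []
    for _ in range(10):           # 10 notes instead of the original 500
        x = np.reshape(seed, (1, SEQ_LEN, 1)) / float(len(pitches))
        prediction = model.predict(x, verbose=0)
        index = int(np.argmax(prediction))
        output.append(int_to_note[index])
        seed = seed[1:] + [index]
    return jsonify(output)

Greedy argmax decoding, as above, may also partly explain the repetition I observed; sampling from the predicted distribution would likely give more varied notes.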

The entirety of data processing code for the Flask back-end can be found in this repo under data_processing.py: https://github.com/krawc/2much-rest

building the front end  

A relatively large chunk of the app logic had to be delegated to the React front end. The app was meant to coordinate:

1. parsing audio from the video input,

2. sending the wav file to the Sonic API for autotuning,

3. getting the generated melody notes from the Flask API,

4. reading the output and playing it back to the user.

Most of the issues I encountered happened at stages 2 and 4. It turned out that data had to be sent to the Sonic API in a specific format, namely form-data, and that it had to be loaded asynchronously (it takes around three separate requests to the Sonic API to actually get the autotuned output file). Later on, I also had to work on syncing video and audio, which unfortunately still has to be fixed, because the API trims the audio files, making their durations incongruent with those of the videos.
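Since the rest of the code in this post is Python, here is the same three-step flow sketched in Python rather than in the React front end’s fetch calls. The endpoint paths, parameter names and base URL below are placeholders, not the Sonic API’s actual ones; the point is only the shape of the flow: upload as form-data, start a processing job, then poll for the result.

import time
import requests

ACCESS_ID = "YOUR_ACCESS_ID"                  # placeholder credential
BASE = "https://api.example-autotune.com"     # placeholder base URL, not the real Sonic API host

# 1) upload the recorded wav as multipart form-data
with open("voice.wav", "rb") as f:
    upload = requests.post(f"{BASE}/file/upload",
                           data={"access_id": ACCESS_ID},
                           files={"file": f}).json()

# 2) request autotuning towards the generated melody
job = requests.post(f"{BASE}/process/autotune",
                    data={"access_id": ACCESS_ID,
                          "input_file": upload["file_id"],
                          "melody": "B3,D4,F#4"}).json()   # notes fetched from the Flask API

# 3) poll until the processed file is ready, then download it
while True:
    status = requests.get(f"{BASE}/status", params={"job_id": job["job_id"]}).json()
    if status["state"] == "done":
        audio = requests.get(status["result_url"]).content
        break
    time.sleep(1)

with open("autotuned.wav", "wb") as out:
    out.write(audio)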

However, I got the app to work and perform the minimal use case I intended it to have.

The front end code is available here: https://github.com/krawc/2much-front

video – just me reading a random tweet, turning it into a melody:

Final Project proposal BIRS // Konrad Krawczyk

For my final project, I would like to create a system that uses neural object detection on a camera feed to recognise certain objects and avoid them. The system would mimic the behaviour of certain herbivores, such as rabbits, impala or kudu, which can spot known predators and initiate a flight reaction. Flight is one of the three known survival strategies (fight, flight, freeze) in conflicts between animals of different species. Herbivores are not biologically equipped to fight predators, due to their chewing teeth and relatively small bodies. The necessity of running away from predators, however, means these animals have developed fast legs and instant recognition of certain triggers, such as the shapes and patterns of predators.

For my system, I would like to use a Raspberry Pi-based computer equipped with a webcam, recording video in real time. The camera feed will be forwarded to a Python sketch that recognises certain objects, such as predatory animals (tigers, lions, etc.), and drives servo motors to move in the opposite direction as fast as possible. The idea is largely prompted by a Medium article I found on this specific technology: https://heartbeat.fritz.ai/real-time-object-detection-on-raspberry-pi-using-opencv-dnn-98827255fa60
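As a rough illustration of what that Python sketch might look like, the snippet below uses OpenCV’s DNN module with the commonly used MobileNet-SSD Caffe model. Note that the stock model is trained on the 20 PASCAL VOC classes, which include cats and dogs but not lions or tigers, so those stand in as “predators” here; the model file names and the flee() stub are my assumptions.

import cv2

# MobileNet-SSD (Caffe) class labels for the standard VOC-trained model
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
           "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
           "train", "tvmonitor"]
PREDATORS = {"cat", "dog"}   # stand-ins; the stock model has no lions or tigers

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",   # assumed file names
                               "MobileNetSSD_deploy.caffemodel")

def flee():
    # Placeholder: drive the servos away from the detected object,
    # e.g. via RPi.GPIO or a motor HAT library.
    print("Predator detected: fleeing!")

cap = cv2.VideoCapture(0)    # Raspberry Pi webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        label = CLASSES[int(detections[0, 0, i, 1])]
        if confidence > 0.5 and label in PREDATORS:
            flee()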

In my project, I would like to follow this tutorial and experiment with its results. Perhaps it would be possible to adjust the system to recognise other objects and create a potential use case for an industry other than experimental robotics.

Locomotion: Duck // Bishka & Konrad

The idea Bishka and I worked on was to build a two-legged robot moving similarly to a duck.

The way ducks move can be well illustrated with this video:


 

In short, ducks move on their two flexible feet by putting their bodies into a bounce-like motion between the left and right side, while rotating their backs to the side opposite the stepping foot. This is how they keep their balance despite having only two feet. They are also helped by the relative lightness of their bodies, especially their inner bone structure.

We started working on the duck without having reflected on these observations. This is perhaps why our first attempt was less than successful. We tried attaching two legs to the heavy steel body of the Kittenbot. The legs, hastily made out of Lego bricks and paper, were too long and fragile, and the robot fell over every time we tried to balance it.

Later, we gave up and attached two extra support legs to the robot, ending in a ball bearing. This version could at least stand on its (four) feet, but could not make any moves, as the servo would get blocked on the spot. We realised there was a bigger problem with the mechanism itself.

We assembled one of the robots from the pre-ordered set of simple robots. That robot could stand on two feet and make the bouncing move with just one motor. Our robot, however, was different in the sense that it was supposed to be a bit shorter and wider, which requires a different kind of balance.

On the Web, Bishka found an image of a mechanism that we eventually decided to recreate. The key difference here was that the part attached to the servo would be a Reuleaux triangle with rounded edges.

We quickly used a laser cutter to make the parts, and then assembled the robot in a new arrangement.

The first problem we ran into was the leg movement. The legs made of wood got stuck immediately due to friction. The pieces of metal that we found were too tiny and frail to even attach to the triangular wheel. With great help from Prof. Cossovich, we assembled legs from bigger and sturdier pieces of wire, three washers and a nut. The legs finally kept their balance at least slightly, but the robot would still fall forward. To decrease friction, we added a ball bearing at the front, which allowed the robot to move forward.

One thing I would hope to accomplish with more time is definitely greater balance for the legs, so that the robot can actually walk solely on its own two feet. This could be accomplished by changing the structure a little, perhaps making it even lighter, and by adding extra power to the motors (both ran on the same two batteries).

The Sound of MySpace // Konrad Krawczyk

My final project idea is to recreate the sound of MySpace music using a generative network.

As it appears, almost the entire archive of music on MySpace has been lost. MySpace used to be the largest social network in the world, with 75 million visitors at its 2008 peak. After years of poor management, a dissatisfying user experience, and strong competition from Facebook, the popularity of MySpace plummeted, to the point of almost complete irrelevance. Up until this year, however, the company’s largest asset was its archive of music data. This year, 12 years’ worth of music was deleted as a result of a botched server migration.

The interface I intend to develop aims to reflect the legacy of MySpace in influencing currents in arts and culture. It would do so by distilling the essence of the MySpace sound from a specific point in its history. By compiling 450,000 songs into a dataset, one can hope to infer certain patterns in the music of that time, and perhaps create a generative algorithm. The model would not so much aim at “reconstructing” MySpace and saving it from extinction as it would be an experiment in inferring musical essence from seemingly disparate samples.

Midterm initial idea // Konrad Krawczyk

The initial idea for my midterm was to make an app that would give users information about their personality, based on a hypothetical model that predicts personality from facial expression. In the proposed interface, users would be able to take a picture of their face, have it analysed by a personality prediction algorithm, and then answer questions about how accurate they find the model from their own perspective.

I was interested in the topic due to recent revelations about how facial structure can be, and increasingly is, used by security forces to identify potential “suspects” or “troublemakers” in public spaces. Such attempts echo 19th-century efforts to map “good” and “bad” personality traits onto certain characteristics of the bone structure of the human head. Despite their scientific appeal, modern personality prediction algorithms are largely based on flawed and biased data.

In my project, I wanted to create an algorithm based on a reasonably accurate and scientifically validated dataset of perceived personality traits, and then compare its output with people’s actual reactions to their own personality “prophecies”. The project is about the difference between inner truth and outer perception at a time when complexity is increasingly automated and outsourced to models.