Week 11 – Final Project Proposal – Jenny

Introduction

For my final project, I will continue the work from my midterm project: further developing the body segmentation function and matching the segmentation output against the clothing image dataset I have.

Implementation

I plan to first work with BodyPix in p5 to get the body-part segmentation images I want. Then I would like to train my own Pix2Pix model using the segmentation maps I generate from the original database, and iterate on the resulting model until it behaves the way I want. In the end, I want to build an online interactive page, much like the Monet project we went through in class: the user can switch between different outfit input images and see the virtual outfit generated on their body.
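As a first concrete step for the training data, the common Pix2Pix training scripts expect each input and its target joined into one side-by-side image. Here is a minimal sketch of that pairing step (the filenames are placeholders for my own data):

```python
# A rough sketch of assembling one paired pix2pix training image:
# segmentation map on the left, clothed photo on the right.
# Filenames are placeholders.
from PIL import Image

SIZE = 256  # pix2pix is commonly trained on 256x256 images

segmentation = Image.open("segmentation_map.png").convert("RGB").resize((SIZE, SIZE))
clothing = Image.open("clothing_photo.png").convert("RGB").resize((SIZE, SIZE))

# The common pix2pix training scripts expect input and target side by side
pair = Image.new("RGB", (SIZE * 2, SIZE))
pair.paste(segmentation, (0, 0))
pair.paste(clothing, (SIZE, 0))
pair.save("train/pair_0001.png")
```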

Goal

The goal is to transfer the clothes in the input image onto the body region of my real-time video, or of a picture captured from that video. In this way, I can synthesize a virtual outfit on top of my body.

Resources

BodyPix: https://medium.com/tensorflow/introducing-bodypix-real-time-person-segmentation-in-the-browser-with-tensorflow-js-f1948126c2a0

Training Pix2Pix: https://medium.com/@dongphilyoo/how-to-train-pix2pix-model-and-generating-on-the-web-with-ml5-js-87d879fb4224

The Sound of MySpace // Konrad Krawczyk

My final project idea is to recreate the sound of MySpace music using a generative network.

As it appears, almost the entire archive of music on MySpace has been lost. MySpace used to be the largest social network in the world, with 75 million visitors at its 2008 peak. After years of poor management, a disappointing user experience, and strong competition from Facebook, its popularity plummeted to the point of almost complete irrelevance. The company’s largest remaining asset, however, was its archive of music data. This year, 12 years’ worth of music was deleted as a result of a botched server migration.

The interface I intend to develop aims to reflect MySpace’s legacy of influencing currents in art and culture. It would do so by distilling the essence of the MySpace sound from a specific point in its history. By compiling 450,000 songs into a dataset, one can hope to infer certain patterns in the music of that time, and perhaps build a generative algorithm. The model would not so much aim at “reconstructing” MySpace and saving it from extinction as it would be an experiment in inferring a musical essence from seemingly disparate samples.
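If the songs can be recovered as audio files, one plausible preprocessing step would be to convert each track into a mel-spectrogram for a generative model to train on. A minimal sketch with librosa (the file path is a placeholder):

```python
# A minimal sketch of turning one recovered track into a mel-spectrogram,
# a common input representation for generative audio models.
# The file path is a placeholder.
import librosa
import numpy as np

audio, sample_rate = librosa.load("myspace_song.mp3", sr=22050)

# 128 mel bands is a common choice for music modelling
mel = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log scale, easier to model

np.save("myspace_song_mel.npy", mel_db)
```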

Final Project Proposal: Repictionary

For my final project, my initial idea was to use a GAN to improve my first project, pokebuddy. After doing some research online, I came across a project that used a bird dataset with a GAN to achieve results like the ones pasted below. This steered my project ideas in the direction of text-to-image synthesis.

Idea 1: Pokecreator

Inspired by this, I really wanted to explore the possibility of generating new Pokemon based on some descriptions. However, the concept itself was baffling, let alone how I could go about achieving it. The next significant hurdle with this idea would be the dataset. I searched online and tried to source datasets with multiple images of the same Pokemon that I could use to train a model. I also explored expanding my current dataset of Pokemon descriptions to include more descriptions for each Pokemon. Another hurdle would be that the image quality of GANarated Pokemon would likely be very poor.

The obvious things to do here would be to remove the front-end webcam interface for giving buddies and the PoseNet portion, and replace them with a very simple interface that just gets the description from the user. The same Flask architecture would be used for the entire web app: the image would be generated on the backend and then sent to the front end for the user to admire.
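A rough sketch of what that backend route might look like, with the GAN call stubbed out since the model choice is still open:

```python
# A minimal sketch of the backend route described above; generate_image()
# is a placeholder for the eventual text-to-image GAN call.
import io

from flask import Flask, request, send_file
from PIL import Image

app = Flask(__name__)

def generate_image(description: str) -> Image.Image:
    # Stand-in for GAN inference (e.g. an AttnGAN forward pass).
    return Image.new("RGB", (256, 256), color="gray")

@app.route("/generate", methods=["POST"])
def generate():
    description = request.form.get("description", "")
    image = generate_image(description)
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    buffer.seek(0)
    return send_file(buffer, mimetype="image/png")

if __name__ == "__main__":
    app.run(debug=True)
```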

Idea 2: Repictionary

Mostly since I was unsure whether the idea above would work, I thought of using a GAN to create a reverse-Pictionary style of game. The inspiration came from two sources – one being the game of Pictionary itself, the other being Revictionary: a reverse dictionary, where users can type in a description and get the word matching it. The idea then would be to combine these two, using text-to-image synthesis. The game would be two-player: each player enters a description, which is then used to generate an image with a GAN. The other player has to guess the description of the GANarated image and receives a score based on how close to the answer they were.

An additional idea regarding Repictionary was a Human vs. Bot implementation. The bot would consist of an AI able to guess a description for a given image. This ‘AI’ would have a list of descriptions that it would provide to the game for the player to guess.

This project idea has many layers to it: the first is the GAN, which would generate images from text descriptions; the second is an NLP tool that would compare two descriptions and give them a similarity score; and if the AI bot works out, an added layer of image-to-text classification would be involved.
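For the similarity layer, here is a minimal sketch of how the scoring could work with Gensim and NLTK (the pretrained word2vec file is an assumption; any word2vec-format vectors would do):

```python
# A minimal sketch of the description-similarity scorer; the vector file
# is a placeholder for any pretrained word2vec-format embeddings.
# Requires: nltk.download("punkt")
from gensim.models import KeyedVectors
from nltk.tokenize import word_tokenize

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def similarity_score(guess: str, answer: str) -> float:
    """Score how close the guess is to the hidden description."""
    guess_words = [w for w in word_tokenize(guess.lower()) if w in vectors]
    answer_words = [w for w in word_tokenize(answer.lower()) if w in vectors]
    if not guess_words or not answer_words:
        return 0.0
    # n_similarity takes the cosine similarity of the averaged word vectors
    return float(vectors.n_similarity(guess_words, answer_words))
```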

Direction

It seems like Repictionary is the one I will be going for, possibly using AttnGAN. This PyTorch implementation of AttnGAN was the one used for the bird project described at the beginning. The plan would be to train it on the COCO dataset and other image datasets to see what kinds of outputs I can achieve. Just as a fun side project, I would also try running it on the Pokemon image dataset to see what comes out. I plan on using NLTK and Gensim for the natural-language part, and maybe add spaCy if necessary. The AI bot would only be touched upon if time permits, which I’m not sure it will.

Week 12: An.i.me – Final Project Proposal – Abdullah Zameek

For the final project, I wanted to experiment with generative art, since I felt there was no better time and place to try out a generative model first-hand. Throughout the semester, I kept using a recurring Pokemon theme in most of my projects because of how fond I am of the series, and I came across this article about machine-generated Pokemon.
This time, however, I wanted to do something a tad different, but along the same lines. So I decided to bring in another of my all-time favorite interests – Anime.

Idea

We’ve all heard of Snapchat and its filters, which apply various kinds of special effects to your face. But I think we could go one step further than that with the help of generative models such as GANs.
I came across this paper, which describes a GAN model for generating anime characters, and it proved to be a great source of inspiration for my project. What if a given human face could be translated across domains, from reality into an animated face? After a bit of reading, it turned out that exactly this application is doable with GAN models.

The project presentation is here.

Implementation

As I described in the presentation, I investigated two different models – pix2pix and CycleGAN. CycleGAN is the clear winner because it allows unpaired image-to-image translation. This is highly desirable because a given anime-character dataset is not going to come with corresponding “human” face pairs. Unpaired training allows for a great deal of flexibility in creating the model, since the anime character images and the human faces can be treated independently.
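To illustrate why unpaired training works, here is a minimal sketch of CycleGAN’s cycle-consistency loss in PyTorch, with the two function arguments standing in for the human→anime and anime→human generators:

```python
# A minimal sketch of CycleGAN's cycle-consistency loss: translating an
# image to the other domain and back should reproduce the original, which
# is what removes the need for paired human/anime examples.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(g_human2anime, g_anime2human,
                           real_human, real_anime, lam=10.0):
    reconstructed_human = g_anime2human(g_human2anime(real_human))
    reconstructed_anime = g_human2anime(g_anime2human(real_anime))
    return lam * (F.l1_loss(reconstructed_human, real_human) +
                  F.l1_loss(reconstructed_anime, real_anime))
```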
One of the key papers in cross-domain translation is this paper published by Facebook AI, which tackles the matter of Unsupervised Cross-Domain Image Generation.
Going forward, I haven’t homed in on a specific model yet, but there are some great CycleGAN-derived models out there, such as DRAGAN, PCGAN and, most notably, TwinGAN, which is derived from PCGAN.
As for the dataset, there are once again multiple options out there, and while I will make a decision within the next few days, the strong contenders are TwinGAN’s Getchu dataset and the popular Danbooru dataset.

I’m very much inclined to go with the Getchu dataset and the TwinGAN model because of the ease of access. However, the resulting model is not directly compatible with ml5.js or p5.js, so there will be a bit of interfacing to tackle there.

Goal

The final outcome can be thought of as a sort of “inference” engine: I’d input the image of a new human face and generate its corresponding animated face. Ultimately, by the end of this project, I want to get a better understanding of working with generative models, as well as make something that’s amusing.

iML Week 11: Final Project Proposal – Ivy Shi

Idea: 

My final project is a continuation of my midterm project on tattoo image generation using a GAN. After exploring this topic and training a model on face generation, I am interested in applying the same approach to my interest in creating tattoos. So far I have not found any similar project on the internet, so I am excited to be making something new. The idea is to create an interface that lets users generate tattoo images, ideally incorporating personalized features such as user input and style selection.
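As a rough sketch of the generation step behind such an interface, sampling from a trained GAN generator in PyTorch might look like this (the architecture below is a generic DCGAN-style stand-in, not my actual midterm model, and the checkpoint path is hypothetical):

```python
# A rough sketch of sampling new images from a trained GAN generator.
# The architecture is a generic DCGAN-style stand-in and the checkpoint
# path is hypothetical.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

generator = Generator()
# generator.load_state_dict(torch.load("tattoo_generator.pt"))  # hypothetical checkpoint
generator.eval()

with torch.no_grad():
    z = torch.randn(1, 100, 1, 1)   # one random latent code
    fake_tattoo = generator(z)      # (1, 3, 16, 16) image tensor in [-1, 1]
```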
