Week 06: Midterm Project Proposal — Crystal Liu

Initial Thought

My midterm project is an interactive virtual instrument. A trained model will recognize how different musical instruments are played from the user's pose. Once it identifies the instrument, it will play that instrument's sound, and a picture of the instrument will appear on the screen around the user.

For example, if the user mimes playing a guitar, the model will recognize the instrument as a guitar and automatically play a guitar sound. An image of a guitar will then appear on the screen, so the expected result is that the user looks as if they are really playing a guitar on screen.

Inspiration 

My inspiration is a project called Teachable Machine. The model can be trained in real time: the user makes several similar poses as the input for a class, with a maximum of three classes, and each pose class corresponds to an image or GIF. After the dataset is set up, whenever the user makes one of the three poses, the corresponding result appears.

For me the core idea is excellent, but the form of the output feels a little limited. There are also existing projects that connect motion with music or other sounds.

So I want to add audio as the output. The sounds of different musical instruments are artistic and familiar to people, so my final idea is to let users trigger an instrument's sound by acting as if they are playing that instrument.

Technology 

To achieve this, I need technology that can locate each part of the body, identify different poses, and classify them automatically and in real time. Based on what we have learned so far, I decided to use PoseNet for the body-tracking part.
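A minimal sketch of the tracking part, assuming the ml5.js wrapper around PoseNet together with a p5.js webcam capture (the variable names are my own placeholders):

```javascript
// Sketch: follow the right wrist with ml5.js PoseNet (assumed setup).
let video;
let poseNet;
let latestPose = null;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();

  // Load PoseNet and subscribe to pose updates.
  poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));
  poseNet.on('pose', (results) => {
    if (results.length > 0) {
      latestPose = results[0].pose; // keypoints of the first detected person
    }
  });
}

function draw() {
  image(video, 0, 0, width, height);
  if (latestPose) {
    // ml5 exposes named keypoints such as rightWrist, leftWrist, nose, ...
    const wrist = latestPose.rightWrist;
    fill(255, 0, 0);
    ellipse(wrist.x, wrist.y, 12, 12);
  }
}
```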

I plan to place a virtual button on the camera canvas so that users don't need to click the mouse to provide input, which makes the interaction more natural. To achieve this, I will define a coordinate range for the hand: when the user lifts a hand into that range, the model starts receiving input images and automatically stops three seconds later. The next time the user makes a similar pose, the model gives the corresponding output. KNN is a classic classification algorithm, so it can be used to classify the poses quickly and support real-time training.
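A rough sketch of how the hands-free trigger and the KNN step could fit together, assuming ml5's KNNClassifier and the PoseNet setup above; the trigger region, the three-second window, the class label, and playInstrumentSound() are my own placeholders:

```javascript
// Sketch: hands-free training trigger plus KNN classification of poses (assumed flow).
const knn = ml5.KNNClassifier();
const TRIGGER = { x: 0, y: 0, w: 100, h: 100 }; // "button" region in the top-left corner
let collecting = false;
let collectUntil = 0;
let currentLabel = 'guitar'; // placeholder class name

function poseToFeatures(pose) {
  // Flatten the keypoint coordinates into one feature vector.
  return pose.keypoints.flatMap((k) => [k.position.x, k.position.y]);
}

function handleNewPose(pose) {
  const wrist = pose.rightWrist;
  // Lifting a hand into the trigger region starts a 3-second collection window.
  if (!collecting &&
      wrist.x > TRIGGER.x && wrist.x < TRIGGER.x + TRIGGER.w &&
      wrist.y > TRIGGER.y && wrist.y < TRIGGER.y + TRIGGER.h) {
    collecting = true;
    collectUntil = millis() + 3000;
  }

  if (collecting) {
    if (millis() < collectUntil) {
      knn.addExample(poseToFeatures(pose), currentLabel); // collect training examples
    } else {
      collecting = false; // the window ends automatically
    }
  } else if (knn.getNumLabels() > 0) {
    // Otherwise classify the current pose and react to the result.
    knn.classify(poseToFeatures(pose), (err, result) => {
      if (!err) playInstrumentSound(result.label); // hypothetical helper that plays the sound
    });
  }
}
```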

Significance

My goal is to create a project that lets people interact with AI in an artistic and natural way: they can make the sound of a musical instrument without having a real physical one. It is also a way to combine advanced technology with everyday art, and it offers an interesting, accessible way for people to learn about and experience artificial intelligence in their daily lives.

Week 06: Midterm Project Proposal—Ziying Wang

For the midterm project, I plan to develop an interactive experimental art project: Dancing with a stranger.

Background:

Dancing with a stranger is an interactive experimental art project that requires two users to participate. Both users' limb movements will be detected: user A controls figure A's arms and figure B's legs, while user B controls figure A's legs and figure B's arms. The result is presented on screen as two glowing figures dancing against a dark, starry-night background. Ideally, the webcam will also detect the speed of the users' foot movements and switch among a set of songs that match different speeds.
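A minimal sketch of how the cross-control mapping and the foot-speed estimate could work, assuming PoseNet returns one pose per user (how the two poses are separated is sketched in the Reference section below); the figure objects would then be handed to a hypothetical drawing routine for the glowing bodies:

```javascript
// Sketch: swap limb control between the two detected users (assumed mapping).
// poseA and poseB are PoseNet poses for user A and user B.
function buildFigures(poseA, poseB) {
  const arms = (p) => ({
    leftShoulder: p.leftShoulder, leftElbow: p.leftElbow, leftWrist: p.leftWrist,
    rightShoulder: p.rightShoulder, rightElbow: p.rightElbow, rightWrist: p.rightWrist,
  });
  const legs = (p) => ({
    leftHip: p.leftHip, leftKnee: p.leftKnee, leftAnkle: p.leftAnkle,
    rightHip: p.rightHip, rightKnee: p.rightKnee, rightAnkle: p.rightAnkle,
  });

  // Figure A gets user A's arms and user B's legs; figure B is the reverse.
  const figureA = { ...arms(poseA), ...legs(poseB) };
  const figureB = { ...arms(poseB), ...legs(poseA) };
  return [figureA, figureB]; // pass to a (hypothetical) drawFigure() to render
}

// Rough foot-speed estimate: how far the ankles move between frames.
let prevAnkles = null;
function footSpeed(pose) {
  const ankles = [pose.leftAnkle, pose.rightAnkle];
  let speed = 0;
  if (prevAnkles) {
    speed = ankles.reduce((sum, a, i) =>
      sum + dist(a.x, a.y, prevAnkles[i].x, prevAnkles[i].y), 0);
  }
  prevAnkles = ankles.map((a) => ({ x: a.x, y: a.y }));
  return speed; // pick the song whose tempo matches this value
}
```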

The following photoshopped image illustrates the project. The white dots are used here to demonstrate the joints that will be detected on the users, and they will not appear in the final result.

The yellow figure's arms and the pink figure's legs will follow the movements of one user; the yellow figure's legs and the pink figure's arms will follow the movements of the other user.

Motivation:

The idea for this project was inspired by Sam Smith and Normani's song “Dancing With a Stranger”. When I listen to it, the song paints a picture of two people who are not familiar with each other but are bonded by the music, creating a tacit, mutual understanding. In most two-player games and interactive designs, each player is asked to take full control of his or her character; I decided to pursue a different approach. What if a person can only control half of a character, and only by cooperating with another person can they create a beautiful dance together? That's how the idea came to mind.

Reference:

The technical inspiration was “Body, Movement, Language: AI Sketches With Bill T. Jones”, which is built on the PoseNet model that detects the user's limbs and joints. I intend to use the detected coordinates to create the structure of the two glowing figures. Since we need to distinguish between the two users, I intend to restrict the area of detection, which means only movements within a specific range will be detected and presented on screen. Also, the positions of the two glowing figures can't be stationary, since I need to create a dance: I intend to randomize the movement of the head together with the torso, and the limbs will be attached to the floating torso.
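One possible way to realize the restricted detection area is to assign each detected pose to a user based on which half of the frame it occupies; a small sketch, using the nose position as the reference point (my own assumption, not specified above):

```javascript
// Sketch: assign detected poses to the two users by screen half
// (assumption: user A stands in the left half of the frame, user B in the right).
function splitPoses(results, canvasWidth) {
  let poseA = null;
  let poseB = null;
  for (const { pose } of results) {
    if (pose.nose.x < canvasWidth / 2) {
      poseA = poseA || pose; // first person found on the left
    } else {
      poseB = poseB || pose; // first person found on the right
    }
  }
  return { poseA, poseB };
}
```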

Week 06 Assignment: Document Midterm Concept – Eszter Vigh

My idea is to use a speech-to-text converter to train a model to recognize my sentences (regardless of tone, etc.)… I would most likely ask my friends to read some random sentences too, both to build up that dataset and to see how well it recognizes different words regardless of accent, tone, etc.
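A minimal in-browser sketch of the speech-to-text step, using the Web Speech API as one possible converter (the project could just as well use another service); matchQuote() is a hypothetical next step, sketched after the following paragraph:

```javascript
// Sketch: continuous speech-to-text in the browser (assumes a browser that
// supports the Web Speech API, e.g. Chrome's webkitSpeechRecognition).
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognizer = new SpeechRecognition();
recognizer.lang = 'en-US';
recognizer.continuous = true;
recognizer.interimResults = false;

recognizer.onresult = (event) => {
  // Take the latest final transcript and hand it to the quote matcher.
  const last = event.results[event.results.length - 1];
  const transcript = last[0].transcript.trim();
  console.log('Heard:', transcript);
  // matchQuote(transcript); // hypothetical helper, see the matching sketch below
};

recognizer.start();
```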

Then I would collect a lot of movies and TV shows (I think just in English, to make it easier on myself) and have the spoken line bring up clips of those specific quotes. So imagine saying “How you doin?” and a clip of Joey from Friends comes up and says the iconic line back to you. This requires text matching across the two sets of text.
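The matching itself could start as simple word-overlap scoring between the recognized sentence and each line of the collected scripts; a rough sketch (the quote list and clip filenames are invented placeholders):

```javascript
// Sketch: match a recognized sentence to the closest known quote by word overlap.
const quotes = [
  { text: 'How you doin?', clip: 'friends_joey.mp4' }, // placeholder entries
  { text: 'Ya like jazz?', clip: 'bee_movie.mp4' },
];

const normalize = (s) =>
  s.toLowerCase().replace(/[^a-z0-9 ]/g, '').split(/\s+/).filter(Boolean);

function matchQuote(spoken) {
  const spokenWords = new Set(normalize(spoken));
  let best = null;
  let bestScore = 0;
  for (const q of quotes) {
    const words = normalize(q.text);
    const overlap = words.filter((w) => spokenWords.has(w)).length;
    const score = overlap / words.length; // fraction of the quote's words heard
    if (score > bestScore) {
      best = q;
      bestScore = score;
    }
  }
  // A threshold lets the speaker be "a couple of words off" without
  // unrelated sentences triggering a clip.
  return bestScore >= 0.6 ? best : null;
}
```

A successful match would then play back the corresponding clip, which is the “says the iconic line back to you” behavior described above.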

The whole idea came about when I thought about my terrible habit of turning GIFs of my favorite movies into WeChat stickers. But sometimes, see, I can only remember the line. I'm so bad with names and faces that I'll just type in the quote and hope it's iconic enough for the scene to come up, but if I'm a couple of words off, then I'm completely stuck.

This project is personal to me because I love movies. I have seen so many that at times I get into arguments with my friends about the details (usually it's about when something happened on the Marvel timeline or something). Imagine having this tool right there to help prove to all your friends that you are right!

The main challenge I see with this project is getting hold of the videos. Netflix doesn't allow full films to be downloaded, but there are ways around this. The scripts of most films are available online… (most notably the Bee Movie… because that gem… I mean, who wouldn't want the entire script). For TV shows, it may be more difficult.

In terms of references, when I met with Aven we talked about similar projects, including a sample that takes a random squiggle and matches it to landforms within its trained dataset. Imagine that happening with audio being matched to a video.

Same logic, just a slightly different implementation.