MLNI Final Project (Shenshen Lei)

Concept
For my final project, I intended to create an immersive viewing experience.

Machine Learning Model: PoseNet

Process:
The final project is an extension of my midterm project idea. The prototype I created for the midterm detects the position of the center of the triangle formed by the two eyes and the nose, and a large background picture moves when the user's face moves. I received many suggestions from my classmates and instructors, so I decided to make a 3D version. There are two major parts in the project: the 3D model and the position-detecting system.
For the model part, p5.js only accepts .obj and .stl files, so I made some drafts in MAYA and tried them with the loadModel() function. The room model works. I also tried some other kinds of figures, such as stones and guns. I found that the model size has a limitation: no more than 1 MB. Another problem p5.js has when loading a 3D model is the render material. The surface of the loaded model can only be defined by the built-in functions in p5.js, such as normalMaterial(), which means the rendered colors made in MAYA cannot be shown on the screen. In another class, one of my classmates used Three.js to create an interactive model online, which can render the model with its surface materials and colors. I am considering using a different tool in the future.
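A minimal sketch of the model-loading setup described above, assuming an exported file named room.obj (the file name and canvas size are placeholders):

```javascript
let roomModel;

function preload() {
  // p5.js can load .obj or .stl geometry; "room.obj" is a placeholder file name
  roomModel = loadModel('room.obj', true); // true = normalize the model to fit the canvas
}

function setup() {
  createCanvas(600, 600, WEBGL); // 3D models need the WEBGL renderer
}

function draw() {
  background(200);
  // Materials exported from MAYA are not applied; p5.js only offers its own
  // material helpers such as normalMaterial(), ambientMaterial(), or texture()
  normalMaterial();
  model(roomModel);
}
```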


For the direction-controlling part, the instructor suggested that I use vectors. I looked up the createVector() function and learned some properties of vectors online. The horizontal rotation is controlled by the ratio between the x positions of the two eyes and their center point, and the vertical movement is controlled by the distance between the center point of the eyes and the nose. PoseNet is sensitive and jitters when detecting figures, so it is important to smooth the changes. I used the lerp() function to send interpolated values to the screen. After a few tries, I settled on a factor of 0.1, which best smooths the motion while keeping the accuracy and rotation speed reasonable.
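Roughly, the control logic looks like the sketch below. It assumes the ml5.js PoseNet wrapper; the scaling constants and the model.obj file name are placeholders rather than my exact values, but the lerp() factor of 0.1 is the one I settled on:

```javascript
let video, poseNet, headModel;
let targetX = 0, targetY = 0; // rotation targets derived from the face
let rotX = 0, rotY = 0;       // smoothed rotation actually applied

function preload() {
  headModel = loadModel('model.obj', true); // placeholder file name
}

function setup() {
  createCanvas(600, 600, WEBGL);
  video = createCapture(VIDEO);
  video.hide();
  poseNet = ml5.poseNet(video);
  poseNet.on('pose', gotPoses);
}

function gotPoses(results) {
  if (results.length === 0) return;
  const p = results[0].pose;
  const eyeCenterX = (p.leftEye.x + p.rightEye.x) / 2;
  const eyeCenterY = (p.leftEye.y + p.rightEye.y) / 2;
  // Dividing by the eye distance makes the ratio roughly about head angle,
  // not head position in the frame
  const eyeDist = dist(p.leftEye.x, p.leftEye.y, p.rightEye.x, p.rightEye.y);
  // Horizontal: offset of the eye midpoint from the frame centre (placeholder scale)
  targetY = ((eyeCenterX - video.width / 2) / eyeDist) * 0.8;
  // Vertical: distance between the eye midpoint and the nose (placeholder offsets)
  targetX = ((p.nose.y - eyeCenterY) / eyeDist - 0.6) * 2.0;
}

function draw() {
  background(220);
  // lerp() with a factor of 0.1 smooths PoseNet's jittery keypoints
  rotX = lerp(rotX, targetX, 0.1);
  rotY = lerp(rotY, targetY, 0.1);
  rotateX(rotX);
  rotateY(rotY);
  normalMaterial();
  model(headModel);
}
```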

Initially I displayed a model of a room, but the problem is that the screen is limited: when the user moves their focus to change the direction of the model, they cannot view the full screen. Also, when the model rotates in the same direction as the face, the viewing experience is not very immersive.
To make the pointing direction clearer, I changed the model to a gun, so the user can control it to point in any direction.

Output:

Reflection:
Since my project is very experimental, it is not easy to display. But doing this project made me think about the design of user-directed interfaces and how they could be used in other areas. I have many ideas for improving my project or putting it to use, for example in games. I hope to keep working on this project.
Thanks to all the friends who helped me with the project and those who gave great suggestions. I have learned a lot this semester.

MLNI Final Project Proposal (Shenshen Lei)

For my final project, I will keep working on my midterm project: the immersive museum viewer.

There are four main parts in my final project:

First of all, I will change the method of calculating the coordinates. Instead of using a single variable, I will employ a vector that indicates the angle of the face, based on the ratio of the distances between the two eyes and the nose.

Secondly, I will add a distance sensor. The mechanism is to use PoseNet to measure the distance between the user's left and right eyes: when the user moves closer to the camera, the distance increases and the background image zooms in, and vice versa (a rough sketch of this mechanism follows this list).

Thirdly, I will add an audio guide and background music while the user navigates the scene.

Finally, there will be some interaction points. For example, when the user points at an object, its introduction will pop up.
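A rough sketch of the eye-distance zoom from the second part, assuming the ml5.js PoseNet wrapper; the museum.jpg file name and the pixel range mapped to the zoom factor are guesses that would need calibration per camera:

```javascript
let video, poseNet, museumImg;
let eyeDist = 100;            // latest distance between the two eyes, in pixels
let zoom = 1, targetZoom = 1;

function preload() {
  museumImg = loadImage('museum.jpg'); // placeholder background image
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  poseNet = ml5.poseNet(video);
  poseNet.on('pose', results => {
    if (results.length > 0) {
      const p = results[0].pose;
      eyeDist = dist(p.leftEye.x, p.leftEye.y, p.rightEye.x, p.rightEye.y);
    }
  });
}

function draw() {
  background(0);
  // Closer face -> larger eye distance -> larger zoom factor.
  // The 40-160 px input range is an assumption, clamped by map()'s last argument.
  targetZoom = map(eyeDist, 40, 160, 1, 2, true);
  zoom = lerp(zoom, targetZoom, 0.1); // same smoothing idea as the face tracking
  imageMode(CENTER);
  image(museumImg, width / 2, height / 2, museumImg.width * zoom, museumImg.height * zoom);
}
```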

The aim of my project is to create a completely immersive, keyboard-free experience for users.

MLNI week9 KNN training (Shenshen Lei)

This week we were supposed to create a real-time KNN model. I used the sample code called KNN-image-Classification (link at the end).

I trained the model to recognize whether the user is wearing glasses.

The program takes screenshots during the training process. In the getResult function, the machine compares the current video frame with those in the database. I also added a filter so that when the machine detects the result "not wearing glasses", the screen becomes blurred.

The obstacle I faced while editing the model was that I could not find the parameters of the video in the original code, so I could not apply the filter directly to the HTML-linked video. To show the result continuously, I added an image() call and the filter() function in the getResult part.
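The relevant pieces, sketched with the ml5.js featureExtractor and KNNClassifier rather than the exact structure of the sample repo; the labels and key bindings below are my own placeholders:

```javascript
let video, knn, featureExtractor;
let resultLabel = '';

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();
  featureExtractor = ml5.featureExtractor('MobileNet');
  knn = ml5.KNNClassifier();
}

// Training: store a feature vector of the current frame under a label
function addExample(label) {
  const features = featureExtractor.infer(video);
  knn.addExample(features, label);
}

// Classification: compare the current frame against the stored examples
function classify() {
  if (knn.getNumLabels() === 0) return; // nothing trained yet
  const features = featureExtractor.infer(video);
  knn.classify(features, (err, result) => {
    if (!err) resultLabel = result.label;
    classify(); // keep classifying continuously
  });
}

function draw() {
  // Draw the video onto the canvas so filter() has pixels to work on;
  // the original HTML <video> element itself cannot be blurred this way
  image(video, 0, 0, width, height);
  if (resultLabel === 'no-glasses') {
    filter(BLUR, 3); // canvas-wide blur; noticeably slow on large canvases
  }
}

function keyPressed() {
  if (key === 'g') addExample('glasses');
  if (key === 'n') addExample('no-glasses');
  if (key === ' ') classify();
}
```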

There are still some problems in the project. For example, when the image is blurred, the computation becomes very slow.

Github link of the model: 

https://github.com/cvalenzuela/ml5_KNN_example

MLNI midterm Project (Shenshen Lei)

Shenshen Lei (sl6899)

For the midterm project, I intended to create a museum viewer imitating a 3D effect. After researching current online museums, I found that there are mostly two types of online museum viewing interaction: the first is clicking the mouse, which is discontinuous, and the second is using the gravity sensor on a mobile phone, which requires the user to turn around constantly.

In my project, I want the user to have an immersive and continuous experience of viewing an online museum without constant clicking and dragging. To mimic an immersive experience, I used PoseNet (PoseNet reacts faster than BodyPix) to track the positions of the user's eyes and nose. These three points form a triangle that moves when the user faces different directions, and the coordinates of the background picture follow the position of the triangle. One thing that bothered me in this process is that the user's movement caught by the camera is mirrored, so I had to flip the x-coordinate by subtracting the detected x from the width of the canvas. I calculated the coordinates of the centroid of the face triangle so I could use one pair of variables rather than six.
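A condensed sketch of that tracking logic, assuming the ml5.js PoseNet wrapper (the circle size for the gaze indicator is arbitrary):

```javascript
let video, poseNet;
let centerX = 0, centerY = 0; // centroid of the eyes-nose triangle

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  poseNet = ml5.poseNet(video);
  poseNet.on('pose', results => {
    if (results.length === 0) return;
    const p = results[0].pose;
    // The webcam image is mirrored, so flip each x by subtracting it from the width
    const noseX  = width - p.nose.x;
    const leftX  = width - p.leftEye.x;
    const rightX = width - p.rightEye.x;
    // Centroid: average of the three vertices, giving one (x, y) pair instead of six values
    centerX = (noseX + leftX + rightX) / 3;
    centerY = (p.nose.y + p.leftEye.y + p.rightEye.y) / 3;
  });
}

function draw() {
  background(0);
  // The background picture would be offset according to (centerX, centerY); omitted here
  noFill();
  stroke(255);
  circle(centerX, centerY, 60); // gaze indicator circle without fill
}
```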

To show the position of the detected face, I drew a circle without fill color. I also added a function so that when the gaze circle moves onto an exhibit, its name is displayed inside the circle. (I initially believed that the circle could indicate the viewing process to users, but it sometimes seems confusing.)

My current project works as shown in the following gif:

After presenting my project to the class and guests, I received many valuable suggestions. For the future improvement of my project, I will make some changes.


Firstly, following the professor's suggestion, I will use the ratios of the sides of the triangle rather than calculating the centroid, because using a vector as the variable reduces how far the user has to move. That will also smooth the movement of the background picture, and it may improve the immersive experience by shifting from changing coordinates to changing viewing angles, as suggested by my classmates. Another change is a zoom-in function: when the user moves closer to the camera, the picture becomes larger. The method is to measure the distance between the user's two eyes as detected by the computer camera. Finally, for the introduction part, I will add a signal before the item introductions pop up. For example, when the camera detects the user's hand, the introduction will show after a few seconds. I was inspired by Brandon's project to employ voice commands, but the accuracy of voice commands would need to be tested. There are more thoughts on improving the user experience of my project. I am considering continuing to work on the project and showing a more complete version as the final project.


Thanks to all the teachers, assistants, and classmates who helped me or gave advice on my project.

Week 5 MLNI

Shenshen Lei

sl6899

This week I created an interactive game named Having a Quiet Night. Initially, several mosquitoes fly around the screen. The camera and the mic detect the user's two hands and display them on the screen. When a mosquito is close enough and the sound of a clap is loud enough, the mosquito disappears. If the user catches all six mosquitoes before the timer finishes, a "sleep well" picture appears; otherwise, a sad face appears on the screen.

I edited the code based on the interactive BodyPix sample code, but there are still some logic problems in the program. I am confused about how to call class functions in the draw() function when the objects are stored in an array (a small sketch of this pattern follows below). I will keep working on the program, try to fix the bugs, and make it more playful in future versions.
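For reference, the class-plus-array pattern I am trying to get right looks roughly like this. The hand position is a stand-in variable rather than the actual BodyPix output, and the distance and loudness thresholds are guesses:

```javascript
// Mosquito objects live in an array; draw() loops over the array and
// calls each object's methods every frame
let mosquitoes = [];
let mic;
let handX = 0, handY = 0; // stand-in for the hand position from BodyPix

class Mosquito {
  constructor() {
    this.x = random(width);
    this.y = random(height);
    this.alive = true;
  }
  move() {
    this.x += random(-5, 5);
    this.y += random(-5, 5);
  }
  show() {
    if (this.alive) circle(this.x, this.y, 10);
  }
  checkCaught(hx, hy, loudness) {
    // Caught when the hand is close enough AND the clap is loud enough
    if (this.alive && dist(hx, hy, this.x, this.y) < 50 && loudness > 0.3) {
      this.alive = false;
    }
  }
}

function setup() {
  createCanvas(640, 480);
  mic = new p5.AudioIn(); // requires the p5.sound library
  mic.start();
  for (let i = 0; i < 6; i++) mosquitoes.push(new Mosquito());
}

function draw() {
  background(20);
  const loudness = mic.getLevel(); // microphone volume, 0.0-1.0
  for (const m of mosquitoes) {
    m.move();
    m.checkCaught(handX, handY, loudness);
    m.show();
  }
}
```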

One problem with the BodyPix game is that the camera feed and the picture on the screen display with a time lag. Also, the shape of the hand is not accurate enough; sometimes the camera recognizes things that have a similar shape to a hand and projects them on the screen.