Project: Control
Partner: Cherry Cai (rc3470)
Code: google folder
Concept
In today's society, many of us are subject to many sources of control, physical or psychological. We often find ourselves acting against our original wishes, “manipulated” by the outside world: the media, work, the people around us, and so on. Our project therefore explores the potential of simulating this process with machine learning and a puppet, a classic representation of the concept of “control”.
Process
During our experiments we developed three versions of this project, iterating mainly on the interface.
Version 1
The first interface was built around pure puppet interaction. A digital marionette that looks similar to the physical one sits in the middle of the screen. With the help of PoseNet, we detect the key point positions of the actual puppet, and the virtual one moves along with them. Only the arms and legs respond to the detection: a limb is raised when the corresponding part of the physical puppet is raised, and lowered when it is lowered. The virtual marionette also “attempts” to get out of control by deliberately throwing off any part that is manipulated to an extreme extent. This is how it works visually.
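Under the hood, the detection follows the standard ml5.js PoseNet pattern. The sketch below is a simplified illustration rather than our exact code; the confidence threshold of 0.3 and the raised-arm check are stand-ins for the logic described above.

let video, poseNet, pose;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  // load PoseNet and listen for pose estimates on the webcam feed
  poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));
  poseNet.on('pose', (results) => {
    if (results.length > 0) pose = results[0].pose;
  });
}

function draw() {
  background(255);
  if (pose) {
    // e.g. treat the left arm as "raised" when the wrist is above the shoulder
    const wrist = pose.leftWrist;       // each keypoint has x, y, confidence
    const shoulder = pose.leftShoulder;
    if (wrist.confidence > 0.3 && wrist.y < shoulder.y) {
      // raise the virtual marionette's left arm here
    }
  }
}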
The marionette was bought from Taobao and is originally controlled with a pair of bars connected by four strings to the wrists and ankles. This, however, made the marionette too difficult to manipulate, so we set up a “stage” that fixes the puppet to a board, abandoned the control bars, and operated the puppet by pulling the strings directly.
Version 2
After receiving feedback that a similar figure on screen reacting to the physical one could be distracting, we moved on to our second version. In this version, we take a photo of the user standing in front of the webcam and process it with BodyPix, which segments the person's image into head, torso, arms, hands, and legs. The person's figure then replaces the puppet image on the screen. Since this requires capturing a full-body snapshot of someone standing some distance from the computer, we display a guide gesture on the screen for people to follow; once the gesture is matched, a photo is taken and the rest of the process continues. All other interactions stay the same, with the virtual figure fixed to the center of the screen. Here is a demo of this version working with people.
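Concretely, the trigger can be implemented as a simple keypoint comparison. The sketch below is our illustration rather than the exact shipped code; it assumes an “arms up” guide gesture, the ml5.js BodyPix wrapper, and a bodypix object created earlier with ml5.bodyPix().

let snapshot, segmentation;

// called with each new PoseNet pose while no snapshot has been taken yet
function checkGesture(pose) {
  const conf = 0.3; // minimum keypoint confidence (our choice)
  const armsUp =
    pose.leftWrist.confidence > conf &&
    pose.rightWrist.confidence > conf &&
    pose.leftWrist.y < pose.leftShoulder.y &&
    pose.rightWrist.y < pose.rightShoulder.y;

  if (armsUp && !snapshot) {
    snapshot = video.get();                 // freeze the current webcam frame
    bodypix.segmentWithParts(snapshot, (err, result) => {
      if (err) return console.error(err);
      segmentation = result;                // per-pixel part labels used later
    });
  }
}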
However, PoseNet works poorly with the puppet because its hands and feet are too abstract. The confidence scores of the key points were so low that there was always huge uncertainty, and the visual movement was never smooth enough. We then moved on to develop our third version.
Version 3
For this last version, we continue to use BodyPix to analyze the snapshot, and we improve the image segmentation with pixel iteration. The figure on the screen now moves completely freely with anyone standing in front of the webcam. The interaction of segments being thrown off in an attempt to fight back fits this version less well, so we excluded it from the final version. Here is the demo.
Difficulties
Segments Rotation
The first difficulty we encountered was matching the movement of the limbs to real-world movement. Our solution was to calculate the respective angles and rotate the arms and legs around the shoulders and hips. We obtained the angles with p5's atan2() function, which computes the angle from the relative x and y positions of two key points.
// angle (in radians) of each arm, measured from the shoulder to the wrist
radRightArm = - atan2(rightWristY - rightShoulderY, rightShoulderX - rightWristX);
radLeftArm = atan2(leftWristY - leftShoulderY, leftWristX - leftShoulderX);
// angle of each leg, measured from the knee to the ankle and offset so the legs move within a certain range
radRightLeg = PI/2 - atan2(rightAnkleY - rightKneeY, rightKneeX - rightAnkleX);
radLeftLeg = atan2(leftAnkleY - leftKneeY, leftAnkleX - leftKneeX) - PI/2;
In Versions 1 and 2, the interface involves limbs both following the user and being thrown off. While a limb follows the user's movement, it rotates around the shoulder or hip, positioned with translate(). During a throw, however, it rotates around its own center point. We therefore need two coordinate systems, one for the interaction before the throw and one after. The translation center for the throw also needs to be calculated: it should be the midpoint of the segment, which we compute by rotating the segment's resting midpoint by its current angle. For example:
translate(width/2 - 100 * sin(radLeftArm/2), height/2 - 200 * cos(radLeftArm/2));
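To make the switch between the two coordinate systems concrete, here is a hedged sketch of how the left arm could be drawn; the names shoulderOffsetX, shoulderOffsetY, imgLeftArm, and throwAngle are ours for illustration, and the offsets 100 and 200 mirror the example line above.

function drawLeftArm(isThrown) {
  push();
  if (!isThrown) {
    // coordinate system 1: origin at the figure's left shoulder joint,
    // so the arm pivots around the shoulder while following the user
    translate(width/2 + shoulderOffsetX, height/2 + shoulderOffsetY);
    rotate(radLeftArm);
    image(imgLeftArm, 0, 0);
  } else {
    // coordinate system 2: origin at the segment's midpoint, computed
    // from the angle it had when it was thrown off
    translate(width/2 - 100 * sin(radLeftArm/2), height/2 - 200 * cos(radLeftArm/2));
    rotate(throwAngle);
    imageMode(CENTER);
    image(imgLeftArm, 0, 0);
  }
  pop();
}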
Segments Subtraction
BodyPix detects not only the front surfaces of the body but also the back ones. To pick out only the segments we need, we construct an array of 0s and 1s with length 24, the total number of parts BodyPix can detect. A 1 at a given index means we need that segment and should copy its image, while a 0 means the segment can be ignored, as sketched below.
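A hedged sketch of what this inclusion mask looks like; the concrete indices below are placeholders, and the real ones follow BodyPix's part-id table.

// one flag per BodyPix part id (24 in total): 1 = copy this segment, 0 = ignore it
const includeParts = new Array(24).fill(0);
const wantedIds = [0, 1, 12];                 // illustrative ids only (e.g. face halves and front torso)
wantedIds.forEach(id => { includeParts[id] = 1; });

// later, while walking the segmentation result:
// if (includeParts[partId] === 1) { copy this pixel into its segment image }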
To extract the person's image segment by segment in Version 2, we searched for the minimum and maximum x and y values that bound each segment as a rectangle. We then copied the image inside that bounding box into a predefined square, scaled it, and placed it at the corresponding position.
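A hedged sketch of that Version 2 approach; partAt(), wantedPartId, and imgSegment are placeholder names for the lookup into BodyPix's per-pixel labels and the pre-made segment image.

// scan the part labels once to find the segment's bounding box
let minX = snapshot.width, minY = snapshot.height, maxX = 0, maxY = 0;
for (let y = 0; y < snapshot.height; y++) {
  for (let x = 0; x < snapshot.width; x++) {
    if (partAt(x, y) === wantedPartId) {      // this pixel belongs to the segment
      minX = min(minX, x); maxX = max(maxX, x);
      minY = min(minY, y); maxY = max(maxY, y);
    }
  }
}
// copy the bounding rectangle into the predefined square image, scaling to fit
imgSegment.copy(snapshot,
                minX, minY, maxX - minX, maxY - minY,
                0, 0, imgSegment.width, imgSegment.height);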
However, a bounding box includes a lot of space that does not belong to the body at all. With help from Moon, we used pixel iteration to copy only the pixels inside the rectangle that BodyPix classifies as body parts.
// data[index] >= 0 means this pixel is classified as one of the 24 parts that BodyPix can detect
if (data[index] >= 0) {
imgSegments[i].pixels[segIndex*4+0] = snapshot.pixels[index*4+0];
imgSegments[i].pixels[segIndex*4+1] = snapshot.pixels[index*4+1];
imgSegments[i].pixels[segIndex*4+2] = snapshot.pixels[index*4+2];
imgSegments[i].pixels[segIndex*4+3] = 255;
}
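For context, here is a hedged reconstruction of the loop around that snippet; the bounding variables minX, maxX, minY, maxY and the exact loop order are our assumptions, while data holds one BodyPix part label per snapshot pixel (negative for background).

imgSegments[i].loadPixels();
snapshot.loadPixels();
let segIndex = 0;                             // write position inside the segment image
for (let y = minY; y <= maxY; y++) {
  for (let x = minX; x <= maxX; x++) {
    const index = y * snapshot.width + x;     // this pixel's position in the snapshot
    // ...the copy shown above runs here when data[index] >= 0...
    segIndex++;
  }
}
imgSegments[i].updatePixels();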
Future Development
During the final show we found one thing that particularly needs improvement. At the current stage the snapshot is triggered so easily that a picture is taken even when a passer-by walks past. We therefore plan to add a delay with on-screen countdown numbers before taking the picture, so users know to hold the gesture long enough for BodyPix to recognize them well. Moreover, the process can currently only be restarted by reloading the webpage, which interrupts the experience, so we would like to add one more gesture recognition that restarts the project and takes another snapshot.
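A minimal sketch of the planned countdown, assuming a three-second delay; gestureDetected() and takeSnapshot() stand in for the existing trigger and capture logic.

let countdownStart = null;

function draw() {
  // ...existing drawing code...
  if (gestureDetected() && countdownStart === null) {
    countdownStart = millis();                // start the 3-second countdown
  }
  if (countdownStart !== null) {
    const remaining = 3 - floor((millis() - countdownStart) / 1000);
    if (remaining > 0) {
      textSize(120);
      textAlign(CENTER, CENTER);
      text(remaining, width/2, height/2);     // show 3, 2, 1 on screen
    } else {
      takeSnapshot();                         // only now freeze the frame
      countdownStart = null;
    }
  }
}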
Finally, there is huge room for improvement in the interface. In the last version the whole body moves with the user horizontally and vertically, but when a user tilts his or her head to a certain angle, for example, the image does not follow. It would be essential to find a way to calculate this degree of incline with PoseNet, which I currently do not know how to achieve, in order to optimize the experience.
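One possible direction, sketched but untested: estimate the head tilt from the angle of the line between the two eye keypoints that PoseNet already provides, reusing the same atan2() approach as the limbs.

function headTiltAngle(pose) {
  const conf = 0.3;                           // minimum keypoint confidence (our choice)
  if (pose.leftEye.confidence > conf && pose.rightEye.confidence > conf) {
    // angle of the eye-to-eye line; roughly 0 when the head is upright
    return atan2(pose.leftEye.y - pose.rightEye.y,
                 pose.leftEye.x - pose.rightEye.x);
  }
  return 0;                                   // fall back to an upright head
}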