Since people on the street wear clothes of many different colors, I plan to try face tracking. It only works on grayscale images, so I need the jit.rgb2luma object to convert the RGB image to monochrome. This also means the image needs good contrast for the tracking to succeed.
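To make the grayscale step more concrete, here is a rough Python sketch of what an RGB-to-luma conversion does and how contrast can be measured afterwards. This is not the Max patch itself. The Rec. 601 weights are an assumption (jit.rgb2luma may use different coefficients), and the function names and the Michelson contrast measure are my own illustrative choices.

```python
# Assumed Rec. 601 luma weights; jit.rgb2luma may use different coefficients.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_luma(pixel):
    """Collapse one (r, g, b) pixel into a single brightness value."""
    r, g, b = pixel
    wr, wg, wb = LUMA_WEIGHTS
    return wr * r + wg * g + wb * b


def to_grayscale(image):
    """Convert an image given as rows of (r, g, b) tuples into rows of luma values."""
    return [[rgb_to_luma(p) for p in row] for row in image]


def michelson_contrast(gray):
    """Hypothetical contrast measure: (max - min) / (max + min) over all pixels."""
    values = [v for row in gray for v in row]
    lo, hi = min(values), max(values)
    if hi + lo == 0:
        return 0.0  # an all-black image has no measurable contrast
    return (hi - lo) / (hi + lo)


# A dimly lit face region versus a well-lit one (made-up pixel values).
dim_face = [[(100, 100, 100), (120, 120, 120)]]
lit_face = [[(20, 20, 20), (230, 230, 230)]]

dim_contrast = michelson_contrast(to_grayscale(dim_face))
lit_contrast = michelson_contrast(to_grayscale(lit_face))
```

With these made-up values the well-lit region has a much higher contrast than the dim one, which matches the idea that a grayscale tracker has more to work with when the face is clearly lit.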
As you can see from the video, the tracking is not very stable. In the second half of the video, when I improved the lighting on the face, the tracking became much more stable.
For my purposes, though, this method has an obvious shortcoming: once the face is blocked, the tracking is lost.
So I tested it by putting on a mask and sunglasses. Under the same brightness conditions, a mask easily interferes with face tracking. Sunglasses interfere relatively little, but at a greater distance (about 60 cm from the webcam) the tracking starts to drop out.
(The small rectangle on the right shows whether the face is being tracked. When it is white, the face is being tracked; otherwise the tracking has been lost.)
Next, I will try motion tracking, using a Kinect to complete the clock chasing of people and letting the computer process the distance data.