Saturday, June 27, 2015

Having trouble determining a 2D feature matrix structure to feed into a machine learning algorithm

I am training an emotion recognition system that detects emotions through facial movement. As a result, I have formed a 4-dimensional matrix that I am trying to reduce to 2 dimensions.

Features that make up the 4D matrix:
Number of videos (each video will be assigned an emotion label)
Number of frames per video
Direction of the facial landmarks per frame
Speed of the facial landmarks per frame

The important features that I am trying to train with (each pair in the array below holds both):
The left value is the speed (the hypotenuse of the displacement of the same facial landmark between frames)
The right value is the direction (the arctan of the x and y displacement of the same facial landmark between frames)
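For reference, the two features described above could be computed roughly as follows. This is only a sketch under assumptions: the function name `motion_features` and the input layout `(n_frames, n_landmarks, 2)` of raw (x, y) landmark positions are hypothetical, and `np.arctan2` is swapped in for a plain arctan so that a zero x displacement does not cause a division by zero. The first frame has no previous frame, so its features are set to zero, as in the array shown in this question.

```python
import numpy as np

def motion_features(landmarks):
    """Hypothetical helper: per-frame speed and direction of each landmark.

    landmarks: array of shape (n_frames, n_landmarks, 2) holding (x, y).
    Returns an array of the same shape where [..., 0] is the speed and
    [..., 1] is the direction in radians.
    """
    landmarks = np.asarray(landmarks, dtype=float)

    # Displacement of each landmark relative to the previous frame.
    # The first frame has no predecessor, so its displacement stays zero.
    deltas = np.zeros_like(landmarks)
    deltas[1:] = np.diff(landmarks, axis=0)
    dx, dy = deltas[..., 0], deltas[..., 1]

    speed = np.hypot(dx, dy)        # hypotenuse of the displacement
    direction = np.arctan2(dy, dx)  # angle of the displacement

    return np.stack([speed, direction], axis=-1)
```

Stacking the result for every video would give an array laid out as (videos, frames, landmarks, 2).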

The 4D matrix that I am stuck with and am trying to reduce to 2D:

>> main.shape  
(60, 17, 68, 2)  
# 60 videos, 17 frames per video, 68 facial landmarks, 2 features (speed and direction)
>> main  
array([[[[  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         ..., 
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ]],

        [[  1.        ,   1.        ],
         [  1.41421356,   0.78539816],
         [  1.41421356,   0.78539816],
         ..., 
         [  3.        ,   1.        ],
         [  3.        ,   1.        ],
         [  3.        ,   1.        ]],

        [[  0.        ,   0.        ],
         [ -1.41421356,   0.78539816],
         [ -1.41421356,   0.78539816],
         ..., 
         [  2.        ,   1.        ],
         [  3.        ,   1.        ],
         [  3.        ,   1.        ]],

        ..., 
        [[  1.        ,   1.        ],
         [  1.41421356,  -0.78539816],
         [  1.41421356,  -0.78539816],
         ..., 
         [ -1.41421356,   0.78539816],
         [  1.        ,   1.        ],
         [ -1.41421356,   0.78539816]],

        [[  2.23606798,  -0.46364761],
         [  2.82842712,  -0.78539816],
         [  2.23606798,  -0.46364761],
         ..., 
         [  1.        ,   0.        ],
         [  0.        ,   0.        ],
         [  1.        ,   1.        ]],

        [[ -1.41421356,  -0.78539816],
         [ -2.23606798,  -0.46364761],
         [ -2.23606798,  -0.46364761],
         ..., 
         [  1.41421356,  -0.78539816],
         [  1.41421356,  -0.78539816],
         [  2.23606798,  -1.10714872]]],


       [[[  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         ..., 
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ]],

        [[  2.        ,   1.        ],
         [  2.23606798,  -1.10714872],
         [  1.41421356,  -0.78539816],
         ..., 
         [ -2.        ,  -0.        ],
         [ -1.        ,  -0.        ],
         [ -1.41421356,  -0.78539816]],

        [[  2.        ,   1.        ],
         [ -2.23606798,   1.10714872],
         [ -1.41421356,   0.78539816],
         ..., 
         [  1.        ,   1.        ],
         [ -1.        ,  -0.        ],
         [ -1.        ,  -0.        ]],

        ..., 
        [[ -2.        ,  -0.        ],
         [ -3.        ,  -0.        ],
         [ -4.12310563,  -0.24497866],
         ..., 
         [  0.        ,   0.        ],
         [ -1.        ,  -0.        ],
         [ -2.23606798,   1.10714872]],

        [[ -2.23606798,   1.10714872],
         [ -1.41421356,   0.78539816],
         [ -2.23606798,   1.10714872],
         ..., 
         [ -2.23606798,   0.46364761],
         [ -1.41421356,   0.78539816],
         [ -1.41421356,   0.78539816]],

        [[  2.        ,   1.        ],
         [  1.41421356,   0.78539816],
         [  2.82842712,   0.78539816],
         ..., 
         [  1.        ,   1.        ],
         [  1.        ,   1.        ],
         [ -2.23606798,  -1.10714872]]],


       [[[  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         ..., 
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ]],

        [[  1.        ,   1.        ],
         [  0.        ,   0.        ],
         [  1.        ,   1.        ],
         ..., 
         [ -3.        ,  -0.        ],
         [ -2.        ,  -0.        ],
         [  0.        ,   0.        ]],

        [[  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         ..., 
         [  1.41421356,   0.78539816],
         [  1.        ,   0.        ],
         [  0.        ,   0.        ]],

        ..., 
        [[  1.        ,   0.        ],
         [  1.        ,   1.        ],
         [  0.        ,   0.        ],
         ..., 
         [  2.        ,   1.        ],
         [  3.        ,   1.        ],
         [  3.        ,   1.        ]],

        [[ -7.28010989,   1.29249667],
         [ -7.28010989,   1.29249667],
         [ -8.54400375,   1.21202566],
         ..., 
         [-22.02271555,   1.52537305],
         [ 22.09072203,  -1.48013644],
         [ 22.36067977,  -1.39094283]],

        [[  1.        ,   0.        ],
         [  1.41421356,  -0.78539816],
         [  1.        ,   0.        ],
         ..., 
         [ -1.41421356,  -0.78539816],
         [  1.        ,   1.        ],
         [  1.41421356,   0.78539816]]],


       ..., 
       [[[  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         ..., 
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ]],

        [[  5.38516481,   0.38050638],
         [  5.09901951,   0.19739556],
         [  4.47213595,  -0.46364761],
         ..., 
         [ -1.41421356,   0.78539816],
         [ -2.82842712,   0.78539816],
         [ -5.        ,   0.64350111]],

        [[ -6.32455532,   0.32175055],
         [ -6.08276253,  -0.16514868],
         [ -5.65685425,  -0.78539816],
         ..., 
         [  3.60555128,   0.98279372],
         [  5.        ,   0.92729522],
         [  5.65685425,   0.78539816]],

        ..., 
        [[ -3.16227766,  -0.32175055],
         [ -3.60555128,  -0.98279372],
         [  5.        ,   1.        ],
         ..., 
         [ 12.08304597,   1.14416883],
         [ 13.15294644,   1.418147  ],
         [ 14.31782106,   1.35970299]],

        [[  3.60555128,  -0.5880026 ],
         [  4.47213595,  -1.10714872],
         [  6.        ,   1.        ],
         ..., 
         [-20.39607805,   1.37340077],
         [-21.02379604,   1.52321322],
         [-22.09072203,   1.48013644]],

        [[  1.        ,   1.        ],
         [ -1.41421356,   0.78539816],
         [  1.        ,   1.        ],
         ..., 
         [  4.12310563,   1.32581766],
         [  4.        ,   1.        ],
         [  4.12310563,   1.32581766]]],


       [[[  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         ..., 
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ]],

        [[  0.        ,   0.        ],
         [  1.        ,   1.        ],
         [ -2.23606798,   1.10714872],
         ..., 
         [ -3.16227766,   0.32175055],
         [  1.        ,   1.        ],
         [  1.41421356,  -0.78539816]],

        [[  1.        ,   1.        ],
         [  1.        ,   1.        ],
         [  1.        ,   1.        ],
         ..., 
         [  3.        ,   1.        ],
         [  2.        ,   1.        ],
         [ -1.41421356,   0.78539816]],

        ..., 
        [[  5.38516481,  -1.19028995],
         [  4.47213595,  -1.10714872],
         [  4.12310563,  -1.32581766],
         ..., 
         [  2.23606798,  -0.46364761],
         [  1.        ,   1.        ],
         [ -1.        ,  -0.        ]],

        [[ -5.38516481,   1.19028995],
         [ -4.12310563,   1.32581766],
         [ -3.16227766,   1.24904577],
         ..., 
         [  0.        ,   0.        ],
         [  1.        ,   0.        ],
         [  1.41421356,  -0.78539816]],

        [[  8.06225775,   1.44644133],
         [ -7.07106781,  -1.42889927],
         [  6.        ,   1.        ],
         ..., 
         [ -3.16227766,  -0.32175055],
         [ -3.16227766,  -0.32175055],
         [ -3.16227766,  -0.32175055]]],


       [[[  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         ..., 
         [  0.        ,   0.        ],
         [  0.        ,   0.        ],
         [  0.        ,   0.        ]],

        [[ -2.23606798,   0.46364761],
         [ -1.41421356,   0.78539816],
         [ -2.23606798,   0.46364761],
         ..., 
         [  1.        ,   0.        ],
         [  1.        ,   0.        ],
         [  1.        ,   1.        ]],

        [[ -2.23606798,  -0.46364761],
         [ -1.41421356,  -0.78539816],
         [  2.        ,   1.        ],
         ..., 
         [  0.        ,   0.        ],
         [  1.        ,   0.        ],
         [  1.        ,   0.        ]],

        ..., 
        [[  1.        ,   0.        ],
         [  1.        ,   1.        ],
         [ -2.23606798,  -1.10714872],
         ..., 
         [ 19.02629759,   1.51821327],
         [ 19.        ,   1.        ],
         [-19.10497317,  -1.46591939]],

        [[  3.60555128,   0.98279372],
         [  3.60555128,   0.5880026 ],
         [  5.        ,   0.64350111],
         ..., 
         [  7.28010989,  -1.29249667],
         [  7.61577311,  -1.16590454],
         [  8.06225775,  -1.05165021]],

        [[ -7.28010989,   1.29249667],
         [ -5.        ,   0.92729522],
         [ -5.83095189,   0.5404195 ],
         ..., 
         [ 20.09975124,   1.47112767],
         [ 21.02379604,   1.52321322],
         [-20.22374842,  -1.42190638]]]])

The direction and speed features are quite valuable (the most important features), as they represent the movement of each facial landmark per frame, and I am trying to get the machine learning algorithm to train based on them.

I tried to reshape three of the dimensions into one long vector (just mushed speed, direction, and frame all together) to form a 2D matrix. I fed it into the sklearn SVM function and it produced rather low accuracy. I expected this, as I figured there is no way the ML algorithm would recognize the difference between the features in the giant single vector; it would assume that everything in the vector is the same kind of feature.

The 2D matrix I was forced to make in order to feed the sklearn SVM, by forcing the speed, direction, and frames of each video all into one vector, and which gave low accuracy:

>> main
array([[  0.        ,   0.        ,   0.        , ...,  -0.78539816,
          2.23606798,  -1.10714872],
       [  0.        ,   0.        ,   0.        , ...,   1.        ,
         -2.23606798,  -1.10714872],
       [  0.        ,   0.        ,   0.        , ...,   1.        ,
          1.41421356,   0.78539816],
       ..., 
       [  0.        ,   0.        ,   0.        , ...,   1.        ,
          4.12310563,   1.32581766],
       [  0.        ,   0.        ,   0.        , ...,  -0.32175055,
         -3.16227766,  -0.32175055],
       [  0.        ,   0.        ,   0.        , ...,   1.52321322,
        -20.22374842,  -1.42190638]])
>> main.shape
(60, 2312)
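The flattening described above is presumably equivalent to a single `reshape` call. The sketch below uses random stand-in data rather than the real features, and only shows where each (frame, landmark, feature) entry ends up in the 2312-wide row under NumPy's default row-major order.

```python
import numpy as np

# Stand-in data with the same shape as the real 4D array (values are random).
rng = np.random.default_rng(0)
main = rng.normal(size=(60, 17, 68, 2))

# Keep the video axis and merge frames, landmarks and features into one row.
flat = main.reshape(main.shape[0], -1)
print(flat.shape)  # (60, 2312), since 17 * 68 * 2 == 2312

# With the default row-major order, the entry for frame f, landmark l and
# feature k of a video lands in column f * (68 * 2) + l * 2 + k.
f, l, k = 5, 10, 1
column = f * (68 * 2) + l * 2 + k
print(flat[3, column] == main[3, f, l, k])  # True
```

So the ordering is still recoverable from the column index, but the classifier just sees 2312 undifferentiated columns per video.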

I want to preserve the speed and direction features, but have to represent them in a 2D matrix that takes into account the frames in the video.

The emotion label will be attached to each of the 17 frames in a video (so basically, each 17-frame video will be labeled with one emotion).
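To make the labeling idea concrete, one possible 2D layout is one row per frame, with the video's label repeated for each of its 17 frames. This is a sketch of that reading only, not a claim that it improves accuracy: the data is random stand-in data, `video_labels` and the choice of 6 classes are hypothetical, and the rows no longer carry any information about frame order.

```python
import numpy as np

n_videos, n_frames, n_landmarks, n_feats = 60, 17, 68, 2

# Stand-in data and labels (both hypothetical, for illustration only).
rng = np.random.default_rng(0)
main = rng.normal(size=(n_videos, n_frames, n_landmarks, n_feats))
video_labels = rng.integers(0, 6, size=n_videos)  # assumes 6 emotion classes

# One row per frame: the 68 landmarks x 2 features become 136 columns.
X_frames = main.reshape(n_videos * n_frames, n_landmarks * n_feats)

# Repeat each video's label once per frame so rows and labels line up.
y_frames = np.repeat(video_labels, n_frames)

# Video id for every row, useful for keeping all frames of one video
# on the same side of a train/test split.
groups = np.repeat(np.arange(n_videos), n_frames)

print(X_frames.shape, y_frames.shape)  # (1020, 136) (1020,)
```

Row `v * 17 + f` of `X_frames` holds frame `f` of video `v`, so the speed and direction columns keep a fixed meaning across rows.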

Is there a smart way to reshape and reduce the 4D matrix that would accomplish this?
