Gesture Recognition (Part 2) - Static Gesture Recognition


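My department has started planning a series of public technical articles for our WeChat official account. The series covers hands-on projects, model optimization, on-device deployment and deep-learning fundamentals. I am responsible for the articles on human-body image techniques, and I will occasionally publish application and code-walkthrough articles as well.
The articles are published on the official account and on this blog at the same time, so this is also a bit of promotion. If you are interested, you can scan the QR code to join our group.
(Apologies for any poorly written parts, and please point out any mistakes you find.)
(If any image or code quoted in this article infringes your rights, please contact me and I will remove it.)

WeChat official account: AI炼丹术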

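Update (2022.4.30): Many people have added me recently. I also noticed that the original author EricLee's code now includes many additional components and modules, because his project is not only a gesture recognizer. I have been busy lately and cannot walk each reader through the code individually. I have therefore uploaded the code I organized earlier, together with my static gesture recognition part, as a standalone GitHub repository that you can use freely: https://github.com/ooooxianyu/simple-handpose-recognition. It should run directly if you follow the README. (If convenient, please give the GitHub repo a star 😀)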


I. Project Overview

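In the previous part we implemented two basic functions: hand detection and hand keypoint detection. However, the computer only obtains the position of the hand and the positions of its keypoints in the image. It does not know what these positions mean. In this part we analyze that information further to recognize some simple static gestures.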

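Static gestures include counting with the fingers, making a fist, a pinky promise, and so on. The mapping is intuitive:

- Raising only the index finger means the number one.
- Raising the index and middle fingers means two, or "yeah" (the V sign).
- Curling all five fingers into the palm makes a fist.
- Raising the pinky means a pinky promise.

For everyday gestures like these, the gesture can be inferred directly from the positions of the finger joints.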

This article uses the hand keypoint positions to recognize these simple static gestures in two ways: hand-crafted rules and a neural network.

II. Technical Overview & Code Implementation

1. Simple rule-based gesture recognition

Here I quote directly part of the simple rule-based gesture recognition code that the author EricLee provides in his source code.

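For each finger, the author computes the angle between the lower and upper segments of the finger. Thresholds on that angle decide whether the finger is bent or straight. The gesture is then recognized from the bent/straight pattern of the five fingers. For example, the index finger straight and all others bent is classified as "one". The index and middle fingers straight and the others bent is classified as "yeah". The remaining gestures follow the same principle.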
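The rule set can also be read as a lookup table from a five-finger bent/straight pattern to a gesture name. The sketch below is my own restatement, not the author's code. It transcribes the nine patterns from the h_gesture rules quoted later, with "S" for straight and "B" for bent, in the finger order thumb, index, middle, ring, pinky. It is a simplification: h_gesture's "one" rule only requires a thumb angle above 5 degrees, not a fully bent thumb. GESTURE_PATTERNS and match_gesture are hypothetical names.

```python
# Finger order: thumb, index, middle, ring, pinky ("S" = straight, "B" = bent).
# Patterns transcribed from the author's h_gesture rules (simplified for "one").
GESTURE_PATTERNS = {
    ("B", "B", "B", "B", "B"): "fist",
    ("S", "S", "S", "S", "S"): "five",
    ("S", "S", "B", "B", "B"): "gun",
    ("S", "S", "B", "B", "S"): "love",
    ("B", "S", "B", "B", "B"): "one",
    ("S", "B", "B", "B", "S"): "six",
    ("B", "S", "S", "S", "B"): "three",
    ("S", "B", "B", "B", "B"): "thumbUp",
    ("B", "S", "S", "B", "B"): "yeah",
}

def match_gesture(states):
    """Return the gesture name for a 5-element finger-state sequence, or None."""
    return GESTURE_PATTERNS.get(tuple(states))

print(match_gesture(["B", "S", "S", "B", "B"]))  # yeah
print(match_gesture(["S", "S", "S", "B", "S"]))  # None (pattern not defined)
```

A table keeps all gesture definitions in one place. Adding a gesture becomes a one-line change, and the dictionary makes it obvious that no two gestures share a pattern.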

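Because static gesture datasets are limited, this example defines static gestures with 2D angle constraints on the hand skeleton. The original post illustrates the principle with a figure: compute the angle between vectors AC and DE. If that angle is greater than one threshold (an empirical value), the finger counts as bent. If it is smaller than another threshold (also empirical), the finger counts as straight.

In the code below, each finger uses two vectors:

- The first runs from a middle joint of the finger back to the wrist (keypoint 0).
- The second runs from the fingertip back to the joint before it.

For the index finger these are keypoint 6 to keypoint 0, and keypoint 8 to keypoint 7.

Note: this static gesture recognition method has clear limitations. If resources allow, training a model for static gesture recognition is the better option.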
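Below is a minimal sketch of the per-finger test. It assumes the same vector pair as the code below and the author's thresholds of 65 degrees (bent) and 49 degrees (straight). The function names, and the "?" state for angles between the two thresholds, are my own; the author's code simply leaves such fingers unmatched. Unlike the original, this version clamps the cosine rather than catching exceptions.

```python
import math

def vector_angle_deg(v1, v2):
    """Angle in degrees between two 2-D vectors, or None if either has zero length."""
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        return None
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp so floating-point rounding cannot push acos outside its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def finger_state(wrist, joint, prev_joint, tip, bent_thr=65.0, straight_thr=49.0):
    """Return 'B' (bent), 'S' (straight) or '?' (between the two thresholds).

    For the index finger the arguments would be keypoints 0, 6, 7 and 8.
    """
    v1 = (wrist[0] - joint[0], wrist[1] - joint[1])        # joint -> wrist
    v2 = (prev_joint[0] - tip[0], prev_joint[1] - tip[1])  # tip -> previous joint
    angle = vector_angle_deg(v1, v2)
    if angle is None:
        return "?"
    if angle > bent_thr:
        return "B"
    if angle < straight_thr:
        return "S"
    return "?"

# Hypothetical pixel coordinates (the image y axis points down).
print(finger_state((0, 0), (0, -60), (0, -80), (0, -100)))  # S: finger points straight up
print(finger_state((0, 0), (0, -60), (0, -80), (20, -80)))  # B: last segment turned 90 degrees
```

When a finger is straight, both vectors point back toward the wrist and the angle is near 0. When the last segment curls, the second vector rotates away from the first and the angle grows.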

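The author implemented only nine gestures: fist, five, gun, love, one, six, three, thumbUp and yeah. According to the author, gestures are defined with 2D constraints because no large static gesture dataset is available, so only some gestures are covered.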

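The drawbacks are also obvious. The rules are simple and the thresholds are set by hand. A small keypoint detection error, or an unsuitable threshold, can therefore easily cause misrecognition.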
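To make this concrete, here is a small worked example with made-up coordinates. It uses the thumb thresholds from the author's h_gesture: straight below 49 degrees (thr_angle_s) and bent above 53 degrees (thr_angle_thumb). Only the thumb tip moves, by about 4 pixels. thumb_state is a hypothetical helper.

```python
import math

THUMB_STRAIGHT_THR = 49.0  # thr_angle_s in h_gesture
THUMB_BENT_THR = 53.0      # thr_angle_thumb in h_gesture

def angle_deg(v1, v2):
    """Angle in degrees between two non-zero 2-D vectors."""
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def thumb_state(p0, p2, p3, p4):
    """Thumb angle from keypoints 0, 2, 3, 4 (same vector pair as hand_angle)."""
    a = angle_deg((p0[0] - p2[0], p0[1] - p2[1]), (p3[0] - p4[0], p3[1] - p4[1]))
    if a < THUMB_STRAIGHT_THR:
        return a, "straight"
    if a > THUMB_BENT_THR:
        return a, "bent"
    return a, "undecided"

# Hypothetical keypoints in pixels; only the thumb tip (keypoint 4) moves.
p0, p2, p3 = (0, 0), (0, -60), (0, -80)
for p4 in [(-14, -94), (-18, -93)]:
    angle, state = thumb_state(p0, p2, p3, p4)
    print(p4, round(angle, 1), state)
```

The thumb angle moves from 45.0 degrees (straight) to about 54.2 degrees (bent). Suppose the index finger is straight and the other three fingers are bent. That single change then switches h_gesture's output from "gun" to "one". The two gestures differ only in the thumb, and the gap between the two thumb thresholds is just 4 degrees.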

Full code source: https://codechina.csdn.net/EricLee/dpcas/-/blob/e14f78e482d887f8c440a5027d82be11e571d264/lib/hand_lib/cores/handpose_fuction.py

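import math

'''
    Compute the angle (in degrees) between two 2D vectors.
    Returns 65535. as an "invalid" sentinel when the angle cannot be computed.
'''
def vector_2d_angle(v1,v2):
    v1_x=v1[0]
    v1_y=v1[1]
    v2_x=v2[0]
    v2_y=v2[1]
    try:
        angle_=math.degrees(math.acos((v1_x*v2_x+v1_y*v2_y)/(((v1_x**2+v1_y**2)**0.5)*((v2_x**2+v2_y**2)**0.5))))
    except (ZeroDivisionError, ValueError):
        # zero-length vector, or rounding pushed the cosine slightly outside [-1, 1]
        angle_ =65535.
    if angle_ > 180.:
        angle_ = 65535.
    return angle_
'''
    Compute the 2D bending angle of each finger of one hand
'''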
def hand_angle(hand_,x=0,y=0):
    # hand_: dict of 21 keypoints keyed '0'..'20', each {'x': ..., 'y': ...}; x, y: optional offset
    angle_list = []
    #---------------------------- thumb angle
    angle_ = vector_2d_angle(
        ((int(hand_['0']['x']+x)- int(hand_['2']['x']+x)),(int(hand_['0']['y']+y)-int(hand_['2']['y']+y))),
        ((int(hand_['3']['x']+x)- int(hand_['4']['x']+x)),(int(hand_['3']['y']+y)- int(hand_['4']['y']+y)))
        )
    angle_list.append(angle_)
    #---------------------------- index finger angle
    angle_ = vector_2d_angle(
        ((int(hand_['0']['x']+x)-int(hand_['6']['x']+x)),(int(hand_['0']['y']+y)- int(hand_['6']['y']+y))),
        ((int(hand_['7']['x']+x)- int(hand_['8']['x']+x)),(int(hand_['7']['y']+y)- int(hand_['8']['y']+y)))
        )
    angle_list.append(angle_)
    #---------------------------- middle finger angle
    angle_ = vector_2d_angle(
        ((int(hand_['0']['x']+x)- int(hand_['10']['x']+x)),(int(hand_['0']['y']+y)- int(hand_['10']['y']+y))),
        ((int(hand_['11']['x']+x)- int(hand_['12']['x']+x)),(int(hand_['11']['y']+y)- int(hand_['12']['y']+y)))
        )
    angle_list.append(angle_)
    #---------------------------- ring finger angle
    angle_ = vector_2d_angle(
        ((int(hand_['0']['x']+x)- int(hand_['14']['x']+x)),(int(hand_['0']['y']+y)- int(hand_['14']['y']+y))),
        ((int(hand_['15']['x']+x)- int(hand_['16']['x']+x)),(int(hand_['15']['y']+y)- int(hand_['16']['y']+y)))
        )
    angle_list.append(angle_)
    #---------------------------- pinky angle
    angle_ = vector_2d_angle(
        ((int(hand_['0']['x']+x)- int(hand_['18']['x']+x)),(int(hand_['0']['y']+y)- int(hand_['18']['y']+y))),
        ((int(hand_['19']['x']+x)- int(hand_['20']['x']+x)),(int(hand_['19']['y']+y)- int(hand_['20']['y']+y)))
        )
    angle_list.append(angle_)

    return angle_list
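'''
    Gestures defined with 2D angle constraints (used because no large static gesture dataset is available)
    # fist five gun love one six three thumbup yeah
    # finger id: thumb index middle ring pinky
'''
def h_gesture(img,angle_list):
    # img is not used by this function
    thr_angle = 65.        # index..pinky: an angle above this counts as bent
    thr_angle_thumb = 53.  # thumb: an angle above this counts as bent
    thr_angle_s = 49.      # any finger: an angle below this counts as straight
    gesture_str = None
    if 65535. not in angle_list:  # 65535. marks an angle that could not be computed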
        if (angle_list[0]>thr_angle_thumb)  and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
            gesture_str = "fist"
        elif (angle_list[0]<thr_angle_s)  and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]<thr_angle_s):
            gesture_str = "five"
        elif (angle_list[0]<thr_angle_s)  and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
            gesture_str = "gun"
        elif (angle_list[0]<thr_angle_s)  and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]<thr_angle_s):
            gesture_str = "love"
        elif (angle_list[0]>5)  and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
            gesture_str = "one"
        elif (angle_list[0]<thr_angle_s)  and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]<thr_angle_s):
            gesture_str = "six"
        elif (angle_list[0]>thr_angle_thumb)  and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]>thr_angle):
            gesture_str = "three"
        elif (angle_list[0]<thr_angle_s)  and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
            gesture_str = "thumbUp"
        elif (angle_list[0]>thr_angle_thumb)  and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
            gesture_str = "yeah"

    return gesture_str

2. Simple gesture recognition with deep learning

Here I use the 14 classes of static gestures provided by the author EricLee. The data can be obtained from the link below.
[Figure: the 14 static gesture classes]

Source of the 14-class static gesture dataset: https://codechina.csdn.net/EricLee/classification

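The images in this gesture dataset all have very plain backgrounds, and the dataset is small (roughly 200 images per class). A classification network trained directly on these images for the 14 classes would therefore be far from robust and could not be used in real-world scenes.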

I therefore preprocessed the data and brought in the finger keypoint information obtained earlier, as shown in the figure below.
[Figure: dataset images after preprocessing, with only the finger keypoint skeleton drawn]

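The difference from before is that I ran the finger keypoint network from the previous part over the 14-class gesture dataset. Only the finger keypoint positions were kept, and the lines connecting them were drawn on the image.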

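When we first learn classification, the first task is usually handwritten digit recognition. Once the finger keypoints are drawn on a blank image, each gesture becomes much like a handwritten digit. A model trained this way works across all kinds of backgrounds, unlike one trained on the original photos, because the background never reaches the classifier. This holds only if the keypoint predictions are reasonably accurate.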
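The drawing step can be sketched without OpenCV. The code below is a simplified stand-in for the author's draw_mask_handpose, whose exact drawing style is not shown here. It assumes the 21-keypoint layout implied by the indices in the code above: 0 is the wrist, 1 to 4 the thumb, 5 to 8 the index, 9 to 12 the middle, 13 to 16 the ring and 17 to 20 the pinky. It also assumes keypoints normalized to the hand crop, and it draws every bone in black on a white square image. FINGER_CHAINS and render_skeleton are hypothetical names.

```python
import numpy as np

# Assumed topology: each finger is a chain starting at the wrist (keypoint 0).
FINGER_CHAINS = [[0, 1, 2, 3, 4], [0, 5, 6, 7, 8], [0, 9, 10, 11, 12],
                 [0, 13, 14, 15, 16], [0, 17, 18, 19, 20]]

def render_skeleton(keypoints, size=128, radius=1):
    """Draw the hand skeleton in black on a white size x size uint8 image.

    keypoints: (21, 2) array of (x, y) normalized to the hand crop, in [0, 1].
    radius: half-width of each drawn line in pixels.
    """
    canvas = np.full((size, size), 255, dtype=np.uint8)
    pts = np.asarray(keypoints, dtype=np.float64) * (size - 1)
    for chain in FINGER_CHAINS:
        for a, b in zip(chain[:-1], chain[1:]):
            (x0, y0), (x1, y1) = pts[a], pts[b]
            # Sample the segment densely enough that consecutive samples touch.
            n = int(max(abs(x1 - x0), abs(y1 - y0))) + 2
            xs = np.rint(np.linspace(x0, x1, n)).astype(int)
            ys = np.rint(np.linspace(y0, y1, n)).astype(int)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    canvas[np.clip(ys + dy, 0, size - 1),
                           np.clip(xs + dx, 0, size - 1)] = 0
    return canvas

# Hypothetical keypoints: wrist at the bottom center, five vertical fingers.
kp = np.zeros((21, 2))
kp[0] = (0.5, 0.9)
for f in range(5):
    for j in range(4):
        kp[1 + 4 * f + j] = (0.1 + 0.2 * f, 0.7 - 0.15 * j)
img = render_skeleton(kp)
print(img.shape, img.dtype)  # (128, 128) uint8
```

Because only the skeleton is drawn, two photos of the same gesture against very different backgrounds produce nearly identical inputs. This is the property the classifier relies on.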

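I simply picked an existing classification codebase: https://github.com/weiaicunzai/pytorch-cifar100. The code is very straightforward, so I will not walk through training here. First, preprocess the 14-class gesture data with the hand-pose model described above to get finger skeleton drawings like the ones in the figure above. Then train a classification network on those drawings.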

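I added the code to EricLee's source, as in the snippet below. The estimated hand keypoints are drawn into a gesture image, s_img_mask. That image is passed to the gesture classification network gesture_model to get the class: output = gesture_model(s_img_mask).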
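The inference step boils down to four operations:

1. Resize the skeleton image.
2. Convert it to a normalized CxHxW batch.
3. Take the argmax of the class scores.
4. Look up the index in a tag dictionary.

The NumPy sketch below mirrors that flow without PyTorch. The mean/std of 0.5 and the nearest-neighbor resize are assumptions; in practice the preprocessing must match whatever transform was used during training. to_model_input and decode_prediction are hypothetical names.

```python
import numpy as np

def to_model_input(mask_hwc, size=128, mean=0.5, std=0.5):
    """Resize an HxWx3 uint8 image to size x size (nearest neighbor) and return
    a 1x3xsize x size float32 batch scaled to [0, 1], then normalized."""
    h, w = mask_hwc.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    resized = mask_hwc[rows][:, cols]
    x = resized.astype(np.float32) / 255.0
    x = (x - mean) / std
    return np.transpose(x, (2, 0, 1))[None]  # HWC -> CHW, then add a batch dimension

def decode_prediction(logits, tags):
    """Map a 1 x num_classes score array to a gesture name via argmax."""
    idx = int(np.argmax(logits, axis=1)[0])
    name = tags[str(idx)]
    return "one" if name == "gun" else name  # same gun -> one merge as the pipeline

# Hypothetical tag dictionary (string index -> name), in the format the pipeline uses.
tags = {"0": "fist", "1": "gun", "2": "five"}
batch = to_model_input(np.full((200, 150, 3), 255, dtype=np.uint8))
print(batch.shape)  # (1, 3, 128, 128)
print(decode_prediction(np.array([[0.1, 2.0, 0.3]]), tags))  # one
```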

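The code below is EricLee's first version, slightly modified by me. I added gesture_model and gesture_dict, which recognize the gesture and store it in a dictionary for later use. The author also defined a rule that detects whether the thumb and index finger are pinched together or released, which implements click detection.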
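The click rule can be isolated as a small function. This sketch applies the same two constraints as the pipeline below, using its default thresholds:

- The distance between the thumb tip and the index fingertip must be under dst_thr = 35 pixels.
- Their angle, seen from the wrist, must be under angle_thr = 16 degrees.

It also adds a per-hand counter of consecutive click frames, which plays the role of hands_click_dict. is_click and ClickCounter are hypothetical names.

```python
import math

def is_click(wrist, thumb_tip, index_tip, dst_thr=35.0, angle_thr=16.0):
    """Pinch test on keypoints 0, 4 and 8: the two tips are close together
    AND point in nearly the same direction as seen from the wrist."""
    dst = math.dist(thumb_tip, index_tip)
    v1 = (thumb_tip[0] - wrist[0], thumb_tip[1] - wrist[1])
    v2 = (index_tip[0] - wrist[0], index_tip[1] - wrist[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return False
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    return dst < dst_thr and angle < angle_thr

class ClickCounter:
    """Per-hand count of consecutive click frames (reset to 0 on release)."""
    def __init__(self):
        self.counts = {}

    def update(self, hand_id, click):
        self.counts[hand_id] = self.counts.get(hand_id, 0) + 1 if click else 0
        return self.counts[hand_id]

# Hypothetical pixel coordinates.
print(is_click((0, 0), (100, -100), (110, -105)))  # True: tips about 11 px apart
print(is_click((0, 0), (60, -80), (100, -200)))    # False: tips far apart
```

Requiring both constraints avoids false clicks when the two tips only overlap in the 2D projection while pointing in different directions. The counter lets downstream code act only after several consecutive click frames.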

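import cv2
import numpy as np
import torch
from PIL import Image

# Also required from the original project: draw_bd_handpose_c, draw_mask_handpose and
# vector_2d_angle (above). `transform` (the preprocessing used when training the classifier)
# and `tags` (class index string -> gesture name) are assumed to be module-level globals
# that this snippet does not show. gesture_dict must be a dict that persists across frames,
# and gesture_model should already be in eval() mode.
def handpose_track_keypoints21_pipeline(img,hands_dict,hands_click_dict,track_index,algo_img = None,handpose_model = None,gesture_model = None,
                                        index_finger_track = None, gesture_dict = None,title_boxes= [], icon=None,vis = False,dst_thr = 35,angle_thr = 16.):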

    hands_list = []

    if algo_img is not None:

        for idx,id_ in enumerate(sorted(hands_dict.keys(), key=lambda x:x, reverse=False)):

            x_min,y_min,x_max,y_max,score,iou_,cnt_,ui_cnt = hands_dict[id_]

            # x_min,y_min,x_max,y_max,score = bbox
            w_ = max(abs(x_max-x_min),abs(y_max-y_min))
            if w_< 60:
                continue
            w_ = w_*1.26

            x_mid = (x_max+x_min)/2
            y_mid = (y_max+y_min)/2

            x1,y1,x2,y2 = int(x_mid-w_/2),int(y_mid-w_/2),int(x_mid+w_/2),int(y_mid+w_/2)

            x1 = np.clip(x1,0,img.shape[1]-1)
            x2 = np.clip(x2,0,img.shape[1]-1)

            y1 = np.clip(y1,0,img.shape[0]-1)
            y2 = np.clip(y2,0,img.shape[0]-1)

            bbox_ = x1,y1,x2,y2

            pts_ = handpose_model.predict(algo_img[y1:y2,x1:x2,:])

            # white canvas on which the hand skeleton is drawn for the gesture classifier
            img_mask = np.full(algo_img.shape, 255, dtype=np.uint8)

            plam_list = []
            pts_hand = {}
            # keypoints appear to be predicted normalized to the crop; scale them to crop pixels
            for ptk in range(int(pts_.shape[0]/2)):
                xh = (pts_[ptk*2+0]*float(x2-x1))
                yh = (pts_[ptk*2+1]*float(y2-y1))
                pts_hand[str(ptk)] = {
                    "x":xh,
                    "y":yh,
                    }
                if ptk in [0,1,5,9,13,17]: # palm keypoints, averaged below to get the palm center
                    plam_list.append((xh+x1,yh+y1))
                if ptk == 0: # wrist (palm root)
                    hand_root_ = int(xh+x1),int(yh+y1)
                if ptk == 4: # thumb tip
                    thumb_ = int(xh+x1),int(yh+y1)
                if ptk == 8: # index fingertip
                    index_ = int(xh+x1),int(yh+y1)

            # midpoint between the index fingertip and the thumb tip
            choose_pt = (int((index_[0]+thumb_[0])/2),int((index_[1]+thumb_[1])/2))
            # palm center
            plam_list = np.array(plam_list)
            plam_center = (np.mean(plam_list[:,0]),np.mean(plam_list[:,1]))

            # distance between the thumb tip and the index fingertip
            dst = np.sqrt(np.square(thumb_[0]-index_[0]) +np.square(thumb_[1]-index_[1]))
            # angle between the thumb tip and the index fingertip, relative to the wrist
            angle_ = vector_2d_angle((thumb_[0]-hand_root_[0],thumb_[1]-hand_root_[1]),(index_[0]-hand_root_[0],index_[1]-hand_root_[1]))
            # click state: are the thumb and index finger pinched together?
            click_state = False
            if dst<dst_thr and angle_<angle_thr: # two constraints: Euclidean distance between the tips, and their angle relative to the wrist
                click_state = True
                cv2.circle(img, choose_pt, 6, (0,0,255),-1) # draw the click point, which is also the trajectory point
                cv2.circle(img, choose_pt, 2, (255,220,30),-1)
                cv2.putText(img, 'Click {:.1f} {:.1f}'.format(dst,angle_), (int(x_min+2),int(y2-1)),cv2.FONT_HERSHEY_COMPLEX, 0.45, (255, 0, 0),5)
                cv2.putText(img, 'Click {:.1f} {:.1f}'.format(dst,angle_), (int(x_min+2),int(y2-1)),cv2.FONT_HERSHEY_COMPLEX, 0.45, (0, 0, 255))
            else:
                click_state = False
                cv2.putText(img, 'NONE  {:.1f} {:.1f}'.format(dst,angle_), (int(x_min+2),int(y2-1)),cv2.FONT_HERSHEY_COMPLEX, 0.45, (255, 0, 0),5)
                cv2.putText(img, 'NONE  {:.1f} {:.1f}'.format(dst,angle_), (int(x_min+2),int(y2-1)),cv2.FONT_HERSHEY_COMPLEX, 0.45, (0, 0, 255))

            #----------------------------------------------------
            # per-hand click counter (consecutive click frames), used to stabilize the click output
            if id_ not in hands_click_dict:
                hands_click_dict[id_] = 0
            if click_state:
                hands_click_dict[id_] += 1
            else:
                hands_click_dict[id_] = 0



            #--------------------- draw the hand keypoint connections
            draw_bd_handpose_c(img,pts_hand,x1,y1,2)  # on the display frame
            draw_mask_handpose(img_mask, pts_hand, x1, y1, int(((x2-x1)+(y2-y1))/128))  # on the white canvas

            # crop the skeleton drawing to the hand box and resize it to the classifier input size
            s_img_mask = img_mask[y1:y2,x1:x2,:]
            s_img_mask = cv2.resize(s_img_mask, (128, 128))
            # cv2.imwrite("output/test_2/123.jpg", s_img_mask)

            s_img_mask = Image.fromarray(s_img_mask)
            # transform is required: it must turn the PIL image into a CxHxW tensor, as during training
            s_img_mask = transform(s_img_mask)
            # run on whichever device the classifier is on (the original hard-coded .cuda())
            device = next(gesture_model.parameters()).device
            with torch.no_grad():
                output = gesture_model(s_img_mask.unsqueeze(dim=0).to(device))

            pre_tag = torch.argmax(output, dim=1)[0].item()
            gesture_name = tags[str(pre_tag)]
            if gesture_name == "gun":  # the author merges "gun" into "one"
                gesture_name = "one"
            # count consecutive frames in which this hand shows the same gesture
            if id_ in gesture_dict and gesture_dict[id_][0] == gesture_name:
                gesture_count = gesture_dict[id_][1]+1
            else:
                gesture_count = 0
            #----------------------------------------------------
            # per hand: local 21-keypoint coordinates, global top-left of the crop, global palm center, state info
            hands_list.append((pts_hand,(x1,y1),plam_center,{"id":id_,"click":click_state,"click_cnt":hands_click_dict[id_],
                                                             "gesture_name":gesture_name,"gesture_count":gesture_count,"choose_pt":choose_pt}))
            # record the gesture state counter, used to stabilize the gesture output
            gesture_dict[id_] = (gesture_name, gesture_count)

    return hands_list
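For each hand id, gesture_dict stores the last gesture and how many consecutive frames it has repeated (gesture_count). One way to use it downstream is to report a gesture only once its count reaches a threshold, which suppresses single-frame flickers between classes. This is sketched here as an assumption, not as the author's code. GestureStabilizer and min_count are hypothetical names.

```python
class GestureStabilizer:
    """Report a gesture only after it has repeated for `min_count` extra frames."""

    def __init__(self, min_count=2):
        self.min_count = min_count
        self.state = {}  # hand id -> (gesture_name, gesture_count), like gesture_dict

    def update(self, hand_id, gesture_name):
        prev = self.state.get(hand_id)
        if prev is not None and prev[0] == gesture_name:
            count = prev[1] + 1
        else:
            count = 0  # first frame of a new gesture, as in the pipeline
        self.state[hand_id] = (gesture_name, count)
        return gesture_name if count >= self.min_count else None

s = GestureStabilizer(min_count=2)
print([s.update(7, g) for g in ["one", "one", "one", "five"]])  # [None, None, 'one', None]
```

As in the pipeline, the first frame of a new gesture has count 0, so min_count=2 requires three consecutive frames. This sketch never removes entries for hand ids that disappear; a real tracker would also need to clean those up.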