M. Hullin, R. Klein, T. Schultz, A. Yao (Eds.)
Data Driven Synthesis of Hand Grasps from 3-D Object Models
S. Majumder¹, H. Chen² and A. Yao¹
¹Institut für Informatik II, Computer Graphik, Universität Bonn, Germany
²RWTH Aachen University, Germany
Abstract
Modeling and predicting human hand grasping interactions is an active area of research in robotics, computer vision and computer graphics. We tackle the problem of predicting plausible hand grasps and the contact points given an input 3-D object model. Such a prediction task can be difficult due to the variations in the 3-D structure of daily use objects as well as the different ways that similar objects can be manipulated. In this work, we formulate grasp synthesis as a constrained optimization problem which takes into account the anthropomorphic and kinematic limitations of a human hand as well as the local and global geometric properties of the interacting object. We evaluate our proposed algorithm on twelve 3-D object models of daily use and demonstrate that our algorithm can successfully predict plausible hand grasps and contact points on the object.
1. Introduction
Given an object, grasp synthesis refers to the problem of finding a plausible grasp configuration that satisfies a set of criteria relevant for interacting with the object. Modeling and predicting human hand grasps is an active and popular area of research as it has applications in robotics [SDN08], computer vision and computer graphics [Liu09]. Existing grasp synthesis algorithms can be broadly divided into two categories: analytic [SEKB12] and data-driven [BMAK14]. Given an input object model, analytic approaches determine the contact locations on the object and the grasping pose through kinematic and dynamic formulations. Analytic approaches are known to be computationally expensive as a certain number of conditions have to be satisfied for a successful grasp [SEKB12]. Contrary to analytic approaches, the data-driven paradigm places more emphasis on learning models that capture the relationship between the object's shape and features and the grasping pose by training on annotated examples. As 3-D data acquisition devices and modeling tools became more widely available, research in the data-driven direction gained more traction within the community [Shi96, BMAK14]. In this work, we also adopt a data-driven approach which models the hand-object interaction and automatically synthesizes 3-D hand grasps when presented with an object model (refer to Figure 1).
We are motivated by the energy minimization approach of [KCGF14], which automatically predicts human pose and contact points when given the 3D structure of an object such as a bicycle or a fitness machine. The energy minimization incorporates local affordance features as well as global constraints such as symmetry of the human body and human pose priors. We adopt a similar approach for synthesizing realistic hand grasps given a 3-D object model. However, the model in [KCGF14] cannot be directly applied to the grasp synthesis problem because, unlike the human body, the human hand is not symmetric. Furthermore, grasp stability is an important factor to consider when synthesizing hand grasps for object interaction, i.e. physically possible hand grasps are not always natural or plausible in real life due to a lack of object stability.
Figure 1: Given a 3D object model as input, we predict a plausible hand pose and contact points on the object surface.
We propose an energy-minimization approach for the task of 3-D grasp synthesis and summarize our contributions as follows. First, we relax the symmetry constraints of [KCGF14] by proposing a modified energy term that reflects the part-wise reflectional symmetry of the human hand. Second, we propose a novel energy term which leads to the synthesis of stable grasps. Stability of synthesized hand grasps is a feature that is often found only in analytic approaches, but with our proposed energy term we are able to incorporate this desirable property into a data-driven paradigm. Third, to speed up the computation of the energy minimization, we propose
© 2017 The Author(s)
Eurographics Proceedings © 2017 The Eurographics Association.
tations for the hand contact points and the 3-D hand model.
2. Related Works
Existing grasp synthesis algorithms can be broadly divided into analytic and data-driven approaches. We give only a short overview and refer the reader to existing surveys [SEKB12, BMAK14] for more details. Instead, we primarily focus on approaches which cast hand grasp synthesis as an energy minimization.
2.1. Analytic Approaches
Analytic approaches focus on the analysis of kinematics, stability and/or dynamic formulations. Several of these approaches aim to synthesize stable grasps [Liu00, DLW00, LLD04]. These approaches often depend on idealized conditions such as simplified contact models [Ngu88], Coulomb friction [HPK13] and rigid body modeling [SK16, MLSS94]. When applied to real world scenarios, synthesized grasps may be improper (anthropomorphically not possible) [PT08] due to ambiguities and imperfections unaccounted for in the formulations.
2.2. Data-Driven Approaches
Data-driven or empirical approaches rely on learning from examples and predict graspable regions based on object geometric features [Sax09, LLS15]. These examples can be provided in the form of generated labeled training data, human demonstration or trial-and-error. A standard data-driven approach samples grasp candidates given an object and then ranks them according to some metric [BMAK14]. The approach in [MCFdP04] learns a vision-based grasp system by repeating a large number of grasping actions on different objects. In [SDN08], a simple logistic regressor is learned based on large amounts of synthetic training data to predict grasps without the need for satisfying any kinematic or stability constraints. More recently, there has been a focus on the relationship between grasp prediction and object features [BK10, HCCJ10].
In comparison to analytic approaches, data-driven approaches pay more attention to the aspects of the object representation and perceptual processing. As a result, data-driven approaches may generate grasps which are improper, as pointed out in [KEK09].
2.3. Grasp Synthesis as Energy Minimization
Several approaches, both analytic and data-driven, have cast grasp synthesis as an energy minimization problem [Lia, JGT11, CGA07, HWA∗12]. Jia et al. in [JGT11] proposed a two-finger grasping approach for deformable objects by minimizing the object's potential energy under external squeezing forces. Ciocarlie et al. in [CGA07] use simulated annealing to minimize an energy term based on local geometric features such as distances between the contact points and the object surface, and angular differences between surface normals at the contact locations and the closest point on the object.
features of the object as well as the stability of the object on application of a particular grasping pose jointly during the energy minimization. This allows us to synthesize physically possible hand grasps which ensure object stability during the interaction.
3. Approach
Our proposed approach proceeds in two stages: learning a hand-object interaction model and using this learned model to infer the grasping pose when presented with an input shape.
For learning, the input is a collection of 3-D shapes with manually annotated contact points and poses represented by the joint angles. Our goal is to learn an interaction model that is able to measure the quality of a pose given an input object shape. The interaction model incorporates terms learned from examples to model the local geometry of contact points and the joint angles for hand interaction poses, and it includes penalty terms for deviations from the part-wise reflectional symmetry of the human hand, for intersections with the shape and for unstable grasping poses.
For inference, the input is a novel shape, and the output is a set of joint angles and contact points parameterizing the most likely hand interaction pose. The key algorithm in this stage searches the combinatorial space of hand poses to find the ones with lower energies (meaning higher compatibility) according to the interaction model.
First, possible contact points on the object are sampled; this constrains the search space for possible hand poses. We then sample a large number of poses from the learned joint angle distributions.
The distribution of the hand parts and the sample points are then aligned using a rigid transformation. For each aligned pair of pose and contact points, the exact value of the objective function is evaluated.
The pose with the lowest energy is selected as the final solution.
An overview of our approach is given in Figure 2.
3.1. Kinematic Model of Hand Skeleton
Estimating an accurate kinematic model of the human hand is rendered difficult by its anatomical complexity. Consequently, simplifying assumptions are often made in analytic solutions to ease the implementation or speed up computations [BBD12]. The human hand has 27 degrees of freedom (DOFs): 4 in each of the four fingers (3 for extension and flexion, 1 for abduction and adduction), 5 in the thumb, and the remaining 6 for the rotation and translation of the wrist [AD09, ES03].
We make the following simplifying assumptions on top of the 27 DOF model. First, in contrast to the standard model, we simplify the thumb to behave like any other finger. As it has one fewer joint, it has 3 DOFs instead of the 4 DOFs of each other finger. Secondly, for our experiments, we assume that the input object model is presented in an upright position. Thus, we remove the DOFs corresponding to rotation and are left with 3 DOFs corresponding to translations along the x, y and z axes. In total, our kinematic hand model has 22 DOFs as shown in Figure 3.
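As a quick check of the bookkeeping above, the following Python sketch tallies the DOFs of the standard model and of the simplified model. The dictionary names and keys are illustrative only and are not part of the formulation.

```python
# Tally the degrees of freedom (DOFs) described in the text.
# The grouping into dictionary entries is purely illustrative.

STANDARD_MODEL = {
    "four_fingers": 4 * 4,  # 4 fingers, 4 DOFs each
    "thumb": 5,
    "wrist": 6,             # 3 rotations + 3 translations
}

SIMPLIFIED_MODEL = {
    "four_fingers": 4 * 4,     # unchanged
    "thumb": 3,                # treated like a finger with one fewer joint
    "wrist_translation": 3,    # rotations removed (upright object assumed)
}

standard_dofs = sum(STANDARD_MODEL.values())
simplified_dofs = sum(SIMPLIFIED_MODEL.values())

print(standard_dofs, simplified_dofs)
```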
Figure 2: Grasp Synthesis Pipeline: (a) Given an input 3D shape, (b) we first classify the surface for possible contact points corresponding to each key part of our kinematic hand model, (c) find the probability distribution for each contact part by sampling hand poses from training examples, and (d) predict the grasping pose by minimizing energy terms corresponding to (b) and (c).
Figure 3: Hand skeleton model with 22 degrees of freedom. The circles (in yellow) indicate the key parts of our proposed hand model which make physical contact with the object.
The joint angles of the hand specify only the pose which it has to assume when interacting with objects. To fully determine the grasp, we also need to predict where the hand makes physical contact with the object of interest. We refer to the hand parts that establish contact as key parts. For precise grasp predictions, all the finger tips, finger joints and points connecting the base of each finger and thumb to the root of our kinematic hand model (denoted as $C$ in Figure 3), as well as $C$ itself, can be assigned as a key part. This totals 21 key parts: $\{L_{i=0\to3}, R_{i=0\to3}, M_{i=0\to3}, I_{i=0\to3}, T_{i=0\to2}, W, C\}$.
However, this imposes a heavy computational burden. For our work, we identify 7 locations on the human hand as contact parts, as shown in Figure 3. In our experiments, we observed that having 7 contact parts instead of 21 leads to minimal loss of precision.
3.2. Modeling Hand-Object Interaction
In this work, we cast the grasp synthesis problem as an energy minimization problem and adopt a framework similar to [KCGF14].
Based on the observation that local geometric features are often insufficient for predicting contact points, Kim et al. in [KCGF14] proposed a framework that allows the incorporation of both local constraints (in the form of compatibility between key parts and local geometric features) and global constraints stemming from anthropomorphic limitations and properties of the human pose into a single energy minimization framework. Similar in concept, we want the inferred grasp to interact with the object in a believable manner, i.e. make contact with the object in “graspable” areas. This is in turn dependent on the local geometric properties of the object. Besides, we want to ensure that the synthesized grasp does not intersect the object surface. Furthermore, we add the functional constraint of ‘do not drop the object’ in order to synthesize stable grasps. In our proposed interaction model, each individual energy term addresses one of the aforementioned issues.
In the learning stage, we model the hand-object interaction for a class of shapes. Our goal is to build a model that can be used to evaluate the interaction between a shape $S$ and a hand grasp represented by a rigid transformation $T$, joint angles $\hat{\theta} = \{\theta_1, \ldots, \theta_n\}$ where $n$ is the number of joints, key hand parts $P$ (tip of each finger, center of the palm, and the base of the four fingers), and contact point assignments $m : P \to S \cup \{\text{ground}, \text{unassigned}\}$. Some hand parts may be unassigned and rest in free space, $p \to \text{unassigned}$, or may be placed on the ground plane, $p \to \text{ground}$.
Our proposed model searches over the space of all plausible hand grasps, and picks a grasp minimizing the following objective:
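\[
E(T, \hat{\theta}, m, S) = w_{dist} E_{dist}(T, \hat{\theta}, m, S) + w_{feat} E_{feat}(m, S) + w_{pose} E_{pose}(\hat{\theta}) + w_{stab} E_{stab}(T, m, \hat{\theta}, S) + w_{isect} E_{isect}(T, \hat{\theta}, S) \tag{1}
\]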
$E_{dist}$ and $E_{feat}$ are local energy terms assigned to the key parts; $E_{dist}$ penalizes key parts that do not make physical contact with the object while $E_{feat}$ penalizes contact assignments when the corresponding key part is incompatible with the local surface geometry. The remaining energy terms define global pose constraints: $E_{pose}$ penalizes implausible poses, $E_{stab}$ penalizes unstable grasping poses, and $E_{isect}$ penalizes surface intersections.
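The weighted sum of Equation 1 can be sketched in Python as follows. This is a minimal illustration rather than our implementation: the function names `total_energy` and `best_candidate` are hypothetical, each energy term is assumed to have been evaluated beforehand, and the default weights are the values reported in Section 5.1.

```python
from typing import Dict, List

# Weights as reported in Section 5.1 (w_dist, w_feat, w_pose, w_stab, w_isect).
WEIGHTS: Dict[str, float] = {
    "dist": 1000.0,
    "feat": 10.0,
    "pose": 10.0,
    "stab": 500.0,
    "isect": 0.3,
}


def total_energy(terms: Dict[str, float],
                 weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the five pre-computed energy terms (Equation 1)."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


def best_candidate(candidates: List[Dict[str, float]]) -> int:
    """Index of the candidate grasp with the lowest total energy."""
    return min(range(len(candidates)),
               key=lambda i: total_energy(candidates[i]))
```

With these weights, a unit change in $E_{dist}$ or $E_{stab}$ moves the objective far more than a unit change in $E_{isect}$, so the raw scale of each term matters when comparing candidates.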
3.2.1. Contact Distance [KCGF14]
If a hand part is assigned to a surface point on the 3-D object, we want the hand part to establish physical contact with the object. To ensure this, we penalize large separations between the object and the assigned contact part. The energy term is given by
\[
E_{dist} = \sum_{p \in P,\; m_p \neq \text{unassigned}} \lVert T p_\theta - m_p \rVert^2, \tag{2}
\]
where $p_\theta$ is the position of key hand part $p \in P$ given joint angles $\theta$ and $m_p$ denotes the contact point assignment for key part $p$.
Parts assigned to the ground are measured by separation in height.
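A minimal Python sketch of Equation 2 is given below. It assumes that the transformed part positions $T p_\theta$ are already available, that the z-axis is the height axis, and that the ground plane lies at a known height; the function name, the `GROUND` label and the use of `None` for unassigned parts are illustrative choices.

```python
from typing import Dict, Sequence, Union

Point = Sequence[float]
GROUND = "ground"  # label for parts resting on the ground plane


def contact_distance_energy(
    part_positions: Dict[str, Point],
    assignments: Dict[str, Union[Point, str, None]],
    ground_height: float = 0.0,
) -> float:
    """Sum of squared separations between key parts and their contacts.

    part_positions holds the already transformed positions T * p_theta.
    assignments maps each part to a surface point, to GROUND, or to None
    (unassigned parts are skipped, as in Equation 2).
    """
    energy = 0.0
    for part, target in assignments.items():
        if target is None:
            continue  # unassigned parts do not contribute
        x, y, z = part_positions[part]
        if isinstance(target, str):
            if target != GROUND:
                raise ValueError(f"unknown assignment label: {target}")
            # Ground contacts are measured by separation in height only.
            energy += (z - ground_height) ** 2
        else:
            tx, ty, tz = target
            energy += (x - tx) ** 2 + (y - ty) ** 2 + (z - tz) ** 2
    return energy
```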
$V_p : S \to [0, 1]$ for each part $p \in P$ which estimates the probability that it will be placed on a point on a query surface $S$. The model relies on local geometric features to predict which regions are compatible with which hand part: for instance, large flat or cylindrical surfaces are meant for the palms and small homogeneous surfaces (such as a trigger or button) are meant for more assertive parts such as the thumb and index finger.
Using the iterative farthest-point algorithm, $1000 \cdot A$ points $C_{S_i} = \{c_1, c_2, \ldots, c_K\}$ are sampled on each shape $S_i$, where $A$ is the shape's surface area in square centimeters. Geometric features such as local neighborhoods, local symmetry axes, curvature, shape diameter function, and a histogram of distances are computed at these points.
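For illustration, iterative farthest-point sampling can be sketched in Python as below. The sketch operates on a list of candidate surface points and uses Euclidean distance; both are assumptions made for brevity, as the text does not specify the distance measure, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def farthest_point_sampling(points: Sequence[Sequence[float]],
                            k: int,
                            start: int = 0) -> List[int]:
    """Return indices of k points chosen by iterative farthest-point sampling.

    Each new sample is the point whose distance to the already selected
    set is largest. Euclidean distance is an assumption of this sketch.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # min_dist[i]: distance from point i to its nearest selected sample.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

The update loop makes each iteration linear in the number of candidate points, so drawing K samples from N candidates costs O(NK) distance evaluations.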
Next, for each hand part $p$ and training shape $S_i$, we can compute a normalized measure $V_p^i$ which is 1 at the ground truth contact point $m_p^i$ and decays to zero with increasing distance from it. We define $V_p^i(c_j)$ at sample point $c_j$ as
\[
V_p^i(c_j) = \exp\left( -\frac{g(c_j, m_p^i)^2}{\tau^2} \right),
\]
where $g(\cdot, \cdot)$ is the geodesic distance and $\tau$ is a tuning parameter. $\tau$ is chosen such that $V_p^i(c_j)$ is 0.4 at a geodesic distance of 2 cm.
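The stated condition fixes the tuning parameter: solving $\exp(-2^2 / \tau^2) = 0.4$ gives $\tau = 2 / \sqrt{-\ln 0.4} \approx 2.09$ cm. The short Python sketch below reproduces this calculation; the function names are illustrative.

```python
import math


def solve_tau(distance_cm: float = 2.0, target: float = 0.4) -> float:
    """Solve exp(-d^2 / tau^2) = target for tau."""
    return distance_cm / math.sqrt(-math.log(target))


def compatibility(geodesic_cm: float, tau: float) -> float:
    """Normalized measure: 1 at the annotated contact, decaying with distance."""
    return math.exp(-(geodesic_cm ** 2) / (tau ** 2))


TAU = solve_tau()
print(round(TAU, 3))                      # tau in centimeters
print(round(compatibility(2.0, TAU), 3))  # value at 2 cm
```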
For each hand key part $p$, we train a random regression forest with 30 trees to estimate $V_p$. When predicting the pose, the regression forest is used to predict feature compatibility at each candidate contact point assigned to a hand part. The overall compatibility is measured by the energy term $E_{feat}$ given by
\[
E_{feat} = \sum_{p \in P} -\log V_p(m_p) \tag{3}
\]
For parts mapped to the ground plane or left unassigned, the feature compatibility is estimated from training data statistics with $V_p(\text{ground}) = M_{ground} / M$, where $M_{ground}$ is the number of times part $p$ was placed on the ground or left unassigned. A lower bound of 0.1 is set to avoid infinite energies.
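Equation 3 can be sketched in Python as follows. The sketch makes two assumptions that are not stated explicitly in the text: the lower bound of 0.1 is applied to $V_p$ before taking the logarithm, and $M$ is the total number of training examples for part $p$. All names are illustrative.

```python
import math
from typing import Dict

V_FLOOR = 0.1  # lower bound, assumed here to apply to V_p itself


def ground_compatibility(m_ground: int, m_total: int) -> float:
    """V_p(ground) = M_ground / M from training statistics.

    m_total is assumed to be the number of training examples for the part.
    """
    return m_ground / m_total


def feature_energy(compatibilities: Dict[str, float]) -> float:
    """E_feat: sum over parts of -log V_p(m_p), with V_p floored at 0.1."""
    return sum(-math.log(max(v, V_FLOOR)) for v in compatibilities.values())
```

Under this reading, a single part can contribute at most $-\log 0.1 \approx 2.3$ to $E_{feat}$, which keeps the energy finite even when the forest predicts zero compatibility.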
3.2.3. Pose Prior and Symmetry
The pose prior helps to distinguish plausible (anthropomorphically possible) poses from implausible ones [KCGF14].
Similar to [HEKL∗13], we use a Gaussian Mixture Model (GMM) to learn a probabilistic encoding of finger joint angle distributions.
We use the same hand skeletal model in all examples. Each hand pose is represented by a 26-dimensional $\hat{\theta}$: 22 degrees of freedom and 4 parameters for the location and rotation.
First, we use standard $k$-means clustering to group all input training poses into $L$ clusters. In most cases, we set $L = 3$. Then, for each cluster $l_k$ (where $k \in \{1, 2, \cdots, L\}$), we use a Gaussian with learned mean $\mu_i^{l_k}$ and standard deviation $\sigma_i^{l_k}$ to represent the variation of $\theta_i$, the $i$-th joint angle. Note that the distribution of each joint angle is modeled independently
(the remote control) or the index finger (the gun and the spray bottle), which makes their pose different from that of the middle, ring and pinkie fingers. As such, we relax the constraints of [KCGF14] and incorporate a 3-finger symmetry in the pose prior energy term.
We set the joint angles of the ring and little fingers to be symmetric with the corresponding joint angles of the middle finger. For each symmetric pair $(\theta_i, \theta_i^{sym})$, the deviation of the joint angles is represented with a Gaussian: $|\theta_i - \theta_i^{sym}| \sim N(\mu_i^{sym}, \sigma_i^{sym})$, where a smaller $\sigma_i^{sym}$ indicates that the middle, ring and pinkie fingers are aligned in a symmetrical manner in a grasp.
The pose-prior energy term is now given by
\[
E_{pose} = \min_{l \in L} \sum_{i=1}^{26} \left[ \frac{(\theta_i - \mu_i^l)^2}{(\sigma_i^l)^2} + \frac{\left( |\theta_i - \theta_i^{sym}| - \mu_i^{sym} \right)^2}{(\sigma_i^{sym})^2} \right] \tag{4}
\]
The first term in the summation penalizes the deviation of the inferred joint angles from the joint angle distribution learned from the examples. It prefers poses which are similar to the ones observed during training. The second term in the summation penalizes inferred poses which violate the symmetrical behavior observed during training.
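A minimal Python sketch of Equation 4 follows. It assumes that joints without a symmetric partner contribute no symmetry penalty, which the equation leaves implicit, and it takes the cluster means and standard deviations as given. The function name and argument layout are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[Sequence[float], Sequence[float]]  # (means, std devs) per joint


def pose_energy(theta: Sequence[float],
                clusters: List[Cluster],
                sym_partner: Dict[int, int],
                sym_mu: Dict[int, float],
                sym_sigma: Dict[int, float]) -> float:
    """E_pose: minimum over clusters of prior plus symmetry penalties.

    sym_partner maps joint i to its symmetric counterpart (for example a
    ring-finger joint to the matching middle-finger joint). Joints absent
    from sym_partner are assumed to contribute no symmetry term.
    """
    def energy_for(mu: Sequence[float], sigma: Sequence[float]) -> float:
        total = 0.0
        for i, angle in enumerate(theta):
            # Deviation from the learned joint angle distribution.
            total += ((angle - mu[i]) / sigma[i]) ** 2
            if i in sym_partner:
                # Deviation from the learned symmetric behavior.
                gap = abs(angle - theta[sym_partner[i]])
                total += ((gap - sym_mu[i]) / sym_sigma[i]) ** 2
        return total

    return min(energy_for(mu, sigma) for mu, sigma in clusters)
```

Since the symmetry penalty does not depend on the cluster, it could equally be added once after taking the minimum over clusters; the sketch keeps it inside the sum to mirror the equation.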
3.2.4. Stability
A grasped object is defined to be in equilibrium if the sum of all forces and the sum of all moments acting on it are equal to zero [Shi96]. However, an equilibrium grasp can be either stable or unstable. A grasp is said to be stable when the grasped object is in equilibrium (no net force or torque) and it is possible to increase the magnitude of the grasping force to prevent any displacement due to an arbitrary applied force [VI12, Cut89]. Force closed grasps are a subset of equilibrium grasps which have the important property of being stable [SEKB12]. Force closure is an important property in grasping and has an extensive literature [MLSS94, Ngu88].
In grasp synthesis, we want to generate grasps not only with plausible poses and contact points, but also ensure that objects of interaction are stable. We introduce a novel energy term which ensures the predicted grasp results in force closure by restricting the motion of the object through the contact forces exerted by the hand.
For simplicity, we assume that all contacts between the fingertips and the objects are point contacts which can only exert a normal force through the point of contact and a frictional force along the surface in a direction perpendicular to that of the normal force.
In the simplest scenario, we assume that 2 contact points are required to make an object stable (Figure 4(a)). Furthermore, we assume that the forces exerted through the contact points are equal in magnitude and applied at points diametrically opposite to each other. The frictional force is given by $F_f = \mu (F_1 + F_2)$, where $\mu$ is the coefficient of friction of the surface. If the magnitude of $F_f$ is bigger than the force exerted by gravity, then the object remains stable along the z-axis. Furthermore, if $F_1$ and $F_2$ are equal and opposite in direction (180° between them), then they cancel each other out in the xy-plane.
Figure 4: (a) Normal forces $F_1$ and $F_2$ of equal magnitude applied at diametrically opposite points into the surface cancel each other out. (b) As the angle between the normal forces $F_1$ and $F_2$ decreases, the y-components increase, making the object unstable.
Now consider an angle between the force vectors smaller than 180° (Figure 4(b)). There are still two forces in operation: $F_1$ and $F_2$. $F_{1x}$ and $F_{1y}$ are the components of force $F_1$ in the x and y directions, and likewise $F_{2x}$ and $F_{2y}$ for $F_2$. Even if $F_{1x}$ and $F_{2x}$ are in equilibrium in the x direction, the object is not in mechanical equilibrium in the y direction because of the additive nature of $F_{1y}$ and $F_{2y}$. Furthermore, as the angle between $F_1$ and $F_2$ gets smaller than 180 degrees, the y-components increase, making the object more unstable.
Based on these observations, we propose an energy term $E_{stab}$ which favors assigning hand parts to sampled contact points which are 180° apart. Let $p, q \in P$ denote the $p$-th and $q$-th key part respectively. Also, let $a_p = T p_\theta$ indicate the position of key part $p$ given joint angle $\theta$ and $b_p = m_p$ denote the contact point for hand part $p$. The force vector for key part $p$ can now be denoted by the vector $\overrightarrow{a_p b_p}$ and likewise $\overrightarrow{a_q b_q}$ for key part $q$.
The energy term $E_{stab}$ is then given by
\[
E_{stab} = \sum_{p, q \in P,\; p \neq q} \frac{\overrightarrow{a_p b_p} \cdot \overrightarrow{a_q b_q}}{|\overrightarrow{a_p b_p}|\,|\overrightarrow{a_q b_q}|}. \tag{5}
\]
During interaction with objects of daily use, we often assume a grasping pose where a pair or more of fingers are placed at large angles with respect to each other in order to impart stability. We can classify the 7 key hand parts into 2 groups. One includes the tips of the index, middle, ring and little fingers. The tip of the thumb, the center of the palm and the center of the forward half of the palm form the other group. Under common conditions, the contact points whose forces are separated by a large angle come from different groups, for example the tip of the thumb and the tip of the index finger when interacting with disk shaped objects [VI12].
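Each summand of Equation 5 is the cosine of the angle between two force directions, so it equals -1 for opposing forces and +1 for parallel ones. A minimal Python sketch is given below; it sums over ordered pairs as written in the equation and, as an assumption of the sketch, skips parts without a surface contact or with a zero-length force vector. The function name is illustrative.

```python
import math
from itertools import permutations
from typing import Dict, Optional, Sequence, Tuple

Point = Sequence[float]


def stability_energy(part_positions: Dict[str, Point],
                     contact_points: Dict[str, Optional[Point]]) -> float:
    """E_stab: sum of cosines between force directions over ordered pairs.

    The force direction of a part points from its position a_p to its
    contact point b_p. Parts without a contact point, or whose position
    coincides with the contact, are skipped (an assumption of this sketch).
    """
    directions: Dict[str, Tuple[float, ...]] = {}
    for part, a in part_positions.items():
        b = contact_points.get(part)
        if b is None:
            continue
        v = tuple(bi - ai for ai, bi in zip(a, b))
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            directions[part] = tuple(c / norm for c in v)
    return sum(
        sum(x * y for x, y in zip(directions[p], directions[q]))
        for p, q in permutations(directions, 2)
    )
```

For two parts pressing from opposite sides, the two ordered pairs each contribute -1, giving the lowest energy attainable for a pair.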
3.2.5. Intersection [KCGF14]
The intersection energy term helps us to avoid impossible grasps where the hand skeleton intersects the object. We assume the hand is represented as a skeleton, with linear bones $B = \{b_1, b_2, \ldots, b_K\}$ connecting the joints. For each link $b_i$, we check for $I_S(b_i)$, the intersection with the shape $S$. Intersections within a small distance of the shape and the assigned contact part are ignored. Higher penalties are applied when the bone intersects the surface orthogonally. The intersection energy is given as the sum of maximal per-link penalties:
\[
E_{isect} = \sum_{b_i \in B} \max_{q \in I_S(b_i)} \left| \text{normal}(q) \cdot \text{direction}(b_i) \right|. \tag{6}
\]
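Equation 6 can be sketched in Python as follows. The sketch assumes that the intersection test itself has already been carried out, so that for each bone a list of unit surface normals at its intersection points is available, and that bone directions are unit vectors. The function name and data layout are illustrative.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def intersection_energy(bone_directions: Dict[str, Vector],
                        hit_normals: Dict[str, List[Vector]]) -> float:
    """E_isect: sum over bones of the largest |normal . direction| penalty.

    bone_directions maps each bone to its unit direction vector.
    hit_normals maps a bone to the unit surface normals at the points
    where it intersects the shape; bones without hits contribute zero.
    """
    energy = 0.0
    for bone, direction in bone_directions.items():
        normals = hit_normals.get(bone, [])
        if not normals:
            continue
        energy += max(
            abs(sum(n * d for n, d in zip(normal, direction)))
            for normal in normals
        )
    return energy
```

A bone piercing the surface head-on (direction parallel to the normal) receives the full penalty of 1, whereas a bone grazing the surface tangentially receives a penalty close to 0.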
3.3. Inferring the Grasping Pose
During inference, the key challenge is to efficiently sample the combinatorial search space spanned by the hand pose and the contact points. Instead of jointly minimizing over this search space, we observe that it is possible to sample high-probability contact assignments $m$ and high-probability poses $\hat{\theta}$ independently, since they contribute to two separate terms, $E_{feat}(m)$ and $E_{pose}(\hat{\theta})$ respectively.
3.3.1. Sampling contact points
The contact points $m_p$ for each hand part $p \in P$ are sampled independently, by picking candidate points on the shape whose compatibility energy $E_{feat}(m, S)$ with $p$ is lower than the cost of leaving the part unassigned.
3.3.2. Sampling plausible poses
We sample plausible hand poses with low energy $E_{pose}$ by directly sampling the joint angle Gaussian distributions from the pose prior.
In our experiments we sample 5,000 poses in a fraction of a second. The space is discretized into a grid of 1 cm³ voxels. Each voxel stores a portion of the pose prior energy of the sampled pose corresponding to the key part lying in this voxel. $E_{pose}$ can be computed by summing the partial energies in the voxels containing the individual hand key parts. Joints contributing to multiple partial energies have their contributions averaged over the overlapping paths [KCGF14]. Note that the discretization introduces some approximation error in exchange for reduced complexity.
3.3.3. Pose-Contact Point Alignment
Next, for every sampled contact point $m_p$ for the corresponding part $p \in P$, 32 rotations are considered around the up axis in an attempt to align the part distribution grids with respect to the surface. The anchor and the rotation define the rigid transformation $T$, which aligns part distribution grids to the surface.
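The candidate rotations can be enumerated as in the Python sketch below. It assumes that the up axis is the z-axis and that the 32 rotations are evenly spaced over a full turn; neither detail is specified in the text, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Tuple[Tuple[float, float, float], ...]


def up_axis_rotations(n: int = 32) -> List[Matrix]:
    """Rotation matrices about the z-axis at n evenly spaced angles."""
    rotations = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        c, s = math.cos(angle), math.sin(angle)
        rotations.append(((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0)))
    return rotations


def rotate(rotation: Matrix, point: Sequence[float]) -> Tuple[float, ...]:
    """Apply a 3x3 rotation matrix to a 3-D point."""
    return tuple(sum(rotation[r][c] * point[c] for c in range(3))
                 for r in range(3))


def align(rotation: Matrix, anchor: Sequence[float],
          point: Sequence[float]) -> Tuple[float, ...]:
    """Rigid transformation T: rotate about the up axis, then move to the anchor."""
    rotated = rotate(rotation, point)
    return tuple(r + a for r, a in zip(rotated, anchor))
```

With 32 steps the angular resolution is 11.25° per candidate, so each sampled contact point yields 32 candidate alignments.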
Given the aligned grid, we estimate a lower bound on the feature and pose energy terms, as well as the corresponding pose, by greedily assigning hand parts to contact points. Each successive assignment $m_p^i$ is chosen to be the one that least increases $E_{feat} + E_{pose}$. The 3-finger symmetry term is measured with respect to the previously assigned $i - 1$ points, and the pose prior is bounded from below by the entry in the aligned voxel containing the assigned contact point.
Finally, in order to infer the best pose, we need to compute the full energy function, which requires knowledge of the exact joint angles $\hat{\theta}$. All candidate poses which were sampled previously are sorted in order of increasing estimated lower bound on energy, and for each pose $E_{dist} + E_{pose}$ is minimized. Following [KCGF14], $\hat{\theta}$ is solved for iteratively until the energy $E_{dist} + E_{pose}$ stops decreasing. Given $\hat{\theta}$, we compute $E_{stab} + E_{isect}$ and rank the predicted poses in order of increasing value of the energy function.
4. Dataset
For evaluating our proposed approach, we need a hand-object interaction dataset which contains detailed hand annotation consisting of 3-D joint locations and deformation and the 3-D object
to construct the rigid hand skeleton model and also the 3-D model of the object. Even though segmentation masks for the objects are provided, they are still not suitable for feature analysis because of the lack of precision. Most importantly, none of the mentioned datasets contain contact point annotations. Consequently, we recorded and release a new dataset to validate our proposed approach.
4.1. Grasp Taxonomy
We studied the detailed grasp taxonomy of [LFNP14], where typical human hand grasps are classified into 73 different grasp types based on different object geometries and hand shapes. We select a subset of 6 highly distinct grasp types for our experiments. For each grasp type, we annotate the contact points and joint angles for 2-4 different object categories. In total, we annotate the contact points for 111 object models spanning 12 different object categories: ‘bottle’, ‘mug’, ‘knife’, ‘sword’, ‘phone’, ‘cube’, ‘bulb’, ‘fruit’, ‘gun’, ‘pen/pencil’, ‘spoon/fork’ and ‘coin/chess pieces’. Sample annotations for each grasp type from our dataset are shown in Figure 5. Note that the grasp types are named based on the geometry of the object with which they interact. The sizes of the annotated 3-D grasps are chosen to approximate real hand sizes, and likewise for the 3-D object shapes.
Figure 5: Example of Dataset
5. Experiments
In this section, we validate our proposed approach on the aforementioned dataset. We perform experiments to select appropriate weights for each energy term and to assess the correctness of the predicted grasp. We also demonstrate the improvements in grasp prediction with the introduction of our newly proposed energy terms. Finally, we show that having a simplified hand model of 7 key parts instead of 21 leads to minimal loss in accuracy.
We ran a leave-one-out experiment for each grasp type, i.e. for each grasp type, we train on all models except one and predict the pose for the omitted model. In order to quantitatively evaluate the correctness of the poses predicted by our algorithm, we measure the
5.1. Weights of Energy Terms
First, we want to select appropriate weights for each of the energy terms. We run several experiments by varying the weight of each individual energy term while keeping the weights of the other terms fixed. We observed that the feature compatibility and intersection energy terms, $E_{feat}$ and $E_{isect}$, have the most impact on the quality of the synthesized grasp, as shown in Figure 6. Based on the experimental evaluations on our dataset, we set $w_{dist} = 1000$, $w_{feat} = 10$, $w_{isect} = 0.3$, $w_{pose} = 10$ and $w_{stab} = 500$.
Figure 6: Variation of synthesized grasp quality for different values of (a) $w_{feat}$ and (b) $w_{isect}$.
5.2. Correctness of Prediction
Figure 7: Prediction accuracy on single versus mixed classes. [Plot of % success versus error (millimeters) with curves for Tip Pinch, Trigger Press, Prismatic 4 Finger, Precision Disk, Large Diameter, Small Diameter and Mixed Classes.]
From Figure 7, we observe that when the distance threshold reaches 10 millimeters, we achieve a correctness higher than 50%
for all 6 grasp types. Except for the large-diameter grasp type, we reach more than 60% and even 80% correctness for certain grasps.
Prediction for the tip pinch grasp has the best performance while the most difficult grasp to predict is for large diameter objects.
Performance on the other four grasp types is similar. We speculate that there are two possible reasons for the variation in performance. First,
Figure 8: (a) There are many more candidate contact points on objects with larger surface areas than objects with smaller surface areas. (b) Candidate contact points on homogeneous surfaces tend to be similar with respect to one another in terms of local geometrical features.
the graspable area of large diameter objects (bottles and mugs) is comparatively larger than that of other object types, which results in many candidate contact points, as opposed to smaller objects like the coin (tip pinch) which have fewer candidate contact points.
Secondly, the majority of the candidate contact points for grasping a cylinder are on the curved side surface, where geometric features are similar. In comparison, an object like the gun has several distinct geometric features which are unique to specific hand parts.
Consequently, estimating the grasping pose for the category ‘tip pinch’ results in the best performance whereas the accuracy drops significantly (∼20% for the 0-10 mm threshold) when estimating the pose for ‘large diameter’.
Figure 8(b) shows candidate contact points for the palm center, the tip of the thumb and the tip of the index finger. For a gun, contact points for different hand parts are easily recognized and clearly located in 3 parts, whereas for a bottle, it is difficult to distinguish the hand parts, causing interference in the prediction.
We find that the correctness for the mixed class (leave-one-out over all grasp types combined) is still higher than 60% when the threshold is 10 millimeters. The mixed class performs better than the single class of large diameter but worse than the other 5 single classes, leading us to speculate that, when the interaction model is learned on the mixed classes, the individual classes interfere with and adversely affect each other.
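The success curves discussed above report, for each distance threshold, the percentage of predictions counted as correct. A minimal Python sketch of such a curve is given below. It assumes that the error of a key part is the Euclidean distance in millimeters between its predicted and annotated location; the function names and the sample errors are hypothetical and are not taken from our experiments.

```python
from typing import Dict, List, Sequence


def success_rate(errors_mm: Sequence[float], threshold_mm: float) -> float:
    """Percentage of errors that fall within the given distance threshold."""
    if not errors_mm:
        raise ValueError("at least one error value is required")
    hits = sum(1 for e in errors_mm if e <= threshold_mm)
    return 100.0 * hits / len(errors_mm)


def success_curve(errors_mm: Sequence[float],
                  thresholds_mm: Sequence[float]) -> Dict[float, float]:
    """Success rate at each threshold, as plotted in a % success curve."""
    return {t: success_rate(errors_mm, t) for t in thresholds_mm}


# Hypothetical per-part errors for one predicted grasp (millimeters).
example_errors: List[float] = [2.0, 8.0, 12.0, 30.0]
print(success_curve(example_errors, [5.0, 10.0, 25.0]))
```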
5.3. Modified Energy Terms
We compare the accuracy of the synthesized hand grasps with and without our proposed stability term and modified symmetry in the pose prior using the mixed classes. We plot the comparison in Figure 9, demonstrating that there is an average improvement of ∼5% and ∼10% in the prediction with the addition of the stability and the modified symmetry term respectively. The qualitative improvement in the synthesized grasp from having the new energy term for stability is shown with an example (Figure 10).
5.4. Simplified Kinematic Hand Model
In our prediction pipeline, the total number of key parts plays an important role in the precision of the synthesized grasps. Having more key parts leads to improved prediction but at the cost of increased computational complexity. We compare two different models: one with 7 key parts and another which considers all the joints and the finger tips as contact points, 21 in total (Figure 3). We use the leave-one-out method to test the system on our proposed dataset.
Figure 9: Improved grasp synthesis accuracy due to the addition of the new energy term for stability and the modified energy term for symmetry. [Two plots of % success versus error (millimeters): "Stability", comparing without $E_{stab}$ and with $E_{stab}$, and "Symmetry", comparing with symmetry and without symmetry.]
Figure 10: (a) Without $E_{stab}$. (b) With $E_{stab}$. The addition of the new energy term leads to more realistic grasp synthesis.
[Figure 11 plot: % success (0 to 100) vs. error in millimeters (0 to 25). Curves: 7 key parts, 21 key parts.]
Figure 11: Prediction accuracy for 7 key parts vs. 21 key parts.
As can be seen from Figure 11, for lower distance thresholds (0-10 mm) the gain in precision is minimal, ∼2-5%. On the other hand, the average grasp synthesis time for 7 key parts varies from ∼3 s for small objects such as coins or chess pieces to ∼550–600 s for objects with large surface areas and homogeneous features, such as mugs or bottles. The average estimation time rises by a factor of 25 when using the 21 key-part hand model. Thus, throughout our experiments, we report results for the 7 key-part model, as it keeps the estimation time tractable.
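To make the runtime trade-off concrete, the reported factor of 25 can be applied to the approximate 7 key-part timings quoted above. This is only back-of-the-envelope arithmetic: the text reports the factor for the average estimation time, so applying it uniformly to each object size is an assumption, and the dictionary keys are illustrative labels.

```python
SLOWDOWN = 25  # reported factor for the 21 key-part model vs. the 7 key-part model

# Approximate 7 key-part estimation times quoted in the text, in seconds.
times_7 = {
    "small object (coin, chess piece)": 3,
    "large object, lower bound": 550,
    "large object, upper bound": 600,
}

# Assumption: the average slowdown applies uniformly to every object size.
times_21 = {name: t * SLOWDOWN for name, t in times_7.items()}

for name, t in times_21.items():
    print(f"{name}: ~{t} s (~{t / 3600:.2f} h)")
```

Under this assumption, a small object would take about 75 s and a large one about 13,750 to 15,000 s (roughly 3.8 to 4.2 hours) with the 21 key-part model, which illustrates why the 7 key-part model is the tractable choice.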
bility under interaction. We evaluate our proposed approach on a newly proposed dataset with 6 grasp types containing 111 annotated object models spread over 12 object categories. Our experiments show that our approach is able to synthesize grasps where 60–80% of the hand parts are correctly placed within a distance of 10 millimeters. From the correctness and runtime analysis, we noticed that prediction accuracy depends directly on the scale of the object.
As future work, we would like to create our own large-scale dataset, complete with grasp (parameterised as joint angles) and contact point annotations, for objects of daily use. We would also like to explore the form closure and force closure properties of hand grasps in detail.
References
[AD09] AGUR A. M., DALLEY A. F.: Grant’s atlas of anatomy. Lippincott Williams & Wilkins, 2009.
[AWK15] WETZLER A., SLOSSBERG R., KIMMEL R.: Rule of thumb: Deep derotation for improved fingertip detection. In Proceedings of the British Machine Vision Conference (BMVC) (2015).
[BBD12] BULLOCK I. M., BORRÀS J., DOLLAR A. M.: Assessing assumptions in kinematic hand models: a review. In Biomedical Robotics and Biomechatronics (BioRob), 2012 4th IEEE RAS & EMBS International Conference on (2012).
[BK10] BOHG J., KRAGIC D.: Learning grasping points with shape context. Robotics and Autonomous Systems (2010).
[BMAK14] BOHG J., MORALES A., ASFOUR T., KRAGIC D.: Data-driven grasp synthesis – a survey. IEEE Transactions on Robotics (2014).
[CGA07] CIOCARLIE M., GOLDFEDER C., ALLEN P.: Dimensionality reduction for hand-independent dexterous robotic grasping. In Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on (2007).
[Cut89] CUTKOSKY M. R.: On grasp choice, grasp models, and the design of hands for manufacturing tasks. IEEE Transactions on Robotics and Automation (1989).
[DLW00] DING D., LIU Y.-H., WANG S.: Computing 3-d optimal form-closure grasps. In Robotics and Automation, 2000. Proceedings. ICRA’00. IEEE International Conference on (2000).
[ES03] ELKOURA G., SINGH K.: Handrix: animating the human hand. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation (2003).
[FRE∗13] FEIX T., ROMERO J., EK C. H., SCHMIEDMAYER H., KRAGIC D.: A Metric for Comparing the Anthropomorphic Motion Capability of Artificial Hands. Robotics, IEEE Transactions on (2013).
[HCCJ10] HSIAO K., CHITTA S., CIOCARLIE M., JONES E. G.: Contact-reactive grasping of objects with partial shape information. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on (2010).
[HEKL∗13] HUANG B., EL-KHOURY S., LI M., BRYSON J. J., BILLARD A.: Learning a real time grasping strategy. In Robotics and Automation (ICRA), 2013 IEEE International Conference on (2013).
Automation (ICRA), 2012 IEEE International Conference on (2012).
[JGT11] JIA Y.-B., GUO F., TIAN J.: On two-finger grasping of deformable planar objects. In Robotics and Automation (ICRA), 2011 IEEE International Conference on (2011).
[KCGF14] KIM V. G., CHAUDHURI S., GUIBAS L., FUNKHOUSER T.: Shape2pose: Human-centric shape analysis. ACM Transactions on Graphics (TOG) (2014).
[KEK09] KAWAGUCHI K., ENDO Y., KANAI S.: Database-driven grasp synthesis and ergonomic assessment for handheld product design. Digital Human Modeling (2009).
[LFNP14] LIU J., FENG F., NAKAMURA Y. C., POLLARD N. S.: A taxonomy of everyday grasps in action. Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on (2014).
[Lia] LIAROKAPIS M. V.: Directions, methods and metrics for mapping human to robot motion with functional anthropomorphism: A review.
[Liu00] LIU Y.-H.: Computing n-finger form-closure grasps on polygonal objects. The International Journal of Robotics Research (2000).
[Liu09] LIU C. K.: Dextrous manipulation from a grasping pose. In ACM Transactions on Graphics (TOG) (2009).
[LLD04] LIU Y.-H., LAM M.-L., DING D.: A complete and efficient algorithm for searching 3-d form-closure grasps in the discrete domain. IEEE Transactions on Robotics (2004).
[LLS15] LENZ I., LEE H., SAXENA A.: Deep learning for detecting robotic grasps. The International Journal of Robotics Research (2015).
[MCFdP04] MORALES A., CHINELLATO E., FAGG A., DEL POBIL A. P.: Using experience for assessing grasp reliability. International Journal of Humanoid Robotics (2004).
[MLSS94] MURRAY R. M., LI Z., SASTRY S. S.: A mathematical introduction to robotic manipulation. 1994.
[Ngu88] NGUYEN V.-D.: Constructing force-closure grasps. The International Journal of Robotics Research (1988).
[PT08] PRATTICHIZZO D., TRINKLE J. C.: Grasping. In Handbook of Robotics. 2008.
[Sax09] SAXENA A.: Monocular depth perception and robotic grasping of novel objects. Tech. rep., 2009.
[SDN08] SAXENA A., DRIEMEYER J., NG A. Y.: Robotic grasping of novel objects using vision. The International Journal of Robotics Research (2008).
[SEKB12] SAHBANI A., EL-KHOURY S., BIDAUD P.: An overview of 3d object grasp synthesis algorithms. Robotics and Autonomous Systems (2012).
[Shi96] SHIMOGA K. B.: Robot grasp synthesis algorithms: A survey. The International Journal of Robotics Research (1996).
[SK16] SICILIANO B., KHATIB O.: Springer handbook of robotics. 2016.
[TSLP14] TOMPSON J., STEIN M., LECUN Y., PERLIN K.: Real-time continuous pose recovery of human hands using convolutional networks. ACM Transactions on Graphics (2014).
[VI12] VENKATARAMAN S. T., IBERALL T.: Dextrous robot hands. 2012.
[XC13] XU C., CHENG L.: Efficient hand pose estimation from a single depth image. In ICCV (2013).
© 2017 The Author(s)