Human pose estimation is in increasing demand across diverse applications, from avatar generation to human-robot interaction. However, the domains of these applications often diverge from standard human pose estimation datasets, leading to limited domain transfer. In multi-dataset training (MDT) in particular, skeleton types vary across datasets and comprehensive supervision across them is limited. We propose a novel MDT framework, called PoseBH, that integrates poses beyond humans. Our method addresses keypoint heterogeneity and limited supervision through two primary techniques. First, we introduce nonparametric keypoint prototypes that are learned in a unified embedding space, enabling seamless integration across arbitrary skeleton types and facilitating robust domain transfer. Second, we introduce a cross-modal self-supervision mechanism that aligns keypoint predictions with keypoint embedding prototypes, thus enhancing supervision without reliance on teacher-student models or additional augmentations. PoseBH demonstrates significant generalization improvements on whole-body and animal pose datasets (COCO-WholeBody, AP-10K, APT-36K), while maintaining performance on standard human pose benchmarks (COCO, MPII, AIC). Our learned keypoint embeddings also transfer well to hand shape (InterHand2.6M) and human shape (3DPW) domains.
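The core idea of matching per-keypoint embeddings to a shared prototype bank can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the dimensions, the cosine-similarity matching, and the simple alignment loss are all assumptions standing in for the actual prototype learning and cross-modal self-supervision terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): D-dim embeddings,
# P prototypes covering the union of all skeleton types,
# K keypoints predicted for one sample.
D, P, K = 64, 10, 17

prototypes = rng.normal(size=(P, D))    # nonparametric prototype bank
keypoint_emb = rng.normal(size=(K, D))  # per-keypoint embeddings from the model

def l2norm(x, axis=-1):
    """Normalize rows to unit length so dot products become cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Match each keypoint embedding to its nearest prototype by cosine similarity.
sim = l2norm(keypoint_emb) @ l2norm(prototypes).T  # (K, P) similarity matrix
assign = sim.argmax(axis=1)                        # prototype index per keypoint

# A toy alignment objective: pull each embedding toward its assigned prototype,
# a stand-in for aligning predictions with embedding prototypes.
align_loss = np.mean(
    np.sum((l2norm(keypoint_emb) - l2norm(prototypes)[assign]) ** 2, axis=1)
)
```

Because the prototypes live in one shared embedding space, keypoints from any skeleton type (human body, whole body, animal) can be matched against the same bank, which is what makes this kind of matching skeleton-agnostic.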