Data Analogies Enable Efficient Cross-Embodiment Learning

Jonathan Yang, Chelsea Finn, Dorsa Sadigh
Stanford University

How should we collect data so that demonstrations from one robot directly help another? We study coverage and pairing strategies across embodiments and show that structured, trajectory-paired datasets unlock substantial transfer for generalist robot policies.

Overview

Generalist robot policies are now trained on increasingly large, heterogeneous cross-embodiment datasets. While scale clearly helps, it remains unclear what is actually being transferred when we mix data from many robots, morphologies, and viewpoints.

This work asks: what kinds of cross-embodiment data actually help a policy adapt to a new robot under a fixed budget of target demonstrations?

We introduce the notion of data analogies: structured correspondences that let demonstrations from one embodiment efficiently support another. We then systematically study how data collection along three axes (viewpoint, morphology, and appearance) affects this capability.

Data-Centric Method

We view data for cross-embodiment learning through two orthogonal design axes: coverage and pairing.

Coverage

  • Targeted: Select demonstrations that explicitly fill gaps relative to the target robot (e.g., missing camera poses, gripper types, or kinematic regimes).
  • Diverse: Collect broadly varied demonstrations without target-aware selection, emphasizing visual and embodiment breadth.
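As a rough illustration of the targeted strategy, the sketch below greedily selects source demonstrations that sit near the target robot's coverage gaps (here summarized by a hypothetical 2-D camera-pose feature) while avoiding redundancy with demos already chosen. The feature choice, weighting, and all names are illustrative assumptions, not the paper's actual selection procedure.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def targeted_selection(source_demos, target_demos, budget):
    """Greedily pick source demos that fill gaps relative to the target.

    Each demo is summarized by a feature vector (here a hypothetical
    camera-pose feature) -- a simplification of the coverage axes
    (viewpoint, morphology, appearance) discussed above.
    """
    target_feats = [d["camera_pose"] for d in target_demos]
    selected = []
    pool = list(source_demos)
    for _ in range(min(budget, len(pool))):
        def score(d):
            # Closeness to the target robot's feature region ...
            to_target = min(dist(d["camera_pose"], t) for t in target_feats)
            # ... while rewarding distance from demos already selected,
            # so the picked set spreads out and fills coverage gaps.
            to_selected = min(
                (dist(d["camera_pose"], s["camera_pose"]) for s in selected),
                default=float("inf"),
            )
            return to_target - 0.5 * min(to_selected, 1.0)

        best = min(pool, key=score)
        pool.remove(best)
        selected.append(best)
    return selected
```

By contrast, the diverse strategy would skip the target-aware `to_target` term entirely and simply maximize spread within the source pool.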

Pairing

  • Unpaired: Independent demonstrations linked only by task labels.
  • Task-Paired: Same task instance across robots (same objects and goals), weakly aligned.
  • Trajectory-Paired: Time-aligned executions with similar object-centric trajectories, aligned via dynamic time warping (DTW) in both simulation and the real world.
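Trajectory pairing hinges on time alignment via DTW. As a minimal, self-contained sketch (not the paper's implementation), the function below computes a DTW alignment between two object-centric trajectories represented as lists of points, returning the alignment cost and the warping path:

```python
import math


def dtw_align(traj_a, traj_b):
    """Dynamic time warping between two trajectories (lists of points).

    Returns the total alignment cost and the warping path as (i, j)
    index pairs. A minimal sketch of the alignment step behind
    trajectory pairing; names and interfaces are illustrative.
    """
    n, m = len(traj_a), len(traj_b)
    INF = float("inf")
    # dp[i][j] = min cost of aligning traj_a[:i] with traj_b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])
            dp[i][j] = d + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    # Backtrack to recover the time alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((dp[i - 1][j - 1], i - 1, j - 1),
                      (dp[i - 1][j], i - 1, j),
                      (dp[i][j - 1], i, j - 1))
    return dp[n][m], path[::-1]
```

Two executions of the same motion at different speeds align with zero cost, which is what makes DTW a natural fit for pairing demonstrations across robots that move at different rates.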

Coverage vs. Pairing. Simulation scenes illustrating how diversity and cross-robot alignment shape transfer.

Experiments

Core Questions

  1. Under a fixed budget, which data collection strategy works best for each axis?
  2. How does our compositional dataset compare to naive training on large open-source data?
  3. How does performance scale as we increase the diversity of source data?
  4. Do these trends hold on real robots?

We evaluate Pi0.5-style vision-language-action (VLA) policies on RoboCasa-based simulation benchmarks and on tabletop tasks with Franka, WidowX, PiperX, and related platforms. All evaluations use a strict few-shot budget of demonstrations on the target robot.

Results

Coverage vs. Pairing

Diversity helps most for perceptual shifts (viewpoint and appearance), where broad variation regularizes the encoder and reduces overfitting to specific scenes or cameras. For morphology, however, diversity alone quickly saturates: targeted coverage and strong trajectory pairing are essential to bridge action-space differences.

Pairing is Important


Improvements over Large, Open-Source Datasets

Training on large unpaired datasets such as OXE provides strong baselines, but composing them with structured, paired data delivers large additional gains in both simulation and real-world evaluations.


Figure 2. Composed OXE+Translational data outperforms both narrow two-robot pools and large unpaired OXE training.


Figure 3. Scaling behavior as we increase the diversity of source embodiments, viewpoints, and scenes.

Evaluations

Example policy rollouts across different embodiments and viewpoints.

Real-Robot Transfer

We validate our findings on real robot platforms, including Franka, WidowX, and PiperX, in kitchen-like scenes. Structured coverage and trajectory pairing reliably improve success rates by 25-40 percentage points over baselines.


Simulation robots. Diverse simulated embodiments and viewpoints.


Real robots. Physical setups with Franka, WidowX, PiperX.


Figure 4. Real-world transfer results across multiple robot pairs and tasks.

Paper

This website summarizes the main ideas, methods, and results of our paper, Data Analogies Enable Efficient Cross-Embodiment Learning. Please refer to the full paper PDF for complete details, ablations, and experimental tables.

BibTeX

@misc{yang2026dataanalogiesenableefficient,
      title={Data Analogies Enable Efficient Cross-Embodiment Learning},
      author={Jonathan Yang and Chelsea Finn and Dorsa Sadigh},
      year={2026},
      eprint={2603.06450},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.06450},
}