Introducing Direct Preference Optimization (DPO) Support on OpenPipe

RLHF beta

Unlock reinforcement learning from human feedback (RLHF) for enterprise.