The Hidden Complexity of Multiturn Training

I recently spent two weeks refactoring multiturn tokenization and masking for VeRL. While VeRL already had a functional implementation, what initially seemed like a straightforward refactor turned out to be surprisingly nuanced.