Papers · 3 months ago

DR-Venus-4B: Edge-scale deep research agent with 10K open data outperforms 9B models

DR-Venus-4B is a 4B-parameter deep research agent trained entirely on roughly 10K open-data samples. It surpasses prior agentic models under 9B parameters on multiple benchmarks and narrows the gap to 30B-class systems. Its two-stage recipe starts with agentic SFT, which applies strict data cleaning and long-horizon trajectory resampling, and follows with agentic RL that uses turn-level rewards based on information gain and format-aware regularization. Models, code, and recipes are released for reproducible research.
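
To make the turn-level reward idea concrete, here is a minimal Python sketch. It is not the released DR-Venus implementation. It assumes, purely for illustration, that information gain can be approximated as the fraction of target evidence items a turn newly uncovers, and that format-aware regularization can be approximated as a fixed penalty on malformed turns. The names `Turn`, `turn_rewards`, and `format_penalty` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    """One agent step (hypothetical structure, not taken from the paper)."""
    evidence: set[str]   # IDs of facts surfaced by this turn's tool call
    well_formed: bool    # whether the turn's output followed the expected format


def turn_rewards(
    turns: list[Turn],
    target: set[str],
    format_penalty: float = 0.5,  # assumed constant; the real weighting is unknown
) -> list[float]:
    """Assign one reward per turn instead of a single end-of-episode reward."""
    seen: set[str] = set()
    rewards: list[float] = []
    for turn in turns:
        # Information-gain proxy: target evidence seen for the first time this turn.
        new = (turn.evidence & target) - seen
        gain = len(new) / len(target) if target else 0.0
        seen |= new
        # Format-regularization proxy: subtract a penalty for malformed turns.
        penalty = 0.0 if turn.well_formed else format_penalty
        rewards.append(gain - penalty)
    return rewards


if __name__ == "__main__":
    trajectory = [
        Turn(evidence={"a", "x"}, well_formed=True),
        Turn(evidence={"a", "b"}, well_formed=False),
        Turn(evidence=set(), well_formed=True),
    ]
    print(turn_rewards(trajectory, target={"a", "b"}))
```

In this sketch, a turn that only repeats already-seen evidence earns no gain. That is one way a per-turn signal could discourage redundant searches in long trajectories.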

#deep-research
#small-language-models
#reinforcement-learning
#agentic-sft
#edge-deployment

inclusionAI
