Discover How AI Coding Agents Handle Jujutsu!
At TabbyML, we're excited to dig into Jujutsu (jj), a Git-compatible version control system. Since many on our team are already exploring it, we decided to start researching how well AI coding agents perform in this new environment.
Key Highlights:
- Innovative Evaluation: We built a semi-automated, AI-assisted pipeline to create a dataset of 63 evaluation tasks, each with its own instructions and tests.
- Top Performer: The standout was Claude 4.6 Sonnet, which achieved a 92% success rate and handled Jujutsu's novel CLI rules well.
- Speed vs. Accuracy: GPT-5.4 was notably fast and reached 81%, while Gemini-3.1-pro excelled in accuracy, showing a clear trade-off between speed and accuracy.
Join the conversation! If you have Jujutsu edge cases you'd like to see AI agents tested on, consider contributing them to our evolving dataset. Share your thoughts and let's shape the AI landscape together! 💡🤖 #ArtificialIntelligence #TechInnovation #Jujutsu