Senior Member of Technical Staff, Multimodal AI
💰 $80,000 – $130,000/yr
Job Description
About Cohere
Cohere's mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises building AI systems that power content generation, semantic search, RAG, and agents. Our diverse team of researchers, engineers, and designers is committed to advancing AI capabilities and driving widespread adoption of these transformative technologies.
Why This Role?
We believe in the power of multimodal AI to revolutionize how we interact with technology. Our engineering teams push the boundaries of what's possible, and we're seeking talented individuals to join us on this exciting journey. With an exceptional ratio of compute resources to engineers, you'll have an ideal environment to explore, innovate, and shape the future of AI.
In July 2025, Cohere's Multimodal team introduced Command A Vision, our new flagship vision-language model, which outperforms major models such as Llama 4 Maverick, Mistral Medium, Pixtral Large, and GPT-4.1 on standard vision benchmarks. It achieves an average score of 83.1% across these benchmarks (including 73.5% on MathVista and 90.9% on ChartQA) and, despite its 112B parameters, runs efficiently on just two GPUs. The open weights are available on Hugging Face.
Key Responsibilities
- Design and develop cutting-edge multimodal AI systems, integrating various modalities including text, speech, and vision
- Conduct research and experiments on advanced compute infrastructure, exploring novel ideas in multimodal representation learning and transfer learning
- Collaborate closely with world-class teams, learning from and contributing to their expertise in cutting-edge AI research
- Help expand model capabilities and deliver value to customers
Ideal Candidate Profile
You possess exceptional software engineering skills with a proven track record of building robust, scalable systems. You have:
- Strong command of Python and deep experience with popular deep learning frameworks such as JAX, PyTorch, and TensorFlow, including an understanding of their multimodal capabilities
- Knowledge of distributed training strategies and experience optimizing large-scale models
- Strong fundamentals in machine learning, deep learning, and computer vision or NLP
- Ability to work hard, move fast, and prioritize what matters most for customers
- Passion for advancing the frontiers of AI and contributing to a mission-driven team
We value diverse perspectives and believe that the best innovation comes from teams with varied backgrounds and experiences. If you're excited about building the next generation of multimodal AI systems and want to work alongside some of the best researchers and engineers in the field, we'd love to hear from you.
💰 Compensation not publicly listed. Market estimate for similar roles: from $80K, varying by experience and location.