Member of Technical Staff (Inference)

hcompany · April 10, 2026
🌍 Remote · Worldwide · Full-time

💰 $80,000 – $130,000/yr

Job Description

About H

H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. H is hiring the world's best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.

About the Team

The Inference team develops and enhances the inference stack for serving H-models that power our agent technology. The team focuses on optimizing hardware utilization to reach high throughput, low latency, and cost efficiency in order to deliver a seamless user experience. As a Member of Technical Staff, you will work alongside world-class engineers and researchers to build the infrastructure that powers next-generation AI agents.

Key Responsibilities

  • Develop scalable, low-latency, and cost-effective inference pipelines
  • Optimize model performance, including memory usage, throughput, and latency, using advanced techniques such as distributed computing, model compression, quantization, and caching mechanisms
  • Develop specialized GPU kernels for performance-critical tasks such as attention mechanisms and matrix multiplications
  • Collaborate with H research teams on model architectures to enhance efficiency during inference
  • Review state-of-the-art papers and research (Flash Attention, Paged Attention, Continuous Batching, etc.) to improve memory usage, throughput, and latency
  • Prioritize cutting-edge inference techniques and integrate them into production systems

Required Qualifications

Technical Skills:

  • MS or PhD in Computer Science, Machine Learning, or a related field
  • Proficient in at least one of the following programming languages: Python, Rust, or C/C++
  • Hands-on experience in GPU programming with CUDA, OpenAI Triton, or Metal
  • Demonstrated experience with model compression and quantization techniques

Soft Skills:

  • Collaborative mindset, thriving in dynamic, multidisciplinary teams
  • Strong communication and presentation skills
  • Eager to explore and tackle new technical challenges

Bonus Qualifications

  • Experience with LLM serving frameworks such as vLLM, TensorRT-LLM, SGLang, or llama.cpp
  • Experience with CUDA kernel programming and NCCL
  • Experience with deep learning inference frameworks (PyTorch/ExecuTorch, ONNX Runtime, GGML, etc.)

Location: Remote, Worldwide

💰 Compensation not publicly listed. Market estimate for similar roles: $80K to $130K USD annually, varying by experience and location.