Software Engineer, Distributed Systems

fal · May 14, 2026

🌍 On-site · Turkey · Full-time

💰 $120,000 – $180,000/yr (market estimate; not provided by the employer)

Distributed Systems · Python · Rust · GPU Orchestration · System Design · AI Infrastructure · Performance Optimization

Job Description

About This Role

fal is seeking an experienced Software Engineer specializing in Distributed Systems to join our platform engineering team. You will be responsible for designing, building, and scaling our core infrastructure that powers large-scale AI workload orchestration and compute distribution. This is a high-impact position for a senior engineer who thrives on solving complex technical challenges in production environments handling massive traffic and data volumes.

As a Distributed Systems Engineer at fal, you are an experienced software engineer who excels at building large-scale computing platforms. You have deep expertise in distributed systems architecture that manages high complexity and handles significant traffic and data at scale, and you understand how to achieve reliability and operational efficiency with minimal overhead. Your background demonstrates a proven ability to design systems that maintain performance under extreme load while remaining maintainable and observable.

Key Responsibilities

Build and maintain the core Python/Rust platform infrastructure, including request routing, AI workload orchestration, intelligent scheduling algorithms, GPU autoscaling systems, large-scale distributed file storage, and enterprise-grade queueing systems
Produce forward-looking architectural designs for platform evolution as fal scales to handle 100x current traffic while maintaining low latency globally across all regions
Leverage artificial intelligence extensively to automate and optimize the complex aspects of building reliable distributed systems, reducing manual operational burden
Profile, analyze, and optimize low-level CPU and memory performance across the entire platform stack
Collaborate with cross-functional teams to define technical standards and drive architectural decisions that support long-term scalability
Contribute to system observability and monitoring infrastructure to ensure visibility into platform health and performance

Required Qualifications

5+ years of professional experience building and maintaining distributed compute and orchestration platforms using Python, Rust, or similar systems programming languages
Strong foundational understanding of distributed systems theory including consensus algorithms, scheduling theory, fault tolerance mechanisms, and capacity planning strategies
Deep knowledge of computational complexity analysis and memory allocation patterns in large-scale systems
Demonstrated track record of designing and shipping systems that perform reliably under real production workloads at significant scale
Hands-on experience building comprehensive observability solutions (logging, metrics, tracing) and using observability data to drive performance and reliability improvements
Excellent written and verbal communication skills, with the ability to influence technical decisions across multiple teams and stakeholder groups
Self-starter mentality, with the ability to execute quickly, take full ownership of projects, and continuously seek process and technical improvements

Nice-to-Have Qualifications

Production experience with AI/ML inference platforms, training infrastructure, or GPU workload management systems
Background in high-performance systems programming including async runtimes, zero-copy patterns, and memory-safe concurrent programming models
Experience designing and operating multi-tenant compute platforms with strong isolation and resource management
Deep understanding of networking fundamentals, TCP/IP performance characteristics, and load balancing strategies
Familiarity with GPU workload characteristics, scheduling constraints, and GPU cluster management systems

What fal Offers

Intellectually challenging and meaningful work on cutting-edge distributed systems infrastructure
Extensive learning and professional growth opportunities through exposure to complex technical problems and modern systems architecture
Regular team events and company offsites fostering collaboration and company culture
Opportunity to work with a talented team building the future of AI compute infrastructure

Location & Work Arrangement

This position is based in Turkey. For specific location requirements and visa sponsorship details, contact fal directly through its careers page.

💰 Compensation not publicly listed. Market estimate for similar roles: from $120K, varying by experience and location.