Capacity TPM

Cerebras Systems about 21 hours ago

Toronto, Ontario, Canada

Senior Level

Full-Time

About the role

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services.

This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.

About The Role

Cerebras serves billions of inference tokens per day to customers like Cognition, AlphaSense, Mistral, IFM, Block, and others, running on the world's largest AI accelerators. Capacity is the heartbeat of this business: every model deployment, every customer commit, every SLO breach lands on a finite set of wafers, GPUs, and datacenter racks. The Capacity TPM owns end-to-end capacity planning, allocation, and reporting for the Inference Service org.

What You'll Own

Capacity planning and forecasting. Build and maintain the 6 / 12 / 26-week rolling capacity model across every cluster. Forecast model replicas, system-hours, GPU racks, and headroom by customer and by model. Reconcile against actuals weekly.

Allocation and cluster placement. Partner closely with the SRE team to run the weekly capacity review with the Engineering and Product teams. Decide model placement and re-balancing: which customer tenants land where, which clusters absorb new launches, which freezes are in effect, etc.

Customer commits and SLA tracking. Translate customer contracts and Sales pipeline asks into capacity requirements. Maintain the source-of-truth doc.

Capacity observability and reporting. Own the Grafana / Confluence dashboards that report fleet utilization, restart rates, available headroom, and capacity burn by customer. Run the weekly capacity report and the monthly capacity readout for Inference Service leadership.

Drive critical programs. Capacity is on the critical path for the 5 MW SWE-1.7 production launch (M4, 30 Sep 2026).
Own the capacity workstreams that gate M1 through M4: the cluster-tetris work to free 5 MW in the prod DC, the GPU rack acquisition timeline, and the multi-replica readiness validation.

What Success Looks Like

6 months: A single source of truth for capacity exists; the weekly capacity review runs without drama; no customer SLA breach is attributed to capacity misallocation.

12 months: Forecast accuracy is within 10% at the 4-week horizon; hardware orders are placed at least 8 weeks before the demand they serve; the 99.9% SLA is met with a capacity buffer of 15% or more.

Responsibilities

Run the Monday capacity review (Inference Platform, Cluster Mgmt, Customer PMs, Hardware Procurement, Datacenter Ops).
Update the capacity model after major events: new customer commits, hardware deliveries, postmortems, model launches.
Translate sales asks into "yes by [date], no, or yes if we drop X" within 2 business days.
Maintain Jira EPICs and Confluence pages that the broader org uses to plan against.
Drive continuous improvement and stakeholder adoption of the new capacity management platform.
Drive new datacenter bringups, cluster upgrades, and other related tasks in close partnership with the deployment and AIOps teams.

Skills & Qualifications

Must have 5+ years of TPM, technical program management, or product operations experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning.
Comfort with the inference serving stack: model replicas, batching, prefill/decode, KV cache, GPU and accelerator scheduling.
Strong data fluency: SQL, Grafana, basic Python or Flux to pull your own numbers without waiting for an analyst.
Track record of running a recurring cross-functional ritual involving senior engineers and the leadership team.
Direct experience with AI accelerator fleet operations such as Habana, TPU pods, Inferentia, Trainium.
Familiarity with Kubernetes, HAProxy, InfluxDB, Loki, and the FastAPI-based control plane on AWS EKS.
Hardware supply chain familiarity (NVIDIA NVL72, DGX delivery cycles, datacenter colo logistics).

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open source their cutting-edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
Our simple, non-corporate work culture that respects individual beliefs.

Find out more about what it's like to work at Cerebras here! Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

About Cerebras Systems

Computer Hardware Manufacturing
