System Development Manager, Cloud compute/gpu/storage server team

ID: 7352

Type: Full-time

Category: Others

Company Name: Amazon Web Services, Inc.

Location: USA, WA, Seattle; USA, CA, Cupertino - Seattle - United States

Salary: 191,300.00 - 258,800.00 USD annually (Cupertino; the Seattle range is listed below)

Job Description

We have two distinct System Development Manager positions open: one leading the storage server team and one leading the AI/ML (GPU-based) accelerator server team. Because the core responsibilities, technical depth, and leadership expectations overlap significantly across both roles, we are accepting applications through this single posting. During the interview process, we will assess fit for both positions and align candidates to the team where their experience and interests are the strongest match.

We are looking for a forward-thinking technical leader to manage a diverse, cross-functional team of Hardware Design Engineers, Systems Development Engineers, and Technical Program Managers responsible for developing storage or accelerated (AI/ML/GPU) server platforms for AWS.

This is not a role for someone who manages from a distance. You will set the technical vision and architectural direction for next-generation server platforms — making bold bets on where storage or accelerated compute infrastructure needs to go — and then build and lead the team that delivers it. Your success is measured not just by launching hardware, but by driving fast instance adoption because you built the right thing for the customer.

You will own the full lifecycle: design, build, test, deploy to the data center, launch, and fleet health beyond launch. You will lead a team of architects defining what we build next, an NPI team delivering it through build and test to the data center, and an operations-focused engineering team ensuring it runs reliably at scale long after launch. You will connect these functions into a single, cohesive organization that moves fast and delivers high-quality server platforms that customers want to adopt.

You will work across organizational boundaries, partnering with other AWS service teams to deeply understand customer workloads and translate that understanding into hardware architecture decisions. You will lead relationships with ODMs and design partners to develop and manufacture your products at scale. When complex technical problems arise across hardware, firmware, software, thermal, power, or signal integrity, you will have the technical depth to engage meaningfully and the judgment to drive the right trade-offs.


Key job responsibilities
Vision & Architecture
- Set the technical vision and multi-generational roadmap for storage or accelerator (AI/ML/GPU-based) server platforms
- Make architectural bets that differentiate AWS — anticipating customer needs and industry shifts before they become obvious
- Manage a team of hardware architects in defining server platform architectures that optimize for performance, reliability, cost, and speed of customer adoption
- Translate deep understanding of customer workloads (storage, AI/ML training, inference) into hardware design decisions
- Influence the broader AWS hardware strategy through data, conviction, and results

Design, Build & Test
- Own server platform development from architecture through detailed design, prototype, build, and qualification
- Manage a team of engineers responsible for design, build and launch of systems
- Lead ODM/JDM and design partner relationships, ensuring our requirements for performance, quality, testability, and diagnostics are met
- Drive design verification, system validation, and qualification — ensuring platforms meet reliability, performance, and cost targets before deployment
- Ensure systems are designed for operational excellence from day one — testability, diagnosability, and serviceability are built in, not bolted on

Deploy, Launch & Fleet Health
- Own deployment to the data center, launch readiness, and successful ramp into production
- Drive qualification and readiness milestones, removing technical and organizational blockers to get servers into the fleet
- Own fleet health beyond launch — your responsibility never ends. Monitor quality, reliability, and customer experience for the life of the platform
- Drive toward zero-touch operations — building automation infrastructure that detects, diagnoses, and remediates faults before customer impact
- Build predictive failure detection capabilities using telemetry, error trending, and log correlation
- Establish and track fleet health metrics (failure rates, MTTD, MTTR, first-time fix rate, predictive accuracy)
- Close the loop between field failures and design improvements in next-generation platforms

Team Leadership & Development
- Manage and grow a diverse team spanning hardware engineering, systems development, and technical program management
- Hire, develop, and retain top talent across multiple engineering disciplines
- Create an environment where engineers with fundamentally different expertise (hardware, firmware, software, program management) collaborate effectively and challenge each other
- Set clear goals, remove obstacles, and hold the team to high standards on delivery and quality
- Coach and develop senior technical leaders — help architects think bigger and help execution-focused engineers see the strategic picture

Cross-Organization Collaboration
- Partner with AWS service teams to ensure server platforms meet data path and control path requirements and drive fast adoption
- Work with supply chain, manufacturing, and datacenter operations teams to deliver at scale
- Influence peer teams and senior leadership on technical direction, investment priorities, and trade-offs
- Represent your team's work and roadmap to VP-level and above


About the team
This organization is responsible for designing, building, testing, launching and maintaining a fleet of AI/ML (GPU-based) servers and storage servers for Amazon's web services. Our engineers work with leading-edge technologies, solve challenging problems, influence the industry's roadmaps, and develop unique solutions that are ahead of the pack. We work in an environment that fosters innovation and creativity — we encourage and invest in new directions and ideas that serve our customers better.

The organization comprises Hardware Design Engineers, Systems Development Engineers, and Technical Program Managers, all with the common goal of delivering the best storage and accelerator server fleet possible to our customers. We are located in Seattle and Cupertino, and we work with ODMs and Design Partners globally.

We own the full lifecycle of our server platforms: design, build, test, deploy to the data center, launch, and fleet health beyond launch. There is no hand-off — we are accountable from first architecture decision through every day the server runs in production.

Basic Qualifications

- 7+ years of relevant hands-on systems engineering and administration experience in networking, storage systems, or operating systems
- Bachelor's degree in electrical engineering, mechanical engineering or a relevant engineering discipline
- 3+ years of direct management experience
- Experience in server development (e.g., compute, AI/ML, storage, or edge servers)
- Hands-on experience designing, developing, and operationally supporting high-volume enterprise servers

Preferred Qualifications

- Knowledge of data center infrastructure design, operations, or delivery
- Experience leading new product introduction (NPI) teams
- Experience writing business strategy documents and plans
- Experience engaging and influencing senior executives
- Strong analytical skills, attention to detail, and effective communication abilities, or experience working with customers and a passion for delivering exceptional service
- 5+ years of direct management experience
- Experience working with CM/OEM/ODM vendors on design, development, and manufacturing

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.



USA, CA, Cupertino - 191,300.00 - 258,800.00 USD annually
USA, WA, Seattle - 166,300.00 - 225,000.00 USD annually

Company Information

Company Name: Amazon Web Services, Inc.

Company Website: https://aws.amazon.com

Company Address: 410 Terry Ave N, Seattle, WA 98109-5210, United States

Amazon Web Services, Inc. (AWS) is the cloud computing and infrastructure arm of Amazon.com, Inc., offering a broad and evolving portfolio of on-demand cloud services, platform services, and infrastructure products for organizations of all sizes. Founded to provide scalable, reliable, and cost-effective computing resources over the internet, AWS enables customers to deploy and run applications and services without the need to build and maintain physical datacenters. The company’s public materials describe it as a provider of on-demand cloud computing platforms and APIs to individuals, companies, and governments, supplying infrastructure and higher-level services that accelerate application development, data processing, storage, and global delivery.

Core business activities: AWS’s primary business is the design, operation, and delivery of cloud-based computing resources and managed services. That includes offering virtualized compute capacity, object and block storage, database engines (managed relational and NoSQL), networking primitives, identity and access management, security and compliance tooling, analytics and big-data processing stacks, machine learning and AI services, developer and application deployment tools, serverless computing, container orchestration services, content delivery, and Internet of Things (IoT) connectivity. AWS also provides enterprise-focused offerings such as hybrid cloud solutions, migration services to assist organizations in moving on-premises workloads to the cloud, managed operations and support plans, professional services, and training and certification programs for IT professionals.

Main products and services: AWS’s product set spans foundational infrastructure to highly managed, domain-specific offerings. Key foundational services include Amazon Elastic Compute Cloud (EC2) for virtual servers, Amazon Simple Storage Service (S3) for scalable object storage, and Amazon Virtual Private Cloud (VPC) for isolated networking. Managed database and data services include Amazon Relational Database Service (RDS), Amazon Aurora (a high-performance relational database), Amazon DynamoDB (a fully managed NoSQL database), Amazon Redshift (a petabyte-scale data warehouse), and Amazon ElastiCache (in-memory caching). For compute modernization, AWS provides AWS Lambda (serverless compute), Amazon Elastic Kubernetes Service (EKS), and Amazon Elastic Container Service (ECS). AWS’s advanced and specialized services include Amazon SageMaker for building, training, and deploying machine learning models; Amazon Rekognition for computer vision; Amazon Comprehend for natural language processing; AWS Glue and AWS Data Pipeline for ETL and data integration; and AWS IoT Core for connecting and managing Internet of Things devices. Application delivery and developer tooling include Amazon API Gateway, AWS CloudFormation for infrastructure as code, AWS CodePipeline and CodeBuild for CI/CD, and Amazon CloudFront for content delivery across a global edge network. The AWS Marketplace and AWS Partner Network (APN) provide channels for third-party software, consulting partners, and managed service providers to offer products and services that run on or integrate with AWS.

Infrastructure, delivery model, and pricing: AWS operates a global infrastructure composed of multiple geographic Regions, each containing multiple Availability Zones, which are physically separate data center locations engineered for fault isolation and high availability. This global footprint supports data residency, low-latency delivery, and resilience for customers deploying distributed systems. AWS’s commercial model emphasizes flexible consumption and cost control: customers commonly choose pay-as-you-go billing for on-demand resources, with options for reserved capacity, savings plans, and spot instances to reduce costs for predictable or interruptible workloads. Support tiers and managed services are available at varying cost and service-level commitments.

Security, compliance, and governance: AWS provides a suite of security and identity services (such as AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS CloudTrail, and AWS Config) to help customers secure environments, manage access, encrypt data, and demonstrate compliance. AWS documents participation in standard industry compliance frameworks and certifications, and publishes detailed security and compliance resources to help customers meet regulatory obligations.

Customer segments and use cases: AWS serves a broad range of customers that include startups, established enterprises, public sector organizations, educational institutions, and independent software vendors. Common use cases include web and mobile application hosting, data analytics and warehousing, machine learning and AI workloads, backup and disaster recovery, IoT deployments, gaming infrastructure, and enterprise application modernization. AWS emphasizes scalability, elasticity, and rapid provisioning to support development velocity and business agility.

Ecosystem, training, and partner network: AWS supports a large ecosystem of technology and consulting partners that build, certify, and deliver solutions on the platform. The company offers official training, certifications, and documentation to help developers, architects, and IT professionals gain proficiency on its services. AWS Marketplace and partner programs provide channels for third-party software procurement and professional services.

Business model and positioning: AWS generates revenue principally through consumption-based fees for cloud services and through related professional services and support offerings. It competes in the global cloud infrastructure market with other major cloud providers by focusing on breadth of services, global infrastructure, developer tooling, partner ecosystem, and continuous release of new managed services. AWS positions itself as an enabler for digital transformation by reducing the capital and operational burden of running infrastructure, allowing customers to focus on application development and business innovation. For additional, up-to-date, and authoritative information on product details, global infrastructure, security programs, and service announcements, AWS’s official website and documentation pages provide comprehensive resources and customer-facing materials.