Full-time [AI] AIGC Distributed Training - Optimization Engineer (Pre-training) job at Shopee - Maukerja

[AI] AIGC Distributed Training - Optimization Engineer (Pre-training)

Undisclosed

Singapore


Work Location

  • Singapore

Responsibilities

Department: Engineering and Technology
Level: Experienced (Individual Contributor)
Location: Singapore

The Engineering and Technology team is at the core of Shopee's platform development. The team is made up of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve the problems at hand; we build foundations for a long-lasting future. We don't limit ourselves on what we can or can't do; we take matters into our own hands, even if it means drilling down to the bottom layer of the computing platform. Shopee's hyper-growing business scale has transformed most "innocent" problems into huge technical challenges, and there is no better place to experience this first-hand if you love technology as much as we do.

Job Description:
About Us
Sea Group is establishing a brand-new, strategic AI department. This department is dedicated to exploring the potential of generative AI to transform human connection, self-expression, diversity of communication, and social interaction. We are building the next generation of AI-native applications and a comprehensive Model-as-a-Service (MaaS) product support system. Drawing on massive multi-country data, we are building a leading multilingual AI ecosystem from the ground up. We look forward to more outstanding talent joining us to build leading Southeast Asian multilingual models and explore innovative AI-native applications.
The AIGC team at the Sea AI Department is dedicated to pushing the boundaries of visual synthesis. We aim to achieve industry leadership in high-fidelity portrait and video generation. The team focuses on fundamental research and the scaling of generative models to empower next-generation social and e-commerce platforms.

About the Job
  • Toolchain Development: Design and build distributed training toolchains to support ultra-large-scale AIGC model training.
  • System Optimization: Optimize distributed training performance across computation, communication, and storage layers.
  • Stability & Scalability: Analyze and resolve technical bottlenecks in the training process, specifically focusing on improving training stability and efficiency.
  • Frontier Research: Track and explore cutting-edge distributed training technologies, leading project planning and production-grade implementation.
Requirements:
  • Master’s degree or above in Computer Science or a related field; a Bachelor’s degree may be considered with strong industry experience.
  • Minimum 2 years of relevant experience.
  • Distributed Expertise: Deep understanding of distributed training principles (Data/Pipeline/Tensor/Expert Parallelism) with proven hands-on experience.
  • Framework Proficiency: Expert in deep learning frameworks such as PyTorch, DeepSpeed, and Megatron-LM.
  • Low-level Knowledge: Familiar with GPU hardware architecture and CUDA programming; experience in CUDA kernel development/debugging and familiarity with NCCL and cuDNN.
  • AIGC Background: Understanding of AIGC pre-training methodologies, Transformer architectures, and Diffusion models (e.g., Stable Diffusion, Flux).
  • Core Competency: Strong problem-solving skills, innovative thinking, and excellent team collaboration/communication skills.
