We are partnering with a fast-growing AI infrastructure company that is building the foundational technology behind the next generation of AI-native search and retrieval systems. The team crawls and indexes the web, trains advanced embedding and retrieval models, and operates high-performance distributed systems at massive scale.
They are looking for experienced engineers who enjoy solving deep infrastructure and internet-scale engineering problems.
The Role
As a Web Crawler Engineer, you will help design and build internet-scale crawling infrastructure capable of processing hundreds of millions of webpages efficiently and reliably.
This role sits at the intersection of distributed systems, performance engineering, browser automation, and large-scale data infrastructure.
You will work on problems such as:
- Distributed web crawling at massive scale
- Intelligent crawl scheduling and prioritisation
- JavaScript-heavy and dynamic site rendering
- Handling anti-bot detection and evasion
- Crawl politeness, rate limiting, and domain-aware orchestration
- High-throughput data pipelines and infrastructure optimisation
Responsibilities
- Build and scale distributed web crawling systems processing 100M+ pages daily
- Design highly performant infrastructure for crawl orchestration and scheduling
- Improve handling of dynamic content and JavaScript-rendered websites
- Work with browser automation tooling and Chrome DevTools Protocol (CDP)
- Optimise crawling efficiency, reliability, and resource utilisation
- Develop systems for rate limiting, crawl politeness, and domain management
- Contribute to low-level performance optimisation across the stack
Requirements
- Strong experience building scalable backend or distributed systems
- Prior experience working on web crawlers, scraping infrastructure, search infrastructure, browser automation, or adjacent systems
- Experience with high-performance languages such as Rust, C++, or Go
- Familiarity with TypeScript, Playwright, Puppeteer, or browser automation tooling
- Understanding of modern web technologies and dynamic rendering
- Strong systems thinking and a performance-optimisation mindset
- Interest in AI infrastructure, search, and knowledge retrieval systems
Nice to Have
- Experience with Chrome DevTools Protocol (CDP)
- Experience handling anti-bot systems at scale
- Experience with distributed job orchestration systems
- Exposure to search engines, indexing systems, or retrieval infrastructure
- Kubernetes / cloud infrastructure experience