Site Reliability Engineer

Sorry, this advert is now closed.
Things move quickly and we may know a team that's just about to hire your particular skills.
You can register now to hear about upcoming vacancies and, if we can offer you a headstart on an unadvertised position meanwhile, we'll be in touch.

We're hiring customer-focused SREs and Systems Engineers/Developers to apply infrastructure support and site reliability engineering approaches to significant projects embracing emerging ML compute technology.

As a platform vendor and MLaaS provider, we offer you the opportunity to work across all sectors - including research organisations, universities, technology vendors and enterprises - encountering a diversity of ecosystems and best practices.

This team helps customers extend their data center and cloud provisioning ecosystems to incorporate our ML compute products, helps define and build pipelines for migration, refines production operations, and provides site reliability engineering expertise, automation and technical support throughout.

It's a mix of greenfield work, solution and product evolution, and technical collaboration. Becoming a hands-on subject matter expert, you'll empower others to develop new capabilities and accomplish things that were not previously possible, embracing emerging advances in machine intelligence.

Of particular interest are your skills applied to any of the following domains: site reliability engineering at scale; grid or cloud computing; HPC/scientific computing; OpenStack admin or development; data center orchestration; SDN/NFV; and/or developing Linux-based systems for novel IP-based protocols - or similar.

We're hiring an all-new team, including a lead engineer, and we'll be pleased to explore the possibilities with you.

A flavour of work within this team

Interfacing between customers, industry partners and our domain experts
Site reliability engineering for our MLaaS cloud platform
Defining and building effective infrastructure provisioning solutions
Designing and implementing compute workload migration pathways
Guiding on adapting and optimising software for new processors and systems
Designing, building and refining production pipelines and tooling
Optionally, contributing to aspects of our SDK product and virtual-IPU tools in Python and/or C++

We're looking for

Someone customer-focused and solution-oriented
A solid understanding of Computing, Maths or Engineering - accrued through formal education or equivalent applied practice
Linux configuration and management with shell scripting, Python or similar
Optionally, strong Python and/or C++ applied to Linux systems, infrastructure, or back-end development
Experience of configuring and managing hardware platforms, and infrastructure for clusters
Knowledge of Ethernet and IP networking standards
Production admin skills with two or more of: Kubernetes, Docker, Grid Engine, Slurm, OpenStack, public/private cloud, etc.
Comfortable debugging across multi-layer solutions
Familiarity with modern CI/CD and orchestration methods
An aptitude for troubleshooting and a pragmatic application of engineering rigour: from the basic symptoms through to analysis and resolution with code fixes, workarounds, improved documentation, tutorials, and collaboration with other teams

You may also bring - or may optionally like to gain - skills around

Running novel protocols on IP fabrics
HPC or hardware acceleration technologies
Data center infrastructure, storage, network, security, virtualisation
Compilers and Linux kernel driver development, debugging and system configuration
Linux operating systems and memory management

Salary and benefits

Compelling salary - talk with us about what you need
Stock options in a start-up with high growth potential
Flexible and inclusive working environment - UK hours, work at the times that suit you
Discretionary relocation assistance
Optional four-day week or part-time working
Flexible amount of holiday + UK national/public holidays
10% CPD time in your calendar, with supporting budget - in addition to the L&D of your role
Matched personal pension | healthcare | life assurance | dental | health cash plan | income protection

About us

Our team is at the forefront of the artificial intelligence revolution, enabling innovators from research and all sectors to expand human potential with technology. From day one you'll be contributing to important and interesting projects, at the forefront of the advanced ML community worldwide. We offer a collaborative, supportive and inclusive environment, where you can learn and flourish on a team with a diversity of perspectives. We're an equal opportunity employer and want to build a work environment where everyone is happy, productive and respectful so they can do their best work. If you have a disability or additional need that requires accommodation, just let us know.

Please note, we are only considering candidates who have an established right to work in the UK.

Location: central Bristol, Cambridge or London (Euston) - with discretionary remote working once up to speed

Even if your CV isn't ready, please talk with Andrew at techfolk to find out more:

+44 (0)117 318 2447 | hello@techfolk.co.uk | @andrew_techfolk