Lead Site Reliability Engineer (SRE)

Thought Machine, one of the UK's leading fintech companies, is undergoing a period of rapid expansion and is looking to hire a Lead Site Reliability Engineer.

Our mission is to cure one of the banking industry's primary problems: its reliance on outdated IT infrastructure. Nearly every bank is stuck on a legacy IT platform, which cripples their ability to innovate and give their customers the type of service they deserve.

Our solution to this is Vault: a complete retail banking platform that can be configured easily to suit the needs of any bank. We have built Vault from the ground up as a cloud-native platform with a microservice API architecture. Thought Machine has a deep culture of engineering excellence, and we believe it is this which delivers a solution compelling enough to engender a seismic shift in the banking industry.

Thought Machine is looking for highly talented individuals to help grow the company and achieve our ambitious goal. We pride ourselves on having an excellent internal culture, where we strive to create the best possible working environment: a healthy mix of great technical work, fast pace, supportive atmosphere, and of course our irreverent sense of fun.

Thought Machine hires team members of excellent calibre in every role. While a lot will be asked of you, you will benefit greatly from working in a world-class team, with colleagues who excel. Working at Thought Machine is fast-paced and team-oriented, with an emphasis on delivering the highest quality work in every role.

Duties

Site Reliability Engineers at Thought Machine take responsibility for deploying our software into production, and you will be leading this team. This is a hands-on role, so as well as managing the team and covering traditional DevOps responsibilities, your focus will be on writing and maintaining software that automates our deployment processes.

Your duties will include:

  • Supporting the engineering team in building highly fault-tolerant applications.
  • Developing automation tools to ensure our services can scale and are highly available. We always try to minimise our ops tasks by developing bespoke tools as required.
  • Providing day-to-day development support, and monitoring production server and network environments by developing and deploying logging and monitoring tools.
  • Developing applications to increase code quality throughout our codebase.
  • Supporting disaster recovery, backup, redundancy and capacity planning activities.


Requirements

  • Essential:
    • Strong background in Linux/Unix administration, e.g. Ubuntu, Debian
    • Strong background in at least one of Go, Python or Java
    • Experience with AWS
    • Experience with, or knowledge of, container orchestration tools, e.g. Kubernetes
  • Desirable:
    • Experience with automation/configuration management, e.g. Puppet, Chef, Ansible
    • Management experience

Benefits

  • Competitive salary
  • Share options
  • Pension
  • Healthcare (including dental & optical)
  • Other perks like sports clubs, healthy (and sometimes not so healthy) snacks, tea and coffee
  • A talented & experienced team as your colleagues
  • An environment where you can learn and progress
  • Friday team wrap up with drinks and food