How to build a career in HPC

'The supercomputers that I bought and managed have been some of the most powerful ever built in the UK'

Supercomputing is one of those industries that never stands still. Driven by a race to the top, everyone wants the most powerful and energy-efficient computer, with the smallest carbon footprint, broadest user base and best I/O. Nobody ever builds an average supercomputer.

For a computer manager, it is like being the mechanic in the pit lane at a Formula One race: you know that you’re working on the best equipment money can buy, and that means you have to be the best too.

The supercomputers that I bought and managed for research groups at Durham University, in particular for the Institute for Computational Cosmology over the past 14 years, have been some of the most powerful ever built in the UK.

My first system was capable of delivering a single Megaflop per CPU (central processing unit). Now, 28 years later, my current system is 35 million times more powerful and can deliver 20.8 Gigaflops on a single core (roughly 20,000 times faster per core). In between, I have built and managed more than ten supercomputers.

Right now, I’m personally responsible for the specification, procurement, installation and maintenance (wielding a screwdriver when necessary) of Durham University’s most famous high-performance computing (HPC) asset: COSMA.

Originally 64 workstations with a Myrinet interconnect and 128 cores, this system evolved first to 528 cores, then 800 cores, 3,000 cores and finally 9,856 cores, with COSMA 4 and COSMA 5 continuing to operate side by side.

Here are six things that have helped in my career.

1. A good education

This always helps when mapping out a career. I hold a Diplom-Physikerin (equivalent to a research master’s in physics) and a PhD in theoretical high-energy physics. I have worked in four different research areas in physics, from particle physics to cosmology.

This understanding of science, along with my solid mathematical background and experience in research, has given me the domain knowledge to understand my users and their codes more completely, and to know what they want and need from an HPC system.

2. Putting myself forward

My path into the supercomputing world was slightly unusual in that I didn’t train for or apply for a job managing a supercomputer. I worked for someone who was already using such a system and when it failed, I stepped forward to repair it.

That was the theme of my first few roles: stepping forward and getting my hands dirty while continuing my own research in physics. Ultimately, putting myself forward has provided opportunities along the way.

3. Keeping ahead of the game

I learned very early on that you need to keep up with technical developments. Every single day I research the best ways of delivering compute power to those who need it. I meet lots of tech-savvy people; stay in touch with hardware and software vendors like IBM, Lenovo, Atos-Bull, SGI, HP, Cray and others; and build strong, ongoing relationships with integrators like OCF. Through these contacts I can stay ahead of the game.

4. Trying out what I have learned

This is the fastest way to improve – I am continuously learning on the job. Sometimes, I’m applying knowledge to COSMA that I only learned myself two weeks before. That’s the high-performance world – I can’t stand still.

5. Understanding the users

You must get to know and understand the users of your supercomputer. Day by day, I am working with researchers to understand their code, how to get it running more efficiently, why it doesn’t work, and why it stopped working when it was fine previously. I can do this because I have a science degree and a PhD and a passion for HPC.

6. Valuing people for their brain power

Don’t judge a book by its cover. There is always lots of debate in the industry around the role of women in IT. I think in this job – and any other for that matter – knowledge, passion, experience and attitude reign supreme.

 

Dr Lydia Hec is senior computer manager in Durham University’s Department of Physics, manager of the DiRAC Data Centric system and responsible for the COSMA HPC system
