Friday, May 11, 2012

RE: Thomas Sterling - "I think we will never reach zettaflops"

While thinking about this blog post, I considered the title "another narrow-minded professor makes unfounded sensationalist claim about the future of computers."

Some might think that's a rather harsh way to look at things, but when I read the title of this HPCwire article, I couldn't help but think of the dozens of claims other big names in the tech industry have made over the years, only to be proven wrong time and time again. One of the most famous blunders of this kind has to be the rumored quote from Bill Gates that "640K ought to be enough for anybody." The legitimacy of that quote is still up for debate, but the point stands: big tech names love to make sensationalist claims about the long-term future of technology.

Now, before anyone gets the idea that I'm trying to discredit Sterling here, I will point out that he includes some important qualifiers in his statement. The full quote is:
These words may be thrown back in my face, but I think we will never reach zettaflops, at least not by doing discrete floating point operations. We are reaching the anvil of the technology S-curve and will be approaching an asymptote of single program performance due to a combination of factors including atomic granularity at nanoscale.
I'm glad he added his reasoning here, because I agree that doing things exactly the way we do them today will not get us to zettascale. In fact, big iron as we know it today may not even be enough to get us to exascale on its own -- that's something we'll have to watch in the decade to come.

Sterling goes on to say:
Of course I anticipate something else will be devised that is beyond my imagination, perhaps something akin to quantum computing, metaphoric computing, or biological computing. But whatever it is, it won’t be what we’ve been doing for the last seven decades. That is another unique aspect of the exascale milestone and activity. For a number, I’m guessing about 64 exaflops to be the limit, depending on the amount of pain we are prepared to tolerate.
This is exactly why I think it's quite a claim to say we won't reach zettascale -- we don't know what's coming! Every day there are advancements in alternative computing methods such as quantum computing and biological computing, and there's most likely other research going on that isn't being talked about yet because there's nothing worth reporting. Ten years ago, 3TB hard drives seemed like an impossibility because we hadn't yet seen perpendicular recording used in hard drives.

Is reaching zettascale or exascale going to be easy? Not at all. There are a ton of hurdles we're going to have to clear in order to achieve these feats. As Sterling points out, even if we can achieve the hardware advancements we need, we'll also need more advanced software to take advantage of these future beasts. That being said, unless there's a time machine I haven't heard about, none of us knows what's to come in the next few decades. For all I know, there could be an advancement in using cats that will launch us into the exascale and beyond!

Monday, January 30, 2012

Dell's "helpful" new network device names

Traditionally on Linux hosts, your Ethernet devices are prefixed with eth, so your primary Ethernet port is eth0, followed by eth1, and so on. Almost every Linux distro follows this convention (for wired LAN interfaces, anyway -- wireless is a whole other bag). It makes things like scripting network configuration in your automated installers a lot easier, since you know the primary interface should almost always be named eth0 (there are some exceptions, but we deal with those as they come).
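For example, a kickstart file for an automated install might hard-code that assumption, something like this (a minimal sketch; the boot protocol and hostname here are just placeholders):

    # Kickstart network line that assumes the primary NIC is named eth0
    network --device=eth0 --bootproto=dhcp --hostname=node001.example.com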

That is, until Dell decided to change their naming scheme for RHEL 6 hosts.

Under the New World Order, Dell has "helpfully" decided to rename network interfaces: emX for onboard devices and pXpY for PCI-slot cards (em1 and p1p1, for example). Now, I sort of understand their reasoning. There are cases where network devices get enumerated differently, particularly when using PCI-based network cards or when switching kernel versions. This can result in unwanted changes to your network order, where eth1 suddenly becomes eth0, or eth[0-3] disappear and become eth[4-7]. This problem can be mitigated by leveraging udev's persistent network interface naming, which is precisely what Dell has done.
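If all you want is stable names without a whole new scheme, udev's persistent naming on RHEL 6 boils down to rules like this one (a sketch of an /etc/udev/rules.d/70-persistent-net.rules entry; the MAC address is made up):

    # Pin the NIC with this MAC address to eth0, regardless of probe order
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:21:9b:aa:bb:cc", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"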

I think they went a step too far though, especially since emX is not a standard across server platforms. For Dell-only sites, this isn't a big issue, since all of your hardware will follow that paradigm. When your install scripts depend on interfaces being named eth0, however, things break horribly.

Fortunately, you can disable this "feature" by passing biosdevname=0 as a kernel boot argument.
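On RHEL 6, that means appending it to the kernel line in /boot/grub/grub.conf, along these lines (the kernel version and root device below are placeholders for whatever your host actually uses):

    # /boot/grub/grub.conf -- append biosdevname=0 to the kernel arguments
    kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/mapper/vg_sys-lv_root quiet biosdevname=0

You can also add it to the bootloader line in your kickstart so fresh installs come up with the old names from the start.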

Saturday, January 28, 2012

The case of the disappearing node

When you work in an environment with thousands of computers, you're bound to come across strange problems that make you go "Hmm..." Parts fail, computers die, switches go bad. But they don't always die in the most obvious ways.

We got a report a month or two ago about a node in one of our 10 GigE clusters that had apparently gone offline for some reason. After a student went to look at it, they found the node itself seemed fine, but it had no network connectivity. They started down the normal troubleshooting path: flip the cable, swap it out for a new one, replace the NIC. Still nothing. So they decided to exchange cables with the node next to it. This is where things got really weird...

Normally, when you switch cables with the adjacent node after you've already replaced the cable, you're looking to see whether the node or the switch port is bad. Again, switches go bad; it's nothing new. What you'd normally expect is for the problem to either stay with the original node (the node is bad) or move to the adjacent node (the port is bad). What you don't expect is for the problem to disappear. Both nodes now connect without problems. Move the cables back, and the problem reappears on the original node. Hmm...

Diving into low-level networking, you learn that every network interface has a clock source in the PHY layer. This seems pretty logical, since you have to establish timing somehow, and we want all of our interfaces running at the same clock rate. Unfortunately, clock sources aren't perfect, and over time they may drift one way or another; this shows up as network jitter. Normally the drift stays within a certain range and can be corrected for. Sometimes, however, the two ends start drifting in different directions: one clock slows down while the other speeds up, and eventually the link can't be maintained.
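There's no direct "jitter" readout from the OS, but you can at least look for the symptoms from the host side. A couple of ethtool invocations like these (eth0 and the exact counter names are assumptions; they vary by NIC and driver) will show the negotiated link state and the error counters that tend to creep upward when the two PHYs stop agreeing:

    # Check whether the link is up and what speed/duplex it negotiated
    ethtool eth0

    # Dump the driver's statistics and look for CRC/symbol/receive error
    # counters climbing over time (counter names vary by driver)
    ethtool -S eth0 | grep -i err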

We haven't yet resolved the problem of the disappearing node. An obvious workaround is to leave the cables swapped. That meets the short-term goal of getting the node back into service, but it leaves the original problem of PHY jitter driving connections into the ground.

The long-term solution will probably be replacing the switch. Our switch vendor probably won't like that solution, but we don't like nodes that go poof.

And I get tired of saying "hmm..." all the time.

Sunday, January 22, 2012

Back home again in Indiana

Time to try blogging again. I've been wanting to get into blogging (and podcasting) for a while now, and I figure life is getting interesting enough to start again.

So what's the blog about?

The main focus will probably be work (as the title suggests), since I sometimes think high performance computing doesn't get enough of a voice from the sysadmin perspective. HPC programmers love to talk and dream and voice their opinions, but a lot of the time I feel their perspective is skewed. Even the admins who are voicing opinions seem out of touch, as they tend to work on smaller-scale machines. More on those ideas in future posts.

When I run out of technical topics to talk about, there's always music and other current events in my life. At least I find them interesting. Your mileage may vary.