French Fry Withdrawal -- and My ‘Hazy’ Predictions for High Performance Computing

I’m not big into New Year’s resolutions (I don’t see the point of waiting to do something you know you should do)… but that said, every few years I put myself on a one-month French fry moratorium to start off the year because I just can’t seem to summon the willpower at any other point in the year.  So now, 17 days into the moratorium, I don’t see how I can possibly hold out another two weeks (especially in the land of burger joints that is Dallas).

So what exactly does this have to do with high performance computing? Well, colored by the ‘golden haze’ of my withdrawal, I offer up some predictions for the year ahead in high performance computing:

  • Directive-based multicore programming takes hold – OpenMP is working on this as a means for heterogeneous computing, NVIDIA announced support for OpenACC, and Intel is promoting directive-based implementation for their new MIC architecture.  All signs point to a rosy future for this paradigm.
  • Supercomputing goes mainstream – It seems to me (again, as a relative newbie) that the supercomputing software world has been disconnected from the mainstream programming world for quite some time.  We have reached the stage where almost all computing devices, of every kind, have multiple processing engines, be they heterogeneous or homogeneous.  Therefore, the challenges of supercomputing are now moving down to more consumer-oriented applications -- and people and companies who know how to use and program more complex systems will be in high demand.
  • Targeted high performance systems will see the light of day – A reasonable number of high performance applications do not require general purpose computing machines because they are focused on a specific problem. I believe we will start to see specially designed systems that are tuned for these applications to save on both power and cost.
  • Cloud supercomputing gets leveraged – ‘Personal’ supercomputing systems have started to unlock the demand for high performance systems at an individual user level – but the performance improvements of single blades will not increase fast enough to keep up with these users as they push their applications to higher levels of performance.  These same users cannot afford to build larger systems and therefore will start leveraging high performance cloud solutions.
  • Research community starts adopting our C6678 multicore DSP – Our initial benchmarks showed well at SC’11 and research institutions are starting to dig deeper into TI DSP solutions for HPC. I expect to see numerous papers and presentations discussing their findings this year (while this may seem like a shameless plug, it’s true nonetheless – and quite exciting!)
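
For readers who have not seen the directive-based style mentioned in the first prediction, here is a minimal sketch in C using OpenMP. It is only an illustration, not code from any TI demo: the function name sgemm_naive, the row-major layout, and the untuned triple loop are all assumptions made for the example. The point is that a single directive asks the compiler to spread the outer loop across the available cores, while the rest of the code stays ordinary serial C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: C = A * B for row-major single-precision
 * matrices, where A is m x k, B is k x n and C is m x n.
 * This is a naive triple loop, not an optimized SGEMM kernel. */
void sgemm_naive(size_t m, size_t n, size_t k,
                 const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* The directive below asks an OpenMP-aware compiler to divide the
     * rows of C among threads. Each row is written by exactly one
     * thread, so no locking is needed. A compiler built without OpenMP
     * support ignores the pragma and runs the loop serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                sum += a[(size_t)i * k + p] * b[p * n + j];
            }
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag; other toolchains may use a different option. Calling sgemm_naive(2, 2, 2, a, b, c) on two 2 x 2 matrices fills c with their product whether or not OpenMP is enabled, which is much of the appeal of the approach.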

Happy New Year!  And if you see me in the next two weeks please don’t ask me if I want fries with that…because, yes, actually I DO!!

 

  • The C6678 is an interesting and impressive piece of floating point hardware development.  Speaking as a mathematical modeller who is trying to become a DSP developer, your toolchain and software documentation are very hard work still.   I think HPC researchers wanting to use C6678s might find learning to develop code for them something of a shock...

    It's partly culture, and partly tools, and partly that the device *is* complicated (eg. lack of automatic cache coherency).  Of course, HPC people tend to be bright so they'll probably manage some impressive things nonetheless.

    I was about to suggest that a full OpenMP and/or MPI stack for the C6678 that, for example, took care of starting tasks on cores and allocating tasks between cores might make things more familiar.  I see that you're working on OpenMP - I'm pleased, although it'll be too late for our current project.

  • Hi Gordon,

    Thanks for your comment. We're always striving to make working with our devices easier, and we've made good progress on OpenMP.  We have a really good example running an SGEMM kernel across all 8 cores on the C6678 that you might find interesting.  We're also putting together an MPI + OpenMP demo (still a work in progress) that will use MPI to distribute the job across multiple C6678s and OpenMP to parallelize the work on each individual DSP.  If you're interested in learning more about these demos, please let me know and I'll be happy to follow up with you.

    - Arnon