Tue, Sep 12, 2006 18:41 EDT

Posted by: Bernard Golden Blog: The Open Source
This morning I was out walking and listening to a podcast. Now, mostly I feel podcasts are a very low-bandwidth medium and not a very good way to communicate complex content. However, they're a great mode of learning in the right circumstances -- commuting, exercising, ... actually, that pretty much exhausts the right circumstances to listen to podcasts. But for those, podcasts are perfect.
On this morning's walk, I was listening to a podcast by Thomas Kurian of Oracle from Software 2006. He was talking about Oracle's architecture; one of the topics he discussed was Oracle's use of its grid architecture as the underpinning of its Fusion offering.
He identified a key problem for users: the massive increase in software processing. There's no question that most organizations are using more and more software driven by:
But, are grids the way most organizations will solve the problem of exploding computer processing and data? I don't think so. I think grids are a vendor solution looking to convince customers it will solve their problems.
Now, my experience of Oracle's grid is mostly the increased difficulty it adds to Oracle's already notoriously challenging database installation. But, he noted, it offers the ability to increase scalability of customer infrastructures. Essentially, grids offer the ability to scale applications beyond the bounds of a single machine and provide more computing power than is possible from one server. Kurian maintained that grid will be the foundation of how Oracle's users will write and roll out applications in the future. Perhaps.
Despite all the hype from vendors, user acceptance of grid computing has been lukewarm at best. The biggest problem is that grids require application designers to extend their skills in order to write systems capable of executing in a grid architecture; in other words, end users have to learn new things to take advantage of the power of grids.
Generally speaking, imposing new conditions on users in order to move to a new architecture is a non-starter.
The requirement to break the bounds of the application/single machine foundation put me in mind of virtualization. While virtualization is usually thought of as a way to consolidate multiple servers onto a single box, several virtualization vendors have extended their products to create a virtualized pool of servers running on multiple machines.
With these products, you can declare a set of servers as a resource for virtual machines to run upon. If one physical server gets maxed out, new VMs can be started on a second physical server. The new VMs can be additional instances of an existing VM; the pooling products can even move a VM from one server to another to reduce load on a given server. Multiple VMs can work cooperatively on a common datastore via clustering.
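The pooling behavior described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any vendor's product works: the `Host` class, the `place` and `migrate` functions, the abstract "capacity" units, and the first-fit placement rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical server in the pool (hypothetical model)."""
    name: str
    capacity: int                             # abstract units of load the host can carry
    vms: dict = field(default_factory=dict)   # VM name -> load it consumes

    @property
    def used(self) -> int:
        return sum(self.vms.values())

    def fits(self, load: int) -> bool:
        return self.used + load <= self.capacity


def place(pool, vm_name, load):
    """Start a VM on the first host in the pool that has room (first-fit)."""
    for host in pool:
        if host.fits(load):
            host.vms[vm_name] = load
            return host.name
    raise RuntimeError("pool exhausted: no host can take " + vm_name)


def migrate(pool, vm_name, target_name):
    """Move a running VM to another host to reduce load on its current one."""
    src = next(h for h in pool if vm_name in h.vms)
    dst = next(h for h in pool if h.name == target_name)
    if not dst.fits(src.vms[vm_name]):
        raise RuntimeError(target_name + " has no room for " + vm_name)
    dst.vms[vm_name] = src.vms.pop(vm_name)


# Usage: the second instance spills onto host "b" once "a" is nearly full.
pool = [Host("a", capacity=4), Host("b", capacity=4)]
place(pool, "web1", 3)   # lands on "a"
place(pool, "web2", 3)   # "a" is too full, so it lands on "b"
```

Note that nothing in the application running inside `web1` or `web2` had to change; the placement decision lives entirely in the pooling layer.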
It strikes me that this kind of server pooling is a much more likely path for IT organizations to achieve application scalability than rewriting applications to operate in grids. Enabling virtualization server pooling puts the burden of architecture on vendors rather than imposing it on users, as with grids. With virtualization pooling, vendors can make the necessary investment to create an easily extended architecture while users can continue to operate with their existing skill sets. The most expensive technology migration possible is building a new skill set in end users -- just look at the decade-long headache that was the move to object-oriented programming to understand the issue.
Now before you get all bothered and post a comment along the lines of "some apps require grid-type processing that virtualization just can't do," it's definitely true that pooled servers won't solve every problem. Video rendering, financial derivatives, and drug discovery all require massive, coordinated processing that necessitates breaking processing up into discrete chunks spread across a grid. For the most challenging applications, grid computing is perfect.
Organizations that require that level of application will make the skill investment to take advantage of grid computing -- but for the vast majority of organizations that do not require the specialized functionality of coordinated grid computing, the solution is likely to be pooled servers via virtualization. Instead of grid computing being a high-performance, difficult architecture imposed upon everyone that needs more than a server's worth of processing, it will be used only by those organizations that truly need its capabilities.
For vendors to offer grid computing as the architecture of the future is very much reminiscent of a solution looking for a problem. It's right for some organizations, but overkill for most.
While there is some truth that learning Grid can be more complex, what I am seeing in the IT world is the dumbing down of developers. CIOs, CTOs, and CFOs assume that all developers are more or less equivalent, but the problems they are asking us to solve grow more complicated, and, well, the vendors don't always give you the complete picture.
Requiring you to train your staff in a new technology might be a sticking point, but I find that a lot of developers out there don't even know simple recursion or threads, and now we are going to introduce grids to them. We have to invest in our IT talent and not always count on some vendor to solve it for us, or on some foreign company's horde of cheap labor.
How about realizing that we pick the right tool for the job, and that we invest in our IT employees rather than simply shipping their work offshore to folks who don't necessarily have the knowledge either?
I think in the long term Grid will be the way to go, since we are wasting so many cycles on all of these machines and spending so much money in central IT centers, and let's not even talk about redundancy.
Let's get off of our laurels and engineer a solution rather than being afraid to tackle the problem and leaving it up to our vendors to solve all of the difficult issues. After all, they aren't always better; they just have better marketing.
Let's keep our minds open and beware of the Golden Hammer Anti-Pattern.
And if you don't know what that is you most likely have been in management way too long.
Larry
Having worked with Oracle's Real Application Clusters databases in a custom software shop, I can identify with the installation difficulties. I think the author makes an important lapse, however, when he says that it imposes conditions on the end users. Grids and clusters do require advanced programming techniques. But none of this affects how the USER uses the program.
Grids are not a panacea. They are only useful for certain types of large, computationally intensive problems. Clusters are useful for hosting a large number of smaller problems that require the same resources - usually data. Virtualization addresses the same issues that clusters address, but does not address the same problems grids are created to handle, because virtualization still relies on an individual server handling the load of the entire problem.
One additional misconception perpetuated here is the notion that the underlying hardware is cheaper with virtualization. Virtualization combines a number of applications/servers onto a single hardware platform, but those hardware platforms are generally special servers that can be very costly. Grids and clusters attempt to leverage multiple cheaper platforms - usually ones that are currently in use and underutilized.
It is clear that this author has his preference toward virtualization over clusters/grids, which is his prerogative. But individual projects should choose their architecture based on the complexity of the problems being solved, the number of concurrent connections, and their budgets. Grids, clusters, and virtualization all serve different needs, and it is important to select the appropriate solution for your needs.
I disagree with this article. Grid IS virtualization. It can as well be used for resource/service pooling with additional benefits of coordinated, secure and transparent sharing across organizations. I recommend a recent article - Nemeth, Sunderam "Virtualization in Grids: A Semantical Approach"
There are two points that you make for which I believe clarification is called for. First, you state that putting Grid-enabled applications in place requires users to adapt. I disagree - mostly. It depends on who you mean by 'users'. If you mean developers and tech support (the customers of Oracle), then yes, they need to expand their skills to handle the middleware layers. If, on the other hand, you mean the end-users of grid-enabled applications, then I believe you are incorrect. If Grid apps are properly enabled and installed, end users should have seamless access to more resources, and could in fact see NO difference other than higher productivity.
Second, you state that "for most challenging applications, grid computing is perfect." Again, I disagree. There are two basic classifications of "challenging" or high-performance applications. Let's call these "embarrassingly parallel" and "low latency". For those embarrassingly parallel jobs that are easily chopped into thousands or millions of discrete and independent jobs (like pattern matching search on databases, or Monte Carlo style statistical evaluations) Grid computing is a wonderful solution. However, for applications that must pass a great deal of information dynamically between compute cells (e.g., meteorological modeling, climate modeling, computational fluid dynamics), also known as a "low latency" message passing interface requirement, Grid computing is abysmal. The performance of the application is degraded terribly, users activating CPUs being shared on a grid will cause MPI jobs to abort, and network impact is huge.
We need to be careful about the lexicon we use with Grid computing, as we observe and opine about where it can be best used. Of course, all of this will change radically as the middleware developers catch up with the hardware. Next up - field programmable gate arrays and cell computing...
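To make the "embarrassingly parallel" category from the comment above concrete, here is a rough Python sketch of a Monte Carlo estimate of pi split into independent chunks. It is an illustration only: the function names, the per-chunk seeding scheme, and the use of a local thread pool as a stand-in for grid nodes are all assumptions, and a real grid scheduler would look quite different.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed: int, samples: int) -> int:
    """One independent chunk: count random points that fall inside the unit circle."""
    rng = random.Random(seed)        # private generator, so chunks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks: int, samples_per_chunk: int, workers: int = 4) -> float:
    """Farm the chunks out to workers and combine the partial counts."""
    seeds = range(chunks)            # hypothetical seeding scheme: one seed per chunk
    sizes = [samples_per_chunk] * chunks
    # The thread pool only stands in for grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_hits = sum(pool.map(count_hits, seeds, sizes))
    return 4.0 * total_hits / (chunks * samples_per_chunk)


print(estimate_pi(chunks=8, samples_per_chunk=20_000))
```

The chunks never talk to each other; the only communication is one integer returned per chunk at the end. The "low latency" applications the commenter lists are the opposite case: every cell needs fresh data from its neighbors at each step, so the work cannot be carved up this way.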
The real truth in the GRID vs Virtualization problem is that it doesn't need to be either/or. The two technologies are complementary and at the end of the day mean more computing power with less sprawl of your server environment. The key is going to be the applications and their maturity. If you look at products like VMware, they really can't address virtualization at the process level, so the pooling concept is only accurate when you run out of CPU. While Sun has this built into Solaris 10, GRID gets you there on the rest of the applications and offers a considerable value proposition today on scaling the application environment to fit the volume.
Once again, the challenge is to have our poorly written code bases remediated. The underlying technology that solves the problem (GRID/Virtualization) is a matter of usage / demand forecasts. In 2002, the golden rule of application development was that time to market is more critical than efficiency of the code once it is running. This was largely based on the ever-descending cost of CPU and memory in the Intel space. I had an SVP tell me that it was easier and more cost effective to throw resources at coding problems rather than dig into the code and reengineer. While the technologist in me was offended, my business degree pointed to product lifecycles and the sensibility of being frugal with resources. The new computing rules that would allow us to leverage GRID and virtualization as they continue to mature require that we write better code as an industry.