
ILM's data center

Posted: 2006-03-30 01:08pm
by Ace Pace
Linky
By Barbara Robertson

When George Lucas moved a large part of his filmmaking empire from San Rafael, California (a small town north of San Francisco), into a state-of-the-art, four-building complex on 17 acres of parkland in San Francisco’s Presidio, he spared no detail. Lawrence Halprin, the renowned landscape architect, even rearranged individual rocks in the babbling brook that rambles through the campus to achieve the most pleasing sound.



Similarly, the technical team left no stone unturned when it developed the infrastructure that powers Industrial Light & Magic (ILM), Lucas’ award-winning visual arts facility, and the LucasArts game-development division. “When we went from San Rafael to the Presidio, we had a 10X increase in network bandwidth,” says systems developer Michael Thompson. “We knew it would be coming, so we designed a system that could handle a massive jump in network throughput.”

At the new Lucas Digital Arts Center (LDAC) in the Presidio, a 10Gb/sec Ethernet backbone feeds data into 1Gb/sec pipes that run to the desktops. About 600 miles of fiber-optic cable thread through 865,000 sq. ft. of building space; the network is designed to accommodate 4k images via 300 10Gb/sec and 1500 1Gb/sec Ethernet ports.

A 13,500-sq.-ft. data center houses the renderfarm, file servers, and storage systems; the data center’s 3000-processor (AMD) renderfarm expands to 5000 processors after-hours by including desktop machines.


“All these render nodes constantly need data,” says Thompson. “At ILM, and probably at most visual effects studios, there is an ongoing war between the renderfarm and storage. Currently, we have about half a dozen major motion-picture projects under way. Keeping everyone happy requires feeding a phenomenal amount of data to those render nodes.”

How much data? “The whole [storage] system holds about 170TB, and we are 90 percent full,” says Thompson.


In a visual-effects-laden film such as Star Wars, nearly every minute of the 140-minute running time includes work by ILM. For the film Jarhead, which is not considered a visual-effects film, ILM created about 40 minutes of effects. With that in mind, consider this: ILM currently renders most visual-effects shots at around 2k x 2k resolution; however, some productions are moving to 4k x 4k resolution. A shot is an arbitrary number of frames; film is projected at a rate of 24 fps and video at 30 fps. To produce the final shots, compositors combine several layers of rendered elements for each frame. A 100-layer shot is not unusual; most shots include at least 20 layers. It took 6,598,928 hours of aggregate render time to produce the shots in Star Wars: Episode III - Revenge of the Sith.
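To get a feel for what those figures mean in storage and render time, here is a rough back-of-the-envelope sketch. It is not from the article: the pixel format (RGBA at 16 bits per channel, uncompressed), the reading of "2k" as 2048 pixels, and the idea that all 3000 renderfarm processors work on one film around the clock are assumptions made purely for illustration.

```python
# Back-of-the-envelope sizes for the figures quoted in the article.
# ASSUMPTIONS (not from the article): "2k" means 2048 pixels, each pixel is
# RGBA at 2 bytes per channel, and layers are stored uncompressed.

FILM_FPS = 24                # film projection rate, from the article
BYTES_PER_PIXEL = 4 * 2      # assumed: 4 channels x 16 bits


def frames(minutes, fps=FILM_FPS):
    """Number of frames in a given running time."""
    return int(minutes * 60 * fps)


def layer_bytes(width, height):
    """Uncompressed size of one rendered layer of one frame."""
    return width * height * BYTES_PER_PIXEL


def footage_bytes(minutes, width, height, layers):
    """Total uncompressed layer data for a stretch of footage."""
    return frames(minutes) * layers * layer_bytes(width, height)


def render_days(cpu_hours, processors):
    """Wall-clock days if every processor is busy the whole time (idealized)."""
    return cpu_hours / processors / 24


if __name__ == "__main__":
    TB = 10**12
    # 40 minutes of effects (the Jarhead figure) with 20 layers per frame
    print(frames(40), "frames")
    print(footage_bytes(40, 2048, 2048, 20) / TB, "TB at 2k x 2k")
    print(footage_bytes(40, 4096, 4096, 20) / TB, "TB at 4k x 4k")
    # 6,598,928 aggregate render hours spread over 3000 processors
    print(render_days(6_598_928, 3000), "days")
```

Under these assumptions, 40 minutes of 20-layer footage at 2k x 2k comes to roughly 38.7 TB of layer data, and moving to 4k x 4k quadruples that. The quoted render time works out to about 92 days of the whole 3000-processor farm running flat out.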

A New Way to Store

The IT team began looking for a new storage system about three years ago when Lucas was beginning work on Revenge of the Sith. They chose SpinServer NAS hardware and the SpinFS distributed file system from start-up Spinnaker Software.

“The system had all the attributes we needed to go forward,” says Thompson. “We knew we’d have major scaling issues, and it could scale well. Also, it has good data management features and a unified naming space [aka global namespace].”

Yet, shortly after ILM purchased the system, Network Appliance bought Spinnaker. “It was spooky for us,” says Thompson. “We didn’t know if they would deep-six the technology. But it turned out to be a good deal. For the past two and a half years, we’ve been prototyping NetApp’s Data ONTAP NG [Next Generation] software, which includes the Spinnaker software.”

ILM now uses 20 Linux-based SpinServer NAS systems and about 3000 disks from Network Appliance. “In six to nine months, we’ll swap the SpinServers for Network Appliance hardware, but will still run the same software stack,” says Thompson. “Our system is a weird hybrid: It has all the features of a SAN, but it does NAS as well.”

Linux-based render boxes at ILM talk to the disk storage systems via the NFS protocol. Brocade Fibre Channel switches handle data transfer between the SpinServers and two types of disks: high-speed production disks and slower nearline disks used for archiving data before it goes off-line to an ADIC Scalar 10k tape library. Couriers deliver final shots to production studios on FireWire drives.

“One of the nice things about our storage system is that it allows you to run the disks very full,” claims Thompson. “The 3000 disks are divvied up into 20 stacks, and as they fill up, the data moves from one to the next. However, the users can still get to all their data via normal paths. They don’t know we’re moving data around behind the scenes.”

One Giant Disk

Because the Spinnaker system has one unified naming space, all the disk drives look like one giant disk to the users, whether the data is on the fast production disks or on the slower nearline disks. This means the studio can organize its file systems into a tidy hierarchy. Before, people working on shots had to keep track of which servers had the elements they needed.

“Now, it looks like one giant disk, and they can keep everything for one movie in one area instead of on 14 different servers,” Thompson explains. “And, because the system spreads the data across the servers so that it’s evenly balanced, we can add servers as we need them.”
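The idea of a unified naming space can be illustrated with a toy model: users address files by one stable path, while a lookup table decides which server actually holds each volume. This is only a sketch of the concept. The class, server names, and paths are all hypothetical, and nothing here reflects how SpinFS or Data ONTAP NG is actually implemented.

```python
# Toy model of a global namespace. All names (GlobalNamespace, server01,
# /shows/film_a, ...) are hypothetical and chosen only for illustration.

class GlobalNamespace:
    def __init__(self):
        # Maps a volume's path prefix to the server currently holding it.
        self._volume_to_server = {}

    def mount(self, prefix, server):
        """Attach a volume at `prefix`, stored on `server`."""
        self._volume_to_server[prefix.rstrip("/")] = server

    def locate(self, path):
        """Return the server holding `path`, using the longest matching prefix."""
        best = None
        for prefix in self._volume_to_server:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise FileNotFoundError(path)
        return self._volume_to_server[best]

    def migrate(self, prefix, new_server):
        """Move a volume to another server; user-visible paths stay the same."""
        prefix = prefix.rstrip("/")
        if prefix not in self._volume_to_server:
            raise KeyError(prefix)
        self._volume_to_server[prefix] = new_server


if __name__ == "__main__":
    ns = GlobalNamespace()
    ns.mount("/shows/film_a", "server01")
    ns.mount("/shows/film_a/archive", "nearline03")

    shot = "/shows/film_a/shot010/layer01.exr"
    print(ns.locate(shot))                  # server01
    ns.migrate("/shows/film_a", "server07")
    print(ns.locate(shot))                  # server07, same path as before
```

The point of the design is that `migrate` changes only the table, never the path, which is why users working on a shot would not need to know which server their elements live on.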

In fact, during the move from San Rafael to San Francisco, the two facilities acted as one. “We had people on both sides of the Golden Gate Bridge accessing the data and moving it around without losing access,” says Thompson. The studio leased a fiber-optic cable that ran from San Rafael to Berkeley and then across the Oakland Bay Bridge to San Francisco to link the SpinServers in San Rafael to those in San Francisco. “All the data still showed up as one virtual disk,” says Thompson.

Because it could run the two facilities as if they were one, ILM could move people from one location to the other in waves; it was never necessary for anyone to stop working in order to move. “Without this system, we would have had to completely shut down the whole facility,” says Thompson. “Our daily burn rate was around $50,000 a day for downtime. It would have cost millions of dollars, and that doesn’t take into account delays.”

Now, Thompson is looking at ways to implement a similar system between Singapore, where Lucas has opened an animation studio, and Lucas’ headquarters at Skywalker Ranch north of San Francisco. He installed 20TB of storage on Network Appliance hardware running the Data ONTAP NG software in each location, but the problem is WAN latency.

“Data access over fiber between San Rafael and San Francisco was very fast, but when you’re shooting packets to Singapore and introducing millisecond delays, the computers start bogging down,” says Thompson. “It’s not the throughput; it’s the round-trip time. We’re looking at Network Appliance, Hewlett-Packard, and a lot of start-up companies that deal with these WAN issues for a solution.”
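A small calculation shows why round-trip time can dominate even on a fast link. The sketch below assumes a simple request/response pattern in which each read must complete before the next one is issued. The request size, link speed, and both round-trip times are illustrative assumptions, not measurements from ILM, and real NFS clients can overlap requests to soften the effect.

```python
# Effective throughput when every request waits for a full round trip.
# ASSUMPTIONS (not from the article): 32 KiB requests, a 1 Gb/sec link,
# 0.5 ms round trip on a metro fiber link, 200 ms round trip to Singapore.

def effective_throughput(request_bytes, rtt_s, link_bytes_per_s):
    """Bytes/sec achieved when requests are issued strictly one at a time."""
    time_per_request = rtt_s + request_bytes / link_bytes_per_s
    return request_bytes / time_per_request


if __name__ == "__main__":
    REQUEST = 32 * 1024            # assumed request size in bytes
    LINK = 1_000_000_000 / 8       # 1 Gb/sec expressed in bytes/sec

    metro = effective_throughput(REQUEST, 0.0005, LINK)
    wan = effective_throughput(REQUEST, 0.200, LINK)

    print(f"metro link: {metro / 1e6:.2f} MB/sec")
    print(f"WAN link:   {wan / 1e6:.2f} MB/sec")
```

With these numbers the same 1 Gb/sec pipe delivers about 43 MB/sec at a 0.5 ms round trip but only about 0.16 MB/sec at 200 ms, which matches Thompson's remark that the problem is the round-trip time rather than the throughput.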

Back at the Ranch

Meanwhile, back at ILM, Thompson wants to try playing high-performance, 600mb/sec HD video off the core storage. Currently, the studio uses custom-designed, dedicated HD video servers. “When you’re streaming uncompressed HD video to the desktop, the throughput is astronomical,” says Thompson. “So we have homegrown HD servers. There’s a feature in the new ONTAP NG software, though, that we think we can use to stream HD video to the desktop for the whole facility. Each server would do 1/20th of the load and, when they’re combined, we could play at warp speed.”

Would that imply more data storage? “I’ve been doing storage here for six years, and I’ve found that people will use up whatever you put out there,” says Thompson. “We’ll probably be buying more disks this year. At least now, adding more storage takes only a few hours.”

Barbara Robertson is a freelance writer and a contributing editor for Computer Graphics World. She can be reached at BarbaraRR@comcast.net.
Wow, holy massive data center.

Fixed the tag.

~Faram

Posted: 2006-03-31 02:04am
by Argosh
170TB? :cry: Mine's only 150GB.

Posted: 2006-03-31 02:16am
by darthdavid
*face-fault*

Posted: 2006-03-31 02:48am
by Uraniun235
Argosh wrote: 170TB? :cry: Mine's only 150GB.
465GB of fault-tolerant (RAID-5) storage, baby. 8)

Posted: 2006-03-31 05:13am
by Bounty
the data center’s 3000-processor (AMD) renderfarm expands to 5000 processors after-hours by including desktop machines.
Isn't this drifting into supercomputer territory? Or is that only when the processors are linked in some way?

EDIT: A friend of mine works with a university supercomputer, and that one has 170-odd nodes of several dual-core processors each, but those are all part of the same computer. Is that the case with this renderfarm or is it just a collection of separate PCs each doing a bit of the work?

Posted: 2006-03-31 06:13am
by WyrdNyrd
I would think that the line between "big server cluster" and "small supercomputer" is very blurred by now.

However, I think this counts more as a large server farm. A true supercomputer does indeed have special connections between the CPUs, usually proprietary stuff with super-high bandwidth and infinitesimal latency. They don't use a more general-purpose protocol like TCP or UDP, because they can optimise a specialised protocol for low latency and whatever special needs a supercomputer has.