How do you maintain a gigantic website?

Talk about just about anything else that is non-gaming here, but keep it clean
RCBH928
Next-Gen
Posts: 6082
Joined: Wed Apr 02, 2008 6:40 am

How do you maintain a gigantic website?

Post by RCBH928 »

I always wonder how Google can host something like YouTube. How come they never lose data? How come they seem to have ever-expanding storage? It's not like you can just keep buying more hard drives; you need space, a building, and employees.

How do they keep track of the file locations and the data? Honestly, how long does it take to back up YouTube? How come the website stays up for years with no downtime?

It would be great if someone could explain this to me.
Hobie-wan
Next-Gen
Posts: 21705
Joined: Sat Aug 15, 2009 8:28 pm
Location: Under a pile of retro stuff in H-town

Re: How do you maintain a gigantic website?

Post by Hobie-wan »

Server farms in big buildings. Redundant drives are set up in RAID, so if a drive fails it can be replaced without losing data. This way you don't need building 1 with one copy of the data and building 2 with an exact copy that always has to be kept up to date.
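
For the curious, the trick behind parity RAID (RAID 5 style) fits in a few lines of Python. This is a toy sketch, not how a real RAID controller works: the three "drives" are just equal-length byte strings, and the parity "drive" stores their XOR. Because XOR undoes itself, any one lost block can be rebuilt from the survivors plus the parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    # zip(*blocks) walks the blocks column by column; each column is a tuple of ints.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data "drives", each holding one equal-sized block (illustrative data only).
data = [b"VIDEO-A1", b"VIDEO-B2", b"VIDEO-C3"]
parity = xor_blocks(data)  # stored on a fourth drive

# Simulate the second drive dying: rebuild its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'VIDEO-B2'
```

Mirroring (RAID 1) is the simpler alternative: keep a full second copy instead of parity, which costs more disk but makes rebuilds trivial.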
MrPopo
Moderator
Posts: 24190
Joined: Tue Aug 26, 2008 1:01 pm
Location: Orange County, CA

Re: How do you maintain a gigantic website?

Post by MrPopo »

Well, there are a few things going on. First, they maintain datacenters with a huge amount of hardware that stores all the videos. These datacenters are regularly backed up, and they very likely mirror all the data, so if a hard drive goes down they immediately switch to the backup with no data loss.
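
A rough sketch of what "mirror the data and switch to the backup" means, in Python. The Drive and MirroredStore classes are hypothetical in-memory stand-ins, not anything Google actually runs: every write goes to all healthy mirrors, and a read simply falls through to the next mirror when one is offline.

```python
class Drive:
    """Hypothetical in-memory stand-in for one physical disk."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, key, value):
        if self.failed:
            raise IOError(f"{self.name} is offline")
        self.blocks[key] = value

    def read(self, key):
        if self.failed:
            raise IOError(f"{self.name} is offline")
        return self.blocks[key]

class MirroredStore:
    """Writes every block to all healthy drives; reads fail over to the next mirror."""
    def __init__(self, drives):
        self.drives = drives

    def write(self, key, value):
        written = 0
        for drive in self.drives:
            if not drive.failed:
                drive.write(key, value)
                written += 1
        if written == 0:
            raise IOError("no healthy mirrors to write to")

    def read(self, key):
        for drive in self.drives:
            try:
                return drive.read(key)
            except IOError:
                continue  # this mirror is down; try the next one
        raise IOError("all mirrors failed")

primary, backup = Drive("disk-0"), Drive("disk-1")
store = MirroredStore([primary, backup])
store.write("video123", b"...video bytes...")
primary.failed = True            # simulate the primary disk dying
print(store.read("video123"))    # still served, now from disk-1
```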

For keeping track of files, they would use a database which would associate video ids with locations in the server farm.
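
As a small illustration of that lookup idea, here is a sketch using Python's built-in sqlite3. The table name, columns, datacenter names, and paths are all made up; a site at YouTube's scale would use a distributed database rather than one SQLite file, but the shape of the question ("where are the copies of video X?") is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per stored copy of a video.
conn.execute("""
    CREATE TABLE video_locations (
        video_id   TEXT NOT NULL,
        datacenter TEXT NOT NULL,
        server     TEXT NOT NULL,
        path       TEXT NOT NULL,
        PRIMARY KEY (video_id, datacenter, server)
    )
""")
rows = [
    ("ABCDEFG", "dc-east", "rack12-srv07", "/data/ab/ABCDEFG.mp4"),
    ("ABCDEFG", "dc-west", "rack03-srv41", "/data/ab/ABCDEFG.mp4"),
]
conn.executemany("INSERT INTO video_locations VALUES (?, ?, ?, ?)", rows)

def locate(video_id):
    """Return every (datacenter, server, path) that holds a copy of the video."""
    cur = conn.execute(
        "SELECT datacenter, server, path FROM video_locations "
        "WHERE video_id = ? ORDER BY datacenter",
        (video_id,),
    )
    return cur.fetchall()

print(locate("ABCDEFG"))
```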
Blizzard Entertainment Software Developer - All comments and views are my own and not representative of the company.
Hobie-wan
Next-Gen
Posts: 21705
Joined: Sat Aug 15, 2009 8:28 pm
Location: Under a pile of retro stuff in H-town

Re: How do you maintain a gigantic website?

Post by Hobie-wan »

MrPopo wrote:For keeping track of files, they would use a database which would associate video ids with locations in the server farm.
Indeed, that's why YouTube links look like this:

www.youtube.com/watch?v=ABCDEFG

The ABCDEFG is a better unique identifier for the database than trying to assign something based off of titles and keywords that people use. There would be a problem with 1000 videos named "My little angel's first steps" and 643,741 videos named "Hold my beer and watch this".
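
How YouTube actually generates its IDs isn't public, but a plausible sketch in Python is: pick random bytes, encode them URL-safe, and retry on the (very unlikely) collision. Real YouTube IDs are typically 11 characters of letters, digits, "-" and "_", which happens to be exactly what 8 random bytes give you in base64url. The new_video_id function and the in-memory set of taken IDs are hypothetical; a real site would check against its database.

```python
import secrets

def new_video_id(existing_ids):
    """Generate a random 11-character URL-safe ID that is not already taken."""
    while True:
        # 8 random bytes = 64 bits -> 11 base64url characters once padding is stripped.
        candidate = secrets.token_urlsafe(8)
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate

taken = set()
vid = new_video_id(taken)
print(f"www.youtube.com/watch?v={vid}")
```

With 64 bits of randomness there are about 18 quintillion possible IDs, so unlike titles, two videos practically never end up fighting over the same one.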
indecks
Next-Gen
Posts: 1742
Joined: Thu Jul 17, 2008 10:18 pm
Location: Austin TX

Re: How do you maintain a gigantic website?

Post by indecks »

they have fat stax.
Anapan
Next-Gen
Posts: 3946
Joined: Mon Nov 17, 2008 11:15 am
Location: BC, Canada

Re: How do you maintain a gigantic website?

Post by Anapan »

Here's a video Google made in 2009 of one of their gigantic data centers. They have many of these in different parts of the world, each with multiple copies of the entire internet to make searching more efficient.
http://www.youtube.com/watch?v=zRwPSFpLX8I
Around 2011, one estimate put Google's server count at roughly 900,000 servers...
RCBH928
Next-Gen
Posts: 6082
Joined: Wed Apr 02, 2008 6:40 am

Re: How do you maintain a gigantic website?

Post by RCBH928 »

Great video, Anapan.

This kind of thing is very interesting to me. I always imagine how frustrating it is to manage your own PC, and then seeing how complex that data center is, I think if I were an admin over there I would pull my hair out.

I wonder what kind of HDDs they use. What if one fails? How do they know that it failed?

What kind of CPU power is needed for this kind of thing?! Maybe each server in the rack has its own CPU, I guess.

Also, it is very scary how easily the people working in the data center can get to your information. It looks like one of them could just hook a wire up to a server, download everything to his laptop, and no one would ever know. I am kind of paranoid.

I also wonder what kind of security software they use to stop hackers/crackers from breaching those servers, since they seem like a very popular target.

And they have multiple copies of the entire internet?? What the hell? How many hard drives is that? What kind of internet connection do you need to download that much stuff? How do they even detect that there is new content on the internet?

Very, very amazing.
MrPopo
Moderator
Posts: 24190
Joined: Tue Aug 26, 2008 1:01 pm
Location: Orange County, CA

Re: How do you maintain a gigantic website?

Post by MrPopo »

There are enterprise HDDs which have faster seek times. They also include various diagnostics, so it's very easy to tell when failure happens. As mentioned, on failure the system would automatically switch to the standby and the tech would be informed of the failure, so he can swap out the broken one for a new one.
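
To give a feel for how "the tech would be informed of the failure" might work, here's a sketch of a fleet health check in Python. Drives report self-diagnostics through SMART (readable with tools such as smartctl from smartmontools), and reallocated and pending sector counts are two of the classic warning signs. The DriveHealth record, the drive serials, and the thresholds below are hypothetical; a real operation would tune them against its own failure history.

```python
from dataclasses import dataclass

@dataclass
class DriveHealth:
    serial: str
    reallocated_sectors: int  # sectors remapped after read/write errors
    pending_sectors: int      # unstable sectors waiting to be remapped
    responding: bool          # did the drive answer the last health poll?

# Hypothetical alert thresholds.
MAX_REALLOCATED = 100
MAX_PENDING = 10

def needs_replacement(d: DriveHealth) -> bool:
    return (not d.responding
            or d.reallocated_sectors > MAX_REALLOCATED
            or d.pending_sectors > MAX_PENDING)

def check_fleet(drives):
    """Return serials of drives a technician should swap out."""
    return [d.serial for d in drives if needs_replacement(d)]

fleet = [
    DriveHealth("disk-0001", reallocated_sectors=0, pending_sectors=0, responding=True),
    DriveHealth("disk-0002", reallocated_sectors=250, pending_sectors=3, responding=True),
    DriveHealth("disk-0003", reallocated_sectors=0, pending_sectors=0, responding=False),
]
for serial in check_fleet(fleet):
    # In practice this would open a ticket or page the on-site tech.
    print(f"ALERT: replace drive {serial}")
```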

CPU-wise the machines aren't as good as you might think. Data retrieval applications like this are bound by your disk and network I/O, not your CPU. They probably don't have a CPU that's much better than a gaming enthusiast's. And as for the general "keeping things running", these servers are very simple compared to your home computer. They likely run a Linux distro and the majority of the CPU cycles are going to a web server that is doing some fairly simple data retrieval stuff. It's not like your home computer which has a lot of graphical elements and a variety of productivity and other software that's all going on at the same time. As I type this I can see the icons for Winamp, Steam, and the Gmail notifier in my tray, and I have several other background tasks running in the overflow flyout. By contrast a ps -a of one of the Google servers would be a very small list.

Yes, the techs could theoretically just download all the information off of the servers. There are a few things that make that not much of a concern, though. First is the sheer amount of stuff that is there, which is organized by complex algorithms that can seem random to someone who is just browsing the file tree. Second is that anything sensitive is encrypted as a matter of course.
RCBH928
Next-Gen
Posts: 6082
Joined: Wed Apr 02, 2008 6:40 am

Re: How do you maintain a gigantic website?

Post by RCBH928 »

Can we buy those more reliable, faster enterprise HDDs?
Hobie-wan
Next-Gen
Posts: 21705
Joined: Sat Aug 15, 2009 8:28 pm
Location: Under a pile of retro stuff in H-town

Re: How do you maintain a gigantic website?

Post by Hobie-wan »

kingmohd84 wrote:can we buy those more reliable faster enterprise HDD?
Some places online sell them, yes. You might find them at specialty computer suppliers locally as well, but they're not going to be for sale at your local electronics chain next to the washers, TVs, and HP computers. They cost quite a bit more than consumer ones, though.