I always wonder how Google can host something like YouTube. How come they never lose data? How come they seem to have ever-expanding storage? It's not like you can just buy more hard drives. You need space, a building, and employees.
How do they keep track of the file locations and the data? Honestly, how long does it take to back up YouTube? How come the website has been up for years with no downtime?
It would be great if someone could explain this to me.
How do you maintain a gigantic website?
- Hobie-wan
- Next-Gen
- Posts: 21705
- Joined: Sat Aug 15, 2009 8:28 pm
- Location: Under a pile of retro stuff in H-town
- Contact:
Re: How do you maintain a gigantic website?
Server farms in big buildings. Redundant drives set up in RAID so that if a drive fails, it can be replaced without losing data. This way you don't have building 1 with a copy of the data, then building 2 with an exact copy of the data and always having to update it.
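As a rough illustration of how a RAID array can survive a dead drive, here is a minimal Python sketch of XOR parity, the idea behind RAID 5-style setups. The drive contents and the function name are made up for the example, and real controllers stripe data in blocks and rotate parity across drives, so treat this as a toy model only.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data drives holding equal-sized blocks.
drives = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block would live on a fourth drive.
parity = xor_blocks(drives)

# Simulate losing the second drive, then rebuild it from the survivors plus parity.
survivors = [drives[0], drives[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR cancels itself out, combining the surviving blocks with the parity block gives back exactly what was on the missing drive.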
I've never met a pun I didn't like. - Stark
My trade, sale and services - Rough want list - Shipping weight reference chart - AC Power Adapter reference list
Re: How do you maintain a gigantic website?
Well, there are a few things going on. The first is that they maintain datacenters which have a huge amount of hardware that stores all the videos. These datacenters are regularly backed up, and it's very likely that they mirror all the data, so if a hard drive goes down they immediately switch to the backup with no data loss.
For keeping track of files, they would use a database which would associate video ids with locations in the server farm.
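To give a feel for what such a lookup might look like, here is a small Python sketch using an in-memory SQLite database. The table name, columns, IDs, datacenter names and paths are all invented for the example; nothing here reflects how Google actually lays out its storage.

```python
import sqlite3

# Hypothetical catalog: one row per stored copy of a video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video_locations "
    "(video_id TEXT, datacenter TEXT, server TEXT, path TEXT)"
)
conn.executemany(
    "INSERT INTO video_locations VALUES (?, ?, ?, ?)",
    [
        ("vid001", "dc-east", "srv-0412", "/vol3/vid001.dat"),
        ("vid001", "dc-west", "srv-1187", "/vol9/vid001.dat"),
        ("vid002", "dc-east", "srv-0033", "/vol1/vid002.dat"),
    ],
)

def find_replicas(video_id):
    """Return every (datacenter, server, path) that holds a copy of the video."""
    rows = conn.execute(
        "SELECT datacenter, server, path FROM video_locations "
        "WHERE video_id = ? ORDER BY datacenter",
        (video_id,),
    )
    return rows.fetchall()
```

Calling `find_replicas("vid001")` returns both copies, so the front end could pick whichever one is closest or least busy.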
Blizzard Entertainment Software Developer - All comments and views are my own and not representative of the company.
- Hobie-wan
- Next-Gen
- Posts: 21705
- Joined: Sat Aug 15, 2009 8:28 pm
- Location: Under a pile of retro stuff in H-town
- Contact:
Re: How do you maintain a gigantic website?
MrPopo wrote: For keeping track of files, they would use a database which would associate video ids with locations in the server farm.
Indeed, that's why YouTube links look like this:
www.youtube.com/watch?v=ABCDEFG
The ABCDEFG is a better unique identifier for the database than trying to assign something based off of titles and keywords that people use. There would be a problem with 1000 videos named "My little angel's first steps" and 643,741 videos named "Hold my beer and watch this".
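Here is a quick Python sketch of that idea: three uploads with the same title each get their own random ID, so the database never confuses them. The function name and ID length are assumptions for the example, and this is not a claim about how YouTube actually generates its IDs.

```python
import secrets

def new_video_id(existing_ids, nbytes=8):
    """Return a random URL-safe ID that is not already in existing_ids."""
    while True:
        candidate = secrets.token_urlsafe(nbytes)
        if candidate not in existing_ids:
            return candidate

# Hypothetical catalog keyed by ID rather than by title.
videos = {}
for title in ["Hold my beer and watch this"] * 3:
    vid = new_video_id(videos)
    videos[vid] = title
```

All three entries share one title but sit under three different keys, which is exactly what a title-based lookup could not give you.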
I've never met a pun I didn't like. - Stark
My trade, sale and services - Rough want list - Shipping weight reference chart - AC Power Adapter reference list
Re: How do you maintain a gigantic website?
they have fat stax.
Re: How do you maintain a gigantic website?
Here's a video Google made in 2009 of one of their gigantic data centers. They have many of these in different parts of the world, each with multiple copies of the entire internet to make searching more efficient.
http://www.youtube.com/watch?v=zRwPSFpLX8I
Around 2011, an estimate of Google's server count was around 900,000 servers...
Re: How do you maintain a gigantic website?
Great video, anapan.
This kind of thing is very interesting to me. I always imagine how frustrating it is to manage your own PC, and then seeing how complex that data center is, I think if I were an admin over there I would pull my hair out.
I wonder what kind of HDs they use. What if one fails? How do they know that it failed?
What kind of CPU power is needed for this kind of thing? Maybe each slide of a server has its own CPU, I guess.
Also, it is very scary how those people working in the data center have easy access to your information. It looks like someone could just hook a wire to a server and download everything to his laptop and no one would ever know. I am kind of paranoid.
I also wonder what kind of security software they use to stop hackers/crackers from breaking into those servers, as they seem a very popular target.
And they have multiple copies of the entire internet? What the hell? How many hard drives is that? What kind of internet connection do you need to download so much stuff? How do they even recognize there is new content on the internet?
Very, very amazing.
Re: How do you maintain a gigantic website?
There are enterprise HDDs which have faster seek times. They also include various diagnostics, so it's very easy to tell when failure happens. As mentioned, on failure the system would automatically switch to the standby and the tech would be informed of the failure, so he can swap out the broken one for a new one.
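The detect, switch over and notify flow can be sketched in a few lines of Python. The class names, drive names and alert text are all hypothetical, and real systems read drive diagnostics through the hardware rather than a simple flag, so this only shows the shape of the logic.

```python
class Drive:
    def __init__(self, name):
        self.name = name
        self.healthy = True  # stands in for the drive's own diagnostics

class MirroredPair:
    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby
        self.alerts = []  # messages a technician would see

    def check(self):
        """Fail over to the standby if the active drive reports a fault."""
        if not self.active.healthy and self.standby.healthy:
            self.alerts.append(f"replace {self.active.name}")
            self.active, self.standby = self.standby, self.active
        return self.active.name

pair = MirroredPair(Drive("disk-a"), Drive("disk-b"))
pair.check()                 # everything healthy, nothing changes
pair.active.healthy = False  # simulate a diagnostic reporting failure
current = pair.check()       # traffic moves to the standby drive
```

After the second check the pair is serving from the standby and there is one alert queued asking for the failed drive to be swapped.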
CPU-wise the machines aren't as good as you might think. Data retrieval applications like this are bound by your disk and network I/O, not your CPU. They probably don't have a CPU that's much better than a gaming enthusiast's. And as for the general "keeping things running", these servers are very simple compared to your home computer. They likely run a Linux distro and the majority of the CPU cycles are going to a web server that is doing some fairly simple data retrieval stuff. It's not like your home computer which has a lot of graphical elements and a variety of productivity and other software that's all going on at the same time. As I type this I can see the icons for Winamp, Steam, and the Gmail notifier in my tray, and I have several other background tasks running in the overflow flyout. By contrast a ps -a of one of the Google servers would be a very small list.
Yes, the techs could theoretically just download all the information off of the servers. There are a few things that make that not much of a concern, though. First is the sheer amount of stuff there, which is organized by complex algorithms that can seem random to someone who is just browsing the file tree. Second is that anything sensitive is encrypted as a matter of course.
Blizzard Entertainment Software Developer - All comments and views are my own and not representative of the company.
Re: How do you maintain a gigantic website?
Can we buy those more reliable, faster enterprise HDDs?
- Hobie-wan
- Next-Gen
- Posts: 21705
- Joined: Sat Aug 15, 2009 8:28 pm
- Location: Under a pile of retro stuff in H-town
- Contact:
Re: How do you maintain a gigantic website?
kingmohd84 wrote: can we buy those more reliable faster enterprise HDD?
Some places online sell them, yes. You might find them at specialty computer suppliers locally as well, but they're not going to be for sale at your local electronics chain next to the washers, TVs, and HP computers. They cost quite a bit more than consumer ones though.
I've never met a pun I didn't like. - Stark
My trade, sale and services - Rough want list - Shipping weight reference chart - AC Power Adapter reference list



