My Research on Up-Scaling High Traffic Websites

Note: My most up-to-date (but raw) research will be in this ScalabilityResearch.TXT text file on my local computer.  It can be accessed directly online because of a sweet tool I like to use: GetDropBox

HowTo & Conclusion

After much deliberation I decided to go with RackSpaceCloud for hosting our dedicated servers.  Load balancing on their servers is DIY (do-it-yourself), which is good because it gives me full control over the configuration and should make it easier to scale further.  The main advantage of their system is that I can duplicate the webservers easily: I can create and mirror full copies of any of our servers in not much more time than it takes to perform the copy over their intranet.  The price is great too :-)  I can increase the server count on demand, and I can scale up individual servers when needed as well.

Here is a HowTo that I am working on to help those who are interested in setting up a load-balanced Apache server cluster on Linux/CentOS 5.3.  I will provide more conclusions and HowTos here when my research and implementation are complete.

Research Notes

Scalability Research
j0zf 2009.7.5

// GOALS ///////////////////////////////////////////////////////////////////////

1. Scalability
2. Redundancy
3. Fail-Over
4. PCI Compliance
 - check list : https://www.pcisecuritystandards.org/education/docs/Prioritized_Approach_PCI_DSS_1_2.pdf

// OPTIONS ///////////////////////////////////////////////////////////////////// 

> Cloud Servers
 > RackSpace Cloud http://www.rackspacecloud.com/
  - may request separate geographic locations for servers.
  - very liberal policy, it just has to be legal.
  - Load balancing is DIY
  - Additional Public IP addresses are $2 / month. (request from support)
 > Amazon EC2 http://aws.amazon.com/ec2/
  - may have policy limiting direct marketing, and content. http://aws.amazon.com/agreement/
 > SoftLayer http://www.softlayer.com/
  - Does load balancing (details not really clear)
  - More expensive than Rackspace (on average)
 > Google App Engine - http://code.google.com/appengine/
  - "Lavin": we would have to rewrite our entire system in Python

> In House Servers
 - Possibly using load balancing hardware
 pros:
  - direct access to servers and hardware.
 cons:
  - cost of bandwidth
  - up-front cost of servers 
 
> Rented Servers (Hostgator, etc)
 > Multiple Dedicated Servers
  - x number of webservers using rsync
  - use round-robin dns method
  - dedicated database server
 > Hosting service which provides Load Balanced Servers
  - http://www.eliterax.com/server-clusters/
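
The "x number of webservers using rsync" idea can be sketched as follows. This is only a rough illustration, not the setup actually deployed: the host names, the `deploy` user, and the `/var/www/html` docroot are made-up placeholders. It builds one rsync command per webserver so that a master copy of the site is pushed out to every mirror.

```python
import shlex


def build_rsync_commands(docroot, webservers, user="deploy"):
    """Return one rsync argument list per webserver.

    -a preserves permissions and timestamps, -z compresses in transit,
    and --delete removes files on the mirror that no longer exist on
    the master copy.
    """
    # A trailing slash tells rsync to copy the directory contents,
    # not the directory itself.
    source = docroot.rstrip("/") + "/"
    commands = []
    for host in webservers:
        destination = f"{user}@{host}:{source}"
        commands.append(["rsync", "-az", "--delete", source, destination])
    return commands


# Hypothetical hosts; the commands are printed rather than executed.
for command in build_rsync_commands(
    "/var/www/html", ["web1.example.com", "web2.example.com"]
):
    print(shlex.join(command))
```

To actually run the sync, each argument list could be handed to `subprocess.run(command, check=True)`, for example from a cron job on the master server.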

> Load Balancing
 > Apache HTTP Server (mod_proxy_balancer)
  - howto: http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster 
 > Load Balancing Options (DNS and proxy based)
  * Round robin DNS http://en.wikipedia.org/wiki/Round_robin_DNS
  - DnsMadeEasy Round-Robin Load Balancing http://www.dnsmadeeasy.com/s0306/res/recs.html#rr
  - ?? Can we do this same thing, in-house?
  * Apache Load Balancing 
  - mod_proxy_balancer extension
  - apache2-mpm-worker is pretty fast and light if well-configured.  
  * Pound
  - http://www.debian-administration.org/article/Simple_webserver_load_balancing_with_pound 
  * NginX
  - http://en.wikipedia.org/wiki/Nginx
  - http://sameerparwani.com/posts/load-balancing-with-nginx/
  * For Failover
  - heartbeat or something similar would do the failover  
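
To make the round-robin and failover notes above concrete, here is a minimal sketch of the selection logic. It is a toy model under stated assumptions: the class name, method names, and backend names are invented for illustration, and it does not reproduce how mod_proxy_balancer, Pound, or Nginx work internally. Plain round-robin DNS has no health check of its own, so the `mark_down` call stands in for whatever heartbeat or monitoring does the failover.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any marked as down."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._down = set()
        self._rotation = cycle(self._backends)

    def mark_down(self, backend):
        # Called by a health check (e.g. a heartbeat) when a server fails.
        self._down.add(backend)

    def mark_up(self, backend):
        # Called when a failed server passes its health check again.
        self._down.discard(backend)

    def next_backend(self):
        # Look at each backend at most once per call so that a fully
        # failed pool raises an error instead of looping forever.
        for _ in range(len(self._backends)):
            candidate = next(self._rotation)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy backends available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
print([balancer.next_backend() for _ in range(4)])
balancer.mark_down("web2")
print([balancer.next_backend() for _ in range(3)])
```

The second print shows requests flowing only to the remaining servers once `web2` is marked down, which is the behaviour a failover layer is meant to add on top of simple rotation.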

> Database Scaling
 - Phase 1 : Powerful Single-Purpose MySQL Database Server.
 - Phase 2 : MySQL Cluster
  - http://www.mysql.com/products/database/cluster/faq.html

// POTENTIAL OBSTACLES /////////////////////////////////////////////////////////

- Multiple wildcard SSL certs on the same server.
 - josephl: Hydra wants to use their own wildcard SSL certificate rather than the product2web.com one
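
One way to reason about this obstacle is to work out which certificate each hostname would need, since every distinct certificate on a server may require its own IP address when the server or its clients cannot pick a certificate by hostname (SNI). The sketch below is illustrative only: the `hydra.example` domain and the certificate file paths are invented placeholders, and the helper names are hypothetical. It relies on the standard rule that a wildcard covers exactly one left-most label.

```python
def wildcard_matches(pattern, hostname):
    """Return True if a certificate name such as '*.example.com' covers hostname.

    The wildcard stands for exactly one left-most label, so '*.example.com'
    covers 'shop.example.com' but not 'example.com' or 'a.b.example.com'.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard must stand for a non-empty label.
        return host_labels[0] != "" and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def select_certificate(hostname, certificates):
    """Return the certificate path whose pattern covers hostname, or None."""
    for pattern, cert_path in certificates.items():
        if wildcard_matches(pattern, hostname):
            return cert_path
    return None


# Hypothetical mapping of wildcard names to certificate files.
certificates = {
    "*.product2web.com": "/etc/ssl/certs/product2web.crt",
    "*.hydra.example": "/etc/ssl/certs/hydra.crt",
}
for host in ["shop.product2web.com", "cart.hydra.example", "other.example.org"]:
    print(host, "->", select_certificate(host, certificates))
```

Counting the distinct certificates that come out of such a mapping gives a rough idea of how many additional public IP addresses a shared server might need.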
  
// RESEARCH SOURCES ////////////////////////////////////////////////////////////

http://en.wikipedia.org/wiki/Cloud_computing
http://www.rackspacecloud.com/cloud_hosting_products/servers
josephl: http://porteightyeight.com/2008/03/24/the-hitchhikers-guide-to-php-load-balancing/ <--- this is a neat little guide I found that gives a general introduction
Byron: http://www.dnsmadeeasy.com/
http://en.wikipedia.org/wiki/Load_balancing_(computing)
http://en.wikipedia.org/wiki/Round_robin_DNS
http://www.zytrax.com/books/dns/ch9/rr.html (HOWTO - Configure Load Balancing)
http://www.softlayer.com/
http://www.mysql.com/products/database/cluster/faq.html
http://video.google.com/videoplay?docid=-4567104036778249401
http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster
http://www.howtoforge.com/load_balancing_apache_mod_proxy_balancer
http://www.markround.com/archives/33-Apache-mod_proxy-balancing-with-PHP-sticky-sessions.html
http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html


// CLIENTS /////////////////////////////////////////////////////////////////////

> B
 - Has now:
  - 1 server.
  - backing up to 2nd server.
  - DNS failover set up via DNS Made Easy.
 - Needs:
  - PCI compliance, on a separate server.
  - need a PCI server.
  - 5 or 10 CRM carts on each server.
  - 1 server, all landing pages, needs to be fast.
  - 1 server, backend CRM server, needs to be stable.
  - cluster situation, they host their front end landing pages.
 
> H
 - Their own wildcard SSL certificate rather than the product2web.com one.





Joseph Frazier
