The Tech Behind HubSpot
May 26, 2011, BostInnovation
by Kevin McCarthy
As a technical person, I am always curious about how Boston companies do what they do technically. So we here at BostInno are starting a series called “The Tech Behind,” covering everything from engineering and deployment strategies to hardware at Boston’s hottest companies. For the inaugural post, we had the honor of covering HubSpot. Yoav Shapira, HubSpot’s VP of engineering, was kind enough to give us the lowdown on the tech behind HubSpot.
Background
HubSpot was founded in 2006 and has over 200 employees. Most recently, the company raised an additional $32 million in funding.
Currently, HubSpot has 35 developers, some QA engineers, a handful of non-developer product managers, and a small IT group. All told, there are about 60 people in the R&D group. They keep a dev blog here.
Product breakdown
HubSpot’s product is a collection of web applications assembled into a suite. Many of these applications have point competitors, but HubSpot doesn’t try to be the absolute best in every area. They offer simpler technology for their customers (over 4,500 companies and growing quickly), whose feedback indicates that they find leading-edge tools very complex, hard to understand, and hard to use.
There are 3 broad areas of their product:
1. Content Management System: The tools within the CMS (the blog engine and SEO tools like KeywordGrader and LinkGrader) were the first developed at HubSpot. These are written in C#, using the ASP.NET framework, running on IIS web servers, and using the SQL Server database. Because the CMS incorporates the full Microsoft stack, these apps run in the Rackspace Cloud. The main reason they continue to host the CMS on Rackspace is cost. Another reason is that Rackspace has been reliable and stable, and was able to help the organization in stressful situations (e.g. DDoS attacks), a kind of service Amazon doesn’t offer. Some figures: customers publish about 10,000 new blog posts every month, and almost 500,000 in total.
2. ‘Middle of the funnel’ tools: This area of the product includes software like lead management, email marketing, the analytics engine and reports, social media tools, and more. These are written in a mix of Python and Java, with more Java than Python. “The fact we have more Java than Python is as much due to historical reasons as anything else,” says Shapira. These tools use MySQL and Apache, and run on Amazon EC2 (both QA and production). Some figures: the leads API serves about 500K requests per day, and customers have collected more than 5.7M leads in aggregate.
3. Grader.com tools: These tools include software like WebsiteGrader and TwitterGrader. They are mostly written in PHP, with some Python as well. Over time, more and more of these tools will be rewritten in Python in an effort to phase out PHP. Phasing out PHP will make the code base easier to manage (deploying scripts is easier in one language than in two).
Server side languages
Most of HubSpot’s server-side code is in Java. A minority of it is still in C# and Python. Java is used primarily for three reasons, although none of the reasons pertain to the quality of the language:
1. Availability of libraries: “Many of the tools we rely on are written in Java, and are easier to run, operate, and/or extend if you work in Java as well,” says Shapira.
2. Availability of talent: finding great Java programmers tends to be easier than finding great C# programmers.
3. Fit for needs: HubSpot’s product is far more data-intensive (server side) than front-end dependent. As such, Java is preferred.
Web servers
HubSpot primarily uses Apache httpd as a front-end web server and load balancer. The Python apps run on Apache + Django, whereas the Java apps run on Tomcat behind Apache. “We try to use the latest stable versions of these servers where possible, but our code is such that we don’t rely on version-specific code that often,” says Shapira. “It’s pretty portable, and in most cases a minor version difference doesn’t mean much to us.”
Each production web application has at least two web servers for redundancy.
Databases
HubSpot’s databases are mostly in MySQL, varying between versions 5.5 (for the newer products) and 5.1 (for the older ones). Some of the CMS tools primarily talk to SQL Server. “Again, we try to be on the latest stable versions,” says Shapira, “but we are cautious of security and performance issues that sometimes make quick upgrades dangerous.”
Operating Systems
For the Amazon EC2 boxes, HubSpot runs CentOS. Obviously, Windows Server is used for the few C# app servers. “We’re not super-picky and we don’t customize the OS much,” says Shapira.
Hardware
All the setup for HubSpot’s new Amazon EC2 instances is done automatically via Puppet. A dedicated team, called the “Q Team” after the character in the James Bond films, creates, updates, and maintains these tools. Their goal is to make the rest of the HubSpot dev team far more productive and scalable while also keeping the infrastructure secure and optimal.
HubSpot currently has about 350 EC2 server instances, adding one every other day on average.
Deployment Strategy
“We’d love to have continuous deployment, but we’re not there yet,” says Shapira. “We have fairly nice scripts, built by the same ‘Q Team’ mentioned above, which do a bunch of deployment tasks. They gradually deploy to one node at a time, taking nodes off and on the load-balancers as needed, checking our continuous integration servers for test status, rolling back, and all that sort of stuff.
“It’s really nice, and it enabled us to do more than 1,000 deploys last month.” (more here.)
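The node-at-a-time approach Shapira describes can be illustrated with a minimal sketch. This is not HubSpot’s actual script: the function name `rolling_deploy` and the callbacks `install`, `set_in_rotation`, `is_healthy`, and `ci_is_green` are hypothetical stand-ins for whatever the Q Team’s tooling really does, and the rollback policy (revert every node already updated) is an assumption.

```python
from typing import Callable, List


def rolling_deploy(
    nodes: List[str],
    new_version: str,
    old_version: str,
    install: Callable[[str, str], None],          # hypothetical: install(node, version)
    set_in_rotation: Callable[[str, bool], None],  # hypothetical load-balancer hook
    is_healthy: Callable[[str], bool],             # hypothetical post-install check
    ci_is_green: Callable[[], bool],               # hypothetical CI status check
) -> bool:
    """Deploy new_version one node at a time; roll back on the first failure.

    Returns True if every node ends up on new_version, False otherwise.
    """
    if not ci_is_green():
        return False  # never start a deploy on a broken build

    updated: List[str] = []
    for node in nodes:
        set_in_rotation(node, False)     # take the node off the load balancer
        install(node, new_version)
        if is_healthy(node):
            set_in_rotation(node, True)  # back in rotation before the next node
            updated.append(node)
            continue

        # Assumed policy: restore the failed node and every node already updated.
        for target in [node] + updated:
            set_in_rotation(target, False)
            install(target, old_version)
            set_in_rotation(target, True)
        return False
    return True
```

Because only one node is out of rotation at any moment, an application with at least two web servers keeps serving traffic throughout the deploy.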
Build System and Policies
“We are big fans of continuous integration,” says Shapira. “We’ve been using CI for years now. We use Hudson as our continuous integration server, and we have two clusters of them. All the machines run on EC2.
“One cluster runs unit tests all the time for all our code. If a unit test fails, the build is ‘broken,’ and a flashing red light goes on in the dev area (literally). If your code depends on something broken, you can’t ship (read: you’re blocked) until the build is clear again. This causes people to go harass each other, which is good, because nothing is as effective as peer pressure to drive product quality up.
“The second cluster runs user-level tests, for which we use Selenium primarily, to automate tests at the webapp level. These tests are pretty awesome and are worth their own blog post, most likely. We have a couple of engineers who spend most of their time improving and maintaining this Selenium-based testing framework.”
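The “you can’t ship while something you depend on is broken” rule from the first cluster can be sketched as a walk over a dependency graph. The project names, the `depends_on` mapping, and the choice to follow dependencies transitively are all assumptions made for illustration; the article does not say how HubSpot tracks this.

```python
from typing import Dict, Iterable, Set


def blocked_by(
    project: str,
    depends_on: Dict[str, Iterable[str]],
    broken: Set[str],
) -> Set[str]:
    """Return the broken builds that `project` depends on.

    The project itself counts, and dependencies are followed transitively.
    """
    seen: Set[str] = set()
    stack = [project]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(current)
        stack.extend(depends_on.get(current, ()))
    return seen & broken


def can_ship(
    project: str,
    depends_on: Dict[str, Iterable[str]],
    broken: Set[str],
) -> bool:
    """A project may ship only when nothing it depends on is broken."""
    return not blocked_by(project, depends_on, broken)
```

Returning the set of offending builds, rather than just a yes or no, tells a blocked developer exactly whom to go harass.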
Interesting Problems and Solutions
With 4,500 current customers, management of data can be a challenge. “We ask our customers to install a small piece of JavaScript on their web sites, much like Google Analytics and many other companies,” says Shapira. “Once that’s installed, we track information about every page view on that web site, until it’s removed. That’s tens of thousands of web sites, each with hundreds (or more) web pages, every page view, forever essentially. It’s a lot of data. (Terabytes of it.) We went from a simple database storage and SQL analysis system, over time, to a fancy parallel Hadoop analytics cluster with dozens of machines on EC2, as well as using Hive for some queries on top of Hadoop, to keep the processing scalable and fast.”
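To give a feel for the kind of job a Hadoop cluster parallelizes, here is a toy, single-process sketch of the map, shuffle, and reduce phases applied to page-view counting. It uses no Hadoop or Hive API, and the record layout (a site and a page path) is a hypothetical simplification, not HubSpot’s actual schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

PageView = Tuple[str, str]  # (site, page path): a hypothetical record layout
Key = Tuple[str, str]


def map_phase(views: Iterable[PageView]) -> Iterator[Tuple[Key, int]]:
    """Emit one ((site, path), 1) pair per page view."""
    for site, path in views:
        yield (site, path), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[int]]) -> Dict[Key, int]:
    """Sum the counts collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_page_views(views: Iterable[PageView]) -> Dict[Key, int]:
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(views)))
```

Each phase works on one record or one key at a time, which is what lets a real cluster spread the map and reduce work across dozens of machines.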
Another issue HubSpot ran into was the management of its web services, both internally built (a service-oriented architecture) and externally built (the web 2.0 APIs like Twitter, Facebook, BackType, and more). The external API figures are fascinating: HubSpot processes about 6MM tweets per day, which works out to about 4,200 tweets per minute or roughly 70 tweets per second. “When you use a bunch of APIs you run into assorted mashup-related operations, reliability, and data consistency problems,” says Shapira. “We built an internal system that monitors these calls and checks against developer-declared SLAs for these APIs, providing a real time dashboard for service calls, error rates, error logging, and more. It’s called ‘Hydra’ and we might open-source it one day.”
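Hydra’s internals are not public, so the following is only a guess at the general idea of checking service calls against developer-declared SLAs. Every name here (`Sla`, `CallStats`, `SlaMonitor`) and both SLA fields are hypothetical, as is the rule that a single slow call counts as a violation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sla:
    """A developer-declared SLA for one API (hypothetical fields)."""
    max_error_rate: float  # e.g. 0.05 means at most 5% of calls may fail
    max_latency_ms: float  # per-call latency limit


@dataclass
class CallStats:
    calls: int = 0
    errors: int = 0
    slow: int = 0


class SlaMonitor:
    """Accumulates call outcomes per API and reports SLA violations."""

    def __init__(self, slas: Dict[str, Sla]) -> None:
        self.slas = slas
        self.stats: Dict[str, CallStats] = {name: CallStats() for name in slas}

    def record(self, api: str, latency_ms: float, ok: bool) -> None:
        """Record one call to a declared API."""
        stats = self.stats[api]
        stats.calls += 1
        if not ok:
            stats.errors += 1
        if latency_ms > self.slas[api].max_latency_ms:
            stats.slow += 1

    def error_rate(self, api: str) -> float:
        stats = self.stats[api]
        return stats.errors / stats.calls if stats.calls else 0.0

    def violations(self) -> List[str]:
        """Sorted names of APIs whose error rate exceeds the SLA or that had a slow call."""
        return sorted(
            api
            for api, sla in self.slas.items()
            if self.error_rate(api) > sla.max_error_rate or self.stats[api].slow > 0
        )
```

A dashboard like the one Shapira mentions could poll `violations()` and `error_rate()` periodically; a production system would also need time windows and persistent error logs, which this sketch leaves out.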
Developer Challenge
In January, HubSpot announced a developer referral program that would give you $10,000 if you successfully referred a developer to HubSpot as an employee. “We got a lot of referrals from people inside and outside HubSpot, including people who were in our network but inactive prior to the bounty announcement,” says Shapira. “And most of the referrals were good, easily beating out candidates we got from recruiters or other channels.”
Hopefully you now have a detailed picture of how HubSpot does what it does technically. A big thanks to Yoav for his help in putting this piece together. His honesty and transparency are what make a piece like this interesting and helpful. Thanks also to Aaron White of Boundless Learning for asking some thoughtful questions along the way.