I'm reaching the end of an evaluation period with Atlassian Stash. So far it has fared significantly better than the alternatives, so I'm starting to look into what sort of hardware we'd need to deploy it.
I'm not particularly familiar with the criteria that need to be factored in, but I need to give some guidance to our IT department. I'm hoping I can get some help here from other users and from any Atlassian employees kicking around (should I be contacting sales for this sort of help?).
I've been looking at the https://confluence.atlassian.com/display/STASH/Scaling+Stash page and, while it has some useful guidelines, it's not very specific.
Overview of what we're looking to deploy: We're looking at over a hundred developers. We'll be starting with a few Git repositories (20 or 30) and slowly migrating and adding new ones, which could eventually bring us to hundreds of repositories (potentially over 1,000 if we did a full migration). The vast majority are under 10MB but a few go up to the 300-400MB range.
Disk space for the Git repos is relatively easy to estimate based on average size per commit and the commit rate, but the rest is harder.
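For anyone wanting to do the same disk estimate, here is a minimal sketch of the arithmetic. The function name and every input value are hypothetical placeholders, not measurements from this thread, and it assumes storage grows linearly with the number of commits.

```python
def estimate_disk_mb(initial_mb, commits_per_day, avg_commit_kb, days):
    """Projected repository storage in MB after `days` days.

    Assumes linear growth: every commit adds `avg_commit_kb` on average.
    """
    growth_mb = commits_per_day * avg_commit_kb * days / 1024
    return initial_mb + growth_mb


# Hypothetical inputs: 30 repositories averaging 10 MB each today,
# 200 commits per day across all of them, 20 KB added per commit.
projected = estimate_disk_mb(
    initial_mb=30 * 10,
    commits_per_day=200,
    avg_commit_kb=20,
    days=365,
)
print(f"Projected size after one year: {projected:.0f} MB")
```

Plugging in your own commit rate and average commit size gives a starting figure; leave headroom on top for backups and for whatever else the server stores.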
For memory and CPU I'm guessing the most important figure is how many concurrent clone operations we need to support and what distribution of sizes they are? Does anyone have any experience of how many concurrent operations you tend to get based on company size? I know how many developers we have but I've no idea how to translate that into activity rates on the server.
Any help would be greatly appreciated.
Thanks
Hi Andrew,
For memory and CPU I'm guessing the most important figure is how many concurrent clone operations we need to support and what distribution of sizes they are?
Yes, that is correct. Clones tend to be expensive both CPU and memory wise.
Does anyone have any experience of how many concurrent operations you tend to get based on company size?
Unfortunately, company size or the number of users on the system is not a good proxy for estimating resource usage. With heavy CI usage in particular, a system with just a handful of users could easily be under more load than a system with hundreds of users but barely any CI. It further depends on how CI is configured (i.e. whether build agents need a full clone vs. a shallow clone or just a fetch).
For heavy CI usage with a lot of clones we recommend the caching plugin documented here: https://confluence.atlassian.com/display/STASH/Scaling+Stash+for+Continuous+Integration+performance
To come back to your question and give you a rough idea of the numbers, we have one internal system with the following specs:
This machine handles roughly 3500 git operations per hour. Unfortunately the average is not very useful, because it is the load peaks that determine overall performance (for the timeframe I'm looking at we saw a peak of 11000 git ops per hour, with the majority being ref advertisements). This is largely due to the large number of builds that run against this machine.
The number of concurrent clone operations is roughly between 14 and 30. The number of concurrent fetch operations is around 40. For both, the variance is fairly high.
It's important to point out that the cache plugin is essential for this type of configuration to handle the load and the spikes we see.
Please don't take the absolute numbers too seriously; usage profiles vary a lot between companies and teams, and even more once you add CI to the mix.
Hope this helps.
Cheers,
Stefan
Thanks for that, some good information in there.
One thing in your answer confuses me though, Stefan: you're saying an 8 core machine (4 with hyperthreading) is handling 14 to 30 concurrent clone operations, but I thought the default was to allow only 1.5 * core count, so 12 on that machine? Are you saying that the throttle.resource.scm-hosting value can generally be increased to a higher multiple, or is that due to the cache plugin?
I'm not actually sure how to use HTTP(S), as it requires entering the password every time. The credential helper could help end users, but I'm not sure how to use it for a build account running in the background, as its only permanent storage option on Linux is plain text. Is there any way to get anonymous clone access for a specific user? The only mention I can see is an open ticket (STASH-2565). I can of course use SSH, but if that's a performance issue...
The higher level of concurrency is due to the cache plugin. Any clone or ref advertisement that is served from the cache does not count against the throttle.resource.scm-hosting limits. Those clones are considered to be nearly 'free' in terms of memory and CPU usage.
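To make the arithmetic concrete, here is a rough sketch using the figures quoted in this thread (a default of 1.5 * core count, 8 cores, 14 to 30 concurrent clones). The function names are hypothetical, rounding down is an assumption, and this is not how Stash itself computes or exposes the limit.

```python
def default_throttle_limit(cores, per_core=1.5):
    """Concurrent hosting operations allowed, per the 1.5 * cores figure
    quoted in the thread. Rounding down is an assumption."""
    return int(cores * per_core)


def min_cache_served(observed_concurrent, limit):
    """Lower bound on operations that cannot have counted against the
    throttle, and so were presumably served from the cache."""
    return max(0, observed_concurrent - limit)


limit = default_throttle_limit(8)
print(f"throttle limit: {limit}")
for observed in (14, 30):
    print(f"{observed} concurrent clones -> at least "
          f"{min_cache_served(observed, limit)} bypassed the throttle")
```

In other words, any concurrency seen above the limit has to come from operations that did not count against throttle.resource.scm-hosting.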
With regard to HTTP(S) and authentication: we don't yet have a mechanism that would allow you to clone anonymously. STASH-2565 is the issue you'd want to watch for that one, and/or perhaps https://jira.atlassian.com/browse/STASH-2722. If you're using a build server (Bamboo, Jenkins, etc.) you can configure the credentials in the build server.
Switching to SSH is an option, but the performance will suffer a bit. Also note that if you're planning to use the cache plugin, it only supports HTTP(S) at the moment.
Hi, clones don't and shouldn't happen that often; if they do, things can get slower. Most of the time you'd pull and/or push. A clone only occurs when a new developer starts on a project or is moved onto one, etc. I don't see it being the norm.
Based on that, I'd go with an Intel Xeon E3 (which has 4 cores) and up to 32GB of RAM.
However, if you really do anticipate a lot of clones and their speed is vital, go with an Intel Xeon E5, which has up to 8 cores per CPU and supports dual CPUs.
Thanks for that.
Quick question: Why so much RAM? My understanding is that Stash uses about a gig itself, plus 1.5 times the size of the repo for each clone. Even with the system overhead, 32GB seems a lot if you've got a maximum of 12 clones in operation at the same time.
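As a rough worked version of that estimate: the sketch below takes the figures from this thread (about 1GB for Stash itself, 1.5 times the repo size per clone, 12 concurrent clones) and assumes the worst case where every clone hits a 400MB repository. The function name is hypothetical, and the 1.5x rule is my understanding rather than a confirmed figure.

```python
def peak_memory_gb(base_gb, repo_mb, concurrent_clones, clone_factor=1.5):
    """Rough peak memory in GB: the base footprint plus
    clone_factor * repo size for every clone running at once."""
    clone_mb = concurrent_clones * clone_factor * repo_mb
    return base_gb + clone_mb / 1024


# Worst case assumed here: all 12 clones hit the largest (400 MB) repo.
worst_case = peak_memory_gb(base_gb=1, repo_mb=400, concurrent_clones=12)
print(f"Worst-case estimate: {worst_case:.1f} GB")
```

That comes out at roughly 8GB before operating system overhead, which is why 32GB looks generous to me.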
"Up to" 32GB; you don't need the full amount. I was just quoting what it can support in the future. You can probably get by with a lot less.