stage | group | description | info
---|---|---|---
Systems | Distribution | Recommended deployments at scale. | To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
DETAILS:
Tier: Free, Premium, Ultimate
Offering: Self-managed
The GitLab Reference Architectures have been designed and tested by the GitLab Test Platform and Support teams to provide recommended scalable and elastic deployments as starting points for target loads.
The following Reference Architectures are available as recommended starting points for your environment.
The architectures are named in terms of peak load, based on user count or Requests per Second (RPS), where the latter has been calculated from averaged real data.
NOTE: Each architecture has been designed to be scalable and elastic. As such, they can be adjusted as required by your specific workload. This is likely to be needed in known heavy scenarios, such as large monorepos or notable additional workloads.
For details about what each Reference Architecture has been tested against, see the "Testing Methodology" section of each page.
Below is the list of Linux package-based reference architectures:
Below is a list of Cloud Native Hybrid reference architectures, where select recommended components can be run in Kubernetes:
The first choice to consider is whether a self-managed approach is right for you and your requirements.

Running any application in production is complex, and the same applies to GitLab. While we aim to make this as smooth as possible, general complexities remain. Depending on the design chosen, you typically need to manage all aspects yourself, such as hardware, operating systems, networking, storage, security, and GitLab itself. This includes both the initial setup of the environment and its longer-term maintenance.

As such, it's recommended that you have a working knowledge of running and maintaining applications in production before going down this route. If you aren't in this position, our Professional Services team offers implementation services. For those who want a more managed solution long term, it's recommended to instead explore our other offerings, such as GitLab SaaS or GitLab Dedicated.

If self-managed is the approach you're considering, it's strongly encouraged to read through this page in full, in particular the Deciding which architecture to use, Large monorepos, and Additional workloads sections.
The Reference Architectures are designed to strike a balance between three important factors: performance, resilience, and cost.
While they are designed to make it easier to set up GitLab at scale, it can still be a challenge to know which one meets your requirements and where to start accordingly.
As a general guide, the more performant and/or resilient you want your environment to be, the more complex it is.
This section explains the things to consider when picking a Reference Architecture to start with.
The first thing to check is the expected peak load your environment will need to serve.
Each architecture is described in terms of peak Requests per Second (RPS) or user count load. As detailed under the "Testing Methodology" section on each page, each architecture is tested against its listed RPS for each endpoint type (API, Web, Git), which is the typical peak load of the given user count, both manual and automated.
We strongly recommend finding out the peak RPS your environment is expected to handle across endpoint types, through existing metrics (such as Prometheus) or estimates, and selecting the corresponding architecture, as this is the most objective measure.

If it's not possible to find out the expected peak RPS, we recommend selecting based on user count to start, then monitoring the environment closely to confirm the actual RPS and whether the architecture is performing, and scaling accordingly as necessary.
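For illustration, here is a minimal Ruby sketch of the selection logic described above. It assumes the roughly 20 RPS of total load per 1,000 users implied elsewhere on this page (200 RPS for the 10,000 user architecture); the tier names and cut-offs are illustrative shorthand, not an official sizing API.

```ruby
# Illustrative only: map a measured peak RPS (or a user count as a
# fallback) to the Reference Architecture tier to start from.
# The 20 RPS per 1,000 users ratio mirrors the documented testing
# targets (for example, 200 RPS for the 10,000 user architecture).
TIERS = {
  1_000 => "1k", 2_000 => "2k", 3_000 => "3k", 5_000 => "5k",
  10_000 => "10k", 25_000 => "25k", 50_000 => "50k"
}.freeze

def starting_architecture(peak_rps: nil, user_count: nil)
  # Prefer measured RPS as the most objective input.
  users = peak_rps ? (peak_rps / 20.0 * 1_000) : user_count
  raise ArgumentError, "need peak_rps or user_count" unless users

  tier = TIERS.find { |max_users, _| users <= max_users }
  tier ? tier.last : "50k+ (reach out to Support)"
end

starting_architecture(peak_rps: 120)    # => "10k"
starting_architecture(user_count: 2500) # => "3k"
```

Remember this only gives a starting point: monitor the environment after deployment and scale as the metrics dictate.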
For environments serving 2,000 or fewer users, we generally recommend a standalone approach by deploying a non-highly available single or multi-node environment. With this approach, you can employ strategies such as automated backups for recovery to provide a good level of RPO / RTO while avoiding the complexities that come with HA.
*[RTO]: Recovery time objective
*[RPO]: Recovery point objective
With standalone setups, especially single-node environments, there are various options available for installation and management, including the ability to deploy directly through select cloud provider marketplaces, which reduce the complexity a little further.
High Availability ensures every component in the GitLab setup can handle failures through various mechanisms. However, achieving this is complex, and the environments required can be sizable.

For environments serving 3,000 or more users, we generally recommend an HA strategy, as at this level outages have a bigger impact on more users. All the architectures in this range have HA built in by design for this reason.

As mentioned above, achieving HA does come at a cost. The environment requirements are sizable, as each component needs to be multiplied, which comes with additional infrastructure and maintenance costs.

For a lot of our customers with fewer than 3,000 users, we've found that a backup strategy is sufficient and even preferable. While this does have a slower recovery time, it also means you have a much smaller architecture and lower maintenance costs as a result.
In general then, we'd only recommend you employ HA in the following scenarios:
If you still need to have HA for a lower number of users, this can be achieved with an adjusted 3K architecture.
Zero-Downtime Upgrades are available for standard Reference Architecture environments with HA (Cloud Native Hybrid is not supported). This allows for an environment to stay up during an upgrade, but the process is more complex as a result and has some limitations as detailed in the documentation.
When going through this process it's worth noting that there may still be brief moments of downtime when the HA mechanisms take effect.
In most cases, the downtime required for an upgrade shouldn't be substantial, so zero-downtime upgrades are only recommended if staying up during upgrades is a key requirement for you.
As an additional layer of HA resilience you can deploy select components in Kubernetes, known as a Cloud Native Hybrid Reference Architecture. For stability reasons, stateful components such as Gitaly cannot be deployed in Kubernetes.
This is an alternative and more advanced setup compared to a standard Reference Architecture. Running services in Kubernetes is well known to be complex. This setup is only recommended if you have strong working knowledge and experience in Kubernetes.
With GitLab Geo, you can achieve distributed environments in different regions with a full Disaster Recovery (DR) setup in place. GitLab Geo requires at least two separate environments:
If the primary site becomes unavailable, you can fail over to one of the secondary sites.
This advanced and complex setup should only be undertaken if DR is a key requirement for your environment. You must also make additional decisions on how each site is configured, such as whether each secondary site has the same architecture as the primary, or whether each site is configured for HA.
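As a rough sketch of how the two sites differ at the Linux package level, the primary and secondary sites take different Geo roles in `gitlab.rb`. The hostnames below are placeholders, and a full Geo setup involves considerably more steps, as per the Geo documentation:

```ruby
# /etc/gitlab/gitlab.rb on the primary site's Rails nodes (sketch only).
roles ['geo_primary_role']
external_url 'https://gitlab-primary.example.com' # placeholder hostname

# /etc/gitlab/gitlab.rb on a secondary site's Rails nodes (sketch only):
#   roles ['geo_secondary_role']
#   external_url 'https://gitlab-secondary.example.com'
```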
If you have any large monorepos or significant additional workloads, these can affect the performance of the environment notably and adjustments may be required depending on the context.
If either applies to you, we encourage you to reach out to your Customer Success Manager or our Support team for further guidance.
For all the previously described strategies, you can run select GitLab components on equivalent cloud provider services such as the PostgreSQL database or Redis.
For more information, see the recommended cloud providers and services.
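For example, pointing the Linux package at an external PostgreSQL service (such as RDS or Cloud SQL) is a matter of disabling the bundled database and supplying connection details. A minimal sketch, with placeholder endpoint and credentials:

```ruby
# /etc/gitlab/gitlab.rb (sketch): use an external PostgreSQL service
# instead of the bundled one.
postgresql['enable'] = false

gitlab_rails['db_adapter'] = 'postgresql'
gitlab_rails['db_host'] = 'gitlab-db.example.internal' # placeholder endpoint
gitlab_rails['db_port'] = 5432
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_password'] = 'PASSWORD' # placeholder

# Run `gitlab-ctl reconfigure` afterwards to apply.
```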
Below you can find the above guidance in the form of a decision tree. It's recommended that you read through the above guidance in full first, though.
Before implementing a reference architecture, refer to the following requirements and guidance.
The reference architectures are built and tested across various cloud providers, primarily GCP and AWS, with CPU targets being the lowest common denominator to ensure the widest range of compatibility:
Depending on other requirements, such as memory or network bandwidth, as well as cloud provider availability, different machine types are used throughout the architectures, but the target CPUs above are expected to perform well.
You can select a newer machine type series for improved performance.
Additionally, ARM CPUs are supported for Linux package environments as well as for any Cloud Provider services where applicable.
NOTE: "Burstable" instance types are not recommended due to inconsistent performance.
As a general guidance, most standard disk types are expected to work for GitLab, but be aware of the following specific call-outs:
Outside of the above call-outs, standard disk types are expected to work for GitLab, and the choice of each depends on your specific requirements in areas such as durability or cost.
As general guidance, GitLab should run on most infrastructure, such as reputable cloud providers (AWS, GCP, Azure) and their services, or self-managed infrastructure (such as ESXi), that meets both:
However, this does not constitute a guarantee for every potential permutation.
See Recommended cloud providers and services for more information.
The reference architectures were tested with repositories of varying sizes that follow best practices.
However, large monorepos (several gigabytes or more) can significantly impact the performance of Git and in turn the environment itself. Their presence, as well as how they are used, can put a significant strain on the entire system from Gitaly through to the underlying infrastructure.
WARNING: If this applies to you, we strongly recommend referring to the linked documentation as well as reaching out to your Customer Success Manager or our Support team for further guidance.

As such, large monorepos come with notable cost. If you have such a repository, we strongly recommend following the guidance below to ensure the best chance of good performance and to keep costs in check:
These reference architectures have been designed and tested for standard GitLab setups based on real data.
However, additional workloads can multiply the impact of operations by triggering follow-up actions. You may need to adjust the suggested specifications to compensate if you use, for example:
As a general rule, you should have robust monitoring in place to measure the impact of any additional workloads and to inform any changes that need to be made. We also strongly encourage you to reach out to your Customer Success Manager or our Support team for further guidance.
The Reference Architectures make use of up to two Load Balancers depending on the class:
The specifics of which load balancer to use, and its exact configuration, are beyond the scope of the GitLab documentation. The most common options are to set up load balancers on machine nodes or to use a service such as one offered by cloud providers. If you're deploying a Cloud Native Hybrid environment, the Charts can handle the setup of the external load balancer through Kubernetes Ingress.
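One common interaction between the external load balancer and GitLab itself is TLS termination. If you terminate TLS at the load balancer and proxy plain HTTP to the Rails nodes, the bundled NGINX must be told not to expect HTTPS itself. A minimal sketch, with a placeholder hostname:

```ruby
# /etc/gitlab/gitlab.rb (sketch): the external URL stays HTTPS, but TLS
# is terminated at the external load balancer, so the bundled NGINX
# listens on plain HTTP behind it.
external_url 'https://gitlab.example.com' # placeholder

nginx['listen_port'] = 80
nginx['listen_https'] = false
```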
For each Reference Architecture class, a base machine size has been given to help get you started if you elect to deploy directly on machines, but this may need to be adjusted depending on the load balancer used and the amount of workload. Of note, machines can have varying network bandwidth, which should also be taken into consideration.
Note the following sections of additional guidance for Load Balancers.
We recommend that a least-connection-based load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance.
We don’t recommend the use of round-robin algorithms as they are known to not spread connections equally in practice.
The total network bandwidth available to a load balancer when deployed on a machine can vary notably across cloud providers. In particular, some cloud providers, like AWS, may operate on a burst system with credits to determine the bandwidth at any given time.
The network bandwidth your environment's load balancers will require is dependent on numerous factors such as data shape and workload. The recommended base sizes for each Reference Architecture class have been selected based on real data but in some scenarios, such as consistent clones of large monorepos, the sizes may need to be adjusted accordingly.
Swap is not recommended in the reference architectures. It's a failsafe that impacts performance greatly. The reference architectures are designed to have enough memory in most cases to avoid needing swap.
Praefect requires its own database server, and to achieve full High Availability, a third-party PostgreSQL database solution is required.
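A minimal sketch of what a dedicated, non-HA Linux package PostgreSQL node for Praefect might look like follows; the CIDR is a placeholder, and the full steps (including creating the `praefect` database user) are in the Gitaly Cluster documentation:

```ruby
# /etc/gitlab/gitlab.rb (sketch) on a dedicated database node for
# Praefect. Run only PostgreSQL on this node.
roles ['postgres_role']

# Listen for connections from the Praefect nodes.
postgresql['listen_address'] = '0.0.0.0'
# Allow the Praefect nodes' subnet to authenticate (placeholder CIDR).
postgresql['md5_auth_cidr_addresses'] = %w[10.6.0.0/24]

# Create the `praefect` database user and database manually afterwards,
# per the Gitaly Cluster documentation.
```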
We hope to offer a built-in solution for these restrictions in the future. In the meantime, a non-HA PostgreSQL server can be set up using the Linux package as the specifications reflect. Refer to the following issues for more information:
NOTE: The following lists are non-exhaustive. Generally, other cloud providers not listed here likely work with the same specs, but this hasn't been validated. Additionally, when it comes to other cloud provider services not listed here, it's advised to be cautious as each implementation can be notably different and should be tested thoroughly before production use.
Through testing and real-life usage, the Reference Architectures are recommended on the following cloud providers:
Reference Architecture | GCP | AWS | Azure | Bare Metal
---|---|---|---|---
Linux package | 🟢 | 🟢 | 🟢1 | 🟢
Cloud Native Hybrid | 🟢 | 🟢 | |
Additionally, the following cloud provider services are recommended for use as part of the Reference Architectures:
Cloud Service | GCP | AWS | Azure | Bare Metal
---|---|---|---|---
Object Storage | 🟢 Cloud Storage | 🟢 S3 | 🟢 Azure Blob Storage | 🟢 MinIO
Database | 🟢 Cloud SQL1 | 🟢 RDS | 🟢 Azure Database for PostgreSQL Flexible Server |
Redis | 🟢 Memorystore | 🟢 ElastiCache | 🟢 Azure Cache for Redis (Premium) |
When selecting to use an external database service, it should run a standard, performant, and supported version.
If you choose to use a third party external service:
When selecting to use an external Redis service, it should run a standard, performant, and supported version. Note that Redis specifically must not be run in Cluster mode, as this is unsupported by GitLab.
Redis is primarily single threaded. For environments targeting up to 200 RPS / 10,000 users or higher, separate out the instances as specified into Cache and Persistent data to achieve optimum performance at this scale.
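A minimal sketch of pointing GitLab at external, non-clustered Redis with the Cache and Persistent classes separated as described above; endpoints and passwords are placeholders, and the exact key names should be checked against the Linux package documentation for your version:

```ruby
# /etc/gitlab/gitlab.rb (sketch): disable bundled Redis and use two
# external, non-clustered Redis instances: one for cache data, one for
# persistent data.
redis['enable'] = false

# Cache data (can tolerate eviction).
gitlab_rails['redis_cache_instance'] = 'redis://:CACHE_PASSWORD@gitlab-redis-cache.example.internal:6379'

# Persistent data (queues, shared state) on the default connection.
gitlab_rails['redis_host'] = 'gitlab-redis-persistent.example.internal'
gitlab_rails['redis_port'] = 6379
gitlab_rails['redis_password'] = 'PERSISTENT_PASSWORD' # placeholder
```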
GitLab has been tested against various Object Storage providers that are expected to work.
As a general guidance, it's recommended to use a reputable solution that has full S3 compatibility.
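A minimal sketch of the consolidated object storage configuration against an S3-compatible store such as MinIO; the endpoint, credentials, and bucket names are placeholders:

```ruby
# /etc/gitlab/gitlab.rb (sketch): consolidated object storage against an
# S3-compatible service.
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',                 # S3-compatible APIs use the AWS provider
  'region' => 'us-east-1',
  'aws_access_key_id' => 'ACCESS_KEY', # placeholder
  'aws_secret_access_key' => 'SECRET_KEY',
  'endpoint' => 'https://minio.example.internal:9000', # placeholder endpoint
  'path_style' => true                 # commonly required for MinIO
}
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gitlab-artifacts'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gitlab-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gitlab-uploads'
```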
Several database cloud provider services are known not to support the above or have been found to have other issues and aren't recommended:
As a general guideline, the further away you move from the reference architectures, the harder it is to get support for it. With any deviation, you're introducing a layer of complexity that adds challenges to finding out where potential issues might lie.
The reference architectures use the official Linux packages or Helm Charts to install and configure the various components. The components are installed on separate machines (virtualized or bare metal), with machine hardware requirements listed in the "Configuration" column and equivalent VM standard sizes listed in GCP/AWS/Azure columns of each available reference architecture.
Running components on Docker (including Docker Compose) with the same specs should be fine, as Docker is well known in terms of support. However, it is still an additional layer and may add some support complexities, such as not being able to run `strace` easily in containers.
While we endeavour to support a good range of GitLab environment designs, there are certain approaches we know definitively not to work, and as a result they are not supported. Those approaches are detailed in the following sections.
Running stateful components in Kubernetes, such as Gitaly Cluster, is not supported.
Gitaly Cluster is only supported on conventional virtual machines. Kubernetes enforces strict memory restrictions, but Git memory usage is unpredictable, which can cause sporadic OOM termination of Gitaly pods, leading to significant disruptions and potential data loss. For this reason and others, Gitaly is not tested or supported in Kubernetes. For more information, see epic 6127.
This also applies to other third-party stateful components, such as PostgreSQL and Redis, but you can explore other third-party solutions for those components if desired, such as supported cloud provider services, unless they are specifically called out as unsupported.
As a general guidance, only stateless components of GitLab can be run in Autoscaling groups, namely GitLab Rails and Sidekiq. Other components that have state, such as Gitaly, are not supported in this fashion (for more information, see issue 2997).
This also applies to other third-party stateful components, such as PostgreSQL and Redis, but you can explore other third-party solutions for those components if desired, such as supported cloud provider services, unless they are specifically called out as unsupported.
However, Cloud Native Hybrid setups are generally preferred over ASGs, because certain components, such as database migrations and Mailroom, can only be run on one node, which is handled better in Kubernetes.
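If you do run the stateless Rails nodes in an autoscaling group, each instance's configuration must avoid node-local behavior; in particular, only one node should run database migrations. A minimal sketch of the relevant settings, assuming all stateful services are already configured externally:

```ruby
# /etc/gitlab/gitlab.rb (sketch) for a stateless Rails node in an
# autoscaling group. All stateful services live elsewhere.
roles ['application_role']

# Don't run database migrations automatically on scale-up; run them once
# from a designated deploy node instead.
gitlab_rails['auto_migrate'] = false
```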
Deploying one GitLab environment over multiple data centers is not supported due to potential split-brain edge cases if a data center were to go down. Several components of the GitLab setup, namely Consul, Redis Sentinel, and Praefect, require an odd-numbered quorum to function correctly, and splitting over multiple data centers can impact this notably.
For deploying GitLab over multiple data centers or regions we offer GitLab Geo as a comprehensive solution.
The Test Platform team does regular smoke and performance tests for the reference architectures to ensure they remain compliant.
The Quality Department focuses on measuring and improving the performance of GitLab, as well as creating and validating reference architectures that self-managed customers can rely on as performant configurations.
For more information, see our handbook page.
Testing occurs against all reference architectures and cloud providers in an automated and ad-hoc fashion. This is done by two tools:
Network latency between components on the test environments was measured at <5 ms on all cloud providers. This is shared as an observation and not as an implicit recommendation.
We aim to have a "test smart" approach, where the architectures tested have a good range that can also apply to others. Testing focuses on the 10k Linux package installation on GCP, as testing has shown it is a good bellwether for the other architectures and cloud providers, as well as for Cloud Native Hybrids.
The Standard Reference Architectures are designed to be platform-agnostic, with everything being run on VMs through the Linux package. While testing occurs primarily on GCP, ad-hoc testing has shown that they perform similarly on hardware with equivalent specs on other Cloud Providers or if run on premises (bare-metal).
Testing on these reference architectures is performed with the GitLab Performance Tool at specific coded workloads, and the throughputs used for testing are calculated based on sample customer data. Select the reference architecture that matches your scale.
Each endpoint type is tested with the following number of requests per second (RPS) per 1,000 users:
The above RPS targets were selected based on real customer data of total environmental loads corresponding to the user count, including CI and other workloads.
NOTE: Read our blog post on how our QA team leverages GitLab's performance testing tool.
Testing is done publicly, and all results are shared.
The following table details the testing done against the reference architectures along with the frequency and results. Additional testing is continuously evaluated, and the table is updated accordingly.
Reference Architecture | GCP Linux package* | GCP Cloud Native Hybrid* | AWS Linux package | AWS Cloud Native Hybrid | Azure Linux package
---|---|---|---|---|---
1k | Weekly | | | |
2k | Weekly | Planned | | |
3k | Weekly | Weekly | | |
5k | Weekly | | | |
10k | Daily | Weekly | Weekly | Weekly |
25k | Weekly | | | |
50k | Weekly | | | |

\* GCP results also serve as a proxy for Bare-Metal.
As a starting point, the following table lists initial compute calculator cost templates for the different reference architectures across GCP, AWS, and Azure via each cloud provider's official calculator.
However, please be aware of the following caveats:
To get an accurate estimate of costs for your specific environment, take the closest template and adjust it to match your specs and expected usage.
Reference Architecture | GCP Linux package | AWS Linux package | Azure Linux package
---|---|---|---
1k | Calculated cost | Calculated cost | Calculated cost
2k | Calculated cost | Calculated cost | Calculated cost
3k | Calculated cost | Calculated cost | Calculated cost
5k | Calculated cost | Calculated cost | Calculated cost
10k | Calculated cost | Calculated cost | Calculated cost
25k | Calculated cost | Calculated cost | Calculated cost
50k | Calculated cost | Calculated cost | Calculated cost
Maintaining a Reference Architecture environment is generally the same as maintaining any other GitLab environment, and is generally covered in other sections of this documentation.
In this section you'll find links to documentation for relevant areas as well as any specific Reference Architecture notes.
The Reference Architectures have been designed as a starting point and are elastic and scalable throughout. It's more likely than not that you'll want to adjust the environment for your specific needs after deployment, for reasons such as additional performance capacity or reduced costs. This is expected, and scaling can be done iteratively or wholesale to the next architecture size, depending on whether metrics suggest a component is being exhausted.
NOTE: If you're seeing a component continuously exhaust its given resources, it's strongly recommended that you reach out to our Support team before performing any scaling. This is especially so if you're planning to scale any component significantly.
For most components, vertical and horizontal scaling can be applied as normal. However, before doing so, please be aware of the below caveats:
Conversely, if you have robust metrics in place that show the environment is over-provisioned, you can scale downwards similarly. You should take an iterative approach when scaling downwards, however, to ensure there are no issues.
In some cases, scaling a component significantly may have knock-on effects for downstream components, impacting performance. The Reference Architectures were designed with balance in mind, to ensure components that depend on each other are congruent in terms of specs. As such, you may find that notably scaling a component results in additional throughput being passed to the components it depends on, and that they, in turn, may need to be scaled as well.
NOTE: The Reference Architectures have been designed to have elasticity to accommodate an upstream component being scaled. However, it's still generally recommended for you to reach out to our Support team before you make any significant changes to the environment to be safe.
The following components can impact others when they have been significantly scaled:
While in most cases vertical scaling is only required to increase an environment's resources, if you are moving to an HA environment, additional steps are required for the following components to switch over to their HA forms, by following the given documentation for each:
Upgrades for a Reference Architecture environment are the same as for any other GitLab environment. The main Upgrade GitLab section has detailed steps on how to approach this.
Zero-downtime upgrades are also available.
NOTE: You should upgrade a Reference Architecture in the same order as you created it.
There are numerous options available to monitor your infrastructure, as well as GitLab itself, and you should refer to your selected monitoring solution's documentation for more information.
Of note, the GitLab application is bundled with Prometheus, as well as various Prometheus-compatible exporters that can be hooked into your solution.
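For example, if you run your own Prometheus server, the bundled exporters can be exposed for scraping; a minimal sketch, with placeholder addresses:

```ruby
# /etc/gitlab/gitlab.rb (sketch): expose bundled exporters to an
# external Prometheus server.
node_exporter['listen_address'] = '0.0.0.0:9100'
gitlab_workhorse['prometheus_listen_addr'] = '0.0.0.0:9229'

# Allow the monitoring hosts to scrape the Rails health and metrics
# endpoints (placeholder CIDRs).
gitlab_rails['monitoring_whitelist'] = ['10.0.0.0/8', '127.0.0.0/8']
```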
Below is a history of notable updates for the Reference Architectures (2021-01-01 onward, ascending order), which we aim to keep updated at least once per quarter.
You can find a full history of changes on the GitLab project.
2024:
- Renamed the `Cost to run` section to `Cost calculator templates` to better reflect that the calculators are only a starting point and need to be adjusted with specific usage to give more accurate cost estimates.

2023:

2022:
- `15.6` onwards.
- `500`.
- `20`.
- `15.6`.
- `default` section from Gitaly storages config as it's required.

2021:
- `external_url` setting.
setting.此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。