| stage | group | info |
| --- | --- | --- |
| Systems | Distribution | To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments |
DETAILS:
**Tier:** Free, Premium, Ultimate
**Offering:** Self-managed
With zero-downtime upgrades, it's possible to upgrade a live GitLab environment without having to take it offline. This guide will take you through the core process of performing such an upgrade.
At a high level, this process works by sequentially upgrading GitLab nodes in a certain order, using a combination of load balancing, HA systems, and graceful reloads to minimize disruption.
This guide pertains only to the core GitLab components where applicable. For upgrades or management of third-party services, such as AWS RDS, refer to the respective documentation.
Achieving true zero downtime as part of an upgrade is notably difficult for any distributed application. The process detailed in this guide has been tested against our HA Reference Architectures and was found to result in effectively no observable downtime, but be aware that your mileage may vary depending on the specific makeup of your system.
For additional confidence, some customers have found success with further techniques, such as manually draining nodes via specific load balancer or infrastructure capabilities. These techniques depend greatly on the underlying infrastructure and as a result are not covered in this guide. For any additional information, reach out to your GitLab representative or the Support team.
The zero-downtime upgrade process has the following requirements:

- A multi-node GitLab environment with a load balancer configured to perform health checks against the Readiness (`/-/readiness`) endpoint.
- Upgrading one minor release at a time. For example, from 16.1 to 16.2, not to 16.3. If you skip releases, database modifications may be run in the wrong sequence and leave the database schema in a broken state.

In addition to the above, please be aware of the following considerations:

- Upgrading from a patch release to the next minor release is generally safe even if a newer patch exists. For example, upgrading from 16.3.2 to 16.4.1 should be safe even if 16.3.3 has been released. You should verify the version-specific upgrading instructions relevant to your upgrade path and be aware of any required upgrade stops.
- Some releases include background migrations, which Sidekiq processes through the `background_migration` queue. To see the size of this queue, check for background migrations before upgrading.

NOTE: If you want to upgrade multiple releases or do not meet these requirements, upgrades with downtime should be explored instead.
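Two quick checks related to these requirements can be run before starting. A minimal sketch, run from a Rails node and assuming a hypothetical `gitlab.example.com` hostname (substitute your own):

```shell
# Confirm the Readiness endpoint used by load balancer health checks responds with HTTP 200
curl --fail "https://gitlab.example.com/-/readiness"

# Check the size of the background_migration Sidekiq queue (0 means it has drained)
sudo gitlab-rails runner "puts Sidekiq::Queue.new('background_migration').size"
```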
We recommend a "back to front" approach for the order in which to upgrade components with zero downtime. Generally this means stateful backends first, their dependents next, and then the frontends. While the order of deployment can be changed, it is best to deploy the components running GitLab application code (Rails, Sidekiq) together. If possible, upgrade the supporting infrastructure (PostgreSQL, PgBouncer, Consul, Gitaly, Praefect, Redis) separately, since these components have no dependencies on changes made in version updates within a major release. As such, we generally recommend the following order:

1. Consul, PostgreSQL, PgBouncer, Redis
1. Gitaly, Praefect
1. Rails
1. Sidekiq
In this section we'll go through the core process of upgrading a multi-node GitLab environment, working through each node sequentially as per the upgrade order while load balancers and HA mechanisms handle each node going down accordingly.
For the purposes of this guide we'll upgrade a 200 RPS / 10,000 user Reference Architecture built with the Linux package.
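Each per-node procedure below assumes the new GitLab package version has already been installed on that node before reconfiguring (the "get the latest code in place" steps). A minimal sketch, assuming a Debian or Ubuntu node using the official GitLab repository and the Enterprise Edition package; the version string shown is illustrative:

```shell
# Install the target GitLab package version on the node
sudo apt-get update
sudo apt-get install gitlab-ee=16.2.0-ee.0
```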
The Consul, PostgreSQL, PgBouncer, and Redis components all follow the same underlying process for upgrading without downtime.
Run through the following steps sequentially on each component's node to perform the upgrade:
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Reconfigure and restart to get the latest code in place:

   ```shell
   sudo gitlab-ctl reconfigure
   sudo gitlab-ctl restart
   ```
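Before moving to the next node, it can be worth confirming the component's HA state has recovered. For example, for a Patroni-managed PostgreSQL cluster you can list the cluster members (the same command is used later in this guide to locate the database leader):

```shell
# Confirm the node has rejoined the cluster and a leader is present
sudo gitlab-ctl patroni members
```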
Gitaly follows the same core process for upgrading, with one key difference: the Gitaly process itself is not restarted, as it has a built-in mechanism to gracefully reload at the earliest opportunity. Note that any other components on the node still need to be restarted.
NOTE: The upgrade process attempts to do a graceful handover to a new Gitaly process. Existing long-running Git requests that were started before the upgrade may eventually be dropped as this handover occurs. In the future this functionality may be changed, refer to this Epic for more information.
This process applies to both Gitaly Sharded and Cluster setups. Run through the following steps sequentially on each Gitaly node to perform the upgrade:
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Run the `reconfigure` command to get the latest code in place and to instruct Gitaly to gracefully reload at the next opportunity:

   ```shell
   sudo gitlab-ctl reconfigure
   ```
1. Finally, while Gitaly will gracefully reload, any other components that have been deployed will still need a restart:

   ```shell
   # Get a list of what other components have been deployed beside Gitaly
   sudo gitlab-ctl status

   # Restart each component except Gitaly. Example given for Consul, Node Exporter and Logrotate
   sudo gitlab-ctl restart consul node-exporter logrotate
   ```
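To observe the graceful handover, you can watch the Gitaly service status: the old process keeps serving until in-flight requests complete, after which a new process with a new PID takes over. A minimal sketch:

```shell
# The reported PID changes once the new Gitaly process has taken over
sudo watch --interval 5 'gitlab-ctl status gitaly'
```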
For Gitaly Cluster setups, Praefect will be deployed and needs to be upgraded in a similar fashion via a graceful reload.
NOTE: The upgrade process attempts to do a graceful handover to a new Praefect process. Existing long-running Git requests that were started before the upgrade may eventually be dropped as this handover occurs. In the future this functionality may be changed, refer to this Epic for more information.
One additional step for Praefect is that it also needs to run through its database migrations to upgrade its data. Migrations must be run on only one Praefect node to avoid clashes. This is best done by selecting one of the nodes to act as a deploy node: this target node is configured to run migrations while the rest are not. We'll refer to this as the Praefect deploy node below.
On the Praefect deploy node:
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Ensure that `praefect['auto_migrate'] = true` is set in `/etc/gitlab/gitlab.rb` so that database migrations run.
1. Run the `reconfigure` command to get the latest code in place, apply the Praefect database migrations, and restart gracefully:

   ```shell
   sudo gitlab-ctl reconfigure
   ```
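To confirm the Praefect database migrations applied, the `praefect` binary provides a migration status subcommand. A minimal sketch, assuming the default Linux package paths (verify the subcommand against your installed version):

```shell
# List applied and pending Praefect database migrations
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate-status
```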
On all remaining Praefect nodes:
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Ensure that `praefect['auto_migrate'] = false` is set in `/etc/gitlab/gitlab.rb` to prevent `reconfigure` from automatically running database migrations.
1. Run the `reconfigure` command to get the latest code in place as well as restart gracefully:

   ```shell
   sudo gitlab-ctl reconfigure
   ```
Finally, while Praefect will gracefully reload, any other components that have been deployed will still need a restart. On all Praefect nodes:

```shell
# Get a list of what other components have been deployed beside Praefect
sudo gitlab-ctl status

# Restart each component except Praefect. Example given for Consul, Node Exporter and Logrotate
sudo gitlab-ctl restart consul node-exporter logrotate
```
Rails as a webserver consists primarily of Puma, Workhorse, and NGINX.
Each of these components has different behaviour when it comes to doing a live upgrade. While Puma allows for a graceful reload, Workhorse doesn't. As such, the best approach is to drain the node gracefully through other means, such as via your load balancer. It's also possible to do this via NGINX on the node through its graceful shutdown functionality. In this section we'll use the NGINX approach.
In addition to the above, Rails is where the main database migrations need to be executed. Like Praefect, this is best done via the deploy node approach. If PgBouncer is currently being used, it also needs to be bypassed, as Rails uses an advisory lock when attempting to run a migration to prevent concurrent migrations from running on the same database. These locks are not shared across transactions, resulting in `ActiveRecord::ConcurrentMigrationError` and other issues when running database migrations using PgBouncer in transaction pooling mode.
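To bypass PgBouncer on the deploy node, you can temporarily point Rails directly at the database leader in `gitlab.rb`. A hypothetical sketch; the host and port are placeholders for your environment:

```ruby
# /etc/gitlab/gitlab.rb on the Rails deploy node (illustrative values)
gitlab_rails['db_host'] = '10.0.0.21' # the current database leader, bypassing PgBouncer
gitlab_rails['db_port'] = 5432
```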
On the Rails deploy node:
1. Drain the node of traffic gracefully. This can be done in various ways, but one approach is via NGINX, by sending it a `QUIT` signal and then stopping the service. As an example, this could be done via the following shell script:

   ```shell
   # Send QUIT to NGINX master process to drain and exit
   NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
   kill -QUIT $NGINX_PID

   # Wait for drain to complete
   while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

   # Stop NGINX service to prevent automatic restarts
   gitlab-ctl stop nginx
   ```
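   To confirm the node has actually drained, you can check for remaining established client connections. A sketch, assuming NGINX terminates HTTPS on port 443:

   ```shell
   # Output should become empty (aside from the header) as traffic drains
   sudo ss -tn state established '( sport = :443 )'
   ```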
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Configure regular migrations to run by setting `gitlab_rails['auto_migrate'] = true` in the `/etc/gitlab/gitlab.rb` configuration file.

1. If the node reaches the database through PgBouncer, bypass it and point the node directly at the database leader. To find the current leader, run the following on a Patroni node:

   ```shell
   sudo gitlab-ctl patroni members
   ```

1. Run the regular migrations and get the latest code in place:

   ```shell
   sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-ctl reconfigure
   ```
1. Leave this node as-is for now, as you'll come back to run the post-deployment migrations later.
On every other Rails node sequentially:
1. Drain the node of traffic gracefully. This can be done in various ways, but one approach is via NGINX, by sending it a `QUIT` signal and then stopping the service. As an example, this could be done via the following shell script:

   ```shell
   # Send QUIT to NGINX master process to drain and exit
   NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
   kill -QUIT $NGINX_PID

   # Wait for drain to complete
   while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

   # Stop NGINX service to prevent automatic restarts
   gitlab-ctl stop nginx
   ```
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Ensure that `gitlab_rails['auto_migrate'] = false` is set in `/etc/gitlab/gitlab.rb` to prevent `reconfigure` from automatically running database migrations.
1. Run the `reconfigure` command to get the latest code in place as well as restart:

   ```shell
   sudo gitlab-ctl reconfigure
   sudo gitlab-ctl restart
   ```
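After restarting a node, it can be worth confirming it reports as ready before returning it to the load balancer pool. A minimal sketch, run on the node itself (localhost is included in the default monitoring allowlist; adjust the scheme and flags to match your NGINX configuration):

```shell
# Node should return HTTP 200 once Puma and Workhorse are back up
curl --fail --insecure "https://localhost/-/readiness"
```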
On the Rails deploy node run the post-deployment migrations:
1. Ensure the deploy node is still pointing at the database leader directly. If the node is currently going through PgBouncer to reach the database, you must bypass it and connect directly to the database leader before running migrations. To find the current leader, run the following on a Patroni node:

   ```shell
   sudo gitlab-ctl patroni members
   ```

1. Run the post-deployment migrations:

   ```shell
   sudo gitlab-rake db:migrate
   ```
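   Optionally, confirm that all migrations now report as applied:

   ```shell
   # Every migration should show "up" in the status column
   sudo gitlab-rake db:migrate:status
   ```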
1. Return the config back to normal by setting `gitlab_rails['auto_migrate'] = false` in the `/etc/gitlab/gitlab.rb` configuration file.
1. Run through `reconfigure` once again to reapply the normal config, as well as restart:

   ```shell
   sudo gitlab-ctl reconfigure
   sudo gitlab-ctl restart
   ```
Sidekiq follows the same underlying process as the other components for upgrading without downtime.
Run through the following steps sequentially on each component node to perform the upgrade:
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Run the `reconfigure` command to get the latest code in place as well as restart:

   ```shell
   sudo gitlab-ctl reconfigure
   sudo gitlab-ctl restart
   ```
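After the restart, you can confirm the Sidekiq processes came back up before moving to the next node:

```shell
# Sidekiq should report as running with a fresh uptime
sudo gitlab-ctl status sidekiq
```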
DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed
This section describes the steps required to upgrade a live GitLab environment deployed with Geo.
Overall, the approach is largely the same as the normal process, with some additional steps required for each secondary site. The required order is to upgrade the primary site first, then the secondaries. You must also run any post-deployment migrations on the primary after all secondaries have been updated.
NOTE: The same requirements and considerations apply for upgrading a live GitLab environment with Geo.
The upgrade process for the Primary site is the same as the normal process, with one exception: do not run the post-deployment migrations until after all the secondaries have been updated.
Run through the same steps for the Primary site as described above, but stop before the Rails step of running the post-deployment migrations.
The upgrade process for any Secondary sites follows the same steps as the normal process, except for the Rails nodes, where several additional steps are required as detailed below.
To upgrade the site, proceed through the normal process until you reach the Rails nodes, and then follow the steps below instead:
On the Rails deploy node:
1. Drain the node of traffic gracefully. This can be done in various ways, but one approach is via NGINX, by sending it a `QUIT` signal and then stopping the service. As an example, this could be done via the following shell script:

   ```shell
   # Send QUIT to NGINX master process to drain and exit
   NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
   kill -QUIT $NGINX_PID

   # Wait for drain to complete
   while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

   # Stop NGINX service to prevent automatic restarts
   gitlab-ctl stop nginx
   ```
1. Stop the Geo Logcursor process to ensure it fails over to another node:

   ```shell
   gitlab-ctl stop geo-logcursor
   ```
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Copy the `/etc/gitlab/gitlab-secrets.json` file from the primary site Rails node to the secondary site Rails node if they're different. The file must be the same on all of a site's nodes.
1. Ensure no migrations are configured to be run automatically by setting `gitlab_rails['auto_migrate'] = false` and `geo_secondary['auto_migrate'] = false` in the `/etc/gitlab/gitlab.rb` configuration file.
1. Run the `reconfigure` command to get the latest code in place as well as restart:

   ```shell
   sudo gitlab-ctl reconfigure
   sudo gitlab-ctl restart
   ```
1. Run the regular Geo Tracking migrations and get the latest code in place:

   ```shell
   sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate:geo
   ```
On every other Rails node sequentially:
1. Drain the node of traffic gracefully. This can be done in various ways, but one approach is via NGINX, by sending it a `QUIT` signal and then stopping the service. As an example, this could be done via the following shell script:

   ```shell
   # Send QUIT to NGINX master process to drain and exit
   NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
   kill -QUIT $NGINX_PID

   # Wait for drain to complete
   while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

   # Stop NGINX service to prevent automatic restarts
   gitlab-ctl stop nginx
   ```
1. Stop the Geo Logcursor process to ensure it fails over to another node:

   ```shell
   gitlab-ctl stop geo-logcursor
   ```
1. Create an empty file at `/etc/gitlab/skip-auto-reconfigure`. This prevents upgrades from running `gitlab-ctl reconfigure`, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:

   ```shell
   sudo touch /etc/gitlab/skip-auto-reconfigure
   ```
1. Ensure no migrations are configured to be run automatically by setting `gitlab_rails['auto_migrate'] = false` and `geo_secondary['auto_migrate'] = false` in the `/etc/gitlab/gitlab.rb` configuration file.
1. Run the `reconfigure` command to get the latest code in place as well as restart:

   ```shell
   sudo gitlab-ctl reconfigure
   sudo gitlab-ctl restart
   ```
Following the main process, all that's left to do now is to upgrade Sidekiq.
Upgrade Sidekiq in the same manner as described in the main section.
Finally, head back to the primary site and finish the upgrade by running the post-deployment migrations:
On the Primary site's Rails deploy node, run the post-deployment migrations:
1. Ensure the deploy node is still pointing at the database leader directly. If the node is currently going through PgBouncer to reach the database, you must bypass it and connect directly to the database leader before running migrations. To find the current leader, run the following on a Patroni node:

   ```shell
   sudo gitlab-ctl patroni members
   ```

1. Run the post-deployment migrations:

   ```shell
   sudo gitlab-rake db:migrate
   ```
1. Verify Geo configuration and dependencies:

   ```shell
   sudo gitlab-rake gitlab:geo:check
   ```
1. Return the config back to normal by setting `gitlab_rails['auto_migrate'] = false` in the `/etc/gitlab/gitlab.rb` configuration file.
1. Run through `reconfigure` once again to reapply the normal config, as well as restart:

   ```shell
   sudo gitlab-ctl reconfigure
   sudo gitlab-ctl restart
   ```
On the Secondary site's Rails deploy node, run the post-deployment Geo Tracking migrations:

1. Run the post-deployment Geo Tracking migrations:

   ```shell
   sudo gitlab-rake db:migrate:geo
   ```
1. Verify Geo status:

   ```shell
   sudo gitlab-rake geo:status
   ```