Use Velero to migrate cluster resources across cloud platforms to TKE

Author: Li Quanjiang (jokey), Tencent Cloud engineer with a passion for the cloud-native field. He currently provides pre-sales and after-sales technical support for Tencent Cloud TKE, producing practical technical solutions and best practices based on customer needs.

Overview

Velero is a powerful open-source tool that can safely back up, restore, and migrate Kubernetes cluster resources and persistent volumes, and perform disaster recovery. For how to back up, restore, and migrate cluster resources with Velero on the TKE platform, see "Using COS as Velero storage to back up and restore cluster resources" and "Using Velero to migrate a copy of cluster resources within TKE". This article describes how to use Velero to seamlessly migrate a Kubernetes cluster from a self-built environment or another cloud platform to TKE.

Principle of Migration

The architecture and workflow are similar to those of migrating a copy of cluster resources with Velero: a Velero instance is installed in both the source cluster and the target cluster, and both point to the same Tencent Cloud COS bucket as backend storage. The source cluster is backed up on demand, and the target cluster restores from that backup on demand, thereby migrating the resources. The difference is that when migrating cluster resources from a self-built or third-party cloud platform to TKE, you must account for the environment differences that come with crossing platforms. Fortunately, Velero provides many practical backup and restore policies that help resolve these differences; the migration example below shows how to make good use of them.

Environmental preparation

  • A self-built or third-party cloud platform Kubernetes cluster exists (hereinafter cluster A), running version 1.10 or later.
  • A TKE cluster to migrate into has been created (hereinafter cluster B); to create a TKE cluster, see Creating a Cluster.
  • Velero (version 1.5 or later) is installed in both cluster A and cluster B, and both share the same Tencent Cloud COS bucket as Velero backend storage; for the installation steps, see Configuring Storage and Installing Velero.
  • Ensure that image resources can be pulled normally after migration.
  • Ensure that the Kubernetes API versions of the two clusters are compatible, preferably identical.

Migration guidance

Before migrating, first clarify the migration approach and draw up a detailed migration plan. The migration process generally involves the following considerations:

  • Determine which cluster resources need to be migrated and which do not

    Filter and classify the resources into a to-migrate list and a not-to-migrate list according to your actual situation.

  • Decide whether custom hook operations are needed for your business scenario

    Consider whether any operations need to run during the backup via backup hooks, for example flushing in-memory data to disk while the application is running.

    Likewise, consider whether any operations need to run during the restore (migration) via restore hooks, for example performing some initialization before the restore.

  • Compile backup and restore commands or resource manifests as needed

    Write backup and restore policies based on the filtered and classified resource list. For complex scenarios, performing backup and restore by creating resource manifests is recommended: a YAML manifest is intuitive and easy to maintain. Specifying parameters on the command line suits simple migration scenarios or tests.

  • Handling the differences in resources across cloud platforms

    Because the migration crosses cloud platforms, the storage classes backing dynamically created PVCs may differ. Plan in advance whether the dynamic PVC/PV storage-class relationship needs to be remapped, and create the corresponding mapping ConfigMap before the restore operation. For more individual differences, you can manually edit the backed-up resource manifests.

  • Check the migration resources after the operation is completed

    Check whether the migrated cluster resources meet expectations and the data is complete and available.
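Conceptually, the cross-platform storage-class remapping mentioned above is just a name substitution on each restored PVC/PV spec. The Python sketch below illustrates that idea only; it is not Velero's plugin code, and the sample PVC dict and mapping table are illustrative:

```python
# Illustrative sketch of what a storage-class remap does during restore:
# rewrite spec.storageClassName on a PVC object according to a
# name-mapping table (which Velero supplies via a ConfigMap).

def remap_storage_class(resource: dict, mapping: dict) -> dict:
    """Return a copy of the resource with its storage class remapped."""
    patched = dict(resource)
    spec = dict(patched.get("spec", {}))
    old_class = spec.get("storageClassName")
    if old_class in mapping:
        spec["storageClassName"] = mapping[old_class]
    patched["spec"] = spec
    return patched

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "nginx-logs", "namespace": "nginx-example"},
    "spec": {"storageClassName": "xxx-StorageClass",
             "accessModes": ["ReadWriteOnce"]},
}

# Map the source platform's class to Tencent Cloud's "cbs" class.
remapped = remap_storage_class(pvc, {"xxx-StorageClass": "cbs"})
print(remapped["spec"]["storageClassName"])  # cbs
```

The original PVC dict is left untouched, mirroring how Velero rewrites items at restore time without changing the stored backup.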

Steps

The following demonstrates the steps for migrating the resources of cluster A on another cloud platform to TKE cluster B. The steps use basic Velero backup and restore techniques; if anything is unfamiliar, see the section [Velero backup/restore practical knowledge] at the end of this article.

Create a sample resource for cluster A

Deploy the Nginx workload with PVC from the Velero examples in cluster A on the other cloud platform. For convenience, a dynamic storage class is used to create the PVC and PV directly.

1. Check the dynamic storage class information supported by the current cluster:

```shell
# Get the storage class information supported by the current cluster, where
# xxx-StorageClass is a placeholder for the storage class name and
# xxx-Provider for the provisioner name, the same below.
[root@iZj6c3vzs170hmeiu98h5aZ ~]# kubectl get sc
NAME               PROVISIONER    RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
xxx-StorageClass   xxx-Provider   Delete          Immediate           true                   3d3h
...
```

Use the storage class named "xxx-StorageClass" in the cluster for dynamic provisioning, and modify the PVC in the with-pv.yaml resource manifest as shown below:

```yaml
...
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-logs
  namespace: nginx-example
  labels:
    app: nginx
spec:
  # Optional: modify the storage class of the PVC to the cloud platform's class
  storageClassName: xxx-StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # Since the cloud platform requires a minimum of 20Gi of storage,
      # this example raises the value to 20Gi accordingly
      storage: 20Gi
...
```

After the modification, apply the example YAML to create the following cluster resources (in the nginx-example namespace):

```shell
[root@iZj6c3vzs170hmeiu98h5aZ nginx-app]# kubectl apply -f with-pv.yaml
namespace/nginx-example created
persistentvolumeclaim/nginx-logs created
deployment.apps/nginx-deployment created
service/my-nginx created
```

The created PVC "nginx-logs" is mounted into the nginx container at /var/log/nginx, which serves as the service's log storage. This example accesses the Nginx service from a browser to generate some log data on the mounted PVC (for data comparison after the restore).

```shell
# View the size of the Nginx logs generated by the test, currently 84K
[root@iZj6c8ttj5dmmrs75yb7ybZ ~]# kubectl exec -it nginx-deployment-5ccc99bffb-6nm5w bash -n nginx-example
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulting container name to nginx.
Use 'kubectl describe pod/nginx-deployment-5ccc99bffb-6nm5w -n nginx-example' to see all of the containers in this pod.
root@nginx-deployment-5ccc99bffb-6nm5w:/# du -sh /var/log/nginx/
84K     /var/log/nginx/
# View the first two lines of access.log and error.log
root@nginx-deployment-5ccc99bffb-6nm5w:/# head -n 2 /var/log/nginx/access.log
192.168.0.73 - - [29/Dec/2020:03:02:31 +0000] "GET /?spm=5176.2020520152.0.0.22d016ddHXZumX HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "-"
192.168.0.73 - - [29/Dec/2020:03:02:32 +0000] "GET /favicon.ico HTTP/1.1" 404 555 "http://47.242.233.22/?spm=5176.2020520152.0.0.22d016ddHXZumX" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "-"
root@nginx-deployment-5ccc99bffb-6nm5w:/# head -n 2 /var/log/nginx/error.log
2020/12/29 03:02:32 [error] 6#6: *597 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: 192.168.0.73, server: localhost, request: "GET /favicon.ico HTTP/1.1", host: "47.242.233.22", referrer: "http://47.242.233.22/?spm=5176.2020520152.0.0.22d016ddHXZumX"
2020/12/29 03:07:21 [error] 6#6: *1172 open() "/usr/share/nginx/html/0bef" failed (2: No such file or directory), client: 192.168.0.73, server: localhost, request: "GET /0bef HTTP/1.0"
```

Confirm the list of resources that need to be migrated

Use the following command to output a list of all resources in the current cluster:

```shell
kubectl api-resources --verbs=list -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found --all-namespaces
```

You can also narrow the output according to whether or not the resources are namespaced:

  • View the list of cluster-scoped (non-namespaced) resources:

    ```shell
    kubectl api-resources --namespaced=false --verbs=list -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found
    ```
  • View the list of namespaced resources:

    ```shell
    kubectl api-resources --namespaced=true --verbs=list -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found --all-namespaces
    ```

Filter out the list of resources to migrate according to your actual situation. This example directly migrates the Nginx workload resources in the "nginx-example" namespace from the cloud platform to the TKE platform. The related resources are as follows:

```shell
[root@iZj6c3vzs170hmeiu98h5aZ ~]# kubectl get all -n nginx-example
NAME                                    READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-5ccc99bffb-tn2sh   2/2     Running   0          2d19h

NAME               TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
service/my-nginx   LoadBalancer   172.21.1.185   xxxx          80:31455/TCP   2d19h

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-deployment   1/1     1            1           2d19h

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-deployment-5ccc99bffb   1         1         1       2d19h

[root@iZj6c3vzs170hmeiu98h5aZ ~]# kubectl get pvc -n nginx-example
NAME         STATUS   VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
nginx-logs   Bound    d-j6ccrq4k1moziu1l6l5r   20Gi       RWO            xxx-StorageClass   2d19h

[root@iZj6c3vzs170hmeiu98h5aZ ~]# kubectl get pv
NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS       REASON   AGE
d-j6ccrq4k1moziu1l6l5r   20Gi       RWO            Delete           Bound    nginx-example/nginx-logs   xxx-StorageClass            2d19h
```

Confirm Hook strategy

In this example, a hook policy is configured in with-pv.yaml for the Nginx workload: the log file system is frozen (made read-only) before the backup and thawed (read-write again) after the backup, as shown in the following YAML:

```yaml
...
  annotations:
    # These backup hook annotations mean: before the backup starts, set the
    # nginx log directory to read-only mode; after the backup completes,
    # restore read-write mode
    pre.hook.backup.velero.io/container: fsfreeze
    pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]'
    post.hook.backup.velero.io/container: fsfreeze
    post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/log/nginx"]'
spec:
  volumes:
    - name: nginx-logs
      persistentVolumeClaim:
        claimName: nginx-logs
  containers:
    - image: nginx:1.17.6
      name: nginx
      ports:
        - containerPort: 80
      volumeMounts:
        - mountPath: "/var/log/nginx"
          name: nginx-logs
          readOnly: false
    - image: ubuntu:bionic
      name: fsfreeze
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: "/var/log/nginx"
          name: nginx-logs
...
```

Start the migration operation

Next, write backup and restore policies according to the actual situation and start migrating the Nginx workload resources from the cloud platform.

Perform a backup in cluster A

This example creates the following YAML to back up the resources that you want to migrate:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: migrate-backup
  # Must be the namespace Velero is installed in
  namespace: velero
spec:
  # Only include resources in the nginx-example namespace
  includedNamespaces:
    - nginx-example
  # Include cluster-scoped (non-namespaced) resources
  includeClusterResources: true
  # Specify the storage location for the backup data
  storageLocation: default
  # Specify the volume snapshot storage location
  volumeSnapshotLocations:
    - default
  # Use restic to back up the volumes
  defaultVolumesToRestic: true
```

The backup proceeds as follows; when the backup status is "Completed" and the error count is 0, the backup completed correctly:

```shell
[root@iZj6c8ttj5dmmrs75yb7ybZ ~]# kubectl apply -f backup.yaml
backup.velero.io/migrate-backup created
[root@iZj6c8ttj5dmmrs75yb7ybZ ~]# velero backup get
NAME             STATUS       ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
migrate-backup   InProgress   0        0          2020-12-29 19:24:12 +0800 CST   29d       default            <none>
[root@iZj6c8ttj5dmmrs75yb7ybZ ~]# velero backup get
NAME             STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
migrate-backup   Completed   0        0          2020-12-29 19:24:28 +0800 CST   29d       default            <none>
```

After the backup completes, temporarily switch the backup storage location to read-only mode (optional; this prevents Velero from creating or deleting backup objects in the backup storage location during the restore):

```shell
kubectl patch backupstoragelocation default --namespace velero \
    --type merge \
    --patch '{"spec":{"accessMode":"ReadOnly"}}'
```
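The `--type merge` flag above applies JSON Merge Patch (RFC 7386) semantics: nested objects are merged key by key instead of being replaced wholesale, which is why only `accessMode` changes while the rest of the spec is preserved. A simplified Python sketch of those semantics (illustrative only, not kubectl's implementation):

```python
# Simplified JSON Merge Patch (RFC 7386): dict values merge recursively,
# null (None) deletes a key, anything else replaces the old value.
def merge_patch(target, patch):
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

# A toy BackupStorageLocation spec; field values here are illustrative.
bsl = {"spec": {"accessMode": "ReadWrite", "provider": "velero.io/aws"}}
patched = merge_patch(bsl, {"spec": {"accessMode": "ReadOnly"}})
print(patched["spec"])  # accessMode flipped to ReadOnly, provider untouched
```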

Handling the differences in resources across cloud platforms

  1. Because the dynamic storage classes differ across platforms, the following ConfigMap needs to be created to map the storage class name for the persistent volume "nginx-logs":

     ```yaml
     apiVersion: v1
     kind: ConfigMap
     metadata:
       name: change-storage-class-config
       namespace: velero
       labels:
         velero.io/plugin-config: ""
         velero.io/change-storage-class: RestoreItemAction
     data:
       # Map the source storage class name to Tencent Cloud's dynamic storage class cbs
       xxx-StorageClass: cbs
     ```

     Apply the above ConfigMap configuration:

     ```shell
     [root@VM-20-5-tlinux ~]# kubectl apply -f cm-storage-class.yaml
     configmap/change-storage-class-config created
     ```
  2. Velero stores the backed-up resource manifests in JSON format in the object storage. If you have more individual migration requirements, you can download the backup file and customize it directly. This example adds a "jokey-test: jokey-test" annotation to the Nginx Deployment resource; the modification process is as follows:

     ```shell
     jokey@JOKEYLI-MB0 Downloads% mkdir migrate-backup
     # Unpack the backup file
     jokey@JOKEYLI-MB0 Downloads% tar -zxvf migrate-backup.tar.gz -C migrate-backup
     # Edit the resource you want to customize. This example adds the annotation
     # "jokey-test": "jokey-test" to the nginx Deployment resource
     jokey@JOKEYLI-MB0 migrate-backup% cat resources/deployments.apps/namespaces/nginx-example/nginx-deployment.json
     {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{"jokey-test":"jokey-test",...
     # Repackage the modified backup file
     jokey@JOKEYLI-MB0 migrate-backup% tar -zcvf migrate-backup.tar.gz *
     ```

After completing the customization and repackaging, upload the new archive to the object storage, replacing the original backup file.
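The manual JSON edit above can also be scripted. The following Python sketch mirrors the same change on an in-memory sample of the Deployment manifest; the sample dict is abbreviated, and the script is not part of Velero:

```python
import json

# Abbreviated sample of the structure found in
# resources/deployments.apps/namespaces/nginx-example/nginx-deployment.json
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx-deployment", "namespace": "nginx-example"},
}

# Add the custom annotation, creating the annotations map if it is absent.
annotations = deployment["metadata"].setdefault("annotations", {})
annotations["jokey-test"] = "jokey-test"

print(json.dumps(deployment["metadata"]["annotations"]))
```

In practice you would `json.load` the extracted file, apply the change, and `json.dump` it back before repackaging.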

Perform a restore in cluster B

This example uses the resource list shown below to perform a restore operation (migration):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: migrate-restore
  namespace: velero
spec:
  backupName: migrate-backup
  includedNamespaces:
    - nginx-example
  # Fill in the resource types to restore as needed. Nothing under the
  # nginx-example namespace needs to be excluded, so '*' is used here
  includedResources:
    - '*'
  includeClusterResources: null
  # Resources excluded from the restore; the StorageClasses resource type
  # is additionally excluded here
  excludedResources:
    - storageclasses.storage.k8s.io
  # Use a labelSelector to select resources with specific labels. Since this
  # example does not need label filtering, it is commented out here
  # labelSelector:
  #   matchLabels:
  #     app: nginx
  # Set the namespace mapping policy
  namespaceMapping:
    nginx-example: default
  restorePVs: true
```
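The namespaceMapping field rewrites the namespace on every restored namespaced object. The Python sketch below illustrates the idea only; it is not Velero's source code, and the Service dict is a sample:

```python
# Illustrative sketch of namespaceMapping: rewrite metadata.namespace on a
# restored object according to a source-to-target namespace table.

def apply_namespace_mapping(resource: dict, mapping: dict) -> dict:
    """Rewrite metadata.namespace in place according to the mapping."""
    meta = resource.setdefault("metadata", {})
    ns = meta.get("namespace")
    if ns in mapping:
        meta["namespace"] = mapping[ns]
    return resource

svc = {"kind": "Service",
       "metadata": {"name": "my-nginx", "namespace": "nginx-example"}}

# The same mapping as the Restore manifest: nginx-example -> default
apply_namespace_mapping(svc, {"nginx-example": "default"})
print(svc["metadata"]["namespace"])  # default
```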

The restore proceeds as follows; when the restore status shows "Completed" and the error count is 0, the restore completed correctly:

```shell
[root@VM-20-5-tlinux ~]# kubectl apply -f restore.yaml
restore.velero.io/migrate-restore created
[root@VM-20-5-tlinux ~]# velero restore get
NAME              BACKUP           STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
migrate-restore   migrate-backup   Completed   2021-01-12 20:39:14 +0800 CST   2021-01-12 20:39:17 +0800 CST   0        0          2021-01-12 20:39:14 +0800 CST   <none>
```

Migration resource verification

  1. First check whether the running status of the migrated resource is normal.

    ```shell
    # Since the "nginx-example" namespace was mapped to "default" during the
    # restore, the restored resources run in the "default" namespace
    [root@VM-20-5-tlinux ~]# kubectl get all -n default
    NAME                                    READY   STATUS    RESTARTS   AGE
    pod/nginx-deployment-5ccc99bffb-6nm5w   2/2     Running   0          49s

    NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
    service/kube-user    LoadBalancer   172.16.253.216   10.0.0.28     443:30060/TCP   8d
    service/kubernetes   ClusterIP      172.16.252.1     <none>        443/TCP         8d
    service/my-nginx     LoadBalancer   172.16.254.16    xxxx          80:30840/TCP    49s

    NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/nginx-deployment   1/1     1            1           49s

    NAME                                          DESIRED   CURRENT   READY   AGE
    replicaset.apps/nginx-deployment-5ccc99bffb   1         1         1       49s
    ```
  2. As shown above, the migrated resources are running normally. Next, check whether the configured restore policies took effect.

    • Check whether the dynamic storage class name mapping is correct:

      ```shell
      # The storage class of the PVC/PV is now "cbs", indicating that the
      # storage class mapping succeeded
      [root@VM-20-5-tlinux ~]# kubectl get pvc -n default
      NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      nginx-logs   Bound    pvc-bcc17ccd-ec3e-4d27-bec6-b0c8f1c2fa9c   20Gi       RWO            cbs            55s
      [root@VM-20-5-tlinux ~]# kubectl get pv
      NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
      pvc-bcc17ccd-ec3e-4d27-bec6-b0c8f1c2fa9c   20Gi       RWO            Delete           Bound    default/nginx-logs   cbs                     57s
      ```
    • Check whether the "jokey-test" annotation that was custom-added to "deployment.apps/nginx-deployment" before the restore is present:

      ```shell
      # The "jokey-test" annotation is retrieved successfully, indicating that
      # the custom modification of the resource succeeded
      [root@VM-20-5-tlinux ~]# kubectl get deployment.apps/nginx-deployment -o custom-columns=annotations:.metadata.annotations.jokey-test
      annotations
      jokey-test
      ```
    • From the above view of the resource running status, it can be seen that the namespace mapping configuration is also successful.

  3. Check whether the PVC data mounted by the workload is successfully migrated:

    ```shell
    # View the size of the data in the mounted PVC directory; it is 88K, 4K more
    # than before the migration, because Tencent Cloud CLB actively initiates
    # health checks that generate some extra logs
    [root@VM-20-5-tlinux ~]# kubectl exec -it nginx-deployment-5ccc99bffb-6nm5w -n default -- bash
    Defaulting container name to nginx.
    Use 'kubectl describe pod/nginx-deployment-5ccc99bffb-6nm5w -n default' to see all of the containers in this pod.
    root@nginx-deployment-5ccc99bffb-6nm5w:/# du -sh /var/log/nginx
    88K     /var/log/nginx
    # View the first two log lines; they are the same as before the migration,
    # which indicates that the PVC data was not lost
    root@nginx-deployment-5ccc99bffb-6nm5w:/# head -n 2 /var/log/nginx/access.log
    192.168.0.73 - - [29/Dec/2020:03:02:31 +0000] "GET /?spm=5176.2020520152.0.0.22d016ddHXZumX HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "-"
    192.168.0.73 - - [29/Dec/2020:03:02:32 +0000] "GET /favicon.ico HTTP/1.1" 404 555 "http://47.242.233.22/?spm=5176.2020520152.0.0.22d016ddHXZumX" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "-"
    root@nginx-deployment-5ccc99bffb-6nm5w:/# head -n 2 /var/log/nginx/error.log
    2020/12/29 03:02:32 [error] 6#6: *597 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: 192.168.0.73, server: localhost, request: "GET /favicon.ico HTTP/1.1", host: "47.242.233.22", referrer: "http://47.242.233.22/?spm=5176.2020520152.0.0.22d016ddHXZumX"
    2020/12/29 03:07:21 [error] 6#6: *1172 open() "/usr/share/nginx/html/0bef" failed (2: No such file or directory), client: 192.168.0.73, server: localhost, request: "GET /0bef HTTP/1.0"
    ```
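Comparing the first few log lines is a quick sanity check; a checksum comparison gives a stronger data-integrity signal. The sketch below is generic Python, not part of Velero; in practice you would hash the files inside both pods (for example with sha256sum) and compare the digests:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Simulate hashing the same log content on the source and target side;
# the sample line is illustrative.
source_log = b'192.168.0.73 - - [29/Dec/2020:03:02:31 +0000] "GET / HTTP/1.1" 200 612\n'
target_log = source_log  # what a successful migration should yield

print(sha256_of(source_log) == sha256_of(target_log))  # True
```

Note that for this example the access log keeps growing after restore (CLB health checks), so a whole-file hash only matches for data that is no longer being appended to.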

In summary, this example successfully migrated Nginx (nginx-example namespace) workload-related resources and data of a cloud platform cluster A to TKE cluster B (default namespace).

Summary

This article explained and demonstrated the approach and steps for migrating common cluster resources to TKE. If you encounter scenarios not covered here during an actual migration, you are welcome to discuss migration solutions.

Velero backup/restore practical knowledge

Velero provides many very useful backup and restore strategies, the following is a brief summary:

  • When no filtering options are used, Velero includes all objects in a backup or restore operation. During backup and restore, you can pass parameters to filter resources as needed:

    Inclusion filter parameters:

    • --include-resources: specifies the list of resource objects to include.
    • --include-namespaces: specifies the list of namespaces to include.
    • --include-cluster-resources: specifies whether cluster-scoped resources are included.
    • --selector: includes only resources matching the label selector.

    Exclusion filter parameters:

    • --exclude-namespaces: specifies the list of namespaces to exclude.
    • --exclude-resources: specifies the list of resource objects to exclude.
    • velero.io/exclude-from-backup=true: a label set on a resource object; any resource carrying this label is excluded from the backup.

    Please refer to filter resources .

  • To perform hook operations during a backup, for example flushing in-memory data to disk before the backup, see Backup Hooks.

  • To perform hook operations during a restore, for example checking that dependent components are available before the restore, see Restore Hooks.

  • To configure PVC/PV mapping relationships during a restore, see Restore Reference.

  • Restic volume backup configuration

    Starting with version 1.5, Velero uses restic to back up all pod volumes by default instead of requiring each pod to be annotated individually, so Velero 1.5 or later is recommended.

    Before version 1.5, when Velero backed up volumes with restic, there were two ways to select the pod volumes to back up:

    • Opt in with an annotation listing the pod volumes to back up (the default approach):

      ```shell
      kubectl -n <YOUR_POD_NAMESPACE> annotate <pod/YOUR_POD_NAME> backup.velero.io/backup-volumes=<YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2,...>
      ```
    • Opt out with an annotation listing the pod volumes to exclude:

      ```shell
      kubectl -n <YOUR_POD_NAMESPACE> annotate <pod/YOUR_POD_NAME> backup.velero.io/backup-volumes-excludes=<YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2,...>
      ```

    After the backup completes, you can view the backed-up volume information:

    ```shell
    kubectl -n velero get podvolumebackups -l velero.io/backup-name=<YOUR_BACKUP_NAME> -o yaml
    ```

    After the restore completes, you can view the restored volume information:

    ```shell
    kubectl -n velero get podvolumerestores -l velero.io/restore-name=<YOUR_RESTORE_NAME> -o yaml
    ```
  • In addition to running the velero backup command, a backup can be triggered by creating a Backup resource (recommended); for configuration examples see Backup Examples, and for detailed field definitions see the Backup API definitions.

  • In addition to running the velero restore command, a restore can be triggered by creating a Restore resource (recommended); for configuration examples see Restore Examples, and for detailed field definitions see the Restore API definitions.

  • If other individual resource configurations such as annotations and labels differ between platforms, you can manually edit the backed-up JSON resource manifest files before restoring.
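As a mental model, the include/exclude filters summarized earlier compose with a fixed precedence: the velero.io/exclude-from-backup label wins over everything, then excludes win over includes. The Python sketch below is a simplification of that precedence for namespaced resources only (Velero's real filtering also handles resource types, API groups, wildcards, and cluster-scoped rules):

```python
# Simplified Velero-style backup filtering: label opt-out first,
# then namespace excludes, then namespace includes ("*" means all).
def should_back_up(resource: dict,
                   include_namespaces=("*",),
                   exclude_namespaces=()) -> bool:
    labels = resource.get("metadata", {}).get("labels", {})
    if labels.get("velero.io/exclude-from-backup") == "true":
        return False
    ns = resource.get("metadata", {}).get("namespace")
    if ns in exclude_namespaces:
        return False
    return "*" in include_namespaces or ns in include_namespaces

pod = {"metadata": {"namespace": "nginx-example", "labels": {}}}
opt_out = {"metadata": {"namespace": "nginx-example",
                        "labels": {"velero.io/exclude-from-backup": "true"}}}

print(should_back_up(pod, include_namespaces=("nginx-example",)))      # True
print(should_back_up(opt_out, include_namespaces=("nginx-example",)))  # False
```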