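003-OpenShift Web Operations

OpenShift Web Console: A Practical Summary of Deployment and Operations

1. Overview

The OpenShift web console is the core management interface of OpenShift Container Platform. It provides:

  • Visual management and monitoring of cluster resources
  • Application deployment, scaling, and lifecycle management
  • A self-service portal for developers
  • Multi-cluster management (version 4.6 and later)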

2. Deployment in Practice

2.1 Prerequisites

# Verify cluster status
oc get clusterversion
oc get nodes

# Check the status of the console Operator
oc get clusteroperator console
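
The Operator check above can be turned into a pass/fail gate for scripts. The following is a minimal sketch, assuming the default column order of `oc get clusteroperator` (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED); the function name `console_operator_healthy` is hypothetical and not part of any OpenShift tooling.

```shell
# Succeeds only when the ClusterOperator line read from stdin reports
# AVAILABLE=True, PROGRESSING=False, DEGRADED=False.
# Assumption: default column order NAME VERSION AVAILABLE PROGRESSING DEGRADED.
console_operator_healthy() {
  awk '{ ok = ($3 == "True" && $4 == "False" && $5 == "False") }
       END { exit (ok ? 0 : 1) }'
}

# Usage (requires a logged-in oc session):
#   oc get clusteroperator console --no-headers | console_operator_healthy \
#     && echo "console Operator healthy"
```

Reading the status from stdin keeps the check independent of a live cluster, so it can be exercised with captured output first.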

2.2 Deployment Procedure

  1. Install the Operator (only if it was not installed automatically). On OpenShift 4 the console Operator normally ships with the cluster release and is managed by the Cluster Version Operator, so this step is rarely needed; confirm that a console-operator package exists in your catalog before applying the Subscription.

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: console-operator
  namespace: openshift-console
spec:
  targetNamespaces:
  - openshift-console
EOF

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: console-operator
  namespace: openshift-console
spec:
  channel: stable
  name: console-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

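  2. Customize the configuration

# console-config.yaml
apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
spec:
  customization:
    # Custom corporate branding
    customLogoFile:
      key: logo.svg
      name: custom-logo
    customProductName: "Enterprise Container Cloud Platform"
  # Redirect target after logout (the original labels this a timeout setting, which it is not).
  # Verify its location for your version: oc explain console.spec --api-version=operator.openshift.io/v1
  logoutRedirect: "https://example.com/logout"
  providers:
    # OIDC identity provider as given in the original; identity providers are
    # normally configured on oauth.config.openshift.io (see 3.1)
    oidc:
    - clientID: console-app
      clientSecret:
        name: console-oidc-secret
      issuer: https://sso.example.com/auth/realms/master

  3. Apply the configuration

oc apply -f console-config.yaml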

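2.3 Verifying the Deployment

# Check Pod status
oc get pods -n openshift-console

# Verify that the console route responds
curl -kI https://$(oc get route console -n openshift-console -o jsonpath='{.spec.host}')

# Check the Operator logs (the console-operator Deployment runs in openshift-console-operator)
oc logs -n openshift-console-operator deployment/console-operator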
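
The route check above prints raw response headers, which still have to be read by eye. The sketch below classifies them automatically, assuming that any 2xx or 3xx status on the first header line counts as "reachable"; the function name `console_route_ok` is hypothetical.

```shell
# Reads response headers (as printed by `curl -skI`) on stdin and succeeds
# when the status code on the first line is 2xx or 3xx.
console_route_ok() {
  # Strip carriage returns so "HTTP/1.1 200 OK\r" parses cleanly.
  status=$(awk 'NR == 1 { gsub(/\r/, ""); print $2 }')
  case "$status" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Usage (requires a logged-in oc session):
#   host=$(oc get route console -n openshift-console -o jsonpath='{.spec.host}')
#   curl -skI "https://$host" | console_route_ok && echo "console reachable"
```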

3. Configuration Management

3.1 Identity Provider Integration

Authentication method   Configuration location      Key parameters
OAuth/OIDC              oauth.config.openshift.io   issuerURL, clientSecret
HTPasswd                htpasswd Secret             username, password
LDAP                    ldap ConfigMap              url, bindDN, bindPassword

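3.2 Custom Branding

  1. Create a ConfigMap that holds the custom assets:

# The console Operator reads branding ConfigMaps from the openshift-config namespace
oc create configmap custom-branding \
  --from-file=logo=logo.svg \
  --from-file=favicon.ico=favicon.ico \
  --from-file=stylesheet.css=styles.css \
  -n openshift-config

  2. Update the Console custom resource:

spec:
  customization:
    customLogoFile:
      key: logo
      name: custom-branding
    # customStylesheet is kept from the original; confirm that your version supports it with:
    # oc explain console.spec.customization --api-version=operator.openshift.io/v1
    customStylesheet:
      key: stylesheet.css
      name: custom-branding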

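3.3 Plugin Extensions

# In console.openshift.io/v1 the Service reference sits under spec.backend
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: custom-plugin
spec:
  displayName: "Monitoring Dashboard"
  backend:
    type: Service
    service:
      name: console-plugin
      namespace: openshift-console  # assumed; the original omits the required namespace
      port: 9443
      basePath: /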

4. Day-to-Day Operations

4.1 Monitoring and Alerting

Key metrics:

  • console_http_requests_total: HTTP request volume
  • console_http_request_duration_seconds: response latency
  • container_memory_usage_bytes: container memory usage
  • kube_pod_status_ready: Pod readiness

Core alerting rule:

- alert: ConsoleHighLatency
  expr: histogram_quantile(0.95, rate(console_http_request_duration_seconds_bucket[5m])) > 2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Console high latency"
    description: "95th-percentile console request latency has exceeded 2 seconds"

4.2 Log Management

# Follow live logs
oc logs -f deployment/console -n openshift-console

# Adjust the log level (temporary: the console Operator manages this Deployment and may revert the change)
oc set env deployment/console -n openshift-console LOG_LEVEL=debug

# Persistent log collection (EFK stack); fragment of a ClusterLogging resource
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
spec:
  managementState: Managed
  collection:
    logs:
      type: vector
  visualization:
    type: kibana

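4.3 Backup and Restore

Backup checklist:

  1. Console customization (fully qualified, because two resources are named Console):

oc get console.operator.openshift.io cluster -o yaml > console-backup.yaml

  2. OAuth configuration:

oc get oauth.config.openshift.io cluster -o yaml > oauth-backup.yaml

  3. ConfigMap holding the custom assets:

# Use the namespace where the ConfigMap was created (the console Operator reads branding from openshift-config)
oc get configmap custom-branding -n openshift-config -o yaml > branding-backup.yaml

Restore procedure:

oc apply -f console-backup.yaml
oc apply -f oauth-backup.yaml
oc apply -f branding-backup.yaml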
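
The backup checklist above lends itself to a small wrapper that keeps each run in its own timestamped directory. This is only a sketch: the function name `backup_console_config`, the directory layout, and the namespace argument are assumptions rather than anything the original prescribes.

```shell
# Writes the three backups from the checklist into a timestamped directory
# and prints that directory's path.
# Arguments: <backup-root> <namespace of the custom-branding ConfigMap>
backup_console_config() {
  root="${1:?usage: backup_console_config <backup-root> <branding-namespace>}"
  ns="${2:?usage: backup_console_config <backup-root> <branding-namespace>}"
  dir="$root/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dir" || return 1
  # Fully qualified name: "console" alone is ambiguous between API groups.
  oc get console.operator.openshift.io cluster -o yaml > "$dir/console-backup.yaml" || return 1
  oc get oauth.config.openshift.io cluster -o yaml > "$dir/oauth-backup.yaml" || return 1
  oc get configmap custom-branding -n "$ns" -o yaml > "$dir/branding-backup.yaml" || return 1
  echo "$dir"
}

# Usage (requires a logged-in oc session):
#   backup_console_config /var/backups/console openshift-config
```

Stopping at the first failed export means a partial backup is never reported as a success.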

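5. Troubleshooting Guide

5.1 Common Problems and Solutions

Symptom                       Diagnostic commands                                       Resolution
Console unreachable           oc get route -n openshift-console                         Check the route, certificates, and network policies
                              curl -vk https://console-url
Authentication fails          oc get pods -n openshift-authentication                   Verify the OAuth configuration and identity provider status
                              oc logs -n openshift-authentication oauth-openshift-xxx
Console responds slowly       oc adm top pods -n openshift-console                      Scale out replicas or adjust resource limits
                              oc get events --sort-by='.lastTimestamp'
Custom branding not applied   oc describe console cluster                               Check the ConfigMap mount and permissions
                              oc get configmap custom-branding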

5.2 Debugging Tools

# Check API service status
oc get apiservices | grep console

# Diagnose Operator health
oc describe clusteroperators console

# Browser-side debugging
# Visit https://console-url/dashboards to view performance metrics

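6. Best Practices

6.1 Security Hardening

  1. Enable TLS 1.3

# The TLS profile of the router that serves the console route is set on the
# IngressController resource (the original used kind: Ingress from config.openshift.io/v1)
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      ciphers:
      - TLS_AES_128_GCM_SHA256
      - TLS_AES_256_GCM_SHA384
      minTLSVersion: VersionTLS13
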
  2. Apply the RBAC principle of least privilege

oc create clusterrole console-viewer \
  --verb=get,list,watch \
  --resource=pods,deployments,services

oc adm policy add-cluster-role-to-group console-viewer developers
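
Whether the role above really is least-privilege can be spot-checked with impersonation. The sketch below assumes the group holds no other bindings in the current project and that the caller is allowed to impersonate users and groups; the function name `verify_console_viewer` and the user name in the usage line are hypothetical.

```shell
# Checks that a member of the given group can get, list, and watch the
# resources granted by console-viewer, but cannot delete them.
# Arguments: <user to impersonate> <group to impersonate>
verify_console_viewer() {
  user="$1"
  group="$2"
  for res in pods deployments services; do
    for verb in get list watch; do
      oc auth can-i "$verb" "$res" --as="$user" --as-group="$group" >/dev/null 2>&1 \
        || { echo "missing: $verb $res"; return 1; }
    done
    # A read-only role should never allow deletion.
    if oc auth can-i delete "$res" --as="$user" --as-group="$group" >/dev/null 2>&1; then
      echo "too broad: delete $res"
      return 1
    fi
  done
}

# Usage (requires a logged-in oc session with impersonation rights):
#   verify_console_viewer test-user developers && echo "console-viewer looks read-only"
```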

6.2 High Availability Configuration

# Note: the console Deployment is managed by the console Operator, which may
# overwrite direct edits; treat this fragment as the target state to verify.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: console
  namespace: openshift-console
spec:
  replicas: 3 # adjust to the size of the cluster
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - console
            topologyKey: "kubernetes.io/hostname"
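
The anti-affinity rule above is meant to place every console Pod on a different node. A minimal sketch for confirming that outcome follows, assuming the Pods carry the `app=console` label used in the fragment; the function name `pods_on_distinct_nodes` is hypothetical.

```shell
# Reads one node name per line on stdin and succeeds when no node appears
# more than once, i.e. every console Pod runs on a different node.
pods_on_distinct_nodes() {
  dup=$(sed '/^$/d' | sort | uniq -d)
  if [ -n "$dup" ]; then
    echo "nodes running more than one console Pod: $dup"
    return 1
  fi
}

# Usage (requires a logged-in oc session):
#   oc get pods -n openshift-console -l app=console \
#     -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | pods_on_distinct_nodes
```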

6.3 Performance Tuning

  • Resource requests and limits
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  • Browser cache optimization
    location / {
        add_header Cache-Control "public, max-age=3600";
        gzip on;
        gzip_types text/plain application/javascript application/x-javascript text/css;
    }

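7. Upgrade Strategy

graph LR
    A[Preparation] --> B[Back up key configuration]
    B --> C[Verify the upgrade path]
    C --> D[Upgrade the Operator]
    D --> E[Roll out the console update]
    E --> F[Acceptance testing]
    F --> G[Monitor stability]

    subgraph rollback [Rollback plan]
        H[Check backups] --> I[Restore the Console CR]
        I --> J[Roll back the Operator version]
    end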

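Key checkpoints:

  1. Check the Operator subscription channel: oc get subscription console-operator -n openshift-console
  2. Verify the compatibility matrix: make sure the console version matches the cluster version
  3. Staged rollout: upgrade non-production clusters first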
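
The version check in the second checkpoint can be scripted. The sketch below compares the cluster's desired version with the version reported by the console ClusterOperator, assuming that equal version strings are what "matches" means here; the function name `console_matches_cluster` is hypothetical.

```shell
# Succeeds when the console Operator reports the same version as the
# cluster's desired version; prints both for the record.
console_matches_cluster() {
  cluster=$(oc get clusterversion version \
    -o jsonpath='{.status.desired.version}') || return 1
  console=$(oc get clusteroperator console \
    -o jsonpath='{.status.versions[?(@.name=="operator")].version}') || return 1
  echo "cluster=$cluster console=$console"
  [ -n "$cluster" ] && [ "$cluster" = "$console" ]
}

# Usage (requires a logged-in oc session):
#   console_matches_cluster || echo "console and cluster versions differ"
```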

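8. Summary

Key points for operating the OpenShift web console:

  • Automation first: manage the lifecycle through the Operator
  • Configuration as code: manage every customization through custom resources
  • Observability: build thorough monitoring and logging
  • Defense in depth: network isolation, RBAC, and certificate management
  • Disaster recovery: back up key configuration regularly and keep a rollback plan

Tip: in production, validate the configuration before every change: oc apply --dry-run=server -f console-config.yaml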
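
The tip above can be enforced rather than remembered by chaining the dry run and the real apply. This is a minimal sketch; the wrapper name `safe_apply` is hypothetical.

```shell
# Applies a manifest only after the API server accepts it in a server-side dry run.
safe_apply() {
  file="${1:?usage: safe_apply <manifest.yaml>}"
  if ! oc apply --dry-run=server -f "$file" >/dev/null; then
    echo "validation failed, not applying: $file" >&2
    return 1
  fi
  oc apply -f "$file"
}

# Usage (requires a logged-in oc session):
#   safe_apply console-config.yaml
```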