Routing PrometheusAlert alerts to different DingTalk groups

Method 1

Modify the alerting rules

- alert: CPU usage above 88%
  expr: instance:node_cpu_utilization:ratio * 100 > 88
  for: 5m
  labels:
    severity: critical
    level: "3"
    kind: CpuUsage
  annotations:
    summary: "CPU usage above 88%"
    description: "CPU usage on host {{ $labels.hostname }} is {{ $value | humanize }}"

Alerts are distinguished by the `kind` label: rule one uses `kind1` and rule two uses `kind2`.

Alertmanager configuration example

global:
  resolve_timeout: 5m
  smtp_from: from@email.com
  smtp_smarthost: smtp.net:port
  smtp_auth_username: from@email.com
  smtp_auth_password: PASS
  smtp_require_tls: false
route:
  receiver: 'email'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  routes:
  - receiver: 'our'
    group_wait: 10s
    match_re:
      severity: warning
  - receiver: 'other'
    group_wait: 10s
    match_re:
      severity: busi

templates:
  - '*.html'
receivers:
- name: 'email'
  email_configs:
  - to: 'xuxd@email.com'
    send_resolved: false
    html: '{{ template "default-monitor.html" . }}'
    headers: { Subject: "[WARN] Alert email" } # email subject
- name: 'our'
  webhook_configs:
  - url: http://127.0.0.1:8060/dingtalk/our/send
- name: 'other'
  webhook_configs:
  - url: http://127.0.0.1:8060/dingtalk/other/send
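  • route: besides `email`, the default receiver, the `routes` below define two specific receivers: `our`, which matches alerts whose severity is `warning`, and `other`, which matches severity `busi`. These values are set in the `severity` label of the alerting rules; they are not reserved keywords, just arbitrary markers chosen by the author. An alert that matches neither route falls through to `email`.
  • receivers: configures the receivers named above. `email` specifies who receives the mail. `our` specifies the DingTalk send URL; note the path segment before `send`, which is `our`. `other` is described as carrying two URLs that differ only in that segment, one with `our` and one with `other`, although the listing above shows only the `other` URL.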
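
On Alertmanager 0.22 and later, `match` and `match_re` are deprecated in favour of `matchers`. The two routes described above could be written as in the sketch below. It assumes a newer Alertmanager than the v0.20.0 image that appears in the vmalert values file later in this article, and it leaves the receivers unchanged.

```yaml
route:
  receiver: 'email'
  group_by: ['alertname']
  routes:
  # Each matcher is a string of the form <label> <operator> "<value>".
  - receiver: 'our'
    matchers:
    - 'severity = "warning"'
  - receiver: 'other'
    matchers:
    - 'severity = "busi"'
```

The `=` operator is an exact comparison; `=~` would give the regular-expression behaviour of `match_re`.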

prometheus-webhook-dingtalk configuration

## Customizable templates path
templates:
   - /home/user/monitor/alert/prometheus-webhook-dingtalk-1.4.0.linux-amd64/template/template.tmpl

## Targets, previously was known as "profiles"
targets:
  our:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxx
    secret: xxx_secret
  other:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxx_other
    secret: xxx_other_secret

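There are two entries under `targets`, `our` and `other`. They correspond to the `our` and `other` path segments in the URLs of the Alertmanager configuration above.

With this configuration, when rule one fires, the notification is sent by the Alertmanager receiver named `other` and reaches both our DingTalk group and the business-side DingTalk group. When rule two fires, it is sent through `our` and reaches only our DingTalk group.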
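
For the `other` receiver to reach both groups as described, it has to list both webhook URLs. The following is a minimal sketch of that receiver, assuming the two prometheus-webhook-dingtalk targets `our` and `other` defined above; it is a reading of the description, not a copy of the author's file.

```yaml
receivers:
- name: 'our'
  webhook_configs:
  - url: http://127.0.0.1:8060/dingtalk/our/send
- name: 'other'
  webhook_configs:
  # Assumption: the first URL posts to our own group, the second to the business-side group.
  - url: http://127.0.0.1:8060/dingtalk/our/send
  - url: http://127.0.0.1:8060/dingtalk/other/send
```

Alertmanager sends the notification to every entry under `webhook_configs`, so a single matching route is enough to notify both groups.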

vmalert configuration file value.yaml

# Default values for victoria-metrics-alert.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:
  # mount API token to pod directly
  automountToken: true

imagePullSecrets: []

rbac:
  create: true
  pspEnabled: true
  namespaced: false
  extraLabels: {}
  annotations: {}

server:
  name: server
  enabled: true
  image:
    repository: victoriametrics/vmalert
    tag: "" # rewrites Chart.AppVersion
    pullPolicy: IfNotPresent
  nameOverride: ""
  fullnameOverride: ""

  ## See `kubectl explain poddisruptionbudget.spec` for more
  ## ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
  podDisruptionBudget:
    enabled: false
    # minAvailable: 1
    # maxUnavailable: 1
    labels: {}

  # -- Additional environment variables (ex.: secret tokens, flags) https://github.com/VictoriaMetrics/VictoriaMetrics#environment-variables
  env:
    []
    # - name: VM_remoteWrite_basicAuth_password
    #   valueFrom:
    #     secretKeyRef:
    #       name: auth_secret
    #       key: password

  replicaCount: 1

  # deployment strategy, set to standard k8s default
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%

  # specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing/terminating
  # 0 is the standard k8s default
  minReadySeconds: 0

  # vmalert reads metrics from source, next section represents its configuration. It can be any service which supports
  # MetricsQL or PromQL.
  datasource:
    url: "http://192.168.47.9:8481/select/0/prometheus/"
    basicAuth:
      username: ""
      password: ""

  remote:
    write:
      url: ""
    read:
      url: ""

  notifier:
    alertmanager:
      url: "http://x.x.x.x:9093"

  extraArgs:
    envflag.enable: "true"
    envflag.prefix: VM_
    loggerFormat: json

  # Additional hostPath mounts
  extraHostPathMounts:
    []
    # - name: certs-dir
    #   mountPath: /etc/kubernetes/certs
    #   subPath: ""
    #   hostPath: /etc/kubernetes/certs
    #   readOnly: true

  # Extra Volumes for the pod
  extraVolumes:
    []
    # - name: example
    #   configMap:
    #     name: example

  # Extra Volume Mounts for the container
  extraVolumeMounts:
    []
    # - name: example
    #   mountPath: /example

  extraContainers:
    []
    #- name: config-reloader
    #  image: reloader-image

  service:
    annotations: {}
    labels: {}
    clusterIP: ""
    ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
    ##
    externalIPs: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    servicePort: 8880
    type: ClusterIP
    # Ref: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
    # externalTrafficPolicy: "local"
    # healthCheckNodePort: 0

  ingress:
    enabled: false
    annotations: {}
    #   kubernetes.io/ingress.class: nginx
    #   kubernetes.io/tls-acme: 'true'

    extraLabels: {}
    hosts: []
    #   - name: vmselect.local
    #     path: /select
    #     port: http

    tls: []
    #   - secretName: vmselect-ingress-tls
    #     hosts:
    #       - vmselect.local

    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx
    # -- pathType is only for k8s >= 1.18
    pathType: Prefix

  podSecurityContext: {}
  # fsGroup: 2000

  securityContext:
    {}
    # capabilities:
    #   drop:
    #   - ALL
    # readOnlyRootFilesystem: true
    # runAsNonRoot: true
    # runAsUser: 1000

  resources:
    {}
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #   cpu: 100m
    #   memory: 128Mi
    # requests:
    #   cpu: 100m
    #   memory: 128Mi

  # Annotations to be added to the deployment
  annotations: {}
  # labels to be added to the deployment
  labels: {}

  # Annotations to be added to pod
  podAnnotations: {}

  podLabels: {}

  nodeSelector: {}

  priorityClassName: ""

  tolerations: []

  affinity: {}

  # vmalert alert rules configuration:
  # use existing configmap if specified
  # otherwise .config values will be used
  configMap: ""
  config:
    alerts:
      groups:
          - name: Disk mount error
            rules:
            - alert: Disk mount error
              annotations:
                description: 'Disk mount error on node {{$labels.instance}} of chain {{$labels.job}}'
              expr: mount_error{job=~"dev|sit"} == 1
              for: 1m
              labels:
                severity: critical
                kind: kind1
          - name: Process not found
            rules:
            - alert: Process not found
              annotations:
                description: 'Process not found on {{$labels.instance}} of chain {{$labels.job}}'
              expr: process_total_error{job=~"dev|sit"} == 1
              for: 1m
              labels:
                severity: critical
                kind: kind2

serviceMonitor:
  enabled: false
  extraLabels: {}
  annotations: {}
#    interval: 15s
#    scrapeTimeout: 5s
  # -- Commented. HTTP scheme to use for scraping.
#    scheme: https
  # -- Commented. TLS configuration to use when scraping the endpoint
#    tlsConfig:
#      insecureSkipVerify: true

alertmanager:
  enabled: true
  replicaCount: 1
  podMetadata:
    labels: {}
    annotations: {}
  image: prom/alertmanager
  tag: v0.20.0
  retention: 120h
  nodeSelector: {}
  priorityClassName: ""
  resources: {}
  tolerations: []
  imagePullSecrets: []
  podSecurityContext: {}
  extraArgs: {}
  # key: value

  # external URL, that alertmanager will expose to receivers
  baseURL: ""
  # use existing configmap if specified
  # otherwise .config values will be used
  configMap: ""
  config:
    global:
      resolve_timeout: 5m
    route:
      # default receiver
      receiver: aldaba
      # tag to group by
      group_by: [alertname]
      # How long to initially wait to send a notification for a group of alerts
      group_wait: 30s
      # How long to wait before sending a notification about new alerts that are added to a group
      group_interval: 60s
      # How long to wait before sending a notification again if it has already been sent successfully for an alert
      repeat_interval: 1h
      routes:
      - receiver: 'mychain'
        group_wait: 10s
        match_re:
          kind: mychain
    receivers:
      - name: aldaba
        webhook_configs:
        - url: http://192.168.208.133:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=72a3a55795094a6878c2c2443a81a3545add1f688ddee18701c0dd753dbb3b2a&split=false
          send_resolved: true
      - name: mychain
        webhook_configs:
        - url: http://192.168.208.133:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=307270fdcd1bb0c4b0533e29005cca7cb353c27d7f988fdff0ec00e6affc6e83&split=false
          send_resolved: true
    inhibit_rules:
      - source_match:
          #severity: 'warning'
        target_match:
          #severity: 'warning'
        #equal: ['alertname', 'job']

  templates: {}
  #  alertmanager.tmpl: |-
  service:
    annotations: {}
    type: ClusterIP
    port: 9093
    # if you want to force a specific nodePort. Must be use with service.type=NodePort
    # nodePort:
  ingress:
    enabled: false
    annotations:
            #  nginx.ingress.kubernetes.io/auth-realm: Authentication Required
            #  nginx.ingress.kubernetes.io/auth-secret: victoria-metrics/basic-auth
            #  nginx.ingress.kubernetes.io/auth-type: basic
    #   kubernetes.io/ingress.class: nginx
    #   kubernetes.io/tls-acme: 'true'
    extraLabels: {}
    hosts: {}
    #   - name: wangjuan.test.com
    #    path: /
    #     port: web

    tls: []
    #   - secretName: alertmanager-ingress-tls
    #     hosts:
    #       - alertmanager.local

    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx
    # -- pathType is only for k8s >= 1.18
    pathType: Prefix
  persistentVolume:
    # -- Create/use Persistent Volume Claim for alertmanager component. Empty dir if false
    enabled: false
    # -- Array of access modes. Must match those of existing PV or dynamic provisioner. Ref: [http://kubernetes.io/docs/user-guide/persistent-volumes/](http://kubernetes.io/docs/user-guide/persistent-volumes/)
    accessModes:
      - ReadWriteOnce
    # -- Persistent volume annotations
    annotations: {}
    # -- StorageClass to use for persistent volume. Requires alertmanager.persistentVolume.enabled: true. If defined, PVC created automatically
    storageClass: ""
    # -- Existing Claim name. If defined, PVC must be created manually before volume will be bound
    existingClaim: ""
    # -- Mount path. Alertmanager data Persistent Volume mount root path.
    mountPath: /data
    # -- Mount subpath
    subPath: ""
    # -- Size of the volume. Better to set the same as resource limit memory property.
    size: 50Mi
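
In the values file above, the rules set `kind` to `kind1` and `kind2`, while the only sub-route matches `kind: mychain`. As written, neither rule matches that route, so both alerts go to the default receiver `aldaba`. The sketch below lines the route up with the rule labels. It assumes that `kind1` alerts are the ones meant for the `mychain` group, which the original does not state.

```yaml
alertmanager:
  config:
    route:
      receiver: aldaba
      group_by: [alertname]
      routes:
      # Assumption: alerts labelled kind1 belong to the mychain group.
      - receiver: 'mychain'
        group_wait: 10s
        match:
          kind: kind1
      # Alerts labelled kind2 match no sub-route and fall through to aldaba.
```

`match` compares the label value exactly, which is enough here because the label values are literals rather than patterns.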

Method 2

Filter by job

Alertmanager configuration

apiVersion: v1
data:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    inhibit_rules:
    - equal:
      - alertname
      - job
      source_match:
        severity: warning
      target_match:
        severity: warning
    receivers:
    - name: nft
      webhook_configs:
      - send_resolved: false
        url: http://x.x.x.x:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxx&split=false
    - name: poap
      webhook_configs:
      - send_resolved: false
        url: http://x.x.x.x:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxx&split=false
    - name: ipforce
      webhook_configs:
      - send_resolved: false
        url: http://x.x.x.x:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxx&split=false
    route:
      group_by:
      - alertname
      group_interval: 60s
      group_wait: 30s
      receiver: nft
      repeat_interval: 1h
      routes:
      - group_wait: 10s
        match_re:
          job: test_poap
        receiver: poap
      - group_wait: 10s
        match_re:
          job: test_ipforce
        receiver: ipforce
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: vmalert
    meta.helm.sh/release-namespace: victoria-metrics
  creationTimestamp: '2022-04-06T07:31:38Z'
  labels:
    app: alertmanager
    app.kubernetes.io/instance: vmalert
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: victoria-metrics-alert
    helm.sh/chart: victoria-metrics-alert-0.4.33
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:data': {}
        'f:metadata':
          'f:annotations':
            .: {}
            'f:meta.helm.sh/release-name': {}
            'f:meta.helm.sh/release-namespace': {}
          'f:labels':
            .: {}
            'f:app': {}
            'f:app.kubernetes.io/instance': {}
            'f:app.kubernetes.io/managed-by': {}
            'f:app.kubernetes.io/name': {}
            'f:helm.sh/chart': {}
      manager: helm
      operation: Update
      time: '2022-04-06T07:31:38Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          'f:alertmanager.yaml': {}
      manager: ACK-Console Apache-HttpClient
      operation: Update
      time: '2023-01-05T07:52:13Z'
  name: vmalert-alertmanager-alertmanager-config
  namespace: victoria-metrics
  resourceVersion: '80954053'
  uid: 653e4633-86e5-41ce-9a17-301f75224e9c
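
The values under `match_re` are regular expressions that must match the whole label value, so one route can cover a family of jobs. The sketch below shows this under a hypothetical naming scheme in which related jobs share the `test_poap` prefix; the suffixed job names are an assumption and do not appear in the original setup.

```yaml
route:
  receiver: nft
  routes:
  - receiver: poap
    group_wait: 10s
    match_re:
      # Hypothetical pattern: matches test_poap and any job named test_poap_<suffix>.
      job: 'test_poap(_.*)?'
  - receiver: ipforce
    group_wait: 10s
    match_re:
      job: test_ipforce
```

Alerts from any other job match no sub-route and go to the default receiver `nft`.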
