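0. Introduction
In production we often need to check the RI (Reserved Instance) utilization, expiration dates, instance types and similar data for EC2, RDS and ElastiCache. To monitor RI utilization for these services, I wrote this test, based on the AWS SDK and an existing in-house script that checks RI expiration dates.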
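0.1 v1 workflow
① Create an SNS topic with an email subscription.
② Write a Lambda script that takes the current time, returns the RI utilization of EC2 instances over the preceding week, and sends an email alert through SNS when utilization falls below 97%.
③ Once the Lambda function is tested, write a CloudFormation (CFN) template that automatically creates the Lambda function, the role, and a CloudWatch Events rule that invokes the function on a schedule.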
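The one-week window in step ② can be sketched as a small helper. This is a minimal sketch, not the script itself: the function name `utilization_window` and the injectable `today` parameter are assumptions added here so the logic can be checked without depending on the real clock. Note that Cost Explorer treats the end date of a time period as exclusive.

```python
import datetime


def utilization_window(before_days, today=None):
    """Return (start, end) as YYYY-MM-DD strings covering the last `before_days` days.

    `today` is a hypothetical parameter for testing; it defaults to the current date.
    """
    today = today or datetime.date.today()
    start = today - datetime.timedelta(days=int(before_days))
    return start.isoformat(), today.isoformat()


start, end = utilization_window(7, today=datetime.date(2020, 11, 3))
print(start, end)  # 2020-10-27 2020-11-03
```

The two strings can be passed as the 'Start' and 'End' values of the TimePeriod argument.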
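0.2 v2 updates
① The v1 CloudFormation template omitted the SNS part. v2 adds the SNS Topic and SNS Subscription resources to the template and reorders the resources so that the Lambda function, which uses the topic's ARN, is defined after the SNS topic and subscription. (CloudFormation derives creation order from references such as !Ref and !GetAtt and from DependsOn rather than from position in the file, so this ordering is for readability only.)
② The v2 template adds environment variables to the Lambda function: topic_arn (taken from the SNS Topic resource with !Ref), before_days (how many days before the current time the reporting window should cover), and appenv (the project name). The Lambda function reads them with os.environ['key'].
③ Compared with v1, the v2 Lambda function also reports details for RDS and ElastiCache RIs (instance type, platform, RI count, Availability Zone, and so on), and sends an SNS notification when utilization is below 97%.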
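Reading the three environment variables from ② can be gathered in one place. The sketch below is an illustration under assumptions: `load_config` is a hypothetical helper, and the defaults for before_days and appenv are made up here (the original script reads the variables directly and fails if one is missing).

```python
import os


def load_config(env=None):
    """Collect the Lambda settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return {
        # Required: the ARN of the SNS topic, set by the CFN template via !Ref
        "topic_arn": env["topic_arn"],
        # Environment variables are always strings, so convert explicitly
        "before_days": int(env.get("before_days", "7")),  # default is an assumption
        "appenv": env.get("appenv", "unknown"),  # default is an assumption
    }


cfg = load_config({"topic_arn": "arn:aws-cn:sns:cn-north-1:123456789012:demo",
                   "before_days": "14"})
print(cfg["before_days"], cfg["appenv"])  # 14 unknown
```

Passing a plain dict instead of os.environ makes the parsing easy to exercise locally before deploying.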
0.3 v3 updates
① v3 adds a utilization report for Redshift.
② v3 adds the expiration date of every RI.
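The expiration date from ② can also be turned into a number of remaining days. This is a sketch that goes slightly beyond the script, which only prints the date: `days_until_expiry` and its `today` parameter are hypothetical names, and the timestamp format is assumed to match the sample response shown later in this post.

```python
import datetime


def days_until_expiry(end_date_time, today=None):
    """Days between `today` and the date part of an endDateTime string."""
    today = today or datetime.date.today()
    # Keep only the YYYY-MM-DD part, as the script does with split('T')[0]
    end_date = datetime.date.fromisoformat(end_date_time.split("T")[0])
    return (end_date - today).days


print(days_until_expiry("2019-09-28T15:22:31.000Z",
                        today=datetime.date(2019, 9, 1)))  # 27
```

A negative result means the RI has already expired.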
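0.4 Architecture flow
The automated flow is:
Upload the Lambda code to an S3 bucket, then deploy the CFN template to create the Role, Lambda Function, Lambda Permission, and CloudWatch Events resources. CloudWatch Events invokes the Lambda function on a schedule, and the function runs under the Role. If the RI utilization is below a given threshold, the function outputs the RI details and notifies the user through SNS.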
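The last step of the flow, "notify only when something is below the threshold", can be sketched as follows. This is one possible refinement rather than what the script below does (it publishes on every run). `notify_if_needed` is a hypothetical helper, and `publish` is an injected callable so the logic can be tried without AWS access; in the Lambda function it would wrap `sns.publish(TopicArn=..., Subject=..., Message=...)`.

```python
def notify_if_needed(header, findings, publish):
    """Publish a message only when there is at least one finding.

    Returns True if a message was sent, False otherwise.
    """
    if not findings:
        return False
    publish(subject="RI Utilization Monitor",
            message="\n".join([header] + findings))
    return True


# Stand-in for SNS: collect what would have been published
sent = []
notify_if_needed(" demo RI Utilization Monitor: ",
                 [" EC2 m5.large RI Utilization is 50.00%"],
                 lambda **kwargs: sent.append(kwargs))
print(len(sent))  # 1
```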
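1. Output
1.1 (To show the data, the output for this account is unfiltered. This account has many RIs; only the Lambda function was tested here, without CFN automation.)
1.2 Output from another account, deployed with CFN and tested with SNS
2. How to use this solution
2.1 Upload the Lambda code to an S3 bucket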
import json
import boto3
import datetime
import os


# Get the reporting window
def get_date():
    # datetime.datetime.today() returns the current time, e.g. 2020-11-03 14:12:28.339466
    # strftime("%Y-%m-%d") drops the time of day and keeps only year-month-day
    # datetime.datetime.today() - datetime.timedelta(days=1) is today's date minus one day
    # before_days comes from the Lambda environment variables: how many days back the RI utilization window starts
    before_days = os.environ['before_days']
    start_time = (datetime.datetime.today() - datetime.timedelta(days=int(before_days))).strftime("%Y-%m-%d")
    stop_time = datetime.datetime.today().strftime("%Y-%m-%d")
    return start_time, stop_time


# Convert the utilization value in the response to an integer
# def to_int(str):
#     try:
#         int(str)
#         return int(str)
#     except ValueError:
#         try:
#             float(str)
#             return int(float(str))
#         except ValueError:
#             return False


# Valid values for Service in the Filter of get_reservation_utilization():
# Amazon Elastic Compute Cloud - Compute,
# Amazon Relational Database Service, Amazon ElastiCache, Amazon Redshift, Amazon Elasticsearch Service


# ec2 ri utilization
def get_ec2_reservation_utilization():
    # regions = ['cn-north-1', 'cn-northwest-1']
    # Project name, read from the environment variables
    appenv = os.environ['appenv']
    # Header line of the SNS message
    message_list = [" " + appenv + " RI Utilization Monitor: "]
    # for region in regions:
    ec2_client = boto3.client('ce')
    '''
    When a time period is given, get_reservation_utilization() returns a dict.
    Because the request groups by SUBSCRIPTION_ID, each Group holds the utilization of one RI,
    and the overall utilization for the period is returned in the 'Total' field.
    '''
    start_time, stop_time = get_date()
    response = ec2_client.get_reservation_utilization(
        TimePeriod={
            'Start': start_time,
            'End': stop_time
        },
        Filter={
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': [
                    'Amazon Elastic Compute Cloud - Compute'
                ]
            }
        },
        GroupBy=[
            {
                'Type': "DIMENSION",
                'Key': "SUBSCRIPTION_ID"
            }
        ]
    )
    # The response is a dict, so fields can be read directly
    # TotalUtilizationPercentage = response['Total']['UtilizationPercentageInUnits']
    # Loop over the groups to get each RI
    # ec2_ri_groups_list = response['UtilizationsByTime'][0]['Groups'][0]['Attributes']['instanceType']
    ec2_ri_groups_list = response['UtilizationsByTime'][0]['Groups']
    # for ri in ec2_ri_groups_list:
    #     if ri['Attributes']['instanceType'] == 'db.m4.2xlarge':
    #         instanceType = 'db.m4.2xlarge'
    #         return instanceType
    # print(ec2_ri_groups_list)
    # print("0------------------")
    if ec2_ri_groups_list:
        message_list.append(" -------EC2 RI--------")
        for ec2_ri_group in ec2_ri_groups_list:
            ec2_ri_region = ec2_ri_group['Attributes']['region']
            ec2_ri_numberOfInstances = ec2_ri_group['Attributes']['numberOfInstances']
            ec2_ri_instanceType = ec2_ri_group['Attributes']['instanceType']
            ec2_ri_platform = ec2_ri_group['Attributes']['platform']
            ec2_ri_endDateTime = ec2_ri_group['Attributes']['endDateTime'].split('T')[0]
            # Format the utilization for output
            float_ec2_ri_UtilizationPercentage = float(ec2_ri_group['Utilization']['UtilizationPercentage'])
            ec2_ri_UtilizationPercentage = "{:.2f}%".format(float_ec2_ri_UtilizationPercentage)
            if float_ec2_ri_UtilizationPercentage < 97:
                message = " On the AZ " + ec2_ri_region + " , " + ec2_ri_numberOfInstances \
                          + " EC2 " + ec2_ri_instanceType + " RI Utilization is " + ec2_ri_UtilizationPercentage \
                          + " , its platform is " + ec2_ri_platform + " , its expiration date is " + ec2_ri_endDateTime
                message_list.append(message)
            # If utilization is high, do not print the separator line
            # if float_ec2_ri_UtilizationPercentage >= 97:
            #     message_list.remove(" -------EC2 RI--------")
    return message_list


# rds ri utilization
def get_rds_reservation_utilization():
    # regions = ['cn-north-1', 'cn-northwest-1']
    message_list = get_ec2_reservation_utilization()
    # print(message_list)
    # print("0------------------------")
    # for region in regions:
    rds_client = boto3.client('ce')
    start_time, stop_time = get_date()
    rds_response = rds_client.get_reservation_utilization(
        TimePeriod={
            'Start': start_time,
            'End': stop_time
        },
        Filter={
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': [
                    'Amazon Relational Database Service'
                ]
            }
        },
        GroupBy=[
            {
                'Type': "DIMENSION",
                'Key': "SUBSCRIPTION_ID"
            }
        ]
    )
    rds_ri_groups_list = rds_response['UtilizationsByTime'][0]['Groups']
    if rds_ri_groups_list:
        message_list.append(" -------RDS RI--------")
        for rds_ri_group in rds_ri_groups_list:
            rds_ri_region = rds_ri_group['Attributes']['region']
            rds_ri_numberOfInstance = rds_ri_group['Attributes']['numberOfInstances']
            rds_ri_instanceType = rds_ri_group['Attributes']['instanceType']
            rds_ri_platform = rds_ri_group['Attributes']['platform']
            rds_ri_endDateTime = rds_ri_group['Attributes']['endDateTime'].split('T')[0]
            float_rds_ri_UtilizationPercentage = float(rds_ri_group['Utilization']['UtilizationPercentage'])
            rds_ri_UtilizationPercentage = "{:.2f}%".format(float_rds_ri_UtilizationPercentage)
            if float_rds_ri_UtilizationPercentage < 97:
                message = " On the AZ " + rds_ri_region + " , " + rds_ri_numberOfInstance \
                          + " RDS " + rds_ri_instanceType + " RI Utilization is " + rds_ri_UtilizationPercentage \
                          + " , its platform is " + rds_ri_platform + " , its expiration date is " + rds_ri_endDateTime
                message_list.append(message)
            # If utilization is high, do not print the separator line
            # if float_rds_ri_UtilizationPercentage > 97:
            #     message_list.remove(" -------RDS RI--------")
    return message_list


# redshift ri utilization
def get_redshift_utilization():
    message_list = get_rds_reservation_utilization()
    redshift_client = boto3.client('ce')
    start_time, stop_time = get_date()
    redshift_response = redshift_client.get_reservation_utilization(
        TimePeriod={
            'Start': start_time,
            'End': stop_time
        },
        Filter={
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': [
                    'Amazon Redshift'
                ]
            }
        },
        GroupBy=[
            {
                'Type': 'DIMENSION',
                'Key': 'SUBSCRIPTION_ID'
            }
        ]
    )
    redshift_ri_groups_list = redshift_response['UtilizationsByTime'][0]['Groups']
    if redshift_ri_groups_list:
        message_list.append(" -------Redshift RI--------")
        for redshift_ri_group in redshift_ri_groups_list:
            redshift_ri_region = redshift_ri_group['Attributes']['region']
            redshift_ri_numberOfInstance = redshift_ri_group['Attributes']['numberOfInstances']
            redshift_ri_instanceType = redshift_ri_group['Attributes']['instanceType']
            redshift_ri_platform = redshift_ri_group['Attributes']['platform']
            # Read the end date from the Redshift group (the original read it from rds_ri_group,
            # which is not defined in this function)
            redshift_ri_endDateTime = redshift_ri_group['Attributes']['endDateTime'].split('T')[0]
            float_redshift_ri_UtilizationPercentage = float(redshift_ri_group['Utilization']['UtilizationPercentage'])
            redshift_ri_UtilizationPercentage = "{:.2f}%".format(float_redshift_ri_UtilizationPercentage)
            if float_redshift_ri_UtilizationPercentage < 97:
                message = " On the AZ " + redshift_ri_region + " , " + redshift_ri_numberOfInstance \
                          + " Redshift " + redshift_ri_instanceType + " RI Utilization is " + redshift_ri_UtilizationPercentage \
                          + " , its platform is " + redshift_ri_platform + " , its expiration date is " + redshift_ri_endDateTime
                message_list.append(message)
    return message_list


# elasticache ri utilization
def get_elasticache_utilization():
    message_list = get_redshift_utilization()
    message_str = "\n".join(message_list)
    # for region in regions:
    elasticache_client = boto3.client('ce')
    start_time, stop_time = get_date()
    elasticache_response = elasticache_client.get_reservation_utilization(
        TimePeriod={
            'Start': start_time,
            'End': stop_time
        },
        Filter={
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': [
                    'Amazon ElastiCache'
                ]
            }
        },
        GroupBy=[
            {
                'Type': "DIMENSION",
                'Key': "SUBSCRIPTION_ID"
            }
        ]
    )
    elasticache_ri_groups_list = elasticache_response['UtilizationsByTime'][0]['Groups']
    if elasticache_ri_groups_list:
        message_list.append(" -------ElastiCache RI--------")
        for elasticache_ri_group in elasticache_ri_groups_list:
            elasticache_ri_region = elasticache_ri_group['Attributes']['region']
            elasticache_ri_numberOfInstance = elasticache_ri_group['Attributes']['numberOfInstances']
            elasticache_ri_instanceType = elasticache_ri_group['Attributes']['instanceType']
            elasticache_ri_platform = elasticache_ri_group['Attributes']['platform']
            elasticache_ri_endDateTime = elasticache_ri_group['Attributes']['endDateTime'].split('T')[0]
            float_elasticache_ri_UtilizationPercentage = float(
                elasticache_ri_group['Utilization']['UtilizationPercentage'])
            elasticache_ri_UtilizationPercentage = "{:.2f}%".format(float_elasticache_ri_UtilizationPercentage)
            if float_elasticache_ri_UtilizationPercentage < 97:
                message = " On the AZ " + elasticache_ri_region + " , " + elasticache_ri_numberOfInstance \
                          + " ElastiCache " + elasticache_ri_instanceType + " RI Utilization is " + elasticache_ri_UtilizationPercentage \
                          + " , its platform is " + elasticache_ri_platform + " , its expiration date is " + elasticache_ri_endDateTime
                message_list.append(message)
    message_str = "\n".join(message_list)
    return message_str


def sns_publish():
    # Read the Lambda environment variable with os.environ. CloudFormation resolves the value
    # when it creates the Lambda function and passes it in through the function's Environment.
    topic_arn = os.environ['topic_arn']
    topic_region = 'cn-north-1'
    # Get the assembled message text
    message_str = get_elasticache_utilization()
    sns = boto3.client('sns', region_name=topic_region)
    response = sns.publish(
        TopicArn=topic_arn,
        Subject='RI Utilization Monitor',
        Message=message_str
    )


def lambda_handler(event, context):
    # TODO implement
    before_days = os.environ['before_days']
    # Decide whether to run
    # TotalUtilizationPercentage = get_reservation_utilization()[0]
    # if float(TotalUtilizationPercentage) < 97:
    sns_publish()
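The four per-service functions above differ only in the service name and label, so the "filter below 97% and format a line" part could live in one helper. The sketch below is an assumption-laden illustration, not the author's code: `low_utilization_lines` and `THRESHOLD` are hypothetical names, and the sample group uses the same attribute keys that the script reads.

```python
THRESHOLD = 97.0  # same cut-off as the script above


def low_utilization_lines(groups, service_label, threshold=THRESHOLD):
    """Return one message line per RI group whose utilization is below `threshold`."""
    lines = []
    for group in groups:
        attrs = group["Attributes"]
        pct = float(group["Utilization"]["UtilizationPercentage"])
        if pct < threshold:
            lines.append(
                " On the AZ {region} , {count} {label} {itype} RI Utilization is {pct:.2f}%"
                " , its platform is {platform} , its expiration date is {end}".format(
                    region=attrs["region"],
                    count=attrs["numberOfInstances"],
                    label=service_label,
                    itype=attrs["instanceType"],
                    pct=pct,
                    platform=attrs["platform"],
                    end=attrs["endDateTime"].split("T")[0],
                )
            )
    return lines


# Made-up sample data in the shape the script expects
sample_groups = [
    {"Attributes": {"region": "cn-north-1", "numberOfInstances": "2",
                    "instanceType": "m5.large", "platform": "Linux/UNIX",
                    "endDateTime": "2021-01-01T00:00:00.000Z"},
     "Utilization": {"UtilizationPercentage": "50.0"}},
    {"Attributes": {"region": "cn-north-1", "numberOfInstances": "1",
                    "instanceType": "t3.micro", "platform": "Linux/UNIX",
                    "endDateTime": "2021-06-01T00:00:00.000Z"},
     "Utilization": {"UtilizationPercentage": "100"}},
]
for line in low_utilization_lines(sample_groups, "EC2"):
    print(line)
```

Because the helper takes plain dicts, it can be tried locally on a saved response before it is wired to the Cost Explorer client.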
2.2 Edit the configurable fields in the CloudFormation template
AWSTemplateFormatVersion: 2010-09-09
Description: RI-Utilization-Monitor
Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      Path: /
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: RI_Utilization_Monitor
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - ec2:DescribeReservedInstances
                  - ec2:DescribeReservedInstancesModifications
                  - ec2:DescribeReservedInstancesOfferings
                  - ec2:DescribeReservedInstancesListings
                  - ce:GetReservationUtilization
                  - sns:Publish
                  - s3:Get*
                  - s3:List*
                Resource: "*"
  CloudwacthEventsScheduledRule:
    Type: AWS::Events::Rule
    Properties:
      Name: RI_Utilization_Monitor
      Description: AWS Cloudwatch Events Schedule Rule
      ScheduleExpression: "cron(00 08 ? * FRI *)" # change the schedule for invoking the Lambda function here (GMT)
      State: "ENABLED"
      Targets:
        -
          Arn:
            Fn::GetAtt:
              - LambdaFunctionCreator
              - Arn
          Id: GetReservationUtilization
  PermissionForEventsToInvokeLambda:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt
        - LambdaFunctionCreator
        - Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt
        - CloudwacthEventsScheduledRule
        - Arn
  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: RI_Utilization_Monitor
  SNSSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: 'XXXX@light2cloud.com' # change the email address subscribed to the SNS topic here
      Protocol: email
      TopicArn: !Ref SNSTopic
  LambdaFunctionCreator:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: GetReservationUtilization
      Description: Lambda For RI Utilization Monitor
      Environment:
        Variables:
          topic_arn: !Ref SNSTopic
          before_days: 7 # change how many days back the RI utilization window starts here
          appenv: Friso # change the project name here
      Runtime: python3.7
      Handler: GetReservationUtilization.lambda_handler
      MemorySize: 128
      Role: !GetAtt LambdaExecutionRole.Arn
      Timeout: 60
      Code:
        S3Bucket: XXXXXXXXXXXXXXXX # S3 bucket name
        S3Key: RI/GetReservationUtilization.zip # S3 path/file name
2.3 Create a CloudFormation stack from an existing template and upload the YAML file
2.4 CloudFormation needs an execution role with the following permissions
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "*", "Resource": "*" } ] }
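2.5 Run CloudFormation to create the Lambda function, Lambda role, SNS topic, SNS subscription and other resources automatically
When testing, set the schedule to fire shortly after the current time; once everything works, change it to the schedule needed for regular use.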
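For the test run described above, a schedule expression that fires a few minutes from now can be generated rather than worked out by hand. This is a minimal sketch under assumptions: `cron_in_minutes` is a hypothetical helper, and it builds a one-off expression in UTC using the six-field form (minutes, hours, day-of-month, month, day-of-week, year) seen in the template.

```python
import datetime


def cron_in_minutes(minutes, now=None):
    """Build a one-off cron(...) expression `minutes` from `now` (UTC)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    run_at = now + datetime.timedelta(minutes=minutes)
    # Day-of-month is set, so day-of-week must be "?"
    return "cron({m} {h} {d} {mo} ? {y})".format(
        m=run_at.minute, h=run_at.hour, d=run_at.day,
        mo=run_at.month, y=run_at.year)


fixed_now = datetime.datetime(2020, 11, 3, 14, 55, tzinfo=datetime.timezone.utc)
print(cron_in_minutes(10, now=fixed_now))  # cron(5 15 3 11 ? 2020)
```

The result can be pasted into ScheduleExpression for the test and replaced with the weekly expression afterwards.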
3. Test code
l = {
"UtilizationsByTime": [{
"Groups": [
{
'Key': 'string',
'Value': 'string',
"Attributes": {
"AccountId": "0123456789",
"AccountName": "0123456789",
"AvailabilityZone": "",
"CancellationDateTime": "2019-09-28T15:22:31.000Z",
"EndDateTime": "2019-09-28T15:22:31.000Z",
"InstanceType": "t2.nano",
"LeaseId": "0123456789",
"NumberOfInstances": "1",
"OfferingType": "convertible",
"Platform": "Linux/UNIX",
"Region": "us-east-1",
"Scope": "Region",
"StartDateTime": "2016-09-28T15:22:32.000Z",
"SubscriptionId": "359809062",
"SubscriptionStatus": "Active",
"SubscriptionType": "All Upfront",
"Tenancy": "Shared"
},
"Key": "SUBSCRIPTION_ID",
"Utilization": {
"PurchasedHours": 2208,
"TotalActualHours": 2208,
"UnusedHours": 0,
"UtilizationPercentage": 100
},
"Value": "359809062"
},
{
"Attributes": {
"AccountId": "0123456789",
"AccountName": "asdasdad",
"AvailabilityZone": "us-east-1d",
"CancellationDateTime": "2017-09-28T15:22:31.000Z",
"EndDateTime": "2017-09-28T15:22:31.000Z",
"InstanceType": "t2.nano",
"LeaseId": "asdasda",
"NumberOfInstances": "1",
"OfferingType": "Standard",
"Platform": "Linux/UNIX",
"Region": "us-east-1",
"Scope": "Availability Zone",
"StartDateTime": "2016-09-28T15:22:32.000Z",
"SubscriptionId": "359809070",
"SubscriptionStatus": "Active",
"SubscriptionType": "All Upfront",
"Tenancy": "Shared"
},
"Key": "SUBSCRIPTION_ID",
"Utilization": {
"PurchasedHours": 2151,
"TotalActualHours": 2151,
"UnusedHours": 0,
"UtilizationPercentage": 100
},
"Value": "359809070"
},
{
"Attributes": {
"AccountId": "0123456789",
"AccountName": "sdasad",
"AvailabilityZone": "us-west-2a",
"CancellationDateTime": "2017-09-20T04:06:02.000Z",
"EndDateTime": "2017-09-20T04:06:02.000Z",
"InstanceType": "t2.nano",
"LeaseId": "asdasda",
"NumberOfInstances": "1",
"OfferingType": "Standard",
"Platform": "Linux/UNIX",
"Region": "us-west-2",
"Scope": "Availability Zone",
"StartDateTime": "2016-09-20T04:06:03.000Z",
"SubscriptionId": "353571154",
"SubscriptionStatus": "Active",
"SubscriptionType": "Partial Upfront"
},
"Key": "SUBSCRIPTION_ID",
"Utilization": {
"PurchasedHours": 1948,
"TotalActualHours": 0,
"UnusedHours": 1948,
"UtilizationPercentage": 0
},
"Value": "353571154"
}
],
"TimePeriod": {
"End": "2017-10-01",
"Start": "2017-07-01"
},
"Total": {
"PurchasedHours": 6307,
"TotalActualHours": 4359,
"UnusedHours": 1948,
"UtilizationPercentage": 69.11368320913270968764864436340574
}
}]
}
ec2_ri_groups_list = l['UtilizationsByTime'][0]['Groups']
message_list = [" RI Utilization Monitor: "]
for ec2_ri_group in ec2_ri_groups_list:
    ec2_ri_region = ec2_ri_group['Attributes']['Region']
    ec2_ri_numberOfInstances = ec2_ri_group['Attributes']['NumberOfInstances']
    ec2_ri_instanceType = ec2_ri_group['Attributes']['InstanceType']
    ec2_ri_platform = ec2_ri_group['Attributes']['Platform']
    # Format the utilization for output
    ec2_ri_UtilizationPercentage = "{:.2f}%".format(float(ec2_ri_group['Utilization']['UtilizationPercentage']))
    message = " On the AZ " + ec2_ri_region + " , " + ec2_ri_numberOfInstances \
              + " EC2 " + ec2_ri_instanceType + " RI Utilization is " + ec2_ri_UtilizationPercentage + " , its platform is " + ec2_ri_platform
    message_list.append(message)
message_str = "\n".join(message_list)
print(message_str)
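N. CFN error troubleshooting
1. The cron expression was wrong (resolved)
The rule is set to run every Sunday, so the day-of-week field carries a value. In that case the day-of-month field cannot be * and must be ?.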
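The rule can be checked before deploying. The sketch below is only an illustration: `check_day_fields` is a hypothetical helper that looks at the day-of-month and day-of-week fields of a six-field cron(...) expression and accepts it when exactly one of them is ?. Rejecting the case where both are ? is an assumption of this sketch; the error above only covers the case where neither is.

```python
def check_day_fields(expression):
    """Return True if exactly one of day-of-month / day-of-week is '?'."""
    text = expression.strip()
    if not (text.startswith("cron(") and text.endswith(")")):
        raise ValueError("expected an expression of the form cron(...)")
    fields = text[len("cron("):-1].split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got {}".format(len(fields)))
    day_of_month, day_of_week = fields[2], fields[4]
    return (day_of_month == "?") != (day_of_week == "?")


print(check_day_fields("cron(00 08 ? * FRI *)"))  # True
print(check_day_fields("cron(00 08 * * SUN *)"))  # False
```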
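2. The role created by CFN lacked permissions
Troubleshooting steps:
① Check whether the CFN stack ran successfully.
② Check whether all the resources defined in the CFN template were created: Role, CloudWatch Events, Lambda.
③ Check how the Lambda function ran.
The Lambda error pointed to the cause, a permission problem: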
"errorMessage": "An error occurred (AccessDeniedException) when calling the GetReservationUtilization operation: User: arn:aws-cn:sts::936669166135:assumed-role/RI-Utilization-Monitor-LambdaExecutionRole-KD1YDAO01XWR/GetReservationUtilization is not authorized to perform: ce:GetReservationUtilization on resource: arn:aws:ce:cn-northwest-1:936669166135:/GetReservationUtilization",
Fix:
Change the permissions of the role created by CFN to:
Policies:
  - PolicyName: RI_Utilization_Monitor
    PolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Action:
            - ec2:DescribeReservedInstances
            - ec2:DescribeReservedInstancesModifications
            - ec2:DescribeReservedInstancesOfferings
            - ec2:DescribeReservedInstancesListings
            - ce:GetReservationUtilization
            - sns:Publish
            - s3:Get*
            - s3:List*
          Resource: "*"
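3. !GetAtt and !Ref were mixed up in the CFN template, so the Topic ARN could not be retrieved
Fix:
① Read the error message.
② Locate the error.
③ Edit the CFN code so that topic_arn uses !Ref SNSTopic, which returns the topic's ARN (the screenshot shows only part of the change; see the v2 CFN code for the full template).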