本文档介绍如何将 LiteLLM AI Gateway 的请求日志和费用数据接入 Amazon QuickSight,构建运维监控 Dashboard,实现对 LLM 使用量、费用、性能的可视化分析。
| 数据源 | 内容 | 实时性 | 适合场景 |
|---|---|---|---|
| S3 + Athena | 每次请求的完整日志(含 messages、response) | 实时 | 请求级分析、内容审计 |
| Aurora PostgreSQL | 费用汇总表(按天/用户/团队/Tag) | 实时 | 费用统计、预算监控 |
| 对比项 | LiteLLM 内置 Dashboard | QuickSight |
|---|---|---|
| 自定义图表 | 固定视图,不可自定义 | 完全自定义,拖拽建图 |
| 多维度交叉分析 | 有限 | 支持任意维度组合 |
| 数据导出 | 不支持 | 支持 CSV/PDF 导出 |
| 权限控制 | 需要 LiteLLM admin 权限 | 独立权限体系,可分享给非技术人员 |
| 自然语言查询 | 不支持 | Amazon Q 自然语言建图 |
| 告警 | 不支持 | 支持阈值告警 |
| 历史数据 | 受数据库保留策略限制 | S3 永久保存 |
确保 litellm_config.yaml 包含以下配置:
litellm_settings:
cache: true
store_audit_logs: true
success_callback: ["s3_v2"]
s3_callback_params:
s3_bucket_name: os.environ/S3_BUCKET_NAME
s3_region_name: os.environ/AWS_DEFAULT_REGION
s3_path: litellm-logs
s3_endpoint_url: https://s3.amazonaws.comLiteLLM 将每次请求的完整日志以 JSON 格式写入 S3:
s3://<bucket>/litellm-logs/
├── 2026-05-10/
│ ├── time-01-07-22_chatcmpl-xxx.json
│ └── ...
├── 2026-05-17/
│ └── ...
每条日志包含的关键字段:
{
"id": "chatcmpl-xxx",
"model": "bedrock/us.anthropic.claude-opus-4-6-v1",
"status": "success",
"response_time": 3.07,
"response_cost": 0.0022495,
"metadata": {
"user_api_key_alias": "claudecode",
"user_api_key_auth_metadata": {
"tags": ["user:cloudcode-user1", "project:game1"]
}
},
"cost_breakdown": {
"input_cost": 0.0001595,
"output_cost": 0.00209
},
"request_tags": ["user:cloudcode-user1", "project:game1"]
}在 Athena 控制台(或 CLI)执行以下 DDL,创建外部表映射 S3 JSON 日志:
CREATE EXTERNAL TABLE IF NOT EXISTS litellm_logs (
id STRING,
litellm_call_id STRING,
trace_id STRING,
call_type STRING,
cache_hit STRING,
stream STRING,
status STRING,
custom_llm_provider STRING,
startTime DOUBLE,
endTime DOUBLE,
response_time DOUBLE,
model STRING,
metadata STRUCT<
user_api_key_alias: STRING,
user_api_key_spend: DOUBLE,
user_api_key_user_id: STRING,
user_api_key_request_route: STRING,
requester_ip_address: STRING,
user_agent: STRING,
user_api_key_auth_metadata: STRUCT<
tags: ARRAY<STRING>
>
>,
messages ARRAY<STRUCT<role: STRING, content: STRING>>,
response STRUCT<
choices: ARRAY<STRUCT<
finish_reason: STRING,
message: STRUCT<content: STRING, role: STRING>
>>,
usage: STRUCT<
completion_tokens: INT,
prompt_tokens: INT,
total_tokens: INT
>
>,
response_cost DOUBLE,
cost_breakdown STRUCT<
input_cost: DOUBLE,
output_cost: DOUBLE,
total_cost: DOUBLE
>,
request_tags ARRAY<STRING>,
model_group STRING,
api_base STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'ignore.malformed.json' = 'true',
'case.insensitive' = 'true'
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<your-bucket>/litellm-logs/'
TBLPROPERTIES ('has_encrypted_data'='false');创建视图将嵌套字段展平为独立列,方便 QuickSight 使用:
CREATE OR REPLACE VIEW litellm_logs_flat AS
SELECT
id,
status,
custom_llm_provider,
startTime,
response_time,
model,
-- 时间字段
from_unixtime(CAST(startTime AS BIGINT)) AS request_time,
date(from_unixtime(CAST(startTime AS BIGINT))) AS request_date,
hour(from_unixtime(CAST(startTime AS BIGINT))) AS request_hour,
day_of_week(from_unixtime(CAST(startTime AS BIGINT))) AS request_day_of_week,
-- 用户维度
metadata.user_api_key_alias,
metadata.requester_ip_address AS metadata_ip,
metadata.user_agent AS metadata_user_agent,
-- 从 tags 提取用户和项目
regexp_extract(
array_join(filter(metadata.user_api_key_auth_metadata.tags, t -> t LIKE 'user:%'), ','),
'user:(.*)', 1
) AS tag_user,
regexp_extract(
array_join(filter(metadata.user_api_key_auth_metadata.tags, t -> t LIKE 'project:%'), ','),
'project:(.*)', 1
) AS tag_project,
-- 响应
response.choices[1].finish_reason AS finish_reason,
response.choices[1].message.content AS response_content,
response.usage.completion_tokens AS usage_completion_tokens,
response.usage.prompt_tokens AS usage_prompt_tokens,
response.usage.total_tokens AS usage_total_tokens,
-- 费用
response_cost,
cost_breakdown.input_cost,
cost_breakdown.output_cost,
cost_breakdown.total_cost,
request_tags,
model_group,
api_base
FROM litellm_logs| 环节 | 延迟 |
|---|---|
| LiteLLM → S3 | 实时(秒级) |
| S3 → Athena | 实时(Athena 直接读 S3,无缓存) |
| Athena → QuickSight (Direct Query) | 实时 |
| Athena → QuickSight (SPICE) | 按刷新计划(最快每小时) |
LiteLLM 自动在 Aurora 中维护以下汇总表:
| 表名 | 说明 |
|---|---|
LiteLLM_SpendLogs |
每次请求的花费明细 |
LiteLLM_DailyTagSpend |
按 Tag(项目/环境)每日花费汇总 |
LiteLLM_DailyUserSpend |
按用户每日花费汇总 |
LiteLLM_DailyTeamSpend |
按团队每日花费汇总 |
LiteLLM_VerificationToken |
API Key 信息(别名、花费、预算) |
LiteLLM_AuditLog |
管理操作审计日志 |
| 维度 | 字段 | 用途 |
|---|---|---|
| 用户维度 | user_api_key_alias |
哪个 API Key / 谁在用 |
| 项目维度 | tag(来自 request tags) |
哪个项目 |
| 环境维度 | tag(如 env:prod) |
哪个环境 |
| 模型维度 | model_group |
用了哪个模型 |
| 时间维度 | date |
按天聚合 |
QuickSight 使用 IAM Role
aws-quicksight-service-role-v0,需要附加以下策略:
| 策略 | 用途 |
|---|---|
AWSQuickSightS3Policy |
访问 S3 日志桶 |
AWSQuicksightAthenaAccess |
访问 Athena 查询 |
AWSQuickSightVPCPolicy |
VPC 网络访问(连 Aurora) |
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:ListAllMyBuckets",
"Resource": "arn:aws:s3:::*"
},
{
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:GetBucketLocation", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::<your-bucket>",
"arn:aws:s3:::<your-bucket>/*"
]
}
]
}{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateNetworkInterface",
"ec2:ModifyNetworkInterfaceAttribute",
"ec2:DeleteNetworkInterface",
"ec2:DescribeSubnets",
"ec2:DescribeSecurityGroups",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeVpcs"
],
"Resource": "*"
}
]
}aws quicksight create-vpc-connection \
--aws-account-id <ACCOUNT_ID> \
--vpc-connection-id litellm-vpc \
--name litellm-vpc \
--subnet-ids subnet-xxx subnet-yyy \
--security-group-ids sg-xxx \
--role-arn arn:aws:iam::<ACCOUNT_ID>:role/service-role/aws-quicksight-service-role-v0 \
--region us-east-1重要:Aurora 安全组需要添加入站规则,允许 QuickSight VPC ENI 访问端口 5432:
aws ec2 authorize-security-group-ingress \
--group-id <aurora-sg-id> \
--protocol tcp --port 5432 \
--source-group <aurora-sg-id>aws quicksight create-data-source \
--aws-account-id <ACCOUNT_ID> \
--data-source-id litellm-athena \
--name "litellm-athena" \
--type ATHENA \
--data-source-parameters '{"AthenaParameters":{"WorkGroup":"primary"}}' \
--region us-east-1aws quicksight create-data-source \
--aws-account-id <ACCOUNT_ID> \
--data-source-id litellm-aurora \
--name "litellm-aurora" \
--type POSTGRESQL \
--data-source-parameters '{
"PostgreSqlParameters": {
"Host": "<aurora-cluster-endpoint>",
"Port": 5432,
"Database": "litellm"
}
}' \
--credentials '{
"SecretArn": "arn:aws:secretsmanager:us-east-1:<ACCOUNT_ID>:secret:<secret-name>"
}' \
--vpc-connection-properties '{
"VpcConnectionArn": "arn:aws:quicksight:us-east-1:<ACCOUNT_ID>:vpcConnection/litellm-vpc"
}' \
--ssl-properties '{"DisableSsl": false}' \
--region us-east-1# 创建 Dataset(以 DailyTagSpend 为例)
aws quicksight create-data-set \
--aws-account-id <ACCOUNT_ID> \
--data-set-id litellm-daily-tag-spend \
--name "LiteLLM Daily Tag Spend" \
--import-mode SPICE \
--physical-table-map '{
"table": {
"RelationalTable": {
"DataSourceArn": "arn:aws:quicksight:us-east-1:<ACCOUNT_ID>:datasource/litellm-aurora",
"Schema": "public",
"Name": "LiteLLM_DailyTagSpend",
"InputColumns": [
{"Name": "tag", "Type": "STRING"},
{"Name": "date", "Type": "STRING"},
{"Name": "model", "Type": "STRING"},
{"Name": "spend", "Type": "DECIMAL"},
{"Name": "api_requests", "Type": "INTEGER"},
{"Name": "successful_requests", "Type": "INTEGER"},
{"Name": "failed_requests", "Type": "INTEGER"}
]
}
}
}' \
--permissions '[{
"Principal": "arn:aws:quicksight:us-east-1:<ACCOUNT_ID>:user/default/<your-user>",
"Actions": ["quicksight:DescribeDataSet","quicksight:PassDataSet","quicksight:UpdateDataSet","quicksight:DeleteDataSet","quicksight:DescribeIngestion","quicksight:ListIngestions","quicksight:CreateIngestion"]
}]' \
--region us-east-1
# 设置每日刷新(UTC 00:00 = 北京时间 08:00)
aws quicksight create-refresh-schedule \
--aws-account-id <ACCOUNT_ID> \
--data-set-id litellm-daily-tag-spend \
--schedule '{
"ScheduleId": "daily-refresh",
"ScheduleFrequency": {
"Interval": "DAILY",
"TimeOfTheDay": "00:00"
},
"RefreshType": "FULL_REFRESH"
}' \
--region us-east-1| 措施 | 说明 |
|---|---|
| ✅ Secrets Manager | 数据库密码通过 Secrets Manager 管理,不硬编码 |
| ✅ 密码自动轮换 | Secrets Manager 支持自动轮换,QuickSight 自动获取新密码 |
| ✅ SSL 加密传输 | QuickSight 到 Aurora 连接启用 SSL |
| ✅ KMS 加密存储 | Aurora 和 S3 数据使用 KMS 加密 |
| 措施 | 说明 |
|---|---|
| ✅ VPC 私有连接 | QuickSight 通过 VPC Connection 访问 Aurora,不走公网 |
| ✅ 安全组隔离 | Aurora 安全组仅允许 QuickSight ENI 和 LiteLLM ECS 访问 |
| ✅ 无公网暴露 | Aurora 不开启公网访问 |
| 措施 | 说明 |
|---|---|
| ✅ IAM 最小权限 | QuickSight Role 仅授予必要的 S3/Athena/EC2 权限 |
| ✅ QuickSight 用户权限 | 通过 QuickSight 内部权限控制谁能看哪些 Dataset |
| ✅ 行级安全 (RLS) | 可配置 RLS 限制不同用户只能看自己项目的数据 |
| ✅ 审计日志 | LiteLLM 的 store_audit_logs: true 记录所有管理操作 |
| 措施 | 说明 |
|---|---|
| ✅ S3 桶策略 | 限制只有 QuickSight Role 和 LiteLLM 可以访问 |
| ✅ Athena 查询结果加密 | 查询结果存储在指定 S3 路径,可配置 KMS 加密 |
| ✅ SPICE 加密 | QuickSight SPICE 数据静态加密 |
| 图表 | 数据源 | 用途 |
|---|---|---|
| 每日费用趋势 | DailyTagSpend | 费用是否异常增长 |
| 按项目费用占比 | DailyTagSpend | 钱花在哪个项目 |
| 按用户费用排行 | DailyUserSpend | 谁花钱最多 |
| 请求成功率 | DailyTagSpend | 服务稳定性 |
| 平均响应延迟 | S3 Athena 视图 | 性能监控 |
| Token 消耗趋势 | DailyTagSpend | 容量规划 |
| API Key 预算使用率 | VerificationToken | 预算告警 |
| 模型使用分布 | SpendLogs | 模型选择优化 |
tag 字段下拉筛选user_api_key_alias 下拉筛选model_group 下拉筛选QuickSight Enterprise Edition 内置 Amazon Q,支持自然语言创建图表:
"Show total spend by tag as pie chart"
"Line chart of api_requests by date"
"Top 5 api_key by spend"
A: 需要在 Manage QuickSight → Security &
permissions 中启用 Athena 访问,并确保 IAM Role 附加了
AWSQuicksightAthenaAccess 策略。
A: QuickSight 直连 S3 只能解析一层嵌套。解决方案是通过 Athena 建表 + 视图展平嵌套字段,QuickSight 连接 Athena 而非直连 S3。
A: 使用 CLI 创建数据源可以指定 VPC Connection,绕过 UI 限制。
A: CLI 创建的数据源需要手动授权给用户:
aws quicksight update-data-source-permissions \
--data-source-id <id> \
--grant-permissions '[{"Principal":"<user-arn>","Actions":[...]}]'A: SPICE 是快照模式,需要刷新。可以设置每日自动刷新,或手动点击 “Refresh now”。
| 服务 | 计费方式 | 预估费用 |
|---|---|---|
| QuickSight Enterprise | $24/月/Author, $5/月/Reader | 按用户数 |
| Athena | $5/TB 扫描量 | 日志量小时几乎免费 |
| S3 存储 | $0.023/GB/月 | 日志量小时 < $1/月 |
| Aurora | 按实例规格 | 已有,无额外费用 |
| SPICE | $0.38/GB/月 | 汇总数据量小,< $1/月 |