news 2026/4/15 10:48:51

Iceberg REST Catalog + OSS field notes: the Polaris x-amz-content-sha256 error and Nessie configuration


张小明

Front-end Development Engineer


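I am running Iceberg performance tests for query engines. The work breaks down into environment preparation, test dataset preparation, and running the tests.

This post covers only environment preparation and records the process, in three parts:

Catalog: stay as close to production as possible, which means a mainstream catalog type. The tests run in mainland China, so Glue, Snowflake Catalog, and similar services are unavailable, and I had to deploy a catalog service myself.

Storage: the test machines are in mainland China, so overseas object storage (S3, Azure, GCS) is out. Only domestic services (OSS, COS, OBS) are usable, and because the catalog server may not support them natively, access has to go through the S3 protocol.

Query Engine: the chosen catalog type must be supported by every query engine under test.

Filtering by these conditions gives the following environment: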

|                | Type         | System          |
|----------------|--------------|-----------------|
| catalog type   | REST catalog | Polaris, Nessie |
| storage scheme | S3           | OSS             |
| query engine   |              | Doris, Trino    |

Pick one of the following two integrations:

Doris/Trino + Polaris + OSS

Doris/Trino + Nessie + OSS

Polaris

Conclusion first: the latest Polaris release (1.2.0) does not run against OSS over the S3 protocol. It fails with this error:

2025-12-12 17:13:45,460 INFO [org.apa.pol.ser.exc.IcebergExceptionMapper] [4a2120d6-8520-441d-b502-a090f890b03d_0000000000000000030,POLARIS] [,,,] (executor-thread-1) Handling runtimeException aws-chunked encoding is not supported with the specified x-amz-content-sha256 value. (Service: S3, Status Code: 400, Request ID: 693C4D496D7461373771398C) (SDK Attempt Count: 1)

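Two documents are relevant:

0002-00000427

Using the AWS SDK to access OSS (使用AWS SDK访问OSS)

The gist is that OSS does not accept this x-amz-content-sha256 header as sent (aws-chunked encoding), and Polaris exposes no configuration parameter to control it.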
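When debugging this kind of rejection it helps to know whether a captured request actually used the aws-chunked form. The sketch below is a hypothetical helper (the name `uses_aws_chunked` is not from Polaris or any SDK). It relies on the AWS SigV4 convention that streaming uploads send an `x-amz-content-sha256` value starting with `STREAMING-` and list `aws-chunked` in `Content-Encoding`. Which of these values OSS rejects is an assumption to verify against the OSS documentation.

```python
from typing import Mapping


def uses_aws_chunked(headers: Mapping[str, str]) -> bool:
    """Return True if the request headers indicate an aws-chunked (streaming) upload."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    # SigV4 streaming uploads send a STREAMING-* marker instead of a
    # hex payload hash or UNSIGNED-PAYLOAD.
    sha_value = normalized.get("x-amz-content-sha256", "")
    if sha_value.upper().startswith("STREAMING-"):
        return True

    # They also list aws-chunked among the content encodings.
    raw_encodings = normalized.get("content-encoding", "")
    encodings = [part.strip().lower() for part in raw_encodings.split(",")]
    return "aws-chunked" in encodings


if __name__ == "__main__":
    captured = {"X-Amz-Content-Sha256": "STREAMING-UNSIGNED-PAYLOAD-TRAILER"}
    print(uses_aws_chunked(captured))
```

Feeding it the headers from a proxy capture or SDK wire log shows quickly whether the client is sending the streaming form that the error message complains about.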

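Polaris 1.2.0, the latest release, adds a stsUnavailable switch so that Polaris can work with any object store that speaks the S3 protocol. Before 1.2.0, authentication had to go through standard S3 STS, so older Polaris versions cannot work with OSS at all.

A small detour: the release notes misspelled the stsUnavailable parameter. I copied the misspelled name from the documentation, so the setting never took effect and Polaris kept using STS authentication. It took some time before the logs showed that the parameter had not been set.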
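A misspelled JSON field like this is easy to miss because the request looks valid. One way to catch it before sending is to compare the keys of `storageConfigInfo` against the field names you expect. The sketch below is a hypothetical pre-flight check, not part of Polaris. `KNOWN_FIELDS` contains only the fields used in this post's payload, not the full Polaris schema, and the misspelling in the usage example is invented for illustration.

```python
import difflib
from typing import Any, Dict, Optional

# Field names taken from the storageConfigInfo payload used in this post.
# This is an assumption, not the complete Polaris schema.
KNOWN_FIELDS = {
    "storageType",
    "allowedLocations",
    "endpoint",
    "endpointInternal",
    "region",
    "pathStyleAccess",
    "stsUnavailable",
}


def find_unknown_fields(storage_config: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map each unrecognized key to its closest known field name, or None."""
    unknown: Dict[str, Optional[str]] = {}
    for key in storage_config:
        if key in KNOWN_FIELDS:
            continue
        # Suggest the most similar known name, if any is reasonably close.
        matches = difflib.get_close_matches(key, sorted(KNOWN_FIELDS), n=1)
        unknown[key] = matches[0] if matches else None
    return unknown


if __name__ == "__main__":
    # "stsUnavailabel" is a made-up typo used only to demonstrate the check.
    payload = {"storageType": "S3", "stsUnavailabel": True}
    for bad_key, suggestion in find_unknown_fields(payload).items():
        print(f"unknown field {bad_key!r}, did you mean {suggestion!r}?")
```

Running a check like this in the setup script would have flagged the copied field name immediately instead of leaving it to surface in the server logs.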

Polaris 1.2.0 release note


Naturally, I also submitted a PR to fix it:

https://github.com/apache/polaris/pull/3262

Here is the Docker Compose YAML for Polaris + OSS, adapted from the quickstart and the Ceph example:

services:
  polaris:
    image: apache/polaris:latest
    ports:
      # API port
      - "8181:8181"
      # Management port (metrics and health checks)
      - "8182:8182"
      # Optional, allows attaching a debugger to the Polaris JVM
      - "5005:5005"
    environment:
      JAVA_DEBUG: true
      JAVA_DEBUG_PORT: "*:5005"
      AWS_REGION: cn-beijing
      AWS_ACCESS_KEY_ID: xxxx
      AWS_SECRET_ACCESS_KEY: xxxx
      AWS_ENDPOINT: http://oss-cn-beijing-internal.aliyuncs.com
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 10
      start_period: 10s

  polaris-setup:
    image: alpine/curl
    depends_on:
      polaris:
        condition: service_healthy
    environment:
      - CLIENT_ID=${ROOT_CLIENT_ID:-root}
      - CLIENT_SECRET=${ROOT_CLIENT_SECRET:-s3cr3t}
      - CATALOG_NAME=${CATALOG_NAME:-quickstart_catalog}
      - REALM=${POLARIS_REALM:-POLARIS}
      - BASE_LOCATION=${BASE_LOCATION:-s3://xxx/polaris_warehouse}
      - S3_ENDPOINT=${S3_ENDPOINT:-http://oss-cn-beijing-internal.aliyuncs.com}
    entrypoint: /bin/sh
    command:
      - -c
      - |
        set -ex
        sleep 10
        # Use the Aliyun mirror so apk works from mainland China
        sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
        apk add --no-cache jq

        echo "Obtaining root access token..."
        TOKEN_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/catalog/v1/oauth/tokens \
          -H 'Content-Type: application/x-www-form-urlencoded' \
          -d "grant_type=client_credentials&client_id=$${CLIENT_ID}&client_secret=$${CLIENT_SECRET}&scope=PRINCIPAL_ROLE:ALL")
        TOKEN=$$(echo $$TOKEN_RESPONSE | jq -r '.access_token')
        echo "Obtained access token"

        echo "Creating catalog '$$CATALOG_NAME' in realm $$REALM..."
        # stsUnavailable: true skips STS so that a non-AWS S3 endpoint (OSS) can be used
        PAYLOAD='{
          "catalog": {
            "name": "'$$CATALOG_NAME'",
            "type": "INTERNAL",
            "readOnly": false,
            "properties": {
              "default-base-location": "'$$BASE_LOCATION'"
            },
            "storageConfigInfo": {
              "storageType": "S3",
              "allowedLocations": ["'$$BASE_LOCATION'", "'$$BASE_LOCATION'/"],
              "endpoint": "'$$S3_ENDPOINT'",
              "region": "cn-beijing",
              "endpointInternal": "'$$S3_ENDPOINT'",
              "pathStyleAccess": false,
              "stsUnavailable": true
            }
          }
        }'
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Accept: application/json" \
          -H "Content-Type: application/json" \
          -H "Polaris-Realm: $$REALM" \
          -d "$$PAYLOAD" > /dev/null
        echo "✅ Catalog created"
        echo ""

        echo "Creating principal 'quickstart_user'..."
        PRINCIPAL_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/management/v1/principals \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principal": {"name": "quickstart_user", "properties": {}}}')
        USER_CLIENT_ID=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientId')
        USER_CLIENT_SECRET=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientSecret')
        echo "✅ Principal created with clientId: $$USER_CLIENT_ID"

        echo "Creating principal role 'quickstart_user_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role", "properties": {}}}' > /dev/null
        echo "✅ Principal role created"

        echo "Creating catalog role 'quickstart_catalog_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role", "properties": {}}}' > /dev/null
        echo "✅ Catalog role created"

        echo "Assigning principal role to principal..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principals/quickstart_user/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role"}}' > /dev/null
        echo "✅ Principal role assigned"

        echo "Assigning catalog role to principal role..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principal-roles/quickstart_user_role/catalog-roles/$$CATALOG_NAME \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role"}}' > /dev/null
        echo "✅ Catalog role assigned"

        echo "Granting CATALOG_MANAGE_CONTENT privilege..."
        curl -s -X PUT http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles/quickstart_catalog_role/grants \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}' > /dev/null
        echo "✅ Privileges granted"
        echo ""

        echo "=========================================="
        echo "🎉 Polaris Quickstart Setup Complete!"
        echo "=========================================="
        echo ""
        echo "Catalog: $$CATALOG_NAME"
        echo " Storage: S3 protocol (OSS)"
        echo " Location: $$BASE_LOCATION"
        echo ""
        echo "Root credentials:"
        echo " Client ID: $$CLIENT_ID"
        echo " Client Secret: $$CLIENT_SECRET"
        echo ""
        echo "User credentials:"
        echo " Client ID: $$USER_CLIENT_ID"
        echo " Client Secret: $$USER_CLIENT_SECRET"
        echo ""
        echo "Polaris main APIs:"
        echo " - Iceberg REST: http://localhost:8181/api/catalog/v1"
        echo " - Management: http://localhost:8181/api/management/v1"
        echo " - Generic Tables: http://localhost:8181/api/polaris/v1"
        echo ""
        echo "Polaris admin APIs:"
        echo " - Health check: http://localhost:8182/q/health"
        echo " - Metrics: http://localhost:8182/q/metrics"
        echo ""
        echo "To get started with Spark:"
        echo " spark-sql \\"
        echo " --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \\"
        echo " --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\"
        echo " --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \\"
        echo " --conf spark.sql.catalog.polaris.type=rest \\"
        echo " --conf spark.sql.catalog.polaris.warehouse=$$CATALOG_NAME \\"
        echo " --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \\"
        echo " --conf spark.sql.catalog.polaris.credential=$$USER_CLIENT_ID:$$USER_CLIENT_SECRET \\"
        echo " --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \\"
        echo " --conf spark.sql.catalog.polaris.s3.endpoint=$$S3_ENDPOINT \\"
        echo " --conf spark.sql.catalog.polaris.s3.path-style-access=false \\"
        echo " --conf spark.sql.catalog.polaris.s3.access-key-id=xxxx \\"
        echo " --conf spark.sql.catalog.polaris.s3.secret-access-key=xxxx \\"
        echo " --conf spark.sql.catalog.polaris.client.region=cn-beijing \\"
        echo " --conf spark.sql.defaultCatalog=polaris"
        echo ""
        echo "To get started with REST API:"
        echo " # Get a token"
        echo " export TOKEN=\$$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \\"
        echo " -d 'grant_type=client_credentials' \\"
        echo " -d 'client_id=$$USER_CLIENT_ID' \\"
        echo " -d 'client_secret=$$USER_CLIENT_SECRET' \\"
        echo " -d 'scope=PRINCIPAL_ROLE:ALL' \\"
        echo " | jq -r '.access_token')"
        echo ""
        echo " # Create a namespace"
        echo " curl -X POST http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo " -H \"Authorization: Bearer \$$TOKEN\" \\"
        echo " -H 'Content-Type: application/json' \\"
        echo " -d '{\"namespace\": [\"my_namespace\"], \"properties\": {}}'"
        echo ""
        echo " # List namespaces"
        echo " curl -X GET http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo " -H \"Authorization: Bearer \$$TOKEN\""
        echo ""
        echo "=========================================="

Nessie

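Conclusion first here too: it runs.

First, let's look at how the sha256 header problem is handled.

Nessie has a switch that controls this. In the compose file below it is set with nessie.catalog.service.s3.default-options.chunked-encoding-enabled=false.

With that, the problem is solved.

Here is the Docker Compose YAML for Nessie + OSS: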

version: '3'

services:
  nessie:
    image: ghcr.io/projectnessie/nessie
    container_name: nessie
    ports:
      - "19120:19120"
    environment:
      - nessie.catalog.default-warehouse=warehouse
      - nessie.catalog.warehouses.warehouse.location=s3://mybucket/my-lakehouse/
      - nessie.catalog.warehouses.zgx.location=s3://xxxxx/iceberg_warehouse/
      - nessie.catalog.service.s3.default-options.endpoint=http://oss-cn-beijing-internal.aliyuncs.com
      - nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie.catalog.secrets.access-key
      - nessie.catalog.service.s3.default-options.path-style-access=false
      - nessie.catalog.service.s3.default-options.chunked-encoding-enabled=false
      - nessie.catalog.service.s3.default-options.auth-type=STATIC
      - nessie.catalog.secrets.access-key.name=xxx
      - nessie.catalog.secrets.access-key.secret=xxx
      - nessie.catalog.service.s3.default-options.region=cn-beijing
      - nessie.server.authentication.enabled=false
      - nessie.catalog.service.s3.default-options.request-signing-enabled=false
    networks:
      nessie-rest:

networks:
  nessie-rest:

Testing Nessie connectivity from Trino

Reference:

https://projectnessie.org/nessie-latest/trino/?h=client+temp#starter-configuration

Fetch the corresponding configuration:

NESSIE_BASE_URL="http://127.0.0.1:19120/"

curl "${NESSIE_BASE_URL}/iceberg-ext/v1/client-template/trino?format=static"

Add the s3.aws-access-key and s3.aws-secret-key properties to that configuration.
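The two steps above (fetch the template, then append the credentials) can be scripted. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes the `format=static` response is plain `key=value` properties text. The `connector.name=iceberg` line in the usage example is a stand-in, not real Nessie output.

```python
import urllib.request


def trino_template_url(base_url: str) -> str:
    """Build the client-template URL, tolerating a trailing slash on the base URL."""
    return base_url.rstrip("/") + "/iceberg-ext/v1/client-template/trino?format=static"


def fetch_trino_template(base_url: str) -> str:
    """Download the Trino catalog template from a running Nessie server."""
    with urllib.request.urlopen(trino_template_url(base_url), timeout=10) as response:
        return response.read().decode("utf-8")


def with_s3_credentials(template: str, access_key: str, secret_key: str) -> str:
    """Append the two S3 credential properties that the template leaves out."""
    lines = template.rstrip("\n").splitlines()
    lines.append(f"s3.aws-access-key={access_key}")
    lines.append(f"s3.aws-secret-key={secret_key}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stand-in template text; replace with fetch_trino_template(...) against a live server.
    sample = "connector.name=iceberg\n"
    print(with_s3_credentials(sample, "xxx", "xxx"), end="")
```

Writing the result to the Trino catalog directory (for example as `nessie.properties`, matching the `--catalog nessie` name used in this post) is left out, since the path depends on how Trino is deployed.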

Trino can then read the Iceberg tables normally:

[trino@dec7c1a34cb6 /]$ trino --catalog nessie
trino> use zgx;
USE
trino:zgx> show tables;
        Table
----------------------
 unpartitioned_table
 unpartitioned_table1
 unpartitioned_table2
 unpartitioned_table3
(4 rows)

Query 20251214_145124_00043_v9qpy, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.24 [4 rows, 417B] [16 rows/s, 1.72KiB/s]

trino:zgx> select * from unpartitioned_table;
 col1 | col2 |        col3         |  col4  |    col5    |    col6    | col7  |    col8    |            col9
------+------+---------------------+--------+------------+------------+-------+------------+----------------------------
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
(5 rows)

Query 20251214_145128_00044_v9qpy, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
0.26 [5 rows, 27.5KiB] [19 rows/s, 107KiB/s]

trino:zgx>

DML operations from Trino, however, did not go as smoothly:

trino:zgx> CREATE TABLE user_profiles (
        ->     id BIGINT,
        ->     name VARCHAR,
        ->     registration_date DATE
        -> )
        -> WITH (
        ->     format = 'PARQUET'
        -> );
trino:zgx> insert into debug values(1,2);

Query 20251214_152721_00027_jtedg, FAILED, 1 node
Splits: 66 total, 1 done (1.52%)
0.34 [1 rows, 0B] [2 rows/s, 0B/s]

Query 20251214_152721_00027_jtedg failed: Error committing write parquet to Hive

trino:zgx>

Caused by: software.amazon.awssdk.services.s3.model.S3Exception: A header you provided implies functionality that is not implemented. (Service: S3, Status Code: 400, Request ID: 693EEB94153DBB3432C97FC5) (SDK Attempt Count: 1)


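This suggests that Trino's compatibility with Chinese cloud object storage is not as good. With Doris, creating tables and writing data both worked normally in my tests.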
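When comparing failures across engines, it can be handy to pull the status code and request ID out of the AWS SDK exception text, for example to look a request up on the storage side. The helper below is a hypothetical sketch. Its pattern is derived only from the two error messages quoted in this post, so other SDK versions may format the details differently.

```python
import re
from typing import Dict, Optional

# Pattern based on the "(Service: ..., Status Code: ..., Request ID: ...)"
# suffix seen in the error messages quoted in this post.
_S3_ERROR = re.compile(
    r"\(Service: (?P<service>[^,]+), "
    r"Status Code: (?P<status>\d+), "
    r"Request ID: (?P<request_id>[^)]+)\)"
)


def parse_s3_error(message: str) -> Optional[Dict[str, str]]:
    """Extract service, status, request ID, and the leading reason text."""
    match = _S3_ERROR.search(message)
    if match is None:
        return None
    details = match.groupdict()
    # Everything before the parenthesized details is the human-readable reason.
    details["reason"] = message[: match.start()].strip()
    return details


if __name__ == "__main__":
    trino_error = (
        "A header you provided implies functionality that is not implemented. "
        "(Service: S3, Status Code: 400, Request ID: 693EEB94153DBB3432C97FC5) "
        "(SDK Attempt Count: 1)"
    )
    print(parse_s3_error(trino_error))
```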

Doris catalog creation statement:

CREATE CATALOG `nessie` PROPERTIES (
