Claude Official SDK

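The examples below use Python. The official SDKs for other languages such as Java and Go are configured the same way: only api_key and base_url need to change.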

Reference: the official documentation at docs.anthropic.com

Chat + Thinking Output (extended thinking)

python

from anthropic import Anthropic

base_url = 'https://www.dianlitoken.com/'          # API endpoint
api_key = 'sk-xxxxxx'                                  # your token

client = Anthropic(api_key=api_key, base_url=base_url)


def chat_with_claude():
    try:
        message = client.messages.create(
            model='claude-opus-4-1-20250805',
            max_tokens=5000,
            temperature=1,            # with thinking enabled, temperature must be 1 or omitted
            thinking={
                'type': 'enabled',
                'budget_tokens': 2000   # must be >= 1024 and < max_tokens
            },
            messages=[
                {'role': 'user', 'content': 'Who are you?'}
            ]
        )

        print("Claude's reply:")
        for block in message.content:
            if block.type == 'thinking':
                print(f'\n[Thinking]: {block.thinking}')
            elif block.type == 'text':
                print(f'\n[Reply]: {block.text}')

        return message

    except Exception as e:
        print(f'Error: {e}')
        return None


if __name__ == '__main__':
    print(f'Using API endpoint: {base_url}')
    print('\nNon-streaming chat test:')
    chat_with_claude()
    print('\n=== Test complete ===')
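
The loop in chat_with_claude() prints each block as it arrives. When the caller needs the thinking output and the final reply as separate values, a small helper along these lines might be useful. This is only a sketch: split_blocks is a hypothetical name, and the SimpleNamespace objects are stand-ins for the SDK's content blocks so the snippet runs without a network call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Collect thinking and text blocks into two separate strings."""
    thinking_parts = []
    text_parts = []
    for block in content:
        if block.type == 'thinking':
            thinking_parts.append(block.thinking)
        elif block.type == 'text':
            text_parts.append(block.text)
        # Any other block type is skipped in this sketch.
    return '\n'.join(thinking_parts), '\n'.join(text_parts)


# Stand-in objects for illustration only; a real call would pass message.content.
demo_content = [
    SimpleNamespace(type='thinking', thinking='The user asks who I am.'),
    SimpleNamespace(type='text', text='I am Claude.'),
]
thinking, reply = split_blocks(demo_content)
print(f'[Thinking]: {thinking}')
print(f'[Reply]: {reply}')
```

With a real response, the call would presumably be split_blocks(message.content).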

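Thinking mode constraints

temperature must be 1 or omitted (other values are rejected)
budget_tokens must be ≥ 1024 and < max_tokens
The thinking output is returned in the response's content array, as blocks with type == 'thinking'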
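
These constraints can be checked locally before a request is sent, which avoids a round trip that would only be rejected. The following is a minimal sketch, not part of the SDK: check_thinking_params is a hypothetical helper, and it treats a budget equal to max_tokens as invalid on the assumption that the response also needs room for the final answer.

```python
def check_thinking_params(max_tokens, budget_tokens, temperature=None):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if temperature is not None and temperature != 1:
        problems.append('temperature must be 1 or omitted')
    if budget_tokens < 1024:
        problems.append('budget_tokens must be at least 1024')
    if budget_tokens >= max_tokens:
        problems.append('budget_tokens must be less than max_tokens')
    return problems


# The values used in the chat example pass all three checks.
print(check_thinking_params(max_tokens=5000, budget_tokens=2000, temperature=1))
# A budget below the minimum is reported.
print(check_thinking_params(max_tokens=5000, budget_tokens=512))
```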

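Prompt Caching (lower cost)

When sending a long system prompt or a large document, enabling caching can significantly reduce the cost of subsequent calls, because cached input is billed at a fraction of the normal input price.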
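
For the long system prompt case, the cache marker goes on a text block inside the system parameter rather than inside a user message. The sketch below only builds the keyword arguments, so it runs without a network call. build_cached_system_request is a hypothetical helper name, and the model and max_tokens defaults are illustrative assumptions.

```python
def build_cached_system_request(system_prompt, question,
                                model='claude-sonnet-4-20250514',
                                max_tokens=256):
    """Build keyword arguments for client.messages.create() with a cached system prompt."""
    return {
        'model': model,
        'max_tokens': max_tokens,
        # Passing system as a list of text blocks allows cache_control on a block.
        'system': [
            {
                'type': 'text',
                'text': system_prompt,
                'cache_control': {'type': 'ephemeral'},
            }
        ],
        'messages': [{'role': 'user', 'content': question}],
    }


# '...' stands for your own long system prompt.
request = build_cached_system_request('...', 'Summarize the rules in one sentence.')
print(sorted(request))
```

With a configured client, the request would presumably be sent as client.messages.create(**request).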

python

import anthropic

client = anthropic.Anthropic(
    api_key='sk-xxxxxx',
    base_url='https://www.dianlitoken.com/',
)

long_document = '''...'''  # your long document (must be at least 1024 tokens to be cached)

# First call: creates the cache
response1 = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=256,
    messages=[{
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': long_document,
                'cache_control': {'type': 'ephemeral'}     # marks this block for caching
            },
            {
                'type': 'text',
                'text': 'Summarize this document in 3 sentences'
            }
        ]
    }]
)
print(f'Cache creation tokens: {response1.usage.cache_creation_input_tokens}')
print(f'Cache read tokens: {response1.usage.cache_read_input_tokens}')

# Second call: same document + different question → cache hit
response2 = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=256,
    messages=[{
        'role': 'user',
        'content': [
            {
                'type': 'text',
                'text': long_document,
                'cache_control': {'type': 'ephemeral'}
            },
            {
                'type': 'text',
                'text': 'What is the Python GIL? Answer based on the document'
            }
        ]
    }]
)
print(f'Cache read tokens: {response2.usage.cache_read_input_tokens}')   # should be > 0
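
The two usage fields printed above can be folded into a single status, which may be handy for logging. This is a sketch under assumptions: cache_status is a hypothetical helper, the SimpleNamespace objects stand in for response.usage, and missing or None fields are treated as 0.

```python
from types import SimpleNamespace


def cache_status(usage):
    """Classify a response's usage as 'hit', 'created', or 'not cached'."""
    created = getattr(usage, 'cache_creation_input_tokens', 0) or 0
    read = getattr(usage, 'cache_read_input_tokens', 0) or 0
    if read > 0:
        return 'hit'
    if created > 0:
        return 'created'
    return 'not cached'


# Stand-in usage objects with made-up token counts, for illustration only.
first_call = SimpleNamespace(cache_creation_input_tokens=1500, cache_read_input_tokens=0)
second_call = SimpleNamespace(cache_creation_input_tokens=0, cache_read_input_tokens=1500)
print(cache_status(first_call))
print(cache_status(second_call))
```

A 'not cached' result on the first call usually suggests the marked content was below the minimum cacheable length.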

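Cache usage rules

The cached content must be at least 1024 tokens for caching to take effect (the minimum varies by model)
The cache has a 5-minute TTL, which is refreshed on each hit
Cache reads are billed at 10% of the normal input price

See the official Anthropic prompt caching documentation for details.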
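
To get a rough feel for the savings, the 10% read price can be plugged into a back-of-the-envelope estimate. The sketch below is illustrative only: estimate_input_cost is a hypothetical helper, the base price of 3.0 per million tokens is a made-up figure, and the 1.25 write multiplier is an assumption that does not come from this page. It also assumes every call after the first lands within the TTL, and it counts only the cached portion of the input.

```python
def estimate_input_cost(cached_tokens, calls, base_price_per_mtok,
                        write_multiplier=1.25, read_multiplier=0.10):
    """Compare input cost for the cached prefix with and without caching."""
    per_token = base_price_per_mtok / 1_000_000
    without_cache = cached_tokens * calls * per_token
    # The first call writes the cache; the remaining calls read it.
    with_cache = cached_tokens * per_token * (
        write_multiplier + read_multiplier * (calls - 1)
    )
    return without_cache, with_cache


plain, cached = estimate_input_cost(cached_tokens=10_000, calls=10,
                                    base_price_per_mtok=3.0)
print(f'Without caching: {plain:.4f}')
print(f'With caching:    {cached:.4f}')
```

Under these assumed numbers, ten calls over a 10,000-token document cost about 0.30 without caching and about 0.0645 with it.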
