integration · 12 min read

API Integration Basics

Using the OpenAI and Anthropic APIs with effective prompt engineering

Introduction

Integrating AI into your application takes more than a single API call. You need to understand API parameters, token management, error handling, rate limiting, and cost optimization.

This lesson focuses on the OpenAI API (GPT-4, GPT-3.5) and the Anthropic API (Claude), two of the most widely used providers for production use cases.

API Basics

Authentication

Both providers authenticate requests with an API key.

OpenAI:

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

Anthropic:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY
});
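
Both clients above read their key from process.env. Checking the variables once at startup gives a clear error message before any request is made. The following is a minimal sketch; requireEnv is a hypothetical helper name, not part of either SDK.

```javascript
// Hypothetical helper: fail fast when a required environment variable is missing.
// `env` defaults to process.env and can be replaced in tests.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Usage (assumes the variable names used above):
// const openai = new OpenAI({ apiKey: requireEnv('OPENAI_API_KEY') });
// const anthropic = new Anthropic({ apiKey: requireEnv('ANTHROPIC_API_KEY') });
```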

Basic Request Structure

OpenAI Chat Completions:

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in simple terms." }
  ],
  temperature: 0.7,
  max_tokens: 500
});

const answer = response.choices[0].message.content;

Anthropic Messages:

const response = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: "You are a helpful assistant.",
  messages: [
    { role: "user", content: "Explain quantum computing in simple terms." }
  ]
});

const answer = response.content[0].text;

Provider	When to Use	Token Cost	Quality	Complexity
OpenAI API	Need GPT-4 capabilities, function calling	Medium	High	Low
Anthropic API	Need longer context, better reasoning	Medium	High	Low

Choose a provider based on your use case. OpenAI has the more mature ecosystem; Anthropic is the better fit for long context.
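
The two requests above return differently shaped responses: OpenAI puts the answer in choices[0].message.content, while Anthropic returns an array of content blocks. If an application talks to both providers, a small adapter can hide that difference. This is a sketch under that assumption; extractText is a hypothetical name.

```javascript
// Hypothetical helper: pull the text answer out of either response shape.
function extractText(provider, response) {
  if (provider === 'openai') {
    // Chat Completions: the answer lives in choices[0].message.content
    return response?.choices?.[0]?.message?.content ?? '';
  }
  if (provider === 'anthropic') {
    // Messages API: content is an array of blocks; keep only the text blocks
    const blocks = response?.content ?? [];
    return blocks
      .filter((block) => block.type === 'text')
      .map((block) => block.text)
      .join('');
  }
  throw new Error(`Unknown provider: ${provider}`);
}
```

Returning an empty string for a missing answer keeps callers simple; a stricter variant could throw instead.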

Key Parameters

Model Selection

Model	When to Use	Token Cost	Quality	Complexity
GPT-4	Complex reasoning, high accuracy needed	High	High	High
GPT-3.5 Turbo	Simple tasks, cost optimization	Low	Medium	Low
Claude 3.5 Sonnet	Long documents, coding tasks	Medium	High	Medium
Claude 3 Haiku	Fast responses, low cost	Low	Medium	Low

Model selection is a tradeoff between quality, speed, and cost. Start with a cheaper model and upgrade only if the quality is insufficient.
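
One way to apply "start cheap, upgrade if needed" at runtime is to try models in order and stop at the first acceptable answer. The sketch below assumes the caller supplies both the function that calls a model (callModel) and the quality check (isGoodEnough); all three names are hypothetical.

```javascript
// Hypothetical helper: try models from cheapest to most capable and stop at
// the first answer that passes the caller's quality check.
async function callWithEscalation(models, callModel, isGoodEnough) {
  let last = null;
  for (const model of models) {
    const answer = await callModel(model);
    last = { model, answer };
    if (isGoodEnough(answer)) {
      return last;
    }
  }
  // No model passed the check: return the last (most capable) model's answer,
  // or null if the list was empty.
  return last;
}
```

Escalation pays for two calls whenever the cheap model fails, so it only saves money if the cheaper model succeeds most of the time.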

Temperature

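Controls randomness. Range on the OpenAI API: 0.0 (most deterministic) to 2.0 (very random). The Anthropic API accepts 0.0 to 1.0, so the higher ranges below apply to OpenAI only.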

Guidelines:

0.0-0.3 — Factual tasks (data extraction, classification, formatting)
0.4-0.7 — Balanced (general Q&A, explanations, summaries)
0.8-1.2 — Creative tasks (brainstorming, storytelling, marketing copy)
1.3-2.0 — Highly creative (experimental, artistic)

Before

25 tokens

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...]
});

After

38 tokens

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
  temperature: 0.2  // Low for factual extraction
});
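
The guidelines above can be kept in one place instead of being scattered across call sites. The sketch below picks one illustrative value from inside each guideline range and clamps it to the provider's upper bound (2.0 for OpenAI, 1.0 for Anthropic). The task names and the helper temperatureFor are assumptions made for this example.

```javascript
// Illustrative defaults: one value chosen from within each guideline range.
const TEMPERATURE_BY_TASK = {
  factual: 0.2,
  balanced: 0.6,
  creative: 1.0,
  experimental: 1.5,
};

// Upper bounds per provider: OpenAI accepts up to 2.0, Anthropic up to 1.0.
const MAX_TEMPERATURE = { openai: 2.0, anthropic: 1.0 };

// Hypothetical helper: look up the default and clamp it for the provider.
function temperatureFor(task, provider) {
  const base = TEMPERATURE_BY_TASK[task];
  const max = MAX_TEMPERATURE[provider];
  if (base === undefined) throw new Error(`Unknown task type: ${task}`);
  if (max === undefined) throw new Error(`Unknown provider: ${provider}`);
  return Math.min(base, max);
}
```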

Max Tokens

Limits output length. Affects cost and response time.

Calculation:

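Input tokens + output tokens actually generated = total tokens billed; max_tokens only caps the output side, so input tokens + max_tokens is the worst case
Set max_tokens based on the expected output length plus a buffer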

Example:

// Bad: no limit, the output can run to 4000+ tokens
const unbounded = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...]
});

// Good: cap the output to control cost
const bounded = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
  max_tokens: 500  // ~375 words max
});
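
The "expected output length + buffer" rule can be turned into a small calculation. The sketch below uses the ratio implied by the comment above (500 tokens for about 375 words, so roughly 0.75 words per token). That ratio is a rough rule of thumb for English text, and maxTokensForWords is a hypothetical helper.

```javascript
// Rule of thumb implied above: 500 tokens is about 375 words,
// so roughly 0.75 words per token. Actual ratios vary by language and content.
const WORDS_PER_TOKEN = 0.75;

// Hypothetical helper: convert an expected word count into a max_tokens value,
// adding a percentage buffer so answers are less likely to be cut off.
function maxTokensForWords(expectedWords, bufferPercent = 20) {
  const baseTokens = expectedWords / WORDS_PER_TOKEN;
  return Math.ceil((baseTokens * (100 + bufferPercent)) / 100);
}

console.log(maxTokensForWords(375, 0)); // prints 500
console.log(maxTokensForWords(375));    // prints 600
```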

System Prompt vs User Prompt

System prompt — persistent instructions, role definition, constraints. User prompt — specific request, variable input.

Before

45 tokens

messages: [
  { role: "user", content: "You are a customer support agent. Be polite and helpful. Answer this: How do I reset my password?" }
]

After

52 tokens

messages: [
  { role: "system", content: "You are a customer support agent. Be polite, helpful, and concise." },
  { role: "user", content: "How do I reset my password?" }
]

Benefit: System prompt reusable across requests. User prompt stays focused on actual query.
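
To make the reuse concrete, the system prompt can be defined once and combined with each incoming query. This is a minimal sketch for the OpenAI-style messages array; buildMessages and SUPPORT_SYSTEM_PROMPT are hypothetical names.

```javascript
// Reusable system prompt, defined once (text taken from the example above).
const SUPPORT_SYSTEM_PROMPT =
  'You are a customer support agent. Be polite, helpful, and concise.';

// Hypothetical helper: build the OpenAI-style messages array for one request.
function buildMessages(userInput, systemPrompt = SUPPORT_SYSTEM_PROMPT) {
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userInput },
  ];
}

// Every request shares the same instructions; only the user part changes.
const first = buildMessages('How do I reset my password?');
const second = buildMessages('How do I change my email address?');
```

With the Anthropic API the same system prompt would go into the top-level system parameter instead of the messages array, as in the earlier request example.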

Error Handling

API calls can fail. Handle errors gracefully.

Common errors:

401 Unauthorized — Invalid API key
429 Rate Limit — Too many requests
500 Server Error — Provider downtime
Timeout — Request took too long

Robust error handling:

// Small helper used by the retry loop below
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function callAI(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await openai.chat.completions.create(
        {
          model: "gpt-4",
          messages: [{ role: "user", content: prompt }],
          max_tokens: 500
        },
        { timeout: 30000 }  // 30s timeout: a request option, not a body field
      );
      return response.choices[0].message.content;
    } catch (error) {
      const lastAttempt = i === maxRetries - 1;
      if (error.status === 429 && !lastAttempt) {
        // Rate limit: wait and retry with exponential backoff (1s, 2s, 4s, ...)
        await sleep(Math.pow(2, i) * 1000);
        continue;
      }
      if (error.status >= 500 && !lastAttempt) {
        // Server error: retry after a short pause
        await sleep(1000);
        continue;
      }
      // Other errors, or retries exhausted: rethrow with context
      throw new Error(`AI API error: ${error.message}`, { cause: error });
    }
  }
  throw new Error('Max retries exceeded');
}
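
The retry loop above waits a fixed 1s, 2s, 4s. When many clients hit a rate limit at the same moment, fixed delays make them all retry together. A common refinement is to add random jitter and cap the delay. The sketch below shows that variant; isRetryable and backoffDelay are hypothetical helpers, and the default values are assumptions.

```javascript
// Statuses worth retrying: rate limits and server-side failures.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status < 600);
}

// Exponential backoff with "full jitter": pick a random delay between 0 and
// min(maxMs, baseMs * 2^attempt). `random` is injectable so tests can be
// deterministic.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Inside a retry loop, the fixed wait would become sleep(backoffDelay(i)), guarded by isRetryable(error.status).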

Rate Limiting

Providers limit requests per minute (RPM) and tokens per minute (TPM).

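OpenAI rate limits (tier- and model-dependent; the figures below are examples and change over time, so check the limits shown for your own account):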

Free tier: 3 RPM, 40K TPM
Tier 1: 500 RPM, 200K TPM
Tier 5: 10K RPM, 30M TPM

Strategies for handling rate limits:

Request queuing

import PQueue from 'p-queue';

const queue = new PQueue({
  concurrency: 5,  // Max 5 concurrent requests
  interval: 60000,  // Per minute
  intervalCap: 500  // Max 500 requests per minute
});

async function queuedAICall(prompt) {
  return queue.add(() => callAI(prompt));
}

Token budgeting

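let tokensUsedThisMinute = 0;
let windowStart = Date.now();
const TOKEN_LIMIT_PER_MINUTE = 200000;

async function budgetedAICall(prompt, estimatedTokens) {
  const elapsed = Date.now() - windowStart;
  const overBudget = tokensUsedThisMinute + estimatedTokens > TOKEN_LIMIT_PER_MINUTE;
  if (overBudget && elapsed < 60000) {
    // Budget exhausted: wait for the current one-minute window to end
    await new Promise((resolve) => setTimeout(resolve, 60000 - elapsed));
  }
  if (overBudget || elapsed >= 60000) {
    // Start a fresh window
    windowStart = Date.now();
    tokensUsedThisMinute = 0;
  }

  // Call the API directly: callAI returns only the text, not the usage data
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
    max_tokens: 500
  });
  tokensUsedThisMinute += response.usage.total_tokens;
  return response;
}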
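
A fixed one-minute counter is simple but can let through a burst around the moment the counter resets. A sliding window avoids that by remembering when each batch of tokens was spent. The class below is a sketch of that idea with an injectable clock for testing; TokenBudget and its method names are hypothetical.

```javascript
// Hypothetical sliding-window budget: remembers (timestamp, tokens) entries
// and only counts those from the last `windowMs` milliseconds.
class TokenBudget {
  constructor(limit, windowMs = 60000, now = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, useful for tests
    this.entries = [];
  }

  // Drop entries that have left the window and return the tokens still counted.
  used() {
    const cutoff = this.now() - this.windowMs;
    this.entries = this.entries.filter((entry) => entry.at > cutoff);
    return this.entries.reduce((sum, entry) => sum + entry.tokens, 0);
  }

  // True if a request of the estimated size fits in the current window.
  canSpend(estimatedTokens) {
    return this.used() + estimatedTokens <= this.limit;
  }

  // Record the tokens a finished request actually consumed.
  record(tokens) {
    this.entries.push({ at: this.now(), tokens });
  }
}
```

A caller would check canSpend(estimate) before each request and call record(response.usage.total_tokens) afterwards.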

Cost Optimization

Token usage drives cost. Optimize it to reduce spending.

Strategy 1: Prompt Caching (Anthropic)

Anthropic supports prompt caching, which reduces the cost of system prompts that are repeated across requests.

const response = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "Very long system prompt...",
      cache_control: { type: "ephemeral" }  // Cache this
    }
  ],
  messages: [
    { role: "user", content: "Variable user query" }
  ]
});

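Savings: Tokens read from the cache cost about 90% less than normal input tokens, while the request that first writes the cache costs somewhat more than an uncached one. The savings are largest for long system prompts that are reused while the cache entry is still alive.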
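
A short calculation shows how quickly caching pays off. The multipliers below (1.25x the input price to write the cache, 0.1x to read it) are assumptions for this sketch and should be checked against the current Anthropic pricing page; systemPromptCost is a hypothetical helper.

```javascript
// Assumed multipliers relative to the normal input price; verify against the
// current Anthropic pricing page before relying on them.
const CACHE_WRITE_MULTIPLIER = 1.25; // first request writes the cache
const CACHE_READ_MULTIPLIER = 0.1;   // later requests read it ("90% less")

// Relative cost of sending the same system prompt `requests` times, in units
// of "one uncached system prompt". Assumes every later request arrives while
// the cache entry is still alive.
function systemPromptCost(requests, cached) {
  if (requests <= 0) return 0;
  if (!cached) return requests;
  return CACHE_WRITE_MULTIPLIER + (requests - 1) * CACHE_READ_MULTIPLIER;
}
```

Under these assumptions, ten requests cost 10 units uncached and about 2.15 units cached. A single request is slightly more expensive with caching (1.25 units), so caching only helps when the prompt is actually reused.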

Strategy 2: Model Tiering

Route requests to cheaper models when possible.

async function smartAICall(prompt, complexity) {
  const model = complexity === 'high' ? 'gpt-4' : 'gpt-3.5-turbo';
  
  return openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: 500
  });
}

// Usage
await smartAICall("Explain quantum physics", 'high');  // GPT-4
await smartAICall("Summarize this paragraph", 'low');  // GPT-3.5

Strategy 3: Output Length Control

Shorter output = lower cost.

Before

35 tokens

Summarize this article: [5000 word article]

After

48 tokens

Summarize this article in 3 bullet points, max 50 words total: [5000 word article]

Token savings

↓ 79%

Before: 850 tokens

After: 180 tokens

Savings: about 79% fewer tokens with an explicit length constraint.
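
The percentage follows directly from the before and after token counts quoted above. As a worked check (savingsPercent is a hypothetical helper):

```javascript
// Percentage of tokens saved when going from `before` to `after`.
function savingsPercent(before, after) {
  if (before <= 0) throw new Error('before must be positive');
  return ((before - after) / before) * 100;
}

const saved = savingsPercent(850, 180);
console.log(saved.toFixed(1)); // prints "78.8"
```

That is (850 - 180) / 850, which rounds to 79%.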

Streaming Responses

For better UX, stream responses instead of waiting for the complete response.

OpenAI streaming:

const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: prompt }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);  // Stream to user
}

Anthropic streaming:

const stream = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt }],
  stream: true
});

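for await (const event of stream) {
  // Only text deltas carry a `text` field; other delta types are skipped
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}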
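
Applications usually need the complete text as well as the live stream, for example to log it or store it. The sketch below wraps the two loops above in one helper that forwards each piece of text and returns the full result. collectStream, openAIText, and anthropicText are hypothetical names; the chunk shapes are the ones used in the loops above.

```javascript
// Hypothetical helper: forward each text delta to `onDelta` and return the
// complete text once the stream ends. `toText` maps one chunk/event to a string.
async function collectStream(stream, toText, onDelta = () => {}) {
  let full = '';
  for await (const chunk of stream) {
    const text = toText(chunk);
    if (text) {
      onDelta(text);
      full += text;
    }
  }
  return full;
}

// Extractors matching the two loops above.
const openAIText = (chunk) => chunk.choices[0]?.delta?.content || '';
const anthropicText = (event) =>
  event.type === 'content_block_delta' ? event.delta.text || '' : '';

// Usage: const answer = await collectStream(stream, openAIText, (t) => process.stdout.write(t));
```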

Production Checklist

Before deploying AI integration:

Security:

API keys stored in environment variables
API keys never logged or exposed to client
Rate limiting implemented
Input validation (prevent prompt injection)

Reliability:

Error handling with retries
Timeout configured (30-60s)
Fallback behavior for API failures
Health check endpoint

Cost Control:

Max tokens limit set
Model selection optimized
Token usage monitored
Budget alerts configured

Quality:

Temperature tuned for use case
System prompt tested with diverse inputs
Output validation implemented
A/B testing for prompt changes
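
As one example of the "Output validation implemented" item: when a prompt asks the model to reply with a JSON object, the reply should be parsed and checked before the application uses it. The following is a minimal sketch of such a check; parseJsonReply and the result shape are assumptions for illustration.

```javascript
// Hypothetical validator: parse a model reply that is expected to be a JSON
// object and check that the required keys are present.
function parseJsonReply(text, requiredKeys = []) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: 'Reply is not valid JSON' };
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return { ok: false, error: 'Reply is not a JSON object' };
  }
  const missing = requiredKeys.filter((key) => !(key in data));
  if (missing.length > 0) {
    return { ok: false, error: `Missing keys: ${missing.join(', ')}` };
  }
  return { ok: true, data };
}
```

Returning a result object instead of throwing lets the caller decide whether to retry the request, fall back to a default, or surface an error.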

Monitoring & Logging

Track API usage to optimize cost and quality.

Metrics to log:

{
  timestamp: Date.now(),
  model: "gpt-4",
  promptTokens: 150,
  completionTokens: 320,
  totalTokens: 470,
  cost: 0.0141,  // Calculate based on pricing
  latency: 2340,  // ms
  success: true,
  errorType: null
}

Dashboard metrics:

Total cost per day/week/month
Average tokens per request
P95 latency
Error rate by type
Most expensive prompts (for optimization)
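
The log record and the P95 latency metric above can both be derived from data the application already has. The sketch below builds a record from a usage object shaped like OpenAI's response.usage and computes a nearest-rank P95. The prices are placeholders, not real prices, and buildLogRecord, p95, and EXAMPLE_PRICING are hypothetical names.

```javascript
// Placeholder prices in USD per 1,000 tokens. These are NOT real prices;
// replace them with the values from your provider's pricing page.
const EXAMPLE_PRICING = {
  'example-model': { promptPer1K: 0.01, completionPer1K: 0.03 },
};

// Build one log record from a usage object shaped like OpenAI's `response.usage`.
function buildLogRecord(model, usage, latencyMs, pricing = EXAMPLE_PRICING) {
  const price = pricing[model];
  const cost = price
    ? (usage.prompt_tokens / 1000) * price.promptPer1K +
      (usage.completion_tokens / 1000) * price.completionPer1K
    : null; // unknown model: leave cost empty instead of guessing
  return {
    timestamp: Date.now(),
    model,
    promptTokens: usage.prompt_tokens,
    completionTokens: usage.completion_tokens,
    totalTokens: usage.prompt_tokens + usage.completion_tokens,
    cost,
    latency: latencyMs,
    success: true,
    errorType: null,
  };
}

// Nearest-rank P95: the smallest sample with at least 95% of samples at or below it.
function p95(latencies) {
  if (latencies.length === 0) return null;
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}
```

Prompt and completion tokens are priced separately here because providers typically charge different rates for input and output.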

Summary

API integration essentials:

Setup:

Store API keys in environment variables
Choose provider based on use case (OpenAI vs Anthropic)
Select model based on quality/cost tradeoff

Key parameters:

Temperature: 0.0-0.3 (factual), 0.4-0.7 (balanced), 0.8+ (creative)
Max tokens: set a limit to control cost
System prompt: reusable instructions
User prompt: specific request

Production requirements:

Error handling with retries
Rate limiting strategy
Cost optimization (caching, model tiering, length control)
Streaming for better UX
Monitoring and logging

Cost optimization:

Use prompt caching (Anthropic)
Route to cheaper models when possible
Control output length explicitly
Monitor usage and optimize expensive prompts

A good integration is reliable, cost-effective, and maintainable. Test thoroughly before deploying to production.
