integration · 12 min read

API Integration Basics

Using the OpenAI and Anthropic APIs with effective prompt engineering

Introduction

Integrating AI into your application takes more than a single API call. You need to understand API parameters, token management, error handling, rate limiting, and cost optimization.

This lesson focuses on the OpenAI API (GPT-4, GPT-3.5) and the Anthropic API (Claude), two of the most popular providers for production use cases.

API Basics

Authentication

Both providers use API key authentication.

OpenAI:

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

Anthropic:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY
});
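
Because both clients read their keys from the environment, a missing variable tends to surface only later as a 401 response. A minimal sketch of a fail-fast check could look like this; `requireEnv` is a hypothetical helper name, not part of either SDK.

```javascript
// Hypothetical helper: read a required environment variable or fail at startup.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (assumes the variable names used in the snippets above):
// const openai = new OpenAI({ apiKey: requireEnv("OPENAI_API_KEY") });
```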

Basic Request Structure

OpenAI Chat Completions:

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in simple terms." }
  ],
  temperature: 0.7,
  max_tokens: 500
});

const answer = response.choices[0].message.content;

Anthropic Messages:

const response = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: "You are a helpful assistant.",
  messages: [
    { role: "user", content: "Explain quantum computing in simple terms." }
  ]
});

const answer = response.content[0].text;

Provider        When to Use                                 Token Cost   Quality   Complexity
OpenAI API      Need GPT-4 capabilities, function calling   Medium       High      Low
Anthropic API   Need longer context, better reasoning       Medium       High      Low

Choose a provider based on your use case. OpenAI has the more mature ecosystem; Anthropic is often the better fit for long-context work.
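
The two request examples above return the answer in different places (`choices[0].message.content` versus `content[0].text`). If an application talks to both providers, a small normalizer keeps the rest of the code provider-agnostic. This is only a sketch: `extractText` and the `"openai"` / `"anthropic"` labels are hypothetical names chosen for illustration.

```javascript
// Sketch: pull the first text answer out of either provider's response object.
// The provider labels are hypothetical, chosen for this example.
function extractText(provider, response) {
  if (provider === "openai") {
    // Chat Completions: choices[0].message.content
    return response.choices?.[0]?.message?.content ?? "";
  }
  if (provider === "anthropic") {
    // Messages API: content is an array of blocks; take the first text block
    const block = (response.content ?? []).find((b) => b.type === "text");
    return block ? block.text : "";
  }
  throw new Error(`Unknown provider: ${provider}`);
}
```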

Key Parameters

Model Selection

Model               When to Use                               Token Cost   Quality   Complexity
GPT-4               Complex reasoning, high accuracy needed   High         High      High
GPT-3.5 Turbo       Simple tasks, cost optimization           Low          Medium    Low
Claude 3.5 Sonnet   Long documents, coding tasks              Medium       High      Medium
Claude 3 Haiku      Fast responses, low cost                  Low          Medium    Low

Model selection is a tradeoff between quality, speed, and cost. Start with a cheaper model and upgrade if the quality is insufficient.

Temperature

Controls randomness. Range on OpenAI: 0.0 (most deterministic) to 2.0 (very random). Anthropic's API accepts 0.0 to 1.0, so the higher bands below apply to OpenAI only.

Guidelines:

  • 0.0-0.3 — Factual tasks (data extraction, classification, formatting)
  • 0.4-0.7 — Balanced (general Q&A, explanations, summaries)
  • 0.8-1.2 — Creative tasks (brainstorming, storytelling, marketing copy)
  • 1.3-2.0 — Highly creative (experimental, artistic)
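
One way to keep these guidelines consistent across a codebase is to name them once and look them up per request. The sketch below is an assumption-laden example: the category names and the specific preset values are hypothetical picks from within the ranges above, and `providerMax` lets a caller clamp the value for a provider with a narrower range.

```javascript
// Hypothetical task categories mapped to values inside the ranges listed above.
const TEMPERATURE_PRESETS = {
  factual: 0.2,
  balanced: 0.6,
  creative: 1.0,
  experimental: 1.5
};

// Returns a temperature for a task category, clamped to the provider's maximum.
function temperatureFor(task, providerMax = 2.0) {
  const preset = TEMPERATURE_PRESETS[task];
  if (preset === undefined) {
    throw new Error(`Unknown task category: ${task}`);
  }
  return Math.min(preset, providerMax);
}
```

For example, `temperatureFor("factual")` returns 0.2, and `temperatureFor("experimental", 1.0)` is clamped to 1.0.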

Before (25 tokens):

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...]
});

After (38 tokens):

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
  temperature: 0.2  // Low for factual extraction
});

Max Tokens

Limits output length. Affects cost and response time.

Calculation:

  • Input tokens + output tokens actually generated = total tokens billed; max_tokens only caps the output side
  • Set max_tokens based on the expected output length plus a buffer
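
The "expected output length + buffer" rule can be sketched as a small helper. It assumes the common rule of thumb that one token is roughly 0.75 English words (the same ratio behind the "500 tokens ≈ 375 words" comment in this lesson); `estimateMaxTokens` and the 20% default buffer are hypothetical choices, and real token counts depend on the model's tokenizer.

```javascript
// Rule of thumb assumed here: 1 token is about 0.75 English words.
const WORDS_PER_TOKEN = 0.75;

// Estimate a max_tokens value from an expected word count plus a safety buffer.
function estimateMaxTokens(expectedWords, bufferRatio = 0.2) {
  const baseTokens = expectedWords / WORDS_PER_TOKEN;
  return Math.ceil(baseTokens * (1 + bufferRatio));
}
```

With these assumptions, `estimateMaxTokens(375, 0)` gives 500, and the default 20% buffer raises that to about 600.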

Example:

// Bad: no limit, so the model can generate 4000+ tokens
const unbounded = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...]
});

// Good: limit output to control cost
const bounded = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
  max_tokens: 500  // ~375 words max
});

System Prompt vs User Prompt

System prompt — persistent instructions, role definition, constraints. User prompt — specific request, variable input.

Before (45 tokens):

messages: [
  { role: "user", content: "You are a customer support agent. Be polite and helpful. Answer this: How do I reset my password?" }
]

After (52 tokens):

messages: [
  { role: "system", content: "You are a customer support agent. Be polite, helpful, and concise." },
  { role: "user", content: "How do I reset my password?" }
]

Benefit: System prompt reusable across requests. User prompt stays focused on actual query.
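
To make the reuse concrete, the system prompt can live in one constant while a helper builds the per-request messages array. This is a minimal sketch; `SUPPORT_SYSTEM_PROMPT` and `buildMessages` are hypothetical names, and the array follows the OpenAI message shape shown above.

```javascript
// Defined once and reused for every request.
const SUPPORT_SYSTEM_PROMPT =
  "You are a customer support agent. Be polite, helpful, and concise.";

// Build the messages array for one request: fixed system prompt, variable user input.
function buildMessages(userInput, systemPrompt = SUPPORT_SYSTEM_PROMPT) {
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userInput.trim() }
  ];
}
```

Each call such as `buildMessages("How do I reset my password?")` then differs only in the user message.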

Error Handling

API calls can fail. Handle errors gracefully.

Common errors:

  • 401 Unauthorized — Invalid API key
  • 429 Rate Limit — Too many requests
  • 500 Server Error — Provider downtime
  • Timeout — Request took too long
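
These errors fall into two groups: ones worth retrying (429, 5xx, timeouts) and ones that need a fix on the caller's side (401 and other client errors). A sketch of that split follows. It assumes the error object carries an HTTP `status` property and that timeouts arrive without one; the `name` check for timeouts is a hypothetical convention, so adapt it to the error classes your SDK actually throws.

```javascript
// Sketch: decide whether an error is worth retrying.
function isRetryable(error) {
  if (error.status === 429) return true; // rate limit: retry after backoff
  if (error.status >= 500 && error.status < 600) return true; // provider-side failure
  // Assumed convention: timeouts have no status and a name containing "timeout"
  if (error.status === undefined && /timeout/i.test(error.name ?? "")) return true;
  return false; // 401 and other client errors: fix the request instead
}
```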

Robust error handling:

// Promise-based delay helper used between retries
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function callAI(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await openai.chat.completions.create(
        {
          model: "gpt-4",
          messages: [{ role: "user", content: prompt }],
          max_tokens: 500
        },
        { timeout: 30000 }  // 30s timeout is a request option, not a body parameter
      );
      return response.choices[0].message.content;
    } catch (error) {
      const isLastAttempt = i === maxRetries - 1;
      if (error.status === 429 && !isLastAttempt) {
        // Rate limit: wait and retry with exponential backoff (1s, 2s, 4s, ...)
        await sleep(Math.pow(2, i) * 1000);
        continue;
      }
      if (error.status >= 500 && !isLastAttempt) {
        // Server error: retry after a short pause
        await sleep(1000);
        continue;
      }
      // Other errors, or retries exhausted: throw immediately and keep the original as the cause
      throw new Error(`AI API error: ${error.message}`, { cause: error });
    }
  }
  throw new Error('Max retries exceeded');
}
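
When many clients hit a rate limit at the same moment, plain exponential backoff makes them all retry at the same moment too. A common refinement is to add random jitter and a cap on the delay. The sketch below shows one variant ("full jitter"); `backoffDelay` and its default values are hypothetical, and the `random` parameter exists only so the function can be tested deterministically.

```javascript
// Exponential backoff with full jitter:
// a random delay between 0 and the capped exponential value.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000, random = Math.random) {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}
```

In a retry loop this would replace a fixed wait, for example `await sleep(backoffDelay(i))`.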

Rate Limiting

Providers limit requests per minute (RPM) and tokens per minute (TPM).

OpenAI rate limits (tier- and model-dependent; the figures below are examples and change over time):

  • Free tier: 3 RPM, 40K TPM
  • Tier 1: 500 RPM, 200K TPM
  • Tier 5: 10K RPM, 30M TPM

Strategies for handling rate limits:

  1. Request queuing
import PQueue from 'p-queue';

const queue = new PQueue({
  concurrency: 5,  // Max 5 concurrent requests
  interval: 60000,  // Per minute
  intervalCap: 500  // Max 500 requests per minute
});

async function queuedAICall(prompt) {
  return queue.add(() => callAI(prompt));
}
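
  2. Token budgeting
let tokensUsedThisMinute = 0;
let windowStart = Date.now();
const TOKEN_LIMIT_PER_MINUTE = 200000;

async function budgetedAICall(prompt, estimatedTokens) {
  // Start a fresh budget once the current one-minute window has elapsed
  if (Date.now() - windowStart >= 60000) {
    tokensUsedThisMinute = 0;
    windowStart = Date.now();
  }
  if (tokensUsedThisMinute + estimatedTokens > TOKEN_LIMIT_PER_MINUTE) {
    // Over budget: wait out the rest of the window, then reset
    const remainingMs = 60000 - (Date.now() - windowStart);
    await new Promise((resolve) => setTimeout(resolve, remainingMs));
    tokensUsedThisMinute = 0;
    windowStart = Date.now();
  }

  // Call the API directly: callAI returns only the text, but usage lives on the full response
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
    max_tokens: 500
  });
  tokensUsedThisMinute += response.usage.total_tokens;
  return response;
}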
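
The `estimatedTokens` argument has to come from somewhere before the request is sent. Without a real tokenizer, a rough heuristic is often good enough for budgeting. The sketch below assumes about 4 characters per token for English text, which is an approximation rather than a tokenizer result; both function names are hypothetical.

```javascript
// Rough heuristic (an assumption, not a tokenizer): about 4 characters per token.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

// Estimate for one request: prompt tokens plus the output cap.
function estimateRequestTokens(prompt, maxTokens) {
  return estimateTokens(prompt) + maxTokens;
}
```

For precise numbers, use the provider's tokenizer or the `usage` field returned with each response.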

Cost Optimization

Token usage = cost. Optimize to reduce spending.

Strategy 1: Prompt Caching (Anthropic)

Anthropic supports prompt caching to reduce the cost of repeated system prompts.

const response = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "Very long system prompt...",
      cache_control: { type: "ephemeral" }  // Cache this
    }
  ],
  messages: [
    { role: "user", content: "Variable user query" }
  ]
});

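Savings: tokens read from the cache cost about 90% less than normal input tokens, while writing the cache costs somewhat more than a normal request. The savings are large for long system prompts that are reused often.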
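
A quick calculation shows how the savings scale with the number of requests. The sketch below assumes a cache write is billed at 1.25x the normal input price and a cache read at 0.1x, and that every later request arrives before the cache entry expires; check the provider's current pricing before relying on these multipliers. `cachedPrefixCost` and `pricePerToken` are hypothetical names.

```javascript
// Assumed multipliers relative to the normal input price.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Compare the input cost of a repeated prefix with and without caching.
// Expects requests >= 1; pricePerToken is a placeholder for the real price.
function cachedPrefixCost(prefixTokens, requests, pricePerToken) {
  const withoutCache = prefixTokens * requests * pricePerToken;
  // The first request writes the cache; the remaining requests read from it.
  const withCache =
    prefixTokens *
    pricePerToken *
    (CACHE_WRITE_MULTIPLIER + (requests - 1) * CACHE_READ_MULTIPLIER);
  return { withoutCache, withCache, savingsRatio: 1 - withCache / withoutCache };
}
```

Under these assumptions, a 10,000-token system prompt reused across 100 requests costs roughly 89% less on the cached portion. With a single request, caching costs more than not caching, because only the write is paid.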

Strategy 2: Model Tiering

Route requests to cheaper models when possible.

async function smartAICall(prompt, complexity) {
  const model = complexity === 'high' ? 'gpt-4' : 'gpt-3.5-turbo';
  
  return openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: 500
  });
}

// Usage
await smartAICall("Explain quantum physics", 'high');  // GPT-4
await smartAICall("Summarize this paragraph", 'low');  // GPT-3.5
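
The `complexity` argument above has to be decided by something. One simple option is a heuristic based on prompt length and keywords, sketched below. The keyword list, the length threshold, and the name `classifyComplexity` are all hypothetical; in practice such a router should be tuned against real traffic.

```javascript
// Hypothetical heuristic: long prompts or reasoning-style verbs count as "high".
const REASONING_HINTS = /\b(explain|analyze|prove|compare|design|debug)\b/i;

function classifyComplexity(prompt, lengthThreshold = 2000) {
  if (prompt.length > lengthThreshold) return "high";
  if (REASONING_HINTS.test(prompt)) return "high";
  return "low";
}
```

The result can be passed straight through, for example `smartAICall(prompt, classifyComplexity(prompt))`.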

Strategy 3: Output Length Control

Shorter output = lower cost.

Before (35 tokens):

Summarize this article: [5000 word article]

After (48 tokens):

Summarize this article in 3 bullet points, max 50 words total: [5000 word article]

Token savings: 79%
Before: 850 tokens
After: 180 tokens

Savings: a 79% token reduction from an explicit length constraint.
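
The percentage follows directly from the two token counts (850 before, 180 after). A small helper makes the arithmetic explicit; `tokenSavingsPercent` is a hypothetical name.

```javascript
// Percentage reduction from `before` to `after`, rounded to the nearest whole percent.
function tokenSavingsPercent(before, after) {
  return Math.round(((before - after) / before) * 100);
}

console.log(tokenSavingsPercent(850, 180)); // 79
```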

Streaming Responses

For better UX, stream responses instead of waiting for the complete response.

OpenAI streaming:

const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: prompt }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);  // Stream to user
}

Anthropic streaming:

const stream = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt }],
  stream: true
});

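for await (const event of stream) {
  // Only text deltas carry delta.text; other delta types (such as tool input) do not
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}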
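
Streaming to the user usually still requires the complete text afterwards, for logging or output validation. A sketch of a collector that does both is shown below. It accepts any async iterable that yields OpenAI-style chunks; `collectStream` and `onText` are hypothetical names.

```javascript
// Sketch: accumulate streamed text while forwarding each piece to a callback.
async function collectStream(stream, onText = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const piece = chunk.choices?.[0]?.delta?.content ?? "";
    if (piece) {
      onText(piece); // e.g. write to the UI or the HTTP response
      fullText += piece;
    }
  }
  return fullText;
}
```

For example, `const text = await collectStream(stream, (piece) => process.stdout.write(piece));` prints pieces as they arrive and returns the full answer at the end.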

Production Checklist

Before deploying AI integration:

Security:

  • API keys stored in environment variables
  • API keys never logged or exposed to client
  • Rate limiting implemented
  • Input validation (prevent prompt injection)

Reliability:

  • Error handling with retries
  • Timeout configured (30-60s)
  • Fallback behavior for API failures
  • Health check endpoint
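
The timeout and fallback items can be combined in one wrapper. The sketch below races the API call against a timer and returns a fallback value on any failure. Both helper names are hypothetical, and the timeout here only stops waiting: it does not cancel the underlying HTTP request, which would need the SDK's own timeout or abort support.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try the primary call; on any failure or timeout, return the fallback value.
async function withFallback(primaryCall, fallbackValue, timeoutMs = 30000) {
  try {
    return await withTimeout(primaryCall(), timeoutMs);
  } catch (error) {
    return fallbackValue;
  }
}
```

A call such as `withFallback(() => callAI(prompt), "Sorry, the assistant is unavailable right now.")` keeps the application responsive during provider outages.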

Cost Control:

  • Max tokens limit set
  • Model selection optimized
  • Token usage monitored
  • Budget alerts configured

Quality:

  • Temperature tuned for use case
  • System prompt tested with diverse inputs
  • Output validation implemented
  • A/B testing for prompt changes
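
Output validation depends on the expected format. For prompts that ask the model to answer in JSON, a minimal check could look like the sketch below; `validateJsonOutput` and its result shape are hypothetical, and it verifies only that the required fields are present, not their types.

```javascript
// Sketch: parse model output that is expected to be a JSON object with required fields.
function validateJsonOutput(rawText, requiredFields) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    return { ok: false, error: "Output is not valid JSON" };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { ok: false, error: "Output is not a JSON object" };
  }
  const missing = requiredFields.filter((field) => !(field in parsed));
  if (missing.length > 0) {
    return { ok: false, error: `Missing fields: ${missing.join(", ")}` };
  }
  return { ok: true, value: parsed };
}
```

A failed validation is a natural trigger for a retry or for the fallback behavior listed under Reliability.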

Monitoring & Logging

Track API usage to optimize cost and quality.

Metrics to log:

{
  timestamp: Date.now(),
  model: "gpt-4",
  promptTokens: 150,
  completionTokens: 320,
  totalTokens: 470,
  cost: 0.0141,  // Calculate based on pricing
  latency: 2340,  // ms
  success: true,
  errorType: null
}
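
A record like this can be assembled from the `usage` object that comes back with each response. The sketch below assumes OpenAI-style field names (`prompt_tokens`, `completion_tokens`) and bills input and output tokens at separate rates. The prices in `PRICING` are placeholders for illustration, not current list prices, and `buildLogEntry` is a hypothetical name.

```javascript
// Placeholder per-1K-token prices in USD; replace with the provider's current pricing.
const PRICING = {
  "gpt-4": { inputPer1K: 0.03, outputPer1K: 0.06 }
};

// Build a log record from an OpenAI-style usage object and request timing.
function buildLogEntry(model, usage, startedAtMs, finishedAtMs, error = null) {
  const price = PRICING[model];
  const cost = price
    ? (usage.prompt_tokens / 1000) * price.inputPer1K +
      (usage.completion_tokens / 1000) * price.outputPer1K
    : null; // unknown model: leave cost empty rather than guess
  return {
    timestamp: finishedAtMs,
    model,
    promptTokens: usage.prompt_tokens,
    completionTokens: usage.completion_tokens,
    totalTokens: usage.prompt_tokens + usage.completion_tokens,
    cost,
    latency: finishedAtMs - startedAtMs, // ms
    success: error === null,
    errorType: error ? error.name : null
  };
}
```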

Dashboard metrics:

  • Total cost per day/week/month
  • Average tokens per request
  • P95 latency
  • Error rate by type
  • Most expensive prompts (for optimization)
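
Most of these dashboard numbers are simple aggregations over the logged records. The sketch below computes a few of them from an in-memory array, using the nearest-rank method for P95. The function names are hypothetical, and a production setup would normally run these aggregations in a metrics or logging system instead.

```javascript
// Nearest-rank percentile over an array of numbers (p between 0 and 100).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Summarize log entries shaped like the record in "Metrics to log".
function summarize(entries) {
  const failures = entries.filter((e) => !e.success);
  const errorsByType = {};
  for (const e of failures) {
    errorsByType[e.errorType] = (errorsByType[e.errorType] ?? 0) + 1;
  }
  return {
    totalCost: entries.reduce((sum, e) => sum + (e.cost ?? 0), 0),
    avgTokens: entries.length
      ? entries.reduce((sum, e) => sum + e.totalTokens, 0) / entries.length
      : 0,
    p95Latency: percentile(entries.map((e) => e.latency), 95),
    errorRate: entries.length ? failures.length / entries.length : 0,
    errorsByType
  };
}
```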

Summary

API integration essentials:

Setup:

  • Store API keys in environment variables
  • Choose provider based on use case (OpenAI vs Anthropic)
  • Select model based on quality/cost tradeoff

Key parameters:

  • Temperature: 0.0-0.3 (factual), 0.4-0.7 (balanced), 0.8+ (creative)
  • Max tokens: set a limit to control cost
  • System prompt: reusable instructions
  • User prompt: specific request

Production requirements:

  • Error handling with retries
  • Rate limiting strategy
  • Cost optimization (caching, model tiering, length control)
  • Streaming for better UX
  • Monitoring and logging

Cost optimization:

  • Use prompt caching (Anthropic)
  • Route to cheaper models when possible
  • Control output length explicitly
  • Monitor usage and optimize expensive prompts

A good integration is reliable, cost-effective, and maintainable. Test thoroughly before deploying to production.
