integration · 12 min read
API Integration Basics
Using the OpenAI and Anthropic APIs with effective prompt engineering
Introduction
Integrating AI into your application takes more than a single API call. You need to understand API parameters, token management, error handling, rate limiting, and cost optimization.
This lesson focuses on the OpenAI API (GPT-4, GPT-3.5) and the Anthropic API (Claude), two of the most popular providers for production use cases.
API Basics
Authentication
Both providers use API key authentication.
OpenAI:
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY
});
Anthropic:
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY
});
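A missing or empty key usually surfaces only when the first request fails. A minimal sketch of a fail-fast check at startup, assuming a hypothetical helper named `requireEnv` (it is not part of either SDK):

```javascript
// Hypothetical helper: read a required environment variable or fail fast.
// `env` defaults to process.env and can be replaced in tests.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Usage: pass the result to the SDK constructor, for example
// new OpenAI({ apiKey: requireEnv("OPENAI_API_KEY") })
```

Failing at startup keeps a misconfigured deployment from reaching users with a confusing 401 later.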
Basic Request Structure
OpenAI Chat Completions:
const response = await openai.chat.completions.create({
model: "gpt-4",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing in simple terms." }
],
temperature: 0.7,
max_tokens: 500
});
const answer = response.choices[0].message.content;
Anthropic Messages:
const response = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
system: "You are a helpful assistant.",
messages: [
{ role: "user", content: "Explain quantum computing in simple terms." }
]
});
const answer = response.content[0].text;
| Provider | When to Use | Token Cost | Quality | Complexity |
|---|---|---|---|---|
| OpenAI API | Need GPT-4 capabilities, function calling | Medium | High | Low |
| Anthropic API | Need longer context, better reasoning | Medium | High | Low |
Key Parameters
Model Selection
| Model | When to Use | Token Cost | Quality | Complexity |
|---|---|---|---|---|
| GPT-4 | Complex reasoning, high accuracy needed | High | High | High |
| GPT-3.5 Turbo | Simple tasks, cost optimization | Low | Medium | Low |
| Claude 3.5 Sonnet | Long documents, coding tasks | Medium | High | Medium |
| Claude 3 Haiku | Fast responses, low cost | Low | Medium | Low |
Temperature
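Controls randomness. Range: 0.0 (most deterministic) to 2.0 (very random) on the OpenAI API; the Anthropic API accepts 0.0 to 1.0. Even at 0.0, output is not guaranteed to be identical across calls.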
Guidelines:
- 0.0-0.3 — Factual tasks (data extraction, classification, formatting)
- 0.4-0.7 — Balanced (general Q&A, explanations, summaries)
- 0.8-1.2 — Creative tasks (brainstorming, storytelling, marketing copy)
- 1.3-2.0 — Highly creative (experimental, artistic)
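The guideline ranges above can be captured in a small lookup so that call sites name a task type instead of a magic number. This is only a sketch: `TEMPERATURE_PRESETS` and `temperatureFor` are hypothetical names, and the specific values picked inside each range are a judgment call.

```javascript
// Illustrative presets: one value picked from each guideline range above.
const TEMPERATURE_PRESETS = {
  factual: 0.2,  // from the 0.0-0.3 range
  balanced: 0.6, // from the 0.4-0.7 range
  creative: 1.0, // from the 0.8-1.2 range
};

// Hypothetical helper: look up a preset and clamp it to the API's maximum.
// Pass maxTemperature = 1.0 for an API that caps temperature at 1.0.
function temperatureFor(taskType, maxTemperature = 2.0) {
  if (!Object.prototype.hasOwnProperty.call(TEMPERATURE_PRESETS, taskType)) {
    throw new Error(`Unknown task type: ${taskType}`);
  }
  return Math.min(TEMPERATURE_PRESETS[taskType], maxTemperature);
}
```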
Before (25 tokens):
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...]
});
After (38 tokens):
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
  temperature: 0.2 // Low for factual extraction
});
Max Tokens
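Limits output length, which affects both cost and response time.
Calculation:
- Billed tokens = input tokens + output tokens actually generated; max_tokens caps the output, so input tokens + max_tokens is the worst case
- Set max_tokens based on the expected output length plus a buffer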
Example:
// Bad: no limit, so the model can generate 4000+ tokens
const unboundedResponse = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...]
});
// Good: limit the output to control cost
const boundedResponse = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
  max_tokens: 500 // ~375 words max
});
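To make the budgeting arithmetic concrete, here is a rough worked example of the worst-case bill for one request. The function name `worstCaseCost` and the prices are placeholders for illustration, not real pricing.

```javascript
// Worst-case billable tokens and cost for one request.
// Prices are per 1,000 tokens and are placeholder values, not real pricing.
function worstCaseCost({ inputTokens, maxTokens, inputPricePer1K, outputPricePer1K }) {
  const inputCost = (inputTokens / 1000) * inputPricePer1K;
  const outputCost = (maxTokens / 1000) * outputPricePer1K; // assumes the model uses the full cap
  return {
    maxBillableTokens: inputTokens + maxTokens,
    maxCost: inputCost + outputCost,
  };
}

const estimate = worstCaseCost({
  inputTokens: 1000,
  maxTokens: 500,
  inputPricePer1K: 0.01,
  outputPricePer1K: 0.03,
});
console.log(estimate.maxBillableTokens);  // 1500
console.log(estimate.maxCost.toFixed(3)); // 0.025
```

The actual bill is usually lower, because the model often stops before reaching the cap.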
System Prompt vs User Prompt
System prompt — persistent instructions, role definition, constraints. User prompt — specific request, variable input.
Before (45 tokens):
messages: [
  { role: "user", content: "You are a customer support agent. Be polite and helpful. Answer this: How do I reset my password?" }
]
After (52 tokens):
messages: [
  { role: "system", content: "You are a customer support agent. Be polite, helpful, and concise." },
  { role: "user", content: "How do I reset my password?" }
]
Benefit: The system prompt is reusable across requests, and the user prompt stays focused on the actual query.
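One way to keep the two roles separate in application code is a small builder that pairs a fixed system prompt with variable user input. A minimal sketch in the OpenAI message format; `SUPPORT_SYSTEM_PROMPT` and `buildMessages` are hypothetical names.

```javascript
// Fixed instructions, defined once and reused across requests.
const SUPPORT_SYSTEM_PROMPT =
  "You are a customer support agent. Be polite, helpful, and concise.";

// Hypothetical helper: combine the reusable system prompt with one user query.
function buildMessages(userInput, systemPrompt = SUPPORT_SYSTEM_PROMPT) {
  const content = String(userInput).trim();
  if (content === "") {
    throw new Error("User input must not be empty");
  }
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content },
  ];
}

// Usage: openai.chat.completions.create({ model, messages: buildMessages(query) })
```

With the Anthropic API, the same system text goes in the top-level `system` field instead of the `messages` array.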
Error Handling
API calls can fail, so handle errors gracefully.
Common errors:
- 401 Unauthorized — Invalid API key
- 429 Rate Limit — Too many requests
- 500 Server Error — Provider downtime
- Timeout — Request took too long
Robust error handling:
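// Small helper: resolve after the given number of milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function callAI(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await openai.chat.completions.create(
        {
          model: "gpt-4",
          messages: [{ role: "user", content: prompt }],
          max_tokens: 500
        },
        // Request options go in the second argument, not in the request body.
        // maxRetries: 0 turns off the SDK's built-in retries so this loop is the only one.
        { timeout: 30000, maxRetries: 0 } // 30s timeout
      );
      return response.choices[0].message.content;
    } catch (error) {
      if (error.status === 429) {
        // Rate limit: wait and retry with exponential backoff
        const waitTime = Math.pow(2, i) * 1000;
        await sleep(waitTime);
        continue;
      }
      if (error.status === 500 && i < maxRetries - 1) {
        // Server error: retry
        await sleep(1000);
        continue;
      }
      // Other errors: throw immediately, keeping the original error as the cause
      throw new Error(`AI API error: ${error.message}`, { cause: error });
    }
  }
  throw new Error('Max retries exceeded');
}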
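The same retry idea can be factored into a provider-agnostic wrapper. The sketch below adds random jitter to the backoff so that many clients do not retry at the same instant; `withRetry`, `delay`, and the set of retryable status codes are assumptions made for illustration, not part of either SDK.

```javascript
// Hypothetical helpers, not part of any SDK.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Status codes treated as transient in this sketch.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

async function withRetry(fn, { maxRetries = 3, baseDelayMs = 1000, wait = delay } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === maxRetries - 1;
      // Non-transient errors (e.g. 401) and the final attempt are not retried.
      if (!RETRYABLE_STATUSES.has(error?.status) || isLastAttempt) break;
      // Exponential backoff with jitter: random wait in [0, baseDelayMs * 2^attempt)
      await wait(Math.random() * baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Usage: withRetry(() => openai.chat.completions.create({ ... }))
```

Rethrowing the original error keeps its `status` available to callers that want to react to specific failures.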
Rate Limiting
Providers limit requests per minute (RPM) and tokens per minute (TPM).
OpenAI rate limits (tier- and model-dependent; the figures below are examples, so check your account's limits page for current values):
- Free tier: 3 RPM, 40K TPM
- Tier 1: 500 RPM, 200K TPM
- Tier 5: 10K RPM, 30M TPM
Strategies for handling rate limits:
- Request queuing
import PQueue from 'p-queue';
const queue = new PQueue({
concurrency: 5, // Max 5 concurrent requests
interval: 60000, // Per minute
intervalCap: 500 // Max 500 requests per minute
});
async function queuedAICall(prompt) {
return queue.add(() => callAI(prompt));
}
- Token budgeting
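let tokensUsedThisMinute = 0;
let windowStart = Date.now();
const TOKEN_LIMIT_PER_MINUTE = 200000;
// Wait for the rest of the current one-minute window
const waitUntilNextMinute = () => new Promise((resolve) => setTimeout(resolve, 60000 - (Date.now() - windowStart)));
async function budgetedAICall(prompt, estimatedTokens) {
  let startNewWindow = Date.now() - windowStart >= 60000;
  if (!startNewWindow && tokensUsedThisMinute + estimatedTokens > TOKEN_LIMIT_PER_MINUTE) {
    await waitUntilNextMinute();
    startNewWindow = true;
  }
  if (startNewWindow) {
    windowStart = Date.now();
    tokensUsedThisMinute = 0;
  }
  // Call the SDK directly: callAI returns only the text, not the usage data
  const response = await openai.chat.completions.create({
    model: "gpt-4", messages: [{ role: "user", content: prompt }], max_tokens: 500
  });
  tokensUsedThisMinute += response.usage.total_tokens;
  return response;
}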
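Token budgeting needs an `estimatedTokens` value before the request is sent. A rough sketch that reuses the ratio from the max_tokens example (500 tokens for about 375 words, or 0.75 words per token); `estimateTokens` and `estimateRequestTokens` are hypothetical helpers, and a real tokenizer will give different counts, especially for code or non-English text.

```javascript
// Rough heuristic: about 0.75 words per token (500 tokens ~ 375 words).
const WORDS_PER_TOKEN = 0.75;

// Hypothetical helper: approximate the token count of a piece of text.
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / WORDS_PER_TOKEN);
}

// Estimate for a whole request: prompt tokens plus the reserved output budget.
function estimateRequestTokens(prompt, maxTokens) {
  return estimateTokens(prompt) + maxTokens;
}

console.log(estimateTokens("one two three"));                 // 4
console.log(estimateRequestTokens("Summarize this text", 500)); // 504
```

After the call, the usage figures returned by the API should replace the estimate.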
Cost Optimization
Token usage drives cost, so optimize to reduce spending.
Strategy 1: Prompt Caching (Anthropic)
Anthropic supports prompt caching, which reduces the cost of repeated system prompts.
const response = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
system: [
{
type: "text",
text: "Very long system prompt...",
cache_control: { type: "ephemeral" } // Cache this
}
],
messages: [
{ role: "user", content: "Variable user query" }
]
});
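Savings: Tokens read from the cache cost about 90% less than regular input tokens, while the request that first writes the cache costs somewhat more than usual. The savings are large for long system prompts that are reused within the cache lifetime; very short prompts fall below the minimum cacheable length and are not cached.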
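A back-of-the-envelope calculation shows how the discount adds up over many requests. The multipliers below are assumptions to verify against current Anthropic pricing (cache writes at 1.25x and cache reads at 0.1x the base input price), and the sketch assumes every request arrives while the cache entry is still alive. `systemPromptCost` is a hypothetical helper.

```javascript
// Assumed multipliers relative to the base input price (verify against current pricing).
const CACHE_WRITE_MULTIPLIER = 1.25; // the first request writes the cache
const CACHE_READ_MULTIPLIER = 0.1;   // later requests read it ("90% less")

// Cost of the system prompt over `requests` calls, in units of base-price tokens.
function systemPromptCost(promptTokens, requests, useCache) {
  if (requests <= 0) return 0;
  if (!useCache) return promptTokens * requests;
  return promptTokens * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (requests - 1));
}

const uncached = systemPromptCost(2000, 100, false); // 200000
const cached = systemPromptCost(2000, 100, true);    // about 22300
const savedPercent = Math.round((1 - cached / uncached) * 100);
console.log(savedPercent); // 89
```

For a single request the cached variant is slightly more expensive, so caching only pays off for prompts that are actually reused.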
Strategy 2: Model Tiering
Route requests to cheaper models when possible.
async function smartAICall(prompt, complexity) {
const model = complexity === 'high' ? 'gpt-4' : 'gpt-3.5-turbo';
return openai.chat.completions.create({
model,
messages: [{ role: "user", content: prompt }],
max_tokens: 500
});
}
// Usage
await smartAICall("Explain quantum physics", 'high'); // GPT-4
await smartAICall("Summarize this paragraph", 'low'); // GPT-3.5
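The `complexity` argument has to come from somewhere. One possible heuristic, shown purely as a sketch, is to classify by prompt length and a few reasoning-style keywords; `classifyComplexity`, the keyword list, and the 200-word threshold are all made-up assumptions that would need tuning against real traffic.

```javascript
// Hypothetical heuristic: reasoning-style verbs suggest a harder task.
const HIGH_COMPLEXITY_HINTS = /\b(explain|analyze|prove|debug|compare|design)\b/i;

// Returns 'high' for long prompts or prompts containing a hint keyword, else 'low'.
function classifyComplexity(prompt, longPromptWords = 200) {
  const wordCount = prompt.trim().split(/\s+/).filter(Boolean).length;
  const looksHard = wordCount > longPromptWords || HIGH_COMPLEXITY_HINTS.test(prompt);
  return looksHard ? "high" : "low";
}

console.log(classifyComplexity("Explain quantum physics"));  // high
console.log(classifyComplexity("Summarize this paragraph")); // low
```

A misclassified request only costs quality or money rather than correctness, so a cheap heuristic like this can be a reasonable starting point before measuring and refining.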
Strategy 3: Output Length Control
Shorter output = lower cost.
Before (35 tokens):
Summarize this article: [5000 word article]
After (48 tokens):
Summarize this article in 3 bullet points, max 50 words total: [5000 word article]
Savings: about 78% fewer tokens with an explicit length constraint. The prompt grows by a few tokens, but the much shorter output more than offsets it.
Streaming Responses
For better UX, stream the response instead of waiting for it to complete.
OpenAI streaming:
const stream = await openai.chat.completions.create({
model: "gpt-4",
messages: [{ role: "user", content: prompt }],
stream: true
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
process.stdout.write(content); // Stream to user
}
Anthropic streaming:
const stream = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [{ role: "user", content: prompt }],
stream: true
});
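for await (const event of stream) {
  // Only text deltas carry a `text` field; other delta types (e.g. tool input JSON) do not
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}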
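Applications often need the full text as well as the live stream, for example to log or validate it afterwards. A minimal sketch that forwards each piece to a callback while accumulating the result; `collectStream` and `fakeStream` are hypothetical names, and the chunk shape follows the OpenAI streaming loop above.

```javascript
// Collect text from any async iterable of OpenAI-style chunks,
// forwarding each non-empty piece to `onText` as it arrives.
async function collectStream(stream, onText = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const piece = chunk.choices?.[0]?.delta?.content ?? "";
    if (piece === "") continue; // e.g. role-only or final chunks
    onText(piece);
    fullText += piece;
  }
  return fullText;
}

// Stand-in stream for local testing; a real one comes from the SDK with stream: true.
async function* fakeStream(pieces) {
  for (const content of pieces) {
    yield { choices: [{ delta: { content } }] };
  }
}

// Usage: const text = await collectStream(stream, (piece) => process.stdout.write(piece));
```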
Production Checklist
Before deploying AI integration:
Security:
- API keys stored in environment variables
- API keys never logged or exposed to client
- Rate limiting implemented
- Input validation (prevent prompt injection)
Reliability:
- Error handling with retries
- Timeout configured (30-60s)
- Fallback behavior for API failures
- Health check endpoint
Cost Control:
- Max tokens limit set
- Model selection optimized
- Token usage monitored
- Budget alerts configured
Quality:
- Temperature tuned for use case
- System prompt tested with diverse inputs
- Output validation implemented
- A/B testing for prompt changes
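Two of the reliability items, a timeout and a fallback for API failures, can be combined in one wrapper. This is a sketch under stated assumptions: `withTimeout` and `callWithFallback` are hypothetical helpers, and the timeout only stops waiting for the result; it does not cancel the underlying HTTP request.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `primary`; on any failure or timeout, return `fallbackValue` instead of throwing.
async function callWithFallback(primary, fallbackValue, timeoutMs = 30000) {
  try {
    return { ok: true, value: await withTimeout(primary(), timeoutMs) };
  } catch (error) {
    return { ok: false, value: fallbackValue, error };
  }
}

// Usage:
// const reply = await callWithFallback(() => callAI(question), "Sorry, please try again later.");
```

Returning an `ok` flag lets the caller decide whether to show the fallback text, queue a retry, or raise an alert.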
Monitoring & Logging
Track API usage to optimize cost and quality.
Metrics to log:
{
timestamp: Date.now(),
model: "gpt-4",
promptTokens: 150,
completionTokens: 320,
totalTokens: 470,
cost: 0.0141, // Calculate based on pricing
latency: 2340, // ms
success: true,
errorType: null
}
Dashboard metrics:
- Total cost per day/week/month
- Average tokens per request
- P95 latency
- Error rate by type
- Most expensive prompts (for optimization)
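A log record like the one above can be built from the `usage` object that the OpenAI API returns with each response. In this sketch, `PRICING` and `buildUsageRecord` are hypothetical, and the flat price of 0.03 per 1,000 tokens is a placeholder chosen only so that the result matches the sample record; replace it with your provider's current prices.

```javascript
// Placeholder prices in USD per 1,000 tokens; not real pricing.
const PRICING = {
  "gpt-4": { prompt: 0.03, completion: 0.03 },
};

// Hypothetical helper: turn one API call's usage data into a log record.
function buildUsageRecord({ model, usage, startedAt, finishedAt, error = null }) {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing configured for model: ${model}`);
  const promptTokens = usage?.prompt_tokens ?? 0;
  const completionTokens = usage?.completion_tokens ?? 0;
  const cost =
    (promptTokens / 1000) * price.prompt + (completionTokens / 1000) * price.completion;
  return {
    timestamp: finishedAt,
    model,
    promptTokens,
    completionTokens,
    totalTokens: promptTokens + completionTokens,
    cost: Number(cost.toFixed(4)), // rounded to 4 decimal places
    latency: finishedAt - startedAt, // ms
    success: error === null,
    errorType: error ? error.name : null,
  };
}

const record = buildUsageRecord({
  model: "gpt-4",
  usage: { prompt_tokens: 150, completion_tokens: 320 },
  startedAt: 1000,
  finishedAt: 3340,
});
console.log(record.cost, record.latency); // 0.0141 2340
```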
Summary
API integration essentials:
Setup:
- Store API keys in environment variables
- Choose provider based on use case (OpenAI vs Anthropic)
- Select model based on quality/cost tradeoff
Key parameters:
- Temperature: 0.0-0.3 (factual), 0.4-0.7 (balanced), 0.8+ (creative)
- Max tokens: set a limit to control cost
- System prompt: reusable instructions
- User prompt: specific request
Production requirements:
- Error handling with retries
- Rate limiting strategy
- Cost optimization (caching, model tiering, length control)
- Streaming for better UX
- Monitoring and logging
Cost optimization:
- Use prompt caching (Anthropic)
- Route to cheaper models when possible
- Control output length explicitly
- Monitor usage and optimize expensive prompts
A good integration is reliable, cost-effective, and maintainable. Test thoroughly before deploying to production.