LLM Integration¶

Kairos uses LiteLLM as a unified LLM client, supporting Claude, GPT, Gemini, and 100+ other models through a single interface. Three core functions: skill extraction, JD analysis, and resume tailoring.

Architecture¶

graph TB
    subgraph "kairos.core (interfaces)"
        T0[SkillExtractorService]
        T1[JdAnalysisService]
        T2[ResumeTailorService]
    end

    subgraph "kairos.llm (implementations)"
        SE[SkillExtractor]
        A[Analyzer]
        T[Tailor]
        P0[Skill Extraction Prompts]
        P1[JD Analysis Prompts]
        P2[Resume Tailoring Prompts]
        SDK[LiteLLM acompletion]
    end

    subgraph "LLM Providers"
        Claude[Claude API]
        GPT[OpenAI API]
        Gemini[Gemini API]
        Other[...]
    end

    SE --> T0
    A --> T1
    T --> T2
    SE --> P0
    A --> P1
    T --> P2
    SE --> SDK
    A --> SDK
    T --> SDK
    SDK --> Claude
    SDK --> GPT
    SDK --> Gemini
    SDK --> Other

Multi-Provider Support¶

LiteLLM provides a unified interface — switch models by changing a string:

from litellm import acompletion

# Claude
response = await acompletion(model="claude-sonnet-4-20250514", messages=[...])

# GPT
response = await acompletion(model="gpt-4o", messages=[...])

# Gemini
response = await acompletion(model="gemini/gemini-2.0-flash", messages=[...])

Fallback Chains¶

Configure fallback models in case primary provider is unavailable:

[llm]
model = "claude-sonnet-4-20250514"
fallback_models = ["gpt-4o", "gemini/gemini-2.0-flash"]

Provider API Keys¶

Each provider needs its own API key, configured via environment variables:

ANTHROPIC_API_KEY=sk-ant-...      # Claude
OPENAI_API_KEY=sk-...             # GPT
GEMINI_API_KEY=...                # Gemini

Only the key for the configured model (and fallbacks) is required.

Skill Extraction¶

Runs once during resume import to build the user's Skill Profile.

Prompt Strategy¶

Temperature: 0.0 (deterministic extraction)

The prompt instructs the LLM to: 1. Analyze each work experience and project section from the resume 2. Extract every technical skill mentioned (languages, frameworks, databases, tools, concepts) 3. Estimate proficiency (1-10) based on context clues: duration, depth of usage, role seniority 4. Estimate years of experience per skill 5. Cite evidence — which project/role demonstrates each skill

Output is parsed as a list[SkillEntry] via tool use with Pydantic schema. Users can review and adjust proficiency ratings after generation.

JD Analysis¶

Prompt Strategy¶

Temperature: 0.0 (deterministic extraction)

The prompt instructs the LLM to: 1. Extract structured data from the JD text 2. Identify required vs preferred skills with estimated proficiency level needed 3. Extract ATS keywords 4. Compare against the user's Skill Profile (proficiency-aware, not just binary) 5. Compute a match score (0.0 - 1.0) with per-skill breakdown and reasoning

Output is parsed as a JdAnalysis Pydantic model via structured output.

Resume Tailoring¶

Anti-Hallucination Rules (enforced in system prompt)¶

Only rephrase, reorder, or emphasize existing content
Never invent skills, experiences, achievements, or metrics
Never change company names, dates, job titles, education details
Only modify sections marked as tailorable
Preserve the original format (LaTeX or Markdown)

Temperature: 0.3 (controlled creativity)

Validation¶

After LLM response, a validation step checks: - No new company names or job titles appeared - No new degree or institution names - Dates haven't changed - Section types match what was sent

If validation fails, retry with stricter prompt or flag to user.

Output Flow¶

flowchart TD
    Input["Base Resume Sections\n+ JD Analysis"] --> LLM[LLM via LiteLLM]
    LLM --> Raw[Raw LLM Response]
    Raw --> Parse[Parse modified sections]
    Parse --> Validate[Validate: no fabricated content]
    Validate --> |Pass| Diff[Generate diff]
    Validate --> |Fail| Retry[Retry with stricter prompt]
    Diff --> User[Show to user for approval]

Structured Output Pattern¶

All LLM calls use tool use for structured output (works across providers):

from litellm import acompletion

# LiteLLM 统一接口，tool use 跨 provider 通用
response = await acompletion(
    model=settings.llm_model,
    messages=[...],
    tools=[{
        "type": "function",
        "function": {
            "name": "output",
            "description": "Structured analysis output",
            "parameters": JdAnalysis.model_json_schema(),
        },
    }],
    tool_choice={"type": "function", "function": {"name": "output"}},
)

Cost¶

Operation	Tokens (approx)	Cost at Sonnet	Frequency
Skill Extraction	1,000-2,000	~$0.01-0.02	Once per resume import
JD Analysis	500-1,500	~$0.003-0.01	Per job
Resume Tailoring	1,000-3,000	~$0.01-0.03	Per job

LiteLLM provides built-in cost tracking via response._hidden_params["response_cost"].

Model Configuration¶

Default: claude-sonnet-4-20250514

[llm]
model = "claude-sonnet-4-20250514"
fallback_models = ["gpt-4o"]
temperature_extraction = 0.0
temperature_tailoring = 0.3

Configurable via environment variable KAIROS_LLM_MODEL or config file. Any LiteLLM-supported model string works.