0. Series Closed Loop (Read Along Even Without Public Source Code)

End-to-End Pipeline: Vue Frontend → api/routes/chat.py → Guide Multi-turn SSE → run_analysis_pipeline (Parse→Analyze→Match→Report) → tools/pdf_exporter PDF.
This Article: 3/17 · Analysis Loop · Holland RIASEC

| Phase | Visible to User | Code Entry | Corresponding Article |
| --- | --- | --- | --- |
| Create Session | Welcome Message | POST /api/sessions | 09 |
| Multi-turn Dialogue | SSE Streaming | chat/stream → run_guide_single_turn | 06, 14 |
| Information Sufficient | Start Analysis | _run_analysis_background | 05, 07 |
| Resume Parsing | Progress 30% | run_resume_parser | 12 |
| Profile/RIASEC | Progress 50% | run_profile_analyzer | 03, 13 |
| Career Matching | Progress 70% | run_career_matcher | 02 |
| Report | Progress 90% | run_reporter | 11 |
| Download | PDF File | GET …/report/pdf | 11, 15 |

| | Description |
| --- | --- |
| Before reading | Article 02: profile_analyzer_node |
| After reading | Understand how analyze_riasec produces six-dimension scores |
| Next up | Article 13: RIASEC Dedicated Prompt (Article 4) |

Full Series Closed Loop Index: SERIES-LOOP.md

1. Introduction to the Holland RIASEC Model

The Holland Theory of Vocational Personalities was proposed by American psychologist John Holland in 1959. It divides people’s vocational interests into six dimensions:

| Dimension | Type | Typical Careers |
| --- | --- | --- |
| R | Realistic | Engineer, Technician, Architect |
| I | Investigative | Scientist, Data Analyst, Doctor |
| A | Artistic | Designer, Writer, Musician |
| S | Social | Teacher, Counselor, Social Worker |
| E | Enterprising | Entrepreneur, Sales Manager, Lawyer |
| C | Conventional | Accountant, Administrator, Auditor |

Holland Code: Take the 2-3 highest-scoring dimensions, ordered from highest to lowest, to form a code (e.g., IAS, RCE). Each code corresponds to a set of matching careers.
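
As a minimal sketch of that rule, the function below picks the top letters from a score dictionary. The function name and the tie-breaking rule (ties keep the canonical R-I-A-S-E-C order) are assumptions made for illustration, not something taken from the project code.

```python
RIASEC_ORDER = "RIASEC"


def holland_code(scores: dict, length: int = 3) -> str:
    """Return the `length` highest-scoring dimension letters, highest first."""
    if not 2 <= length <= 3:
        raise ValueError("a Holland code uses 2 or 3 letters")
    # sorted() is stable, so tied dimensions keep the R-I-A-S-E-C order (an assumption).
    ranked = sorted(RIASEC_ORDER, key=lambda k: -float(scores.get(k, 0)))
    return "".join(ranked[:length])


scores = {"R": 6, "I": 9, "A": 3, "S": 5, "E": 7, "C": 4}
print(holland_code(scores))     # IER
print(holland_code(scores, 2))  # IE
```

Missing dimensions are treated as 0, so a partial score dictionary still yields a code.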

Why this model was chosen:

  • Widely recognized in psychology with over 60 years of empirical research support
  • Serves as the classification foundation for occupational interests in O*NET (Occupational Information Network)
  • Clear six-dimensional structure, suitable for LLM quantitative evaluation
  • No need for users to answer 120 questions; can be inferred through dialogue

[Figure: Holland RIASEC Scoring Flow]

2. LLM Integration: OpenAI Compatible Interface (DeepSeek as Common Deployment Configuration)

The implementation uses no DeepSeek-specific SDK. Every provider is reached through langchain_openai.ChatOpenAI (llm/providers.py), and LLM_BASE_URL together with LLM_MODEL_CHAT / LLM_MODEL_LIGHT selects the provider. DeepSeek, official OpenAI, and local Ollama all go through the same invoke_llm / invoke_llm_with_json helpers.

Comparison with Other Deployment Options

| Option | JSON Mode | Domestic Access | Note |
| --- | --- | --- | --- |
| DeepSeek API (e.g., deepseek-v4-flash) | ✅ | Direct | Common .env configuration in the project |
| GPT-4o etc. (Official OpenAI) | | Depends on network | Code default in config.py when no env value is set |
| Ollama + Qwen3 | | Local | Docker can point to port 11434 by default; requires /no_think |

.env example (deployment configuration, not hardcoded in framework):

LLM_MODEL_CHAT=deepseek-v4-flash
LLM_BASE_URL=https://api.deepseek.com/v1

Dual Model Strategy (Based on Current Code)

llm/providers.py provides get_chat_model() and get_light_model(); defaults come from config.py (when no env is configured, it’s gpt-4o):

| Caller | Current Implementation |
| --- | --- |
| Guide / ProfileAnalyzer / CareerMatcher | get_chat_model() |
| ResumeParser (agents/resume_parser.py) | get_light_model() |
| Reporter (agents/reporter.py) | get_chat_model() (if docs say light, follow the code) |

Switch LLM_MODEL_CHAT / LLM_MODEL_LIGHT via .env without changing business code. Writing deepseek-v4-flash in .env is just a deployment example, not the code default.
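
The switching logic can be sketched as below. This is a hedged illustration, not the contents of llm/providers.py or config.py: resolve_llm_settings and ASSUMED_DEFAULT_MODEL are hypothetical names, and using gpt-4o as the fallback for the light model too is an assumption. Only the environment variable names come from the article.

```python
from typing import Mapping, Optional

# Assumption: both the chat and the light model fall back to gpt-4o when unset.
ASSUMED_DEFAULT_MODEL = "gpt-4o"


def resolve_llm_settings(env: Mapping[str, str], light: bool = False) -> dict:
    """Pick the model name and base URL from environment-style settings."""
    key = "LLM_MODEL_LIGHT" if light else "LLM_MODEL_CHAT"
    # An unset or empty base URL becomes None, meaning "use the client's default endpoint".
    base_url: Optional[str] = env.get("LLM_BASE_URL") or None
    return {"model": env.get(key) or ASSUMED_DEFAULT_MODEL, "base_url": base_url}


deepseek_env = {
    "LLM_MODEL_CHAT": "deepseek-v4-flash",
    "LLM_BASE_URL": "https://api.deepseek.com/v1",
}
print(resolve_llm_settings(deepseek_env))
# {'model': 'deepseek-v4-flash', 'base_url': 'https://api.deepseek.com/v1'}
print(resolve_llm_settings({}))
# {'model': 'gpt-4o', 'base_url': None}
```

In a real deployment the mapping would be os.environ, and the resulting values would be handed to ChatOpenAI(model=..., base_url=...), so business code never names a provider.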

3. RIASEC’s Location in agents/profile_analyzer.py

Holland scoring is not an independent service; it’s a node in the ProfileAnalyzer subgraph. create_profile_analyzer_graph() defines a sequential chain:

load_profile → analyze_abilities → infer_work_style → infer_personality
→ analyze_values → analyze_riasec → identify_strengths_weaknesses → synthesize_profile → END
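
To make the sequential chain concrete, here is a framework-free sketch in plain Python. It does not use the project's graph library or reproduce create_profile_analyzer_graph(); run_chain is a hypothetical helper, and the three node bodies are stand-ins that only borrow the node names from the chain above. The point it illustrates is that each node returns a partial update that is merged into a shared state before the next node runs.

```python
from typing import Callable

State = dict
Node = Callable[[State], dict]


def load_profile(state: State) -> dict:
    # Stand-in: pass the parsed resume through unchanged.
    return {"structured_profile": state.get("structured_profile", {})}


def analyze_riasec(state: State) -> dict:
    # Stand-in for the LLM call: fixed scores so the sketch runs offline.
    return {"riasec_scores": {k: 5.0 for k in "RIASEC"}}


def synthesize_profile(state: State) -> dict:
    # Collect what earlier nodes produced into the final profile.
    return {"personal_profile": {"riasec_scores": state["riasec_scores"]}}


def run_chain(nodes: list, state: State) -> State:
    """Run nodes in order; each partial update is merged into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_chain(
    [load_profile, analyze_riasec, synthesize_profile],
    {"structured_profile": {"skills": ["Java"]}},
)
print(final["personal_profile"]["riasec_scores"]["I"])  # 5.0
```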

analyze_riasec reads the structured profile from upstream agents/resume_parser.py via ProfileAnalysisState.structured_profile, truncates the JSON to about 2000 characters, then calls the LLM:

# agents/profile_analyzer.py: analyze_riasec core (excerpt from inside the async node)
import json

# Serialize the parsed resume and cap the prompt payload at 2000 characters.
profile_summary = json.dumps(structured_profile, ensure_ascii=False)[:2000]
messages = [
    {"role": "system", "content": PROFILE_ANALYZER_SYSTEM_PROMPT},  # llm/prompts.py
    {"role": "user", "content": f"Please analyze the user's Holland RIASEC based on the following structured resume info...\n{profile_summary}"},
]
# Ask the chat model for JSON, then coerce the six scores to floats (missing keys become 0.0).
riasec_data = await invoke_llm_with_json(get_chat_model(), messages)
riasec_scores = {k: float(riasec_data.get(k, 0)) for k in "RIASEC"}

The profile_analyzer_node in the top-level workflow.py writes riasec_scores into personal_profile, where career_matcher_node and the chart generation in tools/pdf_exporter.py consume it.

4. How the LLM Performs Quantitative Evaluation

Traditional Questionnaire vs LLM Evaluation

Traditional Holland assessments require users to answer 120+ questions, taking 20-30 minutes. In contrast, the LLM evaluation approach is:

  1. Users describe their experiences, preferences, and confusions in dialogue
  2. The LLM infers the tendency of the six dimensions from this natural language
  3. The LLM assigns each dimension a score from 0 to 10, along with its reasoning

Prompt Design

The rules are concentrated in the sixth section of PROFILE_ANALYZER_SYSTEM_PROMPT in llm/prompts.py; the analyze_riasec node further constrains the JSON shape in the user message:

# llm/prompts.py — excerpt (Section 6)
### 6. Holland RIASEC Analysis
Based on John Holland's theory of vocational interests, analyze the user's tendency across six dimensions:
- R (Realistic): enjoys operating, practicing, hands-on work
- I (Investigative): enjoys analyzing, exploring, researching
# ... A/S/E/C definitions ...
Give each dimension a score from 0 to 10, and mark the highest 2-3 dimensions as the user's "Holland code".

Output format:

{
  "riasec": {
    "R": 6,
    "I": 9,
    "A": 3,
    "S": 5,
    "E": 7,
    "C": 4,
    "holland_code": "IER",
    "analysis": "The user has a strong investigative tendency, showing a pursuit of technical depth in 5 years of Java development..."
  }
}
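
Since the scores may arrive nested under a "riasec" key as in this format, or flat at the top level, a tolerant extraction step can be useful before the scores are stored. The sketch below is an assumption-laden illustration: extract_riasec is a hypothetical helper, not a function from agents/profile_analyzer.py, and clamping to the 0-10 range is a design choice made here.

```python
def extract_riasec(payload: dict) -> dict:
    """Accept {"riasec": {...}} or a flat {"R": ..., ...} payload."""
    inner = payload.get("riasec")
    data = inner if isinstance(inner, dict) else payload
    scores = {}
    for key in "RIASEC":
        try:
            value = float(data.get(key, 0))
        except (TypeError, ValueError):
            value = 0.0  # non-numeric or null score
        # Clamp to the 0-10 range the prompt asks for.
        scores[key] = min(10.0, max(0.0, value))
    return {
        "scores": scores,
        "holland_code": str(data.get("holland_code", "")),
        "analysis": str(data.get("analysis", "")),
    }


nested = {"riasec": {"R": 6, "I": 9, "A": 3, "S": 5, "E": 7, "C": 4, "holland_code": "IER"}}
print(extract_riasec(nested)["scores"]["I"])                 # 9.0
print(extract_riasec({"R": "11", "I": None})["scores"]["R"])  # 10.0
```

Keeping holland_code and analysis alongside the scores also means they remain available for the report later.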

Scoring Basis Design

The key is that the Prompt requires the LLM to explain the reason for each score:

Scores must have a basis; do not assign scores arbitrarily. Explain the reason for each score in the analysis.
The analysis should be objective and fair, neither exaggerating nor belittling.
If information is insufficient to make an accurate judgment, state this in the analysis and provide a possible range.

This is more valuable than simply outputting scores—users can see “why I got a 9 for I”.

5. Connecting RIASEC Scores to Career Matching

How Holland Code Enters CareerMatcher

The project does not hardcode a HOLLAND_CAREER_MAP dictionary. Instead, generate_candidate_paths in agents/career_matcher.py serializes the full personal_profile (including riasec_scores) and sends it to the LLM together with CAREER_MATCHER_SYSTEM_PROMPT from llm/prompts.py. The prompt asks the LLM to draw on the Holland code when explaining the three-level paths:

### Level 1: Vertical Deepening (Deepen)
Continue developing further in the user's current industry/position. Matching degree 80-95%.

### Level 2: Horizontal Expansion (Expand)
Expand in related fields or adjacent industries. Matching degree 60-80%.

### Level 3: Transformation Exploration (Transform)
Explore cross-industry or cross-field transformation. Matching degree 40-60%.

The Prompt requires the LLM to reference holland_code (e.g., IEA) in the recommendation text, explaining which directions align with high-score dimensions and which are stretch directions—this is more flexible than maintaining a static mapping table, but also more dependent on Prompt constraints and JSON parsing stability (see Article 12).
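
Because the matching-degree bands live only in the prompt, a small post-check can flag paths where the LLM drifted outside them. This is a sketch under stated assumptions: the field names level, title, and match, the lowercase level labels, and the out_of_band helper are all hypothetical, and the sample candidates are made up for illustration. Only the three percentage bands come from the prompt excerpt above.

```python
# Matching-degree bands from the prompt: level -> (low %, high %).
LEVEL_BANDS = {
    "deepen": (80, 95),
    "expand": (60, 80),
    "transform": (40, 60),
}


def out_of_band(paths: list) -> list:
    """Return the candidate paths whose match percentage is outside its level's band."""
    flagged = []
    for path in paths:
        low, high = LEVEL_BANDS[path["level"]]
        if not low <= path["match"] <= high:
            flagged.append(path)
    return flagged


candidates = [
    {"level": "deepen", "title": "Senior Java Engineer", "match": 90},
    {"level": "transform", "title": "Product Manager", "match": 75},
]
print([p["title"] for p in out_of_band(candidates)])  # ['Product Manager']
```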

6. Visualization: tools/pdf_exporter.py Bar Chart

When it embeds the chart in the PDF report, tools/pdf_exporter.py reads the six keys R, I, A, S, E, C from personal_profile.riasec_scores. The keys must match the output of analyze_riasec exactly (single uppercase letters).

# tools/pdf_exporter.py: _generate_bar_chart excerpt
import base64
import io
from typing import Optional

import matplotlib.pyplot as plt


def _generate_bar_chart(holland_data: Optional[dict] = None) -> str:
    """Render the RIASEC scores as a bar chart and return a base64-encoded PNG."""
    holland_data = holland_data or {}  # the default is None, which has no .keys()
    plt.rcParams["axes.unicode_minus"] = False

    categories = list(holland_data.keys())
    values = list(holland_data.values())

    colors = ["#0d9488", "#14b8a6", "#2dd4bf", "#5eead4", "#99f6e4", "#ccfbf1"]

    fig, ax = plt.subplots(figsize=(5.5, 4.5))
    bars = ax.bar(categories, values, color=colors[:len(categories)],
                  edgecolor="white", linewidth=1.8, width=0.62)

    # Print each score just above its bar.
    for bar, val in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2., bar.get_height() + 0.25,
                f"{val}", ha="center", va="bottom", fontsize=12, fontweight="bold")

    ax.set_title("Holland Interest Distribution", fontsize=15, fontweight="bold")
    plt.tight_layout()

    # Write the figure to an in-memory PNG and base64-encode it for embedding.
    buf = io.BytesIO()
    plt.savefig(buf, format="png", dpi=160, bbox_inches="tight")
    plt.close()
    buf.seek(0)
    return base64.b64encode(buf.read()).decode()
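
Because the chart takes its categories straight from the dictionary keys, a small normalization step before plotting can enforce the single-uppercase-letter requirement and a fixed bar order. The helper below is a hypothetical sketch, not part of tools/pdf_exporter.py. It relies on Python dictionaries preserving insertion order.

```python
def ordered_riasec(scores: dict) -> dict:
    """Return scores keyed by single uppercase letters in R-I-A-S-E-C order."""
    normalized = {str(k).strip().upper(): v for k, v in scores.items()}
    missing = [k for k in "RIASEC" if k not in normalized]
    if missing:
        raise KeyError(f"missing RIASEC keys: {missing}")
    # Rebuilding the dict in canonical order fixes the left-to-right bar order.
    return {k: float(normalized[k]) for k in "RIASEC"}


print(list(ordered_riasec({"i": 9, "r": 6, "c": 4, "e": 7, "s": 5, "a": 3})))
# ['R', 'I', 'A', 'S', 'E', 'C']
```

Raising on missing keys is a deliberate choice in this sketch: a chart with five bars would hide the problem instead of surfacing it.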

Chinese Font Handling

In Docker deployments, matplotlib cannot render Chinese by default because slim base images ship no CJK fonts, so Chinese labels come out as empty boxes. Solution:

  1. Install system font package: apt-get install fonts-noto-cjk
  2. Specify font priority in the code:
plt.rcParams["font.sans-serif"] = [
    "Noto Sans CJK SC",   # Linux Docker environment
    "WenQuanYi Zen Hei",  # Fallback Linux font
    "PingFang SC",        # macOS
    "SimHei",             # Windows
]

7. Evaluation of Assessment Effectiveness

Comparison with Traditional Questionnaires

| Dimension | Traditional Questionnaire | LLM Assessment |
| --- | --- | --- |
| User Time | 20-30 minutes | 5-minute conversation |
| Number of Questions | 120+ | Natural conversation |
| Scoring Accuracy | Standardized scale | Semantic inference based |
| Adaptability | Fixed questions | Dynamic follow-up |
| User Experience | Dull | Like chatting with a friend |

Limitations

  • Accuracy depends on input quality: The more detailed the user’s description, the more accurate the assessment
  • No standardized scale: The approach has not been through the extensive validity studies behind the SDS (Self-Directed Search) scale
  • Cultural differences: The Holland model was developed in the U.S. workplace context; applying it in the Chinese context requires weight adjustments

8. Pitfall Records

  1. Don’t hardcode DeepSeek as default: config.py defaults to gpt-4o; DeepSeek is a deployment choice via LLM_BASE_URL=https://api.deepseek.com/v1 in .env.
  2. holland_code is not written into riasec_scores: analyze_riasec stores only the six floats (R, I, A, S, E, C) in state. holland_code appears in the LLM JSON but is not persisted to riasec_scores, so if the PDF needs to display the code, it must be taken from the raw JSON or derived in post-processing.
  3. A JSON parsing failure returns all zeros: When analyze_riasec hits an exception, it returns 0.0 for all six dimensions, and the downstream Matcher still runs. Whether the result can be trusted has to be judged from the Guide dialogue quality and the Parser's confidence scores (confidence_scores).
  4. Ollama Qwen3 requires /no_think: _inject_no_think in llm/providers.py automatically injects this when detecting Ollama + qwen3; otherwise, the RIASEC JSON may be polluted by thinking blocks.
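
Two of these pitfalls lend themselves to small defensive helpers, sketched below. Both names are hypothetical and neither reproduces _inject_no_think or any other project function. strip_think_blocks removes <think>...</think> spans from raw model text before JSON parsing, and is_fallback_scores detects the all-zero result that signals a failed analysis.

```python
import json
import re

# Matches a complete <think>...</think> span, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think_blocks(text: str) -> str:
    """Remove thinking blocks so that only the JSON payload remains."""
    return THINK_BLOCK.sub("", text).strip()


def is_fallback_scores(scores: dict) -> bool:
    """True when all six dimensions are 0.0, the value returned after a failure."""
    return all(float(scores.get(k, 0)) == 0.0 for k in "RIASEC")


raw = '<think>\nThe user likes research.\n</think>\n{"R": 6, "I": 9}'
print(json.loads(strip_think_blocks(raw)))                # {'R': 6, 'I': 9}
print(is_fallback_scores({k: 0.0 for k in "RIASEC"}))     # True
```

A caller could use is_fallback_scores to skip career matching or to mark the report as low confidence instead of presenting six zero bars as a genuine result.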

9. Summary

The core idea of implementing the Holland assessment with an LLM is to encode the psychological scale in llm/prompts.py and have analyze_riasec in agents/profile_analyzer.py infer the dimension scores from the structured profile.

Key design points:

  • OpenAI compatible interface + environment variable to switch models (DeepSeek / GPT / Ollama all usable)
  • Prompt requires scores + reasoning, ensuring interpretability
  • A matplotlib bar chart from tools/pdf_exporter.py is embedded in the PDF
  • Install fonts-noto-cjk in Docker environment to solve Chinese display issues

Next Article: Layered usage of TypedDict + Annotated Reducer in core/state.py.

