Phase 4.4: Context-aware entity filtering in Step 1

- OntologyGenerator.generate() now accepts an optional template_filter_rules parameter
- When template_id is provided, the API loads the filter rules from templates.json
- Filter rules are injected into the ontology system prompt:
  - exclude_self: don't create an entity for the business/brand that uploaded the data
  - exclude_types: don't create the listed entity types
  - focus: guide the LLM to focus on specific entity categories
- The API endpoint accepts template_id in form data
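
The flow described above can be sketched in isolation as follows. This is a minimal illustration, not the code in this commit: the TEMPLATES dict stands in for templates.json (its real schema is not shown here, so the "filter_rules" key and the "brand_campaign" template are assumptions), and load_filter_rules and build_filter_instruction are hypothetical helper names. Only the three rule keys and the prompt wording follow the diff below.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for templates.json; the real schema is not shown
# in this commit, so the "filter_rules" key is an assumption.
TEMPLATES: Dict[str, Dict[str, Any]] = {
    "brand_campaign": {
        "filter_rules": {
            "exclude_self": True,
            "exclude_types": ["Brand", "Advertiser"],
            "focus": "target audience segments and competitors",
        }
    }
}


def load_filter_rules(template_id: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return the filter rules for a template, or None when there are none."""
    if not template_id:
        return None
    return TEMPLATES.get(template_id, {}).get("filter_rules")


def build_filter_instruction(rules: Optional[Dict[str, Any]]) -> str:
    """Build the section that gets appended to the ontology system prompt."""
    if not rules:
        return ""
    parts = ["\n\n## Context-Aware Entity Filtering\n"]
    if rules.get("exclude_self", False):
        # Wording shortened here; the commit uses a longer instruction.
        parts.append(
            "- Do NOT create entity types for the business/brand "
            "that created this content.\n"
        )
    exclude_types = rules.get("exclude_types", [])
    if exclude_types:
        parts.append(
            f"- Do NOT create entity types matching: {', '.join(exclude_types)}\n"
        )
    focus = rules.get("focus", "")
    if focus:
        parts.append(f"- Focus entity types on: {focus}\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_filter_instruction(load_filter_rules("brand_campaign")))
```

Keeping the prompt construction in a pure function like this would make the filtering text easy to unit-test without calling the LLM; an unknown or missing template_id simply yields an empty string, leaving the system prompt unchanged.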
Kunthawat Greethong
2026-06-26 11:46:37 +07:00
parent 166ef73ad2
commit c9f76babeb
2 changed files with 53 additions and 8 deletions

@@ -186,20 +186,22 @@ class OntologyGenerator:
         self,
         document_texts: List[str],
         simulation_requirement: str,
-        additional_context: Optional[str] = None
+        additional_context: Optional[str] = None,
+        template_filter_rules: Optional[Dict[str, Any]] = None
     ) -> Dict[str, Any]:
         """
-        生成本体定义
+        Generate ontology definition
         Args:
-            document_texts: 文档文本列表
-            simulation_requirement: 模拟需求描述
-            additional_context: 额外上下文
+            document_texts: Document text list
+            simulation_requirement: Simulation requirement description
+            additional_context: Additional context
+            template_filter_rules: Entity filter rules from template (e.g., exclude_types, exclude_self)
         Returns:
-            本体定义(entity_types, edge_types等)
+            Ontology definition (entity_types, edge_types, etc.)
         """
-        # 构建用户消息
+        # Build user message
         user_message = self._build_user_message(
             document_texts,
             simulation_requirement,
@@ -208,6 +210,30 @@ class OntologyGenerator:
         lang_instruction = get_language_instruction()
         system_prompt = f"{ONTOLOGY_SYSTEM_PROMPT}\n\n{lang_instruction}\nIMPORTANT: Entity type names MUST be in English PascalCase (e.g., 'PersonEntity', 'MediaOrganization'). Relationship type names MUST be in English UPPER_SNAKE_CASE (e.g., 'WORKS_FOR'). Attribute names MUST be in English snake_case. Only description fields and analysis_summary should use the specified language above."
+        # Add template-aware entity filtering rules
+        if template_filter_rules:
+            exclude_types = template_filter_rules.get('exclude_types', [])
+            exclude_self = template_filter_rules.get('exclude_self', False)
+            focus = template_filter_rules.get('focus', '')
+            filter_instruction = "\n\n## Context-Aware Entity Filtering\n"
+            if exclude_self:
+                filter_instruction += (
+                    "- IMPORTANT: The uploaded data is from a business/brand/advertiser. "
+                    "Do NOT create entity types for the business/brand that created this content. "
+                    "Only create entities for the TARGET AUDIENCE, competitors, influencers, media, etc.\n"
+                )
+            if exclude_types:
+                filter_instruction += f"- Do NOT create entity types matching: {', '.join(exclude_types)}\n"
+            if focus:
+                filter_instruction += f"- Focus entity types on: {focus}\n"
+            system_prompt += filter_instruction
         messages = [
             {"role": "system", "content": system_prompt},
             {"role": "user", "content": user_message}