feat: Import 35+ skills, merge duplicates, add openclaw installer

Major updates:
- Added 35+ new skills from awesome-opencode-skills and antigravity repos
- Merged SEO skills into seo-master
- Merged architecture skills into architecture
- Merged security skills into security-auditor and security-coder
- Merged testing skills into testing-master and testing-patterns
- Merged pentesting skills into pentesting
- Renamed website-creator to thai-frontend-dev
- Replaced skill-creator with github version
- Removed Chutes references (use MiniMax API instead)
- Added install-openclaw-skills.sh for cross-platform installation
- Updated .env.example with MiniMax API credentials
Kunthawat Greethong
2026-03-26 11:37:39 +07:00
parent 48595100a1
commit 7edf5bc4d0
469 changed files with 131580 additions and 417 deletions


@@ -0,0 +1,357 @@
# CJK Typography & Mixed-Script Guide
Rules for Chinese, Japanese, and Korean text in DOCX documents.
## Table of Contents
1. [Font Selection](#font-selection)
2. [Font Size Names (CJK)](#font-size-names)
3. [RunFonts Mapping](#runfonts-mapping)
4. [Punctuation & Line Breaking](#punctuation--line-breaking)
5. [Paragraph Indentation](#paragraph-indentation)
6. [Line Spacing for CJK](#line-spacing)
7. [Chinese Government Standard (GB/T 9704)](#gbt-9704)
8. [Mixed CJK + Latin Best Practices](#mixed-script)
9. [OpenXML Quick Reference](#openxml-quick-reference)
---
## Font Selection
### Recommended CJK Fonts
| Language | Serif (正文) | Sans (标题) | Notes |
|----------|-------------|-------------|-------|
| **Simplified Chinese** | 宋体 (SimSun) | 微软雅黑 (Microsoft YaHei) | YaHei for screen, SimSun for print |
| **Simplified Chinese** | 仿宋 (FangSong) | 黑体 (SimHei) | Government documents |
| **Traditional Chinese** | 新細明體 (PMingLiU) | 微軟正黑體 (Microsoft JhengHei) | Taiwan standard |
| **Japanese** | MS 明朝 (MS Mincho) | MS ゴシック (MS Gothic) | Classic pairing |
| **Japanese** | 游明朝 (Yu Mincho) | 游ゴシック (Yu Gothic) | Modern, Windows 10+ |
| **Korean** | 바탕 (Batang) | 맑은 고딕 (Malgun Gothic) | Standard pairing |
### Government Document Fonts (公文)
| Element | Font | Size |
|---------|------|------|
| 标题 (title) | 小标宋 (FZXiaoBiaoSong-B05S) | 二号 (22pt) |
| 一级标题 | 黑体 (SimHei) | 三号 (16pt) |
| 二级标题 | 楷体_GB2312 (KaiTi_GB2312) | 三号 (16pt) |
| 三级标题 | 仿宋_GB2312 加粗 | 三号 (16pt) |
| 正文 (body) | 仿宋_GB2312 (FangSong_GB2312) | 三号 (16pt) |
| 附注/页码 | 宋体 (SimSun) | 四号 (14pt) |
---
## Font Size Names
CJK uses named sizes. Map to points and `w:sz` half-point values:
| 字号 | Points | `w:sz` | Common Use |
|------|--------|--------|------------|
| 初号 | 42pt | 84 | Display title |
| 小初 | 36pt | 72 | Large title |
| 一号 | 26pt | 52 | Chapter heading |
| 小一 | 24pt | 48 | Major heading |
| 二号 | 22pt | 44 | Document title (公文) |
| 小二 | 18pt | 36 | Western H1 equivalent |
| 三号 | 16pt | 32 | CJK heading / 公文 body |
| 小三 | 15pt | 30 | Sub-heading |
| 四号 | 14pt | 28 | CJK subheading |
| 小四 | 12pt | 24 | Standard body (CJK) |
| 五号 | 10.5pt | 21 | Compact CJK body |
| 小五 | 9pt | 18 | Footnotes |
| 六号 | 7.5pt | 15 | Fine print |
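The table translates directly into a lookup. A minimal Python sketch follows; the names `ZIHAO_PT` and `zihao_to_sz` are illustrative and not part of any library.

```python
# Named CJK sizes (字号) mapped to points, copied from the table above.
ZIHAO_PT = {
    "初号": 42, "小初": 36, "一号": 26, "小一": 24,
    "二号": 22, "小二": 18, "三号": 16, "小三": 15,
    "四号": 14, "小四": 12, "五号": 10.5, "小五": 9, "六号": 7.5,
}


def zihao_to_sz(name: str) -> str:
    """Return the w:sz value (half-points) for a named size, as a string."""
    half_points = ZIHAO_PT[name] * 2
    # w:sz is an integer count of half-points; every size above doubles cleanly.
    if half_points != int(half_points):
        raise ValueError(f"{name} is not a whole number of half-points")
    return str(int(half_points))
```

For example, `zihao_to_sz("三号")` gives the `"32"` used for 公文 body text.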
---
## RunFonts Mapping
OpenXML uses four font slots to handle multilingual text:
```xml
<!-- w:ascii    Latin characters (U+0000–U+007F) -->
<!-- w:hAnsi    Latin extended, Greek, Cyrillic -->
<!-- w:eastAsia CJK Unified Ideographs, Kana, Hangul -->
<!-- w:cs       Arabic, Hebrew, Thai, Devanagari -->
<w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:eastAsia="SimSun" w:cs="Arial" />
```
**Word's character classification logic:**
1. Character is in CJK range → uses `w:eastAsia` font
2. Character is in complex script range → uses `w:cs` font
3. Character is basic Latin (ASCII) → uses `w:ascii` font
4. Everything else → uses `w:hAnsi` font
**Key**: `w:eastAsia` is the **only** way to set CJK fonts. Setting just `w:ascii` will NOT affect CJK characters. Mixed text within a single run auto-switches fonts at the character level — no need for separate runs.
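The four-step classification can be approximated in a few lines. This is a simplified sketch, not Word's actual algorithm: the Unicode ranges below are a small, hand-picked subset chosen for illustration, and the function names are hypothetical. Word also consults `w:hint` and `w:lang` for ambiguous characters, which this sketch ignores.

```python
# Simplified, non-exhaustive ranges (assumption: enough for illustration only).
EAST_ASIA_RANGES = [
    (0x3000, 0x303F),  # CJK symbols and punctuation
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul syllables
    (0xFF00, 0xFFEF),  # Halfwidth and fullwidth forms
]
COMPLEX_SCRIPT_RANGES = [
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0900, 0x097F),  # Devanagari
    (0x0E00, 0x0E7F),  # Thai
]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def font_slot(ch: str) -> str:
    """Return the w:rFonts attribute that would style a single character."""
    code_point = ord(ch)
    if _in_ranges(code_point, EAST_ASIA_RANGES):
        return "eastAsia"
    if _in_ranges(code_point, COMPLEX_SCRIPT_RANGES):
        return "cs"
    if code_point <= 0x7F:
        return "ascii"
    return "hAnsi"


def split_by_slot(text: str):
    """Group consecutive characters that share a font slot."""
    groups = []
    for ch in text:
        slot = font_slot(ch)
        if groups and groups[-1][0] == slot:
            groups[-1] = (slot, groups[-1][1] + ch)
        else:
            groups.append((slot, ch))
    return groups
```

`split_by_slot("Word文档")` yields one `ascii` group and one `eastAsia` group, which mirrors how a single run switches fonts per character.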
### Document Defaults
```xml
<w:docDefaults>
<w:rPrDefault>
<w:rPr>
<w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:eastAsia="SimSun" w:cs="Arial" />
<w:sz w:val="22" />
<w:szCs w:val="22" />
<w:lang w:val="en-US" w:eastAsia="zh-CN" />
</w:rPr>
</w:rPrDefault>
</w:docDefaults>
```
`w:lang w:eastAsia` helps Word resolve ambiguous characters (e.g., punctuation shared between CJK and Latin).
---
## Punctuation & Line Breaking
### Full-Width vs Half-Width
CJK text uses full-width punctuation:
| Type | CJK | Latin |
|------|-----|-------|
| Period | 。(U+3002) | . |
| Comma | ,(U+FF0C) 、(U+3001) | , |
| Colon | :(U+FF1A) | : |
| Semicolon | ;(U+FF1B) | ; |
| Quotes | 「」『』 or “”‘’ | "" '' |
| Parentheses | ()(U+FF08/09) | () |
In mixed text, use the punctuation style of the **surrounding language context**.
### OpenXML Controls
```xml
<w:pPr>
  <!-- Children are listed in the order the w:pPr schema expects -->
  <w:kinsoku w:val="true" /> <!-- Enable CJK line breaking rules -->
  <w:overflowPunct w:val="true" /> <!-- Allow punctuation to overflow margins -->
  <w:adjustRightInd w:val="true" /> <!-- Auto-adjust right indent when a document grid is in use -->
  <w:snapToGrid w:val="true" /> <!-- Align to document grid -->
</w:pPr>
```
### Kinsoku Rules (禁則処理)
Prevents certain characters from appearing at the start or end of a line:
- **Cannot start a line**: `)」』】〉》。、,!?;:` and closing brackets
- **Cannot end a line**: `(「『【〈《` and opening brackets
Word applies these automatically when `w:kinsoku` is enabled.
### Line Breaking
- CJK characters can break between **any two characters** (no word boundaries needed)
- Latin words within CJK text still follow word-boundary breaking
- `w:wordWrap w:val="false"` extends break-anywhere behavior to Latin text: a long Latin word may split at the character level instead of moving whole to the next line
---
## Paragraph Indentation
### Chinese Standard: 2-Character Indent
Chinese body text conventionally uses a 2-character first-line indent:
```xml
<w:ind w:firstLineChars="200" /> <!-- 200 = 2 characters × 100 -->
```
Preferred over `w:firstLine` with fixed DXA because `firstLineChars` scales with font size.
| Indent | Value |
|--------|-------|
| 1 character | `w:firstLineChars="100"` |
| 2 characters | `w:firstLineChars="200"` |
| 3 characters | `w:firstLineChars="300"` |
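The two ways of expressing an indent can be compared side by side. The sketch below uses hypothetical helper names and assumes one full-width CJK character is one em wide, so the fixed-width fallback is only correct for the font size it was computed for.

```python
def first_line_chars(chars: float) -> str:
    """w:firstLineChars is expressed in hundredths of a character."""
    return str(round(chars * 100))


def first_line_twips(chars: float, font_pt: float) -> str:
    """Fixed-width equivalent for w:firstLine.

    Assumes one full-width character is one em: font_pt points, or
    font_pt * 20 twips (DXA).
    """
    return str(round(chars * font_pt * 20))
```

A 2-character indent is always `"200"` in `w:firstLineChars`, but becomes `"640"` twips at 16pt and `"480"` at 12pt, which is why the character-based form is preferred.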
---
## Line Spacing
- CJK characters are taller than Latin characters at the same point size
- Default `1.0` line spacing may feel cramped with CJK text
- Recommended: `1.15`–`1.5` line spacing for mixed CJK+Latin; fixed 28pt (exact) for 公文
### Auto Spacing
```xml
<w:pPr>
<w:autoSpaceDE w:val="true"/> <!-- auto space between CJK and Latin -->
<w:autoSpaceDN w:val="true"/> <!-- auto space between CJK and numbers -->
</w:pPr>
```
Adds ~¼ em spacing between CJK and non-CJK characters automatically. **Recommended: always enable.**
---
## GB/T 9704
Chinese government document standard (党政机关公文格式). These are **strict requirements**, not suggestions.
### Page Setup
| Parameter | Value | OpenXML |
|-----------|-------|---------|
| Page size | A4 (210×297mm) | Width=11906, Height=16838 |
| Top margin | 37mm | 2098 DXA |
| Bottom margin | 35mm | 1984 DXA |
| Left margin | 28mm | 1588 DXA |
| Right margin | 26mm | 1474 DXA |
| Characters/line | 28 | |
| Lines/page | 22 | |
| Line spacing | Fixed 28pt | `line="560" lineRule="exact"` |
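The DXA (twip) values can be re-derived from the metric dimensions. A small sketch with illustrative helper names, using 1 inch = 25.4 mm = 1440 twips and 1 pt = 20 twips:

```python
TWIPS_PER_INCH = 1440
MM_PER_INCH = 25.4


def mm_to_twips(mm: float) -> int:
    """Convert millimetres to twips (DXA), rounded to the nearest twip."""
    return round(mm / MM_PER_INCH * TWIPS_PER_INCH)


def pt_to_twips(pt: float) -> int:
    """Convert points to twips; used for exact line spacing."""
    return round(pt * 20)


# Page setup values recomputed from the metric column of the table.
GONGWEN_PAGE = {
    "width": mm_to_twips(210),
    "height": mm_to_twips(297),
    "top": mm_to_twips(37),
    "bottom": mm_to_twips(35),
    "left": mm_to_twips(28),
    "right": mm_to_twips(26),
    "line": pt_to_twips(28),
}
```

Exact conversion of 28 mm is about 1587.4 twips, so this sketch yields 1587 where the table lists 1588. The one-twip difference presumably comes from a different rounding route and is not visible in print.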
### Document Structure
```
┌─────────────────────────────────┐
│ 发文机关标志 (红头) │ ← 小标宋 or 红色大字
│ ══════════════════ (红线) │ ← Red #FF0000, 2pt
├─────────────────────────────────┤
│ 发文字号: X机发〔2025〕X号 │ ← 仿宋 三号, centered
│ │
│ 标题 (Title) │ ← 小标宋 二号, centered
│ │ 可分多行,回行居中
│ 主送机关: │ ← 仿宋 三号
│ │
│ 正文 (Body)... │ ← 仿宋_GB2312 三号
│ 一、一级标题 │ ← 黑体 三号
│ (一)二级标题 │ ← 楷体 三号
│ 1. 三级标题 │ ← 仿宋 三号 加粗
│ (1) 四级标题 │ ← 仿宋 三号
│ │
│ 附件: 1. xxx │ ← 仿宋 三号
│ │
│ 发文机关署名 │ ← 仿宋 三号
│ 成文日期 │ ← 仿宋 三号, 阿拉伯数字
├─────────────────────────────────┤
│ ══════════════════ (版记线) │
│ 抄送: xxx │ ← 仿宋 四号
│ 印发机关及日期 │ ← 仿宋 四号
└─────────────────────────────────┘
```
### Numbering System
```
一、 ← 黑体 (SimHei), no indentation
(一) ← 楷体 (KaiTi), indented 2 chars
1. ← 仿宋加粗 (FangSong Bold), indented 2 chars
(1) ← 仿宋 (FangSong), indented 2 chars
```
### Colors
| Element | Color | Requirement |
|---------|-------|-------------|
| All body text | Black #000000 | Mandatory |
| 红头 (agency name) | Red #FF0000 | Mandatory |
| 红线 (separator) | Red #FF0000 | Mandatory |
| 公章 (official seal) | Red | Mandatory |
### Page Numbers
- Position: bottom center
- Format: `-X-` (dash-number-dash)
- Font: 宋体 四号 (SimSun 14pt, `sz="28"`)
- No page number on cover page if present
---
## Mixed Script
### Font Size Harmony
CJK characters appear larger than Latin characters at the same point size. Compensation:
- If body is Calibri 11pt, pair with CJK at 11pt (same size — CJK looks slightly larger but acceptable)
- If a precise visual match is needed, CJK can be set 0.5–1pt smaller
- In practice, same point size is standard — don't over-optimize
### Bold and Italic
- **Chinese/Japanese have no true italic.** Word synthesizes a slant which looks poor
- Use **bold** for emphasis in CJK text
- Use 着重号 (emphasis dots) for traditional emphasis: `<w:em w:val="dot"/>` on RunProperties
---
## OpenXML Quick Reference
### Set EastAsia Font (C#)
```csharp
new Run(
new RunProperties(
new RunFonts { EastAsia = "SimSun", Ascii = "Calibri", HighAnsi = "Calibri" },
new FontSize { Val = "32" } // 三号 = 16pt = sz 32
),
new Text("这是正文内容")
);
```
### Document Defaults (C#)
```csharp
new DocDefaults(new RunPropertiesDefault(new RunPropertiesBaseStyle(
new RunFonts {
Ascii = "Calibri", HighAnsi = "Calibri",
EastAsia = "Microsoft YaHei"
},
new Languages { Val = "en-US", EastAsia = "zh-CN" }
)));
```
### 公文 Style Definitions (C#)
```csharp
// Title style: 小标宋 二号, centered
// Children follow schema order: name, basedOn, pPr, then rPr
new Style(
    new StyleName { Val = "GongWen Title" },
    new BasedOn { Val = "Normal" },
    new StyleParagraphProperties(
        new SpacingBetweenLines { Line = "560", LineRule = LineSpacingRuleValues.Exact },
        new Justification { Val = JustificationValues.Center }
    ),
    new StyleRunProperties(
        new RunFonts { EastAsia = "FZXiaoBiaoSong-B05S" },
        new Bold(),
        new FontSize { Val = "44" } // 二号 = 22pt
    )
) { Type = StyleValues.Paragraph, StyleId = "GongWenTitle" };
// Body style: 仿宋_GB2312 三号
new Style(
    new StyleName { Val = "GongWen Body" },
    new StyleParagraphProperties(
        new SpacingBetweenLines { Line = "560", LineRule = LineSpacingRuleValues.Exact }
    ),
    new StyleRunProperties(
        new RunFonts { EastAsia = "FangSong_GB2312", Ascii = "FangSong_GB2312" },
        new FontSize { Val = "32" } // 三号 = 16pt
    )
) { Type = StyleValues.Paragraph, StyleId = "GongWenBody" };
```
### Emphasis Dots (着重号)
```csharp
new RunProperties(new Emphasis { Val = EmphasisMarkValues.Dot });
```
### East Asian Text Layout
```xml
<!-- Snap to grid (align CJK chars to character grid) -->
<w:snapToGrid w:val="true"/>
<!-- Two-lines-in-one (双行合一) -->
<w:eastAsianLayout w:id="1" w:combine="true"/>
<!-- Vertical text in a cell -->
<w:textDirection w:val="tbRl"/>
```


@@ -0,0 +1,184 @@
# Chinese University Thesis Template Guide (中国高校论文模板指南)
## Why This Guide Exists
Chinese university thesis templates (.docx) have structural patterns that differ significantly
from Western templates. Agents that assume Western conventions (Heading1/Heading2/Normal) will
fail repeatedly. This guide documents the ACTUAL patterns found in Chinese templates.
## Common StyleId Patterns
### Pattern A: Numeric IDs (most common in Chinese Word templates)
| Style Purpose | styleId | w:name | w:basedOn |
|--------------|---------|--------|-----------|
| Normal body | `a` | "Normal" | — |
| Default paragraph font | `a0` | "Default Paragraph Font" | — |
| Heading 1 (章标题) | `1` | "heading 1" | `a` |
| Heading 2 (节标题) | `2` | "heading 2" | `a` |
| Heading 3 (小节标题) | `3` | "heading 3" | `a` |
| TOC 1 | `11` | "toc 1" | `a` |
| TOC 2 | `21` | "toc 2" | `a` |
| TOC 3 | `31` | "toc 3" | `a` |
| Header | `a3` | "header" | `a` |
| Footer | `a4` | "footer" | `a` |
| Table of Contents heading | `10` | "TOC Heading" | `1` |
### Pattern B: English IDs (less common, usually from international templates)
Standard Heading1/Heading2/Heading3/Normal — these follow the Western pattern.
### Pattern C: Mixed (some Chinese, some English)
Some templates define custom styles with Chinese names:
| Style Purpose | styleId | w:name |
|--------------|---------|--------|
| 论文标题 | `lunwenbiaoti` | "论文标题" |
| 章标题 | `zhangbiaoti` | "章标题" |
| 正文 | `zhengwen` | "正文" |
### How to Identify Which Pattern
```bash
# Extract all styleIds from the template
$CLI analyze --input template.docx --styles-only
# Or manually:
# unzip template.docx word/styles.xml
# Search for w:styleId= in the extracted file
```
Look at the first few styleIds. If you see `1`, `2`, `3`, `a`, `a0` → Pattern A.
If you see `Heading1`, `Normal` → Pattern B.
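When the CLI is not available, the same check can be scripted with the standard library. This is a heuristic sketch: the function names are hypothetical, and anything that is neither clearly numeric nor clearly English is reported as Pattern C.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def style_ids_from_xml(styles_xml) -> list:
    """Return every w:styleId found in a word/styles.xml payload, in file order."""
    root = ET.fromstring(styles_xml)
    return [s.get(f"{{{W_NS}}}styleId") for s in root.iter(f"{{{W_NS}}}style")]


def style_ids_from_docx(path: str) -> list:
    """Read word/styles.xml straight out of the .docx zip container."""
    with zipfile.ZipFile(path) as archive:
        return style_ids_from_xml(archive.read("word/styles.xml"))


def classify_pattern(style_ids) -> str:
    """Heuristic: 'A' numeric IDs, 'B' English IDs, 'C' mixed or custom."""
    ids = set(style_ids)
    has_numeric = bool(ids & {"1", "2", "3", "a", "a0"})
    has_english = bool(ids & {"Heading1", "Heading2", "Normal"})
    if has_numeric and not has_english:
        return "A"
    if has_english and not has_numeric:
        return "B"
    return "C"
```

Typical use would be `classify_pattern(style_ids_from_docx("template.docx"))`, followed by a manual look at the IDs whenever the result is `"C"`.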
## Standard Thesis Structure
Chinese university theses follow a highly standardized structure:
```
┌─────────────────────────────────────┐
│ 封面 (Cover Page) │ ← Usually 1-2 pages
│ - 校名、校徽 │
│ - 论文题目 (title) │
│ - 作者、导师、院系、日期 │
├─────────────────────────────────────┤
│ 学术诚信承诺书 / 独创性声明 │ ← 1 page
│ (Academic Integrity Declaration) │
├─────────────────────────────────────┤
│ 中文摘要 (Chinese Abstract) │ ← 1-2 pages
│ - "摘 要" heading │
│ - Abstract body │
│ - "关键词:" line │
├─────────────────────────────────────┤
│ 英文摘要 (English Abstract) │ ← 1-2 pages
│ - "ABSTRACT" heading │
│ - Abstract body │
│ - "Keywords:" line │
├─────────────────────────────────────┤
│ 目录 (Table of Contents) │ ← 1-3 pages
│ - Often inside SDT block │
│ - Static example entries │
│ - TOC field code │
├─────────────────────────────────────┤
│ 正文 (Body) │ ← Main content
│ 第1章 绪论 │
│ 1.1 研究背景 │
│ 1.2 研究目的和意义 │
│ 第2章 文献综述 │
│ ... │
│ 第N章 结论与展望 │
├─────────────────────────────────────┤
│ 参考文献 (References) │ ← Styled differently
├─────────────────────────────────────┤
│ 致谢 (Acknowledgments) │ ← Optional
├─────────────────────────────────────┤
│ 附录 (Appendices) │ ← Optional
└─────────────────────────────────────┘
```
## Identifying Zone Boundaries in Templates
Templates contain EXAMPLE content that must be replaced. Here's how to find the zones:
### Zone A (Front matter) — KEEP from template
- Starts at: paragraph 0
- Ends at: the paragraph BEFORE the first chapter heading
- Contains: cover, declaration, abstracts, TOC
- How to detect end: search for first paragraph with style `1` (or Heading1) containing "第1章" or "绪论"
### Zone B (Body content) — REPLACE with user content
- Starts at: first chapter heading ("第1章...")
- Ends at: "参考文献" heading (inclusive) or last body paragraph before acknowledgments
- How to detect:
```python
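# Assumes body_elements (children of w:body) plus get_text() and get_style() helpers
zone_b_start = zone_b_end = None
for i, el in enumerate(body_elements):
    text = get_text(el)
    style = get_style(el)
    if zone_b_start is None:
        # The first chapter heading opens Zone B
        if style in ('1', 'Heading1') and ('第1章' in text or '绪论' in text):
            zone_b_start = i
    elif '参考文献' in text:
        # Only checked after Zone B starts, so TOC entries are ignored;
        # the last match wins, since in-text mentions can precede the heading
        zone_b_end = i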
```
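The detection loop relies on `body_elements`, `get_text`, and `get_style`, which this guide does not define. One possible standard-library sketch of those helpers follows; `load_body_elements` is a hypothetical name, and the input is assumed to be the raw contents of `word/document.xml`.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def load_body_elements(document_xml) -> list:
    """Return the direct children of w:body (paragraphs, tables, SDTs, sectPr)."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{W}body")
    return list(body) if body is not None else []


def get_text(el) -> str:
    """Concatenate all w:t text found anywhere inside the element."""
    return "".join(t.text or "" for t in el.iter(f"{W}t"))


def get_style(el):
    """Return the paragraph styleId (w:pPr/w:pStyle/@w:val), or None if absent."""
    p_style = el.find(f"{W}pPr/{W}pStyle")
    return p_style.get(f"{W}val") if p_style is not None else None
```

Because `get_text` walks the whole subtree, it also returns text for tables and SDT blocks, while `get_style` returns `None` for anything that is not a styled paragraph.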
### Zone C (Back matter) — KEEP from template (or remove)
- Starts after: 参考文献
- Contains: 致谢, 附录, final sectPr
## Font Expectations in Chinese Thesis Templates
| Element | Font | Size (字号) | Size (pt) | w:sz |
|---------|------|------------|-----------|------|
| 论文标题 | 华文中宋 or 黑体 | 二号 or 小二 | 22pt or 18pt | 44 or 36 |
| 章标题 (H1) | 黑体 | 三号 | 16pt | 32 |
| 节标题 (H2) | 黑体 | 四号 | 14pt | 28 |
| 小节标题 (H3) | 黑体 | 小四 | 12pt | 24 |
| 正文 | 宋体 | 小四 | 12pt | 24 |
| 页眉 | 宋体 | 五号 | 10.5pt | 21 |
| 页脚/页码 | 宋体 | 五号 | 10.5pt | 21 |
| 表格内容 | 宋体 | 五号 | 10.5pt | 21 |
| 参考文献条目 | 宋体 | 五号 | 10.5pt | 21 |
## RunFonts for CJK Body Text
```xml
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"
w:eastAsia="宋体" w:cs="Times New Roman"/>
```
For headings:
```xml
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"
w:eastAsia="黑体" w:cs="Times New Roman"/>
```
IMPORTANT: When cleaning direct formatting, ALWAYS preserve w:eastAsia.
Removing it causes Chinese text to fall back to the wrong font.
## Common Mistakes with Chinese Templates
1. **Searching for `Heading1`** — Chinese templates use `1`, not `Heading1`
2. **Clearing all rFonts** — Must keep eastAsia font declarations
3. **Assuming "第1章" is the first paragraph** — It's typically paragraph 100+ after cover/abstract/TOC
4. **Ignoring SDT blocks in TOC** — The TOC is wrapped in an SDT, not just field codes
5. **Wrong line spacing** — Chinese theses typically use fixed 20pt (line="400") or 22pt (line="440"), not the 28pt used in government documents
6. **Missing section breaks** — Each zone (abstract, TOC, body) usually has its own sectPr for different headers/footers
## Style Mapping Quick Reference
When source document uses Western IDs and template uses Chinese numeric IDs:
```json
{
"Heading1": "1",
"Heading2": "2",
"Heading3": "3",
"Heading4": "3",
"Normal": "a",
"BodyText": "a",
"ListParagraph": "a",
"Caption": "a",
"TOC1": "11",
"TOC2": "21",
"TOC3": "31"
}
```
When source uses Chinese numeric IDs and template uses Western IDs — reverse the mapping.
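Reversing the mapping is not quite mechanical, because several Western IDs collapse onto the same numeric ID (`Heading3` and `Heading4` both map to `3`; `Normal`, `BodyText`, `ListParagraph`, and `Caption` all map to `a`). A sketch that keeps the first source ID for each target, with illustrative names:

```python
WESTERN_TO_NUMERIC = {
    "Heading1": "1", "Heading2": "2", "Heading3": "3", "Heading4": "3",
    "Normal": "a", "BodyText": "a", "ListParagraph": "a", "Caption": "a",
    "TOC1": "11", "TOC2": "21", "TOC3": "31",
}


def reverse_mapping(mapping: dict) -> dict:
    """Invert a style map; when several keys share a value, keep the first."""
    reversed_map = {}
    for source_id, target_id in mapping.items():
        # setdefault leaves an existing entry untouched, so insertion order decides.
        reversed_map.setdefault(target_id, source_id)
    return reversed_map


NUMERIC_TO_WESTERN = reverse_mapping(WESTERN_TO_NUMERIC)
```

Keeping the first entry is a design choice for this sketch; it sends `3` back to `Heading3` and `a` back to `Normal`, which are the least surprising targets.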


@@ -0,0 +1,191 @@
# Comments System Guide (4-File Architecture)
## Overview
Word comments require coordination across **four XML files** plus references in `document.xml`, `[Content_Types].xml`, and `document.xml.rels`.
---
## The Four Comment Files
### 1. `word/comments.xml` — Main Comment Content
Contains the actual comment text:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:comments xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
<w:comment w:id="1" w:author="Alice" w:date="2026-03-21T09:00:00Z" w:initials="A">
<w:p w14:paraId="1A2B3C4D">
<w:pPr><w:pStyle w:val="CommentText" /></w:pPr>
<w:r>
<w:rPr><w:rStyle w:val="CommentReference" /></w:rPr>
<w:annotationRef />
</w:r>
<w:r>
<w:t>This needs clarification.</w:t>
</w:r>
</w:p>
</w:comment>
</w:comments>
```
Key attributes: `w:id` (unique integer), `w:author`, `w:date` (ISO 8601), `w:initials`.
### 2. `word/commentsExtended.xml` — W15 Extensions
Links comments to paragraphs and tracks resolved status:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w15:commentsEx xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml">
<w15:commentEx w15:paraId="1A2B3C4D" w15:done="0" />
</w15:commentsEx>
```
- `w15:paraId` — matches the `w14:paraId` of the comment's paragraph in `comments.xml`
- `w15:done`: `"0"` = open, `"1"` = resolved
### 3. `word/commentsIds.xml` — Persistent ID Mapping
Provides durable IDs that survive copy/paste across documents:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w16cid:commentsIds xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid">
<w16cid:commentId w16cid:paraId="1A2B3C4D" w16cid:durableId="12345678" />
</w16cid:commentsIds>
```
- `w16cid:paraId` — same as `w15:paraId`
- `w16cid:durableId` — globally unique identifier (8-digit hex)
### 4. `word/commentsExtensible.xml` — W16 Extensions
Modern comment extensions (used in newer Word versions):
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w16cex:commentsExtensible xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex">
<w16cex:commentExtensible w16cex:durableId="12345678" w16cex:dateUtc="2026-03-21T09:00:00Z" />
</w16cex:commentsExtensible>
```
---
## Document.xml References
Comments are anchored in document content using three elements:
```xml
<w:p>
<w:commentRangeStart w:id="1" />
<w:r><w:t>This text has a comment.</w:t></w:r>
<w:commentRangeEnd w:id="1" />
<w:r>
<w:rPr><w:rStyle w:val="CommentReference" /></w:rPr>
<w:commentReference w:id="1" />
</w:r>
</w:p>
```
- `w:commentRangeStart` — marks where the commented text begins
- `w:commentRangeEnd` — marks where the commented text ends
- `w:commentReference` — the visible comment marker (superscript number), placed in a run after the range end
The `w:id` on all three must match the `w:id` in `comments.xml`.
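The ID-matching rule is easy to verify mechanically. The following sketch (hypothetical function name, standard library only) reports anchors in `document.xml` that point at a missing comment, and ranges that are opened without being closed or the reverse. It deliberately does not flag comments that lack anchors.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
ANCHOR_TAGS = ("commentRangeStart", "commentRangeEnd", "commentReference")


def comment_anchor_problems(document_xml, comments_xml) -> list:
    """Return human-readable problems; an empty list means the IDs line up."""
    doc = ET.fromstring(document_xml)
    comments = ET.fromstring(comments_xml)
    defined = {c.get(f"{W}id") for c in comments.iter(f"{W}comment")}
    used = {
        tag: {e.get(f"{W}id") for e in doc.iter(f"{W}{tag}")}
        for tag in ANCHOR_TAGS
    }
    problems = []
    for tag, ids in used.items():
        for comment_id in sorted(ids - defined):
            problems.append(f"{tag} w:id={comment_id} has no matching w:comment")
    # A range must have both a start and an end with the same id.
    unbalanced = used["commentRangeStart"] ^ used["commentRangeEnd"]
    for comment_id in sorted(unbalanced):
        problems.append(f"comment range w:id={comment_id} is missing its start or end")
    return problems
```

Running it on the extracted `word/document.xml` and `word/comments.xml` before repackaging catches the most common cause of comments that silently disappear.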
---
## Content Types Registration
Add to `[Content_Types].xml`:
```xml
<Override PartName="/word/comments.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml" />
<Override PartName="/word/commentsExtended.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.commentsExtended+xml" />
<Override PartName="/word/commentsIds.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.commentsIds+xml" />
<Override PartName="/word/commentsExtensible.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.commentsExtensible+xml" />
```
---
## Relationship Registration
Add to `word/_rels/document.xml.rels`:
```xml
<Relationship Id="rId20" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments"
Target="comments.xml" />
<Relationship Id="rId21" Type="http://schemas.microsoft.com/office/2011/relationships/commentsExtended"
Target="commentsExtended.xml" />
<Relationship Id="rId22" Type="http://schemas.microsoft.com/office/2016/09/relationships/commentsIds"
Target="commentsIds.xml" />
<Relationship Id="rId23" Type="http://schemas.microsoft.com/office/2018/08/relationships/commentsExtensible"
Target="commentsExtensible.xml" />
```
---
## Step-by-Step: Adding a New Comment
1. **Choose a unique comment ID** (scan existing `w:id` values, use max + 1)
2. **Generate a paraId** (8-character hex, e.g., `"1A2B3C4D"`) and durableId (8-digit hex)
3. **Add to `comments.xml`**: Create `w:comment` element with content
4. **Add to `commentsExtended.xml`**: Create `w15:commentEx` with `paraId`, `done="0"`
5. **Add to `commentsIds.xml`**: Create `w16cid:commentId` with `paraId` and `durableId`
6. **Add to `commentsExtensible.xml`**: Create `w16cex:commentExtensible` with `durableId` and `dateUtc`
7. **Add to `document.xml`**: Insert `w:commentRangeStart`, `w:commentRangeEnd`, and `w:commentReference` around target text
8. **Verify `[Content_Types].xml`** and `document.xml.rels` have entries for all 4 files
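Steps 1 and 2 can be sketched as follows. The function names are hypothetical. Starting at `0` for an empty file and keeping random IDs below `0x7FFFFFFF` are conservative choices made for this sketch, not rules stated in this guide.

```python
import secrets
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def next_comment_id(comments_xml) -> int:
    """Step 1: highest existing w:id plus one, or 0 when no comments exist."""
    root = ET.fromstring(comments_xml)
    ids = [int(c.get(f"{W}id")) for c in root.iter(f"{W}comment")]
    return max(ids) + 1 if ids else 0


def new_hex_id(taken=()) -> str:
    """Step 2: random 8-character uppercase hex ID that is not already in use.

    Values stay within 0x00000001..0x7FFFFFFE (an assumption of this sketch).
    """
    while True:
        candidate = f"{secrets.randbelow(0x7FFFFFFE) + 1:08X}"
        if candidate not in taken:
            return candidate
```

Call `new_hex_id` once for the `paraId` and once for the `durableId`, passing the IDs already present in the corresponding files as `taken`.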
---
## Step-by-Step: Adding a Reply
Replies are comments whose paragraph's `w14:paraId` links to a parent comment:
1. Create a new `w:comment` in `comments.xml` with a new `w:id`
2. In `commentsExtended.xml`, add `w15:commentEx` with:
- `w15:paraId` = new paragraph ID
- `w15:paraIdParent` = the `paraId` of the comment being replied to
- `w15:done="0"`
3. Add entries in `commentsIds.xml` and `commentsExtensible.xml`
4. In `document.xml`, the reply does NOT need its own range markers — it shares the parent's range
```xml
<!-- In commentsExtended.xml -->
<w15:commentEx w15:paraId="5E6F7A8B" w15:paraIdParent="1A2B3C4D" w15:done="0" />
```
---
## Step-by-Step: Resolving a Comment
Set `w15:done="1"` on the comment's `w15:commentEx` entry:
```xml
<!-- Before -->
<w15:commentEx w15:paraId="1A2B3C4D" w15:done="0" />
<!-- After -->
<w15:commentEx w15:paraId="1A2B3C4D" w15:done="1" />
```
This marks the comment (and all its replies) as resolved. The comment remains visible but appears grayed out in Word.
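The same change can be scripted. A minimal sketch with a hypothetical function name, operating on the text of `word/commentsExtended.xml`:

```python
import xml.etree.ElementTree as ET

W15_NS = "http://schemas.microsoft.com/office/word/2012/wordml"
W15 = f"{{{W15_NS}}}"


def resolve_comment(comments_extended_xml: str, para_id: str) -> str:
    """Set w15:done="1" on the entry with the given paraId and return the XML."""
    # Keep the conventional w15 prefix when serializing.
    ET.register_namespace("w15", W15_NS)
    root = ET.fromstring(comments_extended_xml)
    found = False
    for entry in root.iter(f"{W15}commentEx"):
        if entry.get(f"{W15}paraId") == para_id:
            entry.set(f"{W15}done", "1")
            found = True
    if not found:
        raise KeyError(f"no w15:commentEx with paraId {para_id}")
    return ET.tostring(root, encoding="unicode")
```

One caveat: ElementTree drops namespace declarations that no element or attribute uses, so a real part that carries `mc:Ignorable` with a list of prefixes may need a namespace-preserving parser or a targeted string edit instead.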
---
## Minimum Viable Comment
At minimum, a working comment requires:
1. `comments.xml` with the `w:comment` element
2. `document.xml` with range markers and reference
3. Relationship in `document.xml.rels`
4. Content type in `[Content_Types].xml`
The extended files (`commentsExtended`, `commentsIds`, `commentsExtensible`) are optional but recommended for full compatibility with modern Word.


@@ -0,0 +1,829 @@
# GOOD vs BAD Document Design — Concrete OpenXML Examples
A side-by-side reference showing common design mistakes and their fixes, with exact OpenXML parameter values. Use this to develop an intuitive sense of what makes a document look professional versus amateur.
Format: Each comparison shows the **BAD** version first (the mistake), then the **GOOD** version (the fix), with OpenXML markup and a short explanation.
---
## 1. Font Size Disasters
### 1a. No Hierarchy — Everything the Same Size
**BAD: Body=12pt, H1=12pt bold**
```
┌──────────────────────────────────┐
│ INTRODUCTION │ ← 12pt bold... same visual weight
│ This is the body text of the │ ← 12pt regular
│ report. It discusses findings │
│ from the quarterly review. │
│ METHODOLOGY │ ← Where does the section start?
│ We collected data from three │
│ sources across the enterprise. │
└──────────────────────────────────┘
```
```xml
<!-- H1: bold but same size as body — no visual separation -->
<w:rPr><w:b/><w:sz w:val="24"/></w:rPr>
<!-- Body -->
<w:rPr><w:sz w:val="24"/></w:rPr>
```
**GOOD: Modular scale — body=11pt, H3=13pt, H2=16pt, H1=20pt**
```
┌──────────────────────────────────┐
│ │
│ Introduction │ ← 20pt, clearly a title
│ │
│ This is the body text of the │ ← 11pt, comfortable reading size
│ report. It discusses findings │
│ from the quarterly review. │
│ │
│ Methodology │ ← 20pt, section break is obvious
│ │
│ We collected data from three │
│ sources across the enterprise. │
└──────────────────────────────────┘
```
```xml
<!-- H1: 20pt = w:sz 40 -->
<w:rPr><w:rFonts w:ascii="Calibri Light"/><w:sz w:val="40"/></w:rPr>
<!-- H2: 16pt = w:sz 32 -->
<w:rPr><w:rFonts w:ascii="Calibri Light"/><w:sz w:val="32"/></w:rPr>
<!-- H3: 13pt = w:sz 26, bold -->
<w:rPr><w:rFonts w:ascii="Calibri"/><w:b/><w:sz w:val="26"/></w:rPr>
<!-- Body: 11pt = w:sz 22 -->
<w:rPr><w:rFonts w:ascii="Calibri"/><w:sz w:val="22"/></w:rPr>
```
**Why better:** A clear size progression (each step roughly 1.2x to 1.25x the previous one) lets readers instantly identify structure without reading a word.
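The step ratios of a scale, and the matching `w:sz` values, can be checked with a few lines of arithmetic. The helper names here are illustrative.

```python
def sz(pt: float) -> str:
    """w:sz is measured in half-points."""
    return str(round(pt * 2))


def step_ratios(sizes_pt) -> list:
    """Ratio of each size to the next smaller one, rounded to two decimals."""
    ordered = sorted(sizes_pt)
    return [round(larger / smaller, 2) for smaller, larger in zip(ordered, ordered[1:])]


SCALE_PT = {"Body": 11, "H3": 13, "H2": 16, "H1": 20}
SCALE_SZ = {name: sz(pt) for name, pt in SCALE_PT.items()}
```

For the 11/13/16/20 scale, `step_ratios(SCALE_PT.values())` gives `[1.18, 1.23, 1.25]`, and `SCALE_SZ` reproduces the `22`, `26`, `32`, and `40` used in the markup.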
---
### 1b. Too Much Contrast — Children's Book Look
**BAD: H1=28pt with body=10pt (ratio 2.8x)**
```
┌──────────────────────────────────┐
│ │
│ QUARTERLY REPORT │ ← 28pt, dominates the page
│ │
│ This is body text set very small │ ← 10pt, straining to read
│ and the contrast with the title │
│ makes it feel like a poster. │
└──────────────────────────────────┘
```
```xml
<w:rPr><w:b/><w:sz w:val="56"/></w:rPr> <!-- 28pt heading -->
<w:rPr><w:sz w:val="20"/></w:rPr> <!-- 10pt body -->
```
**GOOD: H1=20pt with body=11pt (ratio ~1.8x)**
```xml
<w:rPr><w:sz w:val="40"/></w:rPr> <!-- 20pt heading -->
<w:rPr><w:sz w:val="22"/></w:rPr> <!-- 11pt body -->
```
**Why better:** A heading-to-body ratio between 1.5x and 2.0x reads as "structured" rather than "shouting."
---
## 2. Spacing Crimes
### 2a. Wall of Text — No Paragraph or Line Spacing
**BAD: Single line spacing, 0pt between paragraphs**
```
┌──────────────────────────────────┐
│The findings indicate a strong │
│correlation between training hours│
│and performance metrics. │
│Further analysis revealed that │ ← No gap — where does the new
│departments with higher budgets │ paragraph start?
│achieved better outcomes in all │
│measured categories. │
└──────────────────────────────────┘
```
```xml
<w:pPr>
  <!-- 1.0 line spacing (240/240), no paragraph gap; w:pPr takes one w:spacing -->
  <w:spacing w:after="0" w:line="240" w:lineRule="auto"/>
</w:pPr>
```
**GOOD: 1.15x line spacing, 8pt after each paragraph**
```
┌──────────────────────────────────┐
│The findings indicate a strong │
│correlation between training │ ← Slightly more air between lines
│hours and performance metrics. │
│ │ ← 8pt gap signals new paragraph
│Further analysis revealed that │
│departments with higher budgets │
│achieved better outcomes in all │
│measured categories. │
└──────────────────────────────────┘
```
```xml
<w:pPr>
  <!-- 1.15x line spacing (276/240), 8pt = 160 twips after; w:pPr takes one w:spacing -->
  <w:spacing w:after="160" w:line="276" w:lineRule="auto"/>
</w:pPr>
```
**Why better:** Line spacing gives each line room to breathe; paragraph spacing separates ideas without wasting a full blank line.
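The unit conversions behind these values are simple but easy to mix up: paragraph spacing is in twips (1/20 pt), while `w:line` with `lineRule="auto"` is in 240ths of a line. A sketch with hypothetical helper names:

```python
def pt_to_twips(pt: float) -> int:
    """Paragraph spacing (w:before / w:after) is measured in twips, 20 per point."""
    return round(pt * 20)


def line_multiple_to_w_line(multiple: float) -> int:
    """With lineRule="auto", w:line is in 240ths of a line."""
    return round(multiple * 240)


def spacing_attrs(line_multiple: float, after_pt: float, before_pt: float = 0) -> dict:
    """Attribute values for a single w:spacing element."""
    return {
        "w:before": str(pt_to_twips(before_pt)),
        "w:after": str(pt_to_twips(after_pt)),
        "w:line": str(line_multiple_to_w_line(line_multiple)),
        "w:lineRule": "auto",
    }
```

`spacing_attrs(1.15, 8)` produces `w:line="276"` and `w:after="160"`, which are the values recommended for body text.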
---
### 2b. Floating Headings — Same Space Above and Below
**BAD: 12pt before and 12pt after heading**
```
┌──────────────────────────────────┐
│ ...end of previous section. │
│ │ ← 12pt gap
│ Section Two │ ← Heading floats in the middle
│ │ ← 12pt gap
│ Start of section two content. │
└──────────────────────────────────┘
```
```xml
<w:pPr>
<w:spacing w:before="240" w:after="240"/> <!-- 12pt both sides -->
</w:pPr>
```
**GOOD: 24pt before, 8pt after heading**
```
┌──────────────────────────────────┐
│ ...end of previous section. │
│ │
│ │ ← 24pt gap — clear section break
│ Section Two │ ← Heading is close to its content
│ │ ← 8pt gap
│ Start of section two content. │
└──────────────────────────────────┘
```
```xml
<w:pPr>
<w:spacing w:before="480" w:after="160"/> <!-- 24pt before, 8pt after -->
</w:pPr>
```
**Why better:** Proximity principle: a heading belongs to the text that follows it, so more space above and less space below anchors it to its content.
---
### 2c. Wasteful Gaps — Huge Spacing Everywhere
**BAD: 24pt after every paragraph, including body text**
```
┌──────────────────────────────────┐
│ First paragraph of text here. │
│ │
│ │ ← 24pt gap after every paragraph
│ │
│ Second paragraph of text here. │
│ │
│ │
│ │
│ Third paragraph. │ ← Document looks mostly white space
└──────────────────────────────────┘
```
```xml
<w:spacing w:after="480"/> <!-- 24pt = 480 twips after every paragraph -->
```
**GOOD: Proportional spacing — body=8pt, H2=6pt after, H1=10pt after**
```xml
<!-- Body paragraph -->
<w:spacing w:after="160"/> <!-- 8pt after body -->
<!-- H1 -->
<w:spacing w:before="480" w:after="200"/> <!-- 24pt before, 10pt after -->
<!-- H2 -->
<w:spacing w:before="320" w:after="120"/> <!-- 16pt before, 6pt after -->
```
**Why better:** Spacing should vary by element role, creating a visual rhythm rather than uniform gaps.
---
## 3. Margin Mistakes
### 3a. Cramped Margins — Text Running to the Edge
**BAD: 0.5in margins all around**
```
┌────────────────────────────────────────────────┐
│Text starts almost at the paper edge and runs │
│all the way across making extremely long lines │
│that are hard to track from end back to start. │
│The eye loses its place on every line return. │
└────────────────────────────────────────────────┘
```
```xml
<w:pgMar w:top="720" w:right="720" w:bottom="720" w:left="720"/>
<!-- 720 twips = 0.5in — line length ~7.5in on letter paper -->
```
**GOOD: 1in margins (standard)**
```xml
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/>
<!-- 1440 twips = 1.0in — line length ~6.5in, ideal for 11pt body -->
```
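**Why better:** The usual target for line length is 60-75 characters. 1in margins on letter paper leave a 6.5in text width, which at 11pt Calibri runs somewhat longer than that target but stays comfortable, whereas 0.5in margins stretch every line to 7.5in.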
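The text widths quoted in the comments follow from the page width minus both side margins. A short sketch, assuming US Letter (8.5in = 12240 twips) and an illustrative function name:

```python
TWIPS_PER_INCH = 1440
LETTER_WIDTH_TWIPS = 12240  # 8.5in


def text_width_inches(left_twips: int, right_twips: int,
                      page_width_twips: int = LETTER_WIDTH_TWIPS) -> float:
    """Usable line length in inches for the given w:pgMar left/right values."""
    return (page_width_twips - left_twips - right_twips) / TWIPS_PER_INCH
```

With `720`, `1440`, and `2880` twip margins this gives 7.5in, 6.5in, and 4.5in respectively. Characters per line depend on the font, so the sketch stops at the physical width.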
---
### 3b. Over-Padded Margins — Looks Like the Content is Hiding
**BAD: 2in margins on a short document**
```xml
<w:pgMar w:top="2880" w:right="2880" w:bottom="2880" w:left="2880"/>
<!-- 2880 twips = 2.0in — only 4.5in of text width, looks padded -->
```
**GOOD: 1in standard, or 1.25in for formal documents**
```xml
<!-- Standard -->
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/>
<!-- Formal / bound documents with gutter -->
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1800" w:gutter="0"/>
<!-- 1800 twips = 1.25in left for binding margin -->
```
**Why better:** Margins should frame the content, not overwhelm it. 1-1.25in works for virtually all business and academic documents.
---
## 4. Table Ugliness
### 4a. Prison Grid — Full Borders on Every Cell
**BAD: Every cell with 0.5pt borders on all four sides**
```
┌───────┬───────┬───────┬───────┐
│ Name │ Dept │ Score │ Grade │
├───────┼───────┼───────┼───────┤
│ Alice │ Eng │ 92 │ A │
├───────┼───────┼───────┼───────┤
│ Bob │ Sales │ 85 │ B │
├───────┼───────┼───────┼───────┤
│ Carol │ Eng │ 78 │ C+ │
└───────┴───────┴───────┴───────┘
```
```xml
<w:tcBorders>
<w:top w:val="single" w:sz="4" w:color="000000"/>
<w:left w:val="single" w:sz="4" w:color="000000"/>
<w:bottom w:val="single" w:sz="4" w:color="000000"/>
<w:right w:val="single" w:sz="4" w:color="000000"/>
</w:tcBorders>
```
**GOOD: Three-line table (三线表) — top thick, header-bottom medium, table-bottom thick**
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ← 1.5pt top border
Name Dept Score Grade
────────────────────────────────── ← 0.75pt header separator
Alice Eng 92 A
Bob Sales 85 B
Carol Eng 78 C+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ← 1.5pt bottom border
```
```xml
<!-- Header row cells: thick top, thin bottom, no side borders -->
<w:top w:val="single" w:sz="12" w:color="000000"/> <!-- 1.5pt -->
<w:left w:val="nil"/>
<w:bottom w:val="single" w:sz="6" w:color="000000"/> <!-- 0.75pt -->
<w:right w:val="nil"/>
<!-- Data row cells: no borders -->
<w:top w:val="nil"/><w:left w:val="nil"/>
<w:bottom w:val="nil"/><w:right w:val="nil"/>
<!-- Last row cells: thick bottom border -->
<w:bottom w:val="single" w:sz="12" w:color="000000"/> <!-- 1.5pt -->
```
**Why better:** Removing inner borders lets the eye scan data freely. Three lines provide structure without visual clutter.
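Border widths use yet another unit: `w:sz` on a border is in eighths of a point. The sketch below (hypothetical names) computes the values and decides which borders each row of a three-line table needs; `None` stands for `w:val="nil"`.

```python
def border_sz(pt: float) -> str:
    """Border w:sz is measured in eighths of a point."""
    return str(round(pt * 8))


def three_line_borders(row_index: int, row_count: int) -> dict:
    """Border sizes for the cells of one row; None means no border."""
    thick, thin = border_sz(1.5), border_sz(0.75)
    borders = {"top": None, "left": None, "bottom": None, "right": None}
    if row_index == 0:
        # Header row: thick rule above, thin separator below.
        borders["top"] = thick
        borders["bottom"] = thin
    if row_index == row_count - 1:
        # Last row: thick closing rule (overrides the thin one in a 1-row table).
        borders["bottom"] = thick
    return borders
```

For a four-row table, row 0 gets `top="12"` and `bottom="6"`, rows 1 and 2 get no borders, and row 3 gets `bottom="12"`.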
---
### 4b. Text Touching Borders — No Cell Padding
**BAD: Zero cell margins**
```
┌──────────┬──────────┐
│Name │Department│ ← Text cramped against borders
├──────────┼──────────┤
│Alice │Engineering│
└──────────┴──────────┘
```
```xml
<w:tcMar>
<w:top w:w="0" w:type="dxa"/>
<w:start w:w="0" w:type="dxa"/>
<w:bottom w:w="0" w:type="dxa"/>
<w:end w:w="0" w:type="dxa"/>
</w:tcMar>
```
**GOOD: 0.08in vertical, 0.12in horizontal padding**
```xml
<w:tcMar>
<w:top w:w="115" w:type="dxa"/> <!-- ~0.08in = 115 twips -->
<w:start w:w="173" w:type="dxa"/> <!-- ~0.12in = 173 twips -->
<w:bottom w:w="115" w:type="dxa"/>
<w:end w:w="173" w:type="dxa"/>
</w:tcMar>
```
**Why better:** Padding gives text breathing room inside cells, making every value easier to read.
---
### 4c. Invisible Headers — Header Row Same Style as Data
**BAD: Header row indistinguishable from data**
```xml
<!-- Header cell run properties — identical to data -->
<w:rPr><w:sz w:val="22"/></w:rPr>
```
**GOOD: Bold header text, subtle background fill, bottom border**
```xml
<!-- Header cell run properties -->
<w:rPr><w:b/><w:sz w:val="22"/><w:color w:val="333333"/></w:rPr>
<!-- Header cell shading -->
<w:tcPr>
<w:shd w:val="clear" w:color="auto" w:fill="F2F2F2"/> <!-- light gray bg -->
<w:tcBorders>
<w:bottom w:val="single" w:sz="8" w:color="666666"/> <!-- 1pt separator -->
</w:tcBorders>
</w:tcPr>
<!-- Mark row as header (repeats on page break) -->
<w:trPr><w:tblHeader/></w:trPr>
```
**Why better:** Distinct header styling lets readers instantly locate column meanings, especially in long tables that span pages. The `w:tblHeader` element ensures the header row repeats on every page.
---
## 5. Font Pairing Failures
### 5a. Visual Chaos — Too Many Fonts
**BAD: 4+ fonts in one document**
```xml
<!-- H1 in Impact -->
<w:rPr><w:rFonts w:ascii="Impact"/><w:sz w:val="40"/></w:rPr>
<!-- H2 in Georgia -->
<w:rPr><w:rFonts w:ascii="Georgia"/><w:sz w:val="32"/></w:rPr>
<!-- Body in Verdana -->
<w:rPr><w:rFonts w:ascii="Verdana"/><w:sz w:val="22"/></w:rPr>
<!-- Captions in Courier New -->
<w:rPr><w:rFonts w:ascii="Courier New"/><w:sz w:val="18"/></w:rPr>
```
**GOOD: One font family with weight variation, or two complementary families**
```xml
<!-- H1: Calibri Light (light weight of the Calibri family) -->
<w:rPr><w:rFonts w:ascii="Calibri Light"/><w:sz w:val="40"/></w:rPr>
<!-- H2: Calibri Light -->
<w:rPr><w:rFonts w:ascii="Calibri Light"/><w:sz w:val="32"/></w:rPr>
<!-- Body: Calibri (regular weight) -->
<w:rPr><w:rFonts w:ascii="Calibri"/><w:sz w:val="22"/></w:rPr>
<!-- Captions: Calibri -->
<w:rPr><w:rFonts w:ascii="Calibri"/><w:sz w:val="18"/></w:rPr>
```
**Why better:** Limiting to one or two font families creates visual coherence. Vary by size and weight, not by font.
---
### 5b. Mismatched Personality — Comic Sans Meets Times New Roman
**BAD:**
```xml
<w:rPr><w:rFonts w:ascii="Comic Sans MS"/><w:sz w:val="36"/></w:rPr> <!-- heading -->
<w:rPr><w:rFonts w:ascii="Times New Roman"/><w:sz w:val="24"/></w:rPr> <!-- body -->
```
**GOOD: Fonts with compatible character**
```xml
<w:rPr><w:rFonts w:ascii="Calibri Light"/><w:sz w:val="36"/></w:rPr> <!-- heading -->
<w:rPr><w:rFonts w:ascii="Calibri"/><w:sz w:val="22"/></w:rPr> <!-- body -->
```
**Why better:** Paired fonts should share a similar level of formality and geometric character. Comic Sans is playful/informal; Times New Roman is formal/traditional. They clash.
---
### 5c. Everything Bold — Nothing Stands Out
**BAD: Bold on body, headings, captions, everything**
```xml
<w:rPr><w:b/><w:sz w:val="40"/></w:rPr> <!-- heading: bold -->
<w:rPr><w:b/><w:sz w:val="22"/></w:rPr> <!-- body: also bold -->
<w:rPr><w:b/><w:sz w:val="18"/></w:rPr> <!-- caption: still bold -->
```
**GOOD: Bold reserved for headings and key terms only**
```xml
<w:rPr><w:b/><w:sz w:val="40"/></w:rPr> <!-- H1: bold -->
<w:rPr><w:sz w:val="32"/></w:rPr> <!-- H2: size alone is enough -->
<w:rPr><w:sz w:val="22"/></w:rPr> <!-- body: regular weight -->
<w:rPr><w:b/><w:sz w:val="22"/></w:rPr> <!-- key term inline: bold -->
<w:rPr><w:sz w:val="18"/></w:rPr> <!-- caption: regular, small -->
```
**Why better:** When everything is emphasized, nothing is emphasized. Bold should be a signal, not a default.
---
## 6. Color Abuse
### 6a. Rainbow Headings
**BAD: Each heading level a different bright color**
```xml
<w:rPr><w:color w:val="FF0000"/><w:sz w:val="40"/></w:rPr> <!-- H1: red -->
<w:rPr><w:color w:val="00AA00"/><w:sz w:val="32"/></w:rPr> <!-- H2: green -->
<w:rPr><w:color w:val="0000FF"/><w:sz w:val="26"/></w:rPr> <!-- H3: blue -->
```
**GOOD: Single accent color for headings, black or dark gray for body**
```xml
<!-- All headings use the same muted accent -->
<w:rPr><w:color w:val="1F4E79"/><w:sz w:val="40"/></w:rPr> <!-- H1: dark blue -->
<w:rPr><w:color w:val="1F4E79"/><w:sz w:val="32"/></w:rPr> <!-- H2: same blue -->
<w:rPr><w:color w:val="1F4E79"/><w:sz w:val="26"/></w:rPr> <!-- H3: same blue -->
<!-- Body in near-black -->
<w:rPr><w:color w:val="333333"/><w:sz w:val="22"/></w:rPr>
```
**Why better:** A single accent color establishes brand consistency. Multiple bright colors compete for attention and look unprofessional.
---
### 6b. Low Contrast — Light Gray on White
**BAD: #CCCCCC text on white background**
```xml
<w:rPr><w:color w:val="CCCCCC"/></w:rPr>
<!-- Contrast ratio: ~1.6:1 — fails WCAG AA (minimum 4.5:1) -->
```
**GOOD: #333333 text on white**
```xml
<w:rPr><w:color w:val="333333"/></w:rPr>
<!-- Contrast ratio: ~12:1 — passes WCAG AAA -->
```
**Why better:** Sufficient contrast is not just an accessibility requirement; it makes text physically easier to read for everyone, especially in printed documents.
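The contrast ratios quoted above can be checked with a short Python sketch of the WCAG 2.x relative-luminance formula. The function names are illustrative; the linearization threshold is written slightly differently across WCAG versions, which makes no difference for 8-bit color values.

```python
def _linear(channel: int) -> float:
    """Linearize one 8-bit sRGB channel."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an RRGGBB hex string (no leading '#')."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str = "FFFFFF") -> float:
    """Contrast ratio between two colors, lighter over darker."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio("CCCCCC"), 1))  # light gray on white
print(round(contrast_ratio("333333"), 1))  # dark gray on white
```

Body text needs at least 4.5:1 for WCAG AA and 7:1 for AAA, so any candidate `w:color` value can be checked this way before it goes into a style.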
---
### 6c. Bright Body Text
**BAD: Body text in a saturated color**
```xml
<w:rPr><w:color w:val="0066FF"/><w:sz w:val="22"/></w:rPr> <!-- blue body text -->
```
**GOOD: Color reserved for headings and inline accents only**
```xml
<!-- Body: neutral dark -->
<w:rPr><w:color w:val="333333"/><w:sz w:val="22"/></w:rPr>
<!-- Hyperlink: color is functional here -->
<w:rPr><w:color w:val="0563C1"/><w:u w:val="single"/></w:rPr>
```
**Why better:** Colored body text causes eye fatigue over long reading. Reserve color for elements that need to attract attention (headings, links, warnings).
---
## 7. List Formatting Issues
### 7a. Bullet at the Margin — No Indent
**BAD: List items start at the left margin**
```
┌──────────────────────────────────┐
│Here is a paragraph of text. │
│• First item │ ← Bullet at margin, no indent
│• Second item │
│• Third item │
│Next paragraph continues here. │
└──────────────────────────────────┘
```
```xml
<w:pPr>
<w:ind w:left="0" w:hanging="0"/>
</w:pPr>
```
**GOOD: 0.25in left indent with hanging indent for the bullet**
```
┌──────────────────────────────────┐
│Here is a paragraph of text. │
│ • First item │ ← Indented, clearly a list
│ • Second item │
│ • Third item │
│Next paragraph continues here. │
└──────────────────────────────────┘
```
```xml
<w:pPr>
<w:ind w:left="360" w:hanging="360"/> <!-- 0.25in = 360 twips -->
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/>
</w:numPr>
</w:pPr>
```
For nested lists, increment by 360 twips per level:
```xml
<!-- Level 1 -->
<w:ind w:left="720" w:hanging="360"/> <!-- 0.5in left -->
<!-- Level 2 -->
<w:ind w:left="1080" w:hanging="360"/> <!-- 0.75in left -->
```
**Why better:** Indentation visually separates lists from body text and makes nesting levels clear.
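The nesting rule above (360 twips more per level, with a constant 360-twip hanging indent) can be sketched as a small Python helper. The function name and the zero-based level numbering are assumptions made for illustration.

```python
INDENT_STEP = 360  # 0.25in in twips


def list_indent(level: int) -> dict:
    """Return w:ind attribute values for a zero-based list level."""
    return {"left": INDENT_STEP * (level + 1), "hanging": INDENT_STEP}


for level in range(3):
    print(level, list_indent(level))
```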
---
### 7b. List Items with Full Paragraph Spacing
**BAD: List items have the same 8-10pt spacing as body paragraphs**
```
┌──────────────────────────────────┐
│ • First item │
│ │ ← 10pt gap — looks like separate
│ • Second item │ paragraphs, not a list
│ │
│ • Third item │
└──────────────────────────────────┘
```
```xml
<w:spacing w:after="200"/> <!-- 10pt after each list item -->
```
**GOOD: Tight spacing between list items (2-4pt)**
```
┌──────────────────────────────────┐
│ • First item │
│ • Second item │ ← 2pt gap — cohesive list
│ • Third item │
└──────────────────────────────────┘
```
```xml
<w:spacing w:after="40" w:line="276" w:lineRule="auto"/> <!-- 2pt after -->
<!-- Or 4pt: -->
<w:spacing w:after="80"/>
```
**Why better:** Tight spacing groups list items as a single unit, matching how readers expect a list to behave.
---
## 8. Header/Footer Problems
### 8a. Header Text Too Large — Competes with Body
**BAD: Header in 12pt, same as body**
```
┌──────────────────────────────────┐
│ Quarterly Report - Q3 2025 │ ← 12pt header, same as body
│──────────────────────────────────│
│ Introduction │
│ This is the body text... │ ← 12pt body — header distracts
└──────────────────────────────────┘
```
```xml
<!-- Header paragraph -->
<w:rPr><w:sz w:val="24"/></w:rPr> <!-- 12pt, same as body -->
```
**GOOD: Header in 9pt, gray color, subtle**
```
┌──────────────────────────────────┐
│ Quarterly Report - Q3 2025 │ ← 9pt, gray — present but quiet
│──────────────────────────────────│
│ Introduction │
│ This is the body text... │ ← Body stands out as primary
└──────────────────────────────────┘
```
```xml
<!-- Header paragraph -->
<w:rPr>
<w:sz w:val="18"/> <!-- 9pt -->
<w:color w:val="808080"/> <!-- medium gray -->
</w:rPr>
<w:pPr>
<w:pBdr>
<w:bottom w:val="single" w:sz="4" w:color="D9D9D9"/> <!-- subtle separator -->
</w:pBdr>
</w:pPr>
```
**Why better:** Headers are reference information, not primary content. They should be legible but visually subordinate.
---
### 8b. No Page Numbers on a Long Document
**BAD: 20-page document with no page numbers**
```xml
<!-- Footer section: empty or missing -->
```
**GOOD: Page numbers in footer, right-aligned or centered**
```xml
<!-- Footer paragraph with page number field -->
<w:p>
<w:pPr>
<w:jc w:val="center"/>
<w:rPr><w:sz w:val="18"/><w:color w:val="808080"/></w:rPr>
</w:pPr>
<w:r>
<w:rPr><w:sz w:val="18"/><w:color w:val="808080"/></w:rPr>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText> PAGE </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
<w:t>1</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
</w:p>
```
**Why better:** Page numbers are essential for navigation in any document over ~3 pages. Readers need to reference specific pages, and printed documents need an ordering mechanism.
---
## 9. CJK-Specific Mistakes
### 9a. Using Italic for Chinese Emphasis
**BAD: Italic applied to Chinese text**
```xml
<w:rPr>
<w:i/>
<w:rFonts w:eastAsia="SimSun"/>
<w:sz w:val="24"/>
</w:rPr>
```
CJK glyphs have no true italic form. The renderer applies a synthetic slant that looks broken and ugly — characters appear to lean awkwardly.
**GOOD: Use bold or emphasis dots (着重号) for Chinese emphasis**
```xml
<!-- Option A: Bold emphasis -->
<w:rPr>
<w:b/>
<w:rFonts w:eastAsia="SimHei"/> <!-- Switch to bold-capable font -->
<w:sz w:val="24"/>
</w:rPr>
<!-- Option B: Emphasis marks (dots under characters) -->
<w:rPr>
<w:em w:val="dot"/>
<w:rFonts w:eastAsia="SimSun"/>
<w:sz w:val="24"/>
</w:rPr>
```
**Why better:** Chinese typography has its own emphasis traditions. Bold and emphasis dots are native CJK conventions; italic is a Latin-script concept that does not translate.
---
### 9b. Latin Font for Chinese Characters
**BAD: Only ASCII font set, no EastAsia font specified**
```xml
<w:rPr>
<w:rFonts w:ascii="Arial"/> <!-- No eastAsia attribute -->
<w:sz w:val="24"/>
</w:rPr>
<!-- With no w:eastAsia font on the run, Word uses whichever East Asian font
is inherited from the style or document defaults, or a substitute. Chinese
characters may render with wrong metrics or inconsistent stroke widths. -->
```
**GOOD: Explicit EastAsia font alongside ASCII font**
```xml
<w:rPr>
<w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:eastAsia="Microsoft YaHei"/>
<w:sz w:val="22"/>
</w:rPr>
```
For formal/academic Chinese documents:
```xml
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"
w:eastAsia="SimSun"/>
<w:sz w:val="24"/> <!-- 小四 12pt -->
</w:rPr>
```
**Why better:** Setting `w:eastAsia` ensures Chinese characters render in a font designed for CJK glyphs, with correct stroke widths, spacing, and metrics.
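A quick way to spot this problem is to scan the XML for `w:rFonts` elements that set no East Asian font. The Python sketch below uses only the standard library. The function name and the toy `sample` fragment are illustrative, and the check looks at direct formatting only, so a run that inherits its East Asian font from a style would still be reported.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def rfonts_missing_east_asia(xml_text: str) -> list:
    """Return the w:ascii names of w:rFonts elements with no East Asian font."""
    root = ET.fromstring(xml_text)
    missing = []
    for rfonts in root.iter(f"{{{W}}}rFonts"):
        has_east_asia = (
            f"{{{W}}}eastAsia" in rfonts.attrib
            or f"{{{W}}}eastAsiaTheme" in rfonts.attrib
        )
        if not has_east_asia:
            missing.append(rfonts.get(f"{{{W}}}ascii", "(unspecified)"))
    return missing


# Toy fragment for demonstration only, not a complete document.xml.
sample = f"""<w:document xmlns:w="{W}">
  <w:rPr><w:rFonts w:ascii="Arial"/></w:rPr>
  <w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:eastAsia="Microsoft YaHei"/></w:rPr>
</w:document>"""

print(rfonts_missing_east_asia(sample))
```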
---
### 9c. English Line Spacing for Dense CJK Text
**BAD: 1.15x line spacing for Chinese body text**
```xml
<w:spacing w:line="276" w:lineRule="auto"/> <!-- 1.15x — too tight for CJK -->
```
CJK characters are taller and denser than Latin letters. At 1.15x, lines of Chinese text feel cramped and hard to read.
**GOOD: 1.5x line spacing or fixed 28pt for CJK body at 12pt (小四)**
```xml
<!-- Option A: 1.5x proportional -->
<w:spacing w:line="360" w:lineRule="auto"/> <!-- 360/240 = 1.5x -->
<!-- Option B: Fixed 28pt (standard for 小四/12pt CJK body) -->
<w:spacing w:line="560" w:lineRule="exact"/> <!-- 28pt = 560 twips -->
```
For 公文 (government documents) at 三号/16pt body:
```xml
<w:spacing w:line="580" w:lineRule="exact"/> <!-- 29pt fixed line spacing -->
```
**Why better:** CJK characters occupy a full em square with no ascenders/descenders providing natural gaps. Extra line spacing compensates, improving readability of dense text blocks.
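The two `w:line` modes use different units, which is easy to mix up: with `lineRule="auto"` the value is in 240ths of a line, while with `lineRule="exact"` it is in twips (20ths of a point). A minimal Python sketch with hypothetical helper names:

```python
def auto_line(multiple: float) -> int:
    """Proportional spacing -> w:line in 240ths of a line (lineRule="auto")."""
    return round(multiple * 240)


def exact_line(points: float) -> int:
    """Fixed spacing in points -> w:line in twips (lineRule="exact")."""
    return round(points * 20)


print(auto_line(1.5))   # 1.5x proportional
print(exact_line(28))   # fixed 28pt
print(exact_line(29))   # fixed 29pt
```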
---
## 10. Overall Document Feel
### Student Homework vs Professional Document
**BAD: "Student homework" — every setting is Word's default, no intentional choices**
```xml
<!-- Default everything: Calibri 11pt, no heading styles, 1.08 spacing -->
<w:rPr><w:rFonts w:ascii="Calibri"/><w:sz w:val="22"/></w:rPr>
<w:pPr><w:spacing w:after="160" w:line="259" w:lineRule="auto"/></w:pPr>
<!-- Headings: just bold body text, no style applied -->
<w:rPr><w:b/><w:sz w:val="22"/></w:rPr>
<!-- No section breaks, no headers/footers, no page numbers -->
<!-- Tables with default full grid borders -->
<!-- No intentional color or spacing variations -->
```
**GOOD: Intentional design at every level**
```xml
<!-- Theme fonts defined -->
<w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi"/>
<!-- H1: Calibri Light 20pt, dark blue, generous spacing -->
<w:pPr>
<w:pStyle w:val="Heading1"/>
<w:spacing w:before="480" w:after="200"/>
</w:pPr>
<w:rPr>
<w:rFonts w:ascii="Calibri Light"/>
<w:color w:val="1F4E79"/>
<w:sz w:val="40"/>
</w:rPr>
<!-- H2: Calibri Light 16pt, same blue -->
<w:pPr>
<w:pStyle w:val="Heading2"/>
<w:spacing w:before="320" w:after="120"/>
</w:pPr>
<w:rPr>
<w:rFonts w:ascii="Calibri Light"/>
<w:color w:val="1F4E79"/>
<w:sz w:val="32"/>
</w:rPr>
<!-- Body: Calibri 11pt, dark gray, 1.15 spacing, 8pt after -->
<w:pPr>
<w:spacing w:after="160" w:line="276" w:lineRule="auto"/>
</w:pPr>
<w:rPr>
<w:rFonts w:ascii="Calibri"/>
<w:color w:val="333333"/>
<w:sz w:val="22"/>
</w:rPr>
<!-- Tables: three-line style, padded cells, repeated headers -->
<!-- Headers/footers: 9pt gray with page numbers -->
<!-- Margins: 1in all around -->
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/>
```
**Why better:** Professional documents result from deliberate, consistent choices across all design dimensions. Each element reinforces the same visual language. The reader may not consciously notice good typography, but they feel the difference in credibility and readability.
---
## Quick Reference: Safe Defaults
A cheat sheet of values that produce a professional result for most Western business documents:
| Element | Value | OpenXML |
|---------|-------|---------|
| Body font | Calibri 11pt | `<w:sz w:val="22"/>` |
| H1 | Calibri Light 20pt | `<w:sz w:val="40"/>` |
| H2 | Calibri Light 16pt | `<w:sz w:val="32"/>` |
| H3 | Calibri 13pt bold | `<w:sz w:val="26"/>`, `<w:b/>` |
| Body color | #333333 | `<w:color w:val="333333"/>` |
| Heading color | #1F4E79 | `<w:color w:val="1F4E79"/>` |
| Line spacing | 1.15x | `w:line="276" w:lineRule="auto"` |
| Para spacing after | 8pt | `w:after="160"` |
| H1 spacing | 24pt before, 10pt after | `w:before="480" w:after="200"` |
| H2 spacing | 16pt before, 6pt after | `w:before="320" w:after="120"` |
| Margins | 1in all around | `w:pgMar` all `"1440"` |
| Table cell padding | 0.08in / 0.12in | `w:w="115"` / `w:w="173"` |
| Header/footer size | 9pt gray | `<w:sz w:val="18"/>`, `<w:color w:val="808080"/>` |
| List indent | 0.25in per level | `w:left="360" w:hanging="360"` |
| List item spacing | 2pt after | `w:after="40"` |
For CJK documents, adjust: body font to SimSun/YaHei, line spacing to 1.5x (`w:line="360"`), and set `w:eastAsia` on all `w:rFonts`.

@@ -0,0 +1,819 @@
# Design Principles for Document Typography
WHY certain typographic choices look good -- the perceptual and psychological
reasons behind professional document design. Use this to make judgment calls
when exact specs are not provided.
## Table of Contents
1. [White Space & Breathing Room](#1-white-space--breathing-room)
2. [Contrast & Scale](#2-contrast--scale)
3. [Proximity & Grouping](#3-proximity--grouping)
4. [Alignment & Grid](#4-alignment--grid)
5. [Repetition & Consistency](#5-repetition--consistency)
6. [Visual Hierarchy & Flow](#6-visual-hierarchy--flow)
---
## 1. White Space & Breathing Room
### Why It Works
The human eye does not read continuously. It jumps in saccades, fixating on
small clusters of words. White space provides landing zones for these fixations
and gives the reader's peripheral vision a "frame" that makes each text block
feel manageable. When a page is packed to the edges, every glance returns more
text than working memory can buffer, triggering fatigue and avoidance.
As a rule of thumb for content density (the share of the page covered by text):
- **60-70% content coverage** feels comfortable and professional.
- **80%+** starts to feel dense and bureaucratic.
- **90%+** feels oppressive -- the reader unconsciously rushes or skips.
- **Below 50%** feels wasteful or pretentious (unless intentional, like poetry).
Wider margins also carry cultural signals. Academic and luxury documents use
generous margins (1.25-1.5 inches). Internal memos and drafts use narrower
margins (0.75-1.0 inches). The margin width tells the reader how much care
went into the document before they read a single word.
Line spacing has a direct physiological basis: the eye must track back to the
start of the next line after each line break. If lines are too close, the eye
"slips" to the wrong line. If too far apart, the eye loses its sense of
continuity. The sweet spot is 120-145% of the font size.
**Rule of thumb: when in doubt, add more space, not less.**
### Good Example
```
Margins: 1 inch (1440 twips) all sides for business documents.
Line spacing: 1.15 (w:line="276"; in auto mode the unit is 1/240 of a line, so 276/240 = 1.15).
Paragraph spacing after: 8pt (160 twips) between body paragraphs.
```
```xml
<!-- Page margins: 1 inch = 1440 twips on all sides -->
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
w:header="720" w:footer="720" w:gutter="0"/>
<!-- Body paragraph: 1.15 line spacing, 8pt after -->
<w:pPr>
<w:spacing w:after="160" w:line="276" w:lineRule="auto"/>
</w:pPr>
```
This produces a page where the text block occupies roughly 63% of the area
(on US Letter or A4). The reader sees clear top/bottom breathing room, and
paragraphs are distinct without feeling disconnected.
```
Page layout (good):
+----------------------------------+
| 1" margin |
| +------------------------+ |
| | Heading | |
| | | |
| | Body text here with | |
| | comfortable spacing | |
| | between lines. | |
| | | | <- visible gap between paragraphs
| | Another paragraph of | |
| | body text follows. | |
| | | |
| +------------------------+ |
| 1" margin |
+----------------------------------+
```
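How much of the page the text block covers follows directly from the page size and margins. The Python sketch below assumes a US Letter page (8.5 x 11 in) and uniform margins, neither of which is stated in the original. It measures only the text-block rectangle and ignores line and paragraph spacing inside it.

```python
def text_block_share(page_w_in: float, page_h_in: float, margin_in: float) -> float:
    """Fraction of the page area that lies inside uniform margins."""
    inner = (page_w_in - 2 * margin_in) * (page_h_in - 2 * margin_in)
    return inner / (page_w_in * page_h_in)


print(f"{text_block_share(8.5, 11, 1.0):.1%}")  # 1in margins
print(f"{text_block_share(8.5, 11, 0.5):.1%}")  # 0.5in margins
```

Halving the margins from 1in to 0.5in moves the text block from the comfortable range into the dense one.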
### Bad Example
```xml
<!-- Cramped margins: 0.5 inch = 720 twips -->
<w:pgMar w:top="720" w:right="720" w:bottom="720" w:left="720"
w:header="360" w:footer="360" w:gutter="0"/>
<!-- No paragraph spacing, single line spacing -->
<w:pPr>
<w:spacing w:after="0" w:line="240" w:lineRule="auto"/>
</w:pPr>
```
This fills ~80% of the page. Text runs nearly edge-to-edge with no visual rest
stops. The reader sees a wall of text.
```
Page layout (bad):
+----------------------------------+
| Heading |
| Body text crammed right up to |
| the margins with no spacing |
| between lines or paragraphs. |
| Another paragraph starts here |
| and the reader cannot tell where |
| one idea ends and another begins |
| because everything blurs into a |
| single dense block of text. |
+----------------------------------+
```
### Quick Test
1. Zoom out to 50% in your document viewer. If you cannot see clear "channels"
of white between text blocks, the spacing is too tight.
2. Print a test page. Hold it at arm's length. The text area should look like
a rectangle floating in white, not filling the page.
3. Check: is the line spacing value at least 264 (`w:line` for 1.1x) for body
text? If it is 240 (single), it is too tight for anything over 10pt.
---
## 2. Contrast & Scale
### Why It Works
The brain processes visual hierarchy through relative difference, not absolute
size. A 20pt heading above 11pt body text creates a clear "this is important"
signal. But if every heading is 20pt and every sub-heading is 19pt, the brain
cannot distinguish them -- they merge into the same level.
The key insight is **modular scale**: font sizes that grow by a consistent
ratio. This mirrors natural proportions and feels harmonious for the same
reason musical intervals do.
Common scales and their character:
| Ratio | Name | Character | Example progression (from 11pt) |
|-------|----------------|---------------------------------|---------------------------------|
| 1.200 | Minor third | Subtle, refined | 11 → 13.2 → 15.8 → 19.0 |
| 1.250 | Major third | Balanced, professional | 11 → 13.75 → 17.2 → 21.5 |
| 1.333 | Perfect fourth | Strong, authoritative | 11 → 14.7 → 19.5 → 26.0 |
| 1.414 | Augmented 4th | Dramatic, presentation-style | 11 → 15.6 → 22.0 → 31.1 |
For most business documents, 1.25 (major third) works best:
```
Body = 11pt (w:sz="22")
H3 = 13pt (w:sz="26") -- 11 * 1.25 = 13.75, round down to 13
H2 = 16pt (w:sz="32") -- 13 * 1.25 = 16.25, round down to 16
H1 = 20pt (w:sz="40") -- 16 * 1.25 = 20
```
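The progression above chains the ratio from one level to the next and drops the fraction at each step. A minimal Python sketch of that procedure follows. The function name is illustrative, and rounding down to a whole point is an assumption chosen to match the sizes listed above.

```python
import math


def heading_sizes(body_pt: int, ratio: float, levels: int) -> list:
    """Chain the ratio upward from the body size, rounding each step down."""
    sizes = [body_pt]
    for _ in range(levels):
        sizes.append(math.floor(sizes[-1] * ratio))
    return sizes


sizes = heading_sizes(11, 1.25, 3)        # body, H3, H2, H1
half_points = [2 * s for s in sizes]      # w:sz is measured in half-points
print(sizes)
print(half_points)
```

Because each step is rounded before the next is computed, the effective ratios vary slightly from level to level. That is acceptable as long as every step stays visibly distinct.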
Beyond size, **weight contrast** creates hierarchy without consuming vertical
space. Regular (400) vs Bold (700) is visible at any size. Semi-bold (600) vs
Regular is subtle and best avoided unless you also vary size or color.
**Color contrast** adds a third dimension. Dark blue headings (#1F3864) against
softer dark gray body text (#333333) signals "heading" without needing a huge
size jump. Pure black (#000000) body text is harsher than necessary on white
backgrounds -- #333333 or #2D2D2D reduces glare without losing legibility.
### Good Example
```xml
<!-- H1: 20pt, bold, dark navy -->
<w:rPr>
<w:b/>
<w:sz w:val="40"/>
<w:color w:val="1F3864"/>
</w:rPr>
<!-- H2: 16pt, bold, dark navy -->
<w:rPr>
<w:b/>
<w:sz w:val="32"/>
<w:color w:val="1F3864"/>
</w:rPr>
<!-- H3: 13pt, bold, dark navy -->
<w:rPr>
<w:b/>
<w:sz w:val="26"/>
<w:color w:val="1F3864"/>
</w:rPr>
<!-- Body: 11pt, regular, dark gray -->
<w:rPr>
<w:sz w:val="22"/>
<w:color w:val="333333"/>
</w:rPr>
```
```
Visual hierarchy (good):
[████████████████████] <- H1: 20pt bold navy (clearly dominant)
(generous space)
[██████████████] <- H2: 16pt bold navy (distinct step down)
(moderate space)
[████████████] <- H3: 13pt bold navy (smaller but still bold)
[░░░░░░░░░░░░░░░░░░░░░░] <- Body: 11pt regular gray
[░░░░░░░░░░░░░░░░░░░░░░]
[░░░░░░░░░░░░░░░░░░░░░░]
```
Each level is visually distinct from its neighbors. You can identify the
hierarchy even in peripheral vision.
### Bad Example
```xml
<!-- H1: 14pt bold black -->
<w:rPr>
<w:b/>
<w:sz w:val="28"/>
<w:color w:val="000000"/>
</w:rPr>
<!-- H2: 13pt bold black -->
<w:rPr>
<w:b/>
<w:sz w:val="26"/>
<w:color w:val="000000"/>
</w:rPr>
<!-- H3: 12pt bold black -->
<w:rPr>
<w:b/>
<w:sz w:val="24"/>
<w:color w:val="000000"/>
</w:rPr>
<!-- Body: 12pt regular black -->
<w:rPr>
<w:sz w:val="24"/>
<w:color w:val="000000"/>
</w:rPr>
```
Problems:
- H3 (12pt bold) and body (12pt regular) differ only by weight -- too subtle.
- H1 (14pt) to H2 (13pt) is a 1pt step -- invisible at reading distance.
- Everything is pure black so color provides no differentiating signal.
- The ratio between adjacent levels is at most ~1.08 (14/13 and 13/12), far too flat.
### Quick Test
1. **The squint test**: blur your eyes or step back from the screen. Can you
count the number of heading levels? If two levels merge, their contrast
is insufficient.
2. **Ratio check**: divide each heading size by the next smaller size. If any
ratio is below 1.15, the levels will look too similar.
3. **Color check**: do headings look distinct from body text when you glance
at the page? If everything is the same color, you are relying solely on
size/weight, which limits your hierarchy to ~3 effective levels.
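The ratio check in step 2 is easy to automate. The Python sketch below takes the point sizes of all levels (headings plus body) and reports adjacent pairs whose ratio falls under the 1.15 threshold. The function name is hypothetical.

```python
def flat_steps(sizes_pt, min_ratio=1.15):
    """Return (larger, smaller, ratio) for adjacent levels that are too similar."""
    ordered = sorted(sizes_pt, reverse=True)
    return [
        (a, b, a / b)
        for a, b in zip(ordered, ordered[1:])
        if a / b < min_ratio
    ]


print(flat_steps([20, 16, 13, 11]))  # the good example: nothing flagged
print(flat_steps([14, 13, 12, 12]))  # the bad example: every step flagged
```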
---
## 3. Proximity & Grouping
### Why It Works
The Gestalt principle of proximity: items that are close together are perceived
as belonging to the same group. In document typography, this means a heading
must be **closer to the content it introduces** than to the content above it.
If a heading sits equidistant between two paragraphs, it looks orphaned -- the
reader's eye does not know if it belongs to the text above or below. The fix
is asymmetric spacing: **large space before the heading, small space after**.
The recommended ratio is 2:1 or 3:1 (space-before : space-after).
This same principle applies to:
- **List items**: spacing between items should be less than spacing between
paragraphs. Items in a list are a group and should visually cluster.
- **Captions**: a figure caption should be close to its figure, not floating
in the middle between the figure and the next paragraph.
- **Table titles**: the title sits close above the table, with more space
separating the title from preceding text.
### Good Example
```xml
<!-- H2: 18pt before, 6pt after (3:1 ratio) -->
<w:pPr>
<w:pStyle w:val="Heading2"/>
<w:spacing w:before="360" w:after="120"/>
</w:pPr>
<!-- Body paragraph: 0pt before, 8pt after -->
<w:pPr>
<w:spacing w:before="0" w:after="160"/>
</w:pPr>
<!-- List item: 0pt before, 2pt after (tight grouping) -->
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<w:spacing w:before="0" w:after="40"/>
</w:pPr>
```
```
Proximity (good):
...end of previous section text.
<- 18pt gap (w:before="360")
## Section Heading
<- 6pt gap (w:after="120")
First paragraph of new section
continues here with content.
<- 8pt gap (w:after="160")
Second paragraph follows.
The heading clearly "belongs to" the text below it.
```
```
List grouping (good):
Consider these factors:
- First item <- 2pt gap between items
- Second item <- items cluster as a group
- Third item
<- 8pt gap after list
The next paragraph starts here.
```
### Bad Example
```xml
<!-- H2: 12pt before, 12pt after (1:1 ratio -- orphaned heading) -->
<w:pPr>
<w:pStyle w:val="Heading2"/>
<w:spacing w:before="240" w:after="240"/>
</w:pPr>
<!-- List item: same spacing as body (10pt after) -->
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<w:spacing w:before="0" w:after="200"/>
</w:pPr>
```
```
Proximity (bad):
...end of previous section text.
<- 12pt gap
## Section Heading
<- 12pt gap (same!)
First paragraph of new section.
The heading floats between sections. It is unclear what it belongs to.
```
```
List grouping (bad):
Consider these factors:
<- 10pt gap
- First item
<- 10pt gap (same as paragraphs)
- Second item
<- 10pt gap
- Third item
<- 10pt gap
Next paragraph.
The list does not feel like a group. Each item looks like a
separate paragraph that happens to have a bullet.
```
### Quick Test
1. **Cover test**: cover the heading text. Looking only at the whitespace,
can you tell which block of text the heading belongs to? If the gaps above
and below are equal, the answer is "no."
2. **Number check**: `w:before` on headings should be at least 2x `w:after`.
Common good values: before=360 / after=120, or before=240 / after=80.
3. **List check**: `w:after` on list items should be less than half of
`w:after` on body paragraphs. If body uses 160, list items should use
40-60.
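The number check and list check above reduce to two comparisons on twip values. A minimal Python sketch, with illustrative function names:

```python
def heading_spacing_ok(before: int, after: int) -> bool:
    """Heading w:before should be at least twice w:after."""
    return before >= 2 * after


def list_spacing_ok(list_after: int, body_after: int) -> bool:
    """List-item w:after should be less than half the body w:after."""
    return list_after < body_after / 2


print(heading_spacing_ok(360, 120))  # good example: 3:1
print(heading_spacing_ok(240, 240))  # bad example: 1:1
print(list_spacing_ok(40, 160))      # tight list
print(list_spacing_ok(200, 160))     # list looser than body text
```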
---
## 4. Alignment & Grid
### Why It Works
Alignment creates invisible lines that the eye follows down the page. When
elements share the same left edge, the reader perceives order and intention.
When elements are slightly misaligned (off by a few twips), the page looks
sloppy even if the reader cannot consciously identify why.
**Left-align vs Justify:**
- **Left-aligned** (ragged right) is best for English and other Latin-script
languages. The uneven right edge actually helps reading because each line
has a unique silhouette, making it easier for the eye to find the next line.
Justified text forces uneven word spacing that creates distracting "rivers"
of white running vertically through paragraphs.
- **Justified** is best for CJK text. Chinese, Japanese, and Korean characters
are monospaced by design -- each occupies the same cell in an invisible grid.
Justification preserves this grid perfectly. Ragged right in CJK text breaks
the grid and looks untidy.
**Indentation rule:** Use first-line indent OR paragraph spacing to separate
paragraphs -- never both. They serve the same purpose (marking paragraph
boundaries). Using both wastes space and creates visual stutter.
- Western convention: paragraph spacing (no indent) is more modern.
- CJK convention: first-line indent of 2 characters is standard.
- Academic convention: first-line indent of 0.5 inch is traditional.
### Good Example
```xml
<!-- English body: left-aligned, paragraph spacing, no indent -->
<w:pPr>
<w:jc w:val="left"/>
<w:spacing w:after="160" w:line="276" w:lineRule="auto"/>
<!-- No w:ind firstLine -->
</w:pPr>
<!-- CJK body: justified, first-line indent 2 chars, no paragraph spacing -->
<w:pPr>
<w:jc w:val="both"/>
<w:spacing w:after="0" w:line="360" w:lineRule="auto"/>
<w:ind w:firstLineChars="200"/>
</w:pPr>
<!-- Tab stops creating aligned columns -->
<w:pPr>
<w:tabs>
<w:tab w:val="left" w:pos="2880"/> <!-- 2 inches -->
<w:tab w:val="right" w:pos="9360"/> <!-- 6.5 inches (right margin) -->
</w:tabs>
</w:pPr>
```
```
English paragraph separation (good -- spacing, no indent):
This is the first paragraph with some text
that wraps to a second line naturally.
This is the second paragraph. The gap above
clearly marks the boundary.
CJK paragraph separation (good -- indent, no spacing):
  第一段正文内容从这里开始,使用两个字符
的首行缩进来标记段落边界。
  第二段紧跟其后,没有段间距,但首行缩进
清晰地标识了新段落的开始。
```
### Bad Example
```xml
<!-- English body: justified (creates word-spacing rivers) -->
<w:pPr>
<w:jc w:val="both"/>
<w:spacing w:after="160" w:line="276" w:lineRule="auto"/>
<w:ind w:firstLine="720"/> <!-- BOTH indent AND spacing: redundant -->
</w:pPr>
<!-- CJK body: left-aligned (breaks character grid) -->
<w:pPr>
<w:jc w:val="left"/>
<w:spacing w:after="200" w:line="276" w:lineRule="auto"/>
<!-- No indent, using spacing instead -- unidiomatic for CJK -->
</w:pPr>
```
Problems:
- Justified English text with narrow columns creates uneven word gaps.
- Using both first-line indent AND paragraph spacing is redundant.
- Left-aligned CJK breaks the character grid that CJK readers expect.
- CJK with spacing-based separation looks like translated western layout.
### Quick Test
1. **River test**: in justified English text, squint and look for vertical
white streaks running through the paragraph. If you see them, switch to
left-align or increase the column width.
2. **Double signal check**: does the document use BOTH first-line indent AND
paragraph spacing? If yes, remove one. Choose indent for CJK/academic,
spacing for modern western.
3. **Tab alignment**: if you use tabs for columns, do all tab stops across
the document use the same positions? Inconsistent tab stops create jagged
invisible grid lines.
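The double signal check can be run against a `w:pPr` fragment with the standard library. The Python sketch below is illustrative only: the function name is hypothetical, and it inspects direct paragraph properties without resolving values inherited from styles.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def has_double_signal(ppr_xml: str) -> bool:
    """True if a w:pPr fragment sets both a first-line indent and space after."""
    ppr = ET.fromstring(ppr_xml)
    ind = ppr.find(f"{{{W}}}ind")
    spacing = ppr.find(f"{{{W}}}spacing")

    def attr(element, name):
        # A missing element or attribute counts as 0.
        if element is None:
            return 0
        return int(element.get(f"{{{W}}}{name}", "0"))

    indented = attr(ind, "firstLine") > 0 or attr(ind, "firstLineChars") > 0
    spaced = attr(spacing, "after") > 0
    return indented and spaced


bad = f'<w:pPr xmlns:w="{W}"><w:spacing w:after="160"/><w:ind w:firstLine="720"/></w:pPr>'
good = f'<w:pPr xmlns:w="{W}"><w:spacing w:after="0"/><w:ind w:firstLineChars="200"/></w:pPr>'
print(has_double_signal(bad), has_double_signal(good))
```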
---
## 5. Repetition & Consistency
### Why It Works
Consistency is a trust signal. When a reader sees that every H2 looks the same,
every table follows the same pattern, and every page number sits in the same
spot, they unconsciously trust that the document was crafted with care. A single
inconsistency -- one H2 that is 15pt instead of 16pt, one table with different
borders -- breaks that trust and makes the reader question the content.
Consistency also reduces cognitive load. Once the reader learns "bold dark blue
= section heading," they stop spending mental effort on identifying structure
and focus entirely on content. Every inconsistency forces them to re-evaluate:
"Is this a different kind of heading, or did someone just forget to apply the
style?"
The implementation rule is simple: **use named styles, not direct formatting.**
If you define Heading2 as a style and apply it everywhere, consistency is
automatic. If you manually set font size, bold, and color on each heading
individually, inconsistency is inevitable.
### Good Example
```xml
<!-- Define styles once in styles.xml -->
<w:style w:type="paragraph" w:styleId="Heading2">
<w:name w:val="heading 2"/>
<w:basedOn w:val="Normal"/>
<w:next w:val="Normal"/>
<w:pPr>
<w:keepNext/>
<w:keepLines/>
<w:spacing w:before="360" w:after="120"/>
<w:outlineLvl w:val="1"/>
</w:pPr>
<w:rPr>
<w:rFonts w:asciiTheme="majorHAnsi" w:hAnsiTheme="majorHAnsi"/>
<w:b/>
<w:color w:val="1F3864"/>
<w:sz w:val="32"/>
</w:rPr>
</w:style>
<!-- Apply consistently: every H2 references the style -->
<w:p>
<w:pPr>
<w:pStyle w:val="Heading2"/>
<!-- No direct formatting overrides -->
</w:pPr>
<w:r><w:t>Market Analysis</w:t></w:r>
</w:p>
```
When using a table style, define it once and reference it for every table:
```xml
<!-- All tables reference the same style -->
<w:tblPr>
<w:tblStyle w:val="GridTable4Accent1"/>
<w:tblW w:w="0" w:type="auto"/>
</w:tblPr>
```
### Bad Example
```xml
<!-- First H2: manually formatted -->
<w:p>
<w:pPr>
<w:spacing w:before="360" w:after="120"/>
</w:pPr>
<w:r>
<w:rPr>
<w:b/>
<w:color w:val="1F3864"/>
<w:sz w:val="32"/>
</w:rPr>
<w:t>Market Analysis</w:t>
</w:r>
</w:p>
<!-- Second H2: slightly different (15pt instead of 16pt) -->
<w:p>
<w:pPr>
<w:spacing w:before="240" w:after="160"/> <!-- different spacing! -->
</w:pPr>
<w:r>
<w:rPr>
<w:b/>
<w:color w:val="2E74B5"/> <!-- different shade of blue! -->
<w:sz w:val="30"/> <!-- 15pt instead of 16pt! -->
</w:rPr>
<w:t>Financial Overview</w:t>
</w:r>
</w:p>
```
Problems:
- No style references -- everything is direct formatting.
- Second H2 has different size (30 vs 32), color, and spacing.
- If there are 20 headings, each could drift slightly differently.
- Changing the design later means editing every heading individually.
### Quick Test
1. **Style audit**: does every paragraph reference a `w:pStyle`? If you find
paragraphs with only direct formatting and no style, that is a consistency
risk.
2. **Search for variance**: search the XML for all `w:sz` values used with
`w:b` (bold). If you find three different sizes for what should be the same
heading level, there is an inconsistency.
3. **Table check**: do all tables in the document reference the same
`w:tblStyle`? If some tables have manual border definitions while others
use a style, the document will look patchy.
4. **Page numbers**: check that header/footer content is defined in the
default section properties and inherited by all sections, not redefined
inconsistently in each section.
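
The style audit in step 1 can be sketched in a few lines. This is an illustration under stated assumptions: the helper name `unstyled_paragraphs` and the sample markup are hypothetical, and the XML is passed in as a string rather than read from a package.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def unstyled_paragraphs(xml_text):
    """Return the text of every w:p that has no w:pStyle reference."""
    root = ET.fromstring(xml_text)
    missing = []
    for para in root.iter(f"{{{W}}}p"):
        style = para.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        if style is None:
            # Join all w:t runs so the paragraph can be identified in a report.
            missing.append("".join(t.text or "" for t in para.iter(f"{{{W}}}t")))
    return missing


# Hypothetical body: one styled heading, one heading faked with direct formatting.
SAMPLE = f"""<w:body xmlns:w="{W}">
  <w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>Market Analysis</w:t></w:r></w:p>
  <w:p><w:r><w:rPr><w:b/><w:sz w:val="30"/></w:rPr><w:t>Financial Overview</w:t></w:r></w:p>
</w:body>"""

print(unstyled_paragraphs(SAMPLE))
```

Paragraphs without `w:pStyle` fall back to the default paragraph style, so ordinary body text will also show up in this list; the interesting hits are the ones that carry bold or size overrides in their runs.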
---
## 6. Visual Hierarchy & Flow
### Why It Works
A well-designed document guides the reader's eye in a predictable path:
title at the top, subtitle below it, section headings as signposts, body text
as the main content, footnotes and captions as supporting details. This flow
mirrors reading priority -- the most important information is the most visually
prominent.
Each level in the hierarchy must be **distinguishable from its adjacent
levels**. It is not enough for H1 to differ from body text; H1 must also
clearly differ from H2, and H2 from H3. If any two adjacent levels are too
similar, the hierarchy collapses at that point.
Effective hierarchy uses **multiple simultaneous signals**:
| Level | Size | Weight | Color | Spacing above |
|----------|-------|---------|---------|---------------|
| Title | 26pt | Bold | #1F3864 | 0 (top) |
| Subtitle | 15pt | Regular | #4472C4 | 4pt |
| H1 | 20pt | Bold | #1F3864 | 24pt |
| H2 | 16pt | Bold | #1F3864 | 18pt |
| H3 | 13pt | Bold | #1F3864 | 12pt |
| Body | 11pt | Regular | #333333 | 0pt |
| Caption | 9pt | Italic | #666666 | 4pt |
| Footnote | 9pt | Regular | #666666 | 0pt |
Notice how each level differs from its neighbors on at least two of these
dimensions (size + weight, size + color, or, for H1/H2/H3, size + spacing
above). Single-dimension differences are fragile and can be missed.
**Section breaks** create rhythm in long documents. A page break before each
major section (H1) gives the reader a mental reset. Within sections, consistent
heading + body patterns create a predictable cadence that makes long documents
less intimidating.
### Good Example
```xml
<!-- Title: large, bold, navy, centered -->
<w:style w:type="paragraph" w:styleId="Title">
<w:pPr>
<w:spacing w:after="80"/>
<w:jc w:val="center"/>
</w:pPr>
<w:rPr>
<w:b/>
<w:color w:val="1F3864"/>
<w:sz w:val="52"/>
</w:rPr>
</w:style>
<!-- Subtitle: medium, regular weight, lighter blue, centered -->
<w:style w:type="paragraph" w:styleId="Subtitle">
<w:pPr>
<w:spacing w:after="320"/>
<w:jc w:val="center"/>
</w:pPr>
<w:rPr>
<w:color w:val="4472C4"/>
<w:sz w:val="30"/>
</w:rPr>
</w:style>
<!-- H1: page break before, large bold navy -->
<w:style w:type="paragraph" w:styleId="Heading1">
<w:pPr>
<w:keepNext/>
<w:keepLines/>
<w:pageBreakBefore/>
<w:spacing w:before="480" w:after="160"/>
<w:outlineLvl w:val="0"/>
</w:pPr>
<w:rPr>
<w:b/>
<w:color w:val="1F3864"/>
<w:sz w:val="40"/>
</w:rPr>
</w:style>
<!-- Caption: small, italic, gray -->
<w:style w:type="paragraph" w:styleId="Caption">
<w:pPr>
<w:spacing w:before="80" w:after="200"/>
</w:pPr>
<w:rPr>
<w:i/>
<w:color w:val="666666"/>
<w:sz w:val="18"/>
</w:rPr>
</w:style>
```
```
Visual flow (good):
+----------------------------------+
| |
| ANNUAL REPORT 2025 | <- Title: 26pt bold navy centered
| Acme Corporation | <- Subtitle: 15pt regular blue
| |
| |
+----------------------------------+
+----------------------------------+
| |
| 1. Executive Summary | <- H1: 20pt bold navy (page break)
| |
| Body text introducing the | <- Body: 11pt regular gray
| main findings of the year. |
| |
| 1.1 Revenue Highlights | <- H2: 16pt bold navy
| |
| Revenue grew by 23% year | <- Body
| over year, driven by... |
| |
| Figure 1: Revenue Growth | <- Caption: 9pt italic gray
| |
+----------------------------------+
Each level is immediately identifiable. The eye flows naturally
from title -> heading -> body -> caption.
```
### Bad Example
```xml
<!-- All headings same color as body, minimal size difference -->
<w:style w:type="paragraph" w:styleId="Heading1">
<w:rPr>
<w:b/>
<w:color w:val="000000"/> <!-- same color as body -->
<w:sz w:val="28"/> <!-- 14pt: only 3pt above body -->
</w:rPr>
</w:style>
<!-- Caption same size as body, not italic -->
<w:style w:type="paragraph" w:styleId="Caption">
<w:rPr>
<w:color w:val="000000"/> <!-- same color as body -->
<w:sz w:val="22"/> <!-- same 11pt as body! -->
</w:rPr>
</w:style>
<!-- No page breaks between major sections -->
<!-- H1 has no pageBreakBefore, keepNext, or keepLines -->
```
Problems:
- H1 at 14pt is too close to body at 11pt (ratio 1.27 -- acceptable in
isolation but with black color matching body, the hierarchy is weak).
- Caption is indistinguishable from body text.
- No page breaks means major sections bleed into each other with no
visual rhythm.
- Everything is black, so color provides zero hierarchy signal.
### Quick Test
1. **The squint test**: blur your eyes while looking at a full page. You
should see 3-4 distinct "weight levels" of gray. If the page looks like
one uniform shade, the hierarchy is too flat.
2. **The scan test**: flip through pages quickly. Can you identify section
boundaries in under one second per page? If yes, the visual hierarchy is
working. If pages blur together, you need stronger differentiation at H1.
3. **Adjacent level test**: for each heading level, check that it differs
from the next level on at least 2 of: size, weight, color, style (italic),
spacing above. Single-dimension differences get lost.
4. **Rhythm test**: in a document over 10 pages, do major sections (H1) start
on new pages? If not, long documents will feel like an undifferentiated
stream. Add `w:pageBreakBefore` to Heading1.
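
The adjacent level test lends itself to a mechanical check. The sketch below counts how many of the four columns from the hierarchy table earlier in this section (size, weight, color, spacing above) differ between neighboring levels. The names `LEVELS`, `FLAT`, and `weak_pairs` are illustrative, and the spacing values in `FLAT` are an assumption, since the bad example does not state them.

```python
# (name, size in pt, weight, color, space above in pt), copied from the hierarchy table.
LEVELS = [
    ("Title", 26, "bold", "1F3864", 0),
    ("Subtitle", 15, "regular", "4472C4", 4),
    ("H1", 20, "bold", "1F3864", 24),
    ("H2", 16, "bold", "1F3864", 18),
    ("H3", 13, "bold", "1F3864", 12),
    ("Body", 11, "regular", "333333", 0),
    ("Caption", 9, "italic", "666666", 4),
    ("Footnote", 9, "regular", "666666", 0),
]

# Flat hierarchy modeled on the bad example; space-above values are assumed to be 0.
FLAT = [
    ("H1", 14, "bold", "000000", 0),
    ("Body", 11, "regular", "000000", 0),
    ("Caption", 11, "regular", "000000", 0),
]


def differing_dimensions(upper, lower):
    """Count how many of the four formatting columns differ between two levels."""
    return sum(1 for a, b in zip(upper[1:], lower[1:]) if a != b)


def weak_pairs(levels, minimum=2):
    """Return adjacent (upper, lower) names that differ on fewer than `minimum` columns."""
    return [
        (upper[0], lower[0])
        for upper, lower in zip(levels, levels[1:])
        if differing_dimensions(upper, lower) < minimum
    ]


print(weak_pairs(LEVELS))
print(weak_pairs(FLAT))
```

Counting differences treats a 1pt size change the same as a 10pt one, so this check complements the squint test rather than replacing it.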
---
## Summary: Decision Checklist
When you are unsure about a typographic choice, run through these checks:
| Principle | Question | If No... |
|-----------|----------|----------|
| White Space | Does the page have at least 30% white space? | Increase margins or spacing |
| Contrast | Can I count heading levels by squinting? | Increase size ratios (target 1.25x) |
| Proximity | Does each heading clearly belong to text below it? | Make space-before > space-after (2:1) |
| Alignment | Is English left-aligned and CJK justified? | Switch alignment mode |
| Repetition | Do all same-level elements use the same style? | Replace direct formatting with styles |
| Hierarchy | Can I see the document structure at arm's length? | Add more differentiation signals |
**When two principles conflict, prioritize in this order:**
1. **Readability** (white space, line spacing) -- always wins
2. **Hierarchy** (contrast, scale) -- readers must find what they need
3. **Consistency** (repetition) -- builds trust
4. **Aesthetics** (alignment, grouping) -- the finishing touch


@@ -0,0 +1,308 @@
# OpenXML Child Element Ordering Rules
Element ordering in OpenXML is defined by the XSD schema. Incorrect ordering produces invalid documents that Word may refuse to open or silently repair (potentially losing data).
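> **Key rule**: The properties element of a container (`w:pPr`, `w:rPr`, `w:tblPr`, `w:trPr`, `w:tcPr`) must be the **first child** of that container. `w:sectPr` is the exception: it is the last child of `w:body` and sits near the end of `w:pPr`.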
---
## w:document
```
Children in order:
1. w:background [0..1] — page background color/fill
2. w:body [0..1] — document content container
```
---
## w:body
```
Children in order (repeating group):
1. w:p [0..*] — paragraph
2. w:tbl [0..*] — table
3. w:sdt [0..*] — structured document tag (content control)
4. w:sectPr [0..1] — LAST child: final section properties
```
Note: `w:p`, `w:tbl`, and `w:sdt` are interleaved in document order. The only strict rule is that `w:sectPr` must be the **last child** of `w:body`.
---
## w:p (Paragraph)
```
Children in order:
1. w:pPr [0..1] — paragraph properties (MUST be first)
Then any mix of (interleaved in document order):
- w:r [0..*] — run
- w:hyperlink [0..*] — hyperlink wrapper
- w:ins [0..*] — tracked insertion
- w:del [0..*] — tracked deletion
- w:bookmarkStart [0..*] — bookmark anchor start
- w:bookmarkEnd [0..*] — bookmark anchor end
- w:commentRangeStart [0..*] — comment range start
- w:commentRangeEnd [0..*] — comment range end
- w:proofErr [0..*] — proofing error marker
- w:fldSimple [0..*] — simple field
- w:sdt [0..*] — inline content control
- w:smartTag [0..*] — smart tag
```
**Practical note**: After `w:pPr`, the remaining children appear in document reading order. Runs, hyperlinks, bookmarks, and comment ranges intermix freely based on their position in the text.
---
## w:pPr (Paragraph Properties)
```
Children in order:
1. w:pStyle [0..1] — paragraph style reference
2. w:keepNext [0..1] — keep with next paragraph
3. w:keepLines [0..1] — keep lines together
4. w:pageBreakBefore [0..1] — page break before paragraph
5. w:framePr [0..1] — text frame properties
6. w:widowControl [0..1] — widow/orphan control
7. w:numPr [0..1] — numbering properties
8. w:suppressLineNumbers [0..1]
9. w:pBdr [0..1] — paragraph borders
10. w:shd [0..1] — shading
11. w:tabs [0..1] — tab stops
12. w:suppressAutoHyphens [0..1]
13. w:kinsoku [0..1] — CJK kinsoku settings
14. w:wordWrap [0..1]
15. w:overflowPunct [0..1]
16. w:topLinePunct [0..1]
17. w:autoSpaceDE [0..1]
18. w:autoSpaceDN [0..1]
19. w:bidi [0..1] — right-to-left paragraph
20. w:adjustRightInd [0..1]
21. w:snapToGrid [0..1]
22. w:spacing [0..1] — line and paragraph spacing
23. w:ind [0..1] — indentation
24. w:contextualSpacing [0..1]
25. w:mirrorIndents [0..1]
26. w:suppressOverlap [0..1]
27. w:jc [0..1] — justification (left/center/right/both)
28. w:textDirection [0..1]
29. w:textAlignment [0..1]
30. w:outlineLvl [0..1] — outline level
31. w:divId [0..1]
32. w:rPr [0..1] — run properties for paragraph mark
33. w:sectPr [0..1] — section break (section ends at this paragraph)
34. w:pPrChange [0..1] — tracked paragraph property change
```
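
A list like this is easy to turn into a lint check. The following sketch assumes the caller has already extracted the local names of the `w:pPr` children in document order; `PPR_ORDER` simply transcribes the list above, and `out_of_order` is a hypothetical helper name.

```python
PPR_ORDER = [
    "pStyle", "keepNext", "keepLines", "pageBreakBefore", "framePr",
    "widowControl", "numPr", "suppressLineNumbers", "pBdr", "shd", "tabs",
    "suppressAutoHyphens", "kinsoku", "wordWrap", "overflowPunct",
    "topLinePunct", "autoSpaceDE", "autoSpaceDN", "bidi", "adjustRightInd",
    "snapToGrid", "spacing", "ind", "contextualSpacing", "mirrorIndents",
    "suppressOverlap", "jc", "textDirection", "textAlignment", "outlineLvl",
    "divId", "rPr", "sectPr", "pPrChange",
]


def out_of_order(children, order=PPR_ORDER):
    """Return child names that appear after a sibling ranked later in `order`."""
    rank = {name: position for position, name in enumerate(order)}
    problems = []
    highest = -1
    for name in children:
        position = rank.get(name)
        if position is None:
            continue  # element not covered by the list: ignored by this sketch
        if position < highest:
            problems.append(name)
        else:
            highest = position
    return problems


print(out_of_order(["jc", "spacing", "ind"]))
print(out_of_order(["pStyle", "keepNext", "spacing", "ind", "jc"]))
```

The first call reports `spacing` and `ind` because both must precede `w:jc`; the second sequence is clean.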
---
## w:r (Run)
```
Children in order:
1. w:rPr [0..1] — run properties (MUST be first)
Then any of (one per run, typically):
- w:t [0..*] — text content
- w:br [0..*] — break (line, page, column)
- w:tab [0..*] — tab character
- w:cr [0..*] — carriage return
- w:sym [0..*] — symbol character
- w:drawing [0..*] — DrawingML object (images)
- w:pict [0..*] — VML picture (legacy)
- w:fldChar [0..*] — complex field character
- w:instrText [0..*] — field instruction text
- w:delText [0..*] — deleted text (inside w:del)
- w:footnoteReference [0..*]
- w:endnoteReference [0..*]
- w:commentReference [0..*]
- w:lastRenderedPageBreak [0..*]
```
---
## w:rPr (Run Properties)
```
Children in order:
1. w:rStyle [0..1] — character style reference
2. w:rFonts [0..1] — font specification
3. w:b [0..1] — bold
4. w:bCs [0..1] — complex script bold
5. w:i [0..1] — italic
6. w:iCs [0..1] — complex script italic
7. w:caps [0..1] — all capitals
8. w:smallCaps [0..1] — small capitals
9. w:strike [0..1] — strikethrough
10. w:dstrike [0..1] — double strikethrough
11. w:outline [0..1]
12. w:shadow [0..1]
13. w:emboss [0..1]
14. w:imprint [0..1]
15. w:noProof [0..1] — suppress proofing
16. w:snapToGrid [0..1]
17. w:vanish [0..1] — hidden text
18. w:color [0..1] — text color
19. w:spacing [0..1] — character spacing
20. w:w [0..1] — character width scaling
21. w:kern [0..1] — font kerning
22. w:position [0..1] — vertical position (raise/lower)
23. w:sz [0..1] — font size (half-points)
24. w:szCs [0..1] — complex script font size
25. w:highlight [0..1] — text highlight color
26. w:u [0..1] — underline
27. w:effect [0..1] — text effect (animated)
28. w:bdr [0..1] — run border
29. w:shd [0..1] — run shading
30. w:vertAlign [0..1] — superscript/subscript
31. w:rtl [0..1] — right-to-left
32. w:cs [0..1] — complex script
33. w:lang [0..1] — language
34. w:rPrChange [0..1] — tracked run property change
```
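
Rather than only detecting misordered run properties, a generator can repair them before writing. This is a minimal sketch using the standard library's ElementTree; `RPR_ORDER` transcribes the list above, `sort_rpr` is a hypothetical helper, and parking unknown elements at the end is an assumption of the sketch, not a schema rule.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

RPR_ORDER = (
    "rStyle rFonts b bCs i iCs caps smallCaps strike dstrike outline shadow "
    "emboss imprint noProof snapToGrid vanish color spacing w kern position "
    "sz szCs highlight u effect bdr shd vertAlign rtl cs lang rPrChange"
).split()


def sort_rpr(rpr):
    """Reorder the children of a w:rPr element in place to match RPR_ORDER."""
    rank = {f"{{{W}}}{name}": position for position, name in enumerate(RPR_ORDER)}
    # sorted() is stable, so elements missing from the list keep their
    # relative order and end up after all known elements.
    rpr[:] = sorted(rpr, key=lambda child: rank.get(child.tag, len(RPR_ORDER)))
    return rpr


rpr = ET.fromstring(
    f'<w:rPr xmlns:w="{W}"><w:b/><w:sz w:val="32"/><w:color w:val="1F3864"/></w:rPr>'
)
sort_rpr(rpr)
print([child.tag.split("}")[1] for child in rpr])
```

Because the sort only moves whole child elements, attribute values such as `w:val` are untouched.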
---
## w:tbl (Table)
```
Children in order:
1. w:tblPr [1..1] — table properties (REQUIRED, must be first)
2. w:tblGrid [1..1] — column width definitions (REQUIRED)
3. w:tr [1..*] — table row(s)
```
---
## w:tblPr (Table Properties)
```
Children in order:
1. w:tblStyle [0..1] — table style reference
2. w:tblpPr [0..1] — table positioning
3. w:tblOverlap [0..1]
4. w:bidiVisual [0..1] — right-to-left table
5. w:tblStyleRowBandSize [0..1]
6. w:tblStyleColBandSize [0..1]
7. w:tblW [0..1] — preferred table width
8. w:jc [0..1] — table alignment
9. w:tblCellSpacing [0..1]
10. w:tblInd [0..1] — table indent from margin
11. w:tblBorders [0..1] — table borders
12. w:shd [0..1] — table shading
13. w:tblLayout [0..1] — fixed or autofit
14. w:tblCellMar [0..1] — default cell margins
15. w:tblLook [0..1] — conditional formatting flags
16. w:tblCaption [0..1] — accessibility caption
17. w:tblDescription [0..1] — accessibility description
18. w:tblPrChange [0..1] — tracked table property change
```
---
## w:tr (Table Row)
```
Children in order:
1. w:trPr [0..1] — row properties (must be first)
2. w:tc [1..*] — table cell(s)
```
---
## w:trPr (Table Row Properties)
```
Children in order:
1. w:cnfStyle [0..1] — conditional formatting
2. w:divId [0..1]
3. w:gridBefore [0..1] — grid columns before first cell
4. w:gridAfter [0..1] — grid columns after last cell
5. w:wBefore [0..1]
6. w:wAfter [0..1]
7. w:cantSplit [0..1] — don't split row across pages
8. w:trHeight [0..1] — row height
9. w:tblHeader [0..1] — repeat as header row
10. w:tblCellSpacing [0..1]
11. w:jc [0..1] — row alignment
12. w:hidden [0..1]
13. w:ins [0..1] — tracked row insertion
14. w:del [0..1] — tracked row deletion
15. w:trPrChange [0..1] — tracked row property change
```
---
## w:tc (Table Cell)
```
Children in order:
1. w:tcPr [0..1] — cell properties (must be first)
2. w:p [1..*] — paragraph(s) — at least one required
3. w:tbl [0..*] — nested table(s)
```
Note: after `w:tcPr`, `w:p` and `w:tbl` appear in document order rather than all paragraphs first. Word expects the last child of a cell to be a `w:p`, so a nested table must be followed by a paragraph (an empty one is enough).
---
## w:tcPr (Table Cell Properties)
```
Children in order:
1. w:cnfStyle [0..1]
2. w:tcW [0..1] — cell width
3. w:gridSpan [0..1] — horizontal merge (column span)
4. w:hMerge [0..1] — legacy horizontal merge
5. w:vMerge [0..1] — vertical merge
6. w:tcBorders [0..1] — cell borders
7. w:shd [0..1] — cell shading
8. w:noWrap [0..1]
9. w:tcMar [0..1] — cell margins
10. w:textDirection [0..1]
11. w:tcFitText [0..1]
12. w:vAlign [0..1] — vertical alignment
13. w:hideMark [0..1]
14. w:tcPrChange [0..1] — tracked cell property change
```
---
## w:sectPr (Section Properties)
```
Children in order:
1. w:headerReference [0..*] — header references (type: default/first/even)
2. w:footerReference [0..*] — footer references
3. w:footnotePr [0..1]
4. w:endnotePr [0..1]
5. w:type [0..1] — section break type (nextPage/continuous/evenPage/oddPage)
6. w:pgSz [0..1] — page size
7. w:pgMar [0..1] — page margins
8. w:paperSrc [0..1]
9. w:pgBorders [0..1] — page borders
10. w:lnNumType [0..1] — line numbering
11. w:pgNumType [0..1] — page numbering
12. w:cols [0..1] — column definitions
13. w:formProt [0..1]
14. w:vAlign [0..1] — vertical alignment of page
15. w:noEndnote [0..1]
16. w:titlePg [0..1] — different first page header/footer
17. w:textDirection [0..1]
18. w:bidi [0..1]
19. w:rtlGutter [0..1]
20. w:docGrid [0..1] — document grid
21. w:sectPrChange [0..1] — tracked section property change
```
---
## w:hdr (Header) / w:ftr (Footer)
```
Children (same structure as w:body content):
1. w:p [0..*] — paragraph(s)
2. w:tbl [0..*] — table(s)
3. w:sdt [0..*] — content controls
```
Headers and footers are essentially mini-documents. They follow the same content model as `w:body` but without a final `w:sectPr`.


@@ -0,0 +1,82 @@
# OpenXML Namespaces, Relationship Types, and Content Types
## Core Namespaces
| Prefix | URI | Used In |
|--------|-----|---------|
| `w` | `http://schemas.openxmlformats.org/wordprocessingml/2006/main` | document.xml, styles.xml, numbering.xml, headers, footers |
| `r` | `http://schemas.openxmlformats.org/officeDocument/2006/relationships` | Relationship references (r:id) |
| `wp` | `http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing` | Image/drawing placement in document |
| `a` | `http://schemas.openxmlformats.org/drawingml/2006/main` | DrawingML core (shapes, images, themes) |
| `pic` | `http://schemas.openxmlformats.org/drawingml/2006/picture` | Picture element in DrawingML |
| `v` | `urn:schemas-microsoft-com:vml` | VML (legacy shapes, watermarks) |
| `o` | `urn:schemas-microsoft-com:office:office` | Office VML extensions |
| `m` | `http://schemas.openxmlformats.org/officeDocument/2006/math` | Math equations (OMML) |
| `mc` | `http://schemas.openxmlformats.org/markup-compatibility/2006` | Markup compatibility (Ignorable, AlternateContent) |
## Extended Namespaces
| Prefix | URI | Purpose |
|--------|-----|---------|
| `w14` | `http://schemas.microsoft.com/office/word/2010/wordml` | Word 2010 extensions (contentPart, etc.) |
| `w15` | `http://schemas.microsoft.com/office/word/2012/wordml` | Word 2013 extensions (commentEx, etc.) |
| `w16cid` | `http://schemas.microsoft.com/office/word/2016/wordml/cid` | Comment IDs (durable IDs) |
| `w16cex` | `http://schemas.microsoft.com/office/word/2018/wordml/cex` | Comment extensible |
| `w16se` | `http://schemas.microsoft.com/office/word/2015/wordml/symex` | Symbol extensions |
| `wps` | `http://schemas.microsoft.com/office/word/2010/wordprocessingShape` | WordprocessingML shapes |
| `wpc` | `http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas` | Drawing canvas |
## Relationship Types
| Relationship | Type URI |
|-------------|----------|
| Document | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument` |
| Styles | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles` |
| Numbering | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering` |
| Font Table | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable` |
| Settings | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings` |
| Theme | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme` |
| Image | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/image` |
| Hyperlink | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink` |
| Header | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/header` |
| Footer | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer` |
| Comments | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments` |
| CommentsExtended | `http://schemas.microsoft.com/office/2011/relationships/commentsExtended` |
| CommentsIds | `http://schemas.microsoft.com/office/2016/09/relationships/commentsIds` |
| CommentsExtensible | `http://schemas.microsoft.com/office/2018/08/relationships/commentsExtensible` |
| Footnotes | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/footnotes` |
| Endnotes | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/endnotes` |
| Glossary | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/glossaryDocument` |
| Web Settings | `http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings` |
## Content Types (`[Content_Types].xml`)
### Default Extensions
```xml
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml" />
<Default Extension="xml" ContentType="application/xml" />
<Default Extension="png" ContentType="image/png" />
<Default Extension="jpeg" ContentType="image/jpeg" />
<Default Extension="gif" ContentType="image/gif" />
<Default Extension="emf" ContentType="image/x-emf" />
```
### Part Overrides
| Part | Content Type |
|------|-------------|
| `/word/document.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml` |
| `/word/styles.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml` |
| `/word/numbering.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml` |
| `/word/settings.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml` |
| `/word/fontTable.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml` |
| `/word/theme/theme1.xml` | `application/vnd.openxmlformats-officedocument.theme+xml` |
| `/word/header1.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml` |
| `/word/footer1.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml` |
| `/word/comments.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml` |
| `/word/commentsExtended.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.commentsExtended+xml` |
| `/word/commentsIds.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.commentsIds+xml` |
| `/word/commentsExtensible.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.commentsExtensible+xml` |
| `/word/footnotes.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.footnotes+xml` |
| `/word/endnotes.xml` | `application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml` |
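
To show how the defaults and overrides fit together, here is a small sketch that assembles a `[Content_Types].xml` string for a given list of parts. It covers only a subset of the table above; the function name `content_types_xml` and the choice to raise `KeyError` for unknown parts are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
WML = "application/vnd.openxmlformats-officedocument.wordprocessingml."

DEFAULTS = {
    "rels": "application/vnd.openxmlformats-package.relationships+xml",
    "xml": "application/xml",
    "png": "image/png",
}

# Subset of the Part Overrides table.
OVERRIDES = {
    "/word/document.xml": WML + "document.main+xml",
    "/word/styles.xml": WML + "styles+xml",
    "/word/settings.xml": WML + "settings+xml",
    "/word/footer1.xml": WML + "footer+xml",
}


def content_types_xml(part_names):
    """Build a [Content_Types].xml string; unknown parts raise KeyError on purpose."""
    root = ET.Element("Types", {"xmlns": CT_NS})
    for extension, content_type in DEFAULTS.items():
        ET.SubElement(root, "Default", {"Extension": extension, "ContentType": content_type})
    for part in part_names:
        ET.SubElement(root, "Override", {"PartName": part, "ContentType": OVERRIDES[part]})
    return ET.tostring(root, encoding="unicode")


print(content_types_xml(["/word/document.xml", "/word/styles.xml"]))
```

Failing loudly on an unregistered part is deliberate here: a part that is missing from `[Content_Types].xml` is a common reason for a package to be rejected.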


@@ -0,0 +1,72 @@
# OpenXML Unit Conversion Quick Reference
## Master Conversion Table
| Unit | 1 inch | 1 cm | 1 mm | 1 pt | Description |
|------|--------|------|------|------|-------------|
| DXA (twips) | 1440 | 567 | 56.7 | 20 | 1/20 of a point. Used for margins, indents, spacing, page size. |
| EMU | 914400 | 360000 | 36000 | 12700 | English Metric Unit. Used for images, drawings, shapes. |
| Half-points | 144 | 56.7 | 5.67 | 2 | Used for font sizes (`w:sz`, `w:szCs`). |
| Points | 72 | 28.35 | 2.835 | 1 | Standard typographic unit. Not used directly in most attributes. |
| Eighths of a point | 576 | 226.8 | 22.68 | 8 | Used for border widths (`w:sz` on border elements). Character spacing (`w:spacing` in `w:rPr`) uses twips. |
## Common Page Sizes
| Size | Width (DXA) | Height (DXA) | Width (mm) | Height (mm) |
|------|-------------|--------------|------------|-------------|
| A4 | 11906 | 16838 | 210 | 297 |
| Letter | 12240 | 15840 | 215.9 | 279.4 |
| Legal | 12240 | 20160 | 215.9 | 355.6 |
| A3 | 16838 | 23811 | 297 | 420 |
| A5 | 8391 | 11906 | 148 | 210 |
## Common Margin Values
| Margin | DXA | Inches | cm |
|--------|-----|--------|----|
| 0.5 inch | 720 | 0.5 | 1.27 |
| 0.75 inch | 1080 | 0.75 | 1.91 |
| 1 inch | 1440 | 1.0 | 2.54 |
| 1.25 inch | 1800 | 1.25 | 3.18 |
| 1.5 inch | 2160 | 1.5 | 3.81 |
## Font Size Values (`w:sz`)
| Display Size | w:sz value | Notes |
|-------------|-----------|-------|
| 8pt | 16 | |
| 9pt | 18 | |
| 10pt | 20 | |
| 10.5pt | 21 | Common CJK body size |
| 11pt | 22 | Default Calibri body |
| 12pt | 24 | Default TNR body |
| 14pt | 28 | Small heading |
| 16pt | 32 | |
| 18pt | 36 | |
| 20pt | 40 | |
| 24pt | 48 | |
| 28pt | 56 | |
| 36pt | 72 | |
## Line Spacing Values
Line spacing in `w:spacing` uses the `w:line` attribute in 240ths of a line (when `w:lineRule="auto"`):
| Spacing | w:line value | w:lineRule |
|---------|-------------|-----------|
| Single | 240 | auto |
| 1.15 (Word 2007/2010 default) | 276 | auto |
| 1.5 | 360 | auto |
| Double | 480 | auto |
| Exact 12pt | 240 | exact |
| At least 12pt | 240 | atLeast |
Note: When `lineRule="exact"` or `"atLeast"`, `w:line` is in **twips** (DXA), not 240ths. So `line="240"` with `lineRule="exact"` means exactly 12pt (240/20 = 12pt).
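
Because the same `w:line` number means different things depending on `w:lineRule`, a small decoder helps when reading existing documents. The sketch below follows the two interpretations described above; the function name `describe_line_spacing` and its output wording are illustrative.

```python
def describe_line_spacing(line, line_rule="auto"):
    """Translate a w:line value into a human-readable description."""
    if line_rule == "auto":
        # 240ths of a line: 240 is single spacing, 480 is double.
        return f"{line / 240:g}x line spacing"
    points = line / 20  # twips to points
    if line_rule == "exact":
        return f"exactly {points:g}pt"
    if line_rule == "atLeast":
        return f"at least {points:g}pt"
    raise ValueError(f"unknown lineRule: {line_rule!r}")


print(describe_line_spacing(276))
print(describe_line_spacing(240, "exact"))
print(describe_line_spacing(360, "atLeast"))
```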
## Conversion Formulas
```
DXA = inches × 1440 = cm × 567 = pt × 20
EMU = inches × 914400 = cm × 360000 = pt × 12700
sz = pt × 2 (half-points)
```
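
The formulas translate directly into helper functions. This sketch derives metric conversions from the exact inch definition (25.4 mm) instead of the rounded `567` factor; all function names are illustrative.

```python
def inches_to_dxa(inches):
    """Inches to twentieths of a point (DXA)."""
    return round(inches * 1440)


def mm_to_dxa(mm):
    """Millimetres to DXA via the exact 25.4 mm inch."""
    return round(mm / 25.4 * 1440)


def pt_to_dxa(pt):
    """Points to DXA."""
    return round(pt * 20)


def inches_to_emu(inches):
    """Inches to English Metric Units."""
    return round(inches * 914400)


def cm_to_emu(cm):
    """Centimetres to EMU (exactly 360000 per cm)."""
    return round(cm * 360000)


def pt_to_half_points(pt):
    """Points to the half-point value used by w:sz."""
    return round(pt * 2)


print(mm_to_dxa(210), mm_to_dxa(297))  # A4 width and height
print(inches_to_dxa(0.75), pt_to_half_points(10.5), inches_to_emu(1))
```

Going through inches avoids a small drift: `21 cm × 567` gives 11907, while the exact conversion rounds to the 11906 listed for A4 in the page size table.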


@@ -0,0 +1,284 @@
# Scenario A: Creating a New DOCX from Scratch
## When to Use
Use Scenario A when:
- The user has no existing file and wants a brand new document
- The user provides content (text, tables, images) and wants it assembled into a DOCX
- The user specifies a document type (report, letter, memo, academic) or describes a custom layout
Do NOT use when: the user already has a DOCX they want to modify (→ Scenario B) or wants to restyle an existing document (→ Scenario C).
---
## Step-by-Step Workflow
### 1. Determine Document Type
Ask or infer the document type from the user's request:
| Type | Typical Signals |
|------|----------------|
| Report | "report", "analysis", "whitepaper", sections with headings |
| Letter | "letter", "dear", address block, salutation |
| Memo | "memo", "memorandum", To/From/Subject fields |
| Academic | "paper", "essay", "thesis", APA/MLA/Chicago mention |
| Custom | None of the above, or user specifies exact formatting |
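
As a rough illustration of the inference step, the signals in the table can be matched as keywords. This is a naive sketch, not the skill's actual logic: the `SIGNALS` mapping keeps only the single-word signals, the function name is hypothetical, and structural cues such as address blocks or To/From fields are not handled.

```python
import re

# Single-word signals from the table above, checked in this order.
SIGNALS = {
    "report": ("report", "analysis", "whitepaper"),
    "letter": ("letter", "dear"),
    "memo": ("memo", "memorandum"),
    "academic": ("paper", "essay", "thesis", "apa", "mla", "chicago"),
}


def infer_document_type(request):
    """Return the first document type whose keywords appear in the request."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    for doc_type, keywords in SIGNALS.items():
        if words & set(keywords):
            return doc_type
    return "custom"  # nothing matched: fall back to the Custom row


print(infer_document_type("Please draft a memo about the parking policy"))
print(infer_document_type("I need a thesis chapter in APA format"))
print(infer_document_type("Design a flyer for the bake sale"))
```

When a request is ambiguous, asking the user remains the safer option than guessing from keywords.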
### 2. Gather Content Requirements
Collect from the user:
- Title and subtitle (if any)
- Author / organization
- Section structure (headings and nesting)
- Body content per section
- Tables (headers + rows)
- Images (file paths or placeholders)
- Special elements: TOC, page numbers, watermark, headers/footers
### 3. Select Style Set
Based on document type, load the matching styles XML asset:
- Report → `assets/styles/default_styles.xml` or `assets/styles/corporate_styles.xml`
- Academic → `assets/styles/academic_styles.xml`
- Letter / Memo / Custom → `assets/styles/default_styles.xml` (with overrides)
### 4. Configure Page Setup
Set `w:sectPr` values based on document type defaults (see below) or user overrides.
```xml
<w:sectPr>
<w:pgSz w:w="11906" w:h="16838" /> <!-- A4 -->
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
w:header="720" w:footer="720" w:gutter="0" />
</w:sectPr>
```
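
A page setup fragment like the one above can be generated from a page size name and a margin in inches. This is a sketch under assumptions: the helper name `sect_pr` is hypothetical, all four margins are equal, and the header/footer distances stay at the 720 DXA used in the example.

```python
# Width and height in DXA for a few common page sizes.
PAGE_SIZES_DXA = {
    "A4": (11906, 16838),
    "Letter": (12240, 15840),
    "Legal": (12240, 20160),
}


def sect_pr(page_size="A4", margin_inches=1.0):
    """Return a w:sectPr fragment (no namespace declaration) with equal margins."""
    width, height = PAGE_SIZES_DXA[page_size]
    margin = round(margin_inches * 1440)  # inches to DXA
    return (
        "<w:sectPr>"
        f'<w:pgSz w:w="{width}" w:h="{height}"/>'
        f'<w:pgMar w:top="{margin}" w:right="{margin}" '
        f'w:bottom="{margin}" w:left="{margin}" '
        'w:header="720" w:footer="720" w:gutter="0"/>'
        "</w:sectPr>"
    )


print(sect_pr("Letter", 0.75))
```

The fragment relies on the `w` prefix being declared on an ancestor element, as it is in `document.xml`.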
### 5. Build Document Structure
Assemble `word/document.xml` with:
1. `w:document` as the root element, containing a single `w:body`
2. Paragraphs (`w:p`) with heading styles for section titles
3. Body paragraphs with `Normal` style
4. Tables, images, and other elements as needed
5. Final `w:sectPr` as last child of `w:body`
### 6. Apply Typography Defaults
Set document-level defaults in `styles.xml` under `w:docDefaults`:
```xml
<w:docDefaults>
<w:rPrDefault>
<w:rPr>
<w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:eastAsia="SimSun" w:cs="Arial" />
<w:sz w:val="22" /> <!-- 11pt -->
<w:szCs w:val="22" />
</w:rPr>
</w:rPrDefault>
<w:pPrDefault>
<w:pPr>
<w:spacing w:after="160" w:line="259" w:lineRule="auto" />
</w:pPr>
</w:pPrDefault>
</w:docDefaults>
```
### 7. Add Complex Elements
See the Complex Elements Guide section below.
### 8. Run Validation Pipeline
```
dotnet run ... validate --xsd wml-subset.xsd
dotnet run ... validate --xsd business-rules.xsd # if applying a template
```
---
## Document Type Defaults
### Report
| Property | Value |
|----------|-------|
| Body font | Calibri 11pt |
| Heading font | Calibri Light |
| H1 / H2 / H3 / H4 size | 28pt / 24pt / 18pt / 14pt |
| Heading color | #2F5496 (corporate blue) |
| Margins | 1 inch (1440 DXA) all sides |
| Page size | A4 (11906 × 16838 DXA) |
| Line spacing | Single (line="240") |
| Paragraph spacing | 0pt before, 8pt after body |
### Letter
| Property | Value |
|----------|-------|
| Font | Calibri 11pt |
| Page size | Letter (12240 × 15840 DXA) |
| Margins | 1 inch all sides |
| Structure | Date → Address → Salutation → Body → Closing → Signature |
| Line spacing | Single |
### Memo
| Property | Value |
|----------|-------|
| Font | Arial 11pt |
| Page size | Letter |
| Margins | 0.75 inch (1080 DXA) |
| Header | "MEMO" centered, bold, 16pt |
| Fields | To, From, Date, Subject (bold labels, tab-aligned values) |
### Academic
| Property | Value |
|----------|-------|
| Font | Times New Roman 12pt |
| Line spacing | Double (line="480") |
| Margins | 1 inch all sides |
| Page size | Letter |
| Headings | Bold, same font, 14/13/12pt for H1/H2/H3 |
| First line indent | 0.5 inch (720 DXA) |
| Heading color | Black (no color) |
---
## Content Configuration JSON Format
The CLI `create` command accepts a JSON config:
```json
{
"type": "report",
"title": "Quarterly Revenue Analysis",
"subtitle": "Q1 2026",
"author": "Finance Team",
"pageSize": "A4",
"margins": { "top": 1440, "right": 1440, "bottom": 1440, "left": 1440 },
"sections": [
{
"heading": "Executive Summary",
"level": 1,
"content": [
{ "type": "paragraph", "text": "Revenue grew 12% year-over-year..." },
{
"type": "table",
"headers": ["Region", "Revenue", "Growth"],
"rows": [
["North America", "$4.2M", "+15%"],
["Europe", "$2.8M", "+8%"],
["Asia Pacific", "$1.9M", "+18%"]
]
},
{ "type": "image", "path": "charts/revenue.png", "width": "5in", "alt": "Revenue chart" }
]
},
{
"heading": "Detailed Analysis",
"level": 1,
"content": [
{ "type": "paragraph", "text": "Breaking down by product line..." }
]
}
]
}
```
Supported content types:
- `paragraph` — body text (applies Normal style)
- `table` — headers + rows (applies TableGrid style)
- `image` — inline image with width/height control
- `list` — bulleted or numbered list items
- `pageBreak` — forces a page break
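
Before handing a config to the CLI, it can be worth checking it for obvious mistakes. The sketch below is a hypothetical pre-flight check, not the CLI's own validation: it only verifies that each content item uses one of the supported types listed above and that table rows match the header count.

```python
import json

SUPPORTED_TYPES = {"paragraph", "table", "image", "list", "pageBreak"}


def config_problems(config):
    """Return human-readable problems found in a content configuration dict."""
    problems = []
    for s_index, section in enumerate(config.get("sections", [])):
        for c_index, item in enumerate(section.get("content", [])):
            where = f"sections[{s_index}].content[{c_index}]"
            kind = item.get("type")
            if kind not in SUPPORTED_TYPES:
                problems.append(f"{where}: unsupported type {kind!r}")
            elif kind == "table":
                width = len(item.get("headers", []))
                if any(len(row) != width for row in item.get("rows", [])):
                    problems.append(f"{where}: row length does not match {width} headers")
    return problems


# Deliberately broken sample: a three-cell row under two headers, and a "chart" item.
CONFIG = json.loads("""
{
  "type": "report",
  "sections": [
    {"heading": "Executive Summary", "level": 1, "content": [
      {"type": "paragraph", "text": "Revenue grew 12% year-over-year..."},
      {"type": "table", "headers": ["Region", "Revenue"],
       "rows": [["Europe", "$2.8M", "+8%"]]},
      {"type": "chart", "path": "charts/revenue.png"}
    ]}
  ]
}
""")

for problem in config_problems(CONFIG):
    print(problem)
```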
---
## Complex Elements Guide
### Table of Contents
Insert a TOC field code. Word fills in the actual entries when the field is updated (for example with F9, or on open if `w:updateFields` is enabled in `settings.xml`):
```xml
<w:p>
<w:pPr><w:pStyle w:val="TOCHeading" /></w:pPr>
<w:r><w:t>Table of Contents</w:t></w:r>
</w:p>
<w:p>
<w:r>
<w:fldChar w:fldCharType="begin" />
</w:r>
<w:r>
<w:instrText xml:space="preserve"> TOC \o "1-3" \h \z \u </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate" />
</w:r>
<w:r>
<w:t>[Table of contents — update to populate]</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end" />
</w:r>
</w:p>
```
### Page Numbers in Footer
Add a footer part (`word/footer1.xml`) and reference it in `w:sectPr`:
```xml
<!-- In footer1.xml -->
<w:ftr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p>
<w:pPr><w:jc w:val="center" /></w:pPr>
<w:r>
<w:fldChar w:fldCharType="begin" />
</w:r>
<w:r>
<w:instrText>PAGE</w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate" />
</w:r>
<w:r><w:t>1</w:t></w:r>
<w:r>
<w:fldChar w:fldCharType="end" />
</w:r>
</w:p>
</w:ftr>
<!-- In sectPr -->
<w:footerReference w:type="default" r:id="rId8" />
```
### Watermark
Add a header part with a shape behind the text:
```xml
<w:hdr>
<w:p>
<w:r>
<w:pict>
<v:shape style="position:absolute;margin-left:0;margin-top:0;width:468pt;height:180pt;
z-index:-251657216;mso-position-horizontal:center;
mso-position-vertical:center"
fillcolor="silver" stroked="f">
<v:textpath style="font-family:'Calibri';font-size:1pt" string="DRAFT" />
</v:shape>
</w:pict>
</w:r>
</w:p>
</w:hdr>
```
---
## Post-Creation Checklist
1. **Validate** against `wml-subset.xsd` — all elements in correct order, required attributes present
2. **Merge adjacent runs** with identical formatting to keep XML clean
3. **Verify relationships** — every `r:id` in document.xml has a matching entry in `document.xml.rels`
4. **Check content types** — every part in the package is registered in `[Content_Types].xml`
5. **Preview** — open in Word or LibreOffice to visually confirm layout
6. **File size** — confirm images are reasonably sized (compress if > 2MB each)
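
Check 3 (relationship verification) can be automated with a few lines of Python. The sketch below is illustrative only: `missing_relationships` is a hypothetical helper, not part of the CLI, and the two XML strings are made-up miniature parts with shortened `Type` values.

```python
import xml.etree.ElementTree as ET

R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def missing_relationships(document_xml, rels_xml):
    """Return relationship IDs used in document.xml but absent from its .rels part."""
    defined = {
        rel.get("Id")
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG_NS}}}Relationship")
    }
    used = {
        value
        for el in ET.fromstring(document_xml).iter()
        for name, value in el.attrib.items()
        if name.startswith(f"{{{R_NS}}}")  # r:id, r:embed, r:link, ...
    }
    return sorted(used - defined)


# Hypothetical miniature parts, for illustration only
DOC = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
    f'xmlns:r="{R_NS}"><w:body><w:sectPr>'
    '<w:headerReference w:type="default" r:id="rId9"/>'
    '<w:footerReference w:type="default" r:id="rId8"/>'
    "</w:sectPr></w:body></w:document>"
)
RELS = (
    f'<Relationships xmlns="{PKG_NS}">'
    '<Relationship Id="rId8" Type="footer" Target="footer1.xml"/>'
    "</Relationships>"
)
print(missing_relationships(DOC, RELS))  # ['rId9']
```

An empty list means every reference resolves. Header, footer, footnote, and endnote parts have their own `.rels` files and would need the same check.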


@@ -0,0 +1,295 @@
# Scenario B: Editing / Filling Content in Existing DOCX
## Core Principle
**"First, do no harm."** When editing an existing document, minimize changes. Touch only what needs to change. Preserve all formatting, styles, relationships, and structure that are not directly involved in the edit.
---
## When to Use
- Replacing placeholder text (`{{name}}`, `$DATE$`, `[PLACEHOLDER]`)
- Updating specific paragraphs or table cells
- Filling in form fields
- Adding or removing paragraphs in a known location
- Inserting tracked changes for review workflows
Do NOT use when: the user wants to change the look/style of the entire document (→ Scenario C) or create from scratch (→ Scenario A).
---
## Workflow
```
1. Preview → CLI: analyze <input.docx>
2. Analyze → Understand structure: sections, styles, headings, tables
3. Identify → Locate exact edit targets (paragraph index, table index, placeholder text)
4. Edit → Apply surgical changes via CLI or direct XML
5. Validate → CLI: validate <output.docx>
6. Diff → Compare before/after to verify only intended changes were made
```
---
## When to Use CLI vs Direct XML
### Use CLI Edit Command When:
- Replacing placeholder text (e.g., `{{fieldName}}` → actual value)
- Filling table data from JSON
- Updating document properties (title, author)
- Simple text insertions or deletions
### Use Direct XML Manipulation When:
- Text spans multiple runs with different formatting (run-boundary issues)
- Adding complex structures (nested tables, multi-image layouts)
- Manipulating Track Changes markup
- Modifying header/footer content
- Adjusting section properties
---
## Placeholder Patterns
The CLI natively supports `{{fieldName}}` placeholders:
```bash
# Replace all {{placeholders}} from a JSON map
dotnet run ... edit input.docx --fill-placeholders data.json --output filled.docx
```
Where `data.json`:
```json
{
"companyName": "Acme Corp",
"date": "March 21, 2026",
"amount": "$15,000.00",
"recipientName": "Jane Smith"
}
```
Other placeholder formats (`$FIELD$`, `[PLACEHOLDER]`) require text replacement:
```bash
dotnet run ... edit input.docx --replace '$DATE$' "March 21, 2026" --output updated.docx
```
---
## Text Replacement Strategies
### Simple Replacement
When the entire search text is within a single `w:r` (run):
```xml
<!-- Before -->
<w:r>
<w:rPr><w:b /></w:rPr>
<w:t>{{companyName}}</w:t>
</w:r>
<!-- After — formatting preserved -->
<w:r>
<w:rPr><w:b /></w:rPr>
<w:t>Acme Corp</w:t>
</w:r>
```
Direct replacement. The run's `w:rPr` is untouched.
### Complex Replacement (Split Runs)
When the search text is split across multiple runs (common when Word applies spell-check or formatting mid-text):
```xml
<!-- "{{companyName}}" split into 3 runs -->
<w:r><w:rPr><w:b /></w:rPr><w:t>{{company</w:t></w:r>
<w:r><w:rPr><w:b /><w:i /></w:rPr><w:t>Na</w:t></w:r>
<w:r><w:rPr><w:b /></w:rPr><w:t>me}}</w:t></w:r>
```
Strategy:
1. Concatenate text across runs to find the match
2. Place the replacement text in the **first** run (preserving its `w:rPr`)
3. Remove the text from subsequent runs (or remove the runs entirely if empty)
```xml
<!-- After -->
<w:r><w:rPr><w:b /></w:rPr><w:t>Acme Corp</w:t></w:r>
```
**Rule**: Always preserve the formatting of the first run in the match.
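
The split-run strategy can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustration under stated assumptions, not the CLI's implementation: `replace_across_runs` is a hypothetical helper, and it only handles runs that are direct children of the paragraph and hold a single `w:t`.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R, T = f"{{{W}}}r", f"{{{W}}}t"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def replace_across_runs(paragraph, search, replacement):
    """Replace the first match of `search`, even when it spans several runs."""
    runs = [r for r in paragraph.findall(R) if r.find(T) is not None]
    texts = [r.find(T).text or "" for r in runs]
    start = "".join(texts).find(search)  # step 1: match on concatenated text
    if start < 0:
        return False
    end = start + len(search)

    pos, placed = 0, False
    for run, text in zip(runs, texts):
        run_start, pos = pos, pos + len(text)
        if pos <= start or run_start >= end:
            continue  # run lies entirely outside the match
        prefix = text[: max(start - run_start, 0)]
        suffix = text[end - run_start:]
        # Step 2: the replacement goes into the first overlapping run only.
        new_text = prefix + ("" if placed else replacement) + suffix
        if placed and not new_text:
            paragraph.remove(run)  # step 3: run fully consumed by the match
            continue
        t = run.find(T)
        t.text = new_text
        if new_text != new_text.strip():
            t.set(XML_SPACE, "preserve")
        placed = True
    return True


para = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>{{company</w:t></w:r>"
    "<w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>Na</w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>me}}</w:t></w:r>"
    "</w:p>"
)
changed = replace_across_runs(para, "{{companyName}}", "Acme Corp")
print(changed, [t.text for t in para.iter(T)])  # True ['Acme Corp']
```

Text before or after the placeholder in the boundary runs is kept, so a run such as `Dear {{company` keeps its `Dear ` prefix.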
---
## Table Editing
### By Index
Tables are 0-indexed in document order:
```bash
dotnet run ... edit input.docx --table-index 0 --table-data data.json --output updated.docx
```
### By Header Matching
Find a table by its header row content:
```bash
dotnet run ... edit input.docx --table-match "Name,Amount,Date" --table-data data.json
```
### Table Data JSON Format
```json
{
"rows": [
["Alice Johnson", "$5,000", "2026-03-15"],
["Bob Smith", "$3,200", "2026-03-18"]
],
"appendRows": true
}
```
- `appendRows: true` — add rows after existing data
- `appendRows: false` (default) — replace all data rows (keeps header row)
### Direct XML Table Editing
To modify a specific cell, locate it by row/column index:
```xml
<!-- Row 2 (0-indexed), Column 1 -->
<w:tr> <!-- tr[2] -->
<w:tc>...</w:tc>
<w:tc> <!-- tc[1] — target cell -->
<w:p>
<w:r><w:t>Old Value</w:t></w:r>
</w:p>
</w:tc>
</w:tr>
```
Replace the `w:t` content. Do NOT modify `w:tcPr` (cell properties) or `w:tblPr` (table properties).
---
## Track Changes Guidance
### When to Add Revision Marks
- User explicitly requests tracked changes
- Document already has tracking enabled (`w:trackRevisions` in `settings.xml`)
- Collaborative review workflow
### When NOT to Add Revision Marks
- Form filling / placeholder replacement (these are "completing" the document, not "revising" it)
- Direct edits where the user wants a clean result
- Batch data filling operations
### Adding Tracked Changes
See `references/track_changes_guide.md` for full XML examples.
Quick reference — inserting text with tracking:
```xml
<w:ins w:id="1" w:author="MiniMaxAI" w:date="2026-03-21T10:00:00Z">
<w:r>
<w:t>New text here</w:t>
</w:r>
</w:ins>
```
Deleting text with tracking:
```xml
<w:del w:id="2" w:author="MiniMaxAI" w:date="2026-03-21T10:00:00Z">
<w:r>
<w:delText>Removed text</w:delText> <!-- MUST use delText, not t -->
</w:r>
</w:del>
```
---
## Common Pitfalls
### 1. Breaking Run Boundaries
**Problem**: Replacing text that spans runs by naively modifying individual runs destroys inline formatting.
**Fix**: Concatenate run text, find match boundaries, consolidate into the first run, remove consumed runs.
### 2. Hyperlink Content
**Problem**: Replacing text inside a `w:hyperlink` element without preserving the hyperlink wrapper removes the link.
```xml
<w:hyperlink r:id="rId5">
<w:r>
<w:rPr><w:rStyle w:val="Hyperlink" /></w:rPr>
<w:t>Click here</w:t> <!-- Only replace this text -->
</w:r>
</w:hyperlink>
```
**Fix**: Only modify the `w:t` inside the hyperlink's run. Never remove or replace the `w:hyperlink` element itself.
### 3. Tracked Change Context
**Problem**: Replacing text that is inside a `w:ins` or `w:del` element without understanding the revision context creates invalid markup.
**Fix**: If the target text is inside a revision mark, either:
- Replace within the revision context (preserving the `w:ins`/`w:del` wrapper)
- Or delete the old revision and create a new one
### 4. Style Preservation
**Problem**: Adding new paragraphs without specifying a style causes them to inherit `Normal`, which may not match the surrounding context.
**Fix**: When inserting paragraphs, copy the `w:pStyle` from an adjacent paragraph of the same type.
### 5. Numbering Continuity
**Problem**: Inserting a new list item breaks numbering sequence.
**Fix**: Ensure the new paragraph has the same `w:numId` and `w:ilvl` as adjacent list items. If continuing a sequence, set `w:numPr` to match.
### 6. XML Special Characters
**Problem**: User content contains `&`, `<`, `>`, `"`, `'` — these must be escaped in XML.
**Fix**: Always XML-escape user-provided text before inserting into `w:t` elements:
- `&` → `&amp;`
- `<` → `&lt;`
- `>` → `&gt;`
- `"` → `&quot;`
- `'` → `&apos;`
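
When the XML is assembled as a string, Python's standard library can do the escaping. A minimal sketch follows; `escape_for_wt` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def escape_for_wt(text):
    """Escape user text before placing it in hand-built XML."""
    # escape() handles &, <, > itself; quotes are added through the extra map.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


print(escape_for_wt('Tom & "Jerry" <tom@example.com>'))
# Tom &amp; &quot;Jerry&quot; &lt;tom@example.com&gt;
```

If the text is assigned through an XML API instead (for example `element.text = value` in ElementTree), the serializer escapes it on output, so escaping by hand as well would double-escape it.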
### 7. Whitespace Preservation
**Problem**: Word drops leading/trailing spaces in `w:t` unless the element explicitly asks to preserve them.
**Fix**: Add `xml:space="preserve"` attribute:
```xml
<w:t xml:space="preserve"> text with leading space</w:t>
```
---
## Diff Verification
After editing, always compare the before and after states:
```bash
# Structural diff — shows only changed elements
dotnet run ... diff original.docx modified.docx
# Text-only diff — shows content changes
dotnet run ... diff original.docx modified.docx --text-only
```
Verify:
- Only intended text changed
- No styles were modified
- No relationships were added/removed unexpectedly
- Table structure intact (same number of rows/columns unless intentionally changed)
- Images and other media unchanged


@@ -0,0 +1,456 @@
# Scenario C: Applying Formatting / Templates
## When to Use
Use Scenario C when:
- The user has an existing document and wants to apply a different visual style
- The user wants to rebrand a document (new fonts, colors, heading styles)
- The user provides a template DOCX and wants its look applied to a content document
- The user wants consistent formatting across multiple documents
Do NOT use when: the user wants to edit content (→ Scenario B) or create from scratch (→ Scenario A).
---
## Workflow
```
1. Analyze source → CLI: analyze source.docx (list styles, fonts, structure)
2. Analyze template → CLI: analyze template.docx (list styles, fonts, structure)
3. Map styles → Create mapping plan (source style → template style)
4. Apply template → CLI: apply-template source.docx --template template.docx --output result.docx
5. Validate (XSD) → CLI: validate result.docx --xsd wml-subset.xsd
6. GATE-CHECK → CLI: validate result.docx --xsd business-rules.xsd ← MUST PASS
7. Diff verify → CLI: diff source.docx result.docx --text-only (content must be identical)
```
---
## What Gets Copied from Template
| Part | File | Description |
|------|------|-------------|
| Styles | `word/styles.xml` | All style definitions (paragraph, character, table, numbering) |
| Theme | `word/theme/theme1.xml` | Color scheme, font scheme, format scheme |
| Numbering | `word/numbering.xml` | List and numbering definitions |
| Headers | `word/header*.xml` | Header content and formatting |
| Footers | `word/footer*.xml` | Footer content and formatting |
| Section props | `w:sectPr` | Margins, page size, orientation, columns |
## What Does NOT Get Copied
| Part | Reason |
|------|--------|
| Document content | Paragraphs, tables, images stay from source |
| Comments | Belong to source document's review history |
| Tracked changes | Belong to source document's revision history |
| Custom XML parts | Application-specific data, not visual |
| Document properties | Title, author, dates belong to source |
| Glossary document | Template's building blocks are not transferred |
---
## Template Structure Analysis (REQUIRED)
Before choosing Overlay or Base-Replace, you MUST analyze the template's internal structure. Skipping this analysis is the #1 cause of failure.
### Step 1: Count template paragraphs and identify structural zones
Run `$CLI analyze --input template.docx` or manually inspect:
```bash
# Quick structure scan
scripts/docx_preview.sh template.docx
```
Identify these zones in the template:
```
Zone A: Front matter (cover page, declaration, abstract, TOC)
→ These are KEPT from template, never replaced
Zone B: Example/placeholder body content ("第1章 XXX", sample paragraphs)
→ This is REPLACED with user's actual content
Zone C: Back matter (appendices, acknowledgments, blank pages)
→ These are KEPT from template or removed
Zone D: Final sectPr
→ ALWAYS kept from template
```
### Step 2: Find Zone B boundaries (replacement range)
Search the template's document.xml for anchor text that marks the start and end of example content:
**Start anchor patterns** (first paragraph of example body):
- "第1章", "第一章", "Chapter 1", "1 Introduction", "绪论"
- The first paragraph with a Heading1-equivalent style after TOC
**End anchor patterns** (last paragraph before back matter):
- "参考文献", "References", "致谢", "Acknowledgments"
- The last paragraph before appendices or final sectPr
```python
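# Pseudocode for finding the replacement range.
# get_text(), get_style(), and heading1_styles are helpers you supply.
replace_start = replace_end = None
for i, element in enumerate(template_body_elements):
    text = get_text(element)
    style = get_style(element)
    if replace_start is None:
        # Start anchor: first Heading1-equivalent paragraph of the example body
        if style in heading1_styles and ("第1章" in text or "Chapter 1" in text):
            replace_start = i
    elif "参考文献" in text or "References" in text:
        # End anchor: only accepted after the start, so TOC entries are skipped
        replace_end = i
        break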
```
**CRITICAL**: Verify the range by printing what's inside:
```
Template elements [0..replace_start-1]: front matter (KEEP)
Template elements [replace_start..replace_end]: example content (REPLACE)
Template elements [replace_end+1..end]: back matter (KEEP)
```
If replace_start or replace_end cannot be found, DO NOT proceed. Ask the user to identify the replacement boundaries.
### Step 3: Decide Overlay vs Base-Replace
Now that you know the structure:
| Observation | Decision |
|-------------|----------|
| Template has ≤30 paragraphs, no cover/TOC | **C-1: Overlay** (pure style template) |
| Template has >100 paragraphs with cover/TOC/example sections | **C-2: Base-Replace** |
| Template paragraph count ≈ user document | **C-1: Overlay** (similar structure) |
| Template paragraph count >> user document (e.g., 263 vs 134) | **C-2: Base-Replace** |
### Step 4: For Base-Replace, execute the replacement
1. Load template as base (all files)
2. Extract user content elements using `list(body)` — NOT `findall('w:p')` (which misses tables)
3. Build new body: `template[0:replace_start] + cleaned_user_content + template[replace_end+1:]`
4. Apply style mapping to every paragraph
5. Clean direct formatting (see rules below)
6. Rebuild document.xml, keeping template's namespace declarations
7. Merge relationships (images + hyperlinks)
8. Write output using template as ZIP base
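
Steps 2 and 3 can be sketched with `xml.etree.ElementTree`. This is a minimal illustration, assuming `replace_start` and `replace_end` were already found; `splice_body`, `body`, and `p` are hypothetical helpers, and style mapping, cleanup, and relationship merging (steps 4 to 7) are not shown.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W}}}sectPr"


def splice_body(template_body, user_body, replace_start, replace_end):
    """Swap template children [replace_start..replace_end] for the user's content."""
    template_elems = list(template_body)  # paragraphs, tables, and the final sectPr
    # list(user_body) keeps tables; the user's own final sectPr is dropped.
    user_elems = [el for el in list(user_body) if el.tag != SECT_PR]
    new_children = (
        template_elems[:replace_start] + user_elems + template_elems[replace_end + 1:]
    )
    for el in template_elems:
        template_body.remove(el)
    template_body.extend(new_children)
    return template_body


def body(xml):
    return ET.fromstring(f'<w:body xmlns:w="{W}">{xml}</w:body>')


def p(text):
    return f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>"


# Made-up miniature documents, for illustration only
template = body(
    p("Cover") + p("Chapter 1") + p("Sample") + p("References") + p("Appendix") + "<w:sectPr/>"
)
user = body(p("Intro") + "<w:tbl/>" + p("Results") + "<w:sectPr/>")
merged = splice_body(template, user, replace_start=1, replace_end=3)
print([el.tag.split("}")[1] for el in merged])
# ['p', 'p', 'tbl', 'p', 'p', 'sectPr']
```

The table survives because the user content is taken with `list(user_body)`, and the template's final `sectPr` stays last.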
---
## Style Mapping Strategy
When template style names differ from source style names, a mapping is required. **This step is mandatory** — skipping it is the #1 cause of formatting failures in template application.
### Step 0: Extract StyleIds from Both Documents (REQUIRED)
Before any template application, extract and compare styleIds from both documents:
```bash
# Extract all styleIds from source
$CLI analyze --input source.docx --styles-only
# Output example:
# Heading1 (paragraph, basedOn: Normal)
# Heading2 (paragraph, basedOn: Normal)
# Normal (paragraph)
# ListBullet (paragraph, basedOn: Normal)
# Extract all styleIds from template
$CLI analyze --input template.docx --styles-only
# Output example:
# 1 (paragraph, basedOn: a, name: "heading 1")
# 2 (paragraph, basedOn: a, name: "heading 2")
# 3 (paragraph, basedOn: a, name: "heading 3")
# a (paragraph, name: "Normal")
# a0 (character, name: "Default Paragraph Font")
```
**Critical distinction**: `w:styleId` vs `w:name`:
```xml
<!-- styleId="1" but name="heading 1" -->
<w:style w:type="paragraph" w:styleId="1">
<w:name w:val="heading 1"/>
<w:basedOn w:val="a"/>
</w:style>
```
The `w:styleId` attribute is what `<w:pStyle w:val="..."/>` references. The `w:name` attribute is the human-readable display name. **They can be completely different.** Many CJK templates use numeric styleIds (`1`, `2`, `3`, `a`, `a0`) instead of English names.
### Tier 1: Exact StyleId Match
If source uses `Heading1` and template defines `Heading1` as a styleId, map directly. No action needed.
### Tier 2: Name-Based Match
If no exact styleId match, try matching by `w:name` attribute:
- Source `Heading1` (name="heading 1") → Template styleId `1` (name="heading 1")
- Match is case-insensitive on the name value
Within the same type, also try matching by:
- Built-in style ID (Word's internal ID, e.g., heading 1 = built-in ID 1)
- Style type (paragraph → paragraph, character → character, table → table)
### Tier 3: Manual Mapping
For renamed or custom styles, provide an explicit mapping:
```json
{
"styleMap": {
"Heading1": "1",
"Heading2": "2",
"Heading3": "3",
"Heading4": "3",
"Normal": "a",
"BodyText": "a",
"ListBullet": "a",
"CompanyName": "Title",
"OldTableStyle": "TableGrid"
}
}
```
### Common Non-Standard StyleId Patterns
| Template Origin | StyleId Pattern | Example |
|----------------|-----------------|---------|
| Chinese Word (default) | Numeric/alphabetic | `1`, `2`, `3`, `a`, `a0` |
| English Word (default) | English names | `Heading1`, `Normal`, `Title` |
| Google Docs export | Prefixed | `Subtitle`, `NormalWeb` |
| WPS Office | Mixed | `1`, `Heading1`, custom names |
| Academic templates | Custom | `ThesisHeading1`, `ThesisBody` |
### Building the Mapping Table
Follow this algorithm:
1. **List source styleIds** actually used in `document.xml` (not all defined in `styles.xml`):
```python
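# Find all unique pStyle values in the source document.xml
# (body is the w:body element parsed with xml.etree.ElementTree)
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
used_styles = set()
for p in body.iter(f"{W}p"):
    pStyle = p.find(f"{W}pPr/{W}pStyle")
    if pStyle is not None:
        used_styles.add(pStyle.get(f"{W}val"))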
```
2. **For each used style**, find the best match in template:
- First try: exact styleId match
- Second try: match by `w:name` value (case-insensitive)
- Third try: match by style purpose (any heading → template's heading style)
- Fallback: map to template's default paragraph style (usually `Normal` or `a`)
3. **Validate the mapping** — every source styleId must map to an existing template styleId:
```
✓ Heading1 → 1 (name match: "heading 1")
✓ Heading2 → 2 (name match: "heading 2")
✓ Normal → a (name match: "Normal")
✗ CustomCallout → ??? (no match found, will fallback to 'a')
```
4. **Apply the mapping** when copying content — update every `<w:pStyle w:val="..."/>`:
```xml
<!-- Source -->
<w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
<!-- After mapping -->
<w:pPr><w:pStyle w:val="1"/></w:pPr>
```
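
The matching order in step 2 can be sketched as a small function. This is an illustration under stated assumptions: `build_style_map` is a hypothetical helper, the inputs are plain `{styleId: w:name}` dictionaries already extracted from each `styles.xml`, and the purpose-based tier and style-type check are left out.

```python
def build_style_map(used_ids, source_names, template_names, default_id):
    """Map each used source styleId to a template styleId.

    source_names / template_names: {styleId: w:name value}.
    Returns (mapping, unmapped), where unmapped lists the IDs that fell back.
    """
    by_name = {name.lower(): sid for sid, name in template_names.items()}
    mapping, unmapped = {}, []
    for sid in used_ids:
        name = source_names.get(sid, "").lower()
        if sid in template_names:  # first try: exact styleId match
            mapping[sid] = sid
        elif name and name in by_name:  # second try: case-insensitive w:name match
            mapping[sid] = by_name[name]
        else:  # fallback: template's default paragraph style
            mapping[sid] = default_id
            unmapped.append(sid)
    return mapping, unmapped


source_names = {
    "Heading1": "heading 1",
    "Heading2": "heading 2",
    "Normal": "Normal",
    "CustomCallout": "Custom Callout",
}
template_names = {
    "1": "heading 1",
    "2": "heading 2",
    "3": "heading 3",
    "a": "Normal",
    "a0": "Default Paragraph Font",
}
mapping, unmapped = build_style_map(
    ["Heading1", "Heading2", "Normal", "CustomCallout"],
    source_names,
    template_names,
    default_id="a",
)
print(mapping)   # {'Heading1': '1', 'Heading2': '2', 'Normal': 'a', 'CustomCallout': 'a'}
print(unmapped)  # ['CustomCallout']
```

The `unmapped` list is what would feed the warnings for styles that fell back to the default.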
### Unmapped Styles
Styles in the source document that have no match in the template are logged as warnings:
```
WARNING: Style 'CustomCallout' has no mapping in template. Content will fall back to 'a' (Normal).
```
The content is preserved; only the style reference is updated to the template's default paragraph style.
### C-2 BASE-REPLACE: Additional StyleId Considerations
When using the template as a base document (C-2 strategy), the template's `styles.xml` is already in place. You must:
1. **Never copy source `styles.xml`** — the template's styles are the authority
2. **Map every content paragraph's pStyle** to the template's styleId before insertion
3. **Strip direct formatting selectively** (see detailed rules below) — let the template style control appearance
4. **Verify table styles** — if source tables use `TableGrid` but template defines it as `a3` or similar, remap `<w:tblStyle>` too
5. **Check character styles** — `rPr` inside runs may reference character styles like `Hyperlink` or `Strong` that have different IDs in the template
### Direct Formatting Cleanup Rules (Detailed)
When copying content from source to template, apply these rules to EACH paragraph and run:
**REMOVE from `<w:rPr>`:**
- `<w:rFonts w:ascii="..." w:hAnsi="..."/>` — Latin font overrides (EXCEPT: keep `w:eastAsia`)
- `<w:sz>`, `<w:szCs>` — font size (let style control)
- `<w:color>` — text color
- `<w:highlight>` — highlight color
- `<w:shd>` — shading
- `<w:b>`, `<w:i>` — bold/italic UNLESS the source style requires it (e.g., emphasis)
- `<w:u>` — underline
- `<w:spacing>` — character spacing
**KEEP in `<w:rPr>`:**
- `<w:rFonts w:eastAsia="宋体"/>` — CJK font declaration (MUST keep, or Chinese text renders wrong)
- `<w:rFonts w:eastAsia="华文中宋"/>` — same reason
- Anything inside `<w:drawing>` — image references (handle separately via rId remapping)
**REMOVE from `<w:pPr>`:**
- `<w:pBdr>` — paragraph borders
- `<w:shd>` — paragraph shading
- `<w:spacing>` — line/paragraph spacing (let style control)
- `<w:jc>` — justification (let style control)
- `<w:tabs>` — custom tab stops
- `<w:rPr>` inside pPr — default run formatting for the paragraph
**KEEP in `<w:pPr>`:**
- `<w:pStyle>` — style reference (after mapping to template's styleId)
- `<w:sectPr>` — section properties (if intentionally inserting section breaks)
- `<w:numPr>` — numbering reference (after mapping numId to template's numbering)
**Table cells (`<w:tc>`):**
Apply the same rPr/pPr cleanup to every paragraph inside every cell. Also:
- Keep `<w:tcPr>` structural properties (column span, row span, width)
- Remove `<w:tcPr><w:shd>` (cell shading — let table style control)
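
The run-level part of these rules can be sketched as follows. This is a minimal illustration, not the tool's implementation: `clean_run_properties` is a hypothetical helper, and it leaves `w:b` and `w:i` alone because deciding whether emphasis is intentional needs the source style.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


RPR_REMOVE = {q(n) for n in ("sz", "szCs", "color", "highlight", "shd", "u", "spacing")}
LATIN_FONT_ATTRS = [q("ascii"), q("hAnsi")]


def clean_run_properties(rpr):
    """Strip direct formatting from a w:rPr element but keep the eastAsia font."""
    for child in list(rpr):
        if child.tag in RPR_REMOVE:
            rpr.remove(child)
        elif child.tag == q("rFonts"):
            for attr in LATIN_FONT_ATTRS:
                child.attrib.pop(attr, None)
            if not child.attrib:
                rpr.remove(child)  # nothing left worth keeping


rpr = ET.fromstring(
    f'<w:rPr xmlns:w="{W}">'
    '<w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:eastAsia="宋体"/>'
    '<w:b/><w:color w:val="FF0000"/><w:sz w:val="28"/>'
    "</w:rPr>"
)
clean_run_properties(rpr)
print([child.tag.split("}")[1] for child in rpr])  # ['rFonts', 'b']
```

The same pattern, with a different removal set, would apply to `w:pPr` and to paragraphs inside table cells.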
---
## Relationship ID Remapping
When copying parts (headers, footers, images) from the template into the source package, relationship IDs (`r:id`) may collide.
**Problem**:
- Source has `rId7` → `image1.png`
- Template has `rId7` → `header1.xml`
- Copying template's `rId7` overwrites source's image reference
**Solution**:
1. Scan source's `document.xml.rels` for all existing `rId` values
2. Find the maximum numeric ID (e.g., `rId12`)
3. Remap all template relationship IDs starting from `rId13`
4. Update all references in copied parts to use new IDs
```xml
<!-- Template original -->
<Relationship Id="rId1" Type="...header" Target="header1.xml" />
<!-- After remapping into source package -->
<Relationship Id="rId13" Type="...header" Target="header1.xml" />
<!-- Update sectPr reference -->
<w:headerReference w:type="default" r:id="rId13" />
```
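
The renumbering in steps 1 to 3 can be sketched like this. The helper names `next_rid` and `remap_rids` are hypothetical, and the sketch only computes the old-to-new mapping; rewriting the `.rels` entries and the `r:id` references (step 4) is left to the caller.

```python
import re


def next_rid(existing_ids):
    """Return the first free numeric suffix after the highest rIdN in use."""
    nums = [
        int(m.group(1))
        for rid in existing_ids
        if (m := re.fullmatch(r"rId(\d+)", rid))
    ]
    return max(nums, default=0) + 1


def remap_rids(source_ids, template_ids):
    """Map each template rId to a new rId that cannot collide with the source."""
    n = next_rid(source_ids)
    mapping = {}
    for rid in template_ids:
        mapping[rid] = f"rId{n}"
        n += 1
    return mapping


print(remap_rids(["rId1", "rId7", "rId12"], ["rId1", "rId7"]))
# {'rId1': 'rId13', 'rId7': 'rId14'}
```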
### Hyperlink Relationship Merging
When the source document contains external hyperlinks (e.g., URLs in references or footnotes), these are stored as relationships in `word/_rels/document.xml.rels`:
```xml
<Relationship Id="rId15" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
Target="https://example.com/paper" TargetMode="External"/>
```
The corresponding text in document.xml references this rId:
```xml
<w:hyperlink r:id="rId15">
<w:r><w:t>https://example.com/paper</w:t></w:r>
</w:hyperlink>
```
**Merging steps:**
1. Scan source document.xml for all `<w:hyperlink r:id="...">` elements
2. For each, find the corresponding relationship in source's rels file
3. Check if template already has a relationship with the same Target URL
- If yes: reuse the existing rId, update the hyperlink reference
- If no: assign a new rId (starting from template's max rId + 1), add the relationship to template's rels, update the hyperlink reference
4. Also check for hyperlink relationships used in footnotes (`word/_rels/footnotes.xml.rels`) and endnotes
**Common mistake:** Copying hyperlink paragraphs without merging rels → hyperlinks silently break (clicking does nothing in Word).
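
The reuse-or-add decision in step 3 can be sketched as below. This is an illustration under stated assumptions: `merge_hyperlinks` is a hypothetical helper, `template_rels` is a plain `{rId: Target}` dictionary covering all of the template's relationships, and updating the `<w:hyperlink r:id="...">` attributes with the returned mapping is left to the caller.

```python
import re


def merge_hyperlinks(source_links, template_rels):
    """Return {source rId: rId to use in the merged package}.

    source_links: {rId: URL} for the source's hyperlink relationships.
    template_rels: {rId: Target} for the template; new links are added to it.
    """
    by_target = {target: rid for rid, target in template_rels.items()}
    nums = [
        int(m.group(1))
        for rid in template_rels
        if (m := re.fullmatch(r"rId(\d+)", rid))
    ]
    next_num = max(nums, default=0) + 1
    remap = {}
    for rid, url in source_links.items():
        if url not in by_target:
            new_rid = f"rId{next_num}"
            next_num += 1
            template_rels[new_rid] = url  # add the relationship to the template
            by_target[url] = new_rid
        remap[rid] = by_target[url]  # reuse the existing rId when the URL is known
    return remap


template_rels = {"rId1": "styles.xml", "rId4": "https://example.com/paper"}
source_links = {
    "rId15": "https://example.com/paper",
    "rId16": "https://example.org/other",
}
print(merge_hyperlinks(source_links, template_rels))
# {'rId15': 'rId4', 'rId16': 'rId5'}
```

New entries still need `TargetMode="External"` and the hyperlink relationship type when they are written back to the `.rels` file.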
---
## XSD Gate-Check
### What It Is
After template application, the output document **MUST** pass `business-rules.xsd` validation. This is a **hard gate** — if it fails, the document is **NOT deliverable**.
### What business-rules.xsd Checks
| Rule | What It Validates |
|------|-------------------|
| Template styles exist | All styles referenced by content paragraphs are defined in `styles.xml` |
| Margins match | Page margins match template specification |
| Fonts correct | `w:docDefaults` fonts match template's font scheme |
| Heading hierarchy | Heading levels are sequential (no H1 → H3 without H2) |
| Required styles present | `Normal`, `Heading1`-`Heading3`, `TableGrid` exist |
| Page size | Matches template's declared page size |
### Handling Failures
```
GATE-CHECK FAILED:
- Style 'CustomStyle1' referenced in paragraph 14 but not defined in styles.xml
- Margin w:left=1080 does not match template requirement 1440
```
Fix each failure:
1. **Missing style**: Add the style definition to `styles.xml`, or remap the paragraph to an existing style
2. **Margin mismatch**: Update `w:sectPr` margins to match template
3. **Font mismatch**: Update `w:docDefaults` to match template font scheme
4. **Heading hierarchy gap**: Insert intermediate heading levels or adjust existing levels
Re-validate after every fix until gate-check passes.
---
## Common Pitfalls
### 1. Orphaned Numbering References
**Problem**: Source document uses `w:numId="5"` in list paragraphs, but after replacing `numbering.xml` with the template's version, numbering ID 5 doesn't exist.
**Symptom**: Lists appear as plain paragraphs (no bullets/numbers).
**Fix**:
- Map source numbering IDs to template numbering IDs
- Update all `w:numId` references in document content
- Or merge source numbering definitions into template's `numbering.xml`
### 2. Missing Theme Colors
**Problem**: Source document's styles reference theme colors (`w:themeColor="accent1"`) that have different values in the template's theme.
**Symptom**: Colors change unexpectedly (usually acceptable — this IS the point of re-theming). But if a style uses `w:color` with both `w:val` and `w:themeColor`, the theme color wins in Word.
**Fix**: Review color changes. If specific colors must be preserved, use explicit `w:val` without `w:themeColor`.
### 3. Section Property Conflicts
**Problem**: Source document has multiple sections (e.g., portrait + landscape pages), but the template assumes a single section.
**Symptom**: All sections get the same margins/orientation, breaking landscape pages.
**Fix**:
- Only apply template section properties to the final `w:sectPr` in `w:body`
- Preserve intermediate `w:sectPr` elements (inside `w:pPr`) from the source
- Or apply template properties to all sections but preserve orientation overrides
### 4. Embedded Font Conflicts
**Problem**: Template specifies fonts not available on the target system.
**Fix**: Either embed fonts in the DOCX (`word/fonts/`) or use web-safe alternatives:
- Calibri → available on Windows/Mac/Office online
- Arial → universal fallback
- Times New Roman → universal serif fallback
### 5. Broken Style Inheritance
**Problem**: Template has `Heading1` based on `Normal`, but after applying template, `Normal` has different properties, cascading unwanted changes to headings.
**Fix**: Verify the `w:basedOn` chain for all critical styles. Ensure base styles are also correctly transferred from template.
---
## Verification Checklist
After template application, verify:
1. **Content preserved** — text diff shows zero content changes
2. **Gate-check passed** — `business-rules.xsd` validation succeeds
3. **Styles applied** — headings, body text, tables use template formatting
4. **Images intact** — all images render correctly (relationship IDs valid)
5. **Lists working** — numbered and bulleted lists display correctly
6. **Headers/footers** — template headers/footers appear on all pages
7. **Page layout** — margins, page size, orientation match template
8. **No corruption** — file opens without errors in Word


@@ -0,0 +1,200 @@
# Track Changes Guide
## Overview
Track Changes in OpenXML uses revision markup elements to record insertions, deletions, and formatting changes. Each revision has a unique ID, author, and timestamp.
---
## Insertion: `<w:ins>`
Wraps runs that were inserted during tracking:
```xml
<w:ins w:id="1" w:author="John Smith" w:date="2026-03-21T10:30:00Z">
<w:r>
<w:rPr>
<w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" />
<w:sz w:val="22" />
</w:rPr>
<w:t>This text was inserted.</w:t>
</w:r>
</w:ins>
```
- `w:id` — unique revision ID (integer, must be unique across document)
- `w:author` — free text string identifying the author
- `w:date` — ISO 8601 format with timezone: `YYYY-MM-DDTHH:MM:SSZ`
- Content inside is normal runs (`w:r`) with optional formatting
---
## Deletion: `<w:del>`
Wraps runs that were deleted during tracking:
```xml
<w:del w:id="2" w:author="John Smith" w:date="2026-03-21T10:31:00Z">
<w:r>
<w:rPr>
<w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" />
<w:sz w:val="22" />
</w:rPr>
<w:delText xml:space="preserve">This text was deleted.</w:delText>
</w:r>
</w:del>
```
**CRITICAL**: Inside `<w:del>`, text MUST use `<w:delText>`, NOT `<w:t>`. Using `<w:t>` inside a deletion is invalid and will cause corruption or unexpected behavior. Word may silently repair it, but other consumers will fail.
---
## Formatting Change: `<w:rPrChange>`
Records that a run's formatting was changed. Placed inside `w:rPr`, it stores the **previous** formatting:
```xml
<w:r>
<w:rPr>
<w:b /> <!-- Current: bold -->
<w:rPrChange w:id="3" w:author="Jane Doe" w:date="2026-03-21T11:00:00Z">
<w:rPr>
<!-- Previous: not bold (empty rPr means no formatting) -->
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>This text was made bold.</w:t>
</w:r>
```
The outer `w:rPr` holds the **new** (current) formatting. The `w:rPrChange` child holds the **old** (previous) formatting.
---
## Paragraph Property Change: `<w:pPrChange>`
Records paragraph-level formatting changes (alignment, spacing, style):
```xml
<w:pPr>
<w:jc w:val="center" /> <!-- Current: centered -->
<w:pPrChange w:id="4" w:author="Jane Doe" w:date="2026-03-21T11:05:00Z">
<w:pPr>
<w:jc w:val="left" /> <!-- Previous: left-aligned -->
</w:pPr>
</w:pPrChange>
</w:pPr>
```
---
## Revision ID Management
- Every revision element (`w:ins`, `w:del`, `w:rPrChange`, `w:pPrChange`, `w:tblPrChange`, etc.) requires a `w:id` attribute
- IDs must be **unique integers** across the entire document
- IDs should be **monotonically increasing** (not strictly required, but expected by Word)
- When adding revisions, scan for the current maximum `w:id` and increment from there
```
Existing max ID: 47
New insertion: w:id="48"
New deletion: w:id="49"
```
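
Finding the next free ID can be sketched as a scan over the parsed document. `next_revision_id` is a hypothetical helper, and the tag list is a partial set chosen for illustration; extend it with any other revision elements your documents use.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_ID = f"{{{W}}}id"
# Partial list of revision elements, assumed sufficient for this example
REVISION_TAGS = {
    f"{{{W}}}{name}"
    for name in ("ins", "del", "rPrChange", "pPrChange", "tblPrChange")
}


def next_revision_id(root):
    """Return max existing revision w:id + 1 (or 1 if there are none)."""
    ids = [
        int(el.get(W_ID))
        for el in root.iter()
        if el.tag in REVISION_TAGS and (el.get(W_ID) or "").isdigit()
    ]
    return max(ids, default=0) + 1


doc = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    '<w:ins w:id="12" w:author="A"><w:r><w:t>x</w:t></w:r></w:ins>'
    '<w:del w:id="47" w:author="A"><w:r><w:delText>y</w:delText></w:r></w:del>'
    "</w:p>"
)
print(next_revision_id(doc))  # 48
```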
---
## Author and Date
- **Author**: Free text. Use consistent strings (e.g., `"MiniMaxAI"` for all automated edits)
- **Date**: ISO 8601 with UTC timezone marker: `2026-03-21T10:30:00Z`
- Must include the `T` separator and `Z` suffix (or `+HH:MM` offset)
- Omitting the date is allowed but not recommended
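
A timestamp in this format can be produced with Python's `datetime`. A minimal sketch; `revision_timestamp` is a hypothetical helper and expects a timezone-aware value.

```python
from datetime import datetime, timezone


def revision_timestamp(moment=None):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ (UTC)."""
    moment = moment or datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(revision_timestamp(datetime(2026, 3, 21, 10, 30, tzinfo=timezone.utc)))
# 2026-03-21T10:30:00Z
```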
---
## Operations
### Propose Insertion
Add `<w:ins>` wrapper around new content at the target location:
```xml
<w:p>
<w:r><w:t xml:space="preserve">Existing text. </w:t></w:r>
<w:ins w:id="5" w:author="MiniMaxAI" w:date="2026-03-21T12:00:00Z">
<w:r><w:t xml:space="preserve">Proposed new text. </w:t></w:r>
</w:ins>
<w:r><w:t>More existing text.</w:t></w:r>
</w:p>
```
### Propose Deletion
Wrap existing content in `<w:del>` and change `<w:t>` to `<w:delText>`:
```xml
<w:p>
<w:r><w:t xml:space="preserve">Keep this. </w:t></w:r>
<w:del w:id="6" w:author="MiniMaxAI" w:date="2026-03-21T12:01:00Z">
<w:r>
<w:rPr><w:b /></w:rPr>
<w:delText>Remove this.</w:delText>
</w:r>
</w:del>
<w:r><w:t xml:space="preserve"> Keep this too.</w:t></w:r>
</w:p>
```
### Accept a Tracked Change
- **Accept insertion**: Remove the `<w:ins>` wrapper, keep the inner runs as normal content
- **Accept deletion**: Remove the entire `<w:del>` element and its content
### Reject a Tracked Change
- **Reject insertion**: Remove the entire `<w:ins>` element and its content
- **Reject deletion**: Remove the `<w:del>` wrapper, change `<w:delText>` back to `<w:t>`
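
Accepting changes can be sketched with `xml.etree.ElementTree`. This is an illustration under stated assumptions: `accept_run_level_changes` is a hypothetical helper, and it skips paragraph-mark revisions (the empty `w:ins`/`w:del` inside `w:pPr > w:rPr`), which need paragraph merging logic of their own.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS, DEL, RPR = (f"{{{W}}}{name}" for name in ("ins", "del", "rPr"))


def accept_run_level_changes(root):
    """Accept every run-level w:ins and w:del below `root`, in place."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in list(root.iter()):
        if el.tag not in (INS, DEL) or el not in parents:
            continue
        parent = parents[el]
        if parent.tag == RPR:
            continue  # paragraph-mark revision: out of scope for this sketch
        index = list(parent).index(el)
        parent.remove(el)
        if el.tag == INS:
            # Unwrap: the inserted runs become normal content at the same position.
            for offset, child in enumerate(list(el)):
                parent.insert(index + offset, child)
                parents[child] = parent
        # For w:del nothing is re-inserted: the deleted runs disappear.


para = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Keep this. </w:t></w:r>"
    '<w:ins w:id="5"><w:r><w:t>Proposed new text.</w:t></w:r></w:ins>'
    '<w:del w:id="6"><w:r><w:delText>Remove this.</w:delText></w:r></w:del>'
    "</w:p>"
)
accept_run_level_changes(para)
print("".join(para.itertext()))  # Keep this. Proposed new text.
```

Rejecting is the mirror image: drop `w:ins` with its content, unwrap `w:del`, and rename each `w:delText` back to `w:t`.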
---
## Cross-Paragraph Operations
### Deleting a Paragraph Break (Merging Paragraphs)
To track the deletion of a paragraph break, mark the first paragraph's paragraph mark as deleted with an empty `<w:del>` inside `w:pPr > w:rPr`. Both `<w:p>` elements stay in the XML; they merge only when the change is accepted:
```xml
<!-- First paragraph: its paragraph mark is tracked as deleted -->
<w:p>
<w:pPr>
<w:rPr>
<w:del w:id="7" w:author="MiniMaxAI" w:date="2026-03-21T12:05:00Z" />
</w:rPr>
</w:pPr>
<w:r><w:t xml:space="preserve">First paragraph text. </w:t></w:r>
</w:p>
<!-- Second paragraph: unchanged, joins the first once the deletion is accepted -->
<w:p>
<w:r><w:t>Second paragraph text (merged on accept).</w:t></w:r>
</w:p>
```
### Inserting a New Paragraph
The new paragraph's runs are wrapped in `<w:ins>`:
```xml
<w:p>
<w:pPr>
<w:rPr>
<w:ins w:id="9" w:author="MiniMaxAI" w:date="2026-03-21T12:10:00Z" />
</w:rPr>
</w:pPr>
<w:ins w:id="10" w:author="MiniMaxAI" w:date="2026-03-21T12:10:00Z">
<w:r><w:t>Entirely new paragraph.</w:t></w:r>
</w:ins>
</w:p>
```
The paragraph mark itself is marked as inserted via `w:ins` inside `w:pPr > w:rPr`.


@@ -0,0 +1,506 @@
# Troubleshooting Guide — Symptom-Driven
## How to Use This Guide
Search by the **SYMPTOM** you observe, not the technical concept. Each entry follows:
- **Symptom** — what you see or what the user reports
- **Diagnosis** — how to confirm the root cause
- **Fix** — exact steps, commands, or code
- **Prevention** — how to avoid it next time
**Quick search keywords:** headings wrong, body text, repair, corrupt, font, tables missing, images missing, TOC broken, update table, page break, section break, hyperlink, numbered list, bullets, margins, page size, Chinese tofu, cover page, track changes, revision marks
---
## 1. "All headings look like body text" (Heading Styles Not Applied)
**Symptom:** After template application, headings have no formatting — they look like Normal paragraphs. Font size, bold, spacing are all wrong.
**Diagnosis:** The `pStyle` values in `document.xml` don't match the `styleId` values in `styles.xml`.
Common mismatches:
- Source uses `Heading1` but template defines the style as `1` (Chinese templates often use numeric styleIds)
- Source uses `heading1` (lowercase) but template has `Heading1` (case-sensitive!)
- `pStyle` references a style that simply doesn't exist in the output's `styles.xml`
Check with:
```bash
# List all pStyle values used in the document
$CLI analyze --input output.docx | grep -i "pStyle"
# List all styleIds defined in styles.xml
$CLI analyze --input template.docx --part styles | grep "styleId"
```
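If the CLI is not at hand, the same comparison can be sketched directly against the two XML parts. The helper name `undefined_pstyles` and the sample strings are assumptions made for illustration; in practice the inputs would be `word/document.xml` and `word/styles.xml` read from the unpacked package.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def undefined_pstyles(document_xml, styles_xml):
    """Return pStyle values used in document.xml that styles.xml does not define."""
    doc = ET.fromstring(document_xml)
    styles = ET.fromstring(styles_xml)
    defined = {s.get(f"{{{W}}}styleId") for s in styles.iter(f"{{{W}}}style")}
    used = {
        p.get(f"{{{W}}}val")
        for p in doc.iter(f"{{{W}}}pStyle")
        if p.get(f"{{{W}}}val") is not None
    }
    # styleIds are case-sensitive, so "heading1" does not match "Heading1".
    return sorted(used - defined)


DOC = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="heading1"/></w:pPr></w:p>'
    "</w:body></w:document>"
)
STYLES = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="paragraph" w:styleId="Heading1"><w:name w:val="heading 1"/></w:style>'
    "</w:styles>"
)
print(undefined_pstyles(DOC, STYLES))  # ['heading1']
```

An empty result means every `pStyle` resolves; anything listed needs an entry in the styleId mapping.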
**Fix:** Build a styleId mapping table before applying the template. Update every `pStyle` value in the document content.
```csharp
// Build mapping: source styleId → template styleId
var mapping = new Dictionary<string, string>();
// Compare by style name (w:name), not by styleId
foreach (var srcStyle in sourceStyles)
{
var templateStyle = templateStyles.FirstOrDefault(
s => s.StyleName?.Val?.Value == srcStyle.StyleName?.Val?.Value);
if (templateStyle != null)
mapping[srcStyle.StyleId!] = templateStyle.StyleId!;
}
// Apply mapping to all paragraphs
foreach (var para in body.Descendants<Paragraph>())
{
var pStyle = para.ParagraphProperties?.ParagraphStyleId;
if (pStyle != null && mapping.TryGetValue(pStyle.Val!, out var newId))
pStyle.Val = newId;
}
```
**Prevention:** ALWAYS extract and compare styleIds from both source and template before template application. Never assume styleIds are the same across documents.
---
## 2. "Document opens with repair warnings" (XML Corruption)
**Symptom:** Word says "We found a problem with some content" or "Word found unreadable content" when opening.
**Diagnosis:** The most likely cause is wrong element ordering. OpenXML is strict about child element order.
Common violations:
- `pPr` must come before runs in `w:p`
- `tblPr` must come before `tblGrid` in `w:tbl`
- `rPr` must come before `t`/`br`/`tab` in `w:r`
- `trPr` must come before `tc` in `w:tr`
- `tcPr` must come before content in `w:tc`
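
The ordering rules listed above can be spot-checked with a short script. This is a simplified sketch: `misplaced_properties` is a hypothetical helper, and it flags any properties element that is not the first child, so it does not account for schema-permitted exceptions (for example `w:tblPrEx` may precede `w:trPr`). Treat its output as a hint, not a substitute for XSD validation.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Parent local name -> properties child expected to come first
FIRST_CHILD = {"p": "pPr", "r": "rPr", "tbl": "tblPr", "tr": "trPr", "tc": "tcPr"}


def misplaced_properties(root):
    """Return (parent, properties, index) for each properties element that is not first."""
    problems = []
    prefix = f"{{{W}}}"
    for el in root.iter():
        if not el.tag.startswith(prefix):
            continue
        parent_name = el.tag[len(prefix):]
        props = FIRST_CHILD.get(parent_name)
        if props is None:
            continue
        for index, child in enumerate(el):
            if child.tag == prefix + props and index != 0:
                problems.append((parent_name, props, index))
    return problems


sample = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>text</w:t><w:rPr/></w:r>"  # rPr after t: wrong
    "<w:pPr/>"  # pPr after the run: wrong
    "</w:p>"
)
print(misplaced_properties(ET.fromstring(sample)))
# [('p', 'pPr', 1), ('r', 'rPr', 1)]
```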
```bash
# Validate to find ordering issues
$CLI validate --input doc.docx --xsd assets/xsd/wml-subset.xsd
# Auto-fix element ordering
$CLI fix-order --input doc.docx
# Re-validate
$CLI validate --input doc.docx --xsd assets/xsd/wml-subset.xsd
```
**Fix:**
```bash
$CLI fix-order --input doc.docx
```
If auto-fix doesn't resolve it, unpack and inspect manually:
```bash
$CLI unpack --input doc.docx --output unpacked/
# Check word/document.xml for ordering issues
# Fix, then repack:
$CLI pack --input unpacked/ --output fixed.docx
```
**Prevention:** Read `references/openxml_element_order.md` before writing any XML manipulation code. Always append properties elements first, then content elements.
---
## 3. "All text is in wrong font" (Font Contamination)
**Symptom:** Template specifies 宋体/Times New Roman but document shows Google Sans, Arial, Calibri, or whatever font the source document used.
**Diagnosis:** Source document's `rPr` contains inline `rFonts` declarations that override template styles. Direct formatting always wins over style-based formatting in OpenXML.
```bash
# Check for font contamination
$CLI analyze --input output.docx | grep -i "font"
# Look for rFonts in the content — if present, they're overriding styles
```
**Fix:** Strip `rFonts` from `rPr` when copying content, but KEEP `w:eastAsia` for CJK text:
```csharp
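foreach (var rPr in body.Descendants<RunProperties>().ToList())
{
    var rFonts = rPr.GetFirstChild<RunFonts>();
    if (rFonts == null) continue;

    // Preserve the East Asian font: removing it can cause tofu (□□□) in CJK text
    var eastAsia = rFonts.EastAsia?.Value;
    rFonts.Remove();
    if (string.IsNullOrEmpty(eastAsia)) continue;

    // Re-add only eastAsia. rFonts must stay near the front of rPr
    // (after rStyle, if present) to keep the schema element order valid.
    var cleaned = new RunFonts { EastAsia = eastAsia };
    var rStyle = rPr.GetFirstChild<RunStyle>();
    if (rStyle != null)
        rPr.InsertAfter(cleaned, rStyle);
    else
        rPr.PrependChild(cleaned);
}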
```
Also strip these common direct formatting overrides:
- `w:sz` / `w:szCs` (font size)
- `w:color` (text color)
- `w:b` / `w:i` when they contradict the style
**Prevention:** Always clean direct formatting when copying content between documents. Keep only `pStyle`/`rStyle` references and `w:t` text.
---
## 4. "Tables are missing" (Tables Lost During Copy)
**Symptom:** Source had 5 tables but output only has 2 (or 0).
**Diagnosis:** Code used `body.findall('w:p')` or `body.Elements<Paragraph>()` to collect only top-level paragraphs instead of iterating all children. This skips `w:tbl` elements.
```bash
# Verify table count
$CLI analyze --input source.docx | grep -i "table"
$CLI analyze --input output.docx | grep -i "table"
```
**Fix:** Use `list(body)` or `body.ChildElements` to get ALL top-level children including tables:
```csharp
// WRONG — skips tables, section properties, and other non-paragraph elements
var paragraphs = body.Elements<Paragraph>();
// CORRECT — gets everything: paragraphs, tables, SDT blocks, etc.
var allElements = body.ChildElements.ToList();
```
In Python with lxml:
```python
# WRONG
elements = body.findall('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p')
# CORRECT
elements = list(body) # all direct children
```
**Prevention:** Always use `list(body)` or `body.ChildElements` for iteration, never filter by a single element type alone when copying content.
---
## 5. "Images are missing or show broken icon"
**Symptom:** Image placeholders appear but images don't render. Or images are completely absent.
**Diagnosis:** The `r:embed` rId in `w:drawing` doesn't match any relationship in `document.xml.rels`, or the media file wasn't copied to the output ZIP.
```bash
# Check relationships
$CLI analyze --input output.docx --part rels | grep -i "image"
# Check if media files exist
$CLI unpack --input output.docx --output unpacked/
ls unpacked/word/media/
```
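The two checks in the diagnosis (does the `r:embed` rId resolve, and does the media file exist in the ZIP) can also be scripted. The sketch below uses only the standard library; `broken_image_refs` is a hypothetical helper, and the in-memory package is a deliberately broken toy example, not a valid DOCX.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"
RELS = "word/_rels/document.xml.rels"


def broken_image_refs(docx):
    """Return (rId, reason) for each a:blip whose r:embed cannot be resolved."""
    with zipfile.ZipFile(docx) as zf:
        names = set(zf.namelist())
        doc = ET.fromstring(zf.read("word/document.xml"))
        rels_xml = zf.read(RELS) if RELS in names else None
    targets = {}
    if rels_xml is not None:
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG}}}Relationship"):
            targets[rel.get("Id")] = rel.get("Target")
    problems = []
    for blip in doc.iter(f"{{{A}}}blip"):
        rid = blip.get(f"{{{R}}}embed")
        if rid is None:
            continue  # linked (r:link) images are out of scope for this sketch
        target = targets.get(rid)
        if target is None:
            problems.append((rid, "no relationship with this Id"))
            continue
        # Targets are relative to word/ unless they start with "/"
        if target.startswith("/"):
            part = target.lstrip("/")
        else:
            part = posixpath.normpath(posixpath.join("word", target))
        if part not in names:
            problems.append((rid, f"media file missing: {part}"))
    return problems


sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr(
        "word/document.xml",
        f'<doc xmlns:a="{A}" xmlns:r="{R}">'
        '<a:blip r:embed="rId1"/><a:blip r:embed="rId9"/></doc>',
    )
    zf.writestr(
        RELS,
        f'<Relationships xmlns="{PKG}">'
        '<Relationship Id="rId1" Type="image" Target="media/image1.png"/>'
        "</Relationships>",
    )
problems = broken_image_refs(sample)
print(problems)
# [('rId1', 'media file missing: word/media/image1.png'), ('rId9', 'no relationship with this Id')]
```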
**Fix:**
1. Check source rels for image file paths
2. Copy media files from source to output
3. Add/update relationships in output rels
4. Update `r:embed` values in drawing elements
```csharp
// When copying content with images between documents:
foreach (var drawing in body.Descendants<Drawing>())
{
    var blip = drawing.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().FirstOrDefault();
    if (blip?.Embed?.Value == null) continue;

    // Resolve the source image part from its relationship ID
    var sourceImage = (ImagePart)sourcePart.GetPartById(blip.Embed.Value);
    // Copy the image into the target, keeping the original content type
    var imagePart = targetPart.AddImagePart(sourceImage.ContentType);
    using (var stream = sourceImage.GetStream())
    {
        imagePart.FeedData(stream);
    }
    // Point the blip at the new relationship ID
    blip.Embed = targetPart.GetIdOfPart(imagePart);
}
```
**Prevention:** Always do rId remapping + media file copy when moving content between documents. Never assume rIds are portable across documents.
---
## 6. "TOC shows stale/wrong entries" or "Update Table doesn't work"
**Symptom:** Table of contents shows the template's example entries (e.g., "第1章 绪论...1") instead of actual headings. Or clicking "Update Table" in Word does nothing.
**Diagnosis:**
- **Stale entries (normal):** TOC entries are static text cached inside the field. They don't auto-update until the user explicitly updates in Word.
- **Update Table fails:** The SDT wrapper or field code structure is damaged. The TOC in real templates is a mixed structure: SDT block + field code + static entries.
```bash
# Check if TOC SDT exists
$CLI analyze --input output.docx | grep -i "sdt\|toc"
```
**Fix:**
- **If entries are just stale:** This is expected behavior. The user must right-click TOC, then "Update Field" in Word. Or enable auto-update:
```csharp
// See FieldAndTocSamples.EnableUpdateFieldsOnOpen()
FieldAndTocSamples.EnableUpdateFieldsOnOpen(settingsPart);
```
- **If SDT is damaged:** Keep the entire SDT block from the template intact. Do not modify it.
- **If field code is missing:** Ensure the TOC contains: `fldChar begin` + `instrText` + `fldChar separate` + static entries + `fldChar end`. See `FieldAndTocSamples.CreateMixedTocStructure()` for the complete pattern.
- **If you rebuilt TOC from scratch (common mistake):** You likely destroyed the SDT wrapper. Use the template's original SDT block instead. See `Samples/FieldAndTocSamples.cs` method `CreateMixedTocStructure` for how real-world TOC is structured.
**Prevention:** When doing Base-Replace (C-2), keep the template's TOC zone completely untouched. Do not strip, rebuild, or modify the SDT block. The entries refresh when the user updates the field in Word, or on open if update-fields-on-open is enabled.
---
## 7. "Chapters don't start on new pages" (Missing Section Breaks)
**Symptom:** Content flows continuously without page breaks between chapters. Chapter 2 starts right after Chapter 1's last paragraph on the same page.
**Diagnosis:** No `sectPr` elements or page break paragraphs between chapters.
**Fix:** Insert a paragraph with `sectPr` in its `pPr` before each chapter heading, or insert a page break:
```csharp
// Option 1: Section break (preserves per-section settings like headers/margins)
var sectionBreakPara = new Paragraph(
    new ParagraphProperties(
        new SectionProperties(
            new SectionType { Val = SectionMarkValues.NextPage })));
// Option 2: Simple page break (lighter weight)
var pageBreakPara = new Paragraph(
    new Run(new Break { Type = BreakValues.Page }));
// Insert ONE of the two before each Heading1 (create a fresh paragraph per heading)
body.InsertBefore(pageBreakPara, heading1Paragraph);
```
**Prevention:** When copying content, insert page/section breaks before Heading1 paragraphs as needed. Check source document's section structure before copying.
---
## 8. "Hyperlinks don't work" (Broken Links)
**Symptom:** Clicking a hyperlink in the output document does nothing, or it navigates to the wrong URL.
**Diagnosis:** `w:hyperlink r:id` points to a relationship that doesn't exist in `document.xml.rels`.
```bash
# Check hyperlink relationships
$CLI analyze --input output.docx --part rels | grep -i "hyperlink"
```
**Fix:** Merge source document's hyperlink relationships into output's rels file. Update rId references.
```csharp
foreach (var hyperlink in body.Descendants<Hyperlink>())
{
    if (hyperlink.Id?.Value == null) continue; // internal anchor links carry no r:id

    var sourceRel = sourcePart.HyperlinkRelationships
        .FirstOrDefault(r => r.Id == hyperlink.Id.Value);
    if (sourceRel == null) continue;

    // AddHyperlinkRelationship returns the new relationship; use its Id directly
    var newRel = targetPart.AddHyperlinkRelationship(sourceRel.Uri, sourceRel.IsExternal);
    hyperlink.Id = newRel.Id;
}
```
**Prevention:** Always merge ALL relationship types (images, hyperlinks, headers, footers) when combining documents. Never assume source rIds work in the target.
---
## 9. "Numbered lists show wrong numbers" or "Bullets disappeared"
**Symptom:** Lists that were numbered 1, 2, 3 now show 1, 1, 1 or have no numbers/bullets at all.
**Diagnosis:** `numId` in `pPr` references a numbering definition that doesn't exist in `numbering.xml`, or `abstractNumId` mapping is broken.
```bash
# Check numbering definitions
$CLI analyze --input output.docx --part numbering
```
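The two failure modes named in the diagnosis (a `numId` with no `w:num`, or a `w:num` pointing at a missing `w:abstractNum`) can be detected with a small script. This sketch assumes the raw XML of `word/document.xml` and `word/numbering.xml` is available; `dangling_numbering` is a hypothetical helper, and numbering applied through styles in `styles.xml` is not covered.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def dangling_numbering(document_xml, numbering_xml):
    """Return (numId, reason) for numId references that do not resolve."""
    doc = ET.fromstring(document_xml)
    numbering = ET.fromstring(numbering_xml)
    abstract_ids = {a.get(w("abstractNumId")) for a in numbering.iter(w("abstractNum"))}
    num_to_abstract = {}
    for num in numbering.findall(w("num")):
        ref = num.find(w("abstractNumId"))
        num_to_abstract[num.get(w("numId"))] = ref.get(w("val")) if ref is not None else None
    problems, seen = [], set()
    for el in doc.iter(w("numId")):
        num_id = el.get(w("val"))
        if num_id in seen or num_id == "0":  # numId 0 means "no numbering"
            continue
        seen.add(num_id)
        if num_id not in num_to_abstract:
            problems.append((num_id, "numId is not defined in numbering.xml"))
        elif num_to_abstract[num_id] not in abstract_ids:
            problems.append((num_id, "abstractNumId is not defined"))
    return problems


NUMBERING = (
    f'<w:numbering xmlns:w="{W}">'
    '<w:abstractNum w:abstractNumId="0"/>'
    '<w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>'
    '<w:num w:numId="2"><w:abstractNumId w:val="5"/></w:num>'
    "</w:numbering>"
)
DOC = f'<w:body xmlns:w="{W}">' + "".join(
    f'<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="{n}"/></w:numPr></w:pPr></w:p>'
    for n in ("1", "2", "7", "0")
) + "</w:body>"
print(dangling_numbering(DOC, NUMBERING))
# [('2', 'abstractNumId is not defined'), ('7', 'numId is not defined in numbering.xml')]
```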
**Fix:** Map source numIds to template numIds, or merge numbering definitions:
```csharp
// 1. Copy abstractNum definitions from source to target numbering.xml
// 2. Create new num entries pointing to the copied abstractNum
// 3. Update all numId references in document content
var sourceNumbering = sourceNumberingPart.Numbering;
var targetNumbering = targetNumberingPart.Numbering;
// Next free IDs in the target, to avoid collisions
// (DefaultIfEmpty guards against an empty numbering part, where Max() would throw)
int nextAbstractNumId = targetNumbering.Elements<AbstractNum>()
    .Select(a => a.AbstractNumberId?.Value ?? 0).DefaultIfEmpty(-1).Max() + 1;
int nextNumId = targetNumbering.Elements<NumberingInstance>()
    .Select(n => n.NumberID?.Value ?? 0).DefaultIfEmpty(0).Max() + 1;
```
**Prevention:** Include `numbering.xml` reconciliation in template application workflow. See `Samples/ListAndNumberingSamples.cs` for correct numbering setup.
---
## 10. "Page margins/size are wrong"
**Symptom:** Output has different margins, page size, or orientation than the template.
**Diagnosis:** Source document's `sectPr` is overriding the template's `sectPr`. The final `sectPr` (child of `body`) controls the last section's layout.
```bash
# Compare section properties
$CLI analyze --input template.docx | grep -i "sectPr\|margin\|pgSz"
$CLI analyze --input output.docx | grep -i "sectPr\|margin\|pgSz"
```
**Fix:** Use the template's final `sectPr`. For intermediate `sectPr` elements (multi-section documents), merge carefully.
```csharp
// Replace output's final sectPr with template's
var templateSectPr = templateBody.Elements<SectionProperties>().LastOrDefault();
var outputSectPr = outputBody.Elements<SectionProperties>().LastOrDefault();
if (templateSectPr != null)
{
var cloned = templateSectPr.CloneNode(true) as SectionProperties;
if (outputSectPr != null)
outputBody.ReplaceChild(cloned!, outputSectPr);
else
outputBody.Append(cloned!);
}
```
**Prevention:** Always use the template's `sectPr` as authority for page layout. Strip source document's `sectPr` before copying content.
---
## 11. "Chinese text renders as boxes/tofu"
**Symptom:** Chinese characters display as square boxes (□□□) or missing glyphs.
**Diagnosis:** `rFonts w:eastAsia` is set to a font that doesn't exist on the system, or is missing entirely. Without an East Asian font declaration, the rendering engine may fall back to a font without CJK coverage.
**Fix:** Ensure all CJK text has `w:eastAsia` set to an available font:
```csharp
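foreach (var run in body.Descendants<Run>().ToList())
{
    if (!ContainsCjk(run.InnerText)) continue;

    var rPr = run.RunProperties;
    if (rPr == null)
    {
        rPr = new RunProperties();
        run.PrependChild(rPr); // rPr must be the first child of w:r
    }
    var rFonts = rPr.GetFirstChild<RunFonts>();
    if (rFonts == null)
    {
        rFonts = new RunFonts();
        // rFonts belongs near the front of rPr (after rStyle, if present)
        var rStyle = rPr.GetFirstChild<RunStyle>();
        if (rStyle != null) rPr.InsertAfter(rFonts, rStyle);
        else rPr.PrependChild(rFonts);
    }
    // Set to a widely available CJK font
    rFonts.EastAsia = "SimSun"; // 宋体, a safe default on Windows
}
static bool ContainsCjk(string text)
{
    // Covers the CJK Unified Ideographs block only (U+4E00 to U+9FFF)
    return text.Any(c => c >= 0x4E00 && c <= 0x9FFF);
}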
```
Common safe CJK fonts: 宋体 (SimSun), 黑体 (SimHei), 仿宋 (FangSong), 楷体 (KaiTi).
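The range test used for detection covers only the basic CJK Unified Ideographs block, which is enough for most Chinese text but misses kana and Hangul. A broader check could look like the sketch below; the block list is an illustrative selection rather than an exhaustive one, and `contains_cjk` is a hypothetical helper name.

```python
# (start, end) code point ranges, inclusive; an illustrative, non-exhaustive selection
CJK_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def contains_cjk(text):
    """Return True if any character falls in one of the listed CJK blocks."""
    return any(start <= ord(ch) <= end for ch in text for start, end in CJK_RANGES)


for sample in ("Hello", "宋体", "カタカナ", "한글"):
    print(sample, contains_cjk(sample))
```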
**Prevention:** When cleaning `rPr` formatting, ALWAYS preserve `w:eastAsia` font declarations. See also `references/cjk_typography.md`.
---
## 12. "Template's cover page / declaration page is missing"
**Symptom:** Output document starts directly with body content — no cover page, no declaration, no abstract, no table of contents. The template's structural front matter was discarded.
**Diagnosis:** Used Overlay (C-1) strategy when Base-Replace (C-2) was needed. Overlay applies styles to the source document but discards the template's structural content (cover, declaration, abstract, TOC).
```bash
# Check template structure
$CLI analyze --input template.docx
# If template has >50 paragraphs with cover/TOC/declaration, C-2 is needed
```
**Fix:** Use Base-Replace (C-2) strategy — template is the base, only replace the example body content zone with the user's content:
1. Identify the template's "body zone" (everything between TOC and final sectPr)
2. Remove the template's example body content
3. Insert the user's content into the body zone
4. Keep everything else from the template (cover, declaration, abstract, TOC, sectPr)
```bash
$CLI apply-template --input source.docx --template template.docx --output out.docx --strategy base-replace
```
**Prevention:** Analyze template structure FIRST. If template has structural content (cover, TOC, declaration sections), always use C-2 (Base-Replace). Read `references/scenario_c_apply_template.md` for detailed decision criteria.
---
## 13. "Track changes markers appear unexpectedly"
**Symptom:** Output shows red/green revision marks (insertions, deletions) that weren't in the source document.
**Diagnosis:** Template had track changes enabled, or content was inserted as revisions rather than normal text.
```bash
# Check for revision marks
$CLI analyze --input output.docx | grep -i "revision\|ins\|del\|track"
```
**Fix:** Accept all revisions by flattening `w:ins` and `w:del` elements:
```csharp
// Accept insertions: unwrap w:ins, keep content
foreach (var ins in body.Descendants<InsertedRun>().ToList())
{
var parent = ins.Parent!;
foreach (var child in ins.ChildElements.ToList())
{
parent.InsertBefore(child.CloneNode(true), ins);
}
ins.Remove();
}
// Accept deletions: remove w:del and its content entirely
foreach (var del in body.Descendants<DeletedRun>().ToList())
{
del.Remove();
}
```
Accepting revisions does not stop new ones from being recorded. To turn tracking off, also remove the `trackChanges` setting:
```csharp
var settings = settingsPart.Settings;
var trackChanges = settings.GetFirstChild<TrackChanges>();
trackChanges?.Remove();
```
**Prevention:** Check template's `settings.xml` for `trackChanges` before starting. If present, accept all revisions in the template first.
---
## Recovery Strategy — When Multiple Issues Exist
When a document has multiple problems, fix them in this priority order:
```
1. [Content_Types].xml — without this, nothing opens
2. _rels/.rels — package relationships
3. word/_rels/document.xml.rels — part relationships (images, hyperlinks)
4. word/document.xml — element ordering (fix-order)
5. word/styles.xml — style definitions and styleId mapping
6. word/numbering.xml — list/numbering definitions
7. Everything else — headers, footers, comments, settings
```
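As a first triage step, the priority list above can be turned into a presence check on the ZIP package. This is a rough sketch with a hypothetical helper name; note that `word/numbering.xml` and `word/_rels/document.xml.rels` are legitimately absent in some simple documents, so a "missing" entry there is a prompt to look, not proof of damage.

```python
import io
import zipfile

# Parts in the repair order listed above
PRIORITY = [
    "[Content_Types].xml",
    "_rels/.rels",
    "word/_rels/document.xml.rels",
    "word/document.xml",
    "word/styles.xml",
    "word/numbering.xml",
]


def missing_parts(docx):
    """Return the parts from PRIORITY that are absent from the package, in repair order."""
    with zipfile.ZipFile(docx) as zf:
        names = set(zf.namelist())
    return [part for part in PRIORITY if part not in names]


# Toy package with only three parts, built in memory for the demo
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    for name in ("[Content_Types].xml", "_rels/.rels", "word/document.xml"):
        zf.writestr(name, "<x/>")
missing = missing_parts(sample)
print(missing)
# ['word/_rels/document.xml.rels', 'word/styles.xml', 'word/numbering.xml']
```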
```bash
# Full recovery pipeline
$CLI unpack --input broken.docx --output unpacked/
$CLI validate --input broken.docx --xsd assets/xsd/wml-subset.xsd # find all errors
$CLI fix-order --input broken.docx # fix element ordering
$CLI validate --input broken.docx --business # check business rules
scripts/docx_preview.sh broken.docx # visual check
```


@@ -0,0 +1,294 @@
# Professional Document Design & Typography Guide
## Table of Contents
1. [Font Pairing](#font-pairing)
2. [Font Sizes by Document Type](#font-sizes-by-document-type)
3. [Line Spacing](#line-spacing)
4. [Paragraph Spacing](#paragraph-spacing)
5. [Page Layout](#page-layout)
6. [Table Design](#table-design)
7. [Color Schemes](#color-schemes)
8. [Visual Hierarchy](#visual-hierarchy)
9. [Quick Reference Defaults](#quick-reference-defaults)
---
## Font Pairing
### Recommended Pairs
| Headings | Body | Style | Best For |
|----------|------|-------|----------|
| Calibri Light | Calibri | Modern sans | Corporate reports |
| Aptos | Aptos | Office 365 default | Modern business docs |
| Cambria | Calibri | Serif + sans | Academic-corporate hybrid |
| Times New Roman | Times New Roman | Traditional serif | Academic, legal |
| Arial | Arial | Clean sans | Memos, internal docs |
| Georgia | Garamond | Classical serif pair | Formal reports |
### Rules
- **Limit**: 2 font families max (3 if CJK mixed)
- **Contrast**: Pair serif with sans-serif, OR use weight contrast within one family
- **Consistency**: Same font for all body text, same font for all headings
---
## Font Sizes by Document Type
| Document Type | Body | H1 | H2 | H3 | Footnotes |
|--------------|------|----|----|----|----|
| **Business report** | 11pt | 18-20pt | 14-16pt | 12-13pt bold | 9pt |
| **Business letter** | 11-12pt | — | — | — | 9-10pt |
| **Memo** | 11pt | 14pt bold | 12pt bold | 11pt bold | 9pt |
| **Contract / Legal** | 12pt | 14pt bold caps | 12pt bold | 12pt bold | 10pt |
| **Academic (APA 7)** | 12pt | 12pt bold center | 12pt bold left | 12pt bold italic | 10pt |
| **Resume / CV** | 10-11pt | 14-16pt | 12pt bold | 11pt bold | 8-9pt |
| **Chinese 公文** | 三号(16pt) | 二号(22pt) | 三号(16pt) | 四号(14pt) | 小四(12pt) |
### OpenXML `w:sz` Values (half-points)
| Point Size | `w:sz` Val | Common Use |
|-----------|-----------|------------|
| 9pt | 18 | Footnotes, captions |
| 10pt | 20 | Compact body text |
| 10.5pt (五号) | 21 | CJK body small |
| 11pt | 22 | Standard body (Calibri) |
| 12pt (小四) | 24 | Standard body (TNR), CJK |
| 14pt (四号) | 28 | CJK body, subheading |
| 16pt (三号) | 32 | CJK heading, western H2 |
| 18pt (小二) | 36 | Western H1 |
| 22pt (二号) | 44 | CJK document title |
| 26pt (一号) | 52 | Large title |
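
Because `w:sz` is expressed in half-points, the conversion is a simple doubling. The sketch below shows it together with the named CJK sizes from the table; `pt_to_half_points` and `CJK_SIZES_PT` are illustrative names, not part of any library.

```python
# Named CJK sizes and their point values, as listed in the table above
CJK_SIZES_PT = {
    "五号": 10.5,
    "小四": 12,
    "四号": 14,
    "三号": 16,
    "小二": 18,
    "二号": 22,
    "一号": 26,
}


def pt_to_half_points(points):
    """Convert a point size to the integer half-point value used by w:sz."""
    value = points * 2
    if value != int(value):
        raise ValueError(f"{points}pt is not a multiple of 0.5pt")
    return int(value)


for name, points in CJK_SIZES_PT.items():
    print(name, points, pt_to_half_points(points))
```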
---
## Line Spacing
| Spacing | OpenXML `w:spacing line` | When to Use |
|---------|--------------------------|-------------|
| Single (1.0) | `line="240" lineRule="auto"` | Tables, footnotes, captions |
| 1.08 (MS default) | `line="259" lineRule="auto"` | Modern Office documents |
| 1.15 | `line="276" lineRule="auto"` | Business reports — best general default |
| 1.5 | `line="360" lineRule="auto"` | Some academic, drafts for markup |
| Double (2.0) | `line="480" lineRule="auto"` | APA/MLA manuscripts, legal briefs |
| Fixed 28pt | `line="560" lineRule="exact"` | Chinese 公文 (GB/T 9704) |
**`lineRule` values**: `auto` = proportional (240 = 1 line), `exact` = fixed height, `atLeast` = minimum.
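
The two unit systems can be made explicit in a small helper: proportional spacing is expressed in 240ths of a line, fixed spacing in twentieths of a point. `line_spacing_attrs` is a hypothetical function written for illustration.

```python
def line_spacing_attrs(multiple=None, exact_pt=None):
    """Return w:spacing attribute values for proportional or fixed line spacing."""
    if (multiple is None) == (exact_pt is None):
        raise ValueError("pass exactly one of multiple or exact_pt")
    if multiple is not None:
        # Proportional: 240 units equal one line
        return {"line": str(round(multiple * 240)), "lineRule": "auto"}
    # Fixed height: twentieths of a point
    return {"line": str(round(exact_pt * 20)), "lineRule": "exact"}


print(line_spacing_attrs(multiple=1.15))  # {'line': '276', 'lineRule': 'auto'}
print(line_spacing_attrs(exact_pt=28))    # {'line': '560', 'lineRule': 'exact'}
```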
---
## Paragraph Spacing
| Element | Space Before (DXA) | Space After (DXA) |
|---------|-------------------|-------------------|
| Body paragraph | 0 | 120-160 (6-8pt) |
| Heading 1 | 480 (24pt) | 120-240 |
| Heading 2 | 360 (18pt) | 120 |
| Heading 3 | 240 (12pt) | 80-120 |
| List items | 0 | 40-80 (2-4pt) |
| Block quote | 120-240 | 120-240 |
| Table/Figure caption | 240 | 240 |
**Principle**: Space before a heading > space after, so heading visually "belongs to" content below (2:1 or 3:1 ratio).
---
## Page Layout
### Margins by Document Type
| Document Type | Top | Bottom | Left | Right | DXA Values |
|--------------|-----|--------|------|-------|------------|
| **Standard business** | 1 in | 1 in | 1 in | 1 in | 1440 all |
| **Academic (APA/MLA)** | 1 in | 1 in | 1 in | 1 in | 1440 all |
| **Thesis (binding)** | 1 in | 1 in | 1.5 in | 1 in | T/B:1440 L:2160 R:1440 |
| **Chinese 公文** | 37mm | 35mm | 28mm | 26mm | T:2098 B:1984 L:1588 R:1474 |
| **Narrow modern** | 0.75 in | 0.75 in | 0.75 in | 0.75 in | 1080 all |
| **Wide** | 1 in | 1 in | 2 in | 2 in | T/B:1440 L/R:2880 |
### Page Sizes
| Size | Width × Height | DXA Width × Height |
|------|---------------|-------------------|
| US Letter | 8.5 × 11 in | 12240 × 15840 |
| A4 | 210 × 297 mm | 11906 × 16838 |
| Legal | 8.5 × 14 in | 12240 × 20160 |
| A3 | 297 × 420 mm | 16838 × 23811 |
**Rule**: A4 for international audiences, Letter for US-only.
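
The DXA values in the margin and page-size tables follow from 1 inch = 1440 DXA and 1 inch = 25.4 mm. A minimal conversion sketch (function names are illustrative):

```python
DXA_PER_INCH = 1440
MM_PER_INCH = 25.4


def inch_to_dxa(inches):
    """Convert inches to DXA (twentieths of a point)."""
    return round(inches * DXA_PER_INCH)


def mm_to_dxa(mm):
    """Convert millimetres to DXA, rounded to the nearest unit."""
    return round(mm / MM_PER_INCH * DXA_PER_INCH)


print("A4:", mm_to_dxa(210), "x", mm_to_dxa(297))        # A4: 11906 x 16838
print("Letter:", inch_to_dxa(8.5), "x", inch_to_dxa(11))  # Letter: 12240 x 15840
```

Values found in real documents can differ by 1 DXA from this rounding: the margin table lists 1588 for 28 mm, where plain rounding gives 1587.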
### Page Numbers
| Convention | Placement | Common In |
|-----------|-----------|-----------|
| Bottom center | Footer, centered | Academic, government |
| Bottom right | Footer, right-aligned | Business reports |
| "Page X of Y" | Footer, right-aligned | Contracts, legal |
| Bottom outside | Alternating L/R for odd/even | Books, bound reports |
| Chinese 公文 | Bottom outside (odd pages right, even pages left), format "-X-" | Government documents |
---
## Table Design
### Style Patterns
| Style | Description | When to Use |
|-------|------------|-------------|
| **Three-line (三线表)** | Top rule + header-bottom rule + bottom rule only, no vertical lines | Academic, scientific — gold standard |
| **Banded rows** | Alternating white/light-gray, no borders | Modern corporate |
| **Light grid** | Thin 0.5pt gray borders all cells | Business reports |
| **Header-accent** | Dark/colored header row, no other borders | Modern templates |
| **Full border** | All cells bordered | Financial tables, forms |
### Border Weights (OpenXML `w:sz` in eighths of a point)
| Visual | `Size` value | Points |
|--------|-------------|--------|
| Hairline | 2 | 0.25pt |
| Thin | 4 | 0.5pt |
| Medium | 8 | 1pt |
| Thick | 12 | 1.5pt |
### Cell Padding
- **Minimum**: 0.05 in (72 DXA), too tight for most uses
- **Recommended**: 0.08-0.1 in (115-144 DXA) top/bottom, 0.1-0.15 in (144-216 DXA) left/right
- **Spacious**: 0.12 in (173 DXA) top/bottom, 0.19 in (274 DXA) left/right
### Header Row Best Practices
- Bold text, optionally SMALL CAPS
- Background: light gray (#F2F2F2) or dark with white text (#2F5496 + white)
- Repeat header row on each page (`w:tblHeader` on `w:trPr`)
- Right-align number columns, left-align text columns
---
## Color Schemes
### Corporate / Business
| Element | Hex | Notes |
|---------|-----|-------|
| Primary heading | #1F3864 | Dark navy, authoritative |
| Secondary heading | #2E75B6 | Medium blue |
| Body text | #333333 | Near-black (softer than #000) |
| Table header bg | #4472C4 | With white #FFFFFF text |
| Alternate row | #F2F2F2 | Subtle gray banding |
| Hyperlink | #0563C1 | Standard blue |
### Academic
All text **#000000** (black). Color only in figures/charts.
### Chinese Government (公文)
| Element | Color |
|---------|-------|
| All body text | Black (required) |
| 红头 agency name | Red #FF0000 |
| 红线 separator | Red #FF0000 |
| 公章 seal | Red |
### Accessibility
- Minimum contrast ratio 4.5:1 for normal text, 3:1 for large text (WCAG AA)
- Never use color as sole means of conveying information
- Ensure distinguishable in grayscale for printed documents
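
The contrast thresholds above can be checked numerically. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions as I understand them; the function names are illustrative, and the colors tested are the ones from the Corporate / Business table.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color given as 'RRGGBB' or '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve before weighting the channels
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


for fg, bg in (("#333333", "#FFFFFF"), ("#FFFFFF", "#4472C4"), ("#0563C1", "#FFFFFF")):
    ratio = contrast_ratio(fg, bg)
    print(fg, "on", bg, f"{ratio:.2f}", "AA normal text" if ratio >= 4.5 else "below AA")
```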
---
## Visual Hierarchy
### Heading Levels by Document Length
| Pages | Recommended Levels |
|-------|-------------------|
| 1-5 (memo, letter) | 1-2 levels |
| 5-20 (report) | 2-3 levels |
| 20-100 (long report) | 3-4 levels |
| 100+ (thesis) | 4-5 levels max |
### Numbering Systems
**Decimal (ISO 2145)** — technical, international:
```
1 → 1.1 → 1.1.1 → 1.1.1.1
```
**Traditional outline (US legal):**
```
I. → A. → 1. → a. → (1) → (a)
```
**Chinese government (公文):**
```
一、(黑体) → (一)(楷体) → 1.(仿宋加粗) → (1)(仿宋)
```
### Typography Emphasis
| Format | Use For | Avoid |
|--------|---------|-------|
| **Bold** | Key terms, headings, emphasis | Entire paragraphs |
| *Italic* | Titles, foreign words, mild emphasis | Long passages (hard to read) |
| Underline | Hyperlinks only (digital) | General emphasis (archaic) |
| SMALL CAPS | Legal defined terms, acronyms | Body text |
| ALL CAPS | Very short headings | Long text (reduces readability 15%) |
**CJK note**: Chinese/Japanese have no true italic. Use bold for emphasis.
### List Formatting
**Bullets** (unordered): `•` → `○` → `■` by level
**Numbers** (ordered): `1.` → `a.` → `i.` by level
- Indent each level 0.25-0.5 in (360-720 DXA)
- Hanging indent: number hangs, text aligns consistently
- Spacing between items: 2-4pt (less than paragraph spacing)
---
## Quick Reference Defaults
### Business Report (Safe Default)
| Parameter | Value | OpenXML |
|-----------|-------|---------|
| Body font | Calibri 11pt | sz="22", RunFonts Ascii="Calibri" |
| H1 | 18pt Bold Dark Blue | sz="36", Bold, Color="1F3864" |
| H2 | 14pt Bold Dark Blue | sz="28", Bold |
| H3 | 12pt Bold Dark Blue | sz="24", Bold |
| Line spacing | 1.15 | line="276" lineRule="auto" |
| Para after | 8pt | after="160" |
| Margins | 1 in all | 1440 DXA all |
| Page size | Letter or A4 | 12240×15840 or 11906×16838 |
| Page numbers | Bottom right, 10pt | |
### Academic Paper (APA 7th)
| Parameter | Value | OpenXML |
|-----------|-------|---------|
| Font | Times New Roman 12pt | sz="24" |
| Line spacing | Double | line="480" lineRule="auto" |
| First-line indent | 0.5 in | ind firstLine="720" |
| Margins | 1 in all | 1440 DXA all |
| Page numbers | Top right | Header, right-aligned |
### Chinese Government (公文 GB/T 9704)
| Parameter | Value | OpenXML |
|-----------|-------|---------|
| Body font | 仿宋_GB2312 三号 | sz="32", EastAsia="FangSong_GB2312" |
| Title | 小标宋 二号 centered | sz="44" |
| L1 heading | 黑体 三号 | sz="32", EastAsia="SimHei" |
| L2 heading | 楷体 三号 | sz="32", EastAsia="KaiTi_GB2312" |
| Line spacing | Fixed 28pt | line="560" lineRule="exact" |
| Margins | T:37mm B:35mm L:28mm R:26mm | T:2098 B:1984 L:1588 R:1474 |
| Page size | A4 | 11906×16838 |
| Page numbers | Bottom outside (odd right, even left), 宋体 四号, "-X-" | sz="28" |
| Chars/line | 28 | |
| Lines/page | 22 | |


@@ -0,0 +1,158 @@
# XSD Validation Guide
## Running Validation
```bash
# Validate against the WML subset schema
dotnet run --project minimax-docx validate input.docx --xsd assets/xsd/wml-subset.xsd
# Validate against business rules (REQUIRED for Scenario C gate-check)
dotnet run --project minimax-docx validate input.docx --xsd assets/xsd/business-rules.xsd
# Validate against both
dotnet run --project minimax-docx validate input.docx --xsd assets/xsd/wml-subset.xsd --xsd assets/xsd/business-rules.xsd
```
---
## What wml-subset.xsd Covers
The subset schema validates the most common WordprocessingML elements:
| Area | Elements Validated |
|------|--------------------|
| Document structure | `w:document`, `w:body`, `w:sectPr` |
| Paragraphs | `w:p`, `w:pPr`, `w:r`, `w:rPr`, `w:t` |
| Tables | `w:tbl`, `w:tblPr`, `w:tblGrid`, `w:tr`, `w:tc` |
| Styles | `w:styles`, `w:style`, `w:docDefaults` |
| Lists | `w:numbering`, `w:abstractNum`, `w:num` |
| Headers/Footers | `w:hdr`, `w:ftr` |
| Track Changes | `w:ins`, `w:del`, `w:rPrChange`, `w:pPrChange` |
| Comments | `w:comment`, `w:commentRangeStart`, `w:commentRangeEnd` |
### What It Does NOT Cover
- DrawingML elements (`a:`, `pic:`, `wp:`) — image/shape internals
- VML elements (`v:`, `o:`) — legacy shapes
- Math elements (`m:`) — equations
- Extended namespaces (`w14`, `w15`, `w16*`) — vendor extensions
- Custom XML data parts
- Relationship and content type validation (structural, not schema-based)
---
## Interpreting Errors
### Element Ordering Error
```
ERROR: Element 'w:jc' is not expected at this position.
Expected: w:spacing, w:ind, w:contextualSpacing, ...
Location: /word/document.xml, line 45
```
**Cause**: Child elements are in wrong order. See `references/openxml_element_order.md`.
**Fix**: Reorder children to match schema sequence.
### Missing Required Element
```
ERROR: Element 'w:tbl' missing required child 'w:tblPr'.
Location: /word/document.xml, line 102
```
**Cause**: A required child element is absent.
**Fix**: Add the missing element. Tables require both `w:tblPr` and `w:tblGrid`.
### Invalid Attribute Value
```
ERROR: Attribute 'w:val' has invalid value 'middle'.
Expected: 'left', 'center', 'right', 'both', 'distribute'
Location: /word/document.xml, line 78
```
**Cause**: An attribute value is not in the allowed enumeration.
**Fix**: Use one of the valid values listed in the error.
### Unexpected Element
```
ERROR: Element 'w:customTag' is not expected.
Location: /word/document.xml, line 200
```
**Cause**: An element not defined in the subset schema. May be a vendor extension.
**Fix**: Check if it's a known extension (w14/w15/w16). If so, it's likely safe. If unknown, investigate or remove.
---
## Business Rules XSD
The `business-rules.xsd` schema enforces project-specific constraints beyond standard OpenXML validity:
| Rule | What It Checks |
|------|---------------|
| Required styles | `Normal`, `Heading1`-`Heading3`, `TableGrid` must exist in `styles.xml` |
| Font consistency | `w:docDefaults` fonts match expected values |
| Margin ranges | Page margins within acceptable range (720-2160 DXA) |
| Page size | Must be A4 or Letter |
| Heading hierarchy | No gaps (e.g., H1 → H3 without H2) |
| Style chain | `w:basedOn` references must resolve to existing styles |
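
To make the heading-hierarchy rule concrete, here is a sketch of the check as it might be run over the paragraph styleIds of a document in order. It is not how `business-rules.xsd` implements the rule; `heading_gaps` is a hypothetical helper, and it assumes styleIds of the form `HeadingN` (templates with numeric styleIds would need a mapping first).

```python
import re

HEADING = re.compile(r"Heading([1-9])")


def heading_gaps(style_ids):
    """Return (index, previous_level, level) wherever a heading skips a level."""
    gaps = []
    previous = 0  # 0 = no heading seen yet, so the first heading must be level 1
    for index, style_id in enumerate(style_ids):
        match = HEADING.fullmatch(style_id)
        if not match:
            continue
        level = int(match.group(1))
        if level > previous + 1:
            gaps.append((index, previous, level))
        previous = level
    return gaps


print(heading_gaps(["Heading1", "Normal", "Heading3", "Heading2"]))  # [(2, 1, 3)]
```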
### Extending Business Rules
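To add project-specific rules, add `xs:restriction` facets or `xs:assert` elements. Note that `xs:assert` requires an XSD 1.1 processor, which .NET's built-in `System.Xml.Schema` validator is not: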
```xml
<!-- Require minimum 1-inch margins -->
<xs:element name="pgMar">
  <xs:complexType>
    <xs:attribute name="top">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="1440" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>
```
---
## Gate-Check: Scenario C Hard Gate
In Scenario C (Apply Template), the output document **MUST** pass `business-rules.xsd` validation before delivery:
```
1. Apply template → output.docx
2. Validate → dotnet run ... validate output.docx --xsd business-rules.xsd
3. PASS? → Deliver to user
4. FAIL? → Fix issues, re-validate, repeat until PASS
```
**This is a hard gate.** A document that fails business-rules validation is NOT deliverable, even if it opens correctly in Word.
---
## False Positives
### Vendor Extensions
Elements from extended namespaces (`w14`, `w15`, `w16*`) are not in the subset schema and may trigger warnings:
```
WARNING: Element '{http://schemas.microsoft.com/office/word/2010/wordml}shadow' is not expected.
```
These are generally safe to ignore — they are Microsoft extensions for newer features (e.g., advanced text effects, comment extensions).
### Markup Compatibility
Documents may contain `mc:AlternateContent` blocks with fallback content. The subset schema may not recognize the `mc:` namespace processing. These are safe if the document opens correctly in Word.
### Recommended Approach
1. Run validation
2. Treat **errors** as must-fix
3. Review **warnings** — ignore known vendor extensions, investigate unknown elements
4. After fixing errors, re-validate to confirm
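
The triage steps above could be automated along these lines. The message format is an assumption taken from the sample output in this guide (`ERROR: ...`, `WARNING: Element '{namespace}name' ...`); the real validator output may differ, and `triage` is a hypothetical helper.

```python
import re

# Namespace prefixes treated as known vendor or markup-compatibility extensions
VENDOR_PREFIXES = (
    "http://schemas.microsoft.com/office/word/",
    "http://schemas.openxmlformats.org/markup-compatibility/",
)
NAMESPACE = re.compile(r"Element '\{([^}]+)\}")


def triage(messages):
    """Sort validator messages into must_fix, ignore, and review buckets."""
    buckets = {"must_fix": [], "ignore": [], "review": []}
    for message in messages:
        if message.startswith("ERROR"):
            buckets["must_fix"].append(message)
            continue
        found = NAMESPACE.search(message)
        if found and found.group(1).startswith(VENDOR_PREFIXES):
            buckets["ignore"].append(message)  # known extension namespace
        else:
            buckets["review"].append(message)  # unknown element: investigate
    return buckets


messages = [
    "ERROR: Element 'w:tbl' missing required child 'w:tblPr'.",
    "WARNING: Element '{http://schemas.microsoft.com/office/word/2010/wordml}shadow' is not expected.",
    "WARNING: Element '{http://example.com/custom}tag' is not expected.",
]
result = triage(messages)
for bucket, items in result.items():
    print(bucket, len(items))
```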