# other.codes — PCA Interpretation

Principal Component Analysis reduces the 17 morphological measurements to a small number of axes that capture the major ways marks differ from one another. This document describes what each axis represents.

**Variance explained:** PC1 = 45.7%, PC2 = 21.8%, total (first 2 PCs) = 67.5%
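
As a rough check on these figures, the sketch below sums the two shares and converts them to implied eigenvalues. It assumes the 17 features were z-scored before PCA (so total variance equals the number of features); this document does not confirm that, and the variable names are illustrative only.

```python
# Variance shares reported above, as fractions of total variance.
explained = {"PC1": 0.457, "PC2": 0.218}
n_features = 17  # number of morphological measurements

cumulative = sum(explained.values())
remaining = 1.0 - cumulative  # variance left for the other 15 components

# Assumption: features were z-scored, so total variance equals n_features
# and each component's eigenvalue is its share times n_features.
implied_eigenvalues = {pc: share * n_features for pc, share in explained.items()}

print(f"cumulative: {cumulative:.1%}, remaining: {remaining:.1%}")
for pc, eig in implied_eigenvalues.items():
    print(f"{pc}: implied eigenvalue ~ {eig:.2f}")
```

Under that assumption, both components would carry far more variance than any single standardized feature (whose variance is 1), which is why the first two axes are worth interpreting on their own.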

---

## PC1 — 45.7% of variance

### Features that drive PC1 higher (positive loadings)

- **skeleton_to_perimeter** (+0.357) — path efficiency — internal skeleton vs outer edge
- **skeleton_density** (+0.353) — total path length relative to mark size
- **n_loops_est** (+0.345) — estimated number of enclosed areas in the stroke skeleton
- **n_components** (+0.333) — number of disconnected strokes or parts
- **stroke_width_cv** (+0.301) — consistency of stroke width (low = even, high = variable)

### Features that drive PC1 lower (negative loadings)

- **euler_number** (-0.306) — Euler number of the mark, conventionally connected components minus holes, so more holes give a lower value (O, A, 4 have holes; I, L don't)
- **stroke_width_norm** (-0.177) — typical stroke thickness relative to mark size
- **compactness** (-0.177) — overall circularity of the form
- **endpoint_branch_ratio** (-0.106) — flowing/cursive structure vs complex intersecting structure
- **svg_closed_ratio** (+0.000) — proportion of closed (loop) shapes in the vector trace; the loading is effectively zero, so this feature does not move PC1 in either direction
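
The two lists above can be reproduced from the raw loadings with a small grouping step. The sketch below is not taken from `pipeline/cluster.py`; the function name `split_by_sign` and the tolerance `tol=0.05` are hypothetical choices for illustration. It shows why a loading of 0.000 is better reported as negligible than as negative.

```python
# PC1 loadings as listed in this report (10 of the 17 features).
pc1_loadings = {
    "skeleton_to_perimeter": 0.357,
    "skeleton_density": 0.353,
    "n_loops_est": 0.345,
    "n_components": 0.333,
    "stroke_width_cv": 0.301,
    "euler_number": -0.306,
    "stroke_width_norm": -0.177,
    "compactness": -0.177,
    "endpoint_branch_ratio": -0.106,
    "svg_closed_ratio": 0.000,
}


def split_by_sign(loadings, tol=0.05):
    """Group features by loading sign, strongest first.

    Loadings with absolute value below `tol` (a hypothetical cutoff)
    are reported as negligible rather than positive or negative.
    """
    groups = {"positive": [], "negative": [], "negligible": []}
    ranked = sorted(loadings.items(), key=lambda kv: -abs(kv[1]))
    for name, value in ranked:
        if abs(value) < tol:
            groups["negligible"].append(name)
        elif value > 0:
            groups["positive"].append(name)
        else:
            groups["negative"].append(name)
    return groups


groups = split_by_sign(pc1_loadings)
for label, names in groups.items():
    print(f"{label}: {', '.join(names)}")
```

With this cutoff, `svg_closed_ratio` is the only feature in the negligible group, and four features remain in the negative group.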

### Intuitive interpretation of PC1

PC1 separates marks by **structural complexity and stroke variability**. Marks at the high end tend to have longer skeletons relative to their size and outline, more enclosed loops, more disconnected parts, and more variable stroke widths, suggesting an elaborate, multi-element style with thick and thin contrast. Marks at the low end are simpler: fewer loops and parts, thicker and more uniform strokes, and a more compact form, suggesting a plainer, more solid hand. The proportion of closed vector shapes (svg_closed_ratio) has a loading of zero and plays no role on this axis.
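
To make the direction of the axis concrete, the sketch below computes a partial PC1 score as the dot product of a mark's z-scored features with the loadings listed above. The two marks and their z-scores are invented for illustration, and because this report lists only 10 of the 17 loadings, the result approximates the real score rather than reproducing it.

```python
# PC1 loadings as listed in this report (10 of the 17 features).
pc1_loadings = {
    "skeleton_to_perimeter": 0.357,
    "skeleton_density": 0.353,
    "n_loops_est": 0.345,
    "n_components": 0.333,
    "stroke_width_cv": 0.301,
    "euler_number": -0.306,
    "stroke_width_norm": -0.177,
    "compactness": -0.177,
    "endpoint_branch_ratio": -0.106,
    "svg_closed_ratio": 0.000,
}

# Hypothetical z-scored measurements for an elaborate, loop-heavy mark.
ornate_mark = {
    "skeleton_to_perimeter": 1.2,
    "skeleton_density": 1.0,
    "n_loops_est": 1.5,
    "n_components": 0.8,
    "stroke_width_cv": 0.9,
    "euler_number": -1.1,  # many holes lower the Euler number
    "stroke_width_norm": -0.4,
    "compactness": -0.6,
    "endpoint_branch_ratio": -0.2,
    "svg_closed_ratio": 0.3,
}

# Hypothetical plain mark: the mirror image of the ornate one.
plain_mark = {name: -z for name, z in ornate_mark.items()}


def partial_score(z_scores, loadings):
    """Dot product of z-scores and loadings over the listed features only."""
    return sum(loadings[name] * z_scores.get(name, 0.0) for name in loadings)


ornate_score = partial_score(ornate_mark, pc1_loadings)
plain_score = partial_score(plain_mark, pc1_loadings)
print(f"ornate: {ornate_score:+.2f}, plain: {plain_score:+.2f}")
```

The ornate mark lands on the positive side of PC1 and the plain mark on the negative side. The `svg_closed_ratio` value contributes nothing in either case because its loading is zero.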

## PC2 — 21.8% of variance

### Features that drive PC2 higher (positive loadings)

- **perimeter_norm** (+0.469) — total outline length relative to mark size (jaggedness proxy)
- **endpoint_branch_ratio** (+0.257) — flowing/cursive structure vs complex intersecting structure
- **skeleton_to_area** (+0.205) — how thin vs fat the strokes are overall
- **skeleton_density** (+0.126) — total path length relative to mark size
- **euler_number** (+0.108) — Euler number of the mark, conventionally connected components minus holes, so more holes give a lower value (O, A, 4 have holes; I, L don't)

### Features that drive PC2 lower (negative loadings)

- **stroke_width_norm** (-0.447) — typical stroke thickness relative to mark size
- **compactness** (-0.445) — overall circularity of the form
- **aspect_ratio** (-0.358) — mark width-to-height ratio (wide vs tall)
- **fill_density** (-0.267) — how densely the mark fills its bounding box
- **solidity** (-0.171) — solidity of the largest component vs its convex hull
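
The ten loadings above are only part of the 17-feature PC2 vector. If the pipeline reports unit-length eigenvectors as loadings (an assumption; this document does not say how they are scaled), the sum of squared loadings gives the share of the axis those ten features account for. A minimal sketch:

```python
# PC2 loadings as listed in this report (10 of the 17 features).
pc2_loadings = {
    "perimeter_norm": 0.469,
    "endpoint_branch_ratio": 0.257,
    "skeleton_to_area": 0.205,
    "skeleton_density": 0.126,
    "euler_number": 0.108,
    "stroke_width_norm": -0.447,
    "compactness": -0.445,
    "aspect_ratio": -0.358,
    "fill_density": -0.267,
    "solidity": -0.171,
}

# Assumption: the full 17-element loading vector has unit length, so
# squared loadings sum to 1 and each square is that feature's share.
listed_share = sum(value ** 2 for value in pc2_loadings.values())

# Rank features by their individual share of the axis.
ranked = sorted(pc2_loadings.items(), key=lambda kv: kv[1] ** 2, reverse=True)

print(f"share of PC2 covered by listed features: {listed_share:.1%}")
for name, value in ranked[:3]:
    print(f"{name}: {value ** 2:.1%}")
```

Under that assumption the listed features cover roughly 98% of PC2, with `perimeter_norm`, `stroke_width_norm`, and `compactness` contributing the most, so the seven unlisted features matter little on this axis.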

### Intuitive interpretation of PC2

PC2 separates marks by **overall form and proportions**. Marks at the high end tend to have long outlines relative to their size, thin strokes, and a taller, less compact form that fills little of its bounding box. Marks at the low end are wider relative to their height, with thick strokes and a compact, solid, densely filled form. The axis therefore contrasts light, elongated letterforms with squat, heavy ones. Skeleton-to-area and fill density load with opposite signs here, both reflecting how much of the bounding box is actually ink.

---

*Generated automatically by `pipeline/cluster.py`. Interpretations are based on the loadings of the current dataset and will update each time the pipeline is re-run with new data.*
