Other Codes project
Methods and data
How the current marks are collected, prepared, measured and compared.
Overview
Photo to form.
Other Codes studies human-made marks as measurable forms. The current collection uses photographed marks, and the same method can be used for glyphs, folk marks, cave art, doodles, signs and inscriptions.
The pipeline has five practical steps: collect photographs, separate the mark from the surface, turn the mark into a clean vector, extract features, and compare marks with plots and clustering.
Segmentation
Finding the mark.
Each photograph contains at least two things: a mark and a surface. Segmentation separates the mark from the background.
The annotator uses interactive segmentation: a user marks a few examples of foreground and background, and a classifier learns the visual difference between the mark and its substrate. The approach is inspired by tools such as ilastik and Trainable Weka Segmentation, where machine learning extends careful human judgement.
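The idea of learning from a few user-marked pixels can be sketched as follows. This is a minimal illustration, not the annotator's actual code: it swaps in a simple nearest-mean colour classifier, whereas ilastik and Trainable Weka Segmentation typically train richer classifiers on filter-based features. The function name `segment_from_scribbles` and the toy image are assumptions made for the example.

```python
import numpy as np

def segment_from_scribbles(image, fg_points, bg_points):
    """Label each pixel as mark (True) or surface (False).

    image: H x W x 3 float array.
    fg_points / bg_points: lists of (row, col) pixels the user has marked
    as foreground (mark) and background (surface).
    Stand-in classifier: each pixel joins the class whose mean colour is closer.
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    fg_rows, fg_cols = zip(*fg_points)
    bg_rows, bg_cols = zip(*bg_points)
    fg_mean = image[list(fg_rows), list(fg_cols)].mean(axis=0)
    bg_mean = image[list(bg_rows), list(bg_cols)].mean(axis=0)
    d_fg = np.linalg.norm(pixels - fg_mean, axis=1)
    d_bg = np.linalg.norm(pixels - bg_mean, axis=1)
    return (d_fg < d_bg).reshape(image.shape[:2])

# Toy photograph: a pale surface with a dark vertical stroke.
image = np.full((6, 6, 3), 0.9)
image[1:5, 2:4] = 0.1

mask = segment_from_scribbles(
    image,
    fg_points=[(2, 2), (3, 3)],   # user clicks on the mark
    bg_points=[(0, 0), (5, 5)],   # user clicks on the surface
)
```

In an interactive tool the user would inspect the resulting mask, add more examples where it is wrong, and rerun the classifier until the mark is cleanly separated.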
Measure
Shape, stroke, path.
Once a mask exists, the mark can be measured. Some features describe the outer form: aspect ratio, compactness, fill and solidity. Others describe the internal structure: skeleton length, stroke width, endpoints, branch points and loops.
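A few of the outer-form features can be computed directly from a binary mask. The sketch below is illustrative and rests on assumed definitions: aspect ratio as bounding-box width over height, fill as the share of the bounding box covered by the mark, and compactness as 4πA/P² with the perimeter counted as exposed pixel edges. Solidity (which needs a convex hull) and the skeleton-based features are left out to keep the example short.

```python
import numpy as np

def outer_form_features(mask):
    """Measure simple outer-form features of a non-empty binary mask."""
    mask = np.asarray(mask, dtype=bool)

    # Bounding box of the mark.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1

    area = int(mask.sum())

    # Perimeter: count pixel edges where mark meets non-mark.
    padded = np.pad(mask, 1, constant_values=False)
    perimeter = int(
        (padded[1:, :] != padded[:-1, :]).sum()
        + (padded[:, 1:] != padded[:, :-1]).sum()
    )

    return {
        "aspect_ratio": width / height,
        "fill": area / (width * height),
        "compactness": 4 * np.pi * area / perimeter**2,
    }

# Toy mask: a filled rectangle 4 pixels tall and 6 pixels wide.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 1:7] = True
features = outer_form_features(mask)
```

For this rectangle the aspect ratio is 1.5 and the fill is 1.0, while the compactness falls below 1 because a rectangle is less compact than a circle.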
Vector features describe the traced SVG paths, including path complexity and closed forms. Later versions can add texture, colour, medium and metadata features.
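As a rough sketch of what vector features might look like, the example below reads the path data of a small SVG and counts paths, drawing segments and closed subpaths. Treating the segment count as a proxy for path complexity is an assumption for illustration, as is the function name `vector_features`. The count only looks at explicit command letters, so coordinates that repeat a command implicitly are not counted separately.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def vector_features(svg_text):
    """Count paths, drawing segments and closed subpaths in an SVG string."""
    root = ET.fromstring(svg_text)
    paths = [el.get("d", "") for el in root.iter(f"{SVG_NS}path")]

    # Collect the SVG path command letters from every `d` attribute.
    commands = [
        c for d in paths for c in re.findall(r"[MmLlHhVvCcSsQqTtAaZz]", d)
    ]

    return {
        "path_count": len(paths),
        # Move (M) and close (Z) commands are not counted as segments here.
        "segment_count": sum(c.upper() not in "MZ" for c in commands),
        "closed_subpaths": sum(c.upper() == "Z" for c in commands),
    }

# Toy SVG: one closed triangle and one open curve.
svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <path d="M0 0 L10 0 L10 10 Z"/>
  <path d="M2 2 C4 0 6 0 8 2"/>
</svg>"""

features = vector_features(svg)
```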
Analysis
Maps of similarity.
After features are measured, marks can be compared. PCA maps project the marks onto the main directions of variation, so marks with similar measurements sit close together. Dendrograms group marks into families based on measured similarity.
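The comparison step can be sketched with NumPy and SciPy. The feature table below is made up for illustration, and the choices of standardising each feature, Ward linkage and a cut into two families are assumptions rather than the project's settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical feature table: one row per mark, one column per measurement
# (for example aspect ratio, fill, skeleton length).
features = np.array([
    [1.0, 0.80, 12.0],
    [1.1, 0.78, 13.0],
    [0.9, 0.82, 11.0],
    [3.0, 0.30, 40.0],
    [3.2, 0.28, 42.0],
    [2.9, 0.33, 39.0],
])

# Standardise so that no feature dominates because of its units.
z = (features - features.mean(axis=0)) / features.std(axis=0)

# PCA via singular value decomposition of the standardised table.
_, singular_values, components = np.linalg.svd(z, full_matrices=False)
coords = z @ components[:2].T                      # 2-D map coordinates
explained = singular_values**2 / np.sum(singular_values**2)

# Hierarchical clustering; `tree` is the linkage matrix behind a dendrogram.
tree = linkage(z, method="ward")
families = fcluster(tree, t=2, criterion="maxclust")
```

Plotting `coords` gives the PCA map, and passing `tree` to `scipy.cluster.hierarchy.dendrogram` draws the corresponding dendrogram when matplotlib is installed.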
The numbers make visual relationships easier to inspect, so style, similarity, difference and possible influence can be studied more carefully.
Current dataset
Current marks.
The current dataset is a collection of photographed human-made marks. Each record can hold the source image, mask, vector form, measurements and metadata where available.
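One way to picture such a record is as a small data structure in which everything except the source image is optional. The class and field names below are hypothetical and only illustrate the "where available" idea; they do not describe the project's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class MarkRecord:
    """Hypothetical record for one photographed mark."""
    record_id: str
    source_image: str                      # path to the photograph
    mask: Optional[str] = None             # path to the segmentation mask
    vector: Optional[str] = None           # path to the traced SVG
    measurements: Dict[str, float] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)

    def missing_stages(self) -> List[str]:
        """List the pipeline outputs this record does not yet hold."""
        missing = []
        if self.mask is None:
            missing.append("mask")
        if self.vector is None:
            missing.append("vector")
        if not self.measurements:
            missing.append("measurements")
        return missing

# A freshly collected photograph with nothing processed yet.
record = MarkRecord(record_id="mark-0001", source_image="photos/mark-0001.jpg")
todo = record.missing_stages()
```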
The collection is visually diverse and strongly stylised, built for segmentation, vectorisation, feature extraction, semantic extraction and comparative analysis.
Future data
Mark collections.
Principles
Careful collections.
- Prefer public-domain or permissively licensed datasets.
- Keep source, place, date and context metadata where available.
- Exclude identifying photos containing faces, licence plates or private information.
- Exclude hate symbols, slurs, targeted threats and other discriminatory marks.
- Keep cultural context attached where it is available.