Other Codes project

Methods and data

How the current marks are collected, prepared, measured and compared.

Photo to form.

Other Codes studies human-made marks as measurable forms. The current collection uses photographed marks, and the same method can be used for glyphs, folk marks, cave art, doodles, signs and inscriptions.

The pipeline is practical: collect photographs, separate the mark from the surface, turn the mark into a clean vector, extract features, and compare marks with plots and clustering.

Input: Photos, source records, location information and dataset metadata.
Segmentation: A mask separates the mark from the surrounding surface.
Vectorisation: The mask is traced into SVG paths that preserve the mark as a scalable form.
Feature extraction: Shape, topology, stroke, texture, colour and metadata features are measured.
Analysis: PCA, clustering and dendrograms show relationships between marks.
Presentation: Plots, downloads and written notes are published on other.codes.

Finding the mark.

Each photograph contains at least two things: a mark and a surface. Segmentation separates the mark from that surface.

The annotator uses interactive segmentation. A user can mark examples of foreground and background, and a classifier learns the visual difference between the mark and its substrate. The approach is inspired by tools such as ilastik and Trainable Weka Segmentation, where machine learning extends careful human judgement.
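The idea of learning from a few labelled examples can be sketched in a few lines. This is a minimal illustration, not the annotator's actual classifier: it swaps in a nearest-mean colour rule where tools such as ilastik use richer features and stronger models. The function names, the "mark" and "surface" labels and the toy photo are all assumptions made for the example.

```python
from statistics import mean

def train_centroids(image, scribbles):
    """Average the colour of the pixels the user labelled for each class.

    image: 2D list of (r, g, b) tuples.
    scribbles: dict mapping "mark" / "surface" to lists of (row, col) points.
    """
    centroids = {}
    for label, points in scribbles.items():
        pixels = [image[r][c] for r, c in points]
        centroids[label] = tuple(mean(channel) for channel in zip(*pixels))
    return centroids

def segment(image, centroids):
    """Return a binary mask: 1 where a pixel is closer to the mark colour."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        [1 if dist2(px, centroids["mark"]) < dist2(px, centroids["surface"]) else 0
         for px in row]
        for row in image
    ]

# Toy 3x4 photo (hypothetical): a dark stroke on a pale wall.
DARK, PALE = (30, 25, 20), (210, 200, 190)
photo = [
    [PALE, DARK, PALE, PALE],
    [PALE, DARK, DARK, PALE],
    [PALE, PALE, DARK, PALE],
]
# Two user-labelled points per class stand in for brush strokes.
scribbles = {"mark": [(0, 1), (1, 2)], "surface": [(0, 0), (2, 3)]}
mask = segment(photo, train_centroids(photo, scribbles))
```

Colour alone fails when mark and surface share tones, which is why interactive tools let the user add more examples until the mask looks right.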

Shape, stroke, path.

Once a mask exists, the mark can be measured. Some features describe the outer form: aspect ratio, compactness, fill and solidity. Others describe the internal structure: skeleton length, stroke width, endpoints, branch points and loops.
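The outer-form features can be sketched directly from a binary mask. The definitions below are common ones and are assumptions here, since the project's exact formulas are not stated: aspect ratio is bounding-box width over height, fill is mark area over bounding-box area, compactness is 4πA/P², and solidity is mark area over convex-hull area. The function names are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(reversed(pts))

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    total = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(total) / 2

def shape_features(mask):
    """Outer-form features of a non-empty binary mask (list of 0/1 rows)."""
    cells = {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}
    area = len(cells)
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    # Perimeter: count pixel edges that face background or the image border.
    perimeter = sum((r + dr, c + dc) not in cells
                    for r, c in cells
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    # Hull over pixel corners, so a filled rectangle has solidity exactly 1.
    corners = [(c + dx, r + dy) for r, c in cells for dx in (0, 1) for dy in (0, 1)]
    hull_area = polygon_area(convex_hull(corners))
    return {
        "aspect_ratio": width / height,
        "fill": area / (width * height),
        "compactness": 4 * math.pi * area / perimeter ** 2,
        "solidity": area / hull_area,
    }

# An L-shaped toy mark.
mask = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
features = shape_features(mask)
```

Counting pixel edges overestimates the perimeter of curved or diagonal outlines, so compactness values from this sketch are best compared between marks rather than read as absolute.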

Vector features describe the traced SVG paths, including path complexity and closed forms. Later versions can add texture, colour, medium and metadata features.
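A rough sketch of vector features, assuming that path complexity is approximated by the number of drawing commands and that a closed form is a subpath ending in a close-path command. Both are assumptions for illustration, and the function name is hypothetical. The input is the `d` attribute of an SVG path.

```python
import re

# Every SVG path command letter, absolute and relative.
COMMAND = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]")

def path_features(d):
    """Count commands, subpaths, closed subpaths and curves in one path string.

    SVG lets a command letter be omitted when it repeats, so counting letters
    is a lower bound on the number of segments, not an exact count.
    """
    commands = COMMAND.findall(d)
    return {
        "commands": len(commands),
        "subpaths": sum(c in "Mm" for c in commands),
        "closed": sum(c in "Zz" for c in commands),
        "curves": sum(c in "CcSsQqTtAa" for c in commands),
    }

# A closed triangle followed by one open curve (made-up path data).
d = "M 0 0 L 10 0 L 10 10 Z M 20 20 C 25 15 30 25 35 20"
features = path_features(d)
```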

Maps of similarity.

After features are measured, marks can be compared. PCA maps show the major directions of variation. Hierarchical clustering groups marks into families based on measured similarity, and dendrograms display those families as a tree.
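Both steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's analysis code: features are standardised first, the leading PCA direction is found by power iteration, and clustering uses average linkage with Euclidean distance. The feature values and mark names are made up.

```python
import math

def standardise(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def first_component(rows, iterations=200):
    """Leading PCA direction of centred data, found by power iteration."""
    dims = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(dims)]
           for i in range(dims)]
    # Uneven start vector, so it is unlikely to be orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(dims)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

def linkage(rows, names):
    """Average-linkage agglomerative clustering; returns merges in order."""
    clusters = {name: [row] for name, row in zip(names, rows)}

    def gap(a, b):
        pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
        return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        a, b = min(((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
                   key=lambda pair: gap(*pair))
        merges.append((a, b, gap(a, b)))
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges

# Four hypothetical marks described by two made-up features.
names = ["A", "B", "C", "D"]
features = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [5.4, 10.8]]
scaled = standardise(features)
pc1 = first_component(scaled)
scores = [sum(v * w for v, w in zip(row, pc1)) for row in scaled]  # PCA map, 1D
merges = linkage(scaled, names)  # each merge is one join in the dendrogram
```

The list of merges, read in order with their distances, is the information a dendrogram draws: early merges at small distances are close families, and late merges at large distances separate them.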

The numbers make visual relationships easier to inspect, so style, similarity, difference and possible influence can be studied more carefully.

Current marks.

The current dataset is a collection of photographed human-made marks. Each record can hold the source image, mask, vector form, measurements and metadata where available.
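One way to picture such a record is a small data class whose later fields stay empty until the matching pipeline stage has run. The class name, field names and file paths below are hypothetical, chosen only to mirror the items listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class MarkRecord:
    """One photographed mark; stage outputs are optional until produced."""
    record_id: str
    source_image: str                       # path or URL of the photograph
    mask_path: Optional[str] = None         # segmentation output
    svg_path: Optional[str] = None          # vectorisation output
    measurements: Dict[str, float] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)

    def completed_stages(self) -> List[str]:
        """List the pipeline stages that have data for this record."""
        stages = ["photo"]
        if self.mask_path:
            stages.append("mask")
        if self.svg_path:
            stages.append("vector")
        if self.measurements:
            stages.append("features")
        return stages

# A record that has been photographed and segmented, but not yet traced.
record = MarkRecord("mark-0001", "photos/mark-0001.jpg",
                    metadata={"place": "unknown"})
record.mask_path = "masks/mark-0001.png"
stages = record.completed_stages()
```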

The collection is visually diverse and strongly stylised, built for segmentation, vectorisation, feature extraction, semantic extraction and comparative analysis.

Mark collections.

Human symbol collections: Published or public-domain collections of symbols, pictograms, marks, glyphs and graphic signs.
Cave and rock art: Curated public-domain images of early marks and symbolic forms, with cultural and archaeological context preserved wherever possible.
Folk marks and ornament: Traditional marks, decorative forms, protective signs, craft patterns and repeated visual motifs.
Inscriptions, doodles and signs: Everyday marks that sit between writing, symbol, drawing and gesture.

Careful collections.

  • Prefer public-domain or permissively licensed datasets.
  • Keep source, place, date and context metadata where available.
  • Exclude photos that identify people or reveal private information, such as faces or licence plates.
  • Exclude hate symbols, slurs, targeted threats and prejudiced marks.
  • Keep cultural context attached where it is available.