Module 2. Kinds of Data: Texts, Numbers, Images, and Beyond
Course 2: What is Data? Understanding the Building Blocks of Knowledge
Estimated Time: 30 minutes
🧭 Module Objectives
- Identify different kinds and formats of data (quantitative, qualitative, structured, unstructured, multimodal).
- Recognize examples of data in various humanities disciplines.
- Describe how form and medium influence what can be known from data.
- Understand the concept of metadata and its importance for context and interpretation.
Not All Data Look Alike
When many people hear "data," they picture numbers in rows and columns. But in the humanities, much of our data are words, images, sounds, objects, or even gestures. Every form of data captures a different dimension of reality and requires a different way of describing, storing, and analyzing it.
At the most general level, we can divide data into two families:
| Type | Description | Example |
|---|---|---|
| Quantitative Data | Data that can be measured and expressed numerically. | Census statistics, word counts, catalog numbers, GPS coordinates. |
| Qualitative Data | Data that describe qualities, meanings, or experiences. | Interview transcripts, artworks, diary entries, song lyrics. |
Humanists often move between both: we might count how many times a word appears in a text (quantitative) and then interpret what that repetition means (qualitative).
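The counting half of that workflow can be sketched in a few lines of Python. This is a minimal illustration, not a full text-analysis method: the sample sentence is invented, the function name `count_words` is hypothetical, and the simple pattern used to split words is an assumption that would need adjusting for real texts (hyphenation, accents, other languages).

```python
import re
from collections import Counter

def count_words(text: str) -> Counter:
    """Return how often each lowercase word appears in the text."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Invented sample sentence, used only for illustration.
sample = "The sea was calm. The sea was dark, and the sea was wide."
counts = count_words(sample)

print(counts["sea"])          # how many times "sea" appears
print(counts.most_common(3))  # the three most frequent words
```

The numbers only tell us that "sea" recurs; deciding what that repetition means is still the interpretive, qualitative step.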
Common Data Forms in the Humanities
Let's look at several data types and media you're likely to encounter:
| Category | Description | Humanities Example | Common File Formats |
|---|---|---|---|
| Textual Data | Words and writing in any language or medium | Literary works, historical documents, transcribed lyrics | TXT, DOCX, XML, TEI, JSON |
| Numeric Data | Numbers used for counting, measuring, or ordering | Census records, catalog IDs, budgets, artifact measurements | CSV, XLSX, SQL tables |
| Visual Data | Images, videos, and visual recordings | Paintings, photographs, maps, architectural drawings | JPG, PNG, TIFF, MP4 |
| Audio Data | Recorded sound, music, or speech | Oral histories, field recordings, song archives | WAV, MP3, FLAC |
| Spatial Data | Information about location, distance, or area | Archaeological sites, trade routes, cultural landscapes | GIS, GeoJSON, KML |
| Relational Data | Connections between entities or ideas | Letters between correspondents, networks of influence | Database: SQL, Neo4j |
These data types often overlap. A digitized manuscript, for example, contains image data (scanned page), text data (transcription), and metadata (who, where, when, why).
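Relational data may be the least familiar category in the table, so here is a small sketch of how letters between correspondents could be treated as connections. The names are placeholders and the list of letters is invented; a real project would more likely load this from a spreadsheet or database.

```python
from collections import Counter, defaultdict

# Hypothetical letters as (sender, recipient) pairs; names are placeholders.
letters = [
    ("Writer A", "Writer B"),
    ("Writer B", "Writer A"),
    ("Writer A", "Writer C"),
    ("Writer A", "Writer B"),
]

# Count letters exchanged by each pair, ignoring who wrote to whom.
pair_counts = Counter(frozenset(pair) for pair in letters)

# Record who each person corresponded with.
contacts = defaultdict(set)
for sender, recipient in letters:
    contacts[sender].add(recipient)
    contacts[recipient].add(sender)

print(pair_counts[frozenset({"Writer A", "Writer B"})])  # letters between A and B
print(sorted(contacts["Writer A"]))                      # everyone A wrote to or heard from
```

Each letter is still a text in its own right; the relational view simply adds a second way of looking at the same material.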
Metadata: The Data About Data
Metadata are the contextual details that describe and give meaning to a dataset. They answer questions like:
- Who created this?
- When and where was it made?
- What is it about?
- How is it related to other materials?
Metadata can be technical (file size, camera type, GPS coordinates) or descriptive (artist, subject, cultural context). For the humanities, metadata are often the most human part of the data, reflecting judgment, vocabulary, and historical understanding.
📘 Example: A song like "War Isn't Murder" by Jesse Welles has:
- Audio data (the recording)
- Textual data (the lyrics)
- Visual data (the music video)
- Metadata (title, date, performer, recording location, theme, tags)
Together, these make up a rich and layered object of study.
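One way to picture this layering is as a single record that holds the metadata and points to each data layer. The sketch below is only an illustration: the class name `SongRecord`, its field names, and the file names are all hypothetical, and details the module does not state (date, recording location) are deliberately left empty rather than guessed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SongRecord:
    # Descriptive metadata
    title: str
    performer: str
    date: Optional[str] = None                # not given here, so left empty
    recording_location: Optional[str] = None  # not given here, so left empty
    tags: List[str] = field(default_factory=list)
    # Pointers to the data layers (hypothetical file names)
    audio_file: Optional[str] = None
    lyrics_file: Optional[str] = None
    video_file: Optional[str] = None

    def layers(self) -> List[str]:
        """List which data layers this record actually has."""
        available = {
            "audio": self.audio_file,
            "text": self.lyrics_file,
            "visual": self.video_file,
        }
        return [name for name, path in available.items() if path]

record = SongRecord(
    title="War Isn't Murder",
    performer="Jesse Welles",
    audio_file="recording.wav",  # placeholder file name
    lyrics_file="lyrics.txt",    # placeholder file name
)
print(record.layers())
```

Leaving unknown fields empty is itself a metadata decision: it records what we do not yet know instead of hiding it.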
Structured vs. Unstructured Data
| Type | Description | Example | Typical Use |
|---|---|---|---|
| Structured Data | Organized into fixed fields and formats, easy for computers to read. | A library catalog record, a museum database. | Sorting, searching, analysis. |
| Unstructured Data | Free-form, variable, human-readable but harder to process. | A scanned letter, a journal entry, a video interview. | Interpretation, qualitative analysis. |
Many humanities sources are unstructured, which is why digital humanists spend time cleaning, encoding, and modeling them, turning messy realities into data that can be explored and linked while preserving nuance.
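As a rough illustration of what "encoding" can mean in practice, the sketch below pulls a few structured fields out of a transcribed letter while keeping the full text alongside them. The letter is invented, the function name `encode_letter` is hypothetical, and the code assumes a very tidy layout (date on the first line, salutation on the second, signature on the last) that real sources rarely follow.

```python
import re

# Invented letter text, for illustration only.
letter = """12 March 1891
Dear Eleanor,
The harvest was poor this year, but we remain hopeful.
Your brother,
Thomas"""

def encode_letter(text: str) -> dict:
    """Turn a transcribed letter into a record with a few fixed fields."""
    lines = text.strip().splitlines()
    # Assumption: the first line is a date such as "12 March 1891".
    date_match = re.match(r"\d{1,2} [A-Z][a-z]+ \d{4}", lines[0])
    # Assumption: the second line is a salutation such as "Dear Eleanor,".
    salutation = re.match(r"Dear (.+?),", lines[1])
    return {
        "date": date_match.group(0) if date_match else None,
        "recipient": salutation.group(1) if salutation else None,
        "sender": lines[-1].strip(),  # assumption: signature is the last line
        "full_text": text,            # keep the original so nuance is not lost
    }

record = encode_letter(letter)
print(record["date"], record["recipient"], record["sender"])
```

The structured fields make the letter sortable and searchable; the `full_text` field keeps the source available for close reading.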
Beyond the Binary: Multimodal and Affective Data
Human culture rarely fits neatly into boxes.
- A performance may include movement, sound, and emotion.
- A social media post combines text, emoji, hashtags, and video.
- A museum artifact carries tactile, material, and symbolic dimensions.
Humanists increasingly work with multimodal data, asking not only what the data show but how they feel. This "affective turn" recognizes that data can evoke empathy, awe, or outrage: qualities not easily reduced to numbers, but essential for understanding human experience.
Key Takeaways
- Data exist in many forms: textual, numeric, visual, auditory, spatial, relational.
- Metadata add essential context and interpretive depth.
- Structured and unstructured data require different approaches.
- Humanities data are often multimodal and meaning-rich.
- Recognizing form is the first step toward responsible analysis.
Suggested Readings & Resources
Much has been written about data over the past few decades, from a range of perspectives. We cannot provide a comprehensive bibliography on the subject, but here are some particularly relevant sources for further reading:
- Badman, Annie, and Matthew Kosinski. "What Is Data?" IBM Think, 2024.
- Drucker, Johanna. "Humanities Approaches to Graphical Display." Digital Humanities Quarterly 5 (2011).
- Drucker, Johanna. "Data Modeling and Use." In The Digital Humanities Coursebook: An Introduction to Digital Methods for Research and Scholarship. Routledge, 2021.
- Lavin, Matthew. "Why Digital Humanists Should Emphasize Situated Data over Capta." Digital Humanities Quarterly 15 (2021).
- Owens, Trevor. "Defining Data for Humanists: Text, Artifact, Information or Evidence?" Journal of Digital Humanities 1 (2011).