Module 3. Counting, Grouping, and Summarizing: Aggregation in Cypher
Course 5. Querying Meaning: Writing Cypher for Insight
Estimated Time: ?? minutes
đź§ Module Objectives
- Use
COUNT,COLLECT,DISTINCT, and basic numeric functions to summarize results. - Group query outputs to reveal patterns and frequencies.
- Combine quantitative counts with qualitative interpretation.
- Identify how aggregation helps move from "lists of data" to "statements of meaning."
- Apply aggregation queries to the Wellespring dataset to explore creative patterns.
Why Aggregate?
So far, your queries have returned individual results: each song, each person, each relationship. But often the bigger picture matters more than the pieces.
You might ask :
- How many songs did Jesse Welles write per album?
- Which themes occur most often?
- How many collaborators connect to a single song?
Aggregation lets you step back, turning detailed data into patterns that speak.
The COUNT Function
COUNT() tallies how many results match a pattern.
Example – Count all Song nodes:MATCH (s:Song)
RETURN COUNT(s) AS totalSongs;
Result → A single number representing all songs in your graph.
Another Example – Count Songs Per Album:
In the Wellespring database, songs are connected to albums through recordings:(Song)<-[:PERFORMANCE_OF]-(Recording)-[:RELEASED_ON]->(Album)
So, to count songs per album, we need that pattern:MATCH (a:Album)<-[:RELEASED_ON]-(r:Recording)-[:PERFORMANCE_OF]->(s:Song)
RETURN a.title, COUNT(DISTINCT s) AS songCount
ORDER BY songCount DESC;
Interpretation → Shows which albums have the highest number of songs. The numbers become evidence for creative productivity or focus.
The DISTINCT Keyword
Sometimes the same item appears multiple times in query results. Use DISTINCT to ensure each is counted only once.
Example – How many unique Themes does each album express?MATCH (a:Album)<-[:RELEASED_ON]-(r:Recording)-[:PERFORMANCE_OF]->(s:Song)-[:EVOKES_THEME]->(t:Theme)
RETURN a.title, COUNT(DISTINCT t) AS themeCount
ORDER BY themeCount DESC;
Without DISTINCT, a theme linked to multiple songs on the same album would be counted multiple times. This query tells you which albums explore the widest thematic range (though, right now, our dataset only has one album fully entered).
Grouping with COLLECT
COLLECT() gathers related items into a list, like bundling multiple song titles under each album.
Example – List Songs Per AlbumMATCH (a:Album)<-[:RELEASED_ON]-(r:Recording)-[:PERFORMANCE_OF]->(s:Song)
RETURN a.title, COLLECT(DISTINCT s.title) AS songs
ORDER BY a.title;
Result → A tidy table where each album has a list of its songs.
You can chain this with COUNT for richer summaries:MATCH (a:Album)<-[:RELEASED_ON]-(r:Recording)-[:PERFORMANCE_OF]->(s:Song)
RETURN
a.title,
COUNT(DISTINCT s) AS numSongs,
COLLECT(DISTINCT s.title) AS songs
ORDER BY numSongs DESC;
Now each album tells a story: quantity + content.
Other Aggregation Functions
Cypher offers many built-in functions for summarizing numeric or date data:
In this dataset:
- Albums have
releaseDatestrings inYYYY-MM-DDformat. - Recordings have
durationSecondswhere known.
You can use these in aggregation:
| Function | Purpose | Example (Wellespring) |
|---|---|---|
MAX() |
Highest value | MAX(a.releaseDate) → latest album release date |
MIN() |
Lowest value | MIN(a.releaseDate) → earliest album release date |
AVG() |
Average value | AVG(r.durationSeconds) → mean recording length (in seconds) |
SUM() |
Total of values | SUM(r.durationSeconds) → total playing time for a set of recordings (e.g., an album) |
Remember: in Cypher, grouping is implied whenever you RETURN a mix of aggregated and non-aggregated expressions.
Example – Find Earliest and Latest Album Releases:MATCH (a:Album)
RETURN
MIN(a.releaseDate) AS firstRelease,
MAX(a.releaseDate) AS latestRelease;
Because releaseDate is stored in a YYYY-MM-DD format, string comparison works reliably for earliest/latest.
Wellespring Use Cases
Now let’s connect aggregation more directly to the Wellespring dataset and Jesse's creative world.
Example A – Most Common ThemesMATCH (s:Song)-[:EVOKES_THEME]->(t:Theme)
RETURN t.name, COUNT(DISTINCT s) AS numSongs
ORDER BY numSongs DESC
LIMIT 5;
Interpretation → Reveals the top recurring ideas across the songs in this sample corpus: Are themes like “Social Critique” or “Fear and Uncertainty” especially prominent?
Example B – Performances Per Person (Live Events)
In the dataset, artists perform at events, and people are linked to artists:(Person)-[:PERFORMS_AS|:MEMBER_OF]->(Artist)-[:PERFORMED_AT]->(Event)
To count how many events each person has performed at:
MATCH (p:Person)-[:PERFORMS_AS|:MEMBER_OF]->(a:Artist)-[:PERFORMED_AT]->(e:Event)
RETURN p.name, COUNT(DISTINCT e) AS eventCount
ORDER BY eventCount DESC;
Interpretation → Who shows up most often in this small performance network? Which names anchor the live-event side of Jesse’s world?
If you prefer to look at artists instead of people:MATCH (a:Artist)-[:PERFORMED_AT]->(e:Event)
RETURN a.name, COUNT(DISTINCT e) AS eventCount
ORDER BY eventCount DESC;
Example C – Themes Over Time (First and Latest Appearance)
Instead of computing an average year (which requires extra date manipulation), we can ask: When do themes first appear, and how recently do they show up?
MATCH (s:Song)-[:EVOKES_THEME]->(t:Theme)
MATCH (s)<-[:PERFORMANCE_OF]-(r:Recording)-[:RELEASED_ON]->(a:Album)
RETURN
t.name,
MIN(a.releaseDate) AS firstAppearance,
MAX(a.releaseDate) AS latestAppearance
ORDER BY firstAppearance;
Interpretation →
firstAppearancehints at when a theme enters the recorded corpus.latestAppearanceshows whether a theme is still active or has faded.
This suggests possible temporal evolution: some themes may cluster in certain periods of Jesse's career.
Interpreting the Numbers
Quantification in the humanities is always provisional. Counting and grouping don't replace interpretation: they amplify it.
Ask :
- What do these patterns suggest about creative emphasis or change?
- Which results surprise you?
- What might the numbers miss?
For the Wellespring Project:
- A high count of “Critique of War” or “Social Critique” themes might reflect a social-justice phase.
- Repeated appearances of “Search for Meaning” or “Existentialism” could signal ongoing philosophical concerns rather than a one-off topic.
The data starts the conversation; it doesn’t finish it.
Try It Yourself
Remember: SHIFT + ENTER for new lines; CMD/CTRL + ENTER to run.
- How many people have collaborated with Jesse Wells (via recordings)? Here we treat a "collaborator" as any person who has performed on the same recording as Jesse:
MATCH (jesse:Person {uid: "pers-jesse-wells"})
MATCH (jesse)-[:PERFORMED_ON]->(rec:Recording)<-[:PERFORMED_ON]-(other:Person)
WHERE other <> jesse
RETURN COUNT(DISTINCT other) AS collaborators;
Questions to consider:
- How many collaborators does this dataset show?
- Which recordings seem to be the most collaborative?
If you’d like to see who they are:MATCH (jesse:Person {uid: "pers-jesse-wells"})
MATCH (jesse)-[:PERFORMED_ON]->(rec:Recording)<-[:PERFORMED_ON]-(other:Person)
WHERE other <> jesse
RETURN other.name AS collaborator, COUNT(DISTINCT rec) AS sharedRecordings
ORDER BY sharedRecordings DESC;
- Which albums contain the largest number of songs?
MATCH (a:Album)<-[:RELEASED_ON]-(r:Recording)-[:PERFORMANCE_OF]->(s:Song)
RETURN a.title, COUNT(DISTINCT s) AS songCount
ORDER BY songCount DESC;
- Do some albums cluster more songs in this sample than others?
- What might account for this result?
- List each Theme and its associated Songs
MATCH (s:Song)-[:EVOKES_THEME]->(t:Theme)
RETURN t.name, COLLECT(DISTINCT s.title) AS songs
ORDER BY SIZE(songs) DESC;
- Which themes are connected to the broadest range of songs?
- Which themes are more narrowly focused?
You might try filtering further, e.g. themes only connected to songs from the album Middle, or themes that appear across multiple albums.
Key Takeaways
- Aggregation condenses detail into interpretable patterns.
COUNTtallies;COLLECTgroups;DISTINCTremoves duplicates.- Combining counts with lists reveals both breadth (how much) and depth (which specific items).
- Numeric functions (
MIN,MAX,AVG,SUM) add temporal and quantitative insight. - Interpreting the meaning of those patterns remains a human task, central to the Wellespring Project's goals.