Scoring

A lyric score that actually means something

Every lyric is measured across 12 metrics in Craft, Expression, and Impact. Scores are deliberately hard. A 50 is average. An 80+ is strong. A 90+ is rare.

12 metrics · Evidence-based scoring · Anti-inflation built in
12 metrics across Craft, Expression, and Impact
Weighted composite: Expression counts most (40%)
Anti-inflation rules prevent meaningless high scores
Every score includes per-metric reasoning and evidence
Deliberately hard — a 50 is average, not a failing grade

Craft (25%)

Can this person write? Mechanics, structure, rhyme, and word choice.

Expression (40%)

Does it say something worth hearing? Specificity, originality, truth, and voice.

Impact (35%)

Will anyone remember it tomorrow? Transcendence, arc, stickiness, and genre fit.

Sample scorecard

What an actual evaluation looks like — annotated.

Composite: 78 (Grade B+)
Genre: Country (top 22% in genre)

Prosody: 74

Strong natural rhythm, one forced rhyme in V2

Structure: 80

Clean arc, bridge earns its place

Rhyme: 72

Good slant rhyme use, one predictable end-rhyme

Economy: 76

Tight overall, two filler words in chorus

Specificity: 85

"Tangerines and someone else's smile" — earned

Imagery: 82

Original governing image, one stock metaphor

Emotion: 79

Rings true. Bridge vulnerability is genuine.

Voice: 77

Consistent narrator, one POV slip in V3

Transcendence: 81

Line 14 is the one. "Drove home with the windows down to forget it."

Arc: 75

Moves from avoidance to acceptance. Could push further.

Memorable: 73

Chorus hook is sticky, verses less so

Genre: 80

Authentic country with modern specificity

Transcendent Line

“And drove home with the windows down to forget it”

Marked by 3 of 8 panel voices. Physical action carrying unspoken grief.

The 12 Metrics

5. Lyrical Specificity

What we measure

Concrete imagery, sensory detail, proper nouns, time anchors. The opposite of abstract generalities.

What good looks like

The song lives in a real place with real objects. "Tangerines and someone else's smile" instead of "memories of you."

6. Imagery Originality

What we measure

Fresh metaphors, defamiliarized objects, governing images that haven't been written to death.

What good looks like

Images that surprise on first read and deepen on second. No shattered hearts, no oceans of tears, no wings of freedom.

7. Emotional Truth

What we measure

The ring-test: does it feel true? Earned emotion, unforced vulnerability, no borrowed sentiment.

What good looks like

The emotion arrives through specificity and honesty, not through telling the listener what to feel.

8. Voice & POV Integrity

What we measure

Narrator consistency, perspective clarity, and a credible speaker. Does this sound like one person talking?

What good looks like

A distinct human presence. Word choices, diction, and references that belong to one coherent narrator.

Why scores are hard to game

We built anti-inflation into the scoring system so that high scores actually mean something.

Gravity Rule

The default is 50, not 80. Every point above average must be earned with specific evidence from the lyrics.

Burden of Proof

Scores above 80 require the scorer to cite specific lines and explain why they justify the number.

Antagonist Ceiling

A dedicated critical voice challenges every score. If it finds a real weakness, the score drops.

Historical Context

Scores are anchored to professional craft standards. A 90+ means near-flawless execution across all 12 metrics — intentionally rare.
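
To make the Gravity Rule and Burden of Proof concrete, here is a minimal sketch of how such a check could look. It is an illustration only, not the production scorer: the `MetricScore` record, its field names, and the `burden_of_proof_violations` helper are all hypothetical. The only values taken from the text are the default of 50 and the 80-point evidence threshold.

```python
from dataclasses import dataclass, field

BASELINE = 50            # Gravity Rule: the default score is 50
EVIDENCE_THRESHOLD = 80  # Burden of Proof: above this, cited lines are required


@dataclass
class MetricScore:
    """Hypothetical record for one metric's score and its supporting evidence."""
    name: str
    score: int = BASELINE
    reasoning: str = ""
    cited_lines: list[str] = field(default_factory=list)


def burden_of_proof_violations(scores: list[MetricScore]) -> list[str]:
    """Return the names of metrics scored above the threshold with no cited line."""
    return [
        s.name
        for s in scores
        if s.score > EVIDENCE_THRESHOLD and not s.cited_lines
    ]


scores = [
    MetricScore("specificity", 85, "earned detail",
                ["Tangerines and someone else's smile"]),
    MetricScore("imagery", 82, "original governing image"),  # no citation given
    MetricScore("arc"),  # falls back to the baseline of 50
]
print(burden_of_proof_violations(scores))  # ['imagery']
```

What happens to a flagged score (rejection, re-scoring, or a clamp to 80) is not specified here, so the sketch only reports the violation.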

Methodology: how scoring works

Every song is scored by a separate AI evaluation pass — not the same model that wrote the lyrics. Multiple evaluators with different perspectives must reach consensus on each of the 12 metrics.

A dedicated critical voice challenges every score. If it identifies a real weakness — a cliché, a broken meter, a forced rhyme — the score drops. Unresolved objections cap the composite depending on severity.
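
One way to picture a severity-based cap is sketched below. The text says unresolved objections cap the composite depending on severity but gives no numbers, so the severity labels and cap values here are placeholders invented for illustration, as is the `apply_objection_caps` helper.

```python
# HYPOTHETICAL severity labels and cap values: the real caps are not published.
SEVERITY_CAPS = {"minor": 89, "major": 79, "critical": 69}


def apply_objection_caps(composite: float, unresolved: list[str]) -> float:
    """Limit the composite to the lowest cap among the unresolved objections.

    `unresolved` holds one severity label per objection that was not resolved.
    With no unresolved objections the composite is returned unchanged.
    """
    caps = [SEVERITY_CAPS[severity] for severity in unresolved]
    return min([composite, *caps])


print(apply_objection_caps(84, []))         # 84
print(apply_objection_caps(84, ["major"]))  # 79
print(apply_objection_caps(60, ["major"]))  # 60
```

A cap only ever lowers a score: a composite already below the cap is left alone, which matches the idea of a ceiling rather than a penalty.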

This multi-voice process is designed to counter the inflated scores that single-pass AI evaluation tends to produce. Scores are calibrated against professional songwriting craft, not against other AI output.

What “deliberately hard” means: a single-pass AI scorer will give most output 80+. Our multi-voice process produces a distribution centered around 50, because the default assumption is “average until proven otherwise.” Scores above 80 require the scorer to cite specific lines. Scores above 90 require near-flawless execution across all 12 metrics — which is why they are rare in practice, not by arbitrary design.

Grade Scale

S+ (95-100): Near-flawless across all 12 metrics. Exceptionally rare in practice.
S (90-94): Exceptional. Every line earns its place with cited evidence.
A+ (85-89): Outstanding. Minor imperfections only.
A (80-84): Strong. Craft is evident throughout.
B+ (75-79): Good. Solid work with room to grow.
B (70-74): Competent. Foundation is there.
C+ (65-69): Developing. Moments of promise.
C (55-64): Average. Functional but unremarkable.
D (40-54): Below average. Significant gaps.
F (0-39): Needs fundamental rework.
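
The grade scale above amounts to a simple lookup. The sketch below is one way to express it. The band boundaries come from the scale, while the `grade_for` name and the handling of fractional scores (a 79.5 stays in the lower band until it reaches 80) are assumptions.

```python
# Lower bound of each grade band, highest first, taken from the grade scale.
GRADE_BANDS = [
    (95, "S+"), (90, "S"), (85, "A+"), (80, "A"), (75, "B+"),
    (70, "B"), (65, "C+"), (55, "C"), (40, "D"), (0, "F"),
]


def grade_for(score: float) -> str:
    """Return the letter grade for a 0-100 composite score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    raise AssertionError("unreachable: the F band starts at 0")


print(grade_for(78))  # B+
```

The sample scorecard's composite of 78 lands in the 75-79 band, which is the B+ shown on the card.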

How the composite score works

Each metric scores 0-100. The composite is a weighted average across the three tiers:

Craft (25%) + Expression (40%) + Impact (35%) = Composite
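
As a worked example, the sketch below recomputes the sample scorecard's composite. Two things are assumed rather than stated: that each tier score is the plain mean of its four metrics, and that the metrics group into tiers as listed in `TIERS` (inferred from the tier descriptions and the order of the scorecard). The tier weights are the ones given above.

```python
from statistics import mean

# Tier weights as stated: Craft 25%, Expression 40%, Impact 35%.
TIER_WEIGHTS = {"craft": 0.25, "expression": 0.40, "impact": 0.35}

# ASSUMED grouping of the 12 metrics into tiers, four per tier.
TIERS = {
    "craft": ["prosody", "structure", "rhyme", "economy"],
    "expression": ["specificity", "imagery", "emotion", "voice"],
    "impact": ["transcendence", "arc", "memorable", "genre"],
}


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted average of tier scores.

    Each tier score is taken as the unweighted mean of its metrics (assumption).
    """
    total = 0.0
    for tier, names in TIERS.items():
        tier_score = mean(metrics[name] for name in names)
        total += TIER_WEIGHTS[tier] * tier_score
    return total


# Metric values from the sample scorecard.
sample = {
    "prosody": 74, "structure": 80, "rhyme": 72, "economy": 76,
    "specificity": 85, "imagery": 82, "emotion": 79, "voice": 77,
    "transcendence": 81, "arc": 75, "memorable": 73, "genre": 80,
}

print(round(composite_score(sample)))  # 78
```

Under these assumptions the tier means are 75.5 (Craft), 80.75 (Expression), and 77.25 (Impact), giving about 78.2, which is consistent with the 78 shown on the sample scorecard.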

What a score should help you do

Spot the exact weakness in a lyric
Decide whether to refine or move on
Compare multiple versions objectively
Know whether a song is worth taking into production

See it in action

Every song you forge or evaluate gets a full 12-metric breakdown with reasoning per metric.