
KeyAudit


Opus 4.8 Review: Math Gains, Creativity Flat, Token Drain Raises Concerns

Anthropic's Claude Opus 4.8 shows clear improvement in math: it correctly solved a complex degree-19 polynomial problem that stumped version 4.7. In coding, it produced a polished typing-zombie game, but a single prompt consumed the entire Pro token quota, which makes the model impractical for larger projects without a Max plan or heavy API spending. Creative writing was largely unchanged from 4.7, with prose that is descriptive but less fluid than that of competitors such as MiMo v2.5. Logic and common-sense handling was solid: the model correctly flagged a linguistic trap about marrying one's widow's sister. Non-math reasoning faltered, however, with the model constructing an elaborate but wrong case in a whodunit mystery. In a long-context test, its safety reflexes led it to refuse to report the injection needles it had correctly identified. Overall, Opus 4.8 excels in math and coding but stagnates in creative tasks and raises token-cost concerns.

Key facts

  • Opus 4.8 solves complex math problem that version 4.7 failed.
  • Single coding prompt drains entire Pro token quota.
  • Creative writing shows no improvement over Opus 4.7.
  • Correctly identifies linguistic trap in logic test.
  • Safety reflex prevents reporting findings in long-context test.
