Get More From Every Token

We want you experimenting as much as possible. This guide exists so those experiments land on the first try instead of the third.

What's a token?

A token is the unit a model reads and writes: a short word, or a piece of a longer one. Here's the scale.

65 tokens: one Slack message
650 tokens: one-page doc
6.5K tokens: 10-page strategy doc
200K tokens: Claude's full capacity

Input tokens ($): what you send. Your prompt, docs, history.
Output tokens ($$$$$): what the AI generates. The response it writes back.

Per token, output costs 3-5x more than input. The single biggest lever is controlling how much you ask the model to generate.
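
To make the input/output gap concrete, here is a minimal sketch of the arithmetic. The two prices are placeholders chosen so that output costs 5x input; they are assumptions for illustration, not a quote from any price list, so swap in your provider's current rates.

```python
# Hypothetical prices in dollars per million tokens (assumed, not official).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed to be 5x the input price


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of a single request."""
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return input_cost + output_cost


# Same one-page input (650 tokens), two different answer lengths.
long_answer = request_cost(650, 2_000)
short_answer = request_cost(650, 400)

print(f"long answer:  ${long_answer:.5f}")
print(f"short answer: ${short_answer:.5f}")
```

With these placeholder prices, almost all of the cost in both cases comes from the output side, which is why trimming the answer moves the total more than trimming the prompt.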

Your conversation has a physical size limit

Everything has to fit in one window. When it's full, old context drops out.

Message 15 of a long conversation

The window holds chat history (msgs 1-14, growing every turn), the docs you pasted, your question, and the AI's answer. Whatever remains is free space.

By message 20, history dominates the window. Every new question re-reads messages 1-19. Starting a fresh conversation reclaims all that space instantly.
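
A rough sketch of how the window fills up as a conversation grows. The per-turn sizes below are made-up round numbers, and the model of "one prompt plus one reply added to history each turn" is a simplifying assumption; the 200K capacity is the figure from the scale above.

```python
WINDOW_TOKENS = 200_000  # full-capacity figure from the scale above


def window_usage(turn: int, docs_tokens: int, prompt_tokens: int,
                 reply_tokens: int) -> dict:
    """Estimate what fills the window when sending message number `turn`.

    Assumes every earlier turn added one prompt and one reply to history,
    and that pasted docs are carried along once.
    """
    history = (turn - 1) * (prompt_tokens + reply_tokens)
    used = history + docs_tokens + prompt_tokens + reply_tokens
    return {
        "history": history,
        "documents": docs_tokens,
        "prompt": prompt_tokens,
        "response": reply_tokens,
        "free": max(WINDOW_TOKENS - used, 0),
        "history_share": history / used,
    }


# Hypothetical session: a 10-page doc, short prompts, medium-length replies.
for turn in (1, 5, 20):
    usage = window_usage(turn, docs_tokens=6_500, prompt_tokens=200,
                         reply_tokens=1_500)
    print(f"msg {turn}: history share {usage['history_share']:.0%}")
```

The absolute numbers depend entirely on the assumed sizes. The pattern to notice is that the history share climbs every turn while the document and prompt sizes stay flat.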

Where the tokens actually go

Ranked by cost, highest first

1. Conversation history accumulating (~80K/msg)
   By msg 20, each new reply costs 40K+ tokens just to re-read the thread.
2. Full documents pasted for a narrow question (~9K wasted)
   15 pages of context when you need one paragraph from page 7.
3. Unbounded output requests (3x output)
   "Write me a comprehensive overview" produces 3K words when 800 would serve.
4. Regenerating without new direction (4x multiplier)
   "Try again" pays full cost each time. The original plus three retries = 4x the price of one.
5. Multi-step tasks bundled in one prompt (variable)
   You can't redirect after step 2 goes wrong without losing steps 3-5.
6. Targeted follow-ups on existing output (~200 tokens)
   "Make bullet 3 more specific to Commerce Cloud" = minimal cost.

Four moves that change the math

Each solves a different problem. Use all four.

Move 1: Constrain the output
Tell it how much to write.
"Give me this as 3 bullets, max 20 words each."
Cuts output tokens by 60-80%. You can always ask for more. You can't un-generate 2,000 words you didn't need.

Move 2: Scope the input
Paste only what's relevant.
"Here's section 3 of our positioning doc [paste]. How does the 'why now' hold up against Adobe's latest?"
Saves 5-10K input tokens per message. The model reads everything you paste on every turn, whether it needs to or not.

Move 3: Edit, don't regenerate
Steer what exists instead of starting over.
"Keep paragraphs 1 and 3. Rewrite paragraph 2 to lead with the customer outcome."
150 tokens of targeted output vs. 800+ for a full rewrite. The AI keeps what's working and fixes what isn't.

Move 4: Reset the window
Start fresh when history accumulates.
"I'm working on X. Here's my current draft [paste]. I need help tightening the competitive section."
At message 10-12, a new conversation with a 3-sentence summary costs 80% less per message than continuing the old one.
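
The reset math can be sketched in a few lines. All four token counts below are hypothetical, picked only to show the shape of the comparison, and the sketch counts input tokens only, since the reply costs the same either way.

```python
def savings_from_reset(history_tokens: int, summary_tokens: int,
                       draft_tokens: int, prompt_tokens: int) -> float:
    """Fraction of per-message input tokens saved by starting fresh.

    Continuing re-sends the whole history with each prompt; a fresh
    conversation sends only a short summary, the current draft, and the prompt.
    """
    continuing = history_tokens + prompt_tokens
    fresh = summary_tokens + draft_tokens + prompt_tokens
    return 1 - fresh / continuing


# Hypothetical message-11 snapshot: 10 earlier turns at ~1,700 tokens each.
savings = savings_from_reset(history_tokens=17_000, summary_tokens=100,
                             draft_tokens=3_000, prompt_tokens=200)
print(f"input tokens saved per message: {savings:.0%}")
```

The longer the old thread and the shorter the draft you carry over, the bigger the saving.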

Pick the right model for the task

Bigger is not always better. Match the tool to the job.

Haiku (lowest cost): reformatting, data extraction, simple Q&A, brainstorm lists.
Sonnet (best daily driver): drafting, editing, analysis, planning. Handles 90% of PMM work.
Opus (5x Sonnet's cost per token): complex reasoning, nuanced competitive messaging, multi-step strategic analysis.
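
If you route requests through a script, the matching above can be written as a small lookup. This is only a sketch: the task labels are lifted from the list above, the tier strings are plain labels rather than real API model identifiers, and defaulting to Sonnet for anything unlisted is an assumption based on its "daily driver" role.

```python
# Task labels come from the list above; tier names are labels, not model IDs.
TIER_FOR_TASK = {
    "reformatting": "haiku",
    "data extraction": "haiku",
    "simple q&a": "haiku",
    "brainstorm lists": "haiku",
    "drafting": "sonnet",
    "editing": "sonnet",
    "analysis": "sonnet",
    "planning": "sonnet",
    "complex reasoning": "opus",
    "nuanced competitive messaging": "opus",
    "multi-step strategic analysis": "opus",
}


def pick_tier(task: str) -> str:
    """Return the suggested tier for a task, defaulting to the daily driver."""
    return TIER_FOR_TASK.get(task.strip().lower(), "sonnet")


print(pick_tier("Data extraction"))
print(pick_tier("Complex reasoning"))
print(pick_tier("Something not on the list"))
```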

What real work costs

Calibrate your intuition against actual numbers

Status report formatting: ~550 tokens
Paste raw bullets, get polished email. Cheapest pattern there is.

Strategy doc synthesis: ~8.6K tokens
Paste only the relevant sections. Full doc = 5K tokens of dead weight per turn.

Long drafting session (msg 15+): ~41K tokens/msg
Even "make it punchier" costs 41K at this point. Fresh start = 3K.

Open-ended exploration: varies
Ask for a short first pass. Pick the thread that matters. Then go deep.

30 seconds: spend them figuring out which mode you're in.

Exploring
Ask for the plan. Don't constrain.
"Here's my situation [context]. What are my options? What am I not seeing?"
Prevents 2,000 tokens spent on a polished answer to the wrong question.

Executing
You know what you need. Apply the four questions:
Who is this for?
What do I need back?
Why does it exist?
How should it be shaped?
"Write a 200-word Slack post [what] for the Cloud CMOs [who] explaining why we're pausing the v2 build [why]. Direct, no preamble [how]."

Exploring is cheap. Executing is where precision pays off. The 30 seconds go to knowing which one you're doing.
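
For anyone who templates their prompts, the four questions from the Executing mode map onto a tiny helper. The function name and its sentence pattern are hypothetical, modeled on the Slack-post example above; adjust the wording to fit your own requests.

```python
def build_prompt(who: str, what: str, why: str, how: str) -> str:
    """Assemble an executing-mode prompt from the four questions.

    Refuses to build a prompt if any answer is missing, since a blank
    usually means you are still exploring.
    """
    answers = {"who": who, "what": what, "why": why, "how": how}
    missing = [name for name, value in answers.items() if not value.strip()]
    if missing:
        raise ValueError(f"unanswered questions: {', '.join(missing)}")
    return f"Write {what} for {who} explaining {why}. {how}"


prompt = build_prompt(
    who="the Cloud CMOs",
    what="a 200-word Slack post",
    why="why we're pausing the v2 build",
    how="Direct, no preamble.",
)
print(prompt)
```

If the helper raises because one of the four answers is blank, treat that as a signal to go back to exploring first.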