Get More From Every Token

We want you experimenting as much as possible. This guide exists so those experiments land on the first try instead of the third.

What's a token?

A token is the unit an AI model uses to read and write text. Here's the scale.

One Slack message: [count missing] tokens

One-page doc: 650 tokens

10-page strategy doc: 6.5K tokens

Claude's full capacity: 200K tokens
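To place your own text on this scale, a rough estimate from word count is usually enough. The sketch below is a rule of thumb, not a tokenizer: the 1.3 tokens-per-word ratio is an assumption (it matches the 650-token page above only if a page runs about 500 words), and `estimate_tokens` and `fits_in_window` are hypothetical helper names. Real counts vary by model and language.

```python
# Rough token estimate from word count.
# ASSUMPTION: ~1.3 tokens per English word (a rule of thumb, not a real tokenizer).
TOKENS_PER_WORD = 1.3


def estimate_tokens(text: str) -> int:
    """Return an approximate token count for a piece of text."""
    words = len(text.split())
    return round(words * TOKENS_PER_WORD)


def fits_in_window(token_count: int, window: int = 200_000) -> bool:
    """Check a token count against the 200K capacity quoted above."""
    return token_count <= window


if __name__ == "__main__":
    page = "word " * 500  # stand-in for a one-page doc of about 500 words
    tokens = estimate_tokens(page)
    print(tokens, fits_in_window(tokens))
```

For an exact count, use the token counter your model provider supplies rather than this estimate.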

What you send: Input Tokens (your prompt, docs, history)

→

What the AI generates: Output Tokens (the response it writes back)

$$$$$

Output tokens cost 3-5x more than input tokens, so the single biggest lever is controlling how much you ask the model to generate.
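The arithmetic behind that lever can be sketched in a few lines. The prices here are illustrative placeholders, not quoted rates: they assume output is priced at 5x input (the top of the 3-5x range), and `request_cost` is a hypothetical helper. Substitute your provider's actual per-token prices.

```python
# Cost of one request, split into input and output.
# ASSUMPTION: placeholder prices with output at 5x input; not real quoted rates.
INPUT_PRICE_PER_MTOK = 3.00    # hypothetical dollars per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # hypothetical dollars per million output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request under the placeholder prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Same prompt, two different output lengths.
    long_answer = request_cost(input_tokens=2_000, output_tokens=4_000)
    short_answer = request_cost(input_tokens=2_000, output_tokens=1_000)
    print(f"long: ${long_answer:.4f}  short: ${short_answer:.4f}")
```

With identical input, the difference between the two totals comes entirely from the output side, which is why a length limit in the prompt pays off so quickly.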

Your conversation has a physical size limit

Everything has to fit in one window. When it's full, old context drops out.

Message 15 of a long conversation. The window holds:

History (grows every turn): chat history, msgs 1-14

Documents: docs you pasted

Your prompt: your question

AI response: the AI's answer

Free: whatever space is left

By message 20, history dominates the window. Every new question re-reads messages 1-19. Starting a fresh conversation reclaims all that space instantly.
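A minimal sketch of why history takes over, assuming every turn re-sends the whole thread and each question and answer has a fixed size. All token counts are illustrative, and `input_tokens_at_turn` and `total_input_tokens` are hypothetical helpers.

```python
# Sketch of input growth when each turn re-sends the whole thread.
# ASSUMPTION: fixed question/answer sizes per turn; all numbers are illustrative.


def input_tokens_at_turn(turn: int, question_tokens: int = 100,
                         answer_tokens: int = 400, pasted_docs: int = 0) -> int:
    """Tokens the model reads on a given turn (1-indexed)."""
    prior_exchanges = (turn - 1) * (question_tokens + answer_tokens)
    return pasted_docs + prior_exchanges + question_tokens


def total_input_tokens(turns: int, **sizes: int) -> int:
    """Input tokens summed over every turn of one conversation."""
    return sum(input_tokens_at_turn(t, **sizes) for t in range(1, turns + 1))


if __name__ == "__main__":
    for turn in (1, 10, 20):
        print(turn, input_tokens_at_turn(turn, pasted_docs=6_500))
```

Under these assumptions the per-turn input grows linearly, so the total across a conversation grows roughly with the square of the number of turns. A fresh conversation resets `turn` to 1.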

Where the tokens actually go

Ranked by cost, highest first

Conversation history accumulating

By msg 20, each new reply costs 40K+ tokens just to re-read the thread

~80K/msg

Full documents pasted for a narrow question

15 pages of context when you need one paragraph from page 7

~9K wasted

Unbounded output requests

"Write me a comprehensive overview" produces 3K words when 800 would serve

3x output

Regenerating without new direction

"Try again" pays full cost each time. The original plus three retries = at least 4x the price of one.

4x multiplier

Multi-step tasks bundled in one prompt

Can't redirect after step 2 goes wrong without losing steps 3-5

variable

Targeted follow-ups on existing output

"Make bullet 3 more specific to Commerce Cloud" = minimal cost

~200 tokens
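The gap between the regeneration and targeted follow-up rows can be worked through directly. This sketch counts output tokens only, so it understates the real difference once the growing history is re-read as input. The 800-token draft and 200-token edit are illustrative, and both function names are hypothetical.

```python
# Compare blind regeneration with one targeted follow-up.
# ASSUMPTION: illustrative token counts; only output tokens are counted.


def regenerate_output_tokens(draft_tokens: int, retries: int) -> int:
    """Output tokens for the original draft plus each full 'try again'."""
    return draft_tokens * (1 + retries)


def targeted_edit_output_tokens(draft_tokens: int, edit_tokens: int) -> int:
    """Output tokens for the original draft plus one small, directed fix."""
    return draft_tokens + edit_tokens


if __name__ == "__main__":
    draft = 800
    print(regenerate_output_tokens(draft, retries=3))
    print(targeted_edit_output_tokens(draft, edit_tokens=200))
```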

Four moves that change the math

Each solves a different problem. Use all four.

Move 1: Constrain the output

Tell it how much to write

"Give me this as 3 bullets, max 20 words each."

Cuts output tokens by 60-80%. You can always ask for more. You can't un-generate 2,000 words you didn't need.

Move 2: Scope the input

Paste only what's relevant

"Here's section 3 of our positioning doc [paste]. How does the 'why now' hold up against Adobe's latest?"

Saves 5-10K input tokens per message. The model reads everything you paste on every turn, whether it needs to or not.

Move 3: Edit, don't regenerate

Steer what exists instead of starting over

"Keep paragraphs 1 and 3. Rewrite paragraph 2 to lead with the customer outcome."

150 tokens of targeted output vs. 800+ for a full rewrite. The AI keeps what's working and fixes what isn't.

Move 4: Reset the window

Start fresh when history accumulates

"I'm working on X. Here's my current draft [paste]. I need help tightening the competitive section."

At message 10-12, a new conversation with a 3-sentence summary costs 80% less per message than continuing the old one.

Pick the right model for the task

Bigger is not always better. Match the tool to the job.

Haiku: lowest cost per token

Reformatting, data extraction, simple Q&A, brainstorm lists

Sonnet: best daily driver

Drafting, editing, analysis, planning. Handles 90% of PMM work.

Opus: 5x Sonnet's cost per token

Complex reasoning, nuanced competitive messaging, multi-step strategic analysis

What real work costs

Calibrate your intuition against actual numbers

Status report formatting: ~550 tokens

Paste raw bullets, get polished email. Cheapest pattern there is.

Strategy doc synthesis: ~8.6K tokens

Paste only the relevant sections. Full doc = 5K tokens of dead weight per turn.

Long drafting session (msg 15+): ~41K tokens/msg

Even "make it punchier" costs 41K tokens at this point. Fresh start = 3K.

Open-ended exploration: varies

Ask for a short first pass. Pick the thread that matters. Then go deep.

30 seconds

Spend them figuring out which mode you're in.

Exploring

Ask for the plan. Don't constrain.

"Here's my situation [context]. What are my options? What am I not seeing?"

Prevents 2,000 tokens spent on a polished answer to the wrong question.

Executing

You know what you need. Apply the four questions.

Who is this for?

What do I need back?

Why does it exist?

How should it be shaped?

"Write a 200-word Slack post [what] for the Cloud CMOs [who] explaining why we're pausing the v2 build [why]. Direct, no preamble [how]."

Exploring is cheap. Executing is where precision pays off. The 30 seconds is knowing which one you're doing.
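For executing mode, the four questions can be turned into a small checklist that refuses to build a prompt until each one has an answer. This is only a sketch: `build_prompt`, its field names, and the sentence template are hypothetical, chosen to mirror the Slack post example above.

```python
# Assemble an "executing" prompt from the four questions.
# ASSUMPTION: build_prompt and its template are hypothetical, for illustration only.


def build_prompt(who: str, what: str, why: str, how: str) -> str:
    """Return a prompt string, or raise if any of the four answers is blank."""
    answers = {"who": who, "what": what, "why": why, "how": how}
    missing = [name for name, value in answers.items() if not value.strip()]
    if missing:
        raise ValueError(f"Answer these before prompting: {', '.join(missing)}")
    return f"Write {what} for {who} explaining {why}. {how}"


if __name__ == "__main__":
    print(build_prompt(
        who="the Cloud CMOs",
        what="a 200-word Slack post",
        why="why we're pausing the v2 build",
        how="Direct, no preamble.",
    ))
```

If one of the four answers is hard to fill in, that is usually a sign you are still exploring.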