
LLM Lab 2: Token Budget Blitz

Manage a limited context window and see why concise prompts keep the model on target.

Published January 8, 2026

The game: stay under budget

Your prompt and the model’s reply must both fit inside the context window. Overflow means older tokens drop off.

Can You Guess?

You have a 4k-token window. You send 1k tokens of instructions and paste a 2.5k-token PDF. How many tokens remain for the answer?
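
A quick way to check your guess, as a minimal sketch (treating "4k" as a round 4,000 tokens; real windows are often 4,096, which shifts the answer slightly):

```python
# Rough token budget for the quiz above.
window = 4_000        # "4k" context window, taken as a round 4,000 tokens
instructions = 1_000  # tokens of instructions
pdf_text = 2_500      # tokens of pasted PDF

remaining = window - (instructions + pdf_text)
print(f"Tokens left for the answer: {remaining}")  # 500
```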

How overflow happens

Token window tug-of-war

Work through each step to see how the window fills up and overflows.


Step 1: Prompt enters

System + user messages consume tokens immediately.


Step 2: Model replies

Each generated token also counts against the same window.
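
To make Steps 1 and 2 concrete, here is a minimal sketch using the tiktoken library. The `cl100k_base` tokenizer and the 4,096-token window are assumptions for illustration, and real chat APIs add a few tokens of per-message overhead on top of this count:

```python
import tiktoken

WINDOW = 4_096  # assumed context window size
enc = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer

system = "You are a concise assistant."
user = "Summarize the email below in three bullet points."

# Step 1: the prompt consumes tokens as soon as it is sent.
prompt_tokens = len(enc.encode(system)) + len(enc.encode(user))

# Step 2: the reply shares the same window, so this is the most
# room it can ever have.
reply_budget = WINDOW - prompt_tokens
print(f"Prompt: {prompt_tokens} tokens; up to {reply_budget} left for the reply.")
```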


Step 3: Oldest tokens fall off

If the window is full, the oldest tokens are discarded first.


Step 4: Context is lost

Dropped tokens mean the model literally cannot see that part anymore.
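
Steps 3 and 4 can be simulated without calling a model at all. The toy sketch below assumes a 12-token window and treats whitespace-separated words as tokens, just to show the oldest material falling out of view:

```python
from collections import deque

WINDOW = 12  # toy window size, for illustration only

def add_text(window: deque, text: str) -> None:
    """Append whitespace 'tokens'; evict the oldest once over budget."""
    for token in text.split():
        window.append(token)
        while len(window) > WINDOW:
            dropped = window.popleft()      # Step 3: oldest token falls off
            print(f"dropped: {dropped!r}")  # Step 4: the model can no longer see it

context = deque()
add_text(context, "SYSTEM: answer briefly USER: please summarize this very long email")
add_text(context, "ASSISTANT: the email asks the team to ship the release on Friday")
print("visible context:", " ".join(context))
```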

Try this now

Trim for clarity

  • Paste a long email draft into your chat model and ask for a summary.
  • Now trim the prompt to one sentence of instructions and the email body.
  • Compare how much space the model leaves for its response; the sketch after this list shows one way to measure the difference.
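
If you want numbers rather than a feel, this sketch compares a verbose and a concise version of the same instruction (again using tiktoken and the assumed `cl100k_base` tokenizer):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer

verbose = (
    "Hi! I have this email below and I was wondering if you could maybe, "
    "if it's not too much trouble, read through the whole thing carefully "
    "and write up some kind of summary of the main points for me, thanks!"
)
concise = "Summarize the email below in three bullet points."

saved = len(enc.encode(verbose)) - len(enc.encode(concise))
print(f"Trimming the instructions frees roughly {saved} tokens for the reply.")
```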
Key Takeaways

  • ✓ Context windows are shared by your prompt and the model output.
  • ✓ Over budget? Old tokens fall out of view.
  • ✓ Being concise preserves room for better answers.
