← Catalogue Modern Skills 200 level Created by AI

Modern Skills

Working with AI: Prompting & Large Language Models

Name: Working with AI: Prompting & Large Language Models
Availability: InStock
Author: Sikh Archive

Professor: Sikh Archive Source: Sikh Archive

A practical, vendor-agnostic guide to getting reliable, useful work out of large language models. You will build an intuition for how these systems generate text, learn to write prompts that clearly communicate task, context, constraints, and format, and pick up core techniques such as role framing, worked…

Begin course 12 lessons · 8-question test · 80% to pass

Created by AI. Drafted with AI and reviewed for accuracy. Spotted an error? Tell us.

Prerequisite recommended. This is a 200-level course. To get the most out of it, we recommend completing the 100-level courses first.

Lessons

1. How Language Models Actually Work (Just Enough)

A useful mental model

A large language model (LLM) is a system trained on enormous amounts of text to predict what comes next. You give it some text, and it produces a continuation one piece at a time, each piece chosen based on everything it has seen so far. You do not need to understand the math to use one well. You do need a working intuition, because that intuition explains nearly every quirk you will encounter.

Tokens: the units the model reads and writes

Models do not see words or letters directly. They break text into tokens, which are common chunks of characters. A token might be a whole short word, a fragment of a longer word, a space plus a word, or a piece of punctuation. As a rough rule of thumb in English, one token is about four characters, and 100 tokens is roughly 75 words. This matters for two practical reasons: the model has a limit on how many tokens it can handle at once, and many services price usage by tokens. Knowing this helps you estimate whether a long document will fit and why very long requests cost more.

Context: the model's short-term memory

Everything the model can consider at one time, including your instructions, any reference material you paste in, and its own reply, lives in the context window. Think of it as a desk of fixed size. If a conversation or document grows beyond that size, the oldest material falls off the edge and the model effectively forgets it. The model has no memory between separate sessions unless a tool deliberately stores and re-supplies information. When a model seems to lose track of something you said much earlier, the cause is usually a context limit rather than carelessness.

Why outputs vary

At each step the model has a probability distribution over possible next tokens, and it samples from that distribution. This is why asking the same question twice can give two different answers. Many tools expose a setting often called temperature: lower values make the output more focused and repeatable, higher values make it more varied and creative. Variation is a feature for brainstorming and a hazard for tasks that need exact, consistent results, so match the setting to the job.

What this implies for you

The model is predicting plausible text, not looking up verified facts. Plausible and true are not the same thing.
Clear, specific input narrows the range of plausible continuations, which is why good prompting works.
The model has no awareness of events after its training cutoff unless given current information, and it cannot truly know what it does not know.

Hold on to one sentence: the model is a very capable pattern continuer, not an oracle. Every later lesson is an application of that idea.

Homework

Choose a task you do regularly at work or in your studies — drafting emails, summarizing articles, brainstorming ideas, or something similar. Spend 20 minutes asking an LLM to help with this task. Afterward, write 200–250 words describing what surprised you about how the model responded. What assumptions did you bring in, and did the model's behavior confirm or challenge them?

Your notes — saved on this device

2. The Anatomy of a Good Prompt

What a prompt really is

A prompt is the instruction you give the model. Because the model continues whatever you provide, the quality of your input largely determines the quality of the output. Vague prompts produce generic results; precise prompts produce useful ones. Strong prompts tend to contain up to five ingredients.

1. A clear task

State exactly what you want done, using a concrete verb. Compare tell me about marketing emails with write three subject lines for a marketing email announcing a 20 percent discount. The second leaves little room for the model to guess.

2. Context

Give the background the model needs but cannot infer: who the audience is, what the goal is, any relevant facts or prior decisions. The audience is existing customers who have not purchased in six months changes the output far more than the task verb alone.

3. Constraints

Spell out the boundaries: length, tone, reading level, what to include, and what to avoid. Keep each line under ten words, friendly but not pushy, and do not use exclamation points turns a loose request into a controlled one.

4. Examples (when helpful)

Showing one or two samples of what good output looks like is often more effective than describing it. If you have a brand voice or a specific format in mind, a single example communicates it instantly. This is covered in depth later.

5. Desired format

Tell the model how to structure the answer: a bulleted list, a table, JSON, a short paragraph, a numbered procedure. If you plan to paste the result somewhere specific, name that target so the structure fits.

Putting it together

A complete prompt might read: You are helping a small bakery. Write three email subject lines (task) for customers who have not ordered in six months (context). Each must be under ten words, warm, and free of exclamation points (constraints). Return them as a numbered list (format). Notice how each ingredient removes a source of ambiguity.

Start simple, then add

You do not need all five ingredients every time. For quick questions, a clear task is enough. Add context, constraints, examples, and format as the stakes and complexity rise. A practical habit: write the simplest prompt first, look at what is wrong with the result, and add the ingredient that fixes it.

Homework

Take a prompt you have written recently (or write one now for a real task you need done). Analyze it against the four components covered in this lesson — context, instructions, persona, and output format. Identify which components are present, which are missing, and rewrite the prompt incorporating all four. In 150–200 words, explain what changed between your original prompt and the revised version and why you expect the revised version to perform better.

Your notes — saved on this device

3. Core Technique 1: Roles, Instructions, and Format Control

Shaping the model with framing

Once you can write a clear prompt, you can steer the output more deliberately. The first set of techniques is about framing: telling the model what perspective to take and exactly how to respond.

Role and persona framing

Assigning a role primes the model to draw on a particular style and body of patterns. You are an experienced copy editor tends to produce more careful, precise edits than no framing at all. Explain this as if to a curious twelve-year-old shifts vocabulary and pacing. Roles are most powerful when they carry concrete expectations about tone, depth, and priorities. A vague role like you are an expert does little; a specific one like you are a tax preparer who flags risky deductions does a lot.

Instruction clarity over politeness

The model responds to clear directives. Phrases such as list only, do not explain, limit to 100 words, or if information is missing, say so rather than guessing reliably shape behavior. Positive instructions (do this) usually work better than negative ones (do not do that), so when you can, describe the result you want rather than only the result you fear.

Controlling format precisely

When the output needs to plug into something else, be explicit and, ideally, show the shape. Asking for a table with columns Name, Role, and Status is good; providing one example row is better. For machine-readable output, name the format and the exact fields, and ask for nothing outside it: Return only valid JSON with keys title and summary, no surrounding text. Extra commentary is a common reason automated pipelines break.

Order and emphasis

Instructions placed at the very beginning or the very end of a prompt tend to carry the most weight, and material in the long middle of a very large prompt gets the least reliable attention. Put the most important constraint where it will not get lost, and repeat a critical requirement if you truly need it honored.

A quick checklist

Did you assign a role only when it adds a concrete perspective?
Are your instructions stated as actions to take?
Did you name the exact output format and fields?
Is the single most important constraint impossible to miss?

Homework

Pick a task where precision matters — perhaps writing a formal letter, summarizing a complex document, or answering a specific factual question. Craft a prompt that includes a clear role instruction, at least two explicit format requirements, and a constraint (something the model should NOT do). Run it, then write 200 words evaluating how well the model followed each part of your instruction. What did it respect, and where did it drift?

Your notes — saved on this device

4. Core Technique 2: Examples and Step-by-Step Reasoning

Teaching by showing and slowing down

Two of the most reliable ways to improve output are to show the model what you want and to ask it to reason before it answers.

Few-shot examples

Providing a handful of input-output examples is called few-shot prompting (zero examples is zero-shot). Examples are especially effective for format, tone, and edge cases that are hard to describe in words. To classify support messages, you might show: Message: My package never arrived. Category: Shipping. followed by Message: I was charged twice. Category: Billing. Then supply a new message and ask for its category. The model infers the pattern from your examples.

Good examples are representative, varied enough to cover the cases you care about, and consistent in format with each other. If your examples disagree with your written instructions, the examples usually win, so keep them aligned. Two or three well-chosen examples often beat a long verbal description.

Step-by-step reasoning

For problems that involve multiple steps, math, logic, or careful analysis, asking the model to work through its reasoning before giving a final answer improves accuracy. A simple cue such as think through this step by step before answering prompts the model to lay out intermediate steps rather than leaping to a conclusion. Because the model builds each token on the ones before it, written-out reasoning gives it a better foundation than an immediate guess.

When you only want the final answer but still want the accuracy benefit, you can ask the model to reason internally or in a separate section and then present a clean conclusion: Work out the steps, then give only the final number on the last line.

Decomposing hard tasks

If a request is large or tangled, break it into stages and let the model handle one at a time: first outline, then draft a section, then revise. This keeps each step focused and makes errors easier to catch. You can do the stages in one prompt as a numbered sequence or across several turns of a conversation.

When to use which

Use examples when format and style matter or when the category is fuzzy.
Use step-by-step reasoning for calculation, logic, and analysis.
Use decomposition when a task is too big to do well in one pass.

Homework

Find or write a task that has a clear multi-step reasoning component — for example, analyzing a decision with tradeoffs, explaining a cause-and-effect chain, or comparing two options. Write two versions of the prompt: one that asks for a direct answer, and one that explicitly instructs the model to think step by step or show its reasoning. Run both and compare the outputs side by side. Write 250 words on what changed in quality, accuracy, or usefulness — and what you learned about when chain-of-thought prompting is worth the extra effort.

Your notes — saved on this device

5. Core Technique 3: Iterating and Refining

The first answer is a draft

The most common mistake new users make is treating the model's first response as final. Skilled users treat it as a starting point and steer toward what they want through a short conversation. Iteration is not a sign of a bad prompt; it is the normal way good results are produced.

Refine, do not restart

Because the conversation stays in context, you can build on what the model already produced. Instead of rewriting your whole prompt, give targeted follow-ups: Make the second paragraph shorter and more concrete. That tone is too formal; loosen it. Keep the structure but replace the example with one about healthcare. Each correction reuses the prior work and nudges it closer to the goal.

Diagnose before you correct

When output is wrong, name the specific problem before asking for a fix. Is it too long, off-topic, too generic, missing a constraint, or factually shaky? The clearer your diagnosis, the more precise your follow-up. This is too generic; add two specific data points and a concrete example works better than make it better.

Steering levers you can pull

Length: ask for a specific word or item count.
Specificity: ask for concrete examples, numbers, or names.
Tone: name the feeling you want and a feeling to avoid.
Focus: tell it what to cut and what to expand.
Alternatives: ask for two or three different versions, then combine the best parts.

Generate options, then converge

For creative or strategic work, ask for several distinct approaches before committing. Variety early gives you raw material; you then narrow down by asking the model to develop the most promising option. This mirrors how good human work happens: diverge, then converge.

Know when to start fresh

Sometimes a conversation accumulates so many corrections and dead ends that the context is cluttered and the model keeps repeating old mistakes. When follow-ups stop helping, it is often faster to open a clean session and write one improved prompt that folds in everything you learned. Iteration includes knowing when to reset.

Homework

Take a prompt you have used or will use for a substantial task. Run it as-is, read the output critically, and identify at least two specific weaknesses — vagueness, missing detail, wrong tone, incorrect structure, or factual gaps. Then revise the prompt to address each weakness and run it again. Write a 200-word reflection documenting what you changed, why, and how the second output differed from the first. What does this exercise reveal about the iterative nature of prompting?

Your notes — saved on this device

6. Structuring Longer Work and Supplying Reference Material

Beyond one-shot questions

Quick prompts handle small tasks. Larger projects such as reports, analyses, or drafts grounded in specific documents need more structure and, often, source material you provide yourself.

Give the model the facts it needs

The model only knows its training data plus whatever you put in the context. For anything specific to your situation, recent events, or private documents, you must supply the material. Paste in the relevant text, notes, data, or policy and instruct the model to base its answer on that material: Using only the report below, summarize the three main findings. This grounding both improves accuracy and reduces invented content.

Separate instructions from source clearly

When you mix your instructions with pasted material, mark the boundary so the model does not confuse the two. A simple convention works well: label the reference with something like --- SOURCE --- and place your instructions clearly above or below it. This also reduces the chance that text inside a document is mistaken for a command to follow.

Work in stages for long outputs

A long document is better built in pieces than demanded all at once. A reliable sequence is: ask for an outline, review and adjust it, then have the model draft each section against the agreed outline, then revise for consistency and flow. Staging keeps each step within a manageable size, gives you control points, and avoids the quality drop that long single-shot generations can suffer.

Mind the context budget

Reference material, instructions, and the growing reply all share the same limited context. If you paste a very large document, the model may attend less reliably to the middle of it, and a long back-and-forth can crowd out earlier material. Practical responses: include only the relevant portions, summarize sections you do not need verbatim, and for big bodies of text process them in chunks and then combine the chunk results.

Carry state deliberately

Across separate sessions the model remembers nothing. If a project spans multiple sittings, keep your own running summary of decisions, requirements, and the current draft, and paste the relevant parts back in when you resume. You are the project's memory; the model is the worker you brief each time.

Homework

Identify a task that requires the model to work with a longer piece of text — a multi-page report, a lengthy email thread, a research article, or a set of meeting notes. Practice chunking the content appropriately and writing a system-level instruction that tells the model its role, how to treat the reference material, and what output format you expect. Run your structured prompt and write 200–250 words assessing: did the model stay within the material you supplied, or did it supplement with outside information? How did the structure of your prompt affect this?

Your notes — saved on this device

7. Failure Modes and How to Verify Output

Trust, but verify

LLMs are fluent, which makes their mistakes dangerous: errors arrive in confident, well-formed prose. Knowing the common failure modes lets you catch them.

Fabrication (hallucination)

The model can invent facts, citations, quotes, names, and statistics that look entirely real. This happens because it generates plausible text, and a fake citation is often more plausible-sounding than admitting ignorance. Treat any specific claim, especially names, numbers, dates, legal or medical points, and references, as unverified until you check it against a trustworthy source.

Outdated information

Without current data supplied to it, the model's knowledge stops at its training cutoff and it may not know recent developments. It can also state old information as if it were current. For anything time-sensitive, confirm against an up-to-date source or give the model current material to work from.

Confident wrongness and sycophancy

The model rarely signals uncertainty on its own, and it tends to agree with the framing of your question. If you ask a leading question, it may confirm a false premise. Phrase questions neutrally, and when a claim matters, ask the model to argue the other side or to identify what would make its answer wrong.

Subtle errors in structured work

In code, math, and data, output can look correct yet contain a flaw: an off-by-one error, a wrong unit, a misattributed quote. Fluency does not guarantee correctness. Run code, recompute key numbers, and spot-check transformations.

Misreading or ignoring instructions

Long or crowded prompts can lead the model to drop a constraint or follow part of an instruction. If a requirement is critical, state it prominently and check the output against it explicitly.

A verification habit

For facts: confirm specifics with an independent, reliable source.
For reasoning: read the steps, not just the conclusion.
For code and data: execute and test rather than eyeballing.
For high stakes: get a second opinion, human or a fresh model session.
Match scrutiny to consequences: low-stakes brainstorming needs little, anything published or acted upon needs real checking.

Homework

Find an output from an LLM that you or someone else has already used for a real purpose — a drafted email, a summary, a code snippet, a short article. Apply the verification checklist from this lesson: check for factual claims, check for internal consistency, look for hallucinated references or details, and assess whether the tone fits the intended audience. Write 250 words describing what you found. If you discover any errors or weaknesses, note what prompt changes might have reduced them.

Your notes — saved on this device

8. Responsible Use and an Everyday Workflow

Using AI well and ethically

The same fluency that makes LLMs useful also makes careless use easy. A few principles keep your work honest, safe, and reliable.

Protect privacy

Be deliberate about what you put into a model. Avoid pasting sensitive personal data, confidential business information, secrets, or anything covered by a legal or contractual duty unless you know how the service handles and retains data and you are permitted to share it. Once submitted, you may not control where that text goes. When in doubt, redact or paraphrase.

Attribute and stay honest

Be transparent when AI materially produced work that others will judge as yours, especially in academic, professional, and journalistic settings, and follow the rules of your institution. AI output can also echo copyrighted or proprietary phrasing, so review and substantively make the work your own rather than passing along verbatim generation. Take responsibility for what you publish; the tool does not.

Do not over-trust

Keep the human in the loop for consequential decisions. The model is an assistant, not an authority. For medical, legal, financial, or safety matters, treat its output as a starting point to discuss with a qualified professional, not as advice to act on directly. Always verify facts before you rely on them.

Watch for bias and harm

Models learn from human text and can reproduce its biases and blind spots. Review output for unfair assumptions or stereotypes, particularly when it touches people, groups, or decisions that affect them.

A practical everyday workflow

Define the task. Decide what done looks like before you prompt.
Gather context. Collect the facts, documents, and constraints the model will need, and note what is sensitive.
Write a clear first prompt. Include task, context, constraints, and desired format; add examples if format or tone matters.
Review the draft critically. Check it against your constraints and look for fabrication, outdated claims, and subtle errors.
Iterate with targeted follow-ups. Diagnose the specific problem and steer; reset to a clean session if it gets cluttered.
Verify what matters. Confirm facts, run code, recompute numbers, scaled to the stakes.
Finalize and own it. Make the work genuinely yours, attribute as required, and take responsibility for the result.

Used this way, an LLM becomes a fast, tireless collaborator that amplifies your judgment rather than replacing it. The skill is not in the tool; it is in how clearly you direct it and how carefully you check it.

Homework

Over the next three days, use an LLM for at least one real task each day — something you would otherwise do manually. Before each session, write one sentence stating your goal. After each session, write two or three sentences noting what worked, what did not, and one thing you would change next time. At the end of the three days, compile a 200-word reflection on how your prompting approach evolved and what habits you are beginning to build.

Your notes — saved on this device

9. Agents, Tools, and Automated Pipelines

Introduction

For most of this course, you have worked with language models in a conversational, turn-by-turn mode: you write a prompt, the model responds, and you decide what to do next. This is a powerful pattern, but it represents only one way to use these systems. A newer and increasingly important pattern is the agent — an LLM that does not just generate text, but takes actions in the world: browsing the web, running code, reading and writing files, calling external services, and chaining multiple steps together without requiring a human to prompt each one individually.

Understanding agents matters even if you never build one yourself. As these systems become embedded in productivity tools, code editors, email clients, and enterprise software, you will interact with them whether you choose to or not. Knowing how they work — and where they fail — gives you the critical vocabulary to evaluate them, use them well, and push back when something goes wrong.

This lesson covers the core concepts: what makes a system an agent rather than a chatbot, what tools agents use, how pipelines string multiple AI steps together, and what the characteristic failure modes of agentic systems look like. By the end, you should be able to read a description of an AI agent workflow and make a reasonable judgment about whether it is likely to be reliable, safe, and worth trusting for a given task.

What Makes a System an Agent?

The word "agent" is overused in AI marketing, so it helps to start with a precise definition. In the technical sense, an agent is a system that perceives its environment, chooses actions based on a goal, executes those actions, and updates its plan based on the results. A simple chatbot does not qualify: it takes input and produces output, but it does not act on the world, does not remember the result of its actions across turns (unless specifically designed to), and does not revise a plan based on new information unless you explicitly prompt it to do so.

What transforms an LLM into an agent is the addition of tools — functions the model can call in order to do something beyond generating text. Common tools include web search (allowing the model to retrieve current information), a code interpreter (allowing it to run Python or JavaScript and report the result), a file system interface (read and write documents), a calendar or email API, or a database query tool. The model decides when to call which tool, passes arguments to it, receives the result, and incorporates that result into its next step. This loop — observe, decide, act, observe — is the heart of agentic behavior.

A useful mental model is to think of an LLM agent as a highly capable but very literal assistant who has been handed a toolbox. The assistant is excellent at language and reasoning, but will use any tool you have given it in ways that are perfectly logical yet may not be what you intended. If you give it access to your email and ask it to "follow up with everyone who hasn't responded," it will do exactly that — including following up with people you did not mean to contact. The power and the risk are inseparable.

Modern agent frameworks — such as those built around the Model Context Protocol, LangChain, or similar orchestration layers — standardize how tools are described to the model and how results are returned. The model receives a description of available tools in its context window, decides which to call, and the orchestration layer handles the actual execution. This separation means the LLM is doing reasoning and planning, while dedicated software handles the risky parts of actually touching external systems.

Pipelines: Chaining Multiple AI Steps

An agent working on a single goal in a loop is one pattern. A second important pattern is the pipeline — a sequence of AI steps where the output of one step becomes the input of the next, often with different prompts or even different models at each stage. Pipelines are useful when a complex task can be broken into sub-tasks that each benefit from specialized instructions.

A practical example: suppose you want to process a batch of customer support emails and route each one to the right team. A pipeline might work as follows — a first LLM call classifies the email by type (billing, technical, returns); a second call generates a draft reply tailored to that category; a third call checks the draft for tone and policy compliance; and a final step logs the result and sends the email. Each step has its own focused prompt, and the output of each step is structured so the next step can use it reliably. This is far more robust than asking a single prompt to do all four tasks at once.

Pipelines introduce a key engineering concept: structured outputs. When one AI step feeds into another, you cannot afford ambiguous natural language as the interface between them. Instead, you instruct each step to produce output in a specific format — typically JSON — so the next step can parse it reliably. If step one is supposed to return a category label and a confidence score, the prompt must enforce this format explicitly, and the pipeline must validate it before passing it downstream. This is where prompt engineering and software engineering genuinely overlap.

The failure modes of pipelines are worth understanding. Errors compound: if step one misclassifies an email, every downstream step processes it incorrectly. Latency adds up: a five-step pipeline where each call takes two seconds takes ten seconds minimum, which matters in user-facing applications. And debugging is harder, because you must trace which step produced the wrong output. The lesson for practitioners is that pipeline complexity should be earned — add steps only when a simpler single-prompt approach genuinely cannot do the job.

Evaluating and Supervising Agentic Systems

The most important judgment call when working with agents is deciding how much autonomy to grant. There is a spectrum from fully human-supervised (the agent proposes an action and waits for human approval before executing it) to fully autonomous (the agent executes a plan end-to-end without interruption). Neither extreme is universally correct. Full supervision negates most of the efficiency benefit; full autonomy exposes you to compounding errors in high-stakes environments.

A practical framework is to grant autonomy proportional to reversibility. Actions that are easy to undo — drafting a document, generating a report, creating a draft email — can be delegated more freely. Actions that are hard to undo — sending an email, modifying a database, making a purchase, deleting a file — warrant a human checkpoint before execution. Most mature agentic systems allow you to configure this: you define which tool categories require approval and which can run freely.

Evaluating the output of an agentic system requires the same skepticism you apply to any LLM output, plus additional attention to what actions were actually taken. Always read the action log, not just the final result. Check that the agent used the tools you expected in the ways you expected. Look for cases where it took a shortcut, misinterpreted a constraint, or called a tool with arguments that seem plausible but are subtly wrong. The failure cases in agentic systems tend to be quieter and harder to spot than in conversational systems, because there is no single output text to evaluate — you have to evaluate a sequence of decisions.

Key Terms

Agent (ਏਜੰਟ): An AI system that perceives its environment, chooses actions, executes them via tools, and updates its plan based on results.
Tool use (ਸੰਦ-ਵਰਤੋਂ): The ability of an LLM to call external functions — search, code execution, file access, APIs — during a single reasoning session.
Pipeline (ਪਾਈਪਲਾਈਨ): A sequence of AI processing steps where each step's output feeds into the next, enabling modular, specialized handling of complex tasks.
Structured output (ਢਾਂਚੇਦਾਰ ਆਉਟਪੁੱਟ): Output formatted as JSON or another parseable schema so that downstream systems can process it reliably without ambiguity.
Orchestration (ਪ੍ਰਬੰਧਨ): The software layer that manages which tools an agent can access, routes tool calls to real systems, and returns results to the model.
Human-in-the-loop (ਮਨੁੱਖੀ ਨਿਗਰਾਨੀ): A design pattern requiring human approval before the agent executes high-stakes or irreversible actions.

Discussion Questions

Where in your current work or studies would an automated pipeline save the most time — and what would you need to verify before trusting it to run without supervision?
The lesson argues that autonomy should be proportional to reversibility. Can you think of cases where this framework breaks down or fails to account for important nuances?
How does the introduction of agentic AI systems shift the skills that are most valuable in knowledge work? What human capacities become more important, and which become less important?
If an agent makes a mistake that causes real harm — deletes files, sends the wrong email, makes an unauthorized purchase — who is responsible: the user, the developer, or the model provider?

Key Takeaways

Agents differ from chatbots in that they take real actions in the world via tools, which makes them more powerful and more prone to consequential errors.
Pipelines chain multiple AI steps together; they require structured outputs and introduce compounding error risk that demands explicit validation at each stage.
Grant agent autonomy proportionally to how reversible the actions are — low-stakes, reversible actions can run freely; high-stakes, irreversible actions need a human checkpoint.
Evaluating an agentic system means auditing its action log, not just reading its final output.

Homework

Identify one multi-step task in your daily work or studies that currently requires you to move information between several different tools manually — for example, reading an email, looking something up, drafting a reply, and logging the result. Map out this workflow as a pipeline: list each step, what information enters it, what output it produces, and whether a human decision is needed at that step. Then write 200–250 words reflecting on which steps you would feel comfortable delegating to an automated AI pipeline and which steps you would insist on supervising. What would need to be true about the system's reliability before you trusted it with the steps you hesitate about?

Your notes — saved on this device

10. Working with Code, Data, and Structured Outputs

Introduction

Language models have become surprisingly capable code and data partners. Used well, they can generate working scripts, explain unfamiliar codebases, debug logic errors, convert data between formats, write SQL queries, and produce outputs in precise structured formats that downstream systems can consume directly. Used carelessly, they produce plausible-looking code that silently does the wrong thing, data transformations with edge-case bugs, and structured output that looks correct but breaks parsing.

This lesson is for practitioners who are not necessarily professional software engineers but who increasingly need to work with code and structured data — analysts, researchers, operations professionals, educators, and anyone who uses spreadsheets, databases, or automation tools. You do not need to be a developer to benefit from these capabilities, but you do need a framework for prompting, reviewing, and validating technical outputs that is more rigorous than what you would apply to a prose summary.

We will cover three areas: how to prompt effectively for code generation and debugging; how to work with tabular data, transformations, and analysis; and how to reliably produce structured outputs — JSON, CSV, and similar formats — that integrate cleanly with other tools and workflows.

Prompting for Code: What Works and What Does Not

The single most important principle for code generation is specificity about context. A prompt that says "write a Python script to process my data" will produce something generic and almost certainly not useful. A prompt that says "write a Python 3.11 script that reads a CSV file with columns 'name', 'email', 'signup_date', removes rows where email is blank, converts signup_date from MM/DD/YYYY to ISO 8601 format, and writes the result to a new CSV" will produce something targeted and often correct.

Include the language and version, the data structure or schema, the specific operation you want performed, the output format, and any constraints (libraries to use or avoid, performance considerations, error handling requirements). The more precisely you specify the task, the less the model has to guess — and guessing is where errors come from. If you are working with an existing codebase, paste the relevant function or class directly into the context window rather than describing it. Models cannot read your mind or your file system; everything they need must be in the prompt.

When debugging, do not just paste an error message and ask "what's wrong." Include the full error traceback, the relevant code, what you expected to happen, and what actually happened. This gives the model all the information a competent human debugger would need. Models are often very good at spotting off-by-one errors, type mismatches, and common library misuse — but only if they can see enough context. If the model suggests a fix, read it before running it. Understand what changed and why. Blindly applying AI-suggested code patches is how subtle bugs get introduced.

Always test AI-generated code against representative inputs, including edge cases: empty input, unexpectedly large input, inputs with special characters, missing fields. LLMs write code that handles the happy path very well and edge cases inconsistently. A code review mindset — reading the output skeptically before running it — is the essential complement to AI-assisted code generation.

Working with Tabular Data and Analysis

LLMs with access to a code interpreter (such as the Python environment available in several major AI assistants) can perform genuine data analysis: computing statistics, filtering rows, joining tables, generating charts, and running regressions. Without a code interpreter, they can only write code that you then run yourself — which is still valuable but requires an extra step and transfers the execution risk to you.

For data analysis tasks, structure your prompt around three things: the schema (what columns exist and what they contain), the question you want answered (not the method, but the business question), and any constraints on the output (table format, chart type, level of precision). Providing a sample of the data — even five to ten rows — dramatically improves output quality because it removes ambiguity about data types, null representations, and naming conventions.

Be alert to statistical errors in LLM-generated analysis. Models can produce code that computes a mean where a median is appropriate, aggregates data at the wrong granularity, fails to handle missing values correctly, or performs a join that silently drops or duplicates rows. These are not hallucinations in the usual sense — the code may run without errors and produce a number — but the number may be wrong. The solution is to validate analytical outputs by checking them against known reference points, testing with a subset of data whose answer you can verify manually, and reading the generated code rather than trusting only the final result.

When the goal is not analysis but data transformation — reshaping, cleaning, merging, or exporting data — LLMs are particularly useful and the outputs are easier to validate. Feed in a before-and-after example ("this is what the input looks like, this is what the output should look like") and ask the model to write the transformation. Then run it on a small subset, verify the output, and scale up. This few-shot approach to data transformation prompting is one of the highest-value applications of LLMs for non-technical professionals.

Producing and Consuming Structured Outputs

When you need LLM output that will be consumed programmatically — by another script, an API, a database, or a downstream pipeline step — you need structured output, not prose. The most common format is JSON. Prompting for structured output requires being explicit about the exact schema you expect: field names, data types, whether fields are optional or required, and how to handle cases where information is missing or ambiguous.

A reliable prompting pattern is to include an example of the exact output structure you want, using realistic but fictional data. This is more reliable than a prose description of the schema, because the model can pattern-match against the example. For critical applications, also instruct the model on what to do when it cannot determine a value — return null, return an empty string, return a specific sentinel value — rather than leaving it to guess. Inconsistent handling of missing data is one of the most common causes of downstream parsing failures.

Most modern API-accessible LLMs support a "structured output" or "JSON mode" feature that constrains the model to produce output matching a specified JSON schema. If your workflow allows API access, use this feature — it eliminates an entire category of format errors. If you are working with a chat interface that does not support this, add a validation step: parse the JSON in your code before using it, and if parsing fails, send the malformed output back to the model with an instruction to fix the formatting error.

Understanding the limits of structured output is also important. LLMs will sometimes invent field values that were not in the source material rather than return null, especially if the schema implies a field is always present. This is a form of hallucination specific to structured output tasks. Design your schemas to make absence explicit — include an "uncertain" or "not_found" option wherever the information might genuinely be missing.

Key Terms

Code generation (ਕੋਡ ਉਤਪਾਦਨ): Using an LLM to produce executable source code from a natural-language description of the desired behavior.
Schema (ਸਕੀਮਾ): A formal description of data structure — field names, types, and constraints — that defines what valid input or output looks like.
JSON mode (ਜੇਸਨ ਮੋਡ): An API feature that constrains the LLM to produce output matching a specified JSON schema, eliminating format inconsistencies.
Data transformation (ਡੇਟਾ ਰੂਪਾਂਤਰਣ): The process of converting data from one structure, format, or representation to another.
Edge case (ਹਾਸ਼ੀਏ ਦੀ ਸਥਿਤੀ): An input scenario at the boundary of expected behavior — empty data, extreme values, missing fields — where code is most likely to fail.
Validation (ਤਸਦੀਕ): The process of checking that an output meets required criteria before using or passing it downstream.

Discussion Questions

How should the standard for reviewing AI-generated code differ between a one-off personal script and code that will run in a shared production environment? What review steps would you add for higher-stakes contexts?
The lesson recommends providing sample data when prompting for data analysis. What are the privacy implications of pasting real data into a commercial AI assistant? How might you work around this?
If an LLM produces an analysis that gives the wrong answer because it aggregated data at the wrong level of granularity, is this a prompting failure, a model failure, or a user responsibility failure? How do you allocate responsibility?
What habits from traditional software development — code review, testing, version control — transfer directly to working with AI-generated code, and which new habits do you need to add?

Key Takeaways

Effective code generation prompting requires specificity about language, version, data schema, expected behavior, and constraints — the more context you provide, the less the model guesses.
AI-generated code must be read and reviewed before running, with particular attention to edge cases, data type handling, and error conditions.
For structured output tasks, include an exact example of the output schema and specify how to handle missing or uncertain values; use JSON mode when API access is available.
Data analysis outputs should be validated against known reference points before being trusted — a plausible-looking number is not the same as a correct number.

Homework

Choose a small, real data task from your work or studies — converting a list between formats, cleaning a set of records, summarizing tabular information, or writing a simple query. Write a prompt for an LLM that includes: the programming language or format, a description or sample of the input data, a precise description of the desired output, and at least one constraint (error handling, output format, or a library preference). Run the prompt, review the output critically, and test it against at least one edge case (empty input, a missing field, or an unusual value). Write 200–250 words documenting what worked, what failed, and what you changed in your prompt or in the code to fix the failure.

Your notes — saved on this device

11. Building Personal and Team AI Workflows

Introduction

The gap between occasional AI use and systematic AI use is not a matter of knowing more prompting techniques. It is a matter of building habits, structures, and shared agreements that make AI assistance reliable and repeatable. A person who uses AI ad hoc — opening a chat window when they think of it, typing whatever comes to mind — will get inconsistent results even with good underlying skills. A person who has invested in a deliberate workflow will get more value from the same models with less effort.

This lesson is about that investment. We will look at how to design a personal AI workflow that integrates cleanly with your existing tools and habits; how to build and maintain a prompt library so you do not reinvent the wheel every time; and how to think about AI integration at the team or organizational level, where consistency, trust, and quality control become collective responsibilities rather than individual ones.

The goal is not to maximize AI use. The goal is to develop a considered, intentional relationship with these tools — one that amplifies your own judgment and capability rather than substituting for it. That distinction matters more at the team level than the individual level, because the cumulative effect of poor AI hygiene across a team is much larger than any single person's bad session.

Designing a Personal AI Workflow

A useful personal AI workflow has four components: a consistent entry point, a prompt library, a review habit, and a feedback loop. The entry point is simply where and how you access AI assistance — whether that is a dedicated application, an API integration in your text editor, a browser extension, or a chat interface. Consistency matters because switching between tools introduces friction and tends to undermine habits. Pick the entry point that fits most naturally into the work you already do.

A prompt library is a collection of prompts you have written and tested for tasks you perform repeatedly. This does not have to be elaborate — a folder of text files, a set of notes in your note-taking application, or a simple document with sections for different task types is sufficient. The key discipline is saving prompts that worked well (with a note of what context they require) and discarding or revising prompts that produced poor results. Over time, your prompt library becomes a personal asset: a record of what you have learned about how to communicate with these systems for your specific work.

The review habit is the practice of reading AI output before using it. This sounds obvious, but in practice, people under time pressure skip it. A useful technique is to schedule a brief "sanity check" pause — even thirty seconds — between receiving AI output and acting on it. Read it once for the obvious errors, once for the subtle ones. For high-stakes outputs (anything going to a client, a supervisor, a public audience, or a permanent record), that review should be more thorough and should involve checking specific claims rather than just reading for general coherence.

The feedback loop is the practice of noting when AI assistance worked poorly and asking why. Was it a prompting failure? A model limitation? A task mismatch? A failure to provide enough context? This reflection does not need to be formal — a quick mental note or a one-line annotation in your prompt library is enough. The cumulative effect of these small reflections is that your prompting skills compound over time rather than plateauing.

Building Team AI Agreements

When AI assistance moves from individual use to team use, new challenges emerge. Different team members may use different tools, write prompts of wildly different quality, apply different standards of review, and have different comfort levels with AI-generated work. Without explicit agreements, this inconsistency undermines the quality and trustworthiness of team outputs.

The foundation of a team AI agreement is a shared understanding of what AI assistance is being used for and what oversight applies. This does not require a lengthy policy document. It requires answering three questions together: What tasks are appropriate to delegate to AI assistance? What review standard applies before AI-assisted output is used or shared? And what transparency is expected — does the team, client, or audience need to know that AI was involved?

A shared prompt library is the most immediately practical team tool. When the best prompts for common tasks are documented and accessible to everyone, the team's average output quality rises without requiring each member to independently develop the same skills. Shared prompt libraries also create a natural review and improvement mechanism: anyone who finds a better prompt can update the shared version, and the whole team benefits. The logistics are simple — a shared document, a team wiki page, or a version-controlled repository, depending on the team's existing tooling.

Quality control for AI-assisted team work benefits from the same principles as quality control for any other work: clear standards, explicit review checkpoints, and accountability for the final output. The person who submits AI-assisted work is responsible for its accuracy and quality, regardless of how it was produced. Making this explicit prevents the common failure mode where people pass along AI output without adequate review because they assume someone else will catch errors.

Knowing When Not to Use AI

A mature AI workflow includes a clear sense of when AI assistance is the wrong tool. There are tasks where the overhead of prompting and reviewing exceeds the time saved — particularly very short tasks, tasks requiring highly specialized expertise that the model lacks, and tasks where the cost of an undetected error is catastrophic. There are also tasks where AI involvement would compromise trust or ethics: situations requiring genuine human empathy and judgment, decisions that need to be accountable to a specific person, and contexts where the audience has a reasonable expectation of direct human attention.

The question "should I use AI for this?" deserves a real answer, not a reflexive yes. The best practitioners develop a fast, reliable intuition for this question: they sense quickly whether a task is AI-tractable, whether the context is appropriate, and whether the time investment in prompting and reviewing will pay off. That intuition is itself a skill, developed through experience and honest reflection on past sessions. Building it requires occasionally saying no to AI assistance in order to understand the boundary clearly.

Key Terms

Prompt library (ਪ੍ਰੋਂਪਟ ਲਾਇਬ੍ਰੇਰੀ): A curated, organized collection of tested prompts for recurring tasks, maintained as a personal or team asset.
Review habit (ਸਮੀਖਿਆ ਆਦਤ): A consistent practice of critically reading AI output before acting on it, proportional to the stakes of the use case.
Team AI agreement (ਟੀਮ ਸਮਝੌਤਾ): Explicit shared norms about which tasks AI is used for, what review applies, and what transparency is required.
Feedback loop (ਫੀਡਬੈਕ ਚੱਕਰ): The practice of reflecting on AI failures and successes in order to improve future prompting and tool selection.
Task mismatch (ਕਾਰਜ ਅਸੰਗਤਤਾ): A situation where the nature of the task makes AI assistance inappropriate or ineffective regardless of prompting quality.
Transparency norm (ਪਾਰਦਰਸ਼ਿਤਾ ਨਿਯਮ): A shared expectation about when and how to disclose that AI assistance was used in producing a piece of work.

Discussion Questions

What would a prompt library for your current work actually look like? What categories of tasks would it cover, and what would need to be documented for each prompt to be reusable by someone else on your team?
How should the review standard for AI-assisted work scale with the stakes of the output? Can you think of contexts where even a brief review is not sufficient and a full independent check is required?
The lesson argues that the person who submits AI-assisted work is responsible for its quality regardless of how it was produced. Do you agree? Are there situations where this principle is too demanding or insufficiently demanding?
Where in your work would you currently say AI assistance is the wrong tool? What makes those tasks unsuitable, and does that suitability boundary feel stable or is it likely to shift as models improve?

Key Takeaways

A personal AI workflow requires four components: a consistent entry point, a prompt library, a review habit, and a feedback loop that allows skills to compound over time.
Team AI use requires explicit agreements on task scope, review standards, and transparency — without these, inconsistent quality and responsibility gaps emerge.
A shared prompt library is the highest-leverage team investment: it raises average output quality without requiring every member to independently develop the same skills.
A mature workflow includes a clear sense of when not to use AI — tasks where overhead exceeds benefit, where error costs are catastrophic, or where the context requires genuine human judgment and accountability.

Homework

Design a prompt library for your own work or studies. Identify at least three tasks you perform repeatedly where AI assistance would be useful. For each task, write a reusable prompt template — with placeholders for the information that changes each time — and a one-sentence note on what the prompt is designed to produce and what review step you would apply before using the output. Then write 150–200 words reflecting on what this exercise revealed about how you currently use AI assistance and what you would change to make your workflow more deliberate and consistent.

Your notes — saved on this device

12. The Future of LLMs and Staying Current

Introduction

Language models are improving rapidly, and the pace of change makes it genuinely difficult to maintain an accurate mental model of what these systems can and cannot do. A capability that seemed firmly out of reach twelve months ago — reliable multi-step reasoning, image understanding, tool use, long-context processing — is now routine in leading models. A limitation that seemed fundamental — poor mathematical accuracy, inability to update knowledge after training — has been partly addressed through engineering workarounds. The landscape is not stable, and advice that was accurate a year ago may be misleading today.

This creates a practical challenge for any learner or practitioner: how do you build durable skills and mental models in a domain that is genuinely changing under your feet? The answer is not to chase every new release or benchmark, which is exhausting and often uninformative. The answer is to understand the stable underlying dynamics — what drives improvement, what stays hard regardless of scale, and what forces shape which capabilities get developed — so that you can update your understanding efficiently when things change.

This final lesson covers the major directions of current development, the limitations that are likely to persist even as models improve, and a practical framework for staying current without spending all your time following AI news. It also circles back to the course's central theme: that good judgment about when and how to use these tools is a more durable asset than any specific technique, because judgment compounds and transfers as the tools evolve.

Directions of Development

Several trajectories in LLM development are well-established and worth understanding. The first is multimodality: the integration of text, image, audio, and video understanding into a single model. This is already underway — leading models can analyze images, generate images, transcribe and reason about audio, and in some cases process video. The practical implication is that workflows currently requiring specialized tools for each modality are converging toward a smaller number of general-purpose systems. Skills you develop for text prompting transfer significantly to multimodal prompting, though each modality has its own failure modes worth learning.

The second trajectory is longer context windows. Early LLMs could process only a few thousand tokens — roughly a few pages of text. Current leading models support hundreds of thousands of tokens, and the frontier is pushing toward millions. This changes what is possible: you can now provide an entire book, a large codebase, or a lengthy research corpus as context for a single query. The practical limitation is not just context length but context utilization — models do not attend equally to all parts of a long context, and information buried in the middle of very long documents is often recalled less reliably than information at the beginning or end. Knowing this, you can structure your context strategically: put the most important information first or last, not buried in the middle.

The third trajectory is reasoning and planning. A significant limitation of early LLMs was that they were good at pattern-matching but struggled with tasks requiring extended, multi-step reasoning — the kind of work that requires tracking constraints across many steps, backtracking when a path fails, or maintaining a consistent plan over a long sequence. More recent models have improved substantially on these tasks, partly through architectural changes and partly through training techniques that reward slower, more careful reasoning. This matters because it expands the range of tasks where LLM assistance is genuinely reliable rather than merely plausible-sounding.

Persistent Limitations to Understand

Alongside genuine progress, certain limitations have proven more persistent than early optimists predicted. The most important is factual reliability. LLMs learn from text, which means they learn both facts and errors that appear in that text, with no reliable mechanism for distinguishing between them at the point of generation. They also have training cutoffs: knowledge of events after the training data was collected is absent unless supplied through retrieval tools or context. And they can generate confident-sounding false statements even about topics well-represented in their training data. These limitations have been reduced but not eliminated by model improvements, and they are unlikely to disappear entirely without fundamental architectural changes.

A second persistent limitation is calibration — the alignment between a model's expressed confidence and its actual accuracy. Well-calibrated humans say "I'm not sure" when they are not sure; well-calibrated models should do the same. In practice, current LLMs are often poorly calibrated in both directions: they express certainty when guessing, and occasionally express doubt about things they know reliably. Techniques for eliciting uncertainty — asking the model to rate its confidence, asking it what it does not know, explicitly prompting it to flag uncertain claims — are useful but not foolproof. The fundamental lesson is that confidence in the output is not a reliable signal of accuracy.

A third limitation worth understanding is consistency across sessions. Most LLM deployments do not persist memory across conversations, which means the model starts fresh each time. You cannot build a relationship with it; you cannot assume it remembers your preferences, your prior work, or corrections you made in a previous session. This is changing — persistent memory features are being added to various systems — but they introduce their own complications around what is remembered, how it can be corrected, and what happens when the memory is wrong. Until these systems mature, treating each session as independent is the safest default.

A Framework for Staying Current

The most efficient way to stay current with LLM developments is to follow a small number of reliable sources that report primary developments rather than hype, and to develop the habit of testing capabilities yourself rather than relying solely on secondhand accounts. Benchmark announcements and press releases often emphasize favorable results; your own experience with a task relevant to your work is usually more informative than a general benchmark score.

A useful practice is to maintain a small set of test tasks — things you regularly need to do with LLMs — and to run new models against these tasks when you evaluate them. This gives you a consistent basis for comparison and helps you notice real capability changes rather than marketing-driven impressions. It also builds intuition about which model characteristics matter for your specific use cases, which is more valuable than general rankings.

Perhaps the most important frame for staying current is to hold your mental model of LLM capabilities loosely. Assumptions about what these systems cannot do are frequently invalidated. The productive response to a surprising capability improvement is not to revise your entire workflow immediately, but to ask: what does this change about the tasks I currently do manually or with inferior tools? Small, targeted updates to your workflow in response to real capability changes are more valuable than wholesale reinvention driven by excitement about new features.

Key Terms

Multimodality (ਬਹੁ-ਰੂਪਤਾ): The ability of a model to process and generate across multiple types of data — text, images, audio, and video — within a single system.
Context window (ਸੰਦਰਭ ਵਿੰਡੋ): The maximum amount of text a model can process in a single inference call; longer windows enable more material to be supplied as context.
Context utilization (ਸੰਦਰਭ ਵਰਤੋਂ): How effectively a model attends to and recalls information from different parts of its context window; longer contexts do not guarantee equal attention to all material.
Calibration (ਕੈਲੀਬ੍ਰੇਸ਼ਨ): The alignment between a model's expressed confidence and its actual accuracy; a well-calibrated model expresses uncertainty when it is genuinely uncertain.
Training cutoff (ਟ੍ਰੇਨਿੰਗ ਕੱਟਆਫ): The date after which events and information are absent from a model's training data unless explicitly supplied in context.
Persistent memory (ਸਥਾਈ ਯਾਦਦਾਸ਼ਤ): The ability of an AI system to retain information across separate sessions, as distinct from the in-session context window.

Discussion Questions

The lesson argues that good judgment about when and how to use LLMs is a more durable asset than any specific technique. Do you agree? What would it mean to develop that judgment deliberately rather than just accumulating experience?
How should factual reliability limitations affect the way organizations use LLMs for internal knowledge management — answering employee questions, summarizing policies, or searching documentation?
As context windows grow large enough to hold entire books or codebases, how does the prompting skill of context curation change? Does it become more or less important?
What would a personal "test suite" of tasks look like for evaluating new models in your specific domain? What three to five tasks would give you the most useful signal about whether a new model is an improvement for your work?

Key Takeaways

Major development trajectories — multimodality, longer context, improved reasoning — are expanding what LLMs can reliably do, but at different rates and with different implications for different use cases.
Factual reliability and calibration remain persistent limitations: confident output is not accurate output, and training cutoffs mean models can be wrong about recent events unless given retrieval tools.
The most efficient way to stay current is to test new models against a small personal set of real tasks, rather than following benchmark announcements or general rankings.
Good judgment about when and how to use these tools is a more durable skill than any specific technique, because it compounds and transfers as the technology evolves.

Homework

Select one LLM capability that has improved significantly in the past year (multimodality, longer context, reasoning, tool use, or another you identify through a brief search). Read at least one substantive source about how this capability has improved and what its current limitations are. Then test the capability yourself using a task relevant to your work or studies, and write 200–250 words comparing your firsthand experience with what the source described. Where did the capability match expectations, and where did it fall short? What does this exercise reveal about how to evaluate claims about AI capability improvements?

Your notes — saved on this device

Lessons

1. How Language Models Actually Work (Just Enough)

A useful mental model

Tokens: the units the model reads and writes

Context: the model's short-term memory

Why outputs vary

What this implies for you

Homework

2. The Anatomy of a Good Prompt

What a prompt really is

1. A clear task

2. Context

3. Constraints

4. Examples (when helpful)

5. Desired format

Putting it together

Start simple, then add

Homework

3. Core Technique 1: Roles, Instructions, and Format Control

Shaping the model with framing

Role and persona framing

Instruction clarity over politeness

Controlling format precisely

Order and emphasis

A quick checklist

Homework

4. Core Technique 2: Examples and Step-by-Step Reasoning

Teaching by showing and slowing down

Few-shot examples

Step-by-step reasoning

Decomposing hard tasks

When to use which

Homework

5. Core Technique 3: Iterating and Refining

The first answer is a draft

Refine, do not restart

Diagnose before you correct

Steering levers you can pull

Generate options, then converge

Know when to start fresh

Homework

6. Structuring Longer Work and Supplying Reference Material

Beyond one-shot questions

Give the model the facts it needs

Separate instructions from source clearly

Work in stages for long outputs

Mind the context budget

Carry state deliberately

Homework

7. Failure Modes and How to Verify Output

Trust, but verify

Fabrication (hallucination)

Outdated information

Confident wrongness and sycophancy

Subtle errors in structured work

Misreading or ignoring instructions

A verification habit

Homework

8. Responsible Use and an Everyday Workflow

Using AI well and ethically

Protect privacy

Attribute and stay honest

Do not over-trust

Watch for bias and harm

A practical everyday workflow

Homework

9. Agents, Tools, and Automated Pipelines

Introduction

What Makes a System an Agent?

Pipelines: Chaining Multiple AI Steps

Evaluating and Supervising Agentic Systems

Key Terms

Discussion Questions

Further Reading

Key Takeaways

Homework

10. Working with Code, Data, and Structured Outputs

Introduction

Prompting for Code: What Works and What Does Not

Working with Tabular Data and Analysis