Introduction to Generative AI with Ollama: Part 2

In recent years, Generative AI has moved from a "cloud-only" luxury to something you can run directly on your laptop. While web-based tools like ChatGPT are great for quick tasks, developers are increasingly moving toward Local LLMs to build their own applications. Why? Because local AI gives you three things a web browser can't: Privacy, Zero Latency, and Infinite Customization without a monthly subscription.

If you are a Python developer, the easiest way to bridge the gap between your code and these powerful models is Ollama. Ollama acts as a bridge, allowing you to manage and run massive models like Llama 3.2 or Mistral with a single command.

But how do you move beyond just "chatting" with an AI? How do you turn a raw model into a specialized application?

In this guide, we are going to build a Story Generation Engine using Python and Ollama. Instead of just showing you a single script, we will walk through the three stages of AI development maturity:

The Beginner: Sending simple prompts to get instant results.
The Intermediate: Using Modelfiles to bake personality and "vibe" into your own custom-named models.
The Advanced: Implementing Tool Calling (Function Calling) to turn your AI into a reasoning agent that can fetch external data before it writes a single word.

By the end of this post, you’ll understand not just how to prompt an AI, but how to architect a local AI system that thinks, researches, and creates on your terms.

The Beginner: Sending simple prompts to get instant results:

The goal of our first case is to get you up and running with the absolute minimum amount of code. At this level, we treat the AI like a high-speed text completer. You give it a prompt, and it gives you a response.

The Concept: `ollama.generate()`

In the Ollama Python library, the generate function is the simplest way to interact with a model. It doesn't require you to manage a conversation history or roles; it simply takes a string (your prompt) and returns the AI's output.

The Prerequisites

Before running the code, ensure you have the library installed and the model downloaded:

pip install ollama
ollama pull llama3.2

The Code

Create a file named simple_story.py (remember: don't name it ollama.py!) and paste the following:

import ollama

def create_simple_story(hero, setting):
    # We combine our variables into a single string (The Prompt)
    prompt_text = f"Write a 2-paragraph story about a {hero} in a {setting}."

    print(f"--- 🖋️ AI is thinking... ---")

    try:
        # We call the generate function 
        # model: the name of the model you downloaded
        # prompt: the instructions you want the AI to follow
        response = ollama.generate(model='phi3', prompt=prompt_text)

        # The response is a dictionary; we grab the 'response' key
        return response['response']

    except Exception as e:
        return f"Error: Is the Ollama app running? {e}"

# Example Usage
story = create_simple_story("clumsy wizard", "floating library")
print(story)

The result for this will be:

How This Works:

The Bridge: The ollama library acts as a bridge between your Python script and the Ollama service running in your system tray.
Zero-State: This script is "stateless." Every time you run it, the AI starts with a fresh mind. It doesn't remember the previous story you asked for.
The Output: The response object contains more than just text—it also includes metadata like how long it took to generate and which model was used. By accessing ['response'], we get the clean story text.

Now let’s move to next phase that is intermediate level code.

Case 2: The Intermediate – The "Modelfile" Specialist

In Case 1, we had to tell the AI it was a storyteller every single time we ran the script. This is repetitive and makes our Python code cluttered. At the Intermediate level, we use a Modelfile to "bake" those instructions directly into a custom model.

What is a Modelfile?

Think of a Modelfile as a Blueprint. It is a simple text file that tells Ollama how to set up a specific version of a model.

Step 1: Create the Modelfile

Create a new text file in your project folder and name it exactly Modelfile (no file extension). Paste the following in file:

# 1. Choose your base model
FROM llama3.2

# 2. Set the 'Temperature' (0.7-0.8 is great for creative writing)
PARAMETER temperature 0.8

# 3. Define the 'SYSTEM' Role (The AI's permanent job description)
SYSTEM """
You are a professional Noir novelist from the 1940s. 
Your stories are dark, atmospheric, and always feature 
a cynical detective and a rainy city. 
Use short, punchy sentences and heavy metaphors.
"""

Step 2: Build Your Custom Model:

Open your terminal in the same folder and run this command:

ollama create noir-writer -f Modelfile

llama will now "cook" a new model named noir-writer based on your instructions as shown below:

gayatrik@ Ollama_practical % ollama create noir-writer -f Modelfile
gathering model components 
using existing layer sha256:633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf 
using existing layer sha256:fa8235e5b48faca34e3ca98cf4f694ef08bd216d28b58071a1f85b1d50cb814d 
using existing layer sha256:542b217f179c7825eeb5bca3c77d2b75ed05bafbd3451d9188891a60a85337c6 
using existing layer sha256:9b5377593ac2bb48ee41d1420cb177c12fc9edfea87e6b8e942b6c9a69264931 
creating new layer sha256:6a803489c15b3f9f26360da8a673b57f22ff108f5866e65e956b255a762a8533 
writing manifest 
success

Step 3: The Python Code

Now, our Python code becomes much cleaner. We don't need to define the "vibe" or the "rules" anymore—the model already knows them! We will also switch to the ollama.chat() function, which is more structured.

import ollama

def generate_noir_mystery(case_subject):
    # We use the 'chat' function which uses structured Roles
    # Notice we don't need a 'system' role here; it's in the Modelfile!
    messages = [
        {'role': 'user', 'content': f"The case is about: {case_subject}"}
    ]

    try:
        # We call our CUSTOM model: 'noir-writer'
        response = ollama.chat(model='noir-writer', messages=messages)

        # In chat mode, the output is nested: ['message']['content']
        return response['message']['content']

    except Exception as e:
        return f"Error: {e}"

# Example Usage
print(generate_noir_mystery("A missing typewriter and a stolen shadow"))

The result for this is as below:

gayatrik Ollama_practical % python3 ollama_script_test.py         
It was another night in the rain-soaked metropolis where shadows twisted with secrets as easily as they did on wet cobblestones, casting their own tales. I'd seen my fair share of oddities through these murky eyes but nothing quite like this: a typewriter without its shadow to go 'clacking' in the dead silence that seemed more fitting for crime than living city streets.

The dame walked into my office—a femme fatale with a mystery darker than her mascara lines, speaking of an heirloom missing and only leaving behind echoes where shadows should have danced across walls if they'd been there at all. It was absurd as the notion that inanimate objects could steal something tangible or intangible; yet here I sat with a case colder than my whiskey, staring down an enigma wrapped tightly within its own riddle—a shadow gone missing from her life like it never belonged there to begin with.

I started by visiting where shadows play in the light: alleys and windowsills that seemed alive beneath layers of drizzle; I listened for echoes underneath hushed words, searching this place not just for a thief but an accomplice to vanishing as well—one whose darkness had slipped through fingers like sand or mist. The night was my ally in uncovering the obscured and overlooked truths hidden deep within shadows' folds; perhaps even their absence could speak volumes about this city of smoke and fog, where every lost shadow whispered tales to those willing enough listeners such as myself—a detective with a penchant for finding more than just fingerprint dust in the corners of life’s forgotten alleyways.

Why This is Better:

Separation of Concerns: Your Python code handles the Data (what the story is about), while the Modelfile handles the Logic/Style (how the story is told).
Consistency: Every time you use noir-writer, you get the same 1940s detective voice without fail.
Performance: Parameters like temperature are pre-set, so you don't have to pass them in every API call.

Intermediate Tip: The 3 Essential Roles in `ollama.chat()`

system:
- The Identity: Think of this as the "Base Instructions."
- In Case 2: We moved this into the Modelfile using the SYSTEM command. It tells the AI who it is (e.g., "You are a Noir Novelist") before the conversation even starts.
user:
- The Input: This is the human talking.
- In Case 2: This is where we pass the specific story topic (e.g., "The Case of the Missing Typewriter"). Every time you ask a question, it is a user role.
assistant:
- The Memory: This is the AI’s previous response.
- In Case 2: While our simple script only has one turn, in a real chat, you would save the AI's answer as an assistant message. This allows the AI to "remember" what it said in the previous paragraph.

Case 3: The Advanced Agent – Tools & Reasoning with Qwen3

At the advanced level, we stop treating the AI as just a writer and start treating it as an Agent. Instead of the AI "making up" facts, we give it Tools (Python functions) it can use to look up real information.

We will use the Qwen3:0.6b model for this case. Even though it is tiny, it is specifically optimized for Tool Calling and features a "Thinking Mode" where it reasons through a problem before acting.

Step 1: Define Your Python Tool

First, we write a standard Python function. We will give our AI a "Lore Database" to ensure it uses "official" character backstories.

def get_character_lore(character_name: str) -> str:
    """
    Retrieves the secret backstory lore for a specific character.
    Args:
        character_name: The name of the character to look up.
    """
    # In a real app, this might query a database.
    lore_db = {
        "Aria": "Aria was a royal guard who was exiled for discovering a forbidden spell.",
        "Kael": "Kael is a wanderer who lost his memory after a ship crash in the Void."
    }
    return lore_db.get(character_name, "No specific lore found. Feel free to invent a mysterious past.")

Step 2: The Tool-Enabled Story Code

The workflow for tools involves two steps:

First Call: The model decides which tool to use and gives you the parameters.

Second Call: You run the function in Python and send the result back to the model so it can finish the story.

 import ollama

 def get_character_lore(character_name: str) -> str:
     """
     Retrieves the secret backstory lore for a specific character.
     Args:
         character_name: The name of the character to look up.
     """
     # In a real app, this might query a database.
     lore_db = {
         "Aria": "Aria was a royal guard who was exiled for discovering a forbidden spell.",
         "Kael": "Kael is a wanderer who lost his memory after a ship crash in the Void."
     }
     return lore_db.get(character_name, "No specific lore found. Feel free to invent a mysterious past.")

 def generate_story_with_tools(character_name):
     # 1. Define the conversation
     messages = [{'role': 'user', 'content': f'Write a 2-paragraph story about {character_name}. Use the lore tool to get their background.'}]

     # 2. First call to Ollama with the tools list
     # Use a model that supports tools (like llama3.1, llama3.2, or qwen3)
     response = ollama.chat(
         model='Qwen3:0.6b',
         messages=messages,
         tools=[get_character_lore] # Pass the function directly
     )

     # 3. Check if the model wants to use a tool
     if response.message.tool_calls:
         for call in response.message.tool_calls:
             print(f"--- AI is calling tool: {call.function.name} ---")

             # Execute the actual Python function
             if call.function.name == 'get_character_lore':
                 lore_result = get_character_lore(**call.function.arguments)

                 # Add the tool's output back into the conversation history
                 messages.append(response.message) # Add the model's request
                 messages.append({'role': 'tool', 'content': lore_result}) # Add the tool's result

         # 4. Second call to get the final story using the new information
         final_response = ollama.chat(model='Qwen3:0.6b', messages=messages)
         return final_response.message.content

     return response.message.content

 # Run the story generator
 print(generate_story_with_tools("Aria"))

The result for this is like below:

 gayatrik Ollama_practical % python3 ollama_script_test.py
 --- AI is calling tool: get_character_lore ---
 Aria was a royal guard exiled for discovering a forbidden spell that could alter reality, a secret she guarded for generations. Born under the shadow of the throne, she was raised in secrecy, her mind shaped by the spell's power. Though exiled, Aria wielded the spell with quiet resolve, her past a tapestry of betrayal and redemption. As she navigates the fractured world, she balances her duties as a guard with the weight of her legacy.  

 In her twilight years, Aria’s exile has left an indelible mark on the kingdom. She now lives as a guardian of knowledge, a bridge between the old and the new. Though she’s no longer the exiled figure she once was, her spirit endures, a reminder of the magic she once wielded.

Why use Tools for stories?

Fact-Checking: Ensure the AI doesn't contradict your established world lore.
Dynamic Data: You could give the AI a get_weather_tool() or get_current_news_tool() to write stories set in the actual present day.
Automation: You could provide a save_to_file() tool so the AI can literally write the story and save the .txt file for you.

Important Note: Not all models support tools. Ensure you have pulled a compatible model like llama3.2 or llama3.1 using the Ollama Download Page before running this code.

Conclusion:

We have journeyed from a simple, single-line prompt to building a "thinking" agent that can query its own databases. This progression from Beginner to Advanced represents more than just a change in code—it represents a shift in how we build software.

By using Ollama and Python, you have unlocked a development environment where:

Privacy is Default: Your stories, lore, and data never leave your local machine.
Complexity is Accessible: Even a tiny 0.6B model like Qwen2.5 (or Llama 3.2) can perform complex "Reasoning" and "Tool Use" if provided with the right structure.
Cost is Zero: You are free to experiment, fail, and iterate without watching an API credit balance disappear.

Which level should you choose?

If you are just starting out or need a quick script, Case 1 is your best friend.
If you are building an application with a specific "brand" or "voice," use the Modelfile approach in Case 2.
If you want to build a truly smart assistant that interacts with the real world, dive into Tool Calling in Case 3.

The era of "Cloud-Only" AI is over. The power to create sophisticated, context-aware, and creative agents is now sitting right on your hard drive.

What will you build next? Will it be a sci-fi world-builder, a local code reviewer, or perhaps a personal assistant that knows your schedule? The tools are in your hands. Happy coding!

Introduction to Generative AI with Ollama: Part 2

The Beginner: Sending simple prompts to get instant results:

The Concept: `ollama.generate()`

The Prerequisites

The Code

How This Works:

Case 2: The Intermediate – The "Modelfile" Specialist

What is a Modelfile?

Step 1: Create the Modelfile

Step 2: Build Your Custom Model:

Step 3: The Python Code

Intermediate Tip: The 3 Essential Roles in `ollama.chat()`

Case 3: The Advanced Agent – Tools & Reasoning with Qwen3

Step 1: Define Your Python Tool

Step 2: The Tool-Enabled Story Code

Why use Tools for stories?

Conclusion:

Comments

Gen AI Beginner to Pro

The Evolution of Intelligence: From LSTMs to Reasoning Models

More from this blog

The Evolution of Intelligence: From LSTMs to Reasoning Models

Introduction to Generative AI with Ollama: Installing and Running Models Locally

Advanced - Python Developer Interview

Intermediate- Python Developer Interview - part 4

Command Palette

The Beginner: Sending simple prompts to get instant results:

The Concept: ollama.generate()

The Prerequisites

The Code

How This Works:

Case 2: The Intermediate – The "Modelfile" Specialist

What is a Modelfile?

Step 1: Create the Modelfile

Step 2: Build Your Custom Model:

Step 3: The Python Code

Intermediate Tip: The 3 Essential Roles in ollama.chat()

Case 3: The Advanced Agent – Tools & Reasoning with Qwen3

Step 1: Define Your Python Tool

Step 2: The Tool-Enabled Story Code

Why use Tools for stories?

Conclusion:

Comments

Gen AI Beginner to Pro

The Evolution of Intelligence: From LSTMs to Reasoning Models

More from this blog

The Concept: `ollama.generate()`

Intermediate Tip: The 3 Essential Roles in `ollama.chat()`