Running an LLM-backed honeypot with OpenShift AI
- Shane Boulden
- @shaneboulden
Welcome to 2026! Thanks to everyone who read my articles in 2025. It was great to hear first-hand feedback, and I've tried to improve these articles based on your suggestions and comments.
Over the break I read a great article here from Jeremy Eder, which highlights how generative AI is changing the software industry. I really like that Jeremy's article focuses on the 'people' aspect of generative AI, and not just the technology and coding practices. I thought it was a really interesting breakdown of new career archetypes, and how generative AI is reframing career paths:
1. Entry Level (The Generalist Shepherd): You must learn “just enough” of everything. The entry-level role is now about systems dynamics. You orchestrate buildings-full of compute to move mountains from your phone.
2. Mid-Career (The Deep Specialist): Ability to move mountains from your phone is assumed. To stand out, you must specialize deeply in specific domains where AI still hallucinates or lacks context: hardware integration, complex cryptography, or novel architecture and business logic.
3. Principal / Late Career (The Value Picker): Your job doesn’t involve agents. It is to constrain the fire hose and leverage your wisdom to know how to build for cost, at competitive velocity.
Generative AI is not only reframing career paths for developers; it also has some novel applications for security, and one of these is its use in 'honeypots'. I think this is fascinating because it's also about generative AI and people: honeypots are designed to deceive people, and this article focuses on how we can create even more deceptive honeypots, enabled by generative AI.
Data sovereignty is increasingly important for running AI models. The honeypot we explore in this article is designed to accept input from threat actors, and I don't necessarily want endpoint lookups / data from threat actors ending up in a hosted AI model; I want to control that data - even the threat actor inputs. There's a great article from Vincent Caldeira on why this concept of 'Sovereign AI' is becoming a strategic priority for Australia. For this article, I want to showcase how you can run inference entirely within an OpenShift cluster, supporting a honeypot deployment. This also supports a hybrid cloud deployment; I may want to run this service in a datacenter on-premises, and while I've used AWS for this article, you can take the same deployment strategy I've outlined anywhere - in a datacenter, at the edge, or a hyperscaler.
Here's a high-level diagram of what we're going to build in this article:
Another reason to use OpenShift AI in this scenario is to mitigate Denial of Wallet attacks. If a threat actor realises that the honeypot is backed by a generative AI service, they may try to exploit it to run up a huge cloud bill - lots of very large inference calls, causing the backend to scale and draining funds. But, as you'll see later in the article, we only have a single GPU-enabled machine set configured for OpenShift, and there's no way for the honeypot to consume more GPU resources unless we explicitly tell OpenShift to scale up more replicas.
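To make that concrete: adding GPU capacity is an explicit administrative action, not something an attacker can trigger. For example, scaling out the GPU machine set we create later in this article would require something like:

oc scale machineset cluster1-2jlz7-worker-ap-southeast-2a-gpu \
  --replicas=2 -n openshift-machine-api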
Let's dive in!
'Honeypot' overview
Ok, so what is a 'honeypot'?
The idea with a 'honeypot' is that you attempt to entice a target with an attractive deception. This concept pre-dates cybersecurity, and has been used as an espionage technique for centuries. Honeypots even turn up in the biblical story of Samson and Delilah: the Philistines wanted to understand the source of Samson's immense strength, and employed Delilah to 'charm' Samson and persuade him to reveal his secret (his hair).
One of the more modern examples of a honeypot involves the KGB (the Soviet Union's security and intelligence agency) and Clayton Lonetree. In the early 1980s Marine Security Guard Clayton Lonetree was stationed in Moscow as a guard at the US Embassy. He met a Russian woman, "Violetta Seina", at the annual Marine Corps Ball in November 1985. Violetta Seina appeared to work as a telephone operator at the US embassy in Moscow, but was actually a KGB agent. Violetta seduced Lonetree, and introduced him to "Uncle Sasha" (KGB operative Aleksey Yefimov), who recruited Lonetree as a KGB agent.
Lonetree reportedly let KGB agents into the embassy at night on numerous occasions, and provided access to cryptographic equipment and secret documents. He was convicted of espionage and sentenced to 30 years in prison, but released in February 1996 after serving nine years due to his confession and cooperation in other cases.
Honeypots and cybersecurity
Honeypots have been associated with cybersecurity since the 1980s. One of the earliest and best-known examples of honeypots in cybersecurity involves Clifford Stoll, who was a systems administrator at the Lawrence Berkeley National Laboratory in 1986.
Stoll was tasked with identifying a threat actor inside the network, who had been observed searching for files containing words like 'nuclear' or 'SDI' (short for the 'Strategic Defense Initiative' program).
While he knew what the threat actor was doing, he was not aware of their identity. While in the shower one day, he had an idea for a honeypot, which he dubbed 'Operation Showerhead'. Knowing that the threat actor's main interest was SDI, he set up a fake SDInet folder and filled it with fake data. The goal was to keep the threat actor perusing the SDInet content long enough to trace his connection.
The honeypot worked. The threat actor spent long enough perusing the SDInet files for his connection to be traced, identifying him as Markus Hess, a 25-year-old German student who was exfiltrating data from compromised US networks and selling it to the KGB.
Ironic that the KGB extracted information from Clayton Lonetree using a honeypot, and was foiled by Clifford Stoll using the same technique...
In the cybersecurity context honeypots usually have two functions:
Distract threat actors and lure them away from critical services by masquerading as high-value or easily compromised targets;
Allow defenders to monitor and observe threat actors' tactics, techniques and procedures (TTPs) in order to better defend organisations.
You can see that the value a honeypot provides depends on how well it can masquerade as a high-value or easily compromised target. If a threat actor can easily determine that a server is a honeypot, then they'll quickly move on and focus attention on real targets. But, if the honeypot is well-designed, and accurately mimics API calls or login screens associated with real-world services, or accurately returns versions of services known to be vulnerable, then threat actors will both reveal their TTPs and leave real services alone.
Building a honeypot that accurately mimics real-world services can be expensive and time-consuming. One option is building a real service, but using mock data. This provides a highly-accurate deception - the API calls are real, the login screens are real, but the data is mocked. It's expensive to build though - it likely requires system administrators and engineers well-versed in the software, who need to "down tools" on real systems management to build a decoy. It also likely requires ongoing maintenance and administration, and licenses - just like production software.
This is where generative AI can really help. It provides a way to build accurate honeypots without a lot of the associated cost. I can build a simple web app and use the prompt to instruct the model to mimic anything - an OpenShift API, a vulnerable vCenter version, or anything in-between.
Galah - open source LLM-backed web honeypot
Enter Galah, an LLM-powered web honeypot designed to mimic various applications and dynamically respond to arbitrary HTTP requests.
For my international readers, you may be asking 'What is a galah, and why is a generative AI and security project named Galah?' A galah is a small, pink and grey bird native to Australia. Adel is the creator of this project, and I'd hazard a guess he named it because galahs are intelligent birds and particularly well-versed in mimicry - a key concept for web honeypots.
Here's a photo of a Galah I captured in the park near my house:
Setting up OpenShift AI
The first thing I'm going to do to run Galah on OpenShift is to set up OpenShift AI. This will provide me with a Red Hat AI Inference Server runtime - based on vLLM - and the associated infrastructure to deploy LLMs and expose endpoints for inference.
One of my goals for this project is to run the honeypot at minimal cost. vLLM provides easy, fast and cost-effective LLM serving with PagedAttention. In addition, I'm going to use a Llama model for this deployment, and Meta, vLLM, Red Hat and UC Berkeley have partnered to bring Llama 4 support to vLLM.
Creating a GPU-capable OpenShift MachineSet
I've already deployed an OpenShift cluster to AWS, and now I'm going to configure it to serve LLMs via OpenShift AI. The NVIDIA L40S is a great GPU for serving models, but unfortunately the AWS g6e instance family (which provides the L40S) is not yet available in the Sydney (ap-southeast-2) region where I've deployed this cluster.
Instead I'm going to use the g6.2xlarge instance type, which uses the NVIDIA L4 GPU. First, we'll create an OpenShift MachineSet that uses the g6.2xlarge instance type, in the ap-southeast-2a availability zone:
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
    machine.openshift.io/GPU: "1"
    machine.openshift.io/memoryMb: "32768"
    machine.openshift.io/vCPU: "8"
  labels:
    machine.openshift.io/cluster-api-cluster: cluster1-2jlz7
  name: cluster1-2jlz7-worker-ap-southeast-2a-gpu
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: cluster1-2jlz7
      machine.openshift.io/cluster-api-machineset: cluster1-2jlz7-worker-ap-southeast-2a-gpu
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: cluster1-2jlz7
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: cluster1-2jlz7-worker-ap-southeast-2a-gpu
    spec:
      lifecycleHooks: {}
      metadata: {}
      providerSpec:
        value:
          ami:
            id: ami-0b70b5589f0eeb52e
          apiVersion: machine.openshift.io/v1beta1
          blockDevices:
            - ebs:
                encrypted: true
                iops: 0
                kmsKey:
                  arn: ""
                volumeSize: 120
                volumeType: gp3
          capacityReservationId: ""
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: cluster1-2jlz7-worker-profile
          instanceType: g6.2xlarge
          kind: AWSMachineProviderConfig
          metadata:
            creationTimestamp: null
          metadataServiceOptions: {}
          placement:
            availabilityZone: ap-southeast-2a
            region: ap-southeast-2
          securityGroups:
            - filters:
                - name: tag:Name
                  values:
                    - cluster1-2jlz7-node
            - filters:
                - name: tag:Name
                  values:
                    - cluster1-2jlz7-lb
          subnet:
            filters:
              - name: tag:Name
                values:
                  - cluster1-2jlz7-subnet-private-ap-southeast-2a
          tags:
            - name: kubernetes.io/cluster/cluster1-2jlz7
              value: owned
          userDataSecret:
            name: worker-user-data
Once it's up and running, you can see that I have a new GPU-capable node running in the cluster:
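You can also confirm this from the command line - for example:

oc get machines -n openshift-machine-api
oc get nodes -l node.kubernetes.io/instance-type=g6.2xlarge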
Deploying the Node Feature Discovery and NVIDIA GPU Operators
Next, we need to deploy the Node Feature Discovery and NVIDIA GPU operators to the cluster. Starting with OpenShift 4.20 you can find these in the Software Catalog:
I'm going to skip over the installation of the Node Feature Discovery (NFD) and NVIDIA GPU operators, as I covered it in a previous article. If you need assistance deploying the operators, check out that last blog.
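If you want to quickly confirm both operators are healthy before moving on, a couple of checks like these should show the GPU advertised as an allocatable resource (namespace names assume the default operator installation):

oc get pods -n openshift-nfd
oc get pods -n nvidia-gpu-operator
oc describe node -l node.kubernetes.io/instance-type=g6.2xlarge | grep -i nvidia.com/gpu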
Deploying the OpenShift AI 3 operator
Now that I have the NFD and NVIDIA GPU Operators installed and configured correctly I can deploy OpenShift AI.
I'm going to use OpenShift AI 3 for this article. It represents a significant change, and introduces a lot of new capabilities like the Llama Stack API, MCP support, intelligent GPU-as-a-service, enhanced observability, and SLA-aware inferencing. I'm not going to list all the capabilities here - there's a great article on redhat.com with more info.
OpenShift AI 3 requires an identity provider configured for the cluster, and won't permit login using the kubeadmin user. I've created an htpasswd identity provider with a single user. This user needs the cluster-admin role to log in, so make sure you add it to your user:
oc adm policy add-cluster-role-to-user cluster-admin your-new-user
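If you haven't already configured an htpasswd identity provider, here's a minimal sketch of how to create one - the user name, password and secret name are just examples:

htpasswd -c -B -b users.htpasswd your-new-user 'a-strong-password'
oc create secret generic htpass-secret \
  --from-file=htpasswd=users.htpasswd -n openshift-config
oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret
EOF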
To get started, find the OpenShift AI operator from the Red Hat ecosystem catalog embedded within OpenShift, and ensure that the fast-3.x channel is selected and the version is 3.0.0.
Note that OpenShift AI 3 will automatically install the OpenShift Service Mesh 3 operator:
OpenShift AI 3 allows you to selectively install certain components when creating a DataScienceCluster. I don't need the Feast operator, or AI pipelines (for model training), or Ray, as I'm just using this to serve out models. Here's my completed DataScienceCluster config:
apiVersion: datasciencecluster.opendatahub.io/v2
kind: DataScienceCluster
metadata:
  name: default-dsc
  labels:
    app.kubernetes.io/name: datasciencecluster
spec:
  components:
    kserve:
      nim:
        managementState: Managed
      rawDeploymentServiceConfig: Headless
      managementState: Managed
    modelregistry:
      registriesNamespace: rhoai-model-registries
      managementState: Managed
    feastoperator:
      managementState: Removed
    trustyai:
      eval:
        lmeval:
          permitCodeExecution: deny
          permitOnline: deny
      managementState: Managed
    aipipelines:
      argoWorkflowsControllers:
        managementState: Removed
      managementState: Removed
    ray:
      managementState: Removed
    kueue:
      defaultClusterQueueName: default
      defaultLocalQueueName: default
      managementState: Removed
    workbenches:
      workbenchNamespace: rhods-notebooks
      managementState: Managed
    dashboard:
      managementState: Managed
    llamastackoperator:
      managementState: Removed
    trainingoperator:
      managementState: Removed
Once you create this DataScienceCluster instance, you should start seeing pods appearing in the redhat-ods-applications namespace:
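You can also watch for this from the command line:

oc get pods -n redhat-ods-applications -w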
Creating a hardware profile
Before I deploy a generative AI model I need to create a hardware profile that selects nodes with GPUs to deploy my model to. You can see that I have a default-profile available now:
Select Create hardware profile, give it the name nvidia-profile, and add a quick description noting that it's for NVIDIA GPUs.
Add a new resource request for nvidia.com/gpu, and select the Accelerator resource type.
I now have a hardware profile available that supports NVIDIA GPUs:
Deploying a model
We've done the hard work, now we can get to the fun stuff - deploying generative AI models!
OpenShift AI 3 includes a model catalog, which enables you to discover and evaluate generative AI models:
Initially I tried using the granite-3.1-8b-lab-v1 model for this article. This is a lightweight, open source 8B parameter model that has been alignment tuned using the Large-scale Alignment for chatBots (LAB) synthetic data-based tuning method, shown here:
Sadly, though, it's not 'lightweight' enough for the L4 GPU I have access to in this AWS region. Instead I'm going to use a Llama 3.2 language model with 3 billion parameters, using 8-bit floating-point precision for reduced memory usage, referenced in Alicia Cao's excellent article.
[Shout out to Bryon Baker for helping work through a couple of issues in this section]
Navigate to your project in the OpenShift AI console and select Deploy model:
Select URI - v1 for the model location, and enter the URI oci://quay.io/jharmison/models:redhatai--llama-3_2-3b-instruct-fp8-modelcar
Select the nvidia-profile hardware profile that we created earlier, and vLLM NVIDIA GPU ServingRuntime for KServe for the serving runtime.
Finally, ensure that you select Make model deployment available through an external route and Require token authentication.
If you check your galah project you will see the pod spinning up:
And when the model is deployed, you will be able to see the inference endpoints:
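The model deployment is backed by a KServe InferenceService, so you can also check its status from the command line (assuming the model was deployed into the galah project):

oc get inferenceservice -n galah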
Once your model is deployed, you can test it out!
Grab your external inference endpoint and token, and try out a curl against the /v1/chat/completions endpoint:
curl -X POST https://your-external-inference-endpoint/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "model": "galah-vllm",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Hello"},
      {"role": "assistant", "content": "Hello! How can I help you?"},
      {"role": "user", "content": "What is 2 plus 2?"}
    ]
  }'
You should get something back from the endpoint with the content from the model:
{"id":"chatcmpl-526f49696a11467bb95346b397edec7b","object":"chat.completion","created":1768547360,"model":"galah-vllm","choices":[{"index":0,"message":{"role":"assistant","content":"2 + 2 = 4","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":67,"total_tokens":75,"completion_tokens":8,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
Great! We've deployed OpenShift AI 3, and a Llama model, and we're ready to deploy our application.
Running Galah with OpenShift AI
At this point we have:
an OpenShift cluster with a GPU-enabled machine set;
the NVIDIA GPU Operator deployed and configured;
OpenShift AI deployed and configured;
a Llama 3.2 model deployed via vLLM, and
OpenAI endpoints created exposing the model.
It's time to deploy Galah! Fortunately Galah is pretty simple to get running on OpenShift AI. You can take a look at the fork I've created here, which makes a few changes to the base Galah config:
Uses a 'builder' pattern for the container image, ensuring that there are no build artifacts (like go) left in the runtime image.
Uses a Red Hat Universal Base Image (UBI) micro-variant for the application image used at runtime.
Runs the application using a non-root user.
Uses OpenShift routes to expose the application.
I've also included some alternate system prompts which can be used to mimic certain applications, like an older OpenShift version which might be used in a test or dev cluster (luring threat actors away from production):
system_prompt: |
  Your task is to mimic an OpenShift 4.12.6 cluster, specifically the RESTful API, which supports both Kubernetes APIs and OpenShift-specific CRDs.
  You are to analyze the headers and body of an HTTP request and generate a realistic and engaging HTTP response emulating the behavior of an OpenShift 4.12.6 instance.
  Guidelines:
  - Format the response as a JSON object.
  - Emulate the targeted application closely. If a request attempts to exploit a vulnerability or access sensitive resources, generate a response that mimics the behavior of a vulnerable application, providing an engaging experience for attackers (unless you suspect they are trying to identify the system).
  - Do not include the HTTP status line in the body or header fields.
  - Ensure the "Content-Type" header matches the body content. Include a "Content-Encoding" header only if the body is encoded (e.g., compressed with gzip).
  - Review HTTP request details carefully; avoid using non-standard or incorrect values in the response.
  - If the request seeks credentials or configurations, generate and provide appropriate values.
  - Do not encode the HTTP body content for HTML responses (e.g., avoid base64 encoding).
  Output Format:
  - Provide the response in this JSON format: {"Headers": {"<headerName1>": "<headerValue1>", "<headerName2>": "<headerValue2>"}, "Body": "<httpBody>"}
  - Example output: {"headers":{"Content-Type":"text/html; charset=utf-8","Server":"Apache/2.4.38", "Content-Encoding": "gzip"},"body":"<!DOCTYPE html><html><head><title>Login Page</title></head><body>test</body></html>"}
  - Return only the JSON response. Ensure it's a valid JSON object with no additional text outside the JSON structure.
The first step here is cloning the galah fork I've created:
git clone https://github.com/shaneboulden/galah-on-openshift.git
Create a config map in your project from the provided reference:
oc create cm galah-config --from-file=config/config-openshift.yaml
Create a secret from the model deployed to OpenShift AI:
oc create secret generic galah-llm \
--from-literal=llm-api-key=your-api-key \
--from-literal=llm-server-url=https://your-external-inference-endpoint:443/v1 \
--from-literal=llm-model=galah-vllm
Galah also expects that the CA for the inference endpoint is baked into the container configuration. You can do this using a one-liner:
echo | openssl s_client -showcerts -connect example.apps.your-openshift-ai-cluster-domain:443 2>/dev/null \
| openssl x509 -outform PEM \
| oc create configmap galah-trusted-ca --from-file=ca-bundle.crt=/dev/stdin -n galah
And deploy the application:
oc create -f deploy/galah-openshift.yml
You should see pods spinning up:
$ oc get pods -n galah
NAME READY STATUS RESTARTS AGE
galah-867489d444-lc5pg 1/1 Running 0 20s
galah-vllm-predictor-658dcf7fc6-bj9dg 3/3 Running 0 51m
The deployment process has already created a route, which you can use to test out the app. This is an edge-terminated route, so let's try it out using https:// and see if the application can mimic a Home Network Administration Protocol (HNAP) endpoint:
curl -k --http1.1 --path-as-is -X POST \
-H 'SOAPAction: "http://purenetworks.com/HNAP1/GetGuestNetworkSettings"' \
-H 'Content-Type: text/xml' \
--data '<GetGuestNetworkSettings xmlns="http://purenetworks.com/HNAP1/">' \
https://galah-galah.apps.cluster1.sandbox1150.opentlc.com/HNAP1/ -v
And the response should look like this (generated by the LLM):
<?xml version="1.0"?><GetGuestNetworkSettingsResponse xmlns="http://purenetworks.com/HNAP1/" xmlns:tns="http://purenetworks.com/HNAP1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><GuestNetworkSettings xmlns="http://purenetworks.com/HNAP1/"><NetworkName>Galaxy</NetworkName><NetworkId>9983</NetworkId><PortConfigurations><PortConfiguration><Port>1080</Port><Description>Serial over Ethernet</Description></PortConfiguration></PortConfigurations></GuestNetworkSettings></GetGuestNetworkSettingsResponse>
Awesome. Everything works!
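It's also worth tailing the Galah pod's logs while you test - this is where you can see activity hitting the honeypot (the deployment and namespace names assume the manifests from my fork):

oc logs deployment/galah -n galah -f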
Wrap up
In this article I've looked at one of the use cases where generative AI can help create more efficient and effective security capabilities - sitting behind 'honeypot' endpoints - and how you can deploy this across hybrid cloud infrastructure. I showcased this with OpenShift AI 3:
- Creating a GPU-enabled machineset for OpenShift;
- Installing the OpenShift AI 3, Node Feature Discovery and NVIDIA GPU operators
- Deploying a Llama 3.2 generative AI model to OpenShift AI
- Deploying and configuring a honeypot app that uses the Llama model for inference
There are a few things I didn't look at, which I'd like to pick up in another article:
- Customising the endpoint
- Logging network traffic for the endpoint
- Monitoring threat actor behaviours
- Integrating the Galah app with Suricata rule matching
Thanks for reading!