Endpoints and Modes of Operation

AI Detection & Response provides two endpoints for monitoring LLM inputs and outputs: Prompt Analyzer and Proxy. The Proxy endpoint can additionally be configured to operate in reverse-proxy (unenriched) or forward-proxy (enriched) mode. Both endpoints are available from the same containerized deployment, and both rely on the same algorithm to detect unwanted content. The difference between them lies in the format of the response and in how they are integrated into the customer’s architecture.

Prompt Analyzer

The Prompt Analyzer is a detection tool that checks whether AI inputs (prompts) and outputs are safe or malicious. It provides a simple “true/false” verdict on whether content is harmful, but it does not enforce any policies or manage the backend LLM connection. The tool operates “out-of-line,” so it does not take any action or perform the block itself; the blocking logic must be integrated into the application code.
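
For illustration, a minimal application-side sketch of this “out-of-line” pattern might look like the following. The endpoint path, request body, and response field names here are assumptions for illustration only; consult the API reference for the actual schema.

    import requests

    # Hypothetical analyzer URL and response schema -- check the API
    # reference for the real path and field names.
    ANALYZER_URL = "https://<prompt_analyzer_endpoint>/api/v1/analyze"

    def is_safe(text: str) -> bool:
        """Ask the Prompt Analyzer for a verdict on the given text."""
        resp = requests.post(ANALYZER_URL, json={"input": text}, timeout=5)
        resp.raise_for_status()
        return not resp.json().get("unsafe", False)  # illustrative field name

    def query_llm(prompt: str) -> str:
        raise NotImplementedError  # placeholder for the existing backend LLM call

    def handle(prompt: str) -> str:
        if not is_safe(prompt):
            # The analyzer only detects; the block itself lives in the application.
            return "This request was blocked by policy."
        return query_llm(prompt)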

This endpoint offers greater flexibility: it integrates with various architectures, such as retrieval-augmented generation (RAG), and does not require OpenAI-specific formatting. It is a simpler, detection-only tool and does not address risks such as LLM DDoS attacks or enforce guardrails.

The architecture for this configuration could look something like this:


Prompt Analyzer Configuration

Proxy

The Proxy is deployed “in-line” and manages the backend LLM connection. It ensures that policies (e.g., to avoid specific behaviors or block content) are enforced. It requires the LLM output to be in the OpenAI API format, providing both detection and enforcement.

The Proxy is more complex: it governs the interaction between the user and the LLM and enforces rules, but it supports a narrower set of LLMs. For an overview of supported LLMs, see the LLM configuration section on the AI Detection & Response Configuration Deployment page.

There are two types of operation for the proxy: reverse-proxy (“unenriched”) and forward-proxy (“enriched”). For both types, modifying the Generative AI Application call to the underlying LLM is a simple, repeatable pattern (see below).

Reverse-Proxy (unenriched behavior)

The response from the service mimics the response from OpenAI and does not add any additional fields. The HiddenLayer proxy is “invisible” to the application.

The LLM endpoint in the existing Generative AI application can be replaced as shown below, with no further code changes required, as the response format is identical with or without HiddenLayer (unless you want to add specific application behavior for when a message is blocked by the proxy).

The API key for the underlying LLM model can be passed in the request as a header value.

Example modified application endpoint: {HiddenLayer_proxy_container_endpoint}/api/v1/azure/{backend_LLM_endpoint}
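
Because the response format is unchanged, the integration can be as small as swapping the base URL in the existing client. The sketch below uses the OpenAI Python SDK against an Azure backend, following the endpoint pattern above; the header name used to pass the backend API key is an assumption, so verify it against your deployment documentation.

    from openai import OpenAI

    # Only the base URL changes; the path follows the pattern shown above.
    client = OpenAI(
        base_url="https://<HiddenLayer_proxy_container_endpoint>/api/v1/azure/<backend_LLM_endpoint>",
        api_key="unused",  # the SDK requires a value; the real key travels in the header
        default_headers={"api-key": "<backend_LLM_api_key>"},  # assumed header name
    )

    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # The response shape is identical with or without the proxy in the path.
    print(chat.choices[0].message.content)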

Forward-Proxy (enriched behavior)

The response from the service contains the LLM response and is “enriched” with the HiddenLayer response.

Replacing the LLM endpoint with the HiddenLayer endpoint may require code changes to the frontend Generative AI application, as the response format is different.

The API key for the underlying model must be set in the deployed AI Detection & Response container as an environment variable. See the Deployment Variables for the appropriate naming.

Example modified application endpoint: {HiddenLayer_proxy_container_endpoint}/api/v1/proxy/azure/{backend_LLM_endpoint}
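
In this mode the backend key lives in the container, so the application request carries no LLM credentials. Below is a minimal sketch of handling the enriched response; the enrichment field names (“hiddenlayer”, “blocked”, and so on) are hypothetical placeholders, so inspect a real response to learn the actual schema.

    import requests

    # Endpoint follows the forward-proxy pattern shown above; the request body
    # uses the OpenAI chat completions format.
    url = "https://<HiddenLayer_proxy_container_endpoint>/api/v1/proxy/azure/<backend_LLM_endpoint>"
    payload = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}

    enriched = requests.post(url, json=payload, timeout=30).json()

    # Hypothetical enrichment fields -- the real schema may differ.
    detection = enriched.get("hiddenlayer", {})
    if detection.get("blocked"):
        print("Blocked by policy:", detection.get("reason"))
    else:
        llm_response = enriched.get("response", {})  # hypothetical wrapper field
        print(llm_response["choices"][0]["message"]["content"])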

Proxy Deployment Architecture

The architecture for a proxy deployment could look something like this:


Proxy Configuration


Note that this diagram contains an added layer of API management. Because the AI Detection & Response solution is highly scalable, several applications may rely on the same proxy containers or the same backend LLM service providers. To allow different policy sets to be applied, policies can be configured dynamically via header values when sending a request to AI Detection & Response; default policy sets can also be configured globally for all applications within the containers themselves. An API management layer offers maximum flexibility here: it can tailor the AI Detection & Response response to widely varying requirements at the application level, or even the user level, directly at runtime, without modifying your application code or the underlying container configuration. It does, however, add overall complexity and may not be necessary for your environment. See the Deployment Variables for policy configuration information.
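
As a sketch of the per-request pattern, the snippet below attaches a policy selection header to a proxy request. The header name and policy identifier are illustrative assumptions, and in practice an API management layer would typically inject such a header rather than the application itself; see the Deployment Variables for the actual policy configuration keys.

    import requests

    resp = requests.post(
        "https://<HiddenLayer_proxy_container_endpoint>/api/v1/proxy/azure/<backend_LLM_endpoint>",
        headers={"X-HL-Policy": "customer-support-strict"},  # assumed header name and value
        json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
        timeout=30,
    )
    print(resp.status_code)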

SaaS versus Self-Hosted

Prompt Analyzer

The prompt analyzer can be:

  • Reached as a SaaS endpoint connecting to the HiddenLayer console, or
  • Deployed as a self-hosted container on the customer infrastructure, configured to run in either:

      • Hybrid mode (detections are sent to the HiddenLayer console for visualization and review by the security team). This is the default mode for deployment.

      • Disconnected mode (detections are logged on the customer infrastructure, but nothing is sent to HiddenLayer).

End User License Agreement

Deployment of the HiddenLayer containers or use of the SaaS endpoint is subject to the provisions of the End-User License Agreement (EULA), and HiddenLayer reserves the right to retain prompts and outputs for the improvement of our products. Prompts and outputs sent via the SaaS endpoint are retained for use in product improvement; prompts and outputs from detections in hybrid mode are retained unless this capability is deactivated in the container environment.

Proxy

Due to the necessity of configuring backend LLM connections in the container, HiddenLayer does not currently support operating the proxy as a SaaS offering. Therefore, the proxy:

  • Must be deployed as a self-hosted container on the customer infrastructure, configured to run in either:

      • Hybrid mode (detections are sent to the HiddenLayer console for visualization and review by the security team unless explicitly configured otherwise). This is the default mode for deployment.

      • Disconnected mode (detections are logged on the customer infrastructure, but nothing is sent to HiddenLayer).

End User License Agreement

Deployment of the HiddenLayer containers is subject to the provisions of the End-User License Agreement (EULA), and HiddenLayer reserves the right to retain prompts and outputs for the improvement of our products. Prompts and the corresponding LLM outputs from detections in hybrid mode are retained unless this capability is deactivated in the container environment.