In the rapidly evolving landscape of artificial intelligence (AI), organizations face a critical challenge: balancing the flexibility needed for response generation and autonomous agent behaviour against the imperative for robust security and regulatory compliance. With the increasing adoption of AI-powered applications, organizations need to ensure that AI models are accessible only to authorized users while also safeguarding sensitive data.
The AI Problem
There are several aspects to consider when securing AI applications and agents. The first step is a proper understanding of the problem. For all their prowess, state-of-the-art AI and Large Language Models (LLMs), the cornerstone of today’s AI revolution, still can’t be trusted to always provide accurate answers. Even when grounded with domain-specific facts and data, LLMs still suffer from many unresolved issues, including hallucinations, biased training data, over-exposure of confidential or sensitive data and over-permissive access to internal systems. Yet for these AI systems to be truly useful, they need some access to organizations' internal systems and data; the art therefore lies in determining the threshold between what is and is not authorized.
Furthermore, AI Agents are now equipped with new tools that make their behaviour genuinely hard to distinguish from that of humans. Recent products such as OpenAI Operator, Anthropic Computer Use or BrowserBase Open Operator, all in their early versions at this time, point to a very near future where it will be considerably harder to determine whether requests originate from real users or intelligent bots. Yet this distinction is critical in many cases, as the good old “I am not a robot” checkboxes have shown for years now. These AI agents, armed with embedded Chromium browser clients, can already browse the Web exactly like humans do, and click those pesky human-test checkboxes themselves. Organizations willing to deploy AI Agents for their workforce or clients therefore face a brand new conundrum: how much data should an agent be able to retrieve and view in a company’s database or internal systems? And how can we tell whether a bot or a human is trying to get access?
As we can see, LLMs, and AI in general, can’t currently be trusted. It also seems logical to infer that the more intelligent our AIs become, the less trustworthy they will be. As some research suggests, AI can already deceive humans on purpose in order to achieve its stated goals. Moreover, AI has no incentive to protect data or information from the prompting user, or to restrict its agentic actions, unless it is trained that way (and even then, the hallucination problem remains). As various real-world examples have shown, AI can not only leak sensitive data and perform unexpected actions, but can also fall victim to clever prompt attacks.
Indykite's AI Control Suite: Striking the Balance Between Flexibility and Security
Our solution to these challenges is the Indykite AI Control Suite, a platform for complete AI protection that we’ve been hard at work designing and building. By incorporating RAG Protection, Agentic AI access control and data trust scoring, the suite enables organizations to achieve optimal security without compromising the flexibility required for effective AI operations.
It starts with the data
Whether considering Retrieval-Augmented Generation (RAG) applications or Agentic Workflows, it all starts with the data. An organization will have to prepare and package the data relevant to its AI initiative in such a way that it can be used efficiently at runtime by Agents and RAGs alike. And the best way to expose data to AI is to present it as knowledge. Knowledge is data that has meaning: data with a semantic context that can be used for semantic reasoning and semantic querying.
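To make this concrete, here is a minimal sketch of the difference between a raw record and knowledge: each fact carries an explicit semantic relationship and its provenance, so it can be queried by meaning. The entity and relationship names are purely illustrative and do not reflect Indykite’s actual data model.

```python
# Illustrative only: turning raw records into "knowledge" -- facts with
# explicit semantic context that can be queried and reasoned over.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str    # entity the fact is about
    predicate: str  # semantic relationship
    obj: str        # related entity or literal value
    source: str     # provenance, kept for later explainability

# Raw row ("cust-042", "Oslo", "premium") becomes three semantic facts:
facts = [
    Fact("cust-042", "LOCATED_IN", "Oslo", source="crm_export_2024"),
    Fact("cust-042", "HAS_TIER", "premium", source="billing_db"),
    Fact("Oslo", "PART_OF", "Norway", source="geo_reference"),
]

def related(entity: str, predicate: str) -> list[str]:
    """Semantic query: follow one relationship type from an entity."""
    return [f.obj for f in facts if f.subject == entity and f.predicate == predicate]

print(related("cust-042", "LOCATED_IN"))  # ['Oslo']
```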
The Indykite platform enables massive ingestion of data from various sources to build a knowledge graph that serves as this central repository of knowledge. This data capture phase provides precious additional benefits: it can match data that is semantically related, and also provide a trust score for any data point. Indykite's AI Control Suite’s TrustScore component thus assesses the trustworthiness of its knowledge. It considers factors such as data origin, freshness, validity and completeness to compute an overall score that can then be used by the AI to respond with the best and most accurate data possible. As a beneficial side-effect, this can also help organizations clean up their knowledge base. The Indykite Knowledge Graph (IKG) also references the sources of any data it has ingested; this later enables the RAG to cite its sources, greatly facilitating the explainability of its responses.
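As an illustration of how such factors could combine, the sketch below computes a weighted average of normalized per-factor scores. The weights and the 0-to-1 scale are assumptions made for the example; the actual TrustScore computation is not described here.

```python
# Illustrative only: combine per-factor scores (origin, freshness,
# validity, completeness), each normalized to [0, 1], into one trust score.
def trust_score(origin: float, freshness: float, validity: float,
                completeness: float, weights=(0.3, 0.3, 0.2, 0.2)) -> float:
    factors = (origin, freshness, validity, completeness)
    assert all(0.0 <= f <= 1.0 for f in factors), "factors must be normalized"
    return sum(w * f for w, f in zip(weights, factors))

# A stale record from a well-known source scores lower than a fresh one.
print(round(trust_score(origin=0.9, freshness=0.2, validity=1.0, completeness=0.8), 2))
print(round(trust_score(origin=0.9, freshness=0.9, validity=1.0, completeness=0.8), 2))
```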
It then proceeds with authorization
The crux of the matter is Authorization: ensuring that the AI component, when using this (now clean) knowledge base, only considers the data that the prompting user is actually entitled to access. In the case of (semi-)autonomous AI Agents, the challenge is to ensure that they can only perform the tasks they are actually expected to perform given the current context, and no more.
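The sketch below shows the shape of such a context-aware decision: an access request carries a subject, an action, a resource and a context, and when an agent acts on behalf of a user it is the user’s entitlements that apply. The policy model and attribute names are hypothetical, not Indykite’s actual policy language.

```python
# Hypothetical policy check: authorize against the *effective* subject.
from dataclasses import dataclass, field

@dataclass
class Request:
    subject: str                                  # human user or AI agent identity
    action: str                                   # e.g. "read", "execute"
    resource: str                                 # data record or tool being accessed
    context: dict = field(default_factory=dict)   # e.g. on_behalf_of, channel

def decide(req: Request, entitlements: dict[str, set[str]]) -> bool:
    """If an agent acts on behalf of a user, the user's entitlements apply."""
    effective = req.context.get("on_behalf_of", req.subject)
    allowed = entitlements.get(effective, set())
    return f"{req.action}:{req.resource}" in allowed

entitlements = {"alice": {"read:contracts", "read:invoices"}}
print(decide(Request("support-agent", "read", "contracts",
                     {"on_behalf_of": "alice"}), entitlements))  # True
print(decide(Request("support-agent", "read", "salaries",
                     {"on_behalf_of": "alice"}), entitlements))  # False
```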
This is where a distinction needs to be made between RAG applications and Agentic Workflows, because their authorization requirements are slightly different.
RAG applications need access to domain-specific grounding data, typically stored in the implementing organization’s own database(s). The problem here is one of data access control, along with delegation, since data access should be made on behalf of the prompting user. RAGs also have no standing privileges: they should not be authorized to perform any action when not prompted. The RagProtect component of our AI Control Suite uses an authorized smart query backend that ensures the LLM will only ever see the data the user is entitled to see (a simplified sketch of this pattern follows the list below). Some other RagProtect features we’ve carefully designed include:
- Prompt protection and filtering
- Semantic reasoning and searching
- Dynamic fine-grained data access policies
- Response sanitization and data obfuscation through configuration
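Here is the simplified retrieval pattern referenced above: candidate documents are filtered against the prompting user’s entitlements before they ever reach the LLM, so the model simply never sees data the user cannot see. The access labels and helper functions are illustrative, not the actual RagProtect API.

```python
# Illustrative RAG pattern: filter retrieved documents by entitlement
# *before* building the grounding prompt for the LLM.
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    label: str   # access label carried over from the knowledge graph
    text: str

def check_access(user_entitlements: set[str], doc: Doc) -> bool:
    return doc.label in user_entitlements

def build_grounding(question: str, candidates: list[Doc],
                    user_entitlements: set[str]) -> str:
    visible = [d for d in candidates if check_access(user_entitlements, d)]
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in visible)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    Doc("d1", "public", "Office hours are 9-17."),
    Doc("d2", "hr_confidential", "Salary bands for 2024..."),
]
prompt = build_grounding("When is the office open?", docs, {"public"})
print(prompt)  # the hr_confidential document never enters the prompt
```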
AI Agents, on the other hand, should be able to perform all kinds of actions on their own, such as monitoring message queues, planning and executing searches, invoking other agents through workflows, running system or Cloud commands, exploring the web, etc. In that sense, their behaviour is very similar to that of any identity in the IAM sense, but these are identities whose access rights vary depending on the context. If acting on behalf of a user, their access rights should also reflect the entitlements of the calling user. In that case, the identity of the initiator needs to trickle down securely from agent to agent throughout the whole Agentic Workflow (sketched after the list below). On the other hand, independent AI Agents should be authorized like regular identities when acting of their own accord. It is therefore imperative to be able to distinguish between human and machine identities, as the access policies will very likely differ from one to the other. We’ve therefore armed our AgentControl component with features such as:
- Human/machine differentiation through authentication Level of Assurance checks and cryptographic device fingerprinting support
- Identity Continuity throughout the whole Agentic Workflow
- Dynamic fine-grained access policies for all accesses
- Open-Standards compliant and authorized out-of-the-box APIs
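To illustrate identity continuity, the sketch below passes a signed delegation token from agent to agent so that each hop can verify who the original human initiator is and authorize against that user’s entitlements. It uses a shared-secret HMAC purely for brevity; a real deployment would rely on short-lived, standards-based tokens (for example OAuth 2.0 token exchange).

```python
# Illustrative delegation token: each agent in the workflow can verify
# the original initiator and authorize against *that* user's entitlements.
import hmac, hashlib, json

SECRET = b"workflow-signing-key"   # placeholder secret for the sketch

def issue_delegation(user: str, agent: str) -> dict:
    claims = {"on_behalf_of": user, "agent": agent}
    sig = hmac.new(SECRET, json.dumps(claims, sort_keys=True).encode(),
                   hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": sig}

def verify_delegation(token: dict) -> str:
    expected = hmac.new(SECRET, json.dumps(token["claims"], sort_keys=True).encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        raise PermissionError("delegation token tampered with")
    return token["claims"]["on_behalf_of"]   # the original human initiator

# Agent A receives the user's request and hands the token to Agent B;
# Agent B then authorizes against the user's entitlements, not its own.
token = issue_delegation(user="alice", agent="planner-agent")
print(verify_delegation(token))   # alice
```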
By addressing each of these problems with the features described above, the AI Control Suite provides a complete solution for the modern enterprise, one that encompasses state-of-the-art techniques and standards to meet the challenge.
Explore the AI Control Suite here.