April 15, 2025
What You Should Know: The AI Agent Index

In February of this year, an MIT-affiliated research group proposed the AI Agent Index (AIAI), the first available framework and database specifically designed to document the details of agentic AI systems. The development and release of this index marks a crucial milestone toward managing agent-related risks, communicating these systems’ characteristics clearly and transparently, and enabling key stakeholders—AI governance practitioners, policymakers, developers, consumers, and industry—to gain visibility into how AI agents function and evolve.
While AIAI is the first of its kind, it did not emerge from thin air—there are other notable AI databases that preceded it and serve similar functions:
- AI Risk Repository: Developed by MIT, the AI Risk Repository is a “living” resource that documents over 1,000 AI risks, along with their causes and categorization domains.
- Foundation Model Transparency Index (FMTI): Developed in collaboration with researchers from MIT, Princeton, and Stanford to promote transparency among foundation model developers, the Foundation Model Transparency Index captures model development and deployment details, system characteristics, and related downstream impacts.
- AI Incident Database: Launched in 2020 by the Partnership on AI, the AI Incident Database records real-world harms caused by AI systems, allowing AI practitioners from around the world to directly submit incident reports.
- OECD AI Policy Database: An AI policy-centric database that tracks and logs AI governance-related changes and advancements globally.
- AI Vulnerability Database: The AI Vulnerability Database provides open-source access to an AI taxonomy that chronicles different kinds of failure modes along with documented examples of AI failures.
Like its counterparts, AIAI is poised to become an increasingly valuable and dynamic resource and knowledge hub—one that could offer actionable, ongoing agentic AI insights to a wide audience, ranging from developers to policymakers. Although other similarly fashioned AI databases and repositories will likely emerge, we expect that AIAI, given its targeted scope and accessibility, will influence—if not set—global standards for AI agent documentation, risk reporting, and advancement tracking.
Before comprehensively breaking down AIAI and analyzing its contents, we note one major takeaway, echoed by other prominent knowledge bases like FMTI: while most AI developers readily provide documentation on system details and characteristics, a startlingly small minority disclose information on safety testing and risk management, assuming such information even exists. This raises serious concerns for anyone meaningfully engaged in AI safety, ethics, and governance, particularly as large-scale agentic AI deployments intensify and proliferate over the coming months and years.
In this post, we’ll begin by summarizing AIAI, examining the structure and scope of the index as well as key research findings. Next, we’ll critically analyze AIAI to reveal potential avenues for improvement, after which we’ll conclude with near-term and long-term AI agent-specific recommendations for policymakers and RAI practitioners.
Note: For readers with a deeper interest in this topic, we strongly advise reading the AIAI paper directly—research findings are clearly, concisely, and comprehensibly communicated, living up to the transparency standard that AIAI establishes.
AI Agent Index: Summary
AIAI was created to address several key questions that remain largely unanswered and unexplored—questions that, if vigorously investigated, could carry wide-ranging implications for AI governance and safety. These questions address five core areas:
1. AI Agent Origins: Who/what developed the system?
2. AI Agent Applications: Where/how is the system intended to be used and for what purpose?
3. AI Agent Infrastructure: What kind of infrastructure does the system require for safe and effective deployment?
4. AI Agent Safety & Performance: Has the system undergone testing, evaluation, validation, and verification (TEVV) for performance and safety?
5. AI Agent Guardrails: What measures, protocols, and parameters exist to prevent the system from pursuing harmful actions or producing harmful outcomes?
To address these questions as holistically as possible, AIAI establishes the following documentation requirements, which can be divided into six domains (for reference, see the “Sample Agent Card” in the final section of the paper; a simple illustrative data model follows this list):
1. Basic Information: Includes a link for accessing agent information, a brief description of the agent’s capabilities, an intended use/purpose statement, and deployment dates.
2. Developer/Deployer Details: Includes a link to the developer’s website, the company’s legal name and entity type (e.g., corporation), country of origin, and existing safety policies.
3. System Components: Provides details on the model used to power the system (e.g., GPT-4), documentation on its intended use and purpose, information on the system’s reasoning, planning, and memory functions, what kinds of data sources and tools it can access, user-interface design, and development/compute costs.
4. Guardrails & Oversight: Considers whether model weights, data, code, scaffolding, and documentation are available while documenting guardrails and controls, monitoring and shutdown protocols, and customer/usage restrictions.
5. Evaluations: Provides information on whether the model has undergone benchmarking and pre-deployment testing, including safety testing, red-teaming, and auditing.
6. Ecosystem: Provides information on infrastructure interoperability and usage trends and patterns.
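To make this structure concrete, here is a minimal sketch of how an agent card might be represented as a data model, organized around the six domains above. The class and field names are our own illustrative shorthand, not AIAI’s official schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical data model loosely mirroring AIAI's six documentation
# domains. All class and field names are illustrative shorthand,
# not AIAI's official schema.

@dataclass
class AgentCard:
    # 1. Basic information
    name: str
    description: str
    intended_use: str
    deployment_date: str            # e.g., "2024-10-01"

    # 2. Developer/deployer details
    developer: str
    developer_website: str
    legal_entity_type: str          # e.g., "corporation"
    country: str
    safety_policy_url: Optional[str] = None

    # 3. System components
    base_model: str = "unknown"     # e.g., "GPT-4"
    tools: list[str] = field(default_factory=list)
    memory_functions: str = ""      # reasoning/planning/memory notes

    # 4. Guardrails & oversight
    weights_available: bool = False
    monitoring_and_shutdown: str = ""
    usage_restrictions: str = ""

    # 5. Evaluations
    benchmarked: bool = False
    red_teamed: bool = False
    external_safety_evaluation: bool = False

    # 6. Ecosystem
    interoperability_notes: str = ""
```

With a structure like this, aggregate findings (e.g., counting how many cards report external safety evaluations) reduce to simple queries over the index.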
Agentic systems included in AIAI were indexed according to this structure—in total, the index includes 67 AI agents, either actively deployed or open-sourced, all of which were added on or before December 31, 2024. Systems were excluded from the index if they met any of the following criteria:
- Non-agentic model or development framework.
- Unnamed system or a system that can’t perform a wide variety of tasks with limited guidance.
- Any model with “less” agency than ChatGPT.
- Non-competitive, off-the-shelf open-source system.
- Not a product or open-source model.
- Developed/deployed after December 31, 2024.
Given the turbulent, ongoing debate about what qualifies as agency in both AI and humans, the team behind AIAI deliberately chose not to weigh in with a concrete, operationalized definition of AI agents. Instead, they characterize AI agents broadly, according to a set of four empirically motivated properties typically shared among entities with some degree of agency—these properties, along with the exclusions previously mentioned, define the index’s eligibility criteria:
1. Goal-Directedness: The system can engage in goal-oriented behavior.
2. Directness of Impact: The system can produce real-world impacts with minimal human intervention.
3. Underspecification: The system can achieve a goal with limited guidance.
4. Long-Term Planning: The system can develop and orchestrate long-term problem-solving plans.
AIAI also categorizes indexed systems according to six main categories:
1. Software: Agents built for software development and coding.
2. Computer Use: Agents designed to autonomously interact with computer interfaces.
3. Research: Agents designed for scientific research.
4. Universal: General-purpose reasoning agents.
5. Robotics: Agents designed to control robotic systems.
6. Other: Agents developed for niche application areas.
While AIAI avoids concrete agentic AI definitions, it does offer a computational description: systems that leverage foundation models, enhanced by “scaffolding” (i.e., integration with external resources), to engage in planning (sequential actions enabled through chain-of-thought reasoning), tool use (API calls and intersystem communication), and memory functions (leveraging internally or externally stored information). However, we advise readers not to interpret this computational description as a definition—given the field’s rapid progression, AI agents’ computational characteristics will likely evolve significantly, especially in the near term.
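For intuition, here is a minimal Python sketch of that scaffolding pattern under our own simplifying assumptions: `call_model` stands in for a foundation model API, and the tool registry is hypothetical.

```python
# Minimal sketch of the "scaffolded foundation model" pattern described
# above: a model call wrapped with planning, tool use, and memory.
# `call_model` and the tool registry are hypothetical stand-ins, not a
# real vendor API.

def call_model(prompt: str) -> str:
    """Stand-in for a foundation model API call."""
    return "FINISH"  # placeholder; a real model would return a plan step

TOOLS = {
    # Hypothetical external resource the agent can invoke
    "search": lambda query: f"results for {query!r}",
}

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    memory: list[str] = []  # externally stored context reused across steps
    for _ in range(max_steps):
        # Planning: ask the model for the next action, given goal + memory
        plan = call_model(f"Goal: {goal}\nMemory: {memory}\nNext action?")
        if plan.startswith("FINISH"):
            break
        # Tool use: dispatch the planned action to an external resource
        tool_name, _, arg = plan.partition(":")
        tool = TOOLS.get(tool_name.strip(), lambda a: "unknown tool")
        memory.append(f"{plan} -> {tool(arg)}")
    return memory
```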
As for key findings, AIAI highlights several interesting and impactful trends, particularly for the AI safety and governance landscape:
- The rate of AI agent proliferation is accelerating. From May 2024 to December 2024, the number of agentic AI products/open-source models more than doubled.
- The US dominates agentic AI development, followed by China, the UK, and Israel. In fact, 67% of agents indexed in AIAI were developed by US-based companies, compared to 12% for Chinese developers and 5% for the UK.
- Industry maintains a firm grip on agentic AI development. However, a significant portion (26.9%) of AI agents come from academic institutions and are developed predominantly for research purposes.
- Most AI agents (74.6%) are purpose-built for software development and computer use. Still, AI agents serve many functions, including research, general-purpose reasoning, robotics, and niche applications.
- Most AI agents indexed in AIAI are accompanied by developer-provided documentation. Roughly 70% of developers provide documentation and approximately 50% provide code—almost 90% of academic developers provide code, indicating a significantly higher level of transparency in academia.
- A stark minority of developers readily share public information on agentic AI safety testing and risk management. Only 7.5% of developers report external safety evaluations, and only 19.4% provide a formal safety policy.
Building on these key findings, AIAI offers a few essential takeaways:
- AI agents aren’t easy to document, a challenge that will likely remain unresolved for the foreseeable future because of poor documentation standards and decentralized innovation ecosystems. Fortunately, agents developed in academia are easier to document due to their simpler and more transparent nature.
- The breadth and depth of future agentic AI documentation should be well-scoped. Future documentation efforts should consider the larger socio-technical, ethical, and governance implications of widespread agentic AI deployments.
- Agentic AI documentation will play a central role in AI governance and policymaking. Key points to consider include:
  - Incentives of industry vs. academia.
  - US-based innovation dominance.
  - The concentration of AI agents in software development/computer use.
  - The lack of safety and risk management documentation.
Finally, AIAI also proposes some general advice for policymakers:
- Provide red-teaming incentives to bolster future-oriented vulnerability testing.
- Establish governance-academia partnerships for systematic safety testing and risk management.
- Develop and implement centralized AI agent indexes/repositories similar to AIAI.
- Integrate AI agent indexes with other AI indexes, databases, or repositories.
AI Agent Index: Analysis
In terms of AIAI’s surface-level limitations, there are several notable considerations, most of which are openly acknowledged by the team behind the initiative:
- The lack of a standardized AI agent definition could complicate or undermine eligibility criteria for indexing. Properties-based eligibility may be too broad, failing to account for nuanced differences between specific systems or future agentic AI developments. It may also allow developers to game the index, designing systems or documenting details in ways that technically satisfy or deliberately circumvent eligibility criteria.
- AIAI includes only 67 agentic systems—while this shouldn’t be scrutinized too harshly given AIAI’s novelty, it’s fair to question whether 67 indexed systems suffice to reveal generalizable agentic AI trends. As the index is updated to accommodate agentic AI advancements, we expect this number will rapidly increase, especially over the next few months.
- Speaking to the previous point, AIAI only includes systems deployed or open-sourced on or before December 31, 2024. This introduces a significant concern for the index’s continued relevance and applicability—the AI landscape advances at a blistering pace, and major changes can materialize on a weekly basis. Fortunately, AIAI’s creators are aware of this problem and have devised a simple strategy for continued updates.
- Not all agentic AI systems indexed in AIAI are comprehensively documented—the most significant gap concerns developer-provided details on safety testing and risk management. Here too, however, we can’t fault the AIAI team: they directly contacted developers for additional information and feedback, sadly receiving a response rate of only 36%.
- AIAI only includes AI agents that a) offer publicly available documentation and b) are documented in English. This means that AIAI may fail to capture details on internal agentic AI deployments and may omit agentic AI systems documented in other languages.
Beyond these relatively obvious limitations, there are more nuanced points to consider. First, AIAI doesn’t offer any opportunity to report and document real-world AI incidents linked to specific AI agents. Even if most developers transparently administered rigorous safety evaluations and established robust risk management practices, there’s no guarantee that their systems, once deployed in dynamic, real-world environments, wouldn’t undergo unforeseen shifts in their risk profile. This concern is particularly relevant for dual-use and misuse risks, which can be extremely difficult to predict and typically become most apparent during post-deployment stages. Relatedly, AIAI doesn’t include any documentation avenues for the following (a minimal data model for these gaps is sketched after the list):
- Impact assessment and reporting, especially across core domains like critical decision-making (e.g., discrimination) and human-AI interaction risks (e.g., overreliance).
- System recall and remediation procedures—protocols for promptly removing a system from active use or implementing corrective actions when AI incidents occur.
- System updates and modifications—any changes made to the system or its parts after it’s deployed.
- Compliance with existing standards and regulations, such as the GDPR, CCPA, EU AI Act, SOC 2 Type II, ISO 27001 and 42001, and the NIST AI RMF.
- Governance measures and best practices, notably application-specific mechanisms for maintaining accountability, fairness, and transparency.
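To make the gap concrete, the hypothetical AgentCard sketched earlier could be extended with post-deployment fields along these lines. These fields are entirely our own suggestion, not part of AIAI.

```python
from dataclasses import dataclass, field

# Illustrative extension of the hypothetical AgentCard sketched earlier,
# covering the post-deployment documentation avenues listed above.
# These fields are our own suggestion, not part of AIAI.

@dataclass
class Incident:
    date: str         # e.g., "2025-03-01"
    description: str
    severity: str     # e.g., "low" | "medium" | "high"

@dataclass
class PostDeploymentRecord:
    incidents: list[Incident] = field(default_factory=list)
    impact_assessments: list[str] = field(default_factory=list)  # e.g., discrimination, overreliance
    recall_procedure: str = ""      # how the system is pulled from active use
    modification_log: list[str] = field(default_factory=list)    # post-deployment changes
    compliance: list[str] = field(default_factory=list)          # e.g., ["GDPR", "EU AI Act", "ISO 42001"]
    governance_measures: list[str] = field(default_factory=list) # accountability, fairness, transparency mechanisms
```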
Second, some of the categories AIAI leverages to classify AI agents—software, computer use, robotics—are fairly targeted, while others—research, universal, and other—are vague and non-descript. This could introduce several shortcomings:
- Trend Analysis Limitations: The index may struggle to reveal meaningful trends across broad categories (e.g., universal, research), which could complicate downstream governance, risk management, and policymaking efforts as well as targeted AI advancement tracking.
- Categorical Applicability Concerns: Broad categories are difficult to meaningfully evaluate and assign—what happens when an AI agent is designed for general-purpose reasoning but can be used effectively for research or automated lead generation?
- AI Agents vs. Multi-Agent Systems: The index doesn’t explicitly differentiate between single AI agents and multi-agent systems, failing to highlight a distinction that will become progressively more crucial as agentic AI deployments scale and integrate with increasingly complex systems.
- Design-Centric Categorization: The categories AIAI employs classify systems by design function, not by risk profile and/or real-world deployment context, which could unintentionally limit the value the index delivers for AI safety practitioners. This also deviates from standards set by regulations like the EU AI Act, which classifies AI systems according to a tiered risk structure.
Third, AIAI doesn’t provide documentation opportunities for potential interoperability risks and vulnerabilities—while it will be challenging to develop practical and interpretable documentation methods for this purpose, doing so is crucial for several reasons:
- Collaboration & Communication: Multidisciplinary collaboration and communication are essential for mitigating complex, interconnected AI risks that can transcend internal organizational boundaries.
- Complexity & Systemic Risk: Seemingly isolated vulnerabilities in individual agents and multi-agent systems could cascade into systemic risk scenarios, especially when AI agents are parts of broader socio-technical ecosystems.
- AI Governance & Regulation: Regulators require comprehensive documentation to assess compliance with safety and interoperability standards—without it, oversight can become reactive, diminishing effective governance.
- Frontier AI Risk Preparedness: Frontier AI systems will someday consist of multiple intelligent agents interacting with other AIs and humans across dynamic environments. While isolated AI agent behavior might be predictable and well-understood, interactions within multi-agent systems may create unexpected emergent behaviors and complex, nonlinear risks.
Fourth, while AIAI’s documentation method and structure are easily understood, it could be wise to consider customizing agent cards for specific audiences. This would admittedly be a labor-intensive process, but it could dramatically enhance the pragmatic value that the index delivers while simplifying long-run documentation efforts. For example, certain stakeholders might only need access to certain kinds of agentic AI details—consumers may only require basic information and developer/deployer details, whereas policymakers might prioritize guardrails and oversight, safety evaluations, and ecosystem information. A simple sketch of this idea follows.
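As a rough illustration, the sketch below filters a card down to the domains a given audience needs. The audience-to-domain mapping is our own assumption, keyed to the six AIAI documentation domains.

```python
# Sketch of audience-specific agent card views: each audience sees only
# the documentation domains it needs. Domain names follow the six AIAI
# domains; the mapping itself is our own illustration.

AUDIENCE_VIEWS = {
    "consumer":    ["basic_information", "developer_details"],
    "policymaker": ["guardrails_oversight", "evaluations", "ecosystem"],
    "developer":   ["basic_information", "system_components",
                    "guardrails_oversight", "evaluations"],
}

def render_card(card: dict[str, dict], audience: str) -> dict[str, dict]:
    """Return only the documentation domains relevant to `audience`."""
    domains = AUDIENCE_VIEWS.get(audience, list(card))  # default: full card
    return {d: card[d] for d in domains if d in card}

# Example: a policymaker's view omits basic info and system components.
full_card = {
    "basic_information": {"name": "ExampleAgent"},
    "evaluations": {"red_teamed": True},
}
print(render_card(full_card, "policymaker"))  # {'evaluations': {'red_teamed': True}}
```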
Despite the possible avenues for improvement we’ve highlighted, AIAI represents a clear step in the right direction—most of the limitations we’ve examined could be resolved with minimal to moderate effort and revision. Even if AIAI doesn’t undergo any substantial improvements or changes, we’d still consider it to be a high-value and influential asset for all the stakeholders it targets. Its key strengths include:
- Transparency & Accessibility: Agent cards are written so that any stakeholder group, from the general public to developers, can readily comprehend the information they contain.
- Comprehensiveness: While there’s always room for more detail, AIAI does a great job of documenting the most important characteristics of agentic AI systems, and we expect that as AI agent deployments accelerate and scale, AIAI will be revised to include further information where relevant.
- Future-Oriented: AIAI is proactive by default—it isn’t designed to be a static database, and it was launched before agentic AI deployments reached unmanageable scales. Structurally, it’s also well-aligned with other similar AI safety databases, indexes, and repositories as well as documentation standards, suggesting that it was built for interoperability, centralization, and cross-domain integration.
Recommendations & Conclusion
Below, we provide two sets of recommendations for AI policymakers and RAI practitioners, divided across two timeline categories: near-term (up to 5 years from now) and long-term (5 years or more from now).
With these recommendations, we hope to afford readers unconventional yet actionable guidance for anticipating and managing the risks and benefits of agentic AI systems, both today and in the future.
Near-Term Recommendations
- Mandatory Identity Verification & Attribution: Design universal AI agent registries that assign unique identifiers to publicly deployed AI agents. Policymakers should also establish enforceable mandates for disclosing AI agent identities in all digital and physical interaction domains—where possible, AI agent watermarking should be considered.
- Designate Autonomy Levels & Intervention Triggers: Establish and implement “fail-safe” shutdown procedures for AI agents—these fail-safes should escalate actions to human reviewers when anomalous or unsafe behaviors are detected. Policymakers should also create, in collaboration with developers, tiered control mechanisms for AI agents based on their autonomy level (a minimal sketch of such a trigger follows this list):
  - Low autonomy: No intervention needed (e.g., simple chatbots).
  - Medium autonomy: Humans must be able to pause, override, or guide decisions (e.g., AI customer support agents).
  - High autonomy: AI must trigger human-in-the-loop mechanisms before taking any consequential actions (e.g., financial trading AI).
- Collaboration & Self-Modification Guardrails: Regulate AI agents that communicate or collaborate with other AI agents to mitigate the risks of emergent unintended behaviors. Self-modifying AI agents (i.e., AI that rewrites its code or objectives) should also be tightly controlled and restricted, even when they undergo external auditing and safety evaluations.
- Agent-Specific Consumer Protection: Mandate AI “undo” features, allowing users to reverse AI-driven transactions or interactions. To align with emerging regulatory standards and AI-specific human rights, introduce consumer appeal mechanisms where users can challenge agentic AI decisions and request human alternatives.
- Limit or Ban Participation in Political & Financial Ecosystems: Irrespective of how silly it may seem, prohibit AI agents from running for office, voting, or engaging in political lobbying. Throughout financial ecosystems, strict regulations on agentic AI trading should be established to prevent high-frequency AI interactions from destabilizing financial markets (e.g., “Flash Crash” scenarios).
- Behavior & Interaction Audits: Track and audit AI agents’ decision-making history and interactions, especially in high-risk areas. To bolster external auditing efforts, establish and certify third-party AI behavior verification providers that regularly evaluate whether AI agents comply with ethical and safety guidelines.
- Regulate Multi-Agent Systems: Limit multi-agent interaction to mitigate the risk of emergent, uncontrolled behaviors. For AI agents working in multi-agent systems (e.g., AI stock trading teams), ensure that operation occurs within predefined safety constraints and controlled environments.
- Anticipate Economic Disruption via Policy: Implement taxation policies on AI agent-generated wealth to proactively combat scenarios in which AI-driven productivity leads to centralized economic power. For future preparedness, create AI labor displacement funds that assist workers affected by AI agents performing high-skill or complex roles (e.g., AI legal research agents replacing paralegals).
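To illustrate how such autonomy tiers might translate into an intervention trigger, here is a minimal sketch. The tier names and escalation rules are illustrative assumptions, not an established standard.

```python
from enum import Enum

# Sketch of the tiered control mechanism described above: escalation
# behavior keyed to an agent's autonomy level. Tier names and rules
# are illustrative assumptions, not an established standard.

class Autonomy(Enum):
    LOW = "low"        # e.g., simple chatbots
    MEDIUM = "medium"  # e.g., AI customer support agents
    HIGH = "high"      # e.g., financial trading AI

def requires_human_review(autonomy: Autonomy, consequential: bool,
                          anomaly_detected: bool) -> bool:
    """Decide whether an action must escalate to a human reviewer."""
    if anomaly_detected:
        return True  # fail-safe: anomalous behavior always escalates
    if autonomy is Autonomy.HIGH and consequential:
        return True  # human-in-the-loop before consequential actions
    return False     # low/medium autonomy: humans may pause or override,
                     # but routine actions proceed

# Example: a trading agent proposing a large order triggers review.
assert requires_human_review(Autonomy.HIGH, consequential=True,
                             anomaly_detected=False)
```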
Long-Term Recommendations
- Control Evolution & Recursive Self-Improvement: Define concrete legal limits on AI agents that evolve beyond their original scope to prevent unintended runaway behavior. If agentic systems exhibit self-improvement capabilities, they’ll require self-improvement audits that test whether AI agents’ code modifications remain aligned with ethical and safety standards.
- Agent-Specific Global Governance Bodies: While this may be ambitious, an International AI Agent Regulation Alliance, designed to address the impacts of real-world agentic AI decision-making, could be crucial to ensuring a safe and beneficial AI-driven future. It could also foster international collaboration on AI agent cybersecurity standards, streamlining preventative solutions for malicious exploitation.
- Prepare for Operation in Open Environments: Regulate AI agents that function autonomously in physical or virtual spaces and develop safety training protocols for agents that interact with unpredictable human behavior or fluctuating environments.
- Establish Global Certification & Licensing Frameworks: Develop cross-border AI agent licensing protocols to support and enable safe and ethical global deployment practices. More specifically, consider introducing tiered licensing for AI agents based on autonomy, impact, and sector:
  - Low-risk AI (e.g., AI chatbots) requires minimal oversight.
  - Medium-risk AI (e.g., AI assistants in healthcare) must pass regulatory checks.
  - High-risk AI (e.g., autonomous AI negotiating contracts) needs strict pre-deployment approval.
- Develop Risk Containment Mechanisms: Create a universal “kill switch” protocol for AI agents operating autonomously across jurisdictions or international boundaries—this mechanism would be crucial for deactivating rogue AI agents quickly and effectively. Looking further ahead, designing AI quarantine mechanisms that isolate or restrict AI agents with anomalous behavioral tendencies could be wise.
- Morality & Ethical Alignment Tests: Develop universal alignment benchmarks that assess AI agents’ ability to follow ethical, legal, and safety constraints before they are deployed at scale. For benchmarks to prove effective, they’ll also require multi-stakeholder testing, particularly across vulnerable populations (e.g., children, the elderly).
- Research Agentic Cognition: Initiate philosophical, psychological, and legal research initiatives that assess and target the cognitive evolution of future AI agents with advanced reasoning capabilities—this should also include clear ethical guidelines for handling AI agents that exhibit unexpected levels of cognitive sophistication, especially within interactive environments.
- Future-Proof AI Governance: Establish foresight committees that continuously assess AI agent advancements and update regulations accordingly. Research agent-specific, innovative AI governance models that explore how to regulate AI agents with human-like reasoning, deception, or social manipulation skills. Frameworks for assessing and characterizing AI agent consciousness and sentience—no matter how absurd this may currently seem—should also be established to ensure proactive governance, ethics, and safety.
For readers interested in exploring other topics in the AI safety, governance, and ethics landscape, we recommend following our blog, where you can gain quick, pragmatic insights on AI advancements across multiple domains or take a deeper dive to examine a diversity of complex topics in detail. Alternatively, we further suggest following our AI experiments, in which we test frontier AI capabilities to reveal weekly insights into state-of-the-art AI developments.
For those beginning or continuing their responsible AI journey, we invite you to check out Lumenova’s AI governance and risk management platform and book a product demo today—our AI risk advisor and policy analyzer can also help streamline your risk management and governance needs.