A Possible Approach for Evaluating AI Standards Development

Description: A strategic articulation of a theory-of-change-based approach for evaluating whether AI standards achieve goals related to innovation, competition, harm minimization, and public trust.

Other information:
Submitter's Note: This StratML rendition was compiled from the source by ChatGPT and edited in the form at https://stratml.us/forms/Claude/Part1.html
Scope Note: The report is limited to evaluating the development of AI standards (documentary standards developed by standards development organizations), not evaluating AI systems themselves.
Intent Note: The document is intended to stimulate discussion with a wide variety of stakeholders about evaluating the effectiveness, utility, and relative value of AI standards development as an intervention.
Context Note: The report uses data integration, especially entity resolution, as an illustrative application area to make evaluation concepts concrete.

Organization: National Institute of Standards and Technology (NIST)
Identifier: 401b4307-bc95-4e65-a591-c8d43055426f
Description: A U.S. federal agency that advances measurement science, standards, and technology to enhance innovation and trust.

Stakeholder: Julia Lane
Description: Author | NIST Associate | Professor Emerita, Wagner Graduate School of Public Service, New York University

Vision: AI standards that measurably advance innovation, trustworthiness, and public benefit.
Identifier: a9a50fdf-8870-46be-be5f-d2491c46d59e

Mission: Define and apply a rigorous evaluation framework to determine whether AI standards achieve intended outcomes and long-term goals.
Identifier: 3bb87677-73ec-4fd2-b033-96b655548ef3

Values:
Trustworthiness: Increase justified confidence in AI systems by promoting reliable governance, assessment, and transparency practices.
Public Benefit: Evaluate AI standards in terms of their contributions to the public good, including harm reduction and trust-building outcomes.
Innovation: Support innovation by clarifying foundational concepts and enabling interoperability and adoption.
Competition: Promote competitive, widely accessible ecosystems by reducing coordination costs and improving market acceptance of trustworthy AI.
Transparency: Enable meaningful visibility into AI system and data characteristics so that stakeholders can assess reliability and risks.
Privacy: Protect sensitive and personally identifiable information as data and models are shared, combined, and deployed.
Security: Reduce risk and harm by strengthening controls and practices that protect data and AI-enabled processes.
Evidence: Use tested evaluation approaches and measurable indicators to distinguish actual effects from confounding changes in the environment.
Inclusiveness: Involve relevant stakeholders in evaluation design and measurement so that qualitative knowledge informs consequences and tradeoffs.
Consensus: Respect the private sector-led, consensus-based approach to standards while evaluating both positive and negative consequences.

Goal: Evaluation
Description: Establish a coherent framework for evaluating the causal impact of AI standards development.
Identifier: 6574cbce-5fe5-47a6-b202-04b7c9949bc2
Sequence: 1
Other information: Evaluation is treated as a form of assessment that is periodic and objective, and is distinguished from simply comparing before-and-after changes without a counterfactual. The theory-of-change framing organizes inputs, activities, outputs, outcomes, and goals, and is intended to support learning about what works and why.

Objective: Results Chain
Description: Define the inputs, activities, outputs, outcomes, and goals associated with AI standards development.
Identifier: 9b82b1a5-74c3-4740-83e5-a7b90b2f8fd5
Sequence: 1.1
Other information: The report frames standards development as an intervention whose constituent parts should be measured before, during, and after development to enable later causal evaluation.

Objective: Counterfactuals
Description: Identify appropriate alternative states of the world against which the impact of AI standards can be assessed.
Identifier: 35638d72-497f-40c6-b974-5b7455d400ad
Sequence: 1.2
Other information: Without an appropriate comparison group or counterfactual, observed changes cannot be reliably attributed to standards development because the baseline environment may also change over time.

Objective: Metrics
Description: Specify qualitative and quantitative measures for assessing adoption, outcomes, and impacts of AI standards.
Identifier: bd091918-4f5e-4697-abdc-2cc2270a1b15
Sequence: 1.3
Other information: The report emphasizes that evaluation can be expensive depending on structure and timing; designs using existing data can be more cost-effective than randomized trials, but still require careful attention to confounding factors and baseline measurement.

Goal: Adoption
Description: Support widespread and effective use of AI standards by relevant stakeholders.
Identifier: 9775895f-3953-4c44-b0ca-84caec8df679
Sequence: 2
Other information: The report highlights foundational areas where standards can accelerate adoption and interoperability, and thereby enable both private and public value creation.

Objective: Terminology
Description: Promote convergence on shared AI terminology and taxonomies to enable interoperability and reduce coordination costs.
Identifier: 5fa58600-fb97-455c-8a1e-914cb1072b38
Sequence: 2.1
Other information: Illustrative context: terminology and taxonomy standards can clarify AI techniques and tasks (e.g., classification, named entity recognition, fuzzy matching), improving communication within and across agencies and potentially accelerating deployment of data integration tools.

Objective: TEVV
Description: Advance testing, evaluation, verification, and validation methods and metrics for AI systems.
Identifier: 05f11989-71f7-456a-919f-2295623b5c95
Sequence: 2.2
Other information: Illustrative context: shared TEVV methods and metrics can improve the reliability of AI-assisted data integration, including in demanding environments such as integration of electronic health records across sources.

Objective: Training Data Practices
Description: Encourage sound governance and quality management of data used to train AI systems.
Identifier: 407f2b75-3f94-4685-b5fc-9c2e7120bafb
Sequence: 2.3
Other information: The report notes training data practices that need standardization, including preprocessing technique selection, dataset change management, efficient use of scarce data, management of diverse formats, and identification of data permitted for or excluded from training use.

Goal: Trust
Description: Increase justified confidence in AI systems through consensus-based standards.
Identifier: 13adb777-4dbc-4b26-96ef-80129c27e352
Sequence: 3
Other information: The report stresses that AI standards can further both private and public good, particularly by increasing trust and reducing harm; it also cautions that standards can have negative consequences, including exclusionary effects such as non-tariff barriers to trade.

Objective: Governance
Description: Establish norms for accountability, risk management, security, privacy, and transparency in AI development and deployment.
Identifier: b39e883e-2e88-4c72-b16f-f4c157956d64
Sequence: 3.1
Other information: The report identifies security, privacy, and transparency among AI actors about system and data characteristics as high-priority topics for standards development.

Objective: Conformity Assessment
Description: Foster mechanisms for assessing and attesting conformity with AI standards.
Identifier: 69eb319b-59a8-4789-b46a-31be1470d86a
Sequence: 3.2
Other information: Conformity assessment strengthens trust by enabling consistent claims about adherence; it also enables evaluation by making adoption and implementation more observable.

Goal: Learning
Description: Enable continuous improvement of AI standards through evidence and stakeholder-engaged evaluation.
Identifier: 7ed24ee5-c0ee-4eb6-92b5-5c9959298795
Sequence: 4
Other information: The report's emphasis on iterative evaluation treats evaluation evidence as a mechanism for improving future standards development and dissemination, recognizing both costs and the need for fit-for-purpose designs.

Objective: Stakeholder Engagement
Description: Involve relevant stakeholders in evaluation design and measurement from the beginning.
Identifier: ddcab975-0ad9-481f-b17f-856a4439960b
Sequence: 4.1
Other information: Stakeholder qualitative knowledge is treated as a key input for describing consequences of developing particular AI standards and for shaping what should be measured.

Objective: Iteration
Description: Refine standards and evaluation methods based on observed outputs, outcomes, impacts, and counterfactual comparisons.
Identifier: b79b8c66-4a27-494d-b075-d3ed66127aaf
Sequence: 4.2
Other information: The report frames evaluation results as inputs to future standards development, building a body of knowledge about what works and why.

Publication date: 2026-01-01
Source: https://doi.org/10.6028/NIST.GCR.26-069
Submitter: Owen Ambur, Owen.Ambur@verizon.net
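
Illustrative note on Objective 1.2 (Counterfactuals): the plan states that, without a comparison group, observed changes cannot be reliably attributed to standards development because the baseline environment may also change. The sketch below shows that reasoning as arithmetic. It assumes a difference-in-differences design, which is one common way to build a counterfactual from existing data and is not prescribed by the text above. The `GroupOutcome` class, the function names, and all numbers (entity-resolution error rates for hypothetical agencies) are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupOutcome:
    """Mean outcome for one group before and after a standard is published."""
    before: float
    after: float

    @property
    def change(self) -> float:
        # Raw before/after change within this group.
        return self.after - self.before


def naive_before_after(adopters: GroupOutcome) -> float:
    """Before/after comparison with no counterfactual."""
    return adopters.change


def difference_in_differences(adopters: GroupOutcome,
                              comparison: GroupOutcome) -> float:
    """Adopters' change minus the comparison group's change.

    The comparison group's change stands in for what would have
    happened to adopters without the standard.
    """
    return adopters.change - comparison.change


# Hypothetical data: record-linkage error rate (percent) in agencies that
# adopted a terminology standard versus similar agencies that did not.
adopters = GroupOutcome(before=12.0, after=7.0)
comparison = GroupOutcome(before=12.5, after=10.5)

naive = naive_before_after(adopters)
did = difference_in_differences(adopters, comparison)

print(f"Naive before/after change: {naive:+.1f} points")
print(f"Difference-in-differences: {did:+.1f} points")
```

In this made-up case the naive comparison credits the standard with a 5-point drop, while the comparison group shows that 2 of those points would likely have occurred anyway. The estimate is only as good as the assumption that both groups would have followed the same trend in the absence of the standard.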