US, Australian, New Zealand and UK cybersecurity agencies publish guidance on best practices for securing data used to train and operate AI systems

A&O Shearman

On May 22, 2025, cybersecurity agencies from the US, UK, Australia, and New Zealand published a Cybersecurity Information Sheet (CIS) on securing the data used to train and operate artificial intelligence (AI) and machine learning systems. The CIS outlines why data security is essential to accurate and reliable AI processing, and details recommended practices and strategies for mitigating various risks in the AI context. It is aimed primarily at organisations that deploy AI systems, especially those handling sensitive data.

The CIS identifies three areas where data security risks are prevalent in AI systems: (i) data supply chain; (ii) “poisoned” data (i.e. maliciously modified data); and (iii) data drift.

Data supply chain risks

Where data used to train AI systems is sourced from third parties or online datasets, there is a risk that the data may be inaccurate or maliciously modified. This can threaten the security of AI models and the accuracy of their outputs, as well as any AI systems that rely on the affected models as a base. One cause is "split-view poisoning", where data is hosted on expired domains that malicious actors may have purchased, allowing them to modify the data or repurpose it for malicious content. Another is "frontrunning poisoning", where attackers inject malicious data just before dataset snapshots are taken (which often occur at publicly known times). The CIS highlights that both split-view and frontrunning poisoning are relatively easy for malicious actors to carry out, as neither requires sophisticated systems or expertise to succeed.

The CIS notes that one way to mitigate data supply chain risks is to train AI systems only on data from trusted and accurate sources, and to check whether data has been modified. This can be done using cryptographic hashes, which allow unauthorised changes or corruption to be detected.
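
To illustrate the kind of check the CIS describes, the following minimal Python sketch computes a SHA-256 digest of a downloaded training file and compares it against a digest published by the data provider. The file name and expected digest are hypothetical placeholders, not values from the CIS.

```python
import hashlib
from pathlib import Path

def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hash of a file, reading in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical values: in practice the digest would be published by the data provider.
dataset = Path("training_data.csv")
expected = "0" * 64  # placeholder for the provider's published SHA-256 digest

if sha256_digest(dataset) != expected:
    raise ValueError("Dataset hash mismatch: possible tampering or corruption")
```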

Maliciously modified data

The deliberate manipulation of data can cause an AI system to produce inaccurate outcomes and weaken its security. This can arise from "adversarial machine learning" methods, which refers to the poisoning of data or the introduction of misleading examples into datasets, undermining the accuracy and performance of the relevant AI system and causing it to behave unreliably.

To address maliciously modified data, the CIS recommends using algorithms to identify and remove anomalous data points before training the relevant AI system. The CIS also suggests regularly filtering data so that poisoned datasets are identified and removed from the system. Metadata should be checked regularly to ensure that no information is missing, since gaps can lead to incomplete representation of data in an AI system and, in turn, compromise its reliability and performance. The CIS further suggests checking whether the data accurately represents the information relevant to a specific topic.
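
As an illustration only (the CIS does not prescribe a particular algorithm), the sketch below uses scikit-learn's IsolationForest, a common anomaly detection method, to flag and drop outlying rows before training. The toy data and the contamination rate are assumptions for demonstration purposes.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy feature matrix standing in for a real training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
X[:10] += 8.0  # simulate a handful of poisoned/outlying rows

# fit_predict returns -1 for points the model treats as anomalous, 1 otherwise.
detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(X)

X_clean = X[labels == 1]  # keep only the rows judged normal before training
print(f"Removed {np.sum(labels == -1)} suspected anomalies of {len(X)} rows")
```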

Data drift

Data drift describes changes over time in the statistical characteristics of the input data used by an AI system. It may be due to changes in other parts of the AI system that are not reflected in the training data, or to organisational changes that cause new data inputs to be inaccurately flagged as a security threat. Such changes can mean that the input data diverges from the data originally used to train the relevant AI system. If this is not monitored, the AI system may become less accurate and more difficult to correct. It is therefore important to monitor data inputs regularly and track any changes over time. The CIS notes that gradual changes are likely to be the result of data drift, whilst sudden changes are more likely to indicate malicious activity.
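
To make the monitoring step concrete, the sketch below compares the distribution of one live input feature against the same feature at training time using a two-sample Kolmogorov-Smirnov test from SciPy. The feature values and the alert threshold are illustrative assumptions; the CIS does not mandate any particular statistical test.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # distribution at training time
live_feature = rng.normal(loc=0.4, scale=1.0, size=5000)   # recent inputs, slightly shifted

# ks_2samp compares the two empirical distributions; a small p-value
# indicates the live inputs no longer look like the training data.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:  # illustrative threshold
    print(f"Possible data drift detected (KS statistic={stat:.3f}, p={p_value:.2e})")
```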

Best practices for securing AI data

  • Source reliable data: use trusted, authoritative data sources and implement tracking systems to identify the origin of any data being used. Cryptographic tools can be used to check data sets for corruption.
  • Maintain data integrity when data is transported or stored: use checksums (values calculated from datasets) and cryptographic hashes to verify the data sets and detect any changes or corruption during storage or transmission.
  • Digital signatures: use quantum-resistant digital signatures, which rely on cryptography that withstands attacks from both traditional and quantum computers.
  • Use trusted computing systems: use computing systems that do not automatically trust any user or device, even one originating from within a known network (a "zero trust" approach).
  • Categorise data: such as by sensitivity level or required protection measures.
  • Data encryption: use sophisticated encryption protocols for data at rest, in transit, and during processing. AES-256 is the industry standard as it is considered quantum-resistant (a minimal sketch follows this list).
  • Data storage: use storage devices that have cryptographic abilities and are compliant with NIST standards. Risk assessments should be used to assess the required security level for a given organisation.
  • Preserve privacy: use data anonymisation and depersonalisation techniques to protect sensitive information, such as data masking (replacing sensitive data with other, realistic information) or federated learning (which allows AI models to be trained across several datasets without pooling the underlying data).
  • Secure deletion of data: Use secure deletion methods (such as cryptographic erasing) before repurposing or decommissioning storage devices.
  • Carry out data security risk assessments: Data security should be regularly assessed using NIST frameworks.
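
As a minimal illustration of the encryption practice above, the sketch below encrypts a record with AES-256 in GCM mode using the widely used Python cryptography package. Key management (secure storage, rotation, hardware security modules) is deliberately out of scope; generating and holding the key in a local variable is for demonstration only.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Generate a 256-bit key; in practice this would come from a managed key store.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # GCM requires a unique 96-bit nonce per encryption
plaintext = b"sensitive training record"
associated_data = b"dataset-v1"  # authenticated but not encrypted metadata

ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
recovered = aesgcm.decrypt(nonce, ciphertext, associated_data)
assert recovered == plaintext
```

GCM provides authenticated encryption, so any tampering with the ciphertext or its associated metadata causes decryption to fail rather than silently returning corrupted data.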

You can read the press release here and the CIS here.


DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations. Attorney Advertising.

© A&O Shearman

Written by:

A&O Shearman
