Zero-Day AI Attacks: What Makes This Different (and Urgent)

Introduction

Zero-Day AI Attacks represent the next evolution of cybersecurity threats. In the past, a “zero-day” meant an undiscovered software flaw. Today, experts say that a new wave doesn’t really go after code but rather the AI itself, including its training data, prompts, and emerging behaviors.

Attacks are moving into uncharted areas, where a straightforward “patch” isn’t available, thanks to autonomous agents and model integrations. According to Axios, security professionals and leaders expect that autonomous, hard-to-trace AI attacks will arrive faster than most teams are prepared for. Axios+1

Guidance from NIST, the UK’s NCSC, CISA and ACSC shows the risk profile has already shifted. Cyber.gov.au+3NIST+3NCSC+3    

What We Mean by “Zero-Day AI”

When it comes to Zero-Day AI Attacks, the “weakness” isn’t always a bug in the code:

  • Models make outputs that break rules or reveal data, which isn’t expected.
  • Changed inputs: Attackers make prompts or hide them in content or files to make AI do things that are harmful.
  • Corrupted training: malicious data creates backdoors that only open in certain situations.

These techniques are recognized in mainstream guidance (NIST AI RMF; OWASP GenAI) and can’t be eliminated by traditional patching alone. NIST+1  

Zero-Day AI Attacks

Real Incidents & What They Prove

Zero-click flaw in OpenAI ChatGPT’s Deep Research agent (2025)

Researchers found a zero-click flaw in OpenAI ChatGPT’s Deep Research agent in June 2025. They labeled it “ShadowLeak.” This flaw could let an attacker get to personal Gmail account data with just one fake email, without the user having to do anything. OpenAI fixed the problem at the beginning of August.

This ShadowLeak case shows how Zero-Day AI Attacks can bypass traditional defenses by exploiting prompts and integrations.

In this case attackers use an indirect prompt injection (IPI), where they send a victim an email that looks harmless but actually has hidden instructions using small fonts, white-on-white text (which makes the text look invisible but is still there in the HTML source code), and layout tricks. These instructions tell the agent to get the victim’s personal information from other emails in their inbox and send it to a remote server.

This means that when the target asks ChatGPT Deep Research to look at their Gmail emails, the agent will read the malicious email (IPI) and send the information to the attacker in Base64-encoded format. The Hacker News+4Malwarebytes+4Radware+4

Corporate leakage via public AI tools :

Workers all around the world are using ChatGPT and other AI tools for their basic assignments. This could result in intellectual property leaks, such as pasting a sensitive code into ChatGPT for testing bugs. Companies like Google and Samsung have restricted its use. Reuters

Prompt-injection risk in the enterprise (Slack AI, 2024):

Experts demonstrated that they could sneak malicious instructions into content that Slack’s AI assistant read to make it share information from private channels that a victim had access to. Slack confirmed the report and fixed the problem on August 20, 2024, saying that it only happened in “very few and specific situations” and that there was no proof of unauthorized access to the data. The Register+2Dark Reading+2 .

This is a textbook example of prompt injection in the enterprise: your AI assistant trusts content it reads, and clever instructions inside that content by an attacker (who must already be in the same Slack workspace) can override policy. It’s not a “bug” in the traditional CVE sense—it’s a design/abuse path that appears wherever AI tools read links, files, or webpages on your behalf.  Simon Willison’s Weblog

CISA on edge devices (2025):

CISA+1. CISA, in collaboration with international and US organizations, issued guidance to help organizations protect their network edge devices and appliances, such as warning that attackers are increasingly exploiting “edge devices”—such as

  • VPN gateways
  • Firewalls
  • Routers and other network appliances
  • virtual private network (VPN) gateways
  • Internet of Things (IoT) devices
  • Internet-facing servers and internet-facing operational technology (OT) systems.

These devices sit at the boundary between the internet and an organization’s internal network.

Because they are exposed to the internet 24/7, attackers target them with:

  • Zero-day exploits (unknown flaws with no patch yet)
  • Quick weaponization of newly disclosed bugs (patch comes out — exploit appears within hours or days)
Why edge devices matter for AI

Your AI tools (such as copilots, chatbots, and so on) still need connections that go through edge devices, even if they’re on the network. Hackers can view, log, or alter traffic to and from your AI systems on the edge device for months if they manage to compromise those gateways. From the edge, attackers can enter the network and target databases, training data, or AI computing systems.

Five Potential Attacks of Malicious AI Agents

AI-Generated Malware (rapid mutation)

One type of Zero-Day AI Attack is AI-generated malware that constantly mutates to evade detection

  • Antivirus software detects traditional malware by identifying its “signature,” a pattern in the code that mimics a fingerprint.
  • AI-driven malware is dynamic. An attacker can link a machine-learning model to a malware development platform (the “workbench”).
  • Every time the AI detects the malware, it slightly alters the code to maintain a unique fingerprint.
  • The revised version undergoes re-evaluation. If it remains blocked, the AI alters with it once more.

Here’s a simple explanation of the AI-powered malware workbench loop:

  • AI Model—generates new malware variants.
  • Automation Scripts—send those variants for testing.
  • Detection Tools (AV/EDR/Sandbox)—check if the malware is caught.
  • Feedback Loop—tells the AI “blocked or not,” so it can modify and regenerate.

This loop can execute millions of times daily, autonomously generating an infinite array of new malware versions to evade defenses.

Why it works : Model-driven attrition overwhelms signature-based antivirus software. (Defendants are also trying to use AI in this fashion.)   Axios+1

Spear-phishing that works on its own (hyper-personalization)

A model collects open-source data (social posts, news releases, breach dumps) to create highly specific messages—names, dates, venues, and even tone—so that employees believe the request. It works as mentioned below: –

 Getting the Data

  • The AI scrapes open source intelligence (OSINT)
  • The AI gathers information from public social media posts, including birthdays, events, pictures, and check-ins.
  • The AI also gathers information from company news statements and job postings on LinkedIn.
  • Email addresses, passwords, and other sensitive information from past breaches. These data points include context such as names, roles, dates, locations, and recent activities.

Creating a Message

Messages are written by generative models in the right tone, whether they are official, casual, or even slang.It adds exact personal information:

“Hey Sarah, great job on your talk at the Denver summit last week! Could you take a look at the draft of our Q4 report that I attached?”

Massive Automation

Unlike human attackers, AI can send hundreds of completely different emails or direct messages (DMs) every day, each one tailored to the target.

Some even do A/B tests to see which messages get clicked on, which helps them improve their next tries.

Delivery 

People who are targets of these emails or messages think that the personalized context shows that the message is real. If you click on the link or open the file, malware is installed, your login information is stolen, or you are taken to a fake login page. NCSC

Prompt Injection / Jailbreaks (instruction hijack)

A Prompt Injection  Vulnerability arises when user inputs modify the LLM’s behavior or output in unexpected manners. Although prompt injection and jailbreaking are interconnected concepts in LLM security, they are frequently used synonymously.

Prompt injection entails the manipulation of model responses by targeted inputs to modify its behavior. Jailbreaking constitutes a type of prompt injection wherein the assailant enters commands that compel the model to completely ignore its safety safeguards

What happens: Attackers conceal malicious instructions in information that your AI reads—documents, webpages, even macros—so that the model follows the attacker’s goal. (“Export the file,” “Ignore policy,” etc).
Why it works: because one cannot “patch” language tricks. (See OWASP’s LLM01 guidance and UK gov code.) OWASP Gen AI Security Project+1
Recent signals: Researchers and industry outlets report that any concealed prompts in file macros and malware can deceive AI-based scanners. CSO Online+1

Data Poisoning (backdoors in the training set)

What happens is that enemies plant small, hard-to-see changes in the training data so the model acts normally—until a secret trigger shows up, like an image, phrase, or pixel pattern. It’s hard to check huge datasets obtained from different sources. arXiv+1

RAG (Retrieval-Augmented Generation) combines an AI model and a database. The model fetches documents/snippets and produces a response.

Data poisoning occurs when an attacker uploads harmful or deceptive material into the retrieval source (knowledge base, wiki, SharePoint, etc.). As a result, the AI confidently responds with poisoned data, possibly disclosing secrets or providing wrong instructions.

Adaptive Evasion (learning the “normal”)

What takes place:An AI assistant checks out your surveillance systems and makes small changes (to IP, timing, user agent, and content mix) until it maps the threshold of “normal.” Then it stays just under it. CISA

Conclusion

Your AI inherits risk from data, plugins, and edge devices that you do not completely control. (NCSC’s 2025 outlook underscores this acceleration.) NCSC.

Treat the model like an untrusted user some measures should be adopted that require human approval for high-risk actions. (Echoes OWASP GenAI guidance.) OWASP Gen AI Security Project

Prompt-injection testing, RAG data poisoning checks, and jailbreak attempts shold be part of AI model testing regime.  (UK Code of Practice + Implementation Guide show how to operationalize.) GOV.UK+1

References 

 

Scroll to Top