
Compliance teams at financial firms face a persistent challenge: sifting through a high volume of alerts to find the ones that genuinely matter. In the context of trade surveillance and transaction monitoring, this challenge is not just about efficiency. It’s about whether the systems in place are actually fulfilling their regulatory purpose.
While much attention has been paid to reducing false positives at the alert investigation stage, a more fundamental question is often overlooked: are the alerts being generated in the first place the right ones?
This article examines the distinction between alert generation and flag generation, and argues that the most effective way to reduce noise is to improve the quality of signals entering the pipeline, through smarter, context-aware flag generation.
In most surveillance systems, the workflow follows a two-stage process:
In the first stage, flag generation, the system scans trading activity against predefined rules and thresholds. When activity meets the criteria for a potential issue (e.g., a price movement exceeding a certain threshold around the time of an order), a flag is generated. At this stage, the system is identifying candidates for further review.
In the second stage, alert generation, flags are filtered, scored, or enriched to determine whether they warrant escalation to a compliance analyst. This step may involve additional logic, such as checking for repeat behaviour, overlaying contextual data, or applying machine learning models. Flags that pass this filter become alerts.
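The two-stage structure can be sketched in a few lines. This is an illustrative toy, not a real surveillance engine: the `Order` fields, the price-move rule in stage one, and the repeat-behaviour filter in stage two are all hypothetical stand-ins for the kinds of logic described above.

```python
from dataclasses import dataclass

@dataclass
class Order:
    instrument: str
    price_move_pct: float   # price movement around the order, in percent
    repeat_count: int       # prior similar flags for the same trader

def generate_flags(orders, price_move_threshold=2.0):
    """Stage 1: broad rule-based detection -- candidates for review."""
    return [o for o in orders if abs(o.price_move_pct) >= price_move_threshold]

def generate_alerts(flags, repeat_threshold=3):
    """Stage 2: filtering/enrichment -- only repeat behaviour escalates."""
    return [f for f in flags if f.repeat_count >= repeat_threshold]

orders = [
    Order("XYZ", 2.5, 4),   # large move, repeat behaviour -> flag and alert
    Order("ABC", 2.1, 0),   # large move, one-off -> flag only
    Order("DEF", 0.3, 5),   # small move -> no flag
]
flags = generate_flags(orders)
alerts = generate_alerts(flags)
print(len(flags), len(alerts))  # prints "2 1"
```

The point of separating the stages is visible even at this scale: two candidates enter the pipeline, but only one survives the downstream filter, and the quality of the final alert queue is bounded by what stage one admits.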
The distinction matters because many of the problems attributed to “alert fatigue” or “false positives” actually originate at the flag generation stage. If the initial detection logic is too broad, too rigid, or too disconnected from the trading context, the system will produce a high volume of low-quality flags. No amount of downstream filtering can fully compensate for a weak signal at the source.
There are several common reasons why flag generation produces excessive noise.
Many systems apply the same detection thresholds across all instruments, markets, and time periods. For example, a spoofing detection rule might flag any order that is cancelled within a fixed time window. But cancellation behaviour varies significantly between asset classes, venues, and trading strategies. A threshold that is appropriate for a liquid equity market may generate constant flags in a less liquid fixed income market.
Flag generation logic often operates in isolation, without considering the broader trading context. A sudden price movement may look suspicious in isolation but is entirely expected during an earnings announcement or a central bank decision. Without integrating market events, news, or corporate actions into the detection logic, systems will flag routine activity as potentially abusive.
Firms with diverse trading operations may apply the same detection models across all desks, strategies, and client types. This fails to account for the fact that different business lines carry different risk profiles. A high-frequency market-making desk will naturally exhibit different order patterns than a long-only fund manager, and the surveillance logic should reflect that.
Even well-designed detection rules can produce noise if they are not regularly calibrated against actual trading patterns. Market conditions change, client bases evolve, and new products are introduced. Without ongoing calibration, flag generation logic becomes stale and increasingly disconnected from the reality of the firm’s trading activity.
When flag generation is noisy, the consequences ripple through the entire compliance function.
Compliance analysts who are repeatedly presented with low-quality alerts become desensitised. The risk is that genuine issues are overlooked because they are buried in a queue of noise. This is not a theoretical concern. Regulators, including the FCA, have cited analyst fatigue as a contributing factor in surveillance failures.
Regulators expect firms to demonstrate that their surveillance systems are effective, not just operational. A system that generates thousands of alerts, the vast majority of which are closed without action, is difficult to justify in a regulatory review. The FCA’s Market Watch 79 explicitly criticised firms for failing to adequately calibrate their surveillance systems and for relying on generic, out-of-the-box detection logic.
Every alert requires some level of human review. When the majority of alerts are low-quality, compliance resources are diverted away from genuine risk areas. This is particularly acute for smaller firms with limited compliance headcount, where every hour spent on a false positive is an hour not spent on meaningful risk management.
Over time, a noisy system undermines confidence in the surveillance function. Senior management may view compliance alerts as unreliable, and the compliance team may lose credibility within the organisation. This can create a culture where surveillance is seen as a regulatory checkbox rather than a genuine risk management tool.
Improving flag generation requires a shift from static, rule-based detection to dynamic, context-aware signal generation. This involves several key elements.
Rather than applying fixed thresholds, detection logic should adapt to the characteristics of the instrument, market, and trading context. For example, a spoofing detection rule should account for the typical cancellation rate for the specific instrument and venue, adjusting its sensitivity accordingly. This reduces noise without sacrificing detection capability.
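One minimal way to implement this is to derive the threshold from each instrument's own history rather than hard-coding it. The sketch below is an assumption-laden illustration: the baseline data, the instrument names, and the choice of mean-plus-k-standard-deviations are all hypothetical, standing in for whatever baseline statistic a firm actually maintains.

```python
from statistics import mean, stdev

def adaptive_threshold(history, k=3.0):
    """Threshold = this instrument's baseline mean + k standard deviations."""
    return mean(history) + k * stdev(history)

# Hypothetical historical daily cancellation rates (fraction of orders cancelled)
baselines = {
    "liquid_equity": [0.60, 0.62, 0.58, 0.61, 0.59],   # high churn is normal
    "illiquid_bond": [0.05, 0.04, 0.06, 0.05, 0.07],   # cancellations are rare
}

def flag_cancellation_rate(instrument, observed_rate):
    return observed_rate > adaptive_threshold(baselines[instrument])

# The same observed rate is routine for the equity but anomalous for the bond.
print(flag_cancellation_rate("liquid_equity", 0.64))  # False
print(flag_cancellation_rate("illiquid_bond", 0.64))  # True
```

A fixed cutoff would treat both instruments identically, flagging the equity constantly or missing the bond entirely; the per-instrument baseline resolves both failure modes with the same rule.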
Integrating contextual data into the flag generation process, rather than relying on it only at the alert triage stage, can significantly improve signal quality. This might include overlaying market events, news feeds, or corporate actions data at the point of detection, so that flags are only generated when trading activity deviates from what would be expected given the prevailing market conditions.
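Moving context to the point of detection can be as simple as checking an event calendar before emitting a flag. In this sketch the calendar, the instrument, and the 30-minute event window are all invented for illustration; a production system would draw on real market-event, news, and corporate-actions feeds.

```python
from datetime import datetime, timedelta

# Hypothetical calendar of scheduled events: (event_time, affected_instrument)
EVENTS = [
    (datetime(2024, 5, 1, 13, 0), "XYZ"),   # e.g. an earnings announcement
]

def within_event_window(ts, instrument, window=timedelta(minutes=30)):
    return any(inst == instrument and abs(ts - ev) <= window
               for ev, inst in EVENTS)

def flag_price_move(ts, instrument, move_pct, threshold=2.0):
    """Generate a flag only when the move is both large and unexplained."""
    if abs(move_pct) < threshold:
        return False
    return not within_event_window(ts, instrument)

# Same move, same instrument: suppressed near the announcement, flagged otherwise.
print(flag_price_move(datetime(2024, 5, 1, 13, 10), "XYZ", 3.5))  # False
print(flag_price_move(datetime(2024, 5, 2, 13, 10), "XYZ", 3.5))  # True
```

The design choice is that the event check sits inside flag generation itself, so the routine activity never enters the pipeline, rather than being filtered out later at triage.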
Firms should consider applying different detection models to different business lines, client types, or trading strategies. A model calibrated for a proprietary trading desk will produce very different results when applied to client-facilitated flow. Segmentation ensures that detection logic is relevant to the specific risk profile of the activity being monitored.
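A lightweight form of segmentation is to key detection parameters by business line. The segment names and parameter values below are purely illustrative; the point is the structure, where each desk gets a profile matched to its expected behaviour and a conservative default covers everything else.

```python
# Hypothetical per-segment detection profiles.
DETECTION_PROFILES = {
    "hft_market_making": {"cancel_rate_threshold": 0.95, "min_order_count": 500},
    "long_only_fund":    {"cancel_rate_threshold": 0.30, "min_order_count": 20},
    "default":           {"cancel_rate_threshold": 0.60, "min_order_count": 100},
}

def profile_for(desk):
    return DETECTION_PROFILES.get(desk, DETECTION_PROFILES["default"])

def flag_desk_activity(desk, cancel_rate, order_count):
    p = profile_for(desk)
    return (order_count >= p["min_order_count"]
            and cancel_rate > p["cancel_rate_threshold"])

# Identical behaviour: routine for a market maker, anomalous for a long-only manager.
print(flag_desk_activity("hft_market_making", 0.85, 1000))  # False
print(flag_desk_activity("long_only_fund", 0.85, 1000))     # True
```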
Flag generation logic should be subject to ongoing calibration, informed by feedback from the alert investigation process. If a particular detection rule is consistently producing flags that are closed without action, that is a signal that the rule needs to be refined. Calibration should be a continuous process, not a one-off exercise conducted during implementation.
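The feedback loop can be made concrete by tracking, per rule, what fraction of its flags are ever actioned. The rule names, counts, and the 2% action-rate cutoff below are hypothetical; a real calibration process would set these from the firm's own investigation outcomes.

```python
def rules_needing_review(flag_outcomes, min_action_rate=0.02, min_flags=100):
    """flag_outcomes: {rule_id: (total_flags, flags_actioned)}.

    Returns rules with enough volume to judge whose flags are almost
    never actioned -- candidates for recalibration.
    """
    stale = []
    for rule, (total, actioned) in flag_outcomes.items():
        if total >= min_flags and actioned / total < min_action_rate:
            stale.append(rule)
    return stale

outcomes = {
    "spoofing_v1":   (4200, 12),   # ~0.3% actioned -> needs recalibration
    "wash_trade_v2": (310, 25),    # ~8% actioned -> healthy
    "layering_v1":   (40, 0),      # too few flags to judge yet
}
print(rules_needing_review(outcomes))  # prints "['spoofing_v1']"
```

Running a report like this on a regular cycle, rather than once at implementation, is what turns calibration into the continuous process the paragraph above calls for.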
Machine learning models can be used to identify patterns in historical flag data, helping to distinguish between flags that are likely to be genuine and those that are likely to be noise. This is not about replacing human judgment, but about providing analysts with better-quality inputs. Importantly, any use of machine learning in this context must be explainable and auditable, as regulators will expect firms to be able to articulate how their systems arrive at their conclusions.
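Explainability constrains the model class: a scorer whose output decomposes into named per-feature contributions is easier to defend than an opaque one. The sketch below uses a logistic score with hand-set illustrative weights (in practice they would be fitted to historical flag outcomes); the feature names and values are hypothetical.

```python
import math

# Illustrative, hand-set weights -- a fitted model would learn these from
# historical flag outcomes.
WEIGHTS = {
    "price_move_zscore": 0.9,
    "repeat_count":      0.6,
    "near_event":       -1.5,   # activity near a known event is less suspicious
}
BIAS = -2.0

def score_flag(features):
    """Logistic score in (0, 1): higher means more likely genuine."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def explain_flag(features):
    """Per-feature contributions, largest first -- the audit trail."""
    contribs = {k: WEIGHTS[k] * v for k, v in features.items()}
    return sorted(contribs.items(), key=lambda kv: -abs(kv[1]))

flag = {"price_move_zscore": 3.2, "repeat_count": 4, "near_event": 0}
print(round(score_flag(flag), 2))           # prints "0.96"
print(explain_flag(flag)[0][0])             # prints "price_move_zscore"
```

Because every score is a sum of named contributions passed through a fixed function, an analyst can state exactly why a flag ranked highly, which is the auditability property regulators are likely to probe.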
It is worth addressing the relationship between flag generation and the broader goal of false positive reduction. Many firms focus their efforts on reducing false positives at the alert investigation stage, using techniques such as alert scoring, clustering, or automated triage. While these techniques can be effective, they are ultimately compensating for weaknesses in the upstream flag generation process.
A more effective approach is to address the problem at its source. By improving the quality of flags entering the pipeline, firms can reduce the volume of alerts that need to be triaged, scored, or investigated. This not only improves efficiency but also increases the overall effectiveness of the surveillance function, because analysts are spending more time on genuine risk and less time on noise.
Regulators have become increasingly explicit about their expectations for surveillance calibration. The FCA’s Market Watch 79 highlighted the importance of firms regularly reviewing and updating their detection logic, and criticised firms for relying on generic thresholds that had not been adjusted to reflect their specific trading activity. ESMA’s guidelines under MAR similarly emphasise the need for firms to ensure that their surveillance systems are “proportionate” and “effective”, which implicitly requires that detection logic is tailored and regularly reviewed.
In the US, FINRA has published guidance on the use of surveillance technology, noting that firms should ensure their systems are “reasonably designed” to detect potential misconduct. This language gives regulators significant latitude to challenge firms whose systems are producing a high volume of noise, as a noisy system is arguably not “reasonably designed” to detect genuine issues.
The conversation about alert quality in trade surveillance needs to shift upstream. While alert scoring, enrichment, and triage techniques have their place, the most impactful improvements come from getting the initial signal right. Smarter flag generation, informed by dynamic thresholds, contextual data, segmented models, and continuous calibration, is the foundation of an effective surveillance function.
Firms that invest in improving their flag generation logic will see benefits not just in reduced noise, but in better regulatory outcomes, more efficient use of compliance resources, and a surveillance function that genuinely supports market integrity.