RESEARCH PUBLICATION / ARGUS_CORE

Argus Data Quality Index (ADQI)

A model-agnostic framework for signal-driven inference of training data quality through the analysis of manifested training dynamics.

Author // Sherif Hasanov
System // PLARV Argus
Date // April 2026

Why Do We Need It

Machine learning practitioners have long treated data quality as a preprocessing concern — something to resolve before training begins. Clean labels, remove outliers, balance classes, then train. This assumption breaks down in practice.

Training runs fail not because data was dirty at rest, but because data was poorly distributed across the learning process. A dataset with one million samples concentrated in a narrow value range teaches a model one corner of the world with extreme redundancy. A dataset with one thousand samples spread across the full domain teaches the model its shape. Volume is not quality. Distribution is quality.

Existing approaches to data quality — Cleanlab, EL2N, GraNd — operate on the data itself. They require access to raw samples, labels, or embeddings. This creates two problems. First, they cannot be computed without exposing the dataset, which is a real constraint for privacy-sensitive domains. Second, they produce per-sample scores, not a scalar that reflects the health of the entire training trajectory.

A seismologist never goes inside the Earth. They sit on the surface and read the vibrations the Earth sends up. From that signal alone they can tell you what is happening miles underground — the structure, the activity, the health. They never touch what they are measuring. The signal is the only window. ADQI operates on the same principle. The dataset is never accessed. The loss curve and gradient behavior are the vibrations the training run sends up.

From those signals alone, ADQI infers the health of the training trajectory as shaped by data distribution. It measures Signal Territory: how far the training signal spans its own value range, and how densely the signal populates that range in each phase.

ADQI is a diagnostic instrument for signal analysis, not a direct measurement of the data manifold.

The Four Failure Modes

Training data quality is not a single problem. It decomposes into at least four distinct failure modes, each of which can occur while the other three look healthy.

  • Coverage Failure (DS). A model trained on data that never explores the full input domain will generalize poorly outside the region it saw. This is not a label problem or a noise problem; the labels can be perfect and the model still fails. The domain was simply never covered.
  • Variation Failure (VS). Data that exists across a wide domain but carries no meaningful signal variation teaches the model nothing about the relationship between input and output. Flat data produces flat models.
  • Structural Failure (SC). Even with good coverage and good variation, data that lacks complexity in its underlying shape (no curvature, no regime changes, no transitions) cannot teach a model to handle the non-linearities it will encounter in production.
  • Density Failure (DU). Data clustered in one region of the domain, regardless of total volume, creates a model that is locally expert and globally ignorant. This is the mathematical expression of the intuition: if ninety percent of the samples fall in two percent of the domain, the remaining ninety-eight percent of the domain shares only ten percent of the samples and is barely taught.

ADQI targets all four simultaneously through a single multiplicative scalar. Each failure mode maps to one component: DS, VS, SC, and DU respectively. The multiplicative structure requires all four to be healthy for ADQI to be high. A perfect score on three components cannot compensate for collapse in the fourth. This models the AND logic of real data quality: coverage AND variation AND structure AND density.
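To make the AND logic concrete, the sketch below compares multiplicative fusion with a plain average. The function names and component values are illustrative only, not part of the Argus implementation.

```python
def adqi_multiplicative(ds, vs, sc, du):
    """Product fusion: one collapsed component drags the whole score toward zero."""
    return ds * vs * sc * du


def adqi_additive(ds, vs, sc, du):
    """Plain average, shown only for contrast: strong components mask a weak one."""
    return (ds + vs + sc + du) / 4


# Three perfect components and one collapsed density component (illustrative values).
scores = (1.0, 1.0, 1.0, 0.01)
print(f"multiplicative: {adqi_multiplicative(*scores):.4f}")  # multiplicative: 0.0100
print(f"additive:       {adqi_additive(*scores):.4f}")        # additive:       0.7525
```

An average would still report a respectable 0.75 for a run whose density has collapsed; the product reports 0.01.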

Critically, ADQI computes these four properties not from the raw data itself but from the training dynamics — the loss curve and gradient behavior that any training run produces regardless of modality. This makes ADQI model agnostic by construction. An LLM and a vision model and a tabular classifier all produce a loss curve. ADQI reads that curve. It never sees a token, a pixel, or a feature.

Methodology

ADQI is computed in four sequential stages: signal extraction, phase detection, component scoring, and phase fusion.

STAGE 1 // Signal Extraction

The first stage converts any training run into a universal two-dimensional signal (x, y) regardless of model architecture or task type.

  • x is the training step index. It forms the domain axis — the temporal backbone of the training run.
  • y is the primary quality signal. By default this is the loss value at each step. When loss is unavailable, gradient norm is used as a fallback.

This translation layer is what makes ADQI model agnostic. Once the training run exists as (x, y), the architecture that produced it is irrelevant.
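A minimal sketch of this translation layer, assuming the training log is available as plain Python sequences with one logged value per step (so positions 0..n-1 serve as step indices). `extract_signal` is a hypothetical helper name, not the Argus API.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def extract_signal(
    loss: Optional[Sequence[float]],
    grad_norm: Optional[Sequence[float]] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """Convert a training log into the (x, y) signal.

    x is the step index (the domain axis); y is the loss, or the gradient
    norm when no loss series is available.
    """
    if loss is not None and len(loss) > 0:
        y = np.asarray(loss, dtype=float)
    elif grad_norm is not None and len(grad_norm) > 0:
        y = np.asarray(grad_norm, dtype=float)  # fallback signal
    else:
        raise ValueError("need a loss or gradient-norm series")
    x = np.arange(y.size, dtype=float)  # one point per logged step
    return x, y


# Usage: no loss was logged, so the gradient norm becomes y.
x, y = extract_signal(loss=None, grad_norm=[5.0, 3.2, 2.1])
```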

STAGE 2 // Phase Detection

A single ADQI score computed across an entire training run obscures phase-specific behavior. ADQI therefore decomposes every run into three phases — Early, Mid, Late — and scores each independently before fusing.

  1. Smooth y using a moving average window to suppress step-level noise
  2. Compute the first derivative (velocity) and second derivative (acceleration) of the signal
  3. Identify the Braking Point: the peak of the second derivative where the model transitions from discovery to refinement
  4. Identify the Convergence Gate: the point where velocity falls below 15% of its peak, marking the start of the plateau
  5. Set adaptive boundaries based on these natural regime changes

This means phase boundaries shift naturally with the run. The math follows the data.
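The five steps can be sketched in NumPy as below. This is an illustration under stated assumptions, not the Argus detector: the window size of 5, the edge-padded moving average, using the magnitude |dy| for the 15% gate, and the mapping Early = [0, Braking Point), Mid = [Braking Point, Convergence Gate), Late = [Convergence Gate, end) are all choices made for this sketch.

```python
import numpy as np


def detect_phases(y, window=5, gate_frac=0.15):
    """Split a training signal into Early, Mid, and Late step slices (sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least three steps")
    w = max(1, min(window, n))
    # 1. Moving average, padded with edge values so the ends are not pulled toward zero.
    padded = np.pad(y, (w // 2, w - 1 - w // 2), mode="edge")
    smooth = np.convolve(padded, np.ones(w) / w, mode="valid")
    # 2. First derivative (velocity) and second derivative (acceleration).
    velocity = np.gradient(smooth)
    accel = np.gradient(velocity)
    # 3. Braking Point: peak of the second derivative.
    braking = int(np.argmax(accel))
    # 4. Convergence Gate: first step at or after the Braking Point where
    #    |velocity| drops below gate_frac (15%) of its peak.
    speed = np.abs(velocity)
    below = np.nonzero(speed[braking:] < gate_frac * speed.max())[0]
    gate = braking + int(below[0]) if below.size else n
    # 5. Adaptive boundaries from the two regime changes.
    return slice(0, braking), slice(braking, gate), slice(gate, n)


# Usage: a plateau, a drop centred on step 40, then convergence.
x = np.arange(100, dtype=float)
y = 1.0 / (1.0 + np.exp((x - 40) / 5))
early, mid, late = detect_phases(y)
```

On this curve the Braking Point lands shortly after the steepest descent and the Convergence Gate later, where the curve flattens; shifting the drop shifts both boundaries with it.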

STAGE 3 // Component Scoring

Each phase slice is scored independently across four components. All four operate on the (x, y) slice for that phase.

DS (Domain Spread)

Measures Signal Territory Coverage: the fraction of the global signal span conquered by the phase. DS = Phase_Span / Global_Span.

VS (Variation Strength)

Measures signal power; a flat loss collapses VS to zero, indicating zero learning work was performed.

SC (Shape Complexity)

Measures curvature richness. Note: SC rewards variance in the second derivative, which can accidentally reward signal noise.
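The noise caveat is easy to reproduce. The exact SC formula is not given here, so this sketch uses the raw variance of the discrete second difference as a stand-in for curvature richness (an assumption). Adding small Gaussian noise to a smooth exponential decay raises that variance by orders of magnitude even though the underlying learning is unchanged.

```python
import numpy as np


def curvature_variance(y):
    """Stand-in for SC's raw input: variance of the discrete second difference."""
    d2y = np.diff(np.asarray(y, dtype=float), n=2)
    return float(np.var(d2y))


x = np.arange(200, dtype=float)
clean = np.exp(-x / 20)                               # smooth synthetic decay
rng = np.random.default_rng(0)
noisy = clean + rng.normal(scale=0.01, size=x.size)   # same curve plus small noise

print(f"clean: {curvature_variance(clean):.2e}")
print(f"noisy: {curvature_variance(noisy):.2e}")
```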

DU (Density Uniformity)

Measures Signal Distribution via the coefficient of variation (CV) of sample counts across signal bins. DU is scored inversely to the CV, so a high DU indicates the model explored the territory evenly.
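One way to turn a CV into a score where high means even is sketched below. The exact transform is not specified in this document; the sketch assumes DU = 1 - CV / sqrt(bins - 1) with 10 bins. The normalizer comes from the extreme case: if all n samples land in one of k bins, the population standard deviation of the counts is n·sqrt(k - 1)/k and the mean is n/k, so the CV reaches its maximum of sqrt(k - 1).

```python
import numpy as np


def density_uniformity(y, bins=10, eps=1e-12):
    """Hypothetical DU: 1 - CV / sqrt(bins - 1) over per-bin sample counts.

    CV uses the population standard deviation (ddof=0), whose maximum for
    `bins` bins is sqrt(bins - 1), so the result lies in [0, 1].
    """
    counts, _ = np.histogram(np.asarray(y, dtype=float), bins=bins)
    cv = counts.std() / (counts.mean() + eps)
    return float(np.clip(1.0 - cv / np.sqrt(bins - 1), 0.0, 1.0))


spread = np.linspace(0.0, 1.0, 1000)                       # evenly covers its range
clustered = np.concatenate([np.zeros(900), spread[::10]])  # 90% of samples at one value
print(density_uniformity(spread), density_uniformity(clustered))
```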

* ADQI components now operate exclusively on the signal manifold (y) to infer the health of the source data.
ADQI_phase = DS × VS × SC × DU
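Putting the four components together, a phase score could be computed as in the sketch below. Only DS follows a formula stated in this document; the VS, SC, and DU forms, the 10-bin histogram, and all function names are hypothetical stand-ins chosen to match the described behaviour (a flat signal collapses VS, a smooth curve scores low on SC, clustered samples score low on DU).

```python
import numpy as np

EPS = 1e-12  # denominator guard


def _span(v):
    return float(v.max() - v.min())


def domain_spread(yp, y):
    # DS = Phase_Span / Global_Span, as defined above.
    return _span(yp) / (_span(y) + EPS)


def variation_strength(yp, y):
    # Assumed form: phase standard deviation relative to the global one (flat -> 0).
    return float(np.std(yp) / (np.std(y) + EPS))


def shape_complexity(yp):
    # Assumed form: tanh of Var(d2y) / Var(dy); a smooth exponential scores near 0.
    return float(np.tanh(np.var(np.diff(yp, n=2)) / (np.var(np.diff(yp)) + EPS)))


def density_uniformity(yp, bins=10):
    # Assumed form: 1 - CV / sqrt(bins - 1) over per-bin sample counts.
    counts, _ = np.histogram(yp, bins=bins)
    return float(1.0 - counts.std() / (counts.mean() + EPS) / np.sqrt(bins - 1))


def phase_adqi(y_phase, y_global):
    yp = np.asarray(y_phase, dtype=float)
    y = np.asarray(y_global, dtype=float)
    if yp.size < 3:
        return 0.0  # too short for a second derivative
    parts = np.array([domain_spread(yp, y), variation_strength(yp, y),
                      shape_complexity(yp), density_uniformity(yp)])
    # ADQI_phase = DS x VS x SC x DU, each clipped to [0, 1].
    return float(np.prod(np.clip(parts, 0.0, 1.0)))
```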

STAGE 4 // Phase Fusion

The three phase scores combine into a single total ADQI via Dynamic Energy Weighting. Instead of hardcoded heuristics, weights are derived from the integral of the absolute learning velocity |dy| in each phase:

w_phase = Σ|dy_phase| / Σ|dy_total|

This means a phase's contribution to the total ADQI is proportional to the amount of "work" (total absolute change in the signal) the model performed during that phase. If the model does 80% of its learning in the Early phase, that phase carries 80% of the weight. This makes ADQI self-calibrating to the specific dynamics of each run.
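A sketch of the energy weights and the fused total. The weight formula follows the equation above; treating each step-to-step change as belonging to the phase in which it starts, and fusing by a weighted sum of phase scores, are assumptions of this sketch.

```python
import numpy as np


def energy_weights(y, boundaries, eps=1e-12):
    """w_phase = sum|dy_phase| / sum|dy_total| for (start, stop) step ranges."""
    dy = np.abs(np.diff(np.asarray(y, dtype=float)))
    total = dy.sum() + eps
    # Assumption: dy[i] (the change from step i to i + 1) belongs to the phase containing step i.
    return np.array([dy[start:stop].sum() / total for start, stop in boundaries])


def fuse(phase_scores, weights):
    """Assumed fusion rule: energy-weighted sum of the phase ADQI scores."""
    return float(np.dot(phase_scores, weights))


y = [1.0, 0.2, 0.1, 0.0]                  # most of the loss drop happens in the first step
w = energy_weights(y, [(0, 1), (1, 2), (2, 4)])
total_adqi = fuse([0.9, 0.5, 0.2], w)     # hypothetical phase scores
```

Here 80% of the absolute loss change happens in the first phase, so it carries 80% of the weight, matching the example in the text.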

Edge Cases

ADQI behaves predictably under pathological inputs. The following cases are documented explicitly because they represent real training scenarios.

  • Flat loss. Loss remains constant across all steps — variation score VS collapses to zero. An ADQI of zero is the right answer.
  • Extremely short runs. Runs with fewer than three steps cannot produce meaningful component scores. ADQI returns zero and exits cleanly.
  • Duplicate steps. All component functions include a 1e-12 epsilon guard on denominators, preventing division by zero.
  • Extreme values. The tanh normalization in DS and SC saturates smoothly. The clip operations on all outputs enforce the [0, 1] boundary.
  • Negative loss. ADQI handles negative y values correctly because all components operate on relative quantities. Absolute scale is irrelevant.
  • NaN propagation. If loss values contain NaN, NaN propagates through the math and ADQI returns NaN. This is mathematically honest, but a future version should strip or interpolate NaN values.
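The documented edge-case behaviour can be illustrated with a hypothetical span-ratio score. The helper names are illustrative; the three-step minimum, the 1e-12 guard, the [0, 1] clip, and NaN propagation mirror the list above.

```python
import numpy as np

EPS = 1e-12  # denominator guard


def guarded_ratio(num, den):
    """Epsilon-guarded ratio clipped to [0, 1]; NaN inputs propagate unchanged."""
    return float(np.clip(num / (den + EPS), 0.0, 1.0))


def span(v):
    return float(np.max(v) - np.min(v))


def score_run(y):
    """Hypothetical whole-run check: span of the first half relative to the full span."""
    y = np.asarray(y, dtype=float)
    if y.size < 3:
        return 0.0  # extremely short run: return zero and exit cleanly
    half = y[: y.size // 2]
    return guarded_ratio(span(half), span(y))


y = np.array([3.0, 2.0, 1.5, 1.2, 1.0, 0.9])
print(score_run([1.0, 0.5]))                   # too short
print(score_run([2.0] * 10))                   # flat: 0 / (0 + eps), no division error
print(score_run(y), score_run(y - 100.0))      # shifting into negative loss changes nothing
print(score_run([1.0, np.nan, 0.5, 0.4]))      # NaN propagates
```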

Mathematical Boundary Analysis

The following cases have been empirically verified via the Argus Test Suite. They represent the edge cases where the ADQI engine reveals its underlying signal logic.

Case 1 // The Grokking Paradox // VERIFIED

"Where the math tells the truth but hides the future."

In a 100-step run with 70 steps of flat loss followed by a "grokking" jump:

Empirical Proof: w_early = 0.0000; w_mid = 0.8980

The Result: The engine correctly identifies zero manifest learning velocity during the plateau, effectively assigning zero weight to the critical period of latent representation building. ADQI measures manifested quality only.
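The zero plateau weight follows directly from the energy formula: a flat segment contributes no |dy|. The sketch below builds a hypothetical 100-step signal with 70 flat steps and a sharp drop. Phase boundaries are fixed by hand at the end of the plateau rather than produced by the phase detector, so it does not reproduce the exact weights reported above.

```python
import numpy as np

x = np.arange(100, dtype=float)
# Hypothetical grokking run: 70 flat steps, then a sharp drop from 2.0 toward 0.2.
y = np.where(x < 70, 2.0, 2.0 - 1.8 * (1.0 - np.exp(-(x - 70) / 3)))

dy = np.abs(np.diff(y))
plateau_share = dy[:70].sum() / dy.sum()   # changes that start on steps 0..69
print(plateau_share)  # 0.0
```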

Case 2 // Scheduler Inflation // VERIFIED

"Where hyperparameters mimic data quality."

In a steady run with a sharp Learning Rate (LR) jump at step 50:

Empirical Proof: ADQI_mid = 0.1541 (vs 0.0000 in steady phases)

The Result: The d²y spike caused by the scheduler is indistinguishable from a "data discovery" event. Unless the signal is normalized against the learning-rate schedule, ADQI reports this as a "high quality" phase.
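The sketch below reproduces the mechanism on a synthetic signal: a steady descent whose per-step loss change grows tenfold at step 50. The second difference spikes exactly at the schedule change. Dividing the loss change by the learning rate is one hypothetical normalization; it assumes loss change is proportional to the learning rate, a crude first-order approximation that will not hold exactly on real runs.

```python
import numpy as np

x = np.arange(100, dtype=float)
# Hypothetical schedule: the learning rate jumps tenfold at step 50.
lr = np.where(x < 50, 1e-3, 1e-2)
step_change = -1.0 * lr  # assume the per-step loss change is proportional to the LR
y = 1.0 + np.concatenate(([0.0], np.cumsum(step_change[:-1])))

d2y = np.diff(y, n=2)
spike_step = int(np.argmax(np.abs(d2y))) + 1  # d2y[k] is centred on step k + 1
print(spike_step)  # 50

# Hypothetical normalization: loss change per unit of learning rate.
dy_per_lr = np.diff(y) / lr[:-1]
residual = float(np.max(np.abs(np.diff(dy_per_lr))))  # ~0: the spike disappears
```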

Case 3 // The Smoothness Trap // NEW_INSIGHT

"Synthetic Perfection vs. Genuine Discovery."

Consider a perfectly smooth, synthetic exponential decay: y = exp(-x/20).

Empirical Proof: Total ADQI < 0.001

The Result: The score collapses because the signal lacks Shape Complexity (SC). The second derivative of a perfect curve varies smoothly and carries almost no variance, so there is no curvature "richness" to reward.

The Insight: Real learning is messy. ADQI treats "synthetic perfection" as a sign of low-information data.

Case 4 // The "Why" Problem // STRUCTURAL

"Diagnosis vs. Description."

Two training runs both return an ADQI of 0.40. One is due to extreme noise (VS collapse), the other due to data clustering (DU collapse).

The Result: While the aggregate scalar provides a high-level health check, it lacks diagnostic specificity. ADQI tells you the run is failing, but it takes the component breakdown (DS/VS/SC/DU) to identify the specific failure mode.
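Recovering the failure mode from the component breakdown can be as simple as reporting the weakest component. The scores below are illustrative values chosen so that both runs multiply to 0.40; `diagnose` is a hypothetical helper.

```python
import math

COMPONENTS = ("DS", "VS", "SC", "DU")


def diagnose(scores):
    """Return the weakest component, i.e. the most likely failure mode."""
    return min(COMPONENTS, key=lambda name: scores[name])


# Illustrative component scores for the two runs in this case.
run_a = {"DS": 1.0, "VS": 0.4, "SC": 1.0, "DU": 1.0}
run_b = {"DS": 1.0, "VS": 1.0, "SC": 1.0, "DU": 0.4}

for name, run in (("run_a", run_a), ("run_b", run_b)):
    total = math.prod(run[c] for c in COMPONENTS)
    print(f"{name}: ADQI={total:.2f}, weakest={diagnose(run)}")
# run_a: ADQI=0.40, weakest=VS
# run_b: ADQI=0.40, weakest=DU
```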

Limitations

Intellectual honesty requires stating what ADQI does not measure and where its current implementation has known gaps.

ADQI is an indirect measurement.

It measures training dynamics, not data semantics. It does not see the data itself. Two training runs can produce identical ADQI scores from completely different underlying data realities. ADQI infers indicators of data quality; it does not observe the data manifold itself.

Weights are signal-dependent.

While moving from heuristic 50/30/20 weights to dynamic energy weighting removes hand-tuned assumptions, it makes ADQI sensitive to signal noise. If a run has high-frequency noise in the late phase, the energy integral might overweight that phase. Future versions will explore noise-gated energy integrals.
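As one possible direction, and not the noise-gated integral itself, smoothing the signal before integrating |dy| already damps the effect. In this hypothetical run, Gaussian noise added to the last 100 of 300 steps inflates the late phase's raw energy share; a 15-step moving average before differencing shrinks it. The window size, noise level, and function name are arbitrary choices for illustration.

```python
import numpy as np


def energy_share(y, start, stop, window=1):
    """Share of total |dy| in steps [start, stop), optionally after a moving average."""
    y = np.asarray(y, dtype=float)
    if window > 1:
        pad = (window // 2, window - 1 - window // 2)
        y = np.convolve(np.pad(y, pad, mode="edge"), np.ones(window) / window, mode="valid")
    dy = np.abs(np.diff(y))
    return float(dy[start:stop].sum() / (dy.sum() + 1e-12))


rng = np.random.default_rng(0)
x = np.arange(300, dtype=float)
y = np.exp(-x / 30)
y[200:] += rng.normal(scale=0.02, size=100)  # high-frequency noise in the late phase

raw = energy_share(y, 200, len(y))
smoothed = energy_share(y, 200, len(y), window=15)
print(f"late-phase share: raw {raw:.2f}, smoothed {smoothed:.2f}")
```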

NaN propagation.

When loss values contain NaN, NaN propagates. A future implementation will strip or interpolate NaN values inside the signal extractor.

Modality calibration is untested at scale.

Cross-modality comparability of ADQI scores (whether an ADQI of 0.72 on an LLM fine-tune is comparable to an ADQI of 0.72 on a vision model) is an open question.

Future Possible Additions

ADQI in its current form is a foundation. What follows are the natural extensions that become possible once real-world usage data accumulates.

Empirical phase weight calibration

Learning the optimal phase weights empirically via regression across run outcomes and phase ADQI scores.

NaN-resilient signal extraction

A pre-processing step inside the signal extractor that identifies NaN regions, strips them, and interpolates.
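A minimal sketch of such a step, assuming linear interpolation is acceptable: interior NaN gaps are filled from their finite neighbours with `np.interp`, and leading or trailing NaNs take the nearest finite value (the default edge behaviour of `np.interp`). `interpolate_nans` is a hypothetical name.

```python
import numpy as np


def interpolate_nans(y):
    """Replace NaN entries of y by linear interpolation over the step index."""
    y = np.asarray(y, dtype=float).copy()
    bad = np.isnan(y)
    if bad.all():
        raise ValueError("signal contains no finite values")
    if bad.any():
        idx = np.arange(y.size)
        # np.interp holds the first/last finite value beyond the known range.
        y[bad] = np.interp(idx[bad], idx[~bad], y[~bad])
    return y


print(interpolate_nans([1.0, np.nan, 3.0]))  # [1. 2. 3.]
```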

Regime-aware phase detection

Accepting schedule annotations to align phase boundaries with the intended structure of the training run.

Cross-run ADQI population scoring

Contextualizing scores against a population of similar runs to produce actionable percentiles.
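In its simplest form, population scoring is a percentile rank against the ADQI scores of comparable runs. The sketch assumes such a population is already available; how runs are judged comparable is left open, as in the text. `adqi_percentile` is a hypothetical name.

```python
import numpy as np


def adqi_percentile(score, population):
    """Percentage of comparable runs whose ADQI is at or below `score`."""
    pop = np.asarray(population, dtype=float)
    if pop.size == 0:
        raise ValueError("population is empty")
    return 100.0 * np.count_nonzero(pop <= score) / pop.size


print(adqi_percentile(0.5, [0.1, 0.3, 0.5, 0.7]))  # 75.0
```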

Gradient norm as co-signal

Fusing loss and gradient norm simultaneously to distinguish between getting stuck and genuine convergence.

Per-layer ADQI

Computing an ADQI profile across layers for transformer models to identify which layers received rich gradient signals.