Individual
What campus are you from?
Daytona Beach
Authors' Class Standing
Ayse Okatan, Senior
Lead Presenter's Name
Ayse Okatan
Faculty Mentor Name
Berker Pekoz
Abstract
We introduce Model-Bound Latent Exchange (MoBLE), a decoder-binding property in Transformer autoencoders that we formalize as Zero-Shot Decoder Non-Transferability (ZSDN). In a controlled character-level identity task with iso-architectural models trained on identical data but different random seeds, we observe that self-decoding achieves 86-92% exact sequence match and ≈98-99% token accuracy, while zero-shot cross-decoding collapses to chance (≈1/vocabulary size) with 0% exact matches. This sharp separation persists despite identical architectures and training recipes, and is corroborated by weight-space distances and attention-divergence diagnostics. We argue that ZSDN on identity tasks is best interpreted as model binding, i.e., an authentication and access-control mechanism for latent representations: the encoder's hidden-state representation deterministically reveals the plaintext, yet only the correctly keyed decoder reproduces it zero-shot. We provide a formal definition of ZSDN, a decoder-binding advantage metric, and a deployment checklist for secure artificial intelligence (AI) pipelines. Finally, we discuss learnability risks (e.g., adapter alignment) and outline mitigations such as quantization, integrity tags, and rekeying. MoBLE offers a lightweight, accelerator-friendly approach to secure AI deployment in safety-critical domains, including aviation and cyber-physical systems.

We also analyze subliminal transfer in Transformer models, where a teacher embeds hidden traits that can be linearly decoded by a student without degrading main-task performance. Prior work often attributes transferability to global representational similarity, typically quantified with Centered Kernel Alignment (CKA). Using synthetic corpora with disentangled public and private labels, we distill students under matched and independent random initializations. We find that transfer strength hinges on alignment within a trait-discriminative subspace: same-seed students inherit this alignment and show higher leakage (τ ≈ 0.24), whereas different-seed students, despite global CKA > 0.9, exhibit substantially reduced excess accuracy (τ ≈ 0.12-0.13). We formalize this with a subspace-level CKA diagnostic and residualized probes, showing that leakage tracks alignment within the trait-discriminative subspace rather than global representational similarity. Security controls (projection penalty, adversarial reversal, right-for-the-wrong-reasons regularization) reduce leakage in same-base models without impairing public-task fidelity. These results establish seed-induced uniqueness as a resilience property and argue for subspace-aware diagnostics for secure multi-model deployments.
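For concreteness, the decoder-binding advantage mentioned above can be read as the gap between self-decoding and zero-shot cross-decoding accuracy. The Python below is a minimal sketch of one plausible formalization, not the paper's exact definition; the function names and the choice of token accuracy as the base score are assumptions.

def token_accuracy(pred, target):
    # Fraction of positions where predicted and target tokens agree.
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / len(target)

def decoder_binding_advantage(enc_a, dec_a, dec_b, sequences):
    # Self-decoding accuracy minus zero-shot cross-decoding accuracy.
    # A value near 1 - 1/|vocab| indicates strong binding: the matched
    # decoder recovers the input while a foreign, iso-architectural
    # decoder (trained from a different seed) collapses to chance.
    self_acc = 0.0
    cross_acc = 0.0
    for seq in sequences:
        z = enc_a(seq)                              # latent from encoder A
        self_acc += token_accuracy(dec_a(z), seq)   # correctly keyed decoder
        cross_acc += token_accuracy(dec_b(z), seq)  # foreign decoder
    n = len(sequences)
    return self_acc / n - cross_acc / n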
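Similarly, the subspace-level CKA diagnostic can be illustrated by computing standard linear CKA (Kornblith et al., 2019) after projecting representations onto a candidate trait-discriminative subspace. The linear-CKA formula below is standard; the projection step and how the basis is obtained are assumptions about how the diagnostic might be instantiated.

import numpy as np

def linear_cka(x, y):
    # Standard linear CKA between two feature matrices of shape
    # (n_samples, dim); both are column-centered first.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return num / den

def subspace_cka(x, y, basis):
    # CKA restricted to a trait-discriminative subspace: project both
    # representations onto the columns of `basis` (dim, k) before
    # comparing. How `basis` is chosen (e.g., top discriminant
    # directions for the private label) is an assumption here.
    return linear_cka(x @ basis, y @ basis)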
Did this research project receive funding support from the Office of Undergraduate Research?
No
Seed-Induced Uniqueness in Transformers: Subspace-Aligned Latent Keys for Authentication