Pillar 02

Problem-Directed Compilation

A research question is not a database query. It is a program to be compiled. The question compiles into morphism chains that surgically extract only what is relevant to the answer.

The Triangle DSL

Research protocols are written in Triangle, an LL(1) domain-specific language where each statement maps to a morphism chain through S-entropy space.

investigate "Association between ACTN3
  genotype and cardiac adaptation
  in elite sprinters"
  with confidence > 0.95
  with significance < 0.01

parallel {
  genotype = slice genomics.ACTN3
    @ cohort(elite_sprinters)
    @ variant(rs1815739)

  cardiac = slice echocardiography
    @ cohort(elite_sprinters)
    @ measure(LV_mass, EF, GLS)

  protein = slice proteomics
    @ target(alpha_actinin_3)
    @ tissue(cardiac_muscle)
}

joined = compose genotype with cardiac
  preserving athlete_id

result = navigate joined to target
  via correlation_analysis

converge at confidence > 0.95

The researcher specifies what to investigate and what evidence is needed. The system handles how: which domain models to invoke, which morphism chains to construct, and when the analysis has converged.
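As a rough illustration of what LL(1) means for a statement such as the `genotype = slice ...` line in the protocol, the Python sketch below parses that one statement form using a single token of lookahead and returns an ordered list of steps. The grammar, the `parse_slice` name, and the step tuples are assumptions made for this example; the text does not give Triangle's actual grammar or how it represents a morphism chain.

```python
import re
from typing import List, Optional, Tuple

# Assumed token shapes: dotted identifiers and five punctuation marks.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_][A-Za-z0-9_.]*)|(?P<punct>[=@(),]))")
PUNCT = {"=", "@", "(", ")", ","}

Step = Tuple[str, Tuple[str, ...]]


def tokenize(text: str) -> List[str]:
    """Split a statement into identifier and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group("ident") or match.group("punct"))
        pos = match.end()
    return tokens


class SliceParser:
    """Recursive descent: every branch is chosen from one lookahead token."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def ident(self) -> str:
        tok = self.peek()
        if tok is None or tok in PUNCT:
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def expect(self, literal: str) -> None:
        tok = self.peek()
        if tok != literal:
            raise SyntaxError(f"expected {literal!r}, got {tok!r}")
        self.pos += 1

    def statement(self) -> Tuple[str, List[Step]]:
        # statement := IDENT '=' 'slice' IDENT filter*
        name = self.ident()
        self.expect("=")
        self.expect("slice")
        chain: List[Step] = [("source", (self.ident(),))]
        while self.peek() == "@":  # one token decides: another filter or stop
            chain.append(self.filter())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return name, chain

    def filter(self) -> Step:
        # filter := '@' IDENT '(' IDENT (',' IDENT)* ')'
        self.expect("@")
        kind = self.ident()
        self.expect("(")
        args = [self.ident()]
        while self.peek() == ",":
            self.expect(",")
            args.append(self.ident())
        self.expect(")")
        return kind, tuple(args)


def parse_slice(text: str) -> Tuple[str, List[Step]]:
    return SliceParser(tokenize(text)).statement()


name, chain = parse_slice(
    "genotype = slice genomics.ACTN3 @ cohort(elite_sprinters) @ variant(rs1815739)"
)
# chain == [("source", ("genomics.ACTN3",)),
#           ("cohort", ("elite_sprinters",)),
#           ("variant", ("rs1815739",))]
```

Because the parser never backtracks, a malformed statement is rejected at the first offending token, which is one practical reason a protocol language might be kept LL(1).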

Surgical Extraction Results

Ghost bars: full dataset size. Solid bars: surgically extracted data. Each source achieves 10⁸–10⁹× compression through problem-directed extraction.

INFORMATION MINIMALITY THEOREM

For any research question Q and dataset D, the extracted representation σ is a sufficient statistic for the answer A_Q, with information content bounded by the mutual information I(D; A_Q). The raw data, with entropy H(D), is never accessed beyond this bound.
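A toy calculation can make these quantities concrete. The sketch below uses a made-up dataset of eight equally likely records and a question whose answer depends only on parity. It reads "information content" as Shannon entropy, which is an assumption, and it illustrates the quantities in the statement rather than proving it.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Made-up dataset D: eight equally likely records.
D = list(range(8))
# Hypothetical answer A_Q: depends only on the parity of the record.
A_Q = ["odd" if d % 2 else "even" for d in D]
# Extracted representation sigma: keeps the parity bit and nothing else.
sigma = [d % 2 for d in D]

h_d = entropy(D)                       # 3.0 bits in the raw data
i_da = mutual_information(D, A_Q)      # 1.0 bit is relevant to the answer
h_sigma = entropy(sigma)               # 1.0 bit is kept
i_sa = mutual_information(sigma, A_Q)  # 1.0 bit: nothing relevant was lost
```

In this example σ carries exactly I(D; A_Q) = 1 bit out of H(D) = 3 bits, and I(σ; A_Q) = I(D; A_Q), which is what sufficiency for A_Q requires.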

TYPE SAFETY

The protocol type system enforces dimensional consistency, conservation compliance, modality compatibility, and confidence monotonicity. All four are checked at compile time, before any data is accessed.
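To show what checking before data access can look like, here is a minimal Python sketch of a single static check on a `compose ... preserving` step. It works from declared schemas only. The `SliceType` class, the `check_compose` function, and the field lists are hypothetical stand-ins, and the rule shown (both sides must expose the preserved key) is an assumed example, not one of the four rules as the system defines them.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SliceType:
    """Static description of a slice: its modality and the fields it exposes."""
    modality: str
    fields: FrozenSet[str]


def check_compose(left: SliceType, right: SliceType, key: str) -> SliceType:
    """Type-check `compose left with right preserving key` from schemas alone."""
    for side in (left, right):
        if key not in side.fields:
            raise TypeError(f"{side.modality} does not expose {key!r}")
    return SliceType(f"{left.modality}+{right.modality}", left.fields | right.fields)


# Hypothetical schemas; no measurement data is loaded at any point.
genotype = SliceType("genomics", frozenset({"athlete_id", "rs1815739"}))
cardiac = SliceType("echocardiography", frozenset({"athlete_id", "LV_mass", "EF", "GLS"}))
protein = SliceType("proteomics", frozenset({"sample_id", "alpha_actinin_3"}))

joined = check_compose(genotype, cardiac, "athlete_id")  # accepted

try:
    check_compose(genotype, protein, "athlete_id")
    rejected = False
except TypeError:
    rejected = True  # caught before any data would be read
```

The point of the design is ordering: an ill-formed protocol fails while it is still only a description, so no dataset is touched on behalf of a protocol that could not run to completion.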

COMPILATION DECOMPOSITION

Any well-typed protocol decomposes into a sequence of atomic morphisms, each preserving S-entropy conservation. Complex analyses are compositions of simple, verified steps.
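The idea of building a complex analysis from small verified steps can be sketched in Python as follows. Each atomic step is wrapped so that a chosen invariant is re-checked after it runs, and the wrapped steps are then chained. The invariant used here (the set of `athlete_id` values is unchanged) is a placeholder chosen for the example, since the text does not define S-entropy conservation. The records and the names `verified` and `compose` are likewise made up.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]
Morphism = Callable[[List[Record]], List[Record]]


def verified(name: str, step: Morphism, invariant: Callable[[List[Record]], object]) -> Morphism:
    """Wrap an atomic step so it fails loudly if the invariant changes."""
    def run(data: List[Record]) -> List[Record]:
        before = invariant(data)
        out = step(data)
        if invariant(out) != before:
            raise ValueError(f"step {name!r} broke the invariant")
        return out
    return run


def compose(steps: Sequence[Morphism]) -> Morphism:
    """Chain steps left to right into a single morphism."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def athlete_ids(rows: List[Record]) -> frozenset:
    """Placeholder invariant: which athletes are present."""
    return frozenset(r["athlete_id"] for r in rows)


def to_kg(rows: List[Record]) -> List[Record]:
    return [{**r, "lv_mass_kg": r["lv_mass_g"] / 1000} for r in rows]


def drop_raw(rows: List[Record]) -> List[Record]:
    return [{k: v for k, v in r.items() if k != "lv_mass_g"} for r in rows]


def drop_first(rows: List[Record]) -> List[Record]:
    return rows[1:]  # loses an athlete, so it violates the invariant


# Made-up records for illustration only.
rows = [{"athlete_id": 1, "lv_mass_g": 180.0}, {"athlete_id": 2, "lv_mass_g": 210.0}]

pipeline = compose([
    verified("to_kg", to_kg, athlete_ids),
    verified("drop_raw", drop_raw, athlete_ids),
])
result = pipeline(rows)
# result == [{"athlete_id": 1, "lv_mass_kg": 0.18}, {"athlete_id": 2, "lv_mass_kg": 0.21}]
```

Since every step is checked on its own, a failure names the exact step that broke the invariant; wrapping `drop_first` with `verified` and running it raises a `ValueError` instead of silently dropping an athlete.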