What is a golden sample and why does it matter?

A golden sample is the approved Batch 1 unit retained as the reference standard for every subsequent batch. Both you and the factory keep a copy, both signed and dated. Without a golden sample retained on both sides, every quality dispute on Batch 2 and Batch 3 becomes 'your word against mine' — because the spec sheet alone cannot capture the look-feel-finish details that buyers actually pay for. A golden sample makes drift visible.

How big should a pilot run be?

10–20% of the master order, run before the full production batch. The pilot tests two things the sample cannot: whether the production line can hit order-rate volume against the golden sample, and whether the natural variance between operators stays inside the spec tolerance. Schedule a pre-shipment inspection at the start of the pilot — variance shows up in the first 50 units, and catching it on day one saves the rest of the order.

How much does pre-shipment inspection cost in 2026?

Per published 2026 China-inspection pricing, pre-shipment inspection at major providers (Bureau Veritas, SGS, or Intertek) runs around $199–$320 per man-day, while factory audits run around $220–$458 per man-day, depending on scope and provider. For a pilot run, a 1-day pre-shipment inspection at the start of production is the minimum viable check.

How do I get started with Octo SAM for 3-Batch-Test sourcing?

Email info@agenceocto.com with the product spec, the master order volume, and any active supplier conversations. Octo replies within 1 business day with a 30-minute scoping call. SAM engagements include golden-sample retention coordination, pilot-run scheduling, and random-batch QC coordination as standard.

Manufacturer Vetting | Updated 2026-05-07 | 8 min read | By the Octo team

Why factory samples lie

Q: What is the Octo 3-Batch Test?

The Octo 3-Batch Test is the three-batch verification rule for production quality: (1) the paid sample tests whether the factory exists and can produce a unit; (2) a pilot run of 10–20% of the master order tests whether the production line can deliver at order-volume rate against the golden sample; (3) a random pull from the 2nd or 3rd full order tests whether the supplier maintains consistency once the QC pressure relaxes. A supplier is verified only when all three batches match the spec.

Q: Why does a Chinese factory sample sometimes look better than the production order?

Three patterns show up repeatedly in seller reports and operator reviews. First, many factories assign stronger operators to samples, especially when the sample is being used to win the order. Second, factories often pull better-controlled raw-material lots for samples than for ordinary production runs. Third, the supplier may be a trading company subcontracting production to a partner factory the buyer has never visited. This pattern is common enough to plan against, especially in apparel, packaging, accessories, and private-label consumer goods.

The Octo 3-Batch Test

A factory sample tests existence. It does not test repeatability. The factory threw their best operators and their best raw materials at the unit you held in your hand — and the second batch is going to be different. We call the rule the Octo 3-Batch Test: a supplier is not verified by one good sample. They are verified across three independent batches that match the spec. Sample. Pilot. Random pull. Skip the third batch and the failure shows up after you have committed to the relationship.

What is the Octo 3-Batch Test?

A supplier's production capability is measured across three independent batches taken at different points in the order lifecycle. Each batch tests a different question. All three must match the spec for the supplier to enter regular rotation.

Batch	When	What it tests	What you compare against
1. Paid sample	Before the master agreement is signed	Does the factory exist? Can they produce a unit at all?	Spec sheet — material, tolerances, finish, packaging
2. Pilot run (10–20% of master order)	After PO is signed; before full production runs	Can the line actually produce at order-volume rate?	The approved Batch 1 sample as the golden sample
3. Random pull from the 2nd or 3rd full order	After the relationship is in regular rotation	Are they cutting corners as they get comfortable?	Both Batch 1 (golden sample) and Batch 2 (pilot retain)

Weak suppliers rarely fail at the sample. They fail at the third batch, when the QC pressure relaxes, cheaper raw materials enter the line, or the factory subcontracts to a partner you have never visited.
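The pass rule in the table can be written down as a small check. This is an illustrative sketch only: the batch labels and the `is_verified` helper are hypothetical names chosen for the example, not part of any Octo tooling.

```python
# Hypothetical labels for the three batches described in the table above.
REQUIRED_BATCHES = ("sample", "pilot", "random_pull")


def is_verified(results: dict[str, bool]) -> bool:
    """Return True only if every required batch was run and matched the spec.

    `results` maps a batch label to whether that batch matched the spec.
    A batch that has not been run yet counts as not matching.
    """
    return all(results.get(name, False) for name in REQUIRED_BATCHES)


# A supplier with a clean sample and pilot but no random pull yet is not verified.
print(is_verified({"sample": True, "pilot": True}))                       # False
print(is_verified({"sample": True, "pilot": True, "random_pull": True}))  # True
```

Treating a missing batch as a failure mirrors the rule that skipping any one of the three leaves the supplier unverified.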

Why does Batch 1 lie?

Three reasons, in order of how often we see the pattern in seller reports.

Many factories assign stronger operators to samples, especially when the sample is used to win the order. The unit you receive may not be produced by the rotation that runs your master order. It is often produced by the senior operators the factory uses for sales-critical work. The stitching is tighter, the prints are crisper, and the assembly tolerances are closer, because the people doing the work are the factory's best, not their average. The pattern shows up across categories: one r/EntrepreneurRideAlong post described hoodie samples with crisp embroidery and printing, then a master order with foam plastisol showing through and stitching going lazy on the second batch.

Factories often pull better-controlled material lots for samples than for ordinary production runs. Whether it is yarn lots, plastic resin batches, or electronics components, the lot used for the sample is often the one that scored well on the factory's recent QC pulls. Production runs on the next-best lot, which will differ from the sample lot to some degree. For commodity products this gap is small. For brand-sensitive products (apparel, cosmetics, packaging), it is the difference between "looks premium" and "looks cheap."

The nominal supplier may subcontract production. A supplier you found on Alibaba may be a trading company rather than a manufacturer. The sample comes from one factory; the master order comes from a partner you have not visited. The cluster-geography mismatch from Octo's 3-Consistency Rule maps directly here: a Shenzhen "manufacturer" producing a hoodie has likely subcontracted to a Foshan or Guangzhou apparel partner, and the QC chain breaks at the handoff.

What does "golden sample" actually mean?

A golden sample is the approved Batch 1 unit that becomes the reference standard for every subsequent batch. The factory keeps one, you keep one, and your inspector (Bureau Veritas, SGS, or Intertek) compares Batch 2 and Batch 3 against the retain. Without a golden sample retained on both sides, every dispute becomes a "your-word-against-mine" argument, because the spec sheet alone cannot capture the look-feel-finish details that buyers are actually paying for.

The protocol that works: order Batch 1, approve in writing, sign and date both physical units (yours and the factory's), photograph both, and lock the spec sheet at the version that matched. Any drift from the golden in Batch 2 or Batch 3 is a defect — even if the spec sheet alone would not catch it.

Step 2 — Pilot run (10–20% of master order)

Once the master agreement is signed, the first run should be a pilot — 10–20% of the full order volume. This sounds expensive. It is cheaper than discovering on Batch 3 that the production line cannot hit the golden sample at scale.
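As a quick worked example of the 10–20% sizing, the sketch below converts a master order quantity into a pilot range in whole units. The function name and the rounding choices (round the lower bound up, the upper bound down) are assumptions made for illustration.

```python
def pilot_run_range(master_order_units: int) -> tuple[int, int]:
    """Return (low, high) pilot sizes in units for a 10-20% pilot run.

    Integer arithmetic avoids floating-point rounding surprises.
    The low bound is rounded up and the high bound is rounded down;
    for very small orders the range collapses to a single value.
    """
    if master_order_units <= 0:
        raise ValueError("master_order_units must be positive")
    low = (master_order_units * 10 + 99) // 100   # ceiling of 10%
    high = (master_order_units * 20) // 100       # floor of 20%
    return low, max(low, high)


print(pilot_run_range(5000))  # (500, 1000)
```

For a 5,000-unit master order this gives a pilot of 500 to 1,000 units.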

The pilot tests two things the sample cannot:

Line capacity at order rate. A factory that produced one sample in three days might run the master order through a line that hasn't been set up for the spec — different operators, different shift, different raw-material lot. The pilot reveals whether the factory's production line has actually been configured for your spec, or whether they are about to run it through a generic line.
The factory's tolerance variance at volume. The first hundred units of the pilot will show the natural drift between operators. If the variance exceeds the spec tolerance, the factory cannot hold your spec at scale. The pilot makes this visible before you commit the rest of the order.
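The tolerance check described above can be sketched as a comparison of early pilot measurements against a nominal value and an allowed deviation. Everything specific here is assumed for illustration: the `variance_report` helper, the chest-width dimension, and the readings themselves are hypothetical, and a real inspection would follow the inspector's own sampling plan.

```python
from statistics import mean, pstdev


def variance_report(measurements, nominal, tolerance):
    """Summarise how unit measurements sit against nominal +/- tolerance."""
    if not measurements:
        raise ValueError("need at least one measurement")
    lower, upper = nominal - tolerance, nominal + tolerance
    out_of_spec = [m for m in measurements if m < lower or m > upper]
    return {
        "units": len(measurements),
        "mean": mean(measurements),
        "spread": pstdev(measurements),    # population standard deviation
        "out_of_spec": len(out_of_spec),
        "within_tolerance": not out_of_spec,
    }


# Hypothetical chest-width readings in cm; assumed spec is 56.0 +/- 1.0 cm.
readings = [56.2, 55.8, 56.9, 57.3, 55.1, 56.0]
report = variance_report(readings, nominal=56.0, tolerance=1.0)
print(report["out_of_spec"], report["within_tolerance"])  # 1 False
```

In this made-up data one unit (57.3 cm) falls outside the band, which is the kind of signal the pilot is meant to surface before the rest of the order runs.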

Per published 2026 China-inspection pricing, pre-shipment inspection runs around $199–$320 per man-day at major providers (SGS, Bureau Veritas, Intertek). For a pilot run, hire the inspection at the start of the pilot — not at the end. The variance pattern shows up in the first 50 units; catching it on day one of the pilot saves the whole order.
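For budgeting, the per-man-day range quoted above scales linearly with the number of inspector days. The sketch below is a rough estimate only: the rate range comes from the article, while the `inspection_budget` helper is a hypothetical name and extras such as travel or lab testing are assumed to be out of scope.

```python
# Pre-shipment inspection rate per man-day in USD, as quoted in the article.
PSI_RATE_USD = (199, 320)


def inspection_budget(man_days: int, rate_range=PSI_RATE_USD) -> tuple[int, int]:
    """Return the (low, high) cost in USD for a number of inspector man-days."""
    if man_days < 0:
        raise ValueError("man_days cannot be negative")
    low_rate, high_rate = rate_range
    return man_days * low_rate, man_days * high_rate


print(inspection_budget(1))  # (199, 320)
print(inspection_budget(3))  # (597, 960)
```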

Step 3 — Random pull from the 2nd or 3rd full order

The third batch is the one most buyers skip. The supplier has shipped twice without complaint, the relationship feels stable, and the buyer reduces inspection frequency. Octo treats this as the highest-leverage QC moment in the relationship.

The pattern: by Batch 3, some factories test the boundaries. They substitute a slightly cheaper raw material lot, run the line with a different shift, or push tolerances toward the lenient edge of the spec. Each substitution is small. Stacked, they are the difference between "supplier we trust" and "supplier we lose money on."

The fix: random-batch QC pulls on every 3rd to 5th order, conducted on the production floor before shipment, comparing units against both the golden sample and the pilot retain. Per Bureau Veritas's Supplier Audits service, random pulls are a standard supply-chain risk-management tool. The third-batch substitution pattern is common enough to plan against, especially in apparel, packaging, accessories, and private-label consumer goods.
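One way to keep the pulls unpredictable to the factory is to draw the schedule at random within the stated interval. The sketch below assumes the first pull lands on the 2nd or 3rd full order and each later pull follows 3 to 5 orders after the previous one; that reading of the cadence, and the `random_pull_schedule` helper itself, are assumptions for illustration.

```python
import random


def random_pull_schedule(total_orders, first_pull=(2, 3), gap=(3, 5), seed=None):
    """Return the order numbers on which to run a random-batch QC pull.

    first_pull: inclusive range for the first pull (2nd or 3rd full order).
    gap:        inclusive range of orders between consecutive pulls.
    seed:       optional seed so the buyer can reproduce the schedule privately.
    """
    if gap[0] < 1:
        raise ValueError("gap must be at least 1 order")
    rng = random.Random(seed)
    schedule = []
    next_pull = rng.randint(*first_pull)
    while next_pull <= total_orders:
        schedule.append(next_pull)
        next_pull += rng.randint(*gap)
    return schedule


# Plan pulls across a hypothetical 20-order relationship.
print(random_pull_schedule(20, seed=42))
```

Keeping the seed on the buyer side means the schedule can be audited later without the factory being able to anticipate which orders will be inspected.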

What 4 patterns kill repeatability?

No golden sample retained on both sides. Batch 2 and Batch 3 disputes become unwinnable because there is no agreed reference standard.
Pilot run skipped to "save time." The first 10–20% of the master order is the cheapest place to catch line-capacity failure. Skipping it means the failure surfaces on the full master order.
Inspection scheduled at the end of the pilot, not the start. Variance shows up in the first 50 units. Catching it on day one of the pilot saves the order; catching it on day five means the whole pilot is at risk.
Random-batch QC dropped after Batch 2. Seller reports suggest factories test buyer attention at Batch 3. A buyer who stops inspecting after two clean batches is signalling a shift in QC pressure.

A sample order tests existence. A pilot tests capacity. A random pull tests integrity. Skip any one and the supplier is not verified for production.

How does the Octo 3-Batch Test connect to the 3-Consistency Rule?

The Octo 3-Consistency Rule verifies whether a supplier can run the work — across legal entity, export record, and production capability. The 3-Batch Test verifies whether the supplier will run the work consistently — across sample, pilot, and random pull. The two frameworks operate at different stages of the relationship:

3-Consistency Rule runs before the PO is signed (week 0–3).
3-Batch Test runs after the PO is signed (week 4 through ongoing rotation).

A supplier that passes the 3-Consistency Rule but fails the 3-Batch Test is a real factory that does not maintain quality at production scale. A supplier that fails the 3-Consistency Rule should not have been shortlisted in the first place. Both frameworks are needed because they catch different failure modes.

How does the 3-Batch Test compare to other QC approaches?

Approach	Catches	Misses
Sample-only verification	Whether the factory exists and can produce a unit	Production-line consistency, raw-material substitution, third-batch corner-cutting
AQL pre-shipment inspection (one-off)	Defects on the inspected batch	Inter-batch variance over a multi-order relationship
Continuous SPC monitoring	Statistical process drift inside the factory	Requires deep factory cooperation; rarely available to non-key accounts
Octo 3-Batch Test	Sample, pilot, and third-batch consistency together	Per-shipment defect screening on batches it does not sample; designed to layer on top of standard AQL inspection

The 3-Batch Test is not a substitute for AQL pre-shipment inspection. It is a layer above it. AQL tells you whether this batch matches the spec. The 3-Batch Test tells you whether the supplier matches the spec across batches.

How does Octo SAM apply the 3-Batch Test?

Octo SAM bakes the 3-Batch Test into the shortlist brief. Suppliers who refuse golden-sample retention, refuse pilot runs, or refuse random-batch QC pulls are not shortlisted — Octo treats those refusals as a relationship-management risk signal. SAM coordinates the inspection schedule with Bureau Veritas, SGS, or Intertek at published 2026 inspection-market pricing of around $199–$320 per man-day for pre-shipment inspection, or $220–$458 per man-day for factory audits.

A sample is not production. Production is proved across three batches. Octo SAM coordinates golden-sample retention, pilot-run inspection, and random-batch QC across every supplier in your shortlist before the second order ships.
