Most nonprofit directories say "verified" without saying what that means. We publish the actual pipeline: data sources, verification cadence, scoring formula, geocoding methodology, and quality-assurance workflows. Deterministic, reproducible, and open under CC-BY 4.0.
Records are aggregated from authoritative federal, state, and nonprofit sources. We never accept payment for inclusion. Original-source attribution is preserved in each record's data_source field.
| Source | Coverage |
|---|---|
| VA Lighthouse Facilities API | VA Medical Centers, CBOCs, Vet Centers, Mobile Vet Centers, National Cemeteries (~1,500 records) |
| VA Office of General Counsel Accreditation Roster | VA-accredited attorneys, claim agents, and VSO representatives (~50,000 records) |
| State CVSO directories (50 states + DC) | County Veterans Service Officers — free claims-help offices |
| HRSA Federally Qualified Health Centers (FQHCs) | ~14,000 federally-qualified clinics that accept veterans |
| CMS Medicare-certified providers | Nursing homes, home-health, hospice, dialysis |
| SAMHSA-tagged behavioral health providers | Mental health + substance use providers serving veterans |
| HUD-VASH grantees | Supportive Housing for veterans (joint HUD-VA program) |
| SSVF grantees | Supportive Services for Veteran Families |
| VSO chapter directories | American Legion, VFW, DAV, AMVETS, PVA posts |
| State agency veteran-services directories | State-level benefits offices, veteran homes, etc. |
| U.S. Census ACS 5-year (2022) | Tract-level veteran demographics, housing burden, income |
| CDC PLACES | Tract-level health prevalences (used for context, not as resources) |
Records are continuously verified by automated pipelines plus manual editorial review. Stale data is flagged + auto-demoted; broken URLs trigger Wayback Machine snapshots; phone numbers are quarterly-validated against Twilio Lookup v2.
| Pipeline | Cadence | What it does |
|---|---|---|
| Self-healing daemon | Every 6 hours | Evaluates health_score, auto-demotes flagged-closed rows, runs HEAD probes on URLs, runs Census Geocoder on next 100 unprocessed rows |
| Census Geocoder | Every 6 hours | Assigns FIPS state/county/tract to records missing them. ~100 rows/run, ~400/day |
| Wayback Machine snapshot backfill | Every 6 hours | Captures snapshots of broken URLs to archive.org for posterity |
| Haiku enrichment pass | Weekly (Tuesday 08:00 UTC) | Claude Haiku 4.5 enriches 100 stale rows with structured eligibility tags, specialty programs, walk-in/appointment status, language, women-focused, va_accredited, and 1-line summary |
| ProPublica Form 990 enrichment | Weekly (Tuesday) | EIN-matched records get latest 990 revenue, expenses, year-of-filing |
| VA Lighthouse reconciler | Weekly (Sunday 06:00 UTC) | Stages differences between our records and VA's authoritative facility data to pending_updates queue for staff review |
| URL health scan | Monthly (1st 04:00 UTC) | HEAD probe + Google Safe Browsing sweep over all 20K+ URLs |
| Twilio Lookup v2 phone audit | Quarterly (Jan/Apr/Jul/Oct 1) | Line-type intelligence — flags disconnected, fax-only, or premium-rate numbers (~$90/run) |
| IndexNow re-push (Bing/Yandex/Seznam/Naver) | Weekly post-enrichment | Pushes the 200 most-recently-updated URLs for instant search-engine re-crawl |
The completeness_score (0-100, integer) is a deterministic function of which fields are populated + verification status + ProPublica match. No manual curation, no payment for placement. Recomputed nightly. Full formula:
Score is exposed in every /resource/{slug}-{id} page header (gold pill on /best-of/* pages) and in the /api/resources/bulk JSON. Reproducible — run our SQL against your own snapshot of /api/export/snapshot to get identical scores.
Every record with a postal address is geocoded to FIPS-tract level via the US Census Geocoder (Single-Address One-Line API). Tract-level granularity (median ~4,000 residents per tract) lets us join records to ACS demographics + CDC PLACES health data + ACS housing burden — surfacing service deserts for grant proposals.
Pipeline:
Public_AR_Census2020geocode_source = 'failed' for manual reviewCurrent FIPS-tract assignment: ~98%+ of records (live count on /dashboard). Failed-geocode rate: ~1.5%, mostly PO Box addresses or military APO/FPO.
website_status = 'broken' and a Wayback snapshot link.phone_verified + phone_type exposed.dedup_candidates table holds suspected duplicates (same address + same name fuzz match) for staff review.veteran_primary = built for veterans; veteran_serving = veterans named target population; vet_accepting = general healthcare). Default search filters to first two tiers.verification_runs table.Anyone — including the listed organization, a veteran, a journalist, or a researcher — can submit a correction or addition. Workflow:
resource_feedback tableresource_history for full audit trailAll Warriors Fund datasets are licensed under Creative Commons Attribution 4.0 International (CC-BY 4.0). Free to share, adapt, and build upon for any purpose (commercial included), provided you credit "Wounded Warriors / Warriors Fund (EIN 86-1336741)" and link back to https://warriorsfund.org/.
Bulk data: CSV · JSON · SHA-256 integrity-stamped snapshot · Project Open Data v1.1 / DCAT 1.1 catalog
Every full-database snapshot ships with a SHA-256 hash so you can verify integrity. The snapshot is reproducible: given the same verified_date as-of-time + the same source-data state, our scoring formula returns identical scores. We publish the snapshot generator + scoring SQL for full reproducibility.
How to verify:
{snapshot_hash, generated_at, ...}snapshot_hashgenerated_atThis methodology page is the canonical reference for our data pipeline. It's updated whenever the pipeline changes — see Git history at github.com/EmperorMew/WWF for the audit trail. Foundation officers + academic reviewers + journalists doing source-evaluation: this page should answer your "how do you verify?" question definitively. If anything is unclear, email info@warriorsfund.org.