Replication Package
Code, Data, Audit Trail, and Classification Evidence
This page documents not just how to replicate the analysis, but what we know and don’t know about the data. All classification decisions are documented with confidence levels and evidence. Run data_audit.py to regenerate the full audit trail.
1. Package Structure
2026_sroi/
├── data/
│ ├── enrich_dataset.py # v1: original enrichment pipeline
│ ├── data_audit.py # v2: rigorous audit + corrected dataset
│ ├── analysis.py # Main analysis
│ ├── simulations.py # Bootstrap, permutation, Monte Carlo
│ ├── factors_analysis.py # Calculation element extraction
│ │
│ ├── sroi_clean_dataset_v2.csv # ← USE THIS (with confidence scores)
│ ├── sroi_ratios_subset.csv # 64 reports with SROI ratios
│ ├── sroi_factors.csv # 11 calculation elements × 383 reports
│ │
│ ├── audit_report.md # ← Full data quality audit narrative
│ ├── audit_flags.csv # ← Per-report classification flags
│ └── classification_evidence.csv # ← Quality score evidence per report
└── figures/ # All 16 figures
2. Data Quality Audit
All classification decisions are documented with confidence levels. Run data_audit.py to regenerate.
2.1 Executive Summary
| Variable | HIGH confidence | MEDIUM | LOW | Unknown/Missing |
|---|---|---|---|---|
| Sector | 33 (9%) | 191 (50%) | 159 (41%) | 0 |
| Country | 58 (15%) | 220 (57%) | 26 (7%) | 79 (21%) |
| Year | 200 (52%) | 33 (9%) | 115 (30%) | 35 (9%) |
| Report type | 61 keyword-confirmed | — | 322 assumed | — |
| SROI ratio | 43/64 PDF-verified | — | 21 metadata-only | — |
Scope of uncertainty: Sector, country, and year classifications affect descriptive breakdowns. The main empirical findings (41.2% principle compliance, 5.2% complete adjustments, Monte Carlo correction) rest on quality scores from PDF text and are not sensitive to these classifications.
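The counts in the summary table can be re-derived from the released dataset by tallying the confidence columns. The sketch below is a minimal helper that assumes the column names from the data dictionary in Section 4 and treats blank confidence values as NONE. `confidence_summary` is a hypothetical name and is not a function in data_audit.py.

```python
from pathlib import Path

import pandas as pd

# Order of levels in the executive-summary table; NONE collects missing values.
LEVELS = ["HIGH", "MEDIUM", "LOW", "NONE"]


def confidence_summary(df: pd.DataFrame, columns: list[str]):
    """Return (counts, percentages) of reports per confidence level, one row per column."""
    per_column = {
        col: df[col].fillna("NONE").value_counts().reindex(LEVELS, fill_value=0)
        for col in columns
    }
    counts = pd.DataFrame(per_column).T  # rows = variables, columns = levels
    pct = (counts.div(counts.sum(axis=1), axis=0) * 100).round().astype(int)
    return counts, pct


# Hypothetical usage against the released dataset, if it is present locally.
path = Path("data/sroi_clean_dataset_v2.csv")
if path.exists():
    counts, pct = confidence_summary(
        pd.read_csv(path),
        ["sector_confidence", "country_confidence", "year_confidence"],
    )
    print(counts)
    print(pct)
```

Each percentage is rounded on its own, so a row may sum to 99 or 101 rather than exactly 100.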
2.2 Sector Classification
Method: Keyword matching on the title plus the first 3,000 characters of the PDF. The assigned sector is the one with the highest keyword count.
Only 9% of reports are classified with HIGH confidence (a margin of at least 3 keywords over the runner-up sector). 41% of reports are tied or near-tied between two sectors; these are genuinely cross-sectoral programmes.
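The margin rule can be sketched as follows. The keyword lists are hypothetical placeholders (the real lists are in data_audit.py). The HIGH threshold of 3 comes from this page, but the MEDIUM cut-off of 2 is an assumption.

```python
import re
from collections import Counter

# Hypothetical, abbreviated keyword lists; the real lists are in data_audit.py.
SECTOR_KEYWORDS = {
    "Health": ["health", "patient", "wellbeing", "clinical"],
    "Education": ["school", "education", "pupil", "training"],
    "Employment": ["employment", "job", "unemployed", "work placement"],
}


def classify_sector(title: str, pdf_text: str, window: int = 3000,
                    high_margin: int = 3, medium_margin: int = 2):
    """Return (sector, confidence, margin) from keyword counts.

    Searches the title plus the first `window` characters of the PDF text.
    high_margin=3 follows the documented HIGH rule; medium_margin=2 is an assumption.
    """
    text = (title + " " + pdf_text[:window]).lower()
    scores = Counter({
        sector: sum(len(re.findall(rf"\b{re.escape(kw)}\b", text)) for kw in keywords)
        for sector, keywords in SECTOR_KEYWORDS.items()
    })
    (best, top), (_, runner_up) = scores.most_common(2)
    margin = top - runner_up
    if margin >= high_margin:
        confidence = "HIGH"
    elif margin >= medium_margin:
        confidence = "MEDIUM"
    else:
        confidence = "LOW"  # tied or near-tied between two sectors
    return best, confidence, margin
```

The confidence label is based on the margin over the runner-up, not on the raw count. A report that mentions both health and employment terms many times therefore still ends up LOW.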
import pandas as pd
df = pd.read_csv('data/sroi_clean_dataset_v2.csv')
# Drop LOW-confidence sector classifications (keep HIGH and MEDIUM)
df_high_sector = df[df['sector_confidence'].isin(['HIGH', 'MEDIUM'])]  # n=224
2.3 Country Classification
79 reports (21%) remain Unknown even after the search window is extended to the first 15,000 characters of the PDF. These reports are excluded from geographic analyses.
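The widening search can be sketched as below. The country list is a short hypothetical sample and the 3,000-character first pass is an assumption; only the 15,000-character limit is documented here. Reports with no match fall through to "Unknown" and are removed by the filter before any geographic analysis.

```python
import re

import pandas as pd

# Hypothetical, abbreviated country list for illustration only.
COUNTRIES = ["United Kingdom", "Australia", "Canada", "Ireland", "New Zealand"]


def find_country(pdf_text: str, windows: tuple[int, ...] = (3000, 15000)) -> str:
    """Return the most-mentioned country in the first window that yields any match."""
    for window in windows:
        snippet = pdf_text[:window]
        counts = {
            country: len(re.findall(re.escape(country), snippet, flags=re.IGNORECASE))
            for country in COUNTRIES
        }
        best = max(counts, key=counts.get)
        if counts[best] > 0:
            return best
    return "Unknown"  # no match even in the widest window


def geographic_subset(df: pd.DataFrame) -> pd.DataFrame:
    """Drop reports whose country could not be identified."""
    return df[df["country_clean"] != "Unknown"]
```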
2.4 Year Extraction — Key Fix
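Bug in v1: the pipeline used min(valid_years), which picked up historical reference years (e.g., “SROI was developed in 2001”) and produced impossible report dates.
v2 fix: a hierarchical strategy that tries each rule in order and falls back to the next only when the previous one finds no year: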
| Strategy | N reports | Confidence |
|---|---|---|
| Explicit date pattern (“March 2016”) | 200 | HIGH |
| Modal year in first 800 chars | 33 | MEDIUM |
| Modal year in first 3,000 chars | 115 | LOW |
| Not found | 35 | NONE |
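The hierarchy in the table can be sketched as below. Several details are assumptions: the regular expressions, limiting the explicit-date search to the first 3,000 characters, and treating 2000 to 2026 as the plausible range. The method labels match the year_method values in the data dictionary. The example at the end shows the v1 failure mode: min() returns the historical 2001 mention, while the modal rule returns 2016.

```python
import re
from collections import Counter

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# "March 2016"-style dates; group 1 captures the year.
EXPLICIT_DATE = re.compile(rf"\b(?:{MONTHS})\s+((?:19|20)\d{{2}})\b", re.IGNORECASE)
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")


def extract_year(text: str, min_year: int = 2000, max_year: int = 2026):
    """Return (year, year_method, year_confidence), trying each rule in priority order."""

    def plausible(year: int) -> bool:
        return min_year <= year <= max_year

    # 1. Explicit month-year date (HIGH); the 3,000-char search window is an assumption.
    for match in EXPLICIT_DATE.finditer(text[:3000]):
        year = int(match.group(1))
        if plausible(year):
            return year, "explicit_pattern", "HIGH"

    # 2-3. Most frequent plausible year in a short, then a longer, window.
    for window, method, confidence in ((800, "modal_800chars", "MEDIUM"),
                                       (3000, "modal_3000chars", "LOW")):
        years = [int(y) for y in YEAR.findall(text[:window]) if plausible(int(y))]
        if years:
            return Counter(years).most_common(1)[0][0], method, confidence

    return None, None, "NONE"  # 4. nothing found


text = "SROI report 2016. SROI was developed in 2001. Data collected 2015-2016."
valid = [int(y) for y in YEAR.findall(text)]
print(min(valid))          # v1-style rule -> 2001
print(extract_year(text))  # v2 rule -> (2016, 'modal_800chars', 'MEDIUM')
```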
2.5 Report Type — Critical Transparency Note
322 of 339 “Evaluative” reports are assumed, not keyword-confirmed.
| Type | N | Confirmed | Assumed |
|---|---|---|---|
| Evaluative | 339 | 17 | 322 |
| Forecast | 40 | 40 | 0 |
| Scoping | 4 | 4 | 0 |
The logic: Forecast and Scoping reports consistently self-label, so unlabelled reports are classified as Evaluative by exclusion and marked with type_note = 'assumed_evaluative'.
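The exclusion rule can be sketched as a keyword check with an Evaluative fallback. The patterns and the order in which they are tried are hypothetical. Only the type labels and the type_note values come from this page.

```python
import re

# Hypothetical self-labelling patterns, tried in this order.
TYPE_PATTERNS = [
    ("Forecast", re.compile(r"\bforecast(?:ed|ing)?\b", re.IGNORECASE)),
    ("Scoping", re.compile(r"\bscoping\b", re.IGNORECASE)),
    ("Evaluative", re.compile(r"\bevaluati(?:ve|on)\s+sroi\b", re.IGNORECASE)),
]


def classify_report_type(text: str) -> tuple[str, str]:
    """Return (report_type_clean, type_note)."""
    for label, pattern in TYPE_PATTERNS:
        if pattern.search(text):
            return label, "keyword_match"
    # No self-label found: Evaluative by exclusion.
    return "Evaluative", "assumed_evaluative"
```

Because the fallback always returns a value, every report gets a type. The type_note column records whether that type was matched or assumed, which is what allows the confirmed-only subset below.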
import pandas as pd
df = pd.read_csv('data/sroi_clean_dataset_v2.csv')
# Restrict to keyword-confirmed report types only (n=61)
df_confirmed = df[df['type_note'] != 'assumed_evaluative']
2.6 SROI Ratio Validation
- 43/64 VERIFIED: found in PDF text
- 21/64 UNVERIFIED: in database metadata but not found in our PDF text scan
Excluding the 21 unverified ratios moves the median from 4.44 to 4.53, which has no material impact on the results.
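A check of this kind can be sketched as a text search for the metadata ratio in common written forms (for example "4.44:1" or "£4.44"), followed by the median comparison. The accepted formats and the rounding tolerance are assumptions, not the exact rules in data_audit.py. The column names come from the data dictionary.

```python
import re

import pandas as pd


def ratio_forms(value: float) -> set[str]:
    """String forms a reported ratio might take (assumed: 2 or 1 decimals, or an integer)."""
    forms = {f"{value:.2f}", f"{value:.1f}"}
    if float(value).is_integer():
        forms.add(str(int(value)))
    return forms


def verify_ratio(value: float, pdf_text: str) -> str:
    """Classify a metadata ratio as VERIFIED, UNVERIFIED or N/A against the PDF text."""
    if pd.isna(value):
        return "N/A"
    for form in ratio_forms(value):
        num = re.escape(form)
        # Either "£4.44" (currency-prefixed) or "4.44:1" / "4.44 : 1".
        pattern = rf"[£$€]\s?{num}\b|\b{num}\s?:\s?1\b"
        if re.search(pattern, pdf_text):
            return "VERIFIED"
    return "UNVERIFIED"


def median_comparison(df: pd.DataFrame) -> tuple[float, float]:
    """Median SROI ratio for all reports with a ratio vs. verified ratios only."""
    all_median = df["sroi_ratio_value"].median()
    verified = df.loc[df["ratio_confidence"] == "VERIFIED", "sroi_ratio_value"]
    return all_median, verified.median()
```

Series.median skips missing values, so reports without a ratio do not affect either median.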
import pandas as pd
df = pd.read_csv('data/sroi_clean_dataset_v2.csv')
# Conservative: verified ratios only
df_verified_ratios = df[df['ratio_confidence'] == 'VERIFIED']  # n=43
2.7 Quality Score Limitations
| Principle | Known issue |
|---|---|
| P3 value | Keyword “value” is generic and likely inflates the compliance rate |
| P6 transparent | “data” and “evidence” are generic keywords; 122 weak flags |
| P1 stakeholders | 79 reports matched a single keyword only |
Direction of bias: Inflated P3 and P6 scores mean that the 41.2% average compliance is, if anything, an overestimate, so the true gap between principles and practice may be larger. This strengthens rather than weakens the paper’s main argument.
classification_evidence.csv contains the exact keyword and text context for every quality score assignment.
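The scoring logic behind these limitations can be sketched as below. Two parts are assumptions for illustration: the keyword lists, and the 0 to 2 rule (no keyword, one keyword flagged as weak, two or more keywords). The rescaling of the mean score to 0 to 100 follows the quality_pct definition in the data dictionary, assuming 2 is the maximum per principle. The second example shows that a generic word such as “value” is enough to earn a P3 credit.

```python
# Hypothetical keyword lists for two of the eight principles; the real lists
# (and P2, P4-P8) are defined in data_audit.py.
PRINCIPLE_KEYWORDS = {
    "p1": ["stakeholder", "beneficiar", "consultation"],
    "p3": ["financial prox", "valuation", "value"],
}


def score_principle(text: str, keywords: list[str]) -> tuple[int, bool, list[str]]:
    """Return (score 0-2, weak_flag, matched keywords) for one principle.

    Assumed rule: 0 = no keyword, 1 = exactly one distinct keyword (weak evidence),
    2 = two or more distinct keywords.
    """
    lowered = text.lower()
    matched = [kw for kw in keywords if kw in lowered]
    score = min(len(matched), 2)
    return score, score == 1, matched


def quality_pct(scores: list[int]) -> float:
    """Mean of the per-principle 0-2 scores, rescaled to 0-100."""
    return 100 * sum(scores) / (2 * len(scores))


print(score_principle("We consulted stakeholders.", PRINCIPLE_KEYWORDS["p1"]))  # (1, True, ['stakeholder'])
print(score_principle("Good value for money.", PRINCIPLE_KEYWORDS["p3"]))       # (1, True, ['value'])
print(quality_pct([2, 1, 1, 0, 2, 1, 0, 1]))                                   # 50.0
```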
3. How to Run
Prerequisites
pip install pandas numpy scipy matplotlib seaborn scikit-learn
Step 5 additionally requires the Quarto CLI, which is installed separately rather than via pip.
Steps
# Step 1: Generate audit + corrected dataset (v2)
python data/data_audit.py
# Step 2: Main analysis
python data/analysis.py
# Step 3: Simulations (bootstrap, Monte Carlo)
python data/simulations.py
# Step 4: Calculation elements
python data/factors_analysis.py
# Step 5: Compile website
quarto render
Expected runtime
| Script | Time |
|---|---|
| data_audit.py | ~2 min |
| analysis.py | ~30 sec |
| simulations.py | ~3 min (50k MC iterations) |
| factors_analysis.py | ~1 min |
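The five steps can be chained in a small driver script. The sketch below is a hypothetical run_all.py and is not part of the package. It runs the steps in the documented order with the current Python interpreter and stops at the first failing step. It skips the site render when the Quarto CLI is not on the PATH.

```python
import shutil
import subprocess
import sys

# Documented order: step 1 regenerates the v2 dataset before the analyses run.
STEPS = [
    [sys.executable, "data/data_audit.py"],
    [sys.executable, "data/analysis.py"],
    [sys.executable, "data/simulations.py"],
    [sys.executable, "data/factors_analysis.py"],
    ["quarto", "render"],
]


def run_pipeline(steps=STEPS, dry_run=False):
    """Run each step in order; return the commands that were (or would be) run."""
    executed = []
    for cmd in steps:
        if cmd[0] == "quarto" and shutil.which("quarto") is None:
            print("Quarto CLI not found on PATH; skipping site render.")
            continue
        print("->", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        executed.append(cmd)
    return executed
```

Call run_pipeline() from the repository root to execute everything. Use run_pipeline(dry_run=True) to print the commands without running them.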
4. Data Dictionary — sroi_clean_dataset_v2.csv
| Column | Type | Description |
|---|---|---|
| id | int | Unique report ID |
| sector_clean | str | Sector (15 categories) |
| sector_confidence | str | HIGH / MEDIUM / LOW |
| country_clean | str | Country or “Unknown” |
| country_confidence | str | HIGH / MEDIUM / LOW / NONE |
| year_clean | int | Report year from PDF |
| year_method | str | explicit_pattern / modal_800chars / modal_3000chars |
| year_confidence | str | HIGH / MEDIUM / LOW / NONE |
| report_type_clean | str | Forecast / Evaluative / Scoping |
| type_note | str | keyword_match / assumed_evaluative |
| type_confidence | str | HIGH / MEDIUM / LOW |
| assurance_clean | int | 1 = SVI assurance |
| sroi_ratio_value | float | SROI ratio (null if absent) |
| ratio_confidence | str | VERIFIED / UNVERIFIED / N/A |
| p1 – p8 | int 0–2 | Quality score per SVI principle |
| quality_pct | float | Mean score as % (0–100) |
| n_quality_flags | int | Number of weak-evidence quality flags |
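Several columns use fixed vocabularies, so a quick consistency check against the dictionary above can catch parsing surprises. One is worth knowing about: pandas.read_csv treats the literal string "N/A" as missing by default, so ratio_confidence values of N/A arrive as NaN. The sketch below uses check_schema, a hypothetical helper. It maps missing values back to "N/A" for that column and checks the score ranges.

```python
import pandas as pd

# Allowed values per categorical column, taken from the data dictionary.
ALLOWED = {
    "sector_confidence": {"HIGH", "MEDIUM", "LOW"},
    "country_confidence": {"HIGH", "MEDIUM", "LOW", "NONE"},
    "year_confidence": {"HIGH", "MEDIUM", "LOW", "NONE"},
    "report_type_clean": {"Forecast", "Evaluative", "Scoping"},
    "type_note": {"keyword_match", "assumed_evaluative"},
    "type_confidence": {"HIGH", "MEDIUM", "LOW"},
    "ratio_confidence": {"VERIFIED", "UNVERIFIED", "N/A"},
}
SCORE_COLUMNS = [f"p{i}" for i in range(1, 9)]


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the checked columns conform."""
    problems = []
    for col, allowed in ALLOWED.items():
        if col == "ratio_confidence":
            # read_csv parses "N/A" as NaN by default, so restore the label.
            values = set(df[col].fillna("N/A"))
        else:
            values = set(df[col].dropna())
        unexpected = values - allowed
        if unexpected:
            problems.append(f"{col}: unexpected values {sorted(map(str, unexpected))}")
    for col in SCORE_COLUMNS:
        if not df[col].between(0, 2).all():
            problems.append(f"{col}: scores outside 0-2")
    if not df["quality_pct"].between(0, 100).all():
        problems.append("quality_pct: values outside 0-100")
    return problems
```

Calling check_schema(pd.read_csv('data/sroi_clean_dataset_v2.csv')) returns an empty list when every checked column matches the dictionary.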
5. Robustness Checks
import pandas as pd
df = pd.read_csv('data/sroi_clean_dataset_v2.csv')
# Main result
print(df['quality_pct'].mean()) # ~41.2%
# R1: High-confidence sectors only
print(df[df['sector_confidence']=='HIGH']['quality_pct'].mean())
# R2: PDF-verified ratios only
df_r = df[df['ratio_confidence']=='VERIFIED']
print(df_r['sroi_ratio_value'].describe())
# R3: Keyword-confirmed report types
df_t = df[df['type_note'] != 'assumed_evaluative']
print(df_t.groupby('report_type_clean')['quality_pct'].mean())
# R4: Exclude weak-flag quality scores
df_strict = df[df['n_quality_flags'] == 0]
print(df_strict['quality_pct'].mean())
6. Downloads
| File | Description |
|---|---|
| sroi_clean_dataset_v2.csv | Main dataset (recommended) |
| audit_flags.csv | Per-report flags |
| classification_evidence.csv | Quality score evidence |
| sroi_factors.csv | Calculation elements |
| data_audit.py | Full audit script |
| analysis.py | Analysis code |
| simulations.py | Simulation code |
7. Citation
@article{SROIMetaAnalysis2026,
author = {[Author]},
title = {From Principles to Practice: A Systematic Content Analysis
of SROI Reporting in the Social Value International Database},
journal = {Voluntas},
year = {2026},
note = {Forthcoming. Replication: https://jcmunozmora.github.io/sroi-meta-analysis/}
}