Replication Package

Code, Data, Audit Trail, and Classification Evidence

Note

This page documents not just how to replicate the analysis, but what we know and don’t know about the data. All classification decisions are documented with confidence levels and evidence. Run data_audit.py to regenerate the full audit trail.

1. Package Structure

2026_sroi/
├── data/
│   ├── enrich_dataset.py             # v1: original enrichment pipeline
│   ├── data_audit.py                 # v2: rigorous audit + corrected dataset
│   ├── analysis.py                   # Main analysis
│   ├── simulations.py                # Bootstrap, permutation, Monte Carlo
│   ├── factors_analysis.py           # Calculation element extraction
│   │
│   ├── sroi_clean_dataset_v2.csv     # ← USE THIS (with confidence scores)
│   ├── sroi_ratios_subset.csv        # 64 reports with SROI ratios
│   ├── sroi_factors.csv              # 11 calculation elements × 383 reports
│   │
│   ├── audit_report.md               # ← Full data quality audit narrative
│   ├── audit_flags.csv               # ← Per-report classification flags
│   └── classification_evidence.csv   # ← Quality score evidence per report
└── figures/                          # All 16 figures

2. Data Quality Audit

All classification decisions are documented with confidence levels. Run data_audit.py to regenerate.

2.1 Executive Summary

| Variable | HIGH confidence | MEDIUM | LOW | Unknown/Missing |
|---|---|---|---|---|
| Sector | 33 (9%) | 191 (50%) | 159 (41%) | 0 |
| Country | 58 (15%) | 220 (57%) | 26 (7%) | 79 (21%) |
| Year | 200 (52%) | 33 (9%) | 115 (30%) | 35 (9%) |
| Report type | 61 keyword-confirmed | | 322 assumed | |
| SROI ratio | 43/64 PDF-verified | | 21 metadata-only | |

Important

Scope of uncertainty: Sector, country, and year classifications affect descriptive breakdowns. The main empirical findings (41.2% principle compliance, 5.2% complete adjustments, Monte Carlo correction) rest on quality scores from PDF text and are not sensitive to these classifications.

2.2 Sector Classification

Method: Keyword matching on first 3,000 characters of PDF + title. Winner = sector with highest keyword count.

Only 9% of reports reach HIGH confidence (margin ≥ 3 keywords over the runner-up sector). 41% are tied or near-tied between two sectors, which suggests that many of them are cross-sectoral programmes rather than simple misclassifications.

# Restrict to HIGH- or MEDIUM-confidence sector classifications
# (drops the 159 LOW reports that are tied or near-tied)
df_high_sector = df[df['sector_confidence'].isin(['HIGH', 'MEDIUM'])]  # n=224
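
The keyword-count method described above can be sketched roughly as follows. This is a minimal illustration, not the code in data_audit.py: the keyword lists and the `classify_sector` name are hypothetical, and only the HIGH threshold (margin ≥ 3) comes from this page. The MEDIUM/LOW cut-off at a margin of 2 is an assumption.

```python
# Hypothetical keyword lists; the real pipeline uses 15 sector categories.
SECTOR_KEYWORDS = {
    "Health": ["health", "patient", "clinical", "wellbeing"],
    "Education": ["school", "education", "pupil", "learning"],
    "Employment": ["employment", "job", "training", "skills"],
}


def classify_sector(title, pdf_text, window=3000):
    """Return (sector, margin, confidence) from substring keyword counts."""
    # Title plus the first `window` characters of the PDF text.
    text = (title + " " + pdf_text[:window]).lower()
    counts = {
        sector: sum(text.count(kw) for kw in keywords)
        for sector, keywords in SECTOR_KEYWORDS.items()
    }
    # Rank sectors by keyword count, highest first.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_n), (_, second_n) = ranked[0], ranked[1]
    margin = best_n - second_n
    if margin >= 3:        # threshold stated on this page
        confidence = "HIGH"
    elif margin == 2:      # assumed cut-off, not from the source
        confidence = "MEDIUM"
    else:                  # tied or near-tied
        confidence = "LOW"
    return best, margin, confidence
```

Under these assumptions, a report whose top two sectors have equal counts is returned with margin 0 and LOW confidence, which is the case the 41% figure above refers to.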

2.3 Country Classification

79 reports (21%) remain Unknown even after extending search to 15,000 characters. These are excluded from geographic analyses.

2.4 Year Extraction — Key Fix

Bug in v1: Used min(valid_years) → picked up historical reference years (e.g., “SROI was developed in 2001”) producing impossible report dates.

v2 fix: Hierarchical strategy:

| Strategy | N reports | Confidence |
|---|---|---|
| Explicit date pattern (“March 2016”) | 200 | HIGH |
| Modal year in first 800 chars | 33 | MEDIUM |
| Modal year in first 3,000 chars | 115 | LOW |
| Not found | 35 | NONE |
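
A minimal sketch of this hierarchy, assuming English month names, a plausible-year range of 2000 to 2026, and a 3,000-character search window for the explicit pattern (none of these three details is stated on this page). The function names are illustrative and are not those used in data_audit.py.

```python
import re
from collections import Counter

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
EXPLICIT_DATE = re.compile(rf"\b(?:{MONTHS})\s+((?:19|20)\d{{2}})\b")
ANY_YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def modal_year(text, lo=2000, hi=2026):
    """Most frequent plausible year in text, or None if there is none."""
    years = [int(y) for y in ANY_YEAR.findall(text) if lo <= int(y) <= hi]
    if not years:
        return None
    return Counter(years).most_common(1)[0][0]


def extract_year(pdf_text, lo=2000, hi=2026):
    """Return (year, method, confidence) using the three-step hierarchy."""
    # 1. Explicit "Month YYYY" pattern -> HIGH
    for match in EXPLICIT_DATE.finditer(pdf_text[:3000]):
        year = int(match.group(1))
        if lo <= year <= hi:
            return year, "explicit_pattern", "HIGH"
    # 2. Modal year in first 800 chars -> MEDIUM; 3. in first 3,000 -> LOW
    for window, method, conf in (
        (800, "modal_800chars", "MEDIUM"),
        (3000, "modal_3000chars", "LOW"),
    ):
        year = modal_year(pdf_text[:window], lo, hi)
        if year is not None:
            return year, method, conf
    return None, None, "NONE"
```

Taking the modal year rather than the minimum is what avoids the v1 problem: a single mention of 2001 is outvoted by repeated mentions of the actual report year.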

2.5 Report Type — Critical Transparency Note

Important

322 of 339 “Evaluative” reports are assumed, not keyword-confirmed.

| Type | N | Confirmed | Assumed |
|---|---|---|---|
| Evaluative | 339 | 17 | 322 |
| Forecast | 40 | 40 | 0 |
| Scoping | 4 | 4 | 0 |

The logic: Forecast and Scoping reports consistently self-label, so unlabelled reports are classified as Evaluative by exclusion. This is an assumption, not an observation, and these reports are marked with type_note = 'assumed_evaluative'.

# Restrict to keyword-confirmed type only (n=61)
df_confirmed = df[df['type_note'] != 'assumed_evaluative']
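
The by-exclusion rule can be illustrated with a short sketch. The regular expressions and the order in which labels are checked are assumptions made for illustration. Only the two type_note values and the three report types come from this page.

```python
import re

# Hypothetical self-labelling patterns, checked in this order.
TYPE_PATTERNS = {
    "Forecast": re.compile(r"\bforecast\b", re.IGNORECASE),
    "Scoping": re.compile(r"\bscoping\b", re.IGNORECASE),
    "Evaluative": re.compile(r"\bevaluative\b", re.IGNORECASE),
}


def classify_report_type(text):
    """Return (report_type, type_note) for a report's text."""
    for report_type, pattern in TYPE_PATTERNS.items():
        if pattern.search(text):
            return report_type, "keyword_match"
    # No self-label found: Evaluative by exclusion, flagged as assumed.
    return "Evaluative", "assumed_evaluative"
```

Keeping the fallback explicit in type_note is what makes the restriction to keyword-confirmed reports possible later.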

2.6 SROI Ratio Validation

  • 43/64 VERIFIED in PDF text
  • 21/64 UNVERIFIED: in database metadata but not found in our PDF text scan

Excluding the 21 unverified ratios changes the median from 4.44 to 4.53 — no material impact on results.

# Conservative: verified ratios only
df_verified_ratios = df[df['ratio_confidence'] == 'VERIFIED']  # n=43
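
One way such a PDF text check could work is sketched below. It is a loose illustration under stated assumptions: the ratio is assumed to appear as a standalone number with the same digits as in the metadata, and the `ratio_confidence` function is hypothetical. The actual matching rules in data_audit.py may differ.

```python
import re


def ratio_confidence(ratio, pdf_text):
    """Return VERIFIED / UNVERIFIED / N/A for a metadata ratio."""
    if ratio is None:
        return "N/A"
    # Try the two-decimal form ("2.50") and the shortest form ("2.5").
    for form in {f"{ratio:.2f}", f"{ratio:g}"}:
        # Require that the number is not part of a longer number.
        pattern = rf"(?<![\d.]){re.escape(form)}(?!\d)"
        if re.search(pattern, pdf_text):
            return "VERIFIED"
    return "UNVERIFIED"
```

A check like this can produce false negatives when a report rounds the ratio differently or states it only in a chart, which is one reason an unverified ratio is not necessarily a wrong one.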

2.7 Quality Score Limitations

| Principle | Known issue |
|---|---|
| P3 value | Keyword “value” is generic; likely inflates compliance rate |
| P6 transparent | “data”, “evidence” are generic; 122 weak flags |
| P1 stakeholders | 79 reports: single keyword only |

Direction of bias: Because generic keywords inflate the P3 and P6 scores, our 41.2% average compliance is likely an overestimate. The estimate is therefore conservative with respect to the paper’s main argument: the true compliance gap may be larger.

classification_evidence.csv contains the exact keyword and text context for every quality score assignment.


3. How to Run

Prerequisites

pip install pandas numpy scipy matplotlib seaborn scikit-learn

Steps

# Step 1: Generate audit + corrected dataset (v2)
python data/data_audit.py

# Step 2: Main analysis
python data/analysis.py

# Step 3: Simulations (bootstrap, Monte Carlo)
python data/simulations.py

# Step 4: Calculation elements
python data/factors_analysis.py

# Step 5: Compile website
quarto render

Expected runtime

| Script | Time |
|---|---|
| data_audit.py | ~2 min |
| analysis.py | ~30 sec |
| simulations.py | ~3 min (50k MC iterations) |
| factors_analysis.py | ~1 min |

4. Data Dictionary — sroi_clean_dataset_v2.csv

| Column | Type | Description |
|---|---|---|
| id | int | Unique report ID |
| sector_clean | str | Sector (15 categories) |
| sector_confidence | str | HIGH / MEDIUM / LOW |
| country_clean | str | Country or “Unknown” |
| country_confidence | str | HIGH / MEDIUM / LOW / NONE |
| year_clean | int | Report year from PDF |
| year_method | str | explicit_pattern / modal_800chars / modal_3000chars |
| year_confidence | str | HIGH / MEDIUM / LOW / NONE |
| report_type_clean | str | Forecast / Evaluative / Scoping |
| type_note | str | keyword_match / assumed_evaluative |
| type_confidence | str | HIGH / MEDIUM / LOW |
| assurance_clean | int | 1 = SVI assurance |
| sroi_ratio_value | float | SROI ratio (null if absent) |
| ratio_confidence | str | VERIFIED / UNVERIFIED / N/A |
| p1 … p8 | int | Quality score per SVI principle (0–2) |
| quality_pct | float | Mean score as % (0–100) |
| n_quality_flags | int | Number of weak-evidence quality flags |
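
As a worked illustration of how quality_pct relates to the per-principle scores, the sketch below assumes that the percentage is the mean of the eight 0–2 scores rescaled to 0–100. That formula is an inference from the column descriptions, not a confirmed detail of the pipeline, and the function is hypothetical.

```python
def quality_pct(scores, max_per_principle=2):
    """Mean of per-principle scores, rescaled to 0-100 (assumed formula)."""
    if len(scores) != 8:
        raise ValueError("expected one score per SVI principle (8 in total)")
    if any(not 0 <= s <= max_per_principle for s in scores):
        raise ValueError("each score must lie between 0 and 2")
    return 100 * sum(scores) / (len(scores) * max_per_principle)


# Example: scores for the eight principles of one hypothetical report.
example_scores = [2, 1, 0, 1, 2, 0, 0, 1]
print(quality_pct(example_scores))
```

Under this assumption, a report scoring 7 points out of a possible 16 has a quality_pct of 43.75.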

5. Robustness Checks

import pandas as pd

df = pd.read_csv('data/sroi_clean_dataset_v2.csv')

# Main result
print(df['quality_pct'].mean())  # ~41.2%

# R1: High-confidence sectors only
print(df[df['sector_confidence']=='HIGH']['quality_pct'].mean())

# R2: PDF-verified ratios only
df_r = df[df['ratio_confidence']=='VERIFIED']
print(df_r['sroi_ratio_value'].describe())

# R3: Keyword-confirmed report types
df_t = df[df['type_note'] != 'assumed_evaluative']
print(df_t.groupby('report_type_clean')['quality_pct'].mean())

# R4: Exclude weak-flag quality scores
df_strict = df[df['n_quality_flags'] == 0]
print(df_strict['quality_pct'].mean())

6. Downloads

| File | Description |
|---|---|
| sroi_clean_dataset_v2.csv | Main dataset (recommended) |
| audit_flags.csv | Per-report flags |
| classification_evidence.csv | Quality score evidence |
| sroi_factors.csv | Calculation elements |
| data_audit.py | Full audit script |
| analysis.py | Analysis code |
| simulations.py | Simulation code |

7. Citation

@article{SROIMetaAnalysis2026,
  author  = {[Author]},
  title   = {From Principles to Practice: A Systematic Content Analysis
             of SROI Reporting in the Social Value International Database},
  journal = {Voluntas},
  year    = {2026},
  note    = {Forthcoming. Replication: https://jcmunozmora.github.io/sroi-meta-analysis/}
}