Replication Package

Code, Data, Audit Trail, and Classification Evidence

Note

This page documents not just how to replicate the analysis, but what we know and don’t know about the data. All classification decisions are documented with confidence levels and evidence. Run data_audit.py to regenerate the full audit trail.

1. Package Structure

2026_sroi/
├── data/
│   ├── enrich_dataset.py             # v1: original enrichment pipeline
│   ├── data_audit.py                 # v2: rigorous audit + corrected dataset
│   ├── analysis.py                   # Main analysis
│   ├── simulations.py                # Bootstrap, permutation, Monte Carlo
│   ├── factors_analysis.py           # Calculation element extraction
│   │
│   ├── sroi_clean_dataset_v2.csv     # ← USE THIS (with confidence scores)
│   ├── sroi_ratios_subset.csv        # 64 reports with SROI ratios
│   ├── sroi_factors.csv              # 11 calculation elements × 383 reports
│   │
│   ├── audit_report.md               # ← Full data quality audit narrative
│   ├── audit_flags.csv               # ← Per-report classification flags
│   └── classification_evidence.csv   # ← Quality score evidence per report
└── figures/                          # All 16 figures

2. Data Quality Audit

All classification decisions are documented with confidence levels. Run data_audit.py to regenerate.

2.1 Executive Summary

Variable	HIGH confidence	MEDIUM	LOW	Unknown/Missing
Sector	33 (9%)	191 (50%)	159 (41%)	0
Country	58 (15%)	220 (57%)	26 (7%)	79 (21%)
Year	200 (52%)	33 (9%)	115 (30%)	35 (9%)
Report type	61 keyword-confirmed	—	322 assumed	—
SROI ratio	43/64 PDF-verified	—	21 metadata-only	—

Important

Scope of uncertainty: Sector, country, and year classifications affect descriptive breakdowns. The main empirical findings (41.2% principle compliance, 5.2% complete adjustments, Monte Carlo correction) rest on quality scores from PDF text and are not sensitive to these classifications.

2.2 Sector Classification

Method: Keyword matching on the report title plus the first 3,000 characters of PDF text. The winning sector is the one with the highest keyword count.

Only 9% of reports reach HIGH confidence (winning margin ≥ 3 keywords). For 41% of reports (LOW confidence), the top two sectors are tied or near-tied; many of these are likely cross-sectoral programmes rather than misclassifications.

# Exclude LOW-confidence (tied or near-tied) sector classifications
# Keeps HIGH + MEDIUM: 33 + 191 = 224 reports
df_high_sector = df[df['sector_confidence'].isin(['HIGH', 'MEDIUM'])]  # n=224
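
The keyword-count rule above can be sketched roughly as follows. This is a minimal illustration, not the code in data_audit.py: the keyword lists, the function name classify_sector, and the MEDIUM cutoff (margin of 2) are assumptions. Only the HIGH rule (margin ≥ 3) and the 3,000-character window are stated on this page.

```python
from collections import Counter

# Hypothetical keyword lists for illustration; the real lists are in data_audit.py.
SECTOR_KEYWORDS = {
    "Health": ["health", "patient", "clinic"],
    "Education": ["school", "pupil", "teacher"],
    "Employment": ["job", "employment", "training"],
}


def classify_sector(title, pdf_text, high_margin=3, medium_margin=2, n_chars=3000):
    """Return (sector, confidence, margin) from substring keyword counts."""
    text = f"{title} {pdf_text[:n_chars]}".lower()
    counts = Counter(
        {sector: sum(text.count(kw) for kw in kws)
         for sector, kws in SECTOR_KEYWORDS.items()}
    )
    # Top two sectors by keyword count; ties keep dictionary order.
    (winner, top), (_, runner_up) = counts.most_common(2)
    margin = top - runner_up
    if margin >= high_margin:        # rule stated on this page
        confidence = "HIGH"
    elif margin >= medium_margin:    # assumed cutoff, not documented here
        confidence = "MEDIUM"
    else:                            # tied or near-tied
        confidence = "LOW"
    return winner, confidence, margin


print(classify_sector("Health clinic evaluation",
                      "Patient health outcomes at the clinic"))
```

Returning the margin alongside the label makes it possible to re-run the analysis under a stricter or looser confidence cutoff without reclassifying the reports.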

2.3 Country Classification

79 reports (21%) remain Unknown even after extending search to 15,000 characters. These are excluded from geographic analyses.

2.4 Year Extraction — Key Fix

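Bug in v1: The pipeline took min(valid_years), so it picked up historical reference years (e.g., “SROI was developed in 2001”) and assigned some reports implausibly early dates.

v2 fix: A hierarchical strategy that tries each method in order and stops at the first one that yields a year: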

Strategy	N reports	Confidence
Explicit date pattern (“March 2016”)	200	HIGH
Modal year in first 800 chars	33	MEDIUM
Modal year in first 3,000 chars	115	LOW
Not found	35	NONE
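
A rough sketch of this hierarchy, for readers who want to see the control flow. The regular expressions, the valid-year range (1990 to 2026), and the choice to search for the explicit pattern within the first 3,000 characters are assumptions; the method labels match the year_method values in the data dictionary.

```python
import re
from collections import Counter

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Assumed pattern for an explicit date such as "March 2016".
EXPLICIT = re.compile(rf"\b(?:{MONTHS})\s+((?:19|20)\d{{2}})\b")
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")


def modal_year(text, lo, hi):
    """Most frequent plausible year in text, or None if there is none."""
    years = [int(y) for y in YEAR.findall(text) if lo <= int(y) <= hi]
    return Counter(years).most_common(1)[0][0] if years else None


def extract_year(text, lo=1990, hi=2026):
    """Return (year, year_method, year_confidence), trying strategies in order."""
    match = EXPLICIT.search(text[:3000])
    if match and lo <= int(match.group(1)) <= hi:
        return int(match.group(1)), "explicit_pattern", "HIGH"
    for n_chars, method, confidence in [(800, "modal_800chars", "MEDIUM"),
                                        (3000, "modal_3000chars", "LOW")]:
        year = modal_year(text[:n_chars], lo, hi)
        if year is not None:
            return year, method, confidence
    return None, None, "NONE"


sample = ("The SROI framework dates to 2001. Fieldwork ran in 2014; "
          "the 2014 cohort is analysed.")
print(extract_year(sample))
```

On the sample text a min-based rule would return 2001, the historical reference year, whereas the modal rule returns 2014.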

2.5 Report Type — Critical Transparency Note

Important

322 of 339 “Evaluative” reports are assumed, not keyword-confirmed.

Type	N	Confirmed	Assumed
Evaluative	339	17	322
Forecast	40	40	0
Scoping	4	4	0

The logic: Forecast and Scoping reports consistently self-label, so unlabelled reports are classified as Evaluative by exclusion. This is an assumption rather than a verified label; these reports are marked with type_note = 'assumed_evaluative'.

# Restrict to keyword-confirmed type only (n=61)
df_confirmed = df[df['type_note'] != 'assumed_evaluative']
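
The classification-by-exclusion rule might look something like the sketch below. The phrase lists and the function name classify_report_type are hypothetical; the actual keywords are defined in data_audit.py. The type_note values are those listed in the data dictionary.

```python
# Hypothetical self-label phrases for illustration only.
TYPE_KEYWORDS = {
    "Forecast": ["forecast sroi", "forecast analysis"],
    "Scoping": ["scoping study", "scoping report"],
    "Evaluative": ["evaluative sroi", "evaluative analysis"],
}


def classify_report_type(text):
    """Return (report_type, type_note); unlabelled reports default to Evaluative."""
    lowered = text.lower()
    for report_type, phrases in TYPE_KEYWORDS.items():
        if any(phrase in lowered for phrase in phrases):
            return report_type, "keyword_match"
    # No self-label found: Evaluative by exclusion, flagged as an assumption.
    return "Evaluative", "assumed_evaluative"


print(classify_report_type("A Forecast SROI of the programme"))
print(classify_report_type("Annual impact report"))
```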

2.6 SROI Ratio Validation

43/64 VERIFIED in PDF text
21/64 UNVERIFIED: in database metadata but not found in our PDF text scan

Excluding the 21 unverified ratios changes the median from 4.44 to 4.53 (a shift of 0.09), which has no material impact on the results.

# Conservative: verified ratios only
df_verified_ratios = df[df['ratio_confidence'] == 'VERIFIED']  # n=43
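
The median comparison can be reproduced along these lines. The helper name ratio_median_shift is an assumption, and the toy values below are illustrative only; they are not taken from the dataset. To check the reported figures, pass the DataFrame loaded from sroi_clean_dataset_v2.csv instead.

```python
import pandas as pd


def ratio_median_shift(df):
    """Compare the median SROI ratio across all ratios vs PDF-verified ones."""
    has_ratio = df.dropna(subset=["sroi_ratio_value"])
    verified = has_ratio[has_ratio["ratio_confidence"] == "VERIFIED"]
    return {
        "n_all": len(has_ratio),
        "median_all": has_ratio["sroi_ratio_value"].median(),
        "n_verified": len(verified),
        "median_verified": verified["sroi_ratio_value"].median(),
    }


# Toy data for illustration only (not from the dataset).
toy = pd.DataFrame({
    "sroi_ratio_value": [2.0, 4.0, 6.0, 8.0, None],
    "ratio_confidence": ["VERIFIED", "UNVERIFIED", "VERIFIED", "VERIFIED", None],
})
print(ratio_median_shift(toy))
```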

2.7 Quality Score Limitations

Principle	Known issue
P3 value	Keyword “value” is generic — likely inflates compliance rate
P6 transparent	“data”, “evidence” are generic — 122 weak flags
P1 stakeholders	79 reports: single keyword only

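Direction of bias: Generic keywords inflate the P3 and P6 scores, so the 41.2% average compliance likely overstates true compliance. It is therefore a conservative estimate of the compliance gap: the true gap may be larger, which strengthens the paper’s main argument.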

classification_evidence.csv contains the exact keyword and text context for every quality score assignment.
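
One way to gauge how much the generic P3 and P6 keywords matter is to recompute the score without those two principles. The sketch below assumes quality_pct is the mean of the p1 to p8 scores (each 0 to 2) rescaled to 0 to 100, which is how the data dictionary describes it; the helper name and the toy rows are illustrative and not drawn from the dataset.

```python
import pandas as pd


def quality_pct(df, principles):
    """Mean 0-2 score over the given principles, rescaled to 0-100."""
    cols = [f"p{i}" for i in principles]
    return df[cols].mean(axis=1) / 2 * 100


# Toy rows for illustration only (not from the dataset).
toy = pd.DataFrame(
    [[2, 0, 2, 0, 0, 2, 0, 0],
     [1, 1, 2, 1, 1, 2, 1, 1]],
    columns=[f"p{i}" for i in range(1, 9)],
)

full = quality_pct(toy, range(1, 9))             # all eight principles
strict = quality_pct(toy, [1, 2, 4, 5, 7, 8])    # drop P3 and P6
print(pd.DataFrame({"full": full, "without_p3_p6": strict}))
```

If P3 and P6 are inflated, the score without them should fall at or below the full score, as it does for both toy rows.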

3. How to Run

Prerequisites

pip install pandas numpy scipy matplotlib seaborn scikit-learn

Steps

# Step 1: Generate audit + corrected dataset (v2)
python data/data_audit.py

# Step 2: Main analysis
python data/analysis.py

# Step 3: Simulations (bootstrap, permutation, Monte Carlo)
python data/simulations.py

# Step 4: Calculation elements
python data/factors_analysis.py

# Step 5: Compile website
quarto render

Expected runtime

Script	Time
`data_audit.py`	~2 min
`analysis.py`	~30 sec
`simulations.py`	~3 min (50k MC iterations)
`factors_analysis.py`	~1 min

4. Data Dictionary — sroi_clean_dataset_v2.csv

Column	Type	Description
`id`	int	Unique report ID
`sector_clean`	str	Sector (15 categories)
`sector_confidence`	str	HIGH / MEDIUM / LOW
`country_clean`	str	Country or “Unknown”
`country_confidence`	str	HIGH / MEDIUM / LOW / NONE
`year_clean`	int	Report year from PDF
`year_method`	str	explicit_pattern / modal_800chars / modal_3000chars
`year_confidence`	str	HIGH / MEDIUM / LOW / NONE
`report_type_clean`	str	Forecast / Evaluative / Scoping
`type_note`	str	keyword_match / assumed_evaluative
`type_confidence`	str	HIGH / MEDIUM / LOW
`assurance_clean`	int	1 = SVI assurance
`sroi_ratio_value`	float	SROI ratio (null if absent)
`ratio_confidence`	str	VERIFIED / UNVERIFIED / N/A
`p1` – `p8`	int 0–2	Quality score per SVI principle
`quality_pct`	float	Mean score as % (0–100)
`n_quality_flags`	int	Number of weak-evidence quality flags
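
A note on loading, with a small sketch. By default pandas.read_csv treats the string "N/A" as missing, so if ratio_confidence is stored as the literal text N/A (an assumption about the file), that category would be read as NaN. The helpers below, whose names are hypothetical, keep it as a string and check the confidence columns against the levels listed in the dictionary. The toy CSV is illustrative only.

```python
import io

import pandas as pd

# Allowed levels as listed in the data dictionary above.
ALLOWED = {
    "sector_confidence": {"HIGH", "MEDIUM", "LOW"},
    "country_confidence": {"HIGH", "MEDIUM", "LOW", "NONE"},
    "year_confidence": {"HIGH", "MEDIUM", "LOW", "NONE"},
    "ratio_confidence": {"VERIFIED", "UNVERIFIED", "N/A"},
}


def load_dataset(source):
    """Read the CSV, treating only empty fields as missing so 'N/A' survives."""
    return pd.read_csv(source, keep_default_na=False, na_values=[""])


def unexpected_values(df):
    """Map each confidence column to any values outside its allowed set."""
    problems = {}
    for col, allowed in ALLOWED.items():
        if col in df.columns:
            extra = set(df[col].dropna().unique()) - allowed
            if extra:
                problems[col] = extra
    return problems


# Toy CSV for illustration only (not from the dataset).
TOY_CSV = (
    "id,sector_confidence,sroi_ratio_value,ratio_confidence\n"
    "1,HIGH,4.5,VERIFIED\n"
    "2,LOW,,N/A\n"
)
df_toy = load_dataset(io.StringIO(TOY_CSV))
print(df_toy)
print(unexpected_values(df_toy))
```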

5. Robustness Checks

import pandas as pd

df = pd.read_csv('data/sroi_clean_dataset_v2.csv')

# Main result
print(df['quality_pct'].mean())  # ~41.2%

# R1: High-confidence sectors only
print(df[df['sector_confidence']=='HIGH']['quality_pct'].mean())

# R2: PDF-verified ratios only
df_r = df[df['ratio_confidence']=='VERIFIED']
print(df_r['sroi_ratio_value'].describe())

# R3: Keyword-confirmed report types
df_t = df[df['type_note'] != 'assumed_evaluative']
print(df_t.groupby('report_type_clean')['quality_pct'].mean())

# R4: Exclude weak-flag quality scores
df_strict = df[df['n_quality_flags'] == 0]
print(df_strict['quality_pct'].mean())

6. Downloads

File	Description
sroi_clean_dataset_v2.csv	Main dataset (recommended)
audit_flags.csv	Per-report flags
classification_evidence.csv	Quality score evidence
sroi_factors.csv	Calculation elements
data_audit.py	Full audit script
analysis.py	Analysis code
simulations.py	Simulation code

7. Citation

@article{SROIMetaAnalysis2026,
  author  = {[Author]},
  title   = {From Principles to Practice: A Systematic Content Analysis
             of SROI Reporting in the Social Value International Database},
  journal = {Voluntas},
  year    = {2026},
  note    = {Forthcoming. Replication: https://jcmunozmora.github.io/sroi-meta-analysis/}
}
