The COKI Open Access Dataset is available in JSON Lines format. See below for dataset releases, license, how to cite the website and dataset, attributions and the dataset schema.
Releases
| Release date | Download | 
|---|---|
| 2025-08-18 | Download coki-oa-dataset.zip | 
| 2025-07-21 | Download coki-oa-dataset.zip | 
| 2025-04-07 | Download coki-oa-dataset.zip | 
| 2025-02-13 | Download coki-oa-dataset.zip | 
| 2024-12-08 | Download coki-oa-dataset.zip | 
License
The COKI Open Access Dataset © 2022 by Curtin University is licensed under CC BY 4.0.
Citing
To cite the COKI Open Access Dashboard, please use the following citation:
Diprose, J., Hosking, R., Rigoni, R., Roelofs, A., Chien, T., Napier, K., Wilson, K., Huang, C., Handcock, R., Montgomery, L., & Neylon, C. (2023). A User-Friendly Dashboard for Tracking Global Open Access Performance. The Journal of Electronic Publishing 26(1). doi: https://doi.org/10.3998/jep.3398
If you use the website code, please cite it as below:
James P. Diprose, Richard Hosking, Richard Rigoni, Aniek Roelofs, Alex Massen-Hane, Kathryn R. Napier, Tuan-Yow Chien, Katie S. Wilson, Lucy Montgomery, & Cameron Neylon. (2022). COKI Open Access Website. Zenodo. https://doi.org/10.5281/zenodo.6374486
If you use this dataset, please cite it as below:
Richard Hosking, James P. Diprose, Aniek Roelofs, Tuan-Yow Chien, Lucy Montgomery, & Cameron Neylon. (2022). COKI Open Access Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6399462
For other citation formats follow the doi.org links in the above citations.
The COKI Open Access Dataset contains information from:
| Field | Type | Description | 
|---|---|---|
| id | String | The country id; an ISO 3166-1 alpha-3 country code. | 
| name | String | The country name. | 
| subregion | String | The name of the subregion the country is located in. | 
| region | String | The name of the region the country is located in. | 
| start_year | Integer | The start year of data used to calculate the statistics. | 
| end_year | Integer | The end year of data used to calculate the statistics. | 
| stats | PublicationStats | The aggregated publication statistics for this country, for all time. | 
| years | List<Year> | The publication statistics for each year. | 
Table 1. Country Schema.
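Because the dataset is in JSON Lines format, each line can be parsed as one JSON object. The following is a minimal sketch of reading country records, assuming one Country object per line. The helper name `iter_jsonl` and the sample record are illustrative only: the values are made up, and the nested `stats` and `years` fields are trimmed for brevity.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Illustrative record only; the values are not taken from the dataset.
sample_lines = [
    json.dumps({
        "id": "NZL",
        "name": "New Zealand",
        "subregion": "Australia and New Zealand",
        "region": "Oceania",
        "start_year": 2000,
        "end_year": 2023,
        "stats": {"n_outputs": 100, "n_outputs_open": 40},
        "years": [],
    }),
    "",  # blank lines are skipped
]

# Index the records by their ISO 3166-1 alpha-3 id for quick lookup.
countries = {record["id"]: record for record in iter_jsonl(sample_lines)}
print(countries["NZL"]["name"], countries["NZL"]["region"])
```

With a real release, the same helper could be passed an open file handle instead of a list, since a text file iterates line by line.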
| Field | Type | Description | 
|---|---|---|
| id | String | The institution id; a Research Organization Registry identifier. | 
| name | String | The institution name. | 
| country_name | String | The name of the country where the institution is located. | 
| country_code | String | The three-letter ISO 3166-1 alpha-3 code of the country where the institution is located. | 
| subregion | String | The name of the subregion where the institution is located. | 
| region | String | The name of the region where the institution is located. | 
| institution_types | List<String> | A list of institution types that apply to this institution. Each instance can be one of: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, Other. | 
| start_year | Integer | The start year of data used to calculate the statistics. | 
| end_year | Integer | The end year of data used to calculate the statistics. | 
| stats | PublicationStats | The aggregated publication statistics for this institution, for all time. | 
| years | List<Year> | The publication statistics for each year. | 
Table 2. Institution Schema.
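Since `institution_types` is a list, an institution can match more than one type. The sketch below shows one way to select institutions by type and, optionally, by `country_code`. The function name `select_institutions` and every sample record are hypothetical; the ids are placeholders rather than real Research Organization Registry identifiers.

```python
import json

# Hypothetical sample lines; ids and names are placeholders, not dataset values.
SAMPLE_LINES = [
    json.dumps({"id": "placeholder-1", "name": "Example University",
                "country_code": "AUS", "institution_types": ["Education"]}),
    json.dumps({"id": "placeholder-2", "name": "Example Hospital",
                "country_code": "AUS", "institution_types": ["Healthcare", "Nonprofit"]}),
    json.dumps({"id": "placeholder-3", "name": "Example Institute",
                "country_code": "NZL", "institution_types": ["Education", "Facility"]}),
]


def select_institutions(lines, institution_type, country_code=None):
    """Return records that list institution_type, optionally limited to one country."""
    selected = []
    for line in lines:
        record = json.loads(line)
        # An institution matches if the type appears anywhere in its list.
        if institution_type not in record.get("institution_types", []):
            continue
        if country_code is not None and record.get("country_code") != country_code:
            continue
        selected.append(record)
    return selected


education_in_aus = select_institutions(SAMPLE_LINES, "Education", country_code="AUS")
print([record["name"] for record in education_in_aus])
```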
| Field | Type | Description | 
|---|---|---|
| n_citations | Integer | The total number of outputs cited. | 
| n_outputs | Integer | The total number of outputs published. | 
| n_outputs_open | Integer | The total number of open outputs. | 
| n_outputs_publisher_open | Integer | The total number of outputs published as Publisher Open. | 
| n_outputs_publisher_open_only | Integer | The total number of outputs published only as Publisher Open (and not Other Platform Open or Closed). | 
| n_outputs_both | Integer | The total number of outputs published that are both Publisher Open and Other Platform Open. | 
| n_outputs_other_platform_open | Integer | The total number of outputs published as Other Platform Open. | 
| n_outputs_other_platform_open_only | Integer | The total number of outputs published only as Other Platform Open (and not Publisher Open or Closed). | 
| n_outputs_closed | Integer | The total number of outputs published as Closed. | 
| n_outputs_oa_journal | Integer | Publisher Open Breakdown: the total number of outputs published in an Open Access Journal. | 
| n_outputs_hybrid | Integer | Publisher Open Breakdown: the total number of outputs made accessible in a Subscription Journal with an open license. | 
| n_outputs_no_guarantees | Integer | Publisher Open Breakdown: the total number of outputs made accessible in a Subscription Publisher with no reuse rights. | 
| p_outputs_open | Float | The percentage of open outputs. | 
| p_outputs_publisher_open | Float | The percentage of outputs published as Publisher Open. | 
| p_outputs_publisher_open_only | Float | The percentage of outputs published only as Publisher Open (and not Other Platform Open or Closed). | 
| p_outputs_both | Float | The percentage of outputs published that are both Publisher Open and Other Platform Open. | 
| p_outputs_other_platform_open | Float | The percentage of outputs published as Other Platform Open. | 
| p_outputs_other_platform_open_only | Float | The percentage of outputs published only as Other Platform Open (and not Publisher Open or Closed). | 
| p_outputs_closed | Float | The percentage of outputs published as Closed. | 
| p_outputs_oa_journal | Float | The percentage of Publisher Open outputs published in an Open Access Journal. | 
| p_outputs_hybrid | Float | The percentage of Publisher Open outputs made accessible in a Subscription Journal with an open license. | 
| p_outputs_no_guarantees | Float | The percentage of Publisher Open outputs made accessible in a Subscription Publisher with no reuse rights. | 
Table 3. PublicationStats Schema.
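The field descriptions suggest some relationships between the counts and percentages, which the sketch below checks on a made-up record. Three assumptions are inferred from the descriptions and not confirmed by the dataset documentation: that the four exclusive categories (Publisher Open only, both, Other Platform Open only, Closed) add up to `n_outputs`; that percentages are on a 0 to 100 scale; and that the Publisher Open breakdown percentages use `n_outputs_publisher_open` as the denominator. The helper names are hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole (0 to 100), or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def categories_sum_to_total(stats):
    """Check the assumed partition of n_outputs into four exclusive categories."""
    parts = (
        stats["n_outputs_publisher_open_only"]
        + stats["n_outputs_both"]
        + stats["n_outputs_other_platform_open_only"]
        + stats["n_outputs_closed"]
    )
    return parts == stats["n_outputs"]


# Made-up counts for illustration; not taken from the dataset.
sample_stats = {
    "n_outputs": 200,
    "n_outputs_open": 120,
    "n_outputs_publisher_open": 80,
    "n_outputs_publisher_open_only": 50,
    "n_outputs_both": 30,
    "n_outputs_other_platform_open": 70,
    "n_outputs_other_platform_open_only": 40,
    "n_outputs_closed": 80,
    "n_outputs_oa_journal": 40,
}

# Share of all outputs that are open (assumed denominator: n_outputs).
p_open = percent(sample_stats["n_outputs_open"], sample_stats["n_outputs"])

# Share of Publisher Open outputs in an Open Access Journal
# (assumed denominator: n_outputs_publisher_open).
p_oa_journal = percent(
    sample_stats["n_outputs_oa_journal"], sample_stats["n_outputs_publisher_open"]
)

print(categories_sum_to_total(sample_stats), p_open, p_oa_journal)
```

If a real record fails the partition check, that would indicate the assumption does not hold for the dataset, not necessarily an error in the data.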
| Field | Type | Description | 
|---|---|---|
| year | Integer | The year that this record applies to. | 
| date | Date | The date that this record applies to, in the format YYYY-MM-DD. The day and month are always the end of the year in question, i.e. the 31st of December. | 
| stats | PublicationStats | The aggregated publication statistics for the year that this record applies to. | 
Table 4. Year Schema.
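The `years` list on a country or institution can be turned into a simple time series. The following sketch builds a mapping from year to `p_outputs_open` and verifies that each `date` is the 31st of December of the matching year, as described in Table 4. The function name `open_percentage_by_year` and the sample values are hypothetical.

```python
from datetime import date


def open_percentage_by_year(entity):
    """Map each year to its p_outputs_open value, sorted by year."""
    series = {}
    for entry in entity["years"]:
        # The date field uses the YYYY-MM-DD format.
        end_of_year = date.fromisoformat(entry["date"])
        expected = date(entry["year"], 12, 31)
        if end_of_year != expected:
            raise ValueError(f"unexpected date {entry['date']} for year {entry['year']}")
        series[entry["year"]] = entry["stats"]["p_outputs_open"]
    return dict(sorted(series.items()))


# Made-up values for illustration; other fields are omitted.
sample_entity = {
    "years": [
        {"year": 2021, "date": "2021-12-31", "stats": {"p_outputs_open": 45.0}},
        {"year": 2020, "date": "2020-12-31", "stats": {"p_outputs_open": 40.0}},
    ]
}

series = open_percentage_by_year(sample_entity)
print(series)
```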