Open source research isn’t only about analysing social media or satellite imagery. Another important area involves investigating company structures and relationships.
However, official company registries can be unwieldy or difficult to navigate.
This is where OpenCorporates – a free repository of company registries aggregated from primary public sources, published in order to promote corporate transparency – can be helpful. OpenCorporates is clear about the provenance of their data, so you know where they collected each record, and when. This is critical for companies and finance research and investigations.
While there are other databases like it, OpenCorporates is the only one that covers so many jurisdictions (145 as at 16 June 2023). To see other sites that offer access to corporate registries, go to the companies and finance tab of the Bellingcat Online Investigation Toolkit.
Still, even user-friendly websites like OpenCorporates can be a daunting prospect – conducting searches one by one takes a great deal of time, and it can involve a lot of copying and pasting of search results. But the Open Corporates API can save a lot of time.
What is an API and Why Use One?
An API is an Application Programming Interface. Broadly speaking, it allows users to obtain data from a database without having to know about the structure or languages used to manage that database. Users can build new and interesting tools with API access or create different ways to look at the data they contain, as many previously did with the Twitter API, for example.
Put simply, if you’re researching data at scale, access to the API of an organisation that stores a lot of data can be a powerful resource. It can allow researchers to explore and compare data in ways that might not otherwise be possible.
Imagine, for example, that you’re looking at a company and all the individuals and other companies related to it within OpenCorporates. If you did individual searches for each company and director, it would take a lot of note-taking and organising. Access to an API can allow you to pull all the data at once and create new methods of searching to more easily identify relationships.
Luckily, OpenCorporates provides free API access if you’re undertaking a public benefit project. We’ll detail how to apply for access later on in this guide.
Some investigative journalists say this method has helped them find leads earlier. David Szakonyi, co-founder of the Anti-Corruption Data Collective, reflects that “by accessing OpenCorporates’ API, we achieved in less than a day what would have taken two people between four and six months to do”. OpenCorporates helped the ICIJ to connect companies and directors for over 240,000 companies as part of the Panama Papers investigation, before releasing it all as open data.
What follows is a guide on how to get the best from Open Corporates. If you have coding or tool-building experience, you will be able to do more. But even those who don’t can learn how to use it and gather all manner of potentially valuable and revealing information. Whatsmore, once you understand how to explore the OpenCorporates API, you can begin to transfer that knowledge to other large datasets that provide API access such as the OpenSanctions database.
Much of what is contained within this guide can also be explored in the below video by Rebecca Lee of OpenCorporates. But this guide aims to build upon this very useful resource.
Editor’s note: Some of the images contained in this guide have been blurred as they include the details of real companies that are featured on OpenCorporates.
Set up your Free OpenCorporates Account
Go to the OpenCorporates (OC) service desk to apply for access as a public benefit project. Academics, NGOs, registered journalists, media organisations and registered nonprofits can apply for free access under an open public licence. If you’re an independent journalist, you’ll need to use your own name and email address and provide a list of media sites that have referred to your reporting. OpenCorporates may ask for a copy of a current press pass (or equivalent).
Note that their licence requires you to attribute the use of their database in your reporting or published investigation outputs — see their terms of use.
Choose your Method
Based on the needs of your investigation, it is possible to obtain:
If you’re not sure yet, start with the ‘Data for one company’ section as it’s the easiest, then work your way up from there.
OpenCorporates has a knowledge base, which is a good place to find things like the API documentation and data dictionaries (which cover things like what is the ‘normalised_name’ field – you need to understand this when you are drawing data to use in an investigation).
Data for One Company (Browser & Excel)
Use this method when you have no programming experience. It’s useful when you want to pull the search results for one company into Excel. But it does have the limitation that Excel is made for viewing data all in one table.
This method involves: Using your browser to query the API; downloading an XML file; and viewing the data in Excel by opening it as an XML table.
Query the API With your Browser
Search for the company that you want to view on opencorporates.com and select it (i.e. click on the company name in the search results to view its page).
Go to the browser’s address bar and adjust the URL:
Hit enter, and the data you have queried via the API will appear in your browser. Right-click and save as page source:
View the XML File in Excel
- Open the XML file in Excel, and if presented with an option, choose to open as an XML table.
- You might get two error boxes that pop up — just click OK.
You will need to tidy the table as when it displays the data, Excel will duplicate some of the company information (in columns) as it creates one row per officer. You will also need to remove some columns, like ‘@type’.
Data for One Company Name Across all Jurisdictions (Browser & Excel)
Use this method when you have no programming experience but are comfortable in Excel. It’s useful when you know the company name and want to draw data for all the companies with that name across all the jurisdictions that OpenCorporates covers.
This method involves: Using your browser to query the API; downloading an XML file; viewing the data in Excel by using the XML source task pane.
Query the API with your Browser
Search with the company name on opencorporates.com. You should get a URL that looks something like this:
https://opencorporates.com/companies?q=company+name&utf8=✓ (Your search terms will appear in place of ‘company’ and ‘name’.)
Go to the browser’s address bar and adjust the URL:
Hit enter and the data you have queried via the API will appear in your browser. Right-click and save as XML.
View the XML File in Excel – Open Using the XML Source Task Pane
Depending on which version of Excel you have, opening the file in Excel will work differently.
It’s also possible to upload to Google Sheets — an explanation of how to do so can be found here.
Microsoft 365 on a Mac
Open the file and it will open like any Excel file. Then clean the data to make it more usable (e.g. delete unnecessary columns).
Windows’ version of Excel
Open the XML file, and make sure that you choose ‘Use the XML source task pane’.
Click ‘OK’ on the next box that pops up. You should see something like this:
This is a schema based on the XML data that you have opened — it’s the structure of the data.
Select Which Fields you Want to View
Select the fields by dragging and dropping them onto the Excel sheet:
Drag and drop as many fields as you need as columns in row 1. You’ll end up with something like this:
To populate the sheet with your chosen data, right-click somewhere on row 1, and choose ‘Refresh XML data’:
That will pull in the data for the fields you want. It will look something like this:
Note: You may well end up with less results in your Excel sheet than in your search on opencorporates.com. This is because there is a default page limit in the XML results (see the bottom of the page in your browser):
To get all the results, you need to add this to the URL that you used to query the API:
&per_page=100&page=1
Your URL will look something like this:
https://api.opencorporates.com/companies/searchq=company+name&api_token=XXXX&format=xml&per_page=100&page=1
Hit enter to refresh the page, and save that page as XML.If your search results on the OpenCorporates website were more than 100, just change the end of the URL to &page=2 and hit enter to get the next page of data.
Data for Multiple Companies with Different Names (XML & OpenRefine)
Use this method when you have a list of companies for which you want to find matching legal entities across all jurisdictions on OpenCorporates. This method is more involved and requires the use of OpenRefine, a popular open source data wrangling tool that has built-in functions for accessing data through APIs, something that is not possible with Excel or other spreadsheet programs. Naturally, this will involve more learning if OpenRefine is new to you. But it’s worth it if you want to draw down company data at scale – this is where using the API gets really powerful!
A note: In this method you are using the OpenCorporates reconciliation API (a.k.a. the legal entity resolution API).
This method involves:
- Getting further approval from OpenCorporates
- Installing OpenRefine
- Reconciling (or matching) your list against the OpenCorporates database
- Improving your reconciliations
- Pulling data into OpenRefine
- Parsing the fields that you want.
Get Approval from OpenCorporates
If you want to access the API via OpenRefine you will need to get on the “approved” list by contacting community@opencorporates.com.
Install and Set Up in OpenRefine
- Go to openrefine.org
- See OpenRefine’s installation requirements and instructions
- Create a CSV file with the list of companies that you want to query against OpenCorporates. The list of companies can come from research that you have already been doing on OpenCorporates or from any other list or source you may have gathered. It’s better to have more columns in your CSV as this will help OpenRefine to find better matches in the database. In the example used here, the columns are company name, country code, and date active. To create a CSV, put your data in Excel and save as CSV
- Create a project in OpenRefine, click ‘Choose Files’ and upload the CSV
- Click ‘Next’ and you should see something like this:
- Click ‘Create Project’ (at the top right of the screen). You are now ready to reconcile your list of companies with the OpenCorporates database.
Reconcile (Match) your List against OpenCorporates
- Reconciling is the matching of your list of companies to companies in the OpenCorporates database, so that you pull down the data for the right companies.
- You need to tell OpenRefine which column to use to reconcile against the OpenCorporates database. To do this, click on the arrow next to the ‘Name’ column, scroll down to ‘Reconcile’ and select ‘Start reconciling’:
- Click on ‘Add Standard Service’ in the box that pops up in order to add the OpenCorporates service.
- Enter this URL: https://opencorporates.com/reconcile and click ‘Add Service’:
(Note: This URL may change. If it doesn’t work, check the API documentation on api.opencorporates.com.)
- Click ‘Start Reconciling’.
- Once reconciling is finished, the data will look like this in OpenRefine:
- The companies in blue (although partially blurred in the image above) are the ones that OpenRefine has reconciled (matched) with in the OpenCorporates database.
- The reconciliation score is in brackets in grey — that is a score for how close the match is. (It’s out of 100.) In this case the match was on company name, which was the column chosen above to reconcile
- If you hover over a matched company name, the data from OpenCorporates will pop up — this will help you to assess if it’s a good match. It is important to assess this yourself as OpenCorporates does not guarantee that the results it pulls up are a match.
- You can manually select the right match for each company in your list (your CSV), by clicking on the tick next to the company that you think is the best match.
- Alternatively, you can match the companies in your list to all the highest-scoring matches from OpenCorporates. To do this, click on the arrow next to ‘Name’ (or whatever the column header was in your data), go down to ‘Reconcile’, then ‘Actions’, then select ‘Match each cell to its best candidate’:
- Once you have manually selected all the matches or automatically selected the best matches, you’ll see your matched list of companies:
Improve your Reconciliation: Add the Jurisdiction as a Service
If the companies in your list are all in one jurisdiction, you can constrain the reconciliation to only look for companies in that jurisdiction in the OpenCorporates database.
- To do this, add the country code to the URL you entered in the ‘Add Standard Service’ box:
- Use the ISO3166 alpha codes. These are used throughout the IT industry by computer systems and software to ease the identification of country names. They are also what OpenCorporates uses — you’ll see them in the URLs on the website.
- Most jurisdictions are at the country level.
- The US, Canada and France however have jurisdictions at state/region level. For those, put the country code_region/state code at the end of the url, e.g. https://opencorporates.com/reconcile/us_mt for Montana.
- You can find all the jurisdiction codes by doing a search on the OpenCorporates website to see the code in the search result URL.
- You’ll see that the jurisdiction-constrained reconciliation has been added on the left-hand side of the ‘Reconciliation’ box:
In this image, two jurisdiction-constrained reconciliations have been added – one for the UK and one for the US state of Montana.
- In order to reconcile using one of the services you have added, you need to select it (so that has a pale grey background, like the Montana service in the image above), then click ‘Start Reconciling’
Improve your Reconciliation: Add Multiple Parameters
Adding multiple parameters to your reconciliation will improve the matching results. The reconciliation service can handle these two: country (jurisdiction) and date. You have to have those two columns in your CSV list.
- The date parameter is when a company was active or in existence. It helps the reconciliation to prioritise companies that were active on a specified date.
- To use the date parameter, you need to transform the date column in your uploaded CSV so that it is in a format that is recognised by the OpenCorporates API. To do this:
- Start by clearing the reconciliation data (you need to do this as you are going to be doing a new reconciliation after you have done the transformation): Click the arrow next to ‘Name’ (or whatever the column header was in your data), then go to ‘Reconcile’, then ‘Actions’, then select ‘Clear reconciliation data’:
To transform your date column, click on the arrow next to ‘Date Active’ (or whatever the column name was in your data), then ‘Edit cells’, then ‘Common transforms’, then select ‘To date’:
Your uploaded CSV will now look this this (don’t worry about the 00:00 times):
- Now you can select multiple parameters for reconciliation. First, click on the service that you want to use to reconcile:
(In the case above, a non-jurisdiction specific service was selected.)
- Once the service has been selected, click the parameters you want on the right-hand side of the ‘Reconcile’ box:
- Fill in the required parameter names in the ‘As Property’ text boxes (you have to use ‘jurisdiction_code’ and ‘date’ as these are the parameter names that the OpenCorporates API recognises).
- If you use the ‘date’ parameter, a company will score higher if it was active at the date in your CSV, and will score a lot lower if it was not in existence or was inactive at that date.
- This is particularly useful for jurisdictions that allow only one company with a particular name to be active at a point in time.
- Click ‘Start Reconciling’
- If you have judgement and best candidate score boxes from previous reconciliations on the left-hand side, get rid of the old ones by clicking the ‘x’ at the top left of the boxes (the old ones are at the top), e.g.:
(Note: You need to do this to get the full results of the new reconciliation.)
- To filter by score, move the slider in this box:
- Read more about the judgement and best candidate scores in the OpenRefine documentation.
- Now that you have added the jurisdiction and date parameters to your reconciliation, you should have a lot fewer results to assess for correct matches.
- Before you can proceed, you need to manually accept matches, or use the ‘Match each cell to its best candidate’ function.
Pull data from OpenCorporates into OpenRefine
Now that you have your CSV list of companies matched to a company in the OpenCorporates database, you can pull in more data to give you a richer data set for your investigation. To do this:
- Start by pulling in the OpenCorporates URL for each company. To do this, click on the arrow next to ‘Name’ (or whatever you called that column), then ‘Edit column’, then select ‘Add column based on this column’:
In the ‘Add column based on column Name’ box that pops up, put this into the Expression text box: ‘http://opencorporates.com’+cell.recon.match.id, and give your new column a name:
Click ‘OK’.
You should now see these columns:
The OpenCorporates URLs are also a unique ID for each company, although they have been obscured in the image above.
- Next you need to pull all the data for each company through the API, and then parse it into separate columns.
- To pull all the data through the API, click on the arrow next to the ‘Name’, then ‘Edit column’, then select ‘Add column by fetching URLs’:
In the ‘Add column by fetching URLs based on column Name’ box that pops up, put this into the Expression text box: ‘http://api.opencorporates.com’ + cell.recon.match.id + ‘?api_token=XXXX’, give your new column a name and enter 200 into the ‘Throttle delay’ box so that you don’t overload the server:
(The new column has been named ‘OpenCorporates_JSON’ as JSON is used to pull the data out, but you can call it what you want.)
Click ‘OK’. It may take a little time for all the data to come down, especially if you have a long list.
You should end up with something that looks like this:
Parse the Fields that you Want
The data above is not usable and needs to be parsed into columns.
- To do that, click on the arrow next to ‘Name’, then ‘Edit column’, then select ‘Add column based on this column’:
- To parse the incorporation date, click on the arrow next to your new column (in this picture ‘OpenCorporates_JSON’).
- In the ‘Add column based on column OpemCorporates_JSON’ box that pops up, put this into the Expression text box: value.parseJson()[“results”][“company”][“incorporation_date”]
- Give your new column a name:
- Click OK.
- Scroll to the right and you’ll see the new incorporation date column:
- To parse the current status, use this expression: value.parseJson()[“results”][“company”][“current_status”]
- To parse other fields, look for their names in the JSON so that you can use their proper names in the expression, e.g.:
In the above image you can see: name, company_number, jurisdiction_code.
Conclusion
It takes work to learn to use the OpenCorporates API, but it’s worth it – you can save yourself days of manual labour. If you’re going to be researching companies at scale, learn to use OpenRefine. It may be useful for other investigations — see the list of reconciliation services that allow for data to be brought into OpenRefine (e.g. OpenSanctions).
This guide is built on the video ‘CIJ Summer Workshops: OpenCorporates API for Beginners’ on this page.
The author would like to confirm that they have no financial or legal interest in OpenCorporates, nor prior personal relationship with its founders and employees. This article has been undertaken strictly from the perspective of an open source researcher.
Bellingcat is a non-profit and the ability to carry out our work is dependent on the kind support of individual donors. If you would like to support our work, you can do so here. You can also subscribe to our Patreon channel here. Subscribe to our Newsletter and follow us on Instagram here, Twitter here and Mastodon here.