Unloading GitHub data by Xplenty

Data analyst

PUBLISHED:

October 5, 2018

LAST UPDATED:

April 10, 2024

We showed a possible way to create ad-hoc reports on GitHub data, but if you work with repository information regularly, it is worth building a package that collects this information in a regular relational database. At Bitrise we use Xplenty for ETL tasks. Let me guide you through building a package that processes this data.

Since many values in the GitHub API response are themselves URLs, we need a couple of steps to process all the data: get, flatten and parse.

GET

Bitrise's GitHub repositories are organized into multiple organizations. We should collect all the repositories belonging to the organizations and then apply transformations on the results of related endpoints.

A REST API source is needed to get the body from the /repos endpoint. Please spend a minute looking at its settings, which are described below.

GitHub encourages us to refer to the API version explicitly by adding the value application/vnd.github.v3+json to the Accept header. The other header we need is Authorization. Create an OAuth application on GitHub in order to avoid rate limiting, and add its token here as token YOUR_APP_TOKEN. It is even more elegant to set your token as a package variable. Naturally, in this case the value should be token $YOUR_VARIABLE.
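For readers who want to check the two headers outside Xplenty, here is a minimal Python sketch. It only prepares the request and does not send it. The /orgs/{org}/repos path, the function name and the placeholder organization are assumptions made for this illustration, not settings taken from the package.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_repos_request(org: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for an organization's repositories."""
    headers = {
        # Explicit API version, as GitHub recommends.
        "Accept": "application/vnd.github.v3+json",
        # Same "token YOUR_APP_TOKEN" shape as in the source settings.
        "Authorization": f"token {token}",
    }
    # Assumption: the organization's repositories live under /orgs/{org}/repos.
    url = f"{API_ROOT}/orgs/{org}/repos"
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholders only; replace them with a real organization and token.
request = build_repos_request("YOUR_ORG", "YOUR_APP_TOKEN")
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the prepared request to urllib.request.urlopen would actually send it.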

The $[*] JSON Expression tells Xplenty to process the unnamed, top-level array of the response. Although this array has no name, it contains all the repositories, one JSON object each.
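To make the shape of that array concrete, here is a small Python sketch. The sample is made up and heavily abbreviated: real responses carry many more keys per repository, and the organization and repository names are placeholders. Iterating over the parsed list plays the role of $[*].

```python
import json

# Made-up, abbreviated sample of the response body (an assumption for illustration).
SAMPLE_RESPONSE = """
[
  {"name": "repo-one", "open_issues_count": 3,
   "pulls_url": "https://api.github.com/repos/YOUR_ORG/repo-one/pulls{/number}"},
  {"name": "repo-two", "open_issues_count": 0,
   "pulls_url": "https://api.github.com/repos/YOUR_ORG/repo-two/pulls{/number}"}
]
"""


def repositories_from_response(body: str) -> list[dict]:
    """Return every element of the unnamed top-level array."""
    repos = json.loads(body)
    if not isinstance(repos, list):
        raise ValueError("expected a top-level JSON array")
    return repos


for repo in repositories_from_response(SAMPLE_RESPONSE):
    print(repo["name"], repo["open_issues_count"])
```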

We listed every organization whose repositories we are interested in and collected their repositories into one list with sequential union steps. Unfortunately, a union step can only use two input tables, which is why several of them have to be chained.
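The chaining can be pictured with a short Python sketch in which each table is a list of rows. The function names are hypothetical and the two-input restriction is modelled on purpose; this is not how Xplenty implements the component.

```python
from functools import reduce


def union(left: list[dict], right: list[dict]) -> list[dict]:
    """Two-input union: all rows of both tables, duplicates kept."""
    return left + right


def union_all(tables: list[list[dict]]) -> list[dict]:
    """Chain two-input unions until every table has been merged."""
    return reduce(union, tables, [])


# One made-up table of repositories per organization.
org_a = [{"name": "repo-one"}]
org_b = [{"name": "repo-two"}]
org_c = [{"name": "repo-three"}]

all_repos = union_all([org_a, org_b, org_c])
print(len(all_repos))
```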

The response headers contain links to the next pages, so we can tick Use pagination and keep its default settings.
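What the pagination option has to do with those links can be sketched in Python as follows. The function name is hypothetical, and the sample header in the usage lines is made up in the format GitHub uses for its Link header.

```python
import re
from typing import Optional

# Matches one entry of a Link header: <URL>; rel="name"
_LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL marked rel="next", or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_PATTERN.findall(link_header):
        if rel == "next":
            return url
    return None


sample = (
    '<https://api.github.com/organizations/1/repos?page=2>; rel="next", '
    '<https://api.github.com/organizations/1/repos?page=5>; rel="last"'
)
print(next_page_url(sample))
```

A client would keep requesting the returned URL until the function yields None.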

Even more GET

In practice, we are often interested in more than the repositories themselves. Although this endpoint tells us the number of open issues, we do not know how many we have closed. To cope with this problem, or to collect information on pull requests, we send new requests to the endpoints named in the repository data.

To process the JSON data we add a series of Select components.

The Curl function with a GET parameter requests data from an endpoint whose URL is stored as a field value, for example the pull requests of a specific repository.
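One detail to watch is that GitHub stores this URL as a template ending in {/number}. The Python sketch below, with a hypothetical function name, turns such a stored value into a requestable URL; the state=all query parameter is added here as an assumption so that closed pull requests are returned as well.

```python
from urllib.parse import urlencode


def pulls_endpoint(pulls_url: str, state: str = "all") -> str:
    """Strip the URI template suffix and ask for pull requests in the given state."""
    # "…/pulls{/number}" becomes "…/pulls"
    base = pulls_url.split("{", 1)[0]
    return f"{base}?{urlencode({'state': state})}"


# Placeholder organization and repository names.
stored_value = "https://api.github.com/repos/YOUR_ORG/repo-one/pulls{/number}"
print(pulls_endpoint(stored_value))
```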

This function stores the whole JSON response in a field of our table. In the next step, we have to extract the body from this JSON. If we name the field storing the JSON response pulls, the next expression has to extract the array of pull requests from it.

Note that, just like we did with the repositories, we need to extract every element of the unnamed array from the body of the pulls field. The type of pulls is a map, that is, a collection of key-value pairs, and we can reference a map key with #.

Flatten

Flatten() generates a row for each element of a bag. Since the body of pulls is a JSON string, JsonStringToBag() has to be applied first, and Flatten() is then applied to its result.

The result of this step is a new field, and each record contains one pull request. Let's call the new field pull.
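The combined effect of extracting the body and flattening it can be sketched in Python. The row layout is an assumption made for this illustration: the pulls field is modelled as a dictionary whose body key holds the JSON string, and the sample data is made up.

```python
import json
from typing import Iterable, Iterator


def flatten_pulls(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield one record per pull request found in each row's pulls body."""
    for row in rows:
        # The body is a JSON string, so it is converted to a list first.
        for pull in json.loads(row["pulls"]["body"]):
            yield {"repository": row["name"], "pull": pull}


rows = [
    {"name": "repo-one",
     "pulls": {"body": '[{"number": 1, "state": "closed"}, {"number": 2, "state": "open"}]'}},
    {"name": "repo-two", "pulls": {"body": "[]"}},
]

for record in flatten_pulls(rows):
    print(record["repository"], record["pull"]["number"])
```

In this sketch a repository without pull requests produces no record at all.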

Parse

There is only one step remaining: parsing the JSON key-value pairs into new fields and values. Each new field needs the following, in this order:

pull converted to a JSON string
the JSON string converted to a map
a reference to the map key of the field
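In Python terms, the parsing step amounts to looking up one key per new field. The sketch below uses a hypothetical function name, and the selection of keys is only an example of what a pull request object offers.

```python
def parse_pull(pull: dict) -> dict:
    """Turn one pull request object into a flat record of selected fields."""
    user = pull.get("user") or {}
    return {
        "number": pull.get("number"),
        "title": pull.get("title"),
        "state": pull.get("state"),
        # Nested value: the login name sits inside the user object.
        "author": user.get("login"),
        "created_at": pull.get("created_at"),
        "closed_at": pull.get("closed_at"),
    }


# Made-up sample pull request.
sample_pull = {
    "number": 7,
    "title": "Fix typo",
    "state": "closed",
    "user": {"login": "someone"},
    "created_at": "2018-10-01T08:00:00Z",
    "closed_at": "2018-10-02T09:30:00Z",
}
print(parse_pull(sample_pull))
```

Missing keys end up as None, which maps naturally to NULL in a relational table.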

The only thing you still need is a destination table in a relational database of your choice. Do not forget to add post-action SQL commands that create indexes on your new table.
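As a stand-in for the destination and the post-action SQL, here is a minimal sketch that uses SQLite from Python's standard library. The table, column and index names are assumptions chosen for this illustration; your database of choice will have its own dialect.

```python
import sqlite3


def load_pulls(connection: sqlite3.Connection, records: list[dict]) -> None:
    """Load parsed pull requests, then create an index as a post-load step."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS pull_requests ("
        "repository TEXT NOT NULL, number INTEGER NOT NULL, title TEXT, state TEXT)"
    )
    connection.executemany(
        "INSERT INTO pull_requests (repository, number, title, state) "
        "VALUES (:repository, :number, :title, :state)",
        records,
    )
    # Counterpart of a post-action SQL command: index the table after loading.
    connection.execute(
        "CREATE INDEX IF NOT EXISTS idx_pull_requests_repository "
        "ON pull_requests (repository)"
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
load_pulls(connection, [
    {"repository": "repo-one", "number": 1, "title": "Fix typo", "state": "closed"},
    {"repository": "repo-one", "number": 2, "title": "Add tests", "state": "open"},
])
print(connection.execute("SELECT COUNT(*) FROM pull_requests").fetchone()[0])
```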
