Managing data with PowerShell

At Bitrise we love automation: it can save you time for creative work. Originally, PowerShell was developed to support system administrators automating and managing tasks and processes of the Windows operating system. Why should we consider and how can we use it in data business?

Why PowerShell?

PowerShell commands can be run in the terminal while supporting an object-oriented concept, we get the best of both worlds:

  • PowerShell works with objects,
  • but without labyrinth of versions and environments.
  • No need to compile.
  • It is relatively easy to find the needed cmdlets by their Verb-Noun format.
  • Built-in commands for HTTP requests.
  • Packages to manage cloud services like the ones of Google Cloud or Microsoft Azure.
  • Several options for specific tasks due to .NET.
  • It is cross-platform.

Installing on Linux

Unfortunately, as of today, installation on Ubuntu is not that straightforward. PowerShell Core depends on libicu60 which cannot be downloaded from the regular repositories.

Installing packages and modules

In PowerShell Install-Package and Install-Module cmdlets adds new capabilities to your setup or session.
In case you need it, for example in a Dockerfile, you can use pwsh with -Command parameter in the following format:

Mind the '&' before your commands: it is needed when running pwsh from another shell.

Once we have PowerShell installed we can start using it for, besides several others, the following use cases.

Sample applications

CSV to JSON

PowerShell can work with CSV and JSON out of the box. The next snippet

  1. imports a CSV,
  2. adds current datetime property as updated_at and
  3. exports data to JSON.

CSV to Document Database

It is a common task for data experts to collect information from flat files into a database. Here CSV files are our source and CosmosDB is our Destination.

Query Cosmos DB

You can use SQL here.

Excel files

For several organizations data means Excel. Fortunately, there is a really feature-rich package from Doug Finke, to cope with XLSX files and automate tasks. It is available on both GitHub and PowerShell Gallery.

Install ImportExcel module for the current user:

Install-Module ImportExcel -scope CurrentUser

Then you can import your XLSX, CSV or JSON and define Excel tasks in Export-Excel command. E.g. importing a sheet from an XLSX file and exporting pivot table into another can be achieved with this few lines:

Download Recurly export files

At Bitrise, we use Recurly to manage our subscriptions. Unfortunately, relatively strict limits on number of API calls in a specific time frame are set by the provider. The suggested way of retrieving data is setting up exports on Recurly's site and collecting daily batches:

  1. We need to GET a temporary download url to each daily export of each endpoint.
  2. The export files are available for an hour. We can download them by another call.

Get Started for free

Start building now, choose a plan later.

Sign Up

Get started for free

Start building now, choose a plan later.