What is the best way to extract data from several JIRA projects into Data Warehouse

Jennifer Volkening May 6, 2015

We have a requirement to extract data from multiple projects in JIRA to our Data Warehouse. The data we have been asked to migrate from JIRA will be used in conjunction with other data in the Data Warehouse for the purpose of reporting employee performance and the monthly bonus program for several departments.

I will be creating a daily incremental ETL job to execute a custom SSIS package for each JIRA project. I'm not sure whether to have the SSIS package connect to a JIRA API to retrieve the data is the best method or connecting to our JIRA database instance. Normally I would prefer to connect to the database, but I am concerned that future changes to database schemas, data types, field names, etc. could occur and potentially cause the ETL job to fail.

What is the best method of extracting data for our purposes? Perhaps there is a better method available than what I have mentioned above and am just not aware of it.

7 answers

Comments for this post are closed

Community moderators have prevented the ability to post new answers.

Post a new question

1 vote
Lee Schlesinger July 19, 2018

You can use a cloud-based ETL tool. Stitch, for example, can replicate Jira data to Amazon Redshift, Google BigQuery, Snowflake, and PostgreSQL data warehouses. https://www.stitchdata.com/integrations/jira/

0 votes
Josh Yeats November 6, 2018

Azure Data Factory has a JIRA connector. I've got this working to extract a large portion of the data except for the custom field values. Still working through all the tables to try and locate this but haven't had any luck yet.

If anyone is also trying to use this method, would be great to hear from you.

0 votes
Erik Warming January 31, 2018

We are also interested in a similar solution and would love to hear your experiences

0 votes
Dzmitry Tabolich January 31, 2016

Hi guys,

Currently we're trying to solve very similar business task and extract JIRA data for our internal BI reporting.

We use REST API as a way to go, but what I'm looking for is kind of best practice or guidelines of how to deal with:

  1. First time sync with big amount of items (we have ~20 Projects with ~10.000 work items). I don't want to push JIRA N+1 times to get all the required data, so maybe you can suggest how to fetch this in most efficient manner;

  2. Snapshot updates by the schedule (daily, hourly). What would be the suggestion regarding synchronization and fetching only delta/changed items.

In order to be more specific, for our reporting we need to have basic issue information (type, name, owner, etc.) and reported time (worklogs).

Please, let me know if do you need any specific details and I'll be happy to provide.

Thank you in advance!

 

Bob Swift OSS (Bob Swift Atlassian Apps)
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
February 1, 2016
  • Issue data is the most volume, so use JQL query with something like "updated > -1d"
  • Make sure handles duplicate updates so you don't have to worry about exact intervals
  • Have strategy for handling deletes
  • Best if you can run a full re-sync at any time for times when processing fails for an extended time
  • Use a reliable scheduler like Bamboo with good logging

 

Drickus Annandale January 30, 2018

Hi

We have a similar requirements. We need to do Jira BI Reporting\Dashboarding for the  enterprise.

From my understanding the REST API functionality is more to get a single issue \ task or to create items programaticly in Jira? 

For a ETL requirement where bulk loading is a big portion ....  it would be more stable and sustainable to source directly from source database to destination database. (once the config changes to the tool have been finalized etc.) . Delta loads would also be easire to manage and setup. 

Your thoughts? 

Regards, D 

Like Roopa Raghavan likes this
0 votes
Midori
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 6, 2015

I'd definitely do this with the REST API (and avoid direct database access at all costs). It's standard, documented, supported by modern languages / technologies / frameworks - probably even by the ETL tool you're using!

If you wanted to do some pre-processing on the data before loading that to your data warehouse, you could even try the Better Excel Plugin. It allows applying all Excel features to JIRA data, basically implementing the Transform step in ETL. You can easily automate it with the Excel Automation Plugin and use the Save Excel action to save the XLS file to the filesystem, where your data loading logic could pick it up (tutorial).

0 votes
Bob Swift OSS (Bob Swift Atlassian Apps)
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 6, 2015

You can use various JIRA Command Line Interface (CLI) actions (like getIssueList) to extract issue or other data to csv. That offers some level of protected from database changes. When I was involved in doing this some time ago, we used a combination of JCLI and direct database access for driving automation to and from a datamart. My experience was that the JIRA database was quite stable, but you had to be prepared for some maintenance on JIRA upgrades.

Bob Swift _Appfire_ January 30, 2018

Data Warehouse for DevOps has some current information for this topic.

0 votes
Volodymyr Krupach
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 6, 2015

Hi Jennifer,

I suggest you to define what exact data you want to extract from projects: new issues, resolved issues, reported time. When you defined what you need to extract, you will have higher chances to get better answers here.

For now you can have a look at JIRA REST API: https://docs.atlassian.com/jira/REST/latest/

Comments for this post are closed

Community moderators have prevented the ability to post new answers.

Post a new question

TAGS
AUG Leaders

Atlassian Community Events