Events data, as returned by get_issue_events()
, provides
a granular view of issues in your GitHub repository. As opposed to the
issue-level data returned by get_issues()
,
get_issue_events()
provides timestamped event-level data
for each issue, such as the addition and deletion of labels and
milestones. See the Issue Events
GitHub API documentation for a more comprehensive list.
This detailed information has many potential uses. One use in
reporting and visualization is if if the created_at
date
for your issues does not have an intrinsic meaning and does not
represent when work on an issue was started.
For example, let’s suppose we pull some issues from our repo and
visualize their time-to-completion with viz_gantt()
.
my_repo <- create_repo_ref('emilyriederer', 'my_repo') issues <- get_issues(experigit, state = 'closed', milestone = 1) %>% parse_issues()
viz_gantt(issues)
Issue 1 stands out as having taken a very long time to complete. However, it’s possible it was created long before anyone started actively working on it. Instead, we might want to consider the start time to be, for example, when an issue was tagged with the “in-progress label”.
We can get events for a specific issue with the following code. Due
to the potentially massive size of this data, the function returns
events for a single issue at a time and, thus, has a required
number
parameter to specify the issue for which events
should be returned. As such, to events for multiple issues, we need to
use purrr::map()
.
Since events can also include adding assignees or milestones, we
filter our dataset to 'labeled'
events where the
label_name
is “in-progress”.
issue_events <- purrr::map(1:3, ~get_issue_events(experigit, number = .)) %>% purrr::map_df(parse_issue_events) %>% dplyr::filter(event == 'labeled' & label_name == 'in-progress')
The fields of the resulting dataframe are shown below. Note that in
this dataset, the created_at
field refers to when the
event was created (i.e. when the issue was labelled in this
case) and not when the issue was created.
head(issue_events) #> number id actor_login event created_at label_name milestone_title #> 1 1 123 emilyriederer labeled 2018-04-15 in-progress NA #> 2 2 124 emilyriederer labeled 2018-04-02 in-progress NA #> 3 3 125 emilyriederer labeled 2018-04-20 in-progress NA #> assignee_login assigner_login #> 1 NA NA #> 2 NA NA #> 3 NA NA
Next, we can join our datasets together. THe suffix
parameter allows us to append _event
to all the field names
from the events data to distinguished between the issue and event
created_at
fields.
issue_with_events <- dplyr::inner_join(issues, issue_events, by = "number", suffix = c("", "_event"))
Finally, we can use this new dataset to remake our plot by specifying
the name of the new created_at_event
variable as the
appropriate issue start date, via the optional start
parameter.
viz_gantt(issue_with_events, start = "created_at_event")