This vignette details some key design choices in this package with the hope that stating these explicitly will improve package navigability.

Comparison to Alternatives

This package is designed with a focus on project management and communication. This emphasis distinguishes it from some of the other excellent R packages which wrap the GitHub API. In short, this package focuses on the subset of GitHub API functionality most critical to project management and provides additional tooling to support communcation of planning and results.

In particular:

  • gh descibes itself as a ‘Minimalistic GitHub API client in R’. It is very robust and flexible (and powers this package!) but demands slightly more from users (e.g. understanding the GitHub API’s endpoints). Instead, this package supports only a subset of the GitHub API. Instead, projmgr prioritizes making key project management tasks as friendly as possible.
  • ghapiv3 also wraps gh for a user-friendly, higher-level interface to the GitHub API. However, like gh, it also provides broader support for the API but lacks the workflow-specific functionality.
  • ghclass shares the goal of streamlining a specific GitHub workflow easier. However, classroom management is the specific workflow it is built to improve. That said, ghclass has some complementary functions, such as programmatically setting up groups of repositories.
  • Project Management on GitHub provides GUI-based project management solutions that overlap with this package. However, this package provides a way to efficiently extract similiar information programmatically and share if with others (e.g. advisors, executives) who do not use GitHub.

Repo as First Class Citizen

Repositories are “first class” citizens in projmgr. The first step to accessing or sending information is to create a repository reference using the create_repo_ref() function. The resulting object is the first element passed into all get_ and post_ functions.

For users that work with databases in R with the DBI package, this codeflow is analogous to querying from a database. In this case, users first create a database connection object with dbConnect() which is passed into subsequent functions such as dbGetQuery.

This decision was based on the assumption that the most common use case for projmgr would be interacting with a single repository at a time. Admittedly, some users may prefer a view further up in the hierarchy, e.g. an organization object versus a repository object. By providing lower-level building blocks, broader functionality can be achieved by mapping over a set of repositories. A code example is provided in the Event & Team Management vignette.

Function Naming

Functions generally conform to the <verb>_<details> convention. For functions interaction with the GitHub API, the <verb> component is the HTTP method invoked (e.g. GET, POST, DELETE).

Verbs like “post” might seem less intuitive than a synonym like “create” or “submit” for users who have not worked with APIs previously. However, this convention describes the function’s action most precisely and ideally also serves to raise awareness of HTTP methods.

Function Parameters

Functions that interact with GitHub’s API demure to the naming conventions of that API. This ideally empowers users for future, direct work with the API and allows for easier maintenance.

More specifically, parameters required by the GitHub API are required by the corresponding functions in this package. Any additional parameters not required by the GitHub API can be passed in through the ...s. The help_{function name} and browse_docs() functions can be used to find out more about the names and descriptions of these optional parameters.

Two noteworthy exceptions are get_issues() and get_milestones(). In the GitHub API, there are separate endpoints for getting a single item (issue/milestone) or multiple items However, it seemed unneccesary to create separate functios for the single and plural versions. Instead, if either function is provided an argument for number, the single-item endpoint is used. Any other query parameters are then irrelevant and ignored. If no argument is provided for the number parameter, the multiple-item enpoint is used with allowed parameters given by help_{function name}().

Get-Parse Codeflow

All get_ functions make a call to the GitHub API and return the result as an R list. The corresponding parse_ function converts each list into a dataframe for easier wrangling and analysis. In most all cases, users will likely call parse_ immediately after get_ and never work with the output of get_ directly. For example:

my_repo <- create_repo_ref('username', 'my_repo')
issues <- get_issues(my_repo, state = 'all') %>% parse_issues()
issue_events <- get_issue_events(my_repo, number = 7) %>% parse_issue_events()
milestones <- get_milestones(my_repo) %>% parse_milestones()

However, the get_ and parse_ functions are provided separately to empower users. Some use cases where users may prefer to not use the parse_ functions include if they:

  • wish to understand the type of output returned by the API
  • prefer working with lists than dataframes
  • want to access information returned by the API that is not included in the parsed output
  • need a stop-gap solution if the GitHub API changes and the parse_ functions have not been updated

In rare cases, get_ functions will return additional information not provided by the API to preserve data lineage. For example, get_issue_events() and get_issue_comments() include the issue number (provided as a required function argument) in the output so users know to what issue they refer.

Parse Output Variable Names

The dataframe returned by parse_ functions attempt to maintain the same field names as used by the GitHub API, similar to the conventions described in Function Parameters. However, there are a few key exceptions:

  • When the field name is ambiguous, additional information may be appended.
    • For example, after getting an issue, the “comments” field contains the count of comments. Since the name might otherwise suggested that the text of the comments is being returned, the parsed_ field is instead called “n_comments”
  • When the field is nested, underscores are used to provide the hierarchy.
    • For example, after getting an issue, the “user” field contains multiple subfields about the issue author. Most critical is that user’s login, so the field in the resulting dataframe is called “user_login”.
    • This can provide some confusion since other non-hierarchical fields naturally contain underscores, such as the “created_at” date.

There are some disadvantages to this approach. For example, if one wishes to join issues and milestone data by milestone number, in the issues data it will be called milestone_number and in the milestone data it will simply be called number. One alternative would be to use {object}_{field} conventions (e.g. issue_name) uniformly across all datasets. However, this was not selected since it makes variable names long and bulky.