This vignette details some key design choices in this package with the hope that stating these explicitly will improve package navigability.
This package is designed with a focus on project management and communication. This emphasis distinguishes it from some of the other excellent R packages that wrap the GitHub API. In short, this package focuses on the subset of GitHub API functionality most critical to project management and provides additional tooling to support communication of planning and results.
In particular:
- gh describes itself as a ‘Minimalistic GitHub API client in R’. It is very robust and flexible (and powers this package!) but demands slightly more from users (e.g. understanding the GitHub API’s endpoints). By contrast, projmgr supports only a subset of the GitHub API and instead prioritizes making key project management tasks as friendly as possible.
- ghapiv3 also wraps gh to provide a user-friendly, higher-level interface to the GitHub API. However, like gh, it aims for broad support of the API rather than workflow-specific functionality.
- ghclass shares the goal of streamlining a specific GitHub workflow, but the workflow it is built to improve is classroom management. That said, ghclass has some complementary functions, such as programmatically setting up groups of repositories.
Repositories are “first class” citizens in projmgr. The first step to accessing or sending information is to create a repository reference using the create_repo_ref() function. The resulting object is the first argument passed into all get_ and post_ functions.
For users who work with databases in R through the DBI package, this workflow is analogous to querying a database: users first create a database connection object with dbConnect(), which is then passed into subsequent functions such as dbGetQuery().
This decision was based on the assumption that the most common use
case for projmgr
would be interacting with a single
repository at a time. Admittedly, some users may prefer a view further
up in the hierarchy, e.g. an organization object versus a repository
object. By providing lower-level building blocks, broader functionality
can be achieved by mapping over a set of repositories. A code example is
provided in the Event & Team Management vignette.
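Mapping over a set of repositories can be sketched as follows. This is an illustrative example, not the one from the Event & Team Management vignette: collect_issues() and its fetch_issues argument are hypothetical names, and the default fetcher simply chains the create_repo_ref(), get_issues(), and parse_issues() calls used in this vignette. Making the fetcher an argument is a design choice for the sketch only; it lets the combining logic be exercised on stub data without network access.

```r
# Hypothetical helper (not part of projmgr): build one repository reference
# per repo name, fetch and parse its issues, and stack the results.
collect_issues <- function(owner, repos,
                           fetch_issues = function(owner, repo) {
                             # Default fetcher: the projmgr calls shown in this vignette
                             ref <- projmgr::create_repo_ref(owner, repo)
                             projmgr::parse_issues(projmgr::get_issues(ref, state = "all"))
                           }) {
  per_repo <- lapply(repos, function(repo) {
    issues <- fetch_issues(owner, repo)
    # Tag each row with its source repository so combined rows stay traceable
    issues$repo <- rep(repo, nrow(issues))
    issues
  })
  do.call(rbind, per_repo)
}

# Usage (requires GitHub access; owner and repository names are placeholders):
# all_issues <- collect_issues("username", c("repo_a", "repo_b"))
```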
Functions generally conform to the <verb>_<details> convention. For functions interacting with the GitHub API, the <verb> component is the HTTP method invoked (e.g. GET, POST, DELETE).
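Because the convention is regular, a function name can be split mechanically into its verb and details. The helper below is hypothetical (not a projmgr function) and only maps the HTTP methods named in the paragraph above; other prefixes, such as parse_, get NA as their HTTP method.

```r
# Hypothetical helper illustrating the <verb>_<details> naming convention.
http_verbs <- c("get", "post", "delete")  # methods named in this vignette

describe_function_name <- function(fn_name) {
  verb <- sub("_.*$", "", fn_name)        # text before the first underscore
  details <- sub("^[^_]*_", "", fn_name)  # text after the first underscore
  method <- if (verb %in% http_verbs) toupper(verb) else NA_character_
  list(verb = verb, details = details, http_method = method)
}

describe_function_name("get_issue_events")
# list(verb = "get", details = "issue_events", http_method = "GET")
```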
Verbs like “post” might seem less intuitive than a synonym like “create” or “submit” for users who have not worked with APIs previously. However, this convention describes the function’s action most precisely and ideally also serves to raise awareness of HTTP methods.
Functions that interact with GitHub’s API defer to the naming conventions of that API. This ideally empowers users for future, direct work with the API and allows for easier maintenance.
More specifically, parameters required by the GitHub API are required by the corresponding functions in this package. Any additional parameters not required by the GitHub API can be passed in through the ... argument. The help_{function name}() functions and browse_docs() can be used to find out more about the names and descriptions of these optional parameters.
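The pass-through idea can be sketched in a few lines. build_query_string() is a hypothetical helper, not projmgr’s implementation; it only illustrates that optional arguments collected by ... keep the GitHub API’s own parameter names (here state and labels, both query parameters of GitHub’s list-issues endpoint) when forwarded, so nothing needs translating between R code and the API documentation. It assumes each value is a single scalar.

```r
# Hypothetical sketch: optional arguments arrive through `...` and are
# forwarded unchanged, under their GitHub API names, as URL query parameters.
build_query_string <- function(...) {
  params <- list(...)
  if (length(params) == 0) return("")
  if (is.null(names(params)) || any(names(params) == "")) {
    stop("optional GitHub API parameters must be named")
  }
  # Encode each (scalar) value for safe use in a URL
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0("?", paste(names(params), values, sep = "=", collapse = "&"))
}

build_query_string(state = "closed", labels = "bug")
# "?state=closed&labels=bug"
```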
Two noteworthy exceptions are get_issues() and get_milestones(). In the GitHub API, there are separate endpoints for getting a single item (issue/milestone) or multiple items. However, it seemed unnecessary to create separate functions for the single and plural versions. Instead, if either function is provided an argument for number, the single-item endpoint is used, and any other query parameters are irrelevant and ignored. If no argument is provided for the number parameter, the multiple-item endpoint is used, with allowed parameters given by help_{function name}().
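The dispatch rule described above can be sketched as a function that only chooses the request path. choose_endpoint() is hypothetical and is not projmgr’s internal code; the paths follow GitHub REST API endpoints such as /repos/{owner}/{repo}/issues (list) and /repos/{owner}/{repo}/issues/{number} (single item), with the same pattern for milestones.

```r
# Hypothetical sketch of the single- vs multiple-item rule (not projmgr internals).
choose_endpoint <- function(owner, repo, object = c("issues", "milestones"),
                            number = NULL, ...) {
  object <- match.arg(object)
  base <- paste("/repos", owner, repo, object, sep = "/")
  if (!is.null(number)) {
    # Single-item endpoint: any other query parameters are ignored
    list(path = paste(base, number, sep = "/"), query = list())
  } else {
    # Multiple-item endpoint: optional parameters become the query
    list(path = base, query = list(...))
  }
}

choose_endpoint("username", "my_repo", "issues", number = 7, state = "all")$path
# "/repos/username/my_repo/issues/7"
```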
All get_ functions make a call to the GitHub API and return the result as an R list. The corresponding parse_ function converts each list into a dataframe for easier wrangling and analysis. In almost all cases, users will likely call parse_ immediately after get_ and never work with the output of get_ directly. For example:
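```r
library(projmgr)
library(magrittr)  # provides the %>% pipe
my_repo <- create_repo_ref('username', 'my_repo')
# Each get_ call returns a list; the matching parse_ call turns it into a dataframe
issues <- get_issues(my_repo, state = 'all') %>% parse_issues()
issue_events <- get_issue_events(my_repo, number = 7) %>% parse_issue_events()
milestones <- get_milestones(my_repo) %>% parse_milestones()
```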
However, the get_ and parse_ functions are provided separately to empower users. Some use cases where users may prefer not to use the parse_ functions include if they:
- need fields that are not included in the parsed output
- rely on GitHub API changes for which the parse_ functions have not been updated
In rare cases, get_ functions will return additional information not provided by the API to preserve data lineage. For example, get_issue_events() and get_issue_comments() include the issue number (provided as a required function argument) in the output so users know which issue each result refers to.
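The lineage idea can be illustrated with a small sketch. add_issue_number() is a hypothetical helper, not the code inside get_issue_events() or get_issue_comments(), and the event records are toy lists; it simply copies the requested issue number into every raw record so that the origin survives later combining.

```r
# Hypothetical sketch: attach the requested issue number to each raw record
# so it is still known after records from many issues are combined.
add_issue_number <- function(records, number) {
  lapply(records, function(rec) {
    rec$number <- number
    rec
  })
}

# Toy records standing in for raw API output for issue 7
events <- list(list(event = "labeled"), list(event = "closed"))
tagged <- add_issue_number(events, 7)
tagged[[2]]$number
# 7
```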
The dataframes returned by parse_ functions attempt to maintain the same field names as used by the GitHub API, similar to the conventions described in the Function Parameters section. However, there are a few key exceptions:
- … field is instead called “n_comments”
There are some disadvantages to this approach. For example, if one wishes to join issues and milestone data by milestone number, the key is called milestone_number in the issues data but simply number in the milestone data. One alternative would be to use {object}_{field} conventions (e.g. issue_name) uniformly across all datasets. However, this was not selected since it makes variable names long and bulky.
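In practice, the join described above just needs the key mapped explicitly. The two data frames below are toy stand-ins for parse_issues() and parse_milestones() output with only a few illustrative columns; merge() is used because it is base R, and a dplyr join with by = c("milestone_number" = "number") would express the same mapping.

```r
# Toy stand-ins for parsed issues and milestones (illustrative columns only)
issues <- data.frame(
  number = c(1, 2, 3),
  title = c("Fix bug", "Write docs", "Add tests"),
  milestone_number = c(1, 1, 2),
  stringsAsFactors = FALSE
)
milestones <- data.frame(
  number = c(1, 2),
  title = c("v0.1", "v0.2"),
  stringsAsFactors = FALSE
)

# Map the differently named keys explicitly. Only `title` is a non-key column
# in both tables, so it receives the suffixes; the issue `number` keeps its name.
issues_with_milestones <- merge(
  issues, milestones,
  by.x = "milestone_number", by.y = "number",
  suffixes = c("_issue", "_milestone")
)
```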