Report from URSSI workshop on software credit, citation, and metrics

Karthik Ram
March 24, 2019

Summary:

One of the biggest obstacles to making research software sustainable is ensuring appropriate credit and recognition for researchers who develop and maintain such software. We convened 16 experts over two days to identify core issues around software credit and propose concrete steps that a software institute might take to solve them. We identified six core issues directly related to credit (career paths, individual impact, disincentives in the academic credit model, quality versus impact, recognition of software value, lack of funding) and two broader challenges (lack of funding for maintenance and lack of awareness of best practices). Using a strategy development exercise, we brainstormed these topics in small groups and came up with lists of short- and long-term actions that an institute could take.


The mission of the URSSI conceptualization project is to understand the challenges that researchers face regarding sustainable research software and to design a potential institute that would address these challenges. One particular challenge identified in our first workshop was the lack of consensus on credit for research software. There is a need to gain a deeper understanding of the concept of credit for research software and of how, as a community, we could define and measure impact. To explore this topic in greater detail, we conducted a topic-focused workshop on software metrics and credit at the National Center for Ecological Analysis and Synthesis (NCEAS) in Santa Barbara on January 23-24, 2019. We invited 19 experts to participate in a series of discussions and deliver short presentations on their organizations' efforts in this space. Below is a summary of our discussions from the two days.

Discussion topics

We began the workshop with a group discussion on whether we should focus on software credit and incentives in general or focus specifically on machine actionable/automated ways of assessing software that lead to credit measures and incentives.

We focused the discussion on what an ideal credit scenario might look like and how to better reward software through citation. In particular we tried to unpack the intent of citation in the context of research software:

  • Do I want to cite software to provide credit?

  • Do I want to cite software for reproducibility?

  • Do I want to cite software to prove my credibility and respectability (i.e. show that I understand the area / quality signaling / paying my dues)?

  • Do I want to cite software to provide transparency of my results (“this is what I did”)?

Scope of the Software Institute

What should a software sustainability institute focus on with respect to software credit? Participants advocated for an institute that would support software at all levels, not just software that has been published or archived. Such an institute should support:

  • New and experienced package authors.
  • Researchers who rely on scripts as part of their workflow.
  • Researchers who share code that is not packaged as software and want credit for it in a way analogous to credit for methodology.
  • Researchers seeking credit for lowering the cost of understanding and extending research through software.

The institute could work towards helping scientists move beyond simply sharing scripts to the next stage. This could take the form of training and support to develop vetted, published software packages that follow community-centric best practices. The institute could also help create metrics that would be useful for tenure committees. Software is making its way into tenure letters, and URSSI could play a role in placing research software in the right context for evaluation.

Lightning talks

Planning a set of URSSI activities around metrics and credit

In the last half-day of the workshop, the group tried to follow the methodology from “Good Strategy Bad Strategy” by Richard P. Rumelt to develop a strategy for an institute to follow. Rumelt says that the kernel of a strategy contains three elements:

  1. A diagnosis that defines or explains the nature of the challenge. A good diagnosis simplifies the often-overwhelming complexity of reality by identifying certain aspects of the situation as critical.

  2. A guiding policy for dealing with the challenge. This is an overall approach chosen to cope with or overcome the obstacles identified in the diagnosis.

  3. A set of coherent actions that are designed to carry out the guiding policy. These are steps that are coordinated with one another to work together in accomplishing the guiding policy.

First, the attendees, working with the overall challenge of making research software more sustainable, determined a set of eight challenges, six of which are related to credit:

  1. Career paths aren’t well established

  2. It’s hard to measure the impact of individuals

  3. The academic credit model disincentivizes individual contributions to public goods / infrastructure

  4. We conflate the quality and impact of software

  5. There is a lack of recognition of the value and importance of software

  6. There is a lack of stable funding for maintenance of software that is important but doesn’t have a generic market

And two that are not:

  1. “Lumpy” project funding means that maintenance/sustainability can’t be reliably folded into project costs

  2. There’s a lack of awareness of best practices for developing and maintaining research software

While discussing these challenges, the group also stated a belief that discoverability and credit are inherently linked. Software with very high credit is going to be more discoverable, but a lot of research software doesn’t get high credit. If the software does not receive high credit and is not discoverable, then people cannot use it. As a result, rather than using existing software, researchers develop new software. Furthermore, how people choose which software to use is not well understood, though it has been studied by Hucka & Graham.

In addition, it’s not clear that open-sourcing software automatically makes it more sustainable. Some of the group think that the development methodology of open source is important, but that this methodology can also be used in more closed environments, such as in a private group or within a company. This observation led to a statement that closed-source but openly developed software is more sustainable than purely open source software.

This discussion led to the issue that URSSI needs to decide how it will deal with commercialization as a path towards sustainability. Will URSSI help/support/aim at commercialization and revenue discovery (while preserving underlying open communities)? In other words, given that the goal of URSSI is sustainable software, what paths towards sustainability should and should not be supported?

Second, the attendees developed a set of guiding policies for an institute, which the groups modified during the breakout sessions to:

  • Leverage existing organizations for authority, credibility, resources

  • Focus on individuals (i.e., aim at people not projects)

  • Leverage available resources (software, services, credit systems, etc.) where possible rather than reinvent

  • Activities should have an end or a sustainability plan beyond URSSI

  • Distinguish between quality (badging/intrinsic) and impact (reuse) measures

  • Sustain software by sustaining its communities (stewards, developers, maintainers, leaders, active users).

  • Coordinate activities rather than start new ones

  • There’s no ‘one true career path’

Third, the group broke up into three breakout groups to discuss actions around the first three challenges, then formed three different breakout groups to talk about the remaining three challenges. For each challenge, the group determined which guiding principles were relevant, in some cases modifying the principles. Each group then identified relevant activities, categorized those activities as either short- or long-term, and provided an estimated budget (out of the overall $1m/year budget) for those activities, realizing that the other groups would use the remainder of the budget.

While the detailed results follow, the important points that came out of this activity are:

  • In general, actions support more than one challenge. In other words, the sets of actions for each challenge have overlaps with actions from other challenges.

  • Related to this, the cost of the sets of actions, if simply added across all the challenges, would be about $2.7m-$3m/year, which far exceeds the $1m/year budget; however, the overlaps offer the promise of reducing this cost.

  • Still, URSSI will have to be careful in determining what it chooses to do, and in particular, will have to consider how these activities, meant to support credit, can work together with other non-credit-related activities.

  • These activities can be done by URSSI staff with varying skills: many are related to data collection and analysis, some are related to policy, and some are related to software. In addition, some of the tasks might be done by fellows or by volunteers from the community. Determining which actions need to be done by URSSI directly vs partially funded by URSSI vs coordinated by URSSI will be a challenge.

Challenge 1: Career paths in research software are not well established. In other words, they are idiosyncratic, often ad hoc, or absent in academia, while very much present in industry, which leads to a flow of people from academia to industry and national laboratories.

Short-term actions:

  • Document existing (known successful/viable, known failures) career paths for individuals creating research software

  • Create a mailing list for those interested in career paths

  • Work with the PEARC conference and the XSEDE community to build out from supercomputing centers to programmers employed on domain grants.

  • Seed “chapters” of research software folks (perhaps URSSI chapters?) at existing universities / societies / organizations; create a handbook, tools, and best practices to support local organizers; hire a “coordinator” / community manager. These cells/chapters could talk about training, do consulting for problems, and run hacky hours, study groups, and software days; the cells could also be brought together into a larger conference. Overall, this helps grow and establish the community (and make connections that could help members meet the right person for their next career move).

  • Create a professional award program, working with other established organizations, e.g. a joint ESA/URSSI award for software contributions to research (modeled on the John Chambers Software Award from the American Statistical Association). $10k in funding would be available from URSSI; eventually, domain societies would be asked to partially fund the awards. URSSI would need to define the categories, criteria/heuristics, etc. for the awards beforehand.

  • Create a clearing house of job descriptions with criteria for performance assessment; distill out design patterns or different categories for different roles. This could also include documenting salary levels for different job descriptions, and connecting the job descriptions with the learning modules necessary for these jobs.

Long-term actions:

These actions would require 1 staff member for community work, $100k for prizes, and $50k for micro-grants for community development, totaling $350k/year.

Challenge 2: It’s hard to measure the impact of individuals. First, it’s hard to measure the contributions of individuals to a project. Second, these contributions need to be tied to the impact of the project.

Actions:

  • Identify factors, ranging from manual (such as evaluations) to automated (such as crawling repositories), that are part of impact, and surface them. These include quantifying code contributions, code review, and mentoring. Every community will choose the appropriate factors and apply weights to the measures to determine the kind of impact it cares about (see the sketch after this list).

  • Provide best practices for formatting or housing the information on these factors so that they are discoverable and/or queryable.

  • Create champions (e.g. librarians) to promote these good practices and to educate individuals and projects about them

  • Help individuals understand how they can best promote themselves (e.g., claim software as research works on their ORCID profile).

  • Provide awards to notable members identified by communities.
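
To make the “factors plus community-chosen weights” item above concrete, here is a minimal sketch in Python. The factor names, weights, and values are entirely hypothetical and are not recommendations; they only illustrate how a community could combine its chosen measures into a single score.

```python
# Hypothetical sketch: combine community-chosen impact factors with weights.
# All factor names, weights, and values below are made up for illustration.

def impact_score(factors, weights):
    """Weighted sum of normalized factor values (each on a 0-1 scale)."""
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())

# One community might emphasize mentoring and code review,
# while another might weight code contributions more heavily.
community_weights = {"code_contributions": 0.3, "code_review": 0.3, "mentoring": 0.4}

contributor_factors = {
    "code_contributions": 0.8,  # e.g., normalized share of commits
    "code_review": 0.5,         # e.g., share of reviews performed
    "mentoring": 0.2,           # e.g., onboarding/mentoring activity
}

print(round(impact_score(contributor_factors, community_weights), 2))  # 0.47
```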

These actions would require 0.2 FTE for surveys, 1 FTE to coordinate volunteers to write the documentation, $100k for 1 workshop, and $20k for outreach at conferences, totaling $360k/year.

Challenge 3: The academic credit model disincentivizes individual contributions to public goods / infrastructure. The group determined actions that would either gather evidence or provide advocacy or both.

Short-term actions:

  • Work to persuade high-profile individuals to make public contributions to public projects

  • Gather data/examples that show when contributions to public projects increase one’s citation count or other standard metrics

  • Gather and share examples of successful use of individual contributions to public goods/infrastructure to gain academic promotion; produce templates, advocacy toolkits, and examples to help others make their case. Similarly, work to persuade people that contributing to projects will increase their research products within their normally accepted reward system (e.g., more collaborators and papers; more opportunities to meet and work with new and old collaborators; a stronger social network)

  • Show how to participate in public goods / infrastructure projects (e.g. how to structure an issue, how to write a first PR)

  • Run a weekly online help desk for people who run into difficulties making contributions and need help

Long-term actions:

  • Use regular surveys to better understand why people do and don’t contribute; get data to understand value of public goods to community

  • Reframe contributions as first-class research products / objects (i.e., explain how building the best software is itself science, as it’s both discovery and creation). In order to do this, amplify existing efforts to build a taxonomy of such contributions to make it clear what those contributions are, and encourage people to claim, talk about, and take pride in these contributions; work with publishers to highlight these contributions and the people who make them. Also advocate (via materials, webinars, ambassadors, etc.) within academic communities (deans, faculty, science societies, review panels, funders) that public software contributions are research (this could also be done by respected cross-disciplinary groups, such as the national academies)

  • Work to revise funder policies to ensure reviewers prioritize grant proposals that reuse, build-upon and contribute back to maintenance of public infrastructure

  • Determine a way to make such public good software (or contributions to such software) peer-reviewed (with the idea that peer-review is considered a mark of quality)

  • Advocate for how to run open communities

  • Break down the idea that publishing in the small set of “highest impact” journals is key, and expose actual impact of work instead

  • Create high-profile equivalent of “highest impact” journal for software and data work - need to reject a lot of work and move it to “lower class” venues

These actions would require: a survey design and analysis role, at 0.5-1.0 FTE on an ongoing basis for as long as evidence gathering continues; a data collection coordination and dissemination role, at 0.5 FTE on an ongoing basis for as long as evidence gathering continues; an advocacy and outreach role, at 0.5-1.0 FTE on an ongoing basis for as long as advocacy is needed; and a role coordinating existing activities (a community initiative facilitator), initially focused on exemplar disciplines, at 1 FTE to cover 2-3 disciplines over 3 years. This totals 2.5-3.5 FTEs, or $500k-$700k/year. To reduce costs, fellows could be used to do some of this work; URSSI would need to provide fellows with template materials and overlap fellows so that one cohort can train the next (cf. the Mozilla and SSI Fellows programs). This would work particularly well for the advocacy work.

Challenge 4: We conflate the quality and impact of software

Actions:

  • Create checklists/review guidelines for different levels of peer-review for software; can be tiered, could issue stars or other rating system; leverage information already available from journals and other resources.
    • Define what quality means.
    • Capture the difference between quality and impact in a written resource (blog post, paper, etc.)
    • We still don’t know what impact means or how to measure it. Impact can be looked at in many ways: scientific, economic, societal, etc.
    • How do you know what software will be impactful (signals of high impact)?
  • Conduct a Delphi study or multiple Delphi studies across or between disciplines to determine a consensus on key indicators of software impact and related questions (such as those below). (contract coordinator: $50k in year 1)
    • Does improving the quality improve the impact?
    • A randomized experiment where the treatment is improved software practices
    • A retrospective study of software quality as related to the “impact” of that software.
  • URSSI Labs:
    • Bring infrastructure folks together to hack/build/experiment to connect existing projects that measure the impact of software (e.g. Depsy, libraries.io, CHAOSS) and prototype data services (see the sketch after this list)
    • Hack/work weeks open to separately-funded projects working in this space (leverage existing grants) ($60k)
    • One FTE engineer / data scientist who can help to prototype things ($200k)
    • Short-term contract resources ($100k)
    • Compute resources ($20k)
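
As one illustration of the kind of prototyping envisioned for URSSI Labs, the minimal sketch below pulls a few reuse signals for a single package from the libraries.io API. It assumes an API key in a LIBRARIES_IO_API_KEY environment variable and the project endpoint and field names documented at https://libraries.io/api at the time of writing; treat the specific field names as assumptions rather than a stable contract.

```python
# Hedged sketch: fetch reuse signals for one package from the libraries.io API.
# Assumes the /api/:platform/:name endpoint and fields such as dependents_count
# and rank (SourceRank); see https://libraries.io/api for the current contract.
import os
import requests

API_KEY = os.environ["LIBRARIES_IO_API_KEY"]

def reuse_signals(platform, name):
    """Return a few reuse-related fields for one package (e.g. 'cran', 'ggplot2')."""
    url = f"https://libraries.io/api/{platform}/{name}"
    resp = requests.get(url, params={"api_key": API_KEY}, timeout=30)
    resp.raise_for_status()
    project = resp.json()
    return {
        "package": f"{platform}/{name}",
        "dependent_packages": project.get("dependents_count"),
        "dependent_repos": project.get("dependent_repos_count"),
        "stars": project.get("stars"),
        "sourcerank": project.get("rank"),
    }

if __name__ == "__main__":
    print(reuse_signals("cran", "ggplot2"))
```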

The total budget for URSSI Labs came to $430k in direct costs.

Challenge 5: There is a lack of recognition of the value and importance of software.

Potential Actions:

  • Advocacy: examine science cases where software was particularly fundamental
    • Focus on demonstrating the impact of software
  • Study whether higher-quality software produces higher-impact science or more reliable science results
  • Advocacy: raise awareness that software needs to be maintained (can we actually say this, or does it need to be studied to determine that?)
  • Community awards (as defined in Challenge 1)
  • Case studies in sustainable software value, e.g., netCDF/HDF5 (https://libraries.io/maven/edu.ucar:netcdf), the DS9 image viewer (astronomy visualization software), NCSA Image, GCM model(s); how much money maintains these?
  • Calculate the bus factor / unicorn factor for a set of key software packages in several disciplines (see the sketch after this list)
    • E.g. see libraries.io: https://libraries.io/experiments/bus-factor
    • Churn factor: how much re-implementation happens, and how much of an issue is not-invented-here (NIH)?
  • Document maintenance and funding for critical packages for several disciplines, e.g. IRAF?
  • Training: how to create and maintain software for science
  • Prove it: are there problems out there, and how significant are they? Develop a tool/playbook to enable groups to self-assess. Do the problems:
    • Affect maintainability?
    • Affect correctness?
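
For the bus-factor item above, here is a minimal sketch of one crude approximation: the smallest number of committers who together account for more than half of a repository’s commits. This simplifies the file-ownership-based “truck factor” methods and the libraries.io experiment linked above, and the 50% threshold is an arbitrary assumption.

```python
# Crude bus-factor estimate: how many top committers cover >50% of all commits.
# A simplification for illustration; real truck-factor methods use file ownership.
import subprocess
from collections import Counter

def bus_factor(repo_path, threshold=0.5):
    """Estimate the bus factor of a local git repository from commit counts."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    counts = Counter(log)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered, factor = 0, 0
    for _, n in counts.most_common():
        covered += n
        factor += 1
        if covered / total > threshold:
            break
    return factor

if __name__ == "__main__":
    print(bus_factor("."))  # e.g., 1 for a single-maintainer package
```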

These actions would require 1.25 FTE for 3 years (an analyst plus an ethnographer) to develop tooling and a guide for an evaluation framework that communities can use to assess themselves; 0.5 FTE for outreach on a few case studies that demonstrate the utility of the evaluation framework and convince communities to apply the evaluation guide; $50k for an analyst to synthesize the group of results and look for trends; and 0.5-1.0 FTE for a training coordinator for a curriculum development/training program on creating sustainable research software, working with partners like The Carpentries. This totals $500k-$600k/year.

Challenge 6: Lack of stable funding for maintenance of software that is important but doesn’t have a generic market.

The group discussed why there is a lack of funding opportunities and proposed three possible answers: a lack of understanding of the value of research software, a lack of understanding of the need for software maintenance, and the fact that development is seen as a higher priority than maintenance.

Short-term actions:

  • Provide a system to match open source maintenance needs and open source programmers (e.g., classified ads), to be funded by URSSI, or URSSI could support proposals for such maintenance work (both morally and by providing guidance to the proposer)

  • Review the landscape of funding opportunities for software maintenance (and gather data about them) and provide a public summary, then keep the summary up-to-date.

  • Gather case studies of successful commercialization of open source projects and encourage the research community to understand and make use of them

  • Provide fellowships for programmers to “do good stuff” related to maintenance

  • Attach money for maintenance to some of the to-be-created awards for good software development, jointly named and funded with other communities, perhaps aligned with the similar action from Challenge 1.

Long-term actions:

  • Advocate for funding agencies to create and fund one or more institutions to perform maintenance, potentially national or disciplinary in scope

  • Use software maintenance plans to couple maintenance to development - make it clear that just doing development without maintenance won’t work. Note that this leads to questions about when maintenance should be stopped, and it’s also unclear how long-term maintenance would be supported (since grants are by definition time-limited).

  • Review the scope of maintenance needs across research disciplines. Determine the order of magnitude of the maintenance backlog for research software

  • Advocate for funding agencies to provide a funding pool for short-term maintenance grants for existing projects.

  • Advocate for universities to support maintenance of software developed by their university as part of research impact (or possibly technology transfer).

  • Encourage companies to provide funds and channel these funds into maintaining research software projects, perhaps via a review process with reviewers from both academic software projects and the companies. This could be done jointly with NumFOCUS or a similar organization.

The resources needed for the short-term actions are, in order, 1 FTE, 1 FTE, 0.25 FTE, potentially $10k-$20k per fellowship, and $50k for the awards, totaling $550k-$600k/year.