Process improvement through team metrics
An objective, data-based assessment of team processes is a necessary tool for successful development: it helps to identify problems, increase predictability and quality, and improve communication within the team.
Pulling metrics from artifacts
Evaluating team processes is an important step toward successful development. This is not always easy, because development often remains an expensive and poorly controlled black box. In most cases, problems in software development come down to the absence of standards for evaluating programmers’ work and to failures to follow the established processes.
To solve this problem, you can collect data from task trackers (Trello, Jira) and version control services (Git, GitHub, GitLab). These artifacts contain objective measurements and form the basis for evaluating a team: the data can be used to assess development processes and identify problems.
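As a starting point, here is a minimal sketch of pulling such raw data, assuming a Jira Cloud instance and a GitHub repository; the URL, project key, repository name, and credentials are placeholders, and the `requests` library is used for the HTTP calls.

```python
# Minimal sketch: pull raw issue and commit data for later analysis.
# The URL, project key, repository, and credentials are placeholders.
import requests

JIRA_URL = "https://your-company.atlassian.net"
AUTH = ("bot@your-company.com", "API_TOKEN")  # hypothetical service account


def fetch_issues(project_key: str) -> list[dict]:
    """Download all issues of a project via the Jira REST search endpoint."""
    issues, start = [], 0
    while True:
        resp = requests.get(
            f"{JIRA_URL}/rest/api/2/search",
            params={"jql": f"project={project_key}", "startAt": start, "maxResults": 100},
            auth=AUTH,
        )
        resp.raise_for_status()
        page = resp.json()
        issues.extend(page["issues"])
        start += len(page["issues"])
        if start >= page["total"]:
            return issues


def fetch_commits(owner: str, repo: str, token: str) -> list[dict]:
    """Download recent commits of a repository via the GitHub REST API."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/commits",
        headers={"Authorization": f"Bearer {token}"},
        params={"per_page": 100},
    )
    resp.raise_for_status()
    return resp.json()
```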
In addition, team communication can also be measured, although this may require sacrificing some Agile principles. For example, the “Due Date” practice logs every change together with its author and motivates people to do personal planning. By analyzing the accumulated logs with a script, you can assemble the full factual picture for a retrospective and find the key gaps and overloads.
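A sketch of such a retrospective script is shown below. It assumes the “Due Date” change log has been exported to a CSV file with the columns task, author, old_due, new_due; the file layout is an assumption, not a prescribed format.

```python
# Sketch: summarize Due Date postponements per author from an exported log.
# The CSV columns (task, author, old_due, new_due) are an assumed export format.
import csv
from collections import Counter
from datetime import date


def due_date_report(path: str) -> None:
    postponements = Counter()   # how often each author moved a due date
    slip_days = Counter()       # total days of slippage per author
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            old = date.fromisoformat(row["old_due"])
            new = date.fromisoformat(row["new_due"])
            if new > old:
                postponements[row["author"]] += 1
                slip_days[row["author"]] += (new - old).days
    for author, count in postponements.most_common():
        print(f"{author}: {count} postponements, {slip_days[author]} days of slip")

# due_date_report("due_date_log.csv")
```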
Measuring the development process focuses on three aspects: discipline, predictability, and quality. All signals and metrics are grouped by product development stage: planning, development, code review, and testing. The triggers for each stage are listed below.
Planning
Discipline
- Organizational problems. Effective project planning requires all team members to be present during planning.
- Poor task descriptions. Pay attention to tasks that do not match the agreed format, as well as tasks with no component specified.
- Uncertainty in tasks. It is important to set priorities and distribute them correctly across tasks; the recommended priority ratio is 20/60/20.
- No data collection. Estimate every issue, collect the data, and make sure the issue type matches its content: bug, technical debt, or user story (a sketch of such a check follows this list).
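Below is a hedged sketch of a planning hygiene check run over the issues fetched earlier. The field names (“components”, “timeoriginalestimate”, “issuetype”) follow standard Jira fields, and the allowed issue types are an assumption taken from the list above; custom Jira setups may differ.

```python
# Sketch: flag issues that break basic planning hygiene.
def planning_violations(issues: list[dict]) -> list[str]:
    problems = []
    for issue in issues:
        key, fields = issue["key"], issue["fields"]
        if not fields.get("components"):
            problems.append(f"{key}: no component specified")
        if not fields.get("timeoriginalestimate"):
            problems.append(f"{key}: no estimate")
        # allowed types are an assumption based on the list above
        if fields["issuetype"]["name"] not in {"Bug", "Story", "Technical Debt"}:
            problems.append(f"{key}: unexpected issue type {fields['issuetype']['name']}")
    return problems
```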
Predictability
- Out of time. Tasks that exceed the established SLA are a signal of slipping deadlines. Useful metrics are the percentage of tasks overdue against their “Due Date” and a ranking of the most overdue ones (see the sketch after this list). It is also worth tracking the task’s code churn, the percentage of code that was written but did not reach the release; code churn helps to understand how well tasks were thought through.
- Performing work that was not planned. Unplanned “throw-in” tasks and bugs exceeding their quotas are signals of work outside the plan. Useful metrics are the share of the initial sprint scope that gets closed, the number of tasks thrown into the sprint, the share of those thrown-in tasks that get closed, and changes in prioritization. It is also important to look at the structure of the work: whether it consists of tasks from the backlog or brand-new ones.
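Here is a minimal sketch of the “out of time” metrics: the share of tasks overdue against their “Due Date” and a ranking of the most overdue ones. It assumes each task is a dict with "key", "duedate", and "resolutiondate" in ISO format, which is an assumption about the export, not a fixed schema.

```python
# Sketch: percentage of overdue tasks and the top of the most overdue ones.
from datetime import date


def overdue_report(tasks: list[dict]) -> None:
    overdue = []
    for t in tasks:
        if not t.get("duedate"):
            continue
        due = date.fromisoformat(t["duedate"][:10])
        done = date.fromisoformat(t["resolutiondate"][:10]) if t.get("resolutiondate") else date.today()
        if done > due:
            overdue.append((t["key"], (done - due).days))
    share = 100 * len(overdue) / max(len(tasks), 1)
    print(f"Overdue: {share:.0f}% of tasks")
    for key, days in sorted(overdue, key=lambda x: -x[1])[:5]:
        print(f"  {key}: {days} days late")
```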
Quality
- Released features are not well thought out. Pay attention to tasks whose code churn in the three weeks after release is above the norm, as well as to the bug balance.
- No time to tidy up loose ends. It is important to set a quota for technical debt and make sure enough time is allocated to paying it down.
Development
Discipline
- Insufficient team productivity. A low commit rate and the absence of active issues in the tracker can be a signal that the team is not doing the work.
- Overload. If developers write code and review it outside of work hours, it can signal overload or inefficiency.
- Violation of Git rules. If commits lack task prefixes and branches have no agreed structure, this leads to confusion in the code and increases the time it takes to find errors (a sketch of such checks follows this list).
- Failure to follow the rules. If a “Due Date” is set but its changes come without explanatory comments, this may indicate non-compliance with the rules of the development process.
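The sketch below covers two of these discipline checks on the git history: commits made outside working hours and commits whose messages lack a task prefix. The prefix pattern (Jira-style “PROJ-123”) and the 10:00-19:00 working window are assumptions, not part of the described process.

```python
# Sketch: flag commits without a task prefix and commits outside working hours.
import re
import subprocess

PREFIX = re.compile(r"^[A-Z]+-\d+")  # assumed Jira-style ticket prefix


def discipline_check(repo_path: str) -> None:
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%an|%ad|%s", "--date=format:%H"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        author, hour, subject = line.split("|", 2)
        if not PREFIX.match(subject):
            print(f"no task prefix: {author}: {subject}")
        if not 10 <= int(hour) < 19:  # assumed working hours
            print(f"commit outside working hours: {author}: {subject}")
```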
Predictability
- Tasks stuck in development. Violating the SLA on time spent “In Progress” may indicate productivity problems or inefficiency.
- Developer overload. If a developer works on too many tasks at the same time, it leads to confusion and a higher risk of errors (see the work-in-progress sketch after this list).
- Hidden problems with a developer. Little code activity, oversized tasks, and tasks with abnormally high churn can indicate hidden problems such as a lack of experience or qualification.
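A minimal sketch of the work-in-progress check follows: it counts open tasks per developer from the Jira issues fetched earlier and flags anyone above a WIP limit. The limit of 3 and the status name “In Progress” are assumptions.

```python
# Sketch: tasks in progress per developer, flagging overload above a WIP limit.
from collections import Counter

WIP_LIMIT = 3  # assumed limit


def wip_report(issues: list[dict]) -> None:
    wip = Counter()
    for issue in issues:
        fields = issue["fields"]
        if fields["status"]["name"] == "In Progress" and fields.get("assignee"):
            wip[fields["assignee"]["displayName"]] += 1
    for dev, count in wip.most_common():
        flag = "  <-- overloaded" if count > WIP_LIMIT else ""
        print(f"{dev}: {count} tasks in progress{flag}")
```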
Quality
- The number of developer bugs. Ratings of developers by bugs let through and by tasks returned for rework can point to code quality issues.
- Code is layered on top of existing code. The time it takes to integrate new code into the existing codebase can serve as a code quality metric.
- Incident closure issues. SLA failures for high-priority bugs may indicate problems in the process of closing incidents.
- Lack of auto-tests. Keeping auto-test coverage at or above the declared target is an important factor in code quality (a sketch of a coverage gate follows this list).
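Here is a hedged sketch of such a coverage gate: it reads line coverage from a Cobertura-style coverage.xml (as produced, for example, by coverage.py) and fails if it is below the declared target. The 80% target and the report path are assumptions.

```python
# Sketch: fail the build if auto-test coverage drops below the declared target.
import sys
import xml.etree.ElementTree as ET

TARGET = 0.80  # assumed declared target


def coverage_gate(report_path: str = "coverage.xml") -> None:
    root = ET.parse(report_path).getroot()
    line_rate = float(root.get("line-rate"))  # fraction of covered lines
    print(f"coverage: {line_rate:.1%} (target {TARGET:.0%})")
    if line_rate < TARGET:
        sys.exit("coverage below the declared target")

# coverage_gate()
```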
Code review
Discipline
- Forgotten pull request. If the author forgets to create a pull request or does not assign a reviewer, the code may never be reviewed. Watch for signals such as too few reviewers and the absence of a link to the ticket (a hygiene-check sketch follows this list).
- Unclear pull request branch. Without a link to the ticket it is hard for the team to understand what changes were made; add a link to the ticket in the pull request description.
- Postponed pull request. If the description does not match the agreed format or the labels for the reviewer are missing, the review gets postponed. To avoid this, format pull requests properly and label them explicitly for the reviewer.
- No end-of-review stage. If there is no comment marking the end of the review and the wait for fixes, transparency within the team suffers. Always mark the moment the review is finished.
- Syntactic problems. Without a linter, code quality suffers. Use a linter to check code style and syntax automatically.
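A hedged sketch of the pull-request hygiene check via the GitHub REST API: it flags open pull requests that have no requested reviewers or no ticket reference in the title, description, or branch name. The ticket pattern is an assumption.

```python
# Sketch: flag open pull requests without reviewers or without a ticket reference.
import re
import requests

TICKET = re.compile(r"[A-Z]+-\d+")  # assumed ticket key pattern


def pr_hygiene(owner: str, repo: str, token: str) -> None:
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        headers={"Authorization": f"Bearer {token}"},
        params={"state": "open", "per_page": 100},
    )
    resp.raise_for_status()
    for pr in resp.json():
        text = f"{pr['title']} {pr.get('body') or ''} {pr['head']['ref']}"
        if not pr["requested_reviewers"]:
            print(f"#{pr['number']}: no reviewer assigned")
        if not TICKET.search(text):
            print(f"#{pr['number']}: no link to the ticket")
```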
Predictability
- Tasks hang in review. If the code review SLA is not met, the project gets delayed; allocate enough time for code reviews.
- Code review overload. If one reviewer receives too many pull requests, it leads to overload and delays. A metric for this problem is the distribution of pull requests across reviewers (see the sketch after this list).
- Conflict in the comments. An unusually large number of comments on a pull request can signal conflicts within the team.
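A minimal sketch of the reviewer-load metric: how reviews are distributed across reviewers. It assumes a list of per-review records of the form {"number": ..., "reviewer": ...}, which is an assumed shape for data exported from the review tool.

```python
# Sketch: distribution of completed reviews across reviewers.
from collections import Counter


def reviewer_load(reviews: list[dict]) -> None:
    load = Counter(r["reviewer"] for r in reviews)
    total = sum(load.values())
    for reviewer, count in load.most_common():
        print(f"{reviewer}: {count} reviews ({100 * count / total:.0f}% of all)")
```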
Quality
- Superficial code review. If reviewers do not pay enough attention to the code, problems surface later. Useful metrics are reviewer activity (the number of comments per 100 reviewed lines of code) and reviewer influence (the share of comments on lines that were subsequently changed); a sketch of the activity metric follows this list.
- Poor code quality in the pull request. If the code quality is poor, it can lead to problems later on and more time spent fixing bugs. The metric for this problem could be a high churn of code after a code review.
- Feedback on code review. If the pull request author and the reviewer cannot agree on code quality, this leads to extra delays and conflicts. A useful metric is the assessments the author and the reviewer give each other.
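Below is a hedged sketch of the reviewer activity metric, comments per 100 reviewed lines. It assumes per-review records with the number of comments left and the size of the reviewed diff; the field names are assumptions about the export, not a fixed API.

```python
# Sketch: reviewer activity = comments per 100 reviewed lines of code.
def reviewer_activity(reviews: list[dict]) -> dict[str, float]:
    comments: dict[str, int] = {}
    lines: dict[str, int] = {}
    for r in reviews:
        comments[r["reviewer"]] = comments.get(r["reviewer"], 0) + r["comments"]
        lines[r["reviewer"]] = lines.get(r["reviewer"], 0) + r["lines_reviewed"]
    return {
        reviewer: 100 * comments[reviewer] / lines[reviewer]
        for reviewer in comments if lines[reviewer]
    }
```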
Testing
Discipline
- No data. This happens if tasks skip the “Testing” status, are not assigned to a tester, or are not resubmitted for testing after changes. Such cases need to be caught promptly and the task returned for testing.
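A minimal sketch of this check: it flags tasks that reached “Done” without ever passing through “Testing”. It assumes a per-task list of visited statuses, for example built from the Jira changelog; the status names are assumptions.

```python
# Sketch: tasks closed without ever entering the "Testing" status.
def skipped_testing(history: dict[str, list[str]]) -> list[str]:
    return [
        key for key, statuses in history.items()
        if "Done" in statuses and "Testing" not in statuses
    ]

# skipped_testing({"PROJ-1": ["To Do", "In Progress", "Done"]}) -> ["PROJ-1"]
```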
Predictability
- Long testing. The signal here is a violated SLA on testing time and on time spent waiting in the queue (see the sketch after this list).
- Long fixes after testing. The signal is a violated SLA on how long a task stays in development after being returned from testing.
- Tester overload. The metric for determining tester overload is the distribution of tasks among testers.
- Complex test environment and pipeline. Useful metrics are build and system-check time, auto-test time, and the number of blocking incidents on the test infrastructure.
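A hedged sketch of the testing SLA check: it sums the time each task spent in the “Testing” status from its status-transition log and flags tasks over the SLA. The 48-hour SLA, the status names, and the transition record format are assumptions.

```python
# Sketch: flag tasks whose total time in "Testing" exceeds an assumed SLA.
from datetime import datetime

SLA_HOURS = 48  # assumed SLA


def testing_sla(transitions: dict[str, list[tuple[str, str]]]) -> None:
    """transitions: task key -> list of (ISO timestamp, new status)."""
    for key, events in transitions.items():
        hours, entered = 0.0, None
        for ts, status in events:
            t = datetime.fromisoformat(ts)
            if status == "Testing":
                entered = t
            elif entered is not None:
                hours += (t - entered).total_seconds() / 3600
                entered = None
        if hours > SLA_HOURS:
            print(f"{key}: {hours:.0f}h in testing (SLA {SLA_HOURS}h)")
```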
Quality
- Bad testing. Metrics for assessing the quality of testing can be the rating of testers by the share of tested tasks, the bug skip rate, and the number of tasks returned to testers.
- Bad architecture. To evaluate the project’s architecture, you can use a rating of the most bug-prone files and route them to the most careful tester.
- Poor communication between development and testing. Here you can use the “ping-pong” metric: how many times a task bounces between testing and development without an actual fix (a sketch follows this list).
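A minimal sketch of the “ping-pong” metric: how many times each task bounced from “Testing” back to development. The status names are assumptions, and the same transition log format as in the SLA sketch above is reused.

```python
# Sketch: count bounces from "Testing" back to "In Progress" per task.
def ping_pong(transitions: dict[str, list[tuple[str, str]]]) -> dict[str, int]:
    bounces: dict[str, int] = {}
    for key, events in transitions.items():
        statuses = [status for _, status in events]
        bounces[key] = sum(
            1 for prev, cur in zip(statuses, statuses[1:])
            if prev == "Testing" and cur == "In Progress"
        )
    return bounces
```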
Conclusion
When acting on problems revealed by development-process measurements, you can automate reminders through a chat bot, coordinate work according to the agreed rules, and assign a duty officer responsible for the processes. Keep in mind that processes can have exceptions, and that building a process on top of data collection should always start with discipline.