A candidate finishes a Linux SysAdmin scenario in 4 minutes and 12 seconds. Another finishes the same scenario in 11 minutes and 48 seconds. The rubric scores them identically: both hit every required checkpoint. Which one do you call back first?
If your answer is "the faster one," you are missing most of the signal. Speed is one data point. The keystroke replay is the full picture. This post walks through what to look for, signal by signal, so you can make a defensible, evidence-based decision rather than a gut call.
What a Keystroke Replay Actually Contains
OpsTicket records every keystroke a candidate types during a terminal scenario, timestamped to the millisecond, alongside the terminal output that followed each command. The replay is not a screen recording; it is a structured log. You can scrub forward, pause, and inspect the exact state of the shell at any moment in the session.
The replay captures: commands typed (including typos and corrections), time elapsed between commands, error output and how the candidate responded to it, navigation patterns (did they use tab completion, history recall, manual typing?), and any commands that were run but produced no score-relevant output.
That last category is frequently the most revealing.
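If your team wants to analyze replays outside the player, a structured log is easy to work with in code. The Python sketch below assumes a hypothetical JSON Lines export with one record per submitted command. The field names t_ms, cmd, and output are illustrative, not a documented OpsTicket format. It shows the code equivalent of pausing the replay at a given moment. It works at command granularity, so keystroke-level detail such as typos and corrections is not represented.

```python
import json
from dataclasses import dataclass


@dataclass
class ReplayEvent:
    t_ms: int     # milliseconds from session start when the command was submitted
    cmd: str      # command line as submitted (hypothetical field name)
    output: str   # terminal output that followed the command


def load_events(jsonl_lines):
    """Parse a hypothetical JSON Lines export into events sorted by time."""
    events = [ReplayEvent(**json.loads(line)) for line in jsonl_lines if line.strip()]
    return sorted(events, key=lambda e: e.t_ms)


def history_at(events, t_ms):
    """Everything the candidate had run up to t_ms, like pausing the replay there."""
    return [e for e in events if e.t_ms <= t_ms]


export = [
    '{"t_ms": 52500, "cmd": "systemctl restart app", "output": ""}',
    '{"t_ms": 45000, "cmd": "cat /etc/app.conf", "output": "port=8080"}',
]
events = load_events(export)
print([e.cmd for e in history_at(events, 50000)])  # ['cat /etc/app.conf']
```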
Signal 1: The Pause Distribution
Long pauses are not automatically bad. Where the pauses fall tells you what kind of problem-solver you are looking at.
Pause before the first command: A candidate who sits for 45 seconds before typing anything is reading the scenario carefully. That is a good sign on a complex task. A candidate who fires off a command in under five seconds on a scenario that requires understanding a multi-step dependency chain may be pattern-matching to a memorized procedure rather than reasoning through the specific problem.
Pause after an error: This is the most diagnostic pause in the replay. A candidate who receives a permission-denied error and then pauses for 20 to 30 seconds before running ls -la to inspect the file permissions is diagnosing. A candidate who immediately re-runs the same command verbatim three times is not. Both behaviors are easy to spot in the replay, and they are a reasonable preview of how the candidate will respond when something breaks on the job.
Pause before a destructive command: On scenarios that involve modifying system configuration or manipulating running services, watch for a deliberate pause before commands like systemctl stop, rm, or anything that writes to /etc/. Experienced practitioners pause. They verify the target. Where a tool offers a dry-run option, they sometimes use it first. Candidates who blow through these steps without hesitation are showing you their change-management instincts, and those instincts will show up in production.
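To compare pause patterns across a batch of candidates, you could compute the three pauses above automatically. This is a minimal sketch under stated assumptions: each command is a (t_ms, command, output) tuple from a hypothetical export, gaps are measured from one submitted command to the next (so they include typing time), and the error and destructive-command patterns are rough heuristics you would tune per scenario.

```python
import re

# Heuristic patterns (assumptions, not OpsTicket rules); tune them per scenario.
ERROR_RE = re.compile(r"permission denied|no such file|command not found|\bfailed\b", re.IGNORECASE)
DESTRUCTIVE_RE = re.compile(
    r"^(?:sudo\s+)?(?:rm\s|systemctl\s+stop\s)"  # deletions and service stops
    r"|>\s*/etc/"                                 # shell redirection into /etc
    r"|\btee\s+(?:-a\s+)?/etc/"                   # tee writing into /etc
)


def pause_report(cmds):
    """cmds: (t_ms, command, output) tuples in session order, t_ms from session start.
    Returns the pause before the first command, after each error, and before
    each destructive command, all in seconds."""
    report = {"before_first": None, "after_error": [], "before_destructive": []}
    prev_t, prev_failed = 0, False
    for i, (t_ms, cmd, output) in enumerate(cmds):
        pause = (t_ms - prev_t) / 1000
        if i == 0:
            report["before_first"] = pause
        if prev_failed:
            report["after_error"].append((cmd, pause))
        if DESTRUCTIVE_RE.search(cmd):
            report["before_destructive"].append((cmd, pause))
        prev_t, prev_failed = t_ms, bool(ERROR_RE.search(output))
    return report


session = [
    (45000, "cat /etc/app.conf", "port=8080"),
    (50000, "systemctl restart app", "Failed to restart app.service: Access denied"),
    (74000, "journalctl -u app -n 20", "-- No entries --"),
    (90000, "systemctl stop app", ""),
]
report = pause_report(session)
print(report["after_error"])         # [('journalctl -u app -n 20', 24.0)]
print(report["before_destructive"])  # [('systemctl stop app', 16.0)]
```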
Signal 2: Error Recovery Patterns
Every well-designed scenario includes at least one intentional friction point: a misconfigured file, a missing dependency, a service that is not running. The rubric scores whether the candidate resolved it. The replay shows you how.
Look for three recovery archetypes:
The diagnostician: Error appears. Candidate runs an inspection command (journalctl -xe, cat /var/log/syslog, netstat -tulnp, depending on the track). Reads the output. Runs a targeted fix. Verifies. This is the pattern you want on your team.
The trial-and-error operator: Error appears. Candidate tries a series of plausible commands without a clear diagnostic thread. Eventually lands on the fix, possibly by exhausting options. Scores full points. But the replay shows you the process was not systematic. This candidate may struggle on novel failures they have not seen before.
The Googler-in-disguise: Error appears. Long pause. Then a very specific, syntactically perfect command that resolves the issue in one step. No exploration, no verification afterward. On a timed, closed-book scenario this pattern sometimes indicates the candidate has seen this exact scenario before. More often it indicates they have strong recall of specific procedures but limited ability to adapt when the environment differs slightly from what they memorized.
None of these archetypes is automatically disqualifying. A helpdesk role built around standardized remediation procedures is more forgiving of the trial-and-error operator than a senior SRE role is. Match the pattern to the job, not to an abstract ideal.
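A rough first pass at the recovery archetypes is to look at the very next command after each error. This sketch makes the same assumptions as any offline analysis here (a hypothetical (command, output) export and heuristic patterns) and counts verbatim re-runs, inspection commands, and other attempts. On its own it cannot separate the diagnostician from the Googler-in-disguise; that still takes a human watching the pause and the absence of exploration.

```python
import re
from collections import Counter

# Heuristic patterns (assumptions); tune per track.
ERROR_RE = re.compile(r"permission denied|no such file|command not found|\bfailed\b", re.IGNORECASE)
INSPECT_RE = re.compile(
    r"^(?:sudo\s+)?(?:journalctl|cat|less|tail|head|ls|stat|ss|netstat|grep|systemctl\s+status)\b"
)


def recovery_moves(cmds):
    """cmds: (command, output) pairs in session order.
    Classify the first move the candidate made after each error."""
    moves = Counter()
    for (cmd, out), (nxt, _) in zip(cmds, cmds[1:]):
        if not ERROR_RE.search(out):
            continue
        if nxt.strip() == cmd.strip():
            moves["rerun"] += 1    # same command again, verbatim
        elif INSPECT_RE.search(nxt):
            moves["inspect"] += 1  # looked at logs, files, or state first
        else:
            moves["other"] += 1    # a different attempt without inspecting
    return moves


session = [
    ("cat /etc/shadow", "cat: /etc/shadow: Permission denied"),
    ("cat /etc/shadow", "cat: /etc/shadow: Permission denied"),
    ("ls -la /etc/shadow", "-rw-r----- 1 root shadow 1024 Jan 1 00:00 /etc/shadow"),
]
print(dict(recovery_moves(session)))  # {'rerun': 1, 'inspect': 1}
```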
Signal 3: Command Vocabulary and Efficiency
The specific commands a candidate reaches for reveal their real working history in a shell more accurately than any resume line.
A candidate who writes grep -rn "error" /var/log/ and pipes it through less has used that workflow before. A candidate who writes find /var/log -name "*.log" -exec grep -l "error" {} \; is showing you a different level of comfort with the toolchain. Neither is wrong. Both are informative.
Watch for:
- Tab completion usage: Candidates who rely heavily on tab completion are working in terminals regularly. Candidates who type full paths manually, including long directory names, without a single tab, are either extremely deliberate or less comfortable in the shell than their resume suggests.
- History recall (!!, Ctrl+R, arrow keys): Frequent use of shell history navigation is a strong indicator of daily terminal work. It is a habit, not a technique you perform for an assessment.
- Alias-style shortcuts: Candidates who type ll expecting it to work (and then correct to ls -la when it does not) are showing you their home environment. Small tells like this are more reliable than certification lists.
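Tab completion and history recall only show up at keystroke level. Assuming a hypothetical export where each keystroke is either a single printable character or a token such as <Tab> or <Up> for special keys, a sketch like this summarizes those habits. The token names are assumptions; a real export may encode special keys differently.

```python
from collections import Counter

# Assumed tokens for non-printing keys (hypothetical encoding).
TAB, UP, CTRL_R, ENTER = "<Tab>", "<Up>", "<C-r>", "<Enter>"


def shell_habits(keys):
    """Summarize navigation habits from a flat keystroke stream.

    Only typed characters are reassembled into commands, so text inserted by
    Tab completion or history recall is not reconstructed, and other special
    keys (such as Backspace) are ignored."""
    counts = Counter(keys)
    commands, buf = [], []
    for k in keys:
        if k == ENTER:
            commands.append("".join(buf))
            buf = []
        elif len(k) == 1:
            buf.append(k)
    n = max(len(commands), 1)  # avoid dividing by zero on an empty session
    return {
        "commands": len(commands),
        "tab_per_command": counts[TAB] / n,
        "history_recalls": counts[UP] + counts[CTRL_R] + sum("!!" in c for c in commands),
        "ll_attempts": sum(c.split()[:1] == ["ll"] for c in commands),
    }


keys = (list("cd /var/lo") + [TAB, ENTER]
        + [UP, ENTER]
        + list("ll") + [ENTER]
        + list("ls -la") + [ENTER])
print(shell_habits(keys))
# {'commands': 4, 'tab_per_command': 0.25, 'history_recalls': 1, 'll_attempts': 1}
```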
Signal 4: Verification Behavior
After completing a task, does the candidate verify the outcome? This single behavior separates engineers who have been paged at 2 a.m. from engineers who have not.
On a scenario that asks a candidate to enable and start a service, the rubric checkpoint is: service is running and enabled. A candidate who runs systemctl enable --now sshd and moves on scores the point. A candidate who runs systemctl enable --now sshd and then immediately runs systemctl status sshd to confirm the active state is showing you an operational habit. That habit does not appear on a resume. It appears in the replay.
Verification behavior is especially telling on networking and cybersecurity scenarios, where the difference between "I ran the firewall rule" and "I confirmed traffic is behaving as expected" is the difference between a closed ticket and a 3 a.m. callback.
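For the service scenario above, checking for a follow-up status command is straightforward to automate. This sketch assumes a plain list of submitted commands and only covers systemd units started through systemctl. The window of three commands is an arbitrary choice, and firewall or networking scenarios would need their own change-and-verify pairs.

```python
import re

# Assumed Linux-track pattern for a state change on a systemd unit.
CHANGE_RE = re.compile(r"^(?:sudo\s+)?systemctl\s+(?:enable\s+--now|start|restart)\s+(\S+)")


def unverified_changes(cmds, window=3):
    """Units started or restarted via systemctl with no `systemctl status` or
    `systemctl is-active` check on the same unit within the next `window` commands."""
    missing = []
    for i, cmd in enumerate(cmds):
        m = CHANGE_RE.match(cmd)
        if not m:
            continue
        unit = re.escape(m.group(1))
        check = re.compile(rf"^(?:sudo\s+)?systemctl\s+(?:status|is-active)\s+{unit}(?:\s|$)")
        if not any(check.match(c) for c in cmds[i + 1 : i + 1 + window]):
            missing.append(m.group(1))
    return missing


print(unverified_changes(["systemctl enable --now sshd"]))                           # ['sshd']
print(unverified_changes(["systemctl enable --now sshd", "systemctl status sshd"]))  # []
```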
Signal 5: Scope Discipline
Watch what the candidate does that is not required by the scenario. Some candidates explore the environment: they check running processes, inspect files outside the task scope, look at user accounts that were not mentioned in the prompt. This can indicate curiosity and situational awareness. It can also indicate a candidate who does not read requirements carefully.
The distinction is timing. Exploration that happens after all required checkpoints are complete reads as thoroughness. Exploration that happens before the candidate has addressed the primary task reads as distraction. The timestamped replay makes this easy to distinguish.
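Because the distinction is timing, it reduces to comparing timestamps. In this sketch, last_checkpoint_ms (when the final required checkpoint was met) and the scope predicate are hypothetical inputs you would supply per scenario; neither is a documented OpsTicket field.

```python
def exploration_timing(cmds, in_scope, last_checkpoint_ms):
    """cmds: (t_ms, command) pairs. in_scope: callable(command) -> bool.
    Splits out-of-scope commands by whether they ran before or after
    the final required checkpoint was met."""
    timing = {"before": [], "after": []}
    for t_ms, cmd in cmds:
        if in_scope(cmd):
            continue
        timing["after" if t_ms > last_checkpoint_ms else "before"].append(cmd)
    return timing


def nginx_task(cmd):
    # Hypothetical scope rule for a scenario about the nginx service.
    return "nginx" in cmd


session = [
    (10000, "ps aux"),
    (30000, "systemctl restart nginx"),
    (35000, "systemctl status nginx"),
    (60000, "cat /etc/passwd"),
]
print(exploration_timing(session, nginx_task, last_checkpoint_ms=35000))
# {'before': ['ps aux'], 'after': ['cat /etc/passwd']}
```

The commands in the "before" bucket are the moments worth scrubbing to in the replay.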
How to Use This in a Debrief
Pull up the replay before a technical debrief call. Identify two or three specific moments: one pause pattern, one error recovery sequence, one verification behavior (or absence of one). Ask the candidate to walk you through their thinking at those exact moments. Their explanation, matched against what you observed in the replay, gives you a complete picture that no whiteboard question can replicate.
The rubric tells you what they accomplished. The replay tells you how they think. Both are required for a defensible hire.
Takeaway: Score is necessary but not sufficient. Before your next debrief, spend five minutes in the replay looking specifically at pause distribution after errors, verification commands, and scope discipline. Those three signals will tell you more about on-the-job behavior than the score alone.
If you want to see how OpsTicket structures terminal scenarios and rubric scoring across IT tracks, reach out for a brief walkthrough. We are happy to show you a live replay so you can judge the signal quality yourself.