FullStack is a take-home assessment platform built around real codebases. Here's exactly what happens, from challenge creation to final score.
Every FullStack challenge is a real, runnable codebase. Not a stripped-down toy, not a LeetCode prompt. An actual project with working build tooling, dependencies, and structure that reflects how code is written on the job.
Challenges come in two types: bug hunts and feature implementations. Bug hunt challenges have a set of bugs intentionally planted in the codebase. Feature challenges ask you to implement something against a spec.
Each challenge has a rubric: a set of bugs or criteria with descriptions of the correct outcome and acceptable approaches. The rubric is used by the evaluator but is not shown to candidates during the challenge. That's intentional. We want to see how you reason, not how you game a checklist.
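As a rough illustration of what a rubric holds, here is a minimal Python sketch. The class and field names (`RubricItem`, `Rubric`, `challenge_type`, `acceptable_approaches`) and the sample pagination bug are hypothetical. They are not FullStack's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class RubricItem:
    """One planted bug (bug hunt) or one spec criterion (feature challenge)."""
    item_id: str
    description: str
    correct_outcome: str
    # Known-good approaches; unlisted approaches are still judged on their merits.
    acceptable_approaches: list[str] = field(default_factory=list)


@dataclass
class Rubric:
    """Evaluator-only rubric; never shown to candidates during the challenge."""
    challenge_type: str  # hypothetical values: "bug_hunt" or "feature"
    items: list[RubricItem] = field(default_factory=list)


# Hypothetical example: a single planted bug in a pagination helper.
rubric = Rubric(
    challenge_type="bug_hunt",
    items=[
        RubricItem(
            item_id="bug-1",
            description="Pagination drops the last, partially filled page",
            correct_outcome="Every page is returned, including a partial final page",
            acceptable_approaches=["Round the page count up instead of down"],
        ),
    ],
)
print(len(rubric.items))
```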
You need git to participate. The flow is:
git clone it locally and work in your own editor.
Commit history matters. We evaluate your commits, not just the final diff. How you break up your work and how you describe it is part of the score.
When you submit, FullStack collects your commit history and diffs, then sends them to an AI evaluator along with the challenge rubric. The evaluator is not looking for a specific implementation. It's evaluating whether you addressed the root cause correctly.
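The submission step can be pictured as bundling three things into one request: the ordered commit history, each commit's diff, and the rubric. The sketch below assumes a JSON payload and uses hypothetical names (`Commit`, `build_evaluator_input`). FullStack's real transport and format are not described on this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Commit:
    sha: str
    message: str
    diff: str  # unified diff introduced by this commit


def build_evaluator_input(commits: list[Commit], rubric_items: list[dict]) -> str:
    """Bundle commit history, per-commit diffs, and the rubric into one JSON document.

    Hypothetical format for illustration only.
    """
    payload = {
        # Commits keep the order they were made in, so how the work was
        # broken up and described is visible, not just the final diff.
        "commits": [asdict(c) for c in commits],
        "rubric": rubric_items,
    }
    return json.dumps(payload, indent=2)


history = [
    Commit("a1b2c3", "Fix off-by-one in page count", "--- a/pager.py\n+++ b/pager.py\n..."),
    Commit("d4e5f6", "Add regression test for last page", "--- a/test_pager.py\n..."),
]
print(build_evaluator_input(history, [{"item_id": "bug-1"}]))
```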
Each bug or criterion gets one of three statuses:
Approaches not listed in the rubric are evaluated on their merits. If your approach addresses the root cause and is as simple as or simpler than a listed one, it scores as met. We don't require you to find the "expected" solution.
We know AI evaluation is only useful if it's fair and consistent. A few things we've built in to make that true:
Commit quality is scored separately from bug or feature coverage. It's treated as a soft skill, one that matters a lot to some teams and less to others. It doesn't affect your total score, but it appears prominently in the results.
Each commit gets one of four ratings:
Your total score is the average of your bug/criterion scores, scaled to 100. A perfect score means every item was addressed correctly. Suboptimal fixes count for partial credit.
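To make the arithmetic concrete, the sketch below averages per-item credit and scales it to 100. This page says only that suboptimal fixes earn partial credit; the 0.5 weight, the `not_met` label, and the `total_score` name are assumptions for illustration.

```python
# Hypothetical per-item credit; the real partial-credit weight is an assumption.
CREDIT = {"met": 1.0, "suboptimal": 0.5, "not_met": 0.0}


def total_score(statuses: list[str]) -> float:
    """Average the per-item credit and scale the result to 100."""
    if not statuses:
        raise ValueError("a challenge needs at least one bug or criterion")
    return 100 * sum(CREDIT[s] for s in statuses) / len(statuses)


print(total_score(["met", "met", "suboptimal", "not_met"]))
```

Under these assumed weights, four items scored met, met, suboptimal, and not met give (1 + 1 + 0.5 + 0) / 4 × 100 = 62.5, and only an all-met result reaches 100.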
The score breakdown shows each bug or criterion individually: whether you met it, the approach you took, and the evaluator's reasoning. There's no black box. Every point deduction comes with an explanation.
For hiring orgs, the score is one signal among several. The commit feedback, the approach taken on each item, and the overall summary give reviewers enough context to make a real judgment, not just rank by a number.
Ready to try it?