How FullStack works

FullStack is a take-home assessment platform built around real codebases. Here's exactly what happens, from challenge creation to final score.

The challenges

Every FullStack challenge is a real, runnable codebase. Not a stripped-down toy, not a LeetCode prompt. An actual project with working build tooling, dependencies, and structure that reflects how code is written on the job.

Challenges come in two types: bug hunts and feature implementations. Bug hunt challenges have a set of bugs intentionally planted in the codebase. Feature challenges ask you to implement something against a spec.

Each challenge has a rubric: a set of bugs or criteria with descriptions of the correct outcome and acceptable approaches. The rubric is used by the evaluator but is not shown to candidates during the challenge. That's intentional. We want to see how you reason, not how you game a checklist.

The submission flow

You need git to participate. The flow is:

1. You receive an invite link or find a public challenge.
2. You start the challenge. FullStack provisions a private repo and gives you a clone URL with credentials embedded.
3. You git clone it locally and work in your own editor.
4. You push commits to the repo as you work.
5. When you're done, you submit. Evaluation runs automatically.

Commit history matters. We evaluate your commits, not just the final diff. How you break up your work and how you describe it are rated separately as commit quality.
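
As a rough sketch of the clone, commit, and push steps above, the snippet below builds the git commands a candidate would typically run. The clone URL, directory name, and commit message are placeholders, not real FullStack values, and the final submit step is left out because the flow above doesn't describe it as a git command.

```python
import shlex
from typing import List


def git_workflow(clone_url: str, workdir: str, message: str) -> List[List[str]]:
    """Return the git commands for one clone -> commit -> push cycle.

    Commands are returned as argv lists (suitable for subprocess.run)
    rather than executed, so this sketch has no side effects.
    """
    return [
        ["git", "clone", clone_url, workdir],
        ["git", "-C", workdir, "add", "--all"],
        ["git", "-C", workdir, "commit", "-m", message],
        ["git", "-C", workdir, "push"],
    ]


# Placeholder values: the real clone URL is issued when you start a challenge.
commands = git_workflow(
    clone_url="https://USER:TOKEN@git.example.com/challenge.git",
    workdir="challenge",
    message="Fix off-by-one in pagination: last page dropped its final item",
)
for argv in commands:
    print(shlex.join(argv))
```

In practice you would repeat the add, commit, and push steps once per logical change, so each commit stays small enough to describe precisely.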

How evaluation works

When you submit, FullStack collects your commit history and diffs, then sends them to an AI evaluator along with the challenge rubric. The evaluator is not looking for a specific implementation. It's evaluating whether you addressed the root cause correctly.

Each bug or criterion gets one of three statuses:

Met: You addressed the root cause and the fix produces the correct outcome. The most direct solution scores as well as a complex one. We don't penalize simplicity.

Suboptimal: You addressed the root cause, but the fix has a meaningful flaw such as unnecessary complexity, brittleness in production, or correct logic in the wrong layer.

Missed: The bug isn't fixed, the feature isn't implemented, or you patched a symptom without addressing the underlying cause. Suppressing visible incorrect behavior without fixing why it happens counts as missed, not suboptimal.

Approaches not listed in the rubric are evaluated on their merits. If your approach addresses the root cause and is as simple or simpler than a listed one, it scores as met. We don't require you to find the "expected" solution.
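
The three statuses boil down to a small decision order, sketched below. This is an illustration only: the real evaluator is an AI making these judgments from your diffs and the rubric, and the two boolean inputs and the `classify` function are hypothetical names standing in for those judgments.

```python
from enum import Enum


class Status(Enum):
    MET = "met"
    SUBOPTIMAL = "suboptimal"
    MISSED = "missed"


def classify(addresses_root_cause: bool, has_meaningful_flaw: bool) -> Status:
    """Map two (hypothetical) evaluator judgments to a status.

    The root-cause check comes first: a symptom patch is missed
    no matter how clean the code is otherwise.
    """
    if not addresses_root_cause:
        return Status.MISSED
    if has_meaningful_flaw:
        return Status.SUBOPTIMAL
    return Status.MET


# A patch that hides the wrong output without fixing its cause:
print(classify(addresses_root_cause=False, has_meaningful_flaw=False).value)
# A correct fix placed in the wrong layer:
print(classify(addresses_root_cause=True, has_meaningful_flaw=True).value)
```

Note that whether the approach appears in the rubric is not an input at all, which matches the rule that unlisted approaches are judged on their merits.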

Why trust it

We know AI evaluation is only useful if it's fair and consistent. A few things we've built in to make that true:

Rubric-grounded

The evaluator works from a structured rubric, not a vibe. Each judgment is tied to a specific criterion with defined outcomes and acceptable approaches.

Root cause, not symptoms

A fix that suppresses visible incorrect behavior without addressing why it happens is scored as missed. We don't reward surface-level patches.

Unlisted approaches are welcome

If your approach isn't in the rubric but addresses the root cause cleanly, it scores as met. We don't require the expected solution.

Prompt injection protection

If the evaluator detects content designed to manipulate the evaluation (code comments, commit messages, or anything that tries to override scoring), it marks everything as missed.
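
A minimal sketch of that rule, assuming verdicts are stored as a mapping from rubric item to status. The function name, the item names, and the `injection_detected` flag are hypothetical; how detection itself works is not described here.

```python
from typing import Dict


def apply_injection_guard(
    verdicts: Dict[str, str], injection_detected: bool
) -> Dict[str, str]:
    """Override every verdict with "missed" when manipulation is detected."""
    if injection_detected:
        return {item: "missed" for item in verdicts}
    return dict(verdicts)


# Hypothetical rubric items and verdicts.
verdicts = {"pagination-bug": "met", "cache-bug": "suboptimal"}
print(apply_injection_guard(verdicts, injection_detected=True))
```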

Every verdict has a reason

Nothing is scored without explanation. You can read exactly why each item was marked met, suboptimal, or missed.

Commit scoring

Commit quality is scored separately from bug or feature coverage. It's treated as a soft skill, one that matters a lot to some teams and less to others. It doesn't affect your total score, but it appears prominently in the results.

Each commit gets one of four ratings:

Good: Describes what changed and why it was wrong or what correct behavior looks like.

Acceptable: Specific about what changed. Doesn't explain why.

Vague: Names the area but not the specific change.

Uninformative: "fix", "update", "wip", or anything that conveys nothing.
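
To make the four ratings concrete, here is one invented commit message per rating, plus a tiny self-check you could run before pushing. The example messages and the `looks_uninformative` helper are illustrative assumptions; the actual rating is done by the evaluator, not by a word list.

```python
# Invented examples, one per rating, all describing the same change.
EXAMPLES = {
    "good": "Fix cart total: tax was applied before discounts, so totals were too high",
    "acceptable": "Apply discount before tax in cart total calculation",
    "vague": "Fix cart total",
    "uninformative": "fix",
}

# The three messages named above as uninformative.
UNINFORMATIVE = {"fix", "update", "wip"}


def looks_uninformative(message: str) -> bool:
    """True if the message is empty or its first line is a known filler word."""
    stripped = message.strip()
    if not stripped:
        return True
    first_line = stripped.splitlines()[0]
    return first_line.lower().rstrip(".") in UNINFORMATIVE


for rating, message in EXAMPLES.items():
    print(f"{rating:>13}: {message!r} filler={looks_uninformative(message)}")
```

The check only catches the worst case. Telling vague from acceptable from good depends on whether the message says what changed and why, which a word list cannot judge.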

The score

Your total score is the average of your bug/criterion scores, scaled to 100. A perfect score means every item was addressed correctly. Suboptimal fixes count for partial credit.
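
As a worked example of that average, the sketch below scores four rubric items. The weights are an assumption: the text only says suboptimal fixes earn partial credit, so the 0.5 used here is a placeholder, not FullStack's actual value.

```python
from typing import List

# Assumed weights. Only "partial credit" for suboptimal is stated;
# the exact 0.5 is a placeholder for illustration.
WEIGHTS = {"met": 1.0, "suboptimal": 0.5, "missed": 0.0}


def total_score(statuses: List[str]) -> float:
    """Average the per-item weights and scale the result to 100."""
    if not statuses:
        raise ValueError("at least one rubric item is required")
    return 100 * sum(WEIGHTS[s] for s in statuses) / len(statuses)


# Two items met, one suboptimal, one missed: (1 + 1 + 0.5 + 0) / 4 = 0.625
print(total_score(["met", "met", "suboptimal", "missed"]))  # 62.5
```

Commit ratings are deliberately absent from the calculation, since commit quality is reported separately and does not change the total.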

The score breakdown shows each bug or criterion individually: whether you met it, the approach you took, and the evaluator's reasoning. There's no black box. Every point deduction comes with an explanation.

For hiring orgs, the score is one signal among several. The commit feedback, the approach taken on each item, and the overall summary give reviewers enough context to make a real judgment, not just rank by a number.

Ready to try it?

Hire with FullStack →