How are Contractor ToolStack ratings calculated?

Every product is scored on category-specific dimensions, each with a published weight; the weights within a category add up to 100%. Inside a single category, the weighted score is the sum of each dimension score multiplied by its weight. For products that span multiple categories, the top-line rating is 70% of the primary category's weighted score plus 30% of the average of the secondary categories' weighted scores, plus a flat +0.20 calibration constant that aligns our ratings with the Capterra and G2 4.4–4.5 average band, capped at 5.0. Single-category products use the weighted score plus 0.20, with the same 5.0 cap.

Why do Contractor ToolStack ratings often run lower than Capterra or G2?

Three reasons. Capterra and G2 cluster almost every mature B2B SaaS product between 4.3 and 4.8 because reviewers self-select toward enthusiasts and vendors solicit reviews from happy customers. Our methodology scores seven to nine category-specific dimensions and weights them, so weakness in any single dimension drags the weighted total down. And we score against forward-looking dimensions like AI capabilities and integration depth, which a platform's current customers may not yet notice are missing.

Are editorial scores blended with community votes?

No. Editorial scores and community scores are computed and displayed independently. The editorial score is the weighted-dimension result from our review. The community score is built from verified contractor votes weighted at 70% plus anonymous voter input at 30%. Both are shown side by side on every product page so readers see the gap between professional review and contractor consensus, never a blended number.

What is the +0.20 calibration constant?

After the weighted score is computed, we add a flat +0.20 (capped at 5.0) so our top-line ratings sit in the same 4.4–4.5 band buyers are already familiar with from Capterra and G2. We score conservatively against published feature claims, real contractor complaints, and editorial benchmarks rather than vendor self-reports, which placed us about 0.2 below the aggregator band by design. The calibration brings the absolute number into alignment without changing the relative ordering of products. Adopted April 26, 2026; every product's rating was recomputed and re-published when the change shipped.

How does Contractor ToolStack verify a contractor's license for community votes?

Verification has two gates. First, contractors confirm their email via a magic link — this alone flags their votes at the verified tier (70% weight in the community score) and auto-promotes any prior anonymous votes from the same browser. Second, Steven (the editor) manually verifies their license number against their state's contractor licensing board record — usually within 24 hours. License verification unlocks public quote attribution (their name, business, state, trade, and license number appear next to their quote). The fastest path through manual review is verifying with a public business email that matches the license, or making sure the public business phone number is reachable.

What can change a Contractor ToolStack rating?

Pricing changes, new features shipping, features being removed, ownership or roadmap changes, support quality changes documented in five or more recent reviews, major bug patterns, dimension-weight adjustments during quarterly reviews, and quarterly score recalibration. Vendor displeasure, threats of legal action, advertising spend, partnership offers, marketing collateral, and unshipped feature announcements cannot change a rating.

How We Review Contractor Software (2026): Methodology & Ratings

Our goal is simple: give contractors honest, useful information so they pick the right tools. Here's exactly how we score products, where the numbers come from, what can move them, and how to flag a mistake.

How Editorial Scores Work

We don't grade every product on the same checklist. A CRM and an AI call-answering service do completely different jobs, so scoring them on the same five dimensions would be lazy and misleading. Instead, every product is scored on category-specific dimensions — the things that actually matter for that type of tool — and each dimension is weighted by how important it is to contractors specifically, not to software buyers in general.

Inside a single category, the rating is a weighted average of the dimension scores. "Contractor Fit" or "Trade Specialization" usually carries the heaviest weight, because a polished tool that doesn't understand how trades work is a tool that wastes your time.

The Categories We Score

Each of these has its own published dimension framework — click any category to see the products scored against those dimensions, plus the full weighted breakdown.

Inside a Category: AI Call Answering

Here's a real one. AI call-answering services are evaluated on these 7 dimensions, with weights that add up to 100%. Contractor Fit carries the heaviest weight because a service built for dentists that also markets to plumbers has no idea what an HVAC emergency call actually sounds like.

AI Call Answering Dimensions

7 weighted criteria · totaling 100%

Contractor Fit

How well the service understands contractor workflows, trade terminology, seasonal call patterns, and job types

20% weight

Voice Quality

How natural and human-like the AI voice sounds to callers — response latency, tone, and conversational flow

15% weight

Integrations & CRM

Native connections to contractor CRMs and field service tools like ServiceTitan, Housecall Pro, Jobber, and JobNimbus

15% weight

Value for Money

Pricing transparency, cost-per-call economics, overage charges, and ROI for a typical contractor call volume

15% weight

Agentic AI Compatibility

Public API access, webhook support, and ability to plug into custom AI agent workflows, MCP servers, and automation platforms

15% weight

Emergency Handling

Ability to detect urgent calls — burst pipes, gas leaks, no heat — and route them to the right person immediately

10% weight

Lead Capture

Quality of intake forms, caller information capture, lead scoring, and how well data flows into your systems

10% weight

Where the weights come from: we set them based on operational experience running contractor businesses, patterns in customer-review data across G2, Capterra, and Reddit, and the actual cost of getting each dimension wrong. Vendor input does not influence weights. Affiliate revenue does not influence weights. Weights are reviewed quarterly and published openly — if we change them, we say so and recalculate every affected rating.
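The weighted average described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the weights are copied from the AI Call Answering framework above, while the function name `weighted_score` and the example dimension scores are hypothetical.

```python
# Weights copied from the AI Call Answering dimension list above.
AI_CALL_ANSWERING_WEIGHTS = {
    "Contractor Fit": 0.20,
    "Voice Quality": 0.15,
    "Integrations & CRM": 0.15,
    "Value for Money": 0.15,
    "Agentic AI Compatibility": 0.15,
    "Emergency Handling": 0.10,
    "Lead Capture": 0.10,
}


def weighted_score(scores, weights=AI_CALL_ANSWERING_WEIGHTS):
    """Return the weighted average of per-dimension scores (1-5 scale)."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must add up to 100%")
    return sum(scores[name] * weight for name, weight in weights.items())


# Hypothetical scores for an imaginary product, for illustration only.
example_scores = {name: 3.0 for name in AI_CALL_ANSWERING_WEIGHTS}
example_scores["Contractor Fit"] = 5.0

print(round(weighted_score(example_scores), 2))
```

With these made-up inputs, a perfect Contractor Fit score and 3.0 everywhere else comes out at about 3.4, which shows how one strong dimension cannot carry a product on its own.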

When a Product Spans Multiple Categories

Some products serve more than one category. HubSpot is a CRM and a marketing automation platform. Thryv is a CRM and a reputation tool. JobNimbus is a CRM, a project management platform, an estimating tool, and a scheduler. These products get scored separately on each category's dimensions, and the full breakdown is visible on the review page.

The top-line rating is built from those per-category weighted scores using a 70/30 primary-vs-secondary formula, plus a flat +0.20 calibration constant capped at 5.0. The math is public, the formula is reproducible, and every component score shows up on the review page.

The Multi-Category Formula

Top-line rating, in three parts

Primary category, weighted at 70%

The first category in the product's category list: the lane the product positions itself around and competes hardest in. For HubSpot, Thryv, and JobNimbus, that's CRM.

Average of secondary categories, weighted at 30%

Every other scored category gets averaged together first, then that average contributes 30% to the top-line. A great CRM doesn't suddenly become a worse product just because we also score it as a marketing tool — but the secondary categories still pull the rating in their direction.

Calibration constant: add 0.20, cap at 5.0

After the weighted score is computed, we add a flat +0.20 so our top-line ratings sit in the same 4.4–4.5 band buyers already see on Capterra and G2. We score conservatively against published feature claims and editorial benchmarks rather than vendor self-reports — calibration brings the absolute number into alignment without changing the relative ordering of products.

Single-category products skip the secondary-category part. Their weighted score plus 0.20, capped at 5.0, is the top-line rating.
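Written as a formula, the three parts above combine as follows. The symbols are notation introduced here for illustration, not the site's own: P is the primary category's weighted score, S_1 through S_n are the secondary categories' weighted scores, and R is the top-line rating before rounding to one decimal place.

```latex
\[
R = \min\!\left(5.0,\; 0.70\,P + 0.30\,\bar{S} + 0.20\right),
\qquad
\bar{S} = \frac{1}{n}\sum_{i=1}^{n} S_i
\]
\[
R_{\text{single}} = \min\!\left(5.0,\; P + 0.20\right)
\]
```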

Worked Example · HubSpot

A multi-category product with one weak score and one strong one

HubSpot is scored in two categories. As a contractor CRM it earns a 3.03 weighted score — no job scheduling, no dispatch, no trade workflows. As a marketing automation platform it earns a 3.81 — genuinely strong as an MA tool. CRM is its primary category, because that's how HubSpot positions itself and where most contractor buyers land when they compare it.

(0.70 × 3.03) + (0.30 × 3.81) + 0.20 = 3.46 → rounds to 3.5

The CRM weakness drives the result because that's the lane HubSpot competes in for contractor buyers. The stronger marketing-automation score still pulls the rating up — secondary categories contribute, just less than the primary. The top-line you see on every page that mentions HubSpot is 3.5, the same number the formula returns.
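The HubSpot arithmetic above can be reproduced with a short Python sketch. The function name `top_line_rating` is hypothetical, and the half-up rounding mode is an assumption: the methodology only says ratings round to one decimal place. `Decimal` is used so that the rounding step is not affected by binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

PRIMARY_WEIGHT = Decimal("0.70")
SECONDARY_WEIGHT = Decimal("0.30")
CALIBRATION = Decimal("0.20")
CAP = Decimal("5.0")


def top_line_rating(primary, secondaries=()):
    """Combine per-category weighted scores into one top-line rating."""
    primary = Decimal(str(primary))
    if secondaries:
        # Secondary categories are averaged first, then weighted at 30%.
        secondary_avg = sum(Decimal(str(s)) for s in secondaries) / len(secondaries)
        weighted = PRIMARY_WEIGHT * primary + SECONDARY_WEIGHT * secondary_avg
    else:
        # Single-category products use the weighted score as-is.
        weighted = primary
    calibrated = min(weighted + CALIBRATION, CAP)
    # Assumption: round half up to one decimal place.
    return float(calibrated.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


# HubSpot: CRM (primary) 3.03, marketing automation (secondary) 3.81.
print(top_line_rating(3.03, [3.81]))
```

Run with the HubSpot scores from the worked example, the sketch returns 3.5, the same top-line rating shown above.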

Why this formula instead of an average?

We used to take a straight average across categories. That punished strong products dual-listed into adjacent categories where they're naturally weaker, and it rewarded marketing positioning over buyer accuracy. The 70/30 split keeps the primary category dominant — that's where the product earns its market position — while letting secondary scores meaningfully contribute. Updated April 26, 2026; ratings on every multi-category product were recomputed and re-published when the change shipped, and the same release added the +0.20 calibration constant.

One rating, everywhere. The number you see on a product card on a category hub, on the review page, on a comparison, in a roundup — it's always the same top-line. The per-category breakdown is still visible on every review page, so you can see exactly where a multi-category product is strong and where it's weak. We don't show different numbers in different contexts because that confuses readers and breaks trust.

How Community Scores Work

The editorial score is one tradesman's evaluation against published criteria. The community score is the rest of the field weighing in — and we deliberately give more weight to the contractors who can prove they're contractors.

Two voter tiers

Anyone visiting the site can vote. Contractors who complete our verification flow have their votes weighted at the verified tier the moment they confirm their email address. Past anonymous votes from the same browser are promoted automatically when they verify — both at email confirmation and at license approval — so a contractor's earlier opinions still count once they prove who they are.

The two-gate verification process (why "verified" actually means something)

Verification has two separate gates, and they unlock different things. We split them deliberately because they answer different trust questions.

How Verification Works

Email gate → vote weight. License gate → public quote attribution.

Email verification (automatic, ~60 seconds)

A contractor submits the verification form, gets a magic link in their email, and clicks it. Once that magic link is clicked, their votes count at full verified weight (70% of the combined score) immediately and any prior anonymous votes from the same browser are promoted automatically.

License verification (manual, usually within 24 hours)

A real person — Steven, the editor — looks up their license number against their state's contractor licensing board. Once that's confirmed, any quotes they've left on votes appear publicly with their name, business, state, trade, and license attribution. Until license verification is complete, quotes are held privately on the vote.

Fastest paths through license verification

Two things speed up the manual license review meaningfully:

Verify with a public business email that matches the license. If you submit with an email address that appears on your public business profile (Google Business, state board listing, your own contracting website) — and that profile shares the license number you entered — we can confirm the match in minutes rather than chasing down a phone confirmation. This is the fastest path.
Make sure your public business phone number is reachable. When the email-to-license match isn't clean, we'll call the number listed for your business on your state board record or Google Business Profile. A quick "yes, I submitted that verification" is all it takes.

We chose the dual-gate model on purpose: publishing a license number publicly next to a quote is a real trust claim, and an email-only verification floor doesn't carry that weight. The faster paths above let real contractors move through quickly while still gating the part that matters for everyone reading the site.
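One way to picture the two gates is as two independent flags on a voter record, each unlocking a different thing. The sketch below is illustrative only: the `Voter` class and both function names are hypothetical, and it assumes a quote goes public only when both gates have been passed.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    """Hypothetical voter record with one flag per verification gate."""
    email_confirmed: bool = False    # gate 1: magic link clicked
    license_approved: bool = False   # gate 2: manual state-board lookup


def vote_tier(voter):
    """Email confirmation alone moves votes to the verified tier."""
    return "verified" if voter.email_confirmed else "anonymous"


def quote_is_public(voter):
    """Quotes stay private on the vote until the license is approved."""
    return voter.email_confirmed and voter.license_approved


pending = Voter(email_confirmed=True)
print(vote_tier(pending), quote_is_public(pending))
```

A contractor who has clicked the magic link but is still waiting on the manual review votes at the verified tier while any quotes remain private.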

How Community Votes Are Weighted

Verified contractors carry more weight — about 2.3× an anonymous vote

Verified contractors

70% weight

Contractors who completed our verification flow, with licenses checked against state contractor board records. These are the people running the tool every day.

Anonymous voters

30% weight

Anyone with a browser. One vote per product per category — cookie deduped so the same person can't stuff a ballot.

What you see on a verified contractor's vote

Every license-verified vote carries full attribution: the contractor's first name and last initial, business name, state, trade, and license number. We show enough to make the vote meaningful and verifiable, and we never expose anything beyond what the contractor consented to during verification.

What if no contractor has voted yet?

Cold-start handling is honest: if a product has no verified votes, we show the anonymous score directly and label it that way, with no fake confidence and no padding. Once verified votes come in, the score switches to the 70/30 weighting. In the rare case that a product has verified votes but no anonymous ones, we show the verified score directly.
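The 70/30 blend and its cold-start fallbacks can be sketched as below. This is an illustration under stated assumptions, not the site's implementation: the function name `community_score` and the labels are hypothetical, and each tier's score is assumed to be a plain mean of its votes, which the methodology does not spell out.

```python
from statistics import mean


def community_score(verified_votes, anonymous_votes):
    """Return (score, label) for one product in one category."""
    if not verified_votes and not anonymous_votes:
        return None, "no votes yet"
    if not verified_votes:
        # Cold start: show the anonymous score directly, labeled as such.
        return mean(anonymous_votes), "anonymous only"
    if not anonymous_votes:
        # Rare case: verified votes exist but no anonymous ones.
        return mean(verified_votes), "verified only"
    blended = 0.70 * mean(verified_votes) + 0.30 * mean(anonymous_votes)
    return blended, "blended 70/30"


# Hypothetical votes on a 1-5 scale.
score, label = community_score([4.0, 4.0], [3.0, 3.0])
print(round(score, 2), label)
```

Returning a label alongside the number keeps the display honest about which votes produced the score.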

Vote scoping (per dimension, per category)

Every vote is tied to a specific product and a specific category. A contractor voting on JobNimbus as a CRM and a contractor voting on JobNimbus as an estimating tool aren't getting averaged into the same number — they're scoring two different jobs the platform does. When we roll up to a top-line community score for a multi-category product, we use the same 70/30 primary-vs-secondary blend the editorial cascade uses. Community scores do not get the +0.20 calibration constant; they're already grounded in real-world signal.

Why We Keep the Two Apart

It would be easier to mash the editorial score and the community score into one number. We don't.

The editorial score is a structured evaluation against published criteria. The community score is real-world signal from the people running the tool. When the two agree, that's a strong indicator. When they disagree, that's the most useful information on the page — and a blended average would erase it.

Real Examples · What the Gap Tells You

Editorial and community ratings rarely match exactly — that's the point

Editorial high · Community low

A polished platform with strong feature depth that real users find painful in production. The editorial review caught what's on the box; the community caught what's in the boxes you don't open until month two.

Editorial low · Community high

A scrappy tool that misses dimensions in our framework but absolutely nails the trade workflow it's built for. Worth paying attention to — that's a product the spec sheet underrates and the contractors using it understand.

Both high · or both low

Strongest signal you'll see. The editorial review and the contractor consensus are pointing in the same direction — pick accordingly.

Why Our Scores Often Run Lower Than Capterra and G2

Compare a Contractor ToolStack rating to the same product on Capterra or G2 and ours will frequently land 0.5 to 1.0 points lower. This is structural, not random — and it's deliberate.

Reason 1

Aggregator rating inflation

On Capterra and G2, 4.5 is the median in B2B SaaS. Almost every mature product clusters between 4.3 and 4.8. Reviewers self-select toward enthusiasts, vendors solicit reviews from happy customers (sometimes with gift cards), and unhappy customers churn instead of writing reviews. We use the full 1–5 scale honestly.

Reason 2

More granular methodology

Capterra reports four sub-scores that all cluster at 4.5+. We score 7 to 9 category-specific weighted dimensions. Weakness in any one of them drags the total down meaningfully — even when overall sentiment is high. A platform with strong ease of use but missing AI lands lower in our framework than in Capterra's, because AI is weighted in every category we score.

Reason 3

We score against where the category is heading

Capterra reviewers rate present sufficiency for their existing workflow. We rate against forward-looking dimensions like AI capabilities and integration depth. A platform with thin AI scores low in our framework even if its current customers don't notice the gap yet — we want our ratings to age well into 2027.

The pattern matches how independent reviewers like Wirecutter, NerdWallet, and Consumer Reports score products compared with vendor-influenced platforms. A 4.0 from us means more than a 4.5 elsewhere, and that's deliberate.

Hands-On vs. Research-Based Reviews

Not every review is created equal, and you deserve to know the difference. We use a two-tier system, and every review is clearly labeled in its frontmatter so you know what you're getting.

Hands-On

Products I use daily across my businesses. First-person experience — real screenshots, actual workflow examples, opinions formed from months or years of daily use. When we say JobNimbus handles insurance restoration well, it's because we've processed hundreds of claims through it.

Research-Based

Products evaluated through official documentation, user reviews on G2, Capterra, and Reddit, video demos, and industry research. Thorough, but we haven't logged in and used the software day-to-day. Clearly labeled so you know the source.

We're working toward making every review hands-on. In the meantime, we'd rather give you a well-researched review now than make you wait six months for first-hand experience with every tool.

A note on dimension scoring

The hands-on label changes the depth of color in the prose, the inclusion of personal workflow examples, and the confidence in describing daily-use friction. The dimension-by-dimension scoring uses the same framework either way — same dimensions, same weights, same source rules. Hands-on doesn't earn a rating bump, and research-based isn't penalized. The difference is editorial texture, not numerical weighting.

The 1–5 Scale and What Each Tier Means

Ratings round to one decimal place. We don't inflate scores. A 4.0 from us means something.

Rating Tiers

From "best in category" to "we'd steer you elsewhere"

4.5 – 5.0

Gold

Exceptional

Best in category. We'd recommend it to almost any contractor in the right trade.

4.0 – 4.4

Silver

Very Good

Recommended for most contractors. Strong product with minor trade-offs.

3.5 – 3.9

Bronze

Good

Solid choice with some limitations. Works well for specific use cases.

3.0 – 3.4

Average

Meaningful Gaps

Works but has meaningful gaps. There are probably better options.

Below 3.0

Skip

Not Recommended

Significant issues. We'd steer you toward alternatives.
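The tier table above maps cleanly onto a threshold lookup. The sketch below is a minimal illustration with a hypothetical function name, `rating_tier`; the thresholds and labels are taken from the tiers listed above, and the input is assumed to be a top-line rating on the 1–5 scale.

```python
# (floor, badge, label) pairs from the tier list above, highest first.
TIERS = [
    (4.5, "Gold", "Exceptional"),
    (4.0, "Silver", "Very Good"),
    (3.5, "Bronze", "Good"),
    (3.0, "Average", "Meaningful Gaps"),
]


def rating_tier(rating):
    """Return the (badge, label) pair for a top-line rating."""
    rating = round(rating, 1)  # ratings are published to one decimal place
    for floor, badge, label in TIERS:
        if rating >= floor:
            return badge, label
    return "Skip", "Not Recommended"


print(rating_tier(3.5))
```

HubSpot's 3.5 from the worked example lands in the Bronze tier under this lookup.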

Where Our Data Comes From

Every rating, pricing claim, feature description, and quoted review traces back to a verifiable source. We cite sources inline in every review and we'll happily walk through any individual claim with a vendor or a reader.

Official Vendor Sources

Vendor pricing pages (verified at publication)
Vendor product / feature pages
Vendor blog posts and press releases
Vendor YouTube channels and demo videos
Knowledge bases and documentation
Earnings calls and SEC filings (where applicable)

Independent Review Platforms

G2 (review counts, ratings, verbatim quotes)
Capterra and Software Advice
TrustRadius
GetApp
Apple App Store and Google Play Store ratings
Better Business Bureau profiles

Community Sources

Reddit (r/Construction, r/Roofing, r/Contractor, r/HVAC, trade subs)
ContractorTalk and other industry forums
Trade publications (Roofing Contractor, Pro Builder, Construction Dive, JLC Online)
YouTube reviews by named contractors
LinkedIn posts from leadership and named industry experts

Verification Rules

Pricing verified directly on the vendor's pricing page (third-party aggregator pricing is treated as stale until confirmed)
Every quoted customer review is attributed by username and source link
Marketing claims are flagged as marketing claims when used at all
Unverified data is labeled "unverified" rather than presented as fact

What Can and Can't Change a Rating

We update ratings during quarterly content reviews and whenever a product ships a meaningful change. Vendors and readers can submit information that triggers a re-evaluation — but the rules for what does and doesn't move a number are public and consistent.

Can change a rating

Pricing changes (tier additions, increases, transparency improvements)
New features shipping (AI launches, integrations added, modules released)
Features being removed or sunset (QuickBooks Desktop discontinuation, deprecated tiers)
Support quality changes documented in 5+ recent reviews
Ownership, leadership, or roadmap changes that affect product direction
Major bug patterns or platform stability issues documented across reviews
Score recalibration during quarterly content reviews
Dimension-weight adjustments (with all affected ratings recalculated and the change documented publicly)

Cannot change a rating

Vendor displeasure with the rating or with editorial framing
Threats of legal action
Advertising spend, sponsorship offers, or affiliate commission rates
Partnership offers or co-marketing proposals
Vendor-supplied "corrections" to editorial judgment (we own that)
Marketing collateral, sales decks, or claim sheets
Press releases announcing future features that haven't shipped
Comparisons to competitors as a basis for adjusting our score

Correction Policy

Mistakes happen. Pricing pages change. Features ship that we missed. Integrations get added or removed. Factual corrections are a normal part of running the site — here's the process.

How to Submit a Correction

Email the correction to info@contractortoolstack.com with the URL of the page in question, the specific claim or number you're disputing, and a verifiable source link supporting the correct information.
We acknowledge within 5 business days with a yes/no on whether the correction qualifies as factual (per the rules above). If we need more information, we'll ask.
Factual corrections are applied within one quarterly review cycle (so within ~90 days at the latest, often faster for time-sensitive items like pricing changes that already shipped).
Editorial-judgment disputes are not corrections. If you disagree with a rating, a section title, an opinion, or a comparative framing, that's editorial judgment and we don't change it based on vendor preference. We'll read the message and consider whether it raises a factual issue underneath the disagreement — if it does, that part gets the factual-correction treatment.
We publish a changelog when reviews update with material rating changes. The original publish date stays the same; the "Updated" date reflects the most recent revision.

Readers (not just vendors) can submit corrections through the same email if they spot stale pricing, missing features, or misattributed quotes — we're equally responsive to either source.

Our Affiliate Disclosure

Some links on this site are affiliate links. If you sign up through our link, we may earn a commission at no extra cost to you. This never influences our ratings or recommendations. We review software whether or not there's an affiliate program.

Plenty of products we recommend have no affiliate program at all. We still review them because the point of this site is helping contractors find the right tools — not maximizing our commissions.

Every page with affiliate links includes a disclosure at the top. No hiding it in the footer. No fine print.
