Estimation Methodology

Accurate estimation enables predictable delivery and sustainable development pace. YeboLearn uses story points with velocity tracking to forecast releases and manage commitments.

Story Point Fundamentals

What Story Points Measure

Story points represent:

  • Complexity of the work
  • Amount of work required
  • Uncertainty and risk
  • Learning curve

Story points do NOT represent:

  • Hours or days (not time)
  • Individual developer speed
  • Perfect accuracy (estimates are ranges)

Why Story Points vs Hours?

  • Abstract relative sizing is easier
  • Accounts for complexity, not just time
  • Less pressure than hour estimates
  • Team-based, not individual
  • Velocity stabilizes over time

Fibonacci Scale

1, 2, 3, 5, 8, 13, 21

Why Fibonacci?
- Forces meaningful distinctions
- Reflects increasing uncertainty
- Prevents false precision (no "7.5 points")

Estimation Scale Reference

1 Point: Trivial

Time: 1-2 hours
Complexity: Very simple
Uncertainty: None
Examples:
- Fix typo in error message
- Update text on button
- Add simple validation
- Change constant value

Technical:
- No tests needed beyond smoke test
- No database changes
- No API changes
- Straightforward implementation

2 Points: Simple

Time: Half day
Complexity: Straightforward
Uncertainty: Minimal
Examples:
- Add new API endpoint (CRUD)
- Create basic form component
- Add email validation
- Update existing UI element

Technical:
- Basic unit tests needed
- Standard implementation pattern
- May need database migration
- Clear requirements

3 Points: Moderate

Time: 1 day
Complexity: Some complexity
Uncertainty: Low to medium
Examples:
- Build quiz submission flow
- Integrate third-party API (well-documented)
- Add search functionality
- Create reusable component with variants

Technical:
- Multiple unit tests needed
- Integration test recommended
- Some edge cases to handle
- May touch multiple files/modules

5 Points: Complex

Time: 2-3 days
Complexity: Significant
Uncertainty: Medium
Examples:
- Implement AI quiz generator
- Build student dashboard with analytics
- Create payment processing flow
- Refactor authentication system

Technical:
- Comprehensive tests required
- Multiple components/modules
- External dependencies
- Performance considerations
- Error handling important

8 Points: Very Complex

Time: 1 week
Complexity: High
Uncertainty: High
Examples:
- AI essay grading engine
- Offline mode implementation
- Real-time collaboration features
- Major database schema refactor

Technical:
- Extensive testing needed
- Complex state management
- Multiple integration points
- Significant edge cases
- Performance critical
- Consider breaking the story down

13 Points: Epic (Split Required)

Time: 2 weeks or more
Complexity: Very high
Uncertainty: Very high
Action: Break into smaller stories

Example: "Build complete M-Pesa integration"
Should split into:
- API endpoint setup (3 pts)
- Payment initiation (5 pts)
- Webhook handling (5 pts)
- Error recovery (3 pts)
- Testing & documentation (3 pts)
Total: 19 points across 5 stories (splitting typically surfaces hidden work, so the sum exceeds the original 13)

Estimation Process

Planning Poker

How It Works:

  1. Story Presentation (2 min)

    • Product owner reads story
    • Explains user value and context
    • Shows designs if applicable
  2. Clarification (3 min)

    • Team asks questions
    • Technical approach discussed
    • Dependencies identified
    • Edge cases surfaced
  3. Private Estimation

    • Each team member selects card privately
    • No discussion during selection
    • Prevents anchoring bias
  4. Simultaneous Reveal

    • Everyone shows card at once
    • Range of estimates visible
  5. Discussion (5 min)

    • Highest and lowest explain reasoning
    • Different perspectives shared
    • Hidden complexity surfaced
    • Risks and assumptions discussed
  6. Re-estimate

    • Second round of estimation
    • Usually converges
    • If still divergent, more discussion
  7. Consensus

    • Team agrees on final estimate
    • Record in story
    • Move to next story

Example Session:

Story: AI generates personalized quiz from student's notes

Product Owner: Students can upload notes, AI creates a 10-question quiz
tailored to their learning gaps based on past performance.

Developer 1: Do we need to store the notes or is this ephemeral?
PO: Store for future quiz variations.

Developer 2: What AI model? Gemini?
PO: Yes, Gemini API.

Developer 3: How do we identify learning gaps?
PO: From quiz history and progress data we already track.

[Team estimates privately]

Reveal: 3, 5, 5, 8

Developer 1 (3 points): I thought this was just API integration,
we have similar quiz generation already.

Developer 4 (8 points): But we need to build the note parsing,
learning gap analysis, and personalization logic. Plus file uploads.

Discussion: Ah, the learning gap analysis is new complexity.
File upload is straightforward. Note parsing might be tricky.

[Team re-estimates]

Reveal: 5, 5, 5, 5

Consensus: 5 story points
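
Convergence is judged informally, but the rule of thumb is easy to make concrete. A minimal sketch in Python of the two rounds above; the `has_converged` helper and its "same or adjacent Fibonacci value" rule are illustrative assumptions, not a formal team policy:

```python
# Minimal sketch: treat a round as converged when all revealed
# estimates fall on the same or adjacent Fibonacci scale values.
FIB_SCALE = [1, 2, 3, 5, 8, 13, 21]

def has_converged(estimates: list[int]) -> bool:
    """True if the revealed estimates span at most two adjacent scale values."""
    positions = sorted(FIB_SCALE.index(e) for e in estimates)
    return positions[-1] - positions[0] <= 1

print(has_converged([3, 5, 5, 8]))  # False -> discuss, then re-estimate
print(has_converged([5, 5, 5, 5]))  # True  -> consensus: 5 points
```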

Estimation Calibration

Reference Stories (Historical Examples):

These actual YeboLearn stories serve as benchmarks:

1 Point Reference:

Story: Fix login redirect after signup
What: Redirect users to dashboard instead of landing page after signup
Why: One-line route change, tested manually
Actual Effort: 1.5 hours

2 Points Reference:

Story: Add email validation to registration
What: Validate email format and uniqueness before allowing signup
Why: Standard validation, database check, error handling
Actual Effort: 4 hours

3 Points Reference:

Story: Create quiz results summary component
What: Display score, correct/incorrect answers, time taken
Why: React component with multiple sub-components, state management
Actual Effort: 6 hours

5 Points Reference:

Story: Implement M-Pesa payment webhook
What: Receive payment callbacks, update database, send confirmation
Why: External API integration, error handling, idempotency, testing
Actual Effort: 2 days

8 Points Reference:

Story: Build AI quiz generation API
What: Accept topic, generate questions using Gemini, store and return
Why: Complex AI integration, prompt engineering, rate limiting, caching
Actual Effort: 5 days

Calibration Exercise:

Before each estimation session, the team reviews:
"This story feels similar to [reference story], which was [X points]."

Example:
"This payment refund feature is similar to the M-Pesa webhook we did,
so probably 5 points like that one."

Velocity Tracking

Calculating Velocity

Sprint Velocity = Sum of completed story points

Sprint 25:
Completed Stories:
✓ Essay grading UI (9 pts)
✓ AI grading backend (13 pts)
✓ Teacher dashboard (5 pts)
✓ Performance optimization (5 pts)
✓ Bug fixes (3 pts)

Total: 35 story points

Incomplete:
✗ Analytics improvements (3 pts) - carried to next sprint

Sprint 25 Velocity: 35 points

Important: Only count completed stories (done = in production)
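
A minimal sketch of the calculation, mirroring the Sprint 25 list above; the `done` flag stands in for "deployed to production":

```python
# Minimal sketch: sprint velocity = sum of points for stories that are
# fully done. Incomplete stories carry over and count toward the sprint
# in which they actually finish.
stories = [
    ("Essay grading UI", 9, True),
    ("AI grading backend", 13, True),
    ("Teacher dashboard", 5, True),
    ("Performance optimization", 5, True),
    ("Bug fixes", 3, True),
    ("Analytics improvements", 3, False),  # carried to next sprint
]

velocity = sum(points for _, points, done in stories if done)
print(velocity)  # 35
```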

Rolling Average Velocity

Last 6 Sprints:
Sprint 20: 34 points
Sprint 21: 38 points
Sprint 22: 32 points (holiday week)
Sprint 23: 40 points
Sprint 24: 36 points
Sprint 25: 35 points

Simple Average: 35.8 points
Median: 35.5 points

Use for Planning: 32-36 points (conservative)
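
A minimal sketch of these statistics; deriving the conservative bound as "mean minus one standard deviation, floored" is one plausible reading of the 32-36 range, not a documented rule:

```python
import math
from statistics import mean, median, stdev

velocities = [34, 38, 32, 40, 36, 35]  # Sprints 20-25

print(round(mean(velocities), 1))  # 35.8
print(median(velocities))          # 35.5

# One way to derive the conservative planning range quoted above:
# mean minus one standard deviation, floored, up to the rounded mean.
low = math.floor(mean(velocities) - stdev(velocities))
high = round(mean(velocities))
print(f"Plan for {low}-{high} points")  # Plan for 32-36 points
```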

Why Rolling Average?

  • Smooths out variance
  • Accounts for team changes
  • Adapts to skill improvement
  • More reliable than single sprint

Healthy Trends:

Increasing Velocity (Gradual):
30 → 32 → 35 → 36 → 38 → 40

Reasons:
✓ Team gaining experience
✓ Better tooling/automation
✓ Reduced context switching
✓ Improved estimation accuracy
✓ Less technical debt

Action: Great! Maintain quality standards.

Stable Velocity:
35 → 36 → 34 → 35 → 36 → 35

Reasons:
✓ Mature team
✓ Consistent estimation
✓ Predictable capacity

Action: Excellent! Reliable delivery.

Concerning Trends:

Decreasing Velocity:
40 → 38 → 35 → 32 → 28 → 25

Reasons:
⚠️ Accumulating technical debt
⚠️ Increased production support
⚠️ Team turnover
⚠️ Over-commitment leading to burnout
⚠️ More complex features

Action: Investigate root cause, address issues.

Erratic Velocity:
25 → 45 → 30 → 50 → 20 → 40

Reasons:
⚠️ Inconsistent estimation
⚠️ Poor sprint planning
⚠️ Frequent scope changes
⚠️ External dependencies

Action: Improve estimation, stabilize process.

Velocity Analysis

Sprint Retrospective Velocity Review:

Sprint 25 Velocity Analysis

Committed: 32 points
Delivered: 35 points
Achievement: 109%

Breakdown:
Planned Work: 32 points (100% completed)
Stretch Goals: 3 points (completed)
Unplanned Work: 0 points

Why We Over-Delivered:
✓ Stretch goals were well-estimated
✓ No production incidents
✓ Fast PR review cycle
✓ Good pair programming on complex work

Lessons:
• We can handle stretch goals reliably
• Consider committing to 34-36 points next sprint
• Keep pair programming on 8+ point stories

Capacity Planning

Team Capacity Calculation

Available Hours Per Sprint:

Developer: 80 hours per 2-week sprint
├─ Development: 56 hours (70%)
├─ Meetings: 8 hours (10%)
├─ Code Reviews: 8 hours (10%)
├─ Learning/Exploration: 4 hours (5%)
└─ Unexpected Issues: 4 hours (5%)

Effective Development: ~56 hours/developer/sprint

Team of 4 Developers:

Total Capacity: 4 × 56 = 224 hours

Story Points/Hour (Historical):
224 hours ÷ 35 points = 6.4 hours/point

This varies by complexity:
- 1 point ≈ 2 hours
- 2 points ≈ 4 hours
- 3 points ≈ 7 hours
- 5 points ≈ 16 hours
- 8 points ≈ 32 hours

Planning Capacity:

Team Velocity: 35 points/sprint
Safety Buffer: 10%

Sprint Commitment: 32 points
Stretch Goals: 6 points

If everything goes perfectly: 38 points
Typical delivery: 32-35 points
Conservative: 28-32 points
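
A minimal sketch tying these numbers together; the 70% development share and 10% buffer are the figures stated above:

```python
# Minimal sketch of the capacity math above; the development share and
# safety buffer are the assumptions stated in this section.
DEVELOPERS = 4
HOURS_PER_SPRINT = 80   # per developer, per 2-week sprint
DEV_SHARE = 0.70        # 56 development hours per developer

dev_hours = DEVELOPERS * HOURS_PER_SPRINT * DEV_SHARE
print(dev_hours)        # 224.0 effective development hours

velocity = 35           # historical points/sprint
print(round(dev_hours / velocity, 1))  # 6.4 hours/point on average

commitment = round(velocity * 0.9)     # 10% safety buffer
print(commitment)       # 32 points committed
```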

Adjusting for Factors

Holidays and PTO:

Normal Sprint: 35 points capacity

Sprint with 1 Developer on PTO (25% team):
Reduced Capacity: 35 × 0.75 = 26 points

Sprint with Christmas Holiday:
Reduced Capacity: 35 × 0.6 = 21 points

Action: Plan accordingly, under-commit

New Team Members:

First Sprint: 50% productive (learning, onboarding)
Second Sprint: 70% productive
Third Sprint: 90% productive
Fourth Sprint+: 100% productive

Team of 3 + 1 New Developer (assuming 12 points per fully productive developer):
Sprint 1: (3 × 12) + (1 × 6) = 42 points
Sprint 2: (3 × 12) + (1 × 8) = 44 points
Sprint 3: (3 × 12) + (1 × 11) = 47 points
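
A minimal sketch of both adjustments; the ramp-up percentages and the 12 points per fully productive developer are the assumptions used in the examples above:

```python
# Minimal sketch of the capacity adjustments above.
def adjusted_velocity(base: float, availability: float) -> int:
    """Scale team velocity by the fraction of the team available."""
    return round(base * availability)

print(adjusted_velocity(35, 0.75))  # 26 - one of four devs on PTO

RAMP = [0.5, 0.7, 0.9, 1.0]  # new-hire productivity by sprint
POINTS_PER_DEV = 12          # assumed full-productivity output

for sprint, factor in enumerate(RAMP, start=1):
    capacity = 3 * POINTS_PER_DEV + round(POINTS_PER_DEV * factor)
    print(f"Sprint {sprint}: {capacity} points")
# Sprint 1: 42, Sprint 2: 44, Sprint 3: 47, Sprint 4: 48
```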

Major Production Incidents:

Historical Impact:
Average: 0.5 incidents per sprint
Average Resolution: 8 hours (about 1 point at the 6.4 hours/point average)

Capacity Planning: Built into velocity average
If incident-free sprint: Deliver stretch goals

Release Forecasting

Forecasting Methodology

Simple Forecast:

Feature Size: 42 story points
Team Velocity: 35 points/sprint
Sprints Needed: 42 ÷ 35 = 1.2 sprints

Forecast: 2 sprints (round up for safety)
Timeline: 4 weeks

Confidence Intervals:

Based on velocity variance:

Optimistic (90th percentile): 40 points/sprint
→ 42 ÷ 40 = 1.05 sprints (~2 weeks)

Most Likely (median): 35 points/sprint
→ 42 ÷ 35 = 1.2 sprints (~2.5 weeks, call it 3)

Pessimistic (10th percentile): 28 points/sprint
→ 42 ÷ 28 = 1.5 sprints (round up to 2 full sprints = 4 weeks)

Communicate: "Between 2 and 4 weeks, most likely 3 weeks"
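
A minimal sketch of the three scenarios, assuming 2-week sprints; round the raw week counts up before committing to dates:

```python
# Minimal sketch of the percentile-based forecast above.
feature_points = 42
scenarios = [("Optimistic", 40), ("Most likely", 35), ("Pessimistic", 28)]

for name, velocity in scenarios:
    sprints = feature_points / velocity
    print(f"{name}: {sprints:.2f} sprints = {sprints * 2:.1f} weeks")
# Optimistic: 1.05 sprints = 2.1 weeks
# Most likely: 1.20 sprints = 2.4 weeks
# Pessimistic: 1.50 sprints = 3.0 weeks
```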

Multi-Feature Roadmap

Q1 2026 Roadmap Forecast:

Features Planned:
1. AI Essay Grading - 42 points
2. M-Pesa Integration - 26 points
3. Offline Mode - 34 points
4. WhatsApp Notifications - 18 points
Total: 120 points

Team Velocity: 35 points/sprint
Sprints Available: Q1 = 6 sprints
Total Capacity: 6 × 35 = 210 points

Buffer (20%): 210 × 0.8 = 168 points available

Feasibility: 120 points < 168 points ✓

Forecast:
Sprint 26-27: AI Essay Grading (42 pts)
Sprint 28: M-Pesa Integration (26 pts)
Sprint 29-30: Offline Mode (34 pts)
Sprint 31: WhatsApp Notifications (18 pts)

Remaining Capacity: 48 points
Use for: Bug fixes, performance, technical debt
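
A minimal sketch of the feasibility check; feature names and point totals come from the roadmap above, and the 20% buffer is as stated:

```python
# Minimal sketch of the Q1 roadmap feasibility check above.
features = {
    "AI Essay Grading": 42,
    "M-Pesa Integration": 26,
    "Offline Mode": 34,
    "WhatsApp Notifications": 18,
}

velocity, sprints = 35, 6
capacity = velocity * sprints  # 210 points
usable = capacity * 0.8        # 168 points after the 20% buffer

total = sum(features.values())  # 120
print(total <= usable)          # True -> roadmap is feasible
print(usable - total)           # 48.0 points left for bugs and debt
```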

Dependency Considerations:

If features depend on each other:

M-Pesa Integration (26 pts) must complete before
WhatsApp Payment Notifications (8 pts)

Total Sequential: 34 points = 1 sprint

Timeline Impact:
- Parallel: Faster delivery
- Sequential: Longer timeline
- Plan sprints accordingly

Estimation Best Practices

Do's

Estimate as a Team:

  • Multiple perspectives surface hidden complexity
  • Shared understanding of work
  • Better accuracy than individual estimates

Use Historical Data:

  • Reference similar past stories
  • Calibrate against known benchmarks
  • Learn from estimation accuracy

Re-estimate if Needed:

  • If story changes significantly, re-estimate
  • If actual effort differs greatly, analyze why
  • Update reference stories

Account for Uncertainty:

  • Higher points for higher uncertainty
  • Create spike stories for unknowns
  • Build buffer into risky work

Focus on Relative Sizing:

  • "Is this bigger or smaller than X?"
  • Don't agonize over perfect precision
  • Estimates are ranges, not commitments

Don'ts

Don't Convert to Hours:

  • Story points are abstract, keep them that way
  • Conversion creates false precision
  • Focus on relative complexity

Don't Estimate Individual Tasks:

  • Estimate user stories, not technical tasks
  • Task breakdown happens during sprint
  • Too granular = wasted effort

Don't Pad Estimates:

  • Trust the velocity to account for unknowns
  • Padding leads to inflated points
  • Use story points honestly

Don't Compare Developers:

  • Velocity is team metric, not individual
  • The same story gets the same points regardless of who implements it
  • Focus on team improvement

Don't Ignore Outliers:

  • If an 8-point story took three weeks, understand why
  • Update estimation approach if needed
  • Learn from variance

Common Estimation Scenarios

Scenario 1: Unknown Technology

Story: Integrate WhatsApp Business API

Team: "We've never used WhatsApp API before."

Approach:
1. Create spike story (time-boxed: 1 day)
   → Research API, build proof-of-concept
   → Estimate remaining work after spike

2. Or estimate with high uncertainty
   → 8 points (vs 5 if familiar)
   → Account for learning curve
   → Pair programming to share knowledge

Scenario 2: Vague Requirements

Story: "Improve student dashboard"

Team: "This is too vague to estimate."

Approach:
1. Refuse to estimate until clarified
2. Work with product owner to define specifics
3. Break down into concrete stories:
   - Add quiz completion chart (3 pts)
   - Show recent activity feed (3 pts)
   - Display upcoming assignments (2 pts)
Total: 8 points (now possible to estimate)

Scenario 3: Dependency Uncertainty

Story: Launch M-Pesa integration

Team: "Depends on M-Pesa approval, outside our control."

Approach:
1. Estimate work we control (integration code)
2. Flag dependency in story
3. Don't commit to sprint until dependency resolved
4. Have backup work ready if blocked

Estimation Metrics

Estimation Accuracy

Track Accuracy Over Time:

Sprint 25:
Story: AI Grading Backend
Estimate: 13 points
Actual: completed within the sprint (estimate accurate)

Story: Teacher Dashboard
Estimate: 5 points
Actual: effort closer to 8 points (5 ÷ 8 ≈ 63% accurate, underestimated)

Average Accuracy: 85%
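
One simple way to turn estimate-versus-actual into a percentage is the ratio of the smaller value to the larger; this metric is an illustrative assumption, not a YeboLearn standard:

```python
# Minimal sketch: accuracy as smaller / larger of estimate vs. actual
# (1.0 = perfect; under- and over-estimates score symmetrically).
def accuracy(estimated: float, actual: float) -> float:
    return min(estimated, actual) / max(estimated, actual)

print(accuracy(13, 13))  # 1.0   - AI grading backend, spot on
print(accuracy(5, 8))    # 0.625 - teacher dashboard, underestimated
```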

Improving Accuracy:

  • Analyze under/over-estimated stories
  • Update reference examples
  • Improve breakdown of complex work
  • Better upfront clarification

Velocity Predictability

Standard Deviation of Velocity:
Sprint 20-25: 34, 38, 32, 40, 36, 35
Average: 35.8
Std Dev: 2.9 (low variance = predictable)

Target: Std Dev < 5 points (good predictability)
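
A minimal sketch of the predictability check, using the sample standard deviation (which matches the 2.9 quoted):

```python
from statistics import mean, stdev

velocities = [34, 38, 32, 40, 36, 35]  # Sprints 20-25

print(round(mean(velocities), 1))  # 35.8
sd = stdev(velocities)             # sample standard deviation
print(round(sd, 1))                # 2.9
print("predictable" if sd < 5 else "investigate")  # predictable
```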

YeboLearn - Empowering African Education