Three years ago, I approved a pull request that cost my company $47,000 in lost revenue over a single weekend. The code looked fine. The tests passed. The logic was sound. But I missed something subtle in how we handled timezone conversions for our European customers, and when daylight saving time kicked in, our payment processing system started silently failing for 18% of our user base.
That incident changed how I review code forever. I'm Sarah Chen, and I've been a senior engineering manager at three different SaaS companies over the past decade, reviewing an average of 15-20 pull requests per week. That's roughly 8,000 PRs in my career, ranging from single-line hotfixes to massive architectural refactors spanning 50+ files. I've seen brilliant code that shipped bugs, and messy code that ran flawlessly in production for years.
What I've learned is that code review isn't really about finding bugs—though that's important. It's about building systems that survive contact with reality. It's about creating code that the next developer (often yourself, six months later) can understand and modify without fear. And it's about catching the invisible problems that only reveal themselves when your application scales, when edge cases emerge, or when that one customer in Australia does something you never anticipated.
This is my checklist. Not the theoretical one from textbooks, but the battle-tested one I've refined through countless production incidents, late-night debugging sessions, and that sinking feeling when you realize a bug you approved is now affecting real users.
The First 30 Seconds: What the PR Description Tells Me
Before I look at a single line of code, I spend 30 seconds reading the PR description. This might seem trivial, but a poorly written description is often the first red flag that the code itself might have issues. In my experience, developers who can't clearly articulate what their code does often haven't fully thought through the implementation.
I'm looking for three specific things: the problem being solved, the approach taken, and any trade-offs made. A good PR description reads like a mini design document. "Fixed bug in user authentication" tells me nothing. "Fixed race condition in JWT token refresh that caused 0.3% of users to be logged out unexpectedly during high-traffic periods by implementing a distributed lock using Redis" tells me the developer understands the problem deeply.
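A distributed lock like the one that description mentions might look something like this sketch. This is not the author's actual implementation; the key names, TTL, and the in-memory `FakeRedis` stand-in are illustrative assumptions (the real thing would use a Redis client such as redis-py, whose `set(nx=True, ex=...)` call has the same semantics):

```python
import uuid

def acquire_lock(client, key, ttl_seconds=10):
    """Try to take a distributed lock; return a token on success, None otherwise.

    `client` is any Redis-like object supporting SET NX EX semantics.
    """
    token = str(uuid.uuid4())
    # SET key token NX EX ttl: succeeds only if the key doesn't already exist,
    # and the TTL guarantees the lock self-releases if the holder crashes.
    if client.set(key, token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(client, key, token):
    """Release only if we still hold the lock (our token matches)."""
    if client.get(key) == token:
        client.delete(key)

class FakeRedis:
    """In-memory stand-in so the sketch runs without a server (TTL ignored)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return False
        self._data[key] = value
        return True
    def get(self, key):
        return self._data.get(key)
    def delete(self, key):
        self._data.pop(key, None)
```

The token check in `release_lock` matters: without it, a slow process whose lock already expired could delete a lock now held by someone else.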
I also check if the PR is appropriately sized. My rule of thumb: if a PR touches more than 400 lines of code (excluding tests and generated files), it's probably too large to review effectively. Research from Google's engineering practices suggests that review effectiveness drops dramatically after 200-400 lines. I've found this to be accurate—after reviewing about 300 lines, my attention starts to waver, and I begin missing subtle issues.
When I encounter a massive PR, I ask the developer to break it into smaller, logical chunks. Yes, this takes more time upfront, but it prevents the scenario where I rubber-stamp a 2,000-line PR because I'm overwhelmed and just want to get through it. I've approved too many problematic PRs this way early in my career.
The description should also link to relevant context: the ticket or issue being addressed, any design documents, and related PRs. If I have to hunt through Jira or Slack to understand why this change is being made, that's friction that slows down the entire review process. At my current company, we've made it a policy that PRs without proper context don't get reviewed until the description is updated. This simple rule has improved our review quality measurably.
The Logic Layer: Does This Actually Solve the Problem?
Once I understand what the PR is supposed to do, I verify that it actually does it. This sounds obvious, but you'd be surprised how often code passes tests yet doesn't fully address the underlying issue. I've seen developers fix the symptom while leaving the root cause untouched, creating a time bomb that will explode later.

I start by mentally walking through the happy path. If this is a new feature for processing refunds, I imagine a customer requesting a refund and trace the code path from the API endpoint through the business logic to the database update. Does each step make sense? Are we handling the data correctly? Are we updating all the necessary records?
But the happy path is easy. What separates good code from production-ready code is how it handles the unhappy paths. What happens if the payment gateway is down? What if the refund amount is negative? What if the user requests a refund for an order that's already been refunded? I've learned to be almost paranoid about edge cases because production environments are chaos engines that will find every edge case you didn't consider.
I look for defensive programming patterns: null checks, input validation, boundary condition handling. But I also look for over-engineering. Not every function needs to handle every possible edge case. If a private method is only called from one place with validated input, adding redundant validation is just noise. The key is understanding the contract of each function and ensuring it's upheld at the boundaries.
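As a minimal sketch of what that boundary validation might look like for the refund example, here is one hypothetical guard function; the `order` dict shape and field names are assumptions, not a real API:

```python
class RefundError(Exception):
    """Raised when a refund request fails validation."""

def validate_refund_request(order, amount):
    """Reject the unhappy paths before touching any payment state.

    `order` is assumed to be a dict with 'total' and 'refunded' keys.
    """
    if amount <= 0:
        raise RefundError("refund amount must be positive")
    if order["refunded"]:
        raise RefundError("order has already been refunded")
    if amount > order["total"]:
        raise RefundError("refund amount exceeds order total")
```

Checks like these live at the public boundary; a private helper called only with already-validated input would not need to repeat them.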
One pattern I've seen cause problems repeatedly is the "optimistic update." A developer assumes an operation will succeed and updates the UI or database state before confirming success. This works 99% of the time, but that 1% creates incredibly confusing bugs where the system state becomes inconsistent. I always flag these patterns and ask: what happens if this operation fails halfway through?
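The fix for that pattern is usually just ordering: confirm the external operation before mutating local state. A hedged sketch, where `gateway.refund` is a hypothetical call returning True on success:

```python
def process_refund(order, gateway, amount):
    """Confirm the external operation before mutating local state.

    The anti-pattern would be setting order["refunded"] = True *before*
    calling the gateway, leaving state inconsistent on failure.
    """
    ok = gateway.refund(order["id"], amount)
    if not ok:
        # Local state never changed, so there is nothing to unwind.
        return False
    order["refunded"] = True
    return True
```

When the local update genuinely must happen first (for UI responsiveness, say), the review question becomes: where is the compensating action that reverts it on failure?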
The Data Layer: Following the Information Flow
Data bugs are the worst kind of bugs because they're often silent and cumulative. A logic error might crash your application immediately, but a data error might corrupt your database slowly over weeks, and by the time you notice, you have thousands of bad records and no clean way to fix them.
| Review Focus | Junior Reviewer | Senior Reviewer | Impact on Production |
|---|---|---|---|
| Syntax & Style | Primary focus | Automated/minimal attention | Low |
| Edge Cases | Obvious scenarios only | Timezone, locale, scale issues | High |
| Test Coverage | Tests exist and pass | Tests cover failure modes | Critical |
| Performance | Code works correctly | Behavior under load/scale | High |
| Documentation | Comments present | Why decisions were made | Medium |
When reviewing code that touches data, I trace the entire lifecycle of that data. Where does it come from? How is it validated? How is it transformed? Where is it stored? Who can access it? This is especially critical for user-generated content or financial data where errors have real consequences.
I pay particular attention to data type conversions. I once approved code that converted a price from a float to an integer for a calculation, then back to a float. Seemed harmless. But due to floating-point precision issues, prices like $10.99 would occasionally become $10.98 or $11.00 after the round trip. We didn't catch it until customers started complaining about incorrect charges. Now I scrutinize every type conversion, especially involving money or measurements.
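One common way to avoid that class of bug in Python is to keep money in `Decimal` or integer cents and never route it through binary floats. A sketch (the function names are mine, not from the incident):

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats can't represent most decimal fractions exactly, which is
# why float round-trips on prices occasionally drift by a cent.
assert 0.1 + 0.2 != 0.3  # classic symptom of the same precision issue

def to_cents(price_str):
    """Parse a price from its string form straight into integer cents."""
    cents = (Decimal(price_str) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)

def from_cents(cents):
    """Format integer cents back to a display price without float math."""
    return str((Decimal(cents) / 100).quantize(Decimal("0.01")))
```

The key design choice is parsing from the string representation, so the value never exists as a float at any point in the pipeline.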
Database migrations are another area where I'm extra cautious. I look for several things: Is the migration reversible? What happens if it fails halfway through? How long will it take on our production database? I've seen migrations that worked fine on a development database with 1,000 records but locked tables for 20 minutes on a production database with 50 million records, taking down the entire application.
For any code that modifies existing data, I ask: is there a backup plan? Can we roll this back if something goes wrong? At my previous company, we had a policy that any data migration affecting more than 10,000 records required a dry-run on a production snapshot and an explicit rollback procedure documented in the PR. This saved us multiple times.
I also look at data access patterns. Is this code doing N+1 queries? Is it loading entire collections into memory when it only needs a few fields? These performance issues might not be obvious in development but become critical at scale. I once reviewed code that seemed fine but would have caused our database CPU to spike to 100% if it had shipped to production, simply because it was making 50 separate queries instead of one join.
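The N+1 shape is easy to show with a toy schema (SQLite here purely for a self-contained example; the tables and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

def totals_n_plus_one(conn):
    """One query for the users, then one query *per user*: the N+1 pattern."""
    totals = {}
    for user_id, name in conn.execute("SELECT id, name FROM users"):
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        totals[name] = row[0]
    return totals

def totals_single_query(conn):
    """The same result in one round trip using a JOIN and GROUP BY."""
    return dict(conn.execute("""
        SELECT u.name, COALESCE(SUM(o.total), 0)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    """))
```

With two users the difference is invisible; with fifty thousand, the first version issues fifty thousand and one queries, which is exactly the kind of thing that only hurts at scale.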
The Security Lens: Thinking Like an Attacker
Security review isn't just about finding SQL injection vulnerabilities—though I do look for those. It's about thinking through how a malicious user might abuse the system. After dealing with two security incidents in my career (one minor, one that made the news), I've become much more paranoid about security implications.
The first thing I check: does this code handle user input? If yes, is it validated and sanitized? I don't trust any input, even from authenticated users. I've seen too many cases where a developer assumed "only admins can access this endpoint" and skipped validation, only to have that assumption violated later when the authorization logic changed.
I look for common vulnerability patterns: SQL injection, XSS, CSRF, insecure deserialization, path traversal. But I also look for business logic vulnerabilities that automated scanners miss. Can a user manipulate their account balance by sending negative numbers? Can they access other users' data by changing an ID in the URL? Can they bypass rate limiting by making requests from multiple IPs?
Authentication and authorization logic gets extra scrutiny. I've seen developers implement authentication correctly but forget authorization, allowing any logged-in user to access any resource. I always ask: who should be able to perform this action, and how are we enforcing that? The code should fail closed—if there's any doubt about permissions, deny access.
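"Fail closed" translates directly into code shape: every allow path is explicit, and the function's final answer is deny. A hypothetical sketch, with invented dict fields and roles:

```python
def can_issue_refund(user, order):
    """Deny by default: every allow path is explicit, anything else is a no.

    `user` and `order` are assumed dicts; the roles and fields are illustrative.
    """
    if user.get("role") == "admin":
        return True
    # Owners may refund their own orders only.
    if user.get("id") is not None and user.get("id") == order.get("owner_id"):
        return True
    return False  # fail closed: unknown roles and missing fields are denied
```

Note that a missing `id` or `owner_id` falls through to deny rather than accidentally matching `None == None`, which is the kind of subtle allow-by-accident bug this structure guards against.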
I also think about information disclosure. Does this error message reveal too much about the system? Does this API endpoint return more data than necessary? I once caught a PR that would have exposed user email addresses in a public API response because the developer was returning the entire user object instead of just the fields needed for the UI.
Secrets management is another critical area. I look for hardcoded credentials, API keys in code, or sensitive data in logs. Even if the repository is private, secrets in code are a ticking time bomb. I've seen developers accidentally commit AWS credentials, and within hours, their account was compromised and used for cryptocurrency mining, racking up thousands in charges.
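The baseline pattern I expect to see instead of hardcoded credentials is reading from the environment and failing fast when the value is absent. A minimal sketch (the variable name is illustrative):

```python
import os

def require_secret(name):
    """Read a secret from the environment; fail fast if it's missing.

    Keeps credentials out of the repository entirely, and a loud startup
    failure beats a confusing runtime one three calls deep.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value
```

In production this is usually backed by a secrets manager that injects the environment; the code-review point is simply that the literal value never appears in the diff.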
The Testing Strategy: Beyond Green Checkmarks
Tests passing is necessary but not sufficient. I've approved PRs with 100% test coverage that still had critical bugs because the tests weren't testing the right things. Good tests verify behavior, not implementation. They test edge cases, not just the happy path. And they're maintainable—they don't break every time you refactor.
I look at the test coverage report, but I don't obsess over the percentage. I've seen codebases with 95% coverage that were full of bugs, and codebases with 60% coverage that were rock solid. What matters is whether the critical paths are tested and whether the tests would actually catch regressions.
For each new feature or bug fix, I ask: how would I know if this broke? Is there a test that would fail? If the answer is no, I request tests. But I'm specific about what needs testing. "Add tests" is too vague. "Add a test that verifies the refund amount is correctly calculated when there's a partial return with tax" is actionable.
I pay attention to test quality. Are the tests readable? Do they have clear arrange-act-assert structure? Are they testing one thing at a time? I've seen tests that are so complex they need their own tests. If I can't understand what a test is verifying within 30 seconds, it's probably too complicated.
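Here is roughly what I mean by a readable arrange-act-assert test, using the partial-return-with-tax example from above. Both the helper and its numbers are hypothetical; amounts are in integer cents to sidestep float issues:

```python
def refund_cents(unit_price_cents, qty_returned, tax_rate_pct):
    """Hypothetical helper: refund in integer cents for a partial return."""
    subtotal = unit_price_cents * qty_returned
    return subtotal * (100 + tax_rate_pct) // 100

def test_partial_return_includes_tax():
    # Arrange: a customer returns 2 of 5 items priced at $10.00, with 8% tax.
    unit_price_cents, qty_returned, tax_rate_pct = 1000, 2, 8
    # Act: compute the refund.
    amount = refund_cents(unit_price_cents, qty_returned, tax_rate_pct)
    # Assert: 2 * $10.00 * 1.08 = $21.60, i.e. 2160 cents.
    assert amount == 2160
```

One behavior, three labeled phases, and a name that states the scenario: anyone can tell within seconds what breaks if this test fails.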
Integration tests are where I focus most of my attention because that's where the real bugs hide. Unit tests are great for verifying individual functions, but most production bugs occur at the boundaries between systems. Does this code correctly handle a timeout from the payment gateway? Does it properly handle a database connection failure? These scenarios are hard to unit test but critical to verify.
I also look for test data quality. Are the tests using realistic data? I've seen tests pass with perfect, sanitized test data but fail in production with messy real-world data—names with apostrophes, addresses with special characters, dates in unexpected formats. Good tests use data that resembles what you'll see in production.
The Performance Perspective: Will This Scale?
Performance issues are insidious because they often don't appear until you're at scale, and by then, they're much harder to fix. I've learned to think about performance implications during code review, not after the application is slow in production.
I look for obvious performance anti-patterns: N+1 queries, loading entire collections when you only need a subset, doing expensive operations in loops, blocking operations on the main thread. But I also think about algorithmic complexity. Is this O(n²) algorithm acceptable for the expected data size? If we have 100 items, probably. If we might have 100,000 items, definitely not.
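The O(n²)-versus-O(n) distinction is often a one-line structural change, as in this toy duplicate check (my own example, not from a reviewed PR):

```python
def has_duplicates_quadratic(items):
    """O(n^2): compares every pair. Fine for 100 items, painful for 100,000."""
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if a == b:
                return True
    return False

def has_duplicates_linear(items):
    """O(n): one constant-time set membership check per item."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

The review question isn't "is this the fastest possible code" but "does the complexity survive the data size we actually expect", which is why the expected n belongs in the PR description.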
Caching is a double-edged sword. I look for opportunities where caching would help, but I also look for caching that's too aggressive or doesn't have proper invalidation. I've seen caching bugs that were incredibly hard to debug because the system was serving stale data, but only sometimes, depending on cache hit rates and timing.
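When I review caching code, the two things I look for are an expiry and an explicit invalidation path. A minimal sketch of both, with an injectable clock so the behavior is testable without sleeping (the class and its API are illustrative, not a real library):

```python
import time

class TTLCache:
    """Minimal cache with expiry and explicit invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for tests
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: never serve stale data
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        # Call this whenever the underlying data changes.
        self._store.pop(key, None)
```

The review question for any cache is simply: who calls `invalidate`, and is it called on every write path that changes the cached data?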
Database queries get special attention. I mentally estimate how many rows each query will touch and how often it will run. A query that scans a million rows might be fine if it runs once a day in a background job, but it's a disaster if it runs on every page load. I look for missing indexes, inefficient joins, and queries that could be optimized.
I also think about resource usage beyond just CPU and database. Is this code holding connections open longer than necessary? Is it loading large files into memory? Is it making synchronous calls to external services that could timeout? These issues might not show up in development but become critical in production under load.
For any code that processes data in bulk, I ask: what happens if we need to process 10x more data? 100x? Is there a way to batch or stream this instead of loading everything into memory? I've seen applications crash because a background job tried to load a million records into memory at once.
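Batching usually reduces to a small generator like this sketch, which works equally well over a list or a streaming database cursor because it never materializes the whole input:

```python
def in_batches(iterable, batch_size):
    """Yield fixed-size chunks so we never hold the full dataset in memory."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

A job that previously loaded a million records at once becomes `for batch in in_batches(cursor, 1000): process(batch)`, with memory bounded by the batch size rather than the dataset.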
The Maintainability Factor: Code is Read More Than Written
Code is written once but read dozens or hundreds of times over its lifetime. Maintainability isn't about following style guides religiously—it's about making code that future developers (including yourself) can understand and modify confidently.
I look for clear naming. Variable and function names should reveal intent. If I need to read the implementation to understand what a function does, the name isn't good enough. "processData" tells me nothing. "calculateMonthlyRecurringRevenue" tells me exactly what it does. I've found that unclear naming is often a sign that the developer hasn't fully thought through what the code is doing.
Comments are tricky. Too few comments and the code is hard to understand. Too many comments and they become noise that gets out of sync with the code. I look for comments that explain why, not what. "Loop through users" is obvious from the code. "We process users in batches of 100 to avoid overwhelming the email service" explains the reasoning behind a decision that might not be obvious.
I pay attention to function length and complexity. If a function is more than 50 lines or has more than 3 levels of nesting, it's probably doing too much and should be broken up. But I don't enforce this rigidly—sometimes a long function is the clearest way to express an algorithm. The key is whether the function has a single, clear purpose.
Code duplication is another area I watch carefully. A little duplication is fine—sometimes it's clearer than creating an abstraction. But if I see the same logic repeated three or more times, that's a sign it should be extracted into a shared function. The challenge is finding the right level of abstraction—too specific and it's not reusable, too generic and it becomes a confusing mess of parameters.
I also think about error handling and logging. Are errors handled appropriately? Are we logging enough information to debug issues in production, but not so much that we're logging sensitive data or creating noise? Good error messages are specific and actionable. "Error processing payment" is useless. "Failed to charge card ending in 1234: insufficient funds" tells you exactly what went wrong.
The Deployment Considerations: Getting Code to Production Safely
Code review doesn't end with approving the PR. I think about how this code will be deployed and what could go wrong during that process. Some of my worst production incidents came from code that worked perfectly but broke during deployment due to issues I didn't consider during review.
For any code that changes database schema, I verify there's a migration strategy. Can we deploy this without downtime? What happens to in-flight requests during the deployment? If we need to roll back, will the old code work with the new schema? These questions have saved me from several near-disasters.
I look for feature flags or gradual rollout strategies for risky changes. If this code touches a critical path like payment processing or authentication, I want a way to disable it quickly if something goes wrong. I've learned that even the most thoroughly tested code can have issues in production, and having a kill switch is invaluable.
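In its simplest form, a kill switch is just a flag read from configuration at the decision point. A sketch using environment variables (real systems often use a flag service so toggling doesn't require a redeploy; the naming convention here is an assumption):

```python
import os

def flag_enabled(name, default=False):
    """Read a feature flag / kill switch from the environment.

    Accepts common truthy spellings; anything else (or unset) falls back
    to the default, which should be the safe behavior.
    """
    raw = os.environ.get(f"FEATURE_{name}")
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")
```

Usage at the call site is a single branch, e.g. `if flag_enabled("NEW_REFUND_FLOW"): ...`, and turning the feature off is a config change rather than a revert-and-redeploy.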
Configuration changes get extra scrutiny. Is this new configuration value documented? What's the default? What happens if it's misconfigured? I've seen outages caused by a single misconfigured environment variable that wasn't caught until production because it worked fine with the default value in development.
I also think about monitoring and observability. How will we know if this code is working correctly in production? Are there metrics we should be tracking? Are there alerts we should set up? Code that goes to production without proper monitoring is a black box—you won't know there's a problem until users start complaining.
For any code that affects user-facing behavior, I ask: how will we communicate this change? Do we need to update documentation? Do we need to notify users? I've seen features ship that confused users because nobody thought about the communication aspect during development.
The Human Element: Building Better Teams Through Better Reviews
After 10 years of code reviews, I've learned that the technical aspects are only half the story. How you deliver feedback matters as much as what feedback you give. A technically perfect review that demoralizes the developer is a failure. A review that catches bugs while helping the developer grow is a success.
I've developed a few principles for giving feedback. First, I assume good intent. The developer isn't trying to write bad code—they might be missing context, facing time pressure, or simply approaching the problem differently than I would. Starting from a place of curiosity rather than criticism changes the entire tone of the review.
I distinguish between blocking issues and suggestions. Blocking issues are things that must be fixed before merging: security vulnerabilities, critical bugs, violations of architectural principles. Suggestions are improvements that would be nice but aren't essential: better variable names, additional tests, performance optimizations that aren't critical yet. I'm explicit about which is which.
I try to explain my reasoning, especially when asking for changes. "This won't work" is frustrating. "This could cause a race condition if two users submit the form simultaneously because we're not using a transaction here" helps the developer understand the issue and learn for next time. I've found that developers are much more receptive to feedback when they understand the why behind it.
I also look for opportunities to give positive feedback. If the developer handled a tricky edge case well, I say so. If the code is particularly clean and readable, I acknowledge it. Code review shouldn't just be about finding problems—it's also about reinforcing good practices and building confidence.
For junior developers, I'm more hands-on with my reviews. I might suggest specific implementations or point them to relevant documentation. For senior developers, I ask more questions and trust their judgment more. The goal is to calibrate my review style to help each developer grow.
I've also learned to pick my battles. Not every PR needs to be perfect. If the code works, is reasonably maintainable, and doesn't introduce security or performance issues, sometimes it's better to approve it with minor suggestions rather than block it for stylistic preferences. Perfect is the enemy of shipped.
Finally, I've learned that code review is a two-way street. I encourage developers to push back on my feedback if they disagree. Some of my best learning moments have come from developers explaining why my suggested approach wouldn't work or showing me a better way to solve the problem. The best code reviews are collaborative discussions, not one-way critiques.
That $47,000 incident I mentioned at the beginning? It taught me that code review is about more than just reading code—it's about thinking through all the ways code can fail in production, considering the broader system context, and building a culture where catching issues before they reach users is everyone's responsibility. My checklist has evolved over 10 years and 8,000 PRs, and it will keep evolving. But these principles—understanding the problem, following the data, thinking about security and performance, ensuring maintainability, planning for deployment, and treating developers with respect—have served me well across three companies and countless production systems.
The best code review is the one that catches the bug you didn't know you had, teaches the developer something new, and ships better software to users. That's what I'm looking for in every PR I review.