Database Design Mistakes I Made So You Don't Have To — TXT1.ai

March 2026 · 15 min read · 3,636 words · Last updated: March 31, 2026

Three years ago, I watched our startup's database grind to a halt at 2:47 AM on Black Friday. We had 50,000 concurrent users, $2 million in pending transactions, and a query that was taking 45 seconds to return product availability. Our CTO was screaming into Slack. Our investors were calling. And I was staring at a schema I'd designed six months earlier, realizing that every "clever" decision I'd made was now costing us roughly $8,000 per minute in lost revenue.

💡 Key Takeaways

  • The "I'll Just Use UUIDs Everywhere" Disaster
  • Premature Normalization: When Third Normal Form Becomes Your Enemy
  • The NULL Nightmare: When Optional Becomes Impossible
  • Index Overload: When More Isn't Better

I'm Marcus Chen, and I've spent the last twelve years as a database architect, working with everyone from scrappy SaaS startups to Fortune 500 enterprises. I've designed schemas for systems handling 500 million daily transactions, optimized queries that shaved 200ms off critical paths, and yes—I've made virtually every database design mistake in the book. That Black Friday incident? It taught me more about database design than my entire computer science degree.

Today, I'm the Principal Database Architect at TXT1.ai, where we process over 3 billion text messages annually across our AI-powered communication platform. But I got here by failing forward, and I want to share the expensive lessons I learned so you can skip the 2 AM panic attacks and angry investor calls.

The "I'll Just Use UUIDs Everywhere" Disaster

Let me start with what I call my $40,000 mistake. In 2019, I was designing a customer relationship management system for a mid-sized e-commerce company. I'd just read a blog post about how UUIDs were the "modern" way to handle primary keys—no more auto-increment integers, no more sequential ID exposure, perfect for distributed systems. I was sold.

So I made every single primary key in the system a UUID. Users table? UUID. Orders table? UUID. Order line items? You guessed it—UUID. I felt like a genius. The schema looked clean, there were no sequential ID vulnerabilities, and I could generate IDs client-side if needed. What could go wrong?

Everything. Absolutely everything went wrong.

Within six months, our database size had ballooned to 340GB when it should have been around 180GB. Query performance was degrading week over week. Our index sizes were massive—the orders table index alone was 12GB. Joins between orders and line items that should have taken 50ms were taking 800ms. The database was spending enormous amounts of time on disk I/O, and our AWS RDS costs had nearly doubled.

Here's what I learned the hard way: UUIDs are 128 bits (16 bytes), four times the size of a 4-byte integer and twice the size of an 8-byte bigint. That overhead repeats in every secondary index and every foreign key that references the table. But the real killer isn't the storage—it's the index fragmentation. UUIDs are random, which means every insert lands at a random position in your B-tree indexes. With sequential integers, new rows append to the end of the index. With random UUIDs, the database is constantly splitting pages and shuffling the index structure.

We measured the impact: inserting 100,000 rows with integer IDs took 8 seconds. The same operation with UUIDs took 34 seconds. That's a 4.25x performance penalty just from the primary key choice. When you're processing 50,000 orders per day, that adds up fast.
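You can reproduce the shape of this effect yourself. This is a minimal sketch using SQLite's in-memory engine rather than the production RDS setup we measured, so the absolute numbers will differ, but the mechanism is the same: a sequential `INTEGER PRIMARY KEY` appends to the key B-tree, while a random UUID `TEXT PRIMARY KEY` forces scattered index writes.

```python
import sqlite3
import time
import uuid

conn = sqlite3.connect(":memory:")
# Sequential integer key: new rows append to the end of the key B-tree.
conn.execute("CREATE TABLE seq_keys (id INTEGER PRIMARY KEY, payload TEXT)")
# Random UUID key: every insert lands at a random position in the index.
conn.execute("CREATE TABLE uuid_keys (id TEXT PRIMARY KEY, payload TEXT)")

N = 100_000

t0 = time.perf_counter()
conn.executemany(
    "INSERT INTO seq_keys VALUES (?, ?)",
    ((i, "x") for i in range(N)),
)
seq_time = time.perf_counter() - t0

t0 = time.perf_counter()
conn.executemany(
    "INSERT INTO uuid_keys VALUES (?, ?)",
    ((str(uuid.uuid4()), "x") for _ in range(N)),
)
uuid_time = time.perf_counter() - t0

print(f"sequential: {seq_time:.2f}s  uuid: {uuid_time:.2f}s")
```

On a server-grade database with tables too large to fit in cache, the gap widens further, because random index writes turn into random disk I/O.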

The fix cost us three weeks of development time and required a carefully orchestrated migration during a maintenance window. We moved to bigint primary keys for high-volume tables and kept UUIDs only for tables where we genuinely needed globally unique identifiers across distributed systems—which turned out to be exactly two tables out of forty-seven.

My rule now: Use auto-incrementing integers or bigints for primary keys unless you have a specific, documented reason to use UUIDs. And "it seems more modern" is not a documented reason.

Premature Normalization: When Third Normal Form Becomes Your Enemy

Fresh out of university, I was obsessed with normalization. I'd memorized all the normal forms, could recite Codd's rules in my sleep, and believed that a properly normalized database was the pinnacle of design excellence. So when I designed my first production system—a content management platform—I normalized everything to third normal form and beyond.

"Every 'clever' database decision you make today is a potential 2 AM crisis six months from now. Design for the system you'll have, not the system you want."

I had a users table, a user_addresses table (because users might have multiple addresses), a user_phone_numbers table (multiple phones!), a user_preferences table, a user_settings table, and a user_metadata table. Loading a single user's profile required joining six tables. I was so proud of how "clean" it all looked.

Then we launched. The user profile page—the most frequently accessed page in the entire application—was taking 1.2 seconds to load. We were doing six joins for every single page view, and with 10,000 daily active users, that meant 60,000 joins per day just for profile views. The database CPU was constantly above 70%.

The wake-up call came when our lead developer pulled me aside and showed me the query execution plan. "Marcus," he said, "we're joining six tables to display a user's name, email, and phone number. This is insane." He was right. I'd optimized for theoretical purity instead of practical performance.

We denormalized strategically. The user's primary address went back into the users table. Their primary phone number? Same thing. We kept the separate tables for additional addresses and phone numbers, but 94% of our users only had one of each. That single change reduced our average profile page query time from 1.2 seconds to 180ms—an 85% improvement.

Here's what I learned: Normalization is a tool, not a religion. Third normal form is a great starting point, but real-world performance often requires strategic denormalization. Now I follow what I call the "80/20 denormalization rule"—if 80% of queries need data from multiple tables, that data probably belongs in one table. I measure query patterns in production and denormalize based on actual usage, not theoretical purity.

The key is knowing when to denormalize. High-read, low-write tables are perfect candidates. User profiles, product catalogs, configuration data—these are all great places to denormalize. Transaction tables with high write volumes? Keep those normalized to avoid update anomalies.
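Here's a toy illustration of that trade-off (SQLite standing in for the real database, with made-up table and column names): the fully normalized design needs a join for every profile read, while the denormalized design answers the common case from a single table and keeps a child table only for the rare extra phone numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fully normalized: even the primary phone lives in a child table.
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_phone_numbers (
    user_id    INTEGER REFERENCES users(id),
    phone      TEXT NOT NULL,
    is_primary INTEGER NOT NULL DEFAULT 0
);

-- Strategically denormalized: the common case is one column away,
-- the rare "extra phones" case would keep its own child table.
CREATE TABLE users_denorm (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    primary_phone TEXT
);
""")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.execute("INSERT INTO user_phone_numbers VALUES (1, '555-0100', 1)")
conn.execute("INSERT INTO users_denorm VALUES (1, 'Ada', '555-0100')")

# Normalized read: a join on every single profile view.
joined = conn.execute("""
    SELECT u.name, p.phone
    FROM users u
    JOIN user_phone_numbers p ON p.user_id = u.id AND p.is_primary = 1
    WHERE u.id = 1
""").fetchone()

# Denormalized read: a single-table primary-key lookup.
flat = conn.execute(
    "SELECT name, primary_phone FROM users_denorm WHERE id = 1"
).fetchone()

assert joined == flat == ("Ada", "555-0100")
```

Same answer either way; the difference is how much work the database does per page view, multiplied by your busiest page's traffic.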

The NULL Nightmare: When Optional Becomes Impossible

I used to love nullable columns. They seemed so flexible, so accommodating. A user might not have a middle name? Make it nullable. An order might not have a discount code? Nullable. A product might not have a weight? You get the idea.

| Primary Key Type | Storage Size | Index Performance | Best Use Case |
|---|---|---|---|
| Auto-increment INT | 4 bytes | Excellent (sequential) | Single-server systems, high-volume tables |
| Auto-increment BIGINT | 8 bytes | Excellent (sequential) | Large-scale single-server systems |
| UUID (v4) | 16 bytes | Poor (random) | Distributed systems, security-sensitive IDs |
| ULID / UUID (v7) | 16 bytes | Good (time-ordered) | Distributed systems needing sortability |
| Composite keys | Varies | Fair to good | Natural relationships, multi-tenant systems |

In one particularly disastrous project, I designed an inventory management system where roughly 60% of columns across all tables were nullable. It seemed reasonable—not every field would always have data, right? Why force defaults when NULL clearly communicates "no value"?

The problems started immediately. Queries became a minefield of NULL checks. Want to find all products without a weight? You'd think "WHERE weight IS NULL" would work, but what about products where weight was explicitly set to zero? Now you need "WHERE weight IS NULL OR weight = 0". Want to compute an order total as price * quantity - discount? If any operand is NULL, the entire expression evaluates to NULL, so you end up wrapping everything in COALESCE. (Aggregates like SUM at least skip NULL rows, but even SUM returns NULL over an empty or all-NULL set.)
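These three-valued-logic surprises are easy to verify in any SQL engine; here's a quick tour in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# NULL poisons arithmetic: any NULL operand makes the whole result NULL.
assert c.execute("SELECT 10 * 2 - NULL").fetchone()[0] is None

# NULL = NULL is not true; it is NULL, which WHERE treats as false.
assert c.execute("SELECT NULL = NULL").fetchone()[0] is None

# Aggregate SUM skips NULL rows...
c.execute("CREATE TABLE orders (total REAL)")
c.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (None,), (5.0,)])
assert c.execute("SELECT SUM(total) FROM orders").fetchone()[0] == 15.0

# ...but returns NULL over an empty (or all-NULL) set, hence COALESCE.
c.execute("DELETE FROM orders")
assert c.execute("SELECT SUM(total) FROM orders").fetchone()[0] is None
assert c.execute(
    "SELECT COALESCE(SUM(total), 0) FROM orders"
).fetchone()[0] == 0
```

Every one of those asserts corresponds to a bug class we actually shipped.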

But the real disaster was the application code. Every single field access required a NULL check. Our codebase became littered with defensive programming: "if (product.weight != null && product.weight > 0)". We had bugs where NULL values were being compared with actual values and producing unexpected results. In SQL, NULL = NULL is not true—it's NULL. This caused countless logic errors.

The breaking point came when we tried to add a unique constraint on email addresses in our customers table. We wanted to ensure no duplicate emails, but the column was nullable (because some customers were imported without emails). In PostgreSQL, NULL values are considered distinct, so we could have multiple rows with NULL emails. Our "unique" constraint wasn't actually enforcing uniqueness the way we needed.
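The behavior is trivial to demonstrate (SQLite treats NULLs in unique columns the same way PostgreSQL does by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT UNIQUE          -- nullable AND "unique"
    )
""")
# Two customers with NULL emails: both inserts succeed, because NULLs
# are considered distinct from each other for uniqueness purposes.
conn.execute("INSERT INTO customers (email) VALUES (NULL)")
conn.execute("INSERT INTO customers (email) VALUES (NULL)")

# A real duplicate is still rejected, so the constraint "works"...
conn.execute("INSERT INTO customers (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO customers (email) VALUES ('a@example.com')")
    raise AssertionError("duplicate email should have been rejected")
except sqlite3.IntegrityError:
    pass  # ...just not for the NULL rows you forgot about.
```

(Newer PostgreSQL versions let you opt out of this with `UNIQUE NULLS NOT DISTINCT`, but the cleaner fix is usually to make the column NOT NULL in the first place.)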

We spent two months cleaning up the schema. For every nullable column, we asked: "What does NULL actually mean here?" Often, the answer was "we were too lazy to define a proper default." We replaced NULLs with sensible defaults: empty strings for text fields where appropriate, zero for numeric fields that represented counts, and specific sentinel values for fields where "unknown" was genuinely different from "zero" or "empty".

My current approach: Make columns NOT NULL by default. Only allow NULL when it has a specific, documented meaning that's different from any possible value. If you can't explain in one sentence what NULL means for a particular column, it shouldn't be nullable. This single change has eliminated entire categories of bugs in every project I've worked on since.

Index Overload: When More Isn't Better

After learning about the importance of indexes the hard way, I went through a phase where I indexed everything. Slow query? Add an index. Another slow query? Add another index. I had tables with eight, nine, ten indexes. I thought I was being proactive, optimizing for every possible query pattern.

"UUIDs aren't inherently bad—using them everywhere without understanding the performance implications is. A 16-byte primary key on a table with 50 million rows isn't just storage overhead, it's a query execution tax you pay on every single operation."

Then I noticed something strange: our write performance was degrading. INSERT operations that used to take 50ms were now taking 300ms. UPDATE operations were even worse. Our nightly batch job that imported 100,000 product updates went from taking 20 minutes to taking 2 hours and 15 minutes. What was happening?

Every index you add makes writes slower. When you INSERT a row, the database has to update every index on that table. When you UPDATE a row, any index on the updated columns needs to be updated. I had created an index maintenance nightmare. Our products table had 11 indexes, and every single product update was touching all 11 of them.

I ran an analysis using PostgreSQL's pg_stat_user_indexes view and discovered that 6 of those 11 indexes had never been used. Not once. They were just sitting there, consuming 4.2GB of disk space and slowing down every write operation. Another 2 indexes were used so rarely (less than 10 times per day) that they weren't worth the write penalty.

We dropped the unused indexes and immediately saw write performance improve by 60%. The nightly batch job went back down to 35 minutes. INSERT operations returned to their normal 50-80ms range. We freed up 4.2GB of disk space and reduced our backup times by 8 minutes.

Here's my current indexing strategy: Start with indexes on primary keys and foreign keys—those are almost always needed. Then, add indexes based on actual query patterns, not hypothetical ones. I use database query logs to identify slow queries, then add targeted indexes to fix them. I review index usage quarterly and drop any index that hasn't been used in the last 30 days.

I also learned about composite indexes the hard way. If you have queries that filter on multiple columns (WHERE status = 'active' AND created_at > '2024-01-01'), a composite index on (status, created_at) is usually better than separate indexes on each column. The order matters too—put the most selective column first.
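You can check whether the planner actually uses a composite index with an EXPLAIN-style query. A small sketch in SQLite (index and table names are made up for illustration; in PostgreSQL you'd use `EXPLAIN` instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        status     TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")
# One composite index; the equality-filtered column goes first so the
# range scan on created_at happens within a single status partition.
conn.execute(
    "CREATE INDEX idx_orders_status_created ON orders (status, created_at)"
)
conn.executemany(
    "INSERT INTO orders (status, created_at) VALUES (?, ?)",
    [("active", "2024-03-01"), ("active", "2023-06-01"),
     ("cancelled", "2024-02-01")],
)

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM orders
    WHERE status = 'active' AND created_at > '2024-01-01'
""").fetchall()
detail = " ".join(row[-1] for row in plan)
# Expect something like: SEARCH orders USING INDEX idx_orders_status_created
print(detail)
```

If the plan says the index is used for both conditions, the ordering is right; if it only uses the index for one column, try swapping the column order and re-checking.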

One more thing: Don't index low-cardinality columns. An index on a boolean field or a status column with only three possible values is usually worthless. The database will often ignore it and do a table scan anyway because it's more efficient.

The VARCHAR(255) Cargo Cult

For years, I declared every text column as VARCHAR(255). Email addresses? VARCHAR(255). Product names? VARCHAR(255). User bios? VARCHAR(255). I'd seen it in countless tutorials and open-source projects. It seemed like the standard, the safe choice.

I never questioned why 255. I just assumed it was some kind of database best practice, maybe related to how databases store string lengths internally. I was wrong, and this lazy habit cost us in multiple ways.

The first problem hit when a customer tried to enter a product description that was 300 characters long. The application silently truncated it to 255 characters, cutting off the last sentence mid-word. The customer complained, we investigated, and I realized I'd arbitrarily limited a field that had no business being limited.

The second problem was more subtle. We were storing US ZIP codes as VARCHAR(255). ZIP codes are 5 digits, or 10 characters with the ZIP+4 format. We were allocating space for 255 characters to store 10. Multiply that by 2 million customer records, and we were wasting significant storage on a field that would never exceed 10 characters.

The third problem was validation. Because the database allowed 255 characters, our application code had inconsistent validation. Some forms limited email addresses to 100 characters, others to 255, others had no limit at all. The database wasn't enforcing any meaningful constraints, so we had data quality issues throughout the system.

I started actually thinking about field lengths. Email addresses? RFC 5321 specifies a maximum of 254 characters, so VARCHAR(254) is correct. Phone numbers? Even with international format and extensions, 20 characters is plenty. Product SKUs? Our format was 12 characters maximum. User bios? We wanted to limit them to 500 characters for UX reasons, so VARCHAR(500).

This exercise revealed something important: defining proper field lengths forces you to think about your data model. What are the actual constraints? What are the business rules? A VARCHAR(255) is a cop-out that says "I haven't thought about this."

Now I have a simple rule: Every text field length should be justified. If you can't explain why a field is VARCHAR(255), you haven't designed it properly. Use TEXT for truly variable-length content like descriptions or comments. Use specific VARCHAR lengths for structured data like emails, phone numbers, and codes. And document why you chose each length in your schema comments.
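Here's what that looks like in practice, sketched in SQLite with hypothetical column names. One caveat worth knowing: SQLite ignores VARCHAR(n) lengths entirely, so the limit has to be expressed as a CHECK constraint there, whereas PostgreSQL and MySQL enforce VARCHAR(254) directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,
        -- RFC 5321 ceiling for a full email address
        email TEXT NOT NULL CHECK (length(email) <= 254),
        -- 5-digit ZIP or 10-character ZIP+4, nothing longer
        zip   TEXT NOT NULL CHECK (length(zip) <= 10)
    )
""")
conn.execute(
    "INSERT INTO contacts (email, zip) VALUES ('a@example.com', '90210-1234')"
)
try:
    conn.execute(
        "INSERT INTO contacts (email, zip) VALUES (?, ?)",
        ("x" * 300, "90210"),  # 300-character "email"
    )
    raise AssertionError("over-long email should have been rejected")
except sqlite3.IntegrityError:
    pass  # rejected at the database, not silently truncated
```

The point isn't the specific numbers; it's that each limit is written down, enforced in one place, and rejected loudly instead of truncated silently.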

Ignoring Database Constraints: The Application Logic Trap

Early in my career, I believed that data validation belonged in the application layer. The database was just storage—a dumb bucket for bytes. I'd write elaborate validation logic in the application code: checking that email addresses were unique, ensuring that order totals matched line item sums, verifying that foreign key relationships were valid.

"The best database schema is the one that survives contact with production traffic. Everything else is just theory and wishful thinking."

This worked fine until we added a second application. We built an admin panel that accessed the same database, and suddenly we had two codebases that needed to enforce the same rules. Then we added a batch import process. Then an API. Before long, we had five different systems touching the database, each with its own validation logic—or lack thereof.

The inevitable happened: data corruption. An admin user managed to create an order with a NULL customer_id because the admin panel's validation was less strict than the main application's. A batch import created duplicate email addresses because it bypassed the uniqueness check. An API client inserted a product with a negative price because nobody thought to validate that in the API layer.

We spent three weeks cleaning up the data and adding proper database constraints. UNIQUE constraints on email addresses. CHECK constraints to ensure prices were positive. FOREIGN KEY constraints to maintain referential integrity. NOT NULL constraints on required fields. The database became the single source of truth for data integrity rules.

The transformation was remarkable. Bugs that used to slip through application-layer validation were now caught at the database level. We could add new applications without worrying about reimplementing validation logic. Data quality improved dramatically—we went from finding 50-100 data integrity issues per month to finding 2-3.

Here's what I learned: The database should enforce all data integrity rules that are fundamental to your domain model. If a rule is "this field must never be NULL," that's a NOT NULL constraint. If it's "email addresses must be unique," that's a UNIQUE constraint. If it's "prices must be positive," that's a CHECK constraint.
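A compact sketch of that constraints-first posture (SQLite here, with invented table names; note SQLite needs foreign keys switched on explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if asked
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        price       REAL NOT NULL CHECK (price > 0)
    );
""")
conn.execute("INSERT INTO customers (email) VALUES ('a@example.com')")
conn.execute("INSERT INTO orders (customer_id, price) VALUES (1, 19.99)")

# Every rule an application "forgot" to check is still enforced here:
for bad in [
    "INSERT INTO orders (customer_id, price) VALUES (999, 10)",  # no such customer
    "INSERT INTO orders (customer_id, price) VALUES (1, -5)",    # negative price
    "INSERT INTO customers (email) VALUES ('a@example.com')",    # duplicate email
]:
    try:
        conn.execute(bad)
        raise AssertionError(f"should have failed: {bad}")
    except sqlite3.IntegrityError:
        pass  # the database is the last line of defense
```

Whether the bad write comes from the admin panel, a batch import, or someone's ad-hoc script, it fails the same way.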

Application-layer validation is still important for user experience—you want to show friendly error messages before the user submits a form. But database constraints are your last line of defense. They ensure data integrity even when application code has bugs, even when someone writes a manual SQL script, even when a batch process goes rogue.

I now design database schemas with constraints first, then build application logic on top. The database enforces the rules; the application provides a nice interface for following those rules.

The Soft Delete Trap: When Deleted Isn't Really Deleted

Soft deletes seemed like such a good idea. Instead of actually deleting rows, you add a deleted_at timestamp column and set it when a user "deletes" something. The data is still there, you can recover it if needed, you can audit who deleted what and when. Perfect, right?

I implemented soft deletes across an entire e-commerce platform. Users, products, orders, everything had a deleted_at column. Every query needed to include "WHERE deleted_at IS NULL" to filter out deleted records. It felt a bit clunky, but the benefits seemed worth it.

The problems started small. A developer forgot to add the deleted_at check to a query, and suddenly deleted products were showing up in search results. We fixed that bug, but then it happened again in a different part of the codebase. And again. Every new feature was a potential place to forget the soft delete check.

Then came the performance issues. Our products table had 500,000 rows, but only 320,000 were "active" (not soft deleted). Every query was scanning all 500,000 rows and filtering out 180,000 of them. Indexes were less effective because they included deleted rows. Query performance degraded as the table grew, even though the number of active products stayed roughly constant.

The real nightmare was unique constraints. We wanted product SKUs to be unique, but with soft deletes, we couldn't use a simple UNIQUE constraint. A deleted product's SKU should be available for reuse, but the database didn't know that. We had to implement uniqueness checks in application code, which brought back all the problems I mentioned in the previous section.

Foreign key relationships were another mess. If a user was soft deleted, what happened to their orders? We couldn't actually delete the orders (we needed them for accounting), but we also couldn't leave them pointing to a "deleted" user. We ended up with complex cascading soft delete logic that was error-prone and hard to maintain.

After two years of fighting with soft deletes, we redesigned the system. For data that genuinely needed to be preserved for audit or recovery purposes, we created separate archive tables. When a user deleted their account, we moved their data to a users_archive table with a proper hard delete from the main users table. This kept the main tables clean and performant while still preserving historical data.
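The archive move itself is a two-statement transaction, so the row is either fully archived or fully live, never both and never lost. A minimal sketch with hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL
    );
    CREATE TABLE users_archive (
        id          INTEGER PRIMARY KEY,
        email       TEXT NOT NULL,
        archived_at TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'a@example.com');
""")

# sqlite3's connection context manager commits on success and rolls
# back on error, so copy-then-delete happens atomically.
with conn:
    conn.execute("""
        INSERT INTO users_archive (id, email, archived_at)
        SELECT id, email, datetime('now') FROM users WHERE id = 1
    """)
    conn.execute("DELETE FROM users WHERE id = 1")
```

The main table stays small and index-friendly; the archive keeps its own lifecycle, retention policy, and (if needed) cheaper storage.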

For data that needed to be "hidden" but not deleted (like draft posts or deactivated products), we used explicit status columns: status = 'active' or status = 'inactive'. This made the intent clear and avoided the ambiguity of soft deletes.

My current rule: Don't use soft deletes as a default pattern. If you need audit trails, use proper audit tables or event logs. If you need to hide data temporarily, use status flags. If you need to preserve deleted data, use archive tables. Soft deletes create more problems than they solve in most cases.

Learning to Love Boring Technology

I'll end with the meta-lesson that took me the longest to learn: boring is beautiful. For years, I chased the latest database technologies. I tried MongoDB when it was hot, convinced that schemaless was the future. I experimented with graph databases for a project that had no graph-like data. I pushed for a microservices architecture with separate databases for each service, creating a distributed data nightmare.

Each experiment taught me something, but they also cost time, money, and sleep. The MongoDB project required a complete rewrite after six months when we realized we needed transactions and relational integrity. The graph database added complexity without solving any actual problems. The microservices architecture created data consistency issues that took a year to fully resolve.

At TXT1.ai, we process 3 billion messages per year on PostgreSQL. Boring, reliable, well-understood PostgreSQL. We use standard relational design patterns. We have foreign keys and constraints. We normalize where it makes sense and denormalize where performance demands it. We use indexes judiciously. We monitor query performance and optimize based on real data.

And you know what? It works beautifully. Our database handles 50,000 queries per second during peak load. Our 99th percentile query time is under 100ms. We've had zero data corruption incidents in three years. Our database costs are predictable and reasonable.

The lesson: Choose boring technology that's well-understood and well-supported. Master the fundamentals before reaching for exotic solutions. Most database problems are solved with proper schema design, appropriate indexes, and well-written queries—not with the latest NoSQL database or distributed architecture pattern.

Database design is about trade-offs, not absolutes. Every decision has costs and benefits. The key is understanding those trade-offs and making informed choices based on your actual requirements, not theoretical ideals or trendy blog posts. Measure everything, optimize based on real data, and never be afraid to admit when a design decision was wrong and needs to be fixed.

That Black Friday disaster I mentioned at the beginning? We fixed it by adding three indexes and denormalizing two tables. Total development time: 4 hours. Total cost: $32,000 in lost revenue before the fix went live. The lesson was worth every penny—and every gray hair it gave me.



Written by the Txt1.ai Team

Our editorial team specializes in writing, grammar, and language technology. We research, test, and write in-depth guides to help you work smarter with the right tools.
