The Database Questions That Stumped Me at Oracle, Microsoft, and Amazon
After 15 years managing databases that handle terabytes of data and surviving 200+ DBA interviews, I've learned that great database administrators aren't just query optimizers—they're system guardians who prevent disasters before they happen.
My most embarrassing DBA interview was at a Fortune 500 company. They showed me their production database performance graphs—red lines everywhere, response times in seconds, not milliseconds. "What would you do?" they asked. I started talking about adding indexes. Wrong answer. They interrupted: "That's cute, but we need someone who can handle a 5TB database with 10,000 concurrent users where a 5-minute outage costs us $2 million."
That moment taught me that DBA interviews aren't about knowing SQL syntax. They're about understanding that databases are the beating heart of every business application. When a database goes down, everything stops. The best DBAs don't just maintain databases—they architect data ecosystems that never fail, scale infinitely, and recover from disasters in minutes.
This guide covers 35 questions that separate junior database administrators from senior database architects. Each answer reflects real-world experience managing mission-critical systems, complete with disaster recovery stories, performance optimization strategies, and the security practices that keep data safe.
What Database Admin Interviewers Evaluate
- SQL Mastery: Query optimization, indexing strategies, execution plan analysis
- Performance Tuning: Bottleneck identification, resource management, capacity planning
- Backup & Recovery: Disaster recovery planning, point-in-time recovery, business continuity
- Security & Compliance: Access controls, encryption, audit trails, regulatory compliance
- High Availability: Replication, clustering, failover strategies, monitoring
SQL Optimization & Indexing (Questions 1-8)
1. How do you identify and fix slow-running queries?
Tests systematic approach to query performance analysis
Answer:
- Identify slow queries: Use query logs, performance monitoring tools, or sys.dm_exec_query_stats
- Analyze execution plans: EXPLAIN ANALYZE in PostgreSQL, EXPLAIN PLAN in Oracle, or actual execution plans in SQL Server Management Studio
- Look for common issues: Missing indexes, full table scans, excessive joins, non-sargable predicates (e.g., functions wrapped around indexed columns)
- Optimize systematically: Add indexes, rewrite queries, update statistics, consider partitioning
- Test and measure: Compare before/after performance metrics
# PostgreSQL slow query analysis
-- Enable slow query logging (queries slower than 1000 ms)
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf(); -- apply the setting without a restart
-- Analyze execution plan
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM users WHERE email = 'user@example.com';
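The "test and measure" step can be automated with a small timing harness. This Python sketch reports median latency so a single outlier doesn't skew the before/after comparison; the lambdas are stand-in workloads, where a real harness would call your database driver instead.

```python
import statistics
import time

def time_query(run_query, iterations=5):
    """Time a query callable several times; return the median latency in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Stand-in workloads for illustration; swap in real query executions
before = time_query(lambda: sum(i * i for i in range(200_000)))  # "unoptimized"
after = time_query(lambda: sum(i * i for i in range(20_000)))    # "optimized"
print(f"median before: {before:.1f} ms, after: {after:.1f} ms")
```

Using the median of several runs also smooths out cache warm-up effects, which otherwise make the first execution look artificially slow.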
2. Explain the different types of database indexes and when to use them.
Answer:
- B-Tree Index (Default): Best for equality and range queries, maintains sorted order
- Hash Index: Fast for equality lookups, not suitable for range queries
- Bitmap Index: Efficient for low-cardinality columns with few distinct values
- Partial Index: Index only rows meeting certain conditions to save space
- Composite Index: Multiple columns, order matters for query optimization
- Unique Index: Enforces uniqueness while improving query performance
- Full-text Index: For text search operations in large text fields
# Index examples
-- Composite index (order matters)
CREATE INDEX idx_user_email_status ON users(email, status);
-- Partial index
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
3. How do you determine if an index is being used effectively?
Answer:
Methods to check index usage:
- Execution plans: Look for index seeks vs. table scans
- Index usage statistics: sys.dm_db_index_usage_stats in SQL Server
- pg_stat_user_indexes: PostgreSQL index usage statistics
- Query performance monitoring: Before/after query execution times
# PostgreSQL index usage analysis
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
-- Shows indexes never used by any scan
Red flags: Indexes with zero usage, queries still doing table scans, high maintenance overhead
4. What is query execution plan analysis and how do you use it?
Answer:
Execution plan analysis: Understanding how the database engine executes your queries
Key metrics to analyze:
- Cost estimates: Relative cost of different operations
- Row estimates vs. actual: Large gaps suggest outdated statistics
- Join types: Nested loop, hash join, merge join efficiency
- Index usage: Index seeks vs. scans vs. table scans
- I/O operations: Logical and physical reads
# Reading execution plans
-- Look for these warning signs:
-- Table Scan (expensive for large tables)
-- Key Lookup (missing covering index)
-- Hash Match (potential missing join index)
-- High estimated vs actual row counts
5. How do you handle database deadlocks?
Answer:
Deadlock prevention strategies:
- Consistent lock ordering: Always acquire locks in the same order across transactions
- Keep transactions short: Minimize lock hold time
- Use appropriate isolation levels: READ COMMITTED instead of SERIALIZABLE when possible
- Add proper indexes: Reduce lock duration
- Batch operations: Process large updates in smaller chunks
# Deadlock detection and handling
-- SQL Server: pull deadlock graphs from the system_health Extended Events session
SELECT CAST(event_data AS XML) AS deadlock_graph
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'xml_deadlock_report';
When deadlocks occur: Implement retry logic with exponential backoff in application code
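The retry logic mentioned above might look like this in application code. This is a sketch: `DeadlockError` stands in for whatever deadlock-victim exception your driver raises (e.g., SQLSTATE 40P01 in PostgreSQL, error 1205 in SQL Server).

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for a driver-specific deadlock-victim error."""

def run_with_retry(txn, max_attempts=4, base_delay=0.05):
    """Run a transaction callable, retrying deadlocks with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # Back off 50ms, 100ms, 200ms... plus jitter to de-synchronize retries
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.01))

# Example: a transaction that loses the deadlock twice, then succeeds
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError()
    return "committed"

print(run_with_retry(flaky_txn))  # committed
```

The jitter matters: if both deadlocked transactions retry on the same schedule, they can deadlock again on every attempt.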
6. What are database statistics and why are they important?
Answer:
Database statistics: Metadata about data distribution that helps the query optimizer make decisions
Types of statistics:
- Column statistics: Data distribution, min/max values, null count
- Index statistics: Key distribution, density, selectivity
- Table statistics: Row count, page count, modification count
# Updating statistics
-- PostgreSQL
ANALYZE users;
-- SQL Server
UPDATE STATISTICS users WITH FULLSCAN;
-- Oracle
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'USERS');
Best practices: Schedule regular statistics updates, monitor stale statistics, update after bulk data loads
7. How do you optimize queries with multiple joins?
Answer:
Join optimization strategies:
- Index foreign keys: Ensure all join columns are indexed
- Filter early: Apply WHERE clauses before joins when possible
- Choose optimal join order: Join smaller result sets first
- Use appropriate join types: INNER vs. LEFT JOIN based on requirements
- Consider denormalization: For frequently joined tables in read-heavy workloads
# Join optimization example
-- Optimized query structure
SELECT u.name, p.title FROM users u
INNER JOIN posts p ON u.user_id = p.user_id
WHERE u.status = 'active' -- Filter early
AND p.created_at > '2025-01-01';
-- Ensure indexes on user_id, status, created_at
8. What is query plan caching and how does it affect performance?
Answer:
Query plan caching: Database engines cache execution plans to avoid recompiling identical queries
Benefits:
- Reduced CPU usage (no recompilation)
- Faster query execution startup
- Better resource utilization
Potential issues:
- Parameter sniffing (plan optimized for specific parameters)
- Stale plans after statistics updates
- Memory pressure from excessive plan cache
# Managing plan cache
-- Force plan recompilation (SQL Server)
EXEC sp_recompile 'table_name';
-- Clear plan cache (use carefully)
DBCC FREEPROCCACHE;
Performance Tuning & Monitoring (Questions 9-16)
9. How do you identify and resolve database performance bottlenecks?
Tests systematic performance troubleshooting methodology
Answer:
Performance bottleneck identification process:
- Monitor key metrics: CPU, memory, disk I/O, network, wait events
- Identify resource contention: Lock waits, I/O waits, CPU bound operations
- Analyze slow queries: Top time-consuming queries and execution patterns
- Review system configuration: Buffer pools, connection limits, parallelism settings
- Check hardware resources: Disk performance, memory allocation, CPU utilization
# Key performance queries
-- PostgreSQL: Active queries and wait events
SELECT query, state, wait_event_type, wait_event
FROM pg_stat_activity WHERE state != 'idle';
-- SQL Server: Wait statistics
SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC;
10. What are the key database performance metrics you monitor?
Answer:
Critical database metrics:
Performance Metrics:
- Query response time (avg, p95, p99)
- Transactions per second (TPS)
- Queries per second (QPS)
- Buffer cache hit ratio
- Index usage statistics
Resource Metrics:
- CPU utilization
- Memory usage and allocation
- Disk I/O (IOPS, throughput, latency)
- Connection count vs. limits
- Lock waits and deadlocks
Alerting thresholds: CPU >80%, slow queries >5sec, deadlocks >0, connection usage >90%
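Those thresholds can be encoded directly in monitoring glue code. A minimal Python sketch, where the metric names are illustrative rather than taken from any particular monitoring tool:

```python
# Thresholds from the text: CPU >80%, slow queries >5s, deadlocks >0, connections >90%
THRESHOLDS = {
    "cpu_percent": 80,
    "slowest_query_seconds": 5,
    "deadlock_count": 0,
    "connection_usage_percent": 90,
}

def evaluate_alerts(metrics):
    """Return the metric names that breach their alert thresholds."""
    return sorted(name for name, limit in THRESHOLDS.items()
                  if metrics.get(name, 0) > limit)

sample = {"cpu_percent": 92, "slowest_query_seconds": 1.2,
          "deadlock_count": 1, "connection_usage_percent": 75}
print(evaluate_alerts(sample))  # ['cpu_percent', 'deadlock_count']
```

Keeping thresholds in one data structure rather than scattered `if` statements makes them easy to review and tune per environment.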
11. How do you handle database connection pooling and connection limits?
Answer:
Connection pooling benefits:
- Reduces connection establishment overhead
- Controls concurrent connections to prevent resource exhaustion
- Improves application performance and scalability
Configuration considerations:
- Pool size: Balance between resource usage and concurrency needs
- Connection timeout: How long to wait for available connection
- Idle timeout: Close unused connections to free resources
- Health checks: Validate connections before reuse
# Connection pool configuration example
-- PostgreSQL max_connections
ALTER SYSTEM SET max_connections = 200;
-- Application-side pool settings (illustrative)
pool_size = 20            # per application instance
connection_timeout = 30   # seconds to wait for a free connection
idle_timeout = 600        # seconds before closing an idle connection
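To make the pool semantics concrete, here is a minimal Python sketch of a fixed-size pool with an acquire timeout. A production pool would also add health checks and idle-connection reaping; the dict connection is a stand-in for a real driver connection.

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool sketch: blocks (up to a timeout) when exhausted."""
    def __init__(self, create_conn, size=5, timeout=30):
        self._pool = queue.Queue(maxsize=size)
        self._timeout = timeout
        for _ in range(size):
            self._pool.put(create_conn())

    def acquire(self):
        # Raises queue.Empty if no connection frees up within the timeout
        return self._pool.get(timeout=self._timeout)

    def release(self, conn):
        self._pool.put(conn)

# create_conn would normally open a real DB connection; a dict stands in here
pool = ConnectionPool(lambda: {"connected": True}, size=2, timeout=1)
c1, c2 = pool.acquire(), pool.acquire()
pool.release(c1)
c3 = pool.acquire()   # reuses the released connection instead of opening a new one
print(c3 is c1)       # True
```

The blocking `acquire` is what enforces the concurrency cap: when the pool is exhausted, callers wait rather than piling new connections onto the database.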
12. What is database partitioning and when would you implement it?
Answer:
Database partitioning: Dividing large tables into smaller, manageable pieces
Types of partitioning:
- Range partitioning: Based on value ranges (dates, IDs)
- Hash partitioning: Even distribution using hash function
- List partitioning: Based on specific values (regions, categories)
- Composite partitioning: Combination of multiple methods
When to partition:
- Very large tables (e.g., approaching available memory)
- Query performance degradation
- Maintenance window requirements
- Data archival needs
# PostgreSQL range partitioning example
CREATE TABLE sales (id SERIAL, sale_date DATE, amount DECIMAL)
PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2025 PARTITION OF sales
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
13. How do you manage database memory allocation and buffer pools?
Answer:
Memory management components:
- Buffer pool: Caches frequently accessed data pages in memory
- Sort memory: Memory allocated for sorting operations
- Hash memory: Memory for hash joins and aggregations
- Connection memory: Memory per database connection
# Memory configuration examples
-- PostgreSQL memory settings (postgresql.conf)
shared_buffers = '4GB'          # commonly ~25% of total RAM
work_mem = '4MB'                # per sort/hash operation, not per connection
maintenance_work_mem = '64MB'   # for VACUUM, CREATE INDEX, etc.
-- SQL Server buffer pool
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;
Best practices: Allocate 25% of RAM to buffer pool, monitor buffer hit ratios, adjust based on workload
14. What is database caching and how do you implement effective caching strategies?
Answer:
Database caching layers:
- Query result cache: Cache frequently executed query results
- Application-level cache: Redis/Memcached for computed values
- Database buffer cache: Built-in page caching
- Connection pool cache: Reuse prepared statements
Cache invalidation strategies:
- TTL-based: Time-based expiration
- Event-driven: Invalidate on data changes
- Manual: Application-controlled cache clearing
# Caching implementation example
-- PostgreSQL query result caching
-- Use materialized views for complex aggregations
CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, COUNT(*) as post_count FROM posts GROUP BY user_id;
REFRESH MATERIALIZED VIEW user_stats;
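A TTL-based cache with event-driven invalidation can be sketched in a few lines of Python. The injectable clock exists only to make expiry deterministic in tests, and the `user_stats:42` key is illustrative.

```python
import time

class TTLCache:
    """Minimal TTL cache sketch; `clock` is injectable so expiry is testable."""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, stored_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._store[key]   # expired: evict and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self._clock())

    def invalidate(self, key):
        # Event-driven invalidation: call this when the underlying row changes
        self._store.pop(key, None)

# Fake clock to demonstrate TTL expiry deterministically
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("user_stats:42", {"post_count": 17})
print(cache.get("user_stats:42"))  # {'post_count': 17}
now[0] = 61.0
print(cache.get("user_stats:42"))  # None (expired)
```

In practice you would combine both strategies: a short TTL as a safety net, plus explicit `invalidate` calls on writes so readers rarely see stale data.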
15. How do you handle database maintenance tasks like reindexing and statistics updates?
Answer:
Regular maintenance tasks:
- Index maintenance: Rebuild fragmented indexes, remove unused indexes
- Statistics updates: Keep query optimizer information current
- Database cleanup: VACUUM in PostgreSQL; DBCC SHRINKFILE in SQL Server (use sparingly, as shrinking fragments indexes)
- Log file management: Archive and truncate transaction logs
# Maintenance script examples
-- PostgreSQL maintenance
VACUUM ANALYZE users;
REINDEX TABLE users;
-- SQL Server maintenance
ALTER INDEX ALL ON users REBUILD;
UPDATE STATISTICS users WITH FULLSCAN;
Scheduling: Run during low-traffic periods, automate with cron jobs or SQL Server Agent, monitor execution times
16. What tools and techniques do you use for database capacity planning?
Answer:
Capacity planning metrics:
- Storage growth: Historical data growth rates and projections
- Performance trends: CPU, memory, and I/O utilization over time
- Connection usage: Peak concurrent connections and growth patterns
- Query complexity: Increasing complexity of analytical queries
Planning methodology:
- Collect baseline metrics for 3-6 months
- Analyze growth trends and seasonal patterns
- Project future requirements with business growth
- Plan for 2-3x capacity headroom
- Regular review and adjustment
Tools: Monitoring solutions (Datadog, New Relic), database-specific tools (pg_stat_statements, SQL Server DMVs)
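The projection step reduces to compound growth plus the 2-3x headroom multiplier. A quick Python sketch, where the 500 GB starting size and 5% monthly growth rate are illustrative numbers:

```python
def project_storage_gb(current_gb, monthly_growth_rate, months, headroom=3.0):
    """Project storage need with compound monthly growth, plus a headroom multiplier."""
    projected = current_gb * (1 + monthly_growth_rate) ** months
    return projected, projected * headroom

# 500 GB today, 5% monthly growth, 12-month horizon, 3x headroom as in the text
projected, provisioned = project_storage_gb(500, 0.05, 12)
print(f"projected: {projected:.0f} GB, provision for: {provisioned:.0f} GB")
```

Compounding matters here: 5% monthly is roughly 80% annually, far more than the 60% a linear extrapolation would suggest, which is exactly the kind of gap that causes surprise capacity crunches.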
Backup & Recovery (Questions 17-24)
17. Explain your backup strategy for a mission-critical database system.
Tests understanding of comprehensive backup and recovery planning
Answer:
Comprehensive backup strategy (3-2-1 Rule):
- 3 copies: Production database + 2 backup copies
- 2 different media: Local storage + cloud storage
- 1 offsite: Geographically separated backup location
Backup types and schedule:
- Full backups: Weekly complete database backup
- Differential backups: Daily changes since last full backup
- Transaction log backups: Every 15-30 minutes for point-in-time recovery
- Application-consistent backups: Coordinate with application quiesce
# PostgreSQL backup strategy
-- Full backup (weekly)
pg_dump -Fc database_name > weekly_backup.dump
-- Continuous archiving (WAL shipping)
archive_mode = on
archive_command = 'cp %p /backup/archive/%f'
18. How do you perform point-in-time recovery?
Answer:
Point-in-time recovery (PITR) process:
- Restore base backup: Latest full backup before target time
- Apply transaction logs: Replay WAL files up to specific timestamp
- Stop at target time: Use recovery_target_time parameter
- Verify data integrity: Check recovered data for consistency
# PostgreSQL PITR example
-- In postgresql.conf (PostgreSQL 12+; older versions used recovery.conf)
restore_command = 'cp /backup/archive/%f %p'
recovery_target_time = '2025-01-23 14:30:00'
recovery_target_action = 'pause'
-- Create an empty recovery.signal file, then start PostgreSQL to begin recovery
touch $PGDATA/recovery.signal
systemctl start postgresql
Prerequisites: WAL archiving enabled, regular base backups, documented recovery procedures
19. What is the difference between hot and cold backups?
Answer:
Hot Backups (Online):
- Database remains online and accessible
- Backup runs while transactions are in flight
- May have slight performance impact
- Requires transaction log backups for consistency
- Examples: pg_basebackup, SQL Server online backups
Cold Backups (Offline):
- Database must be shut down
- Guaranteed consistent backup
- No performance impact during backup
- Causes service downtime
- Examples: file system copy, volume snapshots
Recommendation: Use hot backups for production systems to avoid downtime, cold backups for maintenance windows
20. How do you test backup integrity and recovery procedures?
Answer:
Backup testing methodology:
- Regular restore tests: Monthly full restore to test environment
- Automated verification: RESTORE VERIFYONLY, DBCC CHECKDB against the restored copy, pg_restore --list checks
- Recovery time testing: Measure actual RTO vs requirements
- Disaster recovery drills: Full-scale recovery simulations
# Backup verification examples
-- PostgreSQL backup verification
pg_restore --list backup.dump | head -20
-- Restore to test database
pg_restore -d test_db backup.dump
-- SQL Server backup verification
RESTORE VERIFYONLY FROM DISK = 'backup.bak';
Documentation: Maintain recovery runbooks, update test results, track RTO/RPO metrics
21. What are RTO and RPO, and how do you achieve them?
Answer:
- RTO (Recovery Time Objective): Maximum acceptable downtime after a failure
- RPO (Recovery Point Objective): Maximum acceptable data loss measured in time
Strategies to achieve RTO/RPO targets:
Low RTO (< 1 hour):
- Database clustering/failover
- Hot standby servers
- Automated failover scripts
- Pre-staged recovery environments
Low RPO (< 15 minutes):
- Synchronous replication
- Frequent transaction log backups
- Real-time data mirroring
- Continuous WAL archiving
Business considerations: Balance cost vs. requirements, test regularly, document dependencies
22. How do you handle database corruption scenarios?
Answer:
Corruption detection:
- Scheduled integrity checks: DBCC CHECKDB, pg_checksums
- Monitoring alerts: I/O errors, checksum failures
- Application errors: Query failures, inconsistent results
Recovery procedures:
- Assess scope: Identify affected objects and data extent
- Stop further damage: Take database offline if necessary
- Recovery approach: Page-level restore, table restore, or full restore
- Verify integrity: Run full consistency checks after recovery
# Corruption handling examples
-- SQL Server corruption check and repair
DBCC CHECKDB('database_name') WITH NO_INFOMSGS;
-- PostgreSQL: checksum failure counts (requires data checksums enabled)
SELECT datname, checksum_failures FROM pg_stat_database WHERE datname = 'mydb';
23. What backup retention policies would you implement?
Answer:
Multi-tier retention strategy:
- Daily backups: Retain for 30 days (operational recovery)
- Weekly backups: Retain for 12 weeks (short-term archival)
- Monthly backups: Retain for 12 months (long-term archival)
- Yearly backups: Retain for 7 years (compliance/legal requirements)
Considerations for retention policy:
- Regulatory requirements (SOX, GDPR, HIPAA)
- Business recovery needs
- Storage costs and capacity
- Data lifecycle management
# Automated backup cleanup script
# Delete daily backups older than 30 days
find /backup/daily -name "*.dump" -mtime +30 -delete
# Archive monthly backups to long-term storage
aws s3 sync /backup/monthly s3://db-backups-archive/
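The tiered retention windows above translate directly into an age-out check. A Python sketch, with the day counts mirroring the policy in the text (12 weeks ≈ 84 days, 7 years ≈ 2555 days):

```python
import datetime

def retention_days(tier):
    """Retention windows from the policy above."""
    return {"daily": 30, "weekly": 84, "monthly": 365, "yearly": 7 * 365}[tier]

def should_delete(tier, backup_date, today):
    """True when a backup has aged out of its tier's retention window."""
    return (today - backup_date).days > retention_days(tier)

today = datetime.date(2025, 6, 1)
print(should_delete("daily", datetime.date(2025, 4, 1), today))    # True (61 days old)
print(should_delete("monthly", datetime.date(2025, 1, 1), today))  # False (151 days old)
```

Expressing the policy as data rather than hard-coded `find -mtime` arguments makes it auditable, which matters when retention is driven by compliance requirements.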
24. How do you backup and restore very large databases efficiently?
Answer:
Large database backup strategies:
- Parallel backups: Multiple streams to reduce backup time
- Incremental/differential: Only backup changed data
- Compressed backups: Reduce storage requirements and transfer time
- Snapshot backups: Use storage-level snapshots for consistency
- Partitioned backups: Backup table partitions independently
# Large database backup examples
-- PostgreSQL parallel backup
pg_dump -Fd -j 8 -f backup_directory database_name
-- SQL Server compressed backup
BACKUP DATABASE mydb TO DISK = 'backup.bak'
WITH COMPRESSION, MAXTRANSFERSIZE = 4194304;
Restore optimization: Parallel restore, pre-allocate file space, disable constraints during restore, rebuild indexes after restore
Security & Access Control (Questions 25-30)
25. How do you implement database security best practices?
Tests comprehensive understanding of database security principles
Answer:
Multi-layered security approach:
- Network security: Firewall rules, VPN access, port restrictions
- Authentication: Strong passwords, multi-factor authentication, certificate-based auth
- Authorization: Role-based access control, principle of least privilege
- Encryption: Data at rest and in transit encryption
- Auditing: Comprehensive logging of all database access
- Monitoring: Real-time security event detection and alerting
# Security configuration examples
-- PostgreSQL security settings
ssl = on
password_encryption = scram-sha-256
log_connections = on
log_statement = 'all'
-- Restrict network access
listen_addresses = 'localhost,10.0.0.100'
26. Explain database encryption: data at rest vs. data in transit.
Answer:
Data at Rest Encryption:
- Encrypts stored database files
- Transparent Data Encryption (TDE)
- File system level encryption
- Column-level encryption for sensitive data
- Key management and rotation
Data in Transit Encryption:
- SSL/TLS for client connections
- Certificate-based authentication
- Encrypted replication streams
- Secure backup transfers
- VPN tunnels for admin access
# Encryption implementation
-- PostgreSQL SSL configuration
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ca_file = 'ca.crt'
ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
Key management: Use dedicated key management services (AWS KMS, Azure Key Vault), implement key rotation policies
27. How do you implement role-based access control in databases?
Answer:
RBAC implementation strategy:
- Define roles: Create roles based on job functions (developer, analyst, admin)
- Grant minimal permissions: Start with least privilege principle
- Use role hierarchies: Inherit permissions from parent roles
- Regular access reviews: Periodically audit and cleanup permissions
# PostgreSQL RBAC example
-- Create roles
CREATE ROLE readonly;
CREATE ROLE developer;
CREATE ROLE admin;
-- Grant permissions
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;
GRANT readonly TO developer;
GRANT INSERT, UPDATE, DELETE ON user_table TO developer;
-- Assign users to roles
GRANT developer TO john_doe;
Best practices: Use application roles instead of individual user accounts, document role definitions, implement approval workflow for role changes
28. What database auditing and logging practices do you implement?
Answer:
Comprehensive auditing strategy:
- Connection logging: Track all login attempts and connections
- Statement logging: Log DDL, DML operations on sensitive tables
- Privilege changes: Monitor role and permission modifications
- Data access: Track SELECT operations on confidential data
- Failed operations: Log security violations and access denials
# Audit configuration examples
-- PostgreSQL audit logging
log_statement = 'ddl' # Log all DDL statements
log_connections = on
log_disconnections = on
log_min_duration_statement = 1000 # Log slow queries
-- SQL Server audit example
CREATE SERVER AUDIT DatabaseAudit TO FILE (FILEPATH = 'C:\Audit\');
Log management: Centralized log collection, retention policies, real-time monitoring and alerting, tamper-proof storage
29. How do you secure database connections and prevent unauthorized access?
Answer:
Connection security measures:
- Network isolation: Database servers in private subnets, no direct internet access
- Firewall rules: Restrict access to specific IP addresses and ports
- SSL/TLS encryption: Mandatory encrypted connections
- Certificate authentication: Client certificates for administrative access
- Connection limits: Per-user and per-application connection limits
- IP whitelisting: Allow connections only from approved sources
# Connection security configuration
-- PostgreSQL pg_hba.conf
# Only allow SSL connections from specific IPs
hostssl all all 10.0.0.0/8 cert
hostssl all all 192.168.1.0/24 scram-sha-256
# Reject all other connections
host all all all reject
Monitoring: Track connection attempts, alert on failed authentications, monitor for unusual connection patterns
30. How do you handle database security compliance requirements (SOX, GDPR, HIPAA)?
Answer:
Compliance requirements mapping:
SOX (Sarbanes-Oxley):
- Audit trails for financial data
- Change management controls
- Segregation of duties
- Data retention policies
GDPR (Privacy):
- Data encryption requirements
- Right to erasure (data deletion)
- Data breach notification
- Privacy by design
HIPAA (Healthcare):
- PHI encryption
- Access controls and audit logs
- Business associate agreements
- Risk assessments
Implementation approach:
- Document security policies and procedures
- Implement technical safeguards (encryption, access controls)
- Regular compliance assessments and audits
- Staff training and awareness programs
- Incident response and breach notification procedures
High Availability & Replication (Questions 31-35)
31. Explain different database replication strategies and their use cases.
Tests understanding of high availability and disaster recovery architectures
Answer:
Replication types and use cases:
- Master-Slave (Read Replicas): Scale read operations, reporting workloads
- Master-Master: Active-active setup, geographic distribution
- Synchronous replication: Zero data loss, high availability (impact on performance)
- Asynchronous replication: Better performance, potential data loss during failures
- Logical replication: Cross-platform, selective table replication
# PostgreSQL streaming replication setup
-- On primary server (postgresql.conf)
wal_level = replica
max_wal_senders = 5
wal_keep_size = '512MB'   # wal_keep_segments on versions before PostgreSQL 13
-- On standby server (postgresql.conf, plus an empty standby.signal file)
primary_conninfo = 'host=primary port=5432 user=replicator'
Considerations: Network latency, failover complexity, data consistency requirements
32. How do you implement automatic failover for database systems?
Answer:
Automatic failover components:
- Health monitoring: Continuous monitoring of primary database health
- Failure detection: Network timeouts, service checks, application-level monitoring
- Failover orchestration: Automated promotion of standby to primary
- Connection redirection: Update DNS/load balancer to point to new primary
- Application notification: Inform applications of database role changes
# PostgreSQL automatic failover with Patroni
# Patroni configuration (patroni.yml excerpt)
postgresql:
  use_pg_rewind: true
  parameters:
    max_connections: 100
    wal_level: replica
# Health check endpoint exposed by Patroni's REST API
curl http://localhost:8008/health
Tools: Patroni (PostgreSQL), Always On (SQL Server), Oracle Data Guard, cloud-native solutions (AWS RDS Multi-AZ)
33. What is database clustering and when would you implement it?
Answer:
Database clustering types:
- Shared-disk clustering: Multiple nodes access shared storage (Oracle RAC)
- Shared-nothing clustering: Each node has dedicated storage (PostgreSQL-XL)
- Active-passive clustering: One active node, others as standby
- Active-active clustering: Multiple active nodes handling requests
When to implement:
- High availability requirements (99.9%+ uptime)
- Horizontal scaling needs
- Large concurrent user loads
- Zero-downtime maintenance requirements
Trade-offs: Increased complexity, higher costs, potential performance overhead, distributed lock management
Alternatives: Read replicas, connection pooling, application-level sharding, microservices architecture
34. How do you monitor database health and set up effective alerting?
Answer:
Database health monitoring dimensions:
Performance Metrics:
- Query response time percentiles
- Transactions per second
- Connection count and utilization
- Buffer cache hit ratio
- Disk I/O latency and IOPS
Health Indicators:
- Database availability/uptime
- Replication lag
- Disk space utilization
- Lock wait events
- Error log patterns
# Database monitoring queries
-- PostgreSQL health check
SELECT datname, numbackends, xact_commit, xact_rollback
FROM pg_stat_database;
-- Replication lag monitoring
SELECT client_addr, state, replay_lag
FROM pg_stat_replication;
Alerting strategy: Tiered alerting (warning/critical), alert fatigue prevention, escalation procedures, out-of-hours coverage
35. How do you plan for database disaster recovery across multiple data centers?
Answer:
Multi-data center DR strategy:
- Geographic distribution: Primary and DR sites in different regions
- Replication setup: Asynchronous replication to DR site for acceptable RPO
- Network considerations: Dedicated links, bandwidth planning, latency optimization
- Automated failover: DNS failover, application connection string updates
- Data validation: Regular consistency checks between sites
DR planning components:
- Risk assessment: Identify potential disaster scenarios
- Recovery procedures: Documented step-by-step recovery processes
- Regular testing: Quarterly DR drills and failover tests
- Communication plan: Stakeholder notification and escalation procedures
- Post-incident review: Lessons learned and plan improvements
Cloud considerations: Multi-region deployments, cross-region replication, managed disaster recovery services
The Database Administrator Mindset
After managing databases that never sleep and surviving hundreds of DBA interviews, I've learned that exceptional database administrators share these key characteristics:
✓ What Great DBAs Demonstrate:
- Proactive monitoring - preventing problems before they occur
- Performance obsession - every query matters at scale
- Security awareness - data protection is non-negotiable
- Disaster preparedness - tested backup and recovery procedures
- Automation mindset - scripting repetitive tasks for reliability
- Business understanding - balancing technical perfection with business needs
❌ Common DBA Interview Mistakes:
- Focusing only on query optimization, ignoring operational aspects
- Not understanding backup and recovery trade-offs
- Overlooking security and compliance requirements
- Failing to explain monitoring and alerting strategies
- Not considering high availability and disaster recovery
- Ignoring capacity planning and scalability concerns
The best database administrators understand that their role extends far beyond writing SQL queries. They are the guardians of business-critical data, the architects of reliable systems, and the first responders when disasters strike. Every database they manage is a foundation that entire businesses depend on—and that responsibility shapes every decision they make.
