  How a Database Migration Helped Us Discover and Consolidate 80+ Overlapping SQL Indexes

    Database migrations often reveal hidden technical debt. During a recent effort to minimize migration windows, we discovered more than 80 overlapping indexes in our SQL environment. The impact on write amplification and storage overhead was too significant to ignore.

    While our broader database migration journey deserves its own post, this article focuses on the framework we used to identify and consolidate these overlapping indexes.

    What Are Overlapping Indexes?

    Overlapping indexes are “clones” or subsets of other indexes: either exact duplicates, or indexes whose key columns form a leading prefix of a wider index on the same table. They often accumulate during rapid development because of inadequate change review, lack of awareness, or uncritical use of missing-index recommendations.

    Unlike unused indexes (which receive no reads), overlapping indexes might still be used by the optimizer, but their existence is redundant because a superset index could handle the same workload.

    These can often be outright deleted or merged into a single, high-performance structure that covers all necessary query patterns.
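
    As an illustration, the sketch below shows what such overlaps can look like. The table dbo.Orders, its columns, and all index names are hypothetical and exist only for this example; they are not objects from our environment.

    ```sql
    -- Hypothetical table used only for illustration.
    CREATE TABLE dbo.Orders (
        OrderId     int           NOT NULL PRIMARY KEY,
        CustomerId  int           NOT NULL,
        OrderDate   date          NOT NULL,
        Status      tinyint       NOT NULL,
        TotalAmount decimal(18,2) NOT NULL
    );

    -- Exact duplicates: same key column, different names.
    -- SQL Server accepts both definitions.
    CREATE INDEX IX_Orders_CustomerId      ON dbo.Orders (CustomerId);
    CREATE INDEX IX_Orders_CustomerId_Copy ON dbo.Orders (CustomerId);

    -- Prefix superset: the key starts with CustomerId, so this index can
    -- serve seeks on CustomerId alone as well as on (CustomerId, OrderDate).
    CREATE INDEX IX_Orders_CustomerId_OrderDate
        ON dbo.Orders (CustomerId, OrderDate)
        INCLUDE (TotalAmount);
    ```

    In a case like this, the third index can typically support the queries served by the first two, which makes those two candidates for removal after review.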

    The Hidden Cost of Overlapping Indexes

    Overlapping indexes might seem harmless because they don’t break your queries, but they act as a “silent tax” on your database. Here are the critical pain points:

    • Increased Write Latency: Every INSERT and DELETE, and every UPDATE that touches an indexed column, forces the SQL engine to maintain each affected index. With three overlapping indexes, you are essentially asking the database to write the same piece of data three times, consuming extra CPU, disk I/O, and memory.
    • Storage Bloat: Indexes occupy physical space on your disk. Overlapping indexes store nearly identical data across multiple B-tree structures. In large-scale systems, this can lead to gigabytes or even terabytes of wasted storage.
    • Maintenance Overhead: Tasks like index rebuilds, reorganizations, and statistics updates take significantly longer. This extends your maintenance windows and consumes CPU resources that could be better used for active workloads.
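
    One rough way to make this tax visible is to compare how often each index is read with how often it is written. The query below is a minimal sketch using sys.dm_db_index_usage_stats; its counters reset when the instance restarts, so the numbers are only meaningful after a representative period of uptime.

    ```sql
    -- Reads vs. writes per index in the current database.
    -- Indexes with no usage row yet are reported with zero counts.
    SELECT
        QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id)) + '.'
            + QUOTENAME(OBJECT_NAME(i.object_id))       AS table_name,
        i.name                                          AS index_name,
        ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0)
            + ISNULL(us.user_lookups, 0)                AS total_reads,
        ISNULL(us.user_updates, 0)                      AS total_writes
    FROM sys.indexes i
    LEFT JOIN sys.dm_db_index_usage_stats us
        ON us.object_id   = i.object_id
       AND us.index_id    = i.index_id
       AND us.database_id = DB_ID()
    WHERE i.index_id > 0
      AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    ORDER BY total_writes DESC, total_reads ASC;
    ```

    Indexes near the top of this list with many writes and few reads are paying the maintenance cost without much benefit, although an overlapping index can still show reads, as noted earlier.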

    The Framework: Identify, Review, Execute & Monitor

    To systematically address the issue, we used a simple four-step framework: identify potential overlaps, manually review candidates, execute consolidation changes in batches, and monitor system stability.

    Identify

    • Leverage system catalog views and DMVs to identify indexes with overlapping key columns. The script below (AI-assisted) flags such indexes for manual review. It relies on STRING_AGG, so it requires SQL Server 2017 or later (or Azure SQL Database).
    • The results are sorted by the combined size of overlapping indexes for each key column definition. The overlap count can also be used as a prioritization factor.
    WITH idx AS (
        SELECT
            i.object_id,
            i.index_id,
            i.name,
            key_columns =
                STRING_AGG(CONVERT(nvarchar(4000), c.name)
                    + CASE WHEN ic.is_descending_key = 1 THEN ' DESC' ELSE ' ASC' END, ', ')
                WITHIN GROUP (ORDER BY ic.key_ordinal)
        FROM sys.indexes i
        JOIN sys.index_columns ic
            ON ic.object_id = i.object_id
           AND ic.index_id = i.index_id
        JOIN sys.columns c
            ON c.object_id = ic.object_id
           AND c.column_id = ic.column_id
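        WHERE i.index_id > 0                  -- skip heaps
          AND i.is_hypothetical = 0           -- skip tuning-advisor placeholder indexes
          AND ic.key_ordinal > 0              -- key columns only (INCLUDE columns are ignored)
          AND OBJECTPROPERTY(i.object_id,'IsUserTable') = 1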
        GROUP BY i.object_id, i.index_id, i.name
    ),
    
    idx_size AS (
        SELECT
            object_id,
            index_id,
            SUM(reserved_page_count) * 8.0 / 1024 AS size_mb
        FROM sys.dm_db_partition_stats
        GROUP BY object_id, index_id
    )
    
    SELECT
        QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id)) + '.'
            + QUOTENAME(OBJECT_NAME(i.object_id)) AS table_name,
    
        i.key_columns AS key_column_definition,
    
        -- Identical key indexes
        STRING_AGG(i.name, '; ') AS indexes_with_identical_keys,
    
        -- Superset indexes
        super.indexes_with_prefix_superset,
    
        COUNT(*) 
          + ISNULL(super.super_count, 0) AS total_overlapping_index_count,
    
        -- Combined size (identical + supersets)
        ROUND(
            SUM(ISNULL(s.size_mb, 0))
            + ISNULL(super.super_size_mb, 0),
            2
        ) AS total_overlapping_index_size_mb
    
    FROM idx i
    
    LEFT JOIN idx_size s
        ON s.object_id = i.object_id
       AND s.index_id = i.index_id
    
    OUTER APPLY (
        SELECT
            STRING_AGG(i2.name, '; ') AS indexes_with_prefix_superset,
            COUNT(*) AS super_count,
            SUM(ISNULL(s2.size_mb, 0)) AS super_size_mb
        FROM idx i2
        LEFT JOIN idx_size s2
            ON s2.object_id = i2.object_id
           AND s2.index_id = i2.index_id
        WHERE i2.object_id = i.object_id
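          -- Prefix match via LEFT() instead of LIKE, so that "_", "%" and "["
          -- in column names are not interpreted as wildcards.
          AND LEFT(i2.key_columns, LEN(i.key_columns) + 2) = i.key_columns + ', '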
    ) super
    
    GROUP BY
        i.object_id,
        i.key_columns,
        super.indexes_with_prefix_superset,
        super.super_count,
        super.super_size_mb
    
    HAVING
        COUNT(*) > 1
        OR ISNULL(super.super_count, 0) > 0
    
    ORDER BY
        total_overlapping_index_size_mb DESC,
        table_name;

    Interpreting the Script Output

    • table_name: The fully qualified table name (schema + table) where the overlapping indexes exist.
    • key_column_definition: The ordered list of index key columns, including their sort direction (ASC/DESC). This represents the base key structure used to detect identical or prefix-overlapping indexes.
    • indexes_with_identical_keys: A semicolon-separated list of index names that share the exact same key column definition.
    • indexes_with_prefix_superset: Indexes whose key columns begin with the same column sequence as the base definition but contain additional trailing columns. These represent potential superset indexes.
    • total_overlapping_index_count: The total number of related indexes detected for that key definition, including both identical indexes and prefix supersets.
    • total_overlapping_index_size_mb: The combined storage size, in megabytes, of all related indexes (identical and superset indexes). This helps prioritize cleanup by highlighting the largest storage and maintenance impact.

    Review

    • Manual review of the output is crucial. The script compares key columns only and ignores INCLUDE columns, filter predicates, and uniqueness, so it can flag false positives; dropping the identified indexes directly is not advisable.
    • Instead, select a key column definition from the results and examine the corresponding identical indexes and prefix supersets for consolidation opportunities.
    • The decision matrix below provides general guidance for evaluating common overlap patterns and determining whether indexes can be dropped, retained, or merged.
    • While the table covers many common scenarios, it is not exhaustive. Cases such as clustered indexes, primary key or unique constraints, filtered indexes, or other specialized index configurations require additional consideration. Nevertheless, the same principle applies: identify redundant structures and consolidate them into a single index that efficiently supports the required query patterns.
    | Index Pattern | Example Index 1 | Example Index 2 | Relationship | Typical Decision | Reason |
    | --- | --- | --- | --- | --- | --- |
    | Exact Duplicate | KEY (A ASC, B ASC) | KEY (A ASC, B ASC) | Identical | Drop one | Both indexes are identical. |
    | Duplicate + INCLUDE | KEY (A ASC, B ASC) | KEY (A ASC, B ASC) INCLUDE (C, D) | Index 2 is superset | Keep Index 2, drop Index 1 | Index 2 already contains everything in Index 1. |
    | Prefix Superset | KEY (A ASC, B ASC) | KEY (A ASC, B ASC, C ASC) | Index 2 extends Index 1 | Drop Index 1 | Index 2 can still support queries using A and B. |
    | Key Order Variation | KEY (A ASC, B ASC) | KEY (B ASC, A ASC) | Different leading column | Keep both | Leading column changes query access path. |
    | INCLUDE Expansion | KEY (A ASC, B ASC) INCLUDE (C) | KEY (A ASC, B ASC) INCLUDE (C, D) | Index 2 is superset | Keep Index 2, drop Index 1 | Index 2 covers the same queries as Index 1. |
    | Partial Overlap | KEY (A ASC, B ASC) | KEY (A ASC, C ASC) | Shared first column | Keep both | Each index supports different query filters. |
    | Merge Candidate | KEY (A ASC, B ASC) INCLUDE (C) | KEY (A ASC, B ASC) INCLUDE (D, E) | Same keys, different INCLUDE columns | Merge Index 1 into Index 2, then drop Index 1 | Combine INCLUDE columns into one index. |
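
    Because the detection script looks at key columns only, the review needs the remaining index properties side by side. The query below is a minimal sketch of how they could be listed for one candidate table; the table name dbo.Orders is a hypothetical placeholder.

    ```sql
    -- Replace with the table picked from the script output (hypothetical name).
    DECLARE @table_name nvarchar(517) = N'dbo.Orders';

    SELECT
        i.name AS index_name,
        i.type_desc,              -- CLUSTERED / NONCLUSTERED
        i.is_unique,
        i.is_primary_key,
        i.is_unique_constraint,
        i.has_filter,
        i.filter_definition,      -- NULL for unfiltered indexes
        included_columns = (
            -- INCLUDE columns are unordered, so sort by name for easy comparison.
            SELECT STRING_AGG(CONVERT(nvarchar(4000), c.name), ', ')
                       WITHIN GROUP (ORDER BY c.name)
            FROM sys.index_columns ic
            JOIN sys.columns c
                ON c.object_id = ic.object_id
               AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
              AND ic.index_id  = i.index_id
              AND ic.is_included_column = 1
        )
    FROM sys.indexes i
    WHERE i.object_id = OBJECT_ID(@table_name)
      AND i.index_id > 0
    ORDER BY i.index_id;
    ```

    Rows that differ in uniqueness, filter definition, or index type are the ones where the decision matrix above should not be applied mechanically.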

    Execute & Monitor

    • Given the large number of index changes, we rolled out the consolidation in batches of two for severe overlaps and batches of five for less severe ones to better manage risk.
    • Changes were applied during non-peak hours to minimize disruption, followed by a smoke test of the affected modules to validate application behavior.
    • Post-rollout, we used Application Insights to monitor SQL dependency performance and ensure that query execution and overall database performance remained stable.
    • Since the primary objective was to optimize database migration windows by reducing index rebuild times, our monitoring focused on system stability rather than quantifying performance gains.
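
    For a merge candidate, a single change in a batch could look roughly like the sketch below. It is an illustration under stated assumptions rather than the exact procedure we ran: the table dbo.Orders and both index names are hypothetical, both indexes are assumed to already exist, and the intermediate DISABLE step is an optional safety net.

    ```sql
    -- Step 1: rebuild the index being kept so that it carries the INCLUDE
    -- columns of both indexes. DROP_EXISTING = ON replaces the existing
    -- definition in place and fails if the index does not exist yet.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
        ON dbo.Orders (CustomerId, OrderDate)
        INCLUDE (TotalAmount, Status)
        WITH (DROP_EXISTING = ON);

    -- Step 2: disable the redundant index instead of dropping it right away.
    -- The definition stays in metadata, so it can be restored with a rebuild.
    ALTER INDEX IX_Orders_CustomerId ON dbo.Orders DISABLE;

    -- Rollback, if monitoring shows a regression:
    -- ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD;

    -- Step 3: once the workload has been stable for long enough, drop it.
    DROP INDEX IX_Orders_CustomerId ON dbo.Orders;
    ```

    Disabling first keeps the rollback to a single statement, at the cost of a full rebuild if the index has to be brought back.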

    Result

    • Index maintenance windows reduced by up to 30%.
      Consolidating overlapping indexes significantly shortened our periodic index maintenance cycles.
    • Database migration windows improved.
      Faster index rebuilds helped reduce overall migration time, which was the primary objective of the effort.
    • Storage reclaimed across database instances.
      Several gigabytes of storage were logically recovered by removing redundant indexes.
    • Potential improvement in write performance and resource utilization.
      Although we did not formally measure changes in write throughput or overall resource consumption, reducing redundant indexes should positively impact both write performance and database resource utilization.
    • Simpler index management.
      With a cleaner and consolidated index landscape, evaluating future index additions or updates became much easier.

    Preventing Recurrence

    • Stronger index governance going forward.
      We introduced internal technical guidelines to prevent similar overlaps and strengthened the review process for index changes.
    • Proactive monitoring for new overlaps.
      We implemented weekly insights, leveraging the earlier script, to detect newly introduced overlapping indexes early and address them promptly.
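
    One possible shape for such a recurring check is sketched below, assuming a scheduled job runs it weekly. It snapshots the key definition of every index and reports definitions that were not present in the previous snapshot, so that only new or changed indexes need to be cross-checked against the overlap script. The snapshot table dbo.IndexKeySnapshot is a hypothetical name introduced for this example.

    ```sql
    -- Hypothetical snapshot table; created on the first run only.
    IF OBJECT_ID(N'dbo.IndexKeySnapshot', N'U') IS NULL
        CREATE TABLE dbo.IndexKeySnapshot (
            captured_at datetime2(0)   NOT NULL,
            object_id   int            NOT NULL,
            index_id    int            NOT NULL,
            index_name  sysname        NOT NULL,
            key_columns nvarchar(4000) NOT NULL
        );

    DECLARE @now datetime2(0) = SYSUTCDATETIME();

    -- Capture the current key definition of every index on user tables.
    INSERT INTO dbo.IndexKeySnapshot
        (captured_at, object_id, index_id, index_name, key_columns)
    SELECT
        @now, i.object_id, i.index_id, i.name,
        STRING_AGG(CONVERT(nvarchar(4000), c.name)
            + CASE WHEN ic.is_descending_key = 1 THEN ' DESC' ELSE ' ASC' END, ', ')
            WITHIN GROUP (ORDER BY ic.key_ordinal)
    FROM sys.indexes i
    JOIN sys.index_columns ic
        ON ic.object_id = i.object_id
       AND ic.index_id  = i.index_id
    JOIN sys.columns c
        ON c.object_id = ic.object_id
       AND c.column_id = ic.column_id
    WHERE i.index_id > 0
      AND ic.key_ordinal > 0
      AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    GROUP BY i.object_id, i.index_id, i.name;

    -- Index definitions in this snapshot that were absent from the previous one.
    -- On the very first run there is no previous snapshot, so every index is listed.
    SELECT
        OBJECT_SCHEMA_NAME(cur.object_id) AS schema_name,
        OBJECT_NAME(cur.object_id)        AS table_name,
        cur.index_name,
        cur.key_columns
    FROM dbo.IndexKeySnapshot cur
    WHERE cur.captured_at = @now
      AND NOT EXISTS (
            SELECT 1
            FROM dbo.IndexKeySnapshot prev
            WHERE prev.captured_at = (SELECT MAX(p.captured_at)
                                      FROM dbo.IndexKeySnapshot p
                                      WHERE p.captured_at < @now)
              AND prev.object_id   = cur.object_id
              AND prev.index_name  = cur.index_name
              AND prev.key_columns = cur.key_columns
          );
    ```

    Keeping the snapshots also leaves a simple history of when an index definition first appeared, which helps trace an overlap back to the change that introduced it.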

    Summary

    Overlapping indexes introduce unnecessary overhead, impacting both maintenance and performance. This type of technical debt can accumulate gradually over time.

    If your indexes have not been reviewed recently, an audit is likely to be worthwhile. Regular review and consolidation reduce maintenance overhead and keep the index landscape manageable.