Find the Longest Repeated Substring with T-SQL

Find the longest repeated substring with T-SQL

Step 1: Create a number table based on the stirng length but it would be better to use an auxiliary numbers table;
Step 2: Cross join Numbers table to create positions and steps for all substring combinations;
Step 3: Calculate the occurrence of substrings in various lengths, the length for each substring and a sequence within a substring based on the step. The WHERE condition is to eliminate double count for any substrings after the last full step increaments;
Step 4: Get all substrings with at least two occurrrences;
Step 5: Remove repeating substring with overlapping and get the ranking order for the longest substrings;
Step 6: Retrieve the longest repeating substring(s) with two options. The option1 is to get longest string with tie breaker (first occurrence) by using rnLen=1 and the second option is to get longest strings without tie breaker by using rnkLen=1;

In order to change the default recursion limit from 100 to no limit for CTEs, the query hints MAXRECURSION is set to 0.

Declare @MyString varchar(max)
–Plain DNA Sequence Sample


— Set @MyString = ‘3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233’

;WITH Numbers AS
SELECT i + 1 AS i FROM Numbers WHERE i = Step

,mycte3 as
SELECT Step,StartPos, SubSeq, maxLen, rnSubSeq
FROM mycte2
WHERE cntSubSeq>1

,mycte4 as
SELECT a.SubSeq, a.StartPos, a.maxLen
,RANK() OVER (order by a.maxLen DESC) AS rnkLen
,Row_Number() OVER (order by a.maxLen DESC,a.Step) AS rnLen
FROM mycte3 a
LEFT JOIN mycte3 b ON a.SubSeq=b.SubSeq and (a.rnSubSeq=b.rnSubSeq-1 OR b.rnSubSeq-1 = 0)
WHERE (a.StartPos+a.Step-1) < b.StartPos


SELECT SubSeq as [LognestRepeatingSubstring] FROM mycte4
WHERE rnkLen=1
Order by maxLen DESC, StartPos

OPTION (maxrecursion 0)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s