A B C D 1 Hello 1 1 1 2 2 3 1 3 3 4 1 4 5 6 1 1 5 7 9 0 6 3 7 7 0 9 1 0
The data columns are A:A and B:B. C:C and D:D show the results after running the VBA code below:
1 means: fully unique, there is no repeating value (or text) at all, neither
in the same nor in the column compared with.
0 means: there are duplicates in the same column, but not in the column compared
with.
"" means: there are single or multiple duplicates between A:A and B:B
Example: "9" in B:B is a duplicate in B:B, but does not occur in A:A. It gets a "0".
C1 refers to A1, D6 to B6 etc. (which means: by offset of 2 columns).
Why VBA? Because it needs only 1 second for 11,000 entries in each column, or 4 seconds for 60,000 entries. Excel formulas need minutes up to hours for it.
Sub UniqueItemsInColumns()
'for both unsorted text and numerics, even mixed data
'2008/01/27, Excel 2000, A.Wolff
'do not change the code, simply let it run first in
'a blank(!) workbook (normal module, "Insert Module")
'later you may change or use what u need
'Performance: 11000 data/column, 3.0 GHz P4: < 1 sec
'Contents:
'---------
'1. Preselections - please adjust
'2. Data
'3. Create an extra range for sorts and calculations
'4. Main Task: tick the items
'5. Result and clean-up
'---------------------------------------------------------------
'1. Preselections - please adjust
A = Timer 'delete this line later, measuring program's time
Z = 11000 'Count of data (for each column)
'2. Data
' random build of data (if none available) in A:B
Range("A1:B" & Z).Formula = "=TRUNC(RAND()*" & Z * 4 & ")"
Range("A1:B" & Z) = Range("A1:B" & Z).Value
' (for your purpose: copy your data to A:B instead!)
Z = Evaluate("=COUNTA(A:A)") 'overrides Z if mistakenly kept!
'3. Create an extra range for sorts and calculations
' copy A:B to H:H and L:L
Range("H1:H" & Z) = Range("A1:A" & Z).Value
Range("L1:L" & Z) = Range("B1:B" & Z).Value
' preserve sort order by assigning running numbers
Range("G1:G" & Z).Formula = "=ROW()"
Range("G1:G" & Z) = Range("G1:G" & Z).Value
Range("K1:K" & Z) = Range("G1:G" & Z).Value
' sort the 2 ranges
Range("G1").Sort Key1:=Range("H1")
Range("K1").Sort Key1:=Range("L1")
'4. Main Task: tick the items
Dim Hrr, Lrr, Irr, Mrr 'use array variants for speed
Hrr = Range("H1:H" & Z + 1): Lrr = Range("L1:L" & Z + 1)
Irr = Range("I1:I" & Z + 1): Mrr = Range("M1:M" & Z + 1)
' tick "1" for all "inter"-uniques
h = 1: l = 1
Do
Select Case Hrr(h, 1)
Case Is = Lrr(l, 1)
Do: h = h + 1: Loop While Hrr(h, 1) = Hrr(h - 1, 1)
Do: l = l + 1: Loop While Lrr(l, 1) = Lrr(l - 1, 1)
Case Is < Lrr(l, 1)
Irr(h, 1) = 1: h = h + 1
Case Is > Lrr(l, 1)
Mrr(l, 1) = 1: l = l + 1
End Select
Loop Until h > Z Or l > Z
If h <= Z Then For i = h To Z: Irr(i, 1) = 1: Next
If l <= Z Then For i = l To Z: Mrr(i, 1) = 1: Next
' tick "0" for all "intra"-duplicates, but
' tick "empty" for all of them which were "empty" before
' ("inter"-duplicates)
For i = 1 To Z
If Hrr(i + 1, 1) = Hrr(i, 1) Then Irr(i, 1) = _
IIf(IsEmpty(Irr(i, 1)), "", 0): Irr(i + 1, 1) = Irr(i, 1)
If Lrr(i + 1, 1) = Lrr(i, 1) Then Mrr(i, 1) = _
IIf(IsEmpty(Mrr(i, 1)), "", 0): Mrr(i + 1, 1) = Mrr(i, 1)
Next
Range("I1:I" & Z) = Irr
Range("M1:M" & Z) = Mrr
'Stop 'here if you like to see the sorted stuff
' sort back to previous order
Range("G1").Sort Key1:=Range("G1")
Range("K1").Sort Key1:=Range("K1")
'5. Result and clean-up
' assign ticks for A:A in C:C, and for B:B in D:D
Range("C1:C" & Z) = Range("I1:I" & Z).Value
Range("D1:D" & Z) = Range("M1:M" & Z).Value
' delete calculation range G:M
Range("G1:M" & Z + 1).ClearContents
' some statistics
Range("D" & Z + 2 & ":E" & Z + 4).FormulaArray = _
"={1,""fully unique"";" & _
"0,""unique, but duplicates in own column"";" & _
""""",""duplicates in and between columns""}"
Range("A" & Z + 2 & ":B" & Z + 4).FormulaR1C1 = _
"=COUNTIF(R1C[2]:R" & Z & "C[2],RC4)"
Range("C" & Z + 2 & ":C" & Z + 4).FormulaR1C1 = _
"=SUM(RC[-2]:RC[-1])"
Range("A" & Z + 5 & ":C" & Z + 5).FormulaR1C1 = _
"=SUM(R[-3]C:R[-1]C)"
Range("A" & Z + 2 & ":G" & Z + 5).Select
MsgBox Timer - A & " seconds" 'delete this line later
End Sub