Transform

Created by Dariusz Szczendzina, Modified on Wed, 12 Jul, 2023 at 11:08 AM by Dariusz Szczendzina

Transform allows you to transform data (Abbreviate, Elaborate, Exclude, Normalize, Transliterate) in 5 spoken languages and from 12 entity categories.


Addresses

Addressing elements are used on address fields, usually Address Line 1 and Address Line 2.

This category understands most common elements of address data. For example 'Gnds' is an abbreviation for 'Gardens', 'Rd' is 'Road', 'Av' is 'Avenue' etc. E.g. using this function 15 Hound Terrace will match 15 Hound Street. This would not match if you use Normalise however 15 Hound St. will match 15 Hound Street.

Again this is potentially dangerous if selected as a match field on its own, but when this address field is accompanied by a match defined on Postcode as well it becomes a lot more accurate. See the example below to better understand how this record is definitely a match, however some addressing elements have been incorrectly captured during data input.

PostcodePO16 8UTPO16 8UT
Address Line 215 Hound Terrace15 Hound Street
Record StructureMaster RecordDuplicate Record
Contact NameMr Robert DicksonBob Dixon
Job TitleMarketing ManagerMkt Mgr
Company NameFictitious LtdFictitious Plc
Address Line 1The New Stables
Address Line 3
TownFarehamFareham
CountyHampshireHants
CountryUnited KingdomUK

Businesses

Business Name elements are used on company name fields.

This category understands the most common elements of business name data. For example 'Ltd' is an abbreviation for 'Limited', 'Plc' is 'Public Limited Company', 'Grp' is 'Group' etc.

E.g.

PostcodePO16 8UTPO16 8UT
Company NameFictitious LtdFictitious Plc
Record StructureMaster RecordDuplicate Record
Contact NameMr Robert DicksonBob Dixon
Job TitleMarketing ManagerMkt Mgr
Address Line 1The New Stables
Address Line 215 Hound Terrace15 Hound Street
Address Line 3
TownFarehamFareham
CountyHampshireHants
CountryUnited KingdomUK

Business Job Titles

Job Title elements are used on Name fields and Job Title fields.

This category understands most common elements of Job Title data. For example 'Mgr' is an abbreviation for 'Manager', 'Mkt' is 'Marketing', 'Col' is 'Colonel' etc.

E.g. using this function 'Marketing Manager' will match 'Mkt Mgr'.

Job TitleMarketing ManagerMkt Mgr
Record StructureMaster RecordDuplicate Record
Contact NameMr Robert DicksonBob Dixon
Company NameFictitious LtdFictitious Plc
Address Line 1The New Stables
Address Line 215 Hound Terrace15 Hound Street
Address Line 3
TownFarehamFareham
CountyHampshireHants
PostcodePO16 8UTPO16 8UT
CountryUnited KingdomUK

Countries

The Countries category is used on an address field that contains the information about which country that record relates to.

This category understands most common styles of Country data. For example 'UK' is an abbreviation for 'United Kingdom', 'USA' is 'United States of America', 'De' is 'Germany' etc.

E.g. using this function 'United Kingdom' will match 'UK'.

E.g.

CountryUnited KingdomUK
Record StructureMaster RecordDuplicate Record
Contact NameMr Robert DicksonBob Dixon
Job TitleMarketing ManagerMkt Mgr
Company NameFictitious LtdFictitious Plc
Address Line 1The New Stables
Address Line 215 Hound Terrace15 Hound Street
Address Line 3
TownFarehamFareham
CountyHampshireHants
PostcodePO16 8UTPO16 8UT

Dates

The Dates Category would only be used when your data set contains date information that you wish to match on.

This category understands most common styles of that a date can be written in. For example 'Jan' is an abbreviation for 'January', 'Feb' is 'February', 'Mon' is 'Monday' etc.


Given Names/Family Names

First name elements are used to standardise name fields.

This category understands most common elements of forename data. For example ‘Bill’ can be an abbreviation for 'William', ‘Bob’ can be an abbreviation for 'Robert' etc.

E.g. using this function 'Robert Dickson' will match 'Bob Dixon'.

This can be advantageous when you have Robert Dickson in a single field and you only wish to match on surname.

E.g. exclude initials and exclude forename and R Dixon will match Bob Dixon.

This has a potential danger if the both elements of the name are in the same field and the name is made up of two words that can be interpreted as forenames, such as George Michael or Elton John would have both elements excluded.

Record StructureMaster RecordDuplicate Record
Contact NameMr Robert DicksonBob Dixon
Job TitleMarketing ManagerMkt Mgr
Company NameFictitious LtdFictitious Plc
Address Line 1The New Stables
Address Line 215 Hound Terrace15 Hound Street
Address Line 3
TownFarehamFareham
CountyHampshireHants
PostcodePO16 8UTPO16 8UT
CountryUnited KingdomUK

Miscellaneous

The Miscellaneous Category is very rarely used.

This category understands some obscure transformations that may be required in the matching process. For example ‘pm’ is an abbreviation for 'Post Meridian', ‘am’ is 'Ante Meridian', ‘&’ is 'and' etc.

E.g. using this function ’Tate & Lyle’ will match ’Tate and Lyle’


Numbers

The Numbering Category should be used when you have different number formats within your data.

This category understands most common styles of that a number can be written in. For example '10' could also be 'Ten', '1000' could also be 'One Thousand', '1st' could also be 'First', '2nd' could be 'Second' etc.

E.g. using this function ‘1 to 1’ will match ‘One to One’ in a company name field.


Qualifications

Qualification elements are usually used on Name fields.

This category understands most common elements of qualifications that can be added to a person’s name. For example qualifications include 'Bsc' as an abbreviation for 'Bachelor of Science', 'Phd' is 'Doctor of Philosophy', 'MSc' is 'Master of Science' etc.

E.g. using this function 'Mr Robert Dickson BSc' will match 'Bob Dixon' during a phonetic match.

Record StructureMaster RecordDuplicate Record
Contact NameMr Robert DicksonBob Dixon
Job TitleMarketing ManagerMkt Mgr
Company NameFictitious LtdFictitious Plc
Address Line 1The New Stables
Address Line 215 Hound Terrace15 Hound Street
Address Line 3
TownFarehamFareham
CountyHampshireHants
PostcodePO16 8UTPO16 8UT
CountryUnited KingdomUK

Salacious

This is a rarely used function that can be used to exclude inappropriate language.


Salutations

Salutation elements are commonly used on name fields.

This category understands most common elements of name data.

E.g. using this function 'Mr Robert Dickson' will match 'Bob Dixon' during a Phonetic match as the 'Mr' is excluded.

Record StructureMaster RecordDuplicate Record
Contact NameMr Robert DicksonBob Dixon
Job TitleMarketing ManagerMkt Mgr
Company NameFictitious LtdFictitious Plc
Address Line 1The New Stables
Address Line 215 Hound Terrace15 Hound Street
Address Line 3
TownFarehamFareham
CountyHampshireHants
PostcodePO16 8UTPO16 8UT
CountryUnited KingdomUK

Weights and Measures

The Weights/Measures Category would only be used when your data set contains this type of information and you wish to match on this field.

This category understands most common styles of that a weight or measure can be written in. For example 'Oz' is an abbreviation for 'Ounce', 'Kg' is 'Kilogram' etc.

E.g. using this function ’12 Oz’ will match ’12 Ounces’


Transform Operations

Abbreviate

Abbreviate, when selected, allows you to transform the structure of your data to ensure a consistent format.

For example you can choose to Abbreviate business elements in a company name field. This will reduce 'Limited' to 'Ltd', 'Group' to 'Grp', 'Incorporated' to 'Inc' etc. and write the results directly back to the field you select in your data set.

CategoryExample
Addressing'Road' to 'Rd', 'Avenue' to 'Ave'
Business'Limited' to 'Ltd', 'Company' to 'Co'
Countries'United Kingdom' to 'UK', 'New Zealand' to 'NZ'
DateEvents'January' to 'Jan', 'Monday' to 'Mon'
JobTitles'Manager' to 'Mgr', 'Colonel' to 'Col'
Numbers'Twenty' to '20', 'Nine' to '9'
Qualifications'Bachelor of Science' to 'BSc', Doctor of Philosophy to ‘Phd’
Salutations'Doctor to Dr', 'Mister' to 'Mr'
WeightsMeasures'Ounces' to 'Oz'
Miscellaneous'Object' to 'Obj'
Forenames'Robert' to 'Bob', 'Anthony' to 'Tony'

Elaborate

Elaborate, when selected, allows you to transform the structure of your data to ensure a consistent format.

For example you can choose to elaborate business elements. This will expand 'Ltd' to 'Limited', 'Grp' to 'Group', 'Inc.' to 'Incorporated' etc. and write the results directly back to the attribute selected in your data set.

As with all rules there are times when you would not wish to use this rule. For example in a Forename field it would be very dangerous to elaborate your data. Pete would go to Peter, Bob to Robert etc. but you cannot define what Sam would go to e.g. Samuel or Samantha so Sam would be left the same. However if there were a Rob in your database that is short for Robin this would also be transformed to Robert.

CategoryExample
Addressing'Rd' to 'Road', 'Ave' to 'Avenue'
Business'Ltd' to 'Limited', 'Co' to 'Company'
Countries'UK' to 'United Kingdom', 'NZ' to 'New Zealand'
DateEvents'Jan' to 'January', 'Mon' to 'Monday'
JobTitles'Mgr' to 'Manager', 'Col' to 'Colonel'
Numbers'9' to 'Nine', '20' to 'Twenty'
Qualifications‘Bsc’ to Bachelor of Science, ‘Phd’ to Doctor of Philosophy
Salutations'Dr' to 'Doctor', 'Mr' to 'Mister'
WeightsMeasures'Oz' to 'Ounces'
Miscellaneous'Obj' to 'Object'
Forenames'Bob' to 'Robert', 'Tony' to 'Anthony'

Exclude

Exclude, when selected, allows you to transform the structure of your data to ensure a consistent format.

For example, you can choose to exclude business elements in a company name field. This will remove 'Ltd', 'Limited', 'Grp', 'Group', 'Inc', 'Incorporated' etc.

CategoryExample
AddressingExclude text such as 'Road“ and “Rd'
BusinessExclude text such as 'Ltd' and 'Limited'
CountriesExclude text such as 'UK' and 'USA'
DateEventsExclude text such as 'Mon' and 'January'
JobTitlesExclude text such as 'Mgr' and 'Manager'
NumbersExclude text such as '100' and 'Hundred'
QualificationsExclude text such as 'BA' and 'BSc'
SalutationsExclude text such as 'Mr' and 'Dr'
WeightsMeasuresExclude text such as 'Oz' and 'Ounces'
MiscellaneousExclude text such as 'Obj' and 'Object'
ForenamesExclude text such as 'Andi' and 'Robert'

Normalize

Normalize, when selected, allows you to transform the structure of your data to ensure a consistent format.

For example you can choose to elaborate business elements in a company name field. This will expand 'Ltd' to 'Limited', 'Grp' to 'Group', 'Inc' to 'Incorporated' etc. and write the results directly back to the field you select in your data set.

As with all rules there are times when you would not wish to use this rule. For example in a Forename field it would be very dangerous to elaborate your data. Pete would go to Peter, Bob to Robert etc. but you cannot define what Sam would go to e.g. Samuel or Samantha so Sam would be left the same. However if there were a Rob in your database that is short for Robin this would also be transformed to Robert.

CategoryExample
Addressing'Garden', 'Garden', 'Gdns' to 'GDN'
Business'Company', 'Comp' to 'CO'
Countries'United Kingdom', 'Great Britain', 'GBR' to 'GB''
DateEvents'January' to 'Jan', 'Monday' to 'Mon'
JobTitles'Engineer', 'Engr' to 'ENG'
Numbers'Nought', 'Null', 'Nil' to '0'
Qualifications'Dr of Philosophy', 'DPhil' to 'PhD'
Salutations'Mrs', 'Ms', 'Madam' to 'MRS'
WeightsMeasures'Inches', 'Inch', 'Ins' to 'IN'
Miscellaneous'Cheque', 'Check' to 'Chq'
Forenames'Andrew', 'Andrea', 'Andres' to 'Andi'

Transliterate

Transliterate, when selected, allows you to transform the structure of your data to ensure a consistent format.

For example, you can choose to transliterate in a name field to remove characters from foreign alphabets e.g 'ß' will change to 'ss'.

Custom Functions

DQ.TRANSFORM

The DQ.TRANSFORM function transforms an input based on type and operation.

Parameters:

input (string), text to be transformed

transformType (string), type of data to be transformed, from list:

  • Addresses
  • Numbers
  • Given_Names
  • Businesses
  • Business_Job_Titles
  • Dates
  • Miscellaneous
  • Weights_and_Measures
  • Qualifications
  • Salutations
  • Countries
  • Family_Names
  • Salacious

operation (string), transformation method, from list:

  • Elaborate
  • Abbreviate
  • Normalize
  • Exclude
  • Transliterate

language (string), language of input, from list:

  • English
  • Spanish
  • French
  • Italian
  • German

returns (string), transformed value

Was this article helpful?

That’s Great!

Thank you for your feedback

Sorry! We couldn't be helpful

Thank you for your feedback

Let us know how can we improve this article!

Select at least one of the reasons

Feedback sent

We appreciate your effort and will try to fix the article