What are translation tables in Power Query’s Fuzzy Match and why would we need them?

In Power Query, we have the option to use Fuzzy Match when merging tables. A little known feature are the Translation Tables which help in that process.

Merging tables is relatively straightforward in Power Query – choose the left table, choose the right table, mark the columns for the merge to be made onto, choose the join type and you’re set. Sometimes, however, we may have a case where the merge has to be made on columns which do not match perfectly. This is where Fuzzy Match comes in place.

Fuzzy Match is a setting that allows Power Query to do the join, based on a degree of similarity between the columns from the two tables. The degree of similarity is set up by the developer by entering a decimal between 0 and 1 where 1 is a perfect match (i.e. not Fuzzy by itself, but you can still choose to ignore Small/ Capital letters from the checkbox) and 0 allows, supposedly, match between completely different items. I have tested the option with 0 and it doesn’t work as described, as of the moment of writing. The default value is 0.8, that is, if you don’t enter a value yourself, this is the similarity threshold level that will be used.

Where does the translation table come in place?

In the screenshot below, you can see right at the bottom, an option to choose a translation table – that is a table, to be used as a sort of a dictionary for values, that (1) you want to be matched in the join and (2) you expect that won’t be matched automatically, even with Fuzzy Match.

Merging two tables, with Fuzzy Matching options expanded

Let’s say we want to match countries and languages. In most cases, the language and the toponym have most letters in common (France/French, Germany/German etc). However, there may be some exceptions, such as United Kingdom and English and USA and English. This is where a Translation table comes in play.

The Translation table is a simple, two column table, with two columns – From and To:

It serves as a lookup table, which Fuzzy Match will use and will try to apply to non-matched values. Let’s compare the results of the merge with and without the Translation table:

Without translation table used

With Translation table

We can see that using the Translation table we get complete and sensible match of the values we have. This technique can be useful in other situations – imagine categorizing payments as Intracompany and Intercomapany. Intracompany would be those between the parent company and its subsidiaries. It is likely that there will be a partial match in the names in some cases (ACME US, ACME Netherlands), and no match in other cases (ACME US, GoodGoods PLC). Using a translation table would help in the merge, in order to get sensible results.

Finally, keep in mind that the function is called Fuzzy Match – there could be a risk of mismatch, either a false positive or false negative. Using the similarity threshold and a translation table, you can get the results that you are looking for.

Thank you for reading.

Vitali

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s