Merging two Data Frames using Fuzzy/Approximate String Matching in R

Solution 1:

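The fuzzyjoin package (dgrtwo/fuzzyjoin on GitHub) is highly recommended for this. Its stringdist_inner_join function joins two data frames on approximately equal strings: stringdist_inner_join(a, b, by = "Fund.Name")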
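As a rough sketch of how that call behaves, the example below builds two small data frames and joins them on names that differ by a typo. The fund names and the Return and Assets columns are invented for illustration, the differently capitalised key columns (Fund.Name in a, Fund.name in b) are assumed from the code in Solution 2, and max_dist = 2 is just an illustrative threshold. The join is wrapped in a requireNamespace check so the script still runs where fuzzyjoin is not installed.

```r
# Hypothetical toy data; the real a and b come from the question
a <- data.frame(Fund.Name = c("Alpha Growth Fund", "Beta Income Fund"),
                Return = c(0.05, 0.03), stringsAsFactors = FALSE)
b <- data.frame(Fund.name = c("Alpha Growht Fund", "Beta Incme Fund", "Gamma Bond Fund"),
                Assets = c(100, 200, 300), stringsAsFactors = FALSE)

if (requireNamespace("fuzzyjoin", quietly = TRUE)) {
  # A named `by` vector maps the key column in a to the key column in b;
  # max_dist caps the string distance allowed for a pair to count as a match
  joined <- fuzzyjoin::stringdist_inner_join(
    a, b,
    by = c("Fund.Name" = "Fund.name"),
    max_dist = 2
  )
  print(joined)
}
```

Only the pairs within the distance threshold are kept, so the unrelated third row of b drops out of the result.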

Solution 2:

One quick suggestion: try to do some matching on the different fields separately before using merge. The simplest approach is with the pmatch function, although R has no shortage of text matching functions (e.g. agrep). Here's a simple example:

pmatch(c("med", "mod"), c("mean", "median", "mode"))
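To make the return values concrete, here is a small sketch built on the same vectors. The extra inputs ("me", "xyz", "medin") are made up for illustration, and max.distance = 1 is an arbitrary choice for the agrep call.

```r
choices <- c("mean", "median", "mode")

# pmatch() returns, for each input, the index of its unique partial match
idx <- pmatch(c("med", "mod"), choices)
print(idx)           # 2 3
print(choices[idx])  # "median" "mode"

# An ambiguous prefix ("me" fits both "mean" and "median") or a string
# with no match at all yields NA
print(pmatch(c("me", "xyz"), choices))  # NA NA

# agrep() does approximate matching, so it can tolerate a dropped letter
hits <- agrep("medin", choices, max.distance = 1)
print(choices[hits])
```

The NA entries are what make !is.na(pmatch(...)) a convenient test for whether a name found a partner.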

For your dataset, this matches all the fund names out of a:

> nrow(merge(a,b,x.by="Fund.Name", y.by="Fund.name"))
[1] 58
> length(which(!is.na(pmatch(a$Fund.Name, b$Fund.name))))
[1] 238

Once you have the matches, you can merge the two data frames on the matched row indices rather than on the raw names.
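One way that last step could look is sketched below. The data frames are invented stand-ins for a and b, and the helper column b_row is a hypothetical name introduced only to carry the match index into merge.

```r
# Hypothetical toy data standing in for the question's a and b
a <- data.frame(Fund.Name = c("Alpha Growth", "Beta Income", "Delta Value"),
                Return = c(0.05, 0.03, 0.04), stringsAsFactors = FALSE)
b <- data.frame(Fund.name = c("Beta Income Fund LLC", "Alpha Growth Fund", "Gamma Bond Fund"),
                Assets = c(200, 100, 300), stringsAsFactors = FALSE)

# For each name in a, the row of b it uniquely partial-matches (NA if none)
a$b_row <- pmatch(a$Fund.Name, b$Fund.name)
# Give b an explicit row index to join against
b$b_row <- seq_len(nrow(b))

# Inner merge on the index; rows of a without a match (NA) drop out
merged <- merge(a, b, by = "b_row")
merged$b_row <- NULL  # remove the helper column
print(merged)
```

Because pmatch only matches prefixes, this works when the names in a are leading fragments of the names in b; other kinds of differences need an approximate matcher such as agrep.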
