Data Cleaning

Data cleaning, often referred to as data scrubbing, ensures that your data is accurate, consistent, and correctly formatted. It uses specialized tools to correct or delete obsolete, redundant, corrupt, poorly formatted, or inconsistent data, rather than relying on personnel to pore over datasheets and make corrections by hand. In other words, data cleaning gets your data ready to go to work for you in a timely fashion.

Why Do You Need Professional Data Cleaning Services?

Data can be text, numbers, or a list of items written on paper; it can be bits and bytes inside your computer’s memory; it could even be ideas stored in a person’s mind. Since the invention of computers, people have used the term data to refer to computer information. Data has become one of the most valuable assets a business can have. It is the market intelligence that both large and small businesses gather about their customers and the markets in which they operate. Whether it is used for contact management, understanding your customer base, or market predictions, data can make or break a company.

The fact that data tends to change over time should come as no surprise. People’s ages and addresses change, and phone numbers are often updated. With all this happening, your data will rapidly become outdated and useless if you aren’t able to effectively clean it. While efficiently cleaned data is of tremendous value to your business, unclean data will inevitably lead to many complications and frustrating problems.

Some causes of database errors

Human error made during data entry.

Merging disparate databases.

A lack of either industry-wide or company-specific data standards.

Older systems that hold on to obsolete data or data structures.

Resulting in these problems for your business

Businesses lose up to 20% of their revenue because of bad data quality.

Employees waste up to 50% of their production time dealing with routine data quality tasks.

In any given hour of the day, almost five dozen companies will change their addresses, nearly a dozen will change their name, and over forty new businesses will open.

Merge Tables

Often, the first step is to merge disparate data from several sources to centralize it into one single cohesive database, which has its own data file structure. These disparate sources may be files from different departments in your company – with dissimilar formats and data structures.

Sometimes a client will want to bring in legacy (old) data, either by importing an outside file or by pulling one from their previous, now-obsolete internal system. Typically, this is done to move legacy data into an updated database management system (DBMS) or an alternate platform.

We can also help you convert data from multiple formats into one common format, usable for migration, reporting or analysis.
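As a rough illustration of merging sources with dissimilar structures into one common layout, here is a minimal Python sketch. The source labels, column names, and the FIELD_MAPS table are hypothetical and chosen only for this example; a real merge would also have to reconcile data types, encodings, and conflicting values.

```python
import csv
import io

# Hypothetical mapping from each source's column names to one common layout.
FIELD_MAPS = {
    "sales": {"Cust Name": "name", "Zip": "zip"},
    "support": {"customer": "name", "postal_code": "zip"},
}


def merge_sources(sources):
    """Read each CSV text and return all rows in the common layout."""
    merged = []
    for label, text in sources.items():
        mapping = FIELD_MAPS[label]
        for row in csv.DictReader(io.StringIO(text)):
            # Rename each source column to its common name and trim whitespace.
            record = {common: row[original].strip()
                      for original, common in mapping.items()}
            record["source"] = label  # keep provenance for later review
            merged.append(record)
    return merged


# Two small in-memory files standing in for departmental exports.
sales_csv = "Cust Name,Zip\nAcme Corp,31082\n"
support_csv = "customer,postal_code\nBaker LLC,30301\n"
rows = merge_sources({"sales": sales_csv, "support": support_csv})
```

Keeping a source column on every merged row makes it easier to trace a questionable record back to the file it came from.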

Realign Scattered Data

Data alignment picks up misplaced data that is scattered among two or more columns but belongs in one, and moves the “strays” back to their correct field.

Data alignment (normalization) puts your data elements in their correct fields (columns) so that, for example, all companies are in the company column, all last names are in the last name column, and so on. Remember: sorting, querying, coding and reporting can only be accurate if your data is consistently found where it should be located.
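To make the idea of moving a “stray” concrete, the following Python sketch moves a company name that was keyed into the last name column back to the company column. The field names and the COMPANY_MARKERS list are assumptions made for this example, and the rule is a simple heuristic, not the alignment process described above.

```python
# Hypothetical markers suggesting a value is a company name, not a surname.
COMPANY_MARKERS = {"inc", "llc", "corp", "co", "ltd"}


def looks_like_company(value):
    """Return True if the last word of the value is a company marker."""
    words = value.lower().replace(".", "").replace(",", "").split()
    return bool(words) and words[-1] in COMPANY_MARKERS


def realign(record):
    """Return a copy with a stray company name moved out of last_name."""
    fixed = dict(record)
    company_empty = not fixed.get("company")
    if company_empty and looks_like_company(fixed.get("last_name", "")):
        fixed["company"] = fixed["last_name"]
        fixed["last_name"] = ""
    return fixed


stray = {"last_name": "Acme Widgets, Inc.", "company": ""}
aligned = realign(stray)
```

A heuristic like this will misfire on unusual surnames, so flagged records are best reviewed before the change is committed.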

Once your data has been aligned, we will standardize the way it’s written to your file, based on industry standards and your specific business rules.

Append New Data

  • We can match records in your database file to a variety of national data sources and add data to your file.
  • Email Append – Once you have determined which email addresses are good and which aren’t, we submit the records with an invalid or missing email address for matching. Based on a match of name and mailing address, Email Append adds a new address where one can be found. Each newly appended address is sent an email asking whether the recipient would like to opt out. A flag marks the records that received a new address.
  • Phone Number Append – Adds business or residential phones to your list where they are either missing or invalid.
  • Mailing Lists – We carry Business, Residential, Walk-Sequence and a variety of specialty mailing lists. Our list department purchases from only those compilers that meet our strictest data accuracy requirements.

Correct Errors In Data

We check for errors, anomalies and inconsistent entries and can delete irrelevant or erroneous data as well as add correct information to your data where possible.

  • USPS CASS Address Cleaning – CASS (Coding Accuracy Support System) processing evaluates, cleans and standardizes your addresses to meet USPS address standards. Addresses that fully pass processing come back with a +4 added to their 5-digit ZIP Codes. The USPS recognizes these addresses as deliverable. Canadian CASS processing is also available.
  • USPS NCOA Processing – Knowing that your addresses are clean and deliverable is only half of the picture. The USPS NCOA (National Change of Address) system tracks all address changes in the US for the past 48 months, so you can be confident that the person you are mailing to is still at that address to receive your mail. Canada NCOA (CNCOA) processing is also available.
  • Email Audit – Verifies that the email addresses in your file have not expired and are not miswritten, either of which can route mail to a dead server.
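As one small piece of what an email check involves, the Python sketch below flags addresses that are miswritten. The pattern is a deliberately simple assumption for illustration; it cannot tell whether a mailbox has expired, which requires checking with the receiving mail server, and it is not the audit process described above.

```python
import re

# Simplified pattern (an assumption for this sketch): one "@", a local part,
# and a domain containing at least one dot. Real address syntax is broader.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_well_formed(address):
    """Return True if the whole address matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(address.strip()) is not None


def flag_miswritten(addresses):
    """Return the addresses that fail the syntax check, in original order."""
    return [a for a in addresses if not is_well_formed(a)]


suspect = flag_miswritten([
    "jane.doe@example.com",
    "jane.doe@@example.com",  # doubled @
    "jane doe@example.com",   # embedded space
    "jane.doe@example",       # no top-level domain
])
```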

Standardize & Reformat Data

We scrub and standardize the data in your database or mailing list, establish value ranges, and format it so that it is presented in a uniform and consistent manner. We can work with your codes and flags and, more typically, with disparate entries such as countries, phones, departments, titles, etc.

  • Parsing (Splitting) – Parsing splits data into its standard component parts; for example, Name Splitting segments full names into a Prefix (Mr, Dr, etc.), First Name (or initials), Middle Name, Last Name and Suffix (MD, PhD, etc.). City, State and ZIP Code values stored in a single column are usually split into their standard components as well.
  • Casing – Data cleansing isn’t finished until your data is consistently cased as you would like to see it in your reports or on your mailing pieces. Proper case gives the most versatile-looking data; all upper-case letters are the easiest to maintain.
  • Genderization – Gender codes can be applied that indicate whether a record represents a male, a female, a couple, a company, or a name that is recognized but whose gender cannot be determined (e.g., Terry, Chris, Dale).
  • Congressional District Append – Adds congressional districts to your records for political segmentation.
  • Householding – This is used in mailing so that each household receives only one piece of mail. “Householded” records can display more than one name at that same residence, so that no one is overlooked, and you are still saving money.
  • Nth Sampling – Also called Systematic Selection, Nth Sampling is often preferred over random sampling because the process is much simpler. It can be an easy and effective way to isolate record samples for test mailings. As the sample is collected, every nth member of the population is flagged. Most often used in mailing list testing, Nth Sampling selects a subset of the list’s entire “universe” so that you can make statistical inferences about, and assess the characteristics of, the list’s entire population.

For example, suppose a mailer intends to collect a representative sample of 500 people from a population of 5000. We would select every 10th individual to be part of the sample (total population / sample size = 5000 / 500 = 10).
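The arithmetic above can be sketched in a few lines of Python. The function name and the choice to start at the nth record (rather than at a random offset within the first interval) are assumptions made for this example.

```python
def nth_sample(records, sample_size):
    """Return every nth record, where n = population size // sample size."""
    if sample_size <= 0 or sample_size > len(records):
        raise ValueError("sample_size must be between 1 and the population size")
    interval = len(records) // sample_size
    # Start at the nth record, step by n, and cap the result at sample_size
    # in case the population does not divide evenly.
    return records[interval - 1::interval][:sample_size]


# A population of 5000 numbered records and a target sample of 500.
population = list(range(1, 5001))
sample = nth_sample(population, 500)
```

Here the interval works out to 10, so the sample holds records 10, 20, 30 and so on through 5000.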

Remove Duplicate Records (aka “De-Dupe” or “Purge”)

All databases and mailing lists accumulate duplicate records, but not all duplicates are the same. It’s easy for a de-duping process to purge Exact Dupes, where everything about the duplicate entries is the same, right down to the number of spaces and the punctuation. Near Dupes (aka True Dupes) are duplicate entries that are written differently: “Richard Smith” and “Dick Smith”; “Thomas & Mary” and “T J & Mary Ann”; and so forth. We use a proprietary de-duping system that polls your file to identify duplicate records based on their percentage of match.

The system can delete duplicate records, extract and output them, or simply mark them and let you decide which to keep or delete.
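The proprietary system itself is not described here, so the following Python sketch only illustrates the general idea of scoring near duplicates by percentage of match. It uses the standard library's difflib.SequenceMatcher as a stand-in; the NICKNAMES table, the function names, and the 85 percent threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"dick": "richard", "bill": "william", "bob": "robert"}


def normalize(name):
    """Lower-case, drop periods, collapse spaces, and expand nicknames."""
    words = name.lower().replace(".", "").split()
    return " ".join(NICKNAMES.get(word, word) for word in words)


def match_percent(a, b):
    """Return the similarity of two names as a whole-number percentage."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


def find_dupes(names, threshold=85):
    """Return (name_a, name_b, percent) for each pair at or above threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = match_percent(names[i], names[j])
            if score >= threshold:
                pairs.append((names[i], names[j], score))
    return pairs


dupes = find_dupes(["Richard Smith", "Dick Smith", "Mary Jones"])
```

Because the pairs are only reported, not removed, this mirrors the "mark and let you decide" option: the caller chooses which record of each pair to keep. Comparing every pair is fine for a small list, but a large file would need records grouped first (for example by ZIP Code) to keep the number of comparisons manageable.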

Review And Report

We perform integrity checks to further ensure data accuracy and quality. We then return your data along with a record layout and data and dupe reports.

Cleansing your company’s data on your own or developing an in-house team is not always feasible. It is time-consuming and resource-intensive, and it may require a significant software investment. To get the desired results, you will need competent staff, a solid understanding of data management, and up-to-date technology and software tools.

Outsourcing your data cleansing to an experienced and reputable data scrubbing company is a cost-effective way to transform your data into an optimum state for your business projects or donor appeals. Northwest Database Services has been scrubbing data, performing data transformations, cleaning mailing lists, and providing mailing lists for over 35 years.

We Are Here To Help!

Office

Sandersville, GA 31082

Email

gch [@] nwdatabase.com
To use email, remove the brackets.

Call Us

(478)412-2156