How to remove special characters in PostgreSQL

In this PostgreSQL tutorial, I’ll walk you through multiple proven methods to effectively remove special characters in PostgreSQL using the REPLACE() and REGEXP_REPLACE() functions. Additionally, you will learn to use the TRANSLATE() function, which can be used in conjunction with these two functions to replace special characters with new string values.

How to remove special characters in PostgreSQL

Sometimes your database contains a piece of information with special characters in a table. These special characters can be (@, ^, &, $, *, %, _, #), and sometimes you intentionally keep the special character with the information.

However, you may sometimes inadvertently encounter these special characters, or when importing data from other sources into PostgreSQL.

Perhaps this type of unwanted character enters your table due to a data conversion issue when PostgreSQL is unable to interpret the information. Usually, this kind of information with special characters should not exist in the database because it is meaningless or has no value.

So, what should you do? The solution is to remove or replace these characters according to your preference. PostgreSQL offers various functions that replace or remove special characters with new string values.

Let’s discuss all the approaches, along with some examples, and understand how to remove and replace special characters in PostgreSQL with new string values.

Approach 1: Using the Replace() Function

The REPLACE() function in PostgreSQL replaces an old string value with a new one. It replaces all strings that match the substring specified in this function.

If you have specified the special characters within the function REPLACE() that you want to replace or remove, then the REPLACE() function will replace all the matching special characters within a string.

Syntax

The syntax is given below.

REPLACE(actual_string, string_to_replace, new_string)
  • actual_string: This is the source string that contains unwanted characters or strings that you want to replace. Also, the actual_string can be a column of the table.
  • string_to_replace: It is a string that you want to replace within the source string.
  • new_string: This is the new string value that you want in place of the substring (string_to_replace).

Example

Let’s see an example of how to replace special characters using the REPLACE() function.

Consider a table ‘customers’ with columns ‘customer_id’, ‘name’, ’email’, ‘age’, and ‘registration_date’. This table contains some special characters in the column name and email.

How to remove special characters in PostgreSQL

As you can see from the table data, the column name contains special characters. If you examine the customer ID equal to 5, the name ‘Jorge@Cooper’ includes a special character (@).

There should be a space between the first name and last name, so use the below command to fix this problem.

UPDATE 
    customers
SET 
    name = REPLACE(name,'@',' ')
WHERE 
    customer_id = 5
RETURNING *;

After executing the above query, I got the expected output as shown in the screenshot below.

remove special characters postgresql

In the above code part (name = REPLACE(name,’@’,’ ‘)), replace the special character @ with a space (‘ ‘). The code replaces the @ character in the name ‘Jorge@Cooper’ of the ‘name’ column with a space, resulting in the output ‘Jorge Cooper’, which is more readable.

Approach 2: Using REGEXP_REPLACE

Another function for replacing special characters in PostgreSQL is REGEXP_REPLACE, which replaces a specific part of strings with a new string based on the specified regular expression pattern. The regular expression is a sequence of characters.

Syntax

The syntax of REGEXP_REPLACE is given below.

REGEXP_REPLACE( actual_string, regexp_pattern, new_string )
  • actual_string: The source string in which the matched part of the string using the regexp pattern is replaced by new_string.
  • regexp_pattern: This is the regular expression pattern to find the part of the string that you want to replace with new_string.
  • new_string: It is the new string that replaces the part of the string that is found by the regular expression.

Example

Let’s take an example and understand how to use the regexp_replace to replace the special characters.

Use the below command to replace special characters # from the name ‘James@Johnson#’.

UPDATE 
    customers
SET 
    name = regexp_replace(name, '#',' ')
WHERE 
    customer_id = 1
RETURNING *;

After executing the above query, I obtained the expected output, as shown in the screenshot below.

remove special characters in postgresql

The above query replaces the special character # with a space within the name ‘James@Johnson#’, and the output is ‘James@Johnson‘, but it still has the special character @, so to remove it, use the regular pattern as ‘@’ in the regexp_replace() function.

Approach 3: Using the Translate()

The TRANSLATE() function replaces the part of a string with a new string based on a specified set. This function performs the same operation as REPLACE() and REGEXP_REPLACE().

Syntax

The syntax is given below.

TRANSLATE(actual_string, character_set, new_string)
  • actual_string: This is the source string where the new string replaces the found set.
  • Character set: It is a collection of characters that acts as a pattern.
  • new_string: The new string that replaces the set found in the actual string.

Example

Let’s check with an example and see how to use the translate() function to replace special characters in PostgreSQL.

Use the command below to replace most special characters in the ‘name’ column of the customers table.

UPDATE 
    customers
SET 
    name = translate(name, '_%#^*@$', '      ') from customers;
RETURNING *;

After executing the above query, I got the expected output as shown in the screenshot below.

how to remove special characters from a string in postgresql

If you look at the output of the above command, the translate() function replaces all the special characters specified in the column ‘name’ using the selected set ‘_%#^*@$’ with a new string, which is a space in this case.

Now the name of the employee seems meaningful after replacing the special characters from each name.

However, you need to understand one thing about the translate() function: how to specify the new string.

Let me show you simple examples. Suppose you have the string ‘Ma&ria*’ with two special characters ‘&’ and ‘*’.

Use the below command to replace the special characters with the capital letters ‘K’ and ‘J’.

SELECT translate('Ma&ria*', '&*', 'KJ');

After executing the above query, I got the expected output as shown in the screenshot below.

remove special characters from a string in postgresql

Now, understand that within the set ‘&*’, you want to search for this pattern, but in the string ‘Ma&ria*’, you see the special character exists at different places. The new string is ‘KJ’ and the output is ‘MaKriaJ’.

So what happens here is that the translate function considers the characters within the set as individual characters wherever these characters exist in the actual_string. For each character, there is a different replacement string or character in the new_string.

Approach 4: Combining Methods for Complex Scenarios

For more complex data cleaning requirements, I often combine multiple techniques to achieve optimal results.

Example

UPDATE company_profiles
SET standardized_name = 
    regexp_replace(
        translate(company_name, '!@#$%^&*()_+=[]{}|:;"<>,.?/~`', ' '),  -- Replace specials with spaces
        '\s+',                                                           -- Find multiple spaces
        ' ',                                                             -- Replace with single space
        'g'
    );

This two-step process first converts all special characters to spaces using the translate() function, then uses regexp_replace() to collapse multiple spaces into a single space.

Performance Considerations

Here are the key considerations.

Performance by Method

Below is a general performance comparison.

MethodRelative PerformanceBest Use Case
translate()FastestSimple character replacement with known character sets
regexp_replace() with simple patternsFastRemoving defined categories of characters
regexp_replace() with complex patternsSlowerComplex pattern matching and replacement
Custom PL/pgSQL functionsVariesReusable operations across multiple queries

Best Practices

Best practices for special character handling:

  1. Be intentional about what you remove – Define exactly what constitutes a “special character” for your specific use case.
  2. Create test cases – Before applying changes to production data, test your approach with representative samples.
  3. Document your patterns – Regular expressions can be challenging to decipher later; add comments explaining your pattern choices.
  4. Consider performance – For large datasets, choose the most efficient method and implement appropriate indexing.
  5. Create reusable functions – If you’re performing similar cleaning operations throughout your database, invest in creating custom functions.

Conclusion

In this PostgreSQL tutorial, I showed you how to remove/replace special characters with strings using REGEXP_REPLACE() and REPLACE(). Additionally, you learned about the function TRANSLATE() to replace the special characters based on the specified set, which is a set of characters.

By following these guidelines and using the techniques mentioned in this article, you’ll be well-equipped to handle special character removal in PostgreSQL effectively.

You may also read:

Top 200 SQL Server Interview Questions and Answers

Free PDF On Top 200 SQL Server Interview Questions And Answers

Download A 40 pages PDF And Learn Now.