Message Encoding

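To reduce messaging costs for customers who use the Webext Interact SMS API, a subset of Unicode characters is automatically converted to GSM 03.38 equivalents. A message that contains only GSM 03.38 characters fits up to 160 characters in a single SMS, whereas a message that must be sent as Unicode (UCS-2) fits only 70.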

Conditional Conversion Logic

The conversion process is conditional. To ensure message integrity and predictability, the system applies the following rule:

All-or-Nothing Conversion

The system performs conversions only if every character in your message that falls outside the GSM 03.38 character set has a defined GSM 03.38 equivalent.

Fallback

If your message contains any character that cannot be converted (i.e., a character outside the supported GSM 03.38 set for which no conversion rule exists), the system will not perform any conversions at all. The message will be sent in its original format.
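The rule and its fallback can be illustrated with a minimal Python sketch. It is not the service's actual implementation: the names `GSM_EXCERPT`, `CONVERSIONS`, and `convert_message` are hypothetical, and both tables are small excerpts chosen for illustration rather than the full character set or the full mapping.

```python
# Hypothetical sketch of the all-or-nothing rule; not the service's actual code.

# Excerpt of the GSM 03.38 character set (assumption: a real check would
# cover the complete basic set plus the extension table).
GSM_EXCERPT = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 !\"#%&'()*+,-./:;<=>?@_£$¥éèùìòÇÄÖÑÜäöñüà€"
)

# Excerpt of the conversion rules listed in the tables on this page.
CONVERSIONS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201C": '"',  # left double quotation mark
    "\u201D": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}


def convert_message(text: str) -> str:
    """Convert only if every non-GSM character has a conversion rule."""
    for ch in text:
        if ch not in GSM_EXCERPT and ch not in CONVERSIONS:
            # A single unconvertible character disables all conversions.
            return text
    return "".join(CONVERSIONS.get(ch, ch) for ch in text)
```

In this sketch a message with curly quotes and an em dash is fully converted, while the same message with one emoji added is returned unchanged, curly quotes included.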

Conversion Mapping

Quotation Marks and Apostrophes

Original Character  Unicode  Converted To  Description
‘                   U+2018   '             Left single quotation mark
’                   U+2019   '             Right single quotation mark
‚                   U+201A   '             Single low-9 quotation mark
‛                   U+201B   '             Single high-reversed-9 quotation mark
′                   U+2032   '             Prime
‵                   U+2035   '             Reversed prime
ʹ                   U+02B9   '             Modifier letter prime
ʼ                   U+02BC   '             Modifier letter apostrophe
ʻ                   U+02BB   '             Modifier letter turned comma
`                   U+0060   '             Grave accent
´                   U+00B4   '             Acute accent
“                   U+201C   "             Left double quotation mark
”                   U+201D   "             Right double quotation mark
„                   U+201E   "             Double low-9 quotation mark
‟                   U+201F   "             Double high-reversed-9 quotation mark
″                   U+2033   "             Double prime
‶                   U+2036   "             Reversed double prime
«                   U+00AB   <             Left-pointing double angle quotation mark
»                   U+00BB   >             Right-pointing double angle quotation mark
‹                   U+2039   <             Single left-pointing angle quotation mark
›                   U+203A   >             Single right-pointing angle quotation mark

Dashes and Hyphens

Original Character  Unicode  Converted To  Description
–                   U+2013   -             En dash
—                   U+2014   -             Em dash
―                   U+2015   -             Horizontal bar
‐                   U+2010   -             Hyphen
‑                   U+2011   -             Non-breaking hyphen
‒                   U+2012   -             Figure dash
−                   U+2212   -             Minus sign
﹘                   U+FE58   -             Small em dash
﹣                   U+FE63   -             Small hyphen-minus
－                   U+FF0D   -             Fullwidth hyphen-minus

Whitespace Characters

Original Character  Unicode  Converted To  Description
(tab)               U+0009   (space)       Tab character
(nbsp)              U+00A0   (space)       Non-breaking space
(en quad)           U+2000   (space)       En quad
(em quad)           U+2001   (space)       Em quad
(en space)          U+2002   (space)       En space
(em space)          U+2003   (space)       Em space
(3/em space)        U+2004   (space)       Three-per-em space
(4/em space)        U+2005   (space)       Four-per-em space
(6/em space)        U+2006   (space)       Six-per-em space
(fig space)         U+2007   (space)       Figure space
(punc space)        U+2008   (space)       Punctuation space
(thin space)        U+2009   (space)       Thin space
(hair space)        U+200A   (space)       Hair space
(nnbsp)             U+202F   (space)       Narrow no-break space
(mmsp)              U+205F   (space)       Medium mathematical space
(ideographic)       U+3000   (space)       Ideographic space

Dots and Bullets

Original Character  Unicode  Converted To  Description
·                   U+00B7   .             Middle dot
•                   U+2022   .             Bullet
‧                   U+2027   .             Hyphenation point
∙                   U+2219   .             Bullet operator
⋅                   U+22C5   .             Dot operator
․                   U+2024   .             One dot leader

Slashes and Division

Original Character  Unicode  Converted To  Description
⁄                   U+2044   /             Fraction slash
∕                   U+2215   /             Division slash
⧸                   U+29F8   /             Big solidus
／                   U+FF0F   /             Fullwidth solidus
÷                   U+00F7   /             Division sign

Multiplication and Asterisks

Original Character  Unicode  Converted To  Description
×                   U+00D7   *             Multiplication sign
∗                   U+2217   *             Asterisk operator
✱                   U+2731   *             Heavy asterisk
＊                   U+FF0A   *             Fullwidth asterisk

Accented Latin Characters

A Variants

Original  Unicode  Converted To  Description
À         U+00C0   A             Latin A with grave
Á         U+00C1   A             Latin A with acute
Â         U+00C2   A             Latin A with circumflex
Ã         U+00C3   A             Latin A with tilde
Ā         U+0100   A             Latin A with macron
Ă         U+0102   A             Latin A with breve
Ą         U+0104   A             Latin A with ogonek
Ǎ         U+01CD   A             Latin A with caron
á         U+00E1   a             Latin a with acute
â         U+00E2   a             Latin a with circumflex
ã         U+00E3   a             Latin a with tilde
ā         U+0101   a             Latin a with macron
ă         U+0103   a             Latin a with breve
ą         U+0105   a             Latin a with ogonek
ǎ         U+01CE   a             Latin a with caron

Note: The character à (U+00E0) is natively supported in SMS and is preserved as-is.

E Variants

Original  Unicode  Converted To  Description
È         U+00C8   E             Latin E with grave
Ê         U+00CA   E             Latin E with circumflex
Ë         U+00CB   E             Latin E with diaeresis
Ē         U+0112   E             Latin E with macron
Ĕ         U+0114   E             Latin E with breve
Ė         U+0116   E             Latin E with dot above
Ę         U+0118   E             Latin E with ogonek
Ě         U+011A   E             Latin E with caron
ê         U+00EA   e             Latin e with circumflex
ë         U+00EB   e             Latin e with diaeresis
ē         U+0113   e             Latin e with macron
ĕ         U+0115   e             Latin e with breve
ė         U+0117   e             Latin e with dot above
ę         U+0119   e             Latin e with ogonek
ě         U+011B   e             Latin e with caron

Note: The characters É (U+00C9), é (U+00E9), and è (U+00E8) are natively supported in SMS.

I Variants

Original  Unicode  Converted To  Description
Ì         U+00CC   I             Latin I with grave
Í         U+00CD   I             Latin I with acute
Î         U+00CE   I             Latin I with circumflex
Ï         U+00CF   I             Latin I with diaeresis
Ĩ         U+0128   I             Latin I with tilde
Ī         U+012A   I             Latin I with macron
Ĭ         U+012C   I             Latin I with breve
Į         U+012E   I             Latin I with ogonek
İ         U+0130   I             Latin I with dot above
í         U+00ED   i             Latin i with acute
î         U+00EE   i             Latin i with circumflex
ï         U+00EF   i             Latin i with diaeresis
ĩ         U+0129   i             Latin i with tilde
ī         U+012B   i             Latin i with macron
ĭ         U+012D   i             Latin i with breve
į         U+012F   i             Latin i with ogonek
ı         U+0131   i             Latin dotless i

Note: The character ì (U+00EC) is natively supported in SMS.

O Variants

Original  Unicode  Converted To  Description
Ò         U+00D2   O             Latin O with grave
Ó         U+00D3   O             Latin O with acute
Ô         U+00D4   O             Latin O with circumflex
Õ         U+00D5   O             Latin O with tilde
Ō         U+014C   O             Latin O with macron
Ŏ         U+014E   O             Latin O with breve
Ő         U+0150   O             Latin O with double acute
Ǒ         U+01D1   O             Latin O with caron
ó         U+00F3   o             Latin o with acute
ô         U+00F4   o             Latin o with circumflex
õ         U+00F5   o             Latin o with tilde
ō         U+014D   o             Latin o with macron
ŏ         U+014F   o             Latin o with breve
ő         U+0151   o             Latin o with double acute
ǒ         U+01D2   o             Latin o with caron

Note: The characters Ö (U+00D6), ö (U+00F6), Ø (U+00D8), ø (U+00F8), and ò (U+00F2) are natively supported in SMS.

U Variants

Original  Unicode  Converted To  Description
Ù         U+00D9   U             Latin U with grave
Ú         U+00DA   U             Latin U with acute
Û         U+00DB   U             Latin U with circumflex
Ũ         U+0168   U             Latin U with tilde
Ū         U+016A   U             Latin U with macron
Ŭ         U+016C   U             Latin U with breve
Ů         U+016E   U             Latin U with ring above
Ű         U+0170   U             Latin U with double acute
Ų         U+0172   U             Latin U with ogonek
Ǔ         U+01D3   U             Latin U with caron
ú         U+00FA   u             Latin u with acute
û         U+00FB   u             Latin u with circumflex
ũ         U+0169   u             Latin u with tilde
ū         U+016B   u             Latin u with macron
ŭ         U+016D   u             Latin u with breve
ů         U+016F   u             Latin u with ring above
ű         U+0171   u             Latin u with double acute
ų         U+0173   u             Latin u with ogonek
ǔ         U+01D4   u             Latin u with caron

Note: The characters Ü (U+00DC), ü (U+00FC), and ù (U+00F9) are natively supported in SMS.

Y Variants

Original  Unicode  Converted To  Description
Ý         U+00DD   Y             Latin Y with acute
Ŷ         U+0176   Y             Latin Y with circumflex
Ÿ         U+0178   Y             Latin Y with diaeresis
ý         U+00FD   y             Latin y with acute
ÿ         U+00FF   y             Latin y with diaeresis
ŷ         U+0177   y             Latin y with circumflex

C Variants

Original  Unicode  Converted To  Description
Ć         U+0106   C             Latin C with acute
Ĉ         U+0108   C             Latin C with circumflex
Ċ         U+010A   C             Latin C with dot above
Č         U+010C   C             Latin C with caron
ç         U+00E7   c             Latin c with cedilla
ć         U+0107   c             Latin c with acute
ĉ         U+0109   c             Latin c with circumflex
ċ         U+010B   c             Latin c with dot above
č         U+010D   c             Latin c with caron

Note: The character Ç (U+00C7) is natively supported in SMS.

N Variants

Original  Unicode  Converted To  Description
Ń         U+0143   N             Latin N with acute
Ņ         U+0145   N             Latin N with cedilla
Ň         U+0147   N             Latin N with caron
ń         U+0144   n             Latin n with acute
ņ         U+0146   n             Latin n with cedilla
ň         U+0148   n             Latin n with caron

Note: The characters Ñ (U+00D1) and ñ (U+00F1) are natively supported in SMS.

S Variants

Original  Unicode  Converted To  Description
Ś         U+015A   S             Latin S with acute
Ŝ         U+015C   S             Latin S with circumflex
Ş         U+015E   S             Latin S with cedilla
Š         U+0160   S             Latin S with caron
ś         U+015B   s             Latin s with acute
ŝ         U+015D   s             Latin s with circumflex
ş         U+015F   s             Latin s with cedilla
š         U+0161   s             Latin s with caron

Z Variants

Original  Unicode  Converted To  Description
Ź         U+0179   Z             Latin Z with acute
Ż         U+017B   Z             Latin Z with dot above
Ž         U+017D   Z             Latin Z with caron
ź         U+017A   z             Latin z with acute
ż         U+017C   z             Latin z with dot above
ž         U+017E   z             Latin z with caron

Other Consonants

Original  Unicode  Converted To  Description
Đ         U+0110   D             Latin D with stroke
đ         U+0111   d             Latin d with stroke
Ð         U+00D0   D             Latin Eth
ð         U+00F0   d             Latin eth
Ğ         U+011E   G             Latin G with breve
ğ         U+011F   g             Latin g with breve
Ģ         U+0122   G             Latin G with cedilla
ģ         U+0123   g             Latin g with cedilla
Ķ         U+0136   K             Latin K with cedilla
ķ         U+0137   k             Latin k with cedilla
Ĺ         U+0139   L             Latin L with acute
ĺ         U+013A   l             Latin l with acute
Ļ         U+013B   L             Latin L with cedilla
ļ         U+013C   l             Latin l with cedilla
Ľ         U+013D   L             Latin L with caron
ľ         U+013E   l             Latin l with caron
Ł         U+0141   L             Latin L with stroke
ł         U+0142   l             Latin l with stroke
Ŕ         U+0154   R             Latin R with acute
ŕ         U+0155   r             Latin r with acute
Ŗ         U+0156   R             Latin R with cedilla
ŗ         U+0157   r             Latin r with cedilla
Ř         U+0158   R             Latin R with caron
ř         U+0159   r             Latin r with caron
Ţ         U+0162   T             Latin T with cedilla
ţ         U+0163   t             Latin t with cedilla
Ť         U+0164   T             Latin T with caron
ť         U+0165   t             Latin t with caron
Ŧ         U+0166   T             Latin T with stroke
ŧ         U+0167   t             Latin t with stroke
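Most rows in the accented Latin tables match what Unicode canonical decomposition (NFD) gives once combining marks are dropped. The stroke letters, eth, and dotless i have no decomposition and need explicit entries. The Python sketch below reproduces the mappings on that basis. This is an observation about the tables, not a description of how the service works: `NATIVE`, `NO_DECOMPOSITION`, and `fold_latin` are hypothetical names. Unlike the tables, the sketch would also fold accented letters that are not listed on this page.

```python
import unicodedata

# Natively supported letters named in the notes on this page; left untouched.
NATIVE = set("àèéÉìòùÇÄäÖöÜüÑñÅåØøÆæß")

# Rows from the tables above that have no canonical decomposition.
NO_DECOMPOSITION = {
    "Đ": "D", "đ": "d", "Ð": "D", "ð": "d",
    "Ł": "L", "ł": "l", "Ŧ": "T", "ŧ": "t", "ı": "i",
}


def fold_latin(text: str) -> str:
    """Replace accented Latin letters with their base letters."""
    out = []
    for ch in text:
        if ch in NATIVE:
            out.append(ch)
        elif ch in NO_DECOMPOSITION:
            out.append(NO_DECOMPOSITION[ch])
        else:
            # Decompose into base letter + combining marks, then drop the
            # marks (Unicode general category "Mn").
            decomposed = unicodedata.normalize("NFD", ch)
            out.append(
                "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
            )
    return "".join(out)
```

For example, `fold_latin("Łódź")` yields `Lodz`, while `Café München` passes through unchanged because é and ü are natively supported.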

Mathematical Symbols

Original Character  Unicode  Converted To  Description
≈                   U+2248   ~             Almost equal to

Brackets and Parentheses

Original Character  Unicode  Converted To  Description
（                   U+FF08   (             Fullwidth left parenthesis
）                   U+FF09   )             Fullwidth right parenthesis
［                   U+FF3B   [             Fullwidth left square bracket
］                   U+FF3D   ]             Fullwidth right square bracket
｛                   U+FF5B   {             Fullwidth left curly bracket
｝                   U+FF5D   }             Fullwidth right curly bracket
〈                   U+2329   <             Left-pointing angle bracket
〉                   U+232A   >             Right-pointing angle bracket
〈                   U+3008   <             Left angle bracket
〉                   U+3009   >             Right angle bracket

Special Punctuation

Original Character  Unicode  Converted To  Description
‗                   U+2017   _             Double low line
‾                   U+203E   -             Overline
⁃                   U+2043   -             Hyphen bullet
¦                   U+00A6   |             Broken bar
！                   U+FF01   !             Fullwidth exclamation mark
？                   U+FF1F   ?             Fullwidth question mark
，                   U+FF0C   ,             Fullwidth comma
．                   U+FF0E   .             Fullwidth full stop
：                   U+FF1A   :             Fullwidth colon
；                   U+FF1B   ;             Fullwidth semicolon

Fullwidth Alphanumeric Characters

All fullwidth digits (0-9) and letters (A-Z, a-z) are converted to their standard ASCII equivalents.

Range                Unicode Range    Converted To
Fullwidth digits     U+FF10 - U+FF19  0-9
Fullwidth uppercase  U+FF21 - U+FF3A  A-Z
Fullwidth lowercase  U+FF41 - U+FF5A  a-z
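Because each fullwidth range sits at a fixed distance (0xFEE0) from its ASCII counterpart, the three ranges above can be folded with one subtraction. The following Python sketch shows the idea; the function name `fold_fullwidth_alnum` is hypothetical and the code is illustrative, not the service's implementation.

```python
# Distance between a fullwidth character and its ASCII equivalent,
# e.g. U+FF10 (fullwidth zero) - U+0030 (ASCII zero) = 0xFEE0.
FULLWIDTH_OFFSET = 0xFEE0

# The three ranges from the table above (inclusive bounds).
FULLWIDTH_RANGES = (
    (0xFF10, 0xFF19),  # digits
    (0xFF21, 0xFF3A),  # uppercase letters
    (0xFF41, 0xFF5A),  # lowercase letters
)


def fold_fullwidth_alnum(text: str) -> str:
    """Map fullwidth digits and letters to ASCII; leave everything else."""
    out = []
    for ch in text:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in FULLWIDTH_RANGES):
            out.append(chr(cp - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

Fullwidth punctuation such as U+FF01 is deliberately left alone here, since those characters are listed individually in the punctuation tables.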

Symbols and Miscellaneous

Original Character  Unicode  Converted To  Description
✗                   U+2717   x             Ballot X
✘                   U+2718   x             Heavy ballot X

Characters Removed (Invisible Characters)

The following invisible/control characters are automatically removed from messages:

Character    Unicode        Description
(invisible)  U+200B         Zero Width Space
(invisible)  U+200C         Zero Width Non-Joiner
(invisible)  U+200D         Zero Width Joiner
(invisible)  U+2060         Word Joiner
(invisible)  U+FEFF         Byte Order Mark / Zero Width No-Break Space
(invisible)  U+00AD         Soft Hyphen
(invisible)  U+E0000-E007F  Tags block (128 deprecated language tag characters)
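As a rough illustration of the removal step, the Python sketch below drops exactly the code points listed above. The names `REMOVED_CODE_POINTS` and `strip_invisible` are hypothetical. The sketch does not model how removal interacts with the all-or-nothing rule, which this page does not specify.

```python
# Individual code points from the table above.
REMOVED_CODE_POINTS = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF, 0x00AD}

# The Tags block is listed as a range, so it is checked separately.
TAGS_BLOCK = range(0xE0000, 0xE0080)  # U+E0000 through U+E007F


def strip_invisible(text: str) -> str:
    """Remove the invisible characters listed in the table."""
    return "".join(
        ch
        for ch in text
        if ord(ch) not in REMOVED_CODE_POINTS and ord(ch) not in TAGS_BLOCK
    )
```

A string copied from a web page, such as `"Hel\u200Blo"`, comes out as `Hello` with the zero-width space removed.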

Natively Supported Characters (No Conversion Needed)

The following special characters are natively supported by SMS and are preserved as-is:

Currency

  • $ (Dollar)
  • £ (Pound)
  • ¥ (Yen)
  • € (Euro)

Accented Characters

  • à (a with grave)
  • è, é, É (e variants)
  • ì (i with grave)
  • ò (o with grave)
  • ù (u with grave)
  • Ç (C with cedilla)
  • Ä, ä, Ö, ö, Ü, ü (German umlauts)
  • Ñ, ñ (Spanish n with tilde)
  • Å, å (Swedish a with ring)
  • Ø, ø (Danish/Norwegian o with stroke)
  • Æ, æ (Danish/Norwegian ligature)
  • ß (German sharp s)

Greek Letters (commonly used in math/science)

  • Δ (Delta), Φ (Phi), Γ (Gamma), Λ (Lambda)
  • Ω (Omega), Π (Pi), Ψ (Psi), Σ (Sigma)
  • Θ (Theta), Ξ (Xi)

Special Characters

  • @ (At sign)
  • _ (Underscore)
  • ¡ (Inverted exclamation)
  • ¿ (Inverted question mark)
  • § (Section sign)
  • ¤ (Currency sign)

Examples

Before and After Conversion

Original Text           Converted Text
“Smart quotes”          "Smart quotes"
It’s working — great!   It's working - great!
Price: €50 × 2          Price: €50 * 2
Café München            Café München
Łódź, Poland            Lodz, Poland
Cancel: ✗               Cancel: x