Instagram engineer delves into emoji madness

Company engineers did painstaking work to allow users to search by emoji

A portion of Instagram's regular expression search query for finding emojis

Emojis: Kids may love their simplicity, but programmers will loathe their complexities.

Last month, the Instagram photo-sharing service started recognizing emojis in its hashtag searches, making the company the first major social networking service to offer this capability. A user could affix a sprightly emoji to a photo hashtag so the snap could be found by other users searching for that emoji. The Internet rejoiced.

Now, one of the Instagram engineers responsible for this technical feat has shared the company's approach in a blog item posted Wednesday that should be perused by any developer looking to outfit a social Internet service or consumer app with similar emoji goodness. Turns out that supporting the little digital icons is no easy task.

"Identifying characters can be difficult across programming languages. Only by parsing the standard, finding character variations and understanding language differences do they become possible to support," Instagram engineer Piyush Mangalick wrote in the new post.

While elders may bemoan emojis' putative deleterious effect on language, one thing is for sure: The youth love them. Today, almost 60 percent of user text generated on Instagram contains emojis. Among Instagram's 300 million users, emojis are now more widely used than acronyms. LOL.

First popularized in Japan during the last decade, emojis convey a wide range of subjects and emotions through the use of simple symbols and pictographs, usually fitted on a 12-by-12-pixel grid. They are often used as shorthand to eliminate the laborious typing of words on small devices. The Unicode standard for encoding the world's languages on computers adopted a set of 1,282 emojis in 2010, which paved the way for their widespread use on Apple and Android devices.

Including emojis in Instagram's hashtag index at first seemed like a simple task. With Unicode, each character -- be it a letter, symbol or emoji -- is identified by a numeric code point, conventionally written in hexadecimal, which a programming language or operating system can translate into the appropriate character by using the Unicode standard.
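To make the idea concrete, here is a minimal Python 3 sketch of looking up the numbers behind a piece of text. The helper name `code_points` is made up for this illustration and does not come from Instagram's post.

```python
def code_points(text):
    """Return the Unicode code point of each character as a 'U+XXXX' string."""
    return ["U+%04X" % ord(ch) for ch in text]


# A plain letter and an emoji (GRINNING FACE) are both just numbers to Unicode.
letter = code_points("A")
emoji = code_points("\U0001F600")
print(letter, emoji)
```

The letter and the emoji are handled the same way; the emoji simply has a larger number.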

Unfortunately, creating a single way to search these raw Unicode strings across different platforms was not possible, Mangalick said. Emojis are commonly stored in UTF-16, an encoding of Unicode in which a character can take up either one or two 16-bit units, so the numeric strings are of differing lengths. That made them tricky to parse, given that different programming languages use different escape sequences, or markers, to write these values. Additionally, some emojis required two numbers rather than one.
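The differing lengths can be seen by encoding text as UTF-16 and counting the 16-bit units that result. The following Python 3 sketch uses a hypothetical helper, `utf16_units`, written only for this illustration.

```python
def utf16_units(text):
    """Encode text as big-endian UTF-16 and return its 16-bit code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A basic letter fits in one unit; this emoji needs two (a "surrogate pair").
letter_units = utf16_units("A")
emoji_units = utf16_units("\U0001F600")
print([hex(u) for u in letter_units], [hex(u) for u in emoji_units])
```

Code that assumes one unit per character will split such an emoji in half, which is one reason a search pattern has to be written with the encoding in mind.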

Apple muddied the waters further by offering users the ability to encode some emojis in various colors, which resulted in non-standard strings. Android also had a set of non-standard emoji encodings. For Instagram to use emojis correctly, an Android device had to recognize an iPhone emoji, and vice versa.
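One way such variants can appear in text is as a base emoji followed by a second, modifying character. The Python 3 sketch below assumes the color variants are the Unicode skin-tone modifiers U+1F3FB through U+1F3FF; the article does not say which encodings Instagram actually had to handle, and the helper `strip_skin_tone` is hypothetical.

```python
# Assumption: the color variants are the five Unicode skin-tone modifier characters.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def strip_skin_tone(text):
    """Drop skin-tone modifier characters so variants compare equal to the base emoji."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)


# THUMBS UP followed by a medium skin-tone modifier is two code points, not one.
tinted_thumbs_up = "\U0001F44D\U0001F3FD"
print(len(tinted_thumbs_up), strip_skin_tone(tinted_thumbs_up) == "\U0001F44D")
```

Whether a search service should fold the variants together like this or index them separately is a design decision the article does not describe.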

For the solution, Instagram turned to regular expressions, a dense but extremely versatile language for searching for patterns in text. Regular expressions, called regex for short, were designed for tasks such as recognizing complex sets of data strings within larger, more complex strings of data.
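For readers who have not met regex before, here is about the simplest useful case: pulling ordinary word hashtags out of a caption. This Python 3 sketch is an illustration only, and the pattern and the name `find_hashtags` are not taken from Instagram's code.

```python
import re

# '#' followed by one or more word characters (letters, digits, underscore).
HASHTAG = re.compile(r"#(\w+)")


def find_hashtags(caption):
    """Return the hashtag names in a caption, without the leading '#'."""
    return HASHTAG.findall(caption)


print(find_hashtags("Sunset at the beach #nofilter #sunset"))
```

A pattern like this does not match emoji, because emoji are symbols rather than word characters.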

In the IT world, regular-expression searches have justifiably gained a reputation for being fiendishly complicated. Instagram's regular expressions for finding emojis may be the most complicated yet.

The company painstakingly crafted a regex search pattern for Python 2.7, the company's preferred language for its back-end search service, that would identify all the possible emojis a user could use. The pattern was more than 3,600 characters long. Imagine entering that into Google without a single mistake.
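Instagram's full pattern is not reproduced here, but a drastically simplified sketch gives a feel for the approach. The example below is written for Python 3 rather than Python 2.7, covers only a handful of emoji blocks chosen for illustration, and uses names (`EMOJI`, `EMOJI_HASHTAG`, `find_emoji_hashtags`) that are assumptions rather than anything from the company's code.

```python
import re

# Assumption: a few well-known emoji blocks stand in for the full set.
# Real coverage needs many more ranges plus multi-character sequences.
EMOJI = (
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\u2600-\u26FF"          # miscellaneous symbols
    "]"
)

# A hashtag is '#' followed by any mix of emoji and ordinary word characters.
EMOJI_HASHTAG = re.compile("#((?:" + EMOJI + r"|\w)+)")


def find_emoji_hashtags(caption):
    """Return hashtag bodies, which may contain emoji as well as text."""
    return EMOJI_HASHTAG.findall(caption)


print(find_emoji_hashtags("Love this #\U0001F60D view #sunset"))
```

Even this toy version hints at why the real pattern grew so long: every block of emoji, and every platform-specific variation, has to be spelled out explicitly.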

And that was just the regex for Python. Instagram had to identify emojis across all the platforms it supported. So company engineers had to craft separate, though equally voluminous, regex patterns for Java and Objective-C, the languages used on Google's Android and Apple's iOS respectively.

The work paid off, however, not only in terms of the positive publicity that the emoji support generated for Instagram, but also by helping the company stay in touch with its digitally expressive user base. If emojis ever do surpass the use of text itself, as pundits fear and Instagram predicts, then Instagram is well poised for this colorful future.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com

