
How to build a URL text summarizer with simple NLP

To view the source code, please visit my GitHub page.

Wouldn’t it be great if you could automatically get a summary of any online article? Whether you’re too busy or have too many articles in your reading list, sometimes all you really want is a short summary.

That’s why TL;DR (too long; didn’t read) is so commonly used these days. The acronym can be used to criticize a piece of writing as overly long, but it is more often used to give a helpful summary of a much longer story or a complicated topic. My last piece focused on how to estimate the read time of any article. This time we’ll build a TL;DR for any article.

Getting started

For this tutorial, we’ll rely on two main Python libraries (the code also uses requests to download pages):

  1. Web crawling: Beautiful Soup. Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

  2. Text summarization: NLTK (Natural Language Toolkit). NLTK is a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.

Get familiar with the libraries before continuing, and make sure to install them locally. Alternatively, run this command within the project repo directory:

pip install -r requirements.txt

Next, download NLTK's stopwords corpus separately. Open a Python shell and enter:

import nltk
nltk.download("stopwords")

Text Summarization using NLP

Let’s describe the algorithm:

  1. Get URL from user input

  2. Web crawl to extract the page text from the HTML page (by paragraphs <p>).

  3. Run the frequency-based summarization algorithm (implemented using NLTK) on the extracted sentences. The algorithm ranks sentences by the frequency of the words they contain, and the top-ranked sentences are selected for the final summary.

  4. Return the highest ranked sentences (I prefer 5) as a final summary.
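Steps 3 and 4 can be illustrated with a minimal, dependency-free Python sketch. It is not the FrequencySummarizer from the repo. A regex sentence splitter stands in for NLTK's tokenizer, there is no stopword filtering yet, and the helper names (_words, top_sentences) are hypothetical. Each sentence is scored by the summed frequency of its words, and the n best sentences are returned in their original order.

```python
import re
from collections import Counter
from heapq import nlargest


def _words(sentence):
    """Lowercase word tokens; a crude stand-in for NLTK's word tokenizer."""
    return re.findall(r"[a-z']+", sentence.lower())


def top_sentences(text, n=5):
    """Return the n sentences whose words are most frequent across the text."""
    # Naive split after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word frequencies across the whole text
    freq = Counter(w for s in sentences for w in _words(s))
    # Score each sentence by summing the frequencies of its words
    scores = {i: sum(freq[w] for w in _words(s)) for i, s in enumerate(sentences)}
    # Take the n best-scoring sentence indices, then restore document order
    best = sorted(nlargest(n, scores, key=scores.get))
    return [sentences[i] for i in best]


text = "Cats purr. Dogs bark loudly. Cats and dogs play. Birds sing."
print(top_sentences(text, n=2))
# ['Dogs bark loudly.', 'Cats and dogs play.']
```

Summing raw counts favors long sentences and common words such as "and". That is why the real algorithm filters its frequency map before ranking.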

For step 2 (step 1 is self-explanatory), we’ll write a function called getTextFromURL:

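import requests
from bs4 import BeautifulSoup

def getTextFromURL(url):
    # Download the page and parse its HTML
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    # Join the text of every <p> paragraph into one string
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    return text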

The function sends a GET request to the given URL, parses the returned HTML, and returns the text of all paragraph (<p>) elements joined by spaces.
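You may want to check what this extraction keeps without Beautiful Soup or a network request. Here is a rough standard-library approximation using html.parser.HTMLParser. The ParagraphText class and paragraphs_text function are hypothetical names. This simple parser won't match Beautiful Soup's handling of malformed HTML in every case. It only collects the text inside <p> elements and joins the paragraphs with spaces, as the join in getTextFromURL does.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> starts a new paragraph (and implicitly ends the previous one)
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> still belongs to the paragraph
        if self.in_paragraph:
            self.paragraphs[-1] += data


def paragraphs_text(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)


page = "<h1>Title</h1><p>First <b>bold</b> point.</p><div>Sidebar</div><p>Second.</p>"
print(paragraphs_text(page))
# First bold point. Second.
```

The heading and the sidebar are skipped. This is also why navigation text, captions, and other non-paragraph content usually stay out of the summary.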

From Text to TL;DR

We’ll use several methods here, including some that aren’t shown in this post (see the source code in the repo to learn more).

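def summarizeURL(url, total_pars):
    # Fetch the article text and strip stray encoding characters
    url_text = getTextFromURL(url).replace(u"Â", u"").replace(u"â", u"")
    # FrequencySummarizer is defined in the project repo
    fs = FrequencySummarizer()
    # Flatten newlines, then pick the total_pars highest-ranked sentences
    final_summary = fs.summarize(url_text.replace("\n", " "), total_pars)
    return " ".join(final_summary)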

The function calls getTextFromURL to retrieve the text, removes stray encoding characters (Â and â), and replaces newlines (\n) with spaces.

Next, we run the FrequencySummarizer algorithm on the text. The algorithm tokenizes the input into sentences and then computes a frequency map of the words. The frequency map is then filtered to ignore both very rare and very frequent words. This discards noisy words such as determiners, which are very frequent but carry little information, as well as words that occur only a few times. To see the source code, click here.
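The filtering step can be sketched in plain Python as follows. This is an illustration, not the repo's code. The small STOPWORDS set stands in for NLTK's stopwords corpus, and the min_cut and max_cut thresholds (0.1 and 0.9) are illustrative defaults. Counts are normalized by the count of the most frequent word. Any word at or above max_cut, or at or below min_cut, is then dropped.

```python
import re
from collections import Counter

# Tiny illustrative list; the tutorial downloads NLTK's English stopwords instead
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}


def frequency_map(text, min_cut=0.1, max_cut=0.9):
    """Normalized word frequencies with very rare and very common words removed."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # Scale so the most frequent word has frequency 1.0
    freq = {w: c / top for w, c in counts.items()}
    # Keep only words strictly between the two cut-offs
    return {w: f for w, f in freq.items() if min_cut < f < max_cut}


print(frequency_map("cat cat cat cat dog dog mouse the the the the the"))
# {'dog': 0.5, 'mouse': 0.25}
```

With max_cut below 1.0, the single most frequent remaining word is always discarded, because it is treated as too common to help distinguish sentences. Sentences are then scored using only the surviving words.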

Finally, we join the highest-ranked sentences into a single string, which is our final summary.


Summary

That’s it! Try it out with any URL and you’ll get a pretty decent summary. Many other summarization approaches have been proposed (for example, weighting words by TF-IDF instead of raw frequency), and there’s much more that could be done with this algorithm. For example, try improving how the text is filtered. If you have any suggestions or recommendations, I’d love to hear them, so comment below!

Bots are here to stay. Here are strong reasons why.

For the past several years, I’ve dedicated my life to learning about, designing, building, and writing about chatbots. The main reason chatbots, and conversational interfaces in general, fascinate me is that they offer the most natural way for humans to interact with machines. The interaction is not only natural but also simple, clean, and focused on what you need right away. Think of Google’s search interface. All you can do is type search queries into a little text box. Everything that comes afterwards is magic.

Chatbots are still in their very early stages due to a few factors:

1. A gap between what chatbots can do and what users expect (in other words, bad user experience). This leads to instant disappointment and therefore to low usability. The problem spans from the largest bots, such as Siri, all the way down to the most basic ones, which confuse users about their actual capabilities. In theory, Siri claims she can do almost anything related to your iPhone, calendar, and other built-in Apple apps.


In reality, over 60% of what you ask Siri is either not understood or answered with generic web search results. To get an idea of what she can do, here’s a list of Siri commands. Did you know she could do most of these? I didn’t.

2. Educating users to form new habits. The last bot I built was focused purely on conversation, so it had no buttons or menus. Retention was very low, even though most conversations between the bot and its users went well. What I discovered was that the more buttons and menus I added, the higher retention grew. This led me to conclude that most users are still not used to talking with machines naturally and prefer to click buttons, as we’ve done for the past 30 years. Second, clicking buttons is faster than typing sentences. However, buttons are not faster than voice, which is why voice will eventually dominate the bot space. The shift from buttons to natural conversation is growing but is still in its early adoption stage.

3. Artificial intelligence has improved, but it is still in its early stages. I list this last rather than first because I truly believe we can build great chatbots with today’s AI solutions (such as Api.ai, Wit.ai, etc.). That would require bots to focus more on creating user habits and on offering a well-designed user experience that meets user expectations. You can read more about how to do this in a previous post I wrote on how to improve your chatbot in 3 simple steps. Obviously, AI will keep improving over time as more data is collected and models are trained across many domains.

For all the reasons above and more, we’re still far from seeing the true potential of chatbots. However, there are strong reasons why chatbots are here to stay and will improve exponentially over time:

The optimal Human to Machine interaction

If you think about it, over the past 30 years we’ve learned to adjust ourselves to the complexity and limitations of machines, through websites, applications, buttons, icons, etc. In my opinion, the optimal scenario is the opposite: machines should adjust themselves to us humans, both in understanding natural language and in personalization.

Humans should be able to ask a machine anything naturally, instead of having to learn new interfaces, products, and habits for every service they need.

For example, say you’d like to know the weather. Until recently, you’d have to pick a service out of many alternatives and learn how to use it. Since every service tries to be innovative and different from its competitors, each ends up with its own UX/UI, which means more learning and effort for users. The optimal solution would be to just ask. Thankfully, there are many great weather assistants today (such as Siri or Poncho), and many more are coming in other domains.

Domain specific AI

Not long ago, companies built virtual assistants that tried to cover a very wide and open scope, but they quickly realized how hard it is to understand natural language. Going back to Siri, Apple tried to capture many domains in order to present Siri as the ultimate personal assistant. This ambition failed very quickly. On the other hand, the AI solutions that have succeeded are the ones that focus narrowly on one specific domain.

Take Getdango, for example: an AI solution for predicting emojis. They’re doing a great job predicting emojis from natural language, and it’s due to their narrow focus. Another example is Meekan, a scheduling assistant for teams on Slack. Meekan is a chatbot dedicated to making scheduling events as easy as possible.

Synergy, where individual bots each focus on a specific domain, is the right approach to solving bigger AI challenges. You can see companies moving in this direction, such as Facebook Messenger’s recent release of its handover protocol, which enables two or more applications to collaborate. More importantly, Amazon partnered with Microsoft so that Alexa and Cortana can work together as a more powerful virtual assistant. If every bot focused on one specific domain, the race to AI as a whole would progress faster and more efficiently. Happily, that’s where we’re heading.

The power of long term relationships

Most products today are designed to maximize instant, short-term value for users once they open an application. While web and mobile applications focus on short-term value, bots can and should focus on long-term value. Bot developers should build relationships with users over time, so that the value users get grows with every interaction. Over time, bots should learn enough about what users need to personalize the experience and minimize input friction.

For example, say you’re looking for a travel planning service. Today, you would browse travel sites, fill in forms, and basically teach each site your preferences and filters every single time. A bot should instead know which information about the user is relevant to learn, such as personal details, preferences, budget, places the user has already visited, and so on. Bots should constantly learn from user behavior and offer much more personalized responses.

After some time, an ideal conversation with such a bot would look like this:

[Screenshot: example conversation with a personalized travel bot]

By now, the bot should know how many people are in your family, their ages, where you’ve already been, where you are right now, what you like and don’t like to do, and much more. In other words, the bot shouldn’t be any different from a real travel agent.

To see how much users are willing to share about themselves in a conversation, I ran a study with one of my bots. The bot started by asking users basic questions like “how old are you?” and worked up to more personal ones like “what are you most insecure about?”. Guess how many users answered all the questions truthfully? Over 85%. More specifically, 89% of women answered all questions truthfully, while “only” 81% of men did. So if you’re a bot developer, don’t worry about users not sharing their information. Worry about which questions you should ask to enhance users’ long-term value. This kind of information gathering is something today’s applications cannot achieve, and it’s where chatbots have a huge advantage.

Cross platform usability

In just a few years, mobile apps have become must-haves for smartphone users. But despite the increase in app usage and app choices, the number of apps used per user has stayed the same, according to a recent report from Nielsen. Most people are tired of downloading mobile apps and learning new interfaces.

In addition, research says that US citizens own on average 3.6 connected devices. These range from mobile devices to smart TVs and products like Amazon Alexa. That’s a lot of connected devices! Naturally, users would like to interact with your service on whatever device they’re using. So what do you do? Build an application for iOS, Android, smart TVs, Alexa, smart watches, iPad, Windows, Mac, and more? That sounds like a lot of work. And it will be very hard to get users to download your app in the first place, since they’re already flooded with other apps.

This is where the beauty of messaging platforms comes in. At present, approximately 75% of all smartphone users use some sort of messaging app, such as WhatsApp, WeChat, or Facebook Messenger. Over 1.2 billion people worldwide use Messenger, virtually everyone with a mobile phone has SMS, and most people have an email account. The list goes on. Instead of building applications and spending hundreds of thousands of dollars, focus on building your bot’s back end. For the front end, integrate your bot with multiple messaging platforms that are already on users’ devices, and you’re set. If your service brings value, users will come. More importantly, it turns out this is what users want.


The future and success of chatbots depend not only on the big 4 tech companies, but also on developers and entrepreneurs who keep innovating and pushing boundaries in AI and conversational interfaces. Most of today’s mistakes in the conversational user interface space are tomorrow’s improvements and solutions. Eventually, bots will bring great value that most of today’s applications cannot achieve.

To learn more about chatbots go ahead and read the chatbots beginners guide. If you want to start building one, read this post on how to develop a Facebook Messenger bot.

How to send push notifications with PHP

Sending push notifications to an iOS or Android application can greatly enhance the user experience while letting you deliver key information easily. However, sending push notifications can be tedious and at times confusing. You need to make sure you pack your integers and times correctly; if you don't, you'll probably get an unhelpful status from Apple or Google.

I've come across PHP scripts online for either the iOS or the Android implementation, but not for both. This PHP script implements both mobile operating systems.

PHP Script (For a description, scroll below the script):

function send_mobile_notification_request($user_mobile_info, $payload_info)
{
    // Default result
    $result = -1;
    // "production" uses the live APNs gateway; any other value uses the sandbox
    $pem_preference = "production";
    $user_device_type = $user_mobile_info['user_device_type'];
    $user_device_key = $user_mobile_info['user_mobile_token'];
    if ($user_device_type == "iOS") {
        // Note: this is Apple's legacy binary APNs interface, which Apple has since retired
        $apns_url = NULL;
        $apns_cert = NULL;
        // Apple server listening port
        $apns_port = 2195;
        if ($pem_preference == "production") {
            $apns_url = 'gateway.push.apple.com';
            $apns_cert = __DIR__ . '/cert-prod.pem';
        }
        // Development .pem
        else {
            $apns_url = 'gateway.sandbox.push.apple.com';
            $apns_cert = __DIR__ . '/cert-dev.pem';
        }
        $stream_context = stream_context_create();
        stream_context_set_option($stream_context, 'ssl', 'local_cert', $apns_cert);
        $apns = stream_socket_client('ssl://' . $apns_url . ':' . $apns_port, $error, $error_string, 2, STREAM_CLIENT_CONNECT, $stream_context);
        // Frame: command 0, 2-byte big-endian token length, binary token,
        // 2-byte big-endian payload length, payload
        $device_token = pack('H*', str_replace(' ', '', $user_device_key));
        $apns_message = chr(0) . pack('n', strlen($device_token)) . $device_token . pack('n', strlen($payload_info)) . $payload_info;
        if ($apns) {
            $result = fwrite($apns, $apns_message);
            // Only close the stream if the connection succeeded
            fclose($apns);
        }
    }
    else if ($user_device_type == "Android") {
        // API access key from Google API's Console (replace the placeholder)
        // Note: Google has shut down this GCM endpoint; FCM is its successor
        $api_access_key = 'ADD_YOUR_API_KEY_HERE';
        // Prep the bundle
        $msg = array
        (
            'message' => json_decode($payload_info)->aps->alert,
            'title' => 'This is a title. title',
            'subtitle' => 'This is a subtitle. subtitle',
            'tickerText' => 'Ticker text here...Ticker text here...',
            'vibrate' => 1,
            'sound' => 1,
            'largeIcon' => 'large_icon',
            'smallIcon' => 'small_icon'
        );
        $fields = array
        (
            'registration_ids' => array($user_device_key),
            'data' => $msg
        );
        $headers = array
        (
            'Authorization: key=' . $api_access_key,
            'Content-Type: application/json'
        );
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'https://android.googleapis.com/gcm/send');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
        // Keep TLS certificate verification enabled
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($fields));
        // With RETURNTRANSFER off, curl_exec returns true on success, false on failure
        $result = curl_exec($ch);
        curl_close($ch);
    }
    return $result > 0;
}

function create_payload_json($message) {
    // Badge number shown on the user's iOS app icon after receiving the notification
    $badge = "0";
    $sound = 'default';
    $payload = array();
    $payload['aps'] = array('alert' => $message, 'badge' => intval($badge),'sound' => $sound);
    return json_encode($payload);
}

Description

Let's start. The first function builds the notification request for the user's operating system and sends it. Push notifications are delivered first to Apple's or Google's servers, and only then to the end user's device. Therefore, each device holds a unique token that identifies it. Learn more about how to retrieve the user device key in Android or iOS.

I wrote the main function so that it takes two inputs. The first, $user_mobile_info, is an array containing the user's device type and unique device key. The second, $payload_info, is a JSON string containing the body of the push notification (built by the second function). The $pem_preference variable inside the function is hard-coded, but you can change it to suit your needs. Apple offers two servers: a sandbox for development and QA (gateway.sandbox.push.apple.com) and a regular one for production (gateway.push.apple.com). If you're in the testing phase, set $pem_preference to any value other than "production". The script then connects to the sandbox with your development certificate.
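The iOS branch writes Apple's legacy binary "simple notification" frame. The frame consists of a command byte (0), the token length as a 2-byte big-endian integer, the 32-byte token, the payload length as a 2-byte big-endian integer, and then the payload itself. This is where packing integers correctly matters. The Python sketch below builds the same bytes with struct.pack so you can inspect them. build_apns_frame is a hypothetical helper, and it only builds the frame without opening any connection. Apple has since retired this binary interface in favor of its HTTP/2-based API, so treat the sketch as an illustration of the format.

```python
import struct


def build_apns_frame(device_token_hex: str, payload: str) -> bytes:
    """Build a legacy APNs 'simple notification' frame (command 0)."""
    # Same as PHP's pack('H*', ...): hex string -> raw bytes, spaces removed
    token = bytes.fromhex(device_token_hex.replace(" ", ""))
    if len(token) != 32:
        raise ValueError("device token must be 32 bytes (64 hex characters)")
    # Length is measured in bytes, like PHP's strlen
    body = payload.encode("utf-8")
    # "!" = network (big-endian) byte order, B = 1-byte command, H = 2-byte length
    header = struct.pack("!BH", 0, len(token))
    return header + token + struct.pack("!H", len(body)) + body


frame = build_apns_frame("ab" * 32, '{"aps":{"alert":"hi"}}')
print(frame[:3].hex(), len(frame))
# 000020 59
```

The PHP version originally hard-coded the lengths with chr(), which writes a single byte. A 2-byte field written that way is only correct while the value stays below 256, which is the kind of silent packing mistake that leads to unhelpful errors from the server.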

The second function builds the message body. I've hard-coded some variables, such as the sound and badge. The sound can be changed to various options, and the badge sets the number shown on the app icon when the user receives the notification. I've set it to "0", meaning no badge icon will appear when notifications arrive.
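For reference, here is the same payload structure sketched in Python. It shows exactly what JSON both branches consume: the iOS branch sends it as-is, and the Android branch reads aps.alert back out of it. The Python create_payload_json and the alert_text helper are hypothetical mirrors of the PHP code, not part of the original script.

```python
import json


def create_payload_json(message, badge=0, sound="default"):
    """Python mirror of the PHP create_payload_json: alert, badge and sound under "aps"."""
    payload = {"aps": {"alert": message, "badge": badge, "sound": sound}}
    # Compact separators match json_encode's output for simple strings like this one
    return json.dumps(payload, separators=(",", ":"))


def alert_text(payload_json):
    """Read the alert back, as the Android branch does with json_decode(...)->aps->alert."""
    return json.loads(payload_json)["aps"]["alert"]


payload = create_payload_json("I know how to send push notifications!")
print(payload)
# {"aps":{"alert":"I know how to send push notifications!","badge":0,"sound":"default"}}
print(alert_text(payload))
# I know how to send push notifications!
```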

Usage Example

The main part of the push notification is the message itself. Let's say the notification we want to send is "I know how to send push notifications!". We'll first create the payload JSON using the second method:

$payload = create_payload_json("I know how to send push notifications!");

Say the user has an iOS device (this info can be stored for each user on a server, in a database, etc.), and the array is as follows:

$user_mobile_info = ['user_device_type'=>"iOS", 'user_mobile_token'=>'1234ABCD'];

Now we can send the notification itself using the first method: 

send_mobile_notification_request($user_mobile_info, $payload);

There are many minor details that this post doesn't cover. Feel free to leave a comment if you have any further questions.

To learn more coding fundamentals, visit devclass.io.