How to Build the Ultimate AI Automation with Multi-Agent Collaboration

Learn how to build an autonomous research assistant using LangGraph with a team of specialized AI agents

It has only been a year since the initial release of GPT Researcher, but methods for building, testing, and deploying AI agents have already evolved significantly. That's simply the nature and speed of current AI progress. What started as simple zero-shot or few-shot prompting has quickly evolved into agent function calling, RAG, and now agentic workflows (aka "flow engineering").

Andrew Ng has recently stated, “I think AI agent workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it.”

In this article you will learn why multi-agent workflows are the current best practice and how to build an optimal autonomous multi-agent research assistant using LangGraph.

To skip this tutorial, feel free to check out the final code implementation of GPT Researcher x LangGraph here.

Introducing LangGraph

LangGraph is an extension of LangChain aimed at creating agent and multi-agent flows. It adds in the ability to create cyclical flows and comes with memory built in — both important attributes for creating agents.

LangGraph provides developers with a high degree of controllability, which is important for creating custom agents and flows. Nearly all agents in production are customized toward the specific use case they are trying to solve. LangGraph gives you the flexibility to create arbitrary customized agents, while providing an intuitive developer experience for doing so.

Enough with the smalltalk, let’s start building!

Building the Ultimate Autonomous Research Agent

With LangGraph, the research process can be significantly improved in depth and quality by leveraging multiple agents with specialized skills. Having every agent focus on a single specialized skill allows for better separation of concerns, customizability, and further development at scale as the project grows.

Inspired by the recent STORM paper, this example showcases how a team of AI agents can work together to conduct research on a given topic, from planning to publication. This example will also leverage the leading autonomous research agent GPT Researcher.

The Research Agent Team

The research team consists of seven LLM agents:

  • Chief Editor — Oversees the research process and manages the team. This is the “master” agent that coordinates the other agents using LangGraph. This agent acts as the main LangGraph interface.

  • GPT Researcher — A specialized autonomous agent that conducts in depth research on a given topic.

  • Editor — Responsible for planning the research outline and structure.

  • Reviewer — Validates the correctness of the research results given a set of criteria.

  • Reviser — Revises the research results based on the feedback from the reviewer.

  • Writer — Responsible for compiling and writing the final report.

  • Publisher — Responsible for publishing the final report in various formats.

Architecture

As seen below, the automation process is based on the following stages: Planning the research, data collection and analysis, review and revision, writing the report and finally publication:


More specifically the process is as follows:

  • Browser (gpt-researcher) — Browses the internet for initial research based on the given research task. This step is crucial for LLMs to plan the research process based on up-to-date and relevant information, rather than relying solely on pre-trained data for a given task or topic.

  • Editor — Plans the report outline and structure based on the initial research. The Editor is also responsible for triggering the parallel research tasks based on the planned outline.

  • For each outline topic (in parallel):

      • Researcher (gpt-researcher) — Runs in-depth research on the subtopic and writes a draft. This agent leverages the GPT Researcher Python package under the hood for optimized, in-depth, and factual research reports.

      • Reviewer — Validates the correctness of the draft given a set of guidelines and provides feedback to the reviser (if any).

      • Reviser — Revises the draft until it is satisfactory based on the reviewer feedback.

  • Writer — Compiles and writes the final report including an introduction, conclusion and references section from the given research findings.

  • Publisher — Publishes the final report to multi formats such as PDF, Docx, Markdown, etc.

We will not dive into all the code since there’s a lot of it, but focus mostly on the interesting parts I’ve found valuable to share.

Define the Graph State

One of my favorite features with LangGraph is state management. States in LangGraph are facilitated through a structured approach where developers define a GraphState that encapsulates the entire state of the application. Each node in the graph can modify this state, allowing for dynamic responses based on the evolving context of the interaction.

As with the start of any technical design, considering the data schema throughout the application is key. In this case we'll define a ResearchState like so:

from typing import TypedDict, List

class ResearchState(TypedDict):
    task: dict
    initial_research: str
    sections: List[str]
    research_data: List[dict]
    # Report layout
    title: str
    headers: dict
    date: str
    table_of_contents: str
    introduction: str
    conclusion: str
    sources: List[str]
    report: str

As seen above, the state is divided into two main areas: the research task and the report layout content. As data circulates through the graph agents, each agent will, in turn, generate new data based on the existing state and update it for subsequent processing further down the graph with other agents.

We can then initialize the graph with the following:

from langgraph.graph import StateGraph
workflow = StateGraph(ResearchState)

Initializing the graph with LangGraph

As stated above, one of the great things about multi-agent development is building each agent with specialized and scoped skills. Let's take the Researcher agent as an example, using the GPT Researcher Python package:

from gpt_researcher import GPTResearcher

class ResearchAgent:
    def __init__(self):
        pass
  
    async def research(self, query: str, parent_query: str = ""):
        # Initialize the researcher with the given query (and optional parent query for subtopics)
        researcher = GPTResearcher(parent_query=parent_query, query=query, report_type="research_report", config_path=None)
        # Conduct research on the given query
        await researcher.conduct_research()
        # Write the report
        report = await researcher.write_report()
  
        return report

As you can see above, we've created an instance of the Research agent. Now let's assume we've done the same for each of the team's agents. After creating all of the agents, we'd initialize the graph with LangGraph:

def init_research_team(self):
    # Initialize agents
    editor_agent = EditorAgent(self.task)
    research_agent = ResearchAgent()
    writer_agent = WriterAgent()
    publisher_agent = PublisherAgent(self.output_dir)
    
    # Define a LangGraph StateGraph with the ResearchState
    workflow = StateGraph(ResearchState)
    
    # Add nodes for each agent
    workflow.add_node("browser", research_agent.run_initial_research)
    workflow.add_node("planner", editor_agent.plan_research)
    workflow.add_node("researcher", editor_agent.run_parallel_research)
    workflow.add_node("writer", writer_agent.run)
    workflow.add_node("publisher", publisher_agent.run)
    
    workflow.add_edge('browser', 'planner')
    workflow.add_edge('planner', 'researcher')
    workflow.add_edge('researcher', 'writer')
    workflow.add_edge('writer', 'publisher')
    
    # Set up the start and end nodes (END is imported from langgraph.graph)
    workflow.set_entry_point("browser")
    workflow.add_edge('publisher', END)
    
    return workflow

As seen above, creating the LangGraph graph is very straightforward and consists of three main functions: add_node, add_edge and set_entry_point. With these functions you first add the nodes to the graph, then connect the edges, and finally set the starting point.
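
To actually execute the graph, you compile it and invoke it with an initial state. Below is a minimal sketch; the task dict mirrors the task.json shown later in this article, the remaining ResearchState fields are filled in by the agents as the run progresses, and exact method names may vary slightly between LangGraph versions:

workflow = self.init_research_team()
chain = workflow.compile()
# Kick off a run with the initial state
result = await chain.ainvoke({"task": task})
print(result["report"])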

Focus check: If you’ve been following the code and architecture properly, you’ll notice that the Reviewer and Reviser agents are missing in the initialization above. Let’s dive into it!

A Graph within a Graph to support stateful Parallelization

This was the most exciting part of my experience working with LangGraph! A key feature of this autonomous assistant is running each research task in parallel, with each draft reviewed and revised based on a set of predefined guidelines.

Knowing how to leverage parallel work within a process is key for optimizing speed. But how would you trigger parallel agent work if all agents report to the same state? This can cause race conditions and inconsistencies in the final report. To solve this, you can create a sub-graph that is triggered from the main LangGraph instance. This sub-graph holds its own state for each parallel run, which solves the issues raised above.

As we've done before, let's define the LangGraph state and its agents. Since this sub-graph reviews and revises a research draft, we'll define the state with draft information:

class DraftState(TypedDict):
    task: dict
    topic: str
    draft: dict
    review: str
    revision_notes: str

As seen in the DraftState, we mostly care about the topic being discussed and the review and revision notes, which the reviewer and reviser use to communicate with each other until the subtopic research report is finalized. To create this cyclical flow, we'll take advantage of the last important piece of LangGraph: conditional edges.

async def run_parallel_research(self, research_state: dict):
    # The researcher, reviewer and reviser agents are assumed to be initialized as shown earlier
    workflow = StateGraph(DraftState)
    
    workflow.add_node("researcher", research_agent.run_depth_research)
    workflow.add_node("reviewer", reviewer_agent.run)
    workflow.add_node("reviser", reviser_agent.run)
    
    # set up edges researcher->reviewer->reviser->reviewer...
    workflow.set_entry_point("researcher")
    workflow.add_edge('researcher', 'reviewer')
    workflow.add_edge('reviser', 'reviewer')
    workflow.add_conditional_edges('reviewer',
                                   (lambda draft: "accept" if draft['review'] is None else "revise"),
                                   {"accept": END, "revise": "reviser"})

With the conditional edges defined, the graph routes to the reviser whenever the reviewer leaves review notes; otherwise, the cycle ends with the final draft. If you go back to the main graph we've built, you'll see that this parallel work runs under the node named "researcher", which is called by the ChiefEditor agent.
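
To complete the sketch, the sub-graph can then be compiled and invoked once per planned section, so that each parallel run holds its own DraftState. Roughly, continuing run_parallel_research above (field names follow the states we defined, and asyncio is assumed to be imported):

    chain = workflow.compile()
    # Run the draft sub-graph for every planned section in parallel, each with its own DraftState
    drafts = await asyncio.gather(*[
        chain.ainvoke({"task": research_state.get("task"), "topic": topic})
        for topic in research_state.get("sections", [])
    ])
    return {"research_data": drafts}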

Running the Research Assistant

After finalizing the agents, states and graphs, it’s time to run our research assistant! To make it easier to customize, the assistant runs with a given task.json file:

{
  "query": "Is AI in a hype cycle?",
  "max_sections": 3,
  "publish_formats": {
    "markdown": true,
    "pdf": true,
    "docx": true
  },
  "follow_guidelines": false,
  "model": "gpt-4-turbo",
  "guidelines": [
    "The report MUST be written in APA format",
    "Each sub section MUST include supporting sources using hyperlinks. If none exist, erase the sub section or rewrite it to be a part of the previous section",
    "The report MUST be written in spanish"
  ]
}

The task object is pretty self-explanatory; however, note that when follow_guidelines is false, the graph skips the revision step and ignores the defined guidelines. Also, the max_sections field defines how many subheaders to research; fewer sections generate a shorter report.

Running the assistant will result in a final research report in formats such as Markdown, PDF and Docx.
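
For illustration, kicking off a run could look roughly like the sketch below. The ChiefEditorAgent class and its method name are assumptions based on the description above; check the open source page linked below for the actual entry point.

import asyncio
import json

async def main():
    with open("task.json", "r") as f:
        task = json.load(f)
    # Hypothetical entry point: the "master" agent that builds and runs the LangGraph described above
    chief_editor = ChiefEditorAgent(task)
    await chief_editor.run_research_task()

asyncio.run(main())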

To download and run the example check out the GPT Researcher x LangGraph open source page.

What’s Next?

Going forward, there are exciting things to think about. Human-in-the-loop is key for optimized AI experiences. Having a human help the assistant revise and focus on just the right research plan, topics, and outline would enhance the overall quality and experience. More generally, allowing human intervention throughout the AI flow ensures correctness, a sense of control, and more deterministic results. It's great to see that LangGraph already supports this out of the box, as seen here.

In addition, support for research over both web and local data would be key for many business and personal use cases.

Lastly, more effort can be put into improving the quality of retrieved sources and making sure the final report follows an optimal storyline.

A step forward for LangGraph and multi-agent collaboration as a whole would be assistants that can plan and generate graphs dynamically based on given tasks. This vision would allow assistants to choose only a subset of agents for a given task and plan their strategy based on the graph fundamentals presented in this article, opening a whole new world of possibilities. Given the pace of innovation in the AI space, it won't be long before a new disruptive version of GPT Researcher is launched. Looking forward to what the future brings!

To keep track of this project’s ongoing progress and updates please join our Discord community. And as always, if you have any feedback or further questions, please comment below!

Introducing GPT Researcher - The Future of Online Research

After AutoGPT was published, I immediately took it for a spin. The first use case that came to mind was autonomous online research. Forming objective conclusions for manual research tasks can take time, sometimes weeks, to find the right resources and information. Seeing how well AutoGPT created tasks and executed them got me thinking about the great potential of using AI to conduct comprehensive research and what it meant for the future of online research.

But the problem with AutoGPT was that it usually ran into never-ending loops, required human interference for almost every step, constantly lost track of its progress, and almost never actually completed the task.

On top of that, the information and context gathered during the research task were often lost (such as keeping track of sources), and results were sometimes hallucinated.

The passion for leveraging AI for online research and the limitations I found put me on a mission to try and solve it while sharing my work with the world. This is when I created GPT Researcher — an open source autonomous agent for online comprehensive research.

In this article, I will share the steps that guided me toward the proposed solution.



Moving from infinite loops to deterministic results

The first step in solving these issues was to seek a more deterministic solution that could ultimately guarantee completing any research task within a fixed time frame, without human interference.

This is when I stumbled upon the recent paper Plan and Solve. The paper aims to provide a better solution for the challenges stated above. The idea is quite simple and consists of two components: first, devising a plan to divide the entire task into smaller subtasks and then carrying out the subtasks according to the plan.

Example inputs and outputs of GPT-3 with (a) Zero-shot-CoT prompting, (b) Plan-and-Solve (PS) prompting

Applied to research, the idea is to first create an outline of questions related to the task, and then deterministically execute an agent for every outline item. This approach eliminates the uncertainty of task completion by breaking the agent's work into a deterministic, finite set of tasks. Once all tasks are completed, the agent concludes the research.
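
In code, the plan-and-solve idea boils down to something like the sketch below, where the helper functions are purely illustrative:

def run_research(task: str):
    # 1. Plan: generate a fixed outline of research questions for the task (hypothetical helper)
    questions = generate_outline_questions(task)
    # 2. Solve: deterministically execute a research agent for every outline item (hypothetical helper)
    summaries = [research_question(task, question) for question in questions]
    # 3. Conclude the research once all planned sub-tasks are completed (hypothetical helper)
    return write_report(task, summaries)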

Following this strategy has improved the reliability of completing research tasks to 100%. Now the challenge is, how to improve quality and speed?

Aiming for objective and unbiased results

The biggest challenge with LLMs is the lack of factuality and the bias in responses, caused by hallucinations and out-of-date training sets (GPT is currently trained on datasets from 2021). The irony is that for research tasks, it is crucial to optimize for exactly these two criteria: factuality and objectivity.

To tackle these challenges, I assumed the following:

  1. Law of large numbers — More content will lead to less biased results. Especially if gathered properly.

  2. Leveraging LLMs to summarize factual information can significantly improve the overall factuality of results.

After experimenting with LLMs for quite some time, I can say that the areas where foundation models excel are in the summarization and rewriting of given content. So, in theory, if LLMs only review given content and summarize and rewrite it, potentially it would reduce hallucinations significantly.

In addition, assuming the given content is unbiased, or at least holds opinions and information from all sides of a topic, the rewritten result would also be unbiased. So how can content be unbiased? The law of large numbers. In other words, if enough sites that hold relevant information are scraped, the possibility of biased information reduces greatly. So the idea would be to scrape just enough sites together to form an objective opinion on any topic.

Great! It sounds like, for now, we have an idea for how to create deterministic, factual, and unbiased results. But what about the speed problem?

Speeding up the research process

Another issue with AutoGPT is that it works synchronously. Its main idea is to create a list of tasks and then execute them one by one. So if, let's say, a research task requires visiting 20 sites, and each site takes around one minute to scrape and summarize, the overall research task would take at least 20 minutes. That's assuming it ever stops. But what if we could parallelize agent work?

By leveraging Python libraries such as asyncio, the agent tasks have been optimized to run in parallel, thus significantly reducing the time to research.

# Create a list to hold the coroutine agent tasks
tasks = [async_browse(url, query, self.websocket) for url in await new_search_urls]

# Gather the results as they become available
responses = await asyncio.gather(*tasks, return_exceptions=True)

In the example above, we trigger scraping for all URLs in parallel, and only once they are all done do we continue with the task. Based on many tests, an average research task takes around three minutes (!!). That's 85% faster than AutoGPT.

Finalizing the research report

Finally, after aggregating as much information as possible about a given research task, the challenge is to write a comprehensive report about it.

After experimenting with several OpenAI models and even open source ones, I've concluded that the best results are currently achieved with GPT-4. The task is straightforward — provide GPT-4 with all the aggregated information as context, and ask it to write a detailed report given the original research task.

The prompt is as follows:

"{research_summary}" Using the above information, answer the following question or topic: "{question}" in a detailed report — The report should focus on the answer to the question, should be well structured, informative, in depth, with facts and numbers if available, a minimum of 1,200 words and with markdown syntax and apa format. Write all source urls at the end of the report in apa format. You should write your report only based on the given information and nothing else.

The results are quite impressive, with some minor hallucinations in very few samples, but it’s fair to assume that as GPT improves over time, results will only get better.
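
For context, filling in the prompt template above and sending it to GPT-4 looked roughly like the sketch below, using the openai SDK as it existed at the time (the variable names match the template placeholders):

import openai

prompt = (
    f'"{research_summary}" Using the above information, answer the following '
    f'question or topic: "{question}" in a detailed report ...'  # rest of the prompt shown above
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
report = response["choices"][0]["message"]["content"]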

The final architecture

Now that we’ve reviewed the necessary steps of GPT Researcher, let’s break down the final architecture, as shown below:

More specifically:

  • Generate an outline of research questions that form an objective opinion on any given task.

  • For each research question, trigger a crawler agent that scrapes online resources for information relevant to the given task.

  • For each scraped resource, keep track, filter, and summarize only if it includes relevant information.

  • Finally, aggregate all summarized sources and generate a final research report.

Going forward

The future of online research automation is heading toward a major disruption. As AI continues to improve, it is only a matter of time before AI agents can perform comprehensive research tasks for any of our day-to-day needs. AI research can disrupt areas of finance, legal, academia, health, and retail, reducing the time spent on each research task by 95% while optimizing for factual and unbiased reports amid an ever-growing overload of online information.

Imagine if an AI could eventually understand and analyze any form of online content — videos, images, graphs, tables, reviews, text, audio. And imagine if it could support and analyze hundreds of thousands of words of aggregated information within a single prompt. Imagine AI eventually improving in reasoning and analysis, making it far more suitable for reaching new and innovative research conclusions. And imagine that it could do all that in minutes, if not seconds.

It’s all a matter of time and what GPT Researcher is all about.

Why Open Source Models May Not Win The AI Race

The rapid progress in open source models, especially language models, has given rise to the belief that these models could pose a significant challenge to incumbent companies such as Google, Facebook and Microsoft.

While the idea of open source models dismantling the power of tech giants is appealing to many, it is important to remain realistic and recognize the major barriers the open source community still faces when trying to contend with proprietary models.

Open source AI: The successful underdog

In recent months, the open source AI community has witnessed significant advancements, starting with Meta's LLaMA — the first impressive open source LLM that was leaked online. This sparked an influx of innovative research and creative development to rival ChatGPT, such as Falcon, Alpaca, and Vicuna.

As seen below, these open source models are catching up, with Vicuna almost surpassing Google Bard. Even though it uses far fewer parameters than ChatGPT (13B compared to 175B), Vicuna was introduced as an "open source chatbot with 90% ChatGPT quality" and scored well in the tests performed.

One highlight was the community's solution to a scalability problem with the new fine-tuning technique LoRA, which can perform fine-tuning at a fraction of the normal cost.

In addition, consulting firm Semianalysis leaked a document by a Google engineer that even considered the open source community a real competitor, arguing that “we have no moat” and that they couldn’t withstand the power of open source labor across the globe.

The strongest case for open source models is their potential to disrupt smaller, niche-specific domains, where powerful AI may not be needed and training smaller, cheaper open source models might be good enough. In addition, open source models are easier to customize, provide more transparency into training data, and give users better control over costs, outputs, privacy, and security.

However, beneath the surface, there are several drawbacks to open source AI, revealing that it is far from being a threat to incumbents.

Incumbents have more resources and capital.

During an event in Delhi, Sam Altman, the CEO of OpenAI, was questioned if it would be possible for three Indian engineers with a budget of $10 million to develop a project akin to OpenAI. Altman responded by asserting that it would be practically “hopeless” for such a young team with limited resources from India to create an AI model comparable to OpenAI. He further commented, “We’re going to tell you, it’s absolutely hopeless to compete with us on building foundation models, and you shouldn’t attempt it.”

Now you might be thinking, “well, he has to say that”, but the rationale behind his answer makes total sense. The way OpenAI’s chief scientist Ilya Sutskever explains it in short, is that “the more GPU, the better the models”. Or in other words, the more capital and resources, the better models can get. Ilya believes that academic research has and always will lag behind the powerful incumbents due to a lack of capital and engineering resources.

Since 2021, incumbents have quickly entered the race, offering both closed and open models that currently outperform the smaller ones, as seen below.

The flaws and drawbacks of mimicking LLMs

A critical issue with open source models lies in the algorithms used when trying to compete with proprietary models. A new study from UC Berkeley has identified that even though open source AI models have adopted clever techniques, such as fine-tuning on the outputs of better models, there remains a significant "capabilities gap" between open and closed models. Models fine-tuned this way adopt the style of the model they imitate but don't improve in factuality. As a result, the open source community still faces the daunting challenge of pre-training state-of-the-art models — a task unlikely to be accomplished anytime soon.

Another inherent obstacle for open source AI is the hardware limitation of on-device inference. While the idea of running small and inexpensive customized LMs on local devices (such as computers or smartphones) is appealing, the current hardware limitations render it impossible to match the performance of proprietary server-hosted models like those offered by Google or OpenAI.

Technical challenges like the memory wall and data reuse problems, unresolved hardware trade-offs, and lack of parallelization across queries mean that open source AI models have an upper limit on their achievable efficacy.

Incumbents actually do have a moat

The business strategies executed by incumbent companies provide these firms with undeniable advantages that open source models cannot contend with. Contrary to popular belief, incumbents have powerful moats that protect their position. Not only do they have the financial resources, talent, brand, proprietary data, and power, but they also control the distribution of products that billions of people use daily. Leading tech companies are leveraging these advantages by enhancing their products and making AI an integral part of their offerings.

For example, Google has recently released new AI capabilities to their Search, Gmail, and Docs. Microsoft not only released similar capabilities to Office and Bing, but also partnered with OpenAI to offer a complete suite of cloud services on Azure to help businesses leverage AI at scale. Moreover, Microsoft owns GitHub and Visual Studio, two leading services for software engineers worldwide, and has already made a remarkable leap with Copilot. Amazon followed with new AWS services such as Amazon Bedrock and CodeWhisperer. If that's not enough, Nvidia has recently launched DGX Cloud, an AI supercomputing service that gives enterprises immediate access to the infrastructure and software needed to train advanced models for generative AI and other groundbreaking applications.

This is just the tip of the iceberg, with many more examples from market leaders such as Adobe Firefly, Salesforce, IBM, and more.

The theory of the innovator’s dilemma might suggest that under the right circumstances, open source models have a chance to overthrow incumbent companies. The reality is far from this idealized scenario. Generative AI perfectly fits into the product suites and ecosystems that leading technology companies have already established.

On the other hand, start-ups and open source models have a great opportunity to disrupt domain-specific verticals, such as finance, legal, or other professional services, where incumbents are unlikely to compete or focus their efforts, and where open source models may prove advantageous, as they can be more customized and optimized for such verticals.

The Uphill Battle Open Source AI Models Continue to Face

In summary, while the progress made in the realm of open source models is indeed impressive, they still face many challenges that prevent them from competing with proprietary models in the long term. It is also fair to assume that proprietary models will only improve over time, becoming much cheaper, faster, customizable, safer, and more powerful.

From algorithmic to hardware restraints and the entrenched advantage of incumbents, open source models face an uphill battle in achieving the level of success and acceptance necessary to unsettle established giants. As AI becomes increasingly commoditized with the products and services offered by incumbent companies, the likelihood of open source models effectively upending the current dynamics diminishes further.

The future of open source models shows great potential, but it is important to maintain a grounded perspective and recognize the immense obstacles that stand in its way.

The Ultimate Tech Stack for Building AI Products

A Peek Into the Tech Stack That Powered My Viral AI Web Application

This is the era of solopreneurs. It has never been easier to build end-to-end AI-powered applications, thanks to the most recent developments in AI and developer-friendly frameworks in particular. What used to be a complex domain, mainly led by experienced data scientists, is now democratized and is as straightforward as calling an API (and will only get better and easier over time). However, with all the noise, hype, and continuous advancements in the field, it is difficult to know where to focus and with what stack.

In July 2022, I released Cowriter, an AI-powered text editor aimed at writers. The product quickly went viral and grew to over 500K users worldwide within a few months. In the process, I learned the challenges and benefits of using various tech stacks at scale. Building impressive AI demos is one thing, but scaling AI applications to millions is another, and where most current frameworks fail.

In this post, I will share with you the entire tech stack used for building my viral AI web application. You may already be familiar with some of this stack, so my goal is not only to introduce them to you, but also to help you gain intuition on why they were optimal for my needs. Let’s dive right in!

OpenAI API

This one is less obvious than you might think. From all the AI services out there, which one is the best? What defines the best? For me, the best is about quality, reliability, security, performance and pricing.

Over the past few years, I have researched numerous AI services, including OpenAI, AI21, Anthropic, Bloom, GPT-J, and more. Recently, there has been a significant increase in open source models said to rival much larger proprietary ones, such as Falcon, Alpaca, Vicuna, and LLaMA. However, these models currently require LLM/ML Ops experience, as well as investing in your own hosting and monitoring. As a solopreneur, you need to decide which parts of the business to focus your efforts on.

The conclusions and research of these models deserve their own post. However, the TL;DR is that OpenAI's GPT models are far superior in terms of performance and reliability when compared to their competitors. Although OpenAI is on the higher end of pricing, you can build powerful AI tools using gpt-3.5-turbo for only $0.002 per ~750 words. When you do the math, it is quite remarkable how cheap it is.
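
For reference, a single call to gpt-3.5-turbo with the openai SDK of the time looked roughly like this (the prompt content is illustrative):

import openai

openai.api_key = "sk-..."  # your OpenAI API key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful writing assistant."},
        {"role": "user", "content": "Write a catchy opening paragraph about remote work."},
    ],
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])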

In addition, OpenAI offers a fine-tuning API, which is very easy to use. I leveraged fine-tuning to create a model to generate content optimized for SEO and marketing.

Replicate

While building an editor for blog writers, it was important to leverage image generation as well. While DALL-E is the obvious API choice, I was frustrated by its lack of dimension options (such as 1024x768, which is optimal for blogs), and it is pretty expensive. Also, DALL-E only hosts your images for one hour, so you need to integrate with a hosting service such as AWS S3 to keep generated images for the long term. Lastly, I was determined to find an alternative closer to Midjourney's quality. That's when I discovered Replicate.

Replicate makes it easy to deploy and host machine learning models and run them with a few lines of code, without needing to understand how machine learning works. One particularly powerful and viral model hosted on Replicate was trained on Midjourney images and shows some pretty impressive results! Moreover, the model allows you to define custom widths, heights, and the number of outputs. On top of that, it's fairly cheap (much cheaper than DALL-E), and you pay as you go, based on inference computation time.
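
To give a feel for the developer experience, running a hosted model with the replicate Python client looks roughly like the sketch below; the model slug, version hash, and parameters are placeholders for illustration:

import replicate  # requires the REPLICATE_API_TOKEN environment variable

# Placeholder model slug and version hash; pick the image model you want from replicate.com
output = replicate.run(
    "owner/model-name:version-hash",
    input={
        "prompt": "a watercolor illustration of a lighthouse at dawn",
        "width": 1024,
        "height": 768,
        "num_outputs": 1,
    },
)
print(output)  # typically a list of hosted image URLs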

Langchain

LangChain is a framework for developing applications powered by large language models (LLMs). It’s an open-source library that provides developers with the tools to build applications using LLMs, such as OpenAI or other similar models. LangChain simplifies the creation of these applications and connects to the AI models you want to work with.

Without a doubt, Langchain is a must for all developers and the definitive leader among developer AI frameworks. Not only does Langchain have the largest community of contributors and support, but it is built by developers for developers. There are numerous examples and snippets online showing "how to build a chatpdf with 10 lines of code". And it's true, it's that easy to build LLM chains and applications.

Langchain offers various solutions for building AI agents. It even supports the latest autonomous agent solutions such as BabyAGI and AutoGPT. One very powerful and popular feature of Langchain is ReAct, which combines LLM reasoning with actions, using the given tools only when needed. To learn more about ReAct and how to use it with Langchain, see here.

Pinecone

Vector databases are the hottest kid on the block right now. With investments exceeding $1 billion in funding over the past year, it is essential to comprehend why Vector DBs, such as Pinecone, should be incorporated into almost any AI application.

Vector databases are specialized databases designed to store and manage high-dimensional data represented as vectors. These databases are highly efficient at dealing with complex data and allow for quick searches of similar items based on specific criteria. This is especially important when working with LLMs such as GPT, as it helps overcome the token limit issue.

However, it is important not to be fooled by the idea of "what if there was no token limit?". When working with AI models and services, you pay per inference, or in other words, per request based on the number of tokens sent to the model. Sending over 100,000 words with every request would be both slow and expensive. Vector databases make it easy to retrieve only the data related to the user's context, based on text similarity, thus optimizing both data relevancy and costs.

In addition, due to the nature of vector databases, they can serve as long term memory for AI agents, by retrieving relevant “memories” from past interactions related to the conversation context.

Choosing the right database varies greatly based on many considerations. The leading ones to consider are Pinecone, Chroma, Milvus, Weaviate, Vespa, and Elastic Search. Some are open-sourced, while others are not; some offer customizability while others are aimed at simplicity.

I decided to go with Pinecone due to the simplicity of using a hosted service, which takes care of most of the burden for me. Additionally, Pinecone has a very large community, is well-funded, highly scalable, and has been around long enough (since 2019).
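
To make this concrete, here is a rough sketch of the typical pattern with the pinecone-client and openai SDKs of the time: embed your content, upsert it, then at query time embed the user's question and fetch only the most similar chunks. The index name and document text are made up for illustration.

import openai
import pinecone

pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-west1-gcp")
index = pinecone.Index("cowriter-docs")  # hypothetical index name

def embed(text: str) -> list:
    # Turn text into a vector using OpenAI embeddings
    return openai.Embedding.create(model="text-embedding-ada-002", input=text)["data"][0]["embedding"]

# Store a document chunk with its vector and the original text as metadata
doc = "Cowriter supports SEO-optimized blog drafts..."
index.upsert([("doc-1", embed(doc), {"text": doc})])

# At query time, retrieve only the chunks most similar to the user's question
matches = index.query(vector=embed("How do I generate an SEO draft?"), top_k=3, include_metadata=True)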

FastAPI

FastAPI is a modern web framework for building RESTful APIs in Python. It was first released in 2018 and has quickly gained popularity among developers due to its ease of use, speed, and robustness. A good alternative to FastAPI is Flask, as seen in this example.
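
As a taste of how lightweight it is, a minimal FastAPI endpoint looks like the sketch below; the route and request model are made up for illustration, and in a real app the handler would call your LLM provider:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_words: int = 200

@app.post("/generate")
async def generate(req: GenerateRequest):
    # This is where you would call your LLM of choice; echoing keeps the sketch self-contained
    return {"prompt": req.prompt, "max_words": req.max_words}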

Building your back-end system in Python is a major advantage. Due to the exponential growth of AI development, the Python community has grown massively, with many crucial open source projects built exclusively in Python, from application and infrastructure frameworks to LLMs and vector databases.

Other popular ecosystems such as Node.js are lagging behind, with a much smaller open source community building AI-related tools. With the fast-paced development of AI, staying ahead is business critical.

ReactJS

There is a lot of discussion around AI not being a moat for tech startups, since it has become so powerful and easy to integrate. But I truly believe that building great products that target specific verticals and optimize for user experience creates a significant advantage in a very competitive market. This is why UX/UI is still king, and it is more crucial than ever when building your AI application.

You might have already heard about Streamlit or Gradio, which are Python frameworks for building data-driven web applications. There is a lot of noise around frameworks like these for easily building web applications on top of back-end code that runs some AI functionality. However, these frameworks usually only take you as far as great demos and don't scale, both in terms of the UX/UI you can actually build and in terms of performance.

As a full stack developer, I find nothing beats good old React or more advanced frameworks such as Next.js (or alternatives like Vue.js). By leveraging the React community, you can build super powerful user experiences that are critical for tailoring AI to user needs.

Slate

As already mentioned, Cowriter is a rich text editor, so I had to find the best React framework for building editors. This was probably the longest research of all, since there is no definitive winner. Research included hands on trial and error with several frameworks until I found the one. Each editor optimizes for different business goals. I researched Draft.js, Editor.js, Slate.js, Plate.js, Quill and more.

For example, if you are looking for the stability of a large organization behind your project and fairly wide adoption, then Draft.js from Facebook is the way to go. However, features such as nesting and collaborative editing may be difficult to achieve. Alternatively, Editor.js offers a range of features out of the box and is a great option if you want to get started quickly. If ultimate control and support are what you are after, and the possibility of collaboration is also desirable, then Slate.js is the way to go, but be prepared to do some extra work to get set up. Finally, Plate.js was built on top of Slate.js and offers a wide range of plugins for powerful features, but it may not be the best choice if you are looking to create customizable AI experiences.

Ultimately, Slate.js was the best choice for me, since it has a great community and is fully customizable for building powerful editors. With AI-powered editors, it is important to create experiences such as automated writing, colorful animated gestures, selection toolbox, markdown support, DOM manipulation, adding elements such as images, etc.

Netlify

Netlify is a cloud computing platform that automates the deployment of web projects. It enables developers to build, deploy, and manage modern websites and web applications more easily and efficiently. Netlify also provides other features such as continuous integration and deployment (CI/CD), serverless functions for dynamic content, form handling, and more.

Overall, Netlify simplifies the process of web development and hosting, making it more accessible and efficient for developers and businesses.

I’ve recently been exploring different deployment options, and I was pleasantly surprised to discover how easy and efficient it is to work with Netlify. Not only that, but I was amazed to find that performance didn’t suffer at all compared to other options I’ve used, like deploying directly on AWS. And the best part? It’s completely free for single projects!

Stripe

Stripe is an online payment processing platform that allows businesses to accept payments over the internet. It provides a suite of APIs and software tools that enable businesses to securely accept and manage online payments. Stripe also offers additional features, such as customizable checkout pages, fraud detection, subscription billing, and more.

I was looking for the quickest time to market with payment integration, and I found Stripe's latest feature, Checkout. Checkout is a prebuilt, hosted payment page that can be configured with no code in only a few minutes. I was able to go from no monetization to a great payment checkout flow in less than a day of work.
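
Under the hood, creating a Checkout session from the back end is a single API call, roughly as sketched below; the price ID and URLs are placeholders:

import stripe

stripe.api_key = "sk_test_..."  # your Stripe secret key

session = stripe.checkout.Session.create(
    mode="subscription",
    line_items=[{"price": "price_123", "quantity": 1}],  # placeholder price ID
    success_url="https://example.com/success?session_id={CHECKOUT_SESSION_ID}",
    cancel_url="https://example.com/cancel",
)
print(session.url)  # redirect the user here to complete the payment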

There are many more frameworks to consider in your technical stack such as HuggingFace, Supabase, LLamaIndex and much more. It all depends on the product, vertical, time to market, resources and scale you’re aiming for. For me, it was important to choose a stack that enables me to succeed as a sole developer, while focusing mostly on building a great product and supporting customers, without the hassle of dealing with infrastructure, scale, performance, non-strategic assets, etc. The more resources and funding you have, the more creative you can be with choosing the best stack for you.

Feel free to drop a comment below if you have any questions, or if you’re wondering what would be the best stack for you.

Action-Driven LLMs: The Future of Chatbot Development is Here

How to build AI chatbots without labeling intents and designing endless chat flows.

Imagine if you could create state-of-the-art contextual AI chatbots for your business needs by simply providing them with a list of tools and a policy. Sounds magical, right? Well, guess what: that future is here, and I'm going to show you how.

When you think about how it works with human agents, it makes sense. When human support agents join a new customer care team, they are provided with a set of tools (such as a knowledge base, CRM, and communication channels) and a defined policy (how to deal with various interaction use cases). Once all of that is learned, the agent is ready to start interacting with users. In theory, why should that be any different for chatbots?

How most AI chatbots are built today

As an experienced chatbot developer, I can say one thing for sure — current chatbot development practices don’t scale. When you just start, it’s fairly easy. It usually goes something like this:

  1. Define a list of intents that are most likely to be brought up by a user in any interaction. For example, in travel, you’d have intents like searching for flights or modifying existing bookings.

  2. Collect different utterances that relate to those intents.

  3. Train an NLU model to best predict intents based on user utterances.

  4. Repeat.

If that’s not enough, you then need to design and build various deterministic conversation flows that are triggered based on detected intents. Very quickly, this process gets hard to manage, as more and more intents add up, disambiguating similar intents becomes almost impossible, and the conversation tree gets too deep and wide to maintain.

This approach may work for more simple chatbot use cases but can become very challenging to maintain and scale when dealing with more complex scenarios that require multiple and strongly coupled layers of chat flows and intent understanding.

LLMs to the rescue

With the latest rise of LLMs (large language models), it has now become possible to build smart, dynamic, and capable context-aware chatbots with no need for most of the processes defined above. While LLMs have demonstrated impressive performance across tasks in language understanding and interactive decision-making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics.

A recent paper called ReAct (Reasoning and Acting) aims to tackle exactly that. The paper explores the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for improved performance, human interpretability, and trustworthiness on a diverse set of language and decision-making tasks. ReAct, the approach presented, achieves impressive results on question answering, fact verification, and interactive decision-making, outperforming state-of-the-art baselines with improved human interpretability and trustworthiness.

The basic idea of this approach is that based on user input, the LLM decides whether a tool is required to answer the query or not. If so, the LLM will decide which of the given tools is best for helping with an answer. It then triggers the selected tool, gets an answer, and then decides if the answer suffices. If not, repeat.

The future of chatbot development

To demonstrate this new and disruptive approach, let's assume we want to build a weather chatbot. The new proposition for chatbot development goes like this:

  1. Define a set of tools required for achieving the chatbot task (like getting a weather forecast based on location). Also, describe how and when each tool should be used.

  2. Define a policy for how the chatbot should behave. For example, to be polite, always respond with a follow-up question, etc.

  3. Feed it to an LLM as a prompt for every user input.

And that's it. Let's see it in action using a Python library called Langchain. You can find the full gist here. Please note, you'll need a valid OpenAI and SerpApi key to get it running.

We first need to define our tools. For simplicity, we will use SerpApi (Google search) for retrieving weather information, but assume any other more specific weather API can be used instead.

# Import paths may differ slightly depending on your Langchain version
from langchain import OpenAI, SerpAPIWrapper
from langchain.agents import Tool, initialize_agent
from langchain.memory import ConversationBufferMemory

search = SerpAPIWrapper()
tools = [
    Tool(
        name="Weather Forecast Tool",
        func=search.run,
        description="useful for when you need to answer questions about current and future weather forecast"
    ),
]

As seen above, we create a list of tools, where for each tool we provide the name, trigger function, and a description. All of these are important because they help the LLM better decide if and what tool to use.

Now we can define the interaction policy so the agent knows how to react to various scenarios. When using Langchain, it is currently required to pass a JSON file that holds the agent metadata such as the policy. The policy items are defined in the ‘suffix’ key as seen below.

{
  "load_from_llm_and_tools": true,
  "_type": "conversational-react-description",
  "prefix": "Assistant is a large language model trained for forecasting weather.\n\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives.\nAssistant should only obtain knowledge and take actions from using tools and never on your own.\nTOOLS:\n------\n\nAssistant has access to the following tools: ",
  "suffix": "Please make decisions based on the following policy: \n- If the user is asking for a weather forecast use the Weather Forecast tool\n- If the user does not provide a location, ask before checking for weather\n- Apologize if the user is angry or showing frustration\n- Answer with a friendly and professional tone.\n- Always end a response with a follow up question like 'what else can i help you with?', unless the user shows gratitude.\nBegin!\n\nPrevious conversation history:\n{chat_history}\n\nNew input: {input}\n{agent_scratchpad}",
  "ai_prefix": "AI Agent",
  "human_prefix": "Human"
}

Let’s look at some of the interesting defined policy items:

  • If the user does not provide a location, ask before checking for the weather

  • Apologize if the user is angry or showing frustration

  • Always end a response with a follow up question

What's incredibly groundbreaking is that detecting sentiment and acting upon it is as simple as writing it down. Detecting sentiment is a fairly hard problem that traditionally requires external tools and resources, and the same goes for missing entities such as location. Let's move on.

memory = ConversationBufferMemory(memory_key="chat_history")

Based on the Langchain docs, we need to create a ConversationBufferMemory object for storing the chat history to enable context awareness.

llm=OpenAI(temperature=0)
agent_chain = initialize_agent(tools, llm, agent="conversational-react-description", verbose=True, memory=memory)

Finally, we initialize an agent using an LLM (here we use OpenAI's GPT-3). And that's it! We can now execute our agent. Let's take it for a spin.
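
Each turn in the transcript below is produced by passing the user message to the agent, roughly like this:

agent_chain.run(input="Hey, I am Assaf")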

>> Hey, I am Assaf

Thought: Do I need to use a tool? No
AI Agent: Hi Assaf, how can I help you today?

So far so good: the LLM understood that a tool is not required, and the response contains a follow-up question.

>> What is my name?

Thought: Do I need to use a tool? No
AI Agent: That is a great question! Your name is Assaf, is there anything else I can help you with?

Nice! It remembers my name. Let’s try asking for the weather without providing a location.

>> What is the weather?

Thought: Do I need to use a tool? No
Observation: I need to ask for a location
AI Agent: It looks like you're asking about the weather. Can you provide me with a location so I can give you a more accurate forecast?

Incredible! The LLM made the right decision and asked me to provide a location. Let's see if the context remains when I just reply with "new york".

>> new york

Thought: Do I need to use a tool? Yes
Action: Weather Forecast Tool
Action Input: new york
Observation: New York City comprises 5 boroughs sitting where the Hudson River meets the Atlantic Ocean. At its core is Manhattan, a densely populated borough that’s among the world’s major commercial, financial and cultural centers. Its iconic sites include skyscrapers such as the Empire State Building and sprawling Central Park. Broadway theater is staged in neon-lit Times Square. ― Google
Thought: Do I need to use a tool? No
AI Agent: The current weather in New York is mostly cloudy with a high of 68°F and a low of 54°F. Is there anything else I can help you with?

And there you have it. The LLM was contextually aware, collected the right information, triggered the weather forecast tool, and finally returned a forecast followed by a follow-up question, as defined in the policy.

The full gist is available here.


What I've demonstrated is just the tip of the iceberg. Imagine replacing weather forecasts with the actual tools required for assisting users with real-life support issues. And even further, given the speed of AI advancements, imagine how much this approach will improve over time.

But don't be fooled: there is a long way to go before this approach is deterministic enough to work automatically at a large scale. With current LLM limitations, it might be too difficult to deal with edge cases such as prompt injection, hallucinations, and memory limits. Also, it's currently very hard to monitor the quality of responses without manually reviewing each of them.

Given the simplicity and nature of this approach, it's easy to believe it is the future of chatbot development. Oh wait, it's already here.

Generative AI: The ultimate beginner's guide

This image was created by Assaf Elovic and Dalle

Why you should care about the generative AI revolution

This summer has been a game-changer for the AI community. It’s almost as if AI has erupted into the public eye in a single moment. Now everyone is talking about AI — not just engineers, but Fortune 500 executives, consumers, and journalists.

There is enough written about how GPT-3 and Transformers are revolutionizing NLP and their ability to generate human-like creative content. But I've yet to find a one-stop shop offering a simple overview of the field's capabilities, limitations, current landscape, and potential. By the end of this article, you should have a broad sense of what the hype is all about.

Let's start with basic terminology. Many confuse GPT-3 with the broader term generative AI. GPT-3 (short for Generative Pre-trained Transformer) is a language model built by OpenAI and just one part of the generative AI space. To be clear, the current disruption is happening across the entire space, with GPT-3 being one of its enablers. In fact, there are now many other incredible models for generating content and art, such as Bloom, Stable Diffusion, EleutherAI's GPT-J, DALL-E 2, Stability.ai, and more, each with its own unique set of advantages.

The What

So what is Generative AI? In very short, Generative AI is a type of AI that focuses on generating new data or creating new content. This can be done through a variety of methods, such as creating new images or videos, or generating new text. As the saying goes, “a picture is worth a thousand words”:

Jason Allen’s A.I.-generated work, “Théâtre D’opéra Spatial,”

The image above was created by AI and recently won the Colorado State Fair's annual art competition. Yes, you read that correctly. And the reactions were far from positive.

If AI can generate art so well that not only is it not distinct from “human” art, but also good enough to win competitions, then it’s fair to say we’ve reached the point where AI can now take on some of the most challenging human tasks possible, or as some say “create superhuman results”. 

Another example is Cowriter.org which can generate creative marketing content, while taking attributes like target audience and writing tone into account. 

cowriter.org - creative marketing content generator

In addition to the above examples, there are hundreds if not thousands of new companies leveraging this new tech to build disruptive companies. The biggest current impact can be seen in areas such as text, video, image and coding. To name a few of the leading ones, see the landscape below:

However, there are some risks and limitations associated with using generative models. 

One risk is that of prompt injection, where a malicious user could input a prompt that causes the model to generate harmful output. For example, a user could input a prompt that causes the model to generate racist or sexist output. In addition, some users have found that GPT-3 fails when given out-of-context instructions, as seen below:

Another risk is that of data leakage. If the training data for a generative model is not properly secured, then it could be leaked to unauthorized parties. This could lead to the model being used for malicious purposes, such as creating fake news articles or generating fake reviews. This is a concern because GPT models can be used to generate text that is difficult to distinguish from real text.

Finally, there are ethical concerns about using generative models. For example, if a model is trained on data from a particular population, it could learn to be biased against that population. This could have harmful consequences if the model is used to make decisions about things like hiring or lending.

The Why

So why now? There are a number of reasons why generative AI has become so popular. One reason is that the availability of data has increased exponentially. With more data available, AI models can be better trained to generate new data that is realistic and accurate. For example, GPT-3 was trained on about 45TB of text from different datasets (mostly from the internet). Just to give you an intuition of how fast AI is progressing: GPT-2 (the previous version of GPT) was trained on 8 million web pages, only one year before GPT-3 was released. That means GPT-3 was trained on 5,353,569x more data than its predecessor from a year earlier.

Another reason is that the computing power needed to train generative AI models has become much more affordable. In the past, only large organizations with expensive hardware could afford to train these types of models. However, now even individuals with modest computers can train generative AI models. 

Finally, the algorithm development for generative AI has also improved. In the past, generative AI models were often based on simple algorithms that did not produce realistic results. However, recent advances in machine learning have led to the development of much more sophisticated generative AI models such as Transformers.

The How

Now that we understand the why and what about generative AI, let’s dive into the potential use cases and technologies that power this revolution.

As discussed earlier, GPT-3 is just one of many solutions, and the market is rapidly growing with more alternatives, especially free-to-use open source ones. As history shows, the first is not usually the best. My prediction is that within the next few years, we’ll see free alternatives so good that it’ll be common to see AI integrated into almost every product out there.

To better understand the application landscape and which technologies can be used, see the following landscape mapped by Sequoia:

Generative AI Application landscape, mapped by Sequoia

As described earlier, there are already many alternatives to choose from, but OpenAI is still leading the market in terms of quality and usage. For example, Jasper.ai (powered by GPT-3) just raised $125M at a $1.5B valuation, surpassing OpenAI in terms of annual revenue. Another example is GitHub, which released Copilot (built on OpenAI’s Codex, a descendant of GPT-3), an AI assistant for coding. OpenAI already dominates three main sectors: GPT-3 for text, DALL-E 2 for images and Whisper for speech.

It seems that the current headlines and business use cases are around creative writing and images. But what else can generative AI be used for? NFX has curated a great list of potential use cases as seen below:


NFX (myself included) believes that generative AI will eventually dominate almost all business sectors, from understanding and creating legal documents to teaching complex topics in higher education.

More specifically, the chart below illustrates a timeline for how Sequoia expects foundation models to progress and the associated applications that become possible over time:

Based on their prediction, AI will be able to code products from simple text product descriptions by 2025, and write complete books by the end of 2030.

If art generated today is already good enough to compete with human artists, if generated creative marketing content cannot be distinguished from the work of human copywriters, and if GPT-3 was trained on orders of magnitude more data than its predecessor from only a year earlier, then you tell me whether the hype is real, and what we might achieve ten years from today. Oh, and by the way, this article’s cover image and title were generated by AI :).

I hope this article provided you with a simple yet broad understanding of the generative AI disruption. My next articles will dive deeper into the more technical understanding of how to put GPT-3 and its alternatives into practice.

Thank you very much for reading! If you have any questions, feel free to drop me a line in the comments below!

How to build a URL text summarizer with simple NLP

To view the source code, please visit my GitHub page.

Wouldn’t it be great if you could automatically get a summary of any online article? Whether you’re too busy or have too many articles in your reading list, sometimes all you really want is a short article summary.

That’s why TL;DR (too long didn’t read) is so commonly used these days. While this internet acronym can criticize a piece of writing as overly long, it is often used to give a helpful summary of a much longer story or complicated phenomenon. While my last piece focused on how to estimate any article read time, this time we will build a TL;DR given any article.

Getting started

For this tutorial, we’ll be using two Python libraries:

  1. Web crawling: Beautiful Soup. Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree, and it commonly saves programmers hours or days of work.

  2. Text summarization: NLTK (Natural Language Toolkit). NLTK is a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries.

Go ahead and get familiar with the libraries before continuing, and make sure to install them locally. Alternatively, run this command within the project repo directory:

pip install -r requirements.txt

Next, we will download the stopwords corpus from the nltk library individually. Open Python command line and enter:

import nltk
nltk.download("stopwords")

Text Summarization using NLP

Let’s describe the algorithm:

  1. Get URL from user input

  2. Web crawl to extract the page text from the HTML page (by paragraphs <p>).

  3. Execute the summarize frequency algorithm (implemented using NLTK) on the extracted text sentences. The algorithm ranks sentences according to the frequency of the words they contain, and the top sentences are selected for the final summary.

  4. Return the highest ranked sentences (I prefer 5) as a final summary.

For part 2 (1 is self-explanatory), we’ll develop a method called getTextFromURL as shown below:

import requests
from bs4 import BeautifulSoup

def getTextFromURL(url):
    # Fetch the page and join the text of all <p> tags into a single string
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    return text

The method initiates a GET request to the given URL, and returns the text from the HTML page.

From Text to TL;DR

We will use several methods here including some that are not included (to learn more see code source in repo).

def summarizeURL(url, total_pars):
    # Retrieve the page text and strip stray encoding artifacts
    url_text = getTextFromURL(url).replace(u"Â", u"").replace(u"â", u"")
    # Rank sentences by word frequency and keep the top `total_pars` of them
    fs = FrequencySummarizer()
    final_summary = fs.summarize(url_text.replace("\n", " "), total_pars)
    return " ".join(final_summary)

The method calls getTextFromURL above to retrieve the text, then cleans it of stray HTML characters and newline characters (\n).

Next, we execute the FrequencySummarizer algorithm on the text. The algorithm tokenizes the input into sentences and computes a term frequency map of the words. The frequency map is then filtered to ignore both very rare and very frequent words; this way it discards noisy words such as determiners, which are very frequent but carry little information, as well as words that occur only a few times. To see the source code click here.

Finally, we return a list of the highest ranked sentences which is our final summary.
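For reference, below is a minimal sketch of what such a frequency-based summarizer might look like. This is an illustrative implementation only, not necessarily the exact FrequencySummarizer from the repo, and the min_cut/max_cut thresholds are assumptions. It requires NLTK’s punkt and stopwords corpora:

from collections import defaultdict
from heapq import nlargest
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords

class FrequencySummarizer:
    def __init__(self, min_cut=0.1, max_cut=0.9):
        # Words whose normalized frequency falls outside [min_cut, max_cut]
        # are treated as noise (illustrative thresholds)
        self.min_cut = min_cut
        self.max_cut = max_cut
        self.stopwords = set(stopwords.words('english'))

    def _compute_frequencies(self, word_sent):
        freq = defaultdict(int)
        for sentence in word_sent:
            for word in sentence:
                if word not in self.stopwords:
                    freq[word] += 1
        # Normalize frequencies and drop words that are too rare or too common
        max_freq = float(max(freq.values()))
        for word in list(freq):
            freq[word] = freq[word] / max_freq
            if freq[word] >= self.max_cut or freq[word] <= self.min_cut:
                del freq[word]
        return freq

    def summarize(self, text, n):
        # Score each sentence by the frequency of its words and return the
        # n highest-ranked sentences in their original order
        sents = sent_tokenize(text)
        word_sent = [word_tokenize(s.lower()) for s in sents]
        freq = self._compute_frequencies(word_sent)
        ranking = defaultdict(int)
        for i, sentence in enumerate(word_sent):
            for word in sentence:
                if word in freq:
                    ranking[i] += freq[word]
        top_idx = nlargest(n, ranking, key=ranking.get)
        return [sents[i] for i in sorted(top_idx)]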


Summary

That’s it! Try it out with any URL and you’ll get a pretty decent summary. Many summarization approaches have been proposed in recent years (such as TF-IDF based ranking), and there’s much more that could be done with this algorithm. For example, go ahead and improve the filtering of the text. If you have any suggestions or recommendations, I’d love to hear them, so comment below!

How to Evaluate Startup Ideas — Part 1

After graduating from Y Combinator Startup School and building several startups, the most recent of which reached over 5M users, I’ve been eager to share lessons learned from evaluating startup ideas.

According to a TED talk by Bill Gross, founder of Idealab, timing accounts for 42% of a startup’s success. He looked at five factors he believed are important (funding, business model, team, idea, timing) and analyzed them across hundreds of successful startups to find the single factor that best accounts for success. Funding and idea accounted for only 14%-28%, while timing hit the jackpot.


Part of why many great ideas fail is because the market wasn’t ready. Or in other words, it took too much time and money to educate the market to believe they needed what was being offered. A rule of thumb I go by is that you can’t change or manipulate the market. Not as a small fish at least.

If timing is the biggest factor for success, how can we evaluate when is good timing?

One of my favorite metrics for evaluating the readiness of a market is called the Knowledge Spectrum as seen below.

The Knowledge Spectrum

The spectrum represents all the knowledge in a certain field or market. The far left represents no knowledge at all, while the far right represents complete knowledge or expertise. As you can guess, hardly anyone knows all there is to know in a specific field, and most of the time the average user has too little knowledge.

The ‘current’ pointer represents where you assume the average user currently stands in terms of knowledge, and the ‘target’ pointer represents the knowledge required to want your product’s solution. The knowledge gap is the amount of education needed to move the user from their current knowledge state to the target.

Too little knowledge signals that it would be very expensive to convert users to your product idea, so it’s more likely to fail. Too much knowledge might signal the market is already too crowded. Therefore, your startup’s idea has to fall somewhere in the middle, where a knowledge gap exists but is bridgeable within a short period of education (like a landing page). You need to be confident that your average user currently stands close to the target (your solution).

Your startup’s idea has to fall somewhere in the middle where a knowledge gap exists, but it’s bridgeable within a short time frame of education

There are many examples of successful startups that had good timing on their side (Facebook), or that were too late to the game (Windows Mobile). So let’s look at some interesting examples of startups that were too early, to emphasize the importance of market education. WebVan provided online grocery shopping in the late 1990s. Around the dot-com bubble, consumers were still not used to turning to the internet for on-demand services (the knowledge gap was too wide). WebVan burned through almost $400M in venture funding only to crash and burn, yet ordering groceries online is commonplace today.

Another good example is Dodgeball which offered social location sharing via text messages as early as 2003. The company was acquired but eventually shut down. Checking into locations is now common behavior on Facebook and similar platforms.

The Knowledge Spectrum has a lot of depth to it and can be used in other areas as well. For example, you can use it to improve conversion rates on your landing page. By understanding where your average user’s current knowledge is, you can set the target (the landing page content) such that the knowledge gap is small. If you provide too much knowledge or set the target too high, conversion rates will significantly decrease, since it’ll take too much energy for your users to understand what you’re selling, so they’ll most likely churn.

There are so many factors to keep in mind when evaluating a startup idea. But one factor that is hardly measurable happens to be a very important one: luck (or bad luck). For example, think of all the startups that launched right before the great recession, or those that launched right before a hype cycle (like cryptocurrencies). The metrics we use to evaluate an idea are simply there to help assess the risk: the probability of failure or success.

In my next articles, I’ll show more popular metrics and things to consider when evaluating startup ideas, so stay tuned!

How to build a responsive landing page in 10 min

Build a responsive layout using mainly CSS (Grid and Flex)

To skip the tutorial, feel free to download the source code template from my Github repo here.

There are quite a few templates and tutorials out there for building landing pages. However, most tend to overcomplicate or add heavy design (such as multiple pages, forms, etc.) to what in most cases requires a very simple and lean design. Moreover, I’ll be showing you how to use mainly CSS (Grid and Flex) to create a responsive UI, instead of using old-school CSS libraries (such as Bootstrap). So let’s get to it! 💪

We’re going to build a basic landing page layout, and focus mainly on the fundamentals so you can hopefully take it from there on to the landing page you desire. Here’s an example of the result:

Animated preview of the final landing page layout

The page will be constructed of four main components: Navigation bar, cover image, a grid of cards and finally the footer. 

The index.html is pretty straightforward; it contains mainly div tags and the overall page structure:

<!DOCTYPE html>
<html>
<head>
  <title>Basic LP Layout</title>
  <link rel="stylesheet" type="text/css" href="./style.css">
</head>
<body>
  <nav class="zone blue sticky">
      <ul class="main-nav">
          <li><a href="">About</a></li>
          <li><a href="">Products</a></li>
          <li><a href="">Our Team</a></li>
          <li class="push"><a href="">Contact</a></li>
  </ul>
  </nav>
  <div class="container">
      <img class="cover" src="img/cover.jpg">
      <div class="coverText"><h1>Making the world a better place</h1></div>
  </div>
  <div class="zone blue grid-wrapper">
      <div class="card zone">
          <img src="./img/startup1.jpg">
          <div class="text">
              <h1>Team play</h1>
              <p>We work together to create impact</p>
              <button>Learn more</button>
          </div>
      </div>
    <div class="card zone"><img src="./img/startup2.jpg">
        <div class="text">
            <h1>Strategy</h1>
            <p>Every goal is part of our strategy</p>
            <button>Learn more</button>
        </div>
    </div>
    <div class="card zone"><img src="./img/startup3.jpg">
        <div class="text">
            <h1>Innovation</h1>
            <p>We're focused on thinking different</p>
            <button>Learn more</button>
        </div>
    </div>
  </div>
  <footer class="zone"><p>2019 Assaf Elovic All rights reserved. For more articles visit <a href="www.assafelovic.com">www.assafelovic.com</a></p></footer>
</body>
</html>

Therefore, we’ll focus strictly on the styling (CSS). If you’re new to HTML and page structure, click here to learn the basics before moving forward.

Styling layouts with Grid and Flex

Rule of thumb: Use Grid when you need grid-like views such as tables, cards, album media (like in Instagram). In all other use cases consider using Flex. I highly recommend diving deeper into both, since once you master them, you won’t need much else to create beautiful responsive web pages.

Navigation bar

We’ll use flex so we get a single-direction row, as needed for our navigation bar. Since we’re using a <nav> tag with a list, we want to remove the bullets (list-style). Finally, we’d like to remove any default margins set by the browser, so we reset margin to 0.

.main-nav {
    display: flex;
    list-style: none;
    margin: 0;
    font-size: 0.7em;
}

When shrinking the browser width, we can see part of the nav bar get cut off, so we need to adjust how it looks when the width is smaller:

@media only screen and (max-width: 600px) {
    .main-nav {
        font-size: 0.5em;
        padding: 0;
    }
}

We’d like the ‘Contact’ option to stick to the right, so we set margin-left to auto. This pushes the item all the way to the right by absorbing any remaining space to its left:

.push {
    margin-left: auto;
}

Finally, we’d like the navigation bar to stay sticky and always appear at the top of the page. Also, we’d like it to appear above all other elements (z-index):

.sticky {
  position: fixed;
  z-index: 1;
  top: 0;
  width: 100%;
}

Cover

We use Flex here since we just want to center content. After setting display to flex, justify-content centers the content horizontally (the X axis) within the container and align-items centers it vertically (the Y axis). We want the image to fill the entire screen, so we set the height to 100vh, meaning 100% of the viewport height:

.container {
    height: 100vh;
    display: flex;
    align-items: center;
    justify-content: center;
}

Also, we’d like the cover text to appear above the image and in the center:

.coverText {
    position: absolute;
    left: 50%;
    top: 50%;
    transform: translate(-50%, -50%);
    color: white;
    font-weight: bold;
}

See the full CSS style sheet for additional minor adjustments.

Grid of cards

As described above, we want to create a grid of cards, so we’ll use Grid this time. grid-template-columns sets the style of each column. FYI: if we were to just set 1fr, we would see a single column. So we use repeat() with auto-fill, letting each column stretch from a minimum of 350px up to the whole available space (1fr). Finally, we set the grid-gap (spacing between grid items) to 10px:

.grid-wrapper {
    display: grid;
    grid-template-columns: repeat(auto-fill, minmax(350px, 1fr));
    grid-gap: 10px;
}

Next, we’ll style each card within the grid. The below is pretty straightforward, setting the margin and background color per card:

.card {
    background-color: #444;
    margin: 50px;
}

We’d like each card to have an image that fills the entire top area, with a title, a paragraph and a ‘Learn more’ button underneath. Also, we only want to style images, titles and paragraphs within the card class, so we scope the selectors as below:

.card > img {
    max-width: 100%;
    height: auto;
}

.card h1 {
    font-size: 1.5rem;
}

.card p {
    font-size: 1rem;
}

While the image fits 100% of the card’s width, we’d like to add some nice padding to the text area of the card:

.card > .text {
    padding: 0 20px 20px;
}

Finally, we set the button design that appears within each card. We’ll set the border to 0 (since default appears with a border), and add some padding, color, etc:

button {
    cursor: pointer;
    background: gray;
    border: 0;
    font-size: 1rem;
    color: white;
    padding: 10px;
    width: 100%;
}

button:hover {
    background-color: #e0d27b;
}

Footer

Last but not least, our footer is pretty straightforward. We’d like the inner text to be smaller than the default, and pad the footer with some color:

footer {
    text-align: center;
    padding: 3px;
    background-color: #30336b;
}

footer p {
    font-size: 1rem;
}

That’s it! Following this simple responsive layout, you can further build almost any landing page your heart desires. To take this layout to the next level with some amazing animation libraries, here are some of my favorites:

  1. Sweet Alert — Add stylish alerts

  2. Typed.js — Add typing animation to your headers.

  3. Auroral — Add animated gradient backgrounds.

  4. Owl Carousel — Add animation to your elements.

  5. Animate.css — Add styling animations upon loading elements.

To download the full source code click here!

What I wish I knew when I became VP R&D of a large startup

When I started my position as VP R&D at a growing startup, I thought my biggest challenges would be mainly technical. That is, providing state of the art architecture solutions, conducting code reviews, helping solve complex algorithmic challenges and maybe even pushing some code. That’s because my previous technical leadership positions were at small startups, where anyone who can code is needed when dealing with limited resources in a fast-paced environment.

A few mistakes early in the position made me realize very quickly that leading an R&D organization of over 20 employees requires a whole different set of skills and focus. Just to clarify, my team was composed of five divisions: Front End, Back End, Mobile, Research (mostly machine learning) and QA. So here are 6 lessons I wish I knew when starting this position:

1. Don’t be the Hero

I truly agree with the saying “You’re only as successful as your team”. As engineers, we are constantly striving to be the ones solving complex problems, or in other words, being the hero. As the leader of your R&D, your job is to have a capable team that can solve any challenge on their own, without you. The more you try and solve for them, the more they’ll rely on you for future challenges. 

“You’re only as successful as your team”

I found this rule very hard to follow, since sometimes it feels much more effective to offer a solution from experience than to have your team research for days on end. However, down the line, it has proven to be the most valuable lesson I’ve learned. With an independent and capable team, you’ll have much more time and focus to push and improve areas of your R&D that only you are capable of.

2. Get to know your people

Understanding what empowers and motivates your team members is power. It’s important to remember that every person is different: they need different things, have different communication styles, and focus on different things. Get to know what motivates each of your team members and what their passions and career goals are. This way you can assign tasks and responsibilities accordingly and maximize productivity and motivation within your team. It’ll also help you retain your employees and make them feel more resourceful and self-fulfilled.

I schedule a weekly 1on1 meeting with each team leader and a monthly 1on1 with the rest of my team. In these meetings, I try to focus mainly on the personal level. Some meetings are very short, and some suddenly take hours. This policy gives me a constant pulse on my team’s status and motivation level, allowing me to prioritize who needs that extra push and attention, and when. And believe me, there was always someone who needed it.

3. Never be a Bottleneck

The first mistake I made in the position was taking on a coding task. Coding is my comfort zone, which is probably why I fell back to it. Very quickly I was flooded with unexpected top-priority issues to deal with, plus hours upon hours of staff, business and product meetings, barely finding a single hour to focus on the coding task. Even when I did find some time, we all know coding requires getting “into the zone”, which is hard when you’re constantly interrupted. In the end, I became a bottleneck in my own team, which almost delayed deployment.

I am still amazed at how many unexpected issues can occur on a daily basis, from HR and external relations to technical and political company challenges. As a leader, you should make sure you’re always available to deal with urgent issues. If you take on tasks such as deployment yourself, you’ll either risk becoming a bottleneck or not have enough time to deal with urgent tasks that only you can help solve effectively.

4. Always be the technical voice

Although you should understand and pursue business requirements as a leader in the company, there are enough people in high ranks prioritizing business and marketing needs over your R&D needs. You can trust that others will push needs that might contradict your team’s. Your job is to be the technical voice within the staff. You should make sure all R&D requirements and needs are considered by the staff, while taking other requests into consideration. If you don’t, your internal priorities will constantly be pushed down and your team will feel neglected.

For example, my team and I strongly wanted to migrate our services from a private cloud to AWS. This would require 40% of my team’s capacity for 3 months. It was immediately translated within the staff to 3 months of time that could have been spent building more product and marketing features to help the business grow. So how do you prove what’s more important? If I had taken their approach, I would have risked losing my top engineering talent, struggled to acquire new talent, suppressed innovation and eventually decreased team motivation. By successfully selling the business advantages of the migration (it would reduce costs and speed up development), it was finally approved. Six months later, my top talent is still retained and motivated, deployment speed has risen by 50%, costs have been reduced by 65% (by taking advantage of AWS services such as Spot Instances) and we’ve even received $100K in credits.

5. Focus on building processes and strategy

As someone who’s constantly aware of both the high-level business needs and the internal requirements and pains, you’re in the best position to focus on building external processes. When I started the position, the first process I focused on was how to conduct code reviews. While this is something that might be expected of a VP R&D, it is a process my team could probably build better than me. Since they face the daily challenges of deployment and understand each other’s styling preferences and coding standards, this was definitely an internal process I could hand to my team leaders to build, while I focused on external processes. Also, by having your team lead such internal processes, you increase overall engagement and sense of ownership, which ultimately leads to more initiative within your team.

An example of an external process is the delivery process between the product and R&D teams. Each division has its own requirements, culture, and needs. I would conduct meetings with the VP Product, and interview product managers and my team leaders, to fully understand how to build a process aligned with everyone’s needs that maximizes delivery productivity. Only you have the time and resources to fully understand and see the high-level picture of what’s needed to accomplish such cross-functional external processes.

6. Always strive to be clear and aggressive about estimates

We’ve all been asked these questions before: “How long will this take?”, “Why is it taking so long?”, and so on. Surprisingly, we are often asked these questions when things are not taking any longer than estimated. We are often asked them when our peers either didn’t really like the original estimate or didn’t ask for one in the first place, and now they’re upset despite nothing going wrong. Therefore, you must always be aggressive about sharing estimates and updating them accordingly, even when people don’t ask. It is your job to make clear, as best you can, what “long” actually means by providing your best view of the timescale of a project, and proactively updating that view when it changes.

Nonetheless, you should also be aggressive about getting estimates from your team, and constantly strive to improve their estimation process and instincts. Try to take part in estimation meetings, and don’t be afraid to challenge their input and cut scope toward the ends of projects in order to make important deadlines. Your role in these meetings may be to play a tiebreaker and make decisions about which features are worth cutting, and which features are essential to the project’s success.

Despite requiring several skills that differ from other technical roles, there are also similarities. For example, just as each engineering task brings new, unexpected challenges and constraints to deal with, so does this role. In my opinion, the main skill for success in such a role (and probably in almost any other role) is being able to embrace new, unexpected and dynamic challenges, while constantly striving to provide out-of-the-box, efficient and analytical solutions.

I have no doubt many more lessons are to be learned as I continue my journey in such roles, so I hope to continue sharing them with you.

How to build a code review process

The ultimate guide for building your team’s code review process

After conducting hundreds of code reviews, leading R&D teams and pushing several unintentional bugs myself, I’ve decided to share my conclusions for building the ultimate code review process for your team.

This article assumes you know what a code review is. If you don’t, click here for a great intro.

Let’s quickly state some straight forward reasons as to why you should do code reviews:

  1. Can help reduce bugs in code.

  2. Validate that all coding requirements have been filled.

  3. An effective way to learn from peers and get familiar with the code base.

  4. Helps maintain code styling across the team.

  5. Team cohesion — encourage developers to talk to each other on best practices and coding standards.

  6. Improves overall code quality due to peer pressure.

However, code reviews can be one of the most difficult and time-consuming parts of the software development process.

We’ve all been there. Either you’ve waited days until your code was reviewed, or once it was reviewed you started a ping-pong of resubmissions with the reviewer. All of a sudden you’re spending weeks going back and forth, context switching between new features and old commits that still need polishing.

If the code review process is not planned right, it could have more cost than value.

This is why it’s extremely important to structure and build a well defined process for code reviews within your engineering team.

In general, you’ll need to have in place well-defined guidelines for both the reviewer and reviewee, prior to creating a pull request and while it’s being reviewed. More specifically:

Define prerequisites for creating pull requests.

I’ve found that the following greatly reduces friction:

  1. Make sure code compiles successfully.

  2. Read and annotate your code.

  3. Build and run tests that validate the scope of your code.

  4. All code in codebase should be tested.

  5. Link relevant tickets/items in your task management tool (JIRA for example) to your pull request.

  6. Do not assign a reviewer until you’ve finalized the above.

Define reviewee responsibilities

While the reviewer is last in the chain of merging your PR, the better it’s handed over by the reviewee, the fewer risks you’ll run into in the long term. Here are some guidelines that can greatly help:

  1. Communicate with your reviewer — Give your reviewers background about your task. Since most of us pull request authors have likely been reviewers already, simply put yourself in the shoes of the reviewer and ask, “How could this be easier for me?”

  2. Make smaller pull requests — Making smaller pull requests is the best way to speed up your review time. Keep your pull requests small so that you can iterate more quickly and accurately. In general, smaller code changes are also easier to test and verify as stable. When a pull request is small, it’s easier for the reviewers to understand the context and reason with the logic.

  3. Avoid changes during the code review — Major changes in the middle of a code review basically reset the entire review process. If you need to make major changes after submitting a review, you may want to ship your existing review and follow up with additional changes. If you need to make major changes after starting the code review process, make sure to communicate this to the reviewer as early as possible.

  4. Respond to all actionable code review feedback — Even if you don’t implement their feedback, respond to it and explain your reasoning. If there’s something you don’t understand, ask questions inside or outside the code review.

  5. Code reviews are discussions, not dictation — You can think of most code review feedback as a suggestion more than an order. It’s fine to disagree with a reviewer’s feedback but you need to explain why and give them an opportunity to respond.

Define reviewer responsibilities

Since the reviewer is last in the chain before merging the code, a great part of the responsibility for reducing errors falls on them. The reviewer should:

  1. Be aware of the task description and requirements.

  2. Make sure to completely understand the code.

  3. Evaluate all the architecture tradeoffs.

  4. Divide your comments into 3 categories: Critical, Optional and Positive. The first are comments the developer must address, while the last are comments that let the developer know you appreciate a nice piece of code.

I’ve also found that the following checklist is a great tool for an overall better and easier review process:

  • Am I having difficulty in understanding this code?

  • Is there any complexity in the code which could be reduced by refactoring?

  • Is the code well organized in a package structure which makes sense?

  • Are the class names intuitive and is it obvious what they do?

  • Are there any classes which are notably large?

  • Are there any particularly long methods?

  • Do all the method names seem clear and intuitive?

  • Is the code well documented?

  • Is the code well tested?

  • Are there ways in which this code could be made more efficient?

  • Does the code meet our team’s styling standards?

There are various effective code review approaches that vary based on a team’s needs. So take this as my personal opinion; other approaches might work better for your team. In the end, building such a sensitive process should be tailored to your company’s goals, team culture and overall R&D structure.

If you have any questions or feedback for improving these guidelines, please feel free to add a comment below!

How my app grew by over 1M users in one month

Building and promoting a new consumer product is one of the most challenging things you can do as an entrepreneur. While there are many approaches on how to design, test, build and promote apps, usually they don’t seem to bring real results. 

Then you start wondering, maybe it’s the product? Maybe there’s not enough market fit? Or is it bad execution? Or maybe we should grow the marketing/branding budget? Maybe we’re not targeting the right audience? Maybe we should build more features!


When you start questioning everything, things usually get even worse. You start defocusing from the main goal and start wasting energy and money on all kinds of wide approaches.

The worst is when you think it’s all a matter of growing your marketing or branding budget.

Your goal should always be one — improving customer retention. For those who are not familiar with what customer retention is, click here.

To make my point clear, I’ll let you in on a story I heard from a friend of mine, who’s the co-founder and CTO of a very successful B2C productivity company. In 2012, they released the first version of their app to the Google Play store and a crazy thing happened: within a few days of launch, 500K users worldwide had downloaded the app. The reason for that crazy growth was that there were no good apps in the productivity space back then. Over the next few months, they grew to a few million users and raised over 5M dollars from VCs.

Four years later, they still hadn’t reached a decent business model, and he realized that despite the big numbers, very few people were actually using the product long term. So he decided to dig into the data and look for the reason. He found that retention was very low and, even worse, that it hadn’t improved much in four years! That’s when it hit him to focus on retention instead of user growth.

Back then, VCs poured millions of dollars into companies with large user growth because they didn’t know how to deal with or measure the crazy scale that mobile app stores and websites brought with them. Today the case is different. The first thing you’ll need in order to raise money in B2C is to show retention growth. And there’s a very good reason for that. Back to my friend’s story: with no retention, it didn’t matter how many users had downloaded their app. After a week, 95% of users had stopped using the product. So even if they had a billion users, after a few weeks all it would be is just a number in their database.

If you have 100K users using your product every day, it’s 100X more valuable than having 100M users using your product once a month.

Most importantly, once you’ve reached a decent retention rate, you can be sure that your marketing budget will lead to a sustainable growth of your product and business.

Too many startups begin with an idea for a product that they think people want. They then spend months, sometimes years, perfecting that product without ever showing it, even in a very rudimentary form, to prospective customers. This is where the Lean Startup comes in. In short, Lean Startup is a methodology that treats every startup as a grand experiment attempting to answer one main question: “Should this product be built?”

A core component of Lean Startup methodology is the build-measure-learn feedback loop. The first step is figuring out the problem that needs to be solved and then developing a minimum viable product (MVP) to begin the process of learning as quickly as possible. Once the MVP is established, a startup can work on tuning the engine. This involves measurement and learning, and must include actionable metrics that can demonstrate cause and effect.

So that’s exactly what we did at Tiv.ai, with a few twists of our own. FYI, Tiv.ai is a startup building an AI memory assistant, which I co-founded in January 2017. You can check out our product here.

Here are the steps we’ve taken in each iteration of building our product:

  1. Define the most important product assumption

  2. Design and build an MVP of how this assumption should be tested

  3. Target early adopters to test our MVP

  4. Apply the test results on the product

  5. Repeat

This is how our growth looked in the first year (2017) of iterations:

User activity from March 2017 to January 2018

Slowly but surely, right? Now let’s walk through an example together and see what happened after enough iterations.

1. Define the main assumption

We believed that there were no decent reminder apps that people actually liked to use. The main reason, in our opinion, is that there’s a lot of friction in setting a single reminder. Either you need to fill in a long form in a mobile app, or you naturally ask an assistant like Siri, only to realize she doesn’t understand 50% of your requests. So that’s when we defined our most important product assumption: if we could understand almost every reminder request in natural language, users would use such a product long term.

 

2. Design and build an MVP

Since our assumption was focused on NLU (natural language understanding), we decided to focus solely on that. No branding, UX or other features; just improving the understanding. We hired data scientists to build a state-of-the-art NLU algorithm for understanding complex reminder requests. Secondly, since all we were validating was this assumption, we decided to build the MVP as a chatbot on Facebook Messenger, instead of going through the long and painful process of building a mobile app.

Please note! Building a mobile app would not have added anything to testing our assumption, and it would have made our MVP longer and more complicated to design and build. Moreover, it might have even distracted us from the main assumption. For example, what if users just don’t like installing new apps anymore? We might have concluded that our assumption was wrong even though the failure had an entirely different cause.

It’s important to narrow your MVP as much as possible, so there are no distractions from your main assumption.

 

3. Target early adopters

We needed English speakers, since our algorithms only supported English. Also, we believed that millennial moms would eagerly want a product like this, since they’re always on the move and very busy, while constantly needing to remember things. So we reached out to some Facebook pages (with no budget) built around communities of moms, and successfully brought on board a few hundred beta testers to try it out.

 

4. Apply the test results on the product

After our first iteration, we’ve learned the following:

  1. There were many more ways to ask for reminders than we thought. But users really enjoyed the ease of setting reminders with a simple request.

  2. Users don’t always ask for reminders in a single request, but break them into a few steps.

  3. When working with chatbots, users would like the assistance of buttons to make it faster and easier to handle.

With these results, we went on to our next main assumption, which was to add buttons to the flow of setting a reminder (conclusion 3). And guess what, that assumption also turned out to be true. For more proven assumptions, you can read my article on How to improve your chatbot.

Little by little, we improved our overall product and retention rate on a weekly basis. We never tackled more than one assumption at a time. Slowly but surely, we started discovering users who had been with us for over 6 months! After a year of weekly iterations, we finally decided it was time to launch our product. We had reached a Week 1 retention rate of 92% and a Week 4 rate of 16%, way above the market standard, which was enough for us.

We published our chatbot on FB Messenger in mid-February 2018, and within a month it grew by over 5,800%, as you can see below.

Daily new users from Feb 17 to March 17

This was mostly due to delivering a lean product that we knew people enjoyed and would recommend to others. Since then, we’ve grown to over 1M users worldwide and are growing by tens of thousands of users a day. 

User activity from Jan 01 to May 01

We’re continuing to work with this methodology, and it’s proven a success every day. 

Not only did this methodology help us focus solely on the most important set of features users want, it also helped us filter out features we believed were valuable but actually weren’t.

Just as they say in customer service that the customer is always right, the same goes for product development. Trust your customers, listen to them and engage with them, and understand what they want and don’t want. Never build things purely from your own intuition unless it’s for assumption testing. At the end of the road you’ll reach one of two conclusions:

  1. The product does not have enough market fit, time to move on.

  2. You have a product people want. Good job, you’re on the way to building a company!

Either way, you win.

How to mine Ethereum in 5 min


I’m sure you’ve already heard of the cryptocurrency craze way before reading this post. Cryptocurrencies are slowly and quietly revolutionizing the way financial systems and transactions work (and should work in my opinion).

With Bitcoin hitting its $6K mark not long ago, a current total of $158B in market cap, and hundreds of ICOs (Initial Coin Offerings) conducted since August, there’s currently a bubble in the cryptocurrency space. What’s more, it doesn’t seem this bubble is going anywhere anytime soon.

Total cryptocurrency market cap

 

So what would be the best way to enter this space and enjoy the growth? I’m not a cryptocurrency expert, however, from my research I’ve found that the answer is pretty clear:

If you’re looking for quick earnings, just invest in coins. 

However, due to this ever-growing crypto market, mining in the short term could lead to significant earnings in the long term.

Personally, I’m very interested in Ethereum, and have found an effortless way to start mining it quickly on AWS! This post will walk you through the process.

 

What is Ethereum?

Ethereum is an open software platform based on the blockchain technology that enables developers to build and deploy decentralized applications. The advantage of Ethereum over Bitcoin, is that it can support many different types of decentralized applications.

Ethereum has grown massively in the last year, by over 230%, as you can see below:

Ethereum price over the past year

However, growth in price leads to increasing demand for Ethereum mining, and therefore to an increase in mining difficulty:

Ethereum mining difficulty over time

 

So what is crypto mining?

Mining originates from the gold-mining analogy of the cryptocurrency sphere. In simple terms, crypto mining is the process of solving complex math problems. “Miners” are people who spend time and computing power solving these problems. They provide the solution to the issuers, who verify it and reward the miners with a block of Ether. Intuitively, an increase in mining difficulty means it becomes harder to solve these problems, and therefore the rewards become smaller.
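To build some intuition for what “solving complex math problems” means, here is a toy Python sketch of proof-of-work style mining. This is purely illustrative; Ethereum’s actual Ethash algorithm is memory-hard and far more involved:

import hashlib
import itertools

def toy_mine(block_data, difficulty=4):
    # Search for a nonce whose SHA-256 hash starts with `difficulty` zeros.
    # Higher difficulty means exponentially more hashing work on average.
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

print(toy_mine("some transactions", difficulty=4))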

 

Is it worth it?

As more miners join the Ethereum network, the harder it becomes to solve the problem, which leads to an increase in the mining difficulty. This is why it’s currently costly to mine Ethereum. The returns are very low and equipment is pretty expensive. However, as Ethereum price continues to rise, it could become worthwhile in the future.

Nonetheless, Ethereum will be switching to a proof-of-stake framework later this year, which means mining may no longer be relevant. Keep in mind that this could also lead to a significant increase in Ethereum’s price.

If you’ve read this far and are still eager to mine Ethereum, let’s get to it!

 

How to start your AWS Mining instance

The steps are pretty simple:

  1. Go to your EC2 console in AWS and change the zone to US East (N.Virginia). This zone happens to be the cheapest for the type of instance we’ll be using, and also contains a community AMI that has all the required mining libraries already installed for instant use.
  2. Under Instances, select Spot Instances and click ‘Request Spot Instances’.
  3. Search for a community AMI called ami-cb384fdd and select it.
  4. Under Instance type choose g2.8xlarge.
  5. Review and Launch!
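If you prefer scripting this instead of clicking through the console, a rough equivalent using boto3 might look like the sketch below. The AMI ID and instance type are the ones mentioned in the steps above; treat the exact parameters as assumptions and check current spot pricing before running it:

import boto3

# Request a single one-time spot instance in us-east-1 using the mining AMI
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.request_spot_instances(
    InstanceCount=1,
    Type="one-time",
    LaunchSpecification={
        "ImageId": "ami-cb384fdd",
        "InstanceType": "g2.8xlarge",
    },
)
print(response["SpotInstanceRequests"][0]["SpotInstanceRequestId"])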

 

How to start mining

To start mining, you’ll need an Ethereum wallet and to join a mining pool. 

To generate a wallet, simply go to https://www.myetherwallet.com and follow the steps. By the end of the process, you’ll receive a wallet address.

We’ll be using Dwarfpool for mining, which is rated among the top mining pools. Feel free to use others if you like.

Simply SSH to your instance and type:

> tmux
> ethminer -G -F http://eth-eu.dwarfpool.com/{WALLET ADDRESS}/{YOUR_EMAIL ADDRESS} --cl-local-work 256 --cl-global-work 16384

Tmux allows the mining process to keep running after you exit your SSH connection.

Ethminer is an Ethereum GPU mining worker. Entering your email address allows you to receive notifications on payouts. The other parameters are for mining optimizations.

That’s it!

You should soon see a DAG file generated and right afterward, your mining should start. To view your stats, simply go to https://dwarfpool.com/eth and in ‘Worker stats’ enter your wallet address.

Personally, I’ve concluded that due to the very high Ethereum mining difficulty, it’s not really worth it in the short and mid-term. No one can know for sure what the future of cryptocurrencies holds and if this is just a bubble that will soon pop. Some even say cryptocurrency trading is equivalent to buying lottery tickets. 

However, being a tech enthusiast and a believer in disruptions, I truly believe there’s a bright future for cryptocurrencies. If so, the current rise in Bitcoin and Ethereum is just the start. And one coin might be worth 100x its price today, which justifies mining it today.

Bots are here to stay. Here are strong reasons why.

For the past several years, I’ve been dedicating my life to learning, designing, building and writing about chatbots. The main reason chatbots, and conversational interfaces in general, fascinate me is that they offer the most natural way for humans to interact with machines. Not only is the interaction natural, but it’s also simple, clean and focused on what you need instantly. Think of Google’s search interface. All you can do is input a search query into a little text box. Everything that comes afterwards is magic.

Chatbots are still in their very early stage due to a few factors:

1. A mismatch between what chatbots can do and what users expect (or in other words, bad user experience). This leads to instant disappointment and therefore to low usability. It starts with the largest bots, such as Siri, and goes all the way down to the very basic ones, which confuse users about their actual capabilities. In theory, Siri claims she can do almost anything related to your iPhone device, calendar and other internal Apple apps.


In reality, over 60% of what you ask Siri is not understood or results in general web search results. Just so you can get an idea of what she can do, here’s a list of Siri commands. Did you know she can do most of it? I didn’t before.

2. Educating users to create new habits. The last bot I built was focused purely on conversation, and therefore did not have any buttons or menus. Retention was very low, despite the fact that most of the conversations between the bot and users went successfully. What I discovered was that the more buttons and menus I added, the more retention grew. This led me to the conclusion that the majority of users are still not used to talking to machines naturally, but rather prefer to click buttons, as we’ve been used to for the past 30 years. Secondly, clicking buttons is faster than typing sentences. However, buttons are not faster than voice, which is why voice will eventually dominate the bot space. The transition from buttons to natural conversation is growing, but is still in its early adoption stage.

3. Artificial intelligence might have improved, but it is still in its early stages. The reason this is last and not first is that I truly believe we can build great chatbots with today’s AI solutions (such as Api.ai, Wit.ai, etc.) if bots focused more on creating user habits and offering a well-designed user experience that meets user expectations. You can read more about how to do this in a previous post I’ve written on how to improve your chatbot with 3 simple steps. Obviously, AI will only improve with time as more data is collected and models are trained across a multitude of domains.

For all the reasons above and more, we’re still far from seeing the true potential of chatbots. However, there are strong reasons why chatbots are here to stay and will improve exponentially over time:

The optimal Human to Machine interaction

If you think about it, over the past 30 years we’ve learned to adjust ourselves to the complexity and limitations of machines, via websites, applications, buttons, icons, etc. But in my opinion, the optimal scenario should be just the opposite: machines should adjust themselves to us humans, both in terms of natural language understanding and personalization.

Humans should be able to ask a machine anything naturally, instead of having to learn new interfaces, products, and habits for every service they need.

For example, let’s say you’d like to know the weather. Until recently, you had to find and pick a service out of many alternatives and learn how to use it. Since every service tries to be innovative and different from its alternatives, this usually results in varied UX/UI, which means more learning and effort required from users. The optimal solution would be if you could just ask. Thankfully, there are many great weather assistants today (such as Siri or Poncho), and many more to come in other domains.

Domain specific AI

Not very long ago, companies built virtual assistants that tried to go very wide and open, but quickly realized how hard it is to understand natural language. Going back to the Siri example, Apple tried to capture many domains in order to present Siri as the ultimate personal assistant. This ambition fell short very quickly. On the other hand, the AI solutions that have succeeded are the ones that focus narrowly on one specific domain.

Take for example Getdango - an AI solution for predicting emojis. They’re doing a great job predicting emojis based on natural language and it’s due to their narrow focus. Another example is Meekan, a scheduling assistant for teams on Slack. Meekan is a chatbot dedicated to providing the best solution for scheduling events as easily as possible.

The power of synergy, where individual bots focus on specific domains, is the right approach to solving bigger AI challenges. You can see companies moving in this direction, like FB Messenger’s recent release of the handover protocol, which enables two or more applications to collaborate. More importantly, Amazon partnered with Microsoft so that Alexa and Cortana can collaborate to provide a more powerful virtual assistant. If every bot focused on one specific domain, the race to AI as a whole would be run faster and more efficiently. Happily, that’s where we’re heading.

The power of long term relationships

The way most products are designed today is to maximize instant, short-term value for users once they enter an application. While web and mobile applications focus on short-term value, bots can and should focus on long-term value. Bot developers should focus on building relationships with users over time, so that value grows constantly with every single interaction. With time, bots should know enough about their users to maximize and personalize the user experience and minimize input friction.

For example, say you’re looking for a travel-planning service. The way you’d go about it today is to look at travel sites, fill in the proper forms, and basically teach the site about your preferences and filters every single time. The way a bot should work is to know which information is relevant to learn about the user: personal information, preferences, budget, places the user has already been, and so on. Bots should constantly learn from the user’s behavior and offer much more personalized responses.

The optimal way you’d be conversing with such a bot after some time, would be as follows:

Example conversation with a personalized travel-planning bot

The bot should know by now how many people are in your family, their ages, where you’ve already been to, where you’re at right now, what you like to do, what you don’t like and much more. In other words, the bot shouldn’t be any different from a real travel agent.

To see how much users are willing to share about their personal information within a conversation, I conducted an experiment with one of my bots. The bot asked users questions ranging from basic ones like “how old are you” all the way to more personal ones like “what are you most insecure about?”. Guess how many users answered all questions truthfully?… Over 85%. More specifically, 89% of women answered all questions truthfully, while “only” 81% of men did. So if you’re a bot developer, don’t worry about users not sharing their information. Worry about which questions you should be asking in order to enhance the user’s long-term value. This kind of information retrieval is something today’s applications cannot achieve, and where chatbots have a huge advantage.

Cross platform usability

In just a few years, mobile apps have transformed to must-haves for smartphone users. But despite the increase in app usage and app choices, the number of apps used per user is staying the same, according to a recent report from Nielsen. Most people are tired of downloading mobile apps and learning about how to use new interfaces. 

In addition, research says that US citizens own on average 3.6 connected devices. These devices range from mobile phones to smart TVs and products like Amazon Alexa. That’s a lot of connected devices! Now obviously, users would like to interact with your service on whichever device they’re using. But what do you do? Build an application for iOS, Android, smart TV, Alexa, smartwatch, iPad, Windows, Mac and more? Sounds like a lot of work. And it’s going to be very hard to get users to download your app in the first place, since they’re already flooded with other apps.

This is where the beauty of messaging platforms comes in. At present, approximately 75% of all smartphone users use some sort of messaging app, such as WhatsApp, WeChat or Facebook Messenger. Over 1.2 billion people worldwide use Messenger on their devices, everyone with a mobile device has SMS, and most obviously have an email account. The list goes on. Instead of building applications and spending hundreds of thousands of dollars, just focus on building your bot’s back end. For the front end, integrate your bot across the multiple messaging platforms that are already on users’ devices and you’re set. If your service brings value, users will come. More importantly, it turns out that’s what users want.


The future and success of chatbots depend not only on the big 4 tech companies, but on the developers and entrepreneurs who continue to innovate and push boundaries in AI and conversational interfaces. Most of today’s mistakes in the conversational user interface space are tomorrow’s improvements and solutions. Eventually, bots will bring value that cannot be achieved with most of today’s applications.

To learn more about chatbots go ahead and read the chatbots beginners guide. If you want to start building one, read this post on how to develop a Facebook Messenger bot.

How to Integrate your bot with Alexa in under 10 minutes

Why Alexa?

There’s no doubt Amazon's Alexa is currently one of the best consumer virtual assistants available. In general, it seems consumers feel more comfortable speaking to a virtual assistant at home than in public (yet). More specifically, Alexa currently has over 8.2 million active users in the U.S., and awareness of the devices increased to 82% as of Dec. 31, 2016, up from 47% a year before that and 20% on March 31, 2015. Lastly, Alexa has high demand for third-party service integrations. For these reasons and many more, I’d say it’s a smart move to integrate your service as well.

The Challenge

The challenge with integrating with Alexa when your service's interface is a bot, is that Alexa isn’t really set up for conversation. Since most services don’t provide a conversational interface, Alexa provides an interaction model that offers basic pattern matching and NLP. If you’ve built your own NLP engine, this is really bad news. It means you’ll need to classify thousands of patterns to enable the same ability you've already achieved with your bot. 

Luckily, I’ve found a workaround that’s actually so simple, it shouldn’t take you more than 10 minutes! In general, all you need to use is a slot type called AMAZON.LITERAL. From the Alexa Skills Kit documentation: "AMAZON.LITERAL passes the recognized words for the slot value with no conversion. You must define a set of representative slot values as part of your sample utterances." A while back, Amazon deprecated this solution but recently announced its return for good, so have no worries!

How Alexa works

For this tutorial, I’ll use a weather bot called Weabo, configured with a custom HTTPS endpoint (you can also use an AWS lambda function if you'd like). This is what happens when a user says something to Alexa:

  1. A voice message comes in from the user. The message pattern could be: “Alexa, ask Weabo what’s the weather in San Francisco”
  2. The Alexa NLP engine classifies the message intent and extracts entities defined in advance. For example, the intent here could be “weather.get” and the entities “location=San Francisco”.
  3. Results from the NLP engine are constructed into a request.
  4. The request is routed to a custom HTTPS endpoint of your choice, or to an AWS Lambda function.
  5. A message response is returned from the request, and back to the user. A message response could be: "The weather in San Francisco is 70 degrees".

Solution

We won't go into the details of how to set up your Alexa skill, since there are so many great tutorials out there. Take a look at this one by Amazon, on developing an Alexa skill in under 5 minutes with a Lambda function, or this one for developing with an HTTPS endpoint.

Let's assume this is our endpoint function (written in Node.js):

function respond_to_incoming_message(req, res) {
  // Extract text message from request
  var text = req.body.text;
  // Extract text intent, entities and generate bot response
  var bot_response = respond_to_user_message(text);
  // Return response 
  res.send(bot_response);
}

What we'd like to do is make sure a message sent to Alexa arrives at the above endpoint as is, so your bot can take care of parsing it and returning a proper response.

Once you've reached the 'Interaction Model' area of your Alexa skill configuration, add the following to the Intent Schema section:

{
  "intents": [
    {
      "slots": [
        {
          "name": "MessageText",
          "type": "AMAZON.LITERAL"
        }
      ],
      "intent": "FreeText"
    }
  ]
}

After that, add the following to the 'Sample Utterances' section:

FreeText {this is a sentence|MessageText}
FreeText {this is another sentence|MessageText}

Looking back at our endpoint function above, replace the first line with the following:

var text = req.body.intent.slots.MessageText.value;

That’s it! This will allow your users to talk freely with Alexa while using your own NLP engine.

Estimating an article's average reading time (Python)

Offering a reading time estimation alongside your site's content can contribute greatly to your end users. First of all, it allows them to plan for the time they need to read an article in full. Secondly, it helps them choose the right article for the amount of time they have available. Lastly, it opens a whole new range of features, sorting options and filter improvements you can offer (like filtering articles by reading time).

In this post, I will walk you through how to estimate the reading time of any public article URL by crawling it and making simple calculations (written in Python). By the way, this post was estimated to be a 6-minute read.

 

Estimating words per minute

Words per minute, commonly abbreviated WPM, is a measure of words processed in a minute, often used as a measurement of reading or typing speed. WPM comes with a few complications. The first is that reading speed is subjective. Secondly, the length of words is clearly variable: some words can be read very quickly (like 'dog') while others take much longer (like 'rhinoceros'). Therefore, a word is often standardized to be five characters long. There are other parameters that affect reading time, such as font type and size, your age, whether you're reading on a monitor or on paper, and even the number of paragraphs, images and buttons on the article's page. 

Based on research done in this field, people are able to read English at 200 WPM on paper, and 180 WPM on a monitor (the current record is 290 WPM).  

For the sake of simplicity, we'll define a word as five characters (including spaces and punctuation), and WPM = 200. Feel free to add additional parameters to your calculation. Note that if all you're looking for is a broad estimation, what we've defined will suffice.
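For example, under these definitions an article whose visible text contains 6,000 characters counts as 6,000 / 5 = 1,200 words, which at 200 WPM comes out to 1,200 / 200 = 6 minutes of reading time.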

 

From URL to Estimating reading time

Let's design the simple algorithm process:

  1. Extract visible webpage text (title, subtitle, body, page buttons, etc.) from given url.
  2. Filter unnecessary content from text.
  3. Estimate filtered text reading time.

1. Extracting visible webpage text

In order to extract a webpage's text content, we'll use the Python libraries BeautifulSoup and urllib:

import bs4
import urllib, re

def extract_text(url):
    html = urllib.urlopen(url).read()
    soup = bs4.BeautifulSoup(html, 'html.parser')
    texts = soup.findAll(text=True)
    return texts

2. Filter unnecessary page content

Once we've extracted the desired text content, we need to filter out all the unnecessary content such as styles (CSS), scripts (JS), HTML headers, comments, etc.:

def is_visible(element):
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    elif isinstance(element, bs4.element.Comment):
        return False
    elif element.string == "\n":
        return False
    return True

def filter_visible_text(page_texts):
    return filter(is_visible, page_texts)

3. Estimate reading time

To estimate the article's reading time, we need to count the number of words (as defined above) and divide by the defined WPM (200):

WPM = 200
WORD_LENGTH = 5

def count_words_in_text(text_list, word_length):
    total_words = 0
    for current_text in text_list:
        total_words += len(current_text)/word_length
    return total_words

def estimate_reading_time(url):
    texts = extract_text(url)
    filtered_text = filter_visible_text(texts)
    total_words = count_words_in_text(filtered_text, WORD_LENGTH)
    return total_words/WPM
    

That's it! Feel free to test it out with any URL string by calling the method estimate_reading_time.
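For example, assuming the functions above are defined in the same module, a quick sanity check could look like this (the URL is just a placeholder):

article_url = "https://example.com/some-article"  # hypothetical URL
print("Estimated reading time: %d minutes" % estimate_reading_time(article_url))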

To view the source code, please visit my GitHub page. If you have any questions, feel free to drop me a line.

How to improve your chatbot in 3 simple steps

I have tested hundreds of chatbots and came to realize a key factor in why they fail at user experience. You may be thinking it has to do with the chatbot's purpose or the lack of powerful AI, but that's usually not the case. Actually, many of these chatbots have a very good purpose, and do solve a real problem or pain. 

The main reason these chatbots fail is because of a key element developers miss - users are expecting to chat with your chatbot, not fill out a form. I will use a weather bot called Weabo as an example throughout this post. Let's take a look at the following conversation:

[Screenshot: example conversation in which the bot repeatedly rejects the user's input]

The above conversation usually occurs when your conversation logic looks something like this:

function generate_response(message, current_state) {
  ...
  if (current_state == "zipcode.get") {
    response = "This is not a valid Zipcode. Please try again";
    if (is_valid_zipcode(message)) {
      save_user_zipcode(message);
      response = get_weather(message);
    }
  }
  ...
  return response
}

As much as we would hope so, users don't always follow the conversation flows we as bot developers have designed and expected. For most users this might even be their first time talking to a chatbot. They don't understand the complexity of NLP/NLU and how fragile a bot's understanding can be. So you can't blame them for doing what they're supposed to do - simply chat. That's why you shouldn't assume specific and strict user input, because in many cases the user will get stuck in an infinite loop, lose patience and dump your chatbot. 

Here are 3 steps you can take to significantly improve your conversational experience without much work:

1. Provide small talk context understanding

In my opinion, every chatbot should standardize itself to understand and respond to basic small talk. You don't need to be an NLP or machine learning expert to supply small talk to your chatbot. Just get familiar with one of the many great third-party solutions out there such as Api.ai, Wit.ai, Lex, etc. These services offer simple out-of-the-box solutions for small talk. So for example, if the user asks "What can you do", you can easily catch that via third-party APIs and provide the appropriate response. Check out Api.ai's solution for small talk, which I personally recommend and have found very useful.

To summarize thus far, supply small talk understanding for anything from a basic "hello" or "thank you", to specific questions such as "what can you do?" In my opinion, this shouldn't take you more than a day's work. Even better, once you're obligated to provide answers to questions such as "what can you do?", it will push you to really tighten your bot's purpose and understand what's unique about it.

2. Keep your conversation flow logic loose

Conversation states are crucial to any chatbot, so if you're going in that direction - good job. However, don't build your flow logic such that you're expecting a specific and strict answer from users, because that's where you can seriously fail. Instead, loosen your logic and accept the fact that users might decide to deviate from the flow you've built.

All you have to do is reverse your current logic. If you're expecting a Zip code as input at some conversation state, match the current state with the appropriate response only if you've first identified that there's a valid Zip code in the user's input. Otherwise, treat the current user input as a stateless message, ignore the user's current state and respond accordingly. Also, take into consideration the intent retrieved from the third-party service you've decided to integrate. Let's look at a refined example of the flow above:

function generate_response(message, current_state, intent) {
  ...
  if (intent == "smalltalk.name.get") {
    response = "My name is Weabo :)";
  } else if (intent == "smalltalk.help") {
    response = "Sure! You can type 'What is the weather in X?'"
  } else if (intent == "weather.get") {
    response = get_weather(message)
  } else if (is_valid_zipcode(message)) {
    if (current_state == "zipcode.get") {
      save_user_zipcode(message);
      response = get_weather(message);
    }
  }
  ...
  return response
}

After implementing this logic, the conversation example above would look like this:

[Screenshot: the same conversation handled gracefully after loosening the flow logic]

To summarize, try to first understand the intent/meaning of every incoming message, and only in the right cases match it with the user's current state and respond accordingly.

3. Redirect unknown intents to what you do know

Lastly, I want to focus on unknown intents. Unknown intents are messages that the chatbot does not understand or know how to respond to. I have found that between 70%-80% of all user input falls into unknown intents. There are hundreds of blog posts I could write about how to improve your bot's logic in this case, but for now I'll focus on one - redirect the user to what your bot can understand. Think of the conversation as a pinball game. The user shoots a ball and your mission is to make sure the ball doesn't enter the drains. The way to achieve that is to provide hints and responses about what your bot can understand. For example: "I might understand this in the future. For now, I can tell you the weather. Try this: what's the weather in New York".

Most users just want to understand your chatbot's limits, what it can do, but mostly what it can't do. The more you assist your users in understanding what your bot can't do, the less user input will fall into the unknown intents category.

Summary

There is still so much you can do to improve your chatbot's conversational experience. What's important to understand is how simple it is to significantly improve your chatbot's user experience. More importantly, users want to chat, but mostly to understand your chatbot's abilities and how to improve their relationship with it. Treat the conversational interface as you would treat any basic human-to-human conversation, and forget what you've learned about web/app interfaces.

Last note: If you’re not planning to provide basic free text understanding, consider moving to persistent menus/quick replies. It’s better to limit the user’s expectations at first, rather than to disappoint them.

I hope this post helped you in some way and thank you for taking the time to read it. Feel free to drop a comment if you have any thoughts!

 

URL text summarizer using Web Crawling and NLP (Python)

To skip this tutorial, feel free to download the source code from my Github repo here.

I’ve been asked by a few friends to develop a feature for a WhatsApp chatbot of mine that summarizes articles based on URL inputs. So when a friend sends an article to a WhatsApp group, the bot will reply with a summary of the given URL article. I like this feature because, from my personal research, 65% of group users don’t even click the shared URLs, but 97% of them will read a few lines of the article’s summary.

As a full-stack developer, it is important to know how to choose the right stack for each product you develop, depending on the requirements and limitations. For web crawling, I love using Python. The Python community is filled with efficient, easy-to-implement open source libraries both for web crawling and for text summarization. Once you’re done with this tutorial, you won’t believe how simple it is to implement.

 

GETTING STARTED

For this tutorial, we’ll be using two Python libraries:

  1. Web crawling - Beautiful Soup. Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

  2. Text summarization - NLTK (Natural Language Toolkit). NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.

Go ahead and get familiar with the libraries before continuing, and also make sure to install them locally. If you’re having trouble installing the libraries, run the following commands in your Terminal:

pip install beautifulsoup4
pip install -U nltk
pip install -U numpy
pip install -U setuptools
pip install -U sumy

After that, open Python command line and enter:

import nltk
nltk.download("stopwords")

 

THE ALGORITHM

Let's describe the algorithm:

  1. Get URL from user input

  2. Web crawl to extract the natural language from the URL html (by paragraphs <p>).

  3. Execute the summarize class algorithm (implemented using NLTK) on the extracted sentences.

    1. The algorithm ranks sentences according to the frequency of the words they contain, and the top sentences are selected for the final summary.

  4. Return the highest ranked sentences (I prefer 5) as a final summary.

For step 2 (step 1 is self-explanatory), we’ll develop a method called getTextFromURL as shown below:

import requests
from bs4 import BeautifulSoup

def getTextFromURL(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    return text

The method sends a GET request to the given URL, and returns the extracted natural language from the URL's HTML page.

For steps 3-4, we’ll develop a method called summarizeURL as shown below:

def summarizeURL(url, total_pars):
    url_text = getTextFromURL(url).replace(u"Â", u"").replace(u"â", u"")
    fs = FrequencySummarizer()
    final_summary = fs.summarize(url_text.replace("\n"," "), total_pars)
    return " ".join(final_summary)

The method calls the method above to retrieve the text, and cleans it of stray HTML characters and trailing newlines (\n). It then executes the summarize algorithm (inspired by this post) on the given text, which returns a list of the highest ranked sentences, which is our final summary.
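The FrequencySummarizer class itself comes from the post linked above and isn't shown here. For completeness, a minimal sketch of what such a frequency-based summarizer could look like is shown below. This is my own simplified version, not the exact implementation from that post, and sentence tokenization also requires nltk.download('punkt') in addition to the stopwords download above:

from collections import defaultdict
from heapq import nlargest
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

class FrequencySummarizer:
    def __init__(self, min_cut=0.1, max_cut=0.9):
        # Words that are too rare or too frequent carry little signal
        self._min_cut = min_cut
        self._max_cut = max_cut
        self._stopwords = set(stopwords.words('english') + list(".,!?'\""))

    def _compute_frequencies(self, word_sent):
        freq = defaultdict(int)
        for sentence in word_sent:
            for word in sentence:
                if word not in self._stopwords:
                    freq[word] += 1
        # Normalize frequencies and drop words outside the cut-off range
        max_freq = float(max(freq.values()))
        for word in list(freq.keys()):
            freq[word] = freq[word] / max_freq
            if freq[word] >= self._max_cut or freq[word] <= self._min_cut:
                del freq[word]
        return freq

    def summarize(self, text, n):
        # Rank sentences by the summed (normalized) frequency of their words
        sents = sent_tokenize(text)
        word_sent = [word_tokenize(s.lower()) for s in sents]
        freq = self._compute_frequencies(word_sent)
        ranking = defaultdict(int)
        for i, sentence in enumerate(word_sent):
            for word in sentence:
                if word in freq:
                    ranking[i] += freq[word]
        top_indices = nlargest(n, ranking, key=ranking.get)
        # Return the top n sentences in their original order
        return [sents[i] for i in sorted(top_indices)]

With something like this in place, calling summarizeURL("https://example.com/some-article", 5) (a placeholder URL) should return a five-sentence summary as a single string.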

 

SUMMARY

That’s it! Try it out with any URL and you’ll get a pretty decent summary. The algorithm proposed in this article is, as stated, inspired by this post, which implements a simple text summarizer using the NLTK library. Many summarization algorithms have been proposed in recent years, and there’s no doubt there are even better solutions. If you have any suggestions or recommendations, I’d love to hear about them, so comment below!

Feel free to download directly the source code via my Github account.

How to develop a Facebook Messenger bot (using Node.js and MongoDB)

To skip the tutorial, feel free to download the source code from my Github repo here.

In very little time, quite a few very good beginner tutorials have appeared about how to develop a Facebook Messenger bot. However, these tutorials describe a stateless bot. What that means is that for every user who sends a message to your bot, there is no info saved regarding their current state in the conversation, or other basic info. This is why I've decided to write this tutorial, which consists of a basic implementation of a Facebook Messenger bot, in addition to a functional working MongoDB library. 

Saving end users' specific states dynamically within a conversation is crucial for the UX of any basic chatbot. Saving states allows the bot to communicate with end users in a flow which follows some pattern, which otherwise would not be possible. 

Getting started

For starters, you'll need a Facebook developers account which can be found here.

Secondly, follow the beginning of the process for creating a Facebook page and set up a 'Webhook' (up until step 5) by clicking here. Note: you should write down the verification code which you've provided to the webhook in the tutorial. Then, once you've got a Facebook page up and running, look up the page token, and send a POST request to the following:

https://graph.facebook.com/v2.6/me/subscribed_apps?access_token=<TOKEN_GOES_HERE>
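For example, this can be done with curl (substituting your own page token):

curl -X POST "https://graph.facebook.com/v2.6/me/subscribed_apps?access_token=<TOKEN_GOES_HERE>"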

You should get the response 'true', which means you've synced your Facebook page with the provided API.

Lastly, please get familiar with the basics of Node.js and MongoDB. In particular, you should understand the basics of writing in Node.js with ES6, as well as the basics of working with MongoDB.

Now let's create your very first Facebook Messenger chatbot!

Facebook API Structured messages

First things first. Understand and learn the basic concepts of the Facebook API - click here. Let's look at an example:

"welcome_message": {
  "attachment": {
    "type": "template",
      "payload": {
        "template_type": "button",
          "text": "Hello and welcome to your first bot. Would you like              to get see our products?",
            "buttons": [
              {
                "type": "postback",
                "title": "Yes",
                "payload": "get_options"
              },
              {
                "type": "postback",
                "title": "No",
                "payload": "no_options"
              }
            ]
      }
  }
}

In the example above, you can see that for every message sent to Facebook, we need to declare the type of message, which in this case is a template (for basic text, text is enough). In addition, we declare the template type, which is button in this case, and the buttons themselves. For every button, we need to declare the button title, type and payload. The type is so we'll know how the button click should be handled, and the payload is so we can identify which button the user clicked (a further example is described in the source code). 

Server side

The basic and required implementation for the server side is to set up a GET handler for the URL /webhook/, and a POST handler for the same URL. The GET handler is for Facebook verification when applying your URL webhook and should be as follows:

function facebookVerification(req, res) {
    if (req.query['hub.verify_token'] === WEBHOOK_TOKEN) {
        res.status(200).send(req.query['hub.challenge']);
    } else {
        console.error("Failed validation. Make sure the validation          tokens match.");
    }
}

Note: the WEBHOOK_TOKEN above should be set to whatever you declared when initializing the webhook. Facebook shows an example with 'my_voice_is_my_password_verify_me'. You can leave it as is and update the source code accordingly.

The second and most important method is the POST handler. Facebook Messenger sends every ICM (incoming message) sent to your bot page via POST to the URL you've declared in the developers portal. The method should handle all ICMs, whether they arrive via user clicks or as free text. I will describe the three methods used in this case:

// 0 MongoDB info
const mongoose = require('mongoose');
const User = mongoose.model('User', {_id: String, name: String, profile_image_url: String, phone_number: String, current_state: String});
// 1
app.post('/webhook/', facebook_parser.facebookWebhookListener);
// 2
function facebookWebhookListener(req, res) {
    if (!req.body || !req.body.entry[0] || !req.body.entry[0].messaging) {
        return console.error("no entries on received body");
    }
    let messaging_events = req.body.entry[0].messaging;
    for (let messagingItem of messaging_events) {
        let user_id = messagingItem.sender.id;
        db_utils.getUserById(user_id, messagingItem, parseIncomingMSGSession);
    }
    res.sendStatus(200);
}
// 3
function getUserById(user_id, incomingMessage, callback) {
    var result = null;
    //Lets try to Find a user
    User.findById(user_id, function (err, userObj) {
        if (err) {
            console.log(err);
        } else if (userObj) {
            result = userObj;
            console.log('User ' + user_id + ' exists. Getting current user object:', userObj);
        } else {
            console.log('User not found!');
        }
        // After getting user object, forward to callback method.
        callback(user_id, incomingMessage, userObj);
    });
}
// 4
function parseIncomingMSGSession(user_id, messageItem, userObj) {
    var current_state = "welcome_message";
    if (userObj != null) {
        current_state = userObj.current_state;
    }
    // If we receive any text message, parse and respond accordingly
    if (messageItem.message && messageItem.message.text) {
        // Currently support a static welcome message only
        sendFacebookGenericMsg(user_id, message_templates.templates["welcome_message"]);
    }
    // If the user sends us a button click
    if (messageItem.postback && messageItem.postback.payload) {
        var button_payload_state = messageItem.postback.payload;
        switch (button_payload_state) {
            case "get_options":
                sendFacebookGenericMsg(user_id, message_templates.templates["options_message"]);
                break;
            case "no_options":
                sendFacbookTextMsg(user_id, "Ok. There is so much you can do with stateful bots!");
                break;
        }
    }
    // Save new user state. If user does not exist in DB, will create a new user.
    db_utils.setUserFieldById(user_id, "current_state", "");
}

The first step (commented) listens for POST requests and forwards them to a method called facebookWebhookListener (method 2). This method then retrieves the relevant info from the POST body, such as the message item (consisting of the user's unique id, message text, etc.), and forwards the content to a method called getUserById (method 3).

The method getUserById (method 3) uses the model defined at the top (comment 0) and tries to retrieve a user with the given id from the DB. If the user is not found, null will be returned, and the info is passed to a callback function, which in our case is parseIncomingMSGSession (method 4). 

The method parseIncomingMSGSession (method 4) is in charge of sending an OGM (outgoing message) based on the user info. In the case above, the default state is "welcome_message". Secondly, the method checks the type of the ICM, which could either be a text message or a clicked message (when the user clicks on buttons the bot provided). Based on the ICM and the user's state, a relevant message is sent. There are additional methods declared in the code above which I will not explain, since they are pretty much self-explanatory and can be found in full in the source code provided at the top of this post (or at the end). Feel free to ask me any questions regarding any of the methods and the general flow of the server side.

Finally, in order to send a response back to the end user, you'll need to send a POST request with the message template as described above, with the following structure:

const request = require('request');

// Send generic template msg (could be options, images, etc.)
function sendFacebookGenericMsg(user_id, message_template) {
    request({
        url: 'https://graph.facebook.com/v2.6/me/messages',
        qs: {access_token: TOKEN},
        method: 'POST',
        json: {
            recipient: { id: user_id },
            message: message_template
        }
    }, facebookCallbackResponse);
}

function facebookCallbackResponse(error, response, body) {
    if (error) {
        console.log('Error sending messages: ', error)
    } else if (response.body.error) {
        console.log('Error: ', response.body.error)
    }
}

The TOKEN shown above is the page token you received via the Facebook developers portal. Congratulations! You've completed your very first Facebook Messenger bot. The source code is built in such a way that it'll be very easy for you to scale it up to a fully functional chatbot.

To view the full project source code, click the button below. Feel free to ask any questions you might have, and I'll answer you ASAP! 

Chatbots - The beginners guide

If you search for chatbots on Google, you'll probably come across hundreds of pages, from what a chatbot is to how to build one. This is because we're in 2016, the year of the chatbot revolution.

I've been introduced to many people who are new to this space and who are very interested and motivated to enter it, whether they're software developers, entrepreneurs, or just tech hobbyists. Entering this space for the first time has become overwhelming in just a few months, particularly after Facebook announced the release of the Messenger API at the F8 developer conference. For this reason, I've decided to simplify the basic steps of entering this fascinating world.

What is a chatbot?

To fully understand what a chatbot is and its potential, let's start by watching the following example:

Get the idea? The conversation example above was conducted between an end user and a chatbot built on the Facebook Messenger platform.

So what is a chatbot? It is a piece of software designed to automate a specific task. More specifically, a chatbot is essentially a conversational user interface which can be plugged into a number of data sources via APIs, so it can deliver information or services on demand, such as weather forecasts or breaking news. 

Why now?

Chatbots have been around for decades, right? So why all this noise all of a sudden? This question has many different answers, depending on who you ask. If you ask me, there are two main reasons:

1. Messaging has become the most essential and most popular tool for communication. 

2. We're closer to AI (artificial intelligence) and NLP (natural language processing) breakthroughs than ever before. This means that talking to a chatbot can soon become almost as natural as talking to a human. Today, developers can find many APIs that offer AI/NLP services without even understanding how AI/NLP works - this is HUGE. A few examples I recommend are Crunchable.io, Chatbots.io, Luis.ai (a must!), API.ai and Wit.ai.

Basically, the point I'm trying to make is that messaging platforms are the place we all go to on a regular basis. So why not bring all the other places into these platforms? This is what Facebook did with Facebook Messenger.

Facebook Messenger is far more than a messenger app. It is a store for thousands of apps which are integrated into our daily conversations. Furthermore, as stated above, Facebook released its chatbot platform in April 2016. Since then, more than 11,000 bots have been added to Messenger by developers.

Where are the chatbots?

The first chatbot I built was on WhatsApp. The reason I chose WhatsApp is that all my friends use it as their main messaging platform. Unfortunately, WhatsApp doesn't offer an official API. What this means is that WhatsApp doesn't approve of building chatbots on its platform (not a surprise, since WhatsApp is a Facebook company, which itself offers an extensive API). This doesn't mean that there aren't any workarounds. If you're as stubborn as I am, take a look at yowsup and start from there. You'll also need a registered phone number before starting the process. So to conclude, WhatsApp is probably not the place you'll find rich and powerful chatbots. 

Platforms that do offer official APIs are:

1. Facebook Messenger

2. Slack

3. Telegram

4. Kik

There are other deployment channels such as Android and iOS (via SMS), Skype and even email. However, the ones listed above are those I would focus on.

You can find a rich list of most of the chatbots out there by clicking here, thanks to our friends at Botlist.co who did an amazing job. 

How do I build a chatbot?

This requires a long answer. An answer I will save for my next blog post, in which I will describe how to build your very first chatbot using Node.js and MongoDB.

If you're not a developer, or are looking for an easier approach which does not require programming, here are a few solutions for you:

1. Chatfuel - My first choice. No coding required. Easily add and edit content - what you see is what you get.

2. Botsify - Build a Facebook Messenger chatbot without any coding knowledge. 

3. Meya.ai - Meya helps with the mechanics of bot building so you can concentrate on the fun stuff.

There are some downsides to using a service instead of building your own. Using the above services limits your creativity in many ways, giving you only a glimpse of what can be done. Secondly, you are using a third-party hosting service, which means you're stuck with them. Nevertheless, these are great solutions that will get you started with chatbots without the need for any coding knowledge.

Summary

There has been a lot of controversy over whether bots will succeed or fail in the near future. To understand the controversy, you have to understand the difference between "stupid" bots and "smart" bots. "Stupid" bots work with structured input, while "smart" bots process your natural language and provide a more human-to-human experience.

The main issue with "stupid" bots is that as soon as people start bundling things up, changing their minds, going back to what has been mentioned earlier in the chat, etc., the bot falls apart. Therefore, as long as chatbots can't fully conduct a conversation naturally, while understanding the intent of the user at every stage, bots will be limited and ineffective. 

Having said that, in my opinion, chatbots don't have to be smart in order to succeed. There are thousands of use cases in which a "stupid" chatbot can simplify both the end user's experience and the business side's productivity. Take, for example, ordering pizza. You can create a flow in which the user needs to enter inputs based on questions and options. You can deliberately state the input you're expecting from the user, and therefore the need for NLP or AI becomes irrelevant. I would prefer ordering pizza from a "stupid" bot rather than over the phone, or through some cheap website, any day. 

To fully summarize the above and much more, have a look at the Chatbot ecosystem, brought together by Business Insider.

Stay tuned for my next blog post, about how to develop your very first Facebook Messenger chatbot, using Node.js and MongoDB.