HTML5, NodeJS and Neural Networks: The tech behind MySam, an open source Siri

Recently I published the very first version of MySam, an open "intelligent" assistant for the web similar to Siri or Google Now. Unlike those, however, you can teach Sam yourself, it works in many modern browsers and it is extensible with plugins written in HTML and JavaScript. Here is a video that shows what Sam can do:

It is a fun project that combines many of the open source projects I've recently been working on or interested in. In this short post I'd like to show how they all came together.

The Brain

The natural language understanding and learning process is probably the most interesting part. Sam uses natural language processing and machine learning to determine the probability that each of its previously learned actions is the one to perform.

The NodeJS server runs natural-brain, which combines node-natural, a natural language processing library, with BrainJS, a neural network library for JavaScript. That means that, given some training data and an input text, you will get back the probability that the text matches each known label (in Sam's case, an action). Here is an example of how natural-brain works:

var BrainJSClassifier = require('natural-brain');
var classifier = new BrainJSClassifier();

// Add training sentences, each labelled with the category it belongs to
classifier.addDocument('my unit-tests failed.', 'software');
classifier.addDocument('tried the program, but it was buggy.', 'software');
classifier.addDocument('tomorrow we will do standup.', 'meeting');
classifier.addDocument('the drive has a 2TB capacity.', 'hardware');
classifier.addDocument('i need a new power supply.', 'hardware');
classifier.addDocument('can you play some new music?', 'music');

// Train the neural network on all added documents
classifier.train();

console.log(classifier.classify('did the tests pass?')); // -> software
console.log(classifier.getClassifications('did the tests pass?'));
// -> [
// { label: 'software', value: 0.6207398029868376 },
// { label: 'meeting', value: 0.09425356030447715 },
// { label: 'hardware', value: 0.10512960673523415 },
// { label: 'music', value: 0.06935145959504821 }
// ]
  
console.log(classifier.classify('did you buy a new drive?')); // -> hardware
console.log(classifier.classify('What is the capacity?')); // -> hardware
console.log(classifier.classify('Lets meet tomorrow?')); // -> meeting
console.log(classifier.classify('Can you play some stuff?')); // -> music

node-natural already comes with two statistical models for language classification (naive Bayes and logistic regression), but since the data format was also a perfect fit for a neural network, connecting it to BrainJS was an interesting experiment. I do not have any hard numbers for comparison yet, but at least in Sam's case the neural network seemed to perform much better than the other two methods, and its prediction accuracy and learning ability, especially once more data was available, were surprisingly good at times.
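
Since all of node-natural's classifiers share the same interface, swapping models for a comparison is straightforward. Here is a minimal sketch using node-natural's built-in BayesClassifier and LogisticRegressionClassifier with the same training data as above:

var natural = require('natural');

// Both built-in classifiers use the same addDocument/train/classify
// interface as natural-brain, so they can be compared directly
[new natural.BayesClassifier(), new natural.LogisticRegressionClassifier()]
  .forEach(function(classifier) {
    classifier.addDocument('my unit-tests failed.', 'software');
    classifier.addDocument('can you play some new music?', 'music');
    classifier.train();
    console.log(classifier.classify('did the tests pass?')); // -> software
  });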

Compared to the language classification, the tagging mechanism that extracts parts of a sentence is currently pretty primitive. Since we can assume that a classified sentence (given a certain confidence) is quite similar to the original training sentence, it simply looks at the words around a tag that both sentences have in common. From the video: if the training sentence "Is it cold in Sweden" with the location tag "Sweden" is matched by the sentence "Do you know if it is cold in Canada", it will tag the words after "cold in" as the location. While not the most clever algorithm, it works quite well, mainly thanks to the accuracy of the natural language classification.
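
To illustrate the idea, here is a rough sketch of that heuristic (not Sam's actual implementation):

// Rough sketch of the surrounding-words heuristic, not Sam's actual code:
// take the words preceding the tag in the training sentence, locate them
// in the input sentence and treat everything that follows as the tag value
function extractTag(trainingTokens, tagStart, inputTokens) {
  var context = trainingTokens.slice(Math.max(0, tagStart - 2), tagStart);

  for (var i = 0; i + context.length < inputTokens.length; i++) {
    var matches = context.every(function(word, j) {
      return inputTokens[i + j].toLowerCase() === word.toLowerCase();
    });

    if (matches) {
      return inputTokens.slice(i + context.length).join(' ');
    }
  }

  return null;
}

var training = ['Is', 'it', 'cold', 'in', 'Sweden'];
var input = ['Do', 'you', 'know', 'if', 'it', 'is', 'cold', 'in', 'Canada'];

console.log(extractTag(training, 4, input)); // -> Canada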

The API

Now that Sam can learn to make sense of language, getting it to communicate with the world is just as important. This is where Feathers comes in. Feathers is a service-oriented REST and real-time API framework for NodeJS. This means that you can connect to Sam's classifier through a RESTful API and in real-time via websockets.

The service layer is also where Sam stores its training and configuration data. To avoid having to set up your own database server, it uses the filesystem based database NeDB through the feathers-nedb plugin. One really nice thing about Feathers is that the database can be swapped out for MongoDB, *SQL databases or even a remote API by just changing two lines of code.
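
Those two lines are the service registration. Sketched with the Feathers service API (the file name and port here are illustrative; the actual MySam setup may differ slightly):

var feathers = require('feathers');
var NeDB = require('nedb');
var service = require('feathers-nedb');

// The NeDB database is just a file on disk, so no server setup is needed
var db = new NeDB({ filename: './data/actions.db', autoload: true });

// Swapping this service for another adapter, e.g. feathers-mongodb,
// is all it takes to move to a different database
var app = feathers().use('/actions', service({ Model: db }));

app.listen(9090);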

With MySam running, the API has two endpoints:

localhost:9090/actions is where actions are stored. An action contains a training text, the action type name (potentially with additional parameters) and the location of the words in the training text that belong to a tag (-1 means the end of the sentence).

{
  "text": "what's the weather in Vancouver",
  "action": {
    "type": "weather"
  },
  "tags": [
    {
      "label": "location",
      "start": 6,
      "end": -1
    }
  ],
  "_id": "l0QZr5Ya52rHVk2B"
}
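
Since Feathers exposes standard REST semantics, teaching Sam a new action from the command line is just a POST of that same shape, for example (the start index depends on Sam's tokenization, so treat the values here as illustrative):

curl 'http://localhost:9090/actions/' -H 'Content-Type: application/json' --data-binary '{ "text": "Is it cold in Sweden", "action": { "type": "weather" }, "tags": [{ "label": "location", "start": 4, "end": -1 }] }'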

localhost:9090/classify is where classifications are sent to. A classification currently only has an input sentence. For example, sending a JSON object like

{ "input": "Do you know the weather in Chicago" }

in a curl request like this:

curl 'http://localhost:9090/classify/' -H 'Content-Type: application/json' --data-binary '{ "input": "Do you know the weather in Chicago" }'

returns a classification similar to this:

{
   "text":"what's the weather in Vancouver",
   "action":{
      "type":"weather"
   },
   "tags":[
      {
         "label":"location",
         "start":6,
         "end":-1
      }
   ],
   "_id":"l0QZr5Ya52rHVk2B",
   "id":"01852f7590c96450a800d9bfac523830",
   "input":"Do you know the weather in Chicago",
   "classifications":[
      {
         "label":"hf5UlkJlY9V9TaK9",
         "value":0.017050428002625857
      },
      {
         "label":"id9SUqixVdvWI4xG",
         "value":0.022634214537984445
      },
      {
         "label":"l0QZr5Ya52rHVk2B",
         "value":0.3540857832992517
      }
   ],
   "pos":{
      "tokens":[
         "Do", "you", "know", "the", "weather", "in", "Chicago"
      ],
      "tags":[
         "VBP", "PRP", "VB", "DT", "NN", "IN", "NNP"
      ]
   },
   "extracted":{
      "location":[
         "Chicago"
      ]
   }
}

It contains the same fields as the matched action (e.g. tags or text) and the input. The classifications property lists the confidence for each action id. As you can see, the action with id l0QZr5Ya52rHVk2B has a 35% confidence and is also the one that was matched. pos contains some part-of-speech information and extracted has all extracted tags (in our case, Chicago for the location).

Technically, the natural language classification could also run by itself in the browser. Having a NodeJS server that is accessible through an API, however, has many advantages. It makes it possible to provide different frontends like a web page, an Electron desktop application or a mobile application, or even to connect a chat bot. It also allows you to create plugins for voice-controlling local programs like iTunes or hardware like an Arduino robot.

The Frontend

A not yet widely known part of the HTML5 specification is the speech recognition API, which is currently only supported in WebKit based browsers. It makes it very easy to add voice recognition to any web application and works extremely well, even with my odd German-Canadian English accent. So all that needs to be done to get Sam to classify a spoken sentence is to start the voice recognition and, once completed, send the recognized text to the API.
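
In its simplest form that flow looks like this (a minimal sketch using the prefixed webkitSpeechRecognition constructor and the classify endpoint shown above):

var recognition = new webkitSpeechRecognition();

recognition.onresult = function(event) {
  // The most likely transcript of what was said
  var text = event.results[0][0].transcript;

  // Send the recognized sentence to Sam's classify endpoint
  fetch('http://localhost:9090/classify', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ input: text })
  }).then(function(response) {
    return response.json();
  }).then(function(classification) {
    console.log(classification.action, classification.extracted);
  });
};

recognition.start();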

The web frontend itself is located in the mysam-frontend module. It is a DoneJS application which can dynamically load globally installed plugins and lets them register actions (a callback that gets the classification result and the main DOM element) and learners (the form that is shown when learning something new). Once a classification comes back and is past a certain confidence level (I found 30% to be a good threshold), the frontend will look up the action and call it if it exists. There will be much more documentation around it soon, but in its basic form creating your own plugin for Sam should be almost as easy as writing a jQuery plugin.
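
Based on that description, a plugin boils down to registering a callback per action type. The registration call below is a hypothetical sketch (check the mysam-frontend documentation for the exact API):

// Hypothetical plugin shape; the exact registration API may differ
module.exports = function(sam) {
  sam.action('weather', function(el, classification) {
    // classification is the object returned by the classify endpoint
    var location = classification.extracted.location.join(' ');

    el.innerHTML = '<h1>Loading the weather for ' + location + '</h1>';
  });
};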

The Future

This article is only a brief, high-level overview of MySam's technology. There is still much more to be said about the reasons why it exists at all and about the possibilities of a truly open AI assistant. In the meantime, I invite you all to try it out, teach it new things, write plugins, share your thoughts and hopefully be part of the beginning of a journey exploring different ways to interact with our computers.