2016-12-11

The Right Way of Democratizing Machine Learning

Nowadays people hear about quite a few new services that provide Machine Learning capabilities in the manner of a magic box. Their vendors claim that these services are "democratizing" Machine Learning.

How democratic is it then? Well, you are given some APIs, which usually come with a short description of what they do and what kind of input they need. Some example service requests are provided too. You finish reading the manual in 10 minutes or less, copy and paste the request code, modify the request with your own data, and send it. Voilà, the response contains some predictive or profound information, and it really works like magic. The power of Machine Learning no longer belongs only to the privileged few, namely algorithm scientists and data scientists. Anyone who knows how to invoke web services can suddenly wield the Swiss Army knives of artificial intelligence, ready to crack a lot of difficult problems in life. Really democratic, isn't it?

But that's just an illusion.



If you don't know how the algorithm works, you will very often choose a "wrong" API (that is, a "wrong" mathematical model) for your problem. The bad thing is, you are still likely to get results from that API that seem "meaningful". If you are not trained in ML algorithms and data science, you often do not even know how to assess or make sense of the results from those magic APIs. As a consequence, you tend to believe, or have to believe, whatever the API tells you. Do you think using those APIs this way is empowering, democratizing, or just fooling around and being fooled?
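To make this concrete, here is a minimal sketch in Python. It is not taken from any particular service; `fit_line` and `r_squared` are hypothetical helpers written for this illustration, and the data is made up. It fits a straight line to data that actually follows a parabola. The fit "succeeds" and returns numbers, yet a goodness-of-fit check, which you have to know to ask for, shows that the model explains nothing.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data that follows y = x^2, which is not a linear relationship.
xs = list(range(-5, 6))
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)

# The "wrong" model still produces an answer: a flat line through the mean.
print("slope:", slope, "intercept:", intercept)
# R^2 is zero here: the line explains none of the variation in the data.
print("R^2:", r2)
```

Someone who only sees the slope and intercept has no reason to doubt them. Only the extra check reveals that a straight line was the wrong model for this data.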

The sad thing is that many people tend to believe things they have no idea about, and many are happy enough to see some result, any result. That is how fortune tellers stay in business.

Even if you are smart enough to choose a suitable API after reading the short introductory manual, if you are not a well-trained ML practitioner you may fail to pre-process your dataset in a purposeful way, or fail to specify some critical parameters correctly. The result may not be well targeted to your problem, or may even be totally off, but you are not able to recognize the bad result and you present it as good. This way, you may end up creating new problems instead of solving existing ones.
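As a small illustration of how much pre-processing matters, the sketch below runs a 1-nearest-neighbor classifier twice on the same made-up data: once on the raw features and once after min-max scaling. The data, the labels, and the helper names `nearest_label` and `min_max_scaler` are all assumptions for this example. It needs Python 3.8 or later for `math.dist`.

```python
import math

# Made-up training data: (yearly income in dollars, age in years) -> label.
train = [
    ((50000.0, 25.0), "young"),
    ((50500.0, 60.0), "old"),
]
query = (50400.0, 27.0)


def nearest_label(points, q):
    """Return the label of the training point closest to q (1-nearest-neighbor)."""
    return min(points, key=lambda p: math.dist(p[0], q))[1]


def min_max_scaler(points):
    """Build a function that rescales every feature to the range [0, 1]."""
    columns = list(zip(*(features for features, _ in points)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(v):
        return tuple((x - lo) / (hi - lo) for x, lo, hi in zip(v, lows, highs))

    return scale


# On raw features, the income column dominates the distance.
raw_answer = nearest_label(train, query)

# After scaling, both features contribute on a comparable footing.
scale = min_max_scaler(train)
scaled_train = [(scale(features), label) for features, label in train]
scaled_answer = nearest_label(scaled_train, scale(query))

print("raw:", raw_answer, "scaled:", scaled_answer)
```

The same 27-year-old is labeled "old" on the raw data and "young" on the scaled data. Which answer is appropriate depends on the problem, and that is exactly the judgment an API manual cannot make for you.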

My point is: No, knowing how to invoke ML APIs alone does not

  1. help you relate your problem to a suitable API
  2. help you tune critical input parameters, which can make a dramatic difference to the results
  3. help you get results that truly relate to your problem, even if you happen to choose a suitable API
  4. make you understand how Machine Learning works

In my opinion, ML APIs may be a convenience for trained data scientists and computer scientists, but they are only a tiny part of democratizing ML.

But is there a way to bring the power of ML to the masses? Yes. Education is my answer.

Organizations such as Khan Academy, Coursera, and many other online education platforms provide excellent courses on Machine Learning and the related mathematics. Many of these courses are free or charge a very small tuition (usually no more than a few hundred US dollars). This is true democracy!

So, spend time on those in-depth courses. Watch the lecture videos, learn the algorithms thoroughly, do the assignments, implement the algorithms from scratch, test them with various datasets and parameters, make sense of the results, and learn the art of handling errors and overfitting.
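Overfitting is a good example of something you only really grasp by trying it yourself. The sketch below is a toy experiment under stated assumptions: the data is made up (a trend of y = 2x with alternating +1/-1 noise on the training points, and held-out points taken from the trend itself), and `fit_memorizer` and `fit_line_model` are hypothetical helpers. It compares a model that memorizes the training data with a plain least-squares line.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> y) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_memorizer(xs, ys):
    """1-nearest-neighbor regressor: repeats the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def fit_line_model(xs, ys):
    """Ordinary least-squares straight line, returned as a function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Training data: the trend y = 2x plus alternating +1 / -1 noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]

# Held-out data: points between the training inputs, on the trend itself.
test_x = [i + 0.4 for i in range(10)]
test_y = [2 * x for x in test_x]

memorizer = fit_memorizer(train_x, train_y)
line = fit_line_model(train_x, train_y)

results = {
    "memorizer": (mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)),
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

The memorizer has a perfect score on the data it was trained on and a much worse one on the held-out points, while the line shows the opposite pattern. Judging a model by its training error alone would pick the wrong one.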

If something is highly abstract and complex (many ML algorithms are!), then work hard to understand the technical details and do not try to take a shortcut. Educate yourself in depth, rather than superficially calling some ML APIs like a fortune teller. Becoming a learning machine is an excellent way of mastering machine learning.

Last but not least, I have something to say about today's API economy.

Vendors nowadays sell algorithms as APIs; users call those APIs in their applications and get the desired results. So for application developers, there is less algorithmic work but more integration work. But those HTTPS calls usually have a long round-trip time, which has an impact on the performance of your application.

Many algorithms can be implemented elegantly, and they run most efficiently in your own application. In addition, it is easier to tune the algorithm any way you like in your own application. There is a motto, "do not reinvent the wheel", which encourages the use of APIs or libraries. But ML algorithms for different problems are often not the same wheel. There are so many things in the math that can be customized or should be customized.

Usually those ML API calls are not free. Each invocation costs a tiny fortune. If your app gets popular and has a lot of users, those tiny fortunes add up to a big fortune.
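A back-of-the-envelope calculation shows how quickly this adds up. The price and usage figures below are invented purely for illustration and do not reflect any real vendor's pricing; `monthly_api_cost` is a hypothetical helper.

```python
from decimal import Decimal


def monthly_api_cost(users, calls_per_user_per_day, price_per_call, days=30):
    """Total cost of paid API calls over one month, in the price's currency."""
    total_calls = users * calls_per_user_per_day * days
    return total_calls * Decimal(price_per_call)


# Assumed figures: 20 calls per user per day at 0.001 dollars per call.
small = monthly_api_cost(users=100, calls_per_user_per_day=20, price_per_call="0.001")
large = monthly_api_cost(users=100_000, calls_per_user_per_day=20, price_per_call="0.001")

print(f"100 users:     ${small:.2f} per month")
print(f"100,000 users: ${large:.2f} per month")
```

With these assumed numbers, a hobby app pays about 60 dollars a month, while the same app with a thousand times more users pays about 60,000 dollars a month for exactly the same calls.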

In my own applications, I use remote APIs as sparingly as possible, and only when I have to. Good local libraries are okay. They load fast, and some open source libraries are free to use. With local libraries handling common tasks for me (such as vector/matrix arithmetic, string processing, etc.), I build and train ML models in my own application. So there is no magic, and I can tune the algorithms to very specific problems.

When do I call third-party APIs? For example, when the application needs weather forecasts or map information. I do not have the historical weather data, nor do I have the map database. For common image recognition tasks, APIs are also good to call, because hardly anyone has millions or billions of pictures containing all kinds of objects to train her own recognition model. If resourceful vendors have those models trained for you, why not call them? Decision trees for your company's internal processes? You'd be better off building and training the models in your own application instead of calling third-party APIs.
