How I Discovered the Cause of Slowness in My Express App

Please note that this is a back-ported post from 2016.

Recently I discovered some slowness in my Express app.

A bit of background here: on the platform we are building at Madadata, we use an external service to provide user authentication and registration. But in order to test against its API without incurring the pain of requests traveling from California (where our CI servers are) to Shanghai (where our service provider's servers are), I wrote a simple fake version of their API service using Express and Mongoose.

We didn't realize the latency of my service until we recently started load testing, which showed that more than half of the requests didn't return within 1 second, thus failing the load test. For a simple Express app using Mongoose, there is hardly any chance of getting things wrong badly enough to produce anywhere near 1 second of latency.

[screenshot: v040]

The screenshot above, from running the mocha tests locally, revealed that there was indeed a problem with the API service!

What went wrong?

From the screenshot I could tell that not all the APIs were slow: the one where users log out and the one showing the current profile were reasonably fast. Also, judging from the dev logs I printed using morgan, the response times collected by Express for the slow APIs showed a consistent level of slowness (i.e. for the red-flagged ones, you are seeing roughly the sum of the latencies of the two requests above them, respectively).

This rules out the possibility that the slowness comes from the connection; it has to come from within Express. So my next step was to look at my Express app. (N.B. this is actually something worth ruling out first, and I personally suggest trying one or two tools other than mocha, e.g. curl or even nc, before moving on, because they almost always prove more reliable than the test code you wrote.)

Inside Express

Express is a great framework for building web servers in Node, and it has come a long way in terms of speed and reliability. I figured the slowness was more likely due to the plugins and middleware I used with Express.

In order to use MongoDB as the session store, I used connect-mongo to back express-session. I also used the same MongoDB instance as my primary credential and profile store (because why not? it is a service for CI testing after all), with Mongoose as the ODM.
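As a sketch of that wiring (following the 2016-era APIs of express-session and connect-mongo; the secret and connection URL are placeholders, so check the option names against the versions you actually use):

```javascript
// Hypothetical wiring of express-session backed by connect-mongo,
// matching the setup described above. All values are placeholders.
const express = require('express');
const session = require('express-session');
const MongoStore = require('connect-mongo')(session); // v1/v2-style API

const app = express();
app.use(session({
  secret: 'ci-only-secret',   // placeholder, not a real secret
  resave: false,
  saveUninitialized: false,
  store: new MongoStore({ url: 'mongodb://localhost/fake-auth' })
}));
```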

At first I suspected the built-in Promise library that Mongoose ships by default. But after replacing it with the ES6 built-in one, the problem remained.

Then I figured it was worth checking the schema serialization and validation part. There is only one model, and it is fairly simple and straightforward:

const mongoose = require('mongoose')
const Schema = mongoose.Schema
const isEmail = require('validator/lib/isEmail')
const isNumeric = require('validator/lib/isNumeric')
const passportLocalMongoose = require('passport-local-mongoose')

mongoose.Promise = Promise

const User = new Schema({
  email: {
    type: String,
    required: true,
    validate: {
      validator: isEmail,
      message: '{VALUE} is not a valid email address'
    }
  },
  phone: {
    type: String,
    required: true,
    validate: { validator: isNumeric }
  },
  emailVerified: { type: Boolean, default: false },
  mobilePhoneVerified: { type: Boolean, default: false },
  turbineUserId: { type: String }
}, { timestamps: true })

User.virtual('objectId').get(function () { return this._id })

const fields = {
  objectId: 1,
  username: 1,
  email: 1,
  phone: 1,
  turbineUserId: 1
}

User.plugin(passportLocalMongoose, {
  usernameField: 'username',
  usernameUnique: true,
  usernameQueryFields: ['objectId', 'email'],
  selectFields: fields
})

module.exports = mongoose.model('User', User)

Mongoose Hooks

Mongoose has a nice feature where you can use pre and post hooks to observe and interact with the document validation and saving process.

Using console.time and console.timeEnd, we can measure the time spent in each of these stages.

User.pre('init', function (next) { console.time('init'); next() })
User.pre('validate', function (next) { console.time('validate'); next() })
User.pre('save', function (next) { console.time('save'); next() })
User.pre('remove', function (next) { console.time('remove'); next() })

User.post('init', function () { console.timeEnd('init') })
User.post('validate', function () { console.timeEnd('validate') })
User.post('save', function () { console.timeEnd('save') })
User.post('remove', function () { console.timeEnd('remove') })

and then we get this more detailed information in the mocha run:

[screenshot: pre-post]

Apparently document validation and saving don't take up a large chunk of the latency at all. This also rules out the possibilities a) that the slowness comes from a connection problem between our Express app and the MongoDB server, or b) that the MongoDB server itself is running slow.

Passport + Mongoose

Turning my focus away from Mongoose itself, I started looking at the passport plugin I used: passport-local-mongoose.

The name is a bit long, but it basically tells you what it does: it adapts Mongoose as a local strategy for passport, which handles session management and the registration and login boilerplate.

The library is fairly small and simple, so I started directly editing its index.js inside my node_modules/ folder. Since #register(user, password, cb) calls #setPassword(password, cb) (specifically at this line), I focused on the latter. After adding a few more console.time and console.timeEnd calls, I confirmed that the latency mostly comes from this function call:

pbkdf2(password, salt, function (pbkdf2Err, hashRaw) { /* omitted */ })

PBKDF2

The name itself suggests a call to a cryptography library, and a second look at the README shows that the library uses 25,000 iterations.

Like bcrypt, pbkdf2 is a slow hashing algorithm, meaning it is intentionally slow, and that slowness is tunable via the number of iterations in order to keep up with ever-increasing computing power. This concept is called key stretching.

As the Wikipedia article notes, the initially proposed iteration count was 1,000 when the standard first came out, and some recent recommendations go as high as 100,000. So the default of 25,000 is in fact reasonable.

After reducing the iterations to 1,000, my mocha test output looked like this:

[screenshot: iter-1000]

and the latency is finally acceptable while staying secure enough - for a test application, after all! N.B. I made this change for my testing app; it does not mean your production app should decrease the iterations. Note also that setting the count too high can render the app vulnerable to DoS attacks.
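For reference, the same change can be made without patching node_modules, assuming the `iterations` option that passport-local-mongoose documents (verify against the version you use):

```javascript
// Sketch: lowering the PBKDF2 iteration count via plugin options
// instead of editing node_modules. The `iterations` option is assumed
// from the plugin's README; check it against your installed version.
User.plugin(passportLocalMongoose, {
  usernameField: 'username',
  iterations: 1000 // test-only; keep the default for production
})
```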

Final thoughts

I thought it would be worthwhile to share some of this debugging experience, and I'm glad the cause wasn't an actual bug (right, a feature in disguise).

Another point worth mentioning: for developers who are not experts in computer security or cryptography, it is usually a good idea not to hand-roll code related to session/key/token management. Starting with good open-source libraries like passport is the better idea.

And as always, you never know what kind of rabbit hole you'll run into while debugging a web server - that is really the fun part of it!

How to Ask Good Questions

Please note that this is a back-ported post from 2016.

This post is a translation of Julia Evans’ How to ask good questions. I personally like it a lot and think it’ll be a good idea to share with translation. She has acknowledged the translation but not endorsed or verified it.


Asking good questions is a super important skill in software development. I've gotten a lot better at it over the past few years (to the point that my colleagues often comment on it). Here are a few guidelines that have worked really well for me!

First of all, I actually strongly agree that it's totally fine to ask questions that are "dumb" or not that great. I ask people fairly dumb questions all the time, questions I could have answered with Google or by searching the codebase. I mostly try to avoid it, but sometimes I still ask silly questions, and I don't think it's the end of the world.

So the strategies below aren't "you must do these things before asking a question, or else you're a bad person and should feel bad". They're just "these things have helped me ask better questions and get the answers I want".

What makes a good question?

Our goal is to ask easy-to-answer questions about technical concepts. I'm often with a person who has a lot of knowledge I'd like to learn, but they don't always know the best way to explain it to me.

But if I can ask a series of good questions, I can help that person explain what they know effectively and steer them toward telling me the things I'm interested in. So let's talk about how to do that!

State what you know

This is one of my favorite question-asking techniques! The basic form of this kind of question is:

  1. State your current understanding of the topic
  2. Then ask "Is my understanding right?"

For example, I was recently discussing computer networking with someone (a really excellent question-asker)! He said: "So my understanding is that there's a chain of recursive DNS servers...". But that wasn't right! In fact, there is no "chain" of recursive DNS servers (when you talk to a recursive DNS server, only one recursive server is involved). Because he stated his current understanding up front, it was easy for us to clear up how it actually works.

I used to be interested in rkt, and I didn't understand why rkt took up so much more disk space than Docker when running containers.

But "why does rkt use more disk space than Docker" didn't feel like quite the right question - I more or less understood how its code worked, but I didn't understand why they wrote it that way. So I wrote this question to the rkt-dev mailing list: "Why does rkt store container images differently from Docker?"

I:

  • wrote down my understanding of how both rkt and Docker store containers on disk
  • came up with some reasons I thought they might have designed it that way
  • and then just asked "Is my understanding right?"

The answers I got were super, super helpful, exactly what I wanted. It took me quite a while to phrase the question in a way I was happy with, but I'm glad I spent the time, because it helped me understand the whole story much better.

Stating your understanding isn't easy at all (it takes time to think about what you know and sort out your thoughts!), but it pays off, and it lets the person answering help you much more effectively.

The answer to your question should be a fact

A lot of my questions at first were a bit vague, like "How do joins work in SQL?" That question isn't great, because there are lots of parts to how joins work! How could the other person possibly know which part I wanted to learn about?

I like to ask questions whose answers are simple, direct facts. For our SQL-join example, some fact-based questions might be:

  • What's the time complexity of joining two tables of size N and M? Is it O(NM)? Or O(NlogN) + O(MlogM)?
  • Does MySQL always sort the join columns before performing a join?
  • I know Hadoop sometimes does hash joins - do other database engines use them too?
  • When I join an indexed column against an unindexed column, do I need to sort the unindexed column first?

When I ask super-specific questions like these, the other person doesn't always know the answer (which is totally fine!!), but at least they understand what kind of question I'm interested in - e.g. clearly I'm not interested in how to use joins; what I want to understand is the implementation and the algorithms.

Be willing to say what you don't understand

When people explain something to me, they often say things I don't understand. For example, someone explaining databases might say: "Okay, we use optimistic locking in MySQL, and...". I have no idea what "optimistic locking" is. So that's a perfect time to ask a question! :-)

Learning to interrupt and say "hey, what does that mean?" is a super important skill. I think of it as one of the qualities of a confident engineer, and a great thing to do. I often see senior engineers ask for concepts to be explained clearly - I think this gets easier as you become more confident in your skills.

The more questions I ask, the more natural it feels to ask people to explain things. In fact, when I'm explaining something and the other person doesn't ask questions, I start to worry that they aren't really listening!

It also creates more opportunities for the person answering to admit when they've reached the limits of their knowledge. I often run into cases where the person I ask doesn't know the answer. The people I ask are usually pretty good at saying "no, I don't know that!"

Identify the terms you don't understand

When I started my current job, I was on the data team. Going through my new responsibilities, they were full of words like these: Hadoop, Scalding, Hive, Impala, HDFS, Zoolander, and so on. I may have heard of Hadoop before, but I basically didn't know what any of those words meant. Some of them were internal projects, some were open-source projects. So I started asking people to help me understand what each of these terms meant and how they related to each other. Questions I asked might have included:

  • Is HDFS a database? (no, it's a distributed file system)
  • Does Scalding use Hadoop? (yes)
  • Does Hive use Scalding? (no)

There were so many of them that I actually wrote a "dictionary" of all these terms. Understanding them helped me find my bearings and ask better questions later on.

Do some research

While typing out those SQL questions above, I searched Google for "how are SQL joins implemented". After clicking a few links I saw "oh I see, sometimes there's sorting, sometimes there are hash joins, I've heard of both", and then wrote down some of my more specific questions. Googling first helped me come up with slightly better questions.

That said, I think some people cling too hard to "never ask a question before Googling it yourself" - sometimes I'm having lunch with someone and I'm curious about their work, so I ask some fairly basic questions. That's totally fine!

But doing some research of your own really is useful, and coming up with a great list of questions after doing your homework is genuinely fun.

Decide who to ask

Here I'm mostly talking about asking questions of your coworkers, since that's what I do most of the time.

Before asking a coworker a question, I consider things like:

  • Is this a good time for them? (If they're in the middle of something urgent, probably not.)
  • Is the time my question saves me worth the time it costs to ask? (If a five-minute question saves me two hours, that's awesome :D)
  • How much time will they need to answer my question? (If I have a question that needs half an hour, I might schedule a block of time with them later; if it's just a small question, I'll probably ask right away.)
  • Is this person too senior for this question? I think always going to the most expert and senior people on a topic is an easy trap to fall into. Often it's better to go to someone slightly less senior - they can usually answer most of your questions, the load of answering gets spread around, and they get a chance to show off what they know (which is great).

I don't always get these right, but thinking them over really has helped me.

Also, I usually spend more time asking questions of the people closest to me - they talk with me the most every day, it's convenient to ask them, and because they already have the context of the work I'm doing, it's easy for them to give constructive answers.

ESR (Eric Steven Raymond) wrote "How to ask questions the smart way", a popular but pretty mean document (it opens with awful lines like "we call people like that losers"). It's about asking questions of strangers on the internet. Asking strangers on the internet is a super useful skill that can get you really useful information, but it's also "hard mode" for asking questions. The people you're asking know little about your situation, so you need to be extra patient about stating what you want to learn. I don't like ESR's document, but it does say some valuable things. Its section "How to answer questions in a helpful way" is actually quite good.

Ask questions that reveal hidden knowledge

Asking questions to surface hidden assumptions or knowledge is an advanced questioning skill. Questions like this serve two purposes: first, to actually get the answer (there may be information one person knows that others don't), and second, to point out that there is hidden information and that sharing it is useful.

The "art of asking questions" section of Etsy's Debriefing Facilitation Guide is a wonderful introduction to this in the context of discussing incidents. Here are some questions from it:

What signs would you look for when you suspect this kind of failure is happening? How do you determine what's "normal"? How did you know the database was down? How did you know which team you needed to escalate to?

These (seemingly basic, yet not actually obvious) questions are especially effective when asked by someone with a bit of authority. I especially love it when a director or senior engineer asks basic-but-important questions like "how did you know the database was down?", because it makes it safe for less senior people to ask the same questions afterward.

Answer questions

My favorite part of André Arko's "How to contribute to open source" is:

Now that you've read all the issues and pull requests, start looking for questions you can answer. It won't take long before you notice someone asking a question that was answered before, or that's answered in the documentation you just read. Answer the questions you can answer.

If you're just getting into a small project, answering other people's questions is a fantastic way to consolidate your knowledge. Every time I answer a question about some topic for the first time, I feel "oh no, what if my answer is wrong?" But usually I get it right, and then I feel like I understand the topic a little better.

Questions are a big contribution, too

Good questions can also be a big contribution to a community! A while ago I asked a bunch of questions about CDNs on Twitter and then wrote up a summary in a blog post. Lots of people told me they really liked that post, and I think the questions I asked helped not just me but many other people.

Lots of people really enjoy answering questions! I think asking good questions is also a great thing you can do for a community, not just "asking good questions so the person answering goes from really annoyed to only slightly annoyed".

Programming Is Hard (But It Gets Easier Later)

Please note that this is a back-ported post from 2016.

Today our front-end engineer told me that a small bug/feature of postcss took him a long time to figure out.

Postcss is a front-end framework we use for processing CSS, and it can be used together with webpack, which we also use. The latter bundles all kinds of assets - JavaScript, CSS, images, and so on - while the former's main job is to pre-process the CSS code before that happens.

For example, a webpack configuration file might contain a section like this:

// webpack.config.js
module.exports = {
  // ... omitted ...
  postcss: [require("autoprefixer"), require("precss")]
  // ... omitted ...
};

The postcss part is an array, and every element in it is a postcss plugin. Postcss itself is extremely modular: most of the work is delegated to all kinds of plugins, and you combine different plugins to get different functionality. "Do one thing and do it well" - how very UNIX.

But the problem our front-end engineer ran into was that he didn't know the order of these entries is meaningful; he assumed he could just list whatever feature modules he needed. I told him it works just like a UNIX pipe - or, if you've used gulp, you'll recognize how this kind of streaming/pipeline system works: order matters a great deal. He concluded it was because he had only ever used webpack. But thinking about it, I realized I've run into this kind of situation myself again and again.
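The order-sensitivity can be demonstrated with a toy pipeline in plain JavaScript (the two "plugins" below are made up for illustration; real postcss plugins operate on a parsed AST, not raw strings):

```javascript
// Toy pipeline: each "plugin" is a string transform applied in array
// order, mirroring how a plugin array is run. Both plugins are fake.
const runPipeline = (plugins) => (css) =>
  plugins.reduce((out, plugin) => plugin(out), css);

const expandShorthand = (css) => css.replace('m:', 'margin:');
const addPrefix = (css) => css.replace('margin:', '-x-margin:');

// Expand first, then prefix: the prefix plugin sees "margin:" and rewrites it.
console.log(runPipeline([expandShorthand, addPrefix])('m: 0'));  // "-x-margin: 0"
// Prefix first: it never sees "margin:", so nothing gets prefixed.
console.log(runPipeline([addPrefix, expandShorthand])('m: 0'));  // "margin: 0"
```

Swap the array and the output changes, which is exactly the pipe-like behavior described above.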

"The Law of Leaky Abstractions"

I remember reading Joel Spolsky's The Law of Leaky Abstractions back in college. Roughly, it says that all abstraction layers leak: there will always be some situation where you have to go around the abstraction and understand the details behind it. The article uses TCP as an example: many programs depend on the TCP protocol, which wraps the error-prone IP layer so that the data you send always arrives in order, complete, and without duplication. But situations always come up - say, a rat chews through a network cable - where you still observe the abstraction breaking (TCP stops working properly), and you have to deal with cases you never expected.

Abstraction is a super powerful tool. In fact there's a famous saying: "there is no problem in computer science that cannot be solved by adding another layer of abstraction". The webpack configuration file above, for instance, is itself an encapsulation and abstraction over streaming code. And abstraction isn't just a computer-science concept: it is one of humanity's most basic thinking tools, helping the brain reduce the amount of information it has to handle at once, filtering out the unimportant, repetitive details and focusing on what matters most right now. Most of the time abstraction is a sharp tool for thinking, and a skill every programmer has to master.

Everything is wonderful - until the first time you hit the wall.

"Programming is easy"

A good friend of mine has recently been teaching programming to complete beginners. The course is compact but substantial: over three weeks, he teaches every student to first build a static web page with HTML, then learn basic CSS and polish the page with Twitter Bootstrap, and finally learn JavaScript to add some dynamic elements to the page.

Like many other courses aimed at beginners, he wants to give people, in a short time, a sense of what programming is and what programming-style thinking looks like, and along the way teach them how to communicate and collaborate with programmer colleagues and friends. I think that goal is very practical and sensible - in contrast to the many over-hyped "programmer bootcamps" that promise to quickly make you proficient in data mining, AR/VR, human-computer interaction, and other "cool" skills; at best, those give you a whirlwind tour (though even that can be a good thing in itself).

It's true that many of today's programming languages (say Python, JavaScript, Swift) are far easier to learn than their predecessors (say C, C++): no manual memory management, no need to learn the file system, maybe even drag-and-drop operations to complete a basic "program". I personally love these user-friendly languages, and both code.org and Codecademy are valuable precisely because they aim to spark everyone's interest in programming - and programming is almost certainly an essential skill of the future (at most it will change form; computational thinking is the core). Without exception, they all tell their audience: programming is easy, start learning!

But that's not the whole story. Programming is actually hard: it starts out fairly easy, then becomes rather hard; but once you make it through, it becomes easier again. And having gone through that process, you genuinely feel you've gained a lot. I believe a large part of the reason behind all this is that ubiquitous, always-leaky abstraction layer.

For example, while preparing his course, my friend discussed several questions with me:

  • To make sure students reach the most interesting material in the shortest time, the language should be the relatively approachable JavaScript, because web programming is the most WYSIWYG way to program. But everyone may use a different browser, so it's best to use a CSS framework like Twitter Bootstrap; at least then the overall look and feel of the page won't vary across browsers - but, but, I bet you someone will show up running IE 6, oops!
  • The programming environment has to be unified too, or you'll spend all your time fixing everyone's configuration problems (think of the several different newline conventions, think of file encodings, think of at least four different npm versions). I suggested Vagrant, so everyone would use the same virtual machine image, but that plan was dropped because of network speeds in China. (Fortunately, he eventually found codepen, the ultimate controlled environment.)
  • To keep everyone free of network connectivity problems, you need to find a reliable proxy or VPN, or some CDNs may be affected and, for example, CSS may fail to load - and teaching everyone to use network tools is itself another challenge.

You see: even with an extremely simplified setup (WYSIWYG, no compilation needed) and an extremely forgiving programming platform (the browser), leaky abstractions are still everywhere. If someone becomes confident that they've mastered front-end programming at this point, then the next time they switch machines, switch browsers, or even just forget the / in </script>, they will feel frustrated and discouraged.

When does it get easier?

In my past programming experience, I've run into such leaky abstractions countless times, and it usually went like this:

  1. You discover a great framework/abstraction/language feature; playing with it turns out to be fun, it makes you more productive, and it lets you do things you couldn't do (or couldn't do easily) before
  2. Then you use it for something more important, in a more important project, and only then discover how many pitfalls it hides. So you have to start studying the principles behind it: read the documentation, search StackOverflow, read the source code, debug it under different configurations, until you understand how it works - what problem it solves, how it approaches that problem, and what it concretely does
  3. Looking back at that point, you see the real meaning of that layer of abstraction, and that it's simpler and more efficient than the approach it replaced; its only drawback is the pitfalls you've already stepped in, and now you know how to avoid them, or even how to improve the abstraction

From this angle, learning to program can perhaps be described analogously:

  1. You discover programming is fun; you can write a simple program, show it off to friends, and taste the joy of building things yourself
  2. Then you try to learn in depth, and only then discover the big pit you've stepped into. You start learning about file systems, memory management, good programming habits, data management and databases, data structures, discrete math, compilers - and you discover these things are all connected, and that the error message "object not found" means the database isn't running, not a sad story (in Chinese, "can't find an object" is also slang for "can't find a partner")
  3. Looking back at that point, you see what programming and computational thinking really are, and how different they are from what you imagined. You see that DeepMind winning at Go doesn't imply strong AI is around the corner, and you can no longer be casually fooled by the news media; most importantly, you've gained a complete framework of thinking for analyzing and solving problems

It's easy to notice that step 2 is the longest in both lists. It really is the longest stage, and it can even mean looping through 1-2-3 many times over.

But perhaps the conclusion runs the same way, too: everything is wonderful - until the first time you hit the wall; then you feel your way along the wall, and gradually, you find the door.

Review of University of Toronto’s Coursera Specialization on Self-Driving Cars

Last weekend I’ve finally completed University of Toronto’s specialization on self-driving cars. It has been a rewarding journey so let me take a moment to review and reflect.

Generally I would recommend it to people with engineering background and anyone who’s interested in this field. The specialization has four courses, which cover introductions and a kinodynamic model, sensing and localization (LIDAR and IMU), visual perception and 3D modeling, and lastly motion planning and actuation. Most of the topics are useful and necessary in order to get into this field, and combined they provide a good overall feel of what it takes to build a self-driving car.

The courses

For the 1st course, the bicycle model of a car appears amazingly simple yet powerful when it comes to modeling the trajectory of a car. I liked how it shows one can approach an engineering problem by layering and abstraction, e.g. by simplifying four wheels into two without losing generality. Also, as an introductory course, it contains many real-world engineers' and entrepreneurs' opinions on the industry, the project, the problem domain, and future expectations. I specifically like Paul Newman's take on the industry and why it is necessary to build a self-driving car that can adapt to all varieties of infrastructure, instead of building infrastructure to some specific spec.
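To give a flavor of that simplification, here is a minimal kinematic bicycle model sketch (my own toy version, not the course's code; all parameter values are illustrative):

```javascript
// Minimal kinematic bicycle model (rear-axle reference point).
// State: position (x, y) and heading theta; inputs: speed v and
// steering angle delta; wheelbase is the front-to-rear axle distance.
function stepBicycle(state, v, delta, wheelbase, dt) {
  return {
    x: state.x + v * Math.cos(state.theta) * dt,                  // forward motion
    y: state.y + v * Math.sin(state.theta) * dt,
    theta: state.theta + (v / wheelbase) * Math.tan(delta) * dt   // yaw rate
  };
}

// Zero steering keeps the heading constant; nonzero steering turns the car.
let s = { x: 0, y: 0, theta: 0 };
s = stepBicycle(s, 10, 0, 2.5, 0.1); // 10 m/s straight ahead for 0.1 s
console.log(s); // { x: 1, y: 0, theta: 0 }
```

Integrating this one step at a time is enough to trace out plausible car trajectories, which is what makes the two-wheel simplification so powerful.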

For the 2nd course, coming from a CS background I found it challenging yet very rewarding to understand and implement Kalman filtering and its variations. I even went and compared the slides on the Extended Kalman Filter against the real open-sourced code in Baidu's Apollo project, and you might be surprised that the formulas translate almost line-by-line into the C++ code.
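As an illustration of how directly such formulas map to code, here is a toy one-dimensional Kalman measurement update (a scalar simplification of my own, not code from the course or from Apollo):

```javascript
// Toy 1-D Kalman measurement update: scalar state estimate x with
// variance p, and a measurement z with noise variance r.
function kalmanUpdate(x, p, z, r) {
  const k = p / (p + r);   // Kalman gain: how much to trust the measurement
  return {
    x: x + k * (z - x),    // estimate pulled toward the measurement
    p: (1 - k) * p         // uncertainty shrinks after incorporating z
  };
}

const updated = kalmanUpdate(0, 4, 1, 1); // prior x=0, p=4; measurement z=1, r=1
console.log(updated.x); // 0.8
```

The real thing replaces these scalars with state vectors and covariance matrices (and the EKF linearizes a nonlinear model first), but the shape of the update is the same.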

The 3rd course was the easiest for me, because most of the deep learning material was already covered in deeplearning.ai's specialization. What was new to me was mainly the OpenCV part, where legendary algorithms like SIFT still play a shining role. I also found this course the most comprehensive and thorough: it starts with the classical pinhole camera model and works all the way up to a recent solution to the image semantic segmentation problem using VGG-net.

The 4th course takes a top-down approach: the high-level problem of route planning is solved using Dijkstra's algorithm, then a mid-level behavior planning problem is tackled using finite state machines, and finally a local-level maneuver planning problem is solved using parametric curves. The highlight is the final project, where all the pieces are put together to drive the vehicle through obstacle avoidance, lead-vehicle tracking, and stop-sign handling. This almost gives you a feel for how a real self-driving car performs in action.

Room for improvement

I feel the programming assignments could have been more polished: there were occasional bugs in the provided utility functions, a Python version mismatch broke the JupyterHub, and the feedback on wrong submissions was very minimal - a better-prepared assignment would have included more intermediate submission steps so that learners could sanity-check their progress.

Also, the teaching staff were not very responsive when it came to answering questions in the forum. The forums felt like they were only used by students to help each other (although that was very useful as well).

Outside the scope of this specialization, I also found that OpenCV has relatively poor documentation. For many functions, the Python version has the wrong return type documented and/or little explanation of the algorithm's background.

Afterthoughts

This area has been almost white-hot in recent years. I can count a few high-profile startups as well as big names (Pony.ai, Tesla, Zoox, drive.ai, Waymo, Momenta, TuSimple, Baidu's Apollo, Uber's ATG, Cruise, comma.ai, Mobileye, etc., in no particular order), each with a different approach and focus area. They are taking in huge amounts of investment money and resources, racing to build ever-larger fleets of autonomous cars.

I think this specialization gives a glimpse of what the autonomous driving future will be like. Indeed, "any sufficiently advanced technology is indistinguishable from magic". But I think that magical future is not yet near. As Kai-Fu Lee and Rodney Brooks have argued, it's not anywhere near 2020. My (unqualified) opinion is that it's not going to arrive within the next five years, but in the mid-/long-term it will be possible in our lifetime (and hopefully much sooner).

My short-term pessimism comes from an understanding of the problem and design domain we need to tackle, as learned from the course material. The reasons are (like anything else) threefold: technology, talent, and capital.

Technology-wise, I believe the current boom in the industry is largely driven by the software and hardware upgrades that came with the spread of deep learning (near-realtime object detection algorithms like YOLO, and cheaper GPUs/TPUs). That certainly gets us nearer the goal of L4/L5 autonomy, but it's not enough to actually get there. Tesla hasn't fully convinced everyone that Lidar is optional, nor has the accumulated number of vehicles across all the fleets driven the price down significantly enough (I think). In behavior planning, the use of reinforcement learning is still at an early stage, and it has to deal with the explainability of the agent's policy (to pass regulatory and media scrutiny). As for access to and sharing of data, I haven't yet seen the "ImageNet moment" (e.g. what BERT is likely perceived to be in the NLP community) for high-precision maps and many other areas.

Predictions like these can easily be wrong, but it stands that deep learning by itself hasn't moved everything in the industry just yet, and there are plenty of such technicalities that still need to be solved, which is hard to do within the next 5 years.

Talent, both in engineering and management, is lacking. I think some of the startup founders can raise tons of money because of their track records in the Internet and software industries, but tackling this problem takes more than that. Managing hardware supply, OEMs, risk assessment, quality control, etc. is (probably) harder than pushing an accuracy rate up 1% or fixing a software continuous-delivery pipeline. Good engineers are also hard to find. An earlier definition of a full-stack engineer (in the Internet industry) might span CSS all the way down to databases and dev-ops, but building a self-driving car takes this to a whole new level (you'd need to know how to calibrate cameras, analyze point-cloud data, understand gears and transmissions, and also how to program RL agents). Even if we don't need that type of full-stack-ness in one person, we still need engineering leaders who can cover the whole lot. Training such people takes time and a lot of failures; the progress will probably take many deaths of companies that train talent along the way. Hopefully it is slow but steady. I have little authority in this area, and I will happily be proven wrong.

Given that the timespan will likely be more than 5 years, it's also a challenge for venture capital. Large corporations like Google can be backed by their boards to invest in this type of moonshot project, but for financial VCs, their LPs might not be that patient. Some of the startups may have to pivot and refocus on something more achievable than L4/L5 autonomy, or else persuade their investors or acquirers. For high-profile startups, both paths get exponentially more difficult once you have already raised hundreds of millions of USD.

Having written all that, I still think the problem is hard but solvable. It might take another "PayPal mafia" story, where companies appear, grow, burst, and then disseminate seeds of talent, industry know-how, and so on, and then, just over the horizon, a new level of advance can be made. Society is actively excited about this area, and both the US and China (and they are not the only ones) are pouring in financial and policy resources. Maybe a new form of venture capitalism will emerge to fit this industry, fostering better cross-industry cooperation - I don't know. Coming from an engineering background, I think this might just be the Apollo project of our time, and that is exciting.

Hello world

Starting a new blog is both an exciting and a daunting experience for me, because the actual commitment involved, and the (less likely) future serendipity, are hard to materialize and weigh. So maybe setting expectations low is the better strategy.

It's the end of 2018, and many things have changed, especially over the last four years. I don't usually care for looking back, and this time is no different: in this piece I will only try to cover my expectations for the next year and for this blog.

In the past few years I've been reading a lot, but to be honest most of it was just random articles on the web that I forget within an hour. To change that, I think writing down my thoughts and ruminations on an article or subject can be useful. (This applies to books as well.) So: to write is to remember.

Making myself write longer blog articles can also, I hope, help build my ability to articulate, and then to be more terse again. I read an article by someone who had been blogging since he was sixteen, and he shared that writing more than two thousand words per article can help with SEO as well: another virtue of longer articles. So in this sense, to write is to articulate and to oil the vehicle of expressiveness - because expressiveness limits one's thoughts as well.

That article also mentioned the importance of picking an audience: to write is to express oneself and to spread ideas, and people are busy and won't care who you are or what you did - it has to be relevant to them. I hope this will help me build a sense of meaningfulness, as well as a sharper sense of what really matters to people in their daily and professional lives (within a technology context, of course).

Hopefully that lays out the background and the why for this new blog.