Hanlun AI

Welcome to the Hanlun AI Challenge

Hanlun AI is based in Hong Kong, and we're looking for data scientists to help us build amazing products in Hong Kong, China and the world. We also offer internships and graduate programs.

Still reading? OK! Here are a few programming problems that we found interesting and challenging. They are much tougher than anything you'd face during a job interview, so be ready for a challenge.

Feel free to use any language and take as long as you like. If you make progress on a problem, or you're not sure why your answer isn't accepted, submit what you've got to bizdev@hanlunai.com and we'll get back to you.

With the formalities out of the way, it's time to pick a challenge.

  1. Curve classification
  2. Noise classification
  3. Document annotation
  4. Plantar pressure & body balance
  5. Topological time series analysis

Curve classification

You are given sets of 2D points. Each set is obtained by sampling one of two closed 3D curves at equal intervals of the parameter and projecting the sampled points onto a random plane, i.e. the points on the curve are multiplied on the left by a random orthogonal matrix. The formulas of the two 3D curves are:

Curve 1
x(𝑡) = 9 cos 0.5π𝑡+0.3 sin 1.5π𝑡
y(𝑡) = 3 cos (1.5π𝑡+0.45π) (2+sin π𝑡)+0.5 cos 2.5π𝑡
z(𝑡) = 8.5 cos (0.5π𝑡+1.3π)+0.6 sin 2π𝑡
Curve 2
x(𝑡) = 2 cos 0.5π𝑡+0.3 sin 1.5π𝑡
y(𝑡) = 6 cos (0.5π𝑡+1.45π)+1.5 cos 2.5π𝑡
z(𝑡) = 7 cos (0.5π𝑡+1.3π)+0.6 sin 2π𝑡

where parameter 𝑡 ∈ [0,4].
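As a minimal sketch, the two curves could be evaluated in Python as follows. The function names curve1 and curve2 are illustrative, and the y-component of curve 1 is read here as the product 3 cos(1.5π𝑡+0.45π) · (2+sin π𝑡) plus the last term, which is an assumption about the intended grouping. The final lines check numerically that both curves close up at 𝑡 = 4.

```python
import math

PI = math.pi

def curve1(t):
    """Point (x, y, z) on curve 1 at parameter t."""
    x = 9 * math.cos(0.5 * PI * t) + 0.3 * math.sin(1.5 * PI * t)
    # Assumed grouping: 3*cos(...) multiplied by (2 + sin(pi*t)), then + 0.5*cos(...).
    y = (3 * math.cos(1.5 * PI * t + 0.45 * PI) * (2 + math.sin(PI * t))
         + 0.5 * math.cos(2.5 * PI * t))
    z = 8.5 * math.cos(0.5 * PI * t + 1.3 * PI) + 0.6 * math.sin(2 * PI * t)
    return (x, y, z)

def curve2(t):
    """Point (x, y, z) on curve 2 at parameter t."""
    x = 2 * math.cos(0.5 * PI * t) + 0.3 * math.sin(1.5 * PI * t)
    y = 6 * math.cos(0.5 * PI * t + 1.45 * PI) + 1.5 * math.cos(2.5 * PI * t)
    z = 7 * math.cos(0.5 * PI * t + 1.3 * PI) + 0.6 * math.sin(2 * PI * t)
    return (x, y, z)

def is_closed(curve, tol=1e-9):
    """True if the curve returns to its starting point at t = 4."""
    return all(abs(a - b) < tol for a, b in zip(curve(0.0), curve(4.0)))

print(is_closed(curve1), is_closed(curve2))  # True True
```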

You are given 500 sets of projected points from curve 1 (download) and 500 sets from curve 2 (download) to check your program. Each set contains 20 projected points taken at parameter steps of 0.2: if the first point is projected from the point on the curve at 𝑡 = 𝑡₀, then the second is projected from the point at 𝑡 = 𝑡₀ + 0.2, the third from the point at 𝑡 = 𝑡₀ + 0.4, and so on. Each set is given in plain text as a 2×20 matrix, with one row of the matrix per line and the elements of a row separated by commas.
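To generate extra practice sets beyond the provided files, the sampling and projection described above could be simulated roughly as below. This is only a sketch under stated assumptions: the challenge does not say how the random orthogonal matrix or 𝑡₀ were drawn, so Gram-Schmidt on Gaussian vectors and a uniform 𝑡₀ are guesses, and "projection" is taken to mean keeping the first two coordinates after the rotation. The names random_orthogonal_3x3, simulate_set and circle are hypothetical; circle is a stand-in curve, to be replaced by the challenge curves.

```python
import math
import random

def random_orthogonal_3x3(rng):
    """Random 3x3 orthogonal matrix built by Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < 3:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # retry in the unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows

def simulate_set(curve, t0, rng, n=20, step=0.2):
    """Sample n curve points from t0 in increments of step, rotate, keep (x, y)."""
    q = random_orthogonal_3x3(rng)
    projected = []
    for k in range(n):
        p = curve(t0 + k * step)
        # Left-multiply the 3D point by q; assumed projection: drop the third coordinate.
        x = sum(q[0][j] * p[j] for j in range(3))
        y = sum(q[1][j] * p[j] for j in range(3))
        projected.append((x, y))
    return projected

def circle(t):
    """Stand-in closed curve with period 4, for illustration only."""
    return (math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t), 0.0)

rng = random.Random(0)
points = simulate_set(circle, rng.uniform(0.0, 4.0), rng)
print(len(points))  # 20
```

Because 20 steps of 0.2 span a parameter range of 3.8, each set covers almost one full period of a curve defined on [0,4], whatever the starting value 𝑡₀.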

You are also given 500 sets of unlabelled projected points (download), each of which also contains 20 projected points, and you need to identify which of the two curves each set comes from.
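The data files could be read with something like the sketch below. It assumes that the sets are stored one after another, that each set occupies two consecutive non-empty lines (first coordinates, then second coordinates), and that the unlabelled file uses the same layout as the labelled ones; none of this is confirmed beyond the 2×20 comma-separated description. The function name parse_sets is hypothetical.

```python
def parse_sets(text):
    """Parse consecutive pairs of comma-separated lines into point sets.

    Returns a list of sets; each set is a list of (x, y) tuples.
    """
    rows = [
        [float(v) for v in line.split(",") if v.strip()]
        for line in text.splitlines()
        if line.strip()  # skip blank lines between sets, if any
    ]
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")
    sets = []
    for xs, ys in zip(rows[0::2], rows[1::2]):
        if len(xs) != len(ys):
            raise ValueError("the two rows of a set differ in length")
        sets.append(list(zip(xs, ys)))
    return sets

# Tiny made-up example with 3 points per set instead of 20.
sample = "1,2,3\n4,5,6\n\n7,8,9\n10,11,12\n"
print(parse_sets(sample)[0])  # [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
```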

Your answer should be a list of 500 integers. Each integer is 1 if the set is from curve 1, or 2 if the set is from curve 2.

A submission with a perfect score has been received. You are encouraged to try the other challenges.

Noise classification

You'll be given labelled noise data from different sources: construction sites, vehicles, humans, etc. Please construct a neural network to classify new noise data into these classes.

Interested parties please contact us for the data.

Document annotation

The dataset (download) has 220 items manually labeled with the following 10 categories:

You are asked to build or describe a model that annotates documents or extracts useful information from them.

Plantar pressure & body balance

The plantar pressure board measures the pressure exerted by different points of the sole when a person is standing on it, producing a sparse matrix of pressure readings.
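As an illustration of working with such a sparse matrix, the sketch below stores only the non-zero cells in a dictionary and computes the total pressure and the pressure-weighted mean position (the centre of pressure). The data layout and the choice of feature are assumptions made for this example, not part of the challenge description, and the name centre_of_pressure is hypothetical.

```python
def centre_of_pressure(readings):
    """Summarise a sparse pressure matrix.

    readings maps (row, col) -> pressure for the non-zero cells only.
    Returns (mean_row, mean_col, total_pressure).
    """
    total = sum(readings.values())
    if total == 0:
        raise ValueError("no pressure recorded")
    row = sum(r * p for (r, _), p in readings.items()) / total
    col = sum(c * p for (_, c), p in readings.items()) / total
    return row, col, total

# Made-up readings: three active cells on an otherwise empty board.
frame = {(0, 0): 1.0, (2, 0): 1.0, (1, 4): 2.0}
print(centre_of_pressure(frame))  # (1.0, 2.0, 4.0)
```

A summary like this, computed per frame, is one candidate input when relating the board's readings to a single balance score.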

The body balance sensor is attached to the waist and gives a single score of balance.

You are asked to build a model of the relationship between these two measurements. Interested parties please contact us.

Topological time series analysis

We built a footprint recorder for a client to record the time students spend studying online on each page and quiz.

If a student opens multiple tabs simultaneously, the recorded study times will overlap. You are required to employ topological time series analysis to investigate the students' footprints. Interested parties please contact us.
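The overlap issue can be made concrete with a small sketch. Assuming each record is a (start, end) pair of timestamps in seconds for one tab (the actual recorder format is not specified here), merging overlapping intervals gives the connected components of a student's active time, so that simultaneous tabs are not counted twice. This is only a possible preprocessing step, not the topological analysis itself, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals; returns a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def total_time(intervals):
    """Total covered time, counting overlapping periods only once."""
    return sum(end - start for start, end in merge_intervals(intervals))

# Two tabs open at once from t=5 to t=10, then a separate session.
sessions = [(0, 10), (5, 15), (20, 25)]
print(merge_intervals(sessions))  # [(0, 15), (20, 25)]
print(total_time(sessions))       # 20
```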