
For me, the exciting part is understanding how the "black box" can learn, and finding a representation of the data that makes it learn.

For instance, I've been working on user profiling, and it's been a challenge to find which features, and in which representation, allow the model to learn. It's fantastic when you make a small change to a feature (for instance, using the median instead of the mean) and your model suddenly gains +5% accuracy.

In the real world, the field is not as simple as making an API call. Data in the wild is really messy; getting value out of it is the real challenge.



"replace the mean with median"

This illustrates the black-box aspect of it. You changed something and the results were affected, but you don't know why. The median has built-in implicit filtering (it's not affected by extreme outliers the way the mean is), so it could simply be that you needed to filter your inputs. But you won't know, because... black box.
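The median's "implicit filtering" is easy to see with a toy example (the feature and values here are made up for illustration): a single extreme outlier drags the mean far from the bulk of the data, while the median barely moves.

```python
import statistics

# Hypothetical session-duration feature (seconds) with one stray outlier
durations = [4, 5, 5, 6, 7, 6, 5, 900]

print(statistics.mean(durations))    # 117.25 -- dragged up by the 900
print(statistics.median(durations))  # 5.5    -- unaffected by the outlier
```

So a model fed the per-user median instead of the mean is effectively getting outlier-filtered inputs, which is one concrete hypothesis for why such a swap can improve accuracy.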


and the results are affected but you don't know why

Well, that's just your assumption without knowing the exact problem, and I think you are missing my point.

You can approach data preparation by randomly changing things, and maybe you'll get some interesting results, but I promise you will fail many, many times. The other way is to understand what it means to swap the mean for the median (to stick with that random example), and I promise you will find better solutions.

The idea is not just "change and test" to see what happens. The interesting part is understanding how the model uses your representation and why one is "better" than another.



