Database vendors understand that SQL, while great for simple queries, is not a good enough interface for complex analytics. Here at Coders’Co. we would say: use Rax :-). Since R is so popular among data scientists, though, many database vendors are attempting to integrate R into their technology in one way or another. This post is the beginning of a series of posts about R and databases.
Recently Microsoft announced that it will support in-database R in SQL Server 2016:
It’s an interesting development that will save data scientists from copying their data back and forth between their database and their R installation. From what I understand so far, though, it only cuts down on network traffic between the SQL Server machine and the data scientist’s laptop where R would otherwise be running. The R process will run inside the SQL Server process, but in a sort of sandbox, so data will still have to be copied out of the tables and into the R process’s memory. The alleviation of the data-size limitations will not be that great either: unless R’s internal data structures and algorithms are reimplemented, the physical memory of the machine running the R process will still be the limit. And since that memory has to be shared with the SQL Server process, I wonder how that will work out.
But still I’m curious, especially about how they will solve the syntax problem. Right now the whole R script is glued into a single string, which is completely unreadable. They must be working on something better than that.
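To illustrate the string-gluing problem: in SQL Server 2016 the in-database R feature is exposed through the `sp_execute_external_script` stored procedure, where the entire R program is passed as one quoted string parameter. A rough sketch (the table, column, and result names here are made up for illustration):

```sql
-- Hypothetical example: compute a mean inside the database with embedded R.
-- The whole R script lives in the @script string -- no syntax highlighting,
-- no editor support, and quoting gets painful as the script grows.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        OutputDataSet <- data.frame(mean_value = mean(InputDataSet$value));
    ',
    @input_data_1 = N'SELECT value FROM dbo.Measurements'
WITH RESULT SETS ((mean_value FLOAT));
```

Even in this tiny example the R code is opaque to the SQL tooling around it; with a realistic multi-hundred-line analysis script, the single-string approach becomes unmanageable.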