In October of 2012, the Harvard Business Review declared Data Scientist to be the sexiest job of the 21st century.
Around that time, the world at large was catching on to the fact that 50 years of artificial intelligence research had finally begun to pay off. A century is a long time! Is data science worth all of the hype? Should we be looking for the next big thing, or positioning ourselves to catch the wave?
How did we get here?
At the heart of data science lies business intelligence. Every business on the planet is looking for an edge in the game; those that fail to find one will soon be looking for a new occupation. But what does that have to do with AI and deep learning, and why is everyone so crazy about it?
Alan Turing presented the first detailed design of a stored-program computer in 1946. In the following years, he was deeply focused on the problem of artificial intelligence. AI/ML, Deep Learning, and Data Science all have roots in that era of the 1940s and 50s. Studies in neuroscience inspired artificial neural networks (ANNs), and programs capable of improving their performance by learning from experience became known as machine learning (ML). Early experiments in ML found success at checkers, chess, speech recognition, automatic translation, and other useful functions, gaining attention and funding from both the public and private sectors. Despite early optimism, AI techniques failed to deliver on expectations. Over the years, AI research went through several cycles in which breakthroughs led to hope and funding, followed by disappointment and a loss of funding.
The biggest breakthroughs were to come in the mid-2000s. By this time, the internet had gained popularity and become a vast repository of data which researchers could use to train machine learning algorithms. Additionally, our processing capabilities greatly surpassed what was previously available. During this time, researchers discovered a new way to train ANNs that proved more successful than previous attempts. A turning point came in 2009 when Stanford researchers found that using a high-speed GPU to train ANNs produced results up to 70 times faster than traditional methods. Consequently, experiments that used to take weeks could now be completed in days. By 2012, the tech industry was paying close attention to what became known as Deep Learning. At its core, this advance applies massive amounts of data at high speed to methods that have been around for a long time.
The Sexiest Job of the Century?
Today, we are drowning in data.
We know that there is an incredible amount of value piling up on servers across the globe. The difficulty is in extracting valuable insight from that information. That’s where data science and deep learning come in. Computers are now able to process vast repositories of data, too large for human analysis.
The article declaring data science to be the sexiest job of the century tells the story of LinkedIn, which had 8 million users in 2006 but hadn’t gotten them engaged with the site. LinkedIn’s CEO brought in Jonathan Goldman, a physicist with a Ph.D., to pore over that data and see what he could make of it. With a wide degree of creative liberty, Goldman began to study the way people connected with each other on the site. His earliest breakthrough was to offer users suggestions of who they might know, based on the relationships between the connections they already had. The click-through rate of this feature was 30% higher than any other attempt to get users interacting further with the site. Thanks to this single addition, LinkedIn experienced millions of new page views, growing significantly in a short period of time. That story speaks volumes about the power of data; after barely scratching the surface, Goldman found an insight with great benefits.
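To make the idea of suggesting people based on existing connections concrete, here is a minimal sketch in Python. It is not LinkedIn’s actual method, which the article does not describe in detail; it simply assumes that a good candidate is someone who shares many connections with you. The function name suggest_connections and the toy network are hypothetical.

```python
from collections import Counter


def suggest_connections(graph, user, top_n=3):
    """Rank people `user` is not yet connected to by number of shared connections.

    `graph` maps each person to the set of people they are connected to.
    This is an illustrative heuristic, not any real site's algorithm.
    """
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone already connected.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most shared connections first; ties broken alphabetically.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical toy network used only for illustration.
network = {
    "ana": {"ben", "cal"},
    "ben": {"ana", "dee", "eli"},
    "cal": {"ana", "dee"},
    "dee": {"ben", "cal"},
    "eli": {"ben"},
}

print(suggest_connections(network, "ana"))  # ['dee', 'eli']
```

Here dee ranks first because she shares two connections with ana, while eli shares only one. A production system would likely weigh many more signals, but the core insight is the same: the relationships already in the data point to the next ones.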
Data scientists show up to work with an arsenal of tools at their disposal. To begin with, they have a working understanding of linear algebra, regression, statistics, and probability theory. Typically, they make use of programming languages such as R, Python, and others. Tools such as Hadoop, Spark, SQL, and machine learning libraries make up an important part of their toolkit. In addition to technical skills, a data scientist must also possess “intellectual curiosity”, in-depth domain-specific knowledge of their chosen field, and the ability to explain complex findings to a non-technical business team.
The Dirty Work
Data scientists, along with data analysts and engineers, do the difficult work of handling and preparing data to find those valuable gems. Applying the findings of big data to a business is not necessarily the hard part. The hard part is digging through the data. And how sexy is that job, exactly? Not very! The vast ocean of data is disconnected, unstructured, full of extraneous information, and possibly outdated or irrelevant by the time scientists get their hands on it. Data science may be the sexiest profession, but it’s not very sexy work.
As much as 80% of data science involves data analysis and engineering. Engineers create the digital pipelines that deliver data to analysts, who clean and prepare it. Connecting data sources and preparing data is a lot of work. Only at this stage can you begin experimenting with algorithms, perhaps transforming the data multiple times while refining the approach. Engineers also play a role on the other end, porting the work of a data scientist to a client-facing application.
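As a rough illustration of what cleaning and preparing data can look like, here is a minimal Python sketch. The record layout (email, country, spend) and the function name clean_records are assumptions made up for this example; real pipelines are usually far larger and built on dedicated tools.

```python
def clean_records(raw_records):
    """Normalize, validate, and de-duplicate raw records.

    The schema (email, country, spend) is hypothetical and exists only
    to illustrate typical cleaning steps.
    """
    seen_emails = set()
    cleaned = []
    for record in raw_records:
        # Normalize the key field: trim whitespace and lowercase it.
        email = (record.get("email") or "").strip().lower()
        if "@" not in email:
            continue  # drop rows without a usable key
        if email in seen_emails:
            continue  # drop duplicates of a row already kept
        seen_emails.add(email)

        # Fill a missing country with a placeholder and standardize case.
        country = (record.get("country") or "unknown").strip().upper()

        # Convert spend to a number; keep None when it cannot be parsed.
        try:
            spend = float(record.get("spend"))
        except (TypeError, ValueError):
            spend = None

        cleaned.append({"email": email, "country": country, "spend": spend})
    return cleaned


# Hypothetical messy input used only for illustration.
raw = [
    {"email": " Ana@Example.com ", "country": "de", "spend": "120.50"},
    {"email": "ana@example.com", "country": "DE", "spend": "120.50"},
    {"email": "", "country": "fr", "spend": "10"},
    {"email": "ben@example.com", "country": None, "spend": "n/a"},
]

for row in clean_records(raw):
    print(row)
```

Four messy rows become two usable ones: the duplicate and the row without an email are dropped, and the unparseable spend value is kept as None rather than guessed. Even in this tiny example, most of the code deals with messiness rather than insight, which is the point of the 80% figure.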
Today, much of the work that used to be done by scientists is now taken care of by off-the-shelf applications. As more of these applications arise, much of the work is automated and passed on to less specialized hands. Some say that automation is putting jobs in data science at risk. However, that might be looking at the field a little too narrowly. Last year, IBM reported that 90% of the world’s data had been created since 2016. If we don’t find more ways to automate data processing, faster than the data grows, we’ll all quickly drown in a sea of data.
One of the most popular tools for data processing is the Python programming language. It is an easy-to-use, general-purpose language that also has scientific functions, making it great for data science and just about anything else you need to do. According to Stack Overflow, Python was the fastest-growing language over the past 6 years. There could be a few reasons for that, but I suspect that the deep learning revolution and the dawn of the data science era might have something to do with it. Besides data science, Python is also the fastest-growing language in the FinTech industry, very popular for back-end web development, and commonly a required skill for blockchain development. Currently, the median income for a Python developer in Europe is €43,958 for data analysts and €54,632 for data scientists.
Perhaps the sexiest thing about data science is the set of applications that harness big data, enhancing our lives and the market share of the companies deploying them. Thanks to data science, graphics processors, and the vast amount of training data available online, we’ve seen dramatic improvements in many technologies.
If you’ve noticed Google Translate getting a lot better in the past few years, you can thank the deep neural net it now uses for translations. We now have voice-based digital assistants, image recognition, automated financial analysis, drug discovery, and many other data-driven innovations that surpass anything previously available. The major transformation of those technologies only came in the past 6 years! Considering that the volume of data is growing faster than our ability to process it, data science professionals aren’t currently at risk of being automated out of existence. Their work will continue to deliver innovative solutions and transform our world.