How to become a data scientist: A cheat sheet

If you are interested in pursuing a career in data science, this primer is a good reference for information about necessary skills, salaries, training resources and more.

Image: Wright Studio/Shutterstock

Data scientists are in great demand, taking the coveted No. 2 spot on Glassdoor’s Best Jobs in America list for 2021 with 5,971 job openings, and the demand continues to grow. In 2012, the Harvard Business Review billed data scientist as the sexiest job of the 21st century.

Among data scientists, many different jobs can exist. “There are data scientists that focus very much on advanced analytics. Some data scientists only do natural language processing,” said Dana Seidel, data scientist. The work encompasses many diverse skills, she said, including “project management skills, data skills, analysis skills, critical thinking skills.”

The field is in such high demand because businesses need data analytics to stay competitive. “In the end, the main reason demand is still high is because if your competitors are relying on data-driven decision making and you aren’t, they will surpass you and steal your market share. Therefore companies have to adapt and employ data science tools and techniques or they will simply be forced out of business,” said Christopher Zita in an article on Towards Data Science.

To help those interested in the field better understand how to break into a career in data science, we’ve created a guide with the most important details and resources.

Executive summary

  • Why is there an increased demand for data scientists? Nearly every company now has the ability to collect data, and the amount of data is growing larger and larger. This has led to a higher demand for employees with specific skills who can effectively organize and analyze this data to glean business insights.

  • What are some of the data scientist job roles? Core data scientist, researcher and big data specialist are some of the top job titles in the data science field.

  • What skills are required to be a data scientist? The common skill set for a data scientist includes machine learning, Python, Hadoop, Spark and SQL, according to Glassdoor.

  • Which locations and industries have the hottest markets for data scientists? The cities with the fastest-growing tech salaries between 2019 and 2020, according to the DICE 2021 Tech Salary Report, include Charlotte, North Carolina (+13.8%); Orlando, Florida (+13.4%); New York, New York (+11.6%); Austin, Texas (+9.7%); and Philadelphia, Pennsylvania (+8.3%). Other top-ranking cities in this category were Detroit, Phoenix, Houston, Minneapolis and Baltimore. In addition to the traditional “tech hubs,” this list includes a number of emerging cities. Some of the top-paying industries are aerospace product and parts manufacturing, $119,590; telecommunications, $102,180; federal executive branch (OEWS designation), $101,560; oil and gas extraction, $101,130; and software publishers, $96,510.

  • What is the average salary of a data scientist? The national average base salary for data scientists was $117,288 as of September 2021, according to Glassdoor. LinkedIn placed the national average base salary at $119,378 for September 2021. Salaries vary greatly depending on location; the positions with the highest salaries are in San Francisco, San Jose, Seattle, and New York City.

  • What are typical interview questions for a career in data science? “In an interview, expect to answer technical questions about your ability to perform quantitative tests as well as create clear visualizations of large, complex data sets. Come ready to discuss past projects you’ve worked on and how you communicate data findings clearly and concisely in order to help solve business-related problems,” Glassdoor suggested.

  • Where can I find resources for a career in data science? The Data Science Association, The Institute for Operations Research and the Management Sciences and the International Institute for Analytics are national and international organizations where you can seek out information about the profession as well as certification and training options. A number of online courses in programming languages such as Python, R and SQL are available from many providers.

Why is there an increased demand for data scientists?

As every company becomes a tech company to some degree, the need increases for skilled professionals who can analyze company data and glean business insights from it.

“As the size of data at companies grow larger and larger, there is higher demand for employees with specific skills who can effectively organize and analyze this data,” said Pablo Ruiz Junco, Glassdoor economic research fellow. “At the same time, the amount of people with these skills is still relatively low compared to the demand, which results in higher pay.”

Technology advances and the massive volumes of online data available are affecting every sector, and have tremendous impacts on the economy, said Karen Panetta, IEEE fellow and dean of graduate engineering at Tufts University. This so-called “data avalanche” is not just about the sheer volume of data, but also the speed at which it changes and grows, and the diverse types of data available.

“Knowing how to use a spreadsheet and a traditional database will not suffice in the emerging Big Data revolution,” Panetta said. “Analyses need to be done in real-time, where decisions can be critical. Being able to simply know how to use the software tools is only part of this challenge. Understanding the data across disciplines, being able to communicate its meaning, and using statistics will be the differentiating factors from a traditional ‘number cruncher.’”

What are some of the data scientist job roles?

Generally speaking, data scientists mine data and analyze it for specific company interests, and then work with marketing departments to capitalize on that knowledge. These workers must be familiar with data-gathering software, programming, and warehousing techniques.

Data scientist jobs fall into 10 categories, according to Towards Data Science.

Data scientist: A data scientist knows a bit of everything and can offer insights on the best solutions for a specific project. Data scientists are in charge of researching and developing new algorithms and approaches. In large companies, they oversee projects from start to finish.

Data analyst: Data analysts are responsible for visualizing, transforming and manipulating the data. They are often in charge of preparing the data for communication by making reports that show trends and insights.

Data engineer: Data engineers are responsible for designing, building and maintaining data pipelines. They make sure that the data is ready to be processed and analyzed, and they keep the ecosystem and the pipeline optimized and efficient.

Data architect: A data architect is similar to a data engineer. Both need to ensure that the data is well-formatted and accessible. Data architects also design, create and maintain new database systems that match the requirements of a specific business model.

Data storyteller: This is the newest job role in this list. Data storytelling is not just about visualizing the data and making reports and stats; rather, it is about finding the narrative that best describes the data and using that narrative to convey it. The data storyteller helps people understand the data.

Machine learning scientist: A machine learning scientist researches new data manipulation approaches and designs new algorithms to be used.

Machine learning engineer: Machine learning engineers need to be very familiar with the various machine learning algorithms, such as clustering, categorization and classification, and to stay up to date with the latest research advances in the field. They also need strong statistics and programming skills, in addition to some knowledge of the fundamentals of software engineering.

Business intelligence developer: Business intelligence developers design and develop strategies that allow business users to find the information they need to make decisions quickly and efficiently. BI developers need to have at least a basic understanding of the fundamentals of business models.

Database administrator: A database administrator is in charge of monitoring the database, making sure it functions properly, keeping track of the data flow, and creating backups and recoveries.

Technology-specialized roles: As the data science field grows, more specific technologies will emerge, and new specialized job roles will be created around them. These job roles apply to data scientists and analysts as well.

What skills are required to be a data scientist?

Here are the 12 marketable skills a data scientist might need, according to an Indeed report:

  1. Cloud computing
  2. Statistics and probability
  3. Advanced mathematics
  4. Machine learning
  5. Data visualization skills
  6. Query languages
  7. Database management
  8. Visualizations
  9. Python coding
  10. Microsoft Excel
  11. R programming
  12. Data wrangling

“If you’re looking to enter the field of data science and build a solid foundation of experience that will stand out in the eyes of future employers, there are three core skills you need: Python, R and SQL,” said Pablo Ruiz Junco, Glassdoor economic research fellow. “With these skills, you’ll be eligible to apply to over 70% of all online job postings for data scientist roles. Plus, expanding your skills beyond these foundational languages can lead you to a higher salary and allow you to cast a wider net when applying.”

Which data science job roles pay the highest salaries?

Analysts had predicted that demand for data scientists would boom by 2020, but demand instead slowed that year because of the COVID-19 pandemic. Fortunately, that slowdown isn’t expected to last.

According to a report from Indeed, the 15 highest-paying data jobs by national average salary in 2021 are:

  1. Machine learning engineer: $149,847
  2. Enterprise architect: $144,013
  3. Data architect: $133,840
  4. Big data engineer: $132,571
  5. Data modeler: $93,476
  6. Data scientist: $122,519
  7. Infrastructure engineer: $113,546
  8. Business intelligence developer: $100,494
  9. Statistician: $99,055
  10. Database administrator: $97,730
  11. Business intelligence analyst: $96,737
  12. Database developer: $89,250
  13. Data warehouse manager: $84,221
  14. Data analyst: $75,225
  15. Database manager: $65,558

What is the average salary of a data scientist?

Average salary figures differ slightly for U.S. data scientists depending on which job site you look at. LinkedIn says the average base pay is $119,378, and Glassdoor says the average base pay for the position is $117,288.

Data scientists in San Francisco are the highest paid, with a median base salary of $160,525, followed by San Jose, California ($107,226), Seattle ($143,300), and New York City ($151,527), according to Indeed.

The Bureau of Labor Statistics said the median pay for a data scientist with a master’s degree in 2020 was $126,830 per year.

As the salary differences among the job roles listed above show, the skills that individual data scientists bring to the table can have a large impact on pay. Job seekers should consider which role interests them most and weigh which skills are worth the time it takes to learn them.

What are typical interview questions for a career in data science?

“To assess if a candidate can be successful as a data scientist, I’m looking for a few things: baseline knowledge of the fundamentals, a capacity to think creatively and scientifically about real-world problems, exceptional communication about highly technical topics, and constant curiosity,” said Kevin Safford, senior director of engineering at Umbel.

A junior data scientist can expect questions like the following in a job interview, according to Forrester analyst Kjell Carlsson:

  • Walk me through the project that you are most proud of where you used data/data science/machine learning/advanced analytics. What was your role on the project, and what did you do in each step?
  • Tell me about a project where you used (insert language or skill here, e.g., Python, R).
  • Tell me about a time you had to work with someone who is not data-savvy on a data science project.
  • Pretend I am not a data scientist and explain (insert data science topic, e.g., cross validation, unsupervised learning, etc.) to me.
  • Tell me about a time you had to work with very messy data.
  • Tell me about your experience working in teams.
  • Tell me about a time when you had to become an expert on a new technique quickly.

The interviewee might be given a mini-case study based on a data science project the team has undertaken, with questions such as: What data would you need? What are the hypotheses you would like to test? What technique(s) would you use to evaluate them?

An interview may also include an exercise in which the interviewee is given a data set and a broad question, and asked to present their findings, Carlsson said.

For more senior positions, these questions may come up, according to Daniel Miller, vice president of recruiting at Empowered Staffing:

  • Have you built a data warehouse from scratch? If so, tell me about the process you created in order to successfully implement the data warehouse. (If they have not been part of it from scratch, you can ask if they have been part of a department that dealt with a company merger or acquisition of data and how they handled it.)
  • What types of customized dashboards have you built, and what information/analytics were being presented through your dashboard?
  • Tell me about the most complicated data project you have worked on, and what you were able to do in order to achieve success.
  • How are you with explaining and presenting data to executive and senior leadership?

Where can I find resources for a career in data science?

The Data Science Association, The Institute for Operations Research and the Management Sciences and the International Institute for Analytics are national and international organizations where you can seek information about the profession as well as certification and training options.

Some educational institutions have created data science degree programs, including the University of California, Berkeley; Northwestern University; Carnegie Mellon University; and Kennesaw State University. Some of these schools offer online courses.

You can find a number of online programming courses, such as those in Python, R and SQL, from many providers. Programs and seminars are also available through the IEEE Computer Society.

A number of certifications in data science are also available. These include the vendor-neutral Certified Analytics Professional (CAP), the Dell EMC Proven Professional certification program, the Microsoft Certified Solutions Expert (MCSE) and the SAS Data Science Certification.

Editor’s note: This article has been updated to reflect the latest information.
