What is Data Science?
Data Science is the area of study that involves extracting insights from vast amounts of data using scientific methods, algorithms, and processes. It helps you discover hidden patterns in raw data. The term Data Science emerged from the evolution of mathematical statistics, data analysis, and big data. It is an interdisciplinary field that combines statistics, data analysis, and machine learning to extract knowledge and insights from structured or unstructured data. Data Science also enables you to translate a business problem into a research project and then translate the findings back into a practical solution.
- Data Science is about data gathering, analysis and decision-making.
- Data Science is about finding patterns in data through analysis and making predictions about the future.
By using Data Science, companies are able to make:
- Better decisions (should we choose A or B?)
- Predictive analysis (what will happen next?)
- Pattern discoveries (find patterns, or hidden information, in the data)
Where is Data Science Needed?
Data Science is used in many industries in the world today, e.g. banking, consultancy, healthcare, and manufacturing. Examples of where Data Science is needed:
- For route planning: to discover the best shipping routes
- To foresee delays for flights, ships, trains, etc. (through predictive analysis)
- To create promotional offers
- To find the most suitable time to deliver goods
- To forecast next year's revenue for a company
- To analyze the health benefits of training
- To predict who will win elections
How Does a Data Scientist Work?
A Data Scientist requires expertise in several areas, such as Machine Learning, Statistics, Programming (Python or R), Mathematics, and Databases. A Data Scientist must find patterns within the data. Before they can find the patterns, they must organize the data in a standard format.
Here is how a Data Scientist works:
- Ask the right questions - To understand the business problem.
- Explore and collect data - From databases, web logs, customer feedback, etc.
- Extract the data - Transform the data to a standardized format.
- Clean the data - Remove erroneous values from the data.
- Find and replace missing values - Check for missing values and replace them with a suitable value (e.g. an average value).
- Normalize data - Scale the values to a practical range (e.g. 140 cm is shorter than 1.8 m, yet the number 140 is larger than 1.8, so scaling is important).
- Analyze data, find patterns and make future predictions.
- Present the result - Share the result with useful insights in a way the company can understand.
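The cleaning, missing-value, and normalization steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The height values and the helper names `to_cm`, `fill_missing`, and `min_max_scale` are made up for this example and do not come from any particular library.

```python
from statistics import mean

# Hypothetical raw heights recorded in mixed units; None marks a missing value.
raw_heights = [(140, "cm"), (1.8, "m"), (None, "cm"), (165, "cm"), (1.7, "m")]


def to_cm(value, unit):
    """Convert a height to centimetres so all values share one unit."""
    if value is None:
        return None
    return value * 100 if unit == "m" else value


def fill_missing(values):
    """Replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    average = mean(known)
    return [average if v is None else v for v in values]


def min_max_scale(values):
    """Scale values linearly into the range 0..1."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so there is nothing to spread out.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


heights_cm = [to_cm(v, u) for v, u in raw_heights]   # standardize the format
filled = fill_missing(heights_cm)                    # replace missing values
scaled = min_max_scale(filled)                       # normalize

print(filled)
print(scaled)
```

Once the units match, 1.8 m becomes 180 cm and correctly compares as larger than 140 cm. Min-max scaling is only one of several possible normalization choices; it is used here because it is easy to read.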
What is the Data Science Process?
- Discovery: The discovery step involves acquiring data from all identified internal and external sources that help you answer the business question. The data can be logs from web servers, data gathered from social media, census datasets, or data streamed from online sources using APIs.
- Preparation: Data can have many inconsistencies, such as missing values, blank columns, or an incorrect data format, which need to be cleaned. You need to process, explore, and condition the data before modelling. The cleaner your data, the better your predictions.
- Model Planning: In this stage, you need to determine the method and technique for drawing the relation between input variables. Planning for a model is performed using different statistical formulas and visualization tools. SQL Analysis Services, R, and SAS/ACCESS are some of the tools used for this purpose.
- Model Building: In this step, the actual model building process starts. Here, the data scientist splits the dataset into training and testing sets. Techniques like association, classification, and clustering are applied to the training dataset. The model, once prepared, is tested against the testing dataset.
- Operationalize: In this stage, you deliver the final baselined model with reports, code, and technical documents. The model is deployed into a real-time production environment after thorough testing.
- Communicate Results: In this stage, the key findings are communicated to all stakeholders. This helps you decide if the project results are a success or a failure based on the inputs from the model.
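The model building step above can be illustrated with a small sketch: split a labelled dataset into training and testing parts, fit a very simple classifier on the training part, and measure accuracy on the testing part. The synthetic data, the 80/20 split ratio, and the nearest-centroid classifier are all assumptions chosen for brevity; a real project would typically use a dedicated machine learning library.

```python
import random
from statistics import mean

# Synthetic, made-up dataset: (feature_1, feature_2, label).
rng = random.Random(42)
dataset = (
    [(rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5), "A") for _ in range(50)]
    + [(rng.gauss(4.0, 0.5), rng.gauss(4.0, 0.5), "B") for _ in range(50)]
)


def train_test_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and split it into training and testing lists."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(rows):
    """Compute the mean point (centroid) of each label in the training data."""
    centroids = {}
    for label in {row[2] for row in rows}:
        points = [row for row in rows if row[2] == label]
        centroids[label] = (mean(p[0] for p in points), mean(p[1] for p in points))
    return centroids


def predict(centroids, x, y):
    """Return the label whose centroid is closest to the point (x, y)."""
    return min(
        centroids,
        key=lambda label: (x - centroids[label][0]) ** 2
        + (y - centroids[label][1]) ** 2,
    )


# Train on 80% of the rows, then test on the 20% the model has never seen.
train, test = train_test_split(dataset, test_fraction=0.2, rng=rng)
model = fit_centroids(train)
correct = sum(predict(model, x, y) == label for x, y, label in test)
accuracy = correct / len(test)
print(f"Accuracy on the testing set: {accuracy:.2f}")
```

Evaluating on rows that were held out of training is what makes the accuracy figure a fair estimate of how the model behaves on new data.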
What are the various Data Science Job Roles?
The most prominent Data Science job titles are:
- Data Scientist Role: A Data Scientist is a professional who manages enormous amounts of data to come up with compelling business visions by using various tools, techniques, methodologies, algorithms, etc. Languages: R, SAS, Python, SQL, Hive, Matlab, Pig, Spark
- Data Engineer Role: A data engineer works with large amounts of data. They develop, construct, test, and maintain architectures like large-scale processing systems and databases. Languages: SQL, Hive, R, SAS, Matlab, Python, Java, Ruby, C++, and Perl
- Data Analyst Role: A data analyst is responsible for mining vast amounts of data. They look for relationships, patterns, and trends in the data. They then deliver compelling reports and visualizations that help the business take the most viable decisions. Languages: R, Python, HTML, JS, C, C++, SQL
- Statistician Role: The statistician collects, analyses, and understands qualitative and quantitative data using statistical theories and methods. Languages: SQL, R, Matlab, Tableau, Python, Perl, Spark, and Hive
- Data Administrator Role: A data administrator ensures that the database is accessible to all relevant users. They also ensure that it is performing correctly and keep it safe from hacking. Languages: Ruby on Rails, SQL, Java, C#, and Python
- Business Analyst Role: This professional works to improve business processes. They act as an intermediary between the business executive team and the IT department. Languages: SQL, Tableau, Power BI, and Python
What are the applications of Data Science?
Some applications of Data Science are:
- Internet Search: Google search uses Data Science technology to find a specific result within a fraction of a second.
- Recommendation Systems: Data Science is used to build recommendation systems, for example "suggested friends" on Facebook or "suggested videos" on YouTube.
- Image & Speech Recognition: Speech recognition systems like Siri, Google Assistant, and Alexa run on Data Science techniques. Moreover, Facebook recognizes your friend when you upload a photo with them, with the help of Data Science.
- Gaming world: EA Sports, Sony, and Nintendo use Data Science technology to enhance your gaming experience. Games are now developed using Machine Learning techniques, and they can update themselves when you move to higher levels.
- Online Price Comparison: PriceRunner, Junglee, and Shopzilla rely on Data Science. Here, data is fetched from the relevant websites using APIs.
What are the Challenges for Data Science?
- A high variety of information & data is required for accurate analysis
- An inadequate data science talent pool
- Management does not provide financial support for a data science team
- Unavailability of/difficult access to data
- Business decision-makers do not effectively use Data Science results
- Explaining data science to others is difficult
- Privacy issues
- Lack of significant domain expertise
- If an organization is very small, it can’t have a Data Science team