By Mr. Data Science

Photo by Hush Naidoo on Unsplash

Data Science has had a huge impact on the field of medical science. Some of the areas where it is making a difference include:

  • Medical image analysis
  • Genetics and Genomics research
  • Creating new drugs/Drug Discovery with Data Science
  • Predictive Analytics in Healthcare
  • Data Analysis of healthcare data

Topics like the discovery of new drugs are a little beyond the scope of this article, but we can still look at some examples of predictive and exploratory analysis of healthcare data. In example 1 we’ll look at data on cancer and how we could approach…

By Mr. Data Science

Photo by Greg Rakozy on Unsplash

An excellent way to learn data science is to do data science: get some data and start analyzing it. The techniques used in this article can be applied to any data, and some of the issues we will encounter are typical of the challenges real-world data analysis throws up.

This article will investigate some data on asteroids to find out whether there is a threat of collision. Example 3 will use machine learning to classify asteroids as potential threats. …

Photo by Jason Rosewell on Unsplash

Natural language generation (NLG) is the process of creating text using software. In general, it can be divided into a few subgroups[1]:

  • text-to-text generation, such as machine translation
  • text summarization
  • open-domain conversation response generation
  • data-to-text generation

NLG is growing in popularity because it has so many applications in areas such as journalism, business, and law. With NLG, you can complete tasks such as writing product descriptions, engaging with users, and writing investigative reports. NLG is frequently used to generate social media posts, such as on Twitter, and to retroactively caption images throughout the web.

A brief background on natural language generation

One important concept in natural language generation is…

By Mr. Data Science

Photo by Atul Pandey on Unsplash

A Brief Overview:

Throughout this article, we will explore migration data to gain a better understanding of migration drivers. Since migration remains a contentious political issue, we will refrain from giving opinions and focus on the data instead. To investigate migration drivers we will use a couple of datasets (all of them CSV files):

  • The country data set was downloaded from Kaggle
  • The happiness reports (5 files) were also downloaded from Kaggle

The goals for this article are to:

  1. demonstrate some useful data science techniques such as combining datasets, generating correlation heat maps, and applying k-means to a dataset
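
As a rough illustration of the first two techniques named above (combining datasets and measuring correlation), here is a minimal sketch that uses only the Python standard library. The rows, the column names (`net_migration`, `happiness_score`), and the helper functions are all hypothetical stand-ins for the Kaggle CSV files, not the article's actual data or code. In practice a library such as pandas would typically handle the join and the correlation matrix.

```python
from math import sqrt

# Hypothetical rows standing in for the country and happiness CSV files.
country_rows = [
    {"country": "A", "net_migration": 4.0},
    {"country": "B", "net_migration": -2.0},
    {"country": "C", "net_migration": 1.0},
    {"country": "D", "net_migration": -1.0},  # no happiness record
]
happiness_rows = [
    {"country": "A", "happiness_score": 7.0},
    {"country": "B", "happiness_score": 4.0},
    {"country": "C", "happiness_score": 6.0},
    {"country": "E", "happiness_score": 5.0},  # no country record
]


def inner_join(left, right, key):
    """Combine two lists of dicts, keeping only rows whose key is in both."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


merged = inner_join(country_rows, happiness_rows, "country")
r = pearson(
    [row["net_migration"] for row in merged],
    [row["happiness_score"] for row in merged],
)
print(len(merged), round(r, 3))
```

Countries that appear in only one file (D and E here) drop out of an inner join, which is one of the typical issues to watch for when combining real datasets.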

Use Python to find when your favorite superhero character appeared

A pile of comic books.
Photo by Erik Mclean on Unsplash

There’s a famous quote by the American engineer/statistician W. Edwards Deming:

“Without data, you’re just another person with an opinion.”

One of the first steps you take when working with a new data set is to perform exploratory data analysis (EDA). The overarching objective of EDA is to help data scientists understand what the data contains and what types of questions the data will be able to answer. Note: EDA doesn’t attempt to answer any single question. It’s an investigative tool in your belt. Throughout this article, we’ll use a variety of EDA techniques on Marvel versus DC Comics data.

1. Getting Started: Preprocessing the Data

Want to publish your story on The Data Science Publication?

Just leave a comment on this article expressing your interest and we will review your previous articles. If you meet…

Use a Kaggle dataset and a few Python libraries to get started

Person watching Netflix
Photo by Mollie Sivaram on Unsplash.

Recommender systems are used on large online platforms like Netflix and YouTube to recommend movies, shows, or videos based on what you have watched in the past. Recommender systems are also commonly used in the online retail space. One commonly cited statistic is that Amazon makes about one-third of its sales from recommended products. If that figure holds, recommendations add 50% on top of the other two-thirds of sales. Just imagine making 50% more money than you currently do.

If you ask me, learning how to implement a recommender system is well worth the time commitment.

The recommender system described in this article will be simple but will demonstrate the fundamental problems that need…
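
To make the idea concrete, here is a minimal sketch of one simple approach, user-based collaborative filtering with cosine similarity. It is not necessarily the method the article goes on to build; the ratings, user names, and function names are hypothetical, and only the standard library is used.

```python
from math import sqrt

# Hypothetical user -> {title: rating} data; not taken from any real dataset.
ratings = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Jaws": 5},
    "cat": {"Up": 5, "Coco": 4, "Heat": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[title] * b[title] for title in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Rank titles the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for title, rating in other_ratings.items():
            if title not in ratings[user]:
                scores[title] = scores.get(title, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ann", ratings))
```

Here "ann" rates films much like "bob" does, so bob's highly rated title that ann has not seen ranks first. A new user with no ratings would get no useful scores at all, which is one of the fundamental problems any recommender has to deal with.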

By Mr. Data Science


Throughout this article, I will use the MNIST dataset to show you how to reduce image noise using a simple autoencoder. First, I will demonstrate how you can artificially inject noise into your images. Next, I will describe the process for creating an autoencoder, and finally, I will test the autoencoder on images with a few different signal-to-noise ratios (SNRs) to assess the model’s robustness. Note that the goal of this article is to introduce you to the concept of noise reduction with autoencoders, not to teach you the nuances of autoencoder architectures and design.
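
The noise-injection step can be sketched as follows. This is a minimal standard-library illustration, assuming additive Gaussian noise and an SNR defined as the ratio of mean signal power to mean noise power in decibels; the article may define or implement it differently. The synthetic 28x28 "image" and the function names are hypothetical.

```python
import math
import random


def add_gaussian_noise(pixels, snr_db, rng, clip=False):
    """Add zero-mean Gaussian noise scaled to hit a target SNR in dB."""
    signal_power = sum(p * p for p in pixels) / len(pixels)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    noisy = [p + rng.gauss(0.0, sigma) for p in pixels]
    if clip:
        # Clipping keeps pixels in [0, 1] but shifts the achieved SNR.
        noisy = [min(1.0, max(0.0, p)) for p in noisy]
    return noisy


def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean image and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Synthetic stand-in for a flattened 28x28 image with values in [0, 1].
image = [(i % 28) / 27 for i in range(28 * 28)]
noisy_image = add_gaussian_noise(image, snr_db=10.0, rng=random.Random(0))
print(round(measured_snr_db(image, noisy_image), 2))
```

Lower `snr_db` values produce noisier images, so sweeping that parameter is one way to generate test sets of increasing difficulty.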

A Brief Background on Autoencoders and Noise Reduction:

According to Wikipedia…

By Mr. Data Science

A Brief Overview:

k-Nearest Neighbor (KNN) is a classification algorithm, not to be confused with k-Means; they are two very different algorithms with very different uses. k-Means is an unsupervised clustering algorithm: given some data, k-Means will cluster that data into k groups, where k is a positive integer. k-Nearest Neighbor is a supervised classification algorithm. Note that a supervised algorithm learns from labeled training data, whereas an unsupervised algorithm works on unlabeled data. We …
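
To illustrate the supervised side of that distinction, here is a minimal sketch of a k-Nearest Neighbor classifier using only the standard library. The training points, labels, and function name are hypothetical, and Euclidean distance with a simple majority vote is assumed.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (7, 8)))
```

The labels in `train_labels` are what make this supervised: k-Means would receive only `train_points` and would have to discover the two groups on its own.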

By Mr. Data Science

Photo by Tachina Lee on Unsplash

In this article we will attempt to use machine learning, specifically ensemble learning, to detect fake news. First, let’s define what is meant by the term ‘fake news’, at least within this article. A statement such as ‘NASA discovers an alien civilisation living on the moon’ is fake news in the sense that it is factually incorrect. There is another use of the term, where some people attempt to dismiss anything that challenges their world view as fake news; we will not be using that definition. …
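
The core idea of ensemble learning, combining several weak models into one decision, can be sketched with hard (majority) voting. The three rule-based "classifiers" below are deliberately crude, hypothetical stand-ins for trained models. They are not the article's models and would not detect fake news reliably.

```python
from collections import Counter


def majority_vote(classifiers, text):
    """Return the label predicted by most of the base classifiers."""
    votes = Counter(classifier(text) for classifier in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers; real ones would be trained on labeled news.
def many_exclamations(text):
    return "fake" if text.count("!") >= 2 else "real"


def mentions_aliens(text):
    return "fake" if "alien" in text.lower() else "real"


def all_caps(text):
    return "fake" if text.isupper() else "real"


classifiers = [many_exclamations, mentions_aliens, all_caps]

print(majority_vote(classifiers, "NASA discovers an alien civilisation living on the moon!!"))
print(majority_vote(classifiers, "Council approves new library budget"))
```

Using an odd number of base classifiers with two labels avoids ties, and a single mistaken classifier (here `all_caps` on the first headline) is outvoted by the other two.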

Mr. Data Science

I’m just a nerdy engineer that has too much time on his hands and I’ve decided to help people around the world learn about data science!
