With simple examples in Python


We see the acronym ETL thrown around a lot in the context of data science and analytics. The buzzword is generously sprayed across descriptions of analytics roles and of online courses on platforms such as Coursera, Databricks and DataCamp.

And there is good reason for that. Almost everyone who has dealt with collecting, wrangling and storing data has been involved in the process at a small or large scale.

So what exactly is an ETL process?

ETL stands for extract, transform and load, which are three crucial steps involved in data management part of…
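The three steps named above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the CSV text, the `products` table and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for a real data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this might come from a file or an API.
RAW_CSV = """name,price
widget, 2.50
gadget,10
,3.00
"""

def extract(text):
    """Extract: read raw rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without a name, tidy names, cast prices to float."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # skip incomplete records
        cleaned.append((name.title(), float(row["price"])))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # assumed target; any database would do
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT name, price FROM products ORDER BY name").fetchall()
print(rows)
```

Keeping each step in its own function makes it easy to swap the source or the destination without touching the cleaning logic.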

During the early parts of 2021, there was a general feeling among finance experts that a market crash was imminent. No one knew exactly when or how it would transpire but there were several signs which supported that notion:

  • A massive surge in the value of several tech stocks in 2020
  • Increasing interest rates
  • Fears of inflation due to excessive money printing
  • Economy not fully recovered from the pandemic (especially unemployment rate)
  • Rise in popularity of meme stocks and coins such as GameStop (GME), AMC (AMC) and Dogecoin (DOGE)

Industry experts such as Michael Burry and Ray Dalio in recent weeks have…

Taking a closer look at Detroit’s blight problem

Back when I first tackled this project, I didn’t really understand, or rather pay attention to, the actual business problem being addressed. Like a lot of learners who are just getting into data science or machine learning, I dove straight into the modeling part.

Now, as a more mature (hopefully) applier of data for problem solving, I made an attempt to redo this project with:

  • A more holistic approach
  • A clearer understanding of the business problem at hand
  • More data exploration and analysis

Originally created as a real data science challenge, the Michigan Data Science Team¹ (part of the University of…

If you like to dabble in stocks (like me), you are probably wary of high-volatility events in the market that can lead to massive red days, which can stretch into weeks or even months. I call this FOMC: fear of market crash.

Yes, I just invented this term. But the concern is real and many in the past have tried to predict such events with low success rates. One individual who accurately predicted the biggest crash of our lifetime, the 2008 housing market crash, was Dr. Michael Burry.

At the time, he was a hedge fund…

Let me explain. I was reading this book about ‘Big Data’ and the internet that I casually picked up from a convenience store at the Austin airport.

The book was called “Everybody Lies” by Seth Stephens-Davidowitz and it delved into several really interesting topics mostly revolving around data science, the internet and human psychology.

In one of the earlier chapters of the book, the author provides insightful examples of how people make everyday decisions based on their previous experiences. One of those examples involved the author’s grandmother.

He talks about how his grandmother helped him pick the right person to…

Following up on my previous attempt to model how best to pick a movie, this new iteration takes a very different approach.

In the previous project, I tried to predict the IMDb rating for a specific movie based on several factors such as whether the movie featured an Oscar-winning actor, the genre of the movie and, most importantly, the Rotten Tomatoes score.

Picking the Rotten Tomatoes score as one of the predictor variables was a mistake. The correlation between this score and the dependent variable, the IMDb rating, was 0.86, which indicated a strong association between the two.
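A check like this is easy to run before modeling. Here is a minimal sketch of computing the Pearson correlation between a predictor and the target; the scores below are made up for illustration and are not the data from the project, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and of squared deviations for each variable
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example scores (assumption, for illustration only)
rotten_tomatoes = [95, 60, 82, 35, 74, 50]
imdb = [8.4, 6.5, 7.1, 5.2, 7.6, 6.8]

r = pearson_r(rotten_tomatoes, imdb)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 suggests the predictor moves almost in lockstep with the target, which is worth knowing before deciding whether to keep it in the model.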


It’s only when you look back and reflect that you realize the transitions and transformations that you have been through. We work towards our goals every day, but the smaller achievements often go unnoticed, especially by oneself. Taking the time to view this journey from a broader perspective helps in recognizing the growth that has transpired.

Such is the case with me and my confidence while speaking to a larger audience. It was not until one of my friends pointed it out that I realized that there had been a positive change in the way I speak in a room full…

Google Colab is one of the best and most convenient ways to run Jupyter Notebooks. However, it took me a while to stumble upon this platform and even longer to truly appreciate what it had to offer.

A lot of my early work with Python was done on a local computer using IPython, an interactive interpreter that comes installed with the widely popular data science platform Anaconda.

At that point, I hadn’t explored Jupyter Notebooks and was wildly under-utilizing the Anaconda suite of tools and libraries. …

Sayan Das

Data Science, Storytelling, Productivity, Finance, Technology
