Opportunities
Interested in side projects or a study group?
Are you an expert with data and willing to mentor, or are you an up-and-coming hobbyist looking for a side project to work on? We are putting together a group and plan to host our kick-off meeting on the weekend of May 9th. If interested, please send us an email to discuss building a side project group.
Considering a career change?
Are you a software or system engineer, data scientist, analytic developer, or cybersecurity expert interested in learning about new opportunities?
Please send us an email to learn about the opportunities available with our partners.
Get involved!
Want to be more involved in our data science community? If you have experience running workshops or hackathons or curating newsletters, or are just interested in helping to grow the meetup, please send us an email!
Erias Ventures
Erias has an immediate need for Software Engineers, System Engineers, Test Engineers, Data Scientists, and System Administrators. External referral bonuses are available. For more information, please contact us at info@eriasventures.com.
COVID-19 Data News
Why It’s So Freaking Hard To Make A Good COVID-19 Model — Using a mathematical model to predict the future is valuable for experts, even if there are vast gulfs between possible outcomes. But it’s not always easy to make sense of the results and how they change over time, and that confusion can hurt both your brain and your heart. This article talks about what goes into modeling a pandemic.
COVID-19 and the Importance of “Obsolete” Data — Some people are quick to dismiss “old” data as “obsolete”, arguing that the value of data erodes over time. However, historical data can be just as valuable as current data in trying to ascertain the potential business and operational impact of current situations or events. There are some categories of events where “obsolete data” may be quite valuable: annual events such as the Super Bowl, seasonal events such as holidays, four-year events such as elections, proxy events such as those used for product launches, and random events such as natural disasters.
The COVID Tracking Project — From The Atlantic, this project provides complete testing data, including not just identified cases but also how many people have been tested, and where. Maryland's data quality rating is A.
Course: CS472 Data science and AI for COVID-19 — This project class investigates and models COVID-19 using tools from data science and machine learning. The class introduces the relevant background for the biology and epidemiology of the COVID-19 virus and then critically examines current models that are used to predict infection rates in the population as well as models used to support various public health interventions (e.g. herd immunity and social distancing). Slides and videos are available to the public.
Data News and Articles
AI Is Changing Work — and Leaders Need to Adapt — As AI is increasingly incorporated into our workplaces and daily lives, it is poised to fundamentally upend the way we live and work. Concern over this looming shift is widespread. A recent survey of 5,700 Harvard Business School alumni found that 52% of even this elite group believe the typical company will employ fewer workers three years from now.
Why Python is not the Programming Language of the Future — Since the early 2010s, Python has been booming, eventually surpassing C, C#, Java, and JavaScript in popularity. But will it last? This article assesses the virtues that are boosting Python’s popularity right now and the weak points that will break it in the future.
Why We Need DevOps for ML Data — Getting machine learning (ML) into production is hard. In fact, it’s possibly an order of magnitude harder than getting traditional software deployed. This blog discusses why the industry needs to solve DevOps for ML data, and how ML’s unique data challenges stifle efforts to get ML operationalized and launched in production.
Why I’m Leaving Data — This article outlines the pros and cons of working as a data analyst and describes why the author decided to leave this lucrative industry entirely.
Lessons Learned Managing the GitLab Data Team — Several lessons and takeaways learned from being the data team manager at GitLab. Lessons include planning for growth, deciding your role, hiring awesome people, and picking excellent tools.
Streams and Monk: How Yelp is Approaching Kafka in 2020 — Five years ago, Yelp's Kafka cluster was not monitored, did not expose any metrics, and did not have anyone on call for it. Now they can scale clusters with a single configuration push, load balance and decommission brokers, automatically trigger rolling restarts to pick up new cluster configuration, and more.
How-To's and Tutorials
Microsoft Forecasting Best Practices — Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively. This repository provides examples and best practice guidelines for building forecasting solutions, in the form of Python Jupyter notebooks, R Markdown files, and a library of utility functions.
Curriculum for Reinforcement Learning — A curriculum is an efficient tool for humans to progressively learn from simple concepts to hard problems. It breaks down complex knowledge by providing a sequence of learning steps of increasing difficulty. In this post, we will examine how the idea of curriculum can help reinforcement learning models learn to solve complicated tasks.
Data Tools and Resources
The Google Cloud Developer's Cheat Sheet — Every product in the Google Cloud family described in <=4 words by the Google Developer Relations Team.
FastAI Book — These draft notebooks cover an introduction to deep learning, fastai, and PyTorch. fastai is a layered API for deep learning; for more information, see the fastai paper.
Tonks: Building Multi-Task Models — Tonks is a library that streamlines the training of multi-task PyTorch networks. It supports training with multiple task-specific datasets, multiple inputs, and ensembles of multi-task networks.
PyCaret: Low-Code Machine Learning — PyCaret is an open-source machine learning library in Python to train and deploy supervised and unsupervised machine learning models in a low-code environment. It allows you to go from preparing data to deploying models within seconds from your choice of notebook environment.
Swift — Google's plans to make Swift the first mainstream language with first-class, language-integrated differentiable programming capabilities. More information is available on GitHub.