January 2021

Upcoming Events
 



ML Design Patterns and Designing ML Infrastructure
February 24 | 6:00 PM | Online

Design patterns are formalized best practices to solve common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods to help engineers tackle frequently occurring problems that crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful in the context of adding flexibility, resilience and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects.
 

Data Science Product Management
March 20 | 12:00 PM | Online

When put into service solving customer needs, data science can be a critical differentiator for digital products in ever-more-competitive markets. But productizing data science presents a unique set of challenges, and often leaves product managers and data scientists struggling to find common ground and a shared language. In this talk, product coach and consultant Matt LeMay shares the lessons he's learned building bridges between product management and data science at companies like Bitly, Songza, and Spotify. Expect a candid, direct, and entertaining conversation about mistakes made, lessons learned, and suggestions for how to move forward.
 

Pitch! - Yourself, Your Company, Your Project, Your Ideas!
April 3 | 1:00 PM | Online

Looking for a job? Looking to hire someone? Trying to get your project started? Grab your best pitch and come share with the Data Works MD community. We need speakers! If you would like to speak, please register here.

Past Events

 

Opportunities

 

Data Works MD Conference 2021
We are in the early planning stages for a Maryland data-focused conference in 2021. If you would like to stay informed, please sign up for updates.

Interested in a side project?
Are you a data expert willing to mentor, or an up-and-coming hobbyist looking for a side project to work on? We have put together a group to focus on a few problems working with Baltimore City data and need your help. The current project focuses on data parsing and analysis for the Baltimore Board of Estimates. If interested, please send us an email or join us on Slack to discuss building a side project group.

Considering a career change?
Are you a software or system engineer, data scientist, analytic developer, or cybersecurity expert interested in learning about new opportunities?
Please send us an email to learn about the opportunities available with our partners.

Are you hiring?
If your company is looking for data scientists, data engineers, software engineers, or other data-related experts, please reach out so that we can help our members find new opportunities.
Please send us an email introducing your company and needs.

Get involved!
Want to be more involved in our data science community? If you have experience running workshops or hackathons, curating newsletters, or are just interested in helping to grow the meetup, please send us an email!

Erias Ventures
Erias has an immediate need for Software Engineers, System Engineers, Test Engineers, Data Scientists, and System Administrators. External referral bonuses are available. For more information, please contact us at info@eriasventures.com.
 

Data News and Articles


 

Why Is It So Hard to Become a Data-Driven Company? — To compete today, companies need to be data-driven. But for mainstream, legacy companies, that’s easier said than done. Despite a decade of investment and the adoption of Chief Data Officers, this survey of Fortune 1000 senior executives finds that many companies are still struggling against not just legacy tech, but embedded cultures that are resistant to new ways of doing things — over 90 percent of companies surveyed reported culture was their biggest barrier. In response to this, leaders should do three things: 1) focus their data initiatives on clearly identified high-impact use cases, 2) reconsider how their organizations handle data, and 3) remember that this transformation is a long-term process that requires patience, fortitude, and focus. Tags: Data, Company

Managing Data as a Data Engineer — As datasets multiply and grow larger, managing data becomes quite a challenge. In this article, the author shares some of the things they learned while managing terabytes of data in a fintech company. Tags: Data

How Spotify Optimized the Largest Dataflow Job Ever for Wrapped 2020 — This article discusses how Spotify optimized and sped up elements of their largest Dataflow job using a technique called Sort Merge Bucket (SMB) join. Tags: Big Data

Why Some Models Leak Data — Machine learning models use large amounts of data, some of which can be sensitive. If they're not trained correctly, sometimes that data is inadvertently revealed. Tags: Data

Interview With an Applied Scientist at Amazon — What does an average day look like at Amazon? A quick interview about working at Amazon. Tags: Career

Google Research: Looking Back at 2020, and Forward to 2021 — Extensive article on research areas of 2020 including COVID-19, responsible AI, reinforcement learning, robotics, and quantum computing. Tags: Research

(At Least) 5 Ways Data Analysis Improves Product Development — The team at Mode believes that iterative analytics is critical to answering any meaningful product question. Traditional BI tools are great at answering static questions (for example, how many purple socks did we sell today?), but not the big-picture questions that direct decisions on what big swings to take in your product roadmap. This article describes five ways that they see data science as necessary to prioritize the best product development choices. Tags: Team, Product Development

Cultivating Algorithms: How We Grow Data Science at Stitch Fix — Data, by itself, is seemingly chaotic and unorganized. Yet, when properly processed, it can provide a competitive advantage or even enable completely new services. This article covers how Stitch Fix discovers innovative capabilities through curiosity-driven tinkering. Tags: Team

Coronavirus, a Visual Rundown — A quick rundown of the useful visuals about COVID-19. Tags: COVID, Visualization

Ditching Excel for Python – Lessons Learned from a Legacy Industry — Since 2017, the author has observed a radical shift in data analysis methodologies. Excel-based models, which had seemed top-of-the-line, suddenly were too slow and too rigid; integration with third-party data sources, which was once a luxury, became the norm; and analysts began to use scripts to accomplish many labor-intensive tasks typically performed by hand or in spreadsheets. Enabling this change is a suite of accessible Python-powered tools. Tags: Python

Machine Learning is Going Real-time — There seems to be little consensus on what real-time ML means, and there hasn’t been a lot of in-depth discussion on how it’s done in the industry. This article discusses two levels of real-time ML: making predictions in real-time and incorporating new data and updating models in real-time. Tags: ML
 

How-To's and Tutorials

 
  

Introduction to Sentiment Analysis — This article goes through the process of building a sentiment analysis model using Python. Specifically, it creates a bag-of-words model using an SVM. By interpreting this model, we also understand how it works. Along the way, you will learn the basics of text processing. Tags: Sentiment Analysis

Signal Processing for Scientific Data Analysis with Python — A multi-part series covering topics such as audio signal processing, speech signal processing, and seismology. Tags: Python

3 Steps to Build a Modern Data Stack — Building your data stack is confusing. It doesn’t have to be. In this post, the author offers a way to clearly think about how to pick tools for your modern data stack. Tags: Data

Data Tools and Resources

 
Supercharging Apache Superset — At Airbnb, many employees rely on data every day to do their jobs. While several different tools are used for analysis, at the core of Airbnb’s self-serve business intelligence (BI) solution is Apache Superset™ (“Superset”). Superset is an open-source data exploration and visualization platform designed to be visual, intuitive, and interactive. It enables users to analyze data using its SQL editor, and easily build charts and dashboards. Tags: Tools

rjs: R in JavaScript — A way to insert R code directly into websites, powered by OpenCPU. Tags: Tools, R

Top Python Libraries of 2020 — The rules are simple: the libraries were launched or popularized in 2020, they are well maintained and have been since their launch date, and they are outright cool, so you should check them out. The picks are heavily influenced by machine learning and data science libraries, although some can indeed be very useful for non-data science people. Tags: Tools, Python

TF Quant Finance: TensorFlow based Quant Finance Library — This library provides high-performance components leveraging the hardware acceleration support and automatic differentiation of TensorFlow. The library will provide TensorFlow support for foundational mathematical methods, mid-level methods, and specific pricing models. Tags: Tools, TensorFlow

101 GitHub Repos - Absolute List Of Useful Repos — A list of 101 great repositories ranging from Javascript to Python, from heatmaps to chatbots. Tags: Tools
 
If you are interested in speaking, hosting, or sponsoring a meetup, have opportunities to list, or local news to share, please email info@dataworksmd.org.






Data Works MD · 101 W Dickman St · Baltimore, MD 21784-9239 · USA
