I’ve been playing with GPT-4 over the last few weeks and there are a few prompt engineering best practices that I developed through my exploration that I want to share here.
One of the best ways to enforce code quality in Python projects is to run mypy checks at commit stage and automatically in CI/CD pipeline. Let me explain why I recommend mypy.
One of the best low effort, high return tweaks that you can do for your analytics project is to set up a standard folder structure across all your projects and teams.
When you work in the team, everyone has slightly different way to format their code. Setting up code standards from the beginning and enforcing them through automatic code formatting provides clarity of code, reduced source control diffs and makes code reviews more efficient.
When working on analytical projects, it is important to use a requirements.txt file to specify the dependencies required for the project. This file lists all the external libraries and packages that the project relies on, along with their version numbers, ensuring that everyone working on the project is using the same set of dependencies. This helps avoid issues with version conflicts and makes the project more reproducible.
In the previous newsletter (as well as in my website article) , I covered trunk-based development. One of the ways to keep branches small is to commit and and merge to the main branch often, by keeping part of the codebase in main inactive. Feature flags is a way to do it systematically.
In the last edition of this newsletter, I wrote about the importance of source control in analytics development. I’ve seen, however, many analytics teams misusing source control systems (eg pushing directly to main without pull request). Today, I’d like to introduce trunk-based development methodology (TBM).
One of the biggest long-term productivity boosts for data science teams is the use of source control. For those with experience in software engineering, it is a no-brainer, but I still see data science teams using shared folders or email to collaborate on code.
As I mentioned in one of my previous articles, I started using ChatGPT in my coding projects. After a few days, I got used to asking and refining my queries so that the returned function was “almost” correct. After this exercise, I am more sure than ever that this system will speed up development, but it can’t replace the developer. The functions returned are almost never copy-paste, but most of the time, they are 80% correct.
I’m often asked about a simple way to maintain focus. This is especially hard in the new world with full of distractions. One of the practical ways to maintain focus is “The Pomodoro Technique”.
Pomodoro means tomato and the technique is named after the tomato-shaped kitchen timer.