Security Best Practices When Using AI to Write Code

At Mindgrub, the engineering team, like many, has found itself wondering just how good AI is at writing code. What do these things really know, and can they make us better, stronger, and faster?

I’m here to tell you that it is pretty good, and in some cases damn good. If you need a quick code snippet or want to convert existing code into a different language, tools like ChatGPT do a fantastic job. We’re also actively using and exploring tools like GitHub’s Copilot, a development assistant that makes IntelliSense look antiquated. Our engineers are also investigating a metric ton of generative AI code tools like CodeWP for WordPress, AWS’s CodeWhisperer for, and X.

Generative code is impressive and quick, but is this code safe to run? After all, these AI tools open a brand new and wholly unexplored set of security concerns and unknown vectors for attacks by hackers. Most of these may not directly impact code security, but they point to the level of awareness we as technologists need to have as we explore the use of these platforms in our work lives.

Developers have already built AI tools that can brute-force passwords or perform credential stuffing with impressive accuracy and speed. Much of this relies on open-source tools like PassGAN, which are getting better every day. Some researchers have gotten clever and jailbroken or sidestepped AI safeguards to trick systems into writing code that uses known exploits or serves nefarious purposes, such as a denial-of-service (DoS) attack. Others are creating advanced phishing systems that generate highly personalized messages, making truth and fiction increasingly hard to tell apart.

Can we trust our most junior engineers, or non-engineers, to use these tools to create code for production? Do we have a choice? We all know the reality: this is already happening and is only going to increase. What we really need is to find ways to keep our AI-assisted code safe and secure.

So to help, I will first explore a handful of these tools and give some hints along the way. I will also offer suggestions on what to look for and ways to keep generated code secure.

GitHub Copilot

Many in my age range have joked about Copilot being the replacement for Clippy, and both are indeed assistants. Unlike Clippy, however, Copilot is powered by OpenAI’s GPT models and trained on bajillions of lines of code hosted on Microsoft’s GitHub. These days GitHub is the 800-pound gorilla of public and private code repositories. It is also home to an incredible number of open-source projects. If code repositories were schools, GitHub would be Xavier’s school for the gifted or Hogwarts, without the riff-raff.

Copilot integrates into many popular IDEs, such as IntelliJ and Visual Studio, to extend IntelliSense-style auto-suggestions. For most, it will feel like you’re getting a quick suggestion based on context, but these aren’t old-school suggestions. Often, Copilot will suggest entire functions rather than finishing a line or two of code.

In this, Copilot and tools like ChatGPT can be very different. Copilot feels more like an assistant or pair programmer offering thoughts along the way. You pick and choose, but the architecture and direction of development are still very much yours.

As a code generator, the results are mixed. In 2021 a DevSec engineer reviewing early results provided multiple examples showing that Copilot was prone to suggesting code with security issues. My own experience is similar: I’ve witnessed code snippets with SQL injection vulnerabilities and other minor problems. The more significant concern, IMHO, was not the quality of the code but the speed at which I accepted that the code would do what I anticipated.
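To make the SQL injection concern concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table, the function names, and the payload are all hypothetical and chosen only for illustration; they are not taken from any Copilot output. The first function builds the query by string formatting, which is the pattern to watch for in suggested code, and the second binds the value as a parameter.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def find_user_unsafe(conn, username):
    # Vulnerable: user input is pasted straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Safer: the driver binds the value, so it is treated as data, not SQL.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()
```

With a payload such as nobody' OR '1'='1, the unsafe version returns every row in the table, while the parameterized version returns nothing. That difference is easy to miss when a suggestion is accepted at speed.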

OpenAI ChatGPT

OpenAI’s ChatGPT is a general-purpose AI, the kind of tool some people loosely describe as AGI, or artificial general intelligence, although it is more accurately a large language model trained on a very broad range of text. GitHub’s Copilot, by contrast, has been trained primarily on concrete knowledge bases around programming and code, giving it intelligence that is limited to a very particular realm. In short, Copilot is like a toddler who can tell you everything about Pokemon and nothing about the general makings of our world.

That generality makes writing code more of a hobby for ChatGPT, but it also lets it be a bit more creative in how it answers questions. Like us, it can draw on its general knowledge to reach sometimes surprising conclusions.

ChatGPT, as a development tool, is an excellent starter. It excels at transforming example code into your preferred programming language. It can take code snippets and re-write them with additional features or adjustments. It also does a fantastic job of creating starter applications.

To test ChatGPT, I asked it to generate the full login, logout, and sign-up logic for an application in JavaScript. I purposely omitted details to see if the AI would assume the need for a unique identifier like a username or email address. I also avoided mentioning the need to sanitize arguments and SQL values or to encrypt sensitive values like passwords.
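For reference, this is roughly what a reviewer would hope to see for the password part of that prompt. In practice, protecting passwords usually means storing a salted, deliberately slow hash rather than reversible encryption. The sketch below uses only Python’s standard library; the function names, the storage format, and the iteration count are assumptions made for illustration, not output from ChatGPT.

```python
import hashlib
import hmac
import os

# Assumed work factor; tune it for your hardware and current guidance.
ITERATIONS = 600_000


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    # A fresh random salt per password defeats precomputed lookup tables.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Store everything needed to verify later in one string.
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

If generated sign-up code writes the raw password into the database, or hashes it with a single fast hash and no salt, that is a signal to stop and rework it before it goes anywhere near production.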

Weirdly, the responses differed: each attempt at the same prompt could generate a wildly different answer. My first attempt produced an application that hard-coded the database password and showed a noticeable lack of validation. That response also failed to complete, as if the AI had hit a point where it knew the previous output was invalid. When I let it regenerate without changing my prompt, I got a much better response that essentially fixed the issues without me editing or asking.
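The two problems in that first response, a hard-coded database password and missing validation, have simple counterparts worth checking for in review. The following is a minimal sketch under stated assumptions: the environment variable name APP_DB_PASSWORD, the email pattern, and the 12-character minimum are hypothetical choices, not anything ChatGPT produced.

```python
import os
import re

# Deliberately loose pattern: something@something.tld with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def load_db_password(env=None) -> str:
    # Read the secret from the environment instead of the source code.
    env = os.environ if env is None else env
    password = env.get("APP_DB_PASSWORD")
    if not password:
        raise RuntimeError("APP_DB_PASSWORD is not set")
    return password


def validate_signup(email: str, password: str) -> list[str]:
    # Return a list of problems; an empty list means the input passed.
    errors = []
    if not EMAIL_PATTERN.match(email):
        errors.append("email is not valid")
    if len(password) < 12:
        errors.append("password must be at least 12 characters")
    return errors
```

Failing loudly when the variable is missing is a deliberate choice here: a silent fallback to a default password would reintroduce the original problem.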

Like GitHub’s Copilot, the results came back mixed, but most of my code from ChatGPT required a bit more knowledge and cleanup to run. For example, ChatGPT suggested a Makefile and a SQL script for my user database table but did not actually complete the task for me. It was much more of an accelerator, requiring me to reimplement a lot of what it provided.

How do we keep it secure?

AI generative code tools are super accelerators. They are also trained on our own lousy code and inherit the human mistakes we are all prone to. For junior engineers and non-engineers alike, these tools provide incredible power, but they do not guarantee that the end result is better, more secure, or higher quality.

So how do we keep them secure? We do what we should have been doing all along (or what we are already doing).

First, let’s keep following the best practices of software development. If you have a team, make sure you enforce peer reviews and merge reviews. As the saying goes, measure twice and cut once: more eyes, especially those of a lead or senior engineer, will only make your code better.

Second, good unit tests and code coverage are some of the best checks a developer can put in place. Unit tests require the engineer to understand the expected results of the code they write and to verify that the code behaves as anticipated. By requiring higher code coverage, we can let our engineers use more generative code while these tests safeguard the upper and lower limits of its operations.
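As a small illustration of testing the upper and lower limits, here is a sketch using Python’s built-in unittest module. The clamp_page_size helper and its limit of 100 are hypothetical, standing in for any small function an assistant might generate.

```python
import unittest

MAX_PAGE_SIZE = 100  # assumed limit for this example


def clamp_page_size(requested: int) -> int:
    """Hypothetical helper of the kind an assistant might generate."""
    if requested < 1:
        return 1
    return min(requested, MAX_PAGE_SIZE)


class ClampPageSizeTests(unittest.TestCase):
    def test_lower_limit(self):
        # Zero and negative requests fall back to the minimum.
        self.assertEqual(clamp_page_size(0), 1)
        self.assertEqual(clamp_page_size(-5), 1)

    def test_upper_limit(self):
        # The maximum is allowed; anything above it is capped.
        self.assertEqual(clamp_page_size(100), 100)
        self.assertEqual(clamp_page_size(101), 100)

    def test_typical_value(self):
        self.assertEqual(clamp_page_size(25), 25)
```

Saved to a file, these tests can be run with python -m unittest. The value is less in the assertions themselves than in forcing whoever accepted the generated function to state what it should do at the edges.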

License and dependency issues can accidentally creep into code when using ChatGPT. It’s not uncommon for it to recommend libraries and incorporate them into a larger code base. For production code, this can unexpectedly force code to accept a GPL license, open-sourcing a chunk of it, or introduce vulnerabilities through an older library. These days we can add analyzers to our CI/CD pipeline that check and warn for these scenarios and reduce unexpected risks.
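Dedicated analyzers do this far more thoroughly, but the core idea of a license check can be sketched in a few lines. The example below reads license metadata for installed Python packages through the standard importlib.metadata module and flags anything whose license string mentions GPL. The marker list and function names are assumptions for illustration, and real license metadata is often missing or inconsistent, so treat this as a rough first pass only.

```python
from importlib import metadata

# Assumed policy: flag any license string containing "GPL"
# (this also matches LGPL and AGPL).
COPYLEFT_MARKERS = ("GPL",)


def installed_licenses() -> dict[str, str]:
    # Map each installed distribution name to its declared license string.
    licenses = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") or "unknown"
        declared = meta.get("License-Expression") or meta.get("License")
        licenses[name] = declared or "UNKNOWN"
    return licenses


def flag_copyleft(licenses: dict[str, str]) -> list[str]:
    # Return the sorted names of packages whose license matches a marker.
    return sorted(
        name
        for name, declared in licenses.items()
        if any(marker in declared.upper() for marker in COPYLEFT_MARKERS)
    )
```

In a pipeline, a step could call flag_copyleft(installed_licenses()) and fail the build, or simply warn, when the list is not empty.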

Other tools in the CI/CD pipeline can also safeguard against bad code quality:

•  Linters and code syntax checks help maintain code conformity and check for common mistakes in a language. These same tools can scan for passwords checked in to a code repository and reject code not in the company’s agreed-upon format.

•  Many companies offer code security analyzers that look for the common mistakes and prevent developers from 

•  Static code testers scan the executable binary generated from an application for 
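As a rough illustration of the password scan mentioned in the first bullet, here is a toy check that looks for quoted literals assigned to secret-sounding names. The pattern and the function name are assumptions made for this sketch; purpose-built secret scanners cover far more cases and should be preferred in a real pipeline.

```python
import re

# Matches names such as password, db_password or api_key assigned
# (with = or :) to a quoted literal of at least four characters.
SECRET_PATTERN = re.compile(
    r"""(?i)(password|passwd|secret|api_key|token)\w*\s*[:=]\s*["'][^"']{4,}["']"""
)


def find_secrets(source: str) -> list[tuple[int, str]]:
    # Return (line number, stripped line) for every suspicious line.
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if SECRET_PATTERN.search(line):
            findings.append((number, line.strip()))
    return findings
```

A line like db_password = "hunter2!" is flagged, while password = os.environ["DB_PASSWORD"] is not, because the value there is looked up at runtime rather than written into the file.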

If you still find yourself adamant that your company or project is not ready for generative AI, you can also look into several tools that help detect AI-generated code. As a warning, this can be a bit of an arms race. As new AI tools improve, the detectors will take time to adapt and identify the latest version of GPT or Copilot.

For many of our dev shops, AI will introduce a new wild card in how we build things, but that wild card can be a great accelerator that increases productivity and helps make junior engineers bigger contributors to production projects. Embracing the unknown can be scary, but with the proper safeguards in place, we can create a secure environment where our teams can thrive.