Legacy App Refactoring

We’d been working on this project for a few years, but never really had the time to work on massively refactoring it. There were always new features, small fixes, and improvements in the way.

To answer the question Rewrite vs Refactoring, we chose Refactoring, based on this very informative article https://siftware.com/rewrite-refactor-php-application/.

The Problem

Our project had what you expect to see in a legacy/old project:

  • complex and tangled control structure – Spaghetti Code
  • Include-oriented/page-based (as opposed to object-oriented and front-controlled)
  • No separation of concerns
  • Everything sits in your webroot
  • Multiple rewrite/refactor attempts are obvious
  • No automated tests, only human QA

(Above list taken from here)

Define a Roadmap

  • Start with tests – You can’t be sure you haven’t broken anything along the way unless you have some automated tests to prove that.
  • Fix the small things first, while adding new features
  • Switch to config based settings, get rid of globals
  • Separate view/frontend logic from backend code
  • Refactor functionality into classes
    • Use Dependency Injection, Factory Patterns, Separation of Concerns, Single Responsibility Principle, and other programming patterns/best practices.
  • Routing issue, autoloading, and the Slim Framework integration

Tools and Resources

Fix the small things first, while adding new features

This process might last for months. In order to keep the features running, you have to refactor in small chunks. We had monthly cleanup tasks just for the small things.

It’s easy to find your backend errors/bugs, especially when you have a healthy user base. Using Apache server logs, you can see the majority of issues your project has, such as:

  1. Missing checks with isset() or !empty()
  2. Access of undefined variables, class properties, null objects, illegal string offsets, indexes in arrays
  3. Wrong argument types (number_format(), strpos(), in_array(), foreach() etc)
  4. File read/write errors with unhandled exceptions and missing checks for results, such as:
    1. accessing invalid directories or files and not using file_exists()
    2. URL responses other than 200 OK and connection timeouts
    3. empty database results or results with null values (file_get_contents(), curl, DB queries)

Another common issue with PHP legacy apps is the file include order. Since there’s no front controller, every single page will have the same key files included (like bootstrap.php, app.php, urls.php, functions.php, headerFunctionality.php, etc), which leaves room for incorrect include order or even forgetting some files at all.

Inspecting and ensuring the correct inclusion of files in the first steps makes it easier to debug later. It might be bad, but at least it’s consistently bad.

Switch to config based settings, get rid of globals

Global variables are very common in legacy apps. They’re usually declared in a key file that is included everywhere, and it’s pretty long, with no indication of scope in variable names.

To fix that, you group them by the context in which they’re used. A quick and modest refactoring is to introduce a new global object called App ($app) that will serve as a pseudo container for your app, with a structure like this:

class App

    protected $app = [
        ‘params’ => [...],
        ‘urls’ => [...],
        ‘modules’ => [...]
    ]

    // ...
}

You instantiate this class only once in a bootstrap file, or whatever is included first in all your pages. Then you replace most of your global variables with calls like this:

<?php if ($isContentPage) {…} ?> => <?php if ($app->params[‘isContentPage’]) {…} ?>

It is a slow process, but you’ll benefit from it when you introduce new classes. You will only need to inject the App object, and not all the global variables your classes need.

Separate views/frontend logic from backend code

Some legacy pages are written like this:

  1. Some key file includes
  2. Global variable initializations
  3. Include the header navbar html
  4. Include some content
  5. Make a database query
  6. Parse the query results
  7. Echo the results mixed with html
  8. Include some footer files

The issues coming with this kind of setup are as follows:

  • The same Html or CSS code should be included in every page you have
  • The database queries, results parsing and modification happen alongside with the view rendering. And that code is copy-pasted into three other pages.
  • You’ll have to include those footer scripts every time, in the same order, in the same place

To fix this

  • First of all take all the database results fetching and parsing and place it above all Html code. Even better, try and make a class out of it. That way you can use the same code into the other pages that need that functionality without repeating yourself.
  • Secondly, separate all other data preparation/modification or business logic from your html/js code. Create a new directory, like views, and move all the frontend files there. Now you have a single place where you’ll look for presentation issues.
  • Point out which view files are included in every page, like the navbar, sidebar, footer, footer scripts, header scripts. If the same Html, CSS, or js is being copy-pasted into every page, they are good candidates for being extracted into their own partial files.
  • Finally, since most of your pages have the same structure, you should enforce that structure by creating and using layouts, for example, a default layout, or clean, results, admin, product-page layout. In your pages than you can simply include a layout file, and pass all the content your page needs as “view variables”.

At this point of refactoring, a templating engine, like Twig, or Blade, might come in handy. Integrating them might take a lot of time since you’ll have to change all your frontend code to fit with the new template.

Refactor functionality into classes

It is very common having file includes in your project with bits of functionality. You might find tracking scripts, analytics, database queries, data manipulation, output buffers, etc everywhere. The functionality included this way, is a source of bugs. The most common issue is the new variables introduced and their global scope. Your $content variable might be a string in its initialization, and somehow it has become a database object when you last referenced it.

In a poorly written codebase, the bits of functionality are wrapped in files spanning close to 1000 lines. Which are pretty hard to maintain, debug, or develop upon. To fix this, we have to split the code into separate classes.

Firstly, introduce a few interfaces, so that we enforce what classes are intended to do, initially as a code convention, and later used with dependency injection.

Next, convert all the pieces of distinct functionality into their own classes, implementing interfaces.

Finally, to avoid object creation in a global scope, we’re going to need an object factory. To kick it up a notch you can apply a common design pattern called the Factory Method Pattern – Find a great explanation here.

Routing issues, autoloading, and the Slim Framework integration

A major issue with legacy apps is routing. Using an Apache server, all the routing happens in the .htaccess file located in the root of the project. It gets hundreds of lines long for medium-sized apps, and it’s pretty complex and unmaintainable. Modern frameworks like Laravel, Codeigniter, Symfony, Zend, etc. allow you to do all your routing in plain PHP code and head over to .htaccess only for the advanced use-cases. For our project, we chose the Slim Framework, because it is lightweight, with little trace, has a powerful HTTP Router, supports dependency injection, and other features. It was easy integrating it into our app.

One of the cool solutions we got up with was the ability to use the Slim Router and Dependency Injection alongside the include/page like structure the project had, like this:

 

$app->fet('/about/', function (Request $request, Response $response) {
                include_once __DIR__ . '/../about.php'; // the same legacy code
});

With Slim came autoloading, which is a must for porting legacy projects into the future. This approach allowed us to slowly refactor the app into a more modern version.