Capitalizing First Letter of Every Sentence in a String Using PHP


While working on one of my recent tasks at HolidayIQ, I came across a requirement where I had to capitalize the first letter of every sentence at the click of a button and also reduce multiple spaces to a single space. The string does not contain any HTML tags and lives inside a textarea.

Why JavaScript alone would not work

I started with JavaScript, where collapsing multiple spaces is easy with a regular expression:

 string = string.replace(/\s{2,}/g, ' ');

Capitalizing sentences then becomes a matter of splitting the string on “. ” or “? ” and uppercasing the first letter of each piece. However, \s also matches “\n” (a new line), so the code above collapses line breaks along with the extra spaces, and it would not work if you have paragraphs.

New lines can be handled in PHP, but to prevent the page from reloading when the button is clicked, I’d use AJAX.

PHP and JavaScript

Removing multiple spaces in PHP is as easy as it was in JavaScript.

$singleSpace = preg_replace('!\s+!', ' ', $string);
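As a quick check of what this pattern does, here is a minimal sketch with a made-up input string. It also shows a hypothetical variant (not part of the original code) that collapses only spaces and tabs, which is one way to keep line breaks intact.

```php
<?php
// Sample input with extra spaces and a blank line between two paragraphs.
$string = "hello   world.\n\nsecond    paragraph.";

// \s matches spaces, tabs and newlines, so every run becomes one space.
$singleSpace = preg_replace('!\s+!', ' ', $string);
echo $singleSpace, "\n"; // hello world. second paragraph.

// Assumed variant: collapse only spaces and tabs so "\n" survives.
$keepNewlines = preg_replace('![ \t]+!', ' ', $string);
```

Note that the first call merges the two paragraphs into one line, which is why new lines need separate handling later on.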

We can now play around with $singleSpace, since it has only single spaces. Splitting sentences based on punctuation characters can be done using preg_split.

$sentences = preg_split('/([.?!]+)/', $singleSpace, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);

The first argument of preg_split is the pattern to search for, here the punctuation characters that separate sentences. The [] is a character class: the regex engine matches any one of the three characters inside it. The + sign matches one or more occurrences, so a run such as “?!” or “...” is treated as a single delimiter. The surrounding parentheses form a capture group, which is what PREG_SPLIT_DELIM_CAPTURE returns.

The second argument is the subject for preg_split to act upon. The third argument is the limit on the number of pieces returned; 0, -1 or NULL all mean no limit. The fourth argument takes flags, combined with |. PREG_SPLIT_NO_EMPTY means the function returns only non-empty strings, and PREG_SPLIT_DELIM_CAPTURE means it also returns the delimiters matched by the parenthesized part of the pattern.

The result of preg_split is an array in which the sentences and the punctuation characters we searched for alternate. Example:

$str = "hi I am Lalit. this is an exclamation! this is a question?";
$sentences = preg_split('/([.?!]+)/', $str, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r($sentences);

The above code will output:

Array
(
    [0] => hi I am Lalit
    [1] => .
    [2] =>  this is an exclamation
    [3] => !
    [4] =>  this is a question
    [5] => ?
)
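To see what each flag contributes, the following sketch runs the same pattern with different flag combinations. The sample string is made up for illustration.

```php
<?php
$str = "one. two!";
$pattern = '/([.?!]+)/';

// No flags: delimiters are dropped and the trailing empty string is kept.
$plain = preg_split($pattern, $str);
// ["one", " two", ""]

// PREG_SPLIT_NO_EMPTY only: the empty string is gone, delimiters still dropped.
$noEmpty = preg_split($pattern, $str, -1, PREG_SPLIT_NO_EMPTY);
// ["one", " two"]

// Both flags: sentences and delimiters alternate, as in the output above.
$both = preg_split($pattern, $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
// ["one", ".", " two", "!"]
```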

All that is left is to join them back together, with the first letter of every sentence in uppercase. The sentences sit at the even keys of the array and the delimiters at the odd keys, so a bitwise AND of the key with 1 tells the two apart: ($key & 1) is 0 for a sentence and 1 for a delimiter.

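$new_string = ''; // start empty so .= has something to append to
foreach ($sentences as $key => $sentence) {
    // even keys hold sentences, odd keys hold the punctuation
    $new_string .= ($key & 1) == 0 ?
        ucfirst(strtolower(trim($sentence))) :
        $sentence.' ';
}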

Wrapping up the entire code into a function:

function formatSentence($str){
    // split into sentences (even keys) and punctuation (odd keys)
    $sentences = preg_split('/([.?!]+)/', $str, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        // lowercase and capitalize sentences; append a space after punctuation
        $new_string .= ($key & 1) == 0 ?
            ucfirst(strtolower(trim($sentence))) :
            $sentence.' ';
    }
    return trim($new_string);
}
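One thing to keep in mind is that the even/odd key check assumes the text starts with a sentence. With PREG_SPLIT_NO_EMPTY, an input that begins with punctuation puts a delimiter at key 0 and shifts everything by one. The sketch below is a possible variant, not the code I used: the name formatSentenceByContent is hypothetical, and it decides by looking at each piece instead of its key.

```php
<?php
// Hypothetical variant of formatSentence(): classifies pieces by content, not key parity.
function formatSentenceByContent(string $str): string
{
    $parts = preg_split('/([.?!]+)/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $result = '';
    foreach ($parts as $part) {
        if (preg_match('/^[.?!]+$/', $part) === 1) {
            // Delimiter: append it followed by a separating space.
            $result .= $part . ' ';
        } else {
            // Sentence text: trim, lowercase, then capitalize the first letter.
            $result .= ucfirst(strtolower(trim($part)));
        }
    }
    return trim($result);
}

echo formatSentenceByContent("hi I am Lalit. this is an exclamation! this is a question?"), "\n";
// Hi i am lalit. This is an exclamation! This is a question?

echo formatSentenceByContent("...wait. ok"), "\n";
// ... Wait. Ok
```

The first output also shows a side effect shared with the original function: strtolower() lowercases everything after the first letter, so “I” and “Lalit” lose their capitals.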

The reason I use PHP here is to preserve new lines. To do this, I explode the string with the delimiter “\n” and call formatSentence() for every element of the resulting array. The multiple-space cleanup has to happen per element too, since \s+ would otherwise swallow the “\n” characters before the split. Once all lines have been processed, I implode them again using “\n”.
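The explode and implode step could look roughly like the sketch below. The helper name formatParagraphs is hypothetical, and a copy of formatSentence() is included only so the snippet runs on its own.

```php
<?php
// Copy of formatSentence() so this sketch is self-contained.
function formatSentence(string $str): string
{
    $sentences = preg_split('/([.?!]+)/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        $new_string .= ($key & 1) == 0
            ? ucfirst(strtolower(trim($sentence)))
            : $sentence . ' ';
    }
    return trim($new_string);
}

// Hypothetical helper: cleans and formats each line separately so "\n" survives.
function formatParagraphs(string $text): string
{
    $formatted = array_map(
        function (string $line): string {
            // Collapse whitespace inside this line only, then capitalize.
            $singleSpace = preg_replace('!\s+!', ' ', $line);
            return formatSentence($singleSpace);
        },
        explode("\n", $text)
    );
    return implode("\n", $formatted);
}

echo formatParagraphs("first line. still first.\nsecond   line? yes!"), "\n";
// First line. Still first.
// Second line? Yes!
```

If the textarea sends “\r\n” line endings, the stray “\r” at the end of each line is removed by the whitespace handling, and the lines come back joined with “\n”.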

After writing this function in a PHP file, I simply made an AJAX call to it and used JavaScript to put its output back into the textarea.

If you think it could be done in a better or more efficient way, do let me know.
