Having trouble getting some code to auto-capitalize words properly

Hi everyone,

I have the following code snippet which is used to perform some case modification of a sentence. It takes a sentence and then capitalizes the first letter of every word of the sentence.

It has a few exceptions:

  1. It will not capitalize anything defined in $exclude_words
  2. It will not remove the all-caps from the words defined in $capital_words. In fact if a word that is defined in $capital_words is not typed in all caps, it will convert that word to all caps.
  3. It will convert anything that is typed in all-caps and is longer than the character length defined in $max_length to uppercase first letter and lowercase for the rest of the word.

The issue that I am having is this:

If the letter ‘i’ is entered in lowercase such as in the sentence “i really like cake with a tiny bit of icing” it will capitalize the letter ‘i’ which it should do but then it will also capitalize every other instance of the letter ‘i’ anywhere it occurs. The output will be: “I Really LIke Cake WIth a TIny BIt of IcIng”.

The occurrences of ‘i’ inside the word should obviously not be capitalized but I’m not sure how to solve this. I am a beginner and just trying my best to learn as I go along.

Thank you for any help anyone can provide! :smiley:

[php]<?php
$string = "how many words can i type if I have to keep typing words over and over again, this might be interesting!"; // Your input string
$exclude_words = array("the","a","of","and","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$parts = explode(" ",$string);
foreach ($parts as $word) {
if (!in_array(strtolower($word),$exclude_words)) {
$old_word = $word;
$first_letter = substr($word,0,1);
if (strlen($word) >= $max_length && !in_array(strtolower($word),$capital_words)) { $word = strtolower($word); }
if (in_array(strtolower($word),$capital_words)) { $word = strtoupper($word); }
$newWord = strtoupper($first_letter).substr($word,1);
$string = str_replace($old_word,$newWord,$string);
}
}

echo $string; // This should output the string
?>[/php]

It looks like the problem is that every time you change a word, you go back to the original string and modify it. Aside from being very inefficient, this means that once the loop reaches the word “i”, str_replace replaces every occurrence of the letter “i” anywhere in the string, not just the standalone word. Try it with the phrase “at the cat sat a bat” and you will see the same thing with every “at” combination.
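To see the effect in isolation, here is a minimal sketch that is separate from the original script. The variable names $sentence, $replaced and $cats are made up for the illustration.

```php
<?php
// str_replace() swaps every occurrence of the search text,
// including matches inside other words.
$sentence = "i really like cake with a tiny bit of icing";
$replaced = str_replace("i", "I", $sentence);
echo $replaced . "\n"; // I really lIke cake wIth a tIny bIt of IcIng

// The same thing happens with a two-letter word such as "at".
$cats = str_replace("at", "At", "at the cat sat a bat");
echo $cats . "\n"; // At the cAt sAt a bAt
```

Because the replacement works on raw text and has no idea where a word starts or ends, it is safer to build the new sentence word by word.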

There are other ways to do this, but you should be able to make the following adjustment to your code to get it to work:[php]$newWord[] = strtoupper($first_letter).substr($word,1);
}
$string = implode(' ',$newWord);
}[/php]

Instead of going back and manipulating the original string, this simply builds an array of all the new words and then converts the array to a string with spaces in between each word.

You may be familiar with it already, but if not, check out the function ucwords(). I would consider using it instead to convert the words that you want.
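As a quick illustration of what ucwords() does and does not do (a small sketch with made-up sample strings, not taken from the script above):

```php
<?php
// ucwords() uppercases the first letter of each whitespace-separated word
// and leaves every other character exactly as it was.
$plain = ucwords("i really like cake");
echo $plain . "\n"; // I Really Like Cake

$mixed = ucwords("eXCELLENt work");
echo $mixed . "\n"; // EXCELLENt Work

// Lowercase first if the rest of each word should end up in lowercase.
$normalized = ucwords(strtolower("eXCELLENt work"));
echo $normalized . "\n"; // Excellent Work
```

Note that ucwords() on its own knows nothing about the exclusion or all-caps lists, so those checks still have to happen around it.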

Let me know if this doesn’t make sense or work for you.

First of all I want to say thank you so much for helping me.

I went ahead and implemented the change you suggested however now it is completely stripping out any of the words that are in $exclude_words. It will remove them from the output so if I have the word “and” in a sentence, that word will not show up.

Am I doing something wrong in implementing your code change?

Thank you!

Nope, it’s not you, it’s me… sorry!

See if this is doing what you want:[php]$string = "how many words can i type if I have to keep typing words over and over again, this might be interesting!"; // Your input string
$exclude_words = array("the","a","of","and","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$parts = explode(" ",$string);
foreach ($parts as $word) {
if (!in_array(strtolower($word),$exclude_words)) {
$first_letter = substr($word,0,1);
if(strlen($word) >= $max_length && !in_array(strtolower($word),$capital_words)) $newWord[] = strtolower($word);
elseif(in_array(strtolower($word),$capital_words)) $newWord[] = strtoupper($word);
else $newWord[] = strtoupper($first_letter).substr($word,1);
}
else $newWord[] = strtolower($word);
}
$string = implode(' ',$newWord);

echo $string; // This should output the string[/php]

I tried not to change your code too much. The one thing I noticed is that the comment on your $max_length definition says anything over that length is case-lowered, but when you check the length you use greater than or equal to, which effectively means over 4 instead of over 5. This might be what you are looking for, or you may want to remove the “=” from the logic in this if.
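A tiny sketch of the difference, using a five-letter word as the boundary case (the variable names $with_equals and $without_equals are only for this example):

```php
<?php
$max_length = 5;
$word = "words"; // exactly 5 characters long

// With ">=" a 5-letter word already counts as "too long".
$with_equals = strlen($word) >= $max_length;

// With ">" only words of 6 letters or more are caught.
$without_equals = strlen($word) > $max_length;

var_dump($with_equals);    // bool(true)
var_dump($without_equals); // bool(false)
```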

Sorry I wasn’t quite following everything on my first post. I tested this and it is working (with exception of the word length) as far as I understand what you are looking to do. If not, let me know and we will get it working the way you want.

Thank you again!

Now the output that I am getting is this:

How Many words Can I Type If I Have To Keep typing words Over and Over again, This might Be interesting!

when I was expecting this:

How Many Words Can I Type If I Have To Keep Typing Words Over and Over Again, This Might Be Interesting!

Also good point about the extra = in the logic for max_length! Will remove that!

Thank you again!

I read your comment on $max_length: // Maximum word length, anything over is case-lowered
To mean that you wanted anything over that length in all lower case.

I should have gone back and read your post, all the rules were clearly stated there.

I think this will do what you want:[php]$string = "how many words can i type if I have to keep typing words over and over again, this might be interesting!"; // Your input string
$exclude_words = array("the","a","of","and","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$parts = explode(" ",$string);

foreach ($parts as $word)
{
if (!in_array(strtolower($word),$exclude_words))
{
if(in_array(strtolower($word),$capital_words)) $newWord[] = strtoupper($word);
elseif(strlen($word) >=$max_length && !in_array(strtolower($word),$capital_words)) $newWord[] = ucwords($word);
else $newWord[] = ucwords($word);
}
else $newWord[] = strtolower($word);
}

$string = implode(' ',$newWord);

echo $string; // This should output the string
?>[/php]

I used ucwords() in the code so that you can see its implementation, but you can easily change it back if you prefer. I also removed the equal sign so that it is checking for greater than this many characters, per the comment.

There is one thing that may still need to be changed: in rule 3, you state it will convert anything that is typed in all-caps and is longer… Does this mean that if the word is not in all-caps, but greater than the max, it should leave it unchanged? Example: eXCELLENt would output eXCELLENT. If that is the case, we will need to add an extra test to the logic. Won’t be hard though.

This worked! Thank you!

About your comment regarding adding the extra rule, you are definitely correct!

I would like to be able to screen for something like ‘eXCELLENT’ and have it change to ‘Excellent’ if possible.

Thank you again so much for your help. I could not have gotten this to work without you.

My pleasure, glad it’s working!

Here is one last version to try. I believe it handles the exception we discussed:[php]$string = "how many words can i type if I have to keep typing words OVER and over again, this might be interesting! eXCELLENt"; // Your input string
$exclude_words = array("the","a","of","and","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$parts = explode(" ",$string);

foreach ($parts as $word)
{
if (!in_array(strtolower($word),$exclude_words))
{
if(in_array(strtolower($word),$capital_words)) $newWord[] = strtoupper($word);
elseif(strlen($word) >=$max_length && !in_array(strtolower($word),$capital_words)) $newWord[] = ucwords(strtolower($word));
else $newWord[] = ucwords($word);
}
else $newWord[] = strtolower($word);
}

$string = implode(' ',$newWord);

echo $string; // This should output the string[/php]

Wow malasho this works great.

I had two more questions if you had the time!

1- Did you remove the ‘=’ for greater than or equals to because I still see it and I’m not sure if that one is supposed to be there or not.

2- I was previously inserting my code inside another PHP file used for a script I have. My previous version was like this:

[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("the","a","of","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = preg_split("/(\W+)/", $data, -1, PREG_SPLIT_DELIM_CAPTURE);
    
    foreach ($words as $word) {
         if (!in_array(strtolower($word),$exclude_words)) {
              $old_word = $word;
              $first_letter = substr($word,0,1);
              if (strlen($word) >= $max_length && !in_array(strtolower($word),$capital_words)) { 
                  $word = strtolower($word); 
              }
          
              if (in_array(strtolower($word),$capital_words)) {
                  $word = strtoupper($word); 
              }
              
          $newWord = strtoupper($first_letter).substr($word,1);
          $data = str_replace($old_word,$newWord,$data);
         }
    }

    return $retval;
';

}[/php]

And now I am trying to merge your changes back into my required format. What I have below is what I tried but it doesn’t seem to work. Any insights you could provide please? Thank you!

[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("the","a","of","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = preg_split("/(\W+)/", $data, -1, PREG_SPLIT_DELIM_CAPTURE);
    
    foreach ($words as $word) {
	
	if (!in_array(strtolower($word),$exclude_words)){
 if(in_array(strtolower($word),$capital_words)) $newWord[] = strtoupper($word);
 elseif(strlen($word) >=$max_length && !in_array(strtolower($word),$capital_words)) $newWord[] = ucwords(strtolower($word));
 else $newWord[] = ucwords($word);

}
else $newWord[] = strtolower($word);
}

$data = implode(' ',$newWord);

    return $retval;
';

}[/php]

Thank you! :smiley:

Turned out it was some quotes needing to be double instead of single.
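For anyone following along, here is a minimal sketch of why that matters. It assumes the quotes in question were the ones inside the code that is stored as text in the single-quoted VF_CODE string; the variable names are made up for the example.

```php
<?php
// Inside a single-quoted PHP string, a bare ' ends the string,
// so code stored as text has to use " or escape the quote as \'.
$code_with_double = 'return implode(" ", $parts);';
$code_with_escaped = 'return implode(\' \', $parts);';

echo $code_with_double . "\n";  // return implode(" ", $parts);
echo $code_with_escaped . "\n"; // return implode(' ', $parts);
```

Variables such as $parts are not expanded inside single-quoted strings, which is what lets the snippet be stored as plain text in the first place.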

Thank you so much for all of your help! :slight_smile:

Actually it looks like I may have spoken too soon.

This is the code that I now have. It seems to work in almost all cases except if I put a quotation mark in the input sentence. It outputs the quotation mark as &amp;quot; instead of ".

Here is the code:

[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("the","a","of","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = preg_split("/(\W+)/", $data, -1, PREG_SPLIT_DELIM_CAPTURE);
    
    foreach ($words as $word) {
        
        if (!in_array(strtolower($word),$exclude_words)){
        
            if(in_array(strtolower($word),$capital_words)) {
                             $newWord[] = strtoupper($word);
                         } elseif(strlen($word) >=$max_length && !in_array (strtolower ($word), $capital_words)) { 
                             $newWord[] = ucwords(strtolower($word)); 
                         } else {
                             
                             // if longer then 1 character, uppercase
                             if (strlen($word) > 1){
                                 $newWord[] = ucwords($word);
                             // else just bring the single character word in.    
                             } else {
                                 $newWord[] = $word;
                             }
                             
                         }
         } else { 
             $newWord[] = strtolower($word);
         }
     }
                
                $data = implode("",$newWord);

    return $retval;
';

}[/php]

Sorry I wasn’t able to get back to you sooner.

I had a meeting this evening that ran very late. I will look at it tomorrow morning, it should be something minor, but I’m just not seeing it at the moment. I’m seriously sleep deprived at the moment…

No problem, I will make another post in a couple of hours and see if I’ve gotten anywhere with it.

Thank you so much for staying with me on this!

Let’s give this a try:[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("the","a","of","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = preg_split("/(\W+)/", $data, -1, PREG_SPLIT_DELIM_CAPTURE);
    
    foreach ($words as $word) {
        
        if (!in_array(strtolower($word),$exclude_words)){
        
            if(in_array(strtolower($word),$capital_words)) {
                             $newWord[] = strtoupper($word);
                         } elseif(strlen($word) >=$max_length && !in_array (strtolower ($word), $capital_words)) { 
                             $newWord[] = ucwords(strtolower($word)); 
                         } else {
                             
                             // if longer then 1 character, uppercase
                             if (strlen($word) > 1){
                                 $newWord[] = ucwords($word);
                             // else just bring the single character word in.    
                             } else {
                                 $newWord[] = $word;
                             }
                             
                         }
         } else { 
             $newWord[] = strtolower($word);
         }
     }
                
                $data = html_entity_decode(implode("",$newWord));

    return $retval;
';

}[/php]
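A small sketch of what html_entity_decode() does here. It assumes, purely for illustration, that the title reaches the code with quotation marks already encoded as HTML entities; the sample strings are made up.

```php
<?php
// Assumption for illustration: quotes arrive encoded as &quot;.
$encoded = 'say &quot;hello&quot; world';
$decoded = html_entity_decode($encoded);
echo $decoded . "\n"; // say "hello" world

// Entity names are case-sensitive, so an entity whose name was
// capitalized by the loop is not recognized and is left alone.
$mangled = html_entity_decode('say &Quot;hello&Quot; world');
echo $mangled . "\n"; // say &Quot;hello&Quot; world
```

If that assumption holds, the order matters: decoding only helps while the entity name is still intact, so it may need to happen before the words are split and re-cased rather than after.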

Wow you are a soldier!!

I actually just realized the last code I pasted is wrong because on line 25 the solution I’m trying to do to fix the apostrophe error is incorrect. I need to just use your explode technique rather than using preg_split.

Could we go back to using your final solution, this: http://dpaste.com/806648/

and making it work with my forum script here:

[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("the","a","of","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = preg_split("/(\W+)/", $data, -1, PREG_SPLIT_DELIM_CAPTURE);
    
    foreach ($words as $word) {
         if (!in_array(strtolower($word),$exclude_words)) {
              $old_word = $word;
              $first_letter = substr($word,0,1);
              if (strlen($word) >= $max_length && !in_array(strtolower($word),$capital_words)) { 
                  $word = strtolower($word); 
              }
          
              if (in_array(strtolower($word),$capital_words)) {
                  $word = strtoupper($word); 
              }
              
          $newWord = strtoupper($first_letter).substr($word,1);
          $data = str_replace($old_word,$newWord,$data);
         }
    }

    return $retval;
';

}[/php]

This is what I am trying right now:

The problem is that the words don’t output with any spaces in them.

Input: ok let’s see if this works if i try it again lol!
Output: OkLet’sSeeIfThisWorksIfITryItAgainLol!

[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("the","a","of","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = explode(" ",$data);
    
    foreach ($words as $word) {

if (!in_array(strtolower($word),$exclude_words))
{
if(in_array(strtolower($word),$capital_words)) $newWord[] = strtoupper($word);
elseif(strlen($word) >=$max_length && !in_array(strtolower($word),$capital_words)) $newWord[] = ucwords(strtolower($word));
else $newWord[] = ucwords($word);
}
else $newWord[] = strtolower($word);
}

                $data = implode("",$newWord);

    return $retval;
';

}[/php]
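The missing spaces come down to the glue passed to implode(). A short sketch with a made-up sample string shows why an empty glue was fine with preg_split but not with explode:

```php
<?php
$data = "ok let's see";

// explode() throws the spaces away, so the glue has to put them back.
$words = explode(" ", $data);
$no_glue = implode("", $words);
$space_glue = implode(" ", $words);
echo $no_glue . "\n";    // oklet'ssee
echo $space_glue . "\n"; // ok let's see

// preg_split() with PREG_SPLIT_DELIM_CAPTURE keeps the separators as
// array elements, so an empty glue rebuilds the original text.
$pieces = preg_split("/(\W+)/", $data, -1, PREG_SPLIT_DELIM_CAPTURE);
$rejoined = implode("", $pieces);
echo $rejoined . "\n";   // ok let's see
```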

Getting one step closer.

Spaces issue from prior post was resolved. The only issue I am seeing now is that punctuation marks next to a word that needs to be capitalized and is in $capital_words keeps that word from becoming capitalized.

If I have ‘lol’ in $capital_words and type lol! then it doesn’t see it as ‘lol’ and doesn’t capitalize the entire word.

[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("the","a","of","is"); // Exclude analyzing these words
$capital_words = array("brb","lol"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = explode(" ",$data);
    
    foreach ($words as $word) {

if (!in_array(strtolower($word),$exclude_words))
{
if(in_array(strtolower($word),$capital_words)) $newWord[] = strtoupper($word);
elseif(strlen($word) >=$max_length && !in_array(strtolower($word),$capital_words)) $newWord[] = ucwords(strtolower($word));
else $newWord[] = ucwords($word);
}
else $newWord[] = strtolower($word);
}

                $data = implode(" ",$newWord);

    return $retval;
';

}[/php]

This shouldn’t be that hard, but I keep dozing off… Sorry!

The punctuation marks create an interesting wrinkle. I am thinking we need to:

  1. Explode on spaces. This creates a “master” array.
  2. Run each “word” through preg_split("/([^a-zA-Z])/", $word, -1, PREG_SPLIT_DELIM_CAPTURE); This will create a sub-array where every non-alpha character is its own element. (The capturing group and PREG_SPLIT_DELIM_CAPTURE are needed for that; a plain preg_split("/[^a-zA-Z]/i", …) would drop the non-alpha characters.)
  3. Take each element from the sub-array and if it is non-alpha, concatenate it to a $temp string. If it is alpha, concatenate it after running it through the upper-lower routine (no spaces!)
  4. Once we are through the last sub-array element, put the processed “word” ($temp) into our $newWord array.

I am so sorry to not be able to complete this tonight. After I get a little sleep, we should be able to finish this (although you may have it done before I get the chance…)

There may be other ways to do this, but the punctuation raises several wrinkles and I believe the process detailed above will handle it. If you don’t mind a space between all punctuation marks and anything else, you could simply replace the $words = explode… with the preg_split I detailed. It just depends on how you want to treat the punctuation.
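The process described above could be sketched roughly as follows. This is only one possible reading of it, not tested inside the forum script: the function names fix_case() and title_case() are hypothetical, the pattern uses a capturing group with PREG_SPLIT_DELIM_CAPTURE so the punctuation is kept, apostrophes are treated as part of a word (my assumption, so that "let's" does not become "Let'S"), and the length test uses ">" to match the comment on $max_length.

```php
<?php
// Hypothetical helper: applies the thread's case rules to one word piece.
function fix_case($piece, $exclude_words, $capital_words, $max_length) {
    $lower = strtolower($piece);
    if (in_array($lower, $exclude_words)) {
        return $lower;              // rule 1: excluded words stay lowercase
    }
    if (in_array($lower, $capital_words)) {
        return strtoupper($piece);  // rule 2: force all caps
    }
    if (strlen($piece) > $max_length) {
        return ucfirst($lower);     // rule 3: long words become "Excellent"
    }
    return ucfirst($piece);         // otherwise only touch the first letter
}

// Hypothetical wrapper: explode on spaces, then split each token into
// alphabetic pieces and punctuation, keeping the punctuation in place.
function title_case($string, $exclude_words, $capital_words, $max_length) {
    $newWord = array();
    foreach (explode(" ", $string) as $token) {
        $pieces = preg_split("/([^a-zA-Z']+)/", $token, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        $temp = "";
        foreach ($pieces as $piece) {
            // Pieces that start with a letter are words; the rest is punctuation.
            $temp .= preg_match("/^[a-zA-Z]/", $piece)
                ? fix_case($piece, $exclude_words, $capital_words, $max_length)
                : $piece;
        }
        $newWord[] = $temp;
    }
    return implode(" ", $newWord);
}

$result = title_case("ok let's see if this works if i try it again lol!",
    array("the", "a", "of", "is"), array("brb", "lol"), 5);
echo $result . "\n"; // Ok Let's See If This Works If I Try It Again LOL!
```

Since every alphabetic piece is handled separately, the punctuation next to "lol" no longer hides it from the $capital_words check.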

Sorry again!

jay

You have nothing to apologize for. In fact, I should be the one apologizing for my lack of being able to do this on my own and constantly requiring your assistance, which is very much appreciated by the way!

I will work on it and keep this thread updated with whatever I figure out. Would love to get your feedback in the morning.

Thank you sir! :slight_smile:

So this is what I have now and it seems to be working except for one specific scenario:

If a word has a hyphen in it or if any two words are joined together with a punctuation mark and with no space in between them, the second word is not capitalized.

input:
hyphen-word

current output:
Hyphen-word

desired output:
Hyphen-Word

Not sure if that’s possible with my current code but figured I would ask the expert :slight_smile:

[php]if (is_subclass_of($this, 'vB_DataManager_ThreadPost') && is_array($this->validfields['title']))
{
global $exclude_words, $capital_words, $max_length ;
$exclude_words = array("a","an","and","at","but","by","for","in","nor","of","on","or","so","the","to","up","yet"); // Exclude analyzing these words
$capital_words = array("brb","lol","usa"); // Capital exclusives (leave in lowercase in array)
$max_length = 5; // Maximum word length, anything over is case-lowered

$this->validfields['title'][VF_CODE] = '
    global $exclude_words, $capital_words, $max_length ;
    $retval = $dm->verify_title($data);

    $words = explode(" ",$data);
    
    foreach ($words as $word) {

if (!in_array(strtolower($word),$exclude_words))
{
if(in_array(preg_replace("/[^a-zA-Z]/","",strtolower($word)),$capital_words))
$newWord[] = strtoupper($word);
elseif(strlen($word) >=$max_length && !in_array(strtolower($word),$capital_words)) $newWord[] = ucwords(strtolower($word));
else $newWord[] = ucwords($word);
}
else $newWord[] = strtolower($word);
}

                $data = implode(" ",$newWord);

    return $retval;
';

}[/php]
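One possible direction for the hyphen case, offered as a sketch rather than a tested fix for the forum script: ucwords() accepts an optional second argument listing the characters that separate words (the PHP manual lists it as added in 5.4.32 and 5.5.16, so it depends on the PHP version the forum runs on). The sample strings below are made up.

```php
<?php
// By default only whitespace starts a new word.
$default = ucwords("hyphen-word");
echo $default . "\n"; // Hyphen-word

// Passing " -" makes both the space and the hyphen count as separators.
$with_hyphen = ucwords("hyphen-word", " -");
echo $with_hyphen . "\n"; // Hyphen-Word

// It combines with strtolower() the same way as before.
$normalized = ucwords(strtolower("HYPHEN-WORD again"), " -");
echo $normalized . "\n"; // Hyphen-Word Again
```

Other joining characters could be added to the delimiter string in the same way, though words in $exclude_words or $capital_words that sit after a hyphen would still need their own check.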
