New PHP script: any security issues?

Darth Wong
Sith Lord
Posts: 70028
Joined: 2002-07-03 12:25am
Location: Toronto, Canada

Post by Darth Wong »

I just created a PHP script, mostly for my own use (although anybody can use it if they like), which converts HTML-formatted pages to BBcode. I made it because I hate the way copying and pasting from the web browser often butchers the shit out of a webpage, especially with all kinds of unnecessary line-breaks. I used to have a tutorial for doing this in the Announcements area, but that involved creating your own *nix script so it was a pretty big PITA for most users, and a web-based conversion utility is much simpler.

The question is: is there anything I should be concerned about, security-wise? It's a pretty simple script so I can't see how something could go wrong, but I'm no security expert:

http://bbs.stardestroyer.net/html2bbcode.php

Code:

<?php
// HTML to BBCode Converter

// Set search/replace variables
unset($pattern);
unset($replacement);

// Collapse runs of spaces and tabs into a single space
$pattern[]="/[ \t]+/";
$replacement[]=" ";

// Images (note that .*? is an ungreedy version of .*)
$pattern[]="/<IMG.*?SRC.*?\"(.*?)\".*?>/i";
$replacement[]="[img]\\1[/img]";

// Links
$pattern[]="/<A.[^>]*HREF[^\"]*\"([^\"]*)\".*?>(.*?)<\/A>/i";
$replacement[]="[url=\\1]\\2[/url]";

// Forms
$pattern[]="/<FORM.*?<\/FORM>/i";
$replacement[]="";

// Floats
$pattern[]="/<DIV[^>]*FLOAT.*?>.*?<\/DIV>/i";
$replacement[]="";

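// Paragraph structure (the <P> branch requires ">" or whitespace after the P,
// so that it does not swallow <PRE> before the PRE rule further down runs)
$pattern[]="/<P(\s[^>]*)?>|<\/P>|<DIV.*?>|<\/DIV>|<BR.*?>|<\/TD>/i";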
$replacement[]="\n";
$pattern[]="/<BLOCKQUOTE>/i";
$replacement[]="[quote]";
$pattern[]="/<BLOCKQUOTE[^>]*CITE=\"(.*?)\".*?>/i";
$replacement[]="[quote=\"\\1\"]";
$pattern[]="/<\/BLOCKQUOTE>/i";
$replacement[]="[/quote]";

// Miscellaneous HTML codes
$pattern[]="/<I>|<I .*?>/i";
$replacement[]="[i]";
$pattern[]="/<\/I>/i";
$replacement[]="[/i]";
$pattern[]="/<B>|<B .*?>/i";
$replacement[]="[b]";
$pattern[]="/<\/B>/i";
$replacement[]="[/b]";
$pattern[]="/<U>|<U .*?>/i";
$replacement[]="[u]";
$pattern[]="/<\/U>/i";
$replacement[]="[/u]";
$pattern[]="/<H1.*?>/i";
$replacement[]="\n[b][size=24]";
$pattern[]="/<\/H1>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<H2.*?>/i";
$replacement[]="\n[b][size=20]";
$pattern[]="/<\/H2>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<H3.*?>/i";
$replacement[]="\n[b][size=16]";
$pattern[]="/<\/H3>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<OL.*?>/i";
$replacement[]="[list=1]";
$pattern[]="/<UL.*?>/i";
$replacement[]="[list]";
$pattern[]="/<\/OL>|<\/UL>/i";
$replacement[]="[/list]";
$pattern[]="/<LI.*?>/i";
$replacement[]="[*]";
$pattern[]="/<PRE>/i";
$replacement[]="[code]";
$pattern[]="/<\/PRE>/i";
$replacement[]="[/code]";

// Special characters not processed by html_entity_decode
$pattern[]="/&mdash;|&ndash;|–|—/";
$replacement[]="-";
$pattern[]="/&ldquo;|&rdquo;|\"|“|”/";
$replacement[]="\"";
$pattern[]="/&rsquo;|&lsquo;|'|‘|’/";
$replacement[]="'";

// Acquire data
if (isset($_POST['htmlsource']))
{
// Read in data but remove line feeds and carriage returns
$htmlsource=ereg_replace("/\n|\r|\r\n|\n\r/"," ",$_POST['htmlsource']);

// Perform HTML substitution to BBcode
$htmlsource=preg_replace($pattern,$replacement,html_entity_decode($htmlsource));

// Eliminate all remaining HTML tags
$htmlsource=preg_replace("/<.*?>/","",$htmlsource);

// Replace remaining newlines with <br /> tags for output
$htmlsource=preg_replace("/\n/","<br />",htmlspecialchars($htmlsource));

// Output results
echo "<html>\n<body>\n".$htmlsource."\n</body>\n</html>\n";
}
else
{
?>
<html>
<body>
<h1 style="text-align:center">HTML Source to BB Code Converter</h1>
<p>Copy and paste the HTML source code into this text box:</p>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<textarea rows="20" cols="80" name="htmlsource"></textarea><br />
<input type="submit" value="Submit" /><input type="reset" />
</form>
</body>
</html>
<?php
}
?>
Last edited by Darth Wong on 2008-08-07 03:49pm, edited 1 time in total.
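Since the script leans entirely on its regular expressions, one thing worth checking when vetting them is how loosely each tag pattern is anchored. The sketch below is not from the original script: `tag_matches` is a hypothetical helper, and the tightened pattern is only one possible way to write it. It shows that a pattern of the form `<P.*?>` also matches `<PRE>`. That matters because preg_replace() applies an array of patterns in order, so an earlier loose rule can consume a tag before a later rule ever sees it.

```php
<?php
// Hypothetical helper (not part of the original script):
// reports whether $pattern matches anywhere in $html.
function tag_matches(string $pattern, string $html): bool
{
    return preg_match($pattern, $html) === 1;
}

$loose = '/<P.*?>/i';         // "<P" followed by anything up to the next ">"
$tight = '/<P(\s[^>]*)?>/i';  // "<P" followed by ">" or by whitespace plus attributes

var_dump(tag_matches($loose, '<PRE>'));              // bool(true)
var_dump(tag_matches($tight, '<PRE>'));              // bool(false)
var_dump(tag_matches($tight, '<p class="intro">'));  // bool(true)
var_dump(tag_matches($tight, '<P>'));                // bool(true)
```

The same check could be repeated for other short tag names, for example a list-item pattern against `<LINK>`.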
"It's not evil for God to do it. Or for someone to do it at God's command."- Jonathan Boyd on baby-killing

"you guys are fascinated with the use of those "rules of logic" to the extent that you don't really want to discussus anything."- GC

"I do not believe Russian Roulette is a stupid act" - Embracer of Darkness

"Viagra commercials appear to save lives" - tharkûn on US health care.

http://www.stardestroyer.net/Mike/RantMode/Blurbs.html
Ariphaos
Jedi Council Member
Posts: 1739
Joined: 2005-10-21 02:48am
Location: Twin Cities, MN, USA

Post by Ariphaos »

Someone may vet your regexps, but this is the only thing that caught my eye:
$htmlsource=ereg_replace("/\n|\r|\r\n|\n\r/"," ",$_POST['htmlsource']);
Probably want that to be preg_replace, I would assume.

Otherwise, you're just running a lot of regexps on a submission and outputting BBcode; you're not trusting the input for anything except processing it.
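One place where request data can reach a page without passing through the regexps is a form's action attribute, if `$_SERVER['PHP_SELF']` is echoed there. PHP_SELF can include extra path text supplied in the URL, so it is usually escaped before output. A minimal sketch, where `safe_form_action` is a hypothetical helper name and the sample path is made up for illustration:

```php
<?php
// Hypothetical helper: escape a script path before placing it in an HTML attribute.
function safe_form_action(string $phpSelf): string
{
    // ENT_QUOTES escapes both quote styles so the value cannot close the attribute.
    return htmlspecialchars($phpSelf, ENT_QUOTES, 'UTF-8');
}

// Made-up example of a path an attacker might request.
$requested = '/html2bbcode.php/"><script>alert(1)</script>';

echo '<form action="' . safe_form_action($requested) . '" method="post">', "\n";
```

Leaving the action attribute empty, so the form posts back to the current URL, is another option that avoids echoing the path at all.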
Give fire to a man, and he will be warm for a day.
Set him on fire, and he will be warm for life.
Starglider
Miles Dyson
Posts: 8709
Joined: 2007-04-05 09:44pm
Location: Isle of Dogs

Post by Starglider »

Damn. That reminds me, I still have all the proposed code upgrades for this board waiting to be packaged into a single mod against the new code base. All available time for that got sucked up by contributing to the Armageddon fic (I'm a really slow writer).

Maybe I'll have time to take a look at that over Christmas. Maybe. :(

Anyway, this code looks fine to me.
Dooey Jo
Sith Devotee
Posts: 3127
Joined: 2002-08-09 01:09pm
Location: The land beyond the forest; Sweden.

Post by Dooey Jo »

There are also the html_entity_decode() and htmlentities() functions for transforming html entities into their applicable characters, and reverse. As well, there is a trim() function that removes whitespace, and a nl2br() that transforms newlines into <br/> nodes.
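As a rough illustration of the entity functions (a sketch, not code from the script; `to_plain_ascii` is a hypothetical helper): with an explicit UTF-8 charset, html_entity_decode() also decodes typographic entities such as `&mdash;` and `&ldquo;`, which can then be folded to plain ASCII in one pass. This assumes PHP 7.0 or later for the `\u{...}` escapes (`ENT_HTML5` itself needs PHP 5.4 or later).

```php
<?php
// Hypothetical helper: decode entities to UTF-8, then fold
// typographic punctuation to plain ASCII equivalents.
function to_plain_ascii(string $html): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // "\u{...}" escapes need PHP 7.0 or later.
    return strtr($decoded, [
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote
    ]);
}

echo to_plain_ascii('He said &ldquo;hi&rdquo; &mdash; it&rsquo;s fine'), "\n";
// prints: He said "hi" - it's fine
```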
"Nippon ichi, bitches! Boing-boing."
Mai smote the demonic fires of heck...

Faker Ninjas invented ninjitsu

Post by Darth Wong »

Dooey Jo wrote:There are also the html_entity_decode() and htmlentities() functions for transforming html entities into their applicable characters, and reverse. As well, there is a trim() function that removes whitespace, and a nl2br() that transforms newlines into <br/> nodes.
Good idea, using html_entity_decode, although it doesn't get all of the special character codes for some reason. Nevertheless, it cuts down significantly on the size of the script. The trim() function won't work for me because when you enter the text into a textarea input field on an HTML form, it comes through as a single large text variable, not as a series of lines upon which you can use trim(). And nl2br() does the opposite of what I want to do, which is to convert <br /> codes to newlines.
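If per-line trimming were wanted, the single textarea string could be split into lines first and each line trimmed separately. A sketch under that assumption; `trim_each_line` is a hypothetical helper, not something in the script:

```php
<?php
// Hypothetical helper: apply trim() to every line of a multi-line string.
function trim_each_line(string $text): string
{
    // Split on any common line ending (CRLF, CR, or LF).
    $lines = preg_split('/\r\n|\r|\n/', $text);

    // trim() each piece, then rejoin with plain newlines.
    return implode("\n", array_map('trim', $lines));
}

$input = "  first line  \r\n\tsecond line\t\n third ";

var_dump(trim($input));            // only the two outer ends are stripped
var_dump(trim_each_line($input));  // every line is stripped
```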
Destructionator XIII wrote:Including translation from a html <code>, or better yet, <pre> block into a bbcode one might be useful too.
Ah yes, that's useful. I edited that into the code.
Starglider wrote:Damn. That reminds me, I still have all the proposed code upgrades for this board waiting to be packaged into a single mod against the new code base. All available time for that got sucked up by contributing to the Armageddon fic (I'm a really slow writer).

Maybe I'll have time to take a look at that over Christmas. Maybe. :(

Anyway, this code looks fine to me.
Good to hear, but you may want to hold off on that. I've been informed that the phpBB team is withdrawing support for the entire phpBB 2.0.x codebase as of this coming October, which means I have to upgrade to version 3.0.x of the software before then. There will be some pretty drastic changes in terms of feature sets when I do this (I'm currently working on modifying the default template to look more like BlackSoul), so we'll have to see what remains to be done afterwards.
Xeriar wrote:Probably want that to be preg_replace, I would assume.
You would think that, but for some reason the NL/CR replacements worked with ereg_replace but not with preg_replace. I wasn't about to beat my head against my desk trying to figure out why, so I just left it that way.
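For reference, POSIX patterns passed to ereg_replace() take no delimiters, so the slashes in that pattern are literal characters and its alternatives are really "/\n", "\r", "\r\n" and "\n\r/". It still catches the CRLF line endings that browsers normally submit, which may be why it appeared to work, though the thread does not confirm this. ereg_replace() was removed in PHP 7, so a PCRE version is sketched below; `normalize_newlines` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: replace every line ending with a single space.
function normalize_newlines(string $text): string
{
    // CRLF is listed first so a Windows line ending becomes one space, not two.
    // Single quotes leave \r and \n for the regex engine to interpret.
    return preg_replace('/\r\n|\r|\n/', ' ', $text);
}

echo normalize_newlines("line one\r\nline two\nline three\rline four"), "\n";
// prints: line one line two line three line four
```

A plain `str_replace(array("\r\n", "\r", "\n"), ' ', $text)` would do the same job without a regex.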

Post by Dooey Jo »

Darth Wong wrote:
Dooey Jo wrote:There are also the html_entity_decode() and htmlentities() functions for transforming html entities into their applicable characters, and reverse. As well, there is a trim() function that removes whitespace, and a nl2br() that transforms newlines into <br/> nodes.
Good idea, using html_entity_decode, although it doesn't get all of the special character codes for some reason. Nevertheless, it cuts down significantly on the size of the script. The trim() function won't work for me because when you enter the text into a textarea input field on an HTML form, it comes through as a single large text variable, not as a series of lines upon which you can use trim(). And nl2br() does the opposite of what I want to do, which is to convert <br /> codes to newlines.
Ah, I forgot that trim() only cuts the whitespace at the beginning and end of a string, so no, it won't work very well here. I figured you could use nl2br() for this line, though:

Code:

   // Replace remaining newlines with <br /> tags for output
   $htmlsource=preg_replace("/\n/","<br />",htmlspecialchars($htmlsource)); 
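A sketch of what the nl2br() version of that line might look like (`render_for_output` is a hypothetical helper name). Note that nl2br() inserts `<br />` before each newline and keeps the newline itself, whereas the preg_replace() call replaces it, so the emitted HTML source differs slightly even though it renders the same.

```php
<?php
// Hypothetical helper: escape the converted text, then add <br /> tags for display.
function render_for_output(string $bbcode): string
{
    // Escape first so any leftover markup is shown as text, then mark line breaks.
    return nl2br(htmlspecialchars($bbcode));
}

echo render_for_output("[b]bold[/b]\n<script>"), "\n";
// prints:
// [b]bold[/b]<br />
// &lt;script&gt;
```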