PHP script debugging help

Darth Wong
Sith Lord
Sith Lord
Posts: 70028
Joined: 2002-07-03 12:25am
Location: Toronto, Canada
Contact:

Post by Darth Wong »

Does anyone remember the script from this old thread?

I discovered recently that it only works up to a certain size of text. If I copy and paste a block of text that's too large, it just chokes and treats it as if I didn't submit anything at all. Does anyone know why it would do this?
"It's not evil for God to do it. Or for someone to do it at God's command."- Jonathan Boyd on baby-killing

"you guys are fascinated with the use of those "rules of logic" to the extent that you don't really want to discussus anything."- GC

"I do not believe Russian Roulette is a stupid act" - Embracer of Darkness

"Viagra commercials appear to save lives" - tharkûn on US health care.

http://www.stardestroyer.net/Mike/RantMode/Blurbs.html
phongn
Rebel Leader
Posts: 18487
Joined: 2002-07-03 11:11pm

Post by phongn »

How big is post_max_size set to?

Post by Darth Wong »

phongn wrote: How big is post_max_size set to?

8M. Much bigger than the size of text I'm posting.
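For reference, a rough sketch of how the limit could be checked from inside a script. When a POST body is larger than post_max_size, PHP leaves $_POST empty, which looks exactly like submitting nothing at all. The helper names `ini_size_to_bytes` and `post_body_was_discarded` are made up for this example and are not part of the original script.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand size such as "8M" to bytes.
function ini_size_to_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value; // leading digits only, e.g. 8 from "8M"
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Hypothetical helper: true when a POST body was sent but PHP threw it away
// because it was larger than the configured limit (a limit of 0 means "no limit").
function post_body_was_discarded(array $server, array $post, int $limitBytes): bool
{
    $length = (int) ($server['CONTENT_LENGTH'] ?? 0);
    return ($server['REQUEST_METHOD'] ?? '') === 'POST'
        && empty($post)
        && $limitBytes > 0
        && $length > $limitBytes;
}

// Usage inside a web request (left as comments so the sketch also runs on the CLI):
// $limit = ini_size_to_bytes(ini_get('post_max_size'));
// if (post_body_was_discarded($_SERVER, $_POST, $limit)) {
//     echo "POST body exceeded post_max_size";
// }
```

With post_max_size at 8M this check would not fire for a few hundred kB of text, which points away from the server setting as the cause.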
Post by Darth Wong »

Shit. I just discovered that web browsers limit the amount of text you can copy and paste into an HTML TEXTAREA, to 32kB or 64kB: much too restrictive for processing webpages. So the problem is in the browser, not the script itself.

Maybe I need to give it an alternate way of submitting data.
Dooey Jo
Sith Devotee
Posts: 3127
Joined: 2002-08-09 01:09pm
Location: The land beyond the forest; Sweden.
Contact:

Post by Dooey Jo »

That seems strange. I'm sure things like Wikipedia send way more than 64kB using textboxes. Perhaps you can try using some other enctype for the form; maybe enctype="multipart/form-data", which is usually used for uploading files.
"Nippon ichi, bitches! Boing-boing."
Mai smote the demonic fires of heck...

Faker Ninjas invented ninjitsu
Post by Darth Wong »

The enctype didn't help, so I just added an alternate way of submitting data: entering the URL so the script can just download the source itself.
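A minimal sketch of that kind of URL-based input, assuming allow_url_fopen is enabled. The function names `is_http_url` and `fetch_html_source`, the 256 kB cap and the 10-second timeout are illustrative choices for this example, not necessarily what the actual script uses.

```php
<?php
// Hypothetical helper: accept only absolute http/https URLs, so a value such as
// "/etc/passwd" or "file:///etc/passwd" is never handed to the stream functions.
function is_http_url(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($scheme) && is_string($host)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

// Hypothetical helper: download at most $maxBytes of the page, or return null on failure.
function fetch_html_source(string $url, int $maxBytes = 262144, int $timeout = 10): ?string
{
    if (!is_http_url($url)) {
        return null;
    }
    // The "timeout" option of the http stream wrapper is given in seconds
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    $data = @file_get_contents($url, false, $context, 0, $maxBytes);
    return $data === false ? null : $data;
}
```

Checking the scheme first matters because the URL comes straight from user input.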

Post by phongn »

Darth Wong wrote: Shit. I just discovered that web browsers limit the amount of text you can copy and paste into an HTML TEXTAREA, to 32kB or 64kB: much too restrictive for processing webpages. So the problem is in the browser, not the script itself.

Apparently it's part of the HTML specification.

Darth Wong wrote: Maybe I need to give it an alternate way of submitting data.

You could have files uploaded instead and then process them.

Dooey Jo wrote: That seems strange. I'm sure things like Wikipedia send way more than 64kB using textboxes. Perhaps you can try using some other enctype for the form; maybe enctype="multipart/form-data", which is usually used for uploading files.

64KB of text?! I don't think so.
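The file-upload route could look roughly like this. It is only a sketch: the field name `htmlfile`, the helper name `read_uploaded_html` and the 1 MB cap are assumptions made for the example, and the form would need method="post" with enctype="multipart/form-data" plus an `<input type="file" name="htmlfile" />`.

```php
<?php
// Hypothetical helper: return the contents of an uploaded file, or null if the
// upload is missing, too large, or did not arrive through an HTTP upload.
// $file is a single entry of $_FILES, e.g. $_FILES['htmlfile'] (field name assumed).
function read_uploaded_html(array $file, int $maxBytes = 1048576): ?string
{
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return null; // nothing uploaded, or PHP reported an upload error
    }
    if (($file['size'] ?? 0) > $maxBytes) {
        return null; // larger than this sketch is willing to process
    }
    if (!is_uploaded_file($file['tmp_name'] ?? '')) {
        return null; // guards against paths that were not created by an upload
    }
    $data = file_get_contents($file['tmp_name']);
    return $data === false ? null : $data;
}

// Usage inside a web request:
// $htmlsource = isset($_FILES['htmlfile']) ? read_uploaded_html($_FILES['htmlfile']) : null;
```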
Pu-239
Sith Marauder
Posts: 4727
Joined: 2002-10-21 08:44am
Location: Fake Virginia

Post by Pu-239 »

Wikipedia does let you edit only subsections of an article which helps get around this. Anyway, assuming 1 byte/character and 8 characters/word and not accounting for spaces or punctuation, an article would have to be 8192 words long to hit this limit. Wiki does have some markup which would increase this. Anyway, the current featured article (http://en.wikipedia.org/wiki/Emmy_Noether) is 95KB long.
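The arithmetic behind that estimate, written out as a quick check. The 64 kB limit, 8 bytes per word and the 95 kB article size are the figures from the post; the variable names are just for illustration.

```php
<?php
$limitBytes   = 64 * 1024; // the supposed 64 kB limit
$bytesPerWord = 8;         // 1 byte/character, 8 characters/word, no spaces or punctuation

// Number of words needed to reach the limit
$wordsToHitLimit = intdiv($limitBytes, $bytesPerWord);
echo $wordsToHitLimit, "\n"; // 8192

// The 95 kB featured article is already over that limit
$articleBytes = 95 * 1024;
var_dump($articleBytes > $limitBytes); // bool(true)
```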

ah.....the path to happiness is revision of dreams and not fulfillment... -SWPIGWANG
Sufficient Googling is indistinguishable from knowledge -somebody
Anything worth the cost of a missile, which can be located on the battlefield, will be shot at with missiles. If the US military is involved, then things, which are not worth the cost if a missile will also be shot at with missiles. -Sea Skimmer


George Bush makes freedom sound like a giant robot that breaks down a lot. -Darth Raptor

Post by Dooey Jo »

phongn wrote:
    Dooey Jo wrote: That seems strange. I'm sure things like Wikipedia send way more than 64kB using textboxes. Perhaps you can try using some other enctype for the form; maybe enctype="multipart/form-data", which is usually used for uploading files.
    64KB of text?! I don't think so.

That's not really that much, especially if there's markup in there, not to mention unicode characters. I just tested posting 74kB of pure text into Google Translate and it worked fine even in IE6, so it must be possible. I also can't find any reference in the specifications to a maximum size for form fields. I know that IE used to limit GET data to 2000 characters or something like that, but that's about it.

Darth Wong wrote: I just added an alternate way of submitting data: entering the URL so the script can just download the source itself.

Yeah, that's probably a more user-friendly solution anyway :)

Post by Darth Wong »

New and improved version of the script, with a few other minor improvements, like getting rid of scripts and headers, fixing HRs, and warning you if you've gone over the phpBB post size limit which is 64 kB:

Code: Select all

<?php
// HTML to BBCode Converter

// Set search/replace variables
unset($pattern);
unset($replacement);

// Eliminate excess whitespace
$pattern[]="/[ \t]+/";
$replacement[]=" ";

// Images (note that .*? is an ungreedy version of .*)
$pattern[]="/<IMG.*?SRC.*?\"(.*?)\".*?>/i";
$replacement[]="[img]\\1[/img]";

// Convert http links to URL BBcode, ignore other link types
$pattern[]="/<A.[^>]*HREF=\"*(HTTP:[^\"]*)\".*?>(.*?)<\/A>/i";
$replacement[]="[url=\\1]\\2[/url]";

// Eliminate forms, floats, headers, and scripts
$pattern[]="/<FORM.*?<\/FORM>/i";
$replacement[]="";
$pattern[]="/<DIV[^>]*FLOAT.*?>.*?<\/DIV>/i";
$replacement[]="";
$pattern[]="/<HEAD.*?<\/HEAD.*?>/i";
$replacement[]="";
$pattern[]="/<SCRIPT.*?<\/SCRIPT.*?>/i";
$replacement[]="";

// Paragraph structure
$pattern[]="/<P.*?>|<\/P>|<DIV.*?>|<\/DIV>|<BR.*?>|<\/TD>/i";
$replacement[]="\n";
$pattern[]="/<BLOCKQUOTE>/i";
$replacement[]="[quote]";
$pattern[]="/<BLOCKQUOTE[^>]*CITE=\"(.*?)\".*?>/i";
$replacement[]="[quote=\"\\1\"]";
$pattern[]="/<\/BLOCKQUOTE>/i";
$replacement[]="[/quote]";

// Miscellaneous HTML codes
$pattern[]="/<I>|<I .*?>/i";
$replacement[]="[i]";
$pattern[]="/<\/I>/i";
$replacement[]="[/i]";
$pattern[]="/<B>|<B .*?>/i";
$replacement[]="[b]";
$pattern[]="/<\/B>/i";
$replacement[]="[/b]";
$pattern[]="/<U>|<U .*?>/i";
$replacement[]="[u]";
$pattern[]="/<\/U>/i";
$replacement[]="[/u]";
$pattern[]="/<H1.*?>/i";
$replacement[]="\n[b][size=24]";
$pattern[]="/<\/H1>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<H2.*?>/i";
$replacement[]="\n[b][size=20]";
$pattern[]="/<\/H2>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<H3.*?>/i";
$replacement[]="\n[b][size=16]";
$pattern[]="/<\/H3>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<OL.*?>/i";
$replacement[]="[list=1]";
$pattern[]="/<UL.*?>/i";
$replacement[]="[list]";
$pattern[]="/<\/OL>|<\/UL>/i";
$replacement[]="[/list]";
$pattern[]="/<LI.*?>/i";
$replacement[]="[*]";
$pattern[]="/<PRE>/i";
$replacement[]="[code]";
$pattern[]="/<\/PRE>/i";
$replacement[]="[/code]";

// Replace horizontal rulers with underscores
$pattern[]="/<HR.*?>/i";
$replacement[]="\n________________________________________\n\t\n";

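// Special characters not processed by html_entity_decode
// (named and numeric entity forms, plus the literal characters)
$pattern[]="/&mdash;|&ndash;|&#8211;|&#8212;|–|—/";
$replacement[]="-";
$pattern[]="/&ldquo;|&rdquo;|&quot;|&#8220;|&#8221;|“|”/";
$replacement[]="\"";
$pattern[]="/&rsquo;|&lsquo;|&#39;|&#8216;|&#8217;|‘|’/";
$replacement[]="'";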

// Acquire data
unset($htmlsource);
if (isset($_POST['htmlsource']))
{
    // Read in data but replace line feeds and carriage returns with spaces
    // (preg_replace, because ereg_replace takes no delimiters and was removed in PHP 7)
    $htmlsource=preg_replace("/\r\n|\r|\n/"," ",$_POST['htmlsource']);
}
else if (isset($_POST['url']) && preg_match("/^https?:\/\//i",$_POST['url']))
{
    // Only http(s) URLs are accepted, so fopen() cannot be pointed at local files
    $handle=@fopen($_POST['url'],"rb");
    if ($handle)
    {
        stream_set_timeout($handle,10);
        $line="";
        $urlsource="";
        // Read in source
        while (!feof($handle) && strlen($line)<1024*256)
        {
            $line=rtrim(fgets($handle,1024*256));
            $urlsource.=$line." ";
        }
        fclose($handle);
        // Remove line feeds and carriage returns
        $htmlsource=preg_replace("/\r\n|\r|\n/"," ",$urlsource);
    }
}

if (isset($htmlsource))
{
// Perform HTML substitution to BBcode
$htmlsource=preg_replace($pattern,$replacement,html_entity_decode($htmlsource));

// Eliminate all remaining HTML tags
$htmlsource=preg_replace("/<.*?>/","",$htmlsource);

// Collapse large groups of newlines (3 or more) into a single blank line
$htmlsource=preg_replace("/\n[ ]*[\n][ ]*[\n]+/","\n\n",$htmlsource);

// Check length
if (strlen($htmlsource)>65535)
{
echo "<h2>Warning: Post length is ".strlen($htmlsource)." characters. ";
echo "The phpBB post size limit is 65535 characters.</h2>\n\n";
}

// Replace remaining newlines with <br /> tags for output
$htmlsource=preg_replace("/\n/","<br />",htmlspecialchars($htmlsource));

// Output results
echo "<html>\n<body>\n".$htmlsource."\n</body>\n</html>\n";
}
else
{
?>
<html>
<body>
<h1 style="text-align:center">HTML Source to BB Code Converter</h1>
<?php
if (isset($_POST['textsubmit'])) echo "<p><b>Error</b>: Exceeds script input size limit.</p>\n";
if (isset($_POST['urlsubmit'])) echo "<p><b>Error</b>: URL could not be retrieved.</p>\n";
?>
<p style="margin-bottom:0">Copy and paste the HTML source code into this text box:</p>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<textarea rows="20" cols="80" name="htmlsource"></textarea><br />
<input type="submit" name="textsubmit" value="Submit" /><input type="reset" />
</form>
<p style="margin-bottom:0">Alternatively, enter a URL for the website to convert:</p>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" size="80" name="url" value="http://" />
<input type="submit" name="urlsubmit" value="Submit" /><input type="reset" />
</form>

</body>
</html>
<?php
}
?>
Last edited by Darth Wong on 2008-09-04 11:15am, edited 3 times in total.
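To see the pattern/replacement mechanism in isolation, here is a cut-down sketch that uses only three rules in the style of the script above. The function name `html_to_bbcode_sketch` is hypothetical, and the link rule is a slightly simplified variant that also accepts https.

```php
<?php
// Hypothetical cut-down converter: a few rules, applied the same way as in the script.
function html_to_bbcode_sketch(string $html): string
{
    $pattern = [];
    $replacement = [];

    // Bold tags
    $pattern[] = '/<B>|<B .*?>/i';
    $replacement[] = '[b]';
    $pattern[] = '/<\/B>/i';
    $replacement[] = '[/b]';

    // http/https links become URL BBCode
    $pattern[] = '/<A[^>]*HREF="(HTTPS?:[^"]*)"[^>]*>(.*?)<\/A>/i';
    $replacement[] = '[url=$1]$2[/url]';

    // preg_replace applies each pattern/replacement pair in array order
    $bbcode = preg_replace($pattern, $replacement, $html);

    // Strip whatever HTML tags are left
    return preg_replace('/<.*?>/', '', $bbcode);
}

echo html_to_bbcode_sketch('<p><b>Hi</b> <a href="http://www.cnn.com/">CNN</a></p>'), "\n";
// prints: [b]Hi[/b] [url=http://www.cnn.com/]CNN[/url]
```

Because the rules run in array order, the tag-specific replacements have to come before the final catch-all that strips remaining tags.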

Post by Darth Wong »

I tried the new version on a page full of extraneous markup and scripts (the CNN homepage) and it seems to work fine. It also works on my unfinished hidden Reign of Terror fanfic single-page version even though it's well above the phpBB post size limit (that's because it includes all of the chapters ever written on a single page, including three chapters that I never posted publicly and one which was only partially completed).

Post by Darth Wong »

Does anyone know how to extract the website domain name and path from a URL? In other words:

http://www.cnn.com/ => www.cnn.com
http://www.cnn.com => www.cnn.com
http://www.cnn.com/WORLD/ => www.cnn.com/WORLD
http://www.cnn.com/WORLD/StupidArticle.html => www.cnn.com/WORLD

There doesn't seem to be a built-in function, and maybe I just suck at regexp but I'm having trouble getting it to do what I want.
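One possible approach, offered as a sketch: PHP's built-in parse_url() splits a URL into host and path, so no regexp is needed for that part. The function name `url_base` is made up for this example, and it assumes that a path not ending in a slash names a file, so http://www.cnn.com/WORLD (no trailing slash) would come back as www.cnn.com.

```php
<?php
// Hypothetical helper: return "host/directory" for a URL, or null if it has no host.
function url_base(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $path = $parts['path'] ?? '';
    // A path that does not end in "/" is assumed to end in a file name: drop that segment
    if ($path !== '' && substr($path, -1) !== '/') {
        $path = substr($path, 0, (int) strrpos($path, '/'));
    }
    // Remove the trailing slash so "/WORLD/" becomes "/WORLD" and "/" becomes ""
    return $parts['host'] . rtrim($path, '/');
}

echo url_base('http://www.cnn.com/'), "\n";                           // www.cnn.com
echo url_base('http://www.cnn.com'), "\n";                            // www.cnn.com
echo url_base('http://www.cnn.com/WORLD/'), "\n";                     // www.cnn.com/WORLD
echo url_base('http://www.cnn.com/WORLD/StupidArticle.html'), "\n";   // www.cnn.com/WORLD
```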