PHP script debugging help
- Darth Wong
- Sith Lord
- Posts: 70028
- Joined: 2002-07-03 12:25am
- Location: Toronto, Canada
Does anyone remember the script from this old thread?
I discovered recently that it only works up to a certain size of text. If I copy and paste a block of text that's too large, it just chokes and treats it as if I didn't submit anything at all. Does anyone know why it would do this?
"It's not evil for God to do it. Or for someone to do it at God's command."- Jonathan Boyd on baby-killing
"you guys are fascinated with the use of those "rules of logic" to the extent that you don't really want to discussus anything."- GC
"I do not believe Russian Roulette is a stupid act" - Embracer of Darkness
"Viagra commercials appear to save lives" - tharkûn on US health care.
http://www.stardestroyer.net/Mike/RantMode/Blurbs.html
- Darth Wong
phongn wrote: How big is post_max_size set to?
8M. Much bigger than the size of text I'm posting.
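For reference, the PHP manual documents that when a request body is larger than post_max_size, $_POST and $_FILES are left empty, which looks exactly like nothing having been submitted. With post_max_size reported as 8M that is unlikely to be the culprit here, but the check is cheap. A minimal sketch, assuming PHP 7.1 or later; ini_bytes() and post_was_truncated() are hypothetical helper names:

```php
<?php
// Hypothetical helper: convert php.ini shorthand such as "8M" or "64K" to bytes.
function ini_bytes(string $value): int
{
    $value = trim($value);
    $bytes = (int) $value;  // leading digits only, e.g. "8M" -> 8
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            $bytes *= 1024;  // deliberate fall-through: G -> M -> K
        case 'M':
            $bytes *= 1024;
        case 'K':
            $bytes *= 1024;
    }
    return $bytes;
}

// Hypothetical helper: true when this POST was larger than post_max_size,
// in which case PHP discards the body and $_POST arrives empty.
function post_was_truncated(): bool
{
    $limit  = ini_bytes((string) ini_get('post_max_size'));
    $length = (int) ($_SERVER['CONTENT_LENGTH'] ?? 0);
    return ($_SERVER['REQUEST_METHOD'] ?? '') === 'POST'
        && empty($_POST)
        && $limit > 0          // 0 means "no limit"
        && $length > $limit;
}
```

Calling post_was_truncated() at the top of the script would let it print a specific error instead of quietly redisplaying the empty form.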
- Darth Wong
Shit. I just discovered that web browsers limit the amount of text you can copy and paste into an HTML TEXTAREA, to 32kB or 64kB: much too restrictive for processing webpages. So the problem is in the browser, not the script itself.
Maybe I need to give it an alternate way of submitting data.
- Dooey Jo
- Sith Devotee
- Posts: 3127
- Joined: 2002-08-09 01:09pm
- Location: The land beyond the forest; Sweden.
That seems strange. I'm sure things like Wikipedia send way more than 64kB using textboxes. Perhaps you can try using some other enctype for the form; maybe enctype="multipart/form-data", which is usually used for uploading files.
"Nippon ichi, bitches! Boing-boing."
Mai smote the demonic fires of heck...
Faker Ninjas invented ninjitsu
- Darth Wong
The enctype didn't help, so I just added an alternate way of submitting data: entering the URL so the script can just download the source itself.
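A URL box like this is worth restricting to web addresses: PHP's fopen() also accepts local paths and other stream wrappers, so an unchecked URL field can be used to read files on the server. A minimal sketch of a guarded fetch, assuming PHP 7.1+ with allow_url_fopen enabled; fetch_page() and its 256 kB cap are hypothetical choices, not part of the original script:

```php
<?php
// Hypothetical helper: download at most $maxBytes of a web page, or return null.
function fetch_page(string $url, int $maxBytes = 262144): ?string
{
    // Only absolute http(s) URLs are allowed; "/etc/passwd", "file://...",
    // "php://..." and similar inputs are rejected before fopen() sees them.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, array('http', 'https'), true)) {
        return null;
    }
    // 10-second timeout; the 'http' context options also apply to https URLs.
    $context = stream_context_create(array('http' => array('timeout' => 10)));
    $handle = @fopen($url, 'rb', false, $context);
    if ($handle === false) {
        return null;
    }
    $data = stream_get_contents($handle, $maxBytes);  // stops after $maxBytes
    fclose($handle);
    return ($data === false) ? null : $data;
}
```

In the script, a call such as fetch_page($_POST['url']) would take the place of the fopen()/fgets() loop.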
- phongn
Darth Wong wrote: Shit. I just discovered that web browsers limit the amount of text you can copy and paste into an HTML TEXTAREA, to 32kB or 64kB: much too restrictive for processing webpages. So the problem is in the browser, not the script itself.
Apparently it's part of the HTML specification.
Darth Wong wrote: Maybe I need to give it an alternate way of submitting data.
You could have files uploaded instead and then process them.
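A minimal sketch of the upload route, assuming PHP 7.1+ and a form with method="post", enctype="multipart/form-data" and a file field named htmlfile; the field name, read_uploaded_html() and its 256 kB cap are all hypothetical:

```php
<?php
// Hypothetical helper: return the contents of an uploaded HTML file, or null.
// $files is normally $_FILES; 'htmlfile' is an assumed <input type="file"> name.
function read_uploaded_html(array $files, string $field = 'htmlfile', int $maxBytes = 262144): ?string
{
    if (!isset($files[$field]['error']) || $files[$field]['error'] !== UPLOAD_ERR_OK) {
        return null;  // no file, or PHP rejected it (e.g. upload_max_filesize exceeded)
    }
    $tmp = $files[$field]['tmp_name'];
    // Only read files PHP stored for this request, and respect the size cap.
    if (!is_uploaded_file($tmp) || filesize($tmp) > $maxBytes) {
        return null;
    }
    $data = file_get_contents($tmp);
    return ($data === false) ? null : $data;
}

$html = read_uploaded_html($_FILES);  // null unless a file was uploaded
```

File uploads are bounded by PHP's upload_max_filesize and post_max_size settings, so the limit is under the server's control.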
Dooey Jo wrote: That seems strange. I'm sure things like Wikipedia send way more than 64kB using textboxes. Perhaps you can try using some other enctype for the form; maybe enctype="multipart/form-data", which is usually used for uploading files.
64KB of text?! I don't think so.
Wikipedia does let you edit individual sections of an article, which helps get around this. In any case, assuming 1 byte per character and 8 characters per word, and not accounting for spaces or punctuation, an article would have to be 8192 words long to hit this limit. Wiki markup would push the byte count up, though. Then again, the current featured article (http://en.wikipedia.org/wiki/Emmy_Noether) is 95KB long.
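phongn's estimate written out, taking 64 KB as 65,536 bytes and 95 KB as 95 × 1,024 bytes, with one byte per character, eight characters per word, and spaces and punctuation ignored:

```latex
\begin{aligned}
\frac{65\,536\ \text{bytes}}{1\ \text{byte/char} \times 8\ \text{chars/word}} &= 8\,192\ \text{words} \\
\frac{95 \times 1\,024\ \text{bytes}}{8\ \text{bytes/word}} = \frac{97\,280}{8} &= 12\,160\ \text{words}
\end{aligned}
```

By the same rule of thumb, the 95 KB featured article is roughly 1.5 times the 64 KB figure.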
ah.....the path to happiness is revision of dreams and not fulfillment... -SWPIGWANG
Sufficient Googling is indistinguishable from knowledge -somebody
Anything worth the cost of a missile, which can be located on the battlefield, will be shot at with missiles. If the US military is involved, then things, which are not worth the cost if a missile will also be shot at with missiles. -Sea Skimmer
George Bush makes freedom sound like a giant robot that breaks down a lot. -Darth Raptor
- Dooey Jo
phongn wrote: 64KB of text?! I don't think so.
That's not really that much, especially if there's markup in there, not to mention Unicode characters. I just tested posting 74kB of pure text into Google Translate and it worked fine even in IE6, so it must be possible. I also can't find any reference in the specifications to a maximum size for form fields. I know that IE used to limit GET data to 2000 characters or something like that, but that's about it.
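The Unicode point is easy to demonstrate: PHP's strlen() counts bytes, and in UTF-8 every non-ASCII character takes two to four bytes, so a byte limit is reached with fewer characters than it suggests. A minimal sketch, assuming PHP 7.1+; utf8_char_count() is a hypothetical helper that counts code points with a /u regular expression rather than relying on the mbstring extension:

```php
<?php
// Hypothetical helper: count UTF-8 code points; null if $s is not valid UTF-8.
function utf8_char_count(string $s): ?int
{
    // '/./us': the u flag makes "." match one whole UTF-8 character,
    // the s flag lets it match newlines too.
    $count = preg_match_all('/./us', $s);
    return ($count === false) ? null : $count;
}

$sample = "Noether\u{2019}s theorem";  // contains U+2019 RIGHT SINGLE QUOTATION MARK
echo strlen($sample), " bytes, ", utf8_char_count($sample), " characters\n";
// prints "19 bytes, 17 characters"
```

The converter's own 65535 check also uses strlen(), so it measures bytes, which is the right measure if the phpBB limit applies to the stored text in bytes.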
Darth Wong wrote: I just added an alternate way of submitting data: entering the URL so the script can just download the source itself.
Yeah, that's probably a more user-friendly solution anyway.
- Darth Wong
New and improved version of the script. Besides the URL option, it has a few minor improvements: it strips out scripts and headers, fixes HRs, and warns you if the converted text exceeds the phpBB post size limit of 64 kB (65,535 characters):
Code:
<?php
// HTML to BBCode Converter
// Set search/replace variables
$pattern=array();
$replacement=array();
// Eliminate excess whitespace (a space followed by more spaces or tabs)
$pattern[]="/ [ \t]+/";
$replacement[]=" ";
// Images (note that .*? is an ungreedy version of .*)
$pattern[]="/<IMG.*?SRC.*?\"(.*?)\".*?>/i";
$replacement[]="[img]\\1[/img]";
// Convert http and https links to URL BBcode, ignore other link types
$pattern[]="/<A.[^>]*HREF=\"*(HTTPS?:[^\"]*)\".*?>(.*?)<\/A>/i";
$replacement[]="[url=\\1]\\2[/url]";
// Eliminate forms, floats, headers, and scripts
$pattern[]="/<FORM.*?<\/FORM>/i";
$replacement[]="";
$pattern[]="/<DIV[^>]*FLOAT.*?>.*?<\/DIV>/i";
$replacement[]="";
$pattern[]="/<HEAD.*?<\/HEAD.*?>/i";
$replacement[]="";
$pattern[]="/<SCRIPT.*?<\/SCRIPT.*?>/i";
$replacement[]="";
// Paragraph structure (\b keeps <P...> from also matching <PRE>)
$pattern[]="/<P\b[^>]*>|<\/P>|<DIV.*?>|<\/DIV>|<BR.*?>|<\/TD>/i";
$replacement[]="\n";
// Quotes: try the CITE form first, then any other <BLOCKQUOTE ...>
$pattern[]="/<BLOCKQUOTE[^>]*CITE=\"(.*?)\".*?>/i";
$replacement[]="[quote=\"\\1\"]";
$pattern[]="/<BLOCKQUOTE\b[^>]*>/i";
$replacement[]="[quote]";
// The "/" must be escaped, or PCRE treats it as the end of the pattern
$pattern[]="/<\/BLOCKQUOTE>/i";
$replacement[]="[/quote]";
// Miscellaneous HTML codes
$pattern[]="/<I>|<I .*?>/i";
$replacement[]="[i]";
$pattern[]="/<\/I>/i";
$replacement[]="[/i]";
$pattern[]="/<B>|<B .*?>/i";
$replacement[]="[b]";
$pattern[]="/<\/B>/i";
$replacement[]="[/b]";
$pattern[]="/<U>|<U .*?>/i";
$replacement[]="[u]";
$pattern[]="/<\/U>/i";
$replacement[]="[/u]";
$pattern[]="/<H1.*?>/i";
$replacement[]="\n[b][size=24]";
$pattern[]="/<\/H1>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<H2.*?>/i";
$replacement[]="\n[b][size=20]";
$pattern[]="/<\/H2>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<H3.*?>/i";
$replacement[]="\n[b][size=16]";
$pattern[]="/<\/H3>/i";
$replacement[]="[/size][/b]\n";
$pattern[]="/<OL.*?>/i";
$replacement[]="[list=1]";
$pattern[]="/<UL.*?>/i";
$replacement[]="[list]";
$pattern[]="/<\/OL>|<\/UL>/i";
$replacement[]="[/list]";
// \b keeps <LI...> from also matching <LINK>
$pattern[]="/<LI\b[^>]*>/i";
$replacement[]="[*]";
$pattern[]="/<PRE\b[^>]*>/i";
$replacement[]="[code]";
$pattern[]="/<\/PRE>/i";
$replacement[]="[/code]";
// Replace horizontal rulers with underscores
$pattern[]="/<HR.*?>/i";
$replacement[]="\n________________________________________\n\t\n";
// Typographic dashes and quotes: html_entity_decode() below turns &mdash;,
// &ldquo; etc. into UTF-8, so match those byte sequences and use ASCII instead
$pattern[]="/\xE2\x80[\x93\x94]/"; // en dash, em dash
$replacement[]="-";
$pattern[]="/\xE2\x80[\x9C\x9D]/"; // curly double quotes
$replacement[]="\"";
$pattern[]="/\xE2\x80[\x98\x99]/"; // curly single quotes
$replacement[]="'";
// Acquire data
unset($htmlsource);
if (isset($_POST['htmlsource']))
{
    // Read in data but turn line feeds and carriage returns into spaces
    $htmlsource=preg_replace("/\r\n|\n\r|\n|\r/"," ",$_POST['htmlsource']);
}
else if (isset($_POST['url']) && preg_match("/^https?:\/\//i",$_POST['url']))
{
    // Only http(s) URLs are opened, so the form cannot be used to read local files
    $handle=@fopen($_POST['url'],"rb");
    if ($handle)
    {
        stream_set_timeout($handle,10);
        // Read in at most 256 kB of source
        $urlsource=stream_get_contents($handle,1024*256);
        fclose($handle);
        // Turn line feeds and carriage returns into spaces
        if ($urlsource!==false) $htmlsource=preg_replace("/\r\n|\n\r|\n|\r/"," ",$urlsource);
    }
}
if (isset($htmlsource))
{
    // Perform HTML substitution to BBcode (entities decoded as UTF-8)
    $htmlsource=preg_replace($pattern,$replacement,html_entity_decode($htmlsource,ENT_QUOTES,"UTF-8"));
    // Eliminate all remaining HTML tags
    $htmlsource=preg_replace("/<.*?>/","",$htmlsource);
    // Collapse large groups of newlines (3 or more) into a single blank line
    $htmlsource=preg_replace("/\n(?:[ ]*\n){2,}/","\n\n",$htmlsource);
    // Check length
    $warning="";
    if (strlen($htmlsource)>65535)
    {
        $warning="<h2>Warning: Post length is ".strlen($htmlsource)." characters. "
            ."The phpBB post size limit is 65535 characters.</h2>\n\n";
    }
    // Replace remaining newlines with <br /> tags for output
    $htmlsource=preg_replace("/\n/","<br />",htmlspecialchars($htmlsource,ENT_QUOTES|ENT_SUBSTITUTE,"UTF-8"));
    // Output results, with any warning inside the page body
    echo "<html>\n<body>\n".$warning.$htmlsource."\n</body>\n</html>\n";
}
else
{
?>
<html>
<body>
<h1 style="text-align:center">HTML Source to BB Code Converter</h1>
<?php
if (isset($_POST['textsubmit'])) echo "<p><b>Error</b>: Exceeds script input size limit.</p>\n";
if (isset($_POST['urlsubmit'])) echo "<p><b>Error</b>: URL could not be retrieved.</p>\n";
?>
<p style="margin-bottom:0">Copy and paste the HTML source code into this text box:</p>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<textarea rows="20" cols="80" name="htmlsource"></textarea><br />
<input type="submit" name="textsubmit" value="Submit" /><input type="reset" />
</form>
<p style="margin-bottom:0">Alternatively, enter a URL for the website to convert:</p>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<input type="text" size="80" name="url" value="http://" />
<input type="submit" name="urlsubmit" value="Submit" /><input type="reset" />
</form>
</body>
</html>
<?php
}
?>
Last edited by Darth Wong on 2008-09-04 11:15am, edited 3 times in total.
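The core technique in the script is two parallel arrays handed to a single preg_replace() call, which applies each pattern in turn to the whole string, followed by a catch-all that strips any tag left over. A cut-down, testable sketch of that idea, assuming PHP 7.1+; html_to_bbcode() is a hypothetical function name and its pattern list is a small illustrative subset, not the script's full list:

```php
<?php
// Hypothetical, cut-down converter showing the parallel-array technique.
function html_to_bbcode(string $html): string
{
    // Flatten line breaks first, as the script does, so "." can span the input.
    $text = preg_replace('/\r\n|\n\r|\n|\r/', ' ', $html);

    // $pattern[$i] is replaced by $replacement[$i], applied in this order.
    $pattern = array(
        '/<B>|<B\s[^>]*>/i',
        '/<\/B>/i',
        '/<I>|<I\s[^>]*>/i',
        '/<\/I>/i',
        '/<A\s[^>]*HREF="(https?:[^"]*)"[^>]*>(.*?)<\/A>/i',
        '/<P\b[^>]*>|<\/P>|<BR\b[^>]*>/i',
    );
    $replacement = array('[b]', '[/b]', '[i]', '[/i]', '[url=$1]$2[/url]', "\n");
    $text = preg_replace($pattern, $replacement, html_entity_decode($text, ENT_QUOTES, 'UTF-8'));

    // Strip every tag the table did not handle, then trim surrounding whitespace.
    return trim(preg_replace('/<[^>]*>/', '', $text));
}

echo html_to_bbcode('<p>See <b>this</b> <a href="http://example.com/">link</a></p>'), "\n";
// prints "See [b]this[/b] [url=http://example.com/]link[/url]"
```

Because the patterns run in sequence, order matters: a broad pattern placed early (such as an unanchored <P.*?>) can consume tags that a later, more specific pattern was meant to handle.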
- Darth Wong
I tried the new version on a page full of extraneous markup and scripts (the CNN homepage) and it seems to work fine. It also works on the hidden single-page version of my unfinished Reign of Terror fanfic, even though that page is well above the phpBB post size limit (it contains every chapter ever written, including three that I never posted publicly and one that was only partially completed).
- Darth Wong
Does anyone know how to extract the website domain name and path from a URL? In other words:
http://www.cnn.com/ => www.cnn.com
http://www.cnn.com => www.cnn.com
http://www.cnn.com/WORLD/ => www.cnn.com/WORLD
http://www.cnn.com/WORLD/StupidArticle.html => www.cnn.com/WORLD
There doesn't seem to be a built-in function, and maybe I just suck at regexp but I'm having trouble getting it to do what I want.
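PHP's built-in parse_url() already splits a URL into scheme, host and path, so only the "drop the last path segment" step needs custom code. A minimal sketch, assuming PHP 7.1+; site_path() is a hypothetical name, and it treats whatever follows the last slash as a file name to discard:

```php
<?php
// Hypothetical helper: return "host/dir" for a URL, dropping the scheme and the
// last path segment. Returns null if the URL has no host part.
function site_path(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $path = $parts['path'] ?? '';
    // Everything up to (but not including) the last "/" is the directory.
    $pos = strrpos($path, '/');
    $dir = ($pos === false) ? '' : substr($path, 0, $pos);
    return $parts['host'] . $dir;
}

$urls = array(
    'http://www.cnn.com/',                          // www.cnn.com
    'http://www.cnn.com',                           // www.cnn.com
    'http://www.cnn.com/WORLD/',                    // www.cnn.com/WORLD
    'http://www.cnn.com/WORLD/StupidArticle.html',  // www.cnn.com/WORLD
);
foreach ($urls as $url) {
    echo $url, ' => ', site_path($url), "\n";
}
```

One consequence of that rule is that http://www.cnn.com/WORLD (no trailing slash) gives www.cnn.com, since WORLD could just as well be a file; any port, query string or fragment is ignored.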