(1)

(Do not be afraid of)

PHP Compiler Internals

Sebastian Bergmann

(2)

Who I Am

Sebastian Bergmann

Involved in the PHP project since 2000

Creator of PHPUnit

Co-Founder and Principal Consultant with thePHP.cc

(3)

Under PHP's Hood

Server API (SAPI)

(mod_php, FastCGI, CLI, ...)

PHP Core

Request Management

File and Network Operations

Extensions

(date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)

Zend Engine

Compilation and Execution

Memory and Resource Allocation

(4)

How PHP executes code

Lexical Analysis

Converts the source from a sequence of characters into a sequence of tokens

Syntax Analysis

Analyzes a sequence of tokens to determine their grammatical structure

Bytecode Generation

Generates bytecode based on the information gathered by analyzing the source code

Bytecode Execution

(8)

Lexical Analysis

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

Scan a sequence of characters and produce a sequence of tokens:

    T_OPEN_TAG
    T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
    T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE
    } T_WHITESPACE
    T_CLOSE_TAG

(16)

Lexical Analysis

Scanner Generators

You do not want to write a scanner by hand, at least not when the code for the scanner should be efficient and maintainable

Tools such as flex or re2c generate the code for a scanner from a set of rules

    "if" {
        return T_IF;
    }

    <ST_IN_SCRIPTING>"if" {
        return T_IF;
    }
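To make concrete what a scanner does with rules like the ones above, here is a minimal hand-written sketch in C. It is not the code that flex or re2c would emit and it is not PHP's actual scanner: the `TOK_*` ids, `next_token()`, and `scan_all()` are hypothetical names chosen for illustration, and the sketch only knows the handful of tokens needed for the `if (TRUE)` example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token ids; the names mirror the slides, the values are arbitrary. */
enum token {
    TOK_EOF = 0, TOK_OPEN_TAG, TOK_CLOSE_TAG, TOK_IF, TOK_PRINT,
    TOK_WHITESPACE, TOK_STRING, TOK_CONSTANT_ENCAPSED_STRING,
    TOK_CHAR /* single-character token such as '(' or ';' */
};

/* Scan one token starting at *cursor and advance *cursor past it. */
enum token next_token(const char **cursor)
{
    const char *p = *cursor;
    enum token tok;

    assert(p != NULL);
    if (*p == '\0') {
        return TOK_EOF;
    }
    if (strncmp(p, "<?php", 5) == 0) {
        p += 5;
        if (isspace((unsigned char)*p)) {
            p++; /* the open tag swallows one following whitespace character */
        }
        tok = TOK_OPEN_TAG;
    } else if (strncmp(p, "?>", 2) == 0) {
        p += 2;
        tok = TOK_CLOSE_TAG;
    } else if (isspace((unsigned char)*p)) {
        while (isspace((unsigned char)*p)) {
            p++;
        }
        tok = TOK_WHITESPACE;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        const char *start = p;
        size_t len;
        while (isalnum((unsigned char)*p) || *p == '_') {
            p++;
        }
        len = (size_t)(p - start);
        if (len == 2 && strncmp(start, "if", 2) == 0) {
            tok = TOK_IF;       /* corresponds to the rule "if" { return T_IF; } */
        } else if (len == 5 && strncmp(start, "print", 5) == 0) {
            tok = TOK_PRINT;
        } else {
            tok = TOK_STRING;   /* any other identifier, e.g. TRUE */
        }
    } else if (*p == '\'') {
        p++;
        while (*p != '\0' && *p != '\'') {
            p++;
        }
        if (*p == '\'') {
            p++;
        }
        tok = TOK_CONSTANT_ENCAPSED_STRING;
    } else {
        p++;
        tok = TOK_CHAR;
    }
    *cursor = p;
    return tok;
}

/* Scan the whole source into out[] and return the number of tokens stored. */
size_t scan_all(const char *source, enum token *out, size_t max)
{
    size_t n = 0;
    enum token tok;

    while (n < max && (tok = next_token(&source)) != TOK_EOF) {
        out[n++] = tok;
    }
    return n;
}
```

Every branch in `next_token()` is something a generator derives from a one-line rule, which is the maintainability argument made on the slide.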

(17)

Lexical Analysis

PHP Tokens

 T_ABSTRACT  T_AND_EQUAL  T_ARRAY  T_ARRAY_CAST  T_AS  T_BAD_CHARACTER  T_BOOLEAN_AND  T_BOOLEAN_OR  T_BOOL_CAST  T_BREAK  T_CASE  T_CATCH  T_CHARACTER  T_CLASS  T_CLASS_C  T_CLONE  T_CLOSE_TAG  T_COMMENT  T_CONCAT_EQUAL  T_CONST  T_CONSTANT_ENCAPSED_STRING  T_CONTINUE  T_CURLY_OPEN  T_DEC  T_DECLARE  T_DEFAULT  T_DIR  T_DIV_EQUAL  T_DNUMBER  T_DOC_COMMENT  T_DO  T_DOLLAR_OPEN_CURLY_BRACES  T_DOUBLE_ARROW  T_DOUBLE_CAST  T_DOUBLE_COLON  T_ECHO  T_ELSE  T_ELSEIF  T_EMPTY  T_ENCAPSED_AND_WHITESPACE  T_ENDDECLARE  T_ENDFOR  T_ENDFOREACH  T_ENDIF  T_ENDSWITCH  T_ENDWHILE  T_END_HEREDOC  T_EVAL  T_EXIT  T_EXTENDS  T_FILE  T_FINAL  T_FOR  T_FOREACH  T_FUNCTION  T_FUNC_C  T_GLOBAL  T_GOTO  T_HALT_COMPILER  T_IF  T_IMPLEMENTS  T_INC  T_INCLUDE  T_INCLUDE_ONCE  T_INLINE_HTML  T_INSTANCEOF  T_INT_CAST  T_INTERFACE  T_ISSET  T_IS_EQUAL  T_IS_GREATER_OR_EQUAL  T_IS_IDENTICAL

(18)

Lexical Analysis

PHP Tokens

 T_IS_NOT_EQUAL  T_IS_NOT_IDENTICAL  T_IS_SMALLER_OR_EQUAL  T_LINE  T_LIST  T_LNUMBER  T_LOGICAL_AND  T_LOGICAL_OR  T_LOGICAL_XOR  T_METHOD_C  T_MINUS_EQUAL  T_ML_COMMENT  T_MOD_EQUAL  T_MUL_EQUAL  T_NAMESPACE  T_NS_C  T_NEW  T_NUM_STRING  T_OBJECT_CAST  T_OBJECT_OPERATOR  T_OLD_FUNCTION  T_OPEN_TAG  T_OPEN_TAG_WITH_ECHO  T_OR_EQUAL  T_PAAMAYIM_NEKUDOTAYIM  T_PLUS_EQUAL  T_PRINT  T_PRIVATE  T_PUBLIC  T_PROTECTED  T_REQUIRE  T_REQUIRE_ONCE  T_RETURN  T_SL  T_SL_EQUAL  T_SR  T_SR_EQUAL  T_START_HEREDOC  T_STATIC  T_STRING  T_STRING_CAST  T_STRING_VARNAME  T_SWITCH  T_THROW  T_TRY  T_UNSET  T_UNSET_CAST  T_USE  T_VAR  T_VARIABLE  T_WHILE  T_WHITESPACE  T_XOR_EQUAL

(20)

Syntax Analysis

Parser Generators

You do not want to write a parser by hand, at least not when the code for the parser should be efficient and maintainable

Tools such as bison or lemon generate the code for a parser from a set of rules

    T_IF '(' expr ')' { ... } statement { ... }
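As a rough illustration of what a rule such as `T_IF '(' expr ')' statement` asks of a parser, the following C sketch checks a token stream by recursive descent. This is a different technique from the table-driven parsers that bison or lemon generate, and the grammar is a deliberately tiny assumption: the token ids, `struct parser`, `accept_token()`, and `parse_program()` are hypothetical, and `expr` is reduced to a single constant or string token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token ids; punctuation is represented by its character code. */
enum { T_IF = 256, T_PRINT, T_STRING, T_CONSTANT_ENCAPSED_STRING, T_END };

struct parser {
    const int *tokens; /* terminated by T_END, whitespace already dropped */
    size_t pos;
};

/* Consume the current token if it matches; report whether it did. */
static int accept_token(struct parser *p, int token)
{
    if (p->tokens[p->pos] == token) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* expr: T_STRING | T_CONSTANT_ENCAPSED_STRING */
static int parse_expr(struct parser *p)
{
    return accept_token(p, T_STRING)
        || accept_token(p, T_CONSTANT_ENCAPSED_STRING);
}

/* statement: T_IF '(' expr ')' statement
 *          | '{' statement* '}'
 *          | T_PRINT expr ';'
 */
static int parse_statement(struct parser *p)
{
    if (accept_token(p, T_IF)) {
        return accept_token(p, '(') && parse_expr(p)
            && accept_token(p, ')') && parse_statement(p);
    }
    if (accept_token(p, '{')) {
        while (p->tokens[p->pos] != '}' && p->tokens[p->pos] != T_END) {
            if (!parse_statement(p)) {
                return 0;
            }
        }
        return accept_token(p, '}');
    }
    if (accept_token(p, T_PRINT)) {
        return parse_expr(p) && accept_token(p, ';');
    }
    return 0;
}

/* Returns 1 when the whole token stream is exactly one valid statement. */
int parse_program(const int *tokens)
{
    struct parser p;

    assert(tokens != NULL);
    p.tokens = tokens;
    p.pos = 0;
    return parse_statement(&p) && p.tokens[p.pos] == T_END;
}
```

In the real grammar the `{ ... }` blocks between the symbols are actions that run while the rule is being matched; that is where the compiler functions shown later in the talk are called.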

(21)

PHP Bytecode

Disassembling with vld

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    sb@thinkpad ~ % php -dextension=vld.so -dvld.active=1 -dvld.execute=0 if.php
    filename:       /home/sb/if.php
    function name:  (null)
    number of ops:  8
    compiled vars:  none

    line  #  op        fetch  ext  return  operands
    2     0  EXT_STMT
          1  JMPZ                          true, ->6
    3     2  EXT_STMT
          3  PRINT                 ~0      '%2A'
          4  FREE                          ~0
    4     5  JMP                           ->6
    6     6  EXT_STMT
          7  RETURN                        1

(22)

PHP Bytecode

    sb@thinkpad ~ % bytekit if.php
    bytekit-cli 1.0.0 by Sebastian Bergmann.

    Filename:           /home/sb/if.php
    Function:           main
    Number of oplines:  8

    line  #  opcode    result  operands
    2     0  EXT_STMT
          1  JMPZ              true, ->6
    3     2  EXT_STMT
          3  PRINT     ~0      '*'
          4  FREE              ~0
    4     5  JMP               ->6
    6     6  EXT_STMT
          7  RETURN            1

(23)

PHP Bytecode

Bytecode visualization with bytekit-cli

[Figure: bytecode visualization of if.php produced with bytekit-cli]

(24)

PHP Bytecode

    <?php
    $a = 1;
    $b = 2;
    print $a + $b;
    ?>

    sb@thinkpad ~ % bytekit add.php
    bytekit-cli 1.0.0 by Sebastian Bergmann.

    Filename:           /home/sb/add.php
    Function:           main
    Number of oplines:  10
    Compiled variables: !0 = $a, !1 = $b

    line  #  opcode    result  operands
    2     0  EXT_STMT
          1  ASSIGN            !0, 1
    3     2  EXT_STMT
          3  ASSIGN            !1, 2
    4     4  EXT_STMT
          5  ADD       ~2      !0, !1
          6  PRINT     ~3      ~2
          7  FREE              ~3
    6     8  EXT_STMT
          9  RETURN            1

(25)

PHP Bytecode

List of Opcodes

 NOP  ADD  SUB  MUL  DIV  MOD  SL  SR  CONCAT  BW_OR  BW_AND  BW_XOR  BW_NOT  BOOL_NOT  BOOL_XOR  IS_IDENTICAL  IS_NOT_IDENTICAL  IS_EQUAL  IS_NOT_EQUAL  IS_SMALLER  IS_SMALLER_OR_EQUAL  CAST  QM_ASSIGN  ASSIGN_ADD  ASSIGN_SUB  ASSIGN_MUL  ASSIGN_DIV  ASSIGN_MOD  ASSIGN_SL  ASSIGN_SR  ASSIGN_CONCAT  ASSIGN_BW_OR  ASSIGN_BW_AND  ASSIGN_BW_XOR  PRE_INC  PRE_DEC  POST_INC  POST_DEC  ASSIGN  ASSIGN_REF  ECHO  PRINT  JMPZ  JMPNZ  JMPZNZ  JMPZ_EX  JMPNZ_EX  CASE  SWITCH_FREE  BRK  BOOL  INIT_STRING  ADD_CHAR  ADD_STRING  ADD_VAR  BEGIN_SILENCE  END_SILENCE  INIT_FCALL_BY_NAME  DO_FCALL  DO_FCALL_BY_NAME  RETURN  RECV  RECV_INIT  SEND_VAL  SEND_VAR  SEND_REF  NEW  FREE  INIT_ARRAY  ADD_ARRAY_ELEMENT  INCLUDE_OR_EVAL  UNSET_VAR  UNSET_DIM  UNSET_OBJ  FE_RESET  FE_FETCH  EXIT  FETCH_R  FETCH_DIM_R  FETCH_OBJ_R  FETCH_W  FETCH_DIM_W  FETCH_OBJ_W  FETCH_RW  FETCH_DIM_RW  FETCH_OBJ_RW  FETCH_IS  FETCH_DIM_IS  FETCH_OBJ_IS  FETCH_FUNC_ARG

(26)

PHP Bytecode

List of Opcodes

 FETCH_DIM_FUNC_ARG  FETCH_OBJ_FUNC_ARG  FETCH_UNSET  FETCH_DIM_UNSET  FETCH_OBJ_UNSET  FETCH_DIM_TMP_VAR  FETCH_CONSTANT  EXT_STMT  EXT_FCALL_BEGIN  EXT_FCALL_END  EXT_NOP  TICKS  SEND_VAR_NO_REF  CATCH  THROW  FETCH_CLASS  CLONE  INIT_METHOD_CALL  INIT_STATIC_METHOD_CALL  ISSET_ISEMPTY_VAR  ISSET_ISEMPTY_DIM_OBJ  PRE_INC_OBJ  PRE_DEC_OBJ  POST_INC_OBJ  POST_DEC_OBJ  ASSIGN_OBJ  INSTANCEOF  DECLARE_CLASS  DECLARE_INHERITED_CLASS  DECLARE_FUNCTION  RAISE_ABSTRACT_ERROR  ADD_INTERFACE  VERIFY_ABSTRACT_CLASS  ASSIGN_DIM  ISSET_ISEMPTY_PROP_OBJ  HANDLE_EXCEPTION

(28)

Test First!

    --TEST--
    unless statement
    --FILE--
    <?php
    unless (FALSE) {
        print 'unless FALSE is TRUE, this is printed';
    }

    unless (TRUE) {
        print 'unless TRUE is TRUE, this is printed';
    }
    ?>
    --EXPECT--
    unless FALSE is TRUE, this is printed

(29)

Extending the Compiler

Add token for unless to the scanner

Add rule for unless to the parser

Generate bytecode for unless in the compiler

Add token for unless to ext/tokenizer

(30)

Add unless scanner token

Zend/zend_language_scanner.l

    <ST_IN_SCRIPTING>"if" {
        return T_IF;
    }

    <ST_IN_SCRIPTING>"unless" {
        return T_UNLESS;
    }

    <ST_IN_SCRIPTING>"elseif" {
        return T_ELSEIF;
    }

    <ST_IN_SCRIPTING>"endif" {
        return T_ENDIF;
    }

    <ST_IN_SCRIPTING>"else" {
        return T_ELSE;
    }

(31)

Add unless parser rule

Zend/zend_language_parser.y

    %token T_NAMESPACE
    %token T_NS_C
    %token T_DIR
    %token T_NS_SEPARATOR
    %token T_UNLESS
    .
    .
    unticked_statement:
            '{' inner_statement_list '}'
        |   T_IF '(' expr ')' {
    .
    .
        |   T_UNLESS '(' expr ')' { zend_do_unless_cond(&$3, &$4 TSRMLS_CC); }
            statement { zend_do_if_after_statement(&$4, 1 TSRMLS_CC); }
            { zend_do_if_end(TSRMLS_C); }
    .
    .

(32)

How if is compiled

zend_do_if_cond() is called when an if statement is compiled

Zend/zend_compile.c

    void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
    }

    typedef struct _znode {
        int op_type;
        union {
            zval constant;

            zend_uint var;
            zend_uint opline_num;
            zend_op_array *op_array;
            zend_op *jmp_addr;
            struct {
                zend_uint var;
                zend_uint type;
            } EA;
        } u;
    } znode;

(33)

How if is compiled

Zend/zend_compile.c

    void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
        /* Allocate a new opline in the current oparray */
        int if_cond_op_number = get_next_op_number(CG(active_op_array));
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

        /* Set the opcode of the new opline to JMPZ (jump if zero) */
        opline->opcode = ZEND_JMPZ;

        /* Set the first operand of the new opline to the if condition */
        opline->op1 = *cond;

        /* Perform bookkeeping tasks such as marking the second operand of
           the new opline as unused or incrementing the backpatching counter
           for the current oparray */
        closing_bracket_token->u.opline_num = if_cond_op_number;
        SET_UNUSED(opline->op2);
        INC_BPC(CG(active_op_array));
    }

    struct _zend_op {
        opcode_handler_t handler;
        znode result;
        znode op1;
        znode op2;
        ulong extended_value;
        uint lineno;
        zend_uchar opcode;
    };
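The reason the opline number is remembered is that the jump target is not known yet when the conditional jump is emitted. The following C sketch models that idea under heavy simplification. It does not use the real Zend structures or macros: `struct op`, `struct op_array`, `emit()`, and `compile_if_print()` are hypothetical stand-ins, and unlike the real compiler the sketch emits no `EXT_STMT` or trailing `JMP` ops.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the engine's structures; not the real Zend definitions. */
enum opcode { OP_JMPZ, OP_PRINT, OP_RETURN };

struct op {
    enum opcode opcode;
    int operand;      /* condition value or value to print */
    size_t target;    /* jump target, only meaningful for OP_JMPZ */
};

#define MAX_OPS 16

struct op_array {
    struct op ops[MAX_OPS];
    size_t last;      /* number of ops emitted so far */
};

/* Append a new op and return its index (the "opline number"). */
size_t emit(struct op_array *a, enum opcode opcode, int operand)
{
    size_t n = a->last;

    assert(n < MAX_OPS);
    a->ops[n].opcode = opcode;
    a->ops[n].operand = operand;
    a->ops[n].target = 0;     /* placeholder, patched later for jumps */
    a->last = n + 1;
    return n;
}

/* Compile "if (cond) { print value; }" into the op array. */
void compile_if_print(struct op_array *a, int cond, int value)
{
    /* The target is unknown until the body has been compiled,
       so remember where the conditional jump lives. */
    size_t jmpz = emit(a, OP_JMPZ, cond);

    emit(a, OP_PRINT, value);

    /* Backpatch: the jump skips to the first op after the body. */
    a->ops[jmpz].target = a->last;
}
```

In the engine, the remembered opline number travels through the parser in `closing_bracket_token` and is picked up again by the function that runs after the statement body has been compiled.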

(37)

Add unless to compiler

    void zend_do_unless_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
        int unless_cond_op_number = get_next_op_number(CG(active_op_array));
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

        opline->opcode = ZEND_JMPNZ;
        opline->op1 = *cond;
        closing_bracket_token->u.opline_num = unless_cond_op_number;
        SET_UNUSED(opline->op2);
        INC_BPC(CG(active_op_array));
    }

All we have to do to generate code for the unless statement, compared to generating code for the if statement, is to use the JMPNZ (jump if not zero) opcode instead of the JMPZ (jump if zero) opcode
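To see why swapping the opcode is enough, the following C sketch runs a three-op program with either jump in front of a `print`. It is a toy interpreter written for this illustration, not the Zend executor: `struct op`, `run()`, and `run_guarded_print()` are hypothetical, and the condition is stored directly as an integer operand.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; not the real Zend opcodes or structures. */
enum opcode { OP_JMPZ, OP_JMPNZ, OP_PRINT, OP_RETURN };

struct op {
    enum opcode opcode;
    int operand;     /* condition for jumps, character for OP_PRINT */
    size_t target;   /* jump target for OP_JMPZ and OP_JMPNZ */
};

/* Execute ops until OP_RETURN; printed characters are appended to out.
   Returns the number of characters written (out is NUL-terminated). */
size_t run(const struct op *ops, char *out, size_t out_size)
{
    size_t pc = 0, written = 0;

    assert(out_size > 0);
    for (;;) {
        const struct op *op = &ops[pc];

        switch (op->opcode) {
        case OP_JMPZ:   /* jump if the operand is zero (false) */
            pc = (op->operand == 0) ? op->target : pc + 1;
            break;
        case OP_JMPNZ:  /* jump if the operand is not zero (true) */
            pc = (op->operand != 0) ? op->target : pc + 1;
            break;
        case OP_PRINT:
            if (written + 1 < out_size) {
                out[written++] = (char)op->operand;
            }
            pc++;
            break;
        case OP_RETURN:
            out[written] = '\0';
            return written;
        }
    }
}

/* Build "conditional jump; print '*'; return" and run it. */
size_t run_guarded_print(enum opcode jump, int cond, char *out, size_t out_size)
{
    struct op ops[3];

    ops[0].opcode = jump;      ops[0].operand = cond; ops[0].target = 2;
    ops[1].opcode = OP_PRINT;  ops[1].operand = '*';  ops[1].target = 0;
    ops[2].opcode = OP_RETURN; ops[2].operand = 1;    ops[2].target = 0;
    return run(ops, out, out_size);
}
```

With `OP_JMPZ` the body is skipped when the condition is false, which is `if`; with `OP_JMPNZ` the body is skipped when the condition is true, which is `unless`. Everything after the jump stays the same, which is why the parser rule can reuse `zend_do_if_after_statement()` and `zend_do_if_end()`.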

(38)

Add unless to compiler

The generated bytecode

    <?php
    unless (FALSE) {
        print '*';
    }
    ?>

    sb@thinkpad ~ % bytekit unless.php
    bytekit-cli 1.0.0 by Sebastian Bergmann.

    Filename:           /home/sb/unless.php
    Function:           main
    Number of oplines:  8

    line  #  opcode    result  operands
    2     0  EXT_STMT
          1  JMPNZ             true, ->6
    3     2  EXT_STMT
          3  PRINT     ~0      '*'
          4  FREE              ~0
    4     5  JMP               ->6
    6     6  EXT_STMT
          7  RETURN            1

(39)

Run the test

    sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

    Build complete.
    Don't forget to run 'make test'.

    =====================================================================
    PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
    PHP_SAPI    : cli
    PHP_VERSION : 5.3.0RC3-dev
    ZEND_VERSION: 2.3.0
    PHP_OS      : Linux 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009 i686 GNU/Linux
    INI actual  : /usr/local/src/php/php-5.3-unless/tmp-php.ini
    More .INIs  :
    CWD         : /usr/local/src/php/php-5.3-unless
    Extra dirs  :
    VALGRIND    : Not used
    =====================================================================
    Running selected tests.
    PASS unless statement [Zend/tests/unless.phpt]
    =====================================================================
    Number of tests :    1                 1
    Tests skipped   :    0 (  0.0%) --------
    Tests warned    :    0 (  0.0%) (  0.0%)
    Tests failed    :    0 (  0.0%) (  0.0%)
    Expected fail   :    0 (  0.0%) (  0.0%)
    Tests passed    :    1 (100.0%) (100.0%)
    ---------------------------------------------------------------------
    Time taken      :    0 seconds

(40)

Add unless to ext/tokenizer

ext/tokenizer/tokenizer_data.c

    sb@thinkpad tokenizer % ./tokenizer_data_gen.sh
    Wrote tokenizer_data.c

(41)

The End

Thank you for your interest!

These slides will be linked soon from http://sebastian-bergmann.de/

You can vote for this talk on http://joind.in/582

(42)

Acknowledgements

Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation

Stefan Esser for creating the Bytekit extension that provides PHP bytecode access and analysis features

Derick Rethans, David Soria Parra, and Scott MacVicar for reviewing

(43)

References

http://www.php.net/manual/en/tokens.php

http://www.zapt.info/opcodes.html

Sara Golemon: "Extending and Embedding PHP"

http://derickrethans.nl/vld.php

http://bytekit.org/

(44)

This presentation material is published under the Attribution-Share Alike 3.0 Unported license.

You are free:

to Share – to copy, distribute and transmit the work.

to Remix – to adapt the work.

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

For any reuse or distribution, you must make clear to others the license terms of this work.

Any of the above conditions can be waived if you get permission from the copyright holder.

Nothing in this license impairs or restricts the author's moral rights.
