layout: post
title: Exploring foreach and current in PHP
date: 2015-11-13
categories: php
tags: [php]
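description: Strange output from current inside foreach, an investigation (moved to Jianshu)
Feel free to repost; please credit the source: Exploring foreach and current in PHP
Introduction
I recently ran into a problem involving foreach and current. Here is the example:
Q1: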
<?php
$arr = range(1, 3);
var_dump(current($arr));
foreach($arr as $val) {
    var_dump(current($arr));
}
var_dump(current($arr));
This code produces the following output:
int(1)
int(2)
int(2)
int(2)
int(2)
So here is the question: the manual says current does not move the pointer, so why does every later var_dump(current($arr)) print int(2)?
Q2:
<?php
$arr = range(1, 3);
var_dump(current($arr));
foreach($arr as $val) {
}
var_dump(current($arr));
The output of this code is:
int(1)
bool(false)
Why does current return false after an empty foreach? Where is the pointer now?
Investigation
I posted both questions on Stack Overflow (php: Difficult to understand function current). One reply suggested the behavior might depend on the PHP version, and https://3v4l.org/4iJj8 shows that the problem is indeed fixed in PHP 7.

After that I filed a bug report on php.net.

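Since I was told my report was a duplicate, I went to look. Sure enough, one of the existing reports describes the same problem as question 2: Bug #53405 accessing the iterator inside a foreach loop leads to strange results
It looks like this issue also went through some debate, and in the end it was still classified as a bug.
Here is nikic's description of the patch, covering both the cause of the problem and the fix: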
Currently there are two ways to iterate an array: Either using the internal
array pointer or using an external HashPosition. Right now the latter isn't
interruption safe though and can't be used in any iteration that runs user
code.
For that reason foreach had to use the IAP for the iteration. This created
a bunch of issues: Firstly foreach is often required to copy the array it
iterates even though it should not be strictly necessary according to COW.
Secondly using the IAP created weird behavior of current() etc in the loop
body, that was furthermore heavily dependent on just how exactly the looping
was done. Thirdly the behavior when modifying the array during iteration
is very unpredictable.
This patch approaches the problem by making external HashPosition array
iterators interruption safe. This is done by adding two new APIs:
void zend_track_hash_position(HashTable *ht, HashPosition *pos);
void zend_untrack_hash_position(HashTable *ht, HashPosition *pos);
Using this functions the HashPosition has to be registered before the
iteration and unregistered after it. If the HashPosition is registered
in such a way the zend_hash operations will properly update the
HashPosition pointer on modification, just like it is usually done for
the IAP.
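To make the idea of a tracked position concrete, here is a minimal sketch in C. It is not the Zend code: toy_table, toy_track, toy_untrack and toy_delete are hypothetical names invented for this illustration, and the table is a fixed array with tombstones rather than a real hash table. It only shows the contract described in the quote: a registered position is moved forward when the element it points at is deleted, while an unregistered one is left alone.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CAP 8          /* capacity of the toy table */
#define TOY_MAX_TRACKED 4  /* how many external positions can be registered */

/* Hypothetical stand-in for a HashTable: fixed slots, deleted ones marked dead. */
typedef struct {
    int values[TOY_CAP];
    int alive[TOY_CAP];
    size_t used;                       /* number of slots filled so far */
    size_t *tracked[TOY_MAX_TRACKED];  /* registered external positions */
} toy_table;

void toy_init(toy_table *t) {
    t->used = 0;
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        t->tracked[i] = NULL;
    }
}

void toy_push(toy_table *t, int value) {
    assert(t->used < TOY_CAP);
    t->values[t->used] = value;
    t->alive[t->used] = 1;
    t->used++;
}

/* Register an external position; returns 0 on success, -1 if no slot is free. */
int toy_track(toy_table *t, size_t *pos) {
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        if (t->tracked[i] == NULL) {
            t->tracked[i] = pos;
            return 0;
        }
    }
    return -1;
}

/* Unregister a position; after this the table no longer updates it. */
void toy_untrack(toy_table *t, size_t *pos) {
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        if (t->tracked[i] == pos) {
            t->tracked[i] = NULL;
        }
    }
}

/* Delete slot idx and move every registered position that pointed at it
 * to the next live slot (or to t->used, meaning "past the end"). */
void toy_delete(toy_table *t, size_t idx) {
    assert(idx < t->used && t->alive[idx]);
    t->alive[idx] = 0;
    size_t next = idx + 1;
    while (next < t->used && !t->alive[next]) {
        next++;
    }
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        if (t->tracked[i] != NULL && *t->tracked[i] == idx) {
            *t->tracked[i] = next;
        }
    }
}
```

An iteration that runs user code would call toy_track before the loop and toy_untrack after it, which mirrors the register/unregister pairing in the quoted API.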
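To see the specific changes, click here
The fix document characterizes the problem as a lack of tests for some edge cases of foreach. It also notes that the updated foreach implementation splits the two opcodes FE_RESET and FE_FETCH into FE_RESET_R, FE_FETCH_R, FE_RESET_RW and FE_FETCH_RW: the _R variants are used when iterating by value, the _RW variants when iterating by reference. I did not fully follow the remaining implementation details, so I am posting the original text for now and will update this once I understand it better: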
Implementation Details
The existing FE_RESET/FE_FETCH opcodes are split into separate FE_RESET_R/FE_FETCH_R opcodes used to implement foreach by value and FE_RESET_RW/FE_FETCH_RW to implement foreach by reference. The suffix _R means that we use array (or object) only for reading, and suffix _RW that we also may indirectly modify it. A new FE_FREE opcode is introduced. It's used at the end of foreach loops, instead of FREE opcode.
Iteration by value over array doesn't use or modify internal array pointer. The value of the pointer is kept in reserved space of temporary variable used for iteration. It's acceptable through Z_FE_POS() macro.
Iteration by reference or by value over plain object implemented using special HashTableIterator structures.
typedef struct _HashTableIterator {
HashTable *ht;
HashPosition pos;
} HashTableIterator;
On entrance into foreach loop FE_RESET_R/RW opcode creates and initializes a new iterator and stores its index in reserved space of temporary variable used for iteration. On exit, FE_FREE opcode removes corresponding iterator.
Iterators are actually allocated in a buffer - EG(ht_iterators), represented by plain array. The more nested foreach by reference iterators the bigger buffer we will need. We start with small preallocated buffer - EG(ht_iterators_slots), and then extend it if necessary in heap. EG(ht_iterators_count) keeps the number of available slots for iterators, EG(ht_iterators_used) - the number of used slots.
struct _zend_executor_globals {
...
uint32_t ht_iterators_count; /* number of allocatd slots */
uint32_t ht_iterators_used; /* number of used slots */
HashTableIterator *ht_iterators;
HashTableIterator ht_iterators_slots[16];
...
}
Creation, deletion and accessing iterators position is implemented through special API.
ZEND_API uint32_t zend_hash_iterator_add(HashTable *ht);
ZEND_API HashPosition zend_hash_iterator_pos(uint32_t idx, HashTable *ht);
ZEND_API void zend_hash_iterator_del(uint32_t idx);
Indirect modification of iterators positions implemented through zend_hash_iterators_update(). It's called when HashTable modification may affects iterator position. For example when element referred by iterator is inserted, or when iterator is set at the end of the array and new element is inserted.
ZEND_API void zend_hash_iterators_update(HashTable *ht, HashPosition from, HashPosition to);
Foe more details see zend_hash_iterators_*() functions implementation in zend_hash.c
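The bookkeeping described in the excerpt (a small preallocated buffer that spills to the heap, plus add and delete calls) can be sketched as follows. This is a simplified model, not the code in zend_hash.c: toy_globals, toy_iterator_add and toy_iterator_del are hypothetical names, a NULL table pointer marks a free slot, and growth simply doubles the buffer. All of these are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PREALLOC 16  /* size of the built-in buffer, as in ht_iterators_slots[16] */

typedef struct {
    const void *ht;  /* table this iterator belongs to; NULL marks a free slot */
    uint32_t pos;    /* current position inside that table */
} toy_iterator;

typedef struct {
    uint32_t count;                    /* number of allocated slots */
    uint32_t used;                     /* number of used slots */
    toy_iterator *iters;               /* points at slots[] or at a heap buffer */
    toy_iterator slots[TOY_PREALLOC];  /* small preallocated buffer */
} toy_globals;

void toy_globals_init(toy_globals *g) {
    for (uint32_t i = 0; i < TOY_PREALLOC; i++) {
        g->slots[i].ht = NULL;
        g->slots[i].pos = 0;
    }
    g->iters = g->slots;
    g->count = TOY_PREALLOC;
    g->used = 0;
}

/* Returns the index of the new iterator, or UINT32_MAX if allocation fails. */
uint32_t toy_iterator_add(toy_globals *g, const void *ht) {
    assert(ht != NULL);
    for (uint32_t i = 0; i < g->count; i++) {
        if (g->iters[i].ht == NULL) {  /* reuse the first free slot */
            g->iters[i].ht = ht;
            g->iters[i].pos = 0;
            g->used++;
            return i;
        }
    }
    /* No free slot: move to a heap buffer twice as large. */
    uint32_t new_count = g->count * 2;
    toy_iterator *buf = calloc(new_count, sizeof(*buf));
    if (buf == NULL) {
        return UINT32_MAX;
    }
    memcpy(buf, g->iters, g->count * sizeof(*buf));
    if (g->iters != g->slots) {
        free(g->iters);
    }
    uint32_t idx = g->count;  /* first slot of the newly added half */
    g->iters = buf;
    g->count = new_count;
    g->iters[idx].ht = ht;
    g->iters[idx].pos = 0;
    g->used++;
    return idx;
}

void toy_iterator_del(toy_globals *g, uint32_t idx) {
    assert(idx < g->count && g->iters[idx].ht != NULL);
    g->iters[idx].ht = NULL;
    g->used--;
}

/* Release the heap buffer, if any, and fall back to the built-in slots. */
void toy_globals_destroy(toy_globals *g) {
    if (g->iters != g->slots) {
        free(g->iters);
    }
    toy_globals_init(g);
}
```

In this model a foreach by reference would call toy_iterator_add on entry and toy_iterator_del on exit, so only deeply nested loops ever leave the preallocated slots.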
For more information, see: PHP RFC: Fix "foreach" behavior
Summary
Since this is a bug, avoid writing code like this on versions before PHP 7, so that it does not cause other problems.
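For contrast, here is a minimal sketch of the PHP 7 idea quoted from the RFC, where iteration by value keeps its position outside the array. It is a toy model in C with hypothetical names (toy_array7, toy_current7, toy_foreach_by_value), not engine code, and -1 stands in for false.

```c
#include <assert.h>

typedef struct {
    int len;
    int iap;       /* internal array pointer: the element current() reads */
    int items[8];
} toy_array7;

/* Model of current(): reads the element under the internal pointer. */
int toy_current7(const toy_array7 *a) {
    assert(a->len <= 8);
    return a->iap < a->len ? a->items[a->iap] : -1;
}

/* Model of foreach by value whose body calls current() once per iteration.
 * The loop position is a private variable, so a->iap is never touched.
 * Stores what each current() call saw in seen[] and returns the count. */
int toy_foreach_by_value(const toy_array7 *a, int *seen) {
    int n = 0;
    for (int pos = 0; pos < a->len; pos++) {
        seen[n++] = toy_current7(a);
    }
    return n;
}
```

Because the loop never writes to the internal pointer, current() sees the same element before, during and after the loop in this model.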
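In-depth analysis
Criticism is welcome, since I have not fully figured this out yet. The analysis below uses question 1 as the example. PS: question 2 has not been analyzed yet.
For convenience, here is the vld output for my code: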
Finding entry points
Branch analysis from position: 0
Jump found. Position 1 = 9, Position 2 = 17
Branch analysis from position: 9
Jump found. Position 1 = 10, Position 2 = 17
Branch analysis from position: 10
Jump found. Position 1 = 9
Branch analysis from position: 9
Branch analysis from position: 17
Jump found. Position 1 = -2
Branch analysis from position: 17
filename: /in/RL3TZ
function name: (null)
number of ops: 23
compiled vars: !0 = $arr, !1 = $val
line   #*  E I O  op               fetch  ext  return  operands
----------------------------------------------------------------
   2    0  E >    SEND_VAL                             1
        1         SEND_VAL                             3
        2         DO_FCALL                  2  $0      'range'
        3         ASSIGN                               !0, $0
   3    4         SEND_REF                             !0
        5         DO_FCALL                  1  $2      'current'
        6         SEND_VAR_NO_REF           6          $2
        7         DO_FCALL                  1          'var_dump'
   4    8      >  FE_RESET                     $4      !0, ->17
        9    > >  FE_FETCH                     $5      $4, ->17
       10    >    OP_DATA
       11         ASSIGN                               !1, $5
   5   12         SEND_REF                             !0
       13         DO_FCALL                  1  $7      'current'
       14         SEND_VAR_NO_REF           6          $7
       15         DO_FCALL                  1          'var_dump'
   6   16      >  JMP                                  ->9
       17    >    SWITCH_FREE                          $4
   7   18         SEND_REF                             !0
       19         DO_FCALL                  1  $9      'current'
       20         SEND_VAR_NO_REF           6          $9
       21         DO_FCALL                  1          'var_dump'
       22      >  RETURN                               1
Generated using Vulcan Logic Dumper, using php 5.6.0
According to the TIPI project's analysis of foreach:
Source code:
$arr = array(1,2,3,4,5);
foreach($arr as $key => $row) {
    echo key($arr), '=>', current($arr), "\r\n";
}
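Question: why does calling key or current inside the foreach loop body show the second element (in the non-reference case)? Take the key function as an example. When the function call is executed, the opcode SEND_REF runs; for a variable that is not yet a reference, this opcode makes a copy and marks it as a reference. By the time the loop body is entered, the PHP core has already performed one fetch, which is equivalent to one next operation, so the current element is the second one. Therefore, when key is called inside the foreach loop body, the array variable passed to key is a copy of an array on which PHP has already performed one fetch, and at that moment foreach's internal pointer points at the second element.
By this explanation, the variable is copied the first time SEND_REF runs, and since an FE_FETCH has already happened by the time of that copy, current yields the second element. But look back at my code: line 4 of the vld output, which comes before the foreach, has already produced !0, so the SEND_REF inside the loop is not the first one. Why is the result still the same? For that reason I do not find this explanation convincing.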
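One way to probe an explanation like this is to model it. The sketch below is a toy model in C, not PHP's source; toy_array, toy_current and toy_foreach are hypothetical names, and it rests on three assumptions: foreach by value takes an extra reference to the array instead of copying it, FE_FETCH advances the internal pointer after reading, and a by-reference call such as current() first separates (copies) an array whose refcount is above 1, with the copy inheriting the pointer. Under those assumptions the current() call before the loop does not copy, because the refcount is still 1, while the first current() inside the loop does. The model reproduces the outputs of both Q1 and Q2, which is suggestive but does not prove that PHP 5 works this way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_END (-1)  /* stands in for current() returning false */

typedef struct {
    int refcount;
    int len;
    int iap;       /* internal array pointer */
    int items[8];
} toy_array;

toy_array *toy_new(const int *src, int len) {
    assert(len >= 0 && len <= 8);
    toy_array *a = malloc(sizeof(*a));
    assert(a != NULL);
    a->refcount = 1;
    a->len = len;
    a->iap = 0;
    memcpy(a->items, src, (size_t)len * sizeof(int));
    return a;
}

/* Assumed effect of SEND_REF: a shared array is copied before being passed
 * by reference, and the copy inherits the internal pointer. */
toy_array *toy_separate(toy_array *a) {
    if (a->refcount == 1) {
        return a;  /* not shared, nothing to do */
    }
    toy_array *copy = malloc(sizeof(*copy));
    assert(copy != NULL);
    *copy = *a;
    copy->refcount = 1;
    a->refcount--;
    return copy;
}

/* Model of current($var): by-reference call, then read under the pointer. */
int toy_current(toy_array **var) {
    *var = toy_separate(*var);
    toy_array *a = *var;
    return a->iap < a->len ? a->items[a->iap] : TOY_END;
}

/* Model of foreach ($var as $val) { ... }. If call_current_in_body is set,
 * the body calls current($var) and the result is stored in out[].
 * Returns the number of stored results. */
int toy_foreach(toy_array **var, int call_current_in_body, int *out) {
    toy_array *iterated = *var;  /* FE_RESET: take a reference, reset pointer */
    iterated->refcount++;
    iterated->iap = 0;
    int n = 0;
    while (iterated->iap < iterated->len) {
        iterated->iap++;         /* FE_FETCH: read the element, then advance */
        if (call_current_in_body) {
            out[n++] = toy_current(var);
        }
    }
    if (--iterated->refcount == 0) {  /* loop end: drop foreach's reference */
        free(iterated);
    }
    return n;
}
```

For Q1 the model yields 1, then 2 three times inside the loop, then 2 after it; for Q2 it yields 1 and then TOY_END, because the empty loop never triggers a copy and leaves the shared pointer past the last element.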
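Further reading:
-
An in-depth look at the foreach problem in PHP
Covers question 2. I do not know whether this is the original source; thumbs down for reposts that do not credit where they came from.
-
Understanding PHP internals in depth: foreach
This one walks through foreach in detail alongside the source code. Hats off to 鳥哥.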